oops_datedir_repo-0.0.17/
oops_datedir_repo-0.0.17/README
*************************************************************************
python-oops-datedir-repo: A simple disk repository for OOPS Error reports
*************************************************************************

    Copyright (c) 2011, Canonical Ltd

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation, version 3 only.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU Lesser General Public License for more details.

    You should have received a copy of the GNU Lesser General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
    GNU Lesser General Public License version 3 (see the file LICENSE).

This is a component of the python-oops project:
https://launchpad.net/python-oops.

An OOPS report is a report about something going wrong in a piece of software...
thus, an 'oops' :)

This package provides disk storage, management, and a serialisation format for
OOPSes stored in the repository. Programs or services that are generating OOPS
reports need this package or other similar ones, if they want to persist the
reports.

Dependencies
============

* Python 2.6+
* The oops package (https://launchpad.net/python-oops or 'oops' on pypi).

Testing Dependencies
====================

* fixtures (http://pypi.python.org/pypi/fixtures)
* subunit (http://pypi.python.org/pypi/python-subunit) (optional)
* testtools (http://pypi.python.org/pypi/testtools)

Usage
=====

oops_datedir_repo is an extension package for the oops package.

The DateDirRepo class provides an OOPS publisher (``DateDirRepo.publish``)
which will write OOPSes into the repository.

Retrieving OOPSes can be done by using the low-level serializer_rfc822
functions: an OOPS report can be written to a disk file via the
serializer_rfc822.write() function, and read via the matching read() function.

The uniquefileallocator module is used by the repository implementation and
provides a system for allocating file names on disk.

Typical usage::

  >>> config = oops.Config()
  >>> with fixtures.TempDir() as tempdir:
  ...    repo = oops_datedir_repo.DateDirRepo(tempdir.path, 'servername')
  ...    config.publishers.append(repo.publish)
  ...    ids = config.publish({'oops': '!!!'})

For more information see the oops package documentation or the api docs.

Installation
============

Either run setup.py in an environment with all the dependencies available, or
add the working directory to your PYTHONPATH.

Development
===========

Upstream development takes place at
https://launchpad.net/python-oops-datedir-repo. To set up a working area for
development, if the dependencies are not immediately available, you can use
./bootstrap.py to create bin/buildout, then bin/py to get a python interpreter
with the dependencies available.

To run the tests use the runner of your choice; the test suite is
oops_datedir_repo.tests.test_suite.

For instance::

  $ bin/py -m testtools.run oops_datedir_repo.tests.test_suite

If you have testrepository you can run the tests with that::

  $ testr run
oops_datedir_repo-0.0.17/PKG-INFO
Metadata-Version: 1.1
Name: oops_datedir_repo
Version: 0.0.17
Summary: OOPS disk serialisation and repository management.
Home-page: https://launchpad.net/python-oops-datedir-repo
Author: Launchpad Developers
Author-email: launchpad-dev@lists.launchpad.net
License: UNKNOWN
Description: *************************************************************************
        python-oops-datedir-repo: A simple disk repository for OOPS Error reports
        *************************************************************************
        
            Copyright (c) 2011, Canonical Ltd
        
            This program is free software: you can redistribute it and/or modify
            it under the terms of the GNU Lesser General Public License as published by
            the Free Software Foundation, version 3 only.
        
            This program is distributed in the hope that it will be useful,
            but WITHOUT ANY WARRANTY; without even the implied warranty of
            MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
            GNU Lesser General Public License for more details.
        
            You should have received a copy of the GNU Lesser General Public License
            along with this program.  If not, see <http://www.gnu.org/licenses/>.
            GNU Lesser General Public License version 3 (see the file LICENSE).
        
        This is a component of the python-oops project:
        https://launchpad.net/python-oops.
        
        An OOPS report is a report about something going wrong in a piece of software...
        thus, an 'oops' :)
        
        This package provides disk storage, management, and a serialisation format for
        OOPSes stored in the repository. Programs or services that are generating OOPS
        reports need this package or other similar ones, if they want to persist the
        reports.
        
        Dependencies
        ============
        
        * Python 2.6+
        * The oops package (https://launchpad.net/python-oops or 'oops' on pypi).
        
        Testing Dependencies
        ====================
        
        * fixtures (http://pypi.python.org/pypi/fixtures)
        * subunit (http://pypi.python.org/pypi/python-subunit) (optional)
        * testtools (http://pypi.python.org/pypi/testtools)
        
        Usage
        =====
        
        oops_datedir_repo is an extension package for the oops package.
        
        The DateDirRepo class provides an OOPS publisher (``DateDirRepo.publish``)
        which will write OOPSes into the repository.
        
        Retrieving OOPSes can be done by using the low-level serializer_rfc822
        functions: an OOPS report can be written to a disk file via the
        serializer_rfc822.write() function, and read via the matching read() function.
        
        The uniquefileallocator module is used by the repository implementation and
        provides a system for allocating file names on disk.
        
        Typical usage::
        
          >>> config = oops.Config()
          >>> with fixtures.TempDir() as tempdir:
          ...    repo = oops_datedir_repo.DateDirRepo(tempdir.path, 'servername')
          ...    config.publishers.append(repo.publish)
          ...    ids = config.publish({'oops': '!!!'})
        
        For more information see the oops package documentation or the api docs.
        
        Installation
        ============
        
        Either run setup.py in an environment with all the dependencies available, or
        add the working directory to your PYTHONPATH.
        
        Development
        ===========
        
        Upstream development takes place at
        https://launchpad.net/python-oops-datedir-repo. To set up a working area for
        development, if the dependencies are not immediately available, you can use
        ./bootstrap.py to create bin/buildout, then bin/py to get a python interpreter
        with the dependencies available.
        
        To run the tests use the runner of your choice; the test suite is
        oops_datedir_repo.tests.test_suite.
        
        For instance::
        
          $ bin/py -m testtools.run oops_datedir_repo.tests.test_suite
        
        If you have testrepository you can run the tests with that::
        
          $ testr run
        
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
oops_datedir_repo-0.0.17/setup.py
#!/usr/bin/env python
#
# Copyright (c) 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).

from distutils.core import setup
import os.path

description = file(
        os.path.join(os.path.dirname(__file__), 'README'), 'rb').read()

setup(name="oops_datedir_repo",
      version="0.0.17",
      description="OOPS disk serialisation and repository management.",
      long_description=description,
      maintainer="Launchpad Developers",
      maintainer_email="launchpad-dev@lists.launchpad.net",
      url="https://launchpad.net/python-oops-datedir-repo",
      packages=['oops_datedir_repo'],
      package_dir = {'':'.'},
      classifiers = [
          'Development Status :: 2 - Pre-Alpha',
          'Intended Audience :: Developers',
          'License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)',
          'Operating System :: OS Independent',
          'Programming Language :: Python',
          ],
      install_requires = [
          'bson',
          'iso8601',
          'launchpadlib',  # Needed for pruning - perhaps should be optional.
          'oops',
          'pytz',
          ],
      extras_require = dict(
          test=[
              'fixtures',
              'testtools',
              ]
          ),
      entry_points=dict(
          console_scripts=[  # `console_scripts` is a magic name to setuptools
              'bsondump = oops_datedir_repo.bsondump:main',
              'prune = oops_datedir_repo.prune:main',
              ]),
      )
oops_datedir_repo-0.0.17/oops_datedir_repo/
oops_datedir_repo-0.0.17/oops_datedir_repo/repository.py
#
# Copyright (c) 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).

"""The primary interface to oopses stored on disk - the DateDirRepo."""

__metaclass__ = type

__all__ = [
    'DateDirRepo',
    ]

import datetime
import errno
from functools import partial
from hashlib import md5
import os.path
import stat

from pytz import utc

import anybson as bson
import serializer
import serializer_bson
from uniquefileallocator import UniqueFileAllocator


class DateDirRepo:
    """Publish oopses to a date-dir repository.

    A date-dir repository is a directory containing:

    * Zero or one directories called 'metadata'. If it exists this directory
      contains any housekeeping material needed (such as a metadata.conf ini
      file).

    * Zero or more directories named like YYYY-MM-DD, which contain zero or
      more OOPS reports. OOPS file names can take various forms, but must not
      end in .tmp - those are considered to be OOPS reports that are currently
      being written.
    """

    def __init__(self, error_dir, instance_id=None, serializer=None,
            inherit_id=False, stash_path=False):
        """Create a DateDirRepo.

        :param error_dir: The base directory to write OOPSes into. OOPSes are
            written into a subdirectory of this, named after the date (e.g.
            2011-12-30).
        :param instance_id: If None, OOPS file names are named after the OOPS
            id which is generated by hashing the serialized OOPS (without the
            id field). Otherwise OOPS file names and ids are created by
            allocating file names through a UniqueFileAllocator.
            UniqueFileAllocator has significant performance and concurrency
            limits and hash based naming is recommended.
        :param serializer: If supplied should be the module (e.g.
            oops_datedir_repo.serializer_rfc822) to use to serialize OOPSes.
            Defaults to using serializer_bson.
        :param inherit_id: If True, use the oops ID (if present) supplied in
            the report, rather than always assigning a new one.
        :param stash_path: If True, the filename that the OOPS was written to
            is stored in the OOPS report under the key 'datedir_repo_filepath'.
            It is not stored in the OOPS written to disk, only the in-memory
            model.
        """
        if instance_id is not None:
            self.log_namer = UniqueFileAllocator(
                output_root=error_dir,
                log_type="OOPS",
                log_subtype=instance_id,
                )
        else:
            self.log_namer = None
        self.root = error_dir
        if serializer is None:
            serializer = serializer_bson
        self.serializer = serializer
        self.inherit_id = inherit_id
        self.stash_path = stash_path
        self.metadatadir = os.path.join(self.root, 'metadata')
        self.config_path = os.path.join(self.metadatadir, 'config.bson')

    def publish(self, report, now=None):
        """Write the report to disk.

        The report is written to a temporary file, and then renamed to its
        final location. Programs concurrently reading from a DateDirRepo
        should ignore files ending in .tmp.

        :param now: The datetime to use as the current time.  Will be
            determined if not supplied.  Useful for testing.
        """
        # We set file permission to: rw-r--r-- (so that reports from
        # umask-restricted services can be gathered by a tool running as
        # another user).
        wanted_file_permission = (
            stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH)
        if now is not None:
            now = now.astimezone(utc)
        else:
            now = datetime.datetime.now(utc)
        # Don't mess with the original report when changing ids etc.
        original_report = report
        report = dict(report)
        if self.log_namer is not None:
            oopsid, filename = self.log_namer.newId(now)
        else:
            md5hash = md5(serializer_bson.dumps(report)).hexdigest()
            oopsid = 'OOPS-%s' % md5hash
            prefix = os.path.join(self.root, now.strftime('%Y-%m-%d'))
            if not os.path.isdir(prefix):
                os.makedirs(prefix)
                # For directories we need to set the x bits too.
                os.chmod(
                    prefix, wanted_file_permission | stat.S_IXUSR |
                    stat.S_IXGRP | stat.S_IXOTH)
            filename = os.path.join(prefix, oopsid)
        if self.inherit_id:
            oopsid = report.get('id') or oopsid
        report['id'] = oopsid
        self.serializer.write(report, open(filename + '.tmp', 'wb'))
        os.rename(filename + '.tmp', filename)
        if self.stash_path:
            original_report['datedir_repo_filepath'] = filename
        os.chmod(filename, wanted_file_permission)
        return report['id']

    def republish(self, publisher):
        """Republish the contents of the DateDirRepo to another publisher.

        This makes it easy to treat a DateDirRepo as a backing store in message
        queue environments: if the message queue is down, flush to the
        DateDirRepo, then later pick the OOPSes up and send them to the message
        queue environment.

        For instance:

          >>> repo = DateDirRepo('.')
          >>> repo.publish({'some':'report'})
          >>> queue = []
          >>> def queue_publisher(report):
          ...     queue.append(report)
          ...     return report['id']
          >>> repo.republish(queue_publisher)

        Will scan the disk and send the single found report to queue_publisher,
        deleting the report afterwards.

        Empty datedir directories are automatically cleaned up, as are stale
        .tmp files.

        If the publisher returns None, signalling that it did not publish the
        report, then the report is not deleted from disk.
        """
        two_days = datetime.timedelta(2)
        now = datetime.date.today()
        old = now - two_days
        for dirname, (y,m,d) in self._datedirs():
            date = datetime.date(y, m, d)
            prune = date < old
            dirpath = os.path.join(self.root, dirname)
            files = os.listdir(dirpath)
            if not files and prune:
                # Cleanup no longer needed directory.
                os.rmdir(dirpath)
            for candidate in map(partial(os.path.join, dirpath), files):
                if candidate.endswith('.tmp'):
                    if prune:
                        os.unlink(candidate)
                    continue
                with file(candidate, 'rb') as report_file:
                    report = serializer.read(report_file)
                oopsid = publisher(report)
                if oopsid:
                    os.unlink(candidate)

    def _datedirs(self):
        """Yield each subdir which looks like a datedir."""
        for dirname in os.listdir(self.root):
            try:
                y, m, d = dirname.split('-')
                y = int(y)
                m = int(m)
                d = int(d)
            except ValueError:
                # Not a datedir
                continue
            yield dirname, (y, m, d)

    def _read_config(self):
        """Return the current config document from disk."""
        try:
            with open(self.config_path, 'rb') as config_file:
                return bson.loads(config_file.read())
        except IOError, e:
            if e.errno != errno.ENOENT:
                raise
            return {}

    def get_config(self, key):
        """Return a key from the repository config.

        :param key: A key to read from the config.
        """
        return self._read_config()[key]

    def set_config(self, key, value):
        """Set config option key to value.

        This is written to the bson document root/metadata/config.bson

        :param key: The key to set - anything that can be a key in a bson
            document.
        :param value: The value to set - anything that can be a value in a
            bson document.
        """
        config = self._read_config()
        config[key] = value
        try:
            with open(self.config_path + '.tmp', 'wb') as config_file:
                config_file.write(bson.dumps(config))
        except IOError, e:
            if e.errno != errno.ENOENT:
                raise
            os.mkdir(self.metadatadir)
            with open(self.config_path + '.tmp', 'wb') as config_file:
                config_file.write(bson.dumps(config))
        os.rename(self.config_path + '.tmp', self.config_path)

    def oldest_date(self):
        """Return the date of the oldest datedir in the repository.

        If pruning / resubmission is working this should also be the date of
        the oldest oops in the repository.
        """
        dirs = list(self._datedirs())
        if not dirs:
            raise ValueError("No OOPSes in repository.")
        return datetime.date(*sorted(dirs)[0][1])

    def prune_unreferenced(self, start_time, stop_time, references):
        """Delete OOPS reports filed between start_time and stop_time.

        A report is deleted if all of the following are true:

        * It is in a datedir covered by [start_time, stop_time] inclusive of
          the end points.

        * It is not in the set references.

        * Its timestamp falls between start_time and stop_time inclusively or
          its timestamp is outside the datedir it is in or there is no
          timestamp on the report.

        :param start_time: The lower bound to prune within.
        :param stop_time: The upper bound to prune within.
        :param references: An iterable of OOPS ids to keep.
        """
        start_date = start_time.date()
        stop_date = stop_time.date()
        midnight = datetime.time(tzinfo=utc)
        for dirname, (y,m,d) in self._datedirs():
            dirdate = datetime.date(y, m, d)
            if dirdate < start_date or dirdate > stop_date:
                continue
            dirpath = os.path.join(self.root, dirname)
            files = os.listdir(dirpath)
            deleted = 0
            for candidate in map(partial(os.path.join, dirpath), files):
                if candidate.endswith('.tmp'):
                    # Old half-written oops: just remove.
                    os.unlink(candidate)
                    deleted += 1
                    continue
                with file(candidate, 'rb') as report_file:
                    report = serializer.read(report_file)
                report_time = report.get('time', None)
                if (report_time is None or
                    report_time.date() < dirdate or
                    report_time.date() > dirdate):
                    # The report is oddly filed or missing a precise
                    # datestamp. Treat it like midnight on the day of the
                    # directory it was placed in - this is a lower bound on
                    # when it was actually created.
                    report_time = datetime.datetime.combine(
                        dirdate, midnight)
                if (report_time >= start_time and
                    report_time <= stop_time and
                    report['id'] not in references):
                    # Unreferenced and prunable
                    os.unlink(candidate)
                    deleted += 1
            if deleted == len(files):
                # Everything in the directory was deleted.
                os.rmdir(dirpath)
oops_datedir_repo-0.0.17/oops_datedir_repo/serializer.py
# Copyright (c) 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).

"""Read from any known serializer.

Where possible using the specific known serializer is better as it is more
efficient and won't suffer false positives if two serializations happen to pun
with each other (unlikely though that is).

Typical usage:

    >>> fp = file('an-oops', 'rb')
    >>> report = serializer.read(fp)

See the serializer_rfc822 and serializer_bson modules for information about
serializing OOPS reports by hand. Generally just using the DateDirRepo.publish
method is all that is needed.
"""


__all__ = [
    'read',
    ]

import bz2
from StringIO import StringIO

from oops_datedir_repo import (
    anybson as bson,
    serializer_bson,
    serializer_rfc822,
    )


def read(fp):
    """Deserialize an OOPS from a bson or rfc822 message.

    The whole file is read regardless of the OOPS format.

    :raises IOError: If the file has no content.
    """
    # Deal with non-rewindable file pointers.
    content = fp.read()
    if len(content) == 0:
        # This OOPS has no content
        raise IOError("Empty OOPS Report")
    if content[0:3] == "BZh":
        content = bz2.decompress(content)
    try:
        return serializer_bson.read(StringIO(content))
    except (KeyError, bson.InvalidBSON):
        return serializer_rfc822.read(StringIO(content))
oops_datedir_repo-0.0.17/oops_datedir_repo/anybson.py
# Copyright (c) 2012, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).

__all__ = [
    'dumps',
    'loads',
    ]


try:
    from bson import dumps, loads
    # Create the exception that won't be raised by this version of
    # bson
    class InvalidBSON(Exception):
        pass
except ImportError:
    from bson import BSON, InvalidBSON

    def dumps(obj):
        return BSON.encode(obj)

    def loads(data):
        return BSON(data).decode(tz_aware=True)
oops_datedir_repo-0.0.17/oops_datedir_repo/uniquefileallocator.py
# Copyright (c) 2010, 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).

"""Create uniquely named log files on disk."""


__all__ = ['UniqueFileAllocator']

__metaclass__ = type


import datetime
import errno
import os.path
import stat
import threading

import pytz


UTC = pytz.utc

# the section of the ID before the instance identifier is the
# days since the epoch, which is defined as the start of 2006.
epoch = datetime.datetime(2006, 01, 01, 00, 00, 00, tzinfo=UTC)


class UniqueFileAllocator:
    """Assign unique file names to logs being written from an app/script.

    UniqueFileAllocator causes logs written from one process to be uniquely
    named. It is not safe for use in multiple processes with the same output
    root - each process must have a unique output root.
    """

    def __init__(self, output_root, log_type, log_subtype):
        """Create a UniqueFileAllocator.

        :param output_root: The root directory that logs should be placed in.
        :param log_type: A string to use as a prefix in the ID assigned to new
            logs. For instance, "OOPS".
        :param log_subtype: A string to insert in the generated log filenames
            between the day number and the serial. For instance "T" for
            "Testing".
        """
        self._lock = threading.Lock()
        self._output_root = output_root
        self._last_serial = 0
        self._last_output_dir = None
        self._log_type = log_type
        self._log_subtype = log_subtype
        self._log_token = ""

    def _findHighestSerialFilename(self, directory=None, time=None):
        """Find details of the last log present in the given directory.

        This function only considers logs with the currently
        configured log_subtype.

        One of directory, time must be supplied.

        :param directory: Look in this directory.
        :param time: Look in the directory that a log written at this time
            would have been written to. If supplied, supersedes directory.
        :return: a tuple (log_serial, log_filename), which will be (0,
            None) if no logs are found. log_filename is a usable path, not
            simply the basename.
        """
        if directory is None:
            directory = self.output_dir(time)
        prefix = self.get_log_infix()
        lastid = 0
        lastfilename = None
        for filename in os.listdir(directory):
            logid = filename.rsplit('.', 1)[1]
            if not logid.startswith(prefix):
                continue
            logid = logid[len(prefix):]
            if logid.isdigit() and (lastid is None or int(logid) > lastid):
                lastid = int(logid)
                lastfilename = filename
        if lastfilename is not None:
            lastfilename = os.path.join(directory, lastfilename)
        return lastid, lastfilename

    def _findHighestSerial(self, directory):
        """Find the last serial actually applied to disk in directory.

        The purpose of this function is to not repeat sequence numbers
        if the logging application is restarted.

        This method is not thread safe, and only intended to be called
        from the constructor (but it is called from other places in
        integration tests).
        """
        return self._findHighestSerialFilename(directory)[0]

    def getFilename(self, log_serial, time):
        """Get the filename for a given log serial and time."""
        log_subtype = self.get_log_infix()
        # TODO: Calling output_dir causes a global lock to be taken and a
        # directory scan, which is bad for performance. It would be better
        # to have a split out 'directory name for time' function which the
        # 'want to use this directory now' function can call.
        output_dir = self.output_dir(time)
        second_in_day = time.hour * 3600 + time.minute * 60 + time.second
        return os.path.join(
            output_dir, '%05d.%s%s' % (
            second_in_day, log_subtype, log_serial))

    def get_log_infix(self):
        """Return the current log infix to use in ids and file names."""
        return self._log_subtype + self._log_token

    def newId(self, now=None):
        """Returns an (id, filename) pair for use by the caller.

        The ID is composed of a short string to identify the Launchpad
        instance followed by an ID that is unique for the day.

        The filename is composed of the zero padded second in the day
        followed by the ID.  This ensures that reports are in date order when
        sorted lexically.
        """
        if now is not None:
            now = now.astimezone(UTC)
        else:
            now = datetime.datetime.now(UTC)
        # We look up the output directory before allocating a new ID,
        # because if the day has changed, output_dir() will reset the ID
        # counter from the contents of the new day's directory.
        self.output_dir(now)
        self._lock.acquire()
        try:
            self._last_serial += 1
            newid = self._last_serial
        finally:
            self._lock.release()
        subtype = self.get_log_infix()
        day_number = (now - epoch).days + 1
        log_id = '%s-%d%s%d' % (self._log_type, day_number, subtype, newid)
        filename = self.getFilename(newid, now)
        return log_id, filename

    def output_dir(self, now=None):
        """Find or make the directory to allocate log names in.

        Log names are assigned within subdirectories containing the date the
        assignment happened.
        """
        if now is not None:
            now = now.astimezone(UTC)
        else:
            now = datetime.datetime.now(UTC)
        date = now.strftime('%Y-%m-%d')
        result = os.path.join(self._output_root, date)
        if result != self._last_output_dir:
            self._lock.acquire()
            try:
                self._last_output_dir = result
                # make sure the directory exists
                try:
                    os.makedirs(result)
                except OSError, e:
                    if e.errno != errno.EEXIST:
                        raise
                # Make sure the directory permission is set to: rwxr-xr-x
                permission = (
                    stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP |
                    stat.S_IROTH | stat.S_IXOTH)
                os.chmod(result, permission)
                # TODO: Note that only one process can do this safely: it's
                # not cross-process safe, and also not entirely threadsafe:
                # another thread that has a new log and hasn't written it
                # could then use that serial number. We should either make it
                # really safe, or remove the contention entirely and log
                # uniquely per thread of execution.
                self._last_serial = self._findHighestSerial(result)
            finally:
                self._lock.release()
        return result

    def listRecentReportFiles(self):
        now = datetime.datetime.now(UTC)
        yesterday = now - datetime.timedelta(days=1)
        directories = [self.output_dir(now), self.output_dir(yesterday)]
        for directory in directories:
            report_names = os.listdir(directory)
            for name in sorted(report_names, reverse=True):
                yield directory, name

    def setToken(self, token):
        """Append a string to the log subtype in filenames and log ids.

        :param token: a string to append.
            Scripts that run multiple processes can use this to create a
            unique identifier for each process.
        """
        self._log_token = token
oops_datedir_repo-0.0.17/oops_datedir_repo/serializer_rfc822.py
# Copyright (c) 2010, 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).

"""Read / Write an OOPS dict as an rfc822 formatted message.

This style of OOPS format is very web server specific, not extensible - it
should be considered deprecated.

The reports this serializer handles always have the following variables (See
the python-oops api docs for more information about these variables):

* id: The name of this error report.
* type: The type of the exception that occurred.
* value: The value of the exception that occurred.
* time: The time at which the exception occurred.
* reporter: The reporting program.
* topic: The identifier for the template/script that oopsed.
  [this is written as Page-Id for compatibility with as yet unported tools.]
* branch_nick: The branch nickname: a name for the branch of code that was
  running when the report was triggered.
* revno: The revision number that the branch was at.
* tb_text: A text version of the traceback.
* username: The user associated with the request.
* url: The URL for the failed request.
* req_vars: The request variables. Either a list of 2-tuples or a dict.
* Informational: A flag, True if the error wasn't fatal - if it was
  'informational'.
  [Deprecated - this is no longer part of the oops report conventions. Existing
  reports with it set are still read, but the key is only present if it
  was truly in the report.]
"""


__all__ = [
    'read',
    'write',
    ]

__metaclass__ = type

import datetime
import logging
import rfc822
import re
import urllib

import iso8601


def read(fp):
    """Deserialize an OOPS from an RFC822 format message."""
    msg = rfc822.Message(fp)
    id = msg.getheader('oops-id')
    exc_type = msg.getheader('exception-type')
    exc_value = msg.getheader('exception-value')
    datestr = msg.getheader('date')
    if datestr is not None:
        date = iso8601.parse_date(msg.getheader('date'))
    else:
        date = None
    topic = msg.getheader('topic')
    if topic is None:
        topic = msg.getheader('page-id')
    username = msg.getheader('user')
    url = msg.getheader('url')
    try:
        duration = float(msg.getheader('duration', '-1'))
    except ValueError:
        duration = float(-1)
    informational = msg.getheader('informational')
    branch_nick = msg.getheader('branch')
    revno = msg.getheader('revision')
    reporter = msg.getheader('oops-reporter')

    # Explicitly use an iterator so we can process the file
    # sequentially. In most instances the iterator will actually
    # be the file object passed in because file objects should
    # support iteration.
    lines = iter(msg.fp)

    statement_pat = re.compile(r'^(\d+)-(\d+)(?:@([\w-]+))?\s+(.*)')

    def is_req_var(line):
        return "=" in line and not statement_pat.match(line)

    def is_traceback(line):
        return line.lower().startswith('traceback') or line.startswith(
            '== EXTRA DATA ==')

    req_vars = []
    statements = []
    first_tb_line = ''
    for line in lines:
        first_tb_line = line
        line = line.strip()
        if line == '':
            continue
        else:
            match = statement_pat.match(line)
            if match is not None:
                start, end, db_id, statement = match.groups()
                if db_id is not None:
                    db_id = intern(db_id)  # This string is repeated lots.
                statements.append(
                    [int(start), int(end), db_id, statement])
            elif is_req_var(line):
                key, value = line.split('=', 1)
                req_vars.append([urllib.unquote(key), urllib.unquote(value)])
            elif is_traceback(line):
                break
    req_vars = dict(req_vars)

    # The rest is traceback.
    tb_text = ''.join([first_tb_line] + list(lines))

    result = dict(id=id, type=exc_type, value=exc_value, time=date,
            topic=topic, tb_text=tb_text, username=username, url=url,
            duration=duration, req_vars=req_vars, timeline=statements,
            branch_nick=branch_nick, revno=revno)
    if informational is not None:
        result['informational'] = informational
    if reporter is not None:
        result['reporter'] = reporter
    return result


def _normalise_whitespace(s):
    """Normalise the whitespace in a string to spaces"""
    if s is None:
        return None  # (used by the cast to %s to get 'None')
    return ' '.join(s.split())


def _safestr(obj):
    if isinstance(obj, unicode):
        return obj.replace('\\', '\\\\').encode('ASCII',
                                                'backslashreplace')
    # A call to str(obj) could raise anything at all.
    # We'll ignore these errors, and print something
    # useful instead, but also log the error.
    # We disable the pylint warning for the blank except.
    try:
        value = str(obj)
    except:
        logging.getLogger('oops_datedir_repo.serializer_rfc822').exception(
            'Error while getting a str '
            'representation of an object')
        # The placeholder must contain a %s for the type name; an empty
        # format string here would raise TypeError.
        value = '<unprintable %s object>' % (
            str(type(obj).__name__))
    # Some str() calls return unicode objects.
    if isinstance(value, unicode):
        return _safestr(value)
    # encode non-ASCII characters
    value = value.replace('\\', '\\\\')
    value = re.sub(r'[\x80-\xff]',
                   lambda match: '\\x%02x' % ord(match.group(0)), value)
    return value


def to_chunks(report):
    """Returns a list of bytestrings making up the serialized oops."""
    chunks = []
    def header(label, key, optional=True):
        if optional and key not in report:
            return
        value = _safestr(report[key])
        value = _normalise_whitespace(value)
        chunks.append('%s: %s\n' % (label, value))
    header('Oops-Id', 'id', optional=False)
    header('Exception-Type', 'type')
    header('Exception-Value', 'value')
    if 'time' in report:
        chunks.append('Date: %s\n' % report['time'].isoformat())
    header('Page-Id', 'topic')
    header('Branch', 'branch_nick')
    header('Revision', 'revno')
    header('User', 'username')
    header('URL', 'url')
    header('Duration', 'duration')
    header('Informational', 'informational')
    header('Oops-Reporter', 'reporter')
    chunks.append('\n')
    safe_chars = ';/\\?:@&+$, ()*!'
    if 'req_vars' in report:
        try:
            items = sorted(report['req_vars'].items())
        except AttributeError:
            items = report['req_vars']
        for key, value in items:
            chunks.append('%s=%s\n' % (
                urllib.quote(_safestr(key), safe_chars),
                urllib.quote(_safestr(value), safe_chars)))
        chunks.append('\n')
    if 'timeline' in report:
        for row in report['timeline']:
            (start, end, category, statement) = row[:4]
            chunks.append('%05d-%05d@%s %s\n' % (
                start, end, _safestr(category),
                _safestr(_normalise_whitespace(statement))))
        chunks.append('\n')
    if 'tb_text' in report:
        chunks.append(_safestr(report['tb_text']))
    return chunks


def write(report, output):
    """Write a report to a file."""
    output.writelines(to_chunks(report))


# ---- oops_datedir_repo-0.0.17/oops_datedir_repo/serializer_bson.py ----
# Copyright (c) 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public
# License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see .
# GNU Lesser General Public License version 3 (see the file LICENSE).

"""Read / Write an OOPS dict as a bson dict.

This style of OOPS format is very extensible and maintains compatibility
with older rfc822 oops code: the previously mandatory keys are populated on
read.

Use of bson serializing is recommended.

The reports this serializer handles always have the following variables (See
the python-oops api docs for more information about these variables):

* id: The name of this error report.
* type: The type of the exception that occurred.
* value: The value of the exception that occurred.
* time: The time at which the exception occurred.
* reporter: The reporting program.
* topic: The identifier for the template/script that oopsed.
* branch_nick: The branch nickname.
* revno: The revision number of the branch.
* tb_text: A text version of the traceback.
* username: The user associated with the request.
* url: The URL for the failed request.
* req_vars: The request variables. Either a list of 2-tuples or a dict.
* branch_nick: A name for the branch of code that was running when the report
  was triggered.
* revno: The revision that the branch was at.
"""


__all__ = [
    'dumps',
    'read',
    'write',
    ]

__metaclass__ = type

import anybson as bson


def read(fp):
    """Deserialize an OOPS from a bson message."""
    report = bson.loads(fp.read())
    # Populate the keys that older rfc822-based code treats as mandatory.
    for key in (
            'branch_nick', 'revno', 'type', 'value', 'time', 'topic',
            'username', 'url'):
        report.setdefault(key, None)
    report.setdefault('duration', -1)
    report.setdefault('req_vars', {})
    report.setdefault('tb_text', '')
    report.setdefault('timeline', [])
    return report


def dumps(report):
    """Return a binary string representing report."""
    return bson.dumps(report)


def write(report, fp):
    """Write report to fp."""
    return fp.write(dumps(report))


# ---- oops_datedir_repo-0.0.17/oops_datedir_repo/bsondump.py ----
#
# Copyright (c) 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see .
# GNU Lesser General Public License version 3 (see the file LICENSE).

"""Print a BSON document for easier human inspection.

This can be used for oopses, which are commonly (though not necessarily)
stored as BSON.

usage: bsondump FILE
"""

from pprint import pprint
import sys

import anybson as bson


def main(argv=None):
    if argv is None:
        argv = sys.argv
    if len(argv) != 2:
        print __doc__
        sys.exit(1)
    # I'd like to use json here, but not everything serializable in bson is
    # easily representable in json - even before getting in to the weird parts,
    # oopses commonly have datetime objects.
    # -- mbp 2011-12-20
    pprint(bson.loads(file(argv[1]).read()))


# ---- oops_datedir_repo-0.0.17/oops_datedir_repo/prune.py ----
#
# Copyright (c) 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see .
# GNU Lesser General Public License version 3 (see the file LICENSE).

"""Delete OOPSes that are not referenced in the bugtracker.

Currently only has support for the Launchpad bug tracker.
"""

__metaclass__ = type

import datetime
import logging
import optparse
from textwrap import dedent
import sys

from launchpadlib.launchpad import Launchpad
from launchpadlib.uris import lookup_service_root
from pytz import utc

import oops_datedir_repo

__all__ = [
    'main',
    ]


class LaunchpadTracker:
    """Abstracted bug tracker/forums etc - permits testing of main()."""

    def __init__(self, options):
        self.lp = Launchpad.login_anonymously(
            'oops-prune', options.lpinstance, version='devel')

    def find_oops_references(self, start_time, end_time, project=None,
                             projectgroup=None):
        projects = set([])
        if project is not None:
            projects.add(project)
        if projectgroup is not None:
            [projects.add(lp_proj.name) for lp_proj in
                self.lp.project_groups[projectgroup].projects]
        result = set()
        lp_projects = self.lp.projects
        one_week = datetime.timedelta(weeks=1)
        for project in projects:
            lp_project = lp_projects[project]
            # Query in windows of at most one week, clamped to end_time.
            current_start = start_time
            while current_start < end_time:
                current_end = current_start + one_week
                if current_end > end_time:
                    current_end = end_time
                logging.info(
                    "Querying OOPS references on %s from %s to %s",
                    project, current_start, current_end)
                result.update(lp_project.findReferencedOOPS(
                    start_date=current_start, end_date=current_end))
                current_start = current_end
        return result


def main(argv=None, tracker=LaunchpadTracker, logging=logging):
    """Console script entry point."""
    if argv is None:
        argv = sys.argv
    usage = dedent("""\
        %prog [options]

        The following options must be supplied:
         --repo

         And either
         --project
         or
         --projectgroup

        e.g.
        %prog --repo . --projectgroup launchpad-project

        Will process every member project of launchpad-project.

        When run this program will ask Launchpad for OOPS references made since
        the last date it pruned up to, with an upper limit of one week from
        today. It then looks in the repository for all oopses created during
        that date range, and if they are not in the set returned by Launchpad,
        deletes them. If the repository has never been pruned before, it will
        pick the earliest datedir present in the repository as the start date.
        """)
    description = \
        "Delete OOPS reports that are not referenced in a bug tracker."
    parser = optparse.OptionParser(
        description=description, usage=usage)
    parser.add_option('--project',
        help="Launchpad project to find references in.")
    parser.add_option('--projectgroup',
        help="Launchpad project group to find references in.")
    parser.add_option('--repo', help="Path to the repository to read from.")
    parser.add_option(
        '--lpinstance', help="Launchpad instance to use", default="production")
    options, args = parser.parse_args(argv[1:])
    def needed(*optnames):
        present = set()
        for optname in optnames:
            if getattr(options, optname, None) is not None:
                present.add(optname)
        if not present:
            if len(optnames) == 1:
                raise ValueError('Option "%s" must be supplied' % optname)
            else:
                raise ValueError(
                    'One of options %s must be supplied' % (optnames,))
        elif len(present) != 1:
            raise ValueError(
                'Only one of options %s can be supplied' % (optnames,))
    needed('repo')
    needed('project', 'projectgroup')
    logging.basicConfig(
        filename='prune.log', filemode='w', level=logging.DEBUG)
    repo = oops_datedir_repo.DateDirRepo(options.repo)
    one_week = datetime.timedelta(weeks=1)
    one_day = datetime.timedelta(days=1)
    # Only prune OOPS reports more than one week old.
    prune_until = datetime.datetime.now(utc) - one_week
    # Ignore OOPS reports we already found references for - older than the last
    # prune date.
    try:
        prune_from = repo.get_config('pruned-until')
    except KeyError:
        try:
            oldest_oops = repo.oldest_date()
        except ValueError:
            logging.info("No OOPSes in repo, nothing to do.")
            return 0
        midnight_utc = datetime.time(tzinfo=utc)
        prune_from = datetime.datetime.combine(oldest_oops, midnight_utc)
    # The tracker finds all the references for the selected dates.
    finder = tracker(options)
    references = finder.find_oops_references(
        prune_from, prune_until, options.project, options.projectgroup)
    # Then we can delete the unreferenced oopses.
    repo.prune_unreferenced(prune_from, prune_until, references)
    # And finally save the fact we have scanned up to the selected date.
    repo.set_config('pruned-until', prune_until)
    return 0


# ---- oops_datedir_repo-0.0.17/oops_datedir_repo/__init__.py ----
#
# Copyright (c) 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see .
# GNU Lesser General Public License version 3 (see the file LICENSE).

# same format as sys.version_info: "A tuple containing the five components of
# the version number: major, minor, micro, releaselevel, and serial. All
# values except releaselevel are integers; the release level is 'alpha',
# 'beta', 'candidate', or 'final'. The version_info value corresponding to the
# Python version 2.0 is (2, 0, 0, 'final', 0)." Additionally we use a
# releaselevel of 'dev' for unreleased under-development code.
#
# If the releaselevel is 'alpha' then the major/minor/micro components are not
# established at this point, and setup.py will use a version of next-$(revno).
# If the releaselevel is 'final', then the tarball will be major.minor.micro.
# Otherwise it is major.minor.micro~$(revno).
__version__ = (0, 0, 17, 'beta', 0)

__all__ = [
    'DateDirRepo',
    ]

from oops_datedir_repo.repository import DateDirRepo
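
The version comment in `__init__.py` describes three naming rules keyed on the release level. As a minimal sketch of those rules only, the helper below maps a `__version__`-style tuple to a version string. The function name `version_string` and the `revno` argument are hypothetical and introduced here for illustration; the package's real setup.py is not shown in this section and may compute or hard-code the version differently.

```python
def version_string(version_info, revno):
    """Illustrative only: apply the naming rules from the __init__.py comment.

    `version_info` is a 5-tuple shaped like sys.version_info; `revno` is an
    assumed branch revision number supplied by the caller.
    """
    major, minor, micro, releaselevel, _serial = version_info
    base = '%d.%d.%d' % (major, minor, micro)
    if releaselevel == 'alpha':
        # Components are not yet established: use next-$(revno).
        return 'next-%s' % revno
    if releaselevel == 'final':
        # Released tarballs are plain major.minor.micro.
        return base
    # Any other level (beta, candidate, dev): major.minor.micro~$(revno).
    return '%s~%s' % (base, revno)
```

With the tuple declared above, `version_string((0, 0, 17, 'beta', 0), 42)` would follow the "otherwise" rule and include the revision number after a tilde.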