oops_datedir_repo-0.0.17/README
*************************************************************************
python-oops-datedir-repo: A simple disk repository for OOPS Error reports
*************************************************************************
Copyright (c) 2011, Canonical Ltd
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, version 3 only.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
GNU Lesser General Public License version 3 (see the file LICENSE).
This is a component of the python-oops project:
https://launchpad.net/python-oops. An OOPS report is a report
about something going wrong in a piece of software... thus, an 'oops' :)
This package provides disk storage, management, and a serialisation format for
OOPSes stored in the repository. Programs or services that generate OOPS
reports need this package, or a similar one, if they want to persist the
reports.
Dependencies
============
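* Python 2.6+
* The oops package (https://launchpad.net/python-oops or 'oops' on pypi).
* bson, iso8601 and pytz (see install_requires in setup.py).
* launchpadlib (setup.py notes it is needed for pruning).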
Testing Dependencies
====================
* fixtures (http://pypi.python.org/pypi/fixtures)
* subunit (http://pypi.python.org/pypi/python-subunit) (optional)
* testtools (http://pypi.python.org/pypi/testtools)
Usage
=====
oops_datedir_repo is an extension package for the oops package.
The DateDirRepo class provides an OOPS publisher (``DateDirRepo.publish``)
which will write OOPSes into the repository.
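When no instance_id is given, publishing boils down to: hash the report to get an id, write it into a per-day directory under a .tmp name, then rename it into place. The following is a minimal sketch of that flow, not the package's code: publish_sketch is a hypothetical name, json stands in for the default bson serializer, and the permission handling and UniqueFileAllocator naming of the real DateDirRepo.publish are left out.

```python
import datetime
import hashlib
import json
import os
import tempfile


def publish_sketch(root, report, now):
    """Hypothetical stand-in for DateDirRepo.publish with hash-based ids.

    json replaces the bson serializer used by the real class, and file
    permissions are not adjusted.
    """
    report = dict(report)  # don't modify the caller's report
    payload = json.dumps(report, sort_keys=True).encode('utf-8')
    oopsid = 'OOPS-%s' % hashlib.md5(payload).hexdigest()
    # One directory per day, e.g. <root>/2011-12-30.
    datedir = os.path.join(root, now.strftime('%Y-%m-%d'))
    if not os.path.isdir(datedir):
        os.makedirs(datedir)
    filename = os.path.join(datedir, oopsid)
    report['id'] = oopsid
    # Readers ignore *.tmp files, so write under a temporary name and
    # rename only once the file is complete and closed.
    with open(filename + '.tmp', 'w') as fp:
        json.dump(report, fp, sort_keys=True)
    os.rename(filename + '.tmp', filename)
    return oopsid, filename


root = tempfile.mkdtemp()
when = datetime.datetime(2011, 12, 30, 12, 0, 0)
oopsid, path = publish_sketch(root, {'type': 'ValueError'}, when)
print(os.listdir(os.path.join(root, '2011-12-30')) == [oopsid])  # True
```

Because the id in this sketch is derived purely from the report content, publishing an identical report twice yields the same id and file name.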
OOPSes can be read back with the low level serializer functions:
serializer.read() reads a report in either supported format (bson or
rfc822), while the serializer_bson and serializer_rfc822 modules each provide
a write() function and a matching read() function.
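serializer.read() works out the format by sniffing the content: empty input is an error, bz2 data (which starts with the bytes 'BZh') is decompressed first, then bson is tried with rfc822 as the fallback. A rough sketch of that dispatch, under stated assumptions: read_any and parse_headers are hypothetical helpers, json stands in for bson, and parse_headers only understands simple 'Key: value' header lines rather than full rfc822 messages.

```python
import bz2
import json


def parse_headers(text):
    """Hypothetical stand-in for serializer_rfc822.read: 'Key: value' lines."""
    report = {}
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        key, _, value = line.partition(':')
        report[key.strip().lower()] = value.strip()
    return report


def read_any(content):
    """Sniff the serialization format the way serializer.read does."""
    if len(content) == 0:
        raise IOError('Empty OOPS Report')
    # bz2 streams always begin with the magic bytes 'BZh'.
    if content[:3] == b'BZh':
        content = bz2.decompress(content)
    text = content.decode('utf-8')
    try:
        return json.loads(text)  # stand-in for serializer_bson.read
    except ValueError:
        return parse_headers(text)


rfc822_style = (b'Oops-Id: OOPS-1\nException-Type: ValueError\n\n'
                b'Traceback (most recent call last):\n')
print(read_any(rfc822_style)['oops-id'])                  # OOPS-1
print(read_any(bz2.compress(b'{"id": "OOPS-2"}'))['id'])  # OOPS-2
```

Trying the stricter format first and falling back is what makes the sniffing work; as serializer.py itself notes, using the specific serializer is cheaper and cannot confuse one format for the other.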
The uniquefileallocator module is used by the repository implementation and
provides a system for allocating file names on disk.
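To give a feel for the names UniqueFileAllocator hands out, here is a sketch of the scheme in its newId and getFilename methods: ids look like OOPS-<day number><infix><serial>, where the day number counts days since 2006-01-01 starting at 1, and files are named <zero padded second of the day>.<infix><serial> inside a YYYY-MM-DD directory. allocate_name is a hypothetical helper that takes the serial as an argument; the real class keeps the serial itself, guards it with a lock and creates the directory.

```python
import datetime
import os

# Day numbers count from the start of 2006, as in uniquefileallocator.py.
EPOCH = datetime.datetime(2006, 1, 1)


def allocate_name(output_root, now, serial, log_type='OOPS', infix='T'):
    """Hypothetical helper reproducing UniqueFileAllocator's naming scheme.

    now is a naive datetime assumed to be in UTC; serial is the per-day
    counter that the real class maintains under a lock.
    """
    day_number = (now - EPOCH).days + 1
    oops_id = '%s-%d%s%d' % (log_type, day_number, infix, serial)
    second_in_day = now.hour * 3600 + now.minute * 60 + now.second
    # Zero padding keeps a lexical sort of the names roughly in time order.
    basename = '%05d.%s%s' % (second_in_day, infix, serial)
    filename = os.path.join(output_root, now.strftime('%Y-%m-%d'), basename)
    return oops_id, filename


oops_id, filename = allocate_name(
    '/var/oops', datetime.datetime(2011, 12, 30, 12, 0, 0), 1)
print(oops_id)   # OOPS-2190T1
print(filename)  # /var/oops/2011-12-30/43200.T1 (on POSIX)
```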
Typical usage::

  >>> import fixtures
  >>> import oops
  >>> import oops_datedir_repo
  >>> config = oops.Config()
  >>> with fixtures.TempDir() as tempdir:
  ...     repo = oops_datedir_repo.DateDirRepo(tempdir.path, 'servername')
  ...     config.publishers.append(repo.publish)
  ...     ids = config.publish({'oops': '!!!'})

For more information, see the oops package documentation or the API docs.
Installation
============
Either run setup.py in an environment with all the dependencies available, or
add the working directory to your PYTHONPATH.
Development
===========
Upstream development takes place at https://launchpad.net/python-oops-datedir-repo.
To set up a working area for development when the dependencies are not
immediately available, run ./bootstrap.py to create bin/buildout, then run
bin/buildout to create bin/py, a Python interpreter with the dependencies
available.
To run the tests, use the runner of your choice; the test suite is
oops_datedir_repo.tests.test_suite. For instance::

  $ bin/py -m testtools.run oops_datedir_repo.tests.test_suite

If you have testrepository you can run the tests with that::

  $ testr run
oops_datedir_repo-0.0.17/PKG-INFO
Metadata-Version: 1.1
Name: oops_datedir_repo
Version: 0.0.17
Summary: OOPS disk serialisation and repository management.
Home-page: https://launchpad.net/python-oops-datedir-repo
Author: Launchpad Developers
Author-email: launchpad-dev@lists.launchpad.net
License: UNKNOWN
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
oops_datedir_repo-0.0.17/setup.py
#!/usr/bin/env python
#
# Copyright (c) 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).
try:
    from setuptools import setup
except ImportError:
    # install_requires, extras_require and entry_points below are setuptools
    # options; plain distutils ignores them with a warning.
    from distutils.core import setup
import os.path
with open(os.path.join(os.path.dirname(__file__), 'README'), 'rb') as readme:
    description = readme.read()
setup(name="oops_datedir_repo",
      version="0.0.17",
      description="OOPS disk serialisation and repository management.",
      long_description=description,
      maintainer="Launchpad Developers",
      maintainer_email="launchpad-dev@lists.launchpad.net",
      url="https://launchpad.net/python-oops-datedir-repo",
      packages=['oops_datedir_repo'],
      package_dir = {'':'.'},
      classifiers = [
          'Development Status :: 2 - Pre-Alpha',
          'Intended Audience :: Developers',
          'License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)',
          'Operating System :: OS Independent',
          'Programming Language :: Python',
          ],
      install_requires = [
          'bson',
          'iso8601',
          'launchpadlib', # Needed for pruning - perhaps should be optional.
          'oops',
          'pytz',
          ],
      extras_require = dict(
          test=[
              'fixtures',
              'testtools',
              ]
          ),
      entry_points=dict(
          console_scripts=[ # `console_scripts` is a magic name to setuptools
              'bsondump = oops_datedir_repo.bsondump:main',
              'prune = oops_datedir_repo.prune:main',
              ]),
      )
oops_datedir_repo-0.0.17/oops_datedir_repo/repository.py
#
# Copyright (c) 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).
"""The primary interface to oopses stored on disk - the DateDirRepo."""
__metaclass__ = type
__all__ = [
'DateDirRepo',
]
import datetime
import errno
from functools import partial
from hashlib import md5
import os.path
import stat
from pytz import utc
import anybson as bson
import serializer
import serializer_bson
from uniquefileallocator import UniqueFileAllocator
class DateDirRepo:
    """Publish oopses to a date-dir repository.
    A date-dir repository is a directory containing:
    * Zero or one directories called 'metadata'. If it exists this directory
      contains any housekeeping material needed (such as the config.bson
      document maintained by set_config).
    * Zero or more directories named like YYYY-MM-DD, which contain zero or
      more OOPS reports. OOPS file names can take various forms, but must not
      end in .tmp - those are considered to be OOPS reports that are currently
      being written.
    """
    def __init__(self, error_dir, instance_id=None, serializer=None,
                 inherit_id=False, stash_path=False):
        """Create a DateDirRepo.
        :param error_dir: The base directory to write OOPSes into. OOPSes are
            written into a subdirectory named after the date (e.g.
            2011-12-30).
        :param instance_id: If None, OOPS file names are named after the OOPS
            id, which is generated by hashing the serialized OOPS (without the
            id field). Otherwise OOPS file names and ids are created by
            allocating file names through a UniqueFileAllocator.
            UniqueFileAllocator has significant performance and concurrency
            limits, so hash-based naming is recommended.
        :param serializer: If supplied, the module (e.g.
            oops_datedir_repo.serializer_rfc822) to use to serialize OOPSes.
            Defaults to using serializer_bson.
        :param inherit_id: If True, use the oops ID (if present) supplied in
            the report, rather than always assigning a new one.
        :param stash_path: If True, the filename that the OOPS was written to
            is stored in the OOPS report under the key 'datedir_repo_filepath'.
            It is only stored in the in-memory report passed to publish, not
            in the OOPS written to disk.
        """
        if instance_id is not None:
            self.log_namer = UniqueFileAllocator(
                output_root=error_dir,
                log_type="OOPS",
                log_subtype=instance_id,
                )
        else:
            self.log_namer = None
        self.root = error_dir
        if serializer is None:
            serializer = serializer_bson
        self.serializer = serializer
        self.inherit_id = inherit_id
        self.stash_path = stash_path
        self.metadatadir = os.path.join(self.root, 'metadata')
        self.config_path = os.path.join(self.metadatadir, 'config.bson')
    def publish(self, report, now=None):
        """Write the report to disk.
        The report is written to a temporary file, and then renamed to its
        final location. Programs concurrently reading from a DateDirRepo
        should ignore files ending in .tmp.
        :param now: The datetime to use as the current time. Will be
            determined if not supplied. Useful for testing.
        """
        # We set file permission to: rw-r--r-- (so that reports from
        # umask-restricted services can be gathered by a tool running as
        # another user).
        wanted_file_permission = (
            stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH)
        if now is not None:
            now = now.astimezone(utc)
        else:
            now = datetime.datetime.now(utc)
        # Don't mess with the original report when changing ids etc.
        original_report = report
        report = dict(report)
        if self.log_namer is not None:
            oopsid, filename = self.log_namer.newId(now)
        else:
            md5hash = md5(serializer_bson.dumps(report)).hexdigest()
            oopsid = 'OOPS-%s' % md5hash
            prefix = os.path.join(self.root, now.strftime('%Y-%m-%d'))
            try:
                os.makedirs(prefix)
            except OSError, e:
                # Another publisher may have created the directory already.
                if e.errno != errno.EEXIST:
                    raise
            else:
                # For directories we need to set the x bits too.
                os.chmod(
                    prefix, wanted_file_permission | stat.S_IXUSR |
                    stat.S_IXGRP | stat.S_IXOTH)
            filename = os.path.join(prefix, oopsid)
        if self.inherit_id:
            oopsid = report.get('id') or oopsid
        report['id'] = oopsid
        # Close (and so flush) the temporary file before renaming it into
        # place, so readers never see a partially written report.
        with open(filename + '.tmp', 'wb') as report_file:
            self.serializer.write(report, report_file)
        os.rename(filename + '.tmp', filename)
        if self.stash_path:
            original_report['datedir_repo_filepath'] = filename
        os.chmod(filename, wanted_file_permission)
        return report['id']
    def republish(self, publisher):
        """Republish the contents of the DateDirRepo to another publisher.
        This makes it easy to treat a DateDirRepo as a backing store in message
        queue environments: if the message queue is down, flush to the
        DateDirRepo, then later pick the OOPSes up and send them to the message
        queue environment.
        For instance:
        >>> repo = DateDirRepo('.')
        >>> repo.publish({'some':'report'})
        >>> queue = []
        >>> def queue_publisher(report):
        ...     queue.append(report)
        ...     return report['id']
        >>> repo.republish(queue_publisher)
        This scans the disk, sends the single report found to queue_publisher
        and then deletes the report from disk.
        Empty datedirs and .tmp files that are more than two days old are
        cleaned up automatically.
        If the publisher returns None (or any other false value), signalling
        that it did not publish the report, then the report is not deleted
        from disk.
        """
        two_days = datetime.timedelta(2)
        now = datetime.date.today()
        old = now - two_days
        for dirname, (y,m,d) in self._datedirs():
            date = datetime.date(y, m, d)
            prune = date < old
            dirpath = os.path.join(self.root, dirname)
            files = os.listdir(dirpath)
            if not files and prune:
                # Clean up the no longer needed directory.
                os.rmdir(dirpath)
            for candidate in map(partial(os.path.join, dirpath), files):
                if candidate.endswith('.tmp'):
                    if prune:
                        os.unlink(candidate)
                    continue
                with file(candidate, 'rb') as report_file:
                    report = serializer.read(report_file)
                oopsid = publisher(report)
                if oopsid:
                    os.unlink(candidate)
    def _datedirs(self):
        """Yield each subdir which looks like a datedir."""
        for dirname in os.listdir(self.root):
            try:
                y, m, d = dirname.split('-')
                y = int(y)
                m = int(m)
                d = int(d)
            except ValueError:
                # Not a datedir
                continue
            yield dirname, (y, m, d)
    def _read_config(self):
        """Return the current config document from disk."""
        try:
            with open(self.config_path, 'rb') as config_file:
                return bson.loads(config_file.read())
        except IOError, e:
            if e.errno != errno.ENOENT:
                raise
            return {}
    def get_config(self, key):
        """Return a key from the repository config.
        :param key: A key to read from the config.
        """
        return self._read_config()[key]
    def set_config(self, key, value):
        """Set config option key to value.
        This is written to the bson document root/metadata/config.bson
        :param key: The key to set - anything that can be a key in a bson
            document.
        :param value: The value to set - anything that can be a value in a
            bson document.
        """
        config = self._read_config()
        config[key] = value
        try:
            with open(self.config_path + '.tmp', 'wb') as config_file:
                config_file.write(bson.dumps(config))
        except IOError, e:
            if e.errno != errno.ENOENT:
                raise
            os.mkdir(self.metadatadir)
            with open(self.config_path + '.tmp', 'wb') as config_file:
                config_file.write(bson.dumps(config))
        os.rename(self.config_path + '.tmp', self.config_path)
    def oldest_date(self):
        """Return the date of the oldest datedir in the repository.
        If pruning / resubmission is working this should also be the date of
        the oldest oops in the repository.
        """
        dirs = list(self._datedirs())
        if not dirs:
            raise ValueError("No OOPSes in repository.")
        return datetime.date(*sorted(dirs)[0][1])
    def prune_unreferenced(self, start_time, stop_time, references):
        """Delete OOPS reports filed between start_time and stop_time.
        A report is deleted if all of the following are true:
        * It is in a datedir covered by [start_time, stop_time] inclusive of
          the end points.
        * Its id is not in references.
        * Its timestamp falls between start_time and stop_time inclusively. A
          report with no timestamp, or with a timestamp outside the datedir it
          is in, is treated as if it were filed at midnight (UTC) on the
          datedir's date.
        Any .tmp files in the covered datedirs are deleted as well.
        :param start_time: The lower bound to prune within.
        :param stop_time: The upper bound to prune within.
        :param references: A container (such as a set) of OOPS ids to keep; it
            is tested with ``in`` once per report.
        """
        start_date = start_time.date()
        stop_date = stop_time.date()
        midnight = datetime.time(tzinfo=utc)
        for dirname, (y,m,d) in self._datedirs():
            dirdate = datetime.date(y, m, d)
            if dirdate < start_date or dirdate > stop_date:
                continue
            dirpath = os.path.join(self.root, dirname)
            files = os.listdir(dirpath)
            deleted = 0
            for candidate in map(partial(os.path.join, dirpath), files):
                if candidate.endswith('.tmp'):
                    # Old half-written oops: just remove.
                    os.unlink(candidate)
                    deleted += 1
                    continue
                with file(candidate, 'rb') as report_file:
                    report = serializer.read(report_file)
                report_time = report.get('time', None)
                if (report_time is None or
                    report_time.date() < dirdate or
                    report_time.date() > dirdate):
                    # The report is oddly filed or missing a precise
                    # datestamp. Treat it like midnight on the day of the
                    # directory it was placed in - this is a lower bound on
                    # when it was actually created.
                    report_time = datetime.datetime.combine(
                        dirdate, midnight)
                if (report_time >= start_time and
                    report_time <= stop_time and
                    report['id'] not in references):
                    # Unreferenced and prunable
                    os.unlink(candidate)
                    deleted += 1
            if deleted == len(files):
                # Everything in the directory was deleted.
                os.rmdir(dirpath)
oops_datedir_repo-0.0.17/oops_datedir_repo/serializer.py
# Copyright (c) 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).
"""Read from any known serializer.
Where possible using the specific known serializer is better as it is more
efficient and won't suffer false positives if two serializations happen to pun
with each other (unlikely though that is).
Typical usage:
>>> fp = file('an-oops', 'rb')
>>> report = serializer.read(fp)
See the serializer_rfc822 and serializer_bson modules for information about
serializing OOPS reports by hand. Generally just using the DateDirRepo.publish
method is all that is needed.
"""
__all__ = [
'read',
]
import bz2
from StringIO import StringIO
from oops_datedir_repo import (
anybson as bson,
serializer_bson,
serializer_rfc822,
)
def read(fp):
    """Deserialize an OOPS from a bson or rfc822 message.
    The whole file is read regardless of the OOPS format. Content that is
    bz2 compressed is decompressed before being parsed.
    :raises IOError: If the file has no content.
    """
    # Read all the content up front, to deal with non-rewindable file
    # pointers.
    content = fp.read()
    if len(content) == 0:
        # This OOPS has no content
        raise IOError("Empty OOPS Report")
    if content[0:3] == "BZh":
        content = bz2.decompress(content)
    try:
        return serializer_bson.read(StringIO(content))
    except (KeyError, bson.InvalidBSON):
        return serializer_rfc822.read(StringIO(content))
oops_datedir_repo-0.0.17/oops_datedir_repo/anybson.py
# Copyright (c) 2012, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).
__all__ = [
    'dumps',
    'InvalidBSON',
    'loads',
    ]
try:
    from bson import dumps, loads
    # This version of bson never raises InvalidBSON; define a placeholder
    # so that callers can catch it whichever bson is installed.
    class InvalidBSON(Exception):
        pass
except ImportError:
    from bson import BSON, InvalidBSON
    def dumps(obj):
        return BSON.encode(obj)
    def loads(data):
        return BSON(data).decode(tz_aware=True)
oops_datedir_repo-0.0.17/oops_datedir_repo/uniquefileallocator.py
# Copyright (c) 2010, 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).
"""Create uniquely named log files on disk."""
__all__ = ['UniqueFileAllocator']
__metaclass__ = type
import datetime
import errno
import os.path
import stat
import threading
import pytz
UTC = pytz.utc
# The section of the ID before the instance identifier is a day number: one
# plus the number of days since the epoch, which is defined as the start of
# 2006.
epoch = datetime.datetime(2006, 1, 1, 0, 0, 0, tzinfo=UTC)
class UniqueFileAllocator:
    """Assign unique file names to logs being written from an app/script.
    UniqueFileAllocator causes logs written from one process to be uniquely
    named. It is not safe for use in multiple processes with the same output
    root - each process must have a unique output root.
    """
    def __init__(self, output_root, log_type, log_subtype):
        """Create a UniqueFileAllocator.
        :param output_root: The root directory that logs should be placed in.
        :param log_type: A string to use as a prefix in the ID assigned to new
            logs. For instance, "OOPS".
        :param log_subtype: A string to insert in the generated log ids
            (between the day number and the serial) and file names (between
            the second of the day and the serial). For instance "T" for
            "Testing".
        """
        self._lock = threading.Lock()
        self._output_root = output_root
        self._last_serial = 0
        self._last_output_dir = None
        self._log_type = log_type
        self._log_subtype = log_subtype
        self._log_token = ""
    def _findHighestSerialFilename(self, directory=None, time=None):
        """Find details of the last log present in the given directory.
        This function only considers logs with the currently
        configured log infix.
        One of directory, time must be supplied.
        :param directory: Look in this directory.
        :param time: Look in the directory that a log written at this time
            would have been written to. Only used if directory is not
            supplied.
        :return: a tuple (log_serial, log_filename), which will be (0,
            None) if no logs are found. log_filename is a usable path, not
            simply the basename.
        """
        if directory is None:
            directory = self.output_dir(time)
        prefix = self.get_log_infix()
        lastid = 0
        lastfilename = None
        for filename in os.listdir(directory):
            parts = filename.rsplit('.', 1)
            if len(parts) != 2:
                # Not named like '<second>.<infix><serial>'; ignore it.
                continue
            logid = parts[1]
            if not logid.startswith(prefix):
                continue
            logid = logid[len(prefix):]
            if logid.isdigit() and (lastid is None or int(logid) > lastid):
                lastid = int(logid)
                lastfilename = filename
        if lastfilename is not None:
            lastfilename = os.path.join(directory, lastfilename)
        return lastid, lastfilename
    def _findHighestSerial(self, directory):
        """Find the last serial actually applied to disk in directory.
        The purpose of this function is to not repeat sequence numbers
        if the logging application is restarted.
        This method is not thread safe, and only intended to be called
        from output_dir with the lock held (but it is called from other
        places in integration tests).
        """
        return self._findHighestSerialFilename(directory)[0]
    def getFilename(self, log_serial, time):
        """Get the filename for a given log serial and time."""
        log_subtype = self.get_log_infix()
        # TODO: Calling output_dir causes a global lock to be taken and a
        # directory scan, which is bad for performance. It would be better
        # to have a split out 'directory name for time' function which the
        # 'want to use this directory now' function can call.
        output_dir = self.output_dir(time)
        second_in_day = time.hour * 3600 + time.minute * 60 + time.second
        return os.path.join(
            output_dir, '%05d.%s%s' % (
                second_in_day, log_subtype, log_serial))
    def get_log_infix(self):
        """Return the current log infix to use in ids and file names."""
        return self._log_subtype + self._log_token
    def newId(self, now=None):
        """Return an (id, filename) pair for use by the caller.
        The ID is composed of the log type, the day number, the log infix
        (which identifies the instance) and a serial that is unique for the
        day.
        The filename is composed of the zero padded second in the day
        followed by the log infix and serial. This ensures that reports are
        in time order when sorted lexically.
        """
        if now is not None:
            now = now.astimezone(UTC)
        else:
            now = datetime.datetime.now(UTC)
        # We look up the error directory before allocating a new ID,
        # because if the day has changed, output_dir() will reset the ID
        # counter (to the highest serial already in the new directory,
        # normally zero).
        self.output_dir(now)
        self._lock.acquire()
        try:
            self._last_serial += 1
            newid = self._last_serial
        finally:
            self._lock.release()
        subtype = self.get_log_infix()
        day_number = (now - epoch).days + 1
        log_id = '%s-%d%s%d' % (self._log_type, day_number, subtype, newid)
        filename = self.getFilename(newid, now)
        return log_id, filename
    def output_dir(self, now=None):
        """Find or make the directory to allocate log names in.
        Log names are assigned within subdirectories named after the date
        the assignment happened.
        """
        if now is not None:
            now = now.astimezone(UTC)
        else:
            now = datetime.datetime.now(UTC)
        date = now.strftime('%Y-%m-%d')
        result = os.path.join(self._output_root, date)
        if result != self._last_output_dir:
            self._lock.acquire()
            try:
                self._last_output_dir = result
                # make sure the directory exists
                try:
                    os.makedirs(result)
                except OSError, e:
                    if e.errno != errno.EEXIST:
                        raise
                # Make sure the directory permission is set to: rwxr-xr-x
                permission = (
                    stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP |
                    stat.S_IROTH | stat.S_IXOTH)
                os.chmod(result, permission)
                # TODO: Note that only one process can do this safely: it's
                # not cross-process safe, and also not entirely threadsafe:
                # another thread that has a new log and hasn't written it
                # could then use that serial number. We should either make it
                # really safe, or remove the contention entirely and log
                # uniquely per thread of execution.
                self._last_serial = self._findHighestSerial(result)
            finally:
                self._lock.release()
        return result
    def listRecentReportFiles(self):
        """Yield (directory, name) pairs for today's and yesterday's files.
        Within each directory, names are yielded in reverse lexical order,
        which is roughly newest first for names allocated by this class.
        """
        now = datetime.datetime.now(UTC)
        yesterday = now - datetime.timedelta(days=1)
        directories = [self.output_dir(now), self.output_dir(yesterday)]
        for directory in directories:
            report_names = os.listdir(directory)
            for name in sorted(report_names, reverse=True):
                yield directory, name
    def setToken(self, token):
        """Append a string to the log subtype in filenames and log ids.
        :param token: a string to append.
        Scripts that run multiple processes can use this to create a
        unique identifier for each process.
        """
        self._log_token = token
oops_datedir_repo-0.0.17/oops_datedir_repo/serializer_rfc822.py
# Copyright (c) 2010, 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).
"""Read / Write an OOPS dict as an rfc822 formatted message.
This style of OOPS format is very web server specific and not extensible - it
should be considered deprecated.
The reports this serializer handles always have the following variables (see
the python-oops api docs for more information about these variables):
* id: The name of this error report.
* type: The type of the exception that occurred.
* value: The value of the exception that occurred.
* time: The time at which the exception occurred.
* reporter: The reporting program.
* topic: The identifier for the template/script that oopsed.
  [This is written as Page-Id for compatibility with as yet unported tools.]
* branch_nick: A name for the branch of code that was running when the report
  was triggered.
* revno: The revision that the branch was at.
* tb_text: A text version of the traceback.
* username: The user associated with the request.
* url: The URL for the failed request.
* req_vars: The request variables. Either a list of 2-tuples or a dict.
* informational: A flag, True if the error wasn't fatal - if it was
  'informational'.
  [Deprecated - this is no longer part of the oops report conventions.
  Existing reports with it set are still read, but the key is only present
  if it was truly in the report.]
"""
__all__ = [
'read',
'write',
]
__metaclass__ = type
import datetime
import logging
import rfc822
import re
import urllib
import iso8601
def read(fp):
    """Deserialize an OOPS from an RFC822 format message."""
    msg = rfc822.Message(fp)
    id = msg.getheader('oops-id')
    exc_type = msg.getheader('exception-type')
    exc_value = msg.getheader('exception-value')
    datestr = msg.getheader('date')
    if datestr is not None:
        date = iso8601.parse_date(datestr)
    else:
        date = None
    topic = msg.getheader('topic')
    if topic is None:
        topic = msg.getheader('page-id')
    username = msg.getheader('user')
    url = msg.getheader('url')
    try:
        duration = float(msg.getheader('duration', '-1'))
    except ValueError:
        duration = float(-1)
    informational = msg.getheader('informational')
    branch_nick = msg.getheader('branch')
    revno = msg.getheader('revision')
    reporter = msg.getheader('oops-reporter')
    # Explicitly use an iterator so we can process the file
    # sequentially. In most instances the iterator will actually
    # be the file object passed in because file objects should
    # support iteration.
    lines = iter(msg.fp)
    statement_pat = re.compile(r'^(\d+)-(\d+)(?:@([\w-]+))?\s+(.*)')
    def is_req_var(line):
        return "=" in line and not statement_pat.match(line)
    def is_traceback(line):
        return line.lower().startswith('traceback') or line.startswith(
            '== EXTRA DATA ==')
    req_vars = []
    statements = []
    first_tb_line = ''
    for line in lines:
        first_tb_line = line
        line = line.strip()
        if line == '':
            continue
        else:
            match = statement_pat.match(line)
            if match is not None:
                start, end, db_id, statement = match.groups()
                if db_id is not None:
                    db_id = intern(db_id)  # This string is repeated lots.
                statements.append(
                    [int(start), int(end), db_id, statement])
            elif is_req_var(line):
                key, value = line.split('=', 1)
                req_vars.append([urllib.unquote(key), urllib.unquote(value)])
            elif is_traceback(line):
                break
    req_vars = dict(req_vars)
    # The rest is traceback.
    tb_text = ''.join([first_tb_line] + list(lines))
    result = dict(id=id, type=exc_type, value=exc_value, time=date,
                  topic=topic, tb_text=tb_text, username=username, url=url,
                  duration=duration, req_vars=req_vars, timeline=statements,
                  branch_nick=branch_nick, revno=revno)
    if informational is not None:
        result['informational'] = informational
    if reporter is not None:
        result['reporter'] = reporter
    return result
def _normalise_whitespace(s):
    """Collapse each run of whitespace to one space and strip both ends."""
    if s is None:
        return None  # Passed through so the caller's str conversion gives 'None'.
    return ' '.join(s.split())
def _safestr(obj):
    """Return an ASCII-only str for obj, backslash-escaping non-ASCII data."""
    if isinstance(obj, unicode):
        return obj.replace('\\', '\\\\').encode('ASCII',
                                                'backslashreplace')
    # A call to str(obj) could raise anything at all.
    # We'll ignore these errors, and print something
    # useful instead, but also log the error.
    # The bare except below is deliberate, despite the pylint warning.
    try:
        value = str(obj)
    except:
        logging.getLogger('oops_datedir_repo.serializer_rfc822').exception(
            'Error while getting a str '
            'representation of an object')
        value = '<unprintable %s object>' % (
            str(type(obj).__name__))
    # Some str() calls return unicode objects.
    if isinstance(value, unicode):
        return _safestr(value)
    # Encode non-ASCII characters.
    value = value.replace('\\', '\\\\')
    value = re.sub(r'[\x80-\xff]',
                   lambda match: '\\x%02x' % ord(match.group(0)), value)
    return value
def to_chunks(report):
    """Returns a list of bytestrings making up the serialized oops."""
    chunks = []
    def header(label, key, optional=True):
        if optional and key not in report:
            return
        value = _safestr(report[key])
        value = _normalise_whitespace(value)
        chunks.append('%s: %s\n' % (label, value))
    header('Oops-Id', 'id', optional=False)
    header('Exception-Type', 'type')
    header('Exception-Value', 'value')
    if 'time' in report:
        chunks.append('Date: %s\n' % report['time'].isoformat())
    header('Page-Id', 'topic')
    header('Branch', 'branch_nick')
    header('Revision', 'revno')
    header('User', 'username')
    header('URL', 'url')
    header('Duration', 'duration')
    header('Informational', 'informational')
    header('Oops-Reporter', 'reporter')
    chunks.append('\n')
    safe_chars = ';/\\?:@&+$, ()*!'
    if 'req_vars' in report:
        try:
            items = sorted(report['req_vars'].items())
        except AttributeError:
            items = report['req_vars']
        for key, value in items:
            chunks.append('%s=%s\n' % (
                urllib.quote(_safestr(key), safe_chars),
                urllib.quote(_safestr(value), safe_chars)))
        chunks.append('\n')
    if 'timeline' in report:
        for row in report['timeline']:
            (start, end, category, statement) = row[:4]
            chunks.append('%05d-%05d@%s %s\n' % (
                start, end, _safestr(category),
                _safestr(_normalise_whitespace(statement))))
        chunks.append('\n')
    if 'tb_text' in report:
        chunks.append(_safestr(report['tb_text']))
    return chunks
def write(report, output):
    """Write a report to a file."""
    output.writelines(to_chunks(report))
oops_datedir_repo-0.0.17/oops_datedir_repo/serializer_bson.py
# Copyright (c) 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).
"""Read / Write an OOPS dict as a bson dict.
This style of OOPS format is very extensible and maintains compatibility with
older rfc822 oops code: the previously mandatory keys are populated on read.
Using the bson serializer is recommended.
The reports this serializer handles always have the following variables (see
the python-oops API docs for more information about these variables):
* id: The name of this error report.
* type: The type of the exception that occurred.
* value: The value of the exception that occurred.
* time: The time at which the exception occurred.
* reporter: The reporting program.
* topic: The identifier for the template/script that oopsed.
* branch_nick: A name for the branch of code that was running when the report
was triggered.
* revno: The revision that the branch was at.
* tb_text: A text version of the traceback.
* username: The user associated with the request.
* url: The URL for the failed request.
* req_vars: The request variables. Either a list of 2-tuples or a dict.
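Decoding itself is left to bson.loads(). The compatibility with rfc822
reports comes from the setdefault() calls in read(). The following is a
minimal sketch of just that defaulting step. It runs on a plain dict, so no
BSON library is needed. fill_defaults is a hypothetical name, not part of
this module:

```python
def fill_defaults(report):
    # Hypothetical helper mirroring read(): setdefault() only fills keys
    # that are missing, so values already present are left untouched.
    for key in ('branch_nick', 'revno', 'type', 'value', 'time', 'topic',
                'username', 'url'):
        report.setdefault(key, None)
    report.setdefault('duration', -1)
    # The {} and [] literals are evaluated on every call, so two reports
    # never share the same mutable default.
    report.setdefault('req_vars', {})
    report.setdefault('tb_text', '')
    report.setdefault('timeline', [])
    return report


report = fill_defaults({'id': 'OOPS-1', 'type': 'ValueError'})
print(report['type'])      # ValueError
print(report['duration'])  # -1
print(report['url'])       # None
```

Because setdefault() never overwrites, a key the report already carries
(type in the example) keeps its stored value.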
"""
__all__ = [
'dumps',
'read',
'write',
]
__metaclass__ = type
import anybson as bson
def read(fp):
    """Deserialize an OOPS from a bson message."""
    report = bson.loads(fp.read())
    for key in (
            'branch_nick', 'revno', 'type', 'value', 'time', 'topic',
            'username', 'url'):
        report.setdefault(key, None)
    report.setdefault('duration', -1)
    report.setdefault('req_vars', {})
    report.setdefault('tb_text', '')
    report.setdefault('timeline', [])
    return report
def dumps(report):
    """Return a binary string representing report."""
    return bson.dumps(report)
def write(report, fp):
    """Write report to fp."""
    return fp.write(dumps(report))
oops_datedir_repo-0.0.17/oops_datedir_repo/bsondump.py
#
# Copyright (c) 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).
"""Print a BSON document for easier human inspection.
This can be used for oopses, which are commonly (though not necessarily)
stored as BSON.
usage: bsondump FILE
"""
from pprint import pprint
import sys
import anybson as bson
def main(argv=None):
    if argv is None:
        argv = sys.argv
    if len(argv) != 2:
        print __doc__
        sys.exit(1)
    # I'd like to use json here, but not everything serializable in bson is
    # easily representable in json - even before getting in to the weird parts,
    # oopses commonly have datetime objects. -- mbp 2011-12-20
    # BSON is a binary format, so read the file in binary mode and close it.
    with open(argv[1], 'rb') as f:
        pprint(bson.loads(f.read()))
oops_datedir_repo-0.0.17/oops_datedir_repo/prune.py
#
# Copyright (c) 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).
"""Delete OOPSes that are not referenced in the bugtracker.
Currently only has support for the Launchpad bug tracker.
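LaunchpadTracker.find_oops_references() does not query Launchpad for the
whole date range at once. For each project, it walks the range in windows of
at most one week and clamps the last window to the end date. The following is
a minimal sketch of that windowing on its own. week_windows is a hypothetical
helper, not part of this module:

```python
import datetime


def week_windows(start_time, end_time):
    # Hypothetical helper: split [start_time, end_time) into consecutive
    # windows of at most one week, clamping the last one to end_time.
    one_week = datetime.timedelta(weeks=1)
    current_start = start_time
    while current_start < end_time:
        current_end = min(current_start + one_week, end_time)
        yield current_start, current_end
        current_start = current_end


windows = list(week_windows(datetime.datetime(2012, 1, 1),
                            datetime.datetime(2012, 1, 17)))
for start, end in windows:
    print('%s -> %s' % (start.date(), end.date()))
# 2012-01-01 -> 2012-01-08
# 2012-01-08 -> 2012-01-15
# 2012-01-15 -> 2012-01-17
```

Each window ends where the next one begins, so together the windows cover the
whole range without gaps.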
"""
__metaclass__ = type
import datetime
import logging
import optparse
from textwrap import dedent
import sys
from launchpadlib.launchpad import Launchpad
from launchpadlib.uris import lookup_service_root
from pytz import utc
import oops_datedir_repo
__all__ = [
'main',
]
class LaunchpadTracker:
    """Abstracted bug tracker (or forum, etc.) so main() can be tested."""
    def __init__(self, options):
        self.lp = Launchpad.login_anonymously(
            'oops-prune', options.lpinstance, version='devel')
    def find_oops_references(self, start_time, end_time, project=None,
                             projectgroup=None):
        projects = set()
        if project is not None:
            projects.add(project)
        if projectgroup is not None:
            projects.update(
                lp_proj.name
                for lp_proj in self.lp.project_groups[projectgroup].projects)
        result = set()
        lp_projects = self.lp.projects
        one_week = datetime.timedelta(weeks=1)
        for project in projects:
            lp_project = lp_projects[project]
            current_start = start_time
            while current_start < end_time:
                current_end = current_start + one_week
                if current_end > end_time:
                    current_end = end_time
                logging.info(
                    "Querying OOPS references on %s from %s to %s",
                    project, current_start, current_end)
                result.update(lp_project.findReferencedOOPS(
                    start_date=current_start, end_date=current_end))
                current_start = current_end
        return result
def main(argv=None, tracker=LaunchpadTracker, logging=logging):
    """Console script entry point."""
    if argv is None:
        argv = sys.argv
    usage = dedent("""\
        %prog [options]

        The following options must be supplied:
          --repo

        And either
          --project
        or
          --projectgroup

        e.g.
        %prog --repo . --projectgroup launchpad-project

        Will process every member project of launchpad-project.

        When run, this program will ask Launchpad for OOPS references made
        since the last date it pruned up to, with an upper limit of one week
        before today. It then looks in the repository for all oopses created
        during that date range, and if they are not in the set returned by
        Launchpad, deletes them. If the repository has never been pruned
        before, it will pick the earliest datedir present in the repository
        as the start date.
        """)
    description = \
        "Delete OOPS reports that are not referenced in a bug tracker."
    parser = optparse.OptionParser(
        description=description, usage=usage)
    parser.add_option(
        '--project', help="Launchpad project to find references in.")
    parser.add_option(
        '--projectgroup',
        help="Launchpad project group to find references in.")
    parser.add_option('--repo', help="Path to the repository to read from.")
    parser.add_option(
        '--lpinstance', help="Launchpad instance to use", default="production")
    options, args = parser.parse_args(argv[1:])
    def needed(*optnames):
        present = set()
        for optname in optnames:
            if getattr(options, optname, None) is not None:
                present.add(optname)
        if not present:
            if len(optnames) == 1:
                raise ValueError('Option "%s" must be supplied' % optname)
            else:
                raise ValueError(
                    'One of options %s must be supplied' % (optnames,))
        elif len(present) != 1:
            raise ValueError(
                'Only one of options %s can be supplied' % (optnames,))
    needed('repo')
    needed('project', 'projectgroup')
    logging.basicConfig(
        filename='prune.log', filemode='w', level=logging.DEBUG)
    repo = oops_datedir_repo.DateDirRepo(options.repo)
    one_week = datetime.timedelta(weeks=1)
    one_day = datetime.timedelta(days=1)
    # Only prune OOPS reports more than one week old.
    prune_until = datetime.datetime.now(utc) - one_week
    # Skip OOPS reports older than the last prune date: their references were
    # already checked on a previous run.
    try:
        prune_from = repo.get_config('pruned-until')
    except KeyError:
        try:
            oldest_oops = repo.oldest_date()
        except ValueError:
            logging.info("No OOPSes in repo, nothing to do.")
            return 0
        midnight_utc = datetime.time(tzinfo=utc)
        prune_from = datetime.datetime.combine(oldest_oops, midnight_utc)
    # The tracker finds all the references for the selected dates.
    finder = tracker(options)
    references = finder.find_oops_references(
        prune_from, prune_until, options.project, options.projectgroup)
    # Then we can delete the unreferenced oopses.
    repo.prune_unreferenced(prune_from, prune_until, references)
    # And finally save the fact we have scanned up to the selected date.
    repo.set_config('pruned-until', prune_until)
    return 0
oops_datedir_repo-0.0.17/oops_datedir_repo/__init__.py
#
# Copyright (c) 2011, Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, version 3 only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# GNU Lesser General Public License version 3 (see the file LICENSE).
# same format as sys.version_info: "A tuple containing the five components of
# the version number: major, minor, micro, releaselevel, and serial. All
# values except releaselevel are integers; the release level is 'alpha',
# 'beta', 'candidate', or 'final'. The version_info value corresponding to the
# Python version 2.0 is (2, 0, 0, 'final', 0)." Additionally we use a
# releaselevel of 'dev' for unreleased under-development code.
#
# If the releaselevel is 'alpha' then the major/minor/micro components are not
# established at this point, and setup.py will use a version of next-$(revno).
# If the releaselevel is 'final', then the tarball will be major.minor.micro.
# Otherwise it is major.minor.micro~$(revno).
__version__ = (0, 0, 17, 'beta', 0)
__all__ = [
'DateDirRepo',
]
from oops_datedir_repo.repository import DateDirRepo