pax_global_header00006660000000000000000000000064122514464310014514gustar00rootroot0000000000000052 comment=9c59179234f1cc5129f2a8acbade9aee2486ecf5 logster-0.0.1/000077500000000000000000000000001225144643100131715ustar00rootroot00000000000000logster-0.0.1/.gitignore000066400000000000000000000000461225144643100151610ustar00rootroot00000000000000*.pyc *.log *.state *.egg-info *.venv logster-0.0.1/.travis.yml000066400000000000000000000003501225144643100153000ustar00rootroot00000000000000language: python python: - "2.5" - "2.6" - "2.7" - "3.1" - "3.2" # command to install dependencies install: "pip install nose --use-mirrors" script: PYTHONPATH=`pwd`/parsers nosetests tests/ branches: only: - master logster-0.0.1/Makefile000066400000000000000000000003661225144643100146360ustar00rootroot00000000000000install: /bin/mkdir -p /usr/share/logster /bin/mkdir -p /var/log/logster /usr/bin/install -m 0755 -t /usr/sbin logster /usr/bin/install -m 0644 -t /usr/share/logster logster_helper.py /usr/bin/install -m 0644 -t /usr/share/logster parsers/* logster-0.0.1/README.md000066400000000000000000000117001225144643100144470ustar00rootroot00000000000000# Logster - generate metrics from logfiles [![Build Status](https://secure.travis-ci.org/etsy/logster.png)](http://travis-ci.org/etsy/logster) Logster is a utility for reading log files and generating metrics in Graphite or Ganglia. It is ideal for visualizing trends of events that are occurring in your application/system/error logs. For example, you might use logster to graph the number of occurrences of HTTP response code that appears in your web server logs. Logster maintains a cursor, via logtail, on each log file that it reads so that each successive execution only inspects new log entries. In other words, a 1 minute crontab entry for logster would allow you to generate near real-time trends in Graphite or Ganglia for anything you want to measure from your logs. This tool is made up of a framework script, logster, and parsing scripts that are written to accommodate your specific log format. Two sample parsers are included in this distribution. The parser scripts essentially read a log file line by line, apply a regular expression to extract useful data from the lines you are interested in, and then aggregate that data into metrics that will be submitted to either Ganglia or Graphite. Take a look through the sample parsers, which should give you some idea of how to get started writing your own. ## History The logster project was created at Etsy as a fork of ganglia-logtailer (https://bitbucket.org/maplebed/ganglia-logtailer). We made the decision to fork ganglia-logtailer because we were removing daemon-mode from the original framework. We only make use of cron-mode, and supporting both cron- and daemon-modes makes for more work when creating parsing scripts. We care strongly about simplicity in writing parsing scripts -- which enables more of our engineers to write log parsers quickly. ## Installation Logster depends on the "logtail" utility that can be obtained from the logcheck package, either from a Debian package manager or from source: http://packages.debian.org/source/sid/logcheck RPMs for logcheck can be found here: http://rpmfind.net/linux/rpm2html/search.php?query=logcheck Once you have logtail installed via the logcheck package, you make want to look over the actual logster script itself to adjust any paths necessary. Then the only other thing you need to do is run the installation commands in the Makefile: $ sudo make install ## Usage You can test logster from the command line. There are two sample parsers: SampleLogster, which generates stats from an Apache access log; and Log4jLogster, which generates stats from a log4j log. The --dry-run option will allow you to see the metrics being generated on stdout rather than sending them to either Ganglia or Graphite. $ sudo /usr/sbin/logster --dry-run --output=ganglia SampleLogster /var/log/httpd/access_log $ sudo /usr/sbin/logster --dry-run --output=graphite --graphite-host=graphite.example.com:2003 SampleLogster /var/log/httpd/access_log Additional usage details can be found with the -h option: $ ./logster -h usage: logster [options] parser logfile Tail a log file and filter each line to generate metrics that can be sent to common monitoring packages. Options: -h, --help show this help message and exit -p METRIC_PREFIX, --metric-prefix=METRIC_PREFIX Add prefix to all published metrics. This is for people that may multiple instances of same service on same host. -x METRIC_SUFFIX, --metric-suffix=METRIC_PREFIX Add suffix to all published metrics. This is for people that may multiple instances of same service on same host. --parser-help Print usage and options for the selected parser --parser-options=PARSER_OPTIONS Options to pass to the logster parser such as "-o VALUE --option2 VALUE". These are parser-specific and passed directly to the parser. --gmetric-options=GMETRIC_OPTIONS Options to pass to gmetric such as "-d 180 -c /etc/ganglia/gmond.conf" (default). These are passed directly to gmetric. --graphite-host=GRAPHITE_HOST Hostname and port for Graphite collector, e.g. graphite.example.com:2003 -s STATE_DIR, --state-dir=STATE_DIR Where to store the logtail state file. Default location /var/run -o OUTPUT, --output=OUTPUT Where to send metrics (can specify multiple times). Choices are 'graphite', 'ganglia', or 'stdout'. -d, --dry-run Parse the log file but send stats to standard output. -D, --debug Provide more verbose logging for debugging. logster-0.0.1/bin/000077500000000000000000000000001225144643100137415ustar00rootroot00000000000000logster-0.0.1/bin/logster000077500000000000000000000315341225144643100153540ustar00rootroot00000000000000#!/usr/bin/python -tt # -*- coding: utf-8 -*- ### ### logster ### ### Tails a log and applies a log parser (that knows what to do with specific) ### types of entries in the log, then reports metrics to Ganglia and/or Graphite. ### ### Usage: ### ### $ logster [options] parser logfile ### ### Help: ### ### $ logster -h ### ### ### Copyright 2011, Etsy, Inc. ### ### This file is part of Logster. ### ### Logster is free software: you can redistribute it and/or modify ### it under the terms of the GNU General Public License as published by ### the Free Software Foundation, either version 3 of the License, or ### (at your option) any later version. ### ### Logster is distributed in the hope that it will be useful, ### but WITHOUT ANY WARRANTY; without even the implied warranty of ### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ### GNU General Public License for more details. ### ### You should have received a copy of the GNU General Public License ### along with Logster. If not, see . ### ### Forked from the ganglia-logtailer project ### (http://bitbucket.org/maplebed/ganglia-logtailer): ### Copyright Linden Research, Inc. 2008 ### Released under the GPL v2 or later. ### For a full description of the license, please visit ### http://www.gnu.org/licenses/gpl.txt ### import os import sys import re import optparse import stat import logging.handlers import fcntl import socket import traceback from time import time from math import floor # Local dependencies from logster.logster_helper import LogsterParsingException, LockingError # Globals gmetric = "/usr/bin/gmetric" logtail = "/usr/sbin/logtail2" log_dir = "/var/log/logster" state_dir = "/var/run" script_start_time = time() # Command-line options and parsing. cmdline = optparse.OptionParser(usage="usage: %prog [options] parser logfile", description="Tail a log file and filter each line to generate metrics that can be sent to common monitoring packages.") cmdline.add_option('--logtail', action='store', default=logtail, help='Specify location of logtail. Default %s' % logtail) cmdline.add_option('--metric-prefix', '-p', action='store', help='Add prefix to all published metrics. This is for people that may multiple instances of same service on same host.', default='') cmdline.add_option('--metric-suffix', '-x', action='store', help='Add suffix to all published metrics. This is for people that may add suffix at the end of their metrics.', default=None) cmdline.add_option('--parser-help', action='store_true', help='Print usage and options for the selected parser') cmdline.add_option('--parser-options', action='store', help='Options to pass to the logster parser such as "-o VALUE --option2 VALUE". These are parser-specific and passed directly to the parser.') cmdline.add_option('--gmetric-options', action='store', help='Options to pass to gmetric such as "-d 180 -c /etc/ganglia/gmond.conf" (default). These are passed directly to gmetric.', default='-d 180 -c /etc/ganglia/gmond.conf') cmdline.add_option('--graphite-host', action='store', help='Hostname and port for Graphite collector, e.g. graphite.example.com:2003') cmdline.add_option('--state-dir', '-s', action='store', default=state_dir, help='Where to store the logtail state file. Default location %s' % state_dir) cmdline.add_option('--output', '-o', action='append', choices=('graphite', 'ganglia', 'stdout'), help="Where to send metrics (can specify multiple times). Choices are 'graphite', 'ganglia', or 'stdout'.") cmdline.add_option('--dry-run', '-d', action='store_true', default=False, help='Parse the log file but send stats to standard output.') cmdline.add_option('--debug', '-D', action='store_true', default=False, help='Provide more verbose logging for debugging.') options, arguments = cmdline.parse_args() if options.parser_help: options.parser_options = '-h' if (len(arguments) != 2): cmdline.print_help() cmdline.error("Supply at least two arguments: parser and logfile.") if not options.output: cmdline.print_help() cmdline.error("Supply where the data should be sent with -o (or --output).") if 'graphite' in options.output and not options.graphite_host: cmdline.print_help() cmdline.error("You must supply --graphite-host when using 'graphite' as an output type.") class_name = arguments[0] log_file = arguments[1] state_dir = options.state_dir logtail = options.logtail # Logging infrastructure for use throughout the script. # Uses appending log file, rotated at 100 MB, keeping 5. if (not os.path.isdir(log_dir)): os.mkdir(log_dir) logger = logging.getLogger('logster') formatter = logging.Formatter('%(asctime)s %(levelname)-8s %(message)s') hdlr = logging.handlers.RotatingFileHandler('%s/logster.log' % log_dir, 'a', 100 * 1024 * 1024, 5) hdlr.setFormatter(formatter) logger.addHandler(hdlr) logger.setLevel(logging.INFO) if (options.debug): logger.setLevel(logging.DEBUG) ## This provides a lineno() function to make it easy to grab the line ## number that we're on (for logging) ## Danny Yoo (dyoo@hkn.eecs.berkeley.edu) ## taken from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/145297 import inspect def lineno(): """Returns the current line number in our program.""" return inspect.currentframe().f_back.f_lineno def submit_stats(parser, duration, options): metrics = parser.get_state(duration) if 'ganglia' in options.output: submit_ganglia(metrics, options) if 'graphite' in options.output: submit_graphite(metrics, options) if 'stdout' in options.output: submit_stdout(metrics, options) def submit_stdout(metrics, options): for metric in metrics: if (options.metric_prefix != ""): metric.name = options.metric_prefix + "_" + metric.name if (options.metric_suffix is not None): metric.name = metric.name + "_" + options.metric_suffix print "%s %s" %(metric.name, metric.value) def submit_ganglia(metrics, options): for metric in metrics: if (options.metric_prefix != ""): metric.name = options.metric_prefix + "_" + metric.name if (options.metric_suffix is not None): metric.name = metric.name + "_" + options.metric_suffix gmetric_cmd = "%s %s --name %s --value %s --type %s --units \"%s\"" % ( gmetric, options.gmetric_options, metric.name, metric.value, metric.type, metric.units) logger.debug("Submitting Ganglia metric: %s" % gmetric_cmd) if (not options.dry_run): os.system("%s" % gmetric_cmd) else: print "%s" % gmetric_cmd def submit_graphite(metrics, options): if (re.match("^[\w\.\-]+\:\d+$", options.graphite_host) == None): raise Exception, "Invalid host:port found for Graphite: '%s'" % options.graphite_host if (not options.dry_run): host = options.graphite_host.split(':') s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.connect((host[0], int(host[1]))) for metric in metrics: if (options.metric_prefix != ""): metric.name = options.metric_prefix + "." + metric.name if (options.metric_suffix is not None): metric.name = metric.name + "." + options.metric_suffix metric_string = "%s %s %s" % (metric.name, metric.value, metric.timestamp) logger.debug("Submitting Graphite metric: %s" % metric_string) if (not options.dry_run): s.send("%s\n" % metric_string) else: print "%s %s" % (options.graphite_host, metric_string) if (not options.dry_run): s.close() def start_locking(lockfile_name): """ Acquire a lock via a provided lockfile filename. """ if os.path.exists(lockfile_name): raise LockingError("Lock file already exists.") f = open(lockfile_name, 'w') try: fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB) f.write("%s" % os.getpid()) except IOError: # Would be better to also check the pid in the lock file and remove the # lock file if that pid no longer exists in the process table. raise LockingError("Cannot acquire logster lock (%s)" % lockfile_name) logger.debug("Locking successful") return f def end_locking(lockfile_fd, lockfile_name): """ Release a lock via a provided file descriptor. """ try: fcntl.flock(lockfile_fd, fcntl.LOCK_UN | fcntl.LOCK_NB) except IOError, e: raise LockingError("Cannot release logster lock (%s)" % lockfile_name) try: os.unlink(lockfile_name) except OSError, e: raise LockingError("Cannot unlink %s" % lockfile_name) logger.debug("Unlocking successful") return def main(): dirsafe_logfile = log_file.replace('/','-') logtail_state_file = '%s/logtail-%s%s.state' % (state_dir, class_name, dirsafe_logfile) logtail_lock_file = '%s/logtail-%s%s.lock' % (state_dir, class_name, dirsafe_logfile) shell_tail = "%s -f %s -o %s" % (logtail, log_file, logtail_state_file) logger.info("Executing parser %s on logfile %s" % (class_name, log_file)) logger.debug("Using state file %s" % logtail_state_file) # Import and instantiate the class from the module passed in. Files and Class names must be the same. module = __import__('logster.parsers.' + class_name, fromlist=[class_name]) parser = getattr(module, class_name)(option_string=options.parser_options) # Check for lock file so we don't run multiple copies of the same parser # simultaneuosly. This will happen if the log parsing takes more time than # the cron period, which is likely on first run if the logfile is huge. try: lockfile = start_locking(logtail_lock_file) except LockingError, e: logger.warning("Failed to get lock. Is another instance of logster running?") sys.exit(1) # Get input to parse. try: # Read the age of the state file to see how long it's been since we last # ran. Replace the state file if it has gone missing. While we are her, # touch the state file to reset the time in case logtail doesn't # find any new lines (and thus won't update the statefile). try: state_file_age = os.stat(logtail_state_file)[stat.ST_MTIME] # Calculate now() - state file age to determine check duration. duration = floor(time()) - floor(state_file_age) logger.debug("Setting duration to %s seconds." % duration) except OSError, e: logger.info('Writing new state file and exiting. (Was either first run, or state file went missing.)') input = os.popen(shell_tail) retval = input.close() if (retval != 256): logger.warning('%s returned bad exit code %s' % (shell_tail, retval)) end_locking(lockfile, logtail_lock_file) sys.exit(0) # Open a pipe to read input from logtail. input = os.popen(shell_tail) except SystemExit, e: raise except Exception, e: # note - there is no exception when logtail doesn't exist. # I don't know when this exception will ever actually be triggered. print ("Failed to run %s to get log data (line %s): %s" % (shell_tail, lineno(), e)) end_locking(lockfile, logtail_lock_file) sys.exit(1) # Parse each line from input, then send all stats to their collectors. try: for line in input: try: parser.parse_line(line) except LogsterParsingException, e: # This should only catch recoverable exceptions (of which there # aren't any at the moment). logger.debug("Parsing exception caught at %s: %s" % (lineno(), e)) submit_stats(parser, duration, options) except Exception, e: print "Exception caught at %s: %s" % (lineno(), e) traceback.print_exc() end_locking(lockfile, logtail_lock_file) sys.exit(1) # Log the execution time exec_time = round(time() - script_start_time, 1) logger.info("Total execution time: %s seconds." % exec_time) # Set mtime and atime for the state file to the startup time of the script # so that the cron interval is not thrown off by parsing a large number of # log entries. os.utime(logtail_state_file, (floor(script_start_time), floor(script_start_time))) end_locking(lockfile, logtail_lock_file) # try and remove the lockfile one last time, but it's a valid state that it's already been removed. try: end_locking(lockfile, logtail_lock_file) except Exception, e: pass if __name__ == '__main__': main() logster-0.0.1/logster/000077500000000000000000000000001225144643100146505ustar00rootroot00000000000000logster-0.0.1/logster/__init__.py000066400000000000000000000000001225144643100167470ustar00rootroot00000000000000logster-0.0.1/logster/logster_helper.py000066400000000000000000000034621225144643100202450ustar00rootroot00000000000000#!/usr/bin/python ### ### Copyright 2011, Etsy, Inc. ### ### This file is part of Logster. ### ### Logster is free software: you can redistribute it and/or modify ### it under the terms of the GNU General Public License as published by ### the Free Software Foundation, either version 3 of the License, or ### (at your option) any later version. ### ### Logster is distributed in the hope that it will be useful, ### but WITHOUT ANY WARRANTY; without even the implied warranty of ### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ### GNU General Public License for more details. ### ### You should have received a copy of the GNU General Public License ### along with Logster. If not, see . ### from time import time class MetricObject(object): """General representation of a metric that can be used in many contexts""" def __init__(self, name, value, units='', type='float'): self.name = name self.value = value self.units = units self.type = type self.timestamp = int(time()) class LogsterParser(object): """Base class for logster parsers""" def parse_line(self, line): """Take a line and do any parsing we need to do. Required for parsers""" raise RuntimeError, "Implement me!" def get_state(self, duration): """Run any calculations needed and return list of metric objects""" raise RuntimeError, "Implement me!" class LogsterParsingException(Exception): """Raise this exception if the parse_line function wants to throw a 'recoverable' exception - i.e. you want parsing to continue but want to skip this line and log a failure.""" pass class LockingError(Exception): """ Exception raised for errors creating or destroying lockfiles. """ pass logster-0.0.1/logster/parsers/000077500000000000000000000000001225144643100163275ustar00rootroot00000000000000logster-0.0.1/logster/parsers/ErrorLogLogster.py000066400000000000000000000050741225144643100220020ustar00rootroot00000000000000### A logster parser file that can be used to count the number of different ### messages in an Apache error_log ### ### For example: ### sudo ./logster --dry-run --output=ganglia ErrorLogLogster /var/log/httpd/error_log ### ### import time import re from logster_helper import MetricObject, LogsterParser from logster_helper import LogsterParsingException class ErrorLogLogster(LogsterParser): def __init__(self, option_string=None): '''Initialize any data structures or variables needed for keeping track of the tasty bits we find in the log we are parsing.''' self.notice = 0 self.warn = 0 self.error = 0 self.crit = 0 self.other = 0 # Regular expression for matching lines we are interested in, and capturing # fields from the line self.reg = re.compile('^\[[^]]+\] \[(?P\w+)\] .*') def parse_line(self, line): '''This function should digest the contents of one line at a time, updating object's state variables. Takes a single argument, the line to be parsed.''' try: # Apply regular expression to each line and extract interesting bits. regMatch = self.reg.match(line) if regMatch: linebits = regMatch.groupdict() level = linebits['loglevel'] if (level == 'notice'): self.notice += 1 elif (level == 'warn'): self.warn += 1 elif (level == 'error'): self.error += 1 elif (level == 'crit'): self.crit += 1 else: self.other += 1 else: raise LogsterParsingException, "regmatch failed to match" except Exception, e: raise LogsterParsingException, "regmatch or contents failed with %s" % e def get_state(self, duration): '''Run any necessary calculations on the data collected from the logs and return a list of metric objects.''' self.duration = duration / 10 # Return a list of metrics objects return [ MetricObject("notice", (self.notice / self.duration), "Logs per 10 sec"), MetricObject("warn", (self.warn / self.duration), "Logs per 10 sec"), MetricObject("error", (self.error / self.duration), "Logs per 10 sec"), MetricObject("crit", (self.crit / self.duration), "Logs per 10 sec"), MetricObject("other", (self.other / self.duration), "Logs per 10 sec"), ] logster-0.0.1/logster/parsers/Log4jLogster.py000066400000000000000000000071671225144643100212330ustar00rootroot00000000000000### Author: Mike Babineau , EA2D ### ### A sample logster parser file that can be used to count the number ### of events for each log level in a log4j log. ### ### Example (note WARN,ERROR,FATAL is default): ### sudo ./logster --output=stdout Log4jLogster /var/log/example_app/app.log --parser-options '-l WARN,ERROR,FATAL' ### ### ### Logster copyright 2011, Etsy, Inc. ### ### This file is part of Logster. ### ### Logster is free software: you can redistribute it and/or modify ### it under the terms of the GNU General Public License as published by ### the Free Software Foundation, either version 3 of the License, or ### (at your option) any later version. ### ### Logster is distributed in the hope that it will be useful, ### but WITHOUT ANY WARRANTY; without even the implied warranty of ### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ### GNU General Public License for more details. ### ### You should have received a copy of the GNU General Public License ### along with Logster. If not, see . ### import time import re import optparse from logster.logster_helper import MetricObject, LogsterParser from logster.logster_helper import LogsterParsingException class Log4jLogster(LogsterParser): def __init__(self, option_string=None): '''Initialize any data structures or variables needed for keeping track of the tasty bits we find in the log we are parsing.''' if option_string: options = option_string.split(' ') else: options = [] optparser = optparse.OptionParser() optparser.add_option('--log-levels', '-l', dest='levels', default='WARN,ERROR,FATAL', help='Comma-separated list of log levels to track: (default: "WARN,ERROR,FATAL")') opts, args = optparser.parse_args(args=options) self.levels = opts.levels.split(',') for level in self.levels: # Track counts from 0 for each log level setattr(self, level, 0) # Regular expression for matching lines we are interested in, and capturing # fields from the line (in this case, a log level such as WARN, ERROR, or FATAL). self.reg = re.compile('[0-9-_:\.]+ (?P%s)' % ('|'.join(self.levels)) ) def parse_line(self, line): '''This function should digest the contents of one line at a time, updating object's state variables. Takes a single argument, the line to be parsed.''' try: # Apply regular expression to each line and extract interesting bits. regMatch = self.reg.match(line) if regMatch: linebits = regMatch.groupdict() log_level = linebits['log_level'] if log_level in self.levels: current_val = getattr(self, log_level) setattr(self, log_level, current_val+1) else: raise LogsterParsingException, "regmatch failed to match" except Exception, e: raise LogsterParsingException, "regmatch or contents failed with %s" % e def get_state(self, duration): '''Run any necessary calculations on the data collected from the logs and return a list of metric objects.''' self.duration = duration metrics = [MetricObject(level, (getattr(self, level) / self.duration)) for level in self.levels] return metricslogster-0.0.1/logster/parsers/MetricLogster.py000066400000000000000000000114041225144643100214640ustar00rootroot00000000000000### Author: Mark Crossfield , Mark Crossfield ### Rewritten and extended in collaboration with Jeff Blaine, who first contributed the MetricLogster. ### ### Collects arbitrary metric lines and spits out aggregated ### metric values (MetricObjects) based on the metric names ### found in the lines. Any conforming metric, one parser. Sweet. ### The logger indicates whether metric is a count or time by use of a marker. ### This is enough information to work out what to push to Graphite; ### - for counters the values are totalled ### - for times the median and 90th percentile (configurable) are computed ### ### Logs should contain lines such as below - these can be interleaved with other lines with no problems. ### ### ... METRIC_TIME metric=some.metric.time value=10ms ### ... METRIC_TIME metric=some.metric.time value=11ms ### ... METRIC_TIME metric=some.metric.time value=20ms ### ... METRIC_COUNT metric=some.metric.count value=1 ### ... METRIC_COUNT metric=some.metric.count value=2.2 ### ### Results: ### some.metric.count 3.2 ### some.metric.time.mean 13.6666666667 ### some.metric.time.median 11 ### some.metric.time.90th_percentile 18.2 ### ### If the metric is a time the parser will extract the unit from the fist line it encounters for each run. ### This means it is important for the logger to be consistent with its units. ### Note: units are irrelevant for Graphite, as it does not support them; this functionality is to cater for Ganglia. ### ### For example: ### sudo ./logster --output=stdout MetricLogster /var/log/example_app/app.log --parser-options '--percentiles 25,75,90' ### ### Based on SampleLogster which is Copyright 2011, Etsy, Inc. import re import optparse from . import stats_helper from logster.logster_helper import MetricObject, LogsterParser from logster.logster_helper import LogsterParsingException class MetricLogster(LogsterParser): def __init__(self, option_string=None): '''Initialize any data structures or variables needed for keeping track of the tasty bits we find in the log we are parsing.''' self.counts = {} self.times = {} if option_string: options = option_string.split(' ') else: options = [] optparser = optparse.OptionParser() optparser.add_option('--percentiles', '-p', dest='percentiles', default='90', help='Comma-separated list of integer percentiles to track: (default: "90")') opts, args = optparser.parse_args(args=options) self.percentiles = opts.percentiles.split(',') # General regular expressions, expecting the metric name to be included in the log file. self.count_reg = re.compile('.*METRIC_COUNT\smetric=(?P[^\s]+)\s+value=(?P[0-9.]+)[^0-9.].*') self.time_reg = re.compile('.*METRIC_TIME\smetric=(?P[^\s]+)\s+value=(?P[0-9.]+)\s*(?P[^\s$]*).*') def parse_line(self, line): '''This function should digest the contents of one line at a time, updating object's state variables. Takes a single argument, the line to be parsed.''' count_match = self.count_reg.match(line) if count_match: countbits = count_match.groupdict() count_name = countbits['count_name'] if not self.counts.has_key(count_name): self.counts[count_name] = 0.0 self.counts[count_name] += float(countbits['count_value']); time_match = self.time_reg.match(line) if time_match: time_name = time_match.groupdict()['time_name'] if not self.times.has_key(time_name): unit = time_match.groupdict()['time_unit'] self.times[time_name] = {'unit': unit, 'values': []}; self.times[time_name]['values'].append(float(time_match.groupdict()['time_value'])) def get_state(self, duration): '''Run any necessary calculations on the data collected from the logs and return a list of metric objects.''' metrics = [] if duration > 0: metrics += [MetricObject(counter, self.counts[counter]/duration) for counter in self.counts] for time_name in self.times: values = self.times[time_name]['values'] unit = self.times[time_name]['unit'] metrics.append(MetricObject(time_name+'.mean', stats_helper.find_mean(values), unit)) metrics.append(MetricObject(time_name+'.median', stats_helper.find_median(values), unit)) metrics += [MetricObject('%s.%sth_percentile' % (time_name,percentile), stats_helper.find_percentile(values,int(percentile)), unit) for percentile in self.percentiles] return metrics logster-0.0.1/logster/parsers/PostfixLogster.py000066400000000000000000000101471225144643100217000ustar00rootroot00000000000000### A logster parser file that can be used to count the number ### of sent/deferred/bounced emails from a Postfix log, along with ### some other associated statistics. ### ### For example: ### sudo ./logster --dry-run --output=ganglia PostfixParser /var/log/maillog ### ### ### Copyright 2011, Bronto Software, Inc. ### ### This parser is free software: you can redistribute it and/or modify ### it under the terms of the GNU General Public License as published by ### the Free Software Foundation, either version 3 of the License, or ### (at your option) any later version. ### ### This parser is distributed in the hope that it will be useful, ### but WITHOUT ANY WARRANTY; without even the implied warranty of ### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ### GNU General Public License for more details. ### import time import re from logster.logster_helper import MetricObject, LogsterParser from logster.logster_helper import LogsterParsingException class PostfixLogster(LogsterParser): def __init__(self, option_string=None): '''Initialize any data structures or variables needed for keeping track of the tasty bits we find in the log we are parsing.''' self.numSent = 0 self.numDeferred = 0 self.numBounced = 0 self.totalDelay = 0 self.numRbl = 0 # Regular expression for matching lines we are interested in, and capturing # fields from the line (in this case, http_status_code). self.reg = re.compile('.*delay=(?P[^,]+),.*status=(?P(sent|deferred|bounced))') def parse_line(self, line): '''This function should digest the contents of one line at a time, updating object's state variables. Takes a single argument, the line to be parsed.''' try: # Apply regular expression to each line and extract interesting bits. regMatch = self.reg.match(line) if regMatch: linebits = regMatch.groupdict() if (linebits['status'] == 'sent'): self.totalDelay += float(linebits['send_delay']) self.numSent += 1 elif (linebits['status'] == 'deferred'): self.numDeferred += 1 elif (linebits['status'] == 'bounced'): self.numBounced += 1 except Exception, e: raise LogsterParsingException, "regmatch or contents failed with %s" % e def get_state(self, duration): '''Run any necessary calculations on the data collected from the logs and return a list of metric objects.''' self.duration = duration totalTxns = self.numSent + self.numBounced + self.numDeferred pctDeferred = 0.0 pctSent = 0.0 pctBounced = 0.0 avgDelay = 0 mailTxnsSec = 0 mailSentSec = 0 #mind divide by zero situations if (totalTxns > 0): pctDeferred = (self.numDeferred / totalTxns) * 100 pctSent = (self.numSent / totalTxns) * 100 pctBounced = (self.numBounced / totalTxns ) * 100 if (self.numSent > 0): avgDelay = self.totalDelay / self.numSent if (self.duration > 0): mailTxnsSec = totalTxns / self.duration mailSentSec = self.numSent / self.duration # Return a list of metrics objects return [ MetricObject("numSent", self.numSent, "Total Sent"), MetricObject("pctSent", pctSent, "Percentage Sent"), MetricObject("numDeferred", self.numDeferred, "Total Deferred"), MetricObject("pctDeferred", pctDeferred, "Percentage Deferred"), MetricObject("numBounced", self.numBounced, "Total Bounced"), MetricObject("pctBounced", pctBounced, "Percentage Bounced"), MetricObject("mailTxnsSec", mailTxnsSec, "Transactions per sec"), MetricObject("mailSentSec", mailSentSec, "Sends per sec"), MetricObject("avgDelay", avgDelay, "Average Sending Delay"), ] logster-0.0.1/logster/parsers/SampleLogster.py000066400000000000000000000066351225144643100214740ustar00rootroot00000000000000### A sample logster parser file that can be used to count the number ### of response codes found in an Apache access log. ### ### For example: ### sudo ./logster --dry-run --output=ganglia SampleLogster /var/log/httpd/access_log ### ### ### Copyright 2011, Etsy, Inc. ### ### This file is part of Logster. ### ### Logster is free software: you can redistribute it and/or modify ### it under the terms of the GNU General Public License as published by ### the Free Software Foundation, either version 3 of the License, or ### (at your option) any later version. ### ### Logster is distributed in the hope that it will be useful, ### but WITHOUT ANY WARRANTY; without even the implied warranty of ### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ### GNU General Public License for more details. ### ### You should have received a copy of the GNU General Public License ### along with Logster. If not, see . ### import time import re from logster.logster_helper import MetricObject, LogsterParser from logster.logster_helper import LogsterParsingException class SampleLogster(LogsterParser): def __init__(self, option_string=None): '''Initialize any data structures or variables needed for keeping track of the tasty bits we find in the log we are parsing.''' self.http_1xx = 0 self.http_2xx = 0 self.http_3xx = 0 self.http_4xx = 0 self.http_5xx = 0 # Regular expression for matching lines we are interested in, and capturing # fields from the line (in this case, http_status_code). self.reg = re.compile('.*HTTP/1.\d\" (?P\d{3}) .*') def parse_line(self, line): '''This function should digest the contents of one line at a time, updating object's state variables. Takes a single argument, the line to be parsed.''' try: # Apply regular expression to each line and extract interesting bits. regMatch = self.reg.match(line) if regMatch: linebits = regMatch.groupdict() status = int(linebits['http_status_code']) if (status < 200): self.http_1xx += 1 elif (status < 300): self.http_2xx += 1 elif (status < 400): self.http_3xx += 1 elif (status < 500): self.http_4xx += 1 else: self.http_5xx += 1 else: raise LogsterParsingException, "regmatch failed to match" except Exception, e: raise LogsterParsingException, "regmatch or contents failed with %s" % e def get_state(self, duration): '''Run any necessary calculations on the data collected from the logs and return a list of metric objects.''' self.duration = duration # Return a list of metrics objects return [ MetricObject("http_1xx", (self.http_1xx / self.duration), "Responses per sec"), MetricObject("http_2xx", (self.http_2xx / self.duration), "Responses per sec"), MetricObject("http_3xx", (self.http_3xx / self.duration), "Responses per sec"), MetricObject("http_4xx", (self.http_4xx / self.duration), "Responses per sec"), MetricObject("http_5xx", (self.http_5xx / self.duration), "Responses per sec"), ] logster-0.0.1/logster/parsers/SquidLogster.py000066400000000000000000000104741225144643100213340ustar00rootroot00000000000000### A sample logster parser file that can be used to count the number ### of responses and object size in the squid access.log ### ### For example: ### sudo ./logster --dry-run --output=ganglia SquidLogster /var/log/squid/access.log ### ### ### Copyright 2011, Etsy, Inc. ### ### This file is part of Logster. ### ### Logster is free software: you can redistribute it and/or modify ### it under the terms of the GNU General Public License as published by ### the Free Software Foundation, either version 3 of the License, or ### (at your option) any later version. ### ### Logster is distributed in the hope that it will be useful, ### but WITHOUT ANY WARRANTY; without even the implied warranty of ### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ### GNU General Public License for more details. ### ### You should have received a copy of the GNU General Public License ### along with Logster. If not, see . ### import time import re from logster_helper import MetricObject, LogsterParser from logster_helper import LogsterParsingException class SquidLogster(LogsterParser): def __init__(self, option_string=None): '''Initialize any data structures or variables needed for keeping track of the tasty bits we find in the log we are parsing.''' self.size_transferred = 0 self.squid_codes = { 'TCP_MISS': 0, 'TCP_DENIED': 0, 'TCP_HIT': 0, 'TCP_MEM_HIT': 0, 'OTHER': 0, } self.http_1xx = 0 self.http_2xx = 0 self.http_3xx = 0 self.http_4xx = 0 self.http_5xx = 0 # Regular expression for matching lines we are interested in, and capturing # fields from the line (in this case, http_status_code, size and squid_code). self.reg = re.compile('^[0-9.]+ +(?P[0-9]+) .*(?P(TCP|UDP|NONE)_[A-Z_]+)/(?P\d{3}) .*') def parse_line(self, line): '''This function should digest the contents of one line at a time, updating object's state variables. Takes a single argument, the line to be parsed.''' try: # Apply regular expression to each line and extract interesting bits. regMatch = self.reg.match(line) if regMatch: linebits = regMatch.groupdict() status = int(linebits['http_status_code']) squid_code = linebits['squid_code'] size = int(linebits['size']) if (status < 200): self.http_1xx += 1 elif (status < 300): self.http_2xx += 1 elif (status < 400): self.http_3xx += 1 elif (status < 500): self.http_4xx += 1 else: self.http_5xx += 1 if self.squid_codes.has_key(squid_code): self.squid_codes[squid_code] += 1 else: self.squid_codes['OTHER'] += 1 self.size_transferred += size else: raise LogsterParsingException, "regmatch failed to match" except Exception, e: raise LogsterParsingException, "regmatch or contents failed with %s" % e def get_state(self, duration): '''Run any necessary calculations on the data collected from the logs and return a list of metric objects.''' self.duration = duration # Return a list of metrics objects return_array = [ MetricObject("http_1xx", (self.http_1xx / self.duration), "Responses per sec"), MetricObject("http_2xx", (self.http_2xx / self.duration), "Responses per sec"), MetricObject("http_3xx", (self.http_3xx / self.duration), "Responses per sec"), MetricObject("http_4xx", (self.http_4xx / self.duration), "Responses per sec"), MetricObject("http_5xx", (self.http_5xx / self.duration), "Responses per sec"), MetricObject("size", (self.size_transferred / self.duration), "Size per sec") ] for squid_code in self.squid_codes: return_array.append(MetricObject("squid_" + squid_code, (self.squid_codes[squid_code]/self.duration), "Squid code per sec")) return return_array logster-0.0.1/logster/parsers/__init__.py000066400000000000000000000000001225144643100204260ustar00rootroot00000000000000logster-0.0.1/logster/parsers/stats_helper.py000066400000000000000000000021031225144643100213720ustar00rootroot00000000000000### Author: Mark Crossfield , Mark Crossfield ### ### A helper to assist with the calculation of statistical functions. This has probably been done better elsewhere but I wanted an easy import. ### ### Percentiles are calculated with linear interpolation between points. def find_median(numbers): return find_percentile(numbers,50) def find_percentile(numbers,percentile): numbers.sort() if len(numbers) == 0: return None if len(numbers) == 1: return numbers[0]; elif (float(percentile) / float(100))*float(len(numbers)-1) %1 != 0: left_index = int(percentile * (len(numbers) - 1) / 100) number_one = numbers[left_index ] number_two = numbers[left_index + 1] return number_one + ( number_two - number_one) * (((float(percentile)/100)*(len(numbers)-1)%1)) else: return numbers[int(percentile*(len(numbers)-1)/100)] def find_mean(numbers): if len(numbers) == 0: return None else: return sum(numbers,0.0) / len(numbers) logster-0.0.1/setup.py000077500000000000000000000006431225144643100147110ustar00rootroot00000000000000#!/usr/bin/env python try: from setuptools import setup except ImportError: from distutils.core import setup setup( name='logster', version='0.0.1', description='Parse log files, generate metrics for Graphite and Ganglia', author='Etsy', url='https://github.com/etsy/logster', packages=['logster'], zip_safe=False, scripts=[ 'bin/logster' ], license='GPL3', ) logster-0.0.1/tests/000077500000000000000000000000001225144643100143335ustar00rootroot00000000000000logster-0.0.1/tests/test_stats_helper.py000066400000000000000000000052631225144643100204470ustar00rootroot00000000000000from logster.parsers import stats_helper import unittest class TestStatsHelper(unittest.TestCase): def test_median_of_1(self): self.assertEqual(stats_helper.find_median([0]), 0) self.assertEqual(stats_helper.find_median([1]), 1) self.assertEqual(stats_helper.find_median([1,2]), 1.5) self.assertEqual(stats_helper.find_median([1,2,3]), 2) self.assertEqual(stats_helper.find_median([1,-1]), 0) self.assertEqual(stats_helper.find_median([1,999999]), 500000) def test_median_floats(self): self.assertEqual(stats_helper.find_median([float(1.1),float(2.3),float(0.4)]), 1.1) def test_max_0(self): self.assertEqual(stats_helper.find_percentile([0],100), 0) def test_max_0_to_1(self): self.assertEqual(stats_helper.find_percentile([0,1],100), 1) def test_max_0_to_3(self): self.assertEqual(stats_helper.find_percentile([0,1,2,3],100), 3) def test_max_0_to_5(self): self.assertEqual(stats_helper.find_percentile([0,1,2,3,4,5],100), 5) def test_max_0_to_6(self): self.assertEqual(stats_helper.find_percentile([0,1,2,3,4,5,6],100), 6) def test_max_0_to_10(self): self.assertEqual(stats_helper.find_percentile([0,1,2,3,4,5,6,7,8,9,10],100), 10) def test_max_0_to_11(self): self.assertEqual(stats_helper.find_percentile([0,1,2,3,4,5,6,7,8,9,10,11],100), 11) def test_max_floats(self): self.assertEqual(stats_helper.find_percentile([0,0.1,1.5,100],100), 100) def test_10th_0_to_10(self): self.assertEqual(stats_helper.find_percentile([0,1,2,3,4,5,6,7,8,9,10],10), 1) def test_10th_1_to_3(self): self.assertEqual(stats_helper.find_percentile([1,2,3],10), 1.2) def test_12th_0_to_9(self): self.assertEqual(stats_helper.find_percentile([0,1,2,3,4,5,6,7,8,9],12), 1.08) def test_90th_0(self): self.assertEqual(stats_helper.find_percentile([0],90), 0) def test_90th_1(self): self.assertEqual(stats_helper.find_percentile([1],90), 1) def test_90th_1_2(self): self.assertEqual(stats_helper.find_percentile([1,2],90), 1.9) def test_90th_1_2_3(self): self.assertEqual(stats_helper.find_percentile([1,2,3],90), 2.8) def test_90th_1_minus1(self): self.assertEqual(stats_helper.find_percentile([1,-1],90), 0.8) def test_90th_1_to_10(self): self.assertEqual(stats_helper.find_percentile([1,2,3,4,5,6,7,8,9,10],90), 9.1) def test_90th_1_to_11(self): self.assertEqual(stats_helper.find_percentile([1,2,3,4,5,6,7,8,9,10,11],90), 10) def test_90th_1_to_15_noncontiguous(self): self.assertAlmostEqual(stats_helper.find_percentile([1,2,3,4,5,6,7,8,9,15],90), 9.6)