mon-1.2.0/doc/globals

config globals in %CF
    $ALERTDIR $AUTHFILE $AUTHTYPE $BASEDIR $CF $CFBASEDIR
    $CLIENT_TIMEOUT $DTLOGFILE $DTLOGGING $LOGDIR $MAXPROCS $MAX_KEEP
    $OCFILE $PIDFILE $RANDSTART $SCRIPTDIR $SERVPORT $SNMPPORT
    $STATEDIR $TRAPPORT $USERFILE

globals
    $RCSID $AUTHOR $OS $TRAP_PDU $SLEEPINT $STOPPED $STOPPED_TIME
    $TRAP_PRO_VERSION $PROT_VERSION $HOSTNAME $PWD %oncall $iovec
    $numclients $clientcount %clients @last_alerts @last_failures
    $procs $fdset_rbits $fdset_ebits %watch_disabled
    $i (debugging only) $tm $lasttm

alert flags
    $FL_MONITOR $FL_UPALERT $FL_TRAP $FL_TRAPTIMEOUT $FL_STARTUPALERT

trap types
    $TRAP_COLDSTART $TRAP_WARMSTART $TRAP_LINKDOWN $TRAP_LINKUP
    $TRAP_AUTHFAIL $TRAP_EGPNEIGHBORLOSS $TRAP_ENTERPRISE $TRAP_HEARTBEAT

opstatuses
    $STAT_FAIL $STAT_OK $STAT_COLDSTART $STAT_WARMSTART $STAT_LINKDOWN
    $STAT_UNKNOWN $STAT_TIMEOUT $STAT_UNTESTED $STAT_DEPEND $STAT_WARN
    %OPSTAT %MONITORHASH %ALERTHASH

mon-1.2.0/doc/README.hints

$Id: README.hints,v 1.1.1.1 2004/06/09 05:18:06 trockij Exp $

To be sure that mon works reliably, you may want to pay attention to the following hints:

-Keep all alert and mon directories on a local filesystem. When the daemon is run, be sure that PATH does not contain remote filesystems.

-Do your best to make the mon host maintain independence of all systems that it is monitoring. Configurations may vary as different services are being monitored. For example, if you need to monitor whether DNS is operational, don't depend on DNS being available in the monitor script. Use a local hosts table which contains all the hosts referred to in the configuration file.

-If you're monitoring a network resource, don't depend on using the network to deliver alerts. If you subscribe to a paging service, get "Quick Page" or "tpage", and hook a modem and phone line up to the host which runs the daemon.

-Be aware of dependencies on services so that you're not surprised when one component fails, and then see that three more things fail because of this. If you get burnt by this situation, learn from it and see what you can do to minimize the dependencies.

-Remember the power of "m4" if you want to do more complex things with the configuration file.

To be sure that mon works efficiently, respect these rules:

-Monitor programs should parallelize, like Dan Farmer's "fping" from the Satan package does. Instead of doing a bunch of fork(2)s to send out a lot of pings, it does it all from a single process, using nonblocking I/O.

-If you use hostnames in your hostgroups, consider keeping a local /etc/hosts file on the mon server. Monitors can generate lots of DNS traffic.

mon-1.2.0/doc/README.paging

$Id: README.paging,v 1.1.1.1 2004/06/09 05:18:06 trockij Exp $

SO YOU WANT MON TO PAGE YOU??
-----------------------------

SOFTWARE

It's not the job of mon to page you, but it *does* have the responsibility of triggering an alert program, which can page you. Paging via modem is probably the best way to handle notification, since phone systems usually fail much less than local (and wide) area networks. Think of it as "out-of-band" notification.

QuickPage

mon ships with a wrapper for QuickPage, which uses a modem to send an alphanumeric page via the IXO/TAP/SNPP (what is it today???) protocol. QuickPage is very simple to configure, supports groups, runs on a number of platforms, and is free. The latest version of QuickPage can be found at ftp.it.mtu.edu:/pub/QuickPage.

Tpage

Originally maintained by Tom Limoncelli (of INN FAQ fame), tpage was one of the earliest paging programs. It doesn't seem to have been well maintained recently, but it's still worth a look. It supports multiple users and an "on call" schedule (which mon can already do). It's mostly written in Perl. The last time I looked (Tue Sep 16 09:49:18 PDT 1997), I was not able to locate tpage-2.40.tar.gz :(

Have a look at the IXO FAQ for more information, supposedly available from ftp://ftp.airnote.net/pub/paging-info/ixo.faq

EMAIL PAGING

If your paging company allows you to send pages via electronic mail, you can use the "netpage.alert" script that comes with mon. It just calls sendmail and fires off email to one or more addresses with a specially formatted subject line that should give maximum information in your pager's tiny alpha LCD.

To format a page nicely on a tiny LCD, you may have to play with end of line characters. For example, pagenet pagers seem to ignore any EOL sequences other than just a plain \r

A reminder--you might not want to rely on the network to send you a message if the network is down :)

mon-1.2.0/doc/README.variables

$Id: README.variables,v 1.1.1.1 2004/06/09 05:18:06 trockij Exp $

THIS FILE IS PROBABLY OUTDATED

The following are global variables:

maxkeep
    The maximum lines of alert or failure history to retain.

For each service, the following variables are maintained by the server, and are available from the "status" command:

_failure_count
    The number of failures since the start of monitoring.

_start_of_monitor
    The time that the monitoring of this particular service started, as reported by the time(2) system call.

_alert_count
    The number of alerts triggered since _start_of_monitor (a time(2) value).

_last_failure
    The time(2) of the last failure that was detected.

_last_success
    The time(2) of the last success for this service.

_last_output
    The output of the last monitor command (not the current).

_current_output
    The output of the current monitor command.

_timer
    The number of seconds left before the monitor is invoked again.

_last_opstatus
    The _op_status before the dependency code set it to STAT_DEPEND

_op_status
    STAT_FAIL       the monitor returned a failure
    STAT_OK         the monitor returned a success
    STAT_COLDSTART  a coldstart trap was received
    STAT_WARMSTART  a warmstart trap was received
    STAT_LINKDOWN   a linkdown trap was received
    STAT_UNKNOWN    unknown (reserved for stupid things)
    STAT_TIMEOUT    a trap timeout occurred
    STAT_UNTESTED   this service has not yet been tested
    STAT_DEPEND     this service has been marked by the depend routines
    STAT_WARN       a warning state

mon-1.2.0/doc/monshow.1

.\" $Id: monshow.1,v 1.2 2005/04/17 07:42:27 trockij Exp $
.TH monshow 1 "$Date: 2005/04/17 07:42:27 $" Linux "monshow"
.SH NAME
monshow \- show operational status of mon server.
.SH SYNOPSIS
.B monshow
.RB [ \-\-help ]
.RB [ \-\-showall ]
.RB [ \-\-full ]
.RB [ \-\-disabled ]
.RB [ \-\-detail
.IR group,service ]
.RB [ \-\-view
.IR name ]
.RB [ \-\-auth ]
.RB [ \-\-login
.IR user ]
.RB [ \-\-old ]
.RB [ \-\-server
.IR hostname ]
.RB [ \-\-port
.IR portnum ]
.RB [ \-\-prot
.IR protocol ]
.RB [ \-\-rcfile
.IR file ]
.SH DESCRIPTION
.B monshow
shows the operational status of the
.B mon
server. Both command-line and CGI interfaces are available.
.SH OPTIONS
.TP
.B \-\-help
show help
.\"
.\"
.\"
.TP
.B \-\-showall
Do not read configuration file, and show operational status of all groups and services.
.\"
.\"
.\"
.TP
.B \-\-full
Instead of showing only failed services, show all services no matter the state.
.\"
.TP
.BI \-\-detail\ group,service
Display detailed information for
.I group
and
.IR service .
This includes description, detailed output of the monitor, dependency information, and more.
When invoked via CGI, append "detail=group,service" to get detail for a service.
.\"
.TP
.BI "\-\-view " name
Display a pre-configured view. When invoked via CGI, supply the argument "view=name" in the URL, or use this form: "http://monhost/monshow.cgi/name". For security reasons, leading forward slashes and embedded ".."s are removed from the view name.
.\"
.TP
.B \-\-auth
Authenticate client to the mon server.
.\"
.\"
.\"
.TP
.B \-\-disabled
Show disabled groups, services, and hosts. The default is to not show anything which is disabled, but this may be overridden by the config file.
.\"
.\"
.\"
.TP
.BI "\-\-server " hostname
Connect to the mon server on host
.IR hostname .
.I hostname
can be either the name of a host or an IP address. If this name is not supplied by this argument, then the environment variable
.I MONHOST
is used, if it exists. Otherwise,
.I monshow
will fail.
.\"
.\"
.\"
.TP
.BI \-\-login \ username
When authenticating, use
.IR username .
.\"
.\"
.\"
.TP
.BI \-\-port \ portnum
Connect to the server on
.BR portnum .
.\"
.\"
.\"
.TP
.BI \-\-prot \ protocol
Sets the protocol to
.IR protocol .
The protocol must match the format "1.2.3". If unset, the default supplied by the Mon::Client module is used. Do not use this parameter unless you really know what you are doing.
.\"
.\"
.\"
.TP
.B \-\-old
Use the old 0.37 protocol and port number (32777).
.\"
.\"
.\"
.TP
.BI \-\-rcfile \ file
Use configuration file
.I file
instead of ~/.monshowrc.
.SH CGI INVOCATION
If
.B monshow
is invoked with the "REQUEST_METHOD" environment variable set, then CGI invocation is assumed. In that case,
.B monshow
gathers variables and commands submitted via the POST method and QUERY_STRING. Command-line options are ignored for security reasons. All reports which are produced via the web interface have a text mode equivalent.
.SH VIEWS
A view is a pre-defined configuration supplied to
.BR monshow .
Views can be used to generate different reports of the status of certain services for different audiences. They are especially useful if you are monitoring hundreds of things with mon, and you need to see only a subset of the overall operational status. For example, the web server admins can see a report which has only the web server statuses, and the file server admins can have their own report which shows only the file servers. Users can customize their own views by editing their own configurations.

Views are stored as files in a system-wide directory, typically
.IR /etc/mon/monshow ,
where each file specifies one view. If this path is not suitable for any reason, it can be changed by modifying the
.B $VIEWPATH
variable in the
.B monshow
script.

When invoking
.B monshow
from the command line, the view to display is specified by the
.BI "\-\-view=" name
argument.

In the case of CGI invocation, views can be specified by appending either
.I "?view=name"
or
.I "/name"
to the URL. For example, the following are equivalent:

.I "http://monhost/monshow.cgi?view=test"
.br
.I "http://monhost/monshow.cgi/test"

If a view is not specified, then a default configuration will be loaded from
.I "$HOME/.monshowrc"
(command-line invocation) or
.I "cgi-path/.monshowrc"
(CGI invocation).
.SH VIEW CONFIGURATION FILE
The view file contains a list of which services to display, how to display them, and a number of other parameters. Blank lines and lines beginning with a # (pound) are ignored.
.TP
.BI "watch" " group"
Include the status of all the services for "group".
.\"
.TP
.BI "service" " group service"
Include the status of the service specified by
.I group
and
.IR service .
.\"
.P
If no
.B watch
or
.B service
configuration lines are present, then the status of all groups and services is displayed.
.\"
.TP
.B "set show-disabled"
This has the same effect as using the
.B \-\-disabled
option.
.\"
.TP
.BI "set host" " hostname"
Query the mon server
.IR hostname .
.\"
.TP
.BI "set port" " number"
The TCP port which the mon server is listening on.
.\"
.TP
.BI "set prot" " protocol"
Set the protocol. This probably should not be used unless you really know what you're doing.
.\"
.TP
.B "set full"
Show everything disabled, all failures, all successes, and all untested services.
.\"
.TP
.BI "set bg" " color"
Background color for the CGI report. The value of this parameter should resemble "d5d5d5" (without the quotes).
.\"
.TP
.BI "set bg-ok" " color"
Background color for services which are in an "ok" state.
.\"
.TP
.BI "set bg-fail" " color"
Background color for services which are failing.
.\"
.TP
.BI "set bg-untested" " color"
Background color for services which have yet to be tested.
.\"
.TP
.BI "set refresh" " seconds"
For CGI output, set the frequency that the report reloads. The default is to not reload.
.\"
.TP
.BI "summary-len" " len"
For CGI output, set the maximum length of the summary output to display. Summary text which exceeds
.I len
will be truncated and replaced with ellipses.
.\"
.TP
.BI "link" " group service URL"
For the CGI report, make a link to
.I URL
at the bottom of the detail report for
.I "group/service"
for more information.
.\"
.TP
.BI "link-text" " group service"
Insert all HTML up until a line beginning with "END" after the link specified with the
.B "link"
setting.
.\"
.TP
.B "set html-header"
Lines after this statement, continuing up until a line beginning with the word "END" will be displayed after the "" tag in the CGI output. Use this to display custom headers, including images and other fancy things.
.SH "ENVIRONMENT VARIABLES"
.IP MONHOST
The hostname of the server which runs the
.B mon
process.
.SH SEE ALSO
mon(8)
.SH BUGS
Report bugs to the email address below.
.SH AUTHOR
Jim Trocki

mon-1.2.0/doc/README.monitors

$Id: README.monitors,v 1.1.1.1 2004/06/09 05:18:06 trockij Exp $

The following monitors are provided with the distribution, to get you started. It's simple to add your own monitors. See the man page for "mon" to learn how.

fping.monitor
-------------
This pings a list of hosts efficiently using the fping program, from the Satan distribution. fping.monitor is just a simple shell wrapper for fping, and is normally invoked with just the list of hosts to ping.

Here's a trick: say you don't want to trigger an alert until a machine has been unpingable for some number of minutes. Give fping.monitor the arguments "-r 3 -t 240000".

    arguments:
        -a      only report failure if all hosts fail
        -r num  retry "num" times for each host before reporting failure
        -t num  set timeout between retries to "num" milliseconds
        -s num  consider hosts whose response time exceeds "num"
                milliseconds as failures
        -T      for each failed host (no response only), traceroute to
                the system and report the output

ping.monitor
------------
Similar to fping.monitor, but uses the system's ping program. This serializes the pings, which is normally bad to do. This is simply an alternative to fping.monitor, if you can't get fping to compile. I've only tested it with Linux and Solaris.

freespace.monitor
-----------------
This will monitor disk space usage of a particular NFS server. Arguments are supplied as "path:kBfree [path:kBfree...]". If free space dips below kBfree, then it returns a failure condition, and the output is how much space is left on that server. If you use this monitor, use separate mounts for the volumes that you want to test, mounting them with the "-o ro,intr,soft" options, so things don't hang too badly if the server is down. You should use the ";;" directive to the monitor line, because freespace.monitor doesn't take a list of hosts.
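
The threshold check described for freespace.monitor can be sketched roughly as follows. This is only an illustration in Python, not the monitor's actual code: the names parse_specs, low_space, main and the kb_free hook are hypothetical, and it assumes the convention (also visible in netappfree.monitor) that a non-zero exit status signals a failure to mon.

```python
import os


def parse_specs(args):
    """Turn ["path:kBfree", ...] into [(path, min_kb_free), ...]."""
    specs = []
    for arg in args:
        # Split on the last colon so the threshold is always the final field.
        path, _, min_kb = arg.rpartition(":")
        specs.append((path, int(min_kb)))
    return specs


def statvfs_kb_free(path):
    """Free space available to unprivileged users, in kilobytes."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize // 1024


def low_space(specs, kb_free=None):
    """Return "path:kBfree" strings for paths below their threshold.

    kb_free is a hypothetical hook so the logic can be exercised
    without a real mount; by default os.statvfs is consulted.
    """
    kb_free = kb_free or statvfs_kb_free
    failures = []
    for path, min_kb in specs:
        free = kb_free(path)
        if free < min_kb:
            failures.append(f"{path}:{free}")
    return failures


def main(argv, kb_free=None):
    """Print the failing paths and return a monitor-style exit status."""
    failed = low_space(parse_specs(argv), kb_free)
    if failed:
        print(" ".join(failed))
        return 1
    return 0
```

A wrapper script would finish with sys.exit(main(sys.argv[1:])) and take the same "path:kBfree" arguments on the monitor line.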

Here's an example configuration:

    watch nfsservers
        service fping
            interval 5m
            monitor fping.monitor
            alert mail.alert mis-alert@company.com
            alert netpage.alert mis-pager
            alertevery 60m
        service freespace
            interval 10m
            monitor freespace.monitor /server1:5000000 /server2:5000000 ;;
            alert mail.alert mis-alert@company.com
            alert netpage.alert mis-pager
            alertevery 60m

tcp.monitor
-----------
Useful to see if it's possible to connect to a particular port on a particular server. This is over-simplified, and does not yet support parsing of the output from these services. Options are "-p port" to tell which port to check, and a list of hosts.

http_t.monitor
--------------
This monitor, contributed by Jon Meek (meekj@pt.cyanamid.com), will use HTTP to connect to a server, get a page, and log the transfer speed of the transaction. It uses the Time::HiRes Perl module, available from CPAN. It can also register a failure if the transfer doesn't complete within a certain number of seconds. See the source code for an explanation of the arguments.

http_tp.monitor
---------------
Used to measure and log http file transfer speed and use a proxy. See the comments in the source code for instructions. Requires Time::HiRes, LWP::UserAgent, and HTTP::Request.

dns.monitor
-----------
dns.monitor will make several DNS queries to verify that a server is providing the correct information for a zone. The zone argument is the zone to check. There can be multiple zone arguments. The master argument is the master server for the zone. It will be queried for the base information. Then each server will be queried to verify that it has the correct answers. It is assumed that each server is supposed to be authoritative for the zone.

ftp.monitor
-----------
Connect to an ftp server, wait for an acceptable prompt, and log out.

hpnp.monitor
------------
Uses SNMP to monitor HP JetDirect-equipped printers.
Reports failures as told by the various objects in HP's MIB, and returns the message that is showing on the printer's LCD ("LOW TONER", "LOAD LETTER", etc.).

http.monitor
------------
Connects to an http server, retrieves a URL, and returns true if everything is OK.

imap.monitor
------------
Connects to an IMAP server, checks for a sane response, and then logs out.

ldap.monitor
------------
This script will search an LDAP server for objects that match the -filter option, starting at the DN given by the -basedn option. Each DN found must contain the attribute given by the -attribute option and the attribute's value must match the value given by the -value option. Servers are given on the command line. At least one server must be specified.

netappfree.monitor
------------------
Uses SNMP to get free disk space from a Network Appliance filer. Exits with a value of 1 if free space on any host drops below the supplied parameter, or with a value of 2 if there is a "soft" error (SNMP library error, or could not get a response from the server). This requires the UCD SNMP library and G.S. Marzot's Perl SNMP module. Supply a configuration file with the "--config file" option (see etc/netappfree.cf for an example), or "--list" for a listing of filesystems which are on your filers. Use --list for help in building a configuration file.

nntp.monitor
------------
Tries to connect to an nntp server, and waits for the right output.

ping.monitor
------------
Returns a list of hosts which are not reachable via ICMP echo. Uses the system's default ping, rather than fping.

pop3.monitor
------------
Connects to a POP3 server, waits for the OK prompt, then logs out.

process.monitor
---------------
Monitor snmp processes. Arguments are:

    [-c community] host [host ...]

This script will exit with value 1 if host:community has processErrorFlag set. The summary output line will be the host names that failed and the name of the process. The detail lines are what UCD snmp returns for an ErrorMsg.
('Too (many|few) (name) running (# = x)'). If there is an SNMP error (either a problem with the SNMP libraries, or a problem communicating via SNMP with the destination host), this script will exit with a warning value of 2. There probably should be a better way to specify a given process to watch instead of everything-ucd-snmp-is-watching.

reboot.monitor
--------------
Polls the SNMP agent on hosts, and triggers a failure when a reboot is detected.

smtp.monitor
------------
Connects to an SMTP server, waits for a prompt, and then logs out.

telnet.monitor
--------------
Use tcp_scan to try to connect to the telnet port on a bunch of hosts, and look for a "login" prompt.

msql-mysql.monitor, rpc.monitor
-------------------------------
See the separate README for these monitors.

readdir.monitor
---------------
From: Gilles Lamiral
To: "mon@linux.kernel.org"
Subject: readdir monitor

Hello,

I wrote a monitor that reads several directories and tells if the number of files in each directory exceeds a given number. Possible uses are testing /var/spool/mqueue or /var/spool/lp/

It is a local monitor. No SNMP here. I think it can be easily called from an SNMP agent.

1) The allowed number can be specified for each directory.

2) You can add a regex filter to match the file names. Only one regex is allowed for all directories. Tell me if you want one per directory.

3) The return status is interesting. It gives the exceeded values in a log base 2 way. For example: you want to check if /var/spool/mqueue contains less than 100 messages

    $ ls /var/spool/mqueue | wc -l
    479
    $ ./my-readdir.monitor /var/spool/mqueue:100
    /var/spool/mqueue:479
    $ echo $?
    3

    1 means more than 100 messages
    2 means more than 200 messages
    3 means more than 400 messages
    4 means more than 800 messages
    ...
    255 means more than 2.89 * 10^78 messages (289 + 76 zeros !)

Nice ? See more examples in the script itself.

up_rtt.monitor
--------------
mon monitor to check for circuit up and measure RTT.
Jon Meek - 09-May-1998. Requires Perl Modules "Time::HiRes" and "Statistics::Descriptive".

dialin.monitor
--------------
Dials in to a modem and fails if a carrier and a prompt are not detected. Useful for telling if your modem pool is down or if some flaky modem has quit answering the phone.

dialin.monitor requires the Perl Expect module, available from CPAN.

This program performs UUCP-style locking, and needs to run setgid uucp to accomplish this. Provided is dialin.monitor.wrap.c, a simple little C program which is installed as setgid uucp and directly executes the actual dialin.monitor Perl script. This is required because some systems (e.g. Linux) do not allow setuid/setgid scripts. To build, edit the Makefile in mon.d, and adjust monpath to your environment. Then do:

    make && make install

dialin.monitor accepts several arguments. The only required argument is "-n", which specifies the phone number to dial.

    -n number   dial in to "number"
    -t secs     timeout to wait for "CONNECT" from modem (60)
    -l lockdir  directory to use for UUCP-style locking ("/var/lock")
    -D device   serial device to use ("/dev/modem")

foundry-chassis.monitor
-----------------------
Reports the power supply and fan status of Foundry chassis-based switches, like the BigIron and the FastIron. This uses the "FOUNDRY-SN-AGENT-MIB" and "FOUNDRY-SN-ROOT-MIB". Foundry annoyingly ships their MIBs in one giant file. What I do is separate them into distinct files so that the UCD tools don't need to parse the single giant file. I've tested this with staged failures of PSUs and it works fine. It also caught a real, non-staged failure once.

Arguments are:

    -c community    SNMP community to use

silkworm.monitor
----------------
Reports port, fan, power supply, and temperature failures in Brocade SilkWorm FCAL switches. It requires Brocade's "SW-MIB" MIB. Sensor failures are explicitly reported by the agent, read by this monitor, and reported to mon.
This monitor identifies port problems by paying attention to only those ports whose administrative status is "online", yet the actual operational status is not "online". This monitor has not yet been tested in the case of an actual (or staged) failure. That doesn't mean it doesn't work--it's just that it hasn't been tested :)

Arguments are:

    -c community    SNMP community to use

cpqhealth.monitor
-----------------
Report fan, PSU, and temperature failures from systems running the Compaq "Insight Manager". It requires the "CPQHLTH-MIB" MIB, and the UCD SNMP libs with the Perl module. We've had this running for a little while now, and have tested it with both "staged" failures and actual failures, and it seems to work rather well. The Insight agent is a bit quirky, though. I've seen cases where it reports that both PSUs are installed and running without error, yet says it is not in a redundant configuration.

Arguments are:

    -c community    SNMP community to use

mon.monitor
-----------
Report the running status of a mon server.

Arguments are:

    -p port      port to use, defaults to 2583
    -t timeout   timeout in seconds, defaults to 30
    -u username  username (optional)
    -p password  password (optional)

traceroute.monitor
------------------
Monitor routes from monitor machine to a remote system using traceroute. Alarm and log when changes are detected. See embedded POD documentation for details.

smtp3.monitor
-------------
smtp monitor which performs logging of connect times. See embedded POD documentation for details.

http_tpp.monitor
----------------
Parallel query http server monitor for mon. Logs timing and size results, can use a proxy server, and can incorporate a "Smart Alarm" function via a user supplied Perl module. See embedded POD documentation for details.

file_change.monitor
-------------------
file_change.monitor will watch specified files in a directory and trigger an alert when any monitored file changes, or is missing. File changes can optionally be logged using RCS.
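
The general idea of "alert when a watched file changes or goes missing" can be sketched as below. This is a hypothetical Python illustration of the technique (comparing content digests against a saved baseline), not a description of how file_change.monitor is implemented; the function names fingerprint, snapshot and check are made up for this example.

```python
import hashlib


def fingerprint(path):
    """Return a SHA-256 hex digest of the file, or None if it is missing."""
    try:
        with open(path, "rb") as fh:
            return hashlib.sha256(fh.read()).hexdigest()
    except FileNotFoundError:
        return None


def snapshot(paths):
    """Record a baseline {path: digest} for the files being watched."""
    return {path: fingerprint(path) for path in paths}


def check(paths, baseline):
    """Compare paths against the baseline; return (changed, missing)."""
    changed, missing = [], []
    for path in paths:
        digest = fingerprint(path)
        if digest is None:
            missing.append(path)
        elif digest != baseline.get(path):
            changed.append(path)
    return changed, missing
```

A monitor built this way would report a failure whenever either list is non-empty, and store a fresh snapshot once the change has been acknowledged.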
See embedded POD documentation for details.

na_quota.monitor
----------------
Report quota limits on Network Appliance filers. See the comments in the file for details.

mon-1.2.0/doc/how-to-write-an-alert.txt

mon-1.2.0/doc/README.mon.cgi

Introduction to mon.cgi
--------------------------------------------------------
This interface, along with mon itself, is available from
ftp://ftp.kernel.org/pub/software/admin/mon/

Development versions of mon.cgi can be found at
http://www.nam-shub.com/files/
--------------------------------------------------------

mon.cgi is a web-based GUI for mon. Its purpose is twofold:

1) To provide an easy-to-read visual display of all the status items that mon keeps track of, and

2) To provide an easy-to-use web administration interface to allow users to perform all mon administration tasks from any web browser.

This package and the documentation assume that you have at least a basic familiarity with mon.

-----------------------------------------------------------------
mon.cgi v.1.52
21-May-2001
by Andrew Ryan
-----------------------------------------------------------------

This is the latest stable version of mon.cgi, meant to be used only with mon 0.38-21 and above, and a version of Mon::Client that is 0.11 or higher. The chief reason that you will need the new version is for the "test config" functionality.

This release has 4 new features of note:

1) Access control. Using the 'watch' keyword in the config file, you can restrict access to a particular configuration on a per-hostgroup basis. 'watch' keywords can be regular expressions.
Original idea and keyword name stolen from monshow :)

2) 'watch' keywords can either be implemented "softly" -- by default only certain hostgroups are shown, but all can be accessed -- or "strictly" -- only the hostgroups explicitly allowed by 'watch' keywords can be accessed in any way. Strict access control allows an organization using mon to watch systems belonging to multiple customers to segregate those customers' monitoring completely.

3) There's now a login button. The people have spoken!

4) mon.cgi now checks for the proper version of Mon::Client before it starts. This was a major support problem.

Plus many other bug fixes and small improvements, as usual. This release should be considered stable until proven otherwise :)

Please see the CHANGES file for more information about this release.

Thanks to all who report bugs, submit patches, and give feedback.

Andrew Ryan

Installing mon.cgi
------------------
Instructions for installing mon.cgi are located in the header of the mon.cgi file itself. Roughly speaking, the order of events is:

1) Install mon and get it working, set up monpasswd and auth.cf files and get them verifiably working if you're using mon.cgi authentication (hint: you should be!).

2) Install a web server, preferably Apache, and preferably with mod_perl built in. Start the web server and verify that it works.

3) Put mon.cgi in your cgi-bin directory and make sure it is executable by the apache user (make it 0755 or 0555).

4) Edit your mon.cgi file to change default values to match your environment (e.g. contact email, your company logo, your company name, etc.).

5) If you're requiring users to log in (highly recommended), you must change the default app secret variable $app_secret in your copy of mon.cgi, and install the Crypt::TripleDES module from CPAN on the machine which will be running mon.cgi.

6) If you want to easily customize the look and feel of mon.cgi, as well as various other configuration options, copy the sample mon.cgi.cf file (in the /config directory of this distribution) into a location where your webserver can read it, and edit the line beginning '$moncgi_config_file = ""' to reflect the path to your config file. You can then change the look and feel of mon.cgi, as well as implement access controls, directly from this file.

mon.cgi Design Goals
--------------------
1) Provide 100% of the functionality of mon in a graphical user interface. Ideally, there will be some things that the GUI is better for, and inevitably, some things that the command line will always win out for.

2) Maintain 100% compatibility with mon and Mon::Client. If a patch to mon or Mon::Client is required to get a piece of mon.cgi functionality working, we write it, submit it, and get it folded in to the main distribution before making it official in mon.cgi.

3) Expose mon to the largest number of people possible in the most useful way. It is the author's belief that mon is a very useful piece of monitoring software, and it is also my belief that the best way to ensure the growth and support of this software is to expose it to a large number of people in your organization in a way that will cause them to reach the same conclusion. A web client is the most universal way to achieve this goal at the present time, as a web client can be run on any network that mon would be.

4) Simplicity and lightness. In other words: Compatibility on a large number of client browser sizes, versions, and resolutions; No frames! ; Adhering to as many of the standard good usability conventions as possible ; Keeping mon.cgi all one file, with a very short setup time ; No special modules required past those needed to run mon, and optional additional modules kept to a minimum ; 100% text browser compatibility ; Performance and speed ; Low resource utilization.

Sometimes these design goals work against one another, but hopefully we come out ahead when tradeoffs are made.

Alternatives to mon.cgi
-----------------------
If you don't like mon.cgi but you would still like a web GUI, you have 2 alternatives. Your first alternative is Jim's monshow, which ships with mon in the clients/ subdirectory of the mon distribution. The second alternative is Gilles Lamiral's Minotaure, which can be found at ftp://ftp.kernel.org/pub/software/admin/mon/contrib/. Both of these are fully functional and may suit your needs better than mon.cgi. You are encouraged to take a look at them both and decide which is best for you.

SITE CUSTOMIZATION
------------------
mon.cgi has always been "customizable," in that the source was available and you were encouraged to substitute your own parameters (e.g., mon host, mon port, company logo, etc.). But this meant that with each new version, you had to go back and re-edit the source code. Not a big deal, but still something of a pain.

As of v1.49, mon.cgi includes some features which are meant to facilitate these changes and make site-specific customizations easier to perform, especially as mon and mon.cgi continue to evolve.

Creating Your Own Config File
-----------------------------
Prior to v.1.49 of mon.cgi, you could customize the look of the page, but all customizations had to be done in the source itself. This has numerous disadvantages, so 1.49 introduces an *optional* config file which will be read only as necessary and will allow you to specify custom values for parameters without having to touch the source code each time.

You can still edit the source each time if you want, but if you want to set up a config file, follow these steps:

1) Copy the config file (included with the mon.cgi distribution) config/mon.cgi.cf to a location of your choice.
It's best to start with a sample config file, because the config file
format is very simple, and it will give you a chance to see how it
works and experiment with parameters.

2) Edit the mon.cgi source code to find the line that specifies the
variable "$moncgi_config_file". Change the value to the filesystem
path of your copy of the mon.cgi config file.

3) Now you can edit the config file and make changes at will. Every
time you change the mtime of the file (e.g., by saving it in a text
editor, or touch'ing the file), mon.cgi will re-read the config file
and the changes will take effect. If there are errors in parsing the
config file, they will go to STDERR, which in most setups will end up
in your web server's error log. Look in the error log if your config
isn't working the way you expect it to.


Adding A New Row And Custom Commands To The Command Button Bar
--------------------------------------------------------------
Adding a new row to the command button bar, with corresponding custom
commands, is quite a bit more involved than the relatively simple
matter of changing a config file. If you've developed, or are
interested in developing, your own custom commands, however, this
functionality might be just what you need.

In the following example, we add a command called "ack_all" to the
button bar, and also add the routine to do the ack'ing. The actual
guts of the ack_all routine aren't included, but the goal of these
instructions is to give you enough to start off.

The first step is to create your own moncgi_custom_print_bar
function. A stub function exists in the mon.cgi code, and the code
below shows how you would put in your own function that has one
button, labeled "Acknowledge All Failures".

Sample moncgi_custom_print_bar subroutine:

sub moncgi_custom_print_bar {
    #
    # This is a sample routine, which adds a third row to the
    # command table, with one command: "Acknowledge All Failures"
    #
    my ($face)= (@_);
    $webpage->print("\n");
    $webpage->print("\tAcknowledge All Failures\n");
    $webpage->print("\n");
}

The next step is to tell mon.cgi that you are using your own custom
commands, by creating your own moncgi_custom_commands
subroutine. Again, there is a sample function in the mon.cgi code
which you can replace with your own.

Sample moncgi_custom_commands subroutine:

sub moncgi_custom_commands {
    if ($command eq "ack_all") {
        #
        # Set up the page
        #
        &setup_page("Acknowledge All Alarms");
        #
        # Note: you would have to write the "ack all"
        # command yourself!
        &moncgi_ack_all;
    } else {
        #
        # We didn't find anything, return
        #
        return 0;
    }
    return 1;  # we did find something, suppress further command processing
}

The last step is to create the actual subroutines which will do the
custom work you want them to do (assuming you weren't just calling
existing commands in a different way). In our example, this means we
have to write a function that actually goes out and acks all existing
failures. We won't do this here, but hopefully this gives you an idea
of how to proceed.

sub moncgi_ack_all {
    #
    # Here is where the actual code to do the "ack all" would go
    #
}

When future releases of mon.cgi come out, you can copy and paste your
custom subroutines and be up and running with the new version in
minimal time. At least, that is what this was designed for.


Credits
-------
The current maintainer is Andrew Ryan.
Report all bugs to him or the mon users mailing list.

+ Originally by: Arthur K. Chan
+ Based on the Mon program by Jim Trocki.
  http://www.kernel.org/software/mon/
+ Rewritten to support Mon::Client, mod_perl, taint mode,
  authentication, the strict pragma, and other visual/functional
  enhancements by Andrew Ryan.
+ Downtime logging contributed by Martha H Greenberg
+ Site customization extensions by Ed Ravin
+ The contributions of members of the mon-users mailing list have
  been invaluable in many ways.

mon-1.2.0/doc/README.alerts
$Id: README.alerts,v 1.1.1.1 2004/06/09 05:18:06 trockij Exp $

The following alerts are provided with the distribution:

mail.alert
    Sends an email alert to a list of email addresses. Separate
    addresses with spaces, not commas.

qpage.alert
    Calls QuickPage for one or more pagers. Multiple pagers are
    separated on the command line via spaces.

file.alert
    Logs alerts to a file, which is specified as the first argument
    on the command line.

mon-1.2.0/doc/README.snmpdiskspace.monitor
# NAME
#        snmpdiskspace.monitor
#
#
# SYNOPSIS
#        snmpdiskspace.monitor [--list] [--timeout seconds] [--config filename]
#                              [--community string] [--free minfree]
#                              [--retries retries] [--usemib ] host...
#
#
# DESCRIPTION
#    This script uses the Host Resources MIB (RFC1514), and optionally
#    the MS Windows NT Performance MIB, or UCD-SNMP extensions
#    (enterprises.ucdavis.dskTable.dskEntry) to monitor diskspace on hosts
#    via SNMP.
#
#    snmpdiskspace.monitor uses a config file to allow the specification of
#    minimum free space on a per-host and per-partition basis. The config
#    file allows the use of regular expressions, so it is quite flexible in
#    what it can allow. See the sample config file for more details and
#    syntax.
#
#    The script only checks disks marked as "FixedDisks" by the Host MIB,
#    which should help cut down on the number of CD-ROM drives
#    erroneously reported as being full! Since the drive classification
#    portion of the UCD Host MIB isn't too great on many OS'es, though,
#    this won't buy you a lot. Empire's SNMP agent gets this right on
#    all the hosts that I checked, though. Not sure about the MS MIB.
#    UCD-SNMP only checks specific partition types (md, hd, sd, ida)
#
#    snmpdiskspace.monitor is intended for use as a monitor for the mon
#    network monitoring package.
#
#
# OPTIONS
#    --community   The SNMP community string to use. Default is "public".
#    --config      The config file to use. Default is either
#                  /etc/mon/snmpdiskspace.cf or
#                  /usr/lib/mon/mon.d/snmpdiskspace.cf, in that order.
#    --retries     The number of retries to use, if we get an SNMP timeout.
#                  Default is retry 5 times.
#    --timeout     Seconds to wait before declaring a timeout on an SNMP get.
#                  Default is 20 seconds.
#    --free        The default minimum free space, in a percentage or absolute
#                  quantity, as per the config file. Thus, arguments of, for
#                  example, "20%", "1gb", "50mb" are all valid.
#                  Default is 5% free on every partition checked.
#
#    --ifree       The default minimum free inode percentage, specified as
#                  a percentage. Default is 5% free.
#
#    --list        Give a verbose listing of all partitions checked on all
#                  specified hosts.
#
#    --listall     like --list, but also lists the thresholds defined for
#                  each filesystem, so you can doublecheck the config file
#
#    --usemib      Choose which MIB to use: one or more of host, perf, ucd
#                  Default tries all three, in that order
#
#    --debug       enable debug output for config file parsing and MIB fetching
#
#
# EXIT STATUS
#    Exit status is as follows:
#      0     No problems detected.
#      1     Free space on any host was below the supplied parameter.
#      2     A "soft" error occurred, either a SNMP library error,
#            or could not get a response from the server.
#
#    In the case where both a soft error and a freespace violation are
#    detected, exit status is 1.
#
# BUGS
#    When using the net-snmp agent, you must build it with "--with-dummy-values"
#    or the monitor may not parse the Host Resources MIB properly.
#
#    List of local filesystem types used when parsing the UCD MIB should be
#    configurable.
#
#
# NOTES
#    $Id: README.snmpdiskspace.monitor,v 1.1.2.1 2007/05/02 23:25:06 trockij Exp $
#
#    * Added support for inode status via UCD-SNMP MIB. Fourth column in config
#      file (optional) is for inode%.
#    * added --debug and --usemib options. Latter needed so you can force use
#      of UCD mib if you want inode status.
#    * rearranged the error messages to be more Mon-like (hostname first)
#    * added code to synchronize instance numbers when using UCD MIB. This
#      could solve the "sparse MIB" problem usually fixed by the
#      --with-dummy-values option in net-snmp if needed for other agents
#    Ed Ravin (eravin@panix.com), January 2005
#
#    Added support for regex hostnames and partition names in the config file,
#    'use strict' by andrew ryan.
#
#    Generalised to handle multiple MIBs by jens persson
#    Changes Copyright (C) 2000, jens persson
#
#    Modified for use with UCD-SNMP by Johannes Walch for
#    NWE GmbH (j.walch@nwe.de)
#
#    Support for UCD's disk MIB added by Matt Simonsen
#
#
# SEE ALSO
#    mon: http://www.kernel.org/software/mon/
#
#    This requires the UCD SNMP library and G.S. Marzot's Perl SNMP
#    module. (http://ucd-snmp.ucdavis.edu and CPAN, respectively).
#
#    The Empire SystemEdge SNMP agent: http://www.empire.com

mon-1.2.0/doc/CHANGES.mon.cgi
mon.cgi v1.52 21-May-2001
-------------------------
        + added check for sufficient Mon::Client version
        + added optional "watch" keyword to config file that allows
        users to see only the groups they are configured to be
        allowed to see, by regex.
        + added optional keyword "show_watch_strict" that, when set to
        "yes", will enforce watch keywords strictly, and not allow
        the mon.cgi user to see any detail about any other hostgroup.
        + query_groups added summary/ack information to failed services
        + query_groups: now prints red or yellow as appropriate, instead
        of just red, for failed services.
        + added "log in" link to mon.cgi base page
        + moncgi_get_params: Fixed bug with null values of $monhost
        and $monport getting through.
        + fixed moncgi_reset bug - keepstate & no-keepstate were reversed
        + moncgi_authform: passwd dialog is cleared after unsuccessful
        password entry.
        + new function: moncgi_login - allow user to log in prior to
        having to execute a privileged action.
        + new config parameter: logo_link. logo_link is a URI that will
        be linked to the logo picture, if logo is defined.
        + New function: can_show_group(groupname), to test if a group
        can be shown according to the "watch" directives.
        + The following functions were updated to reflect the new
        watch keyword access control routines: list_alerthist,
        list_dtlog, query_group, list_disabled, svc_details,
        mon_test_service, moncgi_test_all, mon_enable, mon_disable,
        mon_ack
        + fixed numerous warnings, did some code cleanup and improved
        comments.
        + Fixed another mod_perl bug in monhost/monport parsing
        + Updated moncgi-appsecret.pl, in the util directory, to reflect
        new code.


mon.cgi v1.51 22-Mar-2001
-------------------------
        + Fixed taint-checking problem with monhost and monport args
        (Mon::Client was complaining under TaintMode/-T).


mon.cgi v1.50 15-Mar-2001
-------------------------
        + Config file parsing support was not working properly. This
        has been fixed, and a new subroutine was introduced:
        initialize_config_globals.


mon.cgi v1.49 14-Mar-2001
-------------------------
        + Add test_config option on main menu bar (new 0.38.21 command)
        + change reset to single button, with follow-up page, giving
        two choices -- reset keepstate and reset.
        + new function - moncgi_reset to allow users to choose which
        type of reset they would like to execute.
        + Patch from Ed Ravin (eravin@panix.com) to accommodate a
        site-specific custom toolbar row and site-specific menu
        commands.
        + added an optional config file that lets users specify their
        own mon.cgi parameters.
        + added TVA color scheme to the distro (from tbates@tva.gov)
        + Use HTML::Entities to escape HTML submitted as ack messages,
        avoiding cross-site scripting attacks/javascript and ensuring
        proper encoding of characters entered as ack messages. HTML
        scrubbing can be skipped by setting the variable
        untaint_ack_msgs to "no".
        + remove all
's and replace with
        . Important messages were
        often getting cut off the screen by the use of 
.
        + make $monhost and $monport optional CGI params as 'h' and
        'p' respectively
        + added "test service" and "test-all" to query_group page

mon.cgi v1.48 01-Dec-2000
-------------------------
	+ Have ability to do mass disabling/enabling of hosts and
	services in hostgroup.
	+ query_group: have radio button for enabled/disabled status
	(facilitates mass en/disabling)
	+ query_group: added a table to show services for that group,
	enabled/disabled with radio button.
	+ query_group: now includes service status on this page
	+ query_group: mass dis/enabling of svcs requires a new function,
	mon_state_change
	+ svc_details: widened the table
	+ main: Command matching changed to use exact matches instead of
	regex matches (duh).
	+ main: fix bug with Revision tag in $VERSION
	+ list_disabled: Also added mass disabling
	+ mon_state_change_enable_only: new function to support
	list_disabled mass re-enabling.
	+ list_pids: cleaned up function and formatting
	+ added mon_state_change function for mass state changing
	+ added mon_list_opstatus function
	+ query_opstatus: moved legend to below main table
	+ query_opstatus: changed legend to use bgcolor instead of font color
	+ query_opstatus: ack message is now included in summary
	+ query_opstatus: increased main table width to 100%
	+ query_opstatus: can now test svcs from this page
	+ ability to do multiple tests at the same time for a single
	hostgroup
	+ moncgi_test_all: new function to test all svcs in group
	+ Ran mon.cgi through 'tidy' (http://www.w3.org/People/Raggett/tidy/)
	for improved HTML compliance. Most common pages are OK now (I think)
	except for table summary attributes. I'll get to them eventually. 
	+ added last_ok time for failed services in "Last Check" column
	+ color of UNCHECKED services is now midnight blue by default,
	unchecked services are now readable in the default color scheme!



mon.cgi v1.46 20-Aug 2000
-------------------------
	+ Fixed bug in list_dtlog that would show min and max failure time
	as "-1" seconds if no failures had been seen on that service. Also
	the table is now not printed at all instead of being a 0-row table.
	+ Made it easier for users to get themselves out of the situation
	where they enter in a valid username and an invalid password.
	+ Made the summary info MUCH easier to see when a service is in
	the failure state.
	+ alert_details is now "svc_details", a much more descriptive name,
	since it shows success as well as failure details.
	+ svc_details [nee alert_details] got a little bit of a cleanup 
	(not much).
	+ list_dtlog now has a configurable maximum number of entries per
	page that it will display, defaulting to 100. Large downtime logs
	would not render well in most browsers, and would not render at
	all with Netscape's table drawing algorithm.
	+ Added optional $monport argument, in case you don't run mon 
	on port 2583.
	+ Trap watches are now correctly handled and printed (thanks
	to Ed Ravin for the bug report and fix).
	+ Fixed bug in pp_sec that would cause "1 days" to be printed
	out instead of "1 day".


mon.cgi v1.45 05-Jun 2000
-------------------------
	+ query_opstatus: Built an "amber level" alert for services 
	that have failed but never issued an alert
	+ query_opstatus: Changed "Last Checked" and "Est. Next Check" 
	times to be deltas instead of absolute times, both relative to 
	servertime and not localtime.
	+ Added ACK (and re-ack) feature
	+ query_opstatus: Added additional visual warnings if scheduler 
	is not running or cannot be contacted.
	+ Changed default app secret
	+ Button bar at top of each page is cleaner
	+ Fixed bug with scheduler falsely claiming to be stopped if you try
	to stop the scheduler and aren't authenticated, or if the server is
	not running. 
	+ Fixed bug where multiple auth failures are displayed if a user
	is not authenticated (should only notify once)
	+ Made it easier to not hit "reset server" button accidentally
	+ Made font on ONDS check times size -1
	+ Show the downtime log as an option on query_group
	+ Fixed "test immediately" stuff so it tests and then shows right
	status
	+ list_opstatus: hostgroup column no longer goes white if svc is 
	unchecked
	+ alert_details is MUCH spiffier
	+ alert_details now checks to see if a monitor for that service/group
	is currently running, and as such, the status reported is subject
	to change very soon.
	+ Added more descriptive text to service status table in
	alert_details.
	+ Changed default return screen on enable_service to be alert_details
	if that's where the user last came from.
	+ Added new 0.38-18 data types for alert_details
	+ list_dtlog: Display median in addition to mean failure time 
	to lessen effects of
	downtime outliers.
	+ Added a Refresh button on alert_details page
	+ Cleaned up the list_disabled function
	+ Got rid of backwards() function, unused relic from old mon.cgi
	+ Fixed the META REFRESH tags so that it works on all browsers (put
	it in the header where it belongs) and handles more cases 
	(alert_details, test_service)
	+ Started using servertime in places instead of time on local web
	server
	+ Visual enhancements for this version submitted by
	Brian Doherty 
	+ Fixed a bug in the "failure-free operation %" calculation: with
	an extremely large number of failures in a time period, the %
	could show up as negative.


mon.cgi v1.38 18-Feb 2000
-------------------------
	+ MAJOR speedup, only use one Mon connection per page view.
	  Pages typically load 2-3x faster.
	+ list_opstatus in Summary mode is now more brief. All "OK, 
	  Non-Disabled Services" (ONDS) for any given hostgroup are
	  now aggregated in a single line.
	  If you monitor a lot of services on each of your host 
	  groups, this will save you a lot of screen real estate.
	  Services which are disabled and/or failing are still broken
	  out individually.
	+ added FAILED flag to Status box, moved DISABLED flag, so
	  mon.cgi works with Lynx & w3m or any other text browser
	  that supports tables (only Lynx and w3m tested, looks great
	  with w3m by the way).
	+ changed default path of cookie to "/" to avoid lynx complaining
	  about "invalid cookie path".
	+ changed alert_details to use a table, include "view downtime log"
	+ on query_group page, turn box gray if host is disabled.
	+ fixed a div0 bug if you have no entries in your dtlog and ask
	  to view it
	+ changed disabled host in query_group to sort alpha even when
	  hosts are disabled.
	+ alert_details function now auto-detects failure/success, doesn't
	  need to be told which one to look for ("test service immediately"
	  would show inconsistent results from this behavior, since it
	  is impossible to know the results of a test before you run it!)


mon.cgi v.1.35
--------------
+ Downtime log viewing/querying support.
+ Disabled services/hosts/watches now appear as gray-colored boxes on
the main display screen. This makes it easier to see what is disabled.
+ Fixed loadstate and savestate bugs again. These commands now work.
+ I finally have sort of a release process, so hopefully my releases
will not be littered with formatting code that is specific to my
environment, and they will run fine out of the box when you get them.
+ Fixed a few routines to work with changing ways Mon::Client asks you
to do things.
+ Also, if you are logged in as an authenticated user (not the
"default user", if one is defined), your username will appear on each
page, so you always know who you are authenticated as.
+ Added a logout button. 
+ Added ability to do "reset keepstate" as well as "reset" from the
web interface.
+ The command bar is now 2 lines instead of one. Even on my 21"
monitor, 13 buttons was too much to have on 1 line (let alone my poor
800x600 laptop LCD!).
+ Mon::Client::test is broken in v0.7. To make it work in the way that
mon.cgi expects it to, change line 1470 in Client.pm v0.7 from:
>     if ($what !~ /^alert|startupalert|upalert$/) {
to
<     if ($what !~ /^monitor|alert|startupalert|upalert$/) {
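
The effect of that one-line change can be checked in isolation. The sketch
below is illustrative only: it uses Python's re module rather than Perl (the
alternation and anchor rules are the same for these patterns), and the helper
name "accepted" is hypothetical. Client.pm rejects a request when the pattern
does not match, so "accepted" means the pattern matched.

```python
import re

# Patterns copied from the two lines above.
ORIGINAL = re.compile(r"^alert|startupalert|upalert$")
PATCHED = re.compile(r"^monitor|alert|startupalert|upalert$")

def accepted(pattern, what):
    """True when the test type would pass the check (pattern matches)."""
    # Perl's =~ / !~ search anywhere in the string, like re.search.
    return pattern.search(what) is not None

# "monitor" is rejected by the original check and accepted after the patch.
original_allows_monitor = accepted(ORIGINAL, "monitor")
patched_allows_monitor = accepted(PATCHED, "monitor")
# Note: "^" binds only to the first alternative and "$" only to the last,
# so neither pattern is a strict whole-string match.
```

A grouped form such as ^(monitor|alert|startupalert|upalert)$ would be
stricter, but that is a suggestion of this note, not what the changelog
entry asks for.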


mon.cgi 1.32.1.2 01-Feb 2000
----------------------------
+ Fixed loadstate and savestate to not be NOOPs.
+ Established a "default" user for when authentication is required but
you don't want to make users log in just to list status.
+ Along with the default user, there is also now a "switch user" feature
that offers the user the chance to re-authenticate to a user of higher
privilege if they are denied the running of a command due to a lack
of authorization.
+ Fixed HTML bugs with hardcoded colors in font and table tags scattered
throughout code (patch courtesy of Martha H Greenberg,
thanks!). This makes it possible to run mon.cgi in colors other than the
default scheme. mon.cgi users take note however, testing color schemes is
not part of my QA process (such as it is) and so if you find something
broken, let me know and I'll fix it.


mon-1.2.0/doc/README.msql-mysql.monitor
msql-mysql.monitor README
=========================

See the monitor script itself for most of the pertinent usage information.

This is msql-mysql.monitor, a monitor for mon that tries to intelligently
check if an mSQL or MySQL SQL server is operational.

This monitor requires the perl5 modules DBI, DBD::mysql, and DBD::mSQL,
available from CPAN (http://www.cpan.org/).

The monitor may be installed as msql.monitor, in which case it defaults to
mSQL mode, or as mysql.monitor, in which case it defaults to MySQL mode. 
Regardless of how it is installed, the --mode switch may be used to force
the monitor into msql or mysql mode.
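
The selection rule above can be sketched as follows. This is an illustrative
Python sketch, not the monitor's own Perl code: the function name pick_mode is
hypothetical, and raising an error when the installed name matches neither
form is an assumption, since this README does not say what happens in that
case.

```python
import os

def pick_mode(program_path, mode_switch=None):
    """Return "msql" or "mysql" for a monitor invoked as program_path."""
    # An explicit --mode value always wins over the installed name.
    if mode_switch is not None:
        if mode_switch not in ("msql", "mysql"):
            raise ValueError("unknown mode: " + mode_switch)
        return mode_switch
    name = os.path.basename(program_path)
    if name == "msql.monitor":
        return "msql"
    if name == "mysql.monitor":
        return "mysql"
    # Assumption: with any other name, the caller must pass --mode.
    raise ValueError("cannot infer mode from " + name + "; use --mode")
```

For example, pick_mode("/usr/lib/mon/mon.d/msql.monitor") selects mSQL mode,
while passing "mysql" as the second argument forces MySQL mode regardless of
the file name.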

In order for the monitor to succeed, the following must be true:

- For either mode, the server must be up and answering.
- For mSQL mode, the server ACLs must allow connections from the host running
  mon as the effective user running mon to the given database.
- For MySQL mode, the server grant tables must allow connections from the host
  running mon with the username and password provided to the given database.
- For either mode, the database specified must exist, and must contain at
  least one table.

If any of these conditions are not met, the monitor will fail and the DBI
error will be returned to mon for processing by the appropriate alerts.

-- 
j.

James FitzGibbon                                                james@ican.net
System Engineer, ACC Global Net                   Voice/Fax (416)207-7171/7610
.\" mon-1.2.0/doc/moncmd.1
.\" $Id: moncmd.1,v 1.2 2005/04/17 07:42:27 trockij Exp $
.TH moncmd 1 "$Date: 2005/04/17 07:42:27 $" Linux "moncmd"
.SH NAME
moncmd \- send commands to mon daemon and show the results.
.SH SYNOPSIS
.B moncmd
.RB [ \-a ]
.RB [ \-d ]
.RB [ \-l
.IR username ]
.RB [ \-f
.IR file ]
.RB [ \-s
.IR server ]
.RB [ \-p
.IR port ]
.RB [ command ]
.SH DESCRIPTION
.B moncmd
sends commands to the
.B mon
server.
.SH OPTIONS
.TP
.B \-a
Authenticate with the server.
.TP
.B \-d
enable debugging, which is the same as
.B \-s
.IR localhost .
.TP
.BI "-f " file
Read and execute commands from
.IR file .
.TP
.BI "-l " username
Supply
.I username
as the username while authenticating to the server.
.TP
.BI \-s\  server
Connect to
.IR server .
.TP
.BI \-p\  port
Use TCP port
.I port
when connecting to the server, instead of the
default of 2583.

.SH MONITOR HOST
.B moncmd
will use the host specified by the
.B \-s
parameter as the server.  If there is no
.B \-s
parameter it will use the host specified in the MONHOST environment
variable.  If there is no host in either of these locations it will exit
with an error.

.SH BATCH OPERATION
If no commands are supplied to
.B moncmd
on the command line, then commands will
be taken from either standard input, or from
the file specified by the -f parameter.
If standard input is connected
to a TTY and the -a option is supplied,
then it will prompt for a password.
If the -a option is supplied without the -f
option and standard input is not a TTY, then
the username and password are read from
standard input using the syntax "USER=username" and
"PASS=password". The remaining input lines are
interpreted as commands to send to the server.
.B moncmd
will not take usernames or passwords from a file,
for obvious security reasons.

If the username is supplied neither by the -l parameter
nor through standard input,
it is taken from the effective user ID of the
current process.

.SH COMMANDS
The following is a list of the commands that
the server understands.
.\"
.\"
.TP
.BI "enable | disable service " group " " servicename
Enables/disables alerts for
.I group
and
.IR servicename .
All disabled states are automatically saved to the
state file, which may optionally be re-loaded upon
restarting or initial startup of the server.

.TP
.BI "ack " "group service comment"
Acknowledge a failure condition. This will store
.I comment
in the state of service (queryable by doing a
.I "list opstatus"
command), and will suppress further alerts for
the service. Once the service returns to a non-failure
state, then the acknowledgement is reset.

.TP
.B "version"
Displays the protocol version in the form of
"version
.BR num """
where
.B num
is the protocol version number.

.TP
.BI "list aliases"
Lists aliases.

.TP
.BI "list aliasgroups"
Lists alias groups.

.TP
.B savestate
Save the state of the server. Currently, the only state which
is saved is the host/watches/services which are disabled.

.TP
.B loadstate
Load the state of the server. Currently, the only state which
is loaded is the host/watches/services which are disabled.

.TP
.BI "enable | disable host " hostname
Enables/disables host
.I hostname
in all groups. When the monitor is called, this
hostname will not be included in the list of
hostnames sent to the monitor. If a group has only
one hostname in it, then the
.BI "enable | disable watch"
command should be used instead.

.TP
.BI "enable | disable watch " watchgroup
Enables/disables an entire watch for
.IR watchgroup ,
as defined in
the configuration file. Disabling a watch not only
stops alerts from happening, but it stops the actual
monitor processes from being scheduled.

.TP
.BI "reset"
Resets the server, forcing it to re-read the configuration file,
kill off currently running monitors,
restart all monitoring, and reset all counters.
This command is only accessible if
.B moncmd
connects from the host which is running the
.B mon
server.
.TP
.BI "reset stopped"
Resets the server and immediately stops the scheduler.
This is an atomic version of the commands
.B "reset"
and
.BR "stop" .

.TP
.BI "reload auth"
Reloads the auth.cf file in order to incorporate any new changes.
The auth table is completely re-generated; it is not merged.

.TP
.BI "reset keepstate"
If the word "keepstate" comes after the reset command,
the server will do a "loadstate" right after the reset,
before the scheduler kicks back in.

.TP
.BI "stop"
Stops the scheduler, but continues to allow
client connections.

.TP
.BI "start"
Re-starts the scheduler after it has been
stopped.

.TP
.BI "test monitor " group " " servicename
Triggers a test for
.I "group"
and
.I "service"
immediately by
setting the service's countdown timer to zero.

.TP
.BI "test (alert | upalert | startupalert) " group " " servicename " " retval " " period
Triggers a test alert, upalert, or startupalert for
.I group
and
.IR servicename .
.I retval
is the integer exit value to pass to the alert via
the MON_RETVAL environment variable. You must also
specify the
.I period
as it appears in the configuration file.
All alerts of the given type in that period will be triggered,
but the alert will not be logged.

.TP
.BI "servertime"
Returns the current time of the server as seconds since Jan 1, 1970.

.TP
.BI "list group " groupname
Lists the members of group
.IR groupname .

.TP
.B "list descriptions"
List the descriptions of each service, as defined
in the configuration file. If a service description
is undefined, then it is not listed.

.TP
.BI "list alerthist"
Lists the last alarms triggered for each service of each
watch group, in addition to the summary output. The number
of alerts to keep in memory is bounded by the
.I maxkeep
variable, configurable on the
.B mon
command line at startup, and expandable with the
.B set
command during runtime.

.TP
.BI "list failurehist"
Lists the last failures, in addition to the summary output.
This is also limited by the
.I maxkeep
variable.

.TP
.BI "list opstatus"
Lists operational status of all services. Reports whether the last time
a service group was tested resulted in success or failure. The output
is:

.nf
group service untested
group service time timeleft succeeded
group service time timeleft failed output
.fi

where
.I output
is the first line of output from the monitor script
which failed,
.I time
is the time that the condition was last noticed in
.BR time (2)
format, and
.I timeleft
is the number of seconds left until the service is tested
again.

.TP
.BI "list successes"
Generates the same output as the
.B "list opstatus"
command, but only shows the services that
have succeeded the last time they were tested.

.TP
.BI "list failures"
Generates the same output as the
.B "list opstatus"
command, but only shows the services that
have failed the last time they were tested.

.TP
.BI "list disabled"
Lists all hosts and services which have been disabled by the
.B "disable host|service"
command.

.TP
.BI "list pids"
Shows the currently active watch groups/services along with their
process IDs, and the process ID of the server daemon.

.TP
.BI "list watch"
Lists all watches and services.

.TP
.BI "list state"
Lists the state of the scheduler.

.TP
.BI "set " group " " service " " variable " " value
Sets a variable to value. Useful for temporarily changing an interval
or alertevery value. Be careful, because this can just set any
value in the %watch hash, and some values that are specified in
the configuration file like "10m" or "35s" are converted and stored as
just plain integer seconds (e.g. "alertevery").

.TP
.BI "get " group " " service " " variable
Displays the value of group service variable.

.TP
.BI "set " variable " " value
Assigns
.I value
to the global variable
.IR variable .

.TP
.BI "set opstatus " group " " service " " value
Sets the opstatus value for
.I group
and
.IR service .

.TP
.BI "get " variable
Shows the value of global variable
.IR variable .

.TP
.BI "term"
Terminates the server.
This command is only accessible if
.B moncmd
connects from the host which is running the
.B mon
server.

.SH "ENVIRONMENT VARIABLES"

.IP MONHOST
The hostname of the server which runs the
.B mon
process.
.IP MONPORT
The port number to connect to.

.SH SEE ALSO
mon(8)
.SH BUGS
Report bugs to the email address below.
.SH AUTHOR
Jim Trocki 
mon-1.2.0/doc/README.protocol
$Id: README.protocol,v 1.2 2005/07/31 17:02:38 vitroth Exp $

MON PROTOCOL
------------

The client/server protocol for mon works like this:

The server listens on TCP port 2583, which has been assigned by IANA.

In the following, a "line" is a sequence of ASCII text, terminated with
a newline (0A in hexadecimal).

A request submitted by the client is a single line.  Only one request
per line is permitted. Any number of requests per session is permitted.
The client indicates the end of requests by sending a "quit" request.

The reply to a request is zero or more lines.  The end of the
reply is terminated with a positive or negative acknowledgement line.
The positive acks match this regular expression:

^2[0-9][0-9] .*$

Negative acks match this expression:

^5[0-9][0-9] .*$

Characters trailing the leading integer are a comment which summarizes
the success or failure.

The actual value of the leading integer is not meaningful except to
indicate success (200-299) or failure (500-599).
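
As a rough illustration of these rules, a client could sort each reply line
into one of three kinds using the two expressions above. This is a Python
sketch, not part of mon or Mon::Client; the name classify_line and the
labels "ok", "error" and "data" are made up for the example.

```python
import re

# The two expressions quoted above, unchanged.
POSITIVE_ACK = re.compile(r"^2[0-9][0-9] .*$")
NEGATIVE_ACK = re.compile(r"^5[0-9][0-9] .*$")

def classify_line(line):
    """Classify one reply line (without its trailing newline)."""
    if POSITIVE_ACK.match(line):
        return "ok"        # positive acknowledgement, ends the reply
    if NEGATIVE_ACK.match(line):
        return "error"     # negative acknowledgement, ends the reply
    return "data"          # an ordinary line of the reply body
```

Only the class of the leading integer is examined, in keeping with the note
that its exact value carries no meaning.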

An example session follows:

(client connects to port 2583)
Client: list state
Server: scheduler running
Server: 220 list state completed
Client: list xyzzy
Server: 520 unknown list command
Client: list watch
Server: group1 service1
Server: group1 service2
Server: group2 service1
Server: 220 list watch completed
Client: quit
Server: 220 quitting
(server and client terminate tcp session)

If no requests are received from the client in a given amount of time
(configurable on the server), then the server will time out the connection
and hang up on it.

The following are the valid requests and replies implemented in
mon-0.99.2, as defined in the "client_command" routine. A command is
indicated by the word itself, required arguments are surrounded by {},
and optional arguments are surrounded by []. Case is not significant
for the commands themselves, but may be significant for their arguments.

quit
    Terminate connection with server. The server sends a "220" response then
    terminates the TCP session.

protid {version}
    Report whether or not the protocol version matches the server's protocol
    version. Returns 520 nack on failure, or 220 ack on success. As far as I
    know, nothing uses this command.

login {user} {pass}
    Attempts to log in the "user" with password "pass".  This is required if
    user authentication is specified in the server's auth.cf file.

reset [stopped] [keepstate]
    Aborts all currently running monitors, re-reads the server's configuration
    file, and reinitializes the state of all monitoring.  If "stopped" is
    specified, stops the scheduler before the reload. If "keepstate" is
    specified, then the state of the disabled list is reinstated after the
    reset.

reload auth
    Reloads the auth.cf file.

clear timers {group} {service}
    Resets all timers associated with a service. This includes the interval
    counter, traptimeout, trapduration, last alert, consecutive failures, and
    alertafterival.

test monitor {group} {service}

test alert {group} {service} {retval} {period}

test startupalert {group} {service} {retval} {period}

test upalert {group} {service} {retval} {period}

test config

version

loadstate disabled

savestate disabled

savestate opstatus

term

stop

start

set maxkeep {num}

set {group} {service} {variable} {value}

setview {view}

getview

get maxkeep

get {group} {service} {variable}

list descriptions

list group {group}

list opstatus

list opstatus {group,service} [group,service ...]

list disabled

list alerthist

list failures

list failurehist

list successes

list warnings

list pids

list watch

list state

list aliases

list aliasgroups

list deps

list dtlog

list views

ack {group} {service} {comment}

disable watch {group}

disable service {group} {service}

disable host {host [host ...]}

enable watch {group}

enable service {group} {service}

enable host {host [host ...]}

servertime

checkauth {cmd}

mon-1.2.0/doc/README.snmpvar.monitor0000644003616100016640000002425510146140376017021 0ustar  trockijtrockij                             snmpvar.monitor               by P.Holzleitner

What does it do?

    snmpvar.monitor is a plug-in for the "mon" systems monitoring package
    written by Jim Trocki (http://www.kernel.org/software/mon).
    
    Called by mon, it queries freely configurable values using SNMP,
    compares them against specified limits and reports any violation.

    Some parameters that can be monitored (just to give you an idea):

      Equipment operational status (temperature, fan rotation)
      UPS Status (line power / battery, minimum line voltage, load % ...)
      Switch/Router status (interface up, BGP session up, ...)
      Server status (redundant power supply OK, disk array OK, ...)
      Status of services (process running, mail queue length, ...)
      
      
License

  GNU GPLv2 (http://www.fsf.org/licenses/gpl.txt) - See file COPYING

  
Quick Start:

    * Make sure you have UCD SNMP 3.6.2+ (libraries) and the Perl SNMP
      module installed (http://www.cpan.org/misc/cpan-faq.html)
    * Copy snmpvar.monitor to your mon.d directory
    * Copy snmpvar.def to /etc/mon, add your own variables
    * Copy snmpvar.cf to /etc/mon and edit to match your needs
    * Test from mon.d directory with ./snmpvar.monitor -l host1 host2 ...
    * Test again from mon.d directory with ./snmpvar.monitor host1 host2 ...
    * Add watch/service to mon.cf, using snmpvar.monitor


Commandline options:

    --varconf=/path/to/snmpvar.def if neither /etc/mon nor /usr/lib/mon/etc
    --config=/path/to/snmpvar.cf if neither /etc/mon nor /usr/lib/mon/etc
    --community=your_SNMP_read_community if not 'public'

    --groups=Power,Disks  test only a subset of variables for a host group

    --timeout=n		SNMP GET timeout in seconds
    --retries=n		number of times to retry the SNMP GET
    --debug		tell what config is being used
    --mibs='mib1:mib2:mibn'	load specified MIBs
    --list[=linesperpage]	produce human-readable listing, not alarms

   For every host name passed on the command line, snmpvar.monitor looks
   up the list of variables and corresponding limits in the configuration
   file (snmpvar.cf).

   If a --groups option is present, only those variables are checked
   which are in one of the specified groups.  To specify more than one
   group, separate group names with commas.  You can also exclude groups
   by prefixing the group name(s) with '-'.  Don't mix in- and exclusion.
   Examples:
      --groups=Power        only vars in the Power group
      --groups=Power,Env    vars in the Power or Env group
      --groups=-Power,-Env  all vars except those in Power or Env groups
      --groups=Power,-Env   won't work (only the exclusions)
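
   The selection rules above can be sketched as follows. This is an
   illustration, not the monitor's own code: the function names are
   hypothetical, and the mixed case reads "(only the exclusions)" as
   meaning that inclusions are ignored once any exclusion is present.

```python
def parse_groups_option(value):
    """Split a --groups value into (included, excluded) sets of names."""
    names = [n for n in value.split(",") if n]
    excluded = {n[1:] for n in names if n.startswith("-")}
    included = {n for n in names if not n.startswith("-")}
    return included, excluded


def group_selected(group, value):
    """Decide whether a variable in `group` is tested for --groups=value."""
    included, excluded = parse_groups_option(value)
    if excluded:
        # Exclusions win; any inclusions mixed in are ignored.
        return group not in excluded
    return group in included
```

   With --groups=Power,-Env this sketch tests every group except Env,
   which is why mixing the two forms is discouraged.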

   For every such variable, it looks up the OID, description etc. from
   the variable definition file (snmpvar.def).

   This monitor looks for configuration files in the current directory,
   in /etc/mon and /usr/lib/mon/etc.  Command line option --varconf 
   overrides the location of the variable definition file, option
   --config sets the configuration file name.

   When invoked with the --list option, the output format is changed
   into a more human-readable form used to check and troubleshoot the
   configuration.  This option must not be used from within MON.


Exit values:
   0  if everything is OK
   1  if any observed value is outside the specified interval
   2  in case of an SNMP error (e.g. no response from host)


Basic Troubleshooting:

    use snmpvar.monitor --list option to see variable values
    use snmpwalk your_hostname public .1 | less to verify SNMP agent


The snmpvar.def File:

    In this file we define variables that can be retrieved via SNMP.
    In a way, the .def file is snmpvar.monitor's idea of a MIB.
    
    Entries consist of a "Variable variable-name" declaration

      Variable PE4300_TEMP_MB

    [NOTE: The variable name cannot be "Host" or "FriendlyName"]
    followed by the mandatory specification of Object ID and Description:    
    
      OID            .1.3.6.1.4.1.674.10891.300.1.5.2.2.1.3
      Description    Motherboard Temperature
      
    It is suggested that OIDs be entered numerically as shown above 
    in order to eliminate the need for having the SNMP libraries compile
    the relevant MIB files on every invocation of the monitor.
    By default, this monitor loads no MIBs.  If you want to use symbolic
    OIDs, use the --mibs commandline option to specify which MIBs you need.
    
    By the author's convention, an OID describing an array of values, like
    ifOperStat which takes the interface number as an index, is written
    with a trailing dot, while OIDs of scalars end in a number.  As of 
    version 1.1.1, the monitor will insert the dot before the index if you
    forgot it in the .def file.
    
   
    Optional Elements of a Variable definition:
      
      DefaultIndex   3 4 5
      
    A list of indices to test by default.  Let's say the OID is .1.2.3. and
    DefaultIndex is "18 22 36", then the monitor will retrieve the values of
    .1.2.3.18, .1.2.3.22 and .1.2.3.36 when testing this variable, and will
    compare them all against the limits.  Where necessary, the DefaultIndex
    can be overridden for one host/variable combination, using the Index
    statement in the .cf file.
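
    As a small sketch of how the instance OIDs are formed (the function
    name instance_oids is an assumption made for illustration), each
    index is appended to the base OID, and the separating dot is added
    when the .def entry omits it:

```python
def instance_oids(oid, indices):
    """Build one OID per index, inserting the separating dot if missing."""
    base = oid if oid.endswith(".") else oid + "."
    return [base + str(i) for i in indices]
```

    Both ".1.2.3." and ".1.2.3" with indices 18, 22 and 36 yield
    .1.2.3.18, .1.2.3.22 and .1.2.3.36.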

      FriendlyName	3	Disk Fan 1
      
    This lets you replace the standard display of "Variable [Index]",
    e.g. "Fan Speed [5]", with individual labels for each index.
    The FriendlyName option is typically specified in the .def file for
    items that have the same name for every use, e.g. component names like
    in the case of fans, power supplies etc.  The same option exists in
    the .cf file to name a particular variable on a particular host, e.g.
    to display a line name instead of an interface number on a router.
    If the FriendlyName string begins with "@", the Description is
    substituted for the "@".

      Scale          / 10.0
    
    A formula to re-scale the value returned from the host.
    The expression is appended to the raw value and the resulting expression
    is evaluated by Perl.  The raw value is available as $rawval if necessary.
      
      Unit           C

    Used in value display and messages.
    
      Decode	 1	unknown
      Decode	 2	OK
      Decode	 3	FAILURE

    Values retrieved through SNMP are often enumerations of status codes.
    The Decode statement lets you put text labels on these values.

      DefaultGroup	Environment
      
    Specifies that, by default, all instances of this variable go into the
    specified group.  Individual overrides are possible in the .cf file.


      DefaultMin  300
      DefaultMax 2000
      DefaultEQ  1000
      DefaultNEQ 1000
      
    Default alarm limits.  See description of Min/Max/EQ/NEQ below.
      

The snmpvar.cf File:

    In here, you "call up" the variables to be retrieved for a particular
    host.

    Entries consist of a "Host host-name" declaration followed by at least
    one "variable-name [options ...]" line.
    
      Host ntserv1
    
    This hostname corresponds to the hostname on the command line, i.e. the
    hostname you used in MON's hostgroup statement.
    
      FOO_FAN_RPM   Min 1000  Max 5000  MaxValid 10000  Index 1 2 3 4
      
    This example uses almost all options.  It instructs the monitor to
    retrieve the OID specified under "FOO_FAN_RPM" in the .def file.
    
      Min  300		specifies a minimum value, measured >= minimum
      Max 2000		specifies a maximum value, measured <= maximum
      EQ  1000		specifies an exact value, measured == value
      NEQ 1000		specifies an excluded value, measured != value
      
    If the measured value is outside of these limits, a failure is reported.
    To test for "Value = X", use "Min X  Max X".
      
      MinValid -1
      MaxValid 10000
      
    Some monitoring hardware occasionally measures garbage.  To avoid
    triggering an alarm when this happens, you can use MinValid/MaxValid
    to specify the range (inclusive) of plausible values for this variable.
    If the measured value exceeds these limits, only a warning will be
    generated, but no failure will be reported to MON.
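
    The limit checks can be sketched as below. This is an illustration
    rather than the monitor's actual logic: check_value and its keyword
    names are hypothetical, and testing the validity range before the
    alarm limits is an assumption.

```python
def check_value(value, min_=None, max_=None, eq=None, neq=None,
                min_valid=None, max_valid=None):
    """Return 'warning', 'failure', or 'ok' for one measured value."""
    # Implausible readings only warn; they are not reported as failures.
    if min_valid is not None and value < min_valid:
        return "warning"
    if max_valid is not None and value > max_valid:
        return "warning"
    if min_ is not None and value < min_:
        return "failure"
    if max_ is not None and value > max_:
        return "failure"
    if eq is not None and value != eq:
        return "failure"
    if neq is not None and value == neq:
        return "failure"
    return "ok"
```

    For the FOO_FAN_RPM line above (Min 1000, Max 5000, MaxValid 10000),
    a reading of 6000 is a failure while a reading of 20000 is only a
    warning.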

      Group Environment
      
    Puts this particular variable into the specified group.
    Groups are used to test a partial set of the variables specified for
    a host, by using the --groups= command line option.

      Index 1 2 3
      
    This tells the monitor which object instances (array elements) to test
    in case of a non-scalar object.  Since the list of indices can be as
    long as necessary, the Index option must be the last one on the line
    (after Min X, Max Y etc.)
    The list specified as DefaultIndex in the .def file entry for this
    variable is used unless Index is specified here.

    When retrieving a non-scalar value, the snmpvar.monitor will normally
    display the instances (array elements) by appending their index to the
    description, as in "Line Status [3]".
    
    Often, it is desirable to label individual instances in a more
    mnemonic way.  To do this, you can add a number of FriendlyName
    directives after a variable request, like this:
    
      Host firewall
        IF_OPERSTAT		Index 1 2 3
	    FriendlyName	1	 1: Leased Line
	    FriendlyName	2	 2: DMZ
	    FriendlyName	3	 3: Internal Router
	    
    In this case, the monitor checks the ifOperStat for interfaces 1, 2,
    and 3 on host "firewall".  If interface 3 were not "up", the monitor
    would signal a failure of "Internal Router" instead of "ifOperStat [3]".
    If the FriendlyName string begins with "@", the Description is
    substituted for the "@".
    If all instances of this variable having the same index have the same
    meaning regardless of what host they are on, you can put the FriendlyName
    statement into the respective variable definition in the .def file
    instead. 
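
    The labelling rules can be summarized in a short sketch. The
    function name instance_label and the dictionary of friendly names
    are assumptions made for illustration only.

```python
def instance_label(description, index, friendly_names):
    """Return the display label for one instance of a variable."""
    name = friendly_names.get(index)
    if name is None:
        # Default display: "Description [index]".
        return "%s [%s]" % (description, index)
    if name.startswith("@"):
        # A leading "@" stands for the variable's Description.
        return description + name[1:]
    return name
```

    An index without a FriendlyName entry falls back to the default
    "Description [index]" form.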


The snmpopt.cf File:

    This optional file is used to pass parameters to the SNMP library.
    
    For SNMPv1, this is generally not necessary unless the target's 
    SNMP port differs from the default (161).

    Note that SNMPv1 community string, timeout and retries can also be
    specified on the snmpvar.monitor command line, overriding whatever
    default or configuration file setting.
    
    You will need to edit this file in order to use SNMPv3.
    

mon-1.2.0/doc/README.cgi-bin0000644003616100016640000000145310230411543015137 0ustar  trockijtrockij$Id: README.cgi-bin,v 1.3 2005/04/17 07:42:27 trockij Exp $

mon.cgi
-------
    mon.cgi is the more advanced web interface to mon, maintained by
    Andrew Ryan.

minotaur
--------
    minotaur is maintained by Gilles Lamiral. You may obtain it from
    here:

    http://www.linux-france.org/prj/minotaure/


monshow.cgi
-----------

    monshow can be found in the client/ directory. Put it into your
    cgi-bin directory, rename it monshow.cgi, and it should run via your
    web server's CGI mechanism. Upon startup, this script looks for a
    configuration file named "/etc/mon/monshowrc", or ".monshowrc" in
    the working directory. An example monshowrc is in the etc/ directory.
    Read the man page for more information.


Jim Trocki
Transmeta Corporation
trockij@arctic.org
mon-1.2.0/doc/README.traps0000644003616100016640000000621710061516616014774 0ustar  trockijtrockijThe protocol for agents (remote or local monitor scripts)
to deliver failures to the mon server:

Trap consists of tag/value pairs which are separated by newlines. The
first tag must be "pro", which is the protocol version.

Tags which are understood are:

#
# MON-specific tags
# pro   protocol
# aut   auth
# typ   type (0=mon, 1=snmpv1)
# spc   specific type (TRAP_*)
# seq   sequence
# grp   group
# svc   service
# hst   host
# sta   status (opstatus)
# tsp   timestamp as time(2) value
# sum   summary output
# dtl   detail (terminated by \n.\n)
#
# SNMP-specific tags
# ent   enterprise OID
# agt   agent address
# gtp   generic trap type
# stp   enterprise-specific trap type
# tmp   sysUptime timestamp
# vbl   varbindlist (OID = value)
#

SNMP-specific tags do nothing at this time.

Rather than formulating the trap PDU yourself, it's a good idea to use
Mon::Client::send_trap. See the POD for Mon::Client for more details,
or see remote.alert for an example.

If an alert for a watch or service is delivered to a mon server and
its configuration does not include that watch or service, it will use
the default watch/service "default" to deliver the alert. If "default"
is not defined in the mon.cf, the alert will be logged and then discarded.

NOTE: alert/upalert stats are not handled specially for 'default' traps,
so if one unknown alert trap comes in, followed by an unknown upalert
from a different host, then the alert output from mon may be confusing.
Set up a default watch, and use it as a debugging guide to catch stray
traps and remind you to update your mon config file.

watch default
    service default
	period wd {Sun-Sat}
	    alert some.alert
	    upalert some.alert -u

See the mon.8 man page for the list of environment variables available to
monitor and alert programs. One particular environment variable to note is
the MON_TRAP_INTENDED variable. This is a colon (:) separated watch
group / service pair which was the intended recipient when a default watch
group and service were invoked for a trap.  This hopefully gives you
some ability to figure out what to do with a trap caught by "default",
and could be exploited to allow a lazy administrator to send useful
information from alerts ;)
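
An alert script attached to the "default" watch might recover the
intended target as sketched below. The helper name is hypothetical;
the variable name in the comment follows the spelling used in the mon
man page (MON_TRAP_INTENDED).

```python
import os


def split_trap_intended(value):
    """Split a 'group:service' string into a (group, service) tuple."""
    group, sep, service = value.partition(":")
    if not sep:
        raise ValueError("expected group:service, got %r" % value)
    return group, service


def intended_target(environ=os.environ):
    """Return (group, service) for a default-caught trap, or None."""
    # Variable name as spelled in the mon man page.
    value = environ.get("MON_TRAP_INTENDED", "")
    return split_trap_intended(value) if value else None
```

The returned pair can then be included in the alert text so the
administrator knows which watch and service to add to mon.cf.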

There is a (very simple) alert script called "remote.alert" which
delivers a failure detected locally to a remote mon process. This
allows centralization of alert handling, and it allows distributed
mon processes. Pass the mon host name via -H and the port via
-P.

You could use remote.alert to send a trap from one mon server to another
mon server. This can be useful for implementing a hierarchy of mon
servers, where the topmost level serves as the alert management node
for the lower leaf nodes. For example:

mon server "highlevel":

watch pr-internet
    service http_tp
        period wd {Sun-Sat}
            alert mail.alert name@address.com


mon server "lowlevel":

watch pr-internet
    service http_tp
	monitor http_tp.monitor
	interval 5m
	period wd {Sun-Sat}
	    alert remote.alert -H highlevel


When the pr-internet/http_tp service fails on the mon server "lowlevel",
it will send a trap to the mon server "highlevel", which will then send
the email alert.

mon-1.2.0/doc/mon.80000644003616100016640000013701110637737257013662 0ustar  trockijtrockij.\" $Id: mon.8,v 1.6.2.3 2007/06/25 13:10:07 trockij Exp $
.TH mon 8 "$Date: 2007/06/25 13:10:07 $" Linux "Parallel Service Monitoring Daemon"
.SH NAME
mon \- monitor services for availability, sending alarms upon failures.
.SH SYNOPSIS
.B mon
.RB [ \-dfhlMSv ]
.RB [ \-a
.IR dir ]
.RB [ \-A
.IR authfile ]
.RB [ \-b
.IR dir ]
.RB [ \-B
.IR dir ]
.RB [ \-c
.IR config ]
.RB [ \-D
.IR dir ]
.RB [ \-i
.IR secs ]
.RB [ \-k
.IR num ]
.RB [ \-l
.IR [ statetype ] ]
.RB [ \-L
.IR dir ]
.RB [ \-m
.IR num ]
.RB [ \-p
.IR num ]
.RB [ \-P
.IR pidfile ]
.RB [ \-r
.IR delay ]
.RB [ \-s
.IR dir ]
.SH DESCRIPTION
.B mon
is a general-purpose scheduler for monitoring service availability
and triggering alerts upon detecting failures.
.B mon
was designed to be open in the sense that it supports arbitrary
monitoring facilities and alert methods via a common interface, which
are easily implemented through programs (in C, Perl, shell, etc.), 
SNMP traps, and special Mon (UDP packet) traps.

.SH OPTIONS
.TP
.BI \-a\  dir
Path to alert scripts. Default is
.IR /usr/local/lib/mon/alert.d:alert.d .
Multiple alert paths may be specified by separating them with
a colon.  Non-absolute paths are taken to be relative to the
base directory
.RI ( /usr/lib/mon
by default).
.TP
.BI \-b\  dir
Base directory for mon. scriptdir, alertdir, and statedir
are all relative to this directory unless specified from /.
Default is
.IR /usr/lib/mon .
.TP
.BI \-B\  dir
Configuration file base directory. All config files are located here, including
mon.cf, monusers.cf, and auth.cf.
.TP
.BI \-A\  authfile
Authentication configuration file. By default this is
.IR /etc/mon/auth.cf " if the " /etc/mon
directory exists, or
.I /usr/lib/mon/auth.cf
otherwise.
.TP
.BI \-c\  file
Read configuration from
.IR file .
This defaults to
.IR /etc/mon/mon.cf " if the " /etc/mon
directory exists, otherwise to
.IR /etc/mon.cf .
.TP
.BI \-d
Enable debugging mode.
.TP
.BI \-D\  dir
Path to state directory.  Default is the first of
.IR /var/state/mon ", " /var/lib/mon ", and " /usr/lib/mon/state.d
which exists.
.TP
.BI \-f
Fork and run as a daemon process. This is the
preferred way to run
.BR mon .
.TP
.BI \-h
Print help information.
.TP
.BI \-i\  secs
Sleep interval, in seconds. Defaults to 1. This shouldn't need to
be adjusted for any reason.
.TP
.BI \-k\  num
Set log history to a maximum of
.I num
entries. Defaults
to 100.
.TP
.BI \-l\  statetype
Load state from the last saved state file. The 
supported saved state types are 
.B disabled
for disabled watches, services, and hosts, 
.B opstatus
for failure/alert/ack status of 
all services,
and 
.B all 
for both.  If no statetype is provided, 
.B disabled
is assumed.
.TP
.BI \-L\  dir
Sets the log dir. See also
.B logdir
in the configuration file.  The default is
.B /var/log/mon
if that directory exists, otherwise
.BR log.d
in the base directory.
.TP
.B \-M
Pre-process the configuration file with the
macro expansion package
.IR m4 .
.\"
.\"
.\"
.TP
.BI \-m\  num
Set the throttle for the maximum number of processes to
.IR num .
.TP
.BI \-p\  num
Make server listen on port
.IR num .
This defaults to 2583.
.TP
.B \-S
Start with the scheduler stopped.
.TP
.BI \-P\  pidfile
Store the server's pid in
.IR pidfile ,
the default is the first of
.IR /var/run/mon/mon.pid ,
.IR /var/run/mon.pid ,
and
.IR /etc/mon.pid
whose directory exists.  An empty value tells
.B mon
not to use a pid file.
.TP
.BI \-r\  delay
Sets the number of seconds used to randomize the startup delay
before each service is scheduled. Refer to the global
.I randstart
variable in the configuration file.
.TP
.BI \-s\  dir
Path to monitor scripts. Default is
.IR /usr/local/lib/mon/mon.d:mon.d .
Multiple monitor paths may be specified by separating them with
a colon.  Non-absolute paths are taken to be relative to the
base directory
.RI ( /usr/lib/mon
by default).
.TP
.BI \-v
Print version information.

.SH DEFINITIONS
.TP
.BI monitor
A program which tests for a certain condition, returns either true or
false, and optionally produces output to be passed back to the scheduler.
Common monitors detect host reachability via ICMP echo messages, or
connection to TCP services.
.TP
.BI period
A period in time as interpreted by the Time::Period module.
.TP
.BI alert
A program which sends a message when invoked by the scheduler.
The scheduler calls upon an alert when it detects a failure from
a monitor.
An alert program accepts a set of command-line arguments from the
scheduler, in addition to data via standard input.
.TP
.BI hostgroup
A single host or list of hosts, specified as names or IP addresses.
.TP
.BI service
A collection of parameters used to deal with monitoring a particular
resource which is provided by a group. Services are usually modeled after
things such as an SMTP server, ICMP echo capability, server disk space
availability, or SNMP events.
.TP
.BI view
A collection of hostgroups, used to filter mon output for client display.
For example, a 'network-services' view might be defined so your network staff
can see just the hostgroups which matter to them, without having to see
all hostgroups defined in Mon.
.TP
.BI watch
A collection of services which apply to a particular group.
.SH OPERATION
When the
.B mon
scheduler starts, it reads a configuration file to determine the
services it needs to monitor. The configuration file defaults to
.I /etc/mon/mon.cf
if the
.I /etc/mon
directory exists, otherwise to
.IR /etc/mon.cf ,
and can be specified using the
.BI \-c
parameter. If the
.B \-M
option is specified, then the configuration file is pre-processed
with
.IR m4 .
If the configuration file ends with .m4, the file is also processed by
m4 automatically.

The scheduler enters a loop which handles client connections,
monitor invocations, and failure alerts. Each service has a timer,
specified in the configuration file as the
.BI interval
variable, which tells the scheduler how frequently to invoke a
monitor process.
The scheduler may be temporarily stopped. While it is stopped, client
access still functions, but nothing is scheduled. This
is useful when resetting the server, because you can do this:
save the hosts and services which are disabled, reset the server
with the scheduler stopped, re-disable those hosts and services,
then start the scheduler. It also allows making atomic changes
across several client connections.
See the
.B moncmd
man page for more information.

.SH MONITOR\ PROGRAMS
Monitor processes are invoked with the arguments specified in the
configuration file, appended by the hosts from the applicable
host group. For example, if the watch group is "servers", which contains
the hostnames "smtp", "nntp", and "ns", and the monitor line reads
as follows,
.br
\fC
monitor fping.monitor -t 4000 -r 2
\fR
.br
then the executable "fping.monitor" will be executed with these
parameters:
.br
\fC
MONITOR_DIR/fping.monitor -t 4000 -r 2 smtp nntp ns
\fR
.br

MONITOR_DIR is actually a search path, by default
.I /usr/local/lib/mon/mon.d
then
.IR /usr/lib/mon/mon.d ,
but it can be overridden by the
.BI \-s
option or in the configuration file.
If all hosts in the hostgroup have been disabled,
then a warning is sent to syslog and the monitor is
not run. This behavior may be overridden with the
"allow_empty_group" option in the service definition.
If the final argument to the "monitor" line is ";;"
(it must be preceded by whitespace),
then the host list will not be appended to the parameter list.
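.P
The argument handling described above can be sketched as follows.
This is an illustration only: the function name monitor_argv is
hypothetical, and splitting the monitor line on whitespace is a
simplifying assumption about how the server parses it.
.nf
```python
def monitor_argv(monitor_line, hosts):
    """Build the argument list for one monitor invocation."""
    args = monitor_line.split()
    if args and args[-1] == ";;":
        # A trailing ";;" suppresses the host list.
        return args[:-1]
    return args + list(hosts)
```
.fi
.P
For the "servers" example, the sketch yields the monitor name and its
options followed by smtp, nntp, and ns.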

In addition to environment variables defined by
the user in the service definition,
.B mon
passes certain variables to the monitor process.

.TP
.B MON_LAST_SUMMARY
The first line of the output from the last time the monitor exited.
This is not the summary of the current monitor run, but the previous
one.  This may be used by an alert script to provide historical
context in an alert.

.TP
.B MON_LAST_OUTPUT
The entire output of the monitor from the last time it exited.  This
is not the output of the current monitor run, but the previous one.
This may be used by an alert script to provide historical context in
an alert.


.TP
.B MON_LAST_FAILURE
The time(2) of the last failure for this service.

.TP
.B MON_FIRST_FAILURE
The time(2) of the first time this service failed.

.TP
.B MON_LAST_SUCCESS
The time(2) of the last time this service passed.

.TP
.B MON_DESCRIPTION
The description of this service, as defined in the
configuration file using the
.I description
tag.

.TP
.B MON_DEPEND_STATUS
The depend status, "0" if dependency failure, "1" otherwise.

.TP
.B MON_LOGDIR
The directory where log files should be placed,
as indicated by the
.I logdir
global configuration variable.

.TP
.B MON_STATEDIR
The directory where state files should be kept,
as indicated by the
.I statedir
global configuration variable.

.TP
.B MON_CFBASEDIR
The directory where configuration files should be kept,
as indicated by the
.I cfbasedir
global configuration variable.

.P
"fping.monitor" should return an exit status of 0 if it
completed successfully (found no problems), or nonzero if a problem
was detected. The first line of output from the monitor
script has a special meaning: it
is used as a brief summary of the exact failure which was detected, and
is passed to the alert program. All remaining output is also passed
to the alert program, but it has no required interpretation.
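.P
The output contract for a monitor can be sketched as below. The
function name report is hypothetical, and using the names of the
failed hosts as the summary line is an assumed convention, not a
requirement stated here.
.nf
```python
def report(failed_hosts, details):
    """Return (exit_status, lines): summary line first, then detail lines."""
    if not failed_hosts:
        # Exit status 0: no problems were found.
        return 0, []
    summary = " ".join(sorted(failed_hosts))
    return 1, [summary] + list(details)
```
.fi
.P
A monitor built on this sketch would print each returned line to
standard output and then exit with the returned status.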

If a monitor for a particular service is still
running, and the time comes for
.B mon
to run another monitor for that service, it will not
start another monitor. For example, if the
.I interval
is 10s, and the monitor does not finish running
within 10 seconds, then
.B mon
will wait until the first monitor exits before
running another one.

.SH ALERT DECISION LOGIC
Upon a non-zero or zero exit status, the associated alert or upalert
program (respectively) is started,
pending the following conditions: If an alert for a specific
service is disabled, do not send an alert.
If
.B dep_behavior
is set to
.IR "'a'" ,
or
.B alertdepend
is set, and a parent dependency is failing, then suppress the alert.
If the alert has previously been acknowledged, do not send
the alert, unless it is an upalert.
If an alert is not within the specified period, record the failure
via syslog(3) and do not send an alert.
If the failure does not fall within a defined period, do not
send an alert.
No upalerts are sent without corresponding down alerts,
unless
.B no_comp_alerts
is defined in the period section. An upalert will only be sent
if the previous state is a failure.
If an alert was already sent within the last
.B alertevery
interval, do not send another alert,
.I unless
the summary output from the current monitor program differs from the last
monitor process.
Otherwise, send an alert using each alert program
listed for that period. The
.B "observe_detail"
argument to
.B alertevery
affects this behavior by observing the changes in the detail part
of the output in addition to the summary line.
If a monitor has successive failures and the
summary output changes in each of them,
.B alertevery
will not suppress multiple consecutive alerts.
The reasoning is that if the summary output changes, then
a significant event occurred and the user should be alerted.
The "strict" argument to alertevery will suppress both
comparing the output from the previous monitor run to the current
and prevent a successful return value of the monitor from
resetting the alertevery timer. For example, "alertevery 24h strict"
will only send out an alert once every 24 hours, regardless of
whether the monitor output changes, or if the service stops and then
starts failing.
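.P
The alertevery rule alone can be sketched as follows. This is a
simplified illustration, not the server's code: the function name and
parameters are hypothetical, and it covers neither observe_detail nor
the resetting of the timer when a monitor succeeds.
.nf
```python
def alertevery_allows(now, last_alert, interval, summary, last_summary,
                      strict=False):
    """Return True if alertevery permits sending an alert at time `now`.

    Times are in seconds; last_alert is None if no alert was sent yet.
    """
    if last_alert is None or now - last_alert >= interval:
        return True
    if strict:
        # "strict" ignores changes in the monitor output.
        return False
    # Within the interval, only a changed summary line gets through.
    return summary != last_summary
```
.fi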

.SH ALERT\ PROGRAMS
Alert programs are found in the path supplied with the
.BI \-a
parameter, or in the
.I /usr/local/lib/mon/alert.d
and
.I /usr/lib/mon/alert.d
directories if not specified.  They are invoked with the following command-line
parameters:

.TP
.BI \-s\  service
Service tag from the configuration file.
.TP
.BI \-g\  group
Host group name from the configuration file.
.TP
.BI \-h\  hosts
The expanded version of the host group, space delimited, but contained
in one shell "word".
.TP
.BI \-l\  alertevery
The number of seconds until the next alarm will be sent.
.TP
.BI \-O
This option is supplied to an alert only if the
alert is being generated as a result of an expected trap timing out.
.TP
.BI \-t\  time
The time (in
.BR time (2)
format) of when this failure condition
was detected.
.TP
.BI \-T
This option is supplied to an alert only if the alert was triggered by a trap.
.TP
.B \-u
This option is supplied to an alert only if it is being
called as an upalert.

.P
The remaining arguments are supplied from the trailing parameters in
the configuration file, after the "alert" service parameter.

As with monitor programs, alert programs are invoked with environment
variables defined by the user in the service definition, in addition
to the following which are explicitly set by the server:

.TP
.B MON_LAST_SUMMARY
The first line of the output from the last time the
monitor exited.

.TP
.B MON_LAST_OUTPUT
The entire output of the monitor from the last time it
exited.

.TP
.B MON_LAST_FAILURE
The time(2) of the last failure for this service.

.TP
.B MON_FIRST_FAILURE
The time(2) of the first time this service failed.

.TP
.B MON_LAST_SUCCESS
The time(2) of the last time this service passed.

.TP
.B MON_DESCRIPTION
The description of this service, as defined in the
configuration file using the
.I description
tag.

.TP
.B MON_GROUP
The watch group which triggered this alarm.

.TP
.B MON_SERVICE
The service heading which generated this alert.

.TP
.B MON_RETVAL
The exit value of the failed monitor program, or return value
as accepted from a trap.

.TP
.B MON_OPSTATUS
The operational status of the service.

.TP
.B MON_ALERTTYPE
Has one of the following values: "failure", "up", "startup",
"trap", or "traptimeout", and signifies the type of alert which
was triggered.

.TP
.B MON_TRAP_INTENDED
This is only set when an unknown mon trap is received and caught
by the default/default watch/service. This contains colon
separated entries of the trap's intended watch group and service name.

.TP
.B MON_LOGDIR
The directory where log files should be placed,
as indicated by the
.I logdir
global configuration variable.

.TP
.B MON_STATEDIR
The directory where state files should be kept,
as indicated by the
.I statedir
global configuration variable.

.TP
.B MON_CFBASEDIR
The directory where configuration files should be kept,
as indicated by the
.I cfbasedir
global configuration variable.

.P
The first line from standard input must be used as a brief summary
of the problem, normally supplied as the subject line of an email, or
text sent to an alphanumeric pager. Interpretation of all subsequent
lines read from stdin is left up to the alerting program. The usual
parameters are a list of recipients to deliver the notification to.
The interpretation of the recipients is not specified, and is up
to the alert program.
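
As a minimal sketch of these conventions (not code from the mon
distribution), an alert program written in Python might build its
message as follows. The function name build_message and the message
layout are assumptions made for illustration, and parsing of the
options that mon passes before the trailing parameters is omitted.

.RS
.nf
```python
def build_message(stdin_lines, environ, recipients):
    """Return (subject, body) for one alert invocation."""
    # The first line of standard input is the brief summary.
    summary = stdin_lines[0].strip() if stdin_lines else "(no summary)"
    group = environ.get("MON_GROUP", "unknown")
    service = environ.get("MON_SERVICE", "unknown")
    alerttype = environ.get("MON_ALERTTYPE", "failure")
    subject = "{0} {1}/{2}: {3}".format(alerttype, group, service, summary)
    # Interpretation of the remaining lines is up to the alert program;
    # this sketch simply passes them through as the message body.
    detail = [line.rstrip() for line in stdin_lines[1:]]
    body = ["To: " + ", ".join(recipients)] + detail
    return subject, body
```
.fi
.RE

A wrapper script would call build_message with the lines read from
standard input, the process environment, and the recipient arguments.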

.SH CONFIGURATION FILE
The configuration file consists of zero or more global variable definitions,
zero or more hostgroup definitions,
and one or more watch definitions. Each watch definition may have one
or more service definitions. A watch definition is terminated by a blank
line, another definition, or the end of the file. A line beginning with optional
leading whitespace and a pound ("#") is
regarded as a comment, and is ignored.

Lines are parsed as they are read. Long lines may be continued by ending
them with a backslash ("\\").  If a line is continued, then the backslash,
the trailing whitespace after the backslash, and the leading whitespace
of the following line are removed. The end result is assembled into a
single line.
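
The continuation rule can be sketched in Python as below. This
illustrates the rule as described here and is not mon's own parser;
the name join_continued is hypothetical, and input lines are assumed
to have had their newlines removed already.

.RS
.nf
```python
BACKSLASH = chr(92)  # the continuation character


def join_continued(lines):
    """Assemble continued lines into single logical lines."""
    result = []
    pending = None  # text accumulated from continued lines so far
    for raw in lines:
        # Leading whitespace is dropped only when continuing a line.
        text = raw if pending is None else pending + raw.lstrip()
        stripped = text.rstrip()
        if stripped.endswith(BACKSLASH):
            # Remove the backslash and any whitespace after it.
            pending = stripped[:-1]
        else:
            result.append(text)
            pending = None
    if pending is not None:
        result.append(pending)  # input ended in mid-continuation
    return result
```
.fi
.RE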

Typically the configuration file has the following layout:

1. Global variable definitions

2. Hostgroup definitions

3. Watch definitions

See the "etc/example.cf" file which comes with the distribution for an example.

.SS "Global Variables"
The following variables may be set to override compiled-in
defaults. Command-line options will have a higher precedence than
these definitions.

.TP
.BI "alertdir = " dir
.I dir
is the full path to the alert scripts. This is the value set by
the
.B \-a
command-line parameter.

Multiple alert paths may be specified by separating them with
a colon.  Non-absolute paths are taken to be relative to the
base directory
.RI ( /usr/lib/mon
by default).

When the configuration file is read, all alerts referenced from the
configuration will be looked up in each of these paths, and the full
path to the first instance of the alert found is stored in a hash. This
hash is only generated upon startup or after a "reset" command, so newly
added alert scripts will not be recognized until a "reset" is performed.
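
The lookup described above might be sketched in Python as follows.
This is an illustration under stated assumptions, not the server's
code: the name find_alert is hypothetical, and testing candidates
with a plain file check is an assumption about what counts as
"found".

.RS
.nf
```python
import os


def find_alert(name, alertdir, basedir="/usr/lib/mon",
               is_file=os.path.isfile):
    """Return the full path of the first match for name, or None."""
    for entry in alertdir.split(":"):
        # Non-absolute entries are taken relative to the base directory.
        if os.path.isabs(entry):
            directory = entry
        else:
            directory = os.path.join(basedir, entry)
        candidate = os.path.join(directory, name)
        if is_file(candidate):
            return candidate
    return None
```
.fi
.RE

Because the result is cached in a hash at startup, a lookup like this
would run once per referenced alert rather than on every invocation.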

.TP
.BI "mondir = " dir
.I dir
is the full path to the monitor scripts. This value may also be
set by the
.B \-s
command-line parameter. If this path does not begin with a "/", it will be
relative to
.IR basedir .

Multiple monitor paths may be specified by separating them with
a colon. All paths must be absolute.

When the configuration file is read, all monitors referenced from the
configuration will be looked up in each of these paths, and the
full path to the first
instance of the monitor found is stored in a hash. This hash is only
generated upon startup or after a "reset" command, so newly added monitor
scripts will not be recognized until a "reset" is performed.

.TP
.BI "statedir = " dir
.I dir
is the full path to the state directory.
.B mon
uses this directory to save various state information. If this path does not begin with a "/", it will be
relative to
.IR basedir .

.TP
.BI "logdir = " dir
.I dir
is the full path to the log directory.
.B mon
uses this directory to save various logs, including
the downtime log. If this path does not begin with a "/", it will be
relative to
.IR basedir .

.TP
.BI "basedir = " dir
.I dir
is the full path for the state, log, monitor, and alert directories.

.TP
.BI "cfbasedir = " dir
.I dir
is the full path where all the config files can be found
(monusers.cf, auth.cf, etc.).

.TP
.BI "authfile = " file
.I file
is the path to the authentication file. If the path does not begin
with a "/", it will be relative to
.IR cfbasedir .

.TP
.BI "authtype = " "type [type...]"
.I type
is the type of authentication to use. A space-separated list of
types may be specified, and they will be checked in the order they are
listed. As soon as a successful authentication is performed, the user
is considered authenticated by mon for the duration of the session and
no more authentication checks are performed.

If
.I type
is
.BR getpwnam ,
then the standard Unix passwd file authentication method will be used
(calls getpwnam(3) on the user and compares the crypt(3)ed version
of the password with what it gets from getpwnam). This will not work
if shadow passwords are enabled on the system.

If
.I type
is
.BR userfile ,
then usernames and hashed passwords are read from
.IR userfile ,
which is defined via the
.B userfile
configuration variable.

If
.I type
is
.BR pam ,
then PAM (pluggable authentication modules) will be used for authentication.
The service specified by the
.B pamservice
global will be used. If no global is given, the PAM
.B passwd
service will be used.

If
.I type
is
.BR trustlocal ,
then if the client connection comes from localhost, the username passed from
the client will be trusted, and the password will be ignored.  This can be used
when you want the client to handle the authentication for you, for example a
CGI script using one of the many Apache authentication methods.

.TP
.BI "userfile = " file
This file is used when
.B authtype
is set to
.IR userfile .
It consists of a sequence of lines of the format
.BR "'username : password'" .
.B password
is stored as the hash returned by the standard Unix
crypt(3) function. 
.B NOTE:
the format of this file is compatible with the Apache file based
username/password file format. It is possible to use the
.I htpasswd
program supplied with Apache to manage the mon userfile.

Blank lines and lines beginning with # are ignored.

.TP
.BI "pamservice = " service
The PAM service used for authentication. This is applicable
only if "pam" is specified as a parameter to the
.B authtype
setting. If this global is not defined, it defaults
to
.BR "passwd" .

.TP
.BI "serverbind = " addr

.TP
.BI "trapbind = " addr

.B serverbind
and
.B trapbind
specify which address to bind the server and trap ports to, respectively.
If these are not defined, the default address is INADDR_ANY, which
allows connections on all interfaces. For security reasons,
it could be a good idea to bind only to the loopback interface.

.TP
.BI "dtlogfile = " file
.I file
is a file which will be used to record the downtime log. Whenever
a service fails for some amount of time and then stops failing, this
event is written to the log. If this parameter is not set, no
logging is done. The format of the file is as follows (# is a
comment and may be ignored):

.BR "timenoticed group service firstfail downtime interval summary".

.B timenoticed
is the time(2) the service came back up.

.B "group service"
is the group and service which failed.

.B "firstfail"
is the time(2) when the service began to fail.

.B "downtime"
is the number of seconds the service failed.

.B "interval"
is the frequency (in seconds) that the service is polled.

.B "summary"
is the summary line from when the service was failing.
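
A reader for this format might look like the following Python
sketch. The names DowntimeEntry and parse_dtlog_line are
hypothetical, and treating the four numeric fields as floating-point
numbers is an assumption; the field order is taken from the
description above.

.RS
.nf
```python
from collections import namedtuple

DowntimeEntry = namedtuple(
    "DowntimeEntry",
    "timenoticed group service firstfail downtime interval summary",
)


def parse_dtlog_line(line):
    """Parse one downtime log line; return None for comments and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # The summary is free text, so split off only the first six fields.
    fields = line.split(None, 6)
    if len(fields) < 6:
        raise ValueError("malformed downtime log line")
    summary = fields[6] if len(fields) > 6 else ""
    return DowntimeEntry(
        float(fields[0]), fields[1], fields[2],
        float(fields[3]), float(fields[4]), float(fields[5]), summary,
    )
```
.fi
.RE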

.TP
.BI "monerrfile = " filename
By default, when mon daemonizes itself, it connects
stdout and stderr to /dev/null. If
.B monerrfile
is set to a file, then stdout and stderr will be
appended to that file. In all cases stdin is connected
to /dev/null. If mon is told to run in the foreground
and to not daemonize, then none of this applies, since
stdin/stdout/stderr stay connected to whatever they
were at the time of invocation.

.TP
.BI "dtlogging = " yes/no

Turns downtime logging on or off. The default is off.

.TP
.BI "histlength = " num
.I num
is the maximum number of events to be retained
in the history list. The default is 100.
This value may also be set by the
.B \-k
command-line parameter.

.TP
.BI "historicfile = " file
If this variable is set, then alerts are logged to
.IR file ,
and upon startup, some (or all) of the past history is read
into memory.

.TP
.BI "historictime = " timeval
.I timeval
is the amount of the history file to read upon startup.
Entries logged between "now" minus
.I timeval
and "now" are read. See the explanation of
.I interval
in the "Service Definitions" section
for a description of
.IR timeval .

.TP
.BI "serverport = " port
.I port
is the TCP port number that the server should bind to. This value may also be
set by the
.B \-p
command-line parameter. Normally this port is looked up via getservbyname(3),
and it defaults to 2583.

.TP
.BI "trapport = " port
.I port
is the UDP port number that the trap server should bind to.
Normally this port is looked up via getservbyname(3),
and it defaults to 2583.

.TP
.BI "pidfile = " path
.I path
is the file the server will store its pid in.  This value may also be set
by the
.B \-P
command-line parameter.

.TP
.BI "maxprocs = " num
Throttles the number of concurrently forked processes to
.IR num .
The intent is to provide a safety net for the unlikely situation
when the server tries to take on too many tasks at once.  Note that this
situation has only been reported to happen when trying to use a garbled
configuration file! You don't want to use a garbled configuration
file now, do you?

.TP
.BI "cltimeout = " secs
Sets the client inactivity timeout to
.IR secs .
This is meant to help thwart denial of service attacks or
recover from crashed clients.
.I secs
is interpreted as a "1h/1m/1s" string, where
"1m" = 60 seconds.

.TP
.BI "randstart = " interval
When the server starts, normally no service is scheduled
until the interval defined in its service section has elapsed.
This can cause long delays before the first check of a service,
and possibly a high load on the server if multiple things are scheduled
at the same intervals.
This option is used to randomize the scheduling
of the first test for all services during the startup period, and
immediately after the
.I reset
command.
If
.I randstart
is defined, the scheduled run time of all services of all watch groups
will be a random number between zero and
.I randstart
seconds.

.TP
.BI "dep_recur_limit = " depth
Limit dependency recursion level to
.IR depth .
If dependency recursion (dependencies which depend on other dependencies)
tries to go beyond
.IR depth ,
then the recursion is aborted and a message is logged to syslog.
The default limit is 10.

.TP
.BI "dep_behavior = " {a|m|hm}
.B dep_behavior
controls whether the dependency expression
suppresses one of: the running of alerts, the running of 
monitors, or the passing of individual hosts to the monitors.
Read more about the behavior in the "Service Definitions" 
section below.

This is a global setting which controls the default
settings for the service-specified variable.

.TP
.BI "dep_memory = " timeval
If set, dep_memory will cause dependencies to continue to prevent
alerts/monitoring for a period of time after the service returns to a
normal state.  This can be used to prevent over-eager alerting when a
machine is rebooting, for example.  See the explanation of
.I interval
in the "Service Definitions" section
for a description of
.IR timeval .

This is a global setting which controls the default
settings for the service-specified variable.

.TP
.BI "syslog_facility = " facility
Specifies the syslog facility used for logging.
.B daemon
is the default.



.TP
.BI "startupalerts_on_reset = " {yes|no}

If set to "yes", startupalerts will be invoked when the
.B reset
client command is executed. The default is "no".

.TP
.BI "monremote = " program

If set, this external program will be called by Mon when various
client requests are processed.  This can be used to propagate those
changes from one Mon server to another, if you have multiple
monitoring machines.  An example script,
.BR monremote.pl ,
is available in the clients directory.

.SS "Hostgroup Entries"

Hostgroup entries begin with the keyword
.BR hostgroup ,
and are followed by a hostgroup tag and one or more hostnames
or IP addresses, separated by whitespace. The hostgroup tag must
be composed of alphanumeric
characters, a dash ("-"), a period ("."),
or an underscore ("_"). Non-blank lines following
the first hostgroup line are interpreted as more hostnames.
The hostgroup definition ends with a blank line. For example:

.RS
.nf
hostgroup servers nameserver smtpserver nntpserver
	nfsserver httpserver smbserver

hostgroup router_group cisco7000 agsplus
.fi
.RE

.SS "View Entries"
View entries begin with the keyword
.BR view , 
and are followed by a view tag and the names of one or more
hostgroups.  The view tag must be composed of alphanumeric
characters, a dash ("-"), a period ("."),
or an underscore ("_"). Non-blank lines following
the first view line are interpreted as more hostgroup names.
The view definition ends with a blank line. For example:

.RS
.nf
view servers dns-servers web-servers file-servers
     mail-servers

view network-services routers switches vpn-servers
.fi
.RE


.SS "Watch Group Entries"

Watch entries begin with a line that starts
with the keyword
.BR watch ,
followed by whitespace and a single word which
normally refers
to a pre-defined hostgroup. If the second word is not recognized
as a hostgroup tag, a new hostgroup is created whose tag is
that word, and that word is its only member.

Watch entries consist of one or more service definitions.

A watch group is terminated by a blank line, the end of the file, or by a
subsequent definition, "watch", "hostgroup", or otherwise.

There may be a special watch group entry called "default". If a
default watch group is defined with a service entry named "default",
then this definition will be used in handling traps received for
an unrecognized watch and service.

.SS "Service Definitions"

.TP
.BI service " servicename"
A service definition begins with the keyword
.B service
followed by a word which is the tag for this service.
This word must be unique among all services defined for the
same watch group.

The components of a service are an interval, monitor, and
one or more time period definitions, as defined below.

If a service name of "default" is defined within a watch
group called "default" (see above), then the default/default
definition will be used for handling unknown mon traps.

The following configuration parameters are valid only following
a service definition:

.TP
.BI VARIABLE= "value"
Environment variables may be defined for each service, which will be
included in the environment of monitors and alerts. Variables must
be specified in all capital letters, must begin with an alphabetical
character or an underscore, and there must be no spaces to the left
of the equal sign.

.TP
.BI interval " timeval"
The keyword
.B interval
followed by a time value specifies the frequency that
a monitor script will be triggered.
Time values are defined as "30s", "5m", "1h", or "1d",
meaning 30 seconds, 5 minutes, 1 hour, or 1 day. The numeric portion
may be a fraction, such as "1.5h" for an hour and a half. This
format of a time specification will be referred to as
.IR timeval .
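
The conversion from a
.I timeval
to seconds can be sketched in Python as follows. The name
timeval_to_seconds is hypothetical, and requiring a unit suffix is an
assumption, since the handling of a bare number is not described
here.

.RS
.nf
```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def timeval_to_seconds(timeval):
    """Convert a timeval such as "30s" or "1.5h" to seconds."""
    number, unit = timeval[:-1], timeval[-1:]
    if unit not in UNIT_SECONDS or not number:
        raise ValueError("bad timeval: " + repr(timeval))
    # The numeric portion may be a fraction, so parse it as a float.
    return float(number) * UNIT_SECONDS[unit]
```
.fi
.RE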

.TP
.BI failure_interval " timeval"
Adjusts the polling interval to
.I timeval
when the service check is failing. Resets the interval
to the original when the service succeeds.

.TP
.BI traptimeout " timeval"
This keyword takes the same time specification argument as
.BR interval ,
and makes the service expect a trap from an external source
at least that often, else a failure will be registered. This is
used for a heartbeat-style service.

.TP
.BI trapduration " timeval"
If a trap is received, the status of the service the trap was delivered
to will normally remain constant. If
.B trapduration
is specified, the status of the service will remain in a failure
state for the duration specified by
.IR timeval ,
and then it will be reset to "success".

.TP
.BI randskew " timeval"
Rather than schedule the monitor script to run at the start of each
interval, randomly adjust the interval specified by the
.B interval
parameter by plus-or-minus
.BR randskew .
The skew value is specified in the same format as the
.B interval
parameter: "30s", "5m", etc...
For example if
.B "interval"
is 1m, and
.B "randskew"
is "5s", then
.I mon
will schedule the monitor script some time between every
55 seconds and 65 seconds.
The intent is to help distribute the load on the server when
many services are scheduled at the same intervals.
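
The effect of
.B randskew
on scheduling can be illustrated with a short Python sketch. The name
next_run_delay is hypothetical, and drawing the skew from a uniform
distribution is an assumption; the text above only states the range.

.RS
.nf
```python
import random


def next_run_delay(interval, randskew=0, rng=random):
    """Seconds until the next monitor run, skewed by up to randskew."""
    if randskew <= 0:
        return interval
    # Adjust the interval by plus-or-minus randskew seconds.
    return interval + rng.uniform(-randskew, randskew)
```
.fi
.RE

With an interval of 60 seconds and a skew of 5 seconds, every value
returned lies between 55 and 65 seconds.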

.TP
.BI monitor " monitor-name [arg...]"
The keyword
.B monitor
followed by a script name and arguments
specifies the monitor to run when the timer
expires. Shell-like quoting conventions are
followed when specifying the arguments to send
to the monitor script.
The script is invoked from the directory
given with the
.B \-s
argument, and all following words are supplied
as arguments to the monitor program, followed by the
list of hosts in the group referred to by the current watch group.
If the monitor line ends with ";;" as a separate word,
the host groups are not appended to the argument list
when the program is invoked.

.TP
.B allow_empty_group
The
.B allow_empty_group
option will allow a monitor to be invoked even when the
hostgroup for that watch is empty because of
disabled hosts. The default behavior is not
to invoke the monitor when all hosts in a hostgroup
have been disabled.

.TP
.BI description " descriptiontext"
The text following
.B description
is queried by client programs, passed to alerts and monitors via an
environment variable. It should contain a brief description of the
service, suitable for inclusion in an email or on a web page.

.TP
.BI exclude_hosts " host [host...]"
Any hosts listed after
.B exclude_hosts
will be excluded from the service check.

.TP
.BI exclude_period " periodspec"
Do not run a scheduled monitor during the time
identified by
.IR periodspec .

.TP
.BI depend " dependexpression"
The
.B depend
keyword is used to specify a dependency expression, which
evaluates to either true or false, in the boolean sense.
Dependencies are actual Perl expressions, and must obey all syntactical
rules. The expressions are evaluated in their own package space so as
to not accidentally have some unwanted side-effect.
If a syntax error is found when evaluating the expression, it
is logged via syslog.

Before evaluation, the following substitutions on the expression occur:
phrases which look like "group:service" are substituted with the value
of the current operational status of that specified service. These
opstatus substitutions are computed recursively, so if service A
depends upon service B, and service B depends upon service C, then
service A depends upon service C. Successful operational statuses (which
evaluate to "1") are "STAT_OK", "STAT_COLDSTART", "STAT_WARMSTART", and
"STAT_UNKNOWN".  The word "SELF" (in all caps) can be used for the group
(e.g. "SELF:service"), and is an abbreviation for the current watch group.
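
The substitution step alone can be sketched in Python as below. This
is an illustration and not the server's implementation: the name
substitute_opstatus is hypothetical, recursive evaluation of the
dependencies of each referenced service is left out, and treating a
service with no recorded status as STAT_UNTESTED (a failure) is an
assumption. The resulting string would still have to be evaluated as
a Perl expression.

.RS
.nf
```python
import re

SUCCESS = {"STAT_OK", "STAT_COLDSTART", "STAT_WARMSTART", "STAT_UNKNOWN"}
NAME = "[A-Za-z0-9_.-]+"
PHRASE = re.compile("(" + NAME + "):(" + NAME + ")")


def substitute_opstatus(expression, opstatus, current_group):
    """Replace each group:service phrase with "1" or "0"."""
    def replace(match):
        group, service = match.group(1), match.group(2)
        if group == "SELF":
            group = current_group  # SELF abbreviates the current watch
        status = opstatus.get((group, service), "STAT_UNTESTED")
        return "1" if status in SUCCESS else "0"
    return PHRASE.sub(replace, expression)
```
.fi
.RE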

This feature can be used to control alerts for services which are
dependent on other services, e.g. an SMTP test which is dependent upon
the machine being ping-reachable.

.TP
.BI dep_behavior " {a|m|hm}"
The evaluation of the dependency graphs specified via the
.B depend
keyword
can control the
suppression of alert or monitor invocations, or the suppression
of individual hosts passed to the monitor.

.BR "Alert suppression" .
If this option is set to "a",
then the dependency expression
will be evaluated after the
monitor for the service exits or
after a trap is received.
An alert will only be sent
if the evaluation succeeds, meaning
that none of the nodes in the dependency
graph indicate failure.

.BR "Monitor suppression" .
If it is set to "m",
then the dependency expression will be evaluated
before the monitor for the service is about to run.
If the evaluation succeeds, then the monitor
will be run. Otherwise, the monitor will not
be run and the status of the service will remain
the same.

.BR "Host suppression" .
If it is set to "hm" then Mon will extract the list of "parent"
services from the dependency expression.  (In fact the expression can
be just a list of services.) Then when the monitor for the service is
about to be run, for each host in the current hostgroup Mon will
search all the parent services which are currently failing and look
for the hostname in the current summary output.  If the hostname is
found, this host will be excluded from this run of the monitor.  This
can be used to e.g. allow an SMTP test on a group of hosts to still be run
even when a single host is not ping-reachable.  If all the rest of the
hosts are working fine, the service will be in an OK state, but if
another host fails the SMTP test Mon can still alert about that host
even though the parent dependency was failing.  The dependency
expression will
.B not
be used recursively in this case.

.TP
.BI alertdepend " dependexpression"
.TP
.BI monitordepend " dependexpression"
.TP
.BI hostdepend " dependexpression"
These keywords allow you to specify multiple dependency expressions of 
different types.  Each one corresponds to the different 
.B dep_behavior
settings listed above.  They will be evaluated independently in the different
contexts as listed above.  If
.B depend
is present, it takes precedence over the matching keyword, depending on the
.B dep_behavior
setting.

.TP
.BI "dep_memory " timeval
If set, dep_memory will cause dependencies to continue to prevent
alerts/monitoring for a period of time after the service returns to a
normal state.  This can be used to prevent over-eager alerting when a
machine is rebooting, for example.  See the explanation of
.I interval
in the "Service Definitions" section
for a description of
.IR timeval .

.TP
.BI redistribute " alert [arg...]"
A service may have one redistribute option, which is a special form of
an alert definition.  This alert will be called on every service status
update, even sequential success status updates.  This can be used to
integrate Mon with another monitoring system, or to link together multiple
Mon servers via an alert script that generates Mon traps.  See the "ALERT
PROGRAMS" section above for a list of the parameters mon will pass
automatically to alert programs.

.TP
.BI unack_summary
Remove the "acknowledged" state from a service if the summary component of the
failure message changes.  In most common usage the summary is the list
of hosts that are failing, so additional hosts failing would remove an
ack.


.SS "Period Definitions"

Periods are used to define the conditions which
should allow alerts
to be delivered.

.TP
.BI period " [label:] periodspec"
A period groups one or more alarms and variables
which control how often an alert happens when there
is a failure.
The
.B period
definition has two forms. The first
takes an argument which is a
period specification from Patrick Ryan's
Time::Period Perl 5 module. Refer to
"perldoc Time::Period" for more information.

The second form requires a label followed by a period specification, as
defined above. The label is a tag consisting of an alphabetic character
or underscore followed by zero or more alphanumerics or underscores
and ending with a colon. This
form allows multiple periods with the same period definition. One use
is to have a period definition which has no
.B alertafter
or
.B alertevery
parameters for a particular time period, and another
for the same time period with a different
set of alerts that does contain those
parameters.

Period definitions, in either the first or second form, must be unique within
each service definition. For example, if you need to define two
periods both for "wd {Sun-Sat}", then one or both of the period definitions
must specify a label such as "period t1: wd {Sun-Sat}" and
"period t2: wd {Sun-Sat}".

.TP
.BI alertevery " timeval [observe_detail | strict]"
The
.B alertevery
keyword (within a
.B period
definition) takes the same type of argument as the
.B interval
variable, and limits the number of times an alert
is sent when the service continues to fail.
For example, if the argument is "1h", then
the alerts in the period section will only
be triggered once every hour. If the
.B alertevery
keyword is
omitted in a period entry, an alert will be sent
out every time a failure is detected. By default,
if the summary output of two successive failures changes,
then the alertevery interval is overridden, and an alert
will be sent.
If the string
"observe_detail" is the last argument, then both the summary
and detail output lines will be considered when comparing the
output of successive failures.
If the string "strict" is the last argument, then the output
of the monitor or the state change of the service will have
no effect on when alerts are sent. That is, "alertevery 24h strict"
will send only one alert every 24 hours, no matter what.
Please refer to the
.B "ALERT DECISION LOGIC"
section for a detailed explanation of how alerts are suppressed.

.TP
.BI alertafter " num"

.TP
.BI alertafter " num timeval"

.TP
.BI alertafter " timeval"
The
.B alertafter
keyword (within a
.B period
section) has three forms: only with the "num"
argument, or with the "num timeval" arguments,
or only with the "timeval" argument.
In the first form, an alert will only be invoked
after "num" consecutive failures.

In the second form,
the arguments are a positive integer followed by an interval,
as described by the
.B interval
variable above.
If these parameters are specified,
then the alerts for that period will only
be called after that many failures happen
within that interval. For example,
if
.B alertafter
is given the arguments "3\ 30m", then the alert will be called
if 3 failures happen within 30 minutes.

In the third form,
the argument is an interval,
as described by the
.B interval
variable above.
Alerts for that period
will only be called if the service has been
in a failure state for more than the length
of time described by the interval, regardless
of the number of failures noticed within that
interval.
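
The three forms can be compared side by side in the following Python
sketch. It is a simplified model, not the server's decision logic:
the name alertafter_allows is hypothetical, failure times are assumed
to be those recorded since the last success, and letting the alert
fire on the failure that reaches the count is an assumption.

.RS
.nf
```python
def alertafter_allows(now, failure_times, num=None, window=None):
    """Decide whether alertafter permits an alert at time now.

    failure_times lists the times of the failures seen since the
    last success, oldest first, including the newest one.
    """
    if not failure_times:
        return False
    if num is not None and window is None:
        # First form: at least num consecutive failures.
        return len(failure_times) >= num
    if num is not None:
        # Second form: at least num failures within the window.
        recent = [t for t in failure_times if now - t <= window]
        return len(recent) >= num
    if window is not None:
        # Third form: failing for longer than the window.
        return now - failure_times[0] > window
    return True  # alertafter is not set
```
.fi
.RE

Here window is the interval already converted to seconds.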

.TP
.BI numalerts " num"

This variable tells the server to call no more than
.I num
alerts during a
failure. The alert counter is kept on a per-period basis,
and is reset upon each success.

.TP
.B "no_comp_alerts"

If this option is specified, then upalerts will be called whenever the
service state changes from failure to success, rather than only after
a corresponding "down" alert.

.TP
.BI alert " alert [arg...]"
A period may contain multiple alerts, which are triggered
upon failure of the service. An alert is specified with
the
.B alert
keyword, followed by an optional
.B exit
parameter, and arguments which are interpreted the same as
the
.B monitor
definition, but without the ";;" exception. The
.B exit
parameter takes the form of 
.B "exit=x"
or
.B "exit=x-y"
and has the effect that the alert is only called if the
exit status of the monitor script falls within the range
of the
.B exit
parameter. If, for example, the alert line is
.I "alert exit=10-20 mail.alert mis"
then
.I mail.alert
will only be invoked with
.I mis
as its arguments if the monitor
program's exit value is between 10 and 20. This feature
allows you to trigger different alerts at different
severity levels (like when free disk space goes from 8% to 3%).
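
The range test implied by the
.B exit
parameter can be sketched in Python as follows. The name exit_matches
is hypothetical, and treating both ends of the range as inclusive is
an assumption based on the "between 10 and 20" wording above.

.RS
.nf
```python
def exit_matches(spec, retval):
    """Return True if retval falls within "exit=x" or "exit=x-y"."""
    prefix = "exit="
    if not spec.startswith(prefix):
        raise ValueError("not an exit parameter: " + repr(spec))
    low, dash, high = spec[len(prefix):].partition("-")
    if not dash:
        high = low  # single value form: "exit=x"
    return int(low) <= retval <= int(high)
```
.fi
.RE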

See the
.B "ALERT PROGRAMS"
section above for a list of the parameters mon will pass
automatically to alert programs.

.TP
.BI upalert " alert [arg...]"
An
.B upalert
is the complement of an
.BR alert .
An upalert is called when a service makes the state transition from
failure to success, if a corresponding "down" alert
was previously sent. The
.B upalert
script is called supplying
the same parameters as the
.B alert
script, with the addition of the
.B \-u
parameter which is simply used to let
an alert script know that it is being called
as an upalert. Multiple upalerts may be
specified for each period definition.
Set the per-period
.B no_comp_alerts
option to 
send an upalert regardless of whether
a "down" alert was sent.

.TP
.BI startupalert " alert [arg...]"
A
.B startupalert
is only called when the
.B mon
server starts execution, or when a "reset"
command was issued to the server, depending on
the setting of the
.B startupalerts_on_reset
global.
Unlike other alerts,
.B startupalerts
are not called following the
exit of a monitor, i.e. they are
called in their own right, therefore the
"exit=" argument is not applicable to
.BR startupalert .

.TP
.BI upalertafter " timeval"
The
.B upalertafter
parameter is specified as a string that
follows the syntax of the
.B interval
parameter ("30s", "1m", etc.), and
controls the triggering of an
.BR upalert .
If a service comes back up after
being down for a time greater than
or equal to the value of this option, an
.B upalert
will be called. Use this option to prevent
upalerts from being called because of "blips" (brief outages).

.SH "AUTHENTICATION CONFIGURATION FILE"
The file specified by the
.B authfile
variable in the configuration file (or
passed via the
.B "-A"
parameter) will be loaded upon startup.
This file defines restrictions upon which client
commands may be executed by which users. It is a
text file which consists of comments,
command definitions, and trap authentication parameters.
A comment line begins with optional
whitespace followed by a pound sign. Blank lines are ignored.

The file is separated into a command section and a trap
section. Sections are specified by a single line containing
one of the following statements:

.RS
.nf
	command section
.fi
.RE

or

.RS
.nf
	trap section
.fi
.RE

Lines following one of the above statements apply to that section until
either the end of the file or another section begins.

A command definition consists of a command, followed by a colon,
followed by a comma-separated list of users who may execute the command.
The default is that no users may execute any commands unless they are
explicitly allowed in this configuration file. For clarity, a user can
be denied by prefixing the user name with "!". If the word "AUTH_ANY"
is used for a username, then any authenticated user will be allowed to
execute the command. If the word "all" is used for a username, then
that command may be executed by any user, authenticated or not.

The trap section allows configuration of which users may send traps from
which hosts. The syntax is a source host (name or ip address), whitespace,
a username, whitespace, and a plaintext password for that user. If
the source host is "*", then allow traps from any host. If the username
is "*", then accept traps without regard for the username or password. If
no hosts or users are specified, then no traps will be accepted.

An example configuration file:

.RS
.nf
command section
list:		all
reset:		root,admin
loadstate:      	root
savestate:      	root

trap section
127.0.0.1	root	r@@tp4sswrd
.fi
.RE

This means that all clients are able to perform the
.B list
command, "root" is able to perform "reset", "loadstate", and "savestate",
and "admin" is able to execute the "reset"
command.
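
The command rules can be modelled with the Python sketch below. It
illustrates the rules as described in this section and is not the
server's parser: the names parse_command_section and may_execute are
hypothetical, and letting an explicit "!" denial take precedence over
"all" and "AUTH_ANY" is an assumption.

.RS
.nf
```python
def parse_command_section(lines):
    """Map each command to its list of user entries."""
    rules = {}
    in_commands = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if line in ("command section", "trap section"):
            in_commands = line == "command section"
            continue
        if in_commands and ":" in line:
            command, users = line.split(":", 1)
            names = [u.strip() for u in users.split(",")]
            rules[command.strip()] = [u for u in names if u]
    return rules


def may_execute(rules, command, user, authenticated):
    """Apply the default-deny rule to one command request."""
    entries = rules.get(command, [])
    if user is not None and "!" + user in entries:
        return False
    if "all" in entries:
        return True  # any user, authenticated or not
    if not authenticated:
        return False
    return "AUTH_ANY" in entries or user in entries
```
.fi
.RE

With the example file above, may_execute allows "list" for an
unauthenticated client and "reset" for an authenticated "admin", and
refuses any command that has no entry.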

.SH CLIENT\-SERVER\ INTERFACE
The server listens on TCP port 2583, which may be overridden using
the
.BI \-p\  port
option. Commands are a single line each, terminated by a newline.
The server can handle any number of simultaneous client connections.

.SH CLIENT\ INTERFACE\ COMMANDS

See manual page for
.BR moncmd .

.SH MON\ TRAPPING
Mon has the facility to receive special "mon traps" from any local
or remote machine. Currently, the only available method for
sending mon traps is through the Mon::Client Perl interface,
though the UDP packet format is defined well enough to permit
the writing of traps in other languages.

Traps are handled similarly to monitors: a trap sends
an operational status, summary line, and description
text, and mon generates an alert or
upalert as necessary.

Traps can be caught by any watch/service group set up in
the mon configuration file, however it is suggested that
you configure watch/service groups specifically for
the traps you expect to receive. When defining a special
watch/service group for traps, do not include a "monitor"
directive (as no monitor need be invoked). Since a monitor
is not being invoked, it is not necessary for the watch
definition to have a hostgroup which contains real host names.
Just make up a useful name, and mon will automatically create
the watch group for you.

Here is a simple config file example:

.RS
.nf
watch trap-service
	service host1-disks
		description TRAP: for host1 disk status
		period wd {Sun-Sat}
			alert mail.alert someone@your.org
			upalert mail.alert -u someone@your.org

.fi
.RE

Since mon listens on a UDP port for any trap, a
default facility is available for handling traps to unknown
groups or services.
To enable this facility, you must include a "default" watch
group with a "default" service entry containing the specifics
of alarms.  If a default/default watch group and service are
not configured, then unknown traps get logged via syslog, and
no alarm is sent.
.B NOTE:
The default/default facility is a single entity as far as
accounting and alarming go. Alarm programs which are not
aware of this fact may send confusing information when a
failure trap comes from one machine, followed by a
success (ok) trap from a different machine. See the alarm
environment variable
.B MON_TRAP_INTENDED
above for a possible way around this. It is intended that
default/default be used as a facility to catch unknown
traps, and should not be relied upon to catch all traps
in a production environment. If you are lazy and only want
to use default/default for catching all traps,
it would be best to disable
upalerts, and use the MON_TRAP_INTENDED environment
variable in alert scripts to make the alerts more
meaningful to you.

Here is an example default facility:

.RS
.nf
watch default
	service default
		description Default trap service
		period wd {Sun-Sat}
			alert mail.alert someone@your.org
			upalert mail.alert -u someone@your.org

.fi
.RE

.SH EXAMPLES
The
.B mon
distribution comes with an
example configuration called
.IR example.cf .
Refer to that file for more information.

.SH SEE ALSO
.BR moncmd (1),
.BR Time::Period (3pm),
.BR Mon::Client (3pm)
.SH HISTORY
.B mon
was written because I couldn't find anything out
there that did just what I needed, and nothing was worth modifying
to add the features I wanted. It doesn't have a cool name, and
that bothers me because I couldn't think of one.
.SH BUGS
Report bugs to the email address below.
.SH AUTHOR
Jim Trocki 
mon-1.2.0/doc/README.rpc.monitor
$Id: README.rpc.monitor,v 1.1.1.1 2004/06/09 05:18:06 trockij Exp $

README for rpc.monitor
-----------------------

This program is a monitor for RPC-based services such as the NFS
protocol, NIS, and anything else that is based on the RPC protocol.
Some general examples of RPC failures that this program can detect
are:

  - missing and malfunctioning RPC daemons (such as mountd and nfsd)
  - systems that are mostly down (responding to ping and maybe
    accepting TCP connections, but not much else is working)
  - systems that are extremely overloaded (and start timing out simple
    RPC requests)

To test services, the monitor queries the portmapper for a listing of
RPC programs and then optionally tests programs using the RPC null
procedure.

At Transmeta, we use:

  "rpc.monitor -a" to monitor Network Appliance filers
  "rpc.monitor -r mountd -r nfs" to monitor Linux and Sun systems

Some notes:

  - The "-a" option only tests registered RPC programs; if you want to
    test specific RPC programs, use "-r" options along with "-a".
  - The "-a" option may not be feasible for use on many Unix systems.
    Some provide more services than you want to test, include RPC
    programs that don't provide the RPC null procedure, reject calls
    from unprivileged ports, or have other problems.
  - If you get an unexpected "unknown RPC program" error, you may
    want to check the rpc line in /etc/nsswitch.conf.
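
As a quick way to do that check, the sketch below prints the lookup
sources configured for RPC names. The function name rpc_sources is made
up for this example (it is not part of the mon distribution), and it
assumes the usual "rpc: source ..." line layout of nsswitch.conf.

```shell
# rpc_sources FILE
# Print each source listed on the "rpc:" line of FILE, one per line.
# (Hypothetical helper; not part of the mon distribution.)
rpc_sources() {
    awk '$1 == "rpc:" { for (i = 2; i <= NF; i++) print $i }' "$1"
}

# Usage:
#   rpc_sources /etc/nsswitch.conf
```

If nothing is printed, or the listed sources cannot resolve the program
names you pass with "-r", that may explain the error.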

Let me know if you have any comments, improvements, or suggestions.

-- 
Daniel Quinlan (at work)
quinlan@transmeta.com           http://www.pathname.com/~quinlan/
mon-1.2.0/doc/how-to-write-a-monitor.txt

mon-1.2.0/doc/README.software
$Id: README.software,v 1.1.1.1 2004/06/09 05:18:06 trockij Exp $

Here is some good software to have when building monitors and
alerts for mon:

QuickPage v3.2
ftp.it.mtu.edu:/pub/QuickPage
Software to send a page via modem and the IXO protocol.  It's very simple
to set up.

Satan 1.1.1
ftp://coast.cs.purdue.edu/pub/tools/unix/satan/
The Satan network security scanner has a number of utilities which
are useful when adapted to mon.

sock
http://www.kohala.com/~rstevens/tcpipiv1.tar.Z
A utility written by Richard Stevens which allows manipulation
of TCP and UDP connections.
mon-1.2.0/doc/README.syslog.monitor
Readme file for syslog.monitor
$Id: README.syslog.monitor,v 1.2 2004/11/15 14:45:18 vitroth Exp $

(Note: This Readme file is an insult to the reader. Better documentation
 will come as soon as I find more time and fix some more bugs)

INTRODUCTION

This is a syslog monitor for mon (http://www.kernel.org/software/mon/) by Jim
Trocki.

It is different from the other monitors, because it is constantly running
and communicates with the mon server via Mon::Client over the network,
instead of running under mon's supervision.

It listens for syslog packets coming in from the network, parses them,
checks them against a rule set and reports to the mon server if necessary.

REQUIREMENTS

You need to have the following non-std Perl modules installed:

     Time::HiRes
     Mon::Client

DETAILS

syslog.monitor accepts a single command line parameter, the name of the
configuration file. All options are explained inside the configuration file,
see syslog.conf as an example.

At startup, the daemon retrieves a list of all watches from the mon server
for which a service "syslog" is defined. We also read the hostgroup
definition for this watch from the mon server. (The hostnames are resolved
and the result is used to check if the incoming syslog packet is accepted
and which host it came from, so you should make sure your hostnames resolve
to all IPs from which your systems might send a syslog packet - on a Cisco,
you might want to consider "logging source-interface")

This basically amounts to:

  For every hostgroup from which you want syslog.monitor to accept and
  monitor syslog packets, define a "syslog" service.

This watch/service is where we later send our traps.

For those hosts, add a line like

*.*         @syslog.monitor.host.name

to /etc/syslog.conf.
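
If you manage many hosts, the rule can be appended from a script. The
following is only a sketch: the function name add_forward_rule and the
host name loghost.example.com are invented for illustration, and it
assumes a classic syslogd that reads /etc/syslog.conf.

```shell
# add_forward_rule FILE HOST
# Append "*.* @HOST" to FILE unless FILE already has that rule.
# (Hypothetical helper; not part of syslog.monitor.)
add_forward_rule() {
    rule_file=$1
    rule_host=$2
    # awk exits 0 only if a "*.*  @HOST" line is already present.
    if ! awk -v target="@$rule_host" \
        '$1 == "*.*" && $2 == target { found = 1 } END { exit !found }' \
        "$rule_file" 2>/dev/null
    then
        printf '*.*\t@%s\n' "$rule_host" >> "$rule_file"
    fi
}

# Usage (as root on each monitored host):
#   add_forward_rule /etc/syslog.conf loghost.example.com
```

Calling the function a second time with the same arguments leaves the
file unchanged, so it should be safe to rerun.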

Configure syslog.monitor by editing syslog.conf and following the comments
therein.

Start syslog.monitor.

Restart mon.

killall -HUP syslogd on the hosts you want to monitor.

Read the logfiles and fix the problems. ;-)


AUTHOR

Please don't bother Jim with questions relating to this.

If this should lead to global warming, code freeze or Elvis's revival, I
accept absolutely no responsibility. However, I will gladly receive and
incorporate bugfixes and sensible bug reports.

Lars Marowsky-Brée 

URL

It appears we have made our way to
ftp://ftp.kernel.org/pub/software/mon/contrib/ - please use a mirror, as
described on http://www.kernel.org/.
mon-1.2.0/INSTALL
$Id: INSTALL,v 1.2.2.3 2007/06/25 13:27:29 trockij Exp $

OVERVIEW
--------

There are several components you'll need to get working to
have a fully functional mon installation. 

    1. mon, the server 
    2. Mon::Client, the Perl library used by some clients
    3. C programs in mon.d
    4. Optional (but highly useful) monitors
    5. A customized mon.cf to make the server do what you want


1. MON SERVER
-------------

The "mon" daemon uses Perl 5.n, where n >= 005_01. 

Mon requires that *.ph be created from the system header files.  If you try to
run mon and Perl complains with the "did you run h2ph?" message, then chances
are this step wasn't done, either by your package manager or manually after
Perl installation. You can fix it by doing the following, as root:

	cd /usr/include
	h2ph -r -l .

You'll need the following modules for the server to function, all of
which are available from your nearest CPAN archive. The listed
CPAN paths are relative to /cpan/modules/by-authors/id/ -- versions of
modules on CPAN change quickly, so there may be newer versions available,
but the following are known to work:

    Time::Period	PRYAN/Period-1.20.tar.gz
    Time::HiRes		J/JH/JHI/Time-HiRes-1.59.tar.gz


2. INSTALLING THE PERL CLIENT MODULE
------------------------------------

The Perl client module is distributed as a separate package, and is required
for the web interfaces "mon.cgi" and "monshow" and for "moncmd". It is named
"mon-client-*.tar.gz".  Refer to that for installation instructions.  It is
available on kernel.org mirrors in the /pub/software/admin/mon directory, and
in CVS on sourceforge.net.  Be sure to match the version of mon-client with the
version of mon you are using.  At this time, branch "mon-1-2-branch" of the mon
CVS module matches the "mon-client-1-2-branch" branch of the mon-client CVS
module. See http://sourceforge.net/projects/mon/ for information on CVS access.


3. COMPILING THE C CODE (optional)
----------------------------------

Some of the monitors included with mon are written in C and need to
be compiled for your system. If you want to use the RPC monitor or the 
dialin.monitor wrapper,

    cd mon.d
    (edit Makefile)
    make
    make install
    cd ..

Keep in mind that although this is known to work on Linux, Solaris, and AIX,
it may not compile on your system. It is not required for the operation of mon
itself.


4. MONITORS
-----------

All of the monitor and alert scripts that are packaged with mon are actually
optional in that mon will only use the monitors which you have specified in
your configuration. There is no need to install any extra dependencies, such as
Perl modules, for monitors which you will not be using.

You may test to see if a monitor works outside of mon by simply running it from
the command line, giving it a host as an argument. For example:

    $ ./fping.monitor uplift

    start time: Mon Jun 25 07:52:34 2007
    end time  : Mon Jun 25 07:52:34 2007
    duration  : 0 seconds

    --------------------------------------------------------------------------
    reachable hosts                          rtt
    --------------------------------------------------------------------------
    uplift                                   0.07 ms

Some monitors may need Perl modules which may not already be installed on your
system. To determine if you have the requisite modules installed, you can pass
the monitor to "perl -c" and see if it says "OK", like this:

   $ perl -c fping.monitor 
   fping.monitor syntax OK

If it gripes about "Can't locate...", then what follows are the modules which
you'll need to install from CPAN.
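
To pull just the module names out of that output, something like the
following sketch may help. The name missing_modules is invented here,
and the pattern assumes Perl's usual "Can't locate Foo/Bar.pm in @INC"
wording.

```shell
# missing_modules: read "perl -c" output on stdin and print the Perl
# module names that perl reported as missing, one per line.
# (Hypothetical helper; not part of the mon distribution.)
missing_modules() {
    sed -n "s/^Can't locate \([^ ]*\)\.pm in @INC.*/\1/p" |
        sed 's,/,::,g' |
        sort -u
}

# Usage:
#   perl -c fping.monitor 2>&1 | missing_modules
```

Perl stops at the first module it cannot load, so you may need to rerun
the check after each install.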

Monitors may have their own embedded documentation at the top, and some
may provide their own POD (viewable with "pod2man xyz.monitor | nroff -man").
There may also be some additional documentation on some monitors in the "doc"
directory of the root of the tarball.


5. MON.CF CUSTOMIZATION AND STARTUP
-----------------------------------

-You may want to begin learning about mon by following the slides from the
 presentation. They cover the components, what they do, how they relate, and
 how it all works together. Find them here:

 http://mon.wiki.kernel.org/index.php/Documentation

-Read the man page for "mon" and "moncmd" in the doc/ directory to get
 an overview of the directories involved, i.e. the configuration,
 alert, monitors, state, and run directories.

 cd doc
 nroff -man mon.8 | more

-Read the "READMEs" in the doc/ directory for some useful
 insight on system configuration.

-Make your own mon.cf file, using the supplied "example.cf" (located
 in the etc/ directory) as a template, or the m4-based "example.m4":

 cp etc/example.cf mon.cf
 
or

 cp etc/example.m4 mon.m4

-Edit the "auth.cf" file. This file controls which users can perform
 what command. The default is pretty restrictive (read-only), but that's
 only for safety. Currently, "moncmd", "monshow", and "mon.cgi" are the
 only clients which are able to authenticate themselves to the server;
 the 2-way pager interface does not yet perform authentication. However,
 these programs work fine in read-only mode.


-Add the following lines to /etc/services:

mon             2583/tcp                        # MON
mon             2583/udp                        # MON traps

-You may want to make a DNS CNAME entry called "monhost" for your
 host that will run "mon". You can then set the environment variable
 MONHOST to be this host. "moncmd" uses this variable.

-The Perl scripts look for perl in /usr/bin. You might want to change
 this. I'd advise keeping a locally-installed copy of Perl if you're
 going to monitor network resources and you expect this stuff to work
 when some component of the network is down.

-Test it by starting "mon" from the distribution directory. Use these
 arguments if you chose the non-m4 config:

    ./mon -f -c mon.cf -b `pwd`

and these arguments for the m4-based config:

    ./mon -f -M -c mon.m4 -b `pwd`

To see if it's running on your machine:

    ./clients/moncmd -s localhost list pids

If you get some output, then things are probably OK. Check the
syslog for further diagnostics.

Mon doesn't really need to be installed in any special location.  Just
keep it on the local disk of the machine which will be running the server.
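
Before starting the server, a script can confirm that the /etc/services
entries described earlier in this section are present. This is a sketch
only: the helper name has_mon_service is hypothetical, and the check
assumes the service name "mon" and port 2583 shown above.

```shell
# has_mon_service FILE PROTO
# Succeed if FILE has an uncommented "mon 2583/PROTO" entry.
# (Hypothetical helper; not part of the mon distribution.)
has_mon_service() {
    grep -Eq "^mon[[:space:]]+2583/$2([[:space:]]|\$)" "$1"
}

# Usage:
#   for proto in tcp udp; do
#       has_mon_service /etc/services "$proto" ||
#           echo "missing: mon 2583/$proto"
#   done
```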


WEB INTERFACE
-------------

This distribution contains two web interfaces: monshow and mon.cgi.  monshow is
a simple report-only tool which supports configurable "views" of the mon
configuration. monshow also operates as a textmode report generator.

mon.cgi, however, supports the full functionality of mon, including the ability
to disable/enable groups and hosts and services, acknowledge failed services,
show alert and downtime history, authenticate users, among many other things.

To install monshow, simply copy clients/monshow into your web server's cgi-bin
path and name it "monshow.cgi". You may want to read the man page in the doc/
directory so that you can understand how to configure a "view" to your liking.

To install mon.cgi, follow the instructions found in doc/README.mon.cgi.
mon-1.2.0/KNOWN-PROBLEMS
#
# $Name: mon-1-2-0-release $
#

KNOWN PROBLEMS in $Name: mon-1-2-0-release $
------------------------------

clients/skymon has not yet been updated to work with the new Mon::Client
API.
mon-1.2.0/clients/batch-example
#!/bin/sh
#
# an example of calling moncmd in batch mode
#
# $Id: batch-example,v 1.1.1.1 2004/06/09 05:18:07 trockij Exp $
#
trap "stty echo && echo && exit" 2

echo -n "Password: "
stty -echo
read p
echo
stty echo

cat <);
	print "\n";
	system "stty echo";
	die "invalid password\n" if ($PASS =~ /^\s*$/);

    } elsif (!@ARGV) {
	$cmd = <$H>;
	while (defined ($cmd) && $cmd =~ /user=|pass=/i) {
	    chomp $cmd;
	    if ($cmd =~ /^user=(\S+)$/i) {
		$USER=$1 if (!defined ($USER));
	    } elsif ($cmd =~ /^pass=(\S+)$/i) {
		$PASS=$1;
	    }
	    
	    $cmd = <$H>;
	}

    }

    die "inadequate authentication information supplied\n"
    	if ($USER eq "" || $PASS eq "");
}

#
# set up TCP socket
#
$iaddr = inet_aton ($MONSERVER) ||
	die "Unable to find server '$MONSERVER'\n";

if ($MONPORT =~ /\D/) { $MONPORT = getservbyname ($MONPORT, 'tcp') }
$paddr = sockaddr_in ($MONPORT, $iaddr);
$proto = getprotobyname ('tcp');

socket (MON, PF_INET, SOCK_STREAM, $proto) ||
    die "could not create socket: $!\n";
connect (MON, $paddr) ||
    die "could not connect: $!\n";

select (MON); $| = 1; select (STDOUT);

#if( defined(my $line = <MON>)) {
#    chomp $line;
#    unless( $line =~ /^220\s/) {
#	die "didn't receive expected welcome message\n";
#    }
#} else {
#    die "error communicating with mon server: $!\n";
#}

#
# authenticate self to the server if necessary
#
if ($opt_a) {
    ($l, @out) = do_cmd(MON, "login $USER $PASS");
    die "Could not authenticate\n"
	if ($l =~ /^530/);
}


if ($opt_f or !@ARGV) {
    $cmd = <$H> if ($opt_f || !@ARGV);
    $l = "";
    while (defined ($cmd) && defined ($l)) {
	#
	# send the command
	#
	chomp $cmd;
	($l, @out) = do_cmd (MON, $cmd);
	last if (!defined ($l));
	for (@out) {
	    print "$_\n";
	}
	print "$l\n";

	$cmd = <$H>;
    }
    close ($H);

} else {
    ($l, @out) = do_cmd (MON, "@ARGV");
    for (@out) {
	print "$_\n";
    }
    print "$l\n";
}

#
# log out
#
do_cmd (MON, "quit");

close(MON);


#
# submit a command to the server, wait for a response
#
sub do_cmd {
    my ($fd, $cmd) = @_;
    my ($l, @out);

    return ("", undef) if ($cmd =~ /^\s*$/);

    @out = ();
    print $fd "$cmd\n";
    while (defined($l = <$fd>)) {
        chomp $l;
        if ($l =~ /^(\d{3}\s)/) {
            last;
        }
        push (@out, $l);
    }

    ($l, @out);
}


#
# usage
#
sub usage {
    print <|host |service  )\n";
    exit;
}


# Make sure we're running on the master.
$hostname = hostname;
$hostname =~ tr/a-z/A-Z/;
$mon_master =~ tr/a-z/A-Z/;
if ($hostname ne $mon_master) {
    print STDERR "No propagation from servers other than the master!\n";
    exit -1;
}


# Figure out what the argument portions of the URL need to be.
if ($ARGV[0] eq 'disable') {
    $args = "?command=mon_disable";
    if ($ARGV[1] eq 'watch') {
	$args .= "&args=watch,$ARGV[2]&rt=none";
    } elsif ($ARGV[1] eq 'service') {
	$args .= "&args=service,$ARGV[2],$ARGV[3]&rt=none";
    } elsif ($ARGV[1] eq 'host') {
	$args .= "&args=host,$ARGV[2]&rt=none";
    }
} elsif ($ARGV[0] eq 'enable') {
    $args = "?command=mon_enable";
    if ($ARGV[1] eq 'watch') {
	$args .= "&args=watch,$ARGV[2]&rt=none";
    } elsif ($ARGV[1] eq 'service') {
	$args .= "&args=service,$ARGV[2],$ARGV[3]&rt=none";
    } elsif ($ARGV[1] eq 'host') {
	$args .= "&args=host,$ARGV[2]&rt=none";
    }
} elsif ($ARGV[0] eq 'test') {
    $args = "?command=mon_test_service&args=$ARGV[1],$ARGV[2]";
} else {
    print STDERR "Unknown command $ARGV[0]\n";
    exit -1;
}


$ENV{HTTPS_CERT_FILE} = $crt;
$ENV{HTTPS_KEY_FILE} = $key;

# Now fork and do the work.
# We fork so that we don't wait for each individual request to finish.
# We fork twice so that the kernel will take care of process cleanup for us.
$pid = fork;
if ($pid) {
    waitpid ($pid, 0);
    print STDERR "Parent exiting\n" if ($debug);
    exit 0;
} else {
    foreach $host (@hosts) {
	if (fork) {
	    next;
	}
	my $ua = LWP::UserAgent->new;
	
	$ua->agent("MonRemote/0.1");
	print STDERR "@ARGV\n" if ($debug);
	print STDERR "$args\n" if ($debug);
	my $req = HTTP::Request->new(GET => "https://$host/$path$args");
	
	$req ->content_type('application/x-www-form-urlencoded');
	
	my $res = $ua->request($req);
	
	if ($res->is_success) {
	    print STDERR "Worker exiting\n" if ($debug);
	    exit 0;
	} else {
	    print STDERR "\n$host\n@ARGV\nRequest to remote server failed\n";
	    exit 0;
	}
    }
    print STDERR "Child exiting\n" if ($debug);
    exit 0;
}
mon-1.2.0/clients/monfailures
#!/usr/bin/perl -w

# Quickly show Mon failure status from command line.

# to configure, hard-code the user and password for either
# your public Mon username or a username that is only allowed
# to use the "list" command and nothing else.  I run this
# script out of inetd on the mon server so the people who can
# see its results can't read the script (and see the hard-coded
# password).

# Written by Ed Ravin  Wed Jan  2 12:23:44 EST 2002
# Release Version: 1.2


# $Header: /cvsroot/mon/mon/clients/monfailures,v 1.1.1.1 2004/06/09 05:18:07 trockij Exp $

use strict;


my %opt;
use Getopt::Long;
GetOptions (\%opt, "debug",  "server=s", "port=s", "user=s", "password=s");

############################  configurable stuff 
my $default_user="";
my $default_password= "";
############################ 


my $debug= $opt{'debug'} || 0; 

my (%failures);
my ($now);


use Mon::Client;

my $mon;

# find the client

    if (!defined ($mon = Mon::Client->new)) {
		die "$0: could not create client object: $@";
    }

	if (defined $opt{'server'}) {
	    $mon->host ($opt{'server'});
	}
	else {
		$mon->host ("localhost");
	}

	$mon->port ($opt{'port'})   if (defined $opt{'port'});
	$mon->username($opt{'user'} || $default_user);
	$mon->password($opt{'password'} || $default_password);

	$mon->connect;
	die "$0: Could not connect to server: " . $mon->error . "\n"
		unless $mon->connected;

	if ($mon->username ne "")
	{
	    $mon->login;
	    die "$0: login failure: " . $mon->error . "\n" if $mon->error;
	}

	# Load data from Mon


	%failures = $mon->list_failures;
	die "$0: Error doing list_failures : " . $mon->error
		if ($mon->error);

	$now= time;  # time mon data was fetched


# group=thathost service=port8888 opstatus=0 last_opstatus=0 exitval=1 timer=11
# last_success=0 last_trap=0 last_check=955058065 ack=0 ackcomment=''
# alerts_sent=0 depstatus=0 depend='' monitor='tcp.monitor -p 8888'
# last_summary='thathost'
# last_detail='\0athathost could not connect: Connection refused\0a'
# last_failure=955058067 interval=60 first_failure=955055062
# failure_duration=3052

my ($watch, $service, $downtime, $summary, $acked);
format STDOUT_TOP =

Hostgroup:Service               Down Since           Error Summary
-----------------               ----------           -------------
.

format STDOUT =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<<  @<<<<<<<<<<<<<<<<<<  @<<<<<<<<<<<<<<<<<<<<<<<<<
$watch . ":" . $service,   $downtime,             $summary
.

# list out any failures
if (%failures)
{
	foreach $watch (keys %failures) {
	   foreach $service (keys %{$failures{$watch}}) {
			my $sref= \%{$failures{$watch}->{$service}};
			$downtime= localtime $sref->{'first_failure'};
			$acked= $sref->{'ack'} !=0;
			$summary= $sref->{'last_summary'};

			$summary= "[acked] $summary" if $acked;
			write;
	   }
	}
	print "\n";
	exit(1);
}
else
{
	print "No failures found.\n";
	exit(0);
}
mon-1.2.0/clients/mon.cgi
#!/usr/bin/perl -T
#!/usr/bin/perl -Tw broke when I made changes to list_dtlog that involved
# submitting three commas ",,," in a row into the value of $args :(
#
# NAME
#  mon.cgi
#
#
# DESCRIPTION
#  Web interface for the Mon resource monitoring system. mon.cgi
#  implements a significant subset of the Perl interface to Mon, which
#  allows administrators to quickly view the status of their network
#  and perform many common Mon tasks with a simple web client.
#
# Requires mon 0.38-21 and Mon::Client 0.11 for proper operation.
#
#
# AUTHORS
#  Originally by:
#   Arthur K. Chan
#  Based on the Mon program by Jim Trocki.
#   http://www.kernel.org/software/mon/
#  Rewritten to support Mon::Client, mod_perl, taint mode,
#  authentication, the strict pragma, and other visual/functional
#  enhancements by Andrew Ryan.
#  Downtime logging contributed by Martha H Greenberg
#  Site-specific customization routines contributed by Ed Ravin
#
# ----------------------------------------------------------------------
# $Id: mon.cgi,v 1.4.2.1 2007/05/03 19:55:38 trockij Exp $
# ----------------------------------------------------------------------
#
#
# INSTRUCTIONS
# Install this cgi script to wherever your cgi-bin directory sits
# on your mon server. If you don't have a web server installed, try
# http://www.apache.org. This script hasn't been tested with any
# other web server, although there is no reason it wouldn't work under
# one as a CGI script.
#
# This script now runs cleanly under mod_perl (tested under apache 1.3.9,
# mod_perl 1.21), if you're running that. Global variables have not
# been eliminated but at least we're being careful now.
#
# This script also runs cleanly under taint mode, which is activated
# by using the -T switch for CGI scripts, and by using the directive
# "PerlTaintCheck On" in your httpd.conf file if you are running
# under mod_perl.
#
# Modify the "Configurable Parameters" section below to customize it
# to your site's settings. mon.cgi also supports an optional config
# file which allows you to set all the same parameters. Please
# see the file README.site-customization for more details.
#
# If you want to easily customize the look and feel of mon.cgi, 
# as well as various other configuration options, copy the sample 
# mon.cgi.cf file (in the /config directory of this distribution) 
# into a location where your webserver can read it, and edit the 
# line beginning '$moncgi_config_file = ""' to reflect the path 
# to your config file. You can then change the look and feel of 
# mon.cgi, as well as implement access controls, directly from this 
# file.
#
# If you want to do a lot of the things that this script lets you do,
# and you don't want any authorization to be necessary, then
# you need to open up your auth.cf file to allow anyone to perform
# actions that you would like mon.cgi to perform. 
#
# No authentication might work in an environment where there are very
# few Mon users and they can all be trusted equally, or if you want
# to use mon.cgi in a sort of "read-only" capacity, where all users
# can list, for example, but no web users can enable/disable 
# monitoring and/or control the server in any way.
#
# Alternatively, if you want to use authentication, you need to have
# a working authentication setup with Mon from the command line
# before attempting to make authentication work with mon.cgi.
#
# Authentication is very flexible, and is trivial to implement in 
# mon.cgi, assuming you already have authentication working from 
# the command line. Just uncomment the "$must_login" line, change
# $app_secret to be something unique (VERY IMPORTANT!) and 
# mon will start requiring authentication for ALL commands.
#
# Authentication users should change their app secret on a regular
# basis if security is a concern. Actually, if security is a concern,
# don't run mon.cgi, because unless you use SSL, AND your monhost is
# on the same server as your web server, AND you use a short timeout
# on cookies, AND you change your app secret often and keep it
# VERY secure, you don't have a secure web system. But this simple 
# authentication mechanism is enough to keep most people happy.
#
#
# This script will require the CGI perl module. Available at any
# perl CPAN site. See http://www.perl.org for details. Oh, and don't
# forget Mon::Client, but the assumption is you are already running
# mon in some fashion and so you know this already.
#
# In addition, if you want to use the authentication piece of mon.cgi,
# you need to install the Crypt::TripleDES module, also available
# from CPAN (tested w/ Crypt::TripleDES v0.24), and your browser needs to allow
# cookies (or else you need to be prepared to type in your username
# and password an awful lot!).
#
#
# BUGS
#  Probably many.
#  Send bugs/comments about this software to 
#  Andrew Ryan 
#  Please include any output from your web server's error log that the
#  script might have output, this will help immensely in solving problems.
#  Also please include the versions of mon.cgi, Mon and Mon::Client you 
#  are using.
#


BEGIN {
    # Auto-detect if we are running under mod_perl or CGI.
    $USE_MOD_PERL = exists $ENV{'MOD_PERL'}
    ? 1 : 0;
    if ($USE_MOD_PERL) {
	# Use the cgi module and compile all methods at 
	# the beginning but only once
	use CGI qw (-compile :standard) ;				       
    } else {
	# Use the cgi module and compile all methods only 
	# when they are invoked via the autoloader.
	#use CGI qw (-debug) ; #DEBUG
	use CGI qw (:standard) ;
    }
    $CGI::POST_MAX=1024 * 100;  # max 100K posts
    $CGI::DISABLE_UPLOADS = 1;  # no uploads
}


# Configurable Parameters ----------------------------------------------
# Basic global vars
use Mon::Client;			       # mon client interface
use strict;			               # because strict is GOOD
use vars qw($RCSID $RCSVERSION $VERSION $AUTHOR $organization $monadmin 
	    $logo $logo_link $reload_time  $monhost $monport $url 
	    $login_expire_time $cookie_name $cookie_path %cgiparams
            $vcookie_name $vcookie_path $vcookie @views $curview
	    $monhost_and_port_args $monhost_and_port_args_meta
	    $has_read_config $moncgi_config_file $cf_file_mtime
	    $untaint_ack_msgs @show_watch $show_watch_strict
	    $required_mon_client_version);
# Formatting-related global vars
use vars qw($BGCOLOR $TEXTCOLOR $LINKCOLOR $VLINKCOLOR 
	    $greenlight_color $redlight_color $unchecked_color 
	    $yellowlight_color $disabled_color 
	    $fixed_font_face $sans_serif_font_face 
	    $dtlog_max_failures_per_page);
# Security-related global vars
use vars qw($must_login $app_secret %loginhash $des $has_prompted_for_auth $destroy_auth_cookie $default_username $default_password);
$has_prompted_for_auth = "";        #this must always be cleared for mod_perl
undef $destroy_auth_cookie;        #this must always be undef'd for mod_perl
undef %cgiparams;        # this must always be undef'd for mod_perl
undef $monhost_and_port_args;      # This is defined if the user overrode monhost or monport
undef $monhost_and_port_args_meta;      # This is defined if the user overrode monhost or monport
undef @show_watch;
undef $show_watch_strict;

$RCSID = '$Id: mon.cgi,v 1.4.2.1 2007/05/03 19:55:38 trockij Exp $';
$RCSVERSION = '$Revision: 1.4.2.1 $';
$VERSION = $RCSVERSION;
$VERSION =~ s/Revision: //i; $VERSION =~ s/\$//g ; $VERSION =~ s/\s+//g;
$AUTHOR = 'andrewr@nam-shub.com';
$required_mon_client_version = "0.11";  #Version of Mon::Client which we require for successful operation

#
# If you want to use a config file to specify mon.cgi parameters, put
# the full path to the file in this variable. 
#
# If you do not wish to use a config file, leave this variable empty.
#
$moncgi_config_file = "";


#
# This subroutine initializes the configuration variables which
# can be set here, but also overridden with the optional mon.cgi 
# config file.
#
# We put this subroutine at the top of the code so that users
# can get to it more easily.
#
sub initialize_config_globals ;
sub initialize_config_globals {
    undef $has_read_config ;   #undef this for mod_perl
    $must_login = "";                    #this must always be undef'd for mod_perl

    $organization = "";	   # Organization name.
    $monadmin = "BOFH\@your.domain";		   # Your e-mail address.
                                                   # note: must escape @ signs!
    $logo = "";      # Company or mon logo.
    $reload_time = 180;				      # Seconds for page reload.

    $monhost = "localhost";				# Mon server hostname.
    $monport = "2583";				# Mon port number.

    #$must_login = "yes";                 # Uncomment this line if you want
                                         # authentication to be mandatory
                                         # for all connections to the mon server.
    #!!! WARNING!!!!! You must change $app_secret to something unique to your site!
    $app_secret = '1.90LK=R==36jlel492jl><>header();
    print $webpage->start_html(-title=>"mon.cgi error: insufficient Mon::Client version",
			       -BGCOLOR=>$BGCOLOR,
			       -TEXT=>$TEXTCOLOR,
			       -LINK=>$LINKCOLOR,
			       -VLINK=>$VLINKCOLOR,
			       -META=>{
				   'generator'=>"mon.cgi $VERSION ($AUTHOR)",
			       },
			       );    
    print $webpage->h3("Insufficient version ($Mon::Client::VERSION) of the Mon::Client perl module installed. Please upgrade your Mon::Client to at least version $required_mon_client_version before running mon.cgi.");
    print $webpage->h3("Also note, if you're running mon.cgi under Apache+mod_perl, you'll need to restart Apache after upgrading the Mon::Client library.");
    print $webpage->end_html;
    exit;
}


#
# Read CGI params -- these overwrite anything in a config file 
# or hard-coded.
# This can change the value of the $monhost and $monport 
# global variables defined in initialize_config_globals and
# moncgi_read_cf.
#
# This can cause a problem if $monhost or $monport are defined here and we are running mod_perl...
#
&moncgi_get_params;


#
# Used to escape HTML in ack's
#
if ($untaint_ack_msgs =~ /^y(es)?$/i) {
    eval "use HTML::Entities" ;
} else {
    undef $untaint_ack_msgs;
}


# Initialize a TripleDES global if login is required, 
#  otherwise clear $must_login
if ($must_login =~ /^y(es)?$/i) {
    eval "use Crypt::TripleDES";
    $des = Crypt::TripleDES->new;
} else {
    $must_login = "";
}

#
# Set (or unset) $show_watch_strict according to its value
#
if ($show_watch_strict =~ /^y(es)?$/i) {
    $show_watch_strict = 1;
} else {
    $show_watch_strict = "";
}

#
#Initialize the wordy descriptions of alert variables
#
%alert_vars = (
	       'timer' => "Time remaining until this service is next checked",
	       'opstatus' => "Current status of this service (0=error, 1=OK, 7=unchecked)",
	       'alerts_sent' => "Number of alerts sent",
	       'interval' => "Test interval, in seconds",
	       'monitor' => "Monitor used to test this service",
	       'last_trap' => "Last time a trap was received on this service",
	       'last_alert' => "Last time an alert was sent for this service",
	       'last_success' => "Last time this service returned an OK result",
	       'group' => "Hostgroup",
	       'failure_duration' => "Length of failure",
	       'ack' => "Acknowledgement status (1=failed service was ack'ed)",
	       'ackcomment' => "Comment issued by the acknowledger",
	       'first_failure' => "First failure time of this service",
	       'last_failure' => "Last failure time of this service",
	       'depend' => "Hostgroups/Services on which this service depends",
	       'last_check' => "Time this service was last checked",
	       'service' => "Service being checked",
	       'last_opstatus' => "Previous opstatus for this service (0=error, 1=OK, 7=unchecked)",
	       'exitval' => "Last exit value of monitor for this service (0=OK, anything else indicates failure)",
	       'depstatus' => "Dependency status (1 = dependencies OK, 0=dependencies not OK or no dependencies)",
	       'last_summary' => "Summary output from most recent failure of this service",
	       'last_detail' => "Detail output from the most recent failure of this service",
	       );


# These are variables from svc_details which should be represented as
# pretty-printed time strings. Mon gives them to us as UNIX time(2), so we
# have to convert. They used to be hardcoded deep in the code; listing
# them here is an attempt at a readability improvement.
# Example representation: '(31 days, 17 hours, 53 minutes, 25 seconds ago)'
@time_based_alert_vars = (
			  "last_check",
			  "last_failure",
			  "last_trap",
			  "last_alert",
			  "last_success",
			  "first_failure",
			  );

# These are variables from svc_details which should be represented as
# "pretty printed" seconds/minutes/hours/days.
# Example representation: '4 minutes, 19 seconds'
@pp_sec_alert_vars = (
			  "timer",
			  "interval",
			  "failure_duration",
		      );


%auth_commands = (                 # This global tracks the authorization
                                   # status of all commands mon.cgi can 
                                   # issue for a user. It is a candidate for
                                   # inclusion in a cookie someday.
		  list =>      { auth=>0, bgcolor=>""},
		  reset =>     { auth=>0, bgcolor=>""},
		  stop =>      { auth=>0, bgcolor=>""},
		  start =>     { auth=>0, bgcolor=>""},
		  savestate => { auth=>0, bgcolor=>""},
		  loadstate => { auth=>0, bgcolor=>""},
		  disable =>   { auth=>0, bgcolor=>""},
		  enable =>    { auth=>0, bgcolor=>""},
		  test =>      { auth=>0, bgcolor=>""},
		  ack =>       { auth=>0, bgcolor=>""},
		  reload =>    { auth=>0, bgcolor=>""},
		  );

$auth_commands_checked = 0;        # This global tracks whether authorization
                                   # for all commands has been checked.


###############################################################
# Function definitions begin below
###############################################################
#
# Forward declare all functions, for our sanity.
#
#
# General functions
#
sub pp_sec ;
sub pp_sec_brief ;
sub arithmetic_mean ;
sub median ;
sub std_dev ;
sub validate_name ;
sub gen_ciphertext ;
sub gen_cleartext ;
#
# Base mon.cgi pages
#
sub setup_page ;
sub print_bar ;
sub query_opstatus ;
sub can_show_group ;
sub list_status ;
sub query_group ;
sub end_page ;
sub list_alerthist ;
sub svc_details ;
sub list_disabled ;
sub list_dtlog ;
sub list_pids ;
#
# mon functions
#
sub mon_connect ;
sub mon_views;
sub mon_list_group ;
sub mon_list_watch ;
sub mon_list_failures ;
sub mon_list_successes ;
sub mon_list_opstatus ;
sub mon_list_disabled ;
sub mon_reload ;
sub mon_loadstate ;
sub mon_savestate ;
sub mon_loadstate_savestate ;
sub mon_schedctl ;
sub mon_list_pids ;
sub mon_list_descriptions ;
sub mon_enable ;
sub mon_disable ;
sub mon_test_service ;
sub mon_test_config ;
sub mon_reset ;
sub mon_list_alerthist ;
sub mon_list_sched_state ;
sub mon_list_dtlog ;
sub mon_ack ;
sub mon_servertime ;
sub mon_checkauth ;
sub mon_state_change_enable_only ;
sub mon_state_change ;
#
# mon.cgi functions
#
sub moncgi_get_params ;
sub moncgi_logout ;
sub moncgi_authform ;
sub moncgi_generic_button ;
sub moncgi_switch_user ;
sub moncgi_print_service_table_legend ;
sub moncgi_list_dtlog_navtable ;
sub moncgi_test_all ;
sub moncgi_reset ;
sub moncgi_read_cf ;
sub moncgi_login ;
sub moncgi_custom_print_bar ;
sub moncgi_custom_commands;

###############################################################
# General functions, not specific to Mon or mon.cgi
###############################################################
sub pp_sec {
    # This routine converts a number of seconds into a text string
    # suitable for (pretty) printing. The dtlog from Mon reports downtime
    # in seconds, and we want to present the user with more meaningful
    # data than "the service has been down for 13638 seconds"
    #
    # By Martha Greenberg  w/ pedantic plural 
    # modifications by Andrew.
    use integer;
    my $n = $_[0];
    my ($days, $hrs, $min, $sec) = ($n / 86400, $n % 86400 / 3600,
				    $n % 3600 / 60, $n % 60);
    my $s = $sec . " second";
    $s .= "s" if $sec != 1;   #because 0 is plural too :)
    if ($min > 0) {
	if ($min == 1) {
	    $s = $min . " minute, " . $s;
	} else {
	    $s = $min . " minutes, " . $s;
	}
    }
    if ($hrs > 0) {
	if ($hrs == 1) {
	    $s = $hrs . " hour, " . $s;
	} else {
	    $s = $hrs . " hours, " . $s;
	}
    }
    if ($days > 0) {
	if ($days == 1) {
	    $s = $days . " day, " . $s;
	} else {
	    $s = $days . " days, " . $s;
	}
    }
    return $s;
}
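
# Usage sketch for pp_sec. The inputs are illustrative values, not taken
# from a real dtlog; the results follow from the integer arithmetic above.
#
#   pp_sec(13638);   # "3 hours, 47 minutes, 18 seconds"
#   pp_sec(90061);   # "1 day, 1 hour, 1 minute, 1 second"
#   pp_sec(60);      # "1 minute, 0 seconds"
#
# Days, hours and minutes are omitted when they are zero; seconds are
# always printed.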


sub pp_sec_brief {
    # This routine converts a number of seconds into a text string
    # suitable for brief (yet pretty) printing.
    #
    # We use this on the opstatus page to display deltas of times (for
    # last check, next check).
    use integer;
    my $n = $_[0];
    my $s;
    if ($n >= 0) {
	$s .= "+" ;
    } else {
	$s .= "-" ;
	$n = abs($n);
    }
    my ($days, $hrs, $min, $sec) = ($n / 86400, $n % 86400 / 3600,
				    $n % 3600 / 60, $n % 60);
    if ($days > 0) {
	$s .= $days . "d";
    }
    if ($hrs > 0) {
	$s .= $hrs . "h";
    }
    if ($min > 0) {
	$s .= $min . "m";
    }
    $s .= "${sec}s" ;
    return $s;
}
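
# Usage sketch for pp_sec_brief. The inputs are illustrative; a negative
# argument is a time in the past (e.g. last_check minus server time) and
# a non-negative one is a time still to come (e.g. the timer value).
#
#   pp_sec_brief(-125);   # "-2m5s"
#   pp_sec_brief(3661);   # "+1h1m1s"
#   pp_sec_brief(0);      # "+0s"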


sub arithmetic_mean {
    # Given an array of numbers, this function returns the arithmetic mean
    return 0 if scalar(@_) == 0 ;      #don't waste our time
    my $sum = 0;
    foreach (@_) {
	$sum += $_;
    }
    return $sum/scalar(@_);
}

sub median {
    # Given an array of numbers, this function returns the median
    return 0 if scalar(@_) == 0 ;      #don't waste our time
    my $middle_element_index = int(scalar(@_)/2);
    @_ = sort {$a <=> $b} @_;
    if (scalar(@_)%2 == 1) {  #odd num of elements, take the middle
        return $_[$middle_element_index];
    } else {  # even # of elements, take the avg of the 2 middle elements
        return &arithmetic_mean($_[$middle_element_index],$_[($middle_element_index-1)]);
    }
}


sub std_dev {
    # Given an array of numbers, this function returns their 
    # sample standard deviation (n-1 in the denominator).
    return 0 if scalar(@_) < 2 ;      #don't waste our time
    my $sum = 0;
    my $mean = &arithmetic_mean(@_);
    foreach (@_) {
	$sum += ( $_ - $mean )**2 ;
    }
    return ( $sum/(scalar(@_) - 1) )**0.5 ;
}
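
# Usage sketch for the three statistics helpers above. The data sets are
# made up for illustration.
#
#   arithmetic_mean(1, 2, 3, 4);         # 2.5
#   median(5, 1, 3);                     # 3   (odd count: middle element)
#   median(4, 1, 3, 2);                  # 2.5 (even count: mean of 2 and 3)
#   std_dev(2, 4, 4, 4, 5, 5, 7, 9);     # about 2.138
#
# In the last call the mean is 5 and the squared deviations sum to 32,
# so the result is sqrt(32/7). All three return 0 for an empty list, and
# std_dev also returns 0 for a single value.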


sub validate_name {
    # Return untainted host or group name if safe, undef otherwise.
    # Because you can never scrub your input too well.
    return $_[0] =~ /^([\w.\-_]+)$/ ? $1 : undef;
}
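
# Usage sketch for validate_name. The names below are hypothetical and
# do not come from any real mon configuration.
#
#   validate_name("web-servers.prod_1");   # returns "web-servers.prod_1"
#   validate_name("routers; rm -rf /");    # returns undef (space, ';', '/')
#   validate_name("");                     # returns undef (empty name)
#
# Only word characters, '.', '-' and '_' are accepted, so callers should
# test the result with defined() before passing it to the mon server.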


sub gen_ciphertext {
    # This function takes as its input a piece of plaintext and
    # returns a piece of ASCII, 3DES-encoded ciphertext, or undef if 3DES fails
    # for some reason. The key used is the global value "$app_secret".
    my ($plaintext) = (@_);
    my $ciphertext ;

    if ($ciphertext = $des->encrypt3 ("$plaintext", "$app_secret" )) {
	# convert ciphertext to hex
	$ciphertext = unpack("H*", $ciphertext) ;
#	print "ciphertext is $ciphertext\n"; #DEBUG
	return $ciphertext;
    } else {
	return undef;
    }
}


sub gen_cleartext {
    # This function takes as its input a piece of ASCII, 3DES-encoded
    # ciphertext and returns a piece of plaintext,
    # or undef if 3DES fails for some reason. The key used is the
    # global value "$app_secret".
    my ($ciphertext) = (@_);
    my $plaintext ;

    return undef if ! $ciphertext;

    # convert the hex ciphertext to the format decrypt3() will understand
    $ciphertext = pack("H*", $ciphertext) ;
    if ($plaintext = $des->decrypt3 ("$ciphertext", "$app_secret" )) {
#	print "plaintext is $plaintext\n"; #DEBUG
	return $plaintext;
    } else {
	return undef;
    }
}


###############################################################
# Presentation functions. These all have the common feature
# that they format a bunch of information and present it in
# a nice(?) way to the user.
###############################################################
sub setup_page {
    # Set up the html doc headers and such.
    # Also, get/set username/password cookie if $must_login is in effect
    my ($title) = (@_);
    my (@time, $ttime);
    my $title_color = "$TEXTCOLOR";
    # Avoid "my ... if ...": its behavior is undefined when the condition
    # is false, which matters for persistent interpreters like mod_perl.
    my $page_title = ($organization ne "") ? "$organization : " : "";
    $page_title = "${page_title}MON - $title ($monhost:$monport)";
    my $time_now = time;
    my @expires_time = gmtime($time_now + $login_expire_time);
    # Put the cookie date format in the standard cookie format
    my $expires = sprintf ("%s, %.2d-%s-%d %.2d:%.2d:%.2d GMT",
			   @days_of_week[$expires_time[6]],
			   $expires_time[3],
			   @year_months[$expires_time[4]],
			   $expires_time[5] + 1900,
			   @expires_time[2,1,0]);
    my ($encrypted_password, $cookie, $cookie_value);
    my $refresh_url;

    #
    # Define $args as null if it is not currently defined
    #
    $args = "" if ! defined($args);

    # Set the refresh page to always be the summary page, unless
    # certain commands are selected.
    if ( $command =~ "^query_opstatus_" ) {
	$refresh_url = "$url?${monhost_and_port_args_meta}command=$command";
    } elsif ( ($command eq "mon_test_service") || ($command eq "svc_details") ) {
	$refresh_url = "$url?${monhost_and_port_args_meta}command=svc_details&args=$args";
    } elsif ($command eq "query_group") {
	$refresh_url = "$url?${monhost_and_port_args_meta}command=query_group&args=$args";
    } elsif ($command eq "list_dtlog") {
	$refresh_url = "$url?${monhost_and_port_args_meta}command=list_dtlog&args=$args";
    } else {
	$refresh_url = "$url?${monhost_and_port_args_meta}command=query_opstatus";
    }

    if ($must_login) {
	if ( ( defined($loginhash{'username'}) ) && ( $loginhash{'username'} ne "" ) ) {
	    # Don't get the username and password from the cookie
	    # if the user just submitted it via the login form.
	    # Encrypt the password for cookie storage.
	    $encrypted_password = &gen_ciphertext($loginhash{'password'}) ;
	} else {
	    # Get the existing cookie and parse it
	    $cookie_value = $webpage->cookie(-name=>"$cookie_name",
					     );
	    ($loginhash{'username'},$loginhash{'password'}) = split(':', $cookie_value) if $cookie_value;
	    $encrypted_password = $loginhash{'password'} ;
	    # Decrypt the password string (if any) for use by the app,
	    # unless the user just submitted it in cleartext.
	    $loginhash{'password'} = &gen_cleartext($loginhash{'password'}) ;
	    # for some reason (bug?) I get a space at the end of the password
	    # that is returned here, so for now let's take it out, since
	    # spaces are illegal in passwords anyway.
	    $loginhash{'password'} =~ s/\s+//g if defined($loginhash{'password'}) ;
	}
	# Set up the new cookie (re-issue a new cookie with every access)
	if ($destroy_auth_cookie) {
	    $cookie_value = "" ;
	} elsif ( defined($loginhash{'password'}) ) {
	    $cookie_value = "$loginhash{'username'}:$encrypted_password" ;
	} else {   # no username was supplied
	    $cookie_value = "" ;
	}
	$cookie = $webpage->cookie(-name=>"$cookie_name",
				   -value=>"$cookie_value",
				   -expires=>"$expires",
				   -path=>"$cookie_path",
				   );
	mon_views();
	print $webpage->header(
			       -cookie=>[$cookie,$vcookie],
			       -refresh=>"$reload_time; URL=$refresh_url",
			       );
    } else {
	# Plain & simple, no cookie, no password
	$encrypted_password = "" ;
	mon_views();
	print $webpage->header(
			       -cookie => [$vcookie],
			       -refresh=>"$reload_time; URL=$refresh_url",
			       );
    }

    print $webpage->start_html(-title=>"$page_title",
			       -BGCOLOR=>$BGCOLOR,
			       -TEXT=>$TEXTCOLOR,
			       -LINK=>$LINKCOLOR,
			       -VLINK=>$VLINKCOLOR,
			       -META=>{
				   'generator'=>"mon.cgi $VERSION ($AUTHOR)",
			       },
			       );

    # Useful for debugging username/password/cookie issues   #DEBUG
#    print "cookie value is $cookie_value\n"; #DEBUG
#    print "encrypt passwd is $encrypted_password\n"; #DEBUG
#    print "username is "$loginhash{'username'}"\n"; #DEBUG
#    print "decrypt passwd is "$loginhash{'password'}"
\n"; #DEBUG # # Print the logo image, if it was defined by the user. # if ($logo) { $webpage->print("\n"); } else { $webpage->print("\"[$organization\n"); } $webpage->print("
\n"); # # If the user has given a logo_link link, then insert an # anchor tag to it here, if logo_link was defined by the # user. # if ($logo_link) { $webpage->print("\"[$organization

MON: $title

MON: $title

\n"); $webpage->print("
\n"); } else { #just print the generic page with no logo and no link $webpage->print("\n"); if (@views) { $webpage->print(""); } $webpage->print("

MON: $title

", $webpage->start_form, $webpage->popup_menu(-name=>'setview', -values=>["--all--",sort(@views)], -default=>$curview), $webpage->submit(-name=>'Change View'), $webpage->end_form, "
\n"); } &print_bar; @time = localtime($time); $ttime = sprintf ("%.2d:%.2d:%.2d on %s, %.2d-%s-%d", @time[2,1,0], @days_of_week[$time[6]], $time[3], @year_months[$time[4]], $time[5] + 1900 ); $webpage->print("
"); $webpage->print ("\nThis information was presented at $ttime"); # # If the user is currently logged in, tell the user who # they are logged in as. # If the user is NOT currently logged in, offer to log # them in. # if ( ($loginhash{'username'}) && ($loginhash{'username'} ne $default_username) ) { $webpage->print (" to user $loginhash{'username'} (log off user $loginhash{'username'})") ; } else { $webpage->print (" (log in)"); } if ($curview && $curview ne '--all--') { $webpage->print("
Current View: $curview. If you're not seeing what you're expecting to see, try changing views via the menu at the top of this page."); } $webpage->print(".
"); } sub print_bar { # Print the command bar. Called at both the beginning and the end # of each page. # my $button = "INPUT TYPE=\"submit\" NAME=\"command\""; my $table_width = "100%"; my $face = $sans_serif_font_face; my ($cmd, $command, $desc); my $all_commands_unauthorized = 1; my $i; foreach $cmd (keys %auth_commands) { $auth_commands{$cmd}{'auth'} = &mon_checkauth($cmd); #last if $auth_commands{$cmd}{'auth'} == -1; # stop checking if we can't # contact the server $auth_commands{$cmd}{'bgcolor'} = $auth_commands{$cmd}{'auth'} == 0 ? "bgcolor=\"$disabled_color\"" : ""; $all_commands_unauthorized = 0 if $auth_commands{$cmd}{'auth'} == 1 ; } # Check to see if authentication is required and the selected user has # no permissions to do anything. This can only happen if your mon admin # is very cruel and gives you an account with no permissions (should # be very rare), or if a user enters the wrong password for a valid # account name (the usual case). # The main thing I don't like about this solution is that it embeds # the default l/p into the URL, but this isn't a secure password # anyway, right? $cmd = "list"; #set $cmd to the most basic of commands if ($auth_commands{$cmd}{'auth'} == 0) { $webpage->print("

Cannot connect to the mon server. Check the mon process to see if it is running.


\n"); } elsif ( ($must_login) && ($all_commands_unauthorized == 1) ) { $webpage->print("

You are attempting to log in with the username "$loginhash{'username'}" but your password is incorrect.

Either enter in the correct username/password above or click here to clear your authentication credentials and log back in as the default user.


\n"); } $auth_commands_checked = 1; # We have to lay the tables out by hand bec. of the colspanning :( $webpage->print("\n"); # # Print the first row of the command table # $webpage->print("\n"); $webpage->print("\t\n"); $webpage->print("\t\n"); $webpage->print("\t\n"); $webpage->print("\t\n"); $webpage->print("\t\n"); $webpage->print("\t\n"); $webpage->print("\n"); # # Print the second row of the menu table # $webpage->print("\n"); $webpage->print("\t\n"); $webpage->print("\t\n") ; $webpage->print("\t\n"); $webpage->print("\t\n"); $webpage->print("\t\n"); $webpage->print("\t\n"); $webpage->print("\t\n"); $webpage->print("\n"); # # Print the optional third row of the command table # This row can be defined at individual sites and contain commands # of your choice. Thanks to Ed Ravin (eravin@panix.com) for this. # &moncgi_custom_print_bar($face); # row 3, if any, for the local site $webpage->print("
Show Operational Status (summary) (failures only)Show Alert HistoryLoad scheduler stateStart schedulerList Disabled Hosts/ Watches/ SvcsTest Mon Config File
Show Operational Status   (full)Show Downtime LogSave scheduler stateStop schedulerReload auth fileList Mon PIDsReset Mon
"); } # query the server operational status ---------------------------------- sub query_opstatus { my ($detail_level) = (@_); my ($retval); # some variables for failures my (%op_success, %op_failure); my @scheduler_status = &mon_list_sched_state ; my $opstatus_table_width = "100%"; # the width of the whole opstatus tbl my $service_column_width = "40%"; # the width of the Service column my $service_legend_table_width = "70%"; # the width of the service legend table $webpage->print("
\n"); if ($scheduler_status[0] != 0) { $webpage->print ("The scheduler on $monhost:$monport is currently running. "); } else { my @sched_down_time = localtime ($scheduler_status[1]); my $pretty_sched_down_time = sprintf ("%.2d:%.2d:%.2d, %s-%s-%s\n", @sched_down_time[2, 1, 0, 3], @year_months[$sched_down_time[4]], $sched_down_time[5]+1900); if ($scheduler_status[1] != 0) { $webpage->print ("
The scheduler has been stopped since $pretty_sched_down_time.
\n"); } else { #value is undef, scheduler cannot be contacted (or auth failure) $webpage->print ("
The scheduler cannot be contacted at this time.
\n"); } } $webpage->print ("This page will reload every $reload_time seconds.
"); $webpage->print("

\n"); %op_success = &mon_list_successes; %op_failure = &mon_list_failures ; $webpage->print(""); $webpage->print ("\n"); $webpage->print ("\n"); $webpage->print ("\n"); # Give extra notification if the scheduler is down (this is a big deal!) unless ($scheduler_status[0] != 0) { if ($scheduler_status[1] != 0) { $webpage->print ("\n"); } else { $webpage->print ("\n"); } } &list_status($detail_level, %op_failure) if defined(%op_failure); if ($detail_level eq "failures") { $webpage->print ("\n") unless %op_failure; } else { &list_status($detail_level, %op_success) if defined(%op_success); } $webpage->print("
Host GroupService (legend)Last CheckedEst. Next Check
! SCHEDULER IS NOT RUNNING. RESULTS SHOWN BELOW MAY NOT BE CORRECT !
! SCHEDULER CANNOT BE CONTACTED. RESULTS SHOWN BELOW MAY NOT BE CORRECT !
No failures found.
\n");

    # Print the legend below the table
    &moncgi_print_service_table_legend ($service_legend_table_width);
}


#
# This subroutine tests whether a given group is allowed to be
# shown to the user.
# Inputs: Name of group to check
# Outputs: 1    Group is allowed to be shown
#          0    Group is not allowed to be shown
#
sub can_show_group {
    my ($group) = (@_);
    my $watch;

    # Do not print out the status for this group
    # unless it is on the "allowed" list
    if (@show_watch) {   #user defined one or more watch keywords
	#
	# Loop through each access control and look for a match
	#
	foreach $watch (@show_watch) {
	    if ( $group =~ m/^$watch$/ ) {   #we found a match
		#print STDERR "Group $group matched '$watch'\n";   #DEBUG
		return 1;
	    }
	}
    } else {   #user didn't define any watch keywords, so show everything
	return 1;
    }
    return 0;
}


sub list_status {
    # This function lists the status of all hosts and services. It is
    # kind of a mess, but this is the function that 90% of the time
    # you will be viewing, and it has to do a lot. It could still be
    # cleaned up considerably though.
# my ($detail_level, %op) = (@_); my (%group_list, $group, $service, $s, $g, $h); my (@time); my $bg_fail = $redlight_color ; my $bg_fail_noalerts = $yellowlight_color ; my $bg_ok = $greenlight_color; my $td_bg_color; my $face = $sans_serif_font_face; my %d = &mon_list_disabled ; my %desc = &mon_list_descriptions if $detail_level eq "full"; my $servertime = &mon_servertime; my (%ONDS, %ONDS_lastcheck, %ONDS_nextcheck) ; #special hashes of arrays for "OK, Non-Disabled Services" my %OPSTAT = %Mon::Client::OPSTAT; my $service_disabled_string ; my ($service_acked_string, $ackcomment) ; my $host_disabled_string ; my $watch_disabled_string ; my $failure_string; my $desc_string ; my %saw ; #used for sorting my @disabled_hosts; foreach $group (sort keys %op) { #begin group loop # Only show this group if we are allowed to see it next unless &can_show_group($group) ; if ($detail_level eq "full") { # get a list of members of the group if we haven't already # we need the defined() to deal with certain empty groups $group_list{$group} = [ &mon_list_group($group) ] unless defined(@{$group_list{$group}}); } foreach $service (sort keys %{$op{$group}}) { #begin service loop $s = \%{$op{$group}->{$service}}; $service_disabled_string = ""; $service_acked_string = ""; $host_disabled_string = ""; $watch_disabled_string = ""; $desc_string = ""; undef %saw; undef @disabled_hosts; $service_disabled_string = "(DISABLED)" if ${ d{"services"}{$group}{$service} }; # assemble the ACK message, if any. # Escape the HTML to avoid any potential nastiness if the # user requested it, otherwise, just pass it on through # as is. if ( $op{$group}{$service}{'ack'} != 0 ) { if ($untaint_ack_msgs) { # # We untaint # $ackcomment = $op{$group}{$service}{'ackcomment'} eq "" ? "(no ack msg)" : HTML::Entities::encode_entities($op{$group}{$service}{'ackcomment'}) ; } else { # # We don't untaint # $ackcomment = $op{$group}{$service}{'ackcomment'} eq "" ? 
"(no ack msg)" : $op{$group}{$service}{'ackcomment'} ; } $service_acked_string = " (ACKED: $ackcomment)" ; } foreach $g (keys %{$d{"hosts"}}) { foreach $h (keys %{$d{"hosts"}{$group}}) { push(@disabled_hosts , $h); } } # uniq and sort the returned array of disabled hosts @saw{@disabled_hosts} = (); @disabled_hosts = sort keys %saw; $host_disabled_string = join(" " , @disabled_hosts) if scalar(@disabled_hosts) > 0 ; $host_disabled_string = "
($host_disabled_string DISABLED)\n" if ($host_disabled_string ne ""); $watch_disabled_string = "(DISABLED)" if (${d{"watches"}{$group}}); $td_bg_color = ( ($watch_disabled_string ne "") || ($host_disabled_string ne "") ) ? $disabled_color : $BGCOLOR ; # Don't print the service individually if we are in brief mode # mode and the service is an ONDS. next if ( ($s->{"opstatus"} == $OPSTAT{"ok"}) && ($service_disabled_string eq "") && ($detail_level ne "full") ); # Now print the first column (the group and its status) $webpage->print("\n"); # check to see if full display was requested if ($detail_level eq "full") { $desc_string = ($desc{$group}{$service} ne "") ? " "$desc{$group}{$service}"" : " <no description specified>" ; $webpage->print("$watch_disabled_string"); $webpage->print("$group
\n("); $webpage->print(join(", ",@{$group_list{$group}})); $webpage->print(")$host_disabled_string\n"); } else { $webpage->print("\n"); $webpage->print("$watch_disabled_string"); $webpage->print("$group$host_disabled_string\n"); } # Now print the second column (the service and its status) if ($s->{"opstatus"} == $OPSTAT{"untested"}) { # for untested, don't use a bg in table cell and change # font color instead. $td_bg_color = ($service_disabled_string eq "") ? $unchecked_color : $disabled_color ; $webpage->print("\n"); $webpage->print("$service_disabled_string"); $webpage->print("${service}"); $webpage->print("
(UNCHECKED)
${desc_string}\n"); $webpage->print("\n"); } elsif ($s->{"opstatus"} == $OPSTAT{"fail"}) { # Check to see if the service has issued any alerts. # If not, then we call this service "failing" instead # of failed, on the assumption that if it hasn't # generated an alert yet, it isn't "really" important, # although you'd still like to know about it. if ( $s->{"alerts_sent"} == 0 ) { $td_bg_color = $bg_fail_noalerts ; $failure_string = "FAILED,NOALERTS" ; } else { $td_bg_color = $bg_fail ; $failure_string = "FAILED" ; # Also give the # of alerts if "full" view was selected $failure_string .= ",alerts_sent=" . $s->{"alerts_sent"} if $detail_level eq "full"; } $td_bg_color = ($service_disabled_string eq "") ? $td_bg_color : $disabled_color ; $webpage->print(""); $webpage->print("$service_disabled_string"); $webpage->print("${service}${desc_string} : \n"); $webpage->print("$s->{last_summary}\n"); $webpage->print("
($failure_string)"); $webpage->print(" ${service_acked_string}") if $service_acked_string ne ""; $webpage->print("\n"); } elsif ($s->{"opstatus"} == $OPSTAT{"ok"}) { $td_bg_color = ($service_disabled_string eq "") ? $bg_ok : $disabled_color ; $webpage->print(""); $webpage->print("$service_disabled_string"); $webpage->print("${service}${desc_string}\n"); } else { my $txt = ""; for (keys %OPSTAT) { $txt = $_ if ($s->{"opstatus"} == $OPSTAT{$_}); } $webpage->print(""); $webpage->print("${service_disabled_string}${service} (details: $txt)\n"); } if ($s->{"opstatus"} == $OPSTAT{"untested"}) { $webpage->print(""); $webpage->print("--"); $webpage->print("\n"); } else { $webpage->print(""); if ($s->{"last_check"}) { #traps have a null value for last_check # Service IS NOT a trap if ($s->{"opstatus"} == $OPSTAT{"fail"}) { #check svc status # If service is failing, print last checked time # and also print out last_OK time print &pp_sec_brief($s->{"last_check"} - $servertime); # We need to check if the var is defined, since traps # can throw us off. $webpage->print("
(Last OK: "); if ( (!defined($s->{"last_success"})) || ($s->{"last_success"} == 0) ) { #service is currently failed and does not have a last_success time defined # The event has never occurred $webpage->print ("Never") ; } else { #service is currently failed and has a last_success time defined # Pretty-print the time that the service was last OK @time = localtime ($s->{"last_success"}); $_ = $s->{"last_success"}; # Also calculate the delta and pretty-print that $s->{"last_success"} .= "(" . &pp_sec($time - $_) . " ago)"; print &pp_sec_brief($s->{"last_success"} - $servertime); } $webpage->print(")"); } else { # If service is not failing, just print last checked time print &pp_sec_brief($s->{"last_check"} - $servertime); } } else { # Service IS a trap, or has never been checked $webpage->print("--"); } $webpage->print("
\n"); } $webpage->print(""); # Handle case where service is a trap and hence last_check=undef if ( ( $s->{"timer"} == 0 ) && (!($s->{"last_check"})) ) { $webpage->print("--"); } else { $webpage->print (""); $webpage->print (&pp_sec_brief($s->{"timer"})); $webpage->print (""); $webpage->print ("  (test all on $group)") ; } $webpage->print("\n"); # The next 4 lines are the old way of printing (absolute time) #@time = localtime ($qtime + $s->{"timer"}); #$webpage->print(""); #printf("%.2d:%.2d:%.2d\n", @time[2, 1, 0] ); #$webpage->print("\n"); $webpage->print(""); } # end $service loop # # NEW: print a "compressed" version of OK, Non-Disabled Services (ONDS) # (the assumption being that ONDS's are not all that interesting, # let's not use up a lot of screen real estate discussing them) # # The whole thing is contingent upon us being in "brief" mode if ($detail_level ne "full") { # Build the array of ONDS's foreach $service (sort keys %{ $op{$group} }) { $s = \%{ $op{$group}->{$service} }; next if ( ${d{"services"}{$group}{$service}} ) ; if ($s->{"opstatus"} == $OPSTAT{"ok"}) { push (@{ $ONDS{$group} }, "$service"); if ($s->{"last_check"}) { #check to see if service is a trap # Service IS NOT a trap push (@{ $ONDS_lastcheck{$group} }, &pp_sec_brief($s->{"last_check"} - $servertime) ); } else { # Service IS a trap push (@{ $ONDS_lastcheck{$group} }, "--"); } if ( ( $s->{"timer"} == 0 ) && (!($s->{"last_check"})) ) { # Service IS a trap push (@{ $ONDS_nextcheck{$group} }, "--"); } else { # Service IS NOT a trap #push (@{ $ONDS_nextcheck{$group} }, &pp_sec_brief($s->{"timer"}) ); push (@{ $ONDS_nextcheck{$group} }, "" . &pp_sec_brief($s->{"timer"}) .""); } } } # print the OK, no disabled services for this host if any exist if ( defined(@{ $ONDS{$group} }) ) { $td_bg_color = ( ($watch_disabled_string ne "") || ($host_disabled_string ne "") ) ? 
$disabled_color : $BGCOLOR ; $webpage->print("\n"); $webpage->print(""); $webpage->print("$watch_disabled_string$group$host_disabled_string"); $webpage->print("\n"); $webpage->print(""); print join(", ", @{$ONDS{$group}}); $webpage->print("\n"); #Now print out the time(s) $webpage->print(""); print join(", ", @{$ONDS_lastcheck{$group}}); $webpage->print("\n"); $webpage->print(""); # Add the "test all" command if there is more than # one service in the group. push (@{ $ONDS_nextcheck{$group} }, "(test all on $group)") if scalar(@{ $ONDS_nextcheck{$group} }) > 1 ; print join(", ", @{$ONDS_nextcheck{$group}}); $webpage->print("\n"); $webpage->print("\n"); } } # end of ONDS printing } # end of $group loop } sub query_group { # # Print out info about the hosts in a particular hostgroup # my ($args) = (@_); my $group = &validate_name ($args); my (%hosts, $host, $retval, $c, $e, $s, $service, $status, %op, $bgcolor, $msg, $ena_checked, $dis_checked) ; my %OPSTAT = %Mon::Client::OPSTAT; my $table_width = "90%"; # Color to shade the cells depending on whether a user is allowed to # execute the disable/enable commands. my $disable_command_bgcolor = $auth_commands{'disable'}{'auth'} == 0 ? "bgcolor=\"$disabled_color\"" : "" ; my $enable_command_bgcolor = $auth_commands{'enable'}{'auth'} == 0 ? "bgcolor=\"$disabled_color\"" : "" ; my %d = &mon_list_disabled ; if (!defined $group) { $webpage->print("Invalid host group \"$group\"\n"); return undef; } @_ = &mon_list_group($group) ; # turn the array into a hash, which is what we really want foreach (@_) { $hosts{$_} = ""; } $webpage->print("
This page will reload every $reload_time seconds.
"); # Only show the rest of this page if we are allowed to see # information about this group. if ($show_watch_strict) { unless ( &can_show_group($group) ) { print $webpage->h3("You are not authorized to see detailed information for hostgroup '$group'."); print $webpage->h4("Please contact your system administrator for access."); return 0; } } $webpage->print("
Reload this page immediately.
\n");

    #############################################################
    # The table is divided into 3 sections: hostgroup, hosts,
    # and services
    #############################################################
    # Start the form and set up our defaults
    print $webpage->startform(-action=>"$url");
    $webpage->param('command','mon_state_change');
    print $webpage->hidden(-name=>'command', );
    $webpage->param('group',"$group");
    print $webpage->hidden(-name=>'group', );
    $webpage->param('h',"$monhost");
    print $webpage->hidden(-name=>'h', );
    # Set the same param name ('p') that the hidden field below emits,
    # as the other forms in this script do.
    $webpage->param('p',"$monport");
    print $webpage->hidden(-name=>'p', );

    # Start the table
    $webpage->print("
"); ############################################################# # Print the hostgroup portion of the table ############################################################# $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print("\n"); $webpage->print(""); $bgcolor = ( ${d{"watches"}{$group}} ) ? $disabled_color : $BGCOLOR; $webpage->print("") ; $dis_checked = ( ${d{"watches"}{$group}} ) ? "checked" : ""; $ena_checked = (!${d{"watches"}{$group}} ) ? "checked" : ""; $webpage->print(""); $webpage->print(""); $webpage->print("\n"); ############################################################# # Print the hosts portion of the table ############################################################# $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print(""); foreach $host (keys %hosts) { if ($host =~ /^\*/) { $host =~ s/^\*// ; #strip the * or else mon dies $hosts{$host} = "disabled"; } else { $hosts{$host} = "enabled"; } } foreach $host (sort keys %hosts) { next if ($host =~ /^\*/); #ignore disabled hosts $webpage->print(""); if ($hosts{$host} eq "disabled") { #check to see if the host is disabled # the host is currently disabled $host =~ s/^\*// ; #strip the * or else mon dies (should already be gone but just in case, this is a bad thing to happen) $webpage->print(""); $webpage->print(""); $webpage->print(""); } else { #host is not currently disabled $webpage->print(""); $webpage->print(""); $webpage->print(""); } $webpage->print( "\n") ; } ############################################################# # Print the services portion of the table ############################################################# $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print("\n"); %op = &mon_list_opstatus; foreach $service (sort keys %{$op{$group}}) { #begin service loop $s = \%{$op{$group}->{$service}}; undef $bgcolor; if (${d{"services"}{$group}{$service}}) { 
#service is disabled $bgcolor = "$disabled_color"; } $dis_checked = ( ${d{"services"}{$group}{$service}} ) ? "checked" : ""; $ena_checked = (!${d{"services"}{$group}{$service}} ) ? "checked" : ""; if ($s->{"opstatus"} == $OPSTAT{"ok"}) { #OK $bgcolor = "$greenlight_color" unless $bgcolor; $msg="(status: OK)"; } elsif ($s->{"opstatus"} == $OPSTAT{"fail"}) { unless ($bgcolor) { if ( $s->{"alerts_sent"} == 0 ) { $bgcolor = "$yellowlight_color"; $msg=": $s->{'last_summary'}
(status: FAILED,NOALERTS)"; } else { $bgcolor = "$redlight_color"; $msg=": $s->{'last_summary'}
(status: FAILED)"; } if ( $s->{'ackcomment'} ne "" ) { $msg .= "(ACKED: $s->{'ackcomment'})"; } else { $msg .= "(no ack msg)"; } } } elsif ($s->{"opstatus"} == $OPSTAT{"untested"}) { $bgcolor = "$unchecked_color" unless $bgcolor; $msg="(status: UNTESTED)"; } $webpage->print("") ; # Check whether the service is disabled or enabled $webpage->print(""); $webpage->print(""); $webpage->print("\n"); } $webpage->print(""); $webpage->print(""); $webpage->print("
Hostgroup "$group"EnabledDisabled
$group (list downtime log for hostgroup $group)
Members of hostgroup \"$group\"EnabledDisabled
$host (DISABLED)$host
Services monitored on hostgroup \"$group\"
(test all services on hostgroup $group)
EnabledDisabled
$service $msg
    (list downtime log for $group:$service)
") ; $webpage->print("
    (test service $service on group $group immediately)
"); print $webpage->reset(-name=>'Cancel Changes'); $webpage->print("
"); print $webpage->submit(-name=>'Apply Changes'); $webpage->print("
"); print $webpage->end_form(); } sub end_page { # End the document with a footer and contact info &print_bar; if ($monadmin ne "") { $webpage->print("
For questions about this server,
contact $monadmin
mon.cgi v$VERSION
"); } print $webpage->end_html; } sub list_alerthist { # This function lists the alert history formatted in a table my @l = &mon_list_alerthist ; my ($line, $localtime); my $table_width = "80%"; $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); foreach $line (reverse sort {$a->{"time"} <=> $b->{"time"}} (@l)) { # Only show the alert if we are allowed to see information # about this group. if ($show_watch_strict) { next unless &can_show_group($line->{group}); } $localtime = localtime ($line->{"time"}); $webpage->print(""); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $line->{"alert"} =~ s{^.*\/([^/]*)$}{$1}; $webpage->print("\n"); my $args = "-"; if ($line->{"args"} !~ /^\s*$/) { $args = $line->{"args"}; } $webpage->print("\n"); $webpage->print(""); $webpage->print("\n"); } $webpage->print("
GroupServiceTypeTimeAlertArgsSummary
print("args=$line->{group}\">$line->{group}$line->{service}$line->{type}$localtime$line->{alert}$args$line->{summary}
\n"); print $webpage->hr; } sub svc_details { # Lists details about a particular alert's status, regardless of # whether the alert is successful or not. # As of 1.40, this function has been expanded considerably, and # has also been given the benefit of additional verbiage from the # global variables %alert_vars, @time_based_alert_vars, and # @pp_sec_alert_vars # my ($arg) = (@_); my ($group, $service) = split (/\,/, $arg); # Only show the rest of this page if we are allowed to see # information about this group. if ($show_watch_strict) { unless ( &can_show_group($group) ) { print $webpage->h3("You are not authorized to see detailed information for hostgroup '$group'."); print $webpage->h4("Please contact your system administrator for access."); return 0; } } my $status; my $retval; my (@pids, $server, $acknowledge_string, $name_string, $ackcomment_default); my (%op, $s, $g, $var, @time); my $table_width = "90%"; #let's give both tables the same width my $font_color; my %d = &mon_list_disabled; my %desc = &mon_list_descriptions; my $servertime = &mon_servertime; my @group_members = &mon_list_group($group) ; my $time_now = $servertime; my $enable_command_bgcolor = $auth_commands{'enable'}{'auth'} == 0 ? "bgcolor=\"$disabled_color\"" : "" ; my $disable_command_bgcolor = $auth_commands{'disable'}{'auth'} == 0 ? "bgcolor=\"$disabled_color\"" : "" ; my $test_command_bgcolor = $auth_commands{'test'}{'auth'} == 0 ? "bgcolor=\"$disabled_color\"" : "" ; my $ack_command_bgcolor = $auth_commands{'ack'}{'auth'} == 0 ? "bgcolor=\"$disabled_color\"" : "" ; my $disabled_hosts_string ; # Determine whether the service is failing or not # We'll be optimistic and assume it's NOT failing :) # Actually, unchecked services also show up as "ok". %op = &mon_list_successes; if ($op{$group}{$service}) { $status = "ok"; } else { $status = "fail"; %op = &mon_list_failures; } $webpage->print("
This page will reload every $reload_time seconds.
"); print $webpage->hr; $webpage->print("
"); if (${d{"services"}{$group}{$service}}) { #service is disabled $webpage->print ("Test detail for disabled service $service in group $group \n"); } elsif ($op{$group}{$service}{'opstatus'} == 1) { #OK $webpage->print ("Success detail for group $group\n"); $webpage->print ("and service test $service "); } elsif ($op{$group}{$service}{'opstatus'} == 7) { # service is Untested $webpage->print ("Test detail for group $group and service test $service"); $webpage->print ("
(service is UNCHECKED!)"); } else { # service is Failed (or some other opstatus code I don't know about) # If a service fails, print the detail and summary information # at the top. Yes, it's buried down in the middle of the page, but # it's important enough to take up screen real estate with and # bring it to the top. $font_color = ($op{$group}{$service}{'alerts_sent'} == 0) ? $yellowlight_color : $redlight_color ; $webpage->print("\n"); $webpage->print(""); # Now print the detail and summary information for the failed service $op{$group}->{$service}->{'last_summary'} = "<not specified>" if $op{$group}->{$service}->{'last_summary'} eq "" ; $op{$group}->{$service}->{'last_detail'} = "<not specified>" if $op{$group}->{$service}->{'last_detail'} eq "" ; $op{$group}->{$service}->{'last_detail'} =~ s/\n/
/g; $webpage->print("\n"); $webpage->print("\n"); $webpage->print("
"); $webpage->print("
"); $webpage->print ("Failure detail for group $group "); $webpage->print ("and service test $service: \n"); $webpage->print("
"); $webpage->print("
Failure summary:$op{$group}->{$service}->{'last_summary'}
Failure detail:$op{$group}->{$service}->{'last_detail'}
"); } # Issue warning if the whole group has been disabled. if ( ${d{"watches"}{$group}} ) { $webpage->print ("
(NOTE: group $group is disabled.)"); } # Issue warning if any hosts in the group are currently disabled. foreach (@group_members) { if ($_ =~ s/^\*//) { $disabled_hosts_string = " $_" . $disabled_hosts_string ; } } $disabled_hosts_string = "
(NOTE: The following host(s) in group $group are disabled: $disabled_hosts_string)" if $disabled_hosts_string ne "";
    $webpage->print ($disabled_hosts_string);

    # Check to see if a test for this service is currently running
    # and report back if it is (because the status might not have updated
    # yet...)
    @pids = &mon_list_pids;
    # mon_list_pids returns undef on failure. defined(@array) is a fatal
    # error in Perl 5.22 and later, so test the first element (the server
    # PID) instead.
    if (!defined($pids[0])) {
        $webpage->print("Unable to determine whether this service is currently being tested (list_pids failed)!\n");
    } else {
        shift @pids;   #discard server PID, we don't need it
        for (@pids) {
            if ( ($_->{"watch"} eq $group) && ($_->{"service"} eq $service) ) {
                # we have a match, a monitor is currently running for
                # this group and this service.
                $webpage->print("
NOTE: A monitor for this service is currently running as PID $_->{'pid'}. Results/opstatus codes might change when this test finishes!\n"); } } } $webpage->print("
Reload this page immediately.
"); $webpage->print("
"); print $webpage->br; $webpage->print(""); $webpage->print ("\n"); if (${d{"services"}{$group}{$service}}) { #service is disabled, offer to enable $webpage->print("\n"); } else { #service is enabled, offer to disable $webpage->print("\n"); } $webpage->print (""); $webpage->print("\n"); $webpage->print("
Test service $service on group $group immediately(ENABLE service $service in group $group)(DISABLE service $service in group $group)List downtime log for service $service and group $group
\n"); # If the service is in a failure state, offer to ack it. if ($status eq "fail") { $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print("\n"); $webpage->print("
"); if ($op{$group}{$service}{'ack'} != 0) { # Service has already been acked, offer to re-ack $acknowledge_string = "Re-acknowledge this failure:
(changes the acknowledgement message)
"; $ackcomment_default = "Was:\"$op{$group}{$service}{'ackcomment'}\""; } else { # Service has not yet been acked, offer to ack $acknowledge_string = "Acknowledge this failure:
(disables all subsequent alerts for this failure period)
"; $ackcomment_default = "${name_string}"; } $webpage->print("$acknowledge_string "); $webpage->print("
"); # It is crucial that we reset the param value, otherwise the # value of the default will be ignored. print $webpage->startform(-action=>"$url"); $webpage->param('ackcomment', $ackcomment_default); $webpage->param('h',"$monhost"); print $webpage->hidden(-name=>'h'); $webpage->param('p',"$monport"); print $webpage->hidden(-name=>'p'); print $webpage->textfield(-name=>'ackcomment', -value=>"$ackcomment_default", -size=>40, ); # The textarea is nice, but you lose the ability to hit ENTER to # submit your ack, which I really really like. # print $webpage->textarea(-name=>'ackcomment', # -value=>"$ackcomment_default", # -rows=>2, # -wrap=>'soft', # -columns=>40, # ); $webpage->print("  "); # We also have to reset this param value $webpage->param('command','mon_ack'); print $webpage->hidden(-name=>'command', ); $webpage->param('args',"$group,$service"); # We also have to reset this param value print $webpage->hidden(-name=>'args', ); print $webpage->submit(-name=>'ack'); print $webpage->end_form(); $webpage->print("
\n"); } $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); my $desc_string = ($desc{$group}{$service} ne "") ? ""$desc{$group}{$service}"" : "<no description specified>" ; $webpage->print("\n"); foreach $var (sort {$b cmp $a } (keys %{$op{$group}{$service}}) ) { # Special cases where we have a time formatted in secs # since 1970, we'll make it look purty if (grep /^${var}$/, @time_based_alert_vars) { # We need to check if the var is defined, since traps # can throw us off. if ( (!defined($op{$group}->{$service}->{$var})) || ($op{$group}->{$service}->{$var} == 0) ) { # The event has never occurred $op{$group}->{$service}->{$var} = "Never" ; } else { # Pretty-print the time @time = localtime ($op{$group}->{$service}->{$var}); $_ = $op{$group}->{$service}->{$var}; $op{$group}->{$service}->{$var} = sprintf ("%.2d:%.2d:%.2d, %s-%s-%s\n", @time[2, 1, 0, 3], @year_months[$time[4]], $time[5]+1900); # Also calculate the delta and pretty-print that $op{$group}->{$service}->{$var} .= "(" . &pp_sec($time_now - $_) . " ago)"; } } elsif ($op{$group}->{$service}->{$var} eq "") { # special case where value of $var is empty #(i.e. mon has never seen the service fail) $op{$group}->{$service}->{$var} = "-"; } elsif (grep /^${var}$/, @pp_sec_alert_vars) { if ( (!defined($op{$group}->{$service}->{$var})) || ($op{$group}->{$service}->{$var} == 0) ) { # The event has never occurred $op{$group}->{$service}->{$var} = "Never" ; } else { # Special case where time is a duration and # should be pretty printed. $op{$group}->{$service}->{$var} = &pp_sec($op{$group}->{$service}->{$var}); } } $op{$group}->{$service}->{$var} =~ s/\n/
/g; $webpage->print("\n"); } $webpage->print("
Variable Description (name)Value
Service Description$desc_string
$alert_vars{$var} ($var)$op{$group}->{$service}->{$var}
\n"); print $webpage->hr; } sub list_disabled { # This function lists all the disabled watches, services, and hosts # and returns the result as pretty(?) HTML my (%d, $group, $service, $host, $watch); my (@disabled_hosts, @disabled_svcs); my $enable_command_bgcolor = $auth_commands{'enable'}{'auth'} == 0 ? "bgcolor=\"$disabled_color\"" : $BGCOLOR ; %d = &mon_list_disabled; print $webpage->hr; $webpage->print("
\n"); # Start the form and set up our defaults print $webpage->startform(-action=>"$url"); $webpage->param('command','mon_state_change_enable_only'); print $webpage->hidden(-name=>'command', ); $webpage->param('h',"$monhost"); print $webpage->hidden(-name=>'h'); $webpage->param('p',"$monport"); print $webpage->hidden(-name=>'p'); $webpage->print("\n"); $webpage->print( ""); $webpage->print(""); $webpage->print(""); $webpage->print( "\n"); $webpage->print( ""); if ( scalar(keys %{$d{"watches"}}) > 0 ) { for (keys %{$d{"watches"}}) { $webpage->print(""); $webpage->print("\n"); } } else { $webpage->print("\n"); } # # Disabled hosts portion of table # $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print( "\n"); foreach $group (keys %{$d{"hosts"}}) { foreach $host (keys %{$d{"hosts"}{$group}}) { push(@disabled_hosts,""); push(@disabled_hosts,""); } } if (scalar(@disabled_hosts) > 0 ) { print join("\n", @disabled_hosts); } else { $webpage->print(""); $webpage->print(""); $webpage->print("\n"); } # # Disabled services portion of table # $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print("\n"); foreach $watch (keys %{$d{"services"}}) { foreach $service (keys %{$d{"services"}{$watch}}) { push(@disabled_svcs, ""); push(@disabled_svcs,""); } } if (scalar(@disabled_svcs) > 0 ) { print join("\n", @disabled_svcs); } else { $webpage->print("\n"); } $webpage->print(""); $webpage->print("\n"); $webpage->print("

Disabled Watches

Enable?

$_
<NONE>

Disabled Hosts

Enable?

$group:$host
<NONE>

Disabled Services

Enable?

$watch:$service
<NONE>
"); print $webpage->reset(-name=>'Cancel Changes'); $webpage->print("
"); print $webpage->submit(-name=>'Apply Changes'); $webpage->print("
\n"); print $webpage->end_form(); $webpage->print("
\n");
}


sub list_dtlog {
    # Accepts a single comma-separated argument whose fields are all optional:
    # (group,service,sortby,first log entry to show,last log entry to show)
    # where sortby is the key to sort by.
    #
    # Default sort key is failtime.
    #
    # If group or service is null, shows detail about all failures
    # matching the other (non-null) field.
    #
    # Default first log entry to show is 1.
    #
    # Default last log entry to show is
    # (first log entry) + $dtlog_max_failures_per_page - 1.
    #
    # No arguments means show all service failures for all groups,
    # sorted by failtime.
    #
    # Someday it would be nice to take args like time ranges, etc., but
    # that capability should really be built into Mon itself since it
    # is such a useful feature and something which could leverage a lot
    # of mon's timeperiod work as well.
    #
    # Original patch by Martha H Greenberg
    #
    my ($arg) = (@_);
    my ($group, $service,$sortby,$dtlog_begin,$dtlog_end);
    if ( defined($arg) ) {
        ($group, $service,$sortby,$dtlog_begin,$dtlog_end) = split (/\,/, $arg) ;
    } else {
        $group = "";
        $service = "";
        $sortby = "" ;
        $dtlog_begin = "";
        $dtlog_end = "";
    }
    $dtlog_begin = 1 unless ( ($dtlog_begin) && ($dtlog_begin > 0) );
    $dtlog_end = ($dtlog_begin + $dtlog_max_failures_per_page - 1) unless ( ($dtlog_end) && ($dtlog_end > 0) );
    $sortby = "failtime" unless ($sortby);
    my $face = $sans_serif_font_face;
    my $summary_table_width = "80%";
    my $dt_table_width = "100%";
    my $i;

    # This hash keeps track of the sortby keys and what
    # their descriptions map to.
my %sortby_key = ("group" => "Group", "service" => "Service", "failtime" => "Service Failure Begin Time", "timeup" => "Service Failure End Time", "downtime" => "Total Observed Failure Time", "interval" => "Testing Interval", "summary" => "Summary", ); print $webpage->hr; my ($line, $localtimeup, $localfailtime, $ppdowntime, $ppinterval); my ($first_failure_time, $total_failures, $mtbf, $mean_recovery_time, $median_recovery_time, $std_dev_recovery_time, $min_recovery_time, $max_recovery_time, @l) = &mon_list_dtlog($group, $service) ; my ($ppfft, $ppmtbf, $ppmean_recovery_time, $ppmedian_recovery_time, $ppmin_recovery_time, $ppmax_recovery_time, $ppstd_dev_recovery_time); # Estimated uptime calculation my $time_now = time; my $approx_uptime_pct = ( ( ($time_now - $first_failure_time + $mtbf ) > 0) && ( ($time_now - $first_failure_time + $mtbf - scalar(@l) * $mean_recovery_time > 0 ) ) ) ? sprintf("%.2f%", ( ( ($time_now - $first_failure_time + $mtbf) - (scalar(@l) * $mean_recovery_time) ) / ($time_now - $first_failure_time + $mtbf) ) * 100 ) : "<not applicable>"; $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $ppfft = $first_failure_time; my @fftime = localtime ($first_failure_time); $ppfft = sprintf ("%s, %s %d, %.2d at %.2d:%.2d:%.2d", @days_of_week[$fftime[6]], @year_months[$fftime[4]], $fftime[3], $fftime[5] + 1900 , @fftime[2,1,0] ); $webpage->print("\n\n"); $webpage->print("\n"); $webpage->print(""); $webpage->print("\n\n"); $webpage->print("\n"); $webpage->print(""); $webpage->print("\n"); $ppmtbf = &pp_sec($mtbf); $webpage->print("\n"); $webpage->print("\n"); $webpage->print(""); $webpage->print("\n"); $ppmean_recovery_time = &pp_sec($mean_recovery_time); $webpage->print("\n"); $webpage->print("\n"); $webpage->print(""); $webpage->print("\n"); $ppmedian_recovery_time = &pp_sec($median_recovery_time); $webpage->print("\n"); $webpage->print("\n"); $webpage->print(""); $webpage->print("\n"); 
$ppstd_dev_recovery_time = &pp_sec($std_dev_recovery_time); $webpage->print("\n"); $webpage->print("\n"); $webpage->print(""); $webpage->print("\n"); $ppmin_recovery_time = &pp_sec($min_recovery_time); $webpage->print("\n"); $webpage->print("\n"); $webpage->print(""); $webpage->print("\n"); $ppmax_recovery_time = &pp_sec($max_recovery_time); $webpage->print("\n"); $webpage->print("\n"); $webpage->print(""); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("
\n"); $webpage->print("Downtime Summary For Hostgroup "); if ($group eq "") { $webpage->print("<any>"); } else { $webpage->print(""$group""); } $webpage->print(" and Service "); if ($service eq "") { $webpage->print("<any>"); } else { $webpage->print(""$service""); } $webpage->print("
Log begins at:$ppfft
Total observed service failures:$total_failures
Mean time between service failures:$ppmtbf
Mean observed service failure time:$ppmean_recovery_time
Median observed service failure time:$ppmedian_recovery_time
Standard deviation of observed service failure times:$ppstd_dev_recovery_time
Minimum observed service failure time:$ppmin_recovery_time
Maximum observed service failure time:$ppmax_recovery_time
Approximate percentage of time in failure-free operation:$approx_uptime_pct
\n"); return 0 if scalar(@l) == 0; # stop if we returned no downtime events ($dtlog_begin, $dtlog_end) = &moncgi_list_dtlog_navtable ($url, $group, $service, $sortby, $dtlog_begin, $dtlog_end, $total_failures, scalar(@l), %sortby_key) ; # Start printing the actual downtime log table. # Print the header as a table with a thicker border $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); # default sort order is "failtime", so no need to resort if ($sortby ne "failtime") { # do a forward-alphanumeric or reverse-numeric sort, # depending on the sortby parameter if ( ($sortby eq "group") || ($sortby eq "service") || ($sortby eq "summary") ) { @l = (sort {$a->{"$sortby"} cmp $b->{"$sortby"}}(@l)); } else { @l = (reverse sort {$a->{"$sortby"} <=> $b->{"$sortby"}}(@l)); } } # Now print the rows of the downtime table for ( $i = $dtlog_begin ; $i <= $dtlog_end ; $i++ ) { $line = $l[$i-1]; $webpage->print(""); $webpage->print(""); $webpage->print("\n"); $localfailtime = localtime ($line->{"failtime"}); $webpage->print("\n"); $localtimeup = localtime ($line->{"timeup"}); $webpage->print("\n"); $ppdowntime = &pp_sec ($line->{"downtime"}); $webpage->print("\n"); $ppinterval = &pp_sec ($line->{"interval"}); $webpage->print("\n"); $webpage->print(""); $webpage->print("\n"); } $webpage->print("
Entry$sortby_key{\"group\"}$sortby_key{\"service\"}$sortby_key{\"failtime\"}$sortby_key{\"timeup\"}$sortby_key{\"downtime\"}$sortby_key{\"interval\"}$sortby_key{\"summary\"}
$i{\"group\"}\">"); $webpage->print("$line->{\"group\"}{\"group\"},$line->{\"service\"}\">"); $webpage->print("$line->{\"service\"}$localfailtime$localtimeup$ppdowntime$ppinterval$line->{\"summary\"}
\n"); &moncgi_list_dtlog_navtable ($url, $group, $service, $sortby, $dtlog_begin, $dtlog_end, $total_failures, scalar(@l), %sortby_key) ; print $webpage->hr; undef @l; } sub list_pids { my $retval; my @pids = &mon_list_pids ; print $webpage->hr; print $webpage->h2("List of mon PID's:"); $webpage->print("Unable to list PID's on server!
\n") if !defined($pids[0]);   # defined(@pids) is a fatal error in Perl 5.22+
    my $server = shift @pids;
    $webpage->print ("Server PID is $server

\n"); if ( scalar(@pids) > 0 ) { $webpage->print("PID's of currently active monitors:
\n"); $webpage->print("
\n"); for (@pids) { $webpage->print (join ("", "\n", )); } $webpage->print("
HostgroupServicePID
", $_->{"watch"}, "", $_->{"service"}, "", $_->{"pid"}, "
\n"); } else { $webpage->print("<No monitors are running at this time>
\n"); } } ############################################################### # Mon-specific functions. These all have the common feature # that they connect to a Mon server and retrieve some data. # Generally these functions are called by the Presentation functions, # or if they are called directly they do no special output # formatting. ############################################################### sub mon_connect { # Performs the basic connection, and if necessary, authentication, # to a mon server. # If successful, returns a 1 # If unsuccessful because of login failure, returns -1 # If unsuccessful because of other reasons, returns 0 my $retval; # # If we're not connected, we need to connect and possibly authenticate # if ( (! defined($c->connected()) ) || ( $c->connected() == 0 ) ) { $c->connect(); if ($c->error) { $retval = $c->error; print "mon_connect: Could not contact mon server "$monhost": $retval \n" if $connect_failed == 0 ; $connect_failed = 1; #set the global $connect_failed var return 0; } # # Test to see if login is required, and if so, # set the username and password. # if ($must_login) { #print "
Login is "$loginhash{\"username\"}" and password is "$loginhash{\"password\"}"\n
"; # # Test to see if username and password are blank. # If so, then use the default username/password. # if ( ( ( ! defined($loginhash{"username"}) ) && ( ! defined($loginhash{"password"}) ) ) || ( ( $loginhash{"username"} eq "" ) && ( $loginhash{"password"} eq "" ) ) ) { # Login is required but no username/password was given, so # try the login and password to the default account $loginhash{"username"} = $default_username ; $loginhash{"password"} = $default_password ; } $c->login(%loginhash); #print "connected as user $loginhash{'username'}\n"; #DEBUG } } if ( ($must_login) && ( defined($c->error) ) && ( ($c->error =~ /530 login unsuccessful/) || ($c->error =~ /no password/) || ($c->error =~ /no username/) ) ) { # Login was required and unsuccessful, present the authform # if it hasn't already been presented. Since some presentation # functions call multiple methods you could very easily # end up with multiple prompts, which is confusing to the # user. &moncgi_authform ($command,"$args") unless $has_prompted_for_auth; $has_prompted_for_auth = 1; return -1; } return 1; } sub mon_views { my ($viewreq); my $conn = &mon_connect ; return 0 if $conn == 0; @views = $c->list_views (); print STDERR "list_views failed" if ($c->error); $viewreq = $webpage->param(-name=>"setview"); if ($viewreq && ($viewreq eq '--all--' || grep(/^$viewreq$/, @views))) { $vcookie = $webpage->cookie(-name=>"$vcookie_name", -value=>"$viewreq", -expires=>"+1y", -path=>"$vcookie_path", ); $curview = $viewreq; } else { $curview = $webpage->cookie(-name=>"$vcookie_name"); } if ($curview && $curview ne '--all--') { $c->setview($curview); } return 1; } sub mon_list_group { # List all the hosts in a given group. Returns an array of hosts # if successful, or undef if failure. 
    my ($group) = (@_);
    my (@hosts, $retval);
    my $conn = &mon_connect ;
    return 0 if $conn == 0;

    @hosts = $c->list_group ($group);
    unless ($c->error) {
        return @hosts ;
    } else {
        $retval = $c->error;
        $webpage->print ("Could not list groups on mon server \"$monhost\": $retval");
        return undef;
    }
}


sub mon_list_watch {
    # List all the watches. Returns an array of defined watch groups and
    # services if successful, or undef if failure.
    my ($group) = (@_);
    my (@hosts, $retval);
    my $conn = &mon_connect ;
    return 0 if $conn == 0;

    @hosts = $c->list_watch ($group);
    unless ($c->error) {
        return @hosts ;
    } else {
        $retval = $c->error;
        $webpage->print ("Could not list watches on mon server \"$monhost\": $retval");
        return undef;
    }
}


sub mon_list_failures {
    # This function returns a hash of failures.
    my (%op, $retval);
    my $conn = &mon_connect ;
    return 0 if $conn == 0;

    %op = $c->list_failures();
    if ($c->error) {
        $retval = $c->error;
        print "Could not execute list failures command on mon server \"$monhost\": $retval (perhaps you don't have permissions in auth.cf?)\n";
        &moncgi_switch_user($retval);
    } else {
        #print "
list_failures command executed successfully
\n"; } return %op; } sub mon_list_successes { # This function returns a hash of successes my (%op, $retval); my $conn = &mon_connect ; return 0 if $conn == 0; %op = $c->list_successes(); if ($c->error) { $retval = $c->error; print "Could not execute list successes command on server "$monhost": $retval (perhaps you don't have permissions in auth.cf?)\n"; &moncgi_switch_user($retval); } return %op; } sub mon_list_opstatus { # This function returns a hash of opstatus # It should accept an optional anonymous array argument, of # [group, service ...] to only return # the opstatus of the group/service pairs you are interested in. # # But it doesn't because I am lazy and don't need this feature # right now :) #my ($criteria) = @_; my (%op, $retval); my $conn = &mon_connect ; return 0 if $conn == 0; %op = $c->list_opstatus(); if ($c->error) { $retval = $c->error; print "Could not execute list opstatus command on server "$monhost":
$retval (perhaps you don't have permissions in auth.cf?)
\n"; &moncgi_switch_user($retval); } return %op; } sub mon_list_disabled { # This function lists all the disabled watches, services, and hosts # and returns the result as a hash my (%d, $retval); my $conn = &mon_connect ; return 0 if $conn == 0; %d = $c->list_disabled(); if ($c->error) { $retval = $c->error; print "Could not execute list disabled command mon server on server "$monhost":
$retval (perhaps you don't have permissions in auth.cf?)
\n"; &moncgi_switch_user($retval); } else { #print "
list_disabled command executed successfully
\n"; } return %d; } sub mon_reload { # Reload mon config file. # Right now the only option supported is to reload the auth.cf file. # my ($what) = (@_); print $webpage->hr; print $webpage->h2("Reloading mon..."); my $retval; my $conn = &mon_connect ; return 0 if $conn == 0; $retval = $c->reload($what); if ($c->error) { $retval = $c->error; print "Could not reload "$what" mon server on server "$monhost":
$retval (perhaps you don't have permissions in auth.cf?)
\n");
        &moncgi_switch_user($retval);
    } else {
        print "mon server on \"$monhost\" successfully reloaded.\n";
    }
}


sub mon_loadstate {
    # A simple wrapper function that calls mon_loadstate_savestate with
    # the proper arguments.
    my ($state) = (@_);
    $state = "disabled" if $state eq "";
    print $webpage->hr;
    print $webpage->h2("Loading saved state for $state...");
    &mon_loadstate_savestate("load",$state);
}


sub mon_savestate {
    # A simple wrapper function that calls mon_loadstate_savestate with
    # the proper arguments.
    my ($state) = (@_);
    $state = "disabled" if $state eq "";
    print $webpage->hr;
    print $webpage->h2("Saving current state for $state...");
    &mon_loadstate_savestate("save",$state);
}


sub mon_loadstate_savestate {
    # Loads or saves state of a mon object specified by $state. Currently
    # the only object supported by mon for loading/saving state is the
    # state of the scheduler, so using this function with $state="" will
    # load or save the state of the scheduler.
    #
    # The load/save action is specified by the variable $action
    my ($action, $state) = (@_);
    my $retval;
    my $conn = &mon_connect ;
    return 0 if $conn == 0;

    if ($action eq "save") {
        ($retval = $c->savestate($state)) || ($retval = $c->error);
    } elsif ($action eq "load") {
        ($retval = $c->loadstate($state)) || ($retval = $c->error);
    }

    if ($c->error ne "") {
        print "Could not $action state on mon server for state \"$state\" on server \"$monhost\":
$retval (perhaps you don't have permissions in auth.cf?)
\n");
        &moncgi_switch_user($retval);
    } else {
        print "$action state succeeded on mon server \"$monhost\".\n";
    }
}


# Stop or start scheduler ---------------------------------------------
sub mon_schedctl {
    # Either stops or starts the scheduler, depending on how it was called,
    # either with the "stop" argument or the "start" argument.
    # No return value.
    my ($action) = (@_);
    print $webpage->hr;
    print $webpage->h2("MON: $action scheduler...");
    my $retval;
    my $conn = &mon_connect ;
    return 0 if $conn == 0;

    if ($action eq "stop") {
        $retval = $c->stop();
    } elsif ($action eq "start") {
        $retval = $c->start();
    }

    if ($c->error) {
        $retval = $c->error;
        print "Could not $action mon server on server \"$monhost\":
$retval (perhaps you don't have permissions in auth.cf?)
\n"; &moncgi_switch_user($retval); } else { print "$action scheduler on mon server on "$monhost" succeeded.\n"; } } sub mon_list_pids { my $retval; my @pids; my $conn = &mon_connect ; #test return 0 if $conn == 0; @pids = $c->list_pids; # my $server = shift @pids; if ($c->error) { $retval = $c->error; print "Could not list pids on mon server "$monhost": $retval (perhaps you don't have permissions in auth.cf?)\n"; &moncgi_switch_user($retval); return undef; } else { return @pids; } } sub mon_list_descriptions { # This subroutine executes the list_descriptions() routine and, # if successful, returns a hash of service descriptions, indexed # by watch and service. Returns undef on failure. # my $retval; my %desc; my $conn = &mon_connect ; return 0 if $conn == 0; %desc = $c->list_descriptions; if ($c->error) { $retval = $c->error; print "Could not list descriptions on mon server "$monhost":
$retval (perhaps you don't have permissions in auth.cf?)
\n";
        &moncgi_switch_user($retval);
        return undef;
    } else {
        return %desc;
    }
}

# Enable a disabled host/watch/service ----------------------------------
sub mon_enable {
    my ($arg) = (@_);
    my ($type, $arg1, $arg2) = split (/\,/, $arg);
    print $webpage->h2("Enabling service...");
    my $retval;

    # Only show the rest of this page if we are allowed to see
    # information about this group. For type "host", $arg1 is a host
    # name rather than a hostgroup, so the group check is skipped.
    # (Written as "ne" because "! $type eq ..." negates $type first
    # and so never matches.)
    if ( ($show_watch_strict) && ( $type ne "host") ) {
        unless ( &can_show_group($arg1) ) {
            print $webpage->h3("You are not authorized to see detailed information for hostgroup '$arg1'.");
            print $webpage->h4("Please contact your system administrator for access.");
            return 0;
        }
    }

    my $conn = &mon_connect ;
    return 0 if $conn == 0;

    if ($type eq "service") {
        $retval = $c->enable_service($arg1, $arg2);
    } elsif ($type eq "host") {
        $retval = $c->enable_host($arg1);
    } elsif ($type eq "watch") {
        $retval = $c->enable_watch($arg1);
    }

    if ($c->error) {
        $retval = $c->error;
        print "Could not successfully execute command enable_${type} with arguments \"$arg1\" and \"$arg2\" on server \"$monhost\":
$retval (perhaps you don't have permissions in auth.cf?)
\n"; &moncgi_switch_user($retval); return 0; } else { print "enable_${type} succeeded for "; if ($type eq "service") { print "
watch $arg1, service $arg2"; } elsif ($type eq "host") { print "
host $arg1"; } elsif ($type eq "watch") { print "
watch $arg1"; } print "
\n";
    }
    return 1;
}

# Disable an enabled host/watch/service ---------------------------------
sub mon_disable {
    my ($arg) = (@_);
    my ($type, $arg1, $arg2) = split (/\,/, $arg);
    print $webpage->h2("Disabling service...");
    my $retval;

    # Only show the rest of this page if we are allowed to see
    # information about this group. For type "host", $arg1 is a host
    # name rather than a hostgroup, so the group check is skipped.
    # (Written as "ne" because "! $type eq ..." negates $type first
    # and so never matches.)
    if ( ($show_watch_strict) && ( $type ne "host") ) {
        unless ( &can_show_group($arg1) ) {
            print $webpage->h3("You are not authorized to see detailed information for hostgroup '$arg1'.");
            print $webpage->h4("Please contact your system administrator for access.");
            return 0;
        }
    }

    my $conn = &mon_connect ;
    return 0 if $conn == 0;

    if ($type eq "service") {
        $retval = $c->disable_service($arg1, $arg2);
    } elsif ($type eq "host") {
        $retval = $c->disable_host($arg1);
    } elsif ($type eq "watch") {
        $retval = $c->disable_watch($arg1);
    }

    if ($c->error) {
        $retval = $c->error;
        print "Could not successfully execute command disable_${type} with arguments \"$arg1\" and \"$arg2\" on server \"$monhost\":
$retval (perhaps you don't have permissions in auth.cf?)
\n"; &moncgi_switch_user($retval); } else { print "disable_${type} succeeded for "; if ($type eq "service") { print "
watch $arg1, service $arg2"; } elsif ($type eq "host") { print "
host $arg1"; } elsif ($type eq "watch") { print "
watch $arg1"; } print "
\n";
    }
}

sub mon_test_service {
    # Test a service immediately.
    # Accepts as arguments a group and a service, and optionally a
    # test argument, which can be monitor, alert, upalert, or
    # startupalert.
    # Default test is "monitor".
    my ($arg) = (@_);
    my ($group, $service,$test) = split (/\,/, $arg);

    # Only show the rest of this page if we are allowed to see
    # information about this group.
    if ($show_watch_strict) {
        unless ( &can_show_group($group) ) {
            print $webpage->h3("You are not authorized to see detailed information for hostgroup '$group'.");
            print $webpage->h4("Please contact your system administrator for access.");
            return 0;
        }
    }

    # $test is undefined when only a group and a service were passed in
    $test = "monitor" if !defined($test) || $test eq "";
    print $webpage->h2("Performing $test test on service $service in hostgroup $group...");
    my $retval;
    my $conn = &mon_connect ;
    return 0 if $conn == 0;

    $retval = $c->test($test, $group, $service);

    if ($c->error) {
        $retval = $c->error;
        print "Could not successfully execute command \"$test\" test service \"$service\" on hostgroup \"$group\" on server \"$monhost\": $retval (perhaps you don't have permissions in auth.cf?)\n";
        &moncgi_switch_user($retval);
    } else {
        print "test $test completed for service $service and hostgroup $group:


"; print " $retval
\n"; } } sub mon_test_config { # Test the mon config file immediately. # Takes no argument (there is only one config file after all) # print $webpage->h2("Testing the syntax of your mon config file..."); my $retval; my $conn = &mon_connect ; return 0 if $conn == 0; my @s = $c->test_config; if ($c->error) { $retval = $c->error; if ( $retval !~ /^520 test config completed/ ) { # command not authorized print "Could not successfully execute command "test config" on server "$monhost":
$retval (perhaps you don't have permissions in auth.cf?)
\n"; &moncgi_switch_user($retval); } elsif ( (defined ($s[0])) && ($s[0] == 0) ) { # there are config file errors $webpage->print("Error in config file found:
" . $s[1] . "\n\n
"); $webpage->print("Please note that you may have other errors in your configuration file, but the checking stops after the first one is found.
"); } else { # some other error occurred print "Could not successfully execute command "test config" on server "$monhost":
$retval (perhaps you don't have permissions in auth.cf?)
\n"; &moncgi_switch_user($retval); } } else { $webpage->print("Test config completed OK, no errors were found in your config file

\n"); } } # Reset mon ---------------------------------------------------- sub mon_reset { ($args) = (@_); print $webpage->hr; print $webpage->h2("Reset mon..."); my $retval; my $conn = &mon_connect ; return 0 if $conn == 0; if ( $args eq "keepstate" ) { # specify we want to keep the scheduler state $retval = $c->reset($args); } else { # reset scheduler state, don't give reset any arguments $retval = $c->reset(); } if ($c->error) { $retval = $c->error; $webpage->print ("Could not reset mon server on server "$monhost": $retval (perhaps you don't have permissions in auth.cf?)\n"); &moncgi_switch_user($retval); } else { $webpage->print ("mon server on "$monhost" successfully reset.

"); if ( $args eq "keepstate" ) { $webpage->print ("Scheduler state was NOT reset.

All previously disabled hosts/groups/services are still disabled."); } else { $webpage->print ("Scheduler state was reset.

All hosts/groups/services are now enabled."); } $webpage->print ("
\n"); } } # List alert history -------------------------------------------------- sub mon_list_alerthist { print $webpage->hr; print $webpage->h2("Alert History:"); my $retval ; my $conn = &mon_connect ; return 0 if $conn == 0; my @l = $c->list_alerthist(); if ($c->error) { $retval = $c->error; print "Could not list alert history on mon server "$monhost": $retval (perhaps you don't have permissions in auth.cf?)\n"; &moncgi_switch_user($retval); } else { #print "
alert history on "$monhost" successfully retrieved.
\n"; } return @l; } sub mon_list_sched_state { # This function returns an array, @scheduler_state, which # contains the state of the scheduler. This is not exactly # documented, but @scheduler_state[0]==0 seems to indicate # that the scheduler is stopped and @scheduler_state[1] # seems to hold the time (in epoch seconds) since the scheduler was # stopped. my (@scheduler_state, $retval); my $conn = &mon_connect ; return 0 if $conn == 0; @scheduler_state = $c->list_state(); if ($c->error) { $retval = $c->error; print "Could not execute list state command mon server on server "$monhost": $retval (perhaps you don't have permissions in auth.cf?)\n"; &moncgi_switch_user($retval); } else { #print "
list state command executed successfully
\n"; } return @scheduler_state; } sub mon_list_dtlog { # Lists the downtime log (all of it) and returns the results as # an array of hash references. my ($group, $service) = (@_); my $retval ; my (@ltmp, @l, $line); my (@recovery_times); my ($first_failure_time, $mtbf, $mean_recovery_time, $median_recovery_time, $std_dev_recovery_time, $max_recovery_time, $min_recovery_time); my $max_recovery_time_default = -1; #some arbitrary number less than 0 my $min_recovery_time_default = 9999999999999; #some arbitrary really big number my $conn = &mon_connect ; return 0 if $conn == 0 ; @ltmp = $c->list_dtlog(); if ($c->error) { $retval = $c->error; $webpage->print("Could not list downtime log on mon server "$monhost": $retval (perhaps you don't have permissions in auth.cf?)\n"); &moncgi_switch_user($retval); return undef; } my $time_now = time; $max_recovery_time = $max_recovery_time_default; # initialize this to something really small $min_recovery_time = $min_recovery_time_default; # initialize this to something really big $first_failure_time = $time_now; # # Loop through all downtimes, get first downtime, min and max, # and filter based on criteria/specifications. # foreach $line (reverse sort {$a->{"failtime"} <=> $b->{"failtime"}}(@ltmp)){ # # Test to see if this is the first failure time. # if ($line->{"failtime"} < $first_failure_time) { # since this list is already sorted, this will only be true # the very first time we go thru this loop $first_failure_time = $line->{"failtime"} if $line->{"failtime"} < $first_failure_time ; } # # Skip this line if a group and/or service param was # specified and the param doesn't match. # if ( ( defined($group) ) && ($group ne "") && ($group ne $line->{"group"}) ) { next; } if ( ( defined($service) ) && ($service ne "") && ($service ne $line->{"service"}) ) { next; } # # Only show this downtime log entry if we are allowed to see # information about this group. This is probably slow but # no one said security was efficient! 
# if ($show_watch_strict) { unless ( &can_show_group($line->{"group"}) ) { # This line should not be shown to the user next; } } # # Add this downtime to the list of downtimes # for statistical calculation purposes. # push(@l, $line); push(@recovery_times, $line->{"downtime"}); # # set min and max downtimes # $min_recovery_time = $line->{"downtime"} if $line->{"downtime"} < $min_recovery_time; $max_recovery_time = $line->{"downtime"} if $line->{"downtime"} > $max_recovery_time; } undef @ltmp; #we don't need @ltmp's memory anymore # # Calculate mean recovery time # $mean_recovery_time = &arithmetic_mean(@recovery_times); # # also calculate median recovery time # $median_recovery_time = &median(@recovery_times); # # calculate the mean time between failures as: # (total elapsed time since first failure + E(time until first failure))/(total # of failures) # $mtbf = (scalar(@recovery_times) == 0) ? 0 : ($time_now - $first_failure_time + (($time_now - $first_failure_time) / scalar(@recovery_times))) / scalar(@recovery_times); # # Calculate std deviation of failure times # $std_dev_recovery_time = &std_dev(@recovery_times); # # In case $max_recovery_time and $min_recovery_time are unset # (i.e. there were no failures), set them to sensible defaults. # $max_recovery_time = 0 if $max_recovery_time == $max_recovery_time_default; $min_recovery_time = 0 if $min_recovery_time == $min_recovery_time_default; return $first_failure_time, scalar(@recovery_times), $mtbf, $mean_recovery_time, $median_recovery_time, $std_dev_recovery_time, $min_recovery_time, $max_recovery_time, @l ; } sub mon_ack { # This subroutine takes a comma separated list of a group and a service # as input. # It relies on the global "$ackcomment" for its text. # # If the user is authenticated as anyone other than the default # user, mon_ack inserts their name into the ack comment. 
# # mon_ack sends an acknowledgement to the mon server for the given # service on the host, the effect of which is to disable any further # alerts for the current failure period. # Ack'ing a service that is not in a failure state will produce the # following error: # "520 service is in a non-failure state" # # We get around this by only presenting the option to ack for # services that are currently in a failure state. # # This function does not return any values. my ($args) = @_; my ($group, $service) = split (/\,/, $args); # Only show the rest of this page if we are allowed to see # information about this group. if ($show_watch_strict) { unless ( &can_show_group($group) ) { print $webpage->h3("You are not authorized to see detailed information for hostgroup '$group'."); print $webpage->h4("Please contact your system administrator for access."); return 0; } } my $retval ; my $conn = &mon_connect ; return 0 if $conn == 0 ; my $name_string = ( ($loginhash{"username"} eq $default_username) || ($loginhash{"username"} eq "") ) ? "" : "[$loginhash{'username'}]: "; # Ack the service w/comment $retval = $c->ack($group, $service, "${name_string}${ackcomment}"); if ($c->error) { $retval = $c->error; $webpage->print("mon_ack: Could not ack service \"$service\" in group \"$group\" on mon server "$monhost": $retval (perhaps you don't have permissions in auth.cf?)\n"); &moncgi_switch_user($retval); } else { $webpage->print("Service \"$service\" in group \"$group\" acknowledged successfully."); } } sub mon_servertime { # This subroutine calls the Mon::Client::servertime function and # returns the time as a scalar, or undef if there is an error. 
    my $retval;
    my $servertime;
    my $conn = &mon_connect ;
    return 0 if $conn == 0;

    $servertime = $c->servertime;

    if ($c->error) {
        $retval = $c->error;
        print "Could not get server time on mon server \"$monhost\": $retval (perhaps you don't have permissions in auth.cf?)\n";
        &moncgi_switch_user($retval);
        return undef;
    } else {
        return $servertime;
    }
}

sub mon_checkauth {
    # This subroutine checks the authorization for a given command
    # to see if it is authorized by the server.
    # Returns:
    #   -1 if mon_connect returns -1,
    #    0 if the command is not authorized (also returned when
    #      mon_connect returns 0),
    #    1 if the command is authorized,
    # and returns the error string if checkauth fails
    # (which really shouldn't happen unless a Mon server isn't running)
    my ($cmd) = @_;
    my $retval ;
    my $conn = &mon_connect ;
    return 0 if $conn == 0 ;
    return -1 if $conn == -1 ;

    $retval = $c->checkauth($cmd);

    if ($retval == 0) {
        # command not authorized
        $retval = 0;
    } elsif ($retval == 1) {
        # command authorized
        $retval = 1;
    } else {
        # This command should not fail unless a mon server is not running
        $retval = $c->error;
        $webpage->print("mon_checkauth: Could not check auth for \"$cmd\" on mon server \"$monhost\": $retval\n");
        # &moncgi_switch_user($retval);
    }
    return $retval;
}

sub mon_state_change_enable_only {
    # This is called only by list_disabled, and wraps mon_state_change
    # because mon_state_change assumes only one group and this can
    # span multiple groups (although the command is enable only, which
    # simplifies things some).
    my ($group, $param);
    my @groups = &mon_list_watch ;

    foreach $param ( keys(%cgiparams) ) {
        # For each matching action, try to enable or else exit
        # in case of an authentication failure (this makes us
        # keep our CGI variables so that when we do authenticate,
        # all of our actions are executed).
if ($param =~ /^enagroup_/) { &mon_enable("watch,$cgiparams{$param}") || return undef; } elsif ($param =~ /^enahost_/) { &mon_enable("host,$cgiparams{$param}") || return undef; } elsif ($param =~ /^enasvc_/) { &mon_enable("service,$cgiparams{$param}") || return undef; } } &list_disabled; } sub mon_state_change { # This function changes one or more of the states of the hostgroup # given as an argument. It uses the global value %cgiparam to know # what else (hosts and/or services) to modify. my ($group, %hosts, $host, %op, $service); $group = $cgiparams{'group'}; my %d = &mon_list_disabled ; # List the state of the group if (!defined $group) { $webpage->print("Invalid host group \"$group\"\n"); return undef; } # Disable/enable group, if that was requested if ( ( defined ($cgiparams{"group_$group"}) ) && ( $cgiparams{"group_$group"} eq "ena" ) ) { # Enable group if the group is already disabled if ( ${d{"watches"}{$group}} ) { &mon_enable("watch,$group") || return 0; } } elsif ( ( defined ($cgiparams{"group_$group"}) ) && ( $cgiparams{"group_$group"} eq "dis" ) ) { # Disable group if the group is already enabled if ( ! ${d{"watches"}{$group}} ) { &mon_disable("watch,$group") || return 0; } } @_ = &mon_list_group($group) ; # List each member of the group, check to see if there is something # defined for the group. # If so, then change the state if that is required. # Don't make a state change if one is not required. foreach $host (@_) { # Hosts are disabled if they begin with an asterisk # (this is mon's convention, not mine) #print STDERR "Host is $host\n"; #DEBUG if ($host =~ m/^\*/) { # Host is disabled, check to see if we should try and re-enable $host =~ s/^\*//; if ( ( defined($cgiparams{"host_$host"}) ) && ( $cgiparams{"host_$host"} eq "ena") ) { # Try to enable and stop if we can't. 
                &mon_enable("host,$host") || return 0;
            }
        } else {
            # Host is enabled, check to see if we should try and disable
            if ( ( defined($cgiparams{"host_$host"}) ) && ( $cgiparams{"host_$host"} eq "dis") ) {
                # Try to disable and stop if we can't.
                &mon_disable("host,$host") || return 0;
            }
        }
    }

    # Check each service on the host to see if its state
    # needs to change.
    %op = &mon_list_opstatus;
    foreach $service (sort keys %{ $op{$group} }) {
        if ( ( defined($cgiparams{"svc_$service"}) ) && ( $cgiparams{"svc_$service"} eq "dis") ) {
            if (! ${d{"services"}{$group}{$service}}) {   #service is enabled
                # Try to disable and stop if we
                # can't (e.g. no permissions)
                &mon_disable("service,$group,$service") || return 0;
            }
        } elsif ( ( defined($cgiparams{"svc_$service"}) ) && ( $cgiparams{"svc_$service"} eq "ena") ) {
            if (${d{"services"}{$group}{$service}}) {   #service is disabled
                # Try to enable and stop if we
                # can't (e.g. no permissions)
                &mon_enable("service,$group,$service") || return 0;
            }
        }
    }

    &query_group($group);
}

###############################################################
# Mon.cgi-specific functions
# The moncgi_* functions generally do not manipulate Mon in any
# way or present any Mon output to the user.
# The moncgi_* functions will each generally call one or more
# mon_* functions.
###############################################################

# Get the params from the form -----------------------------------------
sub moncgi_get_params {
    my (@names, $name, $monhost_not_null, $monport_not_null);

    #
    # First get params we know about and expect to get.
    #
    $command = $webpage->param('command');
    $args = $webpage->param('args');
    $rt = $webpage->param('rt');           # return to value for pages
                                           # which need to keep state info about
                                           # which page called them.
    $rtargs = $webpage->param('rtargs');   #args for $rt
    $ackcomment = $webpage->param('ackcomment');   #ackcomment

    #
    # For the login form, grab username and password.
# $loginhash{'username'} = $webpage->param('username'); $loginhash{'password'} = $webpage->param('password'); # # Now get any more parameters which we may have defined. # @names = $webpage->param; foreach $name (@names) { $cgiparams{$name} = $webpage->param($name); } $monhost_not_null = 1 if defined($cgiparams{"h"}) && $cgiparams{"h"} ne ""; $monport_not_null = 1 if defined($cgiparams{"p"}) && $cgiparams{"p"} ne ""; # # Allow user to override values of monhost and monport with # CGI params. # # Untaint monhost and monport, first remove bogus characters, # then "officially" untaint them using $1 # if ( defined($monhost_not_null) ) { # print STDERR "Monhost is defined as '$cgiparams{'h'}'\n"; #DEBUG $monhost = $cgiparams{"h"} ; $monhost =~ s/[^\w.-]//g; $monhost =~ /(.*)/; $monhost = $1; } if ( defined($monport_not_null) ) { # print STDERR "Monport is defined as '$cgiparams{'p'}'\n"; #DEBUG $monport = $cgiparams{"p"} ; $monport =~ s/[^\d]//g; $monport =~ /(.*)/; $monport = $1; } # # If the user gave either a host or port argument that may (or # may not) override the hard-coded value, we need to preserve # this value and encode it in all future URL's that we # generate, so that the value will be preserved when the # mon.cgi page auto-reloads. We also undef $has_read_config, # for mod_perl namespace purposes (scenario: instance 1 # of mon.cgi is invoked with a h= parameter, instance 2 is not, # the instance 2 will eventually pick up instance 1's h= # parameter. # # if ( ( defined($monhost_not_null) ) || ( defined($monport_not_null) ) ) { #print STDERR "Monhost is defined as '$cgiparams{'h'}' (h=$monhost)\n"; #DEBUG #print STDERR "Monport is defined as '$cgiparams{'p'}' (p=$monport)\n"; #DEBUG # # The META tag doesn't respect & for some reason, so # we define another variable. # At least under Navigator 4.x, this is true. 
# $monhost_and_port_args_meta = "h=$monhost&p=$monport&"; $monhost_and_port_args = "h=$monhost&p=$monport&"; undef $has_read_config ; } else { #user did not enter in either monhost or monport args #print STDERR "Monport ('$monport_not_null':'$cgiparams{'p'}') and monhost('$monhost_not_null':'$cgiparams{'h'}') are undefined.\n"; #DEBUG #undef $monhost_and_port_args_meta; #undef $monhost_and_port_args; $monhost_and_port_args_meta = ""; $monhost_and_port_args = ""; } } sub moncgi_logout { # This subroutine provides the written evidence to the user # that they have been logged out of the mon server. The actual # logging out is done in the setup_page() routine, when the # auth cookie is destroyed in accordance to the value of the # global variable $destroy_auth_cookie print $webpage->hr; print $webpage->h3("User $loginhash{'username'} has been logged off.
"); print $webpage->h3("You will need to re-authenticate to perform further privileged actions."); } # # This subroutine presents the authentication form to the user, # and then runs the command the user was trying to execute with # the new, improved level of privilege. # # This subroutine is usually called because a user tried to # do something that they aren't authorized to do. # # The special command name "moncgi_login" is given when the user # wants to log in without actually doing anything that requires # privs. # sub moncgi_authform { my ($command, $args) = (@_); print $webpage->startform(-method=>'POST', ); $webpage->print("
"); print $webpage->hr; # # Test to see if the command was the special command # "moncgi_login". # if ($command eq "moncgi_login") { $command = "query_opstatus"; print $webpage->h3("Please enter your mon username and password below.
"); } else { print $webpage->h3("You must authenticate as a user of sufficient privilege to perform this command.
"); } # # Start the table. # $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print(""); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print(""); $webpage->print("\n"); $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print("
MON username:"); print $webpage->textfield(-name=>'username', -size=>8, -maxlength=>100, ); $webpage->print("
MON password:"); # Reset the passwd param so it's always blank before # printing out the password field. $webpage->param('password',""); print $webpage->password_field(-name=>'password', -size=>8, -value=>"", -maxlength=>100, ); $webpage->print("
"); print $webpage->submit(-name=>'Login', ); $webpage->print("
\n"); $webpage->print("
"); print $webpage->br; # if this works correctly, the command will be re-executed with # the credentials the user just entered $webpage->param('command', $command); $webpage->param('args', $args); $webpage->param('ackcomment', $ackcomment); # Now pass on the rest of the CGI params as hidden # fields if there are any params passed to the form. foreach (keys (%cgiparams) ) { print $webpage->hidden(-name=>"$_", -value=>"$cgiparams{$_}", ); } print $webpage->end_form; } # Generic button function --------------------------------------------- # Not strictly necessary, but could be useful if you wished to disable # certain features of the client or add new ones in a test capacity. sub moncgi_generic_button { my ($title, $command) = (@_); print $webpage->hr; print $webpage->h2("$title"); $webpage->print ("(command $command not implemented in this client)\n"); print $webpage->hr; } sub moncgi_switch_user { # This subroutine is called after a command fails because a user # is authenticated as a user without sufficient permission to perform # the requested command, so we give the user a chance to re-authenticate # as someone of sufficient privilege. my ($retval) = (@_) ; if ( ($must_login) && ($retval =~ /520 command could not be executed/) ) { # User doesn't have permission to perform command &moncgi_authform ($command,"$args") unless $has_prompted_for_auth; $has_prompted_for_auth = 1; } } # # This subroutine just prints out the legend table explaining what # colors mean what. I broke it out as a separate function so I # could experiment with its location (top or bottom of table). # # Inputs: Width of table (e.g. "70%") # Outputs: always returns 1 sub moncgi_print_service_table_legend { my ($service_table_width) = (@_); # Old way to draw table if (1 == 0 ) { $webpage->print(""); $webpage->print ("\n"); $webpage->print("\n"); $webpage->print("\n "); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("\n"); $webpage->print("
Service color legend:UncheckedGoodFailed
(no alerts sent)
Failed
(alerts sent)
Disabled
"); } # New way to draw table $webpage->print(""); $webpage->print(""); $webpage->print ("\n"); $webpage->print("\n"); $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print("\n"); $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print("
Service color legend:
(top of table)
UncheckedGoodFailed
(no alerts sent)
Failed
(alerts sent)
Disabled
$unchecked_color$greenlight_color$yellowlight_color$redlight_color$disabled_color
\n"); return 1; } sub moncgi_list_dtlog_navtable { # This function is called by list_dtlog. It takes as arguments # the following: ($url, $group, $service, $sortby, $dtlog_begin, # $dtlog_end, $total_failures, $num_events, %sortby_key). # # This function doesn't print any downtime log information per # se. It just prints out navigation for the user to browse the # downtime log. # # This function prints out a table allowing the user to navigate # between the previous N downtime log entries and the next N downtime # log entries, where N is a number we derive based on what the range # the user asked for, what the maximum entries per page to show is, # and the total number of failures observed. # # The reason it's a separate function is that we want to have the # footer placed at the top and the bottom of the downtime log # table, so we're calling it twice. my ($url, $group, $service, $sortby, $dtlog_begin, $dtlog_end, $total_failures, $num_events, %sortby_key) = (@_); my ($next_matches_lower, $next_matches_upper, $prev_matches_lower, $prev_matches_upper); return ($dtlog_begin, $dtlog_end) if $num_events == 0; # stop if we returned no downtime events # Lower bound on which failures to show as next # min($dtlog_end + 1, $total_failures $next_matches_lower = ($total_failures < $dtlog_end + 1) ? $total_failures : $dtlog_end + 1; # Upper bound on which failures to show as next # min($next_matches_lower + $dtlog_max_failures_per_page - 1, $total_failures) $next_matches_upper = ( $total_failures < ($next_matches_lower + $dtlog_max_failures_per_page - 1) ) ? 
$total_failures : ($next_matches_lower + $dtlog_max_failures_per_page - 1) ; # now reset $dtlog_end in case we were handed a bogus end point $dtlog_end = $next_matches_upper if ($dtlog_end > $num_events); # now reset $dtlog_begin in case we were handed a bogus begin point $dtlog_begin = $dtlog_end if $dtlog_begin > $num_events; # Lower bound on which failures to show as prev # max(1, $dtlog_begin - $dtlog_max_failures_per_page) $prev_matches_lower = ($dtlog_begin - $dtlog_max_failures_per_page > 0) ? ($dtlog_begin - $dtlog_max_failures_per_page ) : 1 ; #$webpage->print("total_failures=$total_failures, dtlog_begin=$dtlog_begin, dtlog_end=$dtlog_end, prev_matches_lower=$prev_matches_lower, next_matches_upper = $next_matches_upper, next_matches_lower = $next_matches_lower\n"); #DEBUG # Start printing the "show previous, show next" table $webpage->print("
\n"); # Only give the option to show previous entries if there are previous # entries to show. if ( ($prev_matches_lower > 0) && ($dtlog_begin > 1) ) { # in the case where ($dtlog_begin - $dtlog_max_failures_per_page) # (e.g. we are currently showing entry 9-23 and want to display 1-8) if ( ( $dtlog_begin > 1) && ($dtlog_begin - $dtlog_max_failures_per_page < 0) ) { $prev_matches_lower = 1; } # Needs to be min($dtlog_begin -1, $total_failures) $prev_matches_upper = ($total_failures < $dtlog_begin - 1) ? $total_failures : $dtlog_begin - 1; printf("" , $url, ${monhost_and_port_args}, $group, $service, $sortby, $prev_matches_lower, $prev_matches_upper ); printf("See events %d-%d\n" , $prev_matches_lower, $prev_matches_upper) ; } # Print the current entries being shown $webpage->print("Displaying downtime events $dtlog_begin-$dtlog_end of $total_failures
(sorting by $sortby_key{$sortby})
"); # Only give the option to show subsequent entries if there are subsequent # entries to show. if ( ($next_matches_upper <= $total_failures) && ($dtlog_end < $total_failures) ) { printf("", $url, ${monhost_and_port_args}, $group, $service, $sortby, $next_matches_lower , $next_matches_upper ); printf("See events %d-%d\n", $next_matches_lower , $next_matches_upper) ; } $webpage->print("

\n"); # It is possible for these values to be changed in this routine, # so send back the values to the calling routine. return ($dtlog_begin, $dtlog_end); } sub moncgi_test_all { # Tests all services for a particular hostgroup # # Inputs: group name # Outputs: void # my ($group) = (@_); # Only show the rest of this page if we are allowed to see # information about this group. if ($show_watch_strict) { unless ( &can_show_group($group) ) { print $webpage->h3("You are not authorized to see detailed information for hostgroup '$group'."); print $webpage->h4("Please contact your system administrator for access."); return 0; } } my (%s, $service); my $conn = &mon_connect; return 0 if $conn == 0; %s = &mon_list_opstatus; if ( ! &mon_checkauth("test") ) { #command is unauthorized &moncgi_switch_user("520 command could not be executed"); return 0; } foreach $service (keys %{$s{$group}}) { &mon_test_service("$group,$service"); } return 1; } # # This subroutine presents a table for the user to reset keepstate # or to reset without keepstate. To do nothing is also an option. # sub moncgi_reset { my $reset_command_bgcolor = $auth_commands{'reset'}{'auth'} == 0 ? "bgcolor=\"$disabled_color\"" : $BGCOLOR ; $webpage->print("

"); $webpage->print(""); $webpage->print(""); $webpage->print(""); $webpage->print("
Reset mon server, keep scheduler state

This clears the state of the scheduler, re-reads the config file, and keeps all watches/hosts/services in their current disabled/enabled state.

You usually want to use this method of resetting the server.

Reset mon server, reset scheduler state

This clears the state of the scheduler, re-reads the config file, and sets all watches/hosts/services to ENABLED.

You only want to use this option if you're sure you want to re-enable all groups/hosts/services!

Do nothing and return to the main status screen


"); return 1; } # # This subroutine reads and parses the optional mon.cgi config file, # and alters global variable values accordingly. # # Inputs: config filename to read # # Outputs: 1 if up-to-date config file has been read # 0 if config file contains errors. # undef if up-to-date config file was not read (file not found) # sub moncgi_read_cf { my ($cf_file) = (@_) ; my ($newcf_file_mtime, $key, $val); # # Test that config file can be read, and if it can, check to see # if we've already read it. # if(-r $cf_file) { # First we check if we have ever read a config file. This # is controlled by the global variable $has_read_config # If so, then we check to see if the copy we have read is # the latest copy. # If not, then we try to read the config file. if ($has_read_config) { # # Since we've already read a config file at least once, # we now check to see if we need to read it again. # We re-read the config file if the mtime of the config # file is different (older OR newer) than the last config # file we read. # $newcf_file_mtime = (stat($cf_file))[9]; if ($newcf_file_mtime == $cf_file_mtime) { #print STDERR "Skipping config read ($newcf_file_mtime == $cf_file_mtime)\n"; #DEBUG return 1; } else { # # Record the new mtime # #print STDERR "Re-reading config ($newcf_file_mtime == $cf_file_mtime)\n"; #DEBUG $cf_file_mtime = $newcf_file_mtime; } } else { # # We've never read the config before, so we get the # initial config file mtime. # $cf_file_mtime = (stat($cf_file))[9]; } } else { print STDERR "mon.cgi: moncgi_read_cf: Unable to open config file '$cf_file' for reading: $!\n"; return undef; } # # Start reading config file # if ( open (CF , "$cf_file") ) { while () { chomp; # Skip blank lines and comment lines next if /^\s*\#/; next if /^\s*$/; # Strip off extra blank space at beg. and end of each line s/^\s*//; s/\s*$//; # # Parse config file lines # if (/^(\w+) \s* = \s* (.*) \s*$/ix) { $key = $1; $val = $2; # # Trivially untaint $key and $val. 
# We'll do the "real" untainting within the
# config file parsing, later.
#
$key =~ /(.*)/;
$key = $1;
$val =~ /(.*)/;
$val = $1;
#
# Look for matching key/value pairs and assign them
# to the proper variable. Complain if a key/value
# pair doesn't match.
#
if ($key eq "organization") {
    $organization = $val;
} elsif ($key eq "monadmin") {
    $monadmin = $val;
} elsif ($key eq "logo") {
    $logo = $val ;
} elsif ($key eq "logo_link") {
    $logo_link = $val ;
} elsif ($key eq "reload_time") {
    if ($val <= 0) {
        print STDERR "mon.cgi: moncgi_read_cf: $cf_file: reload_time must be a number > 0\n";
        return 0;
    }
    $reload_time = $val;
} elsif ($key eq "monhost") {
    # strip out all non-valid chars for taint purposes
    $val =~ s/[^\d\w.-]//g;
    $monhost = $val;
} elsif ($key eq "monport") {
    # strip out all non-digits for taint purposes
    $val =~ s/[^\d]//g;
    if ($val <= 0) {
        print STDERR "mon.cgi: moncgi_read_cf: $cf_file: monport must be a number > 0\n";
        return 0;
    }
    $monport = $val;
} elsif ($key eq "must_login") {
    $must_login = $val;
} elsif ($key eq "app_secret") {
    $app_secret = $val;
} elsif ($key eq "default_username") {
    $default_username = $val;
} elsif ($key eq "default_password") {
    $default_password = $val;
} elsif ($key eq "login_expire_time") {
    if ($val <= 0) {
        print STDERR "mon.cgi: moncgi_read_cf: $cf_file: login_expire_time must be a number > 0\n";
        return 0;
    }
    $login_expire_time = $val;
} elsif ($key eq "cookie_name") {
    $cookie_name = $val;
} elsif ($key eq "cookie_path") {
    $cookie_path = $val;
} elsif ($key eq "fixed_font_face") {
    $fixed_font_face = $val;
} elsif ($key eq "sans_serif_font_face") {
    $sans_serif_font_face = $val;
} elsif ($key eq "BGCOLOR") {
    $BGCOLOR = $val;
} elsif ($key eq "TEXTCOLOR") {
    $TEXTCOLOR = $val;
} elsif ($key eq "LINKCOLOR") {
    $LINKCOLOR = $val;
} elsif ($key eq "VLINKCOLOR") {
    $VLINKCOLOR = $val;
} elsif ($key eq "greenlight_color") {
    $greenlight_color = $val;
} elsif ($key eq "redlight_color") {
    $redlight_color = $val;
} elsif ($key eq
"yellowlight_color") { $yellowlight_color = $val; } elsif ($key eq "unchecked_color") { $unchecked_color = $val; } elsif ($key eq "dtlog_max_failures_per_page") { if ($val <= 0) { print STDERR "mon.cgi: moncgi_read_cf: $cf_file: dtlog_max_failures_per_page must be a number > 0\n"; return 0; } $dtlog_max_failures_per_page = $val; } elsif ($key eq "untaint_ack_msgs") { $untaint_ack_msgs = $val; } elsif ($key eq "watch") { push(@show_watch, $val); } elsif ($key eq "show_watch_strict") { $show_watch_strict = $val; } else { print STDERR "mon.cgi: moncgi_read_cf: Unknown key-value pair in config file $cf_file: '$key = $val'\n"; return 0; } } else { print STDERR "mon.cgi: moncgi_read_cf: Unparseable config file line in config file '$cf_file': $_\n"; return 0; } } } else { print STDERR "mon.cgi: moncgi_read_cf: Unable to open config file '$cf_file' for reading: $!\n"; return undef; } # # If we've gotten this far, it means the config file was # successfully parsed. # # # Set the global variable that indicates that we have read the # config file. # $has_read_config = 1; #print STDERR "Read the config file!\n"; #DEBUG return 1; } # # This subroutine allows the user to log in without attempting # a privileged action. Trivial but useful. # sub moncgi_login { &moncgi_authform ("moncgi_login",""); } # # This subroutine allows site admins to define their own custom # rows of the command table, which will appear as a third row to the # main command table. Take a look at the sample that's here below. # # You're on your own here with whatever code you put in this subroutine! # Go nuts! # sub moncgi_custom_print_bar { # # This is a sample routine, contributed by Ed Ravin (eravin@panix.com). # # Everything is commented out, and none of the functions are implemented # here, but this should give you the idea of what a custom command # bar would look like. 
# my ($face)= (@_); #$webpage->print("\n"); #$webpage->print("\tLaunch Space Shuttle\n"); #$webpage->print("\tTake Coffee Break for 30 minutes\n"); #$webpage->print("\tReset Soda Machine\n"); #$webpage->print("\tAcknowledge All Alarms\n"); #$webpage->print("\n"); } # # This subroutine extends the main command processing loop at the end # of this script with your own custom commands. Used in conjunction # with moncgi_custom_print_bar, and other subroutines that you # define in this script, moncgi_custom_commands provides a nice way # to extend mon.cgi at your site. # From Ed Ravin (eravin@panix.com) # sub moncgi_custom_commands { if ($command eq "replace_this_string_with_a_custom_command") { &setup_page("Custom Command #1"); # &call_your_first_custom_code_here; } elsif ($command eq "replace_this_string_with_another_custom_command") { &setup_page("Custom Command #2"); # &call_your_second_custom_code_here; } # as many other custom commands as you care to define go below else # didn't find anything { return 0; } 1; # did find something, suppress further command processing } ############################################################### # Main program ############################################################### # # Instantiate the mon client # $c = new Mon::Client ( host => $monhost, port => $monport, ); if ($command eq "query_opstatus" ){ # Summary opstatus view &setup_page("Operation Status: Summary View"); &query_opstatus("summary"); } elsif ($command eq "query_group" ){ # Expand hostgroup. &setup_page("Group Expansion"); &query_group($args); } elsif ($command eq "list_alerthist"){ # Alert history button. &setup_page("List the alert history"); &list_alerthist; } elsif ($command eq "svc_details"){ # View failure details. &setup_page("Service Details"); &svc_details($args); } elsif ($command eq "list_disabled"){ # List disabled hosts button. 
    &setup_page("List disabled hosts");
    &list_disabled;
} elsif ($command eq "mon_test_service"){   # Test a service immediately
    &setup_page("Test Service");
    &mon_test_service($args);
    sleep 1;   # if we don't sleep here, svc_details will kick off before
               # the test does and it will look like we aren't running a test.
    &svc_details($args);
} elsif ($command eq "mon_schedctl"){   # Stop/start the scheduler
    &setup_page("$args scheduler");
    &mon_schedctl ($args);
    &query_opstatus("summary");
} elsif ($command eq "list_dtlog"){   # List the downtime log
    &setup_page("List Downtime Log");
    &list_dtlog($args);
} elsif ($command eq "mon_disable"){   # Disable a host/group/service
    &setup_page("Disable alert for host, group, or service");
    &mon_disable($args);
    if ($rt eq "query_group") {
        &query_group($rtargs);
    } else {
        &query_opstatus("summary");
    }
} elsif ($command eq "mon_enable"){   # Enable a host/group/service
    &setup_page("Enable alert for host, group, or service");
    &mon_enable($args);
    if ($rt eq "query_group") {
        &query_group($rtargs);
    } elsif ($rt eq "svc") {
        &query_opstatus("summary");
    } else {
        &query_opstatus;
    }
} elsif ($command eq "list_pids"){   # View pid button.
    &setup_page("List pids of server, alerts and monitors.");
    &list_pids;
} elsif ($command eq "mon_reset"){   # Reset mon button.
    &setup_page("Reset mon");
    &mon_reset($args);
} elsif ($command eq "mon_ack"){   # Acknowledge a service failure.
    &setup_page("Acknowledge service failure");
    &mon_ack($args);   #$ackcomment is a global
    &svc_details($args);
} elsif ($command eq "mon_reload"){   # Reload button.
    &setup_page("Reload Mon");
    &mon_reload($args);
} elsif ($command eq "mon_loadstate"){   # load mon scheduler state
    &setup_page("Load Scheduler State");
    # right now we expect $args to be empty, since loadstate doesn't take
    # any arguments, but someday it might, so we prepare.
    &mon_loadstate($args);
} elsif ($command eq "mon_savestate"){   # save mon scheduler state
    &setup_page("Save Scheduler State");
    # right now we expect $args to be empty, since savestate doesn't take
    # any arguments, but someday it might, so we prepare.
    &mon_savestate($args);
} elsif ($command eq "moncgi_logout"){   # Log out as auth user.
    $destroy_auth_cookie = "yes";
    &setup_page("Logging out");
    &moncgi_logout;
} elsif ($command eq "query_opstatus_full"){   # Full operations status
    &setup_page("Operation Status: Full View");
    &query_opstatus("full");
} elsif ($command eq "query_opstatus_failures") {
    &setup_page("Operation Status: Failures Only");
    &query_opstatus("failures");
    # Selection "mon_opstatus" will fall through to else.
} elsif ($command eq "mon_state_change") {
    &setup_page("Operation Status: Disable/Enable Groups/Hosts/Services");
    &mon_state_change;
} elsif ($command eq "mon_state_change_enable_only") {
    &setup_page("Operation Status: Enable Groups/Hosts/Services");
    &mon_state_change_enable_only;
} elsif ($command eq "moncgi_test_all") {
    &setup_page("Operation Status: Test All Services In Group");
    &moncgi_test_all($args);
} elsif ($command eq "mon_test_config") {
    &setup_page("Test Mon Config File Syntax");
    &mon_test_config($args);
} elsif ($command eq "moncgi_reset") {
    &setup_page("Reset mon Server");
    &moncgi_reset($args);
} elsif ($command eq "moncgi_login") {
    &setup_page("Log In To mon Server");
    &moncgi_login($args);
} elsif ( &moncgi_custom_commands)   # check for user extensions
{
    # The actual custom commands are processed
    # inside &moncgi_custom_commands.
    #
    # moncgi_custom_commands returns non-zero if it finds
    # a command to execute;
} else {   # All else.
    &setup_page("Operation Status: Summary View");
    &query_opstatus("summary");
}

$webpage->print("
"); # # Some stuff we keep around for debugging # #print "commands is $command, args is $args
\n"; #DEBUG
#print $webpage->dump; #DEBUG

&end_page;
$c->disconnect();
mon-1.2.0/clients/skymon/0000755003616100016640000000000010640450346015172 5ustar trockijtrockijmon-1.2.0/clients/skymon/README0000644003616100016640000000271010230411542016040 0ustar trockijtrockij
$Id: README,v 1.2 2005/04/17 07:42:26 trockij Exp $

This is a "moncmd" interface to a SkyTel 2-way pager. It utilizes
procmail filters and password authentication to do its trick. I would
not call this a "secure" authentication mechanism, but in Marcus
Ranum-speak it is "really nice". Use at your own risk.

It would be even more "really nice" if this did SecurID or S/Key.

Also keep in mind that all queries and all results pass through
the Great Wide Internet to get back to your pager.


INSTALLATION

1. Do this from the /usr/doc/mon/examples directory:

    mkdir ~/.skytel
    chmod 0700 ~/.skytel
    cp skymon.allow ~/.skytel/allow

2. Create an encrypted password using the following Perl snippet,
   substituting "password" with the password that you want, and
   "salt" with a *2-letter* salt.

    perl -e 'print crypt("password", "salt"), "\n"' > ~/.skytel/password
    chmod 0600 ~/.skytel/password

3. Add the contents of the "skymon.procmail" file to your .procmailrc.


OPERATION

Commands are sent via email with the following format:

    /password:command

Commands are the following, and can only be used if they exist in
the "allow" file:

    eh                          enable host
    es                          enable service
    ew                          enable watch
    dh host reason              disable host
    ds watch service reason     disable service
    dw watch reason             disable watch
    lf                          list failures
    ld                          list disabled

The idea behind the brevity is that it's a pain to compose messages
on that silly little keypad.
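
As a rough illustration of the message format described above, here is a
minimal Perl sketch that splits a "/password:command" line into individual
commands and filters them against an allow list. The helper name
parse_pager_line and the in-memory %allow hash are assumptions made for this
example only; the skymon script itself loads the list from ~/.skytel/allow,
checks the password with crypt(), and hands the commands to moncmd. Like
skymon, the sketch treats ";" as a separator between several commands on one
line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical allow list; skymon loads this from ~/.skytel/allow.
my %allow = map { $_ => 1 } qw(eh es ew lf ld);

# parse_pager_line is an illustrative helper, not part of skymon.
# It takes one line of the mail body and returns the password followed
# by a list of [command, args...] entries that pass the allow list.
sub parse_pager_line {
    my ($line, $allow) = @_;

    # Same shape as the format above: /password:command
    return unless $line =~ /^\/(\w+):(.*)/;
    my ($password, $rest) = ($1, $2);

    my @cmds;
    # Several commands may be sent at once, separated by ";".
    foreach my $c (split /;/, $rest) {
        my ($name, @args) = split ' ', $c;
        # Silently drop anything that is not in the allow list.
        next unless defined $name && $allow->{$name};
        push @cmds, [$name, @args];
    }
    return ($password, @cmds);
}

my ($pw, @cmds) = parse_pager_line("/secret:eh www1;dh www2 reboot;lf", \%allow);
print "password: $pw\n";
print "allowed:  @$_\n" foreach @cmds;
```

With the allow list above, the "dh" command is dropped because it is not
listed, which mirrors how skymon ignores commands missing from the "allow"
file. The password is returned unchecked here; verifying it is left to the
caller.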
--------------------- Jim Trocki trockij@arctic.org mon-1.2.0/clients/skymon/allow0000644003616100016640000000024410061516617016234 0ustar trockijtrockij# # commands to allow through the pager interface # # $Id: allow,v 1.1.1.1 2004/06/09 05:18:07 trockij Exp $ # # enable eh es ew # disable dh ds dw # list lf ld mon-1.2.0/clients/skymon/procmail0000644003616100016640000000045010061516617016723 0ustar trockijtrockij#Add this entry to your .procmailrc file, substituting "PIN" for #your SkyTel 7-digit 2-way PIN, and your SkyTel pager email address #(PIN@skymail.com). # # $Id: procmail,v 1.1.1.1 2004/06/09 05:18:07 trockij Exp $ # :0 H B D b c * ^From: PIN@skytel.com * ^/[a-zA-Z0-9]*: |skymon PIN@skymail.com mon-1.2.0/clients/skymon/skymon0000755003616100016640000001105110230411542016424 0ustar trockijtrockij#!/usr/bin/perl # # handle mon commands send via a 2-way pager # # see /usr/doc/mon/README.skymon for information # # skytel pager email address should be supplied as first argument # # Jim Trocki, trockij@arctic.org # # $Id: skymon,v 1.2 2005/04/17 07:42:26 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # $PASS = ""; $BUF = "\n"; $MONHOST = "monhost"; $ADDR = shift; die "no address specified\n" if ($ADDR eq ""); # # load command permissions # &load_allow() || die "could not load allow file:$!\n"; $p = 0; while (<>) { if (/^\/(\w+):(.*)/) { $password = $1; $cmd = $2; &check_password ($password) || die "pass\n"; foreach $c (split (/;/, $cmd)) { if ($BUF ne "\n" && $p) { $BUF .= "----\n"; } &parse_cmd ($c); $p = 1; } } } close (OUT); &mail_cmd() if ($BUF ne "\n"); exit; # # check password # sub check_password { my ($pass) = @_; my ($salt); &load_password() || return 0; print "$pass [$PASS]\n"; $salt = substr ($PASS, 0, 2); if (crypt ($pass, $salt) ne $PASS) { return 0; } return 1; } sub load_allow { my ($l); open (P, "$ENV{HOME}/.skytel/allow") || return 0; while (
<P>
) { next if /^\s*$/; next if /^\s*#/; $l = $_; chomp $l; $allow{$l} = 1; } close (P); } sub load_password { open (P, "$ENV{HOME}/.skytel/password") || return 0; $PASS =
<P>
; close (P); chomp $PASS; return 1; } sub load_address { open (P, "$ENV{HOME}/.skytel/address") || return 0; $ADDR =
<P>
; close (P);
    chomp $ADDR;
    return 1;
}


sub parse_cmd {
    my ($cmd) = @_;
    my ($c, @args);

    ($c, @args) = split (/\s+/, $cmd);

    #
    # list
    #
    if ($c eq "lf" && $allow{"lf"}) {
        &do_list (@args);
    } elsif ($c eq "ld" && $allow{"ld"}) {
        &do_list_disabled(@args);

    #
    # disable
    #
    } elsif ($c eq "dh" && $allow{"dh"}) {
        &do_command ("/usr/local/bin/moncmd -s $MONHOST disable host @args");
    } elsif ($c eq "dw" && $allow{"dw"}) {
        &do_command ("/usr/local/bin/moncmd -s $MONHOST disable watch @args");
    } elsif ($c eq "ds" && $allow{"ds"}) {
        &do_command ("/usr/local/bin/moncmd -s $MONHOST disable service @args");

    #
    # enable
    #
    } elsif ($c eq "eh" && $allow{"eh"}) {
        &do_command ("/usr/local/bin/moncmd -s $MONHOST enable host @args");
    } elsif ($c eq "ew" && $allow{"ew"}) {
        &do_command ("/usr/local/bin/moncmd -s $MONHOST enable watch @args");
    } elsif ($c eq "es" && $allow{"es"}) {
        &do_command ("/usr/local/bin/moncmd -s $MONHOST enable service @args");

    #
    # ack (not yet implemented)
    #
    } elsif ($c eq "a" && $allow{"a"}) {
    }
}


#
# list failures
#
sub do_list {
    my (@args) = @_;
    my ($g, $s, $o, $l, $p);

    open (IN, "/usr/local/bin/moncmd -s $MONHOST list failures|") || return;

    $p = 0;
    while (<IN>) {
        last if (/220.*completed/);
        $l = $_;
        chomp $l;
        ($g, $s, $o) = ($l =~ (/^(\S+)\s+(\S+)\s+\d+\s+\d+\s+failed\s+(.*)/));
        $BUF .= "\n" if ($p);
        $BUF .= "$g/$s:$o\n";
        $p = 1;
    }
    close (IN);
}


#
# list disabled
#
sub do_list_disabled {
    my (@args) = @_;

    open (IN, "/usr/local/bin/moncmd -s $MONHOST list disabled|") || return;

    $p = 0;
    while (<IN>) {
        last if (/220.*completed/);
        $l = $_;
        chomp $l;
        $BUF .= "\n" if ($p);
        $BUF .= "$l\n";
        $p = 1;
    }
    close (IN);
}


#
# do_command
#
sub do_command {
    my ($cmd) = @_;
    my ($p);

    open (C, "$cmd|") || return;

    $p = 0;
    while (<C>) {
        $BUF .= "\n" if ($p);
        $BUF .= $_;
        $p = 1;
    }
    close (C);
}


#
# mail the buffer back to the pager
#
sub mail_cmd {
#    &load_address() || die "could not load address\n";
#    print "$BUF";
    open (MAIL, "| /usr/lib/sendmail -oi -t") || die "could not open pipe to mail: $!\n";
print MAIL < "localhost", "full" => 0, "show-disabled" => 0, "bg" => "d5d5d5", "bg-ok" => "a0d0a0", "bg-fail" => "e088b7", "bg-untested" => "e0e0e0", "table-color" => "cccccc", "summary-len" => 20, }; my $GLOBAL = { "view-name" => undef, }; # # read config file # my ($e, $what) = read_cf ($CF); if ($e ne "") { err_die ("while reading config file, $e"); } # # cmdline args override config file # if (!$CGI) { $CF->{"all"} = 1 if ($opt{"showall"}); $CF->{"full"} = 1 if ($opt{"full"}); $CF->{"detail"} = $opt{"detail"} if ($opt{"detail"} ne ""); $CF->{"host"} = $opt{"server"} if ($opt{"server"}); $CF->{"port"} = $opt{"port"} if ($opt{"port"}); $CF->{"prot"} = $opt{"prot"} if ($opt{"prot"}); } # # Interrupt to see if a watch= option was specified # if( $CGI && $QUERY_ARGS{'watch'} ) { # # make a single entry of a pointer to a single array # $what = [ [ $QUERY_ARGS{"watch"} ] ]; } elsif( ! $CGI && $opt{'watch'} ) { # # make a single entry of a pointer to a single array # $what = [ [ $opt{"watch"} ] ]; } # # retrieve client status # my ($e, $st) = get_client_status ($what); if ($e ne "") { err_die ($e); } expand_watch ($what, $st); my $rows = select_table ($what, $st); compose_header ($st->{"state"}); # # CGI invocation # if ($CGI) { if ($QUERY_ARGS{"disabled"}) { compose_disabled ($st->{"disabled"}); } elsif (!$QUERY_ARGS{"detail"}) { compose_table ($rows, $st); compose_disabled ($st->{"disabled"}) if ($CF->{"show-disabled"}); if ($QUERY_ARGS{"watch"}) { $OUT_BUF .= <Back to summary table
EOB } } elsif ($QUERY_ARGS{"detail"}) { compose_detail ($QUERY_ARGS{"detail"}, $st); } } # # cmdline invocation # else { if ($opt{"disabled"}) { compose_disabled ($st->{"disabled"}); } elsif ($CF->{"detail"} ne "") { compose_detail ($CF->{"detail"}, $st); } else { compose_table ($rows, $st); compose_disabled ($st->{"disabled"}) if ($CF->{"show-disabled"}); } } compose_trailer; if ($CGI) { print <query ("login"); $pass = $CGI->query ("password"); } else { if ($opt{"login"}) { $login = $opt{"login"}; } else { return "could not determine username" if (!defined ($login = getpwuid($EUID))); } if (-t STDIN) { system "stty -echo"; print "Password: "; chop ($pass = ); print "\n"; system "stty echo"; return "invalid password" if ($pass =~ /^\s*$/); } else { my $cmd; while (defined ($cmd = <>)) { chomp $cmd; if ($cmd =~ /^user=(\S+)$/i) { $login = $1; } elsif ($cmd =~ /^pass=(\S+)$/i) { $pass = $1; } last if (defined ($login) && defined ($pass)); } } } return "inadequate authentication information supplied" if ($login eq "" || $pass eq ""); return ("", $login, $pass); } # # config file # sub read_cf { my $CF = shift; my ($group, $service); my @RC; my $view = 0; my $RC = "/etc/mon/monshowrc"; if ($CGI) { if ($ENV{"PATH_INFO"} =~ /^\/\S+/) { my $p=$ENV{"PATH_INFO"}; $p =~ s/^[.\/]*//; $p =~ s/\/*$//; $p =~ s/\.\.//g; $RC = "$VIEWPATH/$p"; $GLOBAL->{"view-name"} = $p; $view = 1; } elsif ($QUERY_ARGS{"view"} ne "") { $QUERY_ARGS{"view"} =~ s/^[.\/]*//; $QUERY_ARGS{"view"} =~ s/\.\.//g; $GLOBAL->{"view-name"} = $QUERY_ARGS{"view"}; $RC = "$VIEWPATH/$QUERY_ARGS{view}"; $view = 1; } elsif (-f ".monshowrc") { $RC = ".monshowrc"; } } else { if ($opt{"rcfile"}) { $RC = $opt{"rcfile"}; } elsif ($opt{"view"} ne "") { $RC = "$VIEWPATH/$opt{view}"; $GLOBAL->{"view-name"} = $opt{"view"}; $view = 1; } elsif (-f "$ENV{HOME}/.monshowrc") { $RC = "$ENV{HOME}/.monshowrc"; } } if ($opt{"old"}) { $CF->{"prot"} = "0.37.0"; $CF->{"port"} = 32777; } if (-f $RC) { open (IN, $RC) || return "could not 
read $RC: $!"; my $html_header = 0; my $link_text = 0; my $link_group = ""; my $link_service = ""; while () { next if (/^\s*#/ || /^\s*$/); if ($html_header) { if (/^END\s*$/) { $html_header = 0; next; } else { $CF->{"html-header"} .= $_; next; } } elsif ($link_text) { if (/^END\s*$/) { $link_text = 0; next; } else { $CF->{"links"}->{$link_group}->{$link_service}->{"link-text"} .= $_; next; } } else { chomp; s/^\s*//; s/\s*$//; } if (/^set \s+ (\S+) \s* (\S+)?/ix) { my $cmd = $1; my $arg = $2; if ($cmd eq "show-disabled") { } elsif ($cmd eq "host") { } elsif ($cmd eq "prot") { } elsif ($cmd eq "port") { } elsif ($cmd eq "full") { } elsif ($cmd eq "bg") { } elsif ($cmd eq "bg-ok") { } elsif ($cmd eq "bg-fail") { } elsif ($cmd eq "bg-untested") { } elsif ($cmd eq "table-color") { } elsif ($cmd eq "html-header") { $html_header = 1; next; } elsif ($cmd eq "refresh") { } elsif ($cmd eq "summary-len") { } else { print STDERR "unknown set, line $.\n"; next; } if ($arg ne "") { $CF->{$cmd} = $arg; } else { $CF->{$cmd} = 1; } } elsif (/^watch \s+ (\S+)/x) { push (@RC, [$1]); } elsif (/^service \s+ (\S+) \s+ (\S+)/x) { push (@RC, [$1, $2]); } elsif (/^link \s+ (\S+) \s+ (\S+) \s+ (.*)\s*/ix) { $CF->{"links"}->{$1}->{$2}->{"link"} = $3; } elsif (/^link-text \s+ (\S+) \s+ (\S+)/ix) { $link_text = 1; $link_group = $1; $link_service = $2; next; } else { my $lnum = $.; close (IN); err_die ("error in config file, line $."); } } close (IN); } elsif (! -f $RC && $view) { err_die ("no view found"); } return ("", \@RC); } sub secs_to_hms { my ($s) = @_; my ($dd, $hh, $mm, $ss); $dd = int ($s / 86400); $s -= $dd * 86400; $hh = int ($s / 3600); $s -= $hh * 3600; $mm = int ($s / 60); $s -= $mm * 60; $ss = $s; if ($dd == 0) { sprintf("%02d:%02d:%02d", $hh, $mm, $ss); } else { sprintf("%d days, %02d:%02d:%02d", $dd, $hh, $mm, $ss); } } # # exit displaying error in appropriate output format # sub err_die { my $msg = shift; if ($CGI) { print < Error

Error

$msg
EOF } else { print < All systems OK

EOF
    } else {
        print "\nAll systems OK\n";
    }
}


#
# client status
#
# return ("", $state, \%opstatus, \%disabled, \%deps, \%groups, \%descriptions);
#
sub get_client_status {
    my $what = shift;

    my $cl;

    if (!defined ($cl = Mon::Client->new)) {
        return "could not create client object: $@";
    }

    my ($username, $pass);

    if ($opt{"auth"} && !$CGI) {
        my $e;
        ($e, $username, $pass) = get_auth;
        if ($e ne "") {
            return "$e";
        }
        $cl->username ($username);
        $cl->password ($pass);
    }

    $cl->host ($CF->{"host"}) if (defined $CF->{"host"});
    $cl->port ($CF->{"port"}) if (defined $CF->{"port"});
    $cl->prot ($CF->{"prot"}) if (defined $CF->{"prot"});

    $cl->connect;

    if ($cl->error) {
        return "Could not connect to server: " . $cl->error;
    }

    #
    # authenticate self to the server if necessary
    #
    if ($opt{"auth"} && !defined ($cl->login)) {
        my $e = $cl->error;
        $cl->disconnect;
        return "Could not authenticate: $e";
    }

    #
    # get disabled things
    #
    my %disabled = $cl->list_disabled;
    if ($cl->error) {
        my $e = $cl->error;
        $cl->disconnect;
        return "could not get disabled: $e";
    }

    #
    # get state
    #
    my ($running, $t) = $cl->list_state;
    if ($cl->error) {
        my $e = $cl->error;
        $cl->disconnect;
        return "could not get state: $e";
    }

    my $state;
    if ($running) {
        $state = $t;
    } else {
        $state = "scheduler stopped since " . localtime ($t);
    }

    #
    # group/service list
    #
    my @watch = $cl->list_watch;

    if ($cl->error) {
        my $e = $cl->error;
        $cl->disconnect;
        return "could not list watch: $e";
    }

    #
    # get opstatus
    #
    my %opstatus;
    if (@{$what} == 0) {
        %opstatus = $cl->list_opstatus;
        if ($cl->error) {
            my $e = $cl->error;
            $cl->disconnect;
            return "could not get opstatus: $e";
        }
    } else {
        my @list;
        foreach my $r (@{$what}) {
            if (@{$r} == 2) {
                push @list, $r;
            } else {
                foreach my $w (@watch) {
                    next if ($r->[0] ne $w->[0]);
                    push @list, $w;
                }
            }
        }

        %opstatus = $cl->list_opstatus (@list);
        if ($cl->error) {
            my $e = $cl->error;
            $cl->disconnect;
            return "could not get opstatus: $e";
        }
    }

    #
    # dependencies
    #
    my %deps;
    if ($opt{"deps"}) {
        %deps = $cl->list_deps;
        if ($cl->error) {
            my $e = $cl->error;
            $cl->disconnect;
            return "could not list deps: $e";
        }
    }

    #
    # descriptions
    #
    my %desc;
    %desc = $cl->list_descriptions;
    if ($cl->error) {
        my $e = $cl->error;
        $cl->disconnect;
        return "could not list descriptions: $e";
    }

    #
    # groups
    #
    my %groups;

    if ($QUERY_ARGS{"detail"} || $CF->{"detail"} ne "") {
        foreach my $g (keys %opstatus) {
            my @g = $cl->list_group ($g);
            if ($cl->error) {
                my $e = $cl->error;
                $cl->disconnect;
                return "could not list group: $e";
            }
            grep {s/\*//} @g;
            $groups{$g} = [@g];
        }
    }

    #
    # log out
    #
    if (!defined $cl->disconnect) {
        return "error while disconnecting: " .
$cl->error; } return ("", { "state" => $state, "opstatus" => \%opstatus, "disabled" => \%disabled, "deps" => \%deps, "groups" => \%groups, "desc" => \%desc, "watch" => \@watch, }); } sub compose_header { my $state = shift; my $t = localtime; # # HTML stuff # if ($CGI) { $OUT_BUF = < Operational Status EOF if ($CF->{"refresh"}) { $OUT_BUF .= < {refresh}> EOF } $OUT_BUF .= < EOF if ($CF->{"html-header"} ne "") { $OUT_BUF .= $CF->{"html-header"}; } $OUT_BUF .= <Operational Status EOF } foreach my $l (@{$rows}) { my ($depstate, $group, $service) = @{$l}; my $sref = \%{$st->{"opstatus"}->{$group}->{$service}}; $STATUS = "unknown"; $TIME = ""; $DEP = $depstate; my $last = ""; my $bgcolor = opstatus_color ($sref->{"opstatus"}); if ($sref->{"opstatus"} == $OPSTAT{"untested"}) { $STATUS = "untested"; $TIME = "untested"; } elsif ($sref->{"opstatus"} == $OPSTAT{"ok"}) { $STATUS = "-"; } elsif ($sref->{"opstatus"} == $OPSTAT{"fail"}) { if ($sref->{"ack"}) { if ($CGI) { $STATUS = "" . "ACK FAIL"; } else { $STATUS = "ACK FAIL"; } } else { $STATUS = "FAIL"; } } if ($depstate eq "") { $DEP = "-"; } $GROUP = $group; $SERVICE = $service; $DESC = $st->{"desc"}->{$group}->{$service}; $DESC = pre_pad_if_empty ($DESC) if ($CGI); if ($TIME eq "") { $TIME = tdiff_string (time - $sref->{"last_check"}); } if ($sref->{"timer"} < 60) { $NEXT = "$sref->{timer}s"; } else { $NEXT = secs_to_hms ($sref->{"timer"}); } if (length ($sref->{"last_summary"}) > $CF->{"summary-len"}) { $SUMMARY = substr ($sref->{"last_summary"}, 0, $CF->{"summary-len"}) . "..."; } else { $SUMMARY = $sref->{"last_summary"}; } $ALERTS = $sref->{"alerts_sent"} || "none"; my $fmt; if (!$CGI) { $fmt = < EOF } } if ($CGI) { $OUT_BUF .= < EOF } } sub compose_disabled { my $disabled = shift; if (!keys %{$disabled->{"watches"}} && !keys %{$disabled->{"services"}} && !keys %{$disabled->{"hosts"}}) { if ($CGI) { $OUT_BUF .= < Nothing is disabled.

EOF } else { print "\nNothing is disabled.\n"; } return; } if (keys %{$disabled->{"watches"}}) { if ($CGI) { $OUT_BUF .= < Disabled Watches EOF } else { print "\nDISABLED WATCHES:\n"; } foreach my $watch (keys %{$disabled->{"watches"}}) { if ($CGI) { $OUT_BUF .= "$watch
\n"; } else { print "$watch\n"; } } } my @disabled_services; foreach my $watch (keys %{$disabled->{"services"}}) { foreach my $service (keys %{$disabled->{"services"}{$watch}}) { push (@disabled_services, "$watch $service");; } } if (@disabled_services) { if ($CGI) { $OUT_BUF .= < Disabled Services EOF } else { print "\nDISABLED SERVICES\n"; } for (@disabled_services) { if ($CGI) { $OUT_BUF .= "$_
\n"; } else { print "$_\n"; } } } if (keys %{$disabled->{"hosts"}}) { if ($CGI) { $OUT_BUF .= < Disabled Hosts EOF } else { print "\nDISABLED HOSTS:\n"; } foreach my $group (keys %{$disabled->{"hosts"}}) { my @HOSTS = (); foreach my $host (keys %{$disabled->{"hosts"}{$group}}) { push (@HOSTS, $host); } if ($CGI) { $OUT_BUF .= sprintf ("%-15s %s
\n", $group, "@HOSTS"); } else { printf ("%-15s %s\n", $group, "@HOSTS"); } } } } sub compose_trailer { if ($CGI) { $OUT_BUF .= < EOF } } sub compose_detail { my ($args, $st) = @_; my ($group, $service) = split (/,/, $args, 2); if (!defined ($st->{"opstatus"}->{$group}->{$service})) { err_die ("$group/$service not a valid service"); } my $sref = \%{$st->{"opstatus"}->{$group}->{$service}}; my $d; my $bgcolor = opstatus_color ($sref->{"opstatus"}); $bgcolor = "bgcolor=\"#$bgcolor\"" if ($bgcolor ne ""); foreach my $k (keys %OPSTAT) { if ($OPSTAT{$k} == $sref->{"opstatus"}) { $sref->{"opstatus"} = "$k ($sref->{opstatus})"; last; } } foreach my $k (qw (opstatus exitval last_check timer ack ackcomment)) { if ($CGI && $sref->{$k} =~ /^\s*$/) { $d->{$k} = "

 
"; } else { $d->{$k} = $sref->{$k}; } } my $t = time; $d->{"last_check"} = tdiff_string ($t - $d->{"last_check"}) . " ago"; $d->{"timer"} = "in " . tdiff_string ($d->{"timer"}); foreach my $k (qw (last_success last_failure first_failure last_alert)) { if ($sref->{$k}) { $d->{$k} = localtime ($sref->{$k}); } } if ($sref->{"first_failure"}) { $d->{"failure_duration"} = secs_to_hms ($sref->{"failure_duration"}); } # # HTML output # if ($CGI) { my $sum = pre_pad_if_empty ($sref->{"last_summary"}); my $descr = pre_pad_if_empty ($st->{"desc"}->{$group}->{$service}); my $hosts = pre_pad_if_empty ("@{$st->{groups}->{$group}}"); $OUT_BUF .= <Detail for $group/$service
EOF if ($GLOBAL->{"view-name"} ne "") { $OUT_BUF .= < EOF } $OUT_BUF .= <
Server: $CF->{host}
Time: $t
State: $state
View: $GLOBAL->{"view-name"}
Color legend
all is OK
failure
untested


EOF } else { print <{host} time: $t state: $state EOF } } sub select_table { my ($what, $st) = @_; my @rows; # # display everything real nice # if ($CF->{"all"} || @{$what} == 0) { foreach my $group (keys %{$st->{"opstatus"}}) { foreach my $service (keys %{$st->{"opstatus"}->{$group}}) { push (@rows, [$group, $service]); } } } else { @rows = @{$what}; } my (%DEP, %DEPROOT); foreach my $l (@rows) { my ($group, $service) = @{$l}; my $sref = \%{$st->{"opstatus"}->{$group}->{$service}}; next if (!defined $sref->{"opstatus"}); # # disabled things # # fuckin' Perl, man. Just be referencing # $st->{"disabled"}->{"watches"}->{$group}, perl automagically # defines that hash element for you. Great. # if (defined ($st->{"disabled"}->{"watches"}) && defined ($st->{"disabled"}->{"watches"}->{$group})) { next; } elsif (defined ($st->{"disabled"}->{"services"}) && defined ($st->{"disabled"}->{"services"}->{$group}) && defined ($st->{"disabled"}->{"services"}->{$service})) { next; } # # potential root dependencies # elsif ($sref->{"depend"} eq "") { push (@{$DEPROOT{$sref->{"opstatus"}}}, ["R", $group, $service]); } # # things which have dependencies # else { push (@{$DEP{$sref->{"opstatus"}}}, ["D", $group, $service]); } } if ($CF->{"full"}) { [ @{$DEPROOT{$OPSTAT{"fail"}}}, @{$DEPROOT{$OPSTAT{"linkdown"}}}, @{$DEPROOT{$OPSTAT{"timeout"}}}, @{$DEPROOT{$OPSTAT{"coldstart"}}}, @{$DEPROOT{$OPSTAT{"warmstart"}}}, @{$DEPROOT{$OPSTAT{"untested"}}}, @{$DEPROOT{$OPSTAT{"unknown"}}}, @{$DEP{$OPSTAT{"fail"}}}, @{$DEP{$OPSTAT{"linkdown"}}}, @{$DEP{$OPSTAT{"timeout"}}}, @{$DEP{$OPSTAT{"coldstart"}}}, @{$DEP{$OPSTAT{"warmstart"}}}, @{$DEPROOT{$OPSTAT{"ok"}}}, @{$DEP{$OPSTAT{"ok"}}}, @{$DEP{$OPSTAT{"untested"}}}, @{$DEP{$OPSTAT{"unknown"}}}, ]; } else { [ @{$DEPROOT{$OPSTAT{"fail"}}}, @{$DEPROOT{$OPSTAT{"linkdown"}}}, @{$DEPROOT{$OPSTAT{"timeout"}}}, @{$DEPROOT{$OPSTAT{"coldstart"}}}, @{$DEPROOT{$OPSTAT{"warmstart"}}}, @{$DEP{$OPSTAT{"fail"}}}, @{$DEP{$OPSTAT{"linkdown"}}}, 
@{$DEP{$OPSTAT{"timeout"}}}, @{$DEP{$OPSTAT{"coldstart"}}}, @{$DEP{$OPSTAT{"warmstart"}}}, ]; } } # # build the table # sub compose_table { my ($rows, $st) = @_; if (@{$rows} == 0) { display_allok; return; } # # display the failure table # if ($CGI) { $OUT_BUF .= <
Dep Group Service Desc. Last check Next check Alerts Status Summary
$DEP $GROUP $SERVICE $DESC $TIME $NEXT $ALERTS $STATUS $SUMMARY
Description: $descr
Summary: $sum
Hosts: $hosts
Detail:
$sref->{last_detail}
	
EOF if ($d->{"ack"}) { my $comment = pre_pad_if_empty ($d->{"ackcomment"}); $OUT_BUF .= <

Acknowledgment of failure

$comment EOF } $OUT_BUF .= < EOF # # VAR: # variable name from "show opstatus" # # DESCR: # display name for variable # # IFZERO: # 0 = nothing special # 1 = do not display if zero # 2 = do not display if eq "" # # TYPE: # s = seconds # b = boolean # my ($VAR, $DESCR, $IFZERO, $TYPE) = (0..3); foreach my $k ( ["opstatus", "Operational Status", 0], ["exitval", "Exit Value", 0], ["depend", "Dependency", 2], ["monitor", "Monitor Program", 2], ["last_check", "Last Check", 2], ["timer", "Next Check", 2], ["last_success", "Last Success", 2], ["last_failure", "Last Failure", 2], ["first_failure", "First Failure", 2], ["failure_duration", "Failure Duration", 2], ["interval", "Schedule Interval", 0, "s"], ["exclude_period", "Exclude Period", 2], ["exclude_hosts", "Exclude Hosts", 2], ["randskew", "Random Skew", 1, "s"], ["alerts_sent", "Alerts Sent", 1], ["last_alert", "Last Alert", 2], ["monitor_duration", "Monitor Execution Duration", 2, "s"], ["monitor_running", "Monitor currently running", 0, "b"], ) { my $v = undef; if ($d->{$k->[$VAR]} ne "") { $v = \$d->{$k->[$VAR]}; } elsif ($sref->{$k->[$VAR]} ne "") { $v = \$sref->{$k->[$VAR]}; } # # convert types into display form # if ($k->[$TYPE] eq "s") { if ($$v >= 0) { $$v = secs_to_hms ($$v); } } elsif ($k->[$TYPE] eq "b") { $$v = $$v == 0 ? "false" : "true"; } # # display if zero? # next if ($k->[$IFZERO] == 1 && $$v == 0); next if ($k->[$IFZERO] == 2 && $$v eq ""); $OUT_BUF .= < $k->[$DESCR]: $$v EOF } $OUT_BUF .= < EOF # # custom links # if (defined ($CF->{"links"}->{$group}->{$service})) { if (defined ($CF->{"links"}->{$group}->{$service}->{"link-text"})) { $OUT_BUF .= <

{links}->{$group}->{$service}->{link}\">More Information

$CF->{links}->{$group}->{$service}->{'link-text'} EOF } else { $OUT_BUF .= < $CF->{links}->{$group}->{$service}->{link} EOF } } $OUT_BUF .= < Back to $group table Back to summary table

EOF } # # text output # else { my $n; $Text::Wrap::columns = 70; $n->{"desc"} = wrap (" ", " ", $st->{"desc"}->{$group}->{$service}); $n->{"last_summary"} = wrap (" ", " ", $sref->{"last_summary"}); $n->{"hosts"} = wrap (" ", " ", join (" ", @{$st->{groups}->{$group}})); print <{desc} summary ------- $n->{last_summary} hosts ----- $n->{hosts} -----DETAIL----- $sref->{last_detail} -----DETAIL----- EOF if ($d->{"ack"}) { print <{ackcomment} EOF } print <{opstatus} exitval: $d->{exitval} depend: $d->{depend} monitor: $d->{monitor} last check: $d->{last_check} next_check: $d->{timer} EOF } } sub opstatus_color { my $o = shift; my %color_hash = ( $OPSTAT{"untested"} => $CF->{"bg-untested"}, $OPSTAT{"ok"} => $CF->{"bg-ok"}, $OPSTAT{"fail"} => $CF->{"bg-fail"}, ); $color_hash{$o} || ""; } sub tdiff_string { my $time = shift; my $s; if ($time <= 90) { $s = "${time}s"; } else { $s = secs_to_hms ($time); } } # # for each watch entry which specifies only "group", # expand it into "group service" # sub expand_watch { my $what = shift; my $st = shift; for (my $i=0; $i<@{$what}; $i++) { if (@{$what->[$i]} == 1) { my @list; foreach my $l (@{$st->{"watch"}}) { if ($l->[0] eq $what->[$i]->[0]) { push @list, $l; } } splice (@{$what}, $i, 1, @list); } } } sub pre_pad_if_empty { my $l = shift; return "

 
" if ($l =~ /^\s*$/); $l; } mon-1.2.0/COPYRIGHT0000644003616100016640000000150310061516613013500 0ustar trockijtrockij$Id: COPYRIGHT,v 1.1.1.1 2004/06/09 05:18:03 trockij Exp $ The code in this distribution is Copyright (c) 1997-2001 by Jim Trocki. The copyrights of individual contributions are held by the respective authors of those contributions and/or their successors in interest. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. mon-1.2.0/CREDITS0000644003616100016640000000506310146140374013233 0ustar trockijtrockij$Id: CREDITS,v 1.2 2004/11/15 14:45:16 vitroth Exp $ This is a list of people who have contributed code and/or ideas to mon itself or its other components. Jon Meek meekj@pt.cyanamid.com Lots of ideas, inital testing under Solaris, and http.monitor code. David Nolan vitroth+@cmu.edu Many bug fixes and feature additions. David probably runs the largest mon installation in existence. Ed Ravin eravin@panix.com Bug fixes, many enhancements to monitors, feature additions, and fixes for BSD-isms. Martin J. Laubach mjl@emsi.priv.at Bug fixes, scheduler enhancements. Gilles Lamiral lamiral@mail.dotcom.fr Minotaur CGI interface, lots of bug fixes and enhancements. Arthur K. Chan artchan@althem.com Initial version of interactive WWW interface (mon.cgi). Ulrich Pfeifer pfeifer@wait.de Lots of code cleanups, and removal of the Open2 mess. Alan Cox alan@cymru.net Initial suggestion for "alertafter" parameter. 
David Eckelkamp  davide@tradewave.com
    Supplied "monstatus" command-line utility, and LDAP and DNS monitor.

Eric Saxe  eric@Zcompany.com
    Found CR/LF bug in smtp.monitor

Peter Gervai  grin@tolna.net
    Found bug in config parser, suggested process throttle option.

Michael Griffith  grif@cadence.com
    Added some sanity checking to the config parser.

Brian Moore  bem@cmc.net
    'summary' option for alerts.

Michael Alan Dorman  mdorman@@debian.org
    Contributed Perl SNPP alert.

Alan Robertson  alanr@bell-labs.com
    Maintains the RedHat RPM.

Roderick Schertler  roderick@argon.org
    Maintains the Debian package.

Tiago Severina  ts@uevora.pt
    Reported some bugs with FTP.monitor, and is currently working on the
    dependency code. Tiago has been actively contributing his patches
    that include inherited dependency support, and once this code
    stabilizes and is thoroughly tested in a production environment, it
    will be released. It is some really great stuff--thanks, Tiago!

Jing Tan  jing@walrus.com
    Jing has contributed the dependency code which is included in the
    distribution as of 0.38.

James FitzGibbon  james@ican.net
    Contributed msql/mysql monitor.

Mark D. Nagel  nagel@intelenet.net
    Submitted version command for Mon::Client module.

Lars Marowsky-Bree  lmb@teuto.net
    Added trap code to Mon::Client, contributed syslog.monitor, and
    fixed some bugs.

Mark Wagner  markw@horvitznewspapers.net
    Bug fixes and other contributions.

Andrew Ryan  andrewr@mycfo.com
    mon.cgi v1.32 and later, bug reports and fixes.

Eric Sorenson  eric@transmeta.com
    Documentation and RPM spec updates

mon-1.2.0/etc/

mon-1.2.0/etc/snmpvar.cf

#
# snmpvar.cf
#
# this is a sample configuration file for snmpvar.monitor. you
# must configure this to meet your own needs.
# # list of variables and ranges to be monitored by snmpvar.monitor # refers to variables defined in snmpvar.def # # a Dell server, RAID instrumentation only: Host nov-1 MEGARAID0_LOGICAL_STATUS Min 2 Max 2 Index 0 MEGARAID0_PHYS_STATUS Min 3 Max 3 Index 0 1 2 3 4 5 # a Compaq server: Host nov-2 # has 1 RAID volume, 6 physical disks CPQARRAY_LOG_STATUS Index 1 CPQARRAY_PHYS_STATUS Index 0 1 2 3 4 5 PROLIANT_TEMP_STATUS PROLIANT_PSU_STATUS PROLIANT_FAN_STATUS Index 2 4 5 # a Dell server running NT 4 with perfmib Host ntserv1 WINNT_MEM_COMMITTED Max 700 WINNT_LOGICAL_C_FREE Min 50 WINNT_LOGICAL_D_FREE Min 50 MEGARAID_C0_LOGICAL_STATUS Index 0 MEGARAID_C0_CH0_PHYS_STATUS Index 0 1 2 3 4 PE4300_TEMP_CPU PE4300_TEMP PE4300_5V_CURRENT PE4300_12V_CURRENT PE4300_3V_CURRENT PE4300_FAN_CPU_RPM PE4300_FAN_DISK_RPM PE4X00_PSU_STATUS # an APC UPS (with SNMP adapter or through controlling server running PowerNet) Host srvups1 APCUPS_OUTPUT_STAT APCUPS_LINEVOLT_MAX APCUPS_LINEVOLT_MIN # here, we override the default maximum specified in snmpvar.def: APCUPS_LOAD Max 75 APCUPS_BATT_TEMP # these are the MeasureUPS parameters (external sensor) APCUPS_EXT_TEMP Max 32 APCUPS_EXT_HUMID Min 10 Max 90 APCUPS_EXT_SWITCH_STAT Min 2 Max 2 Index 1 FriendlyName 1 Diesel Generator Status # an HP ProCurve 4000 switch Host hp4000-servers HP_ICF_FAN_STATE # has redundant PSU HP_ICF_PSU_STATE Index 2 3 IF_OPERSTAT Index 1 3 17 25 65 73 FriendlyName 1 A1: Server LAUREL FriendlyName 3 A3: Server HARDY FriendlyName 17 C1: Server TITAN (1000SX) FriendlyName 25 D1: Server MERCURY (1000SX) FriendlyName 65 I1: Switch D1017:G1 (1000TX) FriendlyName 73 J1: Switch SERVERS1:H1 (1000SX) # an IBM8272 Token Ring switch Host trsw1 IBM8272_LINK_STATE Min 1 Max 1 Index 1 2 3 4 5 6 7 9 11 12 13 14 15 16 17 18 21 22 23 24 FriendlyName 1 1: Floor 10 Ring FriendlyName 2 2: Floor 12 Ring FriendlyName 3 3: Floor 13 Ring FriendlyName 9 9: Server NOV-1 FriendlyName 13 13: Server ntserv1 FriendlyName 18 18: Switch 2 
Interlink Fibre IBM8272_TEMP_SYS Min 1 Max 1 # a cisco router Host cisco1 IF_OPERSTAT Index 1 2 3 4 FriendlyName 1 1: Internal Ethernet FriendlyName 2 2: Internal TokenRing FriendlyName 3 3: Firewall BGP_PEERSTATE Index 10.1.1.1 10.2.1.1 FriendlyName 10.1.1.1 iBGP Session: myotherrouter FriendlyName 10.2.1.1 eBGP Session: Provider X CISCO_TEMP_STATE # a Nokia IP series firewall appliance Host firewall IF_OPERSTAT Index 1 2 3 FriendlyName 1 1: Leased Line FriendlyName 2 2: DMZ FriendlyName 3 3: Internal Router NOKIA_IP_CHASSIS_TEMP NOKIA_IP_FAN_STAT NOKIA_IP_PSU_STAT NOKIA_IP_PSU_TEMP # a Linux server with some private SNMP extensions Host mailserver LINUX_MAILQUEUE Max 80 mon-1.2.0/etc/na_quota.cf0000644003616100016640000000045310061516616015107 0ustar trockijtrockij# # # filer uranium tree /home kb_warn 10kb kb_emerg 5kb file_warn 200 file_emerg 50 tree /project-mis kb_warn 10kb kb_emerg 5kb file_warn 200 file_emerg 50 user trockij kb_warn 5gb kb_emerg 1gb file_warn 250 file_emerg 100 filer plutonium tree /project kb_warn 50kb mon-1.2.0/etc/example.cf0000644003616100016640000001550210637737260014745 0ustar trockijtrockij# # Example "mon.cf" configuration for "mon". # # $Id: example.cf,v 1.1.1.1.4.1 2007/06/25 13:10:08 trockij Exp $ # # Please read the mon.8 manual page! # # NOTE: # # A "watch" definition (a line which begins with the word "watch" and is # followed by "service" definitions) is terminated by an # empty line, or by a subsequent definition. You may not put blank lines # inside of your watch definitions. # # # global options # cfbasedir = /usr/lib/mon/etc alertdir = /usr/lib/mon/alert.d mondir = /usr/lib/mon/mon.d maxprocs = 20 histlength = 100 randstart = 60s # # authentication types: # getpwnam standard Unix passwd, NOT for shadow passwords # shadow Unix shadow passwords (not implemented) # userfile "mon" user file # authtype = getpwnam # # NB: hostgroup and watch entries are terminated with a blank line (or # end of file). 
Don't forget the blank lines between them or you lose. # # # group definitions (hostnames or IP addresses) # hostgroup serversbd1 dns-yp1 foo1 bar1 hostgroup serversbd2 dns-yp2 foo2 bar2 ola3 hostgroup routers cisco7000 linuxrouter agsplus hostgroup hubs cisco316t hp800t ssii10 hostgroup workstations blue yellow red green cornflower violet hostgroup netapps f330 f540 hostgroup wwwservers www hostgroup printers hp5si hp5c hp750c hostgroup new nntp hostgroup ftp ftp # # For the servers in building 1, monitor ping and telnet # BOFH is on weekend call :) # watch serversbd1 service ping description ping servers in bd1 interval 5m monitor fping.monitor period wd {Mon-Fri} hr {7am-10pm} alert mail.alert mis@domain.com alert page.alert mis-pagers@domain.com alertevery 1h period NOALERTEVERY: wd {Mon-Fri} hr {7am-10pm} alert mail.alert mis@domain.com alert page.alert mis-pagers@domain.com period wd {Sat-Sun} alert mail.alert bofh@domain.com alert page.alert bofh@domain.com service telnet description telnet to servers in bd1 interval 10m monitor telnet.monitor depend serversbd1:ping period wd {Mon-Fri} hr {7am-10pm} alertevery 1h alertafter 2 30m alert mail.alert mis@domain.com alert page.alert mis-pagers@domain.com watch serversbd2 service ping description ping servers in bd2 interval 5m monitor fping.monitor depend routers:ping period wd {Mon-Fri} hr {7am-10pm} alert mail.alert mis@domain.com alert page.alert mis-pagers@domain.com alertevery 1h period wd {Sat-Sun} alert mail.alert bofh@domain.com alert page.alert bofh@domain.com service telnet description telnet to servers in bd2 interval 10m monitor telnet.monitor depend routers:ping serversbd2:ping period wd {Mon-Fri} hr {7am-10pm} alertevery 1h alertafter 2 30m alert mail.alert mis@domain.com alert page.alert mis-pagers@domain.com watch mailhost service fping period wd {Mon-Fri} hr {7am-10pm} alert mail.alert mis@domain.com alert page.alert mis-pagers@domain.com alertevery 1h service telnet interval 10m monitor 
telnet.monitor period wd {Mon-Fri} hr {7am-10pm} alertevery 1h alertafter 2 30m alert mail.alert mis@domain.com alert page.alert mis-pagers@domain.com service smtp interval 10m monitor smtp.monitor period wd {Mon-Fri} hr {7am-10pm} alertevery 1h alertafter 2 30m alert page.alert mis-pagers@domain.com service imap interval 10m monitor imap.monitor period wd {Mon-Fri} hr {7am-10pm} alertevery 1h alertafter 2 30m alert page.alert mis-pagers@domain.com service pop interval 10m monitor pop3.monitor period wd {Mon-Fri} hr {7am-10pm} alertevery 1h alertafter 2 30m alert page.alert mis-pagers@domain.com watch wwwservers service ping interval 2m monitor fping.monitor allow_empty_group period wd {Sun-Sat} alert qpage.alert mis-pagers alertevery 45m service http interval 4m monitor http.monitor allow_empty_group period wd {Sun-Sat} alert qpage.alert mis-pagers upalert mail.alert -S "web server is back up" mis alertevery 45m service telnet monitor telnet.monitor allow_empty_group period wd {Mon-Fri} hr {7am-10pm} alertevery 1h alertafter 2 30m alert mail.alert mis@domain.com alert page.alert mis-pagers@domain.com # # If the routers aren't pingable, send a page using # a phone line and the IXO protocol, which doesn't # rely on the network. Failure of a router is pretty serious, # so check every two minutes. # # Send out one page every 45 minutes, but log the failure # to a file every time. # watch routers service ping description routers which connect bd1 and bd2 interval 1m monitor fping.monitor period wd {Sun-Sat} alert qpage.alert mis-pagers alertevery 45m period LOGFILE: wd {Sun-Sat} alert file.alert -d /usr/lib/mon/log.d routers.log # # If mon cannot ping one of the hubs, users will be calling soon # watch hubs service ping interval 1m monitor fping.monitor period wd {Sun-Sat} alert qpage.alert mis-pagers alertevery 45m # # Monitor free disk space on the NFS servers # # When space gets below 5 megs, send mail, and delete # the oldest nightly snapshots. 
# # monitors that terminate with ";;" are not executed with the # host group appended to the command line # watch netapps service freespace interval 15m monitor freespace.monitor /f330:5000 /f540:5000 ;; period wd {Sun-Sat} alert mail.alert mis@domain.com # alert delete.snapshot alertevery 1h # # workstations # watch workstations service ping interval 5m monitor fping.monitor period wd {Sun-Sat} alert mail.alert mis@domain.com alertevery 1h # # news server # watch news service ping interval 5m monitor fping.monitor period wd {Sun-Sat} alert mail.alert mis@domain.com alertevery 1h service nntp interval 5m monitor nntp.monitor period wd {Sun-Sat} alert mail.alert mis@domain.com alertevery 1h # # HP printers # watch printers service ping interval 5m monitor fping.monitor period wd {Sun-Sat} alert mail.alert mis@domain.com alertevery 1h service hpnp interval 5m monitor hpnp.monitor period wd {Sun-Sat} alert mail.alert mis@domain.com alertevery 1h # # FTP server # watch ftp service ftp interval 5m monitor ftp.monitor period wd {Sun-Sat} alert mail.alert mis@domain.com alertevery 1h # # dial-in terminal server # watch dialin service 555-1212 interval 60m monitor dialin.monitor.wrap -n 555-1212 -t 80 ;; period wd {Sun-Sat} alert mail.alert mis@domain.com upalert mail.alert mis@domain.com alertevery 8h service 555-1213 interval 33m monitor dialin.monitor.wrap -n 555-1213 -t 80 ;; period wd {Sun-Sat} alert mail.alert mis@domain.com upalert mail.alert mis@domain.com alertevery 8h mon-1.2.0/etc/example.m40000644003616100016640000001675410061516616014676 0ustar trockijtrockijdnl dnl ######################################################################## dnl # This file is meant to be processed with m4 # dnl ######################################################################## dnl dnl # dnl # m4 macro definitions dnl # dnl define(_DIR_, `/usr/lib/mon')dnl define(_FILE_LOG_DIR_, `_DIR_/file-log.d')dnl dnl dnl # dnl # useful time periods dnl # dnl define(_FILE_, 
`_TST_/log.d')dnl define(_WEEKDAY_, `wd {Mon-Fri}')dnl define(_WEEKEND_, `wd {Sat-Sun}')dnl define(_ANYTIME_, `wd {Sun-Sat}')dnl define(_OFF_HOURS_, `wd {Mon-Fri} hr {10pm-7am}, wd {Sat Sun}')dnl define(_WORK_HOURS_, `wd {Mon-Fri} hr {7am-10pm}')dnl define(_PAGING_HOURS_, `_WORK_HOURS_')dnl dnl dnl # dnl # useful pager aliases dnl # dnl define(_MIS_PAGER_, `joe bob zoomzip')dnl dnl dnl # dnl # useful mail aliases dnl # dnl define(_MIS_EMAIL_, `joe bob zoomzip')dnl define(_PRINTER_EMAIL_, `zoomzip')dnl # facilities are responsible for dnl # printer maintenance define(_RAS_EMAIL_, `bob')dnl # bob is the remote access admin dnl dnl # dnl # -------------------------actual config begins here------------------------- dnl # # # Example "mon.cf" configuration for "mon". # # $Id: example.m4,v 1.1.1.1 2004/06/09 05:18:06 trockij Exp $ # # # This works with 0.38pre13 # # # global options # cfbasedir = _DIR_/etc alertdir = _DIR_/alert.d mondir = _DIR_/mon.d maxprocs = 20 histlength = 100 randstart = 60s # # authentication types: # getpwnam standard Unix passwd, NOT for shadow passwords # shadow Unix shadow passwords (not implemented) # userfile "mon" user file # authtype = getpwnam # # NB: hostgroup and watch entries are terminated with a blank line (or # end of file). Don't forget the blank lines between them or you lose. 
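The blank-line rule in the note above can be pictured with a short Python sketch. This is a simplified model, not mon's parser: the function name `split_stanzas` and the sample text are hypothetical, and comment handling is an assumption made for the example.

```python
def split_stanzas(text):
    """Split config text into stanzas ended by a blank line or end of file."""
    stanzas, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue                  # assumption: comment lines are simply skipped
        if stripped == "":
            if current:               # a blank line ends the current stanza
                stanzas.append(current)
                current = []
            continue
        current.append(stripped)
    if current:                       # end of file also ends a stanza
        stanzas.append(current)
    return stanzas


# Hypothetical input: the blank line between "hostgroup hubs" and
# "watch hubs" has been forgotten.
SAMPLE = """\
hostgroup routers cisco7000 linuxrouter agsplus

hostgroup hubs cisco316t hp800t ssii10
watch hubs
    service ping
        interval 1m
"""
```

In this model `split_stanzas(SAMPLE)` returns two stanzas rather than three, because the forgotten blank line lets the `watch hubs` lines run on into the `hostgroup hubs` stanza.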
# # # group definitions (hostnames or IP addresses) # hostgroup serversbd1 dns-yp1 foo1 bar1 hostgroup serversbd2 dns-yp2 foo2 bar2 ola3 hostgroup routers cisco7000 linuxrouter agsplus hostgroup hubs cisco316t hp800t ssii10 hostgroup workstations blue yellow red green cornflower violet hostgroup netapps f330 f540 hostgroup wwwservers www hostgroup printers hp5si hp5c hp750c hostgroup new nntp hostgroup ftp ftp # # For the servers in building 1, monitor ping and telnet # BOFH is on weekend call :) # watch serversbd1 service ping description ping servers in bd1 interval 5m monitor fping.monitor period _WORK_HOURS_ alert mail.alert _MIS_EMAIL_ alertevery 1h period PAGE: _PAGING_HOURS_ alert qpage.alert _MIS_PAGER_ alertevery 2h period _WEEKEND_ alert mail.alert bofh@domain.com alert qpage.alert bofh@domain.com alertevery 1h service telnet description telnet to servers in bd1 interval 10m monitor telnet.monitor depend serversbd1:ping period _WORK_HOURS_ alertevery 1h alertafter 2 30m alert mail.alert _MIS_EMAIL_ period PAGE: _PAGING_HOURS_ alert qpage.alert _MIS_PAGER_ alertevery 2h watch serversbd2 service ping description ping servers in bd2 interval 5m monitor fping.monitor depend routers:ping period _WORK_HOURS_ alert mail.alert _MIS_EMAIL_ alertevery 1h period _WEEKEND_ alert mail.alert bofh@domain.com alert qpage.alert bofh@domain.com service telnet description telnet to servers in bd2 interval 10m monitor telnet.monitor depend routers:ping serversbd2:ping period _WORK_HOURS_ alertevery 1h alertafter 2 30m alert mail.alert _MIS_EMAIL_ period PAGE: _PAGING_HOURS_ alert qpage.alert _MIS_PAGER_ alertevery 2h watch mailhost service fping period _WORK_HOURS_ alert mail.alert _MIS_EMAIL_ alert qpage.alert _MIS_PAGER_ alertevery 1h service telnet interval 10m monitor telnet.monitor period _WORK_HOURS_ alertevery 1h alertafter 2 30m alert mail.alert _MIS_EMAIL_ alert qpage.alert _MIS_PAGER_ service smtp interval 10m monitor smtp.monitor period _WORK_HOURS_ alertevery 1h 
alertafter 2 30m alert qpage.alert _MIS_PAGER_ service imap interval 10m monitor imap.monitor period _WORK_HOURS_ alertevery 1h alertafter 2 30m alert qpage.alert _MIS_PAGER_ service pop interval 10m monitor pop3.monitor period _WORK_HOURS_ alertevery 1h alertafter 2 30m alert qpage.alert _MIS_PAGER_ watch wwwservers service ping interval 2m monitor fping.monitor allow_empty_group period _ANYTIME_ alert qpage.alert _MIS_PAGER_ alertevery 45m service http interval 4m monitor http.monitor allow_empty_group period _ANYTIME_ alert qpage.alert _MIS_PAGER_ upalert mail.alert -S "web server is back up" _MIS_EMAIL_ alertevery 45m service telnet monitor telnet.monitor allow_empty_group period _WORK_HOURS_ alertevery 1h alertafter 2 30m alert mail.alert _MIS_EMAIL_ alert qpage.alert _MIS_PAGER_ # # If the routers aren't pingable, send a page using # a phone line and the IXO protocol, which doesn't # rely on the network. Failure of a router is pretty serious, # so check every two minutes. # # Send out one page every 45 minutes, but log the failure # to a file every time. # watch routers service ping description routers which connect bd1 and bd2 interval 1m monitor fping.monitor period _ANYTIME_ alert qpage.alert _MIS_PAGER_ alertevery 45m period LOGFILE: _ANYTIME_ alert file.alert -d _FILE_LOG_DIR_ routers.log ;; # # If mon cannot ping one of the hubs, users will be calling soon # watch hubs service ping interval 1m monitor fping.monitor period _ANYTIME_ alert qpage.alert _MIS_PAGER_ alertevery 45m # # Monitor free disk space on the NFS servers # # When space gets below 5 megs, send mail, and delete # the oldest nightly snapshots. 
# # monitors that terminate with ";;" are not executed with the # host group appended to the command line # watch netapps service freespace interval 15m monitor freespace.monitor /f330:5000 /f540:5000 ;; period _ANYTIME_ alert mail.alert _MIS_EMAIL_ # alert delete.snapshot alertevery 1h # # workstations # watch workstations service ping interval 5m monitor fping.monitor period _ANYTIME_ alert mail.alert _MIS_EMAIL_ alertevery 1h # # news server # watch news service ping interval 5m monitor fping.monitor period _ANYTIME_ alert mail.alert _MIS_EMAIL_ alertevery 1h service nntp interval 5m monitor nntp.monitor period _ANYTIME_ alert mail.alert _MIS_EMAIL_ alertevery 1h # # HP printers # watch printers service ping interval 5m monitor fping.monitor period _ANYTIME_ alert mail.alert _PRINTER_EMAIL_ alertevery 1h service hpnp interval 5m monitor hpnp.monitor period wd {Sun-Sat} alert mail.alert _PRINTER_EMAIL_ alertevery 1h # # FTP server # watch ftp service ftp interval 5m monitor ftp.monitor period _ANYTIME_ alert mail.alert _MIS_EMAIL_ alertevery 1h # # dial-in terminal server # watch dialin service 555-1212 interval 60m monitor dialin.monitor.wrap -n 555-1212 -t 80 ;; period _ANYTIME_ alert mail.alert _RAS_EMAIL_ upalert mail.alert _RAS_EMAIL_ alertevery 8h service 555-1213 interval 33m monitor dialin.monitor.wrap -n 555-1213 -t 80 ;; period wd {Sun-Sat} alert mail.alert _RAS_EMAIL_ upalert mail.alert _RAS_EMAIL_ alertevery 8h mon-1.2.0/etc/example.monshowrc0000644003616100016640000000205410061516616016361 0ustar trockijtrockij# # Configuration for monshow # # Place this file in one of the following places: # $HOME/.monshowrc # cgi-bin/.monshowrc # # /etc/mon/monshowrc will be overridden if any of the others # exist. 
#
# $Id: example.monshowrc,v 1.1.1.1 2004/06/09 05:18:06 trockij Exp $
#

# mon server and port
set host monhost
set port 2583
#set prot

# disabled groups/services/hosts
set show-disabled

# show all statuses instead of just failures
set full

# refresh web page every 30 seconds
#set refresh 30

#
# row colors for CGI, these are the defaults
#
set table-color cccccc
set bg-ok a0d0a0
set bg-fail e088b7
set bg-untested e0e0e0

#
# HTML header
#
set html-header

This is the custom HTML header

END # # footer for detail report # link bd2 ping http://monhost/detail-bd2-ping.html link-text bd2 ping This is detail about bd2 ping which is probably data collected from some other non-mon source. END # # show only these services. if none of these # are listed, show all groups and services # #watch serversbd1 #watch serversbd2 #service news nntp mon-1.2.0/etc/netappfree.cf0000644003616100016640000000133110061516616015425 0ustar trockijtrockij# # netappfree.cf- configuration file for netappfree.monitor # # format: # # host filesys free # # host hostname of the netapp, should correspond with a host # defined in the netapp host group # # filesys The filesystem to check, as represented in netapp.mib. # For ONTAP 5.*, resembles "/vol/vol0/" or "/vol/vol0/.snapshot" # For ONTAP 4.3.4, resembles "active" or "snapshot" # For ONTAP 4.3.1, resembles "/" or "/.snapshot" # # free The amount of free space which will trigger a failure, # expressed as "10kb", "10MB", or "10GB" # # $Id: netappfree.cf,v 1.1.1.1 2004/06/09 05:18:06 trockij Exp $ # f330 / 4GB f540 active 4GB f630 /vol/vol0/ 4GB pu / 4gb np active 25gb mon-1.2.0/etc/snmpopt.cf0000644003616100016640000000066410146140376015004 0ustar trockijtrockij# # snmpopt.cf # # This optional file is used to pass parameters to the SNMP library, # used by snmpvar.monitor. # # (default values shown) # common options # Version = 1 # Port = 161 # Retries = 8 # Timeout = 5 # SNMPv1/v2 options # Community = public # SNMPv3 options # SecName = initial # SecLevel = noAuthNoPriv # AuthPass = # SecEngineId = # ContextEngineId = # Context # AuthProto = MD5 # PrivProto = DES # PrivPass = mon-1.2.0/etc/snmpdiskspace.cf0000644003616100016640000000445610616216722016154 0ustar trockijtrockij# # snmpdiskspace.cf- configuration file for snmpdiskspace.monitor # $Id: snmpdiskspace.cf,v 1.1.2.1 2007/05/02 23:25:06 trockij Exp $ # # format: # # host filesys free ifree # # The monitor script uses a "first match" algorithm. 
# So put your more specific directives at the top, and leave the more
# general directives for the bottom.
#
#
# host      Regex describing the name of the host(s). Remember to escape
#           dots if you're fully qualifying hostnames, e.g.,
#           some\.domain\.com, otherwise you might not be matching what
#           you think you're matching.
#
# filesys   Regex describing the filesystem to check, as represented
#           in the relevant mib (after mangling by the monitor).
#           Remember to use regex syntax, and not file glob syntax.
#
# free      The amount of free space which will trigger a failure,
#           expressed as "10", "10kb", "10MB", or "10GB" for
#           bytes, kilobytes, megabytes or gigabytes. The format
#           "10%" signifies percent of the total disk space.
#           "0" turns off checking for the filesystem/disk.
#
# ifree     Percentage of free inodes, below which a failure is triggered.
#           Expressed as "5%". The host must support the UCD dskTable MIB.
#
#
# BE SURE TO TEST your configuration with the "--listall" option!
# This way, you will see exactly what filesystems are found by the script,
# and what their alarm thresholds will be.
#
# Examples:
#   * * 5%
#       Give a warning when the free space goes below 5%
#       (This is the default behavior of the monitor)
#       This should always be the last line in your config file
#       because it will match everything.
#
#   * * 5% 10%
#       As above, but also warn if free inodes drop below 10%.
#
#   ior * 15%
#       On the host ior the limit is 15%
#
#   poo / 1gb
#       poo's root should have a full gig free
#
#   www[1-4] * 500mb
#       any partition on the machines www1, www2, www3, and www4
#       should have at least 500mb free.
#
#   * /cdrom/.* 0
#       anything that is mounted on /cdrom will be full anyway
#       At least for Solaris, you need a regex like this because
#       vold mounts each new CD on a new partition, and you
#       won't know its name until you put it into the drive.
#
#
# Always ignore anything on cdrom partitions
* /cdrom.* 0
* /mnt 0
#
#
# This line should always be last because it matches everything.
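The "first match" rule described in the comments above can be sketched in Python as follows. This is an illustration only, not the monitor's code: the names `RULES`, `_matches` and `first_match`, the hosts in the rule list, and the choice to treat a bare `*` as "match anything" and every other pattern as a whole-string regex are all assumptions made for the example.

```python
import re

# Hypothetical rules in file order: (host pattern, filesystem pattern, free).
RULES = [
    ("ior", "*", "15%"),
    ("www[1-4]", "*", "500mb"),
    ("*", "/cdrom.*", "0"),
    ("*", "*", "5%"),        # catch-all, so it has to come last
]


def _matches(pattern, value):
    # Assumption: a bare "*" matches anything; any other pattern is a
    # regex that must match the whole value.
    if pattern == "*":
        return True
    return re.fullmatch(pattern, value) is not None


def first_match(rules, host, filesys):
    """Return the threshold of the first rule matching host and filesys."""
    for host_pat, fs_pat, free in rules:
        if _matches(host_pat, host) and _matches(fs_pat, filesys):
            return free
    return None
```

With these rules, `first_match(RULES, "www3", "/cdrom/cd0")` returns `"500mb"` rather than `"0"`, because the `www[1-4]` line comes before the `/cdrom.*` line. That is why the comments ask for specific directives first and the catch-all last.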
* * 5%

mon-1.2.0/etc/auth.cf

#
# authentication file
#
# entries look like this:
# command: {user|all}[,user...]
#
# THE DEFAULT IS TO DENY ACCESS TO ALL IF THIS FILE
# DOES NOT EXIST, OR IF A COMMAND IS NOT DEFINED HERE
#

#
# command section
#
command section

ack:            AUTH_ANY
checkauth:      all
clear:          AUTH_ANY
disable:        AUTH_ANY
dump:           AUTH_ANY
enable:         AUTH_ANY
get:            AUTH_ANY
list:           all
loadstate:      AUTH_ANY
protid:         all
quit:           all
reload:         AUTH_ANY
reset:          AUTH_ANY
savestate:      AUTH_ANY
servertime:     all
set:            AUTH_ANY
start:          AUTH_ANY
stop:           AUTH_ANY
term:           AUTH_ANY
test:           AUTH_ANY
version:        all

#
# trap section
#
# if no source hosts or users are defined, then do not
# accept traps
#
trap section

#source_host    user    password

#
# allow from user "mon" from any host
#
# *             mon     monpassword

#
# allow from host 127.0.0.1 without requiring
# a valid username and password
#
# 127.0.0.1     *       *
#

mon-1.2.0/etc/very-simple.cf

#
# Very simple mon.cf file
#
# $Id: very-simple.cf,v 1.1.1.1 2004/06/09 05:18:06 trockij Exp $
#
alertdir = /usr/lib/mon/alert.d
mondir = /usr/lib/mon/mon.d
maxprocs = 20
histlength = 100
randstart = 60s

#
# define groups of hosts to monitor
#
hostgroup servers localhost

hostgroup mail mailhost

watch servers
    service ping
        interval 5m
        monitor fping.monitor
        period wd {Mon-Fri} hr {7am-10pm}
            alert mail.alert root@localhost
            alertevery 1h
        period wd {Sat-Sun}
            alert mail.alert root@localhost
    service telnet
        interval 10m
        monitor telnet.monitor
        period wd {Mon-Fri} hr {7am-10pm}
            alertevery 1h
            alertafter 2 30m
            alert mail.alert root@localhost

mon-1.2.0/etc/S99mon

#!/bin/sh
#
# start/stop the mon server
#
# You probably want to set the path to include
# nothing but local filesystems.
#
# chkconfig: 2345 99 10
# description: mon system monitoring daemon
# processname: mon
# config: /etc/mon/mon.cf
# pidfile: /var/run/mon.pid
#
PATH=/bin:/usr/bin:/sbin:/usr/sbin
export PATH

# Source function library.
. /etc/rc.d/init.d/functions

# See how we were called.
case "$1" in
    start)
	echo -n "Starting mon daemon: "
	daemon /usr/sbin/mon -f -l -c /etc/mon/mon.cf
	echo
	touch /var/lock/subsys/mon
	;;
    stop)
	echo -n "Stopping mon daemon: "
	killproc mon
	echo
	rm -f /var/lock/subsys/mon
	;;
    status)
	status mon
	;;
    restart)
	killall -HUP mon
	;;
    *)
	echo "Usage: mon {start|stop|status|restart}"
	exit 1
esac

exit 0

mon-1.2.0/etc/mon.cgi.cf

#
# The mon.cgi config file.
# Format:
# key = value
#
# Blank lines and lines that begin with '#' are ignored.
#
# Both key names and values are case sensitive.
#
# This file comes with the mon.cgi distribution and contains all of the
# valid key/value pairs that mon.cgi will accept.
#
# The latest version of mon.cgi is always available at:
# http://www.nam-shub.com/files/
#
# If there are errors in your config file, mon.cgi will stop parsing it,
# and will print messages to STDERR, which should end up in your web
# server's error log.
#
# $Id: mon.cgi.cf,v 1.2 2004/11/15 14:45:18 vitroth Exp $
#

# Your organization (what you want printed on the top of each page)
organization = Network Operations

# Contact email for mon administrator at your site
monadmin = bofh@your.domain

#Company or mon logo (URL path)
logo = /URL-path/to/your.gif

# URL to go to when you click on the logo image
logo_link = http://www.kernel.org/pub/software/admin/mon/html/

# Seconds between page reload
reload_time = 180

# Where to run mon (host,port)
monhost = localhost
monport = 2583

# Set this to anything other than 'Y' or 'yes' to turn off authentication
# (HINT: authentication is a *good* thing)
must_login = yes

# Application secret. Set this to something long and unguessable.
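One way to obtain a long, unguessable value for app_secret is to draw it from the operating system's random source. The Python sketch below is only an illustration: the helper name `make_app_secret` and the 48-byte default are arbitrary choices for the example, not mon.cgi requirements.

```python
import secrets


def make_app_secret(nbytes=48):
    """Return a random URL-safe string built from nbytes of OS randomness."""
    # token_urlsafe() uses only letters, digits, '-' and '_', so the
    # result contains no '#' or spaces.
    return secrets.token_urlsafe(nbytes)
```

Calling `make_app_secret()` once and pasting the result after `app_secret = ` gives each installation its own value instead of the sample string shipped in this file.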
app_secret = LKAHETOI#KJHJKSHDOWOIUW^*((985i2hkljlkjfdhglkdhfgdlkfjghldksfjhg98 34tklh qrthq3 i3lu4 KLHKLJHKLJH ncxmvn owow y YnneO87210502673kn6l3 # Default username and password (only used if must_login is set) default_username = readonly default_password = public # Idle time, in seconds, until login cookie is invalidated. Note that if # ( login_expire_time < reload_time ) you will not be able to "idle". login_expire_time = 900 # Whether or not to untaint HTML in ack msgs using HTML::Entities (recommended) untaint_ack_msgs = yes # The name of the cookie set by mon.cgi and its path cookie_name = mon-cookie cookie_path = / # Default alternate fonts to use (assumes default font is a serif font) fixed_font_face = courier sans_serif_font_face = Helvetica, Arial # Default color scheme for page BGCOLOR = black TEXTCOLOR = white LINKCOLOR = yellow VLINKCOLOR = #00FFFF # Default colors for failed services greenlight_color = #009900 redlight_color = red unchecked_color = #000033 yellowlight_color = #FF9933 # # A white-background look for mon.cgi, from Thomas Bates # #BGCOLOR = #FFFFFF #TEXTCOLOR = #000000 #LINKCOLOR = 0000FF #VLINKCOLOR = #551a8b # #greenlight_color=#a0d0a0 #redlight_color=ff6060 #unchecked_color=f0f0f0 #disabled_color=#e0e0e0 #yellowlight_color = #FFAF4F # Maximum number of downtime events to show, per page dtlog_max_failures_per_page = 100 # Watch keywords will show only the specified hostgroups by default. # Matching is by regexp. # e.g., show the watch whose name is www #watch = www # e.g., show any watches whose names start with gw- #watch = gw-.* # Set show_watch_strict to 'yes' if you want to be sure that users only see # information about the hostgroups that they are authorized to # view. If show_watch_strict is set to 'yes', as far as your GUI users # will know, there is nothing else running on the mon instance # except for their hostgroups, *even if those users know the names # of other hostgroups on your mon server*.
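#
# A sketch of the strict setup described above, left commented out so
# that it changes nothing as shipped. The pattern "gw-.*" is only an
# illustration that reuses the earlier example in this file; substitute
# a regexp that matches your own watch names.
#
#watch = gw-.*
#show_watch_strict = yes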
# # Set show_watch_strict to 'no' to show only the defined watch # groups by default, but allow users to see information about # others as well. show_watch_strict = no mon-1.2.0/etc/syslog-monitor.conf0000644003616100016640000000611010146140376016636 0ustar trockijtrockij# Configuration file for syslog.monitor # $Id: syslog-monitor.conf,v 1.2 2004/11/15 14:45:18 vitroth Exp $ ############################################################################# # Which timeout to set for select()ing on the input socket. # You really do not wish to play with this. # select_timeout 10 # Log level (just like syslog you know;) loglevel 6 # If undefined, will write to stdout # You had better specify an absolute path here. # logfile /var/log/syslog.monitor # Where copies of incoming syslog messages get written to. # In the filename, you can define the following substitutions: # %H = gets replaced with the hostname # %L = gets replaced with the syslog level as a string # %l = same, but as a number # %F = syslog facility (local0, kern, ...) # %G = hostgroup the host belongs to # %D = date at which the message was received, in ISO 8601 (1999-04-03) syslogfile /var/log/syslog.%H.%F.%D # If set, will make syslog.monitor fork and go into the background as soon # as possible. # Be aware that the program will refuse to daemonize if you do not set a logfile. # daemon_mode mon_host cherusker.bi.teuto.net # Set these if necessary # mon_user # mon_pass # IP number on which to listen for incoming UDP packets bind_ip 0.0.0.0 # port number (you almost certainly do not want to touch this) # bind_port 514 # Define a check called "emerg" check emerg # A slightly more elaborate description, which is sent to the mon server # as part of the trap desc Emergencies # The period which is monitored period 60m # How often this check _must_ trigger within said period. # Set to -1 to disable. min -1 # How often this check might occur at max within the period.
max 3 # If this is set, no further matches will be checked if this check matched. # Use this carefully. # final # The check itself. Evaluated within Perl (), you can do powerful stuff # here. The current message is referenced by $$r. # Parameters you might want to match on: # $$r{'src_port'} - The source port from which the packet was sent. # $$r{'src_ip'} - The source IP. # $$r{'host'} - The hostname, resolved using the cache built # at startup. # $$r{'level'} - numeric syslog level of the message. (0-7) # $$r{'Level'} - syslog level as a string (e.g. 'crit') # $$r{'facility'} - Facility (e.g. 'local0' etc) # $$r{'msg'} - The text part of the message # $$r{'time'} - The unixtime at which the message was received, # $$r{'group'} - The group the host sending this message # belongs to pattern ($$r{'level'} <=3) # A "catch-all" - we really should receive at least one line within 15m, # But more than 1000 might be strange... check all desc All period 15m min 200 max 10000 final pattern (1) # Relating to hostgroup unix: group unix # For each host in the hostgroup unix, run a separate instance of each # check listed here (references the check defined above) per-host emerg # For the _entire_ hostgroup, run these checks: per-group all # Only on this host, run these: # on-host donar.bi.teuto.net emerg-kern mon-1.2.0/etc/snmpvar.def0000644003616100016640000003713110146140376015137 0ustar trockijtrockij# # sample snmpvar.def. you should configure this to meet your # own needs. # # Definitions of variables to be monitored using snmpvar.monitor # # # generic host (router/switch/...)
Variable IF_OPERSTAT OID .1.3.6.1.2.1.2.2.1.8 Description ifOperStatus DefaultEQ 1 Decode 1 up Decode 2 down Decode 3 testing Decode 4 unknown Decode 5 dormant # generic router Variable BGP_PEERSTATE OID .1.3.6.1.2.1.15.3.1.2 Description bgpPeerState DefaultEQ 6 Decode 1 idle Decode 2 connect Decode 3 active Decode 4 opensent Decode 5 openconfirm Decode 6 established # generic Host Resources MIB implementation Variable HR_DEVICE_STATUS OID .1.3.6.1.2.1.25.3.2.1.5. Description Device Status DefaultEQ 2 Decode 1 unknown Decode 2 running Decode 3 warning Decode 4 testing Decode 5 down # some variables from a Windows NT "perfmib" configuration # see ms-perfmib directory for NT side configuration Variable WINNT_CPU_TOTAL OID .1.3.6.1.4.1.311.1.1.3.1.1.1.9.0 Description CPU Load Total Unit % Variable WINNT_CPU_SYS OID .1.3.6.1.4.1.311.1.1.3.1.1.1.11.0 Description CPU Load System Unit % Variable WINNT_MEM_COMMITTED OID .1.3.6.1.4.1.311.1.1.3.1.1.2.2.0 Description Committed Memory Scale / 1024 / 1024 # the Scale expression is used as (eval($rawval . $scale)) Unit MB Variable WINNT_MEM_AVAILABLE OID .1.3.6.1.4.1.311.1.1.3.1.1.2.1.0 Description Available Memory Scale / 1024 /1024 Unit MB Variable WINNT_LOGICAL_C_FREE OID .1.3.6.1.4.1.311.1.1.3.1.1.6.1.4.6.48.58.48.58.67.58 Description Free Disk Space on drive C Unit MB Variable WINNT_LOGICAL_D_FREE OID .1.3.6.1.4.1.311.1.1.3.1.1.6.1.4.6.48.58.48.58.68.58 Description Free Disk Space on drive D Unit MB # Dell PowerEdge 2550 Server Instrumentation Variable PE2550_FAN_SYS_RPM OID .1.3.6.1.4.1.674.10892.1.700.12.1.6.1. Description System Fan Speed DefaultIndex 1 2 3 Unit rpm DefaultMin 600 DefaultMax 6000 DefaultMaxValid 10000 DefaultGroup Environment Variable PE2550_FAN_DISK_RPM OID .1.3.6.1.4.1.674.10892.1.700.12.1.6.1.4 Description Disk Fan Speed Unit rpm DefaultMin 6000 DefaultMax 14000 DefaultMaxValid 15000 DefaultGroup Environment Variable PE2550_TEMP_CPU OID .1.3.6.1.4.1.674.10892.1.700.20.1.6.1. 
Description CPU Temperature DefaultIndex 1 2 Unit C Scale / 10.0 DefaultMax 50 DefaultGroup Environment Variable PE2550_TEMP OID .1.3.6.1.4.1.674.10892.1.700.20.1.6.1. Description Temperature DefaultIndex 3 4 5 FriendlyName 3 Motherboard FriendlyName 4 Backplane 1 FriendlyName 5 Backplane 2 Unit C Scale / 10.0 DefaultMax 40 DefaultGroup Environment Variable PE2550_PSU_STATUS DefaultIndex 1 2 OID .1.3.6.1.4.1.674.10892.1.600.12.1.5.1. Description Power Supply Status DefaultEQ 3 Decode 1 other Decode 2 unknown Decode 3 OK Decode 4 noncrit Decode 5 critical Decode 6 nonrecoverable DefaultGroup Power # Dell PowerEdge 4300 Server Instrumentation Variable PE4300_TEMP_CPU OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.1. Description CPU Temperature DefaultIndex 1 2 Scale / 10.0 Unit C DefaultMax 40 DefaultGroup Environment Variable PE4300_TEMP OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.1. Description Temperature DefaultIndex 3 4 5 6 FriendlyName 3 @Motherboard FriendlyName 4 @Ambient FriendlyName 5 @Backplane 1 FriendlyName 6 @Backplane 2 Scale / 10.0 Unit C DefaultMax 40 DefaultGroup Environment Variable PE4300_5V_CURRENT OID .1.3.6.1.4.1.674.10891.303.1.5.2.5.1. Description DC Current (+5V) DefaultIndex 1 4 7 Scale / 1000.0 Unit A DefaultMax 25 DefaultMaxValid 100 DefaultGroup Power Variable PE4300_12V_CURRENT OID .1.3.6.1.4.1.674.10891.303.1.5.2.5.1. Description DC Current (+12V) DefaultIndex 2 5 8 Scale / 1000.0 Unit A DefaultMax 10 DefaultMaxValid 100 DefaultGroup Power Variable PE4300_3V_CURRENT OID .1.3.6.1.4.1.674.10891.303.1.5.2.5.1. Description DC Current (+3V) DefaultIndex 3 6 9 Scale / 1000.0 Unit A DefaultMax 10 DefaultMaxValid 100 DefaultGroup Power Variable PE4300_FAN_CPU_RPM OID .1.3.6.1.4.1.674.10891.301.1.5.2.3.1. 
Description CPU Fan Speed Unit rpm DefaultIndex 1 2 DefaultMin 1000 DefaultMax 5000 DefaultMaxValid 10000 DefaultGroup Environment # really the same as above, other index ranges only; different description # one could also make it an array and use FriendlyName in the .cf file Variable PE4300_FAN_DISK_RPM OID .1.3.6.1.4.1.674.10891.301.1.5.2.3.1. Description Disk Fan Speed Unit rpm DefaultIndex 3 4 5 DefaultMin 1000 DefaultMax 5000 DefaultMaxValid 10000 DefaultGroup Environment Variable PE4X00_PSU_STATUS DefaultIndex 1 2 3 OID .1.3.6.1.4.1.674.10891.304.1.4.2.6.1. Description Power Supply Status DefaultEQ 3 Decode 1 other Decode 2 unknown Decode 3 OK Decode 4 noncrit Decode 5 critical Decode 6 nonrecoverable DefaultGroup Power Variable PE4X00_EXT_DISK1_PSU_STATUS DefaultIndex 1 2 OID .1.3.6.1.4.1.674.10891.304.1.4.2.6.2. Description ExtStorage 1 PSU Status DefaultEQ 3 Decode 1 other Decode 2 unknown Decode 3 OK Decode 4 noncrit Decode 5 critical Decode 6 nonrecoverable DefaultGroup Power # Dell PowerEdge 6350 Server Instrumentation Variable PE6350_TEMP_CPU OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.1. Description CPU Temperature DefaultIndex 1 2 3 4 Scale / 10.0 Unit C DefaultMax 55 DefaultGroup Environment Variable PE6350_TEMP OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.1. Description Temperature DefaultIndex 5 6 7 FriendlyName 5 @Motherboard FriendlyName 6 @Ambient FriendlyName 7 @Backplane Scale / 10.0 Unit C DefaultMax 40 DefaultGroup Environment Variable PE6350_TEMP_EXT_DISK1 OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.2.1 Description ExtStorage 1 Temperature Scale / 10.0 Unit C DefaultGroup Environment Variable PE6350_FAN_RPM OID .1.3.6.1.4.1.674.10891.301.1.5.2.3.1. Description Fan Speed DefaultIndex 1 2 3 4 Unit rpm DefaultMin 1000 DefaultMax 5000 DefaultMaxValid 10000 DefaultGroup Environment Variable PE6350_FAN_RPM_EXT_DISK1 OID .1.3.6.1.4.1.674.10891.301.1.5.2.3.2. 
Description ExtStorage 1 Fan Speed DefaultIndex 1 2 3 Unit rpm DefaultMin 1000 DefaultMax 5000 DefaultMaxValid 10000 DefaultGroup Environment # Dell PowerEdge 4200 Server Instrumentation Variable PE4200_TEMP_CPU OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.1. Description CPU Temperature DefaultIndex 1 2 Scale / 10.0 Unit C DefaultMax 40 DefaultGroup Environment Variable PE4200_TEMP OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.1. Description Temperature DefaultIndex 3 4 5 6 FriendlyName 3 @Ambient FriendlyName 4 @Panel FriendlyName 5 @Backplane Top FriendlyName 6 @Backplane Bottom Scale / 10.0 Unit C DefaultMax 35 DefaultGroup Environment Variable PE4200_PSU_5V_CURRENT OID .1.3.6.1.4.1.674.10891.303.1.5.2.5.1. Description DC Current (+5V) DefaultIndex 1 2 FriendlyName 1 @Top PSU FriendlyName 2 @Bottom PSU Scale / 1000.0 Unit A DefaultMax 10 DefaultMaxValid 50 DefaultGroup Power Variable PE4200_PSU_3V_CURRENT OID .1.3.6.1.4.1.674.10891.303.1.5.2.5.1. Description DC Current (+3.3V) DefaultIndex 3 4 FriendlyName 3 @Top PSU FriendlyName 4 @Bottom PSU Scale / 1000.0 Unit A DefaultMax 5 DefaultMaxValid 50 DefaultGroup Power Variable PE4200_PSU_12V_CURRENT OID .1.3.6.1.4.1.674.10891.303.1.5.2.5.1. Description DC Current (+12V) DefaultIndex 5 6 FriendlyName 5 @Top PSU FriendlyName 6 @Bottom PSU Scale / 1000.0 Unit A DefaultMax 10 DefaultMaxValid 50 DefaultGroup Power Variable PE4200_FAN_RPM OID .1.3.6.1.4.1.674.10891.301.1.5.2.3.1. Description Fan Speed Unit rpm DefaultIndex 1 3 4 5 # Fan #2 is a standby unit FriendlyName 1 @Chassis 1 FriendlyName 2 @Chassis 2 FriendlyName 3 @Chassis 3 FriendlyName 4 @Top PSU FriendlyName 5 @Bottom PSU DefaultMin 1000 DefaultMax 5000 DefaultMaxValid 10000 DefaultGroup Environment # AMI MegaRAID (aka Dell PERC) RAID controller instrumentation Variable MEGARAID_C0_LOGICAL_STATUS OID .1.3.6.1.4.1.3582.1.1.2.1.3.0. 
Description RAID Ctl0 Volume Status DefaultEQ 2 Decode 0 offline Decode 1 degraded Decode 2 normal Decode 3 initialize Decode 4 checkconsistency Variable MEGARAID_C1_LOGICAL_STATUS OID .1.3.6.1.4.1.3582.1.1.2.1.3.1. Description RAID Ctl1 Volume Status DefaultEQ 2 Decode 0 offline Decode 1 degraded Decode 2 normal Decode 3 initialize Decode 4 checkconsistency Variable MEGARAID_C0_CH0_PHYS_STATUS OID .1.3.6.1.4.1.3582.1.1.3.1.4.0.0. Description Ctl0Ch0 Phys Drive Status DefaultEQ 3 Decode 1 ready Decode 3 online Decode 4 failed Decode 5 rebuild Decode 6 hotspare Decode 20 nondisk Variable MEGARAID_C1_CH0_PHYS_STATUS OID .1.3.6.1.4.1.3582.1.1.3.1.4.1.0. Description Ctl1Ch0 Phys Drive Status DefaultEQ 3 Decode 1 ready Decode 3 online Decode 4 failed Decode 5 rebuild Decode 6 hotspare Decode 20 nondisk Variable MEGARAID_C1_CH1_PHYS_STATUS OID .1.3.6.1.4.1.3582.1.1.3.1.4.1.1. Description Ctl1Ch1 Phys Drive Status DefaultEQ 3 Decode 1 ready Decode 3 online Decode 4 failed Decode 5 rebuild Decode 6 hotspare Decode 20 nondisk # APC SmartUPS monitoring (using PowerNet SNMP agents or SNMP adapter boards) Variable APCUPS_LINEVOLT_MAX OID .1.3.6.1.4.1.318.1.1.1.3.2.2.0 Description Recent Max Line Voltage Unit V DefaultMax 245 DefaultGroup Power Variable APCUPS_LINEVOLT_MIN OID .1.3.6.1.4.1.318.1.1.1.3.2.3.0 Description Recent Min Line Voltage Unit V DefaultMin 205 DefaultGroup Power Variable APCUPS_LOAD OID .1.3.6.1.4.1.318.1.1.1.4.2.3.0 Description Output Load Unit % DefaultMax 90 DefaultGroup Power Variable APCUPS_BATT_TEMP OID .1.3.6.1.4.1.318.1.1.1.2.2.2.0 Description Battery Temperature Unit C DefaultMax 45 DefaultGroup Environment # external sensors connected to a MeasureUPS board Variable APCUPS_EXT_TEMP OID .1.3.6.1.4.1.318.1.1.2.1.1.0 Description Temperature Unit C DefaultGroup Environment Variable APCUPS_EXT_HUMID OID .1.3.6.1.4.1.318.1.1.2.1.2.0 Description Humidity Unit % DefaultMin 10 DefaultMax 90 DefaultGroup Environment Variable APCUPS_EXT_SWITCH_STAT OID 
.1.3.6.1.4.1.318.1.1.2.2.2.1.5 Description Contact Decode 1 unknown Decode 2 OK Decode 3 FAULT Variable APCUPS_OUTPUT_STAT OID .1.3.6.1.4.1.318.1.1.1.4.1.1.0 Description UPS Status DefaultEQ 2 Decode 1 unknown Decode 2 Online Decode 3 On Battery Decode 4 On Smart Boost Decode 5 Timed Sleeping Decode 6 Software Bypass Decode 7 Off Decode 8 Rebooting Decode 9 Switched Bypass Decode 10 Hardware Failure Bypass Decode 11 Sleeping Until Power Return Decode 12 On Smart Trim DefaultGroup Power # Compaq ProLiant Server Instrumentation Variable PROLIANT_TEMP_STATUS OID .1.3.6.1.4.1.232.6.2.6.3.0 Description Temperature Status DefaultEQ 2 Decode 1 Other Decode 2 OK Decode 3 Degraded Decode 4 FAILED DefaultGroup Environment Variable PROLIANT_FAN_STATUS OID .1.3.6.1.4.1.232.6.2.6.7.1.9.0. Description Fan Status DefaultEQ 2 Decode 1 Other Decode 2 OK Decode 3 Degraded Decode 4 FAILED DefaultGroup Environment Variable PROLIANT_PSU_STATUS OID .1.3.6.1.4.1.232.6.2.9.3.1.5.0. Description Power Supply Status DefaultIndex 1 2 DefaultEQ 1 Decode 1 OK Decode 2 Failure Decode 3 BIST Failure Decode 4 Fan Failure Decode 5 Temp Failure Decode 6 Interlock Open DefaultGroup Power Variable CPQARRAY_LOG_STATUS OID .1.3.6.1.4.1.232.3.2.3.1.1.4.1. Description RAID Volume Status DefaultIndex 1 DefaultEQ 2 Decode 1 Other Decode 2 OK Decode 3 FAILED Decode 4 Unconfigured Decode 5 Recovering Decode 6 Ready For Rebuild Decode 7 Rebuilding Decode 8 Wrong Drive Decode 9 Bad Connect Decode 10 Overheating Decode 11 Shutdown Decode 12 expanding Decode 13 Not Available Decode 14 Queued For Expansion Variable CPQARRAY_PHYS_STATUS OID .1.3.6.1.4.1.232.3.2.5.1.1.6.1. Description Phys Drive Status DefaultEQ 2 Decode 1 Other Decode 2 OK Decode 3 Failed Decode 4 Predictive Failure # IBM 8272 Token Ring switch Variable IBM8272_LINK_STATE OID .1.3.6.1.4.1.2.6.66.1.2.2.1.1.15. 
Description Link State DefaultEQ 1 Decode 1 up Decode 2 down Variable IBM8272_TEMP_SYS OID .1.3.6.1.4.1.2.6.66.1.2.1.2.11.0 Description Switch Temperature DefaultEQ 1 Decode 1 normal Decode 2 HIGH DefaultGroup Environment # Nokia IP series firewall appliance Variable NOKIA_IP_CHASSIS_TEMP OID .1.3.6.1.4.1.94.1.21.1.1.5.0 Description Chassis Temperature DefaultEQ 1 Decode 1 normal Decode 2 OVERTEMP DefaultGroup Environment Variable NOKIA_IP_FAN_STAT OID .1.3.6.1.4.1.94.1.21.1.2.1.1.2. Description Fan Status DefaultEQ 1 Decode 1 running Decode 2 DEAD DefaultGroup Environment Variable NOKIA_IP_PSU_STAT OID .1.3.6.1.4.1.94.1.21.1.3.1.1.3. Description PSU Status DefaultEQ 1 Decode 1 running Decode 2 DEAD DefaultGroup Environment Variable NOKIA_IP_PSU_TEMP OID .1.3.6.1.4.1.94.1.21.1.3.1.1.2. Description Chassis Temperature DefaultEQ 1 Decode 1 normal Decode 2 OVERTEMP DefaultGroup Environment # Mail Server (custom extension scripts in UCD SNMP agent) Variable LINUX_MAILQUEUE OID .1.3.6.1.4.1.2021.8.1.101.1 Description Mail Queue Length # see sample in ucd-snmp subdir in snmpvar.monitor distribution # cisco router # ciscoEnvMonTemperatureState Variable CISCO_TEMP_STATE OID .1.3.6.1.4.1.9.9.13.1.3.1.6. Description Chassis Temperature DefaultIndex 1 DefaultEQ 1 Decode 1 normal Decode 2 Warning Decode 3 CRITICAL Decode 4 SHUTDOWN Decode 5 not present DefaultGroup Environment Variable CISCO_MEM_POOL_FREE OID .1.3.6.1.4.1.9.9.48.1.1.1.6. Description Memory Pool Free Bytes DefaultIndex 1 2 FriendlyName 1 CPU FriendlyName 2 I/O # HP switch # hpicfSensorStatus Variable HP_ICF_FAN_STATE OID .1.3.6.1.4.1.11.2.14.11.1.2.6.1.4.1 Description Fan Status DefaultEQ 4 Decode 1 unknown Decode 2 bad Decode 3 warning Decode 4 good Decode 5 not present DefaultGroup Environment Variable HP_ICF_PSU_STATE OID .1.3.6.1.4.1.11.2.14.11.1.2.6.1.4. 
Description PSU Status DefaultEQ 4 Decode 1 unknown Decode 2 bad Decode 3 warning Decode 4 good Decode 5 not present DefaultGroup Power mon-1.2.0/mon0000755003616100016640000040523310631517213012734 0ustar trockijtrockij#!/usr/bin/perl # # mon - schedules service tests and triggers alerts upon failures # # Jim Trocki, trockij@arctic.org # # $Id: mon,v 1.22.2.2 2007/06/06 11:46:19 trockij Exp $ # # Copyright (C) 1998 Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # # use strict; my $RCSID='$Id: mon,v 1.22.2.2 2007/06/06 11:46:19 trockij Exp $'; my $AUTHOR='trockij@arctic.org'; my $RELEASE='$Name: mon-1-2-0-release $'; # # NetBSD rc.d script compatibility # $0= "mon" . " " . 
join(" ", @ARGV) if $^O eq "netbsd"; # # modules in the perl distribution # use Getopt::Long qw(:config no_ignore_case); use Text::ParseWords; use POSIX; use Fcntl; use Socket; use Sys::Hostname; use Sys::Syslog qw(:DEFAULT); use FileHandle; use Data::Dumper; # # CPAN modules # use Time::HiRes qw(gettimeofday tv_interval usleep); use Time::Period; sub auth; sub call_alert; sub check_auth; sub clear_timers; sub client_accept; sub client_close; sub client_command; sub client_dopending; sub client_write_opstatus; sub collect_output; sub daemon; sub debug; sub debug_dir; sub dep_ok; sub dep_summary; sub depend; sub dhmstos; sub die_die; sub disen_host; sub disen_service; sub disen_watch; sub do_alert; sub do_startup_alerts; sub err_startup; sub esc_str; sub gen_scriptdir_hash; sub handle_io; sub handle_trap; sub handle_trap_timeout; sub host_exists; sub host_singleton_group; sub inRange; sub init_cf_globals; sub init_globals; sub load_auth; sub load_state; sub normalize_paths; sub mysystem; sub init_dtlog; sub pam_conv_func; sub proc_cleanup; sub process_event; sub randomize_startdelay; sub read_cf; sub readhistoricfile; sub reload; sub remove_proc; sub reset_server; sub run_monitor; sub save_state; sub set_last_test; sub set_op_status; sub reset_timer; sub setup_server; sub sock_write; sub syslog_die; sub un_esc_str; sub usage; sub write_dtlog; # # globals # my %opt; # cmdline arguments my %CF; # configuration directives my $PWD; # current working directory my $HOSTNAME; # system hostname my $STOPPED; # 1 = scheduler stopped, 0 = not stopped my $STOPPED_TIME; # time(2) scheduler was stopped, if stopped my $SLEEPINT; # don't touch my %watch_disabled; # watches disabled, indexed by watch my %watch; # main configuration file data structure my %alias; # aliases my %groups; # hostgroups, indexed by group my %views; # view lists, indexed by name my %view_users; # view preferences, per user # # I/O routine globals # my %clients; # fds of connected clients my $numclients; # 
count of connected clients my %running; # procs which are forked and running, # indexed by group/service my $iovec; # used for select loop my %runningpid; # procs which are forked and running, # indexed by PID my $procs; # number of outstanding procs my %fhandles; # input file handles of children my %ibufs; # buffer structure to hold data from children my ($fdset_rbits, $fdset_ebits); # # history globals # my @last_alerts; # alert history, in memory my @last_failures; # failure history, in memory # # misc. globals # my $i; # loop iteration counter, used for debugging only my $lasttm; # the last time(2) the mon loop started my $pid_file_owner; # set when creating pid file my $tm; # used in main loop # # authentication structure globals # my %AUTHCMDS; my %NOAUTHCMDS; my %AUTHTRAPS; # # PAM authentication globals (must not be lexically scoped) # use vars qw ( $PAM_username $PAM_password ) ; # # opstatus globals # my (%OPSTAT, %FAILURE, %SUCCESS, %WARNING); # operational statuses my ($TRAP_COLDSTART, $TRAP_WARMSTART, # trap types $TRAP_LINKDOWN, $TRAP_LINKUP, $TRAP_AUTHFAIL, $TRAP_EGPNEIGHBORLOSS, $TRAP_ENTERPRISE, $TRAP_HEARTBEAT); my ($STAT_FAIL, $STAT_OK, $STAT_COLDSTART, # _op_status values $STAT_WARMSTART, $STAT_LINKDOWN, $STAT_UNKNOWN, $STAT_TIMEOUT, $STAT_UNTESTED, $STAT_DEPEND, $STAT_WARN); my ($FL_MONITOR, $FL_UPALERT, # alert type flags $FL_TRAP, $FL_TRAPTIMEOUT, $FL_STARTUPALERT, $FL_TEST, $FL_REDISTRIBUTE, $FL_ACKALERT, $FL_DISABLEALERT); my $TRAP_PDU; my (%ALERTHASH, %MONITORHASH); # hash of pathnames for # alerts/monitors my $PROT_VERSION; my $START_TIME; # time(2) server started my $TRAP_PRO_VERSION; # trap protocol version my $DEP_EVAL_SANDBOX; # perl environment for # dep evals # # argument parsing # my $getopt_result = GetOptions(\%opt, qw/ A|authfile=s B|cfbasedir=s D|statedir=s L|logdir=s M|m4:s O|syslogfacility=s P|pidfile=s S|stopped a|alertdir=s b|basedir=s c|configfile=s d|debug+ f|fork h|help i|sleep=i k|maxkeep=i l|loadstate:s m|maxprocs=i 
p|port=i r|randstart=s s|scriptdir=s t|trapport=i v|version /); if (!$getopt_result) { usage(); exit; } # # these two things can be taken care of without # initializing things further # if ($opt{"v"}) { print "$RCSID\n$RELEASE\n"; exit; } if ($opt{"h"}) { usage(); exit; } if ($opt{"d"}) { eval 'require Data::Dumper;'; if ($@ ne "") { die "error: $@\n"; } } if ($^O eq "linux" || $^O =~ /^(open|free|net)bsd$/ || $^O eq "aix") { Sys::Syslog::setlogsock ('unix'); } elsif ($^O eq "solaris") { Sys::Syslog::setlogsock ('stream'); } openlog ("mon", "cons,pid", $CF{"SYSLOG_FACILITY"}); # # definitions # die "basedir $opt{b} does not exist\n" if ($opt{"b"} && ! -d $opt{"b"}); init_globals(); init_cf_globals(); syslog_die ("config file $CF{CF} does not exist") if (! -f $CF{"CF"}); # # read config file # if ((my $err = read_cf ($CF{"CF"}, 1)) ne "") { syslog_die ("$err"); } closelog; openlog ("mon", "cons,pid", $CF{"SYSLOG_FACILITY"}); # # cmdline args override config file # $CF{"ALERTDIR"} = $opt{"a"} if ($opt{"a"}); $CF{"BASEDIR"} = $opt{"b"} if ($opt{"b"}); $CF{"AUTHFILE"} = $opt{"A"} if ($opt{"A"}); $CF{"LOGDIR"} = $opt{"L"} if ($opt{"L"}); $CF{"STATEDIR"} = $opt{"D"} if ($opt{"D"}); $CF{"SCRIPTDIR"} = $opt{"s"} if ($opt{"s"}); $CF{"PIDFILE"} = $opt{"P"} if defined($opt{"P"}); # allow empty pidfile $CF{"MAX_KEEP"} = $opt{"k"} if ($opt{"k"}); $CF{"MAXPROCS"} = $opt{"m"} if ($opt{"m"}); $CF{"SERVPORT"} = $opt{"p"} if ($opt{"p"}); $CF{"TRAPPORT"} = $opt{"t"} if ($opt{"t"}); $SLEEPINT = $opt{"i"} if ($opt{"i"}); if ($opt{"r"}) { syslog_die ("bad randstart value") if (!defined (dhmstos ($opt{"r"}))); $CF{"RANDSTART"} = dhmstos($opt{"r"}); } if ($opt{"S"}) { $STOPPED = 1; $STOPPED_TIME = time; } # # do some path cleanups and # build lookup tables for alerts and monitors # normalize_paths(); gen_scriptdir_hash(); if ($opt{"d"}) { debug_dir(); } # # load the auth control, bind, and listen # load_auth (1); load_view_users(1); # # init client interface # %clients is an I/O 
structure, indexed by the fd of the client # $numclients is the number of clients currently connected # $iovec is fd_set for clients and traps # %clients = (); $numclients = 0; $iovec = ''; setup_server(); # # fork and become a daemon # init_dtlog() if ($CF{"DTLOGGING"}); daemon() if ($opt{"f"}); if ($CF{"PIDFILE"} ne '' && open PID, ">$CF{PIDFILE}") { $pid_file_owner = $$; print PID "$pid_file_owner\n"; close PID; } set_last_test (); # # randomize startup checks if asked to # randomize_startdelay() if ($CF{"RANDSTART"}); @last_alerts = (); @last_failures = (); readhistoricfile (); $procs = 0; $i=0; $lasttm=time; $fdset_rbits = $fdset_ebits = ''; %watch_disabled = (); $SIG{HUP} = \&reset_server; $SIG{INT} = \&handle_sigterm; # for interactive debugging $SIG{TERM} = \&handle_sigterm; $SIG{PIPE} = 'IGNORE'; # # load previously saved state # if (exists $opt{"l"}) { if ($opt{"l"}) { # If -l was given an argument (all, disabled, opstatus, etc...) # pass that to load_state load_state($opt{"l"}); }else{ # Otherwise default to old behavior of just loading disabled hosts/services/groups load_state("disabled"); } } syslog ('info', "mon server started"); # # startup alerts # do_startup_alerts(); # # main monitoring loop # for (;;) { debug (1, "$i" . ($STOPPED ? " (stopped)" : "") . 
"\n"); $i++; $tm = time; # # step through the watch groups, decrementing and # handing expired timers # if (!$STOPPED) { if (defined $CF{"EXCLUDE_PERIOD"} && $CF{"EXCLUDE_PERIOD"} ne "" && inPeriod (time, $CF{"EXCLUDE_PERIOD"})) { debug (1, "not running monitors because of global exclude_period\n"); } else { foreach my $group (keys %watch) { foreach my $service (keys %{$watch{$group}}) { my $sref = \%{$watch{$group}->{$service}}; my $t = $tm - $lasttm; $t = 1 if ($t <= 0); # # trap timer # if ($sref->{"traptimeout"}) { $sref->{"_trap_timer"} -= $t; if ($sref->{"_trap_timer"} <= 0 && $tm - $sref->{"_last_trap"} > $sref->{"traptimeout"}) { $sref->{"_trap_timer"} = $sref->{"traptimeout"}; handle_trap_timeout ($group, $service); } } # # trap duration timer # if (defined ($sref->{"_trap_duration_timer"})) { $sref->{"_trap_duration_timer"} -= $t; if ($sref->{"_trap_duration_timer"} <= 0) { set_op_status ($group, $service, $STAT_OK); undef $sref->{"_trap_duration_timer"}; } } # # polling monitor timer # if ($sref->{"interval"} && $sref->{"_timer"} <= 0 && !$running{"$group/$service"}) { if (!$CF{"MAXPROCS"} || $procs < $CF{"MAXPROCS"}) { if (defined $sref->{"exclude_period"} && $sref->{"exclude_period"} ne "" && inPeriod (time, $sref->{"exclude_period"})) { debug (1, "not running $group,$service because of exclude_period\n"); } elsif (($sref->{"dep_behavior"} eq "m" && defined $sref->{"depend"} && $sref->{"depend"} ne "") || (defined $sref->{"monitordepend"} && $sref->{"monitordepend"} ne "")) { if (dep_ok ($sref, 'm')) { run_monitor ($group, $service); } else { debug (1, "not running $group,$service because of depend\n"); } } else { run_monitor ($group, $service); } } else { syslog ('info', "throttled at $procs processes"); } } else { $sref->{"_timer"} -= $t; if ($sref->{"_timer"} < 0) { $sref->{"_timer"} = 0; } } } } } } $lasttm = time; # # collect any output from subprocs # collect_output; # # clean up after exited processes, and trigger alerts # proc_cleanup; # # 
handle client, server, and trap I/O # this routine sleeps for $SLEEPINT if no I/O is ready # handle_io; } die "not reached"; END { unlink $CF{"PIDFILE"} if $$ == $pid_file_owner && $CF{"PIDFILE"} ne ''; } ############################################################################## # # startup alerts # sub do_startup_alerts { foreach my $group (keys %watch) { foreach my $service (keys %{$watch{$group}}) { do_alert ($group, $service, "", 0, $FL_STARTUPALERT); } } } # # handle alert event, throttling the alert call if necessary # sub do_alert { my ($group, $service, $output, $retval, $flags) = @_; my (@groupargs, $last_alert, $alert); my ($sref, $range, @alerts); debug (1, "do_alert flags=$flags\n"); $sref = \%{$watch{$group}->{$service}}; my $tmnow = time; if ($STOPPED) { syslog ("notice", "ignoring alert for $group,$service because the mon scheduler is stopped"); return; } # # if redistribute is set, call it now # if ($sref->{"redistribute"} ne '') { my ($fac, $args); ($fac, $args) = split (/\s+/, $sref->{"redistribute"}, 2); call_alert ( group => $group, service => $service, output => $output, retval => $retval, flags => $flags | $FL_REDISTRIBUTE, alert => $fac, args => $args, ) } # # if the alarm is disabled, ignore it # if ((exists $watch_disabled{$group} && $watch_disabled{$group} == 1) || (defined $sref->{"disable"} && $sref->{"disable"} == 1)) { syslog ("notice", "ignoring alert for $group,$service"); return; } # # dependency check # if (!($flags & $FL_STARTUPALERT) && !($flags & $FL_UPALERT) && ((defined $sref->{"depend"} && $sref->{"dep_behavior"} eq "a") || (defined $sref->{"alertdepend"}))) { if (!$sref->{"_depend_status"}) { debug (1, "alert for $group,$service suppressed because of dep fail\n"); return; } } my ($summary) = split("\n", $output); $summary = "(NO SUMMARY)" if (!defined $summary || $summary =~ /^\s*$/m); my ($prevsumm) = split("\n", $sref->{"_failure_output"}) if (defined $sref->{"_failure_output"}); $prevsumm = "(NO SUMMARY)" if (!defined
$prevsumm || $prevsumm =~ /^\s*$/m); my $strippedsummary = $summary; $strippedsummary =~ s/\s//mg; my $strippedprevious = $prevsumm; $strippedprevious =~ s/\s//mg; # If the summary changed, un-acknowledge the service if 'unack_summary' is set if ($sref->{'_ack'} != 0 && $sref->{'unack_summary'} == 1 && $strippedsummary ne $strippedprevious && !($flags & ($FL_UPALERT|$FL_ACKALERT|$FL_DISABLEALERT))) { print STDERR "Unacking $group/$service:\nSummary: X".$strippedsummary."X\nPrevious: X".$strippedprevious."X\n"; $sref->{"_ack"} = 0; $sref->{"_ack_comment"} = ""; $sref->{"_consec_failures"}=1; foreach my $period (keys %{$sref->{"periods"}}) { $sref->{"periods"}->{$period}->{"_last_alert"} = 0; # $sref->{"periods"}->{$period}->{"_alert_sent"} = 0; $sref->{"periods"}->{$period}->{"_1stfailtime"} = 0; $sref->{"periods"}->{$period}->{"_failcount"} = 0; } } # # no alerts for ack'd failures, except for upalerts or summary changes # when observe_summary is set # if ($sref->{"_ack"} != 0 && !($flags & ($FL_UPALERT|$FL_ACKALERT|$FL_DISABLEALERT))) { syslog ("debug", "no alert for $group.$service" . " because of ack'd failure"); return; } # # check each time period for pending alerts # foreach my $periodlabel (keys %{$sref->{"periods"}}) { # # only send alerts that are in the proper period # next if (!inPeriod ($tmnow, $sref->{"periods"}->{$periodlabel}->{"period"})); my $pref = \%{$sref->{"periods"}->{$periodlabel}}; # # skip upalerts/ackalerts not paired with down alerts # disable by setting "no_comp_alerts" in period section # if (!$pref->{"no_comp_alerts"} && ($flags & ($FL_UPALERT | $FL_ACKALERT)) && !$pref->{"_alert_sent"}) { syslog ('debug', "$group/$service/$periodlabel: Suppressing upalert since no down alert was sent.") if ($flags & $FL_UPALERT); syslog ('debug', "$group/$service/$periodlabel: Suppressing ackalert since no down alert was sent.") if ($flags & $FL_ACKALERT); next; } # # skip looping upalerts when "no_comp-alerts" set. 
# if ($pref->{"no_comp_alerts"} && ($flags & $FL_UPALERT) && ($pref->{"_no_comp_alerts_upalert_sent"}>0)) { next; } # # do this if we're not handling an upalert, startupalert, ackalert, or disablealert # if (!($flags & $FL_UPALERT) && !($flags & $FL_STARTUPALERT) && !($flags & $FL_DISABLEALERT) && !($flags & $FL_ACKALERT)) { # # alert only when exit code matches # if (exists $pref->{"alertexitrange"}) { next if (!inRange($retval, $pref->{"alertexitrange"})); } # # alert only numalerts # if ($pref->{"numalerts"} && $pref->{"_alert_sent"} >= $pref->{"numalerts"}) { syslog ('debug', "$group/$service/$periodlabel: Suppressing alert since numalerts is met."); next; } # # only alert once every "alertevery" seconds, unless # output from monitor is different or if strict alertevery # # strict and _ignore_summary are basically the same though # strict short-circuits and overrides other settings and exists # for compatibility with pre-1.1 configs # if ($pref->{"alertevery"} != 0 && # if alertevery is set and ($tmnow - $pref->{"_last_alert"} < $pref->{"alertevery"}) && # we're within the time period and one of these: (($pref->{"_alertevery_strict"}) || # [ strict is set or ($pref->{"_observe_detail"} && $sref->{"_failure_output"} eq $output) || # observing detail and output hasn't changed or (!$pref->{"_observe_detail"} && (!$pref->{"_ignore_summary"}) && ($prevsumm eq $summary)) || # not observing detail # and not ignoring summary and summ hasn't changed or ($pref->{"_ignore_summary"}))) # we're ignoring summary changes ] { syslog ('debug', "$group/$service/$periodlabel: Suppressing alert for now due to alertevery."); next; } # # alertafter NUM # if (defined $pref->{"alertafter_consec"} && ($sref->{"_consec_failures"} < $pref->{"alertafter_consec"})) { syslog ('debug', "$group/$service/$periodlabel: Suppressing alert for now due to alertafter consecutive failures."); next; } # # alertafter timeval # elsif ( (!defined ($pref->{"alertafter"})) && (defined 
($pref->{"alertafterival"})) ) { $pref->{'_1stfailtime'} = $tmnow if $pref->{'_1stfailtime'} == 0; if ($tmnow - $pref->{'_1stfailtime'} <= $pref->{'alertafterival'}) { syslog ('debug', "$group/$service/$periodlabel: Suppressing alert for now due to alertafter numval."); next; } } # # alertafter NUM timeval # elsif (defined ($pref->{"alertafter"})) { $pref->{"_failcount"}++; if ($tmnow - $pref->{'_1stfailtime'} <= $pref->{'alertafterival'} && $pref->{"_failcount"} < $pref->{"alertafter"}) { syslog ('debug', "$group/$service/$periodlabel: Suppressing alert for now due to alertafter num timeval."); next; } # # start a new time interval # if ($tmnow - $pref->{'_1stfailtime'} > $pref->{'alertafterival'}) { $pref->{"_failcount"} = 1; } if ($pref->{"_failcount"} == 1) { $pref->{"_1stfailtime"} = $tmnow; } if ($pref->{"_failcount"} < $pref->{"alertafter"}) { syslog ('debug', "$group/$service/$periodlabel: Suppressing alert for now due to alertafter num timeval."); next; } } } # # at this point, no alerts are blocked, # so send the alerts # # # trigger multiple alerts in this period # if ($flags & $FL_UPALERT) { @alerts = @{$pref->{"upalerts"}}; } elsif ($flags & $FL_STARTUPALERT) { @alerts = @{$pref->{"startupalerts"}}; } elsif ($flags & $FL_DISABLEALERT) { @alerts = @{$pref->{"disablealerts"}}; } elsif ($flags & $FL_ACKALERT) { @alerts = @{$pref->{"ackalerts"}}; } else { @alerts = @{$pref->{"alerts"}}; } my $called = 0; for (my $i=0;$i<@alerts;$i++) { my ($range, $fac, $args); if ($alerts[$i] =~ /^exit\s*=\s*((\d+|\d+-\d+))\s/i) { $range=$1; next if (!inRange($retval, $range)); ($fac, $args) = (split (/\s+/, $alerts[$i], 3))[1,2]; } else { ($fac, $args) = split (/\s+/, $alerts[$i], 2); } $called++ if (call_alert ( group => $group, service => $service, output => $output, retval => $retval, flags => $flags, pref => $pref, alert => $fac, args => $args, ) ); } # # reset _alert_sent if up alert was sent from a trap # if ($called) { if( (($FL_TRAP | $flags) && ($FL_UPALERT & 
$flags)) ) { $pref->{"_alert_sent"} = 0; $pref->{"_last_alert"} = 0; } else { $pref->{"_alert_sent"}++; # # reset _no_comp_alerts_upalert_sent counter - when service will be # back up, upalert will be sent. # if ($pref->{"no_comp_alerts"}) { $pref->{"_no_comp_alerts_upalert_sent"} = 0; } } if ($pref->{"no_comp_alerts"} && ($flags & $FL_UPALERT)) { $pref->{"_no_comp_alerts_upalert_sent"}++; } } } } # # walk through the watch list and reset the time # the service was last called # sub set_last_test { my ($i, $k, $t); $t = time; foreach $k (keys %watch) { foreach my $service (keys %{$watch{$k}}) { $watch{$k}->{$service}->{"_timer"} = $watch{$k}->{$service}->{"interval"}; } } } # # parse configuration file # # build the following data structures: # # %group # each element of %group is an array of hostnames # group records are terminated by a blank line in the # configuration file # %watch{"group"}->{"service"}->{"variable"} = value # %alias # sub read_cf { my ($CF, $commit) = @_; my ($var, $watchgroup, $ingroup, $curgroup, $inwatch, $args, $hosts, %disabled, $h, $i, $inalias, $curalias, $inview, $curview); my ($sref, $pref); my ($service, $period); my ($authtype, @authtypes); my $line_num = 0; # # parse configuration file # if (exists($opt{"M"}) || $CF =~ /\.m4$/) { my $m4 = "m4"; $m4 = $opt{"M"} if (defined($opt{"M"})); return "could not open m4 pipe of cf file: $CF: $!" if (!open (CFG, "$m4 $CF |")); } else { return "could not open cf file: $CF: $!" 
if (!open (CFG, $CF)); } # # buffers to hold the new un-committed config # my %new_alias = (); my %new_views = (); my %new_CF = %CF; my %new_groups; my %new_watch; my %is_watch; my $servnum = 0; my $DEP_BEHAVIOR = "a"; my $DEP_MEMORY = 0; my $UNACK_SUMMARY = 0; my $incomplete_line = 0; my $linepart = ""; my $l = ""; my $acc_line = ""; for (;;) { # # read in a logical "line", which may span actual lines # do { $line_num++; last if (!defined ($linepart = <CFG>)); next if $linepart =~ /^\s*#/; # # accumulate multi-line lines (ones which are \-escaped) # if ($incomplete_line) { $linepart =~ s/^\s*//; } if ($linepart =~ /^(.*)\\\s*$/) { $incomplete_line = 1; $acc_line .= $1; chomp $acc_line; next; } else { $acc_line .= $linepart; } $l = $acc_line; $acc_line = ""; chomp $l; $l =~ s/^\s*//; $l =~ s/\s*$//; $incomplete_line = 0; $linepart = ""; }; # # global variables which can be overridden by the command line # if (!$inwatch && $l =~ /^(\w+) \s* = \s* (.*) \s*$/ix) { if ($1 eq "alertdir") { $new_CF{"ALERTDIR"} = $2; } elsif ($1 eq "basedir") { $new_CF{"BASEDIR"} = $2; $new_CF{"BASEDIR"} = "$PWD/$new_CF{BASEDIR}" if ($new_CF{"BASEDIR"} !~ m{^/}); $new_CF{"BASEDIR"} =~ s{/$}{}; } elsif ($1 eq "cfbasedir") { $new_CF{"CFBASEDIR"} = $2; $new_CF{"CFBASEDIR"} = "$PWD/$new_CF{CFBASEDIR}" if ($new_CF{"CFBASEDIR"} !~ m{^/}); $new_CF{"CFBASEDIR"} =~ s{/$}{}; } elsif ($1 eq "mondir") { $new_CF{"SCRIPTDIR"} = $2; } elsif ($1 eq "logdir") { $new_CF{"LOGDIR"} = $2; } elsif ($1 eq "histlength") { $new_CF{"MAX_KEEP"} = $2; } elsif ($1 eq "serverport") { $new_CF{"SERVPORT"} = $2; } elsif ($1 eq "trapport") { $new_CF{"TRAPPORT"} = $2; } elsif ($1 eq "serverbind") { $new_CF{"SERVERBIND"} = $2; } elsif ($1 eq "clientallow") { $new_CF{"CLIENTALLOW"}= $2; } elsif ($1 eq "trapbind") { $new_CF{"TRAPBIND"} = $2; } elsif ($1 eq "pidfile") { $new_CF{"PIDFILE"} = $2; } elsif ($1 eq "randstart") { $new_CF{"RANDSTART"} = dhmstos($2); if (!defined ($new_CF{"RANDSTART"})) { close (CFG); return "cf error: bad
value '$2' for randstart option (syntax: randstart = timeval), line $line_num"; } } elsif ($1 eq "maxprocs") { $new_CF{"MAXPROCS"} = $2; } elsif ($1 eq "statedir") { $new_CF{"STATEDIR"} = $2; } elsif ($1 eq "authfile") { $new_CF{"AUTHFILE"} = $2; if (! -r $new_CF{"AUTHFILE"}) { close (CFG); return "cf error: authfile '$2' does not exist or is not readable, line $line_num"; } } elsif ($1 eq "authtype") { $new_CF{"AUTHTYPE"} = $2; @authtypes = split(' ' , $new_CF{"AUTHTYPE"}) ; foreach $authtype (@authtypes) { if ($authtype eq "pam") { eval 'use Authen::PAM qw(:constants);' ; if ($@ ne "") { close (CFG); return "cf error: could not use PAM authentication: $@"; } } } } elsif ($1 eq "pamservice") { $new_CF{"PAMSERVICE"} = $2; } elsif ($1 eq "userfile") { $new_CF{"USERFILE"} = $2; if (! -r $new_CF{"USERFILE"}) { close (CFG); return "cf error: userfile '$2' does not exist or is not readable, line $line_num"; } } elsif ($1 eq "historicfile") { $new_CF{"HISTORICFILE"} = $2; } elsif ($1 eq "historictime") { $new_CF{"HISTORICTIME"} = dhmstos($2); if (!defined $new_CF{"HISTORICTIME"}) { close (CFG); return "cf error: bad value '$2' for historictime command (syntax: historictime = timeval), line $line_num"; } } elsif ($1 eq "cltimeout") { $new_CF{"CLIENT_TIMEOUT"} = dhmstos($2); if (!defined ($new_CF{"CLIENT_TIMEOUT"})) { close (CFG); return "cf error: bad value '$2' for cltimeout command (syntax: cltimeout = secs), line $line_num"; } } elsif ($1 eq "monerrfile") { $new_CF{"MONERRFILE"} = $2; } elsif ($1 eq "dtlogfile") { $new_CF{"DTLOGFILE"} = $2; } elsif ($1 eq "dtlogging") { $new_CF{"DTLOGGING"} = 0; if ($2 == 1 || $2 eq "yes" || $2 eq "true") { $new_CF{"DTLOGGING"} = 1; } } elsif ($1 eq "dep_recur_limit") { $new_CF{"DEP_RECUR_LIMIT"} = $2; } elsif ($1 eq "dep_behavior") { if ($2 ne "m" && $2 ne "a" && $2 ne "hm") { close (CFG); return "cf error: unknown dependency behavior '$2', line $line_num"; } $DEP_BEHAVIOR = $2; } elsif ($1 eq "dep_memory") { my $memory = dhmstos($2); 
if (!defined $memory) { close (CFG); return "cf error: bad value '$2' for dep_memory option (syntax: dep_memory = timeval), line $line_num"; } $DEP_MEMORY = $memory; } elsif ($1 eq "unack_summary") { if (defined $2) { if ($2 =~ /y(es)?/i) { $UNACK_SUMMARY = 1; } elsif ($2 =~ /n(o)?/i) { $UNACK_SUMMARY = 0; } elsif ($2 eq "0" || $2 eq "1") { $UNACK_SUMMARY = $2; } else { return "cf error: invalid unack_summary value '$2' (syntax: unack_summary [0|1|y|yes|n|no])"; } } else { $UNACK_SUMMARY = 1; } } elsif ($1 eq "syslog_facility") { $new_CF{"SYSLOG_FACILITY"} = $2; } elsif ($1 eq "startupalerts_on_reset") { if ($2 =~ /^1|yes|true|on$/i) { $new_CF{"STARTUPALERTS_ON_RESET"} = 1; } else { $new_CF{"STARTUPALERTS_ON_RESET"} = 0; } } elsif ($1 eq "monremote") { $new_CF{"MONREMOTE"} = $2; } elsif ($1 eq "exclude_period") { if (inPeriod (time, $2) == -1) { close (CFG); return "cf error: malformed exclude_period '$2' (the specified time period is not valid as per Time::Period::inPeriod), line $line_num"; } $new_CF{"EXCLUDE_PERIOD"} = $2; } else { close (CFG); return "cf error: unknown variable '$1', line $line_num"; } next; } # # end of record # if ($l eq "") { $ingroup = 0; $inalias = 0; $inwatch = 0; $period = 0; $inview = 0; $curgroup = ""; $curalias = ""; $watchgroup = ""; $servnum = 0; next; } # # hostgroup record # if ($l =~ /^hostgroup\s+([a-zA-Z0-9_.-]+)\s*(.*)/) { $curgroup = $1; $ingroup = 1; $inview = 0; $inalias = 0; $inwatch = 0; $period = 0; $hosts = $2; %disabled = (); foreach $h (grep (/^\*/, @{$groups{$curgroup}})) { # We have to make $i = $h because $h is actually # a pointer to %groups and will modify it. 
$i = $h; $i =~ s/^\*//; $disabled{$i} = 1; } @{$new_groups{$curgroup}} = split(/\s+/, $hosts); # # keep hosts which were previously disabled # for ($i=0;$i<@{$new_groups{$curgroup}};$i++) { $new_groups{$curgroup}[$i] = "*$new_groups{$curgroup}[$i]" if ($disabled{$new_groups{$curgroup}[$i]}); } next; } if ($ingroup) { push (@{$new_groups{$curgroup}}, split(/\s+/, $l)); for ($i=0;$i<@{$new_groups{$curgroup}};$i++) { $new_groups{$curgroup}[$i] = "*$new_groups{$curgroup}[$i]" if ($disabled{$new_groups{$curgroup}[$i]}); } next; } # # alias record # if ($l =~ /^alias\s+([a-zA-Z0-9_.-]+)\s*$/) { $inalias = 1; $inview = 0; $ingroup = 0; $inwatch = 0; $period = 0; $curalias = $1; next; } if ($inalias) { if ($l =~ /\A(.*)\Z/) { push (@{$new_alias{$curalias}}, $1); next; } } # # view record # if ($l =~ /^view\s+([a-zA-Z0-9_.-]+)\s+(.*)$/) { $inview = 1; $inalias = 0; $ingroup = 0; $inwatch = 0; $period = 0; $curview = $1; $new_views{$curview}={}; foreach (split(/\s+/, $2)) { $new_views{$curview}->{$_} = 1; }; next; } if ($inview) { foreach (split(/\s+/, $l)) { $new_views{$curview}->{$_} = 1; }; next; } # # watch record # if ($l =~ /^watch\s+([a-zA-Z0-9_.-]+)\s*/) { $watchgroup = $1; $inwatch = 1; $inview = 0; $inalias = 0; $ingroup = 0; $period = 0; if (!defined ($new_groups{$watchgroup})) { # # This hostgroup doesn't exist yet, we'll create it and warn # @{$new_groups{$watchgroup}} = ($watchgroup); print STDERR "Warning: watch group $watchgroup defined with no corresponding hostgroup.\n"; } if ($new_watch{$watchgroup}) { close (CFG); return "cf error: watch '$watchgroup' already defined, line $line_num"; } $curgroup = ""; $service = ""; next; } if ($inwatch) { # # env variables # if ($l =~ /^([A-Z_][A-Z0-9_]*)=(.*)/) { if ($service eq "") { close (CFG); return "cf error: environment variable defined without a service, line $line_num"; } $new_watch{$watchgroup}->{$service}->{"ENV"}->{$1} = $2; next; } # # non-env variables # else { $l =~ /^(\w+)\s*(.*)$/; $var = $1; $args = 
$2; } # # service entry # if ($var eq "service") { $service = $args; if ($service !~ /^[a-zA-Z0-9_.-]+$/) { close (CFG); return "cf error: invalid service tag '$args', line $line_num"; } elsif (exists $new_watch{$watchgroup}->{$service}) { close (CFG); return "cf error: service $service already defined for watch group $watchgroup, line $line_num"; } $period = 0; $sref = \%{$new_watch{$watchgroup}->{$service}}; $sref->{"service"} = $args; $sref->{"interval"} = undef; $sref->{"randskew"} = 0; $sref->{"redistribute"} = ""; $sref->{"dep_behavior"} = $DEP_BEHAVIOR; $sref->{"dep_memory"} = $DEP_MEMORY; $sref->{"exclude_period"} = ""; $sref->{"exclude_hosts"} = {}; $sref->{"_op_status"} = $STAT_UNTESTED; $sref->{"_last_op_status"} = $STAT_UNTESTED; $sref->{"_ack"} = 0; $sref->{"_ack_comment"} = ''; $sref->{"unack_summary"} = $UNACK_SUMMARY; $sref->{"_consec_failures"} = 0; $sref->{"_failure_count"} = 0 if (!defined($sref->{"_failure_count"})); $sref->{"_start_of_monitor"} = time if (!defined($sref->{"_start_of_monitor"})); $sref->{"_alert_count"} = 0 if (!defined($sref->{"_alert_count"})); $sref->{"_last_failure"} = 0 if (!defined($sref->{"_last_failure"})); $sref->{"_last_success"} = 0 if (!defined($sref->{"_last_success"})); $sref->{"_last_trap"} = 0 if (!defined($sref->{"_last_trap"})); $sref->{"_last_traphost"} = '' if (!defined($sref->{"_last_traphost"})); $sref->{"_exitval"} = "undef" if (!defined($sref->{"_exitval"})); $sref->{"_last_check"} = undef; # # -1 for _monitor_duration means no monitor has been run yet # so there is no duration data available # $sref->{"_monitor_duration"} = -1; $sref->{"_monitor_running"} = 0; $sref->{"_depend_status"} = undef; $sref->{"failure_interval"} = undef; $sref->{"_old_interval"} = undef; next; } if ($service eq "") { close (CFG); return "cf error: need to specify service in watch record, line $line_num"; } # # period definition # # for each service there can be one or more alert periods # this is stored as an array of hashes 
named # %{$watch{$watchgroup}->{$service}->{"periods"}} # each index for this hash is a unique tag for the period as # defined by the user or named after the period (such as # "wd {Mon-Fri} hr {7am-11pm}") # # the value of the hash is an array containing the list of alert commands # and arguments, so # # @alerts = @{$watch{$watchgroup}->{$service}->{"periods"}->{"TAG"}} # if ($var eq "period") { $period = 1; my $periodstr; if ($args =~ /^([a-z_]\w*) \s* : \s* (.*)$/ix) { $periodstr = $1; $args = $2; } else { $periodstr = $args; } if (exists $sref->{"periods"}->{$periodstr}) { close (CFG); return "cf error: period '$periodstr' already defined for watch group $watchgroup service $service, line $line_num"; } $pref = \%{$sref->{"periods"}->{$periodstr}}; if (inPeriod (time, $args) == -1) { close (CFG); return "cf error: malformed period '$args' (the specified time period is not valid as per Time::Period::inPeriod), line $line_num"; } $pref->{"period"} = $args; $pref->{"alertevery"} = 0; $pref->{"numalerts"} = 0; $pref->{"_alert_sent"} = 0; $pref->{"no_comp_alerts"} = 0; $pref->{"_no_comp_alerts_upalert_sent"} = 0; @{$pref->{"alerts"}} = (); @{$pref->{"upalerts"}} = (); @{$pref->{"ackalerts"}} = (); @{$pref->{"disablealerts"}} = (); @{$pref->{"startupalerts"}} = (); next; } # # period variables # if ($period) { if ($var eq "alert") { push @{$pref->{"alerts"}}, $args; } elsif ($var eq "ackalert") { push @{$pref->{"ackalerts"}}, $args; } elsif ($var eq "disablealert") { push @{$pref->{"disablealerts"}}, $args; } elsif ($var eq "upalert") { $sref->{"_upalert"} = 1; push @{$pref->{"upalerts"}}, $args; } elsif ($var eq "startupalert") { push @{$pref->{"startupalerts"}}, $args; } elsif ($var eq "alertevery") { $pref->{"_observe_detail"} = 0; $pref->{"_alertevery_strict"} = 0; $pref->{"_ignore_summary"} = 0; if ($args =~ /(\S+) \s+ observe_detail \s*$/ix) { $pref->{"_observe_detail"} = 1; $args = $1; } elsif ($args =~ /(\S+) \s+ ignore_summary \s*$/ix) { 
$pref->{"_ignore_summary"} = 1; $args = $1; } # # for backwards compatibility with <= 0.38.21 # elsif ($args =~ /(\S+) \s+ summary/ix) { $args = $1; } # # strict # elsif ($args =~ /(\S+) \s+ strict \s*$/ix) { $pref->{"_alertevery_strict"} = 1; $args = $1; } if (!($args = dhmstos ($args))) { close (CFG); return "cf error: invalid time interval '$args' (syntax: alertevery {positive number}{smhd} [ strict | observe_detail | ignore_summary ]), line $line_num"; } $pref->{"alertevery"} = $args; next; } elsif ($var eq "alertafter") { my ($p1, $p2); # # alertafter NUM # if ($args =~ /^(\d+)$/) { $p1 = $1; $pref->{"alertafter_consec"} = $p1; } # # alertafter timeval # elsif ($args =~ /^(\d+[hms])$/) { $p1 = $1; if (!($p1 = dhmstos ($p1))) { close (CFG); return "cf error: invalid time interval '$args' (syntax: alertafter = [{positive integer}] [{positive number}{smhd}]), line $line_num"; } $pref->{"alertafterival"} = $p1; $pref->{"_1stfailtime"} = 0; } # # alertafter NUM timeval # elsif ($args =~ /(\d+)\s+(\d+[hms])$/) { ($p1, $p2) = ($1, $2); if (($p1 - 1) * $sref->{"interval"} >= dhmstos($p2)) { close (CFG); return "cf error: interval & alertafter not sensible.
No alerts can be generated with those parameters, line $line_num"; } $pref->{"alertafter"} = $p1; $pref->{"alertafterival"} = dhmstos ($p2); $pref->{"_1stfailtime"} = 0; $pref->{"_failcount"} = 0; } else { close (CFG); return "cf error: invalid interval specification '$args', line $line_num"; } } elsif ($var eq "upalertafter") { if (!($args = dhmstos ($args))) { close (CFG); return "cf error: invalid upalertafter specification '$args' (syntax: upalertafter = {positive number}{smhd}), line $line_num"; } $pref->{"upalertafter"} = $args; } elsif ($var eq "numalerts") { if ($args !~ /^\d+$/) { close (CFG); return "cf error: non-numeric arg '$args' (syntax: numalerts = {positive integer}), line $line_num"; } $pref->{"numalerts"} = $args; next; } elsif ($var eq "no_comp_alerts") { $pref->{"no_comp_alerts"} = 1; next; } elsif ($var eq "alerts_dont_count") { $pref->{"alerts_dont_count"} = 1; next; } elsif ($var eq 'alertexitrange') { if ($args !~ /^\s*(\d+|\d+-\d+)\s*$/) { close (CFG); return "cf error: invalid exit code range '$args', line $line_num"; } $pref->{"alertexitrange"} = $args; } else { close (CFG); return "cf error: unknown syntax [$l], line $line_num"; } } # # non-period variables # elsif (!$period) { if ($var eq "interval") { if (!($args = dhmstos ($args))) { close (CFG); return "cf error: invalid time interval '$args' (syntax: interval = {positive number}{smhd}), line $line_num"; } } elsif ($var eq "failure_interval") { if (!($args = dhmstos ($args))) { close (CFG); return "cf error: invalid interval '$args' (syntax: failure_interval = {positive number}{smhd}), line $line_num"; } } elsif ($var eq "monitor") { # valid } elsif ($var eq "redistribute") { # valid } elsif ($var eq "allow_empty_group") { # valid } elsif ($var eq "description") { # valid } elsif ($var eq "unack_summary") { if (defined $args) { if ($args =~ /y(es)?/i) { $args = 1; } elsif ($args =~ /n(o)?/i) { $args = 0; } if ($args eq "0" || $args eq "1") { $sref->{"unack_summary"} = $args; } else {
return "cf error: invalid unack_summary value '$args' (syntax: unack_summary [0|1|y|yes|n|no])"; } } else { $sref->{"unack_summary"} = 1; } next; } elsif ($var eq "traptimeout") { if (!($args = dhmstos ($args))) { close (CFG); return "cf error: invalid traptimeout interval '$args' (syntax: traptimeout = {positive number}{smhd}), line $line_num"; } $sref->{"_trap_timer"} = $args; } elsif ($var eq "trapduration") { if (!($args = dhmstos ($args))) { close (CFG); return "cf error: invalid trapduration interval '$args' (syntax: trapduration = {positive number}{smhd}), line $line_num"; } } elsif ($var eq "randskew") { if (!($args = dhmstos ($args))) { close (CFG); return "cf error: invalid randskew time interval '$args' (syntax: randskew = {positive number}{smhd}), line $line_num"; } } elsif ($var eq "dep_behavior") { if ($args ne "m" && $args ne "a" && $args ne "hm") { close (CFG); return "cf error: unknown dependency behavior '$args' (syntax: dep_behavior = {m|a}), line $line_num"; } } elsif ($var eq "dep_memory") { my $timeval = dhmstos($args); if (!$timeval) { close (CFG); return "cf error: bad value '$args' for dep_memory option (syntax: dep_memory = timeval), line $line_num"; } $args = $timeval; } elsif ($var eq "depend") { $args =~ s/SELF:/$watchgroup:/g; } elsif ($var eq "alertdepend") { $args =~ s/SELF:/$watchgroup:/g; } elsif ($var eq "monitordepend") { $args =~ s/SELF:/$watchgroup:/g; } elsif ($var eq "hostdepend") { $args =~ s/SELF:/$watchgroup:/g; } elsif ($var eq "exclude_hosts") { my $ex = {}; foreach my $h (split (/\s+/, $args)) { $ex->{$h} = 1; } $args = $ex; } elsif ($var eq "exclude_period") { if (inPeriod (time, $args) == -1) { close (CFG); return "cf error: malformed exclude_period '$args' (the specified time period is not valid as per Time::Period::inPeriod), line $line_num"; } } else { close (CFG); return "cf error: unknown syntax [$l], line $line_num"; } $sref->{$var} = $args; } else { close (CFG); return "cf error: unknown syntax outside of 
period section [$l], line $line_num"; } } next; } close (CFG) || return "Could not open pipe to m4 (check that m4 is properly installed and in your PATH): $!"; # # Go through each defined hostgroup and check that there is a # watch associated with that hostgroup record. # # hostgroups without associated watches are not a violation of # mon config syntax, but it's usually not what you want. # for (keys(%new_watch)) { $is_watch{$_} = 1 }; foreach $watchgroup ( keys (%new_groups) ) { print STDERR "Warning: hostgroup $watchgroup has no watch assigned to it!\n" unless $is_watch{$watchgroup}; } # # no errors, commit new config if $commit was specified # return "" unless $commit; %views = %new_views; %alias = %new_alias; %groups = %new_groups; %watch = %new_watch; %CF = %new_CF; ""; } # # convert a string like "20m" into seconds # sub dhmstos { my ($str) = @_; my ($s); $str = lc ($str); if ($str =~ /^\s*(\d+(?:\.\d+)?)([dhms])\s*$/i) { if ($2 eq "m") { $s = $1 * 60; } elsif ($2 eq "h") { $s = $1 * 60 * 60; } elsif ($2 eq "d") { $s = $1 * 60 * 60 * 24; } else { $s = $1; } } else { return undef; } $s; } # # reset the state of the server on SIGHUP, and reread config # file. 
# sub reset_server { my ($keepstate) = @_; # # reap children that may be running # foreach my $pid (keys %runningpid) { my ($group, $service) = split (/\//, $runningpid{$pid}); kill 15, $pid; waitpid ($pid, 0); syslog ('info', "reset killed child $pid, exit status $?"); remove_proc ($pid); } $procs = 0; save_state ("all") if ($keepstate); syslog ('info', "resetting, and re-reading configuration $CF{CF}"); if ((my $err = read_cf ($CF{"CF"}, 1)) ne "") { syslog ('err', "error reading config file: $err"); return undef; } normalize_paths; gen_scriptdir_hash; $lasttm=time; # the last time(2) the loop started $fdset_rbits = $fdset_ebits = ''; set_last_test (); randomize_startdelay() if ($CF{"RANDSTART"}); load_state ("all") if ($keepstate); if ($CF{"DTLOGGING"}) { init_dtlog(); } readhistoricfile; if ($CF{"STARTUPALERTS_ON_RESET"}) { do_startup_alerts; } return 1; } sub init_dtlog { my $t = time; return if (!$CF{"DTLOGGING"}); if (!open (DTLOG, ">>$CF{DTLOGFILE}")) { syslog ('err', "could not append to $CF{DTLOGFILE}: $!"); $CF{"DTLOGGING"} = 0; } else { $CF{"DTLOGGING"} = 1; print DTLOG <{"host"} = inet_ntoa($addr); $clients{$fno}->{"fhandle"} = $CLIENT; $clients{$fno}->{"user"} = undef; # username if authenticated $clients{$fno}->{"timeout"} = $CF{"CLIENT_TIMEOUT"}; $clients{$fno}->{"last_read"} = time; # last time data was read $clients{$fno}->{"buf"} = ''; $numclients++; } # # do all pending client commands # sub client_dopending { my ($cl, $cmd, $l); foreach $cl (keys %clients) { if ($clients{$cl}->{"buf"} =~ /^([^\r\n]*)[\r\n]+/s) { $cmd = $1; $l = length ($cmd); $clients{$cl}->{"buf"} =~ s/^[^\r\n]*[\r\n]+//s; client_command ($cl, $cmd); } } } # # close a client connection # sub client_close { my ($cl, $reason) = @_; syslog ('info', "closing client $cl: $reason") if (defined $reason); die if !defined ($clients{$cl}->{"fhandle"}); close ($clients{$cl}->{"fhandle"}); delete $clients{$cl}; vec ($iovec, $cl, 1) = 0; $numclients--; } # # Handle a connection from a 
client # sub client_command { my ($cl, $l) = @_; my ($cmd, $args, $group, $service, $s, $sname, $stchanged); my ($var, $value, $msg, @l, $sock, $port, $addr, $sref, $auth, $fh); my ($user, $pass, @argsList, $comment); my ($authtype, @authtypes); my $is_auth = 0; #flag for multiple auth types syslog ('info', "client command \"$l\"") if ($l !~ /^\s*login/i); $fh = $clients{$cl}->{"fhandle"}; if ($l !~ /^(dump|login|disable|enable|quit|list|set|get|setview|getview| stop|start|loadstate|savestate|reset|clear|checkauth| reload|term|test|servertime|ack|version|protid)(\s+(.*))?$/ix) { sock_write ($fh, "520 invalid command\n"); return; } ($cmd, $args) = ("\L$1", $3); $stchanged = 0; print STDERR "client command $cmd\nclient args $args\n"; # # quit command # if ($cmd eq "quit") { sock_write ($fh, "220 quitting\n"); client_close ($cl); } elsif ($opt{"d"} && $cmd eq "dump") { print STDERR Dumper (\%watch), "\n\n"; # # protocol identification # } elsif ($cmd eq "protid") { if ($args != int ($PROT_VERSION)) { sock_write ($fh, "520 protocol mismatch\n"); } else { sock_write ($fh, "220 protocol match\n"); } # # login # } elsif ($cmd eq "login") { ($user, $pass) = split (/\s+/, $args, 2); @authtypes = split(' ' , $CF{"AUTHTYPE"}) ; # Check each form of authentication in order, and stop checking # as soon as we get a positive authentication result.
foreach $authtype (@authtypes) { if (defined auth ($authtype, $user, $pass, $clients{$cl}->{"host"})) { $is_auth = 1; last; } } if ($is_auth != 1) { sock_write ($fh, "530 login unsuccessful\n"); } else { $clients{$cl}->{"user"} = $user; syslog ("info", "authenticated $user"); sock_write ($fh, "220 login accepted\n"); } # # reset # } elsif ($cmd eq "reset" && check_auth ($clients{$cl}->{"user"}, $cmd)) { my ($keepstate); if ($args =~ /stopped/i) { $STOPPED = 1; $STOPPED_TIME = time; } if ($args =~ /keepstate/) { $keepstate = 1; } if (reset_server ($keepstate)) { sock_write ($fh, "220 reset PID $$\@$HOSTNAME\n"); } else { sock_write ($fh, "520 reset PID $$\@$HOSTNAME failed, error in config file\n"); } # # reload # } elsif ($cmd eq "reload" && check_auth ($clients{$cl}->{"user"}, $cmd)) { if (!defined reload (split (/\s+/, $args))) { sock_write ($fh, "520 unknown reload command\n"); } else { sock_write ($fh, "220 reload completed\n"); } # # clear # } elsif ($cmd eq "clear" && check_auth ($clients{$cl}->{"user"}, $cmd)) { if ($args =~ /^timers \s+ ([a-zA-Z0-9_.-]+) \s+ ([a-zA-Z0-9_.-]+)/ix) { if (!defined $watch{$1}->{$2}) { sock_write ($fh, "520 unknown group\n"); } else { clear_timers ($1, $2); sock_write ($fh, "220 clear timers completed\n"); } } else { sock_write ($fh, "520 unknown clear command\n"); next; } # # test # } elsif ($cmd eq "test" && check_auth ($clients{$cl}->{"user"}, $cmd)) { my ($cmd, $args) = split (/\s+/, $args, 2); # # test monitor # if ($cmd eq "monitor") { my ($group, $service) = split (/\s+/, $args); if (!defined $watch{$group}->{$service}) { sock_write ($fh, "$group $service not defined\n"); } else { $watch{$group}->{$service}->{"_timer"} = 0; $watch{$group}->{$service}->{"_next_check"} = 0; mysystem("$CF{MONREMOTE} test $group $service") if ($CF{MONREMOTE}); } sock_write ($fh, "220 test monitor completed\n"); # # test alert # } elsif ($cmd =~ /^alert|startupalert|upalert|ackalert|disablealert$/) { my ($group, $service, $retval, $period) = 
split (/\s+/, $args, 4); if (!defined $watch{$group}->{$service}) { sock_write ($fh, "520 $group $service not defined\n"); } elsif (!defined $watch{$group}->{$service}->{"periods"}->{$period}) { sock_write ($fh, "520 period not defined\n"); } else { my $f = 0; my $a; if ($cmd eq "alert") { $a = $watch{$group}->{$service}->{"periods"}->{$period}->{"alerts"}; } elsif ($cmd eq "startupalert") { $f = $FL_STARTUPALERT; $a = $watch{$group}->{$service}->{"periods"}->{$period}->{"startupalerts"}; } elsif ($cmd eq "upalert") { $f = $FL_UPALERT; $a = $watch{$group}->{$service}->{"periods"}->{$period}->{"upalerts"}; } elsif ($cmd eq "ackalert") { $f = $FL_ACKALERT; $a = $watch{$group}->{$service}->{"periods"}->{$period}->{"ackalerts"}; } elsif ($cmd eq "disablealert") { $f = $FL_DISABLEALERT; $a = $watch{$group}->{$service}->{"periods"}->{$period}->{"disablealerts"}; } for (@{$a}) { my ($alert, $args) = split (/\s+/, $_, 2); if ($args =~ /^exit=/) { $args =~ s/^exit=\S+ \s+//x; } call_alert ( group => $group, service => $service, output => "test\ntest detail\n", retval => $retval, flags => $f | $FL_TEST, alert => $alert, args => $args, ); } sock_write ($fh, "220 test alert completed\n"); } # # test config file # } elsif ($cmd =~ /^config$/) { if ((my $err = read_cf ($CF{"CF"}, 0)) ne "") { sock_write ($fh, $err); sock_write ($fh, "\n520 test config completed, errors found in config file\n"); } else { sock_write ($fh, "220 test config completed OK, no errors found\n"); } } else { sock_write ($fh, "520 test error\n"); } # # version # } elsif ($cmd eq "version") { sock_write ($fh, "version " . int ($PROT_VERSION) . 
"\n"); sock_write ($fh, "220 version completed\n"); # # load state # } elsif ($cmd eq "loadstate" && check_auth ($clients{$cl}->{"user"}, $cmd)) { foreach (split (/\s+/, $args)) { load_state ($_); } sock_write ($fh, "220 loadstate completed\n"); # # save state # } elsif ($cmd eq "savestate" && check_auth ($clients{$cl}->{"user"}, $cmd)) { if ($args =~ /\S/) { foreach (split (/\s+/, $args)) { save_state ($_); } sock_write ($fh, "220 savestate completed\n"); } else { sock_write ($fh, "520 savestate error, arguments required\n"); } # # term # } elsif ($cmd eq "term" && check_auth ($clients{$cl}->{"user"}, $cmd)) { sock_write ($fh, "220 terminating server\n"); client_close ($cl, "terminated by user command"); syslog ("info", "terminating by user command"); exit; # # stop testing # } elsif ($cmd eq "stop"&& check_auth ($clients{$cl}->{"user"}, $cmd)) { $STOPPED = 1; $STOPPED_TIME = time; sock_write ($fh, "220 stop completed\n"); # # start testing # } elsif ($cmd eq "start" && check_auth ($clients{$cl}->{"user"}, $cmd)) { $STOPPED = 0; $STOPPED_TIME = 0; sock_write ($fh, "220 start completed\n"); } elsif ($cmd eq "setview") { my @args=split /\s+/, $args; if (@args > 1) { sock_write($fh, "500 Unknown setview command\n") } elsif (@args == 1) { if (defined($views{$args[0]})) { $clients{$cl}->{"view"} = $args[0]; sock_write($fh, "selecting view $args[0]\n"); sock_write($fh, "220 setview completed\n") } else { sock_write($fh, "504 unknown view $args[0]\n"); } } else { delete $clients{$cl}->{"view"}; sock_write($fh, "no view selected -- all groups will be displayed\n"); sock_write($fh, "220 setview completed\n") } } elsif ($cmd eq "getview") { if ($clients{$cl}->{"view"}) { sock_write($fh, "view ".$clients{$cl}->{"view"}. 
" selected\n"); } else { sock_write($fh, "no view selected -- all groups will be displayed\n"); } sock_write($fh, "220 getview completed\n") # # set # } elsif ($cmd eq "set" && check_auth ($clients{$cl}->{"user"}, $cmd)) { if ($args =~ /^maxkeep\s+(\d+)/) { $CF{"MAX_KEEP"} = $1; sock_write ($fh, "220 set completed\n"); } else { ($group, $service, $var, $value) = split (/\s+/, $args, 4); if (!defined $watch{$group}->{$service}) { sock_write ($fh, "520 $group,$service not defined\n"); } elsif ($var eq "opstatus") { if (!defined ($OPSTAT{$value})) { sock_write ($fh, "520 undefined opstatus\n"); } else { set_op_status ($group, $service, un_esc_str ((parse_line ('\s+', 0, $value))[0])); sock_write ($fh, "220 set completed\n"); } } else { $value = un_esc_str ((parse_line ('\s+', 0, $value))[0]); $watch{$group}->{$service}->{$var} = $value; sock_write ($fh, "$group $service $var='$value'\n"); sock_write ($fh, "220 set completed\n"); } } # # get # } elsif ($cmd eq "get" && check_auth ($clients{$cl}->{"user"}, $cmd)) { if ($args =~ /^maxkeep\s*$/) { sock_write ($fh, "maxkeep = $CF{MAX_KEEP}\n"); sock_write ($fh, "220 set completed\n"); } else { ($group, $service, $var) = split (/\s+/, $args, 3); if (!defined $watch{$group}->{$service}) { sock_write ($fh, "520 $group,$service not defined\n"); } else { sock_write ($fh, "$group $service $var='" . esc_str ($watch{$group}->{$service}->{$var}, 1) . "'\n"); sock_write ($fh, "220 get completed\n"); } } # # list # } elsif ($cmd eq "list" && check_auth ($clients{$cl}->{"user"}, $cmd)) { @argsList = split(/\s+/, $args); ($cmd, $args) = split (/\s+/, $args, 2); # # list service descriptions # if ($cmd eq "descriptions") { foreach $group (keys %watch) { foreach $service (keys %{$watch{$group}}) { if (view_match($clients{$cl}->{"view"}, $group, $service)) { sock_write ($fh, "$group $service " . esc_str ($watch{$group}->{$service}->{"description"}, 1) . 
"\n"); } } } sock_write ($fh, "220 list descriptions completed\n"); # # list group members # } elsif ($cmd eq "group") { if ($groups{$args}) { sock_write ($fh, "hostgroup $args @{$groups{$args}}\n"); sock_write ($fh, "220 list group completed\n"); } else { sock_write ($fh, "520 list group error, undefined group\n"); } # # list status of all services # } elsif ($cmd eq "opstatus") { if (!defined $args || $args eq "") { foreach $group (keys %watch) { foreach $service (keys %{$watch{$group}}) { if (view_match($clients{$cl}->{"view"}, $group, $service)) { client_write_opstatus ($fh, $group, $service); } } } sock_write ($fh, "220 list opstatus completed\n"); } else { my $err = 0; my @g = (); my ($group, $service); foreach my $gs (split (/\s+/, $args)) { ($group, $service) = split (/,/, $gs); $err++ && last if ($service ne "" && !defined $watch{$group}->{$service}); push (@g, [$group, $service]); } if (!$err) { foreach my $gs (@g) { if ($gs->[1] ne "") { client_write_opstatus ($fh, $gs->[0], $gs->[1]); } else { foreach $service (keys %{$watch{$gs->[0]}}) { client_write_opstatus ($fh, $gs->[0], $service); } } } sock_write ($fh, "220 list opstatus completed\n"); } else { sock_write ($fh, "520 $group,$service does not exist\n"); } } # # list disabled hosts and services # } elsif ($cmd eq "disabled") { foreach $group (keys %groups) { if (view_match($clients{$cl}->{"view"}, $group, undef)) { @l = grep (/^\*/, @{$groups{$group}}); if (@l) { grep (s/^\*//, @l); sock_write ($fh, "group $group: @l\n"); } } } foreach $group (keys %watch) { if (view_match($clients{$cl}->{"view"}, $group, undef)) { if (exists $watch_disabled{$group} && $watch_disabled{$group} == 1) { sock_write ($fh, "watch $group\n"); } } foreach $service (keys %{$watch{$group}}) { if (view_match($clients{$cl}->{"view"}, $group, $service)) { if (defined $watch{$group}->{$service}->{'disable'} && $watch{$group}->{$service}->{'disable'} == 1) { sock_write ($fh, "watch $group service " . 
"$service\n"); } } } } sock_write ($fh, "220 list disabled completed\n"); # # list last alert history # } elsif ($cmd eq "alerthist") { foreach my $l (@last_alerts) { sock_write ($fh, esc_str ($l) . "\n"); } sock_write ($fh, "220 list alerthist completed\n"); # # list time of last failures for each service # } elsif ($cmd eq "failures") { foreach $group (keys %watch) { foreach $service (keys %{$watch{$group}}) { if (view_match($clients{$cl}->{"view"}, $group, $service)) { my $sref = \%{$watch{$group}->{$service}}; client_write_opstatus ($fh, $group, $service) if ($FAILURE{$sref->{"_op_status"}}); } } } sock_write ($fh, "220 list failures completed\n"); # # list the failure history # } elsif ($cmd eq "failurehist") { foreach my $l (@last_failures) { sock_write ($fh, esc_str ($l) . "\n"); } sock_write ($fh, "220 list failurehist completed\n"); # # list the time of last successes for each service # } elsif ($cmd eq "successes") { foreach $group (keys %watch) { foreach $service (keys %{$watch{$group}}) { if (view_match($clients{$cl}->{"view"}, $group, $service)) { my $sref = \%{$watch{$group}->{$service}}; client_write_opstatus ($fh, $group, $service) if ($SUCCESS{$sref->{"_op_status"}}); } } } sock_write ($fh, "220 list successes completed\n"); # # list warnings # } elsif ($cmd eq "warnings") { foreach $group (keys %watch) { foreach $service (keys %{$watch{$group}}) { if (view_match($clients{$cl}->{"view"}, $group, $service)) { my $sref = \%{$watch{$group}->{$service}}; client_write_opstatus ($fh, $group, $service) if ($WARNING{$sref->{"_op_status"}}); } } } sock_write ($fh, "220 list successes completed\n"); # # list process IDs # } elsif ($cmd eq "pids") { sock_write ($fh, "server $$\n"); foreach $value (keys %runningpid) { ($group, $service) = split (/\//, $runningpid{$value}); sock_write ($fh, "$group $service $value\n"); } sock_write ($fh, "220 list pids completed\n"); # # list watch groups and services # } elsif ($cmd eq "watch") { foreach $group (keys %watch) { 
foreach $service (keys %{$watch{$group}}) { if (view_match($clients{$cl}->{"view"}, $group, $service)) { if (!defined $watch{$group}->{$service}) { sock_write ($fh, "$group (undefined service)\n"); } else { sock_write ($fh, "$group $service\n"); } } } } sock_write ($fh, "220 list watch completed\n"); # # list server state # } elsif ($cmd eq "state") { if ($STOPPED) { sock_write ($fh, "scheduler stopped since $STOPPED_TIME\n"); } else { sock_write ($fh, "scheduler running\n"); } sock_write ($fh, "220 list state completed\n"); # # list aliases # } elsif ($cmd eq "aliases") { my (@listAliasesRequest) = @argsList; shift (@listAliasesRequest); # if no aliases are requested, list them all unless (@listAliasesRequest) { @listAliasesRequest = keys (%alias); } foreach my $alias (@listAliasesRequest){ sock_write ($fh, "alias $alias\n"); foreach $value (@{$alias{$alias}}) { sock_write ($fh, "$value\n"); } sock_write ($fh, "\n"); } sock_write ($fh, "220 list aliases completed\n"); # # list aliasgroups # } elsif ($cmd eq "aliasgroups") { my (@listAliasesRequest); @listAliasesRequest = keys (%alias); sock_write ($fh, "@listAliasesRequest\n") unless (@listAliasesRequest == 0); sock_write ($fh, "220 list aliasgroups completed\n"); # # list deps # } elsif ($cmd eq "deps") { foreach my $g (keys %watch) { foreach my $s (keys %{$watch{$g}}) { if (view_match($clients{$cl}->{"view"}, $g, $s)) { my $sref = \%{$watch{$g}->{$s}}; if ($sref->{"depend"} ne "") { sock_write ($fh, "exp $g $s '" . esc_str ($sref->{"depend"}, 1) . 
"'\n"); } else { sock_write ($fh, "exp $g $s 'NONE'\n"); } my @u = ($sref->{"depend"} =~ /[a-zA-Z0-9_.-]+:[a-zA-Z0-9_.-]+/g); if (@u) { sock_write ($fh, "cmp $g $s @u\n"); } else { sock_write ($fh, "cmp $g $s NONE\n"); } } } } sock_write ($fh, "220 list deps completed\n"); # # downtime log # } elsif ($cmd eq "dtlog") { if ($CF{"DTLOGGING"}) { if (!open (DTLOGTMP, "< $CF{DTLOGFILE}")) { sock_write ($fh, "520 list dtlog error, cannot open dtlog\n"); } else { while (<DTLOGTMP>) { sock_write ($fh, $_ ) if (!/^#/ && !/^\s*$/); } close (DTLOGTMP); sock_write ($fh, "220 list dtlog completed\n"); } } else { sock_write ($fh, "520 list dtlog error, dtlogging is not turned on\n"); } # # list available views # } elsif ($cmd eq "views") { sock_write ($fh, "views ".join(' ',sort(keys %views))."\n"); sock_write ($fh, "220 list views completed\n"); # unknown list command } else { sock_write ($fh, "520 unknown list command\n"); } # # acknowledge a failure # } elsif ($cmd eq "ack" && check_auth ($clients{$cl}->{"user"}, $cmd)) { my ($group, $service, $comment) = split (/\s+/, $args, 3); if (!defined ($watch{$group})) { sock_write ($fh, "520 unknown group\n"); } elsif (!defined $watch{$group}->{$service}) { sock_write ($fh, "520 unknown service\n"); } my $sref = \%{$watch{$group}->{$service}}; if ($sref->{"_op_status"} == $STAT_OK || $sref->{"_op_status"} == $STAT_UNTESTED) { sock_write ($fh, "520 service is in a non-failure state\n"); } else { $sref->{"_ack"} = time; $sref->{"_ack_comment"} = $clients{$cl}->{"user"} . ": " . 
un_esc_str ((parse_line ('\s+', 0, $comment))[0]); sock_write ($fh, "220 ack completed\n"); do_alert($group, $service, $sref->{"_ack_comment"}, undef, $FL_ACKALERT) } # # disable watch, service or host # } elsif ($cmd eq "disable" && check_auth ($clients{$cl}->{"user"}, $cmd)) { ($cmd, $args) = split (/\s+/, $args, 2); # # disable watch # if ($cmd eq "watch") { if (!defined (disen_watch($args, 0))) { sock_write ($fh, "520 disable error, unknown watch \"$args\"\n"); } else { $stchanged++; mysystem("$CF{MONREMOTE} disable watch $args") if ($CF{MONREMOTE}); sock_write ($fh, "220 disable watch completed\n"); } # # disable service # } elsif ($cmd eq "service") { ($group, $service) = split (/\s+/, $args, 2); if (!defined (disen_service ($group, $service, 0))) { sock_write ($fh, "520 disable error, unknown service\n"); } else { $stchanged++; mysystem("$CF{MONREMOTE} disable service $group $service") if ($CF{MONREMOTE}); sock_write ($fh, "220 disable service completed\n"); do_alert($group, $service, $clients{$cl}->{"user"}, undef, $FL_DISABLEALERT) } # # disable host # } elsif ($cmd eq "host") { my @notfound = (); my @hosts = split (/\s+/, $args); foreach my $h (@hosts) { if (!host_exists ($h)) { push @notfound, $h; } } if (@notfound) { sock_write ($fh, "520 disable host failed, host(s) @notfound do not exist\n"); } else { foreach my $h (@hosts) { # # disable a watch if there is a group with this host # as its only member. 
this prevents warning messages # about monitors not being run on empty host groups # foreach my $g (host_singleton_group($h)) { disen_watch($g, 0); mysystem("$CF{MONREMOTE} disable watch $g") if ($CF{MONREMOTE}); } disen_host ($h, 0); $stchanged++; mysystem("$CF{MONREMOTE} disable host $h") if ($CF{MONREMOTE}); } sock_write ($fh, "220 disable host completed\n"); } } else { sock_write ($fh, "520 command could not be executed\n"); } # # enable watch, service or host # } elsif ($cmd eq "enable" && check_auth ($clients{$cl}->{"user"}, $cmd)) { ($cmd, $args) = split (/\s+/, $args, 2); # # enable watch # if ($cmd eq "watch") { if (!defined (disen_watch ($args, 1))) { sock_write ($fh, "520 enable error, unknown watch\n"); } else { $stchanged++; mysystem("$CF{MONREMOTE} enable watch $args") if ($CF{MONREMOTE}); sock_write ($fh, "220 enable watch completed\n"); } # # enable service # } elsif ($cmd eq "service") { ($group, $service) = split (/\s+/, $args, 2); if (!defined (disen_service ($group, $service, 1))) { sock_write ($fh, "520 enable error, unknown group\n"); } else { $stchanged++; mysystem("$CF{MONREMOTE} enable service $group $service") if ($CF{MONREMOTE}); sock_write ($fh, "220 enable completed\n"); } # # enable host # } elsif ($cmd eq "host") { foreach my $h (split (/\s+/, $args)) { foreach my $g (host_singleton_group($h)) { disen_watch($g, 1); mysystem("$CF{MONREMOTE} enable watch $g") if ($CF{MONREMOTE}); } disen_host ($h, 1); mysystem("$CF{MONREMOTE} enable host $h") if ($CF{MONREMOTE}); $stchanged++; } sock_write ($fh, "220 enable completed\n"); } else { sock_write ($fh, "520 command could not be executed\n"); } # # server time # } elsif ($cmd eq "servertime" && check_auth ($clients{$cl}->{"user"}, $cmd)) { sock_write ($fh, join ("", time, " ", scalar (localtime), "\n")); sock_write ($fh, "220 servertime completed\n"); # # check auth # } elsif ($cmd eq "checkauth") { @_ = split(' ',$args); $cmd = $_[0]; $user = $clients{$cl}->{"user"}; # Note that we call 
check_auth without syslogging here. if (check_auth($clients{$cl}->{"user"}, $cmd, 1)) { sock_write ($fh, "220 command authorized\n"); } else { sock_write ($fh, "520 command could not be executed\n"); } } else { sock_write ($fh, "520 command could not be executed, unknown command\n"); } save_state ("disabled") if ($stchanged); syslog ('info', "finished client command \"$l\"") if ($l !~ /^\s*login/i); } sub client_write_opstatus { my $fh = shift; my ($group, $service) = @_; my $sref = \%{$watch{$group}->{$service}}; my $summary = esc_str ($sref->{"_last_summary"}, 1); my $detail = esc_str ($sref->{"_last_detail"}, 1); my $depend = esc_str ($sref->{"depend"}, 1); my $hostdepend = esc_str ($sref->{"hostdepend"}, 1); my $monitordepend = esc_str ($sref->{"monitordepend"}, 1); my $alertdepend = esc_str ($sref->{"alertdepend"}, 1); my $monitor = esc_str ($sref->{"monitor"}, 1); my $comment; if ($sref->{"_ack"} != 0) { $comment = esc_str ($sref->{"_ack_comment"}, 1); } else { $comment = ''; } my $alerts_sent = 0; my $l = 0; foreach my $period (keys %{$sref->{"periods"}}) { $alerts_sent += $sref->{"periods"}->{$period}->{"_alert_sent"} if (!defined($sref->{"periods"}{$period}{"alerts_dont_count"})); $l = $sref->{"periods"}->{$period}->{"_last_alert"} if (defined $sref->{"periods"}->{$period}->{"_last_alert"} && $sref->{"periods"}->{$period}->{"_last_alert"} > $l); } my $buf = sprintf("group=$group service=$service opstatus=$sref->{_op_status} last_opstatus=%s exitval=%s timer=%s last_success=%s last_trap=%s last_traphost=%s last_check=%s ack=%s ackcomment=$comment alerts_sent=$alerts_sent depstatus=%s depend=$depend hostdepend=$hostdepend monitordepend=$monitordepend alertdepend=$alertdepend monitor=$monitor last_summary=%s last_detail=%s", (defined $sref->{_last_op_status} ? $sref->{_last_op_status} : ""), (defined $sref->{_exitval} ? $sref->{_exitval} : ""), (defined $sref->{_timer} ? $sref->{_timer} : ""), (defined $sref->{_last_success} ? 
$sref->{_last_success} : ""), (defined $sref->{_last_trap} ? $sref->{_last_trap} : ""), (defined $sref->{_last_traphost} ? $sref->{_last_traphost} : ""), (defined $sref->{_last_check} ? $sref->{_last_check} : ""), (defined $sref->{_ack} ? $sref->{_ack} : ""), (defined $sref->{"_depend_status"} ? int ($sref->{"_depend_status"}) : ""), $summary, $detail); $buf .= " last_failure=$sref->{_last_failure}" if ($sref->{"_last_failure"}); if ($sref->{"interval"}) { $buf .= " interval=$sref->{interval}" . " monitor_duration=$sref->{_monitor_duration}" . " monitor_running=$sref->{_monitor_running}" } $buf .= " exclude_period=". esc_str($sref->{exclude_period}) if ($sref->{"exclude_period"} ne ""); $buf .= " exclude_hosts=" . esc_str(join (" ", keys %{$sref->{exclude_hosts}})) if (keys %{$sref->{"exclude_hosts"}}); $buf .= " randskew=$sref->{randskew}" if ($sref->{"randskew"}); $buf .= " last_alert=$l" if ($l); if ($sref->{"_first_failure"}) { my $t = time - $sref->{"_first_failure"}; $buf .= " first_failure=$sref->{_first_failure}" . " failure_duration=$t"; } # if ($sref->{"_first_success"}) # { # my $t = time - $sref->{"_first_success"}; # $buf .= " first_success=$sref->{_first_success}" . # " success_duration=$t"; # } $buf .= "\n"; sock_write ($fh, $buf); } # # show usage # sub usage { print <<"EOF"; usage: mon [-a dir] [-A file] [-b dir] [-B dir] [-c config] [-d] [-D dir] [-f] [-h] [-i secs] [-k num] [-l [type]] [-L dir] [-M [path]] [-m num] [-p num] [-P file] [-r num] [-s dir] [-S] [-t num] mon -v -a dir alert script dir -A file authorization file -b dir base directory for alerts and monitors (basedir) -B dir base directory for configuration files (cfbasedir) -c config config file, defaults to "mon.cf" -d debug -D dir state directory (statedir) -f fork and become a daemon -h this help -i secs sleep interval (seconds), defaults to 1 -k num keep history of last num events -l [type] load some types of old state from statedir. type can be disabled (default), opstatus or all. 
-L dir log directory (logdir) -M [path] pre-process config file with m4. if m4 isn't in \$PATH specify the path to m4 here -m num throttle at maximum number of monitor processes -O facility syslog facility to use -p num server listens on port num -P file PID file -r num randomize startup schedule -s dir monitor script dir -S start with scheduler stopped -t port trap port -v print version Report bugs to $AUTHOR $RCSID EOF } # # become a daemon # sub daemon { my $pid; if ($pid = fork()) { # the parent goes away all happy and stuff exit (0); } elsif (!defined $pid) { die "could not fork: $!\n"; } setsid(); # # make it so that we cannot regain a controlling terminal # if ($pid = fork()) { # the parent goes away all happy and stuff exit (0); } elsif (!defined $pid) { syslog ('err', "could not fork: $!"); exit 1; } # chdir ('/'); umask (022); if (!open (N, "+>>" . $CF{"MONERRFILE"})) { syslog ("err", "could not open error output file $CF{'MONERRFILE'}: $!"); exit (1); } select (N); $| = 1; select (STDOUT); if (!open (STDIN, "/dev/null")) { syslog ("err", "could not open STDIN from /dev/null: $!"); exit (1); } print N "Mon starting at ".localtime(time)."\n"; if (!open(STDOUT, ">&N") || !open (STDERR, ">&N")) { syslog ("err", "could not redirect: $!"); exit(1); } syslog ('info', "running as daemon"); } # # debug # sub debug { my ($level, @l) = @_; return if (!defined $opt{"d"} || $level > $opt{"d"}); if ($opt{"d"} && !$opt{"f"}) { print STDERR @l; } else { syslog ('debug', join ('', @l)); } } # # die_die # sub die_die { my ($level, $msg) = @_; die "[$level] $msg\n" if ($opt{"d"}); syslog ($level, "fatal, $msg"); closelog(); exit (1); } # # handle cleanup of exited processes # trigger alerts on failures (or send no alert if disabled) # do some accounting # sub proc_cleanup { my ($summary, $tmnow, $buf); $tmnow = time; return if (keys %running == 0); while ((my $p = waitpid (-1, &WNOHANG)) >0) { next if (!exists $runningpid{$p}); my ($group, $service) = split (/\//, 
$runningpid{$p}); my $sref = \%{$watch{$group}->{$service}}; # # suck in any extra data # my $fh = $fhandles{$runningpid{$p}}; while (my $z = sysread ($fh, $buf, 8192)) { $ibufs{$runningpid{$p}} .= $buf; } debug (1, "PID $p ($runningpid{$p}) exited with [" . int ($?>>8) . "]\n"); $sref->{"_monitor_duration"} = $tmnow - $sref->{"_last_check"}; $sref->{"_monitor_running"} = 0; process_event ("m", $group, $service, int ($?>>8), $ibufs{$runningpid{$p}}); reset_timer ($group, $service); remove_proc ($p); } } # # handle the event where a monitor exits or a trap is received # # $type is "m" for monitor, "t" for trap # sub process_event { my ($type, $group, $service, $exitval, $output) = @_; debug (1, "process_event type=$type group=$group service=$service exitval=$exitval output=[$output]\n"); my $sref = \%{$watch{$group}->{$service}}; my $tmnow = time; my ($summary, $detail) = split("\n", $output, 2); $sref->{"_exitval"} = $exitval; if ($sref->{"depend"} ne "" && $sref->{"dep_behavior"} eq "a") { dep_ok ($sref, 'a'); } # # error exit value # if ($exitval) { # # accounting # $sref->{"_failure_count"}++; $sref->{"_consec_failures"}++; $sref->{"_last_failure"} = $tmnow; if ($sref->{"_op_status"} == $STAT_OK || $sref->{"_op_status"} == $STAT_UNKNOWN || $sref->{"_op_status"} == $STAT_UNTESTED) { $sref->{"_first_failure"} = $tmnow; } set_op_status ($group, $service, $STAT_FAIL); $summary = "(NO SUMMARY)" if ($summary =~ /^\s*$/m); $sref->{"_last_summary"} = $summary; $sref->{"_last_detail"} = $detail; shift @last_failures if (@last_failures > $CF{"MAX_KEEP"}); push @last_failures, "$group $service" . 
" $tm $summary"; syslog ('crit', "failure for $last_failures[-1]"); # # send an alert if necessary # if ($type eq "m") { do_alert ($group, $service, $output, $exitval, $FL_MONITOR); # # change interval if needed # if (defined ($sref->{"failure_interval"}) && !defined $sref->{"_old_interval"}) { $sref->{"_old_interval"} = $sref->{"interval"}; $sref->{"interval"} = $sref->{"failure_interval"}; $sref->{"_next_check"} = 0; } } elsif ($type eq "t") { do_alert ($group, $service, $output, $exitval, $FL_TRAP); } elsif ($type eq "T") { do_alert ($group, $service, $output, $exitval, $FL_TRAPTIMEOUT); } $sref->{"_failure_output"} = $output; } # # success exit value # else { if ($CF{"DTLOGGING"} && defined ($sref->{"_op_status"}) && $sref->{"_op_status"} == $STAT_FAIL) { write_dtlog ($sref, $group, $service); } my $old_status = $sref->{"_op_status"}; set_op_status ($group, $service, $STAT_OK); if ($type eq "t") { $sref->{"_last_uptrap"} = $tmnow; } # # if this service has just come back up and # we are paying attention to this event, # let someone know # if (($sref->{"redistribute"} ne '') || ((defined ($sref->{"_op_status"})) && ($old_status == $STAT_FAIL) && (defined($sref->{"_upalert"})) && (!defined($sref->{"upalertafter"}) || (($tmnow - $sref->{"_first_failure"}) >= $sref->{"upalertafter"})))) { # Save the last failing monitor's output for posterity $sref->{"_upalertoutput"}= $sref->{"_last_output"}; do_alert ($group, $service, $sref->{"_upalertoutput"}, 0, $FL_UPALERT); } # # send also when no upalertafter set # elsif (defined($sref->{"_upalert"}) && $old_status == $STAT_FAIL) { do_alert ($group, $service, $sref->{"_upalertoutput"}, 0, $FL_UPALERT); } $sref->{"_ack"} = 0; $sref->{"_ack_comment"} = ''; $sref->{"_first_failure"} = 0; $sref->{"_last_failure"} = 0; $sref->{"_consec_failures"} = 0; $sref->{"_failure_output"} = ""; $sref->{"_last_summary"} = $summary; $sref->{"_last_detail"} = $detail; # # reset the alertevery timer # foreach my $period (keys 
%{$sref->{"periods"}}) { # # "alertevery strict" should not reset _last_alert # if (!$sref->{"periods"}->{$period}->{"_alertevery_strict"}) { $sref->{"periods"}->{$period}->{"_last_alert"} = 0; } $sref->{"periods"}->{$period}->{"_1stfailtime"} = 0; $sref->{"periods"}->{$period}->{"_alert_sent"} = 0; } # # change interval back to original # if (defined ($sref->{"failure_interval"}) && defined ($sref->{"_old_interval"})) { $sref->{"interval"} = $sref->{"_old_interval"}; $sref->{"_old_interval"} = undef; $sref->{"_next_check"} = 0; } $sref->{"_last_success"} = $tmnow; } # # save the output # $sref->{"_last_output"} = $output; $sref->{"_last_summary"} = $summary; $sref->{"_last_detail"} = $detail; } # # collect output from running processes # sub collect_output { my ($buf, $rout); return if (!keys %running); my $nfound = select ($rout=$fdset_rbits, undef, undef, 0); debug (1, "select returned $nfound file handles\n"); return if ($! == &EINTR); if ($nfound) { # # look for the file descriptors that are readable, # and try to read as much as possible from them # foreach my $k (keys %fhandles) { my $fh = $fhandles{$k}; if (vec ($rout, fileno($fh), 1) == 1) { my $z = 0; while ($z = sysread ($fh, $buf, 8192)) { $ibufs{$k} .= $buf; debug (1, "[$buf] from $fh\n"); } # # ignore if EAGAIN, since we're nonblocking # if (!defined($z) && $! == &EAGAIN) { # # error on this descriptor # } elsif (!defined($z)) { debug (1, "error on $fh: $!\n"); syslog ('err', "error on $fh: $!"); vec($fdset_rbits, fileno($fh), 1) = 0; } elsif ($z == 0 && $! 
== &EAGAIN) { debug (1, "EAGAIN on $fh\n"); # # if EOF encountered, stop trying to # get input from this file descriptor # } elsif ($z == 0) { debug (1, "EOF on $fh\n"); vec($fdset_rbits, fileno($fh), 1) = 0; } } } } } # # handle forking a monitor process, and set up variables # sub run_monitor { my ($group, $service) = @_; my (@args, @groupargs, $pid, @ghosts, $monitor, $monitorargs); my $sref = \%{$watch{$group}->{$service}}; ($monitor, $monitorargs) = ($sref->{"monitor"} =~ /^(\S+)(\s+(.*))?$/); if (!defined $MONITORHASH{$monitor} || ! -f $MONITORHASH{$monitor}) { syslog ('err', "no monitor found while trying to run [$monitor]"); return undef; } else { $monitor = $MONITORHASH{$monitor}; } $monitor .= " " . $monitorargs if ($monitorargs); @ghosts = (); # # if monitor ends with ";;", do not append groups # to command line # if ($monitor =~ /;;\s*$/) { $monitor =~ s/\s*;;\s*$//; @args = quotewords ('\s+', 0, $monitor); @ghosts = (1); # # exclude disabled hosts # } else { @ghosts = grep (!/^\*/, @{$groups{$group}}); # # per-service excludes # if (keys %{$sref->{"exclude_hosts"}}) { my @g = (); for (my $i=0; $i<@ghosts; $i++) { push (@g, $ghosts[$i]) if !$sref->{"exclude_hosts"}->{$ghosts[$i]}; } @ghosts = @g; } # # per-host dependencies # if ((defined $sref->{"depend"} && $sref->{"depend"} ne "" && $sref->{"dep_behavior"} eq 'hm') || (defined $sref->{"hostdepend"} && $sref->{"hostdepend"} ne "")) { my @g = (); my $sum = dep_summary($sref); for (my $i=0; $i<@ghosts; $i++) { push (@g, $ghosts[$i]) if (! grep /\Q$ghosts[$i]\E/, @$sum); } @ghosts = @g; } @args = (quotewords ('\s+', 0, $monitor), @ghosts); } if (@ghosts == 0 && !defined ($sref->{"allow_empty_group"})) { syslog ('err', "monitor for $group/$service" . 
" not called because of no host arguments\n"); reset_timer ($group, $service); } else { $fhandles{"$group/$service"} = new FileHandle; $pid = open ($fhandles{"$group/$service"}, '-|'); if (!defined $pid) { syslog ('err', "Could not fork: $!"); delete $fhandles{"$group/$service"}; return 0; } elsif ($pid == 0) { open(STDERR, '>&STDOUT') or syslog ('err', "Could not dup stderr: $!"); open(STDIN, "{"ENV"}}) { $ENV{$v} = $sref->{"ENV"}->{$v}; } $ENV{"MON_GROUP"} = $group; $ENV{"MON_SERVICE"} = $service; $ENV{"MON_LAST_SUMMARY"} = $sref->{"_last_summary"} if (defined $sref->{"_last_summary"}); $ENV{"MON_LAST_OUTPUT"} = $sref->{"_last_output"} if (defined $sref->{"_last_output"}); $ENV{"MON_LAST_FAILURE"} = $sref->{"_last_failure"} if (defined $sref->{"_last_failure"}); $ENV{"MON_FIRST_FAILURE"} = $sref->{"_first_failure"} if (defined $sref->{"_first_failure"}); $ENV{"MON_DEPEND_STATUS"} = $sref->{"_depend_status"} if (defined $sref->{"_depend_status"}); $ENV{"MON_FIRST_SUCCESS"} = $sref->{"_first_success"} if (defined $sref->{"_first_success"}); $ENV{"MON_LAST_SUCCESS"} = $sref->{"_last_success"} if (defined $sref->{"_last_success"}); $ENV{"MON_DESCRIPTION"} = $sref->{"description"} if (defined $sref->{"description"}); $ENV{"MON_STATEDIR"} = $CF{"STATEDIR"}; $ENV{"MON_LOGDIR"} = $CF{"LOGDIR"}; $ENV{"MON_CFBASEDIR"} = $CF{"CFBASEDIR"}; if (!exec @args) { syslog ('err', "could not exec '@args': $!"); exit (1); } } $sref->{"_last_check"} = scalar (time); $sref->{"_monitor_running"} = 1; debug (1, "watching file handle ", fileno ($fhandles{"$group/$service"}), " for $group/$service\n"); # # set nonblocking I/O and setup bit vector for select(2) # configure_filehandle ($fhandles{"$group/$service"}) || syslog ("err", "could not configure filehandle for $group/$service: $!"); vec ($fdset_rbits, fileno($fhandles{"$group/$service"}), 1) = 1; $fdset_ebits |= $fdset_rbits; # # note that this is running # $running{"$group/$service"} = 1; $runningpid{$pid} = "$group/$service"; 
$ibufs{"$group/$service"} = ""; $procs++; } if ($sref->{"_next_check"}) { $sref->{"_next_check"} += $sref->{"interval"}; } else { $sref->{"_next_check"} = time() + $sref->{"interval"}; } } # # set the countdown timer for this service # sub reset_timer { my ($group, $service) = @_; my $sref = \%{$watch{$group}->{$service}}; if ($sref->{"randskew"} != 0) { $sref->{"_timer"} = $sref->{"interval"} + (int (rand (2)) == 0 ? -int(rand($sref->{"randskew"}) + 1) : int(rand($sref->{"randskew"})+1)); } elsif ($sref->{"_next_check"}) { if (($sref->{"_timer"} = $sref->{"_next_check"} - time()) < 0) { $sref->{"_timer"} = $sref->{"interval"}; } } else { $sref->{"_timer"} = $sref->{"interval"}; } } # # randomize the delay before each test # $opt{"randstart"} is seconds # sub randomize_startdelay { my ($group, $service); foreach $group (keys %watch) { foreach $service (keys %{$watch{$group}}) { $watch{$group}->{$service}->{"_timer"} = int (rand ($CF{"RANDSTART"})); } } } # # return 1 if $val is within $range, # where $range = "number" or "number-number" # sub inRange { my ($val, $range) = @_; my ($retval); $retval = 0; if ($range =~ /^(\d+)$/ && $val == $1) { $retval = 1 } elsif ($range =~ /^(\d+)\s*-\s*(\d+)$/ && ($val >= $1 && $val <= $2)) { $retval = 1 } $retval; } # # disable ($cmd==0) or enable a watch # sub disen_watch { my ($w, $cmd) = @_; return undef if (!defined ($watch{$w})); if (!$cmd) { $watch_disabled{$w} = 1; } else { $watch_disabled{$w} = 0; } } # # disable ($cmd==0) or enable a service # sub disen_service { my ($g, $s, $cmd) = @_; my ($snum); return undef if (!defined $watch{$g}); return undef if (!defined $watch{$g}->{$s}); if (!$cmd) { $watch{$g}->{$s}->{"disable"} = 1; } else { $watch{$g}->{$s}->{"disable"} = 0; } } # # disable ($cmd==0) or enable a host # sub disen_host { my ($h, $cmd) = @_; my $found = undef; foreach my $g (keys %groups) { if ((!defined $cmd) || $cmd == 0) { if (grep (s/^$h$/*$h/, @{$groups{$g}})) { $found = 1; } } else { if (grep 
(s/^\*$h$/$h/, @{$groups{$g}})) { $found = 1; } } } $found; } sub host_exists { my $host = shift; my $found = 0; foreach my $g (keys %groups) { if (grep (/^$host$/, @{$groups{$g}})) { $found = 1; last; } } $found; } # # given a host, search groups and return an array of group # names which have that host as their only member. return # an empty array if no group found # # sub host_singleton_group { my $host = shift; my @found; foreach my $g (keys %groups) { if (grep (/^\*?$host$/, @{$groups{$g}}) && scalar(@{$groups{$g}}) == 1) { push (@found, $g); } } return (@found); } # # save state # sub save_state { my (@states) = @_; my ($group, $service, @l, $state); foreach $state (@states) { if ($state eq "disabled" || $state eq "all") { if (!open (STATE, ">$CF{STATEDIR}/disabled")) { syslog ("err", "could not write to state file: $!"); next; } foreach $group (keys %groups) { @l = grep (/^\*/, @{$groups{$group}}); if (@l) { grep (s/^\*//, @l); grep { print STATE "disable host $_\n" } @l; } } foreach $group (keys %watch) { if (exists $watch_disabled{$group} && $watch_disabled{$group} == 1) { print STATE "disable watch $group\n"; } foreach $service (keys %{$watch{$group}}) { if (defined $watch{$group}->{$service}->{'disable'} && $watch{$group}->{$service}->{'disable'} == 1) { print STATE "disable service $group $service\n"; } } } close (STATE); } if ($state eq "opstatus" || $state eq "all") { if (!open (STATE, ">$CF{STATEDIR}/opstatus")) { syslog ("err", "could not write to opstatus state file: $!"); next; } foreach $group (keys %watch) { foreach $service (keys %{$watch{$group}}) { print STATE "group=$group\tservice=$service"; foreach my $var (qw(op_status failure_count alert_count last_success first_success consec_failures last_failure first_failure last_summary last_failure_time last_failure_summary last_failure_detail last_detail ack ack_comment last_trap last_traphost exitval last_check last_op_status failure_output trap_timer)) { print STATE "\t$var=" . 
esc_str($watch{$group}->{$service}->{"_$var"}); } foreach my $periodlabel (keys %{$watch{$group}->{$service}->{periods}}) { foreach my $var (qw(last_alert alert_sent 1stfailtime failcount)) { print STATE "\t$periodlabel:$var=" . esc_str($watch{$group}->{$service}{periods}{$periodlabel}{"_$var"}); } } print STATE "\n"; } } close (STATE); } } } # # load state # sub load_state { my (@states) = @_; my ($l, $cmd, $args, $group, $service, $what, $state); foreach $state (@states) { if ($state eq "disabled" || $state eq "all") { if (!open (STATE, "$CF{STATEDIR}/disabled")) { syslog ("err", "could not read state file: $!"); next; } while (defined ($l = <STATE>)) { chomp $l; ($cmd, $what, $args) = split (/\s+/, $l, 3); next if ($cmd ne "disable"); if ($what eq "host") { disen_host ($args); } elsif ($what eq "watch") { syslog ("err", "undefined watch reading state file: $l") if (!defined disen_watch ($args)); } elsif ($what eq "service") { ($group, $service) = split (/\s+/, $args, 2); syslog ("err", "undefined group or service reading state file: $l") if (!defined disen_service ($group, $service)); } } syslog ("info", "state '$state' loaded"); close (STATE); } if ($state eq "opstatus" || $state eq "all") { if (!open (STATE, "$CF{STATEDIR}/opstatus")) { syslog ("err", "could not read state file: $!"); next; } while (defined ($l = <STATE>)) { chomp $l; my %opstatus = map{ /^(.*)=(.*)$/; $1 => $2} split (/\t/, $l,); next unless (exists $opstatus{group} && exists $watch{$opstatus{group}} && exists $opstatus{service} && exists $watch{$opstatus{group}}->{$opstatus{service}}); foreach my $op (keys %opstatus) { next if ($op eq 'group' || $op eq 'service'); if ($op =~ /^(.*):(.*)$/) { next unless exists $watch{$opstatus{group}}->{$opstatus{service}}{periods}{$1}; $watch{$opstatus{group}}->{$opstatus{service}}{periods}{$1}{"_$2"} = un_esc_str($opstatus{$op}); } else { $watch{$opstatus{group}}->{$opstatus{service}}{"_$op"} = un_esc_str($opstatus{$op}); } } } syslog ("info", "state '$state' loaded"); 
close (STATE); } } } # # authenticate a login # sub auth { my ($type, $user, $plaintext, $host) = @_; my ($pass, %u, $l, $u, $p); if ($user eq "" || ($type ne 'trustlocal' && $plaintext eq "")) { syslog ('err', "an undef username or password supplied"); return undef; } # # standard UNIX passwd # if ($type eq "getpwnam") { (undef, $pass) = getpwnam($user); return undef if (!defined $pass); if ((crypt ($plaintext, $pass)) ne $pass) { return undef; } return 1; # # shadow password # } elsif ($type eq "shadow") { # # "mon" authentication # } elsif ($type eq "userfile") { if (!open (U, $CF{"USERFILE"})) { syslog ('err', "could not open user file '$CF{USERFILE}': $!"); return undef; } while (<U>) { next if (/^\s*#/ || /^\s*$/); chomp; ($u,$p) = split (/\s*:\s*/, $_, 2); $u{$u} = $p; } close (U); return undef if (!defined($u{$user})); #user was not found in userfile return undef if ((crypt ($plaintext, $u{$user})) ne $u{$user}); #user gave wrong password return 1; # # PAM authentication # } elsif ($type eq "pam") { local $PAM_username = $user; local $PAM_password = $plaintext; my $pamh; if (!ref($pamh = new Authen::PAM($CF{'PAMSERVICE'}, $PAM_username, \&pam_conv_func))) { syslog ('err', "Error code $pamh during PAM init!: $!"); return undef; } my $res = $pamh->pam_authenticate ; return undef if ($res != &Authen::PAM::PAM_SUCCESS) ; return 1; } elsif ($type eq "trustlocal") { # We're configured to trust all authentications from localhost # i.e. 
cgi scripts are handling authentication themselves return undef if ($host ne "127.0.0.1"); return 1; } else { syslog ('err', "authentication type '$type' not known"); } return undef; } # # load the table of who can do which commands # sub load_auth { my ($startup) = @_; my ($l, $cmd, $users, $u, $host, $user, $password, $sect); %AUTHCMDS = (); %NOAUTHCMDS = (); %AUTHTRAPS = (); $sect = "command"; if (!open (C, $CF{"AUTHFILE"})) { err_startup ($startup, "could not open $CF{AUTHFILE}: $!"); return undef; } while (defined ($l = <C>)) { next if ($l =~ /^\s*#/ || $l =~ /^\s*$/); chomp $l; $l =~ s/^\s*//; $l =~ s/\s*$//; if ($l =~ /^command\s+section/) { $sect = "command"; next; } elsif ($l =~ /^trap\s+section/) { $sect = "trap"; next; } if ($sect eq "command") { ($cmd, $users) = split (/\s*:\s*/, $l, 2); if (!defined $users) { err_startup ($startup, "could not parse line $. of auth file\n"); next; } foreach $u (split (/\s*,\s*/, $users)) { if ( $u =~ /^AUTH_ANY$/ ) { # Allow all authenticated users $AUTHCMDS{"\L$cmd"}{$u} = 1; } elsif ( $u =~ /^!(.*)/ ) { # Directive is to "deny-user" $NOAUTHCMDS{"\L$cmd"}{$1} = 1; } else { # Directive is to "allow-user" $AUTHCMDS{"\L$cmd"}{$u} = 1; } } } elsif ($sect eq "trap") { if ($l !~ /^(\S+)\s+(\S+)\s+(\S+)$/) { syslog ('err', "invalid entry in trap sect of $CF{AUTHFILE}, line $."); next; } ($host, $user, $password) = ($1, $2, $3); if ($host eq "*") { # # allow traps from all hosts # } elsif ($host =~ /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/) { if (($host = inet_aton ($host)) eq "") { syslog ('err', "invalid host in $CF{AUTHFILE}, line $."); next; } } elsif ($host =~ /^[A-Z\d][A-Z\.\d\-]*[A-Z\d]+$/i) { if (($host = inet_aton ($host)) eq "") { syslog ('err', "invalid host in $CF{AUTHFILE}, line $."); next; } } else { syslog ('err', "invalid host in $CF{AUTHFILE}, line $."); next; } if ($host ne "*") { $host = inet_ntoa ($host); } syslog ('notice', "Adding trap auth of: $host $user $password"); $AUTHTRAPS{$host}{$user} = $password; } 
else { syslog ('err', "unknown section in $CF{AUTHFILE}: $l"); } } close (C); } sub load_view_users {} sub view_match { my ($view, $group, $service) = @_; if (!defined($view)) { # print STDERR "No view in use\n"; return 1; } if (defined($group) && defined($views{$view}->{$group})) { # print STDERR "View $view contains $group\n"; return 1; } if (defined($views{$view}->{$group.":".$service})) { # print STDERR "View $view contains $group:$service\n"; return 1; } return 0; } # # return undef if $user isn't permitted to perform $cmd # Optional third argument controls logging to syslog. # e.g., # check_auth("joe", "disable") # will check to see if user joe is authorized to disable, and # complain to syslog if joe is not authorized # check_auth("joe", "disable", 1) # will check to see if user joe is authorized to disable but # NOT complain to syslog if joe is not authorized # sub check_auth { my ($user, $cmd, $no_syslog) = @_; # # Check to see if the authenticated user is specifically # denied the ability to run this command. # if ( (defined ($user) && $NOAUTHCMDS{$cmd}{$user}) || (defined ($user) && $NOAUTHCMDS{$cmd}{"AUTH_ANY"}) ) { syslog ("err", "user '$user' tried '$cmd', denied"); return undef; } # # Check for "all". This allows any client, authenticated or # not, to execute the requested command. # return 1 if ($AUTHCMDS{$cmd}{"all"}); # # Check for AUTH_ANY. This allows any authenticated user to # execute the requested command. # return 1 if (defined ($user) && $AUTHCMDS{$cmd}{"AUTH_ANY"}); # # Check to see if the authenticated user is specifically #allowed the ability to run this command. 
# return 1 if (defined ($user) && $AUTHCMDS{$cmd}{$user}); syslog ("err", "user '$user' tried '$cmd', not authenticated") unless defined($no_syslog); return undef; } # # reload things # sub reload { my (@what) = @_; for (@what) { if ($_ eq "auth") { load_auth; } else { return undef; } } return 1; } sub err_startup { my ($startup, $msg) = @_; if ($startup) { die "$msg\n"; } else { syslog ('err', $msg); } } # # handle a trap # sub handle_trap { my ($buf, $from) = @_; my $time = time; my %trap = (); my $flags = 0; my $tmnow = time; my $intended; my $fromip; # # MON-specific tags # pro protocol # aut auth # usr username # pas password # typ type ("failure", "up", "startup", "trap", "traptimeout") # spc specific type (STAT_OK, etc.) THIS IS NO LONGER USED # seq sequence # grp group # svc service # hst host # sta status (same as exit status of a monitor) # tsp timestamp as time(2) value # sum summary output # dtl detail # # # this part validates the trap # { foreach my $line (split (/\n/, $buf)) { if ($line =~ /^(\w+)=(.*)/) { my $trap_name = $1; my $trap_val = $2; chomp $trap_val; $trap_val =~ s/^\'(.*)\'$/\1/; $trap{$trap_name} = un_esc_str ($trap_val); } else { syslog ('err', "unspecified tag in trap: $line"); } } $trap{"sum"} = "$trap{sum}\n" if ($trap{"sum"} !~ /\n$/); my ($port, $addr) = sockaddr_in ($from); $fromip = inet_ntoa ($addr); # # trap authentication # my ($traphost, $trapuser, $trappass); if (defined ($AUTHTRAPS{"*"})) { $traphost = "*"; } else { $traphost = $fromip; } if (defined ($AUTHTRAPS{$traphost}{"*"})) { $trapuser = "*"; $trappass = ""; } else { $trapuser = $trap{"usr"}; $trappass = $trap{"pas"}; } if (!defined ($AUTHTRAPS{$traphost})) { syslog ('err', "received trap from unauthorized host: $fromip"); return undef; } if ($trapuser ne "*") { if (!defined $AUTHTRAPS{$traphost}{$trapuser} || crypt ($trappass, $AUTHTRAPS{$traphost}{$trapuser}) ne $AUTHTRAPS{$traphost}{$trapuser}) { syslog ('err', "received trap from unauthorized user $trapuser, host 
$traphost"); return undef; } } # # protocol version # if ($trap{"pro"} < $TRAP_PRO_VERSION) { syslog ('err', "cannot handle traps from version less than $TRAP_PRO_VERSION"); return undef; } # # validate trap type # if (!defined $trap{"sta"}) { syslog ('err', "no trap sta value specified from $fromip"); return undef; } # # if mon receives a trap for an unknown group/service, then the # default/default group/service should catch these if it is defined # if (!defined $watch{$trap{"grp"}} && defined $watch{"default"}) { $intended = "$trap{'grp'}:$trap{'svc'}"; $trap{"grp"} = "default"; } if ($trap{"grp"} eq 'default' && !defined($watch{default}->{$trap{"svc"}}) && defined($watch{'default'}->{'default'})) { $trap{"svc"} = "default"; } if (!defined ($groups{$trap{"grp"}})) { syslog ('err', "trap received for undefined group $trap{grp}"); return; } elsif (!defined $watch{$trap{"grp"}}->{$trap{"svc"}}) { syslog ('err', "trap received for undefined service type $trap{grp}/$trap{svc}"); return; } } # # trap has been validated, proceed # my $sref = \%{$watch{$trap{"grp"}}->{$trap{"svc"}}}; # # a trap recieved resets the trap timeout timer # if (exists $sref->{"traptimeout"}) { $sref->{"_trap_timer"} = $sref->{"traptimeout"}; } $sref->{"_last_trap"} = $time; if ($intended) { $sref->{"_intended"} = $intended; } syslog ('info', "trap $trap{typ} $trap{spc} from " . "$fromip grp=$trap{grp} svc=$trap{svc}, sta=$trap{sta}\n"); $sref->{"_trap_duration_timer"} = $sref->{"trapduration"} if ($sref->{"trapduration"}); process_event ("t", $trap{"grp"}, $trap{"svc"}, $trap{"sta"}, "$trap{sum}\n$trap{dtl}"); if( defined($sref->{"_intended"}) ) { undef($sref->{"_intended"}); } } # # trap timeout # sub handle_trap_timeout { my ($group, $service) = @_; my ($tmnow); $tmnow = time; my $sref = \%{$watch{$group}->{$service}}; $sref->{"_trap_timer"} = $sref->{"traptimeout"}; process_event ("T", $group, $service, 1, "trap timeout\n" . "trap timeout after " . $sref->{"traptimeout"} . "s at " . 
localtime ($tmnow) . "\n"); } # # write to a socket # sub sock_write { my ($sock, $buf) = @_; my ($nleft, $nwritten); $nleft = length ($buf); while ($nleft) { $nwritten = syswrite ($sock, $buf, $nleft); if (!defined ($nwritten)) { return undef if ($! != EAGAIN); usleep (100000); next; } $nleft -= $nwritten; substr ($buf, 0, $nwritten) = ""; } } # # do I/O processing for traps and client connections # sub handle_io { # # build iovec for server connections, traps, and clients # $iovec = ''; my $niovec = ''; vec ($iovec, fileno (TRAPSERVER), 1) = 1; vec ($iovec, fileno (SERVER), 1) = 1; foreach my $cl (keys %clients) { vec ($iovec, $cl, 1) = 1; } # # handle client I/O while there is some to handle # my $sleep = $SLEEPINT; my $tm0 = [gettimeofday]; my $n; while ($n = select ($niovec = $iovec, undef, undef, $sleep)) { my $tm1 = [gettimeofday]; if ($! != &EINTR) { # # mon trap # if (vec ($niovec, fileno (TRAPSERVER), 1)) { my ($from, $trapbuf); if (!defined ($from = recv (TRAPSERVER, $trapbuf, 65536, 0))) { syslog ('err', "error trying to recv a trap: $!"); } else { handle_trap ($trapbuf, $from); } next; # # client connections # } elsif (vec ($niovec, fileno (SERVER), 1)) { client_accept; } # # read data from clients if any exists # if ($numclients) { foreach my $cl (keys %clients) { next if (!vec ($niovec, $cl, 1)); my $buf = ''; $n = sysread ($clients{$cl}->{"fhandle"}, $buf, 8192); if ($n == 0 && $! 
!= &EAGAIN) { client_close ($cl); } elsif (!defined $n) { client_close ($cl, "read error: $!"); } else { $clients{$cl}->{"buf"} .= $buf; $clients{$cl}->{"timeout"} = $CF{"CLIENT_TIMEOUT"}; $clients{$cl}->{"last_read"} = time; } } } } # # execute client commands which have been read # client_dopending if ($numclients); last if (tv_interval ($tm0, $tm1) >= $SLEEPINT); $sleep = $SLEEPINT - tv_interval ($tm0, $tm1); } if (!defined ($n)) { syslog ('err', "select returned an error for I/O loop: $!"); } # # count down client inactivity timeouts and close expired connections # if ($numclients) { foreach my $cl (keys %clients) { my $timenow = time; $clients{$cl}->{"timeout"} = $timenow - $clients{$cl}->{"last_read"}; if ($clients{$cl}->{"timeout"} >= $CF{"CLIENT_TIMEOUT"}) { client_close ($cl, "timeout after $CF{CLIENT_TIMEOUT}s"); } } } } # # generate alert and monitor path hashes # sub gen_scriptdir_hash { my ($d, @scriptdirs, @alertdirs, $found); %MONITORHASH = (); %ALERTHASH = (); foreach $d (split (/\s*:\s*/, $CF{"SCRIPTDIR"})) { if (-d "$d" && -x "$d") { push (@scriptdirs, $d); } else { syslog ('err', "scriptdir $d is not usable"); } } foreach $d (split (/\s*:\s*/, $CF{"ALERTDIR"})) { if (-d $d && -x $d) { push (@alertdirs, $d); } else { syslog ('err', "alertdir $d is not usable"); } } # # monitors # foreach my $group (keys %watch) { foreach my $service (keys %{$watch{$group}}) { next if (!defined $watch{$group}->{$service}->{"monitor"}); my $monitor = (split (/\s+/, $watch{$group}->{$service}->{"monitor"}))[0]; $found = 0; foreach (@scriptdirs) { if (-x "$_/$monitor") { $MONITORHASH{$monitor} = "$_/$monitor" unless (defined $MONITORHASH{$monitor}); $found++; last; } } if (!$found) { syslog ('err', "$monitor not found in one of (\@scriptdirs[@scriptdirs])"); } } } # # alerts # foreach my $group (keys %watch) { foreach my $service (keys %{$watch{$group}}) { if ($watch{$group}->{$service}->{"redistribute"} ne '') { my $alert = 
$watch{$group}->{$service}->{"redistribute"}; $found = 0; foreach (@alertdirs) { if (-x "$_/$alert") { $ALERTHASH{$alert} = "$_/$alert" unless (defined $ALERTHASH{$alert}); $found++; } } if (!$found) { syslog ('err', "$alert not found in one of (\@alertdirs[@alertdirs])"); } } foreach my $period (keys %{$watch{$group}->{$service}->{"periods"}}) { foreach my $my_alert ( @{$watch{$group}->{$service}->{"periods"}->{$period}->{"alerts"}}, @{$watch{$group}->{$service}->{"periods"}->{$period}->{"upalerts"}}, @{$watch{$group}->{$service}->{"periods"}->{$period}->{"startupalerts"}}, @{$watch{$group}->{$service}->{"periods"}->{$period}->{"ackalerts"}}, @{$watch{$group}->{$service}->{"periods"}->{$period}->{"disablealerts"}}, ) { my $alert = $my_alert; $alert =~ s/^(\S+=\S+ )*(\S+).*$/$2/; $found = 0; foreach (@alertdirs) { if (-x "$_/$alert") { $ALERTHASH{$alert} = "$_/$alert" unless (defined $ALERTHASH{$alert}); $found++; } } if (!$found) { syslog ('err', "$alert not found in one of (\@alertdirs[@alertdirs])"); } } } } } } # # do some processing on dirs # sub normalize_paths { my ($authtype, @authtypes); # # do some sanity checks on dirs # $CF{"STATEDIR"} = "$CF{BASEDIR}/$CF{STATEDIR}" if ($CF{"STATEDIR"} !~ m{^/}); syslog ('err', "$CF{STATEDIR} does not exist") if (! -d $CF{"STATEDIR"}); $CF{"LOGDIR"} = "$CF{BASEDIR}/$CF{LOGDIR}" if ($CF{"LOGDIR"} !~ m{^/}); syslog ('err', "$CF{LOGDIR} does not exist") if (! -d $CF{LOGDIR}); $CF{"AUTHFILE"} = "$CF{CFBASEDIR}/$CF{AUTHFILE}" if ($CF{"AUTHFILE"} !~ m{^/}); syslog ('err', "$CF{AUTHFILE} does not exist") if (! -f $CF{"AUTHFILE"}); @authtypes = split(' ' , $CF{"AUTHTYPE"}) ; foreach $authtype (@authtypes) { if ($authtype eq "userfile") { $CF{"USERFILE"} = "$CF{CFBASEDIR}/$CF{USERFILE}" if ($CF{"USERFILE"} !~ m{^/}); syslog ('err', "$CF{USERFILE} does not exist") if (! 
-f $CF{"USERFILE"}); } } $CF{"DTLOGFILE"} = "$CF{LOGDIR}/$CF{DTLOGFILE}" if ($CF{"DTLOGFILE"} !~ m{^/}); if ($CF{"HISTORICFILE"} ne "") { $CF{"HISTORICFILE"} = "$CF{LOGDIR}/$CF{HISTORICFILE}" if ($CF{"HISTORICFILE"} !~ m{^/}); } # # script and alert dirs may have multiple paths # foreach my $dir (\$CF{"SCRIPTDIR"}, \$CF{"ALERTDIR"}) { my @n; foreach my $d (split (/\s*:\s*/, $$dir)) { $d =~ s{/$}{}; $d = "$CF{BASEDIR}/$d" if ($d !~ m{^/}); syslog ('err', "$d does not exist, check your alertdir and mondir paths") unless (-d $d); push @n, $d; } $$dir = join (":", @n); } } # # set opstatus and save old status # sub set_op_status { my ($group, $service, $status) = @_; $watch{$group}->{$service}->{"_last_op_status"} = $watch{$group}->{$service}->{"_op_status"}; $watch{$group}->{$service}->{"_op_status"} = $status; } sub debug_dir { print STDERR < 1, $STAT_LINKDOWN => 1, $STAT_TIMEOUT => 1, ); %SUCCESS = ( $STAT_OK => 1, $STAT_COLDSTART => 1, $STAT_WARMSTART => 1, $STAT_UNKNOWN => 1, $STAT_UNTESTED => 1, ); %WARNING = ( $STAT_COLDSTART => 1, $STAT_WARMSTART => 1, $STAT_UNKNOWN => 1, $STAT_WARN => 1, ); %OPSTAT = ("fail" => $STAT_FAIL, "ok" => $STAT_OK, "coldstart" => $STAT_COLDSTART, "warmstart" => $STAT_WARMSTART, "linkdown" => $STAT_LINKDOWN, "unknown" => $STAT_UNKNOWN, "timeout" => $STAT_TIMEOUT, "untested" => $STAT_UNTESTED); # # fast lookup hashes for alerts and monitors # %MONITORHASH = (); %ALERTHASH = (); } # # clear timers # sub clear_timers { my ($group, $service) = @_; return undef if (!defined $watch{$group}->{$service}); my $sref = \%{$watch{$group}->{$service}}; $sref->{"_trap_timer"} = $sref->{"traptimeout"} if ($sref->{"traptimeout"}); $sref->{"_trap_duration_timer"} = $sref->{"trapduration"} if ($sref->{"trapduration"}); $sref->{"_timer"} = $sref->{"interval"} if ($sref->{"interval"}); $sref->{"_consec_failures"} = 0 if ($sref->{"_consec_failures"}); foreach my $period (keys %{$sref->{"periods"}}) { my $pref = \%{$sref->{"periods"}->{$period}}; 
$pref->{"_last_alert"} = 0 if ($pref->{"alertevery"}); $pref->{"_consec_failures"} = 0 if ($pref->{"alertafter_consec"}); $pref->{'_1stfailtime'} = 0 if ($pref->{"alertafterival"}); } } # # load some amount of the alert history into memory # sub readhistoricfile { return if ($CF{"HISTORICFILE"} eq ""); if (!open (HISTFILE, $CF{"HISTORICFILE"})) { syslog ('err', "Could not read history from $CF{HISTORICFILE} : $!"); return; } my $epochLimit = 0; if ($CF{"HISTORICTIME"} != 0) { $epochLimit = time - $CF{"HISTORICTIME"}; } @last_alerts = (); while (<HISTFILE>) { next if (/^\s*$/ || /^\s*#/); chomp; my $epochAlert = (split(/\s+/))[3]; push (@last_alerts, $_) if ($epochAlert >= $epochLimit); } close (HISTFILE); if (defined $CF{"MAX_KEEP"}) { splice(@last_alerts, 0, $#last_alerts + 1 - $CF{"MAX_KEEP"}); } } # # This routine simply calls an alert. # # call with %args = ( # group => "name of group", # service => "name of service", # pref => "optional period reference", # alert => "alert script", # args => "args to alert script", # flags => "flags, as in $FL_*", # retval => "return value of monitor", # output => "output of monitor", # ) # sub call_alert { my (%args) = @_; foreach my $mandatory_arg (qw(group service flags retval alert output)) { if (!exists $args{$mandatory_arg}) { debug (1, "returning from call_alert because of missing arg $mandatory_arg\n"); return (undef); } } my @groupargs = grep (!/^\*/, @{$groups{$args{"group"}}}); my $tmnow = time; my ($summary) = split("\n", $args{"output"}); $summary = "(NO SUMMARY)" if (!defined $summary || $summary =~ /^\s*$/m); my $sref = \%{$watch{$args{"group"}}->{$args{"service"}}}; my $pref; if (defined $args{"pref"}) { $pref = $args{"pref"}; } if (! defined $args{"args"}) { $args{"args"} = ''; } my $alert = ""; if (!defined $ALERTHASH{$args{"alert"}} || ! 
-f $ALERTHASH{$args{"alert"}}) { syslog ('err', "no alert found while trying to run $args{alert}"); return undef; } else { $alert = $ALERTHASH{$args{"alert"}}; } my $alerttype = ""; # sent to syslog and stored in @last_alerts my $alert_type = "failure"; # MON_ALERTTYPE set to this if ($args{"flags"} & $FL_UPALERT) { $alerttype = "upalert"; $alert_type = "up"; } elsif ($args{"flags"} & $FL_STARTUPALERT) { $alerttype = "startupalert"; $alert_type = "startup"; } elsif ($args{"flags"} & $FL_ACKALERT) { $alerttype = "ackalert"; $alert_type = "ack"; } elsif ($args{"flags"} & $FL_DISABLEALERT) { $alerttype = "disablealert"; $alert_type = "disable"; } elsif ($args{"flags"} & $FL_TRAPTIMEOUT) { $alerttype = "traptimeoutalert"; $alert_type = "traptimeout"; } elsif ($args{"flags"} & $FL_TRAP) { $alerttype = "trapalert"; $alert_type = "trap"; } elsif ($args{"flags"} & $FL_TEST) { $alerttype = "testalert"; $alert_type = "test"; } else { $alerttype = "alert"; } # # log why we are triggering an alert # my $a = $alert; $a =~ s{^.*/([^/]+)$}{$1}; syslog ("alert", "calling $alerttype $a for" . " $args{group}/$args{service} ($alert,$args{args}) $summary") if (!($args{"flags"} & $FL_REDISTRIBUTE));; # We may block while writing to the alert script, so we'll fork first, allowing the # master process to move on. 
my $pid; if ($pid = fork()) { ## Master # Do Nothing } elsif (defined($pid)) { ## Child my $pid = open (ALERT, "|-"); if (!defined $pid) { syslog ('err', "could not fork: $!"); return undef; } # # grandchild, the actual alert # if ($pid == 0) { # # set env variables to pass to the alert # foreach my $v (keys %{$sref->{"ENV"}}) { $ENV{$v} = $sref->{"ENV"}->{$v}; } $ENV{"MON_LAST_SUMMARY"} = $sref->{"_last_summary"} if (defined $sref->{"_last_summary"}); $ENV{"MON_LAST_OUTPUT"} = $sref->{"_last_output"} if (defined $sref->{"_last_output"}); $ENV{"MON_LAST_FAILURE"} = $sref->{"_last_failure"} if (defined $sref->{"_last_failure"}); $ENV{"MON_FIRST_FAILURE"} = $sref->{"_first_failure"} if (defined $sref->{"_first_failure"}); $ENV{"MON_FIRST_SUCCESS"} = $sref->{"_first_success"} if (defined $sref->{"_first_success"}); $ENV{"MON_LAST_SUCCESS"} = $sref->{"_last_success"} if (defined $sref->{"_last_success"}); $ENV{"MON_DESCRIPTION"} = $sref->{"description"} if (defined $sref->{"description"}); $ENV{"MON_GROUP"} = $args{"group"} if (defined $args{"group"}); $ENV{"MON_SERVICE"} = $args{"service"} if (defined $args{"service"}); $ENV{"MON_RETVAL"} = $args{"retval"} if (defined $args{"retval"}); $ENV{"MON_OPSTATUS"} = $sref->{"_op_status"} if (defined $sref->{"_op_status"}); $ENV{"MON_ACK"} = $sref->{"_ack_comment"} if ($sref->{"_ack"} && $sref->{"_ack_comment"} ne ""); $ENV{"MON_ALERTTYPE"} = $alert_type; $ENV{"MON_STATEDIR"} = $CF{"STATEDIR"}; $ENV{"MON_LOGDIR"} = $CF{"LOGDIR"}; $ENV{"MON_CFBASEDIR"} = $CF{"CFBASEDIR"}; if( defined($sref->{"_intended"}) ) { $ENV{"MON_TRAP_INTENDED"} = $sref->{"_intended"}; } else { undef ($ENV{"MON_TRAP_INTENDED"}) if (defined($ENV{"MON_TRAP_INTENDED"})); } my $t; $t = "-u" if ($args{"flags"} & $FL_UPALERT); $t = "-a" if ($args{"flags"} & $FL_ACKALERT); $t = "-D" if ($args{"flags"} & $FL_DISABLEALERT); $t = "-T" if ($args{"flags"} & $FL_TRAP); $t = "-O" if ($args{"flags"} & $FL_TRAPTIMEOUT); my @execargs = ( $alert, "-s", "$args{service}", 
"-g", "$args{group}", "-h", "@groupargs", "-t", "$tmnow", ); if ($t) { push @execargs, $t; } if ($args{"args"} ne "") { push @execargs, quotewords('\s+',0,$args{"args"}); } if (!exec @execargs) { syslog ('err', "could not exec alert $alert: $!"); return undef; } exit; } # # this will block if the alert is sucking gas, which is why we forked above # print ALERT $args{"output"}; close (ALERT); exit; } # # test alerts and redistributions don't count # return (1) if ($args{"flags"} & ($FL_TEST | $FL_REDISTRIBUTE)); # # tally this alert # if (defined $args{"pref"}) { $pref->{"_last_alert"} = $tmnow; } $sref->{"_alert_count"}++; # # store this in the log # shift @last_alerts if (@last_alerts > $CF{"MAX_KEEP"}); my $alertline = "$alerttype $args{group} $args{service}" . " $tmnow $alert ($args{args}) $summary"; push @last_alerts, $alertline; # # append to alert history file # if ($CF{"HISTORICFILE"} ne "") { if (!open (HISTFILE, ">>$CF{HISTORICFILE}")) { syslog ('err', "Could not append alert history to $CF{HISTORICFILE}: $!"); } else { print HISTFILE $alertline, "\n"; close (HISTFILE); } } return 1; } # # recursively evaluate a dependency expression # substitutes "GROUP:SERVICE" with "1" or "0" if the service is pass/fail, resp. 
# # returns an anonymous hash reference # # { # status =>, # "D" recursion depth exceeded # # "O" everything is OK # # "E" eval error # depend =>, # 1 for success (no deps in a failure state) # # 0 if any deps failed # error =>, # the textual error associated with "D" or "E" status # } # sub depend { my ($depend, $depth, $deptype) = @_; debug (2, "checking DEP [$depend]\n"); if ($depth > $CF{"DEP_RECUR_LIMIT"}) { return { status => "D", depend => undef, error => "recursion too deep for ($depend)", }; } foreach my $depstr ($depend =~ /[a-zA-Z0-9_.-]+:[a-zA-Z0-9_.-]+/g) { my ($group ,$service) = split(':', $depstr); my $sref = \%{$watch{$group}->{$service}}; my $depval = undef; my $subdepend = ""; if (defined $sref->{"depend"} && $sref->{"dep_behavior"} eq $deptype) { $subdepend = $sref->{"depend"}; } elsif ($deptype eq 'a' && defined $sref->{"alertdepend"}) { $subdepend = $sref->{"alertdepend"}; } elsif ($deptype eq 'm' && defined $sref->{"monitordepend"}) { $subdepend = $sref->{"monitordepend"}; } # # disabled watches and services used to be counted as "passing" # now we'll use the actual values, to avoid having dependent services # alert when a broken service gets disabled # # if ((exists $watch_disabled{$group} && $watch_disabled{$group}) || (defined $sref->{"disable"} && $sref->{"disable"} == 1)) # { # $depval = 1; # # # root dependency found # # } # elsif ($subdepend eq "") if ($subdepend eq "") { debug (2, " found root dep $group,$service\n"); $depval = $SUCCESS{$sref->{"_op_status"}} && ($sref->{"_last_failure_time"} < (time - $sref->{"dep_memory"})); # # not a root dep, recurse # } else { # # do it recursively # my $dstatus = depend ($subdepend, $depth + 1, $deptype); debug (2, "recur depth $depth returned $dstatus->{status},$dstatus->{depend}\n"); # # a bad thing happened, bail out # if ($dstatus->{"status"} ne "O") { debug (2, "recursive dep failure for $group,$service (status=$dstatus->{status})\n"); return $dstatus; } $depval = $dstatus->{"depend"} && 
$SUCCESS{$sref->{"_op_status"}} && ($sref->{"_last_failure_time"} < (time - $sref->{"dep_memory"})); } my $v = int ($depval); debug (2, " ($group,$service) $depth depend=[$v][$depend]"); $depend =~ s/\b$depstr\b/$v/g; debug (2, " depend=[$depend]\n"); } debug (2, " before eval: [$depend]"); my $e = eval("$DEP_EVAL_SANDBOX $depend"); debug (2, " after eval: [$e]\n"); if ($@ eq "") { return { status => "O", depend => $e, }; } else { return { status => "E", depend => $e, error => $@, }; } } # # returns undef on error # 0 if dependency failure, sets _depend_status to 0 # 1 if dependencies are OK, sets _depend_status to 1 # sub dep_ok { my $sref = shift; my $deptype = shift; my $depend = ""; if (defined $sref->{"depend"} && $sref->{"dep_behavior"} eq $deptype) { $depend = $sref->{"depend"}; } elsif ($deptype eq 'a' && defined $sref->{"alertdepend"}) { $depend = $sref->{"alertdepend"}; } elsif ($deptype eq 'm' && defined $sref->{"monitordepend"}) { $depend = $sref->{"monitordepend"}; } return 1 unless ($depend ne ""); my $s = depend ($depend, 0, $deptype); if ($s->{"status"} eq "D") { debug (2, "dep recursion too deep\n"); return undef; } elsif ($s->{"status"} eq "E") { syslog ("notice", "eval error for dependency starting at $depend: ".$s->{error}); return undef; } elsif ($s->{"status"} eq "O" && !$s->{"depend"}) { $sref->{"_depend_status"} = 0; return 0; } $sref->{"_depend_status"} = 1; return 1; } # # returns undef on error # otherwise a reference to a list summaries from all # DIRECT dependencies currently failing sub dep_summary { my $sref = shift; my @sum; my @deps = (); if (defined $sref->{"depend"} && $sref->{"dep_behavior"} eq "hm") { @deps = ($sref->{"depend"} =~ /[a-zA-Z0-9_.-]+:[a-zA-Z0-9_.-]+/g); } elsif (defined $sref->{"hostdepend"}) { @deps = ($sref->{"hostdepend"} =~ /[a-zA-Z0-9_.-]+:[a-zA-Z0-9_.-]+/g); } return [] if (! 
@deps); foreach (@deps) { my ($group, $service) = split /:/; if (!(exists $watch{$group} && exists $watch{$group}->{$service})) { return undef; } if ($watch{$group}->{$service}{"_op_status"} == $STAT_FAIL) { push @sum, $watch{$group}->{$service}{"_last_summary"}; } elsif ($watch{$group}->{$service}{"_last_failure_time"} >= (time - $watch{$group}->{$service}{"dep_memory"})) { push @sum, $watch{$group}->{$service}{"_last_failure_summary"}; } } return \@sum; } # # convert a string to a hex-escaped string, returning # the escaped string. # # $str is the string to be escaped # if $inquotes is true, backslashes are doubled, making # the escaped string suitable to be enclosed in # single quotes and later passed to Text::ParseWords::quotewords. # For example, var='quoted value' # sub esc_str { my $str = shift; my $inquotes = shift; my $escstr = ""; return $escstr if (!defined $str); for (my $i = 0; $i < length ($str); $i++) { my $c = substr ($str, $i, 1); if (ord ($c) <= 32 || ord ($c) > 126 || $c eq "\"" || $c eq "\'") { $c = sprintf ("\\%02x", ord($c)); } elsif ($inquotes && $c eq "\\") { $c = "\\\\"; } $escstr .= $c; } $escstr; } # # convert a hex-escaped string into an unescaped string, # returning the unescaped string # sub un_esc_str { my $str = shift; $str =~ s{\\([0-9a-f]{2})}{chr(hex($1))}eg; $str; } sub syslog_die { my $msg = shift; syslog ("err", $msg); die "$msg\n"; } no warnings; # Redefining syslog sub syslog { eval { local $SIG{"__DIE__"}= sub { }; my @log = map { (my $s = $_) =~ s/\%//mg; $s } @_; Sys::Syslog::syslog(@log); } } use warnings; # # Have a "conversation" with a PAM authentication module. This fools the # PAM module into authenticating us non-interactively. 
# sub pam_conv_func { my @res; while ( @_ ) { my $code = shift; my $msg = shift; my $ans = ""; $ans = $PAM_username if ($code == Authen::PAM::PAM_PROMPT_ECHO_ON() ); $ans = $PAM_password if ($code == Authen::PAM::PAM_PROMPT_ECHO_OFF() ); push @res, Authen::PAM::PAM_SUCCESS(); push @res, $ans; } push @res, Authen::PAM::PAM_SUCCESS(); return @res; } sub write_dtlog { my ($sref, $group, $service) = @_; my $tmnow = time; $sref->{"_first_failure"} = $START_TIME if ($sref->{"_first_failure"} == 0); if (!open (DTLOG, ">>$CF{DTLOGFILE}")) { syslog ('err', "could not append to $CF{DTLOGFILE}: $!"); $CF{"DTLOGGING"} = 0; } else { $CF{"DTLOGGING"} = 1; print DTLOG ($tmnow, " $group", " $service", " ", 0 + $sref->{"_first_failure"}, " ", 0 + $tmnow - $sref->{"_first_failure"}, " ", 0 + $sref->{'interval'}, " $sref->{'_last_summary'}\n") or syslog ('err', "error writing to $CF{DTLOGFILE}: $!"); close(DTLOG); } } # Perl's "system" function blocks. We don't want the mon process to # ever block. So we fork then call system. Mon will handle the # child process cleanup elsewhere. sub mysystem { my @args = @_; my $pid; print STDERR "mysystem called: @args\n"; if ($pid = fork()) { ## parent return; } elsif (defined($pid)) { ## child system(@args); exit(0) } else { ## parent - fork failed print STDERR "You lose!\n"; } print STDERR "mysystem returning\n"; }; mon-1.2.0/state.d/0000755003616100016640000000000010640450347013554 5ustar trockijtrockijmon-1.2.0/state.d/README0000644003616100016640000000003510061516614014427 0ustar trockijtrockijThis is the state directory. mon-1.2.0/README0000644003616100016640000000675610637737255013124 0ustar trockijtrockij$Name: mon-1-2-0-release $ $Id: README,v 1.3.2.2 2007/06/25 13:10:05 trockij Exp $ INTRODUCTION ------------ "mon" is a tool for monitoring the availability of services, and sending alerts on prescribed events. 
Services are defined as anything tested by a "monitor" program, which can be something as simple as pinging a system, or as complex as analyzing the results of an application-level transaction. Alerts are actions such as sending emails, making submissions to ticketing systems, or triggering resource fail-over in a high-availability cluster.

The tool is extremely useful for system administrators, but its use is not limited to them. It was designed to be a general-purpose problem alerting system, separating the task of testing services for availability from the task of sending alerts when things fail. To achieve this, "mon" is implemented as a scheduler which runs the programs that do the testing, and triggers alert programs when those programs detect failure. Alerts can be controlled by a variety of "squelch" knobs, and complex dependencies can be configured to help suppress excessive alerts.

None of the service testing or reporting is handled directly by the mon server itself. These functions are handled by auxiliary programs. This model was chosen because it is very extensible, and does not require changing the code of the scheduler to add new tests or alert types. For example, an alphanumeric paging alert can be added simply by writing a new alert script and referencing it in the configuration file. Monitoring the temperature in a room can be done by adding a script that gathers data from a thermistor via a serial port. Often these monitoring scripts can just be wrappers for pre-existing software, such as "ping" or "ftp".

The mon scheduler can also serve network clients, allowing manipulation of run-time parameters, disabling and enabling of alerts and tests, listing of failure and alert history, and reporting of the current state of all monitors.

There are several clients which come with the distribution, found in cgi-bin/ and clients/:

-moncmd, which is a command-line client. moncmd supports the full functionality of the client/server interface.
-monshow, a dual command-line and CGI interface report generator for showing the operational status of the services monitored by the server. It displays nicely-formatted columnar output of the current operational status, groups, and the failure log. -skymon, which is a SkyTel 2-Way paging interface, allowing you to query the server's state and to manipulate it in the same manner as moncmd, right from your pager. Access is controlled via a simple password and an access control file. -mon.cgi, which is an interactive web interface, allowing you to not only view status information, but to change parameters in the server while it is running. AVAILABILITY ------------ The latest release of mon is available from kernel.org in /pub/software/admin/mon/. Please choose a mirror from: http://www.kernel.org/mirrors/ The WWW page is at http://www.kernel.org/software/mon/ CVS --- CVS trees of both the development trunk and stable release branches are available from anonymous CVS access on sourceforge.net. To check out the latest, see: http://mon.wiki.kernel.org/index.php/Development INSTALLATION ------------ See the "INSTALL" file for installation instructions. ---------- Jim Trocki Software Engineer Linux Systems Group Unisys Malvern, PA mon-1.2.0/mon.d/0000755003616100016640000000000010640450347013225 5ustar trockijtrockijmon-1.2.0/mon.d/msql-mysql.monitor0000755003616100016640000000740510620056565016770 0ustar trockijtrockij#!/usr/bin/perl # # $Id: msql-mysql.monitor,v 1.1.1.1.4.1 2007/05/08 11:22:29 trockij Exp $ # # arguments: # # [--mode [msql|mysql]] --username=username --password=password # --database=database --port=# # hostname # # a monitor to determine if a mSQL or MySQL database server is operational # # Rather than use tcp.monitor to ensure that your SQL server is responding # on the proper port, this attempts to connect to and list the databases # on a given database server. 
# # The single argument, --mode [msql|mysql] is inferred from the script name # if it is named mysql.monitor or msql.monitor. Thus, the following two are # equivalent: # # ln msql-mysql.monitor msql.monitor # ln msql-mysql.monitor mysql.monitor # msql.monitor hostname # mysql.monitor hostname # # and # # msql-mysql.monitor --mode msql hostname # msql-mysql.monitor --mode mysql hostname # # use the syntax that you feel more comfortable with. # # This monitor requires the perl5 DBI, DBD::mSQL and DBD::mysql modules, # available from CPAN (http://www.cpan.org) # # Copyright (C) 1998, ACC TelEnterprises # Written by James FitzGibbon # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use DBI; use Getopt::Long; use POSIX ':signal_h'; my @details=(); my @failures=(); my $mask = POSIX::SigSet->new( SIGALRM ); my $action = POSIX::SigAction->new( sub { die "connect timeout" }, # the handler code ref $mask, # not using (perl 5.8.2 and later) 'safe' switch or sa_flags ); GetOptions( \%options, "mode=s", "port=i", "username=s", "password=s", "database=s", "timeout=i" ); # uncomment these two lines and provide suitable information if you don't # want to pass sensitive information on the command line #$options{username} ||= "username"; #$options{password} ||= "password"; $options{timeout} = 60 if ! 
$options{timeout}; if( $0 =~ m/\/msql\.monitor$/ || $options{mode} =~ m/msql/i ) { $mode = "mSQL"; $options{port} = 1114 if ! $options{port}; } elsif( $0 =~ m/\/mysql\.monitor/ || $options{mode} =~ m/mysql/i) { $mode = "mysql"; $options{port} = 3306 if ! $options{port}; } else { print "invalid mode $options{mode}!\n"; exit 1; } for $host( @ARGV ) { my $dbh = 0; my $oldaction = POSIX::SigAction->new(); sigaction( 'ALRM', $action, $oldaction ); eval { alarm $options{timeout}; $dbh = DBI->connect( "DBI:$mode:$options{database}:$host:$options{port}", $options{username}, $options{password}, { PrintError => 0 } ); alarm 0; }; alarm 0; sigaction( 'ALRM', $oldaction ); if ($@) { push( @failures, $host); push( @details, "$host: Could not connect to $mode server on $options{port}: $@\n"); next; } elsif( ! $dbh ) { push( @failures, $host); push( @details, "$host: Could not connect to $mode server on $options{port}: " . $DBI::errstr . "\n"); next; } @tables = $dbh->tables(); if( $#tables < 0 ) { push( @failures, $host); push( @details, "$host: No tables found for database $options{database}\n"); } $dbh->disconnect(); } if (@failures) { print join (" ", sort @failures), "\n"; print sort @details if (scalar @details > 0); exit 1; } else { exit 0; } mon-1.2.0/mon.d/radius.monitor0000755003616100016640000001004410146140377016127 0ustar trockijtrockij#!/usr/bin/perl # # Monitor radius processes # # Based upon radius.monitor by Brian Moore, posted to the mon mailing list # # Arguments are: # # --username=user --password=pass --secret=secret # [--port=#] [--attempts=#] [--dictionary=/path/to/dictionary] # hostname [hostname ...] # # Arguments are parsed by Getopt::Long and can be abbreviated to any # unique prefix (e.g. --pass is the same as --password; -p alone is # ambiguous between --port and --password). # # This monitor performs a real RADIUS check, attempting to be as much like a # terminal server as possible. This requires that you include a username, # password, and secret in your mon.cf file.
Depending on your unix # implementation, this may allow unscrupulous users to view the command line # arguments, including your RADIUS secret. If you prefer, you can uncomment # three lines below (see comments) to provide defaults for username, # password, and secret. # # This monitor attempts to check a username and password up to n times # (defaults to 9, but can be set via the --attempts=# command line switch). # It only registers a failure to mon after failing to receive a satisfactory # response n times. It returns an immediate failure to mon if it receives a # failed authentication. For this reason, you will need to create a dummy # user on your RADIUS server for authentication testing. # # # Copyright (C) 1998, ACC TelEnterprises # Written by James FitzGibbon # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use Authen::Radius; use Sys::Hostname; use Getopt::Long; GetOptions( \%options, "port=i", "secret=s", "username=s", "password=s", "attempts=i", "dictionary=s" ); $options{"port"} ||= 1645; $options{"attempts"} ||= 9; # # uncomment these three lines and replace with appropriate info if you'd prefer # not to pass sensitive information on the command line # $options{"username"} = "username"; $options{"password"} = "password"; $options{"secret"} = "radius-secret"; $options{"dictionary"} = "/etc/radius/dictionary"; Authen::Radius->load_dictionary( $options{dictionary} ); undef $diag; @failed_hosts = (); foreach $host (@ARGV) { $auth = new Authen::Radius(Host => "$host:$options{port}", Secret => $options{secret} ); $auth->add_attributes( { Name => "User-Name", Value => $options{username} }, { Name => "Password", Value => $options{password} }, { Name => "NAS-IP-Address", Value => join( ".", unpack ( "C4", (gethostbyname( hostname() ))[4] ) ) }, ); $done = 0; $attempts = 0; while( ! $done ) { $auth->send_packet( ACCESS_REQUEST ); $err = $auth->get_error(); if( $err ne "ENONE" ) { $attempts++; if( $attempts > $options{attempts} ) { push @failed_hosts, $host; push( @failures, "$host failed for user $options{username}: " . $auth->strerror( $err ) ); $done = 1; } next; } $resptype = $auth->recv_packet(); $err = $auth->get_error(); if( $err ne "ENONE" ) { $attempts++; if( $attempts > $options{attempts} ) { push @failed_hosts, $host; push( @failures, "$host failed for user $options{username}: " . 
$auth->strerror( $err ) ); $done = 1; } } elsif( $resptype == ACCESS_REJECT ) { push @failed_hosts, $host; push( @failures, "$host returned bad auth for user $options{username}" ); $done = 1; } else { $done = 1; } } } if (@failed_hosts) { print "@failed_hosts\n\n"; print join (", ", @failures), "\n"; exit 1; }; exit 0; mon-1.2.0/mon.d/telnet.monitor0000755003616100016640000000373610230411543016133 0ustar trockijtrockij#!/usr/bin/perl # # Use Net::Telnet to connect to a list of hosts. # # -p port connect to 'port' (defaults to 23) # -t secs set timeout to 'secs' (defaults to 10) # -l '/regex/' wait for /regex/, (defaults to "/ogin:/i") # # Arguments are "host [host...]" # # Jim Trocki, trockij@arctic.org # # $Id: telnet.monitor,v 1.2 2005/04/17 07:42:27 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use Net::Telnet; use Getopt::Std; getopts ("l:p:t:"); $TIMEOUT = $opt_t || 10; $PORT = $opt_p || 23; $LOGIN = $opt_l || "/ogin:/i"; @failures = (); foreach my $host (@ARGV) { my $t = new Net::Telnet ( Timeout => $TIMEOUT, Port => $PORT, ); if (!defined $t) { push @failures, [$host, "could not create new Net::Telnet object"]; next; } $t->errmode ("return"); if (!defined $t->open ($host)) { push @failures, [$host, $t->errmsg]; next; } my $ok = $t->waitfor ( Match => $LOGIN, Timeout => $TIMEOUT, ); if (!defined $ok) { push @failures, [$host, "did not get prompt: ". $t->errmsg]; } $t->close; } if (@failures == 0) { exit 0; } for (@failures) { push @l, $_->[0]; } print join (" ", sort @l), "\n"; for (@failures) { print "$_->[0]: $_->[1]\n"; } exit 1; mon-1.2.0/mon.d/ldap.monitor0000755003616100016640000001126510061516614015563 0ustar trockijtrockij#!/usr/bin/perl # # This script will search an LDAP server for objects that match the -filter # option, starting at the DN given by the -basedn option. Each DN found must # contain the attribute given by the -attribute option and the attribute's # value must match the value given by the -value option. Servers are given on # the command line. At least one server must be specified. # This script uses the Net::LDAP module, which uses some LDAP libraries like those # from UMich, Netscape, or ISODE. # # Ported to LDAP (from LDAPapi) by Thomas Quinot, # 1999-09-20. # Copyright (C) 1998, David Eckelkamp # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version.
# # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # # # $Id: ldap.monitor,v 1.1.1.1 2004/06/09 05:18:04 trockij Exp $ # use Net::LDAP; use Getopt::Long; # Here are the default values for the things you can specify via options $LDAPPort = 389; $BaseDN = "o=Your Org, c=US"; $Filter = "cn=Directory Admin"; $Attribute = "objectClass"; $Value = "YourValue"; $verbose = 0; @errs = (); %OptVars = ("port" => \$LDAPPort, "basedn" => \$BaseDN, "filter" => \$Filter, "attribute" => \$Attribute, "value" => \$Value, "verbose" => \$verbose); if (!GetOptions(\%OptVars, "port=i", "basedn=s", "filter=s", "attribute=s", "value=s", "verbose")) { print "Problems with Options, sorry.\n"; exit 1; } # There has to be at least one argument left, the ldap server to query. if ($#ARGV < 0) { print "$0: Insufficient arguments. There must be at least 1 server to query\n"; exit 1; } # Loop through all the server given on the command line. $ErrCnt = 0; foreach $LDAPHost (@ARGV) { # Open the connection to the server and do a simple, anonymous bind unless ($ldap = Net::LDAP->new($LDAPHost, port => $LDAPPort)) { push(@FailedHosts, "$LDAPHost:$LDAPPort"); push(@errs, "ldap_init Failed: host=$LDAPHost:$LDAPPort: $!"); $ErrCnt++; next; } unless ($ldap->bind) { $ErrCnt++; push(@FailedHosts, "$LDAPHost:$LDAPPort"); #ldap_perror($ldap, "ldap bind failed: host=$LDAPHost:$LDAPPort\n"); push(@errs, "ldap bind failed: host=$LDAPHost:$LDAPPort"); next; } unless ($mesg = $ldap->search(base => $BaseDN, filter => $Filter)) { my($errnd, $extramsg, $err); push(@errs, "$LDAPHost " . 
$mesg->error); $ldap->unbind; push(@FailedHosts, "$LDAPHost:$LDAPPort"); $ErrCnt++; next; } $nentries = 0; foreach $entry ($mesg->entries) { my $dn = $entry->dn; $nentries++; foreach $attr ($entry->attributes) { $record{$dn}->{$attr} = [$entry->get ($attr)]; } } $ldap->unbind; if ($nentries == 0) { push(@errs, "$LDAPHost returned no entries"); push(@FailedHosts, "$LDAPHost:$LDAPPort"); $ErrCnt++; next; } # Analyze results. # Step 1 is to loop through all DNs returned from the search. print "Looking for $Attribute=$Value\n" if $verbose; foreach $dn (sort keys %record) { print "checking object $dn\n" if $verbose; # Loop through the attributes for this DN $attrFound = 0; $goodVal = 0; foreach $attr (keys %{$record{$dn}}) { print " checking attr=$attr\n" if $verbose; next unless ($attr eq $Attribute); $attrFound++; print " found correct attribute\n" if $verbose; # Each value could be/is an array so search the array foreach $val (@{$record{$dn}{$attr}}) { print " checking val = $val\n" if $verbose; next unless ($val eq $Value); $goodVal++; print " found correct value\n" if $verbose; last; } last if ($goodVal); } if (!$attrFound || !$goodVal) { print "For object $dn:\n"; } if (!$attrFound) { $ErrCnt++; push(@errs,"Could not find Attribute \"$Attribute\" for DN=$dn"); push(@FailedHosts, "$LDAPHost:$LDAPPort"); } elsif (!$goodVal) { $ErrCnt++; push(@errs, "Value \"$Value\" not found for Attribute \"$Attribute\""); push(@FailedHosts, "$LDAPHost:$LDAPPort"); } } } if ($ErrCnt > 0) { print join (" ", sort @FailedHosts), "\n"; print join("\n", @errs), "\n"; } exit $ErrCnt; mon-1.2.0/mon.d/trace.monitor0000755003616100016640000003103510146140377015741 0ustar trockijtrockij#!/usr/bin/perl # # trace.monitor # # trace the route to an address, record previous routes, # compare newest path to last path and report divergences # while considering load-balanced hops, and log paths # historically. 
# # for use with mon # # use "trace.monitor -h" for help # # Jim Trocki # # $Id: trace.monitor,v 1.2 2004/11/15 14:45:19 vitroth Exp $ # # Copyright (C) 2001-2003, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use strict; use Getopt::Std; use Data::Dumper; sub traceroute; sub print_path; sub load_last; sub path_to_string; sub debug; sub path_to_hash; sub test; sub append_log; sub print_hop; sub usage; sub process_hosts; my %opt; getopts ('hLs:l:d:t:m:', \%opt); if ($opt{"h"}) { usage; exit; } die "must supply host\n" if (@ARGV == 0); my $TIMEOUT = $opt{"t"} || 30; my $DEBUG = $opt{"d"} || 0; my $METHOD = "m"; if ($opt{"m"} ne "" && $opt{"m"} !~ /^[mn]$/) { die "method must be one of 'n' or 'm'\n"; } if ($opt{"m"}) { $METHOD = $opt{"m"}; } #my $LOGDIR = "/var/lib/mon/log.d"; #my $STATEDIR = "/var/lib/mon/state.d"; my $LOGDIR = "."; my $STATEDIR = "."; if (-d $opt{"l"}) { $LOGDIR = $opt{"l"}; } elsif (-d $ENV{"MON_LOGDIR"}) { $LOGDIR = $ENV{"MON_LOGDIR"}; } if (-d $opt{"s"}) { $STATEDIR = $opt{"s"}; } elsif (-d $ENV{"MON_STATEDIR"}) { $STATEDIR = $ENV{"MON_STATEDIR"}; } # # do the testing on each host # my ($failures, $failure_detail, $successes, $success_detail) = process_hosts (@ARGV); # # all the testing/logging is done, # now report the successes and failures # my $num_failures = @{$failures}; if 
($num_failures) { print "@{$failures}\n"; } else { print "\n"; } for (my $i = 0; $i < @{$failures}; $i++) { print "$failures->[$i]\n--------------------\n"; print "$failure_detail->[$i]\n"; print "\n"; } if ($num_failures) { print "\n"; } for (my $i = 0; $i < @{$successes}; $i++) { print "$successes->[$i]\n---------------------------\n"; print "$success_detail->[$i]\n"; print "\n"; } exit $num_failures; # # print path # # if second arg is true, return the string # instead of printing it # sub print_path { my ($path, $str) = @_; my $string = ""; for (my $i= 0; $i < @{$path->{"path"}}; $i++) { my $hop = $path->{"path"}->[$i]; my @h = (); foreach my $list (@{$hop}) { push @h, sprintf ('%-15s %-10s', $list->[0], $list->[1]); } if ($str) { $string .= sprintf ("%02d %s\n", $i, "@h"); } else { printf ("%02d %s\n", $i, "@h"); } } $string; } sub print_hop { my ($path, $hopnum) = @_; my $hop = $path->{"path"}->[$hopnum]; my @h = (); foreach my $list (@{$hop}) { push @h, sprintf ('%-15s %-10s', $list->[0], $list->[1]); } sprintf ("%02d %s\n", $hopnum, "@h"); } sub save_last { my ($f, $p) = @_; if (!open (OUT, ">$f")) { return "$!"; } if (!$p->{"time"}) { print OUT time . " "; } else { print OUT "$p->{time} "; } print OUT "$p->{to} "; print OUT path_to_string ($p), "\n"; close (OUT); ""; } sub append_log { my ($f, $p) = @_; if (!open (OUT, ">>$f")) { return "$!"; } if (!$p->{"time"}) { print OUT time . 
" "; } else { print OUT "$p->{time} "; } print OUT "$p->{to} "; print OUT path_to_string ($p), "\n"; close (OUT); ""; } sub load_last { my ($f) = @_; if (!open (IN, $f)) { return "$!"; } my ($time, $path, $to); while (<IN>) { next if (/^\s*#/ || /^\s*$/); chomp; next if (!/^\d+\s/); ($time, $to, $path) = split (/\s+/, $_); last; } close (IN); if ($path eq "") { return ("no path found in file"); } my %p; $p{"time"} = $time; $p{"to"} = $to; $p{"path"} = string_to_path ($path); $p{"hpath"} = path_to_hash ($p{"path"}); ("", { %p }); } sub path_to_string { my ($path) = @_; my @formatted_path; foreach my $hop (@{$path->{"path"}}) { my @tries = (); foreach my $hop_try (@{$hop}) { push @tries, "$hop_try->[0]/$hop_try->[1]"; } push @formatted_path, join (",", @tries); } join ("-", @formatted_path); } sub string_to_path { my ($string) = @_; my @path; foreach my $hop (split (/-/, $string)) { my @tries = (); foreach my $try (split (/,/, $hop)) { push @tries, [split (/\//, $try)]; } push @path, [@tries]; } [@path]; } sub save_path { my ($file, $path) = @_; } # # returns -1 if paths do not diverge, # or the index into @{$path1} where they do. # sub compare_paths { my ($path1, $path2, $behavior) = @_; # # $behavior is one of: # "n" normal # "m" mux mode, treat all routes on the same hop as # equals # my $i = 0; my $diverge = -1; while ($i < @{$path1->{"path"}} && $diverge == -1) { debug ("comparing hop $i"); # # path1 is longer than path2 # if ($i >= @{$path2->{"path"}}) { debug ("path1 longer than path2"); $diverge = $i; last; } else { # # MUX method # # no divergence if at least one of the routers for this # hop matches with the last sample. this is an attempt # to consider load-balanced hops.
# # if ($behavior eq "m") { debug ("comparing using mux"); my $found = 0; foreach my $ip (keys %{$path1->{"hpath"}->[$i]}) { if ($path2->{"hpath"}->[$i]->{$ip} > 0) { debug ("found matching route at $i"); $found = 1; last; } } if (!$found) { debug ("did not find matching router at pos $i"); $diverge = $i; last; } } # # DEFAULT method # # default is to compare all routers for each hop # between path samples, and if they differ at all, # then consider it a divergence. # else { debug ("comparing using default"); # # hop tries differ # if (@{$path1->{"path"}->[$i]} != @{$path2->{"path"}->[$i]}) { $diverge = $i; last; } else { for (my $j = 0; $j < @{$path1->{"path"}->[$i]}; $j++) { if ($path1->{"path"}->[$i]->[$j]->[0] ne $path2->{"path"}->[$i]->[$j]->[0]) { debug ("found divergence index $j"); $diverge = $i; last; } else { debug ("no divergence index $j"); } } } } } $i++; } if ($diverge != -1 && @{$path1->{"path"}} != @{$path2->{"path"}}) { debug ("path lengths differ"); return $#{$path1->{"path"}}; } return $diverge; } # # traceroute to a host and return a data structure of the hops # and timings # # returns the list: # ( # "error msg, empty string if no error", # { # "path" => # [ # [["hop1 try1", ms], ["hop1 try2", ms], ["hop1 try3", ms]], # [["hop2 try2", ms], ...], # ... 
# ], # "hpath" => # [ # {"ipaddr" => count, ...}, # ], # } # ) # sub traceroute { my ($host, $timeout, $traceroute_args) = @_; my $pid; if (!($pid = open (IN, "traceroute -n $traceroute_args $host 2>/dev/null |"))) { return ($!, []); } my $hop = 0; my @hops = (); my @hash_hops = (); if ($timeout) { $SIG{"ALRM"} = sub {die "timeout" }; } eval { if ($timeout) { alarm ($timeout); } while (<IN>) { if (!/^\s*\d+/) { debug ("skipping $_"); next; } my $line = $_; chomp $line; $line =~ s/^\s*//; debug ($line, 5); my @l = split (/\s+/, $line); $hop = shift @l; my @hoplist = (); my %hophash = (); my $i = 0; my $router = ""; while ($i < @l) { if ($l[$i] =~ /^\d+\.\d+\.\d+\.\d+$/) { $router = $l[$i]; $i++; } # # timeout # elsif ($l[$i] eq "*") { push @hoplist, ["*", 0]; $hophash{"*"}++; $i++; next; } # # a real router reply # if ($router ne "") { if ($l[$i+1] ne "ms") { close (IN); return ("expecting ms [$line]", []); } my $time = $l[$i]; $i += 2; push @hoplist, [$router, $time]; $hophash{$router}++; # # skip over failures # if ($l[$i] =~ /^!/) { $i++; } } else { close (IN); return ("don't know [$line]", []); } } push @hops, [@hoplist]; push @hash_hops, {%hophash}; } if ($timeout) { alarm (0); } }; close (IN); if ($@ && $timeout && $@ =~ /timeout/) { kill 9, $pid; push @hops, [["timeout", $timeout]]; return ("timeout", [@hops]); } my $t = time; ("", { "path" => [@hops], "hpath" => [@hash_hops], "time" => $t, "to" => $host, }); } sub debug { my ($msg, $level) = @_; if ($DEBUG && $level <= $DEBUG) { print STDERR "$msg\n"; } } sub path_to_hash { my $path = shift; my @new_path = (); for (my $i = 0; $i < @{$path}; $i++) { my $hop = $path->[$i]; for (my $j = 0; $j < @{$hop}; $j++) { $new_path[$i]->{$hop->[$j]->[0]}++; } } [@new_path]; } sub test { my ($msg, $path1, $path2) = @_; $path1->{"hpath"} = path_to_hash ($path1->{"path"}); $path2->{"hpath"} = path_to_hash ($path2->{"path"}); print "BEGIN: $msg\n"; my $r = compare_paths ($path1, $path2, "m"); if ($r == -1) { print "END: $msg no
divergence\n"; } else { print "END: $msg divergence at $r\n"; } } sub usage { print <{"path"}}; $i++) { my $l_hop = print_hop ($last, $i); my $n_hop = print_hop ($p, $i); my $s = " "; if ($i == $diverge) { $s = "* "; } $l_hop = "$s$l_hop"; $n_hop = "$s$n_hop"; $old_pathstr .= "$l_hop"; $new_pathstr .= "$n_hop"; } push @failure_detail, "divergence at hop $diverge\n" . "old: " . print_hop ($last, $diverge) . "new: " . print_hop ($p, $diverge) . "\n" . "was: " . localtime ($last->{"time"}) . "\n$old_pathstr\n" . "is: " . localtime ($p->{"time"}) . "\n$new_pathstr\n"; } } if ($diverge == -1 || !defined $diverge) { push @successes, $host; push @success_detail, "at " . localtime ($p->{"time"}) . "\n" . print_path ($p, 1) . "\n"; } } ([@failures], [@failure_detail], [@successes], [@success_detail]); } mon-1.2.0/mon.d/cpqhealth.monitor0000755003616100016640000002202510061516615016611 0ustar trockijtrockij#!/usr/bin/perl # # "mon" monitor to detect thermal/fan/psu failures # for Compaq Proliant machines which run the "Compaq Insight Agent" # # arguments are "[-c community] [-f] [-p] [-t] host [host...]" # # -f do not query the fan table # -p do not query the PSU table # -t do not query the temperature table # # Jim Trocki # # $Id: cpqhealth.monitor,v 1.1.1.1 2004/06/09 05:18:05 trockij Exp $ # # Copyright (C) 2000, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use SNMP; use Getopt::Std; use strict; sub get_status; sub get_table; sub get_vars; sub fancy_psu_table; $ENV{"MIBS"} = "CPQHLTH-MIB"; my %opt; getopts ('c:pft', \%opt); my $COMM = $opt{"c"} || "public"; my @failures = (); my $detail = ""; my $detail_head = ""; my $fan_detail = ""; my $psu_detail = ""; foreach my $host (@ARGV) { my %status = get_status ($host, $COMM); if ($status{"error"} ne "") { push (@failures, $host); $detail .= "could not retrieve status from $host: $status{error}\n\n"; next; } elsif ($status{"failure"}) { push (@failures, $host); } if ($status{"overall_status"}->{"redundant_fans"} ne "ok") { $detail_head .= "$host has fan problem\n"; } if ($status{"overall_status"}->{"redundant_psus"} ne "ok") { $detail_head .= "$host has PSU problem\n"; } if ($status{"overall_status"}->{"temperature"} ne "ok") { $detail_head .= "$host has temperature problem\n"; } # # Fan # $fan_detail .= fancy_fan_table ($host, $status{"flt_tol_fan_table"}) . "\n"; # # PSU # $psu_detail .= fancy_psu_table ($host, $status{"psu_status_table"}) . "\n"; } # # output returned to mon # if (@failures != 0) { print join (" ", sort @failures), "\n"; } else { print "\n"; } print "$detail_head"; if (!$opt{"f"}) { print < "error name, empty string means no error", # ) # sub get_status { my ($host, $comm) = @_; my $s; if (!defined ($s = new SNMP::Session ( "DestHost" => $host, "Community" => $comm, "UseEnums" => 1, "Version" => 2, ))) { return ("error" => "cannot create session"); } my $error; my $failure_detected = 0; my $psu_status_table; my $flt_tol_fan_table; my $temp_sensor_table; my $overall_status; # # is this really a compaq box w/the insight agent? 
# my $sys_oid = $s->get (["sysObjectID", 0]); return ("error" => $s->{"ErrorStr"}) if ($s->{"ErrorStr"} ne ""); return ("error" => "not Compaq device") if ($sys_oid ne ".1.3.6.1.4.1.311.1.1.3.1.2"); if (0) { # # get overall status # # from what i can tell, this is totally useless, and # it doesn't tell you about the things you actually care about, # like whether the PSUs are running in a redundant configuration # or not, or PSUs which used to be in order but are now # out of order. # ($error, $overall_status) = get_vars ($s, ["cpqHeThermalCondition", 0], ["cpqHeThermalTempStatus", 0], ["cpqHeThermalSystemFanStatus", 0], ["cpqHeThermalCpuFanStatus", 0], ["cpqHeFltTolPwrSupplyCondition", 0], ); return ("error" => "$error while retrieving overall status") if ($error ne ""); } $overall_status = { "redundant_fans" => "ok", "temperature" => "ok", "redundant_psus" => "ok", }; if (!$opt{"p"}) { # # PSU table # ($error, $psu_status_table) = get_table ($s, ["cpqHeFltTolPowerSupplyChassis"], ["cpqHeFltTolPowerSupplyBay"], ["cpqHeFltTolPowerSupplyPresent"], ["cpqHeFltTolPowerSupplyCondition"], ["cpqHeFltTolPowerSupplyStatus"], ["cpqHeFltTolPowerSupplyMainVoltage"], ["cpqHeFltTolPowerSupplyCapacityUsed"], ["cpqHeFltTolPowerSupplyCapacityMaximum"], ["cpqHeFltTolPowerSupplyRedundant"], ["cpqHeFltTolPowerSupplyModel"], ["cpqHeFltTolPowerSupplySerialNumber"], ["cpqHeFltTolPowerSupplyAutoRev"], ); return ("error" => $error) if ($error ne ""); foreach my $r (@{$psu_status_table}) { next if ($r->{"cpqHeFltTolPowerSupplyPresent"} eq "absent"); if ($r->{"cpqHeFltTolPowerSupplyCondition"} ne "ok" || $r->{"cpqHeFltTolPowerSupplyStatus"} ne "noError" || $r->{"cpqHeFltTolPowerSupplyRedundant"} eq "notRedundant") { $failure_detected = 1; $overall_status->{"redundant_psus"} = "fail"; last; } } } if (!$opt{"f"}) { # # Fan chassis table # ($error, $flt_tol_fan_table) = get_table ($s, ["cpqHeFltTolFanChassis"], ["cpqHeFltTolFanIndex"], ["cpqHeFltTolFanLocale"], ["cpqHeFltTolFanPresent"], 
["cpqHeFltTolFanType"], ["cpqHeFltTolFanSpeed"], ["cpqHeFltTolFanRedundant"], ["cpqHeFltTolFanRedundantPartner"], ["cpqHeFltTolFanCondition"], ); return ("error" => $error) if ($error ne ""); foreach my $r (@{$flt_tol_fan_table}) { next if ($r->{"cpqHeFltTolFanPresent"} ne "present"); if ($r->{"cpqHeFltTolFanRedundant"} ne "redundant" || $r->{"cpqHeFltTolFanCondition"} ne "ok") { $failure_detected = 1; $overall_status->{"redundant_fans"} = "fail"; last; } } } if (!$opt{"t"}) { # # chassis temp table # ($error, $temp_sensor_table) = get_table ($s, ["cpqHeTemperatureChassis"], ["cpqHeTemperatureIndex"], ["cpqHeTemperatureLocale"], ["cpqHeTemperatureCelsius"], ["cpqHeTemperatureThreshold"], ["cpqHeTemperatureCondition"], ); return ("error" => $error) if ($error ne ""); foreach my $r (@{$temp_sensor_table}) { # # cpqHeTemperatureCelsius == -1 if not present # (as far as i can tell) # next if ($r->{"cpqHeTemperatureCelsius"} == -1); if ($r->{"cpqHeTemperatureCondition"} ne "ok") { $failure_detected = 1; $overall_status->{"temperature"} = "fail"; last; } } } ( "error" => "", "failure" => $failure_detected, "overall_status" => $overall_status, "temp_sensor_table" => $temp_sensor_table, "flt_tol_fan_table" => $flt_tol_fan_table, "psu_status_table" => $psu_status_table, ); } sub get_table { my ($s, @tbl) = @_; my $table = []; my $tblid = $tbl[0]->[0]; my $i = 0; my $row = new SNMP::VarList (@tbl); return ("MIB problem") if (!defined $row); while (defined ($s->getnext ($row))) { last if ($s->{"ErrorStr"} ne ""); my $r = $row->[0]->[0]; last if ($r ne $tblid); foreach my $col (@{$row}) { $table->[$i]->{"iid"} = $col->[1]; $table->[$i]->{$col->[0]} = $col->[2]; } $i++; } return ($s->{"ErrorStr"}) if ($s->{"ErrorStr"} ne ""); ( "", $table, ); } sub get_vars { my ($s, @vars) = @_; my $r = new SNMP::VarList ( @vars ); return ("MIB problem") if (!defined $r); return ($s->ErrorStr) if (!defined ($s->get ($r))); my $v; foreach my $element (@{$r}) { $v->{$element->[0]} = 
$element->[2]; } ("", $v); } sub fancy_psu_table { my ($host, $psu_status_table) = @_; my $detail; foreach my $r (@{$psu_status_table}) { $detail .= sprintf ("%-12s psu %-3s %-10s %-11s %-12s %-11s %-4s %s\n", $host, $r->{"iid"}, $r->{"cpqHeFltTolPowerSupplyPresent"}, $r->{"cpqHeFltTolPowerSupplyStatus"}, $r->{"cpqHeFltTolPowerSupplyRedundant"}, $r->{"cpqHeFltTolPowerSupplyCondition"}, $r->{"cpqHeFltTolPowerSupplyBay"}, $r->{"cpqHeFltTolPowerSupplyChassis"}, ); } $detail; } sub fancy_fan_table { my ($host, $fan_table) = @_; my $detail; foreach my $r (@{$fan_table}) { $detail .= sprintf ("%-12s fan %-3s %-4s %-9s %-10s %-7s %-12s %-7s %-7s %s\n", $host, $r->{"iid"}, $r->{"cpqHeFltTolFanChassis"}, $r->{"cpqHeFltTolFanLocale"}, $r->{"cpqHeFltTolFanPresent"}, $r->{"cpqHeFltTolFanCondition"}, $r->{"cpqHeFltTolFanRedundant"}, $r->{"cpqHeFltTolFanRedundantPartner"}, $r->{"cpqHeFltTolFanSpeed"}, $r->{"cpqHeFltTolFanType"}, ); } $detail; } mon-1.2.0/mon.d/freespace.monitor0000755003616100016640000000431310230411543016565 0ustar trockijtrockij#!/usr/bin/perl # # Monitor disk space usage # # Arguments are: # # path:kBfree [path:kBfree...] # or # path:free% [path:free%...] # # This script will exit with value 1 if "path" has less than # "kBfree" kilobytes, or less than "free" percent available. # # The first output line is a list of the paths which failed, and # how much space is free, in gigabytes. # # If you are testing NFS-mounted directories, you should probably # mount them with the ro,intr,soft options, so that operations # on those mount points don't block forever if the server is # down. I may eventually change this code to use an alarm(2) # to interrupt the stat and statfs system calls. # # This requires Fabien Tassin's Filesys::DiskSpace module, available from # your friendly neighborhood CPAN mirror.
See http://www.perl.com/perl/ # # Jim Trocki, trockij@arctic.org # # $Id: freespace.monitor,v 1.2 2005/04/17 07:42:27 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use Filesys::DiskSpace; foreach (@ARGV) { ($path, $minavail) = split (/:/, $_, 2); ($fs_type, $fs_desc, $used, $avail, $fused, $favail) = df ($path); if (!defined ($used)) { push (@failures, "statfs error for $path: $!"); next; } if ($minavail =~ /(\d+(\.\d+)?)%/o) { $minavail = int(($used + $avail) * $1 / 100); } if ($avail < $minavail) { push (@failures, sprintf ("%1.1fGB free on %s", $avail / 1024 / 1024, $path)); } } if (@failures) { print join (", ", @failures), "\n"; exit 1; } exit 0; mon-1.2.0/mon.d/up_rtt.monitor0000755003616100016640000001706110146140377016163 0ustar trockijtrockij#!/usr/bin/perl # # mon monitor to check for circuit up and measure RTT # # # Jon Meek - 09-May-1998 # # Requires Perl Modules "Time::HiRes" and "Statistics::Descriptive" # # # # $Id: up_rtt.monitor,v 1.2 2004/11/15 14:45:19 vitroth Exp $ # # Copyright (C) 1998, Jon Meek, meekj@ieee.org # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later 
version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # # mon config file watch entry: #watch frame-relay # service up_rtt # interval 5m # monitor up_rtt.monitor -T 2 -l /my/log_directory/logs/wan/fr_rtt_YYYYMM.log # period wd {Sun-Sat} # alert mail.alert meekj # alertevery 1h # A new log file will be created each month in the above example the files will # be of the form fr_rtt_199810.log # The YYYYMM format is the only date string possible in the current version use Getopt::Std; use Socket; use IO::Socket; use Time::HiRes qw( gettimeofday tv_interval ); use Statistics::Descriptive; getopts ("drt:T:l:U:"); # -d Debug mode # -p # -t n Number of seconds to wait for a packet to be echoed back # -T n Alarm if the minimum measured RTT is greater than n seconds # -l file Log file name with optional YYYYMM part that will be transformed to current month # -U num Number of UDP packets to send # -r Log individual raw RTTs $TimeOut = $opt_t || 10; # Timeout in seconds $NUM_UDP_TRYS = $opt_U || 5; # Number of UDP packets to send # Solaris MSG_WAITALL 0x40 /* Wait for complete recv or error */ #linux/socket.h:#define MSG_WAITALL 0x100 /* Wait for a full request */ # $RecvRet = recv($S, $Echo, $DataLength, 64); # Solaris & older versions of Linux #$RecvFlags = 0; # May work on all systems due to small packets used here #$RecvFlags = 64; # Hardcode for Solaris #$RecvFlags = 256; # Hardcode for Linux 2.2.x $RecvFlags = &MSG_WAITALL; # Requires that h2ph was run on the appropriate include directory print "MSG_WAITALL: $RecvFlags\n" if $opt_d; @Failures = (); @Hosts = @ARGV; # Host 
# names are left on the command line after Getopt

$GoodPackets = 0;

foreach $TargetHost (@Hosts) {

    undef @RawRTT;
    $stat = Statistics::Descriptive::Full->new();
    $TimeOfDay = time;

    $ReturnedPackets = &UDPcheck($TargetHost); # Try UDP echo first

    if ($ReturnedPackets == 0) { # If the UDP ping failed, then try TCP
        ($ReturnedPackets, $RTT) = &TCPcheck($TargetHost);
        $ResultString{$TargetHost} = sprintf "%d %s %0.4f T", $TimeOfDay, $TargetHost, $RTT;
        if ($opt_d) {
            print "$ResultString{$TargetHost}\n";
        }
    } else {
        $min = $stat->min();
        $mean = $stat->mean();
        $max = $stat->max();
        $count = $stat->count();

        if ($opt_r && @RawRTT) { # Log each raw RTT instead of the summary statistics
            $ResultString{$TargetHost} = sprintf "%d %s", $TimeOfDay, $TargetHost;
            foreach $rtt (@RawRTT) {
                $ResultString{$TargetHost} .= sprintf " %0.4f", $rtt;
            }
        } else {
            $ResultString{$TargetHost} = sprintf "%d %s %0.4f %0.4f %0.4f %d",
                $TimeOfDay, $TargetHost, $min, $mean, $max, $count;
        }

        if ($opt_T) { # Check minimum RTT for alarm limit
            if ($min > $opt_T) {
                print "Minimum RTT pushing $TargetHost\n" if $opt_d;
                push (@Failures, $TargetHost);
            }
        }

        if ($opt_d) {
            print "$ResultString{$TargetHost}\n";
        }
    }
}

# Write results to logfile, if -l

if ($opt_l) {
    $LogFile = $opt_l;
    ($sec,$min,$hour,$mday,$Month,$Year,$wday,$yday,$isdst) = localtime($TimeOfDay);
    $Month++;
    $Year += 1900;
    $YYYYMM = sprintf('%04d%02d', $Year, $Month);
    $LogFile =~ s/YYYYMM/$YYYYMM/; # Fill in current year and month

    open(LOG, ">>$LogFile") || warn "$0 Can't open logfile: $LogFile\n";

    foreach $host (sort keys %ResultString) {
        print LOG "$ResultString{$host}\n";
    }
    close LOG;
}

if (@Failures == 0) { # Indicate "all OK" to mon
    print "\n--------- No Failures ---------\n" if $opt_d;
    exit 0;
}

print "\n--------- Have Failures - mon Data Below ---------\n" if $opt_d;

@SortedFailures = sort @Failures;
print "@SortedFailures\n";

foreach $host (@SortedFailures) {
    print "$ResultString{$host}\n";
}
print "\n";

exit 1; # Indicate failure to mon

#
# Subroutines below
#

sub UDPcheck { # Send multiple UDP
packets my($TargetHost) = @_; my($DroppedPackets, $GoodPackets); $DroppedPackets = 0; $GoodPackets = 0; $dt = -1; # Will report -1 on failure $S = new IO::Socket::INET (PeerAddr => $TargetHost, PeerPort => 7, Proto => 'udp', ); do { &udpLeaveError($TargetHost, "Can't open UDP socket to $TargetHost\n"); return 0; } unless ($S); $TimeNow = time; for ($i = 1; $i <= $NUM_UDP_TRYS; $i++) { $Out = "UDP$i"; # Number the packets # $Out .= ' 'x52; # Make a 56+ byte packet $DataLength = length($Out); $Echo = ''; # Clear input buffer $t1 = [gettimeofday]; $BytesSent = send($S, $Out, 0); # Send the data $SIG{ALRM} = \&ReadTimeOut; eval { alarm($TimeOut); $RecvRet = recv($S, $Echo, $DataLength, $RecvFlags); $t2 = [gettimeofday]; alarm(0); }; if ($@ =~ /Read Timeout/) { $DroppedPackets++; if ($opt_d) { print " Dropped packet $i, waited $TimeOut s\n"; } } else { $dt = tv_interval ($t1, $t2); if ($Echo eq $Out) { $stat->add_data($dt); push(@RawRTT, $dt); if ($opt_d) { print "$i - $DataLength - $dt -$Echo-\n"; } $GoodPackets++; } else { if ($opt_d) { print "$i - $DataLength - $dt Bad Packet\n"; } } } } $S->close(); return $GoodPackets; } sub TCPcheck { # Send a single TCP packet my($TargetHost) = @_; my($DroppedPackets, $GoodPackets, $dt); $GoodPackets = 0; $i = 1; $dt = -1; # Will report -1 on failure $S = new IO::Socket::INET (PeerAddr => $TargetHost, PeerPort => 7, Proto => 'tcp', ); do { &tcpLeaveError($TargetHost, "Can't open TCP socket to $TargetHost\n"); return $GoodPackets, $dt; } unless ($S); $Out = "TCP$i"; $DataLength = length($Out); $t1 = [gettimeofday]; $BytesSent = send($S, $Out, 0); # Send the data $SIG{ALRM} = \&ReadTimeOut; eval { alarm($TimeOut); $RecvRet = recv($S, $Echo, $DataLength, $RecvFlags); $t2 = [gettimeofday]; alarm(0); }; if ($@ =~ /Read Timeout/) { $DroppedPackets++; if ($opt_d) { print " No Echo from TCP packet $i, waited $TimeOut s\n"; } } else { $dt = tv_interval ($t1, $t2); if ($Echo eq $Out) { if ($opt_d) { print "TCP $i - $DataLength - $dt 
-$Echo-\n";
        }
        $GoodPackets++;
    } else {
        if ($opt_d) {
            print "TCP $i - $DataLength - $dt Bad Packet\n";
        }
    }
  }
  $S->close();
  return $GoodPackets, $dt;
}

sub udpLeaveError { # Don't call this one a failure, TCP might work
    my ($host, $reason) = @_;
    print "udpLeaveError $host\n" if $opt_d;
}

sub tcpLeaveError { # If we get here, it was a failure
    my ($host, $reason) = @_;
    print "tcpLeaveError pushing $host\n" if $opt_d;
    push (@Failures, $host);
}

sub ReadTimeOut { # For alarm/timeout signal
    die 'Read Timeout';
}
mon-1.2.0/mon.d/http_tppnp.monitor0000755003616100016640000007103310630617405017044 0ustar trockijtrockij
#!/usr/bin/perl
#
# Parallel http monitor, with timing, using separate process for each request
# results are gathered using a named pipe
# an optional "SmartAlarm" capability is provided
# to classify alarms and/or limit alarms when there
# are sporadic outages
#
# http_tppnp.monitor : http _ timing - proxy - parallel - named pipe
# http _ t p p np
#
#
# Jon Meek
# Lawrenceville, NJ
# meekj at ieee.org
#
# $Id: http_tppnp.monitor,v 1.2.2.1 2007/06/03 20:05:25 trockij Exp $
#
# Copyright (C) 2004, Jon Meek
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#

=head1 NAME

B<http_tppnp.monitor> - http/https server parallel monitor for mon

=head1 DESCRIPTION

http/https server monitor for mon. Logs timing and size results, can
use a proxy server.
Each measurement is made using a separate measurement
process; a central server is used to collect, process, and log the results.

An optional "SmartAlarm" capability is provided
to classify alarms and/or limit alarms when there
are sporadic outages.

=head1 SYNOPSIS

B<http_tppnp.monitor> -l log_file_YYYYMM.log [--servertimeout nn]
[--clienttimeout nn] [--responsealarmtime nn] [--randskew nn]
[--okcodes nnn,mmm,kkk] [--okstring 'Required string'] [--nocache]
[--pipe pipename] [--stripprotocol] [--smartalarm smartalarm.module]
[--sacfg smartalarm.cfg] [--smartalarmdir /smartalarm/path]
[--forcesmartalarm] [--d --debug] [--debuglog file] [--v]
host [host:/path_to_doc ...]

The host list can be in any combination of the following:

 webmail.mysite.com/index.html
 http://webmail.mysite.com/
 test.mysite.com/~meekj/ca_zip.txt@proxy.mysite.com
 http://webmail.mysite.com:81/
 https://webmail.mysite.com/

http is the default if the protocol is not specified.

=head1 OPTIONS

=over 5

=item B<-l log_file_template> or B<--log log_file_template>

 /path/to/logs/internet_web_YYYYMM.log

The current year and month are substituted for YYYYMM; that is the only
possible template at this time.

The format of the log file is:

 unix_time proxy protocol://host path bytes response_time response_code

If B<--stripprotocol> is specified then protocol:// is not included.

The response_time is in seconds. If the response was determined to be
a failure, the time is reported as a negative number.

=item B<-c> or B<--okcodes>

Comma-separated list of acceptable http response codes. 200 is
the default, but it must be explicitly included in the list if -c or
--okcodes is used.

=item B<--okstring 'Required string'>

If defined, the string must be present in the content for the test
to pass. The string can be a simple Perl regular expression; be
sure to test it on your data. Note that Perl regular expression
special characters ( +?.*^$()[{|\ ) may need to be escaped if your
string is NOT supposed to be a regular expression.
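As a rough illustration of the escaping note above, the sketch below uses Perl's built-in C<quotemeta> to turn a literal string into a pattern that is safe to pass to --okstring. The helper name C<literal_okstring> and the sample strings are hypothetical and are not part of this monitor; the sketch only assumes that the string is interpolated into a match as /$OKstring/.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of http_tppnp.monitor: escape every
# regular expression metacharacter so the text is matched literally.
sub literal_okstring {
    my ($text) = @_;
    return quotemeta($text);
}

my $wanted  = 'Total (USD): 5.00+';                 # literal text we expect
my $content = 'Order summary. Total (USD): 5.00+ tax';
my $pattern = literal_okstring($wanted);

print "pattern to pass to --okstring: $pattern\n";
print "escaped pattern matches the content\n" if $content =~ /$pattern/;

# Without escaping, "(USD)" is read as a capture group and "+" as a
# quantifier, so the literal parentheses in the content do not match.
print "unescaped string does not match\n" if $content !~ /$wanted/;
```

Printing the escaped pattern once and pasting it into the monitor line is probably simpler than escaping by hand, but it is worth checking the result against real page content either way.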
=item B<--nocache> Add 'Pragma: no-cache' and 'Cache-Control: max-stale=0' headers to all requests. In addition, check for Warning headers which indicate that the content was delivered from the cache anyway. This seems to be required when monitoring certain Web sites through certain cache servers. =item B<--servertimeout N s> Wait this long before giving up the wait for measurement results. If you change this, be sure that it is at least (clienttimeout + randskew + 5) seconds. Defaults to 45 seconds. =item B<--clienttimeout N s> N s The maximum time each measurement process waits for a response after its request is made (timeout starts after randskew time). Defaults to 30 seconds. =item B<--responsealarmtime N s> or B<-T N s> Trigger an alarm if any response is greater than N seconds. Defaults to a very large number, effectively disabling response time checks beyond the regular timeout. =item B<--randskew N s> Each measurement process will wait a random number of seconds, up to this maximum number before starting. Defaults to 10 seconds. =item B<--stripprotocol> Strip {http, https, ftp}:// from the URL stored in the logfile, for backwards compatibility of log format. =item B<--smartalarm Full/path/or/NameOfSmartAlarm> For selecting the httpSmartAlarm module to filter alarms and trigger an alarm only if certain conditions are met. If the full path is not specified, then the smart alarm is expected to exist in the ./mon.d directory (or more precisely, in the same directory as this monitor). Note that .pm should not be included in the module name, however the monitor will strip it out if it is included. 
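The lookup rule described above can be sketched roughly as follows. The function name C<resolve_smartalarm> and its argument names are hypothetical, chosen only for this illustration; the sketch assumes nothing beyond the behaviour stated in this section (an explicit directory wins, otherwise --smartalarmdir, otherwise the monitor's own directory, and a trailing .pm is dropped).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename dirname);

# Hypothetical helper: return (directory, module name) for a
# --smartalarm value, following the rule described in the text.
sub resolve_smartalarm {
    my ($spec, $monitor_path, $dir_override) = @_;

    my $module = basename($spec);
    my $dir    = dirname($spec);     # '.' when no directory was given

    $module =~ s/\.pm$//;            # tolerate a trailing .pm

    if ($dir eq '' || $dir eq '.') {
        # No path supplied: prefer --smartalarmdir, else the monitor's directory
        $dir = defined $dir_override ? $dir_override : dirname($monitor_path);
    }
    return ($dir, $module);
}

my $monitor = '/usr/local/mon/mon.d/http_tppnp.monitor';

for my $spec ('httpSmartAlarm', 'httpSmartAlarm.pm', '/opt/sa/httpSmartAlarm') {
    my ($dir, $module) = resolve_smartalarm($spec, $monitor);
    print "$spec => directory $dir, module $module\n";
}
```

The directory found this way is what would be added to the module search path before the module is loaded by name.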
The httpSmartAlarm module has the following structure: package httpSmartAlarm; # # Module to provide "Smart Alarms" for http_tppnp.monitor # use Exporter(); $VERSION = 0.02; @ISA = qw(Exporter); @EXPORT = qw(CheckAlarm); sub CheckAlarm { my ($ConfigFile, %TestResult) = @_; $TotalDownCount = 0; @DownList = (); &ReadParams($ConfigFile); # Read your config file, if you have one foreach $k (sort keys %TestResult) { # Check the results print "TestResult: $k - $TestResult{$k}\n" if $Debug; ($Failed, $tod, $proxy, $protocol, $site, $file, $size, $t, $http_code) = split(' ', $TestResult{$k}); # # Supply some sort of algorithm here # } return ($TotalDownCount, @DownList); } # Supply a ReadParams subroutine, if needed 1; =item B<--smartalarmdir /path/to/SmartAlarm> Alternate method of supplying the path to the filter module. =item B<--forcesmartalarm> Run SmartAlarm even if there are no failures. Useful if your SmartAlarm looks for other problems such as a bad route. =item B<--sacfg> The full path to the SmartAlarm configuration file. =item B<--pipe /path/to/pipe> The full path, including file name, of the named pipe used for inter-process communication. The default is /tmp/http_tppnp, the PID of the server process is added to this name to ensure uniqueness and allow multiple sets of server/clients to run simultaneously. =item B<-d> or B<--debug> Debug/Test, for manual testing only. =item B<--debuglog file> Write debug and response data to file. Defaults to STDOUT. =item B<-v> Verbose, show content of returned data, for manual testing only. 
=item B<-a>

[Not backported from http_tpp yet] List all results if there is a
failure, otherwise list only failed tests.

=item B<-r>

[Not backported from http_tpp yet] Follow redirects, can be useful with -d.

=back

=head1 MON CONFIGURATION EXAMPLE

Note that a proxy will be used to access ot.myweb.com

 hostgroup internet_web www.ama-assn.org www.gartner.com test.mysite.com/~meekj/ca_zip.txt ot.myweb.com/ca_zip.txt@proxy.mysite.com

 watch internet_web
     service internet_web
         interval 5m
         monitor http_tppnp.monitor -l /usr/local/mon/logs/internet_web_YYYYMM.log -T 10 -t 15
         period wd {Sun-Sat}
             alert mail.alert firewall_admin
             alertevery 1h summary

Command line test examples:

 http_tppnp.monitor -d www.redhat.com bns.pha.com mythey.com/_mem_bin/FormsLogin.asp\?/ nonexist.pha.com www.sun.com/@proxy.labs.theyw.com

 http_tppnp.monitor -d www.redhat.com@proxy.labs.theyw.com www.sun.com/@proxy.labs.theyw.com www.yahoo.com/@proxy.labs.theyw.com

=head1 BUGS

Using a proxy for https or ftp has not been tested, and probably does
not work at this time because all proxies are invoked as http.

The path to mkfifo is hardcoded to /usr/bin/mkfifo. This is good for
Linux and Solaris, but it should be an option.

Earlier versions had occasional problems with zombie/defunct processes
under extreme conditions, such as DNS slowness. Additional protections
have been added and this does not seem to be a problem.

At times, the monitor would do an "exit 1", telling mon that there was
a failure even though the failure list was empty. This is probably
fixed. It was due to the main program exiting before all the child
processes. A two second wait before an "exit 0" appears to be
sufficient, but the SIGCHLD handler is also disabled. If zombie
processes appear, this method should be reviewed.

The above problem could be avoided with a mon option to ignore alerts
with an empty failure summary.

There should be multiple "debug" output levels.
One level should report only information useful to a user running the program manually, such as response times, byte counts, special headers, etc. =head1 REQUIRED PERL MODULES LWP::UserAgent HTTP::Request::Common Time::HiRes and, if https/SSL monitoring will be performed Crypt::SSLeay =head1 AUTHOR Jon Meek, meekj at ieee.org =head1 SEE ALSO http.monitor Use only for simple testing of a small number of hosts. http_tp.monitor Not actively maintained. http_tpp.monitor Should not be used, this monitor is a replacement. phttp.monitor by Gilles LAMIRAL lwp-http.mon by Daniel Hagerty (hag at linnaean.org) =cut $RCSid = q{$Id: http_tppnp.monitor,v 1.2.2.1 2007/06/03 20:05:25 trockij Exp $ }; use IO::Socket; use POSIX qw(:signal_h WNOHANG); use Getopt::Long; use Time::HiRes qw( gettimeofday tv_interval ); use LWP::UserAgent; use HTTP::Request::Common; $SmartAlarmConfig = ''; # Initialize, in case none is supplied # # Note that options needed in the client space must be passed on the client # command line. See sub ForkClient below. 
# GetOptions( "servertimeout=i" => \$ServerTimeout, "clienttimeout=i" => \$ClientTimeout, "responsealarmtime=i" => \$ResponseAlarmTime, "T=i" => \$ResponseAlarmTime, "randskew=i" => \$RandSkew, "okcodes=s" => \$opt_c, "okstring=s" => \$OKstring, "pipe=s" => \$NamedPipe, "c=s" => \$opt_c, "l=s" => \$opt_l, "log=s" => \$opt_l, "stripprotocol" => \$StripProtocol, "nocache" => \$NoCache, "smartalarm=s" => \$SmartAlarm, # Name of the SmartAlarm module "sacfg=s" => \$SmartAlarmConfig, # Name of the SmartAlarm config file "smartalarmdir=s" => \$SmartAlarmDir, "forcesmartalarm" => \$ForceSmartAlarm, "d" => \$Debug, "debug" => \$Debug, "debuglog=s" => \$DebugLog, "v", "client", # For use by client only "url=s" => \$URL, "proxy=s" => \$Proxy, ); $ServerTimeout = 45 unless $ServerTimeout; $ClientTimeout = 30 unless $ClientTimeout; $ResponseAlarmTime = 10000 unless $ResponseAlarmTime; $RandSkew = 10 unless defined $RandSkew; # Can be zero $NamedPipe = '/tmp/http_tppnp' unless $NamedPipe; $MKFIFO = '/usr/bin/mkfifo'; # Program to make the named pipe, or FIFO my $ResponseCount = 0; # Count the responses as they are delivered my %httpCode = (); # Where the results are kept my %httpTime = (); # Keys are in URL@proxy form my %httpSize = (); my %s = (); # A temporary hash used to pass data $TimeOfDay = time; if ($Debug) { # STDOUT is default destination for debug messages $DebugLog = q{-} unless defined $DebugLog; } if ($DebugLog) { open(DEBUGLOG, ">>$DebugLog") || warn "Can't open debug log: $DebugLog"; $Debug = 1; } ######################################################################################### # # Client code - started by fork-exec in Server code below # if ($opt_client) { sleep 1; # Give the server a second to get setup sub PipeProblem { # For alarm/timeout signal my $signame = shift; print "$ProgName could not write to pipe, received signal $signame\n"; print DEBUGLOG "\n--------- Exiting from PipeProblem with alert ---------\n\n" if $Debug; exit 1; } $SIG{PIPE} = 
\&PipeProblem; $RandomDelayTime = int(rand($RandSkew)); print DEBUGLOG "Child($$): $Proxy $URL - Delaying $RandomDelayTime s (max $RandSkew)\n" if $Debug; # exit if ($URL =~ /junk/); # For testing what happens if a client never responds (URL contains 'junk') sleep($RandomDelayTime); # Randomly delay ourselves to avoid a rush my $ua = new LWP::UserAgent; $ua->timeout($ClientTimeout); # Set timeout for LWP $TheContent = ''; if ($Proxy ne 'noproxy') { $ua->proxy('http', "http://$Proxy"); # Need to generalize this for other protocols } $s{measurementtime} = time; # Not currently used, but may become log option $dt = 0; $t0 = [gettimeofday]; # Get start time if ($NoCache) { # Request fresh content to get past caches $response = $ua->get($URL, Pragma => 'no-cache', 'Cache-Control' => 'max-stale=0'); } else { $response = $ua->request(GET $URL); } $t1 = [gettimeofday]; # Get end time $dt = tv_interval($t0, $t1); # Compute elapsed time $ResultCode = $response->code(); $WarningHeader = $response->header('Warning'); # Some caches might return this, see check below $TheContent = $response->content(); $ByteCount = length($TheContent); if ($NoCache && ($WarningHeader =~ /(\d{3})/)) { # Be sure that fresh data were delivered $WarningCode = $1; # If not, alter the Result Code to force an alarm $ResultCode = 503 if (($WarningCode >= 110) && ($WarningCode < 199)); } $StringCheckFail = 0; if ($OKstring) { # Check the content to verify that the required string is present $StringCheckFail = 1 if ($TheContent !~ /$OKstring/); } print DEBUGLOG "URL: $URL $ResultCode $ByteCount $dt\n" if $Debug; print "Warning Header: $WarningHeader\n" if $Debug; print $TheContent if $opt_v; # # Submit the results to the server process over a named pipe # if (-p $NamedPipe) { # Be sure that the pipe is there, otherwise our server may have exited open (PIPE, ">$NamedPipe") || die "Can't open pipe: $NamedPipe\n"; print PIPE "$URL $Proxy $ResultCode $StringCheckFail $ByteCount $dt\n"; print DEBUGLOG 
"\nChild($$) --------- Exiting normally ---------\n" if $Debug; exit 0; # The client invocation ends here } else { print DEBUGLOG "Child($$) exiting because pipe $NamedPipe does not exist\n" if $Debug; exit 0; } } ############# End Client Section ################################################### #################################################################################### # ############# Server Section #################################### # # # Determine path to monitor, for starting children # $ProgName = $0; # Will need full path print DEBUGLOG "\n\nStarting at $TimeOfDay Name: $ENV{PWD} / $ProgName\n" if $Debug; if (!(-x $ProgName)) { # We can't find ourself, won't be able to exec! print DEBUGLOG @ARGV if $Debug; print DEBUGLOG "\n" if $Debug; print "$ProgName cannot be found, or is not executable by mon\n"; exit 1; # Indicate failure to mon } if ($SmartAlarm) { # Use Smart Alarm module use File::Basename; $basename = basename($SmartAlarm); # Get the path to the module $dirname = dirname($SmartAlarm); if ((length($dirname) == 0) || ($dirname eq '.')) { $SmartAlarmDir = dirname($ProgName) unless $SmartAlarmDir; } else { $SmartAlarmDir = $dirname; } $basename =~ s/\.pm$//; print DEBUGLOG "SmartAlarmDir: $SmartAlarmDir Module: $basename\n" if $Debug; # use lib "/usr/local/mon/mon.d"; # Use ENV variable or option later push (@INC, $SmartAlarmDir); eval "use $basename"; do { print "Couldn't load $SmartAlarmDir/$basename.pm: $@\n"; exit 1; } unless ($@ eq ''); httpSmartAlarm->import(); } # # Reap children to avoid defunct processes / zombies # See "Network Programming with Perl" by Lincoln Stein # sub Reaper { my $signame = shift; my $timenow = time; while ((my $child_pid = waitpid(-1, WNOHANG)) > 0) { print DEBUGLOG "Parent $$ Reaped child: $child_pid after $signame at $timenow\n" if $Debug; } } $SIG{CHLD} = \&Reaper; # Handle interrupt key and termination signals sub OtherSIGs { my $signame = shift; unlink $NamedPipe; print "$ProgName Terminated on Signal: 
$signame\n"; print DEBUGLOG "\n--------- Exiting OtherSIGs with alert following $signame ---------\n\n" if $Debug; exit 1; } $SIG{HUP} = $SIG{INT} = $SIG{QUIT} = $SIG{TERM} = \&OtherSIGs; # # Make the named pipe for children to report results # $NamedPipe .= ".$$"; # Tack on the PID for uniqueness print DEBUGLOG "Making $NamedPipe\n" if $Debug; $cmd = qq{$MKFIFO $NamedPipe}; $ret_val = system($cmd); #$SIG{CHLD} = $SIG{PIPE} = $SIG{INT} = 'IGNORE'; # don't want to die on 'Broken pipe' or Ctrl-C if ($opt_c) { # Parse list of acceptable http response codes (@t) = split(/,/, $opt_c); foreach $code (@t) { $AcceptableResponseCode{$code}++; } } else { $AcceptableResponseCode{200}++; # Default is 200 } foreach $target (@ARGV) { # Build host and path lists print DEBUGLOG "\nTarget: $target\n" if $Debug; # # Normalize the request # we may want to have more restrictive URL formats in the future # and eliminate this # $protocol = 'http'; # Default protocol $host_path = ''; if ($target =~ /^(\w+):\/\/(.*)/) { $protocol = $1; $host_path = $2; } else { $host_path = $target; } print DEBUGLOG "Protocol: $protocol host/path: $host_path\n" if $Debug; undef $proxy_server; if ($host_path =~ /@/) { ($host_path, $proxy_server) = split(/@/, $host_path, 2); } elsif (defined $Proxy) { # Allow one proxy to be set for all tests, but override with @ in host/path $proxy_server = $Proxy; } ($host, $Path) = split(/\//, $host_path, 2); if (defined $proxy_server) { $ProxyServer = $proxy_server; } else { $ProxyServer = 'noproxy'; } print DEBUGLOG "$host - $ProxyServer - $Path\n" if $Debug; $URL = "$protocol://$host/$Path"; push(@URLs, $URL); push(@Proxies, $ProxyServer); } $RandSkew = 0 if (@URLs <= 1); # No need to delay if there is a single URL # # Open the named pipe, must be in read/write mode, otherwise open will block # open (PIPE, "+< $NamedPipe") || die "Server Process: Can't open pipe: $NamedPipe\n"; # # Use evals for time-out capability # eval { $SIG{ALRM} = sub {die "Server alarm 
timeout"};
  alarm($ServerTimeout);

  eval {
    #
    # Check each target URL by firing off a measurement child process
    #
    for ($i = 0; $i <= $#URLs; $i++) {
      $URL = $URLs[$i];
      $Proxy = $Proxies[$i];
      $URL_Proxy = $URL . '@' . $Proxy; # Unique test key
      $URL_Proxies{$URL_Proxy}++;       # Checklist, used to track replies
      &ForkClient($URL, $Proxy);        # Fire off a client to run the test
    }

    while (1) {
      $in = <PIPE>;                     # Read one result line from the named pipe
      print DEBUGLOG "Data from pipe: $in" if $Debug;
      ($s{url}, $s{proxy}, $s{result_code}, $s{string_check_fail}, $s{byte_count}, $s{dt}) = split(' ', $in);
      $url = $s{url};
      $proxy = $s{proxy};
      $URL_Proxy = $url . '@' . $proxy;
      delete $URL_Proxies{$URL_Proxy};  # Saw this combination, check it off the list
      $NumTestsLeft = scalar keys(%URL_Proxies);
      print DEBUGLOG " $NumTestsLeft tests to go\n" if $Debug;
      #
      # Save measurement results in hashes
      #
      $httpCode{$URL_Proxy} = $s{result_code};
      $httpStringFail{$URL_Proxy} = $s{string_check_fail};
      $httpTime{$URL_Proxy} = $s{dt};
      $httpSize{$URL_Proxy} = $s{byte_count};

      last if ($NumTestsLeft == 0);     # Bail out and process if we got all the replies
    }
    close PIPE;
    alarm(0);
  };
  alarm(0); # Race condition prevention
};

unlink $NamedPipe; # For housekeeping, and to let any straggling clients know
                   # that the server process has exited
#
# Process the results, exit occurs from ProcessResults
#
&ProcessResults(\%httpCode, \%httpStringFail, \%httpTime, \%httpSize);

############# End of Server Code ############################################

#
# Subroutines below
#

sub ForkClient {
  my ($url, $proxy) = @_;

 FORK:
  if ($pid = fork) {
    # parent here
    # child process pid is available in $pid
    # waitpid($pid,0); # Can't do this and retain parallelism
    # $returnstatus = ($?
    #   >> 8);
  } elsif (defined $pid) { # pid is zero here if defined
    # child here
    # Form our exec() string
    $execstring = "$ProgName --client --url $url --proxy $proxy --pipe $NamedPipe --randskew $RandSkew";
    $execstring .= ' --nocache' if $NoCache; # Add additional flags
    $execstring .= ' -d' if $Debug;
    $execstring .= " --debuglog $DebugLog" if $DebugLog;
    $execstring .= " --okstring '$OKstring'" if $OKstring;
    $execstring .= ' -v' if $opt_v;
    print DEBUGLOG "execstring: $execstring\n" if $Debug;
    exec($execstring);
    # parent process pid is available with getppid
  } elsif ($! =~ /No more process/) { # EAGAIN, supposedly recoverable fork error
    sleep 2;
    redo FORK;
  } else {
    # weirdo fork error
    # return 1;
  }
}

#
# Check for alarm conditions, etc.
#
sub ProcessResults {
  my ($Codes, $StringFail, $Times, $Sizes) = @_;

  my @Failures = ();
  my %FailureDetail = ();
  my %ResultString = ();
  #
  # Check for non-responders, LWP will usually give an error
  # so we may not exercise this often
  #
  foreach $r (keys %URL_Proxies) { # Unfulfilled test results
    print DEBUGLOG "$r $URL_Proxies{$r}\n" if $Debug;
    push(@Failures, $r);
    $ThisOneFailed = 1;
    $FailureDetail{$r} = 'No response';
    ($protocol, $host, $path, $proxy) = &split_url($r);
    $Times->{$r} = -1.0;
    $Sizes->{$r} = 0;
    $Codes->{$r} = 0;

    if ($StripProtocol) { # Don't include http:// etc in log file for backwards compatibility
      $ResultString{$r} = sprintf("%d %s %s %s %d %0.4f %d",
                                  $TimeOfDay, $proxy, $host, $path,
                                  $Sizes->{$r}, $Times->{$r}, $Codes->{$r});
    } else {
      $ResultString{$r} = sprintf("%d %s %s://%s %s %d %0.4f %d",
                                  $TimeOfDay, $proxy, $protocol, $host, $path,
                                  $Sizes->{$r}, $Times->{$r}, $Codes->{$r});
    }
    # Nine fields, in the order CheckAlarm splits them:
    # failed, time, proxy, protocol, host, path, size, response time, http code
    $SmartAlarmString{$r} = sprintf("%d %d %s %s %s %s %d %0.3f %s",
                                    $ThisOneFailed, $TimeOfDay, $proxy, $protocol, $host, $path,
                                    $Sizes->{$r}, $Times->{$r}, $Codes->{$r});
  }
  #
  # Check response codes, times, etc
  #
  print DEBUGLOG "\nProcessResults\n" if $Debug;
  foreach $r (keys %$Codes) {
    next if (exists $URL_Proxies{$r}); # We already got it above
    $ThisOneFailed = 0;
    printf DEBUGLOG ("%8.3f %5d %6d %s\n", $Times->{$r}, $Codes->{$r}, $Sizes->{$r}, $r) if $Debug;
    #
    # Check http response code against list
    #
    if (!exists $AcceptableResponseCode{$Codes->{$r}}) {
      $ThisOneFailed++;
      $Times->{$r} = -1.0 * $Times->{$r}; # Log uses negative time as failure indicator
      $FailureDetail{$r} = "Bad response code ($Codes->{$r}) ";
    }
    #
    # Check response time against limit, if set, but don't negate response time
    #
    if ($ResponseAlarmTime) {
      if ($Times->{$r} > $ResponseAlarmTime) {
        $ThisOneFailed++;
        $FailureDetail{$r} .= 'Long response time';
      }
    }

    if ($StringFail->{$r}) {
      $ThisOneFailed++;
      $FailureDetail{$r} .= qq{String '$OKstring' is missing};
    }

    if ($ThisOneFailed) {
      push(@Failures, $r);
    }

    # Pick apart the URL so that we can generate a log entry
    # compatible with previous versions
    #
    ($protocol, $host, $path, $proxy) = &split_url($r);

    if ($StripProtocol) { # Don't include http:// etc in log file for backwards compatibility
      $ResultString{$r} = sprintf("%d %s %s %s %d %0.4f %d",
                                  $TimeOfDay, $proxy, $host, $path,
                                  $Sizes->{$r}, $Times->{$r}, $Codes->{$r});
    } else {
      $ResultString{$r} = sprintf("%d %s %s://%s %s %d %0.4f %d",
                                  $TimeOfDay, $proxy, $protocol, $host, $path,
                                  $Sizes->{$r}, $Times->{$r}, $Codes->{$r});
    }
    # Nine fields, in the order CheckAlarm splits them:
    # failed, time, proxy, protocol, host, path, size, response time, http code
    $SmartAlarmString{$r} = sprintf("%d %d %s %s %s %s %d %0.3f %s",
                                    $ThisOneFailed, $TimeOfDay, $proxy, $protocol, $host, $path,
                                    $Sizes->{$r}, $Times->{$r}, $Codes->{$r});
  }

  if ($Debug) {
    foreach $r (sort keys %ResultString) {
      print DEBUGLOG "ResultString: $ResultString{$r}\n";
    }
  }
  #
  # Write results to logfile, if -l
  #
  if ($opt_l) {
    $LogFile = $opt_l;
    ($sec, $min, $hour, $mday, $Month, $Year, $wday, $yday, $isdst) = localtime($TimeOfDay);
    $Month++;
    $Year += 1900;
    $YYYYMM = sprintf('%04d%02d', $Year, $Month);
    $LogFile =~ s/YYYYMM/$YYYYMM/; # Fill in current year and month

    if (-e $LogFile) { # Check for existing log file
      $NewLogFile = 0;
    } else {
      $NewLogFile = 1;
    }

    open(LOG, ">>$LogFile") || warn "$0 Can't open logfile:
$LogFile\n"; foreach $r (sort keys %ResultString) { print LOG "$ResultString{$r}\n"; } close LOG; } if ((@Failures == 0) && $ForceSmartAlarm) { # Run SmartAlarm to look for other problems, i.e. bad route ($count, @Failures) = &CheckAlarm($SmartAlarmConfig, %SmartAlarmString); if (@Failures == 0) { sleep 2; # Allow SIGCHLDs to arrive $SIG{CHLD} = 'IGNORE'; # We are finished, don't wait for straggling SIGCHLDs (hopefully will not leave zombies) exit 0; } $SummaryString = join ' ', @Failures; # Double check failure list $SummaryString =~ s/^\s+//; # Trim whitespace $SummaryString =~ s/\s+$//; # exit 0 if (length($SummaryString) <= 0); # Require data in failure list print "$SummaryString\n"; # Note that we are not supplying any detail data from SmartAlarm print DEBUGLOG "\n--------- Exiting ForceSmartAlarm alarm mode with alert ---------\n\n" if $Debug; exit 1; # Indicate failure to mon } if (@Failures == 0) { # No failures, exit with status 0 print DEBUGLOG "\n--------- No Failures ---------\n" if $Debug; print DEBUGLOG "\n--------- Exiting normally ---------\n\n" if $Debug; sleep 2; # Allow SIGCHLDs to arrive $SIG{CHLD} = 'IGNORE'; # We are finished, don't wait for straggling SIGCHLDs (hopefully will not leave zombies) exit 0; } if ($SmartAlarm) { # Smart alarm enabled, check the down list to see if we really # want to trigger an alarm ($SmartAlarmDownCount, @SmartAlarmFailures) = &CheckAlarm($SmartAlarmConfig, %SmartAlarmString); print DEBUGLOG "*** SmartAlarm Result: $SmartAlarmDownCount\n" if $Debug; if ($SmartAlarmDownCount) { # Have alarm, exit with status 1 print DEBUGLOG "\n--------- Have Smart Alarm Failures - mon Data Below ---------\n" if $Debug; @SortedFailures = sort @SmartAlarmFailures; # Sort to help mon in summary mode $SummaryString = join ' ', @SortedFailures; # Double check failure list $SummaryString =~ s/^\s+//; # Trim whitespace $SummaryString =~ s/\s+$//; # exit 0 if (length($SummaryString) <= 0); # Require data in failure list print 
"$SummaryString\n"; # There were failures, list them foreach $r (sort @Failures) { # Then provide details print "$r $Sizes->{$r} bytes $Times->{$r} s $FailureDetail{$r}\n"; } print DEBUGLOG "\n--------- Exiting SmartAlarm mode with alert ---------\n\n" if $Debug; exit 1; # Indicate failure to mon } print DEBUGLOG "\n--------- No Failures Classified by SmartAlarm ---------\n" if $Debug; print DEBUGLOG "\n--------- Exiting SmartAlarm mode ---------\n\n" if $Debug; sleep 2; # Allow SIGCHLDs to arrive $SIG{CHLD} = 'IGNORE'; # We are finished, don't wait for straggling SIGCHLDs (hopefully will not leave zombies) exit 0; } # Regular alarm mode print DEBUGLOG "\n--------- Have Failures - mon Data Below ---------\n" if $Debug; @SortedFailures = sort @Failures; # Sort to help mon in summary mode $SummaryString = join ' ', @SortedFailures; # Double check failure list $SummaryString =~ s/^\s+//; # Trim whitespace $SummaryString =~ s/\s+$//; # exit 0 if (length($SummaryString) <= 0); # Require data in failure list print "$SummaryString\n"; # There were failures, list them foreach $r (@SortedFailures) { # Then provide details print "$r $Sizes->{$r} bytes $Times->{$r} s $FailureDetail{$r}\n"; } print DEBUGLOG "\n--------- Exiting regular alarm mode with alert ---------\n\n" if $Debug; exit 1; # Indicate failure to mon } # # Pick apart the URL so that we can generate a log entry # compatible with previous versions # sub split_url { my $r = shift; my ($protocol, $host, $path, $proxy); $r =~ /^(\w+):\/\/([^\/]+)\/?(.*?)@(.*)/; $protocol = $1; $host = $2; # Ends when '/' seen $path = $3; $proxy = $4; if (length($path) < 1) { # Set the path for logging purposes $path = '/'; # we don't want an empty, space separated, field } return $protocol, $host, $path, $proxy; } mon-1.2.0/mon.d/snmpvar.monitor0000755003616100016640000004137010146140377016334 0ustar trockijtrockij#!/usr/bin/perl # ############################################################################ ## ## ## snmpvar.monitor 
Version 1.6.0 ## ## 2003-05-21 ## ## Copyright (C) 2000-2003 ## ## Peter Holzleitner (peter@holzleitner.com) ## ## ## ############################################################################ # # A MON plug-in monitor to test numeric values retrieved via SNMP # against configured limits. # # Arguments: # # [--community=cmn] [--group=groups] [--timeout=n] [--retries=n] [--debug] # [--varconf=filename] [--config=filename] [--snmpconf=filename] # [--mibs='mib1:mib2:mibn'] [--list[=linesperpage]] host [host ...] # # For every host name passed on the command line, snmpval.monitor looks # up the list of variables and corresponding limits in the configuration # file (snmpmon.cf). # # If a --groups option is present, only those variables are checked # which are in one of the specified groups. To specify more than one # group, separate group names with commas. You can also exclude groups # by prefixing the group name(s) with '-'. Don't mix in- and exclusion. # Examples: # --groups=Power only vars in the Power group # --groups=Power,Env vars in the Power or Env group # --groups=-Power,-Env all vars except those in Power or Env groups # --groups=Power,-Env won't work (only the exclusions) # # For every such variable, it looks up the OID, description etc. from # the variable definition file (snmpvar.def). # # This monitor looks for configuration files in the current directory, # in /etc/mon and /usr/lib/mon/etc. Command line option --varconf # overrides the location of the variable definition file, option # --config sets the configuration file name. # # For formats, please refer to the sample configuration files. # # By default, this monitor does not load any MIB, and OIDs are specified # numerically in the configuration files. Use the option --mibs # to force certain MIBs to be loaded. # # When invoked with the --list option, the output format is changed # into a more human-readable form used to check and troubleshoot the # configuration. 
This option must not be used from within MON. # # # Exit values: # 0 if everything is OK # 1 if any observed value is outside the specified interval # 2 in case of an SNMP error (e.g. no response from host) # # Requirements: # # UCD SNMP library (3.6.2 or higher) # G.S. Marzot's Perl SNMP module (from CPAN). # # # License: # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software Foundation, # Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA # # # History: # # 1.6.0 21 May 2003 Equal and Non-Equal tests in addition to < and > (P.H.) # 1.5.1 09 Apr 2003 change \w to ^\s in FriendlyName detection to allow # indices containing "." like IP Addresses # 1.5.0 04 Dec 2002 per-host SNMP options (Ryan VanderBijl + P.H.) # --list shows all hosts if none specified (Ryan V.) # more output with --debug option (P.H.) # 1.4.0 10 Sep 2002 extended SNMP configuration (Dan Urist) # 1.3.0 15 May 2002 added GROUP option (Dave Alden) # added DEFAULTGROUP, group exclusion (P.H.) # decimals OK in limits (britcey) # added DefaultMin/Max (P.H.) # 1.2.0 21 Mar 2001 added FriendlyName option (P.H.) # 1.1.2 10 Jul 2000 fixed -l output with plausibility checks (P.H.) # 1.1.1 04 Apr 2000 automatically add dot between OID and index (P.H.) # 1.1.0 30 Mar 2000 added upper and lower plausibility limits (P.H.) # 1.0.1 24 Jan 2000 bugfix: reading Decode definitions (P.H.) # 1.0.0 13 Jan 2000 initial release (P.H.) 
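#
# Illustration only (not part of the original monitor): a minimal sketch of
# the --groups selection rules described in the header above. The helper name
# group_selected() is hypothetical, and unlike the monitor itself the sketch
# quotes the group name with \Q...\E before matching.
#

```perl
use strict;
use warnings;

# Return 1 if a variable assigned to $group should be checked for the given
# --groups value, 0 otherwise.
sub group_selected {
    my ($groups_opt, $group) = @_;

    # No --groups option: every variable is checked.
    return 1 if !defined $groups_opt || $groups_opt eq '';

    my $list = ",$groups_opt,";

    # An explicit "-Group" entry always excludes the group.
    return 0 if $list =~ /,-\Q$group\E,/i;

    # As soon as any exclusion is present, inclusion entries are ignored,
    # which is why mixing both forms "won't work".
    return 1 if $list =~ /-/;

    # Pure inclusion list: the group must be named.
    return $list =~ /,\Q$group\E,/i ? 1 : 0;
}

for my $spec ('Power', 'Power,Env', '-Power,-Env', 'Power,-Env') {
    my @checked = grep { group_selected($spec, $_) } qw(Power Env Fan);
    print "--groups=$spec checks: @checked\n";
}
```

# The last case shows the mixed form: only the exclusion of Env takes effect.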
#

use SNMP;
use Getopt::Long;
use Sys::Syslog;

sub ReadVarDef;
sub ReadVarList;
sub ReadSNMPConf;
sub GetSNMPArgs;
sub Decode;

GetOptions (\%opt, "config=s", "groups=s", "varconf=s", "snmpconf=s",
            "community=s", "port=i", "timeout=i", "retries=i",
            "mibs=s", "list:i", "debug");

die "no host arguments\n" if ( (@ARGV == 0) && !exists($opt{'list'}) );

$RET = 0;
@ERRS = ();
@HOSTS = ();

($^O eq "linux" || $^O eq "openbsd") && Sys::Syslog::setlogsock('unix');
openlog('snmpvar.mon', 'cons,pid', 'daemon');

# find config files
$CF1 = '/etc/mon';
$CF2 = '/usr/lib/mon/etc';
$VARCONF_FILE = (-d $CF1 ? $CF1 : $CF2) . '/snmpvar.def';
$MONCONF_FILE = (-d $CF1 ? $CF1 : $CF2) . '/snmpvar.cf';
$SNMPCONF_FILE = (-d $CF1 ? $CF1 : $CF2) . '/snmpopt.cf';

# pick up local config files for testing
$VARCONF_FILE = './snmpvar.def' if -e './snmpvar.def';
$MONCONF_FILE = './snmpvar.cf' if -e './snmpvar.cf';
$SNMPCONF_FILE = './snmpopt.cf' if -e './snmpopt.cf';

# command line overrides in any case
$VARCONF_FILE = $opt{'varconf'} || $VARCONF_FILE;
$MONCONF_FILE = $opt{'config'} || $MONCONF_FILE;
$SNMPCONF_FILE = $opt{'snmpconf'} || $SNMPCONF_FILE;

print STDERR "\nsnmpvar.monitor: configured from $VARCONF_FILE, $MONCONF_FILE\n\n" if $opt{'debug'};

ReadVarDef($VARCONF_FILE) || die "could not read variable definition: $!\n";
ReadVarList($MONCONF_FILE) || die "could not read config: $!\n";
ReadSNMPConf($SNMPCONF_FILE);   # this is optional stuff

# load only the necessary MIBs:
$ENV{'MIBS'} = $opt{'mibs'} || '';

$FORMAT_LINES_PER_PAGE = $opt{'list'} || 25;

$GROUPS = "," . $opt{'groups'} .
"," if ($opt{'groups'}); @ARGV = keys %VARLIST if ( exists($opt{'list'}) && @ARGV == 0 ); foreach $host (@ARGV) { $VARS = $VARLIST{$host}; # %VARLIST{$host}{$var}{'MIN'|'MAX'} next unless $VARS; my $SNMPARGS = &GetSNMPArgs($host); if($opt{'debug'}) { print STDERR "$host SNMP Parameters:\n"; foreach $so (keys %SNMPARGS) { print " $so = $SNMPARGS{$so}\n"; } print STDERR "\n"; } if (!defined($s = new SNMP::Session(DestHost => $host, %SNMPARGS))) { $RET = 2 unless $RET > 2; $errmsg = "could not create session to $host: " . $SNMP::Session::ErrorStr; print STDERR "$errmsg\n" if $opt{'debug'}; push (@HOSTS, $host); push (@ERRS, $errmsg); next; } @HE = (); # list of errors for THIS host foreach $var (sort keys %$VARS) { # skip vars that are not in selected group, if any: if($GROUPS ne '') { $g = $$VARS{$var}{'GROUP'}; # assigned group of this variable next if $GROUPS =~ /,-$g,/i; # excluded group next if !($GROUPS =~ /-/) && !($GROUPS =~ /,$g,/i); # included group } $oid = $VARDEF{$var}{'OID'}; @IDX = split(/ +/, $$VARS{$var}{'IDX'}); if(@IDX == ()) { @IDX = (''); } else { $oid .= '.' unless $oid =~ /.+\.$/; } foreach $i (@IDX) { $ioid = $oid . $i; $pi = $i ne '' ? " [$i]" : ''; $descr = $VARDEF{$var}{'DESCR'}; $fn = $FRIENDLYNAME{$host}{$var}{$i} || $VARDEF{$var}{'FNAME'}{$i}; $fn =~ s/^@/$descr /; $vardescr = $fn || $descr . $pi; $rawval = $s->get($ioid); if ($s->{ErrorNum}) { $RET = 2 unless $RET > 2; $errmsg = "error retrieving $host:$var$pi($ioid): " . $s->{ErrorStr}; print STDERR "$errmsg\n" if $opt{'debug'}; push (@HE, $errmsg); next; } $val = eval ($rawval . 
$VARDEF{$var}{'SCALE'}); $min = $$VARS{$var}{'MIN'}; $max = $$VARS{$var}{'MAX'}; $eq = $$VARS{$var}{'EQ'}; $neq = $$VARS{$var}{'NEQ'}; $minvalid = $$VARS{$var}{'MINVALID'}; $maxvalid = $$VARS{$var}{'MAXVALID'}; $stat = 'OK'; $DEC = $VARDEF{$var}{'DEC'}; $pval = Decode($DEC, $val); $pmin = Decode($DEC, $min); $pmax = Decode($DEC, $max); $peq = Decode($DEC, $eq); $pneq = Decode($DEC, $neq); $pmin = $pmax = $peq if defined($eq); $pmin = $pmax = '!' . $pneq if defined($neq); if(defined($minvalid) && ($val < $minvalid)) { $stat = 'INV<'; syslog('warning', "$host: $vardescr less than lower plausibility limit: $pval"); write if defined $opt{'list'}; next; } if(defined($maxvalid) && ($val > $maxvalid)) { $stat = 'INV>'; syslog('warning', "$host: $vardescr larger than upper plausibility limit: $pval"); write if defined $opt{'list'}; next; } if(defined($min) && ($val < $min)) { $stat = 'FAIL<'; push (@HE, "$vardescr LOW: $pval $VARDEF{$var}{'UNIT'} (<$pmin)"); } if(defined($max) && ($val > $max)) { $stat = 'FAIL>'; push (@HE, "$vardescr HIGH: $pval $VARDEF{$var}{'UNIT'} (>$pmax)"); } if(defined($eq) && ($val != $eq)) { $stat = 'FAIL<>'; push (@HE, "$vardescr: $pval $VARDEF{$var}{'UNIT'} (<> $peq)"); } if(defined($neq) && ($val == $neq)) { $stat = 'FAIL='; push (@HE, "$vardescr: $pval $VARDEF{$var}{'UNIT'} (== $pneq)"); } write if defined $opt{'list'}; } # foreach(index) } # foreach(var) if (@HE) { push (@HOSTS, $host); push (@ERRS, $host . ":\n" . 
join("\n", @HE));
        $RET = 1 unless $RET > 1;   # previous error level 2 takes precedence
    }

}  # foreach(host)

# in case of list output, suppress error listing by exiting here:
exit 0 if defined $opt{'list'};

if ($RET) {
    print "@HOSTS\n\n";
    print join("\n", @ERRS), "\n";
}

exit $RET;


# ----------------------------------------------------------------------
# subroutines begin
# ----------------------------------------------------------------------

#
# decode enumerations
#
sub Decode {
    my ($D, $v) = @_;
    my $dv;

    return $v unless $D;    # can only decode with valid decoder hash
    $dv = $$D{$v} || '?';   # look up value
    return "$dv($v)";
}

#
# read variable definitions from file
#
sub ReadVarDef {
    my ($f) = @_;
    my ($curvar, $keyword, $param);

    $curvar = '';
    open (CF, $f) || return undef;
    while (<CF>) {
        next if (/^\s*#/ || /^\s*$/);
        chomp;
        /^\s*(\w*)\s*(.*)/;
        $keyword = $1;
        $param = $2;
        $curvar = $param if $keyword =~ /Variable/i;
        if($curvar ne '') {
            $VARDEF{$curvar}{'OID'} = $param if $keyword =~ /OID/i;
            $VARDEF{$curvar}{'DESCR'} = $param if $keyword =~ /Descr.*/i;
            $VARDEF{$curvar}{'UNIT'} = $param if $keyword =~ /Unit/i;
            $VARDEF{$curvar}{'SCALE'} = $param if $keyword =~ /Scale/i;
            $VARDEF{$curvar}{'DEFIDX'} = $param if $keyword =~ /DefaultIndex/i;
            $VARDEF{$curvar}{'DEFGRP'} = $param if $keyword =~ /DefaultGroup/i;
            $VARDEF{$curvar}{'DEFMIN'} = $param if $keyword =~ /DefaultMin/i;
            $VARDEF{$curvar}{'DEFMAX'} = $param if $keyword =~ /DefaultMax/i;
            $VARDEF{$curvar}{'DEFEQ'} = $param if $keyword =~ /DefaultEq/i;
            $VARDEF{$curvar}{'DEFNEQ'} = $param if $keyword =~ /DefaultNEq/i;
            $VARDEF{$curvar}{'DEFMINVAL'} = $param if $keyword =~ /DefaultMinValid/i;
            $VARDEF{$curvar}{'DEFMAXVAL'} = $param if $keyword =~ /DefaultMaxValid/i;
            if($keyword =~ /Decode/i) {
                $param =~ /\s*([^\s]+)\s+(.*)$/;
                $VARDEF{$curvar}{'DEC'}{$1} = $2;
            }
            if($keyword =~ /FriendlyName/i) {
                $param =~ /\s*([^\s]+)\s+(.*)$/;
                $VARDEF{$curvar}{'FNAME'}{$1} = $2;
            }
        }
    }  # while()
    close (CF);
    return 1;
}

#
# read list of variables to be
# monitored
#
sub ReadVarList {
    my ($f) = @_;
    my ($curhost, $curvar, $var, $param);

    $curhost = '';
    open (CF, $f) || return undef;
    while (<CF>) {
        next if (/^\s*#/ || /^\s*$/);
        chomp;
        if(/Host\s+(\S+)/i) {
            $curhost = $1;
            $curvar = '';
            next;
        }
        if(/\s+SNMP\s+(\S+)\s+(.+)/i) {
            next unless $curhost;
            # debug output goes to STDERR so it cannot corrupt the summary line read by mon
            print STDERR "READVARLIST($curhost): SNMP: $1 $2\n" if $opt{'debug'};
            $SNMP{$curhost}{lc $1} = $2;
            next;
        }
        if(/\s+FriendlyName\s+([^\s]+)\s+(.+)/i) {
            next unless $curhost;
            next unless $curvar;
            $FRIENDLYNAME{$curhost}{$curvar}{$1} = $2;
            next;
        }
        /^\s+(\S+)\s*(.*)$/;
        $curvar = $1;
        $param = $2;
        if($curhost) {
            $VARLIST{$curhost}{$curvar}{'MIN'} = $VARDEF{$curvar}{'DEFMIN'};
            $VARLIST{$curhost}{$curvar}{'MIN'} = $1 if $param =~ /Min\s+([\d\.]+)/i;
            $VARLIST{$curhost}{$curvar}{'MAX'} = $VARDEF{$curvar}{'DEFMAX'};
            $VARLIST{$curhost}{$curvar}{'MAX'} = $1 if $param =~ /Max\s+([\d\.]+)/i;
            $VARLIST{$curhost}{$curvar}{'EQ'} = $VARDEF{$curvar}{'DEFEQ'};
            $VARLIST{$curhost}{$curvar}{'EQ'} = $1 if $param =~ /Eq\s+([\d\.]+)/i;
            $VARLIST{$curhost}{$curvar}{'NEQ'} = $VARDEF{$curvar}{'DEFNEQ'};
            $VARLIST{$curhost}{$curvar}{'NEQ'} = $1 if $param =~ /NEq\s+([\d\.]+)/i;
            $VARLIST{$curhost}{$curvar}{'MINVALID'} = $VARDEF{$curvar}{'DEFMINVAL'};
            $VARLIST{$curhost}{$curvar}{'MINVALID'} = $1 if $param =~ /MinValid\s+([\d\.]+)/i;
            $VARLIST{$curhost}{$curvar}{'MAXVALID'} = $VARDEF{$curvar}{'DEFMAXVAL'};
            $VARLIST{$curhost}{$curvar}{'MAXVALID'} = $1 if $param =~ /MaxValid\s+([\d\.]+)/i;
            $VARLIST{$curhost}{$curvar}{'IDX'} = $VARDEF{$curvar}{'DEFIDX'};
            $VARLIST{$curhost}{$curvar}{'IDX'} = $1 if $param =~ /Index\s+(.+)$/i;
            $VARLIST{$curhost}{$curvar}{'GROUP'} = $VARDEF{$curvar}{'DEFGRP'};
            $VARLIST{$curhost}{$curvar}{'GROUP'} = $1 if $param =~ /Group\s+(.+)$/i;
        }
    }  # while()
    close (CF);
    return 1;
}

sub ReadSNMPConf {
    my ($f) = @_;
    my $tag;
    my $val;

    if (-r $f) {
        print STDERR "\nsnmpvar.monitor: reading SNMP options from $f\n" if $opt{'debug'};
        open(SNMPCONF, $f) or die "Huh?
$f readable but open fails?";
        while (<SNMPCONF>) {
            chomp;
            next if (/^\s*#/ || /^\s*$/);
            next unless /^\s*(\S+)\s*=\s*(.+)$/;
            $SNMPDEF{ lc $1 } = $2;
            print STDERR "snmpvar.monitor: $1 = $2\n" if $opt{'debug'};
        }
        close SNMPCONF;
    }
    print STDERR "\n\n" if $opt{'debug'};
}

sub GetSNMPArgs {
    my ($host) = @_;
    my $SNMPARGS;

    # Common options
    $SNMPARGS{Version} = $SNMP{$host}{version} || $SNMPDEF{version} || 1;
    $SNMPARGS{RemotePort} = $SNMP{$host}{port} || $opt{'port'} || $SNMPDEF{remoteport} || 161;
    $SNMPARGS{Retries} = $SNMP{$host}{retries} || $opt{'retries'} || $SNMPDEF{retries} || 8;
    $SNMPARGS{Timeout} = $SNMP{$host}{timeout} || $opt{'timeout'} || $SNMPDEF{timeout} || 5;

    # some people may prefer microseconds, but small values should mean seconds:
    $SNMPARGS{Timeout} *= 1000000 if $SNMPARGS{Timeout} < 1000;

    # SNMP v.1/v.2 options
    if ($SNMPARGS{Version} < 3) {
        $SNMPARGS{Community} = $SNMP{$host}{community} || $opt{'community'} || $SNMPDEF{community} || 'public';
    }

    # SNMP v.3 options
    if ($SNMPARGS{Version} == 3) {
        $SNMPARGS{SecName} = $SNMP{$host}{secname} || $SNMPDEF{secname} || 'initial';
        $SNMPARGS{SecLevel} = $SNMP{$host}{seclevel} || $SNMPDEF{seclevel} || 'noAuthNoPriv';
        $SNMPARGS{AuthPass} = $SNMP{$host}{authpass} || $SNMPDEF{authpass} || '';
        $SNMPARGS{SecEngineId} = $SNMP{$host}{secengineid} || $SNMPDEF{secengineid} || '';
        $SNMPARGS{ContextEngineId} = $SNMP{$host}{contextengineid} || $SNMPDEF{contextengineid} || '';
        $SNMPARGS{Context} = $SNMP{$host}{context} || $SNMPDEF{context} || '';
        $SNMPARGS{AuthProto} = $SNMP{$host}{authproto} || $SNMPDEF{authproto} || 'MD5';
        $SNMPARGS{PrivProto} = $SNMP{$host}{privproto} || $SNMPDEF{privproto} || 'DES';
        $SNMPARGS{PrivPass} = $SNMP{$host}{privpass} || $SNMPDEF{privpass} || '';
    }

    return %SNMPARGS;
}

format STDOUT_TOP =
Host Variable min value max stat
----------------------------------------------------------------------------
.
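#
# Illustration only (not part of the original monitor): a minimal sketch of
# how GetSNMPArgs above appears to resolve an option. A per-host "SNMP" entry
# wins over the command line, which wins over snmpopt.cf, which wins over the
# built-in default; timeouts below 1000 are then read as seconds and converted
# to microseconds. The helper names first_true() and normalize_timeout() and
# the sample values are hypothetical.
#

```perl
use strict;
use warnings;

# Return the first true value, mirroring a chain of "||" operators.
sub first_true {
    for my $v (@_) {
        return $v if $v;
    }
    return undef;
}

# Small values are taken as seconds and scaled to microseconds;
# values of 1000 or more are assumed to be microseconds already.
sub normalize_timeout {
    my ($t) = @_;
    return $t < 1000 ? $t * 1_000_000 : $t;
}

my %per_host = ();                  # no per-host "SNMP timeout" entry
my %cmdline  = (timeout => 3);      # as if --timeout=3 had been given
my %file_def = (timeout => 10);     # as if snmpopt.cf said "timeout = 10"

my $timeout = normalize_timeout(
    first_true($per_host{timeout}, $cmdline{timeout}, $file_def{timeout}, 5)
);
print "effective timeout: $timeout microseconds\n";
```

# Because the chain uses truthiness, a configured value of 0 falls through to
# the next source instead of being honoured.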
format STDOUT = @<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<< @>>>>> @>>>>>> @<<< @>>>>> @<<<<< $host, $vardescr, $pmin, $pval, $VARDEF{$var}{'UNIT'}, $pmax, $stat . mon-1.2.0/mon.d/process.monitor0000755003616100016640000000551510614100573016317 0ustar trockijtrockij#!/usr/bin/perl # # Monitor snmp processes # # Arguments are: # # [-c community] host [host ...] # # This script will exit with value 1 if host:community has prErrorFlag # set. The summary output line will be the host names that failed # and the name of the process. The detail lines are what UCD snmp returns # for an prErrMessage. ('Too (many|few) (name) running (# = x)'). # If there is an SNMP error (either a problem with the SNMP libraries, # or a problem communicating via SNMP with the destination host), # this script will exit with a warning value of 2. # # There probably should be a better way to specify a given process to # watch instead of everything-ucd-snmp-is-watching. # # $Id: process.monitor,v 1.1.1.1.4.1 2007/04/26 10:39:55 trockij Exp $ # # # Copyright (C) 1998, Brian Moore # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use SNMP; use Getopt::Std; $ENV{'MIBS'} = "UCD-SNMP-MIB"; getopts("c:"); $community = $opt_c || 'public'; $RETVAL = 0; foreach $host (@ARGV) { $session = new SNMP::Session(DestHost => $host, Version => 2, Community => $community); if (!defined ($session)) { $RETVAL = ($RETVAL == 1) ? 1 : 2; push @failures, "$host session error"; push @longerr, "$host could not get SNMP session"; next; } my $v = new SNMP::Varbind (["prIndex"]); $session->getnext ($v); while (!$session->{"ErrorStr"} && $v->tag eq "prIndex") { my @q = $session->get ([ ["prNames", $v->iid], ["prMin", $v->iid], ["prMax", $v->iid], ["prCount", $v->iid], ["prErrorFlag", $v->iid], ["prErrMessage", $v->iid], # ["prErrFix", $v->iid], ]); last if ($session->{"ErrorStr"}); if ($q[4] > 0) { $RETVAL = 1; push (@failures, $host); push (@longerr, "$host: count=$q[3] min=$q[1] max=$q[2] err=$q[5]"); } $session->getnext ($v); } if ($session->{"ErrorStr"}) { push (@failures, $host); push (@longerr, "$host returned an SNMP error: " . 
$session->{"ErrorStr"}); $RETVAL = 1; } } if (@failures) { print join (", ", sort @failures), "\n", "\n"; print join ("\n", @longerr), "\n"; } exit $RETVAL; mon-1.2.0/mon.d/mon.monitor0000755003616100016640000000311710616437073015440 0ustar trockijtrockij#!/usr/bin/perl # # mon.monitor # # monitor mon server # # Jim Trocki # # $Id: mon.monitor,v 1.1.1.1.4.1 2007/05/03 19:55:39 trockij Exp $ use strict; use English; use Mon::Client; use Getopt::Std; my %opt; getopts ('u:p:P:t:', \%opt); my @failures; my @details; my $TIMEOUT = 30; $TIMEOUT = $opt{"t"} if ($opt{"t"}); foreach my $host (@ARGV) { my $c = new Mon::Client ( "host" => $host, ); if ($opt{"p"}) { $c->port ($opt{"p"}); } eval { local $SIG{"ALRM"} = sub { die "Timeout Alarm" }; alarm $TIMEOUT; if (!defined $c->connect) { push @failures, $host; push @details, "$host: " . $c->error; undef $c; next; } if ($opt{"u"} && $opt{"P"}) { if (! defined $c->login ( "username" => $opt{"u"}, "password" => $opt{"P"}, )) { push @failures, $host; push @details, "$host: " . $c->error; undef $c; next; } } my @st = $c->list_state; if ($c->error ne "") { push @failures, $host; push @details, "$host: " . $c->error; $c->disconnect; undef $c; next; } push @details, "$host: @st"; if (!defined $c->disconnect) { push @failures, $host; push @details, "$host: could not disconnect, " . 
$c->error; undef $c; next; } undef $c; }; if ($EVAL_ERROR =~ "Timeout Alarm") { push @failures, $host; push @details, "$host: timeout"; } } if (@failures) { print join (" ", sort @failures), "\n"; } else { print "no failures\n"; } if (@details) { print join ("\n", @details), "\n"; } if (@failures) { exit 1; } exit 0; mon-1.2.0/mon.d/dns-query.monitor0000755003616100016640000000414510061516616016573 0ustar trockijtrockij#!/usr/bin/perl # # very straightforward dns monitor for use with "mon" # # arguments: # -t timeout timeout (defaults to 5 seconds) # -n name name to query, defaults to "mailhost" # # $Id: dns-query.monitor,v 1.1.1.1 2004/06/09 05:18:06 trockij Exp $ # # Copyright (C) 2003, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use strict; use Getopt::Std; use Net::DNS::Resolver; my %opt; getopts ("t:n:", \%opt); my $TIMEOUT = defined $opt{"t"} ? 
$opt{"t"} : 5;
my $NAME = $opt{"n"} || "mailhost";

my $r = Net::DNS::Resolver->new;

if (!defined $r) {
    die "could not create new Net::DNS::Resolver object\n";
}

$r->udp_timeout ($TIMEOUT);

my (%good, %bad);

foreach my $server (@ARGV) {
    $r->nameservers ($server);
    my $p = $r->search ($NAME);
    if (!defined $p) {
        $bad{$server}->{"detail"} = $r->errorstring;
    } else {
        my $n = $p->{"answer"}->[0];
        $good{$server}->{"detail"} = "$n->{name} $n->{class} $n->{type} $n->{address}";
    }
}

if (keys %bad) {
    print join (" ", sort keys %bad), "\n";
} else {
    print "\n";
}

if (keys %bad) {
    print "failures:\n";
    foreach my $k (keys %bad) {
        print " $k: $bad{$k}->{detail} ($NAME)\n";
    }
    print "\n";
}

if (keys %good) {
    print "successes:\n";
    foreach my $k (keys %good) {
        print " $k: successful lookup for $good{$k}->{detail} ($NAME)\n";
    }
}

exit 1 if (keys %bad);
exit 0;
mon-1.2.0/mon.d/na_quota.monitor0000755003616100016640000003344710061516614016460 0ustar trockijtrockij
#!/usr/bin/perl -w
#
# "mon" monitor to detect quotas near their limits on Network Appliance
# filers using SNMP
#
# Originally by Jim Trocki
# Updated by Theo Van Dinter (tvd@colltech.com, felicity@kluge.net) (c) 2001
# $Id: na_quota.monitor,v 1.1.1.1 2004/06/09 05:18:04 trockij Exp $
#
# Invoke from mon via:
# monitor na_quota.monitor [-c snmp community] [-f configuration file] ;;
#
# This script uses a configuration file to determine the alert points and
# which filers to probe. The configuration file format is as follows:
# filer_name type volume name size_diff files_diff
#
# filer_name = hostname of a filer (ie: toaster)
# type = quota type, either tree or user
# volume = the volume to check, can be "*" to set the default
# name = the name of the qtree, can be "*" to set the default
# size_diff = the amount of free space available to cause an alert.
# "# [kmgt]b", "#.# [kmgt]b", "#" (assumes KB), or "#%"
# files_diff = the number of files available to cause an alert.
# "#" or "#%" # # For values with whitespace, enclose in quotes. (ie: "30 KB") # Use "-" for no alert, files_diff can be left off if unused. # Either size_diff or files_diff must be defined ("-" for both isn't allowed). # Volume and Name can be "*" for "all". It will be overridden as appropriate: # i.e.: toaster tree vol0 * 20% # default all trees in vol0 to 20% # toaster tree vol0 foo 10% # vol0 tree foo will use 10% instead # toaster tree vol0 bar 1GB # vol0 tree bar will use 1GB instead # toaster tree vol0 baz 0 # vol0 tree baz will only alert when # # size used == size limit # # Alerts occur when the used space/files is within "size_diff" or # "files_diff" from the limit. So if a size limit is at 100KB and used # is 80KB, the free space is 20KB or 20%. An alert will occur if the # size_diff is >= 20KB or >=20%. Note: The percentage is a rounded # integer, so 10% actually means 9.5 - 10.4% or below 10.4%... # # # This program is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by the # Free Software Foundation; either version 2 of the License, or (at your # option) any later version. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. # use SNMP; use strict; use Getopt::Std; use Text::ParseWords; use vars qw/ $opt_f $opt_c $host %failures /; $|++; $ENV{"MIBS"} = "NETWORK-APPLIANCE-MIB"; # We need the NetApp MIB loaded ... 
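#
# Illustration only (not part of the original monitor): a minimal sketch of
# the alert rule described in the header above, using its own example of a
# 100 KB limit with 80 KB used. The helper name quota_alert() is hypothetical;
# it assumes a limit greater than zero, a case the monitor checks separately.
#

```perl
use strict;
use warnings;

# Return 1 if the free amount (limit minus used) is at or below the
# configured diff, which is either an absolute number or "N%".
sub quota_alert {
    my ($limit, $used, $diff) = @_;
    my $free = $limit - $used;

    if ($diff =~ /^(\d+)\s*%$/) {
        my $pct = $1;
        # Free percentage rounded to the nearest integer, as in the header note.
        my $free_pct = int($free / $limit * 100 + 0.5);
        return $free_pct <= $pct ? 1 : 0;
    }
    return $free <= $diff ? 1 : 0;
}

# Limit 100 KB, used 80 KB: 20 KB, or 20%, is free.
for my $diff ('20%', '10%', 20, 19) {
    printf "size_diff %-4s -> %s\n", $diff,
        quota_alert(100, 80, $diff) ? "alert" : "ok";
}
```

# Because of the rounding, 10.4% free still triggers a "10%" setting.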
getopts ('c:f:');
$opt_f ||= "/usr/lib/mon/na_quota.cf";  # Configuration File
$opt_c ||= "public";                    # Community

my $CONF = read_cf($opt_f);
die "error reading config file $opt_f: $CONF\n" unless ( ref $CONF );

# Go through each specified host
foreach $host (@ARGV) {
    if (!defined $CONF->{$host}) {
        $failures{$host} = "$host specified on the command line but not defined in $opt_f";
        next;
    }

    my $q = retrieve_quotas ($host, $opt_c);    # Grab the quotas from $host

    # If $q isn't a reference, it's an error string.
    unless (ref $q) {
        $failures{$host} = "could not retrieve quotas from $host: $q";
        next;
    }

    # Check for user quotas, then tree quotas.
    foreach ( "user", "tree" ) {
        my $fails = quota_check($host,$_,$CONF,$q);
        @failures{keys %{$fails}} = values %{$fails} if ( defined $fails );
    }
}

# Display failures if there were any, then exit with 0 or 1 appropriately.
my $retval = 0;
if (%failures) {    # "defined %hash" is rejected by current versions of perl
    print join("\n",join(" ",sort keys %failures),"",values %failures,"");
    $retval = 1;
}
exit $retval;


# Read in the configuration file
#
# Input : Path to the configuration file
# Output: Hash reference of the configuration
# $ref->{filer_name}->{quota_type}->{volume}->{quota_name}->{"files" or "size"} = diff
#
sub read_cf {
    my $cf = shift;
    my $error = undef;
    my($filer,$type,$user,$path);
    my $CONF = undef;

    # Open the configuration file or return the error string
    open(CF, "<$cf") || return $!;

    while (defined($_=<CF>)) {
        chomp;
        s/\s*#.*$//;            # kill comments
        s/^\s*//; s/\s*$//;     # get rid of pre-suff whitespace
        next unless /\S/;       # skip blank lines

        # Use quotewords to allow for quotas with spaces, etc.
        my($filer,$type,$volume,$name,$size,$files) = &quotewords('\s+',0,$_);

        # Allow "-" to mean undefined.
$size = undef if ( defined $size and $size eq "-" ); $files = undef if ( defined $files and $files eq "-" ); unless ( defined $filer && $filer =~ /^[a-z0-9_.-]+$/i ) { $error = "invalid filer name specified, $filer, line $."; last; } unless ( defined $type && $type =~ /^(tree|user)$/i ) { $error = "invalid quota type specified, $type, line $."; last; } unless ( defined $volume && ( $volume eq "*" || $volume =~ /^[_a-z][_a-z0-9]*$/i ) ) { $error = "invalid volume specified, $volume, line $."; last; } unless ( defined $name ) { $error = "invalid name specified, $name, line $."; last; } unless ( defined $size || defined $files ) { $error = "invalid line, no size_quota or file_quota, line $."; last; } # Convert the filer and type to lowercase, the rest are case-sensitive $filer = lc $filer; $type = lc $type; # If we have a KB limit and it's a valid limit ... if ( defined $size && defined($size = to_kb($size)) ) { $CONF->{$filer}->{$type}->{$volume}->{$name}->{"size"} = $size; } elsif ( defined $size ) { $error = "invalid size specification, $size, line $."; last; } # If we have a file limit and it's a valid limit ... if ( defined $files && $files =~ /^\d+\s*\%?$/ ) { $CONF->{$filer}->{$type}->{$volume}->{$name}->{"files"} = $files; } elsif ( defined $files ) { $error = "invalid files specification, $files, line $."; last; } } close (CF); # Return either the configuration HASH or the error string. return ( defined $error ) ? $error : $CONF; } # Convert given units to KB # # Input : Scalar value that is one of the following: # "# xB" (x=[kmgt]), "#" (assume KB), or "#%" # Output: integer "#" in KB or "#%" (passthru) # sub to_kb { my $value = shift; my ($num, $unit); if ($value =~ /^(\d+(?:\.\d+)?)\s*([kmgt])b$/i) { # "# xB" ($num, $unit) = ($1, lc $2); } elsif ( $value =~ /^\d+\s*\%?$/ ) { # "#%" or "#" (assume KB) return $value; } else { # who knows? error out. return undef; } # Figure out the prefix xB -> KB conversion ratio. Leave as KB by default. 
my $mval = ($unit eq "m") ? 1024 : ($unit eq "g") ? 1048576 : ($unit eq "t") ? 1073741824 : 1; return (int ($num*$mval)); } # Convert given # of KB into a more displayable string (MB, GB, etc.) # # Input : Scalar value of KB. Any non-numeric chars are stripped out. # Output: String in the format "#.##xB" where x is [KMGT]. ie: 10 becomes "10KB". # sub from_kb { my $value = shift; my @prefix = qw/ T G M K /; my $index = $#prefix; return undef unless defined $value; $value =~ tr/0-9//cd; # we only handle numbers (KB) while ( $value > 1024 && $index >= 0 ) { # Run until we can't go any further! $index--; $value /= 1024; } return sprintf "%.2f%sB", $value, $prefix[$index]; # Return the formatted string } # Retrieve the quota information from the NetApp via SNMP. # # Input : Hostname and SNMP Community (defaults to public) # Output: Hash reference of the quota information # $ref->{"user"}->{volume}->{username or uid}->{info} = value # $ref->{"tree"}->{volume}->{tree_name}->{info} = value # where info is: "qrVKBytesUsed", "qrVKBytesLimit", "qrVFilesUsed", "qrVFileLimit" # sub retrieve_quotas { my $host = shift; # Hostname my $community = shift || "public"; # SNMP Community, "public" by default my $quotas = undef; # Hash of quota information my %volnames = (); # Hash of volume names for reference # Establish the SNMP session if possible. 
my $s = new SNMP::Session ( DestHost => $host, Community => $community || "public", "Version" => 2, UseEnums => 1, ); if (!defined $s) { return "could not create SNMP session" } # Map volume numbers to names for use later on my $v = new SNMP::Varbind (["qvStateVolume"]); $s->getnext ($v); while (!$s->{"ErrorStr"} && $v->tag eq "qvStateVolume") { my @q = $s->get ([ ["qvStateVolume", $v->iid], ["qvStateName", $v->iid], ]); last if ($s->{"ErrorStr"}); $volnames{$q[0]} = $q[1]; $s->getnext ($v); } if ($s->{"ErrorStr"}) { return $s->{"ErrorStr"}; } # Get the quota information $v = new SNMP::Varbind (["qrVIndex"]); $s->getnext ($v); while (!$s->{"ErrorStr"} && $v->tag eq "qrVIndex") { # go through each qrVIndex my @q = $s->get ([ ["qrVType", $v->iid], ["qrVId", $v->iid], ["qrVKBytesUsed", $v->iid], ["qrVKBytesLimit", $v->iid], ["qrVFilesUsed", $v->iid], ["qrVFileLimit", $v->iid], ["qrVPathName", $v->iid], ["qrVVolume", $v->iid], ["qrVTree", $v->iid], ]); last if ($s->{"ErrorStr"}); # exit if there's a problem # Skip the crap quotas... if ( $q[0] ne "qrVTypeUnknown" && $q[0] ne "qrVTypeUserDefault" ) { # Turn qrVTypeUser and qrVTypeTree into "user" and "tree" if ( $q[0] =~ /^qrVType(User|Tree)$/ ) { $q[0] =~ s/^.+(User|Tree)$/\L$1/; } else { return "Unknown quota type ($q[0]) returned from filer!"; } # Map volume number to volume name $q[7] = $volnames{$q[7]}; # Map UID to Username if possible, use the system ... if ($q[0] eq "user"){ my($user) = (getpwuid($q[1]))[0]; $q[1] = $user if defined $user; } # Setup hash of quotas. type -> vol -> name -> key = value my $id = ( $q[0] eq "user" ) ? $q[1] : $q[8]; $quotas->{$q[0]}->{$q[7]}->{$id}->{"qrVKBytesUsed"} = $q[2]; $quotas->{$q[0]}->{$q[7]}->{$id}->{"qrVKBytesLimit"} = $q[3]; $quotas->{$q[0]}->{$q[7]}->{$id}->{"qrVFilesUsed"} = $q[4]; $quotas->{$q[0]}->{$q[7]}->{$id}->{"qrVFilesLimit"} = $q[5]; } $s->getnext ($v); # go on to the next one } # If we errored out, return the error. Otherwise return the hash reference. 
    return ( $s->{"ErrorStr"} ) ? $s->{"ErrorStr"} : $quotas;
}

# This subroutine will check both tree and user quotas. Note: It's fairly nasty.
# It was much much worse at one point, but it's a bit cleaner now. There's one routine
# to handle both the tree and user quotas now instead of one for each type (size/files).
#
# Input : hostname, type (user|tree), configuration hash ref, quota information hash ref
# Output: hash reference of failures, or undef if there are no failures.
#
sub quota_check {
    my($host,$type,$CONF,$q) = @_;
    my $failures = undef;

    my $actvolume;
    # sort by volume name (unnecessary, but (*) needs to be last...)
    foreach $actvolume ( sort { ($a eq "*")?1:($b eq "*")?-1:$a cmp $b; } keys %{$CONF->{$host}->{$type}} ) {
        my $names = $CONF->{$host}->{$type}->{$actvolume};
        my @volumes = ();
        my $done = ();

        # Generate the appropriate volume list
        if ( $actvolume eq "*" ) {
            @volumes = grep(!exists $done->{$host}->{$type}->{$_}, keys %{$q->{$type}});
        } else {
            @volumes = ( $actvolume );
        }

        my $volume;
        foreach $volume ( @volumes ) {
            my $actname;
            # sort by name (unnecessary, but (*) needs to be last...)
            foreach $actname ( sort { ($a eq "*")?1:($b eq "*")?-1:$a cmp $b; } keys %{$names} ) {
                my $qtype = $names->{$actname};    # quota information
                my @names = ();

                # Generate the appropriate quota name list
                if ( $actname eq "*" ) {
                    @names = grep(!exists $done->{$host}->{$type}->{$volume}->{$_}, keys %{$q->{$type}->{$volume}});
                } else {
                    @names = ( $actname );
                }

                my $name;
                foreach $name ( @names ) {
                    # Keep track of which stuff we've checked
                    $done->{$host}->{$type}->{$volume}->{$name}++;

                    # If a configured check isn't quota-ed, report it as error.
if ( exists $q->{$type}->{$volume}->{$name} ) { my %info = %{$q->{$type}->{$volume}->{$name}}; my $sorf; foreach $sorf ( "size", "files" ) { my $kbofi = ($sorf eq "size")?"KBytes":"Files"; my $limit = $info{"qrV${kbofi}Limit"}; my $used = $info{"qrV${kbofi}Used"}; if ( exists $qtype->{$sorf} ) { # Verify that the usage is being limited if ( $limit < 0 ) { $failures->{"$host:$volume:$name"}="requested quota ('$host $type $volume $name $sorf') isn't limited on the filer" unless ( $actname eq "*" ); next; } elsif ( $limit == 0 ) { $failures->{"$host:$volume:$name"}="requested quota ('$host $type $volume $name $sorf') has a limit of 0 $sorf on the filer"; next; } # Percentage and # free/left # Make sure to round fpct ... ;) my $fkb = $limit-$used; my $fpct = int($fkb/$limit*100+0.5); if ( $qtype->{$sorf} =~ /^(\d+)\s*\%$/ ) { my $pct = $1; if ( $fpct <= $pct ) { my $msg = "$type $sorf quota $host:$volume:$name has $fpct% "; $msg.=($sorf eq "files")?"files left": "(".from_kb($fkb).") free"; $msg.=" <= $pct% ($actvolume:$actname)"; $failures->{"$host:$volume:$name"}=$msg; } } else { # non-percent if ( $fkb <= $qtype->{$sorf} ) { my $msg = "$type $sorf quota $host:$volume:$name has "; $msg.=($sorf eq "files")?"$fkb files left": from_kb($fkb)." free"; $msg.=" <= "; $msg.=($sorf eq "files")?$qtype->{$sorf}: from_kb($qtype->{$sorf}); $msg.=" ($actvolume:$actname)"; $failures->{"$host:$volume:$name"}=$msg; } } } } } else { $failures->{"$host:$volume:$name"}="requested quota ('$host $type $volume $name') doesn't exist on filer" unless ( $actname eq "*" ); next; } } } } } return $failures; } mon-1.2.0/mon.d/nntp.monitor0000755003616100016640000001164710616437074015636 0ustar trockijtrockij#!/usr/bin/perl # # nntp.monitor # example: monitor nntp.monitor -g groupname -u user -a password # # Connect to an nntp server which possibly requires authentication, and # wait for the right output. # # For use with "mon". # # Arguments are "-p port -t timeout [-g group] [-f] host [host...] 
-u username -a password" # # This monitor connects to the NNTP server(s), checks for a greeting, logs in, # then performs a "mode reader" and a "group (groupname)", and then disconnects. # If the group is not specified by the -g option, then "control" is assumed. # # if "-f" is supplied, then it is assumed that a feeder is being tested, # and the "mode reader" and "group (groupname)" commands are not executed. # # Adapted from "http.monitor" by # Jim Trocki, trockij@arctic.org # authentication support added by # Kai Schaetzl/conactive.com # # http.monitor written by # # Jon Meek # American Cyanamid Company # Princeton, NJ # # $Id: nntp.monitor,v 1.2.2.1 2007/05/03 19:55:40 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use Getopt::Std; use English; getopts ("fg:p:t:d:u:a:"); $GROUP = $opt_g || 'control'; $PORT = $opt_p || 119; $TIMEOUT = $opt_t || 30; $FEEDER = $opt_f; $DEBUG = $opt_d || ""; $USER = $opt_u || ''; $PASS = $opt_a || ''; @failures = (); foreach $host (@ARGV) { if (! 
&nntpGET($host, $PORT)) {
        push (@failures, $host);
    }
}

if (@failures == 0) {
    exit 0;
}

print join (" ", sort @failures), "\n";

exit 1;


sub nntpGET {
    use Socket;
    use Sys::Hostname;

    my($Server, $Port) = @_;
    my($ServerOK, $TheContent);

    $ServerOK = 0;

    $TheContent = '';

    $Path = '/';

###############################################################
    eval {

        local $SIG{ALRM} = sub { die "Timeout Alarm" };
        alarm $TIMEOUT;
        $result = &OpenSocket($Server, $Port); # Open a connection to the server

        if ($result == 0) { # Failure to open the socket
            alarm 0; # Cancel the alarm, as every other early return does
            print "Unable to open socket\n" if $DEBUG;
            return '';
        }

        #
        # welcome message
        #
        $in = <S>;
        if ($in !~ /^2\d\d/) {
            alarm 0;
            print "No welcome message\n" if $DEBUG;
            return 0;
        }

        if (!$FEEDER) {

            if ($USER ne "") {
                # user
                print S "authinfo user $USER\r\n";
                $in = <S>;
                if ($in !~ /^381/) {
                    alarm 0;
                    print "No reaction to authinfo user\n" if $DEBUG;
                    return 0;
                }
                # password
                print S "authinfo pass $PASS\r\n";
                $in = <S>;
                if ($in !~ /^281/) {
                    alarm 0;
                    print "No reaction to authinfo pass or wrong password\n" if $DEBUG;
                    return 0;
                }
            }

            #
            # mode reader, wait for OK response
            #
            print S "mode reader\r\n";
            $in = <S>;
            if ($in !~ /^2\d\d/) {
                alarm 0;
                print "Unable to perform 'mode reader'\n" if $DEBUG;
                return 0;
            }

            #
            # select $GROUP group, wait for OK response
            #
            print S "group $GROUP\r\n";
            $in = <S>;
            if ($in !~ /^2\d\d/) {
                alarm 0;
                print "Unable to select group '$GROUP'\n" if $DEBUG;
                return 0;
            }
        }

        #
        # log out
        #
        print S "quit\r\n";
        $in = <S>;
        if ($in !~ /^2\d\d/) {
            alarm 0;
            print "No response on 'quit' command.\n" if $DEBUG;
            return 0;
        }

        $ServerOK = 1;

        close(S);
        alarm 0; # Cancel the alarm

    };

    if ($EVAL_ERROR and ($EVAL_ERROR =~ /^Timeout Alarm/)) {
        print "**** Time Out\n";
        return 0;
    }
    return $ServerOK;

}

sub OpenSocket {
#
# Make a Berkeley socket connection between this program and a TCP port
# on another (or this) host.
Port can be a number or a named service # local($OtherHostname, $Port) = @_; local($OurHostname, $sockaddr, $name, $aliases, $proto, $type, $len, $ThisAddr, $that); $OurHostname = &hostname; ($name, $aliases, $proto) = getprotobyname('tcp'); ($name, $aliases, $Port) = getservbyname($Port, 'tcp') unless $Port =~ /^\d+$/; ($name, $aliases, $type, $len, $ThisAddr) = gethostbyname($OurHostname); ($name, $aliases, $type, $len, $OtherHostAddr) = gethostbyname($OtherHostname); my $that = sockaddr_in ($Port, $OtherHostAddr); $result = socket(S, &PF_INET, &SOCK_STREAM, $proto) || return undef; $result = connect(S, $that) || return undef; select(S); $| = 1; select(STDOUT); # set S to be un-buffered return 1; # success } mon-1.2.0/mon.d/Makefile0000644003616100016640000000145010061516615014663 0ustar trockijtrockij# # $Id: Makefile,v 1.1.1.1 2004/06/09 05:18:05 trockij Exp $ # # compiles on Linux, Solaris 2.5, Solaris 2.6, and AIX Version 4.2 # CC = gcc CFLAGS = -O2 -Wall -g LDFLAGS = LDLIBS = # uncomment next line for Solaris # LDLIBS = -lnsl -lsocket MONPATH=/usr/lib/mon DIALIN_MONITOR_REAL=$(MONPATH)/mon.d/dialin.monitor PROGS = rpc.monitor dialin.monitor.wrap all: $(PROGS) rpc.monitor: rpc.monitor.c $(CC) -o rpc.monitor $(CFLAGS) $(LDFLAGS) rpc.monitor.c $(LDLIBS) dialin.monitor.wrap: dialin.monitor.wrap.c $(CC) -o dialin.monitor.wrap $(CFLAGS) $(LDFLAGS) \ -DREAL_DIALIN_MONITOR=\"$(DIALIN_MONITOR_REAL)\" \ dialin.monitor.wrap.c clean: rm -f $(PROGS) install: install -d $(MONPATH)/mon.d install rpc.monitor $(MONPATH)/mon.d/ install -g uucp -m 02555 dialin.monitor.wrap $(MONPATH)/mon.d/ mon-1.2.0/mon.d/rpc.monitor.c0000644003616100016640000002550010061516614015642 0ustar trockijtrockij/* * $Id: rpc.monitor.c,v 1.1.1.1 2004/06/09 05:18:04 trockij Exp $ * * a monitor for RPC services, contains some code from rpcinfo(8) * * Copyright (C) 1986 Sun Microsystems, Inc. * Copyright (C) 1998 Daniel Quinlan * * Sun RPC is a product of Sun Microsystems, Inc. 
and is provided for * unrestricted use provided that this legend is included on all tape * media and as a part of the software program in whole or part. Users * may copy or modify Sun RPC without charge, but are not authorized * to license or distribute it to anyone else except as part of a product or * program developed by the user. * * SUN RPC IS PROVIDED AS IS WITH NO WARRANTIES OF ANY KIND INCLUDING THE * WARRANTIES OF DESIGN, MERCHANTIBILITY AND FITNESS FOR A PARTICULAR * PURPOSE, OR ARISING FROM A COURSE OF DEALING, USAGE OR TRADE PRACTICE. * * Sun RPC is provided with no support and without any obligation on the * part of Sun Microsystems, Inc. to assist in its use, correction, * modification or enhancement. * * SUN MICROSYSTEMS, INC. SHALL HAVE NO LIABILITY WITH RESPECT TO THE * INFRINGEMENT OF COPYRIGHTS, TRADE SECRETS OR ANY PATENTS BY SUN RPC * OR ANY PART THEREOF. * * In no event will Sun Microsystems, Inc. be liable for any lost revenue * or profits or other special, indirect and consequential damages, even if * Sun has been advised of the possibility of such damages. * * Sun Microsystems, Inc. 
* 2550 Garcia Avenue * Mountain View, California 94043 */ #if (__svr4__ && __sun__) #define SOLARIS #endif #include #include #include #include #include #include #include #include #ifdef SOLARIS #include #include #endif #include #include #include #include #include #define DEFAULT_TIMEOUT 10 struct failure_ent { char *host; char *reason; struct failure_ent *next; }; struct program_ent { int number; int required; int listed; struct program_ent *next; }; /* function prototypes */ void test_host(char *host); void test_programs (char *host, struct pmaplist *head); void test_failure(char *host, char *reason); void add_program(char *program, int required); void print_failures(); int get_rpc_number(char *program); #ifndef SOLARIS int get_inet_address(struct sockaddr_in *, char *); #endif void usage(); void alarm_nop(); /* global variables */ int opt_all; /* test all registered programs */ int opt_programs; /* test specific programs */ int opt_verbose; /* be verbose */ int timeout; /* host timeout */ char *binary; /* name of this program */ struct program_ent *programs; /* list of programs to test */ struct failure_ent *failures; /* list of test failures */ int main(int argc, char **argv) { int c; /* set defaults for global variables */ opt_all = 0; opt_programs = 0; opt_verbose = 0; timeout = DEFAULT_TIMEOUT; programs = NULL; failures = NULL; binary = strdup(argv[0]); while ((c = getopt(argc, argv, "t:avhp:r:")) != EOF) { switch(c) { case 't': timeout = atoi(optarg); break; case 'a': opt_all = 1; break; case 'p': opt_programs = 1; add_program(optarg, 0); break; case 'r': opt_programs = 1; add_program(optarg, 1); break; case 'v': opt_verbose = 1; break; case 'h': default: usage(); break; } } while (optind < argc) { test_host(argv[optind++]); } if (failures) { print_failures(); exit(1); } exit(0); } void add_program(char *program, int required) { struct program_ent *new, *tmp; new = (struct program_ent *) malloc(sizeof(struct program_ent)); new->number = 
get_rpc_number(program); new->required = required; new->listed = 0; new->next = NULL; if (opt_verbose) { printf("adding test: %s, %d, %s\n", program, new->number, required ? "required" : "unrequired"); } if (programs == NULL) { programs = new; } else { tmp = programs; while (tmp->next != NULL) { tmp = tmp->next; } tmp->next = new; } } int want_test(int number) { struct program_ent *tmp; tmp = programs; while (tmp != NULL) { if (number == tmp->number) { return 1; } tmp = tmp->next; } return 0; } void test_failure(char *host, char *reason) { struct failure_ent *tmp, *new; /* chomp */ if (reason[strlen(reason) - 1] == '\n') { reason[strlen(reason) - 1] = '\0'; } new = (struct failure_ent *) malloc(sizeof(struct failure_ent)); new->host = strdup(host); new->reason = strdup(reason); new->next = NULL; if (failures == NULL) { failures = new; } else { tmp = failures; while (tmp->next != NULL) { tmp = tmp->next; } tmp->next = new; } } void print_failures() { struct failure_ent *tmp; char *last = NULL; /* print failed hosts, removing consecutive duplicates */ tmp = failures; while (tmp != NULL) { if (!last || strcmp(last, tmp->host)) { printf("%s%s", last ? " " : "", tmp->host); } last = tmp->host; tmp = tmp->next; } printf("\n"); /* print reasons */ tmp = failures; while (tmp != NULL) { printf("%s: %s\n", tmp->host, tmp->reason); tmp = tmp->next; } } void test_host(char *host) { struct pmaplist *head = NULL; struct timeval tv; CLIENT *client; #ifndef SOLARIS struct sockaddr_in addr; int socket = RPC_ANYSOCK; #endif tv.tv_sec = timeout; tv.tv_usec = 0; #ifdef SOLARIS if ((client = clnt_create_timed(host, PMAPPROG, PMAPVERS, "tcp", &tv)) == NULL) { #else if (! 
get_inet_address(&addr, host)) { test_failure(host, "gethostbyname failed"); return; } addr.sin_port = htons(PMAPPORT); signal(SIGALRM, alarm_nop); alarm(timeout); if ((client = clnttcp_create(&addr, PMAPPROG, PMAPVERS, &socket, 0, 0)) == NULL) { alarm(0); #endif test_failure(host, clnt_spcreateerror("clnttcp_create failed")); return; } #ifndef SOLARIS alarm(0); #endif clnt_control(client, CLSET_TIMEOUT, (char *) &tv); if (clnt_call(client, PMAPPROC_DUMP, (xdrproc_t) xdr_void, NULL, (xdrproc_t) xdr_pmaplist, (char *) &head, tv) != RPC_SUCCESS) { test_failure(host, clnt_sperror(client, "clnt_call failed")); clnt_destroy(client); return; } if (head == NULL) { test_failure(host, "no remote programs registered"); clnt_destroy(client); return; } if (opt_all || opt_programs) { test_programs(host, head); } clnt_destroy(client); } void test_programs (char *host, struct pmaplist *head) { struct timeval tv; CLIENT *tclient; struct protoent *proto; char failtext[80]; int testing; struct rpcent *rpc; struct program_ent *ptmp; tv.tv_sec = timeout; tv.tv_usec = 0; if (opt_verbose) printf("testing host program vers proto port\n"); /* reset listed flag */ for (ptmp = programs; ptmp != NULL; ptmp = ptmp->next) ptmp->listed = 0; for (; head != NULL; head = head->pml_next) { proto = getprotobynumber(head->pml_map.pm_prot); rpc = getrpcbynumber(head->pml_map.pm_prog); testing = (opt_all || want_test(head->pml_map.pm_prog)); for (ptmp = programs; ptmp != NULL; ptmp = ptmp->next) if (head->pml_map.pm_prog == ptmp->number) ptmp->listed = 1; if (opt_verbose) { printf("%-7s", testing ? 
"true" : "false"); printf("%10s%10ld%5ld", host, head->pml_map.pm_prog, head->pml_map.pm_vers); if (proto) printf("%6s", proto->p_name); else printf("%6ld", head->pml_map.pm_prot); printf("%7ld", head->pml_map.pm_port); if (rpc) printf(" %s\n", rpc->r_name); else printf("\n"); } if (!testing) continue; if (!proto) { fprintf(stderr, "%s: %ld: unknown protocol\n", binary, head->pml_map.pm_prot); exit(1); } #ifdef SOLARIS if ((tclient = clnt_create_timed(host, head->pml_map.pm_prog, head->pml_map.pm_vers, proto->p_name, &tv)) == NULL) { #else signal(SIGALRM, alarm_nop); alarm(timeout); if ((tclient = clnt_create(host, head->pml_map.pm_prog, head->pml_map.pm_vers, proto->p_name)) == NULL) { alarm(0); #endif if (rpc) snprintf(failtext, 80, "clnt_create failed: %s/%s/v%ld", rpc->r_name, proto->p_name, (long)head->pml_map.pm_vers); else snprintf(failtext, 80, "clnt_create failed: %ld/%s/v%ld", head->pml_map.pm_prog, proto->p_name, (long)head->pml_map.pm_vers); test_failure(host, clnt_spcreateerror(failtext)); continue; } else { #ifndef SOLARIS alarm(0); #endif clnt_control(tclient, CLSET_TIMEOUT, (char *) &tv); if (clnt_call(tclient, NULLPROC, (xdrproc_t) xdr_void, NULL, (xdrproc_t) xdr_void, NULL, tv) != RPC_SUCCESS) { if (rpc) snprintf(failtext, 80, "clnt_call failed: %s/%s/v%ld", rpc->r_name, proto->p_name, (long)head->pml_map.pm_vers); else snprintf(failtext, 80, "clnt_call failed: %ld/%s/v%ld", head->pml_map.pm_prog, proto->p_name, (long)head->pml_map.pm_vers); test_failure(host, clnt_sperror(tclient, failtext)); clnt_destroy(tclient); continue; } clnt_destroy(tclient); } } /* did we find everything we want to require? 
*/ for (ptmp = programs; ptmp != NULL; ptmp = ptmp->next) { if (ptmp->required && ptmp->listed == 0) { rpc = getrpcbynumber(ptmp->number); if (rpc) snprintf(failtext, 80, "RPC program not registered: %s", rpc->r_name); else snprintf(failtext, 80, "RPC program not registered: %d", ptmp->number); test_failure(host, failtext); } } } int get_rpc_number(char *program) { struct rpcent *rpc; if (isalpha(*program)) { rpc = getrpcbyname(program); if (!rpc) { fprintf(stderr, "%s: %s: unknown RPC program\n", binary, program); exit(1); } return(rpc->r_number); } else { return(atoi(program)); } } #ifndef SOLARIS int get_inet_address(struct sockaddr_in *addr, char *host) { struct hostent *hp; bzero((char *) addr, sizeof *addr); addr->sin_addr.s_addr = (unsigned long) inet_addr(host); if (addr->sin_addr.s_addr == -1 || addr->sin_addr.s_addr == 0) { if ((hp = gethostbyname(host)) == NULL) { return(0); } bcopy(hp->h_addr, (char *)&addr->sin_addr, hp->h_length); } addr->sin_family = AF_INET; return(1); } #endif void usage () { printf( "Usage: %s [options] host [host...]\n" " -t n host timeout in seconds (%d seconds is the default)\n" " -a test all registered programs with RPC null procedure\n" " -p program test program (either a name or number) if registered with\n" " RPC null procedure, multiple -p flags may be specified\n" " -r program same as \"-p\", except fail if program is not registered\n" " -v verbose mode\n", binary, DEFAULT_TIMEOUT); exit(0); } void alarm_nop() { return; } mon-1.2.0/mon.d/http.monitor0000755003616100016640000001111110620054614015606 0ustar trockijtrockij#!/usr/bin/perl # # Use try to connect to a http server. # For use with "mon". # # http.monitor [-p port] [-t secs] [-u url] [-a agent] [-o] host [host...] 
# # -p port TCP port to connect to (defaults to 80) # -t secs timeout, defaults to 30 # -u url path to get, defaults to "/" # -a agent User-Agent, default to "mon.d/http.monitor" # -o omit http headers from healthy hosts # -m regex match regex in response (header + content) # # Jon Meek # American Cyanamid Company # Princeton, NJ # # $Id: http.monitor,v 1.1.1.1.4.1 2007/05/08 11:05:48 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use Getopt::Std; use English; use Data::Dumper; sub httpGET; getopts ("p:t:u:a:m:o"); $PORT = $opt_p || 80; $TIMEOUT = $opt_t || 30; $URL = $opt_u || "/"; $USERAGENT = $opt_a || "mon.d/http.monitor"; $MATCHRE = $opt_m; my %good; my %bad; exit 0 if (!@ARGV); foreach my $host (@ARGV) { my $result = httpGET ($host, $PORT); if (!$result->{"ok"}) { $bad{$host} = $result; } else { $good{$host} = $result; } } my $ret; if (keys %bad) { $ret = 1; print join (" ", sort keys %bad), "\n"; } else { $ret = 0; print "\n"; } foreach my $h (keys %bad) { print "HOST $h: $bad{$h}->{error}\n"; if ($bad{$h}->{"header"} ne "") { print $bad{$h}->{"header"}, "\n"; } print "\n"; } if (!$opt_o) { foreach my $h (keys %good) { print "HOST $h: ok\n"; print $good{$h}->{"header"}, "\n"; print "\n"; } } exit $ret; sub httpGET { use Socket; use Sys::Hostname; my($Server, $Port) = 
@_;
    my($ServerOK, $TheContent);

    $TheContent = '';

    my $Path = $URL;

    my $result = {
        "ok" => 0,
        "error" => undef,
        "header" => undef,
    };

###############################################################
    eval {

        local $SIG{ALRM} = sub { die "Timeout Alarm" };
        alarm $TIMEOUT;

        my $err = &OpenSocket($Server, $Port); # Open a connection to the server

        if ($err ne "") { # Failure to open the socket
            alarm 0; # Cancel the alarm before leaving the eval
            $result = {
                "ok" => 0,
                "error" => $err,
                "header" => undef,
            };

            return undef;
        }

        print S "GET $Path HTTP/1.0\r\n";
        print S "Host: $Server\r\n";
        print S "User-Agent: $USERAGENT\r\n\r\n";

        while ($in = <S>) {
            $TheContent .= $in; # Store data for later processing
        }

        close(S);
        alarm 0; # Cancel the alarm

    };

    ($result->{"header"}) = ($TheContent =~ /^(.*?)\r?\n\r?\n/s);

    if ($TheContent =~ /^HTTP\/([\d\.]+)\s+(200|30[12]|401)\b/) {
        $result->{"ok"} = 1;
    } else {
        $result->{"ok"} = 0;
        $result->{"error"} = "HTTP response code failure";
    }

    if ($MATCHRE ne "") {
        if ($TheContent =~ /$MATCHRE/s) {
            $result->{"ok"} = 1;
        } else {
            $result->{"ok"} = 0;
            $result->{"error"} = $error = "Regex match failed";
        }
    }

    if ($EVAL_ERROR and ($EVAL_ERROR =~ /^Timeout Alarm/)) {
        $result->{"ok"} = 0;
        $result->{"error"} = "timeout after $TIMEOUT seconds";
    }

    return $result;
}

#
# Make a Berkeley socket connection between this program and a TCP port
# on another (or this) host.
Port can be a number or a named service # # returns "" on success, or an error string on failure # sub OpenSocket { my ($host, $port) = @_; my $proto = (getprotobyname('tcp'))[2]; return ("could not get protocol") if (!defined $proto); my $conn_port; if ($port =~ /^\d+$/) { $conn_port = $port; } else { $conn_port = (getservbyname($port, 'tcp'))[2]; return ("could not getservbyname for $port") if (!defined $conn_port); } my $host_addr = (gethostbyname($host))[4]; return ("gethostbyname failure") if (!defined $host_addr); my $that = sockaddr_in ($conn_port, $host_addr); if (!socket (S, &PF_INET, &SOCK_STREAM, $proto)) { return ("socket: $!"); } if (!connect (S, $that)) { return ("connect: $!"); } select(S); $| = 1; select(STDOUT); ""; } mon-1.2.0/mon.d/dialin.monitor0000755003616100016640000001703110061516615016101 0ustar trockijtrockij#!/usr/bin/perl # # Check to see if modems on the terminal server are still answering the phone # # This script should be run using a setgid uucp wrapper which then # invokes this script in order to gain permissions to do UUCP-style # modem locking. 
# # Jim Trocki # # $Id: dialin.monitor,v 1.1.1.1 2004/06/09 05:18:05 trockij Exp $ # use Getopt::Std; use Expect; use POSIX; use English; use Sys::Syslog; sub hangup_ath; sub hangup_dtr; sub lock_modem; sub unlock_modem; sub attempt_dialin; Sys::Syslog::setlogsock ('unix'); openlog ("dialin.monitor", "cons,pid", "daemon"); getopts ('dn:D:l:r:t:', \%opt); die "not running as GID uucp, are you using the setgid wrapper?\n" unless ($opt{"d"} || $EGID == getgrnam ("uucp")); $NUMBER = $opt{"n"} || die "specify dial number with -n\n"; $DIAL_TIMEOUT = $opt{"t"} || 60; $DEVICE = $opt{"D"} || "/dev/modem"; $LOCKDIR = $opt{"l"} || "/var/lock"; $RETRIES = $opt{"r"} || 2; $SIG{"TERM"} = \&unlock_modem_and_die; $SIG{"INT"} = \&unlock_modem_and_die; my $no = 0; my $yes = 0; my @detail = (); for (my $retr = 0; $retr < $RETRIES; $retr++) { my $r = attempt_dialin; if ($r eq "OK") { $yes++; last; } $no++; push @detail, $r; sleep 5; } exit 0 if ($yes); print "failure for $NUMBER after $RETRIES retries\n", join ("\n", @detail), "\n"; exit 1; # # make an attempt to dial in # returns "OK" upon success, and an error string upon failure # sub attempt_dialin { my $got_lock = 0; for (my $tries=0; $tries < 6; $tries++) { my $r = lock_modem; if ($r eq "ok") { $got_lock = 1; last; } if ($opt{"d"}) { print STDERR "problem getting lock: $!\n" } sleep 10; } if ($opt{"d"}) { print STDERR "got_lock=$got_lock\n" } if (!$got_lock) { return "could not get lock"; } if ((my $r = hangup_dtr) != 1) { unlock_modem; return "could not drop DTR: $r"; } if (!open (MODEM, "+<$DEVICE")) { return "could not open $DEVICE: $!"; } select (MODEM); $| = 1; select (STDOUT); my $s = Expect->exp_init (\*MODEM); if (!$s) { unlock_modem; return "could not init expect"; } $s->exp_stty ("9600", "cs8", "-clocal", "crtscts", "-parenb", "-cstopb"); # # Reset modem # my $err = ""; for (my $n =0; $n < 2; $n++) { if ($opt{"d"}) { print STDERR "resetting modem\n" } print $s "ATZ\r"; ($matched_pattern_position, $error, 
$successfully_matching_string, $before_match,$after_match) = $s->expect (5, "OK"); if ($error eq "1:TIMEOUT") { $err = "timeout waiting for OK prompt"; next; } elsif (!defined $matched_pattern_position) { $err = "no pattern matched waiting for OK prompt"; next; } $err = ""; $s->clear_accum; last; } if ($err ne "") { unlock_modem; return $err; } # # dial phone number and wait for connect # if ($opt{"d"}) { print STDERR "dialing and waiting for carrier\n" } print $s "ATDT$NUMBER\r"; ($matched_pattern_position, $error, $successfully_matching_string, $before_match,$after_match) = $s->expect ($DIAL_TIMEOUT, "CONNECT", "BUSY", "NO CARRIER", "NO DIALTONE"); if ($error eq "1:TIMEOUT") { unlock_modem; return "timeout waiting for connection [$before_match] [$after_match]"; } elsif (!defined $matched_pattern_position) { unlock_modem; return "no pattern matched waiting for connection"; } elsif ($matched_pattern_position != 1) { unlock_modem; return "no connection: $successfully_matching_string"; } $s->clear_accum; # # once connected, wait for login prompt # if ($opt{"d"}) { print STDERR "got connect, waiting for login prompt\n" } ($matched_pattern_position, $error, $successfully_matching_string, $before_match,$after_match) = $s->expect (30, "login:"); if ($error eq "1:TIMEOUT") { close (MODEM); hangup_dtr; unlock_modem; return "connection but timeout on prompt"; } elsif (!defined $matched_pattern_position) { close (MODEM); hangup_dtr; unlock_modem; return "no login prompt"; } $s->clear_accum; # # got login prompt, hang up phone # if ($opt{"d"}) { print STDERR "all OK, hanging up\n" } close (MODEM); if (($r = hangup_dtr) != 1) { unlock_modem; return "could not toggle DTR: $r"; } unlock_modem; return "OK"; } # # this quite possibly only works on Linux, or at least that's # where I stole the TIOCM_DTR and TIOCMBIC values from # sub hangup_dtr { if ($opt{"d"}) { print STDERR "\tdropping DTR\n" } open (M, "+<$DEVICE") || return "could not open device: $!"; my $TIOCM_DTR = pack 
("i", 0x02);
    my $TIOCMBIC = 0x5417;

    if (!ioctl (M, $TIOCMBIC, $TIOCM_DTR)) {
        return "could not do ioctl: $!";
    }

    sleep 5;

    close (M);

    return 1;
}


sub hangup_ath {
    my ($i, $ok);
    my ($matched_pattern_position, $error,
        $successfully_matching_string,
        $before_match, $after_match);

    if ($opt{"d"}) { print STDERR "\thanging up via ATH\n" }

    $ok = 0;

    PLUS_LOOP:
    for ($i=0; $i<3; $i++) {
        print $s "+++";
        ($matched_pattern_position, $error,
         $successfully_matching_string,
         $before_match, $after_match) = $s->expect (4, "OK");

        if (defined $matched_pattern_position) {
            $ok = 1;
            last PLUS_LOOP;
        }
    }

    return undef unless ($ok);

    print $s "ATH\r";
    ($matched_pattern_position, $error,
     $successfully_matching_string,
     $before_match, $after_match) = $s->expect (5, "OK");

    return 1 if ($matched_pattern_position == 1);

    return undef;
}


sub lock_modem {
    my ($dev, $pid);
    my ($lockfile, $tmpfile);

    if ($opt{"d"}) { print STDERR "\tlocking modem\n" }

    ($dev) = $DEVICE =~ /\/([^\/]*)$/;

    return "device unparseable" unless ($dev ne "");

    $lockfile = "$LOCKDIR/LCK..$dev";
    $tmpfile = "$LOCKDIR/TMPLCK..$$";

    open (O, ">$tmpfile") || return "could not open tmp lock: $!";
    print O "$$\n";
    close (O);

    if (!link ($tmpfile, $lockfile)) {
        if ($! == EEXIST) {
            if (!open (I, $lockfile)) {
                unlink ($tmpfile);
                return "could not open existing lock file: $!";
            }
            $pid = <I>;
            close (I);

            my ($npid) = $pid =~ /(\d+)/;

            my $v = kill 0, $npid;

            if ($v == 0 && $!
!= ESRCH) { unlink ($tmpfile); return "lock already exists"; } if ($opt{"d"}) { print STDERR "lock from dead process exists\n" } if (!unlink ($lockfile)) { unlink ($tmpfile); if ($opt{"d"}) { print STDERR "lock from dead process exists, could not unlink stale lock\n" } return "could not unlink stale lock file: $!"; } if (!link ($tmpfile, $lockfile)) { unlink ($tmpfile); if ($opt{"d"}) { print STDERR "lock from dead process exists, could create lock\n" } return "could not create lock file: $!"; } } else { return "cannot create lock file: $!"; } } unlink ($tmpfile) || return "could not unlink temp lock file: $!"; if ($opt{"d"}) { print STDERR "lock is OK\n" } return "ok"; # create a temp file in /var/lock and put PID in it # make a hard link to it, check return value # if errno = EEXIST then # open and read existing lock file # check if the proc exists via kill -0 # if it does exist, unlink the tmpfile and return failure # else unlink lockfile # link tmpfile to lockfile # unlink tmpfile # return OK # } sub unlock_modem { my $dev; if ($opt{"d"}) { print STDERR "\tunlocking modem\n" } ($dev) = $DEVICE =~ /\/([^\/]*)$/; return "device unparseable" unless ($dev ne ""); unlink ("$LOCKDIR/LCK..$dev") || return "could not unlink lock file: $!"; return 1; } sub unlock_modem_and_die { unlock_modem; die; } mon-1.2.0/mon.d/phttp.monitor0000755003616100016640000005220610146140377016005 0ustar trockijtrockij#!/usr/bin/perl -w =head1 NAME phttp.monitor - parallel http monitor. =head1 SYNOPSIS type this: phttp.monitor --help and read, it is a safe job. =head1 DESCRIPTION phttp.monitor checks http servers in parallel without forking. The request can be an arbitrary multi-line string and the response can be parsed using an arbitrary regular expression. So, HTTP proxies, GET POST PUT TRACE directives, authorization scheme, xxx code or complex content responses, are all possible. 
=head1 RETURN STATUS

  0 on success for all hosts, or when usage is requested (--help option)

  1 on failure of any host

=head1 SUMMARY LINE

List of hosts that failed the test, each followed by its connection
time (in seconds) between parentheses, if any, like:

  www.foo.org(15) www.boo.com(1)

=head1 DETAILS

Detail output (just after the summary) follows this convention:

=over

=item *

lines beginning with + are successes

=item *

lines beginning with ~ are just warnings

=item *

lines beginning with - are failures

=back

=head1 CAVEATS

=head2 OPEN FILE HANDLES

Be careful: the number of open file handles is limited, usually
to 1024. Since 0, 1, 2 (stdin, stdout, stderr) are already open, at most
1021 connections are available, and any tests beyond that limit
will systematically fail.

=head2 TIMEOUT

The timeout counter for each host begins just after the first connect
command. The name resolution is already done, so it does not count.
But since everything is done in parallel, be careful: the timeout
can be caused by your own bandwidth, cpu etc.

For example, using the same host on both sides (client and server) and
running phttp.monitor with a "-n 19" nice value, the first complete
response comes after ~35 seconds and the last after ~55 seconds.
All were successful, thanks to Apache. Yes, I requested the same
header page 1021 times and I am not rich (an old Cyrix at 133 MHz).

=head2 DOS

Denial of Service is easy if you have a good tube and a good box.
Please, do not use this software for hard war. Be nice.

=head1 LICENCE

This is GNU PUBLIC LICENCE software

=head1 AUTHOR

Gilles LAMIRAL lamiral@mail.dotcom.fr

=cut

require 5.002;       # Give me more than five
use strict;          #
use English;         # because use French does not work...
use Getopt::Long;    # way home
use Socket;          # ou chaussette
use POSIX;           # or Y
use FileHandle;
use IO::Select;      # No I did not go in a laugh school, I just sucked a clown...
use Time::HiRes qw(gettimeofday usleep); $OUTPUT_AUTOFLUSH = 1; my $VERSION = 0.02; my( $help, $debugGeneral, $debugOptions, $debugResolution, $debugCreation, $debugConnection, $debugSelection, $debugWriting, $debugReading, $debugResults, $debugAnalyse, $debugEverything, ); #my $hostname; my( $port, $request, $nbrequests, $inserthost, $timeout, $softimeout, $regex); getoptions(); usage(), exit(0) if ($help); defaultvalues(); my @list = split(/\s+/, join(" ", @ARGV, (" ")) x $nbrequests); #my $iaddr = gethostbyname($hostname); #my @iaddr = unpack('C4', $iaddr); #$debugGeneral and print 'my ip = ', join('.',@iaddr), "\n"; my $proto = getprotobyname('tcp'); #my $paddr = sockaddr_in(0, $iaddr); $debugGeneral and print dump_posix(); $debugGeneral and print "CREATING THE USEFUL DATA\n"; my (%client, %onrace, %offrace, %badrace, %goodrace, %pacerace, %fh2id); my $count = 0; foreach my $host (@list) { $count++; $client{$count}{'host'} = $host; $client{$count}{'success'} = ""; $client{$count}{'problem'} = ""; $onrace{$count}++; } resolve_names(); create_sockets(); first_connection(); write_preparation(); my $readwriteable_handles = new IO::Select(); ($debugGeneral or $debugWriting or $debugReading) and print "SELECTING, WRITING, READING AND CLOSING\n"; # What write my %notconnected = %onrace; ONRACE: while(keys(%onrace)){ my (@new_writeable, @new_readable, @new_errorable); $debugGeneral and print dump_onrace(); my @id = sort { $a <=> $b } keys(%onrace); IDT: foreach my $id (@id) { my $now = gettimeofday; my $begin = $client{$id}{"begin"}; if (($now - $begin) > $timeout) { # game over, baby. 
$client{$id}{"problem"} .= "- hard timeout reached\n"; $debugConnection and print "hard timeout reached\n"; $client{$id}{"fhandle"}->close(); delete($notconnected{$id}); outrace($id); next IDT; }; } # We have to look if the connection succeeded # before doing IO @id = sort { $a <=> $b } keys(%notconnected); ID: foreach my $id (@id) { my($command) = ""; $command = connect($client{$id}{"fhandle"}, $client{$id}{"hispaddr"}); $debugConnection and print "reconnect host id : $client{$id}{'host'} $id\n"; if (defined($command) and ($command == 1)) { # Linux success $debugConnection and print "reconnect succeeded : [$command]\n"; $client{$id}{"success"} .= "+ reconnect succeeded\n"; delete($notconnected{$id}); $onrace{$id}++; #next ID; } elsif ((not defined($command)) and (($! == EISCONN()))) { # Solaris success $client{$id}{"success"} .= "+ reconnect command succeeded EISCONN : $!\n"; $debugConnection and print "reconnect command succeeded EISCONN : $! ", scalar($! + 0), "\n"; # good and sorry. delete($notconnected{$id}); $onrace{$id}++; #next ID; } elsif ((not defined($command)) and (($! == EALREADY()) or ($! == EAGAIN()))) { #$client{$id}{"problem"} .= "~ reconnect command EALREADY : $!\n"; $debugConnection and print "reconnect command EALREADY : $! ", scalar($! + 0), "\n"; # not so bad, play again. $onrace{$id}++; next ID; } elsif ((not defined($command)) and ($! == ETIMEDOUT())) { $client{$id}{"problem"} .= "- reconnect command failed ETIMEDOUT : $!\n"; $debugConnection and print "reconnect command failed by timeout : $!\n"; $client{$id}{"fhandle"}->close(); delete($notconnected{$id}); outrace($id); next ID; } else { $client{$id}{"problem"} .= "- reconnect command failed : $!\n"; $debugConnection and print "reconnect command failed : $! ", scalar($! 
+ 0), "\n"; if (defined($command)) { $debugConnection and print "command status : $command\n"; }else{ $debugConnection and print "command status : not defined\n"; } $client{$id}{"fhandle"}->close(); $debugConnection and print "deleting $client{$id}{'host'} $id\n"; delete($notconnected{$id}); outrace($id); next ID; } $readwriteable_handles->add($client{$id}{"fhandle"}); $fh2id{$client{$id}{"fhandle"}} = $id; } @new_writeable = $readwriteable_handles->can_write(10); @new_readable = $readwriteable_handles->can_read(2); $debugWriting and print "writeable : ", join (" ", map { $fh2id{$_} } @new_writeable), "\n"; $debugReading and print "readable : ", join (" ", map { $fh2id{$_} } @new_readable), "\n"; WRITE: foreach my $sock (@new_writeable) { my($id, $nleft, $bytes_wrote); $id = $fh2id{$sock}; $nleft = length ($client{$id}{"wbuf"}); next if ($nleft == 0); $debugWriting and print "syswrite to $client{$id}{'host'} $nleft bytes\n"; $bytes_wrote = syswrite ($sock, $client{$id}{"wbuf"}, $nleft); if (defined($bytes_wrote)){ $debugWriting and print "bytes_wrote = $bytes_wrote\n"; if ($bytes_wrote == 0) { # Server close the connexion $readwriteable_handles->remove($sock); $client{$id}{"problem"} .= "- server close the connexion : $!\n"; $debugWriting and print "server close the connexion : $!\n"; $sock->close(); outrace($id); }else{ substr($client{$id}{"wbuf"}, 0, $bytes_wrote) = ""; if (length($client{$id}{"wbuf"}) == 0) { # No more writing $client{$id}{"success"} .= "+ syswrite command succeeded\n"; $debugWriting and print "syswrite command succeeded on $client{$id}{'host'}\n"; #$readwriteable_handles->remove($sock); #$sock->close(); #delete($onrace{$id}); } } }else{ if ($! == EAGAIN()){ $debugWriting and print "EAGAIN\n"; next WRITE; }elsif($! 
== EINPROGRESS()){ $debugWriting and print "EINPROGRESS\n"; next WRITE; }else{ $debugWriting and print "Do not know what happened : $!\n"; next WRITE; } } } READ: foreach my $sock (@new_readable) { my($id, $buf); $id = $fh2id{$sock}; $buf = <$sock>; if ($buf) { $debugReading and print "reading from $client{$id}{'host'} : ", length($buf), " bytes\n"; $client{$id}{'rbuf'} .= $buf; } else { $debugReading and print "reading from $client{$id}{'host'} : ", 0, " bytes\n"; $readwriteable_handles->remove($sock); $sock->close(); delete($onrace{$id}); finishrace($id); } } usleep (100000); } analyse_race(); $debugResults and dump_final(); summary(); details(); exit 1 if scalar(%badrace); exit 0; # burk ! ################################################################################ ################################## END OF MAIN ################################# ################################################################################ sub getoptions { GetOptions( "help" => \$help, "Dopt" => \$debugOptions, "Dgen" => \$debugGeneral, "Dres" => \$debugResolution, "Dcre" => \$debugCreation, "Dcon" => \$debugConnection, "Dsel" => \$debugSelection, "Dwri" => \$debugWriting, "Drea" => \$debugReading, "Dana" => \$debugAnalyse, "Dfin" => \$debugResults, "Dall" => \$debugEverything, "port=i" => \$port, "nbrequests=i" => \$nbrequests, "request=s" => \$request, "inserthost!" => \$inserthost, "timeout=i" => \$timeout, "softimeout=i" => \$softimeout, "regex=s" => \$regex, # "hostname=s" => \$hostname, ); } sub defaultvalues { $port = defined($port) ? $port : 80; $request = defined($request) ? $request : 'HEAD / HTTP/1.0\nUser-Agent: phttp.monitor\r\n\r\n'; # Thanks so much Larry Wall ! finger in the nose. $request =~ s!\Q\n!\n!g ; $request =~ s!\Q\r!\r!g ; $nbrequests = defined($nbrequests) ? $nbrequests : 1; $inserthost = defined($inserthost) ? $inserthost : 1; $timeout = defined($timeout) ? $timeout : 20; $softimeout = defined($softimeout) ? 
$softimeout : $timeout; $regex = defined($regex) ? $regex : '^HTTP/([\d\.]+)\s+200\b'; $debugOptions = 1, $debugGeneral = 1, $debugCreation = 1, $debugResolution = 1, $debugConnection = 1, $debugSelection = 1, $debugWriting = 1, $debugReading = 1, $debugResults = 1, $debugAnalyse = 1 if ($debugEverything); $debugOptions and print dump_options(); sub dump_options { # Why I wrote a function for this? # Silly ! my (@dump); push (@dump, "port = $port", "\n", "request = $request", "\n", "nbrequests = $nbrequests", "\n", "inserthost = $inserthost", "\n", "timeout = $timeout", "\n", "softimeout = $softimeout", "\n", "regex = $regex", "\n", ); return (@dump); } } sub usage { print < : the http port to connect to. default is 80. --request : the request send to the servers. default is HEAD / HTTP/1.0\\nUser-Agent: phttp.monitor\\n\\n CAVEAT: Do not forget to quote the string. enable -Dopt to see what you really input. You can use \\n to mean newline. --(no)inserthost : insert individual hostname and port in the HTTP request. really usefull with virtual servers. default is on. If you do not want this feature, use --noinserthost --nbrequests : number of requests to do per host do not use it to make DOS attack, please. default is 1. --timeout : time out for any connection. do not forget that all is done in parallel and may that timeout can come from bandwith, cpu usage, etc. default is 20. --softimeout : soft timeout for any connection. the connection will not be interupted when the soft timeout is reached but the test will fail. default is <--timeout> --regex : regular expression to match for a good http response. default is : ^HTTP/([\\d\\.]+)\\s+200\\b CAVEAT: Do not forget to quote the string. enable -Dopt to see what you really input. --Dgen : print general debug information. --Dopt : print option and variables values. --Dres : print debug information on name resolution. --Dcre : print debug information on non blocking socket creation. 
--Dcon : print debug information on socket connection. --Dsel : print debug information on selecting socket. --Dwri : print debug information on writing socket. --Drea : print debug information on reading socket. --Dana : print debug information on analysing results. --Dfin : print debug information on final results. --Dall : print all debug information. ARGUMENTS host1 host2 ... : list of host to check DEFAULT with no option, the default behavior is exactly like the command: $0 \\ --port=80 \\ --nbrequests=1 \\ --request='HEAD / HTTP/1.0\\r\\nUser-Agent: phttp.monitor\\r\\n\\r\\n' \\ --inserthost \\ --timeout=20 \\ --softimeout=20 \\ --regex='^HTTP/([\\d\\.]+)\\s+200\\b' EOF } sub write_preparation { my @id = sort { $a <=> $b } keys(%onrace); $debugWriting and print "PREPARING THE WRITE MESSAGES\n"; foreach my $id (@id) { # I did not start to try with real HTTP server ! #my $message = "$id" x 100 . "\n"; my $message = qq!$request!; if ($inserthost) { $message =~ s/\r\n\r\n/\r\nHost: $client{$id}{'host'}\r\n\r\n/; }; $client{$id}{"wbuf"} = $message x 1; $debugWriting and print $client{$id}{"wbuf"}; $client{$id}{"length_to_write"} = length($client{$id}{"wbuf"}); } } sub outrace { my ($id) = @_; $client{$id}{'end'} = gettimeofday; delete($onrace{$id}); $badrace{$id}++; $offrace{$id}++; } sub finishrace { my ($id) = @_; delete($onrace{$id}); $client{$id}{'end'} = gettimeofday; $goodrace{$id}++; } sub first_connection { ($debugGeneral or $debugConnection) and print "ASKING FOR CONNECTIONS\n"; $debugConnection and print dump_onrace(); my @id = sort { $a <=> $b } keys(%onrace); foreach my $id (@id) { my ($command); $debugConnection and printf "%-4s %s\n", $id, $client{$id}{'host'}; $client{$id}{"begin"} = gettimeofday; $command = connect($client{$id}{"fhandle"}, $client{$id}{"hispaddr"}); $debugConnection and print "connect : $!\n"; if ($command or ((not $command) and ($! 
== EINPROGRESS()))){ # Good in non blocking context $client{$id}{"success"} .= "+ first connect succeeded\n"; $onrace{$id}++; }else{ $client{$id}{"problem"} .= "- first connect failed : [$command] $!\n"; $debugConnection and print "first connect failed : [$command] $!\n"; $client{$id}{"fhandle"}->close(); outrace($id); next; } } } sub resolve_names { ($debugGeneral or $debugResolution) and print "RESOLVING NAMES\n"; $debugResolution and print dump_onrace(); my @id = sort { $a <=> $b } keys(%onrace); foreach my $id (@id) { my ($command, $hisiaddr, $hispaddr); $hisiaddr = inet_aton($client{$id}{'host'}); if (defined($hisiaddr)){ # Good $client{$id}{"success"} .= "+ resolving $client{$id}{'host'} succeeded\n"; $debugResolution and printf "%-4s %20s %-15s\n", $id, $client{$id}{'host'}, join('.', unpack('C4', $hisiaddr)); $onrace{$id}++; }else{ # Bad $client{$id}{"problem"} .= "- could not resolve $client{$id}{'host'}\n"; $debugResolution and printf "%-4s %20s %-15s\n", $id, $client{$id}{'host'}, "...."; # This is just because it fails early. 
$client{$id}{'begin'} = gettimeofday; outrace($id); next; } $hispaddr = pack_sockaddr_in($port, $hisiaddr); if (defined($hispaddr)){ # Good $client{$id}{"success"} .= "+ pack_sockaddr_in command succeeded\n"; $client{$id}{"hispaddr"} = $hispaddr; $onrace{$id}++; }else{ # Bad $client{$id}{"problem"} .= "- pack_sockaddr_in command failed\n"; $debugConnection and print "pack_sockaddr_in command failed\n"; $client{$id}{"fhandle"}->close(); outrace($id); next; } } } sub create_sockets { ($debugGeneral or $debugCreation) and print "CREATING THE NON-BLOCKING SOCKETS\n"; $debugCreation and print dump_onrace(); my @id = sort { $a <=> $b } keys(%onrace); foreach my $id (@id) { my $command; $client{$id}{"fhandle"} = new FileHandle; $client{$id}{"fhandle"}->autoflush(); $command = socket($client{$id}{"fhandle"}, PF_INET, SOCK_STREAM, $proto); if (defined($command)) { if ($command != 0){ # Good $client{$id}{"success"} .= "+ socket command succeeded\n"; $debugCreation and print "socket command succeeded $id -> $client{$id}{'host'}\n"; $onrace{$id}++; }else { # Bad $client{$id}{"problem"} .= "- socket command failed [$command] $!\n"; $debugCreation and print "socket command failed $id -> $client{$id}{'host'} [$command] $!\n"; $client{$id}{"fhandle"}->close(); # This is just because it fails early. $client{$id}{'begin'} = gettimeofday; outrace($id); next; } }else{ # Bad $client{$id}{"problem"} .= "- socket command failed [undef] $!\n"; $debugCreation and print "socket command failed $id -> $client{$id}{'host'} [undef] $!\n"; $client{$id}{"fhandle"}->close(); # This is just because it fails early. 
$client{$id}{'begin'} = gettimeofday; outrace($id); next; } $command = fcntl($client{$id}{"fhandle"}, F_SETFL(), O_NONBLOCK); if (defined($command)){ # Good: fcntl returns undef on failure and "0 but true" on success $client{$id}{"success"} .= "+ fcntl command succeeded\n"; $onrace{$id}++; }else{ # Bad $client{$id}{"problem"} .= "- fcntl command failed $!\n"; $debugCreation and print "fcntl command failed $!\n"; $client{$id}{"fhandle"}->close(); # This is just because it fails early. $client{$id}{'begin'} = gettimeofday; outrace($id); next; } } } sub dump_posix { my (@dump); push (@dump, "eagain = ", EAGAIN(), "\n", "einprogress = ", EINPROGRESS(), "\n", "etimedout = ", ETIMEDOUT(), "\n", "ealready = ", EALREADY(), "\n", "eisconn = ", EISCONN(), "\n", ); return (@dump); } sub dump_onrace { my @dump; push (@dump, "ONRACE : ", join(" ", sort { $a <=> $b } keys(%onrace)), "\n", ); return @dump; } sub dump_final { print "-" x 0, "\n" x 0, "-" x 36, " RESULTS ", "-" x 35, , "\n", ; print "SUCCESS : ", join (" ", sort map { "$client{$_}{'host'}" . "(" . int($client{$_}{'end'} - $client{$_}{'begin'} + 0.5) . ")" } keys(%pacerace)), "\n"; print "FAILED : ", join (" ", sort map { "$client{$_}{'host'}" . "(" . int($client{$_}{'end'} - $client{$_}{'begin'} + 0.5) . ")" } keys(%badrace)), "\n"; print "-" x 80, "\n"; } sub analyse_race { $debugGeneral and print "ANALYSING RESPONSES\n"; foreach my $id (sort { $a <=> $b } keys(%goodrace)) { my $rbuf = defined($client{$id}{'rbuf'}) ? $client{$id}{'rbuf'} : ""; my $host = $client{$id}{'host'}; my $timeresponse = $client{$id}{'end'} - $client{$id}{'begin'}; $debugAnalyse and print "$host response in $timeresponse s :\n$rbuf"; if ($rbuf =~ m~$regex~) { $client{$id}{"success"} .= "+ match the pattern expected\n"; my $end = $client{$id}{'end'}; my $begin = $client{$id}{'begin'}; if (($end - $begin) > $softimeout) { # game over, baby. 
$client{$id}{"problem"} .= "- soft timeout reached\n"; $debugAnalyse and print "soft timeout reached\n"; $badrace{$id}++; } else{ $pacerace{$id}++; }; }else{ $client{$id}{"problem"} .= "couldn't match the regexp \"$regex\"\n"; $client{$id}{"problem"} .= "in the response below:\n\n$rbuf\n"; $badrace{$id}++; } } } sub summary { my @summary; $debugGeneral and print "SUMARY LINE :\n"; return unless (scalar(%badrace)); foreach my $id (sort { $client{$::a}{'host'} cmp $client{$::b}{'host'} || ($client{$::a}{'end'} - $client{$::a}{'begin'}) <=> ($client{$::b}{'end'} - $client{$::b}{'begin'}) || $::a <=> $::b } keys(%badrace)) { push(@summary, "$client{$id}{'host'}" . "(" . int($client{$id}{'end'} - $client{$id}{'begin'} + 0.5) . ")" ) ; } print join (" ", @summary), "\n" x 1; return; # The first I wrote (it works, of course) # the sorting is alphabetic print join (" ", sort map { "$client{$_}{'host'}" . "(" . int($client{$_}{'end'} - $client{$_}{'begin'} + 0.5) . ")" } keys(%badrace)), "\n" x 2; } sub details { $debugGeneral and print "DETAILS :\n"; foreach my $id (sort { $client{$::a}{'host'} cmp $client{$::b}{'host'} || $::a <=> $::b } keys(%badrace)) { print "\nDetail for id:$id -> $client{$id}{'host'}(", int($client{$id}{'end'} - $client{$id}{'begin'} + 0.5), ")\n", $client{$id}{'success'}, $client{$id}{'problem'}, ; } } mon-1.2.0/mon.d/tcp.monitor0000755003616100016640000000462410230411543015423 0ustar trockijtrockij#!/usr/bin/perl -w # # try to connect to a particular # port on a bunch of hosts. For use with "mon". # # Arguments are "-p port host [host...]" # # Jim Trocki, trockij@arctic.org # # $Id: tcp.monitor,v 1.2 2005/04/17 07:42:27 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. 
# # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use Getopt::Std; use Socket; my %opt; getopts ("p:t:", \%opt); my $PORT = $opt{"p"} || 23; my $TIMEOUT = $opt{"t"} || 10; my @failures = (); my @detail = (); my $ALARM = 0; foreach my $host (@ARGV) { my $pro = getprotobyname ('tcp'); if (!defined $pro) { die "could not getprotobyname\n"; } if (!defined socket (S, PF_INET, SOCK_STREAM, $pro)) { die "could not create socket: $!\n"; } my $a = inet_aton ($host); if (!defined $a) { push @failures, $host; push @detail, "$host could not inet_aton"; close (S); next; } my $sin = sockaddr_in ($PORT, $a); if (!defined $sin) { push @failures, $host; push @detail, "$host could not sockaddr_in"; close (S); next; } my $r; eval { local $SIG{"ALRM"} = sub { die "alarm\n" }; alarm $TIMEOUT; $r = connect (S, $sin); alarm 0; }; if ($@) { push @failures, $host; if ($@ eq "alarm\n") { push @detail, "$host timeout"; } else { push @detail, "$host interrupted syscall: $!"; } close (S); next; } if (!defined $r) { push @failures, $host; push @detail, "$host could not connect: $!"; close (S); next; } if (!defined close (S)) { push @failures, $host; push @detail, "$host could not close socket: $!"; next; } } if (@failures == 0) { exit 0; } print join (" ", sort @failures), "\n"; print "\n", join ("\n", @detail), "\n"; exit 1; mon-1.2.0/mon.d/foundry-chassis.monitor0000755003616100016640000001471510061516615017770 0ustar trockijtrockij#!/usr/bin/perl # # "mon" monitor to detect chassis-related failures for Foundry switches # # arguments are "[-c community] host [host...]" # # # Jim Trocki # # $Id: 
foundry-chassis.monitor,v 1.1.1.1 2004/06/09 05:18:05 trockij Exp $ # # Copyright (C) 2000, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use SNMP; use Getopt::Std; use strict; sub get_status; sub get_table; sub get_vars; $ENV{"MIBS"} = "FOUNDRY-SN-AGENT-MIB:RFC1213-MIB"; my %opt; getopts ('c:', \%opt); my $COMM = $opt{"c"} || "public"; my @failures = (); my $detail = ""; my %fans; my %PSUs; my %modules; foreach my $host (@ARGV) { my %status = get_status ($host, $COMM); if ($status{"error"} ne "") { push (@failures, $host); $detail .= "could not retrieve status from $host: $status{error}\n\n"; next; } elsif ($status{"failure"}) { push (@failures, $host); $detail .= $status{"failure_summary"}; } $fans{$host} = $status{"fan_table"}; $PSUs{$host} = $status{"psu_table"}; $modules{$host} = $status{"mod_table"}; } if (@failures != 0) { print join (" ", sort @failures), "\n\n$detail\n"; } else { print "\n$detail"; } print <{"iid"}, $r->{"snChasFanOperStatus"}, $r->{"snChasFanDescription"}); } print "\n"; } print <{"iid"}, $r->{"snChasPwrSupplyOperStatus"}, $r->{"snChasPwrSupplyDescription"}); } print "\n"; } print <{"iid"}, $r->{"snAgentBrdModuleStatus"}, $r->{"snAgentBrdMainBrdDescription"}); } print "\n"; } if (@failures != 0) { exit 1; } exit 0; # # params: (hostname, community) # # returns: # ( # "error" => "error 
name, empty string means no error", # "failure" => "nonzero if a failure", # "psu_table" => [], # "fan_table" => [], # "mod_table" => [], # ) # sub get_status { my ($host, $comm) = @_; my $s; if (!defined ($s = new SNMP::Session ( "DestHost" => $host, "Community" => $comm, "Version" => 2, "UseEnums" => 1, ))) { return ("error" => "cannot create session"); } my $error; my $failure_detected = 0; my $failure_summary = ""; my $psu_table; my $fan_table; my $mod_table; # # is this really a foundry box? # my $sys_oid = $s->get (["sysObjectID", 0]); return ("error" => $s->{"ErrorStr"}) if ($s->{"ErrorStr"} ne ""); return ("error" => "not Foundry device") if ($sys_oid !~ /^\.1\.3\.6\.1\.4\.1\.1991\./); # # this is indeed foundry equipment, so # get power supply table # ($error, $psu_table) = get_table ($s, ["snChasPwrSupplyDescription"], ["snChasPwrSupplyOperStatus"], ); return ("error" => $error) if ($error ne ""); foreach my $r (@{$psu_table}) { next if ($r->{"snChasPwrSupplyOperStatus"} eq "normal"); $failure_detected = 1; $failure_summary .= "$host PSU failure: $r->{snChasPwrSupplyDescription}\n"; last; } # # get fan table # ($error, $fan_table) = get_table ($s, ["snChasFanDescription"], ["snChasFanOperStatus"], ); return ("error" => $error) if ($error ne ""); foreach my $r (@{$fan_table}) { next if ($r->{"snChasFanOperStatus"} eq "normal"); $failure_detected = 1; $failure_summary .= "$host fan failure: $r->{snChasFanDescription}\n"; last; } # # get module status # ($error, $mod_table) = get_table ($s, ["snAgentBrdMainBrdDescription"], ["snAgentBrdModuleStatus"], ); return ("error" => $error) if ($error ne ""); foreach my $r (@{$mod_table}) { next if ($r->{"snAgentBrdModuleStatus"} eq "moduleRunning"); $failure_detected = 1; $failure_summary .= "$host module failure: $r->{snAgentBrdMainBrdDescription}\n"; last; } ( "error" => "", "failure" => $failure_detected, "failure_summary" => $failure_summary, "psu_table" => $psu_table, "fan_table" => $fan_table, "mod_table" => 
$mod_table, ); } sub get_table { my ($s, @tbl) = @_; my $table = []; my $tblid = $tbl[0]->[0]; my $i = 0; my $row = new SNMP::VarList (@tbl); return ("MIB problem") if (!defined $row); while (defined ($s->getnext ($row))) { last if ($s->{"ErrorStr"} ne ""); my $r = $row->[0]->[0]; last if ($r ne $tblid); foreach my $col (@{$row}) { $table->[$i]->{"iid"} = $col->[1]; $table->[$i]->{$col->[0]} = $col->[2]; } $i++; } return ($s->{"ErrorStr"}) if ($s->{"ErrorStr"} ne ""); ( "", $table, ); } sub get_vars { my ($s, @vars) = @_; my $r = new SNMP::VarList ( @vars ); return ("MIB problem") if (!defined $r); return ($s->ErrorStr) if (!defined ($s->get ($r))); my $v; foreach my $element (@{$r}) { $v->{$element->[0]} = $element->[2]; } ("", $v); } mon-1.2.0/mon.d/xedia-ipsec-tunnel.monitor0000755003616100016640000001501310061516615020335 0ustar trockijtrockij#!/usr/bin/perl # # "mon" monitor to detect dropped tunnels # on a Xedia (Lucent) AccessPoint gateway # # arguments are "[-c community] [-h] [-t] host [host...]" # # -c community SNMP community # -r display PTR records for RemoteAddr # -t display # # Jim Trocki # # $Id: xedia-ipsec-tunnel.monitor,v 1.1.1.1 2004/06/09 05:18:05 trockij Exp $ # # Copyright (C) 2000, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use SNMP; use Getopt::Std; use Socket; use strict; sub get_status; sub get_table; sub get_vars; sub fancy_psu_table; sub bitmask_len; sub rev_lookup; $ENV{"MIBS"} = "RFC1213-MIB:XEDIA-IPSEC-MIB"; my %opt; getopts ('c:rt', \%opt); my $COMM = $opt{"c"} || "public"; my @failures = (); my $detail; my %tunnels; my @HOSTS; my @TUNNELS; my %RESOLVE_CACHE; @HOSTS = @ARGV; if ($opt{"t"}) { @TUNNELS = @ARGV; } foreach my $host (@HOSTS) { my %status = get_status ($host, $COMM); if ($status{"error"} ne "") { push (@failures, $host); $detail .= "could not retrieve status from $host: $status{error}\n\n"; next; } elsif ($status{"failure"}) { if ($opt{"t"}) { foreach my $addr (@{$status{"failed_tunnels"}}) { if ($opt{"r"}) { my $r = rev_lookup ($addr->[0]); push (@failures, $r eq "unknown" ? "$addr->[0]/$addr->[1]" : $r) } else { push (@failures, "$addr->[0]/$addr->[1]"); } } } else { push (@failures, $host); } $detail .= $status{"failure_summary"}; } $tunnels{$host} = $status{"tunnel_table"}; } # # output returned to mon # if (@failures != 0) { print join (" ", @failures), "\n"; print "$detail\n"; } else { print "no problems\n"; } print <{"ipsecTunnelAdminStatus"}, $r->{"ipsecTunnelOperStatus"}, $r->{"ipsecTunnelCurSAs"}, $r->{"ipsecTunnelRemoteGateway"}, $r->{"ipsecTunnelRemoteAddress"} . "/" . bitmask_len ($r->{"ipsecTunnelRemoteAddressMask"}), ); if ($opt{"r"}) { print " " x 62 . rev_lookup ($r->{"ipsecTunnelRemoteAddress"}) . 
"\n"; } } print "\n"; } exit 1 if (@failures != 0); exit 0; # # params: (hostname, community) # # returns: # ( # "error" => "error name, empty string means no error", # ) # sub get_status { my ($host, $comm) = @_; my $s; if (!defined ($s = new SNMP::Session ( "DestHost" => $host, "Community" => $comm, "Version" => 2, "UseEnums" => 1, ))) { return ("error" => "cannot create session"); } my $error; my $failure_detected = 0; my $failure_summary = ""; my $tunnel_table; my $failed_tunnels; # # is this really a xedia router? # my $sys_oid = $s->get (["sysObjectID", 0]); return ("error" => $s->{"ErrorStr"}) if ($s->{"ErrorStr"} ne ""); return ("error" => "not Xedia AP") if ($sys_oid !~ /^\.1\.3\.6\.1\.4\.1\.838\.5.(1|450|1000)/); # # tunnel table # ($error, $tunnel_table) = get_table ($s, ["ipsecTunnelType"], ["ipsecTunnelAdminStatus"], ["ipsecTunnelOperStatus"], ["ipsecTunnelCurSAs"], ["ipsecTunnelRemoteGateway"], ["ipsecTunnelRemoteAddress"], ["ipsecTunnelRemoteAddressMask"], ); return ("error" => $error) if ($error ne ""); my $pruned_tunnel_table; foreach my $r (@{$tunnel_table}) { next if ($r->{"ipsecTunnelType"} ne "siteToSiteDynamic"); push (@{$pruned_tunnel_table}, $r); next if ($r->{"ipsecTunnelAdminStatus"} eq "up" && $r->{"ipsecTunnelOperStatus"} eq "up"); next if ($r->{"ipsecTunnelAdminStatus"} eq "down"); push @{$failed_tunnels}, [$r->{"ipsecTunnelRemoteAddress"}, bitmask_len ($r->{"ipsecTunnelRemoteAddressMask"})]; $failure_summary .= "$host tunnel failure for $r->{ipsecTunnelRemoteAddress}/" . bitmask_len ($r->{"ipsecTunnelRemoteAddressMask"}) . 
" gw $r->{ipsecTunnelRemoteGateway}\n"; $failure_detected++; } ( "error" => "", "failure" => $failure_detected, "failure_summary" => $failure_summary, "failed_tunnels" => $failed_tunnels, "tunnel_table" => $pruned_tunnel_table, ); } sub get_table { my ($s, @tbl) = @_; my $table = []; my $tblid = $tbl[0]->[0]; my $i = 0; my $row = new SNMP::VarList (@tbl); return ("MIB problem") if (!defined $row); while (defined ($s->getnext ($row))) { last if ($s->{"ErrorStr"} ne ""); my $r = $row->[0]->[0]; last if ($r ne $tblid); foreach my $col (@{$row}) { $table->[$i]->{"iid"} = $col->[1]; $table->[$i]->{$col->[0]} = $col->[2]; } $i++; } return ($s->{"ErrorStr"}) if ($s->{"ErrorStr"} ne ""); ( "", $table, ); } sub get_vars { my ($s, @vars) = @_; my $r = new SNMP::VarList ( @vars ); return ("MIB problem") if (!defined $r); return ($s->ErrorStr) if (!defined ($s->get ($r))); my $v; foreach my $element (@{$r}) { $v->{$element->[0]} = $element->[2]; } ("", $v); } sub bitmask_len { my $bitmask = shift; my $len = 0; foreach my $octet (split /\./, $bitmask) { for (my $i = 0; $i < 8; $i++) { $len += $octet & 1; $octet >>= 1; } } $len; } sub rev_lookup { my $addr = shift; return $RESOLVE_CACHE{$addr} if ($RESOLVE_CACHE{$addr}); my $h = gethostbyaddr (inet_aton ($addr), AF_INET); if (!defined $h) { $h = "unknown"; } else { $h =~ s/\..*$//; } $RESOLVE_CACHE{$addr} = $h; $h; } mon-1.2.0/mon.d/dialin.monitor.wrap.c0000644003616100016640000000115410146140376017267 0ustar trockijtrockij/* * setgid wrapper for use with dialin.monitor. Required for * UUCP locking permissions in /var/lock. 
* * cc -o dialin.monitor.wrap dialin.monitor.wrap.c * chown root dialin.monitor.wrap * chgrp uucp dialin.monitor.wrap * chmod 02555 dialin.monitor.wrap * * $Id: dialin.monitor.wrap.c,v 1.2 2004/11/15 14:45:18 vitroth Exp $ * */ #include <unistd.h> #ifndef REAL_DIALIN_MONITOR #define REAL_DIALIN_MONITOR "/usr/lib/mon/mon.d/dialin.monitor" #endif int main (int argc, char *argv[]) { char *real_img = REAL_DIALIN_MONITOR; argv[0] = real_img; /* exec */ return execv (real_img, argv); } mon-1.2.0/mon.d/file_change.monitor0000755003616100016640000001750010146140377017070 0ustar trockijtrockij#!/usr/bin/perl # # mon monitor to watch for file changes # # Jon Meek - April 2001 - # $RCSid = q{$Id: file_change.monitor,v 1.2 2004/11/15 14:45:19 vitroth Exp $}; # # Copyright (C) 2001 Jon Meek # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # =head1 NAME file_change.monitor =head1 SYNOPSIS I =head1 DESCRIPTION B<file_change.monitor> will watch specified files in a directory and trigger an alert when any monitored file changes, or is missing. File changes can optionally be logged using RCS. This monitor was designed to monitor B<copies> of the actual files. The files will often be copies of files mirrored from remote systems such as firewalls, routers, mail gateways, etc. File changes are detected using MD5 checksums. 
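The checksum comparison described above can be illustrated with a minimal sketch. It assumes only the core Digest::MD5 and File::Temp modules; the helper names file_md5 and file_changed are hypothetical and do not appear in file_change.monitor, which keeps its previous checksums in a state file instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempfile);

# Hypothetical helper: hexadecimal MD5 checksum of a file,
# or undef if the file cannot be opened.
sub file_md5 {
    my ($path) = @_;
    open(my $fh, '<', $path) or return undef;
    binmode($fh);
    my $sum = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $sum;
}

# Hypothetical helper: 1 if the file differs from the recorded checksum
# (a missing file also counts as a change), else 0.
sub file_changed {
    my ($path, $previous_md5) = @_;
    my $current = file_md5($path);
    return 1 if !defined $current;
    return $current ne $previous_md5 ? 1 : 0;
}

# Demonstration on a temporary file standing in for a mirrored config.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "rule 1\n";
close($fh);
my $before = file_md5($path);

open($fh, '>>', $path) or die "cannot append to $path: $!";
print $fh "rule 2\n";
close($fh);

print file_changed($path, $before) ? "changed\n" : "unchanged\n";
```

Comparing checksums rather than modification times means a file that is re-copied without any content change is not reported.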
Current file information is stored in the mon state directory in a file that corresponds to the base directory (with '/' replaced by '_'). For each monitored file, the state file contains a time stamp (in UNIX seconds), the MD5 checksum (in hexadecimal) and the name of the file. The state file is generated automatically if it does not already exist. =head1 OPTIONS =over 5 =item B<-b base_directory> The base directory for the files. All filenames are relative to this directory. =item B<-r> Log file changes using RCS. Each directory containing files to be checked should have an RCS subdirectory. The mon user needs access (possibly both file permissions and RCS authorization) to the RCS files. B<file_change.monitor> will leave the file locked so that it will be able to check in the next revision. This option should NOT be used when the original files are being monitored directly since there will be permission issues and because the current checkin process re-writes the original file (via I). =item B<-d> Provide debugging information. =back =head1 EXAMPLE MON CONFIGURATION hostgroup fw_configs pr-ifw/pr-ifw1.html pt-ifw/pt-ifw.html rd-ifw/rd-ifw1.html rd-ifw/rd-ifw2.html watch fw_configs service file_change interval 5m monitor file_change.monitor -r -b /home/httpd/html/sys_status period wd {Sun-Sat} alert mail.alert meekj@ieee.org =head1 HANDLING RCS PERMISSION ISSUES Files must be initially checked-in before the monitor can take over the task. sudo -u netmon ci -l file.cfg If file_change.monitor ever runs as a different user, such as root, there will be problems. To correct the situation, perform the following on each file: sudo ci -u -mfile_change.monitor file.cfg sudo -u netmon co -l file.cfg =head1 BUGS B<file_change.monitor> does not currently recognize any file locking mechanism. There could be a problem if a file is being modified when the monitor runs. 
Using a program like rsync to copy files to the monitor directory should nearly eliminate this problem since copies are usually made to a temporary file and the rename is an atomic operation.

=head1 AUTHOR

Jon Meek

=cut

use Getopt::Long;
use Digest::MD5;

$TimeNow = time;

GetOptions(
	   "b=s" => \$BaseDir,
	   "d" => \$Debug,
	   "r" => \$RCS,
	   );

$StateFile = $BaseDir;
$StateFile =~ s/\//_/g; # Change / to _ to make filename
$StateFile = "$ENV{MON_STATEDIR}/$StateFile";

$CI = 'ci'; # Assume that RCS's ci is in the path

print "Will use RCS: $RCS\n" if $Debug;

#
# Read the previous checksums if the State File exists
#
if (-e $StateFile) {
  print "Existing $StateFile\n" if $Debug;
  open(F, $StateFile);
  while ($in = <F>) {
    ($t, $md5, $f) = split(' ', $in);
    if ($md5 eq 'LastCheck') { # May add a LastCheck time later
      $LastCheck = $md5;
    } else {
      $PrevChangeTime{$f} = $t;
      $PrevMD5{$f} = $md5;
    }
  }
  close F;
  $StateFileExists = 1; # Remember that there is an existing State File
} else {
  $StateFileExists = 0; # or not
  print "No Existing $StateFile\n" if $Debug;
}

$NewFile = 0;
@Failures = ();

foreach $f (@ARGV) {
  if ($f =~ /\*/) {
    push(@Files, glob("$BaseDir/$f"));
  } else {
    push(@Files, "$BaseDir/$f");
  }
}

#@Files = @ARGV; # File names are left on the command line after Getopt

$md5 = new Digest::MD5;

foreach $f (@Files) { # Check each file

#  if (defined $BaseDir) {
#    $rdfile = "$BaseDir/$f";
#  } else {
#    $rdfile = "$f";
#  }

#  if (open(F, $rdfile)) {
  if (open(F, $f)) {
    seek(F, 0, 0);
    # Compute MD5 checksum
    $md5->reset();
    $md5->addfile(F);
    $fileMD5 = $md5->hexdigest();
    close F;
    print "File: $f\n NewMD5: $fileMD5\n PreviousMD5: $PrevMD5{$f}\n" if $Debug;
    $CurrentMD5{$f} = $fileMD5;
    if (exists $PrevMD5{$f}) {
      if ($CurrentMD5{$f} ne $PrevMD5{$f}) {
        push(@Failures, $f);
        $CurrentChangeTime{$f} = $TimeNow;
        $dTime = $TimeNow - $PrevChangeTime{$f};
        $fmtTimeDiff = &TimeDisplayScale($dTime);
        $ResultString{$f} = "File changed, previous change $fmtTimeDiff";
        print " File: $f changed, previous change $dTime
 s\n" if $Debug;
        if ($RCS) { # Check new version of file into RCS
          $Command = "$CI -l -mfile_change.monitor $f 2>&1"; # Save STDOUT & STDERR from ci
          print " ci command: $Command\n" if $Debug;
          $OpenResult = open(CI, "$Command |") or warn "Can't fork $Command: $!";
          print " command open result: $OpenResult\n" if $Debug;
          while ($in = <CI>) {
            $CiOutput .= $in;
          }
          close CI or warn "Can't close ci: $!/$?";
          print " ci output: $CiOutput\n" if $Debug;
        }
      } else {
        $CurrentChangeTime{$f} = $PrevChangeTime{$f};
      }
    } else { # Here if no state file entry exists for this $f
      $CurrentChangeTime{$f} = $TimeNow;
      $NewFile++;
    }
  } else { # The file does not exist, report as a failure
    push(@Failures, $f);
    $ResultString{$f} = 'File Does Not Exist';
  }
}

print "\n" if $Debug;

#
# Report results, or keep quiet if all is well
#
print "New: $NewFile Fail: @Failures $StateFile\n" if $Debug;

if ($NewFile || @Failures) { # Need to write new state file
  print "Writing a new $StateFile\n" if $Debug;
  open(F, ">$StateFile");
  foreach $f (sort keys %CurrentMD5) {
    print F "$CurrentChangeTime{$f} $CurrentMD5{$f} $f\n";
  }
  close F;
}

if (@Failures == 0) { # No files changed
  exit 0;
}

#
# Handle file change notification
#
print "@Failures\n";

foreach $f (sort @Failures) {
  print "$f: $ResultString{$f}\n\n";
}
print "\n";
print "$CiOutput\n";
#print "$CiOutput\n" if $Debug;

exit 1; # Indicate failures

sub TimeDisplayScale { # Scale the time from seconds to minutes/hours/days
  my($current_time, $past_time) = @_;
  my($dt, $ret, $idt, $dts);

  $past_time = 0 unless defined $past_time; # Caller may pass only a time difference
  $dt = $current_time - $past_time;
  $dts = $dt;

  # Each larger unit is tested only after converting to the smaller one,
  # so that a value such as 50 seconds is never compared with the hours boundary
  if ($dt < 60) { # Arbitrary seconds/minutes boundary
    $ret = "$dt seconds";
  } else {
    $dt = $dt / 60; # Now in minutes
    $idt = int ($dt + 0.5);
    $ret = sprintf("%d minutes (%d s)", $idt, $dts);
    if ($dt > 120) { # Arbitrary minutes/hours boundary
      $dt = $dt / 60; # Now in hours
      $ret = sprintf("%0.1f hours (%d s)", $dt, $dts);
      if ($dt > 48) { # Arbitrary hours/days boundary
        $dt = $dt / 24; # Now in days
        $ret = sprintf("%0.1f days (%d s)", $dt, $dts);
      }
    }
  }
  return $ret;
}

__END__

file_change.monitor
-d -b /home/httpd/html/sys_status pr-ifw/pr-ifw1.html pt-ifw/pt-ifw.html ~/lab/mon/file_change.monitor -r -d -b /home/httpd/html/sys_status rd-ifw/rd-ifw2.html mon-1.2.0/mon.d/smtp3.monitor0000755003616100016640000004362110630617564015722 0ustar trockijtrockij#!/usr/bin/perl # Yet another smtp monitor using IO::Socket with timing, logging # This version looks deeper than the banner to catch milter and other problems # # $Id: smtp3.monitor,v 1.2.2.1 2007/06/03 20:07:16 trockij Exp $ # # Copyright (C) 2001-2006, Jon Meek, meekj at ieee.org # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # =head1 NAME B - smtp monitor for mon with timing, logging, optional MX lookup, and diagnostic capability. =head1 DESCRIPTION A SMTP monitor using IO::Socket with connection response timing and optional logging. This test is reasonably complete. Following the greeting banner from the SMTP server the monitor client issues the HELO and MAIL commands then closes the session with a QUIT command. Early versions of this monitor simply looked at the initial greeting banner, but that did not detect certain temporary failure conditions. While configuring mon for this monitor keep in mind that a busy mail server may reject new connections. 
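The reply checks behind the greeting/HELO/MAIL/QUIT exchange described above can be sketched as below. This is only an illustration under stated assumptions: parse_smtp_reply and reply_ok are hypothetical helper names that do not exist in the monitor, and the sketch works on a reply that has already been read from the socket, so it needs no network access.

```perl
use strict;
use warnings;

# Split a raw SMTP reply into its numeric code and its text lines.
# A reply may span several lines; "250-" marks a continuation line and
# "250 " (code followed by a space) marks the final line.
# (parse_smtp_reply is a hypothetical helper for this sketch.)
sub parse_smtp_reply {
    my ($raw) = @_;
    my ($code, @text);
    foreach my $line (split /\r?\n/, $raw) {
        if (my ($c, $sep, $txt) = $line =~ /^(\d{3})(?:([ -])(.*))?$/) {
            $code = $c;
            push @text, defined $txt ? $txt : '';
            last unless defined $sep && $sep eq '-';   # final line reached
        }
    }
    return ($code, @text);
}

# True if the reply carries the expected code, e.g. 220 for the greeting,
# 250 for HELO and MAIL, 221 for QUIT.
sub reply_ok {
    my ($raw, $expected) = @_;
    my ($code) = parse_smtp_reply($raw);
    return (defined $code && $code == $expected) ? 1 : 0;
}
```

Checking the numeric code rather than the banner text keeps the test independent of the wording a particular server uses, and treating a missing or empty reply as a failure covers servers that accept the connection but never send a greeting.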
=head1 SYNOPSIS B [-d] [-l log_file_YYYYMM.log] [--timeout timeout_seconds] [--alarmtime alarm_time] [--maxfailtime seconds] [--mx] [--esmtp] [--requiretls] [--nofail] [--from user@domain.com] [--to r1@d1.com,r2@d2.edu] [--size nnnnn] [--port nn] host host1 host2 ... =head1 OPTIONS =over 5 =item B<-d> Debug/Diagnostic mode. Useful for manual command line use for diagnosing mail delivery problems. To determine if a mail destination will accept mail the --mx flag will useful. =item B<--timeout timeout> Connect timeout in seconds. =item B<--alarmtime alarm_timeout> Alarm if connect is successful but took longer than alarm_timeout seconds. =item B<--maxfailtime seconds> Alarm if connect fails only if the response time is greater than this value. If a Sendmail server is in REFUSE_LA, or similar, state due to load it will usually reject the connection in a few milliseconds. A typical value might be 0.050 for servers near the monitoring system. =item B<-l log_file_template> /path/to/logs/smtp_YYYYMM.log Current year & month are substituted for YYYYMM, that is the only possible template at this time. =item B<--mx> Lookup the MX records for the domains/hosts and test them in preference order. The first successful test will be considered a success for that domain. This was originally devised for manual command line use as a tool to verify that mail stuck in outbound queues really can not be delivered. It could be used with mon as well, however you are usually going to want to test ALL of your smtp servers, not just be sure that one of them is OK. --mx applies to all of the domains/hosts listed on the command line. =item B<--esmtp> Try ESMTP before SMTP. =item B<--requiretls> Check that STARTTLS is offered, fail if it is not. This option forces B<--esmtp>. =item B<--nofail> Never provide a failure return to mon. Useful in certain testing envrionments when logging. =item B<--port nnn> Specify a port to use. Defaults to 25. 
=back =head1 MON CONFIGURATION EXAMPLE hostgroup smtp mail1.mymails.org mail2.mymails.org mail3.mymails.org watch smtp service smtp_check interval 5m monitor smtp3.monitor --timeout 70 --alarmtime 30 -l /n/na1/logs/wan/smtp_YYYYMM.log period wd {Sun-Sat} alert mail.alert meekj@mymails.org alertevery 1h summary =head1 LOG FILE FORMAT A normal log entry has the format: measurement_time smtp_host_name connect_time A failed connection log entry contains: measurement_time smtp_host_name connect_time smtp_code_and_greeting (or connect_error) Where: F - Is the time of the connection attempt in seconds since 1970 F - Is the name of the smtp server that was tested. If --mx was selected then this field is servername=MX_record where MX_record is the mail domain (host) from the command line. F - Is the time from the connect request until the SMTP greeting appeared in seconds with 100 microsecond resolution. If the connection failed the time spent waiting for the connection will be a negative number. F - Should have the SMTP response code integer followed by the greeting banner if there was a problem. F - If present may indicate "Connect failed" meaning that the connect attempt failed immediately, possibly due to a DNS lookup error or because the server is not running any service on port 25. The field may also be "Connect timeout" indicating that the connect failed after the set timeout period. =head1 BUGS It should be possible to specify --esmtp and --requiretls on a per-host basis. A SMTP temporary failure code could cause the monitor to retry the connection a certain number of times. It is not yet possible to specify the username / domain for the HELO and MAIL commands, but it would be very simple to add. =head1 REQUIRED NON-STANDARD PERL MODULES IO::Socket Time::HiRes Net::DNS (only if --mx option will be used) If you do not have Time::HiRes you can choose to comment out the lines that refer to F and F but several features will be lost. 
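The entries described under LOG FILE FORMAT above can be split back into fields with a few lines of Perl. The following is a minimal sketch, not part of the monitor: parse_log_line and the hash keys it returns are hypothetical names chosen for illustration, and it assumes that a negative connect time or a non-empty fourth field marks a failed test, as that section describes.

```perl
use strict;
use warnings;

# Parse one smtp3.monitor log line of the form
#   measurement_time smtp_host_name connect_time [detail]
# and return a hash reference, or undef if the line is too short.
# (parse_log_line and its keys are hypothetical names for this sketch.)
sub parse_log_line {
    my ($line) = @_;
    chomp $line;
    my ($time, $host, $connect, $detail) = split ' ', $line, 4;
    return undef unless defined $connect;
    my $has_detail = (defined $detail && length $detail) ? 1 : 0;
    return {
        time         => $time,
        host         => $host,                 # may be "name=name" when --mx is used
        connect_time => abs($connect),         # seconds
        failed       => ($connect < 0 || $has_detail) ? 1 : 0,
        detail       => $has_detail ? $detail : '',
    };
}
```

A report script could read a monthly log with this helper and, for example, count failed entries per host or plot connect_time over time.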
=head1 AUTHOR Jon Meek, meekj at ieee.org $Id: smtp3.monitor,v 1.2.2.1 2007/06/03 20:07:16 trockij Exp $ =cut use English; use Sys::Hostname; use Getopt::Long; use IO::Socket; use Time::HiRes qw( gettimeofday tv_interval ); $RCSid = q{$Id: smtp3.monitor,v 1.2.2.1 2007/06/03 20:07:16 trockij Exp $ }; $ESMTP = 0; $RequireTLS = 0; GetOptions ('mx' => \$UseMX, 'd' => \$opt_d, 'esmtp' => \$ESMTP, 'requiretls' => \$RequireTLS, 'timeout=i' => \$TimeOut, 't=i' => \$TimeOut, 'alarmtime=i' => \$opt_T, 'maxfailtime=f' => \$MaxFailTime, 'T=i' => \$opt_T, 'logfile=s' => \$opt_l, 'l=s' => \$opt_l, 'nofail' => \$NoFail, 'size=i' => \$MessageSize, 'port=i' => \$Port, 'from=s' => \$FromAddress, 'to=s' => \$ToAddresses, ); $ESMTP = 1 if $RequireTLS; if ($UseMX) { # Will need Net::DNS Module, but don't require the module if it won't be used eval "use Net::DNS"; do { warn "Couldn't load Net::DNS: $@"; undef $UseMX; } unless ($@ eq ''); $Resolver = new Net::DNS::Resolver; } $Port = 'smtp(25)' unless $Port; $TimeOut = 30 unless $TimeOut; # Default timeout in seconds $dt = 0; # Initialize connect time variable @Failures = (); # Initialize failure list $TimeOfDay = time; # Current time print "TimeOfDay: $TimeOfDay\n" if $opt_d; # # Get the process username and the hostname of the monitor machine # $MonitorUsername = getpwuid($UID); $MonitorHostname = hostname; $host_address = gethostbyname($MonitorHostname); $MonitorHostname = gethostbyaddr($host_address, AF_INET); $FromAddress = qq{$MonitorUsername\@$MonitorHostname} unless $FromAddress; print " From: $FromAddress\n" if $opt_d; print " TimeOut: $TimeOut\n" if $opt_d; # # Check each host, or MX record # foreach $host (@ARGV) { print "Check: $host\n" if $opt_d; # # Get the MX records, if we need them # if ($UseMX) { undef %MXval; undef @MXorder; @mx = mx($Resolver, $host); if (@mx) { foreach $rr (@mx) { $preference = $rr->preference; $mxrecord = $rr->exchange; $MXval{$mxrecord} = $preference; } } else { print "can't find MX records for 
$host: ", $Resolver->errorstring, "\n" if $opt_d;
      push(@Failures, $host); # Call it a failure
      $FailureDetail{$host} = "Can't find MX records";
      next;
    }

#
# Sort the MX records into preference order
#
    print "MX records for $host:\n" if $opt_d;
    foreach $k (sort {$MXval{$a} <=> $MXval{$b}} keys %MXval) {
      $Arecord = ''; # Clear for this MX
      push(@MXorder, $k);
      if ($opt_d) { # If in debug/verbose mode lookup A record
        $name = $k . '.'; # Append dot for absolute lookup
        if ($packet = $Resolver->search($name)) {
          @answer = $packet->answer;
          foreach $rr (@answer) {
            $address = '';
            $name = $rr->name;
            $type = $rr->type;
            $address = $rr->address if ($type eq 'A');
            $Arecord .= "$type: $address "; # Append, in case some other records are found
          }
        } else {
          $Arecord = "Could not find A record for $name";
        }
      }
      printf " %3d - %s %s\n", $MXval{$k}, $k, $Arecord if $opt_d;
    }
  }

#
# Now actually do the smtp check
#
  if ($UseMX && @mx) { # Check MX records, stop after first success
    foreach $mx (@MXorder) {
      $HostPlusMX = "$host=$mx";
      push(@HostNames, $HostPlusMX);
      $TestTime{$HostPlusMX} = time;
      print "Checking $HostPlusMX\n" if $opt_d;
      $result = &CheckSMTP($HostPlusMX);
      last if ($result);
    }
  } else { # Regular host check
    push(@HostNames, $host);
    $TestTime{$host} = time;
    $result = &CheckSMTP($host);
  }
}

if ($opt_d) {
  foreach $host (sort @HostNames) {
    print "$TestTime{$host} $host $ConnectTime{$host} $InitialBanner{$host}\n";
#    ($shortfail, $rest) = split(/\n/, $InitialBanner{$host}, 2);
#    print "$TestTime{$host} $host $ConnectTime{$host} $shortfail\n";
  }
}

# Write results to logfile, if -l

if ($opt_l) {

  # Determine logfile name, usually based on year/month
  $LogFile = $opt_l;
  ($sec,$min,$hour,$mday,$Month,$Year,$wday,$yday,$isdst) = localtime($TimeOfDay);
  $Month++;
  $Year += 1900;
  $YYYYMM = sprintf('%04d%02d', $Year, $Month);
  $LogFile =~ s/YYYYMM/$YYYYMM/; # Fill in current year and month

  open(LOG, ">>$LogFile") || warn "$0 Can't open logfile: $LogFile\n";

  foreach $host (sort @HostNames) {
    $FailureDetail{$host} =~ s/\n/ /g; # Put it on one line, but result may be too long
    $FailureDetail{$host} =~ s/ $//; # Trim final space
#    ($shortfail, $rest) = split(/\n/, $FailureDetail{$host}, 2);
#    print LOG "$TestTime{$host} $host $ConnectTime{$host} $shortfail\n";
    print LOG "$TestTime{$host} $host $ConnectTime{$host} $FailureDetail{$host}\n";
  }
  close LOG;
}

if (@Failures == 0) { # Indicate "all OK" to mon
  exit 0;
}

#
# Otherwise we have one or more failures
#
@SortedFailures = sort @Failures;

print "@SortedFailures\n";

foreach $host (@SortedFailures) {
  print "$host $ConnectTime{$host} $FailureDetail{$host}\n";
}
print "\n";

exit 0 if $NoFail; # Never indicate failure if $NoFail is set
exit 1; # Indicate failure to mon

sub CheckSMTP {
  my $host = shift;

  # Parentheses are required: "my $t1, $t2, ..." declares only $t1
  my ($t1, $t2, $dt, $mx_name, $stripped_host);
  my $Failure = 0; # Flag to indicate failure for return code
                   # return 0 may not be working inside eval
  my $buflength = 1024;

  if ($host =~ /=/) { # Have MX data
    ($mx_name, $stripped_host) = split(/=/, $host);
  } else {
    $stripped_host = $host;
  }

#
# Use eval/alarm to handle timeout
#
  eval {
    local $SIG{ALRM} = sub { die "timeout\n" }; # Alarm handler
    alarm($TimeOut); # Do a SIG_ALRM in $TimeOut seconds

    $t1 = [gettimeofday]; # Start connection timer, then connect
    my $sock = IO::Socket::INET->new(PeerAddr => $stripped_host,
                                     PeerPort => $Port,
                                     Proto => 'tcp');

    if (defined $sock) { # Connection succeeded
      $in = '';
      $bytes = sysread($sock, $in, $buflength); # Handle multi-line banners
      $InitialBanner{$host} = $in;
      $t2 = [gettimeofday]; # Stop clock
      print " Banner: $InitialBanner{$host}\n" if $opt_d;

      if ($InitialBanner{$host} !~ /^220/) { # Consider "220 Service ready" to be only valid
        push(@Failures, $host); # Note failure
        if (length($InitialBanner{$host}) == 0) { # Note empty banner
          $InitialBanner{$host} = 'null';
        }
        $FailureDetail{$host} = "BANNER: " .
$InitialBanner{$host}; # Save failure banner $ConnectTime{$host} = -1; # last; $Failure = 1; print "QUIT\r\n" if $opt_d; print $sock "QUIT\r\n"; # Shutdown connection close $sock; return 0; } if ($ESMTP) { # Try EHLO first print "EHLO $MonitorHostname\r\n" if $opt_d; print $sock "EHLO $MonitorHostname\r\n"; $in = ''; $bytes = sysread($sock, $in, $buflength); # Handle multi-line banners $EhloResponse{$host} = $in; print " EHLO resp: $EhloResponse{$host}\n" if $opt_d; if ($EhloResponse{$host} !~ /^250/) { # Consider "250 Requested mail action okay, completed" to be only valid push(@Failures, $host); # Note failure print "EHLO Failure!\n" if $opt_d; $FailureDetail{$host} = "EHLO: " . $EhloResponse{$host}; # Save failure banner #last; $Failure = 1; print "QUIT\r\n" if $opt_d; print $sock "QUIT\r\n"; # Shutdown connection close $sock; return 0 if $RequireESMTP; } if ($RequireTLS && ($EhloResponse{$host} !~ /STARTTLS/)){ # Check TLS advertisement push(@Failures, $host); # Note failure $FailureDetail{$host} = "STARTTLS Not Offered "; print "STARTTLS Not Offered!\n" if $opt_d; print $sock "QUIT\r\n"; # Shutdown connection close $sock; return 0; } } if (!$ESMTP or ($ESMTP && $Failure)) { print $sock "HELO $MonitorHostname\r\n"; $in = ''; $bytes = sysread($sock, $in, $buflength); # Handle multi-line banners $HeloResponse{$host} = $in; print " HELO resp: $HeloResponse{$host}\n" if $opt_d; if ($HeloResponse{$host} !~ /^250/) { # Consider "250 Requested mail action okay, completed" to be only valid push(@Failures, $host); # Note failure print "HELO Failure!\n" if $opt_d; $FailureDetail{$host} = "HELO: " . 
$HeloResponse{$host}; # Save failure banner #last; $Failure = 1; print "QUIT\r\n" if $opt_d; print $sock "QUIT\r\n"; # Shutdown connection close $sock; return 0; } } $FromLine = qq{MAIL From:<$FromAddress>}; if ($MessageSize) { $FromLine .= qq{ SIZE=$MessageSize}; } $FromLine .= qq{\r\n}; print $FromLine if $opt_d; print $sock $FromLine; chomp($MailResponse{$host} = <$sock>); print " MAIL resp: $MailResponse{$host}\n" if $opt_d; if ($MailResponse{$host} !~ /^250\s+/) { # Consider "250 Requested mail action okay, completed" to be only valid push(@Failures, $host); # Note failure $FailureDetail{$host} = "MAIL: " . $MailResponse{$host}; # Save failure banner #last; $Failure = 1; print "QUIT\r\n" if $opt_d; print $sock "QUIT\r\n"; # Shutdown connection close $sock; return 0; } if ($ToAddresses) { # Addresses given on command line (@to_addrs) = split(/,/, $ToAddresses); foreach $to (@to_addrs) { $RcptCommand = qq{RCPT TO:<$to>}; print "$RcptCommand\r\n" if $opt_d; print $sock "$RcptCommand\r\n"; chomp($RcptResponse = <$sock>); print " RCPT resp: $RcptResponse\n" if $opt_d; } } print "QUIT\r\n" if $opt_d; print $sock "QUIT\r\n"; # Shutdown connection close $sock; $dt = tv_interval ($t1, $t2); # Compute connection time $ConnectTime{$host} = sprintf("%0.4f", $dt); # Format to 100us resolution if ($opt_T) { # Check for slow response if ($dt > $opt_T) { push(@Failures, $host); # Call it a failure $FailureDetail{$host} = "Slow Connect"; $Failure = 1; return 0; } } } else { # Connection failed $t2 = [gettimeofday]; # Stop clock $dt = tv_interval ($t1, $t2); # Compute connection time $ConnectTime{$host} = sprintf("-%0.4f", $dt); # Format to 100us resolution, -val if failure print " Connect to $host failed\n" if $opt_d; if ($MaxFailTime) { if ($dt <= $MaxFailTime) { # Don't alarm on connection refusals due to server load $Failure = 0; return 1; } } push(@Failures, $host); # Save failed host $FailureDetail{$host} = "Connect failed"; $Failure = 1; return 0; } }; alarm(0); # Stop 
alarm countdown

  if ($@ =~ /timeout/) { # Detect timeout failures
    $t2 = [gettimeofday]; # Stop clock
    $dt = tv_interval ($t1, $t2); # Compute connection time
    $ConnectTime{$host} = sprintf("-%0.4f", $dt); # Format to 100us resolution, -val if timeout
    push(@Failures, $host);
    print " Connect to $host timed-out\n" if $opt_d;
    $FailureDetail{$host} = "Connect timeout";
    $Failure = 1;
    return 0;
  }

  if ($Failure) { # Important when an MX record list is being checked
    return 0;
  } else {
    return 1;
  }
}

__END__

SMTP Reply Codes From RFC-821 - may use in the future

211 System status, or system help reply
214 Help message
    [Information on how to use the receiver or the meaning of a
    particular non-standard command; this reply is useful only
    to the human user]
220 Service ready
221 Service closing transmission channel
250 Requested mail action okay, completed
251 User not local; will forward to <forward-path>

354 Start mail input; end with <CRLF>.<CRLF>

421 Service not available, closing transmission channel
    [This may be a reply to any command if the service knows it
    must shut down]
450 Requested mail action not taken: mailbox unavailable
    [E.g., mailbox busy]
451 Requested action aborted: local error in processing
452 Requested action not taken: insufficient system storage

500 Syntax error, command unrecognized
    [This may include errors such as command line too long]
501 Syntax error in parameters or arguments
502 Command not implemented
503 Bad sequence of commands
504 Command parameter not implemented
550 Requested action not taken: mailbox unavailable
    [E.g., mailbox not found, no access]
551 User not local; please try <forward-path>
552 Requested mail action aborted: exceeded storage allocation
553 Requested action not taken: mailbox name not allowed
    [E.g., mailbox syntax incorrect]
554 Transaction failed
mon-1.2.0/mon.d/rd.monitor0000755003616100016640000001140010061516614015239 0ustar trockijtrockij
#!/usr/bin/perl
# readdir.monitor
# Return a list of directories that contain more than the given number of files
# For use with
"mon" or stand alone. # # # Usage : my-mailqueue.monitor [options] [dir1[:num1] dir2[:num2] ...] # # --number n : the maximum file number allowed # --regex string : a regex expression to match # --debug : print some debug information (do not use this with mon) # but just in command line to understand everything :-) # dir1 dir2 : list of directory to check # Examples: # # Do nothing (nothing to check) # $ ./my-readdir.monitor # # Checks if: # /var/spool/mqueue contains more than 50 files (50 is a default value) # # $ ./my-readdir.monitor /var/spool/mqueue/ # # Checks if: # /var/spool/mqueue contains more than 14 files # # $ ./my-readdir.monitor /var/spool/mqueue/:14 # # Check if : # /var/spool/mqueue contains more than 14 files # /var/spool/lp/requests contains more than 7 files # # $ ./my-readdir.monitor /var/spool/mqueue/:14 /var/spool/lp/requests:7 # # Check if : # /var contains more than 34 files # /var/spool contains more than 34 files # /bin contains more than 65 files # # $ ./my-readdir.monitor --number=34 /var /var/spool /bin:65 # # Check if : # /var/spool/mqueue contains more than 3 files which name # begins with "df" # # $ ./my-readdir.monitor --number=3 --regex "^df" /var/spool/mqueue # # The regex can be every perl regex. # this program exits with the $maxlogNumberFound in log based 2 fashion. # file number found = 1 * number Allowed, log = 1 # file number found = 2 * number Allowed, log = 2 # file number found = 4 * number Allowed, log = 3 # . # . # . 
# 2^n * $numberAllowed, log = n+1 # # Reverse: # if the return status is N, that means that the directories # contains more (or equal) than 2^(N-1) * allowed files number # and less than 2^N * allowed files number # The worse situation is return (in case of several directories) # # Gilles LAMIRAL, lamiral@mail.dotcom.fr # # # Copyright (C) 1998, Gilles LAMIRAL # This program is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as # published by the Free Software Foundation; either version 2 of the # License, or (at your option) any later version. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 # USA # use Getopt::Long; use DirHandle; GetOptions( "number:i" => \$globalNumberAllowed, "debug" => \$debug, "regex:s" => \$regex ); $globalNumberAllowed = ($globalNumberAllowed) ? $globalNumberAllowed : "50"; @failures = (); $maxlogNumberFound = 0; foreach $dir (@ARGV) { my ($dirHandle, @filesList, @filesListFiltered, $numberFound); ($dir, $numberAllowed) = split (/:/, $dir, 2); $numberAllowed = ($numberAllowed) ? $numberAllowed : $globalNumberAllowed; ($debug) and print "directory checked : $dir\n"; ($debug) and print " number of file allowed : $numberAllowed\n"; $dirHandle = new DirHandle "$dir"; if (defined $dirHandle) { # reads the directory and filters "." and ".." 
files @filesList = grep !/^\.\.?$/, $dirHandle->read(); $dirHandle->close(); if ($regex) { foreach $file (@filesList) { push(@filesListFiltered, $file) if ($file =~ /$regex/); } @filesList = @filesListFiltered; } $numberFound = scalar(@filesList); #($debug) and print "@filesList\n"; ($debug) and print " number of files : $numberFound\n"; if ($numberFound >= $numberAllowed) { push (@failures, sprintf ("%s:%s", $dir, $numberFound)); $logNumberFound = 1+(log($numberFound / $numberAllowed)/log(2)); ($debug) and print " 1+(log($numberFound/$numberAllowed)/log(2)) = ", $logNumberFound,"\n"; if ($logNumberFound > $maxlogNumberFound) { $maxlogNumberFound = $logNumberFound; } } }else{ warn "Could not open $dir : $!\, warn at"; push (@failures, sprintf ("%s:%s", $dir, "COULD_NOT_OPEN")); }; } if (@failures == 0) { exit 0; } ($debug) and print "\nSummary:"; print join (" ", sort @failures), "\n"; if ($maxlogNumberFound >= 1) { ($debug) and print "maxlogNumberFound (exit status) =$maxlogNumberFound\n"; $exitStatus = int($maxlogNumberFound); }else{ $exitStatus = 1; } exit($exitStatus); mon-1.2.0/mon.d/fping.monitor0000755003616100016640000001172410620740052015742 0ustar trockijtrockij#!/usr/bin/perl # # Return a list of hosts which not reachable via ICMP echo # # Jim Trocki, trockij@arctic.org # # $Id: fping.monitor,v 1.3.2.1 2007/05/11 01:00:26 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
use strict;

use Getopt::Std;

my %opt;
getopts ("ahr:s:t:T", \%opt);

sub usage { print <&1 |") || die "could not open pipe to fping: $!\n";

my @unreachable;
my @alive;
my @addr_not_found;
my @slow;

while (<IN>)
{
    chomp;
    if (/^(\S+) is unreachable/)
    {
        push (@unreachable, $1);
    }

    elsif (/^(\S+) is alive \((\S+)/)
    {
        if ($opt{"s"} && $2 > $opt{"s"})
        {
            push (@slow, [$1, $2]);
        }

        else
        {
            push (@alive, [$1, $2]);
        }
    }

    elsif (/^(\S+)\s+address\s+not\s+found/)
    {
        push @addr_not_found, $1;
        push @unreachable, $1;
    }

    #
    # fping can output a number of messages in addition to the eventual
    # reachable/unreachable. Ignore them (since we'll also get the main
    # "unreachable" message).
    #
    elsif (/^ICMP .+ from \S+ for ICMP Echo sent to /)
    {
        # do nothing
    }

    #
    # ICMP Host Unreachable from 1.2.3.4 for ICMP Echo sent to 2.4.6.8
    #
    elsif (/^ICMP (.*) for ICMP Echo sent to (\S+)/)
    {
        if (! exists $details{$2})
        {
            $details{$2}= $_;
        }
    }

    elsif (/^ICMP Time Exceeded from \S+ for ICMP Echo sent to (\S+) /)
    {
        push @unreachable, $1;
    }

    else
    {
        print STDERR "unidentified output from fping: [$_]\n";
    }
}

close (IN);

$END_TIME = time;

my $retval = $? >> 8;

if ($retval == 3)
{
    print "fping: invalid cmdline arguments [$CMD @ARGV]\n";
    exit 1;
}

elsif ($retval == 4)
{
    print "fping: system call failure\n";
    exit 1;
}

elsif ($retval == 1 || $retval == 2 || @slow != 0)
{
    print join (" ", sort (@unreachable, map { $_->[0] } @slow)), "\n\n";
}

elsif ($retval == 0)
{
    print "\n";
}

else
{
    print "unknown return code ($retval) from fping\n";
}

print "start time: " . localtime ($START_TIME) . "\n";
print "end time : " . localtime ($END_TIME) . "\n";
print "duration : " . ($END_TIME - $START_TIME) .
" seconds\n\n"; if (@unreachable != 0) { print <&1"); } print "\n"; } # # fail only if all hosts do not respond # if ($opt{"a"}) { if (@unreachable == @ARGV) { exit 1; } exit 0; } exit 1 if (@slow != 0); exit $retval; mon-1.2.0/mon.d/smtp.monitor0000755003616100016640000001175010230411543015616 0ustar trockijtrockij#!/usr/bin/perl # # Use try to connect to a SMTP server, and # wait for the right output. # # For use with "mon". # # Arguments are "-p port -t timeout host [host...]" # # Adapted from "http.monitor" by # Jim Trocki, trockij@arctic.org # # http.monitor written by # # Jon Meek # American Cyanamid Company # Princeton, NJ # # $Id: smtp.monitor,v 1.2 2005/04/17 07:42:27 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use Getopt::Std; use English; getopts ("p:t:"); $PORT = $opt_p || 25; $TIMEOUT = $opt_t || 30; my %good; my %bad; foreach $host (@ARGV) { my $result = smtpGET($host, $PORT); if ($result->{"ok"}) { $good{$host} = $result; } else { $bad{$host} = $result; } } # # summary line # if (keys %bad == 0) { print "\n"; } else { print join (" ", sort keys %bad), "\n"; } # # detail # foreach my $host (keys %bad) { print "$host failed with error " . 
	$bad{$host}->{"error"}, "\n";
    print "detail for $host\n";
    print "==============================================================================\n";
    if ($bad{$host}->{"detail"} ne "")
    {
        print $bad{$host}->{"detail"};
    }
    else
    {
        print "no detail\n";
    }
    print "\n";
}

print "\n";

foreach my $host (keys %good)
{
    print "$host succeeded\n";
    print "detail for $host\n";
    print "==============================================================================\n";
    if ($good{$host}->{"detail"} ne "")
    {
        print $good{$host}->{"detail"};
    }
    else
    {
        print "no detail\n";
    }
    print "\n";
}

if (keys %bad != 0)
{
    exit 1;
}

exit 0;


sub smtpGET {
    use Socket;
    use Sys::Hostname;

    my($Server, $Port) = @_;
    my($ServerOK, $TheContent);
    my ($OurHostname);

    my $result = {
        "ok" => 0,
        "error" => undef,
        "detail" => undef,
    };

    $ServerOK = 0;

    $TheContent = '';

    $Path = '/';

###############################################################
    eval {

        local $SIG{ALRM} = sub { die "Timeout Alarm" };
        alarm $TIMEOUT;

        if (! OpenSocket($Server, $Port))
        {
            alarm 0; # Cancel the alarm so it cannot fire while later hosts are checked
            $result->{"error"} .= "Unable to create SMTP connection to port $Port";
            $result->{"ok"} = 0;
            return $result;
        }

        $in = <S>;
        $result->{"detail"} .= $in;

        while ($in =~ /^220-/)
        {
            $in = <S>;
            $result->{"detail"} .= $in;
        }

        if ($in !~ /^220 /)
        {
            alarm 0;
            print S "QUIT\r\n";
            close (S);
            $result->{"error"} = "did not receive 220 greeting";
            $result->{"ok"} = 0;
            return $result;
        }

        $OurHostname = &hostname;

        print S "HELO $OurHostname\r\n";

        $in = <S>;
        $result->{"detail"} .= $in;

        while ($in =~ /^250-/)
        {
            $in = <S>;
            $result->{"detail"} .= $in;
        }

        if ($in !~ /^250 /)
        {
            alarm 0;
            print S "QUIT\r\n";
            close (S);
            $result->{"error"} = "did not get 250 response to HELO";
            $result->{"ok"} = 0;
            return $result;
        }

        print S "quit\r\n";

        $in = <S>;
        $result->{"detail"} .= $in;

        if ($in !~ /^221 /)
        {
            alarm 0;
            print S "QUIT\r\n";
            close (S);
            $result->{"error"} = "did not get 221 response to quit";
            $result->{"ok"} = 0;
            return $result;
        }

        $result->{"ok"} = 1;

        print S "QUIT\r\n";
        close(S);
        alarm 0; # Cancel the alarm

    };

    if ($EVAL_ERROR
        and ($EVAL_ERROR =~ /^Timeout Alarm/)) {
        $result->{"error"} = "timeout";
        $result->{"ok"} = 0;
        return $result;
    }

    return $result;
}


sub OpenSocket {
#
# Make a Berkeley socket connection between this program and a TCP port
# on another (or this) host. Port can be a number or a named service
#
    local($OtherHostname, $Port) = @_;
    local($OurHostname, $sockaddr, $name, $aliases, $proto, $type, $len,
          $ThisAddr, $that);
    $OurHostname = &hostname;

    ($name, $aliases, $proto) = getprotobyname('tcp');
    ($name, $aliases, $Port) = getservbyname($Port, 'tcp') unless $Port =~ /^\d+$/;
    ($name, $aliases, $type, $len, $ThisAddr) = gethostbyname($OurHostname);
    ($name, $aliases, $type, $len, $OtherHostAddr) = gethostbyname($OtherHostname);

    my $that = sockaddr_in ($Port, $OtherHostAddr);

    $result = socket(S, &PF_INET, &SOCK_STREAM, $proto) || return undef;

    $result = connect(S, $that) || return undef;

    select(S); $| = 1; select(STDOUT);      # set S to be un-buffered
    return 1;                               # success
}
mon-1.2.0/mon.d/imap.monitor0000755003616100016640000001103610301645774015574 0ustar trockijtrockij
#!/usr/bin/perl
#
# Try to connect to an IMAP server, and
# wait for the right output.
#
# For use with "mon".
#
# Arguments are "-p port -t timeout [-m mailbox] host [host...]"
#
# Adapted from "http.monitor" by
# Jim Trocki, trockij@transmeta.com
#
# http.monitor written by
#
# Jon Meek
# American Cyanamid Company
# Princeton, NJ
#
# $Id: imap.monitor,v 1.3 2005/08/20 15:27:56 vitroth Exp $
#
#    Copyright (C) 1998, Jim Trocki
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
use Getopt::Std;
use English;

getopts ("m:p:t:");
$PORT = $opt_p || 143;
$TIMEOUT = $opt_t || 30;
$MAILBOX=$opt_m || undef;

@failures = ();

foreach $host (@ARGV) {

    if (! &imapGET($host, $PORT)) {
        push (@failures, $host);
    }
}

if (@failures == 0) {
    exit 0;
}

print join (" ", sort @failures), "\n\n", join ("\n", @longerr), "\n";

exit 1;


sub imapGET {
    use Socket;
    use Sys::Hostname;

    my($Server, $Port) = @_;
    my($ServerOK, $TheContent, $cmd);

    $ServerOK = 0;

    $TheContent = '';

    $Path = '/';

###############################################################
    eval {

        local $SIG{ALRM} = sub { die "Timeout Alarm" };
        alarm $TIMEOUT;
        $result = &OpenSocket($Server, $Port); # Open a connection to the server
        if ($result == 0) { # Failure to open the socket
            push @longerr, "$Server: Unable to connect";
            return '';
        }

        $in = <S>;
        if ($in !~ /^\* (OK|PREAUTH|BYE)/) {
            alarm 0;
            push @longerr, "$Server: No IMAP banner received";
            return 0;
        }

        $cmd="login";
        print S "A1 LOGIN ANONYMOUS ANONYMOUS\r\n";

        while (defined($in=<S>)) {
            if ($in =~ /^A1 (\w+) (.*)/) {
                if ($1 eq "OK") {
                    $ServerOK = 1;
                } else {
                    $errmsg="$1 $2";
                }
                last;
            }
        }

        if ($ServerOK && $MAILBOX) {
            $cmd="examine";
            $ServerOK=0;
            print S "A2 EXAMINE $MAILBOX\r\n";

            while (defined($in=<S>)) {
                if ($in =~ /^A2 (\w+) (.*)/) {
                    if ($1 eq "OK") {
                        $ServerOK = 1;
                    } else {
                        $errmsg="$1 $2";
                    }
                    last;
                }
            }
        }

        if ($ServerOK) {
            $cmd="logout";
            $ServerOK=0;
            print S "A3 LOGOUT\r\n";

            while (defined($in=<S>)) {
                if ($in =~ /^A3 (\w+) (.*)/) {
                    if ($1 eq "OK") {
                        $ServerOK = 1;
                    } else {
                        $errmsg="$1 $2";
                    }
                    last;
                }
            }
        }

        if (!$ServerOK) {
            if ($errmsg) {
                push @longerr, "$Server: bad response to $cmd: $errmsg";
            } else {
                push @longerr, "$Server: No response to $cmd";
            }
        }

        close(S);
        alarm 0; # Cancel the alarm

    };

    if ($EVAL_ERROR and ($EVAL_ERROR =~ /^Timeout Alarm/)) {
        push @longerr,
"$Server: **** Time Out\n"; return 0; } elsif ($EVAL_ERROR) { push @longerr, "$Server: $EVAL_ERROR"; return 0; } return $ServerOK; } sub OpenSocket { # # Make a Berkeley socket connection between this program and a TCP port # on another (or this) host. Port can be a number or a named service # local($OtherHostname, $Port) = @_; local($OurHostname, $sockaddr, $name, $aliases, $proto, $type, $len, $ThisAddr, $that); $OurHostname = &hostname; ($name, $aliases, $proto) = getprotobyname('tcp'); ($name, $aliases, $Port) = getservbyname($Port, 'tcp') unless $Port =~ /^\d+$/; ($name, $aliases, $type, $len, $ThisAddr) = gethostbyname($OurHostname); ($name, $aliases, $type, $len, $OtherHostAddr) = gethostbyname($OtherHostname); my $that = sockaddr_in ($Port, $OtherHostAddr); $result = socket(S, &PF_INET, &SOCK_STREAM, $proto) || return undef; $result = connect(S, $that) || return undef; select(S); $| = 1; select(STDOUT); # set S to be un-buffered return 1; # success } mon-1.2.0/mon.d/silkworm.monitor0000755003616100016640000001503010061516615016505 0ustar trockijtrockij#!/usr/bin/perl # # "mon" monitor to detect thermal/fan/psu/port failures # for brocade silkworm fcal switches # # arguments are "[-c community] host [host...]" # # -c community SNMP community # # Jim Trocki # # $Id: silkworm.monitor,v 1.1.1.1 2004/06/09 05:18:05 trockij Exp $ # # Copyright (C) 2000, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use SNMP; use Getopt::Std; use strict; sub get_status; sub get_table; sub get_vars; sub fancy_psu_table; $ENV{"MIBS"} = "RFC1213-MIB:SW-MIB"; my %opt; getopts ('c:', \%opt); my $COMM = $opt{"c"} || "public"; my @failures = (); my $detail; my %sensors; my %ports; my %opstatus; foreach my $host (@ARGV) { my %status = get_status ($host, $COMM); if ($status{"error"} ne "") { push (@failures, $host); $detail .= "could not retrieve status from $host: $status{error}\n\n"; next; } elsif ($status{"failure"}) { push (@failures, $host); $detail .= $status{"failure_summary"}; } $sensors{$host} = $status{"sensor_table"}; $ports{$host} = $status{"port_table"}; $opstatus{$host} = $status{"operstatus"}; } # # output returned to mon # if (@failures != 0) { print join (" ", sort @failures), "\n"; print "$detail\n"; } else { print "\n"; } print <{"swSensorValue"}; $val = "unknown" if ($val == -2147483648); printf ("%-20s %-12s %-9s %-10s %s\n", $host, $r->{"swSensorType"}, $r->{"swSensorStatus"}, $val, $r->{"swSensorInfo"}, ); } print "\n"; } print <{"swFCPortIndex"}, $r->{"swFCPortType"}, $r->{"swFCPortAdmStatus"}, $r->{"swFCPortPhyState"}, $r->{"swFCPortOpStatus"}, $r->{"swFCPortLinkState"}, ); } print "\n"; } exit 1 if (@failures != 0); exit 0; # # params: (hostname, community) # # returns: # ( # "error" => "error name, empty string means no error", # ) # sub get_status { my ($host, $comm) = @_; my $s; if (!defined ($s = new SNMP::Session ( "DestHost" => $host, "Community" => $comm, "Version" => 2, "UseEnums" => 1, ))) { return ("error" => "cannot create session"); } my $error; my $failure_detected = 0; my $failure_summary = ""; my $sensor_table; my $port_table; my $operstatus; # # is this really a brocade fcal switch? 
# my $sys_oid = $s->get (["sysObjectID", 0]); return ("error" => $s->{"ErrorStr"}) if ($s->{"ErrorStr"} ne ""); return ("error" => "not Brocade fiberchannel switch") if ($sys_oid !~ /^\.1\.3\.6\.1\.4\.1\.1588\.2\.1\.1\.[12]/); # # operational status # if (!defined ($operstatus = $s->get ("swOperStatus.0"))) { return ("error" => "$s->{ErrorStr}"); } $failure_detected++ if ($operstatus ne "online"); # # sensor table # ($error, $sensor_table) = get_table ($s, ["swSensorType"], ["swSensorStatus"], ["swSensorValue"], ["swSensorInfo"], ); return ("error" => $error) if ($error ne ""); # # port table # ($error, $port_table) = get_table ($s, ["swFCPortIndex"], ["swFCPortType"], ["swFCPortPhyState"], ["swFCPortOpStatus"], ["swFCPortAdmStatus"], ["swFCPortLinkState"], ); return ("error" => $error) if ($error ne ""); foreach my $r (@{$sensor_table}) { next if ($r->{"swSensorStatus"} eq "nominal" || $r->{"swSensorStatus"} eq "absent" ); $failure_summary .= "$host sensor $r->{swSensorType} failure ($r->{swSensorStatus})\n"; $failure_detected++; } foreach my $r (@{$port_table}) { # # ignore disabled/loopback ports # next if ($r->{"swFCPortLinkState"} ne "enabled"); # # ignore operational ports # next if ($r->{"swFCPortOpStatus"} eq "online"); $failure_summary .= "$host port $r->{swFCPortIndex} failure ($r->{swFCPortOpStatus})\n"; $failure_detected++; } ( "error" => "", "failure" => $failure_detected, "failure_summary" => $failure_summary, "operstatus" => $operstatus, "sensor_table" => $sensor_table, "port_table" => $port_table, ); } sub get_table { my ($s, @tbl) = @_; my $table = []; my $tblid = $tbl[0]->[0]; my $i = 0; my $row = new SNMP::VarList (@tbl); return ("MIB problem") if (!defined $row); while (defined ($s->getnext ($row))) { last if ($s->{"ErrorStr"} ne ""); my $r = $row->[0]->[0]; last if ($r ne $tblid); foreach my $col (@{$row}) { $table->[$i]->{"iid"} = $col->[1]; $table->[$i]->{$col->[0]} = $col->[2]; } $i++; } return ($s->{"ErrorStr"}) if ($s->{"ErrorStr"} ne ""); 
    (
        "",
        $table,
    );
}


sub get_vars {
    my ($s, @vars) = @_;

    my $r = new SNMP::VarList ( @vars );

    return ("MIB problem") if (!defined $r);

    # ErrorStr is a field of the session hash, as used in get_table above
    return ($s->{"ErrorStr"}) if (!defined ($s->get ($r)));

    my $v;

    foreach my $element (@{$r}) {
        $v->{$element->[0]} = $element->[2];
    }

    ("", $v);
}
mon-1.2.0/mon.d/hpnp.monitor0000755003616100016640000000556310061516615015615 0ustar trockijtrockij
#!/usr/bin/perl
#
# SNMP monitoring of HP JetDirect-equipped printers
#
# returns 1 if an actual printer failure is indicated by SNMP,
# or 2 if it cannot communicate with the printer.
#
# Jim Trocki
#
# $Id: hpnp.monitor,v 1.1.1.1 2004/06/09 05:18:05 trockij Exp $
#
#    Copyright (C) 1998, Jim Trocki
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
use SNMP;
use Getopt::Long;

GetOptions (\%opt, "community=s", "timeout=i", "retries=i", "lpq");

die "no host arguments\n" if (@ARGV == 0);

$RET = 0;
@ERRS = ();

$COMM = $opt{"community"} || "public";
$TIMEOUT = $opt{"timeout"} * 1000 * 1000 || 2000000;
$RETRIES = $opt{"retries"} || 5;

# descriptions, in the same order as the status OIDs fetched below
@DESC = ("offline", "paper problem", "needs human intervention",
         "peripheral error", "paper out", "paper jam", "toner low");

foreach $host (@ARGV) {
    undef $s;
    if (!defined($s = new SNMP::Session (DestHost => $host,
            Timeout => $TIMEOUT, Community => $COMM,
            "Version" => 2,
            Retries => $RETRIES))) {
        print "cannot create SNMP session to $host\n";
        $RET = ($RET == 1) ? 1 : 2;
        next;
    }

    undef $vars;
    $vars = new SNMP::VarList (
        ['.1.3.6.1.4.1.11.2.3.9.1.1.2.1', 0],   # line state
        ['.1.3.6.1.4.1.11.2.3.9.1.1.2.2', 0],   # paper state
        ['.1.3.6.1.4.1.11.2.3.9.1.1.2.3', 0],   # human intervention
        ['.1.3.6.1.4.1.11.2.3.9.1.1.2.6', 0],   # gdStatusPeripheralError
        ['.1.3.6.1.4.1.11.2.3.9.1.1.2.8', 0],   # gdStatusPaperOut
        ['.1.3.6.1.4.1.11.2.3.9.1.1.2.9', 0],   # gdStatusPaperJam
        ['.1.3.6.1.4.1.11.2.3.9.1.1.2.10', 0],  # gdStatusTonerLow
        ['.1.3.6.1.4.1.11.2.3.9.1.1.3', 0],     # gdStatusDisplay
    );

    if (!defined ($s->get($vars))) {
        push (@hosts, $host);
        push (@ERRS, "$host unreachable\n\n");
        $RET = ($RET == 1) ? 1 : 2;
        next;
    }

    $display = ${$vars}[7]->val;

    $h = 0;
    @H = ();
    for ($i = 0; $i< 6; $i++) {
        if (${$vars}[$i]->val) {
            push (@H, $DESC[$i]);
            $RET = 1;
            $h = 1;
        }
    }
    if ($h) {
        push (@hosts, $host);
    }
    if (@H > 0) {
        push (@ERRS, "$host\n" . "-" x length($host) . "\n" .
            join ("\n", @H, "\n"));
        push (@ERRS, "display reads:\n$display\n\n");
    }
}

if (@hosts > 0) {
    print join (" ", sort @hosts), "\n";
    print "\n";
    print @ERRS;
}

exit $RET;
mon-1.2.0/mon.d/asyncreboot.monitor0000755003616100016640000001254310061516615017174 0ustar trockijtrockij
#!/usr/bin/perl
#
# monitor host reboots via SNMP, asynchronously
#
# for use with "mon"
#
# options:
#
#   asyncreboot.monitor --statefile=filename --dir=dir [--timeout=secs] host1 host2...
#
# Since this is scheduled from mon, it must maintain state between
# runs. It uses the "statefile" for this, in which it stores a
# sysUpTime sample for each host specified on the command line.
#
# THIS STATE FILE MUST NOT BE SHARED BETWEEN MULTIPLE INSTANCES OF THIS
# MONITOR, SINCE IT DOES NOT HANDLE LOCKING OF THE FILE DURING UPDATES!!
#
# Jim Trocki
#
# $Id: asyncreboot.monitor,v 1.1.1.1 2004/06/09 05:18:05 trockij Exp $
#
#    Copyright (C) 1998, Jim Trocki
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use SNMP 1.8; use Getopt::Long; sub secs_to_hms; sub read_state_file; sub write_state_file; sub secs_to_hms; sub print_failures; GetOptions (\%opt, "statefile=s", "dir=s", "timeout=i"); $STATEDIR = $opt{"dir"} || "/usr/lib/mon/state.d"; $STATEFILE = $opt{"statefile"} || "state"; $STATE = "$STATEDIR/$STATEFILE"; $TIMEOUT = $opt{"timeout"} || 10; die "reboot state dir $STATEDIR does not exist\n" if (! -w $STATEDIR); die "no host arguments\n" if (@ARGV == 0); read_state_file; %getting = (); %gotten = (); %sessions = (); foreach my $host (@ARGV) { $sessions{$host} = new SNMP::Session ( "DestHost" => $host, "Retries" => 10, "Version" => 2, ); $getting{$host} = 1; $sessions{$host}->get ( [['.1.3.6.1.2.1.1.3.0']], [\&h, $sessions{$host}], ); } SNMP::MainLoop($TIMEOUT); # # what was gotten and what was not # @failures = (); foreach my $host (keys %gotten) { next if ($gotten{$host}->{"uptime"} eq "timeout"); # # no history for this # if (!defined $last_sample{$host}{"uptime"}) { } elsif ($last_sample{$host}{"uptime"} < 2**32 - (120 * 60 * 100) && $gotten{$host}->{"uptime"} < $last_sample{$host}{"uptime"}) { push (@failures, $host); $last_sample{$host}{"olduptime"} = $last_sample{$host}{"uptime"}; $last_sample{$host}{"oldcheck"} = $last_sample{$host}{"lastcheck"}; } $last_sample{$host}{"uptime"} = $gotten{$host}->{"uptime"}; $last_sample{$host}{"lastcheck"} = $gotten{$host}->{"time"}; } for (keys %getting) { print "notgot $_ $getting{$_}\n"; $gotten{$_}->{"uptime"} = "timeout"; $gotten{$_}->{"time"} = time; $gotten{$_}->{"error"} = "timeout"; } write_state_file; # # all is OK, nobody has rebooted # if (@failures == 0) { exit; } print_failures; exit 1; # # callback for asynch requests # sub h { $gotten{$_[0]->{"DestHost"}}->{"time"} = time; delete 
        $getting{$_[0]->{"DestHost"}};

    if (!defined ($_[1])) {
        $gotten{$_[0]->{"DestHost"}}->{"uptime"} = "timeout";
        $gotten{$_[0]->{"DestHost"}}->{"error"} = $_[0]->{"ErrorStr"};
        return;
    }

    $gotten{$_[0]->{"DestHost"}}->{"uptime"} = $_[1][0]->val;
}


sub secs_to_hms {
    my ($s) = @_;
    my ($dd, $hh, $mm, $ss);

    $dd = int ($s / 86400);
    $s -= $dd * 86400;

    $hh = int ($s / 3600);
    $s -= $hh * 3600;

    $mm = int ($s / 60);
    $s -= $mm * 60;

    $ss = $s;

    if ($dd == 0) {
        sprintf("%02d:%02d", $hh, $mm);
    } else {
        sprintf("%d days, %02d:%02d", $dd, $hh, $mm);
    }
}


sub read_state_file {
    my $host;

    if (! -f $STATE) {
        open (O, ">$STATE");
        close (O);
    }

    #
    # read in state file
    #
    if (!open (IN, "$STATE")) {
        die "could not open state file $STATE\n";
    }

    while (defined ($host = <IN>)) {
        if ($host =~ /^(\S+) (\d+) (\d+)/) {
            $last_sample{$1}{"uptime"} = $2;
            $last_sample{$1}{"lastcheck"} = $3;
        }
    }

    close (IN);
}


sub write_state_file {
    #
    # update state file
    #
    if (!open (OUT, ">$STATE")) {
        die "could not open $STATEFILE for writing: $!\n";
    }

    foreach my $k (sort keys %last_sample) {
        print OUT "$k $last_sample{$k}{uptime} $last_sample{$k}{lastcheck}\n";
    }

    close (OUT);
}


#
# we have reboots, so calculate uptime, downtime,
# and report it
#
sub print_failures {
    my $t = time;
    my @f;

    foreach my $host (@failures) {
        my $downtime = secs_to_hms (
            $t - $last_sample{$host}{"oldcheck"} -
            int ($last_sample{$host}{"uptime"} / 100)
        );
        my $uptime = secs_to_hms (
            int ($last_sample{$host}{"olduptime"} / 100)
        );
        push (@f, "$host / down $downtime / up $uptime");
    }

    @f = sort @f;
    print join (" ", sort @failures), "\n";

    printf ("%-20s %s\n", "host", "rebooted on");
    printf ("----------------------------------------------\n");
    my $timen = time;
    foreach my $host (@failures) {
        my $secs = int($last_sample{$host}{"uptime"} / 100);
        my $t = localtime ($timen - $secs);
        printf ("%-20s %s\n", $host, $t);
    }

    print "\n";
    print join ("\n", @f), "\n";
}
mon-1.2.0/mon.d/reboot.monitor0000755003616100016640000001173710146140377016144 0ustar 
trockijtrockij#!/usr/bin/perl -w # # monitor host reboots via SNMP # # for use with "mon" # # options: # # reboot.monitor --statefile=filename --dir=dir [--community=com] host1 host2... # # Since this is scheduled from mon, it must maintain state between # runs. It uses the "statefile" for this, in which it stores a # sysUpTime sample for each host specified on the command line. # # THIS STATE FILE MUST NOT BE SHARED BETWEEN MULTIPLE INSTANCES OF THIS # MONITOR, SINCE IT DOES NOT HANDLE LOCKING OF THE FILE DURING UPDATES!! # # Jim Trocki # # $Id: reboot.monitor,v 1.2 2004/11/15 14:45:19 vitroth Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # # modified June 2000 by Ed Ravin # changed output to conform with other monitors (just hostname on summary) # minor cosmetic changes (usage, use default Mon state dir) # the old behavior still available with the "--verbose" option use SNMP; use Getopt::Long; ($ME = $0) =~ s-.*/--; GetOptions (\%opt, "statefile=s", "dir=s", "community=s", "verbose"); $STATEDIR = $opt{"dir"} ? $opt{"dir"} : $ENV{"MON_STATEDIR"} ? 
    $ENV{"MON_STATEDIR"} :
    die "$ME: --dir not specified and \$MON_STATEDIR not set\n";
$STATEFILE = $opt{"statefile"} || $ME;
$STATE = "$STATEDIR/$STATEFILE";
$COMM = $opt{"community"} || "public";

die "$ME: reboot state dir $STATEDIR does not exist\n" if (! -w $STATEDIR);

$VERBOSE= $opt{"verbose"} || 0;

die "$ME: no host arguments\n" if (@ARGV == 0);

if (! -f $STATE) {
    open (O, ">$STATE");
    close (O);
}

#
# read in state file
#
if (!open (IN, "$STATE")) {
    die "$ME: could not open state file $STATE\n";
}

while (defined ($host = <IN>)) {
    if ($host =~ /^(\S+) (\d+) (\d+)/) {
        $last_sample{$1}{"uptime"} = $2;
        $last_sample{$1}{"lastcheck"} = $3;
    }
}

close (IN);

#
# get uptime for each host via SNMP
#
@failures = ();

foreach $host (@ARGV) {
    if (!defined($s = new SNMP::Session (DestHost => $host,
            Community => $COMM,
            "Version" => 2))) {
        print "reboot.monitor: cannot create SNMP session to $host\n";
        next;
    }

    if (!defined($u = $s->get("sysUpTime.0"))) {
        next;
    }

    #
    # If the uptime is lower than the last sample,
    # assume this is a reboot. Note that this cannot
    # account for counter rollover!
# # # no history for this # if (!defined $last_sample{$host}{"uptime"}) { } elsif ($last_sample{$host}{"uptime"} < 2**32 - (120 * 60 * 100) && $u < $last_sample{$host}{"uptime"}) { push (@failures, $host); $last_sample{$host}{"olduptime"} = $last_sample{$host}{"uptime"}; $last_sample{$host}{"oldcheck"} = $last_sample{$host}{"lastcheck"}; } $last_sample{$host}{"uptime"} = $u; $last_sample{$host}{"lastcheck"} = time; } # # update state file # if (!open (OUT, ">$STATE")) { die "$ME: could not open $STATEFILE for writing: $!\n"; } foreach $k (sort keys %last_sample) { print OUT "$k $last_sample{$k}{uptime} $last_sample{$k}{lastcheck}\n"; } close (OUT); # # all is OK, nobody has rebooted # if (@failures == 0) { exit; } # # we have reboots, so calculate uptime, downtime, # and report it # $t = time; foreach $host (@failures) { $downtime = &secs_to_hms ( $t - $last_sample{$host}{"oldcheck"} - int ($last_sample{$host}{"uptime"} / 100)); $uptime = &secs_to_hms ( int ($last_sample{$host}{"olduptime"} / 100)); if ($VERBOSE) { push (@f, "$host / down $downtime / up $uptime"); } else { push (@f, "$host"); } } @f = sort @f; print "@f\n"; printf ("%-20s %-25s %-25s\n", "host", "rebooted on", "last seen up on"); printf ("%-20s %-25s %-25s\n", "-" x 20, "-" x 25, "-" x 25); $timen = time; foreach $host (@failures) { $secs = int($last_sample{$host}{"uptime"} / 100); $t = localtime ($timen - $secs); $downtime= localtime($last_sample{$host}{"oldcheck"} - ($last_sample{$host}{"uptime"} / 100) ); printf ("%-20s %-25s %-25s\n", $host, $t, $downtime); } exit 1; sub secs_to_hms { my ($s) = @_; my ($dd, $hh, $mm, $ss); $dd = int ($s / 86400); $s -= $dd * 86400; $hh = int ($s / 3600); $s -= $hh * 3600; $mm = int ($s / 60); $s -= $mm * 60; $ss = $s; if ($dd == 0) { sprintf("%02d:%02d", $hh, $mm); } else { sprintf("%d days, %02d:%02d", $dd, $hh, $mm); } } mon-1.2.0/mon.d/ftp.monitor0000755003616100016640000001122010620057066015425 0ustar trockijtrockij#!/usr/bin/perl # # Use try to connect to 
a FTP server, and # wait for the right output. # # For use with "mon". # # Arguments are "-p port -t timeout host [host...]" # # Adapted from "http.monitor" by # Jim Trocki # # http.monitor originally written by # # Jon Meek # American Cyanamid Company # Princeton, NJ # # $Id: ftp.monitor,v 1.1.1.1.4.1 2007/05/08 11:25:42 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use Getopt::Std; use English; getopts ("p:t:"); $PORT = $opt_p || 21; $TIMEOUT = $opt_t || 30; my %good; my %bad; # # collect the status on all the hosts # foreach my $host (@ARGV) { my $result = ftpGET ($host, $PORT); if (!$result->{"ok"}) { $bad{$host} = $result; } else { $good{$host} = $result; } } # # summary line # if (keys %bad == 0) { print "\n"; } else { print join (" ", sort keys %bad), "\n"; } # # detail # foreach my $host (keys %bad) { print "$host failed with error " . 
        $bad{$host}->{"error"}, "\n";
    print "detail for $host\n";
    print "==============================================================================\n";
    if ($bad{$host}->{"detail"} ne "") {
        print $bad{$host}->{"detail"};
    } else {
        print "no detail\n";
    }
    print "\n";
}

print "\n";

foreach my $host (keys %good) {
    print "$host succeeded\n";
    print "detail for $host\n";
    print "==============================================================================\n";
    if ($good{$host}->{"detail"} ne "") {
        print $good{$host}->{"detail"};
    } else {
        print "no detail\n";
    }
    print "\n";
}

if (keys %bad != 0) {
    exit 1;
}

exit 0;


sub ftpGET {
    use Socket;
    use Sys::Hostname;

    my($Server, $Port) = @_;
    my($ok);

    my $result = {
        "ok" => 0,
        "error" => undef,
        "detail" => undef,
    };

###############################################################
    eval {

        local $SIG{ALRM} = sub { die "Timeout Alarm" };
        alarm $TIMEOUT;

        my $err = &OpenSocket($Server, $Port); # Open a connection to the server

        if ($err ne "") { # Failure to open the socket
            $result = {
                "ok" => 0,
                "error" => $err,
                "detail" => undef,
            };
            return $result;
        }

        while ($in = <S>) {
            $result->{"detail"} .= " < $in";
            if ($in =~ /^220 /) {
                $result->{"ok"} = 1;
                last;
            }
        }

        if (!$result->{"ok"}) {
            alarm 0;
            $result->{"ok"} = 0;
            $result->{"error"} = "Connection refused";
            close(S);
            return undef;
        }

        print S "quit\r\n";
        $result->{"detail"} .= " > quit\n";

        while ($in = <S>) {
            $result->{"detail"} .= " < $in";
            next if ($in =~ /^[0-9]{3}\-/);
            if ($in !~ /^221 /) {
                alarm 0;
                $result->{"ok"} = 0;
                $result->{"error"} = "FTP server error after quit";
                close(S);
                return undef;
            }
        }

        close(S);
        alarm 0; # Cancel the alarm

    };

    #
    # catch timeout
    #
    if ($EVAL_ERROR and ($EVAL_ERROR =~ /^Timeout Alarm/)) {
        $result->{"ok"} = 0;
        $result->{"error"} = "timeout";
    }

    return $result;
}


#
# Make a Berkeley socket connection between this program and a TCP port
# on another (or this) host.
Port can be a number or a named service # # returns "" on success, or an error string on failure # sub OpenSocket { my ($host, $port) = @_; my $proto = (getprotobyname('tcp'))[2]; return ("could not get protocol") if (!defined $proto); my $conn_port; if ($port =~ /^\d+$/) { $conn_port = $port; } else { $conn_port = (getservbyname($port, 'tcp'))[2]; return ("could not getservbyname for $port") if (!defined $conn_port); } my $host_addr = (gethostbyname($host))[4]; return ("gethostbyname failure") if (!defined $host_addr); my $that = sockaddr_in ($conn_port, $host_addr); if (!socket (S, &PF_INET, &SOCK_STREAM, $proto)) { return ("socket: $!"); } if (!connect (S, $that)) { return ("connect: $!"); } select(S); $| = 1; select(STDOUT); ""; } mon-1.2.0/mon.d/snmpdiskspace.monitor0000755003616100016640000004167210620054615017513 0ustar trockijtrockij#!/usr/bin/perl # # NAME # snmpdiskspace.monitor # # # SYNOPSIS # snmpdiskspace.monitor [--list] [--timeout seconds] [--config filename] # [--community string] [--free minfree] # [--retries retries] [--usemib ] host... # # # DESCRIPTION # This script uses the Host Resources MIB (RFC1514), and optionally # the MS Windows NT Performance MIB, or UCD-SNMP extensions # (enterprises.ucdavis.dskTable.dskEntry) to monitor diskspace on hosts # via SNMP. # # snmpdiskspace.monitor uses a config file to allow the specification of # minimum free space on a per-host and per-partition basis. The config # file allows the use of regular expressions, so it is quite flexible in # what it can allow. See the sample config file for more details and # syntax. # # The script only checks disks marked as "FixedDisks" by the Host MIB, # which should help cut down on the number of CD-ROM drives # erroneously reported as being full! Since the drive classification # portion of the UCD Host MIB isn't too great on many OS'es, though, # this won't buy you a lot. Empire's SNMP agent gets this right on # all the hosts that I checked, though. 
Not sure about the MS MIB. # UCD-SNMP only checks specific partition types (md, hd, sd, ida) # # snmpdiskspace.monitor is intended for use as a monitor for the mon # network monitoring package. # # # OPTIONS # --community The SNMP community string to use. Default is "public". # --config The config file to use. Default is either # /etc/mon/snmpdiskspace.cf or # /usr/lib/mon/mon.d/snmpdiskspace.cf, in that order. # --retries The number of retries to use, if we get an SNMP timeout. # Default is retry 5 times. # --timeout Seconds to wait before declaring a timeout on an SNMP get. # Default is 20 seconds. # --free The default minimum free space, in a percentage or absolute # quantity, as per the config file. Thus, arguments of, for # example, "20%", "1gb", "50mb" are all valid. # Default is 5% free on every partition checked. # # --ifree The default minimum free inode percentage, specified as # a percentage. Default is 5% free. # # --list Give a verbose listing of all partitions checked on all # specified hosts. # # --listall like --list, but also lists the thresholds defined for # each filesystem, so you can doublecheck the config file # # --usemib Choose which MIB to use: one or more of host, perf, ucd # Default tries all three, in that order # # --debug enable debug output for config file parsing and MIB fetching # # # EXIT STATUS # Exit status is as follows: # 0 No problems detected. # 1 Free space on any host was below the supplied parameter. # 2 A "soft" error occurred, either a SNMP library error, # or could not get a response from the server. # # In the case where both a soft error and a freespace violation are # detected, exit status is 1. # # BUGS # When using the net-snmp agent, you must build it with "--with-dummy-values" # or the monitor may not parse the Host Resources MIB properly. # # List of local filesystem types used when parsing the UCD MIB should be # configurable. 
# # # NOTES # $Id: snmpdiskspace.monitor,v 1.1.2.2 2007/05/08 11:05:49 trockij Exp $ # # * Added support for inode status via UCD-SNMP MIB. Fourth column in config # file (optional) is for inode%. # * added --debug and --usemib options. Latter needed so you can force use # of UCD mib if you want inode status. # * rearranged the error messages to be more Mon-like (hostname first) # * added code to synchronize instance numbers when using UCD MIB. This # could solve the "sparse MIB" problem usually fixed by the # --with-dummy-values option in net-snmp if needed for other agents # Ed Ravin (eravin@panix.com), January 2005 # # Added support for regex hostnames and partition names in the config file, # 'use strict' by andrew ryan . # # Generalised to handle multible mibs by jens persson # Changes Copyright (C) 2000, jens persson # # Modified for use with UCD-SNMP by Johannes Walch for # NWE GmbH (j.walch@nwe.de) # # Support for UCD's disk MIB added by Matt Simonsen # # # SEE ALSO # mon: http://www.kernel.org/software/mon/ # # This requires the UCD SNMP library and G.S. Marzot's Perl SNMP # module. (http://ucd-snmp.ucdavis.edu and CPAN, respectively). # # The Empire SystemEdge SNMP agent: http://www.empire.com # # # COPYRIGHT # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use strict; use SNMP; use Getopt::Long; sub readcf; sub toBytes; sub get_values; # setup what mibs to use # $ENV{"MIBS"} = 'RFC1213-MIB:HOST-RESOURCES-MIB:WINDOWS-NT-PERFORMANCE:UCD-SNMP-MIB'; $ENV{"MIBS"} = 'RFC1213-MIB:HOST-RESOURCES-MIB:UCD-SNMP-MIB'; my %opt; # parse the commandline; --free is a string so that "20%", "1gb", "50mb" are accepted GetOptions (\%opt, "community=s", "timeout=i", "retries=i", "config=s", "list", "listall", "free=s", "ifree=n", "usemib=s", "debug"); die "No host arguments given!\n" if (@ARGV == 0); my $RET = 0; #exit value of script my @ERRS = (); # array holding detail output my @HOSTS = (); # array holding summary output my @cfgfile = (); #array holding contents of config file # Read in defaults my $COMM = $opt{"community"} || $ENV{"COMMUNITY"} || "public"; my $TIMEOUT = ($opt{"timeout"} || 20) * 1000000; #SNMP::Session takes microseconds; default timeout is 20 seconds my $RETRIES = $opt{"retries"} || 5; my $CONFIG = $opt{"config"} || (-d "/etc/mon" ? "/etc/mon" : "/usr/lib/mon/mon.d") . "/snmpdiskspace.cf"; my $DISKFREE = defined($opt{"free"}) ? toBytes($opt{"free"}) : -5; #default max % full is 95% die "Invalid --free value '$opt{free}'\n" unless defined($DISKFREE); my $INODEFREE = $opt{"ifree"} || 5; #default max % inode full is 95% my $USEMIB= $opt{"usemib"} || "host perf ucd"; my $LIST= $opt{"list"} || $opt{"listall"} || 0; my $LISTALL= $opt{"listall"} || 0; my $DEBUG= $opt{"debug"} || 0; my ($host, $checkval, $icheckval, %FREE, $disk, @disklist, $cfgline); # read the config file if ( !readcf ($CONFIG) ) { # not being able to read config file shouldn't be a fatal, since we # have defaults we can use.
print STDERR "readcf: Could not read config file $CONFIG: $!\n"; } # now do the checks for each host foreach $host (@ARGV) { # fetch the info from the computers @disklist = get_values($host); next unless (@disklist) && (ref($disklist[0]) eq "ARRAY"); #make sure we got an OK return value from get_values before going any further # Now check each partition foreach $disk (@disklist) { undef $checkval ; undef $icheckval ; # Go through the config file line by line until we # find a match for this host/partition. Stop as soon # as we find a match. foreach $cfgline (@cfgfile) { if ( ($host =~ m/^$cfgline->[0]$/) && ($disk->[2] =~ m/^$cfgline->[1]$/) ) { print STDERR "'$host' matched /^$cfgline->[0]\$/ and '$disk->[2]' matched /^$cfgline->[1]\$/, using checkval $cfgline->[2]\n" if $DEBUG; $checkval = $cfgline->[2] ; $icheckval= $cfgline->[3] ; last; } } # Set to default otherwise $checkval = $DISKFREE unless defined($checkval); $icheckval= $INODEFREE unless defined($icheckval); $icheckval=~ s/%$//; # do the checking, first absolute and then percentage next if $checkval == 0 && $icheckval == 0; # nothing to check: ignore my $hostfailed= 0; if (($checkval > 0) && ($disk->[0] <$checkval)) { $hostfailed++; push (@ERRS,sprintf("%s: filesystem %s is (%1.1f%% full), %1.0fMB free (below threshold %1.0fMB free)", $host , $disk->[2] , $disk->[1] , $disk->[0] / 1048576, $checkval / 1048576 )); } elsif (($checkval < 0) && ($disk->[1] - $checkval >=100)) { $hostfailed++; push (@ERRS,sprintf("%s: filesystem %s is (%1.1f%% full), %1.0fMB free (below threshold %s%% free)", $host , $disk->[2] , $disk->[1] , $disk->[0] / 1048576, abs($checkval) )); } if (($icheckval > 0) && ($disk->[3] ne "N/A") && (100 - $disk->[3]) < $icheckval ) { $hostfailed++; push (@ERRS, sprintf ("%s: filesystem %s has %1.1f%% inodes free (below threshold %s%% inodes free)", $host, $disk->[2], 100 - $disk->[3], $icheckval )); } if ($hostfailed) { push (@HOSTS, $host); $RET = 1; } # if the user wants a listing, then the
user will get a listing :-) write if ($LIST or $LISTALL); if ($LISTALL) { printf(" Will alarm if MB free declines below threshold %1.0fMB free\n", $checkval / 1048576) if $checkval > 0; printf(" Will alarm if %%free space declines below threshold %1.1f%% free\n", abs($checkval)) if $checkval < 0; printf(" No free space alarm defined in config file.\n") if $checkval == 0; printf(" Will alarm if %%free inodes declines below %1.1f%%\n", $icheckval) if $icheckval > 0; printf(" No %%inodes free alarm defined in config file.\n") if $icheckval == 0; printf(" WARNING: Unable to alarm on inodes free, dskPercentNode not found in MIB\n") if $disk->[3] eq "N/A" and $icheckval > 0; } } } if ($LIST or $LISTALL) { print "\n\n"; } # Uniq the array of failures, so multiple failures on a single host # are reported in the details section (lines #2-infinity) but not # in the summary (line #1). # Then print out the failures, if any. my %saw; undef %saw; @saw{@HOSTS} = (); @HOSTS = keys %saw; if ($RET) { print "@HOSTS\n"; print "\n"; print join("\n", @ERRS), "\n"; } exit $RET; # # read configuration file # sub readcf { my ($f) = @_; my ($l, $host, $filesys, $free, $ifree); open (CF, $f) || return undef; while (<CF>) { next if (/^\s*#/ || /^\s*$/); chomp; ($host, $filesys, $free, $ifree) = split; # a free-space field that is present but unparseable is a config error; # an absent field stays undefined so the default threshold applies my $bytes = toBytes ($free); if (defined ($free) && !defined ($bytes)) { die "error free specification, config $f, line $.\n"; } push (@cfgfile, [$host , $filesys , $bytes, $ifree || 0]); print STDERR "cf: assigned host=$host, filesys=$filesys, free=$free, ifree=$ifree\n" if $DEBUG; } close (CF); } sub toBytes { # take a string and parse it as follows # N return N # N kb return N*1024 # N mb return N*1024^2 # N gb return N*1024^3 # N % return -N my ($free) = @_; my ($n, $u); if ($free =~ /^(\d+\.\d+)(kb|mb|gb|%|)$/i) { ($n, $u) = ($1, "\L$2"); } elsif ($free =~ /^(\d+)(kb|mb|gb|%|)$/i) { ($n, $u) = ($1, "\L$2"); } else { return undef; } return (int ($n * -1)) if ($u eq "%"); return (int ($n *
1024 )) if ($u eq "kb"); return (int ($n * 1024 * 1024)) if ($u eq "mb"); return (int ($n * 1024 * 1024 * 1024)) if ($u eq "gb"); int ($n); } # # Do the work of trying to get the data from the host via SNMP # sub get_values { my ($host) = @_; my (@disklist,$Type,$Descr,$AllocationUnits,$Size,$Used,$Freespace,$Percent,$InodePercent); my ($v,$s); if (!defined($s = new SNMP::Session (DestHost => $host, Timeout => $TIMEOUT, Community => $COMM, Retries => $RETRIES))) { $RET = ($RET == 1) ? 1 : 2 ; push (@HOSTS, $host); # $s is undefined here, so report the package-level error string push (@ERRS, "$host: could not create session: " . $SNMP::Session::ErrorStr); return undef; } # First we try to use the Host mib (RFC1514) # supported by net-snmpd on most platforms, see http://www.net-snmp.org # # You can also use the Empire (http://www.empire.com) # SNMP agent to provide hostmib support on UNIX and NT. if ($USEMIB =~ /host/i) { $v = new SNMP::VarList ( ['hrStorageIndex'], ['hrStorageType'], ['hrStorageDescr'], ['hrStorageAllocationUnits'], ['hrStorageSize'], ['hrStorageUsed'], ); while (defined $s->getnext($v)) { last if ($v->[0]->tag !~ /hrStorageIndex/); $Type = $v->[1]->val; $Descr = $v->[2]->val; $AllocationUnits = $v->[3]->val; $Size = $v->[4]->val; $Used = $v->[5]->val; $Freespace = (($Size - $Used) * $AllocationUnits); print STDERR "Found HOST MIB filesystem: Type=$Type, Descr=$Descr, AllocationUnits=$AllocationUnits, Size=$Size, Used=$Used\n" if $DEBUG; # This next check makes sure we're only looking at storage # devices of the "FixedDevice" type (4). For comparison, Physical # RAM is 2, Virtual Memory is 3, Floppy Disk is 6, and CD-ROM is 7 # Using the Empire agent, this will eliminate drive types other # than hard disks. The UCD agent is not as good at determining # drive types under the HOST mib.
next if ($Type !~ /\.1\.3\.6\.1\.2\.1\.25\.2\.1\.4/); if ($Size != 0) { $Percent= ($Used / $Size) * 100.0; } else { $Percent=0; }; push (@disklist,[$Freespace,$Percent,$Descr, "N/A"]); print STDERR "Using HOST MIB filesystem: $Descr ($Type)\n" if $DEBUG; }; if (@disklist) { return @disklist; }; }; # Then we test the perfmib from M$ NT resource kit # I'm using the agent/mib-defs from # http://www.wtcs.org/snmp4tpc/ # for some reason every second request fails, # so we fetch the variables twice and discard # the bad ones if ($USEMIB =~ /perf/i) { $v = new SNMP::VarList ( ['ldisklogicalDiskIndex'], ['ldiskPercentFreeSpace'], ['ldiskPercentFreeSpace'], ['ldiskFreeMegabytes'], ['ldiskFreeMegabytes'], ); while (defined $s->getnext($v)) { # Make sure we are still in relevant portion of MIB last if ($v->[1]->val !~ /^\.1\.3\.6\.1\.2\.1\.25\.2\.1\.4/); last if ($v->[0]->val =~ /Total/); $Descr = ( $v->[0]->val =~ /.*:.*:(\w+:)$/gi)[-1] ; $Percent = $v->[2]->val; $Freespace = $v->[4]->val * 1024 * 1024; push (@disklist,[$Freespace,$Percent,$Descr, "N/A"]); print STDERR "Using PERF MIB filesystem: $Descr, $Freespace,$Percent\n" if $DEBUG; }; if (@disklist) { return @disklist; } } #Try UCD-SNMP .enterprises.ucdavis.dskTable.dskEntry MIB extension # Comes with UCD-SNMP / net-snmp if ($USEMIB =~ /ucd/i) { $v = new SNMP::VarList ( ['dskIndex'], ['dskPath'], ['dskPercent'], ['dskAvail'], ['dskDevice'], ['dskPercentNode'], ); while (defined $s->getnext($v)) { last if ($v->[0]->tag !~ /dskIndex/); # end of MIB? my $instancenum= $v->[0]->iid; # what instance number?
# check for partial fetches (like swap partition) that won't # return all the MIB entries if ($v->[2]->iid != $instancenum or $v->[3]->iid != $instancenum or $v->[5]->iid != $instancenum) { # ignore this instance and try to move on to next # we wouldn't need this if use-dummy-values really worked $v = new SNMP::VarList ( ['dskIndex', $instancenum], ['dskPath', $instancenum], ['dskPercent', $instancenum], ['dskAvail', $instancenum], ['dskDevice', $instancenum], ['dskPercentNode', $instancenum], ); next; } $Descr = $v->[1]->val; $Percent = $v->[2]->val; $Freespace = $v->[3]->val; $Freespace *= 1024; #Convert from kbytes to bytes to make consistent $Type = $v->[4]->val; $InodePercent = $v->[5]->val; print STDERR "Found UCD MIB filesystem: Type=$Type, Descr=$Descr, Percent=$Percent, Freespace=$Freespace, InodePercent=$InodePercent\n" if $DEBUG; # Try to catch only local filesystems. This covers the # the basics, but probably should be configurable next unless ( $Type =~ m/\b(md|hd|wd|sd|ida|raid)/ ) ; print STDERR "Using UCD MIB filesystem: $Descr ($Type)\n" if $DEBUG; push (@disklist,[$Freespace,$Percent,$Descr, $InodePercent]); }; if (@disklist) { return @disklist; } } #Check for errors if ($s->{ErrorNum}) { push (@HOSTS, $host); push (@ERRS, "$host: could not get SNMP info: " . $s->{ErrorStr}); $RET = ($RET == 1) ? 1 : 2 ; return undef; } # Check for OID not found push (@HOSTS, $host); push (@ERRS, "$host: Disk space OIDs not found in MIB(s): $USEMIB"); $RET = ($RET == 1) ? 1 : 2 ; return undef; } # format specifications, should be able to cut, paste and edit into a config file format STDOUT_TOP = System Description % Used Free space Inode% ------------------------------------------------------------------------------- . format STDOUT = @<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<< @###.# % @#######.# mb @>>>>>> $host, $disk->[2], $disk->[1], $disk->[0]/1024/1024, ( $disk->[3] ne "N/A" ? ($disk->[3] + 0) . "%" : "N/A") . 
mon-1.2.0/mon.d/ping.monitor0000755003616100016640000000303710230411543015567 0ustar trockijtrockij#!/bin/sh # # Return a list of hosts which are not reachable via ICMP echo # # Jim Trocki, trockij@arctic.org # # $Id: ping.monitor,v 1.2 2005/04/17 07:42:27 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # plat=`uname` p=`which ping` if [ !
-x "$p" ] then echo ping.monitor error, could not find ping exit 1 fi if [ "$#" = 0 ] then echo ping: no hosts found exit 1 fi case "$plat" in Linux) PING="ping -c 1" ;; SunOS) PING="/usr/sbin/ping" ;; NetBSD|OpenBSD) PING="/sbin/ping -c 1" ;; *) echo "unknown plat <$plat>" exit 1 ;; esac failed="" for h in "$@" do if $PING $h >/dev/null 2>/dev/null then : else if [ "$failed" = "" ] then failed="$h" else failed="$failed $h" fi fi done if [ "$failed" != "" ] then echo "$failed" exit 1 fi exit 0 mon-1.2.0/mon.d/netappfree.monitor0000755003616100016640000002120110061516614016763 0ustar trockijtrockij#!/usr/bin/perl -w # # Use SNMP to get free disk space or inode status from a Network Appliance # # exit values: # 1 - free space or inodes on any host dropped below the supplied parameter # 2 - network or SNMP error (SNMP library error, no response from server) # 3 - config error - (filesystem in config file does not exist on filer) # USAGE # [--community=] [--timeout=] # [--config=/path/to/configfile] [--list] host1 host2 ... # EXAMPLES # --list option will dump current status from requested hosts: # netappfree.monitor --list filer1 filer2 filer3 # sample output: # filer ONTAP filesystem KB total KB avail Inode% # ---------------------------------------------------------------------------- # filer1 6.1.2R3 /vol/vol0/ 61092616 6773416 86 # filer1 6.1.2R3 /vol/vol0/.snaps 2545524 1260240 0 # sample invocation in mon.cf, with local MIB directory for the Netapp MIB # NETWORK-APPLIANCE-MIB.txt (copy from /etc/mib/netapp.mib on filer): # service freespace # description test freespace and inodes on Netapp filers # depend SELF:ping # MIBDIRS=/usr/local/share/snmp/mibs # interval 7m # monitor netappfree.monitor # CONFIG FILE FORMAT # # Run "netappfree --list host1 host2 ..." first to get list of filesystems # and whether inodes are properly reported. If you don't want to monitor # inodes for a particular FS, leave that column blank.
# # # host filesystem freespace [InodeThreshold] # (in kb, gb, or mb) (in % or k) # # filer1 /vol/main/ 5gb 90% # filer2 /vol/vol0/ 5gb 500k # # This requires the UCD SNMP library and G.S. Marzot's Perl SNMP # module. # # Originally by Jim Trocki. Modified by Theo Van Dinter # (tvd@colltech.com, felicity@kluge.net) to add verbose error output, # more error checking, etc. Can be used in conjunction with # snapdelete.alert to auto-remove snapshots if needed. # Modified December 2003 by Ed Ravin (eravin@panix.com) to add inode # checking, detect nonexistent filesystem in config file, pass perl -w # checks, added more info to error messages for clarity, updated doc comments # above. # $Id: netappfree.monitor,v 1.1.1.1 2004/06/09 05:18:04 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # Copyright (C) 1999-2001, Theo Van Dinter # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use SNMP; use Getopt::Long; sub list; sub readcf; sub toKB; $ENV{"MIBS"} = 'RFC1213-MIB:NETWORK-APPLIANCE-MIB'; GetOptions (\%opt, "community=s", "timeout=i", "retries=i", "config=s", "list"); die "no host arguments\n" if (@ARGV == 0); $RET = 0; @ERRS = (); %HOSTS = (); $COMM = $opt{"community"} || "public"; $TIMEOUT = $opt{"timeout"} || 2; $TIMEOUT *= 1000 * 1000; $RETRIES = $opt{"retries"} || 5; $CONFIG = $opt{"config"} || (-d "/etc/mon" ? 
"/etc/mon" : "/usr/lib/mon/etc") . "/netappfree.cf"; ($dfIndex, $dfFileSys, $dfKBytesTotal, $dfKBytesAvail, $dfInodesFree, $dfPerCentInodeCapacity) = (0..5); list (@ARGV) if ($opt{"list"}); readcf ($CONFIG) || die "could not read config: $!\n"; foreach $host (@ARGV) { next if (!defined $FREE{$host}); if (!defined($s = new SNMP::Session (DestHost => $host, Timeout => $TIMEOUT, Community => $COMM, Retries => $RETRIES))) { $RET = ($RET == 1) ? 1 : 2; $HOSTS{$host} ++; push (@ERRS, "could not create session to $host: " . $SNMP::Session::ErrorStr); next; } $v = new SNMP::VarList ( ['dfIndex'], ['dfFileSys'], ['dfKBytesTotal'], ['dfKBytesAvail'], ['dfInodesFree'], ['dfPerCentInodeCapacity'], ); if ( $v->[$dfIndex]->tag !~ /^df/ ) { push(@ERRS,"OIDs not mapping correctly! Check that NetApp MIB is available!"); $RET = 1; last; } while (defined $s->getnext($v)) { last if ($v->[$dfIndex]->tag !~ /dfIndex/); my $filesys= $v->[$dfFileSys]->val; next unless exists($FREE{$host}{$filesys}); if ($v->[$dfKBytesAvail]->val < $FREE{$host}{$filesys}{'bytes'}) { $HOSTS{$host} ++; push (@ERRS, sprintf ("%1.1fGB free on %s:%s (threshold %1.1fGB, fs size %1.1fGB)", $v->[$dfKBytesAvail]->val / 1024 / 1024, $host, $filesys, $FREE{$host}{$filesys}{'bytes'} / 1024 / 1024, $v->[$dfKBytesTotal]->val / 1024 / 1024) ); $RET = 1; } # mark filesys entry as seen in filer's MIB $FREE{$host}{$v->[$dfFileSys]->val}{'existsOnFiler'}= 1; if (defined($FREE{$host}{$v->[$dfFileSys]->val}{'inode'})) { my $inodefreewanted= $FREE{$host}{$v->[$dfFileSys]->val}{'inode'}; if (0 < $inodefreewanted and $inodefreewanted < 1) { # percentage? if ($v->[$dfPerCentInodeCapacity]->val > $inodefreewanted * 100) { # percentage exceeded? 
$HOSTS{$host} ++; push (@ERRS, sprintf("%d%% inodes used on %s:%s, over threshold of %d%%", $v->[$dfPerCentInodeCapacity]->val, $host, $v->[$dfFileSys]->val, $inodefreewanted * 100 )); $RET = 1; } } elsif ($v->[$dfInodesFree]->val < $inodefreewanted) { $HOSTS{$host} ++; push (@ERRS, sprintf("%1.1f inodes free on %s:%s, below threshold of %1.1f", $v->[$dfInodesFree]->val, $host, $v->[$dfFileSys]->val, $inodefreewanted )); $RET = 1; } } } if ($s->{ErrorNum}) { $HOSTS{$host} ++; push (@ERRS, "could not get dfIndex for $host: " . $s->{ErrorStr}); $RET = ($RET == 1) ? 1 : 2; } } foreach $host (@ARGV) { foreach $filesys (keys %{$FREE{$host}}) { if (! $FREE{$host}{$filesys}{'existsOnFiler'} ) { $HOSTS{$host} ++; push (@ERRS, "filesystem $filesys does not exist on $host"); $RET = ($RET == 1) ? 1 : 3; } } } if ($RET) { print join(" ", sort keys %HOSTS), "\n\n", join("\n", @ERRS), "\n"; } exit $RET; # # read configuration file # sub readcf { my ($f) = @_; my ($l, $host, $filesys, $free, $inodefree); open (CF, $f) || return undef; while (<CF>) { next if (/^\s*#/ || /^\s*$/); chomp; ($host, $filesys, $free, $inodefree) = split; if (!defined ($FREE{$host}{$filesys}{'bytes'} = toKB ($free))) { die "error free specification, config $f, line $.\n"; } if (!defined ($FREE{$host}{$filesys}{'inode'} = toIN ($inodefree))) { # allow this to be optional for compatibility # die "error inodefree specification, config $f, line $.\n"; } $FREE{$host}{$filesys}{'existsOnFiler'}= 0; } close (CF); } sub toKB { my ($free) = @_; my ($n, $u); if ($free =~ /^(\d+\.\d+)(kb|mb|gb)$/i) { ($n, $u) = ($1, "\L$2"); } elsif ($free =~ /^(\d+)(kb|mb|gb)$/i) { ($n, $u) = ($1, "\L$2"); } else { return undef; } return (int ($n * 1024)) if ($u eq "mb"); return (int ($n * 1024 * 1024)) if ($u eq "gb"); int ($n); } sub toIN { my ($infree) =@_; return undef unless defined($infree); if ($infree =~ /^(\d+(?:\.\d+)?)%$/) { # percentage (single digits such as "5%" are accepted) return $1 / 100; } if ($infree =~ /^(\d+(?:\.\d+)?)(k|kb)$/) { # kilos?
return $1 * 1024; } if ($infree =~ /^(\d+(?:\.\d+)?)$/) { # bare?? return $1; } return undef; } sub list { my (@hosts) = @_; foreach $host (@hosts) { if (!defined($s = new SNMP::Session (DestHost => $host, Timeout => $TIMEOUT, Community => $COMM, Retries => $RETRIES))) { print STDERR "could not create session to $host: " . $SNMP::Session::ErrorStr, "\n"; next; } $ver = $s->get(['sysDescr', 0]); $ver =~ s/^netapp.*release\s*([^:]+):.*$/$1/i; $v = new SNMP::VarList ( ['dfIndex'], ['dfFileSys'], ['dfKBytesTotal'], ['dfKBytesAvail'], ['dfInodesFree'], ['dfPerCentInodeCapacity'], ); while (defined $s->getnext($v)) { last if ($v->[$dfIndex]->tag !~ /dfIndex/); write; } } exit 0; } format STDOUT_TOP = filer ONTAP filesystem KB total KB avail Inode% ------------------------------------------------------------------------------ . format STDOUT = @<<<<<<<<<<<<<< @<<<<<<<<<< @<<<<<<<<<<<<<<< @>>>>>>>>>> @>>>>>>>>>> @>> $host, $ver, $v->[1]->[2], $v->[2]->[2], $v->[3]->[2], $v->[5]->[2] . mon-1.2.0/mon.d/seq.monitor0000755003616100016640000000343310230411543015422 0ustar trockijtrockij#!/bin/sh # # This is for testing mon during development. # # Call this script with $1 set to a directory, and # $2 set to some text tag, # usually the name of the service you're testing. # # Put a file in $1 called "$2.seq", which is a space # separated list of words. On consecutive calls to # this script, the next word will be interpreted. # # If the word is "0" then this script exits with success. # # Otherwise, echo the word and exit with a failure # # You probably want to "rm -f $1/$2.count" the first # time you run this. # # Jim Trocki, trockij@arctic.org # # $Id: seq.monitor,v 1.2 2005/04/17 07:42:27 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version.
# # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # path=$1 id=$2 if [ -f "$path/$id.count" ] then count=`cat $path/$id.count` else count=0 fi if [ ! -d "$path" ] then echo "$path" not found exit 1 fi seq=`cat $path/$id.seq` set -- $seq max=$# if [ "$count" = "$max" ] then count=0 else shift $count fi echo `expr $count + 1` > $path/$id.count if [ "$1" = 0 ] then echo "success" exit 0 else echo "failure:$1" exit 1 fi mon-1.2.0/mon.d/dns.monitor0000755003616100016640000004140010477271425015432 0ustar trockijtrockij#!/usr/bin/perl # # Copyright (C) 1998 David Eckelkamp # Copyright (C) 2002-2006 Carnegie Mellon University # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # # $Id: dns.monitor,v 1.3 2006/09/05 12:52:37 vitroth Exp $ # =head1 NAME dns.monitor - Monitor DNS servers for the "mon" system =head1 SYNOPSIS B<dns.monitor> =over 12 ( [ I<-zone zone [-zone zone ...]> =over 4 I<-master server [-master server ...]> I<[-serial_threshold num]> I<[-failsingle]> ] =back | [ I<-caching_only> =over 4 I<-query record[:type[:value]] [-query record[:type[:value]] ...]> ] ) =back I<[-tcp]> I<[-retry num]> I<[-retransmit num]> I<[-timeout num]> I<[-debug num]> I<server [server ...]> =back =head1 DESCRIPTION B<dns.monitor> will make several DNS queries to verify that a server is operating correctly. In normal mode, B<dns.monitor> will compare the zones between a master server and one or more slave servers. The I<zone> argument is the zone to check. There can be multiple I<zone> arguments. The I<master> argument is the master server for the I<zone>. There can be multiple I<master> arguments. The master server(s) will be queried for the base information. If the I<serial_threshold> argument is provided, the serials collected from the I<master> servers are checked to be within I<serial_threshold>. The greatest serial of all of the I<master> servers is chosen for comparison. Then each I<server> will be queried to verify that it has the correct answers. If the I<serial_threshold> argument is provided, the slave servers must return a zone whose serial number is no more than the threshold from the serial number of the zone on the master. (Zone serial numbers may not be identical during zone propagation, or on Dynamic DNS zones which may be updated hundreds or thousands of times an hour) It is assumed that each I<server> is supposed to be authoritative for the I<zone>. The I<-tcp> option will cause lookups to be done via TCP instead of the default UDP. In caching mode, specified via the I<-caching_only> switch, B<dns.monitor> will perform a set of DNS queries to one or more servers. The I<query> argument is the query to perform.
The query may have an optional query type specified as I<:type> on the end of the query. For example, your.zone.com:MX will cause B<dns.monitor> to fetch the MX records for your.zone.com. There can be multiple I<query> arguments. The query type may also have an optional result specified as I<:value> on the end of the query (type must also be specified). Each I<server> will be contacted to verify that it returns a valid response to the query. If a query result is specified B<dns.monitor> will return an error if the DNS query returns an answer which differs from the supplied result. If you wish to use B<dns.monitor> to verify that a caching DNS server is actually fetching fresh data from other servers successfully, it is recommended that the DNS records you query should have very short TTLs. The exit code of B<dns.monitor> will be the highest number of servers which failed on a single zone/query, 0 if no problems occurred, or -1 if an error with the script arguments was detected. If all of the I<master> servers fail, the return code will be 252. If using the I<failsingle> option and any I<master> server fails, the return code will be 251. =head1 AUTHOR The script was originally written by David Eckelkamp. The script was modified to support Caching DNS servers, configurable retry/timeout parameters, multiple DNS Master servers, and configurable Zone serials by David Nolan and Jason Carr from Carnegie Mellon University.
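=head1 EXAMPLES

The exit codes described above can be turned into a short message by a wrapper script. What follows is only a minimal sketch: the helper name C<describe_dns_status> is hypothetical and not part of mon, and the mapping follows the documentation above rather than every code path in the script, so treat it as an assumption. An argument error documented as -1 is seen by the shell as 255, because exit statuses are truncated to 8 bits.

```shell
#!/bin/sh
# Hypothetical helper (not part of mon): describe a dns.monitor exit status
# using the exit codes documented in the DESCRIPTION section.
describe_dns_status() {
    case "$1" in
        0)   echo "ok" ;;
        251) echo "a master server failed (-failsingle)" ;;
        252) echo "all master servers failed" ;;
        255) echo "bad arguments" ;;  # Perl's "exit -1" arrives as 255
        *)   echo "$1 server(s) failed" ;;
    esac
}

# Stand-alone demonstration with fixed status values.
describe_dns_status 0
describe_dns_status 3
```

In practice the function would be called as C<describe_dns_status $?> immediately after running dns.monitor with the I<-zone> and I<-master> options (or I<-caching_only> and I<-query>) described above.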
=cut use strict; use Getopt::Long; use English; use File::Basename; use Net::DNS::Resolver; use Net::DNS::Packet; use Net::DNS::RR; use Data::Dumper; my($Program) = basename($0); my(@Zones) = (); my(@Queries) = (); my(@Master) = (); my($SerialThreshold) = (0); my($CachingServer) = (0); my($UseTCP) = (0); my ($retries, $retrans, $timeout) = ( 2, 5, undef ); my $debug = 0; my $failsingle = 0; my(%OptVars) = ( "master" => \@Master, "zone" => \@Zones, "serial_threshold" => \$SerialThreshold, "caching_only" => \$CachingServer, "query" => \@Queries, "retry" => \$retries, "retransmit" => \$retrans, "timeout" => \$timeout, "tcp" => \$UseTCP, "debug" => \$debug, "failsingle" => \$failsingle ); if (!GetOptions(\%OptVars, "master=s@", "zone=s@", "serial_threshold=s", "caching_only", "tcp", "query=s@", "retry=i", "retransmit=i", "timeout=i", "debug", "failsingle")) { print STDERR "Problems with Options, sorry\n"; exit -1; } if ( $#ARGV < 0 ) { print STDERR "$Program: at least one server must be specified\n"; usage(); exit -1; } # Test arrays for emptiness directly; defined(@array) is no longer valid Perl. if (!$CachingServer) { if (!@Master) { print STDERR "$Program: The zone master server must be specified\n"; usage(); exit -1; } if ( !@Zones ) { print STDERR "$Program: At least one zone must be specified\n"; usage(); exit -1; } } else { if ( !@Queries ) { print STDERR "$Program: At least one query must be specified\n"; usage(); exit -1; } } if (!$CachingServer) { my($err_cnt) = 0; my($bad_servers, $reason, $failcount, @FailedZones, @FailedServers, @Reasons); my($zone, $line, $i); foreach $zone (@Zones) { ($bad_servers, $reason, $failcount) = dns_verify($zone, \@Master, \@ARGV); if (defined($bad_servers)) { $err_cnt = $failcount if ($failcount > $err_cnt); push(@FailedZones, $zone); push(@FailedServers, $bad_servers); push(@Reasons, $reason); } } @FailedServers=split(' ',join(" ",@FailedServers)); my (@UniqFailedServers, %saw); @saw{@FailedServers} = (); @UniqFailedServers = keys %saw; if ($err_cnt > 0) { print join(" ",
@UniqFailedServers); print "\n"; # Now print the detail lines for ($i=0; $i<=$#FailedZones; $i++) { print "Zone '$FailedZones[$i]': failed servers: $FailedServers[$i]\n"; print "Diagnostics:\n"; foreach $line (split("\n", $Reasons[$i])) { print " $line\n"; } print "\n"; } } exit $err_cnt; } else { my($err_cnt) = 0; my($bad_servers, $reason, $failcount, @FailedQuerys, @FailedServers, @Reasons); my($query, $type, $line, $i, $target); foreach (@Queries) { ($query, $type, $target) = split /:/; $type = 'A' if ($type eq ""); ($bad_servers, $reason, $failcount) = dns_test($query, $type, $target, @ARGV); if (defined($bad_servers)) { $err_cnt = $failcount if ($failcount > $err_cnt); push(@FailedQuerys, "$query $type") if (!$target); push(@FailedQuerys, "$query $type == $target $type") if ($target); push(@FailedServers, $bad_servers); push(@Reasons, $reason); } } @FailedServers=split(' ',join(" ",@FailedServers)); my (@UniqFailedServers, %saw); @saw{@FailedServers} = (); @UniqFailedServers = keys %saw; if ($err_cnt > 0) { print join(" ", @UniqFailedServers); print "\n"; # Now print the detail lines for ($i=0; $i<=$#FailedQuerys; $i++) { print "Query '$FailedQuerys[$i]': failed servers: $FailedServers[$i]\n"; print "Diagnostics:\n"; foreach $line (split("\n", $Reasons[$i])) { print " $line\n"; } print "\n"; } } exit $err_cnt; } # dns_verify($zone, \@master, \@Servers) # This subroutine takes 3 or more arguments. The first argument is the name of # the DNS zone/domain to check. The second argument is the name of the DNS # server you consider to be the master of the given zone. The subroutine # will make a DNS query to the master to get the SOA for the zone and # extract the serial number. The third and rest of the arguments are taken as # names of slave DNS servers. Each server will be queried for the SOA of the # given zone and the serial number will be checked against that found in the # SOA record on the master server. By default the zone serials must be # the same.
This may be overridden by the serial_threshold command line # argument. # The return value is a 3 element list. The first element is a space delimited # string containing the names of the slave servers that did not match the # master zone. The second element is a string containing the diagnostic # output that should explain the problem encountered. The third element is a count # of how many servers failed, which will be used as the exit code. sub dns_verify { # First verify that we have enough arguments. my($Zone) = shift; my(@Master) = @{shift()}; my(@Servers) = @{shift()}; my($result) = undef; my(@failed, $res, $soa_req, $Serial, $error_cnt, $server); my(%serials) = (); my(%errors) = (); # Query the $Master for the SOA of $Zone and get the serial number. $res = new Net::DNS::Resolver; $res->usevc(1) if ($UseTCP); $res->defnames(0); # don't append default zone $res->recurse(0); # no recursion $res->retry($retries); # retries before failure $res->retrans($retrans); # retransmission interval $res->udp_timeout($timeout); # set udp timeout $res->tcp_timeout($timeout); # set tcp timeout $error_cnt=0; # Loop through each master server foreach my $qs (@Master) { $res->nameservers($qs); $soa_req = $res->query($Zone, "SOA"); if (!defined($soa_req) || ($soa_req->header->ancount <= 0)) { $error_cnt++; $errors{$qs} = sprintf("SOA query for $Zone from $qs failed %s\n", $res->errorstring); if ($res->errorstring eq 'NOERROR') { $errors{$qs} .= sprintf(" Empty answer received. 
(No zone on server?)\n") } if ($failsingle) { return ($qs, $errors{$qs}, 251); } next; } unless ($soa_req->header->aa) { $error_cnt++; $errors{$qs} = sprintf("$qs is not authoritative for $Zone\n"); if ($failsingle) { return ($qs, $errors{$qs}, 251); } next; } unless ($soa_req->header->ancount == 1) { $error_cnt++; $errors{$qs} = sprintf("Too many answers for SOA query to %s for %s\n", $qs, $Zone); if ($failsingle) { return ($qs, $errors{$qs}, 251); } next; } unless (($soa_req->answer)[0]->type eq "SOA") { $error_cnt++; $errors{$qs} = sprintf("Query for SOA for %s from %s failed: " . "return type = %s\n", $Zone, $qs, ($soa_req->answer)[0]->type); if ($failsingle) { return ($qs, $errors{$qs}, 251); } next; } $serials{$qs} = ($soa_req->answer)[0]->serial; } if ($debug >= 2) { print Data::Dumper->Dump([\%serials], ['serials']); } if ($error_cnt == scalar @Master) { # all masters errored; join the messages so the caller still gets a 3 element list return("", join("", values %errors), 251); } my $maxvalue = undef; my $minvalue = undef; my $maxkey = undef; my $minkey = undef; foreach my $key (keys %serials) { if ($serials{$key} > $maxvalue) { $maxvalue = $serials{$key}; $maxkey = $key; } if (($serials{$key} < $minvalue) || (!defined $minkey)) { $minvalue = $serials{$key}; $minkey = $key; } } if (abs($maxvalue - $minvalue) > $SerialThreshold) { return ($minkey, "\nQuery to $minkey about $Zone failed\n" . "Serial number = $minvalue, should have been $maxvalue\n", 252) } $Serial = $maxvalue; return ("", "\nNo SOA Serial found for $Zone!?!?", 252) if (!$Serial); # Now, foreach server given on the command line, get the serial number from # the SOA and compare it to the master.
$error_cnt = 0; foreach $server (@Servers) { $res = new Net::DNS::Resolver; $res->usevc(1) if ($UseTCP); $res->defnames(0); # don't append default zone $res->recurse(0); # no recursion $res->retry($retries); $res->retrans($retrans); $res->udp_timeout($timeout); $res->tcp_timeout($timeout); $res->nameservers($server); $soa_req = $res->query($Zone, "SOA"); if (!defined($soa_req) || ($soa_req->header->ancount <= 0)) { $error_cnt++; push(@failed, $server); $result .= sprintf("\nSOA query for $Zone from $server failed %s\n", $res->errorstring); if ($res->errorstring eq 'NOERROR') { $result .= sprintf(" Empty answer received. (No zone on server?)\n"); } next; } unless($soa_req->header->aa && $soa_req->header->ancount == 1 && ($soa_req->answer)[0]->type eq "SOA" && ((abs(($soa_req->answer)[0]->serial - $Serial)) <= $SerialThreshold)) { $error_cnt++; push(@failed, $server); $result .= sprintf("\nQuery to $server about $Zone failed\n" . "Authoritative = %s\n" . "Answer count = %d\n" . "Answer Type = %s\n" . "Serial number = %s, should have been %s\n" , $soa_req->header->aa ? "yes" : "no", $soa_req->header->ancount, ($soa_req->answer)[0]->type, ($soa_req->answer)[0]->serial, $Serial); next; } } if ($error_cnt == 0) { return(undef, undef, undef); } else { return("@failed", $result, $error_cnt); } } # dns_test($query, $type, $target, $server, ...) # This subroutine takes 4 or more arguments. The first argument is the name of # the DNS record to query. The second argument is the type of the DNS # query to perform. The third argument is the name of a second DNS record to query, # whose results should match the first query. The fourth and rest of the arguments are # taken as names of caching DNS servers. Each server will be queried for the # given record and type # The return value is a 3 element list. The first element is a space delimited # string containing the names of the servers that failed to respond to the # query. 
The second element is a string containing the diagnostic # output that should explain the problem encountered. The third element is the # count of how many servers failed, which will be used as the exit code. sub dns_test { # First verify that we have enough arguments. my($Query, $type, $target, @Servers) = @_; my($result) = undef; my(@failed, $res, $req, $treq, $Serial, $error_cnt, $server); # Now, foreach server given on the command line, # make the query $error_cnt = 0; foreach $server (@Servers) { $res = new Net::DNS::Resolver; $res->defnames(0); # don't append default zone $res->retry($retries); # 2 retries before failure $res->retrans($retrans); $res->udp_timeout($timeout); $res->tcp_timeout($timeout); $res->nameservers($server); $req = $res->query($Query, $type); if (!defined($req) || ($req->header->ancount <= 0)) { $error_cnt++; push(@failed, $server); $result .= sprintf("\n$type query for $Query from $server failed %s\n", $res->errorstring); next; } elsif ($target) { $treq = $res->query($target, $type); my $status = 0; foreach my $qans ($req->answer) { print STDERR $qans->string."\n" if ($debug); print STDERR $qans->rdatastr."\n" if ($debug); foreach my $tans ($treq->answer) { print STDERR "target\n" if ($debug); print STDERR $tans->string."\n" if ($debug); print STDERR $tans->rdatastr."\n" if ($debug); if ($tans->rdatastr eq $qans->rdatastr) { print STDERR "match found\n" if ($debug); $status = 1; last; } } last if ($status); } if (!$status) { $error_cnt++; push @failed, $server; $result .= "Query $Query:$type failed to match $target\n"; } } } if ($error_cnt == 0) { return(undef, undef, undef); } else { return("@failed", $result, $error_cnt); } } sub usage { print STDERR < - ntp monitor using ntpdate to do most of the work =head1 DESCRIPTION A mon monitor to verify that ntp is running on multiple servers, those servers have synchronized time, and that the times are within specified limits. 
The mon server should be running ntp since the times are reported relative to the system performing the query.

=head1 SYNOPSIS

B

=head1 OPTIONS

=over 5

=item B<--maxstratum>

Maximum stratum number, default is 10. Stratum 16 indicates that ntp is running on a system, but the clock is not synchronized. An alarm will be triggered if this value is exceeded.

=item B<--maxoffset>

Maximum clock offset, specified in seconds. The default is 0.800 (800 ms), a large value, since ntp typically keeps clocks within milliseconds of each other. An alarm will be triggered if this value is exceeded.

=item B<-l log_file_template> or B<--log log_file_template>

/path/to/logs/internet_web_YYYYMM.log

The current year and month are substituted for YYYYMM; that is the only possible template at this time. The format of the log file is:

 time server stratum offset delay

where time is in UNIX seconds, and offset and delay are in seconds. Note that the offset and delay times in the mon detail and the optional HTML page are in milliseconds.

=item B<-shortalerts>

Use only the hostname in the alert list. For organizations with long FQDNs this will make mail and pager alerts more readable.

=item B<--htmlfile /full/path/to/file.html>

Optional location to write the formatted results from the current test. Be sure that the directory is writable by the user under whom mon is running.

=item B<-d> or B<--debug>

Debug/Test/Verbose, for manual testing only.

=item B<--ntpdate>

Specify the location of ntpdate. The default is /usr/sbin/ntpdate.

=back

=head1 MON CONFIGURATION EXAMPLE

 hostgroup ntp ntp1.somedomain.org ntp2.somedomain.org ntp3.somedomain.org

 watch ntp
     service ntpdate
         interval 30m
         monitor ntpdate.monitor --maxoffset 0.100 --log /usr/local/mon/logs/gv-ntp-YYYYMM.log
         period wd {Sun-Sat}
             alert mail.alert user@somedomain.org
             alertevery 1h summary

=head1 BUGS

Listing a server twice can cause ntpdate to report that server as Stratum 0. This can happen even if an alias name is used.
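
One way to guard against the duplicate-server problem is to check the resolved addresses before they are handed to ntpdate. The following is only a sketch: C<duplicate_servers> is a hypothetical helper that is not part of this monitor, and the names and addresses in the example data are made up.

```perl
use strict;
use warnings;

# Hypothetical pre-check, not part of ntpdate.monitor.
# Takes a hash of host name => IP address and returns a hash of
# IP address => [names] for every address that is listed more than once.
sub duplicate_servers {
    my (%address_of) = @_;
    my %names_for;
    foreach my $name (sort keys %address_of) {
        push @{ $names_for{ $address_of{$name} } }, $name;
    }
    my %dup;
    foreach my $ip (keys %names_for) {
        $dup{$ip} = $names_for{$ip} if @{ $names_for{$ip} } > 1;
    }
    return %dup;
}

# Example data; the names and addresses are made up for illustration.
my %dup = duplicate_servers(
    'ntp1.somedomain.org' => '192.0.2.10',
    'time.somedomain.org' => '192.0.2.10',   # alias of ntp1
    'ntp2.somedomain.org' => '192.0.2.11',
);
foreach my $ip (sort keys %dup) {
    print "$ip is listed more than once: @{ $dup{$ip} }\n";
}
```

Any address reported here could be dropped from the host list, or flagged in the alert detail, before ntpdate is run.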
The shortalerts option only reports the hostname; it could be extended to provide a configurable number of FQDN fields.

ntpdate will be removed from the NTP distribution at some point. This monitor will need to be modified to use some form of ntpd -q instead.

Check the first line of this file to be sure that it points to an appropriate perl executable.

=head1 AUTHOR

Jon Meek, meekj at ieee.org

=head1 SEE ALSO

ntp.monitor by Daniel Hagerty

=cut

$RCSid = q{$Id: ntpdate.monitor,v 1.3.2.1 2007/06/03 13:14:27 trockij Exp $ };
#
# Jon Meek
# Lawrenceville, NJ
# meekj at ieee.org
#
#
# $Id: ntpdate.monitor,v 1.3.2.1 2007/06/03 13:14:27 trockij Exp $
#
# Copyright (C) 2002-2006, Jon Meek
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#

use Getopt::Long;

GetOptions(
           "maxstratum=i" => \$MaxStratum,
           "maxoffset=f" => \$MaxOffset,
#          "dns" => \$UseDNS,
           "d|debug" => \$Debug,
           "l=s" => \$LogFile,
           "log=s" => \$LogFile,
           "htmlfile=s" => \$HtmlFile,
           "shortalerts" => \$ShortAlerts,
           "ntpdate=s" => \$NTPDATE,
           );

use Net::DNS;
use Sys::Hostname;
use POSIX qw(strftime);

#
# Set Defaults
#
# ntpdate reports stratum 16 if ntp is running, but time is not synchronized
# stratum 0 will be reported if ntp is not running
#
$MaxStratum = 10 unless $MaxStratum;
$MinStratum = 1;

# Use the first occurrence of this stratum as the reference time for alarms
$ReferenceStratum = 1 unless $ReferenceStratum;

#
# Trigger alarm if the time is ever off by this much
#
$MaxOffset = 0.800 unless $MaxOffset; # seconds

$NTPDATE = '/usr/sbin/ntpdate' unless $NTPDATE;

$HtmlFileHandle = &HTMLheader($HtmlFile) if $HtmlFile;

@Failures = ();
@Hosts = @ARGV; # Host names are left on the command line after Getopt

%NameByIP = &DNSlookups(\@Hosts);

$TimeOfDay = time; # Current time
print "TimeOfDay: $TimeOfDay\n" if $Debug;

$cmd = qq{$NTPDATE -q @Hosts |};
$pid = open(NTP, $cmd) || die "Couldn't run $cmd: $!\n";

while ($in = <NTP>) {
#    print $in if $Debug;
    chomp $in;
#
# Pick out server strings
#
    if ($in =~ /^server\s+([\d\.]+),\s+stratum\s+(\d+),\s+offset\s+([\d\.\-\+]+),\s+delay\s+([\d\.\-\+]+)/) {
        $ip = $1;
        $stratum = $2;
        $offset = $3;
        $delay = $4;
        $name = $NameByIP{$ip};
        print "$in Name: $name Stratum: $stratum\n" if $Debug;

        if (exists $NameByIP{$ip}) { # Use system name if we have it
            $HostName = $NameByIP{$ip};
        } else {
            $HostName = $ip; # Otherwise use IP address
        }

        $IP{$HostName} = $ip;
        $Stratum{$HostName} = $stratum;
        $Offset{$HostName} = $offset;
        $Delay{$HostName} = $delay;
        $Detail{$HostName} = $in;

        if ((!defined $ReferenceOffset) && ($stratum
== 1)) { # Save offset from first stratum 1 server seen $ReferenceOffset = $offset; } # # Prepare log entries # if ($LogFile or $Debug) { $LogString{$HostName} = qq{$TimeOfDay $HostName $stratum $offset $delay}; } } } #1234567890123456789012345678901234567890123456789012345678901234567890 #fwmon-gv.gv.us.pri.wyeth.com 0.276 2 0.413 61.050 # # Build formatted results and check alarm limits # $FmtDetail = qq{NTP Server times in milliseconds Delta Stratum Rel Delay\n}; &HTMLtableHeader($HtmlFileHandle, 'NTP Server', 'Delta, ms', 'Stratum', 'Rel, ms', 'Delay, ms', 'Status') if $HtmlFile; foreach $hostname (sort keys %Stratum) { $DeltaTime = $Offset{$hostname} - $ReferenceOffset; $DeltaTimeByHost{$hostname} = $DeltaTime; $msDeltaTime = 1000 * $DeltaTime; $msOffset = 1000 * $Offset{$hostname}; $msDelay = 1000 * $Delay{$hostname}; $FmtDetail .= sprintf ("%-35s %9.3f %3d %9.3f %9.3f", $hostname, $msDeltaTime, $Stratum{$hostname}, $msOffset, $msDelay); $fail_string = ' '; if (($Stratum{$hostname} > $MaxStratum) || ($Stratum{$hostname} < $MinStratum) || (abs($DeltaTime) > $MaxOffset)) { $ip = $IP{$hostname}; $FailureDetail{$hostname} = $Detail{$hostname}; push(@Failures, $hostname); $FmtDetail .= q{ Fail}; $fail_string = 'Fail'; } $FmtDetail .= "\n"; if ($HtmlFile) { $fDeltaTime = sprintf("%12.3f", $msDeltaTime); $fOffset = sprintf("%12.3f", $msOffset); $fDelay = sprintf("%12.3f", $msDelay); &HTMLtableRow($HtmlFileHandle, $hostname, $fDeltaTime, $Stratum{$hostname}, $fOffset, $fDelay, $fail_string); } } print "\n$FmtDetail\n" if $Debug; # # Write results to logfile, if -l # if ($LogFile) { $LogFile = $LogFile; ($sec, $min, $hour, $mday, $Month, $Year, $wday, $yday, $isdst) = localtime($TimeOfDay); $Month++; $Year += 1900; $YYYYMM = sprintf('%04d%02d', $Year, $Month); $LogFile =~ s/YYYYMM/$YYYYMM/; # Fill in current year and month if (-e $LogFile) { # Check for existing log file $NewLogFile = 0; } else { $NewLogFile = 1; } open(LOG, ">>$LogFile") || warn "$0 Can't open 
logfile: $LogFile\n"; foreach $ip (sort keys %LogString) { print LOG "$LogString{$ip}\n"; } close LOG; } if ($Debug) { foreach $ip (sort keys %LogString) { print "LOG: $LogString{$ip}\n"; } } &HTMLtrailer($HtmlFileHandle) if $HtmlFile; if (@Failures == 0) { # Indicate "all OK" to mon exit 0; } # # Otherwise we have one or more failures # if ($ShortAlerts) { foreach $host (sort @Failures) { if ($host =~ /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/) { # IP address, don't shorten push(@SortedFailures, $host); } else { $host =~ /(.*?)\./; push(@SortedFailures, $1); } } } else { @SortedFailures = sort @Failures; } print "------- Have Failures -------\n" if $Debug; print "@SortedFailures\n"; print "------- Details -------\n" if $Debug; print $FmtDetail; #foreach $hostname (sort keys %FailureDetail) { # print "$NameByIP{$hostname} $hostname $FailureDetail{$hostname} $DeltaTimeByHost{$hostname} s\n"; #} exit 1; # Indicate failure to mon # # Get the IP addresses for the hosts (because ntpdate returns IP addresses) # sub DNSlookups { my ($Hosts) = @_; $res = new Net::DNS::Resolver; for (my $i = 0; $i < @$Hosts; $i++) { $target = $Hosts->[$i]; $query = $res->search($target); if ($query) { foreach $rr ($query->answer) { #print "$target Type: ", $rr->type, "\n" if $Debug; if ($rr->type eq "A") { print $rr->address . ' ' if $Debug; $NameByIP{$rr->address} = $target; } } } } return %NameByIP; } sub HTMLheader { # # Print basic standard header for this application # my($FileName) = @_; local *F; open(F, ">$FileName") || warn "$$ can't open $FileName, check permissions"; $Title = "NTP Server Status"; $MonitorHostname = hostname; $FmtTimeNow = strftime("%A %d-%b-%Y %H:%M:%S %Z", localtime(time)); print F <<"EndOfHeader"; $Title

$Title from $MonitorHostname

$FmtTimeNow

EndOfHeader return *F; } sub HTMLtableHeader { my($FileHandle, @Headers) = @_; print $FileHandle "\n"; foreach $h (@Headers) { print $FileHandle "\n"; } print $FileHandle "\n"; } sub HTMLtableRow { my ($FileHandle, @Fields) = @_; my ($align, $f); $align = ''; print $FileHandle "\n"; foreach $f (@Fields) { print $FileHandle "$f\n"; $align = ' align=right'; } print $FileHandle "\n"; } sub HTMLtrailer { # # Print basic standard trailer for this application # my($FileHandle) = @_; print $FileHandle "
$h
\n\n\n"; close $FileHandle; } mon-1.2.0/mon.d/lpd.monitor0000755003616100016640000001635510061516615015430 0ustar trockijtrockij#!/usr/bin/perl # # Try to connect to an lpd server and get status of print queues. # For use with "mon". # # lpd.monitor [-l] [-d] [-s secs] [-p port] [-t secs] -h host queue [queue...] # # -l interpret queue output as lprng (error, status, etc.) # -d do not show detail # -e report queues with "error" jobs # -s secs report jobs stalled longer than "secs" as an error # -h host host running lpd # -p port TCP port to connect to (defaults to 515) # -t secs timeout, defaults to 30 # # "get" routine based on other monitors written by Jon Meek # # $Id: lpd.monitor,v 1.1.1.1 2004/06/09 05:18:05 trockij Exp $ # # Copyright (C) 2001, 2002, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use Getopt::Std; use English; sub lpdGET; sub check_lprng; sub OpenSocket; getopts ("deh:s:lp:t:P:"); $PORT = $opt_p || 515; $TIMEOUT = $opt_t || 30; $STALLED = $opt_s || 60; # stalled 1 mins $HOST = $opt_h || die "no host supplied with -h\n"; my %good; my %bad; my %details; exit 0 if (!@ARGV); foreach my $queue (@ARGV) { my $result = lpdGET ($HOST, $PORT, $queue); if (!$result->{"ok"}) { $bad{$queue} = $result; } # # look in lprng output for bad things # elsif ($opt_l) { my $err = check_lprng ($result->{"header"}); $details{$queue} = $err->{"fields"}; if (!$err->{"ok"}) { $bad{$queue} = $result; $bad{$queue}->{"error"} = $err->{"error"}; } else { $good{$queue} = $result; } } else { $good{$queue} = $result; } } my $ret; if (keys %bad) { $ret = 1; print join (" ", sort keys %bad), "\n"; } else { $ret = 0; print "\n"; } # # show detail # if (!$opt_d) { # # failure detail # foreach my $q (keys %bad) { print "------------------------------------------------------------------------------\n"; print "HOST $HOST QUEUE $q: $bad{$q}->{error}\n"; print $details{$q}->{"Printer"}, "\n"; print "------------------------------------------------------------------------------\n"; if ($opt_l) { # this will probably never be true if ($details{$q}->{"queuelist"} eq "") { print "queue empty\n"; } else { print $details{$q}->{"queuelist"}; } } elsif ($bad{$q}->{"header"} ne "") { print $bad{$q}->{"header"}, "\n"; } print "\n"; } if (keys %good) { print <<'EOF'; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%% %%%%%%% %%%%%%% the following are queues which have no problems at this moment %%%%%%% %%%%%%% %%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% EOF } # # non-failure detail # foreach my $q (keys %good) { 
        print "------------------------------------------------------------------------------\n";
        print "HOST $HOST QUEUE $q: ok\n";
        print $details{$q}->{"Printer"}, "\n";
        print "------------------------------------------------------------------------------\n";
        if ($opt_l) {
            if ($details{$q}->{"queuelist"} eq "") {
                print "queue empty\n";
            } else {
                print "$details{$q}->{queuelist}\n";
            }
        } else {
            print $good{$q}->{"header"}, "\n";
        }
        print "\n";
    }
}

exit $ret;


sub lpdGET {
    use Socket;
    use Sys::Hostname;

    my($Server, $Port, $Queue) = @_;
    my($ServerOK, $TheContent);

    $TheContent = '';

    my $result;

    eval {
        local $SIG{ALRM} = sub { die "Timeout Alarm" };
        alarm $TIMEOUT;

        my $err = &OpenSocket($Server, $Port); # Open a connection to the server

        if ($err ne "") { # Failure to open the socket
            alarm 0; # Cancel the alarm before leaving the eval
            $result = {
                "ok" => 0,
                "error" => $err,
                "header" => undef,
            };
            return undef;
        }

        #
        # lpd queue list
        # "short" listing command is 0x03
        # "long" listing command is 0x04
        #
        print S "\x04$Queue\x0a";

        $/ = "\x0a";

        while (defined ($in = <S>)) {
            $TheContent .= $in; # Store data for later processing
        }

        close(S);
        alarm 0; # Cancel the alarm

        $ServerOK = 1;
    };

    if ($EVAL_ERROR and ($EVAL_ERROR =~ /^Timeout Alarm/)) {
        return {
            "ok" => 0,
            "error" => "timeout after $TIMEOUT seconds",
            "header" => $TheContent,
        };
    }

    # $result is only set when the socket could not be opened
    if (defined $result) {
        return $result;
    }

    return {
        "ok" => $ServerOK,
        "header" => $TheContent,
        "error" => undef,
    };
}


#
# look for badness in lprng output
#   error
#   stalled longer than the -s secs threshold
#
sub check_lprng {
    my ($buff) = @_;

    my $in_rank = 0;
    my $fail = 0;

    my $status = {
        "ok" => 1,
        "fields" => {},
        "error" => "",
    };

    foreach my $l (split (/\x0d?\x0a/sm, $buff)) {
        #
        # sort data
        #
        if ($l =~ /^\s+([^:]+):\s+(.*)$/) {
            $status->{"fields"}->{$1} .= $2;
        } elsif ($l =~ /^Printer:\s+(.*)/) {
            $status->{"fields"}->{"Printer"} = $1;
        }

        if ($l =~ /^\s+Rank.*Owner/) {
            $in_rank++;
            $status->{"fields"}->{"queuelist"} = "";
        }

        if ($in_rank) {
            $status->{"fields"}->{"queuelist"} .= "$l\n";
        }

        #
        # check for errors
        #
        if ($in_rank &&
$opt_e && $l =~ /^error/ && !$fail) { $status->{"ok"} = 0; $status->{"error"} = "job error"; $fail = 1; } elsif ($in_rank && $l =~ /^stalled\((\d+)sec/ && $1 > $STALLED && !$fail) { $status->{"ok"} = 0; $status->{"error"} = "job stalled $1 seconds"; $fail = 1; } elsif ($in_rank && $l =~ /^active\(attempt-(\d+)/ && !$fail) { $status->{"ok"} = 0; $status->{"error"} = "multiple attempts, currently $1"; $fail = 1; } } return $status; } # # Make a Berkeley socket connection between this program and a TCP port # on another (or this) host. Port can be a number or a named service # # returns "" on success, or an error string on failure # sub OpenSocket { my ($host, $port) = @_; my $proto = (getprotobyname('tcp'))[2]; return ("could not get protocol") if (!defined $proto); my $conn_port; if ($port =~ /^\d+$/) { $conn_port = $port; } else { $conn_port = (getservbyname($port, 'tcp'))[2]; return ("could not getservbyname for $port") if (!defined $conn_port); } my $host_addr = (gethostbyname($host))[4]; return ("gethostbyname failure") if (!defined $host_addr); my $that = sockaddr_in ($conn_port, $host_addr); if (!socket (S, &PF_INET, &SOCK_STREAM, $proto)) { return ("socket: $!"); } if (!connect (S, $that)) { return ("connect: $!"); } select(S); $| = 1; select(STDOUT); ""; } mon-1.2.0/mon.d/local-syslog.monitor0000755003616100016640000001615210630542607017256 0ustar trockijtrockij#!/usr/bin/perl # # A syslog monitor for mon, with support for Cisco router messages # # $Id: local-syslog.monitor,v 1.1.2.1 2007/06/03 13:43:35 trockij Exp $ # # Copyright (C) 2001-2006, Jon Meek, meekj at ieee.org # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. 
# # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # =head1 NAME B - syslog monitor for mon =head1 DESCRIPTION Watch local syslog log files and alert when a regular expression is matched. Include router interface descriptions for logs generate by a Cisco router. =head1 SYNOPSIS When the regular expression is matched, an alert is triggered. The alert details contain the actual log entries that matched. An optional additional description can be added to the details for an IP address / Interface Name pair if the log entry comes from a Cisco router. A common usage involves having other systems send their syslog information to this mon server so they can be watched by this monitor. =head1 OPTIONS =over 5 =item B<-c> Configuration file =item B<-d> Debug/Test. Useful for manual command line use. =back =head1 CONFIGURATION FILE =item B Regular expression that will trigger an alarm when there is a match. =item B The top of the directory tree where the syslog files are kept. =item B Syslog files are usually specified on the command line, but can be listed in the config file as well. =item B IP Address / Interface Name pairs can be expanded in the details section of an alert. These lines consist of an IP address followed by whitespace, then the interface name followed by whitespace, then a description of the interface. This is added to the alert if the log entry is found to match the output generated from a Cisco router. 
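
The interface-description lookup described above can be sketched as follows. This is an illustration only: C<add_interface_line> and C<annotate> are hypothetical helper names (the monitor does the same work inline), and the sample log entry is made up.

```perl
use strict;
use warnings;

# Hypothetical helpers; local-syslog.monitor performs these steps inline.
my %description;   # "router_ip interface_name" => description

# Parse one "IP interface description" line from the configuration file.
sub add_interface_line {
    my ($line) = @_;
    my ($ip, $interface, $text) = split(' ', $line, 3);
    $description{"$ip $interface"} = $text;
}

# Append the description in brackets when the log entry names a known
# router address and interface; otherwise return the entry unchanged.
sub annotate {
    my ($entry) = @_;
    return $entry
        unless $entry =~ /^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/;
    my $ip = $1;
    return $entry unless $entry =~ /\s+Interface\s+([\w\/\.]+)/;
    my $key = "$ip $1";
    return exists $description{$key} ? "$entry [$description{$key}]" : $entry;
}

add_interface_line('192.168.56.1 Serial0/0 Frame-Relay DHEC266467 DLCI 612');

# Made-up log entry in the general shape of a Cisco syslog message.
print annotate('Jun  3 13:43:35 192.168.56.1 %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0/0, changed state to down'), "\n";
```

The lookup key is the router address and interface name joined by a single space, so both must appear in the log entry exactly as they are written in the configuration file.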
Example:

 Strings %BGP-5|%LINEPROTO-5-UPDOWN|%LINK-3-UPDOWN
 BaseDir /d2/logs/remotesyslog
 #File exnetmon-gv/cisco.log
 #File ot-gv/cisco.log
 192.168.56.1 FastEthernet0/0 To Extranet Switch
 192.168.56.1 Serial0/0 Frame-Relay DHEC266467 DLCI 612
 192.168.56.1 FastEthernet0/1 Inside Interface
 192.168.56.1 Serial0/1 T-1 to ICO DHEC397828

=head1 MON CONFIGURATION EXAMPLE

 hostgroup remote-syslog exnetmon-wl/cisco.log ot-wl/cisco.log

 watch remote-syslog
     service syslog
         interval 15m
         monitor local-syslog.monitor -c /usr/local/mon/syslog.cfg
         period wd {Sun-Sat}
             alert mail.alert meekj-at-ieee.org

=head1 AUTHOR

Jon Meek, meekj at ieee.org

=cut

use Getopt::Std;
#use File::Basename;
#use Time::HiRes qw( gettimeofday tv_interval );

$BaseDir = '';

@Failures = (); # Initialize failure list
$TimeOfDay = time; # Current time

getopts ("dc:");

if (defined $ENV{MON_STATEDIR}) { # Are we running under mon?
    $STATE_DIR = $ENV{MON_STATEDIR};
    $RunningUnderMon = 1;
} else {
    $RunningUnderMon = 0;
}

if ($opt_c) { # Read configuration file
    $ConfigFile = $opt_c;
    if (open(C, $ConfigFile)) {
        while ($in = <C>) {
            last if ($in =~ /^Exit/i);
            next if ($in =~ /^\#/); # Comments
            chomp $in;

            if ($in =~ /^BaseDir/i) {
                ($tag, $BaseDir) = split(' ', $in, 2);
                next;
            }

            if ($in =~ /^File/i) {
                ($tag, $file) = split(' ', $in, 2);
                push(@Files, $file);
                next;
            }

            if ($in =~ /^Strings/i) {
                ($tag, $data) = split(' ', $in, 2);
                $RegEx .= $data;
                next;
            }

            if ($in =~ /^[0-9\.]+/) {
                ($router_ip, $interface_name, $description) = split(' ', $in, 3);
#               print "$router_ip - $interface_name - $description\n" if $opt_d;
                $RouterInterfaceDescription{"$router_ip $interface_name"} = $description;
                next;
            }

            if ($in =~ /^RouterList/i) {
                ($tag, $RouterListFile) = split(' ', $in, 2);
                next;
            }

            if ($in =~ /^StateDir/i) { # If the mon environment variable needs to be overridden
                ($tag, $STATE_DIR) = split(' ', $in, 2);
                next;
            }
        }
    } else {
        print "local-syslog.monitor: Couldn't open $ConfigFile configuration file\n";
        exit 1;
    }
}

push (@Files, @ARGV); # Get file
# names from the command line

foreach $File (@Files) {

    $FullPath = "$BaseDir/$File";

    if (!-e $FullPath) { # File does not exist
        print "**** $FullPath does not exist\n" if $opt_d;
        next;
    }

    $StateFile = $FullPath;
    $StateFile =~ s/\//-/g; # Change / to - to make filename
    $StateFile =~ s/^-//;
    $StateFile = "$STATE_DIR/$StateFile";

#
# Read the previous file sizes if the State File exists
#
    if (-e $StateFile) {
        print "Existing $StateFile\n" if $opt_d;
        open(F, $StateFile);
        $in = <F>;
        ($t, $last_size) = split(' ', $in);
        close F;
        $StateFileExists = 1; # Remember that there is an existing State File
    } else {
        $StateFileExists = 0; # or not
        $last_size = 0;
        print "No Existing $StateFile\n" if $opt_d;
    }

#
# Get file information
#
    ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size, $atime,
     $mtime, $ctime, $blksize, $blocks) = stat($FullPath);

#    $basename = basename($SmartAlarm); # Get the path to the module
#    $dirname = dirname($SmartAlarm);

    print "$StateFile Last size: $last_size Size - : $size\n" if $opt_d;

    next if ($size == $last_size); # File has not changed since last check (may want to check time too)

    $FailureDetail{$File} = '';

    open(F, $FullPath);

    if ($size > $last_size) { # Position to read new lines only
        seek(F, $last_size, 0);
    }

    while ($in = <F>) {
        chomp $in;
        if ($in =~ /$RegEx/o) { # Check the master regular expression
#
# see if we have a description on the router/interface (from cisco)
#
            if ($in =~ /^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/) {
                $router_ip = $2;
                $in =~ /\s+Interface\s+([\w\/\.]+)/;
                $interface_name = $1;
#               print "*** $router_ip $interface_name\n";
                if (exists $RouterInterfaceDescription{"$router_ip $interface_name"}) {
                    $in .= ' [' . $RouterInterfaceDescription{"$router_ip $interface_name"} .
']'; } } $HaveFailureData{$File}++; $FailureDetail{$File} .= "$in\n"; print "$in\n" if $opt_d; } } $CurrentPosition = tell(F); print "Final Position: $CurrentPosition\n" if $opt_d; close F; # Write new state file print "Writing a new $StateFile\n" if $opt_d; open(F, ">$StateFile"); print F "$TimeOfDay $CurrentPosition\n"; close F; } foreach $k (sort keys %HaveFailureData) { push(@Failures, $k); } if (@Failures == 0) { # Indicate "all OK" to mon exit 0; } # # Otherwise we have one or more failures # print "@Failures\n"; foreach $k (@Failures) { print "$k $FailureDetail{$k}\n"; } print "\n"; exit 1; # Indicate failure to mon mon-1.2.0/mon.d/pop3.monitor0000755003616100016640000000657110230411543015521 0ustar trockijtrockij#!/usr/bin/perl # # Use try to connect to a POP-3 server, and # wait for the right output. # # For use with "mon". # # Arguments are "-p port -t timeout host [host...]" # # Adapted from "http.monitor" by # Jim Trocki, trockij@arctic.org # # http.monitor written by # # Jon Meek # American Cyanamid Company # Princeton, NJ # # $Id: pop3.monitor,v 1.2 2005/04/17 07:42:27 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
use Getopt::Std;
use English;

getopts ("p:t:");
$PORT = $opt_p || 110;
$TIMEOUT = $opt_t || 30;

@failures = ();
@details = ();

foreach $host (@ARGV) {
    if (! &pop3GET($host, $PORT)) {
        push (@failures, $host);
    }
}

if (@failures == 0) {
    exit 0;
}

print join (" ", sort @failures), "\n";
print @details if (scalar @details > 0);

exit 1;


sub pop3GET {
    use Socket;
    use Sys::Hostname;

    my($Server, $Port) = @_;
    my($ServerOK, $TheContent);

    $ServerOK = 0;

    $TheContent = '';

    $Path = '/';

###############################################################
    eval {

        local $SIG{ALRM} = sub { die "Timeout Alarm" };
        alarm $TIMEOUT;
        $result = &OpenSocket($Server, $Port); # Open a connection to the server
        if ($result == 0) { # Failure to open the socket
            alarm 0; # Cancel the alarm before leaving the eval
            return '';
        }

        $in = <S>;
        if ($in !~ /^\+OK/) {
            alarm 0;
            return 0;
        }

        print S "quit\r\n";

        $in = <S>;
        if ($in !~ /^\+OK/) {
            alarm 0;
            return 0;
        }

        $ServerOK = 1;

        close(S);
        alarm 0; # Cancel the alarm

    };

    if ($EVAL_ERROR and ($EVAL_ERROR =~ /^Timeout Alarm/)) {
        push(@details, "$host: timeout($TIMEOUT)\n");
        return 0;
    }
    return $ServerOK;

}


sub OpenSocket {
#
# Make a Berkeley socket connection between this program and a TCP port
# on another (or this) host.
Port can be a number or a named service # local($OtherHostname, $Port) = @_; local($OurHostname, $sockaddr, $name, $aliases, $proto, $type, $len, $ThisAddr, $that); $OurHostname = &hostname; ($name, $aliases, $proto) = getprotobyname('tcp'); ($name, $aliases, $Port) = getservbyname($Port, 'tcp') unless $Port =~ /^\d+$/; ($name, $aliases, $type, $len, $ThisAddr) = gethostbyname($OurHostname); ($name, $aliases, $type, $len, $OtherHostAddr) = gethostbyname($OtherHostname); if (!defined $OtherHostAddr) { push (@details, "$host: cannot resolve hostname\n"); return undef; } my $that = sockaddr_in ($Port, $OtherHostAddr); if (! ($result = socket(S, &PF_INET, &SOCK_STREAM, $proto)) || (! ($result = connect(S, $that))) ) { push (@details, "$host: $!\n"); return undef; } select(S); $| = 1; select(STDOUT); # set S to be un-buffered return 1; # success } mon-1.2.0/mon.d/traceroute.monitor0000755003616100016640000004300310630536553017021 0ustar trockijtrockij#!/usr/bin/perl # # mon monitor to watch for route changes # # There is currently a hardcoded path to the traceroute binary, see $TRACEROUTE # but it can be overriden in the config file. # # Jon Meek - 31-May-1999 (original code) # # # Jon Meek # Lawrenceville, NJ # meekj at ieee.org # # $Id: traceroute.monitor,v 1.2.2.1 2007/06/03 13:08:59 trockij Exp $ # # Copyright (C) 2001-2005, Jon Meek # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # =head1 NAME B - Route monitor for mon. =head1 DESCRIPTION Monitor routes from monitor machine to a remote system using traceroute. Alarm and log when changes are detected. =head1 SYNOPSIS B The logfile template is usually specified in the configuration file. =head1 OPTIONS =over 5 =item B<-d> Debug/Test =item B<-c config.cfg> Configuration file for this monitor, see example below =item B<-t timeout> Timeout for traceroute to run in seconds default is 20s =item B<-l log_file_template> /path/to/logs/internet_web_YYYYMM.log Current year & month are substituted for YYYYMM, that is the only possible template at this time. =item B<-u> Fail On Unexpected Hop Only - An alert will be triggered if any of the IP addresses specified using the UnexpectedHop item in the configuration file are observed. Any other route change will not create an alert. This option is useful for alerting when a backup circuit becomes part of a default route. This option causes traceroute.monitor to act more like a traditional mon monitor because an alert will be triggered on every monitor run if an UnexpectedHop appears in the path. The use of mon's alertafter N directive can be used to filter out short term glitches that might cause a route to change for just one monitor cycle. 
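
The test that the -u option relies on can be sketched as a simple set lookup. This is an illustration under stated assumptions: C<unexpected_hops> is a hypothetical helper, not a routine from this monitor, and the addresses are example values.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from traceroute.monitor itself.
# Returns the hops in @$path that appear in the UnexpectedHop list,
# in the order in which they occur along the path.
sub unexpected_hops {
    my ($path, $unexpected) = @_;
    my %is_unexpected;
    $is_unexpected{$_} = 1 foreach @$unexpected;
    return grep { $is_unexpected{$_} } @$path;
}

# Example values only.
my @path = qw(10.22.4.254 10.22.249.2 172.30.124.17);
my @hits = unexpected_hops(\@path, ['10.22.249.2']);
print "unexpected hop(s) in path: @hits\n" if @hits;
```

With -u set, a non-empty result would be the only condition that raises an alert; any other route change would be ignored for alerting purposes.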
=back =head1 MON CONFIGURATION EXAMPLE hostgroup route1 rt-tb-paris-26 rt-tb-london-18 rt-tta-pr01r00-4 rt-cam-cer001-5 rt-tta-pn01r00-4 watch route1 service traceroute interval 15m monitor traceroute.monitor -c /usr/local/mon/traceroute.cf period wd {Sun-Sat} alert mail.alert meekj alertevery 1h summary =head1 CONFIGURATION FILE EXAMPLE # traceroute.monitor Config File RouteLogFile /usr/local/mon/logs/routes_YYYYMM.log RouterList /usr/local/mon/rt.list Traceroute /usr/sbin/traceroute TracerouteOptions -I StateDir /usr/local/mon/state.d EquivIP 10.22.4.254 10.22.5.254 10.22.6.254 EquivIP 10.28.4.254 10.28.5.254 10.28.6.254 StopAt 172.30.124.17 A firewall StopAt 172.31.124.17 Another firewall UnexpectedHop 10.22.249.2 S2S VPN Tunnel AlertMessage For more details see: http://$HOSTNAME/cgi-bin/network/traceroute.anal EndAlertMessage Lines with '#' in the first column are ignored. RouteLogFile - A new log file will be created each month in the above example the files will be of the form routes_199810.log The YYYYMM format is the only date string possible in the current version The logs contain time stamped route changes. RouterList - Optional IP address to router name translation in /etc/hosts format (IP_address router_name). Supplying this list will provide considerably more meaningful alarm messages, especially if the router names contain geographical information. Without this list, or DNS records, the extended alarm is just a list of interface IP addresses. DNS results take precedence over this list. Traceroute - Overrides the default of /usr/sbin/traceroute TracerouteOptions - Supply additional options to traceroute. -I tells traceroute to use ICMP rather than UDP on some systems. Note that -n is always supplied so that no DNS lookups are performed. StateDir - Overrides the default path of the mon environment variable MON_STATEDIR. Files named F contain the last observed route. 
EquivIP - A space separated list of IP addresses that should be considered equivalent for the purposes of determining route changes. Likely used where there are secondary addresses on router or switch interfaces. StopAt - A single IP address followed by an optional comment. The traceroute will be terminated when this address is seen. This allows a route check to a system on another network, such as the Internet, without tracking the route on a network that you do not control. A common use would be to put your firewall address in a StopAt directive. There can be multiple StopAt lines. UnexpectedHop - See the description of the -u option above. The presence of an UnexpectedHop in a path will trigger an alarm regardless of whether the -u option is set, or not. AlertMessage - A message that is appended to the alert details starts on the next line. $HOSTNAME, if present in the message, is converted to the name of the machine on which the monitor is running. EndAlertMessage - The AlertMessage can be terminated by this directive, but it is not required if the message is at the end of the configuration file. =head1 BUGS There probably are some. =head1 AUTHOR Jon Meek, meekj@ieee.org =head1 SEE ALSO F - A CGI script to display route change information. =cut use Getopt::Long; use Sys::Hostname; use POSIX qw(:signal_h WNOHANG strftime); use Socket; #getopts ("vdt:l:c:"); GetOptions ('v' => \$opt_v, 'd' => \$opt_d, 'debug' => \$opt_d, 't=i' => \$opt_t, 'l=s' => \$opt_l, 'c=s' => \$opt_c, 'u' => \$FailOnUnexpectedHopOnly, 'unexpectedhoponly' => \$FailOnUnexpectedHopOnly, ); # -l file Log file name with optional YYYYMM part that will be transformed to current month $TimeOut = $opt_t || 20; # Set default timeout in seconds # Usual Linux config $TRACEROUTE = '/bin/traceroute'; #$STATE_DIR = '/usr/local/mon/state.d'; if (defined $ENV{MON_STATEDIR}) { # Are we running under mon? 
  $STATE_DIR = $ENV{MON_STATEDIR};
  $RunningUnderMon = 1;
} else {
  $RunningUnderMon = 0;
}

if ($opt_c) { # Read configuration file
  $ConfigFile = $opt_c;
  if (open(C, $ConfigFile)) {
    while ($in = <C>) {
      last if ($in =~ /^Exit/i);
      next if ($in =~ /^\#/); # Comments
      chomp $in;

      if ($in =~ /^RouteLogFile\s+/i) {
        ($tag, $LogFile) = split(' ', $in, 2);
        next;
      }

      if ($in =~ /^Traceroute\s+/i) { # Need whitespace to distinguish this option
        ($tag, $TRACEROUTE) = split(' ', $in, 2);
        next;
      }

      if ($in =~ /^TracerouteOptions\s+/i) {
        ($tag, $TracerouteOptions) = split(' ', $in, 2);
        next;
      }

      if ($in =~ /^RouterList\s+/i) {
        ($tag, $RouterListFile) = split(' ', $in, 2);
        next;
      }

      if ($in =~ /^StateDir\s+/i) { # If the mon environment variable needs to be overridden
        ($tag, $STATE_DIR) = split(' ', $in, 2);
        next;
      }

      if ($in =~ /^EquivIP\s+/i) {
        ($tag, $ips) = split(' ', $in, 2);
        (@ip_list) = split(' ', $ips);
        # $ip_string = " $ips "; # Each IP is surrounded by whitespace
        foreach $ip (@ip_list) {
          $EquivIP{$ip} = [ @ip_list ];
        }
        next;
      }

      if ($in =~ /^StopAt\s+/i) {
        ($tag, $stop_addr, $stop_comment) = split(' ', $in, 3);
        $StopAddress{$stop_addr}++;
        $StopComment{$stop_addr} = $stop_comment;
        next;
      }

      if ($in =~ /^UnexpectedHop\s+/i) {
        ($tag, $ip, $name) = split(' ', $in, 3);
        print "UnexpectedHop $ip, $name\n" if $opt_d;
        $UnexpectedHop{$ip}++;
        $UnexpectedHopName{$ip} = $name;
        next;
      }

      if ($in =~ /^AlertMessage/i) { # Extra text to add to an alert starts on next line
        while ($in = <C>) {
          last if ($in =~ /^EndAlertMessage/i);
          $AlertMessage .= $in;
        }
        $MonitorHostname = hostname;
        $AlertMessage =~ s/\$HOSTNAME/$MonitorHostname/g; # Replace every occurrence
      }
    }
    close C;
  } else {
    print "traceroute.monitor: Couldn't open $ConfigFile configuration file\n";
    exit 1;
  }
}

if ($opt_l) { # Command line overrides config file
  $LogFile = $opt_l;
}

if ((defined $RouterListFile) && $opt_v) { # Read the router names now
  open(F, $RouterListFile);
  while ($in = <F>) {
    chomp $in;
    ($ip, $name) = split(' ', $in, 2);
    $RouterByIP{$ip} = $name;
  }
  close F;
}

@Failures = ();
@Hosts = @ARGV; # Host names are left on the command line after Getopt

# Note: $TestOnly is not set by any command line option, so this
# self-test block only runs if the variable is set by hand.
if ($TestOnly) {
  foreach $h (@Hosts) {
    $name = &HopName($h);
    print "Host: $h $name\n";
    if (defined $EquivIP{$h}) {
      print " Has equivalent IP\n";
    }
  }
  $ip1 = $Hosts[0];
  $ip2 = $Hosts[1];
  $equiv_check = grep /^$ip2$/, @{ $EquivIP{$ip1} };
  print "$ip1 $ip2 $equiv_check\n";
  @equiv_arr = grep /^$ip2$/, @{ $EquivIP{$ip1} };
  print "$ip1 $ip2 @equiv_arr\n";
  foreach $ip (@equiv_arr) {
    print " $ip\n";
  }
  exit;
}

#
# Reap children to avoid defunct processes / zombies
# See "Network Programming with Perl" by Lincoln Stein
#
sub Reaper {
  while ((my $child_pid = waitpid(-1, WNOHANG)) > 0) {
    print "Reaped child: $child_pid\n" if $opt_d;
  }
}

sub OtherSIGs {
  print "traceroute.monitor Exiting on Signal\n";
  exit 1;
}

$SIG{CHLD} = \&Reaper;
$SIG{HUP} = $SIG{INT} = $SIG{QUIT} = $SIG{TERM} = \&OtherSIGs;

#
# Run traceroute for each destination, collect route
#
foreach $TargetHost (@Hosts) {
  $TimeOfDay = time;
  $FmtTimeOfDay = strftime("%A %d-%b-%Y %H:%M:%S %Z", localtime($TimeOfDay));
  @HopList = (); # Initialize hop list for this traceroute to $TargetHost
  $cmd = qq{$TRACEROUTE -n $TracerouteOptions $TargetHost 2>/dev/null |};
  print "Options: ->$TracerouteOptions<-\nCommand: $cmd\n" if $opt_d;

  eval {
    $SIG{ALRM} = sub {die "timeout" };
    print "Setting timeout to $TimeOut s\n" if $opt_d;
    alarm($TimeOut);
    eval { # discard STDERR data from traceroute
      $pid = open(TR, $cmd) || die "Couldn't run traceroute\n";
      print "$FmtTimeOfDay Traceroute to $TargetHost pid: $pid\n" if $opt_d;
      while ($in = <TR>) {
        print $in if $opt_d;
        if ($in =~ /\*\s+\*\s+\*/) { # Get * * * then give up
          push(@HopList, '*'); # Indicate that the traceroute did not complete
          kill 13, $pid; # 13 = PIPE, prevents Broken Pipe Error, at least on Solaris
          last;
        }

        # We will only pick up the first IP address listed on a line for now
        # Get IP address into $1
        $in =~ /\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+/;
        $ThisHopIP = $1;
        push(@HopList, $ThisHopIP); # Build route hop list

        if ($UnexpectedHop{$ThisHopIP}) {
          $UnexpectedHopSeen{$TargetHost}++;
        }

        if ($opt_v) {
          chomp $in;
          $name = &HopName($ThisHopIP); # Look up the name of the hop just parsed
          print "$in $name $RouterByIP{$ThisHopIP}\n";
        }

        if (exists $StopAddress{$ThisHopIP}) {
          print "Stopping at $ThisHopIP $StopComment{$ThisHopIP}\n\n" if $opt_v;
          kill 'TERM', $pid; # Terminate the traceroute
          alarm(0);
          return; # May be correct way to leave eval, instead of last
        }
      }
      alarm(0);
    };
    alarm(0);
  };

  if ($@) { # Check for SIG
    if ($@ =~ /timeout/) { # It was a traceroute timeout
      print "Traceroute timeout\n" if $opt_d;
      push(@HopList, '*'); # Indicate that the traceroute did not complete
      kill 13, $pid; # 13 = PIPE, prevents Broken Pipe Error, at least on Solaris
    } else {
      print "Exiting due to some other alarm\n" if $opt_d;
      die; # Some other problem
    }
  }
  close TR;

  # Build the route string, collapsing consecutive duplicate hops
  $previous_hop = '';
  $route = '';
  foreach $h (@HopList) {
    $route .= "$h-" unless ($h eq $previous_hop);
    $previous_hop = $h;
  }
  $route =~ s/\-$//; # Remove trailing '-' from route string

  $ResultString{$TargetHost} = "$TimeOfDay $TargetHost $route";
  if ($opt_d) {
    print "$TargetHost: $ResultString{$TargetHost}\n";
    print " $route\n";
  }
}

$FmtTimeOfDay = strftime("%A %d-%b-%Y %H:%M:%S %Z", localtime(time));
print "$FmtTimeOfDay finish $TargetHost pid: $pid\n\n" if $opt_d;

#
# Compare just measured routes with previous route stored in state file
# or just make the state file if this is the first time for a destination
#
# TODO: if new destination (no state file), then log route to log file
#       add IP to name translation for mail messages

foreach $k (sort keys %ResultString) {
  print "$ResultString{$k}\n" if $opt_d;
  $state_file = "$STATE_DIR/lastroute.$k";

  if (-e $state_file) { # We have checked this route before, compare current
    ($t2, $host2, $current_route) = split(' ', $ResultString{$k});
    open(S, $state_file) || warn "Can't open $state_file for reading\n";
    $in = <S>;
    chomp $in;
    ($t1, $host1, $prev_route) = split(' ', $in);
    close S;

    if ($opt_d) {
      print "Previous route for $host1 -$prev_route-\n";
      print "Current route for $host2 -$current_route-\n";
    }

    $ThisTargetFailed = 0;
    $ThisTargetFailed++ if $UnexpectedHopSeen{$k};

    if (&RouteChanged($current_route, $prev_route)) {
      # Route changed, record, alarm if not using only unexpected hops
      $ThisTargetFailed++ unless $FailOnUnexpectedHopOnly;
      # Use separate string for logging route changes so that we don't
      # always log unexpected hops
      $LogString{$k} = $ResultString{$k};
      if ($RunningUnderMon) { # Write results
        open(S, ">$state_file") || warn "Can't open $state_file for writing\n";
        print S "$ResultString{$k}\n";
        close S;
      }
    }

    if ($ThisTargetFailed) {
      push (@Failures, $k);
      print " Alarm\n" if $opt_d;
    }

  } else { # The state file does not yet exist, so make it
    if ($RunningUnderMon) { # Write results
      open(S, ">$state_file") || warn "Can't open $state_file for writing\n";
      print S "$ResultString{$k}\n";
      close S;
    }
    # $LogString{$k} = $ResultString{$k}; # Always log the first instance
    push (@Failures, $k); # Call it a failure so it will be logged and notification will be sent
    print " New route added to check: $k\n" if $opt_d;
  }
}

#
# Write results to logfile
#
if ($LogFile) {
  ($sec,$min,$hour,$mday,$Month,$Year,$wday,$yday,$isdst) = localtime($TimeOfDay);
  $Month++;
  $Year += 1900;
  $YYYYMM =
    sprintf('%04d%02d', $Year, $Month);
  $LogFile =~ s/YYYYMM/$YYYYMM/; # Fill in current year and month

  if (-e $LogFile) { # Check for existing log file
    $NewLogFile = 0;
  } else {
    $NewLogFile = 1;
  }

  if ($NewLogFile || (@Failures > 0)) { # Only log if new log file, or if route changes
    open(LOG, ">>$LogFile") || warn "$0 Can't open logfile: $LogFile\n";

    if ($NewLogFile) { # New log file, record all routes being tested
      foreach $host (sort keys %ResultString) {
        print LOG "$ResultString{$host}\n";
      }
    }

    if (($NewLogFile == 0) && (@Failures > 0)) { # Just record changes, not repeated unexpected hops
      foreach $host (sort keys %LogString) {
        print LOG "$LogString{$host}\n";
      }
    }
    close LOG;
  }
}

if (@Failures == 0) { # Exit if there were no failures
  exit 0;
}

# Note that we might have already read this file if --v was specified
#
if (defined $RouterListFile) { # Read the router names if we have a failure
  open(F, $RouterListFile);
  while ($in = <F>) {
    chomp $in;
    ($ip, $name) = split(' ', $in, 2);
    $RouterByIP{$ip} = $name;
  }
  close F;
}

@SortedFailures = sort @Failures; # To make summary mode in mon happy
print "@SortedFailures\n";

foreach $host (@SortedFailures) {
  print "$host:\n";
  ($t, $target, $rest) = split(' ', $ResultString{$host});
  (@hop_ips) = split(/\-/, $rest);
  foreach $hop_ip (@hop_ips) {
    $name = &HopName($hop_ip);
    printf " %-15s %s", $hop_ip, $name;
    print " Unexpected Hop: $UnexpectedHopName{$hop_ip}" if ($UnexpectedHop{$hop_ip});
    print "\n";
  }
  print "\n";
}

if (defined $AlertMessage) {
  print "\n$AlertMessage\n";
}

exit 1;

sub RouteChanged {
  my ($current_route, $prev_route) = @_;
  my(@current_ips, @prev_ips);

  if ($current_route eq $prev_route) { # Simple case, same string, no change
    return 0;
  }

  (@current_ips) = split(/\-/, $current_route);
  (@prev_ips) = split(/\-/, $prev_route);

  if ($#current_ips != $#prev_ips) { # Another simple case, different number of hops
    return 1; # Fail
  }

  for ($i = 0; $i <= $#current_ips; $i++) {
    $ip1 = $current_ips[$i];
    $ip2 = $prev_ips[$i];
    next if ($ip1 eq $ip2);
    # Compare as strings so that '.' and '*' are not treated as regex metacharacters
    $equiv_check = grep { $_ eq $ip2 } @{ $EquivIP{$ip1} };
    if ($equiv_check == 0) { # Not same, or equivalent, route different, fail
      return 1;
    }
    print "$i $ip1 $ip2 $equiv_check\n" if $opt_d;
  }

  return 0; # Good, no route change
}

sub HopName {
  my ($h_ip) = @_;
  my ($lookupname, $ha);

  # print "HopName: $h_ip\n" if $opt_d;

  if ($h_ip =~ /^\*/) {
    $name = 'Not reached';
    return $name;
  }

  if (exists $HopNameCache{$h_ip}) { # Already have the name
    $name = $HopNameCache{$h_ip};
    return $name;
  }

  $name = '';
  $ha = inet_aton($h_ip);
  $lookupname = gethostbyaddr($ha, &AF_INET);

  if (length($lookupname) > 0) { # Attempt lookup of address via "normal" system methods (DNS, hosts, etc)
    $name = $lookupname;
  } elsif (exists $RouterByIP{$h_ip}) { # Fall back to the RouterList file, if one was loaded
    $name = $RouterByIP{$h_ip};
  }

  # Have an auxiliary name? such as "London Internet Router", prepend it
  # $name = qq{$RouteHopName{$h_ip} $name} if (exists $RouteHopName{$h_ip});

  $HopNameCache{$h_ip} = $name;
  # print "$h_ip -- $name\n" if $opt_d;
  return $name;
}
mon-1.2.0/mon.spec0000644003616100016640000002001510616427404013656 0ustar trockijtrockij
#
# spec file for package mon (Version 1.0.0pre3)
#
# Copyright (c) 2004 SUSE LINUX AG, Nuernberg, Germany.
# This file and all modifications and additions to the pristine
# package are under the same license as the package itself.
#
# Please submit bugfixes or comments via http://www.suse.de/feedback/
#

BuildRequires: bash bzip2 cpio cpp diffutils file filesystem findutils grep groff gzip info m4 make man patch sed tar texinfo autoconf automake binutils gcc libtool perl rpm

Name: mon
Version: 1.0.0pre4jt1
Release: 2
Summary: The mon network monitoring system
License: GPL
Group: System/Monitoring
URL: http://www.kernel.org/software/mon/
Source: http://www.kernel.org/pub/software/admin/mon/%{name}-%{version}.tar.bz2
Source1: http://www.kernel.org/pub/software/admin/mon/mon-client-%{version}.tar.bz2
Requires: perl
Requires: perl(Time::Period)
Requires: perl-Convert-BER
Requires: fping
Requires: perl-libwww-perl
BuildRoot: %{_tmppath}/%{name}-%{version}-build

%define filelist %{name}-%{version}-filelist

%description
"mon" is a tool for monitoring the availability of services. Services
may be network-related, environmental conditions, or nearly anything
that can be tested with software. It is extremely useful for system
administrators, but not limited to use by them. It was designed to be
a general-purpose problem alerting system, separating the tasks of
testing services for availability and sending alerts when things fail.
To achieve this, "mon" is implemented as a scheduler which runs the
programs which do the testing, and triggers alert programs when these
scripts detect failure. None of the actual service testing or
reporting is handled by "mon" itself. These functions are handled by
auxiliary programs.
Authors:
--------
    Jim Trocki

%prep
###################################################################
%setup -q
%setup -T -D -a 1

###################################################################
%build
cd mon.d
make
cd ../mon-client-%{version}
%{__perl} Makefile.PL `%{__perl} -MExtUtils::MakeMaker -e ' print qq|PREFIX=%{buildroot}%{_prefix}| if \$ExtUtils::MakeMaker::VERSION =~ /5\.9[1-6]|6\.0[0-5]/ '`
%{__make}

###################################################################
%install
rm -rf %{buildroot}
mkdir -p %{buildroot}/%{_libdir}/mon/alert.d
mkdir -p %{buildroot}/%{_sbindir}
mkdir -p %{buildroot}/%{_mandir}/man1
mkdir -p %{buildroot}/%{_libdir}/mon/mon.d
mkdir -p %{buildroot}/%{_localstatedir}/lib/mon
mkdir -p %{buildroot}/%{_libdir}/mon/utils
mkdir -p %{buildroot}/%{_sysconfdir}/mon
mkdir -p %{buildroot}/%{_sysconfdir}/init.d
mkdir -p %{buildroot}/%{_sysconfdir}/logrotate.d
mkdir -p ./examples
cp mon %{buildroot}/%{_sbindir}/
cp -a ./alert.d/ %{buildroot}/%{_libdir}/mon/
cp ./clients/moncmd %{buildroot}/%{_sbindir}/moncmd
cp ./clients/monshow %{buildroot}/%{_sbindir}/monshow
cp -a ./doc/*.1 %{buildroot}/%{_mandir}/man1/
mv ./etc/very-simple.cf %{buildroot}/%{_sysconfdir}/mon/mon.cf
mv ./etc/auth.cf %{buildroot}/%{_sysconfdir}/mon
mv ./etc/S99mon %{buildroot}/%{_sysconfdir}/init.d/mon
cp -a ./etc/* ./examples
cp -a ./mon.d/{*.monitor,*.wrap} %{buildroot}/%{_libdir}/mon/mon.d/
cp -a ./utils/ %{buildroot}/%{_libdir}/mon/
mkdir -p %{buildroot}/sbin
ln -sf ../etc/init.d/mon %{buildroot}/sbin/rcmon
cd mon-client-%{version} && %{makeinstall} `%{__perl} -MExtUtils::MakeMaker -e ' print \$ExtUtils::MakeMaker::VERSION <= 6.05 ? qq|PREFIX=%{buildroot}%{_prefix}| : qq|DESTDIR=%{buildroot}| '`
cd ..

# clean up after perl module install - remove special files
find %{buildroot} -name "perllocal.pod" -o -name ".packlist" -o -name "*.bs" |xargs -i rm -f {}

# build filelist
echo "%defattr(-,root,root)" > %filelist
find %{buildroot} -type f -printf "/%%P\n" | grep -v "man/man" >> %filelist
# -s tests that the file exists and is non-empty (-z only tests the name string)
[ ! -s %filelist ] && {
    echo "ERROR: EMPTY FILE LIST"
    exit -1
}

###################################################################
%files -f %filelist
%doc %{_mandir}/man1/moncmd.1*
%doc %{_mandir}/man1/monshow.1*
%doc %{_mandir}/man3/Mon::*
%doc CHANGES COPYING COPYRIGHT CREDITS INSTALL KNOWN-PROBLEMS README
%doc TODO VERSION mon.lsm
%doc ./doc/README.*
%doc ./doc/globals
%doc ./examples

###################################################################
%clean
# Only remove the build root when it is set and is not "/"
if [ -n "${RPM_BUILD_ROOT}" -a "${RPM_BUILD_ROOT}" != "/" ]
then
    rm -rf "${RPM_BUILD_ROOT}"
fi

###################################################################
%preun
if [ -r %{_localstatedir}/run/mon.pid ]; then
    /etc/init.d/mon stop
fi

###################################################################
%post
if [ -d %{_localstatedir}/log -a ! -f %{_localstatedir}/log/mon_history.log ]; then
    touch %{_localstatedir}/log/mon_history.log
fi
if [ "$1" = "1" ]; then
    /sbin/chkconfig --add mon
fi

###################################################################
%postun
if [ "$1" = "0" -a -f %{_localstatedir}/log/mon_history.log ]; then
    rm -f %{_localstatedir}/log/mon_history.log
fi

%changelog -n mon
* Thu Jul 07 2004 - eric@transmeta.com
- update to 1.0.0pre2, remove suse-ness
* Mon Mar 01 2004 - hmacht@suse.de
- building as nonroot-user
* Fri Feb 27 2004 - kukuk@suse.de
- Cleanup neededforbuild
- fix compiler warnings
* Mon Feb 10 2003 - lmb@suse.de
- Fixed path to comply with FHS.
* Fri Oct 18 2002 - lmb@suse.de
- Fix for Bugzilla #21086: init script had a broken path and syntax error.
* Tue Aug 20 2002 - lmb@suse.de
- Fix for Bugzilla # 17936; PreRequires corrected.
* Mon Aug 12 2002 - lmb@suse.de - Perl dependencies updated for Perl 5.8.0 * Fri Jul 26 2002 - lmb@suse.de - Perl dependencies adjusted to comply with SuSE naming scheme * Fri Jul 26 2002 - lmb@suse.de - Adapted from Conectiva to UnitedLinux - init script cleanup * Wed Jul 24 2002 - Fábio Olivé Leite - Version: mon-0.99.2-1ul - Adapted for United Linux * Sat Jul 20 2002 - Claudio Matsuoka - Version: mon-0.99.2-3cl - updated dependencies on perl modules to lowercase names * Thu May 16 2002 - Fábio Olivé Leite - Version: mon-0.99.2-2cl - Added %%attr to %%{_libdir}/mon/*, so that the helper scripts are executable Closes: #5522 (aparente problema com as permissões) - Changed initscript to use gprintf Closes: #4172 (Internacionalização (?)) * Fri Dec 28 2001 - Ricardo Erbano - Version: mon-0.99.2-1cl - New upstream relase 0.99.2 * Sat Nov 17 2001 - Claudio Matsuoka - Version: mon-0.38.20-6cl - fixed doc permissions * Thu Jun 21 2001 - Eliphas Levy Theodoro - Version: mon-0.38.20-5cl - fixed initscript - /usr/lib/mon -> /usr/sbin (Closes: #3792) - added requires for perl-Convert-BER - added post{,un} scripts to handle logfile mon_history.log * Fri Mar 23 2001 - Luis Claudio R. Gonçalves - Version: mon-0.38.20-4cl - fixed the initscript (it was missing a "-f" switch) * Tue Oct 31 2000 - Arnaldo Carvalho de Melo - %%{_sysconfdir}/mon is part of this package - small cleanups * Thu Sep 28 2000 - Fábio Olivé Leite - Wrong version in the mon-perl dependency... * Thu Sep 21 2000 - Fábio Olivé Leite - Updated to 0.38.20. 
* Fri Jun 16 2000 - Fábio Olivé Leite - Fixed TIM alert, added history file, added logrotate script * Mon Jun 12 2000 - Fábio Olivé Leite - Added an alert via TIM Celular cellphones * Thu Jun 08 2000 - Fábio Olivé Leite - Made the %%preun nicer * Thu Jun 01 2000 - Fábio Olivé Leite - New spec format * Mon Apr 17 2000 - Fábio Olivé Leite - Added a new monitor (initscript.monitor) * Fri Apr 14 2000 - Fábio Olivé Leite - Added proxy support to http.monitor * Thu Apr 13 2000 - Fábio Olivé Leite - Fixed a small bug in the init script - Added scripts to alert via Mobi pagers and Global Telecom cellphones * Mon Apr 10 2000 - Fábio Olivé Leite - Initial RPM packaging mon-1.2.0/CHANGES0000644003616100016640000014351410631517211013207 0ustar trockijtrockij$Id: CHANGES,v 1.3.2.12 2007/06/06 11:46:17 trockij Exp $ Changes between mon-1.2.0- and mon-1.2.0-release Wed Jun 6 07:45:35 EDT 2007 ----------------------------------------------- -Bunch of fixes from Augie Schwer: Added RPM spec update to do a "chkconfig on" in the post install, and fixed a path bug in S99mon Fix config parsing of unack_summary, added docs for the option in the man page Fix display of ack'd services in mon.cgi Fix -p/-P option problem in mon.monitor Fix nntp.monitor "-f" option Allow snpp.alert to "use strict" -added "-m" to http.monitor to match header/content with regex by Jim Trocki -added hard timeout patch to msql-mysql.monitor by Arkadiusz Miskiewicz -added fix to ftp.monitor to handle multiline reply after quit command by Arkadiusz Miskiewicz -fix to fping.monitor to correct parsing of fping 2.2b1 output. It prints lines like "ICMP Host Unreachable from" to stderr, but to stdout it prints "(host) is unreachable", and the regex was matching both, which is wrong. 
 by Tim Berger

-added -u option, "UnexpectedHop", "AlertMessage", to traceroute.monitor
 by Jon Meek

-updated output formatting in ntpdate.monitor
 by Jon Meek

-added local-syslog.monitor
 by Jon Meek

-added --okstring, Cache-Control http header, --debuglog, to
 http_tppnp.monitor
 by Jon Meek

-updates to smtp3.monitor, --alarmtime, --maxfailtime
 by Jon Meek

-fixed "disable host" behavior. if a host was the only member of a
 hostgroup, that watch would be disabled, but that host would not be
 disabled in other hostgroups. the correct behavior is that a host will
 be disabled individually in all hostgroups, and if it is the only
 member of a hostgroup, the watch associated with that hostgroup will
 be disabled. enabling the host again undoes all of that. the behavior
 of disabling the watch is useful so that disabling a host in a
 single-hosted hostgroup does not inadvertently leave an empty
 hostgroup, which will generate log warnings when services are run
 against it.
 reported by Ed Ravin


Changes between mon-1.0.0pre3 and mon-1.0.0pre4
Tue Aug 3 08:02:35 EDT 2004
-----------------------------------------------

-when allow_empty_group is not set and there are no host arguments to
 pass to a monitor, the interval wasn't being reset so it would spam
 the syslog with lots of "no host arguments" messages. this is fixed.

-in reset_timer, there was a chance that _timer could get set to a
 negative value, which is not right. fixed it.

-fixed the bug where lots of mon processes could accumulate if the exec
 of an alert failed. also fixed error handling of failed alerts.

-added "show failures only" button to mon.cgi to speed it up.
 by Ed Ravin

-small permissions fix to rpm spec file

-added MON_CFBASEDIR variable to monitor and alert environment, which
 is set to the value of "cfbasedir" in the config file.

-removed unfinished snmp trap handling stuff. it doesn't work at all,
 and it's misleading to people even though the man page says it doesn't
 work.
-added monitor_duration and monitor_running output to opstatus detail in monshow Changes between mon-1.0.0pre1 and mon-1.0.0pre3 Mon Jul 12 09:12:29 EDT 2004 ----------------------------------------------- -changed README to refer to the new, more sensible name for the perl module client, which is mon-client -applied eric's updates to INSTALL and added a mention of monshow and mon.cgi as the web interfaces -added eric's rpm spec file (i removed the patches because they are no longer needed) -added lmb's syslog.monitor (a nifty hack) -added 'alertevery strict' code and docs, updated the README and INSTALL to mention CVS, updated CREDITS -incorporated mon.cgi 1.52 -minor addition to alert behavior explanation in mon.8 -in dialin.monitor.wrap.c, return the exit status of execv (if it fails, that is) -fixed path to perl in file_change.monitor and smtp3.monitor -added some rcs tags to identify the file versions -handle_trap_timeout now calls process_event, and it works fine with alert/upalert/alertevery/etc. as shown by my testing -received traps now reset the trap timeout counter, and fixed some other stuff wrt trap timeouts -added sub process_event and made proc_cleanup and handle_trap use it so that the alert mgmt code is shared rather than in two places. i tested as much of it as i could and all seems to work well now, especially upalert, alertafter, alertevery with traps. -added per-service "_monitor_duration" variable which records how many seconds the previous monitor took to execute. this is available via "list opstatus". if no monitor has executed yet then the value is -1. -added per-service "_monitor_running" variable whose value is 0 or 1 depending on whether the monitor is currently running for that service. -removed gunk from handle_trap regarding the various TRAP_COLDSTART, etc. processing, since most of it was a bad idea anyway, or at least as far as i could tell. 
traps and their exit values are now processed exactly as monitors are, which simplifies things greatly and adds to more intuitive functionality. this means the "spc" value in a trap is now ignored. -fixed some args processing in call_alert -fixed a bug which would prevent alerts or upalerts from being sent when call alerts is passed the "output" argument whose value is undef -remove usage of parse_line in trap processing (backported from mon 1.1 code) -make esc_str escape spaces in order to be compatible with monperl-1-0-0pre1 -added list of all possible client commands to moncmd -added --community to set the snmp community in reboot.monitor -patch to traceroute.monitor from meekj added StateDir, TracerouteOptions, StopAt config options some bugfixes to config file parsing reap children to avoid defunct processes added timeout alarm -up_rtt.monitor added -r to log individual rtts, better error reporting for tcp and udp check Changes between mon-0.99.3-47 and mon-1.0.0pre1 ----------------------------------------------- Fri Jun 18 10:35:18 EDT 2004 -removed nonsensical unless statement which would conditionally set the op status to STAT_OK. 
it should be set unconditionally -added "strict" option to alertevery -changed protocol to escape spaces to coincide with the change in Mon::Client Changes between mon-0.99.2 and mon-0.99.3 Fri Jun 11 10:55:27 EDT 2004 -------------------------------------------- -updated lpd.monitor -added "watch" parameter to monshow submitted by Joe Rhett -xedia-ipsec-tunnel.monitor now understands the new OIDs for sysObjectID.0 in the newer versions of the software -fixed exclude_period parsing problem reported by Konstantin 'Kastus' Shchuka and Jeroen Moors -fixed a setlogsock problem reported by Gilles Lamiral added AIX to the systems which require setlogsock -added "clientallow" restriction (trockij renamed it that from serverallow) by Ed Ravin -added monfailures to clients directory, contributed by Ed Ravin -patch to fping.monitor which catches more error messages from fping by Ed Ravin -patch for minor *bsd startup nits by Ed Ravin -patch to msql-mysql.monitor to support the more typical summary/detail output format. by Ed Ravin -patch to phttp.monitor which corrects the uninitialized variable error by Ed Ravin -patch to phttp.monitor to show more detail in regexp failures by Erik Inge Bolsø -patch to imap.monitor to report the usual summary followed by details, and clarify some error messages for a couple of situations by Ed Ravin -adjust for some current fping output (ICMP host unreachable), correct the docs for failure_interval (which is currently listed as a period def rather than a service def) from Debian users, submitted by Roderick Schertler -another patch to fping.monitor to catch ICMP Time Exceeded failure, submitted by John Nelson -MON_DESCRIPTION now supplied to monitors -added "-f" to etc/S99mon -taint fix for perl 5.8 in monshow from Roderick Schertler -added trace.monitor, and alternate route path monitor -changed ftp.monitor to detect no ftp server when socket opened okay. 
submitted by Dan Kendall -updates to ftp.monitor to show detail of ftp conversation -added irc.alert -added dns-query.monitor -mysql.monitor - fix for deprecation of _ListTables by Aled Treharne -updated smtp.monitor to output detail -added "version => 2" to monitors which use the net-snmp module so that they work with net-snmp 5.0.6 -minor documentation updates -fixed a bug with the CGI invocation of monshow which would yield the error message "premature end of script headers" when you "drilled down". bug reported by Hugh Caley -mail.alert includes the service description in the body -fix for alertafter timer, fix for upalertafter feature sent by Adrian Chung -fix phttp.monitor for RFC compliance, uses \r\n everywhere in its requests. Just \n leads to a "400 Bad Request" on IIS 6.0 in native mode. sent by Erik Inge Bolsø -fixed reboot.monitor and asyncreboot.monitor to handle counter roll-overs -_upalertafterinterval typo fix from Michael Rademacher -fix to phttp.monitor for EINPROGRESS from Erik Inge Bolsø -updated file_change.monitor from Jon Meek -dtlog a bug where blank lines from the dtlog are being output to the client, and the client is interpreting the timestamp as zero. fixed by David Nolan -fixed qpage.alert: Only the first pager gets notified when there is more than one listed for a qpage.alert. The problem is that qpage returns 0 for failure and 1 for success which is backwards from what the alert routine thinks will happen. submitted by -updated nntp.monitor to support authentication submitted by Kai Schaetzl/conactive.com -unbuffered monerrfile, maybe it'll work -fixed trap auth problem, auth.cf parsing bug submitted by danb@level7.ro -updated mon.8 to explain how to set environment variables for each service to be passed to monitors and alerts. also removed the wording that the client handling is iterative (it is not). 
-updated netappfree.monitor, submitted by Ed Ravin -patch to fix broken upalerts submitted by Daniel Fenert -patch to dns.monitor for added functionality Added -serial_threshold command line argument to allow the zone serials between the master and the slaves by that much, at most. Necessary to avoid spurious errors during zone propagation. High thresholds are typically unnecessary, but when using Dynamic DNS, with zones that update hundreds if not thousands of times an hour, they can be off by quite a bit but still be OK. If propagation completely fails, eventually we'll exceed the threshold. Added a mode for monitoring caching only name servers. Give the -caching_only argument, and then instead of -zone and -master arguments, you specify -query arguments, which are of the form record[:type]. (With A records being the default type) So you might specify '-query myzone.com:MX -query myzone.com:A -query _servicename._udp.myzone.com:SRV' Every server will be queried for each request, and must return a valid response. But the records will NOT be cross checked against each other, as various round-robin DNS situations may cause the different servers to have different data. Fixed some error reporting code to format the output better Changed the script exit value to be the highest count of how many servers failed on a single query. (I.e. if three servers are queried, for 20 records, the highest error code possible is 3, not 20 as it was before) I found all of these changes to be necessary in our environment, and none of them greatly change the original behavior, so I figured they were worth submitting. I would just submit a diff, but a context diff was actually BIGGER then just sending the whole file... submitted by David Nolan -fixed tiny bug in the cmdline operation of monshow which was causing the unexpected "No non-switch args expected" which was reported by -connect STDIN to /dev/null upon daemonization even if monerrfile is specified. 
-added the "monerrfile" documentation to mon.8 and explained the "all" directive in auth.cf


Changes between mon-0.99.1 and mon-0.99.2
Sat Sep 8 10:06:01 EDT 2001
--------------------------------------------
-fping.monitor reports the error when it gets a return value from fping which it doesn't recognize. this could have been the cause of some phantom alerts reported w/empty summary lines.
-fixed comments in CHANGELOG
-andrew ryan patch to fix checkauth and some monerrfile fixes, theo's fix for alertevery. this fixes the "cannot connect to mon server" problem with mon.cgi.
-andrew ryan patch to open/close dtlog for each entry, renamed open_dtlog to init_dtlog
-updated KNOWN-PROBLEMS


Changes between mon-0.38.21 and mon-0.99.1
Sun Aug 19 15:18:55 EDT 2001
--------------------------------------------

******DEFAULT BEHAVIOR HAS CHANGED FOR THE FOLLOWING FEATURES************
the following two defaults were changed, since they seem to be unintuitive to most people, based on feedback given on the mailing list.
-the old "comp_alerts" is now the default. to get the old behavior, specify "no_comp_alerts" in the period section.
-the default is now the old "summary" behavior for alertevery. that means that for successive failures with "alertevery" used to suppress multiple alerts, only the summary line will be used to short-circuit the alert suppression. to get the old behavior, append "no_summary" to the alertevery line. the old "summary" syntax is still permitted to help w/backwards compatibility.
*************************************************************************

-cleaned up config parsing a bit
-updates to up_rtt.monitor, added traceroute.monitor, smtp3.monitor, http_tpp.monitor, file_change.monitor
-fixed problem where upalerts were not sent for ack'd failures
-updated the sample etc/auth.cf to give examples of trap authentication
-updated man page for mon to include better explanation of auth.cf syntax.
-formatting updates to monshow, added "summary-len" option, html fixes
-fixed problem where server responded twice with an extra "220 ok" after doing a reload
-rewrote fping.monitor to return more verbose output, and to sort the failed hosts on the summary line. this was wreaking havoc with "alertevery", since the order of the failed hosts in the summary might change, even though the same hosts were failing on successive tests. added "-s ms" option which will consider hosts with a response time greater than "ms" milliseconds as failures. added "-a" option to fail only when all hosts fail, and "-T" to call traceroute on each failed host. "-h" lists options.
-made nearly all monitors output their summary line (if it is a list of hosts) in sorted order.
-updated man page for mon with more detail on the behavior of "alertevery" and "alertevery ... summary"
-added xedia-ipsec-tunnel.monitor to monitor site-to-site ipsec tunnels on a Xedia AP450 router.
-silkworm.monitor recognizes different brocade OEM'd fcal switches, ignores "absent" sensors, and has a work-around for the braindead behavior of swFCPortAdmStatus to detect offline ports.
-fix to msql-mysql.monitor to allow --port to override default port. submitted by Adrian Phillips
-stdout and stderr now can be sent to a file by specifying a filename in the variable "monerrfile". submitted by Ed Ravin
-updated dns.monitor to output only the failed hosts on the summary line.
 "test config" fix, new authentication directives "!" and "AUTH_ANY". "AUTH_ANY",
 check and warnings for hostgroups which are defined but never used
 more descriptive error when m4 is not found
 removed second definition of disen_host and load_stat
 "alertafter timeval" patch, alerts for period will only be called if the service has been in a failure state for more than the length of time described by the interval, regardless of the number of failures noticed within that interval.
 submitted by Andrew Ryan
-more verbose error on bind(2) failure
 tyop fixes to mon.1
 updated COPYRIGHT
 mon.1 is now mon.8, and references to mon.1 changed accordingly
 update to mon.d/Makefile to use $CFLAGS and $LDFLAGS
 silence some warnings in rpc.monitor.c
 add /usr/local/lib to standard search paths for alert.d and mon.d, and updated mon.8
 make monshow run under taint mode, fixes view directory to match the docs
 default server for moncmd and monshow is now localhost
 http.monitor accepts a 302 status (moved temporarily)
 fixed --auth in monshow
 reboot.monitor now uses $MON_STATEDIR as the default state directory, and "reboot.monitor" (not "state") as default state file.
 FD_CLOEXEC fix
 update to monshow.1
 submitted by Roderick Schertler
-fix to pop3.monitor to produce more verbose errors
 fix to reboot.monitor to add --verbose option
 submitted by Ed Ravin
-qpage.alert accepts "-v" option for verbose
 smtp.monitor has increased verbosity of failure details
 submitted by Steve Siirila
-re-wrote Steve Siirila's mon.monitor to use Mon::Client and put it in mon.d
-patch to do proper syslog handling on openbsd, MON_DEPEND_STATUS env variable passed to monitors
 submitted by Mark D. Nagel
-added "failure_interval" functionality. i actually re-wrote the patch to make it a bit more proper, and renamed the parameter from "alertintervalcheck" to "failure_interval" for clarity.
 submitted by CHASSERIAU JeanLuc
-netappfree.monitor changes
 Allows the monitor to give more verbose error messages which will handle multiple volumes. Instead of reporting: "1.0GB free on " it will now say: "1.0GB free on :/vol/"
 Fixes a bug where multiple alerts from a single filer would cause multiple entries in the summary line.
 Allows the monitor to handle the case where the NetApp MIB isn't available to the script.
 added na_quota.monitor. trockij made some small changes to it so that it will allow disable and enable to work.
 submitted by Theo Van Dinter


Changes between mon-0.38.20 and mon-0.38.21
Sun Jan 14 11:55:06 PST 2001
--------------------------------------------
-merged in Andrew Ryan's mon.testconfig.patch to enhance error detection and reporting of config file errors. a new client command "test config" loads and parses a new config file w/o committing it, and returns error conditions found.
-added foundry-chassis.monitor, detects PSU failures on Foundry chassis devices.
-update for up_rtt.monitor and added http_tp.monitor from Jon Meek.
-fixed OS detection, patch supplied by Roderick Schertler
-tiny patch to freespace.monitor which lets the user specify a min % free, submitted by Christian Lademann
-http.monitor now accepts 401 responses as success, a tweak from Tim Small
-documentation correction from Chris Snell
-added cpqhealth.monitor which detects PSU/fan/temp problems by querying the Compaq Insight manager agent on Presario systems
-save sum and dtl into last_summary and last_detail from traps, bug reported by Jan Krivonoska
-patch to correct trap decoding problem, submitted by Ramon Buckland
-a trap timeout now clears the value of last_detail
-dtlog is now written to upon reception of an "ok" trap
-patch from Gilles Lamiral which adds accuracy to scheduler's synchronous operation. this should help keep rrdmon happy.
-added silkworm.monitor to test the operational status of Brocade Silkworm FCAL switches. it should detect port, fan, psu, and temperature failures.
-fix to http.monitor from Andrew Ryan which prints the HTTP response header even if a timeout was encountered. also fixed another bug w/regards to timeout handling. i applied this fix to the following monitors:
 up_rtt.monitor http.monitor ftp.monitor http_t.monitor smtp.monitor pop3.monitor nntp.monitor imap.monitor
-http.monitor will allow you to supply a user agent string of your own via "-a useragent".
 also "-o" will omit HTTP headers from properly working hosts (Andrew Ryan)


Changes between mon-0.38.19 and mon-0.38.20
Sat Aug 26 13:29:45 PDT 2000
--------------------------------------------
-updated some docs
-http.monitor checks for 401 status code
-fixed the buggered 0.38.19 release. damn you, cvs, damn you.


Changes between mon-0.38.18 and mon-0.38.19
Sun Aug 20 14:28:23 PDT 2000
----------------------------------------------
-fixed exclude_hosts (again) and tested and tested and tested and it works
-patch from andrew ryan to add checkauth command
-included phttp.monitor from Gilles Lamiral
-changed some wording in INSTALL
-first stage of new config buffering
-readhistoricfile now clears out last_alerts before reading it in
-added -t TRAPPORT cmdline arg
-merged patch from Andrew Ryan to support multiple authtypes, including PAM support. Also fixed a bug when the user is listed in auth.cf but not in the userfile.
-updated documentation of mon.1 to include PAM authentication.
-removed non-portable sockaddr pack statements from monitors.
-CVS has pissed me off to no end with its anomalies, so I did a sensible thing and converted the repository to prcs. prcs seems to be simple, easy to understand, not quirky, and good enough. So, if you notice that the ID version numbers in the sources have changed, this is why.
-removed mon.cgi, and replaced it with a README


Changes between mon-0.38.17 and mon-0.38.18
Sat Mar 4 11:24:34 PST 2000
----------------------------------------------
-http.monitor accepts 200 and 302
-monshow changes, mostly detail output
-"list opstatus" command shows more data:
 first_failure failure_duration exclude_hosts interval exclude_period randskew last_alert
-fixed exclude_hosts


Changes between mon-0.38.16 and mon-0.38.17
Sun Feb 27 20:18:46 PST 2000
----------------------------------------------
-added "SELF:" for "depend" variable. When the config file is parsed, SELF: expands into "currentwatch:".
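
 The "SELF:" expansion in the entry above amounts to a textual substitution at config-parse time. A minimal sketch, assuming a plain string replacement (the server itself is Perl; the function name and the example dependency expression here are hypothetical):

```python
import re


def expand_self(depend_expr, current_watch):
    """Replace each standalone "SELF:" in a depend expression with
    the name of the watch being parsed, followed by a colon."""
    # A callable replacement avoids any backslash interpretation
    # of the watch name by re.sub.
    return re.sub(r"\bSELF:", lambda _m: current_watch + ":", depend_expr)


expanded = expand_self("SELF:ping && routers:ping", "servers")
```

 Here a dependency written inside the "servers" watch refers to that watch's own "ping" service without repeating the watch name.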
-fixed some errors in mon.1
-added exclude_hosts
-added exclude_period
-removed duplicate parsing in read_cf
-"list opstatus" will now accept a list of "group,service" pairs if you don't want to list every single group and service.
-documented MON_LOGDIR and MON_STATEDIR in mon.1
-changed how args are split in client_command
-more enhancements to monshow, esp. config options and "view" support. read the man page for the details. "views" are meant to show a subset of the mon opstatus, and be configurable by the clients. for example, each department can get their own view of the systems and services which they care to monitor instead of seeing the entire list of services monitored by the server.
-added protid client command, and store PROT_VERSION as an integer for simple comparison.


Changes between mon-0.38.15 and mon-0.38.16
Sun Feb 6 16:45:55 PST 2000
----------------------------------------------
-monshow now properly displays the "last check" column in seconds, and it also displays the description, and you can click on services to get details. acknowledged failures are indicated.
-rewrote cf-to-hosts to support continuation lines
-fixed some documentation
-upalerts work with traps now, thanks to Jim Farrell
-savestate now produces an error if called w/o arguments
-a patch set submitted by Andreas J. Koenig that helps with some of the documentation
-silly "list pids" output fixed so that the output doesn't have lines beginning with numbers, which confuses Mon::Client. submitted by David Waitzman
-fixed problem with acking non-failed services
-config var that allows specification of syslog facility to use
-detail about how "use snmp" is parsed. it's now a variable in the config file, and it still doesn't really do anything.
-historicfile is re-read upon server reset.
-catching a HUP in the I/O event loop should no longer produce the "error trying to recv a trap" message in syslog.
-new config option "startupalerts_on_reset"
-new client command "list dtlog" submitted by Martha H Greenberg


Changes between mon-0.38.14 and mon-0.38.15
Sun Nov 14 11:20:23 PST 1999
----------------------------------------------
-Re-wrote dependency code, and fixed the "no upalerts with dependencies" bug.
-list opstatus output now includes a new variable called "depstatus"
-Documented the "alertafter" behavior if only 1 argument is supplied.
-Fixed a bug in the arg processing of tcp.monitor, submitted by Phillip Pollard.
-Disabling hosts which do not exist now produces an error
-Giving an invalid disable command now produces an error
-Added "list deps" command.
-If config file ends in .m4, process it with m4.
-monshow now shows --deps
-trap.alert uses opstatus found in MON_OPSTATUS or -o, and correctly reports it using "spc=".
-fixed problem where ack'ing a non-existent service is not complained about, reported by bem@cmc.net.
-"use strict"-ified the server.
-monshow now does CGI && command-line, opstatus.cgi is deprecated see etc/example.monshowrc
-ldap.monitor now uses Net::LDAP
-summary output of successes is saved in _last_summary
-client output is hex-escaped, and received traps are un-escaped. install the Mon-0.6 perl client for this to work properly, since it includes the appropriate changes.
-renamed "reset" function. This was a BIG booboo and it was causing a core dump once in a while. "reset" is a perl built-in, which I didn't realize :(
-tags in traps are unquoted and unescaped in handle_trap. Mon::Client was changed to quote and escape all of them.
-added "numalerts" per-period variable and documented it. it controls the number of alerts sent for a failure
-added "comp_alerts" per-period variable and documented it. this var stops upalerts from being sent w/o a complementary "down" alert
-it is now possible to specify the binding address for server and trap ports. see the man page for details.
-fixed some signal handling and terminal input in moncmd.
 patch provided by djw@metatel.com
-patch from doke@conectiv-comm.com to correct error reporting in msql-mysql.monitor
-long lines may be continued by trailing them with a backslash. read the man page for more info.
-added "alerts_sent" to opstatus output


Changes between mon-0.38.13 and mon-0.38.14
Mon Aug 23 10:48:42 PDT 1999
----------------------------------------------
-Some clarification in INSTALL procedure.
-Removed old patch that attempted to fix the "no upalerts with deps" problem.
-Added recursion limit for deps, and the "dep_recur_limit" config parameter in the config file.
-Some changes to "monitor .* ;;" parsing behavior.
-telnet.monitor now uses Net::Telnet, which is more efficient than forking a copy of tcp_scan.
-freespace.monitor uses the newly renamed Filesys::DiskSpace, which used to be File::Df.
-added asyncreboot.monitor, which uses the UCD SNMP asynchronous API to get the uptime of a bunch of devices in parallel, similar to fping. This requires ucd-snmp-3.6.3 or greater and SNMP-1.8 or greater.
-Ditch stderr in fping.monitor, submitted by felicity@kluge.net
-ftp.monitor now sends "quit\r\n"
-Dependency bug fixed re: $dlastChecked, reported by felicity@kluge.net
-Commented out some spurious output in dns.monitor, as submitted by brad@shub-internet.org
-Tiny fix to mon.cgi from Matthew Price
-Fix to trap.alert to make it actually work w/o complaining about "undefined type".
-Fix to opstatus.cgi for refresh, submitted by howie@thingy.com, bug ID 16.
-Patch from Petter Reinholdtsen to add debug output to nntp.monitor, and -g to specify the newsgroup to test.
-Re-wrote tcp.monitor to not require tcp_scan. No more dependencies on the "Satan" software, since fping is available separately.
-Virtual host support in http.monitor, submitted by Neale Pickett


Changes between mon-0.38.12 and mon-0.38.13
Sun Jun 13 11:18:16 PDT 1999
---------------------------------------------
-Monitors and alerts are now passed ENV variables MON_STATEDIR and MON_LOGDIR.
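
 A monitor or alert can pick these directories up from its environment. The sketch below is a hypothetical Python helper, not code shipped with mon; only the variable name MON_STATEDIR comes from the entry above, and the fallback directory and example path are assumptions.

```python
import os


def state_file_path(monitor_name, environ=None, fallback_dir="."):
    """Build the path of a per-monitor state file under MON_STATEDIR,
    falling back to `fallback_dir` when the variable is not set."""
    env = os.environ if environ is None else environ
    return os.path.join(env.get("MON_STATEDIR", fallback_dir), monitor_name)


# Example with an explicit environment mapping (the path is made up).
path = state_file_path("reboot.monitor", {"MON_STATEDIR": "/var/lib/mon/state"})
```

 Passing the environment in as a mapping keeps the helper easy to exercise outside of the server.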
-Fixes and tuning to opstatus.cgi.
-monstatus has been removed. Replacement is monshow.
-util/cf-to-hosts accepts -M flag to pre-process with m4.
-Fixed some monshow output when service has not yet been tested.
-Some adjustments to the monshow man page.
-Forked monitors now close server sockets before execing the monitor. Bug ID 16 submitted by james9394@hotmail.com.
-Bug re: "time" file in output of monshow.
-Some minor code cleanups.
-ping.monitor now recognizes netbsd.
-mon.cgi uses Mon::Client, but not all the functionality has been converted to this interface, namely the "disable" and "reset" features.


Changes between mon-0.38.11 and mon-0.38.12
---------------------------------------------
-Fixed "list descriptions" bug submitted by Vad Adamluk
-Added "last_check" and "monitor" output to client list opstatus. opstatus.cgi uses this. Only works for 0.38.* protocol.
-opstatus.cgi now uses Mon::Client, and some bug fixes and enhancements.
-Removed "bind" from ftp.monitor http.monitor http_t.monitor imap.monitor nntp.monitor pop3.monitor smtp.monitor. It was unnecessary.


Changes between mon-0.38.10 and mon-0.38.11
---------------------------------------------
Another small (but substantial) bug fix in call_alert which would prevent alerts from being called if $args{"args"} was passed as an undefined value.


Changes between mon-0.38.9 and mon-0.38.10
---------------------------------------------
-Fixed a bug where call_alert didn't set _last_alert correctly, thus causing things like alertevery to not work properly.
-Small bug fix in handle_trap_timeout
-Removed some debugging junk for dtlogging
-A few code cleanups here and there
-Fixed @groupargs problem in call_alert


Changes between mon-0.38.8 and mon-0.38.9
---------------------------------------------
-Removed %var% substitution in favor of -M, which pre-processes the config file with m4. Macro expansion should be handled by software whose sole purpose is to perform macro expansion, hence m4.
-Added an "example.m4" in the etc/ directory.
-Added "fail" trap.
-Pass _op_status value to alerts via env variable MON_OPSTATUS.
-Updated file.alert to log MON_OPSTATUS.
-Fixed bug in client buffer handling where a blank line submitted by the client would prevent all future commands from being processed.
-The server no longer disconnects the client on an invalid command.
-Added "--disabled" and "--state" commands to monshow. Showing disabled hosts is no longer the default. The defaults can be set in ~/.monshowrc. This requires the latest Perl module (Mon-0.4). Also added "--old" option.
-Added man page for monshow.
-Updated some docs in mon.1
-Don't complain if userfile does not exist and the authtype is not userfile.
-Patched in Gilles' historicfile stuff, and documented it in mon.1, and fixed some bugs.
-Alerts are no longer called with -l parameter. It's never been documented, and no alerts use it, so I'm ditching it.
-version command returns a value like "0.38.9" rather than a float.
-Separated alert calling function from the function which determines if an alert should be called.
-Alerts are now forked with a separate environment than the parent.
-"test alert|upalert|startupalert" client command added, which will immediately call an alert for testing purposes. Updated the docs for moncmd to reflect this command.


Changes between mon-0.38pre7 and mon-0.38.8
---------------------------------------------
-mon is now kept under CVS control (exclusively to maintain my own personal sanity). The Perl module is distributed as a separate file now, so that it can find its home in the CPAN module directory.
-Documented "traptimeout" and "trapduration", and cleaned up some docs in mon.1.
-Included upalerts and startupalerts in gen_scriptdir_hash()
-Lots of code cleanups in read_cf.
-alertafter now has two forms, one just like before, and one with a single integer argument which alerts after some number of consecutive failures.
-I should have done this long ago.
 %watch now looks like this: $watch{$group}->{$service} instead of $watch{$group}[$service] and $service is the text of the service, not an integer.
-Lots of code cleanups regarding global variables which are altered by read_cf.
-Fixed "list successes" and "list failures" command.
-Added "clear timers" command which clears the timers for things like alertafter and alertevery and such.
-netappfree.monitor has some MIB reading changes which fixes the core dumping problem.
-Added set_op_status.
-Removed some debug cruft from check_depend.
-Fix to $fhandles{"$group/$service"}.
-Updated "-h" output to be accurate.
-Test -f to see if an alert or monitor exists before trying to exec it.
-gilles reported a problem with the servertime output, which was fixed.
-"interval" initialization was supplying a default interval, which isn't cool because it didn't allow you to have a service w/o an interval for use as a trap sink. The new default is undef.
-I started work on muxpect, which is sort of a combination of the mux capabilities of fping and doing Expect-style chat sequences over TCP sockets. It is meant to replace those millions of TCP-based monitors in the mon.d directory with a less CPU-intensive version.
-Some alert decision code moved from proc_cleanup to do_alert where it belongs.
-Changed some trap code.


Changes between mon-0.38pre6 and mon-0.38pre7
---------------------------------------------
-Added "basedir=" and -b, and "cfbasedir=" and -B
-use usleep.
-Added startupalerts which are called upon startup.
-alerts called with env variable MON_ALERTTYPE
-logdir, added downtime logging via dtlogging/dtlogfile
-Periods can now be specified using a LABEL: tag (similar to labeling blocks and loops in Perl). This allows multiple periods with the same period value. This feature is useful because the "alertafter" and "alertevery" counters are kept on a per-period basis.
-Fixed process.monitor to use the new values for the process table in the UCD MIB.
-Fixed a problem with reload and path/file expansion.
-Alerts are now called with MON_RETVAL set to the exit value of the monitor.
-Added trap.alert. Not quite documented.
-Added version command to Mon::Client, thanks to nagel@intelenet.net.


Changes between mon-0.38pre5 and mon-0.38pre6
---------------------------------------------
-Some small adjustments to fping.monitor.
-SNMP trap reception is now part of the I/O loop.
-Began work on handle_snmp_trap, and got rid of SNMP-related junk in handle_trap.
-Fixed problem with whitespace and monitors ending in ";;" reported by llee@stevens-tech.edu.
-mon now has an officially assigned port number from the IANA. It is 2583, and the appropriate adjustments have been made to the clients.
-Fixed sock_write in server to handle EAGAIN condition when kernel socket buffers fill up.
-Added dialin.monitor which checks to see if dial-in modem lines are operational. It requires the Perl Expect module. Documentation is in doc/README.monitors.
-Added an incomplete na_quota.monitor which is meant to monitor Network Appliance quota trees.


Changes between mon-0.38pre4 and mon-0.38pre5
---------------------------------------------
-Fixed bug #3, problem with %alias
-Fixed bug #4, problem with unpacking a socket which wasn't really a socket yet (out of order assignments)
-Renamed Client to Mon-0.01 to follow the Perl module naming convention better, and to make room for things like logging modules and such.
-Implemented more protocol commands to Mon::Client. Only 4 left...
-Adjusted nntp.monitor to allow for some protocol / implementation inconsistencies. The commands now strictly follow RFC977.
-Fixed problem with 0.38 protocol and Mon::Client.
-Added multiple authentication types, including getpwnam, shadow, and userfile. Read the man page for details.
-Added "version" client command to identify the protocol version.
-Added host && user authentication to traps. Configuration is done in auth.cf. No documentation yet.
-Added simple downtime logging, and documented it in mon.1.
-Tiny change to reboot.monitor.
-Added Mon::SNMP module to decode SNMP traps.
-Added pod to Mon::Client. I think it took as long to code it as it did to document it.


Changes between mon-0.38pre3 and mon-0.38pre4
---------------------------------------------
-Added fixes from Chris Adams that correct some $ALERTDIR and monitor argument problems.
-Fixes to monstatus from brian moore.
-Another fix to get the "exit=n" stuff working with alerts again, broken because of ALERTHASH code.
-Wrote "monshow" in the clients directory, which is a per-user configurable command-line client.
-Mon::Client perl module included to help simplify writing clients. It doesn't implement a number of commands yet. Look at the end of Client.pm to see which commands have been implemented and which have not been. "monshow" is in the clients directory, and it is an example of how to use the Mon::Client module. Mon::Client also needs POD documentation.


Changes between mon-0.38pre2 and mon-0.38pre3
---------------------------------------------
-Added "ack" client command, which will acknowledge a service failure and suppress all further alerts for that service while it continues to fail. See the moncmd man page for details. You can "ack" with a string of text.
-alertdir and scriptdir can now contain multiple colon-separated paths. This feature is useful for keeping site-specific monitors and alerts in their own directory which is separate from the monitors which are distributed with mon itself. Updated the docs for this. A hash is generated after each time the configuration is read which holds the location of where each monitor and alert script can be found. Errors are reported via syslog, so pay attention to them.
-Some "alias" code tweaks. Gilles, does it work??? If no, send the patch.
-Poked a little with the trap code.
 The trap format now contains a "spc" tag which specifies the specific type of trap, like maybe SNMPv1 or SNMPv2 or "mon 0.38".
-An update to rpc.monitor to let it build under Solaris. It can now also check to see if an arbitrary RPC program number is registered. Documentation updates.


Changes between mon-0.38pre1 and mon-0.38pre2
---------------------------------------------
-Some fixes from brian moore to correct client hangups
-netappfree.monitor changes, including --list option to list the filesystems on the filers for help in building a config file.
-Trap handling changes, including packet format. More provisions for direct SNMP handling. I might add direct provisions for mon to take SNMP traps directly. UCD SNMP trap handling callback mechanism doesn't fit into mon very well.
-"list opstatus" output is now different
-Time::HiRes is now required. The trick is that handle_io() wants to spend $SLEEPINT handling I/O from clients. Some OSs allow select(2) to return the time remaining, which we want because if select returned in say, 0.2 seconds then we want to call select with a timeout of 0.8 seconds so that we get the full second of waiting for I/O. Some OSs do *not* return the time remaining from a select call, and time(2) doesn't return sub-second resolution, so we need gettimeofday(2) to figure out how long select spent waiting. I guess the whole point here is to try to handle traps as soon as they come in.
-Fixed @last_failures discrepancy with traps.
-Added Gilles' alias record stuff to config file
-Included Jon Meek's up_rtt.monitor which checks the availability of hosts and logs some statistics, like min/mean/max round trip times. Requires Time::HiRes and Statistics::Descriptive.


Changes between mon-0.37l and mon-0.38
---------------------------------------
-Asynchronous trap handling. A remote agent may deliver a trap to trigger some action to be performed by a centralized mon server.
-Client I/O entirely re-written to support multiple simultaneous non-blocking clients.
-New client commands: test, set opstatus, list descriptions
-Descriptions are now allowed in service definitions
-Added Gilles' my-mon.cgi web interface.
-Added Jing Tan's dependency code
-When a service comes back up, it resets _first_failure so that alertevery does the right thing.
-When handling a "term" from the client, kill -15 children instead of -9.
-A fix from brian moore which corrected the client timeouts.
-Added "servertime" client command.
-Fixed moncmd to be more batch-friendly.
-Some security patches to mon.cgi from Roderick Schertler, including changes to mon and the documentation for Debian compatibility.
-Added "reload auth" command, which reloads the authentication table.
-Added per-service environment variable passing to monitor and alert scripts.
-Fixed '"no summary" with upalerts' problem reported by Eric Buda. The output of successful monitors could be lost under certain circumstances.
-Fixed a small problem with upalerts reported by Josh Wilmes. Upalerts would be triggered for everything the first time mon is started.
-process.monitor may optionally not load the MIBs upon startup.
-"-A" option would not make itself relative to the directory that mon was started from.
-netpage.alert now calls sendmail rather than "mail -s". Another fix from Josh Wilmes.
-A trivial tweak to nntp.monitor.
-Fixed problem with hostgroups named with periods reported by several people. This would cause a monitor process to not ever get cleaned up.
-Changed how load_auth handles errors
-fping.monitor adds a newline (right after removing it with tr :)
-changed the debug behavior to allow multiple debug levels


Changes between mon-0.37k and mon-0.37l
---------------------------------------
-Config parser change from Michael Griffith that complains when "alertafter" will never trigger an alert.
-Added "savestate" and "loadstate".
 Currently these only save and load the state of things disabled.
-The server now can authenticate clients using a simple configuration file which can restrict certain users to using only some (or all) commands. "moncmd" was updated to support this feature.
-Addition of "upalerts" which may be called when a service changes state from failure to success. "upalerts" can be controlled by the "upalertafter" parameter.
-"alertevery" now ignores detailed output when it decides whether or not to send an alert. Patch submitted by brian moore.
-"hostgroup and hyphen" patch. This simple patch will allow hyphens and periods in hostgroup tags.
-Multiline output fixes in smtp.monitor
-Now monitors are not called when no host arguments are supplied. This can be overridden with the per-service "allow_empty_group" option.
-A fix to ftp.monitor by Tiago Severina which allows for multiple 220 lines in the greeting from the FTP server.
-Added snpp.alert, contributed by Mike Dorman. This requires the SNPP Perl module.
-Added ldap.monitor, contributed by David Eckelkamp. This requires the Net::LDAPapi module.
-Added dns.monitor, contributed by David Eckelkamp. This requires the Net::DNS module.
-Monitor definitions can now include shell-like quoted words, as defined by the Text::ParseWords module (included with perl5). e.g.:
 monitor something.monitor -f "this is an argument" -a arg
-"allow_empty_group" is a new per-service option. If set, monitors will still be run even if all hosts in the applicable hostgroup have been disabled. The default is that allow_empty_group is not set.
-Monitors are now forked with stdin connected to /dev/null.
-Added "stop" and "start" commands which make the server cease or resume scheduling any monitors. While stopped, clients can still be handled. The server may be started[sic] in "stopped" mode with -S. There is now a "reset stopped", which is an atomic version of "reset" and "stop".
 This is useful if you want to re-disable things immediately after a reset, so there will be no race conditions after the reset and before you disable things. opstatus.cgi now also reports the state of the scheduler.
-Updated documentation for monitors, the main "mon" manual, and the "moncmd" manual.
-Fixed a few problems in handle_client that had to do with shutting the server down.


Changes between mon-0.37j and mon-0.37k
---------------------------------------
-ftp.monitor defaults to the SMTP port instead of FTP! Thanks to ryde@tripnet.se for pointing this out :)
-alanr@bell-labs.com added "-u" flag to http.monitor so that you can specify the URL to get.
-Added hpnp monitor, which uses SNMP to query your HP JetDirect boards in your printers, and warns you when things go awry. For example, if there is a paper jam, mon can send out email telling you exactly that, and it includes in the mail the current readout on the printer's LCD.
-Added netappfree.monitor, which uses SNMP to get the free space from Network Appliance filers. Uses a configuration file to set low-watermarks for each filer.
-Added process.monitor (thanks to Brian Moore), which queries the UCD SNMP agents to determine if there are errors with particular processes on a machine. This is very useful for monitoring those processes which seem to die off on occasion :)


Changes between mon-0.37i and mon-0.37j
---------------------------------------
Tue Apr 14 19:22:13 PDT 1998
-Configuration parser now dies when a watch is "accidentally" re-defined.
-Added process throttle to prevent the number of forked processes from going beyond a given value. This is a paranoia "safety net" setting.


Changes between mon-0.37h and mon-0.37i
---------------------------------------
Sun Apr 5 13:59:07 PDT 1998
-Added "randstart" and "randskew" parameters that can help balance out the load from services which are scheduled at the same interval.
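
 One way to picture how "randstart" and "randskew" spread out services that share an interval is sketched below. This is an assumption-laden illustration in Python, not the server's Perl scheduler: the function names are hypothetical, and the exact semantics (skewing each interval by plus or minus randskew seconds, and picking the first run at a random point within randstart seconds) are assumed for the example.

```python
import random


def next_delay(interval, randskew=0, rng=random):
    """Seconds until the next run: the interval shifted by a random
    amount in [-randskew, +randskew], never below zero."""
    if randskew <= 0:
        return interval
    return max(0.0, interval + rng.uniform(-randskew, randskew))


def first_delay(randstart, rng=random):
    """Seconds until the first run after startup, spread at random
    over the randstart window."""
    return rng.uniform(0, randstart)


rng = random.Random(1)  # seeded only to make the example repeatable
skewed = next_delay(300, randskew=30, rng=rng)
start = first_delay(60, rng=rng)
```

 Two services with the same 300 second interval then drift apart instead of always forking their monitors in the same instant.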
-Added "exit=range" argument to "alert", which allows triggering alerts based on the exit status of a monitor script.

-Added an IMAP monitor, and an SNMP "reboot" monitor.

-Added http_t.monitor, which times HTTP transactions.

-Merged in patches supplied by Roderick Schertler

    - Changes to mon:

        - Support a pid file. This is necessary for the system's daemon control script (which stops and starts the daemon, plus tells it to reload its configuration) to work.

        - Treat SIGINT like SIGTERM (for interactive debugging).

        - Allow a `hostgroup' line in mon.cf which doesn't have any host names (useful for putting each host name on a line by itself).

        - Add `d' (meaning `days') to the list of suffixes accepted by the interval and alertevery keywords.

        - Squelch extra blank line output by alerthist and failurehist commands if there are no corresponding history entries.

        - Bug fix: fork() returns undef, not -1, on error.

        - Set umask 022, not 0.

    - Changes to mon.cgi:

        - Set -T mode.

        - Allow all local info fields to be blank, and set them that way by default.

        - Use the same default mon host as the other interfaces.

        - Use $ENV{SCRIPT_NAME} as the default $url.

        - Don't hardcode the path to mon, assume it is in the path.

        - Vet the name passed to the `list group' command. The old code would allow remote users to run arbitrary local commands.

    - Changes to opstatus.cgi:

        - Set -T mode.

        - Correct port, was 32768 should be 32777.

        - Add missing Content-Type to html_die().

    - In monstatus correct the my() line in populate_group(), and add missing $group initialization.

    - Tweak typesetting in the mon.1 and moncmd.1 man pages.


Changes between mon-0.37g and mon-0.37h
---------------------------------------
Mon Jan 19 07:22:14 PST 1998

-I didn't merge back in a change to fping.monitor which sorts the output of fping, thus causing alerts to go off unnecessarily when fping would return hosts in a different order each time it is run.
An alert is sent once every "alertevery" interval, unless the output changes. This is where it messed things up.

-Added GPL header to all source files.


Changes between mon-0.37f and mon-0.37g
----------------------------------------
Sat Jan 10 10:40:26 PST 1998

-Fixed memory leak, with the help of Martin Laubach and Ulrich Pfeifer. The Perl 5.004_04 IPC::Open2 routine has a leak in it.

-Now includes the SkyTel 2-way pager interface for mon! What a hack, but it works pretty well!

-Also includes Art Chan's interactive web interface. It has buttons and graphics and all that other stuff that everyone wants!

-Removed the Perl 5.003 Sys::Syslog patches. I don't want to encourage anyone to use an outdated version of Perl, especially since there have been plenty of bug fixes since then.

-Server now handles multiple commands per client connection, and opstatus.cgi has been changed to take advantage of this. It's much faster now.


Changes between mon-0.37e and mon-0.37f
----------------------------------------
Fri Oct 3 06:14:50 PDT 1997

-Fixed a small typo in "mon.d/freespace.monitor": it correctly detected a failure condition for low disk space, but the text that it reported was incorrect.

-As per Sean Robinson's suggestions, renamed the syslog patches to Perl 5.004 to accurately reflect what versions of Perl they patch.

-In "mon.d/http.monitor", fixed problem with what matches as a valid HTTP response. Matching on "200 OK" is incorrect, because the text that follows the 200 is undefined in the specs.

mon-1.2.0/COPYING

$Id: COPYING,v 1.1.1.1 2004/06/09 05:18:03 trockij Exp $

GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. 
If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. 
You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. 
If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. 
(This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. 
Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. 
Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. 
Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. 
Copyright (C) 19yy This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) 19yy name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. 
If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License.

mon-1.2.0/TODO

-implement trap delivery for "redistribute" in the mon server itself as an option. retain the "call script" behavior, but maybe specify internal trap delivery via "redistribute -h hostname [hostname...]". also allow multiple redistribute lines to build a list of scripts to call

-deliver traps with acknowledgement via tcp

-add protocol commands to dump entire status + configuration in one operation to reduce latency (not so many serialized get/response operations just to get status)

-no alerts for n mins

-better cookbook of examples, including some pre-fab m4 defines for templates with focus on the ability to quickly configure mon out-of-the-box for the most common setups

-period "templates"

> like I have to repeat my period definitions all 260 times, one for
> each watch. we should have templates in the Mon config file for any
> kind of object so it can be reused.

so do you mean a way to define a "template" for a period so that you don't need to keep rewriting "wd {Sun-Sat}", or so that it'll use some default period if you don't specify one, or what? i can see this working a bunch of different ways. like this?
    define period-template xyz
        period wd {Sun-Sat}
            alert mail.alert mis@domain.com
            alert page.alert mis-pagers@domain.com
            alertevery 1h

    watch something
        service something
            period template(xyz)

    watch somethingelse
        service something
            period template(xyz)
                # override the 1h
                alertevery 2h

-my recent thoughts on config management are that the parsing should be all modularized (keeping the config parsing code in a separate perl module to be reused by other apps), and there should be a way to turn the resulting data structure into xml and import the same back, not so you can write your config by hand in xml, but so you can use some generic xml editing tool to mess around with the config, to get one type of gui.

-the most common things should be easiest to do, regardless of a gui or text file config. that is what makes stuff "easy". however, i don't think more complicated setups lend themselves to guis as much, and in complicated setups you have to invest a lot of time to learn how the tool works, and a fancy gui in that case is less of a payoff. this is for configuration, i mean. fancy guis for reporting and stuff are good, no doubt.

-global alert definitions with their own squelches (alertevery, etc.)

> also, alarms need to be collated so pagers and cell phones don't get
> buried with large numbers of alerts. I have a custom solution that I
> wrote for this, but it's a lousy solution since it essentially implements
> its own paging system.

i could see how it would be good to be able to define some alert destinations *outside* of the period definitions, then refer to them in the period definitions, then you can do "collation" that way. like this:

    define global-alert xyz
        mail.alert xyz@lmnop.com
        alertevery 1h

    watch
        service
            period
                globalalert xyz                   <---collated globally

    watch
        service
            period
                globalalert xyz                   <---collated globally
                alert mail.alert pdq@lmnop.com    <---not collated

that would be quite easy to do and i think very useful.
you could apply all the same squelch knobs (alertevery, etc.) to the global ones.

-----

(from mon-1.2.0)

$Id: TODO,v 1.2.2.1 2007/06/27 11:51:17 trockij Exp $

-add a short "radius howto" to the doc/ directory.

-make traps authenticate via the same scheme used to obscure the password in RADIUS packets

-descriptions defined in mon.cf should be 'quoted'

-document command section and trap section in authfile

-finish support for receiving snmp traps

-output to client should be buffered and incorporated into the I/O loop. There is the danger that a sock_write to a client will block the server.

-finish muxpect

-make "chainable" alerts ?? i don't recall who asked for this or how it would work

-make alerts nonblocking, and handle them in a similar fashion to monitors. i.e., serialize per-service (or per-period) alerts.

-document "clear" client command

-Document trap authentication.

-Document traps.

-Make monitors parallelize their tasks, similar to fping.monitor. This is an important scalability problem.

-re-vamp the host disabling.

    1) store them in a table with a timeout on each so that they can automatically re-enable themselves so people don't forget to re-enable them manually.

    2) don't do the disabling by "commenting" them out of the host groups. We still want them to be tested for failure, but just disable alerts that have to do with the disabled hosts.

    When a host is commented out, accept a "reason" field that is later accessible so that you can tell why someone disabled the host.

-allow checking a service at a particular time of day, maybe using inPeriod.

-maybe make a command that will disable an alert for a certain amount of time

-make it possible to disable just one of multiple alarms in a service

-make a logging facility which forks and execs external logging daemons and writes to them via some ipc such as unix domain socket. mon should be sure that one of each type of these loggers is running at all times.
configure the logging either globally or for each service. write both the success and failure status to the log in some "list opstatus" type format. each logger can do as it wishes with the data (e.g. stuff it into rrdtool, mysql, cat it to a file, etc.)

    # global setting
    logger = file

    watch stuff
        service http
            logger file -p _LOGDIR_
            ...
        service fping
            # this will use the global logger setting
            ...
        service
            # this will override the global logger setting
            logger none
            ...

    common options to logger:
        -d dir      path to logging dir
        -f file     name of log file
        -g, -s      group, service

-----------

notes on a v2 protocol redesign from trockij

- Configuring on a hostgroup scheme works very well. In the beginning, mon was never intended to get this complex(tm); it was intended to be a tool where it was easy to whip up custom monitoring scripts and alert scripts and plug them into a framework which allowed them all to connect to each other, and to have a way to easily build custom clients and report generators as well.

- However, per host status is needed now.

- This requires changes to both mon itself and also the monitors / alerts. Backward compatibility is important, and KISS is very important to retain the ease with which one can whip up a new monitor or alert or reporting client.

- There will be a new protocol for communicating with the monitors / alerts, which will be masked by a Mon::Monitor / Mon::Alert module in Perl. Appropriate shell functions will be provided by the first one who asks. See below for the protocol.

- We still want to retain the benefits of the old behaviour, but extend some alert management features, such as the ability to liberate alert definitions from the service periods so they can be used globally.

- The server code might be broken up into multiple files (I/O routines, config parser, related parts, etc)

- monitors can communicate better with the alerts (see below).
For example, the monitor might hint (using "a_mail_list") the mail.alert about where else to send a warning that a user dir goes over quota. (Attention should be paid to privacy so that we don't accidentally inform all users that /home/foo/how-i-will-destroy-western-civilization/ is consuming 1GB too much space ;)

- Associations: these allow monitors to communicate details about failures back to the server which can be used to specify who to alert. The associations are based on key/value pairs specified in the association config file, and are expanded on the alert command line (or possibly within the alert protocol) if "@assoc-*" is in the configuration. If a host assoc. is needed, an alert spec will look like:

      alert mail.alert admin@xyz.com @assoc-host

  There are two association types (possibly more in the future): host associations, and user-defined associations. Host associations use the "assoc-host" specifier, and map one or more usernames to an individual host. User-defined associations are just that, and begin with the "assoc-u-" specifier.

  Monitors return associations via the "assoc-*" key in the monitor protocol. Alerts obtain association information either via command-line arguments which were expanded by the server from "@assoc-*" in the config file, or via the "assoc-*" key in the alert protocol.

- Metrics are only passed to the mon server for "monitoring" purposes, but can be marked up in such a way that they could be easily piped to a logging utility, one which is not part of the mon process itself. monitors are _encouraged_ to collect and report performance data. "Failures" are basically just a conclusion based upon performance data and it makes no sense to collect the data twice, e.g. if you have mon polling ifInOctets.0 on a system, why should mrtg have to poll on its own.
It may be desirable to propose a "unified logging system" which all monitors can easily use, something which is pluggable and extensible.

- The hostgroup syntax is going to be extended to add per host options. (which will be passed to the monitors / alerts using the new protocol)

      ns1.teuto.net( fs="/(80%,90%)",mail_list="lmb@teuto.net" )

  would be passed as "h_fs=/(80%,90%)" and "h_mail_list=lmb@teuto.net"


FLOATING MONITORS

A floating monitor is started by mon and remains running for the entire time. If it dies, it is automatically restarted. The server forks off a separate process for fping and communicates with it via some IPC, like a named pipe or a socket or something. The floating monitor sits there waiting for a message from the server that says "start checking now". The server then adds this descriptor to %fhandles and %running and treats it similar to other forked monitors. When the floating monitor is done, it spits its output back to the server and then goes dormant again, awaiting another message from the server. Floating monitors are started when mon starts, and are restarted if mon notices that they go away. This is a way to save on fork() overhead, but to also


PROTOCOL

The protocol will be simple and ASCII based, in the form of "key=value". Line continuation will be provided by prefixing following lines with a ">". A "\n" on a line by itself indicates the start of a new block. The order of the keys should not be important.

The first block will always contain metadata further defining the following blocks. The "version" key is always present. The current protocol version is "1".

(In the examples, everything after a "#" is a comment and should be cut out)


KEY CONVENTIONS

Keys only private to monitors will be prefixed with an "m_". In the same vein, keys private to alerts will be prefixed with an "a_", and additional host option keys specified in the mon.cf file will be prefixed with an "h_" before being passed to monitors/alerts.
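
As a rough illustration of the block format described under PROTOCOL above (key=value lines, ">" continuation lines, an empty line between blocks), a reader for it might look like the sketch below. This is only a sketch under stated assumptions and not part of mon: the names mon_kv, mon_block, mon_parse and mon_get and the size limits are made up for the example, a continuation line is joined to the previous value with a newline, and the "#" comments used in the examples are not stripped.

```c
/* Sketch only: mon_kv, mon_block, mon_parse and mon_get are hypothetical
 * names and the size limits are arbitrary; none of this is part of mon. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MON_MAX_PAIRS 16
#define MON_MAX_KEY   64
#define MON_MAX_VAL   512

struct mon_kv    { char key[MON_MAX_KEY]; char val[MON_MAX_VAL]; };
struct mon_block { struct mon_kv kv[MON_MAX_PAIRS]; int n; };

/* Append at most len bytes of src to the NUL-terminated dst (capacity cap),
 * silently truncating when dst is full. */
static void mon_append(char *dst, size_t cap, const char *src, size_t len)
{
    size_t used = strlen(dst);

    if (used + 1 >= cap)
        return;
    if (len > cap - 1 - used)
        len = cap - 1 - used;
    memcpy(dst + used, src, len);
    dst[used + len] = '\0';
}

/* Split text into blocks of key=value pairs; returns the number of blocks
 * filled in. Lines without "=" and pairs beyond MON_MAX_PAIRS are skipped. */
int mon_parse(const char *text, struct mon_block *blocks, int maxblocks)
{
    const char *p = text;
    int nb = 0;                         /* index of the block being filled */

    assert(text != NULL && blocks != NULL);
    if (maxblocks <= 0)
        return 0;
    blocks[0].n = 0;

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        struct mon_block *b = &blocks[nb];

        if (len == 0) {                 /* empty line: start a new block */
            if (b->n > 0) {
                if (nb + 1 >= maxblocks)
                    return nb + 1;
                blocks[++nb].n = 0;
            }
        } else if (p[0] == '>') {       /* continuation of the previous value */
            if (b->n > 0) {
                char *val = b->kv[b->n - 1].val;
                mon_append(val, MON_MAX_VAL, "\n", 1);
                mon_append(val, MON_MAX_VAL, p + 1, len - 1);
            }
        } else {                        /* ordinary key=value line */
            const char *eq = memchr(p, '=', len);
            if (eq != NULL && b->n < MON_MAX_PAIRS) {
                struct mon_kv *kv = &b->kv[b->n++];
                kv->key[0] = kv->val[0] = '\0';
                mon_append(kv->key, MON_MAX_KEY, p, (size_t)(eq - p));
                mon_append(kv->val, MON_MAX_VAL, eq + 1,
                           len - (size_t)(eq - p) - 1);
            }
        }
        if (eol == NULL)
            break;
        p = eol + 1;
    }
    return blocks[nb].n > 0 ? nb + 1 : nb;
}

/* Look up a key in one block; returns NULL when the key is absent. */
const char *mon_get(const struct mon_block *b, const char *key)
{
    int i;

    for (i = 0; i < b->n; i++)
        if (strcmp(b->kv[i].key, key) == 0)
            return b->kv[i].val;
    return NULL;
}
```

With something like this, a caller would read the first block for "version", the second for the overall hostgroup status, and every later block for one host each. Since the order of keys is not supposed to matter, lookups go by key name (mon_get) and never by position.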
By convention, flags only pertaining to a specific alert will embed that name in the key name too - i.e. keys only pertaining to "mail.alert" will start with "a_mail_".

The key/value pairs will be passed to all processes for a specific service. "h_" keys are static between invocations as they come from the mon.cf file. "m_" keys will be preserved between multiple monitor executions. "a_" keys will be passed from the monitor to the alert script.


MONITOR PROTOCOL (monitor -> mon)

The metadata block is followed by a block describing the overall hostgroup status, followed by a detailed status for each host.

The following keys are defined for the blocks:

"summary" = contains a one line short summary of the status.

"status" = up, fail, ignore

"metric_1" = an opaque floating point number which can be referenced for triggering alerts. May try to give an "operational percentage". More than one metric may be returned. (Ping rtt, packet loss, disk space etc)

"description" = longer elaborate description of the current status.

"host" = hostgroup member to which this status applies. The overall hostgroup status does not include this field.

"assoc-host" = host association

"assoc-u-*" = user-defined association

Here is an example for a hypothetical hostgroup with 2 hosts and the ping service.

###
version=1

summary=Still alive.
metric_1=50 # Packetloss
metric_2=20.23 # rtt times
description=1 out of 2 hosts still responding.
> Whatever else one might want to say about the status. It is difficult to
> come up with a good text here so I will just babble.
status=up

host=foo.bar.com
metric_1=100
metric_2=0 # 100% packet loss makes rtt measurements difficult ;)
summary=ICMP unreachable from 2.2.2.2
status=fail
description=PING 2.2.2.2 (2.2.2.2): 56 data bytes
>
>--- 2.2.2.2 ping statistics ---
>23 packets transmitted, 0 packets received, 100% packet loss

metric_1=0
metric_2=52.1
summary=ICMP echo reply received ok
status=up
description=64 bytes from 212.8.197.2: icmp_seq=0 ttl=60 time=110.0 ms
>64 bytes from 212.8.197.2: icmp_seq=1 ttl=60 time=32.3 ms
>64 bytes from 212.8.197.2: icmp_seq=2 ttl=60 time=32.8 ms
>64 bytes from 212.8.197.2: icmp_seq=3 ttl=60 time=33.4 ms
>
>--- ns1.teuto.net ping statistics ---
>4 packets transmitted, 4 packets received, 0% packet loss
>round-trip min/avg/max = 32.3/52.1/110.0 ms
host=baz.bar.com
######


Points still open:

- mon -> monitor communication
- mon <-> alert communication
- the new trap protocol
- muxpect
- a unified logging proposal
mon-1.2.0/VERSION0000644003616100016640000000043710146140374013263 0ustar trockijtrockij
$Name: mon-1-2-0-release $

$Log: VERSION,v $
Revision 1.2 2004/11/15 14:45:16 vitroth
Pulling lots of changes from the 1.0.0pre* branch into the HEAD, to prepare
to tag mon-1.1pre1

Revision 1.1.1.1.2.1 2004/06/12 18:17:57 trockij
added some rcs tags to identify the file versions

mon-1.2.0/muxpect/0000755003616100016640000000000010640450347013677 5ustar trockijtrockijmon-1.2.0/muxpect/muxpect.h0000644003616100016640000000400610061516614015532 0ustar trockijtrockij
/*
 * $Id: muxpect.h,v 1.1.1.1 2004/06/09 05:18:04 trockij Exp $
 */

/*
 * Copyright (C) 1998 Jim Trocki
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ #ifndef _MUXPECT_H #define _MUXPECT_H struct chat_sess { char expect[BUFSIZ]; char send[BUFSIZ]; struct chat_sess *next; }; typedef struct chat_sess chat_sess_t; struct muxconn { int fd; char hostname[80]; struct sockaddr_in saddr; unsigned short port; time_t timeout; char buf[8192]; int buf_offset; chat_sess_t *ch_sess; regex_t re_preg; char re_errbuf[100]; int status; char summ[80]; char detail[8192]; struct muxconn *next; }; typedef struct muxconn muxconn_t; /* status values */ #define MUXCONN_FAILURE 0 /* chat failure */ #define MUXCONN_SUCCESS 1 /* chat success */ #define MUXCONN_TIMEOUT 2 /* chat timeout */ #define MUXCONN_INPROGRESS 3 /* in progress (not complete) */ #define MUXCONN_SETUP 4 /* setup state */ int match_buffer (regex_t *, char *, char *, char *, int); struct chat_sess *read_expect (char *file); void dump_chat_sess (chat_sess_t *); void usage (void); muxconn_t * setup_muxconn_struct (char **, int, chat_sess_t *, unsigned short port, char *, int); int setup_connect (muxconn_t *, char *, int); int io_loop (muxconn_t *, int, char *, int); #endif /* _MUXPECT_H */ mon-1.2.0/muxpect/io.c0000644003616100016640000000666210061516614014461 0ustar trockijtrockij/* * $Id: io.c,v 1.1.1.1 2004/06/09 05:18:04 trockij Exp $ */ /* * Copyright (C) 1998 Jim Trocki * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */

/*
 * NOTE: the header names on the #include lines were lost in this copy
 * of the file. The list below is what the code in this file needs; it
 * is not necessarily the original list.
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <regex.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include "muxpect.h"

/*
 * create a nonblocking socket for every entry in the list and
 * start connecting. Returns 1 on success, 0 on a fatal error
 * (described in muxerr).
 */
int
setup_connect (muxconn_t *muxlist, char *muxerr, int muxerrsiz)
{
    muxconn_t *c;
    int s, fl;
    int err;
    struct protoent *pe;

    c = muxlist;
    err = 0;

    if ((pe = getprotobyname ("tcp")) == NULL)
    {
        snprintf (muxerr, muxerrsiz, "could not look up tcp proto");
        return 0;
    }

    while (c != NULL)
    {
        s = socket (AF_INET, SOCK_STREAM, pe->p_proto);

        if (s < 0)
        {
            snprintf (muxerr, muxerrsiz, "could not create socket: %s",
                strerror (errno));
            return 0;
        }

        /*
         * set nonblocking
         */
        c->fd = s;

        if ( (fl = fcntl (s, F_GETFL, 0)) < 0)
        {
            snprintf (muxerr, muxerrsiz, "could not get flags: %s",
                strerror (errno));
            return 0;
        }

        fl |= O_NONBLOCK;

        if ( (fl = fcntl (s, F_SETFL, fl)) < 0)
        {
            snprintf (muxerr, muxerrsiz, "could not set flags: %s",
                strerror (errno));
            return 0;
        }

        /*
         * connect (the address family is never set elsewhere,
         * so set it here)
         */
        c->saddr.sin_family = AF_INET;
        c->saddr.sin_port = htons (c->port);

        if (connect (s, (struct sockaddr *) &c->saddr,
                sizeof (c->saddr)) == -1 && errno != EINPROGRESS)
        {
            c->status = MUXCONN_FAILURE;
            snprintf (c->detail, sizeof (c->detail), "could not connect: %s",
                strerror (errno));
            snprintf (c->summ, sizeof (c->summ), "%s", c->hostname);
        }

        else
        {
            c->status = MUXCONN_INPROGRESS;
        }

        c = c->next;
    }

    return 1;
}


/*
 * main IO loop
 */
int
io_loop (muxconn_t *muxlist, int timeout, char *err, int errsiz)
{
    fd_set r_fd, w_fd;
    struct timeval tval, t0, t1;
    muxconn_t *c;
    int d;

    FD_ZERO (&r_fd);
    FD_ZERO (&w_fd);

    /*
     * add every connection still in progress to the fd sets and
     * track the highest descriptor. The pointer is advanced on
     * every pass, including for in-progress entries.
     */
    d = 0;
    c = muxlist;
    while (c != NULL)
    {
        if (c->status == MUXCONN_INPROGRESS)
        {
            FD_SET (c->fd, &r_fd);
            FD_SET (c->fd, &w_fd);

            if (c->fd > d)
                d = c->fd;
        }

        c = c->next;
    }

    timerclear (&tval);
    timerclear (&t0);
    timerclear (&t1);

    if (gettimeofday (&t0, NULL) == -1)
    {
        snprintf (err, errsiz, "could not gettimeofday: %s",
            strerror (errno));
        return 0;
    }

    /*
     * placeholder: the select() loop is not written yet, and
     * tval is still clear here, so this body never runs
     */
    while (timerisset (&tval))
    {
#if 0
        select ();
#endif

        if (gettimeofday (&t1, NULL) < 0)
        {
            snprintf (err, errsiz, "could not gettimeofday: %s",
                strerror (errno));
            return 0;
        }
    }

    /*
     * anything still in progress has timed out
     */
    c = muxlist;
    while (c != NULL)
    {
        if (c->status == MUXCONN_INPROGRESS)
            c->status = MUXCONN_TIMEOUT;

        c = c->next;
    }

    return 1;
}
mon-1.2.0/muxpect/README0000644003616100016640000000066510230411543014554 0ustar trockijtrockij
$Id: README,v 1.2 2005/04/17 07:42:27 trockij Exp $

muxpect is not yet complete. It is a work-in-progress.

The purpose is to have a monitor which can multiplex TCP connections
on the same port to multiple destinations and do chat-style
interaction on them all in parallel. Given this functionality,
many of the TCP-based monitors in mon.d can be replaced by this code,
making things much more efficient.

Jim Trocki
trockij@arctic.org
mon-1.2.0/muxpect/Makefile0000644003616100016640000000031510061516614015333 0ustar trockijtrockij
CC=gcc
CFLAGS=-O2 -Wall -g
OBJS = muxpect.o setup.o io.o

all: muxpect

muxpect.o : muxpect.h
setup.o : muxpect.h
io.o : muxpect.h

muxpect: $(OBJS)
	$(CC) -o $@ $(OBJS)

clean:
	rm -f *.o muxpect core
mon-1.2.0/muxpect/setup.c0000644003616100016640000001023510061516614015201 0ustar trockijtrockij
/*
 * $Id: setup.c,v 1.1.1.1 2004/06/09 05:18:04 trockij Exp $
 */

/*
 * Copyright (C) 1998 Jim Trocki
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ #include #include #include #include #include #include #include #include #include #include "muxpect.h" /* * read expect/send file */ struct chat_sess * read_expect (char *file) { FILE *f; char buf[BUFSIZ]; chat_sess_t *first, *cur, *ncur; int send_expect; int lnum; if ((f = fopen (file, "r")) == NULL) return ((chat_sess_t *)NULL); send_expect = 1; first = cur = (chat_sess_t *) calloc (1, sizeof (chat_sess_t)); if (first == NULL) return ((chat_sess_t *) NULL); first->next = (chat_sess_t *) NULL; cur->next = (chat_sess_t *) NULL; lnum = 0; while (fgets (buf, BUFSIZ, f) != NULL) { lnum++; if (strchr (buf, '#') == buf) continue; buf[strlen (buf) -1] = '\0'; if (send_expect) { strncpy (cur->expect, buf, BUFSIZ); send_expect--; } else { strncpy (cur->send, buf, BUFSIZ); send_expect++; ncur = (chat_sess_t *) calloc (1, sizeof (chat_sess_t)); if (ncur == NULL) { fclose (f); return ((chat_sess_t *) NULL); } ncur->next = (chat_sess_t *) NULL; cur->next = ncur; cur = ncur; } } cur->next = (chat_sess_t *) NULL; if (ferror (f)) { fclose (f); return ((chat_sess_t *)NULL); } fclose (f); return (first); } /* * debugging */ void dump_chat_sess (chat_sess_t *c) { chat_sess_t *p; p = c; while (p->next != NULL) { printf ("exp: %s\n", p->expect); printf ("snd: %s\n", p->send); p = p->next; } printf ("done\n"); } /* * help */ void usage (void) { printf ("usage:\n"); printf (" muxpect [-d] [-h] -f file\n"); printf (" -d debug\n"); printf (" -h show this help\n"); printf (" -f file chat script filename\n"); printf ("\n"); exit (0); } /* * create 
list of muxconns for all hosts */ muxconn_t * setup_muxconn_struct (char **hosts, int ind, chat_sess_t *sess, unsigned short port, char *errbuf, int errbufsiz) { muxconn_t *c, *l, *n; int i; struct hostent *hent; i = ind; l = (muxconn_t *)NULL; c = (muxconn_t *)NULL; while (hosts[i] != NULL) { n = calloc (1, sizeof (muxconn_t)); if (n == NULL) { strncpy (errbuf, "could not alloc memory", errbufsiz); return ((muxconn_t *) NULL); } n->next = (muxconn_t *) NULL; if (c == (muxconn_t *) NULL) c = n; else l->next = n; n->fd = 0; strncpy (n->hostname, hosts[i], sizeof (n->hostname)); n->timeout = 0; n->buf_offset = 0; n->ch_sess = sess; n->status = MUXCONN_INPROGRESS; n->next = (muxconn_t *) NULL; n->port = port; if (hosts[i][0] >= '0' && hosts[i][0] <= '9') { if (inet_aton (hosts[i], &n->saddr.sin_addr) == 0) { snprintf (errbuf, errbufsiz, "invalid IP address supplied, %s", hosts[i]); return ((muxconn_t *) NULL); } } else { if ((hent = gethostbyname (hosts[i])) == NULL) { snprintf (errbuf, errbufsiz, "could not resolve host %s", hosts[i]); return ((muxconn_t *) NULL); } memcpy (&n->saddr.sin_addr, hent->h_addr_list[0], hent->h_length); } l = n; i++; } if (i == ind) { snprintf (errbuf, errbufsiz, "no hosts supplied"); return ((muxconn_t *) NULL); } return (c); } mon-1.2.0/muxpect/muxpect.c0000644003616100016640000000536110061516614015532 0ustar trockijtrockij/* * $Id: muxpect.c,v 1.1.1.1 2004/06/09 05:18:04 trockij Exp $ */ /* * Copyright (C) 1998 Jim Trocki * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ #include #include #include #include #include #include #include #include #include #include "muxpect.h" int debug; /* * main */ int main (int argc, char **argv) { char fname[200], errbuf[100]; chat_sess_t *sess; int c, help; unsigned short port; help = 0; debug = 0; memset (fname, 0, sizeof (fname)); optind = 0; port = 0; while ((c = getopt (argc, argv, "dhf:p:")) != EOF) { switch (c) { case 'h': help++; break; case 'd': debug++; break; case 'f': strncpy (fname, optarg, sizeof(fname)); break; case 'p': port = (unsigned short) strtol (optarg, (char **)NULL, 10); break; } } if (help) usage(); if (fname[0] == '\0') usage(); /* * read config file */ sess = read_expect (fname); if (sess == NULL) { fprintf (stderr, "could not read\n"); exit (1); } if (debug) dump_chat_sess (sess); if (setup_muxconn_struct (argv, optind, sess, port, errbuf, sizeof(errbuf)) == NULL) { printf ("could not setup sessions: %s\n", errbuf); exit (-1); } exit (0); } /* * do a regex against a buffer, returning true or fals */ int match_buffer (regex_t *preg, char *pat, char *buf, char *errbuf, int errbuflen) { int r; r = regcomp (preg, pat, REG_EXTENDED | REG_ICASE | REG_NOSUB); if (r) { regerror (r, preg, errbuf, errbuflen); fprintf (stderr, "error in regcomp: %s\n", errbuf); exit (1); } r = regexec (preg, buf, 0, 0, 0); if (r == REG_NOMATCH) { fprintf (stderr, "no match\n"); } else if (r != REG_NOERROR) { regerror (r, preg, errbuf, errbuflen); fprintf (stderr, "error in regexec: %s\n", errbuf); return (-1); } printf ("match\n"); return (0); } mon-1.2.0/utils/0000755003616100016640000000000010640450347013352 5ustar trockijtrockijmon-1.2.0/utils/syslog.monitor0000755003616100016640000005045010146140377016312 0ustar trockijtrockij#!/usr/bin/perl -w # # syslog.monitor - monitors incoming syslog 
packets and reports to mon # # Author: Lars Marowsky-Brée, lars@marowsky-bree.de # # Copyright (C) 1999 Lars Marowsky-Brée # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # # ### Nothing to see below this line. ### Abandon hope all ye who enter here ### Here be dragons! ############################################################################# # Me, use modules? 
no, me never use modules package main; use strict; use Socket; use Net::hostent; use Time::HiRes qw (time alarm sleep gettimeofday); use Mon::Client; use POSIX qw(setsid strftime); # automagically inserted by CVS my $VERSION = '$Id: syslog.monitor,v 1.2 2004/11/15 14:45:19 vitroth Exp $'; ############################################################################# # Global variables # Map syslog facility numbers to names my %Num2Facility = ( 0 => 'kern', 1 => 'user', 2 => 'mail', 3 => 'daemon', 4 => 'auth', 5 => 'syslog', 6 => 'lpr', 7 => 'news', 8 => 'uucp', 9 => 'cron', 10 => 'authpriv', 11 => 'ftp', 12 => 'reserved-12', 13 => 'reserved-13', 14 => 'reserved-14', 15 => 'reserved-15', 16 => 'local0', 17 => 'local1', 18 => 'local2', 19 => 'local3', 20 => 'local4', 21 => 'local5', 22 => 'local6', 23 => 'local7', ); # Map syslog level numbers to names my %Num2Level = ( 0 => 'emerg', 1 => 'alert', 2 => 'crit', 3 => 'err', 4 => 'warn', 5 => 'notice', 6 => 'info', 7 => 'debug' ); # Contains a list of LogEntry object init params my %Checks = (); # Hash of hostgroup members hostnames, indexed by hostgroup name my %GROUP_MEMBERS; # IP -> hostname resolving my %IP2Host; # IP -> hostgroup resolving my %IP2Group; # array of references to LogEntry objects, indexed by hostname my %ChecksPerHost; # array of references to LogEntry objects per hostgroup my %ChecksPerGroup; # Global Mon::Client object my $mon; # The configuration is read into this hash my %CONF; ############################################################################# # Setup my ($conf_file) = @ARGV; if (!defined($conf_file) || $conf_file eq "") { die "No configuration file given"; } &ReadConf($conf_file); if ($CONF{'daemon_mode'} == 1) { if ($CONF{'logfile'} ne '') { &daemonize; } else { &Log(2,"You can't summon a daemon while talking to the public"); } } # We need some information from the mon server now... 
&ChatMonServer; # Parse the hosts, resolve them etc &ParseHosts; # Build the cache, precompile the checks &BuildChecks; # Open listener port my $proto = getprotobyname('udp'); socket(SOCKET, Socket::PF_INET, Socket::SOCK_DGRAM, $proto) || die "Could not create listening socket: $!"; bind(SOCKET, scalar Socket::sockaddr_in($CONF{'bind_port'}, Socket::inet_aton($CONF{'bind_ip'}))) || die "Could not bind authentication socket: $!"; # prepare to select my ($whence,$line,$rin,$rout); $rin = ''; vec($rin, fileno(SOCKET), 1) = 1; # At which time we did the last full walk of the chains my $last_full_walk = time; # Msg - contains the currently processed message # LastMsg - contains the last Msg hash, per host my (%LastMsg,%Msg); ############################################################################# LOOP: while (1) { if (!select($rout = $rin, undef, undef, $CONF{'select_timeout'})) { &Log(7,"select timeout"); next LOOP; } # Read the incoming UDP packet if (!($whence = recv(SOCKET, $line, 8192, 0) )) { &Log(3,"recv error: $!"); next LOOP; } # Parse the incoming UDP packet envelope my ($src_port,$src_ip) = sockaddr_in($whence); $src_ip = inet_ntoa($src_ip); chomp($line); &Log(7,"Received syslog message from $src_ip"); # If this IP does not resolve to a hostname, it is bogus if (!defined($IP2Host{$src_ip})) { &Log(3,"Received unauthorized message from $src_ip, ignoring"); next LOOP; } my ($level,$facility,$msg); if ($line =~ /^\<(\d+)\>([^:]+): (.*)$/o) { # Decode the message %Msg = (); $Msg{'src_port'} = $src_port; $Msg{'src_ip'} = $src_ip; $Msg{'host'} = $IP2Host{$src_ip}; $Msg{'level'} = $1 & 7; $Msg{'Level'} = $Num2Level{$1 & 7}; $Msg{'facility'} = $Num2Facility{$1 >> 3}; $Msg{'msg'} = $3; $Msg{'time'} = time; $Msg{'group'} = $IP2Group{$src_ip}; # Log the message if necessary &OwnLog(\%Msg); # Walk through the processing hooks here... 
        my $check;
        PER_HOST: foreach $check (@{$ChecksPerHost{ $Msg{'host'} }}) {
            if ($check->check(\%Msg) == 1) {
                last PER_HOST;
            }
        }

        PER_GROUP: foreach $check (@{$ChecksPerGroup{ $Msg{'group'}}}) {
            if ($check->check(\%Msg) == 1) {
                last PER_GROUP;
            }
        }

        # Store message for further reference
        %{$LastMsg{$src_ip}} = %Msg;

    } elsif ($line =~ /^last message repeated (\d+) times$/o) {
        my $count = $1;
        # Handle repetition - last msg from the host is still available
        # in %LastMsg{$src_ip}
        &Log(7,"Last message repeated $count times");
    } else {
        &Log(2,"Unknown input ignored: $line");
    }

} continue {
    # Before continuing, always check if the checks need to be run,
    # so that the low threshold can be triggered
    if (time - $last_full_walk > $CONF{'full_walk_timeout'}) {
        &Log(7,"Full walk triggered after $CONF{'full_walk_timeout'} seconds");
        # Remember when this walk happened so the next one is timed from here
        $last_full_walk = time;
        my ($check_ary);
        foreach $check_ary (@ChecksPerHost{keys %ChecksPerHost},
                            @ChecksPerGroup{keys %ChecksPerGroup} ) {
            my ($check);
            foreach $check (@$check_ary) {
                &Log(7,"Running for ".$check->{'group'}."/".$check->{'host'});
                $check->check({'level' => 7,
                               'Level' => $Num2Level{7},
                               'msg' => 'SYSLOG.MONITOR: SELECT TIMEOUT',
                               'time' => time,
                              });
            } # foreach $check
        } # foreach $check_ary
    } # if
} # continue

#############################################################################

sub BuildChecks {
    &Log(6,"Building check cache, precompiling objects");

    # First, build the per-host cache
    my ($group);
    foreach $group (keys %{$CONF{'checks-per-host'}}) {
        if (defined($GROUP_MEMBERS{$group})) {
            # Build the "per-host" checks
            my ($host);
            foreach $host (@{$GROUP_MEMBERS{$group}}) {
                &Log(6,"Building per host checks for $group/$host");
                my ($check);
                CHECK: foreach $check (@{$CONF{'on-host'}{$group}{$host}},@{$CONF{'checks-per-host'}{$group}}) {
                    if (!defined($Checks{$check})) {
                        &Log(3,"Undefined check $check for $host, ignoring");
                        next CHECK;
                    }
                    push @{$ChecksPerHost{$host}},LogEntry->new($Checks{$check},$group,$host);
                }
            }
        } else {
            &Log(3,"Unknown hostgroup $group referenced in config file");
        }
    }

    # Second, build the per-group cache
    GROUP: foreach $group (keys %{$CONF{'checks-per-group'}}) {
        if (defined($GROUP_MEMBERS{$group})) {
            &Log(6,"Building per group checks for $group");
            my $check;
            CHECK: foreach $check (@{$CONF{'checks-per-group'}{$group}}) {
                if (!defined($Checks{$check})) {
                    &Log(3,"Undefined check $check for group $group, ignoring");
                    next CHECK;
                }
                push @{$ChecksPerGroup{$group}},LogEntry->new($Checks{$check},$group,'ALL');
            }
        } else {
            &Log(3,"undefined group $group, ignoring");
            next GROUP;
        }
    }

    &Log(6,"Finished building check cache");
}

sub FormatTime {
    # Prints the time like a proper Cisco
    my ($time) = @_;

    return strftime("%b %e %H:%M:%S", localtime($time))
        .sprintf(".%03d",($time - int($time)) * 1000 );
}

# Log a message if the priority is high enough
sub Log {
    my ($prio,$msg) = @_;

    if ($prio <= $CONF{'loglevel'}) {
        my $line = &FormatTime(time).
            sprintf(": %- 6.6s: %s\n", $Num2Level{$prio},$msg);
        if ($CONF{'logfile'} ne "") {
            open(LOG,">>$CONF{'logfile'}") || die "Could not open logfile!";
            print LOG $line;
            close(LOG);
        } else {
            print $line;
        }
    }
}

sub OwnLog {
    # Log the message to the file specified in syslog.conf
    my ($r) = @_;

    return if ($CONF{'syslogfile'} eq "");

    my $f = $CONF{'syslogfile'};

    # Ok, logfile is defined.
do the substitutions my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime($$r{'time'}); $year += 1900; print $f."\n"; $f =~ s/\%H/$$r{host}/; $f =~ s/\%L/$$r{Level}/; $f =~ s/\%l/$$r{level}/; $f =~ s/\%F/$$r{facility}/; $f =~ s/\%G/$$r{group}/; $f =~ s/\%D/sprintf "%04d-%02d-%02d",$year,$mon,$mday/e; # Make sure everything is still okay $f =~ s/[^A-Za-z0-9\.\-\/]//og; open(F,">>$f"); print F &FormatTime($$r{'time'}).sprintf(" %s %s.%s: %s\n", $$r{'host'}, $$r{'facility'},$$r{'Level'},$$r{'msg'}); close(F); } sub ChatMonServer { # Setup the mon connection $mon = Mon::Client->new( host => $CONF{'mon_host'}, username => $CONF{'mon_user'}, password => $CONF{'mon_pass'}); &Log(6,"Connecting to mon host $CONF{'mon_host'}"); # Retrieve information from the mon server about hostgroups if (!defined ($mon->connect)) { &Log(2,"Could not connect to server: " . $mon->error); die; } my %opstatus; if (!(%opstatus = $mon->list_opstatus)) { &Log(2,"could not get opstatus: " . $mon->error); $mon->disconnect; die; } # We are only interested in hostgroups which have the "syslog" service # defined, and thus are able to process our traps my ($group); foreach $group (keys %opstatus) { if (defined($opstatus{$group}{'syslog'})) { my (@hosts) = $mon->list_group($group); @{$GROUP_MEMBERS{$group}} = @hosts; } } # We don't need the TCP connection anymore from here on. 
# This might change in the future if Mon::Client ever sends # traps via tcp $mon->disconnect; } # Parse the hostnames, and fill in the %IP2Host / %IP2Group sub ParseHosts { my ($group,$host); &Log(6,"Resolving hostnames and building cache"); foreach $group (keys %GROUP_MEMBERS) { HOST: foreach $host (@{$GROUP_MEMBERS{$group}}) { my $h = gethostbyname($host); if (!defined($h)) { &Log(3,"Failed to resolve $host, ignoring"); next HOST; } if (@{$h->addr_list} > 1 ) { my $addr; for $addr ( @{$h->addr_list} ) { $IP2Host{inet_ntoa($addr)} = $host; $IP2Group{inet_ntoa($addr)} = $group; } } else { $IP2Host{inet_ntoa($h->addr)} = $host; $IP2Group{inet_ntoa($h->addr)} = $group; } } } } # Send a trap to the mon server sub SendTrap { my ($l) = @_; my ($typ,$opstatus,$sum,$dtl); if ($l->{status} == 0) { $opstatus = 'ok'; $sum = $l->{host}.": ".$l->{desc}." ok since " .localtime($l->{status_time}); $dtl = "\nHappened ".scalar(@{$l->{matches}})." within " .$l->{period}."s"; } elsif ($l->{status} == -1) { $opstatus = 'fail'; $sum = $l->{host}.": ".$l->{desc}." occured too seldom since " .localtime($l->{status_time}); $dtl = "\nLast time was " .localtime($l->{last_match}); } elsif ($l->{status} == 1) { $opstatus = 'fail'; $sum = $l->{host}.": ".$l->{desc}." occured too often since " .localtime($l->{status_time}); $dtl = "\nHappened ".scalar(@{$l->{matches}})." within " .$l->{period}."s\n"; # Include copy of the line which triggered the trap $dtl .= ${$l->{'last_matched_msg'}}{'msg'}."\n"; } else { &Log(0,"BUG: Unknown status in SendTrap"); return undef; } &Log(4,"Sending trap: ".$l->{'group'}." 
$opstatus $sum");

    # Send the trap
    $mon->send_trap( group => $l->{'group'},
                     service => 'syslog',
                     retval => 1,
                     opstatus => $opstatus,
                     summary => $sum,
                     detail => $dtl) || &Log(2, "trap sending failed: ".$mon->error);
}

sub ReadConf {
    my ($conf) = @_;

    if ($conf !~ /^[a-z0-9\.\-\/]+$/oi) {
        &Log(1,"Security violation: $conf contains illegal characters");
        die;
    }

    # Setup defaults
    %CONF = (
        'select_timeout' => 10,
        'full_walk_timeout' => 30,
        'bind_ip' => '0.0.0.0',
        'bind_port' => 514,
        'logfile' => '',
        'daemon_mode' => 0,
        'syslogfile' => "",
    );

    if (!open(CONF,"<$conf")) {
        &Log(2,"Failed to open configuration file");
        die;
    }

    my ($l,$lineno);
    my $level = 'global';
    my ($CHECKNAME,$GROUPNAME);

    # Read the configuration file one line at a time
    while (defined($l = <CONF>)) {
        chomp $l;
        $l =~ s/^\s*//;
        $l =~ s/\s*$//;
        $lineno++;

        next if $l =~ /^#/;

        if ($level eq 'global') {
            if ($l =~ /^full_walk_timeout\s+(.*)$/o) {
                $CONF{'full_walk_timeout'} = &dhmstos($1);
                next;
            } elsif ($l =~ /^select_timeout\s+(.*)$/o) {
                $CONF{'select_timeout'} = &dhmstos($1);
                next;
            } elsif ($l =~ /^loglevel\s+(\d)$/o) {
                $CONF{'loglevel'} = $1;
                next;
            } elsif ($l =~ /^logfile\s+([a-z0-9\.\-\/]*)$/io) {
                $CONF{'logfile'} = $1;
                next;
            } elsif ($l =~ /^syslogfile\s+([\%a-z0-9\.\-\/]+)$/io) {
                $CONF{'syslogfile'} = $1;
                next;
            } elsif ($l =~ /^daemon_mode\s*$/o) {
                $CONF{'daemon_mode'} = 1;
                next;
            } elsif ($l =~ /^bind_ip\s+(\d+\.\d+\.\d+\.\d+)$/o) {
                $CONF{'bind_ip'} = $1;
                next;
            } elsif ($l =~ /^bind_port\s+(\d+)$/o) {
                $CONF{'bind_port'} = $1;
                next;
            } elsif ($l =~ /^mon_host\s+(\S+)$/o) {
                $CONF{'mon_host'} = $1;
                next;
            } elsif ($l =~ /^mon_user\s+(\S+)$/o) {
                $CONF{'mon_user'} = $1;
                next;
            } elsif ($l =~ /^mon_pass\s+(\S+)$/o) {
                $CONF{'mon_pass'} = $1;
                next;
            } elsif ($l =~ /^check\s+(\S+)$/o) {
                $level = 'check';
                $CHECKNAME = lc($1);
                $Checks{$CHECKNAME} = {
                    'name' => lc($1),
                    'period' => 300,
                    'min' => -1,
                    'max' => 1,
                    'final' => 0,
                    'desc' => 'I was too lazy to write a proper configuration file',
                };
                next;
            } elsif ($l =~ /^group\s+(.*)$/o) {
                $level = 'group';
                $GROUPNAME = $1;
                next;
            }
elsif ($l eq "") { next; } ################ END GLOBAL CONFIGURATION FILE OPTIONS } elsif ($level eq 'check') { if ($l =~ /^period\s+(.*)$/o) { $Checks{$CHECKNAME}{'period'} = &dhmstos($1); next; } elsif ($l =~ /^min\s+(\-?\d+)$/o) { $Checks{$CHECKNAME}{'min'} = $1; next; } elsif ($l =~ /^max\s+(\-?\d+)$/o) { $Checks{$CHECKNAME}{'max'} = $1; next; } elsif ($l =~ /^desc\s+(.*)$/o) { $Checks{$CHECKNAME}{'desc'} = $1; next; } elsif ($l =~ /^pattern\s+(.*)$/o) { $Checks{$CHECKNAME}{'pattern'} = $1; next; } elsif ($l =~ /^final\s*$/o) { $Checks{$CHECKNAME}{'final'} = 1; next; } elsif ($l eq "") { # blank line indicates end of check block $level = 'global'; $CHECKNAME = ''; next; } #### END OF "CHECK" part } elsif ($level eq 'group') { if ($l =~ /^per-host\s+(.*)$/o) { @{$CONF{'checks-per-host'}{$GROUPNAME}} = split(/\s+/,$1); next; } elsif ($l =~ /^per-group\s+(.*)$/o) { @{$CONF{'checks-per-group'}{$GROUPNAME}} = split(/\s+/,$1); next; } elsif ($l =~ /^on-host\s+(\S+)\s+(.*)$/o) { @{$CONF{'on-host'}{$GROUPNAME}{$1}} = split(/\s+/,$2); next; } elsif ($l eq "") { $level = 'global'; $GROUPNAME = ''; next; } } &Log(3,"Error while parsing configuration file, line $lineno: $l"); } } # # convert a string like "20m" into seconds # sub dhmstos { my ($str) = @_; my ($s); if ($str =~ /^\s*(\d+(?:\.\d+)?)([dhms])\s*$/i) { if ($2 eq "m") { $s = $1 * 60; } elsif ($2 eq "h") { $s = $1 * 60 * 60; } elsif ($2 eq "d") { $s = $1 * 60 * 60 * 24; } else { $s = $1; } } else { return undef; } $s; } sub daemonize { chdir '/' or die "Can't chdir to /: $!"; open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; defined(my $pid = fork) or die "Can't fork: $!"; exit if $pid; setsid or die "Can't start a new session: $!"; open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; } ############################################################################## package LogEntry; # Some of the more important stuff happens here use 
strict; use Time::HiRes qw (time alarm sleep); BEGIN { use Exporter (); use vars qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS); # set the version for version checking $VERSION = 0.01; @ISA = qw(Exporter); @EXPORT = qw(); %EXPORT_TAGS = ( ); # eg: TAG => [ qw!name1 name2! ], # your exported package globals go here, # as well as any optionally exported functions @EXPORT_OK = qw(); } use vars @EXPORT_OK; # non-exported package globals go here use vars qw(); sub new { my $proto = shift; my $class = ref($proto) || $proto; my $self = { }; bless ($self, $class); # Initialise with remaining arguments if (@_) { $self->init(@_); } return $self; } sub init (\%$$) { my ($self,$INIT,$group,$host) = @_; # We load some values from the INIT hash %{$self} = %{$INIT}; # After changing the pattern, it is sensible to reset our counters @{$self->{matches}} = (); $self->{group} = $group; $self->{host} = $host; # 0 : did not trigger # -1 : triggered because of too few matches # 1 : triggered because of too many matches $self->{status} = 0; $self->{status_time} = time; $self->{last_match} = 0; # The checkitem is a piece of code which we precompile here. my $code = 'sub { my ($r)=@_; if ('.$$INIT{'pattern'} .') { return 1; } else { return 0 } }'; &::Log(7,"Compiling: $code"); $self->{matcher} = eval $code; if ($@) { &::Log(2,"Error while compiling ".$$INIT{'name'}." ignoring"); $self->{matcher} = sub { return 0; }; } return $self->{matcher}; } sub check { my ($self,$msg) = @_; &::Log(7,"Checking ".$self->{desc}); my $code; eval { $code = &{$self->{matcher}}($msg); }; if ($@) { &::Log(2,"$self->{desc}: Fatal error while matching: $@"); return 0; } my $t = time; # Trim our data backlog while ( (scalar(@{$self->{matches}})>0) && ($t-$self->{matches}[0] > $self->{period})) { shift @{$self->{matches}} } if ($code == 1) { &::Log(7,"$self->{desc}: Matched"); # Pattern matched. Record timestamp. 
push @{$self->{matches}},$t; $self->{last_match} = $t; # Keep a copy of the last match %{$self->{last_matched_msg}} = %{$msg}; } my $count = scalar(@{$self->{matches}}); my $age = $t-$self->{status_time}; # First, we check if we matched too often. We don't check for # the age here since nothing is going to magically lower the match # counter. if (($count > $self->{max})) { &::Log(7,"$self->{desc}: Matched too often within period"); $self->trigger(1); # if we are below the threshold, and our age is at least # period (we need to check for the age - otherwise, we might # later on receive more messages and be alright / too high) } elsif (($count < $self->{min}) && ($age >= $self->{period})) { &::Log(7,"$self->{desc}: Matched too seldom within period"); $self->trigger(-1); # same in blue for the "ok" condition } elsif (($count > $self->{min}) && ($age >= $self->{period})) { &::Log(7,"$self->{desc}: Roger"); $self->trigger(0); } &::Log(7,"$self->{desc}: Current counter: $count"); # Abort processing if we are a final check and matched if ( ($code == 1) && ($self->{'final'} == 1) ) { &::Log(7,"$self->{desc}: Terminating walk due to final check"); return 1; } else { return 0; } &::Log(0,"Here are dragons"); die; } sub trigger { my ($self,$status) = @_; return if ($status == $self->{status}); &::Log(6,"$self->{desc}: Status change: ".$self->{status}."->".$status ." 
Counter: ".scalar(@{$self->{matches}})); $self->{status} = $status; $self->{status_time} = time; # We had a status change and need to send the right trap &::SendTrap($self); } mon-1.2.0/utils/cf-to-hosts0000755003616100016640000000535110230411543015440 0ustar trockijtrockij#!/usr/bin/perl # # Convert hostgroup entries in a mon configuration file # into a local hosts file # # Jim Trocki, trockij@arctic.org # # $Id: cf-to-hosts,v 1.2 2005/04/17 07:42:27 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA use strict; sub resolve_hosts; sub print_hosts; sub read_cf; use Getopt::Std; use Socket; my %resolved; my %opt; getopts ("hM", \%opt); if ($opt{"h"}) { print <)); next if $linepart =~ /^\s*#/; # # accumulate multi-line lines (ones which are \-escaped) # if (!defined $acc_line) { $linepart =~ s/^\s*//; } if ($linepart =~ /^(.*)\\\s*$/) { $acc_line .= $1; chomp $acc_line; next; } else { $acc_line .= $linepart; } $l = $acc_line; $acc_line = undef; chomp $l; $l =~ s/^\s*//; $l =~ s/\s*$//; $linepart = ""; if ($l eq "") { $ingroup = 0; next; } if ($l =~ /^hostgroup\s+(\S+)\s+(.*)/) { $ingroup = 1; resolve_hosts ($2); next; } elsif ($ingroup) { resolve_hosts ($l); next; } } close (IN); return ""; } mon-1.2.0/alert.d/0000755003616100016640000000000010640450346013542 5ustar
trockijtrockijmon-1.2.0/alert.d/trap.alert0000755003616100016640000000377510230411542015546 0ustar  trockijtrockij
#!/usr/bin/perl
#
# Trap alert, for use with mon-0.38pre* and greater.
#
# Specify user and pass via MON_TRAP_USER (-U) and MON_TRAP_PASS (-P)
#
# Jim Trocki, trockij@arctic.org
#
# $Id: trap.alert,v 1.3 2005/04/17 07:42:26 trockij Exp $
#
#    Copyright (C) 1998, Jim Trocki
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307  USA
#
use Getopt::Std;
use Mon::Client;
use Socket;

getopts ("s:g:h:t:l:o:uU:P:T:");

# the first line of STDIN is the summary, the rest is the detail
$summary=<STDIN>;
chomp $summary;

$detail = "";
while (<STDIN>) {
    $detail .= $_;
}
chomp $detail;

$t = time;

$USER = ($ENV{"MON_TRAP_USER"} || $opt_U) || "";
$PASS = ($ENV{"MON_TRAP_PASS"} || $opt_P) || "";
$OPST = defined $ENV{"MON_OPSTATUS"} ? $ENV{"MON_OPSTATUS"} : 0;

if ($opt_o) {
    $OPST = int ($opt_o);
}

# map the numeric opstatus back to its symbolic name
foreach $op (keys %Mon::Client::OPSTAT) {
    $OPSTATUS = $op if ($Mon::Client::OPSTAT{$op} == $OPST);
}

$c = new Mon::Client (
    port => getservbyname ('mon', 'udp') || 2583,
);

$c->user($USER) if ($USER);
$c->password($PASS) if ($PASS);

foreach $host (@ARGV) {
    $c->host($host);

    $res = $c->send_trap(
        group => $ENV{MON_GROUP},
        service => $ENV{MON_SERVICE},
        retval => $ENV{MON_RETVAL},
        opstatus => $OPSTATUS,
        summary => $summary,
        detail => $detail,
    );

    print STDERR "Error sending trap to $host\n" if (!$res);
    print STDERR "Error is: ".
        $c->error() . "\n" if (!$res);
}

exit;
mon-1.2.0/alert.d/snpp.alert0000755003616100016640000000342010616437070015557 0ustar  trockijtrockij
#!/usr/bin/perl -w
#
# snpp.alert - Pure perl SNPP client
#
#    Copyright (C) 1998, Michael Alan Dorman
#
# snpp.alert is based on the alert.template distributed by mon.
#
#    Copyright (C) 1998, Jim Trocki
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307  USA
#
# $Id: snpp.alert,v 1.1.1.1.4.1 2007/05/03 19:55:36 trockij Exp $
#
use strict;
use vars qw /$opt_g $opt_q $opt_s $opt_t $opt_u/;
use Getopt::Std;
use Net::SNPP;
getopts ("s:g:h:t:l:q:u");

# default to the local SNPP server unless -q was given; this must not
# be declared with "my", which would hide the value set by getopts
$opt_q ||= 'localhost';

#
# the first line is summary information, adequate to send to a pager
# or email subject line
#
#
# the following lines normally contain more detailed information,
# but this is monitor-dependent
#
# see the "Alert Programs" section in mon(1) for an explanation
# of the options that are passed to the monitor script.
#
my $summary = <STDIN>;
chomp $summary;

my $t = localtime ($opt_t);
my ($wday,$mon,$day,$tm) = split (/\s+/, $t);

my $snpp = Net::SNPP->new ($opt_q)
    or die;

my $ALERT = $opt_u ? 
"UPALERT" : "ALERT"; $snpp->send ( Pager => [ @ARGV ], Message => "$ALERT $opt_g/$opt_s: $summary ($wday $mon $day $tm)" ); $snpp->quit; mon-1.2.0/alert.d/alert.template0000755003616100016640000000325010230411542016377 0ustar trockijtrockij#!/usr/bin/perl # # template for an alert # # Jim Trocki, trockij@arctic.org # # $Id: alert.template,v 1.2 2005/04/17 07:42:26 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use Getopt::Std; getopts ("s:g:h:t:l:u"); # # the first line is summary information, adequate to send to a pager # or email subject line # # # the following lines normally contain more detailed information, # but this is monitor-dependent # # see the "Alert Programs" section in mon(1) for an explanation # of the options that are passed to the monitor script. # $summary=; chomp $summary; $t = localtime($opt_t); ($wday,$mon,$day,$tm) = split (/\s+/, $t); print <) { print; } mon-1.2.0/alert.d/netpage.alert0000755003616100016640000000333110230411542016207 0ustar trockijtrockij#!/usr/bin/perl # # netpage.alert - network page alert for mon # # The first line from STDIN is summary information, adequate to send # to a pager or email subject line. 
Even though this code is entirely # trivial, I wrote it just so that when it's specified in the mon.cf # file, it is clear that the paging alert is network-dependent. # # Jim Trocki, trockij@arctic.org # # $Id: netpage.alert,v 1.2 2005/04/17 07:42:26 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # $RCSID='$Id: netpage.alert,v 1.2 2005/04/17 07:42:26 trockij Exp $'; use Getopt::Std; getopts ("s:g:h:t:l:u"); $summary=; chomp $summary; $pageaddrs = join (',', @ARGV); $t = localtime($opt_t); ($wday,$mon,$day,$tm) = split (/\s+/, $t); $ALERT = $opt_u ? "UPALERT" : "ALERT"; open (MAIL, "| /usr/lib/sendmail -oi -t") || die "could not open pipe to mail: $!\n"; print MAIL <) { print MAIL; } close (MAIL); mon-1.2.0/alert.d/qpage.alert0000755003616100016640000000401010230411542015654 0ustar trockijtrockij#!/usr/bin/perl # # qpage.alert - send an alert via QuickPage # # This will accept multiple pager IDs in @ARGV and call qpage for # each one of them, but you should probably use qpage groups if possible. 
#
# qpage-specific options:
#   -c coverage area
#   -f SNPP CALLerid
#   -l service level
#   -q SNPP server, translates to "qpage -s"
#
# Jim Trocki, trockij@arctic.org
#
#    Copyright (C) 1998, Jim Trocki
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307  USA
#
use Getopt::Std;
getopts ("s:g:h:t:c:f:l:q:uv");

#
# the first line is summary information, adequate to send to a pager
# or email subject line
#
#
# the following lines normally contain more detailed information,
# but this is monitor-dependent
#
@MSG=<STDIN>;
$summary = shift @MSG;
chomp $summary;

$t = localtime($opt_t);
($wday,$mon,$day,$tm) = split (/\s+/, $t);

$ALERT = $opt_u ? "UPALERT" : "ALERT";

foreach $pagedest (@ARGV) {
    if ($opt_v) {
        if (open(QPAGE, "| qpage -p $pagedest 2>/dev/null")) {
            print QPAGE "$ALERT $opt_g/$opt_s: $summary ($wday $mon $day $tm)\n";
            print QPAGE @MSG;
            close QPAGE;
        } else {
            die "could not open pipe to qpage: $!\n";
        }
    } else {
        # the trailing space after the quoted message keeps the shell from
        # gluing the "2" of the stderr redirection onto the message text
        if (system ("qpage -p $pagedest " .
                "'$ALERT $opt_g/$opt_s: $summary ($wday $mon $day $tm)' " .
"2>/dev/null")) { die "could not open pipe to qpage: $?\n"; } } } mon-1.2.0/alert.d/test.alert0000755003616100016640000000016210061516617015555 0ustar trockijtrockij#!/bin/sh # # $Id: test.alert,v 1.1.1.1 2004/06/09 05:18:07 trockij Exp $ echo "`date` $*" >> /tmp/test.alert.log mon-1.2.0/alert.d/mail.alert0000755003616100016640000000426510230411542015515 0ustar trockijtrockij#!/usr/bin/perl # # mail.alert - Mail alert for mon # # The first line from STDIN is summary information, adequate to send # to a pager or email subject line. # # -f from@addr.x set the smtp envelope "from" address # # Jim Trocki, trockij@arctic.org # # $Id: mail.alert,v 1.3 2005/04/17 07:42:26 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # $RCSID='$Id: mail.alert,v 1.3 2005/04/17 07:42:26 trockij Exp $'; use Getopt::Std; use Text::Wrap; getopts ("S:s:g:h:t:l:f:u"); $summary=; chomp $summary; $summary = $opt_S if (defined $opt_S); $mailaddrs = join (',', @ARGV); $mailfrom = "-f $opt_f -F $opt_f" if (defined $opt_f); $ALERT = $opt_u ? 
"UPALERT" : "ALERT"; $t = localtime($opt_t); ($wday,$mon,$day,$tm) = split (/\s+/, $t); open (MAIL, "| /usr/lib/sendmail -oi -t $mailfrom") || die "could not open pipe to mail: $!\n"; print MAIL <) { print MAIL; } close (MAIL); mon-1.2.0/alert.d/irc.alert0000755003616100016640000000776310230411542015356 0ustar trockijtrockij#!/usr/bin/perl # # irc.alert - irc alert for "mon" # # options are: # -s service # -g group # -h "host1 host2 host3..." # -t tmnow # -u (if upalert) # -T (if trap) # -O (if traptimeout) # # -j join the channel before doing PRIVMSG # (some channel modes prevent PRIVMSG from # user who hasn't joined the channel) # -c channel name of the channel (without leading #) # -S server irc server # -U user user for irc server # -n nick nick # -d post alert detail to irc channel # -N num try num different nicks before giving up # -p secs when showing detail, pause secs between # sending each line. secs may be fractional. # # Jim Trocki, trockij@arctic.org # # $Id: irc.alert,v 1.2 2005/04/17 07:42:26 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307  USA
#
use strict;
use IO::Socket::INET;
use Getopt::Std;
use English;

my %opt;
getopts ("s:g:h:t:uTOjc:S:U:n:dN:p:", \%opt);

my $CHAN = $opt{"c"} || "mon";
my $NICK = $opt{"n"} || "mon";
my $USER = $opt{"U"} || $NICK;
my $SERVER = $opt{"S"} || die "must supply server via -S\n";
my $NICK_TRIES = $opt{"N"} || 5;
my $PAUSE = $opt{"p"} || 0;
my $TIMEOUT = 10;

#
# read in what the mon server sends us about the alert
#
my $summary = <>;
# strip the newline so it does not end up inside the PRIVMSG line
chomp $summary if (defined $summary);
$summary = "UNKNOWN" if (!defined $summary || $summary eq "");
my @details;
while (<>) {
    chomp;
    push @details, $_;
}

eval {
    local $SIG{ALRM} = sub { die "Timeout Alarm" };
    alarm $TIMEOUT;

    #
    # make the connection
    #
    my $s = new IO::Socket::INET (
        "PeerAddr" => "$SERVER:6667",
        "Proto" => "tcp",
        "Timeout" => 10,
    );

    die if (!defined $s);

    #
    # register with the irc server
    #
    print $s "NICK $NICK\r\n";
    print $s "USER $USER uplift.transmeta.com $USER :$USER\r\n";

    my $nick_tries = 0;

    #
    # if we get in, there will be a "001" reply
    # from the server. deal with nick collisions.
    #
    while (<$s>) {
        s/\r\n//;

        #
        # we're in
        #
        last if (/^:\S+\s+001\s/);

        #
        # nick already in use, pick a new one
        #
        if (/^:\S+\s+433\s/ || /^:\S+\s+432\s/) {
            if (++$nick_tries >= $NICK_TRIES) {
                print $s "QUIT\r\n";
                die "could not get an unused nick, giving up\n";
            }

            my ($nick, $num) = ($NICK, 0);
            if ($NICK =~ /_/) {
                ($nick, $num) = split (/_/, $NICK);
            }
            $NICK = "$nick" . "_" . ++$num;
            print $s "NICK $NICK\r\n";
        }
    }

    #
    # /join the channel if requested
    #
    if ($opt{"j"}) {
        print $s "JOIN #$CHAN\r\n";
    }

    my @t = split (/\s+/, scalar (localtime ($opt{"t"} ? $opt{"t"} : time)));
    my $t = "$t[2]-$t[1] $t[3]";

    my $alert = $opt{"u"} ? 
"UPALERT" : "ALERT"; print $s "PRIVMSG #$CHAN :$alert $t ($opt{g}/$opt{s}): $summary\r\n"; # # print out the detail if requested # if ($opt{"d"}) { foreach my $detail (@details) { print $s "PRIVMSG #$CHAN : $t ($opt{g}/$opt{s}): $detail\r\n"; if ($PAUSE) { my ($rin, $win, $ein); select ($rin, $win, $ein, $PAUSE); } } } # # /leave the channel # if ($opt{"j"}) { print $s "PART #$CHAN\r\n"; } print $s "QUIT :byebye\r\n"; while (<$s>) { # whatever } close $s; alarm 0; }; if ($EVAL_ERROR) { die "$EVAL_ERROR"; } mon-1.2.0/alert.d/file.alert0000755003616100016640000000356210230411542015511 0ustar trockijtrockij#!/usr/bin/perl # # file.alert - log alert to a file # # The first line from STDIN is summary information, adequate to send # to a pager or email subject line. # # Jim Trocki, trockij@arctic.org # # $Id: file.alert,v 1.2 2005/04/17 07:42:26 trockij Exp $ # # Copyright (C) 1998, Jim Trocki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # $RCSID='$Id: file.alert,v 1.2 2005/04/17 07:42:26 trockij Exp $'; use Getopt::Std; getopts ("d:S:s:g:h:t:l:uOT"); $summary=; chomp $summary; $summary = $opt_S if (defined $opt_S); $file = shift; $file = "file" if (!defined $file); $file = "$opt_d/$file" if ($opt_d); $ALERT = $ENV{"MON_ALERTTYPE"} || "UNKNOWN ALERT"; if (defined $ENV{"MON_OPSTATUS"}) { $OPSTATUS = $ENV{"MON_OPSTATUS"}; } else { $OPSTATUS = "UNKNOWN OPSTATUS"; } $t = localtime($opt_t); ($wday,$mon,$day,$tm) = split (/\s+/, $t); open (F, ">>$file") || die "could not append to $file: $!\n"; print F <) { print F; } print F ".\n"; close (F);