pax_global_header00006660000000000000000000000064145042664710014523gustar00rootroot0000000000000052 comment=d63377f7c3102784fb6268e8ff699649f15d2b3e check_pgactivity-REL2_7/000077500000000000000000000000001450426647100153375ustar00rootroot00000000000000check_pgactivity-REL2_7/.editorconfig000066400000000000000000000004051450426647100200130ustar00rootroot00000000000000root = true [*] charset = utf-8 indent_style = space indent_size = 4 # Unix-style newlines end_of_line = lf # Remove any whitespace characters preceding newline characters trim_trailing_whitespace = true # Newline ending every file insert_final_newline = true check_pgactivity-REL2_7/.gitignore000066400000000000000000000000411450426647100173220ustar00rootroot00000000000000*.data *.data.lock *~ /tmp_check check_pgactivity-REL2_7/CHANGELOG.md000066400000000000000000000300501450426647100171460ustar00rootroot00000000000000Changelog ========= 2023-09-25 v2.7: * add: compatibility with PostgreSQL 15 and 16, Jehan-Guillaume de Rorthais, Thomas Reiss * change: `stat_snapshot_age` is compatible from PostgreSQL v9.5 to v14, Benoit Lobréau * change: simplify session accounting in `backends_status`, Thomas Reiss * fix: compatibility of `backup_label_age` with PostgreSQL 15 and after, Thomas Reiss * fix: in `pg_dump_backup`, error explicitly if `--path` is a directory, Christophe Courtois * fix: `temp_files` on PG10 was counting all DB files, Christophe Courtois, Benoit Lobréau * fix: make `check_archiver` output `oldest_ready_wal=0` when archive queue is empty, Thomas Reiss * fix: make `check_archiver` work properly with PostgreSQL 10 without being superuser, Thomas Reiss * fix: in `backends_status`, avoid "idle in transaction" false positive for PostgreSQL 9.2 and after, Thomas Reiss * fix: issue with check_pgactivity missing lock file, Joern Ott, Julien Rouhaud, Jehan-Guillaume de Rorthais * fix: in `btree_bloat`, adjust index tuple header size, Shangzi Xie, Jehan-Guillaume de Rorthais Thanks to all contributors 
of this release for feedbacks, bug reports, patches and patch reviews, etc. 2022-07-08 v2.6: * add: new `session_stats` service to gather miscellaneous session statistics, Frédéric Yhuel * add: compatibility with PostgreSQL 14, Frédéric Yhuel * change: service `autovacuum` does not show `max_workers` anymore for 8.2 and below, Jehan-Guillaume de Rorthais * change: various messages and code cleanup, Julian Vanden Broeck, Jehan-Guillaume de Rorthais * fix: `last_vacuum` and `last_analyse` to reports the correct oldest maintenance, Frédéric Yhuel, Jehan-Guillaume de Rorthais * fix: service `check_oldest_idlexact` now use `state_change` instead of `xact_start` to calculate the idle time, Thomas Reiss * fix: improve locking around the status file to avoid dead locks and status file truncation, Arnaud Aujou, Julien Rouhaud, Jehan-Guillaume de Rorthais * fix: possible division by 0 in `table_bloat` service, Pavel Golub * fix: threshold check and support interval for service `check_stat_snapshot_age`, Jehan-Guillaume de Rorthais * fix: service `check_archiver` when a .history or .backup file is staled, Thomas Reiss * fix: service `sequences_exhausted` now checks also sequences that are not owned by a table column, Thomas Reiss * fix: service `check_archiver` when no WAL was ever archived, Thomas Reiss Thank you to all contributors of this release for feedbacks, bug reports, patches and patch reviews, etc. 
2020-11-24 v2.5: - add: new `oldest_xmin` service - add: new `extensions_versions` service - add: new `checksum_errors` service - add: support for v13 and other improvements on `replication_slots` - add: v13 compatibility for service `wal_files` - add: various documentation details and examples - add: support service `replication_slots` on standby - add: accept single `b` or `o` as size unit - add: json and json_strict output formats - add: `size` and/or `delta` thresholds for `database_size` service - add: thresholds are now optional for service `database_size` - add: support for v12 and v13 `archive_folder` - regression: threshold `repslot` becomes `spilled` in service `replication_slots` - regression: in services latest_vacuum and latest_analyze: a critical alert is now raised on tables that were never analyzed/vacuumed or whose maintenance date was lost due to a crash - fix: avoid alerts for lack of maintenance on inactive db - fix: forbid rare cases of division by zero in `wal_files` - fix: do not alert on missing file in `temp_files` for v10+ - fix: detect lack of maintenance in `last_vacuum` and `last_analyze` for never maintained tables - fix: backend count for v10+ - fix: replace NaN with "U" for strict outputs - fix: do not count walsenders as part of `max_connections` - fix: broken `archiver` service with v10+ - fix: perl warning when archiver is not active 2019-01-30 v2.4: - add a new `uptime` service - add ability to filter by application_name in longest_query and oldest_idlexact service - add minimal delta size to pgdump_backup service to avoid alert when backup grows small in size - allow psql connections without providing connection arguments: rely on the binary default behaviour and environment variables - returns CRITICAL if connection fails for service `connection`, instead of UNKNOWN - add documentation example for pgback in pgdump_service - add documentation for archive_folder - add information on necessary priviledges for all services - 
replication_slots service handle wal files and pg_replslots files separately - take account of the new BRIN summarize state of autovacuum - avoid warning for -dev versions in pga_version service - ignore startup and backup replication states in service streaming_delta - fix handling or file reading errors in archive_folder service - fix wal magic number for version 10 - fix service stat_snapshot_age to output the correct age - fix archiver and replication_slots services to work properly on a standby node - fix archiver to raise OK on a slave - fix is_replay_paused for PostgreSQL 10 - fix max_nb_wal calculation in wal_files service - fix uninitialized bug in hit_ratio when database do not yet have statistics - fix check_backend_status in order to ignore unknown status - fix service sequences_exhausted to take account of sequence's minvalue - fix sequences_exhausted to take account of sequences only in the current db - fix exclude option in backends_status service - fix archive_folder: timeline numbers are hexadecimal - fix head levels in man page - check for errors when saving status 2017-11-13 v2.3: - add complete support for PostgreSQL 10, including non-privileged monitoring features - add some documentation to help new contributors - add ability to use time units for thresholds in service backend_status - fix a long-standing bug in service backends_status - fix sequences_exhausted to work with sequences attached to unusual types - fix fetching method for service minor_version 2017-04-28 v2.2: - add support for PostgreSQL 9.6 - add early-support for PostgreSQL 10 - add service sequences_exhausted to monitor sequence usage - add service stat_snapshot_age to detect a stuck stats collector process - add service wal_receiver to monitor replication on standby's end - add service pgdata_permission to monitor rights and ownership of the PGDATA - add support for "pending restart" parameters from PostgreSQL 9.5+ in check_settings - add timeline id in perfdata output from 
wal_files - fix wal_files, archiver, check_is_replay_paused, check_hot_standby_delta, check_streaming_delta and check_replication_slots for PostgreSQL 10 - fix archive_folder to handle compressed archived WAL properly - fix backends_status for PostgreSQL 9.6 - improve and rename "ready_archives" to "archiver" - warn when no rows are returned in custom_query service - make thresholds optional in service hot_standby_delta - make thresholds optional in service streaming_delta - remove useless thresholds in backends/maximum_connections perfdata - add warn/crit threshold to streaming_delta perfdatas - use parameter server_version_num to detect PostgreSQL version - fix a race condition in is_storable to handle concurrent executions correctly - fix a bug in service locks that occurs with PostgreSQL 8.2 - fix rounding in hit_ratio - fix perl warning when some ENV variables are not defined - fix bug in "human" output format - fix version check for all hosts for service hot_standby_delta - fix bug in pg_dump_backups related to age of global files - fix documentation about default db connection 2016-08-29 2.0: - support various output formats - add output format "nagios_strict" - add output format "debug" - add output format "binary" - add output format "human" - force UTF8 encoding - fix a bug where pod2usage couldn't find the original script - fix wal size computation for 9.3+ (255 -vs- 256 seg of 16MB) - fix perl warning with pg_dump_backup related to unknown database - fix buffers_backend unit in check_bgwriter - do not connect to the cluster if using --dbinclude for service pg_dump_backup - add argument --dump-status-file, useful for debugging - add service "table_unlogged" - add basic support for timeline crossing in service archive_folder - add service "settings" - add service "invalid_indexes" 2016-01-28 1.25: - add service pg_dump_backup - change units of service bgwriter (github issue #29) - support PostgreSQL 9.5 - fix backends service to remove autovacuum from the
connection count (github issue #14) - fix backends service to add walsenders to the connection count (github issue #14) - fix a harmless perl warning - fix wal_size service to support 9.5+ - fix corruption on status file on concurrent access - fix bad estimation in btree bloat query with mostly NULL columns 2015-09-28 1.24: - improve message for streaming_delta and hot_standby_delta services - add replication_slot service - enhance table_bloat queries - enhance btree_bloat queries - add -l option, aliased for --list - backends service has a new maximum_connections perfdata - backends service now considers the maximum connections as max_connections - superuser_reserved_connections - improve checks for hot_standby_delta service - fix check_pgactivity to run with Perl 5.10.0 - add commit_ratio service - various documentation improvements 2015-02-05 1.23: - better handling of errors related to status file - support fillfactor in btree_bloat and table_bloat services - compute hit_ratio since last run, which means the value is now really precise - add --dbinclude and --dbexclude arguments - fix # of idle in xact in oldest_idlexact service - check that the temp file creation for queries succeeds - accept non-decimal-only thresholds in pga_version (making it work with beta versions) - fix compatibility issue with perl 5.8 - add perl version to pga_version and --version 2014-12-30 1.22: - fix pga_version service to accept non-decimal-only versions - fix temp_files service bug, leading to "ERROR: set-valued function called in context that cannot accept a set" errors 2014-12-24 1.21: - fix temp_files service 2014-12-24 1.20: - add RPM specfile - add temp_files service - fix bug #13 (illegal division by 0) - fix bad regexp in autovacuum service - fix wrong curl command line options 2014-12-03 1.19: - fix oldest_idlexact service - documentation improvements - fix last_vacuum/analyze last exec time computation 2014-11-03 1.18: - fix issue in locks service with PG 9.0- 2014-10-29 1.17:
- improve btree index bloat accuracy 2014-10-23 1.16: - fix btree_bloat service to support index on expression - various documentation improvements - fix SIReadLocks output in locks service - fix missing database in oldest_idlexact service - add warning & critical values in hot_standby service perfdata - add predicate locks support in locks service - enhance backup_label_service on PG 9.3+ - fix streaming_delta service when called on a standby 2014-09-19 1.15: - do not compute wal_rate on standby in wal_files service 2014-09-09 1.14: - return critical if negative age in max_freeze_age service - add wal_rate perfdata to wal_files service - general enhancement in documentation - add perfdata in streaming_delta service - fix autovacuum service on PG 8.3+ 2014-09-05 1.13: - add autovacuum service - fix wrong behavior when using 0 in a time unit 2014-08-07 1.12: - add wal_keep_segments in wal_files service perfdata - fix the expected number of WAL in wal_files service - fix issue in table_bloat service leading it to process indexes - remove some useless perfdata from backends_status service 2014-08-05 1.11: - handle disabled and insufficient privilege status in backends_status service - improve accuracy of table_bloat service 2014-07-31 1.10: - split bloat service into more accurate btree_bloat and table_bloat service - fix issue if the server name contains a "=" - fix Perl warning in hot_standby_delta service check_pgactivity-REL2_7/CONTRIBUTING.md000066400000000000000000000214251450426647100175740ustar00rootroot00000000000000# Contributing to check_pgactivity ## Adding a new service ### Let check_pgactivity know about your new service At the beginning of check_pgactivity, the %services hash describes every available service.
The hash is defined this way:

```
my %services = (
    # 'service_name' => {
    #     'sub'  => sub reference to call to run this service
    #     'desc' => 'a description of the service'
    # }
    'autovacuum' => {
        'sub'  => \&check_autovacuum,
        'desc' => 'Check the autovacuum activity.'
    },
    ...
    'example' => {
        'sub'  => \&check_example,
        'desc' => 'Check number of connections (example service).'
    }
);
```

First, add the service_name and the values for its sub and desc entries. That is enough to declare a new service. You can now see your new service listed when you call check_pgactivity with the --list argument.

### Implement your service

Now, define a new check_servicename function before the mark "End of SERVICE section in pod doc". To know about the arguments provided to the service support function, see the %args hash. Get some inspiration from other service functions to see how to handle a new one; check_stat_snapshot_age is a good starter, as it is really simple. Here are however some guidelines.

Your service has to identify itself with a variable $me, used when calling the output functions. For example:

```
sub check_example {
    my $me = 'POSTGRES_EXAMPLE';
```

Several other variables are defined:

* `@rs`: array storing the result set of the monitoring query
* `@perfdata`: array storing the returned perfdata
* `@msg`: array storing the returned messages
* `@hosts`: array holding the host(s) to query
* `%args`: hash containing the service arguments

Consider using these variable names as a convention. Also populate your own %args hash from the first argument:

```
my %args = %{ $_[0] };
```

Now comes the interesting part. You can declare a simple query to monitor something in your PostgreSQL server.
Here we declare a query that should work with any PostgreSQL version, for example:

```
my $query = qq{ SELECT count(*) FROM pg_stat_activity };
```

If you must provide multiple queries for multiple PostgreSQL versions, please refer to the section "Supporting multiple PostgreSQL versions".

If your service has to react according to some user-defined thresholds, verify that the user has given the thresholds as arguments:

```
defined $args{'warning'} and defined $args{'critical'}
```

Use pod2usage to output the error message. It is recommended to validate the format of the thresholds using a regexp:

```
$args{'warning'} =~ m/^([0-9.]+)/
```

The parse_hosts function will populate the @hosts array:

```
@hosts = @{ parse_hosts %args };
```

If your service does not work before a given PostgreSQL version, use some code like:

```
is_compat $hosts[0], 'example', $PG_VERSION_95 or exit 1;
```

Query the database using the query function:

```
@rs = @{ query ( $hosts[0], $query ) };
```

In our example, we can directly get the result:

```
$num_connections = $rs[0][0];
```

Populate the @perfdata array with the resulting metrics:

```
push @perfdata => [ "connections_number", $num_connections, undef ];
```

You must provide some mandatory data in the @perfdata array:

* the perfdata name
* the data itself, in numerical form
* the unit used: 'B' for bytes, 's' for seconds, or undef for raw numbers

The following data are optional:

* warning threshold
* critical threshold
* minimum value
* maximum value

Your service can return "ok" by calling the function of the same name:

```
return ok( $me, \@msg, \@perfdata );
```

check_pgactivity provides four functions for the four Nagios-compatible service states:

* ok
* warning
* critical
* unknown

### Document your new service

check_pgactivity's documentation is handled in its source file, in POD (Plain Old Documentation) format. Refer to the perlpod documentation for further information.
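Putting the steps above together, here is a minimal, self-contained sketch of a service function. The stubs standing in for check_pgactivity's own helpers (query, ok) are hypothetical, only there so the sketch runs on its own; in the real script you would use the existing helpers instead of redefining them:

```perl
use strict;
use warnings;

# Stubs standing in for check_pgactivity's real helpers, so this sketch
# runs standalone. Do NOT redefine them inside the actual script.
sub query { return [ [ 42 ] ] }    # pretend the SQL returned one row: 42
sub ok {
    my ( $me, $msg, $perfdata ) = @_;
    printf "%s OK | %s=%s\n", $me, $perfdata->[0][0], $perfdata->[0][1];
    return 0;
}

# Minimal service skeleton following the conventions described above
sub check_example {
    my %args = %{ $_[0] };
    my $me   = 'POSTGRES_EXAMPLE';
    my ( @msg, @perfdata );

    my $query = qq{ SELECT count(*) FROM pg_stat_activity };

    my @rs = @{ query( $args{'host'}, $query ) };
    my $num_connections = $rs[0][0];

    # [ name, value, unit, warn, crit, min, max ] - only the first
    # three fields are mandatory
    push @perfdata => [ 'connections_number', $num_connections, undef ];

    return ok( $me, \@msg, \@perfdata );
}

check_example( { 'host' => 'localhost' } );
```

Running it prints a Nagios-style status line followed by the perfdata.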
See also the releasing.md file to learn how to regenerate the documentation.

### Test your new service

Test your service under several conditions: verify that it returns an OK, warning or critical alert by simulating each condition. Test your service on several PostgreSQL versions. Verify that the service returns an error for unsupported versions, and that all supported versions work well.

### Submit your patch

The best way to submit your patch is to send a pull request on GitHub.

## Supporting multiple PostgreSQL versions

Each major PostgreSQL version brings some incompatibilities with previous releases. You can easily add compatibility with a new PostgreSQL version by following the guidelines given here.

First, you have to add a new $PG_VERSION_XXX variable, as follows:

```
my $PG_VERSION_MIN = 70400;
my $PG_VERSION_74  = 70400;
...
my $PG_VERSION_95  = 90500;
my $PG_VERSION_96  = 90600;
my $PG_VERSION_100 = 100000;
```

The value of the variable comes from the parameter server_version_num. Look at the functions set_pgversion() and is_compat() to see how it is used.

Then, you will probably have to adapt some queries for the new PostgreSQL version. To ease this process, most probes provide a %queries hash that associates a query with a given PostgreSQL version. You don't have to write the same query for each major release: simply store the appropriate query for the version that introduces the incompatibility, and it will be used for every following version.
For example, the probe autovacuum, implemented in the check_autovacuum function, provides the following hash:

```
my %queries = (
    # field current_query, not autovacuum_max_workers
    $PG_VERSION_81 => q{
        SELECT current_query, extract(EPOCH FROM now()-query_start)::bigint, 'NaN'
        FROM pg_stat_activity
        WHERE current_query LIKE 'autovacuum: %'
        ORDER BY query_start ASC
    },
    # field current_query, autovacuum_max_workers
    $PG_VERSION_83 => q{
        SELECT a.current_query, extract(EPOCH FROM now()-a.query_start)::bigint, s.setting
        FROM (SELECT current_setting('autovacuum_max_workers') AS setting) AS s
        LEFT JOIN (
            SELECT * FROM pg_stat_activity
            WHERE current_query LIKE 'autovacuum: %'
        ) AS a ON true
        ORDER BY query_start ASC
    },
    # field query, still autovacuum_max_workers
    $PG_VERSION_92 => q{
        SELECT a.query, extract(EPOCH FROM now()-a.query_start)::bigint, s.setting
        FROM (SELECT current_setting('autovacuum_max_workers') AS setting) AS s
        LEFT JOIN (
            SELECT * FROM pg_stat_activity
            WHERE query LIKE 'autovacuum: %'
        ) AS a ON true
        ORDER BY a.query_start ASC
    }
);
```

Then, call the query_ver() function, giving the host you want to query, usually $hosts[0], and the %queries hash:

```
@rs = @{ query_ver( $hosts[0], %queries ) };
```

query_ver() will do the job for you: it uses the appropriate query for the current PostgreSQL release and returns the result in the @rs array.

Also, if a probe does not work before a given version, use a variant of the following code:

```
is_compat $hosts[0], 'autovacuum', $PG_VERSION_81 or exit 1;
```

Look around the code to see how new PostgreSQL versions are supported. Sometimes you have to write a totally different code path to support a new release, as happened for PostgreSQL 10. See check_archiver, for example, to see how this is handled.

## Storing statistics

One of the key features of check_pgactivity is its ability to store intermediate results in a binary file. This allows computing delta values between two calls of the same service.
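To illustrate the delta mechanism, here is a simplified, self-contained sketch of the store/load cycle. The helper names (save_struct, load_struct) and the bgwriter counter are illustrative only; the real script's load and save functions additionally take a host structure and protect the file with locking:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/check_pgactivity.data";

# Save one named structure into the shared status file
sub save_struct {
    my ( $name, $ref, $path ) = @_;
    my $all = -f $path ? retrieve($path) : {};
    $all->{$name} = $ref;
    store( $all, $path );
}

# Load one named structure back, or undef on first call
sub load_struct {
    my ( $name, $path ) = @_;
    return undef unless -f $path;
    return retrieve($path)->{$name};
}

# first call: no previous data, just save the current counter
save_struct( 'bgwriter', { 'buffers_checkpoint' => 1000 }, $file );

# second call: compute the delta against the stored value
my $prev  = load_struct( 'bgwriter', $file );
my $curr  = 1450;
my $delta = $curr - $prev->{'buffers_checkpoint'};
print "buffers written since last call: $delta\n";
```

This is the same pattern check_bgwriter relies on: store raw cumulative counters, and report the difference on the next run.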
The underlying implementation uses the Storable library from the Perl language. Thus, you can easily store any Perl data structure into the resulting statistics file. First, the load call will populate the data structure using the following arguments : * the host structure ref that holds the "host" and "port" parameters * the name of the structure to load * the path to the file storage As you may guess, there's also a save function to store the data structure into the statistics file with the following arguments : * the host structure ref that holds the "host" and "port" parameters * the name of the structure to save * the ref of the structure to save * the path to the file storage See for example the function check_bgwriter to see how to use the functions and how to store your intermediate metrics. ## Debugging check_pgactivity Use the --debug option to enable the debug output. Use dprint() function to output some specific debugging messages to the developer or the user. check_pgactivity-REL2_7/LICENSE000066400000000000000000000016621450426647100163510ustar00rootroot00000000000000Copyright (c) 2012-2023, Open PostgreSQL Monitoring Development Group (OPMDG). Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies. IN NO EVENT SHALL OPMDG BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF OPMDG HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. OPMDG SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. 
THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND OPMDG HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. check_pgactivity-REL2_7/README000066400000000000000000001531261450426647100162270ustar00rootroot00000000000000NAME check_pgactivity - PostgreSQL plugin for Nagios SYNOPSIS check_pgactivity {-w|--warning THRESHOLD} {-c|--critical THRESHOLD} [-s|--service SERVICE ] [-h|--host HOST] [-U|--username ROLE] [-p|--port PORT] [-d|--dbname DATABASE] [-S|--dbservice SERVICE_NAME] [-P|--psql PATH] [--debug] [--status-file FILE] [--path PATH] [-t|--timeout TIMEOUT] check_pgactivity [-l|--list] check_pgactivity [--help] DESCRIPTION check_pgactivity is designed to monitor PostgreSQL clusters from Nagios. It offers many options to measure and monitor useful performance metrics. COMPATIBILITY Each service is available from a different PostgreSQL version, from 7.4, as documented below. The psql client must be at least 8.3; it can be used with an older server. Please report any undocumented incompatibility. -s, --service SERVICE The Nagios service to run. See section SERVICES for a description of available services, or use "--list" for a short list of services and descriptions. -h, --host HOST Database server host or socket directory (default: $PGHOST or "localhost"). See section "CONNECTIONS" for more information. -U, --username ROLE Database user name (default: $PGUSER or "postgres"). See section "CONNECTIONS" for more information. -p, --port PORT Database server port (default: $PGPORT or "5432"). See section "CONNECTIONS" for more information. -d, --dbname DATABASE Database name to connect to (default: $PGDATABASE or "template1"). WARNING! This is not necessarily one of the databases that will be checked. See "--dbinclude" and "--dbexclude". See section "CONNECTIONS" for more information. -S, --dbservice SERVICE_NAME The connection service name from pg_service.conf to use. See section "CONNECTIONS" for more information.
--dbexclude REGEXP Some services automatically check all the databases of your cluster (note: that does not mean they always need to connect to all of them to check them though). "--dbexclude" excludes any database whose name matches the given Perl regular expression. Repeat this option as many times as needed. See "--dbinclude" as well. If a database matches both dbexclude and dbinclude arguments, it is excluded. --dbinclude REGEXP Some services automatically check all the databases of your cluster (note: that does not imply that they always need to connect to all of them though). Some always exclude the 'postgres' database and templates. "--dbinclude" checks ONLY databases whose names match the given Perl regular expression. Repeat this option as many times as needed. See "--dbexclude" as well. If a database matches both dbexclude and dbinclude arguments, it is excluded. -w, --warning THRESHOLD The Warning threshold. -c, --critical THRESHOLD The Critical threshold. -F, --format OUTPUT_FORMAT The output format. Supported outputs are: "binary", "debug", "human", "nagios", "nagios_strict", "json" and "json_strict". Using the "binary" format, the results are written to a binary file (using the perl module "Storable") given in argument "--output". If no output is given, defaults to file "check_pgactivity.out" in the same directory as the script. The "nagios_strict" and "json_strict" formats are equivalent to the "nagios" and "json" formats respectively. The only difference is that they enforce the units to follow the strict Nagios specs: B, c, s or %. Any unit absent from this list is dropped (Bps, Tps, etc). --tmpdir DIRECTORY Path to a directory where the script can create temporary files. The script relies on the system default temporary directory if possible. -P, --psql FILE Path to the "psql" executable (default: "psql"). It should be version 8.3 at least, but the server can be older.
--status-file PATH Path to the file where service status information is kept between successive calls. Default is to save a file called "check_pgactivity.data" in the same directory as the script. Note that this file is protected from concurrent writes using a lock file located in the same directory, with the same name as the status file, but with the extension ".lock". On some platforms, network filesystems may not be supported correctly by the locking mechanism. See "perldoc -f flock" for more information. --dump-status-file Dump the content of the status file and exit. This is useful for debugging purposes. --dump-bin-file [PATH] Dump the content of the given binary file previously created using "--format binary". If no path is given, defaults to file "check_pgactivity.out" in the same directory as the script. -t, --timeout TIMEOUT Timeout (default: "30s"), as raw (in seconds) or as an interval. This timeout will be used as "statement_timeout" for psql and as URL timeout for the "minor_version" service. -l, --list List available services. -V, --version Print version and exit. --debug Print some debug messages. -?, --help Show this help page. THRESHOLDS THRESHOLDS provided as warning and critical values can be raw numbers, percentages, intervals or sizes. Each available service supports one or more formats (eg. a size and a percentage). Percentage If THRESHOLD is a percentage, the value should end with a '%' (no space). For instance: 95%. Interval If THRESHOLD is an interval, the following units are accepted (not case sensitive): s (second), m (minute), h (hour), d (day). You can use more than one unit per given value. If not set, the last unit is in seconds. For instance: "1h 55m 6" = "1h55m6s". Size If THRESHOLD is a size, the following units are accepted (not case sensitive): b (Byte), k (KB), m (MB), g (GB), t (TB), p (PB), e (EB) or Z (ZB). Only integers are accepted. Eg. "1.5MB" will be refused, use "1500kB". The factor between units is 1024. Eg.
"1g = 1G = 1024*1024*1024." CONNECTIONS check_pgactivity allows two different connection specifications: by service or by specifying values for host, user, port, and database. Some services can run on multiple hosts, or needs to connect to multiple hosts. You might specify one of the parameters below to connect to your PostgreSQL instance. If you don't, no connection parameters are given to psql: connection relies on binary defaults and environment. The format for connection parameters is: Parameter "--dbservice SERVICE_NAME" Define a new host using the given service. Multiple hosts can be defined by listing multiple services separated by a comma. Eg. --dbservice service1,service2 For more information about service definition, see: Parameters "--host HOST", "--port PORT", "--user ROLE" or "--dbname DATABASE" One parameter is enough to define a new host. Usual environment variables (PGHOST, PGPORT, PGDATABASE, PGUSER, PGSERVICE, PGPASSWORD) or default values are used for missing parameters. As for usual PostgreSQL tools, there is no command line argument to set the password, to avoid exposing it. Use PGPASSWORD, .pgpass or a service file (recommended). If multiple values are given, define as many host as maximum given values. Values are associated by position. Eg.: --host h1,h2 --port 5432,5433 Means "host=h1 port=5432" and "host=h2 port=5433". If the number of values is different between parameters, any host missing a parameter will use the first given value for this parameter. Eg.: --host h1,h2 --port 5433 Means: "host=h1 port=5433" and "host=h2 port=5433". Services are defined first For instance: --dbservice s1 --host h1 --port 5433 means: use "service=s1" and "host=h1 port=5433" in this order. If the service supports only one host, the second host is ignored. 
Mutual exclusion between both methods You can not overwrite services connections variables with parameters "--host HOST", "--port PORT", "--user ROLE" or "--dbname DATABASE" SERVICES Descriptions and parameters of available services. archive_folder Check if all archived WALs exist between the oldest and the latest WAL in the archive folder and make sure they are 16MB. The given folder must have archived files from ONE cluster. The version of PostgreSQL that created the archives is only checked on the last one, for performance consideration. This service requires the argument "--path" on the command line to specify the archive folder path to check. Obviously, it must have access to this folder at the filesystem level: you may have to execute it on the archiving server rather than on the PostgreSQL instance. The optional argument "--suffix" defines the suffix of your archived WALs; this is useful for compressed WALs (eg. .gz, .bz2, ...). Default is no suffix. This service needs to read the header of one of the archives to define how many segments a WAL owns. Check_pgactivity automatically handles files with extensions .gz, .bz2, .xz, .zip or .7z using the following commands: gzip -dc bzip2 -dc xz -dc unzip -qqp 7z x -so If needed, provide your own command that writes the uncompressed file to standard output with the "--unarchiver" argument. Optional argument "--ignore-wal-size" skips the WAL size check. This is useful if your archived WALs are compressed and check_pgactivity is unable to guess the original size. Here are the commands check_pgactivity uses to guess the original size of .gz, .xz or .zip files: gzip -ql xz -ql unzip -qql Default behaviour is to check the WALs size. Perfdata contains the number of archived WALs and the age of the most recent one. Critical and Warning define the max age of the latest archived WAL as an interval (eg. 5m or 300s ). Required privileges: unprivileged role; the system user needs read access to archived WAL files. 
Sample commands: check_pgactivity -s archive_folder --path /path/to/archives -w 15m -c 30m check_pgactivity -s archive_folder --path /path/to/archives --suffix .gz -w 15m -c 30m check_pgactivity -s archive_folder --path /path/to/archives --ignore-wal-size --suffix .bz2 -w 15m -c 30m check_pgactivity -s archive_folder --path /path/to/archives --unarchiver "unrar p" --ignore-wal-size --suffix .rar -w 15m -c 30m archiver (8.1+) Check if the archiver is working properly and the number of WAL files ready to archive. Perfdata returns the number of WAL files waiting to be archived. Critical and Warning thresholds are optional. They apply on the number of files waiting to be archived. They only accept a raw number of files. Whatever the given threshold, a critical alert is raised if the archiver process did not archive the oldest waiting WAL to be archived since last call. Required privileges: superuser (=9.0 only), "fastpath function call", "active", "waiting for lock", "undefined", "disabled" and "insufficient privilege". insufficient privilege appears when you are not allowed to see the statuses of other connections. This service supports the argument "--exclude REGEX" to exclude queries matching the given regular expression. You can use multiple "--exclude REGEX" arguments. Critical and Warning thresholds are optional. They accept a list of 'status_label=value' separated by a comma. Available labels are "idle", "idle_xact", "aborted_xact", "fastpath", "active" and "waiting". Values are raw numbers or time units and empty lists are forbidden. Here is an example: -w 'waiting=5,idle_xact=10' -c 'waiting=20,idle_xact=30,active=1d' Perfdata contains the number of backends for each status and the oldest one for each of them, for 8.2+. Note that the number of backends reported in Nagios message includes excluded backends. Required privileges: an unprivileged user only sees its own queries; a pg_monitor (10+) or superuser (<10) role is required to see all queries. 
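The comma-separated 'status_label=value' threshold lists described above, where a value may be a raw number or a time unit, can be parsed along these lines. This is a simplified sketch mimicking the documented syntax (single-unit intervals only); the real script's parser is more thorough:

```perl
use strict;
use warnings;

# Seconds per supported interval unit (s/m/h/d), as documented
my %unit_factor = ( 's' => 1, 'm' => 60, 'h' => 3600, 'd' => 86400 );

# Parse eg. 'waiting=20,idle_xact=30,active=1d' into a hash of
# label => numeric threshold (intervals converted to seconds).
sub parse_thresholds {
    my ($spec) = @_;
    my %thresholds;
    for my $pair ( split /\s*,\s*/, $spec ) {
        my ( $label, $value ) = split /\s*=\s*/, $pair, 2;
        die "empty threshold lists are forbidden\n" unless defined $value;
        if ( $value =~ m/^(\d+)([smhd])$/i ) {
            $value = $1 * $unit_factor{ lc $2 };
        }
        $thresholds{$label} = $value;
    }
    return \%thresholds;
}

my $crit = parse_thresholds('waiting=20,idle_xact=30,active=1d');
printf "active threshold: %d seconds\n", $crit->{'active'};
```

With the critical example from the text, 'active=1d' ends up as 86400 seconds while 'waiting=20' stays a raw count.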
checksum_errors (12+)

Check for data checksum errors, reported in pg_stat_database.

This service requires that data checksums are enabled on the target instance. UNKNOWN will be returned if that's not the case.

Critical and Warning thresholds are optional. They only accept a raw number of checksum errors per database. If the thresholds are not provided, a default value of `1` will be used for both thresholds.

Checksum errors are CRITICAL issues, so it's highly recommended to keep the default thresholds, as immediate action should be taken as soon as such a problem arises.

Perfdata contains the number of errors per database.

Required privileges: unprivileged user.

backup_label_age (8.1+)

Check the age of the backup label file.

Perfdata returns the age of the backup_label file, -1 if not present.

Critical and Warning thresholds only accept an interval (eg. 1h30m25s).

Required privileges: grant execute on function pg_stat_file(text, boolean) (pg12+); unprivileged role (9.3+); superuser (<9.3).

bgwriter (8.3+)

Check the percentage of pages written by backends since last check.

This service uses the status file (see "--status-file" parameter).

Perfdata contains the ratio per second for each "pg_stat_bgwriter" counter since last execution. The unit Nps, used for checkpoints, max written clean and fsyncs, is the number of "events" per second.

Critical and Warning thresholds are optional. If set, they *only* accept a percentage.

Required privileges: unprivileged role.

btree_bloat

Estimate bloat on B-tree indexes.

Warning and critical thresholds accept a comma-separated list of either raw numbers (for a size), sizes (eg. 125M) or percentages. The thresholds apply to bloat size, not object size. If a percentage is given, the threshold will apply to the bloat size compared to the total index size. If multiple threshold values are passed, check_pgactivity will choose the largest (bloat size) value.

This service supports both "--dbexclude" and "--dbinclude" parameters.
The 'postgres' database and templates are always excluded.

It also supports a "--exclude REGEX" parameter to exclude relations matching a regular expression. The regular expression applies to "database.schema_name.relation_name". This enables you to filter either on a relation name for all schemas and databases, on a qualified named relation (schema + relation) for all databases or on a qualified named relation in only one database. You can use multiple "--exclude REGEX" parameters.

Perfdata will return the number of indexes of concern, by warning and critical threshold per database.

A list of the bloated indexes will be returned after the perfdata. This list contains the fully qualified bloated index name, the estimated bloat size, the index size and the bloat percentage.

Required privileges: superuser (<10) able to log in all databases, or at least those in "--dbinclude"; on PostgreSQL 10+, a user with the role pg_monitor suffices, provided that you grant SELECT on the system table pg_statistic to the pg_monitor role, in each database of the cluster: "GRANT SELECT ON pg_statistic TO pg_monitor;"

session_stats (14+)

Gather miscellaneous session statistics.

This service uses the status file (see "--status-file" parameter).

Perfdata contains the session / active / idle-in-transaction times for each database since last call, as well as the number of sessions per second, and the number of sessions killed / abandoned / terminated by fatal errors.

Required privileges: unprivileged role.

commit_ratio (all)

Check the commit and rollback rate per second since last call.

This service uses the status file (see "--status-file" parameter).

Perfdata contains the commit rate, rollback rate, transaction rate and rollback ratio for each database since last call.

Critical and Warning thresholds are optional. They accept a list of comma-separated 'label=value'.
Available labels are rollbacks, rollback_rate and rollback_ratio, which will be compared to the number of rollbacks, the rollback rate and the rollback ratio of each database. Warning or critical will be raised if the reported value is greater than rollbacks, rollback_rate or rollback_ratio.

Required privileges: unprivileged role.

configuration (8.0+)

Check the most important settings.

Warning and Critical thresholds are ignored.

Specific parameters are: "--work_mem", "--maintenance_work_mem", "--shared_buffers", "--wal_buffers", "--checkpoint_segments", "--effective_cache_size", "--no_check_autovacuum", "--no_check_fsync", "--no_check_enable", "--no_check_track_counts".

Required privileges: unprivileged role.

connection (all)

Perform a simple connection test.

No perfdata is returned.

This service ignores critical and warning arguments.

Required privileges: unprivileged role.

custom_query (all)

Perform the given user query.

Specify the query with "--query". The first column will be used to perform the test for the status if warning and critical are provided.

The warning and critical arguments are optional. They can be of format integer (default), size or time depending on the "--type" argument. Warning and Critical will be raised if they are greater than the first column, or less if the "--reverse" option is used.

All other columns will be used to generate the perfdata. Each field name is used as the name of the perfdata. The field value must contain your perfdata value and its unit appended to it. You can add as many fields as needed. Eg.:

    SELECT pg_database_size('postgres'), pg_database_size('postgres')||'B' AS db_size

Required privileges: unprivileged role (depends on the query).

database_size (8.1+)

Check the variation of database sizes, and return the size of every database.

This service uses the status file (see "--status-file" parameter).

Perfdata contains the size of each database and their size delta since last call.
Critical and Warning thresholds are optional. They are a list of optional 'label=value' separated by a comma. It allows you to fine-tune the alert based on the absolute "size" and/or the "delta" size. Eg.:

    -w 'size=500GB' -c 'size=600GB'
    -w 'delta=1%' -c 'delta=10%'
    -w 'size=500GB,delta=1%' -c 'size=600GB,delta=10GB'

The "size" label accepts either a raw number or a size and checks the total database size. The "delta" label accepts either a raw number, a percentage, or a size. The aim of the delta parameter is to detect unexpected database size variations. Delta thresholds are absolute values, and delta percentages are computed against the previous database size. The same label must be set for both warning and critical.

For backward compatibility, if a single raw number, percentage or size is given with no label, it applies to the size difference for each database since the last execution. Both thresholds below are equivalent:

    -w 'delta=1%' -c 'delta=10%'
    -w '1%' -c '10%'

This service supports both "--dbexclude" and "--dbinclude" parameters.

Required privileges: unprivileged role.

extensions_versions (9.1+)

Check all extensions installed in all databases (including templates) and raise a critical alert if the current version is not the default version available on the instance (according to pg_available_extensions).

Typically, it is used to detect forgotten extension upgrades after package upgrades or a pg_upgrade.

Perfdata returns the number of outdated extensions in each database.

This service supports both "--dbexclude" and "--dbinclude" parameters. Schemas are ignored, as an extension cannot be installed more than once in a database.

This service supports multiple "--exclude" arguments to exclude one or more extensions from the check. To ignore an extension only in a particular database, use the 'dbname/extension_name' syntax.
Examples:

    --dbexclude 'devdb' --exclude 'testdb/postgis' --exclude 'testdb/postgis_topology'
    --dbinclude 'proddb' --dbinclude 'testdb' --exclude 'powa'

Required privileges: unprivileged role able to log in all databases.

hit_ratio (all)

Check the cache hit ratio on the cluster.

This service uses the status file (see "--status-file" parameter).

Perfdata returns the cache hit ratio per database. Template databases and databases that do not allow connections will not be checked, nor will the databases which have never been accessed.

Critical and Warning thresholds are optional. They only accept a percentage.

This service supports both "--dbexclude" and "--dbinclude" parameters.

Required privileges: unprivileged role.

hot_standby_delta (9.0)

Check the data delta between a cluster and its hot standbys.

You must give the connection parameters for two or more clusters.

Perfdata returns the data delta in bytes between the master and each hot standby cluster listed.

Critical and Warning thresholds are optional. They can take one or two values separated by a comma. If only one value is given, it applies to both received and replayed data. If two values are given, the first one applies to received data, the second one to replayed data. These thresholds only accept a size (eg. 2.5G).

This service raises a Critical if it doesn't find exactly ONE valid master cluster (ie. critical when 0 or 2 and more masters).

Required privileges: unprivileged role.

is_hot_standby (9.0+)

Checks if the cluster is in recovery and accepts read only queries.

This service ignores critical and warning arguments.

No perfdata is returned.

Required privileges: unprivileged role.

is_master (all)

Checks if the cluster accepts read and/or write queries. This state is reported as "in production" by pg_controldata.

This service ignores critical and warning arguments.

No perfdata is returned.

Required privileges: unprivileged role.

invalid_indexes (8.2+)

Check if there are invalid indexes in a database.
A critical alert is raised if an invalid index is detected.

This service supports both "--dbexclude" and "--dbinclude" parameters. The 'postgres' database and templates are always excluded.

This service supports a "--exclude REGEX" parameter to exclude indexes matching a regular expression. The regular expression applies to "database.schema_name.index_name". This enables you to filter either on a relation name for all schemas and databases, on a qualified named index (schema + index) for all databases or on a qualified named index in only one database. You can use multiple "--exclude REGEX" parameters.

Perfdata will return the number of invalid indexes per database.

A list of invalid indexes will be returned after the perfdata. This list contains the fully qualified index name. If "--exclude REGEX" is set, the number of excluded indexes is returned.

Required privileges: unprivileged role able to log in all databases.

is_replay_paused (9.1+)

Checks if the replication is paused. The service will return UNKNOWN if executed on a master server.

Thresholds are optional. They must be specified as an interval. OK will always be returned if the standby is not paused, even if the replication delta time hits the thresholds.

Critical or warning are raised if the last reported replayed timestamp is greater than the given threshold AND some data received from the master are not applied yet. OK will always be returned if the standby is paused, or if the standby has already replayed everything from the master and until some write activity happens on the master.

Perfdata returned:
  * paused status (0 no, 1 yes, NaN if master)
  * lag time (in seconds)
  * data delta with master (0 no, 1 yes)

Required privileges: unprivileged role.

last_analyze (8.2+)

Check on each database that the oldest "analyze" (from autovacuum or not) is not older than the given threshold.

This service uses the status file (see "--status-file" parameter) with PostgreSQL 9.1+.

Perfdata returns the oldest "analyze" per database in seconds.
With PostgreSQL 9.1+, the number of [auto]analyses per database since last call is also returned.

Critical and Warning thresholds only accept an interval (eg. 1h30m25s) and apply to the oldest execution of analyze. Tables that were never analyzed, or whose analyze date was lost due to a crash, will raise a critical alert.

NOTE: this service does not raise alerts if the database had strictly no writes since last call. In consequence, a read-only database can have its oldest analyze reported in perfdata way after your thresholds, but not raise any alerts.

This service supports both "--dbexclude" and "--dbinclude" parameters. The 'postgres' database and templates are always excluded.

Required privileges: unprivileged role able to log in all databases.

last_vacuum (8.2+)

Check that the oldest vacuum (from autovacuum or otherwise) in each database in the cluster is not older than the given threshold.

This service uses the status file (see "--status-file" parameter) with PostgreSQL 9.1+.

Perfdata returns the oldest vacuum per database in seconds. With PostgreSQL 9.1+, it also returns the number of [auto]vacuums per database since last execution.

Critical and Warning thresholds only accept an interval (eg. 1h30m25s) and apply to the oldest vacuum. Tables that were never vacuumed, or whose vacuum date was lost due to a crash, will raise a critical alert.

NOTE: this service does not raise alerts if the database had strictly no writes since last call. In consequence, a read-only database can have its oldest vacuum reported in perfdata way after your thresholds, but not raise any alerts.

This service supports both "--dbexclude" and "--dbinclude" parameters. The 'postgres' database and templates are always excluded.

Required privileges: unprivileged role able to log in all databases.

locks (all)

Check the number of locks on the hosts.

Perfdata returns the number of locks, by type.

Critical and Warning thresholds accept either a raw number of locks or a percentage.
For percentage, it is computed using the following limits:

    for 7.4 to 8.1:
        max_locks_per_transaction * max_connections
    for 8.2+:
        max_locks_per_transaction * (max_connections + max_prepared_transactions)
    for 9.1+, regarding lockmode:
        max_locks_per_transaction * (max_connections + max_prepared_transactions)
      or
        max_pred_locks_per_transaction * (max_connections + max_prepared_transactions)

Required privileges: unprivileged role.

longest_query (all)

Check the longest running query in the cluster.

Perfdata contains the max/avg/min running time and the number of queries per database.

Critical and Warning thresholds only accept an interval.

This service supports both "--dbexclude" and "--dbinclude" parameters.

It also supports the argument "--exclude REGEX" to exclude queries matching the given regular expression from the check. Above 9.0, it also supports "--exclude REGEX" to filter out application_name. You can use multiple "--exclude REGEX" parameters.

Required privileges: an unprivileged role only checks its own queries; a pg_monitor (10+) or superuser (<10) role is required to check all queries.

max_freeze_age (all)

Checks the oldest database by transaction age.

Critical and Warning thresholds are optional. They accept either a raw number or a percentage for PostgreSQL 8.2 and more. If a percentage is given, the thresholds are computed based on the "autovacuum_freeze_max_age" parameter. 100% means that some table(s) reached the maximum age and will trigger an autovacuum freeze. Percentage thresholds should therefore be greater than 100%.

Even with no threshold, this service will raise a critical alert if a database has a negative age.

Perfdata returns the age of each database.

This service supports both "--dbexclude" and "--dbinclude" parameters.

Required privileges: unprivileged role.

minor_version (all)

Check if the cluster is running the most recent minor version of PostgreSQL.
Latest versions of PostgreSQL can be fetched from the PostgreSQL official website if check_pgactivity has access to it, or must be given as a parameter.

Without "--critical" or "--warning" parameters, this service attempts to fetch the latest version numbers online. A critical alert is raised if the minor version is not the most recent.

You can optionally set the path to your preferred retrieval tool using the "--path" parameter (eg. "--path '/usr/bin/wget'"). Supported programs are: GET, wget, curl, fetch, lynx, links, links2.

If you do not want to (or cannot) query the PostgreSQL website, provide the expected versions using either "--warning" OR "--critical", depending on which return value you want to raise. The given string must contain one or more MINOR versions separated by anything but a '.'. For instance, the following parameters are all equivalent:

    --critical "10.1 9.6.6 9.5.10 9.4.15 9.3.20 9.2.24 9.1.24 9.0.23 8.4.22"
    --critical "10.1, 9.6.6, 9.5.10, 9.4.15, 9.3.20, 9.2.24, 9.1.24, 9.0.23, 8.4.22"
    --critical "10.1,9.6.6,9.5.10,9.4.15,9.3.20,9.2.24,9.1.24,9.0.23,8.4.22"
    --critical "10.1/9.6.6/9.5.10/9.4.15/9.3.20/9.2.24/9.1.24/9.0.23/8.4.22"

Any value other than 3 numbers separated by dots (before version 10.x) or 2 numbers separated by dots (version 10 and above) will be ignored. If the running PostgreSQL major version is not found, the service raises an unknown status.

Perfdata returns the numerical version of PostgreSQL.

Required privileges: unprivileged role; access to http://www.postgresql.org required to download version numbers.

oldest_2pc (8.1+)

Check the oldest *two-phase commit transaction* (aka. prepared transaction) in the cluster.

Perfdata contains the max/avg age time and the number of prepared transactions per database.

Critical and Warning thresholds only accept an interval.

Required privileges: unprivileged role.

oldest_idlexact (8.3+)

Check the oldest *idle* transaction.
Perfdata contains the max/avg age and the number of idle transactions per database.

Critical and Warning thresholds only accept an interval.

This service supports both "--dbexclude" and "--dbinclude" parameters. Above 9.2, it supports "--exclude" to filter out connections. Eg., to filter out pg_dump and pg_dumpall, set this to 'pg_dump,pg_dumpall'.

Before 9.2, this service checks idle transactions using their start time. Thus, the service can mistakenly take into account transactions transiently in idle state. From 9.2 and up, the service checks for transactions that really had no activity since the given thresholds.

Required privileges: an unprivileged role checks only its own queries; a pg_monitor (10+) or superuser (<10) role is required to check all queries.

oldest_xmin (8.4+)

Check the xmin *horizon* from distinct sources of xmin retention.

By default, perfdata outputs the oldest known xmin age for each database among running queries, opened or idle transactions, pending prepared transactions, replication slots and walsenders. For versions prior to 9.4, only the "2pc" source of xmin retention is checked.

Using "--detailed", perfdata contains the oldest xmin and maximum age for the following sources of xmin retention: "query" (a running query), "active_xact" (an opened transaction currently executing a query), "idle_xact" (an opened transaction being idle), "2pc" (a pending prepared transaction), "repslot" (a replication slot) and "walsender" (a WAL sender replication process), for each connectable database. If a source doesn't retain any transaction for a database, NaN is returned. For versions prior to 9.4, only the "2pc" source of xmin retention is available, so other sources won't appear in the perfdata.

Note that xmin retention from walsenders is only set if "hot_standby_feedback" is enabled on the remote standby.

Critical and Warning thresholds are optional. They only accept a raw number of transactions.
This service supports both "--dbexclude" and "--dbinclude" parameters.

Required privileges: a pg_read_all_stats (10+) or superuser (<10) role is required to check pg_stat_replication. 2PC, pg_stat_activity, and replication slots don't require special privileges.

pg_dump_backup

Check the age and size of backups.

This service uses the status file (see "--status-file" parameter).

The "--path" argument contains the location of the backup folder. The supported format is a glob pattern matching every folder or file that you need to check.

The "--pattern" is required, and must contain a regular expression matching the backup file name, extracting the database name from the first matching group.

Optionally, a "--global-pattern" option can be supplied to check for an additional global file.

Examples:

To monitor backups like:

    /var/lib/backups/mydb-20150803.dump
    /var/lib/backups/otherdb-20150803.dump
    /var/lib/backups/mydb-20150804.dump
    /var/lib/backups/otherdb-20150804.dump

you must set:

    --path '/var/lib/backups/*' --pattern '(\w+)-\d+.dump'

If the path contains the date, like this:

    /var/lib/backups/2015-08-03-daily/mydb.dump
    /var/lib/backups/2015-08-03-daily/otherdb.dump

then you can set:

    --path '/var/lib/backups/*/*.dump' --pattern '/\d+-\d+-\d+-daily/(.*).dump'

For compatibility with pg_back (https://github.com/orgrim/pg_back), you should use:

    --path '/path/*{dump,sql}' --pattern '(\w+)_[0-9-_]+.dump' --global-pattern 'pg_global_[0-9-_]+.sql'

The "--critical" and "--warning" thresholds are optional. They accept a list of 'metric=value' separated by a comma. Available metrics are "oldest" and "newest", respectively the age of the oldest and newest backups, and "size", which must be the maximum variation of size since the last check, expressed as a size or a percentage. "mindeltasize", expressed in B, is the minimum variation of size needed to raise an alert.
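A "--pattern" regular expression can be sanity-checked outside of check_pgactivity before deploying it. The sketch below simulates the database-name extraction with sed, substituting POSIX character classes for Perl's \w and \d; the file names are made-up examples.

```shell
# Extract the database name (first capturing group) from backup file names,
# mimicking --pattern '(\w+)-\d+.dump' with a POSIX ERE.
for f in mydb-20150803.dump otherdb-20150804.dump; do
    echo "$f" | sed -E 's/^([A-Za-z0-9_]+)-[0-9]+\.dump$/\1/'
done
```

If the sed output is the bare database name for every sample file, the capturing group is placed correctly.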
This service supports the "--dbinclude" and "--dbexclude" arguments, to respectively test for the presence of include or exclude files.

The argument "--exclude" enables you to exclude files younger than an interval. This is useful to ignore files from a backup in progress. Eg., if your backup process takes 2h, set this to '125m'.

Perfdata returns the age of the oldest and newest backups, as well as the size of the newest backups.

Required privileges: unprivileged role; the system user needs read access on the directory containing the dumps (but not on the dumps themselves).

pga_version

Check if this script is running the given version of check_pgactivity. You must provide the expected version using either "--warning" OR "--critical".

No perfdata is returned.

Required privileges: none.

pgdata_permission (8.2+)

Check that the instance data directory rights are 700, and that it belongs to the system user currently running PostgreSQL.

The check on rights works on all Unix systems. Checking the user only works on Linux systems (it uses /proc to avoid dependencies).

Before 9.3, you need to provide the expected owner using the "--uid" argument, or the owner will not be checked.

Required privileges:
  <11: superuser
  v11: user with pg_monitor or pg_read_all_settings

The system user must also be able to read the folder containing PGDATA: the service has to be executed locally on the monitored server.

replication_slots (9.4+)

Check the number of WAL files retained and spilled files for each replication slot.

Perfdata returns the number of WAL files kept for each slot and the number of spilled files in pg_replslot for each logical replication slot. Since v13, if "max_slot_wal_keep_size" is greater than or equal to 0, perfdata reports the size of WAL to produce before each slot becomes "unreserved" or "lost". Note that this size can become negative for the limited time during which the slot status is "unreserved".
It is set to zero as soon as the last checkpoint has finished and the status becomes "lost".

This service needs superuser privileges to obtain the number of spill files, or returns 0 as a last resort.

Critical and Warning thresholds are optional. They accept either a raw number (for backward compatibility, only the wal threshold will be used) or a list of 'wal=value' and/or 'spilled=value' and/or 'remaining=size': respectively the number of kept WAL files, the number of spilled files in pg_replslot for each logical slot, and the remaining bytes before a slot becomes "unreserved" or "lost".

Moreover, with v13 and after, the service raises a warning alert if a slot becomes "unreserved". It raises a critical alert if the slot becomes "lost".

Required privileges:
  v9.4: unprivileged role, or superuser to monitor spilled files for logical replication
  v11+: unprivileged user with GRANT EXECUTE on function pg_ls_dir(text)

Here are some examples:

    -w 'wal=50,spilled=20' -c 'wal=100,spilled=40'
    -w 'spilled=20,remaining=160MB' -c 'spilled=40,remaining=48MB'

settings (9.0+)

Check if the current settings have changed since they were stored in the service file.

The "known" settings are recorded during the very first call of the service. To update the known settings after a configuration change, call this service again with the argument "--save".

No perfdata.

Critical and Warning thresholds are ignored. A Critical is raised if at least one parameter changed.

Required privileges: unprivileged role.

sequences_exhausted (7.4+)

Check all sequences and raise an alarm if the column or sequence gets too close to the maximum value.

The maximum value is calculated from the maxvalue of the sequence or from the column type when the sequence is owned by a column (the smallserial, serial and bigserial types).

Perfdata returns the sequences that trigger the alert.

This service supports both "--dbexclude" and "--dbinclude" parameters. The 'postgres' database and templates are always excluded.
Critical and Warning thresholds accept a percentage of the sequence filled.

Required privileges: unprivileged role able to log in all databases.

stat_snapshot_age (9.5 to 14 included)

Check the age of the statistics snapshot (statistics collector's statistics). This probe helps to detect a frozen stats collector process.

Perfdata returns the statistics snapshot age.

Critical and Warning thresholds only accept an interval (eg. 1h30m25s).

Required privileges: unprivileged role.

streaming_delta (9.1+)

Check the data delta between a cluster and its standbys in streaming replication.

Optional argument "--slave" allows you to specify some slaves that MUST be connected. This argument can be used as many times as desired to check multiple slave connections, or you can specify multiple slave connections at one time, using comma-separated values. Both methods can be used in a single call. The provided values must be of the form "APPLICATION_NAME IP". Both following examples will check for the presence of two slaves:

    --slave 'slave1 192.168.1.11' --slave 'slave2 192.168.1.12'
    --slave 'slave1 192.168.1.11','slave2 192.168.1.12'

This service supports a "--exclude REGEX" parameter to exclude every result matching a regular expression on the application_name or IP address fields. You can use multiple "--exclude REGEX" parameters.

Perfdata returns the data delta in bytes between the master and every standby found, the number of standbys connected and the number of excluded standbys.

Critical and Warning thresholds are optional. They can take one or two values separated by a comma. If only one value is supplied, it applies to both flushed and replayed data. If two values are supplied, the first one applies to flushed data, the second one to replayed data. These thresholds only accept a size (eg. 2.5G).

Required privileges: unprivileged role.

table_unlogged (9.5+)

Check if tables are changed to unlogged. In 9.5, you can switch a table between logged and unlogged.
Without "--critical" or "--warning" parameters, this service attempts to fetch all unlogged tables. A critical alert is raised if an unlogged table is detected.

This service supports both "--dbexclude" and "--dbinclude" parameters. The 'postgres' database and templates are always excluded.

This service supports a "--exclude REGEX" parameter to exclude relations matching a regular expression. The regular expression applies to "database.schema_name.relation_name". This enables you to filter either on a relation name for all schemas and databases, on a qualified named relation (schema + relation) for all databases or on a qualified named relation in only one database. You can use multiple "--exclude REGEX" parameters.

Perfdata will return the number of unlogged tables per database.

A list of the unlogged tables will be returned after the perfdata. This list contains the fully qualified table name. If "--exclude REGEX" is set, the number of excluded tables is returned.

Required privileges: unprivileged role able to log in all databases, or at least those in "--dbinclude".

table_bloat

Estimate bloat on tables.

Warning and critical thresholds accept a comma-separated list of either raw numbers (for a size), sizes (eg. 125M) or percentages. The thresholds apply to bloat size, not object size. If a percentage is given, the threshold will apply to the bloat size compared to the table + TOAST size. If multiple threshold values are passed, check_pgactivity will choose the largest (bloat size) value.

This service supports both "--dbexclude" and "--dbinclude" parameters. The 'postgres' database and templates are always excluded.

This service supports a "--exclude REGEX" parameter to exclude relations matching the given regular expression. The regular expression applies to "database.schema_name.relation_name".
This enables you to filter either on a relation name for all schemas and databases, on a qualified named relation (schema + relation) for all databases or on a qualified named relation in only one database. You can use multiple "--exclude REGEX" parameters.

Warning: with a non-superuser role, this service can only check the tables that the given role is granted to read!

Perfdata will return the number of tables matching the warning and critical thresholds, per database.

A list of the bloated tables will be returned after the perfdata. This list contains the fully qualified bloated table name, the estimated bloat size, the table size and the bloat percentage.

Required privileges: superuser (<10) able to log in all databases, or at least those in "--dbinclude"; on PostgreSQL 10+, a user with the role pg_monitor suffices, provided that you grant SELECT on the system table pg_statistic to the pg_monitor role, in each database of the cluster: "GRANT SELECT ON pg_statistic TO pg_monitor;"

temp_files (8.1+)

Check the number and size of temp files.

This service uses the status file (see "--status-file" parameter) for 9.2+.

Perfdata returns the number and total size of temp files found in "pgsql_tmp" folders. They are aggregated by database until 8.2, then by tablespace (see GUC temp_tablespaces). Starting with 9.2, perfdata also returns the number of temp files per database since last run, the total size of temp files per database since last run and the rate at which temp files were generated.

Critical and Warning thresholds are optional. They accept either a number of files (raw value), a size (unit is mandatory to define a size) or both values separated by a comma. Thresholds are applied to current temp files being created AND the number/size of temp files created since last execution.
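The 'number,size' pair above means an alert fires when EITHER limit is crossed. The following is a toy re-implementation of that logic in shell, only to make the semantics concrete: the thresholds and measured values are arbitrary, and this is not check_pgactivity's actual code.

```shell
# Evaluate a temp_files-style warning threshold of the form '300,2GB'
# against sample measurements (all numbers made up for illustration).
warn_files=300; warn_bytes=$((2 * 1024 * 1024 * 1024))   # -w '300,2GB'
cur_files=412;  cur_bytes=$((1 * 1024 * 1024 * 1024))    # measured values
if [ "$cur_files" -gt "$warn_files" ] || [ "$cur_bytes" -gt "$warn_bytes" ]; then
    echo "WARNING"
else
    echo "OK"
fi
```

Here the file count (412 > 300) trips the alert even though the total size stays under 2GB.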
Required privileges:
  <10: superuser
  v10: an unprivileged role is possible but it will not monitor databases that it cannot access, nor live temp files
  v11: an unprivileged role is possible but must be granted EXECUTE on functions pg_ls_dir(text), pg_read_file(text) and pg_stat_file(text, boolean); the same restrictions as on v10 still apply
  v12+: a role with the pg_monitor privilege

uptime (8.1+)

Returns time since postmaster start ("uptime", from 8.1), since configuration reload (from 8.4), and since shared memory initialization (from 10).

Please note that the uptime is unaffected when the postmaster resets all its children (for example after a kill -9 on a process or a failure). From 10+, the 'time since shared memory init' aims at detecting this situation: in fact we use the age of the oldest non-client child process (usually checkpointer, writer or startup). This needs pg_monitor access to read pg_stat_activity.

Critical and Warning thresholds are optional. If both are set, Critical is raised when the postmaster uptime or the time since shared memory initialization is less than the critical threshold. Warning is raised when the time since configuration reload is less than the warning threshold. If only a warning or critical threshold is given, it will be used for both cases. Obviously these alerts will disappear by themselves once enough time has passed.

Perfdata contains the three values (when available).

Required privileges: pg_monitor on PG10+; otherwise unprivileged role.

wal_files (8.1+)

Check the number of WAL files.

Perfdata returns the total number of WAL files, the current number of written WALs, the current number of recycled WALs, the rate of WAL written to disk since the last execution on the master cluster, and the current timeline.

Critical and Warning thresholds accept either a raw number of files or a percentage.
In case of percentage, the limit is computed based on:

  100% = 1 + checkpoint_segments * (2 + checkpoint_completion_target)

For PostgreSQL 8.1 and 8.2:

  100% = 1 + checkpoint_segments * 2

If "wal_keep_segments" is set for 9.0 to 9.4, the limit is the greatest of the following formulas:

  100% = 1 + checkpoint_segments * (2 + checkpoint_completion_target)
  100% = 1 + wal_keep_segments + 2 * checkpoint_segments

For 9.5 to 12, the limit is:

  100% = max_wal_size (as a number of WAL) + wal_keep_segments (if set)

For 13 and above:

  100% = max_wal_size + wal_keep_size (as numbers of WAL)

Required privileges:

  <10: superuser;
  v10: unprivileged user with pg_monitor;
  v11+: unprivileged user with pg_monitor, or with GRANT EXECUTE on function pg_ls_waldir.

EXAMPLES

Execute service "last_vacuum" on host "host=localhost port=5432":

  check_pgactivity -h localhost -p 5432 -s last_vacuum -w 30m -c 1h30m

Execute service "hot_standby_delta" between hosts "service=pg92" and "service=pg92s":

  check_pgactivity --dbservice pg92,pg92s --service hot_standby_delta -w 32MB -c 160MB

Execute service "streaming_delta" on host "service=pg92" to check its slave "stby1" with the IP address "192.168.1.11":

  check_pgactivity --dbservice pg92 --slave "stby1 192.168.1.11" --service streaming_delta -w 32MB -c 160MB

Execute service "hit_ratio" on host "slave", port "5433", excluding databases matching the regexps "idelone" and "(?i:sleep)":

  check_pgactivity -p 5433 -h slave --service hit_ratio --dbexclude idelone --dbexclude "(?i:sleep)" -w 90% -c 80%

Execute service "hit_ratio" on host "slave", port "5433", only for databases matching the regexp "importantone":

  check_pgactivity -p 5433 -h slave --service hit_ratio --dbinclude importantone -w 90% -c 80%

VERSION

check_pgactivity version 2.7, released on Mon Sep 25 2023.

LICENSING

This program is open source, licensed under the PostgreSQL license. For license terms, see the LICENSE provided with the sources.
AUTHORS

Author: Open PostgreSQL Monitoring Development Group
Copyright: (C) 2012-2023 Open PostgreSQL Monitoring Development Group

=head1 NAME

check_pgactivity - PostgreSQL plugin for Nagios

=head1 SYNOPSIS

  check_pgactivity {-w|--warning THRESHOLD} {-c|--critical THRESHOLD} [-s|--service SERVICE] [-h|--host HOST] [-U|--username ROLE] [-p|--port PORT] [-d|--dbname DATABASE] [-S|--dbservice SERVICE_NAME] [-P|--psql PATH] [--debug] [--status-file FILE] [--path PATH] [-t|--timeout TIMEOUT]
  check_pgactivity [-l|--list]
  check_pgactivity [--help]

=head1 DESCRIPTION

check_pgactivity is designed to monitor PostgreSQL clusters from Nagios. It offers many options to measure and monitor useful performance metrics.

=head1 COMPATIBILITY

Each service is available from a different PostgreSQL version, from 7.4, as documented below. The psql client must be version 8.3 at least; it can be used with an older server. Please report any undocumented incompatibility.

=over

=item B<-s>, B<--service> SERVICE

The Nagios service to run. See section SERVICES for a description of available services, or use C<--list> for a short list of services and their descriptions.

=item B<-h>, B<--host> HOST

Database server host or socket directory (default: $PGHOST or "localhost").

See section C for more information.

=item B<-U>, B<--username> ROLE

Database user name (default: $PGUSER or "postgres").

See section C for more information.

=item B<-p>, B<--port> PORT

Database server port (default: $PGPORT or "5432").

See section C for more information.

=item B<-d>, B<--dbname> DATABASE

Database name to connect to (default: $PGDATABASE or "template1").

B! This is not necessarily one of the databases that will be checked. See C<--dbinclude> and C<--dbexclude>.

See section C for more information.

=item B<-S>, B<--dbservice> SERVICE_NAME

The connection service name from pg_service.conf to use.
See section C for more information.

=item B<--dbexclude> REGEXP

Some services automatically check all the databases of your cluster (note: that does not mean they always need to connect to all of them to check them). C<--dbexclude> excludes any database whose name matches the given Perl regular expression. Repeat this option as many times as needed.

See C<--dbinclude> as well. If a database matches both dbexclude and dbinclude arguments, it is excluded.

=item B<--dbinclude> REGEXP

Some services automatically check all the databases of your cluster (note: that does not imply that they always need to connect to all of them). Some always exclude the 'postgres' database and templates. C<--dbinclude> checks B databases whose names match the given Perl regular expression. Repeat this option as many times as needed.

See C<--dbexclude> as well. If a database matches both dbexclude and dbinclude arguments, it is excluded.

=item B<-w>, B<--warning> THRESHOLD

The Warning threshold.

=item B<-c>, B<--critical> THRESHOLD

The Critical threshold.

=item B<-F>, B<--format> OUTPUT_FORMAT

The output format. Supported output formats are: C, C, C, C, C, C and C.

Using the C format, the results are written in a binary file (using the Perl module C) given in argument C<--output>. If no output is given, defaults to file C in the same directory as the script.

The C and C formats are equivalent to the C and C formats respectively. The only difference is that they enforce the units to follow the strict Nagios specs: B, c, s or %. Any unit absent from this list is dropped (Bps, Tps, etc).

=item B<--tmpdir> DIRECTORY

Path to a directory where the script can create temporary files. The script relies on the system default temporary directory if possible.

=item B<-P>, B<--psql> FILE

Path to the C executable (default: "psql"). It should be version 8.3 at least, but the server can be older.

=item B<--status-file> PATH

Path to the file where service status information is kept between successive calls.
Default is to save a file called C in the same directory as the script.

Note that this file is protected from concurrent writes using a lock file located in the same directory, with the same name as the status file but with the extension C<.lock>. On some platforms, network filesystems may not be correctly supported by the locking mechanism. See C for more information.

=item B<--dump-status-file>

Dump the content of the status file and exit. This is useful for debugging purposes.

=item B<--dump-bin-file> [PATH]

Dump the content of the given binary file previously created using C<--format binary>. If no path is given, defaults to file C in the same directory as the script.

=item B<-t>, B<--timeout> TIMEOUT

Timeout (default: "30s"), given either as raw seconds or as an interval. This timeout will be used as C for psql and as the URL timeout for the C service.

=item B<-l>, B<--list>

List available services.

=item B<-V>, B<--version>

Print version and exit.

=item B<--debug>

Print some debug messages.

=item B<-?>, B<--help>

Show this help page.

=back

=head2 THRESHOLDS

THRESHOLDS provided as warning and critical values can be raw numbers, percentages, intervals or sizes. Each available service supports one or more formats (eg. a size and a percentage).

=over

=item B

If THRESHOLD is a percentage, the value should end with a '%' (no space). For instance: 95%.

=item B

If THRESHOLD is an interval, the following units are accepted (not case sensitive): s (second), m (minute), h (hour), d (day). You can use more than one unit per given value. If not set, the last unit is in seconds. For instance: "1h 55m 6" = "1h55m6s".

=item B

If THRESHOLD is a size, the following units are accepted (not case sensitive): b (Byte), k (KB), m (MB), g (GB), t (TB), p (PB), e (EB) or Z (ZB). Only integers are accepted. Eg. C<1.5MB> will be refused, use C<1500kB>.

The factor between units is 1024 bytes. Eg.
C<1g = 1G = 1024*1024*1024.>

=back

=head2 CONNECTIONS

check_pgactivity allows two different connection specifications: by service, or by specifying values for host, user, port, and database. Some services can run on multiple hosts, or need to connect to multiple hosts.

You might specify one of the parameters below to connect to your PostgreSQL instance. If you don't, no connection parameters are given to psql: the connection relies on binary defaults and environment.

The format for connection parameters is:

=over

=item B C<--dbservice SERVICE_NAME>

Define a new host using the given service. Multiple hosts can be defined by listing multiple services separated by a comma. Eg.

  --dbservice service1,service2

For more information about service definition, see: L

=item B C<--host HOST>, C<--port PORT>, C<--user ROLE> or C<--dbname DATABASE>

One parameter is enough to define a new host. Usual environment variables (PGHOST, PGPORT, PGDATABASE, PGUSER, PGSERVICE, PGPASSWORD) or default values are used for missing parameters.

As for usual PostgreSQL tools, there is no command line argument to set the password, to avoid exposing it. Use PGPASSWORD, .pgpass or a service file (recommended).

If multiple values are given, as many hosts are defined as the maximum number of given values. Values are associated by position. Eg.:

  --host h1,h2 --port 5432,5433

Means "host=h1 port=5432" and "host=h2 port=5433".

If the number of values differs between parameters, any host missing a parameter will use the first given value for this parameter. Eg.:

  --host h1,h2 --port 5433

Means: "host=h1 port=5433" and "host=h2 port=5433".

=item B

For instance:

  --dbservice s1 --host h1 --port 5433

means: use "service=s1" and "host=h1 port=5433" in this order. If the service supports only one host, the second host is ignored.
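The pairing rule above can be sketched as follows (illustrative Python, not the tool's Perl implementation; the function name is hypothetical): values are matched by position, and a list that is too short falls back to its first value for the remaining hosts.

```python
# Illustrative only: pair --host and --port value lists the way the
# documentation describes. Each list is assumed non-empty; a missing
# value falls back to the FIRST value given for that parameter.
def pair_connections(hosts, ports):
    n = max(len(hosts), len(ports))
    return [(hosts[i] if i < len(hosts) else hosts[0],
             ports[i] if i < len(ports) else ports[0])
            for i in range(n)]
```

For example, `pair_connections(['h1', 'h2'], ['5433'])` yields "host=h1 port=5433" and "host=h2 port=5433", matching the second example above.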
=item B

You cannot overwrite service connection variables with the parameters C<--host HOST>, C<--port PORT>, C<--user ROLE> or C<--dbname DATABASE>.

=back

=head2 SERVICES

Descriptions and parameters of available services.

=over

=item B

Check if all archived WALs exist between the oldest and the latest WAL in the archive folder and make sure they are 16MB.

The given folder must have archived files from ONE cluster. The version of PostgreSQL that created the archives is only checked on the last one, for performance considerations.

This service requires the argument C<--path> on the command line to specify the archive folder path to check. Obviously, it must have access to this folder at the filesystem level: you may have to execute it on the archiving server rather than on the PostgreSQL instance.

The optional argument C<--suffix> defines the suffix of your archived WALs; this is useful for compressed WALs (eg. .gz, .bz2, ...). Default is no suffix.

This service needs to read the header of one of the archives to define how many segments a WAL owns. Check_pgactivity automatically handles files with extensions .gz, .bz2, .xz, .zip or .7z using the following commands:

  gzip -dc
  bzip2 -dc
  xz -dc
  unzip -qqp
  7z x -so

If needed, provide your own command that writes the uncompressed file to standard output with the C<--unarchiver> argument.

The optional argument C<--ignore-wal-size> skips the WAL size check. This is useful if your archived WALs are compressed and check_pgactivity is unable to guess the original size. Here are the commands check_pgactivity uses to guess the original size of .gz, .xz or .zip files:

  gzip -ql
  xz -ql
  unzip -qql

Default behaviour is to check the WALs size.

Perfdata contains the number of archived WALs and the age of the most recent one.

Critical and Warning define the max age of the latest archived WAL as an interval (eg. 5m or 300s).

Required privileges: unprivileged role; the system user needs read access to archived WAL files.
Sample commands:

  check_pgactivity -s archive_folder --path /path/to/archives -w 15m -c 30m
  check_pgactivity -s archive_folder --path /path/to/archives --suffix .gz -w 15m -c 30m
  check_pgactivity -s archive_folder --path /path/to/archives --ignore-wal-size --suffix .bz2 -w 15m -c 30m
  check_pgactivity -s archive_folder --path /path/to/archives --unarchiver "unrar p" --ignore-wal-size --suffix .rar -w 15m -c 30m

=item B (8.1+)

Check if the archiver is working properly and the number of WAL files ready to archive.

Perfdata returns the number of WAL files waiting to be archived.

Critical and Warning thresholds are optional. They apply to the number of files waiting to be archived. They only accept a raw number of files.

Whatever the given threshold, a critical alert is raised if the archiver process did not archive the oldest WAL waiting to be archived since the last call.

Required privileges: superuser (<10); unprivileged role (10+).

=item B (8.1+)

Check the autovacuum activity on the cluster.

Perfdata contains the age of the oldest running autovacuum and the number of workers by type (VACUUM, VACUUM ANALYZE, ANALYZE, VACUUM FREEZE).

Thresholds, if any, are ignored.

Required privileges: unprivileged role.

=item B (all)

Check the total number of connections in the PostgreSQL cluster.

Perfdata contains the number of connections per database.

Critical and Warning thresholds accept either a raw number or a percentage (eg. 80%). When a threshold is a percentage, it is compared to the difference between the cluster parameters C and C.

Required privileges: an unprivileged user only sees its own queries; a pg_monitor (10+) or superuser (<10) role is required to see all queries.

=item B (8.2+)

Check the status of all backends. Depending on your PostgreSQL version, statuses are: C, C, C (>=9.0 only), C, C, C, C, C and C.

B appears when you are not allowed to see the statuses of other connections.

This service supports the argument C<--exclude REGEX> to exclude queries matching the given regular expression.
You can use multiple C<--exclude REGEX> arguments.

Critical and Warning thresholds are optional. They accept a list of 'status_label=value' separated by a comma. Available labels are C, C, C, C, C and C. Values are raw numbers or time units; empty lists are forbidden. Here is an example:

  -w 'waiting=5,idle_xact=10' -c 'waiting=20,idle_xact=30,active=1d'

Perfdata contains the number of backends for each status and the oldest one for each of them, for 8.2+.

Note that the number of backends reported in the Nagios message B excluded backends.

Required privileges: an unprivileged user only sees its own queries; a pg_monitor (10+) or superuser (<10) role is required to see all queries.

=item B (12+)

Check for data checksum errors, reported in pg_stat_database.

This service requires that data checksums are enabled on the target instance. UNKNOWN will be returned if that's not the case.

Critical and Warning thresholds are optional. They only accept a raw number of checksum errors per database. If the thresholds are not provided, a default value of 1 will be used for both thresholds.

Checksum errors are CRITICAL issues, so it's highly recommended to keep the default thresholds, as immediate action should be taken as soon as such a problem arises.

Perfdata contains the number of errors per database.

Required privileges: unprivileged user.

=item B (8.1+)

Check the age of the backup label file.

Perfdata returns the age of the backup_label file, -1 if not present.

Critical and Warning thresholds only accept an interval (eg. 1h30m25s).

Required privileges: grant execute on function pg_stat_file(text, boolean) (pg12+); unprivileged role (9.3+); superuser (<9.3).

=item B (8.3+)

Check the percentage of pages written by backends since last check.

This service uses the status file (see C<--status-file> parameter).

Perfdata contains the ratio per second for each C counter since last execution. The unit Nps, for checkpoints, maxwritten_clean and fsyncs, is the number of "events" per second.
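To illustrate the kind of computation involved (a hedged sketch under assumed column names from pg_stat_bgwriter before PostgreSQL 17, not the plugin's actual code), the percentage of buffers written directly by backends between two snapshots could be derived as:

```python
# Hypothetical snapshots of pg_stat_bgwriter counters (pre-PG17
# column names). The backend write percentage is the backend delta
# over the sum of all buffer-write deltas between two checks.
def backend_write_pct(prev, cur):
    keys = ('buffers_checkpoint', 'buffers_clean', 'buffers_backend')
    deltas = {k: cur[k] - prev[k] for k in keys}
    total = sum(deltas.values())
    return 0.0 if total == 0 else 100.0 * deltas['buffers_backend'] / total
```

A high ratio suggests that the background writer and checkpointer are not keeping up, forcing backends to write dirty buffers themselves.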
Critical and Warning thresholds are optional. If set, they I accept a percentage.

Required privileges: unprivileged role.

=item B

Estimate bloat on B-tree indexes.

Warning and critical thresholds accept a comma-separated list of either raw numbers (for a size), sizes (eg. 125M) or percentages. The thresholds apply to B size, not object size. If a percentage is given, the threshold will apply to the bloat size compared to the total index size. If multiple threshold values are passed, check_pgactivity will choose the largest (bloat size) value.

This service supports both C<--dbexclude> and C<--dbinclude> parameters. The 'postgres' database and templates are always excluded.

It also supports a C<--exclude REGEX> parameter to exclude relations matching a regular expression. The regular expression applies to "database.schema_name.relation_name". This enables you to filter either on a relation name for all schemas and databases, on a qualified named relation (schema + relation) for all databases, or on a qualified named relation in only one database. You can use multiple C<--exclude REGEX> parameters.

Perfdata will return the number of indexes of concern, by warning and critical threshold per database.

A list of the bloated indexes will be returned after the perfdata. This list contains the fully qualified bloated index name, the estimated bloat size, the index size and the bloat percentage.

Required privileges: superuser (<10) able to log in all databases, or at least those in C<--dbinclude>; on PostgreSQL 10+, a user with the role pg_monitor suffices, provided that you grant SELECT on the system table pg_statistic to the pg_monitor role, in each database of the cluster: C<GRANT SELECT ON pg_statistic TO pg_monitor;>

=item B (14+)

Gather miscellaneous session statistics.

This service uses the status file (see C<--status-file> parameter).
Perfdata contains the session / active / idle-in-transaction times for each database since last call, as well as the number of sessions per second, and the number of sessions killed / abandoned / terminated by fatal errors.

Required privileges: unprivileged role.

=item B (all)

Check the commit and rollback rate per second since last call.

This service uses the status file (see --status-file parameter).

Perfdata contains the commit rate, rollback rate, transaction rate and rollback ratio for each database since last call.

Critical and Warning thresholds are optional. They accept a list of comma-separated 'label=value'. Available labels are B, B and B, which will be compared to the number of rollbacks, the rollback rate and the rollback ratio of each database. Warning or critical will be raised if the reported value is greater than B, B or B.

Required privileges: unprivileged role.

=item B (8.0+)

Check the most important settings.

Warning and Critical thresholds are ignored.

Specific parameters are: C<--work_mem>, C<--maintenance_work_mem>, C<--shared_buffers>, C<--wal_buffers>, C<--checkpoint_segments>, C<--effective_cache_size>, C<--no_check_autovacuum>, C<--no_check_fsync>, C<--no_check_enable>, C<--no_check_track_counts>.

Required privileges: unprivileged role.

=item B (all)

Perform a simple connection test.

No perfdata is returned.

This service ignores critical and warning arguments.

Required privileges: unprivileged role.

=item B (all)

Perform the given user query.

Specify the query with C<--query>. The first column will be used to perform the test for the status if warning and critical are provided.

The warning and critical arguments are optional. They can be of format integer (default), size or time, depending on the C<--type> argument. Warning and Critical will be raised if they are greater than the first column, or less if the C<--reverse> option is used.

All other columns will be used to generate the perfdata.
Each field name is used as the name of the perfdata. The field value must contain your perfdata value with its unit appended to it. You can add as many fields as needed. Eg.:

  SELECT pg_database_size('postgres'), pg_database_size('postgres')||'B' AS db_size

Required privileges: unprivileged role (depends on the query).

=item B (8.1+)

B of database sizes, and B of every database.

This service uses the status file (see C<--status-file> parameter).

Perfdata contains the size of each database and their size delta since last call.

Critical and Warning thresholds are optional. They are a list of optional 'label=value' separated by a comma. This allows fine-tuning the alert based on the absolute C and/or the C size. Eg.:

  -w 'size=500GB' -c 'size=600GB'
  -w 'delta=1%' -c 'delta=10%'
  -w 'size=500GB,delta=1%' -c 'size=600GB,delta=10GB'

The C label accepts either a raw number or a size, and checks the total database size. The C label accepts either a raw number, a percentage, or a size. The aim of the delta parameter is to detect unexpected database size variations. Delta thresholds are absolute values, and delta percentages are computed against the previous database size. The same label must be filled for both warning and critical.

For backward compatibility, if a single raw number, percentage or size is given with no label, it applies to the size difference for each database since the last execution. Both thresholds below are equivalent:

  -w 'delta=1%' -c 'delta=10%'
  -w '1%' -c '10%'

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

Required privileges: unprivileged role.

=item B (9.1+)

Check all extensions installed in all databases (including templates) and raise a critical alert if the current version is not the default version available on the instance (according to pg_available_extensions).

Typically, it is used to detect forgotten extension upgrades after package upgrades or a pg_upgrade.
Perfdata returns the number of outdated extensions in each database.

This service supports both C<--dbexclude> and C<--dbinclude> parameters. Schemas are ignored, as an extension cannot be installed more than once in a database.

This service supports multiple C<--exclude> arguments to exclude one or more extensions from the check. To ignore an extension only in a particular database, use the 'dbname/extension_name' syntax. Examples:

  --dbexclude 'devdb' --exclude 'testdb/postgis' --exclude 'testdb/postgis_topology'
  --dbinclude 'proddb' --dbinclude 'testdb' --exclude 'powa'

Required privileges: unprivileged role able to log in all databases.

=item B (all)

Check the cache hit ratio on the cluster.

This service uses the status file (see C<--status-file> parameter).

Perfdata returns the cache hit ratio per database. Template databases and databases that do not allow connections will not be checked, nor will databases that have never been accessed.

Critical and Warning thresholds are optional. They only accept a percentage.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

Required privileges: unprivileged role.

=item B (9.0)

Check the data delta between a cluster and its hot standbys.

You must give the connection parameters for two or more clusters.

Perfdata returns the data delta in bytes between the master and each hot standby cluster listed.

Critical and Warning thresholds are optional. They can take one or two values separated by a comma. If only one value is given, it applies to both received and replayed data. If two values are given, the first one applies to received data, the second one to replayed data. These thresholds only accept a size (eg. 2.5G).

This service raises a Critical if it doesn't find exactly ONE valid master cluster (ie. critical when there are 0, or 2 or more, masters).

Required privileges: unprivileged role.

=item B (9.0+)

Checks if the cluster is in recovery and accepts read-only queries.
This service ignores critical and warning arguments.

No perfdata is returned.

Required privileges: unprivileged role.

=item B (all)

Checks if the cluster accepts read and/or write queries. This state is reported as "in production" by pg_controldata.

This service ignores critical and warning arguments.

No perfdata is returned.

Required privileges: unprivileged role.

=item B (8.2+)

Check if there are invalid indexes in a database. A critical alert is raised if an invalid index is detected.

This service supports both C<--dbexclude> and C<--dbinclude> parameters. The 'postgres' database and templates are always excluded.

This service supports a C<--exclude REGEX> parameter to exclude indexes matching a regular expression. The regular expression applies to "database.schema_name.index_name". This enables you to filter either on a relation name for all schemas and databases, on a qualified named index (schema + index) for all databases, or on a qualified named index in only one database. You can use multiple C<--exclude REGEX> parameters.

Perfdata will return the number of invalid indexes per database.

A list of invalid indexes will be returned after the perfdata. This list contains the fully qualified index name. If any indexes are excluded, the number of excluded indexes is returned.

Required privileges: unprivileged role able to log in all databases.

=item B (9.1+)

Checks if the replication is paused. The service will return UNKNOWN if executed on a master server.

Thresholds are optional. They must be specified as an interval. OK will always be returned if the standby is not paused, even if the replication delta time hits the thresholds.

Critical or warning are raised if the last reported replayed timestamp is greater than the given threshold AND some data received from the master are not applied yet. OK will always be returned if the standby is paused, or if the standby has already replayed everything from the master and until some write activity happens on the master.
Perfdata returned:

  * paused status (0 no, 1 yes, NaN if master)
  * lag time (in seconds)
  * data delta with master (0 no, 1 yes)

Required privileges: unprivileged role.

=item B (8.2+)

Check on each database that the oldest C (from autovacuum or not) is not older than the given threshold.

This service uses the status file (see C<--status-file> parameter) with PostgreSQL 9.1+.

Perfdata returns the oldest C per database in seconds. With PostgreSQL 9.1+, the number of [auto]analyses per database since last call is also returned.

Critical and Warning thresholds only accept an interval (eg. 1h30m25s) and apply to the oldest execution of analyze.

Tables that were never analyzed, or whose analyze date was lost due to a crash, will raise a critical alert.

B: this service does not raise alerts if the database had strictly no writes since last call. As a consequence, a read-only database can have its oldest analyze reported in perfdata way beyond your thresholds without raising any alert.

This service supports both C<--dbexclude> and C<--dbinclude> parameters. The 'postgres' database and templates are always excluded.

Required privileges: unprivileged role able to log in all databases.

=item B (8.2+)

Check that the oldest vacuum (from autovacuum or otherwise) in each database of the cluster is not older than the given threshold.

This service uses the status file (see C<--status-file> parameter) with PostgreSQL 9.1+.

Perfdata returns the oldest vacuum per database in seconds. With PostgreSQL 9.1+, it also returns the number of [auto]vacuums per database since last execution.

Critical and Warning thresholds only accept an interval (eg. 1h30m25s) and apply to the oldest vacuum.

Tables that were never vacuumed, or whose vacuum date was lost due to a crash, will raise a critical alert.

B: this service does not raise alerts if the database had strictly no writes since last call.
As a consequence, a read-only database can have its oldest vacuum reported in perfdata way beyond your thresholds without raising any alert.

This service supports both C<--dbexclude> and C<--dbinclude> parameters. The 'postgres' database and templates are always excluded.

Required privileges: unprivileged role able to log in all databases.

=item B (all)

Check the number of locks on the hosts.

Perfdata returns the number of locks, by type.

Critical and Warning thresholds accept either a raw number of locks or a percentage. For percentage, it is computed using the following limits:

for 7.4 to 8.1:

  max_locks_per_transaction * max_connections

for 8.2+:

  max_locks_per_transaction * (max_connections + max_prepared_transactions)

for 9.1+, depending on the lock mode:

  max_locks_per_transaction * (max_connections + max_prepared_transactions)
  or max_pred_locks_per_transaction * (max_connections + max_prepared_transactions)

Required privileges: unprivileged role.

=item B (all)

Check the longest running query in the cluster.

Perfdata contains the max/avg/min running time and the number of queries per database.

Critical and Warning thresholds only accept an interval.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

It also supports the argument C<--exclude REGEX> to exclude queries matching the given regular expression from the check. Above 9.0, it also supports C<--exclude REGEX> to filter out application_name. You can use multiple C<--exclude REGEX> parameters.

Required privileges: an unprivileged role only checks its own queries; a pg_monitor (10+) or superuser (<10) role is required to check all queries.

=item B (all)

Checks the oldest database by transaction age.

Critical and Warning thresholds are optional. They accept either a raw number or a percentage for PostgreSQL 8.2 and above. If a percentage is given, the thresholds are computed based on the "autovacuum_freeze_max_age" parameter.
100% means that some table(s) reached the maximum age and will trigger an autovacuum freeze. Percentage thresholds should therefore be greater than 100%.

Even with no threshold, this service will raise a critical alert if a database has a negative age.

Perfdata returns the age of each database.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

Required privileges: unprivileged role.

=item B (all)

Check if the cluster is running the most recent minor version of PostgreSQL.

Latest versions of PostgreSQL can be fetched from the PostgreSQL official website if check_pgactivity has access to it, or must be given as a parameter.

Without C<--critical> or C<--warning> parameters, this service attempts to fetch the latest version numbers online. A critical alert is raised if the minor version is not the most recent. You can optionally set the path to your preferred retrieval tool using the C<--path> parameter (eg. C<--path '/usr/bin/wget'>). Supported programs are: GET, wget, curl, fetch, lynx, links, links2.

If you do not want to (or cannot) query the PostgreSQL website, provide the expected versions using either C<--warning> OR C<--critical>, depending on which return value you want to raise. The given string must contain one or more MINOR versions separated by anything but a '.'. For instance, the following parameters are all equivalent:

  --critical "10.1 9.6.6 9.5.10 9.4.15 9.3.20 9.2.24 9.1.24 9.0.23 8.4.22"
  --critical "10.1, 9.6.6, 9.5.10, 9.4.15, 9.3.20, 9.2.24, 9.1.24, 9.0.23, 8.4.22"
  --critical "10.1,9.6.6,9.5.10,9.4.15,9.3.20,9.2.24,9.1.24,9.0.23,8.4.22"
  --critical "10.1/9.6.6/9.5.10/9.4.15/9.3.20/9.2.24/9.1.24/9.0.23/8.4.22"

Any value other than 3 numbers separated by dots (before version 10.x) or 2 numbers separated by dots (version 10 and above) will be ignored. If the running PostgreSQL major version is not found, the service raises an unknown status.

Perfdata returns the numerical version of PostgreSQL.
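The filtering rule above can be sketched as follows (illustrative Python, not the plugin's Perl code; the function name is hypothetical): tokens are split on anything that is neither a digit nor a dot, then kept only when they form a valid minor version for their major branch.

```python
import re

# Keep only "x.y.z" tokens for majors below 10 and "x.y" tokens for
# majors 10 and above; everything else is ignored, as documented.
def accepted_versions(spec):
    kept = []
    for tok in re.split(r'[^0-9.]+', spec):
        if re.fullmatch(r'\d+\.\d+\.\d+', tok) and int(tok.split('.')[0]) < 10:
            kept.append(tok)
        elif re.fullmatch(r'\d+\.\d+', tok) and int(tok.split('.')[0]) >= 10:
            kept.append(tok)
    return kept
```

With this rule, "10.1, 9.6.6, 9.5.10" keeps all three versions, while tokens such as "10.1.2" or "9.6" are dropped.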
Required privileges: unprivileged role; access to http://www.postgresql.org required to download version numbers.

=item B (8.1+)

Check the oldest I (aka. prepared transaction) in the cluster.

Perfdata contains the max/avg age time and the number of prepared transactions per database.

Critical and Warning thresholds only accept an interval.

Required privileges: unprivileged role.

=item B (8.3+)

Check the oldest I transaction.

Perfdata contains the max/avg age and the number of idle transactions per database.

Critical and Warning thresholds only accept an interval.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

Above 9.2, it supports C<--exclude> to filter out connections. Eg., to filter out pg_dump and pg_dumpall, set this to 'pg_dump,pg_dumpall'.

Before 9.2, this service checks for idle transactions using their start time. Thus, the service can mistakenly take into account transactions transiently in the idle state. From 9.2 and up, the service checks for transactions that really had no activity since the given thresholds.

Required privileges: an unprivileged role checks only its own queries; a pg_monitor (10+) or superuser (<10) role is required to check all queries.

=item B (8.4+)

Check the xmin I from distinct sources of xmin retention.

By default, perfdata outputs the oldest known xmin age for each database among running queries, opened or idle transactions, pending prepared transactions, replication slots and walsenders. For versions prior to 9.4, only the C<2pc> source of xmin retention is checked.

Using C<--detailed>, perfdata contains the oldest xmin and maximum age for the following sources of xmin retention: C (a running query), C (an opened transaction currently executing a query), C (an opened transaction being idle), C<2pc> (a pending prepared transaction), C (a replication slot) and C (a WAL sender replication process), for each connectable database. If a source doesn't retain any transaction for a database, NaN is returned.
For versions prior to 9.4, only the C<2pc> source of xmin retention is available, so other sources won't appear in the perfdata. Note that xmin retention from walsender is only set if C<hot_standby_feedback> is enabled on the remote standby.

Critical and Warning thresholds are optional. They only accept a raw number of transactions.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

Required privileges: a pg_read_all_stats (10+) or superuser (<10) role is required to check pg_stat_replication. 2PC, pg_stat_activity and replication slots don't require special privileges.

=item B<pg_dump_backup>

Check the age and size of backups.

This service uses the status file (see C<--status-file> parameter).

The C<--path> argument contains the location of the backup folder. The supported format is a glob pattern matching every folder or file that you need to check.

The C<--pattern> is required, and must contain a regular expression matching the backup file name, extracting the database name from the first matching group.

Optionally, a C<--global-pattern> option can be supplied to check for an additional global file.

Examples:

To monitor backups like:

  /var/lib/backups/mydb-20150803.dump
  /var/lib/backups/otherdb-20150803.dump
  /var/lib/backups/mydb-20150804.dump
  /var/lib/backups/otherdb-20150804.dump

you must set:

  --path    '/var/lib/backups/*'
  --pattern '(\w+)-\d+.dump'

If the path contains the date, like this:

  /var/lib/backups/2015-08-03-daily/mydb.dump
  /var/lib/backups/2015-08-03-daily/otherdb.dump

then you can set:

  --path    '/var/lib/backups/*/*.dump'
  --pattern '/\d+-\d+-\d+-daily/(.*).dump'

For compatibility with pg_back (https://github.com/orgrim/pg_back), you should use:

  --path           '/path/*{dump,sql}'
  --pattern        '(\w+)_[0-9-_]+.dump'
  --global-pattern 'pg_global_[0-9-_]+.sql'

The C<--critical> and C<--warning> thresholds are optional. They accept a list of 'metric=value' separated by a comma.
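The C<--pattern> mechanics can be illustrated in Python: the database name is whatever the first capture group matches against the backup file path. The helper name is ours, not the plugin's:

```python
import re

def dbname_from_backup(path, pattern):
    """Extract the database name from a backup file path, following the
    documented --pattern convention: first capture group of a regular
    expression searched against the path."""
    m = re.search(pattern, path)
    return m.group(1) if m else None
```

Applied to the documented examples, `'(\w+)-\d+.dump'` pulls `mydb` out of `/var/lib/backups/mydb-20150803.dump`, and the dated-folder pattern extracts the name from the file component instead.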
Available metrics are C<oldest> and C<newest>, respectively the age of the oldest and newest backups, and C<size>, which must be the maximum variation of size since the last check, expressed as a size or a percentage. C<mindeltasize>, expressed in bytes, is the minimum variation of size needed to raise an alert.

This service supports the C<--dbinclude> and C<--dbexclude> arguments, to respectively test for the presence of include or exclude files.

The argument C<--exclude> enables you to exclude files younger than an interval. This is useful to ignore files from a backup in progress. Eg., if your backup process takes 2h, set this to '125m'.

Perfdata returns the age of the oldest and newest backups, as well as the size of the newest backups.

Required privileges: unprivileged role; the system user needs read access on the directory containing the dumps (but not on the dumps themselves).

=item B<pga_version>

Check if this script is running the given version of check_pgactivity. You must provide the expected version using either C<--warning> OR C<--critical>.

No perfdata is returned.

Required privileges: none.

=item B<pgdata_permission> (8.2+)

Check that the instance data directory rights are 700, and that it belongs to the system user currently running PostgreSQL.

The check on rights works on all Unix systems. Checking the user only works on Linux systems (it uses /proc to avoid dependencies).

Before 9.3, you need to provide the expected owner using the C<--uid> argument, or the owner will not be checked.

Required privileges:
  <11: superuser
  v11: user with pg_monitor or pg_read_all_settings

The system user must also be able to read the folder containing PGDATA.

=item B<replication_slots> (9.4+)

Check the number of WAL files retained and spilled files for each replication slot.

Perfdata returns the number of WAL kept for each slot and the number of spilled files in pg_replslot for each logical replication slot. Since v13, if C<max_slot_wal_keep_size> is greater or equal to 0, perfdata reports the size of WAL to produce before each slot becomes C<unreserved> or C<lost>.
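The "size of WAL to produce" reported for a slot is an LSN delta. As background, here is a minimal Python sketch of pg_lsn arithmetic; the plugin obtains these values from PostgreSQL itself, so the helper names and the exact delta computed are illustrative assumptions:

```python
def lsn_to_bytes(lsn):
    """Convert a PostgreSQL pg_lsn text value ('X/Y', both hexadecimal)
    to an absolute byte position in the WAL stream."""
    hi, lo = lsn.split('/')
    return (int(hi, 16) << 32) | int(lo, 16)

def retained_wal_bytes(current_lsn, restart_lsn):
    """Bytes of WAL retained on behalf of a slot: distance between the
    current WAL insert location and the slot's restart_lsn."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(restart_lsn)
```

The high 32 bits and low 32 bits of the position are printed on each side of the slash, which is why `1/0` is exactly 4 GiB into the stream.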
Note that this size can become negative during the limited time where the slot's WAL status becomes C<unreserved>. It is set to zero as soon as the last checkpoint finished and the status becomes C<lost>.

This service needs superuser privileges to obtain the number of spill files, or returns 0 as a last resort.

Critical and Warning thresholds are optional. They accept either a raw number (for backward compatibility, only the wal threshold will be used) or a list of 'wal=value' and/or 'spilled=value' and/or 'remaining=size': respectively the number of kept WAL files, the number of spilled files in pg_replslot for each logical slot, and the remaining bytes before a slot becomes C<unreserved> or C<lost>.

Moreover, with v13 and after, the service raises a warning alert if a slot becomes C<unreserved>. It raises a critical alert if the slot becomes C<lost>.

Required privileges:
  v9.4: unprivileged role, or superuser to monitor spilled files for logical replication
  v11+: unprivileged user with GRANT EXECUTE on function pg_ls_dir(text)

Here are some examples:

  -w 'wal=50,spilled=20' -c 'wal=100,spilled=40'
  -w 'spilled=20,remaining=160MB' -c 'spilled=40,remaining=48MB'

=item B<settings> (9.0+)

Check if the current settings have changed since they were stored in the service file.

The "known" settings are recorded during the very first call of the service. To update the known settings after a configuration change, call this service again with the argument C<--save>.

No perfdata. Critical and Warning thresholds are ignored. A Critical alert is raised if at least one parameter changed.

Required privileges: unprivileged role.

=item B<sequences_exhausted> (7.4+)

Check all sequences and raise an alarm if a column or sequence gets too close to its maximum value. The maximum value is calculated from the maxvalue of the sequence, or from the column type when the sequence is owned by a column (the smallserial, serial and bigserial types).

Perfdata returns the sequences that trigger the alert.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.
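The exhaustion computation described above can be sketched in Python: the usable limit is the sequence's own maxvalue, capped at the owning column type's maximum when there is one. The function name, signature, and the exact capping rule are illustrative assumptions, not the plugin's query:

```python
# Upper bounds of the integer types backing the serial column families.
TYPE_MAX = {
    "smallserial": 2**15 - 1,   # smallint
    "serial":      2**31 - 1,   # integer
    "bigserial":   2**63 - 1,   # bigint
}

def percent_filled(last_value, seq_maxvalue, owner_type=None):
    """Percentage of a sequence consumed, capping the limit at the owning
    column type's maximum when the sequence is owned by a column."""
    limit = seq_maxvalue
    if owner_type in TYPE_MAX:
        limit = min(limit, TYPE_MAX[owner_type])
    return 100.0 * last_value / limit
```

This is why a bigint-backed sequence owned by a plain C<serial> column is exhausted at 2^31-1 even though the sequence itself could go much further.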
The 'postgres' database and templates are always excluded.

Critical and Warning thresholds accept a percentage of the sequence filled.

Required privileges: unprivileged role able to log in all databases.

=item B<stat_snapshot_age> (9.5 to 14 included)

Check the age of the statistics snapshot (statistics collector's statistics). This probe helps to detect a frozen stats collector process.

Perfdata returns the statistics snapshot age.

Critical and Warning thresholds only accept an interval (eg. 1h30m25s).

Required privileges: unprivileged role.

=item B<streaming_delta> (9.1+)

Check the data delta between a cluster and its standbys in streaming replication.

The optional argument C<--slave> allows you to specify some slaves that MUST be connected. This argument can be used as many times as desired to check multiple slave connections, or you can specify multiple slave connections at one time, using comma separated values. Both methods can be used in a single call. The provided values must be of the form "APPLICATION_NAME IP". The two following examples both check for the presence of two slaves:

  --slave 'slave1 192.168.1.11' --slave 'slave2 192.168.1.12'
  --slave 'slave1 192.168.1.11','slave2 192.168.1.12'

This service supports a C<--exclude REGEX> parameter to exclude every result matching a regular expression on the application_name or IP address fields. You can use multiple C<--exclude REGEX> parameters.

Perfdata returns the data delta in bytes between the master and every standby found, the number of standbys connected and the number of excluded standbys.

Critical and Warning thresholds are optional. They can take one or two values separated by a comma. If only one value is supplied, it applies to both flushed and replayed data. If two values are supplied, the first one applies to flushed data, the second one to replayed data. These thresholds only accept a size (eg. 2.5G).

Required privileges: unprivileged role.

=item B<table_unlogged> (9.5+)

Check if tables are changed to unlogged.
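The streaming_delta threshold semantics described earlier (one value applies to both flushed and replayed data, two comma-separated values apply respectively; sizes such as 2.5G) can be sketched as follows. The names and the binary-unit interpretation are assumptions made for the example:

```python
import re

UNITS = {"k": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_size(text):
    """Parse a size threshold such as '2.5G' or '160MB' into bytes,
    assuming binary (1024-based) units for this sketch."""
    m = re.fullmatch(r'([0-9.]+)\s*([kMGT]?)B?', text.strip())
    if not m:
        raise ValueError(text)
    return int(float(m.group(1)) * UNITS.get(m.group(2), 1))

def delta_thresholds(threshold):
    """Split a streaming_delta threshold: a single value covers both
    flushed and replayed data; two values apply respectively."""
    parts = [parse_size(p) for p in threshold.split(",")]
    return (parts[0], parts[1]) if len(parts) > 1 else (parts[0], parts[0])
```

So `-w 32MB` warns on either metric past 32 MiB, while `-w '32MB,160MB'` warns at 32 MiB of unflushed data or 160 MiB of unreplayed data.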
In 9.5, you can switch between logged and unlogged.

Without C<--critical> or C<--warning> parameters, this service attempts to fetch all unlogged tables. A critical alert is raised if an unlogged table is detected.

This service supports both C<--dbexclude> and C<--dbinclude> parameters. The 'postgres' database and templates are always excluded.

This service supports a C<--exclude REGEX> parameter to exclude relations matching a regular expression. The regular expression applies to "database.schema_name.relation_name". This enables you to filter either on a relation name for all schemas and databases, on a qualified named relation (schema + relation) for all databases, or on a qualified named relation in only one database. You can use multiple C<--exclude REGEX> parameters.

Perfdata will return the number of unlogged tables per database. A list of the unlogged tables will be returned after the perfdata. This list contains the fully qualified table name. If C<--exclude REGEX> is set, the number of excluded tables is returned.

Required privileges: unprivileged role able to log in all databases, or at least those in C<--dbinclude>.

=item B<table_bloat>

Estimate bloat on tables.

Warning and critical thresholds accept a comma-separated list of either raw numbers (for a size), sizes (eg. 125M) or percentages. The thresholds apply to B<bloat> size, not object size. If a percentage is given, the threshold will apply to the bloat size compared to the table + TOAST size. If multiple threshold values are passed, check_pgactivity will choose the largest (bloat size) value.

This service supports both C<--dbexclude> and C<--dbinclude> parameters. The 'postgres' database and templates are always excluded.

This service supports a C<--exclude REGEX> parameter to exclude relations matching the given regular expression. The regular expression applies to "database.schema_name.relation_name".
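The C<--exclude REGEX> filtering on "database.schema_name.relation_name" can be illustrated in Python. An unanchored pattern drops a relation name everywhere, while a fully qualified pattern pins one relation in one database; the helper name is ours:

```python
import re

def filter_excluded(relations, exclude_patterns):
    """Drop any 'db.schema.relation' entry matching one of the --exclude
    regular expressions, keeping the rest in order."""
    return [rel for rel in relations
            if not any(re.search(p, rel) for p in exclude_patterns)]
```

For instance `r"\.foo$"` excludes every relation named `foo` in any schema of any database, whereas `r"mydb\.public\.foo"` only excludes that one table.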
This enables you to filter either on a relation name for all schemas and databases, on a qualified named relation (schema + relation) for all databases, or on a qualified named relation in only one database. You can use multiple C<--exclude REGEX> parameters.

B<Warning>: with a non-superuser role, this service can only check the tables that the given role is granted to read!

Perfdata will return the number of tables matching the warning and critical thresholds, per database. A list of the bloated tables will be returned after the perfdata. This list contains the fully qualified bloated table name, the estimated bloat size, the table size and the bloat percentage.

Required privileges: superuser (<10) able to log in all databases, or at least those in C<--dbinclude>; on PostgreSQL 10+, a user with the role pg_monitor suffices, provided that you grant SELECT on the system table pg_statistic to the pg_monitor role, in each database of the cluster.

=item B<temp_files> (8.1+)

Check the number and size of temp files.

This service uses the status file (see C<--status-file> parameter) for 9.2+.

Perfdata returns the number and total size of temp files found in C<pgsql_tmp> folders. They are aggregated by database until 8.2, then by tablespace (see GUC temp_tablespaces).

Starting with 9.2, perfdata also returns the number of temp files per database since the last run, the total size of temp files per database since the last run, and the rate at which temp files were generated.

Critical and Warning thresholds are optional. They accept either a number of files (raw value), a size (a unit is mandatory to define a size), or both values separated by a comma. Thresholds are applied to both the current temp files being created AND the number/size of temp files created since the last execution.
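The "since last run" accounting works from cumulative snapshots kept in the status file between executions. A Python sketch of that bookkeeping follows; the names and tuple layout are illustrative, not the plugin's storage format:

```python
def temp_files_delta(prev, now, elapsed_s):
    """Per-interval temp file accounting from two cumulative snapshots,
    each a (file_count, total_bytes) pair. Returns the number of files
    created, bytes written and a bytes-per-second rate."""
    files = now[0] - prev[0]
    size = now[1] - prev[1]
    rate = size / elapsed_s if elapsed_s > 0 else 0.0
    return files, size, rate
```

The thresholds are then compared both against the files currently present and against this per-interval delta.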
Required privileges:
  <10:  superuser
  v10:  an unprivileged role is possible, but it will not monitor databases that it cannot access, nor live temp files
  v11:  an unprivileged role is possible, but must be granted EXECUTE on functions pg_ls_dir(text), pg_read_file(text) and pg_stat_file(text, boolean); the same restrictions as on v10 still apply
  v12+: a role with the pg_monitor privilege

=item B<uptime> (8.1+)

Returns the time since postmaster start ("uptime", from 8.1), since configuration reload (from 8.4), and since shared memory initialization (from 10).

Please note that the uptime is unaffected when the postmaster resets all its children (for example after a kill -9 on a process or a failure). From 10+, the 'time since shared memory init' aims at detecting this situation: in fact, we use the age of the oldest non-client child process (usually checkpointer, writer or startup). This needs pg_monitor access to read pg_stat_activity.

Critical and Warning thresholds are optional. If both are set, Critical is raised when the postmaster uptime or the time since shared memory initialization is less than the critical threshold, and Warning is raised when the time since configuration reload is less than the warning threshold. If only a warning or critical threshold is given, it will be used for both cases. Obviously, these alerts disappear by themselves once enough time has passed.

Perfdata contains the three values (when available).

Required privileges: pg_monitor on PG10+; otherwise unprivileged role.

=item B<wal_files> (8.1+)

Check the number of WAL files.

Perfdata returns the total number of WAL files, the current number of written WAL, the current number of recycled WAL, the rate of WAL written to disk since the last execution on the master cluster, and the current timeline.

Critical and Warning thresholds accept either a raw number of files or a percentage.
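The uptime alerting rules above can be sketched in Python, simplified to two inputs (postmaster uptime and configuration reload age); names are illustrative:

```python
def uptime_status(uptime_s, reload_age_s, warning=None, critical=None):
    """Status decision described for the uptime service: a recent
    postmaster start trips the critical threshold, a recent configuration
    reload trips the warning one. When only one threshold is supplied it
    is used for both checks."""
    crit = critical if critical is not None else warning
    warn = warning if warning is not None else critical
    if crit is not None and uptime_s < crit:
        return "CRITICAL"
    if warn is not None and reload_age_s < warn:
        return "WARNING"
    return "OK"
```

As the text notes, both conditions clear on their own once enough time has elapsed since the start or the reload.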
In case of percentage, the limit is computed based on:

  100% = 1 + checkpoint_segments * (2 + checkpoint_completion_target)

For PostgreSQL 8.1 and 8.2:

  100% = 1 + checkpoint_segments * 2

If C<wal_keep_segments> is set for 9.0 to 9.4, the limit is the greatest of the following formulas:

  100% = 1 + checkpoint_segments * (2 + checkpoint_completion_target)
  100% = 1 + wal_keep_segments + 2 * checkpoint_segments

For 9.5 to 12, the limit is:

  100% = max_wal_size (as a number of WAL) + wal_keep_segments (if set)

For 13 and above:

  100% = max_wal_size + wal_keep_size (as numbers of WAL)

Required privileges:
  <10:  superuser
  v10:  unprivileged user with pg_monitor
  v11+: unprivileged user with pg_monitor, or with GRANT EXECUTE on function pg_ls_waldir

=back

=head2 EXAMPLES

=over

=item Execute service "last_vacuum" on host "host=localhost port=5432":

  check_pgactivity -h localhost -p 5432 -s last_vacuum -w 30m -c 1h30m

=item Execute service "hot_standby_delta" between hosts "service=pg92" and "service=pg92s":

  check_pgactivity --dbservice pg92,pg92s --service hot_standby_delta -w 32MB -c 160MB

=item Execute service "streaming_delta" on host "service=pg92" to check its slave "stby1" with the IP address "192.168.1.11":

  check_pgactivity --dbservice pg92 --slave "stby1 192.168.1.11" --service streaming_delta -w 32MB -c 160MB

=item Execute service "hit_ratio" on host "slave", port "5433", excluding databases matching the regexps "idelone" and "(?i:sleep)":

  check_pgactivity -p 5433 -h slave --service hit_ratio --dbexclude idelone --dbexclude "(?i:sleep)" -w 90% -c 80%

=item Execute service "hit_ratio" on host "slave", port "5433", only for databases matching the regexp "importantone":

  check_pgactivity -p 5433 -h slave --service hit_ratio --dbinclude importantone -w 90% -c 80%

=back

=head1 VERSION

check_pgactivity version 2.7, released on Mon Sep 25 2023.

=head1 LICENSING

This program is open source, licensed under the PostgreSQL license. For license terms, see the LICENSE provided with the sources.
=head1 AUTHORS S S check_pgactivity-REL2_7/RELEASING.md000066400000000000000000000114561450426647100172010ustar00rootroot00000000000000# Releasing ## Source code In `check_pgactivity`: * edit variable `$VERSION` * update the version and release date at the end of the inline documentation * date format: `LC_TIME=C date +"%a %b %d %Y"` In `check_pgactivity.spec`: * update the tag in the `_tag` variable (first line) * update the version in `Version:` * edit the changelog * date format: `LC_TIME=C date +"%a %b %d %Y"` In `CHANGELOG.md`, add a changelog entry for the new release with the format: ~~~ YYYY-MM-DD vX.Y: * add: description... * ... * change: description... * ... * fix: description... * ... ~~~ In `t/01-pga_version.t`, edit variable `$good_version`. Update documentation using the following commands: ~~~ pod2text check_pgactivity > README podselect check_pgactivity > README.pod ~~~ Update the `contributors` file with new contributors. ## Run tests Run the tests against all possible PostgreSQL releases. At least the one officially supported. Fix code or tests before releasing. ## Commit Commit all these changes together with commit message: `Release X.Y`. ## Tagging and building tar file Directly into the official repo: ~~~ TAG=REL2_4 git tag -a $TAG git push --tags git archive --prefix=check_pgactivity-$TAG/ -o /tmp/check_pgactivity-$TAG.tgz $TAG ~~~ ## Release on github - Go to https://github.com/OPMDG/check_pgactivity/releases - Edit the release notes for the new tag - Set "check_pgactivity $VERSION" as title, eg. "check_pgactivity 2.4" - Here is the format of the release node itself: ~~~ YYYY-MM-DD - Version X.Y Changelog: * item 1 * item 2 * ... 
~~~

- Upload the tar file
- Save
- Check or update https://github.com/OPMDG/check_pgactivity/releases

## Building the RPM file

### Installation

~~~
yum group install "Development Tools"
yum install rpmdevtools
useradd makerpm
~~~

### Building the package

~~~
su - makerpm
rpmdev-setuptree
git clone https://github.com/OPMDG/check_pgactivity.git
spectool -R -g check_pgactivity/check_pgactivity.spec
rpmbuild -ba check_pgactivity/check_pgactivity.spec
~~~

The RPM is generated into `rpmbuild/RPMS/noarch`. Don't forget to upload the package on the github release page.

## Building the Debian package

Debian packaging is handled by the Debian Maintainers (see https://salsa.debian.org/?name=check_pgactivity). A new release will trigger the release of a new package.

### Community

## Packages

Ping the packager on the pgsql-pkg-yum mailing list if needed to update the RPM on the PGDG repo.

## Nagios Exchange

Update:

* the release number
* the services list
* add the latest packages, zip and tarball.

https://exchange.nagios.org/directory/Plugins/Databases/PostgresQL/check_pgactivity/details

Ask Thomas (frost242) Reiss or Jehan-Guillaume (ioguix) de Rorthais for credentials.

## Submit a news on postgresql.org

* login: https://www.postgresql.org/account/login/?next=/account/
* go to https://www.postgresql.org/account/edit/news/
* click "Submit News Article"
* organisation: check_pgactivity project
* Title: "Release: check_pgactivity 2.4"
* Content:

~~~
check_pgactivity version 2.4 released
========================

check\_pgactivity is a PostgreSQL plugin for Nagios. This plugin is written
with a focus on a rich perfdata set. Every new feature of PostgreSQL can be
easily monitored with check\_pgactivity.

Changelog :

* ...
* ...
* ...
Here are some useful links: * github repo: [https://github.com/OPMDG/check_pgactivity](https://github.com/OPMDG/check_pgactivity) * reporting issues: [https://github.com/OPMDG/check_pgactivity/issues](https://github.com/OPMDG/check_pgactivity/issues) * latest release: [https://github.com/OPMDG/check_pgactivity/releases/latest](https://github.com/OPMDG/check_pgactivity/releases/latest) * contributors: [https://github.com/OPMDG/check_pgactivity/blob/master/contributors](https://github.com/OPMDG/check_pgactivity/blob/master/contributors) Thanks to all the contributors! ~~~ * check "Related Open Source" * click on submit * wait for moderators... ## pgsql-announce Send a mail to the pgsql-announce mailing list. Eg.: ~~~ check_pgactivity v2.4 has been released on January 30th 2019 under PostgreSQL licence. check_pgactivity is a PostgreSQL plugin for Nagios. This plugin is written with a focus on a rich perfdata set. Every new features of PostgreSQL can be easily monitored with check_pgactivity. Changelog : * ... * ... * ... Here are some useful links: * github repo: https://github.com/OPMDG/check_pgactivity * reporting issues: https://github.com/OPMDG/check_pgactivity/issues * latest release: https://github.com/OPMDG/check_pgactivity/releases/latest * contributors: https://github.com/OPMDG/check_pgactivity/blob/master/contributors Thanks to all the contributors! ~~~ ## Tweets & blogs Make some noise... check_pgactivity-REL2_7/check_pgactivity000077500000000000000000012315541450426647100206200ustar00rootroot00000000000000#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file. 
# # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group =head1 NAME check_pgactivity - PostgreSQL plugin for Nagios =head1 SYNOPSIS check_pgactivity {-w|--warning THRESHOLD} {-c|--critical THRESHOLD} [-s|--service SERVICE ] [-h|--host HOST] [-U|--username ROLE] [-p|--port PORT] [-d|--dbname DATABASE] [-S|--dbservice SERVICE_NAME] [-P|--psql PATH] [--debug] [--status-file FILE] [--path PATH] [-t|--timemout TIMEOUT] check_pgactivity [-l|--list] check_pgactivity [--help] =head1 DESCRIPTION check_pgactivity is designed to monitor PostgreSQL clusters from Nagios. It offers many options to measure and monitor useful performance metrics. =head1 COMPATIBILITY Each service is available from a different PostgreSQL version, from 7.4, as documented below. The psql client must be 8.3 at least. It can be used with an older server. Please report any undocumented incompatibility. =cut use vars qw($VERSION $PROGRAM); use strict; use warnings; use 5.008; use POSIX; use Data::Dumper; use File::Basename; use File::Spec; use File::Temp (); use Getopt::Long qw(:config bundling no_ignore_case_always); use List::Util qw(max); use Pod::Usage; use Scalar::Util qw(looks_like_number); use Fcntl qw(:flock); use Storable qw(retrieve store); use Config; use FindBin; # messing with PATH so pod2usage always finds this script my @path = split /$Config{'path_sep'}/ => $ENV{'PATH'}; push @path => $FindBin::Bin; $ENV{'PATH'} = join $Config{'path_sep'} => @path; undef @path; # force the env in English delete $ENV{'LC_ALL'}; $ENV{'LC_ALL'} = 'C'; setlocale( LC_ALL, 'C' ); delete $ENV{'LANG'}; delete $ENV{'LANGUAGE'}; $| = 1; $VERSION = '2.7'; $PROGRAM = 'check_pgactivity'; my $PG_VERSION_MIN = 70400; my $PG_VERSION_74 = 70400; my $PG_VERSION_80 = 80000; my $PG_VERSION_81 = 80100; my $PG_VERSION_82 = 80200; my $PG_VERSION_83 = 80300; my $PG_VERSION_84 = 80400; my $PG_VERSION_90 = 90000; my $PG_VERSION_91 = 90100; my $PG_VERSION_92 = 90200; my $PG_VERSION_93 = 90300; my $PG_VERSION_94 = 
90400; my $PG_VERSION_95 = 90500; my $PG_VERSION_96 = 90600; my $PG_VERSION_100 = 100000; my $PG_VERSION_110 = 110000; my $PG_VERSION_120 = 120000; my $PG_VERSION_130 = 130000; my $PG_VERSION_140 = 140000; my $PG_VERSION_150 = 150000; # reference to the output sub my $output_fmt; # Available services and descriptions. # # The referenced sub called to exec each service takes one parameter: a # reference to the arguments hash (%args) # # Note that we cannot use Perl prototype for these subroutine as they are # called indirectly (thus the args given by references). my %services = ( # 'service_name' => { # 'sub' => sub reference to call to run this service # 'desc' => 'a description of the service' # } 'autovacuum' => { 'sub' => \&check_autovacuum, 'desc' => 'Check the autovacuum activity.' }, 'backends' => { 'sub' => \&check_backends, 'desc' => 'Number of connections, compared to max_connections.' }, 'backends_status' => { 'sub' => \&check_backends_status, 'desc' => 'Number of connections in relation to their status.' }, 'checksum_errors' => { 'sub' => \&check_checksum_errors, 'desc' => 'Check data checksums errors.' }, 'session_stats' => { 'sub' => \&check_session_stats, 'desc' => 'Miscellaneous session statistics, including session rate.' }, 'commit_ratio' => { 'sub' => \&check_commit_ratio, 'desc' => 'Commit and rollback rate per second and commit ratio since last execution.' }, 'database_size' => { 'sub' => \&check_database_size, 'desc' => 'Variation of database sizes.', }, 'extensions_versions' => { 'sub' => \&check_extensions_versions, 'desc' => 'Check that installed extensions are up-to-date.' 
}, 'table_unlogged' => { 'sub' => \&check_table_unlogged, 'desc' => 'Check unlogged tables' }, 'wal_files' => { 'sub' => \&check_wal_files, 'desc' => 'Total number of WAL files.', }, 'archiver' => { 'sub' => \&check_archiver, 'desc' => 'Check the archiver status and number of wal files ready to archive.', }, 'last_vacuum' => { 'sub' => \&check_last_vacuum, 'desc' => 'Check the oldest vacuum (from autovacuum or not) on the database.', }, 'last_analyze' => { 'sub' => \&check_last_analyze, 'desc' => 'Check the oldest analyze (from autovacuum or not) on the database.', }, 'locks' => { 'sub' => \&check_locks, 'desc' => 'Check the number of locks on the hosts.' }, 'oldest_2pc' => { 'sub' => \&check_oldest_2pc, 'desc' => 'Check the oldest two-phase commit transaction.' }, 'oldest_idlexact' => { 'sub' => \&check_oldest_idlexact, 'desc' => 'Check the oldest idle transaction.' }, 'oldest_xmin' => { 'sub' => \&check_oldest_xmin, 'desc' => 'Check the xmin horizon from distinct sources of xmin retention.' }, 'longest_query' => { 'sub' => \&check_longest_query, 'desc' => 'Check the longest running query.' }, 'bgwriter' => { 'sub' => \&check_bgwriter, 'desc' => 'Check the bgwriter activity.', }, 'archive_folder' => { 'sub' => \&check_archive_folder, 'desc' => 'Check archives in given folder.', }, 'minor_version' => { 'sub' => \&check_minor_version, 'desc' => 'Check if the PostgreSQL minor version is the latest one.', }, 'hot_standby_delta' => { 'sub' => \&check_hot_standby_delta, 'desc' => 'Check delta in bytes between a master and its hot standbys.', }, 'streaming_delta' => { 'sub' => \&check_streaming_delta, 'desc' => 'Check delta in bytes between a master and its standbys in streaming replication.', }, 'settings' => { 'sub' => \&check_settings, 'desc' => 'Check if the configuration file changed.', }, 'hit_ratio' => { 'sub' => \&check_hit_ratio, 'desc' => 'Check hit ratio on databases.' 
}, 'backup_label_age' => { 'sub' => \&check_backup_label_age, 'desc' => 'Check age of backup_label file.', }, 'connection' => { 'sub' => \&check_connection, 'desc' => 'Perform a simple connection test.' }, 'custom_query' => { 'sub' => \&check_custom_query, 'desc' => 'Perform the given user query.' }, 'configuration' => { 'sub' => \&check_configuration, 'desc' => 'Check the most important settings.', }, 'btree_bloat' => { 'sub' => \&check_btree_bloat, 'desc' => 'Check B-tree index bloat.' }, 'max_freeze_age' => { 'sub' => \&check_max_freeze_age, 'desc' => 'Check oldest database in transaction age.' }, 'invalid_indexes' => { 'sub' => \&check_invalid_indexes, 'desc' => 'Check for invalid indexes.' }, 'is_master' => { 'sub' => \&check_is_master, 'desc' => 'Check if cluster is in production.' }, 'is_hot_standby' => { 'sub' => \&check_is_hot_standby, 'desc' => 'Check if cluster is a hot standby.' }, 'pga_version' => { 'sub' => \&check_pga_version, 'desc' => 'Check the version of this check_pgactivity script.' }, 'is_replay_paused' => { 'sub' => \&check_is_replay_paused, 'desc' => 'Check if the replication is paused.' }, 'table_bloat' => { 'sub' => \&check_table_bloat, 'desc' => 'Check tables bloat.' }, 'temp_files' => { 'sub' => \&check_temp_files, 'desc' => 'Check temp files generation.' }, 'replication_slots' => { 'sub' => \&check_replication_slots, 'desc' => 'Check delta in bytes of the replication slots.' }, 'pg_dump_backup' => { 'sub' => \&check_pg_dump_backup, 'desc' => 'Check pg_dump backups age and retention policy.' }, 'stat_snapshot_age' => { 'sub' => \&check_stat_snapshot_age, 'desc' => 'Check stats collector\'s stats age.' }, 'sequences_exhausted' => { 'sub' => \&check_sequences_exhausted, 'desc' => 'Check that auto-incremented colums aren\'t reaching their upper limit.' }, 'pgdata_permission' => { 'sub' => \&check_pgdata_permission, 'desc' => 'Check that the permission on PGDATA is 700.' 
}, 'uptime' => { 'sub' => \&check_uptime, 'desc' => 'Time since postmaster start or configurtion reload.' }, ); =over =item B<-s>, B<--service> SERVICE The Nagios service to run. See section SERVICES for a description of available services or use C<--list> for a short service and description list. =item B<-h>, B<--host> HOST Database server host or socket directory (default: $PGHOST or "localhost") See section C for more informations. =item B<-U>, B<--username> ROLE Database user name (default: $PGUSER or "postgres"). See section C for more informations. =item B<-p>, B<--port> PORT Database server port (default: $PGPORT or "5432"). See section C for more informations. =item B<-d>, B<--dbname> DATABASE Database name to connect to (default: $PGDATABASE or "template1"). B! This is not necessarily one of the database that will be checked. See C<--dbinclude> and C<--dbexclude> . See section C for more informations. =item B<-S>, B<--dbservice> SERVICE_NAME The connection service name from pg_service.conf to use. See section C for more informations. =item B<--dbexclude> REGEXP Some services automatically check all the databases of your cluster (note: that does not mean they always need to connect on all of them to check them though). C<--dbexclude> excludes any database whose name matches the given Perl regular expression. Repeat this option as many time as needed. See C<--dbinclude> as well. If a database match both dbexclude and dbinclude arguments, it is excluded. =item B<--dbinclude> REGEXP Some services automatically check all the databases of your cluster (note: that does not imply that they always need to connect to all of them though). Some always exclude the 'postgres' database and templates. C<--dbinclude> checks B databases whose names match the given Perl regular expression. Repeat this option as many time as needed. See C<--dbexclude> as well. If a database match both dbexclude and dbinclude arguments, it is excluded. 
=item B<-w>, B<--warning> THRESHOLD The Warning threshold. =item B<-c>, B<--critical> THRESHOLD The Critical threshold. =item B<-F>, B<--format> OUTPUT_FORMAT The output format. Supported output are: C, C, C, C, C, C and C. Using the C format, the results are written in a binary file (using perl module C) given in argument C<--output>. If no output is given, defaults to file C in the same directory as the script. The C and C formats are equivalent to the C and C formats respectively. The only difference is that they enforce the units to follow the strict Nagios specs: B, c, s or %. Any unit absent from this list is dropped (Bps, Tps, etc). =item B<--tmpdir> DIRECTORY Path to a directory where the script can create temporary files. The script relies on the system default temporary directory if possible. =item B<-P>, B<--psql> FILE Path to the C executable (default: "psql"). It should be version 8.3 at least, but the server can be older. =item B<--status-file> PATH Path to the file where service status information is kept between successive calls. Default is to save a file called C in the same directory as the script. Note that this file is protected from concurrent writes using a lock file located in the same directory, having the same name than the status file, but with the extension C<.lock>. On some plateform, network filesystems may not be supported correctly by the locking mechanism. See C for more information. =item B<--dump-status-file> Dump the content of the status file and exit. This is useful for debugging purpose. =item B<--dump-bin-file> [PATH] Dump the content of the given binary file previously created using C<--format binary>. If no path is given, defaults to file C in the same directory as the script. =item B<-t>, B<--timeout> TIMEOUT Timeout (default: "30s"), as raw (in seconds) or as an interval. This timeout will be used as C for psql and URL timeout for C service. =item B<-l>, B<--list> List available services. 
=item B<-V>, B<--version> Print version and exit. =item B<--debug> Print some debug messages. =item B<-?>, B<--help> Show this help page. =back =cut my %args = ( 'service' => undef, 'host' => undef, 'username' => undef, 'port' => undef, 'dbname' => undef, 'dbservice' => undef, 'detailed' => 0, 'warning' => undef, 'critical' => undef, 'exclude' => [], 'dbexclude' => [], 'dbinclude' => [], 'tmpdir' => File::Spec->tmpdir(), 'psql' => undef, 'path' => undef, 'status-file' => dirname(__FILE__) . '/check_pgactivity.data', 'output' => dirname(__FILE__) . '/check_pgactivity.out', 'query' => undef, 'type' => undef, 'reverse' => 0, 'work_mem' => undef, 'maintenance_work_mem' => undef, 'shared_buffers' => undef, 'wal_buffers' => undef, 'checkpoint_segments' => undef, 'effective_cache_size' => undef, 'no_check_autovacuum' => 0, 'no_check_fsync' => 0, 'no_check_enable' => 0, 'no_check_track_counts' => 0, 'ignore-wal-size' => 0, 'unarchiver' => '', 'save' => 0, 'suffix' => '', 'slave' => [], 'list' => 0, 'help' => 0, 'debug' => 0, 'timeout' => '30s', 'dump-status-file' => 0, 'dump-bin-file' => undef, 'format' => 'nagios', 'uid' => undef ); # Set name of the program without path* my $orig_name = $0; $0 = $PROGRAM; # Die on kill -1, -2, -3 or -15 $SIG{'HUP'} = $SIG{'INT'} = $SIG{'QUIT'} = $SIG{'TERM'} = \&terminate; # Handle SIG sub terminate() { my ($signal) = @_; die ("SIG $signal caught"); } # Print the version and exit sub version() { printf "check_pgactivity version %s, Perl %vd\n", $VERSION, $^V; exit 0; } # List services that can be performed sub list_services() { print "List of available services:\n\n"; foreach my $service ( sort keys %services ) { printf "\t%-17s\t%s\n", $service, $services{$service}{'desc'}; } exit 0; } # Check wrapper around Storable::file_magic to fallback on # Storable::read_magic under perl 5.8 and below # WARNINGS: # * you must hold a lock on the lockfile **BEFORE** calling this sub # * the given arg must be an existing and readable file sub 
is_storable($) {
    my $storage = shift;
    my $head;

    return defined Storable::file_magic( $storage )
        if defined *Storable::file_magic{CODE};

    open my $fh, '<', $storage
        or die "can't open «${storage}»: $!";
    read $fh, $head, 64;
    close $fh;

    return defined Storable::read_magic($head);
}

# Find a unique string for the database instance connection.
# Used by save and load.
#
# Parameter: host structure ref that holds the "host" and "port" parameters
sub find_hostkey($) {
    my $host = shift;

    return "$host->{'host'}$host->{'port'}"
        if defined $host->{'host'} and defined $host->{'port'};

    return $host->{'dbservice'} if defined $host->{'dbservice'};

    return "binary defaults";
}

# Record some given data to a given file on disk.
#
# To avoid data loss, the sub:
#   * first requires an exclusive lock on the lock file,
#   * loads the existing data,
#   * calls the closure to record its data in there,
#   * writes back the edited data to disk,
#   * closes the file and releases the lock.
#
# The closure is responsible for defining how and where in the structure it
# wants to record its data.
#
# Parameters are:
#   * $fn_record: a closure responsible for recording its data in the given
#     structure ref.
#   * $storage: file to write to.
sub save_internal($$) {
    my $fn_record = shift;
    my $storage   = shift;
    my $all       = {};
    my $lockfile  = "${storage}.lock";

    open my $fh, '>', $lockfile or die "can't open «${lockfile}»: $!";
    flock($fh, LOCK_EX) or die "can't get exclusive lock on «${lockfile}»: $!";

    if ( -e $storage ) {
        die "can not write to binary file «${storage}»"
            unless -w ${storage};
        die "file «${storage}» not recognized as a check_pgactivity binary file"
            unless is_storable $storage;
        eval { $all = retrieve($storage) };
        die "could not retrieve data from «${storage}»:\n  $@" if $@;
    }

    $fn_record->($all);

    eval { store( $all, $storage ); };
    die "could not update data in «${storage}»:\n  $@" if $@;

    # closing the fh releases the lock
    close $fh or die "could not release «${lockfile}»: $!";
}

# Record the given ref content for the given host in a file on disk.
# The file is defined by argument "--status-file" on command line. By default:
#
#   dirname(__FILE__) . '/check_pgactivity.data'
#
# The status file is a data structure saving, for each host ($host), the
# status of each service ($name). Format of data in this file is:
#   {
#     "${host}${port}" => {
#       "$name" => $ref
#     }
#   }
#
# Each call of save($host, $name, $ref, $storage) only overwrites the data
# for the given $host for the given $name service. To avoid data loss, the
# sub first requires an exclusive lock on the lock file, then loads the
# existing data, writes its values, then closes the file and releases the
# lock.
#
# Data can be retrieved later using the "load" sub.
#
# Parameters are:
#   * the $host structure ref that holds the "host" and "port" parameters
#   * the $name of the structure to save
#   * the $ref of the structure to save
#   * the $storage to write to
sub save($$$$) {
    my $host    = shift;
    my $name    = shift;
    my $ref     = shift;
    my $storage = shift;
    my $hostkey = find_hostkey($host);

    my $fn_record = sub {
        my $s = shift;
        $s->{$hostkey}{$name} = $ref;
    };

    save_internal($fn_record, $storage);
}

# Lock, load, unlock and return the whole content of the status file on disk.
sub load_all($) {
    my $storage  = shift;
    my $lockfile = "${storage}.lock";
    my $all;

    return undef unless -e $storage;

    die "can not read status file «${storage}»" unless -r $storage;

    # Make sure that the lockfile exists. It could have been removed, or just
    # not have been created if upgrading from older versions.
    if (not -f $lockfile) {
        open my $fh, '>', $lockfile or die "can't open «${lockfile}»: $!";
        close $fh or die "could not release «${lockfile}»: $!";
    }

    open my $fh, '<', $lockfile or die "can't open «${lockfile}»: $!";
    flock($fh, LOCK_SH) or die "can't get shared lock on «${lockfile}»: $!";

    die "file «${storage}» not recognized as a check_pgactivity status-file"
        unless is_storable $storage;

    eval { $all = retrieve($storage) };
    die "could not read status file «${storage}»:\n  $@" if $@;

    # closing the fh releases the lock
    close $fh or die "could not release «${lockfile}»: $!";

    return $all;
}

# Load the given ref content for the given host from the file on disk.
#
# See the "save" sub comments for more info.
#
# Parameters are:
#   * the host structure ref that holds the "host" and "port" parameters
#   * the name of the structure to load
#   * the path to the file storage
sub load($$$) {
    my $host    = shift;
    my $name    = shift;
    my $storage = shift;
    my $hostkey = find_hostkey( $host );
    my $all;

    $all = load_all( $storage );

    return $all->{$hostkey}{$name};
}

sub dump_status_file {
    my $f = shift;
    my $all;

    $f = $args{'status-file'} unless defined $f;
    $f = $args{'output'} unless $f;

    $all = load_all( $f );

    print Data::Dumper->new( [ $all ] )->Terse(1)->Dump;

    exit 0;
}

# Return formatted size string with units.
# Parameter: size in bytes
sub to_size($) {
    my $val   = shift;
    my @units = qw{B kB MB GB TB PB EB};
    my $size  = '';
    my $mod   = 0;
    my $i;

    return $val if $val =~ /^(-?inf)|(NaN$)/i;

    $val = int($val);

    for ( $i=0; $i < 6 and abs($val) > 1024; $i++ ) {
        $mod = $val%1024;
        $val = int( $val/1024 );
    }

    $val = "$val.$mod" unless $mod == 0;

    return "${val}$units[$i]";
}

# Return formatted time string with units.
# Parameter: duration in seconds
sub to_interval($) {
    my $val      = shift;
    my $interval = '';

    return $val if $val =~ /^-?inf/i;

    $val = int($val);

    if ( $val > 604800 ) {
        $interval = int( $val / 604800 ) . "w ";
        $val %= 604800;
    }

    if ( $val > 86400 ) {
        $interval .= int( $val / 86400 ) .
"d "; $val %= 86400; } if ( $val > 3600 ) { $interval .= int( $val / 3600 ) . "h"; $val %= 3600; } if ( $val > 60 ) { $interval .= int( $val / 60 ) . "m"; $val %= 60; } $interval .= "${val}s" if $val > 0; return "${val}s" unless $interval; # return a value if $val <= 0 return $interval; } =head2 THRESHOLDS THRESHOLDS provided as warning and critical values can be raw numbers, percentages, intervals or sizes. Each available service supports one or more formats (eg. a size and a percentage). =over =item B If THRESHOLD is a percentage, the value should end with a '%' (no space). For instance: 95%. =item B If THRESHOLD is an interval, the following units are accepted (not case sensitive): s (second), m (minute), h (hour), d (day). You can use more than one unit per given value. If not set, the last unit is in seconds. For instance: "1h 55m 6" = "1h55m6s". =cut sub is_size($){ my $str_size = lc( shift() ); return 1 if $str_size =~ /^\s*[0-9]+([kmgtpez]?[bo]?)?\s*$/ ; return 0; } sub is_time($){ my $str_time = lc( shift() ); return 1 if ( $str_time =~ /^(\s*([0-9]\s*[smhd]?\s*))+$/ ); return 0; } # Return a duration in seconds from an interval (with units). sub get_time($) { my $str_time = lc( shift() ); my $ts = 0; my @date; die( "Malformed interval: «$str_time»!\n" . "Authorized unit are: dD, hH, mM, sS\n" ) unless is_time($str_time); # no bad units should exist after this line! @date = split( /([smhd])/, $str_time ); LOOP_TS: while ( my $val = shift @date ) { $val = int($val); die("Wrong value for an interval: «$val»!") unless defined $val; my $unit = shift(@date) || ''; if ( $unit eq 'm' ) { $ts += $val * 60; next LOOP_TS; } if ( $unit eq 'h' ) { $ts += $val * 3600; next LOOP_TS; } if ( $unit eq 'd' ) { $ts += $val * 86400; next LOOP_TS; } $ts += $val; } return $ts; } =pod =item B If THRESHOLD is a size, the following units are accepted (not case sensitive): b (Byte), k (KB), m (MB), g (GB), t (TB), p (PB), e (EB) or Z (ZB). Only integers are accepted. Eg. 
C<1.5MB> will be refused, use C<1500kB>. The factor between units is 1024 bytes. Eg. C<1g = 1G = 1024*1024*1024.> =back =cut # Return a size in bytes from a size with unit. # If unit is '%', use the second parameter to compute the size in bytes. sub get_size($;$) { my $str_size = shift; my $size = 0; my $unit = ''; die "Only integers are accepted as size. Adjust the unit to your need." if $str_size =~ /[.,]/; $str_size =~ /^([0-9]+)(.*)$/; $size = int($1); $unit = lc($2); return $size unless $unit ne ''; if ( $unit eq '%' ) { my $ratio = shift; die("Can not compute a ratio without the factor!") unless defined $unit; return int( $size * $ratio / 100 ); } return $size if $unit eq 'b'; return $size * 1024 if $unit =~ '^k[bo]?$'; return $size * 1024**2 if $unit =~ '^m[bo]?$'; return $size * 1024**3 if $unit =~ '^g[bo]?$'; return $size * 1024**4 if $unit =~ '^t[bo]?$'; return $size * 1024**5 if $unit =~ '^p[bo]?$'; return $size * 1024**6 if $unit =~ '^e[bo]?$'; return $size * 1024**7 if $unit =~ '^z[bo]?$'; die("Unknown size unit: $unit"); } =head2 CONNECTIONS check_pgactivity allows two different connection specifications: by service or by specifying values for host, user, port, and database. Some services can run on multiple hosts, or needs to connect to multiple hosts. You might specify one of the parameters below to connect to your PostgreSQL instance. If you don't, no connection parameters are given to psql: connection relies on binary defaults and environment. The format for connection parameters is: =over =item B C<--dbservice SERVICE_NAME> Define a new host using the given service. Multiple hosts can be defined by listing multiple services separated by a comma. Eg. --dbservice service1,service2 For more information about service definition, see: L =item B C<--host HOST>, C<--port PORT>, C<--user ROLE> or C<--dbname DATABASE> One parameter is enough to define a new host. 
Usual environment variables (PGHOST, PGPORT, PGDATABASE, PGUSER, PGSERVICE,
PGPASSWORD) or default values are used for missing parameters.

As for usual PostgreSQL tools, there is no command line argument to set the
password, to avoid exposing it. Use PGPASSWORD, .pgpass or a service file
(recommended).

If multiple values are given, define as many hosts as the maximum number of
given values. Values are associated by position. Eg.:

  --host h1,h2 --port 5432,5433

Means "host=h1 port=5432" and "host=h2 port=5433".

If the number of values is different between parameters, any host missing a
parameter will use the first given value for this parameter. Eg.:

  --host h1,h2 --port 5433

Means: "host=h1 port=5433" and "host=h2 port=5433".

=item B<mixing dbservice and parameters>

For instance:

  --dbservice s1 --host h1 --port 5433

means: use "service=s1" and "host=h1 port=5433" in this order. If the service
supports only one host, the second host is ignored.

=item B<warning>

You cannot override a service's connection variables with the parameters
C<--host HOST>, C<--port PORT>, C<--user ROLE> or C<--dbname DATABASE>.

=back

=cut

sub parse_hosts(\%) {
    my %args  = %{ shift() };
    my @hosts = ();

    if (defined $args{'dbservice'}) {
        push @hosts, {
            'dbservice' => $_,
            'name'      => "service:$_",
            'pgversion' => undef
        } foreach split /,/, $args{'dbservice'};
    }

    # Add as many hosts as necessary depending on the given parameters
    # host/port/db/user.
    # Any missing parameter will be set to its default value.
    if (defined $args{'host'}
        or defined $args{'username'}
        or defined $args{'port'}
        or defined $args{'dbname'}
    ) {
        $args{'host'} = $ENV{'PGHOST'} || 'localhost'
            unless defined $args{'host'};
        $args{'username'} = $ENV{'PGUSER'} || 'postgres'
            unless defined $args{'username'};
        $args{'port'} = $ENV{'PGPORT'} || '5432'
            unless defined $args{'port'};
        $args{'dbname'} = $ENV{'PGDATABASE'} || 'template1'
            unless defined $args{'dbname'};

        my @dbhosts = split( /,/, $args{'host'} );
        my @dbnames = split( /,/, $args{'dbname'} );
        my @dbusers = split( /,/, $args{'username'} );
        my @dbports = split( /,/, $args{'port'} );
        my $nbhosts = max $#dbhosts, $#dbnames, $#dbusers, $#dbports;

        # Take the first value for each connection property as default.
        # eg. "-h localhost -p 5432,5433" gives two hosts:
        #   * localhost:5432
        #   * localhost:5433
        for ( my $i = 0; $i <= $nbhosts; $i++ ) {
            push( @hosts, {
                'host'      => $dbhosts[$i] || $dbhosts[0],
                'port'      => $dbports[$i] || $dbports[0],
                'db'        => $dbnames[$i] || $dbnames[0],
                'user'      => $dbusers[$i] || $dbusers[0],
                'pgversion' => undef
            } );

            $hosts[-1]{'name'} = sprintf('host:%s port:%d db:%s',
                $hosts[-1]{'host'},
                $hosts[-1]{'port'},
                $hosts[-1]{'db'}
            );
        }
    }

    if ( not @hosts ) {
        # No connection parameters given.
        # The psql execution relies on binary defaults and env variables.
        # We look for libpq environment variables to save them and preserve
        # default psql behaviour as query() always resets them.
        my $name = 'binary defaults';

        push @hosts, {
            'name'      => 'binary defaults',
            'pgversion' => undef
        };

        if (defined $ENV{'PGHOST'} ) {
            $hosts[0]{'host'} = $ENV{'PGHOST'};
            $name .= " host:$ENV{'PGHOST'}";
        }

        if (defined $ENV{'PGPORT'} ) {
            $hosts[0]{'port'} = $ENV{'PGPORT'};
            $name .= " port:$ENV{'PGPORT'}";
        }

        if (defined $ENV{'PGDATABASE'} ) {
            $hosts[0]{'db'} = $ENV{'PGDATABASE'};
            $name .= " db:$ENV{'PGDATABASE'}";
        }

        if (defined $ENV{'PGSERVICE'} ) {
            $hosts[0]{'dbservice'} = $ENV{'PGSERVICE'};
            $name .= " service:$ENV{'PGSERVICE'}";
        }

        $hosts[0]{'user'} = $ENV{'PGUSER'} if defined $ENV{'PGUSER'};

        $hosts[0]{'name'} = $name;
    }

    dprint ('Hosts: '. Dumper(\@hosts));

    return \@hosts;
}

# Execute a query on a host.
# Params:
#   * host
#   * query
#   * (optional) database
#   * (optional) get_fields
#   * (optional) on-failure callback (defaults to status_unknown)
# The result is an array of arrays:
#   [
#     [column1, ...]  # line1
#     ...
#   ]
sub query($$;$$$) {
    my $host       = shift;
    my $query      = shift;
    my $db         = shift;
    my @res        = ();
    my $res        = '';
    my $RS         = chr(30); # ASCII RS  (record separator)
    my $FS         = chr(3);  # ASCII ETX (end of text)
    my $get_fields = shift;
    my $onfail     = shift || \&status_unknown;
    my $tmpfile;
    my $psqlcmd;
    my $rc;

    local $/ = undef;

    delete $ENV{PGSERVICE};
    delete $ENV{PGDATABASE};
    delete $ENV{PGHOST};
    delete $ENV{PGPORT};
    delete $ENV{PGUSER};
    delete $ENV{PGOPTIONS};

    $ENV{PGDATABASE} = $host->{'db'}        if defined $host->{'db'};
    $ENV{PGSERVICE}  = $host->{'dbservice'} if defined $host->{'dbservice'};
    $ENV{PGHOST}     = $host->{'host'}      if defined $host->{'host'};
    $ENV{PGPORT}     = $host->{'port'}      if defined $host->{'port'};
    $ENV{PGUSER}     = $host->{'user'}      if defined $host->{'user'};
    $ENV{PGOPTIONS}  = '-c client_encoding=utf8 -c client_min_messages=error'
        . ' -c statement_timeout=' . get_time($args{'timeout'}) * 1000;

    dprint ("Query: $query\n");
    dprint ("Env. service: $ENV{PGSERVICE} \n") if defined $host->{'dbservice'};
    dprint ("Env. host   : $ENV{PGHOST} \n")    if defined $host->{'host'};
    dprint ("Env. port   : $ENV{PGPORT} \n")    if defined $host->{'port'};
    dprint ("Env. user   : $ENV{PGUSER} \n")    if defined $host->{'user'};
    dprint ("Env. db     : $ENV{PGDATABASE}\n") if defined $host->{'db'};

    $tmpfile = File::Temp->new(
        TEMPLATE => 'check_pga-XXXXXXXX',
        DIR      => $args{'tmpdir'}
    ) or die "Could not create or write in a temp file!";

    print $tmpfile "$query;" or die "Could not create or write in a temp file!";

    $psqlcmd  = qq{ $args{'psql'} -w --set "ON_ERROR_STOP=1" }
              . qq{ -qXAf $tmpfile -R $RS -F $FS };
    $psqlcmd .= qq{ --dbname='$db' } if defined $db;

    $res = qx{ $psqlcmd 2>&1 };
    $rc  = $?;

    dprint("Query rc: $rc\n");
    dprint( sprintf( "  stderr (%u): «%s»\n", length $res, $res ) ) if $rc;

    exit $onfail->('CHECK_PGACTIVITY', [ "Query failed!\n" .
        $res ] )
        unless $rc == 0;

    if (defined $res) {
        chop $res;
        my $col_num;

        push @res, [ split($FS => $_, -1) ] foreach split ($RS => $res, -1);

        $col_num = scalar( @{ $res[0] } );

        shift @res unless defined $get_fields;
        pop @res if $res[-1][0] =~ m/^\(\d+ rows?\)$/;

        # Check that the number of columns is valid.
        # FATAL if the parsing was unsuccessful, eg. if one field contains
        # the RS (chr(30)) or FS (chr(3)) characters. See gh issue #155.
        foreach my $row ( @res ) {
            exit status_unknown('CHECK_PGACTIVITY',
                [ "Could not parse query result!\n" ] )
                if scalar( @$row ) != $col_num;
        }
    }

    dprint( "Query result: ". Dumper( \@res ) );

    return \@res;
}

# Select the appropriate query among a hash of queries according to the
# backend version and execute it. Same argument order as in the "query" sub.
# The hash of queries must be of this form:
#   {
#     pg_version_num => $query1,
#     ...
#   }
#
# where pg_version_num is the minimum PostgreSQL version which can run the
# query. This version number is numeric. See "set_pgversion" about
# how to compute a PostgreSQL num version, or globals $PG_VERSION_*.
sub query_ver($\%;$) {
    my $host    = shift;
    my %queries = %{ shift() };
    # Shift returns undef if the db is not given. The value is then set in
    # the "query" sub
    my $db      = shift;

    set_pgversion($host);

    foreach my $ver ( sort { $b <=> $a } keys %queries ) {
        return query( $host, $queries{$ver}, $db )
            if ( $ver <= $host->{'version_num'} );
    }

    return undef;
}

# Return an array with all database names of the given host.
# By default, templates and the 'postgres' database are excluded, unless the
# 2nd optional parameter is non empty. Each service
# has to decide what suits it.
sub get_all_dbname($;$) {
    my @dbs;
    my $host = shift;
    my $cond = shift;

    my $query = 'SELECT datname FROM pg_database WHERE datallowconn ';
    $query .= q{ AND NOT datistemplate AND datname <> 'postgres' }
        if not defined $cond;
    $query .= ' ORDER BY datname';

    push @dbs => $_->[0] foreach ( @{ query( $host, $query ) } );

    return \@dbs;
}

# Query and set the version for the given host
sub set_pgversion($) {
    my $host = shift;

    unless ( $host->{'version'} ) {
        my $rs = query( $host,
            q{SELECT setting
              FROM pg_catalog.pg_settings
              WHERE name IN ('server_version_num', 'server_version')
              ORDER BY name = 'server_version_num'} );

        if ( $? != 0 ) {
            dprint("FATAL: psql error, $!\n");
            exit 1;
        }

        $host->{'version'} = $rs->[0][0];
        chomp( $host->{'version'} );

        if ( scalar(@$rs) > 1 ) {
            # only use server_version_num for PostgreSQL 8.2+
            $host->{'version_num'} = $rs->[1][0];
            chomp( $host->{'version_num'} );
        }
        elsif ( $host->{'version'} =~ /^(\d+)\.(\d+)(\.(\d+))?/ ) {
            # get back to the regexp handling for PostgreSQL < 8.2
            $host->{'version_num'} = int($1) * 10000 + int($2) * 100;
            # alpha/beta versions have no minor version number
            $host->{'version_num'} += int($4) if defined $4;
        }

        dprint(sprintf ("host %s is version %s/%s\n", $host->{'name'},
            $host->{'version'}, $host->{'version_num'}) );

        return;
    }

    return 1;
}

# Check host compatibility, with warning
sub is_compat($$$;$) {
    my $host    = shift;
    my $service = shift;
    my $min     = shift;
    my $max     = shift() || 9999999;
    my $ver;

    set_pgversion($host);

    $ver = 100 * int($host->{'version_num'} / 100);

    unless ( $ver >= $min and $ver <= $max ) {
        warn sprintf "Service %s is not compatible with host '%s' (v%s).\n",
            $service, $host->{'name'}, $host->{'version'};
        return 0;
    }

    return 1;
}

# Check host compatibility, without warning
sub check_compat($$;$) {
    my $host = shift;
    my $min  = shift;
    my $max  = shift() || 9999999;
    my $ver;

    set_pgversion($host);

    $ver = 100 * int($host->{'version_num'} / 100);

    return 0 unless ( $ver >= $min and $ver <= $max );

    return 1;
}

# Check a GUC value
sub
is_guc($$$) {
    my $host = shift;
    my $guc  = shift;
    my $val  = shift;
    my $ans;

    $ans = query( $host, "
        SELECT setting
        FROM pg_catalog.pg_settings
        WHERE name = '$guc'
    ");

    unless (exists $ans->[0][0]) {
        warn "Unknown GUC \"$guc\".";
        return 0;
    }

    dprint("GUC '$guc' value is '$ans->[0][0]', expected '$val'\n");

    unless ( $ans->[0][0] eq $val ) {
        warn "This service requires \"$guc=$val\".";
        return 0;
    }

    return 1;
}

sub dprint {
    return unless $args{'debug'};
    foreach (@_) {
        print "DEBUG: $_";
    }
}

sub status_unknown($;$$$) {
    return $output_fmt->( 3, $_[0], $_[1], $_[2], $_[3] );
}

sub status_critical($;$$$) {
    return $output_fmt->( 2, $_[0], $_[1], $_[2], $_[3] );
}

sub status_warning($;$$$) {
    return $output_fmt->( 1, $_[0], $_[1], $_[2], $_[3] );
}

sub status_ok($;$$$) {
    return $output_fmt->( 0, $_[0], $_[1], $_[2], $_[3] );
}

sub bin_output ($$;$$$) {
    my $rc      = shift;
    my $service = shift;
    my $storage = $args{'output'};
    my $ref;
    my @msg;
    my @perfdata;
    my @longmsg;

    @msg      = @{ $_[0] } if defined $_[0];
    @perfdata = @{ $_[1] } if defined $_[1];
    @longmsg  = @{ $_[2] } if defined $_[2];

    $ref = {
        'timestamp' => time,
        'rc'        => $rc,
        'service'   => $service,
        'messages'  => \@msg,
        'perfdata'  => \@perfdata,
        'longmsg'   => \@longmsg
    };

    my $fn_record = sub {
        my $s = shift;
        $s->{ $args{'service'} } = $ref;
    };

    save_internal($fn_record, $storage);
}

sub debug_output ($$;$$$) {
    my $rc      = shift;
    my $service = shift;
    my $ret;
    my @msg;
    my @perfdata;
    my @longmsg;

    @msg      = @{ $_[0] } if defined $_[0];
    @perfdata = @{ $_[1] } if defined $_[1];
    @longmsg  = @{ $_[2] } if defined $_[2];

    $ret  = sprintf "%-15s: %s\n", 'Service', $service;
    $ret .= sprintf "%-15s: 0 (%s)\n", "Returns", "OK"       if $rc == 0;
    $ret .= sprintf "%-15s: 1 (%s)\n", "Returns", "WARNING"  if $rc == 1;
    $ret .= sprintf "%-15s: 2 (%s)\n", "Returns", "CRITICAL" if $rc == 2;
    $ret .= sprintf "%-15s: 3 (%s)\n", "Returns", "UNKNOWN"  if $rc == 3;
    $ret .= sprintf "%-15s: %s\n", "Message", $_      foreach @msg;
    $ret .= sprintf "%-15s: %s\n", "Long message", $_ foreach @longmsg;
    $ret .= sprintf "%-15s: %s\n", "Perfdata",
        Data::Dumper->new([ $_ ])->Indent(0)->Terse(1)->Dump
        foreach @perfdata;

    print $ret;

    return $rc;
}

sub human_output ($$;$$$) {
    my $rc      = shift;
    my $service = shift;
    my $ret;
    my @msg;
    my @perfdata;
    my @longmsg;

    @msg      = @{ $_[0] } if defined $_[0];
    @perfdata = @{ $_[1] } if defined $_[1];
    @longmsg  = @{ $_[2] } if defined $_[2];

    $ret  = sprintf "%-15s: %s\n", 'Service', $service;
    $ret .= sprintf "%-15s: 0 (%s)\n", "Returns", "OK"       if $rc == 0;
    $ret .= sprintf "%-15s: 1 (%s)\n", "Returns", "WARNING"  if $rc == 1;
    $ret .= sprintf "%-15s: 2 (%s)\n", "Returns", "CRITICAL" if $rc == 2;
    $ret .= sprintf "%-15s: 3 (%s)\n", "Returns", "UNKNOWN"  if $rc == 3;
    $ret .= sprintf "%-15s: %s\n", "Message", $_      foreach @msg;
    $ret .= sprintf "%-15s: %s\n", "Long message", $_ foreach @longmsg;

    foreach my $perfdata ( @perfdata ) {
        map {$_ = undef unless defined $_} @{$perfdata}[2..6];

        if ( defined $$perfdata[2] and $$perfdata[2] =~ /B$/ ) {
            $ret .= sprintf "%-15s: %s=%s", "Perfdata", $$perfdata[0],
                to_size($$perfdata[1]);
            $ret .= sprintf " warn=%s", to_size( $$perfdata[3] )
                if defined $$perfdata[3];
            $ret .= sprintf " crit=%s", to_size( $$perfdata[4] )
                if defined $$perfdata[4];
            $ret .= sprintf " min=%s",  to_size( $$perfdata[5] )
                if defined $$perfdata[5];
            $ret .= sprintf " max=%s",  to_size( $$perfdata[6] )
                if defined $$perfdata[6];
            $ret .= "\n";
        }
        elsif ( defined $$perfdata[2] and $$perfdata[2] =~ /\ds$/ ) {
            $ret .= sprintf "%-15s: %s=%s", "Perfdata", $$perfdata[0],
                to_interval( $$perfdata[1] );
            $ret .= sprintf " warn=%s", to_interval( $$perfdata[3] )
                if defined $$perfdata[3];
            $ret .= sprintf " crit=%s", to_interval( $$perfdata[4] )
                if defined $$perfdata[4];
            $ret .= sprintf " min=%s",  to_interval( $$perfdata[5] )
                if defined $$perfdata[5];
            $ret .= sprintf " max=%s",  to_interval( $$perfdata[6] )
                if defined $$perfdata[6];
            $ret .= "\n";
        }
        else {
            $ret .= sprintf "%-15s: %s=%s", "Perfdata", $$perfdata[0],
                $$perfdata[1];
            $ret .=
sprintf "%s", $$perfdata[2]
                if defined $$perfdata[2];
            $ret .= sprintf " warn=%s", $$perfdata[3] if defined $$perfdata[3];
            $ret .= sprintf " crit=%s", $$perfdata[4] if defined $$perfdata[4];
            $ret .= sprintf " min=%s",  $$perfdata[5] if defined $$perfdata[5];
            $ret .= sprintf " max=%s",  $$perfdata[6] if defined $$perfdata[6];
            $ret .= "\n";
        }
    }

    print $ret;

    return $rc;
}

sub nagios_output ($$;$$$) {
    my $rc  = shift;
    my $ret = shift;
    my @msg;
    my @perfdata;
    my @longmsg;

    $ret .= " OK"       if $rc == 0;
    $ret .= " WARNING"  if $rc == 1;
    $ret .= " CRITICAL" if $rc == 2;
    $ret .= " UNKNOWN"  if $rc == 3;

    @msg      = @{ $_[0] } if defined $_[0];
    @perfdata = @{ $_[1] } if defined $_[1];
    @longmsg  = @{ $_[2] } if defined $_[2];

    $ret .= ": ". join( ', ', @msg ) if @msg;

    if ( scalar @perfdata ) {
        $ret .= " |";

        foreach my $perf ( @perfdata ) {
            # escape quotes
            $$perf[0] =~ s/'/''/g;

            # surrounding quotes if space in the label
            $$perf[0] = "'$$perf[0]'" if $$perf[0] =~ /\s/;

            # the perfdata itself and its unit
            $ret .= " $$perf[0]=$$perf[1]";

            # init and join optional values (unit/warn/crit/min/max)
            map {$_ = "" unless defined $_} @{$perf}[2..6];
            $ret .= join ';' => @$perf[2..6];

            # remove useless semi-colons at end
            $ret =~ s/;*$//;
        }
    }

    $ret .= "\n".
        join( ' ', @longmsg ) if @longmsg;

    print $ret;

    return $rc;
}

sub set_strict_perfdata {
    my $perfdata = shift;

    map {
        $$_[1] = 'U' if $$_[1] eq 'NaN';
        $$_[2] = ''  if exists  $$_[2]
                    and defined $$_[2]
                    and $$_[2] !~ /\A[Bcs%]\z/;
    } @{ $perfdata };
}

sub nagios_strict_output ($$;$$$) {
    my $rc  = shift;
    my $ret = shift;
    my @msg;
    my @perfdata;
    my @longmsg;

    @msg      = @{ $_[0] } if defined $_[0];
    @perfdata = @{ $_[1] } if defined $_[1];
    @longmsg  = @{ $_[2] } if defined $_[2];

    set_strict_perfdata ( \@perfdata );

    return nagios_output( $rc, $ret, \@msg, \@perfdata, \@longmsg );
}

sub json_output ($$;$$$) {
    my $rc      = shift;
    my $service = shift;
    my @msg;
    my @perfdata;
    my @longmsg;

    @msg      = @{ $_[0] } if defined $_[0];
    @perfdata = @{ $_[1] } if defined $_[1];
    @longmsg  = @{ $_[2] } if defined $_[2];

    my $obj = {};

    $obj->{'service'} = $service;
    $obj->{'status'}  = 'OK'       if $rc == 0;
    $obj->{'status'}  = 'WARNING'  if $rc == 1;
    $obj->{'status'}  = 'CRITICAL' if $rc == 2;
    $obj->{'status'}  = 'UNKNOWN'  if $rc == 3;
    $obj->{'msg'}     = \@msg;
    $obj->{'longmsg'} = \@longmsg;

    my %data = map {
        $$_[0] => {
            'val'  => $$_[1],
            'unit' => $$_[2],
            'warn' => $$_[3],
            'crit' => $$_[4],
            'min'  => $$_[5],
            'max'  => $$_[6]
        }
    } @perfdata;

    $obj->{'perfdata'} = \%data;

    print encode_json( $obj );

    return $rc;
}

sub json_strict_output ($$;$$$) {
    my $rc  = shift;
    my $ret = shift;
    my @msg;
    my @perfdata;
    my @longmsg;

    @msg      = @{ $_[0] } if defined $_[0];
    @perfdata = @{ $_[1] } if defined $_[1];
    @longmsg  = @{ $_[2] } if defined $_[2];

    set_strict_perfdata ( \@perfdata );

    return json_output( $rc, $ret, \@msg, \@perfdata, \@longmsg );
}

=head2 SERVICES

Descriptions and parameters of available services.

=over

=item B<archive_folder>

Check if all archived WALs exist between the oldest and the latest WAL in the
archive folder and make sure they are 16MB. The given folder must have
archived files from ONE cluster. The version of PostgreSQL that created the
archives is only checked on the last one, for performance considerations.
This service requires the argument C<--path> on the command line to specify
the archive folder path to check. Obviously, it must have access to this
folder at the filesystem level: you may have to execute it on the archiving
server rather than on the PostgreSQL instance.

The optional argument C<--suffix> defines the suffix of your archived WALs;
this is useful for compressed WALs (eg. .gz, .bz2, ...). Default is no
suffix.

This service needs to read the header of one of the archives to define how
many segments a WAL owns. Check_pgactivity automatically handles files with
extensions .gz, .bz2, .xz, .zip or .7z using the following commands:

  gzip -dc
  bzip2 -dc
  xz -dc
  unzip -qqp
  7z x -so

If needed, provide your own command that writes the uncompressed file to
standard output with the C<--unarchiver> argument.

Optional argument C<--ignore-wal-size> skips the WAL size check. This is
useful if your archived WALs are compressed and check_pgactivity is unable
to guess the original size. Here are the commands check_pgactivity uses to
guess the original size of .gz, .xz or .zip files:

  gzip -ql
  xz -ql
  unzip -qql

Default behaviour is to check the WALs size.

Perfdata contains the number of archived WALs and the age of the most recent
one.

Critical and Warning define the max age of the latest archived WAL as an
interval (eg. 5m or 300s).

Required privileges: unprivileged role; the system user needs read access to
archived WAL files.
Sample commands:

  check_pgactivity -s archive_folder --path /path/to/archives -w 15m -c 30m
  check_pgactivity -s archive_folder --path /path/to/archives --suffix .gz -w 15m -c 30m
  check_pgactivity -s archive_folder --path /path/to/archives --ignore-wal-size --suffix .bz2 -w 15m -c 30m
  check_pgactivity -s archive_folder --path /path/to/archives --unarchiver "unrar p" --ignore-wal-size --suffix .rar -w 15m -c 30m

=cut

sub check_archive_folder {
    my @msg;
    my @longmsg;
    my @msg_crit;
    my @msg_warn;
    my @perfdata;
    my @history_files;
    my @filelist;
    my @filelist_sorted;
    my @branch_wals;
    my $w_limit;
    my $c_limit;
    my $timeline;
    my $start_tl;
    my $end_tl;
    my $wal;
    my $seg;
    my $latest_wal_age;
    my $dh;
    my $fh;
    my $wal_version;
    my $filename_re;
    my $history_re;
    my $suffix      = $args{'suffix'};
    my $check_size  = not $args{'ignore-wal-size'};
    my $me          = 'POSTGRES_ARCHIVES';
    my $seg_per_wal = 255; # increased later for pg > 9.2
    my %args        = %{ $_[0] };

    my %unarchive_cmd = (
        '.gz'  => "gzip -dc",
        '.bz2' => "bzip2 -dc",
        '.xz'  => "xz -dc",
        '.zip' => "unzip -qqp",
        '.7z'  => "7z x -so"
    );

    my %wal_versions = (
        '80'  => 53340,
        '81'  => 53341,
        '82'  => 53342,
        '83'  => 53346,
        '84'  => 53347,
        '90'  => 53348,
        '91'  => 53350,
        '92'  => 53361,
        '93'  => 53365,
        '94'  => 53374,
        '95'  => 53383,
        '96'  => 53395, # 0xD093
        '100' => 53399, # 0xD097
        '110' => 53400, # 0xD098
        '120' => 53505, # 0xD101
        '130' => 53510, # 0xD106
        '140' => 53517, # 0xD10D
        '150' => 53520, # 0xD110
        '160' => 53523  # 0xD113
    );

    # "path" argument must be given
    pod2usage(
        -message => 'FATAL: you must specify the archive folder using "--path ".',
        -exitval => 127
    ) unless defined $args{'path'};

    # invalid "path" argument
    pod2usage(
        -message => "FATAL: \"$args{'path'}\" is not a valid folder.",
        -exitval => 127
    ) unless -d $args{'path'};

    # warning and critical are mandatory.
    pod2usage(
        -message => "FATAL: you must specify critical and warning thresholds.",
        -exitval => 127
    ) unless defined $args{'warning'} and defined $args{'critical'};

    pod2usage(
        -message => "FATAL: critical and warning thresholds only accept an interval.",
        -exitval => 127
    ) unless ( is_time( $args{'warning'} ) and is_time( $args{'critical'} ) );

    opendir( $dh, $args{'path'} )
        or die "Cannot opendir $args{'path'}: $!\n";

    $filename_re = qr/^[0-9A-F]{24}$suffix$/;

    @filelist = map { [ $_ => (stat("$args{'path'}/$_"))[9,7] ] }
        grep( /$filename_re/, readdir($dh) );

    seekdir( $dh, 0 );

    $history_re = qr/^[0-9A-F]{8}\.history$suffix$/;
    @history_files = grep /$history_re/, readdir($dh);

    closedir($dh);

    return status_unknown( $me, ['No archived WAL found.'] )
        unless @filelist;

    $w_limit = get_time($args{'warning'});
    $c_limit = get_time($args{'critical'});

    # Sort by mtime
    @filelist_sorted = sort { ($a->[1] <=> $b->[1]) || ($a->[0] cmp $b->[0]) }
        grep{ (defined($_->[0]) and defined($_->[1]))
              or die "Cannot read WAL files" }
        @filelist;

    $latest_wal_age = time() - $filelist_sorted[-1][1];

    # Read the XLOG_PAGE_MAGIC header to guess $seg_per_wal
    if ( $args{'unarchiver'} eq ''
        and $suffix =~ /^.(?:gz|bz2|zip|xz|7z)$/
    ) {
        open $fh, "-|",
            qq{ $unarchive_cmd{$suffix} "$args{'path'}/$filelist_sorted[-1][0]" 2>/dev/null }
            or die "could not read first WAL using '$unarchive_cmd{$suffix}': $!";
    }
    elsif ( $args{'unarchiver'} ne '' ) {
        open $fh, "-|",
            qq{ $args{'unarchiver'} "$args{'path'}/$filelist_sorted[-1][0]" 2>/dev/null }
            or die "could not read first WAL using '$args{'unarchiver'}': $!";
    }
    else {
        # Fall back on raw parsing of the first WAL
        open $fh, "<", "$args{'path'}/$filelist_sorted[-1][0]"
            or die ("Could not read first WAL: $!\n");
    }

    read( $fh, $wal_version, 2 );
    close $fh;

    $wal_version = unpack('S', $wal_version);

    die ("Could not parse XLOG_PAGE_MAGIC") unless defined $wal_version;

    dprint ("wal version: $wal_version\n");

    die "Unknown WAL XLOG_PAGE_MAGIC $wal_version!"
        unless grep /^$wal_version$/ => values %wal_versions;

    # FIXME: As there is no consensus about XLOG_PAGE_MAGIC algo across
    # PostgreSQL versions, this piece of code should be checked for
    # compatibility for each new PostgreSQL version to confirm the new
    # XLOG_PAGE_MAGIC is still greater than the previous one (or at least
    # the 9.2 one).
    $seg_per_wal++ if $wal_version >= $wal_versions{'93'};

    push @perfdata, [ 'latest_archive_age', $latest_wal_age, 's',
        $w_limit, $c_limit ];
    push @perfdata, [ 'num_archives', scalar(@filelist_sorted) ];

    dprint ("first wal: $filelist_sorted[0][0]\n");
    dprint ("last wal:  $filelist_sorted[-1][0]\n");

    $start_tl = substr($filelist_sorted[0][0], 0, 8);
    $end_tl   = substr($filelist_sorted[-1][0], 0, 8);
    $timeline = hex($start_tl);
    $wal      = hex(substr($filelist_sorted[0][0], 8, 8));
    $seg      = hex(substr($filelist_sorted[0][0], 16, 8));

    # look for history files if timeline differs
    if ( $start_tl ne $end_tl ) {
        if ( -s "$args{'path'}/$end_tl.history" ) {
            open my $fd, "<", "$args{'path'}/$end_tl.history";
            while ( <$fd> ) {
                next unless m{^\s*(\d)\t([0-9A-F]+)/([0-9A-F]+)\t.*$};
                push @branch_wals =>
                    sprintf("%08d%08s%08X", $1, $2, hex($3)>>24);
            }
            close $fd;
        }
    }

    # Check ALL archives are here.
    for ( my $i = 0, my $j = 0; $i <= $#filelist_sorted; $i++, $j++ ) {
        dprint("Checking WAL $filelist_sorted[$i][0]\n");

        my $curr = sprintf( '%08X%08X%08X%s',
            $timeline,
            $wal + int( ( $seg + $j ) / $seg_per_wal ),
            ( $seg + $j ) % $seg_per_wal,
            $suffix
        );

        if ( $curr ne $filelist_sorted[$i][0] ) {
            push @msg => "Wrong sequence or file missing @ '$curr'";
            last;
        }

        if ( $check_size ) {
            if ( $suffix eq '.gz' ) {
                my $ans = qx{ gzip -ql "$args{'path'}/$curr" 2>/dev/null };
                $filelist_sorted[$i][2] = 16777216
                    if $ans =~ /^\s*\d+\s+16777216\s/;
            }
            elsif ( $suffix eq '.xz' ) {
                my @ans = qx{ xz -ql --robot "$args{'path'}/$curr" 2>/dev/null };
                $filelist_sorted[$i][2] = 16777216
                    if $ans[-1] =~ /\w+\s+\d+\s+\d+\s+16777216\s+/;
            }
            elsif ( $suffix eq '.zip' ) {
                my $ans;
                $ans = qx{ unzip -qql "$args{'path'}/$curr" 2>/dev/null };
                $filelist_sorted[$i][2] = 16777216
                    if $ans =~ /^\s*16777216/;
            }

            if ( $filelist_sorted[$i][2] != 16777216 ) {
                push @msg => "'$curr' is not 16MB";
                last;
            }
        }

        if ( grep /$curr/, @branch_wals ) {
            dprint("Found a boundary @ $curr !\n");
            $timeline++;
            $j--;
        }
    }

    return status_critical( $me, \@msg, \@perfdata ) if @msg;

    push @msg => scalar(@filelist_sorted) . " WAL archived in '$args{'path'}', "
        . "latest archived since " . to_interval($latest_wal_age);

    return status_critical( $me, \@msg, \@perfdata, \@longmsg )
        if $latest_wal_age >= $c_limit;

    return status_warning( $me, \@msg, \@perfdata, \@longmsg )
        if $latest_wal_age >= $w_limit;

    return status_ok( $me, \@msg, \@perfdata, \@longmsg );
}

=item B<archiver> (8.1+)

Check if the archiver is working properly and report the number of WAL
files ready to be archived.

Perfdata returns the number of WAL files waiting to be archived.

Critical and Warning thresholds are optional. They apply to the number of
files waiting to be archived. They only accept a raw number of files.

Whatever the given thresholds, a critical alert is raised if the archiver
process has not archived the oldest WAL waiting to be archived since the
last call.
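The "archiver is stuck" rule stated above compares the oldest `.ready` WAL seen now with the one saved in the status file by the previous run. A sketch of that rule, in Python for illustration (the helper name is made up, not part of the plugin):

```python
def archiver_progressing(oldest_ready, prev_oldest):
    """True if the archiver moved on since the previous check.

    oldest_ready: name of the oldest .ready WAL now (None if the archive
    queue is empty); prev_oldest: the name saved by the previous run.
    """
    if oldest_ready is None:
        return True   # empty queue: nothing is waiting, so nothing is stuck
    return oldest_ready != prev_oldest

print(archiver_progressing('000000010000000200000009', '000000010000000200000008'))
```

If the same file is still at the head of the queue on two consecutive runs, the service raises CRITICAL regardless of the configured thresholds.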
Required privileges: superuser (<10); unprivileged role (10+).

=cut

sub check_archiver {
    my @rs;
    my @msg;
    my @longmsg;
    my @perfdata;
    my @hosts;
    my $nb_files;
    my $prev_archiving;
    my %args = %{ $_[0] };
    my $me   = 'POSTGRES_ARCHIVER';

    # Warning and critical must be raw numbers.
    pod2usage(
        -message => "FATAL: critical and warning thresholds only accept raw numbers.",
        -exitval => 127
    ) if defined $args{'critical'} and $args{'warning'} !~ m/^([0-9]+)$/
        and $args{'critical'} !~ m/^([0-9]+)$/;

    @hosts = @{ parse_hosts %args };

    pod2usage(
        -message => 'FATAL: you must give only one host with service "archiver".',
        -exitval => 127
    ) if @hosts != 1;

    is_compat $hosts[0], 'archiver', $PG_VERSION_81 or exit 1;

    if ( check_compat $hosts[0], $PG_VERSION_81, $PG_VERSION_96 ) {
        # cf. pgarch_readyXlog in src/backend/postmaster/pgarch.c about how
        # the archiver process picks the next WAL to archive.
        # We try to reproduce the same algorithm here.
        my $query = q{
        SELECT s.f,
            extract(epoch from (pg_stat_file('pg_xlog/archive_status/'||s.f)).modification),
            extract(epoch from current_timestamp)
        FROM pg_ls_dir('pg_xlog/archive_status') AS s(f)
        WHERE f ~ '^[0123456789ABCDEF.history.backup.partial]{16,40}\.ready$'
        ORDER BY s.f ASC};

        $prev_archiving = load( $hosts[0], 'archiver', $args{'status-file'} ) || '';

        @rs = @{ query( $hosts[0], $query ) };

        $nb_files = scalar @rs;

        push @perfdata => [
            'ready_archive', $nb_files, undef,
            $args{'warning'}, $args{'critical'}, 0
        ];

        if ( $nb_files > 0 ) {
            push @perfdata => [
                'oldest_ready_wal', int( $rs[0][2] - $rs[0][1] ), 's',
                undef, undef, 0
            ];

            if ( $rs[0][0] ne $prev_archiving ) {
                save $hosts[0], 'archiver', $rs[0][0], $args{'status-file'};
            }
            else {
                push @msg => sprintf 'archiver failing on %s',
                    substr( $rs[0][0], 0, 24 );
                push @longmsg => sprintf '%s not archived since last check',
                    substr( $rs[0][0], 0, -6 );
            }
        }
        else {
            push @perfdata => [ 'oldest_ready_wal', 0, 's', undef, undef, 0 ];
            save $hosts[0], 'archiver', '', $args{'status-file'};
        }

        push @msg => "$nb_files WAL files ready to archive";
    }
    else {
        # Version 10 and higher: use pg_stat_archiver
        # as the monitoring user may not be super-user.
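The version 10+ branch that follows reconstructs an LSN from a flat byte offset with `to_hex(off/4294967296)||'/'||to_hex(off%4294967296)` and derives the archive lag in whole segments. The same arithmetic in a quick Python sketch (illustrative only; helper names are made up):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024   # default wal_segment_size: 16MB

def offset_to_lsn(off):
    """Split a flat byte offset into PostgreSQL's XXXXXXXX/YYYYYYYY LSN text."""
    return '%X/%X' % (off // 4294967296, off % 4294967296)   # 4294967296 = 2**32

def segments_behind(current_off, archived_off, walsegsize=WAL_SEG_SIZE):
    """Whole segments between the last archived position and the current one."""
    return (current_off - archived_off) // walsegsize

print(offset_to_lsn(0x200000000 + 0x12345678))  # -> 2/12345678
```

The division by `2**32` matches the high/low split of a 64-bit WAL position, which is why the SQL above keeps repeating the `4294967296` constant.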
        # FIXME: on a slave with archive_mode=always:
        # 1) fails while parsing an .history file
        # 2) pg_last_wal_receive_lsn always returns zero if the slave is fed
        #    with pure log shipping (streaming is ok)

        # field 1: number of WAL segments to archive
        # field 2: how long archiving has been failing, in seconds
        # field 3: last segment archived
        # field 4: last failing segment
        # field 5: mod time of the next wal to archive (only for superuser in v10)
        my %queries = (
            $PG_VERSION_100 => q{
            SELECT coalesce(pg_wal_lsn_diff(
                    current_pos,
                    /* compute LSN from last archived offset */
                    (to_hex(COALESCE(last_archived_off, last_failed_off)/4294967296)
                     ||'/'||to_hex(COALESCE(last_archived_off, last_failed_off)%4294967296))::pg_lsn
                )::bigint / walsegsize, 0),
                failing,
                CASE WHEN failing
                    THEN extract('epoch' from (current_timestamp - last_archived_time))
                    ELSE 0
                END,
                last_archived_wal,
                last_failed_wal,
                /* mod time of the next wal to archive
                 * executed only if superuser */
                (SELECT extract('epoch' from (current_timestamp
                        - (pg_stat_file('pg_wal/'||pg_walfile_name(
                            (to_hex((last_archived_off+1)/4294967296)
                             ||'/'||to_hex((last_archived_off+1)%4294967296))::pg_lsn
                          ))).modification
                    ))
                 WHERE current_setting('is_superuser')::bool
                ) AS oldest
            FROM (
                SELECT last_archived_wal, last_archived_time,
                    last_failed_wal, walsegsize,
                    /* compute last archive offset */
                    -- WAL offset
                    ('x'||substr(last_archived_wal_hex, 9, 8))::bit(32)::bigint*4294967296
                    -- offset to the beginning of the segment
                    + ('x'||substr(last_archived_wal, 17, 8))::bit(32)::bigint * walsegsize
                    -- offset to the end of the segment
                    + walsegsize AS last_archived_off,
                    ('x'||substr(last_failed_wal_hex, 9, 8))::bit(32)::bigint*4294967296
                    -- offset to the beginning of the segment
                    + ('x'||substr(last_failed_wal, 17, 8))::bit(32)::bigint * walsegsize
                    -- offset to the end of the segment
                    + walsegsize AS last_failed_off,
                    CASE WHEN pg_is_in_recovery()
                        THEN pg_last_wal_receive_lsn()
                        ELSE pg_current_wal_lsn()
                    END AS current_pos,
                    (last_failed_time >= last_archived_time)
                    OR (last_archived_time IS NULL AND last_failed_time IS NOT NULL)
                    AS failing
                FROM (SELECT last_archived_wal, last_archived_time,
                        CASE WHEN (last_archived_wal NOT LIKE '%.history'
                                   AND last_archived_wal NOT LIKE '%.backup')
                            THEN last_archived_wal
                            ELSE NULL -- return NULL if last successfully archived file is a .history or .backup file
                        END AS last_archived_wal_hex,
                        CASE WHEN (last_failed_wal NOT LIKE '%.history'
                                   AND last_failed_wal NOT LIKE '%.backup')
                            THEN last_failed_wal
                            ELSE NULL -- return NULL if last failed archive is a .history or .backup file
                        END AS last_failed_wal_hex,
                        last_failed_wal, last_failed_time
                    FROM pg_stat_archiver) AS a
                CROSS JOIN (
                    SELECT setting::bigint * CASE unit
                            WHEN '8kB' THEN 8192
                            WHEN 'B' THEN 1
                            ELSE 0
                        END as walsegsize
                    FROM pg_catalog.pg_settings
                    WHERE name = 'wal_segment_size'
                ) AS s
            ) stats
            },
            $PG_VERSION_110 => q{
            SELECT coalesce(pg_wal_lsn_diff(
                    current_pos,
                    /* compute LSN from last archived offset */
                    (to_hex(COALESCE(last_archived_off, last_failed_off)/4294967296)
                     ||'/'||to_hex(COALESCE(last_archived_off, last_failed_off)%4294967296))::pg_lsn
                )::bigint / walsegsize, 0),
                failing,
                CASE WHEN failing
                    THEN extract('epoch' from (current_timestamp - last_archived_time))
                    ELSE 0
                END,
                last_archived_wal,
                last_failed_wal,
                /* mod time of the next wal to archive */
                extract('epoch' from (current_timestamp
                    - (pg_stat_file('pg_wal/'||pg_walfile_name(
                        (to_hex((last_archived_off+1)/4294967296)
                         ||'/'||to_hex((last_archived_off+1)%4294967296))::pg_lsn
                      ))).modification
                )) AS oldest
            FROM (
                SELECT last_archived_wal, last_archived_time,
                    last_failed_wal, walsegsize,
                    /* compute last archive offset */
                    -- WAL offset
                    ('x'||substr(last_archived_wal_hex, 9, 8))::bit(32)::bigint*4294967296
                    -- offset to the beginning of the segment
                    + ('x'||substr(last_archived_wal, 17, 8))::bit(32)::bigint * walsegsize
                    -- offset to the end of the segment
                    + walsegsize AS last_archived_off,
                    ('x'||substr(last_failed_wal_hex, 9, 8))::bit(32)::bigint*4294967296
                    -- offset to the beginning of the segment
                    + ('x'||substr(last_failed_wal, 17, 8))::bit(32)::bigint * walsegsize
                    -- offset to the end of the segment
                    + walsegsize AS last_failed_off,
                    CASE WHEN pg_is_in_recovery()
                        THEN pg_last_wal_receive_lsn()
                        ELSE pg_current_wal_lsn()
                    END AS current_pos,
                    (last_failed_time >= last_archived_time)
                    OR (last_archived_time IS NULL AND last_failed_time IS NOT NULL)
                    AS failing
                FROM (SELECT last_archived_wal, last_archived_time,
                        CASE WHEN (last_archived_wal NOT LIKE '%.history'
                                   AND last_archived_wal NOT LIKE '%.backup')
                            THEN last_archived_wal
                            ELSE NULL -- return NULL if last successfully archived file is a .history or .backup file
                        END AS last_archived_wal_hex,
                        CASE WHEN (last_failed_wal NOT LIKE '%.history'
                                   AND last_failed_wal NOT LIKE '%.backup')
                            THEN last_failed_wal
                            ELSE NULL -- return NULL if last failed archive is a .history or .backup file
                        END AS last_failed_wal_hex,
                        last_failed_wal, last_failed_time
                    FROM pg_stat_archiver) AS a
                CROSS JOIN (
                    SELECT setting::bigint * CASE unit
                            WHEN '8kB' THEN 8192
                            WHEN 'B' THEN 1
                            ELSE 0
                        END as walsegsize
                    FROM pg_catalog.pg_settings
                    WHERE name = 'wal_segment_size'
                ) AS s
            ) stats
            }
        );

        @rs = @{ query_ver( $hosts[0], %queries ) };

        $nb_files = $rs[0][0];

        push @perfdata => [
            'ready_archive', $nb_files, undef,
            $args{'warning'}, $args{'critical'}, 0
        ];

        if ( $rs[0][1] eq 't' ) {
            push @msg => sprintf 'archiver failing on %s', $rs[0][4];
            if ( $rs[0][2] ne '' ) {
                push @longmsg => sprintf '%s not archived since %ds',
                    $rs[0][4], $rs[0][2];
            }
            else {
                push @longmsg => sprintf '%s not archived', $rs[0][4];
            }
        }

        if ( $nb_files > 0 ) {
            if ( $rs[0][5] ne '' ) {
                push @perfdata => [
                    'oldest_ready_wal', int( $rs[0][5] ), 's', undef, undef, 0
                ];
            }
        }
        else {
            push @perfdata => [ 'oldest_ready_wal', 0, 's', undef, undef, 0 ];
        }

        push @msg => "$nb_files WAL files ready to archive";
    }

    return status_critical( $me, \@msg, \@perfdata, \@longmsg )
        if scalar @msg > 1;

    if ( defined $args{'critical'} and $nb_files >= $args{'critical'} ) {
        return status_critical( $me,
            \@msg, \@perfdata );
    }
    elsif ( defined $args{'warning'} and $nb_files >= $args{'warning'} ) {
        return status_warning( $me, \@msg, \@perfdata );
    }

    return status_ok( $me, \@msg, \@perfdata );
}

=item B<autovacuum> (8.1+)

Check the autovacuum activity on the cluster.

Perfdata contains the age of the oldest running autovacuum and the number
of workers by type (VACUUM, VACUUM ANALYZE, ANALYZE, VACUUM FREEZE, and
BRIN SUMMARIZE for 10+).

Thresholds, if any, are ignored.

Required privileges: unprivileged role.

=cut

sub check_autovacuum {
    my @rs;
    my @perfdata;
    my @msg;
    my @longmsg;
    my @hosts;
    my %args        = %{ $_[0] };
    my $me          = 'POSTGRES_AUTOVACUUM';
    my $oldest      = undef;
    my $numautovac  = 0;
    my $max_workers = "NaN";
    my %activity = (
        'VACUUM'         => 0,
        'VACUUM_ANALYZE' => 0,
        'ANALYZE'        => 0,
        'VACUUM_FREEZE'  => 0,
        'BRIN_SUMMARIZE' => 0
    );
    my %queries = (
        # field current_query, not autovacuum_max_workers
        $PG_VERSION_81 => q{
        SELECT current_query,
            extract(EPOCH FROM now()-query_start)::bigint,
            'NaN'
        FROM pg_stat_activity
        WHERE current_query LIKE 'autovacuum: %'
        ORDER BY query_start ASC},
        # field current_query, autovacuum_max_workers
        $PG_VERSION_83 => q{
        SELECT a.current_query,
            extract(EPOCH FROM now()-a.query_start)::bigint,
            s.setting
        FROM (SELECT current_setting('autovacuum_max_workers') AS setting) AS s
        LEFT JOIN (
            SELECT * FROM pg_stat_activity
            WHERE current_query LIKE 'autovacuum: %'
        ) AS a ON true
        ORDER BY query_start ASC},
        # field query, still autovacuum_max_workers
        $PG_VERSION_92 => q{
        SELECT a.query,
            extract(EPOCH FROM now()-a.query_start)::bigint,
            s.setting
        FROM (SELECT current_setting('autovacuum_max_workers') AS setting) AS s
        LEFT JOIN (
            SELECT * FROM pg_stat_activity
            WHERE query LIKE 'autovacuum: %'
        ) AS a ON true
        ORDER BY a.query_start ASC}
    );

    @hosts = @{ parse_hosts %args };

    pod2usage(
        -message => 'FATAL: you must give only one host with service "autovacuum".',
        -exitval => 127
    ) if @hosts != 1;

    is_compat $hosts[0], 'autovacuum', $PG_VERSION_81 or exit 1;

    if ( check_compat $hosts[0], $PG_VERSION_81, $PG_VERSION_96 ) {
        delete $activity{BRIN_SUMMARIZE};
    }

    @rs = @{ query_ver( $hosts[0], %queries ) };

    REC_LOOP: foreach my $r (@rs) {
        if ( not defined $oldest ) {
            $max_workers = $r->[2];
            next REC_LOOP if ( $r->[1] eq "" );
            $oldest = $r->[1];
        }

        $numautovac++;

        if ( $r->[0] =~ '\(to prevent wraparound\)$' ) {
            $activity{'VACUUM_FREEZE'}++;
        }
        else {
            if ( $r->[0] =~ '^autovacuum: VACUUM ANALYZE' ) {
                $activity{'VACUUM_ANALYZE'}++;
            }
            elsif ( $r->[0] =~ 'autovacuum: VACUUM' ) {
                $activity{'VACUUM'}++;
            }
            elsif ( $r->[0] =~ 'autovacuum: BRIN summarize' ) {
                $activity{'BRIN_SUMMARIZE'}++;
            }
            else {
                $activity{'ANALYZE'}++;
            }
        }

        $r->[0] =~ s/autovacuum: //;
        push @longmsg, $r->[0];
    }

    $oldest = 'NaN' if not defined($oldest);

    @perfdata = map { [ $_, $activity{$_} ] } keys %activity;
    push @perfdata, [ 'oldest_autovacuum', $oldest, 's' ];
    push @perfdata, [ 'max_workers', $max_workers ]
        if $hosts[0]->{'version_num'} >= $PG_VERSION_83;

    push @msg, "Number of autovacuum: $numautovac";
    push @msg, "Oldest autovacuum: " . to_interval($oldest)
        if $oldest ne "NaN";

    return status_ok( $me, \@msg, \@perfdata, \@longmsg );
}

=item B<backends> (all)

Check the total number of connections in the PostgreSQL cluster.

Perfdata contains the number of connections per database.

Critical and Warning thresholds accept either a raw number or a percentage
(e.g. 80%). When a threshold is a percentage, it is compared to the
difference between the cluster parameters C<max_connections> and
C<superuser_reserved_connections>.

Required privileges: an unprivileged user only sees its own queries; a
pg_monitor (10+) or superuser (<10) role is required to see all queries.
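The percentage form of these thresholds is resolved against the number of non-reserved connection slots, `max_connections - superuser_reserved_connections`. A sketch of that resolution, in Python for illustration only (the helper name is made up, not part of the plugin):

```python
def resolve_threshold(threshold, max_connections, reserved):
    """Turn '80%' into a raw connection count; pass raw numbers through.

    Mirrors the service's int(available * pct / 100) conversion, where
    available = max_connections - superuser_reserved_connections.
    """
    available = max_connections - reserved
    if threshold.endswith('%'):
        return int(available * float(threshold[:-1]) / 100)
    return int(threshold)

# 80% of the 97 slots left by max_connections=100, reserved=3:
print(resolve_threshold('80%', 100, 3))  # -> 77
```

Note the truncation toward zero: with 97 available slots, an 80% threshold fires at 77 connections, not 78.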
=cut sub check_backends { my @rs; my @perfdata; my @msg; my @hosts; my %args = %{ $_[0] }; my $me = 'POSTGRES_BACKENDS'; my $num_backends = 0; my %queries = ( $PG_VERSION_MIN => q{ SELECT s.datname, s.numbackends, current_setting('max_connections')::int - current_setting('superuser_reserved_connections')::int FROM pg_catalog.pg_stat_database AS s JOIN pg_catalog.pg_database d ON d.oid = s.datid WHERE d.datallowconn }, # Remove autovacuum connections (autovac introduced in 8.1, but exposed # in pg_stat_activity since 8.2) $PG_VERSION_82 => q{ SELECT d.datname, count(*), current_setting('max_connections')::int - current_setting('superuser_reserved_connections')::int FROM pg_catalog.pg_stat_activity AS s JOIN pg_catalog.pg_database AS d ON d.oid = s.datid WHERE current_query NOT LIKE 'autovacuum: %' GROUP BY d.datname }, # Add replication connections 9.1 $PG_VERSION_91 => q{ SELECT s.*, current_setting('max_connections')::int - current_setting('superuser_reserved_connections')::int FROM ( SELECT d.datname, count(*) FROM pg_catalog.pg_stat_activity AS s JOIN pg_catalog.pg_database AS d ON d.oid = s.datid WHERE current_query NOT LIKE 'autovacuum: %' GROUP BY d.datname UNION ALL SELECT 'replication', count(*) FROM pg_catalog.pg_stat_replication ) AS s }, # Rename current_query => query $PG_VERSION_92 => q{ SELECT s.*, current_setting('max_connections')::int - current_setting('superuser_reserved_connections')::int FROM ( SELECT d.datname, count(*) FROM pg_catalog.pg_stat_activity AS s JOIN pg_catalog.pg_database AS d ON d.oid = s.datid WHERE query NOT LIKE 'autovacuum: %' GROUP BY d.datname UNION ALL SELECT 'replication', count(*) FROM pg_catalog.pg_stat_replication ) AS s }, # Only account client backends $PG_VERSION_100 => q{ SELECT s.*, current_setting('max_connections')::int - current_setting('superuser_reserved_connections')::int FROM ( SELECT d.datname, count(*) FROM pg_catalog.pg_stat_activity AS s JOIN pg_catalog.pg_database AS d ON d.oid = s.datid WHERE 
            backend_type = 'client backend'
        GROUP BY d.datname
        UNION ALL
        SELECT 'replication', count(*) FROM pg_catalog.pg_stat_replication
        ) AS s },
        $PG_VERSION_120 => q{
        SELECT s.*, current_setting('max_connections')::int
            - current_setting('superuser_reserved_connections')::int
        FROM (
            SELECT d.datname, count(*)
            FROM pg_catalog.pg_stat_activity AS s
            JOIN pg_catalog.pg_database AS d ON d.oid = s.datid
            WHERE backend_type = 'client backend'
            GROUP BY d.datname
        ) AS s }
    );

    # Warning and critical are mandatory.
    pod2usage(
        -message => "FATAL: you must specify critical and warning thresholds.",
        -exitval => 127
    ) unless defined $args{'warning'} and defined $args{'critical'};

    # Warning and critical must be raw or %.
    pod2usage(
        -message => "FATAL: critical and warning thresholds only accept raw numbers or %.",
        -exitval => 127
    ) unless $args{'warning'} =~ m/^([0-9.]+)%?$/
        and $args{'critical'} =~ m/^([0-9.]+)%?$/;

    @hosts = @{ parse_hosts %args };

    pod2usage(
        -message => 'FATAL: you must give only one host with service "backends".',
        -exitval => 127
    ) if @hosts != 1;

    @rs = @{ query_ver( $hosts[0], %queries ) };

    $args{'critical'} = int( $rs[0][2] * $1 / 100 )
        if $args{'critical'} =~ /^([0-9.]+)%$/;
    $args{'warning'} = int( $rs[0][2] * $1 / 100 )
        if $args{'warning'} =~ /^([0-9.]+)%$/;

    LOOP_DB: foreach my $db (@rs) {
        $num_backends += $db->[1];
        push @perfdata, [
            $db->[0], $db->[1], '',
            $args{'warning'}, $args{'critical'}, 0, $db->[2]
        ];
    }

    push @perfdata, [
        'maximum_connections', $rs[0][2], undef, undef, undef, 0, $rs[0][2]
    ];

    push @msg => "$num_backends connections on $rs[0][2]";

    return status_critical( $me, \@msg, \@perfdata )
        if $num_backends >= $args{'critical'};
    return status_warning( $me, \@msg, \@perfdata )
        if $num_backends >= $args{'warning'};

    return status_ok( $me, \@msg, \@perfdata );
}

=item B<backends_status> (8.2+)

Check the status of all backends. Depending on your PostgreSQL version,
statuses are: C<idle>, C<idle in transaction>,
C<idle in transaction (aborted)> (>=9.0 only), C<fastpath function call>,
C<waiting for lock>, C<active>, C<disabled>, C<undefined> and
C<insufficient privilege>.
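This service's warning and critical thresholds are comma-separated lists of `status_label=value` pairs such as `waiting=5,idle_xact=10m`, where a value may carry a time unit. A simplified sketch of parsing such a list, in Python for illustration only (the unit handling is reduced to seconds/minutes/hours/days; names are made up, not part of the plugin):

```python
import re

THRESHOLD_RE = re.compile(
    r'(idle|idle_xact|aborted_xact|fastpath|active|waiting)\s*=\s*(\d+)\s*([smhd]?)',
    re.I)
UNIT_S = {'': 1, 's': 1, 'm': 60, 'h': 3600, 'd': 86400}

def parse_thresholds(spec):
    """Return {label: (value, is_time)}; a unit marks the value as an age.

    A bare number caps the count of backends in that status, while a
    time value caps the age of the oldest backend in that status.
    """
    out = {}
    for label, num, unit in THRESHOLD_RE.findall(spec):
        out[label.lower()] = (int(num) * UNIT_S[unit.lower()], bool(unit))
    return out

print(parse_thresholds('waiting=5,idle_xact=10m'))
```

In the plugin itself, the matched labels are then translated to the full status names (`idle_xact` to `idle in transaction`, and so on) before comparison.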
B<insufficient privilege> appears when you are not allowed to see the
statuses of other connections.

This service supports the argument C<--exclude REGEX> to exclude queries
matching the given regular expression. You can use multiple
C<--exclude REGEX> arguments.

Critical and Warning thresholds are optional. They accept a list of
'status_label=value' separated by a comma. Available labels are C<idle>,
C<idle_xact>, C<aborted_xact>, C<fastpath>, C<waiting> and C<active>.
Values are raw numbers or time units and empty lists are forbidden. Here
is an example:

    -w 'waiting=5,idle_xact=10' -c 'waiting=20,idle_xact=30,active=1d'

Perfdata contains the number of backends for each status and the oldest
one for each of them (8.3+ only).

Note that the number of backends reported in Nagios message B<includes>
excluded backends.

Required privileges: an unprivileged user only sees its own queries; a
pg_monitor (10+) or superuser (<10) role is required to see all queries.

=cut

sub check_backends_status {
    my @rs;
    my @hosts;
    my @perfdata;
    my @msg_warn;
    my @msg_crit;
    my %warn;
    my %crit;
    my $max_connections;
    my $num_backends = 0;
    my %args = %{ $_[0] };
    my $me   = 'POSTGRES_BACKENDS_STATUS';
    my %status = (
        'idle'                          => [0, 0],
        'idle in transaction'           => [0, 0],
        'idle in transaction (aborted)' => [0, 0],
        'fastpath function call'        => [0, 0],
        'waiting for lock'              => [0, 0],
        'active'                        => [0, 0],
        'disabled'                      => [0, 0],
        'undefined'                     => [0, 0],
        'insufficient privilege'        => [0, 0],
    );
    my %translate = (
        'idle'         => 'idle',
        'idle_xact'    => 'idle in transaction',
        'aborted_xact' => 'idle in transaction (aborted)',
        'fastpath'     => 'fastpath function call',
        'waiting'      => 'waiting for lock',
        'active'       => 'active'
    );
    my %queries = (
        # Doesn't support "idle in transaction (aborted)" and xact age
        $PG_VERSION_82 => q{
        SELECT CASE
                WHEN s.current_query = '<IDLE>' THEN 'idle'
                WHEN s.current_query = '<IDLE> in transaction' THEN 'idle in transaction'
                WHEN s.current_query = '<FASTPATH> function call' THEN 'fastpath function call'
                WHEN s.current_query = '<command string not enabled>' THEN 'disabled'
                WHEN s.current_query = '<backend information not available>' THEN 'undefined'
                WHEN s.current_query = '<insufficient privilege>' THEN 'insufficient privilege'
                WHEN s.waiting = 't' THEN 'waiting for lock'
                ELSE 'active'
            END AS status,
            NULL,
            current_setting('max_connections'),
            s.current_query
        FROM pg_stat_activity AS s
        JOIN pg_database d ON d.oid=s.datid
        WHERE d.datallowconn },
        # Doesn't support "idle in transaction (aborted)"
        $PG_VERSION_83 => q{
        SELECT CASE
                WHEN s.current_query = '<IDLE>' THEN 'idle'
                WHEN s.current_query = '<IDLE> in transaction' THEN 'idle in transaction'
                WHEN s.current_query = '<FASTPATH> function call' THEN 'fastpath function call'
                WHEN s.current_query = '<command string not enabled>' THEN 'disabled'
                WHEN s.current_query = '<backend information not available>' THEN 'undefined'
                WHEN s.current_query = '<insufficient privilege>' THEN 'insufficient privilege'
                WHEN s.waiting = 't' THEN 'waiting for lock'
                ELSE 'active'
            END AS status,
            extract('epoch' FROM date_trunc('milliseconds',
                current_timestamp-s.xact_start) ),
            current_setting('max_connections'),
            s.current_query
        FROM pg_stat_activity AS s
        JOIN pg_database d ON d.oid=s.datid
        WHERE d.datallowconn },
        # Supports everything
        $PG_VERSION_90 => q{
        SELECT CASE
                WHEN s.current_query = '<IDLE>' THEN 'idle'
                WHEN s.current_query = '<IDLE> in transaction' THEN 'idle in transaction'
                WHEN s.current_query = '<IDLE> in transaction (aborted)' THEN 'idle in transaction (aborted)'
                WHEN s.current_query = '<FASTPATH> function call' THEN 'fastpath function call'
                WHEN s.current_query = '<command string not enabled>' THEN 'disabled'
                WHEN s.current_query = '<backend information not available>' THEN 'undefined'
                WHEN s.current_query = '<insufficient privilege>' THEN 'insufficient privilege'
                WHEN s.waiting = 't' THEN 'waiting for lock'
                ELSE 'active'
            END,
            extract('epoch' FROM date_trunc('milliseconds',
                current_timestamp-s.xact_start) ),
            current_setting('max_connections'),
            s.current_query
        FROM pg_stat_activity AS s
        JOIN pg_database d ON d.oid=s.datid
        WHERE d.datallowconn },
        # pg_stat_activity schema change
        $PG_VERSION_92 => q{
        SELECT CASE
                WHEN s.waiting = 't' THEN 'waiting for lock'
                WHEN s.query = '<insufficient privilege>' THEN 'insufficient privilege'
                WHEN s.state IS NULL THEN 'undefined'
                ELSE s.state
            END,
            extract('epoch' FROM date_trunc('milliseconds',
                current_timestamp-s.state_change) ),
            current_setting('max_connections'),
            s.query
        FROM pg_stat_activity AS s
        JOIN pg_database d ON d.oid=s.datid
        WHERE d.datallowconn },
        # pg_stat_activity schema change for wait events
        $PG_VERSION_96 => q{
        SELECT CASE
                WHEN s.wait_event_type = 'Lock' THEN 'waiting for lock'
                WHEN s.query = '<insufficient privilege>' THEN 'insufficient privilege'
                WHEN s.state IS NULL THEN 'undefined'
                ELSE s.state
            END,
            extract('epoch' FROM date_trunc('milliseconds',
                current_timestamp-s.state_change) ),
            current_setting('max_connections'),
            s.query
        FROM pg_stat_activity AS s
        JOIN pg_database d ON d.oid=s.datid
        WHERE d.datallowconn },
        # pg_stat_activity now displays background processes
        $PG_VERSION_100 => q{
        SELECT CASE
                WHEN s.wait_event_type = 'Lock' THEN 'waiting for lock'
                WHEN s.query = '<insufficient privilege>' THEN 'insufficient privilege'
                WHEN s.state IS NULL THEN 'undefined'
                ELSE s.state
            END,
            extract('epoch' FROM date_trunc('milliseconds',
                current_timestamp-s.state_change) ),
            current_setting('max_connections'),
            s.query
        FROM pg_stat_activity AS s
        JOIN pg_database d ON d.oid=s.datid
        WHERE d.datallowconn
          AND backend_type IN ('client backend', 'background worker') }
    );

    @hosts = @{ parse_hosts %args };

    pod2usage(
        -message => 'FATAL: you must give only one host with service "backends_status".',
        -exitval => 127
    ) if @hosts != 1;

    is_compat $hosts[0], 'backends_status', $PG_VERSION_82 or exit 1;

    if ( defined $args{'warning'} ) {
        my $thresholds_re = qr/(idle|idle_xact|aborted_xact|fastpath|active|waiting)\s*=\s*(\d+\s*[smhd]?)/i;

        # Warning and critical must be raw
        pod2usage(
            -message => "FATAL: critical and warning thresholds only accept a list of 'label=value' separated by comma.\n" .
"See documentation for more information.", -exitval => 127 ) unless $args{'warning'} =~ m/^$thresholds_re(\s*,\s*$thresholds_re)*$/ and $args{'critical'} =~ m/^$thresholds_re(\s*,\s*$thresholds_re)*$/ ; while ( $args{'warning'} =~ /$thresholds_re/g ) { my ($threshold, $value) = ($1, $2); $warn{$translate{$threshold}} = $value if $1 and defined $2; } while ( $args{'critical'} =~ /$thresholds_re/g ) { my ($threshold, $value) = ($1, $2); $crit{$translate{$threshold}} = $value if $1 and defined $2; } } @rs = @{ query_ver( $hosts[0], %queries ) }; delete $status{'idle in transaction (aborted)'} if $hosts[0]->{'version_num'} < $PG_VERSION_90; $max_connections = $rs[0][2] if scalar @rs; REC_LOOP: foreach my $r (@rs) { $num_backends++; foreach my $exclude_re ( @{ $args{'exclude'} } ) { next REC_LOOP if $r->[3] =~ /$exclude_re/; } if (exists $status{$r->[0]}) { $status{$r->[0]}[0]++; $status{$r->[0]}[1] = $r->[1] if $r->[1] and $r->[1] > $status{$r->[0]}[1]; } } STATUS_LOOP: foreach my $s (sort keys %status) { my @perf = ( $s, $status{$s}[0], undef ); push @perf, ( $warn{$s}, $crit{$s}, 0, $max_connections ) if ( exists $warn{$s} and exists $crit{$s} and $warn{$s} =~ /\d+$/ and $crit{$s} =~ /\d+$/ ); push @perfdata => [ @perf ]; if ( $hosts[0]->{'version_num'} >= $PG_VERSION_83 and $s !~ '^(?:disabled|undefined|insufficient)' ) { my @perf = ("oldest $s", $status{$s}[1], 's' ); push @perf, ( $warn{$s}, $crit{$s}, 0, $max_connections ) if ( exists $warn{$s} and exists $crit{$s} and $warn{$s} =~ /\d+\s*[smhd]/ and $crit{$s} =~ /\d+\s*[smhd]/ ); push @perfdata => [ @perf ]; } # Criticals if ( exists $crit{$s} ) { if ( $crit{$s} =~ /\d+\s*[smhd]/ ) { if ( $status{$s}[1] >= get_time($crit{$s}) ) { push @msg_crit => "$status{$s}[0] $s for $status{$s}[1] seconds"; next STATUS_LOOP; } } elsif ( $status{$s}[0] >= $crit{$s} ) { push @msg_crit => "$status{$s}[0] $s"; next STATUS_LOOP; } } # Warning if ( exists $warn{$s} ) { if ( $warn{$s} =~ /\d+\s*[smhd]/ ) { if ( $status{$s}[1] >= 
                get_time( $warn{$s} ) ) {
                    push @msg_warn => "$status{$s}[0] $s for $status{$s}[1] seconds";
                    next STATUS_LOOP;
                }
            }
            elsif ( $status{$s}[0] >= $warn{$s} ) {
                push @msg_warn => "$status{$s}[0] $s";
                next STATUS_LOOP;
            }
        }
    }

    return status_critical( $me, [ @msg_crit, @msg_warn ], \@perfdata )
        if scalar @msg_crit > 0;

    return status_warning( $me, \@msg_warn, \@perfdata )
        if scalar @msg_warn > 0;

    return status_ok( $me, ["$num_backends backend connected"], \@perfdata );
}

=item B<checksum_errors> (12+)

Check for data checksum errors, reported in pg_stat_database.

This service requires that data checksums are enabled on the target
instance. UNKNOWN will be returned if that's not the case.

Critical and Warning thresholds are optional. They only accept a raw
number of checksum errors per database. If the thresholds are not
provided, a default value of C<1> will be used for both thresholds.

Checksum errors are CRITICAL issues, so it's highly recommended to keep
the default thresholds, as immediate action should be taken as soon as
such a problem arises.

Perfdata contains the number of errors per database.

Required privileges: unprivileged user.
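The per-database evaluation described above checks the critical threshold first, then the warning one, with both defaulting to 1. A sketch of that classification, in Python for illustration only (the helper name is made up, not part of the plugin):

```python
def classify(failures, warn=1, crit=1):
    """Map a pg_stat_database.checksum_failures counter to a Nagios status.

    Critical wins over warning; with the default warn=crit=1, a single
    checksum failure is already CRITICAL.
    """
    if failures >= crit:
        return 'CRITICAL'
    if failures >= warn:
        return 'WARNING'
    return 'OK'

for db, errors in [('app', 0), ('postgres', 2)]:
    print(db, classify(errors))
```

Raising the thresholds only makes sense for acknowledged, already-handled corruption; the defaults are the recommended setting.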
=cut

sub check_checksum_errors {
    my @msg_crit;
    my @msg_warn;
    my @rs;
    my @perfdata;
    my @hosts;
    my %args = %{ $_[0] };
    my $me   = 'POSTGRES_CHECKSUM_ERRORS';
    my $db_checked = 0;
    my $sql = q{SELECT COALESCE(s.datname, '<shared objects>'), checksum_failures
        FROM pg_catalog.pg_stat_database s};
    my $w_limit;
    my $c_limit;

    # Warning and critical are optional
    pod2usage(
        -message => "FATAL: you must specify both critical and warning thresholds.",
        -exitval => 127
    ) if ( ( defined $args{'warning'} and not defined $args{'critical'} )
        or ( not defined $args{'warning'} and defined $args{'critical'} ) );

    # Warning and critical default to 1
    if ( not defined $args{'warning'} or not defined $args{'critical'} ) {
        $w_limit = $c_limit = 1;
    }
    else {
        $w_limit = $args{'warning'};
        $c_limit = $args{'critical'};
    }

    @hosts = @{ parse_hosts %args };

    pod2usage(
        -message => 'FATAL: you must give only one host with service "checksum_errors".',
        -exitval => 127
    ) if @hosts != 1;

    is_compat $hosts[0], 'checksum_errors', $PG_VERSION_120 or exit 1;

    # Check if data checksums are enabled
    @rs = @{ query( $hosts[0], "SELECT pg_catalog.current_setting('data_checksums')" ) };

    return status_unknown( $me, ['Data checksums are not enabled!'] )
        unless ( $rs[0][0] eq "on" );

    @rs = @{ query( $hosts[0], $sql ) };

    DB_LOOP: foreach my $db (@rs) {
        $db_checked++;
        push @perfdata => [ $db->[0], $db->[1], '', $w_limit, $c_limit ];

        if ( $db->[1] >= $c_limit ) {
            push @msg_crit => "$db->[0]: $db->[1] error(s)";
            next DB_LOOP;
        }

        if ( $db->[1] >= $w_limit ) {
            push @msg_warn => "$db->[0]: $db->[1] error(s)";
            next DB_LOOP;
        }
    }

    return status_critical( $me, [ @msg_crit, @msg_warn ], \@perfdata )
        if scalar @msg_crit > 0;

    return status_warning( $me, \@msg_warn, \@perfdata )
        if scalar @msg_warn > 0;

    return status_ok( $me, ["$db_checked database(s) checked"], \@perfdata );
}

=item B<backup_label_age> (8.1+)

Check the age of the backup label file.

Perfdata returns the age of the backup_label file, -1 if not present.

Critical and Warning thresholds only accept an interval (e.g. 1h30m25s).

Required privileges: grant execute on function pg_stat_file(text, boolean)
(pg12+); unprivileged role (9.3+); superuser (<9.3).

=cut

sub check_backup_label_age {
    my $rs;
    my $c_limit;
    my $w_limit;
    my @perfdata;
    my @hosts;
    my %args = %{ $_[0] };
    my $me   = 'POSTGRES_BACKUP_LABEL_AGE';
    my %queries = (
        $PG_VERSION_81 => q{SELECT max(s.r) AS value
        FROM (
            SELECT CAST(extract(epoch FROM current_timestamp
                    - (pg_stat_file(file)).modification) AS integer) AS r
            FROM pg_ls_dir('.') AS ls(file)
            WHERE file='backup_label'
            UNION
            SELECT 0
        ) AS s},
        $PG_VERSION_93 => q{
        SELECT CASE WHEN pg_is_in_backup()
            THEN CAST(extract(epoch FROM current_timestamp
                    - pg_backup_start_time()) AS integer)
            ELSE 0
        END},
        $PG_VERSION_120 => q{
        SELECT coalesce(CAST((extract(epoch FROM current_timestamp
                - sf.modification)) AS integer), 0)
        FROM pg_stat_file('backup_label', true) sf;
        },
    );

    # warning and critical are mandatory.
    pod2usage(
        -message => "FATAL: you must specify critical and warning thresholds.",
        -exitval => 127
    ) unless defined $args{'warning'} and defined $args{'critical'};

    pod2usage(
        -message => "FATAL: critical and warning thresholds only accept an interval.",
        -exitval => 127
    ) unless ( is_time( $args{'warning'} ) and is_time( $args{'critical'} ) );

    $c_limit = get_time( $args{'critical'} );
    $w_limit = get_time( $args{'warning'} );

    @hosts = @{ parse_hosts %args };

    pod2usage(
        -message => 'FATAL: you must give only one host with service "backup_label_age".',
        -exitval => 127
    ) if @hosts != 1;

    is_compat $hosts[0], 'backup_label_age', $PG_VERSION_81 or exit 1;

    $rs = @{ query_ver( $hosts[0], %queries )->[0] }[0];

    push @perfdata, [ 'age', $rs, 's', $w_limit, $c_limit ];

    return status_critical( $me, [ "age: " . to_interval($rs) ], \@perfdata )
        if $rs > $c_limit;

    return status_warning( $me, [ "age: " . to_interval($rs) ], \@perfdata )
        if $rs > $w_limit;

    return status_ok( $me, [ "backup_label file " . ( $rs == 0 ?
"absent":"present (age: ".to_interval($rs).")") ], \@perfdata ); } =item B (8.3+) Check the percentage of pages written by backends since last check. This service uses the status file (see C<--status-file> parameter). Perfdata contains the ratio per second for each C counter since last execution. Units Nps for checkpoints, max written clean and fsyncs are the number of "events" per second. Critical and Warning thresholds are optional. If set, they I accept a percentage. Required privileges: unprivileged role. =cut sub check_bgwriter { my @msg; my @msg_crit; my @msg_warn; my @rs; my @perfdata; my $delta_ts; my $delta_buff_total; my $delta_buff_backend; my $delta_buff_bgwriter; my $delta_buff_checkpointer; my $delta_buff_alloc; my $delta_checkpoint_timed; my $delta_checkpoint_req; my $delta_maxwritten_clean; my $delta_backend_fsync; my %new_bgw; my %bgw; my @hosts; my $now = time(); my %args = %{ $_[0] }; my $me = 'POSTGRES_BGWRITER'; my %queries = ( $PG_VERSION_83 => q{SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint * current_setting('block_size')::numeric, buffers_clean * current_setting('block_size')::numeric, maxwritten_clean, buffers_backend * current_setting('block_size')::numeric, buffers_alloc * current_setting('block_size')::numeric, 0, 0 FROM pg_stat_bgwriter; }, $PG_VERSION_91 => q{SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint * current_setting('block_size')::numeric, buffers_clean * current_setting('block_size')::numeric, maxwritten_clean, buffers_backend * current_setting('block_size')::numeric, buffers_alloc * current_setting('block_size')::numeric, buffers_backend_fsync, extract ('epoch' from stats_reset) FROM pg_stat_bgwriter; } ); # Warning and critical must be %. 
pod2usage( -message => "FATAL: critical and warning thresholds only accept percentages.", -exitval => 127 ) unless not (defined $args{'warning'} and defined $args{'critical'} ) or ( $args{'warning'} =~ m/^([0-9.]+)%$/ and $args{'critical'} =~ m/^([0-9.]+)%$/ ); @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "bgwriter".', -exitval => 127 ) if @hosts != 1; is_compat $hosts[0], 'bgwriter', $PG_VERSION_83 or exit 1; %bgw = %{ load( $hosts[0], 'bgwriter', $args{'status-file'} ) || {} }; @rs = @{ query_ver( $hosts[0], %queries )->[0] }; $new_bgw{'ts'} = $now; $new_bgw{'checkpoint_timed'} = $rs[0]; $new_bgw{'checkpoint_req'} = $rs[1]; $new_bgw{'buff_checkpoint'} = $rs[2]; $new_bgw{'buff_clean'} = $rs[3]; $new_bgw{'maxwritten_clean'} = $rs[4]; $new_bgw{'buff_backend'} = $rs[5]; $new_bgw{'buff_alloc'} = $rs[6]; $new_bgw{'backend_fsync'} = $rs[7]; $new_bgw{'stat_reset'} = $rs[8]; save $hosts[0], 'bgwriter', \%new_bgw, $args{'status-file'}; return status_ok( $me, ['First call'] ) unless keys %bgw and defined $bgw{'ts'}; # 'ts' was added in 1.25, check for existence # instead of raising some ugly Perl errors # when upgrading. 
    return status_ok( $me, ['Stats reset since last call'] )
        if $new_bgw{'stat_reset'}       > $bgw{'stat_reset'}
        or $new_bgw{'checkpoint_timed'} < $bgw{'checkpoint_timed'}
        or $new_bgw{'checkpoint_req'}   < $bgw{'checkpoint_req'}
        or $new_bgw{'buff_checkpoint'}  < $bgw{'buff_checkpoint'}
        or $new_bgw{'buff_clean'}       < $bgw{'buff_clean'}
        or $new_bgw{'maxwritten_clean'} < $bgw{'maxwritten_clean'}
        or $new_bgw{'buff_backend'}     < $bgw{'buff_backend'}
        or $new_bgw{'buff_alloc'}       < $bgw{'buff_alloc'}
        or $new_bgw{'backend_fsync'}    < $bgw{'backend_fsync'};

    $delta_buff_total = $rs[2] - $bgw{'buff_checkpoint'}
                      + $rs[3] - $bgw{'buff_clean'}
                      + $rs[5] - $bgw{'buff_backend'};

    $delta_ts                = $now - $bgw{'ts'};
    $delta_buff_backend      = ($rs[5] - $bgw{'buff_backend'})     / $delta_ts;
    $delta_buff_bgwriter     = ($rs[3] - $bgw{'buff_clean'})       / $delta_ts;
    $delta_buff_checkpointer = ($rs[2] - $bgw{'buff_checkpoint'})  / $delta_ts;
    $delta_buff_alloc        = ($rs[6] - $bgw{'buff_alloc'})       / $delta_ts;
    $delta_checkpoint_timed  = ($rs[0] - $bgw{'checkpoint_timed'}) / $delta_ts;
    $delta_checkpoint_req    = ($rs[1] - $bgw{'checkpoint_req'})   / $delta_ts;
    $delta_maxwritten_clean  = ($rs[4] - $bgw{'maxwritten_clean'}) / $delta_ts;
    $delta_backend_fsync     = ($rs[7] - $bgw{'backend_fsync'})    / $delta_ts;

    push @perfdata, (
        [ 'buffers_backend',       $delta_buff_backend,      'Bps' ],
        [ 'checkpoint_timed',      $delta_checkpoint_timed,  'Nps' ],
        [ 'checkpoint_req',        $delta_checkpoint_req,    'Nps' ],
        [ 'buffers_checkpoint',    $delta_buff_checkpointer, 'Bps' ],
        [ 'buffers_clean',         $delta_buff_bgwriter,     'Bps' ],
        [ 'maxwritten_clean',      $delta_maxwritten_clean,  'Nps' ],
        [ 'buffers_backend_fsync', $delta_backend_fsync,     'Nps' ],
        [ 'buffers_alloc',         $delta_buff_alloc,        'Bps' ]
    );

    if ($delta_buff_total) {
        push @msg => sprintf(
            "%.2f%% from backends, %.2f%% from bgwriter, %.2f%% from checkpointer",
            100 * $delta_buff_backend      / $delta_buff_total,
            100 * $delta_buff_bgwriter     / $delta_buff_total,
            100 * $delta_buff_checkpointer / $delta_buff_total
        );
    }
    else {
        push @msg => "No writes";
    }

    # Alarm if asked.
    # FIXME: threshold should accept a % and a minimal written size
    if ( defined $args{'warning'} and defined $args{'critical'}
        and $delta_buff_total
    ) {
        my $w_limit = get_size( $args{'warning'},  $delta_buff_total );
        my $c_limit = get_size( $args{'critical'}, $delta_buff_total );

        return status_critical( $me, \@msg, \@perfdata )
            if $delta_buff_backend >= $c_limit;

        return status_warning( $me, \@msg, \@perfdata )
            if $delta_buff_backend >= $w_limit;
    }

    return status_ok( $me, \@msg, \@perfdata );
}

=item B<btree_bloat>

Estimate bloat on B-tree indexes.

Warning and critical thresholds accept a comma-separated list of either raw
number (for a size), size (eg. 125M) or percentage. The thresholds apply to
B<bloat> size, not object size. If a percentage is given, the threshold will
apply to the bloat size compared to the total index size. If multiple
threshold values are passed, check_pgactivity will choose the largest (bloat
size) value.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.
The 'postgres' database and templates are always excluded.

It also supports a C<--exclude REGEX> parameter to exclude relations matching
a regular expression. The regular expression applies to
"database.schema_name.relation_name". This allows you to filter either on a
relation name for all schemas and databases, on a qualified relation name
(schema + relation) for all databases, or on a qualified relation name in
only one database. You can use multiple C<--exclude REGEX> parameters.

Perfdata will return the number of indexes of concern, by warning and
critical threshold, per database.

A list of the bloated indexes will be returned after the perfdata. This list
contains the fully qualified bloated index name, the estimated bloat size,
the index size and the bloat percentage.
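The threshold resolution described above can be sketched standalone. The helper below (C<resolve_limit> is a hypothetical name, a simplified stand-in for the script's C<get_size>) shows how each comma-separated value (raw bytes, size with a unit, or percentage of the index size) is converted to bytes, with the largest result kept as the effective limit:

```perl
use strict;
use warnings;

# Simplified sketch: resolve a comma-separated threshold list against a
# given index size, keeping the largest computed byte value, as the
# btree_bloat service does when several thresholds are passed.
my %unit = ( k => 1024, M => 1024**2, G => 1024**3, T => 1024**4 );

sub resolve_limit {
    my ( $spec, $index_size ) = @_;
    my $limit = 0;

    foreach my $t ( split /,/, $spec ) {
        my $bytes;
        if    ( $t =~ /^([0-9.]+)%$/ )          { $bytes = $1 * $index_size / 100 }
        elsif ( $t =~ /^([0-9.]+)([kMGT])B?$/ ) { $bytes = $1 * $unit{$2} }
        elsif ( $t =~ /^([0-9.]+)$/ )           { $bytes = $1 }
        else                                    { die "bad threshold: $t" }
        # keep the largest computed size
        $limit = $bytes if $bytes > $limit;
    }
    return $limit;
}

# A 10 GB index with '-w 512M,30%': 30% of 10 GB (3 GB) wins over 512 MB.
printf "%d\n", resolve_limit( '512M,30%', 10 * 1024**3 );
```

The real service computes this per index, since a percentage threshold yields a different byte limit for each index size.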
Required privileges: superuser (<10) able to log in all databases, or at least those in C<--dbinclude>; superuser (<10); on PostgreSQL 10+, a user with the role pg_monitor suffices, provided that you grant SELECT on the system table pg_statistic to the pg_monitor role, in each database of the cluster: C =cut sub check_btree_bloat { my @perfdata; my @longmsg; my @rs; my @hosts; my @all_db; my $total_index; # num of index checked, without excluded ones my $w_count = 0; my $c_count = 0; my %args = %{ $_[0] }; my @dbinclude = @{ $args{'dbinclude'} }; my @dbexclude = @{ $args{'dbexclude'} }; my $me = 'POSTGRES_BTREE_BLOAT'; my %queries = ( $PG_VERSION_74 => q{ SELECT current_database(), nspname AS schemaname, tblname, idxname, bs*(relpages)::bigint AS real_size, bs*(relpages-est_pages)::bigint AS bloat_size, 100 * (relpages-est_pages)::float / relpages AS bloat_ratio FROM ( SELECT coalesce( 1+ceil(reltuples/floor((bs-pageopqdata-pagehdr)/(4+nulldatahdrwidth)::float)), 0 ) AS est_pages, bs, nspname, tblname, idxname, relpages, is_na FROM ( SELECT maxalign, bs, nspname, tblname, idxname, reltuples, relpages, relam, ( index_tuple_hdr_bm + maxalign - CASE WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign ELSE index_tuple_hdr_bm%maxalign END + nulldatawidth + maxalign - CASE WHEN nulldatawidth = 0 THEN 0 WHEN nulldatawidth::numeric%maxalign = 0 THEN maxalign ELSE nulldatawidth::numeric%maxalign END )::numeric AS nulldatahdrwidth, pagehdr, pageopqdata, is_na FROM ( SELECT n.nspname, sub.tblname, sub.idxname, sub.reltuples, sub.relpages, sub.relam, 8192::numeric AS bs, CASE WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS maxalign, 20 AS pagehdr, 16 AS pageopqdata, CASE WHEN max(coalesce(sub.stanullfrac,0)) = 0 THEN 8 ELSE 8 + (( 32 + 8 - 1 ) / 8) END AS index_tuple_hdr_bm, sum( (1-coalesce(sub.stanullfrac, 0)) * coalesce(sub.stawidth, 1024)) AS nulldatawidth, max( CASE WHEN a.atttypid = 'pg_catalog.name'::regtype THEN 1 ELSE 0 
END ) > 0 OR count(1) <> sub.indnatts AS is_na FROM ( SELECT ct.relnamespace, ct.relname AS tblname, ci.relname AS idxname, ci.reltuples, ci.relpages, ci.relam, s.stawidth, s.stanullfrac, s.starelid, s.staattnum, i.indnatts FROM pg_catalog.pg_index AS i JOIN pg_catalog.pg_class AS ci ON ci.oid = i.indexrelid JOIN pg_catalog.pg_class AS ct ON ct.oid = i.indrelid JOIN pg_catalog.pg_statistic AS s ON s.starelid = i.indrelid AND s.staattnum = ANY ( string_to_array(pg_catalog.textin(pg_catalog.int2vectorout(i.indkey)), ' ')::smallint[] ) WHERE ci.relpages > 0 ) AS sub JOIN pg_catalog.pg_attribute AS a ON sub.starelid = a.attrelid AND sub.staattnum = a.attnum JOIN pg_catalog.pg_type AS t ON a.atttypid = t.oid JOIN pg_catalog.pg_namespace AS n ON sub.relnamespace = n.oid WHERE a.attnum > 0 GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, sub.indnatts ) AS sub2 ) AS sub3 JOIN pg_am am ON sub3.relam = am.oid WHERE am.amname = 'btree' ) AS sub4 WHERE NOT is_na ORDER BY 2,3,4 }, # Page header is 24 and block_size GUC, support index on expression $PG_VERSION_80 => q{ SELECT current_database(), nspname AS schemaname, tblname, idxname, bs*(relpages)::bigint AS real_size, bs*(relpages-est_pages)::bigint AS bloat_size, 100 * (relpages-est_pages)::float / relpages AS bloat_ratio FROM ( SELECT coalesce(1 + ceil(reltuples/floor((bs-pageopqdata-pagehdr)/(4+nulldatahdrwidth)::float)), 0 ) AS est_pages, bs, nspname, tblname, idxname, relpages, is_na FROM ( SELECT maxalign, bs, nspname, tblname, idxname, reltuples, relpages, relam, ( index_tuple_hdr_bm + maxalign - CASE WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign ELSE index_tuple_hdr_bm%maxalign END + nulldatawidth + maxalign - CASE WHEN nulldatawidth = 0 THEN 0 WHEN nulldatawidth::numeric%maxalign = 0 THEN maxalign ELSE nulldatawidth::numeric%maxalign END )::numeric AS nulldatahdrwidth, pagehdr, pageopqdata, is_na FROM ( SELECT n.nspname, sub.tblname, sub.idxname, sub.reltuples, sub.relpages, sub.relam, current_setting('block_size')::numeric 
AS bs, CASE WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS maxalign, 24 AS pagehdr, 16 AS pageopqdata, CASE WHEN max(coalesce(sub.stanullfrac,0)) = 0 THEN 8 ELSE 8 + (( 32 + 8 - 1 ) / 8) END AS index_tuple_hdr_bm, sum( (1-coalesce(sub.stanullfrac, 0)) * coalesce(sub.stawidth, 1024)) AS nulldatawidth, max( CASE WHEN a.atttypid = 'pg_catalog.name'::regtype THEN 1 ELSE 0 END ) > 0 AS is_na FROM ( SELECT ct.relnamespace, ct.relname AS tblname, ci.relname AS idxname, ci.reltuples, ci.relpages, ci.relam, s.stawidth, s.stanullfrac, s.starelid, s.staattnum FROM pg_catalog.pg_index AS i JOIN pg_catalog.pg_class AS ci ON ci.oid = i.indexrelid JOIN pg_catalog.pg_class AS ct ON i.indrelid = ct.oid JOIN pg_catalog.pg_statistic AS s ON i.indexrelid = s.starelid WHERE ci.relpages > 0 UNION SELECT ct.relnamespace, ct.relname AS tblname, ci.relname AS idxname, ci.reltuples, ci.relpages, ci.relam, s.stawidth, s.stanullfrac, s.starelid, s.staattnum FROM pg_catalog.pg_index AS i JOIN pg_catalog.pg_class AS ci ON ci.oid = i.indexrelid JOIN pg_catalog.pg_class AS ct ON ct.oid = i.indrelid JOIN pg_catalog.pg_statistic AS s ON s.starelid = i.indrelid AND s.staattnum = ANY ( string_to_array(pg_catalog.textin(pg_catalog.int2vectorout(i.indkey)), ' ')::smallint[] ) WHERE ci.relpages > 0 ) AS sub JOIN pg_catalog.pg_attribute AS a ON sub.starelid = a.attrelid AND sub.staattnum = a.attnum JOIN pg_catalog.pg_type AS t ON a.atttypid = t.oid JOIN pg_catalog.pg_namespace AS n ON sub.relnamespace = n.oid WHERE a.attnum > 0 GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9 ) AS sub2 ) AS sub3 JOIN pg_am am ON sub3.relam = am.oid WHERE am.amname = 'btree' ) AS sub4 WHERE NOT is_na ORDER BY 2,3,4 }, # Use ANY (i.indkey) w/o function call to cast from vector to array $PG_VERSION_81 => q{ SELECT current_database(), nspname AS schemaname, tblname, idxname, bs*(relpages)::bigint AS real_size, bs*(relpages-est_pages)::bigint AS bloat_size, 100 * (relpages-est_pages)::float 
/ relpages AS bloat_ratio FROM ( SELECT coalesce(1 + ceil(reltuples/floor((bs-pageopqdata-pagehdr)/(4+nulldatahdrwidth)::float)), 0 ) AS est_pages, bs, nspname, tblname, idxname, relpages, is_na FROM ( SELECT maxalign, bs, nspname, tblname, idxname, reltuples, relpages, relam, ( index_tuple_hdr_bm + maxalign - CASE WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign ELSE index_tuple_hdr_bm%maxalign END + nulldatawidth + maxalign - CASE WHEN nulldatawidth = 0 THEN 0 WHEN nulldatawidth::numeric%maxalign = 0 THEN maxalign ELSE nulldatawidth::numeric%maxalign END )::numeric AS nulldatahdrwidth, pagehdr, pageopqdata, is_na FROM ( SELECT n.nspname, sub.tblname, sub.idxname, sub.reltuples, sub.relpages, sub.relam, current_setting('block_size')::numeric AS bs, CASE WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS maxalign, 24 AS pagehdr, 16 AS pageopqdata, CASE WHEN max(coalesce(sub.stanullfrac,0)) = 0 THEN 8 ELSE 8 + (( 32 + 8 - 1 ) / 8) END AS index_tuple_hdr_bm, sum( (1-coalesce(sub.stanullfrac, 0)) * coalesce(sub.stawidth, 1024)) AS nulldatawidth, max( CASE WHEN a.atttypid = 'pg_catalog.name'::regtype THEN 1 ELSE 0 END ) > 0 AS is_na FROM ( SELECT ct.relnamespace, ct.relname AS tblname, ci.relname AS idxname, ci.reltuples, ci.relpages, ci.relam, s.stawidth, s.stanullfrac, s.starelid, s.staattnum FROM pg_catalog.pg_index AS i JOIN pg_catalog.pg_class AS ci ON ci.oid = i.indexrelid JOIN pg_catalog.pg_class AS ct ON i.indrelid = ct.oid JOIN pg_catalog.pg_statistic AS s ON i.indexrelid = s.starelid WHERE ci.relpages > 0 UNION SELECT ct.relnamespace, ct.relname AS tblname, ci.relname AS idxname, ci.reltuples, ci.relpages, ci.relam, s.stawidth, s.stanullfrac, s.starelid, s.staattnum FROM pg_catalog.pg_index AS i JOIN pg_catalog.pg_class AS ci ON ci.oid = i.indexrelid JOIN pg_catalog.pg_class AS ct ON ct.oid = i.indrelid JOIN pg_catalog.pg_statistic AS s ON s.starelid = i.indrelid AND s.staattnum = ANY ( i.indkey ) WHERE 
ci.relpages > 0 ) AS sub JOIN pg_catalog.pg_attribute AS a ON sub.starelid = a.attrelid AND sub.staattnum = a.attnum JOIN pg_catalog.pg_type AS t ON a.atttypid = t.oid JOIN pg_catalog.pg_namespace AS n ON sub.relnamespace = n.oid WHERE a.attnum > 0 GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9 ) AS sub2 ) AS sub3 JOIN pg_am am ON sub3.relam = am.oid WHERE am.amname = 'btree' ) AS sub4 WHERE NOT is_na ORDER BY 2,3,4 }, # New column pg_index.indisvalid $PG_VERSION_82 => q{ SELECT current_database(), nspname AS schemaname, tblname, idxname, bs*(relpages)::bigint AS real_size, bs*(relpages-est_pages_ff) AS bloat_size, 100 * (relpages-est_pages_ff)::float / relpages AS bloat_ratio FROM ( SELECT coalesce(1 + ceil(reltuples/floor((bs-pageopqdata-pagehdr)/(4+nulldatahdrwidth)::float)), 0 ) AS est_pages, coalesce(1 + ceil(reltuples/floor((bs-pageopqdata-pagehdr)*fillfactor/(100*(4+nulldatahdrwidth)::float))), 0 ) AS est_pages_ff, bs, nspname, tblname, idxname, relpages, fillfactor, is_na FROM ( SELECT maxalign, bs, nspname, tblname, idxname, reltuples, relpages, relam, fillfactor, ( index_tuple_hdr_bm + maxalign - CASE WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign ELSE index_tuple_hdr_bm%maxalign END + nulldatawidth + maxalign - CASE WHEN nulldatawidth = 0 THEN 0 WHEN nulldatawidth::numeric%maxalign = 0 THEN maxalign ELSE nulldatawidth::numeric%maxalign END )::numeric AS nulldatahdrwidth, pagehdr, pageopqdata, is_na FROM ( SELECT n.nspname, sub.tblname, sub.idxname, sub.reltuples, sub.relpages, sub.relam, sub.fillfactor, current_setting('block_size')::numeric AS bs, CASE -- MAXALIGN: 4 on 32bits, 8 on 64bits (and mingw32 ?) 
WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS maxalign, 24 AS pagehdr, 16 AS pageopqdata, CASE WHEN max(coalesce(sub.stanullfrac,0)) = 0 THEN 8 -- IndexTupleData size ELSE 8 + (( 32 + 8 - 1 ) / 8) END AS index_tuple_hdr_bm, sum( (1-coalesce(sub.stanullfrac, 0)) * coalesce(sub.stawidth, 1024)) AS nulldatawidth, max( CASE WHEN a.atttypid = 'pg_catalog.name'::regtype THEN 1 ELSE 0 END ) > 0 AS is_na FROM ( SELECT ct.relnamespace, ct.relname AS tblname, ci.relname AS idxname, ci.reltuples, ci.relpages, ci.relam, s.stawidth, s.stanullfrac, s.starelid, s.staattnum, coalesce(substring( array_to_string(ci.reloptions, ' ') from 'fillfactor=([0-9]+)')::smallint, 90) AS fillfactor FROM pg_catalog.pg_index AS i JOIN pg_catalog.pg_class AS ci ON ci.oid = i.indexrelid JOIN pg_catalog.pg_class AS ct ON i.indrelid = ct.oid JOIN pg_catalog.pg_statistic AS s ON i.indexrelid = s.starelid WHERE ci.relpages > 0 UNION SELECT ct.relnamespace, ct.relname AS tblname, ci.relname AS idxname, ci.reltuples, ci.relpages, ci.relam, s.stawidth, s.stanullfrac, s.starelid, s.staattnum, coalesce(substring( array_to_string(ci.reloptions, ' ') from 'fillfactor=([0-9]+)')::smallint, 90) AS fillfactor FROM pg_catalog.pg_index AS i JOIN pg_catalog.pg_class AS ci ON ci.oid = i.indexrelid JOIN pg_catalog.pg_class AS ct ON ct.oid = i.indrelid JOIN pg_catalog.pg_statistic AS s ON s.starelid = i.indrelid AND s.staattnum = ANY ( i.indkey ) WHERE ci.relpages > 0 ) AS sub JOIN pg_catalog.pg_attribute AS a ON sub.starelid = a.attrelid AND sub.staattnum = a.attnum JOIN pg_catalog.pg_type AS t ON a.atttypid = t.oid JOIN pg_catalog.pg_namespace AS n ON sub.relnamespace = n.oid WHERE a.attnum > 0 GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ) AS sub2 ) AS sub3 JOIN pg_am am ON sub3.relam = am.oid WHERE am.amname = 'btree' ) AS sub4 WHERE NOT is_na ORDER BY 2,3,4 } ); # Warning and critical are mandatory. 
pod2usage( -message => "FATAL: you must specify critical and warning thresholds.", -exitval => 127 ) unless defined $args{'warning'} and defined $args{'critical'} ; @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "btree_bloat".', -exitval => 127 ) if @hosts != 1; @all_db = @{ get_all_dbname( $hosts[0] ) }; # Iterate over all db ALLDB_LOOP: foreach my $db (sort @all_db) { my @rc; # handle max, avg and count for size and percentage, per relkind my $nb_ind = 0; my $idx_bloated = 0; next ALLDB_LOOP if grep { $db =~ /$_/ } @dbexclude; next ALLDB_LOOP if @dbinclude and not grep { $db =~ /$_/ } @dbinclude; @rc = @{ query_ver( $hosts[0], %queries, $db ) }; BLOAT_LOOP: foreach my $bloat (@rc) { foreach my $exclude_re ( @{ $args{'exclude'} } ) { next BLOAT_LOOP if "$bloat->[0].$bloat->[1].$bloat->[3]" =~ m/$exclude_re/; } if ( defined $args{'warning'} ) { my $w_limit = 0; my $c_limit = 0; # We need to compute effective thresholds on each object, # as the value can be given in percentage # The biggest calculated size will be used. 
                foreach my $cur_warning (split /,/, $args{'warning'}) {
                    my $size = get_size( $cur_warning, $bloat->[4] );
                    $w_limit = $size if $size > $w_limit;
                }

                foreach my $cur_critical (split /,/, $args{'critical'}) {
                    my $size = get_size( $cur_critical, $bloat->[4] );
                    $c_limit = $size if $size > $c_limit;
                }

                if ( $bloat->[5] > $w_limit ) {
                    $idx_bloated++;
                    $w_count++;
                    $c_count++ if $bloat->[5] > $c_limit;
                    push @longmsg => sprintf "%s.%s.%s %s/%s (%.2f%%);",
                        $bloat->[0], $bloat->[1], $bloat->[3],
                        to_size($bloat->[5]), to_size($bloat->[4]), $bloat->[6];
                }
            }

            $nb_ind++;
        }

        $total_index += $nb_ind;
        push @perfdata => [ "idx bloated in $db", $idx_bloated ];
    }

    # We use the warning count for the **total** number of bloated indexes
    return status_critical( $me, [ "$w_count/$total_index index(es) bloated" ],
        [ @perfdata ], [ @longmsg ]
    ) if $c_count > 0;

    return status_warning( $me, [ "$w_count/$total_index index(es) bloated" ],
        [ @perfdata ], [ @longmsg ]
    ) if $w_count > 0;

    return status_ok( $me, [ "Btree bloat ok" ], \@perfdata );
}

=item B<session_stats> (14+)

Gather miscellaneous session statistics.

This service uses the status file (see C<--status-file> parameter).

Perfdata contains the session / active / idle-in-transaction times for each
database since last call, as well as the number of sessions per second, and
the number of sessions killed / abandoned / terminated by fatal errors.

Required privileges: unprivileged role.
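The state-file pattern behind this service (and the other "since last call" services) can be illustrated with a minimal sketch: each run saves a snapshot of the cumulative counters from C<pg_stat_database> plus a timestamp, and a rate is the counter delta divided by the elapsed time. C<session_rate> and the snapshot hashes below are illustrative names, not the script's own API:

```perl
use strict;
use warnings;

# Sketch: compute a per-database session rate from two snapshots, the way
# session_stats derives rates between the stored state file and the
# current pg_stat_database counters.
sub session_rate {
    my ( $old, $new ) = @_;
    my $elapsed = $new->{ts} - $old->{ts};    # seconds between the two calls
    return ( $new->{sessions} - $old->{sessions} ) / $elapsed;
}

# Previous run (loaded from the status file) vs. current run, 60s apart.
my %prev = ( ts => 1_700_000_000, sessions => 1_000 );
my %curr = ( ts => 1_700_000_060, sessions => 1_180 );

printf "%.2f sessions/s\n", session_rate( \%prev, \%curr );  # 3 new sessions per second
```

This is also why the first call can only return "First call": with no prior snapshot there is no delta to divide.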
=cut sub check_session_stats { my @rs; my @perfdata; my @hosts; my %sstats; my %new_sstats; my $instance_session_rate; my %args = %{ $_[0] }; my $me = 'POSTGRES_SESSION_STATS'; my $sql = q{ SELECT extract(EPOCH from now()), s.datname, s.sessions, s.session_time, s.active_time, s.idle_in_transaction_time, s.sessions_killed, s.sessions_abandoned, s.sessions_fatal FROM pg_stat_database s JOIN pg_database d ON s.datid = d.oid WHERE d.datallowconn }; @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "session_stats".', -exitval => 127 ) if @hosts != 1; is_compat $hosts[0], 'session_stats', $PG_VERSION_140 or exit 1; %sstats = %{ load( $hosts[0], 'session_stats', $args{'status-file'} ) || {} }; @rs = @{ query( $hosts[0], $sql ) }; $new_sstats{$_->[1]} = { 'ts' => $_->[0], 'sessions' => $_->[2], 'session_time' => $_->[3], 'active_time' => $_->[4], 'idle_in_xact_time' => $_->[5], 'killd' => $_->[6], 'aband' => $_->[7], 'fatal' => $_->[8] } foreach @rs; save $hosts[0], 'session_stats', \%new_sstats, $args{'status-file'}; return status_ok( $me, ['First call'] ) unless keys %sstats; foreach my $db ( keys %new_sstats ) { my $sessions = $new_sstats{$db}{'sessions'} - $sstats{$db}{'sessions'}; my $session_time = $new_sstats{$db}{'session_time'} - $sstats{$db}{'session_time'}; my $active_time = $new_sstats{$db}{'active_time'} - $sstats{$db}{'active_time'}; my $idle_in_xact_time = $new_sstats{$db}{'idle_in_xact_time'} - $sstats{$db}{'idle_in_xact_time'}; my $sessions_killd = $new_sstats{$db}{'killd'} - $sstats{$db}{'killd'}; my $sessions_aband = $new_sstats{$db}{'aband'} - $sstats{$db}{'aband'}; my $sessions_fatal = $new_sstats{$db}{'fatal'} - $sstats{$db}{'fatal'}; my $session_rate = $sessions / ( $new_sstats{$db}{'ts'} - $sstats{$db}{'ts'} ); $instance_session_rate += $session_rate; push @perfdata => ( [ "${db}_session_rate", $session_rate, ' sessions/s' ], [ "${db}_session_time", $session_time, 'ms' ], [ "${db}_active_time", 
            $active_time, 'ms' ],
            [ "${db}_idle_in_transaction_time", $idle_in_xact_time, 'ms' ],
            [ "${db}_sessions_killed",    $sessions_killd ],
            [ "${db}_sessions_abandoned", $sessions_aband ],
            [ "${db}_sessions_fatal",     $sessions_fatal ]
        );
    }

    return status_ok( $me,
        ["Number of sessions per second for all databases: "
            . sprintf( "%.2f", $instance_session_rate )],
        \@perfdata );
}

=item B<commit_ratio> (all)

Check the commit and rollback rate per second since last call.

This service uses the status file (see C<--status-file> parameter).

Perfdata contains the commit rate, rollback rate, transaction rate and
rollback ratio for each database since last call.

Critical and Warning thresholds are optional. They accept a list of
comma-separated 'label=value'. Available labels are B<rollbacks>,
B<rollback_rate> and B<rollback_ratio>, which will be compared to the number
of rollbacks, the rollback rate and the rollback ratio of each database.
Warning or critical will be raised if the reported value is greater than
B<rollbacks>, B<rollback_rate> or B<rollback_ratio>.

Required privileges: unprivileged role.

=cut

sub check_commit_ratio {
    my @rs;
    my @msg_warn;
    my @msg_crit;
    my @perfdata;
    my @hosts;
    my %xacts;
    my %new_xacts;
    my $global_commits;
    my $global_rollbacks;
    my %warn;
    my %crit;
    my %args = %{ $_[0] };
    my $me   = 'POSTGRES_COMMIT_RATIO';

    my $sql = q{ SELECT floor(extract(EPOCH from now())), s.datname,
            s.xact_commit, s.xact_rollback
        FROM pg_stat_database s
        JOIN pg_database d ON s.datid = d.oid
        WHERE d.datallowconn };

    if ( defined $args{'warning'} ) {
        my $thresholds_re = qr/(rollbacks|rollback_rate|rollback_ratio)\s*=\s*(\d+)/i;

        # warning and critical must be list of status=value
        pod2usage(
            -message => "FATAL: critical and warning thresholds only accept a list of 'label=value' separated by comma.\n" .
"See documentation for more information about accepted labels.", -exitval => 127 ) unless $args{'warning'} =~ m/^$thresholds_re(\s*,\s*$thresholds_re)*$/ and $args{'critical'} =~ m/^$thresholds_re(\s*,\s*$thresholds_re)*$/ ; while ( $args{'warning'} =~ /$thresholds_re/g ) { my ($threshold, $value) = ($1, $2); $warn{$threshold} = $value if $1 and defined $2; } while ( $args{'critical'} =~ /$thresholds_re/g ) { my ($threshold, $value) = ($1, $2); $crit{$threshold} = $value if $1 and defined $2; } } @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "commit_ratio".', -exitval => 127 ) if @hosts != 1; %xacts = %{ load( $hosts[0], 'commit_ratio', $args{'status-file'} ) || {} }; @rs = @{ query( $hosts[0], $sql ) }; $new_xacts{$_->[1]} = { 'ts' => $_->[0], 'commit' => $_->[2], 'rollback' => $_->[3] } foreach @rs; save $hosts[0], 'commit_ratio', \%new_xacts, $args{'status-file'}; return status_ok( $me, ['First call'] ) unless keys %xacts; foreach my $db ( keys %new_xacts ) { my $ratio = 0; my $commits = $new_xacts{$db}{'commit'} - $xacts{$db}{'commit'}; my $rollbacks = $new_xacts{$db}{'rollback'} - $xacts{$db}{'rollback'}; # default to 1 sec if called twice in the same second my $sec = ( $new_xacts{$db}{'ts'} - $xacts{$db}{'ts'} ) || 1; my $commit_rate = $commits / $sec; my $rollback_rate = $rollbacks / $sec; my $xact_rate = ($commits + $rollbacks ) / $sec; $global_commits += $commits; $global_rollbacks += $rollbacks; $ratio = $rollbacks * 100 / ( $commits + $rollbacks ) unless $rollbacks == 0; push @perfdata => ( [ "${db}_commit_rate", sprintf( "%.2f", $commit_rate ), 'tps' ], [ "${db}_rollback_rate", sprintf( "%.2f", $rollback_rate ), 'tps' ], [ "${db}_xact_rate", sprintf( "%.2f", $xact_rate ), 'tps' ], [ "${db}_rollback_ratio", sprintf( "%.2f", $ratio ), '%' ] ); THRESHOLD_LOOP: foreach my $val ( ('rollbacks', 'rollback_rate', 'rollback_ratio') ) { my $prefix = "${db}_${val}"; # Criticals if ( exists $crit{$val} ) { 
if ( $val eq "rollbacks" and $crit{$val} < $rollbacks ) { push @msg_crit => "'$prefix'=$rollbacks"; next THRESHOLD_LOOP; } if ( $val eq "rollback_rate" and $crit{$val} < $rollback_rate ) { push @msg_crit => sprintf "'%s'=%.2ftps", $prefix, $rollback_rate; next THRESHOLD_LOOP; } if ( $val eq "rollback_ratio" and $crit{$val} < $ratio ) { push @msg_crit => sprintf "'%s'=%.2f%%", $prefix, $ratio; next THRESHOLD_LOOP; } } # Warnings if ( exists $warn{$val} ) { if ( $val eq "rollbacks" and $warn{$val} < $rollbacks ) { push @msg_warn => "'$prefix'=$rollbacks"; next THRESHOLD_LOOP; } if ( $val eq "rollback_rate" and $warn{$val} < $rollback_rate ) { push @msg_warn => sprintf("'%s'=%.2ftps", $prefix, $rollback_rate ); next THRESHOLD_LOOP; } if ( $val eq "rollback_ratio" and $warn{$val} < $ratio ) { push @msg_warn => sprintf("'%s'=%.2f%%", $prefix, $ratio ); next THRESHOLD_LOOP; } } } } return status_critical( $me, [ "Commits: $global_commits - Rollbacks: $global_rollbacks", @msg_crit, @msg_warn ], \@perfdata ) if scalar @msg_crit > 0; return status_warning( $me, [ "Commits: $global_commits - Rollbacks: $global_rollbacks", @msg_warn ], \@perfdata ) if scalar @msg_warn > 0; return status_ok( $me, ["Commits: $global_commits - Rollbacks: $global_rollbacks"], \@perfdata ); } =item B (8.0+) Check the most important settings. Warning and Critical thresholds are ignored. Specific parameters are : C<--work_mem>, C<--maintenance_work_mem>, C<--shared_buffers>,C<--wal_buffers>, C<--checkpoint_segments>, C<--effective_cache_size>, C<--no_check_autovacuum>, C<--no_check_fsync>, C<--no_check_enable>, C<--no_check_track_counts>. Required privileges: unprivileged role. =cut sub check_configuration { my @hosts; my @msg_crit; my %args = %{ $_[0] }; my $me = 'POSTGRES_CONFIGURATION'; # This service is based on a probe by Marc Cousin (cousinmarc@gmail.com) # Limit parameters. 
# Have default values
    my $work_mem             = $args{'work_mem'}             || 4096;   # At least 4MB
    my $maintenance_work_mem = $args{'maintenance_work_mem'} || 65536;  # At least 64MB
    my $shared_buffers       = $args{'shared_buffers'}       || 16384;  # At least 128MB
    my $wal_buffers          = $args{'wal_buffers'}          || 64;     # At least 512k. Or -1 for 9.1
    my $checkpoint_segments  = $args{'checkpoint_segments'}  || 10;
    # At least 1GB. No way a modern server has less than 2GB of ram
    my $effective_cache_size = $args{'effective_cache_size'} || 131072;

    # These will be checked to verify they are still the default values (no
    # parameter, for now): autovacuum, fsync, enable*,
    # track_counts/stats_row_level
    my $no_check_autovacuum   = $args{'no_check_autovacuum'}   || 0;
    my $no_check_fsync        = $args{'no_check_fsync'}        || 0;
    my $no_check_enable       = $args{'no_check_enable'}       || 0;
    my $no_check_track_counts = $args{'no_check_track_counts'} || 0;

    my $sql = "SELECT name,setting FROM pg_settings
        WHERE ( ( name='work_mem' and setting::bigint < $work_mem )
            or ( name='maintenance_work_mem' and setting::bigint < $maintenance_work_mem )
            or ( name='shared_buffers' and setting::bigint < $shared_buffers )
            or ( name='wal_buffers' and ( setting::bigint < $wal_buffers or setting = '-1') )
            or ( name='checkpoint_segments' and setting::bigint < $checkpoint_segments )
            or ( name='effective_cache_size' and setting::bigint < $effective_cache_size )
            or ( name='autovacuum' and setting='off' and $no_check_autovacuum = 0)
            or ( name='fsync' and setting='off' and $no_check_fsync=0 )
            or ( name~'^enable.*' and setting='off' and $no_check_enable=0
                and name not in ('enable_partitionwise_aggregate', 'enable_partitionwise_join'))
            or (name='stats_row_level' and setting='off' and $no_check_track_counts=0)
            or (name='track_counts' and setting='off' and $no_check_track_counts=0)
        )";

    # FIXME make one parameter --ignore to rule 'em all.
@hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "configuration".', -exitval => 127 ) if @hosts != 1; is_compat $hosts[0], 'configuration', $PG_VERSION_80 or exit 1; my @rc = @{ query( $hosts[0], $sql ) }; DB_LOOP: foreach my $setting (@rc) { push @msg_crit => ( $setting->[0] . "=" . $setting->[1] ); } # All the entries in $result are an error. If the array isn't empty, we # return ERROR, and the list of errors return status_critical( $me, \@msg_crit ) if ( @msg_crit > 0 ); return status_ok( $me, [ "PostgreSQL configuration ok" ] ); } =item B (all) Perform a simple connection test. No perfdata is returned. This service ignores critical and warning arguments. Required privileges: unprivileged role. =cut sub check_connection { my @rs; my @hosts; my %args = %{ $_[0] }; my $me = 'POSTGRES_CONNECTION'; my $sql = q{SELECT now(), version()}; @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "connection".', -exitval => 127 ) if @hosts != 1; @rs = @{ query( $hosts[0], $sql, undef, undef, \&status_critical ) }; return status_ok( $me, [ "Connection successful at $rs[0][0], on $rs[0][1]" ] ); } =item B (all) Perform the given user query. Specify the query with C<--query>. The first column will be used to perform the test for the status if warning and critical are provided. The warning and critical arguments are optional. They can be of format integer (default), size or time depending on the C<--type> argument. Warning and Critical will be raised if they are greater than the first column, or less if the C<--reverse> option is used. All other columns will be used to generate the perfdata. Each field name is used as the name of the perfdata. The field value must contain your perfdata value and its unit appended to it. You can add as many fields as needed. 
Eg.:

    SELECT pg_database_size('postgres'),
        pg_database_size('postgres')||'B' AS db_size

Required privileges: unprivileged role (depends on the query).

=cut

sub check_custom_query {
    my %args = %{ $_[0] };
    my $me   = 'POSTGRES_CUSTOM_QUERY';
    my $sql  = $args{'query'};
    my $type = $args{'type'} || 'integer';
    my $reverse = $args{'reverse'};
    my $bounded = undef;
    my @rs;
    my @fields;
    my @perfdata;
    my @hosts;
    my @msg_crit;
    my @msg_warn;
    my $c_limit;
    my $w_limit;
    my $perf;
    my $value;

    # FIXME: add warn/crit threshold in perfdata

    # Query must be given
    pod2usage(
        -message => 'FATAL: you must set parameter "--query" with "custom_query" service.',
        -exitval => 127
    ) unless defined $args{'query'};

    # Critical and Warning must be given with --type argument
    pod2usage(
        -message => 'FATAL: you must specify critical and warning thresholds with "--type" parameter.',
        -exitval => 127
    ) unless ( not defined $args{'type'} )
        or ( defined $args{'type'} and $args{'warning'} and $args{'critical'} );

    @hosts = @{ parse_hosts %args };

    pod2usage(
        -message => 'FATAL: you must give only one host with service "custom_query".',
        -exitval => 127
    ) if @hosts != 1;

    # Handle warning and critical type
    if ( $type eq 'size' ) {
        $w_limit = get_size( $args{'warning'} );
        $c_limit = get_size( $args{'critical'} );
    }
    elsif ( $type eq 'time' ) {
        pod2usage(
            -message => "FATAL: critical and warning thresholds only accept an interval with --type time.",
            -exitval => 127
        ) unless ( is_time( $args{'warning'} ) and is_time( $args{'critical'} ) );

        $w_limit = get_time( $args{'warning'} );
        $c_limit = get_time( $args{'critical'} );
    }
    elsif (defined $args{'warning'} ) {
        pod2usage( -message => 'FATAL: given critical and/or warning are not numeric.
Please, set "--type" parameter if needed.', -exitval => 127 ) if $args{'warning'} !~ m/^[0-9.]+$/ or $args{'critical'} !~ m/^[0-9.]+$/; $w_limit = $args{'warning'}; $c_limit = $args{'critical'}; } @rs = @{ query( $hosts[0], $sql, undef, 1 ) }; @fields = @{ shift @rs }; return status_unknown( $me, [ 'No row returned by the query!' ] ) unless defined $rs[0]; pod2usage( -message => 'FATAL: First column of your query is not numeric!', -exitval => 127 ) unless looks_like_number($rs[0][0]); DB_LOOP: foreach my $rec ( @rs ) { $bounded = $rec->[0] unless $bounded; $bounded = $rec->[0] if ( !$reverse and $rec->[0] > $bounded ) or ( $reverse and $rec->[0] < $bounded ); $value = shift( @{$rec} ); shift @fields; foreach my $perf ( @$rec ) { my ( $val, $uom ); $perf =~ m{([0-9.]*)(.*)}; $val = $1 if defined $1; $uom = $2 if defined $2; push @perfdata => [ shift @fields, $val, $uom ]; } if ( ( defined $c_limit ) and ( ( !$reverse and ( $value > $c_limit ) ) or ( $reverse and ( $value < $c_limit ) ) ) ) { push @msg_crit => "value: $value"; next DB_LOOP; } if ( ( defined $w_limit ) and ( ( !$reverse and ( $value > $w_limit ) ) or ( $reverse and ( $value < $w_limit ) ) ) ) { push @msg_warn => "value: $value"; next DB_LOOP; } } return status_critical( $me, [ @msg_crit, @msg_warn ], \@perfdata ) if defined $c_limit and ( ( !$reverse and $bounded > $c_limit) or ( $reverse and $bounded < $c_limit) ); return status_warning( $me, [ @msg_warn ], \@perfdata ) if defined $w_limit and ( ( !$reverse and $bounded > $w_limit) or ( $reverse and $bounded < $w_limit) ); return status_ok( $me, [ "Custom query ok" ], \@perfdata ); } =item B (8.1+) B of database sizes, and B of every databases. This service uses the status file (see C<--status-file> parameter). Perfdata contains the size of each database and their size delta since last call. Critical and Warning thresholds are optional. They are a list of optional 'label=value' separated by a comma. 
These labels allow fine-tuning the alert based on the absolute database
size (C<size>) and/or on its variation (C<delta>).

Eg.:

  -w 'size=500GB' -c 'size=600GB'
  -w 'delta=1%' -c 'delta=10%'
  -w 'size=500GB,delta=1%' -c 'size=600GB,delta=10GB'

The C<size> label accepts either a raw number or a size and checks the
total database size. The C<delta> label accepts either a raw number, a
percentage, or a size. The aim of the delta parameter is to detect
unexpected database size variations. Delta thresholds are compared against
the absolute value of the size variation; delta percentages are computed
against the previous database size. The same label must be provided for
both warning and critical.

For backward compatibility, if a single raw number, percentage or size is
given with no label, it applies to the size difference of each database
since the last execution. Both thresholds below are equivalent:

  -w 'delta=1%' -c 'delta=10%'
  -w '1%' -c '10%'

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

Required privileges: unprivileged role.

=cut

sub check_database_size {
    my @msg_crit;
    my @msg_warn;
    my @rs;
    my @perfdata;
    my @hosts;
    my %new_db_sizes;
    my %old_db_sizes;
    my %warn;
    my %crit;
    my %args       = %{ $_[0] };
    my @dbinclude  = @{ $args{'dbinclude'} };
    my @dbexclude  = @{ $args{'dbexclude'} };
    my $me         = 'POSTGRES_DB_SIZE';
    my $db_checked = 0;
    my $sql = q{SELECT datname, pg_database_size(datname) FROM pg_database};

    # Warning and critical are optional, but they are both required if one is given
    pod2usage(
        -message => "FATAL: you must specify both critical and warning thresholds.",
        -exitval => 127
    ) if ( defined $args{'warning'} and not defined $args{'critical'} )
        or ( not defined $args{'warning'} and defined $args{'critical'} );

    @hosts = @{ parse_hosts %args };

    pod2usage(
        -message => 'FATAL: you must give only one host with service "database_size".',
        -exitval => 127
    ) if @hosts != 1;

    is_compat $hosts[0], 'database_size', $PG_VERSION_81 or exit 1;

    if ( defined $args{'warning'} ) {
        my $thresholds_re = qr/(size|delta)\s*=\s*([^,]+)/i;

        # backward compatibility
        $args{'warning'} = "delta=$args{'warning'}"
            if is_size($args{'warning'})
            or ($args{'warning'} =~ m/^([0-9.]+)%?$/);
        $args{'critical'} = "delta=$args{'critical'}"
            if is_size($args{'critical'})
            or ($args{'critical'} =~ m/^([0-9.]+)%?$/);

        # Sanity check
        pod2usage(
            -message => "FATAL: wrong format for critical and/or warning thresholds.\n"
                . "See documentation for more information.",
            -exitval => 127
        ) unless $args{'warning'} =~ m/^$thresholds_re(\s*,\s*$thresholds_re)*$/
            and $args{'critical'} =~ m/^$thresholds_re(\s*,\s*$thresholds_re)*$/;

        while ( $args{'warning'} =~ /$thresholds_re/g ) {
            my ($threshold, $value) = ($1, $2);
            $warn{$threshold} = $value if $1 and defined $2;
        }

        while ( $args{'critical'} =~ /$thresholds_re/g ) {
            my ($threshold, $value) = ($1, $2);
            $crit{$threshold} = $value if $1 and defined $2;
        }

        # Further sanity checks
        pod2usage(
            -message => "FATAL: Size threshold only accepts a raw number or a size.\n"
                . "See documentation for more information.",
            -exitval => 127
        ) if (defined $warn{'size'} and not is_size($warn{'size'}))
            or (defined $crit{'size'} and not is_size($crit{'size'}));

        pod2usage(
            -message => "FATAL: you must specify both critical and warning thresholds for size.",
            -exitval => 127
        ) if (defined $warn{'size'} and not defined $crit{'size'})
            or (defined $crit{'size'} and not defined $warn{'size'});

        pod2usage( -message => "FATAL: Delta threshold only accepts a raw number, a size or a percentage.\n" .
"See documentation for more information.", -exitval => 127 ) if (defined $warn{'delta'} and not ( is_size($warn{'delta'}) or $warn{'delta'} =~ m/^([0-9.]+)%?$/ )) or (defined $crit{'delta'} and not ( is_size($crit{'delta'}) or $crit{'delta'} =~ m/^([0-9.]+)%?$/ )); pod2usage( -message => "FATAL: you must specify both critical and warning thresholds for delta.", -exitval => 127 ) if (defined $warn{'delta'} and not defined $crit{'delta'}) or (defined $crit{'delta'} and not defined $warn{'delta'}); } # get old size from status file %old_db_sizes = %{ load( $hosts[0], 'db_size', $args{'status-file'} ) || {} }; @rs = @{ query( $hosts[0], $sql ) }; DB_LOOP: foreach my $db (@rs) { my $delta; # $old_db_sizes{ $db->[0] } is the previous DB size # $db->[1] is the new DB size $new_db_sizes{ $db->[0] } = $db->[1]; next DB_LOOP if grep { $db->[0] =~ /$_/ } @dbexclude; next DB_LOOP if @dbinclude and not grep { $db->[0] =~ /$_/ } @dbinclude; $db_checked++; unless ( defined $old_db_sizes{ $db->[0] } ) { push @perfdata => [ $db->[0], $db->[1], 'B' ]; next DB_LOOP; } $delta = $db->[1] - $old_db_sizes{ $db->[0] }; # Must check threshold for each database if ( defined $args{'warning'} ) { my $limit; my $w_limit; my $c_limit; # Check against max db size if ( defined $crit{'size'} ) { $c_limit = get_size( $crit{'size'}, $db->[1] ); push @msg_crit => sprintf( "%s (size: %s)", $db->[0], to_size($db->[1]) ) if $db->[1] >= $c_limit; } if ( defined $warn{'size'} and defined $c_limit and $db->[1] < $c_limit ) { $w_limit = get_size( $warn{'size'}, $db->[1] ); push @msg_warn => sprintf( "%s (size: %s)", $db->[0], to_size($db->[1]) ) if $db->[1] >= $w_limit; } push @perfdata => [ $db->[0], $db->[1], 'B', $w_limit, $c_limit ]; # Check against delta variations (% or absolute values) $c_limit = undef; $w_limit = undef; if ( defined $crit{'delta'} ) { $limit = get_size( $crit{'delta'}, $old_db_sizes{ $db->[0] }); dprint ("DB $db->[0] new size: $db->[1] old size $old_db_sizes{ $db->[0] } (delta 
$delta) critical delta $crit{'delta'} computed limit $limit \n");
                push @msg_crit => sprintf( "%s (delta: %s)", $db->[0], to_size($delta) )
                    if abs($delta) >= $limit;
                $c_limit = "-$limit:$limit";
            }

            if ( defined $warn{'delta'} and defined $c_limit and abs($delta) < $limit ) {
                $limit = get_size( $warn{'delta'}, $old_db_sizes{ $db->[0] } );
                dprint ("DB $db->[0] new size: $db->[1] old size $old_db_sizes{ $db->[0] } (delta $delta) warning delta $warn{'delta'} computed limit $limit \n");
                push @msg_warn => sprintf( "%s (delta: %s)", $db->[0], to_size($delta) )
                    if abs($delta) >= $limit;
                $w_limit = "-$limit:$limit";
            }

            push @perfdata => [ "$db->[0]_delta", $delta, 'B', $w_limit, $c_limit ];
        }
        else {
            push @perfdata => [ $db->[0], $db->[1], 'B' ];
            push @perfdata => [ "$db->[0]_delta", $delta, 'B' ];
        }
    }

    save $hosts[0], 'db_size', \%new_db_sizes, $args{'status-file'};

    return status_critical( $me, [ @msg_crit, @msg_warn ], \@perfdata )
        if scalar @msg_crit > 0;

    return status_warning( $me, \@msg_warn, \@perfdata )
        if scalar @msg_warn > 0;

    return status_ok( $me, [ "$db_checked database(s) checked" ], \@perfdata );
}

=item B<extensions_versions> (9.1+)

Check all extensions installed in all databases (including templates) and
raise a critical alert if the current version is not the default version
available on the instance (according to pg_available_extensions).

Typically, it is used to detect forgotten extension upgrades after package
upgrades or a pg_upgrade.

Perfdata returns the number of outdated extensions in each database.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.
Schemas are ignored, as an extension cannot be installed more than once in
a database.

This service supports multiple C<--exclude> arguments to exclude one or
more extensions from the check. To ignore an extension only in a
particular database, use the 'dbname/extension_name' syntax.
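The 'dbname/extension_name' matching described above can be sketched in
Python (an illustration only; the plugin itself is written in Perl and the
function name is hypothetical):

```python
import re

def is_excluded(db, ext, exclude_patterns):
    """True when the extension matches an --exclude regex, either by
    bare extension name or by the 'dbname/extension_name' form."""
    return any(
        re.search(pat, ext) or re.search(pat, f"{db}/{ext}")
        for pat in exclude_patterns
    )

# 'testdb/postgis' ignores postgis in testdb only
assert is_excluded("testdb", "postgis", ["testdb/postgis"])
assert not is_excluded("proddb", "postgis", ["testdb/postgis"])
# a bare name ignores the extension in every database
assert is_excluded("proddb", "powa", ["powa"])
```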
Examples:

  --dbexclude 'devdb' --exclude 'testdb/postgis' --exclude 'testdb/postgis_topology'
  --dbinclude 'proddb' --dbinclude 'testdb' --exclude 'powa'

Required privileges: unprivileged role able to log in all databases.

=cut

sub check_extensions_versions {
    my @rs;
    my @perfdata;
    my @msg;
    my @longmsg;
    my @hosts;
    my @all_db;
    my $nb;
    my $me           = 'POSTGRES_CHECK_EXT_VERSIONS';
    my %args         = %{ $_[0] };
    my @dbinclude    = @{ $args{'dbinclude'} };
    my @dbexclude    = @{ $args{'dbexclude'} };
    my $tot_outdated = 0;
    my $query = q{SELECT name, default_version, installed_version
        FROM pg_catalog.pg_available_extensions
        WHERE installed_version != default_version};

    @hosts = @{ parse_hosts %args };

    pod2usage(
        -message => 'FATAL: you must give only one host with service "extensions_versions".',
        -exitval => 127
    ) if @hosts != 1;

    is_compat $hosts[0], 'extensions_versions', $PG_VERSION_91 or exit 1;

    @all_db = @{ get_all_dbname( $hosts[0], 'all_dbs' ) };

    # Iterate over all db
    ALLDB_LOOP: foreach my $db (sort @all_db) {
        next ALLDB_LOOP if grep { $db =~ /$_/ } @dbexclude;
        next ALLDB_LOOP if @dbinclude and not grep { $db =~ /$_/ } @dbinclude;

        my $outdated = 0;

        # For each record: extension, default, installed
        @rs = @{ query ( $hosts[0], $query, $db ) };

        REC_LOOP: foreach my $ext (sort @rs) {
            foreach my $exclude_re ( @{ $args{'exclude'} } ) {
                next REC_LOOP if $ext->[0] =~ /$exclude_re/
                    or "$db/$ext->[0]" =~ /$exclude_re/;
            }
            $outdated++;
            push @longmsg, "$db.$ext->[0]: $ext->[2] (should be: $ext->[1])";
        }

        dprint("db $db: $outdated outdated ext\n");
        $tot_outdated += $outdated;
        push @perfdata => [ $db, $outdated, undef, undef, 1, 0 ];
    }

    return status_critical( $me, \@msg, \@perfdata, \@longmsg )
        if $tot_outdated > 0;

    return status_ok( $me, \@msg, \@perfdata, \@longmsg );
}

=item B<hit_ratio> (all)

Check the cache hit ratio on the cluster.

This service uses the status file (see C<--status-file> parameter).

Perfdata returns the cache hit ratio per database.
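The per-database ratio is computed from the deltas of blocks hit and
blocks read between two consecutive runs. A minimal Python sketch of that
arithmetic (illustration only; the plugin itself is Perl):

```python
def hit_ratio(prev_hit, prev_read, cur_hit, cur_read):
    """Cache hit ratio over the interval between two runs, rounded to
    two decimal places; None when no block was accessed in between."""
    hit_delta = cur_hit - prev_hit
    read_delta = cur_read - prev_read
    if hit_delta + read_delta <= 0:
        return None
    ratio = 100.0 * hit_delta / (hit_delta + read_delta)
    # same rounding as the plugin: int(ratio * 100 + 0.5) / 100
    return int(ratio * 100 + 0.5) / 100

assert hit_ratio(0, 0, 9900, 100) == 99.0  # 9900 hits out of 10000 accesses
assert hit_ratio(50, 50, 50, 50) is None   # no activity since last run
```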
Template databases and databases that do not allow connections will not be checked, nor will the databases which have never been accessed. Critical and Warning thresholds are optional. They only accept a percentage. This service supports both C<--dbexclude> and C<--dbinclude> parameters. Required privileges: unprivileged role. =cut sub check_hit_ratio { my @rs; my @perfdata; my @msg_crit; my @msg_warn; my @hosts; my %db_hitratio; my %new_db_hitratio; my %args = %{ $_[0] }; my @dbinclude = @{ $args{'dbinclude'} }; my @dbexclude = @{ $args{'dbexclude'} }; my $me = 'POSTGRES_HIT_RATIO'; my $db_checked = 0; my $sql = q{SELECT d.datname, blks_hit, blks_read FROM pg_stat_database sd JOIN pg_database d ON d.oid = sd.datid WHERE d.datallowconn AND NOT d.datistemplate ORDER BY datname}; # Warning and critical must be %. if ( defined $args{'warning'} and defined $args{'critical'} ) { pod2usage( -message => "FATAL: critical and warning thresholds only accept percentages.", -exitval => 127 ) unless $args{'warning'} =~ m/^([0-9.]+)%$/ and $args{'critical'} =~ m/^([0-9.]+)%$/; $args{'warning'} = substr $args{'warning'}, 0, -1; $args{'critical'} = substr $args{'critical'}, 0, -1; } @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "hit_ratio".', -exitval => 127 ) if @hosts != 1; %db_hitratio = %{ load( $hosts[0], 'db_hitratio', $args{'status-file'} ) || {} }; @rs = @{ query( $hosts[0], $sql ) }; DB_LOOP: foreach my $db (@rs) { my $ratio; my $hit_delta; my $read_delta; my @perfdata_value; $new_db_hitratio{ $db->[0] } = [ $db->[1], $db->[2], 'NaN' ]; next DB_LOOP if grep { $db->[0] =~ /$_/ } @dbexclude; next DB_LOOP if @dbinclude and not grep { $db->[0] =~ /$_/ } @dbinclude; $db_checked++; next DB_LOOP unless defined $db_hitratio{ $db->[0] }; $hit_delta = $new_db_hitratio{ $db->[0] }[0] - $db_hitratio{ $db->[0] }[0]; $read_delta = $new_db_hitratio{ $db->[0] }[1] - $db_hitratio{ $db->[0] }[1]; # Metrics moved since last run if ( 
            $hit_delta + $read_delta > 0 ) {
            $ratio = 100 * $hit_delta / ( $hit_delta + $read_delta );
            # rounding the fractional part to 2 digits
            $ratio = int($ratio*100+0.5)/100;
            $new_db_hitratio{ $db->[0] }[2] = $ratio;
            @perfdata_value = ( $db->[0], $ratio, '%' );
        }
        # Without activity since last run, use previous hit ratio.
        # This should not happen as the query itself hits/reads.
        elsif ( $db->[1] + $db->[2] > 0 ) {
            $ratio = $db_hitratio{ $db->[0] }[2];
            @perfdata_value = ( $db->[0], $ratio, '%' );
        }
        # This database has no reported activity yet
        else {
            $ratio = 'NaN';
            $new_db_hitratio{ $db->[0] }[2] = 'NaN';
            @perfdata_value = ( $db->[0], 'NaN', '%' );
        }

        push @perfdata_value => ( $args{'warning'}, $args{'critical'} )
            if defined $args{'critical'};

        push @perfdata => \@perfdata_value;

        if ( defined $args{'critical'} ) {
            if ( $ratio < $args{'critical'} ) {
                push @msg_crit => sprintf "%s: %s%%", $db->[0], $ratio;
                next DB_LOOP;
            }

            if ( defined $args{'warning'} and $ratio < $args{'warning'} ) {
                push @msg_warn => sprintf "%s: %s%%", $db->[0], $ratio;
            }
        }
    }

    save $hosts[0], 'db_hitratio', \%new_db_hitratio, $args{'status-file'};

    if ( defined $args{'critical'} ) {
        return status_critical( $me, [ @msg_crit, @msg_warn ], \@perfdata )
            if scalar @msg_crit;

        return status_warning( $me, \@msg_warn, \@perfdata )
            if scalar @msg_warn;
    }

    return status_ok( $me, [ "$db_checked database(s) checked" ], \@perfdata );
}

=item B<hot_standby_delta> (9.0)

Check the data delta between a cluster and its hot standbys.

You must give the connection parameters for two or more clusters.

Perfdata returns the data delta in bytes between the master and each hot
standby cluster listed.

Critical and Warning thresholds are optional. They can take one or two
values separated by a comma. If only one value is given, it applies to
both received and replayed data. If two values are given, the first one
applies to received data, the second one to replayed data. These
thresholds only accept a size (eg. 2.5G).
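Each LSN reported by PostgreSQL is an 'X/Y' pair of hexadecimal numbers;
the service converts both positions to absolute byte offsets before
subtracting. A Python sketch of that conversion (illustration only; the
2**32 multiplier is the one used for PostgreSQL 9.3+, 0xFF000000 before):

```python
def lsn_to_bytes(lsn, wal_size=2**32):
    """Convert an 'X/Y' hexadecimal LSN to an absolute byte position."""
    hi, lo = lsn.split("/")
    return wal_size * int(hi, 16) + int(lo, 16)

def standby_delta(master_lsn, standby_lsn):
    """Byte lag of a standby behind the master, floored at zero."""
    return max(lsn_to_bytes(master_lsn) - lsn_to_bytes(standby_lsn), 0)

assert lsn_to_bytes("0/0") == 0
assert standby_delta("1/10", "1/0") == 16  # 0x10 bytes behind
```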
This service raises a Critical if it doesn't find exactly ONE valid master
cluster (i.e. critical when there are zero masters, or two or more).

Required privileges: unprivileged role.

=cut

sub check_hot_standby_delta {
    my @perfdata;
    my @msg;
    my @msg_crit;
    my @msg_warn;
    my $w_limit_received;
    my $c_limit_received;
    my $w_limit_replayed;
    my $c_limit_replayed;
    my @hosts;
    my %args            = %{ $_[0] };
    my $master_location = '';
    my $num_clusters    = 0;
    my $wal_size        = hex('ff000000');
    my $me              = 'POSTGRES_HOT_STANDBY_DELTA';

    # we need to coalesce on pg_last_xlog_receive_location because it
    # returns NULL during WAL Shipping
    my %queries = (
        $PG_VERSION_90 => q{
            SELECT (NOT pg_is_in_recovery())::int,
                CASE pg_is_in_recovery()
                    WHEN 't' THEN coalesce( pg_last_xlog_receive_location(), pg_last_xlog_replay_location() )
                    ELSE pg_current_xlog_location()
                END,
                CASE pg_is_in_recovery()
                    WHEN 't' THEN pg_last_xlog_replay_location()
                    ELSE NULL
                END
        },
        $PG_VERSION_100 => q{
            SELECT (NOT pg_is_in_recovery())::int,
                CASE pg_is_in_recovery()
                    WHEN 't' THEN coalesce( pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn() )
                    ELSE pg_current_wal_lsn()
                END,
                CASE pg_is_in_recovery()
                    WHEN 't' THEN pg_last_wal_replay_lsn()
                    ELSE NULL
                END
        }
    );

    @hosts = @{ parse_hosts %args };

    pod2usage(
        -message => 'FATAL: you must give two or more hosts with service "hot_standby_delta".',
        -exitval => 127
    ) if @hosts < 2;

    foreach my $host ( @hosts ) {
        is_compat $host, 'hot_standby_delta', $PG_VERSION_90 or exit 1;
    }

    # Fetch LSNs
    foreach my $host (@hosts) {
        $host->{'rs'} = \@{ query_ver( $host, %queries )->[0] };
        $num_clusters    += $host->{'rs'}[0];
        $master_location  = $host->{'rs'}[1] if $host->{'rs'}[0];
    }

    # Check that all clusters have the same major version.
    foreach my $host ( @hosts ) {
        return status_critical( $me, ["PostgreSQL major versions differ amongst clusters ($hosts[0]{'version'} vs.
$host->{'version'})."] ) if substr($hosts[0]{'version_num'}, 0, -2) != substr($host->{'version_num'}, 0, -2); } return status_critical( $me, ['No cluster in production.'] ) if $num_clusters == 0; return status_critical( $me, ['More than one cluster in production.'] ) if $num_clusters != 1; if ( defined $args{'critical'} ) { ($w_limit_received, $w_limit_replayed) = split /,/, $args{'warning'}; ($c_limit_received, $c_limit_replayed) = split /,/, $args{'critical'}; if (!defined($w_limit_replayed)) { $w_limit_replayed = $w_limit_received; } if (!defined($c_limit_replayed)) { $c_limit_replayed = $c_limit_received; } $w_limit_received = get_size( $w_limit_received ); $c_limit_received = get_size( $c_limit_received ); $w_limit_replayed = get_size( $w_limit_replayed ); $c_limit_replayed = get_size( $c_limit_replayed ); } $wal_size = 4294967296 if $hosts[0]{'version_num'} >= $PG_VERSION_93; # We recycle this one to count the number of slaves $num_clusters = 0; $master_location =~ m{^([0-9A-F]+)/([0-9A-F]+)$}; $master_location = ( $wal_size * hex($1) ) + hex($2); # Compute deltas foreach my $host (@hosts) { next if $host->{'rs'}[0]; my ($a, $b) = split(/\//, $host->{'rs'}[1]); $host->{'receive_delta'} = $master_location - ( $wal_size * hex($a) ) - hex($b); ($a, $b) = split(/\//, $host->{'rs'}[2]); $host->{'replay_delta'} = $master_location - ( $wal_size * hex($a) ) - hex($b); $host->{'name'} =~ s/ db=.*$//; push @perfdata => ([ "receive delta $host->{'name'}", $host->{'receive_delta'} > 0 ? $host->{'receive_delta'}:0, 'B', $w_limit_received, $c_limit_received ], [ "replay delta $host->{'name'}", $host->{'replay_delta'} > 0 ? $host->{'replay_delta'}:0, 'B', $w_limit_replayed, $c_limit_replayed ]); if ( defined $args{'critical'} ) { if ($host->{'receive_delta'} > $c_limit_received) { push @msg_crit, "critical receive lag: " . to_size($host->{'receive_delta'}) . 
" for $host->{'name'}"; next; }
            if ($host->{'replay_delta'} > $c_limit_replayed) {
                push @msg_crit, "critical replay lag: "
                    . to_size($host->{'replay_delta'}) . " for $host->{'name'}";
                next;
            }
            if ($host->{'receive_delta'} > $w_limit_received) {
                push @msg_warn, "warning receive lag: "
                    . to_size($host->{'receive_delta'}) . " for $host->{'name'}";
                next;
            }
            if ($host->{'replay_delta'} > $w_limit_replayed) {
                push @msg_warn, "warning replay lag: "
                    . to_size($host->{'replay_delta'}) . " for $host->{'name'}";
                next;
            }
        }

        $num_clusters++;
    }

    return status_critical( $me, [ @msg_crit, @msg_warn ], \@perfdata )
        if @msg_crit > 0;

    return status_warning( $me, \@msg_warn, \@perfdata )
        if @msg_warn > 0;

    return status_ok( $me, [ "$num_clusters Hot standby checked" ], \@perfdata );
}

=item B<is_hot_standby> (9.0+)

Checks if the cluster is in recovery and accepts read-only queries.

This service ignores critical and warning arguments.

No perfdata is returned.

Required privileges: unprivileged role.

=cut

sub check_is_hot_standby {
    my @rs;
    my @hosts;
    my %args = %{ $_[0] };
    my $me   = 'POSTGRES_IS_HOT_STANDBY';
    my %queries = ( $PG_VERSION_90 => q{SELECT pg_is_in_recovery()} );

    @hosts = @{ parse_hosts %args };

    pod2usage(
        -message => 'FATAL: you must give only one host with service "is_hot_standby".',
        -exitval => 127
    ) if @hosts != 1;

    is_compat $hosts[0], 'is_hot_standby', $PG_VERSION_90 or exit 1;

    @rs = @{ query_ver( $hosts[0], %queries )->[0] };

    return status_critical( $me, [ "Cluster is not hot standby" ] )
        if $rs[0] eq "f";

    return status_ok( $me, [ "Cluster is hot standby" ] );
}

=item B<is_master> (all)

Checks if the cluster accepts read and/or write queries. This state is
reported as "in production" by pg_controldata.

This service ignores critical and warning arguments.

No perfdata is returned.

Required privileges: unprivileged role.
=cut

sub check_is_master {
    my @rs;
    my @hosts;
    my %args = %{ $_[0] };
    my $me   = 'POSTGRES_IS_MASTER';

    # For PostgreSQL 9.0+, the "pg_is_in_recovery()" function is used; for
    # previous versions the ability to connect is enough.
    my %queries = (
        $PG_VERSION_74 => q{ SELECT false },
        $PG_VERSION_90 => q{ SELECT pg_is_in_recovery() }
    );

    @hosts = @{ parse_hosts %args };

    pod2usage(
        -message => 'FATAL: you must give only one host with service "is_master".',
        -exitval => 127
    ) if @hosts != 1;

    @rs = @{ query_ver( $hosts[0], %queries )->[0] };

    return status_critical( $me, [ "Cluster is not master" ] )
        if $rs[0] eq "t";

    return status_ok( $me, [ "Cluster is master" ] );
}

=item B<invalid_indexes> (8.2+)

Check if there are invalid indexes in a database.

A critical alert is raised if an invalid index is detected.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.
The 'postgres' database and templates are always excluded.

This service supports a C<--exclude REGEX> parameter to exclude indexes
matching a regular expression. The regular expression applies to
"database.schema_name.index_name". This enables you to filter either on a
relation name for all schemas and databases, on a qualified named index
(schema + index) for all databases, or on a qualified named index in only
one database. You can use multiple C<--exclude REGEX> parameters.

Perfdata will return the number of invalid indexes per database.

A list of invalid indexes will be returned after the perfdata. This list
contains the fully qualified index name. If indexes are excluded, the
number of excluded indexes is also returned.

Required privileges: unprivileged role able to log in all databases.
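The C<--exclude> matching against the fully qualified name can be sketched
as follows (a Python illustration with hypothetical names; the plugin
matches the same "database.schema.index" string in Perl):

```python
import re

def keep_index(db, schema, index, exclude_patterns):
    """False when any --exclude regex matches 'database.schema.index'."""
    name = f"{db}.{schema}.{index}"
    return not any(re.search(p, name) for p in exclude_patterns)

# exclude one precise index in one database only
pats = [r"^prod\.public\.idx_old$"]
assert not keep_index("prod", "public", "idx_old", pats)
assert keep_index("test", "public", "idx_old", pats)
# exclude an index name in every schema and database
assert not keep_index("test", "public", "idx_old", [r"\.idx_old$"])
```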
=cut

sub check_invalid_indexes {
    my @perfdata;
    my @longmsg;
    my @rs;
    my @hosts;
    my @all_db;
    my $total_idx   = 0; # num of indexes checked, without excluded ones
    my $total_extbl = 0; # num of excluded indexes
    my $c_count     = 0;
    my %args      = %{ $_[0] };
    my @dbinclude = @{ $args{'dbinclude'} };
    my @dbexclude = @{ $args{'dbexclude'} };
    my $me        = 'POSTGRES_INVALID_INDEXES';
    my $query = q{
        SELECT current_database(), nsp.nspname AS schemaname, cls.relname, idx.indisvalid
        FROM pg_class cls
            join pg_namespace nsp on nsp.oid = cls.relnamespace
            join pg_index idx on idx.indexrelid = cls.oid
        WHERE cls.relkind = 'i'
            AND nsp.nspname not like 'pg_toast%'
            AND nsp.nspname NOT IN ('information_schema', 'pg_catalog');
    };

    @hosts = @{ parse_hosts %args };

    pod2usage(
        -message => 'FATAL: you must give one (and only one) host with service "invalid_indexes".',
        -exitval => 127
    ) if @hosts != 1;

    @all_db = @{ get_all_dbname( $hosts[0] ) };

    # Iterate over all db
    ALLDB_LOOP: foreach my $db ( sort @all_db ) {
        my @rc;
        my $nb_idx      = 0;
        my $idx_invalid = 0;

        next ALLDB_LOOP if grep { $db =~ /$_/ } @dbexclude;
        next ALLDB_LOOP if @dbinclude and not grep { $db =~ /$_/ } @dbinclude;

        @rc = @{ query( $hosts[0], $query, $db ) };

        INVALIDIDX_LOOP: foreach my $invalid (@rc) {
            foreach my $exclude_re ( @{ $args{'exclude'} } ) {
                if ("$invalid->[0].$invalid->[1].$invalid->[2]" =~ m/$exclude_re/) {
                    $total_extbl++;
                    next INVALIDIDX_LOOP;
                }
            }

            if ($invalid->[3] eq "f") {
                # long message info:
                push @longmsg => sprintf "Invalid index = %s.%s.%s ; ",
                    $invalid->[0], $invalid->[1], $invalid->[2];
                $idx_invalid++;
            }

            $nb_idx++;
        }

        $total_idx += $nb_idx;
        $c_count   += $idx_invalid;
        push @perfdata => [ "invalid index in $db", $idx_invalid ];
    }

    push @longmsg => sprintf "%i index(es) excluded from check", $total_extbl
        if $total_extbl > 0;

    # we use the critical count for the **total** number of invalid indexes
    return status_critical( $me, [ "$c_count/$total_idx index(es) invalid" ],
        \@perfdata, \@longmsg )
        if $c_count > 0;

    return status_ok( $me, [
"No invalid index" ], \@perfdata, \@longmsg );
}

=item B<is_replay_paused> (9.1+)

Checks if the replication is paused. The service will return UNKNOWN if
executed on a master server.

Thresholds are optional. They must be specified as intervals. OK will
always be returned if the standby is not paused, even if the replication
lag time hits the thresholds.

Critical or warning are raised if the lag since the last reported replayed
timestamp is greater than the given threshold AND some data received from
the master are not applied yet. OK will also be returned if the standby is
paused but has already replayed everything from the master, until some
write activity happens on the master.

Perfdata returned:

  * paused status (0 no, 1 yes, NaN if master)
  * lag time (in seconds)
  * data delta with master (0 no, 1 yes)

Required privileges: unprivileged role.

=cut

sub check_is_replay_paused {
    my @perfdata;
    my @rs;
    my @hosts;
    my $w_limit = -1;
    my $c_limit = -1;
    my %args = %{ $_[0] };
    my $me   = 'POSTGRES_REPLICATION_PAUSED';
    my %queries = (
        $PG_VERSION_91 => q{
            SELECT pg_is_in_recovery()::int AS is_in_recovery,
                CASE pg_is_in_recovery()
                    WHEN 't' THEN pg_is_xlog_replay_paused()::int
                    ELSE 0::int
                END AS is_paused,
                CASE pg_is_in_recovery()
                    WHEN 't' THEN extract('epoch' FROM now()-pg_last_xact_replay_timestamp())::int
                    ELSE NULL::int
                END AS lag,
                CASE
                    WHEN pg_is_in_recovery()
                        AND pg_last_xlog_replay_location() <> pg_last_xlog_receive_location()
                        THEN 1::int
                    WHEN pg_is_in_recovery() THEN 0::int
                    ELSE NULL
                END AS delta},
        $PG_VERSION_100 => q{
            SELECT pg_is_in_recovery()::int AS is_in_recovery,
                CASE pg_is_in_recovery()
                    WHEN 't' THEN pg_is_wal_replay_paused()::int
                    ELSE 0::int
                END AS is_paused,
                CASE pg_is_in_recovery()
                    WHEN 't' THEN extract('epoch' FROM now()-pg_last_xact_replay_timestamp())::int
                    ELSE NULL::int
                END AS lag,
                CASE
                    WHEN pg_is_in_recovery()
                        AND pg_last_wal_replay_lsn() <> pg_last_wal_receive_lsn()
                        THEN 1::int
                    WHEN pg_is_in_recovery() THEN 0::int
                    ELSE NULL
                END AS delta}
    );

    if ( defined $args{'warning'} and defined
$args{'critical'} ) { # warning and critical must be interval if provided. pod2usage( -message => "FATAL: critical and warning thresholds only accept interval.", -exitval => 127 ) unless ( is_time( $args{'warning'} ) and is_time( $args{'critical'} ) ); $c_limit = get_time $args{'critical'}; $w_limit = get_time $args{'warning'}; } @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "is_replay_paused".', -exitval => 127 ) if @hosts != 1; is_compat $hosts[0], "is_replay_paused", $PG_VERSION_91 or exit 1; @rs = @{ query_ver( $hosts[0], %queries )->[0] }; return status_unknown( $me, [ "Server is not standby." ], [ [ 'is_paused', 'NaN' ], [ 'lag_time', 'NaN', 's' ], [ 'has_data_delta', 'NaN', 's' ] ] ) if not $rs[0]; push @perfdata, [ "is_paused", $rs[1] ]; push @perfdata, [ "lag_time", $rs[2], "s" ]; push @perfdata, [ "has_data_delta", $rs[3] ]; # Always return ok if replay is not paused return status_ok( $me, [ ' replay is not paused' ], \@perfdata ) if not $rs[1]; # Do we have thresholds? if ( $c_limit != -1 ) { return status_critical( $me, [' replay lag time: ' . to_interval( $rs[2] ) ], \@perfdata ) if $rs[3] and $rs[2] > $c_limit; return status_warning( $me, [' replay lag time: ' . to_interval( $rs[2] ) ], \@perfdata ) if $rs[3] and $rs[2] > $w_limit; } return status_ok( $me, [ ' replay is paused.' ], \@perfdata ); } # Agnostic check vacuum or analyze sub # FIXME: we can certainly do better about temp tables sub check_last_maintenance { my $rs; my $c_limit; my $w_limit; my @perfdata; my @msg_crit; my @msg_warn; my @msg; my @hosts; my @all_db; my %counts; my %new_counts; my $dbchecked = 0; my $type = $_[0]; my %args = %{ $_[1] }; my @dbinclude = @{ $args{'dbinclude'} }; my @dbexclude = @{ $args{'dbexclude'} }; my $me = 'POSTGRES_LAST_' . 
uc($type);
    my %queries = (
        # 1st field: oldest known maintenance on a table
        #            -inf if a table never had maintenance
        #            NaN if nothing found
        # 2nd field: total number of maintenances
        # 3rd field: total number of auto-maintenances
        # 4th field: hash(insert||update||delete) to detect write
        #            activity between two runs and avoid useless alerts
        #
        # 8.2 does not have per-database activity stats. We must aggregate
        # from pg_stat_user_tables
        $PG_VERSION_82 => qq{
            SELECT coalesce(max(
                    coalesce(extract(epoch FROM current_timestamp - greatest(last_${type}, last_auto${type})), 'infinity'::float)),
                    'NaN'::float),
                NULL, NULL,
                sum(hashtext(n_tup_ins::text
                    ||n_tup_upd::text
                    ||n_tup_del::text))
            FROM pg_stat_user_tables
            WHERE schemaname NOT LIKE 'pg_temp_%'
        },
        # Starting with 8.3, we can check database activity from
        # pg_stat_database
        $PG_VERSION_83 => qq{
            SELECT coalesce(max(
                    coalesce(extract(epoch FROM current_timestamp - greatest(last_${type}, last_auto${type})), 'infinity'::float)),
                    'NaN'::float),
                NULL, NULL,
                ( SELECT md5(tup_inserted::text||tup_updated::text||tup_deleted::text)
                  FROM pg_catalog.pg_stat_database
                  WHERE datname = current_database() )
            FROM pg_stat_user_tables
            WHERE schemaname NOT LIKE 'pg_temp_%'
                AND schemaname NOT LIKE 'pg_toast_temp_%'
        },
        # Starting with 9.1, we can add the analyze/vacuum counts
        $PG_VERSION_91 => qq{
            SELECT coalesce(max(
                    coalesce(extract(epoch FROM current_timestamp - greatest(last_${type}, last_auto${type})), 'infinity'::float)),
                    'NaN'::float),
                coalesce(sum(${type}_count), 0) AS ${type}_count,
                coalesce(sum(auto${type}_count), 0) AS auto${type}_count,
                ( SELECT md5(tup_inserted::text||tup_updated::text||tup_deleted::text)
                  FROM pg_catalog.pg_stat_database
                  WHERE datname = current_database() )
            FROM pg_stat_user_tables
            WHERE schemaname NOT LIKE 'pg_temp_%'
                AND schemaname NOT LIKE 'pg_toast_temp_%'
        }
    );

    # warning and critical are mandatory.
    pod2usage(
        -message => "FATAL: you must specify critical and warning thresholds.",
        -exitval => 127
    ) unless defined $args{'warning'} and defined $args{'critical'};

    pod2usage(
        -message => "FATAL: critical and warning thresholds only accept an interval.",
        -exitval => 127
    ) unless ( is_time( $args{'warning'} ) and is_time( $args{'critical'} ) );

    $c_limit = get_time $args{'critical'};
    $w_limit = get_time $args{'warning'};

    @hosts = @{ parse_hosts %args };

    pod2usage(
        -message => "FATAL: you must give only one host with service \"last_$type\".",
        -exitval => 127
    ) if @hosts != 1;

    is_compat $hosts[0], "last_$type", $PG_VERSION_82 or exit 1;

    # check required GUCs
    if ($hosts[0]->{'version_num'} < $PG_VERSION_83) {
        is_guc $hosts[0], 'stats_start_collector', 'on' or exit 1;
        is_guc $hosts[0], 'stats_row_level',       'on' or exit 1;
    }
    else {
        is_guc $hosts[0], 'track_counts', 'on' or exit 1;
    }

    @all_db = @{ get_all_dbname( $hosts[0] ) };

    %counts = %{ load( $hosts[0], "${type}_counts", $args{'status-file'} ) || {} };

    LOOP_DB: foreach my $db (@all_db) {
        my @perf;
        my $rs;

        next LOOP_DB if grep { $db =~ /$_/ } @dbexclude;
        next LOOP_DB if @dbinclude and not grep { $db =~ /$_/ } @dbinclude;

        $dbchecked++;

        $rs = query_ver( $hosts[0], %queries, $db )->[0];

        $db =~ s/=//g;

        push @perfdata => [ $db, $rs->[0], 's', $w_limit, $c_limit ];

        $new_counts{$db} = [ $rs->[1], $rs->[2] ];

        if ( exists $counts{$db} ) {
            if ($hosts[0]->{'version_num'} >= $PG_VERSION_91 ) {
                my $delta      = $rs->[1] - $counts{$db}[0];
                my $delta_auto = $rs->[2] - $counts{$db}[1];

                push @perfdata => (
                    [ "$db $type",     $delta      ],
                    [ "$db auto$type", $delta_auto ]
                );
            }

            # avoid alerts if no write activity since last call
            if ( defined $counts{$db}[2] and $counts{$db}[2] eq $rs->[3] ) {
                # keep old hashed status for this database
                $new_counts{$db}[2] = $counts{$db}[2];
                next LOOP_DB;
            }
        }

        if ( $rs->[0] >= $c_limit ) {
            push @msg_crit => "$db: " . to_interval($rs->[0]);
            next LOOP_DB;
        }

        if ( $rs->[0] >= $w_limit ) {
            push @msg_warn => "$db: " .
to_interval($rs->[0]);
            next LOOP_DB;
        }

        # if everything is OK, save the current hashed status for this database
        $new_counts{$db}[2] = $rs->[3];
    }

    save $hosts[0], "${type}_counts", \%new_counts, $args{'status-file'};

    return status_critical( $me, [ @msg_crit, @msg_warn ], \@perfdata )
        if scalar @msg_crit > 0;

    return status_warning( $me, \@msg_warn, \@perfdata )
        if scalar @msg_warn > 0;

    return status_ok( $me, [ "$dbchecked database(s) checked" ], \@perfdata );
}

=item B<last_analyze> (8.2+)

Check on each database that the oldest analyze (from autovacuum or not) is
not older than the given threshold.

This service uses the status file (see C<--status-file> parameter) with
PostgreSQL 9.1+.

Perfdata returns the oldest analyze per database, in seconds. With
PostgreSQL 9.1+, the number of [auto]analyses per database since the last
call is also returned.

Critical and Warning thresholds only accept an interval (eg. 1h30m25s)
and apply to the oldest execution of analyze. Tables that were never
analyzed, or whose analyze date was lost due to a crash, will raise a
critical alert.

B<Note>: this service does not raise alerts if the database had strictly
no writes since the last call. As a consequence, a read-only database can
have its oldest analyze reported in perfdata well past your thresholds
without raising any alerts.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.
The 'postgres' database and templates are always excluded.

Required privileges: unprivileged role able to log in all databases.

=cut

sub check_last_analyze {
    return check_last_maintenance( 'analyze', @_ );
}

=item B<last_vacuum> (8.2+)

Check that the oldest vacuum (from autovacuum or otherwise) in each
database in the cluster is not older than the given threshold.

This service uses the status file (see C<--status-file> parameter) with
PostgreSQL 9.1+.

Perfdata returns the oldest vacuum per database, in seconds. With
PostgreSQL 9.1+, it also returns the number of [auto]vacuums per database
since the last execution.
Critical and Warning thresholds only accept an interval (eg. 1h30m25s) and apply to the oldest vacuum. Tables that were never vacuumed, or whose vacuum date was lost due to a crash, will raise a critical alert.

B<Note>: this service does not raise alerts if the database had strictly no writes since the last call. In consequence, a read-only database can have its oldest vacuum reported in perfdata way after your thresholds, but not raise any alerts.

This service supports both C<--dbexclude> and C<--dbinclude> parameters. The 'postgres' database and templates are always excluded.

Required privileges: unprivileged role able to log in all databases.

=cut

sub check_last_vacuum { return check_last_maintenance( 'vacuum', @_ ); }

=item B<locks> (all)

Check the number of locks on the hosts.

Perfdata returns the number of locks, by type.

Critical and Warning thresholds accept either a raw number of locks or a percentage. For a percentage, the limit is computed using:

  for 7.4 to 8.1:
    max_locks_per_transaction * max_connections

  for 8.2+:
    max_locks_per_transaction * (max_connections + max_prepared_transactions)

  for 9.1+, regarding lockmode:
    max_locks_per_transaction * (max_connections + max_prepared_transactions)
    or
    max_pred_locks_per_transaction * (max_connections + max_prepared_transactions)

Required privileges: unprivileged role.
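As an illustration of the percentage-based limit described above, the sketch below (Python, purely illustrative; the plugin itself is Perl, and the GUC values used are made-up examples) computes the 8.2+ lock-table capacity and scales it by a threshold:

```python
def lock_limit(max_locks_per_transaction, max_connections,
               max_prepared_transactions=0, pct=100):
    """Scale the 8.2+ lock table capacity by a percentage threshold,
    following the formula documented above."""
    capacity = max_locks_per_transaction * (
        max_connections + max_prepared_transactions)
    return int(pct * capacity / 100)

# Example: default-ish settings (64 locks/xact, 100 connections)
# with an 80% warning threshold.
print(lock_limit(64, 100, 0, 80))   # 5120 locks
```

With `max_prepared_transactions` at 0, the 8.2+ formula degenerates to the pre-8.2 one, which is why only the newer formula is sketched.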
=cut sub check_locks { my @rs; my @perfdata; my @msg; my @hosts; my %args = %{ $_[0] }; my $total_locks = 0; my $total_pred_locks = 0; my $waiting_locks = 0; my $me = 'POSTGRES_LOCKS'; my %queries = ( $PG_VERSION_74 => q{ SELECT count(l.granted), ref.mode, current_setting('max_locks_per_transaction')::integer * current_setting('max_connections')::integer, 0, ref.granted FROM ( SELECT 'AccessShareLock', 't'::boolean UNION SELECT 'RowShareLock', 't' UNION SELECT 'RowExclusiveLock', 't' UNION SELECT 'ShareUpdateExclusiveLock', 't' UNION SELECT 'ShareLock', 't' UNION SELECT 'ShareRowExclusiveLock', 't' UNION SELECT 'ExclusiveLock', 't' UNION SELECT 'AccessExclusiveLock', 't' UNION SELECT 'AccessShareLock', 'f' UNION SELECT 'RowShareLock', 'f' UNION SELECT 'RowExclusiveLock', 'f' UNION SELECT 'ShareUpdateExclusiveLock', 'f' UNION SELECT 'ShareLock', 'f' UNION SELECT 'ShareRowExclusiveLock', 'f' UNION SELECT 'ExclusiveLock', 'f' UNION SELECT 'AccessExclusiveLock', 'f' ) ref (mode, granted) LEFT JOIN pg_locks l ON (ref.mode, ref.granted) = (l.mode, l.granted) GROUP BY 2,3,4,5 ORDER BY ref.granted, ref.mode }, $PG_VERSION_82 => q{ SELECT count(l.granted), ref.mode, current_setting('max_locks_per_transaction')::integer * ( current_setting('max_prepared_transactions')::integer + current_setting('max_connections')::integer), 0, ref.granted FROM (SELECT * FROM ( VALUES ('AccessShareLock', 't'::boolean), ('RowShareLock', 't'), ('RowExclusiveLock', 't'), ('ShareUpdateExclusiveLock', 't'), ('ShareLock', 't'), ('ShareRowExclusiveLock', 't'), ('ExclusiveLock', 't'), ('AccessExclusiveLock', 't'), ('AccessShareLock', 'f'), ('RowShareLock', 'f'), ('RowExclusiveLock', 'f'), ('ShareUpdateExclusiveLock', 'f'), ('ShareLock', 'f'), ('ShareRowExclusiveLock', 'f'), ('ExclusiveLock', 'f'), ('AccessExclusiveLock', 'f') ) lockmode (mode, granted) ) ref LEFT JOIN pg_locks l ON (ref.mode, ref.granted) = (l.mode, l.granted) GROUP BY 2,3,4,5 ORDER BY ref.granted, ref.mode }, $PG_VERSION_91 => q{ 
SELECT count(l.granted), ref.mode, current_setting('max_locks_per_transaction')::integer * ( current_setting('max_prepared_transactions')::integer + current_setting('max_connections')::integer), current_setting('max_pred_locks_per_transaction')::integer * ( current_setting('max_prepared_transactions')::integer + current_setting('max_connections')::integer), ref.granted FROM (SELECT * FROM ( VALUES ('AccessShareLock', 't'::boolean), ('RowShareLock', 't'), ('RowExclusiveLock', 't'), ('ShareUpdateExclusiveLock', 't'), ('ShareLock', 't'), ('ShareRowExclusiveLock', 't'), ('ExclusiveLock', 't'), ('AccessExclusiveLock', 't'), ('AccessShareLock', 'f'), ('RowShareLock', 'f'), ('RowExclusiveLock', 'f'), ('ShareUpdateExclusiveLock', 'f'), ('ShareLock', 'f'), ('ShareRowExclusiveLock', 'f'), ('ExclusiveLock', 'f'), ('AccessExclusiveLock', 'f'), ('SIReadLock', 't') ) lockmode (mode, granted) ) ref LEFT JOIN pg_locks l ON (ref.mode, ref.granted) = (l.mode, l.granted) GROUP BY 2,3,4,5 ORDER BY ref.granted, ref.mode } ); # warning and critical are mandatory. pod2usage( -message => "FATAL: you must specify critical and warning thresholds.", -exitval => 127 ) unless defined $args{'warning'} and defined $args{'critical'} ; # warning and critical must be raw or %. 
pod2usage( -message => "FATAL: critical and warning thresholds only accept raw numbers or %.", -exitval => 127 ) unless $args{'warning'} =~ m/^([0-9.]+)%?$/ and $args{'critical'} =~ m/^([0-9.]+)%?$/; @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "locks".', -exitval => 127 ) if @hosts != 1; @rs = @{ query_ver $hosts[0], %queries }; $args{'predcritical'} = $args{'critical'}; $args{'predwarning'} = $args{'warning'}; $args{'critical'} = int($1 * $rs[0][2]/100) if $args{'critical'} =~ /^([0-9.]+)%$/; $args{'warning'} = int($1 * $rs[0][2]/100) if $args{'warning'} =~ /^([0-9.]+)%$/; $args{'predcritical'} = int($1 * $rs[0][3]/100) if $args{'predcritical'} =~ /^([0-9.]+)%$/; $args{'predwarning'} = int($1 * $rs[0][3]/100) if $args{'predwarning'} =~ /^([0-9.]+)%$/; map { $total_locks += $_->[0] if $_->[1] ne 'SIReadLock'; $total_pred_locks += $_->[0] if $_->[1] eq 'SIReadLock'; if ($_->[4] eq 't') { if ($_->[1] ne 'SIReadLock') { push @perfdata => [ $_->[1], $_->[0], undef, $args{'warning'}, $args{'critical'} ]; } else { push @perfdata => [ $_->[1], $_->[0], undef, $args{'predwarning'}, $args{'predcritical'} ]; } } else { $waiting_locks += $_->[0]; push @perfdata => [ "Waiting $_->[1]", $_->[0], undef, $args{'warning'}, $args{'critical'} ]; } } @rs; push @msg => "$total_locks locks, $total_pred_locks predicate locks, $waiting_locks waiting locks"; return status_critical( $me, \@msg, \@perfdata ) if $total_locks >= $args{'critical'} or ( $hosts[0]->{'version_num'} >= $PG_VERSION_91 and $total_pred_locks >= $args{'predcritical'} ); return status_warning( $me, \@msg, \@perfdata ) if $total_locks >= $args{'warning'} or ( $hosts[0]->{'version_num'} >= $PG_VERSION_91 and $total_pred_locks >= $args{'predwarning'}); return status_ok( $me, \@msg, \@perfdata ); } =item B (all) Check the longest running query in the cluster. Perfdata contains the max/avg/min running time and the number of queries per database. 
Critical and Warning thresholds only accept an interval.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

It also supports argument C<--exclude REGEX> to exclude queries matching the given regular expression from the check. Above 9.0, it also supports C<--exclude REGEX> to filter out application_name. You can use multiple C<--exclude REGEX> parameters.

Required privileges: an unprivileged role only checks its own queries; a pg_monitor (10+) or superuser (<10) role is required to check all queries.

=cut

sub check_longest_query { my @rs; my @perfdata; my @msg; my @hosts; my $c_limit; my $w_limit; my %args = %{ $_[0] }; my @dbinclude = @{ $args{'dbinclude'} }; my @dbexclude = @{ $args{'dbexclude'} }; my $me = 'POSTGRES_LONGEST_QUERY'; my $longest_query = 0; my $nb_query = 0; my %stats = (); my %queries = ( $PG_VERSION_74 => q{SELECT d.datname, COALESCE(elapsed, -1), COALESCE(query, '') FROM pg_database AS d LEFT JOIN ( SELECT datname, current_query AS query, extract('epoch' FROM date_trunc('second', current_timestamp-query_start) ) AS elapsed FROM pg_stat_activity WHERE current_query NOT LIKE '<IDLE>%' ) AS s ON (d.datname=s.datname) WHERE d.datallowconn }, $PG_VERSION_90 => q{SELECT d.datname, COALESCE(elapsed, -1), COALESCE(query, ''), application_name FROM pg_database AS d LEFT JOIN ( SELECT datname, current_query AS query, extract('epoch' FROM date_trunc('second', current_timestamp-query_start) ) AS elapsed, application_name FROM pg_stat_activity WHERE current_query NOT LIKE '<IDLE>%' ) AS s ON (d.datname=s.datname) WHERE d.datallowconn }, $PG_VERSION_92 => q{SELECT d.datname, COALESCE(elapsed, 0), COALESCE(query, ''), application_name FROM pg_database AS d LEFT JOIN ( SELECT datname, query, extract('epoch' FROM date_trunc('second', current_timestamp-state_change) ) AS elapsed, application_name FROM pg_stat_activity WHERE state = 'active' ) AS s ON (d.datname=s.datname) WHERE d.datallowconn } ); # Warning and critical are mandatory.
pod2usage( -message => "FATAL: you must specify critical and warning thresholds.", -exitval => 127 ) unless defined $args{'warning'} and defined $args{'critical'} ; pod2usage( -message => "FATAL: critical and warning thresholds only accept an interval.", -exitval => 127 ) unless ( is_time( $args{'warning'} ) and is_time( $args{'critical'} ) ); $c_limit = get_time( $args{'critical'} ); $w_limit = get_time( $args{'warning'} ); @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "longest_query".', -exitval => 127 ) if @hosts != 1; @rs = @{ query_ver( $hosts[0], %queries ) }; REC_LOOP: foreach my $r (@rs) { # exclude/include on db name next REC_LOOP if grep { $r->[0] =~ /$_/ } @dbexclude; next REC_LOOP if @dbinclude and not grep { $r->[0] =~ /$_/ } @dbinclude; # exclude on query text foreach my $exclude_re ( @{ $args{'exclude'} } ) { next REC_LOOP if $r->[2] =~ /$exclude_re/; next REC_LOOP if defined($r->[3]) && $r->[3] =~ /$exclude_re/; } $stats{$r->[0]} = { 'num' => 0, 'max' => -1, 'avg' => 0, } unless exists $stats{$r->[0]}; next REC_LOOP unless $r->[2] ne ''; $longest_query = $r->[1] if $r->[1] > $longest_query; $nb_query++; $stats{$r->[0]}{'num'}++; $stats{$r->[0]}{'max'} = $r->[1] if $stats{$r->[0]}{'max'} < $r->[1]; $stats{$r->[0]}{'avg'} = ( $stats{$r->[0]}{'avg'} * ($stats{$r->[0]}{'num'} -1) + $r->[1]) / $stats{$r->[0]}{'num'}; } DB_LOOP: foreach my $db (keys %stats) { unless($stats{$db}{'max'} > -1) { $stats{$db}{'max'} = 'NaN'; $stats{$db}{'avg'} = 'NaN'; } push @perfdata, ( [ "$db max", $stats{$db}{'max'}, 's', $w_limit, $c_limit ], [ "$db avg", $stats{$db}{'avg'}, 's', $w_limit, $c_limit ], [ "$db #queries", $stats{$db}{'num'} ] ); if ( $stats{$db}{'max'} > $c_limit ) { push @msg => "$db: ". to_interval($stats{$db}{'max'}); next DB_LOOP; } if ( $stats{$db}{'max'} > $w_limit ) { push @msg => "$db: ".
to_interval($stats{$db}{'max'}); } } return status_critical( $me, \@msg, \@perfdata ) if $longest_query > $c_limit; return status_warning( $me, \@msg, \@perfdata ) if $longest_query > $w_limit; return status_ok( $me, [ "$nb_query running query(ies)" ], \@perfdata ); }

=item B<max_freeze_age> (all)

Checks the oldest database by transaction age.

Critical and Warning thresholds are optional. They accept either a raw number or, for PostgreSQL 8.2 and later, a percentage. If a percentage is given, the thresholds are computed based on the "autovacuum_freeze_max_age" parameter. 100% means that some table(s) reached the maximum age and will trigger an autovacuum freeze. Percentage thresholds should therefore be greater than 100%.

Even with no threshold, this service will raise a critical alert if a database has a negative age.

Perfdata returns the age of each database.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

Required privileges: unprivileged role.

=cut

sub check_max_freeze_age { my @rs; my @perfdata; my @msg; my @msg_crit; my @msg_warn; my @hosts; my $c_limit; my $w_limit; my $oldestdb; my $oldestage = -1; my %args = %{ $_[0] }; my @dbinclude = @{ $args{'dbinclude'} }; my @dbexclude = @{ $args{'dbexclude'} }; my $me = 'POSTGRES_MAX_FREEZE_AGE'; my %queries = ( $PG_VERSION_74 => q{SELECT datname, age(datfrozenxid) FROM pg_database WHERE datname <> 'template0' }, $PG_VERSION_82 => q{SELECT datname, age(datfrozenxid), current_setting('autovacuum_freeze_max_age') FROM pg_database WHERE datname <> 'template0' } ); @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "max_freeze_age".', -exitval => 127 ) if @hosts != 1; # warning and critical must be raw or %.
if ( defined $args{'warning'} and defined $args{'critical'} ) { # warning and critical must be raw pod2usage( -message => "FATAL: critical and warning thresholds only accept raw numbers or % (for 8.2+).", -exitval => 127 ) unless $args{'warning'} =~ m/^([0-9]+)%?$/ and $args{'critical'} =~ m/^([0-9]+)%?$/; $w_limit = $args{'warning'}; $c_limit = $args{'critical'}; set_pgversion($hosts[0]); pod2usage( -message => "FATAL: only raw thresholds are compatible with PostgreSQL 8.1 and below.", -exitval => 127 ) if $hosts[0]->{'version_num'} < $PG_VERSION_82 and ($args{'warning'} =~ m/%\s*$/ or $args{'critical'} =~ m/%\s*$/); } @rs = @{ query_ver( $hosts[0], %queries ) }; if ( scalar @rs and defined $args{'critical'} ) { $c_limit = int($1 * $rs[0][2]/100) if $args{'critical'} =~ /^([0-9.]+)%$/; $w_limit = int($1 * $rs[0][2]/100) if $args{'warning'} =~ /^([0-9.]+)%$/; } REC_LOOP: foreach my $r (@rs) { my @perf; next REC_LOOP if grep { $r->[0] =~ /$_/ } @dbexclude; next REC_LOOP if @dbinclude and not grep { $r->[0] =~ /$_/ } @dbinclude; if ($oldestage < $r->[1]) { $oldestdb = $r->[0]; $oldestage = $r->[1]; } @perf = ( $r->[0], $r->[1] ); push @perf => ( undef, $w_limit, $c_limit ) if defined $c_limit; push @perfdata => [ @perf ]; if ( $r->[1] < 0 ) { push @msg_crit => "$r->[0] has a negative age" ; next REC_LOOP; } if ( defined $c_limit ) { if ( $r->[1] > $c_limit ) { push @msg_crit => "$r->[0]"; next REC_LOOP; } push @msg_warn => "$r->[0]" if defined $w_limit and $r->[1] > $w_limit; } } return status_critical( $me, [ 'Critical: '. join(',', @msg_crit) . (scalar @msg_warn? ' Warning: '. join(',', @msg_warn):'') ], \@perfdata ) if scalar @msg_crit; return status_warning( $me, [ 'Warning: '. join(',', @msg_warn) ], \@perfdata ) if scalar @msg_warn; return status_ok( $me, [ "oldest database is $oldestdb with age of $oldestage" ], \@perfdata ); }

=item B<minor_version> (all)

Check if the cluster is running the most recent minor version of PostgreSQL.
Latest versions of PostgreSQL can be fetched from the PostgreSQL official website if check_pgactivity has access to it, or must be given as a parameter.

Without C<--critical> or C<--warning> parameters, this service attempts to fetch the latest version numbers online. A critical alert is raised if the minor version is not the most recent.

You can optionally set the path to your preferred retrieval tool using the C<--path> parameter (eg. C<--path '/usr/bin/wget'>). Supported programs are: GET, wget, curl, fetch, lynx, links, links2.

If you do not want to (or cannot) query the PostgreSQL website, provide the expected versions using either C<--warning> OR C<--critical>, depending on which return value you want to raise. The given string must contain one or more MINOR versions separated by anything but a '.'. For instance, the following parameters are all equivalent:

  --critical "10.1 9.6.6 9.5.10 9.4.15 9.3.20 9.2.24 9.1.24 9.0.23 8.4.22"
  --critical "10.1, 9.6.6, 9.5.10, 9.4.15, 9.3.20, 9.2.24, 9.1.24, 9.0.23, 8.4.22"
  --critical "10.1,9.6.6,9.5.10,9.4.15,9.3.20,9.2.24,9.1.24,9.0.23,8.4.22"
  --critical "10.1/9.6.6/9.5.10/9.4.15/9.3.20/9.2.24/9.1.24/9.0.23/8.4.22"

Any other value than 3 numbers separated by dots (before version 10.x) or 2 numbers separated by dots (version 10 and above) will be ignored. If the running PostgreSQL major version is not found, the service raises an unknown status.

Perfdata returns the numerical version of PostgreSQL.

Required privileges: unprivileged role; access to http://www.postgresql.org required to download version numbers.
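For reference, the numeric version encoding this service relies on (e.g. 9.6.5 gives 90605, 10.1 gives 100001) can be sketched as follows (Python, for illustration only; the plugin computes this in Perl):

```python
def version_num(major, minor, patch=None):
    """Encode a PostgreSQL version the way server_version_num does:
    three-part versions before 10, two-part versions from 10 on."""
    if patch is not None:               # pre-10 scheme, e.g. 9.6.5
        return major * 10000 + minor * 100 + patch
    return major * 10000 + minor        # 10+ scheme, e.g. 10.1

print(version_num(9, 6, 5))   # 90605
print(version_num(10, 1))     # 100001
print(version_num(11, 0))     # 110000
```

This also shows why pre-10 and 10+ version strings are parsed by distinct regular expressions in the code below: the "major" part is two numbers before 10 and a single number afterwards.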
=cut

sub check_minor_version { my @perfdata; my @msg; my %latest_versions; my $rss; my @hosts; my $major_version; my %args = %{ $_[0] }; my $me = 'POSTGRES_MINOR_VERSION'; my $timeout = get_time($args{'timeout'}); my $url = 'http://www.postgresql.org/versions.rss'; @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "minor_version".', -exitval => 127 ) if @hosts != 1; set_pgversion($hosts[0]); if (not defined $args{'warning'} and not defined $args{'critical'} ) { # These methods come from check_postgres, # by Greg Sabino Mullane, # licensed under BSD our %get_methods = ( 'GET' => "GET -t $timeout -H 'Pragma: no-cache' $url", 'wget' => "wget --quiet --timeout=$timeout --no-cache -O - $url", 'curl' => "curl --silent --location --max-time $timeout -H 'Pragma: no-cache' $url", 'fetch' => "fetch -q -T $timeout -o - $url", 'lynx' => "lynx --connect-timeout=$timeout --dump $url", 'links' => "links -dump $url", 'links2' => "links2 -dump $url" ); # Force the fetching method if ($args{'path'}) { my $meth = basename $args{'path'}; pod2usage( -message => "FATAL: \"$args{'path'}\" is not a valid program.", -exitval => 127 ) unless -x $args{'path'}; pod2usage( -message => "FATAL: \"$args{'path'}\" is not a supported program.", -exitval => 127 ) unless $meth =~ 'GET|wget|curl|fetch|lynx|links|links2'; # fetch the latest versions via $path $rss = qx{$get_methods{$meth} 2>/dev/null}; } else { # Fetch the latest versions foreach my $exe (values %get_methods) { $rss = qx{$exe 2>/dev/null}; last if $rss =~ 'PostgreSQL latest versions'; } } return status_unknown( $me, [ 'Could not fetch PostgreSQL latest versions' ] ) unless $rss; # Versions until 9.6 $latest_versions{"$1.$2"} = [$1 * 10000 + $2 * 100 + $3, "$1.$2.$3"] while ($rss =~ m/<title>(\d+)\.(\d+)\.(\d+)/g && $1<10); # Versions from 10 $latest_versions{"$1"} = [$1 * 10000 + $2, "$1.$2"] while ($rss =~ m/<title>(\d+)\.(\d+)/g && $1>=10); } else { pod2usage( -message => 'FATAL: you
must provide a warning OR a critical threshold for service minor_version!', -exitval => 127 ) if defined $args{'critical'} and defined $args{'warning'}; my $given_version = defined $args{'critical'} ? $args{'critical'} : $args{'warning'}; while ( $given_version =~ m/(\d+)\.(\d+)\.(\d*)/g ) { $latest_versions{"$1.$2"} = [$1 * 10000 + $2 * 100 + $3, "$1.$2.$3"] if $1<10 ; # v9.6.5=90605 } while ( $given_version =~ m/(\d+)\.(\d+)/g ) { $latest_versions{"$1"} = [$1 * 10000 + $2, "$1.$2"] if $1>=10 ; # v10.1 = 100001, v11.0=110000 } } if ( $hosts[0]{'version_num'} < 100000 ) { #eg 90605 for 9.6.5 -> major is 9.6 $hosts[0]{'version'} =~ '^(\d+\.\d+).*$'; $major_version = $1; } else { # eg 100001 for 10.1 -> major is 10 $major_version = int($hosts[0]{'version_num'}/10000) ; } dprint ("major version: $major_version\n"); unless ( defined $latest_versions{$major_version} ) { push @msg => "Unknown major PostgreSQL version $major_version"; return status_unknown( $me, \@msg ); } push @perfdata => [ 'version', $hosts[0]{'version_num'}, 'PGNUMVER' ]; if ( $hosts[0]{'version_num'} != $latest_versions{$major_version}[0] ) { push @msg => "PostgreSQL version ". $hosts[0]{'version'} ." (should be $latest_versions{$major_version}[1])"; return status_warning( $me, \@msg, \@perfdata ) if defined $args{'warning'}; return status_critical( $me, \@msg, \@perfdata ); } push @msg => "PostgreSQL version ". $hosts[0]{'version'}; return status_ok( $me, \@msg, \@perfdata ); }

=item B<oldest_2pc> (8.1+)

Check the oldest I<two-phase commit transaction> (aka. prepared transaction) in the cluster.

Perfdata contains the max/avg age and the number of prepared transactions per database.

Critical and Warning thresholds only accept an interval.

Required privileges: unprivileged role.
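Interval thresholds such as '1h30m25s' are turned into seconds by the script's get_time helper. A minimal Python equivalent (an illustrative sketch under simplified assumptions; the real Perl helper may accept more notations) could look like:

```python
import re

def interval_to_seconds(spec):
    """Convert an interval like '1h30m25s' to seconds.
    Units handled here: d, h, m, s; a bare number counts as seconds."""
    units = {'d': 86400, 'h': 3600, 'm': 60, 's': 1}
    total = 0
    for value, unit in re.findall(r'(\d+)\s*([dhms]?)', spec):
        total += int(value) * units.get(unit, 1)
    return total

print(interval_to_seconds('1h30m25s'))  # 5425
```

So a threshold of '1h30m25s' is compared against ages expressed in seconds, which is also the unit used in the perfdata.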
=cut

sub check_oldest_2pc { my @rs; my @perfdata; my @msg; my @hosts; my $c_limit; my $w_limit; my %args = %{ $_[0] }; my $me = 'POSTGRES_OLDEST_2PC'; my $oldest_2pc = 0; my $nb_2pc = 0; my %stats = (); my $query = q{SELECT transaction, gid, coalesce(extract('epoch' FROM date_trunc('second', current_timestamp-prepared) ), -1), owner, d.datname FROM pg_database AS d LEFT JOIN pg_prepared_xacts AS x ON d.datname=x.database WHERE d.datallowconn }; # Warning and critical are mandatory. pod2usage( -message => "FATAL: you must specify critical and warning thresholds.", -exitval => 127 ) unless defined $args{'warning'} and defined $args{'critical'} ; pod2usage( -message => "FATAL: critical and warning thresholds only accept an interval.", -exitval => 127 ) unless ( is_time( $args{'warning'} ) and is_time( $args{'critical'} ) ); $c_limit = get_time $args{'critical'}; $w_limit = get_time $args{'warning'}; @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "oldest_2pc".', -exitval => 127 ) if @hosts != 1; is_compat $hosts[0], 'postgres_oldest_2pc', $PG_VERSION_81 or exit 1; @rs = @{ query( $hosts[0], $query ) }; REC_LOOP: foreach my $r (@rs) { $stats{$r->[4]} = { 'num' => 0, 'max' => -1, 'avg' => 0, } unless exists $stats{$r->[4]}; $oldest_2pc = $r->[2] if $r->[2] > $oldest_2pc; $stats{$r->[4]}{'num'}++ if $r->[0]; $stats{$r->[4]}{'max'} = $r->[2] if $stats{$r->[4]}{'max'} < $r->[2]; $stats{$r->[4]}{'avg'} = ( $stats{$r->[4]}{'avg'} * ($stats{$r->[4]}{'num'} -1) + $r->[2]) / $stats{$r->[4]}{'num'} if $stats{$r->[4]}{'num'}; } DB_LOOP: foreach my $db (sort keys %stats) { $nb_2pc += $stats{$db}{'num'}; unless($stats{$db}{'max'} > -1) { $stats{$db}{'max'} = 'NaN'; $stats{$db}{'avg'} = 'NaN'; } push @perfdata, ( [ "$db max", $stats{$db}{'max'}, 's', $w_limit, $c_limit ], [ "$db avg", $stats{$db}{'avg'}, 's', $w_limit, $c_limit ], [ "$db # prep. xact", $stats{$db}{'num'} ] ); if ( $stats{$db}{'max'} > $c_limit ) { push @msg => "oldest 2pc on $db: ".
to_interval($stats{$db}{'max'}); next DB_LOOP; } if ( $stats{$db}{'max'} > $w_limit ) { push @msg => "oldest 2pc on $db: ". to_interval($stats{$db}{'max'}); } } unshift @msg => "$nb_2pc prepared transaction(s)"; return status_critical( $me, \@msg, \@perfdata ) if $oldest_2pc > $c_limit; return status_warning( $me, \@msg, \@perfdata ) if $oldest_2pc > $w_limit; return status_ok( $me, \@msg, \@perfdata ); }

=item B<oldest_idlexact> (8.3+)

Check the oldest I<idle> transaction.

Perfdata contains the max/avg age and the number of idle transactions per database.

Critical and Warning thresholds only accept an interval.

This service supports both C<--dbexclude> and C<--dbinclude> parameters. Above 9.2, it supports C<--exclude> to filter out connections by application name. Eg., to filter out pg_dump and pg_dumpall, set this to 'pg_dump,pg_dumpall'.

Before 9.2, this service detects idle transactions using their start time, so it can mistakenly report transactions that are only transiently idle. From 9.2 onward, the service only reports transactions that really had no activity for longer than the given thresholds.

Required privileges: an unprivileged role checks only its own queries; a pg_monitor (10+) or superuser (<10) role is required to check all queries.
=cut sub check_oldest_idlexact { my @rs; my @perfdata; my @msg; my @hosts; my $c_limit; my $w_limit; my %args = %{ $_[0] }; my @dbinclude = @{ $args{'dbinclude'} }; my @dbexclude = @{ $args{'dbexclude'} }; my $me = 'POSTGRES_OLDEST_IDLEXACT'; my $oldest_idle = 0; my $nb_idle = 0; my %stats = ( ); my %queries = ( $PG_VERSION_83 => q{SELECT d.datname, coalesce(extract('epoch' FROM date_trunc('second', current_timestamp-xact_start) ), -1) FROM pg_database AS d LEFT JOIN pg_stat_activity AS a ON (a.datid = d.oid AND current_query = '<IDLE> in transaction')}, $PG_VERSION_92 => q{SELECT d.datname, coalesce(extract('epoch' FROM date_trunc('second', current_timestamp-state_change) ), -1) FROM pg_database AS d LEFT JOIN pg_stat_activity AS a ON (a.datid = d.oid AND state='idle in transaction') } ); # Exclude some apps if(defined $args{'exclude'}[0]){ $queries{$PG_VERSION_92}.=" WHERE a.application_name NOT IN ('".join("','", split(',', $args{'exclude'}[0]))."')"; } # Warning and critical are mandatory. 
pod2usage( -message => "FATAL: you must specify critical and warning thresholds.", -exitval => 127 ) unless defined $args{'warning'} and defined $args{'critical'} ; pod2usage( -message => "FATAL: critical and warning thresholds only accept an interval.", -exitval => 127 ) unless ( is_time( $args{'warning'} ) and is_time( $args{'critical'} ) ); $c_limit = get_time $args{'critical'}; $w_limit = get_time $args{'warning'}; @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "oldest_idlexact".', -exitval => 127 ) if @hosts != 1; is_compat $hosts[0], 'oldest_idlexact', $PG_VERSION_83 or exit 1; @rs = @{ query_ver( $hosts[0], %queries ) }; REC_LOOP: foreach my $r (@rs) { $stats{$r->[0]} = { 'num' => 0, 'max' => -1, 'avg' => 0, } unless exists $stats{$r->[0]}; $oldest_idle = $r->[1] if $r->[1] > $oldest_idle; $stats{$r->[0]}{'num'}++ if $r->[1] > -1; $stats{$r->[0]}{'max'} = $r->[1] if $stats{$r->[0]}{'max'} < $r->[1]; $stats{$r->[0]}{'avg'} = ( $stats{$r->[0]}{'avg'} * ($stats{$r->[0]}{'num'} -1) + $r->[1]) / $stats{$r->[0]}{'num'} if $stats{$r->[0]}{'num'}; } DB_LOOP: foreach my $db (sort keys %stats) { next DB_LOOP if grep { $db =~ /$_/ } @dbexclude; next DB_LOOP if @dbinclude and not grep { $db =~ /$_/ } @dbinclude; $nb_idle += $stats{$db}{'num'}; unless($stats{$db}{'max'} > -1) { $stats{$db}{'max'} = 'NaN'; $stats{$db}{'avg'} = 'NaN'; } push @perfdata, ( [ "$db max", $stats{$db}{'max'}, 's', $w_limit, $c_limit ], [ "$db avg", $stats{$db}{'avg'}, 's', $w_limit, $c_limit ], [ "$db # idle xact", $stats{$db}{'num'} ] ); if ( $stats{$db}{'max'} > $c_limit ) { push @msg => "oldest idle xact on $db: ". to_interval($stats{$db}{'max'}); next DB_LOOP; } if ( $stats{$db}{'max'} > $w_limit ) { push @msg => "oldest idle xact on $db: ".
to_interval($stats{$db}{'max'}); } } unshift @msg => "$nb_idle idle transaction(s)"; return status_critical( $me, \@msg, \@perfdata ) if $oldest_idle > $c_limit; return status_warning( $me, \@msg, \@perfdata ) if $oldest_idle > $w_limit; return status_ok( $me, \@msg, \@perfdata ); }

=item B<oldest_xmin> (8.4+)

Check the xmin I<horizon> from distinct sources of xmin retention.

By default, Perfdata outputs the oldest known xmin age for each database among running queries, opened or idle transactions, pending prepared transactions, replication slots and walsenders. For versions prior to 9.4, only the C<2pc> source of xmin retention is checked.

Using C<--detailed>, Perfdata contains the oldest xmin and maximum age for the following sources of xmin retention: C<query> (a running query), C<active_xact> (an opened transaction currently executing a query), C<idle_xact> (an opened transaction being idle), C<2pc> (a pending prepared transaction), C<repslot> (a replication slot) and C<walsender> (a WAL sender replication process), for each connectable database. If a source doesn't retain any transaction for a database, NaN is returned. For versions prior to 9.4, only the C<2pc> source of xmin retention is available, so other sources won't appear in the perfdata.

Note that xmin retention from a walsender is only set if C<hot_standby_feedback> is enabled on the remote standby.

Critical and Warning thresholds are optional. They only accept a raw number of transactions.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

Required privileges: a pg_read_all_stats (10+) or superuser (<10) role is required to check pg_stat_replication. 2PC, pg_stat_activity, and replication slots don't require special privileges.
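The comments in the queries below note that xids cannot be compared directly because the 32-bit counter wraps around, which is why age() is used. A toy model of that circular arithmetic (Python, illustrative only; PostgreSQL's real age() also accounts for reserved xids such as FrozenTransactionId):

```python
XID_MAX = 2 ** 32

def xid_age(current_xid, xmin):
    """Distance from xmin to current_xid on the wrapping 32-bit
    xid circle (toy model; ignores reserved/frozen xids)."""
    return (current_xid - xmin) % XID_MAX

# An xmin taken just before wraparound still yields a small age
# after the counter wraps, where a plain '<' comparison would not.
print(xid_age(5, XID_MAX - 10))   # 15
```

This is the reason the queries order candidates by age(xmin) with a window function rather than by the raw xmin value.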
=cut sub check_oldest_xmin { my @rs; my @perfdata; my @msg; my @msg_crit; my @msg_warn; my @hosts; my $detailed; my $c_limit; my $w_limit; my %oldest_xmin; # track oldest xmin and its kind for each database my %args = %{ $_[0] }; my $me = 'POSTGRES_OLDEST_XMIN'; my @dbinclude = @{ $args{'dbinclude'} }; my @dbexclude = @{ $args{'dbexclude'} }; my %queries = ( # 8.4 is the first supported version as we rely on window functions to # get the oldest xmin. Only 2PC has transaction information available $PG_VERSION_84 => q{ WITH ordered AS ( SELECT '2pc' AS kind, d.datname, -- xid type doesn't have range operators as the value will wraparound. -- Instead, rely on age() function and row_number() window function -- to get the oldest xid found. row_number() OVER ( PARTITION BY d.datname ORDER BY age(transaction) DESC NULLS LAST ) rownum, age(transaction) AS age, transaction AS xmin FROM (SELECT transaction, database FROM pg_prepared_xacts UNION ALL SELECT NULL, NULL ) sql(transaction, datname) -- we use this JOIN condition to make sure that we'll always have a -- full record for all (connectable) databases JOIN pg_database d ON d.datname = coalesce(sql.datname, d.datname) WHERE d.datallowconn ) SELECT datname, kind, age, xmin FROM ordered WHERE rownum = 1 }, # backend_xmin and backend_xid added to pg_stat_activity, # backend_xmin added to pg_stat_replication, # replication slots introduced $PG_VERSION_94 => q{ WITH raw AS ( -- regular backends SELECT CASE WHEN xact_start = query_start THEN 'query' ELSE CASE WHEN state = 'idle in transaction' THEN 'idle_xact' ELSE 'active_xact' END END AS kind, datname, coalesce(backend_xmin, backend_xid) AS xmin FROM pg_stat_activity -- exclude ourselves, as a blocked xmin in another database would be -- exposed in the database we're connecting too, which may otherwise -- not have the same xmin WHERE pid != pg_backend_pid() UNION ALL ( -- 2PC SELECT '2pc' AS kind, database AS datname, transaction AS xmin FROM pg_prepared_xacts ) UNION ALL ( 
-- replication slots SELECT 'repslot' AS kind, database AS datname, xmin AS xmin FROM pg_replication_slots ) UNION ALL ( -- walsenders SELECT 'walsender' AS kind, NULL AS datname, backend_xmin AS xmin FROM pg_stat_replication ) ), ordered AS ( SELECT kind, datname, -- xid type doesn't have range operators as the value will wraparound. -- Instead, rely on age() function and row_number() window function -- to get the oldest xid found. row_number() OVER ( PARTITION BY kind, datname ORDER BY age(xmin) DESC NULLS LAST ) rownum, age(xmin) AS age, xmin FROM raw ) SELECT f.datname, f.kind, o.age, o.xmin FROM ordered AS o RIGHT JOIN ( SELECT d.datname, v.kind FROM pg_catalog.pg_database d, (VALUES ( 'query' ), ( 'idle_xact' ), ( 'active_xact' ), ( '2pc' ), ( 'repslot' ), ( 'walsender' ) ) v(kind) WHERE d.datallowconn ) f ON o.datname = f.datname AND o.kind = f.kind WHERE coalesce(o.rownum, 1) = 1 } ); # Either both warning and critical are required or none. pod2usage( -message => "FATAL: you must specify both critical and warning thresholds or none of them.", -exitval => 127 ) unless ( defined $args{'warning'} and defined $args{'critical'}) or (not defined $args{'warning'} and not defined $args{'critical'}); if ( defined $args{'critical'} ) { $c_limit = $args{'critical'}; $w_limit = $args{'warning'}; # warning and critical must be raw. 
pod2usage( -message => "FATAL: critical and warning thresholds only accept raw number of transactions.", -exitval => 127 ) unless $args{'warning'} =~ m/^([0-9.]+)$/ and $args{'critical'} =~ m/^([0-9.]+)$/; } $detailed = $args{'detailed'}; @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "oldest_xmin".', -exitval => 127 ) if @hosts != 1; is_compat $hosts[0], 'oldest_xmin', $PG_VERSION_84 or exit 1; @rs = @{ query_ver( $hosts[0], %queries ) }; REC_LOOP: foreach my $r (@rs) { next REC_LOOP if @dbexclude and grep { $r->[0] =~ /$_/ } @dbexclude; next REC_LOOP if @dbinclude and not grep { $r->[0] =~ /$_/ } @dbinclude; map { $_ = 'NaN' if $_ eq ''} @{$r}[2..3]; if ($detailed) { push @perfdata => ( ["$r->[0]_$r->[1]_age", $r->[2]], ["$r->[0]_$r->[1]_xmin", $r->[3]] ); } else { if ( exists $oldest_xmin{$r->[0]} ) { $oldest_xmin{$r->[0]} = [ $r->[1], $r->[2] ] if $oldest_xmin{$r->[0]}[1] eq 'NaN' or $r->[2] > $oldest_xmin{$r->[0]}[1]; } else { $oldest_xmin{$r->[0]} = [ $r->[1], $r->[2] ]; } } if (defined $c_limit) { if ($r->[2] ne 'NaN' and $r->[2] > $c_limit) { push @msg_crit => "$r->[0]_$r->[1]_age"; next REC_LOOP; } push @msg_warn => "$r->[0]_$r->[1]_age" if ($r->[2] ne 'NaN' and $r->[2] > $w_limit); } } if (not $detailed) { foreach my $k (keys %oldest_xmin) { push @perfdata => ( ["${k}_age", $oldest_xmin{$k}[1]] ); push @msg, "Oldest xmin in $k from ". $oldest_xmin{$k}[0] if $oldest_xmin{$k}[1] ne 'NaN'; } } return status_critical( $me, [ 'Critical: '. join(',', @msg_crit) . (scalar @msg_warn? ' Warning: '. join(',', @msg_warn):''), @msg ], \@perfdata ) if scalar @msg_crit; return status_warning( $me, [ 'Warning: '. join(',', @msg_warn), @msg ], \@perfdata ) if scalar @msg_warn; return status_ok( $me, \@msg, \@perfdata ); } =item B<pg_dump_backup> Check the age and size of backups. This service uses the status file (see C<--status-file> parameter). The C<--path> argument contains the location to the backup folder. 
The supported format is a glob pattern matching every folder or file that you need to check. The C<--pattern> is required, and must contain a regular expression matching the backup file name, extracting the database name from the first matching group. Optionally, a C<--global-pattern> option can be supplied to check for an additional global file. Examples: To monitor backups like: /var/lib/backups/mydb-20150803.dump /var/lib/backups/otherdb-20150803.dump /var/lib/backups/mydb-20150804.dump /var/lib/backups/otherdb-20150804.dump you must set: --path '/var/lib/backups/*' --pattern '(\w+)-\d+.dump' If the path contains the date, like this: /var/lib/backups/2015-08-03-daily/mydb.dump /var/lib/backups/2015-08-03-daily/otherdb.dump then you can set: --path '/var/lib/backups/*/*.dump' --pattern '/\d+-\d+-\d+-daily/(.*).dump' For compatibility with pg_back (https://github.com/orgrim/pg_back), you should use: --path '/path/*{dump,sql}' --pattern '(\w+)_[0-9-_]+.dump' --global-pattern 'pg_global_[0-9-_]+.sql' The C<--critical> and C<--warning> thresholds are optional. They accept a list of 'metric=value' separated by a comma. Available metrics are C<oldest> and C<newest>, respectively the age of the oldest and newest backups, and C<size>, which must be the maximum variation of size since the last check, expressed as a size or a percentage. C<mindeltasize>, expressed in B, is the minimum variation of size needed to raise an alert. This service supports the C<--dbinclude> and C<--dbexclude> arguments, to respectively test for the presence of include or exclude files. The argument C<--exclude> enables you to exclude files younger than an interval. This is useful to ignore files from a backup in progress. Eg., if your backup process takes 2h, set this to '125m'. Perfdata returns the age of the oldest and newest backups, as well as the size of the newest backups. 
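The way C<--pattern> derives the database name can be sketched in plain Perl: the first capturing group of the regular expression, applied to each file name found under C<--path>, becomes the database name. This is a standalone illustration with made-up file names, not code taken from the service itself:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standalone sketch: the first capturing group of the --pattern
# regular expression becomes the database name for each backup file.
my $pattern = qr/(\w+)-\d+\.dump/;

foreach my $file ( '/var/lib/backups/mydb-20150803.dump',
                   '/var/lib/backups/otherdb-20150804.dump' ) {
    # $1 holds the first capturing group after a successful match
    print "$1\n" if $file =~ $pattern;
}
```

Run against the two sample paths above, this prints C<mydb> and C<otherdb>, which is exactly how a backup file gets attached to a database for the age and size checks.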
Required privileges: unprivileged role; the system user needs read access on the directory containing the dumps (but not on the dumps themselves). =cut sub check_pg_dump_backup { my @rs; my @stat; my @dirfiles; my @perfdata; my @msg_crit; my @msg_warn; my %db_sizes; my %firsts; my %lasts; my %crit; my %warn; my $mtime; my $size; my $me = 'POSTGRES_PGDUMP_BACKUP'; my $sql = 'SELECT datname FROM pg_database'; my $now = time(); my @hosts = @{ parse_hosts %args }; my @dbinclude = @{ $args{'dbinclude'} }; my @dbexclude = @{ $args{'dbexclude'} }; my $min_age = 0; my $backup_path = $args{'path'}; my $pattern = $args{'pattern'}; my $thresholds_re = qr/(oldest|newest|size|mindeltasize)\s*=\s*(\d+[^,]*)/i; my $global_pattern = $args{'global-pattern'}; pod2usage( -message => "FATAL: you must specify a pattern for filenames", -exitval => 127 ) unless $pattern; pod2usage( -message => "FATAL: you must specify a backup path", -exitval => 127 ) unless $backup_path; pod2usage( -message => "FATAL: the backup_path must contain a wildcard and must not be a directory", -exitval => 127 ) if ( -d "$backup_path") ; pod2usage( -message => 'FATAL: you must give one (and only one) host with service "pg_dump_backup".', -exitval => 127 ) if @hosts != 1; pod2usage( -message => "FATAL: to use a size threshold, a status-file is required", -exitval => 127 ) unless $args{'status-file'}; # warning and critical must be raw pod2usage( -message => "FATAL: critical and warning thresholds only accept a list of 'label=value' separated by comma.\n" . 
"See documentation for more information.", -exitval => 127 ) unless ( not defined $args{'warning'} ) or ( $args{'warning'} =~ m/^$thresholds_re(\s*,\s*$thresholds_re)*$/ and $args{'critical'} =~ m/^$thresholds_re(\s*,\s*$thresholds_re)*$/ ); $min_age = get_time( $args{'exclude'}[0] ) if defined $args{'exclude'}[0]; while ( $args{'warning'} and $args{'warning'} =~ /$thresholds_re/g ) { my ( $threshold, $value ) = ($1, $2); if( $threshold eq "oldest" or $threshold eq "newest" ) { pod2usage( -message => "FATAL: threshold for the oldest or newest backup age must be an interval: $threshold=$value", -exitval => 127 ) unless is_time($value); $value = get_time($value); } $warn{$threshold} = $value if $1 and defined $2; } while ( $args{'critical'} and $args{'critical'} =~ /$thresholds_re/g ) { my ($threshold, $value) = ($1, $2); if( $threshold eq "oldest" or $threshold eq "newest" ) { pod2usage( -message => "FATAL: threshold for the oldest or newest backup age must be an interval: $threshold=$value", -exitval => 127 ) unless is_time($value); $value = get_time($value); } $crit{$threshold} = $value if $1 and defined $2; } dprint ("Looking at files: $backup_path \n"); # Stat files in the backup directory @dirfiles = glob $backup_path; foreach my $file ( @dirfiles ) { my $filename = $file; my $dbname; my $mtime; my $size; ( undef, undef, undef, undef, undef, undef, undef, $size, undef, $mtime ) = stat $file; dprint ("Looking at file: $file (size: $size) \n"); next if $now - $mtime < $min_age; if ( $global_pattern and $filename =~ $global_pattern ) { $firsts{'globals_objects'} = [ $mtime, $size ] if not exists $firsts{'globals_objects'} or $firsts{'globals_objects'}[0] > $mtime; $lasts{'globals_objects'} = [ $mtime, $size ] if not exists $lasts{'globals_objects'} or $lasts{'globals_objects'}[0] < $mtime; } next unless $filename =~ $pattern and defined $1; $dbname = $1; dprint ("Looking for a DB named: $dbname \n") ; $firsts{$dbname} = [ $mtime, $size ] if not exists 
$firsts{$dbname} or $firsts{$dbname}[0] > $mtime; $lasts{$dbname} = [ $mtime, $size ] if not defined $lasts{$dbname} or $lasts{$dbname}[0] < $mtime; } if ( scalar @dbinclude ) { push @rs => [ $_ ] foreach @dbinclude; } else { # Check against databases queried from pg_database @rs = @{ query( $hosts[0], $sql ) }; } # If global_pattern is defined, add them to the list to check push @rs => [ "globals_objects" ] if $global_pattern; %db_sizes = %{ load( $hosts[0], 'pg_dump_backup', $args{'status-file'} ) || {} } if exists $warn{'size'} or exists $crit{'size'}; ALLDB: foreach my $row ( @rs ) { my $db = $row->[0]; my @perf_newest; my @perf_oldest; my @perf_delta; my @perf_size; my $last_age; my $first_age; next if grep { $db =~ /$_/ } @dbexclude; next if @dbinclude and not grep { $db =~ /$_/ } @dbinclude and $db ne 'globals_objects'; if ( not exists $lasts{$db}[0] ) { push @msg_crit => sprintf("'%s_oldest'=NaNs", $db); push @msg_crit => sprintf("'%s_newest'=NaNs", $db); push @msg_crit => sprintf("'%s_size'=NaNB", $db); @perf_oldest = ( "${db}_oldest", 'NaN', 's' ); @perf_newest = ( "${db}_newest", 'NaN', 's' ); @perf_size = ( "${db}_size", 'NaN', 'B' ); @perf_delta = ( "${db}_delta", 'NaN', 'B' ); push @perfdata => ( \@perf_newest, \@perf_oldest, \@perf_size, \@perf_delta ); next; } $last_age = $now - $lasts{$db}[0]; $first_age = $now - $firsts{$db}[0]; @perf_oldest = ( "${db}_oldest", $first_age, 's' ); @perf_newest = ( "${db}_newest", $last_age, 's' ); @perf_size = ( "${db}_size", $lasts{$db}[1], 'B' ); @perf_delta = exists $db_sizes{$db} ? ( "${db}_delta", $lasts{$db}[1] - $db_sizes{$db}[1], 'B' ) : ( "${db}_delta", 0, 'B' ); if ( exists $warn{'newest'} or exists $crit{'newest'} ) { my $c_limit = $crit{'newest'}; my $w_limit = $warn{'newest'}; push @perf_newest => ( defined $w_limit ? $w_limit : undef ); push @perf_newest => ( defined $c_limit ? 
$c_limit : undef ); if ( defined $c_limit and $last_age > $c_limit ) { push @msg_crit => sprintf("'%s_newest'=%s", $db, to_interval( $last_age ) ); } elsif ( defined $w_limit and $last_age > $w_limit ) { push @msg_warn => sprintf("'%s_newest'=%s", $db, to_interval( $last_age ) ); } } if ( exists $warn{'oldest'} or exists $crit{'oldest'} ) { my $c_limit = $crit{'oldest'}; my $w_limit = $warn{'oldest'}; push @perf_oldest => ( defined $w_limit ? $w_limit : undef ); push @perf_oldest => ( defined $c_limit ? $c_limit : undef ); if ( defined $c_limit and $first_age > $c_limit ) { push @msg_crit => sprintf("'%s_oldest'=%s", $db, to_interval( $first_age ) ); } elsif ( defined $w_limit and $first_age > $w_limit ) { push @msg_warn => sprintf( "'%s_oldest'=%s", $db, to_interval( $first_age ) ); } } if ( exists $warn{'size'} or exists $crit{'size'} ) { next ALLDB unless exists $db_sizes{$db}; my $w_delta = get_size( $warn{'size'}, $db_sizes{$db}[1] ) if exists $warn{'size'}; my $c_delta = get_size( $crit{'size'}, $db_sizes{$db}[1] ) if exists $crit{'size'}; my $delta = abs( $lasts{$db}[1] - $db_sizes{$db}[1] ); push @perf_delta => ( defined $w_delta ? $w_delta: undef ); push @perf_delta => ( defined $c_delta ? 
$c_delta: undef ); my $w_mindeltasize = 0; my $c_mindeltasize = 0; $w_mindeltasize = $warn{'mindeltasize'} if exists $warn{'mindeltasize'}; $c_mindeltasize = $crit{'mindeltasize'} if exists $crit{'mindeltasize'}; if ( defined $c_delta and $delta > $c_delta and $delta >= $c_mindeltasize ) { push @msg_crit => sprintf("'%s_delta'=%dB", $db, $lasts{$db}[1]); } elsif ( defined $w_delta and $delta > $w_delta and $delta >= $w_mindeltasize ) { push @msg_warn => sprintf("'%s_delta'=%dB", $db, $lasts{$db}[1]); } } push @perfdata => ( \@perf_newest, \@perf_oldest, \@perf_size, \@perf_delta ); } save $hosts[0], 'pg_dump_backup', \%lasts, $args{'status-file'} if $args{'status-file'}; return status_critical( $me, [ @msg_crit, @msg_warn ], \@perfdata ) if scalar @msg_crit; return status_warning( $me, \@msg_warn, \@perfdata ) if scalar @msg_warn; return status_ok( $me, [], \@perfdata ); } =item B<pga_version> Check if this script is running the given version of check_pgactivity. You must provide the expected version using either C<--warning> OR C<--critical>. No perfdata is returned. Required privileges: none. 
=cut sub check_pga_version { my @rs; my @hosts; my %args = %{ $_[0] }; my $me = 'PGACTIVITY_VERSION'; my $msg = "check_pgactivity $VERSION%s, Perl %vd"; pod2usage( -message => 'FATAL: you must provide a warning or a critical threshold for service pga_version!', -exitval => 127 ) if (defined $args{'critical'} and defined $args{'warning'}) or (not defined $args{'critical'} and not defined $args{'warning'}); pod2usage( -message => "FATAL: given version does not look like a check_pgactivity version!", -exitval => 127 ) if ( defined $args{'critical'} and $args{'critical'} !~ m/^\d\.\d+(?:_?(?:dev|beta|rc)\d*)?$/ ) or (defined $args{'warning'} and $args{'warning'} !~ m/^\d\.\d+(?:_?(?:dev|beta|rc)\d*)?$/ ); return status_critical( $me, [ sprintf($msg, " (should be $args{'critical'}!)", $^V) ] ) if defined $args{'critical'} and $VERSION ne $args{'critical'}; return status_warning( $me, [ sprintf($msg, " (should be $args{'warning'}!)", $^V) ] ) if defined $args{'warning'} and $VERSION ne $args{'warning'}; return status_ok( $me, [ sprintf($msg, "", $^V) ] ); } =item B<pgdata_permission> (8.2+) Check that the permissions on the instance data directory are 700, and that it belongs to the system user currently running PostgreSQL. The check on permissions works on all Unix systems. Checking the user only works on Linux systems (it uses /proc to avoid dependencies). Before 9.3, you need to provide the expected owner using the C<--uid> argument, or the owner will not be checked.
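The permission test itself boils down to masking the mode bits returned by stat() and comparing them with 0700 (or 0750 from v11). A minimal standalone sketch, using a temporary directory in place of the real PGDATA:

```perl
use strict;
use warnings;
use File::Temp 'tempdir';

# Minimal sketch of the permission check; a temporary directory with
# known permissions stands in for the real PGDATA here.
my $dir = tempdir( CLEANUP => 1 );
chmod 0700, $dir or die "chmod: $!";

my @stat = stat $dir or die "stat: $!";
my $perm = sprintf '%04o', $stat[2] & 07777;   # permission bits only
my $uid  = $stat[4];                           # owner uid

print "perm=$perm owner=$uid\n";
print "Permission is ok\n" if $perm eq '0700' or $perm eq '0750';
```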
Required privileges: before v11, superuser; from v11, a user granted pg_monitor or pg_read_all_settings. The system user must also be able to read the folder containing PGDATA: B<the service has to be executed locally on the monitored server.> =cut sub check_pgdata_permission { my $me = 'POSTGRES_CHECK_PGDATA_PERMISSION'; my %args = %{ $_[0] }; my $criticity = 0; # 0=ok, 1=warn, 2=critical my $pg_uid = $args{'uid'}; my @rs; my @msg; my @longmsg; my @stat; my @hosts; my $mode; my $perm; my $query; my $dir_uid; my $stats_age; my $data_directory; @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "pgdata_permission".', -exitval => 127 ) if scalar @hosts != 1; is_compat $hosts[0], 'pgdata_permission', $PG_VERSION_82 or exit 1; # Get the data directory $query = q{ SELECT setting FROM pg_settings WHERE name='data_directory' }; @rs = @{ query( $hosts[0], $query ) }; $data_directory = $rs[0][0]; return status_unknown( $me, [ "PostgreSQL returned this PGDATA: $data_directory, but I cannot access it: $!" ] ) unless @stat = stat( $data_directory ); $mode = $stat[2]; $dir_uid = $stat[4]; $perm = sprintf( "%04o", $mode & 07777 ); # starting with v11, PGDATA can be 0750. if ($perm eq '0700' or (check_compat($hosts[0], $PG_VERSION_110) and $perm eq '0750') ) { push @msg, ( "Permission is ok on $data_directory" ); } else { $criticity = 2; push @msg, ( "Permission is $perm on $data_directory" ); } # Now look at who this directory belongs to, and if it matches the user running # the instance. if ( defined $args{'uid'} ) { $pg_uid = getpwnam( $args{'uid'} ); } # Simplest way is to get the current backend pid and see who it belongs to elsif ( $^O =~ /linux/i and $hosts[0]{'version_num'} >= $PG_VERSION_93 ) { my $pg_line_uid; my @rs_tmp; # first query is an ugly hack to bypass query() control about number of # columns expected. $query = qq{ select ' '; select pg_backend_pid() as pid \\gset \\setenv PID :pid \\! 
cat /proc/\$PID/status }; @rs = @{ query ( $hosts[0], $query ) }; # takes part of the ugly hack to bypass query() control. shift @rs; # As the usual separators are not there, we only get a big record # separated with \n # Find the record beginning with Uid and get the third column # (containing EUID) @rs_tmp = split( /\n/, $rs[0][0] ); $pg_line_uid = ( grep (/^Uid/, @rs_tmp) )[0]; $pg_uid = ( split("\t", $pg_line_uid) )[2]; } if ( not defined $pg_uid ) { push @longmsg, ( "Cannot determine UID of user running postgres. Try to use '--uid'?" ); } elsif ( $pg_uid ne $dir_uid ) { $criticity = 2; push @msg, ( "User running Postgres ($pg_uid) doesn't own $data_directory ($dir_uid)" ); } else { push @msg, ( "Owner of $data_directory is ($pg_uid)" ); } return status_warning( $me, \@msg, undef, \@longmsg ) if $criticity == 1; return status_critical( $me, \@msg, undef, \@longmsg ) if $criticity; return status_ok( $me, \@msg, undef, \@longmsg ); } =item B<replication_slots> (9.4+) Check the number of WAL files retained and of spilled files for each replication slot. Perfdata returns the number of WAL files kept for each slot and the number of spilled files in pg_replslot for each logical replication slot. Since v13, if C<max_slot_wal_keep_size> is greater or equal to 0, perfdata reports the size of WAL to produce before each slot becomes C<unreserved> or C<lost>. Note that this size can become negative during the limited period of time where a slot has the C<unreserved> WAL status. It is set to zero as soon as the last checkpoint has finished and the status becomes C<lost>. This service needs superuser privileges to obtain the number of spilled files; without them, it reports 0 as a last resort. Critical and Warning thresholds are optional. They accept either a raw number (for backward compatibility, only the wal threshold will be used) or a list of 'wal=value' and/or 'spilled=value' and/or 'remaining=size'. 
These are respectively the number of WAL files kept, the number of files spilled in pg_replslot for each logical slot, and the remaining bytes before a slot becomes C<unreserved> or C<lost>. Moreover, with v13 and after, the service raises a warning alert if a slot becomes C<unreserved>. It raises a critical alert if the slot becomes C<lost>. Required privileges: v9.4: unprivileged role, or superuser to monitor spilled files for logical replication; v11+: unprivileged user with GRANT EXECUTE on function pg_ls_dir(text). Here are some examples: -w 'wal=50,spilled=20' -c 'wal=100,spilled=40' -w 'spilled=20,remaining=160MB' -c 'spilled=40,remaining=48MB' =cut sub check_replication_slots { my $me = 'POSTGRES_REPLICATION_SLOTS'; my %args = %{ $_[0] }; my @msg_crit; my @msg_warn; my @longmsg; my @perfdata; my @hosts; my @rs; my @perf_wal_limits; my @perf_spilled_limits; my @perf_remaining_limits; my %warn; my %crit; my %queries = ( # 1st field: slot name # 2nd field: slot type # 3rd field: number of WAL kept because of the slot # 4th field: number of spill files for logical replication # 5th field: wal status for this slot (v13+) # 6th field: remaining safe bytes before max_slot_wal_keep_size (v13+) $PG_VERSION_94 => q{ WITH wal_size AS ( SELECT current_setting('wal_block_size')::int * setting::int AS val FROM pg_settings WHERE name = 'wal_segment_size' -- usually 2048 (blocks) ) SELECT slot_name, slot_type, replslot_wal_keep, count(slot_file) as replslot_files, -- 0 if not superuser NULL, NULL FROM (SELECT slot.slot_name, CASE WHEN slot_file <> 'state' THEN 1 END AS slot_file, slot_type, COALESCE( floor( CASE WHEN pg_is_in_recovery() THEN ( pg_xlog_location_diff(pg_last_xlog_receive_location(), slot.restart_lsn) -- this is needed to account for whole WAL retention and -- not only size retention + (pg_xlog_location_diff(restart_lsn, '0/0') % s.val) ) / s.val ELSE ( pg_xlog_location_diff(pg_current_xlog_location(), slot.restart_lsn) -- this is needed to account for whole WAL retention and -- not 
only size retention + (pg_xlogfile_name_offset(restart_lsn)).file_offset ) / s.val END ),0 ) as replslot_wal_keep FROM pg_replication_slots slot -- trick when user is not superuser LEFT JOIN ( SELECT slot2.slot_name, pg_ls_dir('pg_replslot/'||slot2.slot_name) as slot_file FROM pg_replication_slots slot2 WHERE current_setting('is_superuser')::bool ) files(slot_name,slot_file) ON slot.slot_name=files.slot_name CROSS JOIN wal_size s ) as d GROUP BY slot_name,slot_type,replslot_wal_keep}, $PG_VERSION_100 => q{ WITH wal_size AS ( SELECT current_setting('wal_block_size')::int * setting::int AS val FROM pg_settings WHERE name = 'wal_segment_size' -- usually 2048 (blocks) ) SELECT slot_name, slot_type, replslot_wal_keep, count(slot_file) AS spilled_files, -- 0 if not superuser NULL, NULL FROM (SELECT slot.slot_name, CASE WHEN slot_file <> 'state' THEN 1 END AS slot_file, slot_type, COALESCE( floor( CASE WHEN pg_is_in_recovery() THEN ( pg_wal_lsn_diff(pg_last_wal_receive_lsn(), slot.restart_lsn) -- this is needed to account for whole WAL retention and -- not only size retention + (pg_wal_lsn_diff(restart_lsn, '0/0') % s.val) ) / s.val ELSE ( pg_wal_lsn_diff(pg_current_wal_lsn(), slot.restart_lsn) -- this is needed to account for whole WAL retention and -- not only size retention + (pg_walfile_name_offset(restart_lsn)).file_offset ) / s.val END ),0 ) as replslot_wal_keep FROM pg_replication_slots slot -- trick when user is not superuser LEFT JOIN ( SELECT slot2.slot_name, pg_ls_dir('pg_replslot/'||slot2.slot_name) as slot_file FROM pg_replication_slots slot2 WHERE current_setting('is_superuser')::bool ) files(slot_name,slot_file) ON slot.slot_name=files.slot_name CROSS JOIN wal_size s ) as d GROUP BY slot_name,slot_type,replslot_wal_keep}, $PG_VERSION_110 => q{ WITH wal_size AS ( SELECT setting::int AS wal_segment_size -- unit: B (often 16777216) FROM pg_settings WHERE name = 'wal_segment_size' ) SELECT slot_name, slot_type, replslot_wal_keep, count(slot_file) AS 
spilled_files, -- 0 if not superuser NULL, NULL FROM (SELECT slot.slot_name, CASE WHEN slot_file <> 'state' THEN 1 END AS slot_file, slot_type, COALESCE( floor( CASE WHEN pg_is_in_recovery() THEN ( pg_wal_lsn_diff(pg_last_wal_receive_lsn(), slot.restart_lsn) -- this is needed to account for whole WAL retention and -- not only size retention + (pg_wal_lsn_diff(restart_lsn, '0/0') % s.wal_segment_size) ) / s.wal_segment_size ELSE ( pg_wal_lsn_diff(pg_current_wal_lsn(), slot.restart_lsn) -- this is needed to account for whole WAL retention and -- not only size retention + (pg_walfile_name_offset(restart_lsn)).file_offset ) / s.wal_segment_size END ),0 ) as replslot_wal_keep FROM pg_replication_slots slot -- trick when user is not superuser LEFT JOIN ( SELECT slot2.slot_name, pg_ls_dir('pg_replslot/'||slot2.slot_name) as slot_file FROM pg_replication_slots slot2 WHERE current_setting('is_superuser')::bool) files(slot_name,slot_file) ON slot.slot_name=files.slot_name CROSS JOIN wal_size s ) as d GROUP BY slot_name,slot_type,replslot_wal_keep}, $PG_VERSION_130 => q{ WITH wal_sz AS ( SELECT setting::int AS v -- unit: B (often 16777216) FROM pg_settings WHERE name = 'wal_segment_size' ), slot_sz AS ( SELECT setting::int AS v -- unit: MB FROM pg_settings WHERE name = 'max_slot_wal_keep_size' ) SELECT slot_name, slot_type, replslot_wal_keep, count(slot_file) AS spilled_files, -- 0 if not superuser wal_status, remaining_sz FROM ( SELECT slot.slot_name, CASE WHEN slot_file <> 'state' THEN 1 END AS slot_file, slot_type, CASE WHEN slot.wal_status = 'lost' THEN 0 ELSE COALESCE( floor( CASE WHEN pg_is_in_recovery() THEN ( pg_wal_lsn_diff(pg_last_wal_receive_lsn(), slot.restart_lsn) -- this is needed to account for whole WAL retention and -- not only size retention + (pg_wal_lsn_diff(restart_lsn, '0/0') % wal_sz.v) ) / wal_sz.v ELSE ( pg_wal_lsn_diff(pg_current_wal_lsn(), slot.restart_lsn) -- this is needed to account for whole WAL retention and -- not only size retention + 
(pg_walfile_name_offset(restart_lsn)).file_offset ) / wal_sz.v END ), 0 ) END AS replslot_wal_keep, slot.wal_status, CASE WHEN slot_sz.v >= 0 THEN slot.safe_wal_size ELSE NULL END AS remaining_sz FROM pg_replication_slots slot -- trick when user is not superuser LEFT JOIN ( SELECT slot2.slot_name, pg_ls_dir('pg_replslot/'||slot2.slot_name) as slot_file FROM pg_replication_slots slot2 WHERE current_setting('is_superuser')::bool) files(slot_name,slot_file ) ON slot.slot_name = files.slot_name CROSS JOIN wal_sz CROSS JOIN slot_sz ) as d GROUP BY slot_name, slot_type, replslot_wal_keep, wal_status, remaining_sz} ); @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "replication_slots".', -exitval => 127 ) if @hosts != 1; is_compat $hosts[0], 'replication_slots', $PG_VERSION_94 or exit 1; # build warn/crit thresholds if ( defined $args{'warning'} ) { my $thresholds_re = qr/(wal|spilled|remaining)\s*=\s*(?:([^,]+))/i; if ($args{'warning'} =~ m/^$thresholds_re(\s*,\s*$thresholds_re)*$/ and $args{'critical'} =~ m/^$thresholds_re(\s*,\s*$thresholds_re)*$/ ) { while ( $args{'warning'} =~ /$thresholds_re/g ) { my ($threshold, $value) = ($1, $2); if ($threshold eq 'remaining') { $warn{$threshold} = get_size $value; } else { pod2usage( -message => "FATAL: $threshold accept a raw number\n", -exitval => 127 ) unless $value =~ m/^([0-9]+)$/; $warn{$threshold} = $value; } } while ( $args{'critical'} =~ /$thresholds_re/g ) { my ($threshold, $value) = ($1, $2); if ($threshold eq 'remaining') { $crit{$threshold} = get_size $value; } else { pod2usage( -message => "FATAL: $threshold accept a raw number\n", -exitval => 127 ) unless $value =~ m/^([0-9]+)$/; $crit{$threshold} = $value; } } } # For backward compatibility elsif ($args{'warning'} =~ m/^([0-9]+)$/ and $args{'critical'} =~ m/^([0-9]+)$/ ) { $warn{'wal'} = $args{'warning'}; $crit{'wal'} = $args{'critical'}; } else { pod2usage( -message => "FATAL: critical and warning 
thresholds only accept:\n" . "- raw numbers for backward compatibility to set wal threshold.\n" . "- a list 'wal=value' and/or 'spilled=value' and/or remaining=size separated by comma.\n" . "See documentation for more information.", -exitval => 127 ) } pod2usage( -message => "FATAL: \"remaining=size\" can only be set for PostgreSQL 13 and after.", -exitval => 127 ) if $hosts[0]->{'version_num'} < $PG_VERSION_130 and ( exists $warn{'remaining'} or exists $crit{'remaining'} ); } @perf_wal_limits = ( $warn{'wal'}, $crit{'wal'} ) if defined $warn{'wal'} or defined $crit{'wal'}; @perf_spilled_limits = ( $warn{'spilled'}, $crit{'spilled'} ) if defined $warn{'spilled'} or defined $crit{'spilled'}; @perf_remaining_limits = ( $warn{'remaining'}, $crit{'remaining'} ) if defined $warn{'remaining'} or defined $crit{'remaining'}; @rs = @{ query_ver( $hosts[0], %queries ) }; SLOTS_LOOP: foreach my $row (@rs) { push @perfdata => [ "$row->[0]_wal", $row->[2],'File', @perf_wal_limits ] unless $row->[4] and $row->[4] eq 'lost'; # add number of spilled files if logical replication slot push @perfdata => [ "$row->[0]_spilled", $row->[3], 'File', @perf_spilled_limits ] if $row->[1] eq 'logical'; # add remaining safe bytes if available push @perfdata => [ "$row->[0]_remaining", $row->[5], '', @perf_remaining_limits ] if $row->[5]; # alert on number of WAL kept if ( defined $crit{'wal'} and $row->[2] > $crit{'wal'} ) { push @msg_crit, "$row->[0] wal files : $row->[2]"; push @longmsg => sprintf("Slot: %s wal files = %s above crit threshold %s", $row->[0], $row->[2], $crit{'wal'} ); } elsif ( defined $warn{'wal'} and $row->[2] > $warn{'wal'} ) { push @msg_warn, "$row->[0] wal files : $row->[2]"; push @longmsg => sprintf("Slot: %s wal files = %s above warn threshold %s", $row->[0], $row->[2], $warn{'wal'} ); } # alert on number of spilled files for logical replication if ( defined $crit{'spilled'} and $row->[3] > $crit{'spilled'} ) { push @msg_crit, "$row->[0] spilled files : $row->[3]"; 
push @longmsg => sprintf("Slot: %s spilled files = %s above critical threshold %s", $row->[0], $row->[3], $crit{'spilled'} ); } elsif ( defined $warn{'spilled'} and $row->[3] > $warn{'spilled'} ) { push @msg_warn, "$row->[0] spilled files : $row->[3]"; push @longmsg => sprintf("Slot: %s spilled files = %s above warning threshold %s", $row->[0], $row->[3], $warn{'spilled'} ); } # alert on wal status push @msg_warn, "$row->[0] unreserved" if $row->[4] and $row->[4] eq 'unreserved'; push @msg_crit, "$row->[0] lost" if $row->[4] and $row->[4] eq 'lost'; # do not test remaining bytes if no value available from query next unless $row->[5]; # alert on remaining safe bytes if ( defined $crit{'remaining'} and $row->[5] < $crit{'remaining'} ) { push @msg_crit, sprintf("slot %s not safe", $row->[0]); push @longmsg => sprintf("Remaining %s of WAL for slot %s", to_size($row->[5]), $row->[0] ); } elsif ( defined $warn{'remaining'} and $row->[5] < $warn{'remaining'} ) { push @msg_warn, sprintf("slot %s not safe", $row->[0]); push @longmsg => sprintf("Remaining %s of WAL for slot %s", to_size($row->[5]), $row->[0] ); } } return status_critical( $me, [ @msg_crit, @msg_warn ], \@perfdata, \@longmsg ) if scalar @msg_crit > 0; return status_warning( $me, [ @msg_warn ], \@perfdata, \@longmsg ) if scalar @msg_warn > 0; return status_ok( $me, [ "Replication slots OK" ], \@perfdata, \@longmsg ); } =item B<settings> (9.0+) Check if the current settings have changed since they were stored in the service file. The "known" settings are recorded during the very first call of the service. To update the known settings after a configuration change, call this service again with the argument C<--save>. No perfdata. Critical and Warning thresholds are ignored. A Critical is raised if at least one parameter changed. Required privileges: unprivileged role. 
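The comparison performed by this service can be sketched as follows: settings are kept in a nested hash keyed by parameter name and C<role@database> scope, and any value that differs from the recorded snapshot is reported. This is a standalone illustration with made-up parameter values, not the service's own code:

```perl
use strict;
use warnings;

# Standalone sketch of the settings comparison: %known plays the role
# of the snapshot from the status file, %current the values read from
# pg_settings. The values below are made up for illustration.
my %known   = ( work_mem       => { '*@*' => '4MB' },
                shared_buffers => { '*@*' => '128MB' } );
my %current = ( work_mem       => { '*@*' => '64MB' },
                shared_buffers => { '*@*' => '128MB' } );

foreach my $name ( sort keys %current ) {
    foreach my $scope ( sort keys %{ $current{$name} } ) {
        my $old = $known{$name}{$scope};
        my $new = $current{$name}{$scope};
        print "$name ($scope): $old -> $new\n"
            if defined $old and $old ne $new;
    }
}
```

Here only C<work_mem> is reported as changed; parameters that appeared or disappeared since the snapshot are handled the same way by the real service.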
=cut sub check_settings { my $me = 'POSTGRES_SETTINGS'; my @long_msg; my @hosts; my @rs; my %settings; my %new_settings; my $pending_count = 0; my %args = %{ $_[0] }; my %queries = ( $PG_VERSION_90 => q{ SELECT coalesce(r.rolname, '*'), coalesce(d.datname, '*'), unnest(s.setconfig) AS setting, false AS pending_restart FROM pg_db_role_setting s LEFT JOIN pg_database d ON d.oid=s.setdatabase LEFT JOIN pg_roles r ON r.oid=s.setrole UNION ALL SELECT '*', '*', name||'='||current_setting(name),false FROM pg_settings }, $PG_VERSION_95 => q{ SELECT coalesce(r.rolname, '*'), coalesce(d.datname, '*'), unnest(s.setconfig) AS setting, false AS pending_restart FROM pg_db_role_setting s LEFT JOIN pg_database d ON d.oid=s.setdatabase LEFT JOIN pg_roles r ON r.oid=s.setrole UNION ALL SELECT '*', '*', name||'='||current_setting(name), pending_restart FROM pg_settings } ); @hosts = @{ parse_hosts %args }; is_compat $hosts[0], 'settings', $PG_VERSION_90 or exit 1; @rs = @{ query_ver( $hosts[0], %queries ) }; %settings = %{ load( $hosts[0], 'settings', $args{'status-file'} ) || {} }; # Save settings on the very first call $args{'save'} = 1 unless %settings; PARAM_LOOP: foreach my $row (@rs) { my ( $rolname, $datname, $setting, $pending ) = @$row; my ( $name, $val ) = split /=/ => $setting, 2; my $prefix = "$rolname\@$datname"; my $msg = "$setting"; if ( $pending eq "t" ) { $pending_count++; push @long_msg => "$name is pending restart !"; } $msg = "$prefix: $setting" unless $prefix eq '*@*'; $new_settings{$name}{$prefix} = $val; push @long_msg => $msg unless exists $settings{$name}{$prefix}; push @long_msg => $msg if exists $settings{$name}{$prefix} and $val ne $settings{$name}{$prefix}; delete $settings{$name}{$prefix}; } # Gather remaining settings that have not been processed foreach my $s ( keys %settings ) { foreach my $p ( keys %{ $settings{$s} } ) { my $prefix = ( $p eq "*@*"? 
":" : " $p:" ); push @long_msg => "missing$prefix $s=$settings{$s}{$p}"; } } if ( $args{'save'} ) { save $hosts[0], 'settings', \%new_settings, $args{'status-file'}; return status_ok( $me, [ "Setting saved" ] ) } return status_warning( $me, [ 'Setting changed and pending restart!' ], undef, \@long_msg ) if $pending_count > 0; return status_warning( $me, [ 'Setting changed!' ], undef, \@long_msg ) if scalar @long_msg; return status_ok( $me, [ "Setting OK" ] ); } =item B<sequences_exhausted> (7.4+) Check all sequences and raise an alarm if a column or sequence gets too close to its maximum value. The maximum value is calculated from the maxvalue of the sequence or from the column type when the sequence is owned by a column (the smallserial, serial and bigserial types). Perfdata returns the sequences that trigger the alert. This service supports both C<--dbexclude> and C<--dbinclude> parameters. The 'postgres' database and templates are always excluded. Critical and Warning thresholds accept a percentage of the sequence filled. Required privileges: unprivileged role able to log in to all databases. =cut sub check_sequences_exhausted { my $me = 'POSTGRES_CHECK_SEQ_EXHAUSTED'; my @rs; my @rs2; my @perfdata; my @msg; my @longmsg; my @hosts; my %args = %{ $_[0] }; my $stats_age; my @all_db; my @dbinclude = @{ $args{'dbinclude'} }; my @dbexclude = @{ $args{'dbexclude'} }; my $criticity=0; # 0=ok, 1=warn, 2=critical if ( not defined $args{'warning'} or not defined $args{'critical'} ) { # warning and critical are mandatory. pod2usage( -message => "FATAL: you must specify critical and warning thresholds.", -exitval => 127 ); } unless ( $args{'warning'} =~ m/^([0-9.]+)%$/ and $args{'critical'} =~ m/^([0-9.]+)%$/) { # Warning and critical must be %. 
pod2usage( -message => "FATAL: critical and warning thresholds only accept %.", -exitval => 127 ); } @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "sequences_exhausted".', -exitval => 127 ) if @hosts != 1; is_compat $hosts[0], 'sequences_exhausted', $PG_VERSION_82 or exit 1; @all_db = @{ get_all_dbname( $hosts[0] ) }; # Iterate over all db ALLDB_LOOP: foreach my $db (sort @all_db) { next ALLDB_LOOP if grep { $db =~ /$_/ } @dbexclude; next ALLDB_LOOP if @dbinclude and not grep { $db =~ /$_/ } @dbinclude; dprint ("Searching for exhausted sequences in $db \n"); my %sequences; # We have two code paths: one for < 10.0 and one for >= 10.0 if (check_compat $hosts[0], $PG_VERSION_82, $PG_VERSION_96) { # Search path is emptied so that ::regclass gives us full paths my $query = q{ SET search_path TO 'pg_catalog'; SELECT d.typname, seq.oid::regclass as name, pg_catalog.quote_ident(d.nspname) || '.' || pg_catalog.quote_ident(d.relname) || '.' || pg_catalog.quote_ident(d.attname) FROM pg_class seq LEFT JOIN (SELECT d.objid, t.typname, n.nspname, c.relname, a.attname FROM pg_catalog.pg_depend d INNER JOIN pg_catalog.pg_class c ON c.oid=d.refobjid INNER JOIN pg_catalog.pg_namespace n ON n.oid=c.relnamespace INNER JOIN pg_catalog.pg_attribute a ON (a.attrelid=c.oid AND a.attnum=d.refobjsubid) INNER JOIN pg_catalog.pg_type t ON a.atttypid=t.oid WHERE d.classid='pg_catalog.pg_class'::pg_catalog.regclass AND d.refclassid='pg_catalog.pg_class'::pg_catalog.regclass AND d.deptype='a') d ON d.objid=seq.oid WHERE seq.relkind='S'}; # We got an array: for each record, type (int2, int4, int8), sequence name, and the column name @rs = @{ query ( $hosts[0], $query, $db ) }; dprint ("DB $db : found ".(scalar @rs )." sequences \n" ); next ALLDB_LOOP if ( scalar @rs ) <= 0 ; # Now a second query: get last_value for all sequences. 
There is no way to not generate a query here my @query_elements; foreach my $record(@rs) { # dprint ("Looking at sequence $record->[1] \n") ; my $seqname=$record->[1]; my $protected_seqname=$seqname; $protected_seqname=~s/'/''/g; # Protect quotes push @query_elements,("SELECT '$protected_seqname',last_value,min_value,max_value,increment_by FROM $seqname"); } my $query_elements=join("\nUNION ALL\n",@query_elements); # We got a second array: for each record, sequence, last value and max value (in this sequence) @rs2 = @{ query ( $hosts[0], $query_elements, $db ) }; # To make things easier, we store all of this in a hash table with all sequences, merging these two queries foreach my $record(@rs) { $sequences{$record->[1]}->{TYPE} = $record->[0]; $sequences{$record->[1]}->{COLNAME} = $record->[2]; } foreach my $record(@rs2) { $sequences{$record->[0]}->{LASTVALSEQ} = $record->[1]; $sequences{$record->[0]}->{MINVALSEQ} = $record->[2]; $sequences{$record->[0]}->{MAXVALSEQ} = $record->[3]; $sequences{$record->[0]}->{INCREMENTBY} = $record->[4]; } } else { # Version 10.0 and higher we now have pg_sequence and functions to # get the info directly my $query = q{ SET search_path TO 'pg_catalog'; SELECT typname, seq.seqrelid::regclass AS sequencename, pg_catalog.quote_ident(nspname) || '.' || pg_catalog.quote_ident(relname) || '.' 
|| pg_catalog.quote_ident(attname), CASE WHEN has_sequence_privilege(seq.seqrelid, 'SELECT,USAGE'::text) THEN pg_sequence_last_value(seq.seqrelid::regclass) ELSE NULL::bigint END AS last_value, seq.seqmin AS min_value, seq.seqmax AS max_value, seq.seqincrement AS increment_by FROM pg_sequence seq LEFT JOIN (SELECT d.objid, nspname, relname, attname, typname FROM pg_catalog.pg_depend d INNER JOIN pg_catalog.pg_class c ON c.oid=d.refobjid INNER JOIN pg_catalog.pg_namespace n ON n.oid=c.relnamespace INNER JOIN pg_catalog.pg_attribute a ON ( a.attrelid=c.oid AND a.attnum=d.refobjsubid) INNER JOIN pg_catalog.pg_type t ON a.atttypid=t.oid WHERE d.classid='pg_catalog.pg_class'::pg_catalog.regclass AND d.refclassid='pg_catalog.pg_class'::pg_catalog.regclass AND d.deptype='a') class ON class.objid=seq.seqrelid;}; # We get an array: for each record, type (int2, int4, int8), # sequence name, column name, last, min, max, increment @rs = @{ query ( $hosts[0], $query, $db ) }; foreach my $record (@rs) { $sequences{$record->[1]}->{TYPE} = $record->[0]; $sequences{$record->[1]}->{COLNAME} = $record->[2]; $sequences{$record->[1]}->{LASTVALSEQ} = $record->[3]; $sequences{$record->[1]}->{MINVALSEQ} = $record->[4]; $sequences{$record->[1]}->{MAXVALSEQ} = $record->[5]; $sequences{$record->[1]}->{INCREMENTBY} = $record->[6]; } } # Calculate real max value: # We take into account negative incrementby. If incrementby <0, use minvalseq foreach my $seq (keys %sequences) { my $max_value; # We don't distinguish between positive and negative limits # It's only 1 of difference, so there is no point… if ($sequences{$seq}->{TYPE} eq 'int2') { $max_value = 32767; } elsif ($sequences{$seq}->{TYPE} eq 'int4') { $max_value = 2147483647; } elsif ($sequences{$seq}->{TYPE} eq 'int8') { $max_value = 9223372036854775807; } # When we can't link the sequence to a column, we trust the dba/developer # and we're not going to try to guess.
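The capacity arithmetic applied to each sequence below can be sketched in isolation. This is a simplified model for ascending sequences only; the helper name `values_left` is illustrative and not part of check_pgactivity:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# How many values remain before a sequence is exhausted?
# The effective ceiling is the smaller of the sequence's own maxvalue
# and the maximum of the owning column's type (int2/int4/int8).
sub values_left {
    my ($last_value, $seq_maxvalue, $type_max) = @_;
    my $real_max = $seq_maxvalue < $type_max ? $seq_maxvalue : $type_max;
    return $real_max - $last_value;
}

# a serial (int4) column whose sequence kept the default int8 maxvalue:
# the column type is the binding limit
print values_left( 2_100_000_000, 9_223_372_036_854_775_807, 2_147_483_647 ), "\n";
```

With percentage thresholds, this remainder is then compared against the warning and critical fractions of the usable range.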
if ($sequences{$seq}->{LASTVALSEQ} eq '') { # Skip sequences having lastvalue not initialized delete $sequences{$seq}; next; } my $max_val_seq; if ($sequences{$seq}->{INCREMENTBY} >= 0) { $max_val_seq=$sequences{$seq}->{MAXVALSEQ}; $sequences{$seq}->{ABSVALSEQ} = $sequences{$seq}->{LASTVALSEQ}; } else { $max_val_seq =-$sequences{$seq}->{MINVALSEQ};# Reverse the sign $sequences{$seq}->{ABSVALSEQ} =-$sequences{$seq}->{LASTVALSEQ}; } # The real maximum value is the smallest of both my $real_max_value; if ( defined $max_value ) { $real_max_value = ($max_val_seq <= $max_value) ? $max_val_seq : $max_value; } else { $real_max_value = $max_val_seq; } $sequences{$seq}->{REALMAXVALUE} = $real_max_value; } # We have inverted values for the reverse-order sequences. We don't # have to think about it anymore. foreach my $seq(keys %sequences) { # First, get all info my $real_max_value=$sequences{$seq}->{REALMAXVALUE}; my $usable_amount=$real_max_value - $sequences{$seq}->{MINVALSEQ} + 1; my $lim_warning=$usable_amount-get_size($args{'warning'},$usable_amount); my $lim_critical=$usable_amount-get_size($args{'critical'},$usable_amount); my $how_much_left=$real_max_value-$sequences{$seq}->{ABSVALSEQ}; my $seq_desc; my $long_seq_desc; if ( $sequences{$seq}->{COLNAME} ne '' ) { $seq_desc="$db.$seq(" . $sequences{$seq}->{COLNAME} . ')'; $long_seq_desc="$db.$seq(owned by " . $sequences{$seq}->{COLNAME} . ')'; } else { $seq_desc="$db.$seq"; $long_seq_desc="$db.$seq"; } my $seq_criticity=0; if ($how_much_left<=$lim_critical) { $seq_criticity = 2; } elsif ($how_much_left<=$lim_warning) { $seq_criticity = 1; } if ($seq_criticity>=1) { push @perfdata => [ $seq_desc, $how_much_left, undef, $lim_warning, $lim_critical, 0, $sequences{$seq}->{REALMAXVALUE} ]; push @longmsg, "$long_seq_desc $how_much_left values left"; $criticity = ($criticity>$seq_criticity) ? 
$criticity : $seq_criticity; # Take highest of the criticities } } } return status_warning( $me, \@msg, \@perfdata, \@longmsg ) if $criticity == 1; return status_critical( $me, \@msg, \@perfdata, \@longmsg ) if $criticity == 2; return status_ok( $me, \@msg, \@perfdata ); } =item B<stat_snapshot_age> (9.5 to 14 included) Check the age of the statistics snapshot (statistics collector's statistics). This probe helps to detect a frozen stats collector process. Perfdata returns the statistics snapshot age. Critical and Warning thresholds only accept an interval (eg. 1h30m25s). Required privileges: unprivileged role. =cut sub check_stat_snapshot_age { my $me = 'POSTGRES_STAT_SNAPSHOT_AGE'; my @rs; my @perfdata; my @msg; my @hosts; my %args = %{ $_[0] }; my $stats_age; my $query = q{ SELECT extract(epoch from (now() - pg_stat_get_snapshot_timestamp())) AS age }; if ( defined $args{'warning'} or defined $args{'critical'} ) { # if a threshold is specified, both must be set pod2usage( -message => "FATAL: you must specify critical and warning thresholds.", -exitval => 127 ) unless defined $args{'warning'} and defined $args{'critical'} ; pod2usage( -message => "FATAL: critical and warning thresholds only accept an interval.", -exitval => 127 ) unless is_time( $args{'warning'} ) and is_time( $args{'critical'} ); } @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "stat_snapshot_age".', -exitval => 127 ) if @hosts != 1; is_compat $hosts[0], 'stat_snapshot_age', $PG_VERSION_95, $PG_VERSION_140 or exit 1; @rs = @{ query ( $hosts[0], $query ) }; # Get statistics age in seconds $stats_age = $rs[0][0]; push @perfdata => [ "statistics_age", $stats_age, undef ]; if ( defined $args{'warning'} ) { my $w_limit = get_time $args{'warning'}; my $c_limit = get_time $args{'critical'}; push @{ $perfdata[0] } => ( $w_limit, $c_limit ); return status_critical( $me, \@msg, \@perfdata ) if $stats_age >= $c_limit; return status_warning( $me,
\@msg, \@perfdata ) if $stats_age >= $w_limit; } return status_ok( $me, \@msg, \@perfdata ); } =item B<streaming_delta> (9.1+) Check the data delta between a cluster and its standbys in streaming replication. Optional argument C<--slave> allows you to specify some slaves that MUST be connected. This argument can be used as many times as desired to check multiple slave connections, or you can specify multiple slave connections at one time, using comma separated values. Both methods can be used in a single call. The provided values must be of the form "APPLICATION_NAME IP". Both of the following examples will check for the presence of two slaves: --slave 'slave1 192.168.1.11' --slave 'slave2 192.168.1.12' --slave 'slave1 192.168.1.11','slave2 192.168.1.12' This service supports a C<--exclude REGEX> parameter to exclude every result matching a regular expression on application_name or IP address fields. You can use multiple C<--exclude REGEX> parameters. Perfdata returns the data delta in bytes between the master and every standby found, the number of standbys connected and the number of excluded standbys. Critical and Warning thresholds are optional. They can take one or two values separated by a comma. If only one value is supplied, it applies to both flushed and replayed data. If two values are supplied, the first one applies to flushed data, the second one to replayed data. These thresholds only accept a size (eg. 2.5G). Required privileges: unprivileged role. =cut sub check_streaming_delta { my @perfdata; my @msg; my @msg_crit; my @msg_warn; my @rs; my $w_limit_flushed; my $c_limit_flushed; my $w_limit_replayed; my $c_limit_replayed; my @hosts; my %slaves; my %args = %{ $_[0] }; my @exclude = @{ $args{'exclude'} }; my $excluded = 0; my $wal_size = hex('ff000000'); my $me = 'POSTGRES_STREAMING_DELTA'; my $master_location = ''; my $num_clusters = 0; my %queries = ( # ** WARNING WARNING WARNING ** # # TAP tests rely heavily on column *ORDER* (not column name)!
Columns # 4, 5, 6 and 7 are read/changed from t/lib/Mocker/Streaming.pm during # TAP tests. See comments in t/lib/Mocker/Streaming.pm for more info. # # If you need to change the column order or logic, you must change it # in TAP tests and in Mocker::Streaming as well. # # 1st field: slot application_name # 2nd field: client_addr # 3rd field: pid # 4th field: sent_lsn <= changed during TAP tests by Mocker::Streaming # 5th field: write_lsn <= changed during TAP tests by Mocker::Streaming # 6th field: flush_lsn <= changed during TAP tests by Mocker::Streaming # 7th field: replay_lsn # 8th field: current lsn <= read during TAP tests by Mocker::Streaming $PG_VERSION_100 => q{SELECT application_name, client_addr, pid, sent_lsn, write_lsn, flush_lsn, replay_lsn, CASE pg_is_in_recovery() WHEN true THEN pg_last_wal_receive_lsn() ELSE pg_current_wal_lsn() END FROM pg_stat_replication WHERE state NOT IN ('startup', 'backup')}, $PG_VERSION_92 => q{SELECT application_name, client_addr, pid, sent_location, write_location, flush_location, replay_location, CASE pg_is_in_recovery() WHEN true THEN pg_last_xlog_receive_location() ELSE pg_current_xlog_location() END FROM pg_stat_replication WHERE state NOT IN ('startup', 'backup')}, $PG_VERSION_91 => q{SELECT application_name, client_addr, procpid, sent_location, write_location, flush_location, replay_location, CASE pg_is_in_recovery() WHEN true THEN pg_last_xlog_receive_location() ELSE pg_current_xlog_location() END FROM pg_stat_replication WHERE state NOT IN ('startup', 'backup')} ); # FIXME this service should check for given slaves in opts! 
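The LSN-to-byte conversion these delta computations rely on can be sketched standalone, assuming the 9.3+ WAL addressing where the high word counts 2^32-byte units (`lsn_to_bytes` is an illustrative helper, not a function of the script):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a textual LSN such as '16/B374D848' into an absolute byte
# position: high word * 2^32 + low word (PostgreSQL 9.3+ layout).
sub lsn_to_bytes {
    my ($lsn) = @_;
    my ($hi, $lo) = $lsn =~ m{^([0-9A-F]+)/([0-9A-F]+)$}
        or die "unexpected LSN format: $lsn";
    return 4294967296 * hex($hi) + hex($lo);
}

# replication lag in bytes = primary position - standby position
my $delta = lsn_to_bytes('16/B374D848') - lsn_to_bytes('16/B0000000');
print "$delta bytes behind\n";
```

The service applies exactly this subtraction to the sent, write, flush and replay positions of each standby.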
@hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "streaming_delta".', -exitval => 127 ) if @hosts != 1; is_compat $hosts[0], 'streaming_delta', $PG_VERSION_91 or exit 1; $wal_size = 4294967296 if $hosts[0]{'version_num'} >= $PG_VERSION_93; if ( scalar @{ $args{'slave'} } ) { $slaves{$_} = 0 foreach ( split ( /,/, join ( ',', @{ $args{'slave'} } ) ) ); } @rs = @{ query_ver( $hosts[0], %queries ) }; return status_unknown( $me, ['No slaves connected'], \@perfdata ) unless scalar @rs; $rs[0][7] =~ m{^([0-9A-F]+)/([0-9A-F]+)$}; $master_location = ( $wal_size * hex($1) ) + hex($2); if ( defined $args{'critical'} ) { ($w_limit_flushed, $w_limit_replayed) = split /,/, $args{'warning'}; ($c_limit_flushed, $c_limit_replayed) = split /,/, $args{'critical'}; if (!defined($w_limit_replayed)) { $w_limit_replayed = $w_limit_flushed; } if (!defined($c_limit_replayed)) { $c_limit_replayed = $c_limit_flushed; } $w_limit_flushed = get_size( $w_limit_flushed ); $c_limit_flushed = get_size( $c_limit_flushed ); $w_limit_replayed = get_size( $w_limit_replayed ); $c_limit_replayed = get_size( $c_limit_replayed ); } # Compute deltas foreach my $host (@rs) { my $send_delta; my $write_delta; my $flush_delta; my $replay_delta; my $name; if ( grep { $host->[0] =~ m/$_/ or $host->[1] =~ m/$_/ } @exclude ) { $excluded++; next; } $num_clusters++; $host->[3] =~ m{^([0-9A-F]+)/([0-9A-F]+)$}; $send_delta = $master_location - ( $wal_size * hex($1) ) - hex($2); $host->[4] =~ m{^([0-9A-F]+)/([0-9A-F]+)$}; $write_delta = $master_location - ( $wal_size * hex($1) ) - hex($2); $host->[5] =~ m{^([0-9A-F]+)/([0-9A-F]+)$}; $flush_delta = $master_location - ( $wal_size * hex($1) ) - hex($2); $host->[6] =~ m{^([0-9A-F]+)/([0-9A-F]+)$}; $replay_delta = $master_location - ( $wal_size * hex($1) ) - hex($2); $name = "$host->[0]\@$host->[1]"; push @perfdata => ( [ "sent delta $name", $send_delta, "B" ], [ "wrote delta $name", $write_delta, "B" ], [ "flushed 
delta $name", $flush_delta, "B", $w_limit_flushed, $c_limit_flushed ], [ "replay delta $name", $replay_delta, "B", $w_limit_replayed, $c_limit_replayed ], [ "pid $name", $host->[2] ] ); $slaves{"$host->[0] $host->[1]"} = 1; if ( defined $args{'critical'} ) { if ($flush_delta > $c_limit_flushed) { push @msg_crit, "critical flush lag: " . to_size($flush_delta) . " for $name"; next; } if ($replay_delta > $c_limit_replayed) { push @msg_crit, "critical replay lag: " . to_size($replay_delta) . " for $name"; next; } if ($flush_delta > $w_limit_flushed) { push @msg_warn, "warning flush lag: ". to_size($flush_delta) . " for $name"; next; } if ($replay_delta > $w_limit_replayed) { push @msg_warn, "warning replay lag: " . to_size($replay_delta) . " for $name"; next; } } } push @perfdata => [ '# of excluded slaves', $excluded ]; push @perfdata => [ '# of slaves', scalar @rs || 0 ]; while ( my ( $host, $connected ) = each %slaves ) { unshift @msg_crit => "$host not connected" unless $connected; } return status_critical( $me, [ @msg_crit, @msg_warn ], \@perfdata ) if @msg_crit > 0; return status_warning( $me, \@msg_warn, \@perfdata ) if @msg_warn > 0; return status_ok( $me, [ "$num_clusters slaves checked" ], \@perfdata ); } =item B<table_unlogged> (9.5+) Check if tables are changed to unlogged. In 9.5, you can switch between logged and unlogged. Without C<--critical> or C<--warning> parameters, this service attempts to fetch all unlogged tables. A critical alert is raised if an unlogged table is detected. This service supports both C<--dbexclude> and C<--dbinclude> parameters. The 'postgres' database and templates are always excluded. This service supports a C<--exclude REGEX> parameter to exclude relations matching a regular expression. The regular expression applies to "database.schema_name.relation_name". 
This enables you to filter either on a relation name for all schemas and databases, on a qualified named relation (schema + relation) for all databases or on a qualified named relation in only one database. You can use multiple C<--exclude REGEX> parameters. Perfdata will return the number of unlogged tables per database. A list of the unlogged tables will be returned after the perfdata. This list contains the fully qualified table name. If C<--exclude REGEX> is set, the number of excluded tables is returned. Required privileges: unprivileged role able to log in all databases, or at least those in C<--dbinclude>. =cut sub check_table_unlogged { my @perfdata; my @longmsg; my @rs; my @hosts; my @all_db; my $total_tbl = 0; # num of tables checked, without excluded ones my $total_extbl = 0; # num of excluded tables my $c_count = 0; my %args = %{ $_[0] }; my @dbinclude = @{ $args{'dbinclude'} }; my @dbexclude = @{ $args{'dbexclude'} }; my $me = 'POSTGRES_TABLE_UNLOGGED'; my $query = q{ SELECT current_database(), nsp.nspname AS schemaname, cls.relname, cls.relpersistence FROM pg_class cls join pg_namespace nsp on nsp.oid = cls.relnamespace WHERE cls.relkind = 'r' AND nsp.nspname not like 'pg_toast%' AND nsp.nspname NOT IN ('information_schema', 'pg_catalog') }; @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give one (and only one) host with service "table_unlogged".', -exitval => 127 ) if @hosts != 1; is_compat $hosts[0], 'table_unlogged', $PG_VERSION_95 or exit 1; @all_db = @{ get_all_dbname( $hosts[0] ) }; # Iterate over all db ALLDB_LOOP: foreach my $db (sort @all_db) { my @rc; my $nb_tbl = 0; my $tbl_unlogged = 0; next ALLDB_LOOP if grep { $db =~ /$_/ } @dbexclude; next ALLDB_LOOP if @dbinclude and not grep { $db =~ /$_/ } @dbinclude; @rc = @{ query( $hosts[0], $query, $db ) }; UNLOGGED_LOOP: foreach my $unlogged (@rc) { foreach my $exclude_re ( @{ $args{'exclude'} } ) { if ("$unlogged->[0].$unlogged->[1].$unlogged->[2]" =~ m/$exclude_re/){ 
$total_extbl++; next UNLOGGED_LOOP ; } } # unlogged tables count if ($unlogged->[3] eq "u") { # long message info : push @longmsg => sprintf "%s.%s.%s (unlogged);", $unlogged->[0], $unlogged->[1], $unlogged->[2]; $tbl_unlogged++; } $nb_tbl++; } $total_tbl += $nb_tbl; $c_count += $tbl_unlogged; push @perfdata => [ "table unlogged in $db", $tbl_unlogged ]; } push @longmsg => sprintf "%i excluded table(s) from check", $total_extbl if $total_extbl > 0; # we use the critical count for the **total** number of unlogged tables return status_critical( $me, [ "$c_count/$total_tbl table(s) unlogged" ], \@perfdata, \@longmsg ) if $c_count > 0; return status_ok( $me, [ "No unlogged table" ], \@perfdata, \@longmsg ); } =item B<table_bloat> Estimate bloat on tables. Warning and critical thresholds accept a comma-separated list of either a raw number (for a size), a size (eg. 125M) or a percentage. The thresholds apply to B<bloat> size, not object size. If a percentage is given, the threshold will apply to the bloat size compared to the table + TOAST size. If multiple threshold values are passed, check_pgactivity will choose the largest (bloat size) value. This service supports both C<--dbexclude> and C<--dbinclude> parameters. The 'postgres' database and templates are always excluded. This service supports a C<--exclude REGEX> parameter to exclude relations matching the given regular expression. The regular expression applies to "database.schema_name.relation_name". This enables you to filter either on a relation name for all schemas and databases, on a qualified named relation (schema + relation) for all databases or on a qualified named relation in only one database. You can use multiple C<--exclude REGEX> parameters. B<Warning>: With a non-superuser role, this service can only check the tables that the given role is allowed to read! Perfdata will return the number of tables matching the warning and critical thresholds, per database.
A list of the bloated tables will be returned after the perfdata. This list contains the fully qualified bloated table name, the estimated bloat size, the table size and the bloat percentage. Required privileges: superuser (<10) able to log in all databases, or at least those in C<--dbinclude>; on PostgreSQL 10+, a user with the role pg_monitor suffices, provided that you grant SELECT on the system table pg_statistic to the pg_monitor role, in each database of the cluster: C<GRANT SELECT ON pg_statistic TO pg_monitor;> =cut sub check_table_bloat { my @perfdata; my @longmsg; my @rs; my @hosts; my @all_db; my $total_tbl = 0; # num of tables checked, without excluded ones my $w_count = 0; my $c_count = 0; my %args = %{ $_[0] }; my @dbinclude = @{ $args{'dbinclude'} }; my @dbexclude = @{ $args{'dbexclude'} }; my $me = 'POSTGRES_TABLE_BLOAT'; my %queries = ( # The base for the following queries comes from: # https://github.com/ioguix/pgsql-bloat-estimation # # Changes: # * use pg_statistic instead of pg_stats for performance # * as pg_namespace is not useful in subquery "s", move it to the very last join # Text types header is 4, page header is 20 and block size 8192 for 7.4.
# page header is 24 and GUC block_size appears for 8.0 $PG_VERSION_74 => q{ SELECT current_database(), ns.nspname AS schemaname, tblname, bs*tblpages AS real_size, NULL, NULL, NULL, (tblpages-est_num_pages)*bs AS bloat_size, CASE WHEN tblpages > 0 AND tblpages - est_num_pages > 0 THEN 100 * (tblpages - est_num_pages)/tblpages::float ELSE 0 END AS bloat_ratio, is_na FROM ( SELECT ceil( reltuples / ( (bs-page_hdr)/tpl_size ) ) + ceil( toasttuples / 4 ) AS est_num_pages, tblpages, bs, tblid, relnamespace, tblname, heappages, toastpages, is_na FROM ( SELECT ( 4 + tpl_hdr_size + tpl_data_size + (2*ma) - CASE WHEN tpl_hdr_size%ma = 0 THEN ma ELSE tpl_hdr_size%ma END - CASE WHEN tpl_data_size::numeric%ma = 0 THEN ma ELSE tpl_data_size::numeric%ma END ) AS tpl_size, bs - page_hdr AS size_per_block, (heappages + coalesce(toast.relpages, 0)) AS tblpages, heappages, coalesce(toast.relpages, 0) AS toastpages, s.reltuples, coalesce(toast.reltuples, 0) AS toasttuples, bs, page_hdr, tblid, s.relnamespace, tblname, is_na FROM ( SELECT tbl.oid AS tblid, tbl.relnamespace, tbl.relname AS tblname, tbl.reltuples, tbl.relpages AS heappages, tbl.reltoastrelid, CASE WHEN cluster_version.v > 7 THEN current_setting('block_size')::numeric ELSE 8192::numeric END AS bs, CASE WHEN version()~'mingw32' OR version()~'64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma, CASE WHEN cluster_version.v > 7 THEN 24 ELSE 20 END AS page_hdr, CASE WHEN cluster_version.v > 7 THEN 27 ELSE 23 END + CASE WHEN MAX(coalesce(stanullfrac,0)) > 0 THEN ( 7 + count(*) ) / 8 ELSE 0::int END + CASE WHEN tbl.relhasoids THEN 4 ELSE 0 END AS tpl_hdr_size, sum( (1-coalesce(s.stanullfrac, 0)) * coalesce(s.stawidth, 1024) ) AS tpl_data_size, max( CASE WHEN att.atttypid = 'pg_catalog.name'::regtype THEN 1 ELSE 0 END ) > 0 AS is_na FROM pg_attribute att JOIN pg_class tbl ON att.attrelid = tbl.oid JOIN pg_statistic s ON s.starelid=tbl.oid AND s.staattnum=att.attnum CROSS JOIN ( SELECT 
substring(current_setting('server_version') FROM '#"[0-9]+#"%' FOR '#')::integer ) AS cluster_version(v) WHERE att.attnum > 0 AND NOT att.attisdropped AND tbl.relkind = 'r' GROUP BY 1,2,3,4,5,6,7,8,9, cluster_version.v, tbl.relhasoids ) as s LEFT JOIN pg_class toast ON s.reltoastrelid = toast.oid ) as s2 ) AS s3 JOIN pg_namespace AS ns ON ns.oid = s3.relnamespace WHERE NOT is_na ORDER BY ns.nspname,s3.tblname}, # Variable block size, page header is 24 and text types header is 1 or 4 for 8.3+ $PG_VERSION_82 => q{ SELECT current_database(), ns.nspname AS schemaname, tblname, bs*tblpages AS real_size, (tblpages-est_tblpages)*bs AS extra_size, CASE WHEN tblpages > 0 AND tblpages - est_tblpages > 0 THEN 100 * (tblpages - est_tblpages)/tblpages::float ELSE 0 END AS extra_ratio, fillfactor, (tblpages-est_tblpages_ff)*bs AS bloat_size, CASE WHEN tblpages > 0 AND tblpages - est_tblpages_ff > 0 THEN 100 * (tblpages - est_tblpages_ff)/tblpages::float ELSE 0 END AS bloat_ratio, is_na FROM ( SELECT ceil( reltuples / ( (bs-page_hdr)/tpl_size ) ) + ceil( toasttuples / 4 ) AS est_tblpages, ceil( reltuples / ( (bs-page_hdr)*fillfactor/(tpl_size*100) ) ) + ceil( toasttuples / 4 ) AS est_tblpages_ff, tblpages, fillfactor, bs, tblid, relnamespace, tblname, heappages, toastpages, is_na FROM ( SELECT ( 4 + tpl_hdr_size + tpl_data_size + (2*ma) - CASE WHEN tpl_hdr_size%ma = 0 THEN ma ELSE tpl_hdr_size%ma END - CASE WHEN tpl_data_size::numeric%ma = 0 THEN ma ELSE tpl_data_size::numeric%ma END ) AS tpl_size, bs - page_hdr AS size_per_block, (heappages + coalesce(toast.relpages, 0)) AS tblpages, heappages, coalesce(toast.relpages, 0) AS toastpages, s.reltuples, coalesce(toast.reltuples, 0) AS toasttuples, bs, page_hdr, tblid, s.relnamespace, tblname, fillfactor, is_na FROM ( SELECT tbl.oid AS tblid, tbl.relnamespace, tbl.relname AS tblname, tbl.reltuples, tbl.reltoastrelid, tbl.relpages AS heappages, coalesce(substring( array_to_string(tbl.reloptions, ' ') FROM '%fillfactor=#"__#"%' FOR 
'#')::smallint, 100) AS fillfactor, current_setting('block_size')::numeric AS bs, CASE WHEN version()~'mingw32' OR version()~'64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma, 24 AS page_hdr, CASE WHEN current_setting('server_version_num')::integer < 80300 THEN 27 ELSE 23 END + CASE WHEN MAX(coalesce(s.stanullfrac,0)) > 0 THEN ( 7 + count(*) ) / 8 ELSE 0::int END + CASE WHEN tbl.relhasoids THEN 4 ELSE 0 END AS tpl_hdr_size, sum( (1-coalesce(s.stanullfrac, 0)) * coalesce(s.stawidth, 1024) ) AS tpl_data_size, bool_or(att.atttypid = 'pg_catalog.name'::regtype) AS is_na FROM pg_attribute AS att JOIN pg_class AS tbl ON att.attrelid = tbl.oid JOIN pg_statistic s ON s.starelid = tbl.oid AND s.staattnum=att.attnum LEFT JOIN pg_class AS toast ON tbl.reltoastrelid = toast.oid WHERE att.attnum > 0 AND NOT att.attisdropped AND tbl.relkind = 'r' GROUP BY 1,2,3,4,5,6,7,8,9,10, tbl.relhasoids ORDER BY 2,3 ) as s LEFT JOIN pg_class toast ON s.reltoastrelid = toast.oid ) as s2 ) AS s3 JOIN pg_namespace AS ns ON ns.oid = s3.relnamespace WHERE NOT is_na ORDER BY ns.nspname,s3.tblname}, # Exclude inherited stats $PG_VERSION_90 => q{ SELECT current_database(), nspname AS schemaname, tblname, bs*tblpages AS real_size, (tblpages-est_tblpages)*bs AS extra_size, CASE WHEN tblpages > 0 AND tblpages - est_tblpages > 0 THEN 100 * (tblpages - est_tblpages)/tblpages::float ELSE 0 END AS extra_ratio, fillfactor, (tblpages-est_tblpages_ff)*bs AS bloat_size, CASE WHEN tblpages > 0 AND tblpages - est_tblpages_ff > 0 THEN 100 * (tblpages - est_tblpages_ff)/tblpages::float ELSE 0 END AS bloat_ratio, is_na FROM ( SELECT ceil( reltuples / ( (bs-page_hdr)/tpl_size ) ) + ceil( toasttuples / 4 ) AS est_tblpages, ceil( reltuples / ( (bs-page_hdr)*fillfactor/(tpl_size*100) ) ) + ceil( toasttuples / 4 ) AS est_tblpages_ff, tblpages, fillfactor, bs, tblid, relnamespace, tblname, heappages, toastpages, is_na FROM ( SELECT ( 4 + tpl_hdr_size + tpl_data_size + (2*ma) - CASE WHEN tpl_hdr_size%ma = 0 THEN 
ma ELSE tpl_hdr_size%ma END - CASE WHEN tpl_data_size::numeric%ma = 0 THEN ma ELSE tpl_data_size::numeric%ma END ) AS tpl_size, bs - page_hdr AS size_per_block, (heappages + coalesce(toast.relpages, 0)) AS tblpages, heappages, coalesce(toast.relpages, 0) AS toastpages, s.reltuples, coalesce(toast.reltuples, 0) toasttuples, bs, page_hdr, tblid, s.relnamespace, tblname, fillfactor, is_na FROM ( SELECT tbl.oid AS tblid, tbl.relnamespace, tbl.relname AS tblname, tbl.reltoastrelid, tbl.reltuples, tbl.relpages AS heappages, coalesce(substring( array_to_string(tbl.reloptions, ' ') FROM '%fillfactor=#"__#"%' FOR '#')::smallint, 100) AS fillfactor, current_setting('block_size')::numeric AS bs, CASE WHEN version()~'mingw32' OR version()~'64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma, 24 AS page_hdr, 23 + CASE WHEN MAX(coalesce(s.stanullfrac,0)) > 0 THEN ( 7 + count(*) ) / 8 ELSE 0::int END + CASE WHEN tbl.relhasoids THEN 4 ELSE 0 END AS tpl_hdr_size, sum( (1-coalesce(s.stanullfrac, 0)) * coalesce(s.stawidth, 1024) ) AS tpl_data_size, bool_or(att.atttypid = 'pg_catalog.name'::regtype) AS is_na FROM pg_attribute AS att JOIN pg_class AS tbl ON att.attrelid = tbl.oid JOIN pg_statistic AS s ON s.starelid = tbl.oid AND s.stainherit=false AND s.staattnum=att.attnum WHERE att.attnum > 0 AND NOT att.attisdropped AND tbl.relkind = 'r' GROUP BY 1,2,3,4,5,6,7,8,9, tbl.relhasoids ORDER BY 2,3 ) as s LEFT JOIN pg_class toast ON s.reltoastrelid = toast.oid ) as s2 ) AS s3 JOIN pg_namespace AS ns ON ns.oid = s3.relnamespace WHERE NOT is_na ORDER BY ns.nspname,s3.tblname}, # relhasoids has disappeared, performance improvements $PG_VERSION_120 => q{ SELECT current_database(), ns.nspname, tblname, bs*tblpages AS real_size, (tblpages-est_tblpages)*bs AS extra_size, CASE WHEN tblpages > 0 AND tblpages - est_tblpages > 0 THEN 100 * (tblpages - est_tblpages)/tblpages::float ELSE 0 END AS extra_ratio, fillfactor, CASE WHEN tblpages - est_tblpages_ff > 0 THEN (tblpages-est_tblpages_ff)*bs 
ELSE 0 END AS bloat_size, CASE WHEN tblpages > 0 AND tblpages - est_tblpages_ff > 0 THEN 100 * (tblpages - est_tblpages_ff)/tblpages::float ELSE 0 END AS bloat_ratio, is_na FROM ( SELECT ceil( reltuples / ( (bs-page_hdr)/tpl_size ) ) + ceil( toasttuples / 4 ) AS est_tblpages, ceil( reltuples / ( (bs-page_hdr)*fillfactor/(tpl_size*100) ) ) + ceil( toasttuples / 4 ) AS est_tblpages_ff, tblpages, fillfactor, bs, tblid, relnamespace, tblname, heappages, toastpages, is_na FROM ( SELECT ( 4 + tpl_hdr_size + tpl_data_size + (2*ma) - CASE WHEN tpl_hdr_size%ma = 0 THEN ma ELSE tpl_hdr_size%ma END - CASE WHEN ceil(tpl_data_size)::int%ma = 0 THEN ma ELSE ceil(tpl_data_size)::int%ma END ) AS tpl_size, bs - page_hdr AS size_per_block, (heappages + toastpages) AS tblpages, heappages, toastpages, reltuples, toasttuples, bs, page_hdr, tblid, relnamespace, tblname, fillfactor, is_na FROM ( SELECT tbl.oid AS tblid, tbl.relnamespace, tbl.relname AS tblname, tbl.reltuples, tbl.relpages AS heappages, coalesce(toast.relpages, 0) AS toastpages, coalesce(toast.reltuples, 0) AS toasttuples, coalesce(substring( array_to_string(tbl.reloptions, ' ') FROM 'fillfactor=([0-9]+)')::smallint, 100 ) AS fillfactor, current_setting('block_size')::numeric AS bs, CASE WHEN version()~'mingw32' OR version()~'64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma, 24 AS page_hdr, 23 + CASE WHEN MAX(coalesce(s.stanullfrac,0)) > 0 THEN ( 7 + count(s.staattnum) ) / 8 ELSE 0::int END + CASE WHEN bool_or(att.attname = 'oid' and att.attnum < 0) THEN 4 ELSE 0 END AS tpl_hdr_size, sum( (1-coalesce(s.stanullfrac, 0)) * coalesce(s.stawidth, 0) ) AS tpl_data_size, bool_or(att.atttypid = 'pg_catalog.name'::regtype) OR sum(CASE WHEN att.attnum > 0 THEN 1 ELSE 0 END) <> count(s.staattnum) AS is_na FROM pg_attribute AS att JOIN pg_class AS tbl ON att.attrelid = tbl.oid LEFT JOIN pg_statistic AS s ON s.starelid = tbl.oid AND s.stainherit = false AND s.staattnum = att.attnum LEFT JOIN pg_class AS toast ON 
tbl.reltoastrelid = toast.oid WHERE NOT att.attisdropped AND tbl.relkind = 'r' GROUP BY 1,2,3,4,5,6,7,8,9,10 ORDER BY 2,3 ) AS s ) AS s2 ) AS s3 JOIN pg_namespace AS ns ON ns.oid = s3.relnamespace WHERE NOT is_na ORDER BY ns.nspname, s3.tblname }, ); # Warning and critical are mandatory. pod2usage( -message => "FATAL: you must specify critical and warning thresholds.", -exitval => 127 ) unless defined $args{'warning'} and defined $args{'critical'} ; @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "table_bloat".', -exitval => 127 ) if @hosts != 1; @all_db = @{ get_all_dbname( $hosts[0] ) }; # Iterate over all db ALLDB_LOOP: foreach my $db (sort @all_db) { my @rc; # handle max, avg and count for size and percentage, per relkind my $nb_tbl = 0; my $tbl_bloated = 0; next ALLDB_LOOP if grep { $db =~ /$_/ } @dbexclude; next ALLDB_LOOP if @dbinclude and not grep { $db =~ /$_/ } @dbinclude; @rc = @{ query_ver( $hosts[0], %queries, $db ) }; BLOAT_LOOP: foreach my $bloat (@rc) { foreach my $exclude_re ( @{ $args{'exclude'} } ) { next BLOAT_LOOP if "$bloat->[0].$bloat->[1].$bloat->[2]" =~ m/$exclude_re/; } my $w_limit = 0; my $c_limit = 0; # We need to compute effective thresholds on each object, # as the value can be given in percentage # The biggest calculated size will be used. 
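The per-object threshold expansion described in the comment above can be sketched on its own. This is a simplified stand-in for the script's get_size that only understands percentages and raw byte counts:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expand a comma-separated threshold list against one object's size and
# keep the largest resulting byte value, as done for each table below.
sub effective_limit {
    my ($spec, $object_size) = @_;
    my $limit = 0;
    foreach my $part (split /,/, $spec) {
        my $bytes = $part =~ /^([0-9.]+)%$/
                  ? $1 * $object_size / 100   # percentage of this object
                  : $part;                    # already a raw byte count
        $limit = $bytes if $bytes > $limit;
    }
    return $limit;
}

# for a 4 GB table, a flat 1 GiB threshold beats 10% (~410 MB)
print effective_limit( '10%,1073741824', 4294967296 ), "\n";
```

Because percentages scale with each object, the effective limit is recomputed per table rather than once per run.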
foreach my $cur_warning (split /,/, $args{'warning'}) { my $size = get_size( $cur_warning, $bloat->[3] ); $w_limit = $size if $size > $w_limit; } foreach my $cur_critical (split /,/, $args{'critical'}) { my $size = get_size( $cur_critical, $bloat->[3] ); $c_limit = $size if $size > $c_limit; } if ( $bloat->[7] > $w_limit ) { $tbl_bloated++; $w_count++; $c_count++ if $bloat->[7] > $c_limit; push @longmsg => sprintf "%s.%s.%s %s/%s (%.2f%%);", $bloat->[0], $bloat->[1], $bloat->[2], to_size($bloat->[7]), to_size($bloat->[3]), $bloat->[8]; } $nb_tbl++; } $total_tbl += $nb_tbl; push @perfdata => [ "table bloated in $db", $tbl_bloated ]; } # We use the warning count for the **total** number of bloated tables return status_critical( $me, [ "$w_count/$total_tbl table(s) bloated" ], \@perfdata, [ @longmsg ] ) if $c_count > 0; return status_warning( $me, [ "$w_count/$total_tbl table(s) bloated" ], \@perfdata, [ @longmsg ] ) if $w_count > 0; return status_ok( $me, [ "Table bloat ok" ], \@perfdata ); } =item B<temp_files> (8.1+) Check the number and size of temp files. This service uses the status file (see C<--status-file> parameter) for 9.2+. Perfdata returns the number and total size of temp files found in C<pgsql_tmp> folders. They are aggregated by database until 8.2, then by tablespace (see GUC temp_tablespaces). Starting with 9.2, perfdata also returns the number of temp files per database since last run, the total size of temp files per database since last run and the rate at which temp files were generated. Critical and Warning thresholds are optional. They accept either a number of files (raw value), a size (unit is B<mandatory> to define a size) or both values separated by a comma. Thresholds are applied to current temp files being created AND the number/size of temp files created since last execution.
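The mixed count/size threshold format described above can be illustrated with a small parser sketch. The helper and the unit table are illustrative only; the real script relies on its own get_size routine:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A temp_files threshold such as '300,2G' carries a file count (bare
# number) and/or a size (number with a mandatory unit).
my %factor = ( k => 1024, M => 1024**2, G => 1024**3, T => 1024**4 );

sub parse_threshold {
    my ($spec) = @_;
    my ($count, $size);
    foreach my $part (split /,/, $spec) {
        if ( $part =~ /^([0-9.]+)([kMGT])B?$/ ) {
            $size = $1 * $factor{$2};       # size part, unit required
        }
        elsif ( $part =~ /^[0-9]+$/ ) {
            $count = $part;                 # bare number: file count
        }
    }
    return ( $count, $size );
}

my ($count, $size) = parse_threshold('300,2G');
print "$count files, $size bytes\n";
```

A bare number without a unit is therefore always read as a file count, which is why the unit is mandatory to express a size.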

Required privileges:

  <10: superuser
  v10: an unprivileged role is possible, but it will not monitor databases
       that it cannot access, nor live temp files
  v11: an unprivileged role is possible, but it must be granted EXECUTE on
       functions pg_ls_dir(text), pg_read_file(text) and
       pg_stat_file(text, boolean); the same restrictions as on v10 still
       apply
  v12+: a role with the pg_monitor privilege.

=cut

sub check_temp_files {
    my $me       = 'POSTGRES_TEMP_FILES';
    my $now      = time();
    my $w_flimit;
    my $c_flimit;
    my $w_limit;
    my $c_limit;
    my @perf_flimits;
    my @perf_limits;
    my $obj      = 'database(s)';
    my @msg_crit;
    my @msg_warn;
    my @perfdata;
    my @hosts;
    my @rs;
    my %prev_temp_files;
    my %new_temp_files;
    my %args     = %{ $_[0] };
    my %queries  = (
        # WARNING: these queries might have a race condition between
        # pg_ls_dir and pg_stat_file!

        # Temp folders are per database
        $PG_VERSION_81 => q{ SELECT 'live', agg.datname, sum(CASE WHEN agg.tmpfile <> '' THEN 1 ELSE 0 END), sum(CASE WHEN agg.tmpfile <> '' THEN (pg_stat_file(agg.dir||'/'||agg.tmpfile)).size ELSE 0 END) FROM ( SELECT t3.datname, t3.spcname, t3.dbroot||'/'||t3.dbcont AS dir, CASE gs.i WHEN 1 THEN pg_ls_dir(t3.dbroot||'/'||t3.dbcont) ELSE '' END AS tmpfile FROM ( SELECT d.datname, t2.spcname, t2.tblroot||'/'||t2.tblcont AS dbroot, pg_ls_dir(t2.tblroot||'/'||t2.tblcont) AS dbcont FROM ( SELECT t.spcname, t.tblroot, pg_ls_dir(tblroot) AS tblcont FROM ( SELECT spc.spcname, 'pg_tblspc/'||spc.oid AS tblroot FROM pg_tablespace AS spc WHERE spc.spcname !~ '^pg_' UNION ALL SELECT 'pg_default', 'base' AS dir ) AS t ) AS t2 JOIN pg_database d ON d.oid=t2.tblcont ) AS t3, (SELECT generate_series(1,2) AS i) AS gs WHERE t3.dbcont='pgsql_tmp' ) AS agg GROUP BY 1,2 },

        # Temp folders are per tablespace
        $PG_VERSION_83 => q{ SELECT 'live', agg.spcname, sum(CASE WHEN agg.tmpfile <> '' THEN 1 ELSE 0 END), sum(CASE WHEN agg.tmpfile <> '' THEN (pg_stat_file(agg.dir||'/'||agg.tmpfile)).size ELSE 0 END) FROM ( SELECT gs.i, sr.oid, sr.spcname, sr.dir, CASE WHEN gs.i = 1
THEN pg_ls_dir(sr.dir) ELSE '' END AS tmpfile FROM ( SELECT spc.oid, spc.spcname, 'pg_tblspc/'||spc.oid||'/pgsql_tmp' AS dir, pg_ls_dir('pg_tblspc/'||spc.oid) AS sub FROM ( SELECT oid, spcname FROM pg_tablespace WHERE spcname !~ '^pg_' ) AS spc UNION ALL SELECT 0, 'pg_default', 'base/pgsql_tmp' AS dir, 'pgsql_tmp' AS sub FROM pg_ls_dir('base') AS l WHERE l='pgsql_tmp' ) sr, (SELECT generate_series(1,2) AS i) AS gs WHERE sr.sub='pgsql_tmp' ) agg GROUP BY 1,2 }, # Add sub folder PG_9.x_* to pg_tblspc $PG_VERSION_90 => q{ SELECT 'live', agg.spcname, sum(CASE WHEN agg.tmpfile <> '' THEN 1 ELSE 0 END), sum(CASE WHEN agg.tmpfile <> '' THEN (pg_stat_file(agg.dir||'/'||agg.tmpfile)).size ELSE 0 END) FROM ( SELECT ls.oid, ls.spcname, ls.dir||'/'||ls.sub AS dir, CASE gs.i WHEN 1 THEN '' ELSE pg_ls_dir(dir||'/'||ls.sub) END AS tmpfile FROM ( SELECT sr.oid, sr.spcname, 'pg_tblspc/'||sr.oid||'/'||sr.spc_root AS dir, pg_ls_dir('pg_tblspc/'||sr.oid||'/'||sr.spc_root) AS sub FROM ( SELECT spc.oid, spc.spcname, pg_ls_dir('pg_tblspc/'||spc.oid) AS spc_root, trim( trailing E'\n ' FROM pg_read_file('PG_VERSION', 0, 100)) as v FROM ( SELECT oid, spcname FROM pg_tablespace WHERE spcname !~ '^pg_' ) AS spc ) sr WHERE sr.spc_root ~ ('^PG_'||sr.v) UNION ALL SELECT 0, 'pg_default', 'base' AS dir, 'pgsql_tmp' AS sub FROM pg_ls_dir('base') AS l WHERE l='pgsql_tmp' ) AS ls, (SELECT generate_series(1,2) AS i) AS gs WHERE ls.sub = 'pgsql_tmp' ) agg GROUP BY 1,2 }, # Add stats from pg_stat_database $PG_VERSION_92 => q{ SELECT 'live', agg.spcname, sum(CASE WHEN agg.tmpfile <> '' THEN 1 ELSE 0 END), sum(CASE WHEN agg.tmpfile <> '' THEN (pg_stat_file(agg.dir||'/'||agg.tmpfile)).size ELSE 0 END) FROM ( SELECT ls.oid, ls.spcname, ls.dir||'/'||ls.sub AS dir, CASE gs.i WHEN 1 THEN '' ELSE pg_ls_dir(dir||'/'||ls.sub) END AS tmpfile FROM ( SELECT sr.oid, sr.spcname, 'pg_tblspc/'||sr.oid||'/'||sr.spc_root AS dir, pg_ls_dir('pg_tblspc/'||sr.oid||'/'||sr.spc_root) AS sub FROM ( SELECT spc.oid, spc.spcname, 
pg_ls_dir('pg_tblspc/'||spc.oid) AS spc_root, trim( trailing E'\n ' FROM pg_read_file('PG_VERSION')) as v FROM ( SELECT oid, spcname FROM pg_tablespace WHERE spcname !~ '^pg_' ) AS spc ) sr WHERE sr.spc_root ~ ('^PG_'||sr.v) UNION ALL SELECT 0, 'pg_default', 'base' AS dir, 'pgsql_tmp' AS sub FROM pg_ls_dir('base') AS l WHERE l='pgsql_tmp' ) AS ls, (SELECT generate_series(1,2) AS i) AS gs WHERE ls.sub = 'pgsql_tmp' ) agg GROUP BY 1,2 UNION ALL SELECT 'db', d.datname, s.temp_files, s.temp_bytes FROM pg_database AS d JOIN pg_stat_database AS s ON s.datid=d.oid WHERE datallowconn },

        # Specific query to handle superuser and non-superuser roles in
        # PostgreSQL 10: the WHERE current_setting('is_superuser')::bool
        # clause does all the magic. Also, the previous query was not
        # working with PostgreSQL 10.
        $PG_VERSION_100 => q{ SELECT 'live', agg.spcname, count(agg.tmpfile), SUM(COALESCE((pg_stat_file(agg.dir||'/'||agg.tmpfile, true)).size, 0)) AS SIZE FROM ( SELECT ls.oid, ls.spcname AS spcname, ls.dir||'/'||ls.sub AS dir, tmpdir.tmpfile FROM ( SELECT * FROM ( SELECT sr.oid, sr.spcname, 'pg_tblspc/'||sr.oid||'/'||sr.spc_root AS dir, pg_ls_dir('pg_tblspc/'||sr.oid||'/'||sr.spc_root) AS sub FROM ( SELECT spc.oid, spc.spcname, pg_ls_dir('pg_tblspc/'||spc.oid) AS spc_root, trim(TRAILING e'\n ' FROM pg_read_file('PG_VERSION')) AS v FROM ( SELECT oid, spcname FROM pg_tablespace WHERE spcname !~ '^pg_' ) AS spc ) sr WHERE sr.spc_root ~ ('^PG_'||sr.v) ) tmpr WHERE tmpr.sub = 'pgsql_tmp' UNION ALL SELECT 0, 'pg_default', 'base' AS dir, 'pgsql_tmp' AS sub FROM pg_ls_dir('base') AS l WHERE l='pgsql_tmp' ) AS ls LEFT OUTER JOIN LATERAL ( SELECT pg_ls_dir(ls.dir||'/'||ls.sub) AS tmpfile ) AS tmpdir ON (true) ) AS agg WHERE current_setting('is_superuser')::bool GROUP BY 1, 2 UNION ALL SELECT 'db', d.datname, s.temp_files, s.temp_bytes FROM pg_database AS d JOIN pg_stat_database AS s ON s.datid=d.oid },

        # Use pg_ls_tmpdir with PostgreSQL 12.
        # The technique to bypass function execution for non-superuser
        # roles used in the query PG_VERSION_100 does not work anymore
        # since commit b8d7f053c5c in PostgreSQL. From now on, this probe
        # requires at least a pg_monitor role with PostgreSQL >= 12.
        $PG_VERSION_120 => q{ SELECT 'live', agg.spcname, count(agg.name), SUM(agg.size) AS SIZE FROM ( SELECT ts.spcname, tmp.name, tmp.size FROM pg_tablespace ts, LATERAL pg_catalog.pg_ls_tmpdir(ts.oid) tmp WHERE spcname <> 'pg_global' ) AS agg GROUP BY 1, 2 UNION ALL SELECT 'db', d.datname, s.temp_files, s.temp_bytes FROM pg_database AS d JOIN pg_stat_database AS s ON s.datid=d.oid; },
    );

    @hosts = @{ parse_hosts %args };

    is_compat $hosts[0], 'temp_files', $PG_VERSION_81 or exit 1;

    $obj = 'tablespace(s)' if $hosts[0]{'version_num'} >= $PG_VERSION_83;
    $obj = 'tablespace(s)/database(s)'
        if $hosts[0]{'version_num'} >= $PG_VERSION_92;

    pod2usage(
        -message => 'FATAL: you must give only one host with service "temp_files".',
        -exitval => 127
    ) if @hosts != 1;

    if ( defined $args{'warning'} and defined $args{'critical'} ) {
        while ( $args{'warning'} =~ m/(?:(\d+)([kmgtpez]?b)?)/ig ) {
            if ( defined $2 ) { $w_limit = get_size("$1$2"); }
            else              { $w_flimit = $1; }
        }

        while ( $args{'critical'} =~ m/(?:(\d+)([kmgtpez]?b)?)/ig ) {
            if ( defined $2 ) { $c_limit = get_size("$1$2"); }
            else              { $c_flimit = $1; }
        }

        pod2usage(
            -message => 'FATAL: you must give the number-of-files threshold '
                .'for both warning AND critical if used with service "temp_files".',
            -exitval => 127
        ) if (defined $w_flimit and not defined $c_flimit)
            or (not defined $w_flimit and defined $c_flimit);

        pod2usage(
            -message => 'FATAL: you must give the total size threshold '
                .'for both warning AND critical if used with service "temp_files".',
            -exitval => 127
        ) if (defined $w_limit and not defined $c_limit)
            or (not defined $w_limit and defined $c_limit);

        @perf_flimits = ( $w_flimit, $c_flimit ) if defined $w_flimit;
        @perf_limits  = ( $w_limit, $c_limit )   if defined $w_limit;
    }

    %prev_temp_files = %{ load( $hosts[0],
'temp_files', $args{'status-file'} ) || {} }; @rs = @{ query_ver( $hosts[0], %queries ) }; DB_LOOP: foreach my $stat (@rs) { my $frate; my $brate; my $last_check; my $last_number; my $last_size; my $diff_number; my $diff_size; if ( $stat->[0] eq 'live' ) { push @perfdata => [ "# files in $stat->[1]", $stat->[2], 'File', @perf_flimits ]; push @perfdata => [ "Total size in $stat->[1]", $stat->[3], 'B', @perf_limits ]; if ( defined $c_limit) { if ( $stat->[3] > $c_limit ) { push @msg_crit => sprintf("%s (%s file(s)/%s)", $stat->[1], $stat->[2], to_size($stat->[3]) ); next DB_LOOP; } push @msg_warn => sprintf("%s (%s file(s)/%s)", $stat->[1], $stat->[2], to_size($stat->[3]) ) if $stat->[3] > $w_limit; } if ( defined $c_flimit) { if ( $stat->[2] > $c_flimit ) { push @msg_crit => sprintf("%s (%s file(s)/%s)", $stat->[1], $stat->[2], to_size($stat->[3]) ); next DB_LOOP; } push @msg_warn => sprintf("%s (%s file(s)/%s)", $stat->[1], $stat->[2], to_size($stat->[3]) ) if $stat->[2] > $w_flimit; } next DB_LOOP; } $new_temp_files{ $stat->[1] } = [ $now, $stat->[2], $stat->[3] ]; next DB_LOOP unless defined $prev_temp_files{ $stat->[1] }; $last_check = $prev_temp_files{ $stat->[1] }[0]; $last_number = $prev_temp_files{ $stat->[1] }[1]; $last_size = $prev_temp_files{ $stat->[1] }[2]; $diff_number = $stat->[2] - $last_number; $diff_size = $stat->[3] - $last_size; $frate = 60 * $diff_number / ($now - $last_check); $brate = 60 * $diff_size / ($now - $last_check); push @perfdata => [ "$stat->[1]", $frate, 'Fpm' ]; push @perfdata => [ "$stat->[1]", $brate, 'Bpm' ]; push @perfdata => [ "$stat->[1]", $diff_number, 'Files', @perf_flimits ]; push @perfdata => [ "$stat->[1]", $diff_size, 'B', @perf_limits ]; if ( defined $c_limit) { if ( $diff_size > $c_limit ) { push @msg_crit => sprintf("%s (%s file(s)/%s)", $stat->[1], $diff_number, to_size($diff_size) ); next DB_LOOP; } push @msg_warn => sprintf("%s (%s file(s)/%s)", $stat->[1], $diff_number, to_size($diff_size) ) if $diff_size > 
$w_limit;
        }

        if ( defined $c_flimit ) {
            if ( $diff_number > $c_flimit ) {
                push @msg_crit => sprintf("%s (%s file(s)/%s)", $stat->[1],
                    $diff_number, to_size($diff_size) );
                next DB_LOOP;
            }

            push @msg_warn => sprintf("%s (%s file(s)/%s)", $stat->[1],
                $diff_number, to_size($diff_size) )
                if $diff_number > $w_flimit;
        }
    }

    save $hosts[0], 'temp_files', \%new_temp_files, $args{'status-file'};

    return status_critical( $me, [ @msg_crit, @msg_warn ], \@perfdata )
        if scalar @msg_crit > 0;

    return status_warning( $me, \@msg_warn, \@perfdata )
        if scalar @msg_warn > 0;

    return status_ok( $me, [ scalar(@rs) . " $obj checked" ], \@perfdata );
}

=item B<uptime> (8.1+)

Returns the time elapsed since the postmaster start ("uptime", from 8.1), since the last configuration reload (from 8.4), and since the shared memory initialization (from 10).

Please note that the uptime is unaffected when the postmaster resets all its children (for example after a kill -9 on a process, or a failure). From 10+, the 'time since shared memory init' aims at detecting this situation: it is in fact the age of the oldest non-client child process (usually the checkpointer, writer or startup process). This needs pg_monitor access to read pg_stat_activity.

Critical and Warning thresholds are optional. If both are set, Critical is raised when the postmaster uptime or the time since shared memory initialization is less than the critical threshold. Warning is raised when the time since configuration reload is less than the warning threshold. If only a warning or critical threshold is given, it will be used for both cases. These alerts will naturally clear by themselves once enough time has passed.

Perfdata contains the three values (when available).

Required privileges: pg_monitor on PG10+; otherwise an unprivileged role.

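
The alerting rules above can be summarized with a small decision function. This is an illustrative Python sketch of the logic only (the plugin itself is Perl, and it applies the critical rule to the shared memory age in the same way as to the postmaster uptime):

```python
# Illustrative sketch of the uptime alerting rules; ages are in seconds
# and thresholds are None when not given on the command line.
def uptime_status(uptime, reload_age, warn=None, crit=None):
    # postmaster uptime: checked against critical, or against warning
    # when only a warning threshold is given
    if crit is not None and uptime < crit:
        return 'CRITICAL'
    if crit is None and warn is not None and uptime < warn:
        return 'WARNING'
    # configuration reload age: checked against warning, or against
    # critical when only a critical threshold is given
    if warn is not None and reload_age < warn:
        return 'WARNING'
    if warn is None and crit is not None and reload_age < crit:
        return 'CRITICAL'
    return 'OK'

print(uptime_status(uptime=30, reload_age=3600, warn=300, crit=60))    # CRITICAL
print(uptime_status(uptime=7200, reload_age=120, warn=300, crit=60))   # WARNING
print(uptime_status(uptime=7200, reload_age=7200, warn=300, crit=60))  # OK
```

As described above, both alerts fade away on their own once the postmaster has been up, or the configuration loaded, for longer than the thresholds.
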
=cut sub check_uptime { my @rs; my @hosts; my @perfdata; my @msg; my @msg_warn; my @msg_crit; my $uptime; my $shmem_init_time; my $reload_conf_time; my $reload_conf_flag; my $msg_uptime; my $msg_shmem_init_time; my $msg_reload_conf; my $c_limit; my $w_limit; my $me = 'POSTGRES_UPTIME'; my %queries = ( $PG_VERSION_81 => q{ SELECT extract('epoch' from (current_timestamp - pg_postmaster_start_time())) AS time_since_postmaster_start, null, pg_postmaster_start_time() as postmaster_start_time }, $PG_VERSION_84 => q{ SELECT extract('epoch' from (current_timestamp - pg_postmaster_start_time())) AS time_since_postmaster_start, extract('epoch' from (current_timestamp - pg_conf_load_time())) AS time_since_conf_reload, pg_postmaster_start_time(), pg_conf_load_time() }, $PG_VERSION_100 => q{ SELECT extract('epoch' from (current_timestamp - pg_postmaster_start_time())) AS time_since_postmaster_start, extract('epoch' from (current_timestamp - pg_conf_load_time())) AS time_since_conf_reload, pg_postmaster_start_time(), pg_conf_load_time(), -- oldest child (usually checkpointer, startup...) extract('epoch' from (current_timestamp - min(backend_start))) AS age_oldest_child_process, min(backend_start) AS oldest_child_process FROM pg_stat_activity WHERE backend_type != 'client backend' } ); @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "uptime".', -exitval => 127 ) if @hosts != 1; is_compat $hosts[0], 'uptime', $PG_VERSION_81 or exit 1; $c_limit = get_time $args{'critical'} if (defined $args{'critical'}) ; $w_limit = get_time $args{'warning'} if (defined $args{'warning'}); @rs = @{ query_ver( $hosts[0], %queries ) }; $uptime = int( $rs[0][0] ); $msg_uptime = "postmaster started for ".to_interval($uptime)." 
(since $rs[0][2])" ;

    push @perfdata => [ 'postmaster uptime', $uptime , 's', undef, undef, 0 ];

    # time since configuration reload
    $reload_conf_flag = !(check_compat $hosts[0], $PG_VERSION_81, $PG_VERSION_84);

    if ($reload_conf_flag) {
        $reload_conf_time = int( $rs[0][1] );
        $msg_reload_conf  = "configuration reloaded for "
            .to_interval($reload_conf_time)." (since $rs[0][3])";
        push @perfdata => [ 'configuration uptime', $reload_conf_time , 's',
            undef, undef, 0 ];
    }
    else {
        $msg_reload_conf = "";
    };

    # time since last shared memory reinitialization
    if ( check_compat $hosts[0], $PG_VERSION_100 ) {
        $shmem_init_time     = int ( $rs[0][4] );
        $msg_shmem_init_time = "shared memory initialized for "
            .to_interval($shmem_init_time)." (since $rs[0][5])";
        push @perfdata => [ 'shmem init time', $shmem_init_time , 's',
            undef, undef, 0 ];
    }

    # uptime check
    if ( defined $c_limit and $uptime < $c_limit ) {
        push @msg_crit => $msg_uptime;
    }
    elsif ( not defined $c_limit and defined $w_limit and $uptime < $w_limit ) {
        push @msg_warn => $msg_uptime;
    }
    else {
        push @msg => $msg_uptime;
    }

    # shmem init check
    if ( defined $shmem_init_time and defined $c_limit
        and $shmem_init_time < $c_limit
    ) {
        push @msg_crit => $msg_shmem_init_time;
    }
    elsif ( defined $shmem_init_time and not defined $c_limit
        and defined $w_limit and $shmem_init_time < $w_limit
    ) {
        push @msg_warn => $msg_shmem_init_time;
    }
    elsif ( defined $shmem_init_time ) {
        push @msg => $msg_shmem_init_time;
    }

    # reload check
    if ( $reload_conf_flag and defined $c_limit and not defined $w_limit
        and $reload_conf_time < $c_limit
    ) {
        push @msg_crit => $msg_reload_conf;
    }
    elsif ($reload_conf_flag and defined $w_limit
        and $reload_conf_time < $w_limit
    ) {
        push @msg_warn => $msg_reload_conf;
    }
    elsif ( $reload_conf_flag ) {
        push @msg => $msg_reload_conf;
    }

    return status_critical( $me, [ @msg_crit, @msg_warn, @msg ], \@perfdata )
        if @msg_crit;

    return status_warning( $me, [ @msg_warn, @msg ], \@perfdata )
        if @msg_warn;

    return status_ok( $me, \@msg, \@perfdata );
}

=item
B<wal_files> (8.1+)

Check the number of WAL files.

Perfdata returns the total number of WAL files, the current number of written WAL files, the current number of recycled WAL files, the rate of WAL written to disk since the last execution on the master cluster, and the current timeline.

Critical and Warning thresholds accept either a raw number of files or a percentage. In case of percentage, the limit is computed based on:

  100% = 1 + checkpoint_segments * (2 + checkpoint_completion_target)

For PostgreSQL 8.1 and 8.2:

  100% = 1 + checkpoint_segments * 2

If C<wal_keep_segments> is set for 9.0 to 9.4, the limit is the greatest of the following formulas:

  100% = 1 + checkpoint_segments * (2 + checkpoint_completion_target)
  100% = 1 + wal_keep_segments + 2 * checkpoint_segments

For 9.5 to 12, the limit is:

  100% = max_wal_size (as a number of WAL files) + wal_keep_segments (if set)

For 13 and above:

  100% = max_wal_size + wal_keep_size (both as numbers of WAL files)

Required privileges:

  <10: superuser
  v10: an unprivileged user with pg_monitor
  v11+: an unprivileged user with pg_monitor, or with EXECUTE granted on
        function pg_ls_waldir

=cut

sub check_wal_files {
    my $seg_written  = 0;
    my $seg_recycled = 0;
    my $seg_kept     = 0;
    my $num_seg      = 0;
    my $tli;
    my $max_segs;
    my $first_seg;
    my @rs;
    my @perfdata;
    my @msg;
    my @hosts;
    my %args     = %{ $_[0] };
    my $me       = 'POSTGRES_WAL_FILES';
    my $wal_size = hex('ff000000');
    my %queries  = (
        # The logic of these queries is mainly to compute a number of WALs
        # to compare against the current number of WALs (see rules above).
        # Parameters and the units stored in pg_settings have changed often
        # across versions.
$PG_VERSION_130 => q{ WITH wal_settings AS ( SELECT sum(setting::int) filter (WHERE name='max_wal_size') as max_wal_size, -- unit: MB sum(setting::int) filter (WHERE name='wal_segment_size') as wal_segment_size, -- unit: B sum(setting::int) filter (WHERE name='wal_keep_size') as wal_keep_size -- unit: MB FROM pg_settings WHERE name IN ('max_wal_size','wal_segment_size','wal_keep_size') ) SELECT s.name, (wal_keep_size + max_wal_size) / (wal_segment_size/1024^2) AS max_nb_wal, -- unit: nb of WALs CASE WHEN pg_is_in_recovery() THEN NULL ELSE pg_current_wal_lsn() END, floor(wal_keep_size / (wal_segment_size/1024^2)) AS wal_keep_segments, -- unit: nb of WALs (pg_control_checkpoint()).timeline_id AS tli FROM pg_ls_waldir() AS s CROSS JOIN wal_settings WHERE name ~ '^[0-9A-F]{24}$' ORDER BY s.modification DESC, name DESC}, $PG_VERSION_110 => q{ WITH wal_settings AS ( SELECT sum(setting::int) filter (WHERE name='max_wal_size') as max_wal_size, -- unit: MB sum(setting::int) filter (WHERE name='wal_segment_size') as wal_segment_size, -- unit: B sum(setting::int) filter (WHERE name='wal_keep_segments') as wal_keep_segments -- unit: nb of WALs FROM pg_settings WHERE name IN ('max_wal_size','wal_segment_size','wal_keep_segments') ) SELECT s.name, wal_keep_segments + max_wal_size / (wal_segment_size / 1024^2) AS max_nb_wal, --unit: nb of WALs CASE WHEN pg_is_in_recovery() THEN NULL ELSE pg_current_wal_lsn() END, wal_keep_segments, -- unit: nb of WALs (pg_control_checkpoint()).timeline_id AS tli FROM pg_ls_waldir() AS s CROSS JOIN wal_settings WHERE name ~ '^[0-9A-F]{24}$' ORDER BY s.modification DESC, name DESC}, $PG_VERSION_100 => q{ WITH wal_settings AS ( SELECT sum(setting::int) filter (WHERE name='max_wal_size') as max_wal_size, --unit: MB sum(setting::int) filter (WHERE name='wal_segment_size') as wal_segment_size, --usually 2048 (blocks) sum(setting::int) filter (WHERE name='wal_block_size') as wal_block_size, --usually 8192 sum(setting::int) filter (WHERE 
name='wal_keep_segments') as wal_keep_segments -- unit:nb of WALs FROM pg_settings WHERE name IN ('max_wal_size','wal_segment_size','wal_block_size','wal_keep_segments') ) SELECT s.name, wal_keep_segments + (max_wal_size / (wal_block_size * wal_segment_size / 1024^2)) AS max_nb_wal, --unit: nb of WALs CASE WHEN pg_is_in_recovery() THEN NULL ELSE pg_current_wal_lsn() END, wal_keep_segments, (pg_control_checkpoint()).timeline_id AS tli FROM pg_ls_waldir() AS s CROSS JOIN wal_settings WHERE name ~ '^[0-9A-F]{24}$' ORDER BY s.modification DESC, name DESC}, $PG_VERSION_95 => q{ WITH wal_settings AS ( SELECT setting::int + current_setting('wal_keep_segments')::int as max_nb_wal --unit: nb of WALs FROM pg_settings WHERE name = 'max_wal_size' -- unit for max_wal_size: 16MB ) SELECT s.f, max_nb_wal, CASE WHEN pg_is_in_recovery() THEN NULL ELSE pg_current_xlog_location() END, current_setting('wal_keep_segments')::integer, substring(s.f from 1 for 8) AS tli FROM pg_ls_dir('pg_xlog') AS s(f) CROSS JOIN wal_settings WHERE f ~ '^[0-9A-F]{24}$' ORDER BY (pg_stat_file('pg_xlog/'||s,true)).modification DESC, f DESC}, $PG_VERSION_90 => q{ SELECT s.f, greatest( 1 + current_setting('checkpoint_segments')::float4 * (2 + current_setting('checkpoint_completion_target')::float4), 1 + current_setting('wal_keep_segments')::float4 + 2 * current_setting('checkpoint_segments')::float4 ), CASE WHEN pg_is_in_recovery() THEN NULL ELSE pg_current_xlog_location() END, current_setting('wal_keep_segments')::integer, substring(s.f from 1 for 8) AS tli FROM pg_ls_dir('pg_xlog') AS s(f) WHERE f ~ '^[0-9A-F]{24}$' ORDER BY (pg_stat_file('pg_xlog/'||s.f)).modification DESC, f DESC}, $PG_VERSION_83 => q{ SELECT s.f, 1 + ( current_setting('checkpoint_segments')::float4 * ( 2 + current_setting('checkpoint_completion_target')::float4 ) ), pg_current_xlog_location(), NULL, substring(s.f from 1 for 8) AS tli FROM pg_ls_dir('pg_xlog') AS s(f) WHERE f ~ '^[0-9A-F]{24}$' ORDER BY 
(pg_stat_file('pg_xlog/'||s.f)).modification DESC, f DESC}, $PG_VERSION_82 => q{ SELECT s.f, 1 + (current_setting('checkpoint_segments')::integer * 2), pg_current_xlog_location(), NULL, substring(s.f from 1 for 8) AS tli FROM pg_ls_dir('pg_xlog') AS s(f) WHERE f ~ '^[0-9A-F]{24}$' ORDER BY (pg_stat_file('pg_xlog/'||s.f)).modification DESC, f DESC}, $PG_VERSION_81 => q{ SELECT s.f, 1 + (current_setting('checkpoint_segments')::integer * 2), NULL, NULL, substring(s.f from 1 for 8) AS tli FROM pg_ls_dir('pg_xlog') AS s(f) WHERE f ~ '^[0-9A-F]{24}$' ORDER BY (pg_stat_file('pg_xlog/'||s.f)).modification DESC, f DESC} ); if ( defined $args{'warning'} ) { # warning and critical are mandatory. pod2usage( -message => "FATAL: you must specify critical and warning thresholds.", -exitval => 127 ) unless defined $args{'warning'} and defined $args{'critical'} ; # warning and critical must be raw or %. pod2usage( -message => "FATAL: critical and warning thresholds only accept raw numbers or %.", -exitval => 127 ) unless $args{'warning'} =~ m/^([0-9.]+)%?$/ and $args{'critical'} =~ m/^([0-9.]+)%?$/; } @hosts = @{ parse_hosts %args }; pod2usage( -message => 'FATAL: you must give only one host with service "wal_files".', -exitval => 127 ) if @hosts != 1; is_compat $hosts[0], 'wal_files', $PG_VERSION_81 or exit 1; $wal_size = 4294967296 if $hosts[0]{'version_num'} >= $PG_VERSION_93; @rs = @{ query_ver( $hosts[0], %queries ) }; $first_seg = $rs[0][0]; $max_segs = $rs[0][1]; #segments to keep including kept segments $tli = hex($rs[0][4]); foreach my $r (@rs) { $num_seg++; $seg_recycled++ if $r->[0] gt $first_seg; } $seg_written = $num_seg - $seg_recycled; push @perfdata => [ "total_wal", $num_seg, undef ]; push @perfdata => [ "recycled_wal", $seg_recycled ]; push @perfdata => [ "tli", $tli ]; # pay attention to the wal_keep_segment in perfdata if ( $hosts[0]{'version_num'} >= $PG_VERSION_90) { $seg_kept = $rs[0][3]; if ($seg_kept > 0) { # cheat with numbers if the keep_segment was just 
set and the # number of wal doesn't match it yet. if ($seg_kept > $seg_written) { push @perfdata => [ "written_wal", 1 ]; push @perfdata => [ "kept_wal", $seg_written - 1 ]; } else { push @perfdata => [ "written_wal", $seg_written - $seg_kept ]; push @perfdata => [ "kept_wal", $seg_kept ]; } } else { push @perfdata => [ "written_wal", $seg_written ]; push @perfdata => [ "kept_wal", 0 ]; } } else { push @perfdata => [ "written_wal", $seg_written ]; } push @msg => "$num_seg WAL files"; if ( $hosts[0]{'version_num'} >= $PG_VERSION_82 and $rs[0][2] ne '') { my $now = time(); my $curr_lsn = $rs[0][2]; my @prev_lsn = @{ load( $hosts[0], 'last wal files LSN', $args{'status-file'} ) || [] }; $curr_lsn =~ m{^([0-9A-F]+)/([0-9A-F]+)$}; $curr_lsn = ( $wal_size * hex($1) ) + hex($2); unless ( @prev_lsn == 0 or $now == $prev_lsn[0] ) { my $rate = ($curr_lsn - $prev_lsn[1])/($now - $prev_lsn[0]); $rate = int($rate*100+0.5)/100; push @perfdata => [ "wal_rate", $rate, 'Bps' ]; } save $hosts[0], 'last wal files LSN', [ $now, $curr_lsn ], $args{'status-file'}; } if ( defined $args{'warning'} ) { my $w_limit = get_size( $args{'warning'}, $max_segs ); my $c_limit = get_size( $args{'critical'}, $max_segs ); push @{ $perfdata[0] } => ( $w_limit, $c_limit, 1, $max_segs ); return status_critical( $me, \@msg, \@perfdata ) if $num_seg >= $c_limit; return status_warning( $me, \@msg, \@perfdata ) if $num_seg >= $w_limit; } return status_ok( $me, \@msg, \@perfdata ); } # End of SERVICE section in pod doc =pod =back =cut Getopt::Long::Configure('bundling'); GetOptions( \%args, 'checkpoint_segments=i', 'critical|c=s', 'dbexclude=s', 'dbinclude=s', 'debug!', 'detailed!', 'dump-status-file!', 'dump-bin-file:s', 'effective_cache_size=i', 'exclude=s', 'format|F=s', 'global-pattern=s', 'help|?!', 'host|h=s', 'ignore-wal-size!', 'unarchiver=s', 'dbname|d=s', 'dbservice|S=s', 'list|l!', 'maintenance_work_mem=i', 'no_check_autovacuum!', 'no_check_enable!', 'no_check_fsync!', 'no_check_track_counts!', 
'output|o=s', 'path=s', 'pattern=s', 'port|p=s', 'psql|P=s', 'query=s', 'reverse!', 'save!', 'service|s=s', 'shared_buffers=i', 'slave=s', 'status-file=s', 'suffix=s', 'timeout|t=s', 'tmpdir=s', 'type=s', 'username|U=s', 'uid=s', 'version|V!', 'wal_buffers=i', 'warning|w=s', 'work_mem=i' ) or pod2usage( -exitval => 127 ); list_services() if $args{'list'}; version() if $args{'version'}; pod2usage( -verbose => 2 ) if $args{'help'}; dump_status_file( $args{'dump-bin-file'} ) if $args{'dump-status-file'} or defined $args{'dump-bin-file'}; # One service must be given pod2usage( -message => "FATAL: you must specify one service.\n" . " See -s or --service command line option.", -exitval => 127 ) unless defined $args{'service'}; # Check that the given service exists. pod2usage( -message => "FATAL: service $args{'service'} does not exist.\n" . " Use --list to show the available services.", -exitval => 127 ) unless exists $services{ $args{'service'} }; # Make sure the path given as status file is a file, not a folder. 
pod2usage(
    -message => "FATAL: --status-file must be a path to a file, not a directory.\n",
    -exitval => 127
) if -d $args{'status-file'};

# Check we have write permission to the tempdir
pod2usage(
    -message => 'FATAL: the temp directory given or found is not writable.',
    -exitval => 127
) if not -d $args{'tmpdir'} or not -x $args{'tmpdir'};

# Giving thresholds is optional, but when one of critical and warning is
# given, the other becomes mandatory, except for pga_version, minor_version
# and uptime which use only one of them or none
pod2usage(
    -message => 'FATAL: you must provide both warning and critical thresholds.',
    -exitval => 127
) if $args{'service'} !~ m/^(pga_version|minor_version|uptime)$/
    and ( ( defined $args{'critical'} and not defined $args{'warning'} )
        or ( not defined $args{'critical'} and defined $args{'warning'} ));

# Query, type and reverse are only allowed with "custom_query" service
pod2usage(
    -message => 'FATAL: query, type and reverse are only allowed with "custom_query" service.',
    -exitval => 127
) if ( ( defined $args{'query'} or defined $args{'type'} or $args{'reverse'} == 1 )
    and ( $args{'service'} ne 'custom_query' ) );

# Check "configuration" specific args
pod2usage(
    -message => 'FATAL: work_mem, maintenance_work_mem, shared_buffers, wal_buffers, checkpoint_segments, effective_cache_size, no_check_autovacuum, no_check_fsync, no_check_enable, no_check_track_counts are only allowed with "configuration" service.',
    -exitval => 127
) if ( (defined $args{'work_mem'}
        or defined $args{'maintenance_work_mem'}
        or defined $args{'shared_buffers'}
        or defined $args{'wal_buffers'}
        or defined $args{'checkpoint_segments'}
        or defined $args{'effective_cache_size'}
        or $args{'no_check_autovacuum'} == 1
        or $args{'no_check_fsync'} == 1
        or $args{'no_check_enable'} == 1
        or $args{'no_check_track_counts'} == 1)
    and ( $args{'service'} ne 'configuration' ) );

# Check "archive_folder" specific args --ignore-wal-size and --suffix
pod2usage(
    -message => 'FATAL: "ignore-wal-size" and "suffix" are only allowed with "archive_folder"
service.', -exitval => 127 ) if ( $args{'ignore-wal-size'} or $args{'suffix'} ) and $args{'service'} ne 'archive_folder'; # Check "streaming_delta" specific args --slave pod2usage( -message => 'FATAL: "slave" is only allowed with "streaming_delta" service.', -exitval => 127 ) if scalar @{ $args{'slave'} } and $args{'service'} ne 'streaming_delta'; # Check "oldest_xmin" specific args --detailed pod2usage( -message => 'FATAL: "detailed" argument is only allowed with "oldest_xmin" service.', -exitval => 127 ) if scalar $args{'detailed'} and $args{'service'} ne 'oldest_xmin'; # Set psql absolute path unless ($args{'psql'}) { if ( $ENV{PGBINDIR} ) { $args{'psql'} = "$ENV{PGBINDIR}/psql"; } else { $args{'psql'} = 'psql'; } } # Pre-compile given regexp unless (($args{'service'} eq 'pg_dump_backup') or ($args{'service'} eq 'oldest_idlexact')) { $_ = qr/$_/ for @{ $args{'exclude'} } ; $_ = qr/$_/ for @{ $args{'dbinclude'} }; } $_ = qr/$_/ for @{ $args{'dbexclude'} }; # Output format for ( $args{'format'} ) { if ( /^binary$/ ) { $output_fmt = \&bin_output } elsif ( /^debug$/ ) { $output_fmt = \&debug_output } elsif ( /^human$/ ) { $output_fmt = \&human_output } elsif ( /^nagios$/ ) { $output_fmt = \&nagios_output } elsif ( /^nagios_strict$/ ) { $output_fmt = \&nagios_strict_output } elsif ( /^json$/ ) { $output_fmt = \&json_output } elsif ( /^json_strict$/ ) { $output_fmt = \&json_strict_output } else { pod2usage( -message => "FATAL: unrecognized output format \"$_\" (see \"--format\")", -exitval => 127 ); } } if ( $args{'format'} =~ '^json' ) { require JSON::PP; JSON::PP->import; } exit $services{ $args{'service'} }{'sub'}->( \%args ); __END__ =head2 EXAMPLES =over =item Execute service "last_vacuum" on host "host=localhost port=5432": check_pgactivity -h localhost -p 5432 -s last_vacuum -w 30m -c 1h30m =item Execute service "hot_standby_delta" between hosts "service=pg92" and "service=pg92s": check_pgactivity --dbservice pg92,pg92s --service hot_standby_delta -w 32MB -c 
160MB

=item Execute service "streaming_delta" on host "service=pg92" to check its slave "stby1" with the IP address "192.168.1.11":

  check_pgactivity --dbservice pg92 --slave "stby1 192.168.1.11" --service streaming_delta -w 32MB -c 160MB

=item Execute service "hit_ratio" on host "slave" port "5433", excluding databases matching the regexps "idelone" and "(?i:sleep)":

  check_pgactivity -p 5433 -h slave --service hit_ratio --dbexclude idelone --dbexclude "(?i:sleep)" -w 90% -c 80%

=item Execute service "hit_ratio" on host "slave" port "5433", only for databases matching the regexp "importantone":

  check_pgactivity -p 5433 -h slave --service hit_ratio --dbinclude importantone -w 90% -c 80%

=back

=head1 VERSION

check_pgactivity version 2.7, released on Mon Sep 25 2023.

=head1 LICENSING

This program is open source, licensed under the PostgreSQL license. For license terms, see the LICENSE provided with the sources.

=head1 AUTHORS

S<Author: Open PostgreSQL Monitoring Development Group>
S<Copyright: (C) 2012-2023 Open PostgreSQL Monitoring Development Group>

=cut

check_pgactivity-REL2_7/check_pgactivity.spec

%global _tag REL2_7

Name: nagios-plugins-pgactivity
Version: 2.7
Release: 1
Summary: PostgreSQL monitoring plugin for Nagios
License: PostgreSQL
Group: Applications/Databases
Url: https://github.com/OPMDG/check_pgactivity
Source0:
https://github.com/OPMDG/check_pgactivity/archive/%{_tag}.tar.gz BuildArch: noarch BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) Requires: postgresql Requires: nagios-plugins Provides: check_pgactivity = %{version} %description check_pgactivity is a PostgreSQL monitoring plugin for Nagios. It provides many checks and allows the gathering of many performance counters. check_pgactivity is part of Open PostgreSQL Monitoring. %prep %setup -n check_pgactivity-%{_tag} %install install -D -p -m 0755 check_pgactivity %{buildroot}/%{_libdir}/nagios/plugins/check_pgactivity %files %defattr(-,root,root,0755) %{_libdir}/nagios/plugins/check_pgactivity %doc README LICENSE %changelog * Mon Sep 25 2023 Jehan-Guillaume de Rorthais <jgdr@dalibo.com> 2.7-1 - new major release 2.7 * Fri Jul 08 2022 Jehan-Guillaume de Rorthais <jgdr@dalibo.com> 2.6-1 - new major release 2.6 * Tue Nov 24 2020 Jehan-Guillaume de Rorthais <jgdr@dalibo.com> 2.5-1 - new major release 2.5 * Wed Jan 30 2019 Christophe Courtois <christophe.courtois@dalibo.com> 2.4-1 - new major release 2.4 * Mon Nov 13 2017 Thomas Reiss <thomas.reiss@dalibo.com> 2.3-1 - new major release 2.3 * Wed Sep 20 2017 Thomas Reiss <thomas.reiss@dalibo.com> 2.3beta1-1 - update to release 2.3beta1 * Tue Jun 6 2017 Thomas Reiss <thomas.reiss@dalibo.com> 2.2-1 - new release candidate 2.2 * Tue Feb 28 2017 Jehan-Guillaume de Rorthais <jgdr@dalibo.com> 2.2~rc1-1 - new release candidate 2.2~rc1 * Mon Aug 29 2016 Jehan-Guillaume de Rorthais <jgdr@dalibo.com> 2.0-1 - new major release 2.0 * Thu Jan 28 2016 Jehan-Guillaume de Rorthais <jgdr@dalibo.com> 1.25-1 - update to release 1.25 * Tue Jan 05 2016 Jehan-Guillaume de Rorthais <jgdr@dalibo.com> 1.25beta1-1 - update to release 1.25beta1 * Mon Sep 28 2015 Jehan-Guillaume de Rorthais <jgdr@dalibo.com> 1.24-1 - update to release 1.24 * Wed Dec 10 2014 Nicolas Thauvin <nicolas.thauvin@dalibo.com> 1.19-1 - update to release 1.19 * Fri Sep 19 2014 Nicolas Thauvin
<nicolas.thauvin@dalibo.com> 1.15-1 - Initial version

check_pgactivity-REL2_7/contributors

The following contributors helped to bring check_pgactivity: * Jehan-Guillaume de Rorthais * Julien Rouhaud * Thomas Reiss * Ronan Dunklau * Adrien Nayrat * Stefan Fercot * Marc Cousin * Damien Clochard * Flavie Perette * Nicolas Gollet * Nicolas Thauvin * Guillaume Lelarge * Jérémy Marmol * Tobias Brox * Christophe Courtois * Andrey L. (loukash) * macarbiter * Frédéric Yhuel * Julian Vanden Broeck * Benoit Lobréau * Shangzi Xie And many users that reported issues. Thanks to all!
check_pgactivity-REL2_7/t/00-copyright-year.t

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file. # # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use File::Find; use Test::More; # Try to catch all copyright mentions in source code and # fail if the second part of the year is bad.
my @filelist; my $year = (gmtime)[5] + 1900; # Build list of readable files find( sub { # ignore root return if m/^\.+$/; # ignore hidden folders $File::Find::prune = 1 if -d $File::Find::name and m/^\./; push @filelist, $File::Find::name unless m/^\./; }, '.' ); ### Beginning of tests ### foreach my $f (@filelist) { open my $fh, '<', $f; while (<$fh>) { if ( m/(copyright.*?\d+\s*-\s*(\d+).*Open PostgreSQL Monitoring Development Group.*)$/i ) { is($2, $year, "up to date copyright year in $f:$.") or diag("The copyright mention is: $1"); } } close $fh; } ### End of tests ### done_testing;

check_pgactivity-REL2_7/t/01-archive_folder.t

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file.
# # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use lib 't/lib'; use pgNode; use Test::More tests => 35; my $node = pgNode->new('prod'); # declare instance named "prod" my $archive_dir = $node->archive_dir; my $wal; my $time; # create the instance and start it $node->init(has_archiving => 1); if ( $node->version >= 9.6 ) { $node->append_conf('postgresql.conf', "wal_level = replica"); } elsif ( $node->version >= 9.0 ) { $node->append_conf('postgresql.conf', "wal_level = archive"); } $node->start; # generate three archives # split create table and insert to produce more data in WAL $node->psql('template1', 'create table t (i int primary key)'); $node->psql('template1', 'insert into t select generate_series(1,10000) as i'); $node->switch_wal; $node->psql('template1', 'insert into t select generate_series(10001,20000) as i'); $node->switch_wal; $node->psql('template1', 'insert into t select generate_series(20001,30000) as i'); $wal = $node->switch_wal; # The WAL sequence starts at 000000010000000000000000 up to v8.4, then # 000000010000000000000001 starting from v9.0. # Make sure we have the exact same archive sequence whatever the version so # following tests apply no matter the version. 
if ($node->version < 9.0) { $node->psql('template1', 'insert into t select generate_series(30001,40000) as i'); $wal = $node->switch_wal; unlink "$archive_dir/000000010000000000000000"; } $node->wait_for_archive($wal); ### Beginning of tests ### # simple success check $node->command_checks_all( [ './check_pgactivity', '--service' => 'archive_folder', '--username' => getlogin, '--warning' => '5m', '--critical' => '10m', '--path' => $archive_dir, '--format' => 'human' ], 0, [ qr/^Service *: POSTGRES_ARCHIVES$/m, qr/^Returns *: 0 \(OK\)$/m, qr{^Message *: 3 WAL archived in '$archive_dir'}m, qr/^Perfdata *: num_archives=3$/m, qr/^Perfdata *: latest_archive_age=\d+s warn=300 crit=600$/m ], [ qr/^$/ ], 'simple archives check' ); # test hole in the sequence rename "$archive_dir/000000010000000000000002", "$archive_dir/000000010000000000000002.bak"; $node->command_checks_all( [ './check_pgactivity', '--service' => 'archive_folder', '--username' => getlogin, '--warning' => '5m', '--critical' => '10m', '--path' => $archive_dir, '--format' => 'human' ], 2, [ qr/^Service *: POSTGRES_ARCHIVES$/m, qr/^Returns *: 2 \(CRITICAL\)$/m, qr{^Message *: Wrong sequence or file missing @ '000000010000000000000002'}m, qr/^Perfdata *: num_archives=2$/m, qr/^Perfdata *: latest_archive_age=\d+s warn=300 crit=600$/m ], [ qr/^$/ ], 'error missing one archive' ); rename "$archive_dir/000000010000000000000002.bak", "$archive_dir/000000010000000000000002"; # test warning archive too old $time = time - 360; utime $time, $time, "$archive_dir/000000010000000000000001", "$archive_dir/000000010000000000000002", "$archive_dir/000000010000000000000003"; $node->command_checks_all( [ './check_pgactivity', '--service' => 'archive_folder', '--username' => getlogin, '--warning' => '5m', '--critical' => '10m', '--path' => $archive_dir, '--format' => 'human' ], 1, [ qr/^Service *: POSTGRES_ARCHIVES$/m, qr/^Returns *: 1 \(WARNING\)$/m, qr{^Message *: 3 WAL archived in '$archive_dir'}m, qr/^Perfdata *: 
num_archives=3$/m, qr/^Perfdata *: latest_archive_age=36\ds warn=300 crit=600$/m ], [ qr/^$/ ], 'warn archive too old' ); # test critical archive too old $node->command_checks_all( [ './check_pgactivity', '--service' => 'archive_folder', '--username' => getlogin, '--warning' => '2m', '--critical' => '5m', '--path' => $archive_dir, '--format' => 'human' ], 2, [ qr/^Service *: POSTGRES_ARCHIVES$/m, qr/^Returns *: 2 \(CRITICAL\)$/m, qr{^Message *: 3 WAL archived in '$archive_dir'}m, qr/^Perfdata *: num_archives=3$/m, qr/^Perfdata *: latest_archive_age=36\ds warn=120 crit=300$/m ], [ qr/^$/ ], 'critical archive too old' ); # wrong sequence order # setting 02 older than 01, the check gathers archives in this mtime order: # 02 01 03 # because of this, after checking 02's validity, it expects 03 to be the # next file but finds 01 and warns that 03 was expected. $time = time - 400; utime $time, $time, "$archive_dir/000000010000000000000002"; $node->command_checks_all( [ './check_pgactivity', '--service' => 'archive_folder', '--username' => getlogin, '--warning' => '10m', '--critical' => '15m', '--path' => $archive_dir, '--format' => 'human' ], 2, [ qr/^Service *: POSTGRES_ARCHIVES$/m, qr/^Returns *: 2 \(CRITICAL\)$/m, qr{^Message *: Wrong sequence or file missing @ '000000010000000000000003}m, qr/^Perfdata *: num_archives=3$/m, qr/^Perfdata *: latest_archive_age=36\ds warn=600 crit=900$/m ], [ qr/^$/ ], 'wrong sequence order' ); ### End of tests ### $node->stop( 'immediate' );
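The archive_folder test above relies on WAL segment names forming a gapless, fixed-width hexadecimal sequence: a hole in the sorted archive list triggers the "Wrong sequence or file missing" critical. The following is a minimal sketch of that idea, in Python purely for illustration; `find_sequence_gap` is a hypothetical name and this is not the plugin's actual Perl code. It also simplifies real WAL naming by treating the last 16 hex digits as a single counter, ignoring timeline switches and log/segment boundary handling.

```python
def find_sequence_gap(archives):
    """Return the first expected-but-missing WAL name, or None.

    Simplified model: a WAL name is 8 hex digits of timeline followed
    by 16 hex digits treated here as one incrementing segment counter.
    """
    names = sorted(archives)
    for cur, nxt in zip(names, names[1:]):
        tli, seg = cur[:8], int(cur[8:], 16)
        expected = tli + format(seg + 1, "016X")
        if nxt != expected:
            return expected
    return None
```

With the three archives produced by the test above, this returns None; remove 000000010000000000000002 and it returns that name, mirroring the critical raised by the service.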
check_pgactivity-REL2_7/t/01-archiver.t

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file. # # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use lib 't/lib'; use TestLib (); use pgNode; use Test::More; my $node = pgNode->get_new_node('prod'); my $pga_data = "$TestLib::tmp_check/pga.data"; my $wal; my @stdout; my $num_tests = 25; $node->init(has_archiving => 1); if ( $node->version >= 9.6 ) { $node->append_conf('postgresql.conf', "wal_level = replica"); } elsif ( $node->version >= 9.0 ) { $node->append_conf('postgresql.conf', "wal_level = archive"); } $node->start; $num_tests-- if $node->version < 11; plan tests => $num_tests; ### Beginning of tests ### # generate one archive # split create table and insert to produce more data in WAL $node->psql('template1', 'create table t (i int primary key)'); $node->psql('template1', 'insert into t select generate_series(1,10000) as i'); $wal = $node->switch_wal; # The WAL sequence starts at 000000010000000000000000 up to v8.4, then # 000000010000000000000001 starting from v9.0. # Make sure we have the exact same archive sequence whatever the version so # following tests apply no matter the version.
if ($node->version < 9.0) { $node->psql('template1', 'insert into t select generate_series(-1000,0) as i'); $wal = $node->switch_wal; } # FIXME: there's a race condition in the archiver check when it gets the mtime # of the next WAL to archive while it hasn't been created yet. # Write a checkpoint to force the creation of the new WAL. $node->psql('template1', 'checkpoint'); $node->wait_for_archive($wal); $node->command_checks_all( [ './check_pgactivity', '--service' => 'archiver', '--username' => getlogin, '--status-file' => $pga_data, '--format' => 'human' ], 0, [ qr/^Service *: POSTGRES_ARCHIVER$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: 0 WAL files ready to archive$/m, qr/^Perfdata *: ready_archive=0 min=0$/m, qr/^Perfdata *: oldest_ready_wal=0s min=0$/m ], [ qr/^$/ ], 'basic check without thresholds with superuser' ); # archiver failing $node->append_conf('postgresql.conf', "archive_command = 'false'"); $node->reload; $node->psql('template1', 'insert into t select generate_series(10001,20000) as i'); $wal = $node->switch_wal; # avoid same race condition $node->psql('template1', 'checkpoint'); # FIXME: arbitrary sleep time to wait for the archiver to fail at least once sleep 1; # for 9.6 and before, the alert is raised on second call.
TestLib::system_or_bail('./check_pgactivity', '--service' => 'archiver', '--username' => getlogin, '--host' => $node->host, '--port' => $node->port, '--status-file' => $pga_data, '--format' => 'human' ) if $node->version < 10; @stdout = ( qr/^Service *: POSTGRES_ARCHIVER$/m, qr/^Returns *: 2 \(CRITICAL\)$/m, qr/^Message *: 1 WAL files ready to archive$/m, qr/^Message *: archiver failing on 000000010000000000000002$/m, qr/^Long message *: 000000010000000000000002 not archived since (?:\ds|last check)$/m, qr/^Perfdata *: ready_archive=1 min=0$/m, qr/^Perfdata *: oldest_ready_wal=\ds min=0$/m ); $node->command_checks_all( [ './check_pgactivity', '--service' => 'archiver', '--username' => getlogin, '--status-file' => $pga_data, '--format' => 'human' ], 2, \@stdout, [ qr/^$/ ], 'failing archiver with superuser' ); # For PostgreSQL 10+, we now create a non-superuser monitoring role SKIP: { skip "checking with non superuser role is not supported before v10", 8 if $node->version < 10; $node->psql('postgres', 'create role check_pga login'); $node->psql('postgres', 'grant pg_monitor to check_pga'); $node->psql('postgres', 'grant execute on function pg_catalog.pg_stat_file(text) to check_pga'); # With pg10, the perfdata oldest_ready_wal cannot be computed, thus is not # present in the perfdata. @stdout = ( qr/^Service *: POSTGRES_ARCHIVER$/m, qr/^Returns *: 2 \(CRITICAL\)$/m, qr/^Message *: 1 WAL files ready to archive$/m, qr/^Message *: archiver failing on 000000010000000000000002$/m, qr/^Long message *: 000000010000000000000002 not archived since (?:\ds|last check)$/m, qr/^Perfdata *: ready_archive=1 min=0$/m, ); # For pg11+, oldest_ready_wal is always present. 
push @stdout, ( qr/^Perfdata *: oldest_ready_wal=\ds min=0$/m ) unless $node->version < 11; $node->command_checks_all( [ './check_pgactivity', '--service' => 'archiver', '--username' => 'check_pga', '--status-file' => $pga_data, '--format' => 'human' ], 2, \@stdout, [ qr/^$/ ], 'failing archiver with non-superuser' ); } ### End of tests ### $node->stop( 'immediate' );

check_pgactivity-REL2_7/t/01-autovacuum.t

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file.
# # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use lib 't/lib'; use pgNode; use Test::More; my $node = pgNode->get_new_node('prod'); my $num_tests = 12; my $wal; # we have $num_tests normal tests + three tests for incompatible pg versions plan tests => $num_tests + 3; ### Beginning of tests ### $node->init; # Tests for PostgreSQL 8.0 and before SKIP: { skip "testing incompatibility with PostgreSQL 8.0 and before", 3 unless $node->version <= 8.0; $node->start; $node->command_checks_all( [ './check_pgactivity', '--service' => 'autovacuum', '--username' => getlogin, '--format' => 'human' ], 1, [ qr/^$/ ], [ qr/^Service autovacuum is not compatible with host/ ], 'non compatible PostgreSQL version' ); } # Tests for PostgreSQL 8.1 and after SKIP: { my @stdout; skip "these tests require PostgreSQL 8.1 and after", $num_tests unless $node->version >= 8.1; if ($node->version < 8.3) { $node->append_conf('postgresql.conf', qq{autovacuum = on\n} .qq{stats_row_level = on} ); } $node->start; @stdout = ( qr/^Service *: POSTGRES_AUTOVACUUM$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: Number of autovacuum: [0-3]$/m, qr/^Perfdata *: VACUUM_FREEZE=[0-3]$/m, qr/^Perfdata *: VACUUM_ANALYZE=[0-3]$/m, qr/^Perfdata *: VACUUM=[0-3]$/m, qr/^Perfdata *: ANALYZE=[0-3]$/m, qr/^Perfdata *: oldest_autovacuum=(NaN|\d+)s$/m, ); SKIP: { skip "No max_worker before PgSQL 8.3", 1 if $node->version < 8.3; push @stdout, qr/^Perfdata *: max_workers=3$/m; } SKIP: { skip "No autovacuum brin summarize before PgSQL 10", 1 if $node->version < 10; push @stdout, qr/^Perfdata *: BRIN_SUMMARIZE=[0-3]$/m; } $node->command_checks_all( [ './check_pgactivity', '--service' => 'autovacuum', '--username' => getlogin, '--format' => 'human' ], 0, \@stdout, [ qr/^$/ ], 'basic check without thresholds' ); $node->stop( 'immediate' ); } ### End of tests ###
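The autovacuum service tested above reports one counter per kind of autovacuum activity (VACUUM, VACUUM_FREEZE, VACUUM_ANALYZE, ANALYZE and, from PostgreSQL 10, BRIN_SUMMARIZE), derived from the query column shown for autovacuum workers in pg_stat_activity. A rough sketch of that classification follows, in Python for illustration only; `classify_autovacuum` is a hypothetical name, the exact query strings are assumptions based on common pg_stat_activity output, and the real parsing lives in the plugin's Perl code.

```python
import re

def classify_autovacuum(query):
    """Map an autovacuum worker's query string to a perfdata counter name."""
    if query.startswith("autovacuum: BRIN summarize"):
        return "BRIN_SUMMARIZE"
    # Longest alternative first, so "VACUUM ANALYZE" wins over "VACUUM".
    m = re.match(r"autovacuum: (VACUUM ANALYZE|VACUUM|ANALYZE) \S+(.*)", query)
    if m is None:
        return None
    if "(to prevent wraparound)" in m.group(2):
        return "VACUUM_FREEZE"
    return m.group(1).replace(" ", "_")
```

For example, "autovacuum: VACUUM public.t (to prevent wraparound)" would be counted as VACUUM_FREEZE, matching the counters the test checks in its perfdata regexes.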
check_pgactivity-REL2_7/t/01-backends.t

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file. # # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use lib 't/lib'; use pgNode; use pgSession; use TestLib (); use IPC::Run (); use Test::More tests => 34; my $node = pgNode->get_new_node('prod'); my @timer; my @in; my @out; my @procs; $node->init; $node->append_conf('postgresql.conf', 'max_connections=8'); $node->start; ### Beginning of tests ### # failing without thresholds $node->command_checks_all( [ './check_pgactivity', '--service' => 'backends', '--username' => getlogin, '--format' => 'human' ], 127, [ qr/^$/ ], [ qr/^FATAL: you must specify critical and warning thresholds.$/m ], 'failing without thresholds' ); # basic check $node->command_checks_all( [ './check_pgactivity', '--service' => 'backends', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => '4', '--critical' => '5' ], 0, [ qr/^Service *: POSTGRES_BACKENDS$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: 1 connections on 5$/m, qr/^Perfdata *: template1=1 warn=4 crit=5 min=0 max=5$/m, qr/^Perfdata *: maximum_connections=5 min=0 max=5$/m ], [ qr/^$/ ], 'basic check' ); # two sessions on two different db TestLib::system_or_bail('createdb', '--host' => $node->host, '--port' => $node->port, 'testdb' ); push @procs, pgSession->new($node,
'testdb'); $procs[0]->query('select pg_sleep(60)', 60); # wait for backend to be connected and active $node->poll_query_until('template1', q{ SELECT query_start < now() FROM pg_catalog.pg_stat_activity WHERE datname = 'testdb' LIMIT 1 }); $node->command_checks_all( [ './check_pgactivity', '--service' => 'backends', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => '4', '--critical' => '5' ], 0, [ qr/^Service *: POSTGRES_BACKENDS$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: 2 connections on 5$/m, qr/^Perfdata *: template1=1 warn=4 crit=5 min=0 max=5$/m, qr/^Perfdata *: testdb=1 warn=4 crit=5 min=0 max=5$/m, qr/^Perfdata *: maximum_connections=5 min=0 max=5$/m ], [ qr/^$/ ], 'two sessions' ); # add two new backends and test warning push( @procs, pgSession->new($node) ) for 1..2; $node->command_checks_all( [ './check_pgactivity', '--service' => 'backends', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => '4', '--critical' => '5' ], 1, [ qr/^Service *: POSTGRES_BACKENDS$/m, qr/^Returns *: 1 \(WARNING\)$/m, qr/^Message *: 4 connections on 5$/m, qr/^Perfdata *: template1=3 warn=4 crit=5 min=0 max=5$/m, qr/^Perfdata *: testdb=1 warn=4 crit=5 min=0 max=5$/m, qr/^Perfdata *: maximum_connections=5 min=0 max=5$/m ], [ qr/^$/ ], 'warning with four sessions' ); # add a new backends and test critical push @procs, pgSession->new($node); $node->command_checks_all( [ './check_pgactivity', '--service' => 'backends', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => '4', '--critical' => '5' ], 2, [ qr/^Service *: POSTGRES_BACKENDS$/m, qr/^Returns *: 2 \(CRITICAL\)$/m, qr/^Message *: 5 connections on 5$/m, qr/^Perfdata *: template1=4 warn=4 crit=5 min=0 max=5$/m, qr/^Perfdata *: testdb=1 warn=4 crit=5 min=0 max=5$/m, qr/^Perfdata *: maximum_connections=5 min=0 max=5$/m ], [ qr/^$/ ], 'critical with five sessions' ); ### End of tests ### # stop immediate to kill 
any remaining backends $node->stop('immediate');

check_pgactivity-REL2_7/t/01-backends_status.t

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file. # # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use lib 't/lib'; use pgNode; use pgSession; use TestLib (); use IPC::Run (); use Test::More; my $node = pgNode->get_new_node('prod'); my @timer; my @in; my @out; my @procs; my @stdout; $node->init; $node->append_conf('postgresql.conf', 'max_connections=8'); $node->start; ### Beginning of tests ### # basic check without thresholds $node->command_checks_all( [ './check_pgactivity', '--service' => 'backends_status', '--username' => getlogin, '--format' => 'human' ], 0, [ qr/^Service *: POSTGRES_BACKENDS_STATUS$/m, qr/^Returns *: 0 \(OK\)$/m, ], [ qr/^$/ ], 'OK without thresholds' ); # basic check with thresholds $node->command_checks_all( [ './check_pgactivity', '--service' => 'backends_status', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => 'idle_xact=4s', '--critical' => 'idle_xact=5s' ], 0, [ qr/^Service *: POSTGRES_BACKENDS_STATUS$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: 1 backend connected$/m, qr/^Perfdata *: active=1$/m,
qr/^Perfdata *: oldest active=0s$/m, qr/^Perfdata *: disabled=0$/m, qr/^Perfdata *: fastpath function call=0$/m, qr/^Perfdata *: oldest fastpath function call=0s$/m, qr/^Perfdata *: idle=0$/m, qr/^Perfdata *: oldest idle=0s$/m, qr/^Perfdata *: idle in transaction=0$/m, qr/^Perfdata *: oldest idle in transaction=0s warn=4s crit=5s min=\d max=\d$/m, qr/^Perfdata *: idle in transaction \(aborted\)=0$/m, qr/^Perfdata *: oldest idle in transaction \(aborted\)=0s$/m, qr/^Perfdata *: insufficient privilege=0$/m, qr/^Perfdata *: undefined=0$/m, qr/^Perfdata *: waiting for lock=0$/m, qr/^Perfdata *: oldest waiting for lock=0s$/m, ], [ qr/^$/ ], 'basic check with threshold and check presence of all perfdata' ); # two sessions on two different db TestLib::system_or_bail('createdb', '--host' => $node->host, '--port' => $node->port, 'testdb' ); push @procs, pgSession->new($node, 'testdb') for 1..3; $procs[0]->query('select pg_sleep(60)', 60); $procs[1]->query('BEGIN',0); # wait for backend to be connected and active $node->poll_query_until('template1', q{ SELECT query_start IS NOT NULL -- < now() FROM pg_catalog.pg_stat_activity WHERE datname = 'testdb' LIMIT 1 }); $node->command_checks_all( [ './check_pgactivity', '--service' => 'backends_status', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => 'active=3', '--critical' => 'active=4' ], 0, [ qr/^Service *: POSTGRES_BACKENDS_STATUS$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: 4 backend connected$/m, qr/^Perfdata *: idle=1$/m, qr/^Perfdata *: idle in transaction=1$/m, qr/^Perfdata *: active=2 warn=3 crit=4 min=\d max=\d$/m, ], [ qr/^$/ ], 'three sessions, one active, one idlexact, one idle, OK' ); $node->command_checks_all( [ './check_pgactivity', '--service' => 'backends_status', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => 'active=1', '--critical' => 'active=2' ], 2, [ qr/^Service *: POSTGRES_BACKENDS_STATUS$/m, qr/^Returns *: 2 
\(CRITICAL\)$/m, qr/^Message *: 2 active$/m, qr/^Perfdata *: idle=1$/m, qr/^Perfdata *: idle in transaction=1$/m, qr/^Perfdata *: active=2 warn=1 crit=2 min=\d max=\d$/m, ], [ qr/^$/ ], 'three sessions, one active, one idlexact, one idle, Critical' ); ### End of tests ### # stop immediate to kill any remaining backends $node->stop('immediate'); done_testing();

check_pgactivity-REL2_7/t/01-backup_label_age.t

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file.
# # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use lib 't/lib'; use pgNode; use pgSession; use TestLib (); use IPC::Run (); use Test::More; use PostgresVersion; my $node = pgNode->get_new_node('prod'); my @timer; my @in; my @out; my @procs; my $pgversion; $node->init; $node->start; $pgversion = PostgresVersion->new($node->version()); $node->stop('immediate'); ### Settings according to PostgreSQL version if ($pgversion ge "9.6") { $node->append_conf('postgresql.conf', 'wal_level=replica'); } elsif ($pgversion ge "9.0") { $node->append_conf('postgresql.conf', 'wal_level=archive'); } $node->append_conf('postgresql.conf', 'archive_mode=on'); $node->append_conf('postgresql.conf', 'archive_command=\'/bin/true\''); $node->start; ### Beginning of tests ### # failing without thresholds $node->command_checks_all( [ './check_pgactivity', '--service' => 'backup_label_age', '--username' => getlogin, '--format' => 'human' ], 127, [ qr/^$/ ], [ qr/^FATAL: you must specify critical and warning thresholds.$/m ], 'failing without thresholds' ); # first check, with no backup being performed $node->command_checks_all( [ './check_pgactivity', '--service' => 'backup_label_age', '--username' => getlogin, '--format' => 'human', '--warning' => '5s', '--critical' => '10s' ], 0, [ qr/^Service *: POSTGRES_BACKUP_LABEL_AGE$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: backup_label file absent$/m, qr/^Perfdata *: age=0s warn=5 crit=10$/m, ], [ qr/^$/ ], 'basic check' ); # The following test cases are only valid for pg<15. # Since exclusive backups were removed in pg15, we ignore this part # starting with this release.
if ($pgversion ge 15) { # stop immediate to kill any remaining backends $node->stop('immediate'); done_testing(); exit 0; } push @procs, pgSession->new($node, 'postgres'); $procs[0]->query('SELECT pg_start_backup(\'check_pga\')'); sleep 1; # second check, while an exclusive backup is running $node->command_checks_all( [ './check_pgactivity', '--service' => 'backup_label_age', '--username' => getlogin, '--format' => 'human', '--warning' => '5s', '--critical' => '10s' ], 0, [ qr/^Service *: POSTGRES_BACKUP_LABEL_AGE$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: backup_label file present \(age: \ds\)$/m, qr/^Perfdata *: age=\ds warn=5 crit=10$/m, ], [ qr/^$/ ], 'basic check with exclusive backup' ); sleep 3; $node->command_checks_all( [ './check_pgactivity', '--service' => 'backup_label_age', '--username' => getlogin, '--format' => 'human', '--warning' => '2s', '--critical' => '10s' ], 1, [ qr/^Service *: POSTGRES_BACKUP_LABEL_AGE$/m, qr/^Returns *: 1 \(WARNING\)$/m, qr/^Message *: age: \ds$/m, qr/^Perfdata *: age=\ds warn=2 crit=10$/m, ], [ qr/^$/ ], 'warn with exclusive backup' ); $procs[0]->query('SELECT pg_stop_backup()'); # stop immediate to kill any remaining backends $node->stop('immediate'); done_testing();

check_pgactivity-REL2_7/t/01-connection.t

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file.
# # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use lib 't/lib'; use pgNode; use Test::More tests => 8; my $node = pgNode->new('prod'); # declare instance named "prod" # create the instance and start it $node->init; $node->start; ### Beginning of tests ### # This command sets PGHOST and PGPORT, then calls and tests the given command $node->command_checks_all( [ # command to run './check_pgactivity', '--service' => 'connection', '--username' => getlogin ], # expected return code 0, # array of regex matching expected standard output [ qr/^POSTGRES_CONNECTION OK: Connection successful at [-+:\. \d]+, on PostgreSQL [\d\.]+.*$/ ], # array of regex matching expected error output [ qr/^$/ ], # a name for this test 'connection successful' ); # Failing to connect # TODO: should stdout only report the user message and stderr the psql error? $node->command_checks_all( [ './check_pgactivity', '--service' => 'connection', '--port' => $node->port -1, # wrong port '--username' => getlogin ], 2, [ qr/^CHECK_PGACTIVITY CRITICAL: Query failed !$/m, # v12 and after adds " error:" in output qr/^psql:(?: error:)?
(connection to server .* failed|could not connect to server):/m, qr/^\s*Is the server running locally and accepting/m ], [ qr/^$/ ], 'connection failing' ); ### End of tests ### $node->stop( 'immediate' );

# ===== check_pgactivity-REL2_7/t/01-hit_ratio.t =====

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file. # # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use lib 't/lib'; use pgNode; use TestLib (); use Test::More tests => 20; my $node = pgNode->get_new_node('prod'); my $pga_data = "$TestLib::tmp_check/pga.data"; $node->init; $node->append_conf('postgresql.conf', 'stats_block_level = on') if $node->version < 8.3; $node->start; ### Beginning of tests ### # Check thresholds only accept percentages $node->command_checks_all( [ './check_pgactivity', '--service' => 'hit_ratio', '--username' => getlogin, '--format' => 'human', '--status-file' => $pga_data, '--warning' => '101', '--critical' => '0' ], 127, [ qr/^$/ ], [ qr/^FATAL: critical and warning thresholds only accept percentages.$/m ], 'Check percentage thresholds' ); # First check.
Returns no perfdata # Even with ridiculous threshold, no alert is possible during the first call $node->command_checks_all( [ './check_pgactivity', '--service' => 'hit_ratio', '--username' => getlogin, '--format' => 'human', '--status-file' => $pga_data, '--warning' => '101%', '--critical' => '0%' ], 0, [ qr/^Service *: POSTGRES_HIT_RATIO$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: 1 database\(s\) checked$/m, ], [ qr/^$/ ], 'first basic check' ); # The hit ratio is computed relatively to the previous check. # We need to wait at least 1 second to avoid a NaN as a ratio sleep 1; # Ridiculous thresholds to trigger a warning $node->command_checks_all( [ './check_pgactivity', '--service' => 'hit_ratio', '--username' => getlogin, '--format' => 'human', '--status-file' => $pga_data, '--warning' => '101%', '--critical' => '0%' ], 1, [ qr/^Service *: POSTGRES_HIT_RATIO$/m, qr/^Returns *: 1 \(WARNING\)$/m, qr/^Message *: postgres: [\d.]+%$/m, qr/^Perfdata *: postgres=[\d.]+% warn=101 crit=0$/m, ], [ qr/^$/ ], 'Warning check' ); # Ridiculous thresholds to trigger a critical $node->command_checks_all( [ './check_pgactivity', '--service' => 'hit_ratio', '--username' => getlogin, '--format' => 'human', '--status-file' => $pga_data, '--warning' => '110%', '--critical' => '101%' ], 2, [ qr/^Service *: POSTGRES_HIT_RATIO$/m, qr/^Returns *: 2 \(CRITICAL\)$/m, qr/^Message *: postgres: [\d.]+%$/m, qr/^Perfdata *: postgres=[\d.]+% warn=110 crit=101$/m, ], [ qr/^$/ ], 'Critical check' ); ### End of tests ### # stop immediate to kill any remaining backends $node->stop('immediate'); 
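The hit_ratio checks above rely on deltas between two status-file snapshots, which is why the first call can only return OK and why two calls within the same second are avoided. A minimal sketch of that ratio computation, assuming illustrative snapshot hashes whose keys mirror the `pg_stat_database` counters (this is not check_pgactivity's actual status-file layout):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cache hit ratio between two snapshots of pg_stat_database counters.
# Returns undef when there was no block activity at all (the check
# would report a NaN-like value in that case).
sub hit_ratio {
    my ($prev, $curr) = @_;
    my $hit  = $curr->{blks_hit}  - $prev->{blks_hit};
    my $read = $curr->{blks_read} - $prev->{blks_read};
    return undef if $hit + $read == 0;
    return 100 * $hit / ($hit + $read);
}

my $ratio = hit_ratio(
    { blks_hit => 1_000, blks_read => 100 },   # previous check
    { blks_hit => 1_900, blks_read => 200 },   # current check
);
printf "postgres=%.2f%%\n", $ratio;   # postgres=90.00%
```

With 900 new buffer hits against 100 new disk reads between the two checks, the service would report a 90% hit ratio for that database.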
# ===== check_pgactivity-REL2_7/t/01-idle_xact.t =====

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file. # # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use lib 't/lib'; use pgNode; use pgSession; use Test::More tests => 40; my $node = pgNode->get_new_node('prod'); my $proc; $node->init; $node->start; ### Beginning of tests ### # failing without thresholds $node->command_checks_all( [ './check_pgactivity', '--service' => 'oldest_idlexact', '--username' => getlogin, '--format' => 'human' ], 127, [ qr/^$/ ], [ qr/^FATAL: you must specify critical and warning thresholds.$/m ], 'failing without thresholds' ); # Tests for PostgreSQL 8.2 and before SKIP: { skip "testing incompatibility with PostgreSQL 8.2 and before", 3 if $node->version >= 8.3; $node->command_checks_all( [ './check_pgactivity', '--service' => 'oldest_idlexact', '--username' => getlogin, '--format' => 'human', '--warning' => '30s', '--critical' => '1h' ], 1, [ qr/^$/ ], [ qr/^Service oldest_idlexact is not compatible with host/ ], 'non compatible PostgreSQL version' ); } SKIP: { skip "incompatible tests with PostgreSQL < 8.3", 34 if
$node->version < 8.3; # basic check $node->command_checks_all( [ './check_pgactivity', '--service' => 'oldest_idlexact', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => '30s', '--critical' => '1h' ], 0, [ qr/^Service *: POSTGRES_OLDEST_IDLEXACT$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: 0 idle transaction\(s\)$/m, qr/^Perfdata *: template1 # idle xact=0$/m ], [ qr/^$/ ], 'basic check' ); # unit check $node->command_checks_all( [ './check_pgactivity', '--service' => 'oldest_idlexact', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => '30m', '--critical' => '1h' ], 0, [ qr/^Service *: POSTGRES_OLDEST_IDLEXACT$/m, qr/^Perfdata : template1 avg=NaNs warn=1800 crit=3600$/m, ], [ qr/^$/ ], 'unit check' ); $proc = pgSession->new( $node, 'postgres' ); $proc->query( 'BEGIN', 1 ); $proc->query( 'SELECT txid_current()', 1 ); # OK check $node->command_checks_all( [ './check_pgactivity', '--service' => 'oldest_idlexact', '--username' => getlogin, '--format' => 'human', '--dbname' => 'postgres', '--warning' => '3s', '--critical' => '1h' ], 0, [ qr/^Service *: POSTGRES_OLDEST_IDLEXACT$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: 1 idle transaction\(s\)$/m, qr/^Perfdata *: postgres # idle xact=1$/m ], [ qr/^$/ ], 'OK check' ); # wait for transaction to be idle for more than 3 seconds $node->poll_query_until( 'template1', q{ SELECT current_timestamp - xact_start > interval '3s' FROM pg_catalog.pg_stat_activity WHERE datname = 'postgres' AND xact_start IS NOT NULL LIMIT 1 }); # warning check $node->command_checks_all( [ './check_pgactivity', '--service' => 'oldest_idlexact', '--username' => getlogin, '--format' => 'human', '--dbname' => 'postgres', '--warning' => '2s', '--critical' => '1h' ], 1, [ qr/^Service *: POSTGRES_OLDEST_IDLEXACT$/m, qr/^Returns *: 1 \(WARNING\)$/m, qr/^Message *: 1 idle transaction\(s\)$/m, qr/^Perfdata *: postgres # idle xact=1$/m ], [ qr/^$/ ], 'warning check' ); # 
critical check $node->command_checks_all( [ './check_pgactivity', '--service' => 'oldest_idlexact', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => '1s', '--critical' => '2s' ], 2, [ qr/^Service *: POSTGRES_OLDEST_IDLEXACT$/m, qr/^Returns *: 2 \(CRITICAL\)$/m, qr/^Message *: 1 idle transaction\(s\)$/m, qr/^Perfdata *: postgres # idle xact=1$/m ], [ qr/^$/ ], 'critical check' ); SKIP: { skip "active xact are not detected before 9.2", 6 if $node->version < 9.2; # Emit one query and check that check_pga does not emit a warning or critical $proc->query( 'SELECT count(*) FROM pg_class', 1 ); # active transaction check $node->command_checks_all( [ './check_pgactivity', '--service' => 'oldest_idlexact', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => '2s', '--critical' => '1h' ], 0, [ qr/^Service *: POSTGRES_OLDEST_IDLEXACT$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: 1 idle transaction\(s\)$/m, qr/^Perfdata *: postgres # idle xact=1$/m ], [ qr/^$/ ], 'active transaction check' ); } } ### End of tests ### # stop immediate to kill any remaining backends $node->stop( 'immediate' ); 
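The unit check above expects the human-readable thresholds '30m' and '1h' to surface as warn=1800 and crit=3600 in the perfdata. A minimal sketch of that kind of interval-to-seconds conversion (check_pgactivity's real parser accepts more units and compound forms; this helper and its name are illustrative only):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a human-readable interval ('30s', '30m', '1h', '10d') to seconds.
sub interval_to_seconds {
    my ($interval) = @_;
    my %mult = ( s => 1, m => 60, h => 3600, d => 86_400 );
    $interval =~ /\A(\d+)\s*([smhd])\z/i
        or die "invalid interval: $interval";
    return $1 * $mult{ lc $2 };
}

print interval_to_seconds('30m'), "\n";   # 1800
print interval_to_seconds('1h'), "\n";    # 3600
```

This is why a `--warning` of '30m' shows up in the human-format output as the raw value 1800.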
# ===== check_pgactivity-REL2_7/t/01-last_analyze.t =====

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file. # # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use lib 't/lib'; use pgNode; use TestLib (); use Test::More tests => 33; my $node = pgNode->get_new_node('prod'); my $pga_data = "$TestLib::tmp_check/pga.data"; my $stdout; my @stdout; $node->init; $node->append_conf('postgresql.conf', 'stats_row_level = on') if $node->version < 8.3; $node->start; ### Beginning of tests ### # failing without thresholds $node->command_checks_all( [ './check_pgactivity', '--service' => 'last_analyze', '--username' => getlogin, '--format' => 'human' ], 127, [ qr/^$/ ], [ qr/^FATAL: you must specify critical and warning thresholds.$/m ], 'failing without thresholds' ); TestLib::system_or_bail('createdb', '--host' => $node->host, '--port' => $node->port, 'testdb' ); # test database with no tables $node->command_checks_all( [ './check_pgactivity', '--service' => 'last_analyze', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--status-file' => $pga_data, '--warning' => '1h',
'--critical' => '10d' ], 0, [ qr/^Service *: POSTGRES_LAST_ANALYZE$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: 1 database\(s\) checked$/m, qr/^Perfdata *: testdb=NaNs warn=3600 crit=864000$/m, ], [ qr/^$/ ], 'database with no tables' ); # test database with one table never analyzed # we must track the stat activity on pg_class to make sure there was some # stat activity, to avoid the check_pga shortcut taken when there is none. ($stdout) = $node->psql('testdb', q{ SELECT n_tup_ins FROM pg_stat_sys_tables WHERE relname = 'pg_class' }); $node->psql('testdb', 'CREATE TABLE foo (bar INT PRIMARY KEY)'); $node->poll_query_until('testdb', qq{ SELECT n_tup_ins > $stdout FROM pg_stat_sys_tables WHERE relname = 'pg_class' }); @stdout = ( qr/^Service *: POSTGRES_LAST_ANALYZE$/m, qr/^Returns *: 2 \(CRITICAL\)$/m, qr/^Message *: testdb: Infinity$/m, qr/^Perfdata *: testdb=Infinitys warn=3600 crit=864000$/m ); SKIP: { # skip **all** the tests in this file about analyze counts if < 9.1, # not just the two below, so we avoid repeating this SKIP block.
skip "No analyze counts PgSQL 9.1", 6 if $node->version < 9.1; push @stdout, ( qr/^Perfdata *: testdb analyze=0$/m, qr/^Perfdata *: testdb autoanalyze=0$/m ); } $node->command_checks_all( [ './check_pgactivity', '--service' => 'last_analyze', '--username' => getlogin, '--format' => 'human', '--dbname' => 'testdb', '--status-file' => $pga_data, '--warning' => '1h', '--critical' => '10d' ], 2, \@stdout, [ qr/^$/ ], 'database with one table never analyzed' ); # test database with two tables, only one never analyzed $node->psql('testdb', 'CREATE TABLE titi (grosminet INT PRIMARY KEY)'); $node->psql('testdb', 'INSERT INTO titi SELECT generate_series(1,1000)'); $node->psql('testdb', 'ANALYZE titi'); $node->poll_query_until('testdb', q{ SELECT last_analyze IS NOT NULL FROM pg_catalog.pg_stat_user_tables WHERE relname = 'titi' }); @stdout = ( qr/^Service *: POSTGRES_LAST_ANALYZE$/m, qr/^Returns *: 2 \(CRITICAL\)$/m, qr/^Message *: testdb: Infinity$/m, qr/^Perfdata *: testdb=Infinitys warn=3600 crit=864000$/m ); push @stdout, ( qr/^Perfdata *: testdb analyze=1$/m, qr/^Perfdata *: testdb autoanalyze=0$/m ) if $node->version >= 9.1; $node->command_checks_all( [ './check_pgactivity', '--service' => 'last_analyze', '--username' => getlogin, '--format' => 'human', '--dbname' => 'testdb', '--status-file' => $pga_data, '--warning' => '1h', '--critical' => '10d' ], 2, \@stdout, [ qr/^$/ ], 'database with two tables, one never analyzed' ); # test database with two tables, both analyzed $node->psql('testdb', 'ANALYZE foo'); $node->poll_query_until('testdb', q{ SELECT count(last_analyze) = 2 FROM pg_catalog.pg_stat_user_tables WHERE relname IN ('foo', 'titi') }); @stdout = ( qr/^Service *: POSTGRES_LAST_ANALYZE$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: 1 database\(s\) checked$/m, qr/^Perfdata *: testdb=.*s warn=3600 crit=864000$/m ); push @stdout, ( qr/^Perfdata *: testdb analyze=1$/m, qr/^Perfdata *: testdb autoanalyze=0$/m ) if $node->version >= 9.1; $node->command_checks_all( 
[ './check_pgactivity', '--service' => 'last_analyze', '--username' => getlogin, '--format' => 'human', '--dbname' => 'testdb', '--status-file' => $pga_data, '--warning' => '1h', '--critical' => '10d' ], 0, \@stdout, [ qr/^$/ ], 'test database with two tables, both analyzed' ); ### End of tests ### $node->stop( 'immediate' );

# ===== check_pgactivity-REL2_7/t/01-pga_version.t =====

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file. # # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use lib 't/lib'; use TestLib; use Test::More tests => 18; ### Beginning of tests ### my $good_version = '2.7'; my $bad_version = '0.0'; my $not_version = 'whatever'; command_checks_all( [ # command to run './check_pgactivity', '--service' => 'pga_version', '--warning' => $good_version ], # expected return code 0, # array of regex matching expected standard output [ qr/PGACTIVITY_VERSION OK: check_pgactivity $good_version, Perl [\d\.]+/ ], # array of regex matching expected error output [ qr/^$/ ], # a name for this test 'pga_version OK using --warning' ); command_checks_all( [ './check_pgactivity', '--service' => 'pga_version', '--critical' => $good_version ], 0, [ qr/^PGACTIVITY_VERSION OK: check_pgactivity $good_version, Perl [\d\.]+$/ ], [ qr/^$/ ], 'pga_version OK using --critical' ); command_checks_all( [ './check_pgactivity', '--service' => 'pga_version', '--warning' => $bad_version ], 1, [
qr/^PGACTIVITY_VERSION WARNING: check_pgactivity $good_version \(should be $bad_version!\), Perl [\d\.]+$/ ], [ qr/^$/ ], 'pga_version failing using --warning' ); command_checks_all( [ './check_pgactivity', '--service' => 'pga_version', '--critical' => $bad_version ], 2, [ qr/^PGACTIVITY_VERSION CRITICAL: check_pgactivity $good_version \(should be $bad_version!\), Perl [\d\.]+$/ ], [ qr/^$/ ], 'pga_version failing using --critical' ); command_checks_all( [ './check_pgactivity', '--service' => 'pga_version', '--warning' => $good_version, '--critical' => $good_version ], 127, [ qr/^$/ ], [ qr/^FATAL: you must provide a warning or a critical threshold for service pga_version!$/m ], 'pga_version error with both --warning and --critical' ); command_checks_all( [ './check_pgactivity', '--service' => 'pga_version', '--critical' => $not_version ], 127, [ qr/^$/ ], [ qr/^FATAL: given version does not look like a check_pgactivity version!$/m ], 'pga_version error on wrong version format' ); ### End of tests ###

# ===== check_pgactivity-REL2_7/t/01-sequences_exhausted.t =====

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file.
# # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use lib 't/lib'; use pgNode; use pgSession; use TestLib (); use IPC::Run (); use Test::More; use PostgresVersion; my $node = pgNode->get_new_node('prod'); my @timer; my @in; my @out; my @procs; my $pgversion; $node->init; $node->start; ### Beginning of tests ### # basic check without thresholds $node->command_checks_all( [ './check_pgactivity', '--service' => 'sequences_exhausted', '--username' => getlogin, '--format' => 'human' ], 127, [ qr/^$/ ], [ qr/^FATAL: you must specify critical and warning thresholds.$/m ], 'failing without thresholds' ); TestLib::system_or_bail('createdb', '--host' => $node->host, '--port' => $node->port, 'testdb' ); $node->psql('testdb', 'CREATE TABLE test1 (i smallint PRIMARY KEY)'); $node->psql('testdb', 'CREATE SEQUENCE test1seq INCREMENT BY 8000 START WITH 16000 OWNED BY test1.i'); # As the sequence is new, set its first value $node->psql('testdb', "SELECT nextval('test1seq')") ; $node->command_checks_all( [ './check_pgactivity', '--service' => 'sequences_exhausted', '--username' => getlogin, '--format' => 'human', '--warning' => '50%', '--critical' => '90%' ], 0, [ qr/^Service *: POSTGRES_CHECK_SEQ_EXHAUSTED$/m, qr/^Returns *: 0 \(OK\)$/m, ], [ qr/^$/ ], 'basic check' ); $node->psql('testdb', "SELECT nextval('test1seq')"); $node->command_checks_all( [ './check_pgactivity', '--service' => 'sequences_exhausted', '--username' => getlogin, '--format' => 'human', '--warning' => '50%', '--critical' => '90%' ], 1, [ qr/^Service *: POSTGRES_CHECK_SEQ_EXHAUSTED$/m, qr/^Returns *: 1 \(WARNING\)$/m, ], [ qr/^$/ ], 'check warning' ); $node->psql('testdb', "SELECT nextval('test1seq')"); $node->command_checks_all( [ './check_pgactivity', '--service' => 'sequences_exhausted', '--username' => getlogin, '--format' => 'human', '--warning' => '50%', '--critical' => '90%' ], 2, [ qr/^Service *: POSTGRES_CHECK_SEQ_EXHAUSTED$/m, qr/^Returns *: 2 
\(CRITICAL\)$/m, ], [ qr/^$/ ], 'check critical' ); # stop immediate to kill any remaining backends $node->stop('immediate'); done_testing();

# ===== check_pgactivity-REL2_7/t/01-streaming_delta.t =====

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file.
# # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use lib 't/lib'; use pgNode; use Test::More; my $num_tests = 118; # we have $num_tests normal tests + three tests for incompatible pg versions plan tests => $num_tests + 3; # declare instance named "prim" my $prim = pgNode->get_new_node('prim'); # declare standby instances "sec1" and "sec2" my $stb1 = pgNode->get_new_node('sec1'); my $stb2 = pgNode->get_new_node('sec2'); my $backup = 'backup'; # backup name my $pgversion; $pgversion = $prim->version; note "testing on version $pgversion"; # Tests for PostgreSQL 9.0 and before SKIP: { # "skip" allows ignoring the whole block based on the given condition skip "skip non-compatible test on PostgreSQL 9.0 and before", 3 unless $pgversion <= '9.0'; $prim->init; $prim->start; $prim->command_checks_all( [ './check_pgactivity', '--service' => 'streaming_delta', '--username' => getlogin, '--format' => 'human' ], 1, [ qr/^$/ ], [ qr/^Service streaming_delta is not compatible with host/ ], 'non compatible PostgreSQL version' ); } # Tests for PostgreSQL 9.1 and after SKIP: { skip "these tests require PostgreSQL 9.1 and after", $num_tests unless $pgversion >= '9.1'; # create primary and start it $prim->init(allows_streaming => 1); $prim->start; note("primary started"); # create backup $prim->backup($backup); note("backup done"); # create standby from backup and start it $stb1->init_from_backup($prim, $backup, has_streaming => 1); $stb1->start; note("standby 1 started"); # create standby from backup and start it $stb2->init_from_backup($prim, $backup, has_streaming => 1); $stb2->start; note("standby 2 started"); # checkpoint to avoid waiting a long time for the standbys to catch up $prim->safe_psql('template1', 'checkpoint'); # wait for standbys to catch up $prim->wait_for_catchup($stb1, 'replay', $prim->lsn('insert')); $prim->wait_for_catchup($stb2, 'replay', $prim->lsn('insert')); note("standbys caught up"); ### Beginning of tests ###
# Normal check with two standby note "Normal check with two standby"; $prim->command_checks_all( [ './check_pgactivity', '--service' => 'streaming_delta', '--username' => getlogin, '--format' => 'human' ], 0, [ qr/Service *: POSTGRES_STREAMING_DELTA$/m, qr/Returns *: 0 \(OK\)$/m, qr/Message *: 2 slaves checked$/m, qr/Perfdata *: sent delta sec1@=0B$/m, qr/Perfdata *: wrote delta sec1@=0B$/m, qr/Perfdata *: flushed delta sec1@=0B$/m, qr/Perfdata *: replay delta sec1@=0B$/m, qr/Perfdata *: pid sec1@=\d+$/m, qr/Perfdata *: sent delta sec2@=0B$/m, qr/Perfdata *: wrote delta sec2@=0B$/m, qr/Perfdata *: flushed delta sec2@=0B$/m, qr/Perfdata *: replay delta sec2@=0B$/m, qr/Perfdata *: pid sec2@=\d+$/m, qr/Perfdata *: # of excluded slaves=0$/m, qr/Perfdata *: # of slaves=2$/m ], [ qr/^$/ ], 'two standbys streaming' ); # Normal check excluding one note "Normal check with two standby, excluding one"; $prim->command_checks_all( [ './check_pgactivity', '--service' => 'streaming_delta', '--username' => getlogin, '--exclude' => 'sec1', '--format' => 'human' ], 0, [ qr/Service *: POSTGRES_STREAMING_DELTA$/m, qr/Returns *: 0 \(OK\)$/m, qr/Message *: 1 slaves checked$/m, qr/Perfdata *: sent delta sec2@=0B$/m, qr/Perfdata *: wrote delta sec2@=0B$/m, qr/Perfdata *: flushed delta sec2@=0B$/m, qr/Perfdata *: replay delta sec2@=0B$/m, qr/Perfdata *: pid sec2@=\d+$/m, qr/Perfdata *: # of excluded slaves=1$/m, qr/Perfdata *: # of slaves=2$/m ], [ qr/^$/ ], 'excluding one standby' ); # Normal check excluding both note "Normal check excluding both standby"; $prim->command_checks_all( [ './check_pgactivity', '--service' => 'streaming_delta', '--username' => getlogin, '--exclude' => 'sec[12]', '--format' => 'human' ], 0, [ qr/Service *: POSTGRES_STREAMING_DELTA$/m, qr/Returns *: 0 \(OK\)$/m, qr/Message *: 0 slaves checked$/m, qr/Perfdata *: # of excluded slaves=2$/m, qr/Perfdata *: # of slaves=2$/m ], [ qr/^$/ ], 'excluding one standby' ); # normal check with one explicit standby note 
"normal check with one explicit standby"; $prim->command_checks_all( [ './check_pgactivity', '--service' => 'streaming_delta', '--username' => getlogin, '--slave' => 'sec1 ', '--format' => 'human' ], 0, [ qr/Service *: POSTGRES_STREAMING_DELTA$/m, qr/Returns *: 0 \(OK\)$/m, qr/Message *: 2 slaves checked$/m, qr/Perfdata *: sent delta sec1@=0B$/m, qr/Perfdata *: wrote delta sec1@=0B$/m, qr/Perfdata *: flushed delta sec1@=0B$/m, qr/Perfdata *: replay delta sec1@=0B$/m, qr/Perfdata *: pid sec1@=\d+$/m, qr/Perfdata *: sent delta sec2@=0B$/m, qr/Perfdata *: wrote delta sec2@=0B$/m, qr/Perfdata *: flushed delta sec2@=0B$/m, qr/Perfdata *: replay delta sec2@=0B$/m, qr/Perfdata *: pid sec2@=\d+$/m, qr/Perfdata *: # of excluded slaves=0$/m, qr/Perfdata *: # of slaves=2$/m ], [ qr/^$/ ], 'one explicit standby' ); # failing check when called with an explicit standby not connected note "failing check when called with an explicit standby not connected"; $stb1->stop( 'fast' ); $prim->command_checks_all( [ './check_pgactivity', '--service' => 'streaming_delta', '--username' => getlogin, '--slave' => 'sec1 ', '--format' => 'human' ], 2, [ qr/Service *: POSTGRES_STREAMING_DELTA$/m, qr/Returns *: 2 \(CRITICAL\)$/m, qr/Message *: sec1 not connected$/m, qr/Perfdata *: sent delta sec2@=0B$/m, qr/Perfdata *: wrote delta sec2@=0B$/m, qr/Perfdata *: flushed delta sec2@=0B$/m, qr/Perfdata *: replay delta sec2@=0B$/m, qr/Perfdata *: pid sec2@=\d+$/m, qr/Perfdata *: # of excluded slaves=0$/m, qr/Perfdata *: # of slaves=1$/m ], [ qr/^$/ ], 'one failing explicit standby' ); # no standby connected note "no standby connected"; $stb2->stop( 'fast' ); $prim->command_checks_all( [ './check_pgactivity', '--service' => 'streaming_delta', '--username' => getlogin, '--format' => 'human' ], 3, [ qr/Service *: POSTGRES_STREAMING_DELTA$/m, qr/Returns *: 3 \(UNKNOWN\)$/m, qr/Message *: No slaves connected$/m, ], [ qr/^$/ ], 'no standby connected' ); ## warning on flush note "warning on flush"; 
$stb1->start; $prim->wait_for_catchup($stb1, 'write', $prim->lsn('insert')); $prim->command_checks_all( [ 'perl', '-It/lib', '-MMocker::Streaming', './check_pgactivity', '--service' => 'streaming_delta', '--username' => getlogin, '--slave' => 'sec1 ', '--warning' => '512,4MB', '--critical' => '4MB,4MB', '--format' => 'human' ], 1, [ qr/Service *: POSTGRES_STREAMING_DELTA$/m, qr/Returns *: 1 \(WARNING\)$/m, qr/Message *: warning flush lag: 2MB for sec1\@$/m, qr/Perfdata *: sent delta sec1@=0B$/m, qr/Perfdata *: wrote delta sec1@=1024kB$/m, qr/Perfdata *: flushed delta sec1@=2MB warn=512B crit=4MB$/m, qr/Perfdata *: replay delta sec1@=3MB warn=4MB crit=4MB$/m, qr/Perfdata *: pid sec1@=\d+$/m, qr/Perfdata *: # of excluded slaves=0$/m, qr/Perfdata *: # of slaves=1$/m ], [ qr/^$/ ], 'one explicit standby warning on flush lag' ); ## critical on flush note "critical on flush"; $prim->command_checks_all( [ 'perl', '-It/lib', '-MMocker::Streaming', './check_pgactivity', '--service' => 'streaming_delta', '--username' => getlogin, '--slave' => 'sec1 ', '--warning' => '512,4MB', '--critical' => '1MB,4MB', '--format' => 'human' ], 2, [ qr/Service *: POSTGRES_STREAMING_DELTA$/m, qr/Returns *: 2 \(CRITICAL\)$/m, qr/Message *: critical flush lag: 2MB for sec1\@$/m, qr/Perfdata *: sent delta sec1@=0B$/m, qr/Perfdata *: wrote delta sec1@=1024kB$/m, qr/Perfdata *: flushed delta sec1@=2MB warn=512B crit=1024kB$/m, qr/Perfdata *: replay delta sec1@=3MB warn=4MB crit=4MB$/m, qr/Perfdata *: pid sec1@=\d+$/m, qr/Perfdata *: # of excluded slaves=0$/m, qr/Perfdata *: # of slaves=1$/m ], [ qr/^$/ ], 'one explicit standby critical on flush lag' ); ## warning on replay note "warning on replay"; $prim->command_checks_all( [ 'perl', '-It/lib', '-MMocker::Streaming', './check_pgactivity', '--service' => 'streaming_delta', '--username' => getlogin, '--slave' => 'sec1 ', '--warning' => '3MB,512', '--critical' => '4MB,4MB', '--format' => 'human' ], 1, [ qr/Service *: POSTGRES_STREAMING_DELTA$/m, 
qr/Returns *: 1 \(WARNING\)$/m, qr/Message *: warning replay lag: 3MB for sec1\@$/m, qr/Perfdata *: sent delta sec1@=0B$/m, qr/Perfdata *: wrote delta sec1@=1024kB$/m, qr/Perfdata *: flushed delta sec1@=2MB warn=3MB crit=4MB$/m, qr/Perfdata *: replay delta sec1@=3MB warn=512B crit=4MB$/m, qr/Perfdata *: pid sec1@=\d+$/m, qr/Perfdata *: # of excluded slaves=0$/m, qr/Perfdata *: # of slaves=1$/m ], [ qr/^$/ ], 'one explicit standby warning on replay lag' ); ## critical on replay note "critical on replay"; $prim->command_checks_all( [ 'perl', '-It/lib', '-MMocker::Streaming', './check_pgactivity', '--service' => 'streaming_delta', '--username' => getlogin, '--slave' => 'sec1 ', '--warning' => '3MB,512', '--critical' => '4MB,2MB', '--format' => 'human' ], 2, [ qr/Service *: POSTGRES_STREAMING_DELTA$/m, qr/Returns *: 2 \(CRITICAL\)$/m, qr/Message *: critical replay lag: 3MB for sec1\@$/m, qr/Perfdata *: sent delta sec1@=0B$/m, qr/Perfdata *: wrote delta sec1@=1024kB$/m, qr/Perfdata *: flushed delta sec1@=2MB warn=3MB crit=4MB$/m, qr/Perfdata *: replay delta sec1@=3MB warn=512B crit=2MB$/m, qr/Perfdata *: pid sec1@=\d+$/m, qr/Perfdata *: # of excluded slaves=0$/m, qr/Perfdata *: # of slaves=1$/m ], [ qr/^$/ ], 'one explicit standby critical on replay lag' ); $stb1->stop( 'immediate' ); $stb2->stop( 'immediate' ); } # End of SKIP ### End of tests ### $prim->stop( 'immediate' ); 
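The streaming_delta thresholds used above come in pairs, flush lag then replay lag, with optional size suffixes ('512,4MB' means 512 bytes and 4 MiB). A rough sketch of splitting and converting such a pair (check_pgactivity's real parser handles more suffixes; `to_bytes` here is an illustrative helper, not the tool's API):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a size with an optional binary suffix to bytes.
# A bare number is taken as bytes, as in the tests above.
sub to_bytes {
    my ($size) = @_;
    my %mult = ( '' => 1, k => 1024, M => 1024**2, G => 1024**3 );
    $size =~ /\A(\d+)\s*([kMG]?)B?\z/
        or die "invalid size: $size";
    return $1 * $mult{$2};
}

# Split a 'flush,replay' threshold pair into two byte counts.
my ($flush, $replay) = map { to_bytes($_) } split /,/, '512,4MB';
print "$flush $replay\n";   # 512 4194304
```

This matches the perfdata above, where '--warning' => '512,4MB' is reported as warn=512B on the flush delta and warn=4MB on the replay delta.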
# ===== check_pgactivity-REL2_7/t/01-temp-files.t =====

#!/usr/bin/perl # This program is open source, licensed under the PostgreSQL License. # For license terms, see the LICENSE file. # # Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group use strict; use warnings; use lib 't/lib'; use pgNode; use pgSession; use Time::HiRes qw(usleep gettimeofday tv_interval); use Test::More tests => 143; my $node = pgNode->get_new_node('prod'); my $proc; my $t0; # used to avoid two check_pga calls within the same second. # See comment before first call of usleep.
$node->init;
$node->start;

### Beginning of tests ###

# This service can run without thresholds

# Tests for PostgreSQL 8.1 and before
SKIP: {
    skip "testing incompatibility with PostgreSQL 8.0 and before", 3
        if $node->version >= 8.1;

    $node->command_checks_all( [
        './check_pgactivity', '--service' => 'temp_files',
        '--username' => getlogin, '--format' => 'human',
        ],
        1,
        [ qr/^$/ ],
        [ qr/^Service temp_files is not compatible with host/ ],
        'non compatible PostgreSQL version'
    );
}

SKIP: {
    skip "incompatible tests with PostgreSQL < 8.1", 34
        if $node->version < 8.1;

    # basic check => Returns OK
    $t0 = [gettimeofday];
    $node->command_checks_all( [
        './check_pgactivity', '--service' => 'temp_files',
        '--username' => getlogin, '--format' => 'human',
        '--dbname' => 'template1',
        ],
        0,
        [
            qr/^Service *: POSTGRES_TEMP_FILES$/m,
            qr/^Returns *: 0 \(OK\)$/m,
            qr/^Message *: [2-3] tablespace\(s\)\/database\(s\) checked$/m,
        ],
        [ qr/^$/ ],
        'basic check'
    );

    $t0 = [gettimeofday];

    # unit test based on the file count => Returns OK
    # The query generates between 17.5MB (9.4) and 11.7MB (14) of WAL
    $node->psql('postgres',
        'SELECT random() * x FROM generate_series(1,1000000) AS F(x) ORDER BY 1;');

    # The added sleep ensures that two tests are not executed within the same
    # second.
    # The time difference is used by check_pga to compute the Fpm and Bpm
    # perfstats. As check_pga doesn't work with sub-second time, if it is
    # called twice in the same second, it ends with a division by zero error.
    # Consequently, this usleep recipe is repeated between each call of
    # check_pga.
usleep(100_000) while tv_interval($t0) < 1.01; $node->command_checks_all( [ './check_pgactivity', '--service' => 'temp_files', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => '3', '--critical' => '4' ], 0, [ qr/^Service *: POSTGRES_TEMP_FILES$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: [2-4] tablespace\(s\)\/database\(s\) checked$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*Fpm$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*Bpm$/m, qr/^Perfdata *: postgres=[1-9][0-9]*Files warn=3 crit=4$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*B$/m, qr/^Perfdata *: template1=0Fpm$/m, qr/^Perfdata *: template1=0Bpm$/m, qr/^Perfdata *: template1=0Files warn=3 crit=4$/m, qr/^Perfdata *: template1=0B$/m, ], [ qr/^$/ ], 'test file count OK' ); $t0 = [gettimeofday]; # unit test based on the file count => Returns WARN $node->psql('postgres', 'SELECT random() * x FROM generate_series(1,1000000) AS F(x) ORDER BY 1;'); usleep(100_000) while tv_interval($t0) < 1.01; $node->command_checks_all( [ './check_pgactivity', '--service' => 'temp_files', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => '1', '--critical' => '3' ], 1, [ qr/^Service *: POSTGRES_TEMP_FILES$/m, qr/^Returns *: 1 \(WARNING\)$/m, qr/^Message *: postgres \(.* file\(s\)\/.*\)$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*Fpm$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*Bpm$/m, qr/^Perfdata *: postgres=[1-9][0-9]*Files warn=1 crit=3$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*B$/m, qr/^Perfdata *: template1=0Fpm$/m, qr/^Perfdata *: template1=0Bpm$/m, qr/^Perfdata *: template1=0Files warn=1 crit=3$/m, qr/^Perfdata *: template1=0B$/m, ], [ qr/^$/ ], 'test file count WARN' ); $t0 = [gettimeofday]; # unit test based on the file count => Returns CRIT $node->psql('postgres', 'SELECT random() * x FROM generate_series(1,1000000) AS F(x) ORDER BY 1;'); usleep(100_000) while tv_interval($t0) < 1.01; $node->command_checks_all( [ 
'./check_pgactivity', '--service' => 'temp_files', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => '0', '--critical' => '1' ], 2, [ qr/^Service *: POSTGRES_TEMP_FILES$/m, qr/^Returns *: 2 \(CRITICAL\)$/m, qr/^Message *: postgres \(.* file\(s\)\/.*\)$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*Fpm$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*Bpm$/m, qr/^Perfdata *: postgres=[1-9][0-9]*Files warn=0 crit=1$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*B$/m, qr/^Perfdata *: template1=0Fpm$/m, qr/^Perfdata *: template1=0Bpm$/m, qr/^Perfdata *: template1=0Files warn=0 crit=1$/m, qr/^Perfdata *: template1=0B$/m, ], [ qr/^$/ ], 'test file count CRIT' ); $t0 = [gettimeofday]; # unit test based on the file size => Returns OK $node->psql('postgres', 'SELECT random() * x FROM generate_series(1,1000000) AS F(x) ORDER BY 1;'); usleep(100_000) while tv_interval($t0) < 1.01; $node->command_checks_all( [ './check_pgactivity', '--service' => 'temp_files', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => '40MB', '--critical' => '50MB' ], 0, [ qr/^Service *: POSTGRES_TEMP_FILES$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: [2-4] tablespace\(s\)\/database\(s\) checked$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*Fpm$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*Bpm$/m, qr/^Perfdata *: postgres=[1-9][0-9]*Files$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*B warn=40MB crit=50MB$/m, qr/^Perfdata *: template1=0Fpm$/m, qr/^Perfdata *: template1=0Bpm$/m, qr/^Perfdata *: template1=0Files$/m, qr/^Perfdata *: template1=0B warn=40MB crit=50MB$/m, ], [ qr/^$/ ], 'test file size OK' ); $t0 = [gettimeofday]; # unit test based on the file size => Returns WARN $node->psql('postgres', 'SELECT random() * x FROM generate_series(1,1000000) AS F(x) ORDER BY 1;'); usleep(100_000) while tv_interval($t0) < 1.01; $node->command_checks_all( [ './check_pgactivity', '--service' => 'temp_files', '--username' => 
getlogin, '--format' => 'human',
        '--dbname' => 'template1',
        '--warning' => '4MB',
        '--critical' => '40MB'
        ],
        1,
        [
            qr/^Service *: POSTGRES_TEMP_FILES$/m,
            qr/^Returns *: 1 \(WARNING\)$/m,
            qr/^Message *: postgres \(.* file\(s\)\/.*\)$/m,
            qr/^Perfdata *: postgres=[1-9][.0-9]*Fpm$/m,
            qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*Bpm$/m,
            qr/^Perfdata *: postgres=[1-9][0-9]*Files$/m,
            qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*B warn=4MB crit=40MB$/m,
            qr/^Perfdata *: template1=0Fpm$/m,
            qr/^Perfdata *: template1=0Bpm$/m,
            qr/^Perfdata *: template1=0Files$/m,
            qr/^Perfdata *: template1=0B warn=4MB crit=40MB$/m,
        ],
        [ qr/^$/ ],
        'test file size WARN'
    );

    $t0 = [gettimeofday];

    # unit test based on the file size => Returns CRIT
    $node->psql('postgres',
        'SELECT random() * x FROM generate_series(1,1000000) AS F(x) ORDER BY 1;');

    usleep(100_000) while tv_interval($t0) < 1.01;

    $node->command_checks_all( [
        './check_pgactivity', '--service' => 'temp_files',
        '--username' => getlogin, '--format' => 'human',
        '--dbname' => 'template1',
        '--warning' => '4MB',
        '--critical' => '5MB'
        ],
        2,
        [
            qr/^Service *: POSTGRES_TEMP_FILES$/m,
            qr/^Returns *: 2 \(CRITICAL\)$/m,
            qr/^Message *: postgres \(.* file\(s\)\/.*\)$/m,
            qr/^Perfdata *: postgres=[1-9][.0-9]*Fpm$/m,
            qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*Bpm$/m,
            qr/^Perfdata *: postgres=[1-9][0-9]*Files$/m,
            qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*B warn=4MB crit=5MB$/m,
            qr/^Perfdata *: template1=0Fpm$/m,
            qr/^Perfdata *: template1=0Bpm$/m,
            qr/^Perfdata *: template1=0Files$/m,
            qr/^Perfdata *: template1=0B warn=4MB crit=5MB$/m,
        ],
        [ qr/^$/ ],
        'test file size CRIT'
    );

    $t0 = [gettimeofday];

    # unit test based on the file size and count => Returns OK
    $node->psql('postgres',
        'SELECT random() * x FROM generate_series(1,1000000) AS F(x) ORDER BY 1;');

    usleep(100_000) while tv_interval($t0) < 1.01;

    $node->command_checks_all( [
        './check_pgactivity', '--service' => 'temp_files',
        '--username' => getlogin, '--format' => 'human',
        '--dbname' => 'template1',
'--warning' => '3,49254kB', '--critical' => '4,65638kB' ], 0, [ qr/^Service *: POSTGRES_TEMP_FILES$/m, qr/^Returns *: 0 \(OK\)$/m, qr/^Message *: [2-4] tablespace\(s\)\/database\(s\) checked$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*Fpm$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*Bpm$/m, qr/^Perfdata *: postgres=[1-9][0-9]*Files warn=3 crit=4$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*B warn=48.102MB crit=64.102MB$/m, qr/^Perfdata *: template1=0Fpm$/m, qr/^Perfdata *: template1=0Bpm$/m, qr/^Perfdata *: template1=0Files warn=3 crit=4$/m, qr/^Perfdata *: template1=0B warn=48.102MB crit=64.102MB$/m, ], [ qr/^$/ ], 'test file size and count OK ' ); $t0 = [gettimeofday]; # unit test based on the file size and count => Returns WARN $node->psql('postgres', 'SELECT random() * x FROM generate_series(1,1000000) AS F(x) ORDER BY 1;'); usleep(100_000) while tv_interval($t0) < 1.01; $node->command_checks_all( [ './check_pgactivity', '--service' => 'temp_files', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => '1,16486kB', '--critical' => '3,49254kB' ], 1, [ qr/^Service *: POSTGRES_TEMP_FILES$/m, qr/^Returns *: 1 \(WARNING\)$/m, qr/^Message *: postgres \(.* file\(s\)\/.*\)$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*Fpm$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*Bpm$/m, qr/^Perfdata *: postgres=[1-9][0-9]*Files warn=1 crit=3$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*B warn=16.102MB crit=48.102MB$/m, qr/^Perfdata *: template1=0Fpm$/m, qr/^Perfdata *: template1=0Bpm$/m, qr/^Perfdata *: template1=0Files warn=1 crit=3$/m, qr/^Perfdata *: template1=0B warn=16.102MB crit=48.102MB$/m, ], [ qr/^$/ ], 'test file size and count WARN' ); $t0 = [gettimeofday]; # unit test based on the file size and count => Returns CRIT $node->psql('postgres', 'SELECT random() * x FROM generate_series(1,1000000) AS F(x) ORDER BY 1;'); usleep(100_000) while tv_interval($t0) < 1.01; $node->command_checks_all( [ './check_pgactivity', '--service' 
=> 'temp_files', '--username' => getlogin, '--format' => 'human', '--dbname' => 'template1', '--warning' => '0,0MB', '--critical' => '1,4MB' ], 2, [ qr/^Service *: POSTGRES_TEMP_FILES$/m, qr/^Returns *: 2 \(CRITICAL\)$/m, qr/^Message *: postgres \(.* file\(s\)\/.*\)$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*Fpm$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*Bpm$/m, qr/^Perfdata *: postgres=[1-9][0-9]*Files warn=0 crit=1$/m, qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*B warn=0B crit=4MB$/m, qr/^Perfdata *: template1=0Fpm$/m, qr/^Perfdata *: template1=0Bpm$/m, qr/^Perfdata *: template1=0Files warn=0 crit=1$/m, qr/^Perfdata *: template1=0B warn=0B crit=4MB$/m, ], [ qr/^$/ ], 'test file size and count CRIT' ); $t0 = [gettimeofday]; # unit test with a tablespace => Returns OK # * are the tempfiles located in the correct directory ? # * do we only account for temp files ? (cf issue #351) mkdir $node->basedir . '/tablespace1'; $node->psql('postgres', 'CREATE TABLESPACE myts1 LOCATION \'' . $node->basedir . '/tablespace1\';'); my $tbsp2 = $node->basedir . '/tablespace2'; mkdir $tbsp2; $node->psql('postgres', qq{CREATE TABLESPACE myts2 LOCATION '$tbsp2';}); # Create some tables in the tablespaces to make sure their files are not # reported as temp files (gh #351). 
    $node->psql('postgres', 'CREATE TABLE matable0(x text);');
    $node->psql('postgres', 'CREATE TABLE matable1(x text) TABLESPACE myts1;');
    $node->psql('postgres', 'CREATE TABLE matable2(x text) TABLESPACE myts2;');
    $node->psql('postgres', 'VACUUM;');

    # Create two fake temp files in tablespace myts2 and make sure they are
    # reported by check_pga
    opendir( my $dh, $tbsp2 ) or die "Can't opendir $tbsp2: $!";
    my ($tbsp2_tmp) = grep { /PG_[.0-9]+_\d+/ } readdir($dh);
    $tbsp2_tmp = "$tbsp2/$tbsp2_tmp/pgsql_tmp";
    closedir $dh;

    mkdir $tbsp2_tmp or die "Can't mkdir $tbsp2_tmp: $!";

    open( my $fh, ">", "$tbsp2_tmp/pgsql_tmp1.1" )
        or die "Can't open $tbsp2_tmp/pgsql_tmp1.1: $!";
    print $fh "DATA"x1024;
    close $fh;

    open( $fh, ">", "$tbsp2_tmp/pgsql_tmp1.2" )
        or die "Can't open $tbsp2_tmp/pgsql_tmp1.2: $!";
    print $fh "DATA"x1024;
    close $fh;

    $node->psql('postgres', q{
        SET temp_tablespaces TO myts2;
        SELECT random() * x FROM generate_series(1,1000000) AS F(x) ORDER BY 1;
    });

    ok(-f "$tbsp2_tmp/pgsql_tmp1.1",
        "temp file pgsql_tmp1.1 exists in tablespace myts2");
    ok(-f "$tbsp2_tmp/pgsql_tmp1.2",
        "temp file pgsql_tmp1.2 exists in tablespace myts2");

    usleep(100_000) while tv_interval($t0) < 1.01;

    $node->command_checks_all( [
        './check_pgactivity', '--service' => 'temp_files',
        '--username' => getlogin, '--format' => 'human',
        '--dbname' => 'template1',
        ],
        0,
        [
            qr/^Service *: POSTGRES_TEMP_FILES$/m,
            qr/^Returns *: 0 \(OK\)$/m,
            qr/^Message *: [3-5] tablespace\(s\)\/database\(s\) checked$/m,
            qr/^Perfdata *: # files in myts2=2File$/m,
            qr/^Perfdata *: postgres=[1-9][.0-9]*Fpm$/m,
            qr/^Perfdata *: Total size in myts2=8kB$/m,
            qr/^Perfdata *: postgres=[1-9][.0-9]*Fpm$/m,
            qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*Bpm$/m,
            qr/^Perfdata *: postgres=[1-9][0-9]*Files$/m,
            qr/^Perfdata *: postgres=[1-9][.0-9]*[kMGTPE]*B$/m,
            qr/^Perfdata *: template1=0Fpm$/m,
            qr/^Perfdata *: template1=0Bpm$/m,
            qr/^Perfdata *: template1=0Files$/m,
            qr/^Perfdata *: template1=0B$/m,
        ],
        [ qr/^$/ ],
        'test with a tablespace'
    );
}

###
End of tests ###

# stop immediate to kill any remaining backends
$node->stop( 'immediate' );

check_pgactivity-REL2_7/t/02-general.t

#!/usr/bin/perl

# This program is open source, licensed under the PostgreSQL License.
# For license terms, see the LICENSE file.
#
# Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group

# This file gathers various regression tests against the same cluster to avoid
# creating one cluster per regression test.

use strict;
use warnings;

use Storable ('store');

use lib 't/lib';
use pgNode;
use TestLib ();
use Test::More tests => 10;

my $node = pgNode->get_new_node('prod');

$node->init;
$node->append_conf('postgresql.conf', 'stats_block_level = on')
    if $node->version < 8.3;
$node->start;

### Beginning of tests ###

# == Regression test for #326 ==
# check_pga should not complain when using an existing status file
# without an existing lock file.

my $pga_data = "$TestLib::tmp_check/tmp-status-file.data";

# make sure there are no leftover files from previous tests...
unlink $pga_data;
unlink "${pga_data}.lock";

ok( ! -f $pga_data, "double check the status file does not exist" );
ok( !
-f "${pga_data}.lock", "double check the lock file does not exist" ); # First call to create the status and lock files $node->command_checks_all( [ './check_pgactivity', '--service' => 'hit_ratio', '--username' => getlogin, '--format' => 'human', '--status-file' => $pga_data, '--warning' => '101%', '--critical' => '0%' ], 0, [], [ qr/^$/ ], 'No error should occur' ); ok( -f $pga_data, "status file created from first check_pga call" ); ok( -f "${pga_data}.lock", "lock file created from first check_pga call" ); # Remove the lock file to trigger the failure described in issue #326 unlink( "${pga_data}.lock" ) or BAIL_OUT( "could not remove the lock file" ); ok( ! -f "${pga_data}.lock", "lock file removed" ); # The hit ratio is computed relatively to the previous check. # We need to wait at least 1 second to avoid a NaN as a ratio sleep 1; # trigger the failure described in issue #326 $node->command_checks_all( [ './check_pgactivity', '--service' => 'hit_ratio', '--username' => getlogin, '--format' => 'human', '--status-file' => $pga_data, '--warning' => '101%', '--critical' => '0%' ], 1, [], [ qr/^$/ ], 'No error should occur if the lock file is missing' ); ok( -f "${pga_data}.lock", "lock file created from second check_pga call" ); # cleanup everything for the next regression test unlink($pga_data, "${pga_data}.lock"); 
check_pgactivity-REL2_7/t/README

This folder contains all the tests related to `check_pgactivity`.

# Environment setting

The tests are written using the well-known `Test::More` Perl module. Below are
instructions to set up a basic environment to run them.

CentOS 7:

~~~console
# yum install -y git perl-core "perl(IPC::Run)" "perl(Test::More)"
~~~

CentOS 8:

~~~console
# dnf install -y perl-core "perl(Test::More)"
# dnf install -y --enablerepo=PowerTools "perl(IPC::Run)"
~~~

Debian 9 & 10:

~~~console
# apt install -y libipc-run-perl
~~~

# Running tests

The tests must be run from the root folder of the project using `perl` or
`prove`.

The tests are run against the PostgreSQL version available in your `PATH`, as
reported by `pg_config --version`.

~~~console
check_pgactivity$ export PATH="/usr/pgsql-13/bin:$PATH"
check_pgactivity$ perl t/01-connection.t
1..8
ok 1 - connection successful status (got 0 vs expected 0)
[...]
check_pgactivity$ prove t/10-streaming_delta.t
t/10-streaming_delta.t .. ok
All tests successful.
Files=1, Tests=121,  3 wallclock secs ( 0.03 usr  0.01 sys +  2.09 cusr  0.52 csys =  2.65 CPU)
Result: PASS
check_pgactivity$ prove
t/00-copyright-year.t ... ok
t/01-archive_folder.t ... ok
t/01-connection.t ....... ok
t/01-pga_version.t ...... ok
t/01-streaming_delta.t .. ok
All tests successful.
Files=5, Tests=206, 10 wallclock secs ( 0.03 usr  0.01 sys +  6.33 cusr  1.15 csys =  7.52 CPU)
Result: PASS
~~~

# Logs

If tests are failing, log files are kept under the `tmp_check/log` folder.

~~~
# regression tests log
check_pgactivity$ less tmp_check/log/regress_log_01-streaming_delta

# PostgreSQL logs
check_pgactivity$ less tmp_check/log/01-streaming_delta_prim.log
~~~

Make sure to clean or move away the `tmp_check` folder before running new
tests.

# Devel

The typical boilerplate to create a new test file is:

~~~perl
use lib 't/lib';
use pgNode;

# declare instance named "prod"
my $node = pgNode->get_new_node('prod');

# create the instance and start it
$node->init;
$node->start;

$node->command_checks_all(
    [ # command to run
      './check_pgactivity', '--service' => 'connection',
      '--username' => getlogin
    ],
    # expected return code
    0,
    # array of regex matching expected standard output
    [ qr/^POSTGRES_CONNECTION OK: Connection successful at [-+:\. \d]+, on PostgreSQL [\d\.]+.*$/ ],
    # array of regex matching expected error output
    [ qr/^$/ ],
    # a name for this test
    'connection successful'
);

# stop instance as fast as possible
$node->stop('immediate');
~~~

Class `pgNode` is a facade class that creates and returns the appropriate
PostgresNode object depending on the PostgreSQL backend version reported by
`pg_config --version`. It helps extend the PostgresNode classes with new
methods needed in our tests.

Class PostgresNode comes from https://gitlab.com/adunstan/postgresnodeng/
which is currently a dead project. The class has been patched to fix various
incompatibilities with older PostgreSQL releases.
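Since the class hierarchy is picked from the version reported by `pg_config --version`, it can be handy to confirm which major version the suite will actually target before running it. A minimal sketch of that version extraction (the literal version string below is only an illustration; in practice you would pipe the real `pg_config --version` output through the same `sed` call):

```shell
# Extract the PostgreSQL major version from a `pg_config --version` style line.
# The literal string stands in for the real command output.
version_line="PostgreSQL 13.4"
major=$(printf '%s\n' "$version_line" | sed -E 's/[^0-9]*([0-9]+).*/\1/')
echo "$major"   # prints 13
```

Note that for pre-10 releases (e.g. "PostgreSQL 9.6.24") this yields only the first number, so the minor component would need separate handling if it matters to your check.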
As the TAP test modules moved around a lot in PostgreSQL core during v14, we
would need to check whether it is possible to resync with upstream, or just
keep it as is.

The PostgresNode class and methods are described in its embedded
documentation. See: `perldoc t/lib/PostgresNode.pm`.

Any method addition and other changes are documented in the pgNode class.
See: `perldoc t/lib/pgNode.pm`.

Some of the methods in these classes are just wrappers around functions coming
from TestLib, adding some environment context for the instance (e.g. setting
`PGHOST`). See the embedded documentation of TestLib for more details about
these functions: `perldoc t/lib/TestLib.pm`.

check_pgactivity-REL2_7/t/lib/COPYRIGHT.pgsql
PostgreSQL Database Management System
(formerly known as Postgres, then as Postgres95)

Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group

Portions Copyright (c) 1994, The Regents of the University of California

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement
is hereby granted, provided that the above copyright notice and this
paragraph and the following two paragraphs appear in all copies.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO
PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
check_pgactivity-REL2_7/t/lib/Mocker/Streaming.pm

# This program is open source, licensed under the PostgreSQL License.
# For license terms, see the LICENSE file.
#
# Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group

package Streaming;

use strict;
use warnings;

# This module is a simple wrapper around the "query" sub existing in the
# check_pgactivity script. Its purpose is to capture and edit query results
# to test some parts of the streaming_delta check.
#
# You must load it when executing check_pgactivity, e.g.:
#   perl -It/ -MMocker::Streaming check_pgactivity --service streaming_delta

CHECK {
    # keep a reference to the original query sub
    $main::{'query_orig'} = $main::{'query'};

    # FIXME: check given query

    # install a wrapper around the query sub to capture and modify the result
    $main::{'query'} = sub {
        my $res;

        $res = $main::{'query_orig'}->(@_);

        return $res unless $_[1] =~ m/FROM pg_stat_replication/;

        # mock 1MB of write delta,
        #      2MB of flush delta,
        #      3MB of replay delta.
        # We don't mind the total WAL size (the X part of XXX/YYYYY) as the
        # tests stay far below it.
        $res->[0][7] =~ m{^0/([0-9A-F]+)$};
        $res->[0][4] = sprintf('0/%X', hex($1) - 1048576);
        $res->[0][5] = sprintf('0/%X', hex($1) - 2097152);
        $res->[0][6] = sprintf('0/%X', hex($1) - 3145728);

        return $res;
    };
}

1

check_pgactivity-REL2_7/t/lib/PostgresNode.pm

=pod

=head1 NAME

PostgresNode - class representing PostgreSQL server instance

=head1 SYNOPSIS

  use PostgresNode;

  my $node = PostgresNode->get_new_node('mynode');

  # Create a data directory with initdb
  $node->init();

  # Start the PostgreSQL server
  $node->start();

  # Change a setting and restart
  $node->append_conf('postgresql.conf', 'hot_standby = on');
  $node->restart();

  # run a query with psql, like:
  #   echo 'SELECT 1' | psql -qAXt postgres -v ON_ERROR_STOP=1
  $psql_stdout = $node->safe_psql('postgres', 'SELECT 1');

  # Run psql
with a timeout, capturing stdout and stderr
  # as well as the psql exit code. Pass some extra psql
  # options. If there's an error from psql, raise an exception.
  my ($stdout, $stderr, $timed_out);
  my $cmdret = $node->psql('postgres', 'SELECT pg_sleep(60)',
      stdout => \$stdout, stderr => \$stderr,
      timeout => 30, timed_out => \$timed_out,
      extra_params => ['--single-transaction'],
      on_error_die => 1);
  print "Sleep timed out" if $timed_out;

  # Similar thing, more convenient in common cases
  my ($cmdret, $stdout, $stderr) = $node->psql('postgres', 'SELECT 1');

  # run query every second until it returns 't'
  # or times out
  $node->poll_query_until('postgres', q|SELECT random() < 0.1;|)
    or die "timed out";

  # Do an online pg_basebackup
  my $ret = $node->backup('testbackup1');

  # Take a backup of a running server
  my $ret = $node->backup_fs_hot('testbackup2');

  # Take a backup of a stopped server
  $node->stop;
  my $ret = $node->backup_fs_cold('testbackup3');

  # Restore it to create a new independent node (not a replica)
  my $replica = get_new_node('replica');
  $replica->init_from_backup($node, 'testbackup');
  $replica->start;

  # Stop the server
  $node->stop('fast');

  # Find a free, unprivileged TCP port to bind some other service to
  my $port = get_free_port();

=head1 DESCRIPTION

PostgresNode contains a set of routines able to work on a PostgreSQL node,
allowing to start, stop, backup and initialize it with various options. The
set of nodes managed by a given test is also managed by this module.

In addition to node management, PostgresNode instances have some wrappers
around Test::More functions to run commands with an environment set up to
point to the instance.

The IPC::Run module is required.
=cut package PostgresNode; use strict; use warnings; use Carp; use Config; use Cwd; use Exporter 'import'; use Fcntl qw(:mode); use File::Basename; use File::Path qw(rmtree); use File::Spec; use File::stat qw(stat); use File::Temp (); use IPC::Run; use PostgresVersion; use RecursiveCopy; use Socket; use Test::More; use TestLib (); use Time::HiRes qw(usleep); use Scalar::Util qw(blessed); our @EXPORT = qw( get_new_node get_free_port ); our ($use_tcp, $test_localhost, $test_pghost, $last_host_assigned, $last_port_assigned, @all_nodes, $died); INIT { # Set PGHOST for backward compatibility. This doesn't work for own_host # nodes, so prefer to not rely on this when writing new tests. $use_tcp = !$TestLib::use_unix_sockets; $test_localhost = "127.0.0.1"; $last_host_assigned = 1; $test_pghost = $use_tcp ? $test_localhost : TestLib::tempdir_short; $ENV{PGHOST} = $test_pghost; $ENV{PGDATABASE} = 'postgres'; # Tracking of last port value assigned to accelerate free port lookup. $last_port_assigned = int(rand() * 16384) + 49152; } # Current dev version, for which we have no subclass # When a new stable branch is made this and the subclass hierarchy below # need to be adjusted. my $devtip = 14; INIT { # sanity check to make sure there is a subclass for the last stable branch my $last_child = 'PostgresNodeV_' . ($devtip - 1); eval "${last_child}->can('get_new_node') || die('not found');"; die "No child package $last_child found" if $@; } =pod =head1 METHODS =over =item PostgresNode::new($class, $name, $pghost, $pgport) Create a new PostgresNode instance. Does not initdb or start it. You should generally prefer to use get_new_node() instead since it takes care of finding port numbers, registering instances for cleanup, etc. 
=cut sub new { my ($class, $name, $pghost, $pgport) = @_; my $testname = basename($0); $testname =~ s/\.[^.]+$//; my $self = { _port => $pgport, _host => $pghost, _basedir => "$TestLib::tmp_check/t_${testname}_${name}_data", _name => $name, _logfile_generation => 0, _logfile_base => "$TestLib::log_path/${testname}_${name}", _logfile => "$TestLib::log_path/${testname}_${name}.log" }; bless $self, $class; mkdir $self->{_basedir} or BAIL_OUT("could not create data directory \"$self->{_basedir}\": $!"); $self->dump_info; return $self; } =pod =item $node->port() Get the port number assigned to the host. This won't necessarily be a TCP port open on the local host since we prefer to use unix sockets if possible. Use $node->connstr() if you want a connection string. =cut sub port { my ($self) = @_; return $self->{_port}; } =pod =item $node->host() Return the host (like PGHOST) for this instance. May be a UNIX socket path. Use $node->connstr() if you want a connection string. =cut sub host { my ($self) = @_; return $self->{_host}; } =pod =item $node->basedir() The directory all the node's files will be within - datadir, archive directory, backups, etc. =cut sub basedir { my ($self) = @_; return $self->{_basedir}; } =pod =item $node->name() The name assigned to the node at creation time. =cut sub name { my ($self) = @_; return $self->{_name}; } =pod =item $node->logfile() Path to the PostgreSQL log file for this instance. =cut sub logfile { my ($self) = @_; return $self->{_logfile}; } =pod =item $node->connstr() Get a libpq connection string that will establish a connection to this node. Suitable for passing to psql, DBD::Pg, etc. =cut sub connstr { my ($self, $dbname) = @_; my $pgport = $self->port; my $pghost = $self->host; if (!defined($dbname)) { return "port=$pgport host=$pghost"; } # Escape properly the database string before using it, only # single quotes and backslashes need to be treated this way. 
$dbname =~ s#\\#\\\\#g; $dbname =~ s#\'#\\\'#g; return "port=$pgport host=$pghost dbname='$dbname'"; } =pod =item $node->group_access() Does the data dir allow group access? =cut sub group_access { my ($self) = @_; my $dir_stat = stat($self->data_dir); defined($dir_stat) or die('unable to stat ' . $self->data_dir); return (S_IMODE($dir_stat->mode) == 0750); } =pod =item $node->data_dir() Returns the path to the data directory. postgresql.conf and pg_hba.conf are always here. =cut sub data_dir { my ($self) = @_; my $res = $self->basedir; return "$res/pgdata"; } =pod =item $node->archive_dir() If archiving is enabled, WAL files go here. =cut sub archive_dir { my ($self) = @_; my $basedir = $self->basedir; return "$basedir/archives"; } =pod =item $node->backup_dir() The output path for backups taken with $node->backup() =cut sub backup_dir { my ($self) = @_; my $basedir = $self->basedir; return "$basedir/backup"; } =pod =item $node->info() Return a string containing human-readable diagnostic information (paths, etc) about this node. =cut sub info { my ($self) = @_; my $_info = ''; open my $fh, '>', \$_info or die; print $fh "Name: " . $self->name . "\n"; print $fh "Version: " . $self->{_pg_version} . "\n" if $self->{_pg_version}; print $fh "Data directory: " . $self->data_dir . "\n"; print $fh "Backup directory: " . $self->backup_dir . "\n"; print $fh "Archive directory: " . $self->archive_dir . "\n"; print $fh "Connection string: " . $self->connstr . "\n"; print $fh "Log file: " . $self->logfile . "\n"; print $fh "Install Path: ", $self->{_install_path} . "\n" if $self->{_install_path}; close $fh or die; return $_info; } =pod =item $node->dump_info() Print $node->info() =cut sub dump_info { my ($self) = @_; print $self->info; return; } # Internal method to set up trusted pg_hba.conf for replication. Not # documented because you shouldn't use it, it's called automatically if needed. 
sub set_replication_conf { my ($self) = @_; my $pgdata = $self->data_dir; $self->host eq $test_pghost or croak "set_replication_conf only works with the default host"; open my $hba, '>>', "$pgdata/pg_hba.conf"; print $hba "\n# Allow replication (set up by PostgresNode.pm)\n"; if ($TestLib::windows_os && !$TestLib::use_unix_sockets) { print $hba "host replication all $test_localhost/32 sspi include_realm=1 map=regress\n"; } close $hba; return; } # Internal method to set the stats_temp_directory GUC. # Parameter stats_temp_directory removed in v15 sub set_stats_temp_directory { return } =pod =item $node->init(...) Initialize a new cluster for testing. Authentication is set up so that only the current OS user can access the cluster. On Unix, we use Unix domain socket connections, with the socket in a directory that's only accessible to the current user to ensure that. On Windows, we use SSPI authentication to ensure the same (by pg_regress --config-auth). WAL archiving can be enabled on this node by passing the keyword parameter has_archiving => 1. This is disabled by default. postgresql.conf can be set up for replication by passing the keyword parameter allows_streaming => 'logical' or 'physical' (passing 1 will also suffice for physical replication) depending on the type of replication that should be enabled. This is disabled by default. The new node is set up in a fast but unsafe configuration where fsync is disabled. 
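A short sketch of a node initialized for physical streaming replication with WAL archiving (both are off by default):

```perl
my $primary = get_new_node('primary');
$primary->init(
	allows_streaming => 1,
	has_archiving    => 1);
$primary->start;
```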
=cut sub init { my ($self, %params) = @_; my $port = $self->port; my $pgdata = $self->data_dir; my $host = $self->host; local %ENV = $self->_get_env(); $params{allows_streaming} = 0 unless defined $params{allows_streaming}; $params{has_archiving} = 0 unless defined $params{has_archiving}; mkdir $self->backup_dir; mkdir $self->archive_dir; TestLib::system_or_bail( 'initdb', '-D', $pgdata, ($self->_initdb_flags), @{ $params{extra} }); TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata, @{ $params{auth_extra} }); open my $conf, '>>', "$pgdata/postgresql.conf"; print $conf "\n# Added by PostgresNode.pm\n"; print $conf "fsync = off\n"; print $conf "restart_after_crash = off\n"; print $conf "log_line_prefix = '%m [%p] %q%a '\n"; print $conf "log_statement = all\n"; print $conf "log_replication_commands = on\n"; print $conf "wal_retrieve_retry_interval = '500ms'\n"; # If a setting tends to affect whether tests pass or fail, print it after # TEMP_CONFIG. Otherwise, print it before TEMP_CONFIG, thereby permitting # overrides. Settings that merely improve performance or ease debugging # belong before TEMP_CONFIG. print $conf TestLib::slurp_file($ENV{TEMP_CONFIG}) if defined $ENV{TEMP_CONFIG}; $self->set_stats_temp_directory($conf); if ($params{allows_streaming}) { $self->_init_streaming($conf, $params{allows_streaming}); } else { $self->_init_wal_level_minimal($conf); } print $conf "port = $port\n"; $self->_init_network($conf, $use_tcp, $host); close $conf; chmod($self->group_access ? 
0640 : 0600, "$pgdata/postgresql.conf") or die("unable to set permissions for $pgdata/postgresql.conf"); $self->set_replication_conf if $params{allows_streaming}; $self->enable_archiving if $params{has_archiving}; return; } # methods used in init() which can be overridden in older versions sub _initdb_flags { return ('-A', 'trust', '-N'); } sub _init_network { my ($self, $conf, $use_tcp, $host) = @_; if ($use_tcp) { print $conf "unix_socket_directories = ''\n"; print $conf "listen_addresses = '$host'\n"; } else { print $conf "unix_socket_directories = '$host'\n"; print $conf "listen_addresses = ''\n"; } } sub _init_streaming { my ($self, $conf, $allows_streaming) = @_; if ($allows_streaming eq "logical") { print $conf "wal_level = logical\n"; } else { print $conf "wal_level = 'replica'\n"; } print $conf "max_wal_senders = 10\n"; print $conf "max_replication_slots = 10\n"; print $conf "wal_log_hints = on\n"; print $conf "hot_standby = on\n"; # conservative settings to ensure we can run multiple postmasters: print $conf "shared_buffers = 1MB\n"; print $conf "max_connections = 10\n"; # limit disk space consumption, too: print $conf "max_wal_size = 128MB\n"; } sub _init_wal_level_minimal { my ($self, $conf) = @_; print $conf "wal_level = minimal\n"; print $conf "max_wal_senders = 0\n"; } =pod =item $node->append_conf(filename, str) A shortcut method to append to files like pg_hba.conf and postgresql.conf. Does no validation or sanity checking. Does not reload the configuration after writing. A newline is automatically appended to the string. =cut sub append_conf { my ($self, $filename, $str) = @_; my $conffile = $self->data_dir . '/' . $filename; TestLib::append_to_file($conffile, $str . "\n"); chmod($self->group_access() ? 0640 : 0600, $conffile) or die("unable to set permissions for $conffile"); return; } =pod =item $node->adjust_conf(filename, setting, value, skip_equals) Modify the named config file with the setting. 
If the value is undefined, instead delete the setting. If the setting is not present then no action is taken. This will write "$setting = $value\n" in place of the existing line, unless skip_equals is true, in which case it will write "$setting $value\n". If the value needs to be quoted it is up to the caller to do that. =cut sub adjust_conf { my ($self, $filename, $setting, $value, $skip_equals) = @_; my $conffile = $self->data_dir . '/' . $filename; my $contents = TestLib::slurp_file($conffile); my @lines = split(/\n/, $contents); my @result; my $eq = $skip_equals ? '' : '= '; foreach my $line (@lines) { if ($line !~ /^$setting\W/) { push(@result, $line); next; } if (defined $value) { push(@result, "$setting $eq$value"); } } open my $fh, ">", $conffile or croak "could not write \"$conffile\": $!"; print $fh join("\n", @result), "\n"; close $fh; chmod($self->group_access() ? 0640 : 0600, $conffile) or die("unable to set permissions for $conffile"); } =pod =item $node->backup(backup_name) Create a hot backup with B<pg_basebackup> in subdirectory B<backup_name> of B<< $node->backup_dir >>, including the WAL. By default, WAL files are fetched at the end of the backup, not streamed. You can adjust that and other things by passing an array of additional B<pg_basebackup> command line options in the keyword parameter backup_options. You'll have to configure a suitable B<max_wal_senders> on the target server since it isn't done by default. =cut sub backup { my ($self, $backup_name, %params) = @_; my $backup_path = $self->backup_dir . '/' . 
$backup_name; my $name = $self->name; local %ENV = $self->_get_env(); print "# Taking pg_basebackup $backup_name from node \"$name\"\n"; TestLib::system_or_bail( 'pg_basebackup', '-D', $backup_path, '-h', $self->host, '-p', $self->port, '--checkpoint', 'fast', ($self->_backup_sync), @{ $params{backup_options} }); print "# Backup finished\n"; return; } sub _backup_sync { return ('--no-sync'); } =item $node->backup_fs_hot(backup_name) Create a backup with a filesystem level copy in subdirectory B<backup_name> of B<< $node->backup_dir >>, including WAL. Archiving must be enabled, as B<pg_start_backup()> and B<pg_stop_backup()> are used. This is not checked or enforced. The backup name is passed as the backup label to B<pg_start_backup()>. =cut sub backup_fs_hot { my ($self, $backup_name) = @_; $self->_backup_fs($backup_name, 1); return; } =item $node->backup_fs_cold(backup_name) Create a backup with a filesystem level copy in subdirectory B<backup_name> of B<< $node->backup_dir >>, including WAL. The server must be stopped as no attempt to handle concurrent writes is made. Use B<backup> or B<backup_fs_hot> if you want to back up a running server. =cut sub backup_fs_cold { my ($self, $backup_name) = @_; $self->_backup_fs($backup_name, 0); return; } # Common sub of backup_fs_hot and backup_fs_cold sub _backup_fs { my ($self, $backup_name, $hot) = @_; my $backup_path = $self->backup_dir . '/' . $backup_name; my $port = $self->port; my $name = $self->name; print "# Taking filesystem backup $backup_name from node \"$name\"\n"; if ($hot) { my $stdout = $self->safe_psql('postgres', "SELECT * FROM pg_start_backup('$backup_name');"); print "# pg_start_backup: $stdout\n"; } RecursiveCopy::copypath( $self->data_dir, $backup_path, filterfn => sub { my $src = shift; return ($src ne 'log' and $src ne 'postmaster.pid'); }); if ($hot) { # We ignore pg_stop_backup's return value. We also assume archiving # is enabled; otherwise the caller will have to copy the remaining # segments. 
my $stdout = $self->safe_psql('postgres', 'SELECT * FROM pg_stop_backup();'); print "# pg_stop_backup: $stdout\n"; } print "# Backup finished\n"; return; } =pod =item $node->init_from_backup(root_node, backup_name) Initialize a node from a backup, which may come from this node or a different node. root_node must be a PostgresNode reference, backup_name the string name of a backup previously created on that node with $node->backup. Does not start the node after initializing it. By default, the backup is assumed to be plain format. To restore from a tar-format backup, pass the name of the tar program to use in the keyword parameter tar_program. Note that tablespace tar files aren't handled here. Streaming replication can be enabled on this node by passing the keyword parameter has_streaming => 1. This is disabled by default. Restoring WAL segments from archives using restore_command can be enabled by passing the keyword parameter has_restoring => 1. This is disabled by default. If has_restoring is used, standby mode is used by default. To use recovery mode instead, pass the keyword parameter standby => 0. The backup is copied, leaving the original unmodified. pg_hba.conf is unconditionally set to enable replication connections. =cut sub init_from_backup { my ($self, $root_node, $backup_name, %params) = @_; my $backup_path = $root_node->backup_dir . '/' . 
$backup_name; my $host = $self->host; my $port = $self->port; my $node_name = $self->name; my $root_name = $root_node->name; $params{has_streaming} = 0 unless defined $params{has_streaming}; $params{has_restoring} = 0 unless defined $params{has_restoring}; $params{standby} = 1 unless defined $params{standby}; print "# Initializing node \"$node_name\" from backup \"$backup_name\" of node \"$root_name\"\n"; croak "Backup \"$backup_name\" does not exist at $backup_path" unless -d $backup_path; mkdir $self->backup_dir; mkdir $self->archive_dir; my $data_path = $self->data_dir; if (defined $params{tar_program}) { mkdir($data_path); TestLib::system_or_bail($params{tar_program}, 'xf', $backup_path . '/base.tar', '-C', $data_path); TestLib::system_or_bail( $params{tar_program}, 'xf', $backup_path . '/pg_wal.tar', '-C', $data_path . '/pg_wal'); } else { rmdir($data_path); RecursiveCopy::copypath($backup_path, $data_path); } chmod(0700, $data_path); # Base configuration for this node $self->append_conf( 'postgresql.conf', qq( port = $port )); $self->_init_network_append($use_tcp, $host); $self->enable_streaming($root_node) if $params{has_streaming}; $self->enable_restoring($root_node, $params{standby}) if $params{has_restoring}; return; } sub _init_network_append { my ($self, $use_tcp, $host) = @_; if ($use_tcp) { $self->append_conf('postgresql.conf', "listen_addresses = '$host'"); } else { $self->append_conf('postgresql.conf', "unix_socket_directories = '$host'"); } } =pod =item $node->rotate_logfile() Switch to a new PostgreSQL log file. This does not alter any running PostgreSQL process. Subsequent method calls, including pg_ctl invocations, will use the new name. Return the new name. 
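For illustration, a common sketch is to rotate before a restart so the fresh log can be scanned in isolation:

```perl
# rotate_logfile() does not touch the running server; the next
# pg_ctl invocation (here, restart) opens the new log file.
my $new_log = $node->rotate_logfile;
$node->restart;
```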
=cut sub rotate_logfile { my ($self) = @_; $self->{_logfile} = sprintf('%s_%d.log', $self->{_logfile_base}, ++$self->{_logfile_generation}); return $self->{_logfile}; } =pod =item $node->start(%params) => success_or_failure Wrapper for pg_ctl start Start the node and wait until it is ready to accept connections. =over =item fail_ok => 1 By default, failure terminates the entire F<prove> invocation. If given, instead return a true or false value to indicate success or failure. =back =cut sub start { my ($self, %params) = @_; my $port = $self->port; my $pgdata = $self->data_dir; my $name = $self->name; my $ret; BAIL_OUT("node \"$name\" is already running") if defined $self->{_pid}; print("### Starting node \"$name\"\n"); # Temporarily unset PGAPPNAME so that the server doesn't # inherit it. Otherwise this could affect libpqwalreceiver # connections in confusing ways. local %ENV = $self->_get_env(PGAPPNAME => undef); # Note: We set the cluster_name here, not in postgresql.conf (in # sub init) so that it does not get copied to standbys. $ret = TestLib::system_log('pg_ctl', '-w', '-D', $self->data_dir, '-l', $self->logfile, ($self->_cluster_name_opt($name)), 'start'); if ($ret != 0) { print "# pg_ctl start failed; logfile:\n"; print TestLib::slurp_file($self->logfile); BAIL_OUT("pg_ctl start failed") unless $params{fail_ok}; return 0; } $self->_update_pid(1); return 1; } sub _cluster_name_opt { my ($self, $name) = @_; return ('-o', "--cluster-name=$name"); } =pod =item $node->kill9() Send SIGKILL (signal 9) to the postmaster. Note: if the node is already known stopped, this does nothing. However, if we think it's running and it's not, it's important for this to fail. Otherwise, tests might fail to detect server crashes. =cut sub kill9 { my ($self) = @_; my $name = $self->name; return unless defined $self->{_pid}; local %ENV = $self->_get_env(); print "### Killing node \"$name\" using signal 9\n"; # kill(9, ...) fails under msys Perl 5.8.8, so fall back on pg_ctl. 
kill(9, $self->{_pid}) or TestLib::system_or_bail('pg_ctl', 'kill', 'KILL', $self->{_pid}); $self->{_pid} = undef; return; } =pod =item $node->stop(mode) Stop the node using pg_ctl -m $mode and wait for it to stop. Note: if the node is already known stopped, this does nothing. However, if we think it's running and it's not, it's important for this to fail. Otherwise, tests might fail to detect server crashes. =cut sub stop { my ($self, $mode) = @_; my $port = $self->port; my $pgdata = $self->data_dir; my $name = $self->name; local %ENV = $self->_get_env(); $mode = 'fast' unless defined $mode; return unless defined $self->{_pid}; print "### Stopping node \"$name\" using mode $mode\n"; TestLib::system_or_bail('pg_ctl', '-D', $pgdata, '-m', $mode, 'stop'); $self->_update_pid(0); return; } =pod =item $node->reload() Reload configuration parameters on the node. =cut sub reload { my ($self) = @_; my $port = $self->port; my $pgdata = $self->data_dir; my $name = $self->name; local %ENV = $self->_get_env(); print "### Reloading node \"$name\"\n"; TestLib::system_or_bail('pg_ctl', '-D', $pgdata, 'reload'); return; } =pod =item $node->restart() Wrapper for pg_ctl restart =cut sub restart { my ($self) = @_; my $port = $self->port; my $pgdata = $self->data_dir; my $logfile = $self->logfile; my $name = $self->name; local %ENV = $self->_get_env(PGAPPNAME => undef); print "### Restarting node \"$name\"\n"; TestLib::system_or_bail('pg_ctl', '-w', '-D', $pgdata, '-l', $logfile, 'restart'); $self->_update_pid(1); return; } =pod =item $node->promote() Wrapper for pg_ctl promote =cut sub promote { my ($self) = @_; my $port = $self->port; my $pgdata = $self->data_dir; my $logfile = $self->logfile; my $name = $self->name; local %ENV = $self->_get_env(); print "### Promoting node \"$name\"\n"; TestLib::system_or_bail('pg_ctl', '-D', $pgdata, '-l', $logfile, 'promote'); return; } =pod =item $node->logrotate() Wrapper for pg_ctl logrotate =cut sub logrotate { my ($self) = @_; my $port = 
$self->port; my $pgdata = $self->data_dir; my $logfile = $self->logfile; my $name = $self->name; local %ENV = $self->_get_env(); print "### Rotating log in node \"$name\"\n"; TestLib::system_or_bail('pg_ctl', '-D', $pgdata, '-l', $logfile, 'logrotate'); return; } # Internal routine to enable streaming replication on a standby node. sub enable_streaming { my ($self, $root_node) = @_; my $root_connstr = $root_node->connstr; my $name = $self->name; print "### Enabling streaming replication for node \"$name\"\n"; $self->append_conf( $self->_recovery_file, qq( primary_conninfo='$root_connstr application_name=$name' )); $self->set_standby_mode(); return; } sub _recovery_file { return "postgresql.conf"; } # Internal routine to enable archive recovery command on a standby node sub enable_restoring { my ($self, $root_node, $standby) = @_; my $path = TestLib::perl2host($root_node->archive_dir); my $name = $self->name; print "### Enabling WAL restore for node \"$name\"\n"; # On Windows, the path specified in the restore command needs to use # double back-slashes to work properly and to be able to detect properly # the file targeted by the copy command, so the directory value used # in this routine, using only one back-slash, need to be properly changed # first. Paths also need to be double-quoted to prevent failures where # the path contains spaces. $path =~ s{\\}{\\\\}g if ($TestLib::windows_os); my $copy_command = $TestLib::windows_os ? qq{copy "$path\\\\%f" "%p"} : qq{cp "$path/%f" "%p"}; $self->append_conf( $self->_recovery_file, qq( restore_command = '$copy_command' )); if ($standby) { $self->set_standby_mode(); } else { $self->set_recovery_mode(); } return; } =pod =item $node->set_recovery_mode() Place recovery.signal file. =cut sub set_recovery_mode { my ($self) = @_; $self->append_conf('recovery.signal', ''); return; } =pod =item $node->set_standby_mode() Place standby.signal file. 
=cut sub set_standby_mode { my ($self) = @_; $self->append_conf('standby.signal', ''); return; } # Internal routine to enable archiving sub enable_archiving { my ($self) = @_; my $path = TestLib::perl2host($self->archive_dir); my $name = $self->name; print "### Enabling WAL archiving for node \"$name\"\n"; # On Windows, the path specified in the restore command needs to use # double back-slashes to work properly and to be able to detect properly # the file targeted by the copy command, so the directory value used # in this routine, using only one back-slash, need to be properly changed # first. Paths also need to be double-quoted to prevent failures where # the path contains spaces. $path =~ s{\\}{\\\\}g if ($TestLib::windows_os); my $copy_command = $TestLib::windows_os ? qq{copy "%p" "$path\\\\%f"} : qq{cp "%p" "$path/%f"}; # Enable archive_mode and archive_command on node $self->append_conf( 'postgresql.conf', qq( archive_mode = on archive_command = '$copy_command' )); return; } # Internal method sub _update_pid { my ($self, $is_running) = @_; my $name = $self->name; # If we can open the PID file, read its first line and that's the PID we # want. if (open my $pidfile, '<', $self->data_dir . "/postmaster.pid") { chomp($self->{_pid} = <$pidfile>); print "# Postmaster PID for node \"$name\" is $self->{_pid}\n"; close $pidfile; # If we found a pidfile when there shouldn't be one, complain. BAIL_OUT("postmaster.pid unexpectedly present") unless $is_running; return; } $self->{_pid} = undef; print "# No postmaster PID for node \"$name\"\n"; # Complain if we expected to find a pidfile. BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running; return; } =pod =item PostgresNode->get_new_node(node_name, %params) Build a new object of class C<PostgresNode> (or of a subclass, if you have one), assigning a free port number. 
Remembers the node, to prevent its port number from being reused for another node, and to ensure that it gets shut down when the test script exits. You should generally use this instead of C<PostgresNode::new(...)>. =over =item port => [1,65535] By default, this function assigns a port number to each node. Specify this to force a particular port number. The caller is responsible for evaluating potential conflicts and privilege requirements. =item own_host => 1 By default, all nodes use the same PGHOST value. If specified, generate a PGHOST specific to this node. This allows multiple nodes to use the same port. =item install_path => '/path/to/postgres/installation' Using this parameter it is possible to have nodes pointing to different installations, for testing different versions together or the same version with different build parameters. The provided path must be the parent of the installation's 'bin' and 'lib' directories. In the common case where this is not provided, Postgres binaries will be found in the caller's PATH. =back For backwards compatibility, it is also exported as a standalone function, which can only create objects of class C<PostgresNode>. =cut sub get_new_node { my $class = 'PostgresNode'; $class = shift if scalar(@_) % 2 != 1; my ($name, %params) = @_; # Select a port. my $port; if (defined $params{port}) { $port = $params{port}; } else { # When selecting a port, we look for an unassigned TCP port number, # even if we intend to use only Unix-domain sockets. This is clearly # necessary on $use_tcp (Windows) configurations, and it seems like a # good idea on Unixen as well. $port = get_free_port(); } # Select a host. my $host = $test_pghost; if ($params{own_host}) { if ($use_tcp) { $last_host_assigned++; $last_host_assigned > 254 and BAIL_OUT("too many own_host nodes"); $host = '127.0.0.' . 
$last_host_assigned; } else { $host = "$test_pghost/$name"; # Assume $name =~ /^[-_a-zA-Z0-9]+$/ mkdir $host; } } # Lock port number found by creating a new node my $node = $class->new($name, $host, $port); if ($params{install_path}) { $node->{_install_path} = $params{install_path}; } # Add node to list of nodes push(@all_nodes, $node); # Set the version of Postgres we're working with $node->_set_pg_version; # bless the object into the appropriate subclass, # according to the found version if (ref $node->{_pg_version} && $node->{_pg_version} < $devtip) { my $maj = $node->{_pg_version}->major(separator => '_'); my $subclass = __PACKAGE__ . "V_$maj"; bless $node, $subclass; } return $node; } # Private routine to run the pg_config binary found in our environment (or in # our install_path, if we have one), and set the version from it # sub _set_pg_version { my ($self) = @_; my $inst = $self->{_install_path}; my $pg_config = "pg_config"; if (defined $inst) { # If the _install_path is invalid, our PATH variables might find an # unrelated pg_config executable elsewhere. Sanity check the # directory. BAIL_OUT("directory not found: $inst") unless -d $inst; # If the directory exists but is not the root of a postgresql # installation, or if the user configured using # --bindir=$SOMEWHERE_ELSE, we're not going to find pg_config, so # complain about that, too. 
$pg_config = "$inst/bin/pg_config"; BAIL_OUT("pg_config not found: $pg_config") unless -e $pg_config; BAIL_OUT("pg_config not executable: $pg_config") unless -x $pg_config; # Leave $pg_config install_path qualified, to be sure we get the right # version information, below, or die trying } local %ENV = $self->_get_env(); # We only want the version field open my $fh, "-|", $pg_config, "--version" or BAIL_OUT("$pg_config failed: $!"); my $version_line = <$fh>; close $fh or die; $self->{_pg_version} = PostgresVersion->new($version_line); BAIL_OUT("could not parse pg_config --version output: $version_line") unless defined $self->{_pg_version}; } # Private routine to return a copy of the environment with the PATH and # (DY)LD_LIBRARY_PATH correctly set when there is an install path set for # the node. # # Routines that call Postgres binaries need to call this routine like this: # # local %ENV = $self->_get_env([%extra_settings]); # # A copy of the environment is taken and the node's host and port settings are # added as PGHOST and PGPORT. Then the extra settings (if any) are applied. # Any setting in %extra_settings with a value that is undefined is deleted; # the remainder are set. Then the PATH and (DY)LD_LIBRARY_PATH are adjusted # if the node's install path is set, and the copied environment is returned. # # The install path set in get_new_node needs to be a directory containing # bin and lib subdirectories as in a standard PostgreSQL installation, so this # can't be used with installations where the bin and lib directories don't have # a common parent directory. 
sub _get_env { my $self = shift; my %inst_env = (%ENV, PGHOST => $self->{_host}, PGPORT => $self->{_port}); # the remaining arguments are modifications to make to the environment my %mods = (@_); while (my ($k, $v) = each %mods) { if (defined $v) { $inst_env{$k} = "$v"; } else { delete $inst_env{$k}; } } # now fix up the new environment for the install path my $inst = $self->{_install_path}; if ($inst) { if ($TestLib::windows_os) { # Windows picks up DLLs from the PATH rather than *LD_LIBRARY_PATH # choose the right path separator if ($Config{osname} eq 'MSWin32') { $inst_env{PATH} = "$inst/bin;$inst/lib;$ENV{PATH}"; } else { $inst_env{PATH} = "$inst/bin:$inst/lib:$ENV{PATH}"; } } else { my $dylib_name = $Config{osname} eq 'darwin' ? "DYLD_LIBRARY_PATH" : "LD_LIBRARY_PATH"; $inst_env{PATH} = "$inst/bin:$ENV{PATH}"; if (exists $ENV{$dylib_name}) { $inst_env{$dylib_name} = "$inst/lib:$ENV{$dylib_name}"; } else { $inst_env{$dylib_name} = "$inst/lib"; } } } return (%inst_env); } # Private routine to get an installation path qualified command. # # IPC::Run maintains a cache, %cmd_cache, mapping commands to paths. Tests # which use nodes spanning more than one postgres installation path need to # avoid confusing which installation's binaries get run. Setting $ENV{PATH} is # insufficient, as IPC::Run does not check to see if the path has changed since # caching a command. sub installed_command { my ($self, $cmd) = @_; # Nodes using alternate installation locations use their installation's # bin/ directory explicitly return join('/', $self->{_install_path}, 'bin', $cmd) if defined $self->{_install_path}; # Nodes implicitly using the default installation location rely on IPC::Run # to find the right binary, which should not cause %cmd_cache confusion, # because no nodes with other installation paths do it that way. return $cmd; } =pod =item get_free_port() Locate an unprivileged (high) TCP port that's not currently bound to anything. 
This is used by get_new_node, and is also exported for use by test cases that need to start other, non-Postgres servers. Ports assigned to existing PostgresNode objects are automatically excluded, even if those servers are not currently running. XXX A port available now may become unavailable by the time we start the desired service. =cut sub get_free_port { my $found = 0; my $port = $last_port_assigned; while ($found == 0) { # advance $port, wrapping correctly around range end $port = 49152 if ++$port >= 65536; print "# Checking port $port\n"; # Check first that candidate port number is not included in # the list of already-registered nodes. $found = 1; foreach my $node (@all_nodes) { $found = 0 if ($node->port == $port); } # Check to see if anything else is listening on this TCP port. # Seek a port available for all possible listen_addresses values, # so callers can harness this port for the widest range of purposes. # The 0.0.0.0 test achieves that for MSYS, which automatically sets # SO_EXCLUSIVEADDRUSE. Testing 0.0.0.0 is insufficient for Windows # native Perl (https://stackoverflow.com/a/14388707), so we also # have to test individual addresses. Doing that for 127.0.0/24 # addresses other than 127.0.0.1 might fail with EADDRNOTAVAIL on # non-Linux, non-Windows kernels. # # Thus, 0.0.0.0 and individual 127.0.0/24 addresses are tested # only on Windows and only when TCP usage is requested. if ($found == 1) { foreach my $addr (qw(127.0.0.1), ($use_tcp && $TestLib::windows_os) ? 
qw(127.0.0.2 127.0.0.3 0.0.0.0) : ()) { if (!can_bind($addr, $port)) { $found = 0; last; } } } } print "# Found port $port\n"; # Update port for next time $last_port_assigned = $port; return $port; } # Internal routine to check whether a host:port is available to bind sub can_bind { my ($host, $port) = @_; my $iaddr = inet_aton($host); my $paddr = sockaddr_in($port, $iaddr); my $proto = getprotobyname("tcp"); socket(SOCK, PF_INET, SOCK_STREAM, $proto) or die "socket failed: $!"; # As in postmaster, don't use SO_REUSEADDR on Windows setsockopt(SOCK, SOL_SOCKET, SO_REUSEADDR, pack("l", 1)) unless $TestLib::windows_os; my $ret = bind(SOCK, $paddr) && listen(SOCK, SOMAXCONN); close(SOCK); return $ret; } # Automatically shut down any still-running nodes (in the same order the nodes # were created in) when the test script exits. END { # take care not to change the script's exit value my $exit_code = $?; foreach my $node (@all_nodes) { $node->teardown_node; # skip clean if we are requested to retain the basedir next if defined $ENV{'PG_TEST_NOCLEAN'}; # clean basedir on clean test invocation $node->clean_node if $exit_code == 0 && TestLib::all_tests_passing(); } $? = $exit_code; } =pod =item $node->teardown_node() Do an immediate stop of the node =cut sub teardown_node { my $self = shift; $self->stop('immediate'); return; } =pod =item $node->clean_node() Remove the base directory of the node if the node has been stopped. =cut sub clean_node { my $self = shift; rmtree $self->{_basedir} unless defined $self->{_pid}; return; } =pod =item $node->safe_psql($dbname, $sql) => stdout Invoke B<psql> to run B<sql> on B<dbname> and return its stdout on success. Die if the SQL produces an error. Runs with B<ON_ERROR_STOP> set. Takes optional extra params like timeout and timed_out parameters with the same options as psql. 
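A short sketch, assuming a running node in C<$node>:

```perl
# safe_psql() dies (via on_error_die) if the SQL fails, so the
# return value is always the successful query's stdout.
my $count = $node->safe_psql('postgres',
	'SELECT count(*) FROM pg_class;');
```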
=cut sub safe_psql { my ($self, $dbname, $sql, %params) = @_; local %ENV = $self->_get_env(); my ($stdout, $stderr); my $ret = $self->psql( $dbname, $sql, %params, stdout => \$stdout, stderr => \$stderr, on_error_die => 1, on_error_stop => 1); # psql can emit stderr from NOTICEs etc if ($stderr ne "") { print "#### Begin standard error\n"; print $stderr; print "\n#### End standard error\n"; } return $stdout; } =pod =item $node->psql($dbname, $sql, %params) => psql_retval Invoke B<psql> to execute B<$sql> on B<$dbname> and return the return value from B<psql>, which is run with on_error_stop by default so that it will stop running sql and return 3 if the passed SQL results in an error. As a convenience, if B<psql> is called in array context it returns an array containing ($retval, $stdout, $stderr). psql is invoked in tuples-only unaligned mode with reading of B<.psqlrc> disabled. That may be overridden by passing extra psql parameters. stdout and stderr are transformed to UNIX line endings if on Windows. Any trailing newline is removed. Dies on failure to invoke psql but not if psql exits with a nonzero return code (unless on_error_die specified). If psql exits because of a signal, an exception is raised. =over =item stdout => \$stdout B<stdout>, if given, must be a scalar reference to which standard output is written. If not given, standard output is not redirected and will be printed unless B<psql> is called in array context, in which case it's captured and returned. =item stderr => \$stderr Same as B<stdout> but gets standard error. If the same scalar is passed for both B<stdout> and B<stderr> the results may be interleaved unpredictably. =item on_error_stop => 1 By default, the B<psql> method invokes the B<psql> program with ON_ERROR_STOP=1 set, so SQL execution is stopped at the first error and exit code 3 is returned. Set B<on_error_stop> to 0 to ignore errors instead. =item on_error_die => 0 By default, this method returns psql's result code. 
Pass on_error_die to instead die with an informative message. =item timeout => 'interval' Set a timeout for the psql call as an interval accepted by B<IPC::Run::timer> (integer seconds is fine). This method raises an exception on timeout, unless the B<timed_out> parameter is also given. =item timed_out => \$timed_out If B<timeout> is set and this parameter is given, the scalar it references is set to true if the psql call times out. =item connstr => B<value> If set, use this as the connection string for the connection to the backend. =item host => B<value> If this parameter is set, this host is used for the connection attempt. =item port => B<port> If this parameter is set, this port is used for the connection attempt. =item replication => B<value> If set, add B<replication=value> to the conninfo string. Passing the literal value C<database> results in a logical replication connection. =item extra_params => ['--single-transaction'] If given, it must be an array reference containing additional parameters to B<psql>. =back e.g. my ($stdout, $stderr, $timed_out); my $cmdret = $node->psql('postgres', 'SELECT pg_sleep(60)', stdout => \$stdout, stderr => \$stderr, timeout => 30, timed_out => \$timed_out, extra_params => ['--single-transaction']) will set $cmdret to undef and $timed_out to a true value. $node->psql('postgres', $sql, on_error_die => 1); dies with an informative message if $sql fails. =cut sub psql { my ($self, $dbname, $sql, %params) = @_; local %ENV = $self->_get_env(); my $stdout = $params{stdout}; my $stderr = $params{stderr}; my $replication = $params{replication}; my $timeout = undef; my $timeout_exception = 'psql timed out'; # Build the connection string. my $psql_connstr; if (defined $params{connstr}) { $psql_connstr = $params{connstr}; } else { $psql_connstr = $self->connstr($dbname); } $psql_connstr .= defined $replication ? 
" replication=$replication" : ""; my @no_password = ('-w') if ($params{no_password}); my @host = ('-h', $params{host}) if defined $params{host}; my @port = ('-p', $params{port}) if defined $params{port}; my @psql_params = ( $self->installed_command('psql'), '-XAtq', @no_password, @host, @port, '-d', $psql_connstr, '-f', '-'); # If the caller wants an array and hasn't passed stdout/stderr # references, allocate temporary ones to capture them so we # can return them. Otherwise we won't redirect them at all. if (wantarray) { if (!defined($stdout)) { my $temp_stdout = ""; $stdout = \$temp_stdout; } if (!defined($stderr)) { my $temp_stderr = ""; $stderr = \$temp_stderr; } } $params{on_error_stop} = 1 unless defined $params{on_error_stop}; $params{on_error_die} = 0 unless defined $params{on_error_die}; push @psql_params, '-v', 'ON_ERROR_STOP=1' if $params{on_error_stop}; push @psql_params, @{ $params{extra_params} } if defined $params{extra_params}; $timeout = IPC::Run::timeout($params{timeout}, exception => $timeout_exception) if (defined($params{timeout})); ${ $params{timed_out} } = 0 if defined $params{timed_out}; # IPC::Run would otherwise append to existing contents: $$stdout = "" if ref($stdout); $$stderr = "" if ref($stderr); my $ret; # Run psql and capture any possible exceptions. If the exception is # because of a timeout and the caller requested to handle that, just return # and set the flag. Otherwise, and for any other exception, rethrow. # # For background, see # https://metacpan.org/pod/release/ETHER/Try-Tiny-0.24/lib/Try/Tiny.pm do { local $@; eval { my @ipcrun_opts = (\@psql_params, '<', \$sql); push @ipcrun_opts, '>', $stdout if defined $stdout; push @ipcrun_opts, '2>', $stderr if defined $stderr; push @ipcrun_opts, $timeout if defined $timeout; IPC::Run::run @ipcrun_opts; $ret = $?; }; my $exc_save = $@; if ($exc_save) { # IPC::Run::run threw an exception. 
re-throw unless it's a # timeout, which we'll handle by testing is_expired die $exc_save if (blessed($exc_save) || $exc_save !~ /^\Q$timeout_exception\E/); $ret = undef; die "Got timeout exception '$exc_save' but timer not expired?!" unless $timeout->is_expired; if (defined($params{timed_out})) { ${ $params{timed_out} } = 1; } else { die "psql timed out: stderr: '$$stderr'\n" . "while running '@psql_params'"; } } }; # Note: on Windows, IPC::Run seems to convert \r\n to \n in program output # if we're using native Perl, but not if we're using MSys Perl. So do it # by hand in the latter case, here and elsewhere. if (defined $$stdout) { $$stdout =~ s/\r\n/\n/g if $Config{osname} eq 'msys'; chomp $$stdout; } if (defined $$stderr) { $$stderr =~ s/\r\n/\n/g if $Config{osname} eq 'msys'; chomp $$stderr; } # See http://perldoc.perl.org/perlvar.html#%24CHILD_ERROR # We don't use IPC::Run::Simple to limit dependencies. # # We always die on signal. my $core = $ret & 128 ? " (core dumped)" : ""; die "psql exited with signal " . ($ret & 127) . "$core: '$$stderr' while running '@psql_params'" if $ret & 127; $ret = $ret >> 8; if ($ret && $params{on_error_die}) { die "psql error: stderr: '$$stderr'\nwhile running '@psql_params'" if $ret == 1; die "connection error: '$$stderr'\nwhile running '@psql_params'" if $ret == 2; die "error running SQL: '$$stderr'\nwhile running '@psql_params' with sql '$sql'" if $ret == 3; die "psql returns $ret: '$$stderr'\nwhile running '@psql_params'"; } if (wantarray) { return ($ret, $$stdout, $$stderr); } else { return $ret; } } =pod =item $node->background_psql($dbname, \$stdin, \$stdout, $timer, %params) => harness Invoke B<psql> on B<$dbname> and return an IPC::Run harness object, which the caller may use to send input to B<psql>. The process's stdin is sourced from the $stdin scalar reference, and its stdout and stderr go to the $stdout scalar reference. This allows the caller to act on other parts of the system while idling this backend. 
The specified timer object is attached to the harness, as well. It's the caller's responsibility to select the timeout length, and to restart the timer after each command if the timeout is per-command. psql is invoked in tuples-only unaligned mode with reading of B<.psqlrc> disabled. That may be overridden by passing extra psql parameters. Dies on failure to invoke psql, or if psql fails to connect. Errors occurring later are the caller's problem. psql runs with on_error_stop by default so that it will stop running sql and return 3 if the passed SQL results in an error. Be sure to "finish" the harness when done with it. =over =item on_error_stop => 1 By default, the B<psql> method invokes the B<psql> program with ON_ERROR_STOP=1 set, so SQL execution is stopped at the first error and exit code 3 is returned. Set B<on_error_stop> to 0 to ignore errors instead. =item replication => B<value> If set, add B<replication=value> to the conninfo string. Passing the literal value C<database> results in a logical replication connection. =item extra_params => ['--single-transaction'] If given, it must be an array reference containing additional parameters to B<psql>. =back =cut sub background_psql { my ($self, $dbname, $stdin, $stdout, $timer, %params) = @_; local %ENV = $self->_get_env(); my $replication = $params{replication}; my @psql_params = ( $self->installed_command('psql'), '-XAtq', '-d', $self->connstr($dbname) . (defined $replication ?
" replication=$replication" : ""), '-f', '-'); $params{on_error_stop} = 1 unless defined $params{on_error_stop}; push @psql_params, '-v', 'ON_ERROR_STOP=1' if $params{on_error_stop}; push @psql_params, @{ $params{extra_params} } if defined $params{extra_params}; # Ensure there is no data waiting to be sent: $$stdin = "" if ref($stdin); # IPC::Run would otherwise append to existing contents: $$stdout = "" if ref($stdout); my $harness = IPC::Run::start \@psql_params, '<', $stdin, '>', $stdout, $timer; # Request some output, and pump until we see it. This means that psql # connection failures are caught here, relieving callers of the need to # handle those. (Right now, we have no particularly good handling for # errors anyway, but that might be added later.) my $banner = "background_psql: ready"; $$stdin = "\\echo $banner\n"; pump $harness until $$stdout =~ /$banner/ || $timer->is_expired; die "psql startup timed out" if $timer->is_expired; return $harness; } =pod =item $node->interactive_psql($dbname, \$stdin, \$stdout, $timer, %params) => harness Invoke B<psql> on B<$dbname> and return an IPC::Run harness object, which the caller may use to send interactive input to B<psql>. The process's stdin is sourced from the $stdin scalar reference, and its stdout and stderr go to the $stdout scalar reference. ptys are used so that psql thinks it's being called interactively. The specified timer object is attached to the harness, as well. It's the caller's responsibility to select the timeout length, and to restart the timer after each command if the timeout is per-command. psql is invoked in tuples-only unaligned mode with reading of B<.psqlrc> disabled. That may be overridden by passing extra psql parameters. Dies on failure to invoke psql, or if psql fails to connect. Errors occurring later are the caller's problem. Be sure to "finish" the harness when done with it. 
The only extra parameter currently accepted is =over =item extra_params => ['--single-transaction'] If given, it must be an array reference containing additional parameters to B<psql>. =back This requires IO::Pty in addition to IPC::Run. =cut sub interactive_psql { my ($self, $dbname, $stdin, $stdout, $timer, %params) = @_; local %ENV = $self->_get_env(); my @psql_params = ( $self->installed_command('psql'), '-XAt', '-d', $self->connstr($dbname)); push @psql_params, @{ $params{extra_params} } if defined $params{extra_params}; # Ensure there is no data waiting to be sent: $$stdin = "" if ref($stdin); # IPC::Run would otherwise append to existing contents: $$stdout = "" if ref($stdout); my $harness = IPC::Run::start \@psql_params, '<pty<', $stdin, '>pty>', $stdout, $timer; # Pump until we see psql's help banner. This ensures that callers # won't write anything to the pty before it's ready, avoiding an # implementation issue in IPC::Run. Also, it means that psql # connection failures are caught here, relieving callers of # the need to handle those. (Right now, we have no particularly # good handling for errors anyway, but that might be added later.) pump $harness until $$stdout =~ /Type "help" for help/ || $timer->is_expired; die "psql startup timed out" if $timer->is_expired; return $harness; } =pod =item $node->connect_ok($connstr, $test_name, %params) Attempt a connection with a custom connection string. This is expected to succeed. =over =item sql => B<value> If this parameter is set, this query is used for the connection attempt instead of the default. =item expected_stdout => B<value> If this regular expression is set, matches it with the output generated. =item log_like => [ qr/required message/ ] If given, it must be an array reference containing a list of regular expressions that must match against the server log, using C<Test::More::like()>. 
=item log_unlike => [ qr/prohibited message/ ] If given, it must be an array reference containing a list of regular expressions that must NOT match against the server log. They will be passed to C<Test::More::unlike()>. =item host => B<value> If this parameter is set, this host is used for the connection attempt. =item port => B<port> If this parameter is set, this port is used for the connection attempt. =back =cut sub connect_ok { local $Test::Builder::Level = $Test::Builder::Level + 1; my ($self, $connstr, $test_name, %params) = @_; my $sql; if (defined($params{sql})) { $sql = $params{sql}; } else { $sql = "SELECT \$\$connected with $connstr\$\$"; } my (@log_like, @log_unlike); if (defined($params{log_like})) { @log_like = @{ $params{log_like} }; } if (defined($params{log_unlike})) { @log_unlike = @{ $params{log_unlike} }; } my $log_location = -s $self->logfile; # Never prompt for a password, any callers of this routine should # have set up things properly, and this should not block. my ($ret, $stdout, $stderr) = $self->psql( 'postgres', $sql, no_password => 1, host => $params{host}, port => $params{port}, connstr => "$connstr", on_error_stop => 0); is($ret, 0, $test_name); if (defined($params{expected_stdout})) { like($stdout, $params{expected_stdout}, "$test_name: matches"); } if (@log_like or @log_unlike) { my $log_contents = TestLib::slurp_file($self->logfile, $log_location); while (my $regex = shift @log_like) { like($log_contents, $regex, "$test_name: log matches"); } while (my $regex = shift @log_unlike) { unlike($log_contents, $regex, "$test_name: log does not match"); } } } =pod =item $node->connect_fails($connstr, $test_name, %params) Attempt a connection with a custom connection string. This is expected to fail. =over =item expected_stderr => B<value> If this regular expression is set, matches it with the output generated. =item log_like => [ qr/required message/ ] =item log_unlike => [ qr/prohibited message/ ] See C<connect_ok(...)>, above. 
=back =cut sub connect_fails { local $Test::Builder::Level = $Test::Builder::Level + 1; my ($self, $connstr, $test_name, %params) = @_; my (@log_like, @log_unlike); if (defined($params{log_like})) { @log_like = @{ $params{log_like} }; } if (defined($params{log_unlike})) { @log_unlike = @{ $params{log_unlike} }; } my $log_location = -s $self->logfile; # Never prompt for a password, any callers of this routine should # have set up things properly, and this should not block. my ($ret, $stdout, $stderr) = $self->psql( 'postgres', undef, extra_params => ['-w'], connstr => "$connstr"); isnt($ret, 0, $test_name); if (defined($params{expected_stderr})) { like($stderr, $params{expected_stderr}, "$test_name: matches"); } if (@log_like or @log_unlike) { my $log_contents = TestLib::slurp_file($self->logfile, $log_location); while (my $regex = shift @log_like) { like($log_contents, $regex, "$test_name: log matches"); } while (my $regex = shift @log_unlike) { unlike($log_contents, $regex, "$test_name: log does not match"); } } } =pod =item $node->poll_query_until($dbname, $query [, $expected ]) Run B<$query> repeatedly, until it returns the B<$expected> result ('t', or SQL boolean true, by default). Continues polling if B<psql> returns an error result. Times out after 180 seconds. Returns 1 if successful, 0 if timed out. =cut sub poll_query_until { my ($self, $dbname, $query, $expected) = @_; local %ENV = $self->_get_env(); $expected = 't' unless defined($expected); # default value my $cmd = [ $self->installed_command('psql'), '-XAt', '-c', $query, '-d', $self->connstr($dbname) ]; my ($stdout, $stderr); my $max_attempts = 180 * 10; my $attempts = 0; while ($attempts < $max_attempts) { my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr; $stdout =~ s/\r\n/\n/g if $Config{osname} eq 'msys'; chomp($stdout); if ($stdout eq $expected) { return 1; } # Wait 0.1 second before retrying. usleep(100_000); $attempts++; } # The query result didn't change in 180 seconds. Give up. 
Print the # output from the last attempt, hopefully that's useful for debugging. $stderr =~ s/\r\n/\n/g if $Config{osname} eq 'msys'; chomp($stderr); diag qq(poll_query_until timed out executing this query: $query expecting this output: $expected last actual query output: $stdout with stderr: $stderr); return 0; } =pod =item $node->command_ok(...) Runs a shell command like TestLib::command_ok, but with PGHOST and PGPORT set so that the command will default to connecting to this PostgresNode. =cut sub command_ok { local $Test::Builder::Level = $Test::Builder::Level + 1; my $self = shift; local %ENV = $self->_get_env(); TestLib::command_ok(@_); return; } =pod =item $node->command_fails(...) TestLib::command_fails with our connection parameters. See command_ok(...) =cut sub command_fails { local $Test::Builder::Level = $Test::Builder::Level + 1; my $self = shift; local %ENV = $self->_get_env(); TestLib::command_fails(@_); return; } =pod =item $node->command_like(...) TestLib::command_like with our connection parameters. See command_ok(...) =cut sub command_like { local $Test::Builder::Level = $Test::Builder::Level + 1; my $self = shift; local %ENV = $self->_get_env(); TestLib::command_like(@_); return; } =pod =item $node->command_checks_all(...) TestLib::command_checks_all with our connection parameters. See command_ok(...) =cut sub command_checks_all { local $Test::Builder::Level = $Test::Builder::Level + 1; my $self = shift; local %ENV = $self->_get_env(); TestLib::command_checks_all(@_); return; } =pod =item $node->issues_sql_like(cmd, expected_sql, test_name) Run a command on the node, then verify that $expected_sql appears in the server log file. 
=cut sub issues_sql_like { local $Test::Builder::Level = $Test::Builder::Level + 1; my ($self, $cmd, $expected_sql, $test_name) = @_; local %ENV = $self->_get_env(); my $log_location = -s $self->logfile; my $result = TestLib::run_log($cmd); ok($result, "@$cmd exit code 0"); my $log = TestLib::slurp_file($self->logfile, $log_location); like($log, $expected_sql, "$test_name: SQL found in server log"); return; } =pod =item $node->run_log(...) Runs a shell command like TestLib::run_log, but with connection parameters set so that the command will default to connecting to this PostgresNode. =cut sub run_log { my $self = shift; local %ENV = $self->_get_env(); TestLib::run_log(@_); return; } =pod =item $node->lsn(mode) Look up WAL locations on the server: * insert location (primary only, error on replica) * write location (primary only, error on replica) * flush location (primary only, error on replica) * receive location (always undef on primary) * replay location (always undef on primary) mode must be specified. =cut sub lsn { my ($self, $mode) = @_; my %modes = $self->_lsn_mode_map; $mode = '<undef>' if !defined($mode); croak "unknown mode for 'lsn': '$mode', valid modes are " . join(', ', keys %modes) if !defined($modes{$mode}); my $result = $self->safe_psql('postgres', "SELECT $modes{$mode}"); chomp($result); if ($result eq '') { return; } else { return $result; } } sub _lsn_mode_map { return ( 'insert' => 'pg_current_wal_insert_lsn()', 'flush' => 'pg_current_wal_flush_lsn()', 'write' => 'pg_current_wal_lsn()', 'receive' => 'pg_last_wal_receive_lsn()', 'replay' => 'pg_last_wal_replay_lsn()'); } =pod =item $node->wait_for_catchup(standby_name, mode, target_lsn) Wait for the node with application_name standby_name (usually from node->name, also works for logical subscriptions) until its replication location in pg_stat_replication equals or passes the upstream's WAL insert point at the time this function is called. 
By default the replay_lsn is waited for, but 'mode' may be specified to wait for any of sent|write|flush|replay. The connection catching up must be in a streaming state. If there is no active replication connection from this peer, waits until poll_query_until timeout. Requires that the 'postgres' db exists and is accessible. target_lsn may be any arbitrary lsn, but is typically $primary_node->lsn('insert'). If omitted, pg_current_wal_lsn() is used. This is not a test. It die()s on failure. =cut sub wait_for_catchup { my ($self, $standby_name, $mode, $target_lsn) = @_; $mode = defined($mode) ? $mode : 'replay'; my %valid_modes = ('sent' => 1, 'write' => 1, 'flush' => 1, 'replay' => 1); croak "unknown mode $mode for 'wait_for_catchup', valid modes are " . join(', ', keys(%valid_modes)) unless exists($valid_modes{$mode}); # Allow passing of a PostgresNode instance as shorthand if (blessed($standby_name) && $standby_name->isa("PostgresNode")) { $standby_name = $standby_name->name; } my $lsn_expr; if (defined($target_lsn)) { $lsn_expr = "'$target_lsn'"; } else { my %funcmap = $self->_lsn_mode_map; $lsn_expr = $funcmap{write}; } my $suffix = $self->_replication_suffix; print "Waiting for replication conn " . $standby_name . "'s " . $mode . "_lsn to pass " . $lsn_expr . " on " . $self->name . "\n"; my $query = qq[SELECT $lsn_expr <= ${mode}$suffix AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name in ('$standby_name', 'walreceiver');]; $self->poll_query_until('postgres', $query) or croak "timed out waiting for catchup"; print "done\n"; return; } sub _current_lsn_func { return "pg_current_wal_lsn"; } sub _replication_suffix { return "_lsn"; } =pod =item $node->wait_for_slot_catchup(slot_name, mode, target_lsn) Wait for the named replication slot to equal or pass the supplied target_lsn. The location used is the restart_lsn unless mode is given, in which case it may be 'restart' or 'confirmed_flush'. 
Requires that the 'postgres' db exists and is accessible. This is not a test. It die()s on failure. If the slot is not active, will time out after poll_query_until's timeout. target_lsn may be any arbitrary lsn, but is typically $primary_node->lsn('insert'). Note that for logical slots, restart_lsn is held down by the oldest in-progress tx. =cut sub wait_for_slot_catchup { my ($self, $slot_name, $mode, $target_lsn) = @_; $mode = defined($mode) ? $mode : 'restart'; if (!($mode eq 'restart' || $mode eq 'confirmed_flush')) { croak "valid modes are restart, confirmed_flush"; } croak 'target lsn must be specified' unless defined($target_lsn); print "Waiting for replication slot " . $slot_name . "'s " . $mode . "_lsn to pass " . $target_lsn . " on " . $self->name . "\n"; my $query = qq[SELECT '$target_lsn' <= ${mode}_lsn FROM pg_catalog.pg_replication_slots WHERE slot_name = '$slot_name';]; $self->poll_query_until('postgres', $query) or croak "timed out waiting for catchup"; print "done\n"; return; } =pod =item $node->query_hash($dbname, $query, @columns) Execute $query on $dbname, replacing any appearance of the string __COLUMNS__ within the query with a comma-separated list of @columns. If __COLUMNS__ does not appear in the query, its result columns must EXACTLY match the order and number (but not necessarily alias) of supplied @columns. The query must return zero or one rows. Return a hash-ref representation of the results of the query, with any empty or null results as defined keys with an empty-string value. There is no way to differentiate between null and empty-string result fields. If the query returns zero rows, return a hash with all columns empty. There is no way to differentiate between zero rows returned and a row with only null columns. 
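For example, to read one row of B<pg_stat_replication> as a hash-ref (the application_name and checked columns are illustrative):

  my $row = $node->query_hash(
      'postgres',
      "SELECT __COLUMNS__ FROM pg_catalog.pg_stat_replication"
        . " WHERE application_name = 'standby_1'",
      qw(state sent_lsn replay_lsn));
  is($row->{state}, 'streaming', 'standby_1 is streaming');
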
=cut sub query_hash { my ($self, $dbname, $query, @columns) = @_; croak 'calls in array context for multi-row results not supported yet' if (wantarray); # Replace __COLUMNS__ if found substr($query, index($query, '__COLUMNS__'), length('__COLUMNS__')) = join(', ', @columns) if index($query, '__COLUMNS__') >= 0; my $result = $self->safe_psql($dbname, $query); # hash slice, see http://stackoverflow.com/a/16755894/398670 . # # Fills the hash with empty strings produced by x-operator element # duplication if result is an empty row # my %val; @val{@columns} = $result ne '' ? split(qr/\|/, $result, -1) : ('',) x scalar(@columns); return \%val; } =pod =item $node->slot(slot_name) Return hash-ref of replication slot data for the named slot, or a hash-ref with all values '' if not found. Does not differentiate between null and empty string for fields, no field is ever undef. The restart_lsn and confirmed_flush_lsn fields are returned verbatim, and also as a 2-list of [highword, lowword] integer. Since we rely on Perl 5.8.8 we can't "use bigint", it's from 5.20, and we can't assume we have Math::Bigint from CPAN either. =cut sub slot { my ($self, $slot_name) = @_; my @columns = ( 'plugin', 'slot_type', 'datoid', 'database', 'active', 'active_pid', 'xmin', 'catalog_xmin', 'restart_lsn'); return $self->query_hash( 'postgres', "SELECT __COLUMNS__ FROM pg_catalog.pg_replication_slots WHERE slot_name = '$slot_name'", @columns); } =pod =item $node->pg_recvlogical_upto(self, dbname, slot_name, endpos, timeout_secs, ...) Invoke pg_recvlogical to read from slot_name on dbname until LSN endpos, which corresponds to pg_recvlogical --endpos. Gives up after timeout (if nonzero). Disallows pg_recvlogical from internally retrying on error by passing --no-loop. Plugin options are passed as additional keyword arguments. If called in scalar context, returns stdout, and die()s on timeout or nonzero return. If called in array context, returns a tuple of (retval, stdout, stderr, timeout). 
timeout is the IPC::Run::Timeout object whose is_expired method can be tested to check for timeout. retval is undef on timeout. =cut sub pg_recvlogical_upto { my ($self, $dbname, $slot_name, $endpos, $timeout_secs, %plugin_options) = @_; local %ENV = $self->_get_env(); my ($stdout, $stderr); my $timeout_exception = 'pg_recvlogical timed out'; croak 'slot name must be specified' unless defined($slot_name); croak 'endpos must be specified' unless defined($endpos); my @cmd = ( $self->installed_command('pg_recvlogical'), '-S', $slot_name, '--dbname', $self->connstr($dbname)); push @cmd, '--endpos', $endpos; push @cmd, '-f', '-', '--no-loop', '--start'; while (my ($k, $v) = each %plugin_options) { croak "= is not permitted to appear in replication option name" if ($k =~ qr/=/); push @cmd, "-o", "$k=$v"; } my $timeout; $timeout = IPC::Run::timeout($timeout_secs, exception => $timeout_exception) if $timeout_secs; my $ret = 0; do { local $@; eval { IPC::Run::run(\@cmd, ">", \$stdout, "2>", \$stderr, $timeout); $ret = $?; }; my $exc_save = $@; if ($exc_save) { # IPC::Run::run threw an exception. re-throw unless it's a # timeout, which we'll handle by testing is_expired die $exc_save if (blessed($exc_save) || $exc_save !~ qr/$timeout_exception/); $ret = undef; die "Got timeout exception '$exc_save' but timer not expired?!" unless $timeout->is_expired; die "$exc_save waiting for endpos $endpos with stdout '$stdout', stderr '$stderr'" unless wantarray; } }; $stdout =~ s/\r\n/\n/g if $Config{osname} eq 'msys'; $stderr =~ s/\r\n/\n/g if $Config{osname} eq 'msys'; if (wantarray) { return ($ret, $stdout, $stderr, $timeout); } else { die "pg_recvlogical exited with code '$ret', stdout '$stdout' and stderr '$stderr'" if $ret; return $stdout; } } =pod =back =cut ########################################################################## # # Subclasses. # # There should be a subclass for each old version supported. The newest # (i.e. 
the one for the latest stable release) should inherit from the # PostgresNode class. Each other subclass should inherit from the subclass # representing the immediately succeeding stable release. # # The name must be PostgresNodeV_nn{_nn} where V_nn{_nn} corresponds to the # release number (e.g. V_12 for release 12 or V_9_6 for release 9.6.) # PostgresNode knows about this naming convention and blesses each node # into the appropriate subclass. # # Each time a new stable release branch is made a subclass should be added # that inherits from PostgresNode, and be made the parent of the previous # subclass that inherited from PostgresNode. # # An empty package means that there are no differences that need to be # handled between this release and the later release. # ########################################################################## package PostgresNodeV_16; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNode); # https://www.postgresql.org/docs/16/release-16.html ########################################################################## package PostgresNodeV_15; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_16); # https://www.postgresql.org/docs/15/release-15.html ########################################################################## package PostgresNodeV_14; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_15); # Internal method to set the stats_temp_directory GUC sub set_stats_temp_directory { my ($self, $conf) = @_; # XXX Neutralize any stats_temp_directory in TEMP_CONFIG. Nodes running # concurrently must not share a stats_temp_directory. 
print $conf "stats_temp_directory = 'pg_stat_tmp'\n"; } # https://www.postgresql.org/docs/14/release-14.html ########################################################################## package PostgresNodeV_13; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_14); # https://www.postgresql.org/docs/13/release-13.html ########################################################################## package PostgresNodeV_12; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_13); # https://www.postgresql.org/docs/12/release-12.html ########################################################################## package PostgresNodeV_11; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_12); # https://www.postgresql.org/docs/11/release-11.html # max_wal_senders + superuser_reserved_connections must be < max_connections # uses recovery.conf sub _recovery_file { return "recovery.conf"; } sub set_standby_mode { my $self = shift; $self->append_conf("recovery.conf", "standby_mode = on\n"); } sub init { my ($self, %params) = @_; $self->SUPER::init(%params); $self->adjust_conf('postgresql.conf', 'max_wal_senders', $params{allows_streaming} ? 
5 : 0); } ########################################################################## package PostgresNodeV_10; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_11); # https://www.postgresql.org/docs/10/release-10.html ########################################################################## package PostgresNodeV_9_6; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_10); # https://www.postgresql.org/docs/9.6/release-9-6.html # no -no-sync option for pg_basebackup # replication conf is a bit different too # lsn function names are different sub _backup_sync { return (); } sub set_replication_conf { my ($self) = @_; my $pgdata = $self->data_dir; $self->host eq $test_pghost or die "set_replication_conf only works with the default host"; open my $hba, ">>$pgdata/pg_hba.conf"; print $hba "\n# Allow replication (set up by PostgresNode.pm)\n"; if (!$TestLib::windows_os) { print $hba "local replication all trust\n"; } else { print $hba "host replication all $test_localhost/32 sspi include_realm=1 map=regress\n"; } close $hba; } sub _lsn_mode_map { return ( 'insert' => 'pg_current_xlog_insert_location()', 'flush' => 'pg_current_xlog_flush_location()', 'write' => 'pg_current_xlog_location()', 'receive' => 'pg_last_xlog_receive_location()', 'replay' => 'pg_last_xlog_replay_location()'); } sub _replication_suffix { return "_location"; } ########################################################################## package PostgresNodeV_9_5; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_9_6); # https://www.postgresql.org/docs/9.5/release-9-5.html # no wal_level = replica sub init { my ($self, %params) = @_; $self->SUPER::init(%params); $self->adjust_conf('postgresql.conf', 'wal_level', 'hot_standby') if $params{allows_streaming}; } ########################################################################## package PostgresNodeV_9_4; ## no critic (ProhibitMultiplePackages) use Test::More; 
use parent -norequire, qw(PostgresNodeV_9_5); # https://www.postgresql.org/docs/9.4/release-9-4.html # no log_replication_commands # no wal_retrieve_retry_interval # no cluster_name sub init { my ($self, %params) = @_; $self->SUPER::init(%params); $self->adjust_conf('postgresql.conf', 'log_replication_commands', undef); $self->adjust_conf('postgresql.conf', 'wal_retrieve_retry_interval', undef); $self->adjust_conf('postgresql.conf', 'max_wal_size', undef); } sub _cluster_name_opt { return (); } ########################################################################## package PostgresNodeV_9_3; ## no critic (ProhibitMultiplePackages) use Test::More; use parent -norequire, qw(PostgresNodeV_9_4); # https://www.postgresql.org/docs/9.3/release-9-3.html # no logical replication, so no logical streaming sub init { my ($self, %params) = @_; $self->SUPER::init(%params); $self->adjust_conf('postgresql.conf', 'max_replication_slots', undef); $self->adjust_conf('postgresql.conf', 'wal_log_hints', undef); } sub _init_streaming { my ($self, $conf, $allows_streaming) = @_; BAIL_OUT("Server Version too old for logical replication") if ($allows_streaming eq "logical"); $self->SUPER::_init_streaming($conf, $allows_streaming); } ########################################################################## package PostgresNodeV_9_2; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_9_3); # https://www.postgresql.org/docs/9.3/release-9-2.html # no -N flag to initdb # socket location is in unix_socket_directory sub _initdb_flags { return ('-A', 'trust'); } sub _init_network { my ($self, $conf, $use_tcp, $host) = @_; if ($use_tcp) { print $conf "unix_socket_directory = ''\n"; print $conf "listen_addresses = '$host'\n"; } else { print $conf "unix_socket_directory = '$host'\n"; print $conf "listen_addresses = ''\n"; } } sub _init_network_append { my ($self, $use_tcp, $host) = @_; if ($use_tcp) { $self->append_conf('postgresql.conf', "listen_addresses = 
'$host'"); } else { $self->append_conf('postgresql.conf', "unix_socket_directory = '$host'"); } } ########################################################################## package PostgresNodeV_9_1; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_9_2); # https://www.postgresql.org/docs/9.3/release-9-1.html ########################################################################## package PostgresNodeV_9_0; ## no critic (ProhibitMultiplePackages) use Test::More; use parent -norequire, qw(PostgresNodeV_9_1); # https://www.postgresql.org/docs/9.3/release-9-0.html # no wal_senders setting # no pg_basebackup # can't turn off restart after crash sub init { my ($self, @args) = @_; $self->SUPER::init(@args); $self->adjust_conf('postgresql.conf', 'restart_after_crash', undef); $self->adjust_conf('postgresql.conf', 'wal_senders', undef); } sub _init_restart_after_crash { return ""; } sub backup { BAIL_OUT("Server version too old for backup function"); } sub init_from_backup { BAIL_OUT("Server version too old for init_from_backup function"); } ########################################################################## package PostgresNodeV_8_4; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_9_0); # https://www.postgresql.org/docs/9.3/release-8-4.html # no wal_level setting # no streaming sub _init_wal_level_minimal { # do nothing } sub _init_streaming { my ($self, $conf, $allows_streaming) = @_; BAIL_OUT("Server Version too old for streaming replication"); } ########################################################################## package PostgresNodeV_8_3; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_8_4); # https://www.postgresql.org/docs/9.3/release-8-3.html # no stats_temp_directory setting # no -w flag for psql sub init { my ($self, @args) = @_; $self->SUPER::init(@args); $self->adjust_conf('postgresql.conf', 'stats_temp_directory', undef); } sub psql { my ($self, 
$dbname, $sql, %params) = @_; local $ENV{PGPASSWORD}; if ($params{no_password}) { # since there is no -w flag for psql here, we try to # inhibit a password prompt by setting PGPASSWORD instead $ENV{PGPASSWORD} = 'no_such_password_12345'; delete $params{no_password}; } $self->SUPER::psql($dbname, $sql, %params); } sub interactive_psql { my ($self, $dbname, $stdin, $stdout, $timer, %params) = @_; local %ENV = $self->_get_env(); my @psql_params = ( $self->installed_command('psql'), '-XAt', '-p', $self->port, '-h', $self->host, '-d', $dbname ); push @psql_params, @{ $params{extra_params} } if defined $params{extra_params}; # Ensure there is no data waiting to be sent: $$stdin = "" if ref($stdin); # IPC::Run would otherwise append to existing contents: $$stdout = "" if ref($stdout); my $harness = IPC::Run::start \@psql_params, '<pty<', $stdin, '>pty>', $stdout, $timer; # Pump until we see psql's help banner. This ensures that callers # won't write anything to the pty before it's ready, avoiding an # implementation issue in IPC::Run. Also, it means that psql # connection failures are caught here, relieving callers of # the need to handle those. (Right now, we have no particularly # good handling for errors anyway, but that might be added later.) 
pump $harness until $$stdout =~ /\\q to quit/ || $timer->is_expired; die "psql startup timed out" if $timer->is_expired; return $harness; } ########################################################################## package PostgresNodeV_8_2; ## no critic (ProhibitMultiplePackages) use Test::More; use parent -norequire, qw(PostgresNodeV_8_3); use Time::HiRes qw(usleep); use Config; # https://www.postgresql.org/docs/9.3/release-8-2.html # no support for connstr with = sub psql { my ($self, $dbname, $sql, %params) = @_; my $connstr = $params{connstr}; BAIL_OUT("Server version too old: complex connstr with = not supported") if (defined($connstr) && $connstr =~ /=/); # Handle the simple common case where there's no explicit connstr $params{host} ||= $self->host; $params{port} ||= $self->port; # Supply this so the superclass doesn't try to construct a connstr $params{connstr} ||= $dbname; $self->SUPER::psql($dbname, $sql, %params); } sub poll_query_until { my ($self, $dbname, $query, $expected) = @_; local %ENV = $self->_get_env(); $expected = 't' unless defined($expected); # default value # my $cmd = [ # $self->installed_command('psql'), # '-XAt', '-c', $query, '-d', $self->connstr($dbname) # ]; my ($stdout, $stderr); my $max_attempts = 180 * 10; my $attempts = 0; while ($attempts < $max_attempts) { my $result = $self->psql($dbname, $query, stdout => \$stdout, stderr => \$stderr ); $stdout =~ s/\r\n/\n/g if $Config{osname} eq 'msys'; chomp($stdout); if ($stdout eq $expected) { return 1; } # Wait 0.1 second before retrying. usleep(100_000); $attempts++; } # The query result didn't change in 180 seconds. Give up. Print the # output from the last attempt, hopefully that's useful for debugging. 
$stderr =~ s/\r\n/\n/g if $Config{osname} eq 'msys'; chomp($stderr); diag qq(poll_query_until timed out executing this query: $query expecting this output: $expected last actual query output: $stdout with stderr: $stderr); return 0; } # Internal routine to enable archiving sub enable_archiving { my ($self) = @_; $self->SUPER::enable_archiving; # Remove non existing archive_mode $self->adjust_conf( 'postgresql.conf', 'archive_mode', undef ); return; } ########################################################################## package PostgresNodeV_8_1; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_8_2); # https://www.postgresql.org/docs/9.3/release-8-1.html ########################################################################## package PostgresNodeV_8_0; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_8_1); # https://www.postgresql.org/docs/9.3/release-8-0.html ########################################################################## package PostgresNodeV_7_4; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_8_0); # https://www.postgresql.org/docs/9.3/release-7-4.html # no '-A trust' for initdb # no log_line_prefix # no 'log_statement = all' (only 'on') # no listen_addresses - use tcpip_socket and virtual_host instead # no archiving sub _initdb_flags { return (); } sub init { my ($self, @args) = @_; $self->SUPER::init(@args); $self->adjust_conf('postgresql.conf', 'log_line_prefix', undef); $self->adjust_conf('postgresql.conf', 'log_statement', 'on'); } sub _init_network { my ($self, $conf, $use_tcp, $host) = @_; if ($use_tcp) { print $conf "unix_socket_directory = ''\n"; print $conf "virtual_host = '$host'\n"; print $conf "tcpip_socket = true\n"; } else { print $conf "unix_socket_directory = '$host'\n"; } } ########################################################################## package PostgresNodeV_7_3; ## no critic (ProhibitMultiplePackages) use parent 
-norequire, qw(PostgresNodeV_7_4); # https://www.postgresql.org/docs/9.3/release-7-3.html ########################################################################## package PostgresNodeV_7_2; ## no critic (ProhibitMultiplePackages) use parent -norequire, qw(PostgresNodeV_7_3); # https://www.postgresql.org/docs/9.3/release-7-2.html # no log_statement sub init { my ($self, @args) = @_; $self->SUPER::init(@args); $self->adjust_conf('postgresql.conf', 'log_statement', undef); } ########################################################################## # traditional module 'value' 1; ############################################################################ # # PostgresVersion.pm # # Module encapsulating Postgres Version numbers # # Copyright (c) 2021, PostgreSQL Global Development Group # ############################################################################ =pod =head1 NAME PostgresVersion - class representing PostgreSQL version numbers =head1 SYNOPSIS use PostgresVersion; my $version = PostgresVersion->new($version_arg); # compare two versions my $bool = $version1 <= $version2; # or compare with a number $bool = $version < 12; # or with a string $bool = $version lt "13.1"; # interpolate in a string my $stringyval = "version: $version"; # get the major version my $maj = $version->major; =head1 DESCRIPTION PostgresVersion encapsulates Postgres version numbers, providing parsing of common version formats and comparison
operations. =cut package PostgresVersion; use strict; use warnings; use Scalar::Util qw(blessed); use overload '<=>' => \&_version_cmp, 'cmp' => \&_version_cmp, '""' => \&_stringify; =pod =head1 METHODS =over =item PostgresVersion->new($version) Create a new PostgresVersion instance. The argument can be a number like 12, or a string like '12.2' or the output of a Postgres command like `psql --version` or `pg_config --version`; =back =cut sub new { my $class = shift; my $arg = shift; chomp $arg; # Accept standard formats, in case caller has handed us the output of a # postgres command line tool my $devel; ($arg, $devel) = ($1, $2) if ( $arg =~ m!^ # beginning of line (?:\(?PostgreSQL\)?\s)? # ignore PostgreSQL marker (\d+(?:\.\d+)*) # version number, dotted notation (devel|(?:alpha|beta|rc)\d+)? # dev marker - see version_stamp.pl !x); # Split into an array my @numbers = split(/\./, $arg); # Treat development versions as having a minor/micro version one less than # the first released version of that branch. push @numbers, -1 if ($devel); $numbers[$_] += 0 for 0..3; $devel ||= ""; return bless { str => "$arg$devel", num => \@numbers }, $class; } # Routine which compares the _pg_version_array obtained for the two # arguments and returns -1, 0, or 1, allowing comparison between two # PostgresVersion objects or a PostgresVersion and a version string or number. # # If the second argument is not a blessed object we call the constructor # to make one. # # Because we're overloading '<=>' and 'cmp' this function supplies us with # all the comparison operators ('<' and friends, 'gt' and friends) # sub _version_cmp { my ($a, $b, $swapped) = @_; $b = __PACKAGE__->new($b) unless blessed($b); ($a, $b) = ($b, $a) if $swapped; my ($an, $bn) = ($a->{num}, $b->{num}); for (my $idx = 0;; $idx++) { return 0 unless (defined $an->[$idx] && defined $bn->[$idx]); return $an->[$idx] <=> $bn->[$idx] if ($an->[$idx] <=> $bn->[$idx]); } } # Render the version number using the saved string. 
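The constructor and `_version_cmp` above cooperate to get development versions ordered correctly: `new` appends `-1` for a `devel`/`alpha`/`beta`/`rc` suffix and zero-pads the number array to four parts, and the comparison then walks the arrays element by element, so `14devel` sorts just below `14.0`. A standalone sketch of that ordering logic, with hypothetical `parse_version`/`version_cmp` helpers that are not part of the module itself:

```perl
use strict;
use warnings;

# Sketch of PostgresVersion's ordering rules, outside the class: parse a
# plain version string (not full "psql --version" output) into numbers,
# append -1 for a devel/alpha/beta/rc suffix so that "14devel" sorts
# before "14.0", and zero-pad to four parts as the constructor does.
sub parse_version {
    my ($arg) = @_;
    my ($num, $devel) =
        $arg =~ /^(\d+(?:\.\d+)*)(devel|(?:alpha|beta|rc)\d+)?/;
    my @numbers = split(/\./, $num);
    push @numbers, -1 if $devel;
    $numbers[$_] += 0 for 0 .. 3;    # pad missing parts with zeros
    return \@numbers;
}

# Element-wise numeric comparison, stopping at the first difference;
# this mirrors the loop in _version_cmp.
sub version_cmp {
    my ($x, $y) = (parse_version($_[0]), parse_version($_[1]));
    for (my $idx = 0;; $idx++) {
        return 0 unless defined $x->[$idx] && defined $y->[$idx];
        return $x->[$idx] <=> $y->[$idx] if $x->[$idx] <=> $y->[$idx];
    }
}
```

Under these rules `version_cmp('14devel', '14.0')` is negative, and `version_cmp('9.6.2', '9.6.10')` compares numerically rather than lexically.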
sub _stringify { my $self = shift; return $self->{str}; } =pod =over =item major([separator => 'char']) Returns the major version. For versions before 10 the parts are separated by a dot unless the separator argument is given. =back =cut sub major { my ($self, %params) = @_; my $result = $self->{num}->[0]; if ($result + 0 < 10) { my $sep = $params{separator} || '.'; $result .= "$sep$self->{num}->[1]"; } return $result; } 1; # Copyright (c) 2021, PostgreSQL Global Development Group =pod =head1 NAME RecursiveCopy - simple recursive copy implementation =head1 SYNOPSIS use RecursiveCopy; RecursiveCopy::copypath($from, $to, filterfn => sub { return 1; }); RecursiveCopy::copypath($from, $to); =cut package RecursiveCopy; use strict; use warnings; use Carp; use File::Basename; use File::Copy; =pod =head1 DESCRIPTION =head2 copypath($from, $to, %params) Recursively copy all files and directories from $from to $to. Does not preserve file metadata (e.g., permissions). Only regular files and subdirectories are copied. Trying to copy other types of directory entries raises an exception. Raises an exception if a file would be overwritten, the source directory can't be read, or any I/O operation fails. However, we silently ignore ENOENT on open, because when copying from a live database it's possible for a file/dir to be deleted after we see its directory entry but before we can open it.
Always returns true. If the B<filterfn> parameter is given, it must be a subroutine reference. This subroutine will be called for each entry in the source directory with its relative path as only parameter; if the subroutine returns true the entry is copied, otherwise the file is skipped. On failure the target directory may be in some incomplete state; no cleanup is attempted. =head1 EXAMPLES RecursiveCopy::copypath('/some/path', '/empty/dir', filterfn => sub { # omit log/ and contents my $src = shift; return $src ne 'log'; } ); =cut sub copypath { my ($base_src_dir, $base_dest_dir, %params) = @_; my $filterfn; if (defined $params{filterfn}) { croak "if specified, filterfn must be a subroutine reference" unless defined(ref $params{filterfn}) and (ref $params{filterfn} eq 'CODE'); $filterfn = $params{filterfn}; } else { $filterfn = sub { return 1; }; } # Complain if original path is bogus, because _copypath_recurse won't. croak "\"$base_src_dir\" does not exist" if !-e $base_src_dir; # Start recursive copy from current directory return _copypath_recurse($base_src_dir, $base_dest_dir, "", $filterfn); } # Recursive private guts of copypath sub _copypath_recurse { my ($base_src_dir, $base_dest_dir, $curr_path, $filterfn) = @_; my $srcpath = "$base_src_dir/$curr_path"; my $destpath = "$base_dest_dir/$curr_path"; # invoke the filter and skip all further operation if it returns false return 1 unless &$filterfn($curr_path); # Check for symlink -- needed only on source dir # (note: this will fall through quietly if file is already gone) croak "Cannot operate on symlink \"$srcpath\"" if -l $srcpath; # Abort if destination path already exists. Should we allow directories # to exist already? croak "Destination path \"$destpath\" already exists" if -e $destpath; # If this source path is a file, simply copy it to destination with the # same name and we're done. 
if (-f $srcpath) { my $fh; unless (open($fh, '<', $srcpath)) { return 1 if ($!{ENOENT}); die "open($srcpath) failed: $!"; } copy($fh, $destpath) or die "copy $srcpath -> $destpath failed: $!"; close $fh; return 1; } # If it's a directory, create it on dest and recurse into it. if (-d $srcpath) { my $directory; unless (opendir($directory, $srcpath)) { return 1 if ($!{ENOENT}); die "opendir($srcpath) failed: $!"; } mkdir($destpath) or die "mkdir($destpath) failed: $!"; while (my $entry = readdir($directory)) { next if ($entry eq '.' or $entry eq '..'); _copypath_recurse($base_src_dir, $base_dest_dir, $curr_path eq '' ? $entry : "$curr_path/$entry", $filterfn) or die "copypath $srcpath/$entry -> $destpath/$entry failed"; } closedir($directory); return 1; } # If it disappeared from sight, that's OK. return 1 if !-e $srcpath; # Else it's some weird file type; complain. croak "Source path \"$srcpath\" is not a regular file or directory"; } 1; # Copyright (c) 2021, PostgreSQL Global Development Group # A simple 'tee' implementation, using perl tie. # # Whenever you print to the handle, it gets forwarded to a list of # handles. The list of output filehandles is passed to the constructor. # # This is similar to IO::Tee, but only used for output. Only the PRINT # method is currently implemented; that's all we need.
We don't want to # depend on IO::Tee just for this. package SimpleTee; use strict; use warnings; sub TIEHANDLE { my $self = shift; return bless \@_, $self; } sub PRINT { my $self = shift; my $ok = 1; for my $fh (@$self) { print $fh @_ or $ok = 0; $fh->flush or $ok = 0; } return $ok; } 1; # Copyright (c) 2021, PostgreSQL Global Development Group =pod =head1 NAME TestLib - helper module for writing PostgreSQL's C<prove> tests.
=head1 SYNOPSIS use TestLib; # Test basic output of a command program_help_ok('initdb'); program_version_ok('initdb'); program_options_handling_ok('initdb'); # Test option combinations command_fails(['initdb', '--invalid-option'], 'command fails with invalid option'); my $tempdir = TestLib::tempdir; command_ok('initdb', '-D', $tempdir); # Miscellanea print "on Windows" if $TestLib::windows_os; my $path = TestLib::perl2host($backup_dir); ok(check_mode_recursive($stream_dir, 0700, 0600), "check stream dir permissions"); TestLib::system_log('pg_ctl', 'kill', 'QUIT', $slow_pid); =head1 DESCRIPTION C<TestLib> contains a set of routines dedicated to environment setup for a PostgreSQL regression test run and includes some low-level routines aimed at controlling command execution, logging and test functions. =cut # This module should never depend on any other PostgreSQL regression test # modules. package TestLib; use strict; use warnings; use Carp; use Config; use Cwd; use Exporter 'import'; use Fcntl qw(:mode :seek); use File::Basename; use File::Find; use File::Spec; use File::stat qw(stat); use File::Temp (); use IPC::Run; use SimpleTee; # specify a recent enough version of Test::More to support the # done_testing() function use Test::More 0.87; our @EXPORT = qw( generate_ascii_string slurp_dir slurp_file append_to_file check_mode_recursive chmod_recursive check_pg_config dir_symlink system_or_bail system_log run_log run_command command_ok command_fails command_exit_is program_help_ok program_version_ok program_options_handling_ok command_like command_like_safe command_fails_like command_checks_all $windows_os $is_msys2 $use_unix_sockets ); our ($windows_os, $is_msys2, $use_unix_sockets, $tmp_check, $log_path, $test_logfile); BEGIN { # Set to untranslated messages, to be able to compare program output # with expected strings. delete $ENV{LANGUAGE}; delete $ENV{LC_ALL}; $ENV{LC_MESSAGES} = 'C'; # This list should be kept in sync with pg_regress.c. 
my @envkeys = qw ( PGCHANNELBINDING PGCLIENTENCODING PGCONNECT_TIMEOUT PGDATA PGDATABASE PGGSSENCMODE PGGSSLIB PGHOSTADDR PGKRBSRVNAME PGPASSFILE PGPASSWORD PGREQUIREPEER PGREQUIRESSL PGSERVICE PGSERVICEFILE PGSSLCERT PGSSLCRL PGSSLCRLDIR PGSSLKEY PGSSLMAXPROTOCOLVERSION PGSSLMINPROTOCOLVERSION PGSSLMODE PGSSLROOTCERT PGSSLSNI PGTARGETSESSIONATTRS PGUSER PGPORT PGHOST PG_COLOR ); delete @ENV{@envkeys}; $ENV{PGAPPNAME} = basename($0); # Must be set early $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys'; # Check if this environment is MSYS2. $is_msys2 = $^O eq 'msys' && `uname -or` =~ /^[2-9].*Msys/; if ($windows_os) { require Win32API::File; Win32API::File->import( qw(createFile OsFHandleOpen CloseHandle setFilePointer)); } # Specifies whether to use Unix sockets for test setups. On # Windows we don't use them by default since it's not universally # supported, but it can be overridden if desired. $use_unix_sockets = (!$windows_os || defined $ENV{PG_TEST_USE_UNIX_SOCKETS}); } =pod =head1 EXPORTED VARIABLES =over =item C<$windows_os> Set to true when running under Windows, except on Cygwin. =item C<$is_msys2> Set to true when running under MSYS2. =back =cut INIT { # Return EPIPE instead of killing the process with SIGPIPE. An affected # test may still fail, but it's more likely to report useful facts. $SIG{PIPE} = 'IGNORE'; # Determine output directories, and create them. The base path is the # TESTDIR environment variable, which is normally set by the invoking # Makefile. $tmp_check = $ENV{TESTDIR} ? "$ENV{TESTDIR}/tmp_check" : "tmp_check"; $log_path = "$tmp_check/log"; mkdir $tmp_check; mkdir $log_path; # Open the test log file, whose name depends on the test name. 
$test_logfile = basename($0); $test_logfile =~ s/\.[^.]+$//; $test_logfile = "$log_path/regress_log_$test_logfile"; open my $testlog, '>', $test_logfile or die "could not open STDOUT to logfile \"$test_logfile\": $!"; # Hijack STDOUT and STDERR to the log file open(my $orig_stdout, '>&', \*STDOUT); open(my $orig_stderr, '>&', \*STDERR); open(STDOUT, '>&', $testlog); open(STDERR, '>&', $testlog); # The test output (ok ...) needs to be printed to the original STDOUT so # that the 'prove' program can parse it, and display it to the user in # real time. But also copy it to the log file, to provide more context # in the log. my $builder = Test::More->builder; my $fh = $builder->output; tie *$fh, "SimpleTee", $orig_stdout, $testlog; $fh = $builder->failure_output; tie *$fh, "SimpleTee", $orig_stderr, $testlog; # Enable auto-flushing for all the file handles. Stderr and stdout are # redirected to the same file, and buffering causes the lines to appear # in the log in confusing order. autoflush STDOUT 1; autoflush STDERR 1; autoflush $testlog 1; } END { # Test files have several ways of causing prove_check to fail: # 1. Exit with a non-zero status. # 2. Call ok(0) or similar, indicating that a constituent test failed. # 3. Deviate from the planned number of tests. # # Preserve temporary directories after (1) and after (2). $File::Temp::KEEP_ALL = 1 unless $? == 0 && all_tests_passing(); } =pod =head1 ROUTINES =over =item all_tests_passing() Return 1 if all the tests run so far have passed. Otherwise, return 0. =cut sub all_tests_passing { foreach my $status (Test::More->builder->summary) { return 0 unless $status; } return 1; } =pod =item tempdir(prefix) Securely create a temporary directory inside C<$tmp_check>, like C<mkdtemp>, and return its name. The directory will be removed automatically at the end of the tests. If C<prefix> is given, the new directory is templated as C<${prefix}_XXXX>. Otherwise the template is C<tmp_test_XXXX>. 
=cut sub tempdir { my ($prefix) = @_; $prefix = "tmp_test" unless defined $prefix; return File::Temp::tempdir( $prefix . '_XXXX', DIR => $tmp_check, CLEANUP => 1); } =pod =item tempdir_short() As above, but the directory is outside the build tree so that it has a short name, to avoid path length issues. =cut sub tempdir_short { return File::Temp::tempdir(CLEANUP => 1); } =pod =item perl2host() Translate a virtual file name to a host file name. Currently, this is a no-op except for the case of Perl=msys and host=mingw32. The subject need not exist, but its parent or grandparent directory must exist unless cygpath is available. =cut sub perl2host { my ($subject) = @_; return $subject unless $Config{osname} eq 'msys'; if ($is_msys2) { # get absolute, windows type path my $path = qx{cygpath -a -w "$subject"}; if (!$?) { chomp $path; return $path if $path; } # fall through if this didn't work. } my $here = cwd; my $leaf; if (chdir $subject) { $leaf = ''; } else { $leaf = '/' . basename $subject; my $parent = dirname $subject; if (!chdir $parent) { $leaf = '/' . basename($parent) . $leaf; $parent = dirname $parent; chdir $parent or die "could not chdir \"$parent\": $!"; } } # this odd way of calling 'pwd -W' is the only way that seems to work. my $dir = qx{sh -c "pwd -W"}; chomp $dir; chdir $here; return $dir . $leaf; } =pod =item system_log(@cmd) Run (via C<system()>) the command passed as argument; the return value is passed through. =cut sub system_log { print("# Running: " . join(" ", @_) . "\n"); return system(@_); } =pod =item system_or_bail(@cmd) Run (via C<system()>) the command passed as argument, and returns if the command is successful. On failure, abandon further tests and exit the program. =cut sub system_or_bail { if (system_log(@_) != 0) { BAIL_OUT("system $_[0] failed"); } return; } =pod =item run_log(@cmd) Run the given command via C<IPC::Run::run()>, noting it in the log. The return value from the command is passed through. 
=cut sub run_log { print("# Running: " . join(" ", @{ $_[0] }) . "\n"); return IPC::Run::run(@_); } =pod =item run_command(cmd) Run (via C<IPC::Run::run()>) the command passed as argument. The return value from the command is ignored. The return value is C<($stdout, $stderr)>. =cut sub run_command { my ($cmd) = @_; my ($stdout, $stderr); my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr; chomp($stdout); chomp($stderr); return ($stdout, $stderr); } =pod =item generate_ascii_string(from_char, to_char) Generate a string made of the given range of ASCII characters. =cut sub generate_ascii_string { my ($from_char, $to_char) = @_; my $res; for my $i ($from_char .. $to_char) { $res .= sprintf("%c", $i); } return $res; } =pod =item slurp_dir(dir) Return the complete list of entries in the specified directory. =cut sub slurp_dir { my ($dir) = @_; opendir(my $dh, $dir) or croak "could not opendir \"$dir\": $!"; my @direntries = readdir $dh; closedir $dh; return @direntries; } =pod =item slurp_file(filename [, $offset]) Return the full contents of the specified file, beginning from an offset position if specified. 
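On non-Windows systems, `slurp_file` amounts to "clear `$/`, optionally seek, read once"; the offset form is what lets a test read only what a server appended to a log file since a previously recorded position. A minimal sketch of that pattern, using a hypothetical `slurp_from` helper and omitting the Windows sharing-mode branch:

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Read a whole file in one gulp; with an offset, start reading there.
# Clearing $/ (the input record separator) makes <$fh> return everything
# up to EOF instead of a single line.
sub slurp_from {
    my ($filename, $offset) = @_;
    local $/;
    open(my $in, '<', $filename)
        or die "could not read \"$filename\": $!";
    if (defined $offset) {
        seek($in, $offset, SEEK_SET)
            or die "could not seek \"$filename\": $!";
    }
    my $contents = <$in>;
    close $in;
    return $contents;
}
```

A caller can record `-s $logfile` before an action and later pass that size as the offset to see only the output produced since.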
=cut sub slurp_file { my ($filename, $offset) = @_; local $/; my $contents; if ($Config{osname} ne 'MSWin32') { open(my $in, '<', $filename) or croak "could not read \"$filename\": $!"; if (defined($offset)) { seek($in, $offset, SEEK_SET) or croak "could not seek \"$filename\": $!"; } $contents = <$in>; close $in; } else { my $fHandle = createFile($filename, "r", "rwd") or croak "could not open \"$filename\": $^E"; OsFHandleOpen(my $fh = IO::Handle->new(), $fHandle, 'r') or croak "could not read \"$filename\": $^E\n"; if (defined($offset)) { setFilePointer($fh, $offset, qw(FILE_BEGIN)) or croak "could not seek \"$filename\": $^E\n"; } $contents = <$fh>; CloseHandle($fHandle) or croak "could not close \"$filename\": $^E\n"; } $contents =~ s/\r\n/\n/g if $Config{osname} eq 'msys'; return $contents; } =pod =item append_to_file(filename, str) Append a string at the end of a given file. (Note: no newline is appended at end of file.) =cut sub append_to_file { my ($filename, $str) = @_; open my $fh, ">>", $filename or croak "could not write \"$filename\": $!"; print $fh $str; close $fh; return; } =pod =item check_mode_recursive(dir, expected_dir_mode, expected_file_mode, ignore_list) Check that all file/dir modes in a directory match the expected values, ignoring files in C<ignore_list> (basename only). =cut sub check_mode_recursive { my ($dir, $expected_dir_mode, $expected_file_mode, $ignore_list) = @_; # Result defaults to true my $result = 1; find( { follow_fast => 1, wanted => sub { # Is file in the ignore list? foreach my $ignore ($ignore_list ? @{$ignore_list} : []) { if ("$dir/$ignore" eq $File::Find::name) { return; } } # Allow ENOENT. A running server can delete files, such as # those in pg_stat. Other stat() failures are fatal. 
my $file_stat = stat($File::Find::name); unless (defined($file_stat)) { my $is_ENOENT = $!{ENOENT}; my $msg = "unable to stat $File::Find::name: $!"; if ($is_ENOENT) { warn $msg; return; } else { die $msg; } } my $file_mode = S_IMODE($file_stat->mode); # Is this a file? if (S_ISREG($file_stat->mode)) { if ($file_mode != $expected_file_mode) { print( *STDERR, sprintf("$File::Find::name mode must be %04o\n", $expected_file_mode)); $result = 0; return; } } # Else a directory? elsif (S_ISDIR($file_stat->mode)) { if ($file_mode != $expected_dir_mode) { print( *STDERR, sprintf("$File::Find::name mode must be %04o\n", $expected_dir_mode)); $result = 0; return; } } # Else something we can't handle else { die "unknown file type for $File::Find::name"; } } }, $dir); return $result; } =pod =item chmod_recursive(dir, dir_mode, file_mode) C<chmod> recursively each file and directory within the given directory. =cut sub chmod_recursive { my ($dir, $dir_mode, $file_mode) = @_; find( { follow_fast => 1, wanted => sub { my $file_stat = stat($File::Find::name); if (defined($file_stat)) { chmod( S_ISDIR($file_stat->mode) ? $dir_mode : $file_mode, $File::Find::name ) or die "unable to chmod $File::Find::name"; } } }, $dir); return; } =pod =item check_pg_config(regexp) Return the number of matches of the given regular expression within the installation's C<pg_config.h>. =cut sub check_pg_config { my ($regexp) = @_; my ($stdout, $stderr); my $result = IPC::Run::run [ 'pg_config', '--includedir' ], '>', \$stdout, '2>', \$stderr or die "could not execute pg_config"; chomp($stdout); $stdout =~ s/\r$//; open my $pg_config_h, '<', "$stdout/pg_config.h" or die "$!"; my $match = (grep { /^$regexp/ } <$pg_config_h>); close $pg_config_h; return $match; } =pod =item dir_symlink(oldname, newname) Portably create a symlink for a directory. On Windows this creates a junction point. Elsewhere it just calls perl's builtin symlink. 
=cut sub dir_symlink { my $oldname = shift; my $newname = shift; if ($windows_os) { $oldname = perl2host($oldname); $newname = perl2host($newname); $oldname =~ s,/,\\,g; $newname =~ s,/,\\,g; my $cmd = qq{mklink /j "$newname" "$oldname"}; if ($Config{osname} eq 'msys') { # need some indirection on msys $cmd = qq{echo '$cmd' | \$COMSPEC /Q}; } system($cmd); } else { symlink $oldname, $newname; } die "No $newname" unless -e $newname; } =pod =back =head1 Test::More-LIKE METHODS =over =item command_ok(cmd, test_name) Check that the command runs (via C<run_log>) successfully. =cut sub command_ok { local $Test::Builder::Level = $Test::Builder::Level + 1; my ($cmd, $test_name) = @_; my $result = run_log($cmd); ok($result, $test_name); return; } =pod =item command_fails(cmd, test_name) Check that the command fails (when run via C<run_log>). =cut sub command_fails { local $Test::Builder::Level = $Test::Builder::Level + 1; my ($cmd, $test_name) = @_; my $result = run_log($cmd); ok(!$result, $test_name); return; } =pod =item command_exit_is(cmd, expected, test_name) Check that the command exit code matches the expected exit code. =cut sub command_exit_is { local $Test::Builder::Level = $Test::Builder::Level + 1; my ($cmd, $expected, $test_name) = @_; print("# Running: " . join(" ", @{$cmd}) . "\n"); my $h = IPC::Run::start $cmd; $h->finish(); # On Windows, the exit status of the process is returned directly as the # process's exit code, while on Unix, it's returned in the high bits # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h> # header file). IPC::Run's result function always returns exit code >> 8, # assuming the Unix convention, which will always return 0 on Windows as # long as the process was not terminated by an exception. To work around # that, use $h->full_results on Windows instead. my $result = ($Config{osname} eq "MSWin32") ? 
	    ($h->full_results)[0] : $h->result(0);
	is($result, $expected, $test_name);
	return;
}

=pod

=item program_help_ok(cmd)

Check that the command supports the C<--help> option.

=cut

sub program_help_ok
{
	local $Test::Builder::Level = $Test::Builder::Level + 1;
	my ($cmd) = @_;
	my ($stdout, $stderr);
	print("# Running: $cmd --help\n");
	my $result = IPC::Run::run [ $cmd, '--help' ], '>', \$stdout, '2>',
	  \$stderr;
	ok($result, "$cmd --help exit code 0");
	isnt($stdout, '', "$cmd --help goes to stdout");
	is($stderr, '', "$cmd --help nothing to stderr");
	return;
}

=pod

=item program_version_ok(cmd)

Check that the command supports the C<--version> option.

=cut

sub program_version_ok
{
	local $Test::Builder::Level = $Test::Builder::Level + 1;
	my ($cmd) = @_;
	my ($stdout, $stderr);
	print("# Running: $cmd --version\n");
	my $result = IPC::Run::run [ $cmd, '--version' ], '>', \$stdout, '2>',
	  \$stderr;
	ok($result, "$cmd --version exit code 0");
	isnt($stdout, '', "$cmd --version goes to stdout");
	is($stderr, '', "$cmd --version nothing to stderr");
	return;
}

=pod

=item program_options_handling_ok(cmd)

Check that a command with an invalid option returns a non-zero exit code and
error message.

=cut

sub program_options_handling_ok
{
	local $Test::Builder::Level = $Test::Builder::Level + 1;
	my ($cmd) = @_;
	my ($stdout, $stderr);
	print("# Running: $cmd --not-a-valid-option\n");
	my $result = IPC::Run::run [ $cmd, '--not-a-valid-option' ], '>',
	  \$stdout, '2>', \$stderr;
	ok(!$result, "$cmd with invalid option nonzero exit code");
	isnt($stderr, '', "$cmd with invalid option prints error message");
	return;
}

=pod

=item command_like(cmd, expected_stdout, test_name)

Check that the command runs successfully and the output matches the given
regular expression.

=cut

sub command_like
{
	local $Test::Builder::Level = $Test::Builder::Level + 1;
	my ($cmd, $expected_stdout, $test_name) = @_;
	my ($stdout, $stderr);
	print("# Running: " . join(" ", @{$cmd}) .
"\n"); my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr; ok($result, "$test_name: exit code 0"); is($stderr, '', "$test_name: no stderr"); like($stdout, $expected_stdout, "$test_name: matches"); return; } =pod =item command_like_safe(cmd, expected_stdout, test_name) Check that the command runs successfully and the output matches the given regular expression. Doesn't assume that the output files are closed. =cut sub command_like_safe { local $Test::Builder::Level = $Test::Builder::Level + 1; # Doesn't rely on detecting end of file on the file descriptors, # which can fail, causing the process to hang, notably on Msys # when used with 'pg_ctl start' my ($cmd, $expected_stdout, $test_name) = @_; my ($stdout, $stderr); my $stdoutfile = File::Temp->new(); my $stderrfile = File::Temp->new(); print("# Running: " . join(" ", @{$cmd}) . "\n"); my $result = IPC::Run::run $cmd, '>', $stdoutfile, '2>', $stderrfile; $stdout = slurp_file($stdoutfile); $stderr = slurp_file($stderrfile); ok($result, "$test_name: exit code 0"); is($stderr, '', "$test_name: no stderr"); like($stdout, $expected_stdout, "$test_name: matches"); return; } =pod =item command_fails_like(cmd, expected_stderr, test_name) Check that the command fails and the error message matches the given regular expression. =cut sub command_fails_like { local $Test::Builder::Level = $Test::Builder::Level + 1; my ($cmd, $expected_stderr, $test_name) = @_; my ($stdout, $stderr); print("# Running: " . join(" ", @{$cmd}) . "\n"); my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr; ok(!$result, "$test_name: exit code not 0"); like($stderr, $expected_stderr, "$test_name: matches"); return; } =pod =item command_checks_all(cmd, ret, out, err, test_name) Run a command and check its status and outputs. 
Arguments:

=over

=item C<cmd>: Array reference of command and arguments to run

=item C<ret>: Expected exit code

=item C<out>: Expected stdout from command

=item C<err>: Expected stderr from command

=item C<test_name>: test name

=back

=cut

sub command_checks_all
{
	local $Test::Builder::Level = $Test::Builder::Level + 1;
	my ($cmd, $expected_ret, $out, $err, $test_name) = @_;

	# run command
	my ($stdout, $stderr);
	print("# Running: " . join(" ", @{$cmd}) . "\n");
	IPC::Run::run($cmd, '>', \$stdout, '2>', \$stderr);

	# See http://perldoc.perl.org/perlvar.html#%24CHILD_ERROR
	my $ret = $?;
	die "command exited with signal " . ($ret & 127)
	  if $ret & 127;
	$ret = $ret >> 8;

	# check status
	ok($ret == $expected_ret,
		"$test_name status (got $ret vs expected $expected_ret)");

	# check stdout
	for my $re (@$out)
	{
		like($stdout, $re, "$test_name stdout /$re/");
	}

	# check stderr
	for my $re (@$err)
	{
		like($stderr, $re, "$test_name stderr /$re/");
	}

	return;
}

=pod

=back

=cut

1;
check_pgactivity-REL2_7/t/lib/pgNode.pm
# This program is open source, licensed under the PostgreSQL License.
# For license terms, see the LICENSE file.
#
# Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group

=pod

=head1 NAME

pgNode - facet class extending PostgresNode for check_pgactivity tests

=head1 DESCRIPTION

This class should not be used directly to create objects.
Its only aim is to extend the existing PostgresNode class, imported from
PostgreSQL source code, without editing it so we can import new versions
easily.

See PostgresNode documentation for original methods.

=cut

package pgNode;

use strict;
use warnings;

use Test::More;
use Time::HiRes qw(usleep);
use Cwd 'cwd';
use Config;

use PostgresNode;

BEGIN {
    # set environment vars
    $ENV{'TESTDIR'} = cwd;
    delete $ENV{'PAGER'};

    # Look for command 'true'
    # It's under /bin on various Linux, under /usr/bin for macosx
    if ( -x '/bin/true' ) {
        $ENV{'PG_REGRESS'} = '/bin/true';
    }
    else {
        $ENV{'PG_REGRESS'} = '/usr/bin/true';
    }
}

sub new {
    my $class = shift;
    my $self = {};

    $self->{'node'} = PostgresNode->get_new_node(@_);

    bless $self, $class;

    BAIL_OUT( "TAP tests do not support versions older than 8.2" )
        if $self->version < 8.2;

    note('Node "', $self->{'node'}->name, '" uses version: ', $self->version);

    return $self;
}

sub AUTOLOAD {
    our $AUTOLOAD;

    my $subname = $AUTOLOAD;
    my $self = shift;

    $subname =~ s/^pgNode:://;

    return if $subname eq "DESTROY" and not $self->{'node'}->can("DESTROY");

    return $self->{'node'}->$subname(@_);
}

# Overload wait_for_catchup to pass the PostgresNode object as param
sub wait_for_catchup {
    my $self = shift;
    my $stb = shift;

    $self->{'node'}->wait_for_catchup($stb->{'node'}, @_);
}

=pod

=head1 METHODS

Below are the changes and new methods implemented in this facet.

=over

=item $node->version()

Return the PostgreSQL backend version.

=cut

sub version {
    return $_[0]->{'node'}->{_pg_version};
    #die "pgNode must not be used directly to create an object"
}

=item $node->switch_wal()

Force WAL rotation.

Return the old segment filename.
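A hedged usage sketch; the node object is assumed to have been created and
started elsewhere in the test:

```perl
# Hypothetical usage: rotate WAL, then wait for the old segment to be
# archived with wait_for_archive(), documented below.
my $wal = $node->switch_wal();
$node->wait_for_archive($wal) if defined $wal;
```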
=cut

sub switch_wal {
    my $self = shift;
    my $result;

    if ($self->version >= '10') {
        $result = $self->safe_psql('postgres',
            'SELECT pg_walfile_name(pg_switch_wal())');
    }
    else {
        $result = $self->safe_psql('postgres',
            'SELECT pg_xlogfile_name(pg_switch_xlog())');
    }

    chomp $result;
    return if $result eq '';
    return $result;
}

=item $node->wait_for_archive($wal)

Wait for given C<$wal> to be archived.

Timeout is 30s before bailing out.

=cut

sub wait_for_archive {
    my $self = shift;
    my $wal = shift;
    my $sleep_time   = 100_000; # 0.1s
    my $max_attempts = 300;     # 300 * 0.1s = 30s
    my $walfile = $self->archive_dir() . "/$wal";

    print "# waiting for archive $walfile\n";

    while ($max_attempts and not -f $walfile) {
        $max_attempts--;
        usleep($sleep_time);
    }

    if (not -f $walfile) {
        print "# timeout waiting for archive $wal\n";
        print TestLib::slurp_file($self->logfile);
        BAIL_OUT("archiving timeout or failure");
        return 0;
    }

    return 1;
}

=pod

=back

=head1 SEE ALSO

The original L<PostgresNode> class with further methods.

The L<TestLib> class with testing helper functions.

=cut

1;
check_pgactivity-REL2_7/t/lib/pgSession.pm
# This program is open source, licensed under the PostgreSQL License.
# For license terms, see the LICENSE file.
#
# Copyright (C) 2012-2023: Open PostgreSQL Monitoring Development Group

package pgSession;

use strict;
use warnings;
use version;
use Carp;
use IPC::Run ();    # provides IPC::Run::timer() used below

sub new {
    my $class = shift;
    my $node  = shift;
    my $db    = shift;
    my $self;

    $db = 'template1' unless defined $db;

    $self->{'timer'} = IPC::Run::timer(5);
    $self->{'delim'} = 'CHECK_PGA_PROMPT_DELIM=>';
    $self->{'in'}    = '';
    $self->{'out'}   = '';
    $self->{'proc'}  = $node->interactive_psql(
        $db,
        \$self->{'in'}, \$self->{'out'}, $self->{'timer'},
        extra_params => [
            '--pset=pager',
            '--variable=PROMPT1=' . $self->{'delim'}
        ]
    );

    return bless $self, $class;
}

sub query {
    my ($self, $q, $t) = @_;

    $self->{'out'} = '';
    $self->{'in'}  = '';

    $self->{'timer'}->start($t);

    # wait for the prompt to appear
    $self->{'proc'}->pump until $self->{'out'} =~ $self->{'delim'};

    # reset the output to forget the banner + prompt
    $self->{'out'} = '';

    # write and run the query (this echoes the query in $out :/)
    $self->{'in'} .= "$q;\n";

    # push $in to the proc
    $self->{'proc'}->pump while length $self->{'in'};
}

1;
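A hedged usage sketch of this class; the node name, database, and query are
invented for illustration and depend on the surrounding test harness:

```perl
# Hypothetical usage: keep a backend busy so a check can observe it.
my $node = pgNode->new('alpha');
$node->init;
$node->start;

my $session = pgSession->new($node, 'postgres');
# run a long query, pumping psql I/O with a 60s timer
$session->query('SELECT pg_sleep(30)', 60);
```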