skytools-2.1.13/NEWS
2012-03-13 - SkyTools 2.1.13 - "Pailapsiin Overdose"
* Convert SkyTools to use symbolic isolation level constants.
Psycopg 2.4.2/2.4.3 renumbered the isolation levels, which
made "londiste copy" lose data. Psycopg 2.4.4 fixes it.
But make SkyTools conform to the new Psycopg conventions so Londiste
can survive such changes in the future. This also makes Londiste
work with those 2 Psycopg versions.
* Use REPEATABLE READ isolation level on copy. Pre-9.1, REPEATABLE READ
and SERIALIZABLE were the same. 9.1+ introduces a new SERIALIZABLE
which does more than we want.
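For example, a minimal sketch of the symbolic usage, assuming a
psycopg2 version that exports these constants:
    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect("dbname=test")  # illustrative connstr
    # Name the level instead of hardcoding a number; the constant
    # tracks any renumbering between psycopg2 versions.
    conn.set_isolation_level(
        psycopg2.extensions.ISOLATION_LEVEL_REPEATABLE_READ)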
* londiste add-table: make trigger check sql 9.1-compatible.
(Sébastien Lardière)
* Make C modules compile on 9.2.
* v2.1.12 broke psycopg <= 2.0.9. Fix.
(Dimitri Fontaine)
* Fix rare crash in pgq.insert_event() & pgq triggers.
(Backport from 3.0)
* walmgr:
- Sync changes from 3.0 branch.
(Martin Pihlak)
- Add support for detecting stale locks and releasing them
instead of aborting
(Steve Singer)
- Move the pg_stop_backup() into a finally: block.
Some instances were reported where the base backup failed with some issue
but pg_stop_backup() hadn't been called and had to be called manually.
This should make that less likely.
(Steve Singer)
2010-11-10 - SkyTools 2.1.12 - "Portable Minefield"
To apply Londiste database-side fixes, run 'londiste.upgrade.sql'
where needed.
= Features =
* Support Postgres 9.0. (Devrim Gündüz, Jason Buberel, Sébastien Lardière)
* walmgr: Introduce a 'slave_pg_xlog' configuration variable.
This allows master and slave pg_xlog files to be in different
locations. During restore this directory is symlinked to
slave pg_xlog. (Steve Singer)
* walmgr: Introduce a 'backup_datadir' configuration variable to
control whether the slave data directory is kept or
overwritten during restore. (Steve Singer)
* walmgr: consider wal_level during setup/stop (Martin Pihlak)
* skytools.dbstruct: support version-dependent SQL.
* psycopgwrapper: always set db.server_version
= Fixes =
* londiste copy: restore index clustering info. (André Malo)
* londiste repair: use subprocess module instead of os.popen4, where
available. (Shoaib Mir)
* pgqadm: Remove unnecessary JOIN in refresh_queues(). (Artyom Nosov)
* londiste add: don't send new table/seq list into queue; it's not
useful and can only cause surprise removal in case of several
'provider add' + 'subscriber add' pairs.
* londiste launcher: don't interpret file 'londiste' as module.
Needed for 3.0 compatibility.
* walmgr: "sync" now omits unneeded WAL segments if the
database has been cleanly shut down. This greatly reduces
sync time during planned switchovers as usually there
is only a single WAL file to be synched to slave.
* londiste.link: Add missing quoting. The functions are unused,
thus missed the last round of fixes.
* Building from CVS/GIT assumes --with-asciidoc automatically.
* Newer asciidoc/docbook do not need fixman.py hack. Remove it.
2010-02-03 - SkyTools 2.1.11 - "Replicates Like Randy Rabbit"
= Fixes =
* londiste, pgq smart triggers: crash in case the table info cache
invalidation callback was called from signal handler. Fix it by
moving cache operations out of callback.
* walmgr:
- symlink handling for pg_log and pg_xlog
- Set archive_command to "/bin/true" for restored postgresql.conf
(Martin Pihlak, Mark Kirkwood)
* londiste copy:
- Make constraint dropping work for inherited tables
(Hannu Krosing)
- Do not restore fkeys to tables with unknown replication status
or coming from different queue. (Hannu Krosing)
- Use TRUNCATE ONLY on 8.4+. (Sergey Konoplev)
- Replace last copy pidfile check with signal_pidfile,
thus avoiding potential infinite stall with dead copy.
* londiste repair: set LC_ALL=C when calling 'sort' to have
byte based sorting.
* skytools.DBScript:
- Add --version switch to show skytools version. (Hannu Krosing)
- Safer pidfile handling - avoid creating zero-length files
in disk-full situations by deleting them on write error.
* pgq.Event: ev.retry, ev.ev_retry fields for retry count.
(Nico Mandery)
* pgq.maint_retry_events(): lock table to allow only single mover.
* pgq.logutriga() did not put custom pkey= value into events.
* pgq.logutriga() and pgq.sqltriga() allowed UPDATE and DELETE
on tables without pkey, running into SQL errors downstream.
Now they throw an error in such cases.
* Fix DeprecationWarning on Python 2.6 vs. 'import sets'.
* make deb: Work around Debian's --install-layout related braindamage.
2009-08-31 - SkyTools 2.1.10 - "As Stable As A Falling Anvil"
= Fixes =
* pgqadm: rename the 'as' function argument, as it's a keyword in Python 2.6
* londiste provider add-seq:
- Detect existing sequences.
- Make --all ignore pgq, londiste and temp schemas.
- Proper quoting was missing in a few places.
* walmgr: Create pg_xlog/archive_status directory for slave restore.
(Mark Kirkwood)
* londiste provider add: detect and warn about user AFTER triggers
that run before the Londiste one.
* docs:
- Documentation for log_size, log_count and use_skylog options.
- Refresh dependency list, add rsync
- Mention --with-asciidoc in INSTALL.
* --with-asciidoc: make it tolerate missing asciidoc and/or xmlto.
* deb84 Makefile target.
2009-03-13 - SkyTools 2.1.9 - "Six Pack of Luck"
= WalMgr improvements =
* walmgr.py: WAL files purge procedure now pays attention to recovery
restart points. (%r patch by Omar Kilani, Mark Kirkwood)
* walmgr.py: archive_mode GUC is now set during setup. Followed by an
optional restart if master_restart_cmd is specified (Mark Kirkwood)
* walmgr.py: PostgreSQL configuration files are backed up to
"config_backup" and restored to "slave_config_dir". (Mark Kirkwood)
* walmgr.py: Backups taken from slave now generate backup_label and
history files.
* walmgr.py: Configuration files now default to PostgreSQL 8.3.
= Fixes =
* londiste copy: Add missing identifier quoting for
TRUNCATE and ADD CONSTRAINT. (Luc Van Hoeylandt)
* Quote identifiers starting with numeric. (Andrew Dunstan)
* Fix crash with pgq.sqltriga/pgq.logutriga, which automatically
detect table structure. Problem was handling table invalidation
event during internal query.
(Götz Lange, André Malo)
* Fix 'uninitialized variable' warning and potentially bad code
in pgq.sqltriga(). (Götz Lange, André Malo)
* skytools._cquoting: Fix crash with Decimal type.
(May affect users who have compiled psycopg2 to use
Decimal() instead of float for NUMERIC)
* skytools.DBStruct: The DEFAULT keyword was missing when
creating table columns with default value. Table creation
functionality is unused in 2.1.x thus no users should be affected.
* skytools.magic_insert: be more flexible about what counts as a dict.
In particular, now accepts DictRow.
* configure.ac: don't detect xmlto and asciidoc version
if --with-asciidoc is not used. Otherwise unnecessary error
message was printed.
* Add a few headers for Postgres 8.4 that are not included
automatically anymore. (Devrim Gündüz)
2008-10-12 - SkyTools 2.1.8 - "Perestroika For Your Data"
= Fixes =
* deb: Make debian package accept skytools-8.3 too.
* add skylog.ini as installable file (David Fetter)
* Londiste:
- Document the fkeys, triggers, restore-triggers commands.
- Run ANALYZE after table is copied over.
- Fix "provider add-seq --all"
* pgq.SerialConsumer (used by Londiste) - fix position check,
which got confused when sequence numbers made a large jump.
* PgQ database functions:
- pgq.maint_rotate_step1() function removed tick #1
even if consumer was registered on it.
- pgq triggers: import cache invalidation from HEAD.
Now pgq.sqltriga() and pgq.logutriga() should be preferable
to pgq.logtriga().
- uninstall_pgq.sql: Correct syntax for DROP SCHEMA CASCADE.
* skytools.DBScript, used by all Python scripts:
- Don't try to stay running if MemoryError was thrown.
- Detect stale pidfile and ignore it.
- Don't hook SIGTERM anymore. Python signal handling and
libpq polling did not interact well.
* scriptmgr:
- Don't crash on missing pidfile.
- Fix a few typos in scriptmgr (Martin Pihlak)
* walmgr (Martin Pihlak):
- improved error messages on startup if no config file specified.
- walmgr.py stop now also stops syncdaemon
- pg_auth file is copied to slave as part of archiving, and restored during boot.
- cleanup in remote_walmgr() to always pass config file to slave walmgr.
- master_backup() now reports exceptions.
= Features =
* londiste -s/-k now kill "copy" process with SIGTERM.
Previously the process was left running. On startup
"replay" will check if "copy" process had died before
finishing and launches one if needed. (Dimitri Fontaine)
* New skytools.signal_pidfile() function.
* Londiste: New lock_timeout parameter to limit time
locks are held on provider. Applies to operations:
provider add/remove, compare, repair.
* Londiste: on psycopg2 v2.0.6+ use new .copy_expert() API,
which does not crash on tables with large number of columns.
2008-06-03 - SkyTools 2.1.7 - "Attack of the Werebugs"
= Fixes =
* Fix pgq trigger compilation with Postgres 8.2 (Marcin Stępnicki)
* Replace `make` calls with $(MAKE) in Makefiles (Pierre-Emmanuel André)
* londiste: Fix incompatibility with Python 2.3 (Dimitri Fontaine)
* walmgr: Fix typo in config symlinking code. (pychecker)
* bulk_loader: Fix typo in temp table check. (pychecker)
* Install upgrade .sql files. Otherwise skytools_upgrade.py could
be used only from source directory.
* pgq.Consumer: Fix bug in retry/failed event handling. (pychecker)
* pgq: Fix pgq.maint_retry_events() - it could create double events when
the number of events to be moved back into the main queue was more than 10.
= Features =
* Quoting of table and column names in Londiste and dispatcher scripts.
= Upgrade procedure =
* Database code under pgq and londiste schemas need to be upgraded.
That can be done on running databases with the following command:
$ skytools_upgrade.py "connstr"
Or by applying 'londiste.upgrade.sql' and 'pgq.upgrade.sql' by hand.
The changes were only in functions, no table structure changed.
2008-04-05 - SkyTools 2.1.6 - "Quick Bugfix Release"
Now we have upgrade script, see 'man skytools_upgrade'
for info how to upgrade database code.
= Fixes =
* Include upgrade sql scripts in .tgz
* Fix 'londiste provider seqs'
* Fix 'londiste provider fkeys' when no tables are added.
* Fix "londiste copy" pidfile timing race
* Fix Solaris build - avoid grep -q / define HAVE_UNSETENV
* Fix "make debXX" when several Postgres version are installed.
* New-style AC_OUTPUT usage.
* Disable manpage creation by default, --with-asciidoc to enable.
They are still included in .tgz so users should have fewer problems now.
* Restore iter-on-values behaviour for rows from curs.fetch*. The attempt
to make them iter-on-keys seems now misguided, as iter-on-values is already
used in existing code, and iter-on-keys returns keys in random order.
* londiste subscriber add: Don't drop triggers on table if --expect-sync is used.
* londiste copy: drop triggers and fkeys in case "replay" or "subscriber add" was skipped
* walmgr restore: better detection if old postmaster is running (Charles Duffy)
* walmgr xrestore: detect the death of parent process
* walmgr restore: create pg_tblspc - it's required for 8.3 (Zoltán Böszörményi)
* walmgr restore: copy old config files over if exist (Zoltán Böszörményi)
= Features =
* Table name globbing for Londiste commands (Erik Jones)
* New manpages for scripts (Asko Oja & me)
* Upgrade script with manpage: scripts/skytools_upgrade.py
* Add .version() function to pgq_ext & londiste schemas.
* pgqadm: allow parameters without queue_ prefix in 'config' command.
* skytools Python module:
- intern() keys in db_urldecode() to decrease memory usage
- udp-logger: more fields: hostaddr, service_name
- udp-logger: don't cache udp socket, it seemed to hang in some cases
- DBScript.get_database() allows explicit connect string
- DBScript allows disabling config file loading
- magic_insert on dicts allows missing columns, uses NULL
- new parsing functions:
- parse_pgarray(), parse_logtriga_sql(), parse_tabbed_table(),
- parse_statements() - this one is used to split SQL install
files to separate statements.
- exists_function() checks both 'public' and 'pg_catalog'
for unqualified functions names.
- skytools.quoting: add C implementation for quote_literal, quote_copy,
quote_bytea_raw, unescape, db_urlencode, db_urldecode. This gives
20x speedup for db_urlencode and 50x for db_urldecode on some
real-life data.
- unquote_ident(), unquote_literal()
2007-11-19 - SkyTools 2.1.5 - "Enterprise-Grade Duct Tape"
= Big changes =
* Lot of new docs [Dimitri Fontaine, Asko Oja, Marko Kreen]
* Support for fkey and trigger handling in Londiste. [Erik Jones]
* Rewrite pgq.insert_event() and log triggers in C, thus SkyTools does
not depend on PL/Python anymore.
= Small changes =
* pgq+txid: convert to new API appearing in 8.3 /contrib/txid/
* Support psycopg2, preferring it to psycopg1.
* Improved bulk_loader, using temp tables exclusively.
* skytools.config: API change to allow usage without config file.
* skytools module: quote_ident(), quote_fqident()
* install .sql files under share/skytools in addition to contrib/
* pgqadm: also vacuums londiste and pgq_ext tables, if they exist
* londiste: provider add/remove --all [Hans-Juergen Schoenig]
* backend modules support 8.3
* pgq: switch pgq_lazy_fetch=NROWS for pgq.Consumer, which makes it use
cursor for event fetching, thus allowing larger batches
* txid: use bsearch() for larger snapshots
= Fixes =
* londiste fkeys: look also at dependers not only dependencies.
* pgq.consumer: make queue seeking in case of failover more strict.
* scriptmgr: don't die on user error.
* pgq: there was still fallout from reorg - 2 missing indexes.
* Due to historical reasons SerialConsumer (and thus Londiste)
accessed completed tick table directly, not via functions.
Make it use functions again.
* londiste: set client_encoding on subscriber same as on provider
* londiste: remove tbl should work also if table is already dropped [Dimitri Fontaine]
* couple of walmgr fixes [Martin Pihlak]
= Upgrade procedure for database code =
* PgQ (used on Londiste provider side), table structure, plpgsql functions:
$ psql dbname -f upgrade/final/v2.1.5.pgq_core.sql
* PgQ new insert_event(), written in C:
$ psql dbname -f sql/pgq/lowlevel/pgq_lowlevel.sql
* PgQ new triggers (sqltriga, logtriga, logutriga), written in C:
$ psql dbname -f sql/pgq/triggers/pgq_triggers.sql
* Londiste (both provider and subscriber side)
$ psql dbname -f upgrade/final/v2.1.5.londiste.sql
* pgq_ext:
$ psql dbname -f upgrade/final/v2.1.5.pgq_ext.sql
2007-04-16 - SkyTools 2.1.4 - "Sweets from last Christmas"
= Fixes =
* logtriga.c was supposed to survive mismatched column string,
but the logic was buggy. Thanks go to Dmitriy V'jukov for
good analysis.
* Couple of scripts were not converted to new API. Fix it.
* Quiet a warning in textbuf.c
* Make configure and Makefiles survive on BSD's where 'make'
is not GNU make. Thanks to Hans-Juergen Schoening.
= Features =
* Published WalMgr was an old version. Sync with internal code,
where Martin has done a lot of enhancements.
* Small incompat change in PGQ: add fields to pgq.get_consumer_info()
return type. Example upgrade procedure:
DROP TYPE pgq.ret_consumer_info cascade;
\i structure/types.sql
\i functions/pgq.get_consumer_info.sql
It will show some errors but that's OK. It's annoying but needed
for the tick_id cleanup in SerialConsumer/Londiste.
2007-04-10 - SkyTools 2.1.3 - "Brown Paper Bag"
Still managed to sneak in a last-minute typo.
* Fix copy-paste error in table_copy.py
* Remember to bump version in pgq.version()
2007-04-09 - SkyTools 2.1.2 - "Help screen works"
Most fallout from reorg is hopefully cleaned now.
* Dumb bug in ticker which made it almost non-working,
except it managed to complete the regression test...
* Move --skip-truncate switch from 'copy' to 'londiste add'.
* 'londiste install' also installs plpgsql+plpythonu.
* SQL installer logs full path to file it found.
* Change pgq.get_queue_info() FOR loop variable to record
instead of text, which was reported to fail, although it did work here.
* Remember where the SQL files were installed.
2007-04-06 - SkyTools 2.1.1 - "Needs more thrust to fly"
SkyTools got a big reorg before release, but with the hoopla
of the 3 projects at once, it did not get much testing...
There are some untested areas still, but at least pgq/londiste
are in better shape now.
* pgqadm: finish conversion...
* londiste.Syncer:
- Convert to new API
- Avoid ticking, parallel ticks are dangerous
- Bad arg in repair
* pgq:
- too aggressive check in register_consumer
- Algo desc for batch_event_sql
* Add some knobs to make regtests for londiste pass
more predictably.
2007-03-13 - SkyTools 2.1 - "Radioactive Candy"
* Final public release.
skytools-2.1.13/doc/pgq-nodupes.txt
= Avoiding duplicate events =
It is pretty burdensome to check whether an event has already been processed,
especially when moving data in bulk. Here's how this can be avoided.
First, the consumer must guarantee that it processes all events in one tx.
The consumer itself can tag events for retry, but then it must be able to handle them later.
== Only one db ==
If the PgQ queue and event data handling happen in same database,
the consumer must simply call pgq.finish_batch() inside the event-processing
transaction.
== Several databases ==
If the event processing happens in a different database, the consumer
must store the batch_id in the destination database, inside the same
transaction as the event processing.
- Only after committing it can the consumer call pgq.finish_batch() in the queue database
and commit that.
- As the batches come in sequence, there's no need to remember a full log of batch_ids;
it's enough to keep the latest batch_id.
- Then at the start of every batch, the consumer can check whether the batch_id already
exists in the destination database, and if it does, just tag the batch done
without processing.
With this, there's no need for the consumer to check for already-processed
events. A sketch of this pattern is shown below.
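Here is a minimal Python sketch of the pattern (the one-row
batch_tracker table, its pre-seeded contents and the connect strings
are assumptions made for illustration):

    import psycopg2

    queue_db = psycopg2.connect("dbname=queuedb")  # assumed connstr
    dest_db = psycopg2.connect("dbname=destdb")    # assumed connstr

    def process_batch(batch_id, event_list):
        """Apply one batch to the destination exactly once."""
        cur = dest_db.cursor()
        # batch_tracker is assumed to be pre-seeded with one row.
        cur.execute("select last_batch_id from batch_tracker")
        last = cur.fetchone()[0]
        if last is None or last < batch_id:
            for ev in event_list:
                pass  # apply the event to destination tables here
            # Store the batch id in the same tx as the data changes.
            cur.execute("update batch_tracker set last_batch_id = %s",
                        [batch_id])
            dest_db.commit()
        else:
            dest_db.rollback()  # batch already applied, just tag it done
        # Close the batch on the queue side only after the commit above.
        cur = queue_db.cursor()
        cur.execute("select pgq.finish_batch(%s)", [batch_id])
        queue_db.commit()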
== Note ==
This assumes the event processing is transactional - failures
will be rolled back. If event processing includes communication with the
world outside the database, e.g. sending email, such handling won't work.
skytools-2.1.13/doc/queue_splitter.1
'\" t
.\" Title: queue_splitter
.\" Author: [FIXME: author] [see http://docbook.sf.net/el/author]
.\" Generator: DocBook XSL Stylesheets v1.75.2
.\" Date: 03/13/2012
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "QUEUE_SPLITTER" "1" "03/13/2012" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
queue_splitter \- PgQ consumer that transports events from one queue into several target queues
.SH "SYNOPSIS"
.sp
.nf
queue_splitter\&.py [switches] config\&.ini
.fi
.SH "DESCRIPTION"
.sp
queue_splitter is a PgQ consumer that transports events from a source queue into several target queues\&. The ev_extra1 field in each event shows which target queue it must go into\&. (pgq\&.logutriga() puts the table name there\&.)
.sp
One use case is to move events from an OLTP database to a batch processing server\&. By using queue_splitter it is possible to move all kinds of events for batch processing with one consumer, thus keeping the OLTP database less crowded\&.
.SH "QUICK-START"
.sp
Basic queue_splitter setup and usage can be summarized by the following steps:
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
pgq must be installed both in source and target databases\&. See pgqadm man page for details\&. Target database must also have pgq_ext schema installed\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
edit a queue_splitter configuration file, say queue_splitter_sourcedb_sourceq_targetdb\&.ini
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 3." 4.2
.\}
create source and target queues
.sp
.if n \{\
.RS 4
.\}
.nf
$ pgqadm\&.py ticker\&.ini create
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 4." 4.2
.\}
launch queue splitter in daemon mode
.sp
.if n \{\
.RS 4
.\}
.nf
$ queue_splitter\&.py queue_splitter_sourcedb_sourceq_targetdb\&.ini \-d
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 5.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 5." 4.2
.\}
start producing and consuming events
.RE
.SH "CONFIG"
.SS "Common configuration parameters"
.PP
job_name
.RS 4
Name for the particular job the script does\&. The script will log under this name to logdb/logserver\&. The name is also used as the default PgQ consumer name\&. It should be unique\&.
.RE
.PP
pidfile
.RS 4
Location for pid file\&. If not given, the script is not allowed to daemonize\&.
.RE
.PP
logfile
.RS 4
Location for log file\&.
.RE
.PP
loop_delay
.RS 4
For a continuously running process, how long to sleep after each work loop, in seconds\&. Default: 1\&.
.RE
.PP
connection_lifetime
.RS 4
Close and reconnect older database connections\&.
.RE
.PP
log_count
.RS 4
Number of log files to keep\&. Default: 3
.RE
.PP
log_size
.RS 4
Max size for one log file\&. File is rotated if max size is reached\&. Default: 10485760 (10M)
.RE
.PP
use_skylog
.RS 4
If set, search for
[\&./skylog\&.ini, ~/\&.skylog\&.ini, /etc/skylog\&.ini]\&. If found then the file is used as the config file for Python\(cqs
logging
module\&. It allows setting up fully customizable logging setup\&.
.RE
.SS "Common PgQ consumer parameters"
.PP
pgq_queue_name
.RS 4
Queue name to attach to\&. No default\&.
.RE
.PP
pgq_consumer_id
.RS 4
Consumer ID to use when registering\&. Default: %(job_name)s
.RE
.SS "queue_splitter parameters"
.PP
src_db
.RS 4
Source database\&.
.RE
.PP
dst_db
.RS 4
Target database\&.
.RE
.SS "Example config file"
.sp
.if n \{\
.RS 4
.\}
.nf
[queue_splitter]
job_name = queue_splitter_sourcedb_sourceq_targetdb
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
src_db = dbname=sourcedb
dst_db = dbname=targetdb
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
pgq_queue_name = sourceq
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
logfile = ~/log/%(job_name)s\&.log
pidfile = ~/pid/%(job_name)s\&.pid
.fi
.if n \{\
.RE
.\}
.SH "COMMAND LINE SWITCHES"
.sp
Following switches are common to all skytools\&.DBScript\-based Python programs\&.
.PP
\-h, \-\-help
.RS 4
show help message and exit
.RE
.PP
\-q, \-\-quiet
.RS 4
make program silent
.RE
.PP
\-v, \-\-verbose
.RS 4
make program more verbose
.RE
.PP
\-d, \-\-daemon
.RS 4
make program go background
.RE
.sp
The following switches are used to control an already running process\&. The pidfile is read from the config, then a signal is sent to the process id specified there\&.
.PP
\-r, \-\-reload
.RS 4
reload config (send SIGHUP)
.RE
.PP
\-s, \-\-stop
.RS 4
stop program safely (send SIGINT)
.RE
.PP
\-k, \-\-kill
.RS 4
kill program immediately (send SIGTERM)
.RE
.SH "USECASE"
.sp
How to process events created in a secondary database with several queues while having only one queue in the primary database\&. This also shows how to easily insert events into queues with regular SQL\&.
.sp
.if n \{\
.RS 4
.\}
.nf
CREATE SCHEMA queue;
CREATE TABLE queue\&.event1 (
\-\- this should correspond to event internal structure
\-\- here you can put checks that correct data is put into queue
id int4,
name text,
\-\- not needed, but good to have:
primary key (id)
);
\-\- put data into queue in urlencoded format, skip actual insert
CREATE TRIGGER redirect_queue1_trg BEFORE INSERT ON queue\&.event1
FOR EACH ROW EXECUTE PROCEDURE pgq\&.logutriga(\*(Aqsinglequeue\*(Aq, \*(AqSKIP\*(Aq);
\-\- repeat the above for event2
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
\-\- now the data can be inserted:
INSERT INTO queue\&.event1 (id, name) VALUES (1, \*(Aquser\*(Aq);
.fi
.if n \{\
.RE
.\}
.sp
If the queue_splitter is put on "singlequeue", it spreads the events on the target into queues named "queue\&.event1", "queue\&.event2", etc\&. This keeps the PgQ load on the primary database minimal, both CPU\-wise and maintenance\-wise\&.
skytools-2.1.13/doc/queue_splitter.txt
= queue_splitter(1) =
== NAME ==
queue_splitter - PgQ consumer that transports events from one queue into several target queues
== SYNOPSIS ==
queue_splitter.py [switches] config.ini
== DESCRIPTION ==
queue_splitter is a PgQ consumer that transports events from a source queue into
several target queues. The `ev_extra1` field in each event shows which
target queue it must go into. (`pgq.logutriga()` puts the table name there.)
One use case is to move events from an OLTP database to a batch processing server.
By using queue_splitter it is possible to move all kinds of events for batch
processing with one consumer, thus keeping the OLTP database less crowded.
== QUICK-START ==
Basic queue_splitter setup and usage can be summarized by the following
steps:
1. pgq must be installed both in source and target databases.
See pgqadm man page for details. Target database must also
have pgq_ext schema installed.
2. edit a queue_splitter configuration file, say queue_splitter_sourcedb_sourceq_targetdb.ini
3. create source and target queues
$ pgqadm.py ticker.ini create
4. launch queue splitter in daemon mode
$ queue_splitter.py queue_splitter_sourcedb_sourceq_targetdb.ini -d
5. start producing and consuming events
== CONFIG ==
include::common.config.txt[]
=== queue_splitter parameters ===
src_db::
Source database.
dst_db::
Target database.
=== Example config file ===
[queue_splitter]
job_name = queue_splitter_sourcedb_sourceq_targetdb
src_db = dbname=sourcedb
dst_db = dbname=targetdb
pgq_queue_name = sourceq
logfile = ~/log/%(job_name)s.log
pidfile = ~/pid/%(job_name)s.pid
== COMMAND LINE SWITCHES ==
include::common.switches.txt[]
== USECASE ==
How to process events created in a secondary database
with several queues while having only one queue in the
primary database. This also shows how to easily insert
events into queues with regular SQL.
CREATE SCHEMA queue;
CREATE TABLE queue.event1 (
-- this should correspond to event internal structure
-- here you can put checks that correct data is put into queue
id int4,
name text,
-- not needed, but good to have:
primary key (id)
);
-- put data into queue in urlencoded format, skip actual insert
CREATE TRIGGER redirect_queue1_trg BEFORE INSERT ON queue.event1
FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('singlequeue', 'SKIP');
-- repeat the above for event2
-- now the data can be inserted:
INSERT INTO queue.event1 (id, name) VALUES (1, 'user');
If the queue_splitter is put on "singlequeue", it spreads the event
on target to queues named "queue.event1", "queue.event2", etc.
This keeps PgQ load on primary database minimal both CPU-wise
and maintenance-wise.
skytools-2.1.13/doc/pgqadm.1
'\" t
.\" Title: pgqadm
.\" Author: [FIXME: author] [see http://docbook.sf.net/el/author]
.\" Generator: DocBook XSL Stylesheets v1.75.2
.\" Date: 03/13/2012
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "PGQADM" "1" "03/13/2012" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
pgqadm \- PgQ ticker and administration interface
.SH "SYNOPSIS"
.sp
.nf
pgqadm\&.py [option] config\&.ini command [arguments]
.fi
.SH "DESCRIPTION"
.sp
PgQ is a Postgres\-based event processing system\&. It is part of the SkyTools package, which contains several useful implementations on this engine\&. The main function of PgQadm is to maintain and keep healthy both the pgq internal tables and the tables that store events\&.
.sp
SkyTools is a scripting framework for Postgres databases, written in Python, that provides several utilities and implements common database handling logic\&.
.sp
Event \- an atomic piece of data created by Producers\&. In PgQ an event is one record in one of the tables that service that queue\&. The event record contains some system fields for PgQ and several data fields filled by Producers\&. PgQ neither checks nor enforces event type\&. The event type is something that consumer and producer must agree on\&. PgQ guarantees that each event is seen at least once, but it is up to the consumer to make sure that an event is processed no more than once if that is needed\&.
.sp
Batch \- PgQ is designed for efficiency and high throughput, so events are grouped into batches for bulk processing\&. Creating these batches is one of the main tasks of PgQadm, and there are several parameters for each queue that can be used to tune the size and frequency of batches\&. Consumers receive events in these batches and, depending on business requirements, process events separately or also in batches\&.
.sp
Queue \- Events are stored in queue tables, i\&.e\&. queues\&. Several producers can write into the same queue and several consumers can read from the queue\&. Events are kept in the queue until all the consumers have seen them\&. We use table rotation to decrease hard disk I/O\&. A queue can contain any number of event types; it is up to Producer and Consumer to agree on what types of events are passed and how they are encoded\&. For example, the Londiste producer side can produce events for more tables than the consumer side needs, so the consumer subscribes only to those tables it needs and events for other tables are ignored\&.
.sp
Producer \- an application that pushes events into a queue\&. A producer can be written in any language that is able to run stored procedures in Postgres\&.
.sp
Consumer \- an application that reads events from a queue\&. Consumers can be written in any language that can interact with Postgres\&. The SkyTools package contains several useful consumers written in Python that can be used as they are, or as good starting points for writing more complex consumers\&.
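.sp
For example, producing an event and registering a consumer are plain function calls in SQL (the queue and consumer names below are illustrative, not defaults):
.sp
.if n \{\
.RS 4
.\}
.nf
select pgq\&.insert_event(\*(Aqmyqueue\*(Aq, \*(Aqev_type\*(Aq, \*(Aqev_data\*(Aq);
select pgq\&.register_consumer(\*(Aqmyqueue\*(Aq, \*(Aqmyconsumer\*(Aq);
.fi
.if n \{\
.RE
.\}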
.SH "QUICK-START"
.sp
Basic PgQ setup and usage can be summarized by the following steps:
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
create the database
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
edit a PgQ ticker configuration file, say ticker\&.ini
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 3." 4.2
.\}
install PgQ internal tables
.sp
.if n \{\
.RS 4
.\}
.nf
$ pgqadm\&.py ticker\&.ini install
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 4." 4.2
.\}
launch the PgQ ticker on the database machine as a daemon
.sp
.if n \{\
.RS 4
.\}
.nf
$ pgqadm\&.py \-d ticker\&.ini ticker
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 5.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 5." 4.2
.\}
create queue
.sp
.if n \{\
.RS 4
.\}
.nf
$ pgqadm\&.py ticker\&.ini create
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 6.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 6." 4.2
.\}
register or run consumer to register it automatically
.sp
.if n \{\
.RS 4
.\}
.nf
$ pgqadm\&.py ticker\&.ini register
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 7.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 7." 4.2
.\}
start producing events
.RE
.SH "CONFIG"
.sp
.if n \{\
.RS 4
.\}
.nf
[pgqadm]
job_name = pgqadm_somedb
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
db = dbname=somedb
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
# how often to run maintenance [seconds]
maint_delay = 600
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
# how often to check for activity [seconds]
loop_delay = 0\&.1
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
logfile = ~/log/%(job_name)s\&.log
pidfile = ~/pid/%(job_name)s\&.pid
.fi
.if n \{\
.RE
.\}
.SH "COMMANDS"
.SS "ticker"
.sp
Start ticking & maintenance process\&. Usually run as daemon with \-d option\&. Must be running for PgQ to be functional and for consumers to see any events\&.
.SS "status"
.sp
Show overview of registered queues and consumers and queue health\&. This command is used when you want to know what is happening inside PgQ\&.
.SS "install"
.sp
Installs PgQ schema into database from config file\&.
.SS "create "
.sp
Create queue tables in the pgq schema\&. As soon as the queue is created, producers can start inserting events into it\&. But be aware that if there are no consumers on the queue, the events are lost until a consumer is registered\&.
.SS "drop "
.sp
Drop a queue and all its consumers from PgQ\&. Queue tables are dropped and all the contents are lost forever, so use with care, as with most drop commands\&.
.SS "register "
.sp
Register the given consumer to listen to the given queue\&. The first batch seen by this consumer is the one completed after registration\&. Registration happens automatically when the consumer is run the first time, so using this command is optional, but it may be needed when producers start producing events before the consumer can be run\&.
.SS "unregister "
.sp
Removes the consumer from the given queue\&. Note that the consumer must be stopped before issuing this command, otherwise it automatically registers again\&.
.SS "config [ [= \&... ]]"
.sp
Show or change queue config\&. There are several parameters that can be set for each queue, shown here with default values:
.PP
queue_ticker_max_lag (2)
.RS 4
If no tick has happened during the given number of seconds, then one is generated just to keep queue lag under control\&. It may be increased if there is no need to deliver events fast\&. Not much room to decrease it :)
.RE
.PP
queue_ticker_max_count (200)
.RS 4
Threshold number of events in the filling batch that triggers a tick\&. Can be increased to encourage PgQ to create larger batches, or decreased to encourage faster ticking with smaller batches\&.
.RE
.PP
queue_ticker_idle_period (60)
.RS 4
Number of seconds that can pass without ticking if no events are coming to the queue\&. These empty ticks are used as keep\-alive signals for batch jobs and monitoring\&.
.RE
.PP
queue_rotation_period (2 hours)
.RS 4
Interval of time that may pass before PgQ tries to rotate tables to free up space\&. Note that PgQ cannot rotate tables if there are long transactions in the database, like VACUUM or pg_dump\&. May be decreased if low on disk space, or increased to keep a longer history of old events\&. Too small values might affect performance badly because Postgres tends to do seq scans on small tables\&. Too big values may waste disk space\&.
.RE
.sp
Looking at queue config\&.
.sp
.if n \{\
.RS 4
.\}
.nf
$ pgqadm\&.py mydb\&.ini config
testqueue
queue_ticker_max_lag = 3
queue_ticker_max_count = 500
queue_ticker_idle_period = 60
queue_rotation_period = 7200
$ pgqadm\&.py conf/pgqadm_myprovider\&.ini config testqueue queue_ticker_max_lag=10 queue_ticker_max_count=300
Change queue bazqueue config to: queue_ticker_max_lag=\*(Aq10\*(Aq, queue_ticker_max_count=\*(Aq300\*(Aq
$
.fi
.if n \{\
.RE
.\}
.SH "COMMON OPTIONS"
.PP
\-h, \-\-help
.RS 4
show help message
.RE
.PP
\-q, \-\-quiet
.RS 4
make program silent
.RE
.PP
\-v, \-\-verbose
.RS 4
make program verbose
.RE
.PP
\-d, \-\-daemon
.RS 4
go background
.RE
.PP
\-r, \-\-reload
.RS 4
reload config (send SIGHUP)
.RE
.PP
\-s, \-\-stop
.RS 4
stop program safely (send SIGINT)
.RE
.PP
\-k, \-\-kill
.RS 4
kill program immediately (send SIGTERM)
.RE
skytools-2.1.13/doc/cube_dispatcher.1
'\" t
.\" Title: cube_dispatcher
.\" Author: [FIXME: author] [see http://docbook.sf.net/el/author]
.\" Generator: DocBook XSL Stylesheets v1.75.2
.\" Date: 03/13/2012
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "CUBE_DISPATCHER" "1" "03/13/2012" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
cube_dispatcher \- PgQ consumer that is used to write source records into partitioned tables
.SH "SYNOPSIS"
.sp
.nf
cube_dispatcher\&.py [switches] config\&.ini
.fi
.SH "DESCRIPTION"
.sp
cube_dispatcher is a PgQ consumer that reads urlencoded records from a source queue and writes them into partitioned tables according to the configuration file\&. It is used to prepare data for business intelligence\&. The name of the table is read from the producer field in the event\&. Batch creation time is used for partitioning\&. All records created on the same day will go into the same table partition\&. If the partition does not exist, cube_dispatcher will create it according to the template\&.
.sp
Events are usually produced by pgq\&.logutriga()\&. Logutriga adds all the data of the record into the event (also in case of updates and deletes)\&.
.sp
cube_dispatcher can be used in two modes:
.PP
keep_all
.RS 4
keeps all the data that comes in\&. If a record is updated several times during one day, then the table partition for that day will contain several instances of that record\&.
.RE
.PP
keep_latest
.RS 4
only the last instance of each record is kept for each day\&. That also means that all tables must have primary keys, so cube_dispatcher can delete previous versions of records before inserting new data\&.
.RE
.SH "QUICK-START"
.sp
Basic cube_dispatcher setup and usage can be summarized by the following steps:
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
pgq and logutriga must be installed in source databases\&. See pgqadm man page for details\&. The target database must also have the pgq_ext schema\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
edit a cube_dispatcher configuration file, say cube_dispatcher_sample\&.ini
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 3." 4.2
.\}
create source queue
.sp
.if n \{\
.RS 4
.\}
.nf
$ pgqadm\&.py ticker\&.ini create
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 4." 4.2
.\}
create target database and parent tables in it\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 5.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 5." 4.2
.\}
launch cube dispatcher in daemon mode
.sp
.if n \{\
.RS 4
.\}
.nf
$ cube_dispatcher\&.py cube_dispatcher_sample\&.ini \-d
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 6.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 6." 4.2
.\}
start producing events (create logutriga triggers on tables): CREATE TRIGGER trig_cube_replica AFTER INSERT OR UPDATE ON some_table FOR EACH ROW EXECUTE PROCEDURE pgq\&.logutriga(\fIqueue_name\fR)
.RE
.SH "CONFIG"
.SS "Common configuration parameters"
.PP
job_name
.RS 4
Name for the particular job the script does\&. The script will log under this name to logdb/logserver\&. The name is also used as the default PgQ consumer name\&. It should be unique\&.
.RE
.PP
pidfile
.RS 4
Location for pid file\&. If not given, the script is not allowed to daemonize\&.
.RE
.PP
logfile
.RS 4
Location for log file\&.
.RE
.PP
loop_delay
.RS 4
For a continuously running process, how long to sleep after each work loop, in seconds\&. Default: 1\&.
.RE
.PP
connection_lifetime
.RS 4
Close and reconnect older database connections\&.
.RE
.PP
log_count
.RS 4
Number of log files to keep\&. Default: 3
.RE
.PP
log_size
.RS 4
Max size for one log file\&. File is rotated if max size is reached\&. Default: 10485760 (10M)
.RE
.PP
use_skylog
.RS 4
If set, search for
[\&./skylog\&.ini, ~/\&.skylog\&.ini, /etc/skylog\&.ini]\&. If found then the file is used as the config file for Python\(cqs
logging
module\&. It allows setting up fully customizable logging setup\&.
.RE
.SS "Common PgQ consumer parameters"
.PP
pgq_queue_name
.RS 4
Queue name to attach to\&. No default\&.
.RE
.PP
pgq_consumer_id
.RS 4
Consumer ID to use when registering\&. Default: %(job_name)s
.RE
.SS "Config options specific to cube_dispatcher"
.PP
src_db
.RS 4
Connect string for source database where the queue resides\&.
.RE
.PP
dst_db
.RS 4
Connect string for target database where the tables should be created\&.
.RE
.PP
mode
.RS 4
Operation mode for cube_dispatcher\&. Either
keep_all
or
keep_latest\&.
.RE
.PP
dateformat
.RS 4
Optional parameter to specify how to suffix data tables\&. Default is
YYYY_MM_DD
which creates per\-day tables\&. With
YYYY_MM
per\-month tables can be created\&. If explicitly set empty, partitioning is disabled\&.
.RE
.PP
part_template
.RS 4
SQL fragment for table creation\&. Various magic replacements are done there:
.RE
.PP
_PKEY
.RS 4
comma\-separated list of primary key columns\&.
.RE
.PP
_PARENT
.RS 4
schema\-qualified parent table name\&.
.RE
.PP
_DEST_TABLE
.RS 4
schema\-qualified partition table\&.
.RE
.PP
_SCHEMA_TABLE
.RS 4
same as
_DEST_TABLE, but with dots replaced by "_", to allow use in index names\&.
.RE
.SS "Example config file"
.sp
.if n \{\
.RS 4
.\}
.nf
[cube_dispatcher]
job_name = some_queue_to_cube
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
src_db = dbname=sourcedb_test
dst_db = dbname=dataminedb_test
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
pgq_queue_name = udata\&.some_queue
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
logfile = ~/log/%(job_name)s\&.log
pidfile = ~/pid/%(job_name)s\&.pid
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
# how many rows are kept: keep_latest, keep_all
mode = keep_latest
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
# to_char() fmt for table suffix
#dateformat = YYYY_MM_DD
# following disables table suffixes:
#dateformat =
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
part_template =
create table _DEST_TABLE (like _PARENT);
alter table only _DEST_TABLE add primary key (_PKEY);
.fi
.if n \{\
.RE
.\}
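.sp
For illustration (an assumed expansion, not generated output): with the template above, dateformat YYYY_MM_DD, a parent table public\&.events and primary key id, the partition created for 2012\-03\-13 would look roughly like this:
.sp
.if n \{\
.RS 4
.\}
.nf
create table public\&.events_2012_03_13 (like public\&.events);
alter table only public\&.events_2012_03_13 add primary key (id);
.fi
.if n \{\
.RE
.\}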
.SH "LOGUTRIGA EVENT FORMAT"
.sp
PgQ trigger function pgq\&.logutriga() sends table change event into queue in following format:
.PP
ev_type
.RS 4
(op || ":" || pkey_fields)\&. Where op is either "I", "U" or "D", corresponging to insert, update or delete\&. And
pkey_fields
is comma\-separated list of primary key fields for table\&. Operation type is always present but pkey_fields list can be empty, if table has no primary keys\&. Example:
I:col1,col2
.RE
.PP
ev_data
.RS 4
Urlencoded record of data\&. It uses db\-specific urlencoding where the existence of
\fI=\fR
is meaningful \- missing
\fI=\fR
means NULL, present
\fI=\fR
means literal value\&. Example:
id=3&name=str&nullvalue&emptyvalue=
.RE
.PP
ev_extra1
.RS 4
Fully qualified table name\&.
.RE
.SH "COMMAND LINE SWITCHES"
.sp
Following switches are common to all skytools\&.DBScript\-based Python programs\&.
.PP
\-h, \-\-help
.RS 4
show help message and exit
.RE
.PP
\-q, \-\-quiet
.RS 4
make program silent
.RE
.PP
\-v, \-\-verbose
.RS 4
make program more verbose
.RE
.PP
\-d, \-\-daemon
.RS 4
make program go background
.RE
.sp
The following switches are used to control an already running process\&. The pidfile is read from the config, then a signal is sent to the process id specified there\&.
.PP
\-r, \-\-reload
.RS 4
reload config (send SIGHUP)
.RE
.PP
\-s, \-\-stop
.RS 4
stop program safely (send SIGINT)
.RE
.PP
\-k, \-\-kill
.RS 4
kill program immediately (send SIGTERM)
.RE
skytools-2.1.13/doc/pgq-sql.txt
= PgQ - queue for PostgreSQL =
== Queue creation ==
pgq.create_queue(queue_name text)
Initialize event queue.
Returns 0 if event queue already exists, 1 otherwise.
== Producer ==
pgq.insert_event(queue_name text, ev_type, ev_data)
pgq.insert_event(queue_name text, ev_type, ev_data, extra1, extra2, extra3, extra4)
Generate new event. This should be called inside main tx - thus
rollbacked with it if needed.
== Consumer ==
pgq.register_consumer(queue_name text, consumer_id text)
Attaches this consumer to particular event queue.
Returns 0 if the consumer was already attached, 1 otherwise.
pgq.unregister_consumer(queue_name text, consumer_id text)
Unregister and drop resources allocated to the consumer.
pgq.next_batch(queue_name text, consumer_id text)
Allocates next batch of events to consumer.
Returns batch id (int8), to be used in processing functions. If no batches
are available, returns NULL. That means that the ticker has not cut them yet.
This is the appropriate moment for the consumer to sleep.
pgq.get_batch_events(batch_id int8)
`pgq.get_batch_events()` returns a set of events in this batch.
There may be no events in the batch. This is normal. The batch must still be closed
with pgq.finish_batch().
Event fields: (ev_id int8, ev_time timestamptz, ev_txid int8, ev_retry int4, ev_type text,
ev_data text, ev_extra1, ev_extra2, ev_extra3, ev_extra4)
pgq.event_failed(batch_id int8, event_id int8, reason text)
Tag event as 'failed' - it will be stored, but no further processing is done.
pgq.event_retry(batch_id int8, event_id int8, retry_seconds int4)
Tag event for 'retry' - after x seconds the event will be re-inserted
into main queue.
pgq.finish_batch(batch_id int8)
Tag batch as finished. Until this is done, the consumer will get the
same batch again.
After calling finish_batch the consumer cannot do any operations with events
of that batch. All operations must be done before.
== Failed queue operation ==
Events tagged as failed just stay on their queue. Following
functions can be used to manage them.
pgq.failed_event_list(queue_name, consumer)
pgq.failed_event_list(queue_name, consumer, cnt, offset)
pgq.failed_event_count(queue_name, consumer)
Get info about the queue.
Event fields are same as for pgq.get_batch_events()
pgq.failed_event_delete(queue_name, consumer, event_id)
pgq.failed_event_retry(queue_name, consumer, event_id)
Remove an event from queue, or retry it.
== Info operations ==
pgq.get_queue_info()
Get list of queues.
Result: (queue_name, queue_ntables, queue_cur_table, queue_rotation_period, queue_switch_time, queue_external_ticker, queue_ticker_max_count, queue_ticker_max_lag, queue_ticker_idle_period, ticker_lag)
pgq.get_consumer_info()
pgq.get_consumer_info(queue_name)
pgq.get_consumer_info(queue_name, consumer)
Get list of active consumers.
Result: (queue_name, consumer_name, lag, last_seen, last_tick, current_batch, next_tick)
pgq.get_batch_info(batch_id)
Get info about batch.
Result fields: (queue_name, consumer_name, batch_start, batch_end, prev_tick_id, tick_id, lag)
== Notes ==
Consumer *must* be able to process the same event several times.
== Example ==
First, create event queue:
select pgq.create_queue('LogEvent');
Then, producer side can do whenever it wishes:
select pgq.insert_event('LogEvent', 'data', 'DataFor123');
First step for consumer is to register:
select pgq.register_consumer('LogEvent', 'TestConsumer');
Then it can enter into consuming loop:
begin;
select pgq.next_batch('LogEvent', 'TestConsumer'); [into batch_id]
commit;
That will reserve a batch of events for this consumer.
To see the events in batch:
select * from pgq.get_batch_events(batch_id);
That will give all events in batch. The processing does not need to happen
all in one transaction; the framework can split the work as it wants.
If an event failed or needs to be tried again, the framework can call:
select pgq.event_retry(batch_id, event_id, 60);
select pgq.event_failed(batch_id, event_id, 'Record deleted');
When all done, notify database about it:
select pgq.finish_batch(batch_id)
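Putting the above together, a minimal consumer loop in Python via
psycopg2 might look like this (connect string and sleep interval are
placeholders):

    import time
    import psycopg2

    db = psycopg2.connect("dbname=queuedb")  # placeholder connstr
    cur = db.cursor()

    while True:
        cur.execute("select pgq.next_batch('LogEvent', 'TestConsumer')")
        batch_id = cur.fetchone()[0]
        db.commit()
        if batch_id is None:
            time.sleep(1)  # no batch ready - ticker has not cut one yet
            continue
        cur.execute("select * from pgq.get_batch_events(%s)", [batch_id])
        for ev in cur.fetchall():
            pass  # process one event; must tolerate seeing it twice
        cur.execute("select pgq.finish_batch(%s)", [batch_id])
        db.commit()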
skytools-2.1.13/doc/common.logutriga.txt
PgQ trigger function `pgq.logutriga()` sends table change event into
queue in following format:
ev_type::
`(op || ":" || pkey_fields)`. Where op is either "I", "U" or "D",
corresponding to insert, update or delete. And `pkey_fields`
is comma-separated list of primary key fields for table.
Operation type is always present but pkey_fields list can be empty,
if table has no primary keys. Example: `I:col1,col2`
ev_data::
Urlencoded record of data. It uses db-specific urlencoding where
existence of '=' is meaningful - missing '=' means NULL, present
'=' means literal value. Example: `id=3&name=str&nullvalue&emptyvalue=`
(a decoding sketch follows below)
ev_extra1::
Fully qualified table name.
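A minimal Python sketch of decoding `ev_data` by these rules
(a standalone illustration, not the actual skytools decoder):

    import urllib

    def decode_ev_data(ev_data):
        """Split urlencoded record; a field without '=' decodes to None."""
        rec = {}
        for field in ev_data.split('&'):
            if '=' in field:
                k, v = field.split('=', 1)
                rec[urllib.unquote_plus(k)] = urllib.unquote_plus(v)
            else:
                rec[urllib.unquote_plus(field)] = None
        return rec

    # decode_ev_data('id=3&name=str&nullvalue&emptyvalue=')
    # -> {'id': '3', 'name': 'str', 'nullvalue': None, 'emptyvalue': ''}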
skytools-2.1.13/doc/londiste.1
'\" t
.\" Title: londiste
.\" Author: [FIXME: author] [see http://docbook.sf.net/el/author]
.\" Generator: DocBook XSL Stylesheets v1.75.2
.\" Date: 03/13/2012
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "LONDISTE" "1" "03/13/2012" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
londiste \- PostgreSQL replication engine written in python
.SH "SYNOPSIS"
.sp
.nf
londiste\&.py [option] config\&.ini command [arguments]
.fi
.SH "DESCRIPTION"
.sp
Londiste is the PostgreSQL replication engine portion of the SkyTools suite, by Skype\&. This suite includes packages implementing specific replication tasks and/or solutions in layers, building upon each other\&.
.sp
PgQ is a generic queue implementation based on ideas from Slony\-I\(cqs snapshot based event batching\&. Londiste uses PgQ as its transport mechanism to implement a robust and easy to use replication solution\&.
.sp
Londiste is an asynchronous master\-slave(s) replication system\&. Asynchronous means that a transaction committed on the master is not guaranteed to have made it to any slave at the master\(cqs commit time; and master\-slave means that data changes on slaves are not reported back to the master, it\(cqs the other way around only\&.
.sp
The replication is trigger based, and you choose a set of tables to replicate from the provider to the subscriber(s)\&. Any data changes occurring on the provider (in a replicated table) will fire the londiste trigger, which fills a queue of events for any subscriber(s) to care about\&.
.sp
A replay process consumes the queue in batches, and applies all given changes to any subscriber(s)\&. The initial replication step involves using PostgreSQL\(cqs COPY command for efficient data loading\&.
.SH "QUICK-START"
.sp
Basic londiste setup and usage can be summarized by the following steps:
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
create the subscriber database, with tables to replicate
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
edit a londiste configuration file, say conf\&.ini, and a PgQ ticker configuration file, say ticker\&.ini
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 3." 4.2
.\}
install londiste on the provider and subscriber nodes\&. This step requires admin privileges on both provider and subscriber sides, and both install commands can be run remotely:
.sp
.if n \{\
.RS 4
.\}
.nf
$ londiste\&.py conf\&.ini provider install
$ londiste\&.py conf\&.ini subscriber install
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 4." 4.2
.\}
launch the PgQ ticker on the provider machine:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pgqadm\&.py \-d ticker\&.ini ticker
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 5.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 5." 4.2
.\}
launch the londiste replay process:
.sp
.if n \{\
.RS 4
.\}
.nf
$ londiste\&.py \-d conf\&.ini replay
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 6.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 6." 4.2
.\}
add tables to replicate from the provider database:
.sp
.if n \{\
.RS 4
.\}
.nf
$ londiste\&.py conf\&.ini provider add table1 table2 \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 7.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 7." 4.2
.\}
add tables to replicate to the subscriber database:
.sp
.if n \{\
.RS 4
.\}
.nf
$ londiste\&.py conf\&.ini subscriber add table1 table2 \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.RE
.sp
To replicate to more than one subscriber database just repeat each of the described subscriber steps for each subscriber\&.
.SH "COMMANDS"
.sp
The londiste command is parsed globally, and has both options and subcommands\&. Some options are reserved to a subset of the commands, and others should be used without any command at all\&.
.SH "GENERAL OPTIONS"
.sp
This section presents options available to all and any londiste command\&.
.PP
\-h, \-\-help
.RS 4
show this help message and exit
.RE
.PP
\-q, \-\-quiet
.RS 4
make program silent
.RE
.PP
\-v, \-\-verbose
.RS 4
make program more verbose
.RE
.SH "PROVIDER COMMANDS"
.sp
.if n \{\
.RS 4
.\}
.nf
$ londiste\&.py config\&.ini provider
.fi
.if n \{\
.RE
.\}
.sp
Where command is one of:
.SS "provider install"
.sp
Installs code into provider and subscriber database and creates queue\&. Equivalent to doing following by hand:
.sp
.if n \{\
.RS 4
.\}
.nf
CREATE LANGUAGE plpgsql;
CREATE LANGUAGE plpython;
\ei \&.\&.\&./contrib/txid\&.sql
\ei \&.\&.\&./contrib/pgq\&.sql
\ei \&.\&.\&./contrib/londiste\&.sql
select pgq\&.create_queue(queue name);
.fi
.if n \{\
.RE
.\}
.SS "provider add \&..."
.sp
Registers table(s) on the provider database and adds the londiste trigger to the table(s) which will send events to the queue\&. Table names can be schema qualified with the schema name defaulting to public if not supplied\&.
.PP
\-\-all
.RS 4
Register all tables in provider database, except those that are under schemas
\fIpgq\fR,
\fIlondiste\fR,
\fIinformation_schema\fR
or
\fIpg_*\fR\&.
.RE
.SS "provider remove \&..."
.sp
Unregisters table(s) on the provider side and removes the londiste triggers from the table(s)\&. The table removal event is also sent to the queue, so all subscribers unregister the table(s) on their end as well\&. Table names can be schema qualified with the schema name defaulting to public if not supplied\&.
.SS "provider add\-seq \&..."
.sp
Registers a sequence on provider\&.
.SS "provider remove\-seq \&..."
.sp
Unregisters a sequence on provider\&.
.SS "provider tables"
.sp
Shows registered tables on provider side\&.
.SS "provider seqs"
.sp
Shows registered sequences on provider side\&.
.SH "SUBSCRIBER COMMANDS"
.sp
.if n \{\
.RS 4
.\}
.nf
londiste\&.py config\&.ini subscriber
.fi
.if n \{\
.RE
.\}
.sp
Where command is one of:
.SS "subscriber install"
.sp
Installs code into the subscriber database\&. Equivalent to doing the following by hand:
.sp
.if n \{\
.RS 4
.\}
.nf
CREATE LANGUAGE plpgsql;
\ei \&.\&.\&./contrib/londiste\&.sql
.fi
.if n \{\
.RE
.\}
.sp
This will be done under the Postgres Londiste user\&. If the tables should be owned by someone else, it needs to be done by hand\&.
.SS "subscriber add \&..."
.sp
Registers table(s) on subscriber side\&. Table names can be schema qualified with the schema name defaulting to public if not supplied\&.
.sp
Switches (optional):
.PP
\-\-all
.RS 4
Add all tables that are registered on provider to subscriber database
.RE
.PP
\-\-force
.RS 4
Ignore table structure differences\&.
.RE
.PP
\-\-expect\-sync
.RS 4
Table is already synced by external means so initial COPY is unnecessary\&.
.RE
.PP
\-\-skip\-truncate
.RS 4
When doing initial COPY, don\(cqt remove old data\&.
.RE
.SS "subscriber remove \&..."
.sp
Unregisters table(s) from subscriber\&. No events will be applied to the table anymore\&. Actual table will not be touched\&. Table names can be schema qualified with the schema name defaulting to public if not supplied\&.
.SS "subscriber add\-seq \&..."
.sp
Registers a sequence on subscriber\&.
.SS "subscriber remove\-seq \&..."
.sp
Unregisters a sequence on subscriber\&.
.SS "subscriber resync \&..."
.sp
Tags table(s) as "not synced"\&. Later the replay process will notice this and launch copy process(es) to sync the table(s) again\&.
.SS "subscriber tables"
.sp
Shows registered tables on the subscriber side, and the current state of each table\&. Possible state values are:
.PP
NEW
.RS 4
the table has not yet been considered by londiste\&.
.RE
.PP
in\-copy
.RS 4
Full\-table copy is in progress\&.
.RE
.PP
catching\-up
.RS 4
Table is copied, missing events are replayed on to it\&.
.RE
.PP
wanna\-sync
.RS 4
The "copy" process has caught up and wants to hand the table over to "replay"\&.
.RE
.PP
do\-sync
.RS 4
"replay" process is ready to accept it\&.
.RE
.PP
ok
.RS 4
table is in sync\&.
.RE
.SS "subscriber fkeys"
.sp
Show pending and active foreign keys on tables\&. Takes optional type argument \- pending or active\&. If no argument is given, both types are shown\&.
.sp
Pending foreign keys are those that were removed during COPY time but have not been restored yet\&. The restore happens automatically once both tables are synced\&.
.SS "subscriber triggers"
.sp
Show pending and active triggers on tables\&. Takes optional type argument \- pending or active\&. If no argument is given, both types are shown\&.
.sp
Pending triggers are those that were removed during COPY time but have not been restored yet\&. The restore of triggers does not happen automatically; it needs to be done manually with the restore\-triggers command\&.
.SS "subscriber restore\-triggers"
.sp
Restores all pending triggers for a single table\&. Optionally a trigger name can be given as an extra argument; then only that trigger is restored\&.
.SS "subscriber register"
.sp
Register consumer on queue\&. This usually happens automatically when replay is launched, but it can also be done explicitly with this command\&.
.SS "subscriber unregister"
.sp
Unregister consumer from provider\(cqs queue\&. This should be done if you want to shut replication down\&.
.SH "REPLICATION COMMANDS"
.SS "replay"
.sp
The actual replication process\&. Should be run as a daemon with the \-d switch, because it needs to be always running\&.
.sp
Its main task is to get batches of events from PgQ and apply them to the subscriber database\&.
.sp
Switches:
.PP
\-d, \-\-daemon
.RS 4
go background
.RE
.PP
\-r, \-\-reload
.RS 4
reload config (send SIGHUP)
.RE
.PP
\-s, \-\-stop
.RS 4
stop program safely (send SIGINT)
.RE
.PP
\-k, \-\-kill
.RS 4
kill program immediately (send SIGTERM)
.RE
.SH "UTILITY COMMAND"
.SS "repair \&..."
.sp
Attempts to achieve a state where the table(s) is/are in sync, compares them, and writes out SQL statements that would fix differences\&.
.sp
Syncing happens by locking provider tables against updates and then waiting until the replay process has applied all pending changes to the subscriber database\&. As this is a dangerous operation, it has a hardwired limit of 10 seconds for locking\&. If the replay process does not catch up in that time, the locks are released and the repair operation is cancelled\&.
.sp
Comparing happens by dumping out the table contents of both sides, sorting them and then comparing line\-by\-line\&. As this is a CPU and memory\-hungry operation, good practice is to run the repair command on a third machine to avoid consuming resources on either the provider or the subscriber\&.
.SS "compare \&..."
.sp
Syncs tables like repair, but just runs SELECT count(*) on both sides for a cheaper, but also less precise, check of whether the tables are in sync\&.
.SH "CONFIGURATION"
.sp
Londiste and PgQ both use INI configuration files; your distribution of skytools includes examples\&. You often just have to edit the database connection strings, namely db in the PgQ ticker\&.ini and provider_db and subscriber_db in the londiste conf\&.ini, as well as logfile and pidfile to adapt them to your system paths\&.
.sp
See londiste(5)\&.
.SH "SEE ALSO"
.sp
londiste(5)
.sp
\m[blue]\fBhttps://developer\&.skype\&.com/SkypeGarage/DbProjects/SkyTools/\fR\m[]
.sp
\m[blue]\fBReference guide\fR\m[]\&\s-2\u[1]\d\s+2
.SH "NOTES"
.IP " 1." 4
Reference guide
.RS 4
\%http://skytools.projects.postgresql.org/doc/londiste.ref.html
.RE
skytools-2.1.13/doc/skytools_upgrade.txt 0000644 0001750 0001750 00000001377 11670174255 017415 0 ustar marko marko = skytools_upgrade(1) =
== NAME ==
skytools_upgrade - utility for upgrading Skytools code in databases.
== SYNOPSIS ==
skytools_upgrade.py connstr [connstr ..]
== DESCRIPTION ==
It connects to the given database(s), then looks for the following schemas:
pgq::
Main PgQ code.
pgq_ext::
PgQ batch/event tracking in remote database.
londiste::
Londiste replication.
If a schema exists, its version is detected by querying the .version()
function under that schema. If the function does not exist, built-in
heuristics differentiate between the 2.1.4 and
2.1.5 versions of the schemas.
If the detected version is older than the current one, it is upgraded
by applying the upgrade scripts in order.
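For example, upgrading the schemas of a single database could look like this
(the connect string is illustrative):

  $ skytools_upgrade.py "dbname=mydb"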
== COMMAND LINE SWITCHES ==
include::common.switches.txt[]
skytools-2.1.13/doc/bulk_loader.txt 0000644 0001750 0001750 00000005354 11670174255 016301 0 ustar marko marko
= bulk_loader(1) =
== NAME ==
bulk_loader - PgQ consumer that loads urlencoded records to slow databases
== SYNOPSIS ==
bulk_loader.py [switches] config.ini
== DESCRIPTION ==
bulk_loader is a PgQ consumer that reads urlencoded records from the source queue
and writes them into tables according to the configuration file. It is targeted
at slow databases that cannot handle applying each row as a separate statement.
It was originally written for BizgresMPP/greenplumDB, which have very high
per-statement overhead, but it can also be used to load a regular PostgreSQL
database that cannot manage regular replication.
Behaviour properties:
- reads urlencoded "logutriga" records.
- does not do partitioning, but optionally allows redirecting table events.
- does not keep event order.
- always loads data with COPY, either directly to main table (INSERTs)
or to temp tables (UPDATE/COPY) then applies from there.
Events are usually produced by `pgq.logutriga()`. Logutriga adds all the data
of the record into the event (also in case of updates and deletes).
== QUICK-START ==
Basic bulk_loader setup and usage can be summarized by the following
steps:
1. pgq and logutriga must be installed in source databases.
See the pgqadm man page for details. The target database must also
have the pgq_ext schema.
2. edit a bulk_loader configuration file, say bulk_loader_sample.ini
3. create source queue
$ pgqadm.py ticker.ini create
4. Tune source queue to have big batches:
$ pgqadm.py ticker.ini config ticker_max_count="10000" ticker_max_lag="10 minutes" ticker_idle_period="10 minutes"
5. create target database and tables in it.
6. launch bulk_loader in daemon mode
$ bulk_loader.py -d bulk_loader_sample.ini
7. start producing events (create logutriga triggers on tables)
   CREATE TRIGGER trig_bulk_replica AFTER INSERT OR UPDATE ON some_table
   FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('');
== CONFIG ==
include::common.config.txt[]
=== Config options specific to `bulk_loader` ===
src_db::
Connect string for source database where the queue resides.
dst_db::
Connect string for target database where the tables should be created.
remap_tables::
Optional parameter for table redirection. Contains a comma-separated
list of `oldtable:newtable` pairs. Eg: `oldtable1:newtable1, oldtable2:newtable2`.
load_method::
Optional parameter for load method selection. Available options:
0:: UPDATE as UPDATE from temp table. This is default.
1:: UPDATE as DELETE+COPY from temp table.
2:: merge INSERTs with UPDATEs, then do DELETE+COPY from temp table.
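As an illustration, a minimal config could look like this (all names here are
placeholders, in the style of the other sample configs in this package):

  [bulk_loader]
  job_name = some_queue_to_slow_db
  src_db = dbname=sourcedb_test
  dst_db = dbname=slowdb_test
  pgq_queue_name = udata.some_queue
  logfile = ~/log/%(job_name)s.log
  pidfile = ~/pid/%(job_name)s.pid
  # 2 = merge INSERTs with UPDATEs, then DELETE+COPY from temp table
  load_method = 2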
== LOGUTRIGA EVENT FORMAT ==
include::common.logutriga.txt[]
== COMMAND LINE SWITCHES ==
include::common.switches.txt[]
skytools-2.1.13/doc/cube_dispatcher.txt 0000644 0001750 0001750 00000007225 11670174255 017141 0 ustar marko marko
= cube_dispatcher(1) =
== NAME ==
cube_dispatcher - PgQ consumer that is used to write source records into partitioned tables
== SYNOPSIS ==
cube_dispatcher.py [switches] config.ini
== DESCRIPTION ==
cube_dispatcher is a PgQ consumer that reads urlencoded records from the source
queue and writes them into partitioned tables according to the configuration file.
It is used to prepare data for business intelligence. The name of the table is read
from the producer field of the event. Batch creation time is used for partitioning:
all records created on the same day go into the same table partition. If the
partition does not exist, cube_dispatcher creates it according to the template.
Events are usually produced by `pgq.logutriga()`. Logutriga adds all the data
of the record into the event (also in case of updates and deletes).
`cube_dispatcher` can be used in two modes:
keep_all::
keeps all the data that comes in. If a record is updated several times
during one day, then the table partition for that day will contain several
instances of that record.
keep_latest::
only the last instance of each record is kept for each day. That also
means that all tables must have primary keys so cube_dispatcher can delete
previous versions of records before inserting new data.
== QUICK-START ==
Basic cube_dispatcher setup and usage can be summarized by the following
steps:
1. pgq and logutriga must be installed in source databases.
See the pgqadm man page for details. The target database must also
have the pgq_ext schema.
2. edit a cube_dispatcher configuration file, say cube_dispatcher_sample.ini
3. create source queue
$ pgqadm.py ticker.ini create
4. create target database and parent tables in it.
5. launch cube dispatcher in daemon mode
$ cube_dispatcher.py cube_dispatcher_sample.ini -d
6. start producing events (create logutriga triggers on tables)
   CREATE TRIGGER trig_cube_replica AFTER INSERT OR UPDATE ON some_table
   FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('');
== CONFIG ==
include::common.config.txt[]
=== Config options specific to `cube_dispatcher` ===
src_db::
Connect string for source database where the queue resides.
dst_db::
Connect string for target database where the tables should be created.
mode::
Operation mode for cube_dispatcher. Either `keep_all` or `keep_latest`.
dateformat::
Optional parameter to specify how to suffix data tables.
Default is `YYYY_MM_DD` which creates per-day tables.
With `YYYY_MM` per-month tables can be created.
If explicitly set empty, partitioning is disabled.
part_template::
SQL fragment for table creation. Various magic replacements are done there:
_PKEY:: comma-separated list of primary key columns.
_PARENT:: schema-qualified parent table name.
_DEST_TABLE:: schema-qualified partition table.
_SCHEMA_TABLE:: same as _DEST_TABLE but dots replaced with "__", to allow use as index names.
=== Example config file ===
[cube_dispatcher]
job_name = some_queue_to_cube
src_db = dbname=sourcedb_test
dst_db = dbname=dataminedb_test
pgq_queue_name = udata.some_queue
logfile = ~/log/%(job_name)s.log
pidfile = ~/pid/%(job_name)s.pid
# how many rows are kept: keep_latest, keep_all
mode = keep_latest
# to_char() fmt for table suffix
#dateformat = YYYY_MM_DD
# following disables table suffixes:
#dateformat =
part_template =
create table _DEST_TABLE (like _PARENT);
alter table only _DEST_TABLE add primary key (_PKEY);
== LOGUTRIGA EVENT FORMAT ==
include::common.logutriga.txt[]
== COMMAND LINE SWITCHES ==
include::common.switches.txt[]
skytools-2.1.13/doc/walmgr.txt 0000644 0001750 0001750 00000021754 11670174255 015311 0 ustar marko marko
= walmgr(1) =
== NAME ==
walmgr - tools for managing WAL-based replication for PostgreSQL.
== SYNOPSIS ==
walmgr.py command
== DESCRIPTION ==
It is both an admin and a worker script for PostgreSQL PITR replication.
== QUICK START ==
1. Set up passwordless ssh authentication from master to slave
master$ test -f ~/.ssh/id_dsa.pub || ssh-keygen -t dsa
master$ cat ~/.ssh/id_dsa.pub | ssh slave cat \>\> .ssh/authorized_keys
2. Configure paths
master$ edit master.ini
slave$ edit slave.ini
Make sure that walmgr.py executable has same pathname on slave and master.
3. Start archival process and create a base backup
master$ ./walmgr.py master.ini setup
master$ ./walmgr.py master.ini backup
Note: starting from PostgreSQL 8.3 the archiving is enabled by setting
archive_mode GUC to on. However changing this parameter requires the
server to be restarted.
4. Prepare postgresql.conf and pg_hba.conf on slave and start replay
master$ scp $PGDATA/*.conf slave:
slave$ ./walmgr.py slave.ini restore
For debian based distributions the standard configuration files are located
in /etc/postgresql/x.x/main directory. If another scheme is used the postgresql.conf
and pg_hba.conf should be copied to slave full_backup directory. Make sure to
disable archive_command in slave config.
'walmgr.py restore' moves data in place, creates recovery.conf and starts postmaster
in recovery mode.
5. In-progress WAL segments can be backed up with the command:
master$ ./walmgr.py master.ini sync
6. If you need to stop replay on the slave and boot it into normal mode, do:
slave$ ./walmgr.py slave.ini boot
== GENERAL OPTIONS ==
Common options to all walmgr.py commands.
-h, --help::
show this help message and exit
-q, --quiet::
make program silent
-v, --verbose::
make program more verbose
-n, --not-really::
Show what would be done without actually doing anything.
== MASTER COMMANDS ==
=== setup ===
Sets up postgres archiving, creates necessary directory structures on slave.
=== sync ===
Synchronizes in-progress WAL files to slave.
=== syncdaemon ===
Start WAL synchronization in daemon mode. This will start periodically synching
the in-progress WAL files to slave.
The following parameters are used to drive the syncdaemon:
loop_delay:: how long to sleep between the synchs.
use_xlog_functions:: use record based shipping to synchronize in-progress WAL segments.
=== stop ===
Deconfigures postgres archiving.
=== periodic ===
Runs the periodic command, if configured. This makes it possible to execute
arbitrary commands at an interval, useful for synchronizing scripts, config files,
crontabs etc.
=== listbackups ===
List backup sets available on slave node.
=== backup ===
Creates a new base backup from the master database. Will purge expired backups
and WAL files on the slave if `keep_backups` is specified. During a backup a lock
file is created in the slave `completed_wals` directory. This is to prevent
simultaneous backups and resulting corruption. If a running backup is terminated,
the BACKUPLOCK file may have to be removed manually.
=== restore ===
EXPERIMENTAL. Attempts to restore the backup from slave to master.
== SLAVE COMMANDS ==
=== boot ===
Stop log playback and bring the database up.
=== pause ===
Pauses WAL playback.
=== continue ===
Continues previously paused WAL playback.
=== listbackups ===
Lists available backups.
=== backup ===
EXPERIMENTAL. Creates a new backup from slave data. Log replay is paused,
slave data directory is backed up to `full_backup` directory and log
replay resumed. Backups are rotated as needed. The idea is to move the
backup load away from production node. Usable from postgres 8.2 and up.
=== restore [src][dst] ===
Restores the specified backup set to the target directory. If specified without
arguments, the latest backup is *moved* to the slave data directory (this doesn't obey
retention rules). If a src backup is specified, the backup is copied instead of moved.
An alternative destination directory can be specified with `dst`.
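For example, copying a specific backup set to an alternative directory
(backup set name and path are illustrative):

  slave$ ./walmgr.py slave.ini restore data.master.0 /tmp/restore_test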
== CONFIGURATION ==
=== Common settings ===
==== job_name ====
Optional. Identifies this script, used in logging. Keep unique if
using central logging.
==== logfile ====
Where to log.
==== use_skylog ====
Optional. If nonzero, skylog.ini is used for log configuration.
=== Master settings ===
==== pidfile ====
Pid file location for syncdaemon mode (if running with -d). Otherwise
not required.
==== master_db ====
Database to connect to for pg_start_backup() etc. It is not a
good idea to use `dbname=template` if running syncdaemon in
record shipping mode.
==== master_data ====
Master data directory location.
==== master_config ====
Master postgresql.conf file location. This is where
`archive_command` gets updated.
==== master_restart_cmd ====
The command to restart master database, this used after changing
`archive_mode` parameter. Leave unset, if you cannot afford to
restart the database at setup/stop.
==== slave ====
Slave host and base directory.
==== slave_config ====
Configuration file location for the slave walmgr.
==== completed_wals ====
Slave directory where archived WAL files are copied.
==== partial_wals ====
Slave directory where incomplete WAL files are stored.
==== full_backup ====
Slave directory where full backups are stored.
==== config_backup ====
Slave directory where configuration file backups are stored. Optional.
==== loop_delay ====
The frequency of syncdaemon updates. In record shipping mode only
incremental updates are sent, so smaller interval can be used.
==== use_xlog_functions ====
Use pg_xlog functions for record based shipping (available in 8.2 and up).
==== compression ====
If nonzero, a -z flag is added to rsync cmdline. Will reduce network
traffic at the cost of extra CPU time.
==== periodic_command ====
Shell script to be executed at specified time interval. Can be used for
synchronizing scripts, config files etc.
==== command_interval ====
How often to run periodic command script. In seconds, and only evaluated
at log switch times.
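For example (command and interval are illustrative):

  periodic_command = rsync -az ~/scripts/ slave:scripts/
  command_interval = 3600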
==== hot_standby ====
Boolean. If set to true, walmgr setup will set wal_level to hot_standby (9.0 and newer).
=== Sample master.ini ===
[wal-master]
logfile = master.log
pidfile = master.pid
master_db = dbname=template1
master_data = /var/lib/postgresql/8.0/main
master_config = /etc/postgresql/8.0/main/postgresql.conf
slave = slave:/var/lib/postgresql/walshipping
completed_wals = %(slave)s/logs.complete
partial_wals = %(slave)s/logs.partial
full_backup = %(slave)s/data.master
loop_delay = 10.0
use_xlog_functions = 1
compression = 1
=== Slave settings ===
==== slave_data ====
Postgres data directory for the slave. This is where the restored
backup is copied/moved.
==== slave_config_dir ====
Directory for postgres configuration files. If specified, "walmgr restore"
attempts to restore configuration files from config_backup directory.
==== slave_stop_cmd ====
Script to stop postmaster on slave.
==== slave_start_cmd ====
Script to start postmaster on slave.
==== slave ====
Base directory for slave files (logs.complete, data.master etc)
==== slave_bin ====
Specifies the location of postgres binaries (pg_controldata, etc). Needed if
they are not already in the PATH.
==== completed_wals ====
Directory where complete WAL files are stored. Also miscellaneous control files
are created in this directory (BACKUPLOCK, STOP, PAUSE, etc.).
==== partial_wals ====
Directory where partial WAL files are stored.
==== full_backup ====
Directory where full backups are stored.
==== keep_backups ====
Number of backups to keep. Also all WAL files needed to bring earliest
backup up to date are kept. The backups are rotated before new backup
is started, so at one point there is actually one less backup available.
It probably doesn't make sense to specify `keep_backups` if periodic
backups are not performed - the WAL files will pile up quickly.
Backups will be named data.master, data.master.0, data.master.1 etc.
==== archive_command ====
Script to execute before rotating away the oldest backup. If it fails
backups will not be rotated.
==== slave_pg_xlog ====
Set slave_pg_xlog to the directory on the slave where pg_xlog files get
written to. On a restore to the slave walmgr.py will
create a symbolic link from data/pg_xlog to this location.
==== backup_datadir ====
Set backup_datadir to 'no' to prevent walmgr.py from making a backup
of the data directory when restoring to the slave. This defaults to
'yes'.
=== Sample slave.ini ===
[wal-slave]
logfile = slave.log
slave_data = /var/lib/postgresql/8.0/main
slave_stop_cmd = /etc/init.d/postgresql-8.0 stop
slave_start_cmd = /etc/init.d/postgresql-8.0 start
slave = /var/lib/postgresql/walshipping
completed_wals = %(slave)s/logs.complete
partial_wals = %(slave)s/logs.partial
full_backup = %(slave)s/data.master
keep_backups = 5
backup_datadir = yes
skytools-2.1.13/doc/overview.txt 0000644 0001750 0001750 00000013616 11670174255 015664 0 ustar marko marko #pragma section-numbers 2
= SkyTools =
[[TableOfContents]]
== Intro ==
This is a package of tools we use at Skype to manage our cluster of [http://www.postgresql.org PostgreSQL]
servers. They are put together for our own convenience and also because they build on each other,
so managing them separately is a pain.
The code is hosted at [http://pgfoundry.org PgFoundry] site:
http://pgfoundry.org/projects/skytools/
There you can find our [http://pgfoundry.org/frs/?group_id=1000206 downloads] and
[http://lists.pgfoundry.org/mailman/listinfo/skytools-users mailing list],
as well as the [http://pgfoundry.org/scm/?group_id=1000206 CVS] repository
and the [http://pgfoundry.org/tracker/?group_id=1000206 bugtracker].
Combined todo list for all the modules: [http://skytools.projects.postgresql.org/doc/TODO.html TODO.html]
== High-level tools ==
These are scripts meant for end users;
in our case that means database administrators.
=== Londiste ===
Replication engine written in Python. It uses PgQ as its transport mechanism.
Its main goals are robustness and ease of use. Thus it is not as complete
and featureful as Slony-I.
[http://pgsql.tapoueh.org/londiste.html Tutorial] written by Dimitri Fontaine.
Documentation:
* [http://skytools.projects.postgresql.org/doc/londiste.cmdline.html Usage guide]
* [http://skytools.projects.postgresql.org/doc/londiste.config.html Config file]
* [http://skytools.projects.postgresql.org/doc/londiste.ref.html Low-level reference]
''' Features '''
* Tables can be added one-by-one into the replication set.
* Initial COPY for one table does not block event replay for other tables.
* Can compare tables on both sides.
* Supports sequences.
* Easy installation.
''' Missing features '''
* Does not understand cascaded replication: if one subscriber acts
as provider to another one and dies, the latter loses sync with the former.
In other words - it understands only a pair of servers.
''' Sample usage '''
{{{
## install pgq on provider:
$ pgqadm.py provider_ticker.ini install
## run ticker on provider:
$ pgqadm.py provider_ticker.ini ticker -d
## install Londiste in provider
$ londiste.py replic.ini provider install
## install Londiste in subscriber
$ londiste.py replic.ini subscriber install
## start replication daemon
$ londiste.py replic.ini replay -d
## activate tables on provider
$ londiste.py replic.ini provider add users orders
## add tables to subscriber
$ londiste.py replic.ini subscriber add users
}}}
=== PgQ ===
Generic queue implementation. Based on ideas from [http://www.slony1.info/ Slony-I] -
snapshot based event batching.
''' Features '''
* Generic multi-consumer, multi-producer queue.
* There can be several consumers on one queue.
* It is guaranteed that each of them sees an event at least once.
But it's not guaranteed that it sees it only once.
* The goal is to provide a clean API as SQL functions. The frameworks
on top of that don't need to understand internal details.
''' Technical design '''
* Events are batched using snapshots (like Slony-I).
* Consumers are poll-only, they don't need to do any administrative work.
* Queue administration is separate process from consumers.
* Tolerant of long transactions.
* Easy to monitor.
''' Docs '''
* [http://skytools.projects.postgresql.org/doc/pgq-sql.html SQL API overview]
* [http://skytools.projects.postgresql.org/pgq/ SQL API detailed docs]
* [http://skytools.projects.postgresql.org/doc/pgq-admin.html Administrative tool usage]
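A minimal producer/consumer session in SQL could look like the following sketch
(queue and consumer names are made up; the batch id returned by `pgq.next_batch()`
is assumed to be 1):
{{{
-- one-time setup
select pgq.create_queue('myqueue');
select pgq.register_consumer('myqueue', 'myconsumer');

-- producer: push an event
select pgq.insert_event('myqueue', 'mytype', 'some payload');

-- consumer: fetch a batch, process its events, close it
-- (the ticker must be running for batches to appear)
select pgq.next_batch('myqueue', 'myconsumer');
select * from pgq.get_batch_events(1);
select pgq.finish_batch(1);
}}}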
=== WalMgr ===
Python script for hot failover. Tries to make setup,
initial copy and later switchover easy for admins.
* Docs: [http://skytools.projects.postgresql.org/doc/walmgr.html walmgr.html]
Sample:
{{{
[ .. prepare config .. ]
master$ walmgr master.ini setup
master$ walmgr master.ini backup
slave$ walmgr slave.ini restore
[ .. main server down, switch failover server to normal mode: ]
slave$ walmgr slave.ini boot
}}}
== Low-level tools ==
These are the building blocks for PgQ and Londiste.
Useful for database developers.
=== txid ===
Provides 8-byte transaction IDs for external usage.
=== logtriga ===
Trigger function for table event logging in "partial SQL" format.
Based on Slony-I logtrigger. Used in londiste for replication.
=== logutriga ===
Trigger function for table event logging in urlencoded format.
Written in PL/Python. For cases where data manipulation is necessary.
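For example, attaching it to a table (queue name is illustrative):
{{{
CREATE TRIGGER mytable_logutriga AFTER INSERT OR UPDATE OR DELETE
ON mytable FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('myqueue');
}}}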
== Development frameworks ==
=== skytools - Python framework for database scripting ===
This collects various utilities for Python database scripts.
''' Topics '''
* Daemonization
* Logging
* Configuration.
* Skeleton class for scripts.
* Quoting (SQL/COPY)
* COPY helpers.
* Database object lookup.
* Table structure detection.
Documentation: http://skytools.projects.postgresql.org/api/
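A toy script built on the skeleton class might look like this sketch (it assumes
the 2.x `DBScript` API and a config file that provides a `db` connect string):
{{{
import sys
import skytools

class HelloScript(skytools.DBScript):
    def work(self):
        # get_database() returns a managed connection for config key 'db'
        db = self.get_database('db')
        cur = db.cursor()
        cur.execute("select current_timestamp")
        self.log.info("db time: %s", cur.fetchone()[0])
        db.commit()

if __name__ == '__main__':
    script = HelloScript('hello_script', sys.argv[1:])
    script.start()
}}}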
=== pgq - Python framework for PgQ consumers ===
This builds on the scripting framework above.
Docs:
* [http://skytools.projects.postgresql.org/api/ Python API docs]
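A minimal consumer might look like this sketch (again assuming the 2.x `Consumer`
API, where each event must be tagged when processed):
{{{
import sys
import pgq

class LogConsumer(pgq.Consumer):
    def process_batch(self, db, batch_id, ev_list):
        # apply each event, then mark it done so it is not redelivered
        for ev in ev_list:
            self.log.info("event data: %s", ev.data)
            ev.tag_done()

if __name__ == '__main__':
    script = LogConsumer('log_consumer', 'src_db', sys.argv[1:])
    script.start()
}}}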
== Sample scripts ==
These are specialized scripts based on the skytools/pgq framework.
They can be considered examples, although they are used in production at Skype.
=== Special data moving scripts ===
There are a couple of scripts for situations where regular replication
does not fit. They all operate on `logutriga()` urlencoded queues.
* `cube_dispatcher`: Multi-table partitioning on change date, with optional keep-all-row-versions mode.
* `table_dispatcher`: configurable partitioning for one table.
* `bulk_loader`: aggregates changes for slow databases. Instead of applying each change as a separate statement,
it does a minimal amount of DELETEs and then one big COPY.
|| Script || Supported operations || Number of tables || Partitioning ||
|| table_dispatcher || INSERT || 1 || any ||
|| cube_dispatcher || INSERT/UPDATE || any || change time ||
|| bulk_loader || INSERT/UPDATE/DELETE || any || none ||
=== queue_mover ===
Simply copies all events from one queue to another.
=== scriptmgr ===
Allows starting and stopping several scripts together.
skytools-2.1.13/doc/Makefile 0000644 0001750 0001750 00000005740 11670174255 014714 0 ustar marko marko
include ../config.mak
wiki = https://developer.skype.com/SkypeGarage/DbProjects/SkyTools
web = mkz@shell.pgfoundry.org:/home/pgfoundry.org/groups/skytools/htdocs/
EPYDOC = epydoc
EPYARGS = --no-private --url="http://pgfoundry.org/projects/skytools/" \
--name="Skytools" --html --no-private -v
HTMLS = londiste.cmdline.html londiste.config.html README.html INSTALL.html \
londiste.ref.html TODO.html pgq-sql.html pgq-admin.html pgq-nodupes.html \
$(SCRIPT_HTMLS)
SCRIPT_TXTS = walmgr.txt cube_dispatcher.txt table_dispatcher.txt \
queue_mover.txt queue_splitter.txt bulk_loader.txt \
scriptmgr.txt skytools_upgrade.txt
SCRIPT_MANS = $(SCRIPT_TXTS:.txt=.1)
SCRIPT_HTMLS = $(SCRIPT_TXTS:.txt=.html)
COMMON = common.switches.txt common.config.txt common.logutriga.txt
GETATTRS = python ./getattrs.py
all: man
man: londiste.1 londiste.5 pgqadm.1 $(SCRIPT_MANS)
html: $(HTMLS)
install: man
mkdir -p $(DESTDIR)/$(mandir)/man1
mkdir -p $(DESTDIR)/$(mandir)/man5
install -m 644 londiste.1 $(DESTDIR)/$(mandir)/man1
install -m 644 londiste.5 $(DESTDIR)/$(mandir)/man5
install -m 644 pgqadm.1 $(DESTDIR)/$(mandir)/man1
for m in $(SCRIPT_MANS); do \
install -m 644 $$m $(DESTDIR)/$(mandir)/man1 ; \
done
old.wiki.upload:
devupload.sh overview.txt $(wiki)
#devupload.sh TODO.txt $(wiki)/ToDo
#devupload.sh londiste.txt $(wiki)/LondisteUsage
#devupload.sh londiste.ref.txt $(wiki)/LondisteReference
#devupload.sh pgq-sql.txt $(wiki)/PgQdocs
#devupload.sh pgq-nodupes.txt $(wiki)/PgqNoDupes
#devupload.sh walmgr.txt $(wiki)/WalMgr
#devupload.sh pgq-admin.txt $(wiki)/PgqAdm
PY_PKGS = skytools pgq londiste
# skytools.config skytools.dbstruct skytools.gzlog \
# skytools.quoting skytools.scripting skytools.sqltools \
# pgq pgq.consumer pgq.event pgq.maint pgq.producer pgq.status pgq.ticker \
# londiste londiste.compare londiste.file_read londiste.file_write \
# londiste.installer londiste.playback londiste.repair londiste.setup \
# londiste.syncer londiste.table_copy
apidoc:
rm -rf api
mkdir -p api
cd ../python && $(EPYDOC) $(EPYARGS) -o ../doc/api $(PY_PKGS)
apiupload: apidoc
rsync -rtlz api/* $(web)/api
cd ../sql/pgq && rm -rf docs/html && $(MAKE) dox
rsync -rtlz ../sql/pgq/docs/html/* $(web)/pgq/
clean:
rm -rf api *.html
distclean: clean
rm -rf ../sql/pgq/docs/pgq
realclean: distclean
rm -f *.[15] *.xml
ifneq ($(ASCIIDOC),no)
ifneq ($(XMLTO),no)
londiste.1: londiste.cmdline.xml
$(XMLTO) man $<
londiste.5: londiste.config.xml
$(XMLTO) man $<
pgqadm.1: pgq-admin.xml
$(XMLTO) man $<
walmgr.1: walmgr.xml
$(XMLTO) man $<
%.xml: %.txt $(COMMON)
$(ASCIIDOC) -b docbook -d manpage `$(GETATTRS) $<` -o - $< \
| cat > $@
%.1: %.xml
$(XMLTO) man $<
endif
%.html: %.txt $(COMMON)
$(ASCIIDOC) -a toc `$(GETATTRS) $<` $<
README.html: ../README
cat $< \
| sed -e 's,doc/\([!-~]*\)[.]txt,link:\1.html[],g' \
-e 's,http:[!-~]*,&[],g' \
| $(ASCIIDOC) -o $@ -
INSTALL.html: ../INSTALL
$(ASCIIDOC) -o $@ $<
endif
web: $(HTMLS)
rsync -avz $(HTMLS) $(web)/doc/
skytools-2.1.13/doc/walmgr.1 0000644 0001750 0001750 00000034073 11727600373 014626 0 ustar marko marko '\" t
.\" Title: walmgr
.\" Author: [FIXME: author] [see http://docbook.sf.net/el/author]
.\" Generator: DocBook XSL Stylesheets v1.75.2
.\" Date: 03/13/2012
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "WALMGR" "1" "03/13/2012" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
walmgr \- tools for managing WAL\-based replication for PostgreSQL\&.
.SH "SYNOPSIS"
.sp
.nf
walmgr\&.py command
.fi
.SH "DESCRIPTION"
.sp
It is both an admin and a worker script for PostgreSQL PITR replication\&.
.SH "QUICK START"
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
Set up passwordless ssh authentication from master to slave
.sp
.if n \{\
.RS 4
.\}
.nf
master$ test \-f ~/\&.ssh/id_dsa\&.pub || ssh\-keygen \-t dsa
master$ cat ~/\&.ssh/id_dsa\&.pub | ssh slave cat \e>\e> \&.ssh/authorized_keys
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
Configure paths
.sp
.if n \{\
.RS 4
.\}
.nf
master$ edit master\&.ini
slave$ edit slave\&.ini
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
Make sure that walmgr\&.py executable has same pathname on slave and master\&.
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 3." 4.2
.\}
Start archival process and create a base backup
.sp
.if n \{\
.RS 4
.\}
.nf
master$ \&./walmgr\&.py master\&.ini setup
master$ \&./walmgr\&.py master\&.ini backup
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
Note: starting from PostgreSQL 8\&.3 the archiving is enabled by setting
archive_mode GUC to on\&. However changing this parameter requires the
server to be restarted\&.
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 4." 4.2
.\}
Prepare postgresql\&.conf and pg_hba\&.conf on slave and start replay
.sp
.if n \{\
.RS 4
.\}
.nf
master$ scp $PGDATA/*\&.conf slave:
slave$ \&./walmgr\&.py slave\&.ini restore
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
For debian based distributions the standard configuration files are located
in /etc/postgresql/x\&.x/main directory\&. If another scheme is used the postgresql\&.conf
and pg_hba\&.conf should be copied to slave full_backup directory\&. Make sure to
disable archive_command in slave config\&.
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
\*(Aqwalmgr\&.py restore\*(Aq moves data in place, creates recovery\&.conf and starts postmaster
in recovery mode\&.
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 5.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 5." 4.2
.\}
In\-progress WAL segments can be backed up with the command:
.sp
.if n \{\
.RS 4
.\}
.nf
master$ \&./walmgr\&.py master\&.ini sync
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 6.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 6." 4.2
.\}
If you need to stop replay on the slave and boot it into normal mode, do:
.sp
.if n \{\
.RS 4
.\}
.nf
slave$ \&./walmgr\&.py slave\&.ini boot
.fi
.if n \{\
.RE
.\}
.RE
.SH "GENERAL OPTIONS"
.sp
Common options to all walmgr\&.py commands\&.
.PP
\-h, \-\-help
.RS 4
show this help message and exit
.RE
.PP
\-q, \-\-quiet
.RS 4
make program silent
.RE
.PP
\-v, \-\-verbose
.RS 4
make program more verbose
.RE
.PP
\-n, \-\-not\-really
.RS 4
Show what would be done without actually doing anything\&.
.RE
.SH "MASTER COMMANDS"
.SS "setup"
.sp
Sets up postgres archiving, creates necessary directory structures on slave\&.
.SS "sync"
.sp
Synchronizes in\-progress WAL files to slave\&.
.SS "syncdaemon"
.sp
Start WAL synchronization in daemon mode\&. This will start periodically synching the in\-progress WAL files to slave\&.
.sp
The following parameters are used to drive the syncdaemon: loop_delay (how long to sleep between the synchs) and use_xlog_functions (use record based shipping to synchronize in\-progress WAL segments)\&.
.SS "stop"
.sp
Deconfigures postgres archiving\&.
.SS "periodic"
.sp
Runs the periodic command, if configured\&. This makes it possible to execute arbitrary commands at an interval, useful for synchronizing scripts, config files, crontabs etc\&.
.SS "listbackups"
.sp
List backup sets available on slave node\&.
.SS "backup"
.sp
Creates a new base backup from the master database\&. Will purge expired backups and WAL files on the slave if keep_backups is specified\&. During a backup a lock file is created in the slave completed_wals directory\&. This is to prevent simultaneous backups and resulting corruption\&. If a running backup is terminated, the BACKUPLOCK file may have to be removed manually\&.
.SS "restore"
.sp
EXPERIMENTAL\&. Attempts to restore the backup from slave to master\&.
.SH "SLAVE COMMANDS"
.SS "boot"
.sp
Stop log playback and bring the database up\&.
.SS "pause"
.sp
Pauses WAL playback\&.
.SS "continue"
.sp
Continues previously paused WAL playback\&.
.SS "listbackups"
.sp
Lists available backups\&.
.SS "backup"
.sp
EXPERIMENTAL\&. Creates a new backup from slave data\&. Log replay is paused, slave data directory is backed up to full_backup directory and log replay resumed\&. Backups are rotated as needed\&. The idea is to move the backup load away from production node\&. Usable from postgres 8\&.2 and up\&.
.SS "restore [src][dst]"
.sp
Restores the specified backup set to the target directory\&. If specified without arguments, the latest backup is \fBmoved\fR to the slave data directory (this doesn\(cqt obey retention rules)\&. If a src backup is specified, the backup is copied instead of moved\&. An alternative destination directory can be specified with dst\&.
.SH "CONFIGURATION"
.SS "Common settings"
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBjob_name\fR
.RS 4
.sp
Optional\&. Identifies this script, used in logging\&. Keep unique if using central logging\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBlogfile\fR
.RS 4
.sp
Where to log\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBuse_skylog\fR
.RS 4
.sp
Optional\&. If nonzero, skylog\&.ini is used for log configuration\&.
.RE
.SS "Master settings"
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBpidfile\fR
.RS 4
.sp
Pid file location for syncdaemon mode (if running with \-d)\&. Otherwise not required\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBmaster_db\fR
.RS 4
.sp
Database to connect to for pg_start_backup() etc\&. It is not a good idea to use dbname=template if running syncdaemon in record shipping mode\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBmaster_data\fR
.RS 4
.sp
Master data directory location\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBmaster_config\fR
.RS 4
.sp
Master postgresql\&.conf file location\&. This is where archive_command gets updated\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBmaster_restart_cmd\fR
.RS 4
.sp
The command to restart master database, this used after changing archive_mode parameter\&. Leave unset, if you cannot afford to restart the database at setup/stop\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBslave\fR
.RS 4
.sp
Slave host and base directory\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBslave_config\fR
.RS 4
.sp
Configuration file location for the slave walmgr\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBcompleted_wals\fR
.RS 4
.sp
Slave directory where archived WAL files are copied\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBpartial_wals\fR
.RS 4
.sp
Slave directory where incomplete WAL files are stored\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBfull_backup\fR
.RS 4
.sp
Slave directory where full backups are stored\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBconfig_backup\fR
.RS 4
.sp
Slave directory where configuration file backups are stored\&. Optional\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBloop_delay\fR
.RS 4
.sp
The frequency of syncdaemon updates\&. In record shipping mode only incremental updates are sent, so smaller interval can be used\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBuse_xlog_functions\fR
.RS 4
.sp
Use pg_xlog functions for record based shipping (available in 8\&.2 and up)\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBcompression\fR
.RS 4
.sp
If nonzero, a \-z flag is added to rsync cmdline\&. Will reduce network traffic at the cost of extra CPU time\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBperiodic_command\fR
.RS 4
.sp
Shell script to be executed at specified time interval\&. Can be used for synchronizing scripts, config files etc\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBcommand_interval\fR
.RS 4
.sp
How often to run periodic command script\&. In seconds, and only evaluated at log switch times\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBhot_standby\fR
.RS 4
.sp
Boolean\&. If set to true, walmgr setup will set wal_level to hot_standby (9\&.0 and newer)\&.
.RE
.SS "Sample master\&.ini"
.sp
.if n \{\
.RS 4
.\}
.nf
[wal\-master]
logfile = master\&.log
pidfile = master\&.pid
master_db = dbname=template1
master_data = /var/lib/postgresql/8\&.0/main
master_config = /etc/postgresql/8\&.0/main/postgresql\&.conf
slave = slave:/var/lib/postgresql/walshipping
completed_wals = %(slave)s/logs\&.complete
partial_wals = %(slave)s/logs\&.partial
full_backup = %(slave)s/data\&.master
loop_delay = 10\&.0
use_xlog_functions = 1
compression = 1
.fi
.if n \{\
.RE
.\}
.SS "Slave settings"
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBslave_data\fR
.RS 4
.sp
Postgres data directory for the slave\&. This is where the restored backup is copied/moved\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBslave_config_dir\fR
.RS 4
.sp
Directory for postgres configuration files\&. If specified, "walmgr restore" attempts to restore configuration files from config_backup directory\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBslave_stop_cmd\fR
.RS 4
.sp
Script to stop postmaster on slave\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBslave_start_cmd\fR
.RS 4
.sp
Script to start postmaster on slave\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBslave\fR
.RS 4
.sp
Base directory for slave files (logs\&.complete, data\&.master etc)
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBslave_bin\fR
.RS 4
.sp
Specifies the location of postgres binaries (pg_controldata, etc)\&. Needed if they are not already in the PATH\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBcompleted_wals\fR
.RS 4
.sp
Directory where complete WAL files are stored\&. Also miscellaneous control files are created in this directory (BACKUPLOCK, STOP, PAUSE, etc\&.)\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBpartial_wals\fR
.RS 4
.sp
Directory where partial WAL files are stored\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBfull_backup\fR
.RS 4
.sp
Directory where full backups are stored\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBkeep_backups\fR
.RS 4
.sp
Number of backups to keep\&. Also all WAL files needed to bring earliest
.sp
backup up to date are kept\&. The backups are rotated before new backup is started, so at one point there is actually one less backup available\&.
.sp
It probably doesn\(cqt make sense to specify keep_backups if periodic backups are not performed \- the WAL files will pile up quickly\&.
.sp
Backups will be named data\&.master, data\&.master\&.0, data\&.master\&.1 etc\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBarchive_command\fR
.RS 4
.sp
Script to execute before rotating away the oldest backup\&. If it fails backups will not be rotated\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBslave_pg_xlog\fR
.RS 4
.sp
Set slave_pg_xlog to the directory on the slave where pg_xlog files get written to\&. On a restore to the slave walmgr\&.py will create a symbolic link from data/pg_xlog to this location\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBbackup_datadir\fR
.RS 4
.sp
Set backup_datadir to \fIno\fR to prevent walmgr\&.py from making a backup of the data directory when restoring to the slave\&. This defaults to \fIyes\fR\&.
.RE
.SS "Sample slave\&.ini"
.sp
.if n \{\
.RS 4
.\}
.nf
[wal\-slave]
logfile = slave\&.log
slave_data = /var/lib/postgresql/8\&.0/main
slave_stop_cmd = /etc/init\&.d/postgresql\-8\&.0 stop
slave_start_cmd = /etc/init\&.d/postgresql\-8\&.0 start
slave = /var/lib/postgresql/walshipping
completed_wals = %(slave)s/logs\&.complete
partial_wals = %(slave)s/logs\&.partial
full_backup = %(slave)s/data\&.master
keep_backups = 5
backup_datadir = yes
.fi
.if n \{\
.RE
.\}
skytools-2.1.13/doc/getattrs.py 0000755 0001750 0001750 00000000207 11670174255 015457 0 ustar marko marko #! /usr/bin/env python
import sys
# Docs that mention "pgq consumer" get the asciidoc attribute "pgq",
# which lets shared include files enable PgQ-specific sections.
buf = open(sys.argv[1], "r").read().lower()
if buf.find("pgq consumer") >= 0:
    print "-a pgq"
skytools-2.1.13/doc/londiste.cmdline.txt 0000644 0001750 0001750 00000022660 11670174255 017250 0 ustar marko marko = londiste(1) =
== NAME ==
londiste - PostgreSQL replication engine written in python
== SYNOPSIS ==
londiste.py [option] config.ini command [arguments]
== DESCRIPTION ==
Londiste is the PostgreSQL replication engine portion of the SkyTools suite,
by Skype. This suite includes packages implementing specific replication
tasks and/or solutions in layers, building upon each other.
PgQ is a generic queue implementation based on ideas from Slony-I's
snapshot based event batching. Londiste uses PgQ as its transport
mechanism to implement a robust and easy to use replication solution.
Londiste is an asynchronous master-slave(s) replication
system. Asynchronous means that a transaction commited on the master is
not guaranteed to have made it to any slave at the master's commit time; and
master-slave means that data changes on slaves are not reported back to
the master, it's the other way around only.
The replication is trigger based, and you choose a set of tables to
replicate from the provider to the subscriber(s). Any data changes
occuring on the provider (in a replicated table) will fire the
londiste trigger, which fills a queue of events for any subscriber(s) to
care about.
A replay process consumes the queue in batches, and applies all given
changes to any subscriber(s). The initial replication step involves using the
PostgreSQL's COPY command for efficient data loading.
== QUICK-START ==
Basic londiste setup and usage can be summarized by the following
steps:
1. create the subscriber database, with tables to replicate
2. edit a londiste configuration file, say conf.ini, and a PgQ ticker
configuration file, say ticker.ini
3. install londiste on the provider and subscriber nodes. This step
requires admin privileges on both provider and subscriber sides,
and both install commands can be run remotely:
$ londiste.py conf.ini provider install
$ londiste.py conf.ini subscriber install
4. launch the PgQ ticker on the provider machine:
$ pgqadm.py -d ticker.ini ticker
5. launch the londiste replay process:
$ londiste.py -d conf.ini replay
6. add tables to replicate from the provider database:
$ londiste.py conf.ini provider add table1 table2 ...
7. add tables to replicate to the subscriber database:
$ londiste.py conf.ini subscriber add table1 table2 ...
To replicate to more than one subscriber database just repeat each of the
described subscriber steps for each subscriber.
== COMMANDS ==
The londiste command line is parsed globally: it has both options and
subcommands. Some options apply only to a subset of the commands,
and others are used without any command at all.
== GENERAL OPTIONS ==
This section presents options available to all and any londiste
command.
-h, --help::
show this help message and exit
-q, --quiet::
make program silent
-v, --verbose::
make program more verbose
== PROVIDER COMMANDS ==
$ londiste.py config.ini provider
Where command is one of:
=== provider install ===
Installs code into the provider database and creates the
queue. Equivalent to doing the following by hand:
CREATE LANGUAGE plpgsql;
CREATE LANGUAGE plpython;
\i .../contrib/txid.sql
\i .../contrib/pgq.sql
\i .../contrib/londiste.sql
select pgq.create_queue(queue name);
=== provider add ... ===
Registers table(s) on the provider database and adds the londiste trigger to
the table(s) which will send events to the queue. Table names can be schema
qualified with the schema name defaulting to public if not supplied.
--all::
Register all tables in provider database, except those that are
under schemas 'pgq', 'londiste', 'information_schema' or 'pg_*'.
=== provider remove ... ===
Unregisters table(s) on the provider side and removes the londiste triggers
from the table(s). The table removal event is also sent to the queue, so all
subscribers unregister the table(s) on their end as well. Table names can be
schema qualified with the schema name defaulting to public if not supplied.
=== provider add-seq ... ===
Registers a sequence on provider.
=== provider remove-seq ... ===
Unregisters a sequence on provider.
=== provider tables ===
Shows registered tables on provider side.
=== provider seqs ===
Shows registered sequences on provider side.
== SUBSCRIBER COMMANDS ==
londiste.py config.ini subscriber
Where command is one of:
=== subscriber install ===
Installs code into the subscriber database. Equivalent to doing the following
by hand:
CREATE LANGUAGE plpgsql;
\i .../contrib/londiste.sql
This will be done under the Postgres Londiste user. If the tables should
be owned by someone else, it needs to be done by hand.
=== subscriber add ... ===
Registers table(s) on subscriber side. Table names can be schema qualified
with the schema name defaulting to `public` if not supplied.
Switches (optional):
--all::
Add all tables that are registered on provider to subscriber database
--force::
Ignore table structure differences.
--expect-sync::
Table is already synced by external means so initial COPY is unnecessary.
--skip-truncate::
When doing initial COPY, don't remove old data.
=== subscriber remove ... ===
Unregisters table(s) from subscriber. No events will be applied to
the table anymore. Actual table will not be touched. Table names can be
schema qualified with the schema name defaulting to public if not supplied.
=== subscriber add-seq ... ===
Registers a sequence on subscriber.
=== subscriber remove-seq ... ===
Unregisters a sequence on subscriber.
=== subscriber resync ... ===
Tags table(s) as "not synced". Later the replay process will notice this
and launch copy process(es) to sync the table(s) again.
=== subscriber tables ===
Shows registered tables on the subscriber side, and the current state of
each table. Possible state values are:
NEW::
the table has not yet been considered by londiste.
in-copy::
Full-table copy is in progress.
catching-up::
Table is copied, missing events are replayed on to it.
wanna-sync::
The "copy" process has caught up and wants to hand the table over to
"replay".
do-sync::
"replay" process is ready to accept it.
ok::
table is in sync.
=== subscriber fkeys ===
Show pending and active foreign keys on tables. Takes optional
type argument - `pending` or `active`. If no argument is given,
both types are shown.
Pending foreign keys are those that were removed during COPY time
but have not been restored yet. The restore happens automatically once
both tables are synced.
=== subscriber triggers ===
Show pending and active triggers on tables. Takes optional type
argument - `pending` or `active`. If no argument is given, both
types are shown.
Pending triggers are those that were removed during COPY time
but have not been restored yet. The restore of triggers does not happen
automatically; it needs to be done manually with the `restore-triggers`
command.
=== subscriber restore-triggers ===
Restores all pending triggers for a single table.
Optionally a trigger name can be given as an extra
argument; then only that trigger is restored.
=== subscriber register ===
Register consumer on queue. This usually happens
automatically when `replay` is launched, but it can also be done
explicitly with this command.
=== subscriber unregister ===
Unregister consumer from provider's queue. This should be
done if you want to shut replication down.
== REPLICATION COMMANDS ==
=== replay ===
The actual replication process. Should be run as a daemon with the -d
switch, because it needs to be always running.
Its main task is to get batches of events from PgQ and apply
them to the subscriber database.
Switches:
-d, --daemon::
go background
-r, --reload::
reload config (send SIGHUP)
-s, --stop::
stop program safely (send SIGINT)
-k, --kill::
kill program immediately (send SIGTERM)
== UTILITY COMMAND ==
=== repair ... ===
Attempts to achieve a state where the table(s) is/are in sync, compares
them, and writes out SQL statements that would fix differences.
Syncing happens by locking provider tables against updates and then
waiting until the replay process has applied all pending changes to
subscriber database. As this is a dangerous operation, it has a hardwired
limit of 10 seconds for locking. If the replay process does not catch up
in that time, the locks are released and the repair operation is cancelled.
Comparing happens by dumping out the table contents of both sides,
sorting them and then comparing line-by-line. As this is a CPU and
memory-hungry operation, good practice is to run the repair command on a
third machine to avoid consuming resources on either the provider or the
subscriber.
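For example, to sync and check a single table (the table name is illustrative):

  $ londiste.py conf.ini repair mytable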
=== compare ... ===
Syncs tables like repair, but just runs SELECT count(*) on both
sides for a cheaper, but also less precise, check of whether the
tables are in sync.
== CONFIGURATION ==
Londiste and PgQ both use INI configuration files; your distribution of
skytools includes examples. You often just have to edit the database
connection strings, namely db in the PgQ ticker.ini and provider_db and
subscriber_db in the londiste conf.ini, as well as logfile and pidfile to
adapt them to your system paths.
See `londiste(5)`.
== SEE ALSO ==
`londiste(5)`
https://developer.skype.com/SkypeGarage/DbProjects/SkyTools/[]
http://skytools.projects.postgresql.org/doc/londiste.ref.html[Reference guide]
skytools-2.1.13/doc/common.config.txt 0000644 0001750 0001750 00000002255 11670174255 016547 0 ustar marko marko
=== Common configuration parameters ===
job_name::
Name for the particular job the script does. The script will log under this
name to logdb/logserver. The name is also used as the default PgQ consumer name.
It should be unique.
pidfile::
Location for pid file. If not given, the script is not allowed to daemonize.
logfile::
Location for log file.
loop_delay::
For a continuously running process, how long to sleep after each work loop,
in seconds. Default: 1.
connection_lifetime::
Close and reconnect older database connections.
log_count::
Number of log files to keep. Default: 3
log_size::
Max size for one log file. File is rotated if max size is reached.
Default: 10485760 (10M)
use_skylog::
If set, search for `[./skylog.ini, ~/.skylog.ini, /etc/skylog.ini]`.
If found then the file is used as config file for Pythons `logging` module.
It allows setting up fully customizable logging setup.
ifdef::pgq[]
=== Common PgQ consumer parameters ===
pgq_queue_name::
Queue name to attach to.
No default.
pgq_consumer_id::
Consumers ID to use when registering.
Default: %(job_name)s
endif::pgq[]
skytools-2.1.13/doc/TODO.txt 0000644 0001750 0001750 00000000565 11670174255 014562 0 ustar marko marko
= Skytools ToDo list =
== Ideas for 2.1 ==
* Frozen code, no new features.
* Fix things in existing code.
== Ideas for 2.2 ==
* Use pgq.sqltriga() in Londiste by default
* Backport triggers code from 3.0
* Backport truncate trigger 3.0
* londiste: support creating slave from master by pg_dump / PITR.
* Backport EXECUTE 3.0?
* Use session_replication_role?
skytools-2.1.13/doc/scriptmgr.1 0000644 0001750 0001750 00000013175 11727600405 015343 0 ustar marko marko '\" t
.\" Title: scriptmgr
.\" Author: [FIXME: author] [see http://docbook.sf.net/el/author]
.\" Generator: DocBook XSL Stylesheets v1.75.2
.\" Date: 03/13/2012
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "SCRIPTMGR" "1" "03/13/2012" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
scriptmgr \- utility for controlling other skytools scripts\&.
.SH "SYNOPSIS"
.sp
.nf
scriptmgr\&.py [switches] config\&.ini [\-a | job_name \&.\&.\&. ]
.fi
.SH "DESCRIPTION"
.sp
scriptmgr is used to manage several scripts together\&. It discovers potential jobs based on a config file glob expression\&. From each config file it reads both job_name and service type (that is, the main section name, e\&.g\&. [cube_dispatcher])\&. For each service type there is a subsection in the scriptmgr config describing how to handle it\&. Unknown services are ignored\&.
.SH "COMMANDS"
.SS "status"
.sp
.if n \{\
.RS 4
.\}
.nf
scriptmgr config\&.ini status
.fi
.if n \{\
.RE
.\}
.sp
Show status for all known jobs\&.
.SS "start"
.sp
.if n \{\
.RS 4
.\}
.nf
scriptmgr config\&.ini start \-a
scriptmgr config\&.ini start job_name1 job_name2 \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.sp
Launch script(s) that are not running\&.
.SS "stop"
.sp
.if n \{\
.RS 4
.\}
.nf
scriptmgr config\&.ini stop \-a
scriptmgr config\&.ini stop job_name1 job_name2 \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.sp
Stop script(s) that are running\&.
.SS "restart"
.sp
.if n \{\
.RS 4
.\}
.nf
scriptmgr config\&.ini restart \-a
scriptmgr config\&.ini restart job_name1 job_name2 \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.sp
Restart scripts\&.
.SS "reload"
.sp
.if n \{\
.RS 4
.\}
.nf
scriptmgr config\&.ini reload \-a
scriptmgr config\&.ini reload job_name1 job_name2 \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.sp
Send SIGHUP to scripts that are running\&.
.SH "CONFIG"
.SS "Common configuration parameters"
.PP
job_name
.RS 4
Name for the particular job the script does\&. The script will log under this name to logdb/logserver\&. The name is also used as the default for the PgQ consumer name\&. It should be unique\&.
.RE
.PP
pidfile
.RS 4
Location for the pid file\&. If not given, the script is not allowed to daemonize\&.
.RE
.PP
logfile
.RS 4
Location for log file\&.
.RE
.PP
loop_delay
.RS 4
For a continuously running process, how long to sleep after each work loop, in seconds\&. Default: 1\&.
.RE
.PP
connection_lifetime
.RS 4
Close and reconnect older database connections\&.
.RE
.PP
log_count
.RS 4
Number of log files to keep\&. Default: 3
.RE
.PP
log_size
.RS 4
Max size for one log file\&. File is rotated if max size is reached\&. Default: 10485760 (10M)
.RE
.PP
use_skylog
.RS 4
If set, search for
[\&./skylog\&.ini, ~/\&.skylog\&.ini, /etc/skylog\&.ini]\&. If found, the file is used as the config file for Python's
logging
module\&. This allows setting up a fully customizable logging setup\&.
.RE
.SS "scriptmgr parameters"
.PP
config_list
.RS 4
List of glob patterns for finding config files\&. Example:
.sp
.if n \{\
.RS 4
.\}
.nf
config_list = ~/dbscripts/conf/*\&.ini, ~/random/conf/*\&.ini
.fi
.if n \{\
.RE
.\}
.RE
.SS "Service section parameters"
.PP
cwd
.RS 4
Working directory for script\&.
.RE
.PP
args
.RS 4
Arguments to give to script, in addition to
\-d\&.
.RE
.PP
script
.RS 4
Path to script\&. Unless the script is in PATH, the full path should be given\&.
.RE
.PP
disabled
.RS 4
If this service should be ignored\&.
.RE
.SS "Example config file"
.sp
.if n \{\
.RS 4
.\}
.nf
[scriptmgr]
job_name = scriptmgr_livesrv
logfile = ~/log/%(job_name)s\&.log
pidfile = ~/pid/%(job_name)s\&.pid
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
config_list = ~/scripts/conf/*\&.ini
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
# defaults for all service sections
[DEFAULT]
cwd = ~/scripts
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
[table_dispatcher]
script = table_dispatcher\&.py
args = \-v
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
[cube_dispatcher]
script = python2\&.4 cube_dispatcher\&.py
disabled = 1
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
[pgqadm]
script = ~/scripts/pgqadm\&.py
args = ticker
.fi
.if n \{\
.RE
.\}
.SH "COMMAND LINE SWITCHES"
.sp
The following switches are common to all skytools\&.DBScript\-based Python programs\&.
.PP
\-h, \-\-help
.RS 4
show help message and exit
.RE
.PP
\-q, \-\-quiet
.RS 4
make program silent
.RE
.PP
\-v, \-\-verbose
.RS 4
make program more verbose
.RE
.PP
\-d, \-\-daemon
.RS 4
make program run in background
.RE
.sp
The following switches are used to control an already running process\&. The pidfile is read from the config file, then the signal is sent to the process id specified there\&.
.PP
\-r, \-\-reload
.RS 4
reload config (send SIGHUP)
.RE
.PP
\-s, \-\-stop
.RS 4
stop program safely (send SIGINT)
.RE
.PP
\-k, \-\-kill
.RS 4
kill program immediately (send SIGTERM)
.RE
.sp
Options specific to scriptmgr:
.PP
\-a, \-\-all
.RS 4
Operate on all non\-disabled scripts\&.
.RE
skytools-2.1.13/doc/londiste.5 0000644 0001750 0001750 00000007637 11727600367 015173 0 ustar marko marko '\" t
.\" Title: londiste
.\" Author: [FIXME: author] [see http://docbook.sf.net/el/author]
.\" Generator: DocBook XSL Stylesheets v1.75.2
.\" Date: 03/13/2012
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "LONDISTE" "5" "03/13/2012" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
londiste \- PostgreSQL replication engine written in python
.SH "SYNOPSIS"
.sp
.nf
[londiste]
job_name = asd
.fi
.SH "DESCRIPTION"
.sp
The londiste configuration file follows the famous \&.INI syntax\&. It contains only one section, named londiste\&.
.sp
Most default values are reasonable\&. That means you usually only have to edit provider_db, subscriber_db and pgq_queue_name to be done with londiste configuration\&.
.SH "OPTIONS"
.sp
You can configure the following options into the londiste section\&.
.PP
job_name
.RS 4
Each Skytools daemon process must have a unique job_name\&. Londiste also uses it as the consumer name when subscribing to the queue\&.
.RE
.PP
provider_db
.RS 4
Provider database connection string (DSN)\&.
.RE
.PP
subscriber_db
.RS 4
Subscriber database connection string (DSN)\&.
.RE
.PP
pgq_queue_name
.RS 4
Name of the queue to read from\&. Several subscribers can read from the same queue\&.
.RE
.PP
logfile
.RS 4
Where to log londiste activity\&.
.RE
.PP
pidfile
.RS 4
Where to store the pid of the main londiste process, the replay one\&.
.RE
.PP
lock_timeout
.RS 4
A few operations take a lock on the provider (provider add/remove, compare, repair)\&. This parameter specifies the timeout in seconds (float) for how long a lock can be held\&. New in version 2\&.1\&.8\&. Default: 10
.RE
.PP
loop_delay
.RS 4
How often to poll events from provider\&. In seconds (float)\&. Default: 1\&.
.RE
.PP
pgq_lazy_fetch
.RS 4
How many events to fetch at a time when processing a batch\&. Useful when you know a single transaction (e\&.g\&. a maintenance
UPDATE
command) will produce a lot of events in a single batch\&. When lazily fetching, a cursor is used, so a single batch is still processed in a single transaction\&. Default: 0, always fetch all events of the batch, not using a cursor\&.
.RE
.PP
log_count
.RS 4
Number of log files to keep\&. Default: 3
.RE
.PP
log_size
.RS 4
Max size for one log file\&. File is rotated if max size is reached\&. Default: 10485760 (10M)
.RE
.PP
use_skylog
.RS 4
If set, search for
[\&./skylog\&.ini, ~/\&.skylog\&.ini, /etc/skylog\&.ini]\&. If found, the file is used as the config file for Python's
logging
module\&. This allows setting up a fully customizable logging setup\&. Default: 0
.RE
.SH "EXAMPLE"
.sp
.if n \{\
.RS 4
.\}
.nf
[londiste]
job_name = test_to_subscriber
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
provider_db = dbname=provider port=6000 host=127\&.0\&.0\&.1
subscriber_db = dbname=subscriber port=6000 host=127\&.0\&.0\&.1
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
# it will be used as sql ident so no dots/spaces
pgq_queue_name = londiste\&.replika
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
logfile = /tmp/%(job_name)s\&.log
pidfile = /tmp/%(job_name)s\&.pid
.fi
.if n \{\
.RE
.\}
.SH "SEE ALSO"
.sp
londiste(1)
skytools-2.1.13/doc/pgq-admin.txt 0000644 0001750 0001750 00000015267 11670174255 015677 0 ustar marko marko = pgqadm(1) =
== NAME ==
pgqadm - PgQ ticker and administration interface
== SYNOPSIS ==
pgqadm.py [option] config.ini command [arguments]
== DESCRIPTION ==
PgQ is a Postgres-based event processing system. It is part of the SkyTools package,
which contains several useful implementations built on this engine. The main function
of PgQadm is to maintain and keep healthy both the PgQ internal tables and the tables
that store events.
SkyTools is a scripting framework for Postgres databases, written in Python, that
provides several utilities and implements common database handling logic.
Event - an atomic piece of data created by producers. In PgQ an event is one record
in one of the tables that service that queue. The event record contains some system
fields for PgQ and several data fields filled by producers. PgQ neither checks nor
enforces the event type; the event type is something that consumer and producer must
agree on. PgQ guarantees that each event is seen at least once, but it is up to the
consumer to make sure that an event is processed no more than once, if that is needed.
Batch - PgQ is designed for efficiency and high throughput, so events are grouped
into batches for bulk processing. Creating these batches is one of the main tasks of
PgQadm, and there are several parameters for each queue that can be used to tune the
size and frequency of batches. Consumers receive events in these batches and,
depending on business requirements, process events separately or also in batches.
Queue - Events are stored in queue tables, i.e. queues. Several producers can write
into the same queue and several consumers can read from it. Events are kept in the
queue until all the consumers have seen them. Table rotation is used to decrease
hard disk I/O. A queue can contain any number of event types; it is up to producer
and consumer to agree on what types of events are passed and how they are encoded.
For example, the Londiste producer side can produce events for more tables than the
consumer side needs, so the consumer subscribes only to those tables it needs and
events for other tables are ignored.
Producer - an application that pushes events into a queue. A producer can be written
in any language that is able to run stored procedures in Postgres.
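As an illustration, a minimal Python producer might push an event with plain SQL
through the pgq.insert_event(queue, ev_type, ev_data) function (a sketch assuming
psycopg2; the connection string, queue name and payload are placeholders):

  import psycopg2

  def push_event(connstr, queue, ev_type, ev_data):
      # pgq.insert_event() stores the event and returns its id
      db = psycopg2.connect(connstr)
      curs = db.cursor()
      curs.execute("SELECT pgq.insert_event(%s, %s, %s)",
                   [queue, ev_type, ev_data])
      ev_id = curs.fetchone()[0]
      db.commit()
      db.close()
      return ev_id

  push_event("dbname=somedb", "myqueue", "user_created", "bob")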
Consumer - an application that reads events from a queue. Consumers can be written in
any language that can interact with Postgres. The SkyTools package contains several
useful consumers written in Python that can be used as they are, or as good starting
points for writing more complex consumers.
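For example, a consumer skeleton based on the 2.x pgq.Consumer API could look
roughly like this (a sketch, not a drop-in program; the section name event_logger,
the connection name src_db and the matching config file are hypothetical, and the
config must contain job_name, src_db and pgq_queue_name entries):

  import sys
  import pgq

  class EventLogger(pgq.Consumer):
      """Consumer that just logs every event it sees."""
      def __init__(self, args):
          # "event_logger" is the config section, "src_db" the db connection name
          pgq.Consumer.__init__(self, "event_logger", "src_db", args)

      def process_batch(self, db, batch_id, ev_list):
          for ev in ev_list:
              self.log.info("event %s: %s" % (ev.type, ev.data))
              ev.tag_done()   # mark event as processed so it is not retried

  if __name__ == '__main__':
      script = EventLogger(sys.argv[1:])
      script.start()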
== QUICK-START ==
Basic PgQ setup and usage can be summarized by the following steps;
a consolidated example session follows the list:
1. create the database
2. edit a PgQ ticker configuration file, say ticker.ini
3. install PgQ internal tables
$ pgqadm.py ticker.ini install
4. launch the PgQ ticker on the database machine as a daemon
$ pgqadm.py -d ticker.ini ticker
5. create the queue
$ pgqadm.py ticker.ini create
6. register the consumer, or run the consumer to register it automatically
$ pgqadm.py ticker.ini register
7. start producing events
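Put together, a session might look like this (the queue name myqueue and the
consumer name myconsumer are hypothetical):

  $ pgqadm.py ticker.ini install
  $ pgqadm.py -d ticker.ini ticker
  $ pgqadm.py ticker.ini create myqueue
  $ pgqadm.py ticker.ini register myqueue myconsumer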
== CONFIG ==
[pgqadm]
job_name = pgqadm_somedb
db = dbname=somedb
# how often to run maintenance [seconds]
maint_delay = 600
# how often to check for activity [seconds]
loop_delay = 0.1
logfile = ~/log/%(job_name)s.log
pidfile = ~/pid/%(job_name)s.pid
== COMMANDS ==
=== ticker ===
Start the ticking & maintenance process. Usually run as a daemon with the -d option.
It must be running for PgQ to be functional and for consumers to see any events.
=== status ===
Show overview of registered queues and consumers and queue health.
This command is used when you want to know what is happening inside PgQ.
=== install ===
Installs the PgQ schema into the database given in the config file.
=== create ===
Create queue tables in the pgq schema. As soon as the queue is created, producers
can start inserting events into it. But be aware that if there are no consumers on
the queue, the events are lost until a consumer is registered.
=== drop ===
Drop the queue and all its consumers from PgQ. Queue tables are dropped and all
their contents are lost forever, so use with care, as with most drop commands.
=== register ===
Register the given consumer to listen to the given queue. The first batch seen by
this consumer is the one completed after registration. Registration happens
automatically when the consumer is first run, so using this command is optional,
but it may be needed when producers start producing events before the consumer
can be run.
=== unregister ===
Removes the consumer from the given queue. Note that the consumer must be stopped
before issuing this command, otherwise it automatically registers again.
=== config [ <queue> [ <variable>=<value> ... ]] ===
Show or change queue config. There are several parameters that can be set for each
queue, shown here with their default values:
queue_ticker_max_lag (2)::
If no tick has happened during the given number of seconds, one is generated
just to keep the queue lag under control. It may be increased if there is no
need to deliver events fast. There is not much room to decrease it :)
queue_ticker_max_count (200)::
Threshold number of events in a filling batch that triggers a tick.
Can be increased to encourage PgQ to create larger batches, or decreased
to encourage faster ticking with smaller batches.
queue_ticker_idle_period (60)::
Number of seconds that can pass without ticking if no events are coming to the queue.
These empty ticks are used as keep-alive signals for batch jobs and monitoring.
queue_rotation_period (2 hours)::
Interval of time that may pass before PgQ tries to rotate tables to free up space.
Note that PgQ cannot rotate tables if there are long transactions in the database,
like VACUUM or pg_dump. May be decreased if low on disk space, or increased to keep
a longer history of old events. Too small values might affect performance badly,
because Postgres tends to do seq scans on small tables. Too big values may waste
disk space.
Looking at the queue config:
$ pgqadm.py mydb.ini config
testqueue
queue_ticker_max_lag = 3
queue_ticker_max_count = 500
queue_ticker_idle_period = 60
queue_rotation_period = 7200
$ pgqadm.py conf/pgqadm_myprovider.ini config testqueue queue_ticker_max_lag=10 queue_ticker_max_count=300
Change queue testqueue config to: queue_ticker_max_lag='10', queue_ticker_max_count='300'
$
== COMMON OPTIONS ==
-h, --help::
show help message
-q, --quiet::
make program silent
-v, --verbose::
make program verbose
-d, --daemon::
run in background
-r, --reload::
reload config (send SIGHUP)
-s, --stop::
stop program safely (send SIGINT)
-k, --kill::
kill program immediately (send SIGTERM)
// vim:sw=2 et smarttab sts=2:
skytools-2.1.13/doc/londiste.config.txt 0000644 0001750 0001750 00000004650 11670174255 017101 0 ustar marko marko = londiste(5) =
== NAME ==
londiste - PostgreSQL replication engine written in python
== SYNOPSIS ==
[londiste]
job_name = asd
== DESCRIPTION ==
The londiste configuration file follows the famous .INI syntax. It
contains only one section, named londiste.
Most default values are reasonable. That means you usually only have to
edit provider_db, subscriber_db and pgq_queue_name to be done with
londiste configuration.
== OPTIONS ==
You can configure the following options into the londiste section.
job_name::
Each Skytools daemon process must have a unique job_name.
Londiste also uses it as the consumer name when subscribing to the queue.
provider_db::
Provider database connection string (DSN).
subscriber_db::
Subscriber database connection string (DSN).
pgq_queue_name::
Name of the queue to read from. Several subscribers can
read from the same queue.
logfile::
Where to log londiste activity.
pidfile::
Where to store the pid of the main londiste process, the replay one.
lock_timeout::
A few operations take a lock on the provider (provider add/remove, compare, repair).
This parameter specifies the timeout in seconds (float) for how long a lock
can be held. New in version 2.1.8. Default: 10
loop_delay::
How often to poll events from provider. In seconds (float).
Default: 1.
pgq_lazy_fetch::
How many events to fetch at a time when processing a batch. Useful when
you know a single transaction (e.g. a maintenance +UPDATE+ command) will
produce a lot of events in a single batch. When lazily fetching, a cursor
is used, so a single batch is still processed in a single transaction.
Default: 0, always fetch all events of the batch, not using a cursor.
log_count::
Number of log files to keep. Default: 3
log_size::
Max size for one log file. File is rotated if max size is reached.
Default: 10485760 (10M)
use_skylog::
If set, search for `[./skylog.ini, ~/.skylog.ini, /etc/skylog.ini]`.
If found, the file is used as the config file for Python's `logging` module.
This allows setting up a fully customizable logging setup.
Default: 0
== EXAMPLE ==
[londiste]
job_name = test_to_subscriber
provider_db = dbname=provider port=6000 host=127.0.0.1
subscriber_db = dbname=subscriber port=6000 host=127.0.0.1
# it will be used as sql ident so no dots/spaces
pgq_queue_name = londiste.replika
logfile = /tmp/%(job_name)s.log
pidfile = /tmp/%(job_name)s.pid
== SEE ALSO ==
londiste(1)
skytools-2.1.13/doc/fixman.py 0000755 0001750 0001750 00000000425 11670174255 015106 0 ustar marko marko #! /usr/bin/env python
import sys,re
# hacks to force empty lines into manpage
ln1 = r"\1\2"
xml = sys.stdin.read()
xml = re.sub(r"(\s*)(\s*)(