ganeti-2.15.2/.ghci
:set -isrc -itest/hs
ganeti-2.15.2/.gitignore
# Lines that start with '#' are comments.
# For a project mostly in C, the following would be a good set of
# exclude patterns (uncomment them if you want to use them):
# *.[oa]
# *~
# global ignores
*.py[co]
*.swp
*~
*.o
*.hpc_o
*.prof_o
*.dyn_o
*.hi
*.hpc_hi
*.prof_hi
*.dyn_hi
*.hp
*.tix
*.prof
*.stat
.hpc/
# /
/.hsenv
/Makefile
/Makefile.ghc
/Makefile.ghc.bak
/Makefile.in
/Makefile.local
/Session.vim
/TAGS*
/apps/
/aclocal.m4
/autom4te.cache
/autotools/install-sh
/autotools/missing
/autotools/py-compile
/autotools/replace_vars.sed
/autotools/shell-env-init
/cabal_macros.h
/config.log
/config.status
/configure
/devel/squeeze-amd64.tar.gz
/devel/squeeze-amd64.conf
/devel/wheezy-amd64.tar.gz
/devel/wheezy-amd64.conf
/dist/
/empty-cabal-config
/epydoc.conf
/ganeti
/ganeti.cabal
/ganeti.depsflags
/stamp-srclinks
/stamp-directories
/vcs-version
/*.patch
/*.tar.bz2
/*.tar.gz
/ganeti-[0-9]*.[0-9]*.[0-9]*
# daemons
/daemons/daemon-util
/daemons/ganeti-cleaner
/daemons/ganeti-masterd
/daemons/ganeti-noded
/daemons/ganeti-rapi
/daemons/ganeti-watcher
# doc
/doc/api/
/doc/coverage/
/doc/html/
/doc/man-html/
/doc/install-quick.rst
/doc/news.rst
/doc/upgrade.rst
/doc/hs-lint.html
/doc/manpages-enabled.rst
/doc/man-*.rst
/doc/users/groupmemberships
/doc/users/groups
/doc/users/users
# doc/examples
/doc/examples/bash_completion
/doc/examples/bash_completion-debug
/doc/examples/ganeti.cron
/doc/examples/ganeti.initd
/doc/examples/ganeti.logrotate
/doc/examples/ganeti-kvm-poweroff.initd
/doc/examples/ganeti-master-role.ocf
/doc/examples/ganeti-node-role.ocf
/doc/examples/gnt-config-backup
/doc/examples/hooks/ipsec
/doc/examples/systemd/ganeti-*.service
# lib
/lib/_constants.py
/lib/_vcsversion.py
/lib/_generated_rpc.py
/lib/opcodes.py
/lib/rpc/stub/
# man
/man/*.[0-9]
/man/*.html
/man/*.in
/man/*.gen
# test/hs
/test/hs/hail
/test/hs/harep
/test/hs/hbal
/test/hs/hcheck
/test/hs/hinfo
/test/hs/hroller
/test/hs/hscan
/test/hs/hspace
/test/hs/hsqueeze
/test/hs/hpc-htools
/test/hs/hpc-mon-collector
/test/hs/htest
# tools
/tools/kvm-ifup
/tools/kvm-ifup-os
/tools/xen-ifup-os
/tools/burnin
/tools/ensure-dirs
/tools/users-setup
/tools/vcluster-setup
/tools/vif-ganeti
/tools/vif-ganeti-metad
/tools/net-common
/tools/node-cleanup
/tools/node-daemon-setup
/tools/prepare-node-join
/tools/shebang/
/tools/ssh-update
/tools/ssl-update
# scripts
/scripts/gnt-backup
/scripts/gnt-cluster
/scripts/gnt-debug
/scripts/gnt-filter
/scripts/gnt-group
/scripts/gnt-instance
/scripts/gnt-job
/scripts/gnt-node
/scripts/gnt-os
/scripts/gnt-network
/scripts/gnt-storage
# haskell-specific rules
/src/mon-collector
/src/htools
/src/hconfd
/src/hluxid
/src/hs2py
/src/hs2py-constants
/src/ganeti-confd
/src/ganeti-wconfd
/src/ganeti-kvmd
/src/ganeti-luxid
/src/ganeti-metad
/src/ganeti-mond
/src/rpc-test
# automatically-built Haskell files
/src/AutoConf.hs
/src/Ganeti/Curl/Internal.hs
/src/Ganeti/Hs2Py/ListConstants.hs
/src/Ganeti/Version.hs
/test/hs/Test/Ganeti/TestImports.hs
ganeti-2.15.2/COPYING
Copyright (C) 2006-2015 Google Inc.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
ganeti-2.15.2/INSTALL
Ganeti quick installation guide
===============================
Please note that a more detailed installation procedure is described in
the :doc:`install`. Refer to it if you are setting up Ganeti for the first
time. This quick installation guide is mainly meant as a reference for
experienced users. A glossary of terms can be found in the :doc:`glossary`.
Software Requirements
---------------------
.. highlight:: shell-example
Before installing, please verify that you have the following programs:
- `Xen Hypervisor `_, version 3.0 or above, if
running on Xen
- `KVM Hypervisor `_, version 72 or above, if
running on KVM. In order to use advanced features, such as live
migration, virtio, etc., an even newer version is recommended (qemu-kvm
versions 0.11.X and above have shown good behavior).
- `DRBD `_, kernel module and userspace utils,
version 8.0.7 or above; note that Ganeti doesn't yet support version 8.4
- `RBD `_, kernel modules
(``rbd.ko``/``libceph.ko``) and userspace utils (``ceph-common``)
- `LVM2 `_
- `OpenSSH `_
- `bridge utilities `_
- `iproute2 `_
- `arping `_ (part of iputils)
- `ndisc6 `_ (if using IPv6)
- `Python `_, version 2.6 or above, not 3.0
- `Python OpenSSL bindings `_
- `simplejson Python module `_
- `pyparsing Python module `_, version
1.4.6 or above
- `pyinotify Python module `_
- `PycURL Python module `_
- `socat `_, see :ref:`note
` below
- `Paramiko `_, if you want to use
``ganeti-listrunner``
- `psutil Python module `_,
optional python package for supporting CPU pinning under KVM
- `fdsend Python module `_,
optional Python package for supporting NIC hotplugging under KVM
- `qemu-img `_, if you want to use ``ovfconverter``
- `fping `_
- `Python IP address manipulation library
`_
- `Bitarray Python library `_
- `GNU Make `_
- `GNU M4 `_
These programs are supplied as part of most Linux distributions, so
usually they can be installed via the standard package manager. Also
many of them will already be installed on a standard machine. On
Debian/Ubuntu, you can use this command line to install all required
packages, except for RBD, DRBD and Xen::
$ apt-get install lvm2 ssh bridge-utils iproute iputils-arping make m4 \
ndisc6 python python-openssl openssl \
python-pyparsing python-simplejson python-bitarray \
python-pyinotify python-pycurl python-ipaddr socat fping
Note that the previous instructions don't install optional packages.
To install the optional packages, run the following command::
$ apt-get install python-paramiko python-psutil qemu-utils
If some of the Python packages are not available on your system,
you can try installing them using the ``easy_install`` command.
For example::
$ apt-get install python-setuptools python-dev
$ cd / && easy_install \
psutil \
bitarray \
ipaddr
On Fedora, to install all required packages except RBD, DRBD and Xen::
$ yum install openssh openssh-clients bridge-utils iproute ndisc6 make \
pyOpenSSL pyparsing python-simplejson python-inotify \
python-lxml socat fping python-bitarray python-ipaddr
For optional packages use the command::
$ yum install python-paramiko python-psutil qemu-img
If you want to build from source, please see doc/devnotes.rst for more
dependencies.
.. _socat-note:
.. note::
Ganeti's import/export functionality uses ``socat`` with OpenSSL for
transferring data between nodes. By default, OpenSSL 0.9.8 and above
employ transparent compression of all data using zlib if supported by
both sides of a connection. In cases where a lot of data is
transferred, this can lead to increased CPU usage. Additionally,
Ganeti already compresses all data using ``gzip`` where it makes sense
(for inter-cluster instance moves).
To remedy this situation, patches implementing a new ``socat`` option
for disabling OpenSSL compression have been contributed and will
likely be included in the next feature release. Until then, users or
distributions need to apply the patches on their own.
Ganeti will use the option if it's detected by the ``configure``
script; auto-detection can be disabled by explicitly passing
``--enable-socat-compress`` (use the option to disable compression) or
``--disable-socat-compress`` (don't use the option).
The patches and more information can be found on
http://www.dest-unreach.org/socat/contrib/socat-opensslcompress.html.
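For example, to build without relying on the patched ``socat`` option even
if ``configure`` would detect it (a minimal sketch; combine this with your
usual ``configure`` arguments)::
$ ./configure --disable-socat-compress --localstatedir=/var --sysconfdir=/etc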
Haskell requirements
~~~~~~~~~~~~~~~~~~~~
Starting with Ganeti 2.7, the Haskell GHC compiler and a few base
libraries are required in order to build Ganeti (but not to run and
deploy Ganeti on production machines). More specifically:
- `GHC `_ version 7 or higher
- or even better, `The Haskell Platform
`_ which gives you a simple way
to bootstrap Haskell
- `cabal-install `_ and
`Cabal `_, the Common Architecture
for Building Haskell Applications and Libraries (executable and library)
- `json `_, a JSON library
- `network `_, a basic
network library
- `parallel `_, a parallel
programming library (note: tested with up to version 3.x)
- `bytestring `_ and
`utf8-string `_
libraries; these usually come with the GHC compiler
- `text `_
- `deepseq `_,
usually comes with the GHC compiler
- `curl `_, tested with
versions 1.3.4 and above
- `hslogger `_, version 1.1 and
above.
- `hinotify `_, tested with
version 0.3.2
- `Crypto `_, tested with
version 4.2.4
- `regex-pcre `_,
bindings for the ``pcre`` library
- `attoparsec `_,
version 0.10 and above
- `vector `_
- `process `_, version 1.0.1.1 and
above; usually comes with the GHC compiler
- `base64-bytestring
`_,
version 1.0.0.0 and above
- `lifted-base `_,
version 0.1.1 and above.
- `lens `_,
version 3.10 and above.
Some of these are also available as packages in Debian/Ubuntu::
$ apt-get install ghc cabal-install libghc-cabal-dev \
libghc-json-dev libghc-network-dev \
libghc-parallel-dev \
libghc-utf8-string-dev libghc-curl-dev \
libghc-hslogger-dev \
libghc-crypto-dev libghc-text-dev \
libghc-hinotify-dev libghc-regex-pcre-dev \
libpcre3-dev \
libghc-attoparsec-dev libghc-vector-dev \
libghc-zlib-dev
Debian Jessie also includes recent enough versions of these libraries::
$ apt-get install libghc-base64-bytestring-dev \
libghc-lens-dev \
libghc-lifted-base-dev
In Fedora, some of them are available via packages as well::
$ yum install ghc ghc-json-devel ghc-network-devel \
ghc-parallel-devel ghc-deepseq-devel \
ghc-hslogger-devel ghc-text-devel \
ghc-regex-pcre-devel
The most recent Fedora doesn't provide ``crypto`` or ``inotify``, so these
need to be installed using ``cabal``.
If using a distribution which does not provide these libraries, first
install the Haskell platform. Then run::
$ cabal update
Then install the additional native libraries::
$ apt-get install libpcre3-dev libcurl4-openssl-dev
And finally the libraries required for building the packages via ``cabal``
(it will automatically pick only those that are not already installed via your
distribution packages)::
$ cabal install --only-dependencies cabal/ganeti.template.cabal
Haskell optional features
~~~~~~~~~~~~~~~~~~~~~~~~~
Optionally, more functionality can be enabled if your build machine has
a few more Haskell libraries installed: the ``ganeti-confd`` daemon
(``--enable-confd``), the monitoring daemon (``--enable-monitoring``) and
the meta-data daemon (``--enable-metadata``).
The extra dependencies for these are:
- `snap-server` `_, version
0.8.1 and above.
- `case-insensitive`
`_, version
0.4.0.1 and above (it's also a dependency of ``snap-server``).
- `PSQueue `_,
version 1.0 and above.
These libraries are available in Debian Wheezy or later, so you can use
either apt::
$ apt-get install libghc-snap-server-dev libghc-psqueue-dev
or ``cabal``::
$ cabal install --only-dependencies cabal/ganeti.template.cabal \
--flags="confd mond metad"
to install them.
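For example, assuming the libraries above are installed, all three optional
daemons could be enabled in a single ``configure`` run (a sketch; add your
usual ``configure`` arguments as needed)::
$ ./configure --enable-confd --enable-monitoring --enable-metadata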
.. _cabal-note:
.. note::
Make sure that your ``~/.cabal/bin`` directory (or whatever else
is defined as ``bindir``) is in your ``PATH``.
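For a bash-style shell this could be done with (a minimal sketch)::
$ export PATH=$HOME/.cabal/bin:$PATH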
Installation of the software
----------------------------
To install, simply run the following command::
$ ./configure --localstatedir=/var --sysconfdir=/etc && \
make && \
make install
This will install the software under ``/usr/local``. You then need to
copy ``doc/examples/ganeti.initd`` to ``/etc/init.d/ganeti`` and
integrate it into your boot sequence (``chkconfig``, ``update-rc.d``,
etc.). Also, Ganeti uses symbolic links in the sysconfdir to determine
which of the potentially many installed versions is currently in use. If these
symbolic links should be created by the install step as well, add the
``--enable-symlinks`` option to the ``configure`` call.
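On a Debian-style system, the boot integration could look like the following
(a sketch only; the exact commands depend on your init system)::
$ cp doc/examples/ganeti.initd /etc/init.d/ganeti
$ chmod +x /etc/init.d/ganeti
$ update-rc.d ganeti defaults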
Cluster initialisation
----------------------
Before initialising the cluster, on each node you need to create the
following directories:
- ``/etc/ganeti``
- ``/var/lib/ganeti``
- ``/var/log/ganeti``
- ``/srv/ganeti``
- ``/srv/ganeti/os``
- ``/srv/ganeti/export``
After this, use ``gnt-cluster init``.
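As an illustration (a minimal sketch; ``CLUSTERNAME`` is a placeholder for
the cluster name you chose)::
$ mkdir -p /etc/ganeti /var/lib/ganeti /var/log/ganeti \
    /srv/ganeti/os /srv/ganeti/export
$ gnt-cluster init CLUSTERNAME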
.. vim: set textwidth=72 syntax=rst :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/Makefile.am
# Ganeti makefile
# - Indent with tabs only.
# - Keep files sorted; one line per file.
# - Directories in lib/ must have their own *dir variable (see hypervisor).
# - All directories must be listed in DIRS.
# - Use autogen.sh to generate Makefile.in and configure script.
# For distcheck we need the haskell tests to be enabled. Note:
# The "correct" way would be to define AM_DISTCHECK_CONFIGURE_FLAGS.
# However, we still have to support older versions of autotools,
# so we cannot use that yet, hence we fall back to..
DISTCHECK_CONFIGURE_FLAGS = --enable-haskell-tests
# Automake doesn't export these variables before version 1.10.
abs_top_builddir = @abs_top_builddir@
abs_top_srcdir = @abs_top_srcdir@
# Helper values for calling builtin functions
empty :=
space := $(empty) $(empty)
comma := ,
# Helper function to strip src/ and test/hs/ from a list
strip_hsroot = $(patsubst src/%,%,$(patsubst test/hs/%,%,$(1)))
# Use bash in order to be able to use pipefail
SHELL=/bin/bash
EXTRA_DIST=
# Enable colors in shelltest
SHELLTESTARGS = "-c"
ACLOCAL_AMFLAGS = -I autotools
BUILD_BASH_COMPLETION = $(top_srcdir)/autotools/build-bash-completion
RUN_IN_TEMPDIR = $(top_srcdir)/autotools/run-in-tempdir
CHECK_PYTHON_CODE = $(top_srcdir)/autotools/check-python-code
CHECK_HEADER = $(top_srcdir)/autotools/check-header
CHECK_MAN_DASHES = $(top_srcdir)/autotools/check-man-dashes
CHECK_MAN_REFERENCES = $(top_srcdir)/autotools/check-man-references
CHECK_MAN_WARNINGS = $(top_srcdir)/autotools/check-man-warnings
CHECK_VERSION = $(top_srcdir)/autotools/check-version
CHECK_NEWS = $(top_srcdir)/autotools/check-news
CHECK_IMPORTS = $(top_srcdir)/autotools/check-imports
DOCPP = $(top_srcdir)/autotools/docpp
REPLACE_VARS_SED = autotools/replace_vars.sed
PRINT_PY_CONSTANTS = $(top_srcdir)/autotools/print-py-constants
BUILD_RPC = $(top_srcdir)/autotools/build-rpc
SHELL_ENV_INIT = autotools/shell-env-init
# Starting with Ganeti 2.10, all files are stored in two directories,
# with only symbolic links added at other places.
#
# $(versiondir) contains most of Ganeti and all architecture-dependent files
# $(versionedsharedir) contains only architecture-independent files; all python
# executables need to go directly to $(versionedsharedir), as all ganeti python
# modules are installed outside the usual python path, i.e., as private modules.
#
# $(defaultversiondir) and $(defaultversionedsharedir) are the corresponding
# directories for "the currently running" version of Ganeti. We never install
# there, but all symbolic links go there, rather than directly to $(versiondir)
# or $(versionedsharedir). Note that all links to $(default*dir) need to be stable;
# so, if some currently architecture-independent executable is replaced by an
# architecture-dependent one (and hence has to go under $(versiondir)), add a link
# under $(versionedsharedir) but do not change the external links.
if USE_VERSION_FULL
DIRVERSION=$(VERSION_FULL)
else
DIRVERSION=$(VERSION_MAJOR).$(VERSION_MINOR)
endif
versiondir = $(libdir)/ganeti/$(DIRVERSION)
defaultversiondir = $(libdir)/ganeti/default
versionedsharedir = $(prefix)/share/ganeti/$(DIRVERSION)
defaultversionedsharedir = $(prefix)/share/ganeti/default
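# For illustration only: assuming e.g. --prefix=/usr, --libdir=/usr/lib and a
# 2.15 version, the definitions above expand to roughly
#   versiondir               = /usr/lib/ganeti/2.15
#   defaultversiondir        = /usr/lib/ganeti/default
#   versionedsharedir        = /usr/share/ganeti/2.15
#   defaultversionedsharedir = /usr/share/ganeti/default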
# Note: these are automake-specific variables, and must be named after
# the directory + 'dir' suffix
pkglibdir = $(versiondir)$(libdir)/ganeti
myexeclibdir = $(pkglibdir)
bindir = $(versiondir)/$(BINDIR)
sbindir = $(versiondir)$(SBINDIR)
mandir = $(versionedsharedir)/root$(MANDIR)
pkgpythondir = $(versionedsharedir)/ganeti
pkgpython_rpc_stubdir = $(versionedsharedir)/ganeti/rpc/stub
gntpythondir = $(versionedsharedir)
pkgpython_bindir = $(versionedsharedir)
gnt_python_sbindir = $(versionedsharedir)
tools_pythondir = $(versionedsharedir)
clientdir = $(pkgpythondir)/client
cmdlibdir = $(pkgpythondir)/cmdlib
cmdlib_clusterdir = $(pkgpythondir)/cmdlib/cluster
configdir = $(pkgpythondir)/config
hypervisordir = $(pkgpythondir)/hypervisor
hypervisor_hv_kvmdir = $(pkgpythondir)/hypervisor/hv_kvm
jqueuedir = $(pkgpythondir)/jqueue
storagedir = $(pkgpythondir)/storage
httpdir = $(pkgpythondir)/http
masterddir = $(pkgpythondir)/masterd
confddir = $(pkgpythondir)/confd
rapidir = $(pkgpythondir)/rapi
rpcdir = $(pkgpythondir)/rpc
rpc_stubdir = $(pkgpythondir)/rpc/stub
serverdir = $(pkgpythondir)/server
watcherdir = $(pkgpythondir)/watcher
impexpddir = $(pkgpythondir)/impexpd
utilsdir = $(pkgpythondir)/utils
toolsdir = $(pkglibdir)/tools
iallocatorsdir = $(pkglibdir)/iallocators
pytoolsdir = $(pkgpythondir)/tools
docdir = $(versiondir)$(datadir)/doc/$(PACKAGE)
ifupdir = $(sysconfdir)/ganeti
if USE_BACKUP_DIR
backup_dir = $(BACKUP_DIR)
else
backup_dir = $(localstatedir)/lib
endif
SYMLINK_TARGET_DIRS = \
$(sysconfdir)/ganeti \
$(libdir)/ganeti/iallocators \
$(libdir)/ganeti/tools \
$(prefix)/share/ganeti \
$(BINDIR) \
$(SBINDIR) \
$(MANDIR)/man1 \
$(MANDIR)/man7 \
$(MANDIR)/man8
# Delete output file if an error occurred while building it
.DELETE_ON_ERROR:
HS_DIRS = \
src \
src/Ganeti \
src/Ganeti/Confd \
src/Ganeti/Curl \
src/Ganeti/Cpu \
src/Ganeti/DataCollectors \
src/Ganeti/Daemon \
src/Ganeti/Hs2Py \
src/Ganeti/HTools \
src/Ganeti/HTools/Backend \
src/Ganeti/HTools/Cluster \
src/Ganeti/HTools/Program \
src/Ganeti/Hypervisor \
src/Ganeti/Hypervisor/Xen \
src/Ganeti/JQScheduler \
src/Ganeti/JQueue \
src/Ganeti/Locking \
src/Ganeti/Logging \
src/Ganeti/Monitoring \
src/Ganeti/Metad \
src/Ganeti/Objects \
src/Ganeti/OpCodes \
src/Ganeti/Query \
src/Ganeti/Storage \
src/Ganeti/Storage/Diskstats \
src/Ganeti/Storage/Drbd \
src/Ganeti/Storage/Lvm \
src/Ganeti/THH \
src/Ganeti/Utils \
src/Ganeti/WConfd \
test/hs \
test/hs/Test \
test/hs/Test/Ganeti \
test/hs/Test/Ganeti/Storage \
test/hs/Test/Ganeti/Storage/Diskstats \
test/hs/Test/Ganeti/Storage/Drbd \
test/hs/Test/Ganeti/Storage/Lvm \
test/hs/Test/Ganeti/Confd \
test/hs/Test/Ganeti/HTools \
test/hs/Test/Ganeti/HTools/Backend \
test/hs/Test/Ganeti/Hypervisor \
test/hs/Test/Ganeti/Hypervisor/Xen \
test/hs/Test/Ganeti/JQueue \
test/hs/Test/Ganeti/Locking \
test/hs/Test/Ganeti/Objects \
test/hs/Test/Ganeti/Query \
test/hs/Test/Ganeti/THH \
test/hs/Test/Ganeti/Utils \
test/hs/Test/Ganeti/WConfd
# Haskell directories without the roots (src, test/hs)
HS_DIRS_NOROOT = $(filter-out src,$(filter-out test/hs,$(HS_DIRS)))
DIRS = \
$(HS_DIRS) \
autotools \
cabal \
daemons \
devel \
devel/data \
doc \
doc/css \
doc/examples \
doc/examples/gnt-debug \
doc/examples/hooks \
doc/examples/systemd \
doc/users \
test/data/htools \
test/data/htools/rapi \
test/hs/shelltests \
test/autotools \
lib \
lib/build \
lib/client \
lib/cmdlib \
lib/cmdlib/cluster \
lib/confd \
lib/config \
lib/jqueue \
lib/http \
lib/hypervisor \
lib/hypervisor/hv_kvm \
lib/impexpd \
lib/masterd \
lib/rapi \
lib/rpc \
lib/rpc/stub \
lib/server \
lib/storage \
lib/tools \
lib/utils \
lib/watcher \
man \
qa \
qa/patch \
test \
test/data \
test/data/bdev-rbd \
test/data/ovfdata \
test/data/ovfdata/other \
test/data/cgroup_root \
test/data/cgroup_root/memory \
test/data/cgroup_root/memory/lxc \
test/data/cgroup_root/memory/lxc/instance1 \
test/data/cgroup_root/cpuset \
test/data/cgroup_root/cpuset/some_group \
test/data/cgroup_root/cpuset/some_group/lxc \
test/data/cgroup_root/cpuset/some_group/lxc/instance1 \
test/data/cgroup_root/devices \
test/data/cgroup_root/devices/some_group \
test/data/cgroup_root/devices/some_group/lxc \
test/data/cgroup_root/devices/some_group/lxc/instance1 \
test/py \
test/py/testutils \
test/py/cmdlib \
test/py/cmdlib/testsupport \
tools
ALL_APIDOC_HS_DIRS = \
$(APIDOC_HS_DIR) \
$(patsubst %,$(APIDOC_HS_DIR)/%,$(call strip_hsroot,$(HS_DIRS_NOROOT)))
BUILDTIME_DIR_AUTOCREATE = \
scripts \
$(APIDOC_DIR) \
$(ALL_APIDOC_HS_DIRS) \
$(APIDOC_PY_DIR) \
$(COVERAGE_DIR) \
$(COVERAGE_HS_DIR) \
$(COVERAGE_PY_DIR) \
.hpc
BUILDTIME_DIRS = \
$(BUILDTIME_DIR_AUTOCREATE) \
apps \
dist \
doc/html \
doc/man-html
DIRCHECK_EXCLUDE = \
$(BUILDTIME_DIRS) \
ganeti-[0-9]*.[0-9]*.[0-9]* \
doc/html/_* \
doc/man-html/_* \
tools/shebang \
autom4te.cache
# some helper vars
COVERAGE_DIR = doc/coverage
COVERAGE_PY_DIR = $(COVERAGE_DIR)/py
COVERAGE_HS_DIR = $(COVERAGE_DIR)/hs
APIDOC_DIR = doc/api
APIDOC_PY_DIR = $(APIDOC_DIR)/py
APIDOC_HS_DIR = $(APIDOC_DIR)/hs
MAINTAINERCLEANFILES = \
$(maninput) \
doc/install-quick.rst \
doc/news.rst \
doc/upgrade.rst \
vcs-version
maintainer-clean-local:
rm -rf $(BUILDTIME_DIRS)
CLEANFILES = \
$(addsuffix /*.py[co],$(DIRS)) \
$(addsuffix /*.hi,$(HS_DIRS)) \
$(addsuffix /*.o,$(HS_DIRS)) \
$(addsuffix /*.$(HTEST_SUFFIX)_hi,$(HS_DIRS)) \
$(addsuffix /*.$(HTEST_SUFFIX)_o,$(HS_DIRS)) \
$(HASKELL_PACKAGE_VERSIONS_FILE) \
$(CABAL_EXECUTABLES_APPS_STAMPS) \
empty-cabal-config \
ganeti.cabal \
$(HASKELL_PACKAGE_IDS_FILE) \
$(HASKELL_PACKAGE_VERSIONS_FILE) \
Makefile.ghc \
Makefile.ghc.bak \
$(PYTHON_BOOTSTRAP) \
$(gnt_python_sbin_SCRIPTS) \
epydoc.conf \
$(REPLACE_VARS_SED) \
$(SHELL_ENV_INIT) \
daemons/daemon-util \
daemons/ganeti-cleaner \
devel/squeeze-amd64.tar.gz \
devel/squeeze-amd64.conf \
$(mandocrst) \
doc/manpages-enabled.rst \
$(BUILT_EXAMPLES) \
doc/examples/bash_completion \
doc/examples/bash_completion-debug \
$(userspecs) \
lib/_generated_rpc.py \
$(man_MANS) \
$(manhtml) \
tools/kvm-ifup \
tools/kvm-ifup-os \
tools/xen-ifup-os \
tools/vif-ganeti \
tools/vif-ganeti-metad \
tools/net-common \
tools/users-setup \
tools/ssl-update \
tools/vcluster-setup \
tools/prepare-node-join \
tools/ssh-update \
$(python_scripts_shebang) \
stamp-directories \
stamp-srclinks \
$(nodist_pkgpython_PYTHON) \
$(nodist_pkgpython_rpc_stub_PYTHON) \
$(gnt_scripts) \
$(HS_ALL_PROGS) $(HS_BUILT_SRCS) \
$(HS_BUILT_TEST_HELPERS) \
src/ganeti-confd \
src/ganeti-wconfd \
src/ganeti-luxid \
src/ganeti-metad \
src/ganeti-mond \
.hpc/*.mix src/*.tix test/hs/*.tix *.tix \
doc/hs-lint.html
GENERATED_FILES = \
$(built_base_sources) \
$(built_python_sources) \
$(PYTHON_BOOTSTRAP) \
$(gnt_python_sbin_SCRIPTS)
clean-local:
rm -rf tools/shebang
rm -rf apps
rm -rf dist
HS_GENERATED_FILES = $(HS_PROGS) src/hluxid src/ganeti-luxid \
src/hconfd src/ganeti-confd
if ENABLE_MOND
HS_GENERATED_FILES += src/ganeti-mond
endif
if ENABLE_METADATA
HS_GENERATED_FILES += src/ganeti-metad
endif
built_base_sources = \
stamp-directories \
stamp-srclinks
built_python_base_sources = \
lib/_constants.py \
lib/_vcsversion.py \
lib/opcodes.py \
lib/rpc/stub/wconfd.py
if ENABLE_METADATA
built_python_base_sources += lib/rpc/stub/metad.py
endif
built_python_sources = \
$(nodist_pkgpython_PYTHON) \
$(nodist_pkgpython_rpc_stub_PYTHON)
# these are all built from the underlying %.in sources
BUILT_EXAMPLES = \
doc/examples/ganeti-kvm-poweroff.initd \
doc/examples/ganeti.cron \
doc/examples/ganeti.initd \
doc/examples/ganeti.logrotate \
doc/examples/ganeti-master-role.ocf \
doc/examples/ganeti-node-role.ocf \
doc/examples/gnt-config-backup \
doc/examples/hooks/ipsec \
doc/examples/systemd/ganeti-common.service \
doc/examples/systemd/ganeti-confd.service \
doc/examples/systemd/ganeti-kvmd.service \
doc/examples/systemd/ganeti-luxid.service \
doc/examples/systemd/ganeti-metad.service \
doc/examples/systemd/ganeti-mond.service \
doc/examples/systemd/ganeti-noded.service \
doc/examples/systemd/ganeti-rapi.service \
doc/examples/systemd/ganeti-wconfd.service
nodist_ifup_SCRIPTS = \
tools/kvm-ifup-os \
tools/xen-ifup-os
nodist_pkgpython_PYTHON = \
$(built_python_base_sources) \
lib/_generated_rpc.py
nodist_pkgpython_rpc_stub_PYTHON = \
lib/rpc/stub/wconfd.py
if ENABLE_METADATA
nodist_pkgpython_rpc_stub_PYTHON += lib/rpc/stub/metad.py
endif
nodist_pkgpython_bin_SCRIPTS = \
$(nodist_pkglib_python_scripts)
pkgpython_bin_SCRIPTS = \
$(pkglib_python_scripts)
noinst_PYTHON = \
lib/build/__init__.py \
lib/build/shell_example_lexer.py \
lib/build/sphinx_ext.py
pkgpython_PYTHON = \
lib/__init__.py \
lib/asyncnotifier.py \
lib/backend.py \
lib/bootstrap.py \
lib/cli.py \
lib/cli_opts.py \
lib/compat.py \
lib/constants.py \
lib/daemon.py \
lib/errors.py \
lib/hooksmaster.py \
lib/ht.py \
lib/jstore.py \
lib/locking.py \
lib/luxi.py \
lib/mcpu.py \
lib/metad.py \
lib/netutils.py \
lib/objects.py \
lib/opcodes_base.py \
lib/outils.py \
lib/ovf.py \
lib/pathutils.py \
lib/qlang.py \
lib/query.py \
lib/rpc_defs.py \
lib/runtime.py \
lib/serializer.py \
lib/ssconf.py \
lib/ssh.py \
lib/uidpool.py \
lib/vcluster.py \
lib/network.py \
lib/wconfd.py \
lib/workerpool.py
client_PYTHON = \
lib/client/__init__.py \
lib/client/base.py \
lib/client/gnt_backup.py \
lib/client/gnt_cluster.py \
lib/client/gnt_debug.py \
lib/client/gnt_group.py \
lib/client/gnt_instance.py \
lib/client/gnt_job.py \
lib/client/gnt_node.py \
lib/client/gnt_network.py \
lib/client/gnt_os.py \
lib/client/gnt_storage.py \
lib/client/gnt_filter.py
cmdlib_PYTHON = \
lib/cmdlib/__init__.py \
lib/cmdlib/backup.py \
lib/cmdlib/base.py \
lib/cmdlib/common.py \
lib/cmdlib/group.py \
lib/cmdlib/instance.py \
lib/cmdlib/instance_create.py \
lib/cmdlib/instance_helpervm.py \
lib/cmdlib/instance_migration.py \
lib/cmdlib/instance_operation.py \
lib/cmdlib/instance_query.py \
lib/cmdlib/instance_set_params.py \
lib/cmdlib/instance_storage.py \
lib/cmdlib/instance_utils.py \
lib/cmdlib/misc.py \
lib/cmdlib/network.py \
lib/cmdlib/node.py \
lib/cmdlib/operating_system.py \
lib/cmdlib/query.py \
lib/cmdlib/tags.py \
lib/cmdlib/test.py
cmdlib_cluster_PYTHON = \
lib/cmdlib/cluster/__init__.py \
lib/cmdlib/cluster/verify.py
config_PYTHON = \
lib/config/__init__.py \
lib/config/verify.py \
lib/config/temporary_reservations.py \
lib/config/utils.py
hypervisor_PYTHON = \
lib/hypervisor/__init__.py \
lib/hypervisor/hv_base.py \
lib/hypervisor/hv_chroot.py \
lib/hypervisor/hv_fake.py \
lib/hypervisor/hv_lxc.py \
lib/hypervisor/hv_xen.py
hypervisor_hv_kvm_PYTHON = \
lib/hypervisor/hv_kvm/__init__.py \
lib/hypervisor/hv_kvm/monitor.py \
lib/hypervisor/hv_kvm/netdev.py
jqueue_PYTHON = \
lib/jqueue/__init__.py \
lib/jqueue/exec.py
storage_PYTHON = \
lib/storage/__init__.py \
lib/storage/bdev.py \
lib/storage/base.py \
lib/storage/container.py \
lib/storage/drbd.py \
lib/storage/drbd_info.py \
lib/storage/drbd_cmdgen.py \
lib/storage/extstorage.py \
lib/storage/filestorage.py \
lib/storage/gluster.py
rapi_PYTHON = \
lib/rapi/__init__.py \
lib/rapi/baserlib.py \
lib/rapi/client.py \
lib/rapi/client_utils.py \
lib/rapi/connector.py \
lib/rapi/rlib2.py \
lib/rapi/testutils.py
http_PYTHON = \
lib/http/__init__.py \
lib/http/auth.py \
lib/http/client.py \
lib/http/server.py
confd_PYTHON = \
lib/confd/__init__.py \
lib/confd/client.py
masterd_PYTHON = \
lib/masterd/__init__.py \
lib/masterd/iallocator.py \
lib/masterd/instance.py
impexpd_PYTHON = \
lib/impexpd/__init__.py
watcher_PYTHON = \
lib/watcher/__init__.py \
lib/watcher/nodemaint.py \
lib/watcher/state.py
server_PYTHON = \
lib/server/__init__.py \
lib/server/masterd.py \
lib/server/noded.py \
lib/server/rapi.py
rpc_PYTHON = \
lib/rpc/__init__.py \
lib/rpc/client.py \
lib/rpc/errors.py \
lib/rpc/node.py \
lib/rpc/transport.py
rpc_stub_PYTHON = \
lib/rpc/stub/__init__.py
pytools_PYTHON = \
lib/tools/__init__.py \
lib/tools/burnin.py \
lib/tools/common.py \
lib/tools/ensure_dirs.py \
lib/tools/node_cleanup.py \
lib/tools/node_daemon_setup.py \
lib/tools/prepare_node_join.py \
lib/tools/ssh_update.py \
lib/tools/ssl_update.py \
lib/tools/cfgupgrade.py
utils_PYTHON = \
lib/utils/__init__.py \
lib/utils/algo.py \
lib/utils/filelock.py \
lib/utils/hash.py \
lib/utils/io.py \
lib/utils/livelock.py \
lib/utils/log.py \
lib/utils/lvm.py \
lib/utils/mlock.py \
lib/utils/nodesetup.py \
lib/utils/process.py \
lib/utils/retry.py \
lib/utils/security.py \
lib/utils/storage.py \
lib/utils/text.py \
lib/utils/version.py \
lib/utils/wrapper.py \
lib/utils/x509.py \
lib/utils/bitarrays.py
docinput = \
doc/admin.rst \
doc/cluster-keys-replacement.rst \
doc/cluster-merge.rst \
doc/conf.py \
doc/css/style.css \
doc/design-2.0.rst \
doc/design-2.1.rst \
doc/design-2.2.rst \
doc/design-2.3.rst \
doc/design-2.4.rst \
doc/design-2.5.rst \
doc/design-2.6.rst \
doc/design-2.7.rst \
doc/design-2.8.rst \
doc/design-2.9.rst \
doc/design-2.10.rst \
doc/design-2.11.rst \
doc/design-2.12.rst \
doc/design-2.13.rst \
doc/design-2.14.rst \
doc/design-2.15.rst \
doc/design-allocation-efficiency.rst \
doc/design-autorepair.rst \
doc/design-bulk-create.rst \
doc/design-ceph-ganeti-support.rst \
doc/design-configlock.rst \
doc/design-chained-jobs.rst \
doc/design-cmdlib-unittests.rst \
doc/design-cpu-pinning.rst \
doc/design-cpu-speed.rst \
doc/design-daemons.rst \
doc/design-dedicated-allocation.rst \
doc/design-device-uuid-name.rst \
doc/design-disk-conversion.rst \
doc/design-disks.rst \
doc/design-draft.rst \
doc/design-file-based-disks-ownership.rst \
doc/design-file-based-storage.rst \
doc/design-glusterfs-ganeti-support.rst \
doc/design-hotplug.rst \
doc/design-hroller.rst \
doc/design-hsqueeze.rst \
doc/design-htools-2.3.rst \
doc/design-http-server.rst \
doc/design-hugepages-support.rst \
doc/design-ifdown.rst \
doc/design-impexp2.rst \
doc/design-internal-shutdown.rst \
doc/design-kvmd.rst \
doc/design-location.rst \
doc/design-linuxha.rst \
doc/design-lu-generated-jobs.rst \
doc/design-monitoring-agent.rst \
doc/design-move-instance-improvements.rst \
doc/design-multi-reloc.rst \
doc/design-multi-storage-htools.rst \
doc/design-multi-version-tests.rst \
doc/design-network.rst \
doc/design-network2.rst \
doc/design-node-add.rst \
doc/design-node-security.rst \
doc/design-oob.rst \
doc/design-openvswitch.rst \
doc/design-opportunistic-locking.rst \
doc/design-optables.rst \
doc/design-os.rst \
doc/design-ovf-support.rst \
doc/design-partitioned.rst \
doc/design-performance-tests.rst \
doc/design-query-splitting.rst \
doc/design-query2.rst \
doc/design-query-splitting.rst \
doc/design-reason-trail.rst \
doc/design-reservations.rst \
doc/design-resource-model.rst \
doc/design-restricted-commands.rst \
doc/design-shared-storage.rst \
doc/design-shared-storage-redundancy.rst \
doc/design-ssh-ports.rst \
doc/design-storagetypes.rst \
doc/design-sync-rate-throttling.rst \
doc/design-systemd.rst \
doc/design-upgrade.rst \
doc/design-virtual-clusters.rst \
doc/design-x509-ca.rst \
doc/dev-codestyle.rst \
doc/devnotes.rst \
doc/glossary.rst \
doc/hooks.rst \
doc/iallocator.rst \
doc/index.rst \
doc/install-quick.rst \
doc/install.rst \
doc/locking.rst \
doc/manpages-disabled.rst \
doc/monitoring-query-format.rst \
doc/move-instance.rst \
doc/news.rst \
doc/ovfconverter.rst \
doc/rapi.rst \
doc/security.rst \
doc/upgrade.rst \
doc/virtual-cluster.rst \
doc/walkthrough.rst
# Generates file names such as "doc/man-gnt-instance.rst"
mandocrst = $(addprefix doc/man-,$(notdir $(manrst)))
# Haskell programs to be installed in $PREFIX/bin
HS_BIN_PROGS=src/htools
# Haskell programs to be installed in the MYEXECLIB dir
if ENABLE_MOND
HS_MYEXECLIB_PROGS=src/mon-collector
else
HS_MYEXECLIB_PROGS=
endif
# Haskell programs to be compiled by "make really-all"
HS_COMPILE_PROGS = \
src/ganeti-kvmd \
src/ganeti-wconfd \
src/hconfd \
src/hluxid \
src/hs2py \
src/rpc-test
if ENABLE_MOND
HS_COMPILE_PROGS += src/ganeti-mond
endif
if ENABLE_METADATA
HS_COMPILE_PROGS += src/ganeti-metad
endif
# All Haskell non-test programs to be compiled but not automatically installed
HS_PROGS = $(HS_BIN_PROGS) $(HS_MYEXECLIB_PROGS)
HS_BIN_ROLES = harep hbal hscan hspace hinfo hcheck hroller hsqueeze
HS_HTOOLS_PROGS = $(HS_BIN_ROLES) hail
# Haskell programs that cannot be disabled at configure time (unlike, e.g.,
# 'mon-collector')
HS_DEFAULT_PROGS = \
$(HS_BIN_PROGS) \
test/hs/hpc-htools \
test/hs/hpc-mon-collector \
$(HS_COMPILE_PROGS)
if HTEST
HS_DEFAULT_PROGS += test/hs/htest
else
EXTRA_DIST += test/hs/htest.hs
endif
HS_ALL_PROGS = $(HS_DEFAULT_PROGS) $(HS_MYEXECLIB_PROGS)
HS_TEST_PROGS = $(filter test/%,$(HS_ALL_PROGS))
HS_SRC_PROGS = $(filter-out test/%,$(HS_ALL_PROGS))
HS_PROG_SRCS = $(patsubst %,%.hs,$(HS_DEFAULT_PROGS)) src/mon-collector.hs
HS_BUILT_TEST_HELPERS = $(HS_BIN_ROLES:%=test/hs/%) test/hs/hail
HFLAGS = \
-O -Wall -isrc \
-fwarn-monomorphism-restriction \
-fwarn-tabs \
-optP-include -optP$(HASKELL_PACKAGE_VERSIONS_FILE) \
-hide-all-packages \
`cat $(HASKELL_PACKAGE_IDS_FILE)` \
$(GHC_BYVERSION_FLAGS)
if DEVELOPER_MODE
HFLAGS += -Werror
endif
HTEST_SUFFIX = hpc
HPROF_SUFFIX = prof
DEP_SUFFIXES =
if GHC_LE_76
DEP_SUFFIXES += -dep-suffix $(HPROF_SUFFIX) -dep-suffix $(HTEST_SUFFIX)
else
# GHC >= 7.8 stopped putting underscores into -dep-suffix by itself
# (https://ghc.haskell.org/trac/ghc/ticket/9749), so we have to add them ourselves.
# It also needs -dep-suffix "" for the .o file.
DEP_SUFFIXES += -dep-suffix $(HPROF_SUFFIX)_ -dep-suffix $(HTEST_SUFFIX)_ \
-dep-suffix ""
endif
# GHC > 7.6 needs -dynamic-too when using Template Haskell since its
# ghci is switched to loading dynamic libraries by default.
# It must only be used in non-profiling GHC invocations.
# We also don't use it in compilations that use HTEST_SUFFIX (which are
# compiled with -fhpc) because HPC coverage doesn't interact well with
# GHCI shared lib loading (https://ghc.haskell.org/trac/ghc/ticket/9762).
HFLAGS_DYNAMIC =
if !GHC_LE_76
HFLAGS_DYNAMIC += -dynamic-too
endif
if HPROFILE
HPROFFLAGS = -prof -fprof-auto-top -osuf $(HPROF_SUFFIX)_o \
-hisuf $(HPROF_SUFFIX)_hi -rtsopts
endif
if HCOVERAGE
HFLAGS += -fhpc
endif
if HTEST
HFLAGS += -DTEST
endif
HTEST_FLAGS = $(HFLAGS) -fhpc -itest/hs \
-osuf $(HTEST_SUFFIX)_o \
-hisuf $(HTEST_SUFFIX)_hi
# extra flags that can be overridden on the command line (e.g. -Wwarn, etc.)
HEXTRA =
# internal extra flags (used for test/hs/htest mainly)
HEXTRA_INT =
# combination of HEXTRA and HEXTRA_CONFIGURE
HEXTRA_COMBINED = $(HEXTRA) $(HEXTRA_CONFIGURE)
# exclude options for coverage reports
HPCEXCL = --exclude Main \
--exclude Ganeti.Constants \
--exclude Ganeti.HTools.QC \
--exclude Ganeti.THH \
--exclude Ganeti.Version \
--exclude Test.Ganeti.Attoparsec \
--exclude Test.Ganeti.TestCommon \
--exclude Test.Ganeti.TestHTools \
--exclude Test.Ganeti.TestHelper \
--exclude Test.Ganeti.TestImports \
$(patsubst src.%,--exclude Test.%,$(subst /,.,$(patsubst %.hs,%, $(HS_LIB_SRCS))))
HS_LIB_SRCS = \
src/Ganeti/BasicTypes.hs \
src/Ganeti/Codec.hs \
src/Ganeti/Common.hs \
src/Ganeti/Compat.hs \
src/Ganeti/Confd/Client.hs \
src/Ganeti/Confd/ClientFunctions.hs \
src/Ganeti/Confd/Server.hs \
src/Ganeti/Confd/Types.hs \
src/Ganeti/Confd/Utils.hs \
src/Ganeti/Config.hs \
src/Ganeti/ConfigReader.hs \
src/Ganeti/Constants.hs \
src/Ganeti/ConstantUtils.hs \
src/Ganeti/Cpu/LoadParser.hs \
src/Ganeti/Cpu/Types.hs \
src/Ganeti/Curl/Multi.hs \
src/Ganeti/Daemon.hs \
src/Ganeti/Daemon/Utils.hs \
src/Ganeti/DataCollectors.hs \
src/Ganeti/DataCollectors/CLI.hs \
src/Ganeti/DataCollectors/CPUload.hs \
src/Ganeti/DataCollectors/Diskstats.hs \
src/Ganeti/DataCollectors/Drbd.hs \
src/Ganeti/DataCollectors/InstStatus.hs \
src/Ganeti/DataCollectors/InstStatusTypes.hs \
src/Ganeti/DataCollectors/Lv.hs \
src/Ganeti/DataCollectors/Program.hs \
src/Ganeti/DataCollectors/Types.hs \
src/Ganeti/DataCollectors/XenCpuLoad.hs \
src/Ganeti/Errors.hs \
src/Ganeti/HTools/AlgorithmParams.hs \
src/Ganeti/HTools/Backend/IAlloc.hs \
src/Ganeti/HTools/Backend/Luxi.hs \
src/Ganeti/HTools/Backend/MonD.hs \
src/Ganeti/HTools/Backend/Rapi.hs \
src/Ganeti/HTools/Backend/Simu.hs \
src/Ganeti/HTools/Backend/Text.hs \
src/Ganeti/HTools/CLI.hs \
src/Ganeti/HTools/Cluster.hs \
src/Ganeti/HTools/Cluster/Evacuate.hs \
src/Ganeti/HTools/Cluster/Metrics.hs \
src/Ganeti/HTools/Cluster/Moves.hs \
src/Ganeti/HTools/Cluster/Utils.hs \
src/Ganeti/HTools/Container.hs \
src/Ganeti/HTools/Dedicated.hs \
src/Ganeti/HTools/ExtLoader.hs \
src/Ganeti/HTools/GlobalN1.hs \
src/Ganeti/HTools/Graph.hs \
src/Ganeti/HTools/Group.hs \
src/Ganeti/HTools/Instance.hs \
src/Ganeti/HTools/Loader.hs \
src/Ganeti/HTools/Nic.hs \
src/Ganeti/HTools/Node.hs \
src/Ganeti/HTools/PeerMap.hs \
src/Ganeti/HTools/Program/Hail.hs \
src/Ganeti/HTools/Program/Harep.hs \
src/Ganeti/HTools/Program/Hbal.hs \
src/Ganeti/HTools/Program/Hcheck.hs \
src/Ganeti/HTools/Program/Hinfo.hs \
src/Ganeti/HTools/Program/Hscan.hs \
src/Ganeti/HTools/Program/Hspace.hs \
src/Ganeti/HTools/Program/Hsqueeze.hs \
src/Ganeti/HTools/Program/Hroller.hs \
src/Ganeti/HTools/Program/Main.hs \
src/Ganeti/HTools/Tags.hs \
src/Ganeti/HTools/Types.hs \
src/Ganeti/Hypervisor/Xen.hs \
src/Ganeti/Hypervisor/Xen/XmParser.hs \
src/Ganeti/Hypervisor/Xen/Types.hs \
src/Ganeti/Hash.hs \
src/Ganeti/Hs2Py/GenConstants.hs \
src/Ganeti/Hs2Py/GenOpCodes.hs \
src/Ganeti/Hs2Py/OpDoc.hs \
src/Ganeti/JQScheduler.hs \
src/Ganeti/JQScheduler/Filtering.hs \
src/Ganeti/JQScheduler/ReasonRateLimiting.hs \
src/Ganeti/JQScheduler/Types.hs \
src/Ganeti/JQueue.hs \
src/Ganeti/JQueue/Lens.hs \
src/Ganeti/JQueue/Objects.hs \
src/Ganeti/JSON.hs \
src/Ganeti/Jobs.hs \
src/Ganeti/Kvmd.hs \
src/Ganeti/Lens.hs \
src/Ganeti/Locking/Allocation.hs \
src/Ganeti/Locking/Types.hs \
src/Ganeti/Locking/Locks.hs \
src/Ganeti/Locking/Waiting.hs \
src/Ganeti/Logging.hs \
src/Ganeti/Logging/Lifted.hs \
src/Ganeti/Logging/WriterLog.hs \
src/Ganeti/Luxi.hs \
src/Ganeti/Network.hs \
src/Ganeti/Objects.hs \
src/Ganeti/Objects/BitArray.hs \
src/Ganeti/Objects/Disk.hs \
src/Ganeti/Objects/Instance.hs \
src/Ganeti/Objects/Lens.hs \
src/Ganeti/Objects/Nic.hs \
src/Ganeti/OpCodes.hs \
src/Ganeti/OpCodes/Lens.hs \
src/Ganeti/OpParams.hs \
src/Ganeti/Path.hs \
src/Ganeti/Parsers.hs \
src/Ganeti/PyValue.hs \
src/Ganeti/Query/Cluster.hs \
src/Ganeti/Query/Common.hs \
src/Ganeti/Query/Exec.hs \
src/Ganeti/Query/Export.hs \
src/Ganeti/Query/Filter.hs \
src/Ganeti/Query/FilterRules.hs \
src/Ganeti/Query/Group.hs \
src/Ganeti/Query/Instance.hs \
src/Ganeti/Query/Job.hs \
src/Ganeti/Query/Language.hs \
src/Ganeti/Query/Locks.hs \
src/Ganeti/Query/Network.hs \
src/Ganeti/Query/Node.hs \
src/Ganeti/Query/Query.hs \
src/Ganeti/Query/Server.hs \
src/Ganeti/Query/Types.hs \
src/Ganeti/PartialParams.hs \
src/Ganeti/Rpc.hs \
src/Ganeti/Runtime.hs \
src/Ganeti/SlotMap.hs \
src/Ganeti/Ssconf.hs \
src/Ganeti/Storage/Diskstats/Parser.hs \
src/Ganeti/Storage/Diskstats/Types.hs \
src/Ganeti/Storage/Drbd/Parser.hs \
src/Ganeti/Storage/Drbd/Types.hs \
src/Ganeti/Storage/Lvm/LVParser.hs \
src/Ganeti/Storage/Lvm/Types.hs \
src/Ganeti/Storage/Utils.hs \
src/Ganeti/THH.hs \
src/Ganeti/THH/Field.hs \
src/Ganeti/THH/HsRPC.hs \
src/Ganeti/THH/PyRPC.hs \
src/Ganeti/THH/PyType.hs \
src/Ganeti/THH/Types.hs \
src/Ganeti/THH/RPC.hs \
src/Ganeti/Types.hs \
src/Ganeti/UDSServer.hs \
src/Ganeti/Utils.hs \
src/Ganeti/Utils/Atomic.hs \
src/Ganeti/Utils/AsyncWorker.hs \
src/Ganeti/Utils/IORef.hs \
src/Ganeti/Utils/Livelock.hs \
src/Ganeti/Utils/Monad.hs \
src/Ganeti/Utils/MultiMap.hs \
src/Ganeti/Utils/MVarLock.hs \
src/Ganeti/Utils/Random.hs \
src/Ganeti/Utils/Statistics.hs \
src/Ganeti/Utils/UniStd.hs \
src/Ganeti/Utils/Validate.hs \
src/Ganeti/VCluster.hs \
src/Ganeti/WConfd/ConfigState.hs \
src/Ganeti/WConfd/ConfigModifications.hs \
src/Ganeti/WConfd/ConfigVerify.hs \
src/Ganeti/WConfd/ConfigWriter.hs \
src/Ganeti/WConfd/Client.hs \
src/Ganeti/WConfd/Core.hs \
src/Ganeti/WConfd/DeathDetection.hs \
src/Ganeti/WConfd/Language.hs \
src/Ganeti/WConfd/Monad.hs \
src/Ganeti/WConfd/Persistent.hs \
src/Ganeti/WConfd/Server.hs \
src/Ganeti/WConfd/Ssconf.hs \
src/Ganeti/WConfd/TempRes.hs
if ENABLE_MOND
HS_LIB_SRCS += src/Ganeti/Monitoring/Server.hs
else
EXTRA_DIST += src/Ganeti/Monitoring/Server.hs
endif
if ENABLE_METADATA
HS_LIB_SRCS += \
src/Ganeti/Metad/Config.hs \
src/Ganeti/Metad/ConfigCore.hs \
src/Ganeti/Metad/ConfigServer.hs \
src/Ganeti/Metad/Server.hs \
src/Ganeti/Metad/Types.hs \
src/Ganeti/Metad/WebServer.hs
else
EXTRA_DIST += \
src/Ganeti/Metad/Config.hs \
src/Ganeti/Metad/ConfigCore.hs \
src/Ganeti/Metad/ConfigServer.hs \
src/Ganeti/Metad/Server.hs \
src/Ganeti/Metad/Types.hs \
src/Ganeti/Metad/WebServer.hs
endif
HS_TEST_SRCS = \
test/hs/Test/AutoConf.hs \
test/hs/Test/Ganeti/Attoparsec.hs \
test/hs/Test/Ganeti/BasicTypes.hs \
test/hs/Test/Ganeti/Common.hs \
test/hs/Test/Ganeti/Confd/Types.hs \
test/hs/Test/Ganeti/Confd/Utils.hs \
test/hs/Test/Ganeti/Constants.hs \
test/hs/Test/Ganeti/Daemon.hs \
test/hs/Test/Ganeti/Errors.hs \
test/hs/Test/Ganeti/HTools/Backend/MonD.hs \
test/hs/Test/Ganeti/HTools/Backend/Simu.hs \
test/hs/Test/Ganeti/HTools/Backend/Text.hs \
test/hs/Test/Ganeti/HTools/CLI.hs \
test/hs/Test/Ganeti/HTools/Cluster.hs \
test/hs/Test/Ganeti/HTools/Container.hs \
test/hs/Test/Ganeti/HTools/Graph.hs \
test/hs/Test/Ganeti/HTools/Instance.hs \
test/hs/Test/Ganeti/HTools/Loader.hs \
test/hs/Test/Ganeti/HTools/Node.hs \
test/hs/Test/Ganeti/HTools/PeerMap.hs \
test/hs/Test/Ganeti/HTools/Types.hs \
test/hs/Test/Ganeti/Hypervisor/Xen/XmParser.hs \
test/hs/Test/Ganeti/JSON.hs \
test/hs/Test/Ganeti/Jobs.hs \
test/hs/Test/Ganeti/JQScheduler.hs \
test/hs/Test/Ganeti/JQueue.hs \
test/hs/Test/Ganeti/JQueue/Objects.hs \
test/hs/Test/Ganeti/Kvmd.hs \
test/hs/Test/Ganeti/Luxi.hs \
test/hs/Test/Ganeti/Locking/Allocation.hs \
test/hs/Test/Ganeti/Locking/Locks.hs \
test/hs/Test/Ganeti/Locking/Waiting.hs \
test/hs/Test/Ganeti/Network.hs \
test/hs/Test/Ganeti/PartialParams.hs \
test/hs/Test/Ganeti/Objects.hs \
test/hs/Test/Ganeti/Objects/BitArray.hs \
test/hs/Test/Ganeti/OpCodes.hs \
test/hs/Test/Ganeti/Query/Aliases.hs \
test/hs/Test/Ganeti/Query/Filter.hs \
test/hs/Test/Ganeti/Query/Instance.hs \
test/hs/Test/Ganeti/Query/Language.hs \
test/hs/Test/Ganeti/Query/Network.hs \
test/hs/Test/Ganeti/Query/Query.hs \
test/hs/Test/Ganeti/Rpc.hs \
test/hs/Test/Ganeti/Runtime.hs \
test/hs/Test/Ganeti/SlotMap.hs \
test/hs/Test/Ganeti/Ssconf.hs \
test/hs/Test/Ganeti/Storage/Diskstats/Parser.hs \
test/hs/Test/Ganeti/Storage/Drbd/Parser.hs \
test/hs/Test/Ganeti/Storage/Drbd/Types.hs \
test/hs/Test/Ganeti/Storage/Lvm/LVParser.hs \
test/hs/Test/Ganeti/THH.hs \
test/hs/Test/Ganeti/THH/Types.hs \
test/hs/Test/Ganeti/TestCommon.hs \
test/hs/Test/Ganeti/TestHTools.hs \
test/hs/Test/Ganeti/TestHelper.hs \
test/hs/Test/Ganeti/Types.hs \
test/hs/Test/Ganeti/Utils.hs \
test/hs/Test/Ganeti/Utils/MultiMap.hs \
test/hs/Test/Ganeti/Utils/Statistics.hs \
test/hs/Test/Ganeti/WConfd/TempRes.hs
HS_LIBTEST_SRCS = $(HS_LIB_SRCS) $(HS_TEST_SRCS)
HS_BUILT_SRCS = \
test/hs/Test/Ganeti/TestImports.hs \
src/AutoConf.hs \
src/Ganeti/Hs2Py/ListConstants.hs \
src/Ganeti/Curl/Internal.hs \
src/Ganeti/Version.hs
HS_BUILT_SRCS_IN = \
$(patsubst %,%.in,$(filter-out src/Ganeti/Curl/Internal.hs,$(HS_BUILT_SRCS))) \
src/Ganeti/Curl/Internal.hsc \
lib/_constants.py.in \
lib/opcodes.py.in_after \
lib/opcodes.py.in_before
HS_LIBTESTBUILT_SRCS = $(HS_LIBTEST_SRCS) $(HS_BUILT_SRCS)
$(RUN_IN_TEMPDIR): | stamp-directories
doc/html/index.html: ENABLE_MANPAGES =
doc/man-html/index.html: ENABLE_MANPAGES = 1
doc/man-html/index.html: doc/manpages-enabled.rst $(mandocrst)
if HAS_SPHINX_PRE13
SPHINX_HTML_THEME=default
else
SPHINX_HTML_THEME=classic
endif
# Note: here we use an order-only prerequisite, as the contents of
# _constants.py do not actually influence the html build output: it
# has to exist in order for the sphinx module to be loaded
# successfully, but we certainly don't want the docs to be rebuilt if
# it changes
doc/html/index.html doc/man-html/index.html: $(docinput) doc/conf.py \
configure.ac $(RUN_IN_TEMPDIR) lib/build/sphinx_ext.py \
lib/build/shell_example_lexer.py lib/ht.py \
doc/css/style.css lib/rapi/connector.py lib/rapi/rlib2.py \
autotools/sphinx-wrapper | $(built_python_sources)
@test -n "$(SPHINX)" || \
{ echo 'sphinx-build' not found during configure; exit 1; }
if !MANPAGES_IN_DOC
if test -n '$(ENABLE_MANPAGES)'; then \
echo 'Man pages in documentation were disabled at configure time' >&2; \
exit 1; \
fi
endif
## Sphinx provides little control over what content should be included. Some
## mechanisms exist, but they all have drawbacks or actual issues. Since we
## build two different versions of the documentation--once without man pages and
## once, if enabled, with them--some control is necessary. sphinx-wrapper provides
## us with this, but requires running in a temporary directory. It moves the
## correct files into place depending on environment variables.
dir=$(dir $@) && \
@mkdir_p@ $$dir && \
PYTHONPATH=. ENABLE_MANPAGES=$(ENABLE_MANPAGES) COPY_DOC=1 \
HTML_THEME=$(SPHINX_HTML_THEME) \
$(RUN_IN_TEMPDIR) autotools/sphinx-wrapper $(SPHINX) -q -W -b html \
-d . \
-D version="$(VERSION_MAJOR).$(VERSION_MINOR)" \
-D release="$(PACKAGE_VERSION)" \
-D graphviz_dot="$(DOT)" \
doc $(CURDIR)/$$dir && \
rm -f $$dir/.buildinfo $$dir/objects.inv
touch $@
doc/html: doc/html/index.html
doc/man-html: doc/man-html/index.html
doc/install-quick.rst: INSTALL
doc/news.rst: NEWS
doc/upgrade.rst: UPGRADE
doc/install-quick.rst doc/news.rst doc/upgrade.rst:
set -e; \
{ echo '.. This file is automatically updated at build time from $<.'; \
echo '.. Do not edit.'; \
echo; \
cat $<; \
} > $@
doc/manpages-enabled.rst: Makefile | $(built_base_sources)
{ echo '.. This file is automatically generated, do not edit!'; \
echo ''; \
echo 'Man pages'; \
echo '========='; \
echo; \
echo '.. toctree::'; \
echo ' :maxdepth: 1'; \
echo; \
for i in $(notdir $(mandocrst)); do \
echo " $$i"; \
done | LC_ALL=C sort; \
} > $@
doc/man-%.rst: man/%.gen Makefile $(REPLACE_VARS_SED) | $(built_base_sources)
if MANPAGES_IN_DOC
{ echo '.. This file is automatically updated at build time from $<.'; \
echo '.. Do not edit.'; \
echo; \
echo "$*"; \
echo '=========================================='; \
tail -n +3 $< | sed -f $(REPLACE_VARS_SED); \
} > $@
else
echo 'Man pages in documentation were disabled at configure time' >&2; \
exit 1;
endif
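# Expand the user/group specification templates: substitute configure-time
# variables, sort the entries, drop duplicates and filter out the root user.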
doc/users/%: doc/users/%.in Makefile $(REPLACE_VARS_SED)
cat $< | sed -f $(REPLACE_VARS_SED) | LC_ALL=C sort | uniq | (grep -v '^root' || true) > $@
userspecs = \
doc/users/users \
doc/users/groups \
doc/users/groupmemberships
# Things to build but not to install (add it to EXTRA_DIST if it should be
# distributed)
noinst_DATA = \
$(BUILT_EXAMPLES) \
doc/examples/bash_completion \
doc/examples/bash_completion-debug \
$(userspecs) \
$(manhtml)
if HAS_SPHINX
if MANPAGES_IN_DOC
noinst_DATA += doc/man-html
else
noinst_DATA += doc/html
endif
endif
gnt_scripts = \
scripts/gnt-backup \
scripts/gnt-cluster \
scripts/gnt-debug \
scripts/gnt-group \
scripts/gnt-instance \
scripts/gnt-job \
scripts/gnt-network \
scripts/gnt-node \
scripts/gnt-os \
scripts/gnt-storage \
scripts/gnt-filter
gnt_scripts_basenames = \
$(patsubst scripts/%,%,$(patsubst daemons/%,%,$(gnt_scripts) $(gnt_python_sbin_SCRIPTS)))
gnt_python_sbin_SCRIPTS = \
$(PYTHON_BOOTSTRAP_SBIN)
gntpython_SCRIPTS = $(gnt_scripts)
PYTHON_BOOTSTRAP_SBIN = \
daemons/ganeti-noded \
daemons/ganeti-rapi \
daemons/ganeti-watcher
PYTHON_BOOTSTRAP = \
tools/burnin \
tools/ensure-dirs \
tools/node-cleanup \
tools/node-daemon-setup \
tools/prepare-node-join \
tools/ssh-update \
tools/ssl-update
qa_scripts = \
qa/__init__.py \
qa/ganeti-qa.py \
qa/qa_cluster.py \
qa/qa_config.py \
qa/qa_daemon.py \
qa/qa_env.py \
qa/qa_error.py \
qa/qa_filters.py \
qa/qa_group.py \
qa/qa_instance.py \
qa/qa_instance_utils.py \
qa/qa_iptables.py \
qa/qa_job.py \
qa/qa_job_utils.py \
qa/qa_logging.py \
qa/qa_monitoring.py \
qa/qa_network.py \
qa/qa_node.py \
qa/qa_os.py \
qa/qa_performance.py \
qa/qa_rapi.py \
qa/qa_tags.py \
qa/qa_utils.py
bin_SCRIPTS = $(HS_BIN_PROGS)
install-exec-hook:
@mkdir_p@ $(DESTDIR)$(iallocatorsdir)
# FIXME: this is hardcoded logic, instead of auto-resolving
$(LN_S) -f ../../../bin/htools \
$(DESTDIR)$(iallocatorsdir)/hail
for role in $(HS_BIN_ROLES); do \
$(LN_S) -f htools $(DESTDIR)$(bindir)/$$role ; \
done
HS_SRCS = $(HS_LIBTESTBUILT_SRCS)
HS_MAKEFILE_GHC_SRCS = $(HS_SRC_PROGS:%=%.hs)
if WANT_HSTESTS
HS_MAKEFILE_GHC_SRCS += $(HS_TEST_PROGS:%=%.hs)
endif
Makefile.ghc: $(HS_MAKEFILE_GHC_SRCS) Makefile $(HASKELL_PACKAGE_VERSIONS_FILE) \
| $(built_base_sources) $(HS_BUILT_SRCS)
$(GHC) -M -dep-makefile $@ $(DEP_SUFFIXES) $(HFLAGS) $(HFLAGS_DYNAMIC) \
-itest/hs \
$(HEXTRA_COMBINED) $(HS_MAKEFILE_GHC_SRCS)
# Since ghc -M does not generate dependency lines for object files, dependencies
# from a target executable's seed object (e.g. src/hluxid.o) to the objects that
# will finally be linked into the target (e.g. src/Ganeti/Daemon.o) are
# missing in Makefile.ghc.
# see: https://www.haskell.org/ghc/docs/7.6.2/html/users_guide/separate-compilation.html#makefile-dependencies
# The following substitutions add, for each object listed in Makefile.ghc, a
# dependency on the object file corresponding to every interface file that is
# already listed as a dependency.
# e.g. src/hluxid.o : src/Ganeti/Daemon.hi
# => src/hluxid.o : src/Ganeti/Daemon.hi src/Ganeti/Daemon.o
sed -i -r -e 's/([^ ]+)\.hi$$/\1.hi \1.o/' -e 's/([^ ]+)_hi$$/\1_hi \1_o/' $@
@include_makefile_ghc@
# Contains the package-id flags for the current build: "-package-id" followed
# by the name and hash of the package, one for each dependency.
# Obtained from the setup-config using the Cabal API
# (CabalDependenciesMacros.hs) after `cabal configure`.
# This file is created along with HASKELL_PACKAGE_VERSIONS_FILE; if you want
# to depend on it in a rule, depend on HASKELL_PACKAGE_VERSIONS_FILE instead.
HASKELL_PACKAGE_IDS_FILE = ganeti.depsflags
# Defines the MIN_VERSION_* macros for all Haskell packages used in this
# compilation.
# The versions are determined using `cabal configure`, which takes them from
# the ghc-pkg database.
# At the moment, we don't support cabal sandboxes, so we use cabal configure
# with the --user flag.
# Note: `cabal configure` and CabalDependenciesMacros.hs perform no
# downloading (only `cabal install` can do that).
HASKELL_PACKAGE_VERSIONS_FILE = cabal_macros.h
$(HASKELL_PACKAGE_VERSIONS_FILE): Makefile ganeti.cabal \
cabal/CabalDependenciesMacros.hs
touch empty-cabal-config
$(CABAL) --config-file=empty-cabal-config configure --user \
-f`test $(HTEST) == yes && echo "htest" || echo "-htest"` \
-f`test $(ENABLE_MOND) == True && echo "mond" || echo "-mond"` \
-f`test $(ENABLE_METADATA) == True && echo "metad" || echo "-metad"`
runhaskell $(abs_top_srcdir)/cabal/CabalDependenciesMacros.hs \
ganeti.cabal \
$(HASKELL_PACKAGE_IDS_FILE) \
$(HASKELL_PACKAGE_VERSIONS_FILE)
# Like the %.o rule, but allows access to the test/hs directory.
# This uses HFLAGS instead of HTEST_FLAGS because it's only for generating
# object files (.o for GHC <= 7.6, .o/.so for newer GHCs) that are loaded
# in GHCI when evaluating TH. The actual test-with-coverage .hpc_o files
# are created in the `%.$(HTEST_SUFFIX)_o` rule.
test/hs/%.o: $(HASKELL_PACKAGE_VERSIONS_FILE)
@echo '[GHC|test]: $@ <- test/hs/$^'
@$(GHC) -c $(HFLAGS) -itest/hs $(HFLAGS_DYNAMIC) \
$(HEXTRA_COMBINED) $(@:%.o=%.hs)
%.o: $(HASKELL_PACKAGE_VERSIONS_FILE)
@echo '[GHC]: $@ <- $^'
@$(GHC) -c $(HFLAGS) $(HFLAGS_DYNAMIC) \
$(HEXTRA_COMBINED) $(@:%.o=%.hs)
# For TH+profiling we need to compile twice: once without profiling,
# and then once with profiling. See
# http://www.haskell.org/ghc/docs/7.0.4/html/users_guide/template-haskell.html#id636646
if HPROFILE
%.$(HPROF_SUFFIX)_o: %.o
@echo '[GHC|prof]: $@ <- $^'
@$(GHC) -c $(HFLAGS) \
$(HPROFFLAGS) \
$(HEXTRA_COMBINED) \
$(@:%.$(HPROF_SUFFIX)_o=%.hs)
endif
# We depend on the non-test .o file here because we need the corresponding .so
# file for GHC > 7.6 ghci dynamic loading for TH, and creating the .o file
# will create the .so file since we use -dynamic-too (using the `test/hs/%.o`
# rule).
%.$(HTEST_SUFFIX)_o: %.o
@echo '[GHC|test]: $@ <- $^'
@$(GHC) -c $(HTEST_FLAGS) \
$(HEXTRA_COMBINED) $(@:%.$(HTEST_SUFFIX)_o=%.hs)
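# Interface (.hi) files are produced as a side effect of building the
# corresponding object files, so these rules only record that relationship.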
%.hi: %.o ;
%.$(HTEST_SUFFIX)_hi: %.$(HTEST_SUFFIX)_o ;
%.$(HPROF_SUFFIX)_hi: %.$(HPROF_SUFFIX)_o ;
if HPROFILE
$(HS_SRC_PROGS): %: %.$(HPROF_SUFFIX)_o | stamp-directories
@echo '[GHC-link]: $@'
$(GHC) $(HFLAGS) $(HPROFFLAGS) \
$(HEXTRA_COMBINED) --make $(@:%=%.hs)
else
$(HS_SRC_PROGS): %: %.o | stamp-directories
@echo '[GHC-link]: $@'
$(GHC) $(HFLAGS) $(HFLAGS_DYNAMIC) \
$(HEXTRA_COMBINED) --make $(@:%=%.hs)
endif
@rm -f $(notdir $@).tix
@touch "$@"
$(HS_TEST_PROGS): %: %.$(HTEST_SUFFIX)_o \
| stamp-directories $(built_python_sources)
@if [ "$(HS_NODEV)" ]; then \
echo "Error: cannot run unittests without the development" \
" libraries (see devnotes.rst)" 1>&2; \
exit 1; \
fi
@echo '[GHC-link|test]: $@'
$(GHC) $(HTEST_FLAGS) \
$(HEXTRA_COMBINED) --make $(@:%=%.hs)
@rm -f $(notdir $@).tix
@touch "$@"
dist_sbin_SCRIPTS = \
tools/ganeti-listrunner
nodist_sbin_SCRIPTS = \
daemons/ganeti-cleaner \
src/ganeti-kvmd \
src/ganeti-luxid \
src/ganeti-confd \
src/ganeti-wconfd
src/ganeti-luxid: src/hluxid
cp -f $< $@
# strip path prefixes off the sbin scripts
all_sbin_scripts = \
$(patsubst tools/%,%,$(patsubst daemons/%,%,$(patsubst scripts/%,%,\
$(patsubst src/%,%,$(dist_sbin_SCRIPTS) $(nodist_sbin_SCRIPTS)))))
src/ganeti-confd: src/hconfd
cp -f $< $@
if ENABLE_MOND
nodist_sbin_SCRIPTS += src/ganeti-mond
endif
if ENABLE_METADATA
nodist_sbin_SCRIPTS += src/ganeti-metad
endif
python_scripts = \
tools/cfgshell \
tools/cfgupgrade \
tools/cfgupgrade12 \
tools/cluster-merge \
tools/confd-client \
tools/fmtjson \
tools/lvmstrap \
tools/move-instance \
tools/ovfconverter \
tools/post-upgrade \
tools/sanitize-config \
tools/query-config
python_scripts_shebang = \
$(patsubst tools/%,tools/shebang/%, $(python_scripts))
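# Rewrite each Python tool's shebang to the configured $(PYTHON): keep the
# first line of the source script (with /usr/bin/python replaced), add a
# generated-file notice, then append the rest of the script unchanged.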
tools/shebang/%: tools/%
mkdir -p tools/shebang
head -1 $< | sed 's|#!/usr/bin/python|#!$(PYTHON)|' > $@
echo '# Generated file; do not edit.' >> $@
tail -n +2 $< >> $@
dist_tools_SCRIPTS = \
tools/kvm-console-wrapper \
tools/master-ip-setup \
tools/xen-console-wrapper
dist_tools_python_SCRIPTS = \
tools/burnin
nodist_tools_python_SCRIPTS = \
tools/node-cleanup \
$(python_scripts_shebang)
tools_python_basenames = \
$(patsubst shebang/%,%,\
$(patsubst tools/%,%,\
$(dist_tools_python_SCRIPTS) $(nodist_tools_python_SCRIPTS)))
nodist_tools_SCRIPTS = \
tools/users-setup \
tools/vcluster-setup
tools_basenames = $(patsubst tools/%,%,$(nodist_tools_SCRIPTS) $(dist_tools_SCRIPTS))
pkglib_python_scripts = \
daemons/import-export \
tools/check-cert-expired
nodist_pkglib_python_scripts = \
tools/ensure-dirs \
tools/node-daemon-setup \
tools/prepare-node-join \
tools/ssh-update \
tools/ssl-update
pkglib_python_basenames = \
$(patsubst daemons/%,%,$(patsubst tools/%,%,\
$(pkglib_python_scripts) $(nodist_pkglib_python_scripts)))
myexeclib_SCRIPTS = \
daemons/daemon-util \
tools/kvm-ifup \
tools/kvm-ifup-os \
tools/xen-ifup-os \
tools/vif-ganeti \
tools/vif-ganeti-metad \
tools/net-common \
$(HS_MYEXECLIB_PROGS)
# compute the basenames of the myexeclib_scripts
myexeclib_scripts_basenames = \
$(patsubst tools/%,%,$(patsubst daemons/%,%,$(patsubst src/%,%,$(myexeclib_SCRIPTS))))
EXTRA_DIST += \
NEWS \
UPGRADE \
epydoc.conf.in \
pylintrc \
pylintrc-test \
autotools/build-bash-completion \
autotools/build-rpc \
autotools/check-header \
autotools/check-imports \
autotools/check-man-dashes \
autotools/check-man-references \
autotools/check-man-warnings \
autotools/check-news \
autotools/check-python-code \
autotools/check-tar \
autotools/check-version \
autotools/docpp \
autotools/gen-py-coverage \
autotools/print-py-constants \
autotools/sphinx-wrapper \
autotools/testrunner \
autotools/wrong-hardcoded-paths \
cabal/cabal-from-modules.py \
$(RUN_IN_TEMPDIR) \
daemons/daemon-util.in \
daemons/ganeti-cleaner.in \
$(pkglib_python_scripts) \
devel/build_chroot \
devel/upload \
devel/webserver \
tools/kvm-ifup.in \
tools/ifup-os.in \
tools/vif-ganeti.in \
tools/vif-ganeti-metad.in \
tools/net-common.in \
tools/vcluster-setup.in \
$(python_scripts) \
$(docinput) \
doc/html \
$(BUILT_EXAMPLES:%=%.in) \
doc/examples/ganeti.default \
doc/examples/ganeti.default-debug \
doc/examples/hooks/ethers \
doc/examples/gnt-debug/README \
doc/examples/gnt-debug/delay0.json \
doc/examples/gnt-debug/delay50.json \
doc/examples/systemd/ganeti-master.target \
doc/examples/systemd/ganeti-node.target \
doc/examples/systemd/ganeti.service \
doc/examples/systemd/ganeti.target \
doc/users/groupmemberships.in \
doc/users/groups.in \
doc/users/users.in \
ganeti.cabal \
cabal/ganeti.template.cabal \
cabal/CabalDependenciesMacros.hs \
$(dist_TESTS) \
$(TEST_FILES) \
$(python_test_support) \
$(python_test_utils) \
man/footer.rst \
$(manrst) \
$(maninput) \
qa/qa-sample.json \
$(qa_scripts) \
$(HS_LIBTEST_SRCS) $(HS_BUILT_SRCS_IN) \
$(HS_PROG_SRCS) \
src/lint-hints.hs \
test/hs/cli-tests-defs.sh \
test/hs/offline-test.sh \
.ghci
man_MANS = \
man/ganeti-cleaner.8 \
man/ganeti-confd.8 \
man/ganeti-luxid.8 \
man/ganeti-listrunner.8 \
man/ganeti-kvmd.8 \
man/ganeti-mond.8 \
man/ganeti-noded.8 \
man/ganeti-os-interface.7 \
man/ganeti-extstorage-interface.7 \
man/ganeti-rapi.8 \
man/ganeti-watcher.8 \
man/ganeti-wconfd.8 \
man/ganeti.7 \
man/gnt-backup.8 \
man/gnt-cluster.8 \
man/gnt-debug.8 \
man/gnt-group.8 \
man/gnt-network.8 \
man/gnt-instance.8 \
man/gnt-job.8 \
man/gnt-node.8 \
man/gnt-os.8 \
man/gnt-storage.8 \
man/gnt-filter.8 \
man/hail.1 \
man/harep.1 \
man/hbal.1 \
man/hcheck.1 \
man/hinfo.1 \
man/hscan.1 \
man/hspace.1 \
man/hsqueeze.1 \
man/hroller.1 \
man/htools.1 \
man/mon-collector.7
# Remove extensions from all filenames in man_MANS
mannoext = $(patsubst %.1,%,$(patsubst %.7,%,$(patsubst %.8,%,$(man_MANS))))
manrst = $(patsubst %,%.rst,$(mannoext))
manhtml = $(patsubst %.rst,%.html,$(manrst))
mangen = $(patsubst %.rst,%.gen,$(manrst))
maninput = \
$(patsubst %.1,%.1.in,$(patsubst %.7,%.7.in,$(patsubst %.8,%.8.in,$(man_MANS)))) \
$(patsubst %.html,%.html.in,$(manhtml)) \
$(mangen)
manfullpath = $(patsubst man/%.1,man1/%.1,\
$(patsubst man/%.7,man7/%.7,\
$(patsubst man/%.8,man8/%.8,$(man_MANS))))
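# Illustrative example: man/gnt-node.8 yields mannoext man/gnt-node,
# manrst man/gnt-node.rst, manhtml man/gnt-node.html, maninput entries
# man/gnt-node.8.in, man/gnt-node.html.in and man/gnt-node.gen, and
# manfullpath man8/gnt-node.8.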
TEST_FILES = \
test/autotools/autotools-check-news.test \
test/data/htools/clean-nonzero-score.data \
test/data/htools/common-suffix.data \
test/data/htools/empty-cluster.data \
test/data/htools/hail-alloc-dedicated-1.json \
test/data/htools/hail-alloc-drbd.json \
test/data/htools/hail-alloc-invalid-network.json \
test/data/htools/hail-alloc-invalid-twodisks.json \
test/data/htools/hail-alloc-restricted-network.json \
test/data/htools/hail-alloc-nlocation.json \
test/data/htools/hail-alloc-plain-tags.json \
test/data/htools/hail-alloc-spindles.json \
test/data/htools/hail-alloc-twodisks.json \
test/data/htools/hail-change-group.json \
test/data/htools/hail-invalid-reloc.json \
test/data/htools/hail-node-evac.json \
test/data/htools/hail-reloc-drbd.json \
test/data/htools/hail-reloc-drbd-crowded.json \
test/data/htools/hbal-cpu-speed.data \
test/data/htools/hbal-dyn.data \
test/data/htools/hbal-evac.data \
test/data/htools/hbal-excl-tags.data \
test/data/htools/hbal-forth.data \
test/data/htools/hbal-location-1.data \
test/data/htools/hbal-location-2.data \
test/data/htools/hbal-migration-1.data \
test/data/htools/hbal-migration-2.data \
test/data/htools/hbal-migration-3.data \
test/data/htools/hail-multialloc-dedicated.json \
test/data/htools/hbal-soft-errors.data \
test/data/htools/hbal-split-insts.data \
test/data/htools/hspace-groups-one.data \
test/data/htools/hspace-groups-two.data \
test/data/htools/hspace-tiered-dualspec-exclusive.data \
test/data/htools/hspace-tiered-dualspec.data \
test/data/htools/hspace-tiered-exclusive.data \
test/data/htools/hspace-tiered-ipolicy.data \
test/data/htools/hspace-tiered-mixed.data \
test/data/htools/hspace-tiered-resourcetypes.data \
test/data/htools/hspace-tiered-vcpu.data \
test/data/htools/hspace-tiered.data \
test/data/htools/invalid-node.data \
test/data/htools/missing-resources.data \
test/data/htools/multiple-master.data \
test/data/htools/multiple-tags.data \
test/data/htools/n1-failure.data \
test/data/htools/rapi/groups.json \
test/data/htools/rapi/info.json \
test/data/htools/rapi/instances.json \
test/data/htools/rapi/nodes.json \
test/data/htools/hroller-full.data \
test/data/htools/hroller-nodegroups.data \
test/data/htools/hroller-nonredundant.data \
test/data/htools/hroller-online.data \
test/data/htools/hsqueeze-mixed-instances.data \
test/data/htools/hsqueeze-overutilized.data \
test/data/htools/hsqueeze-underutilized.data \
test/data/htools/shared-n1-failure.data \
test/data/htools/unique-reboot-order.data \
test/data/mond-data.txt \
test/hs/shelltests/htools-balancing.test \
test/hs/shelltests/htools-basic.test \
test/hs/shelltests/htools-dynutil.test \
test/hs/shelltests/htools-excl.test \
test/hs/shelltests/htools-hail.test \
test/hs/shelltests/htools-hbal-evac.test \
test/hs/shelltests/htools-hbal.test \
test/hs/shelltests/htools-hcheck.test \
test/hs/shelltests/htools-hroller.test \
test/hs/shelltests/htools-hspace.test \
test/hs/shelltests/htools-hsqueeze.test \
test/hs/shelltests/htools-invalid.test \
test/hs/shelltests/htools-multi-group.test \
test/hs/shelltests/htools-no-backend.test \
test/hs/shelltests/htools-rapi.test \
test/hs/shelltests/htools-single-group.test \
test/hs/shelltests/htools-text-backend.test \
test/hs/shelltests/htools-mon-collector.test \
test/data/bdev-drbd-8.0.txt \
test/data/bdev-drbd-8.3.txt \
test/data/bdev-drbd-8.4.txt \
test/data/bdev-drbd-8.4-no-disk-params.txt \
test/data/bdev-drbd-disk.txt \
test/data/bdev-drbd-net-ip4.txt \
test/data/bdev-drbd-net-ip6.txt \
test/data/bdev-rbd/json_output_empty.txt \
test/data/bdev-rbd/json_output_extra_matches.txt \
test/data/bdev-rbd/json_output_no_matches.txt \
test/data/bdev-rbd/json_output_ok.txt \
test/data/bdev-rbd/plain_output_new_extra_matches.txt \
test/data/bdev-rbd/plain_output_new_no_matches.txt \
test/data/bdev-rbd/plain_output_new_ok.txt \
test/data/bdev-rbd/plain_output_old_empty.txt \
test/data/bdev-rbd/plain_output_old_extra_matches.txt \
test/data/bdev-rbd/plain_output_old_no_matches.txt \
test/data/bdev-rbd/plain_output_old_ok.txt \
test/data/bdev-rbd/output_invalid.txt \
test/data/cert1.pem \
test/data/cert2.pem \
test/data/cgroup_root/memory/lxc/instance1/memory.limit_in_bytes \
test/data/cgroup_root/cpuset/some_group/lxc/instance1/cpuset.cpus \
test/data/cgroup_root/devices/some_group/lxc/instance1/devices.list \
test/data/cluster_config_2.7.json \
test/data/cluster_config_2.8.json \
test/data/cluster_config_2.9.json \
test/data/cluster_config_2.10.json \
test/data/cluster_config_2.11.json \
test/data/cluster_config_2.12.json \
test/data/cluster_config_2.13.json \
test/data/cluster_config_2.14.json \
test/data/instance-minor-pairing.txt \
test/data/instance-disks.txt \
test/data/ip-addr-show-dummy0.txt \
test/data/ip-addr-show-lo-ipv4.txt \
test/data/ip-addr-show-lo-ipv6.txt \
test/data/ip-addr-show-lo-oneline-ipv4.txt \
test/data/ip-addr-show-lo-oneline-ipv6.txt \
test/data/ip-addr-show-lo-oneline.txt \
test/data/ip-addr-show-lo.txt \
test/data/kvm_0.12.5_help.txt \
test/data/kvm_0.15.90_help.txt \
test/data/kvm_0.9.1_help.txt \
test/data/kvm_0.9.1_help_boot_test.txt \
test/data/kvm_1.0_help.txt \
test/data/kvm_1.1.2_help.txt \
test/data/kvm_runtime.json \
test/data/lvs_lv.txt \
test/data/NEWS_OK.txt \
test/data/NEWS_previous_unreleased.txt \
test/data/ovfdata/compr_disk.vmdk.gz \
test/data/ovfdata/config.ini \
test/data/ovfdata/corrupted_resources.ovf \
test/data/ovfdata/empty.ini \
test/data/ovfdata/empty.ovf \
test/data/ovfdata/ganeti.mf \
test/data/ovfdata/ganeti.ovf \
test/data/ovfdata/gzip_disk.ovf \
test/data/ovfdata/new_disk.vmdk \
test/data/ovfdata/no_disk.ini \
test/data/ovfdata/no_disk_in_ref.ovf \
test/data/ovfdata/no_os.ini \
test/data/ovfdata/no_ovf.ova \
test/data/ovfdata/other/rawdisk.raw \
test/data/ovfdata/ova.ova \
test/data/ovfdata/rawdisk.raw \
test/data/ovfdata/second_disk.vmdk \
test/data/ovfdata/unsafe_path.ini \
test/data/ovfdata/virtualbox.ovf \
test/data/ovfdata/wrong_config.ini \
test/data/ovfdata/wrong_extension.ovd \
test/data/ovfdata/wrong_manifest.mf \
test/data/ovfdata/wrong_manifest.ovf \
test/data/ovfdata/wrong_ova.ova \
test/data/ovfdata/wrong_xml.ovf \
test/data/proc_cgroup.txt \
test/data/proc_diskstats.txt \
test/data/proc_drbd8.txt \
test/data/proc_drbd80-emptyline.txt \
test/data/proc_drbd80-emptyversion.txt \
test/data/proc_drbd83.txt \
test/data/proc_drbd83_sync.txt \
test/data/proc_drbd83_sync_want.txt \
test/data/proc_drbd83_sync_krnl2.6.39.txt \
test/data/proc_drbd84.txt \
test/data/proc_drbd84_emptyfirst.txt \
test/data/proc_drbd84_sync.txt \
test/data/proc_meminfo.txt \
test/data/proc_cpuinfo.txt \
test/data/qa-minimal-nodes-instances-only.json \
test/data/sys_drbd_usermode_helper.txt \
test/data/vgreduce-removemissing-2.02.02.txt \
test/data/vgreduce-removemissing-2.02.66-fail.txt \
test/data/vgreduce-removemissing-2.02.66-ok.txt \
test/data/vgs-missing-pvs-2.02.02.txt \
test/data/vgs-missing-pvs-2.02.66.txt \
test/data/xen-xl-list-4.4-crashed-instances.txt \
test/data/xen-xm-info-4.0.1.txt \
test/data/xen-xm-list-4.0.1-dom0-only.txt \
test/data/xen-xm-list-4.0.1-four-instances.txt \
test/data/xen-xm-list-long-4.0.1.txt \
test/data/xen-xm-uptime-4.0.1.txt \
test/py/ganeti-cli.test \
test/py/gnt-cli.test \
test/py/import-export_unittest-helper
python_tests = \
doc/examples/rapi_testutils.py \
test/py/cmdlib/backup_unittest.py \
test/py/cmdlib/cluster_unittest.py \
test/py/cmdlib/cmdlib_unittest.py \
test/py/cmdlib/group_unittest.py \
test/py/cmdlib/instance_unittest.py \
test/py/cmdlib/instance_migration_unittest.py \
test/py/cmdlib/instance_query_unittest.py \
test/py/cmdlib/instance_storage_unittest.py \
test/py/cmdlib/node_unittest.py \
test/py/cmdlib/test_unittest.py \
test/py/cfgupgrade_unittest.py \
test/py/docs_unittest.py \
test/py/ganeti.asyncnotifier_unittest.py \
test/py/ganeti.backend_unittest-runasroot.py \
test/py/ganeti.backend_unittest.py \
test/py/ganeti.bootstrap_unittest.py \
test/py/ganeti.cli_unittest.py \
test/py/ganeti.cli_opts_unittest.py \
test/py/ganeti.client.gnt_cluster_unittest.py \
test/py/ganeti.client.gnt_instance_unittest.py \
test/py/ganeti.client.gnt_job_unittest.py \
test/py/ganeti.compat_unittest.py \
test/py/ganeti.confd.client_unittest.py \
test/py/ganeti.config_unittest.py \
test/py/ganeti.constants_unittest.py \
test/py/ganeti.daemon_unittest.py \
test/py/ganeti.errors_unittest.py \
test/py/ganeti.hooks_unittest.py \
test/py/ganeti.ht_unittest.py \
test/py/ganeti.http_unittest.py \
test/py/ganeti.hypervisor.hv_chroot_unittest.py \
test/py/ganeti.hypervisor.hv_fake_unittest.py \
test/py/ganeti.hypervisor.hv_kvm_unittest.py \
test/py/ganeti.hypervisor.hv_lxc_unittest.py \
test/py/ganeti.hypervisor.hv_xen_unittest.py \
test/py/ganeti.hypervisor_unittest.py \
test/py/ganeti.impexpd_unittest.py \
test/py/ganeti.jqueue_unittest.py \
test/py/ganeti.jstore_unittest.py \
test/py/ganeti.locking_unittest.py \
test/py/ganeti.luxi_unittest.py \
test/py/ganeti.masterd.iallocator_unittest.py \
test/py/ganeti.masterd.instance_unittest.py \
test/py/ganeti.mcpu_unittest.py \
test/py/ganeti.netutils_unittest.py \
test/py/ganeti.objects_unittest.py \
test/py/ganeti.opcodes_unittest.py \
test/py/ganeti.outils_unittest.py \
test/py/ganeti.ovf_unittest.py \
test/py/ganeti.qlang_unittest.py \
test/py/ganeti.query_unittest.py \
test/py/ganeti.rapi.baserlib_unittest.py \
test/py/ganeti.rapi.client_unittest.py \
test/py/ganeti.rapi.resources_unittest.py \
test/py/ganeti.rapi.rlib2_unittest.py \
test/py/ganeti.rapi.testutils_unittest.py \
test/py/ganeti.rpc_unittest.py \
test/py/ganeti.rpc.client_unittest.py \
test/py/ganeti.runtime_unittest.py \
test/py/ganeti.serializer_unittest.py \
test/py/ganeti.server.rapi_unittest.py \
test/py/ganeti.ssconf_unittest.py \
test/py/ganeti.ssh_unittest.py \
test/py/ganeti.storage.bdev_unittest.py \
test/py/ganeti.storage.container_unittest.py \
test/py/ganeti.storage.drbd_unittest.py \
test/py/ganeti.storage.filestorage_unittest.py \
test/py/ganeti.storage.gluster_unittest.py \
test/py/ganeti.tools.burnin_unittest.py \
test/py/ganeti.tools.ensure_dirs_unittest.py \
test/py/ganeti.tools.node_daemon_setup_unittest.py \
test/py/ganeti.tools.prepare_node_join_unittest.py \
test/py/ganeti.uidpool_unittest.py \
test/py/ganeti.utils.algo_unittest.py \
test/py/ganeti.utils.filelock_unittest.py \
test/py/ganeti.utils.hash_unittest.py \
test/py/ganeti.utils.io_unittest-runasroot.py \
test/py/ganeti.utils.io_unittest.py \
test/py/ganeti.utils.log_unittest.py \
test/py/ganeti.utils.lvm_unittest.py \
test/py/ganeti.utils.mlock_unittest.py \
test/py/ganeti.utils.nodesetup_unittest.py \
test/py/ganeti.utils.process_unittest.py \
test/py/ganeti.utils.retry_unittest.py \
test/py/ganeti.utils.security_unittest.py \
test/py/ganeti.utils.storage_unittest.py \
test/py/ganeti.utils.text_unittest.py \
test/py/ganeti.utils.version_unittest.py \
test/py/ganeti.utils.wrapper_unittest.py \
test/py/ganeti.utils.x509_unittest.py \
test/py/ganeti.utils.bitarrays_unittest.py \
test/py/ganeti.utils_unittest.py \
test/py/ganeti.vcluster_unittest.py \
test/py/ganeti.workerpool_unittest.py \
test/py/pycurl_reset_unittest.py \
test/py/qa.qa_config_unittest.py \
test/py/tempfile_fork_unittest.py
python_test_support = \
test/py/__init__.py \
test/py/lockperf.py \
test/py/testutils_ssh.py \
test/py/mocks.py \
test/py/testutils/__init__.py \
test/py/testutils/config_mock.py \
test/py/cmdlib/__init__.py \
test/py/cmdlib/testsupport/__init__.py \
test/py/cmdlib/testsupport/cmdlib_testcase.py \
test/py/cmdlib/testsupport/iallocator_mock.py \
test/py/cmdlib/testsupport/livelock_mock.py \
test/py/cmdlib/testsupport/netutils_mock.py \
test/py/cmdlib/testsupport/pathutils_mock.py \
test/py/cmdlib/testsupport/processor_mock.py \
test/py/cmdlib/testsupport/rpc_runner_mock.py \
test/py/cmdlib/testsupport/ssh_mock.py \
test/py/cmdlib/testsupport/utils_mock.py \
test/py/cmdlib/testsupport/util.py \
test/py/cmdlib/testsupport/wconfd_mock.py
haskell_tests = test/hs/htest
dist_TESTS = \
test/py/check-cert-expired_unittest.bash \
test/py/daemon-util_unittest.bash \
test/py/systemd_unittest.bash \
test/py/ganeti-cleaner_unittest.bash \
test/py/import-export_unittest.bash \
test/py/cli-test.bash \
test/py/bash_completion.bash
if PY_UNIT
dist_TESTS += $(python_tests)
endif
nodist_TESTS =
check_SCRIPTS =
if WANT_HSTESTS
nodist_TESTS += $(haskell_tests)
# test dependency
test/hs/offline-test.sh: test/hs/hpc-htools test/hs/hpc-mon-collector
dist_TESTS += test/hs/offline-test.sh
check_SCRIPTS += \
test/hs/hpc-htools \
test/hs/hpc-mon-collector \
$(HS_BUILT_TEST_HELPERS)
endif
TESTS = $(dist_TESTS) $(nodist_TESTS)
# Environment for all tests
PLAIN_TESTS_ENVIRONMENT = \
PYTHONPATH=.:./test/py \
TOP_SRCDIR=$(abs_top_srcdir) TOP_BUILDDIR=$(abs_top_builddir) \
PYTHON=$(PYTHON) FAKEROOT=$(FAKEROOT_PATH) \
$(RUN_IN_TEMPDIR)
# Environment for tests run by automake
TESTS_ENVIRONMENT = \
$(PLAIN_TESTS_ENVIRONMENT) $(abs_top_srcdir)/autotools/testrunner
all_python_code = \
$(dist_sbin_SCRIPTS) \
$(python_scripts) \
$(pkglib_python_scripts) \
$(nodist_pkglib_python_scripts) \
$(nodist_tools_python_scripts) \
$(pkgpython_PYTHON) \
$(client_PYTHON) \
$(cmdlib_PYTHON) \
$(cmdlib_cluster_PYTHON) \
$(config_PYTHON) \
$(hypervisor_PYTHON) \
$(hypervisor_hv_kvm_PYTHON) \
$(jqueue_PYTHON) \
$(storage_PYTHON) \
$(rapi_PYTHON) \
$(server_PYTHON) \
$(rpc_PYTHON) \
$(rpc_stub_PYTHON) \
$(pytools_PYTHON) \
$(http_PYTHON) \
$(confd_PYTHON) \
$(masterd_PYTHON) \
$(impexpd_PYTHON) \
$(utils_PYTHON) \
$(watcher_PYTHON) \
$(noinst_PYTHON) \
$(qa_scripts)
if PY_UNIT
all_python_code += $(python_tests)
all_python_code += $(python_test_support)
all_python_code += $(python_test_utils)
endif
srclink_files = \
man/footer.rst \
test/py/check-cert-expired_unittest.bash \
test/py/daemon-util_unittest.bash \
test/py/systemd_unittest.bash \
test/py/ganeti-cleaner_unittest.bash \
test/py/import-export_unittest.bash \
test/py/cli-test.bash \
test/py/bash_completion.bash \
test/hs/offline-test.sh \
test/hs/cli-tests-defs.sh \
$(all_python_code) \
$(HS_LIBTEST_SRCS) $(HS_PROG_SRCS) \
$(docinput)
check_python_code = \
$(BUILD_BASH_COMPLETION) \
$(CHECK_IMPORTS) \
$(CHECK_HEADER) \
$(DOCPP) \
$(all_python_code)
lint_python_code = \
ganeti \
ganeti/http/server.py \
$(dist_sbin_SCRIPTS) \
$(python_scripts) \
$(pkglib_python_scripts) \
$(BUILD_BASH_COMPLETION) \
$(CHECK_IMPORTS) \
$(CHECK_HEADER) \
$(DOCPP) \
$(gnt_python_sbin_SCRIPTS) \
$(PYTHON_BOOTSTRAP)
standalone_python_modules = \
lib/rapi/client.py \
tools/ganeti-listrunner
pep8_python_code = \
ganeti \
ganeti/http/server.py \
$(dist_sbin_SCRIPTS) \
$(python_scripts) \
$(pkglib_python_scripts) \
$(BUILD_BASH_COMPLETION) \
$(CHECK_HEADER) \
$(DOCPP) \
$(PYTHON_BOOTSTRAP) \
$(gnt_python_sbin_SCRIPTS) \
qa \
$(python_test_support) \
$(python_test_utils)
test/py/daemon-util_unittest.bash: daemons/daemon-util
test/py/systemd_unittest.bash: daemons/daemon-util $(BUILT_EXAMPLES)
test/py/ganeti-cleaner_unittest.bash: daemons/ganeti-cleaner
test/py/bash_completion.bash: doc/examples/bash_completion-debug
tools/kvm-ifup: tools/kvm-ifup.in $(REPLACE_VARS_SED)
sed -f $(REPLACE_VARS_SED) < $< > $@
chmod +x $@
tools/kvm-ifup-os: tools/ifup-os.in $(REPLACE_VARS_SED)
sed -f $(REPLACE_VARS_SED) -e "s/ifup-os:/kvm-ifup-os:/" < $< > $@
chmod +x $@
tools/xen-ifup-os: tools/ifup-os.in $(REPLACE_VARS_SED)
sed -f $(REPLACE_VARS_SED) -e "s/ifup-os:/xen-ifup-os:/" < $< > $@
chmod +x $@
tools/vif-ganeti: tools/vif-ganeti.in $(REPLACE_VARS_SED)
sed -f $(REPLACE_VARS_SED) < $< > $@
chmod +x $@
tools/vif-ganeti-metad: tools/vif-ganeti-metad.in $(REPLACE_VARS_SED)
sed -f $(REPLACE_VARS_SED) < $< > $@
chmod +x $@
tools/net-common: tools/net-common.in $(REPLACE_VARS_SED)
sed -f $(REPLACE_VARS_SED) < $< > $@
chmod +x $@
tools/users-setup: Makefile $(userspecs)
set -e; \
{ echo '#!/bin/sh'; \
echo 'if [ "x$$1" != "x--yes-do-it" ];'; \
echo 'then echo "This will do the following changes"'; \
$(AWK) -- '{print "echo + Will add group ",$$1; count++}\
END {if (count == 0) {print "echo + No groups to add"}}' doc/users/groups; \
$(AWK) -- '{if (NF > 1) {print "echo + Will add user",$$1,"with primary group",$$2} \
else {print "echo + Will add user",$$1}; count++}\
END {if (count == 0) {print "echo + No users to add"}}' doc/users/users; \
$(AWK) -- '{print "echo + Will add user",$$1,"to group",$$2}' doc/users/groupmemberships; \
echo 'echo'; \
echo 'echo "OK? (y/n)"'; \
echo 'read confirm'; \
echo 'if [ "x$$confirm" != "xy" ]; then exit 0; fi'; \
echo 'fi'; \
$(AWK) -- '{print "groupadd --system",$$1}' doc/users/groups; \
$(AWK) -- '{if (NF > 1) {print "useradd --system --gid",$$2,$$1} else {print "useradd --system",$$1}}' doc/users/users; \
$(AWK) -- '{print "usermod --append --groups",$$2,$$1}' doc/users/groupmemberships; \
} > $@
chmod +x $@
tools/vcluster-setup: tools/vcluster-setup.in $(REPLACE_VARS_SED)
sed -f $(REPLACE_VARS_SED) < $< > $@
chmod +x $@
daemons/%:: daemons/%.in $(REPLACE_VARS_SED)
sed -f $(REPLACE_VARS_SED) < $< > $@
chmod +x $@
doc/examples/%:: doc/examples/%.in $(REPLACE_VARS_SED)
sed -f $(REPLACE_VARS_SED) < $< > $@
doc/examples/bash_completion: BC_ARGS = --compact
doc/examples/bash_completion-debug: BC_ARGS =
doc/examples/bash_completion doc/examples/bash_completion-debug: \
$(BUILD_BASH_COMPLETION) $(RUN_IN_TEMPDIR) \
lib/cli.py $(gnt_scripts) $(client_PYTHON) tools/burnin \
daemons/ganeti-cleaner \
$(GENERATED_FILES) $(HS_GENERATED_FILES)
PYTHONPATH=. $(RUN_IN_TEMPDIR) \
$(CURDIR)/$(BUILD_BASH_COMPLETION) $(BC_ARGS) > $@
man/%.gen: man/%.rst lib/query.py lib/build/sphinx_ext.py \
lib/build/shell_example_lexer.py \
| $(RUN_IN_TEMPDIR) $(built_python_sources)
@echo "Checking $< for hardcoded paths..."
@if grep -nEf autotools/wrong-hardcoded-paths $<; then \
echo "Man page $< has hardcoded paths (see above)!" 1>&2 ; \
exit 1; \
fi
set -e ; \
trap 'echo auto-removing $@; rm $@' EXIT; \
PYTHONPATH=. $(RUN_IN_TEMPDIR) $(CURDIR)/$(DOCPP) < $< > $@ ;\
$(CHECK_MAN_REFERENCES) $@; \
trap - EXIT
man/%.7.in man/%.8.in man/%.1.in: man/%.gen man/footer.rst
@test -n "$(PANDOC)" || \
{ echo 'pandoc' not found during configure; exit 1; }
set -o pipefail -e; \
trap 'echo auto-removing $@; rm $@' EXIT; \
$(PANDOC) -s -f rst -t man $< man/footer.rst | \
sed -e 's/\\@/@/g' > $@; \
if test -n "$(MAN_HAS_WARNINGS)"; then LC_ALL=en_US.UTF-8 $(CHECK_MAN_WARNINGS) $@; fi; \
$(CHECK_MAN_DASHES) $@; \
trap - EXIT
man/%.html.in: man/%.gen man/footer.rst
@test -n "$(PANDOC)" || \
{ echo 'pandoc' not found during configure; exit 1; }
set -o pipefail ; \
$(PANDOC) --toc -s -f rst -t html $< man/footer.rst | \
sed -e 's/\\@/@/g' > $@
man/%: man/%.in $(REPLACE_VARS_SED)
sed -f $(REPLACE_VARS_SED) < $< > $@
epydoc.conf: epydoc.conf.in $(REPLACE_VARS_SED)
sed -f $(REPLACE_VARS_SED) < $< > $@
vcs-version:
if test -d .git; then \
git describe | tr '"' - > $@; \
elif test ! -f $@ ; then \
echo "Cannot auto-generate $@ file"; exit 1; \
fi
.PHONY: clean-vcs-version
clean-vcs-version:
rm -f vcs-version
.PHONY: regen-vcs-version
regen-vcs-version:
@set -e; \
cd $(srcdir); \
if test -d .git; then \
T=`mktemp` ; trap 'rm -f $$T' EXIT; \
git describe > $$T; \
if ! cmp --quiet $$T vcs-version; then \
mv $$T vcs-version; \
fi; \
fi
src/Ganeti/Version.hs: src/Ganeti/Version.hs.in \
vcs-version $(built_base_sources)
set -e; \
VCSVER=`cat $(abs_top_srcdir)/vcs-version`; \
sed -e 's"%ver%"'"$$VCSVER"'"' < $< > $@
src/Ganeti/Hs2Py/ListConstants.hs: src/Ganeti/Hs2Py/ListConstants.hs.in \
src/Ganeti/Constants.hs \
| stamp-directories
@echo Generating $@
@set -e; \
## Extract constant names from 'Constants.hs' by taking the left-hand
## side of every line containing an equal sign (i.e., '='), prepending
## an apostrophe (i.e., "'") and appending a colon (i.e., ':').
##
## For example, the constant
## adminstDown = ...
## becomes
## 'adminstDown:
NAMES=$$(sed -e "/^--/ d" $(abs_top_srcdir)/src/Ganeti/Constants.hs |\
sed -n -e "/=/ s/\(.*\) =.*/ '\1:/g p"); \
m4 -DPY_CONSTANT_NAMES="$$NAMES" \
$(abs_top_srcdir)/src/Ganeti/Hs2Py/ListConstants.hs.in > $@
src/Ganeti/Curl/Internal.hs: src/Ganeti/Curl/Internal.hsc | stamp-directories
hsc2hs -o $@ $<
test/hs/Test/Ganeti/TestImports.hs: test/hs/Test/Ganeti/TestImports.hs.in \
$(built_base_sources)
set -e; \
{ cat $< ; \
echo ; \
for name in $(filter-out Ganeti.THH,$(subst /,.,$(patsubst %.hs,%,$(patsubst src/%,%,$(HS_LIB_SRCS))))) ; do \
echo "import $$name ()" ; \
done ; \
} > $@
lib/_constants.py: Makefile src/hs2py lib/_constants.py.in | stamp-directories
cat $(abs_top_srcdir)/lib/_constants.py.in > $@
src/hs2py --constants >> $@
lib/constants.py: lib/_constants.py
src/AutoConf.hs: Makefile src/AutoConf.hs.in $(PRINT_PY_CONSTANTS) \
| $(built_base_sources)
@echo "m4 ... >" $@
@m4 -DPACKAGE_VERSION="$(PACKAGE_VERSION)" \
-DVERSION_MAJOR="$(VERSION_MAJOR)" \
-DVERSION_MINOR="$(VERSION_MINOR)" \
-DVERSION_REVISION="$(VERSION_REVISION)" \
-DVERSION_SUFFIX="$(VERSION_SUFFIX)" \
-DVERSION_FULL="$(VERSION_FULL)" \
-DDIRVERSION="$(DIRVERSION)" \
-DLOCALSTATEDIR="$(localstatedir)" \
-DSYSCONFDIR="$(sysconfdir)" \
-DSSH_CONFIG_DIR="$(SSH_CONFIG_DIR)" \
-DSSH_LOGIN_USER="$(SSH_LOGIN_USER)" \
-DSSH_CONSOLE_USER="$(SSH_CONSOLE_USER)" \
-DEXPORT_DIR="$(EXPORT_DIR)" \
-DBACKUP_DIR="$(backup_dir)" \
-DOS_SEARCH_PATH="\"$(OS_SEARCH_PATH)\"" \
-DES_SEARCH_PATH="\"$(ES_SEARCH_PATH)\"" \
-DXEN_BOOTLOADER="$(XEN_BOOTLOADER)" \
-DXEN_CONFIG_DIR="$(XEN_CONFIG_DIR)" \
-DXEN_KERNEL="$(XEN_KERNEL)" \
-DXEN_INITRD="$(XEN_INITRD)" \
-DKVM_KERNEL="$(KVM_KERNEL)" \
-DSHARED_FILE_STORAGE_DIR="$(SHARED_FILE_STORAGE_DIR)" \
-DIALLOCATOR_SEARCH_PATH="\"$(IALLOCATOR_SEARCH_PATH)\"" \
-DDEFAULT_BRIDGE="$(DEFAULT_BRIDGE)" \
-DDEFAULT_VG="$(DEFAULT_VG)" \
-DKVM_PATH="$(KVM_PATH)" \
-DIP_PATH="$(IP_PATH)" \
-DSOCAT_PATH="$(SOCAT)" \
-DPYTHON_PATH="$(PYTHON)" \
-DSOCAT_USE_ESCAPE="$(SOCAT_USE_ESCAPE)" \
-DSOCAT_USE_COMPRESS="$(SOCAT_USE_COMPRESS)" \
-DLVM_STRIPECOUNT="$(LVM_STRIPECOUNT)" \
-DTOOLSDIR="$(libdir)/ganeti/tools" \
-DGNT_SCRIPTS="$(foreach i,$(notdir $(gnt_scripts)),\"$(i)\":)" \
-DHS_HTOOLS_PROGS="$(foreach i,$(HS_HTOOLS_PROGS),\"$(i)\":)" \
-DPKGLIBDIR="$(libdir)/ganeti" \
-DSHAREDIR="$(prefix)/share/ganeti" \
-DVERSIONEDSHAREDIR="$(versionedsharedir)" \
-DDRBD_BARRIERS="$(DRBD_BARRIERS)" \
-DDRBD_NO_META_FLUSH="$(DRBD_NO_META_FLUSH)" \
-DSYSLOG_USAGE="$(SYSLOG_USAGE)" \
-DDAEMONS_GROUP="$(DAEMONS_GROUP)" \
-DADMIN_GROUP="$(ADMIN_GROUP)" \
-DMASTERD_USER="$(MASTERD_USER)" \
-DMASTERD_GROUP="$(MASTERD_GROUP)" \
-DMETAD_USER="$(METAD_USER)" \
-DMETAD_GROUP="$(METAD_GROUP)" \
-DRAPI_USER="$(RAPI_USER)" \
-DRAPI_GROUP="$(RAPI_GROUP)" \
-DCONFD_USER="$(CONFD_USER)" \
-DCONFD_GROUP="$(CONFD_GROUP)" \
-DWCONFD_USER="$(WCONFD_USER)" \
-DWCONFD_GROUP="$(WCONFD_GROUP)" \
-DKVMD_USER="$(KVMD_USER)" \
-DKVMD_GROUP="$(KVMD_GROUP)" \
-DLUXID_USER="$(LUXID_USER)" \
-DLUXID_GROUP="$(LUXID_GROUP)" \
-DNODED_USER="$(NODED_USER)" \
-DNODED_GROUP="$(NODED_GROUP)" \
-DMOND_USER="$(MOND_USER)" \
-DMOND_GROUP="$(MOND_GROUP)" \
-DDISK_SEPARATOR="$(DISK_SEPARATOR)" \
-DQEMUIMG_PATH="$(QEMUIMG_PATH)" \
-DXEN_CMD="$(XEN_CMD)" \
-DENABLE_RESTRICTED_COMMANDS="$(ENABLE_RESTRICTED_COMMANDS)" \
-DENABLE_METADATA="$(ENABLE_METADATA)" \
-DENABLE_MOND="$(ENABLE_MOND)" \
-DHAS_GNU_LN="$(HAS_GNU_LN)" \
-DMAN_PAGES="$$(for i in $(notdir $(man_MANS)); do \
echo -n "$$i" | sed -re 's/^(.*)\.([0-9]+)$$/("\1",\2):/g'; \
done)" \
-DAF_INET4="$$(PYTHONPATH=. $(PYTHON) $(PRINT_PY_CONSTANTS) AF_INET4)" \
-DAF_INET6="$$(PYTHONPATH=. $(PYTHON) $(PRINT_PY_CONSTANTS) AF_INET6)" \
$(abs_top_srcdir)/src/AutoConf.hs.in > $@
lib/_vcsversion.py: Makefile vcs-version | stamp-directories
set -e; \
VCSVER=`cat $(abs_top_srcdir)/vcs-version`; \
{ echo '# This file is automatically generated, do not edit!'; \
echo '#'; \
echo ''; \
echo '"""Build-time VCS version number for Ganeti.'; \
echo '';\
echo 'This file is autogenerated by the build process.'; \
echo 'For any changes you need to re-run ./configure (and'; \
echo 'not edit by hand).'; \
echo ''; \
echo '"""'; \
echo ''; \
echo '# pylint: disable=C0301,C0324'; \
echo '# because this is autogenerated, we do not want'; \
echo '# style warnings' ; \
echo ''; \
echo "VCS_VERSION = '$$VCSVER'"; \
} > $@
lib/opcodes.py: Makefile src/hs2py lib/opcodes.py.in_before \
lib/opcodes.py.in_after | stamp-directories
cat $(abs_top_srcdir)/lib/opcodes.py.in_before > $@
src/hs2py --opcodes >> $@
cat $(abs_top_srcdir)/lib/opcodes.py.in_after >> $@
# Generating the RPC wrappers depends on many things, so make sure
# it's built at the end of the built sources
lib/_generated_rpc.py: lib/rpc_defs.py $(BUILD_RPC) | $(built_base_sources) $(built_python_base_sources)
PYTHONPATH=. $(RUN_IN_TEMPDIR) $(CURDIR)/$(BUILD_RPC) lib/rpc_defs.py > $@
if ENABLE_METADATA
lib/rpc/stub/metad.py: Makefile src/hs2py | stamp-directories
src/hs2py --metad-rpc > $@
endif
lib/rpc/stub/wconfd.py: Makefile src/hs2py | stamp-directories
src/hs2py --wconfd-rpc > $@
$(SHELL_ENV_INIT): Makefile stamp-directories
set -e; \
{ echo '# Allow overriding for tests'; \
echo 'readonly LOCALSTATEDIR=$${LOCALSTATEDIR:-$${GANETI_ROOTDIR:-}$(localstatedir)}'; \
echo 'readonly SYSCONFDIR=$${SYSCONFDIR:-$${GANETI_ROOTDIR:-}$(sysconfdir)}'; \
echo; \
echo 'readonly PKGLIBDIR=$(libdir)/ganeti'; \
echo 'readonly LOG_DIR="$$LOCALSTATEDIR/log/ganeti"'; \
echo 'readonly RUN_DIR="$$LOCALSTATEDIR/run/ganeti"'; \
echo 'readonly DATA_DIR="$$LOCALSTATEDIR/lib/ganeti"'; \
echo 'readonly CONF_DIR="$$SYSCONFDIR/ganeti"'; \
} > $@
## Writes sed script to replace placeholders with build-time values. The
## additional quotes after the first @ sign are necessary to stop configure
## from replacing those values as well.
$(REPLACE_VARS_SED): $(SHELL_ENV_INIT) Makefile stamp-directories
set -e; \
{ echo 's#@''PREFIX@#$(prefix)#g'; \
echo 's#@''SYSCONFDIR@#$(sysconfdir)#g'; \
echo 's#@''LOCALSTATEDIR@#$(localstatedir)#g'; \
echo 's#@''BINDIR@#$(BINDIR)#g'; \
echo 's#@''SBINDIR@#$(SBINDIR)#g'; \
echo 's#@''LIBDIR@#$(libdir)#g'; \
echo 's#@''GANETI_VERSION@#$(PACKAGE_VERSION)#g'; \
echo 's#@''CUSTOM_XEN_BOOTLOADER@#$(XEN_BOOTLOADER)#g'; \
echo 's#@''CUSTOM_XEN_KERNEL@#$(XEN_KERNEL)#g'; \
echo 's#@''CUSTOM_XEN_INITRD@#$(XEN_INITRD)#g'; \
echo 's#@''CUSTOM_IALLOCATOR_SEARCH_PATH@#$(IALLOCATOR_SEARCH_PATH)#g'; \
echo 's#@''CUSTOM_EXPORT_DIR@#$(EXPORT_DIR)#g'; \
echo 's#@''RPL_SSH_INITD_SCRIPT@#$(SSH_INITD_SCRIPT)#g'; \
echo 's#@''PKGLIBDIR@#$(libdir)/ganeti#g'; \
echo 's#@''GNTMASTERUSER@#$(MASTERD_USER)#g'; \
echo 's#@''GNTRAPIUSER@#$(RAPI_USER)#g'; \
echo 's#@''GNTCONFDUSER@#$(CONFD_USER)#g'; \
echo 's#@''GNTWCONFDUSER@#$(WCONFD_USER)#g'; \
echo 's#@''GNTLUXIDUSER@#$(LUXID_USER)#g'; \
echo 's#@''GNTNODEDUSER@#$(NODED_USER)#g'; \
echo 's#@''GNTMONDUSER@#$(MOND_USER)#g'; \
echo 's#@''GNTRAPIGROUP@#$(RAPI_GROUP)#g'; \
echo 's#@''GNTADMINGROUP@#$(ADMIN_GROUP)#g'; \
echo 's#@''GNTCONFDGROUP@#$(CONFD_GROUP)#g'; \
echo 's#@''GNTNODEDGROUP@#$(NODED_GROUP)#g'; \
echo 's#@''GNTWCONFDGROUP@#$(CONFD_GROUP)#g'; \
echo 's#@''GNTLUXIDGROUP@#$(LUXID_GROUP)#g'; \
echo 's#@''GNTMASTERDGROUP@#$(MASTERD_GROUP)#g'; \
echo 's#@''GNTMONDGROUP@#$(MOND_GROUP)#g'; \
echo 's#@''GNTDAEMONSGROUP@#$(DAEMONS_GROUP)#g'; \
echo 's#@''CUSTOM_ENABLE_MOND@#$(ENABLE_MOND)#g'; \
echo 's#@''MODULES@#$(strip $(lint_python_code))#g'; \
echo 's#@''XEN_CONFIG_DIR@#$(XEN_CONFIG_DIR)#g'; \
echo; \
echo '/^@SHELL_ENV_INIT@$$/ {'; \
echo ' r $(SHELL_ENV_INIT)'; \
echo ' d'; \
echo '}'; \
} > $@
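# Illustrative example (not part of the build): assuming libdir=/usr/lib,
# running "sed -f $(REPLACE_VARS_SED)" over a *.in template rewrites its
# PKGLIBDIR placeholder to /usr/lib/ganeti and expands the special
# SHELL_ENV_INIT marker line into the contents of $(SHELL_ENV_INIT); see
# the daemons/%:: and tools/% rules above for how the script is applied.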
# Using deferred evaluation
daemons/ganeti-%: MODULE = ganeti.server.$(patsubst ganeti-%,%,$(notdir $@))
daemons/ganeti-watcher: MODULE = ganeti.watcher
scripts/%: MODULE = ganeti.client.$(subst -,_,$(notdir $@))
tools/burnin: MODULE = ganeti.tools.burnin
tools/ensure-dirs: MODULE = ganeti.tools.ensure_dirs
tools/node-daemon-setup: MODULE = ganeti.tools.node_daemon_setup
tools/prepare-node-join: MODULE = ganeti.tools.prepare_node_join
tools/ssh-update: MODULE = ganeti.tools.ssh_update
tools/node-cleanup: MODULE = ganeti.tools.node_cleanup
tools/ssl-update: MODULE = ganeti.tools.ssl_update
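# For example, with the deferred evaluation above, scripts/gnt-cluster picks
# up MODULE = ganeti.client.gnt_cluster and daemons/ganeti-noded picks up
# MODULE = ganeti.server.noded when the bootstrap rule below expands $(MODULE).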
$(HS_BUILT_TEST_HELPERS): TESTROLE = $(patsubst test/hs/%,%,$@)
$(PYTHON_BOOTSTRAP) $(gnt_scripts) $(gnt_python_sbin_SCRIPTS): Makefile | stamp-directories
test -n "$(MODULE)" || { echo Missing module; exit 1; }
set -e; \
{ echo '#!${PYTHON}'; \
echo '# This file is automatically generated, do not edit!'; \
echo "# Edit $(MODULE) instead."; \
echo; \
echo '"""Bootstrap script for L{$(MODULE)}"""'; \
echo; \
echo '# pylint: disable=C0103'; \
echo '# C0103: Invalid name'; \
echo; \
echo 'import sys'; \
echo 'import $(MODULE) as main'; \
echo; \
echo '# Temporarily alias commands until bash completion'; \
echo '# generator is changed'; \
echo 'if hasattr(main, "commands"):'; \
echo ' commands = main.commands # pylint: disable=E1101'; \
echo 'if hasattr(main, "aliases"):'; \
echo ' aliases = main.aliases # pylint: disable=E1101'; \
echo; \
echo 'if __name__ == "__main__":'; \
echo ' sys.exit(main.Main())'; \
} > $@
chmod u+x $@
$(HS_BUILT_TEST_HELPERS): Makefile
@test -n "$(TESTROLE)" || { echo Missing TESTROLE; exit 1; }
set -e; \
{ echo '#!/bin/sh'; \
echo '# This file is automatically generated, do not edit!'; \
echo "# Edit Makefile.am instead."; \
echo; \
echo "HTOOLS=$(TESTROLE) exec ./test/hs/hpc-htools \"\$$@\""; \
} > $@
chmod u+x $@
stamp-directories: Makefile
$(MAKE) $(AM_MAKEFLAGS) ganeti
@mkdir_p@ $(DIRS) $(BUILDTIME_DIR_AUTOCREATE)
touch $@
# We need to create symlinks because "make distcheck" will not install Python
# files when building.
stamp-srclinks: Makefile | stamp-directories
set -e; \
for i in $(srclink_files); do \
if test ! -f $$i -a -f $(abs_top_srcdir)/$$i; then \
$(LN_S) $(abs_top_srcdir)/$$i $$i; \
fi; \
done
touch $@
.PHONY: ganeti
ganeti:
cd $(top_builddir) && test -h "$@" || { rm -f $@ && $(LN_S) lib $@; }
.PHONY: check-dirs
check-dirs: $(GENERATED_FILES)
@set -e; \
find . -type d \( -name . -o -name .git -prune -o -print \) | { \
error=; \
while read dir; do \
case "$$dir" in \
$(strip $(patsubst %,(./%) ;;,$(DIRCHECK_EXCLUDE) $(DIRS))) \
*) error=1; echo "Directory $$dir not listed in Makefile" >&2 ;; \
esac; \
done; \
for dir in $(DIRS); do \
if ! test -d "$$dir"; then \
echo "Directory $$dir listed in DIRS does not exist" >&2; \
error=1; \
fi \
done; \
test -z "$$error"; \
}
.PHONY: check-news
check-news:
RELEASE=$(PACKAGE_VERSION) $(CHECK_NEWS) < $(top_srcdir)/NEWS
.PHONY: check-local
check-local: check-dirs check-news $(GENERATED_FILES)
$(CHECK_PYTHON_CODE) $(check_python_code)
PYTHONPATH=. $(CHECK_HEADER) \
$(filter-out $(GENERATED_FILES),$(check_python_code))
$(CHECK_VERSION) $(VERSION) $(top_srcdir)/NEWS
PYTHONPATH=. $(RUN_IN_TEMPDIR) $(CURDIR)/$(CHECK_IMPORTS) . $(standalone_python_modules)
error= ; \
if [ "x`echo $(VERSION_SUFFIX)|grep 'alpha'`" == "x" ]; then \
expver=$(VERSION_MAJOR).$(VERSION_MINOR); \
if test "`head -n 1 $(top_srcdir)/README`" != "Ganeti $$expver"; then \
echo "Incorrect version in README, expected $$expver" >&2; \
error=1; \
fi; \
for file in doc/iallocator.rst doc/hooks.rst doc/virtual-cluster.rst \
doc/security.rst; do \
if test "`sed -ne '4 p' $(top_srcdir)/$$file`" != \
"Documents Ganeti version $$expver"; then \
echo "Incorrect version in $$file, expected $$expver" >&2; \
error=1; \
fi; \
done; \
if ! test -f $(top_srcdir)/doc/design-$$expver.rst; then \
echo "File $(top_srcdir)/doc/design-$$expver.rst not found" >&2; \
error=1; \
fi; \
if test "`sed -ne '5 p' $(top_srcdir)/doc/design-draft.rst`" != \
".. Last updated for Ganeti $$expver"; then \
echo "doc/design-draft.rst was not updated for version $$expver" >&2; \
error=1; \
fi; \
fi; \
for file in configure.ac $(HS_LIBTEST_SRCS) $(HS_PROG_SRCS); do \
if test $$(wc --max-line-length < $(top_srcdir)/$$file) -gt 80; then \
echo "Longest line in $$file is longer than 80 characters" >&2; \
error=1; \
fi; \
done; \
test -z "$$error"
.PHONY: hs-test-%
hs-test-%: test/hs/htest
@rm -f htest.tix
test/hs/htest -t $*
.PHONY: hs-tests
hs-tests: test/hs/htest
@rm -f htest.tix
./test/hs/htest
.PHONY: py-tests
py-tests: $(python_tests) ganeti $(built_python_sources)
error=; \
for file in $(python_tests); \
do if ! $(TESTS_ENVIRONMENT) $$file; then error=1; fi; \
done; \
test -z "$$error"
.PHONY: hs-shell-%
hs-shell-%: test/hs/hpc-htools test/hs/hpc-mon-collector \
$(HS_BUILT_TEST_HELPERS)
@rm -f hpc-htools.tix hpc-mon-collector.tix
HBINARY="./test/hs/hpc-htools" \
SHELLTESTARGS=$(SHELLTESTARGS) \
./test/hs/offline-test.sh $*
.PHONY: hs-shell
hs-shell: test/hs/hpc-htools test/hs/hpc-mon-collector $(HS_BUILT_TEST_HELPERS)
@rm -f hpc-htools.tix hpc-mon-collector.tix
HBINARY="./test/hs/hpc-htools" \
SHELLTESTARGS=$(SHELLTESTARGS) \
./test/hs/offline-test.sh
.PHONY: hs-check
hs-check: hs-tests hs-shell
# E111: indentation is not a multiple of four
# E121: continuation line indentation is not a multiple of four
# (since our indent level is not 4)
# E125: continuation line does not distinguish itself from next logical line
# (since our indent level is not 4)
# E123: closing bracket does not match indentation of opening bracket's line
# E127: continuation line over-indented for visual indent
# (since our indent level is not 4)
# note: do NOT add E128 here; it's a valid style error in most cases!
# I've seen real errors, but also some cases where we indent wrongly
# due to line length; try to rework the cases where it is triggered,
# instead of silencing it
# E261: at least two spaces before inline comment
# E501: line too long (80 characters)
PEP8_IGNORE = E111,E121,E123,E125,E127,E261,E501
# For exclusion, pep8 expects filenames only, not whole paths
PEP8_EXCLUDE = $(subst $(space),$(comma),$(strip $(notdir $(built_python_sources))))
# A space-separated list of pylint warnings to completely ignore:
# I0013 = disable warnings for ignoring whole files
LINT_DISABLE = I0013
# Additional pylint options
LINT_OPTS =
# The combined set of pylint options
LINT_OPTS_ALL = $(LINT_OPTS) \
$(addprefix --disable=,$(LINT_DISABLE))
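# With the defaults above this expands to just "--disable=I0013"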
LINT_TARGETS = pylint pylint-qa pylint-test
if HAS_PEP8
LINT_TARGETS += pep8
endif
if HAS_HLINT
LINT_TARGETS += hlint
endif
.PHONY: lint
lint: $(LINT_TARGETS)
.PHONY: pylint
pylint: $(GENERATED_FILES)
@test -n "$(PYLINT)" || { echo 'pylint' not found during configure; exit 1; }
$(PYLINT) $(LINT_OPTS_ALL) $(lint_python_code)
.PHONY: pylint-qa
pylint-qa: $(GENERATED_FILES)
@test -n "$(PYLINT)" || { echo 'pylint' not found during configure; exit 1; }
cd $(top_srcdir)/qa && \
PYTHONPATH=$(abs_top_srcdir) $(PYLINT) $(LINT_OPTS_ALL) \
--rcfile ../pylintrc $(patsubst qa/%.py,%,$(qa_scripts))
# FIXME: lint all test code, not just the newly added test support
pylint-test: $(GENERATED_FILES)
@test -n "$(PYLINT)" || { echo 'pylint' not found during configure; exit 1; }
cd $(top_srcdir) && \
PYTHONPATH=.:./test/py $(PYLINT) $(LINT_OPTS_ALL) \
--rcfile=pylintrc-test $(python_test_support) $(python_test_utils)
.PHONY: pep8
pep8: $(GENERATED_FILES)
@test -n "$(PEP8)" || { echo 'pep8' not found during configure; exit 1; }
$(PEP8) --ignore='$(PEP8_IGNORE)' --exclude='$(PEP8_EXCLUDE)' \
--repeat $(pep8_python_code)
# FIXME: remove ignore "Use void" when GHC 6.x is deprecated
HLINT_EXCLUDES = src/Ganeti/THH.hs test/hs/hpc-htools.hs
.PHONY: hlint
hlint: $(HS_BUILT_SRCS) src/lint-hints.hs
@test -n "$(HLINT)" || { echo 'hlint' not found during configure; exit 1; }
@rm -f doc/hs-lint.html
if tty -s; then C="-c"; else C=""; fi; \
$(HLINT) --utf8 --report=doc/hs-lint.html --cross $$C \
--ignore "Use first" \
--ignore "Use &&&" \
--ignore "Use &&" \
--ignore "Use void" \
--ignore "Reduce duplication" \
--ignore "Use import/export shortcut" \
--hint src/lint-hints \
--cpp-file=$(HASKELL_PACKAGE_VERSIONS_FILE) \
$(filter-out $(HLINT_EXCLUDES),$(HS_LIBTEST_SRCS) $(HS_PROG_SRCS))
@if [ ! -f doc/hs-lint.html ]; then \
echo "All good" > doc/hs-lint.html; \
fi
# a dist hook rule for updating the vcs-version file; this is
# hardcoded due to where it needs to build the file...
dist-hook:
$(MAKE) $(AM_MAKEFLAGS) regen-vcs-version
rm -f $(top_distdir)/vcs-version
cp -p $(srcdir)/vcs-version $(top_distdir)
# a distcheck hook rule for catching revision control directories
distcheck-hook:
if find $(top_distdir) -name .svn -or -name .git | grep .; then \
echo "Found revision control files in final archive." 1>&2; \
exit 1; \
fi
if find $(top_distdir) -name '*.py[co]' | grep .; then \
echo "Found Python byte code in final archive." 1>&2; \
exit 1; \
fi
if find $(top_distdir) -name '*~' | grep .; then \
echo "Found backup files in final archive." 1>&2; \
exit 1; \
fi
# Empty files or directories should not be distributed. They can cause
# unnecessary warnings for packagers. Directories used by automake during
# distcheck must be excluded.
if find $(top_distdir) -empty -and -not \( \
-path $(top_distdir)/_build -or \
-path $(top_distdir)/_inst \) | grep .; then \
echo "Found empty files or directories in final archive." 1>&2; \
exit 1; \
fi
if test -e $(top_distdir)/doc/man-html; then \
echo "Found documentation including man pages in final archive" >&2; \
exit 1; \
fi
# Backwards compatible distcheck-release target
distcheck-release: distcheck
distrebuildcheck: dist
set -e; \
builddir=$$(mktemp -d $(abs_srcdir)/distrebuildcheck.XXXXXXX); \
trap "echo Removing $$builddir; cd $(abs_srcdir); rm -rf $$builddir" EXIT; \
cd $$builddir; \
tar xzf $(abs_srcdir)/$(distdir).tar.gz; \
cd $(distdir); \
./configure; \
$(MAKE) maintainer-clean; \
cp $(abs_srcdir)/vcs-version .; \
./configure; \
$(MAKE) $(AM_MAKEFLAGS)
dist-release: dist
set -e; \
for i in $(DIST_ARCHIVES); do \
echo -n "Checking $$i ... "; \
autotools/check-tar < $$i; \
echo OK; \
done
install-exec-local:
@mkdir_p@ "$(DESTDIR)${localstatedir}/lib/ganeti" \
"$(DESTDIR)${localstatedir}/log/ganeti" \
"$(DESTDIR)${localstatedir}/run/ganeti"
for dir in $(SYMLINK_TARGET_DIRS); do \
@mkdir_p@ $(DESTDIR)$$dir; \
done
$(LN_S) -f $(sysconfdir)/ganeti/lib $(DESTDIR)$(defaultversiondir)
$(LN_S) -f $(sysconfdir)/ganeti/share $(DESTDIR)$(defaultversionedsharedir)
for prog in $(HS_BIN_ROLES); do \
$(LN_S) -f $(defaultversiondir)$(BINDIR)/$$prog $(DESTDIR)$(BINDIR)/$$prog; \
done
$(LN_S) -f $(defaultversiondir)$(libdir)/ganeti/iallocators/hail $(DESTDIR)$(libdir)/ganeti/iallocators/hail
for prog in $(all_sbin_scripts); do \
$(LN_S) -f $(defaultversiondir)$(SBINDIR)/$$prog $(DESTDIR)$(SBINDIR)/$$prog; \
done
for prog in $(gnt_scripts_basenames); do \
$(LN_S) -f $(defaultversionedsharedir)/$$prog $(DESTDIR)$(SBINDIR)/$$prog; \
done
for prog in $(pkglib_python_basenames); do \
$(LN_S) -f $(defaultversionedsharedir)/$$prog $(DESTDIR)$(libdir)/ganeti/$$prog; \
done
for prog in $(tools_python_basenames); do \
$(LN_S) -f $(defaultversionedsharedir)/$$prog $(DESTDIR)$(libdir)/ganeti/tools/$$prog; \
done
for prog in $(tools_basenames); do \
$(LN_S) -f $(defaultversiondir)/$(libdir)/ganeti/tools/$$prog $(DESTDIR)$(libdir)/ganeti/tools/$$prog; \
done
if ! test -n '$(ENABLE_MANPAGES)'; then \
for man in $(manfullpath); do \
$(LN_S) -f $(defaultversionedsharedir)/root$(MANDIR)/$$man $(DESTDIR)$(MANDIR)/$$man; \
done; \
fi
for prog in $(myexeclib_scripts_basenames); do \
$(LN_S) -f $(defaultversiondir)$(libdir)/ganeti/$$prog $(DESTDIR)$(libdir)/ganeti/$$prog; \
done
if INSTALL_SYMLINKS
$(LN_S) -f $(versionedsharedir) $(DESTDIR)$(sysconfdir)/ganeti/share
$(LN_S) -f $(versiondir) $(DESTDIR)$(sysconfdir)/ganeti/lib
endif
.PHONY: apidoc
if WANT_HSAPIDOC
apidoc: py-apidoc hs-apidoc
else
apidoc: py-apidoc
endif
.PHONY: py-apidoc
py-apidoc: epydoc.conf $(RUN_IN_TEMPDIR) $(GENERATED_FILES)
env - PATH="$$PATH" PYTHONPATH="$$PYTHONPATH" \
$(RUN_IN_TEMPDIR) epydoc -v \
--conf $(CURDIR)/epydoc.conf \
--output $(CURDIR)/$(APIDOC_PY_DIR)
.PHONY: hs-apidoc
hs-apidoc: $(APIDOC_HS_DIR)/index.html
$(APIDOC_HS_DIR)/index.html: $(HS_LIBTESTBUILT_SRCS) Makefile
@test -n "$(HSCOLOUR)" || \
{ echo 'HsColour' not found during configure; exit 1; }
@test -n "$(HADDOCK)" || \
{ echo 'haddock' not found during configure; exit 1; }
rm -rf $(APIDOC_HS_DIR)/*
for i in $(ALL_APIDOC_HS_DIRS); do \
@mkdir_p@ $$i; \
$(HSCOLOUR) -print-css > $$i/hscolour.css; \
done
set -e ; \
export LC_ALL=en_US.UTF-8; \
OPTGHC="--optghc=-isrc --optghc=-itest/hs"; \
OPTGHC="$$OPTGHC --optghc=-optP-include --optghc=-optP$(HASKELL_PACKAGE_VERSIONS_FILE)"; \
for file in $(HS_LIBTESTBUILT_SRCS); do \
f_nosrc=$${file##src/}; \
f_notst=$${f_nosrc##test/hs/}; \
f_html=$${f_notst%%.hs}.html; \
$(HSCOLOUR) -css -anchor $$file > $(APIDOC_HS_DIR)/$$f_html ; \
done ; \
$(HADDOCK) --odir $(APIDOC_HS_DIR) --html --hoogle --ignore-all-exports -w \
-t ganeti -p src/haddock-prologue \
--source-module="%{MODULE/.//}.html" \
--source-entity="%{MODULE/.//}.html#%{NAME}" \
$$OPTGHC \
$(HS_LIBTESTBUILT_SRCS)
.PHONY: TAGS
TAGS: $(GENERATED_FILES)
rm -f TAGS
$(GHC) -e ":etags TAGS_hs" -v0 \
$(filter-out -O -Werror,$(HFLAGS)) \
-osuf tags.o \
-hisuf tags.hi \
-lcurl \
$(HS_LIBTEST_SRCS)
find . -path './lib/*.py' -o -path './scripts/gnt-*' -o \
-path './daemons/ganeti-*' -o -path './tools/*' -o \
-path './qa/*.py' | \
etags --etags-include=TAGS_hs -L -
.PHONY: coverage
COVERAGE_TESTS=
if HS_UNIT
COVERAGE_TESTS += hs-coverage
endif
if PY_UNIT
COVERAGE_TESTS += py-coverage
endif
coverage: $(COVERAGE_TESTS)
test/py/docs_unittest.py: $(gnt_scripts)
.PHONY: py-coverage
py-coverage: $(GENERATED_FILES) $(python_tests)
@test -n "$(PYCOVERAGE)" || \
{ echo 'python-coverage' not found during configure; exit 1; }
set -e; \
COVERAGE=$(PYCOVERAGE) \
COVERAGE_FILE=$(CURDIR)/$(COVERAGE_PY_DIR)/data \
TEXT_COVERAGE=$(CURDIR)/$(COVERAGE_PY_DIR)/report.txt \
HTML_COVERAGE=$(CURDIR)/$(COVERAGE_PY_DIR) \
$(PLAIN_TESTS_ENVIRONMENT) \
$(abs_top_srcdir)/autotools/gen-py-coverage \
$(python_tests)
.PHONY: hs-coverage
hs-coverage: $(haskell_tests) test/hs/hpc-htools test/hs/hpc-mon-collector
rm -f *.tix
$(MAKE) $(AM_MAKEFLAGS) hs-check
@mkdir_p@ $(COVERAGE_HS_DIR)
hpc sum --union $(HPCEXCL) \
htest.tix hpc-htools.tix hpc-mon-collector.tix > coverage-hs.tix
hpc markup --destdir=$(COVERAGE_HS_DIR) coverage-hs.tix
hpc report coverage-hs.tix | tee $(COVERAGE_HS_DIR)/report.txt
$(LN_S) -f hpc_index.html $(COVERAGE_HS_DIR)/index.html
# Special "kind-of-QA" target for htools, needs special setup (all
# tools compiled with -fhpc)
.PHONY: live-test
live-test: all
set -e ; \
cd src; \
rm -f .hpc; $(LN_S) ../.hpc .hpc; \
rm -f *.tix *.mix; \
./live-test.sh; \
hpc sum --union $(HPCEXCL) $(addsuffix .tix,$(HS_PROGS:src/%=%)) \
--output=live-test.tix ; \
@mkdir_p@ ../$(COVERAGE_HS_DIR) ; \
hpc markup --destdir=../$(COVERAGE_HS_DIR) live-test \
--srcdir=.. $(HPCEXCL) ; \
hpc report --srcdir=.. live-test $(HPCEXCL)
commit-check: autotools-check distcheck lint apidoc
autotools-check:
TESTDATA_DIR=./test/data shelltest $(SHELLTESTARGS) \
$(abs_top_srcdir)/test/autotools/*-*.test \
-- --hide-successes
.PHONY: gitignore-check
gitignore-check:
@if [ -n "`git status --short`" ]; then \
echo "Git status is not clean!" 1>&2 ; \
git status --short; \
exit 1; \
fi
# target to rebuild all man pages (both groff and html output)
.PHONY: man
man: $(man_MANS) $(manhtml)
CABAL_EXECUTABLES = $(HS_DEFAULT_PROGS)
CABAL_EXECUTABLES_HS = $(patsubst %,%.hs,$(CABAL_EXECUTABLES))
CABAL_EXECUTABLES_APPS_STAMPS = $(patsubst src/%,apps/%.hs.stamp,$(patsubst test/hs/%,apps/%.hs.stamp,$(CABAL_EXECUTABLES)))
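# e.g. src/rpc-test maps to apps/rpc-test.hs.stamp and test/hs/htest to
# apps/htest.hs.stamp (illustrative; the actual list comes from HS_DEFAULT_PROGS)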
# Executable symlinks
apps/%.hs.stamp: Makefile
mkdir -p apps
rm -f $(basename $@)
ln -s ../$(filter %/$(basename $(notdir $@)),$(CABAL_EXECUTABLES_HS)) $(basename $@)
touch $@
# Builds the cabal file
ganeti.cabal: cabal/ganeti.template.cabal Makefile cabal/cabal-from-modules.py $(CABAL_EXECUTABLES_APPS_STAMPS)
@echo $(subst /,.,$(patsubst %.hs,%,$(patsubst test/hs/%,%,$(patsubst src/%,%,$(HS_SRCS))))) \
| python $(abs_top_srcdir)/cabal/cabal-from-modules.py $(abs_top_srcdir)/cabal/ganeti.template.cabal > $@
for p in $(CABAL_EXECUTABLES); do \
echo >> $@; \
echo "executable `basename $$p`" >> $@; \
echo " hs-source-dirs: apps" >> $@; \
echo " main-is: `basename $$p`.hs" >> $@; \
echo " default-language: Haskell2010" >> $@; \
echo " build-depends:" >> $@; \
echo " base" >> $@; \
echo " , ganeti" >> $@; \
if [ $$p == test/hs/htest ]; then \
echo " , hslogger" >> $@; \
echo " , test-framework" >> $@; \
elif [ $$p == src/rpc-test ]; then \
echo " , json" >> $@; \
fi \
done
# Target that builds all binaries (including those that are not
# rebuilt except when running the tests)
.PHONY: really-all
really-all: all $(check_SCRIPTS) $(haskell_tests) $(HS_ALL_PROGS)
# we don't need the ancient implicit rules:
%: %,v
%: RCS/%,v
%: RCS/%
%: s.%
%: SCCS/s.%
-include ./Makefile.local
# support inspecting the value of a make variable
print-%:
@echo $($*)
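# e.g. "make print-python_tests" echoes the value of $(python_tests)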
# vim: set noet :
ganeti-2.15.2/NEWS 0000644 0000000 0000000 00000575363 12634264163 0013577 0 ustar 00root root 0000000 0000000 News
====
Version 2.15.2
--------------
*(Released Wed, 16 Dec 2015)*
Important changes and security notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Security release.
CVE-2015-7944
Ganeti provides a RESTful control interface called the RAPI. Its HTTPS
implementation is vulnerable to DoS attacks via client-initiated SSL
parameter renegotiation. While the interface is not meant to be exposed
publicly, due to the fact that it binds to all interfaces, we believe
some users might be exposing it unintentionally and are vulnerable. A
DoS attack can consume resources meant for Ganeti daemons and instances
running on the master node, making both perform badly.
Fixes are not feasible due to the OpenSSL Python library not exposing
functionality needed to disable client-side renegotiation. Instead, we
offer instructions on how to control RAPI's exposure, along with info
on how RAPI can be setup alongside an HTTPS proxy in case users still
want or need to expose the RAPI interface. The instructions are
outlined in Ganeti's security document: doc/html/security.html
CVE-2015-7945
Ganeti leaks the DRBD secret through the RAPI interface. Examining job
results after an instance information job reveals the secret. With the
DRBD secret, access to the local cluster network, and ARP poisoning,
an attacker can impersonate a Ganeti node and clone the disks of a
DRBD-based instance. While an attacker with access to the cluster
network is already capable of accessing any data written as DRBD
traffic is unencrypted, having the secret expedites the process and
allows access to the entire disk.
Fixes contained in this release prevent the secret from being exposed
via the RAPI. The DRBD secret can be changed by converting an instance
to plain and back to DRBD, generating a new secret, but redundancy will
be lost until the process completes.
Since attackers with node access are capable of accessing some and
potentially all data even without the secret, we do not recommend that
the secret be changed for existing instances.
Minor changes
~~~~~~~~~~~~~
- Allow disk attachment to diskless instances
- Reduce memory footprint: Compute lock allocation strictly
- Calculate correct affected nodes set in InstanceChangeGroup
(Issue 1144)
- Reduce memory footprint: Don't keep input for error messages
- Use bulk-adding of keys in renew-crypto
- Reduce memory footprint: Send answers strictly
- Reduce memory footprint: Store keys as ByteStrings
- Reduce memory footprint: Encode UUIDs as ByteStrings
- Do not retry all requests after connection timeouts to prevent
repeated job submission
- Fix reason trails of expanding opcodes
- Make lockConfig call retryable
- Extend timeout for gnt-cluster renew-crypto
- Return the correct error code in the post-upgrade script
- Make OpenSSL refrain from DH altogether
- Fix faulty iallocator type check
- Improve cfgupgrade output in case of errors
- Fix upgrades of instances with missing creation time
- Support force option for deactivate disks on RAPI
- Make htools tolerate missing "dtotal" and "dfree" on luxi
- Fix default for --default-iallocator-params
- Renew-crypto: stop daemons on master node first
- Don't warn about broken SSH setup of offline nodes (Issue 1131)
- Fix computation in network blocks
- At IAlloc backend guess state from admin state
- Set node tags in iallocator htools backend
- Only search for Python-2 interpreters
- Handle Xen 4.3 states better
- Improve xl socat migrations
Version 2.15.1
--------------
*(Released Mon, 7 Sep 2015)*
New features
~~~~~~~~~~~~
- The ext template now allows userspace-only disks to be used
Bugfixes
~~~~~~~~
- Fixed the silently broken 'gnt-instance replace-disks --ignore-ipolicy'
command.
- User shutdown reporting can now be disabled on Xen using the
'--user-shutdown' flag.
- Remove falsely reported communication NIC error messages on instance start.
- Fix 'gnt-node migrate' behavior when no instances are present on a node.
- Fix the multi-allocation functionality for non-DRBD instances.
Version 2.15.0
--------------
*(Released Wed, 29 Jul 2015)*
Incompatible/important changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- In order to improve allocation efficiency when using DRBD, the cluster
metric now takes the total reserved memory into account. A consequence
of this change is that the best possible cluster metric is no longer 0.
htools(1) interprets minimal cluster scores to be offsets of the theoretical
lower bound, so only users interpreting the cluster score directly should
be affected.
- This release contains a fix for the problem that different encodings in
SSL certificates can break RPC communication (issue 1094). The fix makes
it necessary to rerun 'gnt-cluster renew-crypto --new-node-certificates'
after the cluster is fully upgraded to 2.15.0
New features
~~~~~~~~~~~~
- On dedicated clusters, hail will now favour allocations filling up
nodes efficiently over balanced allocations.
New dependencies
~~~~~~~~~~~~~~~~
- The indirect dependency on Haskell package 'case-insensitive' is now
explicit.
Version 2.15.0 rc1
------------------
*(Released Wed, 17 Jun 2015)*
This was the first release candidate in the 2.15 series. All important
changes are listed in the latest 2.15 entry.
Known issues:
~~~~~~~~~~~~~
- Issue 1094: differences in encodings in SSL certificates due to
different OpenSSL versions can result in rendering a cluster
uncommunicative after a master-failover.
Version 2.15.0 beta1
--------------------
*(Released Thu, 30 Apr 2015)*
This was the first beta release in the 2.15 series. All important changes
are listed in the latest 2.15 entry.
Version 2.14.2
--------------
*(Released Tue, 15 Dec 2015)*
Important changes and security notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Security release.
CVE-2015-7944
Ganeti provides a RESTful control interface called the RAPI. Its HTTPS
implementation is vulnerable to DoS attacks via client-initiated SSL
parameter renegotiation. While the interface is not meant to be exposed
publicly, due to the fact that it binds to all interfaces, we believe
some users might be exposing it unintentionally and are vulnerable. A
DoS attack can consume resources meant for Ganeti daemons and instances
running on the master node, making both perform badly.
Fixes are not feasible due to the OpenSSL Python library not exposing
functionality needed to disable client-side renegotiation. Instead, we
offer instructions on how to control RAPI's exposure, along with info
on how RAPI can be setup alongside an HTTPS proxy in case users still
want or need to expose the RAPI interface. The instructions are
outlined in Ganeti's security document: doc/html/security.html
CVE-2015-7945
Ganeti leaks the DRBD secret through the RAPI interface. Examining job
results after an instance information job reveals the secret. With the
DRBD secret, access to the local cluster network, and ARP poisoning,
an attacker can impersonate a Ganeti node and clone the disks of a
DRBD-based instance. While an attacker with access to the cluster
network is already capable of accessing any data written as DRBD
traffic is unencrypted, having the secret expedites the process and
allows access to the entire disk.
Fixes contained in this release prevent the secret from being exposed
via the RAPI. The DRBD secret can be changed by converting an instance
to plain and back to DRBD, generating a new secret, but redundancy will
be lost until the process completes.
Since attackers with node access are capable of accessing some and
potentially all data even without the secret, we do not recommend that
the secret be changed for existing instances.
Minor changes
~~~~~~~~~~~~~
- Allow disk attachment to diskless instances
- Calculate correct affected nodes set in InstanceChangeGroup
(Issue 1144)
- Do not retry all requests after connection timeouts to prevent
repeated job submission
- Fix reason trails of expanding opcodes
- Make lockConfig call retryable
- Extend timeout for gnt-cluster renew-crypto
- Return the correct error code in the post-upgrade script
- Make OpenSSL refrain from DH altogether
- Fix faulty iallocator type check
- Improve cfgupgrade output in case of errors
- Fix upgrades of instances with missing creation time
- Make htools tolerate missing "dtotal" and "dfree" on luxi
- Fix default for --default-iallocator-params
- Renew-crypto: stop daemons on master node first
- Don't warn about broken SSH setup of offline nodes (Issue 1131)
- At IAlloc backend guess state from admin state
- Set node tags in iallocator htools backend
- Only search for Python-2 interpreters
- Handle Xen 4.3 states better
- Improve xl socat migrations
- replace-disks: fix --ignore-ipolicy
- Fix disabling of user shutdown reporting
- Allow userspace-only disk templates
- Fix instance failover in case of DTS_EXT_MIRROR
- Fix operations on empty nodes by accepting allocation of 0 jobs
- Fix instance multi allocation for non-DRBD disks
- Redistribute master key on downgrade
- Allow more failover options when using the --no-disk-moves flag
Version 2.14.1
--------------
*(Released Fri, 10 Jul 2015)*
Incompatible/important changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The SSH security changes reduced the number of nodes which can SSH into
other nodes. Unfortunately enough, the Ganeti implementation of migration
for the xl stack of Xen required SSH to be able to migrate the instance,
leading to a situation where full movement of an instance around the cluster
was not possible. This version fixes the issue by using socat to transfer
instance data. While socat is less secure than SSH, it is about as secure as
xm migrations, and occurs over the secondary network if present. As a
consequence of this change, Xen instance migrations using xl cannot occur
between nodes running 2.14.0 and 2.14.1.
- This release contains a fix for the problem that different encodings in
SSL certificates can break RPC communication (issue 1094). The fix makes
it necessary to rerun 'gnt-cluster renew-crypto --new-node-certificates'
after the cluster is fully upgraded to 2.14.1
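A minimal sketch of that sequence, assuming the upgrade itself is done with
'gnt-cluster upgrade' (run the renewal only once the whole cluster is on
2.14.1)::

  gnt-cluster upgrade --to 2.14.1
  gnt-cluster renew-crypto --new-node-certificates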
Other Changes
~~~~~~~~~~~~~
- The ``htools`` now properly work also on shared-storage clusters.
- Instance moves now work properly also for the plain disk template.
- Filter-evaluation for run-time data filter was fixed (issue 1100).
- Various improvements to the documentation have been added.
Version 2.14.0
--------------
*(Released Tue, 2 Jun 2015)*
New features
~~~~~~~~~~~~
- The build system now enforces external Haskell dependencies to lie in
a supported range as declared by our new ganeti.cabal file.
- Basic support for instance reservations has been added. Instance addition
supports a --forthcoming option telling Ganeti to only reserve the resources
but not create the actual instance. The instance can later be created by
passing the --commit option to the instance addition command (see the sketch
after this list).
- Node tags starting with htools:nlocation: now have a special meaning to htools(1).
They control between which nodes migration is possible, e.g., during hypervisor
upgrades. See hbal(1) for details.
- The node-allocation lock has been removed for good, thus speeding up parallel
instance allocation and creation.
- The external storage interface has been extended by optional ``open``
and ``close`` scripts.
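A rough sketch of the reservation workflow mentioned above; instance, node,
OS and disk size are made up, and the exact set of flags to repeat with
--commit is defined by gnt-instance(8), not here::

  # only reserve the resources; no instance is created yet
  gnt-instance add -t plain -o debootstrap+default --disk 0:size=10G \
    -n node1.example.com --forthcoming inst1.example.com
  # later: turn the reservation into an actual instance
  # (re-specify other options if your Ganeti version requires them)
  gnt-instance add --commit inst1.example.com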
New dependencies
~~~~~~~~~~~~~~~~
- Building the Haskell part of Ganeti now requires Cabal and cabal-install.
Known issues
~~~~~~~~~~~~
- Under certain conditions instance doesn't get unpaused after live
migration (issue #1050)
Since 2.14.0 rc2
~~~~~~~~~~~~~~~~
- The call to the IAllocator in 'gnt-node evacuate' has been fixed.
- In opportunistic locking, only ask for those node resource locks where
the node lock is held.
- Lock requests are repeatable now; this avoids failure of a job in a
race condition with a signal sent to the job.
- Various improvements to the QA.
Version 2.14.0 rc2
------------------
*(Released Tue, 19 May 2015)*
This was the second release candidate in the 2.14 series. All important
changes are listed in the 2.14.0 entry.
Since 2.14.0 rc1
~~~~~~~~~~~~~~~~
- private parameters are now properly exported to instance create scripts
- unnecessary config unlocks and upgrades have been removed, improving
performance, in particular of cluster verification
- some rarely occurring file-descriptor leaks have been fixed
- The checks for orphan and lost volumes have been fixed to also work
correctly when multiple volume groups are used.
Version 2.14.0 rc1
------------------
*(Released Wed, 29 Apr 2015)*
This was the first release candidate in the 2.14 series. All important
changes are listed in the latest 2.14 entry.
Since 2.14.0 beta2
~~~~~~~~~~~~~~~~~~
The following issue has been fixed:
- A race condition where a badly timed kill of WConfD could lead to
an incorrect configuration.
Fixes inherited from the 2.12 branch:
- Upgrade from old versions (2.5 and 2.6) was failing (issues 1070, 1019).
- gnt-network info outputs wrong external reservations (issue 1068)
- Refuse to demote master from master capability (issue 1023)
Fixes inherited from the 2.13 branch:
- bugs related to ssh-key handling of master candidates (issues 1045, 1046, 1047)
Version 2.14.0 beta2
--------------------
*(Released Thu, 26 Mar 2015)*
This was the second beta release in the 2.14 series. All important changes
are listed in the latest 2.14 entry.
Since 2.14.0 beta1
~~~~~~~~~~~~~~~~~~
The following issues have been fixed:
- Issue 1018: Cluster init (and possibly other jobs) occasionally fail to start
The extension of the external storage interface was not present in 2.14.0 beta1.
Version 2.14.0 beta1
--------------------
*(Released Fri, 13 Feb 2015)*
This was the first beta release of the 2.14 series. All important changes
are listed in the latest 2.14 entry.
Version 2.13.3
--------------
*(Released Mon, 14 Dec 2015)*
Important changes and security notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Security release.
CVE-2015-7944
Ganeti provides a RESTful control interface called the RAPI. Its HTTPS
implementation is vulnerable to DoS attacks via client-initiated SSL
parameter renegotiation. While the interface is not meant to be exposed
publicly, due to the fact that it binds to all interfaces, we believe
some users might be exposing it unintentionally and are vulnerable. A
DoS attack can consume resources meant for Ganeti daemons and instances
running on the master node, making both perform badly.
Fixes are not feasible due to the OpenSSL Python library not exposing
functionality needed to disable client-side renegotiation. Instead, we
offer instructions on how to control RAPI's exposure, along with info
on how RAPI can be set up alongside an HTTPS proxy in case users still
want or need to expose the RAPI interface. The instructions are
outlined in Ganeti's security document: doc/html/security.html
CVE-2015-7945
Ganeti leaks the DRBD secret through the RAPI interface. Examining job
results after an instance information job reveals the secret. With the
DRBD secret, access to the local cluster network, and ARP poisoning,
an attacker can impersonate a Ganeti node and clone the disks of a
DRBD-based instance. While an attacker with access to the cluster
network is already capable of accessing any data written as DRBD
traffic is unencrypted, having the secret expedites the process and
allows access to the entire disk.
Fixes contained in this release prevent the secret from being exposed
via the RAPI. The DRBD secret can be changed by converting an instance
to plain and back to DRBD, generating a new secret, but redundancy will
be lost until the process completes.
Since attackers with node access are capable of accessing some and
potentially all data even without the secret, we do not recommend that
the secret be changed for existing instances.
Minor changes
~~~~~~~~~~~~~
- Calculate correct affected nodes set in InstanceChangeGroup
(Issue 1144)
- Do not retry all requests after connection timeouts to prevent
repeated job submission
- Fix reason trails of expanding opcodes
- Make lockConfig call retryable
- Extend timeout for gnt-cluster renew-crypto
- Return the correct error code in the post-upgrade script
- Make OpenSSL refrain from DH altogether
- Fix upgrades of instances with missing creation time
- Make htools tolerate missing "dtotal" and "dfree" on luxi
- Fix default for --default-iallocator-params
- Renew-crypto: stop daemons on master node first
- Don't warn about broken SSH setup of offline nodes (Issue 1131)
- At IAlloc backend guess state from admin state
- Only search for Python-2 interpreters
- Handle Xen 4.3 states better
- Improve xl socat migrations
- replace-disks: fix --ignore-ipolicy
- Fix disabling of user shutdown reporting
- Fix operations on empty nodes by accepting allocation of 0 jobs
- Fix instance multi allocation for non-DRBD disks
- Redistribute master key on downgrade
- Allow more failover options when using the --no-disk-moves flag
Version 2.13.2
--------------
*(Released Mon, 13 Jul 2015)*
Incompatible/important changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- This release contains a fix for the problem that different encodings in
SSL certificates can break RPC communication (issue 1094). The fix makes
it necessary to rerun 'gnt-cluster renew-crypto --new-node-certificates'
after the cluster is fully upgraded to 2.13.2
Other fixes and known issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Inherited from 2.12:
- Fixed Issue #1115: Race between starting WConfD and updating the config
- Fixed Issue #1114: Binding RAPI to a specific IP makes the watcher
restart the RAPI
- Fixed Issue #1100: Filter-evaluation for run-time data filter
- Better handling of the "crashed" Xen state
- The watcher can be instructed to skip disk verification
- Reduce amount of logging on successful requests
- Prevent multiple communication NICs being created for instances
- The ``htools`` now properly work also on shared-storage clusters
- Instance moves now work properly also for the plain disk template
- Various improvements to the documentation have been added
Known issues:
- Issue #1104: gnt-backup: dh key too small
Version 2.13.1
--------------
*(Released Tue, 16 Jun 2015)*
Incompatible/important changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The SSH security changes reduced the number of nodes which can SSH into
other nodes. Unfortunately, the Ganeti implementation of migration
for the xl stack of Xen required SSH to be able to migrate the instance,
leading to a situation where full movement of an instance around the cluster
was not possible. This version fixes the issue by using socat to transfer
instance data. While socat is less secure than SSH, it is about as secure as
xm migrations, and occurs over the secondary network if present. As a
consequence of this change, Xen instance migrations using xl cannot occur
between nodes running 2.13.0 and 2.13.1.
Other fixes and known issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Inherited from 2.12:
- Fixed Issue #1082: RAPI is unresponsive after master-failover
- Fixed Issue #1083: Cluster verify reports existing instance disks on
non-default VGs as missing
- Fixed Issue #1101: Modifying the storage directory for the shared-file disk
template doesn't work
- Fixed a possible file descriptor leak when forking jobs
- Fixed missing private parameters in the environment for OS scripts
- Fixed a performance regression when handling configuration
(only upgrade it if it changes)
- Adapt for compilation with GHC7.8 (compiles with warnings;
cherrypicked from 2.14)
Known issues:
- Issue #1094: Mismatch in SSL encodings breaks RPC communication
- Issue #1104: Export fails: key is too small
Version 2.13.0
--------------
*(Released Tue, 28 Apr 2015)*
Incompatible/important changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Ganeti now internally retries the instance creation opcode if opportunistic
locking did not acquire nodes with enough free resources. The internal retry
will not use opportunistic locking. In particular, instance creation, even
if opportunistic locking is set, will never fail with ECODE_TEMP_NORES.
- The handling of SSH security has undergone a significant change. From
this version on, each node has an individual SSH key pair instead of
sharing one with all nodes of the cluster. SSH access is also restricted
to master candidates: only master candidates can SSH into other cluster
nodes, non-master-candidates cannot. Refer to the UPGRADE notes
for further instructions on the creation and distribution of the keys.
- Ganeti now checks hypervisor version compatibility before trying an instance
migration. It errors out if the versions are not compatible. Pass the option
--ignore-hvversions to restore the old behavior of only warning (see the
example after this list).
- Node tags starting with htools:migration: or htools:allowmigration: now have
a special meaning to htools(1). See hbal(1) for details.
- The LXC hypervisor code has been repaired and improved. Instances cannot be
migrated and cannot have more than one disk, but should otherwise work as with
other hypervisors. OS script changes should not be necessary. LXC version
1.0.0 or higher is required.
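The example referred to above: assuming a hypothetical instance name, a
migration despite mismatching hypervisor versions could be forced with::

  gnt-instance migrate --ignore-hvversions inst1.example.com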
New features
~~~~~~~~~~~~
- A new job filter rules system allows defining iptables-like rules for the
job scheduler, making it easier to (soft-)drain the job queue, perform
maintenance, and rate-limit selected job types. See gnt-filter(8) for
details.
- Ganeti jobs can now be ad-hoc rate limited via the reason trail.
For a set of jobs queued with "--reason=rate-limit:n:label", the job
scheduler ensures that no more than n will be scheduled to run at the same
time. See ganeti(7), section "Options", for details; an illustrative use is
sketched after this list.
- The monitoring daemon now has variable sleep times for the data
collectors. This currently means that the granularity of cpu-avg-load
can be configured.
- The 'gnt-cluster verify' command now has the option
'--verify-ssh-clutter', which verifies whether Ganeti (accidentally)
cluttered up the 'authorized_keys' file.
- Instance disks can now be converted from one disk template to another for many
different template combinations. When available, more efficient conversions
will be used, otherwise the disks are simply copied over.
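The rate-limit example referred to above: the following queues several
reboot jobs of which at most two run concurrently (instance names and label
are hypothetical; flag availability per gnt-instance(8))::

  for inst in inst1 inst2 inst3 inst4; do
    gnt-instance reboot --submit --reason=rate-limit:2:nightly-reboot "$inst"
  done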
New dependencies
~~~~~~~~~~~~~~~~
- The monitoring daemon uses the PSQueue library. Be sure to install it
if you use Mond.
- The formerly optional regex-pcre is now an unconditional dependency because
the new job filter rules have regular expressions as a core feature.
Since 2.13.0 rc1
~~~~~~~~~~~~~~~~~~
The following issues have been fixed:
- Bugs related to ssh-key handling of master candidates (issues 1045,
1046, 1047)
Fixes inherited from the 2.12 branch:
- Upgrade from old versions (2.5 and 2.6) was failing (issues 1070, 1019).
- gnt-network info outputs wrong external reservations (issue 1068)
- Refuse to demote master from master capability (issue 1023)
Version 2.13.0 rc1
------------------
*(Released Wed, 25 Mar 2015)*
This was the first release candidate of the 2.13 series.
All important changes are listed in the latest 2.13 entry.
Since 2.13.0 beta1
~~~~~~~~~~~~~~~~~~
The following issues have been fixed:
- Issue 1018: Cluster init (and possibly other jobs) occasionally fail to start
Version 2.13.0 beta1
--------------------
*(Released Wed, 14 Jan 2015)*
This was the first beta release of the 2.13 series. All important changes
are listed in the latest 2.13 entry.
Version 2.12.6
--------------
*(Released Mon, 14 Dec 2015)*
Important changes and security notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Security release.
CVE-2015-7944
Ganeti provides a RESTful control interface called the RAPI. Its HTTPS
implementation is vulnerable to DoS attacks via client-initiated SSL
parameter renegotiation. While the interface is not meant to be exposed
publicly, due to the fact that it binds to all interfaces, we believe
some users might be exposing it unintentionally and are vulnerable. A
DoS attack can consume resources meant for Ganeti daemons and instances
running on the master node, making both perform badly.
Fixes are not feasible due to the OpenSSL Python library not exposing
functionality needed to disable client-side renegotiation. Instead, we
offer instructions on how to control RAPI's exposure, along with info
on how RAPI can be set up alongside an HTTPS proxy in case users still
want or need to expose the RAPI interface. The instructions are
outlined in Ganeti's security document: doc/html/security.html
CVE-2015-7945
Ganeti leaks the DRBD secret through the RAPI interface. Examining job
results after an instance information job reveals the secret. With the
DRBD secret, access to the local cluster network, and ARP poisoning,
an attacker can impersonate a Ganeti node and clone the disks of a
DRBD-based instance. While an attacker with access to the cluster
network is already capable of accessing any data written as DRBD
traffic is unencrypted, having the secret expedites the process and
allows access to the entire disk.
Fixes contained in this release prevent the secret from being exposed
via the RAPI. The DRBD secret can be changed by converting an instance
to plain and back to DRBD, generating a new secret, but redundancy will
be lost until the process completes.
Since attackers with node access are capable of accessing some and
potentially all data even without the secret, we do not recommend that
the secret be changed for existing instances.
Minor changes
~~~~~~~~~~~~~
- Calculate correct affected nodes set in InstanceChangeGroup
(Issue 1144)
- Do not retry all requests after connection timeouts to prevent
repeated job submission
- Fix reason trails of expanding opcodes
- Make lockConfig call retryable
- Return the correct error code in the post-upgrade script
- Make OpenSSL refrain from DH altogether
- Fix upgrades of instances with missing creation time
- Make htools tolerate missing "dtotal" and "dfree" on luxi
- Fix default for --default-iallocator-params
- At IAlloc backend guess state from admin state
- Only search for Python-2 interpreters
- Handle Xen 4.3 states better
- replace-disks: fix --ignore-ipolicy
- Fix disabling of user shutdown reporting
- Fix operations on empty nodes by accepting allocation of 0 jobs
- Fix instance multi allocation for non-DRBD disks
- Allow more failover options when using the --no-disk-moves flag
Version 2.12.5
--------------
*(Released Mon, 13 Jul 2015)*
Incompatible/important changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- This release contains a fix for the problem that different encodings in
SSL certificates can break RPC communication (issue 1094). The fix makes
it necessary to rerun 'gnt-cluster renew-crypto --new-node-certificates'
after the cluster is fully upgraded to 2.12.5.
Fixes and improvements
~~~~~~~~~~~~~~~~~~~~~~
- Fixed Issue #1030: GlusterFS support breaks at upgrade to 2.12 -
switches back to shared-file
- Fixed Issue #1094 (see the notice in Incompatible/important changes):
Differences in encodings of SSL certificates can render a cluster
uncommunicative after a master-failover
- Fixed Issue #1098: Support for ECDSA SSH keys
- Fixed Issue #1100: Filter-evaluation for run-time data filter
- Fixed Issue #1101: Modifying the storage directory for the shared-file
disk template doesn't work
- Fixed Issue #1108: Spurious "NIC name already used" errors during
instance creation
- Fixed Issue #1114: Binding RAPI to a specific IP makes the watcher
restart the RAPI
- Fixed Issue #1115: Race between starting WConfD and updating the config
- Better handling of the "crashed" Xen state
- The ``htools`` now properly work also on shared-storage clusters
- Various improvements to the documentation have been added
Inherited from the 2.11 branch:
- Fixed Issue #1113: Reduce amount of logging on successful requests
Known issues
~~~~~~~~~~~~
- Issue #1104: gnt-backup: dh key too small
Version 2.12.4
--------------
*(Released Tue, 12 May 2015)*
- Fixed Issue #1082: RAPI is unresponsive after master-failover
- Fixed Issue #1083: Cluster verify reports existing instance disks on
non-default VGs as missing
- Fixed a possible file descriptor leak when forking jobs
- Fixed missing private parameters in the environment for OS scripts
- Fixed a performance regression when handling configuration
(only upgrade it if it changes)
- Adapt for compilation with GHC7.8 (compiles with warnings;
cherrypicked from 2.14)
Known issues
~~~~~~~~~~~~
Pending since 2.12.2:
- Under certain conditions instance doesn't get unpaused after live
migration (issue #1050)
- GlusterFS support breaks at upgrade to 2.12 - switches back to
shared-file (issue #1030)
Version 2.12.3
--------------
*(Released Wed, 29 Apr 2015)*
- Fixed Issue #1019: upgrade from 2.6.2 to 2.12 fails. cfgupgrade
doesn't migrate the config.data file properly
- Fixed Issue 1023: Master master-capable option bug
- Fixed Issue 1068: gnt-network info outputs wrong external reservations
- Fixed Issue 1070: Upgrade of Ganeti 2.5.2 to 2.12.0 fails due to
missing UUIDs for disks
- Fixed Issue 1073: ssconf_hvparams_* not distributed with ssconf
Inherited from the 2.11 branch:
- Fixed Issue 1032: Renew-crypto --new-node-certificates sometimes does not
complete.
The operation 'gnt-cluster renew-crypto --new-node-certificates' is
now more robust against intermittent reachability errors. Nodes that
are temporarily not reachable are contacted with several retries.
Nodes which are marked as offline are omitted right away.
Inherited from the 2.10 branch:
- Fixed Issue 1057: master-failover succeeds, but IP remains assigned to
old master
- Fixed Issue 1058: Python's os.minor() does not support devices with
high minor numbers
- Fixed Issue 1059: Luxid fails if DNS returns an IPv6 address that does
not reverse resolve
Known issues
~~~~~~~~~~~~
Pending since 2.12.2:
- GHC 7.8 introduced some incompatible changes, so currently Ganeti
2.12 doesn't compile on GHC 7.8
- Under certain conditions instance doesn't get unpaused after live
migration (issue #1050)
- GlusterFS support breaks at upgrade to 2.12 - switches back to
shared-file (issue #1030)
Version 2.12.2
--------------
*(Released Wed, 25 Mar 2015)*
- Support for the lens Haskell library up to version 4.7 (issue #1028)
- SSH keys are now distributed only to master and master candidates
(issue #377)
- Improved performance for operations that frequently read the
cluster configuration
- Improved robustness of spawning job processes that occasionally caused
newly-started jobs to time out
- Fixed race condition during cluster verify which occasionally caused
it to fail
Inherited from the 2.11 branch:
- Fix failing automatic glusterfs mounts (issue #984)
- Fix watcher failing to read its status file after an upgrade
(issue #1022)
- Improve Xen instance state handling, in particular of somewhat exotic
transitional states
Inherited from the 2.10 branch:
- Fix failing to change a diskless drbd instance to plain
(issue #1036)
- Fixed issues with auto-upgrades from pre-2.6
(hv_state_static and disk_state_static)
- Fix memory leak in the monitoring daemon
Inherited from the 2.9 branch:
- Fix file descriptor leak in Confd client
Known issues
~~~~~~~~~~~~
- GHC 7.8 introduced some incompatible changes, so currently Ganeti
2.12 doesn't compile on GHC 7.8
- Under certain conditions instance doesn't get unpaused after live
migration (issue #1050)
- GlusterFS support breaks at upgrade to 2.12 - switches back to
shared-file (issue #1030)
Version 2.12.1
--------------
*(Released Wed, 14 Jan 2015)*
- Fix users under which the wconfd and metad daemons run (issue #976)
- Clean up stale livelock files (issue #865)
- Fix setting up the metadata daemon's network interface for Xen
- Make watcher identify itself on disk activation
- Add "ignore-ipolicy" option to gnt-instance grow-disk
- Check disk size ipolicy during "gnt-instance grow-disk" (issue #995)
Inherited from the 2.11 branch:
- Fix counting votes when doing master failover (issue #962)
- Fix broken haskell dependencies (issues #758 and #912)
- Check if IPv6 is used directly when running SSH (issue #892)
Inherited from the 2.10 branch:
- Fix typo in gnt_cluster output (issue #1015)
- Use the Python path detected at configure time in the top-level Python
scripts.
- Fix check for sphinx-build from python2-sphinx
- Properly check if an instance exists in 'gnt-instance console'
Version 2.12.0
--------------
*(Released Fri, 10 Oct 2014)*
Incompatible/important changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Ganeti is now distributed under the 2-clause BSD license.
See the COPYING file.
- Do not use debug mode in production. Certain daemons will issue warnings
when launched in debug mode. Some debug logging violates some of the new
invariants in the system (see "New features"). The logging has been kept as
it aids diagnostics and development.
New features
~~~~~~~~~~~~
- OS install script parameters now come in public, private and secret
varieties:
- Public parameters are like all other parameters in Ganeti.
- Ganeti will not log private and secret parameters, *unless* it is running
in debug mode.
- Ganeti will not save secret parameters to configuration. Secret parameters
must be supplied every time you install, or reinstall, an instance.
- Attempting to override public parameters with private or secret parameters
results in an error. Similarly, you may not use secret parameters to
override private parameters.
- The move-instance tool can now attempt to allocate an instance by using
opportunistic locking when an iallocator is used.
- The build system creates sample systemd unit files, available under
doc/examples/systemd. These unit files allow systemd to natively
manage and supervise all Ganeti processes.
- Different types of compression can be applied during instance moves, including
user-specified ones.
- Ganeti jobs now run as separate processes. The jobs are coordinated by
a new daemon "WConfd" that manages the cluster's configuration and locks
for individual jobs. A consequence is that more jobs can run in parallel;
the number is run-time configurable, see the "New features" entry
of 2.11.0. To avoid luxid being overloaded with tracking running jobs, it
backs off and only occasionally, in a sequential way, checks if jobs have
finished and schedules new ones. In this way, luxid remains responsive under
high cluster load. The limit at which it starts backing off is also run-time
configurable.
- The metadata daemon is now optionally available, as part of the
partial implementation of the OS-installs design. It allows passing
information to OS install scripts or to instances.
It is also possible to run Ganeti without the daemon, if desired.
- Detection of user shutdown of instances has been implemented for Xen
as well.
New dependencies
~~~~~~~~~~~~~~~~
- The KVM CPU pinning no longer uses the affinity python package, but psutil
instead. The package is still optional and needed only if the feature is to
be used.
Incomplete features
~~~~~~~~~~~~~~~~~~~
The following issues are related to features which are not completely
implemented in 2.12:
- Issue 885: Network hotplugging on KVM sometimes makes an instance
unresponsive
- Issues 708 and 602: The secret parameters are currently still written
to disk in the job queue.
- Setting up the metadata network interface under Xen isn't fully
implemented yet.
Known issues
~~~~~~~~~~~~
- *Wrong UDP checksums in DHCP network packets:*
If an instance communicates with the metadata daemon and uses DHCP to
obtain its IP address on the provided virtual network interface,
it can happen that UDP packets have a wrong checksum, due to
a bug in virtio. See for example https://bugs.launchpad.net/bugs/930962
Ganeti works around this bug by disabling the UDP checksums on the way
from a host to instances (only on the special metadata communication
network interface) using the ethtool command. Therefore, if the metadata
daemon is used, the host nodes should have this tool available (a rough
illustration of the adjustment follows this list).
- The metadata daemon is run as root in the split-user mode, to be able
to bind to port 80.
This should be improved in future versions, see issue #949.
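For reference only: the adjustment Ganeti performs automatically corresponds
roughly to a manual ethtool call of the following kind (the interface name is
a placeholder and the exact offload setting may differ; no manual action is
normally required)::

  # IFACE: the instance's metadata communication interface on the host (placeholder)
  ethtool -K "$IFACE" tx off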
Since 2.12.0 rc2
~~~~~~~~~~~~~~~~
The following issues have been fixed:
- Fixed passing additional parameters to RecreateInstanceDisks over
RAPI.
- Fixed the permissions of WConfd when running in the split-user mode.
As WConfd takes over the previous master daemon to manage the
configuration, it currently runs under the masterd user.
- Fixed the permissions of the metadata daemon when running in the
split-user mode (see Known issues).
- Watcher now properly adds a reason trail entry when initiating disk
checks.
- Fixed removing KVM parameters introduced in 2.12 when downgrading a
cluster to 2.11: "migration_caps", "disk_aio" and "virtio_net_queues".
- Improved retrying of RPC calls that fail due to network errors.
Version 2.12.0 rc2
------------------
*(Released Mon, 22 Sep 2014)*
This was the second release candidate of the 2.12 series.
All important changes are listed in the latest 2.12 entry.
Since 2.12.0 rc1
~~~~~~~~~~~~~~~~
The following issues have been fixed:
- Watcher now checks if WConfd is running and functional.
- Watcher now properly adds reason trail entries.
- Fixed NIC options in Xen's config files.
Inherited from the 2.10 branch:
- Fixed handling of the --online option
- Add warning against hvparam changes with live migrations, which might
lead to dangerous situations for instances.
- Only the LVs in the configured VG are checked during cluster verify.
Version 2.12.0 rc1
------------------
*(Released Wed, 20 Aug 2014)*
This was the first release candidate of the 2.12 series.
All important changes are listed in the latest 2.12 entry.
Since 2.12.0 beta1
~~~~~~~~~~~~~~~~~~
The following issues have been fixed:
- Issue 881: Handle communication errors in mcpu
- Issue 883: WConfd leaks memory for some long operations
- Issue 884: Under heavy load the IAllocator fails with a "missing
instance" error
Inherited from the 2.10 branch:
- Improve the recognition of Xen domU states
- Automatic upgrades:
- Create the config backup archive in a safe way
- On upgrades, check for upgrades to resume first
- Pause watcher during upgrade
- Allow instance disks to be added with --no-wait-for-sync
Version 2.12.0 beta1
--------------------
*(Released Mon, 21 Jul 2014)*
This was the first beta release of the 2.12 series. All important changes
are listed in the latest 2.12 entry.
Version 2.11.8
--------------
*(Released Mon, 14 Dec 2015)*
Important changes and security notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Security release.
CVE-2015-7944
Ganeti provides a RESTful control interface called the RAPI. Its HTTPS
implementation is vulnerable to DoS attacks via client-initiated SSL
parameter renegotiation. While the interface is not meant to be exposed
publicly, due to the fact that it binds to all interfaces, we believe
some users might be exposing it unintentionally and are vulnerable. A
DoS attack can consume resources meant for Ganeti daemons and instances
running on the master node, making both perform badly.
Fixes are not feasible due to the OpenSSL Python library not exposing
functionality needed to disable client-side renegotiation. Instead, we
offer instructions on how to control RAPI's exposure, along with info
on how RAPI can be set up alongside an HTTPS proxy in case users still
want or need to expose the RAPI interface. The instructions are
outlined in Ganeti's security document: doc/html/security.html
CVE-2015-7945
Ganeti leaks the DRBD secret through the RAPI interface. Examining job
results after an instance information job reveals the secret. With the
DRBD secret, access to the local cluster network, and ARP poisoning,
an attacker can impersonate a Ganeti node and clone the disks of a
DRBD-based instance. While an attacker with access to the cluster
network is already capable of accessing any data written as DRBD
traffic is unencrypted, having the secret expedites the process and
allows access to the entire disk.
Fixes contained in this release prevent the secret from being exposed
via the RAPI. The DRBD secret can be changed by converting an instance
to plain and back to DRBD, generating a new secret, but redundancy will
be lost until the process completes.
Since attackers with node access are capable of accessing some and
potentially all data even without the secret, we do not recommend that
the secret be changed for existing instances.
Minor changes
~~~~~~~~~~~~~
- Make htools tolerate missing "dtotal" and "dfree" on luxi
- Fix default for --default-iallocator-params
- At IAlloc backend guess state from admin state
- replace-disks: fix --ignore-ipolicy
- Fix instance multi allocation for non-DRBD disks
- Trigger renew-crypto on downgrade to 2.11
- Downgrade log-message for rereading job
- Downgrade log-level for successful requests
- Check for gnt-cluster before running gnt-cluster upgrade
Version 2.11.7
--------------
*(Released Fri, 17 Apr 2015)*
- The operation 'gnt-cluster renew-crypto --new-node-certificates' is
now more robust against intermittent reachability errors. Nodes that
are temporarily not reachable are contacted with several retries.
Nodes which are marked as offline are omitted right away.
Version 2.11.6
--------------
*(Released Mon, 22 Sep 2014)*
- Ganeti is now distributed under the 2-clause BSD license.
See the COPYING file.
- Fix userspace access checks.
- Various documentation fixes have been added.
Inherited from the 2.10 branch:
- The --online option now works as documented.
- The watcher is paused during cluster upgrades; also, upgrade
checks for upgrades to resume first.
- Instance disks can be added with --no-wait-for-sync.
Version 2.11.5
--------------
*(Released Thu, 7 Aug 2014)*
Inherited from the 2.10 branch:
Important security release. In 2.10.0, the
'gnt-cluster upgrade' command was introduced. Before
performing an upgrade, the configuration directory of
the cluster is backed up. Unfortunately, the archive was
written with permissions that make it possible for
non-privileged users to read the archive and thus have
access to cluster and RAPI keys. After this release,
the archive will be created with privileged access only.
We strongly advise you to restrict the permissions of
previously created archives. The archives are found in
/var/lib/ganeti*.tar (unless otherwise configured with
--localstatedir or --with-backup-dir).
If you suspect that non-privileged users have accessed
your archives already, we advise you to renew the
cluster's crypto keys using 'gnt-cluster renew-crypto'
and to reset the RAPI credentials by editing
/var/lib/ganeti/rapi_users (or under a
different path if configured differently with
--localstatedir).
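A hedged sketch of the suggested cleanup, assuming the default locations
mentioned above::

  # restrict access to already existing configuration backup archives
  chmod 0600 /var/lib/ganeti*.tar
  # renew the cluster's crypto keys (add the --new-* options appropriate
  # for your setup)
  gnt-cluster renew-crypto
  # reset the RAPI credentials
  ${EDITOR:-vi} /var/lib/ganeti/rapi_users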
Other changes included in this release:
- Fix handling of Xen instance states.
- Fix NIC configuration with absent NIC VLAN
- Adapt relative path expansion in PATH to new environment
- Exclude archived jobs from configuration backups
- Fix RAPI for split query setup
- Allow disk hot-remove even with chroot or SM
Inherited from the 2.9 branch:
- Make htools tolerate missing 'spfree' on luxi
Version 2.11.4
--------------
*(Released Thu, 31 Jul 2014)*
- Improved documentation of the instance shutdown behavior.
Inherited from the 2.10 branch:
- KVM: fix NIC configuration with absent NIC VLAN (Issue 893)
- Adapt relative path expansion in PATH to new environment
- Exclude archived jobs from configuration backup
- Expose early_release for ReplaceInstanceDisks
- Add backup directory for configuration backups for upgrades
- Fix BlockdevSnapshot in case of non lvm-based disk
- Improve RAPI error handling for queries in non-existing items
- Allow disk hot-remove even with chroot or SM
- Remove superfluous loop in instance queries (Issue 875)
Inherited from the 2.9 branch:
- Make ganeti-cleaner switch to save working directory (Issue 880)
Version 2.11.3
--------------
*(Released Wed, 9 Jul 2014)*
- Readd nodes to their previous node group
- Remove old-style gnt-network connect
Inherited from the 2.10 branch:
- Make network_vlan an optional OpParam
- hspace: support --accept-existing-errors
- Make hspace support --independent-groups
- Add a modifier for a group's allocation policy
- Export VLAN nicparam to NIC configuration scripts
- Fix gnt-network client to accept vlan info
- Support disk hotplug with userspace access
Inherited from the 2.9 branch:
- Make htools tolerate missing "spfree" on luxi
- Move the design for query splitting to the implemented list
- Add tests for DRBD setups with empty first resource
Inherited from the 2.8 branch:
- DRBD parser: consume initial empty resource lines
Version 2.11.2
--------------
*(Released Fri, 13 Jun 2014)*
- Improvements to KVM wrt the kvmd and instance shutdown behavior.
WARNING: In contrast to our standard policy, this bug fix update
introduces new parameters to the configuration. This means in
particular that after an upgrade from 2.11.0 or 2.11.1, 'cfgupgrade'
needs to be run, either manually or by running
'gnt-cluster upgrade --to 2.11.2' (which requires that the cluster
was configured with --enable-versionfull).
This also means that it is not easily possible to downgrade from
2.11.2 to 2.11.1 or 2.11.0; the only way is to go down to 2.10 and
then back up.
Inherited from the 2.10 branch:
- Check for SSL encoding inconsistencies
- Check drbd helper only in VM capable nodes
- Improvements in statistics utils
Inherited from the 2.9 branch:
- check-man-warnings: use C.UTF-8 and set LC_ALL
Version 2.11.1
--------------
*(Released Wed, 14 May 2014)*
- Add design-node-security.rst to docinput
- kvm: use a dedicated QMP socket for kvmd
Inherited from the 2.10 branch:
- Set correct Ganeti version on setup commands
- Add a utility to combine shell commands
- Add design doc for performance tests
- Fix failed DRBD disk creation cleanup
- Hooking up verification for shared file storage
- Fix --shared-file-storage-dir option of gnt-cluster modify
- Clarify default setting of 'metavg'
- Fix invocation of GetCommandOutput in QA
- Clean up RunWithLocks
- Add an exception-trapping thread class
- Wait for delay to provide interruption information
- Add an expected block option to RunWithLocks
- Track if a QA test was blocked by locks
- Add a RunWithLocks QA utility function
- Add restricted migration
- Add an example for node evacuation
- Add a test for parsing version strings
- Tests for parallel job execution
- Fail in replace-disks if attaching disks fails
- Fix passing of ispecs in cluster init during QA
- Move QAThreadGroup to qa_job_utils.py
- Extract GetJobStatuses and use an unified version
- Run disk template specific tests only if possible
Inherited from the 2.9 branch:
- If Automake version > 1.11, force serial tests
- KVM: set IFF_ONE_QUEUE on created tap interfaces
- Add configure option to pass GHC flags
Version 2.11.0
--------------
*(Released Fri, 25 Apr 2014)*
Incompatible/important changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- ``gnt-node list`` no longer shows disk space information for shared file
disk templates because it is not a node attribute. (For example, if you have
both the file and shared file disk templates enabled, ``gnt-node list`` now
only shows information about the file disk template.)
- The shared file disk template is now in the new 'sharedfile' storage type.
As a result, ``gnt-node list-storage -t file`` now only shows information
about the file disk template and you may use ``gnt-node list-storage -t
sharedfile`` to query storage information for the shared file disk template.
- Over luxi, syntactically incorrect queries are now rejected as a whole;
before, a 'SubmitManyJobs' request was partially executed if the outer
structure of the request was syntactically correct. As the luxi protocol
is internal (external applications are expected to use RAPI), the impact
of this incompatible change should be limited.
- Queries for nodes, instances, groups, backups and networks are now
exclusively done via the luxi daemon. Legacy python code was removed,
as well as the --enable-split-queries configuration option.
- Orphan volumes errors are demoted to warnings and no longer affect the exit
code of ``gnt-cluster verify``.
- RPC security got enhanced by using different client SSL certificates
for each node. In this context 'gnt-cluster renew-crypto' got a new
option '--renew-node-certificates', which renews the client
certificates of all nodes. After a cluster upgrade from pre-2.11, run
this to create client certificates and activate this feature.
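As described above, after upgrading from pre-2.11 the client certificates
are created by running::

  gnt-cluster renew-crypto --renew-node-certificates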
New features
~~~~~~~~~~~~
- Instance moves, backups and imports can now use compression to transfer the
instance data.
- Node groups can be configured to use an SSH port different than the
default 22.
- Added experimental support for Gluster distributed file storage as the
``gluster`` disk template under the new ``sharedfile`` storage type through
automatic management of per-node FUSE mount points. You can configure the
mount point location at ``gnt-cluster init`` time by using the new
``--gluster-storage-dir`` switch.
- Job scheduling is now handled by luxid, and the maximal number of jobs running
in parallel is a run-time parameter of the cluster.
- A new tool for planning dynamic power management, called ``hsqueeze``, has
been added. It suggests nodes to power up or down and corresponding instance
moves.
New dependencies
~~~~~~~~~~~~~~~~
The following new dependencies have been added:
For Haskell:
- ``zlib`` library (http://hackage.haskell.org/package/zlib)
- ``base64-bytestring`` library
(http://hackage.haskell.org/package/base64-bytestring),
at least version 1.0.0.0
- ``lifted-base`` library (http://hackage.haskell.org/package/lifted-base)
- ``lens`` library (http://hackage.haskell.org/package/lens)
Since 2.11.0 rc1
~~~~~~~~~~~~~~~~
- Fix Xen instance state
Inherited from the 2.10 branch:
- Fix conflict between virtio + spice or soundhw
- Fix bitarray ops wrt PCI slots
- Allow releases scheduled 5 days in advance
- Make watcher submit queries low priority
- Fix specification of TIDiskParams
- Add unittests for instance modify parameter renaming
- Add renaming of instance custom params
- Add RAPI symmetry tests for groups
- Extend RAPI symmetry tests with RAPI-only aliases
- Add test for group custom parameter renaming
- Add renaming of group custom ndparams, ipolicy, diskparams
- Add the RAPI symmetry test for nodes
- Add aliases for nodes
- Allow choice of HTTP method for modification
- Add cluster RAPI symmetry test
- Fix failing cluster query test
- Add aliases for cluster parameters
- Add support for value aliases to RAPI
- Provide tests for GET/PUT symmetry
- Sort imports
- Also consider filter fields for deciding if using live data
- Document the python-fdsend dependency
- Verify configuration version number before parsing
- KVM: use running HVPs to calc blockdev options
- KVM: reserve a PCI slot for the SCSI controller
- Check for LVM-based verification results only when enabled
- Fix "existing" typos
- Fix output of gnt-instance info after migration
- Warn in UPGRADE about not tar'ing exported insts
- Fix non-running test and remove custom_nicparams rename
- Account for NODE_RES lock in opportunistic locking
- Fix request flooding of noded during disk sync
Inherited from the 2.9 branch:
- Make watcher submit queries low priority
- Fix failing gnt-node list-drbd command
- Update installation guide wrt DRBD version
- Fix list-drbd QA test
- Add messages about skipped QA disk template tests
- Allow QA asserts to produce more messages
- Set exclusion tags correctly in requested instance
- Export extractExTags and updateExclTags
- Document spindles in the hbal man page
- Sample logrotate conf breaks permissions with split users
- Fix 'gnt-cluster' and 'gnt-node list-storage' outputs
Inherited from the 2.8 branch:
- Add reason parameter to RAPI client functions
- Include qa/patch in Makefile
- Handle empty patches better
- Move message formatting functions to separate file
- Add optional ordering of QA patch files
- Allow multiple QA patches
- Refactor current patching code
Version 2.11.0 rc1
------------------
*(Released Thu, 20 Mar 2014)*
This was the first RC release of the 2.11 series. Since 2.11.0 beta1:
- Convert int to float when checking config. consistency
- Rename compression option in gnt-backup export
Inherited from the 2.9 branch:
- Fix error introduced during merge
- gnt-cluster copyfile: accept relative paths
Inherited from the 2.8 branch:
- Improve RAPI detection of the watcher
- Add patching QA configuration files on buildbots
- Enable a timeout for instance shutdown
- Allow KVM commands to have a timeout
- Allow xen commands to have a timeout
- Fix wrong docstring
Version 2.11.0 beta1
--------------------
*(Released Wed, 5 Mar 2014)*
This was the first beta release of the 2.11 series. All important changes
are listed in the latest 2.11 entry.
Version 2.10.8
--------------
*(Released Fri, 11 Dec 2015)*
Important changes and security notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Security release.
CVE-2015-7944
Ganeti provides a RESTful control interface called the RAPI. Its HTTPS
implementation is vulnerable to DoS attacks via client-initiated SSL
parameter renegotiation. While the interface is not meant to be exposed
publicly, due to the fact that it binds to all interfaces, we believe
some users might be exposing it unintentionally and are vulnerable. A
DoS attack can consume resources meant for Ganeti daemons and instances
running on the master node, making both perform badly.
Fixes are not feasible due to the OpenSSL Python library not exposing
functionality needed to disable client-side renegotiation. Instead, we
offer instructions on how to control RAPI's exposure, along with info
on how RAPI can be set up alongside an HTTPS proxy in case users still
want or need to expose the RAPI interface. The instructions are
outlined in Ganeti's security document: doc/html/security.html
CVE-2015-7945
Ganeti leaks the DRBD secret through the RAPI interface. Examining job
results after an instance information job reveals the secret. With the
DRBD secret, access to the local cluster network, and ARP poisoning,
an attacker can impersonate a Ganeti node and clone the disks of a
DRBD-based instance. While an attacker with access to the cluster
network is already capable of accessing any data written as DRBD
traffic is unencrypted, having the secret expedites the process and
allows access to the entire disk.
Fixes contained in this release prevent the secret from being exposed
via the RAPI. The DRBD secret can be changed by converting an instance
to plain and back to DRBD, generating a new secret, but redundancy will
be lost until the process completes.
Since attackers with node access are capable of accessing some and
potentially all data even without the secret, we do not recommend that
the secret be changed for existing instances.
Minor changes
~~~~~~~~~~~~~
- Make htools tolerate missing "dtotal" and "dfree" on luxi
- At IAlloc backend guess state from admin state
- replace-disks: fix --ignore-ipolicy
- Fix instance multi allocation for non-DRBD disks
- Check for gnt-cluster before running gnt-cluster upgrade
- Work around a Python os.minor bug
- Add IP-related checks after master-failover
- Pass correct backend params in move-instance
- Allow plain/DRBD conversions regardless of lack of disks
- Fix MonD collector thunk leak
- Stop MonD when removing a node from a cluster
- Finalize backup only if successful
- Fix file descriptor leak in Confd Client
- Auto-upgrade hv_state_static and disk_state_static
- Do not hardcode the Python path in CLI tools
- Use the Python interpreter from ENV
- ganeti.daemon: fix daemon mode with GnuTLS >= 3.3 (Issues 961, 964)
- Ganeti.Daemon: always install SIGHUP handler (Issue 755)
- Fix DRBD version check for non VM capable nodes
- Fix handling of the --online option
- Add warning against hvparam changes with live migrations
- Only verify LVs in configured VG during cluster verify
- Fix network info in case of multi NIC instances
- On upgrades, check for upgrades to resume first
- Pause watcher during upgrade
- Allow instance disks to be added with --no-wait-for-sync
Version 2.10.7
--------------
*(Released Thu, 7 Aug 2014)*
Important security release. In 2.10.0, the
'gnt-cluster upgrade' command was introduced. Before
performing an upgrade, the configuration directory of
the cluster is backed up. Unfortunately, the archive was
written with permissions that make it possible for
non-privileged users to read the archive and thus have
access to cluster and RAPI keys. After this release,
the archive will be created with privileged access only.
We strongly advise you to restrict the permissions of
previously created archives. The archives are found in
/var/lib/ganeti*.tar (unless otherwise configured with
--localstatedir or --with-backup-dir).
If you suspect that non-privileged users have accessed
your archives already, we advise you to renew the
cluster's crypto keys using 'gnt-cluster renew-crypto'
and to reset the RAPI credentials by editing
/var/lib/ganeti/rapi_users (or under a
different path if configured differently with
--localstatedir).
Other changes included in this release:
- Fix handling of Xen instance states.
- Fix NIC configuration with absent NIC VLAN
- Adapt relative path expansion in PATH to new environment
- Exclude archived jobs from configuration backups
- Fix RAPI for split query setup
- Allow disk hot-remove even with chroot or SM
Inherited from the 2.9 branch:
- Make htools tolerate missing 'spfree' on luxi
Version 2.10.6
--------------
*(Released Mon, 30 Jun 2014)*
- Make Ganeti tolerant towards different OpenSSL library
versions on different nodes (issue 853).
- Allow hspace to make useful predictions in multi-group
clusters with one group overfull (issue 861).
- Various gnt-network related fixes.
- Fix disk hotplug with userspace access.
- Various documentation errors fixed.
Version 2.10.5
--------------
*(Released Mon, 2 Jun 2014)*
- Two new options have been added to gnt-group evacuate.
The 'sequential' option forces all the evacuation steps to
be carried out sequentially, thus avoiding congestion on a
slow link between node groups. The 'force-failover' option
disallows migrations and forces failovers to be used instead.
In this way, evacuation to a group with a vastly different
hypervisor is possible.
- In tiered allocation, when looking for ways to shrink
an instance, the canonical path is tried first, i.e., in each
step the resource on which most placements are blocked is reduced. Only
if no smaller fitting instance can be found that way is a single
resource shrunk until the instance fits.
- When finding the placement of an instance, computations that are shared
between the various cluster scores are now performed only
once. This significantly improves the performance of hspace for DRBD
on large clusters; for other clusters, a slight performance decrease
might occur. Moreover, due to the changed order, floating point
number inaccuracies accumulate differently, thus resulting in different
cluster scores. It has been verified that the effect of these different
roundings is less than 1e-12.
- network queries fixed with respect to instances
- relax too strict prerequisite in LUClusterSetParams for DRBD helpers
- Various improvements to QA and build-time tests
Version 2.10.4
--------------
*(Released Thu, 15 May 2014)*
- Support restricted migration in hbal
- Fix for the --shared-file-storage-dir of gnt-cluster modify (issue 811)
- Fail in replace-disks if attaching disks fails (issue 814)
- Set IFF_ONE_QUEUE on created tap interfaces for KVM
- Small fixes and enhancements in the build system
- Various documentation fixes (e.g. issue 810)
Version 2.10.3
--------------
*(Released Wed, 16 Apr 2014)*
- Fix filtering of pending jobs with -o id (issue 778)
- Make RAPI API calls more symmetric (issue 770)
- Make parsing of old cluster configuration more robust (issue 783)
- Fix wrong output of gnt-instance info after migrations
- Fix reserved PCI slots for KVM hotplugging
- Use runtime hypervisor parameters to calculate block device options for KVM
- Fix high node daemon load during disk sync if the sync is paused manually
(issue 792)
- Improve opportunistic locking during instance creation (issue 791)
Inherited from the 2.9 branch:
- Make watcher submit queries low priority (issue 772)
- Add reason parameter to RAPI client functions (issue 776)
- Fix failing gnt-node list-drbd command (issue 777)
- Properly display fake job locks in gnt-debug.
- small fixes in documentation
Version 2.10.2
--------------
*(Released Mon, 24 Mar 2014)*
- Fix conflict between virtio + spice or soundhw (issue 757)
- accept relative paths in gnt-cluster copyfile (issue 754)
- Introduce shutdown timeout for 'xm shutdown' command
- Improve RAPI detection of the watcher (issue 752)
Version 2.10.1
--------------
*(Released Wed, 5 Mar 2014)*
- Fix incorrect invocation of hooks on offline nodes (issue 742)
- Fix incorrect exit code of gnt-cluster verify in certain circumstances
(issue 744)
Inherited from the 2.9 branch:
- Fix overflow problem in hbal that caused it to break when waiting for
jobs for more than 10 minutes (issue 717)
- Make hbal properly handle non-LVM storage
- Properly export and import NIC parameters, and do so in a backwards
compatible way (issue 716)
- Fix net-common script in case of routed mode (issue 728)
- Improve documentation (issues 724, 730)
Version 2.10.0
--------------
*(Released Thu, 20 Feb 2014)*
Incompatible/important changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Adding disks with 'gnt-instance modify' now waits for the disks to sync by
default. Specify --no-wait-for-sync to override this behavior.
- The Ganeti python code now adheres to a private-module layout. In particular,
the module 'ganeti' is no longer in the python search path.
- On instance allocation, the iallocator now considers non-LVM storage
properly. In particular, actual file storage space information is used
when allocating space for a file/sharedfile instance.
- When disabling disk templates cluster-wide, the cluster now first
checks whether there are instances still using those templates.
- 'gnt-node list-storage' now also reports storage information about
file-based storage types.
- In case of non drbd instances, export \*_SECONDARY environment variables
as empty strings (and not "None") during 'instance-migrate' related hooks.
New features
~~~~~~~~~~~~
- KVM hypervisors can now access RBD storage directly without having to
go through a block device.
- A new command 'gnt-cluster upgrade' was added that automates the upgrade
procedure between two Ganeti versions that are both 2.10 or higher.
- The move-instance command can now change disk templates when moving
instances, and does not require any node placement options to be
specified if the destination cluster has a default iallocator.
- Users can now change the soundhw and cpuid settings for XEN hypervisors.
- Hail and hbal now have the (optional) capability of accessing average CPU
load information through the monitoring daemon, and of using it to dynamically
adapt the allocation of instances.
- Hotplug support. Introduce new option '--hotplug' to ``gnt-instance modify``
so that disk and NIC modifications take effect without the need for an actual
reboot. There are currently a couple of constraints for this feature:
- only the KVM hypervisor (versions >= 1.0) supports it,
- one cannot (yet) hotplug a disk using userspace access mode for RBD
- in case of a downgrade, instances need a reboot in order to
be migratable (due to a core change of the runtime files)
- ``python-fdsend`` is required for NIC hotplugging.
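A hedged example of the feature under the constraints above; the instance
name and disk size are made up::

  # hot-add a 1 GB disk to a running KVM instance without rebooting it
  gnt-instance modify --hotplug --disk add:size=1G inst1.example.com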
Misc changes
~~~~~~~~~~~~
- A new test framework for logical units was introduced and the test
coverage for logical units was improved significantly.
- Opcodes are entirely generated from Haskell using the tool 'hs2py' and
the module 'src/Ganeti/OpCodes.hs'.
- Constants are also generated from Haskell using the tool
'hs2py-constants' and the module 'src/Ganeti/Constants.hs', with the
exception of socket related constants, which require changing the
cluster configuration file, and HVS related constants, because they
are part of a port of instance queries to Haskell. As a result, these
changes will be part of the next release of Ganeti.
New dependencies
~~~~~~~~~~~~~~~~
The following new dependencies have been added/updated.
Python
- The version requirements for ``python-mock`` have increased to at least
version 1.0.1. It is still used for testing only.
- ``python-fdsend`` (https://gitorious.org/python-fdsend) is optional
but required for KVM NIC hotplugging to work.
Since 2.10.0 rc3
~~~~~~~~~~~~~~~~
- Fix integer overflow problem in hbal
Version 2.10.0 rc3
------------------
*(Released Wed, 12 Feb 2014)*
This was the third RC release of the 2.10 series. Since 2.10.0 rc2:
- Improved hotplug robustness
- Start Ganeti daemons after ensure-dirs during upgrade
- Documentation improvements
Inherited from the 2.9 branch:
- Fix the RAPI instances-multi-alloc call
- assign unique filenames to file-based disks
- gracefully handle degraded non-diskless instances with 0 disks (issue 697)
- noded now runs with its specified group, which is the default group,
defaulting to root (issue 707)
- make using UUIDs to identify nodes in gnt-node consistently possible
(issue 703)
Version 2.10.0 rc2
------------------
*(Released Fri, 31 Jan 2014)*
This was the second RC release of the 2.10 series. Since 2.10.0 rc1:
- Documentation improvements
- Run drbdsetup syncer only on network attach
- Include target node in hooks nodes for migration
- Fix configure dirs
- Support post-upgrade hooks during cluster upgrades
Inherited from the 2.9 branch:
- Ensure that all the hypervisors exist in the config file (Issue 640)
- Correctly recognise the role as master node (Issue 687)
- configure: allow detection of Sphinx 1.2+ (Issue 502)
- gnt-instance now honors the KVM path correctly (Issue 691)
Inherited from the 2.8 branch:
- Change the list separator for the usb_devices parameter from comma to space.
Commas could not work because they are already the hypervisor option
separator (Issue 649)
- Add support for blktap2 file-driver (Issue 638)
- Add network tag definitions to the haskell codebase (Issue 641)
- Fix RAPI network tag handling
- Add the network tags to the tags searched by gnt-cluster search-tags
- Fix caching bug preventing jobs from being cancelled
- Start-master/stop-master was always failing if ConfD was disabled. (Issue 685)
Version 2.10.0 rc1
------------------
*(Released Tue, 17 Dec 2013)*
This was the first RC release of the 2.10 series. Since 2.10.0 beta1:
- All known issues in 2.10.0 beta1 have been resolved (see changes from
the 2.8 branch).
- Improve handling of KVM runtime files from earlier Ganeti versions
- Documentation fixes
Inherited from the 2.9 branch:
- use custom KVM path if set for version checking
- SingleNotifyPipeCondition: don't share pollers
Inherited from the 2.8 branch:
- Fixed Luxi daemon socket permissions after master-failover
- Improve IP version detection code by directly checking for colons rather
  than passing the family from the cluster object
- Fix NODE/NODE_RES locking in LUInstanceCreate by not acquiring NODE_RES locks
opportunistically anymore (Issue 622)
- Allow link local IPv6 gateways (Issue 624)
- Fix error printing (Issue 616)
- Fix a bug in InstanceSetParams concerning names: in case no name is passed in
disk modifications, keep the old one. If name=none then set disk name to
None.
- Update build_chroot script to work with the latest hackage packages
- Add a packet number limit to "fping" in master-ip-setup (Issue 630)
- Fix evacuation out of drained node (Issue 615)
- Add default file_driver if missing (Issue 571)
- Fix job error message after unclean master shutdown (Issue 618)
- Lock group(s) when creating instances (Issue 621)
- SetDiskID() before accepting an instance (Issue 633)
- Allow the ext template disks to receive arbitrary parameters, both at creation
time and while being modified
- Xen: handle domain shutdown (future-proofing cherry-pick)
- Refactor reading live data in htools (future-proofing cherry-pick)
Version 2.10.0 beta1
--------------------
*(Released Wed, 27 Nov 2013)*
This was the first beta release of the 2.10 series. All important changes
are listed in the latest 2.10 entry.
Known issues
~~~~~~~~~~~~
The following issues are known to be present in the beta and will be fixed
before rc1.
- Issue 477: Wrong permissions for confd LUXI socket
- Issue 621: Instance-related opcodes do not acquire network/group locks
- Issue 622: Assertion Error: Node locks differ from node resource locks
- Issue 623: IPv6 Masterd <-> Luxid communication error
Version 2.9.7
-------------
*(Released Fri, 11 Dec 2015)*
Important changes and security notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Security release.
CVE-2015-7944
Ganeti provides a RESTful control interface called the RAPI. Its HTTPS
implementation is vulnerable to DoS attacks via client-initiated SSL
parameter renegotiation. While the interface is not meant to be exposed
publicly, due to the fact that it binds to all interfaces, we believe
some users might be exposing it unintentionally and are vulnerable. A
DoS attack can consume resources meant for Ganeti daemons and instances
running on the master node, making both perform badly.
Fixes are not feasible due to the OpenSSL Python library not exposing
functionality needed to disable client-side renegotiation. Instead, we
offer instructions on how to control RAPI's exposure, along with
information on how RAPI can be set up alongside an HTTPS proxy in case users still
want or need to expose the RAPI interface. The instructions are
outlined in Ganeti's security document: doc/html/security.html
CVE-2015-7945
Ganeti leaks the DRBD secret through the RAPI interface. Examining job
results after an instance information job reveals the secret. With the
DRBD secret, access to the local cluster network, and ARP poisoning,
an attacker can impersonate a Ganeti node and clone the disks of a
DRBD-based instance. While an attacker with access to the cluster
network is already capable of accessing any data written as DRBD
traffic is unencrypted, having the secret expedites the process and
allows access to the entire disk.
Fixes contained in this release prevent the secret from being exposed
via the RAPI. The DRBD secret can be changed by converting an instance
to plain and back to DRBD, generating a new secret, but redundancy will
be lost until the process completes.
Since attackers with node access are capable of accessing some and
potentially all data even without the secret, we do not recommend that
the secret be changed for existing instances.
Minor changes
~~~~~~~~~~~~~
- gnt-instance replace-disks no longer crashes when --ignore-policy is
passed to it
- Stop MonD when removing a node from a cluster
- Fix file descriptor leak in Confd client
- Always install SIGHUP handler for Haskell daemons (Issue 755)
- Make ganeti-cleaner switch to a safe working directory (Issue 880)
- Make htools tolerate missing "spfree" on Luxi
- DRBD parser: consume initial empty resource lines (Issue 869)
- KVM: set IFF_ONE_QUEUE on created tap interfaces
- Set exclusion tags correctly in requested instance
Version 2.9.6
-------------
*(Released Mon, 7 Apr 2014)*
- Improve RAPI detection of the watcher (Issue 752)
- gnt-cluster copyfile: accept relative paths (Issue 754)
- Make watcher submit queries low priority (Issue 772)
- Add reason parameter to RAPI client functions (Issue 776)
- Fix failing gnt-node list-drbd command (Issue 777)
- Properly display fake job locks in gnt-debug.
- Enable timeout for instance shutdown
- small fixes in documentation
Version 2.9.5
-------------
*(Released Tue, 25 Feb 2014)*
- Fix overflow problem in hbal that caused it to break when waiting for
jobs for more than 10 minutes (issue 717)
- Make hbal properly handle non-LVM storage
- Properly export and import NIC parameters, and do so in a backwards
compatible way (issue 716)
- Fix net-common script in case of routed mode (issue 728)
- Improve documentation (issues 724, 730)
Version 2.9.4
-------------
*(Released Mon, 10 Feb 2014)*
- Fix the RAPI instances-multi-alloc call
- assign unique filenames to file-based disks
- gracefully handle degraded non-diskless instances with 0 disks (issue 697)
- noded now runs with its specified group, which is the default group,
defaulting to root (issue 707)
- make it consistently possible to use UUIDs to identify nodes in gnt-node
  (issue 703)
Version 2.9.3
-------------
*(Released Mon, 27 Jan 2014)*
- Ensure that all the hypervisors exist in the config file (Issue 640)
- Correctly recognise the role as master node (Issue 687)
- configure: allow detection of Sphinx 1.2+ (Issue 502)
- gnt-instance now honors the KVM path correctly (Issue 691)
Inherited from the 2.8 branch:
- Change the list separator for the usb_devices parameter from comma to space.
Commas could not work because they are already the hypervisor option
separator (Issue 649)
- Add support for blktap2 file-driver (Issue 638)
- Add network tag definitions to the haskell codebase (Issue 641)
- Fix RAPI network tag handling
- Add the network tags to the tags searched by gnt-cluster search-tags
- Fix caching bug preventing jobs from being cancelled
- Start-master/stop-master was always failing if ConfD was disabled. (Issue 685)
Version 2.9.2
-------------
*(Released Fri, 13 Dec 2013)*
- use custom KVM path if set for version checking
- SingleNotifyPipeCondition: don't share pollers
Inherited from the 2.8 branch:
- Fixed Luxi daemon socket permissions after master-failover
- Improve IP version detection code by directly checking for colons rather
  than passing the family from the cluster object
- Fix NODE/NODE_RES locking in LUInstanceCreate by not acquiring NODE_RES locks
opportunistically anymore (Issue 622)
- Allow link local IPv6 gateways (Issue 624)
- Fix error printing (Issue 616)
- Fix a bug in InstanceSetParams concerning names: in case no name is passed in
disk modifications, keep the old one. If name=none then set disk name to
None.
- Update build_chroot script to work with the latest hackage packages
- Add a packet number limit to "fping" in master-ip-setup (Issue 630)
- Fix evacuation out of drained node (Issue 615)
- Add default file_driver if missing (Issue 571)
- Fix job error message after unclean master shutdown (Issue 618)
- Lock group(s) when creating instances (Issue 621)
- SetDiskID() before accepting an instance (Issue 633)
- Allow the ext template disks to receive arbitrary parameters, both at creation
time and while being modified
- Xen: handle domain shutdown (future-proofing cherry-pick)
- Refactor reading live data in htools (future-proofing cherry-pick)
Version 2.9.1
-------------
*(Released Wed, 13 Nov 2013)*
- fix bug that kept nodes offline when re-adding them
- when verifying DRBD versions, ignore unavailable nodes
- fix bug that made the console unavailable on kvm in split-user
setup (issue 608)
- DRBD: ensure peers are UpToDate for dual-primary (inherited 2.8.2)
Version 2.9.0
-------------
*(Released Tue, 5 Nov 2013)*
Incompatible/important changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- hroller now also plans for capacity to move non-redundant instances off
any node to be rebooted; the old behavior of completely ignoring any
non-redundant instances can be restored by adding the --ignore-non-redundant
option.
- The cluster option ``--no-lvm-storage`` was removed in favor of the new
  option ``--enabled-disk-templates`` (see the example after this list).
- On instance creation, disk templates no longer need to be specified
with '-t'. The default disk template will be taken from the list of
enabled disk templates.
- The monitoring daemon is now running as root, in order to be able to collect
information only available to root (such as the state of Xen instances).
- The ConfD client is now IPv6 compatible.
- File and shared file storage is no longer dis/enabled at configure time,
but using the option '--enabled-disk-templates' at cluster initialization and
modification.
- The default directories for file and shared file storage are no longer
  specified at configure time, but are taken from the cluster's configuration.
They can be set at cluster initialization and modification with
'--file-storage-dir' and '--shared-file-storage-dir'.
- Cluster verification now includes stricter checks regarding the
default file and shared file storage directories. It now checks that
the directories are explicitly allowed in the ``file-storage-paths`` file and
that the directories exist on all nodes.
- The list of allowed disk templates in the instance policy and the list
of cluster-wide enabled disk templates is now checked for consistency
on cluster or group modification. On cluster initialization, the ipolicy
disk templates are ensured to be a subset of the cluster-wide enabled
disk templates.
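As an illustration of the new ``--enabled-disk-templates`` option (the
template list is only an example)::

  # enable only the plain and drbd disk templates, at init or later on
  gnt-cluster modify --enabled-disk-templates plain,drbd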
New features
~~~~~~~~~~~~
- DRBD 8.4 support. Depending on the installed DRBD version, Ganeti now uses
the correct command syntax. It is possible to use different DRBD versions
on different nodes as long as they are compatible with each other. This
enables rolling upgrades of DRBD with no downtime. As permanent operation
of different DRBD versions within a node group is discouraged,
``gnt-cluster verify`` will emit a warning if it detects such a situation.
- New "inst-status-xen" data collector for the monitoring daemon, providing
information about the state of the xen instances on the nodes.
- New "lv" data collector for the monitoring daemon, collecting data about the
logical volumes on the nodes, and pairing them with the name of the instances
they belong to.
- New "diskstats" data collector, collecting the data from /proc/diskstats and
presenting them over the monitoring daemon interface.
- The ConfD client is now IPv6 compatible.
New dependencies
~~~~~~~~~~~~~~~~
The following new dependencies have been added.
Python
- ``python-mock`` (http://www.voidspace.org.uk/python/mock/) is now required
  for the unit tests (and is only used for testing).
Haskell
- ``hslogger`` (http://software.complete.org/hslogger) is now always
required, even if confd is not enabled.
Since 2.9.0 rc3
~~~~~~~~~~~~~~~
- Correctly start/stop luxid during gnt-cluster master-failover (inherited
from stable-2.8)
- Improved error messages (inherited from stable-2.8)
Version 2.9.0 rc3
-----------------
*(Released Tue, 15 Oct 2013)*
The third release candidate in the 2.9 series. Since 2.9.0 rc2:
- in implicit configuration upgrade, match ipolicy with enabled disk templates
- improved harep documentation (inherited from stable-2.8)
Version 2.9.0 rc2
-----------------
*(Released Wed, 9 Oct 2013)*
The second release candidate in the 2.9 series. Since 2.9.0 rc1:
- Fix bug in cfgupgrade that led to failure when upgrading from 2.8 with
at least one DRBD instance.
- Fix bug in cfgupgrade that led to an invalid 2.8 configuration after
downgrading.
Version 2.9.0 rc1
-----------------
*(Released Tue, 1 Oct 2013)*
The first release candidate in the 2.9 series. Since 2.9.0 beta1:
- various bug fixes
- update of the documentation, in particular installation instructions
- merging of LD_* constants into DT_* constants
- python style changes to be compatible with newer versions of pylint
Version 2.9.0 beta1
-------------------
*(Released Thu, 29 Aug 2013)*
This was the first beta release of the 2.9 series. All important changes
are listed in the latest 2.9 entry.
Version 2.8.4
-------------
*(Released Thu, 23 Jan 2014)*
- Change the list separator for the usb_devices parameter from comma to space.
Commas could not work because they are already the hypervisor option
separator (Issue 649)
- Add support for blktap2 file-driver (Issue 638)
- Add network tag definitions to the haskell codebase (Issue 641)
- Fix RAPI network tag handling
- Add the network tags to the tags searched by gnt-cluster search-tags
- Fix caching bug preventing jobs from being cancelled
- Start-master/stop-master was always failing if ConfD was disabled. (Issue 685)
Version 2.8.3
-------------
*(Released Thu, 12 Dec 2013)*
- Fixed Luxi daemon socket permissions after master-failover
- Improve IP version detection code by directly checking for colons rather
  than passing the family from the cluster object
- Fix NODE/NODE_RES locking in LUInstanceCreate by not acquiring NODE_RES locks
opportunistically anymore (Issue 622)
- Allow link local IPv6 gateways (Issue 624)
- Fix error printing (Issue 616)
- Fix a bug in InstanceSetParams concerning names: in case no name is passed in
disk modifications, keep the old one. If name=none then set disk name to
None.
- Update build_chroot script to work with the latest hackage packages
- Add a packet number limit to "fping" in master-ip-setup (Issue 630)
- Fix evacuation out of drained node (Issue 615)
- Add default file_driver if missing (Issue 571)
- Fix job error message after unclean master shutdown (Issue 618)
- Lock group(s) when creating instances (Issue 621)
- SetDiskID() before accepting an instance (Issue 633)
- Allow the ext template disks to receive arbitrary parameters, both at creation
time and while being modified
- Xen: handle domain shutdown (future-proofing cherry-pick)
- Refactor reading live data in htools (future-proofing cherry-pick)
Version 2.8.2
-------------
*(Released Thu, 07 Nov 2013)*
- DRBD: ensure peers are UpToDate for dual-primary
- Improve error message for replace-disks
- More dependency checks at configure time
- Placate warnings on ganeti.outils_unittest.py
Version 2.8.1
-------------
*(Released Thu, 17 Oct 2013)*
- Correctly start/stop luxid during gnt-cluster master-failover
- Don't attempt IPv6 ssh in case of IPv4 cluster (Issue 595)
- Fix path for the job queue serial file
- Improved harep man page
- Minor documentation improvements
Version 2.8.0
-------------
*(Released Mon, 30 Sep 2013)*
Incompatible/important changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Instance policy can contain multiple instance specs, as described in
  the “Constrained instance sizes” section of :doc:`Partitioned Ganeti
  <design-partitioned>`. As a consequence, it's no longer possible to
  partially change or override instance specs. Bounding specs (min and max)
  can be specified as a whole using the new option
  ``--ipolicy-bounds-specs``, while standard specs use the new option
  ``--ipolicy-std-specs`` (see the example after this list).
- The output of the info command of gnt-cluster, gnt-group, gnt-node,
gnt-instance is a valid YAML object.
- hail now honors network restrictions when allocating nodes. This led to an
update of the IAllocator protocol. See the IAllocator documentation for
details.
- confd now only answers static configuration requests over the network.
  luxid was extracted; it listens on the local LUXI socket and responds to
  live queries. This allows finer-grained permissions if using separate
  users.
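A hedged sketch of the new ipolicy options referenced in the first item
above (the spec keys and values are illustrative; see
:manpage:`gnt-cluster(8)` for the complete list)::

  # set the cluster-wide "standard" instance specs (used e.g. by hspace)
  gnt-cluster modify --ipolicy-std-specs memory-size=2048,cpu-count=2,disk-size=10240
  # the bounding (min/max) specs are set as a whole via --ipolicy-bounds-specs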
New features
~~~~~~~~~~~~
- The :doc:`Remote API <rapi>` daemon now supports a command line flag
to always require authentication, ``--require-authentication``. It can
be specified in ``$sysconfdir/default/ganeti``.
- A new cluster attribute ``enabled_disk_templates`` is introduced. It will
be used to manage the disk templates to be used by instances in the cluster.
Initially, it will be set to a list that includes plain, drbd, if they were
enabled by specifying a volume group name, and file and sharedfile, if those
were enabled at configure time. Additionally, it will include all disk
templates that are currently used by instances. The order of disk templates
will be based on Ganeti's history of supporting them. In the future, the
first entry of the list will be used as a default disk template on instance
creation.
- ``cfgupgrade`` now supports a ``--downgrade`` option to bring the
configuration back to the previous stable version.
- Disk templates in group ipolicy can be restored to the default value.
- Initial support for diskless instances and virtual clusters in QA.
- More QA and unit tests for instance policies.
- Every opcode now contains a reason trail (visible through ``gnt-job info``)
describing why the opcode itself was executed.
- The monitoring daemon is now available. It allows users to query the cluster
for obtaining information about the status of the system. The daemon is only
responsible for providing the information over the network: the actual data
gathering is performed by data collectors (currently, only the DRBD status
collector is available).
- In order to help developers work on Ganeti, a new script
(``devel/build_chroot``) is provided, for building a chroot that contains all
the required development libraries and tools for compiling Ganeti on a Debian
Squeeze system.
- A new tool, ``harep``, for performing self-repair and recreation of instances
in Ganeti has been added.
- Split queries are enabled for tags, network, exports, cluster info, groups,
jobs, nodes.
- New command ``show-ispecs-cmd`` for ``gnt-cluster`` and ``gnt-group``.
It prints the command line to set the current policies, to ease
changing them.
- Add the ``vnet_hdr`` HV parameter for KVM, to control whether the tap
devices for KVM virtio-net interfaces will get created with VNET_HDR
(IFF_VNET_HDR) support. If set to false, it disables offloading on the
virtio-net interfaces, which prevents host kernel tainting and log
flooding, when dealing with broken or malicious virtio-net drivers.
It is set to true by default (see the example after this list).
- Instance failover now supports a ``--cleanup`` parameter for fixing previous
failures.
- Support 'viridian' parameter in Xen HVM
- Support DSA SSH keys in bootstrap
- To simplify the work of packaging frameworks that want to add the needed users
and groups in a split-user setup themselves, at build time three files in
``doc/users`` will be generated. The ``groups`` file contains, one per line,
the groups to be generated, the ``users`` file contains, one per line, the
users to be generated, optionally followed by their primary group, where
important. The ``groupmemberships`` file contains, one per line, additional
user-group membership relations that need to be established. The syntax of
these files will remain stable in all future versions.
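As an example of the ``vnet_hdr`` parameter mentioned above (instance name
illustrative)::

  # disable VNET_HDR on the tap devices of one KVM instance
  gnt-instance modify -H vnet_hdr=false instance1.example.com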
New dependencies
~~~~~~~~~~~~~~~~
The following new dependencies have been added:
For Haskell:
- The ``curl`` library is no longer optional for compiling the Haskell code.
- ``snap-server`` library (if monitoring is enabled).
For Python:
- The minimum Python version needed to run Ganeti is now 2.6.
- ``yaml`` library (only for running the QA).
Since 2.8.0 rc3
~~~~~~~~~~~~~~~
- Perform proper cleanup on termination of Haskell daemons
- Fix corner-case in handling of remaining retry time
Version 2.8.0 rc3
-----------------
*(Released Tue, 17 Sep 2013)*
- To simplify the work of packaging frameworks that want to add the needed users
and groups in a split-user setup themselves, at build time three files in
``doc/users`` will be generated. The ``groups`` file contains, one per line,
the groups to be generated, the ``users`` file contains, one per line, the
users to be generated, optionally followed by their primary group, where
important. The ``groupmemberships`` file contains, one per line, additional
user-group membership relations that need to be established. The syntax of
these files will remain stable in all future versions.
- Add a default to file-driver when unspecified over RAPI (Issue 571)
- Mark the DSA host pubkey as optional, and remove it during config downgrade
(Issue 560)
- Some documentation fixes
Version 2.8.0 rc2
-----------------
*(Released Tue, 27 Aug 2013)*
The second release candidate of the 2.8 series. Since 2.8.0 rc1:
- Support 'viridian' parameter in Xen HVM (Issue 233)
- Include VCS version in ``gnt-cluster version``
- Support DSA SSH keys in bootstrap (Issue 338)
- Fix batch creation of instances
- Use FQDN to check master node status (Issue 551)
- Make the DRBD collector more failure-resilient
Version 2.8.0 rc1
-----------------
*(Released Fri, 2 Aug 2013)*
The first release candidate of the 2.8 series. Since 2.8.0 beta1:
- Fix upgrading/downgrading from 2.7
- Increase maximum RAPI message size
- Documentation updates
- Split ``confd`` between ``luxid`` and ``confd``
- Merge 2.7 series up to the 2.7.1 release
- Allow the ``modify_etc_hosts`` option to be changed
- Add better debugging for ``luxid`` queries
- Expose bulk parameter for GetJobs in RAPI client
- Expose missing ``network`` fields in RAPI
- Add some ``cluster verify`` tests
- Some unittest fixes
- Fix a malfunction in ``hspace``'s tiered allocation
- Fix query compatibility between haskell and python implementations
- Add the ``vnet_hdr`` HV parameter for KVM
- Add ``--cleanup`` to instance failover
- Change the connected groups format in ``gnt-network info`` output; it
was previously displayed as a raw list by mistake. (Merged from 2.7)
Version 2.8.0 beta1
-------------------
*(Released Mon, 24 Jun 2013)*
This was the first beta release of the 2.8 series. All important changes
are listed in the latest 2.8 entry.
Version 2.7.2
-------------
*(Released Thu, 26 Sep 2013)*
- Change the connected groups format in ``gnt-network info`` output; it
was previously displayed as a raw list by mistake
- Check disk template in right dict when copying
- Support multi-instance allocs without iallocator
- Fix some errors in the documentation
- Fix formatting of tuple in an error message
Version 2.7.1
-------------
*(Released Thu, 25 Jul 2013)*
- Add logrotate functionality in daemon-util
- Add logrotate example file
- Add missing fields to network queries over rapi
- Fix network object timestamps
- Add support for querying network timestamps
- Fix a typo in the example crontab
- Fix a documentation typo
Version 2.7.0
-------------
*(Released Thu, 04 Jul 2013)*
Incompatible/important changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Instance policies for disk size were documented to be on a per-disk
basis, but hail applied them to the sum of all disks. This has been
fixed.
- ``hbal`` will now exit with status 0 if, during job execution over
LUXI, early exit has been requested and all jobs are successful;
before, exit status 1 was used, which cannot be differentiated from
"job error" case
- Compatibility with newer versions of rbd has been fixed
- ``gnt-instance batch-create`` has been changed to use the bulk create
opcode from Ganeti. This led to incompatible changes in the format of
the JSON file. It is no longer a custom dict but a dict compatible
with the ``OpInstanceCreate`` opcode.
- Parent directories for file storage need to be listed in
``$sysconfdir/ganeti/file-storage-paths`` now. ``cfgupgrade`` will
write the file automatically based on old configuration values, but it
cannot distribute it across all nodes and the file contents should be
verified. Use ``gnt-cluster copyfile
$sysconfdir/ganeti/file-storage-paths`` once the cluster has been
upgraded. The reason for requiring this list of paths now is that
before it would have been possible to inject new paths via RPC,
allowing files to be created in arbitrary locations. The RPC protocol
is protected using SSL/X.509 certificates, but as a design principle
Ganeti does not permit arbitrary paths to be passed.
- The parsing of the variants file for OSes (see
:manpage:`ganeti-os-interface(7)`) has been slightly changed: now empty
lines and comment lines (starting with ``#``) are ignored for better
readability.
- The ``setup-ssh`` tool added in Ganeti 2.2 has been replaced and is no
longer available. ``gnt-node add`` now invokes a new tool on the
destination node, named ``prepare-node-join``, to configure the SSH
daemon. Paramiko is no longer necessary to configure nodes' SSH
daemons via ``gnt-node add``.
- Draining (``gnt-cluster queue drain``) and un-draining the job queue
(``gnt-cluster queue undrain``) now affects all nodes in a cluster and
the flag is not reset after a master failover.
- Python 2.4 has *not* been tested with this release. Using 2.6 or above
is recommended. 2.6 will be mandatory from the 2.8 series.
New features
~~~~~~~~~~~~
- New network management functionality to support automatic allocation
of IP addresses and managing of network parameters. See
:manpage:`gnt-network(8)` for more details.
- New external storage backend, to allow managing arbitrary storage
systems external to the cluster. See
:manpage:`ganeti-extstorage-interface(7)`.
- New ``exclusive-storage`` node parameter added, restricted to
nodegroup level. When it's set to true, physical disks are assigned in
an exclusive fashion to instances, as documented in :doc:`Partitioned
Ganeti <design-partitioned>`. Currently, only instances using the
``plain`` disk template are supported.
- The KVM hypervisor has been updated with many new hypervisor
parameters, including a generic one for passing arbitrary command line
values. See a complete list in :manpage:`gnt-instance(8)`. It is now
compatible up to qemu 1.4.
- A new tool, called ``mon-collector``, is the stand-alone executor of
the data collectors for a monitoring system. As of this version, it
just includes the DRBD data collector, that can be executed by calling
``mon-collector`` using the ``drbd`` parameter. See
  :manpage:`mon-collector(7)` and the example after this list.
- A new user option, :pyeval:`rapi.RAPI_ACCESS_READ`, has been added
for RAPI users. It allows granting permissions to query for
information to a specific user without giving
:pyeval:`rapi.RAPI_ACCESS_WRITE` permissions.
- A new tool named ``node-cleanup`` has been added. It cleans remains of
a cluster from a machine by stopping all daemons, removing
certificates and ssconf files. Unless the ``--no-backup`` option is
given, copies of the certificates are made.
- Instance creations now support the use of opportunistic locking,
potentially speeding up the (parallel) creation of multiple instances.
This feature is currently only available via the :doc:`RAPI
<rapi>` interface and when an instance allocator is used. If the
``opportunistic_locking`` parameter is set the opcode will try to
acquire as many locks as possible, but will not wait for any locks
held by other opcodes. If not enough resources can be found to
allocate the instance, the temporary error code
:pyeval:`errors.ECODE_TEMP_NORES` is returned. The operation can be
retried thereafter, with or without opportunistic locking.
- New experimental linux-ha resource scripts.
- Restricted-commands support: ganeti can now be asked (via command line
or rapi) to perform commands on a node. These are passed via ganeti
RPC rather than ssh. This functionality is restricted to commands
specified in the ``$sysconfdir/ganeti/restricted-commands`` file for security
reasons. The file is not copied automatically.
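As an example for the ``mon-collector`` tool mentioned above::

  # invoke the stand-alone DRBD data collector on a node
  mon-collector drbd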
Misc changes
~~~~~~~~~~~~
- Diskless instances are now externally mirrored (Issue 237). For now, this
  has only been tested in conjunction with explicit target nodes for
migration/failover.
- Queries not needing locks or RPC access to the node can now be
performed by the confd daemon, making them independent from jobs, and
thus faster to execute. This is selectable at configure time.
- The functionality for allocating multiple instances at once has been
overhauled and is now also available through :doc:`RAPI <rapi>`.
There are no significant changes from version 2.7.0~rc3.
Version 2.7.0 rc3
-----------------
*(Released Tue, 25 Jun 2013)*
- Fix permissions on the confd query socket (Issue 477)
- Fix permissions on the job archive dir (Issue 498)
- Fix handling of an internal exception in replace-disks (Issue 472)
- Fix gnt-node info handling of shortened names (Issue 497)
- Fix gnt-instance grow-disk when wiping is enabled
- Documentation improvements, and support for newer pandoc
- Fix hspace honoring ipolicy for disks (Issue 484)
- Improve handling of the ``kvm_extra`` HV parameter
Version 2.7.0 rc2
-----------------
*(Released Fri, 24 May 2013)*
- ``devel/upload`` now works when ``/var/run`` on the target nodes is a
symlink.
- Disks added through ``gnt-instance modify`` or created through
``gnt-instance recreate-disks`` are wiped, if the
``prealloc_wipe_disks`` flag is set.
- If wiping newly created disks fails, the disks are removed. Also,
partial failures in creating disks through ``gnt-instance modify``
triggers a cleanup of the partially-created disks.
- Removing the master IP address doesn't fail if the address has been
already removed.
- Fix ownership of the OS log dir
- Workaround missing SO_PEERCRED constant (Issue 191)
Version 2.7.0 rc1
-----------------
*(Released Fri, 3 May 2013)*
This was the first release candidate of the 2.7 series. Since beta3:
- Fix kvm compatibility with qemu 1.4 (Issue 389)
- Documentation updates (admin guide, upgrade notes, install
instructions) (Issue 372)
- Fix gnt-group list nodes and instances count (Issue 436)
- Fix compilation without non-mandatory libraries (Issue 441)
- Fix xen-hvm hypervisor forcing nics to type 'ioemu' (Issue 247)
- Make confd logging more verbose at INFO level (Issue 435)
- Improve "networks" documentation in :manpage:`gnt-instance(8)`
- Fix failure path for instance storage type conversion (Issue 229)
- Update htools text backend documentation
- Improve the renew-crypto section of :manpage:`gnt-cluster(8)`
- Disable inter-cluster instance move for file-based instances, because
it is dependent on instance export, which is not supported for
file-based instances. (Issue 414)
- Fix gnt-job crashes on non-ascii characters (Issue 427)
- Fix volume group checks on non-vm-capable nodes (Issue 432)
Version 2.7.0 beta3
-------------------
*(Released Mon, 22 Apr 2013)*
This was the third beta release of the 2.7 series. Since beta2:
- Fix hail to verify disk instance policies on a per-disk basis (Issue 418).
- Fix data loss on wrong usage of ``gnt-instance move``
- Properly export errors in confd-based job queries
- Add ``users-setup`` tool
- Fix iallocator protocol to report 0 as a disk size for diskless
instances. This avoids hail breaking when a diskless instance is
present.
- Fix job queue directory permission problem that made confd job queries
fail. This requires running an ``ensure-dirs --full-run`` on upgrade
for access to archived jobs (Issue 406).
- Limit the sizes of networks supported by ``gnt-network`` to something
between a ``/16`` and a ``/30`` to prevent memory bloat and crashes.
- Fix bugs in instance disk template conversion
- Fix GHC 7 compatibility
- Fix ``burnin`` install path (Issue 426).
- Allow very small disk grows (Issue 347).
- Fix a ``ganeti-noded`` memory bloat introduced in 2.5, by making sure
that noded doesn't import masterd code (Issue 419).
- Make sure the default metavg at cluster init is the same as the vg, if
unspecified (Issue 358).
- Fix cleanup of partially created disks (part of Issue 416)
Version 2.7.0 beta2
-------------------
*(Released Tue, 2 Apr 2013)*
This was the second beta release of the 2.7 series. Since beta1:
- Networks no longer have a "type" slot, since this information was
unused in Ganeti: instead of it tags should be used.
- The rapi client now has a ``target_node`` option to MigrateInstance.
- Fix early exit return code for hbal (Issue 386).
- Fix ``gnt-instance migrate/failover -n`` (Issue 396).
- Fix ``rbd showmapped`` output parsing (Issue 312).
- Networks are now referenced by UUID rather than by name. This
  will require running cfgupgrade when coming from 2.7.0 beta1, if networks
  are in use.
- The OS environment now includes network information.
- Deleting of a network is now disallowed if any instance nic is using
it, to prevent dangling references.
- External storage is now documented in man pages.
- The exclusive_storage flag can now only be set at nodegroup level.
- Hbal can now submit an explicit priority with its jobs.
- Many network related locking fixes.
- Bump up the required pylint version to 0.25.1.
- Fix the ``no_remember`` option in RAPI client.
- Many ipolicy related tests, qa, and fixes.
- Many documentation improvements and fixes.
- Fix building with ``--disable-file-storage``.
- Fix ``-q`` option in htools, which was broken if passed more than
once.
- Some haskell/python interaction improvements and fixes.
- Fix iallocator in case of missing LVM storage.
- Fix confd config load in case of ``--no-lvm-storage``.
- The confd/query functionality is now mentioned in the security
documentation.
Version 2.7.0 beta1
-------------------
*(Released Wed, 6 Feb 2013)*
This was the first beta release of the 2.7 series. All important changes
are listed in the latest 2.7 entry.
Version 2.6.2
-------------
*(Released Fri, 21 Dec 2012)*
Important behaviour change: hbal will no longer rebalance instances which
have the ``auto_balance`` attribute set to false. This was the intention
all along, but until now such instances were only skipped in the N+1 memory
reservation (DRBD-specific).
A significant number of bug fixes in this release:
- Fixed disk adoption interaction with ipolicy checks.
- Fixed networking issues when instances are started, stopped or
migrated, by forcing the tap device's MAC prefix to "fe" (issue 217).
- Fixed the warning in cluster verify for shared storage instances not
being redundant.
- Fixed removal of storage directory on shared file storage (issue 262).
- Fixed validation of LVM volume group name in OpClusterSetParams
(``gnt-cluster modify``) (issue 285).
- Fixed runtime memory increases (``gnt-instance modify -m``).
- Fixed live migration under Xen's ``xl`` mode.
- Fixed ``gnt-instance console`` with ``xl``.
- Fixed building with newer Haskell compiler/libraries.
- Fixed PID file writing in Haskell daemons (confd); this prevents
restart issues if confd was launched manually (outside of
``daemon-util``) while another copy of it was running
- Fixed a type error when doing live migrations with KVM (issue 297) and
the error messages for failing migrations have been improved.
- Fixed opcode validation for the out-of-band commands (``gnt-node
power``).
- Fixed a type error when unsetting OS hypervisor parameters (issue
311); now it's possible to unset all OS-specific hypervisor
parameters.
- Fixed the ``dry-run`` mode for many operations: verification of
results was over-zealous but didn't take into account the ``dry-run``
operation, resulting in "wrong" failures.
- Fixed bash completion in ``gnt-job list`` when the job queue has
hundreds of entries; especially with older ``bash`` versions, this
results in significant CPU usage.
And lastly, a few other improvements have been made:
- Added option to force master-failover without voting (issue 282).
- Clarified error message on lock conflict (issue 287).
- Logging of newly submitted jobs has been improved (issue 290).
- Hostname checks have been made uniform between instance rename and
create (issue 291).
- The ``--submit`` option is now supported by ``gnt-debug delay``.
- Shutting down the master daemon by sending SIGTERM now stops it from
processing jobs waiting for locks; instead, those jobs will be started
once again after the master daemon is started the next time (issue
296).
- Support for Xen's ``xl`` program has been improved (besides the fixes
above).
- Reduced logging noise in the Haskell confd daemon (only show one log
entry for each config reload, instead of two).
- Several man page updates and typo fixes.
Version 2.6.1
-------------
*(Released Fri, 12 Oct 2012)*
A small bugfix release. Among the bugs fixed:
- Fixed double use of ``PRIORITY_OPT`` in ``gnt-node migrate``, that
made the command unusable.
- Commands that issue many jobs don't fail anymore just because some jobs
take so long that other jobs are archived.
- Failures during ``gnt-instance reinstall`` are reflected by the exit
status.
- Issue 190 fixed. Check for DRBD in cluster verify is enabled only when
DRBD is enabled.
- When ``always_failover`` is set, ``--allow-failover`` is no longer
  required in migrate commands.
- ``bash_completion`` works even if extglob is disabled.
- Fixed bug with locks that made failover for RBD-based instances fail.
- Fixed bug in non-mirrored instance allocation that made Ganeti choose
a random node instead of one based on the allocator metric.
- Support for newer versions of pylint and pep8.
- Hail doesn't fail anymore when trying to add an instance of type
``file``, ``sharedfile`` or ``rbd``.
- Added new Makefile target to rebuild the whole distribution, so that
all files are included.
Version 2.6.0
-------------
*(Released Fri, 27 Jul 2012)*
.. attention:: The ``LUXI`` protocol has been made more consistent
regarding its handling of command arguments. This, however, leads to
incompatibility issues with previous versions. Please ensure that you
restart Ganeti daemons soon after the upgrade, otherwise most
``LUXI`` calls (job submission, setting/resetting the drain flag,
pausing/resuming the watcher, cancelling and archiving jobs, querying
the cluster configuration) will fail.
New features
~~~~~~~~~~~~
Instance run status
+++++++++++++++++++
The current ``admin_up`` field, which used to denote whether an instance
should be running or not, has been removed. Instead, ``admin_state`` is
introduced, with three possible values -- ``up``, ``down`` and ``offline``.
The rationale behind this is that an instance being “down” can have
different meanings:
- it could be down during a reboot
- it could temporarily be down for a reinstall
- or it could be down because it is deprecated and kept just for its
disk
The previous Boolean state was making it difficult to do capacity
calculations: should Ganeti reserve memory for a down instance? Now, the
tri-state field makes it clear:
- in ``up`` and ``down`` state, all resources are reserved for the
instance, and it can be at any time brought up if it is down
- in ``offline`` state, only disk space is reserved for it, but not
memory or CPUs
The field can have an extra use: since the transition between ``up`` and
``down`` and vice versa is done via ``gnt-instance start/stop``, while the
transition between ``offline`` and ``down`` is done via ``gnt-instance
modify``, it is possible to give different rights to users. For
example, owners of an instance could be allowed to start/stop it, but
not to transition it out of the offline state.
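A sketch of the corresponding commands, assuming the ``--offline`` and
``--online`` flags of ``gnt-instance modify`` (instance name illustrative)::

  # up <-> down: all resources stay reserved
  gnt-instance shutdown instance1.example.com
  gnt-instance startup instance1.example.com
  # down <-> offline: only disk space stays reserved
  gnt-instance modify --offline instance1.example.com
  gnt-instance modify --online instance1.example.com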
Instance policies and specs
+++++++++++++++++++++++++++
In previous Ganeti versions, an instance creation request was limited
only by the available cluster resources, and not by any minimum or
maximum instance size. As such, any policy could be implemented only in third-party
clients (RAPI clients, or shell wrappers over ``gnt-*``
tools). Furthermore, calculating cluster capacity via ``hspace`` again
required external input with regards to instance sizes.
In order to improve these workflows and to allow for example better
per-node group differentiation, we introduced instance specs, which
allow declaring:
- minimum instance disk size, disk count, memory size, cpu count
- maximum values for the above metrics
- and “standard” values (used in ``hspace`` to calculate the standard
sized instances)
The minimum/maximum values can be also customised at node-group level,
for example allowing more powerful hardware to support bigger instance
memory sizes.
Beside the instance specs, there are a few other settings belonging to
the instance policy framework. It is possible now to customise, per
cluster and node-group:
- the list of allowed disk templates
- the maximum ratio of VCPUs per PCPUs (to control CPU oversubscription)
- the maximum ratio of instance to spindles (see below for more
information) for local storage
All these together should allow all tools that talk to Ganeti to know
the ranges of allowed values for instances and the level of
over-subscription that is permitted.
For the VCPU/PCPU ratio, we already have the VCPU configuration from the
instance configuration, and the physical CPU configuration from the
node. For the spindle ratios however, we didn't track before these
values, so new parameters have been added:
- a new node parameter ``spindle_count``, defaulting to 1, customisable at
  node group or node level
- a new backend parameter (for instances), ``spindle_use``, defaulting to 1
Note that spindles in this context don't need to mean actual
mechanical hard-drives; they are just a relative measure of both the node
I/O capacity and the instance I/O consumption.
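A hedged sketch of setting the new parameters (node and instance names are
placeholders; the option names assume the usual ``--node-parameters`` and
``-B`` switches)::

  # declare that node1 has two spindles
  gnt-node modify --node-parameters spindle_count=2 node1.example.com
  # declare that the instance consumes two spindles
  gnt-instance modify -B spindle_use=2 instance1.example.com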
Instance migration behaviour
++++++++++++++++++++++++++++
While live-migration is in general desirable over failover, it is
possible that for some workloads it is actually worse, due to the
variable time of the “suspend” phase during live migration.
To allow the tools to work consistently over such instances (without
having to hard-code instance names), a new backend parameter
``always_failover`` has been added to control the migration/failover
behaviour. When set to True, all migration requests for an instance will
instead fall-back to failover.
Instance memory ballooning
++++++++++++++++++++++++++
Initial support for memory ballooning has been added. The memory for an
instance is no longer fixed (backend parameter ``memory``), but instead
can vary between minimum and maximum values (backend parameters
``minmem`` and ``maxmem``). Currently we only change an instance's
memory when:
- live migrating or failing over an instance and the target node
  doesn't have enough memory
- user requests changing the memory via ``gnt-instance modify
--runtime-memory``
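A minimal sketch (sizes are illustrative; plain numbers are interpreted as
MiB)::

  # let the instance balloon between 1 GiB and 4 GiB
  gnt-instance modify -B minmem=1024,maxmem=4096 instance1.example.com
  # change the memory of a running instance without restarting it
  gnt-instance modify --runtime-memory 2048 instance1.example.com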
Instance CPU pinning
++++++++++++++++++++
In order to control the use of specific CPUs by instance, support for
controlling CPU pinning has been added for the Xen, HVM and LXC
hypervisors. This is controlled by a new hypervisor parameter
``cpu_mask``; details about possible values for this are in the
:manpage:`gnt-instance(8)`. Note that use of the most specific (precise
VCPU-to-CPU mapping) form will work well only when all nodes in your
cluster have the same number of CPUs.
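An illustrative example (the mask value is made up; see
:manpage:`gnt-instance(8)` for the exact syntax accepted by ``cpu_mask``)::

  # pin all VCPUs of the instance to physical CPUs 0-3
  gnt-instance modify -H cpu_mask=0-3 instance1.example.com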
Disk parameters
+++++++++++++++
Another area in which Ganeti was not customisable were the parameters
used for storage configuration, e.g. how many stripes to use for LVM,
DRBD resync configuration, etc.
To improve this area, we've added disk parameters, which are
customisable at cluster and node group level, and which allow
specifying various parameters for disks (DRBD has the most parameters
currently), for example:
- DRBD resync algorithm and parameters (e.g. speed)
- the default VG for meta-data volumes for DRBD
- number of stripes for LVM (plain disk template)
- the RBD pool
These parameters can be modified via ``gnt-cluster modify -D …`` and
``gnt-group modify -D …``, and are used at either instance creation (in
case of LVM stripes, for example) or at disk “activation” time
(e.g. resync speed).
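A hedged example (the parameter name and value are illustrative; the full
list of per-template disk parameters is in :manpage:`gnt-cluster(8)`)::

  # adjust the DRBD resync rate cluster-wide
  gnt-cluster modify -D drbd:resync-rate=61440
  # override it for a single node group
  gnt-group modify -D drbd:resync-rate=30720 group1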
Rados block device support
++++++++++++++++++++++++++
A Rados (http://ceph.com/wiki/Rbd) storage backend has been added,
denoted by the ``rbd`` disk template type. This is considered
experimental, feedback is welcome. For details on configuring it, see
the :doc:`install` document and the :manpage:`gnt-cluster(8)` man page.
Master IP setup
+++++++++++++++
The existing master IP functionality works well only in simple setups (a
single network shared by all nodes); however, if nodes belong to
different networks, then the ``/32`` setup and the lack of routing
information are not enough.
To allow the master IP to function well in more complex cases, the
system was reworked as follows:
- a master IP netmask setting has been added
- the master IP activation/turn-down code was moved from the node daemon
to a separate script
- whether to run the Ganeti-supplied master IP script or a user-supplied
  one is a ``gnt-cluster init`` setting
Details about the location of the standard and custom setup scripts are
in the man page :manpage:`gnt-cluster(8)`; for information about the
setup script protocol, look at the Ganeti-supplied script.
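A minimal sketch, assuming the ``--master-netmask`` option of ``gnt-cluster
init`` (cluster name and netmask are illustrative)::

  # create a cluster whose master IP lives in a /24 rather than a /32
  gnt-cluster init --master-netmask 24 cluster.example.com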
SPICE support
+++++++++++++
SPICE support has been improved.
It is now possible to use TLS-protected connections, and when renewing
or changing the cluster certificates (via ``gnt-cluster renew-crypto``)
it is now possible to specify SPICE or SPICE CA certificates. Also, it
is possible to configure a password for SPICE sessions via the
hypervisor parameter ``spice_password_file``.
There are also new parameters to control the compression and streaming
options (e.g. ``spice_image_compression``, ``spice_streaming_video``,
etc.). For details, see the man page :manpage:`gnt-instance(8)` and look
for the spice parameters.
Lastly, it is now possible to see the SPICE connection information via
``gnt-instance console``.
OVF converter
+++++++++++++
A new tool (``tools/ovfconverter``) has been added that supports
conversion between Ganeti and the Open Virtualization Format (both to
and from).
This relies on the ``qemu-img`` tool to convert the disk formats, so the
actual compatibility with other virtualization solutions depends on it.
Confd daemon changes
++++++++++++++++++++
The configuration query daemon (``ganeti-confd``) is now optional, and
has been rewritten in Haskell; whether to use the daemon at all, use the
Python (default) or the Haskell version is selectable at configure time
via the ``--enable-confd`` parameter, which can take one of the
``haskell``, ``python`` or ``no`` values. If the daemon is not needed,
disabling it will result in a smaller footprint; for larger systems, we
welcome feedback on the Haskell version, which might become the default
in future versions.
If you want to use ``gnt-node list-drbd`` you need to have the Haskell
daemon running. The Python version doesn't implement the new call.
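For example, at build time and afterwards on a node (node name
illustrative)::

  # build with the Haskell confd (other accepted values: python, no)
  ./configure --enable-confd=haskell
  # query the DRBD minor mapping, served by the Haskell daemon only
  gnt-node list-drbd node1.example.com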
User interface changes
~~~~~~~~~~~~~~~~~~~~~~
We have replaced the ``--disks`` option of ``gnt-instance
replace-disks`` with a more flexible ``--disk`` option, which allows
adding and removing disks at arbitrary indices (Issue 188). Furthermore,
disk size and mode can be changed upon recreation (via ``gnt-instance
recreate-disks``, which accepts the same ``--disk`` option).
As many people are used to a ``show`` command, we have added that as an
alias to ``info`` on all ``gnt-*`` commands.
The ``gnt-instance grow-disk`` command has a new mode in which it can
accept the target size of the disk, instead of the delta; this can be
more safe since two runs in absolute mode will be idempotent, and
sometimes it's also easier to specify the desired size directly.
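A sketch of the two modes, assuming the ``--absolute`` flag selects the new
one (disk index and sizes are illustrative)::

  # relative mode: grow disk 0 by 2 GiB
  gnt-instance grow-disk instance1.example.com 0 2g
  # absolute mode: grow disk 0 to a total size of 20 GiB
  gnt-instance grow-disk --absolute instance1.example.com 0 20g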
The handling of instances with regard to offline secondaries has also
been improved. Instance operations should not fail because one of the
instance's secondary nodes is offline, even though it is safe to proceed.
A new command ``list-drbd`` has been added to the ``gnt-node`` script to
support debugging of DRBD issues on nodes. It provides a mapping of DRBD
minors to instance names.
API changes
~~~~~~~~~~~
RAPI coverage has improved, with (for example) new resources for
recreate-disks, node power-cycle, etc.
Compatibility
~~~~~~~~~~~~~
There is partial support for ``xl`` in the Xen hypervisor; feedback is
welcome.
Python 2.7 is better supported, and after Ganeti 2.6 we will investigate
whether to still support Python 2.4 or move to Python 2.6 as minimum
required version.
Support for Fedora has been slightly improved; the provided example
init.d script should work better on it and the INSTALL file should
document the needed dependencies.
Internal changes
~~~~~~~~~~~~~~~~
The deprecated ``QueryLocks`` LUXI request has been removed. Use
``Query(what=QR_LOCK, ...)`` instead.
The LUXI requests :pyeval:`luxi.REQ_QUERY_JOBS`,
:pyeval:`luxi.REQ_QUERY_INSTANCES`, :pyeval:`luxi.REQ_QUERY_NODES`,
:pyeval:`luxi.REQ_QUERY_GROUPS`, :pyeval:`luxi.REQ_QUERY_EXPORTS` and
:pyeval:`luxi.REQ_QUERY_TAGS` are deprecated and will be removed in a
future version. :pyeval:`luxi.REQ_QUERY` should be used instead.
RAPI client: ``CertificateError`` now derives from
``GanetiApiError``. This should make it more easy to handle Ganeti
errors.
Deprecation warnings due to PyCrypto/paramiko import in
``tools/setup-ssh`` have been silenced, as usually they are safe; please
make sure to run an up-to-date paramiko version, if you use this tool.
The QA scripts now depend on Python 2.5 or above (the main code base
still works with Python 2.4).
The configuration file (``config.data``) is now written without
indentation for performance reasons; if you want to edit it, it can be
re-formatted via ``tools/fmtjson``.
A number of bugs has been fixed in the cluster merge tool.
``x509`` certification verification (used in import-export) has been
changed to allow the same clock skew as permitted by the cluster
verification. This will remove some rare but hard to diagnose errors in
import-export.
Version 2.6.0 rc4
-----------------
*(Released Thu, 19 Jul 2012)*
Very few changes from rc4 to the final release, only bugfixes:
- integrated fixes from release 2.5.2 (fix general boot flag for KVM
instance, fix CDROM booting for KVM instances)
- fixed node group modification of node parameters
- fixed issue in LUClusterVerifyGroup with multi-group clusters
- fixed generation of bash completion to ensure a stable ordering
- fixed a few typos
Version 2.6.0 rc3
-----------------
*(Released Fri, 13 Jul 2012)*
Third release candidate for 2.6. The following changes were done from
rc3 to rc4:
- Fixed ``UpgradeConfig`` w.r.t. to disk parameters on disk objects.
- Fixed an inconsistency in the LUXI protocol with the provided
arguments (NOT backwards compatible)
- Fixed a bug with node groups ipolicy where ``min`` was greater than
the cluster ``std`` value
- Implemented a new ``gnt-node list-drbd`` call to list DRBD minors for
easier instance debugging on nodes (requires ``hconfd`` to work)
Version 2.6.0 rc2
-----------------
*(Released Tue, 03 Jul 2012)*
Second release candidate for 2.6. The following changes were done from
rc2 to rc3:
- Fixed ``gnt-cluster verify`` regarding ``master-ip-script`` on non
master candidates
- Fixed a RAPI regression on missing beparams/memory
- Fixed redistribution of files on offline nodes
- Added the possibility to run activate-disks even though secondaries are
  offline. This change also relaxes the strictness of some other
  commands which use activate-disks internally:
* ``gnt-instance start|reboot|rename|backup|export``
- Made it possible to safely remove an instance if its secondaries are
  offline
- Made it possible to reinstall even though secondaries are offline
Version 2.6.0 rc1
-----------------
*(Released Mon, 25 Jun 2012)*
First release candidate for 2.6. The following changes were done from
rc1 to rc2:
- Fixed bugs with disk parameters and ``rbd`` templates as well as
``instance_os_add``
- Made ``gnt-instance modify`` more consistent regarding new NIC/Disk
behaviour. It now supports the modify operation
- ``hcheck`` implemented to analyze cluster health and possibility of
improving health by rebalance
- ``hbal`` has been improved in dealing with split instances
Version 2.6.0 beta2
-------------------
*(Released Mon, 11 Jun 2012)*
Second beta release of 2.6. The following changes were done from beta2
to rc1:
- Fixed ``daemon-util`` with non-root user models
- Fixed creation of plain instances with ``--no-wait-for-sync``
- Fix wrong iv_names when running ``cfgupgrade``
- Export more information in RAPI group queries
- Fixed bug when changing instance network interfaces
- Extended burnin to do NIC changes
- query: Added ``<``, ``>``, ``<=``, ``>=`` comparison operators
- Changed default for DRBD barriers
- Fixed DRBD error reporting for syncer rate
- Verify the options on disk parameters
And of course various fixes to documentation and improved unittests and
QA.
Version 2.6.0 beta1
-------------------
*(Released Wed, 23 May 2012)*
First beta release of 2.6. The following changes were done from beta1 to
beta2:
- integrated patch for distributions without ``start-stop-daemon``
- adapted example init.d script to work on Fedora
- fixed log handling in Haskell daemons
- adapted checks in the watcher for pycurl linked against libnss
- add partial support for ``xl`` instead of ``xm`` for Xen
- fixed a type issue in cluster verification
- fixed ssconf handling in the Haskell code (was breaking confd in IPv6
clusters)
Plus integrated fixes from the 2.5 branch:
- fixed ``kvm-ifup`` to use ``/bin/bash``
- fixed parallel build failures
- KVM live migration when using a custom keymap
Version 2.5.2
-------------
*(Released Tue, 24 Jul 2012)*
A small bugfix release, with no new features:
- fixed bash-isms in kvm-ifup, for compatibility with systems which use a
different default shell (e.g. Debian, Ubuntu)
- fixed KVM startup and live migration with a custom keymap (fixes Issue
243 and Debian bug #650664)
- fixed compatibility with KVM versions that don't support multiple boot
devices (fixes Issue 230 and Debian bug #624256)
Additionally, a few fixes were done to the build system (fixed parallel
build failures) and to the unittests (fixed race condition in test for
FileID functions, and the default enable/disable mode for QA test is now
customisable).
Version 2.5.1
-------------
*(Released Fri, 11 May 2012)*
A small bugfix release.
The main issues solved are on the topic of compatibility with newer LVM
releases:
- fixed parsing of ``lv_attr`` field
- adapted to new ``vgreduce --removemissing`` behaviour where sometimes
the ``--force`` flag is needed
Also on the topic of compatibility, ``tools/lvmstrap`` has been changed
to accept kernel 3.x too (was hardcoded to 2.6.*).
A regression present in 2.5.0 that broke handling (in the gnt-* scripts)
of hook results and that also made display of other errors suboptimal
was fixed; the code behaves now like 2.4 and earlier.
Another change in 2.5, the cleanup of the OS scripts environment, was too
aggressive: it removed even the ``PATH`` variable, which forced the OS
scripts to *always* export it themselves. Since this is a bit too strict,
we now export a minimal PATH, the same that we export for hooks.
The fix for issue 201 (Preserve bridge MTU in KVM ifup script) was
integrated into this release.
Finally, a few other miscellaneous changes were done (no new features,
just small improvements):
- Fix ``gnt-group --help`` display
- Fix hardcoded Xen kernel path
- Fix grow-disk handling of invalid units
- Update synopsis for ``gnt-cluster repair-disk-sizes``
- Accept both PUT and POST in noded (makes future upgrade to 2.6 easier)
Version 2.5.0
-------------
*(Released Thu, 12 Apr 2012)*
Incompatible/important changes and bugfixes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The default of the ``/2/instances/[instance_name]/rename`` RAPI
resource's ``ip_check`` parameter changed from ``True`` to ``False``
to match the underlying LUXI interface.
- The ``/2/nodes/[node_name]/evacuate`` RAPI resource was changed to use
body parameters, see the :doc:`RAPI documentation <rapi>`. The server does
not maintain backwards-compatibility as the underlying operation
changed in an incompatible way. The RAPI client can talk to old
servers, but it needs to be told so as the return value changed.
- When creating file-based instances via RAPI, the ``file_driver``
parameter no longer defaults to ``loop`` and must be specified.
- The deprecated ``bridge`` NIC parameter is no longer supported. Use
``link`` instead.
- Support for the undocumented and deprecated RAPI instance creation
request format version 0 has been dropped. Use version 1, supported
since Ganeti 2.1.3 and :doc:`documented <rapi>`, instead.
- Pyparsing 1.4.6 or above is required, see :doc:`installation
documentation <install>`.
- The "cluster-verify" hooks are now executed per group by the
``OP_CLUSTER_VERIFY_GROUP`` opcode. This maintains the same behavior
if you just run ``gnt-cluster verify``, which generates one opcode per
group.
- The environment as passed to the OS scripts is cleared, and thus no
environment variables defined in the node daemon's environment will be
inherited by the scripts.
- The :doc:`iallocator <iallocator>` mode ``multi-evacuate`` has been
deprecated.
- :doc:`New iallocator modes ` have been added to
support operations involving multiple node groups.
- Offline nodes are ignored when failing over an instance.
- Support for KVM version 1.0, which changed the version reporting format
from 3 to 2 digits.
- TCP/IP ports used by DRBD disks are returned to a pool upon instance
removal.
- ``Makefile`` is now compatible with Automake 1.11.2
- Includes all bugfixes made in the 2.4 series
New features
~~~~~~~~~~~~
- The ganeti-htools project has been merged into the ganeti-core source
tree and will be built as part of Ganeti (see :doc:`install-quick`).
- Implemented support for :doc:`shared storage <design-shared-storage>`.
- Add support for disks larger than 2 TB in ``lvmstrap`` by supporting
GPT-style partition tables (requires `parted
`_).
- Added support for floppy drive and 2nd CD-ROM drive in KVM hypervisor.
- Allowed adding tags on instance creation.
- Export instance tags to hooks (``INSTANCE_TAGS``, see :doc:`hooks`)
- Allow instances to be started in a paused state, enabling the user to
see the complete console output on boot using the console.
- Added new hypervisor flag to control default reboot behaviour
(``reboot_behavior``).
- Added support for KVM keymaps (hypervisor parameter ``keymap``).
- Improved out-of-band management support:
- Added ``gnt-node health`` command reporting the health status of
nodes.
- Added ``gnt-node power`` command to manage power status of nodes.
- Added command for emergency power-off (EPO), ``gnt-cluster epo``.
- Instance migration can fall back to failover if instance is not
running.
- Filters can be used when listing nodes, instances, groups and locks;
see :manpage:`ganeti(7)` manpage.
- Added post-execution status as variables to :doc:`hooks <hooks>`
environment.
- Instance tags are exported/imported together with the instance.
- When given an explicit job ID, ``gnt-job info`` will work for archived
jobs.
- Jobs can define dependencies on other jobs (not yet supported via
RAPI or command line, but used by internal commands and usable via
LUXI).
- Lock monitor (``gnt-debug locks``) shows jobs waiting for
dependencies.
- Instance failover is now available as a RAPI resource
(``/2/instances/[instance_name]/failover``).
- ``gnt-instance info`` defaults to static information if primary node
is offline.
- Opcodes have a new ``comment`` attribute.
- Added basic SPICE support to KVM hypervisor.
- ``tools/ganeti-listrunner`` allows passing of arguments to executable.
Node group improvements
~~~~~~~~~~~~~~~~~~~~~~~
- ``gnt-cluster verify`` has been modified to check groups separately,
thereby improving performance.
- Node group support has been added to ``gnt-cluster verify-disks``,
which now operates per node group.
- Watcher has been changed to work better with node groups.
- One process and state file per node group.
- Slow watcher in one group doesn't block other group's watcher.
- Added new command, ``gnt-group evacuate``, to move all instances in a
node group to other groups.
- Added ``gnt-instance change-group`` to move an instance to another
node group.
- ``gnt-cluster command`` and ``gnt-cluster copyfile`` now support
per-group operations.
- Node groups can be tagged.
- Some operations switch from an exclusive to a shared lock as soon as
possible.
- Instance's primary and secondary nodes' groups are now available as
query fields (``pnode.group``, ``pnode.group.uuid``, ``snodes.group``
and ``snodes.group.uuid``).
Misc
~~~~
- Numerous updates to documentation and manpages.
- :doc:`RAPI <rapi>` documentation now has detailed parameter
descriptions.
- Some opcode/job results are now also documented, see :doc:`RAPI <rapi>`.
- A lockset's internal lock is now also visible in lock monitor.
- Log messages from job queue workers now contain information about the
opcode they're processing.
- ``gnt-instance console`` no longer requires the instance lock.
- A short delay when waiting for job changes reduces the number of LUXI
requests significantly.
- DRBD metadata volumes are overwritten with zeros during disk creation.
- Out-of-band commands no longer acquire the cluster lock in exclusive
mode.
- ``devel/upload`` now uses correct permissions for directories.
Version 2.5.0 rc6
-----------------
*(Released Fri, 23 Mar 2012)*
This was the sixth release candidate of the 2.5 series.
Version 2.5.0 rc5
-----------------
*(Released Mon, 9 Jan 2012)*
This was the fifth release candidate of the 2.5 series.
Version 2.5.0 rc4
-----------------
*(Released Thu, 27 Oct 2011)*
This was the fourth release candidate of the 2.5 series.
Version 2.5.0 rc3
-----------------
*(Released Wed, 26 Oct 2011)*
This was the third release candidate of the 2.5 series.
Version 2.5.0 rc2
-----------------
*(Released Tue, 18 Oct 2011)*
This was the second release candidate of the 2.5 series.
Version 2.5.0 rc1
-----------------
*(Released Tue, 4 Oct 2011)*
This was the first release candidate of the 2.5 series.
Version 2.5.0 beta3
-------------------
*(Released Wed, 31 Aug 2011)*
This was the third beta release of the 2.5 series.
Version 2.5.0 beta2
-------------------
*(Released Mon, 22 Aug 2011)*
This was the second beta release of the 2.5 series.
Version 2.5.0 beta1
-------------------
*(Released Fri, 12 Aug 2011)*
This was the first beta release of the 2.5 series.
Version 2.4.5
-------------
*(Released Thu, 27 Oct 2011)*
- Fixed bug when parsing command line parameter values ending in
backslash
- Fixed assertion error after unclean master shutdown
- Disable HTTP client pool for RPC, significantly reducing memory usage
of master daemon
- Fixed queue archive creation with wrong permissions
Version 2.4.4
-------------
*(Released Tue, 23 Aug 2011)*
Small bug-fixes:
- Fixed documentation for importing with ``--src-dir`` option
- Fixed a bug in ``ensure-dirs`` with queue/archive permissions
- Fixed a parsing issue with DRBD 8.3.11 in the Linux kernel
Version 2.4.3
-------------
*(Released Fri, 5 Aug 2011)*
Many bug-fixes and a few small features:
- Fixed argument order in ``ReserveLV`` and ``ReserveMAC`` which caused
issues when you tried to add an instance with two MAC addresses in one
request
- KVM: fixed per-instance stored UID value
- KVM: configure bridged NICs at migration start
- KVM: Fix a bug where an instance would not start with newer KVM versions
(>= 0.14)
- Added OS search path to ``gnt-cluster info``
- Fixed an issue with ``file_storage_dir`` where an absolute path was
required even though the documentation states it is a relative path (the
documentation was right)
- Added a new parameter to instance stop/start, ``--no-remember``, which
prevents the state change from being remembered
- Implemented ``no_remember`` at RAPI level
- Improved the documentation
- Node evacuation: don't call IAllocator if node is already empty
- Fixed bug in DRBD8 replace disks on current nodes
- Fixed bug in recreate-disks for DRBD instances
- Moved a lock-checking assertion in ``gnt-instance replace-disks`` that
caused it to abort, complaining about not owning the right locks, in some
situations
- Job queue: Fixed potential race condition when cancelling queued jobs
- Fixed off-by-one bug in job serial generation
- ``gnt-node volumes``: Fix instance names
- Fixed aliases in bash completion
- Fixed a bug in reopening log files after being sent a SIGHUP
- Added a flag to burnin to allow specifying VCPU count
- Bugfixes to non-root Ganeti configuration
Version 2.4.2
-------------
*(Released Thu, 12 May 2011)*
Many bug-fixes and a few new small features:
- Fixed a bug related to log opening failures
- Fixed a bug in instance listing with orphan instances
- Fixed a bug which prevented resetting the cluster-level node parameter
``oob_program`` to the default
- Many fixes related to the ``cluster-merge`` tool
- Fixed a race condition in the lock monitor, which caused failures
during (at least) creation of many instances in parallel
- Improved output for ``gnt-job info``
- Removed the quiet flag on some ssh calls which prevented debugging
failures
- Improved the N+1 failure messages in cluster verify by actually
showing the memory values (needed and available)
- Increased lock attempt timeouts so that when executing long operations
(e.g. DRBD replace-disks) other jobs do not enter 'blocking acquire'
too early and thus prevent the use of the 'fair' mechanism
- Changed instance query data (``gnt-instance info``) to not acquire
locks unless needed, thus allowing its use on locked instance if only
static information is asked for
- Improved behaviour with filesystems that do not support rename on an
opened file
- Fixed the behaviour of ``prealloc_wipe_disks`` cluster parameter which
kept locks on all nodes during the wipe, which is unneeded
- Fixed ``ganeti-watcher`` handling of errors during hooks execution
- Fixed bug in ``prealloc_wipe_disks`` with small disk sizes (less than
10GiB) which caused the wipe to fail right at the end in some cases
- Fixed master IP activation when doing master failover with no-voting
- Fixed bug in ``gnt-node add --readd`` which allowed the re-adding of
the master node itself
- Fixed potential data loss under disk-full conditions, where Ganeti
wouldn't correctly check the return code and would consider
partially-written files 'correct'
- Fixed bug related to multiple VGs and DRBD disk replacing
- Added new disk parameter ``metavg`` that allows placement of the meta
device for DRBD in a different volume group
- Fixed error handling in the node daemon when the system libc doesn't
have major number 6 (i.e. if ``libc.so.6`` is not the actual libc)
- Fixed lock release during replace-disks, which kept cluster-wide locks
when doing disk replaces with an iallocator script
- Added check for missing bridges in cluster verify
- Handle EPIPE errors while writing to the terminal better, so that
piping the output to e.g. ``less`` doesn't cause a backtrace
- Fixed rare case where a ^C during Luxi calls could have been
interpreted as server errors, instead of simply terminating
- Fixed a race condition in LUGroupAssignNodes (``gnt-group
assign-nodes``)
- Added a few more parameters to the KVM hypervisor, allowing a second
CDROM, custom disk type for CDROMs and a floppy image
- Removed redundant message in instance rename when the name is given
already as a FQDN
- Added option to ``gnt-instance recreate-disks`` to allow creating the
disks on new nodes, allowing recreation when the original instance
nodes are completely gone
- Added option when converting disk templates to DRBD to skip waiting
for the resync, in order to make the instance available sooner
- Added two new variables to the OS scripts environment (containing the
instance's nodes)
- Made ``root_path`` an optional parameter for the xen-pvm hypervisor,
to allow use of ``pvgrub`` as a bootloader
- Changed the instance memory modifications to only check out-of-memory
conditions on memory increases, and turned the secondary node warnings
into errors (they can still be overridden via ``--force``)
- Fixed the handling of a corner case when the Python installation gets
corrupted (e.g. a bad disk) while ganeti-noded is running and we try
to execute a command that doesn't exist
- Fixed a bug in ``gnt-instance move`` (LUInstanceMove) when the primary
node of the instance returned failures during instance shutdown; this
adds the option ``--ignore-consistency`` to gnt-instance move
And as usual, various improvements to the error messages, documentation
and man pages.
Version 2.4.1
-------------
*(Released Wed, 09 Mar 2011)*
Emergency bug-fix release. ``tools/cfgupgrade`` was broken and overwrote
the RAPI users file if run twice (even with ``--dry-run``).
The release fixes that bug (nothing else changed).
Version 2.4.0
-------------
*(Released Mon, 07 Mar 2011)*
Final 2.4.0 release. Just a few small fixes:
- Fixed RAPI node evacuate
- Fixed the kvm-ifup script
- Fixed internal error handling for special job cases
- Updated man page to specify the escaping feature for options
Version 2.4.0 rc3
-----------------
*(Released Mon, 28 Feb 2011)*
A critical fix for the ``prealloc_wipe_disks`` feature: it was possible
for this feature to wipe the disks of the wrong instance, leading to loss
of data.
Other changes:
- Fixed title of query field containing instance name
- Expanded the glossary in the documentation
- Fixed one unittest (internal issue)
Version 2.4.0 rc2
-----------------
*(Released Mon, 21 Feb 2011)*
A number of bug fixes plus just a couple of functionality changes.
On the user-visible side, the ``gnt-* list`` command output has changed
with respect to "special" field states. The current rc1 style of display
can be re-enabled by passing a new ``--verbose`` (``-v``) flag, but in
the default output mode special field states are displayed as follows:
- Offline resource: ``*``
- Unavailable/not applicable: ``-``
- Data missing (RPC failure): ``?``
- Unknown field: ``??``
Another user-visible change is the addition of ``--force-join`` to
``gnt-node add``.
As for bug fixes:
- ``tools/cluster-merge`` has seen many fixes and is now enabled again
- Fixed regression in RAPI/instance reinstall where all parameters were
required (instead of optional)
- Fixed ``gnt-cluster repair-disk-sizes``, was broken since Ganeti 2.2
- Fixed iallocator usage (offline nodes were not considered offline)
- Fixed ``gnt-node list`` with respect to non-vm_capable nodes
- Fixed hypervisor and OS parameter validation with respect to
non-vm_capable nodes
- Fixed ``gnt-cluster verify`` with respect to offline nodes (mostly
cosmetic)
- Fixed ``tools/listrunner`` with respect to agent-based usage
Version 2.4.0 rc1
-----------------
*(Released Fri, 4 Feb 2011)*
Many changes and fixes since the beta1 release. While there were some
internal changes, the code has been mostly stabilised for the RC
release.
Note: the dumb allocator was removed in this release, as it was not kept
up-to-date with the IAllocator protocol changes. It is recommended to
use the ``hail`` command from the ganeti-htools package.
Note: the 2.4 and up versions of Ganeti are not compatible with the
0.2.x branch of ganeti-htools. You need to upgrade to
ganeti-htools-0.3.0 (or later).
Regressions fixed from 2.3
~~~~~~~~~~~~~~~~~~~~~~~~~~
- Fixed the ``gnt-cluster verify-disks`` command
- Made ``gnt-cluster verify-disks`` work in parallel (as opposed to
serially on nodes)
- Fixed disk adoption breakage
- Fixed wrong headers in instance listing for field aliases
Other bugs fixed
~~~~~~~~~~~~~~~~
- Fixed corner case in KVM handling of NICs
- Fixed many cases of wrong handling of non-vm_capable nodes
- Fixed a bug where a missing instance symlink was not possible to
recreate with any ``gnt-*`` command (now ``gnt-instance
activate-disks`` does it)
- Fixed the volume group name as reported by ``gnt-cluster
verify-disks``
- Increased timeouts for the import-export code, hopefully leading to
fewer aborts due to network or instance timeouts
- Fixed bug in ``gnt-node list-storage``
- Fixed bug where not all daemons were started on cluster
initialisation, but only at the first watcher run
- Fixed many bugs in the OOB implementation
- Fixed watcher behaviour in presence of instances with offline
secondaries
- Fixed instance list output for instances running on the wrong node
- A few fixes to the cluster-merge tool, but it still cannot merge
multi-node groups (currently it is not recommended to use this tool)
Improvements
~~~~~~~~~~~~
- Improved network configuration for the KVM hypervisor
- Added e1000 as a supported NIC for Xen-HVM
- Improved the lvmstrap tool to also be able to use partitions, as
opposed to full disks
- Improved speed of disk wiping (the cluster parameter
``prealloc_wipe_disks``), so that it has a low impact on the total time
of instance creations
- Added documentation for the OS parameters
- Changed ``gnt-instance deactivate-disks`` so that it can work if the
hypervisor is not responding
- Added display of blacklisted and hidden OS information in
``gnt-cluster info``
- Extended ``gnt-cluster verify`` to also validate hypervisor, backend,
NIC and node parameters, which might create problems with currently
invalid (but undetected) configuration files, but prevents validation
failures when unrelated parameters are modified
- Changed cluster initialisation to wait for the master daemon to become
available
- Expanded the RAPI interface:
- Added config redistribution resource
- Added activation/deactivation of instance disks
- Added export of console information
- Implemented log file reopening on SIGHUP, which allows using
logrotate(8) for the Ganeti log files
- Added a basic OOB helper script as an example
Version 2.4.0 beta1
-------------------
*(Released Fri, 14 Jan 2011)*
User-visible
~~~~~~~~~~~~
- Fixed timezone issues when formatting timestamps
- Added support for node groups, available via ``gnt-group`` and other
commands
- Added out-of-band framework and management, see :doc:`design
document `
- Removed support for roman numbers from ``gnt-node list`` and
``gnt-instance list``.
- Allowed modification of master network interface via ``gnt-cluster
modify --master-netdev``
- Accept offline secondaries while shutting down instance disks
- Added ``blockdev_prefix`` parameter to Xen PVM and HVM hypervisors
- Added support for multiple LVM volume groups
- Avoid sorting nodes for ``gnt-node list`` if specific nodes are
requested
- Added commands to list available fields:
- ``gnt-node list-fields``
- ``gnt-group list-fields``
- ``gnt-instance list-fields``
- Updated documentation and man pages
Integration
~~~~~~~~~~~
- Moved ``rapi_users`` file into separate directory, now named
``.../ganeti/rapi/users``, ``cfgupgrade`` moves the file and creates a
symlink
- Added new tool for running commands on many machines,
``tools/ganeti-listrunner``
- Implemented more verbose result in ``OpInstanceConsole`` opcode, also
improving the ``gnt-instance console`` output
- Allowed customisation of disk index separator at ``configure`` time
- Export node group allocation policy to :doc:`iallocator <iallocator>`
- Added support for non-partitioned md disks in ``lvmstrap``
- Added script to gracefully power off KVM instances
- Split ``utils`` module into smaller parts
- Changed query operations to return more detailed information, e.g.
whether a piece of information is unavailable due to an offline node. To use
this new functionality, the LUXI call ``Query`` must be used. Field
information is now stored by the master daemon and can be retrieved
using ``QueryFields``. Instances, nodes and groups can also be queried
using the new opcodes ``OpQuery`` and ``OpQueryFields`` (not yet
exposed via RAPI). The following commands make use of this
infrastructure change:
- ``gnt-group list``
- ``gnt-group list-fields``
- ``gnt-node list``
- ``gnt-node list-fields``
- ``gnt-instance list``
- ``gnt-instance list-fields``
- ``gnt-debug locks``
Remote API
~~~~~~~~~~
- New RAPI resources (see :doc:`rapi`):
- ``/2/modify``
- ``/2/groups``
- ``/2/groups/[group_name]``
- ``/2/groups/[group_name]/assign-nodes``
- ``/2/groups/[group_name]/modify``
- ``/2/groups/[group_name]/rename``
- ``/2/instances/[instance_name]/disk/[disk_index]/grow``
- RAPI changes:
- Implemented ``no_install`` for instance creation
- Implemented OS parameters for instance reinstallation, allowing
use of special settings on reinstallation (e.g. for preserving data)
Misc
~~~~
- Added IPv6 support in import/export
- Pause DRBD synchronization while wiping disks on instance creation
- Updated unittests and QA scripts
- Improved network parameters passed to KVM
- Converted man pages from docbook to reStructuredText
Version 2.3.1
-------------
*(Released Mon, 20 Dec 2010)*
Released version 2.3.1~rc1 without any changes.
Version 2.3.1 rc1
-----------------
*(Released Wed, 1 Dec 2010)*
- impexpd: Disable OpenSSL compression in socat if possible (backport
from master, commit e90739d625b, see :doc:`installation guide <install>`
for details)
- Changed unittest coverage report to exclude test scripts
- Added script to check version format
Version 2.3.0
-------------
*(Released Wed, 1 Dec 2010)*
Released version 2.3.0~rc1 without any changes.
Version 2.3.0 rc1
-----------------
*(Released Fri, 19 Nov 2010)*
A number of bugfixes and documentation updates:
- Update ganeti-os-interface documentation
- Fixed a bug related to duplicate MACs or similar items which should be
unique
- Fix breakage in OS state modify
- Reinstall instance: disallow offline secondaries (fixes bug related to
OS changing but reinstall failing)
- plus all the other fixes between 2.2.1 and 2.2.2
Version 2.3.0 rc0
-----------------
*(Released Tue, 2 Nov 2010)*
- Fixed clearing of the default iallocator using ``gnt-cluster modify``
- Fixed master failover race with watcher
- Fixed a bug in ``gnt-node modify`` which could lead to an inconsistent
configuration
- Accept previously stopped instance for export with instance removal
- Simplify and extend the environment variables for instance OS scripts
- Added new node flags, ``master_capable`` and ``vm_capable``
- Added optional wiping of instance disks during allocation. This is a
cluster-wide option and can be set/modified using
``gnt-cluster {init,modify} --prealloc-wipe-disks``.
- Added IPv6 support, see :doc:`design document ` and
:doc:`install-quick`
- Added a new watcher option (``--ignore-pause``)
- Added option to ignore offline node on instance start/stop
(``--ignore-offline``)
- Allow overriding OS parameters with ``gnt-instance reinstall``
- Added ability to change node's secondary IP address using ``gnt-node
modify``
- Implemented privilege separation for all daemons except
``ganeti-noded``, see ``configure`` options
- Complain if an instance's disk is marked faulty in ``gnt-cluster
verify``
- Implemented job priorities (see ``ganeti(7)`` manpage)
- Ignore failures while shutting down instances during failover from
offline node
- Exit daemon's bootstrap process only once daemon is ready
- Export more information via ``LUInstanceQuery``/remote API
- Improved documentation, QA and unittests
- RAPI daemon now watches ``rapi_users`` all the time and doesn't need a
restart if the file was created or changed
- Added LUXI protocol version sent with each request and response,
allowing detection of server/client mismatches
- Moved the Python scripts among gnt-* and ganeti-* into modules
- Moved all code related to setting up SSH to an external script,
``setup-ssh``
- Infrastructure changes for node group support in future versions
Version 2.2.2
-------------
*(Released Fri, 19 Nov 2010)*
A few small bugs fixed, and some improvements to the build system:
- Fix documentation regarding conversion to drbd
- Fix validation of parameters in cluster modify (``gnt-cluster modify
-B``)
- Fix error handling in node modify with multiple changes
- Allow remote imports without checked names
Version 2.2.1
-------------
*(Released Tue, 19 Oct 2010)*
- Disable SSL session ID cache in RPC client
Version 2.2.1 rc1
-----------------
*(Released Thu, 14 Oct 2010)*
- Fix interaction between Curl/GnuTLS and the Python's HTTP server
(thanks Apollon Oikonomopoulos!), finally allowing the use of Curl
with GnuTLS
- Fix problems with interaction between Curl and Python's HTTP server,
resulting in increased speed in many RPC calls
- Improve our release script to prevent breakage with older aclocal and
Python 2.6
Version 2.2.1 rc0
-----------------
*(Released Thu, 7 Oct 2010)*
- Fixed issue 125, replace hardcoded "xenvg" in ``gnt-cluster`` with
value retrieved from master
- Added support for blacklisted or hidden OS definitions
- Added simple lock monitor (accessible via ``gnt-debug locks``)
- Added support for -mem-path in KVM hypervisor abstraction layer
- Allow overriding instance parameters in tool for inter-cluster
instance moves (``tools/move-instance``)
- Improved opcode summaries (e.g. in ``gnt-job list``)
- Improve consistency of OS listing by sorting it
- Documentation updates
Version 2.2.0.1
---------------
*(Released Fri, 8 Oct 2010)*
- Rebuild with a newer autotools version, to fix python 2.6 compatibility
Version 2.2.0
-------------
*(Released Mon, 4 Oct 2010)*
- Fixed regression in ``gnt-instance rename``
Version 2.2.0 rc2
-----------------
*(Released Wed, 22 Sep 2010)*
- Fixed OS_VARIANT variable for OS scripts
- Fixed cluster tag operations via RAPI
- Made ``setup-ssh`` exit with non-zero code if an error occurred
- Disabled RAPI CA checks in watcher
Version 2.2.0 rc1
-----------------
*(Released Mon, 23 Aug 2010)*
- Support DRBD versions of the format "a.b.c.d"
- Updated manpages
- Re-introduce support for usage from multiple threads in RAPI client
- Instance renames and modify via RAPI
- Work around race condition between processing and archival in job
queue
- Mark opcodes following failed one as failed, too
- Job field ``lock_status`` was removed due to difficulties making it
work with the changed job queue in Ganeti 2.2; a better way to monitor
locks is expected for a later 2.2.x release
- Fixed dry-run behaviour with many commands
- Support ``ssh-agent`` again when adding nodes
- Many additional bugfixes
Version 2.2.0 rc0
-----------------
*(Released Fri, 30 Jul 2010)*
Important change: the internal RPC mechanism between Ganeti nodes has
changed from using a home-grown http library (based on the Python base
libraries) to using the PycURL library. This requires that PycURL is
installed on nodes. Please note that on Debian/Ubuntu, PycURL is linked
against GnuTLS by default. cURL's support for GnuTLS had known issues
before cURL 7.21.0 and we recommend using the latest cURL release or
linking against OpenSSL. Most other distributions already link PycURL
and cURL against OpenSSL. The command::
python -c 'import pycurl; print pycurl.version'
can be used to determine the libraries PycURL and cURL are linked
against.
Other significant changes:
- Rewrote much of the internals of the job queue, in order to achieve
better parallelism; this decouples job query operations from the job
processing, and it should allow much nicer behaviour of the master
daemon under load, and it also has uncovered some long-standing bugs
related to the job serialisation (now fixed)
- Added a default iallocator setting to the cluster parameters,
eliminating the need to always pass nodes or an iallocator for
operations that require selection of new node(s)
- Added experimental support for the LXC virtualization method
- Added support for OS parameters, which allows the installation of
instances to pass parameters to OS scripts in order to customise the
instance
- Added a hypervisor parameter controlling the migration type (live or
non-live), since hypervisors have various levels of reliability; this
has renamed the 'live' parameter to 'mode'
- Added a cluster parameter ``reserved_lvs`` that denotes reserved
logical volumes, meaning that cluster verify will ignore them and not
flag their presence as errors
- The watcher will now reset the error count for failed instances after
8 hours, thus allowing self-healing if the problem that caused the
instances to be down/fail to start has cleared in the meantime
- Added a cluster parameter ``drbd_usermode_helper`` that makes Ganeti
check, and warn, if the drbd module parameter ``usermode_helper`` is not
consistent with the cluster-wide setting; this is needed to make
diagnosing failed drbd creations easier
- Started adding base IPv6 support, but this is not yet
enabled/available for use
- Rename operations (cluster, instance) will now return the new name,
which is especially useful if a short name was passed in
- Added support for instance migration in RAPI
- Added a tool to pre-configure nodes for the SSH setup, before joining
them to the cluster; this will allow in the future a simplified model
for node joining (but not yet fully enabled in 2.2); this needs the
paramiko python library
- Fixed handling of name-resolving errors
- Fixed consistency of job results on the error path
- Fixed master-failover race condition when executed multiple times in
sequence
- Fixed many bugs related to the job queue (mostly introduced during the
2.2 development cycle, so not all are impacting 2.1)
- Fixed instance migration with missing disk symlinks
- Fixed handling of unknown jobs in ``gnt-job archive``
- And many other small fixes/improvements
Internal changes:
- Enhanced both the unittest and the QA coverage
- Switched the opcode validation to a generic model, and extended the
validation to all opcode parameters
- Changed more parts of the code that write shell scripts to use the
same class for this
- Switched the master daemon to use the asyncore library for the Luxi
server endpoint
Version 2.2.0 beta0
-------------------
*(Released Thu, 17 Jun 2010)*
- Added tool (``move-instance``) and infrastructure to move instances
between separate clusters (see :doc:`separate documentation
` and :doc:`design document `)
- Added per-request RPC timeout
- RAPI now requires a Content-Type header for requests with a body (e.g.
``PUT`` or ``POST``) which must be set to ``application/json`` (see
:rfc:`2616` (HTTP/1.1), section 7.2.1); a request sketch is shown after
this list
- ``ganeti-watcher`` attempts to restart ``ganeti-rapi`` if RAPI is not
reachable
- Implemented initial support for running Ganeti daemons as separate
users, see configure-time flags ``--with-user-prefix`` and
``--with-group-prefix`` (only ``ganeti-rapi`` is supported at this
time)
- Instances can be removed after export (``gnt-backup export
--remove-instance``)
- Self-signed certificates generated by Ganeti now use a 2048 bit RSA
key (instead of 1024 bit)
- Added new cluster configuration file for cluster domain secret
- Import/export now use SSL instead of SSH
- Added support for showing estimated time when exporting an instance,
see the ``ganeti-os-interface(7)`` manpage and look for
``EXP_SIZE_FD``
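As a sketch of the Content-Type requirement mentioned in the list above
(host name and request body are hypothetical; the default RAPI port 5080
is assumed)::
  curl -k -X POST -H "Content-Type: application/json" \
    -d @instance-create.json https://cluster.example.com:5080/2/instances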
Version 2.1.8
-------------
*(Released Tue, 16 Nov 2010)*
Some more bugfixes. Unless critical bugs occur, this will be the last
2.1 release:
- Fix case of MAC special-values
- Fix mac checker regex
- backend: Fix typo causing "out of range" error
- Add missing --units in gnt-instance list man page
Version 2.1.7
-------------
*(Released Tue, 24 Aug 2010)*
Bugfixes only:
- Don't ignore secondary node silently on non-mirrored disk templates
(issue 113)
- Fix --master-netdev arg name in gnt-cluster(8) (issue 114)
- Fix usb_mouse parameter breaking with vnc_console (issue 109)
- Properly document the usb_mouse parameter
- Fix path in ganeti-rapi(8) (issue 116)
- Adjust error message when the ganeti user's .ssh directory is
missing
- Add same-node-check when changing the disk template to drbd
Version 2.1.6
-------------
*(Released Fri, 16 Jul 2010)*
Bugfixes only:
- Add an option to only select some reboot types during qa/burnin.
(on some hypervisors consecutive reboots are not supported)
- Fix infrequent race condition in master failover. Sometimes the old
master IP address would still be detected as up for a short time
after it was removed, causing failover to fail.
- Decrease mlockall warnings when the ctypes module is missing. On
Python 2.4 we support running even if no ctypes module is installed,
but we were too verbose about this issue.
- Fix building on old distributions, on which man doesn't have a
--warnings option.
- Fix RAPI not to ignore the MAC address on instance creation
- Implement the old instance creation format in the RAPI client.
Version 2.1.5
-------------
*(Released Thu, 01 Jul 2010)*
A small bugfix release:
- Fix disk adoption: broken by strict --disk option checking in 2.1.4
- Fix batch-create: broken in the whole 2.1 series due to a lookup on
a non-existing option
- Fix instance create: the --force-variant option was ignored
- Improve pylint 0.21 compatibility and warnings with Python 2.6
- Fix modify node storage with non-FQDN arguments
- Fix RAPI client to authenticate under Python 2.6 when used
for more than 5 requests needing authentication
- Fix gnt-instance modify -t (storage) giving a wrong error message
when converting a non-shutdown drbd instance to plain
Version 2.1.4
-------------
*(Released Fri, 18 Jun 2010)*
A small bugfix release:
- Fix live migration of KVM instances started with older Ganeti
versions which had fewer hypervisor parameters
- Fix gnt-instance grow-disk on down instances
- Fix an error-reporting bug during instance migration
- Better checking of the ``--net`` and ``--disk`` values, to avoid
silently ignoring broken ones
- Fix an RPC error reporting bug affecting, for example, RAPI client
users
- Fix bug triggered by OSes with different API versions on different nodes
- Fix a bug in instance startup with custom hvparams: OS level
parameters would fail to be applied.
- Fix the RAPI client under Python 2.6 (but more work is needed to
make it work completely well with OpenSSL)
- Fix handling of errors when resolving names from DNS
Version 2.1.3
-------------
*(Released Thu, 3 Jun 2010)*
A medium sized development cycle. Some new features, and some
fixes/small improvements/cleanups.
Significant features
~~~~~~~~~~~~~~~~~~~~
The node daemon now tries to mlock itself into memory, unless the
``--no-mlock`` flag is passed. It also doesn't fail if it can't write
its logs, and falls back to console logging. This allows emergency
features such as ``gnt-node powercycle`` to work even in the event of a
broken node disk (tested offlining the disk hosting the node's
filesystem and dropping its memory caches; don't try this at home)
KVM: add vhost-net acceleration support. It can be tested with a new
enough version of the kernel and of qemu-kvm.
KVM: Add instance chrooting feature. If you use privilege dropping for
your VMs you can also now force them to chroot to an empty directory,
before starting the emulated guest.
KVM: Add maximum migration bandwidth and maximum downtime tweaking
support (requires a new-enough version of qemu-kvm).
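A minimal sketch of tuning these, assuming the hypervisor parameters are
named ``migration_bandwidth`` (MiB/s) and ``migration_downtime`` (ms) and
using a hypothetical instance name::
  gnt-instance modify -H migration_bandwidth=32,migration_downtime=100 \
    instance1.example.com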
Cluster verify will now warn if the master node doesn't have the master
ip configured on it.
Add a new (incompatible) instance creation request format to RAPI which
supports all parameters (previously only a subset was supported, and it
wasn't possible to extend the old format to accommodate all the new
features). The old format is still supported, and a client can check for
this feature, before using it, by checking for its presence in the
``features`` RAPI resource.
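For illustration, a minimal check could look like the following (the host
name is hypothetical; the default RAPI port 5080 and HTTPS with a
self-signed certificate are assumed)::
  curl -k https://cluster.example.com:5080/2/features
If the returned JSON list does not advertise the new creation format, the
client should keep using the old one.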
Now with ancient latin support. Try it passing the ``--roman`` option to
``gnt-instance info``, ``gnt-cluster info`` or ``gnt-node list``
(requires the python-roman module to be installed, in order to work).
Other changes
~~~~~~~~~~~~~
As usual many internal code refactorings, documentation updates, and
such. Among others:
- Lots of improvements and cleanups to the experimental Remote API
(RAPI) client library.
- A new unit test suite for the core daemon libraries.
- A fix to creating missing directories makes sure the umask is not
applied anymore. This enforces the same directory permissions
everywhere.
- Better handling of terminating daemons with ctrl+c (used when running
them in debugging mode).
- Fix a race condition in live migrating a KVM instance, when stat()
on the old proc status file returned EINVAL, which is an unexpected
value.
- Fixed manpage checking with newer man and UTF-8 characters. But now
you need the en_US.UTF-8 locale enabled to build Ganeti from git.
Version 2.1.2.1
---------------
*(Released Fri, 7 May 2010)*
Fix a bug which prevented untagged KVM instances from starting.
Version 2.1.2
-------------
*(Released Fri, 7 May 2010)*
Another release with a long development cycle, during which many
different features were added.
Significant features
~~~~~~~~~~~~~~~~~~~~
The KVM hypervisor now can run the individual instances as non-root, to
reduce the impact of a VM being hijacked due to bugs in the
hypervisor. It is possible to run all instances as a single (non-root)
user, to manually specify a user for each instance, or to dynamically
allocate a user out of a cluster-wide pool to each instance, with the
guarantee that no two instances will run under the same user ID on any
given node.
An experimental RAPI client library, that can be used standalone
(without the other Ganeti libraries), is provided in the source tree as
``lib/rapi/client.py``. Note this client might change its interface in
the future, as we iterate on its capabilities.
A new command, ``gnt-cluster renew-crypto`` has been added to easily
replace the cluster's certificates and crypto keys. This might help in
case they have been compromised, or have simply expired.
A new disk option for instance creation has been added that allows one
to "adopt" currently existing logical volumes, with data
preservation. This should allow easier migration to Ganeti from
unmanaged (or managed via other software) instances.
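A minimal sketch of adopting an existing volume (node, OS and volume names
are hypothetical)::
  gnt-instance add -t plain --disk 0:adopt=existing-lv \
    -o debootstrap+default -n node1.example.com instance1.example.com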
Another disk improvement is the possibility to convert between redundant
(DRBD) and plain (LVM) disk configuration for an instance. This should
allow better scalability (starting with one node and growing the
cluster, or shrinking a two-node cluster to one node).
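A minimal sketch of such a conversion and back (instance and node names
are hypothetical; the instance has to be stopped for the conversion)::
  gnt-instance modify -t drbd -n node2.example.com instance1.example.com
  gnt-instance modify -t plain instance1.example.com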
A new feature that could help with automated node failovers has been
implemented: if a node sees itself as offline (by querying the master
candidates), it will try to shutdown (hard) all instances and any active
DRBD devices. This reduces the risk of duplicate instances if an
external script automatically fails over the instances on such nodes. To
enable this, the cluster parameter ``maintain_node_health`` should be
enabled; in the future this option (per the name) will enable other
automatic maintenance features.
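A minimal sketch of enabling it cluster-wide (the exact option spelling is
an assumption; the ``gnt-cluster(8)`` manpage is authoritative)::
  gnt-cluster modify --maintain-node-health=yes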
Instance export/import will now reuse the original instance
specifications for all parameters; that means exporting an instance,
deleting it and then importing it back should give an almost identical
instance. Note that the default import behaviour has changed from
before, where it created only one NIC; now it recreates the original
number of NICs.
Cluster verify has added a few new checks: SSL certificates validity,
/etc/hosts consistency across the cluster, etc.
Other changes
~~~~~~~~~~~~~
As usual, many internal changes were done, documentation fixes,
etc. Among others:
- Fixed cluster initialization with disabled cluster storage (regression
introduced in 2.1.1)
- File-based storage supports growing the disks
- Fixed behaviour of node role changes
- Fixed cluster verify for some corner cases, plus a general rewrite of
cluster verify to allow future extension with more checks
- Fixed log spamming by watcher and node daemon (regression introduced
in 2.1.1)
- Fixed possible validation issues when changing the list of enabled
hypervisors
- Fixed cleanup of /etc/hosts during node removal
- Fixed RAPI response for invalid methods
- Fixed bug with hashed passwords in ``ganeti-rapi`` daemon
- Multiple small improvements to the KVM hypervisor (VNC usage, booting
from ide disks, etc.)
- Allow OS changes without re-installation (to record a changed OS
outside of Ganeti, or to allow OS renames)
- Allow instance creation without OS installation (useful for example if
the OS will be installed manually, or restored from a backup not in
Ganeti format)
- Implemented option to make cluster ``copyfile`` use the replication
network
- Added list of enabled hypervisors to ssconf (possibly useful for
external scripts)
- Added a new tool (``tools/cfgupgrade12``) that allows upgrading from
1.2 clusters
- A partial form of node re-IP is possible via node readd, which now
allows changed node primary IP
- Command line utilities now show an informational message if the job is
waiting for a lock
- The logs of the master daemon now show the PID/UID/GID of the
connected client
Version 2.1.1
-------------
*(Released Fri, 12 Mar 2010)*
During the 2.1.0 long release candidate cycle, a lot of improvements and
changes have accumulated, which were released later as 2.1.1.
Major changes
~~~~~~~~~~~~~
The node evacuate command (``gnt-node evacuate``) was significantly
rewritten, and as such the IAllocator protocol was changed - a new
request type has been added. This unfortunate change during a stable
series is designed to improve performance of node evacuations; on
clusters with more than about five nodes and which are well-balanced,
evacuation should proceed in parallel for all instances of the node
being evacuated. As such, any existing IAllocator scripts need to be
updated, otherwise the above command will fail due to the unknown
request. The provided "dumb" allocator has not been updated; but the
ganeti-htools package supports the new protocol since version 0.2.4.
Another important change is increased validation of node and instance
names. This might create problems in special cases, if invalid host
names are being used.
Also, a new layer of hypervisor parameters has been added, that sits at
OS level between the cluster defaults and the instance ones. This allows
customisation of virtualization parameters depending on the installed
OS. For example instances with OS 'X' may have a different KVM kernel
(or any other parameter) than the cluster defaults. This is intended to
help manage multiple OSes on the same cluster, without manual
modification of each instance's parameters.
A tool for merging clusters, ``cluster-merge``, has been added in the
tools sub-directory.
Bug fixes
~~~~~~~~~
- Improved the int/float conversions that should make the code more
robust in face of errors from the node daemons
- Fixed the remove node code in case of internal configuration errors
- Fixed the node daemon behaviour in face of inconsistent queue
directory (e.g. read-only file-system where we can't open the files
read-write, etc.)
- Fixed the behaviour of gnt-node modify for master candidate demotion;
now it either aborts cleanly or, if given the new "auto_promote"
parameter, will automatically promote other nodes as needed
- Fixed compatibility with (unreleased yet) Python 2.6.5 that would
completely prevent Ganeti from working
- Fixed bug for instance export when not all disks were successfully
exported
- Fixed behaviour of node add when the new node is slow in starting up
the node daemon
- Fixed handling of signals in the LUXI client, which should improve
behaviour of command-line scripts
- Added checks for invalid node/instance names in the configuration (now
flagged during cluster verify)
- Fixed watcher behaviour for disk activation errors
- Fixed two potentially endless loops in http library, which led to the
RAPI daemon hanging and consuming 100% CPU in some cases
- Fixed bug in RAPI daemon related to hashed passwords
- Fixed bug for unintended qemu-level bridging of multi-NIC KVM
instances
- Enhanced compatibility with non-Debian OSes, by not using absolute
paths in some commands and allowing customisation of the ssh
configuration directory
- Fixed possible future issue with new Python versions by abiding to the
proper use of ``__slots__`` attribute on classes
- Added checks that should prevent directory traversal attacks
- Many documentation fixes based on feedback from users
New features
~~~~~~~~~~~~
- Added an "early_release" more for instance replace disks and node
evacuate, where we release locks earlier and thus allow higher
parallelism within the cluster
- Added watcher hooks, intended to allow the watcher to restart other
daemons (e.g. from the ganeti-nbma project), but they can be used of
course for any other purpose
- Added a compile-time disable for DRBD barriers, to increase
performance if the administrator trusts the power supply or the
storage system to not lose writes
- Added the option of using syslog for logging instead of, or in
addition to, Ganeti's own log files
- Removed boot restriction for paravirtual NICs for KVM, recent versions
can indeed boot from a paravirtual NIC
- Added a generic debug level for many operations; while this is not
used widely yet, it allows one to pass the debug value all the way to
the OS scripts
- Enhanced the hooks environment for instance moves (failovers,
migrations) where the primary/secondary nodes changed during the
operation, by adding {NEW,OLD}_{PRIMARY,SECONDARY} vars
- Enhanced data validations for many user-supplied values; one important
item is the restrictions imposed on instance and node names, which
might reject some (invalid) host names
- Add a configure-time option to disable file-based storage, if it's not
needed; this allows greater security separation between the master
node and the other nodes from the point of view of the inter-node RPC
protocol
- Added user notification in interactive tools if job is waiting in the
job queue or trying to acquire locks
- Added log messages when a job is waiting for locks
- Added filtering by node tags in instance operations which admit
multiple instances (start, stop, reboot, reinstall)
- Added a new tool for cluster mergers, ``cluster-merge``
- Parameters from command line which are of the form ``a=b,c=d`` can now
use backslash escapes to pass in values which contain commas,
e.g. ``a=b\\c,d=e`` where the 'a' parameter would get the value
``b,c``
- For KVM, the instance name is the first parameter passed to KVM, so
that it's more visible in the process list
Version 2.1.0
-------------
*(Released Tue, 2 Mar 2010)*
Ganeti 2.1 brings many improvements with it. Major changes:
- Added infrastructure to ease automated disk repairs
- Added new daemon to export configuration data in a cheaper way than
using the remote API
- Instance NICs can now be routed instead of being associated with a
networking bridge
- Improved job locking logic to reduce impact of jobs acquiring multiple
locks waiting for other long-running jobs
In-depth implementation details can be found in the Ganeti 2.1 design
document.
Details
~~~~~~~
- Added chroot hypervisor
- Added more options to xen-hvm hypervisor (``kernel_path`` and
``device_model``)
- Added more options to xen-pvm hypervisor (``use_bootloader``,
``bootloader_path`` and ``bootloader_args``)
- Added the ``use_localtime`` option for the xen-hvm and kvm
hypervisors, and the default value for this has changed to false (in
2.0 xen-hvm always enabled it)
- Added luxi call to submit multiple jobs in one go
- Added cluster initialization option to not modify ``/etc/hosts``
file on nodes
- Added network interface parameters
- Added dry run mode to some LUs
- Added RAPI resources:
- ``/2/instances/[instance_name]/info``
- ``/2/instances/[instance_name]/replace-disks``
- ``/2/nodes/[node_name]/evacuate``
- ``/2/nodes/[node_name]/migrate``
- ``/2/nodes/[node_name]/role``
- ``/2/nodes/[node_name]/storage``
- ``/2/nodes/[node_name]/storage/modify``
- ``/2/nodes/[node_name]/storage/repair``
- Added OpCodes to evacuate or migrate all instances on a node
- Added new command to list storage elements on nodes (``gnt-node
list-storage``) and modify them (``gnt-node modify-storage``)
- Added new ssconf files with master candidate IP address
(``ssconf_master_candidates_ips``), node primary IP address
(``ssconf_node_primary_ips``) and node secondary IP address
(``ssconf_node_secondary_ips``)
- Added ``ganeti-confd`` and a client library to query the Ganeti
configuration via UDP
- Added ability to run hooks after cluster initialization and before
cluster destruction
- Added automatic mode for disk replace (``gnt-instance replace-disks
--auto``)
- Added ``gnt-instance recreate-disks`` to re-create (empty) disks
after catastrophic data-loss
- Added ``gnt-node repair-storage`` command to repair damaged LVM volume
groups
- Added ``gnt-instance move`` command to move instances
- Added ``gnt-cluster watcher`` command to control watcher
- Added ``gnt-node powercycle`` command to powercycle nodes
- Added new job status field ``lock_status``
- Added parseable error codes to cluster verification (``gnt-cluster
verify --error-codes``) and made output less verbose (use
``--verbose`` to restore previous behaviour)
- Added UUIDs to the main config entities (cluster, nodes, instances)
- Added support for OS variants
- Added support for hashed passwords in the Ganeti remote API users file
(``rapi_users``)
- Added option to specify maximum timeout on instance shutdown
- Added ``--no-ssh-init`` option to ``gnt-cluster init``
- Added new helper script to start and stop Ganeti daemons
(``daemon-util``), with the intent to reduce the work necessary to
adjust Ganeti for non-Debian distributions and to start/stop daemons
from one place
- Added more unittests
- Fixed critical bug in ganeti-masterd startup
- Removed the configure-time ``kvm-migration-port`` parameter, this is
now customisable at the cluster level for both the KVM and Xen
hypervisors using the new ``migration_port`` parameter
- Pass ``INSTANCE_REINSTALL`` variable to OS installation script when
reinstalling an instance
- Allowed ``@`` in tag names
- Migrated to Sphinx (http://sphinx.pocoo.org/) for documentation
- Many documentation updates
- Distribute hypervisor files on ``gnt-cluster redist-conf``
- ``gnt-instance reinstall`` can now reinstall multiple instances
- Updated many command line parameters
- Introduced new OS API version 15
- No longer support a default hypervisor
- Treat virtual LVs as inexistent
- Improved job locking logic to reduce lock contention
- Match instance and node names case insensitively
- Reimplemented bash completion script to be more complete
- Improved burnin
Version 2.0.6
-------------
*(Released Thu, 4 Feb 2010)*
- Fix cleaner behaviour on nodes not in a cluster (Debian bug 568105)
- Fix a string formatting bug
- Improve safety of the code in some error paths
- Improve data validation in the master of values returned from nodes
Version 2.0.5
-------------
*(Released Thu, 17 Dec 2009)*
- Fix security issue due to missing validation of iallocator names; this
allows local and remote execution of arbitrary executables
- Fix failure of gnt-node list during instance removal
- Ship the RAPI documentation in the archive
Version 2.0.4
-------------
*(Released Wed, 30 Sep 2009)*
- Fixed many wrong messages
- Fixed a few bugs related to the locking library
- Fixed MAC checking at instance creation time
- Fixed a DRBD parsing bug related to gaps in /proc/drbd
- Fixed a few issues related to signal handling in both daemons and
scripts
- Fixed the example startup script provided
- Fixed insserv dependencies in the example startup script (patch from
Debian)
- Fixed handling of drained nodes in the iallocator framework
- Fixed handling of KERNEL_PATH parameter for xen-hvm (Debian bug
#528618)
- Fixed error related to invalid job IDs in job polling
- Fixed job/opcode persistence on unclean master shutdown
- Fixed handling of partial job processing after unclean master
shutdown
- Fixed error reporting from LUs, previously all errors were converted
into execution errors
- Fixed error reporting from burnin
- Decreased significantly the memory usage of the job queue
- Optimised slightly multi-job submission
- Optimised slightly opcode loading
- Backported the multi-job submit framework from the development
branch; multi-instance start and stop should be faster
- Added script to clean archived jobs after 21 days; this will reduce
the size of the queue directory
- Added some extra checks in disk size tracking
- Added an example ethers hook script
- Added a cluster parameter that prevents Ganeti from modifying
/etc/hosts
- Added more node information to RAPI responses
- Added a ``gnt-job watch`` command that allows following the output of a
job
- Added a bind-address option to ganeti-rapi
- Added more checks to the configuration verify
- Enhanced the burnin script such that some operations can be retried
automatically
- Converted instance reinstall to multi-instance model
Version 2.0.3
-------------
*(Released Fri, 7 Aug 2009)*
- Added ``--ignore-size`` to the ``gnt-instance activate-disks`` command
to allow using the pre-2.0.2 behaviour in activation, if any existing
instances have mismatched disk sizes in the configuration
- Added ``gnt-cluster repair-disk-sizes`` command to check and update
any configuration mismatches for disk sizes
- Added ``gnt-cluster masterfailover --no-voting`` to allow master
failover to work on two-node clusters
- Fixed the ``--net`` option of ``gnt-backup import``, which was
unusable
- Fixed detection of OS script errors in ``gnt-backup export``
- Fixed exit code of ``gnt-backup export``
Version 2.0.2
-------------
*(Released Fri, 17 Jul 2009)*
- Added experimental support for striped logical volumes; this should
enhance performance but comes with a higher complexity in the block
device handling; striping is only enabled when passing
``--with-lvm-stripecount=N`` to ``configure``, but codepaths are
affected even in the non-striped mode
- Improved resiliency against transient failures at the end of DRBD
resyncs, and in general of DRBD resync checks
- Fixed a couple of issues with exports and snapshot errors
- Fixed a couple of issues in instance listing
- Added display of the disk size in ``gnt-instance info``
- Fixed checking for valid OSes in instance creation
- Fixed handling of the "vcpus" parameter in instance listing and in
general of invalid parameters
- Fixed http server library, and thus RAPI, to handle invalid
username/password combinations correctly; this means that now they
report unauthorized for queries too, not only for modifications,
allowing earlier detection of configuration problems
- Added a new "role" node list field, equivalent to the master/master
candidate/drained/offline flags combinations
- Fixed cluster modify and changes of candidate pool size
- Fixed cluster verify error messages for wrong files on regular nodes
- Fixed a couple of issues with node demotion from master candidate role
- Fixed node readd issues
- Added non-interactive mode for ``ganeti-masterd --no-voting`` startup
- Added a new ``--no-voting`` option for masterfailover to fix failover
on two-node clusters when the former master node is unreachable
- Added instance reinstall over RAPI
Version 2.0.1
-------------
*(Released Tue, 16 Jun 2009)*
- added ``-H``/``-B`` startup parameters to ``gnt-instance``, which will
allow re-adding the start in single-user option (regression from 1.2)
- the watcher writes the instance status to a file, to allow monitoring
to report the instance status (from the master) based on cached
results of the watcher's queries; while this can get stale if the
watcher is being locked due to other work on the cluster, this is
still an improvement
- the watcher now also restarts the node daemon and the rapi daemon if
they died
- fixed the watcher to handle full and drained queue cases
- hooks export more instance data in the environment, which helps if
hook scripts need to take action based on the instance's properties
(no longer need to query back into ganeti)
- instance failovers when the instance is stopped do not check for free
RAM, so that failing over a stopped instance is possible in low memory
situations
- rapi uses queries for tags instead of jobs (for less job traffic), and
for cluster tags it won't talk to masterd at all but read them from
ssconf
- a couple of error handling fixes in RAPI
- drbd handling: improved the error handling of inconsistent disks after
resync to reduce the frequency of "there are some degraded disks for
this instance" messages
- fixed a bug in live migration when DRBD doesn't want to reconnect (the
error handling path called a wrong function name)
Version 2.0.0
-------------
*(Released Wed, 27 May 2009)*
- no changes from rc5
Version 2.0 rc5
---------------
*(Released Wed, 20 May 2009)*
- fix a couple of bugs (validation, argument checks)
- fix ``gnt-cluster getmaster`` on non-master nodes (regression)
- some small improvements to RAPI and IAllocator
- make watcher automatically start the master daemon if down
Version 2.0 rc4
---------------
*(Released Mon, 27 Apr 2009)*
- change the OS list to not require locks; this helps with big clusters
- fix ``gnt-cluster verify`` and ``gnt-cluster verify-disks`` when the
volume group is broken
- ``gnt-instance info``, without any arguments, doesn't run for all
instances anymore; either pass ``--all`` or pass the desired
instances; this helps against mistakes on big clusters where listing
the information for all instances takes a long time
- miscellaneous doc and man pages fixes
Version 2.0 rc3
---------------
*(Released Wed, 8 Apr 2009)*
- Change the internal locking model of some ``gnt-node`` commands, in
order to reduce contention (and blocking of master daemon) when
batching many creation/reinstall jobs
- Fixes to Xen soft reboot
- No longer build documentation at build time, instead distribute it in
the archive, in order to reduce the need for the whole docbook/rst
toolchains
Version 2.0 rc2
---------------
*(Released Fri, 27 Mar 2009)*
- Now the cfgupgrade script works and can upgrade 1.2.7 clusters to 2.0
- Fix watcher startup sequence, improves the behaviour of busy clusters
- Some other fixes in ``gnt-cluster verify``, ``gnt-instance
replace-disks``, ``gnt-instance add``, ``gnt-cluster queue``, KVM VNC
bind address and other places
- Some documentation fixes and updates
Version 2.0 rc1
---------------
*(Released Mon, 2 Mar 2009)*
- More documentation updates, now all docs should be more-or-less
up-to-date
- A couple of small fixes (mixed hypervisor clusters, offline nodes,
etc.)
- Added a customizable HV_KERNEL_ARGS hypervisor parameter (for Xen PVM
and KVM)
- Fix an issue related to $libdir/run/ganeti and cluster creation
Version 2.0 beta2
-----------------
*(Released Thu, 19 Feb 2009)*
- Xen PVM and KVM have switched the default value for the instance root
disk to the first partition on the first drive, instead of the whole
drive; this means that the OS installation scripts must be changed
accordingly
- Man pages have been updated
- RAPI has been switched by default to HTTPS, and the exported functions
should all work correctly
- RAPI v1 has been removed
- Many improvements to the KVM hypervisor
- Block device errors are now better reported
- Many other bugfixes and small improvements
Version 2.0 beta1
-----------------
*(Released Mon, 26 Jan 2009)*
- Version 2 is a general rewrite of the code and therefore the
differences are too many to list, see the design document for 2.0 in
the ``doc/`` subdirectory for more details
- In this beta version there is not yet a migration path from 1.2 (there
will be one in the final 2.0 release)
- A few significant changes are:
- all commands are executed by a daemon (``ganeti-masterd``) and the
various ``gnt-*`` commands are just front-ends to it
- all the commands are entered into, and executed from a job queue,
see the ``gnt-job(8)`` manpage
- the RAPI daemon supports read-write operations, secured by basic
HTTP authentication on top of HTTPS
- DRBD version 0.7 support has been removed, DRBD 8 is the only
supported version (when migrating from Ganeti 1.2 to 2.0, you need
to migrate to DRBD 8 first while still running Ganeti 1.2)
- DRBD devices are using statically allocated minor numbers, which
will be assigned to existing instances during the migration process
- there is support for both Xen PVM and Xen HVM instances running on
the same cluster
- KVM virtualization is supported too
- file-based storage has been implemented, which means that it is
possible to run the cluster without LVM and DRBD storage, for
example using a shared filesystem exported from shared storage (and
still have live migration)
Version 1.2.7
-------------
*(Released Tue, 13 Jan 2009)*
- Change the default reboot type in ``gnt-instance reboot`` to "hard"
- Reuse the old instance mac address by default on instance import, if
the instance name is the same.
- Handle situations in which the node info rpc returns incomplete
results (issue 46)
- Add checks for tcp/udp ports collisions in ``gnt-cluster verify``
- Improved version of batcher:
- state file support
- instance mac address support
- support for HVM clusters/instances
- Add an option to show the number of cpu sockets and nodes in
``gnt-node list``
- Support OSes that handle more than one version of the OS api (but do
not change the current API in any other way)
- Fix ``gnt-node migrate``
- ``gnt-debug`` man page
- Fix various other typos and small issues
- Increase disk resync maximum speed to 60MB/s (from 30MB/s)
Version 1.2.6
-------------
*(Released Wed, 24 Sep 2008)*
- new ``--hvm-nic-type`` and ``--hvm-disk-type`` flags to control the
  type of NIC and disk exported to fully virtualized instances.
- provide access to the serial console of HVM instances
- instance auto_balance flag, set by default. If turned off, it will
  avoid warnings on cluster verify if there is not enough memory to fail
  over an instance. In the future it will prevent the instance from being
  failed over automatically, once that is supported.
- batcher tool for instance creation, see ``tools/README.batcher``
- ``gnt-instance reinstall --select-os`` to interactively select a new
operating system when reinstalling an instance.
- when changing the memory amount on instance modify, a check has been
  added that the instance will be able to start. Also, warnings are
  emitted if the instance will not be able to fail over, if auto_balance
  is true.
- documentation fixes
- sync fields between ``gnt-instance list/modify/add/import``
- fix a race condition in drbd when the sync speed was set after giving
the device a remote peer.
Version 1.2.5
-------------
*(Released Tue, 22 Jul 2008)*
- note: the allowed size and number of tags per object were reduced
- fix a bug in ``gnt-cluster verify`` with inconsistent volume groups
- fixed twisted 8.x compatibility
- fixed ``gnt-instance replace-disks`` with iallocator
- add TCP keepalives on twisted connections to detect restarted nodes
- disk increase support, see ``gnt-instance grow-disk``
- implement bulk node/instance query for RAPI
- add tags in node/instance listing (optional)
- experimental migration (and live migration) support, read the man page
for ``gnt-instance migrate``
- the ``ganeti-watcher`` logs are now timestamped, and the watcher also
has some small improvements in handling its state file
Version 1.2.4
-------------
*(Released Fri, 13 Jun 2008)*
- Experimental readonly, REST-based remote API implementation;
automatically started on master node, TCP port 5080, if enabled by
``--enable-rapi`` parameter to configure script.
- Instance allocator support. Add and import instance accept a
``--iallocator`` parameter, and call that instance allocator to decide
which node to use for the instance. The iallocator document describes
what's expected from an allocator script.
- ``gnt-cluster verify`` N+1 memory redundancy checks: Unless passed the
``--no-nplus1-mem`` option ``gnt-cluster verify`` now checks that if a
node is lost there is still enough memory to fail over the instances
that reside on it.
- ``gnt-cluster verify`` hooks: it is now possible to add post-hooks to
``gnt-cluster verify``, to check for site-specific compliance. All the
hooks will run, and their output, if any, will be displayed. Any
failing hook will make the verification return an error value.
- ``gnt-cluster verify`` now checks that its peers are reachable on the
primary and secondary interfaces
- ``gnt-node add`` now supports the ``--readd`` option, to readd a node
that is still declared as part of the cluster and has failed.
- ``gnt-* list`` commands now accept a new ``-o +field`` way of
specifying output fields, that just adds the chosen fields to the
default ones.
- ``gnt-backup`` now has a new ``remove`` command to delete an existing
export from the filesystem.
- New per-instance parameters hvm_acpi, hvm_pae and hvm_cdrom_image_path
have been added. Using them you can enable/disable acpi and pae
support, and specify a path for a cd image to be exported to the
instance. These parameters as the name suggest only work on HVM
clusters.
- When upgrading an HVM cluster to Ganeti 1.2.4, the values for ACPI and
PAE support will be set to the previously hardcoded values, but the
(previously hardcoded) path to the CDROM ISO image will be unset and
if required, needs to be set manually with ``gnt-instance modify``
after the upgrade.
- The address to which an instance's VNC console is bound is now
selectable per-instance, rather than being cluster wide. Of course
this only applies to instances controlled via VNC, so currently just
applies to HVM clusters.
Version 1.2.3
-------------
*(Released Mon, 18 Feb 2008)*
- more tweaks to the disk activation code (especially helpful for DRBD)
- change the default ``gnt-instance list`` output format, now there is
one combined status field (see the manpage for the exact values this
field will have)
- some more fixes for the mac export to hooks change
- make Ganeti not break with DRBD 8.2.x (which changed the version
format in ``/proc/drbd``) (issue 24)
- add an upgrade tool from "remote_raid1" disk template to "drbd" disk
template, allowing migration from DRBD0.7+MD to DRBD8
Version 1.2.2
-------------
*(Released Wed, 30 Jan 2008)*
- fix ``gnt-instance modify`` breakage introduced in 1.2.1 with the HVM
support (issue 23)
- add command aliases infrastructure and a few aliases
- allow listing of VCPUs in the ``gnt-instance list`` and improve the
man pages and the ``--help`` option of ``gnt-node
list``/``gnt-instance list``
- fix ``gnt-backup list`` with down nodes (issue 21)
- change the tools location (move from $pkgdatadir to $pkglibdir/tools)
- fix the dist archive and add a check for including svn/git files in
the future
- some developer-related changes: improve the burnin and the QA suite,
add an upload script for testing during development
Version 1.2.1
-------------
*(Released Wed, 16 Jan 2008)*
- experimental HVM support, read the install document, section
"Initializing the cluster"
- allow for the PVM hypervisor per-instance kernel and initrd paths
- add a new command ``gnt-cluster verify-disks`` which uses a new
algorithm to improve the reconnection of the DRBD pairs if the device
on the secondary node has gone away
- make logical volume code auto-activate LVs at disk activation time
- slightly improve the speed of activating disks
- allow specification of the MAC address at instance creation time, and
changing it later via ``gnt-instance modify``
- fix handling of external commands that generate lots of output on
stderr
- update documentation with regard to minimum version of DRBD8 supported
Version 1.2.0
-------------
*(Released Tue, 4 Dec 2007)*
- Log the ``xm create`` output to the node daemon log on failure (to
help diagnosing the error)
- In debug mode, log all external commands output if failed to the logs
- Change parsing of lvm commands to ignore stderr
Version 1.2 beta3
-----------------
*(Released Wed, 28 Nov 2007)*
- Another round of updates to the DRBD 8 code to deal with more failures
in the replace secondary node operation
- Some more logging of failures in disk operations (lvm, drbd)
- A few documentation updates
- QA updates
Version 1.2 beta2
-----------------
*(Released Tue, 13 Nov 2007)*
- Change configuration file format from Python's Pickle to JSON.
Upgrading is possible using the cfgupgrade utility.
- Add support for DRBD 8.0 (new disk template ``drbd``) which allows for
faster replace disks and is more stable (DRBD 8 has many improvements
compared to DRBD 0.7)
- Added command line tags support (see man pages for ``gnt-instance``,
``gnt-node``, ``gnt-cluster``)
- Added instance rename support
- Added multi-instance startup/shutdown
- Added cluster rename support
- Added ``gnt-node evacuate`` to simplify some node operations
- Added instance reboot operation that can speedup reboot as compared to
stop and start
- Soften the requirement that hostnames are in FQDN format
- The ``ganeti-watcher`` now activates drbd pairs after secondary node
reboots
- Removed dependency on debian's patched fping that uses the
non-standard ``-S`` option
- Now the OS definitions are searched for in multiple, configurable
paths (easier for distros to package)
- Some changes to the hooks infrastructure (especially the new
post-configuration update hook)
- Other small bugfixes
.. vim: set textwidth=72 syntax=rst :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/README 0000644 0000000 0000000 00000000277 12634264163 0013743 0 ustar 00root root 0000000 0000000 Ganeti 2.15
===========
For installation instructions, read the INSTALL and the doc/install.rst
files.
For a brief introduction, read the ganeti(7) manpage and the other pages
it suggests.
ganeti-2.15.2/UPGRADE 0000644 0000000 0000000 00000035133 12634264163 0014074 0 ustar 00root root 0000000 0000000 Upgrade notes
=============
.. highlight:: shell-example
This document details the steps needed to upgrade a cluster to newer versions
of Ganeti.
As a general rule the node daemons need to be restarted after each software
upgrade; if using the provided example init.d script, this means running the
following command on all nodes::
$ /etc/init.d/ganeti restart
2.11 and above
--------------
Starting from 2.10 onwards, Ganeti has support for versions installed in
parallel and for automated upgrades. The default configuration for 2.11 and higher already is
to install as a parallel version without changing the running version. If both
versions, the installed one and the one to upgrade to, are 2.10 or higher, the
actual switch of the live version can be carried out by the following command
on the master node::
$ gnt-cluster upgrade --to 2.11
This will carry out the steps described below in the section on upgrades from
2.1 and above. Downgrades to the previous minor version can be done in the same
way, specifying the smaller version as the ``--to`` argument.
Note that ``gnt-cluster upgrade`` only manages the actual switch between
versions as described below on upgrades from 2.1 and above. It does not install
or remove any binaries. Having the new binaries installed is a prerequisite of
calling ``gnt-cluster upgrade`` (and the command will abort if the prerequisite
is not met). The old binaries can be used to downgrade back to the previous
version; once the system administrator decides that going back to the old
version is not needed any more, they can be removed. Addition and removal of
the Ganeti binaries should happen in the same way as for all other binaries on
your system.
2.13
----
When upgrading to 2.13, first apply the instructions of ``2.11 and
above``. 2.13 comes with the new feature of enhanced SSH security
through individual SSH keys. This feature needs to be enabled
after the upgrade by::
$ gnt-cluster renew-crypto --new-ssh-keys --no-ssh-key-check
Note that new SSH keys are generated automatically without warning when
upgrading with ``gnt-cluster upgrade``.
If you instructed Ganeti to not touch the SSH setup (by using the
``--no-ssh-init`` option of ``gnt-cluster init``), the changes in the
handling of SSH keys will not affect your cluster.
If you want to be prompted for each newly created SSH key, leave out
the ``--no-ssh-key-check`` option in the command listed above.
Note that after a downgrade from 2.13 to 2.12, the individual SSH keys
will not get removed automatically. This can lead to reachability
errors under very specific circumstances (Issue 1008). In case you plan
on keeping 2.12 for a while and not upgrading to 2.13 again soon, we recommend
replacing all SSH key pairs of the non-master nodes with the master node's SSH
key pair.
2.12
----
Due to issue #1094 in Ganeti 2.11 and 2.12 up to version 2.12.4, we
advise rerunning ``gnt-cluster renew-crypto --new-node-certificates``
after an upgrade to 2.12.5 or higher.
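For example, this can be done on the master node with::
  $ gnt-cluster renew-crypto --new-node-certificates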
2.11
----
When upgrading to 2.11, first apply the instructions of ``2.11 and
above``. 2.11 comes with the new feature of enhanced RPC security
through client certificates. This feature needs to be enabled after the
upgrade by::
$ gnt-cluster renew-crypto --new-node-certificates
Note that new node certificates are generated automatically without
warning when upgrading with ``gnt-cluster upgrade``.
2.1 and above
-------------
Starting with Ganeti 2.0, upgrades between revisions (e.g. 2.1.0 to 2.1.1)
should not need manual intervention. As a safety measure, minor releases (e.g.
2.1.3 to 2.2.0) require the ``cfgupgrade`` command for changing the
configuration version. Below you find the steps necessary to upgrade between
minor releases.
To run commands on all nodes, the distributed shell (``dsh``) can be used,
e.g.
``dsh -M -F 8 -f /var/lib/ganeti/ssconf_online_nodes gnt-cluster --version``.
#. Ensure no jobs are running (master node only)::
$ gnt-job list
#. Pause the watcher for an hour (master node only)::
$ gnt-cluster watcher pause 1h
#. Stop all daemons on all nodes::
$ /etc/init.d/ganeti stop
#. Backup old configuration (master node only)::
$ tar czf /var/lib/ganeti-$(date +\%FT\%T).tar.gz -C /var/lib ganeti
(``/var/lib/ganeti`` can also contain exported instances, so make sure to
backup only files you are interested in. Use ``--exclude export`` for
example)
#. Install new Ganeti version on all nodes
#. Run cfgupgrade on the master node::
$ /usr/lib/ganeti/tools/cfgupgrade --verbose --dry-run
$ /usr/lib/ganeti/tools/cfgupgrade --verbose
(``cfgupgrade`` supports a number of parameters, run it with
``--help`` for more information)
#. Upgrade the directory permissions on all nodes::
$ /usr/lib/ganeti/ensure-dirs --full-run
Note that ensure-dirs does not create the directories for file
and shared-file storage. This is due to security reasons. They need
to be created manually. For details see ``man gnt-cluster``.
#. Create the (missing) required users and make users part of the required
groups on all nodes::
$ /usr/lib/ganeti/tools/users-setup
This will ask for confirmation. To execute directly, add the ``--yes-do-it``
option.
#. Restart daemons on all nodes::
$ /etc/init.d/ganeti restart
#. Re-distribute configuration (master node only)::
$ gnt-cluster redist-conf
#. If you use file storage, check that the ``/etc/ganeti/file-storage-paths``
   file is correct on all nodes. For security reasons it's not copied
automatically, but it can be copied manually via::
$ gnt-cluster copyfile /etc/ganeti/file-storage-paths
#. Restart daemons again on all nodes::
$ /etc/init.d/ganeti restart
#. Enable the watcher again (master node only)::
$ gnt-cluster watcher continue
#. Verify cluster (master node only)::
$ gnt-cluster verify
Reverting an upgrade
~~~~~~~~~~~~~~~~~~~~
For going back between revisions (e.g. 2.1.1 to 2.1.0) no manual
intervention is required, as for upgrades.
Starting from version 2.8, ``cfgupgrade`` supports the ``--downgrade``
option to bring the configuration back to the previous stable version.
This is useful if you upgrade Ganeti and after some time you run into
problems with the new version. You can downgrade the configuration
without losing the changes made since the upgrade. Any feature not
supported by the old version will be removed from the configuration, of
course, but you get a warning about it. If there is any new feature that
you haven't changed from its default value, you don't have to worry
about it, as it will get the same value when you upgrade again.
Automatic downgrades
....................
From version 2.11 onwards, downgrades can be done by using the
``gnt-cluster upgrade`` command::
  $ gnt-cluster upgrade --to 2.10
Manual downgrades
.................
The procedure is similar to upgrading, but please notice that you have to
revert the configuration **before** installing the old version.
#. Ensure no jobs are running (master node only)::
$ gnt-job list
#. Pause the watcher for an hour (master node only)::
$ gnt-cluster watcher pause 1h
#. Stop all daemons on all nodes::
$ /etc/init.d/ganeti stop
#. Backup old configuration (master node only)::
$ tar czf /var/lib/ganeti-$(date +\%FT\%T).tar.gz -C /var/lib ganeti
#. Run cfgupgrade on the master node::
$ /usr/lib/ganeti/tools/cfgupgrade --verbose --downgrade --dry-run
$ /usr/lib/ganeti/tools/cfgupgrade --verbose --downgrade
You may want to copy all the messages about features that have been
removed during the downgrade, in case you want to restore them when
upgrading again.
#. Install the old Ganeti version on all nodes
NB: in Ganeti 2.8, the ``cmdlib.py`` file was split into a series of files
contained in the ``cmdlib`` directory. If Ganeti is installed from sources
and not from a package, while downgrading Ganeti to a pre-2.8
version it is important to remember to remove the ``cmdlib`` directory
from the directory containing the Ganeti python files (which usually is
``${PREFIX}/lib/python${VERSION}/dist-packages/ganeti``).
A simpler upgrade/downgrade procedure will be made available in future
versions of Ganeti.
#. Restart daemons on all nodes::
$ /etc/init.d/ganeti restart
#. Re-distribute configuration (master node only)::
$ gnt-cluster redist-conf
#. Restart daemons again on all nodes::
$ /etc/init.d/ganeti restart
#. Enable the watcher again (master node only)::
$ gnt-cluster watcher continue
#. Verify cluster (master node only)::
$ gnt-cluster verify
Specific tasks for 2.11 to 2.10 downgrade
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
After running ``cfgupgrade``, the ``client.pem`` and
``ssconf_master_candidates_certs`` files need to be removed
from Ganeti's data directory on all nodes. While this step is
not necessary for 2.10 to run cleanly, leaving them will cause
problems when upgrading again after the downgrade.
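Assuming the default data directory of :file:`/var/lib/ganeti` (adjust the
paths if your installation uses a different prefix), this can be done for
example with::
  $ gnt-cluster command rm -f /var/lib/ganeti/client.pem
  $ gnt-cluster command rm -f /var/lib/ganeti/ssconf_master_candidates_certs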
2.0 releases
------------
2.0.3 to 2.0.4
~~~~~~~~~~~~~~
No changes needed except restarting the daemon; but rollback to 2.0.3 might
require configuration editing.
If you're using Xen-HVM instances, please double-check the network
configuration (``nic_type`` parameter) as the defaults might have changed:
2.0.4 adds any missing configuration items and depending on the version of the
software the cluster has been installed with, some new keys might have been
added.
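As an illustrative check only (the exact output depends on your
configuration), the current value can be inspected per instance with::
  $ gnt-instance info $instance | grep nic_type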
2.0.1 to 2.0.2/2.0.3
~~~~~~~~~~~~~~~~~~~~
Between 2.0.1 and 2.0.2 there have been some changes in the handling of block
devices, which can cause some issues. 2.0.3 was then released which adds two
new options/commands to fix this issue.
If you use DRBD-type instances and see problems in instance start or
activate-disks with messages from DRBD about "lower device too small" or
similar, it is recommended to do the following (a combined sketch follows):
#. Run ``gnt-instance activate-disks --ignore-size $instance`` for each
of the affected instances
#. Then run ``gnt-cluster repair-disk-sizes`` which will check that
instances have the correct disk sizes
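As a sketch only, assuming all instances are affected (otherwise restrict
the loop to the affected ones), the two steps above can be combined as::
  $ for i in $(gnt-instance list -o name --no-headers); do \
      gnt-instance activate-disks --ignore-size $i; \
    done
  $ gnt-cluster repair-disk-sizes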
1.2 to 2.0
----------
Prerequisites:
- Ganeti 1.2.7 is currently installed
- All instances have been migrated from DRBD 0.7 to DRBD 8.x (i.e. no
``remote_raid1`` disk template)
- Upgrade to Ganeti 2.0.0~rc2 or later (~rc1 and earlier don't have the needed
upgrade tool)
In the below steps, replace :file:`/var/lib` with ``$libdir`` if Ganeti was not
installed with this prefix (e.g. :file:`/usr/local/var`). Same for
:file:`/usr/lib`.
Execution (all steps are required in the order given):
#. Make a backup of the current configuration, for safety::
$ cp -a /var/lib/ganeti /var/lib/ganeti-1.2.backup
#. Stop all instances::
$ gnt-instance stop --all
#. Make sure no DRBD devices are in use; the following command should show no
   active minors::
$ gnt-cluster command grep cs: /proc/drbd | grep -v cs:Unconf
#. Stop the node daemons and rapi daemon on all nodes (note: you should be
   logged in via the master node name, not the cluster name, as the command
   below will remove the cluster IP from the master node)::
$ gnt-cluster command /etc/init.d/ganeti stop
#. Install the new software on all nodes, either from packages (if available)
   or from sources; the master daemon will not start, but will give error
   messages about a wrong configuration file, which is normal
#. Upgrade the configuration file::
$ /usr/lib/ganeti/tools/cfgupgrade12 -v --dry-run
$ /usr/lib/ganeti/tools/cfgupgrade12 -v
#. Make sure ``ganeti-noded`` is running on all nodes (and start it if
not)
#. Start the master daemon::
$ ganeti-masterd
#. Check that a simple node-list works::
$ gnt-node list
#. Redistribute updated configuration to all nodes::
$ gnt-cluster redist-conf
$ gnt-cluster copyfile /var/lib/ganeti/known_hosts
#. Optional: if needed, install RAPI-specific certificates under
:file:`/var/lib/ganeti/rapi.pem` and run::
$ gnt-cluster copyfile /var/lib/ganeti/rapi.pem
#. Run a cluster verify, this should show no problems::
$ gnt-cluster verify
#. Remove some obsolete files::
$ gnt-cluster command rm /var/lib/ganeti/ssconf_node_pass
$ gnt-cluster command rm /var/lib/ganeti/ssconf_hypervisor
#. Update the Xen PVM setting (if this was a PVM cluster) for 1.2
   compatibility::
$ gnt-cluster modify -H xen-pvm:root_path=/dev/sda
#. Depending on your setup, you might also want to reset the initrd parameter::
$ gnt-cluster modify -H xen-pvm:initrd_path=/boot/initrd-2.6-xenU
#. Reset the instance autobalance setting to default::
$ for i in $(gnt-instance list -o name --no-headers); do \
gnt-instance modify -B auto_balance=default $i; \
done
#. Optional: start the RAPI daemon::
$ ganeti-rapi
#. Restart instances::
$ gnt-instance start --force-multiple --all
At this point, ``gnt-cluster verify`` should show no errors and the migration
is complete.
1.2 releases
------------
1.2.4 to any other higher 1.2 version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
No changes needed. Rollback will usually require manual edit of the
configuration file.
1.2.3 to 1.2.4
~~~~~~~~~~~~~~
No changes needed. Note that going back from 1.2.4 to 1.2.3 will require manual
edit of the configuration file (since we added some HVM-related new
attributes).
1.2.2 to 1.2.3
~~~~~~~~~~~~~~
No changes needed. Note that the drbd7-to-8 upgrade tool does a disk format
change for the DRBD metadata, so in theory this might be **risky**. It is
advised to have (good) backups before doing the upgrade.
1.2.1 to 1.2.2
~~~~~~~~~~~~~~
No changes needed.
1.2.0 to 1.2.1
~~~~~~~~~~~~~~
No changes needed. Only some bugfixes and new additions that don't affect
existing clusters.
1.2.0 beta 3 to 1.2.0
~~~~~~~~~~~~~~~~~~~~~
No changes needed.
1.2.0 beta 2 to beta 3
~~~~~~~~~~~~~~~~~~~~~~
No changes needed. A new version of the debian-etch-instance OS (0.3) has been
released, but upgrading it is not required.
1.2.0 beta 1 to beta 2
~~~~~~~~~~~~~~~~~~~~~~
Beta 2 switched the config file format to JSON. Steps to upgrade:
#. Stop the daemons (``/etc/init.d/ganeti stop``) on all nodes
#. Disable the cron job (default is :file:`/etc/cron.d/ganeti`)
#. Install the new version
#. Make a backup copy of the config file
#. Upgrade the config file using the following command::
$ /usr/share/ganeti/cfgupgrade --verbose /var/lib/ganeti/config.data
#. Start the daemons and run ``gnt-cluster info``, ``gnt-node list`` and
``gnt-instance list`` to check if the upgrade process finished successfully
The OS definitions also need to be upgraded. There is a new version of the
debian-etch-instance OS (0.2) that goes along with beta 2.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/autogen.sh 0000755 0000000 0000000 00000000433 12634264163 0015056 0 ustar 00root root 0000000 0000000 #!/bin/sh
if test ! -f configure.ac ; then
echo "You must execute this script from the top level directory."
exit 1
fi
set -e
rm -rf config.cache autom4te.cache
${ACLOCAL:-aclocal} -I autotools
${AUTOCONF:-autoconf}
${AUTOMAKE:-automake} --add-missing
rm -rf autom4te.cache
ganeti-2.15.2/autotools/ 0000755 0000000 0000000 00000000000 12634264163 0015106 5 ustar 00root root 0000000 0000000 ganeti-2.15.2/autotools/ac_ghc_pkg.m4 0000644 0000000 0000000 00000005211 12634264163 0017414 0 ustar 00root root 0000000 0000000 #####
# Copyright (C) 2012 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#####
#
# SYNOPSIS
#
# AC_GHC_PKG_CHECK(modname, action_found, action_not_found, extended)
#
# DESCRIPTION
#
# Checks for a Haskell (GHC) module. If found, execute the second
# argument, if not found, the third one.
#
# If the fourth argument is non-empty, then the check will be done
# via 'ghc-pkg list' (which supports patterns), otherwise it will
# use just 'ghc-pkg latest'.
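#
# EXAMPLE (illustrative only; the package name and variable are arbitrary)
#
#   AC_GHC_PKG_CHECK([vector], [HS_VECTOR=yes], [HS_VECTOR=no])
#
# checks for the 'vector' package via 'ghc-pkg latest' and sets the
# hypothetical HS_VECTOR variable accordingly.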
#
#
#####
AC_DEFUN([AC_GHC_PKG_CHECK],[
if test -z $GHC_PKG; then
AC_MSG_ERROR([GHC_PKG not defined])
fi
AC_MSG_CHECKING([haskell library $1])
if test -n "$4"; then
GHC_PKG_RESULT=$($GHC_PKG --simple-output list '$1'|tail -n1)
else
GHC_PKG_RESULT=$($GHC_PKG latest '$1' 2>/dev/null)
fi
if test -n "$GHC_PKG_RESULT"; then
AC_MSG_RESULT($GHC_PKG_RESULT)
$2
else
AC_MSG_RESULT([no])
$3
fi
])
#####
#
# SYNOPSIS
#
# AC_GHC_PKG_REQUIRE(modname, extended)
#
# DESCRIPTION
#
# Checks for a Haskell (GHC) module, and abort if not found. If the
# second argument is non-empty, then the check will be some via
# 'ghc-pkg list' (which supports patterns), otherwise it will use
# just 'ghc-pkg latest'.
#
#
#####
AC_DEFUN([AC_GHC_PKG_REQUIRE],[
AC_GHC_PKG_CHECK($1, [],
[AC_MSG_FAILURE([Required Haskell module $1 not found])],
$2)
])
ganeti-2.15.2/autotools/ac_python_module.m4 0000644 0000000 0000000 00000001750 12634264163 0020704 0 ustar 00root root 0000000 0000000 ##### http://autoconf-archive.cryp.to/ac_python_module.html
#
# SYNOPSIS
#
# AC_PYTHON_MODULE(modname[, fatal])
#
# DESCRIPTION
#
# Checks for Python module.
#
# If fatal is non-empty then absence of a module will trigger an
# error.
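#
# EXAMPLE (illustrative only)
#
#   AC_PYTHON_MODULE(simplejson, t)
#
# aborts configure if the 'simplejson' module cannot be imported; any
# non-empty second argument makes the check fatal.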
#
# LAST MODIFICATION
#
# 2007-01-09
#
# COPYLEFT
#
# Copyright (c) 2007 Andrew Collier
#
# Copying and distribution of this file, with or without
# modification, are permitted in any medium without royalty provided
# the copyright notice and this notice are preserved.
AC_DEFUN([AC_PYTHON_MODULE],[
if test -z $PYTHON;
then
PYTHON="python"
fi
PYTHON_NAME=`basename $PYTHON`
AC_MSG_CHECKING($PYTHON_NAME module: $1)
$PYTHON -c "import $1" 2>/dev/null
if test $? -eq 0;
then
AC_MSG_RESULT(yes)
eval AS_TR_CPP(HAVE_PYMOD_$1)=yes
else
AC_MSG_RESULT(no)
eval AS_TR_CPP(HAVE_PYMOD_$1)=no
#
if test -n "$2"
then
AC_MSG_ERROR(failed to find required module $1)
exit 1
fi
fi
])
ganeti-2.15.2/autotools/build-bash-completion 0000755 0000000 0000000 00000066410 12634264163 0021224 0 ustar 00root root 0000000 0000000 #!/usr/bin/python
#
# Copyright (C) 2009, 2010, 2011, 2012 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""Script to generate bash_completion script for Ganeti.
"""
# pylint: disable=C0103
# [C0103] Invalid name build-bash-completion
import os
import os.path
import re
import itertools
import optparse
from cStringIO import StringIO
# _constants shouldn't be imported from anywhere except constants.py, but we're
# making an exception here because this script is only used at build time.
from ganeti import _constants
from ganeti import constants
from ganeti import cli
from ganeti import utils
from ganeti import build
from ganeti import pathutils
from ganeti.tools import burnin
#: Regular expression describing desired format of option names. Long names can
#: contain lowercase characters, numbers and dashes only.
_OPT_NAME_RE = re.compile(r"^-[a-zA-Z0-9]|--[a-z][-a-z0-9]+$")
def _WriteGntLog(sw, support_debug):
if support_debug:
sw.Write("_gnt_log() {")
sw.IncIndent()
try:
sw.Write("if [[ -n \"$GANETI_COMPL_LOG\" ]]; then")
sw.IncIndent()
try:
sw.Write("{")
sw.IncIndent()
try:
sw.Write("echo ---")
sw.Write("echo \"$@\"")
sw.Write("echo")
finally:
sw.DecIndent()
sw.Write("} >> $GANETI_COMPL_LOG")
finally:
sw.DecIndent()
sw.Write("fi")
finally:
sw.DecIndent()
sw.Write("}")
def _WriteNodes(sw):
sw.Write("_ganeti_nodes() {")
sw.IncIndent()
try:
node_list_path = os.path.join(pathutils.DATA_DIR, "ssconf_node_list")
sw.Write("cat %s 2>/dev/null || :", utils.ShellQuote(node_list_path))
finally:
sw.DecIndent()
sw.Write("}")
def _WriteInstances(sw):
sw.Write("_ganeti_instances() {")
sw.IncIndent()
try:
instance_list_path = os.path.join(pathutils.DATA_DIR,
"ssconf_instance_list")
sw.Write("cat %s 2>/dev/null || :", utils.ShellQuote(instance_list_path))
finally:
sw.DecIndent()
sw.Write("}")
def _WriteJobs(sw):
sw.Write("_ganeti_jobs() {")
sw.IncIndent()
try:
# FIXME: this is really going into the internals of the job queue
sw.Write(("local jlist=($( shopt -s nullglob &&"
" cd %s 2>/dev/null && echo job-* || : ))"),
utils.ShellQuote(pathutils.QUEUE_DIR))
sw.Write('echo "${jlist[@]/job-/}"')
finally:
sw.DecIndent()
sw.Write("}")
def _WriteOSAndIAllocator(sw):
for (fnname, paths) in [
("os", pathutils.OS_SEARCH_PATH),
("iallocator", constants.IALLOCATOR_SEARCH_PATH),
]:
sw.Write("_ganeti_%s() {", fnname)
sw.IncIndent()
try:
# FIXME: Make querying the master for all OSes cheap
for path in paths:
sw.Write("( shopt -s nullglob && cd %s 2>/dev/null && echo * || : )",
utils.ShellQuote(path))
finally:
sw.DecIndent()
sw.Write("}")
def _WriteNodegroup(sw):
sw.Write("_ganeti_nodegroup() {")
sw.IncIndent()
try:
nodegroups_path = os.path.join(pathutils.DATA_DIR, "ssconf_nodegroups")
sw.Write("cat %s 2>/dev/null || :", utils.ShellQuote(nodegroups_path))
finally:
sw.DecIndent()
sw.Write("}")
def _WriteNetwork(sw):
sw.Write("_ganeti_network() {")
sw.IncIndent()
try:
networks_path = os.path.join(pathutils.DATA_DIR, "ssconf_networks")
sw.Write("cat %s 2>/dev/null || :", utils.ShellQuote(networks_path))
finally:
sw.DecIndent()
sw.Write("}")
def _WriteFindFirstArg(sw):
# Params:
# Result variable: $first_arg_idx
sw.Write("_ganeti_find_first_arg() {")
sw.IncIndent()
try:
sw.Write("local w i")
sw.Write("first_arg_idx=")
sw.Write("for (( i=$1; i < COMP_CWORD; ++i )); do")
sw.IncIndent()
try:
sw.Write("w=${COMP_WORDS[$i]}")
# Skip option value
sw.Write("""if [[ -n "$2" && "$w" == @($2) ]]; then let ++i""")
# Skip
sw.Write("""elif [[ -n "$3" && "$w" == @($3) ]]; then :""")
# Ah, we found the first argument
sw.Write("else first_arg_idx=$i; break;")
sw.Write("fi")
finally:
sw.DecIndent()
sw.Write("done")
finally:
sw.DecIndent()
sw.Write("}")
def _WriteListOptions(sw):
# Params:
# Input variable: $first_arg_idx
# Result variables: $arg_idx, $choices
sw.Write("_ganeti_list_options() {")
sw.IncIndent()
try:
sw.Write("""if [[ -z "$first_arg_idx" ]]; then""")
sw.IncIndent()
try:
sw.Write("arg_idx=0")
# Show options only if the current word starts with a dash
sw.Write("""if [[ "$cur" == -* ]]; then""")
sw.IncIndent()
try:
sw.Write("choices=$1")
finally:
sw.DecIndent()
sw.Write("fi")
sw.Write("return")
finally:
sw.DecIndent()
sw.Write("fi")
# Calculate position of current argument
sw.Write("arg_idx=$(( COMP_CWORD - first_arg_idx ))")
sw.Write("choices=")
finally:
sw.DecIndent()
sw.Write("}")
def _WriteGntCheckopt(sw, support_debug):
# Params:
# Result variable: $optcur
sw.Write("_gnt_checkopt() {")
sw.IncIndent()
try:
sw.Write("""if [[ -n "$1" && "$cur" == @($1) ]]; then""")
sw.IncIndent()
try:
sw.Write("optcur=\"${cur#--*=}\"")
sw.Write("return 0")
finally:
sw.DecIndent()
sw.Write("""elif [[ -n "$2" && "$prev" == @($2) ]]; then""")
sw.IncIndent()
try:
sw.Write("optcur=\"$cur\"")
sw.Write("return 0")
finally:
sw.DecIndent()
sw.Write("fi")
if support_debug:
sw.Write("_gnt_log optcur=\"'$optcur'\"")
sw.Write("return 1")
finally:
sw.DecIndent()
sw.Write("}")
def _WriteGntCompgen(sw, support_debug):
# Params:
# Result variable: $COMPREPLY
sw.Write("_gnt_compgen() {")
sw.IncIndent()
try:
sw.Write("""COMPREPLY=( $(compgen "$@") )""")
if support_debug:
sw.Write("_gnt_log COMPREPLY=\"${COMPREPLY[@]}\"")
finally:
sw.DecIndent()
sw.Write("}")
def WritePreamble(sw, support_debug):
"""Writes the script preamble.
Helper functions should be written here.
"""
sw.Write("# This script is automatically generated at build time.")
sw.Write("# Do not modify manually.")
_WriteGntLog(sw, support_debug)
_WriteNodes(sw)
_WriteInstances(sw)
_WriteJobs(sw)
_WriteOSAndIAllocator(sw)
_WriteNodegroup(sw)
_WriteNetwork(sw)
_WriteFindFirstArg(sw)
_WriteListOptions(sw)
_WriteGntCheckopt(sw, support_debug)
_WriteGntCompgen(sw, support_debug)
def WriteCompReply(sw, args, cur="\"$cur\""):
sw.Write("_gnt_compgen %s -- %s", args, cur)
sw.Write("return")
class CompletionWriter(object):
"""Command completion writer class.
"""
def __init__(self, arg_offset, opts, args, support_debug):
self.arg_offset = arg_offset
self.opts = opts
self.args = args
self.support_debug = support_debug
for opt in opts:
# While documented, these variables aren't seen as public attributes by
# pylint. pylint: disable=W0212
opt.all_names = sorted(opt._short_opts + opt._long_opts)
invalid = list(itertools.ifilterfalse(_OPT_NAME_RE.match, opt.all_names))
if invalid:
raise Exception("Option names don't match regular expression '%s': %s" %
(_OPT_NAME_RE.pattern, utils.CommaJoin(invalid)))
def _FindFirstArgument(self, sw):
ignore = []
skip_one = []
for opt in self.opts:
if opt.takes_value():
# Ignore value
for i in opt.all_names:
if i.startswith("--"):
ignore.append("%s=*" % utils.ShellQuote(i))
skip_one.append(utils.ShellQuote(i))
else:
ignore.extend([utils.ShellQuote(i) for i in opt.all_names])
ignore = sorted(utils.UniqueSequence(ignore))
skip_one = sorted(utils.UniqueSequence(skip_one))
if ignore or skip_one:
# Try to locate first argument
sw.Write("_ganeti_find_first_arg %s %s %s",
self.arg_offset + 1,
utils.ShellQuote("|".join(skip_one)),
utils.ShellQuote("|".join(ignore)))
else:
# When there are no options the first argument is always at position
# offset + 1
sw.Write("first_arg_idx=%s", self.arg_offset + 1)
def _CompleteOptionValues(self, sw):
# Group by values
# "values" -> [optname1, optname2, ...]
values = {}
for opt in self.opts:
if not opt.takes_value():
continue
# Only static choices implemented so far (e.g. no node list)
suggest = getattr(opt, "completion_suggest", None)
# our custom option type
if opt.type == "bool":
suggest = ["yes", "no"]
if not suggest:
suggest = opt.choices
if (isinstance(suggest, (int, long)) and
suggest in cli.OPT_COMPL_ALL):
key = suggest
elif suggest:
key = " ".join(sorted(suggest))
else:
key = ""
values.setdefault(key, []).extend(opt.all_names)
# Don't write any code if there are no option values
if not values:
return
cur = "\"$optcur\""
wrote_opt = False
for (suggest, allnames) in values.items():
longnames = [i for i in allnames if i.startswith("--")]
if wrote_opt:
condcmd = "elif"
else:
condcmd = "if"
sw.Write("%s _gnt_checkopt %s %s; then", condcmd,
utils.ShellQuote("|".join(["%s=*" % i for i in longnames])),
utils.ShellQuote("|".join(allnames)))
sw.IncIndent()
try:
if suggest == cli.OPT_COMPL_MANY_NODES:
# TODO: Implement comma-separated values
WriteCompReply(sw, "-W ''", cur=cur)
elif suggest == cli.OPT_COMPL_ONE_NODE:
WriteCompReply(sw, "-W \"$(_ganeti_nodes)\"", cur=cur)
elif suggest == cli.OPT_COMPL_ONE_INSTANCE:
WriteCompReply(sw, "-W \"$(_ganeti_instances)\"", cur=cur)
elif suggest == cli.OPT_COMPL_ONE_OS:
WriteCompReply(sw, "-W \"$(_ganeti_os)\"", cur=cur)
elif suggest == cli.OPT_COMPL_ONE_EXTSTORAGE:
WriteCompReply(sw, "-W \"$(_ganeti_extstorage)\"", cur=cur)
elif suggest == cli.OPT_COMPL_ONE_FILTER:
WriteCompReply(sw, "-W \"$(_ganeti_filter)\"", cur=cur)
elif suggest == cli.OPT_COMPL_ONE_IALLOCATOR:
WriteCompReply(sw, "-W \"$(_ganeti_iallocator)\"", cur=cur)
elif suggest == cli.OPT_COMPL_ONE_NODEGROUP:
WriteCompReply(sw, "-W \"$(_ganeti_nodegroup)\"", cur=cur)
elif suggest == cli.OPT_COMPL_ONE_NETWORK:
WriteCompReply(sw, "-W \"$(_ganeti_network)\"", cur=cur)
elif suggest == cli.OPT_COMPL_INST_ADD_NODES:
sw.Write("local tmp= node1= pfx= curvalue=\"${optcur#*:}\"")
sw.Write("if [[ \"$optcur\" == *:* ]]; then")
sw.IncIndent()
try:
sw.Write("node1=\"${optcur%%:*}\"")
sw.Write("if [[ \"$COMP_WORDBREAKS\" != *:* ]]; then")
sw.IncIndent()
try:
sw.Write("pfx=\"$node1:\"")
finally:
sw.DecIndent()
sw.Write("fi")
finally:
sw.DecIndent()
sw.Write("fi")
if self.support_debug:
sw.Write("_gnt_log pfx=\"'$pfx'\" curvalue=\"'$curvalue'\""
" node1=\"'$node1'\"")
sw.Write("for i in $(_ganeti_nodes); do")
sw.IncIndent()
try:
sw.Write("if [[ -z \"$node1\" ]]; then")
sw.IncIndent()
try:
sw.Write("tmp=\"$tmp $i $i:\"")
finally:
sw.DecIndent()
sw.Write("elif [[ \"$i\" != \"$node1\" ]]; then")
sw.IncIndent()
try:
sw.Write("tmp=\"$tmp $i\"")
finally:
sw.DecIndent()
sw.Write("fi")
finally:
sw.DecIndent()
sw.Write("done")
WriteCompReply(sw, "-P \"$pfx\" -W \"$tmp\"", cur="\"$curvalue\"")
else:
WriteCompReply(sw, "-W %s" % utils.ShellQuote(suggest), cur=cur)
finally:
sw.DecIndent()
wrote_opt = True
if wrote_opt:
sw.Write("fi")
return
def _CompleteArguments(self, sw):
if not (self.opts or self.args):
return
all_option_names = []
for opt in self.opts:
all_option_names.extend(opt.all_names)
all_option_names.sort()
# List options if no argument has been specified yet
sw.Write("_ganeti_list_options %s",
utils.ShellQuote(" ".join(all_option_names)))
if self.args:
last_idx = len(self.args) - 1
last_arg_end = 0
varlen_arg_idx = None
wrote_arg = False
sw.Write("compgenargs=")
for idx, arg in enumerate(self.args):
assert arg.min is not None and arg.min >= 0
assert not (idx < last_idx and arg.max is None)
if arg.min != arg.max or arg.max is None:
if varlen_arg_idx is not None:
raise Exception("Only one argument can have a variable length")
varlen_arg_idx = idx
compgenargs = []
if isinstance(arg, cli.ArgUnknown):
choices = ""
elif isinstance(arg, cli.ArgSuggest):
choices = utils.ShellQuote(" ".join(arg.choices))
elif isinstance(arg, cli.ArgInstance):
choices = "$(_ganeti_instances)"
elif isinstance(arg, cli.ArgNode):
choices = "$(_ganeti_nodes)"
elif isinstance(arg, cli.ArgGroup):
choices = "$(_ganeti_nodegroup)"
elif isinstance(arg, cli.ArgNetwork):
choices = "$(_ganeti_network)"
elif isinstance(arg, cli.ArgJobId):
choices = "$(_ganeti_jobs)"
elif isinstance(arg, cli.ArgOs):
choices = "$(_ganeti_os)"
elif isinstance(arg, cli.ArgExtStorage):
choices = "$(_ganeti_extstorage)"
elif isinstance(arg, cli.ArgFilter):
choices = "$(_ganeti_filter)"
elif isinstance(arg, cli.ArgFile):
choices = ""
compgenargs.append("-f")
elif isinstance(arg, cli.ArgCommand):
choices = ""
compgenargs.append("-c")
elif isinstance(arg, cli.ArgHost):
choices = ""
compgenargs.append("-A hostname")
else:
raise Exception("Unknown argument type %r" % arg)
if arg.min == 1 and arg.max == 1:
cmpcode = """"$arg_idx" == %d""" % (last_arg_end)
elif arg.max is None:
cmpcode = """"$arg_idx" -ge %d""" % (last_arg_end)
elif arg.min <= arg.max:
cmpcode = (""""$arg_idx" -ge %d && "$arg_idx" -lt %d""" %
(last_arg_end, last_arg_end + arg.max))
else:
raise Exception("Unable to generate argument position condition")
last_arg_end += arg.min
if choices or compgenargs:
if wrote_arg:
condcmd = "elif"
else:
condcmd = "if"
sw.Write("""%s [[ %s ]]; then""", condcmd, cmpcode)
sw.IncIndent()
try:
if choices:
sw.Write("""choices="$choices "%s""", choices)
if compgenargs:
sw.Write("compgenargs=%s",
utils.ShellQuote(" ".join(compgenargs)))
finally:
sw.DecIndent()
wrote_arg = True
if wrote_arg:
sw.Write("fi")
if self.args:
WriteCompReply(sw, """-W "$choices" $compgenargs""")
else:
# $compgenargs exists only if there are arguments
WriteCompReply(sw, '-W "$choices"')
def WriteTo(self, sw):
self._FindFirstArgument(sw)
self._CompleteOptionValues(sw)
self._CompleteArguments(sw)
def WriteCompletion(sw, scriptname, funcname, support_debug,
commands=None,
opts=None, args=None):
"""Writes the completion code for one command.
@type sw: ShellWriter
@param sw: Script writer
@type scriptname: string
@param scriptname: Name of command line program
@type funcname: string
@param funcname: Shell function name
@type commands: list
@param commands: List of all subcommands in this program
"""
sw.Write("%s() {", funcname)
sw.IncIndent()
try:
sw.Write("local "
' cur="${COMP_WORDS[COMP_CWORD]}"'
' prev="${COMP_WORDS[COMP_CWORD-1]}"'
' i first_arg_idx choices compgenargs arg_idx optcur')
if support_debug:
sw.Write("_gnt_log cur=\"$cur\" prev=\"$prev\"")
sw.Write("[[ -n \"$GANETI_COMPL_LOG\" ]] &&"
" _gnt_log \"$(set | grep ^COMP_)\"")
sw.Write("COMPREPLY=()")
if opts is not None and args is not None:
assert not commands
CompletionWriter(0, opts, args, support_debug).WriteTo(sw)
else:
sw.Write("""if [[ "$COMP_CWORD" == 1 ]]; then""")
sw.IncIndent()
try:
# Complete the command name
WriteCompReply(sw,
("-W %s" %
utils.ShellQuote(" ".join(sorted(commands.keys())))))
finally:
sw.DecIndent()
sw.Write("fi")
# Group commands by arguments and options
grouped_cmds = {}
for cmd, (_, argdef, optdef, _, _) in commands.items():
if not (argdef or optdef):
continue
grouped_cmds.setdefault((tuple(argdef), tuple(optdef)), set()).add(cmd)
# We're doing options and arguments to commands
sw.Write("""case "${COMP_WORDS[1]}" in""")
sort_grouped = sorted(grouped_cmds.items(),
key=lambda (_, y): sorted(y)[0])
for ((argdef, optdef), cmds) in sort_grouped:
assert argdef or optdef
sw.Write("%s)", "|".join(map(utils.ShellQuote, sorted(cmds))))
sw.IncIndent()
try:
CompletionWriter(1, optdef, argdef, support_debug).WriteTo(sw)
finally:
sw.DecIndent()
sw.Write(";;")
sw.Write("esac")
finally:
sw.DecIndent()
sw.Write("}")
sw.Write("complete -F %s -o filenames %s",
utils.ShellQuote(funcname),
utils.ShellQuote(scriptname))
def GetFunctionName(name):
return "_" + re.sub(r"[^a-z0-9]+", "_", name.lower())
def GetCommands(filename, module):
"""Returns the commands defined in a module.
Aliases are also added as commands.
"""
try:
commands = getattr(module, "commands")
except AttributeError:
raise Exception("Script %s doesn't have 'commands' attribute" %
filename)
# Add the implicit "--help" option
help_option = cli.cli_option("-h", "--help", default=False,
action="store_true")
for name, (_, _, optdef, _, _) in commands.items():
if help_option not in optdef:
optdef.append(help_option)
for opt in cli.COMMON_OPTS:
if opt in optdef:
raise Exception("Common option '%s' listed for command '%s' in %s" %
(opt, name, filename))
optdef.append(opt)
# Use aliases
aliases = getattr(module, "aliases", {})
if aliases:
commands = commands.copy()
for name, target in aliases.items():
commands[name] = commands[target]
return commands
def HaskellOptToOptParse(opts, kind):
"""Converts a Haskell options to Python cli_options.
@type opts: string
@param opts: comma-separated string with short and long options
@type kind: string
@param kind: type generated by Common.hs/complToText; needs to be
kept in sync
"""
# pylint: disable=W0142
# since we pass *opts in a number of places
opts = opts.split(",")
if kind == "none":
return cli.cli_option(*opts, action="store_true")
elif kind in ["file", "string", "host", "dir", "inetaddr"]:
return cli.cli_option(*opts, type="string")
elif kind == "integer":
return cli.cli_option(*opts, type="int")
elif kind == "float":
return cli.cli_option(*opts, type="float")
elif kind == "onegroup":
return cli.cli_option(*opts, type="string",
completion_suggest=cli.OPT_COMPL_ONE_NODEGROUP)
elif kind == "onenode":
return cli.cli_option(*opts, type="string",
completion_suggest=cli.OPT_COMPL_ONE_NODE)
elif kind == "manyinstances":
# FIXME: no support for many instances
return cli.cli_option(*opts, type="string")
elif kind.startswith("choices="):
choices = kind[len("choices="):].split(",")
return cli.cli_option(*opts, type="choice", choices=choices)
else:
# FIXME: there are many other currently unused completion types,
# should be added on an as-needed basis
raise Exception("Unhandled option kind '%s'" % kind)
#: serialised kind to arg type
_ARG_MAP = {
"choices": cli.ArgChoice,
"command": cli.ArgCommand,
"file": cli.ArgFile,
"host": cli.ArgHost,
"jobid": cli.ArgJobId,
"onegroup": cli.ArgGroup,
"oneinstance": cli.ArgInstance,
"onenode": cli.ArgNode,
"oneos": cli.ArgOs,
"string": cli.ArgUnknown,
"suggests": cli.ArgSuggest,
}
def HaskellArgToCliArg(kind, min_cnt, max_cnt):
"""Converts a Haskell options to Python _Argument.
@type kind: string
@param kind: type generated by Common.hs/argComplToText; needs to be
kept in sync
"""
min_cnt = int(min_cnt)
if max_cnt == "none":
max_cnt = None
else:
max_cnt = int(max_cnt)
# pylint: disable=W0142
# since we pass **kwargs
kwargs = {"min": min_cnt, "max": max_cnt}
if kind.startswith("choices=") or kind.startswith("suggest="):
(kind, choices) = kind.split("=", 1)
kwargs["choices"] = choices.split(",")
if kind not in _ARG_MAP:
raise Exception("Unhandled argument kind '%s'" % kind)
else:
return _ARG_MAP[kind](**kwargs)
def ParseHaskellOptsArgs(script, output):
"""Computes list of options/arguments from help-completion output.
"""
cli_opts = []
cli_args = []
for line in output.splitlines():
v = line.split(None)
exc = lambda msg: Exception("Invalid %s output from %s: %s" %
(msg, script, v))
if len(v) < 2:
raise exc("help completion")
if v[0].startswith("-"):
if len(v) != 2:
raise exc("option format")
(opts, kind) = v
cli_opts.append(HaskellOptToOptParse(opts, kind))
else:
if len(v) != 3:
raise exc("argument format")
(kind, min_cnt, max_cnt) = v
cli_args.append(HaskellArgToCliArg(kind, min_cnt, max_cnt))
return (cli_opts, cli_args)
def WriteHaskellCompletion(sw, script, htools=True, debug=True):
"""Generates completion information for a Haskell program.
This converts completion info from a Haskell program into 'fake'
cli_opts and then builds completion for them.
"""
if htools:
cmd = "./src/htools"
env = {"HTOOLS": script}
script_name = script
func_name = "htools_%s" % script
else:
cmd = "./" + script
env = {}
script_name = os.path.basename(script)
func_name = script_name
func_name = GetFunctionName(func_name)
output = utils.RunCmd([cmd, "--help-completion"], env=env, cwd=".").output
(opts, args) = ParseHaskellOptsArgs(script_name, output)
WriteCompletion(sw, script_name, func_name, debug, opts=opts, args=args)
def WriteHaskellCmdCompletion(sw, script, debug=True):
"""Generates completion information for a Haskell multi-command program.
This gathers the list of commands from a Haskell program and
computes the list of commands available, then builds the sub-command
list of options/arguments for each command, using that for building
a unified help output.
"""
cmd = "./" + script
script_name = os.path.basename(script)
func_name = script_name
func_name = GetFunctionName(func_name)
output = utils.RunCmd([cmd, "--help-completion"], cwd=".").output
commands = {}
lines = output.splitlines()
if len(lines) != 1:
raise Exception("Invalid lines in multi-command mode: %s" % str(lines))
v = lines[0].split(None)
exc = lambda msg: Exception("Invalid %s output from %s: %s" %
(msg, script, v))
if len(v) != 3:
raise exc("help completion in multi-command mode")
if not v[0].startswith("choices="):
raise exc("invalid format in multi-command mode '%s'" % v[0])
for subcmd in v[0][len("choices="):].split(","):
output = utils.RunCmd([cmd, subcmd, "--help-completion"], cwd=".").output
(opts, args) = ParseHaskellOptsArgs(script, output)
commands[subcmd] = (None, args, opts, None, None)
WriteCompletion(sw, script_name, func_name, debug, commands=commands)
def main():
parser = optparse.OptionParser(usage="%prog [--compact]")
parser.add_option("--compact", action="store_true",
help=("Don't indent output and don't include debugging"
" facilities"))
options, args = parser.parse_args()
if args:
parser.error("Wrong number of arguments")
# Whether to build debug version of completion script
debug = not options.compact
buf = StringIO()
sw = utils.ShellWriter(buf, indent=debug)
# Remember original state of extglob and enable it (required for pattern
# matching; must be enabled while parsing script)
sw.Write("gnt_shopt_extglob=$(shopt -p extglob || :)")
sw.Write("shopt -s extglob")
WritePreamble(sw, debug)
# gnt-* scripts
for scriptname in _constants.GNT_SCRIPTS:
filename = "scripts/%s" % scriptname
WriteCompletion(sw, scriptname, GetFunctionName(scriptname), debug,
commands=GetCommands(filename,
build.LoadModule(filename)))
# Burnin script
WriteCompletion(sw, "%s/burnin" % pathutils.TOOLSDIR, "_ganeti_burnin",
debug,
opts=burnin.OPTIONS, args=burnin.ARGUMENTS)
# ganeti-cleaner
WriteHaskellCompletion(sw, "daemons/ganeti-cleaner", htools=False,
debug=not options.compact)
# htools
for script in _constants.HTOOLS_PROGS:
WriteHaskellCompletion(sw, script, htools=True, debug=debug)
# ganeti-confd, if enabled
WriteHaskellCompletion(sw, "src/ganeti-confd", htools=False,
debug=debug)
# mon-collector, if monitoring is enabled
if _constants.ENABLE_MOND:
WriteHaskellCmdCompletion(sw, "src/mon-collector", debug=debug)
# Reset extglob to original value
sw.Write("[[ -n \"$gnt_shopt_extglob\" ]] && $gnt_shopt_extglob")
sw.Write("unset gnt_shopt_extglob")
print buf.getvalue()
if __name__ == "__main__":
main()
ganeti-2.15.2/autotools/build-rpc 0000755 0000000 0000000 00000013442 12634264163 0016721 0 ustar 00root root 0000000 0000000 #!/usr/bin/python
#
# Copyright (C) 2011 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""Script to generate RPC code.
"""
# pylint: disable=C0103
# [C0103] Invalid name
import sys
import re
import itertools
import textwrap
from cStringIO import StringIO
from ganeti import utils
from ganeti import compat
from ganeti import build
_SINGLE = "single-node"
_MULTI = "multi-node"
#: Expected length of an RPC definition
_RPC_DEF_LEN = 8
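# Of the eight fields in each definition, this script only consumes the name,
# kind, timeout, argument list and description; the remaining positions are
# ignored here (see the tuple unpacking in _WriteBaseClass).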
def _WritePreamble(sw):
"""Writes a preamble for the RPC wrapper output.
"""
sw.Write("# This code is automatically generated at build time.")
sw.Write("# Do not modify manually.")
sw.Write("")
sw.Write("\"\"\"Automatically generated RPC client wrappers.")
sw.Write("")
sw.Write("\"\"\"")
sw.Write("")
sw.Write("from ganeti import rpc_defs")
sw.Write("")
def _WrapCode(line):
"""Wraps Python code.
"""
return textwrap.wrap(line, width=70, expand_tabs=False,
fix_sentence_endings=False, break_long_words=False,
replace_whitespace=True,
subsequent_indent=utils.ShellWriter.INDENT_STR)
def _WriteDocstring(sw, name, timeout, kind, args, desc):
"""Writes a docstring for an RPC wrapper.
"""
sw.Write("\"\"\"Wrapper for RPC call '%s'", name)
sw.Write("")
if desc:
sw.Write(desc)
sw.Write("")
note = ["This is a %s call" % kind]
if timeout and not callable(timeout):
note.append(" with a timeout of %s" % utils.FormatSeconds(timeout))
sw.Write("@note: %s", "".join(note))
if kind == _SINGLE:
sw.Write("@type node: string")
sw.Write("@param node: Node name")
else:
sw.Write("@type node_list: list of string")
sw.Write("@param node_list: List of node names")
if args:
for (argname, _, argtext) in args:
if argtext:
docline = "@param %s: %s" % (argname, argtext)
for line in _WrapCode(docline):
sw.Write(line)
sw.Write("")
sw.Write("\"\"\"")
def _WriteBaseClass(sw, clsname, calls):
"""Write RPC wrapper class.
"""
sw.Write("")
sw.Write("class %s(object):", clsname)
sw.IncIndent()
try:
sw.Write("# E1101: Non-existent members")
sw.Write("# R0904: Too many public methods")
sw.Write("# pylint: disable=E1101,R0904")
if not calls:
sw.Write("pass")
return
sw.Write("_CALLS = rpc_defs.CALLS[%r]", clsname)
sw.Write("")
for v in calls:
if len(v) != _RPC_DEF_LEN:
raise ValueError("Procedure %s has only %d elements, expected %d" %
(v[0], len(v), _RPC_DEF_LEN))
for (name, kind, _, timeout, args, _, _, desc) in sorted(calls):
funcargs = ["self"]
if kind == _SINGLE:
funcargs.append("node")
elif kind == _MULTI:
funcargs.append("node_list")
else:
raise Exception("Unknown kind '%s'" % kind)
funcargs.extend(map(compat.fst, args))
funcargs.append("_def=_CALLS[%r]" % name)
funcdef = "def call_%s(%s):" % (name, utils.CommaJoin(funcargs))
for line in _WrapCode(funcdef):
sw.Write(line)
sw.IncIndent()
try:
_WriteDocstring(sw, name, timeout, kind, args, desc)
buf = StringIO()
buf.write("return ")
# In case line gets too long and is wrapped in a bad spot
buf.write("(")
buf.write("self._Call(_def, ")
if kind == _SINGLE:
buf.write("[node]")
else:
buf.write("node_list")
buf.write(", [%s])" %
# Function arguments
utils.CommaJoin(map(compat.fst, args)))
if kind == _SINGLE:
buf.write("[node]")
buf.write(")")
for line in _WrapCode(buf.getvalue()):
sw.Write(line)
finally:
sw.DecIndent()
sw.Write("")
finally:
sw.DecIndent()
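# For reference, a single-node call written by the loop above comes out
# roughly as follows (names and arguments are illustrative only):
#   def call_version(self, node, checks, _def=_CALLS['version']):
#     """..."""
#     return (self._Call(_def, [node], [checks])[node])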
def main():
"""Main function.
"""
buf = StringIO()
sw = utils.ShellWriter(buf)
_WritePreamble(sw)
for filename in sys.argv[1:]:
sw.Write("# Definitions from '%s'", filename)
module = build.LoadModule(filename)
# Call types are re-defined in definitions file to avoid imports. Verify
# here to ensure they're equal to local constants.
assert module.SINGLE == _SINGLE
assert module.MULTI == _MULTI
dups = utils.GetRepeatedKeys(*module.CALLS.values())
if dups:
raise Exception("Found duplicate RPC definitions for '%s'" %
utils.CommaJoin(sorted(dups)))
for (clsname, calls) in sorted(module.CALLS.items()):
_WriteBaseClass(sw, clsname, calls.values())
print buf.getvalue()
if __name__ == "__main__":
main()
ganeti-2.15.2/autotools/check-header 0000755 0000000 0000000 00000011531 12634264163 0017340 0 ustar 00root root 0000000 0000000 #!/usr/bin/python
#
# Copyright (C) 2011, 2014 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""Script to verify file header.
"""
# pylint: disable=C0103
# [C0103] Invalid name
import sys
import re
import itertools
from ganeti import constants
from ganeti import utils
from ganeti import compat
#: Assume header is always in the first 8kB of a file
_READ_SIZE = 8 * 1024
_BSD2 = [
"All rights reserved.",
"",
"Redistribution and use in source and binary forms, with or without",
"modification, are permitted provided that the following conditions are",
"met:",
"",
"1. Redistributions of source code must retain the above copyright notice,",
"this list of conditions and the following disclaimer.",
"",
"2. Redistributions in binary form must reproduce the above copyright",
"notice, this list of conditions and the following disclaimer in the",
"documentation and/or other materials provided with the distribution.",
"",
"THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS",
"IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED",
"TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR",
"PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR",
"CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,",
"EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,",
"PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR",
"PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF",
"LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING",
"NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS",
"SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.",
]
_SHEBANG = re.compile(r"^#(?:|!(?:/usr/bin/python(?:| -u)|/bin/(?:|ba)sh))$")
_COPYRIGHT_YEAR = r"20[01][0-9]"
_COPYRIGHT = re.compile(r"# Copyright \(C\) (%s(?:, %s)*) Google Inc\.$" %
(_COPYRIGHT_YEAR, _COPYRIGHT_YEAR))
_COPYRIGHT_DESC = "Copyright (C) <year>[, <year> ...] Google Inc."
_AUTOGEN = "# This file is automatically generated, do not edit!"
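# Taken together, the checks in _CheckHeader accept (non-generated) files
# that start roughly like this sketch (years illustrative):
#   #!/usr/bin/python        <- or a plain "#", or one of the other shebangs
#   #
#                            <- empty line
#   # Copyright (C) 2011, 2014 Google Inc.
#   # All rights reserved.
#   ...remaining _BSD2 lines, each prefixed with "# "...
#                            <- empty line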
class HeaderError(Exception):
pass
def _Fail(lineno, msg):
raise HeaderError("Line %s: %s" % (lineno, msg))
def _CheckHeader(getline_fn):
(lineno, line) = getline_fn()
if line == _AUTOGEN:
return
if not _SHEBANG.match(line):
_Fail(lineno, ("Must contain nothing but a hash character (#) or a"
" shebang line (e.g. #!/bin/bash)"))
(lineno, line) = getline_fn()
if line == _AUTOGEN:
return
if line != "#":
_Fail(lineno, "Must contain nothing but hash character (#)")
(lineno, line) = getline_fn()
if line:
_Fail(lineno, "Must be empty")
(lineno, line) = getline_fn()
if not _COPYRIGHT.match(line):
_Fail(lineno, "Must contain copyright information (%s)" % _COPYRIGHT_DESC)
for licence_line in _BSD2:
(lineno, line) = getline_fn()
if line != ("# %s" % licence_line).rstrip():
_Fail(lineno, "Does not match expected licence line (%s)" % licence_line)
(lineno, line) = getline_fn()
if line:
_Fail(lineno, "Must be empty")
def Main():
"""Main program.
"""
fail = False
for filename in sys.argv[1:]:
content = utils.ReadFile(filename, size=_READ_SIZE)
lines = zip(itertools.count(1), content.splitlines())
try:
_CheckHeader(compat.partial(lines.pop, 0))
except HeaderError, err:
report = str(err)
print "%s: %s" % (filename, report)
fail = True
if fail:
sys.exit(constants.EXIT_FAILURE)
else:
sys.exit(constants.EXIT_SUCCESS)
if __name__ == "__main__":
Main()
ganeti-2.15.2/autotools/check-imports 0000755 0000000 0000000 00000005625 12634264163 0017614 0 ustar 00root root 0000000 0000000 #!/usr/bin/python
#
# Copyright (C) 2011 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""Script to check module imports.
"""
# pylint: disable=C0103
# C0103: Invalid name
import sys
# All modules imported after this line are removed from the global list before
# importing a module to be checked
_STANDARD_MODULES = sys.modules.keys()
import os.path
from ganeti import build
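# Expected invocation, as far as main() below is concerned (normally driven
# by the build system): the first argument is the source tree root, the
# remaining arguments are the Python files to check, e.g.
#   check-imports . lib/foo.py lib/bar.py
# (the file names here are purely illustrative).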
def main():
args = sys.argv[1:]
# Get references to functions used later on
load_module = build.LoadModule
abspath = os.path.abspath
commonprefix = os.path.commonprefix
normpath = os.path.normpath
script_path = abspath(__file__)
srcdir = normpath(abspath(args.pop(0)))
assert "ganeti" in sys.modules
for filename in args:
# Reset global state
for name in sys.modules.keys():
if name not in _STANDARD_MODULES:
sys.modules.pop(name, None)
assert "ganeti" not in sys.modules
# Load module (this might import other modules)
module = load_module(filename)
result = []
for (name, checkmod) in sorted(sys.modules.items()):
if checkmod is None or checkmod == module:
continue
try:
checkmodpath = getattr(checkmod, "__file__")
except AttributeError:
# Built-in module
pass
else:
abscheckmodpath = os.path.abspath(checkmodpath)
if abscheckmodpath == script_path:
# Ignore check script
continue
if commonprefix([abscheckmodpath, srcdir]) == srcdir:
result.append(name)
if result:
raise Exception("Module '%s' has illegal imports: %s" %
(filename, ", ".join(result)))
if __name__ == "__main__":
main()
ganeti-2.15.2/autotools/check-man-dashes 0000755 0000000 0000000 00000002701 12634264163 0020127 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# Copyright (C) 2012, 2013 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
set -e
! grep -F '\[em]' "$1" || \
{ echo "Unescaped dashes found in $1, use \\-- instead of --" 1>&2; exit 1; }
ganeti-2.15.2/autotools/check-man-references 0000755 0000000 0000000 00000004043 12634264163 0021002 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# Copyright (C) 2013 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
set -e -u -o pipefail
# Use array for arguments so that comments can be inline
args=(
# "...name*(8)" (missing backslash)
-e '\w+\*+\([0-9]*\)'
# "...name(8)" (no asterisk)
-e '\w+\([0-9]*\)'
# "...name(8)*" (asterisk after number)
-e '\w+\([0-9]*\)\*'
# "...name*\(8)" (only one asterisk before backslash)
-e '\w+\*\\\([0-9]*\)'
# ":manpage:..." (Sphinx-specific)
-e ':manpage:'
)
for fname; do
# Ignore title and then look for faulty references
if tail -n +2 $fname | grep -n -E -i "${args[@]}"; then
{
echo "Found faulty man page reference(s) in '$fname'."\
'Use syntax "**name**\(number)" instead.'\
'Example: **gnt-instance**\(8).'
} >&2
exit 1
fi
done
ganeti-2.15.2/autotools/check-man-warnings 0000755 0000000 0000000 00000003136 12634264163 0020513 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# Copyright (C) 2010, 2012 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
set -e
if locale -a | grep -qF 'C.UTF-8'; then
loc="C.UTF-8"
else
loc="en_US.UTF-8"
fi
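# The pipeline below renders the given man page with warnings enabled under a
# UTF-8 locale and a fixed width, drops the known-harmless line-adjustment
# messages, and (via the leading "!") fails if any other warning text is left.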
! LANG="$loc" LC_ALL="$loc" MANWIDTH=80 \
man --warnings --encoding=utf8 --local-file "$1" 2>&1 >/dev/null | \
grep -v -e "cannot adjust line" -e "can't break line" | \
grep .
ganeti-2.15.2/autotools/check-news 0000755 0000000 0000000 00000013304 12634264163 0017064 0 ustar 00root root 0000000 0000000 #!/usr/bin/python
#
# Copyright (C) 2011, 2012, 2013 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""Script to check NEWS file.
"""
# pylint: disable=C0103
# [C0103] Invalid name
import sys
import time
import datetime
import locale
import fileinput
import re
import os
DASHES_RE = re.compile(r"^\s*-+\s*$")
RELEASED_RE = re.compile(r"^\*\(Released (?P<day>[A-Z][a-z]{2}),"
r" (?P<date>.+)\)\*$")
UNRELEASED_RE = re.compile(r"^\*\(unreleased\)\*$")
VERSION_RE = re.compile(r"^Version (\d+(\.\d+)+( (alpha|beta|rc)\d+)?)$")
#: How many days release timestamps may be in the future
TIMESTAMP_FUTURE_DAYS_MAX = 5
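# Sketch of an entry accepted by the checks below (version and date are
# illustrative): two empty lines before the title, dashes of the same length
# as the title, one empty line before the release line, and a weekday that
# matches the date:
#
#   Version 2.15.2
#   --------------
#
#   *(Released Mon, 1 Jan 2001)*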
errors = []
def Error(msg):
"""Log an error for later display.
"""
errors.append(msg)
def ReqNLines(req, count_empty, lineno, line):
"""Check if we have N empty lines before the current one.
"""
if count_empty < req:
Error("Line %s: Missing empty line(s) before %s,"
" %d needed but got only %d" %
(lineno, line, req, count_empty))
if count_empty > req:
Error("Line %s: Too many empty lines before %s,"
" %d needed but got %d" %
(lineno, line, req, count_empty))
def IsAlphaVersion(version):
return "alpha" in version
def UpdateAllowUnreleased(allow_unreleased, version_match, release):
if not allow_unreleased:
return False
if IsAlphaVersion(release):
return True
version = version_match.group(1)
if version == release:
return False
return True
def main():
# Ensure "C" locale is used
curlocale = locale.getlocale()
if curlocale != (None, None):
Error("Invalid locale %s" % curlocale)
# Get the release version, but replace "~" with " " as the version
# in the NEWS file uses spaces for beta and rc releases.
release = os.environ.get('RELEASE', "").replace("~", " ")
prevline = None
expect_date = False
count_empty = 0
allow_unreleased = True
found_versions = set()
for line in fileinput.input():
line = line.rstrip("\n")
version_match = VERSION_RE.match(line)
if version_match:
ReqNLines(2, count_empty, fileinput.filelineno(), line)
version = version_match.group(1)
if version in found_versions:
Error("Line %s: Duplicate release %s found" %
(fileinput.filelineno(), version))
found_versions.add(version)
allow_unreleased = UpdateAllowUnreleased(allow_unreleased, version_match,
release)
unreleased_match = UNRELEASED_RE.match(line)
if unreleased_match and not allow_unreleased:
Error("Line %s: Unreleased version after current release %s" %
(fileinput.filelineno(), release))
if unreleased_match or RELEASED_RE.match(line):
ReqNLines(1, count_empty, fileinput.filelineno(), line)
if line:
count_empty = 0
else:
count_empty += 1
if DASHES_RE.match(line):
if not VERSION_RE.match(prevline):
Error("Line %s: Invalid title" %
(fileinput.filelineno() - 1))
if len(line) != len(prevline):
Error("Line %s: Invalid dashes length" %
(fileinput.filelineno()))
expect_date = True
elif expect_date:
if not line:
# Ignore empty lines
continue
if UNRELEASED_RE.match(line):
# Ignore unreleased versions
expect_date = False
continue
m = RELEASED_RE.match(line)
if not m:
Error("Line %s: Invalid release line" % fileinput.filelineno())
expect_date = False
continue
# Including the weekday in the date string does not work as time.strptime
# would return an inconsistent result if the weekday is incorrect.
parsed_ts = time.mktime(time.strptime(m.group("date"), "%d %b %Y"))
parsed = datetime.date.fromtimestamp(parsed_ts)
today = datetime.date.today()
if (parsed - datetime.timedelta(TIMESTAMP_FUTURE_DAYS_MAX)) > today:
Error("Line %s: %s is more than %s days in the future (today is %s)" %
(fileinput.filelineno(), parsed, TIMESTAMP_FUTURE_DAYS_MAX,
today))
weekday = parsed.strftime("%a")
# Check weekday
if m.group("day") != weekday:
Error("Line %s: %s was/is a %s, not %s" %
(fileinput.filelineno(), parsed, weekday,
m.group("day")))
expect_date = False
prevline = line
if errors:
for msg in errors:
print >> sys.stderr, msg
sys.exit(1)
else:
sys.exit(0)
if __name__ == "__main__":
main()
ganeti-2.15.2/autotools/check-python-code 0000755 0000000 0000000 00000005351 12634264163 0020344 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# Copyright (C) 2009, 2011 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
set -e
# Ensure the checks always use the same locale
export LC_ALL=C
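# Build a string of 81 dots; further down, grep "^$maxlinelen" then matches
# any line that is at least 81 characters long, i.e. longer than 80.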
readonly maxlinelen=$(for ((i=0; i<81; ++i)); do echo -n .; done)
if [[ "${#maxlinelen}" != 81 ]]; then
echo "Internal error: Check for line length is incorrect" >&2
exit 1
fi
# "[...] If the last ARG evaluates to 0, let returns 1; 0 is returned
# otherwise.", hence ignoring the return value.
let problems=0 || :
for script; do
if grep -n -H -F $'\t' "$script"; then
let ++problems
echo "Found tabs in $script" >&2
fi
if grep -n -H -E '[[:space:]]$' "$script"; then
let ++problems
echo "Found end-of-line-whitespace in $script" >&2
fi
# FIXME: This will also match "foo.xrange(...)"
if grep -n -H -E '^[^#]*\<xrange\>' "$script"; then
let ++problems
echo "Forbidden function 'xrange' used in $script" >&2
fi
if grep -n -H -E -i '#[[:space:]]*(vim|Local[[:space:]]+Variables):' "$script"
then
let ++problems
echo "Found editor-specific settings in $script" >&2
fi
if grep -n -H "^$maxlinelen" "$script"; then
let ++problems
echo "Longest line in $script is longer than 80 characters" >&2
fi
if grep -n -H -E -i \
'#.*\bpylint[[:space:]]*:[[:space:]]*disable-msg\b' "$script"
then
let ++problems
echo "Found old-style pylint disable pragma in $script" >&2
fi
done
if [[ "$problems" -gt 0 ]]; then
echo "Found $problems problem(s) while checking code." >&2
exit 1
fi
ganeti-2.15.2/autotools/check-tar 0000755 0000000 0000000 00000004100 12634264163 0016670 0 ustar 00root root 0000000 0000000 #!/usr/bin/python
#
# Copyright (C) 2010 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""Script to check tarball generated by Automake.
"""
import sys
import stat
import tarfile
def ReportError(member, msg):
print >>sys.stderr, "%s: %s" % (member.name, msg)
def main():
tf = tarfile.open(fileobj=sys.stdin)
success = True
for member in tf.getmembers():
if member.uid != 0:
success = False
ReportError(member, "Owned by UID %s, not UID 0" % member.uid)
if member.gid != 0:
success = False
ReportError(member, "Owned by GID %s, not GID 0" % member.gid)
if member.mode & (stat.S_IWGRP | stat.S_IWOTH):
success = False
ReportError(member, "World or group writeable (mode is %o)" % member.mode)
if success:
sys.exit(0)
sys.exit(1)
if __name__ == "__main__":
main()
ganeti-2.15.2/autotools/check-version 0000755 0000000 0000000 00000004061 12634264163 0017575 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# Copyright (C) 2010,2013 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
set -e
# Enable Bash-specific patterns
shopt -s extglob
readonly version=$1
readonly newsfile=$2
readonly numpat='+([0-9])'
case "$version" in
# Format "x.y.z"
$numpat.$numpat.$numpat) : ;;
# Format "x.y.z~rcN" or "x.y.z~betaN" or "x.y.z~alphaN" for N > 0
$numpat.$numpat.$numpat~@(rc|beta|alpha)[1-9]*([0-9])) : ;;
*)
echo "Invalid version format: $version" >&2
exit 1
;;
esac
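# Turn e.g. "2.15.2~rc1" into the NEWS heading form "Version 2.15.2 rc1"
# (the tilde becomes a space; the example version is illustrative).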
readonly newsver="Version ${version/\~/ }"
# Only alpha versions are allowed not to have their own NEWS section yet
set +e
FOUND=x`echo $version | grep "alpha[1-9]*[0-9]$"`
set -e
if [ $FOUND == "x" ]
then
if ! grep -q -x "$newsver" $newsfile
then
echo "Unable to find heading '$newsver' in NEWS" >&2
exit 1
fi
fi
exit 0
ganeti-2.15.2/autotools/docpp 0000755 0000000 0000000 00000004021 12634264163 0016136 0 ustar 00root root 0000000 0000000 #!/usr/bin/python
#
# Copyright (C) 2011 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""Script to replace special directives in documentation.
"""
import re
import fileinput
from ganeti import query
from ganeti.build import sphinx_ext
_DOC_RE = re.compile(r"^@(?P<class>[A-Z_]+)_(?P<kind>[A-Z]+)@$")
_DOC_CLASSES_DATA = {
"CONSTANTS": (sphinx_ext.DOCUMENTED_CONSTANTS, sphinx_ext.BuildValuesDoc),
"QUERY_FIELDS": (query.ALL_FIELDS, sphinx_ext.BuildQueryFields),
}
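# A directive is a line consisting solely of e.g. "@QUERY_FIELDS_NODE@" or
# "@CONSTANTS_FOO@" (the kind names here are illustrative): the first part
# selects an entry of _DOC_CLASSES_DATA, the lower-cased second part selects
# the field group handed to the builder.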
def main():
for line in fileinput.input():
m = _DOC_RE.match(line)
if m:
fields_dict, builder = _DOC_CLASSES_DATA[m.group("class")]
fields = fields_dict[m.group("kind").lower()]
for i in builder(fields):
print i
else:
print line,
if __name__ == "__main__":
main()
ganeti-2.15.2/autotools/gen-py-coverage 0000755 0000000 0000000 00000004545 12634264163 0020034 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# Copyright (C) 2010, 2011, 2012 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
set -e
set -u
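# Environment contract (sketch): the ":" expansions below fail early if
# COVERAGE, COVERAGE_FILE, TEXT_COVERAGE or GANETI_TEMP_DIR are unset, while
# PYTHON and HTML_COVERAGE get defaults ("python" and empty, respectively).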
: ${PYTHON:=python}
: ${COVERAGE:?}
: ${COVERAGE_FILE:?}
: ${TEXT_COVERAGE:?}
: ${HTML_COVERAGE:=}
: ${GANETI_TEMP_DIR:?}
reportargs=(
'--include=*'
'--omit=test/py/*'
)
$COVERAGE erase
if [[ -n "$HTML_COVERAGE" ]]; then
if [[ ! -d "$HTML_COVERAGE" ]]; then
echo "Not a directory: $HTML_COVERAGE" >&2
exit 1
fi
# At least coverage 3.4 fails to overwrite files
find "$HTML_COVERAGE" \( -type f -o -type l \) -delete
fi
for script; do
if [[ "$script" == *-runasroot.py ]]; then
if [[ -z "$FAKEROOT" ]]; then
echo "WARNING: FAKEROOT variable not set: skipping $script" >&2
continue
fi
cmdprefix="$FAKEROOT"
else
cmdprefix=
fi
$cmdprefix $COVERAGE run --branch --append "${reportargs[@]}" $script
done
echo "Writing text report to $TEXT_COVERAGE ..." >&2
$COVERAGE report "${reportargs[@]}" | tee "$TEXT_COVERAGE"
if [[ -n "$HTML_COVERAGE" ]]; then
echo "Generating HTML report in $HTML_COVERAGE ..." >&2
$COVERAGE html "${reportargs[@]}" -d "$HTML_COVERAGE"
fi
ganeti-2.15.2/autotools/print-py-constants 0000755 0000000 0000000 00000003573 12634264163 0020640 0 ustar 00root root 0000000 0000000 #!/usr/bin/python
#
# Copyright (C) 2013 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""Script for printing Python constants related to sockets.
These constants are the remnants of the Haskell-to-Python constant
generation. This solution is transitional until Ganeti 2.11, because
completely eliminating the Python-to-Haskell conversion requires
updating the configuration file.
"""
import socket
import sys
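# Illustrative usage (argument names as handled in main() below):
#   print-py-constants AF_INET4   -> prints the numeric value of socket.AF_INET
#   print-py-constants AF_INET6   -> prints the numeric value of socket.AF_INET6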
def main():
if len(sys.argv) > 1:
if sys.argv[1] == "AF_INET4":
print "%s" % socket.AF_INET
elif sys.argv[1] == "AF_INET6":
print "%s" % socket.AF_INET6
if __name__ == "__main__":
main()
ganeti-2.15.2/autotools/run-in-tempdir 0000755 0000000 0000000 00000001471 12634264163 0017711 0 ustar 00root root 0000000 0000000 #!/bin/bash
# Helper for running things in a temporary directory; used for docs
# building, unittests, etc.
set -e
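# Rough usage: run-in-tempdir COMMAND [ARG...]; the relevant parts of the tree
# are copied or symlinked into a throw-away directory and COMMAND is run there
# with GANETI_TEMP_DIR pointing at it.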
tmpdir=$(mktemp -d -t gntbuild.XXXXXXXX)
trap "rm -rf $tmpdir" EXIT
# fully copy items
cp -r autotools daemons scripts lib tools qa $tmpdir
if [[ -z "$COPY_DOC" ]]; then
mkdir $tmpdir/doc
ln -s $PWD/doc/examples $tmpdir/doc
else
# Building documentation requires all files
cp -r doc $tmpdir
fi
mkdir $tmpdir/test/
cp -r test/py $tmpdir/test/py
ln -s $PWD/test/data $tmpdir/test
ln -s $PWD/test/hs $tmpdir/test
mv $tmpdir/lib $tmpdir/ganeti
ln -T -s $tmpdir/ganeti $tmpdir/lib
mkdir -p $tmpdir/src $tmpdir/test/hs
for hfile in htools ganeti-confd mon-collector hs2py; do
if [ -e src/$hfile ]; then
ln -s $PWD/src/$hfile $tmpdir/src/
fi
done
cd $tmpdir && GANETI_TEMP_DIR="$tmpdir" "$@"
ganeti-2.15.2/autotools/sphinx-wrapper 0000755 0000000 0000000 00000003305 12634264163 0020024 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# Copyright (C) 2013 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
set -e -u -o pipefail
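# Depending on $ENABLE_MANPAGES, promote either doc/manpages-enabled.rst or
# doc/manpages-disabled.rst to doc/manpages.rst, drop the unused variant, and
# then exec the wrapped command (typically sphinx-build, but that is up to the
# caller).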
if [[ -e doc/manpages.rst ]]; then
echo 'doc/manpages.rst should not exist' >&2
exit 1
fi
if [[ -n "$ENABLE_MANPAGES" ]]; then
mv doc/manpages-enabled.rst doc/manpages.rst
rm doc/manpages-disabled.rst
else
mv doc/manpages-disabled.rst doc/manpages.rst
if [[ -e doc/manpages-enabled.rst ]]; then
rm doc/manpages-enabled.rst
fi
fi
exec "$@"
ganeti-2.15.2/autotools/testrunner 0000755 0000000 0000000 00000003301 12634264163 0017242 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# Copyright (C) 2010, 2011 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
set -e
filename=$1
execasroot() {
local fname=$1
shift
if [[ -z "$FAKEROOT" ]]; then
echo "WARNING: FAKEROOT variable not set, skipping $fname" >&2
else
exec "$FAKEROOT" "$@"
fi
}
case "$filename" in
*-runasroot.py) execasroot $filename $PYTHON "$@" ;;
*.py) exec $PYTHON "$@" ;;
*-runasroot) execasroot $filename "$@" ;;
*) exec "$@" ;;
esac
ganeti-2.15.2/autotools/wrong-hardcoded-paths 0000644 0000000 0000000 00000000114 12634264163 0021211 0 ustar 00root root 0000000 0000000 /etc/ganeti
/usr/(local/)?lib/ganeti
/(usr/local/)?var/(lib|run|log)/ganeti
ganeti-2.15.2/cabal/ 0000755 0000000 0000000 00000000000 12634264163 0014117 5 ustar 00root root 0000000 0000000 ganeti-2.15.2/cabal/CabalDependenciesMacros.hs 0000644 0000000 0000000 00000002437 12634264163 0021137 0 ustar 00root root 0000000 0000000 module Main where
import Control.Applicative
import qualified Data.Set as Set
import qualified Distribution.Simple.Build.Macros as Macros
import Distribution.Simple.Configure (maybeGetPersistBuildConfig)
import Distribution.Simple.LocalBuildInfo (externalPackageDeps)
import Distribution.PackageDescription (packageDescription)
import Distribution.PackageDescription.Parse (readPackageDescription)
import Distribution.Text (display)
import Distribution.Verbosity (normal)
import System.Environment (getArgs)
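-- Rough invocation sketch (file names are assumptions, not fixed by this
-- program): the three positional arguments are the .cabal file to read, the
-- output file for the "-package-id ..." flags and the output file for the
-- MIN_VERSION_* macros, e.g.
--   runghc CabalDependenciesMacros.hs ganeti.cabal deps.flags macros.h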
main :: IO ()
main = do
-- Get paths from program arguments.
(cabalPath, depsPath, macrosPath) <- do
args <- getArgs
case args of
[c, d, m] -> return (c, d, m)
_ -> error "Expected 3 arguments: cabalPath depsPath macrosPath"
-- Read the cabal file.
pkgDesc <- packageDescription <$> readPackageDescription normal cabalPath
-- Read the setup-config.
m'conf <- maybeGetPersistBuildConfig "dist"
case m'conf of
Nothing -> error "could not read dist/setup-config"
Just conf -> do
-- Write package dependencies.
let deps = map (display . fst) $ externalPackageDeps conf
writeFile depsPath (unwords $ map ("-package-id " ++) deps)
-- Write package MIN_VERSION_* macros.
writeFile macrosPath $ Macros.generate pkgDesc conf
ganeti-2.15.2/cabal/cabal-from-modules.py 0000644 0000000 0000000 00000000340 12634264163 0020137 0 ustar 00root root 0000000 0000000 import sys
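# Sketch of the expected invocation (driven by the build system): the template
# path is the only argument, the module list arrives on stdin, and the
# completed cabal file is written to stdout.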
cabal_in = sys.argv[1]
modules = '\n '.join(sorted(sys.stdin.read().split()))
template = open(cabal_in).read()
contents = template.replace('-- AUTOGENERATED_MODULES_HERE', modules)
sys.stdout.write(contents)
ganeti-2.15.2/cabal/ganeti.template.cabal 0000644 0000000 0000000 00000007217 12634264163 0020173 0 ustar 00root root 0000000 0000000 name: ganeti
version: 2.15
homepage: http://www.ganeti.org
license: BSD2
license-file: COPYING
author: Google Inc.
maintainer: ganeti-devel@googlegroups.com
copyright: 2006-2015 Google Inc.
category: System
build-type: Simple
extra-source-files: README
cabal-version: >=1.10
synopsis: Cluster-based virtualization management software
description:
Cluster-based virtualization management software
.
See <http://www.ganeti.org>
flag mond
description: enable the ganeti monitoring daemon
default: True
flag metad
description: enable the ganeti metadata daemon
default: True
flag htest
description: enable tests
default: True
library
exposed-modules:
-- AUTOGENERATED_MODULES_HERE
-- other-modules:
other-extensions:
TemplateHaskell
build-depends:
base >= 4.5.0.0
, array >= 0.4.0.0
, bytestring >= 0.9.2.1
, containers >= 0.4.2.1
, deepseq >= 1.3.0.0
, directory >= 1.1.0.2
, filepath >= 1.3.0.0
, mtl >= 2.1.1
, old-time >= 1.1.0.0
, pretty >= 1.1.1.0
, process >= 1.1.0.1
, random >= 1.0.1.1
, template-haskell >= 2.7.0.0
, text >= 0.11.1.13
, transformers >= 0.3.0.0
, unix >= 2.5.1.0
, attoparsec >= 0.10.1.1 && < 0.13
, base64-bytestring >= 1.0.0.1 && < 1.1
, case-insensitive >= 0.4.0.1 && < 1.3
, Crypto >= 4.2.4 && < 4.3
, curl >= 1.3.7 && < 1.4
, hinotify >= 0.3.2 && < 0.4
, hslogger >= 1.1.4 && < 1.3
, json >= 0.5 && < 0.9
, lens >= 3.10 && < 4.8
, lifted-base >= 0.2.0.3 && < 0.3
, monad-control >= 0.3.1.3 && < 1.1
, MonadCatchIO-transformers >= 0.3.0.0 && < 0.4
, network >= 2.3.0.13 && < 2.7
, parallel >= 3.2.0.2 && < 3.3
, regex-pcre >= 0.94.2 && < 0.95
, temporary >= 1.1.2.3 && < 1.3
, transformers-base >= 0.4.1 && < 0.5
, utf8-string >= 0.3.7 && < 0.4
, zlib >= 0.5.3.3 && < 0.6
-- Executables:
-- , happy
-- , hscolour
-- , shelltestrunner
if flag(htest)
build-depends:
HUnit >= 1.2.4.2 && < 1.3
, QuickCheck >= 2.4.2 && < 2.8
, test-framework >= 0.6 && < 0.9
, test-framework-hunit >= 0.2.7 && < 0.4
, test-framework-quickcheck2 >= 0.2.12.1 && < 0.4
if flag(mond)
build-depends:
PSQueue >= 1.1 && < 1.2
, snap-core >= 0.8.1 && < 0.10
, snap-server >= 0.8.1 && < 0.10
if flag(metad)
build-depends:
snap-core >= 0.8.1 && < 0.10
, snap-server >= 0.8.1 && < 0.10
hs-source-dirs:
src, test/hs
build-tools:
hsc2hs
default-language:
Haskell2010
ghc-options:
-Wall
ganeti-2.15.2/configure.ac 0000644 0000000 0000000 00000067050 12634264163 0015353 0 ustar 00root root 0000000 0000000 # Configure script for Ganeti
m4_define([gnt_version_major], [2])
m4_define([gnt_version_minor], [15])
m4_define([gnt_version_revision], [2])
m4_define([gnt_version_suffix], [])
m4_define([gnt_version_full],
m4_format([%d.%d.%d%s],
gnt_version_major, gnt_version_minor,
gnt_version_revision, gnt_version_suffix))
AC_PREREQ(2.59)
AC_INIT(ganeti, gnt_version_full, ganeti@googlegroups.com)
AC_CONFIG_AUX_DIR(autotools)
AC_CONFIG_SRCDIR(configure)
AM_INIT_AUTOMAKE([1.9 foreign tar-ustar -Wall -Wno-portability]
m4_esyscmd([case `automake --version | head -n 1` in
*1.11*);;
*) echo serial-tests;;
esac]))
AC_SUBST([VERSION_MAJOR], gnt_version_major)
AC_SUBST([VERSION_MINOR], gnt_version_minor)
AC_SUBST([VERSION_REVISION], gnt_version_revision)
AC_SUBST([VERSION_SUFFIX], gnt_version_suffix)
AC_SUBST([VERSION_FULL], gnt_version_full)
AC_SUBST([BINDIR], $bindir)
AC_SUBST([SBINDIR], $sbindir)
AC_SUBST([MANDIR], $mandir)
# --enable-versionfull
AC_ARG_ENABLE([versionfull],
[AS_HELP_STRING([--enable-versionfull],
m4_normalize([use the full version string rather
than major.minor for version directories]))],
[[if test "$enableval" != no; then
USE_VERSION_FULL=yes
else
USE_VERSION_FULL=no
fi
]],
[USE_VERSION_FULL=no
])
AC_SUBST(USE_VERSION_FULL, $USE_VERSION_FULL)
AM_CONDITIONAL([USE_VERSION_FULL], [test "$USE_VERSION_FULL" = yes])
# --enable-symlinks
AC_ARG_ENABLE([symlinks],
[AS_HELP_STRING([--enable-symlinks],
m4_normalize([also install version-dependent symlinks under
$sysconfdir (default: disabled)]))],
[[if test "$enableval" != yes; then
INSTALL_SYMLINKS=no
else
INSTALL_SYMLINKS=yes
fi
]],
[INSTALL_SYMLINKS=no
])
AC_SUBST(INSTALL_SYMLINKS, $INSTALL_SYMLINKS)
AM_CONDITIONAL([INSTALL_SYMLINKS], [test "$INSTALL_SYMLINKS" = yes])
# --enable-haskell-profiling
AC_ARG_ENABLE([haskell-profiling],
[AS_HELP_STRING([--enable-haskell-profiling],
m4_normalize([enable profiling for Haskell binaries
(default: disabled)]))],
[[if test "$enableval" != yes; then
HPROFILE=no
else
HPROFILE=yes
fi
]],
[HPROFILE=no
])
AC_SUBST(HPROFILE, $HPROFILE)
AM_CONDITIONAL([HPROFILE], [test "$HPROFILE" = yes])
# --enable-haskell-coverage
AC_ARG_ENABLE([haskell-coverage],
[AS_HELP_STRING([--enable-haskell-coverage],
m4_normalize([enable coverage for Haskell binaries
(default: disabled)]))],
[[if test "$enableval" != yes; then
HCOVERAGE=no
else
HCOVERAGE=yes
fi
]],
[HCOVERAGE=no
])
AC_SUBST(HCOVERAGE, $HCOVERAGE)
AM_CONDITIONAL([HCOVERAGE], [test "$HCOVERAGE" = yes])
# --enable-haskell-tests
AC_ARG_ENABLE([haskell-tests],
[AS_HELP_STRING([--enable-haskell-tests],
m4_normalize([enable additional Haskell development test code
(default: disabled)]))],
[[if test "$enableval" != yes; then
HTEST=no
else
HTEST=yes
fi
]],
[HTEST=no
])
AC_SUBST(HTEST, $HTEST)
AM_CONDITIONAL([HTEST], [test "$HTEST" = yes])
# --enable-developer-mode
AC_ARG_ENABLE([developer-mode],
[AS_HELP_STRING([--enable-developer-mode],
m4_normalize([do a developer build with additional
checks and fatal warnings; this is implied by enabling
the haskell tests]))],
[[if test "$enableval" != no; then
DEVELOPER_MODE=yes
else
DEVELOPER_MODE=no
fi
]],
[DEVELOPER_MODE=no
])
AC_SUBST(DEVELOPER_MODE, $DEVELOPER_MODE)
AM_CONDITIONAL([DEVELOPER_MODE],
[test "$DEVELOPER_MODE" = yes -o "$HTEST" = yes])
# --with-haskell-flags=
AC_ARG_WITH([haskell-flags],
[AS_HELP_STRING([--with-haskell-flags=FLAGS],
[Extra flags to pass to GHC]
)],
[hextra_configure="$withval"],
[hextra_configure=""])
AC_SUBST(HEXTRA_CONFIGURE, $hextra_configure)
# --with-ssh-initscript=...
AC_ARG_WITH([ssh-initscript],
[AS_HELP_STRING([--with-ssh-initscript=SCRIPT],
[SSH init script to use (default is /etc/init.d/ssh)]
)],
[ssh_initd_script="$withval"],
[ssh_initd_script="/etc/init.d/ssh"])
AC_SUBST(SSH_INITD_SCRIPT, $ssh_initd_script)
# --with-export-dir=...
AC_ARG_WITH([export-dir],
[AS_HELP_STRING([--with-export-dir=DIR],
[directory to use by default for instance image]
[ exports (default is /srv/ganeti/export)]
)],
[export_dir="$withval"],
[export_dir="/srv/ganeti/export"])
AC_SUBST(EXPORT_DIR, $export_dir)
# --with-backup-dir=...
AC_ARG_WITH([backup-dir],
[AS_HELP_STRING([--with-backup-dir=DIR],
[directory to use for configuration backups]
[ on Ganeti upgrades (default is $(localstatedir)/lib)]
)],
[backup_dir="$withval"
USE_BACKUP_DIR=yes
],
[backup_dir=
USE_BACKUP_DIR=no
])
AC_SUBST(BACKUP_DIR, $backup_dir)
AM_CONDITIONAL([USE_BACKUP_DIR], [test "$USE_BACKUP_DIR" = yes])
# --with-ssh-config-dir=...
AC_ARG_WITH([ssh-config-dir],
[AS_HELP_STRING([--with-ssh-config-dir=DIR],
[ directory with ssh host keys ]
[ (default is /etc/ssh)]
)],
[ssh_config_dir="$withval"],
[ssh_config_dir="/etc/ssh"])
AC_SUBST(SSH_CONFIG_DIR, $ssh_config_dir)
# --with-xen-config-dir=...
AC_ARG_WITH([xen-config-dir],
[AS_HELP_STRING([--with-xen-config-dir=DIR],
m4_normalize([Xen configuration directory
(default: /etc/xen)]))],
[xen_config_dir="$withval"],
[xen_config_dir=/etc/xen])
AC_SUBST(XEN_CONFIG_DIR, $xen_config_dir)
# --with-os-search-path=...
AC_ARG_WITH([os-search-path],
[AS_HELP_STRING([--with-os-search-path=LIST],
[comma separated list of directories to]
[ search for OS images (default is /srv/ganeti/os)]
)],
[os_search_path="$withval"],
[os_search_path="/srv/ganeti/os"])
AC_SUBST(OS_SEARCH_PATH, $os_search_path)
# --with-extstorage-search-path=...
AC_ARG_WITH([extstorage-search-path],
[AS_HELP_STRING([--with-extstorage-search-path=LIST],
[comma separated list of directories to]
[ search for External Storage Providers]
[ (default is /srv/ganeti/extstorage)]
)],
[es_search_path="$withval"],
[es_search_path="/srv/ganeti/extstorage"])
AC_SUBST(ES_SEARCH_PATH, $es_search_path)
# --with-iallocator-search-path=...
AC_ARG_WITH([iallocator-search-path],
[AS_HELP_STRING([--with-iallocator-search-path=LIST],
[comma separated list of directories to]
[ search for instance allocators (default is $libdir/ganeti/iallocators)]
)],
[iallocator_search_path="$withval"],
[iallocator_search_path="$libdir/$PACKAGE_NAME/iallocators"])
AC_SUBST(IALLOCATOR_SEARCH_PATH, $iallocator_search_path)
# --with-default-vg=...
AC_ARG_WITH([default-vg],
[AS_HELP_STRING([--with-default-vg=VOLUMEGROUP],
[default volume group (default is xenvg)]
)],
[default_vg="$withval"],
[default_vg="xenvg"])
AC_SUBST(DEFAULT_VG, $default_vg)
# --with-default-bridge=...
AC_ARG_WITH([default-bridge],
[AS_HELP_STRING([--with-default-bridge=BRIDGE],
[default bridge (default is xen-br0)]
)],
[default_bridge="$withval"],
[default_bridge="xen-br0"])
AC_SUBST(DEFAULT_BRIDGE, $default_bridge)
# --with-xen-bootloader=...
AC_ARG_WITH([xen-bootloader],
[AS_HELP_STRING([--with-xen-bootloader=PATH],
[bootloader for Xen hypervisor (default is empty)]
)],
[xen_bootloader="$withval"],
[xen_bootloader=])
AC_SUBST(XEN_BOOTLOADER, $xen_bootloader)
# --with-xen-kernel=...
AC_ARG_WITH([xen-kernel],
[AS_HELP_STRING([--with-xen-kernel=PATH],
[DomU kernel image for Xen hypervisor (default is /boot/vmlinuz-3-xenU)]
)],
[xen_kernel="$withval"],
[xen_kernel="/boot/vmlinuz-3-xenU"])
AC_SUBST(XEN_KERNEL, $xen_kernel)
# --with-xen-initrd=...
AC_ARG_WITH([xen-initrd],
[AS_HELP_STRING([--with-xen-initrd=PATH],
[DomU initrd image for Xen hypervisor (default is /boot/initrd-3-xenU)]
)],
[xen_initrd="$withval"],
[xen_initrd="/boot/initrd-3-xenU"])
AC_SUBST(XEN_INITRD, $xen_initrd)
# --with-kvm-kernel=...
AC_ARG_WITH([kvm-kernel],
[AS_HELP_STRING([--with-kvm-kernel=PATH],
[Guest kernel image for KVM hypervisor (default is /boot/vmlinuz-3-kvmU)]
)],
[kvm_kernel="$withval"],
[kvm_kernel="/boot/vmlinuz-3-kvmU"])
AC_SUBST(KVM_KERNEL, $kvm_kernel)
# --with-kvm-path=...
AC_ARG_WITH([kvm-path],
[AS_HELP_STRING([--with-kvm-path=PATH],
[absolute path to the kvm binary]
[ (default is /usr/bin/kvm)]
)],
[kvm_path="$withval"],
[kvm_path="/usr/bin/kvm"])
AC_SUBST(KVM_PATH, $kvm_path)
# --with-lvm-stripecount=...
AC_ARG_WITH([lvm-stripecount],
[AS_HELP_STRING([--with-lvm-stripecount=NUM],
[the default number of stripes to use for LVM volumes]
[ (default is 1)]
)],
[lvm_stripecount="$withval"],
[lvm_stripecount=1])
AC_SUBST(LVM_STRIPECOUNT, $lvm_stripecount)
# --with-ssh-login-user=...
AC_ARG_WITH([ssh-login-user],
[AS_HELP_STRING([--with-ssh-login-user=USERNAME],
[user to use for SSH logins within the cluster (default is root)]
)],
[ssh_login_user="$withval"],
[ssh_login_user=root])
AC_SUBST(SSH_LOGIN_USER, $ssh_login_user)
# --with-ssh-console-user=...
AC_ARG_WITH([ssh-console-user],
[AS_HELP_STRING([--with-ssh-console-user=USERNAME],
[user to use for SSH logins to access instance consoles (default is root)]
)],
[ssh_console_user="$withval"],
[ssh_console_user=root])
AC_SUBST(SSH_CONSOLE_USER, $ssh_console_user)
# --with-default-user=...
AC_ARG_WITH([default-user],
[AS_HELP_STRING([--with-default-user=USERNAME],
[default user for daemons]
[ (default is to run all daemons as root)]
)],
[user_default="$withval"],
[user_default=root])
# --with-default-group=...
AC_ARG_WITH([default-group],
[AS_HELP_STRING([--with-default-group=GROUPNAME],
[default group for daemons]
[ (default is to run all daemons under group root)]
)],
[group_default="$withval"],
[group_default=root])
# --with-user-prefix=...
AC_ARG_WITH([user-prefix],
[AS_HELP_STRING([--with-user-prefix=PREFIX],
[prefix for daemon users]
[ (default is to run all daemons as root; use --with-default-user]
[ to change the default)]
)],
[user_masterd="${withval}masterd";
user_metad="$user_default";
user_rapi="${withval}rapi";
user_confd="${withval}confd";
user_wconfd="${withval}masterd";
user_kvmd="$user_default";
user_luxid="${withval}masterd";
user_noded="$user_default";
user_mond="$user_default"],
[user_masterd="$user_default";
user_metad="$user_default";
user_rapi="$user_default";
user_confd="$user_default";
user_wconfd="$user_default";
user_kvmd="$user_default";
user_luxid="$user_default";
user_noded="$user_default";
user_mond="$user_default"])
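# Example (illustrative): with --with-user-prefix=gnt-, masterd/wconfd/luxid
# run as "gnt-masterd", rapi as "gnt-rapi" and confd as "gnt-confd", while
# noded, mond, kvmd and metad keep the default user.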
AC_SUBST(MASTERD_USER, $user_masterd)
AC_SUBST(METAD_USER, $user_metad)
AC_SUBST(RAPI_USER, $user_rapi)
AC_SUBST(CONFD_USER, $user_confd)
AC_SUBST(WCONFD_USER, $user_wconfd)
AC_SUBST(KVMD_USER, $user_kvmd)
AC_SUBST(LUXID_USER, $user_luxid)
AC_SUBST(NODED_USER, $user_noded)
AC_SUBST(MOND_USER, $user_mond)
# --with-group-prefix=...
AC_ARG_WITH([group-prefix],
[AS_HELP_STRING([--with-group-prefix=PREFIX],
[prefix for daemon POSIX groups]
[ (default is to run all daemons under group root; use]
[ --with-default-group to change the default)]
)],
[group_rapi="${withval}rapi";
group_admin="${withval}admin";
group_confd="${withval}confd";
group_wconfd="${withval}masterd";
group_kvmd="$group_default";
group_luxid="${withval}luxid";
group_masterd="${withval}masterd";
group_metad="$group_default";
group_noded="$group_default";
group_daemons="${withval}daemons";
group_mond="$group_default"],
[group_rapi="$group_default";
group_admin="$group_default";
group_confd="$group_default";
group_wconfd="$group_default";
group_kvmd="$group_default";
group_luxid="$group_default";
group_masterd="$group_default";
group_metad="$group_default";
group_noded="$group_default";
group_daemons="$group_default";
group_mond="$group_default"])
AC_SUBST(RAPI_GROUP, $group_rapi)
AC_SUBST(ADMIN_GROUP, $group_admin)
AC_SUBST(CONFD_GROUP, $group_confd)
AC_SUBST(WCONFD_GROUP, $group_wconfd)
AC_SUBST(KVMD_GROUP, $group_kvmd)
AC_SUBST(LUXID_GROUP, $group_luxid)
AC_SUBST(MASTERD_GROUP, $group_masterd)
AC_SUBST(METAD_GROUP, $group_metad)
AC_SUBST(NODED_GROUP, $group_noded)
AC_SUBST(DAEMONS_GROUP, $group_daemons)
AC_SUBST(MOND_GROUP, $group_mond)
# Print the config to the user
AC_MSG_NOTICE([Running ganeti-masterd as $user_masterd:$group_masterd])
AC_MSG_NOTICE([Running ganeti-metad as $user_metad:$group_metad])
AC_MSG_NOTICE([Running ganeti-rapi as $user_rapi:$group_rapi])
AC_MSG_NOTICE([Running ganeti-confd as $user_confd:$group_confd])
AC_MSG_NOTICE([Running ganeti-wconfd as $user_wconfd:$group_wconfd])
AC_MSG_NOTICE([Running ganeti-luxid as $user_luxid:$group_luxid])
AC_MSG_NOTICE([Group for daemons is $group_daemons])
AC_MSG_NOTICE([Group for clients is $group_admin])
# --enable-drbd-barriers
AC_ARG_ENABLE([drbd-barriers],
[AS_HELP_STRING([--enable-drbd-barriers],
m4_normalize([enable the DRBD barriers functionality by
default (>= 8.0.12) (default: enabled)]))],
[[if test "$enableval" != no; then
DRBD_BARRIERS=n
DRBD_NO_META_FLUSH=False
else
DRBD_BARRIERS=bf
DRBD_NO_META_FLUSH=True
fi
]],
[DRBD_BARRIERS=n
DRBD_NO_META_FLUSH=False
])
AC_SUBST(DRBD_BARRIERS, $DRBD_BARRIERS)
AC_SUBST(DRBD_NO_META_FLUSH, $DRBD_NO_META_FLUSH)
# --enable-syslog[=no/yes/only]
AC_ARG_ENABLE([syslog],
[AS_HELP_STRING([--enable-syslog],
[enable use of syslog (default: disabled), one of no/yes/only])],
[[case "$enableval" in
no)
SYSLOG=no
;;
yes)
SYSLOG=yes
;;
only)
SYSLOG=only
;;
*)
SYSLOG=
;;
esac
]],
[SYSLOG=no])
if test -z "$SYSLOG"
then
AC_MSG_ERROR([invalid value for syslog, choose one of no/yes/only])
fi
AC_SUBST(SYSLOG_USAGE, $SYSLOG)
# --enable-restricted-commands[=no/yes]
AC_ARG_ENABLE([restricted-commands],
[AS_HELP_STRING([--enable-restricted-commands],
m4_normalize([enable restricted commands in the node daemon
(default: disabled)]))],
[[if test "$enableval" = no; then
enable_restricted_commands=False
else
enable_restricted_commands=True
fi
]],
[enable_restricted_commands=False])
AC_SUBST(ENABLE_RESTRICTED_COMMANDS, $enable_restricted_commands)
# --with-disk-separator=...
AC_ARG_WITH([disk-separator],
[AS_HELP_STRING([--with-disk-separator=STRING],
[Disk index separator, useful if the default of ':' is handled]
[ specially by the hypervisor]
)],
[disk_separator="$withval"],
[disk_separator=":"])
AC_SUBST(DISK_SEPARATOR, $disk_separator)
# Check common programs
AC_PROG_INSTALL
AC_PROG_LN_S
# check if ln is the GNU version of ln (and hence supports -T)
if ln --version 2> /dev/null | head -1 | grep -q GNU
then
AC_SUBST(HAS_GNU_LN, True)
else
AC_SUBST(HAS_GNU_LN, False)
fi
# Check for the ip command
AC_ARG_VAR(IP_PATH, [ip path])
AC_PATH_PROG(IP_PATH, [ip], [])
if test -z "$IP_PATH"
then
AC_MSG_ERROR([ip command not found])
fi
# Check for pandoc
AC_ARG_VAR(PANDOC, [pandoc path])
AC_PATH_PROG(PANDOC, [pandoc], [])
if test -z "$PANDOC"
then
AC_MSG_WARN([pandoc not found, man pages rebuild will not be possible])
fi
# Check for python-sphinx
AC_ARG_VAR(SPHINX, [sphinx-build path])
AC_PATH_PROG(SPHINX, [sphinx-build], [])
if test -z "$SPHINX"
then
AC_MSG_WARN(m4_normalize([sphinx-build not found, documentation rebuild will
not be possible]))
else
# Sphinx exits with code 1 when it prints its usage
sphinxver=`{ $SPHINX --version 2>&1 || :; } | head -n 3`
if ! echo "$sphinxver" | grep -q -w -e '^Sphinx' -e '^Usage:'; then
AC_MSG_ERROR([Unable to determine Sphinx version])
# Note: Character classes ([...]) need to be double quoted due to autoconf
# using m4
elif ! echo "$sphinxver" | grep -q -E \
'^Sphinx([[[:space:]]]+|\(sphinx-build[[1-9]]?\)|v)*[[1-9]]\>'; then
AC_MSG_ERROR([Sphinx 1.0 or higher is required])
fi
fi
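# Illustrative version strings (not taken from any particular build): the
# pattern above accepts "Sphinx (sphinx-build) 1.2.3", "Sphinx v1.1.3" and
# "Sphinx 1.0.8", while a hypothetical "Sphinx 0.6.7" would trigger the
# version error.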
AM_CONDITIONAL([HAS_SPHINX], [test -n "$SPHINX"])
AM_CONDITIONAL([HAS_SPHINX_PRE13],
[test -n "$SPHINX" && echo "$sphinxver" | grep -q -E \
'^Sphinx([[[:space:]]]+|\(sphinx-build[[1-9]]?\)|v)*[[1-9]]\.[[0-2]]\.'])
AC_ARG_ENABLE([manpages-in-doc],
[AS_HELP_STRING([--enable-manpages-in-doc],
m4_normalize([include man pages in HTML documentation
(requires sphinx; default disabled)]))],
[case "$enableval" in
yes) manpages_in_doc=yes ;;
no) manpages_in_doc= ;;
*)
AC_MSG_ERROR([Bad value $enableval for --enable-manpages-in-doc])
;;
esac
],
[manpages_in_doc=])
AM_CONDITIONAL([MANPAGES_IN_DOC], [test -n "$manpages_in_doc"])
AC_SUBST(MANPAGES_IN_DOC, $manpages_in_doc)
if test -z "$SPHINX" -a -n "$manpages_in_doc"; then
AC_MSG_ERROR([Including man pages in HTML documentation requires sphinx])
fi
# Check for graphviz (dot)
AC_ARG_VAR(DOT, [dot path])
AC_PATH_PROG(DOT, [dot], [])
if test -z "$DOT"
then
AC_MSG_WARN(m4_normalize([dot (from the graphviz suite) not found,
documentation rebuild not possible]))
fi
# Check for pylint
AC_ARG_VAR(PYLINT, [pylint path])
AC_PATH_PROG(PYLINT, [pylint], [])
if test -z "$PYLINT"
then
AC_MSG_WARN([pylint not found, checking code will not be possible])
fi
# Check for pep8
AC_ARG_VAR(PEP8, [pep8 path])
AC_PATH_PROG(PEP8, [pep8], [])
if test -z "$PEP8"
then
AC_MSG_WARN([pep8 not found, checking code will not be complete])
fi
AM_CONDITIONAL([HAS_PEP8], [test -n "$PEP8"])
# Check for python-coverage
AC_ARG_VAR(PYCOVERAGE, [python-coverage path])
AC_PATH_PROGS(PYCOVERAGE, [python-coverage coverage], [])
if test -z "$PYCOVERAGE"
then
AC_MSG_WARN(m4_normalize([python-coverage or coverage not found, evaluating
Python test coverage will not be possible]))
fi
# Check for socat
AC_ARG_VAR(SOCAT, [socat path])
AC_PATH_PROG(SOCAT, [socat], [])
if test -z "$SOCAT"
then
AC_MSG_ERROR([socat not found])
fi
# Check for qemu-img
AC_ARG_VAR(QEMUIMG_PATH, [qemu-img path])
AC_PATH_PROG(QEMUIMG_PATH, [qemu-img], [])
if test -z "$QEMUIMG_PATH"
then
AC_MSG_WARN([qemu-img not found, using ovfconverter will not be possible])
fi
ENABLE_MOND=
AC_ARG_ENABLE([monitoring],
[AS_HELP_STRING([--enable-monitoring],
[enable the ganeti monitoring daemon (default: check)])],
[],
[enable_monitoring=check])
# --enable-metadata
ENABLE_METADATA=
AC_ARG_ENABLE([metadata],
[AS_HELP_STRING([--enable-metadata],
[enable the ganeti metadata daemon (default: check)])],
[],
[enable_metadata=check])
# Check for ghc
AC_ARG_VAR(GHC, [ghc path])
AC_PATH_PROG(GHC, [ghc], [])
if test -z "$GHC"; then
AC_MSG_FAILURE([ghc not found, compilation will not be possible])
fi
# Note: Character classes ([...]) need to be double quoted due to autoconf
# using m4
AM_CONDITIONAL([GHC_LE_76], [$GHC --numeric-version | grep -q '^7\.[[0-6]]\.'])
AC_MSG_CHECKING([for extra GHC flags])
GHC_BYVERSION_FLAGS=
# check for GHC-supported flags that vary across versions
for flag in -fwarn-incomplete-uni-patterns; do
if $GHC -e '0' $flag >/dev/null 2>/dev/null; then
GHC_BYVERSION_FLAGS="$GHC_BYVERSION_FLAGS $flag"
fi
done
AC_MSG_RESULT($GHC_BYVERSION_FLAGS)
AC_SUBST(GHC_BYVERSION_FLAGS)
# Check for ghc-pkg
AC_ARG_VAR(GHC_PKG, [ghc-pkg path])
AC_PATH_PROG(GHC_PKG, [ghc-pkg], [])
if test -z "$GHC_PKG"; then
AC_MSG_FAILURE([ghc-pkg not found, compilation will not be possible])
fi
# Check for cabal
AC_ARG_VAR(CABAL, [cabal path])
AC_PATH_PROG(CABAL, [cabal], [])
if test -z "$CABAL"; then
AC_MSG_FAILURE([cabal not found, compilation will not be possible])
fi
# check for standard modules
AC_GHC_PKG_REQUIRE(Cabal)
AC_GHC_PKG_REQUIRE(curl)
AC_GHC_PKG_REQUIRE(json)
AC_GHC_PKG_REQUIRE(network)
AC_GHC_PKG_REQUIRE(mtl)
AC_GHC_PKG_REQUIRE(bytestring)
AC_GHC_PKG_REQUIRE(base64-bytestring-1.*, t)
AC_GHC_PKG_REQUIRE(utf8-string)
AC_GHC_PKG_REQUIRE(zlib)
AC_GHC_PKG_REQUIRE(hslogger)
AC_GHC_PKG_REQUIRE(process)
AC_GHC_PKG_REQUIRE(attoparsec)
AC_GHC_PKG_REQUIRE(vector)
AC_GHC_PKG_REQUIRE(text)
AC_GHC_PKG_REQUIRE(hinotify)
AC_GHC_PKG_REQUIRE(Crypto)
AC_GHC_PKG_REQUIRE(lifted-base)
AC_GHC_PKG_REQUIRE(lens)
AC_GHC_PKG_REQUIRE(regex-pcre)
# extra modules for monitoring daemon functionality; also needed for tests
MONITORING_PKG=
AC_GHC_PKG_CHECK([snap-server], [],
[NS_NODEV=1; MONITORING_PKG="$MONITORING_PKG snap-server"])
AC_GHC_PKG_CHECK([PSQueue], [],
[NS_NODEV=1; MONITORING_PKG="$MONITORING_PKG PSQueue"])
has_monitoring=False
if test "$enable_monitoring" != no; then
MONITORING_DEP=
has_monitoring_pkg=False
if test -z "$MONITORING_PKG"; then
has_monitoring_pkg=True
elif test "$enable_monitoring" = check; then
AC_MSG_WARN(m4_normalize([The required extra libraries for the monitoring
daemon were not found ($MONITORING_PKG),
monitoring disabled]))
else
AC_MSG_FAILURE(m4_normalize([The monitoring functionality was requested, but
required libraries were not found:
$MONITORING_PKG]))
fi
has_monitoring_dep=False
if test -z "$MONITORING_DEP"; then
has_monitoring_dep=True
elif test "$enable_monitoring" = check; then
AC_MSG_WARN(m4_normalize([The optional Ganeti components required for the
monitoring agent were not enabled
($MONITORING_DEP), monitoring disabled]))
else
AC_MSG_FAILURE(m4_normalize([The monitoring functionality was requested, but
required optional Ganeti components were not
found: $MONITORING_DEP]))
fi
fi
if test "$has_monitoring_pkg" = True -a "$has_monitoring_dep" = True; then
has_monitoring=True
AC_MSG_NOTICE([Enabling the monitoring agent usage])
fi
AC_SUBST(ENABLE_MOND, $has_monitoring)
AM_CONDITIONAL([ENABLE_MOND], [test "$has_monitoring" = True])
# extra modules for metad functionality; also needed for tests
METAD_PKG=
AC_GHC_PKG_CHECK([snap-server], [],
[NS_NODEV=1; METAD_PKG="$METAD_PKG snap-server"])
has_metad=False
if test "$enable_metadata" != no; then
if test -z "$METAD_PKG"; then
has_metad=True
elif test "$enable_metadata" = check; then
AC_MSG_WARN(m4_normalize([The required extra libraries for metad were
not found ($METAD_PKG), metad disabled]))
else
AC_MSG_FAILURE(m4_normalize([The metadata functionality was requested, but
required libraries were not found:
$METAD_PKG]))
fi
fi
if test "$has_metad" = True; then
AC_MSG_NOTICE([Enabling metadata usage])
fi
AC_SUBST(ENABLE_METADATA, $has_metad)
AM_CONDITIONAL([ENABLE_METADATA], [test x$has_metad = xTrue])
# development modules
AC_GHC_PKG_CHECK([QuickCheck-2.*], [], [HS_NODEV=1], t)
AC_GHC_PKG_CHECK([test-framework-0.6*], [], [
AC_GHC_PKG_CHECK([test-framework-0.7*], [], [
AC_GHC_PKG_CHECK([test-framework-0.8*], [], [HS_NODEV=1], t)
], t)
], t)
AC_GHC_PKG_CHECK([test-framework-hunit], [], [HS_NODEV=1])
AC_GHC_PKG_CHECK([test-framework-quickcheck2], [], [HS_NODEV=1])
AC_GHC_PKG_CHECK([temporary], [], [HS_NODEV=1])
if test -n "$HS_NODEV"; then
AC_MSG_WARN(m4_normalize([Required development modules were not found,
you won't be able to run Haskell unittests]))
else
AC_MSG_NOTICE([Haskell development modules found, unittests enabled])
fi
AC_SUBST(HS_NODEV)
AM_CONDITIONAL([HS_UNIT], [test -n $HS_NODEV])
# Check for HsColour
HS_APIDOC=no
AC_ARG_VAR(HSCOLOUR, [HsColour path])
AC_PATH_PROG(HSCOLOUR, [HsColour], [])
if test -z "$HSCOLOUR"; then
AC_MSG_WARN(m4_normalize([HsColour not found, htools API documentation will
not be generated]))
fi
# Check for haddock
AC_ARG_VAR(HADDOCK, [haddock path])
AC_PATH_PROG(HADDOCK, [haddock], [])
if test -z "$HADDOCK"; then
AC_MSG_WARN(m4_normalize([haddock not found, htools API documentation will
not be generated]))
fi
if test -n "$HADDOCK" && test -n "$HSCOLOUR"; then
HS_APIDOC=yes
fi
AC_SUBST(HS_APIDOC)
# Check for hlint
AC_ARG_VAR(HLINT, [hlint path])
AC_PATH_PROG(HLINT, [hlint], [])
if test -z "$HLINT"; then
AC_MSG_WARN([hlint not found, checking code will not be possible])
fi
AM_CONDITIONAL([WANT_HSTESTS], [test "x$HS_NODEV" = x])
AM_CONDITIONAL([WANT_HSAPIDOC], [test "$HS_APIDOC" = yes])
AM_CONDITIONAL([HAS_HLINT], [test "$HLINT"])
# Check for fakeroot
AC_ARG_VAR(FAKEROOT_PATH, [fakeroot path])
AC_PATH_PROG(FAKEROOT_PATH, [fakeroot], [])
if test -z "$FAKEROOT_PATH"; then
AC_MSG_WARN(m4_normalize([fakeroot not found, tests that must run as root
will not be executed]))
fi
AM_CONDITIONAL([HAS_FAKEROOT], [test "x$FAKEROOT_PATH" != x])
SOCAT_USE_ESCAPE=
AC_ARG_ENABLE([socat-escape],
[AS_HELP_STRING([--enable-socat-escape],
[use escape functionality available in socat >= 1.7 (default: detect
automatically)])],
[[if test "$enableval" = yes; then
SOCAT_USE_ESCAPE=True
else
SOCAT_USE_ESCAPE=False
fi
]])
if test -z "$SOCAT_USE_ESCAPE"
then
if $SOCAT -hh | grep -w -q escape; then
SOCAT_USE_ESCAPE=True
else
SOCAT_USE_ESCAPE=False
fi
fi
AC_SUBST(SOCAT_USE_ESCAPE)
SOCAT_USE_COMPRESS=
AC_ARG_ENABLE([socat-compress],
[AS_HELP_STRING([--enable-socat-compress],
[use OpenSSL compression option available in patched socat builds
(see INSTALL for details; default: detect automatically)])],
[[if test "$enableval" = yes; then
SOCAT_USE_COMPRESS=True
else
SOCAT_USE_COMPRESS=False
fi
]])
if test -z "$SOCAT_USE_COMPRESS"
then
if $SOCAT -hhh | grep -w -q openssl-compress; then
SOCAT_USE_COMPRESS=True
else
SOCAT_USE_COMPRESS=False
fi
fi
AC_SUBST(SOCAT_USE_COMPRESS)
if man --help | grep -q -e --warnings
then
MAN_HAS_WARNINGS=1
else
MAN_HAS_WARNINGS=
AC_MSG_WARN(m4_normalize([man does not support --warnings, man page checks
will not be possible]))
fi
AC_SUBST(MAN_HAS_WARNINGS)
# Check for Python
# We need a Python-2 interpreter, version at least 2.6. As AM_PATH_PYTHON
# only supports a "minimal version" constraint, we check <3.0 afterwards.
# We tune _AM_PYTHON_INTERPRETER_LIST to first check interpreters that are
# likely interpreters for Python 2.
m4_define_default([_AM_PYTHON_INTERPRETER_LIST],
[python2 python2.7 python2.6 python])
AM_PATH_PYTHON(2.6)
if $PYTHON -c "import sys
if (sys.hexversion >> 24) < 3:
sys.exit(1)
else:
sys.exit(0)
"; then
AC_MSG_FAILURE([Can only work with an interpreter for Python 2])
fi
AC_PYTHON_MODULE(OpenSSL, t)
AC_PYTHON_MODULE(simplejson, t)
AC_PYTHON_MODULE(pyparsing, t)
AC_PYTHON_MODULE(pyinotify, t)
AC_PYTHON_MODULE(pycurl, t)
AC_PYTHON_MODULE(bitarray, t)
AC_PYTHON_MODULE(ipaddr, t)
AC_PYTHON_MODULE(mock)
AC_PYTHON_MODULE(psutil)
AC_PYTHON_MODULE(paramiko)
# Development-only Python modules
PY_NODEV=
AC_PYTHON_MODULE(yaml)
if test "$HAVE_PYMOD_YAML" = "no"; then
PY_NODEV="$PY_NODEV yaml"
fi
if test -n "$PY_NODEV"; then
AC_MSG_WARN(m4_normalize([Required development modules ($PY_NODEV) were not
found, you won't be able to run Python unittests]))
else
AC_MSG_NOTICE([Python development modules found, unittests enabled])
fi
AC_SUBST(PY_NODEV)
AM_CONDITIONAL([PY_UNIT], [test -n $PY_NODEV])
include_makefile_ghc='
ifneq ($(MAKECMDGOALS),ganeti)
ifneq ($(MAKECMDGOALS),clean)
ifneq ($(MAKECMDGOALS),distclean)
include Makefile.ghc
endif
endif
endif
'
AC_SUBST([include_makefile_ghc])
AM_SUBST_NOTMAKE([include_makefile_ghc])
AC_CONFIG_FILES([ Makefile ])
AC_OUTPUT
ganeti-2.15.2/daemons/ 0000755 0000000 0000000 00000000000 12634264163 0014503 5 ustar 00root root 0000000 0000000 ganeti-2.15.2/daemons/daemon-util.in 0000644 0000000 0000000 00000022647 12634264163 0017264 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# Copyright (C) 2009, 2011, 2012 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
set -e
@SHELL_ENV_INIT@
readonly defaults_file="$SYSCONFDIR/default/ganeti"
# This is a list of all daemons and the order in which they're started. The
# order is important as there are dependencies between them. On shutdown,
# they're stopped in reverse order.
DAEMONS=(
ganeti-noded
ganeti-confd
ganeti-wconfd
ganeti-rapi
ganeti-luxid
ganeti-kvmd
)
# This is the list of daemons that are loaded on demand; they should only be
# stopped, not started.
ON_DEMAND_DAEMONS=(
ganeti-metad
)
_mond_enabled() {
[[ "@CUSTOM_ENABLE_MOND@" == True ]]
}
if _mond_enabled; then
DAEMONS+=( ganeti-mond )
fi
# The full list of all daemons we know about
ALL_DAEMONS=( ${DAEMONS[@]} ${ON_DEMAND_DAEMONS[@]} )
NODED_ARGS=
CONFD_ARGS=
WCONFD_ARGS=
LUXID_ARGS=
RAPI_ARGS=
MOND_ARGS=
# Read defaults file if it exists
if [[ -s $defaults_file ]]; then
. $defaults_file
fi
# Meant to facilitate use of utilities in /etc/rc.d/init.d/functions in case
# start-stop-daemon is not available.
_ignore_error() {
eval "$@" || :
}
_daemon_pidfile() {
echo "$RUN_DIR/$1.pid"
}
_daemon_executable() {
echo "@PREFIX@/sbin/$1"
}
_daemon_usergroup() {
case "$1" in
confd)
echo "@GNTCONFDUSER@:@GNTCONFDGROUP@"
;;
wconfd)
echo "@GNTWCONFDUSER@:@GNTWCONFDGROUP@"
;;
luxid)
echo "@GNTLUXIDUSER@:@GNTLUXIDGROUP@"
;;
rapi)
echo "@GNTRAPIUSER@:@GNTRAPIGROUP@"
;;
noded)
echo "@GNTNODEDUSER@:@GNTNODEDGROUP@"
;;
mond)
echo "@GNTMONDUSER@:@GNTMONDGROUP@"
;;
*)
echo "root:@GNTDAEMONSGROUP@"
;;
esac
}
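# Illustrative only -- the actual names are substituted at build time:
# configuring with --with-user-prefix=gnt- and --with-group-prefix=gnt- would
# make "_daemon_usergroup confd" print "gnt-confd:gnt-confd", while any
# daemon without a dedicated entry falls back to root plus the common
# daemons group.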
# Checks whether the local machine is part of a cluster
check_config() {
local server_pem=$DATA_DIR/server.pem
local fname
for fname in $server_pem; do
if [[ ! -f $fname ]]; then
echo "Missing configuration file $fname" >&2
return 1
fi
done
return 0
}
# Checks the exit code of a daemon
check_exitcode() {
if [[ "$#" -lt 1 ]]; then
echo 'Missing exit code.' >&2
return 1
fi
local rc="$1"; shift
case "$rc" in
0) ;;
11)
echo "not master"
;;
*)
echo "exit code $rc"
return 1
;;
esac
return 0
}
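# Example (illustrative): "check_exitcode 11" prints "not master" yet returns
# 0, so callers such as start_all tolerate daemons that only run on the
# master node; any other non-zero code is reported as an error.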
# Checks if we should use systemctl to start/stop daemons
use_systemctl() {
# Is systemd running as PID 1?
[ -d /run/systemd/system ] || return 1
type -p systemctl >/dev/null || return 1
# Does systemd know about Ganeti at all?
loadstate="$(systemctl show -pLoadState ganeti.target)"
if [ "$loadstate" = "LoadState=loaded" ]; then
return 0
fi
return 1
}
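# Sketch of the expected state (assuming the Ganeti systemd units are
# installed): "systemctl show -pLoadState ganeti.target" prints
# "LoadState=loaded", which is exactly what the check above looks for.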
# Prints path to PID file for a daemon.
daemon_pidfile() {
if [[ "$#" -lt 1 ]]; then
echo 'Missing daemon name.' >&2
return 1
fi
local name="$1"; shift
_daemon_pidfile $name
}
# Prints path to daemon executable.
daemon_executable() {
if [[ "$#" -lt 1 ]]; then
echo 'Missing daemon name.' >&2
return 1
fi
local name="$1"; shift
_daemon_executable $name
}
# Prints a list of all daemons in the order in which they should be started
list_start_daemons() {
local name
for name in "${DAEMONS[@]}"; do
echo "$name"
done
}
# Prints a list of all daemons in the order in which they should be stopped
list_stop_daemons() {
for name in "${ALL_DAEMONS[@]}"; do
echo "$name"
done | tac
}
# Checks whether a daemon name is known
is_daemon_name() {
if [[ "$#" -lt 1 ]]; then
echo 'Missing daemon name.' >&2
return 1
fi
local name="$1"; shift
for i in "${ALL_DAEMONS[@]}"; do
if [[ "$i" == "$name" ]]; then
return 0
fi
done
echo "Unknown daemon name '$name'" >&2
return 1
}
# Checks whether daemon is running
check() {
if [[ "$#" -lt 1 ]]; then
echo 'Missing daemon name.' >&2
return 1
fi
local name="$1"; shift
local pidfile=$(_daemon_pidfile $name)
local daemonexec=$(_daemon_executable $name)
if use_systemctl; then
activestate="$(systemctl show -pActiveState "${name}.service")"
if [ "$activestate" = "ActiveState=active" ]; then
return 0
else
return 1
fi
elif type -p start-stop-daemon >/dev/null; then
start-stop-daemon --stop --signal 0 --quiet \
--pidfile $pidfile
else
_ignore_error status \
-p $pidfile \
$daemonexec
fi
}
# Starts a daemon
start() {
if [[ "$#" -lt 1 ]]; then
echo 'Missing daemon name.' >&2
return 1
fi
local name="$1"; shift
# Convert daemon name to uppercase after removing "ganeti-" prefix
local plain_name=${name#ganeti-}
local ucname=$(tr a-z A-Z <<<$plain_name)
local pidfile=$(_daemon_pidfile $name)
local usergroup=$(_daemon_usergroup $plain_name)
local daemonexec=$(_daemon_executable $name)
if use_systemctl; then
systemctl start "${name}.service"
return $?
fi
# Read $<daemon>_ARGS and $EXTRA_<daemon>_ARGS
eval local args="\"\$${ucname}_ARGS \$EXTRA_${ucname}_ARGS\""
@PKGLIBDIR@/ensure-dirs
if type -p start-stop-daemon >/dev/null; then
start-stop-daemon --start --quiet --oknodo \
--pidfile $pidfile \
--startas $daemonexec \
--chuid $usergroup \
-- $args "$@"
else
# TODO: Find a way to start daemon with a group, until then the group must
# be removed
_ignore_error daemon \
--pidfile $pidfile \
--user ${usergroup%:*} \
$daemonexec $args "$@"
fi
}
# Stops a daemon
stop() {
if [[ "$#" -lt 1 ]]; then
echo 'Missing daemon name.' >&2
return 1
fi
local name="$1"; shift
local pidfile=$(_daemon_pidfile $name)
if use_systemctl; then
systemctl stop "${name}.service"
elif type -p start-stop-daemon >/dev/null; then
start-stop-daemon --stop --quiet --oknodo --retry 30 \
--pidfile $pidfile
else
_ignore_error killproc -p $pidfile $name
fi
}
# Starts a daemon if it's not yet running
check_and_start() {
local name="$1"
if ! check $name; then
if use_systemctl; then
echo "${name} supervised by systemd but not running, will not restart."
return 1
fi
start $name
fi
}
# Starts the master role
start_master() {
if use_systemctl; then
systemctl start ganeti-master.target
else
start ganeti-wconfd
start ganeti-rapi
start ganeti-luxid
fi
}
# Stops the master role
stop_master() {
if use_systemctl; then
systemctl stop ganeti-master.target
else
stop ganeti-luxid
stop ganeti-rapi
stop ganeti-wconfd
fi
}
# Start all daemons
start_all() {
use_systemctl && systemctl start ganeti.target
# Fall through so that we detect any errors.
for i in $(list_start_daemons); do
local rc=0
# Try to start daemon
start $i || rc=$?
if ! errmsg=$(check_exitcode $rc); then
echo "$errmsg" >&2
return 1
fi
done
return 0
}
# Stop all daemons
stop_all() {
if use_systemctl; then
systemctl stop ganeti.target
else
for i in $(list_stop_daemons); do
stop $i
done
fi
}
# SIGHUP a process to force re-opening its logfiles
rotate_logs() {
if [[ "$#" -lt 1 ]]; then
echo 'Missing daemon name.' >&2
return 1
fi
local name="$1"; shift
local pidfile=$(_daemon_pidfile $name)
local daemonexec=$(_daemon_executable $name)
if type -p start-stop-daemon >/dev/null; then
start-stop-daemon --stop --signal HUP --quiet \
--oknodo --pidfile $pidfile
else
_ignore_error killproc \
-p $pidfile \
$daemonexec -HUP
fi
}
# SIGHUP all processes
rotate_all_logs() {
for i in $(list_stop_daemons); do
rotate_logs $i
done
}
# Reloads the SSH keys
reload_ssh_keys() {
@RPL_SSH_INITD_SCRIPT@ restart
}
# Read @SYSCONFDIR@/rc.d/init.d/functions if start-stop-daemon not available
if ! type -p start-stop-daemon >/dev/null && \
[[ -f @SYSCONFDIR@/rc.d/init.d/functions ]]; then
_ignore_error . @SYSCONFDIR@/rc.d/init.d/functions
fi
if [[ "$#" -lt 1 ]]; then
echo "Usage: $0 " >&2
exit 1
fi
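# Illustrative invocations; dashes are mapped to underscores below and the
# result is dispatched to the shell function of the same name:
#   daemon-util start-all
#   daemon-util check ganeti-noded
#   daemon-util rotate-all-logs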
orig_action=$1; shift
if [[ "$orig_action" == *_* ]]; then
echo "Command must not contain underscores" >&2
exit 1
fi
# Replace all dashes (-) with underlines (_)
action=${orig_action//-/_}
# Is it a known function?
if ! declare -F "$action" >/dev/null 2>&1; then
echo "Unknown command: $orig_action" >&2
exit 1
fi
# Call handler function
$action "$@"
ganeti-2.15.2/daemons/ganeti-cleaner.in 0000644 0000000 0000000 00000007076 12634264163 0017723 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# Copyright (C) 2009, 2010, 2011, 2012 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
set -e -u
@SHELL_ENV_INIT@
# Overridden by unittest
: ${CHECK_CERT_EXPIRED:=$PKGLIBDIR/check-cert-expired}
usage() {
echo "Usage: $0 node|master" 2>&1
exit $1
}
if [[ "$#" -ne 1 ]]; then
usage 1
fi
case "$1" in
node)
readonly CLEANER_LOG_DIR=$LOG_DIR/cleaner
;;
master)
readonly CLEANER_LOG_DIR=$LOG_DIR/master-cleaner
;;
--help-completion)
echo "choices=node,master 1 1"
exit 0
;;
--help)
usage 0
;;
*)
usage 1
;;
esac
readonly CRYPTO_DIR=$RUN_DIR/crypto
readonly QUEUE_ARCHIVE_DIR=$DATA_DIR/queue/archive
in_cluster() {
[[ -e $DATA_DIR/ssconf_master_node ]]
}
cleanup_node() {
# Return if directory for crypto keys doesn't exist
[[ -d $CRYPTO_DIR ]] || return 0
find $CRYPTO_DIR -mindepth 1 -maxdepth 1 -type d | \
while read dir; do
if $CHECK_CERT_EXPIRED $dir/cert; then
rm -vf $dir/{cert,key}
rmdir -v --ignore-fail-on-non-empty $dir
fi
done
}
cleanup_watcher() {
# Return if machine is not part of a cluster
in_cluster || return 0
# Remove old watcher files
find $DATA_DIR -maxdepth 1 -type f -mtime +$REMOVE_AFTER \
\( -name 'watcher.*-*-*-*.data' -or \
-name 'watcher.*-*-*-*.instance-status' \) -print0 | \
xargs -r0 rm -vf
}
cleanup_master() {
# Return if machine is not part of a cluster
in_cluster || return 0
# Return if queue archive directory doesn't exist
[[ -d $QUEUE_ARCHIVE_DIR ]] || return 0
# Remove old jobs
find $QUEUE_ARCHIVE_DIR -mindepth 2 -type f -mtime +$REMOVE_AFTER -print0 | \
xargs -r0 rm -vf
}
# Define how many days archived jobs should be left alone
REMOVE_AFTER=21
# Define how many log files to keep around (usually one per day)
KEEP_LOGS=50
# Log file for this run
LOG_FILE=$CLEANER_LOG_DIR/cleaner-$(date +'%Y-%m-%dT%H_%M').$$.log
# Create log directory
mkdir -p $CLEANER_LOG_DIR
# Redirect all output to log file
exec >>$LOG_FILE 2>&1
echo "Cleaner started at $(date)"
# Switch to a working directory accessible to the cleaner
cd $CLEANER_LOG_DIR
# Remove old cleaner log files
find $CLEANER_LOG_DIR -maxdepth 1 -type f | sort | head -n -$KEEP_LOGS | \
xargs -r rm -vf
case "$1" in
node)
cleanup_node
cleanup_watcher
;;
master)
cleanup_master
;;
esac
exit 0
ganeti-2.15.2/daemons/import-export 0000755 0000000 0000000 00000053606 12634264163 0017274 0 ustar 00root root 0000000 0000000 #!/usr/bin/python
#
# Copyright (C) 2010 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""Import/export daemon.
"""
# pylint: disable=C0103
# C0103: Invalid name import-export
import errno
import logging
import optparse
import os
import select
import signal
import subprocess
import sys
import time
import math
from ganeti import constants
from ganeti import cli
from ganeti import utils
from ganeti import errors
from ganeti import serializer
from ganeti import objects
from ganeti import impexpd
from ganeti import netutils
#: How many lines to keep in the status file
MAX_RECENT_OUTPUT_LINES = 20
#: Don't update status file more than once every 5 seconds (unless forced)
MIN_UPDATE_INTERVAL = 5.0
#: How long to wait for a connection to be established
DEFAULT_CONNECT_TIMEOUT = 60
#: Get dd(1) statistics every few seconds
DD_STATISTICS_INTERVAL = 5.0
#: Seconds for throughput calculation
DD_THROUGHPUT_INTERVAL = 60.0
#: Number of samples for throughput calculation
DD_THROUGHPUT_SAMPLES = int(math.ceil(float(DD_THROUGHPUT_INTERVAL) /
DD_STATISTICS_INTERVAL))
# Global variable for options
options = None
def SetupLogging():
"""Configures the logging module.
"""
formatter = logging.Formatter("%(asctime)s: %(message)s")
stderr_handler = logging.StreamHandler()
stderr_handler.setFormatter(formatter)
stderr_handler.setLevel(logging.NOTSET)
root_logger = logging.getLogger("")
root_logger.addHandler(stderr_handler)
if options.debug:
root_logger.setLevel(logging.NOTSET)
elif options.verbose:
root_logger.setLevel(logging.INFO)
else:
root_logger.setLevel(logging.ERROR)
# Create special logger for child process output
child_logger = logging.Logger("child output")
child_logger.addHandler(stderr_handler)
child_logger.setLevel(logging.NOTSET)
return child_logger
class StatusFile(object):
"""Status file manager.
"""
def __init__(self, path):
"""Initializes class.
"""
self._path = path
self._data = objects.ImportExportStatus(ctime=time.time(),
mtime=None,
recent_output=[])
def AddRecentOutput(self, line):
"""Adds a new line of recent output.
"""
self._data.recent_output.append(line)
# Remove old lines
del self._data.recent_output[:-MAX_RECENT_OUTPUT_LINES]
def SetListenPort(self, port):
"""Sets the port the daemon is listening on.
@type port: int
@param port: TCP/UDP port
"""
assert isinstance(port, (int, long)) and 0 < port < (2 ** 16)
self._data.listen_port = port
def GetListenPort(self):
"""Returns the port the daemon is listening on.
"""
return self._data.listen_port
def SetConnected(self):
"""Sets the connected flag.
"""
self._data.connected = True
def GetConnected(self):
"""Determines whether the daemon is connected.
"""
return self._data.connected
def SetProgress(self, mbytes, throughput, percent, eta):
"""Sets how much data has been transferred so far.
@type mbytes: number
@param mbytes: Transferred amount of data in MiB.
@type throughput: float
@param throughput: MiB/second
@type percent: number
@param percent: Percent processed
@type eta: number
@param eta: Expected number of seconds until done
"""
self._data.progress_mbytes = mbytes
self._data.progress_throughput = throughput
self._data.progress_percent = percent
self._data.progress_eta = eta
def SetExitStatus(self, exit_status, error_message):
"""Sets the exit status and an error message.
"""
# Require error message when status isn't 0
assert exit_status == 0 or error_message
self._data.exit_status = exit_status
self._data.error_message = error_message
def ExitStatusIsSuccess(self):
"""Returns whether the exit status means "success".
"""
return not bool(self._data.error_message)
def Update(self, force):
"""Updates the status file.
@type force: bool
@param force: Write status file in any case, not only when minimum interval
is expired
"""
if not (force or
self._data.mtime is None or
time.time() > (self._data.mtime + MIN_UPDATE_INTERVAL)):
return
logging.debug("Updating status file %s", self._path)
self._data.mtime = time.time()
utils.WriteFile(self._path,
data=serializer.DumpJson(self._data.ToDict()),
mode=0400)
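# Illustrative lifecycle (hypothetical path): a StatusFile("/run/ie-status")
# collects data via AddRecentOutput()/SetProgress() and is only written to
# disk by Update(), at most once per MIN_UPDATE_INTERVAL seconds unless the
# update is forced.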
def ProcessChildIO(child, socat_stderr_read_fd, dd_stderr_read_fd,
dd_pid_read_fd, exp_size_read_fd, status_file, child_logger,
signal_notify, signal_handler, mode):
"""Handles the child processes' output.
"""
assert not (signal_handler.signum - set([signal.SIGTERM, signal.SIGINT])), \
"Other signals are not handled in this function"
# Buffer size 0 is important, otherwise .read() with a specified length
# might buffer data while poll(2) won't mark its file descriptor as
# readable again.
socat_stderr_read = os.fdopen(socat_stderr_read_fd, "r", 0)
dd_stderr_read = os.fdopen(dd_stderr_read_fd, "r", 0)
dd_pid_read = os.fdopen(dd_pid_read_fd, "r", 0)
exp_size_read = os.fdopen(exp_size_read_fd, "r", 0)
tp_samples = DD_THROUGHPUT_SAMPLES
if options.exp_size == constants.IE_CUSTOM_SIZE:
exp_size = None
else:
exp_size = options.exp_size
child_io_proc = impexpd.ChildIOProcessor(options.debug, status_file,
child_logger, tp_samples,
exp_size)
try:
fdmap = {
child.stderr.fileno():
(child.stderr, child_io_proc.GetLineSplitter(impexpd.PROG_OTHER)),
socat_stderr_read.fileno():
(socat_stderr_read, child_io_proc.GetLineSplitter(impexpd.PROG_SOCAT)),
dd_pid_read.fileno():
(dd_pid_read, child_io_proc.GetLineSplitter(impexpd.PROG_DD_PID)),
dd_stderr_read.fileno():
(dd_stderr_read, child_io_proc.GetLineSplitter(impexpd.PROG_DD)),
exp_size_read.fileno():
(exp_size_read, child_io_proc.GetLineSplitter(impexpd.PROG_EXP_SIZE)),
signal_notify.fileno(): (signal_notify, None),
}
poller = select.poll()
for fd in fdmap:
utils.SetNonblockFlag(fd, True)
poller.register(fd, select.POLLIN)
if options.connect_timeout and mode == constants.IEM_IMPORT:
listen_timeout = utils.RunningTimeout(options.connect_timeout, True)
else:
listen_timeout = None
exit_timeout = None
dd_stats_timeout = None
while True:
# Break out of loop if only signal notify FD is left
if len(fdmap) == 1 and signal_notify.fileno() in fdmap:
break
timeout = None
if listen_timeout and not exit_timeout:
assert mode == constants.IEM_IMPORT and options.connect_timeout
if status_file.GetConnected():
listen_timeout = None
elif listen_timeout.Remaining() < 0:
errmsg = ("Child process didn't establish connection in time"
" (%0.0fs), sending SIGTERM" % options.connect_timeout)
logging.error(errmsg)
status_file.AddRecentOutput(errmsg)
status_file.Update(True)
child.Kill(signal.SIGTERM)
exit_timeout = \
utils.RunningTimeout(constants.CHILD_LINGER_TIMEOUT, True)
# Next block will calculate timeout
else:
# Not yet connected, check again in a second
timeout = 1000
if exit_timeout:
timeout = exit_timeout.Remaining() * 1000
if timeout < 0:
logging.info("Child process didn't exit in time")
break
if (not dd_stats_timeout) or dd_stats_timeout.Remaining() < 0:
notify_status = child_io_proc.NotifyDd()
if notify_status:
# Schedule next notification
dd_stats_timeout = utils.RunningTimeout(DD_STATISTICS_INTERVAL, True)
else:
# Try again soon (dd isn't ready yet)
dd_stats_timeout = utils.RunningTimeout(1.0, True)
if dd_stats_timeout:
dd_timeout = max(0, dd_stats_timeout.Remaining() * 1000)
if timeout is None:
timeout = dd_timeout
else:
timeout = min(timeout, dd_timeout)
for fd, event in utils.RetryOnSignal(poller.poll, timeout):
if event & (select.POLLIN | select.POLLPRI):
(from_, to) = fdmap[fd]
# Read up to 1 KB of data
data = from_.read(1024)
if data:
if to:
to.write(data)
elif fd == signal_notify.fileno():
# Signal handling
if signal_handler.called:
signal_handler.Clear()
if exit_timeout:
logging.info("Child process still has about %0.2f seconds"
" to exit", exit_timeout.Remaining())
else:
logging.info("Giving child process %0.2f seconds to exit",
constants.CHILD_LINGER_TIMEOUT)
exit_timeout = \
utils.RunningTimeout(constants.CHILD_LINGER_TIMEOUT, True)
else:
poller.unregister(fd)
del fdmap[fd]
elif event & (select.POLLNVAL | select.POLLHUP |
select.POLLERR):
poller.unregister(fd)
del fdmap[fd]
child_io_proc.FlushAll()
# If there was a timeout calculator, we were waiting for the child to
# finish, e.g. due to a signal
return not bool(exit_timeout)
finally:
child_io_proc.CloseAll()
def ParseOptions():
"""Parses the options passed to the program.
@return: Arguments to program
"""
global options # pylint: disable=W0603
parser = optparse.OptionParser(usage=("%%prog {%s|%s}" %
(constants.IEM_IMPORT,
constants.IEM_EXPORT)))
parser.add_option(cli.DEBUG_OPT)
parser.add_option(cli.VERBOSE_OPT)
parser.add_option("--key", dest="key", action="store", type="string",
help="RSA key file")
parser.add_option("--cert", dest="cert", action="store", type="string",
help="X509 certificate file")
parser.add_option("--ca", dest="ca", action="store", type="string",
help="X509 CA file")
parser.add_option("--bind", dest="bind", action="store", type="string",
help="Bind address")
parser.add_option("--ipv4", dest="ipv4", action="store_true",
help="Use IPv4 only")
parser.add_option("--ipv6", dest="ipv6", action="store_true",
help="Use IPv6 only")
parser.add_option("--host", dest="host", action="store", type="string",
help="Remote hostname")
parser.add_option("--port", dest="port", action="store", type="int",
help="Remote port")
parser.add_option("--connect-retries", dest="connect_retries", action="store",
type="int", default=0,
help=("How many times the connection should be retried"
" (export only)"))
parser.add_option("--connect-timeout", dest="connect_timeout", action="store",
type="int", default=DEFAULT_CONNECT_TIMEOUT,
help="Timeout for connection to be established (seconds)")
parser.add_option("--compress", dest="compress", action="store",
type="string", help="Compression method",
default=constants.IEC_GZIP)
parser.add_option("--expected-size", dest="exp_size", action="store",
type="string", default=None,
help="Expected import/export size (MiB)")
parser.add_option("--magic", dest="magic", action="store",
type="string", default=None, help="Magic string")
parser.add_option("--cmd-prefix", dest="cmd_prefix", action="store",
type="string", help="Command prefix")
parser.add_option("--cmd-suffix", dest="cmd_suffix", action="store",
type="string", help="Command suffix")
(options, args) = parser.parse_args()
if len(args) != 2:
# Won't return
parser.error("Expected exactly two arguments")
(status_file_path, mode) = args
if mode not in (constants.IEM_IMPORT,
constants.IEM_EXPORT):
# Won't return
parser.error("Invalid mode: %s" % mode)
# Normalize and check parameters
if options.host is not None and not netutils.IPAddress.IsValid(options.host):
try:
options.host = netutils.Hostname.GetNormalizedName(options.host)
except errors.OpPrereqError, err:
parser.error("Invalid hostname '%s': %s" % (options.host, err))
if options.port is not None:
options.port = utils.ValidateServiceName(options.port)
if (options.exp_size is not None and
options.exp_size != constants.IE_CUSTOM_SIZE):
try:
options.exp_size = int(options.exp_size)
except (ValueError, TypeError), err:
# Won't return
parser.error("Invalid value for --expected-size: %s (%s)" %
(options.exp_size, err))
if not (options.magic is None or constants.IE_MAGIC_RE.match(options.magic)):
parser.error("Magic must match regular expression %s" %
constants.IE_MAGIC_RE.pattern)
if options.ipv4 and options.ipv6:
parser.error("Can only use one of --ipv4 and --ipv6")
return (status_file_path, mode)
# Return code signifying that no program was found
PROGRAM_NOT_FOUND_RCODE = 127
def _RunWithTimeout(cmd, timeout, silent=False):
"""Runs a command, killing it if a timeout was reached.
Uses the alarm signal, not thread-safe. Waits regardless of whether the
command exited early.
@type timeout: number
@param timeout: Timeout, in seconds
@type silent: Boolean
@param silent: Whether command output should be suppressed
@rtype: tuple of (bool, int)
@return: Whether the command timed out, and the return code
"""
try:
if silent:
with open(os.devnull, 'wb') as null_fd:
p = subprocess.Popen(cmd, stdout=null_fd, stderr=null_fd)
else:
p = subprocess.Popen(cmd)
except OSError:
return False, PROGRAM_NOT_FOUND_RCODE
time.sleep(timeout)
timed_out = False
status = p.poll()
if status is None:
timed_out = True
p.kill()
return timed_out, p.wait()
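# Illustrative call (hypothetical command): _RunWithTimeout(["gzip", "-h"], 2,
# silent=True) returns (False, 0) if gzip exits successfully within two
# seconds; a process still running after the timeout is killed and the first
# element of the result is True.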
CHECK_SWITCH = "-h"
def VerifyOptions():
"""Performs various runtime checks to make sure the options are valid.
"""
if options.compress != constants.IEC_NONE:
utility_name = constants.IEC_COMPRESSION_UTILITIES.get(options.compress,
options.compress)
timed_out, rcode = \
_RunWithTimeout([utility_name, CHECK_SWITCH], 2, silent=True)
if timed_out:
raise Exception("The invoked utility has timed out - the %s switch to"
" check for presence must be supported" % CHECK_SWITCH)
if rcode != 0:
raise Exception("Verification attempt of selected compression method %s"
" failed - check that %s is present and can be invoked"
" safely with the %s switch" %
(options.compress, utility_name, CHECK_SWITCH))
class ChildProcess(subprocess.Popen):
def __init__(self, env, cmd, noclose_fds):
"""Initializes this class.
"""
self._noclose_fds = noclose_fds
# Not using close_fds because doing so would also close the socat stderr
# pipe, which we still need.
subprocess.Popen.__init__(self, cmd, env=env, shell=False, close_fds=False,
stderr=subprocess.PIPE, stdout=None, stdin=None,
preexec_fn=self._ChildPreexec)
self._SetProcessGroup()
def _ChildPreexec(self):
"""Called before child executable is execve'd.
"""
# Move to separate process group. By sending a signal to its process group
# we can kill the child process and all grandchildren.
os.setpgid(0, 0)
# Close almost all file descriptors
utils.CloseFDs(noclose_fds=self._noclose_fds)
def _SetProcessGroup(self):
"""Sets the child's process group.
"""
assert self.pid, "Can't be called in child process"
# Avoid race condition by setting child's process group (as good as
# possible in Python) before sending signals to child. For an
# explanation, see preexec function for child.
try:
os.setpgid(self.pid, self.pid)
except EnvironmentError, err:
# If the child process was faster we receive EPERM or EACCES
if err.errno not in (errno.EPERM, errno.EACCES):
raise
def Kill(self, signum):
"""Sends signal to child process.
"""
logging.info("Sending signal %s to child process", signum)
utils.IgnoreProcessNotFound(os.killpg, self.pid, signum)
def ForceQuit(self):
"""Ensure child process is no longer running.
"""
# Final check if child process is still alive
if utils.RetryOnSignal(self.poll) is None:
logging.error("Child process still alive, sending SIGKILL")
self.Kill(signal.SIGKILL)
utils.RetryOnSignal(self.wait)
def main():
"""Main function.
"""
# Option parsing
(status_file_path, mode) = ParseOptions()
# Configure logging
child_logger = SetupLogging()
status_file = StatusFile(status_file_path)
try:
try:
# Option verification
VerifyOptions()
# Pipe to receive socat's stderr output
(socat_stderr_read_fd, socat_stderr_write_fd) = os.pipe()
# Pipe to receive dd's stderr output
(dd_stderr_read_fd, dd_stderr_write_fd) = os.pipe()
# Pipe to receive dd's PID
(dd_pid_read_fd, dd_pid_write_fd) = os.pipe()
# Pipe to receive size predicted by export script
(exp_size_read_fd, exp_size_write_fd) = os.pipe()
# Get child process command
cmd_builder = impexpd.CommandBuilder(mode, options, socat_stderr_write_fd,
dd_stderr_write_fd, dd_pid_write_fd)
cmd = cmd_builder.GetCommand()
# Prepare command environment
cmd_env = os.environ.copy()
if options.exp_size == constants.IE_CUSTOM_SIZE:
cmd_env["EXP_SIZE_FD"] = str(exp_size_write_fd)
logging.debug("Starting command %r", cmd)
# Start child process
child = ChildProcess(cmd_env, cmd,
[socat_stderr_write_fd, dd_stderr_write_fd,
dd_pid_write_fd, exp_size_write_fd])
try:
def _ForwardSignal(signum, _):
"""Forwards signals to child process.
"""
child.Kill(signum)
signal_wakeup = utils.SignalWakeupFd()
try:
# TODO: There is a race condition between starting the child and
# handling the signals here. While there might be a way to work around
# it by registering the handlers before starting the child and
# deferring sent signals until the child is available, doing so can be
# complicated.
signal_handler = utils.SignalHandler([signal.SIGTERM, signal.SIGINT],
handler_fn=_ForwardSignal,
wakeup=signal_wakeup)
try:
# Close child's side
utils.RetryOnSignal(os.close, socat_stderr_write_fd)
utils.RetryOnSignal(os.close, dd_stderr_write_fd)
utils.RetryOnSignal(os.close, dd_pid_write_fd)
utils.RetryOnSignal(os.close, exp_size_write_fd)
if ProcessChildIO(child, socat_stderr_read_fd, dd_stderr_read_fd,
dd_pid_read_fd, exp_size_read_fd,
status_file, child_logger,
signal_wakeup, signal_handler, mode):
# The child closed all its file descriptors and there was no
# signal
# TODO: Implement timeout instead of waiting indefinitely
utils.RetryOnSignal(child.wait)
finally:
signal_handler.Reset()
finally:
signal_wakeup.Reset()
finally:
child.ForceQuit()
if child.returncode == 0:
errmsg = None
elif child.returncode < 0:
errmsg = "Exited due to signal %s" % (-child.returncode, )
else:
errmsg = "Exited with status %s" % (child.returncode, )
status_file.SetExitStatus(child.returncode, errmsg)
except Exception, err: # pylint: disable=W0703
logging.exception("Unhandled error occurred")
status_file.SetExitStatus(constants.EXIT_FAILURE,
"Unhandled error occurred: %s" % (err, ))
if status_file.ExitStatusIsSuccess():
sys.exit(constants.EXIT_SUCCESS)
sys.exit(constants.EXIT_FAILURE)
finally:
status_file.Update(True)
if __name__ == "__main__":
main()
ganeti-2.15.2/devel/ 0000755 0000000 0000000 00000000000 12634264163 0014154 5 ustar 00root root 0000000 0000000 ganeti-2.15.2/devel/build_chroot 0000755 0000000 0000000 00000037144 12634264163 0016570 0 ustar 00root root 0000000 0000000 #!/bin/bash
#Requirements for this script to work:
#* Make sure that the user who uses the chroot is in group 'src', or change
# the ${GROUP} variable to a group that contains the user.
#* Add any path of the host system that you want to access inside the chroot
# to the /etc/schroot/default/fstab file. This is important in particular if
# your homedir is not in /home.
#* Add this to your /etc/fstab:
# tmpfs /var/lib/schroot/mount tmpfs defaults,size=3G 0 0
# tmpfs /var/lib/schroot/unpack tmpfs defaults,size=3G 0 0
#Configuration
: ${ARCH:=amd64}
: ${DIST_RELEASE:=wheezy}
: ${CONF_DIR:=/etc/schroot/chroot.d}
: ${CHROOT_DIR:=/srv/chroot}
: ${ALTERNATIVE_EDITOR:=/usr/bin/vim.basic}
: ${CHROOT_FINAL_HOOK:=/bin/true}
: ${GROUP:=src}
# Additional Variables taken from the environment
# DATA_DIR
# CHROOT_EXTRA_DEBIAN_PACKAGES
#Automatically generated variables
CHROOTNAME=$DIST_RELEASE-$ARCH
CHNAME=building_$CHROOTNAME
TEMP_CHROOT_CONF=$CONF_DIR/$CHNAME.conf
FINAL_CHROOT_CONF=$CHROOTNAME.conf
ROOT=`pwd`
CHDIR=$ROOT/$CHNAME
USER=`whoami`
COMP_FILENAME=$CHROOTNAME.tar.gz
COMP_FILEPATH=$ROOT/$COMP_FILENAME
TEMP_DATA_DIR=`mktemp -d`
ACTUAL_DATA_DIR=$DATA_DIR
ACTUAL_DATA_DIR=${ACTUAL_DATA_DIR:-$TEMP_DATA_DIR}
GHC_VERSION="7.6.3"
CABAL_INSTALL_VERSION="1.18.0.2"
SHA1_LIST='
cabal-install-1.18.0.2.tar.gz 2d1f7a48d17b1e02a1e67584a889b2ff4176a773
ghc-7.6.3-i386-unknown-linux.tar.bz2 f042b4171a2d4745137f2e425e6949c185f8ea14
ghc-7.6.3-x86_64-unknown-linux.tar.bz2 46ec3f3352ff57fba0dcbc8d9c20f7bcb6924b77
'
# export all variables needed in the schroot
export ARCH GHC_VERSION CABAL_INSTALL_VERSION SHA1_LIST
# Use gzip --rsyncable if available, to speed up transfers of generated files
# The environment variable GZIP is read automatically by 'gzip',
# see ENVIRONMENT in gzip(1).
gzip --rsyncable </dev/null >/dev/null 2>&1 && export GZIP="--rsyncable"
#Runnability checks
if [ $USER != 'root' ]
then
echo "This script requires root permissions to run"
exit
fi
if [ -f $TEMP_CHROOT_CONF ]
then
echo "The configuration file name for the temporary chroot"
echo " $TEMP_CHROOT_CONF"
echo "already exists."
echo "Remove it or change the CHNAME value in the script."
exit
fi
#Create configuration dir and files if they do not exist
if [ ! -d $ACTUAL_DATA_DIR ]
then
mkdir $ACTUAL_DATA_DIR
echo "The data directory"
echo " $ACTUAL_DATA_DIR"
echo "has been created."
fi
if [ ! -f $ACTUAL_DATA_DIR/final.schroot.conf.in ]
then
cat <<END >$ACTUAL_DATA_DIR/final.schroot.conf.in
[${CHROOTNAME}]
description=Debian ${DIST_RELEASE} ${ARCH}
groups=${GROUP}
source-root-groups=root
type=file
file=${CHROOT_DIR}/${COMP_FILENAME}
END
echo "The file"
echo " $ACTUAL_DATA_DIR/final.schroot.conf.in"
echo "has been created with default configurations."
fi
if [ ! -f $ACTUAL_DATA_DIR/temp.schroot.conf.in ]
then
cat <<END >$ACTUAL_DATA_DIR/temp.schroot.conf.in
[${CHNAME}]
description=Debian ${DIST_RELEASE} ${ARCH}
directory=${CHDIR}
groups=${GROUP}
users=root
type=directory
END
echo "The file"
echo " $ACTUAL_DATA_DIR/temp.schroot.conf.in"
echo "has been created with default configurations."
fi
#Stop on errors
set -e
#Cleanup
rm -rf $CHDIR
mkdir $CHDIR
#Install tools for building chroots
apt-get install -y schroot debootstrap
shopt -s expand_aliases
alias in_chroot='schroot -c $CHNAME -d / '
function subst_variables {
sed \
-e "s/\${ARCH}/$ARCH/" \
-e "s*\${CHDIR}*$CHDIR*" \
-e "s/\${CHNAME}/$CHNAME/" \
-e "s/\${CHROOTNAME}/$CHROOTNAME/" \
-e "s*\${CHROOT_DIR}*$CHROOT_DIR*" \
-e "s/\${COMP_FILENAME}/$COMP_FILENAME/" \
-e "s/\${DIST_RELEASE}/$DIST_RELEASE/" $@
}
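# Illustrative use: the schroot configuration templates are piped through
# this function below, e.g.
#   cat $ACTUAL_DATA_DIR/temp.schroot.conf.in | subst_variables > $TEMP_CHROOT_CONF
# which expands placeholders such as ${CHNAME}, ${CHDIR} and ${CHROOTNAME}.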
#Generate chroot configurations
cat $ACTUAL_DATA_DIR/temp.schroot.conf.in | subst_variables > $TEMP_CHROOT_CONF
cat $ACTUAL_DATA_DIR/final.schroot.conf.in | subst_variables > $FINAL_CHROOT_CONF
#Install the base system
debootstrap --arch $ARCH $DIST_RELEASE $CHDIR
APT_INSTALL="apt-get install -y --no-install-recommends"
if [ $DIST_RELEASE = squeeze ]
then
echo "deb http://backports.debian.org/debian-backports" \
"$DIST_RELEASE-backports main contrib non-free" \
> $CHDIR/etc/apt/sources.list.d/backports.list
fi
#Install all the packages
in_chroot -- \
apt-get update
# Functions for downloading and checking Haskell core components.
# The functions run commands within the schroot.
# arguments : file_name expected_sha1
function verify_sha1 {
local SUM="$( in_chroot -- sha1sum "$1" | awk '{print $1;exit}' )"
if [ "$SUM" != "$2" ] ; then
echo "ERROR: The SHA1 sum $SUM of $1 doesn't match $2." >&2
return 1
else
echo "SHA1 of $1 verified correct."
fi
}
# arguments: URL
function lookup_sha1 {
grep -o "${1##*/}"'\s\+[0-9a-fA-F]*' <<<"$SHA1_LIST" | awk '{print $2;exit}'
}
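# Example (hypothetical URL): calling
#   lookup_sha1 http://example.com/cabal-install-1.18.0.2.tar.gz
# strips the path and returns the matching entry from SHA1_LIST above,
# i.e. 2d1f7a48d17b1e02a1e67584a889b2ff4176a773.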
# arguments : file_name URL
function download {
local FNAME="$1"
local URL="$2"
in_chroot -- wget --no-check-certificate --output-document="$FNAME" "$URL"
verify_sha1 "$FNAME" "$( lookup_sha1 "$URL" )"
}
function install_ghc {
local GHC_ARCH=$ARCH
local TDIR=$( schroot -c $CHNAME -d / -- mktemp -d )
[ -n "$TDIR" ]
if [ "$ARCH" == "amd64" ] ; then
download "$TDIR"/ghc.tar.bz2 \
http://www.haskell.org/ghc/dist/${GHC_VERSION}/ghc-${GHC_VERSION}-x86_64-unknown-linux.tar.bz2
elif [ "$ARCH" == "i386" ] ; then
download "$TDIR"/ghc.tar.bz2 \
http://www.haskell.org/ghc/dist/${GHC_VERSION}/ghc-${GHC_VERSION}-i386-unknown-linux.tar.bz2
else
echo "Don't know what GHC to download for architecture $ARCH" >&2
return 1
fi
schroot -c $CHNAME -d "$TDIR" -- \
tar xjf ghc.tar.bz2
schroot -c $CHNAME -d "$TDIR/ghc-${GHC_VERSION}" -- \
./configure --prefix=/usr/local
schroot -c $CHNAME -d "$TDIR/ghc-${GHC_VERSION}" -- \
make install
schroot -c $CHNAME -d "/" -- \
rm -rf "$TDIR"
}
function install_cabal {
local TDIR=$( schroot -c $CHNAME -d / -- mktemp -d )
[ -n "$TDIR" ]
download "$TDIR"/cabal-install.tar.gz \
http://www.haskell.org/cabal/release/cabal-install-${CABAL_INSTALL_VERSION}/cabal-install-${CABAL_INSTALL_VERSION}.tar.gz
schroot -c $CHNAME -d "$TDIR" -- \
tar xzf cabal-install.tar.gz
schroot -c $CHNAME -d "$TDIR/cabal-install-${CABAL_INSTALL_VERSION}" -- \
bash -c 'EXTRA_CONFIGURE_OPTS="--enable-library-profiling" ./bootstrap.sh --global'
schroot -c $CHNAME -d "/" -- \
rm -rf "$TDIR"
}
case $DIST_RELEASE in
squeeze)
# do not install libghc6-network-dev, since it's too old, and just
# confuses the dependencies
in_chroot -- \
$APT_INSTALL \
autoconf automake \
zlib1g-dev \
libgmp3-dev \
libcurl4-gnutls-dev \
libpcre3-dev \
happy \
hscolour pandoc \
graphviz qemu-utils \
python-docutils \
python-simplejson \
python-pyparsing \
python-pyinotify \
python-pycurl \
python-ipaddr \
python-yaml \
python-paramiko
in_chroot -- \
$APT_INSTALL python-setuptools python-dev build-essential
in_chroot -- \
easy_install \
logilab-astng==0.24.1 \
logilab-common==0.58.3 \
mock==1.0.1 \
pylint==0.26.0
in_chroot -- \
easy_install \
sphinx==1.1.3 \
pep8==1.3.3 \
coverage==3.4 \
bitarray==0.8.0
install_ghc
install_cabal
in_chroot -- \
cabal update
# since we're using Cabal >=1.16, we can use the parallel install option
in_chroot -- \
cabal install --global -j --enable-library-profiling \
attoparsec-0.11.1.0 \
base64-bytestring-1.0.0.1 \
blaze-builder-0.3.3.2 \
case-insensitive-1.1.0.3 \
Crypto-4.2.5.1 \
curl-1.3.8 \
happy \
hashable-1.2.1.0 \
hinotify-0.3.6 \
hscolour-1.20.3 \
hslogger-1.2.3 \
json-0.7 \
lifted-base-0.2.2.0 \
lens-4.0.4 \
MonadCatchIO-transformers-0.3.0.0 \
network-2.4.1.2 \
parallel-3.2.0.4 \
parsec-3.1.3 \
regex-pcre-0.94.4 \
temporary-1.2.0.1 \
vector-0.10.9.1 \
zlib-0.5.4.1 \
\
'hlint>=1.9.12' \
HUnit-1.2.5.2 \
QuickCheck-2.6 \
test-framework-0.8.0.3 \
test-framework-hunit-0.3.0.1 \
test-framework-quickcheck2-0.3.0.2 \
\
snap-server-0.9.4.0 \
PSQueue-1.1 \
\
cabal-file-th-0.2.3 \
shelltestrunner
#Install selected packages from backports
in_chroot -- \
$APT_INSTALL -t squeeze-backports \
git \
git-email \
vim
;;
wheezy)
in_chroot -- \
$APT_INSTALL \
autoconf automake ghc ghc-haddock libghc-network-dev \
libghc-test-framework{,-hunit,-quickcheck2}-dev \
libghc-json-dev libghc-curl-dev libghc-hinotify-dev \
libghc-parallel-dev libghc-utf8-string-dev \
libghc-hslogger-dev libghc-crypto-dev \
libghc-regex-pcre-dev libghc-attoparsec-dev \
libghc-vector-dev libghc-temporary-dev \
libghc-snap-server-dev libpcre3 libpcre3-dev happy hscolour pandoc \
libghc-zlib-dev libghc-psqueue-dev \
cabal-install \
python-setuptools python-sphinx python-epydoc graphviz python-pyparsing \
python-simplejson python-pycurl python-paramiko \
python-bitarray python-ipaddr python-yaml qemu-utils python-coverage pep8 \
shelltestrunner python-dev openssh-client vim git git-email
# We need version 0.9.4 of pyinotify because the packaged version, 0.9.3, is
# incompatible with the packaged version of python-epydoc 3.0.1.
# Reason: a logger class in pyinotify calculates its superclasses at
# runtime, which clashes with python-epydoc's static analysis phase.
#
# Problem introduced in:
# https://github.com/seb-m/pyinotify/commit/2c7e8f8959d2f8528e0d90847df360
# and "fixed" in:
# https://github.com/seb-m/pyinotify/commit/98c5f41a6e2e90827a63ff1b878596
in_chroot -- \
easy_install \
logilab-astng==0.24.1 \
logilab-common==0.58.3 \
mock==1.0.1 \
pylint==0.26.0 \
pep8==1.3.3
in_chroot -- \
easy_install pyinotify==0.9.4
in_chroot -- \
cabal update
in_chroot -- \
cabal install --global \
'base64-bytestring>=1' \
lens-3.10.2 \
'lifted-base>=0.1.2' \
'hlint>=1.9.12'
;;
testing)
in_chroot -- \
$APT_INSTALL \
autoconf automake ghc ghc-haddock libghc-network-dev \
libghc-test-framework{,-hunit,-quickcheck2}-dev \
libghc-json-dev libghc-curl-dev libghc-hinotify-dev \
libghc-parallel-dev libghc-utf8-string-dev \
libghc-hslogger-dev libghc-crypto-dev \
libghc-regex-pcre-dev libghc-attoparsec-dev \
libghc-vector-dev libghc-temporary-dev \
libghc-snap-server-dev libpcre3 libpcre3-dev happy hscolour pandoc \
libghc-zlib-dev libghc-psqueue-dev \
libghc-base64-bytestring-dev libghc-lens-dev libghc-lifted-base-dev \
libghc-cabal-dev \
cabal-install \
python-setuptools python-sphinx python-epydoc graphviz python-pyparsing \
python-simplejson python-pycurl python-pyinotify python-paramiko \
python-bitarray python-ipaddr python-yaml qemu-utils python-coverage pep8 \
shelltestrunner python-dev pylint openssh-client vim git git-email
in_chroot -- \
cabal update
in_chroot -- \
cabal install --global \
'hlint>=1.9.12'
;;
precise)
# ghc, git-email and other dependencies are hosted in the universe
# repository, which is not enabled by default.
echo "Adding universe repository..."
cat > $CHDIR/etc/apt/sources.list.d/universe.list <<EOF
deb http://archive.ubuntu.com/ubuntu precise universe
EOF
in_chroot -- \
apt-get update
in_chroot -- \
cabal update
in_chroot -- \
cabal install --global \
'base64-bytestring>=1' \
hslogger-1.2.3 \
'hlint>=1.9.12' \
json-0.7 \
lens-3.10.2 \
'lifted-base>=0.1.2' \
'network>=2.4.0.1' \
'regex-pcre>=0.94.4' \
parsec-3.1.3 \
shelltestrunner \
'snap-server>=0.8.1' \
test-framework-0.8.0.3 \
test-framework-hunit-0.3.0.1 \
test-framework-quickcheck2-0.3.0.2 \
'transformers>=0.3.0.0'
;;
*)
in_chroot -- \
$APT_INSTALL \
autoconf automake ghc ghc-haddock libghc-network-dev \
libghc-test-framework{,-hunit,-quickcheck2}-dev \
libghc-json-dev libghc-curl-dev libghc-hinotify-dev \
libghc-parallel-dev libghc-utf8-string-dev \
libghc-hslogger-dev libghc-crypto-dev \
libghc-regex-pcre-dev libghc-attoparsec-dev \
libghc-vector-dev libghc-temporary-dev libghc-psqueue-dev \
libghc-snap-server-dev libpcre3 libpcre3-dev happy hscolour pandoc \
libghc-lens-dev libghc-lifted-base-dev \
libghc-cabal-dev \
cabal-install \
libghc-base64-bytestring-dev \
python-setuptools python-sphinx python-epydoc graphviz python-pyparsing \
python-simplejson python-pyinotify python-pycurl python-paramiko \
python-bitarray python-ipaddr python-yaml qemu-utils python-coverage pep8 \
shelltestrunner python-dev pylint openssh-client vim git git-email \
build-essential
in_chroot -- \
cabal update
in_chroot -- \
cabal install --global \
'hlint>=1.9.12'
;;
esac
# print what packages and versions are installed:
in_chroot -- \
cabal list --installed --simple-output
in_chroot -- \
$APT_INSTALL sudo fakeroot rsync locales less socat
# Configure the locale
case $DIST_RELEASE in
precise)
in_chroot -- \
$APT_INSTALL language-pack-en
;;
*)
echo "en_US.UTF-8 UTF-8" >> $CHDIR/etc/locale.gen
in_chroot -- \
locale-gen
;;
esac
in_chroot -- \
$APT_INSTALL lvm2 ssh bridge-utils iproute iputils-arping \
ndisc6 python-openssl openssl \
python-mock fping qemu-utils
in_chroot -- \
easy_install psutil
in_chroot -- \
easy_install jsonpointer \
jsonpointer \
jsonpatch
in_chroot -- \
$APT_INSTALL \
python-epydoc debhelper quilt
# extra debian packages
for package in $CHROOT_EXTRA_DEBIAN_PACKAGES
do in_chroot -- \
$APT_INSTALL $package
done
#Set default editor
in_chroot -- \
update-alternatives --set editor $ALTERNATIVE_EDITOR
# Final user hook
in_chroot -- $CHROOT_FINAL_HOOK
rm -f $COMP_FILEPATH
echo "Creating compressed schroot image..."
cd $CHDIR
tar czf $COMP_FILEPATH ./*
cd $ROOT
rm -rf $CHDIR
rm -f $TEMP_CHROOT_CONF
rm -rf $TEMP_DATA_DIR
echo "Chroot created. In order to run it:"
echo " * sudo cp $FINAL_CHROOT_CONF $CONF_DIR/$FINAL_CHROOT_CONF"
echo " * sudo mkdir -p $CHROOT_DIR"
echo " * sudo cp $COMP_FILEPATH $CHROOT_DIR/$COMP_FILENAME"
echo "Then run \"schroot -c $CHROOTNAME\""
ganeti-2.15.2/devel/check-split-query 0000755 0000000 0000000 00000005475 12634264163 0017466 0 ustar 00root root 0000000 0000000 #!/bin/bash
# Copyright (C) 2013 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Checks query equivalence between masterd and confd
#
# This is not (currently) run automatically during QA, but you can run
# it manually on a test cluster. It will force all queries known to be
# converted via both paths and check the difference, via both 'list'
# and 'list-fields'. For best results, it should be run on a non-empty
# cluster.
#
# Also note that this is not expected to show 100% perfect matches,
# since the JSON output differs slightly for complex data types
# (e.g. dictionaries with different sort order for keys, etc.).
#
# Current known delta:
# - all dicts, sort order
# - ctime is always defined in Haskell as epoch 0 if missing
MA=`mktemp master.XXXXXX`
CF=`mktemp confd.XXXXXX`
trap 'rm -f "$MA" "$CF"' EXIT
trap 'exit 1' SIGINT
RET=0
SEP="--separator=,"
ENABLED_QUERIES="node group network backup"
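# Run the given command twice -- once forced onto the masterd LUXI socket and
# once onto the query (confd) socket -- and diff the two outputs.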
test_cmd() {
cmd="$1"
desc="$2"
FORCE_LUXI_SOCKET=master $cmd > "$MA"
FORCE_LUXI_SOCKET=query $cmd > "$CF"
diff -u "$MA" "$CF" || {
echo "Mismatch in $desc, see above."
RET=1
}
}
for kind in $ENABLED_QUERIES; do
all_fields=$(FORCE_LUXI_SOCKET=master gnt-$kind list-fields \
--no-headers --separator=,|cut -d, -f1)
comma_fields=$(echo $all_fields|tr ' ' ,|sed -e 's/,$//')
for op in list list-fields; do
test_cmd "gnt-$kind $op $SEP" "$kind $op"
done
#test_cmd "gnt-$kind list $SEP -o$comma_fields" "$kind list with all fields"
for field in $all_fields; do
test_cmd "gnt-$kind list $SEP -o$field" "$kind list for field $field"
done
done
exit $RET
ganeti-2.15.2/devel/check_copyright 0000755 0000000 0000000 00000006075 12634264163 0017257 0 ustar 00root root 0000000 0000000 #!/bin/bash
# Copyright (C) 2014 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Script to check whether the local dirty commits are changing files
# which do not have an updated copyright.
#
# The script will determine your current remote branch and local
# branch, from which it will extract the commits to analyze.
# Afterwards, for each commit, it will see which files are being
# modified and, for each file, it will check the copyright.
function join {
local IFS="$1"
shift
echo "$*"
}
# Determine the tracking branch for the current branch
readonly REMOTE=$(git branch -vv | grep -e "^\*" | sed -e "s/ \+/ /g" | awk '{ print $4 }' | grep "\[" | tr -d ":[]")
if [ -z "$REMOTE" ]
then
echo check_copyright: failed to get remote branch
exit 1
fi
# Determine which commits have not been pushed (i.e., diff between the
# remote branch and the current branch)
COMMITS=$(git log --pretty=format:'%h' ${REMOTE}..HEAD)
if [ -z "$COMMITS" ]
then
echo check_copyright: there are no commits to check
exit 0
fi
# for each commit, check its files
for commit in $(echo $COMMITS | tac -s " ")
do
FILES=$(git diff-tree --no-commit-id --name-only -r $commit)
if [ -z "$FILES" ]
then
echo check_copyright: commit \"$commit\" has no files to check
else
# for each file, check if it is in the 'lib' or 'src' dirs
# and, if so, check the copyright
for file in $FILES
do
DIR=$(echo $file | cut -d "/" -f 1)
if [ "$DIR" = lib -o "$DIR" = src ]
then
COPYRIGHT=$(grep "Copyright (C)" $file)
YEAR=$(date +%G)
if [ -z "$COPYRIGHT" ]
then
echo check_copyright: commit \"$commit\" misses \
copyright for \"$file\"
elif ! echo $COPYRIGHT | grep -o $YEAR > /dev/null
then
echo check_copyright: commit \"$commit\" misses \
\"$YEAR\" copyright for \"$file\"
fi
fi
done
fi
done
ganeti-2.15.2/devel/release 0000755 0000000 0000000 00000006110 12634264163 0015520 0 ustar 00root root 0000000 0000000 #!/bin/bash
# Copyright (C) 2009 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# This is a test script to ease development and testing on test clusters.
# It should not be used to update production environments.
# Usage: release v2.0.5
# Alternative: URL=file:///my/git/repo release e5823b7e2cd8a3...
# It will clone the given repository from the default or passed URL,
# checkout the given reference (a tag or branch) and then create a
# release archive; you will need to copy the archive and delete the
# temporary directory at the end
set -e
: ${URL:=git://git.ganeti.org/ganeti.git}
TAG="$1"
if [[ -z "$TAG" ]]; then
echo "Usage: $0 " >&2
exit 1
fi
echo "Using Git repository $URL"
TMPDIR=$(mktemp -d -t gntrelease.XXXXXXXXXX)
cd $TMPDIR
echo "Cloning the repository under $TMPDIR ..."
git clone -q "$URL" dist
cd dist
git checkout $TAG
# Check minimum aclocal version for releasing
MIN_ACLOCAL_VERSION=( 1 11 1 )
ACLOCAL_VERSION=$(${ACLOCAL:-aclocal} --version | head -1 | \
sed -e 's/^[^0-9]*\([0-9\.]*\)$/\1/')
ACLOCAL_VERSION_REST=$ACLOCAL_VERSION
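# Compare the dotted version components one by one; abort if the installed
# aclocal is older than the required minimum.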
for v in ${MIN_ACLOCAL_VERSION[@]}; do
ACLOCAL_VERSION_PART=${ACLOCAL_VERSION_REST%%.*}
ACLOCAL_VERSION_REST=${ACLOCAL_VERSION_REST#$ACLOCAL_VERSION_PART.}
if [[ $v -eq $ACLOCAL_VERSION_PART ]]; then
continue
elif [[ $v -lt $ACLOCAL_VERSION_PART ]]; then
break
else # gt
echo "aclocal version $ACLOCAL_VERSION is too old (< 1.11.1)"
exit 1
fi
done
./autogen.sh
./configure
VERSION=$(sed -n -e '/^PACKAGE_VERSION =/ s/^PACKAGE_VERSION = // p' Makefile)
make distcheck-release
fakeroot make dist-release
tar tzvf ganeti-$VERSION.tar.gz
echo
echo 'MD5:'
md5sum ganeti-$VERSION.tar.gz
echo
echo 'SHA1:'
sha1sum ganeti-$VERSION.tar.gz
echo
echo "The archive is at $PWD/ganeti-$VERSION.tar.gz"
echo "Please copy it and remove the temporary directory when done."
ganeti-2.15.2/devel/review 0000755 0000000 0000000 00000012237 12634264163 0015410 0 ustar 00root root 0000000 0000000 #!/bin/bash
# Copyright (C) 2009 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# To set user mappings, use this command:
# git config gnt-review.johndoe 'John Doe <john.doe@example.com>'
# To disable strict mode (enabled by default):
# git config gnt-review.strict false
# To enable strict mode:
# git config gnt-review.strict true
set -e
# Get absolute path to myself
me_plain="$0"
me=$(readlink -f "$me_plain")
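# Insert a "Reviewed-by:" placeholder (taken from $REVIEWER if set) right
# after the Signed-off-by block of the commit message, unless one is already
# present.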
add_reviewed_by() {
local msgfile="$1"
grep -q '^Reviewed-by: ' "$msgfile" && return
perl -i -e '
my $reviewer = $ENV{"REVIEWER"};
defined($reviewer) or $reviewer = "";
my $sob = 0;
while (<>) {
if ($sob == 0 and m/^Signed-off-by:/) {
$sob = 1;
} elsif ($sob == 1 and not m/^Signed-off-by:/) {
print "Reviewed-by: $reviewer\n";
$sob = -1;
}
print;
}
if ($sob == 1) {
print "Reviewed-by: $reviewer\n";
}
' "$msgfile"
}
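# Expand the names on "Reviewed-by:" lines via the gnt-review.<alias> git
# config mappings, de-duplicate and sort them; in strict mode, fail unless
# every reviewer matches the "Name <email>" form.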
replace_users() {
local msgfile="$1"
if perl -i -e '
use strict;
use warnings;
my $error = 0;
my $strict;
sub map_username {
my ($name) = @_;
return $name unless $name;
my @cmd = ("git", "config", "--get", "gnt-review.$name");
open(my $fh, "-|", @cmd) or die "Command \"@cmd\" failed: $!";
my $output = do { local $/ = undef; <$fh> };
close($fh);
if ($? == 0) {
chomp $output;
$output =~ s/\s+/ /;
return $output;
}
unless (defined $strict) {
@cmd = ("git", "config", "--get", "--bool", "gnt-review.strict");
open($fh, "-|", @cmd) or die "Command \"@cmd\" failed: $!";
$output = do { local $/ = undef; <$fh> };
close($fh);
$strict = ($? != 0 or not $output or $output !~ m/^false$/);
}
if ($strict and $name !~ m/^.+<.+\@.+>$/) {
$error = 1;
}
return $name;
}
while (<>) {
if (m/^Reviewed-by:(.*)$/) {
my @names = grep {
# Ignore empty entries
!/^$/
} map {
# Normalize whitespace
$_ =~ s/(^\s+|\s+$)//g;
$_ =~ s/\s+/ /g;
# Map names
$_ = map_username($_);
$_;
} split(m/,/, $1);
# Get unique names
my %saw;
@names = grep(!$saw{$_}++, @names);
undef %saw;
foreach (sort @names) {
print "Reviewed-by: $_\n";
}
} else {
print;
}
}
exit($error? 33 : 0);
' "$msgfile"
then
:
else
[[ "$?" == 33 ]] && return 1
exit 1
fi
if ! grep -q '^Reviewed-by: ' "$msgfile"
then
echo 'Missing Reviewed-by: line' >&2
sleep 1
return 1
fi
return 0
}
run_editor() {
local filename="$1"
local editor=${EDITOR:-vi}
local args
case "$(basename "$editor")" in
vi* | *vim)
# Start edit mode at Reviewed-by: line
args='+/^Reviewed-by: +nohlsearch +startinsert!'
;;
*)
args=
;;
esac
$editor $args "$filename"
}
commit_editor() {
local msgfile="$1"
local tmpf=$(mktemp)
trap "rm -f $tmpf" EXIT
cp "$msgfile" "$tmpf"
while :
do
add_reviewed_by "$tmpf"
run_editor "$tmpf"
replace_users "$tmpf" && break
done
cp "$tmpf" "$msgfile"
}
copy_commit() {
local rev="$1" target_branch="$2"
echo "Copying commit $rev ..."
git cherry-pick -n "$rev"
GIT_EDITOR="$me --commit-editor \"\$@\"" git commit -c "$rev" -s
}
usage() {
echo "Usage: $me_plain [from..to] " >&2
echo " If not passed from..to defaults to target-branch..HEAD" >&2
exit 1
}
main() {
local range target_branch
case "$#" in
1)
target_branch="$1"
range="$target_branch..$(git rev-parse HEAD)"
;;
2)
range="$1"
target_branch="$2"
if [[ "$range" != *..* ]]; then
usage
fi
;;
*)
usage
;;
esac
git checkout "$target_branch"
local old_head=$(git rev-parse HEAD)
for rev in $(git rev-list --reverse "$range")
do
copy_commit "$rev"
done
git log "$old_head..$target_branch"
}
if [[ "$1" == --commit-editor ]]
then
shift
commit_editor "$@"
else
main "$@"
fi
ganeti-2.15.2/devel/upload 0000755 0000000 0000000 00000010655 12634264163 0015375 0 ustar 00root root 0000000 0000000 #!/bin/bash
# Copyright (C) 2006, 2007, 2008, 2009, 2010, 2012, 2013 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# This is a test script to ease development and testing on test clusters.
# It should not be used to update production environments.
# Usage: upload node-{1,2,3}
# it will upload the python libraries to
# $prefix/lib/python2.X/dist-packages/ganeti and the command line utils to
# $prefix/sbin. It needs passwordless root login to the nodes.
set -e -u
usage() {
echo "Usage: $0 [--no-restart] [--no-cron] [--no-debug] hosts..." >&2
exit $1
}
declare -r SED="sed -f autotools/replace_vars.sed"
NO_RESTART=
NO_CRON=
NO_DEBUG=
hosts=
while [ "$#" -gt 0 ]; do
opt="$1"
case "$opt" in
--no-restart)
NO_RESTART=1
;;
--no-cron)
NO_CRON=1
;;
--no-debug)
NO_DEBUG=1
;;
-h|--help)
usage 0
;;
-*)
echo "Unknown option: $opt" >&2
usage 1
;;
*)
hosts="$hosts $opt"
;;
esac
shift
done
if [ -z "$hosts" ]; then
usage 1
fi
set ${hosts}
make regen-vcs-version
TXD=`mktemp -d`
trap 'rm -rf $TXD' EXIT
if [[ -f /proc/cpuinfo ]]; then
cpu_count=$(grep -E -c '^processor[[:space:]]*:' /proc/cpuinfo)
make_args=-j$(( cpu_count + 1 ))
else
make_args=
fi
# Make sure that directories will get correct permissions
umask 0022
# install ganeti as a real tree
make $make_args install DESTDIR="$TXD"
# at this point, make has been finished, so the configuration is
# fixed; we can read the prefix vars/etc.
PREFIX="$(echo @PREFIX@ | $SED)"
SYSCONFDIR="$(echo @SYSCONFDIR@ | $SED)"
LIBDIR="$(echo @LIBDIR@ | $SED)"
PKGLIBDIR="$(echo @PKGLIBDIR@ | $SED)"
# copy additional needed files
[ -f doc/examples/ganeti.initd ] && \
install -D --mode=0755 doc/examples/ganeti.initd \
"$TXD/$SYSCONFDIR/init.d/ganeti"
[ -f doc/examples/ganeti.logrotate ] && \
install -D --mode=0755 doc/examples/ganeti.logrotate \
"$TXD/$SYSCONFDIR/logrotate.d/ganeti"
[ -f doc/examples/ganeti-master-role.ocf ] && \
install -D --mode=0755 doc/examples/ganeti-master-role.ocf \
"$TXD/$LIBDIR/ocf/resource.d/ganeti/ganeti-master-role"
[ -f doc/examples/ganeti-node-role.ocf ] && \
install -D --mode=0755 doc/examples/ganeti-node-role.ocf \
"$TXD/$LIBDIR/ocf/resource.d/ganeti/ganeti-node-role"
[ -f doc/examples/ganeti.default-debug -a -z "$NO_DEBUG" ] && \
install -D --mode=0644 doc/examples/ganeti.default-debug \
"$TXD/$SYSCONFDIR/default/ganeti"
[ -f doc/examples/bash_completion-debug ] && \
install -D --mode=0644 doc/examples/bash_completion-debug \
"$TXD/$SYSCONFDIR/bash_completion.d/ganeti"
if [ -f doc/examples/ganeti.cron -a -z "$NO_CRON" ]; then
install -D --mode=0644 doc/examples/ganeti.cron \
"$TXD/$SYSCONFDIR/cron.d/ganeti"
fi
echo ---
( cd "$TXD" && find; )
echo ---
# and now put it under $prefix on the target node(s)
for host; do
echo Uploading code to ${host}...
rsync -v -rlKDc \
-e "ssh -oBatchMode=yes" \
--exclude="*.py[oc]" --exclude="*.pdf" --exclude="*.html" \
"$TXD/" \
root@${host}:/ &
done
wait
if test -z "${NO_RESTART}"; then
for host; do
echo Restarting ganeti-noded on ${host}...
ssh -oBatchMode=yes root@${host} $SYSCONFDIR/init.d/ganeti restart &
done
wait
fi
ganeti-2.15.2/devel/webserver 0000755 0000000 0000000 00000004232 12634264163 0016107 0 ustar 00root root 0000000 0000000 #!/usr/bin/python
#
# Copyright (C) 2013 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import sys
import BaseHTTPServer
import SimpleHTTPServer
def main():
if len(sys.argv) == 2:
host = "127.0.0.1"
(_, port) = sys.argv
elif len(sys.argv) == 3:
(_, port, host) = sys.argv
else:
sys.stderr.write("Usage: %s []\n" % sys.argv[0])
sys.stderr.write("\n")
sys.stderr.write("Provides an HTTP server on the specified TCP port")
sys.stderr.write(" exporting the current working directory. Binds to")
sys.stderr.write(" localhost by default.\n")
sys.exit(1)
try:
port = int(port)
except (ValueError, TypeError), err:
sys.stderr.write("Invalid port '%s': %s\n" % (port, err))
sys.exit(1)
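# Serve files from the current working directory using the stock
# SimpleHTTPRequestHandler until interrupted.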
handler = SimpleHTTPServer.SimpleHTTPRequestHandler
server = BaseHTTPServer.HTTPServer((host, port), handler)
server.serve_forever()
if __name__ == "__main__":
main()
ganeti-2.15.2/doc/ 0000755 0000000 0000000 00000000000 12634264163 0013622 5 ustar 00root root 0000000 0000000 ganeti-2.15.2/doc/admin.rst 0000644 0000000 0000000 00000224051 12634264163 0015450 0 ustar 00root root 0000000 0000000 Ganeti administrator's guide
============================
Documents Ganeti version |version|
.. contents::
.. highlight:: shell-example
Introduction
------------
Ganeti is virtualization cluster management software. You are expected
to be a system administrator familiar with your Linux distribution and
the Xen or KVM virtualization environments before using it.
The various components of Ganeti all have man pages and interactive
help. This manual, though, will help you get familiar with the system
by explaining the most common operations, grouped by related use.
After a terminology glossary and a section on the prerequisites needed
to use this manual, the rest of this document is divided in sections
for the different targets that a command affects: instance, nodes, etc.
.. _terminology-label:
Ganeti terminology
++++++++++++++++++
This section provides a small introduction to Ganeti terminology, which
might be useful when reading the rest of the document.
Cluster
~~~~~~~
A set of machines (nodes) that cooperate to offer a coherent, highly
available virtualization service under a single administration domain.
Node
~~~~
A physical machine which is member of a cluster. Nodes are the basic
cluster infrastructure, and they don't need to be fault tolerant in
order to achieve high availability for instances.
Nodes can be added and removed (if they host no instances) at will from
the cluster. In an HA cluster and only with HA instances, the loss of any
single node will not cause disk data loss for any instance; of course,
a node crash will cause the crash of its primary instances.
A node belonging to a cluster can be in one of the following roles at a
given time:
- *master* node, which is the node from which the cluster is controlled
- *master candidate* node, only nodes in this role have the full cluster
configuration and knowledge, and only master candidates can become the
master node
- *regular* node, which is the state in which most nodes will be on
bigger clusters (>20 nodes)
- *drained* node, nodes in this state are functioning normally but they
cannot receive new instances; the intention is that nodes in this role
have some issue and they are being evacuated for hardware repairs
- *offline* node, in which there is a record in the cluster
configuration about the node, but the daemons on the master node will
not talk to this node; any instances declared as having an offline
node as either primary or secondary will be flagged as an error in the
cluster verify operation
Depending on the role, each node will run a set of daemons:
- the :command:`ganeti-noded` daemon, which controls the manipulation of
this node's hardware resources; it runs on all nodes which are in a
cluster
- the :command:`ganeti-confd` daemon (Ganeti 2.1+) which runs on all
nodes, but is only functional on master candidate nodes; this daemon
can be disabled at configuration time if you don't need its
functionality
- the :command:`ganeti-rapi` daemon which runs on the master node and
offers an HTTP-based API for the cluster
- the :command:`ganeti-masterd` daemon which runs on the master node and
allows control of the cluster
Beside the node role, there are other node flags that influence its
behaviour:
- the *master_capable* flag denotes whether the node can ever become a
master candidate; setting this to 'no' means that auto-promotion will
never make this node a master candidate; this flag can be useful for a
remote node that only runs local instances, and having it become a
master is impractical due to networking or other constraints
- the *vm_capable* flag denotes whether the node can host instances or
not; for example, one might use a non-vm_capable node just as a master
candidate, for configuration backups; setting this flag to no
disallows placement of instances on this node, deactivates hypervisor
and related checks on it (e.g. bridge checks, LVM check, etc.), and
removes it from cluster capacity computations
Instance
~~~~~~~~
A virtual machine which runs on a cluster. It can be a fault tolerant,
highly available entity.
An instance has various parameters, which are classified in three
categories: hypervisor related-parameters (called ``hvparams``), general
parameters (called ``beparams``) and per network-card parameters (called
``nicparams``). All these parameters can be modified either at instance
level or via defaults at cluster level.
Disk template
~~~~~~~~~~~~~
There are multiple options for the storage provided to an instance; while
the instance sees the same virtual drive in all cases, the node-level
configuration varies between them.
There are several disk templates you can choose from:
``diskless``
The instance has no disks. Only used for special purpose operating
systems or for testing.
``file`` *****
The instance will use plain files as backend for its disks. No
redundancy is provided, and this is somewhat more difficult to
configure for high performance.
``sharedfile`` *****
The instance will use plain files as backend, but Ganeti assumes that
those files will be available and in sync automatically on all nodes.
This allows live migration and failover of instances using this
method.
``plain``
The instance will use LVM devices as backend for its disks. No
redundancy is provided.
``drbd``
.. note:: This is only valid for multi-node clusters using DRBD 8.0+
A mirror is set between the local node and a remote one, which must be
specified with the second value of the --node option. Use this option
to obtain a highly available instance that can be failed over to a
remote node should the primary one fail.
.. note:: Ganeti does not support DRBD stacked devices:
DRBD stacked setup is not fully symmetric and as such it is
not working with live migration.
``rbd``
The instance will use Volumes inside a RADOS cluster as backend for its
disks. It will access them using the RADOS block device (RBD).
``gluster`` *****
The instance will use a Gluster volume for instance storage. Disk
images will be stored in the top-level ``ganeti/`` directory of the
volume. This directory will be created automatically for you.
``ext``
The instance will use an external storage provider. See
:manpage:`ganeti-extstorage-interface(7)` for how to implement one.
.. note::
Disk templates marked with an asterisk require Ganeti to access the
file system. Ganeti will refuse to do so unless you whitelist the
relevant paths in the file storage paths configuration which,
with default configure-time paths is located
in :pyeval:`pathutils.FILE_STORAGE_PATHS_FILE`.
The default paths used by Ganeti are:
=============== ===================================================
Disk template Default path
=============== ===================================================
``file`` :pyeval:`pathutils.DEFAULT_FILE_STORAGE_DIR`
``sharedfile`` :pyeval:`pathutils.DEFAULT_SHARED_FILE_STORAGE_DIR`
``gluster`` :pyeval:`pathutils.DEFAULT_GLUSTER_STORAGE_DIR`
=============== ===================================================
Those paths can be changed at ``gnt-cluster init`` time. See
:manpage:`gnt-cluster(8)` for details.
IAllocator
~~~~~~~~~~
A framework for using external (user-provided) scripts to compute the
placement of instances on the cluster nodes. This eliminates the need to
manually specify nodes in instance add, instance moves, node evacuate,
etc.
In order for Ganeti to be able to use these scripts, they must be placed
in the iallocator directory (usually ``lib/ganeti/iallocators`` under
the installation prefix, e.g. ``/usr/local``).
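For example, assuming the ``hail`` allocator shipped with Ganeti's htools is
installed, an instance could be placed automatically (instead of passing
``-n``) with a command along these lines::
$ gnt-instance add -I hail -o %OS_TYPE% -t drbd -s %DISK_SIZE% %INSTANCE_NAME%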
“Primary” and “secondary” concepts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An instance has a primary node and, depending on the disk configuration, might
also have a secondary node. The instance always runs on the primary node
and only uses its secondary node for disk replication.
Similarly, the term of primary and secondary instances when talking
about a node refers to the set of instances having the given node as
primary, respectively secondary.
Tags
~~~~
Tags are short strings that can be attached either to the cluster itself,
or to nodes or instances. They are useful as a very simplistic
information store for helping with cluster administration, for example
by attaching owner information to each instance after it's created::
$ gnt-instance add … %instance1%
$ gnt-instance add-tags %instance1% %owner:user2%
And then by listing each instance and its tags, this information could
be used for contacting the users of each instance.
Jobs and OpCodes
~~~~~~~~~~~~~~~~
While not directly visible to an end-user, it's useful to know that a
basic cluster operation (e.g. starting an instance) is represented
internally by Ganeti as an *OpCode* (abbreviation from operation
code). These OpCodes are executed as part of a *Job*. The OpCodes in a
single Job are processed serially by Ganeti, but different Jobs will be
processed (depending on resource availability) in parallel. They will
not be executed in the submission order, but depending on resource
availability, locks and (starting with Ganeti 2.3) priority. An earlier
job may have to wait for a lock while a newer job doesn't need any locks
and can be executed right away. Operations requiring a certain order
need to be submitted as a single job, or the client must submit one job
at a time and wait for it to finish before continuing.
For example, shutting down the entire cluster can be done by running the
command ``gnt-instance shutdown --all``, which will submit for each
instance a separate job containing the “shutdown instance” OpCode.
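Jobs can be inspected from the master node with the ``gnt-job`` command, for
example::
$ gnt-job list
$ gnt-job info %JOB_ID%
See :manpage:`gnt-job(8)` for the full set of job operations.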
Prerequisites
+++++++++++++
You need to have your Ganeti cluster installed and configured before you
try any of the commands in this document. Please follow the
:doc:`install` for instructions on how to do that.
Instance management
-------------------
Adding an instance
++++++++++++++++++
The add operation might seem complex due to the many parameters it
accepts, but once you have understood the (few) required parameters and
the customisation capabilities you will see it is an easy operation.
The add operation requires at minimum five parameters:
- the OS for the instance
- the disk template
- the disk count and size
- the node specification or alternatively the iallocator to use
- and finally the instance name
The OS for the instance must be visible in the output of the command
``gnt-os list`` and specifies which guest OS to install on the instance.
The disk template specifies what kind of storage to use as backend for
the (virtual) disks presented to the instance; note that for instances
with multiple virtual disks, they all must be of the same type.
The node(s) on which the instance will run can be given either manually,
via the ``-n`` option, or computed automatically by Ganeti, if you have
installed any iallocator script.
With the above parameters in mind, the command is::
$ gnt-instance add \
-n %TARGET_NODE%:%SECONDARY_NODE% \
-o %OS_TYPE% \
-t %DISK_TEMPLATE% -s %DISK_SIZE% \
%INSTANCE_NAME%
The instance name must be resolvable (e.g. exist in DNS) and usually
points to an address in the same subnet as the cluster itself.
The above command has the minimum required options; other options you
can give include, among others:
- The maximum/minimum memory size (``-B maxmem``, ``-B minmem``)
(``-B memory`` can be used to specify only one size)
- The number of virtual CPUs (``-B vcpus``)
- Arguments for the NICs of the instance; by default, a single-NIC
instance is created. The IP and/or bridge of the NIC can be changed
via ``--net 0:ip=IP,link=BRIDGE``
See :manpage:`ganeti-instance(8)` for the detailed option list.
For example, if you want to create a highly available instance, with a
single disk of 50GB and the default memory size, having primary node
``node1`` and secondary node ``node3``, use the following command::
$ gnt-instance add -n node1:node3 -o debootstrap -t drbd -s 50G \
instance1
There is also a command for batch instance creation from a
specification file, see the ``batch-create`` operation in the
gnt-instance manual page.
Regular instance operations
+++++++++++++++++++++++++++
Removal
~~~~~~~
Removing an instance is even easier than creating one. This operation is
irreversible and destroys all the contents of your instance. Use with
care::
$ gnt-instance remove %INSTANCE_NAME%
.. _instance-startup-label:
Startup/shutdown
~~~~~~~~~~~~~~~~
Instances are automatically started at instance creation time. To
manually start one which is currently stopped you can run::
$ gnt-instance startup %INSTANCE_NAME%
Ganeti will start an instance with up to its maximum instance memory. If
not enough memory is available Ganeti will use all the available memory
down to the instance minimum memory. If not even that amount of memory
is free Ganeti will refuse to start the instance.
Note that this will not work when an instance is in the permanently
stopped state ``offline``. In this case, you will first have to
put it back to online mode by running::
$ gnt-instance modify --online %INSTANCE_NAME%
The command to stop the running instance is::
$ gnt-instance shutdown %INSTANCE_NAME%
If you want to shut the instance down more permanently, so that it
does not require dynamically allocated resources (memory and vcpus),
after shutting down an instance, execute the following::
$ gnt-instance modify --offline %INSTANCE_NAME%
.. warning:: Do not use the Xen or KVM commands directly to stop
instances. If you run for example ``xm shutdown`` or ``xm destroy``
on an instance Ganeti will automatically restart it (via
the :command:`ganeti-watcher(8)` command which is launched via cron).
Instances can also be shut down by the user from within the instance, in
which case they will be marked accordingly and the
:command:`ganeti-watcher(8)` will not restart them. See
:manpage:`gnt-cluster(8)` for details.
Querying instances
~~~~~~~~~~~~~~~~~~
There are two ways to get information about instances: listing
instances, which does a tabular output containing a given set of fields
about each instance, and querying detailed information about a set of
instances.
The command to see all the instances configured and their status is::
$ gnt-instance list
The command can return a custom set of information when using the ``-o``
option (as always, check the manpage for a detailed specification). Each
instance will be represented on a line, thus making it easy to parse
this output via the usual shell utilities (grep, sed, etc.).
To get more detailed information about an instance, you can run::
$ gnt-instance info %INSTANCE%
which will give a multi-line block of information about the instance,
its hardware resources (especially its disks and their redundancy
status), etc. This is harder to parse and is more expensive than the
list operation, but returns much more detailed information.
Changing an instance's runtime memory
+++++++++++++++++++++++++++++++++++++
Ganeti will always make sure an instance has a value between its maximum
and its minimum memory available as runtime memory. As of version 2.6
Ganeti will only choose a size different than the maximum size when
starting up, failing over, or migrating an instance on a node with less
than the maximum memory available. It won't resize other instances in
order to free up space for an instance.
If you find that you need more memory on a node any instance can be
manually resized without downtime, with the command::
$ gnt-instance modify -m %SIZE% %INSTANCE_NAME%
The same command can also be used to increase the memory available on an
instance, provided that enough free memory is available on its node, and
the specified size is not larger than the maximum memory size the
instance had when it was first booted (an instance will be unable to see
new memory above the maximum that was specified to the hypervisor at its
boot time; if it needs to grow further, a reboot becomes necessary).
Export/Import
+++++++++++++
You can create a snapshot of an instance disk and its Ganeti
configuration, which then you can backup, or import into another
cluster. The way to export an instance is::
$ gnt-backup export -n %TARGET_NODE% %INSTANCE_NAME%
The target node can be any node in the cluster with enough space under
``/srv/ganeti`` to hold the instance image. Use the ``--noshutdown``
option to snapshot an instance without rebooting it. Note that Ganeti
only keeps one snapshot for an instance - any previous snapshot of the
same instance existing cluster-wide under ``/srv/ganeti`` will be
removed by this operation: if you want to keep them, you need to move
them out of the Ganeti exports directory.
Importing an instance is similar to creating a new one, but additionally
one must specify the location of the snapshot. The command is::
$ gnt-backup import -n %TARGET_NODE% \
--src-node=%NODE% --src-dir=%DIR% %INSTANCE_NAME%
By default, parameters will be read from the export information, but you
can of course pass them in via the command line - most of the options
available for the command :command:`gnt-instance add` are supported here
too.
Import of foreign instances
+++++++++++++++++++++++++++
There is a possibility to import a foreign instance whose disk data is
already stored as LVM volumes without going through copying it: the disk
adoption mode.
For this, ensure that the original, non-managed instance is stopped,
then create a Ganeti instance in the usual way, except that instead of
passing the disk information you specify the current volumes::
$ gnt-instance add -t plain -n %HOME_NODE% ... \
--disk 0:adopt=%lv_name%[,vg=%vg_name%] %INSTANCE_NAME%
This will take over the given logical volumes, rename them to the Ganeti
standard (UUID-based), and without installing the OS on them start
directly the instance. If you configure the hypervisor similar to the
non-managed configuration that the instance had, the transition should
be seamless for the instance. For more than one disk, just pass another
disk parameter (e.g. ``--disk 1:adopt=...``).
Instance kernel selection
+++++++++++++++++++++++++
The kernel that instances use to boot up can come either from the node
or from the instances themselves, depending on the setup.
Xen-PVM
~~~~~~~
With Xen PVM, there are three options.
First, you can use a kernel from the node, by setting the hypervisor
parameters as follows (see the example after this list):
- ``kernel_path`` to a valid file on the node (and appropriately
``initrd_path``)
- ``kernel_args`` optionally set to a valid Linux setting (e.g. ``ro``)
- ``root_path`` to a valid setting (e.g. ``/dev/xvda1``)
- ``bootloader_path`` and ``bootloader_args`` to empty
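For example (a sketch only, with hypothetical kernel and initrd paths), these
parameters could be set on an existing instance with::
$ gnt-instance modify \
-H kernel_path=/boot/vmlinuz-xenU,initrd_path=/boot/initrd-xenU,kernel_args=ro,root_path=/dev/xvda1 \
%INSTANCE_NAME%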
Alternatively, you can delegate the kernel management to instances, and
use either ``pvgrub`` or the deprecated ``pygrub``. For this, you must
install the kernels and initrds in the instance and create a valid GRUB
v1 configuration file.
For ``pvgrub`` (new in version 2.4.2), you need to set:
- ``kernel_path`` to point to the ``pvgrub`` loader present on the node
(e.g. ``/usr/lib/xen/boot/pv-grub-x86_32.gz``)
- ``kernel_args`` to the path to the GRUB config file, relative to the
instance (e.g. ``(hd0,0)/grub/menu.lst``)
- ``root_path`` **must** be empty
- ``bootloader_path`` and ``bootloader_args`` to empty
While ``pygrub`` is deprecated, here is how you can configure it:
- ``bootloader_path`` to the pygrub binary (e.g. ``/usr/bin/pygrub``)
- the other settings are not important
More information can be found in the Xen wiki pages for pvgrub and pygrub.
KVM
~~~
For KVM also the kernel can be loaded either way.
For loading the kernels from the node, you need to set:
- ``kernel_path`` to a valid value
- ``initrd_path`` optionally set if you use an initrd
- ``kernel_args`` optionally set to a valid value (e.g. ``ro``)
If you want instead to have the instance boot from its disk (and execute
its bootloader), simply set the ``kernel_path`` parameter to an empty
string, and all the others will be ignored.
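A minimal sketch of the latter, assuming an empty value is accepted for the
parameter on your cluster, could look like::
$ gnt-instance modify -H kernel_path= %INSTANCE_NAME%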
Instance HA features
--------------------
.. note:: This section only applies to multi-node clusters
.. _instance-change-primary-label:
Changing the primary node
+++++++++++++++++++++++++
There are three ways to exchange an instance's primary and secondary
nodes; the right one to choose depends on how the instance has been
created and the status of its current primary node. See
:ref:`rest-redundancy-label` for information on changing the secondary
node. Note that it's only possible to change the primary node to the
secondary and vice-versa; a direct change of the primary node with a
third node, while keeping the current secondary is not possible in a
single step, only via multiple operations as detailed in
:ref:`instance-relocation-label`.
Failing over an instance
~~~~~~~~~~~~~~~~~~~~~~~~
If an instance is built in highly available mode you can at any time
fail it over to its secondary node, even if the primary has somehow
failed and it's not up anymore. Doing it is really easy, on the master
node you can just run::
$ gnt-instance failover %INSTANCE_NAME%
That's it. After the command completes the secondary node is now the
primary, and vice-versa.
The instance will be started with an amount of memory between its
``maxmem`` and its ``minmem`` value, depending on the free memory on its
target node, or the operation will fail if that's not possible. See
:ref:`instance-startup-label` for details.
If the instance's disk template is of type rbd, then you can specify
the target node (which can be any node) explicitly, or specify an
iallocator plugin. If you omit both, the default iallocator will be
used to determine the target node::
$ gnt-instance failover -n %TARGET_NODE% %INSTANCE_NAME%
Live migrating an instance
~~~~~~~~~~~~~~~~~~~~~~~~~~
If an instance is built in highly available mode, it currently runs and
both its nodes are running fine, you can migrate it over to its
secondary node, without downtime. On the master node you need to run::
$ gnt-instance migrate %INSTANCE_NAME%
The current load on the instance and its memory size will influence how
long the migration will take. In any case, for both KVM and Xen
hypervisors, the migration will be transparent to the instance.
If the destination node has less memory than the instance's current
runtime memory, but at least the instance's minimum memory available
Ganeti will automatically reduce the instance runtime memory before
migrating it, unless the ``--no-runtime-changes`` option is passed, in
which case the target node should have at least the instance's current
runtime memory free.
If the instance's disk template is of type rbd, then you can specify
the target node (which can be any node) explicitly, or specify an
iallocator plugin. If you omit both, the default iallocator will be
used to determine the target node::
$ gnt-instance migrate -n %TARGET_NODE% %INSTANCE_NAME%
Moving an instance (offline)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If an instance has not been created as mirrored, then the only way to
change its primary node is to execute the move command::
$ gnt-instance move -n %NEW_NODE% %INSTANCE%
This has a few prerequisites:
- the instance must be stopped
- its current primary node must be on-line and healthy
- the disks of the instance must not have any errors
Since this operation actually copies the data from the old node to the
new node, expect it to take time proportional to the size of the instance's
disks and the speed of both the nodes' I/O system and their networking.
Disk operations
+++++++++++++++
Disk failures are a common cause of errors in any server
deployment. Ganeti offers protection from single-node failure if your
instances were created in HA mode, and it also offers ways to restore
redundancy after a failure.
Preparing for disk operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It is important to note that for Ganeti to be able to do any disk
operation, the Linux machines on top of which Ganeti runs must be
consistent; for LVM, this means that the LVM commands must not return
failures; it is common that after a complete disk failure, any LVM
command aborts with an error similar to::
$ vgs
/dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdb1: read failed after 0 of 4096 at 750153695232: Input/output error
/dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
Couldn't find device with uuid 't30jmN-4Rcf-Fr5e-CURS-pawt-z0jU-m1TgeJ'.
Couldn't find all physical volumes for volume group xenvg.
Before restoring an instance's disks to healthy status, it's needed to
fix the volume group used by Ganeti so that we can actually create and
manage the logical volumes. This is usually done in a multi-step
process:
#. first, if the disk is completely gone and LVM commands exit with
“Couldn't find device with uuid…” then you need to run the command::
$ vgreduce --removemissing %VOLUME_GROUP%
#. after the above command, the LVM commands should be executing
normally (warnings are normal, but the commands will not fail
completely).
#. if the failed disk is still visible in the output of the ``pvs``
command, you need to deactivate it from allocations by running::
$ pvs -x n /dev/%DISK%
At this point, the volume group should be consistent and any bad
physical volumes should no longer be available for allocation.
Note that since version 2.1 Ganeti provides some commands to automate
these two operations, see :ref:`storage-units-label`.
.. _rest-redundancy-label:
Restoring redundancy for DRBD-based instances
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A DRBD instance has two nodes, and the storage on one of them has
failed. Depending on which node (primary or secondary) has failed, you
have three options at hand:
- if the storage on the primary node has failed, you need to re-create
the disks on it
- if the storage on the secondary node has failed, you can either
re-create the disks on it or change the secondary and recreate
redundancy on the new secondary node
Of course, at any point it's possible to force re-creation of disks even
though everything is already fine.
For all three cases, the ``replace-disks`` operation can be used::
# re-create disks on the primary node
$ gnt-instance replace-disks -p %INSTANCE_NAME%
# re-create disks on the current secondary
$ gnt-instance replace-disks -s %INSTANCE_NAME%
# change the secondary node, via manual specification
$ gnt-instance replace-disks -n %NODE% %INSTANCE_NAME%
# change the secondary node, via an iallocator script
$ gnt-instance replace-disks -I %SCRIPT% %INSTANCE_NAME%
# since Ganeti 2.1: automatically fix the primary or secondary node
$ gnt-instance replace-disks -a %INSTANCE_NAME%
Since the process involves copying all data from the working node to the
target node, it will take a while, depending on the instance's disk
size, node I/O system and network speed. But it is (barring any network
interruption) completely transparent for the instance.
Re-creating disks for non-redundant instances
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. versionadded:: 2.1
For non-redundant instances, there isn't a copy (except backups) to
re-create the disks. But it's possible to at-least re-create empty
disks, after which a reinstall can be run, via the ``recreate-disks``
command::
$ gnt-instance recreate-disks %INSTANCE%
Note that this will fail if the disks already exist. The instance can
be assigned to new nodes automatically by specifying an iallocator
through the ``--iallocator`` option.
Conversion of an instance's disk type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It is possible to convert between a non-redundant instance of type
``plain`` (LVM storage) and redundant ``drbd`` via the ``gnt-instance
modify`` command::
# start with a non-redundant instance
$ gnt-instance add -t plain ... %INSTANCE%
# later convert it to redundant
$ gnt-instance stop %INSTANCE%
$ gnt-instance modify -t drbd -n %NEW_SECONDARY% %INSTANCE%
$ gnt-instance start %INSTANCE%
# and convert it back
$ gnt-instance stop %INSTANCE%
$ gnt-instance modify -t plain %INSTANCE%
$ gnt-instance start %INSTANCE%
The conversion must be done while the instance is stopped, and
converting from plain to drbd template presents a small risk, especially
if the instance has multiple disks and/or if one node fails during the
conversion procedure. As such, it's recommended (as always) to make
sure that downtime for manual recovery is acceptable and that the
instance has up-to-date backups.
Debugging instances
+++++++++++++++++++
Accessing an instance's disks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From an instance's primary node you can have access to its disks. Never
ever mount the underlying logical volume manually on a fault tolerant
instance, or will break replication and your data will be
inconsistent. The correct way to access an instance's disks is to run
(on the master node, as usual) the command::
$ gnt-instance activate-disks %INSTANCE%
And then, *on the primary node of the instance*, access the device that
gets created. For example, you could mount the given disks, then edit
files on the filesystem, etc.
Note that with partitioned disks (as opposed to whole-disk filesystems),
you will need to use a tool like :manpage:`kpartx(8)`::
# on node1
$ gnt-instance activate-disks %instance1%
node3:disk/0:…
$ ssh node3
# on node 3
$ kpartx -l /dev/…
$ kpartx -a /dev/…
$ mount /dev/mapper/… /mnt/
# edit files under mnt as desired
$ umount /mnt/
$ kpartx -d /dev/…
$ exit
# back to node 1
After you've finished you can deactivate them with the deactivate-disks
command, which works in the same way::
$ gnt-instance deactivate-disks %INSTANCE%
Note that if any process started by you is still using the disks, the
above command will error out, and you **must** clean up and ensure that
the above command runs successfully before you start the instance,
otherwise the instance will suffer corruption.
Accessing an instance's console
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The command to access a running instance's console is::
$ gnt-instance console %INSTANCE_NAME%
Use the console normally and then type ``^]`` when done, to exit.
Other instance operations
+++++++++++++++++++++++++
Reboot
~~~~~~
There is a wrapper command for rebooting instances::
$ gnt-instance reboot %instance2%
By default, this does the equivalent of shutting down and then starting
the instance, but it accepts parameters to perform a soft-reboot (via
the hypervisor), a hard reboot (hypervisor shutdown and then startup) or
a full one (the default, which also de-configures and then configures
again the disks of the instance).
Instance OS definitions debugging
+++++++++++++++++++++++++++++++++
Should you have any problems with instance operating systems, the command
to see a complete status for all your nodes is::
$ gnt-os diagnose
.. _instance-relocation-label:
Instance relocation
~~~~~~~~~~~~~~~~~~~
While it is not possible to move an instance from nodes ``(A, B)`` to
nodes ``(C, D)`` in a single move, it is possible to do so in a few
steps::
# instance is located on A, B
$ gnt-instance replace-disks -n %nodeC% %instance1%
# instance has moved from (A, B) to (A, C)
# we now flip the primary/secondary nodes
$ gnt-instance migrate %instance1%
# instance lives on (C, A)
# we can then change A to D via:
$ gnt-instance replace-disks -n %nodeD% %instance1%
Which brings it into the final configuration of ``(C, D)``. Note that we
needed to do two replace-disks operation (two copies of the instance
disks), because we needed to get rid of both the original nodes (A and
B).
Network Management
------------------
Ganeti used to describe NICs of an instance with an IP, a MAC, a connectivity
link and mode. This had three major shortcomings:
* there was no easy way to assign a unique IP to an instance
* network info (subnet, gateway, domain, etc.) was not available on the target
node (kvm-ifup, hooks, etc.)
* one should explicitly pass L2 info (mode and link) to every NIC
Plus there was no easy way to get the current networking overview (which
instances are on the same L2 or L3 network, which IPs are reserved, etc).
All the above required an external management tool that has an overall view
and provides the corresponding info to Ganeti.
gnt-network aims to support a big part of this functionality inside Ganeti and
abstract the network as a separate entity. Currently, a Ganeti network
provides the following:
* A single IPv4 pool, subnet and gateway
* Connectivity info per nodegroup (mode, link)
* MAC prefix for each NIC inside the network
* IPv6 prefix/Gateway related to this network
* Tags
IP pool management ensures IP uniqueness inside this network. The user can
pass `ip=pool,network=test` and the new NIC will:
1. Get the first available IP in the pool
2. Inherit the connectivity mode and link of the network's netparams
3. Obtain the MAC prefix of the network
4. Have all network related info available as environment variables in
kvm-ifup scripts and hooks, so that they can dynamically manage all
networking-related setup on the host.
Hands on with gnt-network
+++++++++++++++++++++++++
To create a network do::
# gnt-network add --network=192.0.2.0/24 --gateway=192.0.2.1 test
Please see all other available options (--add-reserved-ips, --mac-prefix,
--network6, --gateway6, --tags).
Currently, IPv6 info is not used by Ganeti itself. It only gets exported
to NIC configuration scripts and hooks via environment variables.
To make this network available on a nodegroup you should specify the
connectivity mode and link during connection::
# gnt-network connect --nic-parameters mode=bridged,link=br100 test default nodegroup1
To add a NIC inside this network::
# gnt-instance modify --net -1:add,ip=pool,network=test inst1
This will let a NIC obtain a unique IP inside this network, and inherit the
nodegroup's netparams (bridged, br100). The IP here is optional. If missing, the
NIC will just get the L2 info.
To move an existing NIC from a network to another and remove its IP::
# gnt-instance modify --net -1:ip=none,network=test1 inst1
This will release the old IP from the old IP pool and the NIC will inherit the
new nicparams.
For the above actions there is an extra option `--no-conflicts-check`, which
skips checking for conflicting setups. Specifically:
1. When a network is added, IPs of nodes and master are not being checked.
2. When connecting a network on a nodegroup, IPs of instances inside this
nodegroup are not checked whether they reside inside the subnet or not.
3. When explicitly specifying an IP without passing a network, Ganeti will not
check if this IP is included inside any available network on the nodegroup.
External components
+++++++++++++++++++
All the aforementioned steps assure NIC configuration from the Ganeti
perspective. Of course this says nothing about how the instance will eventually
get the desired connectivity (IPv4, IPv6, default routes, DNS info, etc.) and
where the IP will resolve. This functionality is managed by the external
components.
Let's assume that the VM will need to obtain a dynamic IP via DHCP, get a SLAAC
address, and use DHCPv6 for other configuration information (in case RFC-6106
is not supported by the client, e.g. Windows). This means that the following
external services are needed:
1. A DHCP server
2. An IPv6 router sending Router Advertisements
3. A DHCPv6 server exporting DNS info
4. A dynamic DNS server
These components must be configured dynamically and on a per NIC basis.
The way to do this is by using custom kvm-ifup scripts and hooks.
snf-network
~~~~~~~~~~~
The snf-network package [1,3] includes custom scripts that will provide the
aforementioned functionality. `kvm-vif-bridge` and `vif-custom` are
alternatives to `kvm-ifup` and `vif-ganeti` that take into account all network
info being exported. Their actions depend on network tags. Specifically:
`dns`: will update an external DDNS server (nsupdate on a bind server)
`ip-less-routed`: will setup routes, rules and proxy ARP
This setup assumes a pre-existing routing table along with some local
configuration and provides connectivity to instances via an external
gateway/router without requiring nodes to have an IP inside this network.
`private-filtered`: will setup ebtables rules to ensure L2 isolation on a
common bridge. Only packets with the same MAC prefix will be forwarded to the
corresponding virtual interface.
`nfdhcpd`: will update an external DHCP server
nfdhcpd
~~~~~~~
snf-network works with nfdhcpd [2,3]: a custom user-space DHCP
server based on NFQUEUE. Currently, nfdhcpd replies to BOOTP/DHCP requests
originating from a tap or a bridge. Additionally, in case of a routed setup,
it provides RA-stateless configuration by responding to router and neighbour
solicitations along with DHCPv6 requests for DNS options. Its database is
dynamically updated using text files inside a local directory, watched with
inotify (snf-network just adds a per-NIC binding file with all relevant info
if the corresponding network tag is found). The relevant packets still need
to be mangled and sent to the corresponding NFQUEUE.
Known shortcomings
++++++++++++++++++
Currently, the following are known weak points of the gnt-network
design and implementation:
* Cannot define a network without an IP pool
* The pool defines the size of the network
* Reserved IPs must be defined explicitly (inconvenient for a big range)
* Cannot define an IPv6 only network
Future work
+++++++++++
Any upcoming patches should target:
* Separate L2, L3, IPv6, IP pool info
* Support a set of IP pools per network
* Make IP/network in NIC object take a list of entries
* Introduce external scripts for node configuration
(dynamically create/destroy bridges/routes upon network connect/disconnect)
[1] https://code.grnet.gr/git/snf-network
[2] https://code.grnet.gr/git/snf-nfdhcpd
[3] deb http://apt.dev.grnet.gr/ wheezy/
Node operations
---------------
There are far fewer node operations available than for instances, but
they are equally important for maintaining a healthy cluster.
Add/readd
+++++++++
It is at any time possible to extend the cluster with one more node, by
using the node add operation::
$ gnt-node add %NEW_NODE%
If the cluster has a replication network defined, then you need to pass
the ``-s REPLICATION_IP`` parameter to this option.
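For example, on a cluster with a separate replication network (the IP below
is just a placeholder)::
$ gnt-node add -s %192.0.2.21% %NEW_NODE%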
A variation of this command can be used to re-configure a node if its
Ganeti configuration is broken, for example if it has been reinstalled
by mistake::
$ gnt-node add --readd %EXISTING_NODE%
This will reinitialise the node as if it had been newly added, while
keeping its existing configuration in the cluster (primary/secondary IP,
etc.); in other words, you won't need to use ``-s`` here.
Changing the node role
++++++++++++++++++++++
A node can be in different roles, as explained in the
:ref:`terminology-label` section. Promoting a node to the master role is
special, while the other roles are handled all via a single command.
Failing over the master node
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you want to promote a different node to the master role (for whatever
reason), run on any other master-candidate node the command::
$ gnt-cluster master-failover
and the node you ran it on is now the new master. If you try to run
this on a node which is not a master candidate, you will get an error
telling you which nodes are valid.
Changing between the other roles
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ``gnt-node modify`` command can be used to select a new role::
# change to master candidate
$ gnt-node modify -C yes %NODE%
# change to drained status
$ gnt-node modify -D yes %NODE%
# change to offline status
$ gnt-node modify -O yes %NODE%
# change to regular mode (reset all flags)
$ gnt-node modify -O no -D no -C no %NODE%
Note that the cluster requires that at any point in time, a certain
number of nodes are master candidates, so changing from master candidate
to other roles might fail. It is recommended to either force the
operation (via the ``--force`` option) or first change the number of
master candidates in the cluster - see :ref:`cluster-config-label`.
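For example, one could first enlarge the candidate pool and only then demote
the node (the pool size below is only illustrative)::
$ gnt-cluster modify --candidate-pool-size=%15%
$ gnt-node modify -C no %NODE%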
Evacuating nodes
++++++++++++++++
There are two steps of moving instances off a node:
- moving the primary instances (actually converting them into secondary
instances)
- moving the secondary instances (including any instances converted in
the step above)
Primary instance conversion
~~~~~~~~~~~~~~~~~~~~~~~~~~~
For this step, you can use either individual instance move
commands (as seen in :ref:`instance-change-primary-label`) or the bulk
per-node versions; these are::
$ gnt-node migrate %NODE%
$ gnt-node evacuate -s %NODE%
Note that the instance “move” command doesn't currently have a node
equivalent.
Both these commands, or the equivalent per-instance command, will make
this node the secondary node for the respective instances, whereas their
current secondary node will become primary. Note that it is not possible
to change in one step the primary node to another node as primary, while
keeping the same secondary node.
Secondary instance evacuation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For the evacuation of secondary instances, a command called
:command:`gnt-node evacuate` is provided and its syntax is::
$ gnt-node evacuate -I %IALLOCATOR_SCRIPT% %NODE%
$ gnt-node evacuate -n %DESTINATION_NODE% %NODE%
The first version will compute the new secondary for each instance in
turn using the given iallocator script, whereas the second one will
simply move all instances to DESTINATION_NODE.
Removal
+++++++
Once a node no longer has any instances (neither primary nor secondary),
it's easy to remove it from the cluster::
$ gnt-node remove %NODE_NAME%
This will deconfigure the node, stop the Ganeti daemons on it and
hopefully leave it as it was before it joined the cluster.
Replication network changes
+++++++++++++++++++++++++++
The :command:`gnt-node modify -s` command can be used to change the
secondary IP of a node. This operation can only be performed if:
- No instance is active on the target node
- The new target IP is reachable from the master's secondary IP
Also, this operation will not allow changing a node from single-homed
(same primary and secondary IP) to multi-homed (separate replication
network) or vice versa, unless:
- The target node is the master node and `--force` is passed.
- The target cluster is single-homed and the new primary ip is a change
to single homed for a particular node.
- The target cluster is multi-homed and the new primary ip is a change
to multi homed for a particular node.
For example to do a single-homed to multi-homed conversion::
$ gnt-node modify --force -s %SECONDARY_IP% %MASTER_NAME%
$ gnt-node modify -s %SECONDARY_IP% %NODE1_NAME%
$ gnt-node modify -s %SECONDARY_IP% %NODE2_NAME%
$ gnt-node modify -s %SECONDARY_IP% %NODE3_NAME%
...
The same commands can be used for a multi-homed to single-homed
conversion; in that case, the secondary IP passed for each node should be
the same as its primary IP.
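For example (a sketch only; each node's primary IP is passed as its new
secondary IP)::
$ gnt-node modify --force -s %MASTER_PRIMARY_IP% %MASTER_NAME%
$ gnt-node modify -s %NODE1_PRIMARY_IP% %NODE1_NAME%
...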
Storage handling
++++++++++++++++
When using LVM (either standalone or with DRBD), it can become tedious
to debug and fix it in case of errors. Furthermore, even file-based
storage can become complicated to handle manually on many hosts. Ganeti
provides a couple of commands to help with automation.
Logical volumes
~~~~~~~~~~~~~~~
This is a command specific to LVM handling. It allows listing the
logical volumes on a given node or on all nodes and their association to
instances via the ``volumes`` command::
$ gnt-node volumes
Node PhysDev VG Name Size Instance
node1 /dev/sdb1 xenvg e61fbc97-….disk0 512M instance17
node1 /dev/sdb1 xenvg ebd1a7d1-….disk0 512M instance19
node2 /dev/sdb1 xenvg 0af08a3d-….disk0 512M instance20
node2 /dev/sdb1 xenvg cc012285-….disk0 512M instance16
node2 /dev/sdb1 xenvg f0fac192-….disk0 512M instance18
The above command maps each logical volume to a volume group and
underlying physical volume and (possibly) to an instance.
.. _storage-units-label:
Generalized storage handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. versionadded:: 2.1
Starting with Ganeti 2.1, a new storage framework has been implemented
that tries to abstract the handling of the storage type the cluster
uses.
First is listing the backend storage and their space situation::
$ gnt-node list-storage
Node Type Name Size Used Free Allocatable
node1 lvm-vg xenvg 3.6T 0M 3.6T Y
node2 lvm-vg xenvg 3.6T 0M 3.6T Y
node3 lvm-vg xenvg 3.6T 2.0G 3.6T Y
The default is to list LVM physical volumes. It's also possible to list
the LVM volume groups::
$ gnt-node list-storage -t lvm-vg
Node Type Name Size Used Free Allocatable
node1 lvm-vg xenvg 3.6T 0M 3.6T Y
node2 lvm-vg xenvg 3.6T 0M 3.6T Y
node3 lvm-vg xenvg 3.6T 2.0G 3.6T Y
Next is repairing storage units, which is currently only implemented for
volume groups and does the equivalent of ``vgreduce --removemissing``::
$ gnt-node repair-storage %node2% lvm-vg xenvg
Sun Oct 25 22:21:45 2009 Repairing storage unit 'xenvg' on node2 ...
Last is the modification of volume properties, which is (again) only
implemented for LVM physical volumes and allows toggling the
``allocatable`` value::
$ gnt-node modify-storage --allocatable=no %node2% lvm-pv /dev/%sdb1%
Use of the storage commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~
All these commands are needed when recovering a node from a disk
failure (a consolidated example follows the list below):
- first, we need to recover from complete LVM failure (due to missing
disk), by running the ``repair-storage`` command
- second, we need to change allocation on any partially-broken disk
(i.e. LVM still sees it, but it has bad blocks) by running
``modify-storage``
- then we can evacuate the instances as needed
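Putting these steps together for the node and volume names used in the
earlier examples (adapt the names and devices to your cluster)::
$ gnt-node repair-storage %node2% lvm-vg xenvg
$ gnt-node modify-storage --allocatable=no %node2% lvm-pv /dev/%sdb1%
$ gnt-node migrate %node2%
$ gnt-node evacuate -s %node2%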
Cluster operations
------------------
Beside the cluster initialisation command (which is detailed in the
:doc:`install` document) and the master failover command which is
explained under node handling, there are a couple of other cluster
operations available.
.. _cluster-config-label:
Standard operations
+++++++++++++++++++
One of the few commands that can be run on any node (not only the
master) is the ``getmaster`` command::
# on node2
$ gnt-cluster getmaster
node1.example.com
It is possible to query and change global cluster parameters via the
``info`` and ``modify`` commands::
$ gnt-cluster info
Cluster name: cluster.example.com
Cluster UUID: 07805e6f-f0af-4310-95f1-572862ee939c
Creation time: 2009-09-25 05:04:15
Modification time: 2009-10-18 22:11:47
Master node: node1.example.com
Architecture (this node): 64bit (x86_64)
…
Tags: foo
Default hypervisor: xen-pvm
Enabled hypervisors: xen-pvm
Hypervisor parameters:
- xen-pvm:
root_path: /dev/sda1
…
Cluster parameters:
- candidate pool size: 10
…
Default instance parameters:
- default:
memory: 128
…
Default nic parameters:
- default:
link: xen-br0
…
The various parameters above can be changed via the ``modify``
command as follows:
- the hypervisor parameters can be changed via ``modify -H
xen-pvm:root_path=…``, and so on for other hypervisors/key/values
- the "default instance parameters" are changeable via ``modify -B
parameter=value…`` syntax
- the cluster parameters are changeable via separate options to the
modify command (e.g. ``--candidate-pool-size``, etc.)
For detailed option list see the :manpage:`gnt-cluster(8)` man page.
The cluster version can be obtained via the ``version`` command::
$ gnt-cluster version
Software version: 2.1.0
Internode protocol: 20
Configuration format: 2010000
OS api version: 15
Export interface: 0
This is not very useful except when debugging Ganeti.
Global node commands
++++++++++++++++++++
There are two commands provided for replicating files to all nodes of a
cluster and for running commands on all the nodes::
$ gnt-cluster copyfile %/path/to/file%
$ gnt-cluster command %ls -l /path/to/file%
These are simple wrappers over scp/ssh; more advanced usage can be
obtained using :manpage:`dsh(1)` and similar commands. They are
nevertheless useful, for example, to update an OS script from the master node.
Cluster verification
++++++++++++++++++++
There are three commands that relate to global cluster checks. The first
one is ``verify``, which gives an overview of the cluster state,
highlighting any issues. In normal operation, this command should return
no ``ERROR`` messages::
$ gnt-cluster verify
Sun Oct 25 23:08:58 2009 * Verifying global settings
Sun Oct 25 23:08:58 2009 * Gathering data (2 nodes)
Sun Oct 25 23:09:00 2009 * Verifying node status
Sun Oct 25 23:09:00 2009 * Verifying instance status
Sun Oct 25 23:09:00 2009 * Verifying orphan volumes
Sun Oct 25 23:09:00 2009 * Verifying remaining instances
Sun Oct 25 23:09:00 2009 * Verifying N+1 Memory redundancy
Sun Oct 25 23:09:00 2009 * Other Notes
Sun Oct 25 23:09:00 2009 - NOTICE: 5 non-redundant instance(s) found.
Sun Oct 25 23:09:00 2009 * Hooks Results
The second command is ``verify-disks``, which checks that the instances'
disks have the correct status based on the desired instance state
(up/down)::
$ gnt-cluster verify-disks
Note that this command will show no output when disks are healthy.
The last command is used to repair any discrepancies between Ganeti's
recorded disk size and the actual disk size (disk size information is
needed for proper activation and growth of DRBD-based disks)::
$ gnt-cluster repair-disk-sizes
Sun Oct 25 23:13:16 2009 - INFO: Disk 0 of instance instance1 has mismatched size, correcting: recorded 512, actual 2048
Sun Oct 25 23:13:17 2009 - WARNING: Invalid result from node node4, ignoring node results
The above shows one instance with a wrong disk size, and a node which
returned invalid data; the primary instances of that node were therefore
ignored.
Configuration redistribution
++++++++++++++++++++++++++++
If the verify command complains about file mismatches between the master
and other nodes, due to some node problems or if you manually modified
configuration files, you can force a push of the master configuration
to all other nodes via the ``redist-conf`` command::
$ gnt-cluster redist-conf
This command will be silent unless there are problems sending updates to
the other nodes.
Cluster renaming
++++++++++++++++
It is possible to rename a cluster, or to change its IP address, via the
``rename`` command. If only the IP has changed, you need to pass the
current name and Ganeti will realise its IP has changed::
$ gnt-cluster rename %cluster.example.com%
This will rename the cluster to 'cluster.example.com'. If
you are connected over the network to the cluster name, the operation
is very dangerous as the IP address will be removed from the node and
the change may not go through. Continue?
y/[n]/?: %y%
Failure: prerequisites not met for this operation:
Neither the name nor the IP address of the cluster has changed
In the above output, neither value has changed since the cluster
initialisation so the operation is not completed.
Queue operations
++++++++++++++++
The job queue execution in Ganeti 2.0 and higher can be inspected,
suspended and resumed via the ``queue`` command::
$ gnt-cluster queue info
The drain flag is unset
$ gnt-cluster queue drain
$ gnt-instance stop %instance1%
Failed to submit job for instance1: Job queue is drained, refusing job
$ gnt-cluster queue info
The drain flag is set
$ gnt-cluster queue undrain
This is most useful if you have an active cluster and you need to
upgrade the Ganeti software, or simply restart the software on any node
(an example sequence follows the steps below):
#. suspend the queue via ``queue drain``
#. wait until there are no more running jobs via ``gnt-job list``
#. restart the master or another node, or upgrade the software
#. resume the queue via ``queue undrain``
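A possible full sequence, with the actual software restart being
distribution-specific and only sketched here::
$ gnt-cluster queue drain
$ gnt-job list
$ /etc/init.d/ganeti restart    # or your distribution's equivalent
$ gnt-cluster queue undrain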
.. note:: this command only stores a local flag file, and if you
failover the master, it will have no effect on the new master.
Watcher control
+++++++++++++++
The :manpage:`ganeti-watcher(8)` is a program, usually scheduled via
``cron``, that takes care of cluster maintenance operations (restarting
downed instances, activating down DRBD disks, etc.). However, during
maintenance and troubleshooting, this can get in your way; disabling it
by commenting out the cron job is not a good idea, as it can easily be
forgotten. Thus there are some commands for automated control of the
watcher: ``pause``, ``info`` and ``continue``::
$ gnt-cluster watcher info
The watcher is not paused.
$ gnt-cluster watcher pause %1h%
The watcher is paused until Mon Oct 26 00:30:37 2009.
$ gnt-cluster watcher info
The watcher is paused until Mon Oct 26 00:30:37 2009.
$ ganeti-watcher -d
2009-10-25 23:30:47,984: pid=28867 ganeti-watcher:486 DEBUG Pause has been set, exiting
$ gnt-cluster watcher continue
The watcher is no longer paused.
$ ganeti-watcher -d
2009-10-25 23:31:04,789: pid=28976 ganeti-watcher:345 DEBUG Archived 0 jobs, left 0
2009-10-25 23:31:05,884: pid=28976 ganeti-watcher:280 DEBUG Got data from cluster, writing instance status file
2009-10-25 23:31:06,061: pid=28976 ganeti-watcher:150 DEBUG Data didn't change, just touching status file
$ gnt-cluster watcher info
The watcher is not paused.
The exact details of the argument to the ``pause`` command are available
in the manpage.
.. note:: this command only stores a local flag file, and if you
failover the master, it will have no effect on the new master.
Node auto-maintenance
+++++++++++++++++++++
If the cluster parameter ``maintain_node_health`` is enabled (see the
manpage for :command:`gnt-cluster`, the init and modify subcommands),
then the following will happen automatically:
- the watcher will shutdown any instances running on offline nodes
- the watcher will deactivate any DRBD devices on offline nodes
In the future, more actions are planned, so only enable this parameter
if the nodes are completely dedicated to Ganeti; otherwise it might be
possible to lose data due to auto-maintenance actions.
Removing a cluster entirely
+++++++++++++++++++++++++++
The usual method to clean up a cluster is to run ``gnt-cluster destroy``;
however, if the Ganeti installation is broken in any way, this will
not run.
In such a case it is possible to manually clean up most, if not all, traces
of a cluster installation by following these steps on all of the nodes:
1. Shutdown all instances. This depends on the virtualisation method
used (Xen, KVM, etc.):
- Xen: run ``xm list`` and ``xm destroy`` on all the non-Domain-0
instances
- KVM: kill all the KVM processes
- chroot: kill all processes under the chroot mountpoints
2. If using DRBD, shut down all DRBD minors (which should by this time
no longer be in use by instances); on each node, run ``drbdsetup
/dev/drbdN down`` for each active DRBD minor.
3. If using LVM, cleanup the Ganeti volume group; if only Ganeti created
logical volumes (and you are not sharing the volume group with the
OS, for example), then simply running ``lvremove -f xenvg`` (replace
'xenvg' with your volume group name) should do the required cleanup.
4. If using file-based storage, remove recursively all files and
directories under your file-storage directory: ``rm -rf
/srv/ganeti/file-storage/*`` replacing the path with the correct path
for your cluster.
5. Stop the ganeti daemons (``/etc/init.d/ganeti stop``) and kill any
that remain alive (``pgrep ganeti`` and ``pkill ganeti``).
6. Remove the ganeti state directory (``rm -rf /var/lib/ganeti/*``),
replacing the path with the correct path for your installation.
7. If using RBD, run ``rbd unmap /dev/rbdN`` to unmap the RBD disks.
Then remove the RBD disk images used by Ganeti, identified by their
UUIDs (``rbd rm uuid.rbd.diskN``).
On the master node, remove the cluster from the master-netdev (usually
``xen-br0`` for bridged mode, otherwise ``eth0`` or similar), by running
``ip a del $clusterip/32 dev xen-br0`` (use the correct cluster ip and
network device name).
At this point, the machines are ready for a cluster creation; in case
you want to remove Ganeti completely, you also need to undo some of the
SSH changes and remove the log directories:
- ``rm -rf /var/log/ganeti /srv/ganeti`` (replace with the correct
paths)
- remove from ``/root/.ssh`` the keys that Ganeti added (check the
``authorized_keys`` and ``id_dsa`` files)
- regenerate the host's SSH keys (check the OpenSSH startup scripts)
- uninstall Ganeti
Otherwise, if you plan to re-create the cluster, you can just go ahead
and rerun ``gnt-cluster init``.
Replacing the SSH and SSL keys
++++++++++++++++++++++++++++++
Ganeti uses both SSL and SSH keys, and actively modifies the SSH keys on
the nodes. As a result, in order to replace these keys, a few extra steps
need to be followed: :doc:`cluster-keys-replacement`
Monitoring the cluster
----------------------
Starting with Ganeti 2.8, a monitoring daemon is available, providing
information about the status and the performance of the system.
The monitoring daemon runs on every node, listening on TCP port 1815. Each
instance of the daemon provides information related to the node it is running
on.
.. include:: monitoring-query-format.rst
Tags handling
-------------
The tags handling (addition, removal, listing) is similar for all the
objects that support it (instances, nodes, and the cluster).
Limitations
+++++++++++
Note that the set of characters present in a tag and the maximum tag
length are restricted. Currently the maximum length is 128 characters,
there can be at most 4096 tags per object, and the set of characters is
comprised of alphanumeric characters plus ``.+*/:@-_``.
Operations
++++++++++
Tags can be added via ``add-tags``::
$ gnt-instance add-tags %INSTANCE% %a% %b% %c%
$ gnt-node add-tags %NODE% %a% %b% %c%
$ gnt-cluster add-tags %a% %b% %c%
The above commands add three tags to an instance, to a node and to the
cluster. Note that the cluster command only takes tags as arguments,
whereas the node and instance commands first require the node and
instance name, respectively.
Tags can also be added from a file, via the ``--from=FILENAME``
argument. The file is expected to contain one tag per line.
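For example, with a hypothetical ``mytags.txt`` file containing one tag per
line (the tag values are purely illustrative)::
$ cat %mytags.txt%
owner:foo
environment:prod
$ gnt-instance add-tags --from=%mytags.txt% %INSTANCE%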
Tags can also be removed via a syntax very similar to the add one::
$ gnt-instance remove-tags %INSTANCE% %a% %b% %c%
And listed via::
$ gnt-instance list-tags
$ gnt-node list-tags
$ gnt-cluster list-tags
Global tag search
+++++++++++++++++
It is also possible to execute a global search on all the tags defined
in the cluster configuration, via a cluster command::
$ gnt-cluster search-tags %REGEXP%
The parameter expected is a regular expression (see
:manpage:`regex(7)`). This will return all tags that match the search,
together with the object they are defined in (the names being shown in a
hierarchical kind of way)::
$ gnt-cluster search-tags %o%
/cluster foo
/instances/instance1 owner:bar
Autorepair
----------
The tool ``harep`` can be used to automatically fix some problems that are
present in the cluster.
It is mainly meant to be executed regularly and automatically, e.g. as a
cron job. When executed, it does not immediately fix all the issues of the
cluster's instances; instead, it cycles the instances through a series of
states, advancing by one state at every ``harep`` execution. Every state
performs a step towards the resolution of the problem. This process goes on
until the instance is brought back to the healthy state, or the tool
realizes that it is not able to fix the instance and therefore marks it as
being in a failure state.
Allowing harep to act on the cluster
++++++++++++++++++++++++++++++++++++
By default, ``harep`` checks the status of the cluster but it is not allowed to
perform any modification. Modification must be explicitly allowed by an
appropriate use of tags. Tagging can be applied at various levels, and can
enable different kinds of autorepair, as hereafter described.
All the tags that authorize ``harep`` to perform modifications follow this
syntax::
ganeti:watcher:autorepair:<type>
where ``<type>`` indicates the kind of intervention that can be performed. Every
possible value of ``<type>`` includes at least all the authorization of the
previous one, plus its own. The possible values, in increasing order of
severity, are:
- ``fix-storage`` allows a disk replacement or another operation that
fixes the instance backend storage without affecting the instance
itself. This can for example recover from a broken drbd secondary, but
risks data loss if something is wrong on the primary but the secondary
was somehow recoverable.
- ``migrate`` allows an instance migration. This can recover from a
drained primary, but can cause an instance crash in some cases (bugs).
- ``failover`` allows instance reboot on the secondary. This can recover
from an offline primary, but the instance will lose its running state.
- ``reinstall`` allows disks to be recreated and an instance to be
reinstalled. This can recover from primary&secondary both being
offline, or from an offline primary in the case of non-redundant
instances. It causes data loss.
These autorepair tags can be applied to a cluster, a nodegroup or an instance,
and will act where they are applied and on everything in the entity's sub-tree
(e.g. a tag applied to a nodegroup will apply to all the instances contained in
that nodegroup, but not to the rest of the cluster).
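For example, to allow only the least destructive repairs cluster-wide while
allowing full reinstall repairs for one non-critical instance (an
illustrative policy, not a recommendation)::
$ gnt-cluster add-tags ganeti:watcher:autorepair:fix-storage
$ gnt-instance add-tags %INSTANCE% ganeti:watcher:autorepair:reinstall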
If there are multiple ``ganeti:watcher:autorepair:<type>`` tags in an
object (cluster, node group or instance), the least destructive tag
takes precedence. When multiplicity happens across objects, the nearest
tag wins. For example, if in a cluster with two instances, *I1* and
*I2*, *I1* has ``failover``, and the cluster itself has both
``fix-storage`` and ``reinstall``, *I1* will end up with ``failover``
and *I2* with ``fix-storage``.
Limiting harep
++++++++++++++
Sometimes it is useful to stop harep from performing its task temporarily,
and to be able to do so without disrupting its configuration, that
is, without removing the authorization tags. For this purpose, suspend tags
are provided.
Suspend tags can be added to cluster, nodegroup or instances, and act on the
entire entities sub-tree. No operation will be performed by ``harep`` on the
instances protected by a suspend tag. Their syntax is as follows::
ganeti:watcher:autorepair:suspend[:<timestamp>]
If there are multiple suspend tags in an object, the form without timestamp
takes precedence (permanent suspension); or, if all object tags have a
timestamp, the one with the highest timestamp.
Tags with a timestamp will be automatically removed when the time indicated by
the timestamp is passed. Indefinite suspension tags have to be removed manually.
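For example, to suspend repairs cluster-wide indefinitely, or on a single
instance until a given time (the timestamp format, seconds since the epoch,
is an assumption to verify against your version's documentation)::
$ gnt-cluster add-tags ganeti:watcher:autorepair:suspend
$ gnt-instance add-tags %INSTANCE% ganeti:watcher:autorepair:suspend:%1500000000%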
Result reporting
++++++++++++++++
``harep`` will report the result of its actions both through its CLI and by
adding tags to the instances it operated on. Such tags follow the syntax
described below::
ganeti:watcher:autorepair:result:<type>:<id>:<timestamp>:<result>:<jobs>
If this tag is present, a repair of type ``type`` has been performed on
the instance and has been completed by ``timestamp``. The result is
either ``success``, ``failure`` or ``enoperm``, and ``jobs`` is a
*+*-separated list of the jobs that were executed for this repair.
An ``enoperm`` result is an error state due to permission problems. It
is returned when the repair cannot proceed because it would require performing
an operation that is not allowed by the ``ganeti:watcher:autorepair:<type>`` tag
that defines the instance's autorepair permissions.
NB: if an instance repair ends up in a failure state, it will not be touched
again by ``harep`` until it has been manually fixed by the system administrator
and the ``ganeti:watcher:autorepair:result:failure:*`` tag has been manually
removed.
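For example, after repairing the instance by hand, the exact failure tag can
be looked up and removed (the tag value below is a placeholder to be copied
from the listing)::
$ gnt-instance list-tags %INSTANCE%
$ gnt-instance remove-tags %INSTANCE% %FULL_FAILURE_TAG%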
Job operations
--------------
The various jobs submitted by the instance/node/cluster commands can be
examined, canceled and archived by various invocations of the
``gnt-job`` command.
First is the job list command::
$ gnt-job list
17771 success INSTANCE_QUERY_DATA
17773 success CLUSTER_VERIFY_DISKS
17775 success CLUSTER_REPAIR_DISK_SIZES
17776 error CLUSTER_RENAME(cluster.example.com)
17780 success CLUSTER_REDIST_CONF
17792 success INSTANCE_REBOOT(instance1.example.com)
More detailed information about a job can be found via the ``info``
command::
$ gnt-job info %17776%
Job ID: 17776
Status: error
Received: 2009-10-25 23:18:02.180569
Processing start: 2009-10-25 23:18:02.200335 (delta 0.019766s)
Processing end: 2009-10-25 23:18:02.279743 (delta 0.079408s)
Total processing time: 0.099174 seconds
Opcodes:
OP_CLUSTER_RENAME
Status: error
Processing start: 2009-10-25 23:18:02.200335
Processing end: 2009-10-25 23:18:02.252282
Input fields:
name: cluster.example.com
Result:
OpPrereqError
[Neither the name nor the IP address of the cluster has changed]
Execution log:
During the execution of a job, it's possible to follow its output,
similar to the log that one gets from the ``gnt-`` commands, via the
``watch`` command::
$ gnt-instance add --submit … %instance1%
JobID: 17818
$ gnt-job watch %17818%
Output from job 17818 follows
-----------------------------
Mon Oct 26 00:22:48 2009 - INFO: Selected nodes for instance instance1 via iallocator dumb: node1, node2
Mon Oct 26 00:22:49 2009 * creating instance disks...
Mon Oct 26 00:22:52 2009 adding instance instance1 to cluster config
Mon Oct 26 00:22:52 2009 - INFO: Waiting for instance instance1 to sync disks.
…
Mon Oct 26 00:23:03 2009 creating os for instance instance1 on node node1
Mon Oct 26 00:23:03 2009 * running the instance OS create scripts...
Mon Oct 26 00:23:13 2009 * starting instance...
$
This is useful if you need to follow a job's progress from multiple
terminals.
A job that has not yet started to run can be canceled::
$ gnt-job cancel %17810%
But not one that has already started execution::
$ gnt-job cancel %17805%
Job 17805 is no longer waiting in the queue
There are two queues for jobs: the *current* and the *archive*
queue. Jobs are initially submitted to the current queue, and they stay
in that queue until they have finished execution (either successfully or
not). At that point, they can be moved into the archive queue using e.g.
``gnt-job autoarchive all``. The ``ganeti-watcher`` script will do this
automatically 6 hours after a job is finished. The ``ganeti-cleaner``
script will then remove the archived jobs from the archive directory
after three weeks.
Note that ``gnt-job list`` only shows jobs in the current queue.
Archived jobs can be viewed using ``gnt-job info <job-id>``.
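For example, to archive all finished jobs and then inspect one of them
afterwards, reusing a job ID from the earlier listing::
$ gnt-job autoarchive all
$ gnt-job info %17776%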
Special Ganeti deployments
--------------------------
Since Ganeti 2.4, it is possible to extend the Ganeti deployment with
two custom scenarios: Ganeti inside Ganeti and multi-site model.
Running Ganeti under Ganeti
+++++++++++++++++++++++++++
It is sometimes useful to be able to use a Ganeti instance as a Ganeti
node (part of another cluster, usually). One example scenario is two
small clusters, where we want to have an additional master candidate
that holds the cluster configuration and can be used for helping with
the master voting process.
However, these Ganeti instances should not host instances themselves, and
should not be considered in the normal capacity planning, evacuation
strategies, etc. In order to accomplish this, mark these nodes as
non-``vm_capable``::
$ gnt-node modify --vm-capable=no %node3%
The vm_capable status can be listed as usual via ``gnt-node list``::
$ gnt-node list -oname,vm_capable
Node VMCapable
node1 Y
node2 Y
node3 N
When this flag is set, the cluster will not do any operations that
relate to instances on such nodes, e.g. hypervisor operations,
disk-related operations, etc. Basically such nodes will just keep the ssconf
files and, if they are master candidates, the full configuration.
Multi-site model
++++++++++++++++
If Ganeti is deployed in multi-site model, with each site being a node
group (so that instances are not relocated across the WAN by mistake),
it is conceivable that either the WAN latency is high or that some sites
have a lower reliability than others. In this case, it doesn't make
sense to replicate the job information across all sites (or even outside
of a “central” node group), so it should be possible to restrict which
nodes can become master candidates via the auto-promotion algorithm.
Ganeti 2.4 introduces for this purpose a new ``master_capable`` flag,
which (when unset) prevents nodes from being marked as master
candidates, either manually or automatically.
As usual, the node modify operation can change this flag::
$ gnt-node modify --auto-promote --master-capable=no %node3%
Fri Jan 7 06:23:07 2011 - INFO: Demoting from master candidate
Fri Jan 7 06:23:08 2011 - INFO: Promoted nodes to master candidate role: node4
Modified node node3
- master_capable -> False
- master_candidate -> False
And the node list operation will list this flag::
$ gnt-node list -oname,master_capable %node1% %node2% %node3%
Node MasterCapable
node1 Y
node2 Y
node3 N
Note that marking a node both not ``vm_capable`` and not
``master_capable`` makes the node practically unusable from Ganeti's
point of view. Hence these two flags should probably be used in
contrast: some nodes will be only master candidates (master_capable but
not vm_capable), and other nodes will only hold instances (vm_capable
but not master_capable).
Ganeti tools
------------
Beside the usual ``gnt-`` and ``ganeti-`` commands which are provided
and installed in ``$prefix/sbin`` at install time, there are a couple of
other tools installed which are seldom used but can be helpful in some
cases.
lvmstrap
++++++++
The ``lvmstrap`` tool, introduced in :ref:`configure-lvm-label` section,
has two modes of operation:
- ``diskinfo`` shows the discovered disks on the system and their status
- ``create`` takes all not-in-use disks and creates a volume group out
of them
.. warning:: The ``create`` argument to this command causes data-loss!
cfgupgrade
++++++++++
The ``cfgupgrade`` tool is used to upgrade between major (and minor)
Ganeti versions, and to roll back. Point-releases are usually
transparent for the admin.
More information about the upgrade procedure is listed on the wiki at
http://code.google.com/p/ganeti/wiki/UpgradeNotes.
There is also a script designed to upgrade from Ganeti 1.2 to 2.0,
called ``cfgupgrade12``.
cfgshell
++++++++
.. note:: This command is not actively maintained; make sure you backup
your configuration before using it
This can be used as an alternative to direct editing of the
main configuration file if Ganeti has a bug that prevents you, for
example, from removing an instance or a node from the configuration
file.
.. _burnin-label:
burnin
++++++
.. warning:: This command will erase existing instances if given as
arguments!
This tool is used to exercise either the hardware of machines or
alternatively the Ganeti software. It is safe to run on an existing
cluster **as long as you don't pass it existing instance names**.
The command will, by default, execute a comprehensive set of operations
against a list of instances, these being:
- creation
- disk replacement (for redundant instances)
- failover and migration (for redundant instances)
- move (for non-redundant instances)
- disk growth
- add disks, remove disk
- add NICs, remove NICs
- export and then import
- rename
- reboot
- shutdown/startup
- and finally removal of the test instances
Executing all these operations will test that the hardware performs
well: the creation, disk replace, disk add and disk growth will exercise
the storage and network; the migrate command will test the memory of the
systems. Depending on the passed options, it can also test that the
instance OS definitions properly execute the rename, import and
export operations.
sanitize-config
+++++++++++++++
This tool takes the Ganeti configuration and outputs a "sanitized"
version, by randomizing or clearing:
- DRBD secrets and cluster public key (always)
- host names (optional)
- IPs (optional)
- OS names (optional)
- LV names (optional, only useful for very old clusters which still have
instances whose LVs are based on the instance name)
By default, all optional items are activated except the LV name
randomization. When passing ``--no-randomization``, which disables the
optional items (i.e. just the DRBD secrets and cluster public keys are
randomized), the resulting file can be used as a safety copy of the
cluster config: while not trivial, the layout of the cluster can be
recreated from it and, if the instance disks have not been lost, it
permits recovery from the loss of all master candidates.
move-instance
+++++++++++++
See :doc:`separate documentation for move-instance <move-instance>`.
users-setup
+++++++++++
Ganeti can either be run entirely as root, or with every daemon running as
its own specific user (if the parameters ``--with-user-prefix`` and/or
``--with-group-prefix`` have been specified at ``./configure``-time).
In case split users are activated, they are required to exist on the system,
and they need to belong to the proper groups in order for the access
permissions to files and programs to be correct.
The ``users-setup`` tool, when run, takes care of setting up the proper
users and groups.
When invoked without parameters, the tool runs in interactive mode, showing the
list of actions it will perform and asking for confirmation before proceeding.
Providing the ``--yes-do-it`` parameter to the tool prevents the confirmation
from being asked, and the users and groups will be created immediately.
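For example, to run it non-interactively (the path below is an assumption;
the tool may live in a different location depending on the ``./configure``
prefix, or it can be run from the source tree's ``tools/`` directory)::
$ /usr/lib/ganeti/tools/users-setup --yes-do-it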
.. TODO: document cluster-merge tool
Other Ganeti projects
---------------------
Below is a list (which might not be up-to-date) of additional projects
that can be useful in a Ganeti deployment. They can be downloaded from
the project site (http://code.google.com/p/ganeti/) and the repositories
are also on the project git site (http://git.ganeti.org).
NBMA tools
++++++++++
The ``ganeti-nbma`` software is designed to allow instances to live on a
separate, virtual network from the nodes, and in an environment where
nodes are not guaranteed to be able to reach each other via multicasting
or broadcasting. For more information see the README in the source
archive.
ganeti-htools
+++++++++++++
Before Ganeti version 2.5, this was a standalone project; since that
version it is integrated into the Ganeti codebase (see
:doc:`install-quick` for instructions on how to enable it). If you run
an older Ganeti version, you will have to download and build it
separately.
For more information and installation instructions, see the README file
in the source archive.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/cluster-keys-replacement.rst 0000644 0000000 0000000 00000010324 12634264163 0021303 0 ustar 00root root 0000000 0000000 ========================
Cluster Keys Replacement
========================
Ganeti uses both SSL and SSH keys, and actively modifies the SSH keys
on the nodes. As a result, in order to replace these keys, a few extra
steps need to be followed.
For an example when this could be needed, see the thread at
`Regenerating SSL and SSH keys after the security bug in Debian's
OpenSSL
`_.
Ganeti uses OpenSSL for encryption on the RPC layer and SSH for
executing commands. The SSL certificate is automatically generated
when the cluster is initialized and it's copied to added nodes
automatically together with the master's SSH host key.
Note that paths below may vary depending on your distribution. In
general, modifications should be done on the master node and then
distributed to all nodes of a cluster (possibly using a pendrive - but
don't forget to use "shred" to remove files securely afterwards).
Replacing SSL keys
==================
The cluster-wide SSL key is stored in ``/var/lib/ganeti/server.pem``.
Besides that, since Ganeti 2.11, each node has an individual node
SSL key, which is stored in ``/var/lib/ganeti/client.pem``. This
client certificate is signed by the cluster-wide SSL certificate.
To renew the individual node certificates, run this command::
gnt-cluster renew-crypto --new-node-certificates
Run the following command to generate a new cluster-wide certificate::
gnt-cluster renew-crypto --new-cluster-certificate
Note that this triggers both the renewal of the cluster certificate
and the renewal of the individual node certificates. The reason
for this is that the node certificates are signed by the cluster
certificate and thus they need to be renewed and re-signed as soon as
the cluster certificate changes. Therefore, the command above is
equivalent to::
gnt-cluster renew-crypto --new-cluster-certificate --new-node-certificates
On older versions, which don't have this command, use this instead::
chmod 0600 /var/lib/ganeti/server.pem &&
openssl req -new -newkey rsa:1024 -days 1825 -nodes \
-x509 -keyout /var/lib/ganeti/server.pem \
-out /var/lib/ganeti/server.pem -batch &&
chmod 0400 /var/lib/ganeti/server.pem &&
/etc/init.d/ganeti restart
gnt-cluster copyfile /var/lib/ganeti/server.pem
gnt-cluster command /etc/init.d/ganeti restart
Note that older versions don't have individual node certificates and thus
one does not have to handle the creation and distribution of them.
Replacing SSH keys
==================
There are two sets of SSH keys in the cluster: the host keys (both DSA
and RSA, though Ganeti only uses the RSA one) and the root's DSA key
(Ganeti uses DSA for historical reasons; in the future RSA will be
used).
host keys
+++++++++
These are the files named ``/etc/ssh/ssh_host_*``. You need to
manually recreate them; it's possible that the startup script of
OpenSSH will generate them if they don't exist, or that the package
system regenerates them.
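One possible way to regenerate them by hand, assuming standard OpenSSH
tooling (your distribution may provide its own mechanism, and the service
name used for the restart may differ)::
$ rm /etc/ssh/ssh_host_*
$ ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key -q -N ""
$ ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key -q -N ""
$ /etc/init.d/ssh restart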
Also make sure to copy the master's SSH host keys to all other nodes.
cluster public key file
+++++++++++++++++++++++
The new public rsa host key created in the previous step must be added
in two places:
#. known hosts file, ``/var/lib/ganeti/known_hosts``
#. cluster configuration file, ``/var/lib/ganeti/config.data``
Edit these two files and update them with newly generated SSH host key
(in the previous step, take it from the
``/etc/ssh/ssh_host_rsa_key.pub``).
For the ``config.data`` file, please look for an entry named
``rsahostkeypub`` and replace the value for it with the contents of
the ``.pub`` file. For the ``known_hosts`` file, you need to replace
the old key with the new one on each line (for each host).
root's key
++++++++++
These are the files named ``~root/.ssh/id_dsa*``.
Run this command to rebuild them::
ssh-keygen -t dsa -f ~root/.ssh/id_dsa -q -N ""
root's ``authorized_keys``
++++++++++++++++++++++++++
This is the file named ``~root/.ssh/authorized_keys``.
Edit the file and update it with the newly generated root key, from the
``id_dsa.pub`` file generated in the previous step.
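For example, assuming the new key should simply replace the old entry
(remove the stale line by hand afterwards if it is still present)::
$ cat ~root/.ssh/id_dsa.pub >> ~root/.ssh/authorized_keys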
Finish
======
In the end, the files mentioned above should be identical for all
nodes in a cluster. Also do not forget to run ``gnt-cluster verify``.
ganeti-2.15.2/doc/cluster-merge.rst 0000644 0000000 0000000 00000004721 12634264163 0017136 0 ustar 00root root 0000000 0000000 ================
Merging clusters
================
With ``cluster-merge`` from the ``tools`` directory it is possible to
merge two or more clusters into one single cluster.
If anything goes wrong at any point, the script suggests rollback
steps that you have to perform *manually*, if there are any. The point of no
return is when the master daemon is started for the first time after merging
the configuration files. A rollback at this point would involve a lot of
manual work.
For the internal design of this tool have a look at the `Automated
Ganeti Cluster Merger ` document.
Merge Clusters
==============
The tool has to be invoked on the cluster you like to merge the other
clusters into.
The usage of ``cluster-merge`` is as follows::
cluster-merge [--debug|--verbose] [--watcher-pause-period SECONDS] \
[--groups [merge|rename]] <cluster> [<cluster> ...]
You can provide multiple clusters. The tool will then go over every
cluster in serial and perform the steps to merge it into the invoking
cluster.
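For example, to merge a hypothetical cluster ``cluster2.example.com`` into
the invoking cluster, renaming any conflicting node groups::
cluster-merge --groups rename cluster2.example.com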
These options can be used to control the behaviour of the tool:
``--debug``/``--verbose``
These options are mutually exclusive and increase the level of output
to either debug output or just more verbose output like action
performed right now.
``--watcher-pause-period``
Define the period of time in seconds the watcher shall be disabled,
default is 1800 seconds (30 minutes).
``--groups``
This option controls how ``cluster-merge`` handles duplicate node
group names on the merging clusters. If ``merge`` is specified then
all node groups with the same name will be merged into one. If
``rename`` is specified, then conflicting node groups on the remote
clusters will have their cluster name appended to the group name. If
this option is not specified, then ``cluster-merge`` will refuse to
continue if it finds conflicting group names, otherwise it will
proceed as normal.
Rollback
========
If for any reason something in the merge doesn't work the way it should,
``cluster-merge`` will abort, provide an error message and, optionally,
rollback steps. Please be aware that after a certain point there's no
easy way to rollback the cluster to its previous state. If you've
reached that point the tool will not provide any rollback steps.
If you end up with rollback steps, please perform them before invoking
the tool again. It doesn't keep state over invokations.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/conf.py 0000644 0000000 0000000 00000016065 12634264163 0015131 0 ustar 00root root 0000000 0000000 # -*- coding: utf-8 -*-
#
# Ganeti documentation build configuration file, created by
# sphinx-quickstart on Tue Apr 14 13:23:20 2009.
#
# This file is execfile()d with the current directory set to its containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys, os
enable_manpages = bool(os.getenv("ENABLE_MANPAGES"))
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.append(os.path.abspath("."))
# -- General configuration -----------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
needs_sphinx = "1.0"
# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named "sphinx.ext.*") or your custom ones.
extensions = [
"sphinx.ext.todo",
"sphinx.ext.graphviz",
"ganeti.build.sphinx_ext",
"ganeti.build.shell_example_lexer",
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
# The suffix of source filenames.
source_suffix = ".rst"
# The encoding of source files.
source_encoding = "utf-8"
# The master toctree document.
master_doc = "index"
# General information about the project.
project = u"Ganeti"
copyright = u"%s Google Inc." % ", ".join(map(str, range(2006, 2013 + 1)))
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# These next two will be passed via the command line, see the makefile
# The short X.Y version
#version = VERSION_MAJOR + "." + VERSION_MINOR
# The full version, including alpha/beta/rc tags.
#release = PACKAGE_VERSION
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
language = "en"
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ""
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = "%B %d, %Y"
# List of documents that shouldn't be included in the build.
#unused_docs = []
if enable_manpages:
exclude_patterns = []
else:
exclude_patterns = [
"man-*.rst",
]
# List of directories, relative to source directory, that shouldn't be searched
# for source files.
exclude_trees = [
"_build",
"api",
"coverage"
"examples",
]
# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None
# If true, "()" will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"
# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []
# -- Options for HTML output ---------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = os.getenv("HTML_THEME")
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# " v documentation".
#html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["css"]
html_style = "style.css"
# If not "", a "Last updated on:" timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = "%b %d, %Y"
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}
# If false, no module index is generated.
html_use_modindex = False
# If false, no index is generated.
html_use_index = False
# If true, the index is split into individual pages for each letter.
#html_split_index = False
# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
#html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
#html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ""
# If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = ""
# Output file base name for HTML help builder.
htmlhelp_basename = "Ganetidoc"
# -- Options for LaTeX output --------------------------------------------------
# The paper size ("letter" or "a4").
#latex_paper_size = "a4"
# The font size ("10pt", "11pt" or "12pt").
#latex_font_size = "10pt"
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass [howto/manual]).
latex_documents = [
("index", "Ganeti.tex", u"Ganeti Documentation",
u"Google Inc.", "manual"),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False
# If true, show page references after internal links.
#latex_show_pagerefs = False
# If true, show URL addresses after external links.
#latex_show_urls = False
# Additional stuff for the LaTeX preamble.
#latex_preamble = ""
# Documents to append as an appendix to all manuals.
#latex_appendices = []
# If false, no module index is generated.
latex_use_modindex = False
ganeti-2.15.2/doc/css/ 0000755 0000000 0000000 00000000000 12634264163 0014412 5 ustar 00root root 0000000 0000000 ganeti-2.15.2/doc/css/style.css 0000644 0000000 0000000 00000000077 12634264163 0016270 0 ustar 00root root 0000000 0000000 @import url(default.css);
a {
text-decoration: underline;
}
ganeti-2.15.2/doc/design-2.0.rst 0000644 0000000 0000000 00000231203 12634264163 0016123 0 ustar 00root root 0000000 0000000 =================
Ganeti 2.0 design
=================
This document describes the major changes in Ganeti 2.0 compared to
the 1.2 version.
The 2.0 version will constitute a rewrite of the 'core' architecture,
paving the way for additional features in future 2.x versions.
.. contents:: :depth: 3
Objective
=========
Ganeti 1.2 has many scalability issues and restrictions due to its
roots as software for managing small and 'static' clusters.
Version 2.0 will attempt to remedy first the scalability issues and
then the restrictions.
Background
==========
While Ganeti 1.2 is usable, it severely limits the flexibility of the
cluster administration and imposes a very rigid model. It has the
following main scalability issues:
- only one operation at a time on the cluster [#]_
- poor handling of node failures in the cluster
- mixing hypervisors in a cluster not allowed
It also has a number of artificial restrictions, due to historical
design:
- fixed number of disks (two) per instance
- fixed number of NICs
.. [#] Replace disks will release the lock, but this is an exception
and not a recommended way to operate
The 2.0 version is intended to address some of these problems, and
create a more flexible code base for future developments.
Among these problems, the single-operation-at-a-time restriction is the
biggest issue with the current version of Ganeti. It is such a big
impediment in operating bigger clusters that many times one is tempted
to remove the lock just to do a simple operation like start instance
while an OS installation is running.
Scalability problems
--------------------
Ganeti 1.2 has a single global lock, which is used for all cluster
operations. This has been painful at various times, for example:
- It is impossible for two people to efficiently interact with a cluster
(for example for debugging) at the same time.
- When batch jobs are running it's impossible to do other work (for
example failovers/fixes) on a cluster.
This poses scalability problems: as clusters grow in node and instance
size it's a lot more likely that operations which one could conceive
should run in parallel (for example because they happen on different
nodes) are actually stalling each other while waiting for the global
lock, without a real reason for that to happen.
One of the main causes of this global lock (beside the higher
difficulty of ensuring data consistency in a more granular lock model)
is the fact that currently there is no long-lived process in Ganeti
that can coordinate multiple operations. Each command tries to acquire
the so called *cmd* lock and when it succeeds, it takes complete
ownership of the cluster configuration and state.
Other scalability problems are due to the design of the DRBD device
model, which assumed at its creation a low (one to four) number of
instances per node, which is no longer true with today's hardware.
Artificial restrictions
-----------------------
Ganeti 1.2 (and previous versions) have a fixed two-disks, one-NIC per
instance model. This is a purely artificial restriction, but it
touches so many areas (configuration, import/export, command line)
that removing it is more fitted to a major release than a minor one.
Architecture issues
-------------------
The fact that each command is a separate process that reads the
cluster state, executes the command, and saves the new state is also
an issue on big clusters where the configuration data for the cluster
begins to be non-trivial in size.
Overview
========
In order to solve the scalability problems, a rewrite of the core
design of Ganeti is required. While the cluster operations themselves
won't change (e.g. start instance will do the same things), the way
these operations are scheduled internally will change radically.
The new design will change the cluster architecture to:
.. digraph:: "ganeti-2.0-architecture"
compound=false
concentrate=true
mclimit=100.0
nslimit=100.0
edge[fontsize="8" fontname="Helvetica-Oblique"]
node[width="0" height="0" fontsize="12" fontcolor="black" shape=rect]
subgraph outside {
rclient[label="external clients"]
label="Outside the cluster"
}
subgraph cluster_inside {
label="ganeti cluster"
labeljust=l
subgraph cluster_master_node {
label="master node"
rapi[label="RAPI daemon"]
cli[label="CLI"]
watcher[label="Watcher"]
burnin[label="Burnin"]
masterd[shape=record style=filled label="{ luxi endpoint | master I/O thread | job queue | { worker| worker | worker }}"]
{rapi;cli;watcher;burnin} -> masterd:luxi [label="LUXI" labelpos=100]
}
subgraph cluster_nodes {
label="nodes"
noded1 [shape=record label="{ RPC listener | Disk management | Network management | Hypervisor } "]
noded2 [shape=record label="{ RPC listener | Disk management | Network management | Hypervisor } "]
noded3 [shape=record label="{ RPC listener | Disk management | Network management | Hypervisor } "]
}
masterd:w2 -> {noded1;noded2;noded3} [label="node RPC"]
cli -> {noded1;noded2;noded3} [label="SSH"]
}
rclient -> rapi [label="RAPI protocol"]
This differs from the 1.2 architecture by the addition of the master
daemon, which will be the only entity to talk to the node daemons.
Detailed design
===============
The changes for 2.0 can be split into roughly three areas:
- core changes that affect the design of the software
- features (or restriction removals) but which do not have a wide
impact on the design
- user-level and API-level changes which translate into differences for
the operation of the cluster
Core changes
------------
The main changes will be switching from a per-process model to a
daemon based model, where the individual gnt-* commands will be
clients that talk to this daemon (see `Master daemon`_). This will
allow us to get rid of the global cluster lock for most operations,
having instead a per-object lock (see `Granular locking`_). Also, the
daemon will be able to queue jobs, and this will allow the individual
clients to submit jobs without waiting for them to finish, and also
see the result of old requests (see `Job Queue`_).
Besides these major changes, another 'core' change, though not as
visible to the users, will be changing the model of object attribute
storage and separating it into namespaces (such that a Xen PVM
instance will not have the Xen HVM parameters). This will allow future
flexibility in defining additional parameters. For more details see
`Object parameters`_.
The various changes brought in by the master daemon model and the
read-write RAPI will require changes to the cluster security; we move
away from Twisted and use HTTP(s) for intra- and extra-cluster
communications. For more details, see the security document in the
doc/ directory.
Master daemon
~~~~~~~~~~~~~
In Ganeti 2.0, we will have the following *entities*:
- the master daemon (on the master node)
- the node daemon (on all nodes)
- the command line tools (on the master node)
- the RAPI daemon (on the master node)
The master-daemon related interaction paths are:
- (CLI tools/RAPI daemon) and the master daemon, via the so called
*LUXI* API
- the master daemon and the node daemons, via the node RPC
There are also some additional interaction paths for exceptional cases:
- CLI tools might access the nodes via SSH (for ``gnt-cluster copyfile``
and ``gnt-cluster command``)
- master failover is a special case when a non-master node will SSH
and do node-RPC calls to the current master
The protocol between the master daemon and the node daemons will be
changed from (Ganeti 1.2) Twisted PB (perspective broker) to HTTP(S),
using a simple PUT/GET of JSON-encoded messages. This is done due to
difficulties in working with the Twisted framework and its protocols
in a multithreaded environment, which we can overcome by using a
simpler stack (see the caveats section).
The protocol between the CLI/RAPI and the master daemon will be a
custom one (called *LUXI*): on a UNIX socket on the master node, with
rights restricted by filesystem permissions, the CLI/RAPI will talk to
the master daemon using JSON-encoded messages.
The operations supported over this internal protocol will be encoded
via a python library that will expose a simple API for its
users. Internally, the protocol will simply encode all objects in JSON
format and decode them on the receiver side.
For more details about the RAPI daemon see `Remote API changes`_, and
for the node daemon see `Node daemon changes`_.
.. _luxi:
The LUXI protocol
+++++++++++++++++
As described above, the protocol for making requests or queries to the
master daemon will be a UNIX-socket based simple RPC of JSON-encoded
messages.
The choice of UNIX was in order to get rid of the need of
authentication and authorisation inside Ganeti; for 2.0, the
permissions on the Unix socket itself will determine the access
rights.
We will have two main classes of operations over this API:
- cluster query functions
- job related functions
The cluster query functions are usually short-duration, and are the
equivalent of the ``OP_QUERY_*`` opcodes in Ganeti 1.2 (and they are
internally implemented still with these opcodes). The clients are
guaranteed to receive the response in a reasonable time via a timeout.
The job-related functions will be:
- submit job
- query job (which could also be categorized in the query-functions)
- archive job (see the job queue design doc)
- wait for job change, which allows a client to wait without polling
For more details of the actual operation list, see the `Job Queue`_.
Both requests and responses will consist of a JSON-encoded message
followed by the ``ETX`` character (ASCII decimal 3), which is not a
valid character in JSON messages and thus can serve as a message
delimiter. The contents of the messages will be a dictionary with two
fields:
:method:
the name of the method called
:args:
the arguments to the method, as a list (no keyword arguments allowed)
Responses will follow the same format, with the two fields being:
:success:
a boolean denoting the success of the operation
:result:
the actual result, or error message in case of failure
There are two special values for the result field:
- in the case that the operation failed, and this field is a list of
length two, the client library will try to interpret it as an
exception, the first element being the exception type and the second
one the actual exception arguments; this will allow a simple method of
passing Ganeti-related exceptions across the interface
- for the *WaitForChange* call (that waits on the server for a job to
change status), if the result is equal to ``nochange`` instead of the
usual result for this call (a list of changes), then the library will
internally retry the call; this is done in order to differentiate
internally between master daemon hung and job simply not changed
Users of the API that don't use the provided python library should
take care of the above two cases.
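As an illustration of the framing only (this is not the actual Ganeti
client library; the helper names and the example opcode are made up), a
request could be serialized and a response interpreted roughly as
follows, using the standard ``json`` module::

  import json

  ETX = chr(3)  # ASCII decimal 3, the message delimiter described above

  def encode_request(method, args):
    """Serialize a request: a dict with 'method' and 'args' plus ETX."""
    return json.dumps({"method": method, "args": args}) + ETX

  def decode_response(data):
    """Parse a response, applying the special cases described above."""
    msg = json.loads(data.rstrip(ETX))
    if not msg["success"]:
      err = msg["result"]
      if isinstance(err, list) and len(err) == 2:
        # two-element list: (exception type, exception arguments)
        raise RuntimeError("%s: %s" % (err[0], err[1]))
      raise RuntimeError(err)
    if msg["result"] == "nochange":
      return None  # a real client would retry the WaitForChange call here
    return msg["result"]

  # e.g. encode_request("SubmitJob", [[{"OP_ID": "OP_INSTANCE_STARTUP"}]])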
Master daemon implementation
++++++++++++++++++++++++++++
The daemon will be based around a main I/O thread that will wait for
new requests from the clients, and that does the setup/shutdown of the
other thread (pools).
There will be two other classes of threads in the daemon:
- job processing threads, part of a thread pool, and which are
long-lived, started at daemon startup and terminated only at shutdown
time
- client I/O threads, which are the ones that talk the local protocol
(LUXI) to the clients, and are short-lived
Master startup/failover
+++++++++++++++++++++++
In Ganeti 1.x there is no protection against failing over the master
to a node with stale configuration. In effect, the responsibility of
correct failovers falls on the admin. This is true both for the new
master and for when an old, offline master starts up.
Since in 2.x we are extending the cluster state to cover the job queue
and have a daemon that will execute by itself the job queue, we want
to have more resilience for the master role.
The following algorithm will happen whenever a node is ready to
transition to the master role, either at startup time or at node
failover:
#. read the configuration file and parse the node list
contained within
#. query all the nodes and make sure we obtain an agreement via
a quorum of at least half plus one nodes for the following:
- we have the latest configuration and job list (as
determined by the serial number on the configuration and
highest job ID on the job queue)
- if we are not failing over (but just starting), the
quorum agrees that we are the designated master
- if any of the above is false, we prevent the current operation
(i.e. we don't become the master)
#. at this point, the node transitions to the master role
#. for all the in-progress jobs, mark them as failed, with
reason unknown or something similar (master failed, etc.)
Since due to exceptional conditions we could have a situation in which
no node can become the master due to inconsistent data, we will have
an override switch for the master daemon startup that will assume the
current node has the right data and will replicate all the
configuration files to the other nodes.
**Note**: the above algorithm is by no means an election algorithm; it
is a *confirmation* of the master role currently held by a node.
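A compressed sketch of this confirmation step is shown below; the helper
``rpc_query_node`` and the reply fields are assumptions made purely for
illustration, not the real node RPC::

  def confirm_master_role(nodes, my_serial, my_top_job, my_name,
                          rpc_query_node, failover=False):
    """Return True if a quorum (half plus one) confirms the master role."""
    needed = len(nodes) // 2 + 1
    votes = 0
    for node in nodes:
      # assumed reply shape: {"serial": 12, "top_job": 7, "master": "node1"}
      reply = rpc_query_node(node)
      if reply is None:
        continue  # unreachable nodes cannot vote
      up_to_date = (reply["serial"] <= my_serial and
                    reply["top_job"] <= my_top_job)
      designated = failover or reply["master"] == my_name
      if up_to_date and designated:
        votes += 1
    return votes >= needed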
Logging
+++++++
The logging system will be switched completely to the standard Python
logging module; the current code is already based on it but exposes a
different API on top, which is just overhead. As such, the code will be
switched over to standard logging calls, and only the setup will be custom.
With this change, we will remove the separate debug/info/error logs,
and instead always have one logfile per daemon:
- master-daemon.log for the master daemon
- node-daemon.log for the node daemon (this is the same as in 1.2)
- rapi-daemon.log for the RAPI daemon logs
- rapi-access.log, an additional log file for the RAPI that will be
in the standard HTTP log format for possible parsing by other tools
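The custom part is therefore only the handler setup; a minimal sketch
using the standard ``logging`` module (the log directory is an
assumption for the example)::

  import logging

  def setup_logging(daemon_name, debug=False):
    """Send all log records of one daemon to a single logfile."""
    handler = logging.FileHandler("/var/log/ganeti/%s.log" % daemon_name)
    handler.setFormatter(logging.Formatter(
      "%(asctime)s %(levelname)s %(message)s"))
    root = logging.getLogger("")
    root.addHandler(handler)
    root.setLevel(logging.DEBUG if debug else logging.INFO)

  # e.g. setup_logging("master-daemon"); later code just calls logging.info()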
Since the :term:`watcher` will only submit jobs to the master for
startup of the instances, its log file will contain less information
than before, mainly that it will start the instance, but not the
results.
Node daemon changes
+++++++++++++++++++
The only change to the node daemon is that, since we need better
concurrency, we don't process the inter-node RPC calls in the node
daemon itself, but we fork and process each request in a separate
child.
Since we don't have many calls, and we only fork (not exec), the
overhead should be minimal.
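A rough sketch of the fork-per-request idea (illustrative only; the real
node daemon wraps this in an HTTP server, and returning the result to
the caller is omitted here)::

  import os

  def handle_in_child(handler, request):
    """Fork and run the RPC handler in a short-lived child process."""
    pid = os.fork()
    if pid == 0:
      # child: process the request, then exit without normal cleanup
      try:
        handler(request)
      finally:
        os._exit(0)
    # parent: return immediately and keep accepting new requests
    return pid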
Caveats
+++++++
A discussed alternative is to keep the current individual processes
touching the cluster configuration model. The reasons we have not
chosen this approach are:
- the speed of reading and unserializing the cluster state
today is not small enough that we can ignore it; the addition of
the job queue will make the startup cost even higher. While this
runtime cost is low, it can be on the order of a few seconds on
bigger clusters, which for very quick commands is comparable to
the actual duration of the computation itself
- individual commands would make it harder to implement a
fire-and-forget job request, along the lines "start this
instance but do not wait for it to finish"; it would require a
model of backgrounding the operation and other things that are
much better served by a daemon-based model
Another area of discussion is moving away from Twisted in this new
implementation. While Twisted has its advantages, there are also many
disadvantages to using it:
- first and foremost, it's not a library, but a framework; thus, if
you use Twisted, all the code needs to be 'twisted-ized' and written
in an asynchronous manner, using deferreds; while this method works,
it's not a common way to code and it requires that the entire process
workflow is based around a single *reactor* (Twisted name for a main
loop)
- the more advanced granular locking that we want to implement would
require, if written in the async-manner, deep integration with the
Twisted stack, to such an extent that business logic is inseparable
from the protocol coding; we felt that this is an unreasonable
request, and that a good protocol library should allow complete
separation of low-level protocol calls and business logic; by
comparison, the threaded approach combined with the HTTP(S) protocol
required (for the first iteration) absolutely no changes from the 1.2
code, and later changes for optimizing the inter-node RPC calls
required just syntactic changes (e.g. ``rpc.call_...`` to
``self.rpc.call_...``)
Another issue is with the Twisted API stability - during the Ganeti
1.x lifetime, we repeatedly had to implement workarounds for changes
in the Twisted version, so that for example 1.2 is able to use both
Twisted 2.x and 8.x.
In the end, since we already had an HTTP server library for the RAPI,
we just reused that for inter-node communication.
Granular locking
~~~~~~~~~~~~~~~~
We want to make sure that multiple operations can run in parallel on a
Ganeti Cluster. In order for this to happen we need to make sure
concurrently run operations don't step on each other's toes and break the
cluster.
This design addresses how we are going to deal with locking so that:
- we preserve data coherency
- we prevent deadlocks
- we prevent job starvation
Reaching the maximum possible parallelism is a Non-Goal. We have
identified a set of operations that are currently bottlenecks and need
to be parallelised and have worked on those. In the future it will be
possible to address other needs, thus making the cluster more and more
parallel one step at a time.
This section only talks about parallelising Ganeti level operations, aka
Logical Units, and the locking needed for that. Any other
synchronization lock needed internally by the code is outside its scope.
Library details
+++++++++++++++
The proposed library has these features:
- internally managing all the locks, making the implementation
transparent from their usage
- automatically grabbing multiple locks in the right order (avoid
deadlock)
- ability to transparently handle conversion to more granularity
- support asynchronous operation (future goal)
Locking will be valid only on the master node and will not be a
distributed operation. Therefore, in case of master failure, the
operations currently running will be aborted and the locks will be
lost; it remains to the administrator to cleanup (if needed) the
operation result (e.g. make sure an instance is either installed
correctly or removed).
A corollary of this is that a master-failover operation with both
masters alive needs to happen while no operations are running, and
therefore no locks are held.
All the locks will be represented by objects (like
``lockings.SharedLock``), and the individual locks for each object
will be created at initialisation time, from the config file.
The API will have a way to grab one or more than one locks at the same
time. Any attempt to grab a lock while already holding one in the wrong
order will be checked for, and fail.
The Locks
+++++++++
At the first stage we have decided to provide the following locks:
- One "config file" lock
- One lock per node in the cluster
- One lock per instance in the cluster
All the instance locks will need to be taken before the node locks, and
the node locks before the config lock. Locks will need to be acquired at
the same time for multiple instances and nodes, and internal ordering
will be dealt within the locking library, which, for simplicity, will
just use alphabetical order.
Each lock has the following three possible statuses:
- unlocked (anyone can grab the lock)
- shared (anyone can grab/have the lock but only in shared mode)
- exclusive (no one else can grab/have the lock)
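The sketch below shows how a set of same-level locks could be acquired
in a fixed alphabetical order; the ``SharedLock`` class here is a
simplified stand-in for the real ``locking`` module (shared mode is
stubbed out)::

  import threading

  class SharedLock(object):
    """Minimal stand-in for the shared/exclusive lock described above."""
    def __init__(self, name):
      self.name = name
      self._lock = threading.Lock()  # exclusive-only in this sketch

    def acquire(self, shared=False):
      self._lock.acquire()

    def release(self):
      self._lock.release()

  def acquire_all(locks, shared=False):
    """Acquire same-level locks in alphabetical order to avoid deadlocks."""
    ordered = sorted(locks, key=lambda lock: lock.name)
    for lock in ordered:
      lock.acquire(shared=shared)
    return ordered  # callers release in reverse order when done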
Handling conversion to more granularity
+++++++++++++++++++++++++++++++++++++++
In order to convert to a more granular approach transparently each time
we split a lock into finer ones we'll create a "metalock", which will depend
on those sub-locks and live for the time necessary for all the code to
convert (or forever, in some conditions). When a metalock exists all
converted code must acquire it in shared mode, so it can run
concurrently, but still be exclusive with old code, which acquires it
exclusively.
In the beginning the only such lock will be what replaces the current
"command" lock, and will acquire all the locks in the system, before
proceeding. This lock will be called the "Big Ganeti Lock" because
holding that one will avoid any other concurrent Ganeti operations.
We might also want to devise more metalocks (eg. all nodes, all
nodes+config) in order to make it easier for some parts of the code to
acquire what they need without specifying it explicitly.
In the future things like the node locks could become metalocks, should
we decide to split them into an even more fine grained approach, but
this will probably be only after the first 2.0 version has been
released.
Adding/Removing locks
+++++++++++++++++++++
When a new instance or a new node is created an associated lock must be
added to the list. The relevant code will need to inform the locking
library of such a change.
This needs to be compatible with every other lock in the system,
especially metalocks that guarantee to grab sets of resources without
specifying them explicitly. The implementation of this will be handled
in the locking library itself.
When instances or nodes disappear from the cluster the relevant locks
must be removed. This is easier than adding new elements, as the code
which removes them must own them exclusively already, and thus deals
with metalocks exactly as normal code acquiring those locks. Any
operation queuing on a removed lock will fail after its removal.
Asynchronous operations
+++++++++++++++++++++++
For the first version the locking library will only export synchronous
operations, which will block until the needed locks are held, and only
fail if the request is impossible or somehow erroneous.
In the future we may want to implement different types of asynchronous
operations such as:
- try to acquire this lock set and fail if not possible
- try to acquire one of these lock sets and return the first one you
were able to get (or after a timeout) (select/poll like)
These operations can be used to prioritize operations based on available
locks, rather than making them just blindly queue for acquiring them.
The inherent risk, though, is that any code using the first operation,
or setting a timeout for the second one, is susceptible to starvation
and thus may never be able to get the required locks and complete
certain tasks. Considering this providing/using these operations should
not be among our first priorities.
Locking granularity
+++++++++++++++++++
For the first version of this code we'll convert each Logical Unit to
acquire/release the locks it needs, so locking will be at the Logical
Unit level. In the future we may want to split logical units in
independent "tasklets" with their own locking requirements. A different
design doc (or mini design doc) will cover the move from Logical Units
to tasklets.
Code examples
+++++++++++++
In general when acquiring locks we should use a code path equivalent
to::
lock.acquire()
try:
...
# other code
finally:
lock.release()
This makes sure we release all locks, and avoid possible deadlocks. Of
course extra care must be taken not to leave, if possible, locked
structures in an unusable state. Note that with Python 2.5 a simpler
syntax will be possible, but we want to keep compatibility with Python
2.4 so the new constructs should not be used.
In order to avoid this extra indentation and code changes everywhere in
the Logical Units code, we decided to allow LUs to declare locks, and
then execute their code with their locks acquired. In the new world LUs
are called like this::
# user passed names are expanded to the internal lock/resource name,
# then known needed locks are declared
lu.ExpandNames()
... some locking/adding of locks may happen ...
# late declaration of locks for one level: this is useful because sometimes
# we can't know which resource we need before locking the previous level
lu.DeclareLocks() # for each level (cluster, instance, node)
... more locking/adding of locks can happen ...
# these functions are called with the proper locks held
lu.CheckPrereq()
lu.Exec()
... locks declared for removal are removed, all acquired locks released ...
The Processor and the LogicalUnit class will contain exact documentation
on how locks are supposed to be declared.
Caveats
+++++++
This library will provide an easy upgrade path to bring all the code to
granular locking without breaking everything, and it will also guarantee
against a lot of common errors. Code switching from the old "lock
everything" lock to the new system, though, needs to be carefully
scrutinised to be sure it is really acquiring all the necessary locks,
and none has been overlooked or forgotten.
The code can contain other locks outside of this library, to synchronise
other threaded code (eg for the job queue) but in general these should
be leaf locks or carefully structured non-leaf ones, to avoid deadlock
race conditions.
.. _jqueue-original-design:
Job Queue
~~~~~~~~~
Granular locking is not enough to speed up operations; we also need a
queue to store them and to be able to process as many as possible in
parallel.
A Ganeti job will consist of multiple ``OpCodes`` which are the basic
element of operation in Ganeti 1.2 (and will remain as such). Most
command-level commands are equivalent to one OpCode, or in some cases
to a sequence of opcodes, all of the same type (e.g. evacuating a node
will generate N opcodes of type replace disks).
Job execution—“Life of a Ganeti job”
++++++++++++++++++++++++++++++++++++
#. Job gets submitted by the client. A new job identifier is generated
and assigned to the job. The job is then automatically replicated
[#replic]_ to all nodes in the cluster. The identifier is returned to
the client.
#. A pool of worker threads waits for new jobs. If all are busy, the job
has to wait and the first worker finishing its work will grab it.
Otherwise any of the waiting threads will pick up the new job.
#. Client waits for job status updates by calling a waiting RPC
function. Log message may be shown to the user. Until the job is
started, it can also be canceled.
#. As soon as the job is finished, its final result and status can be
retrieved from the server.
#. If the client archives the job, it gets moved to a history directory.
There will be a method to archive all jobs older than a given age.
.. [#replic] We need replication in order to maintain the consistency
across all nodes in the system; the master node only differs in the
fact that now it is running the master daemon, but if it fails and we
do a master failover, the jobs are still visible on the new master
(though marked as failed).
Failures to replicate a job to other nodes will be only flagged as
errors in the master daemon log if more than half of the nodes failed,
otherwise we ignore the failure, and rely on the fact that the next
update (for still running jobs) will retry the update. For finished
jobs, it is less of a problem.
Future improvements will look into checking the consistency of the job
list and jobs themselves at master daemon startup.
Job storage
+++++++++++
Jobs are stored in the filesystem as individual files, serialized
using JSON (standard serialization mechanism in Ganeti).
The choice of storing each job in its own file was made because:
- a file can be atomically replaced
- a file can easily be replicated to other nodes
- checking consistency across nodes can be implemented very easily,
since all job files should be (at a given moment in time) identical
The other possible choices that were discussed and discounted were:
- single big file with all job data: not feasible due to difficult
updates
- in-process databases: hard to replicate the entire database to the
other nodes, and replicating individual operations does not mean we
keep consistency
Queue structure
+++++++++++++++
All file operations have to be done atomically by writing to a temporary
file and subsequent renaming. Except for log messages, every change in a
job is stored and replicated to other nodes.
::
/var/lib/ganeti/queue/
job-1 (JSON encoded job description and status)
[…]
job-37
job-38
job-39
lock (Queue managing process opens this file in exclusive mode)
serial (Last job ID used)
version (Queue format version)
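The atomic replacement itself is the usual
write-to-a-temporary-file-then-rename pattern; a sketch (the
serialization details are illustrative)::

  import json
  import os
  import tempfile

  QUEUE_DIR = "/var/lib/ganeti/queue"

  def write_job_file(job_id, job_data):
    """Atomically replace the on-disk file for job_id with job_data."""
    final_path = os.path.join(QUEUE_DIR, "job-%s" % job_id)
    fd, tmp_path = tempfile.mkstemp(dir=QUEUE_DIR)
    try:
      os.write(fd, json.dumps(job_data).encode("utf-8"))
      os.fsync(fd)
    finally:
      os.close(fd)
    # rename() is atomic on POSIX, so readers always see either the old
    # or the new contents, never a partially written file
    os.rename(tmp_path, final_path)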
Locking
+++++++
Locking in the job queue is a complicated topic. It is called from more
than one thread and must be thread-safe. For simplicity, a single lock
is used for the whole job queue.
A more detailed description can be found in doc/locking.rst.
Internal RPC
++++++++++++
RPC calls available between Ganeti master and node daemons:
jobqueue_update(file_name, content)
Writes a file in the job queue directory.
jobqueue_purge()
Cleans the job queue directory completely, including archived jobs.
jobqueue_rename(old, new)
Renames a file in the job queue directory.
Client RPC
++++++++++
RPC between Ganeti clients and the Ganeti master daemon supports the
following operations:
SubmitJob(ops)
Submits a list of opcodes and returns the job identifier. The
identifier is guaranteed to be unique during the lifetime of a
cluster.
WaitForJobChange(job_id, fields, […], timeout)
This function waits until a job changes or a timeout expires. The
condition for when a job changed is defined by the fields passed and
the last log message received.
QueryJobs(job_ids, fields)
Returns field values for the job identifiers passed.
CancelJob(job_id)
Cancels the job specified by identifier. This operation may fail if
the job is already running, canceled or finished.
ArchiveJob(job_id)
Moves a job into the …/archive/ directory. This operation will fail if
the job has not been canceled or finished.
Job and opcode status
+++++++++++++++++++++
Each job and each opcode has, at any time, one of the following states:
Queued
The job/opcode was submitted, but did not yet start.
Waiting
The job/opcode is waiting for a lock to proceed.
Running
The job/opcode is running.
Canceled
The job/opcode was canceled before it started.
Success
The job/opcode ran and finished successfully.
Error
The job/opcode was aborted with an error.
If the master is aborted while a job is running, the job will be set to
the Error status once the master started again.
History
+++++++
Archived jobs are kept in a separate directory,
``/var/lib/ganeti/queue/archive/``. This is done in order to speed up
the queue handling: by default, the jobs in the archive are not
touched by any functions. Only the current (unarchived) jobs are
parsed, loaded, and verified (if implemented) by the master daemon.
Ganeti updates
++++++++++++++
The queue has to be completely empty for Ganeti updates with changes
in the job queue structure. In order to allow this, there will be a
way to prevent new jobs entering the queue.
Object parameters
~~~~~~~~~~~~~~~~~
Across all cluster configuration data, we have multiple classes of
parameters:
A. cluster-wide parameters (e.g. name of the cluster, the master);
these are the ones that we have today, and are unchanged from the
current model
#. node parameters
#. instance specific parameters, e.g. the name of disks (LV), that
cannot be shared with other instances
#. instance parameters, that are or can be the same for many
instances, but are not hypervisor related; e.g. the number of VCPUs,
or the size of memory
#. instance parameters that are hypervisor specific (e.g. kernel_path
or PAE mode)
The following definitions for instance parameters will be used below:
:hypervisor parameter:
a hypervisor parameter (or hypervisor specific parameter) is defined
as a parameter that is interpreted by the hypervisor support code in
Ganeti and usually is specific to a particular hypervisor (like the
kernel path for :term:`PVM` which makes no sense for :term:`HVM`).
:backend parameter:
a backend parameter is defined as an instance parameter that can be
shared among a list of instances, and is either generic enough not
to be tied to a given hypervisor or cannot influence at all the
hypervisor behaviour.
For example: memory, vcpus, auto_balance
All these parameters will be encoded into constants.py with the prefix
"BE\_" and the whole list of parameters will exist in the set
"BES_PARAMETERS"
:proper parameter:
a parameter whose value is unique to the instance (e.g. the name of a
LV, or the MAC of a NIC)
As a general rule, for all kinds of parameters, “None” (or in
JSON-speak, “nil”) will no longer be a valid value for a parameter. As
such, only non-default parameters will be saved as part of objects in
the serialization step, reducing the size of the serialized format.
Cluster parameters
++++++++++++++++++
Cluster parameters remain as today, attributes at the top level of the
Cluster object. In addition, two new attributes at this level will
hold defaults for the instances:
- hvparams, a dictionary indexed by hypervisor type, holding default
values for hypervisor parameters that are not defined/overridden by
the instances of this hypervisor type
- beparams, a dictionary holding (for 2.0) a single element 'default',
which holds the default value for backend parameters
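For illustration, the cluster-level defaults could then look roughly
like this (the parameter names and values are examples, not the
definitive 2.0 list)::

  hvparams = {
    "xen-pvm": {"kernel_path": "/boot/vmlinuz-2.6-xenU", "initrd_path": ""},
    "xen-hvm": {"boot_order": "cd", "vnc_bind_address": "0.0.0.0"},
  }
  beparams = {
    "default": {"memory": 512, "vcpus": 1, "auto_balance": True},
  }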
Node parameters
+++++++++++++++
Node-related parameters are very few, and we will continue using the
same model for these as previously (attributes on the Node object).
There are three new node flags, described in a separate section "node
flags" below.
Instance parameters
+++++++++++++++++++
As described before, the instance parameters are split in three:
instance proper parameters, unique to each instance, instance
hypervisor parameters and instance backend parameters.
The “hvparams” and “beparams” are kept in two dictionaries at instance
level. Only non-default parameters are stored (but once customized, a
parameter will be kept, even with the same value as the default one,
until reset).
The names for hypervisor parameters in the instance.hvparams subtree
should be chosen to be as generic as possible, especially if specific
parameters could conceivably be useful for more than one hypervisor,
e.g. ``instance.hvparams.vnc_console_port`` instead of using both
``instance.hvparams.hvm_vnc_console_port`` and
``instance.hvparams.kvm_vnc_console_port``.
There are some special cases related to disks and NICs (for example):
a disk has both Ganeti-related parameters (e.g. the name of the LV)
and hypervisor-related parameters (how the disk is presented to/named
in the instance). The former parameters remain as proper-instance
parameters, while the latter values are migrated to the hvparams
structure. In 2.0, we will have only globally-per-instance such
hypervisor parameters, and not per-disk ones (e.g. all NICs will be
exported as of the same type).
Starting from the 1.2 list of instance parameters, here is how they
will be mapped to the three classes of parameters:
- name (P)
- primary_node (P)
- os (P)
- hypervisor (P)
- status (P)
- memory (BE)
- vcpus (BE)
- nics (P)
- disks (P)
- disk_template (P)
- network_port (P)
- kernel_path (HV)
- initrd_path (HV)
- hvm_boot_order (HV)
- hvm_acpi (HV)
- hvm_pae (HV)
- hvm_cdrom_image_path (HV)
- hvm_nic_type (HV)
- hvm_disk_type (HV)
- vnc_bind_address (HV)
- serial_no (P)
Parameter validation
++++++++++++++++++++
To support the new cluster parameter design, additional features will
be required from the hypervisor support implementations in Ganeti.
The hypervisor support implementation API will be extended with the
following features:
:PARAMETERS: class-level attribute holding the list of valid parameters
for this hypervisor
:CheckParamSyntax(hvparams): checks that the given parameters are
valid (as in the names are valid) for this hypervisor; usually just
comparing ``hvparams.keys()`` and ``cls.PARAMETERS``; this is a class
method that can be called from within master code (i.e. cmdlib) and
should be safe to do so
:ValidateParameters(hvparams): verifies the values of the provided
parameters against this hypervisor; this is a method that will be
called on the target node, from backend.py code, and as such can
make node-specific checks (e.g. kernel_path checking)
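A sketch of what a hypervisor support class could look like under this
API; the class name and parameter list are purely illustrative::

  import os

  class ExamplePVMHypervisor(object):
    # class-level attribute with the valid parameter names
    PARAMETERS = ["kernel_path", "initrd_path"]

    @classmethod
    def CheckParamSyntax(cls, hvparams):
      """Name-level check, safe to call from master code."""
      unknown = set(hvparams.keys()) - set(cls.PARAMETERS)
      if unknown:
        raise ValueError("Unknown hypervisor parameters: %s" %
                         ", ".join(sorted(unknown)))

    def ValidateParameters(self, hvparams):
      """Value-level check, run on the target node (may touch the system)."""
      kernel = hvparams.get("kernel_path")
      if kernel and not os.path.isfile(kernel):
        raise ValueError("Kernel %s not found on this node" % kernel)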
Default value application
+++++++++++++++++++++++++
The application of defaults to an instance is done in the Cluster
object, via two new methods as follows:
- ``Cluster.FillHV(instance)``, returns 'filled' hvparams dict, based on
instance's hvparams and cluster's ``hvparams[instance.hypervisor]``
- ``Cluster.FillBE(instance, be_type="default")``, which returns the
beparams dict, based on the instance and cluster beparams
The FillHV/BE transformations will be used, for example, in the
RpcRunner when sending an instance for activation/stop, and the sent
instance hvparams/beparams will have the final value (noded code doesn't
know about defaults).
LU code will need to self-call the transformation, if needed.
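The filling itself is just a dictionary merge of cluster defaults and
instance overrides; a minimal sketch (not the real ``Cluster`` object,
and the attribute names are assumptions)::

  def fill_dict(defaults, custom):
    """Return the defaults overridden by the instance's own values."""
    result = defaults.copy()
    result.update(custom)
    return result

  class Cluster(object):
    def __init__(self, hvparams, beparams):
      self.hvparams = hvparams  # dict indexed by hypervisor type
      self.beparams = beparams  # dict with (for 2.0) a 'default' key

    def FillHV(self, instance):
      return fill_dict(self.hvparams.get(instance.hypervisor, {}),
                       instance.hvparams)

    def FillBE(self, instance, be_type="default"):
      return fill_dict(self.beparams.get(be_type, {}), instance.beparams)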
Opcode changes
++++++++++++++
The parameter changes will have impact on the OpCodes, especially on
the following ones:
- ``OpInstanceCreate``, where the new hv and be parameters will be sent
as dictionaries; note that all hv and be parameters are now optional,
as the values can be instead taken from the cluster
- ``OpInstanceQuery``, where we have to be able to query these new
parameters; the syntax for names will be ``hvparam/$NAME`` and
``beparam/$NAME`` for querying an individual parameter out of one
dictionary, and ``hvparams``, respectively ``beparams``, for the whole
dictionaries
- ``OpModifyInstance``, where the modified parameters are sent as
dictionaries
Additionally, we will need new OpCodes to modify the cluster-level
defaults for the be/hv sets of parameters.
Caveats
+++++++
One problem that might appear is that our classification is not
complete or not good enough, and we'll need to change this model. As
the last resort, we will need to roll back and keep the 1.2 style.
Another problem is that classification of one parameter is unclear
(e.g. ``network_port``, is this BE or HV?); in this case we'll take
the risk of having to move parameters later between classes.
Security
++++++++
The only security issue that we foresee is if some new parameters will
have sensitive values. If so, we will need to have a way to export the
config data while purging the sensitive value.
E.g. for the drbd shared secrets, we could export these with the
values replaced by an empty string.
Node flags
~~~~~~~~~~
Ganeti 2.0 adds three node flags that change the way nodes are handled
within Ganeti and the related infrastructure (iallocator interaction,
RAPI data export).
*master candidate* flag
+++++++++++++++++++++++
Ganeti 2.0 allows more scalability in operation by introducing
parallelization. However, a new bottleneck is reached that is the
synchronization and replication of cluster configuration to all nodes
in the cluster.
This breaks scalability as the speed of the replication decreases
roughly with the number of nodes in the cluster. The goal of the
master candidate flag is to change this O(n) into O(1) with respect to
job and configuration data propagation.
Only nodes having this flag set (let's call this set of nodes the
*candidate pool*) will have jobs and configuration data replicated.
The cluster will have a new parameter (runtime changeable) called
``candidate_pool_size`` which represents the number of candidates the
cluster tries to maintain (preferably automatically).
This will impact the cluster operations as follows:
- jobs and config data will be replicated only to a fixed set of nodes
- master fail-over will only be possible to a node in the candidate pool
- cluster verify needs changing to account for these two roles
- external scripts will no longer have access to the configuration
file (this is not recommended anyway)
The caveats of this change are:
- if all candidates are lost (completely), cluster configuration is
lost (but it should be backed up external to the cluster anyway)
- failed nodes which are candidates must be dealt with properly, so
that we don't lose too many candidates at the same time; this will be
reported in cluster verify
- the 'all equal' concept of ganeti is no longer true
- the partial distribution of config data means that all nodes will
have to revert to ssconf files for master info (as in 1.2)
Advantages:
- speed on a 100+ node simulated cluster is greatly enhanced, even
for a simple operation; ``gnt-instance remove`` on a diskless instance
goes from ~9 seconds to ~2 seconds
- node failure of non-candidates will be less impacting on the cluster
The default value for the candidate pool size will be set to 10 but
this can be changed at cluster creation and modified any time later.
Testing on simulated big clusters with sequential and parallel jobs
show that this value (10) is a sweet-spot from performance and load
point of view.
*offline* flag
++++++++++++++
In order to support better the situation in which nodes are offline
(e.g. for repair) without altering the cluster configuration, Ganeti
needs to be told and needs to properly handle this state for nodes.
This will result in simpler procedures, and less mistakes, when the
amount of node failures is high on an absolute scale (either due to
high failure rate or simply big clusters).
Nodes having this attribute set will not be contacted for inter-node
RPC calls, will not be master candidates, and will not be able to host
instances as primaries.
Setting this attribute on a node:
- will not be allowed if the node is the master
- will not be allowed if the node has primary instances
- will cause the node to be demoted from the master candidate role (if
it was), possibly causing another node to be promoted to that role
This attribute will impact the cluster operations as follows:
- querying these nodes for anything will fail instantly in the RPC
library, with a specific RPC error (RpcResult.offline == True)
- they will be listed in the Other section of cluster verify
The code is changed in the following ways:
- RPC calls were converted to skip such nodes:
- RpcRunner-instance-based RPC calls are easy to convert
- static/classmethod RPC calls are harder to convert, and were left
alone
- the RPC results were unified so that this new result state (offline)
can be differentiated
- master voting still queries in-repair nodes, as we need to ensure
consistency in case the (wrong) masters have old data, and nodes have
come back from repairs
Caveats:
- some operation semantics are less clear (e.g. what to do on instance
start with offline secondary?); for now, these will just fail as if
the flag is not set (but faster)
- 2-node cluster with one node offline needs manual startup of the
master with a special flag to skip voting (as the master can't get a
quorum there)
One of the advantages of implementing this flag is that it will allow
in the future automation tools to automatically put the node in
repairs and recover from this state, and the code (should/will) handle
this much better than just timing out. So, future possible
improvements (for later versions):
- watcher will detect nodes which fail RPC calls, will attempt to ssh
to them, and if that fails will put them offline
- watcher will try to ssh and query the offline nodes, if successful
will take them off the repair list
Alternatives considered: The RPC call model in 2.0 is, by default,
much nicer - errors are logged in the background, and job/opcode
execution is clearer, so we could simply not introduce this. However,
having this state will make both the codepaths clearer (offline
vs. temporary failure) and the operational model (it's not a node with
errors, but an offline node).
*drained* flag
++++++++++++++
Due to parallel execution of jobs in Ganeti 2.0, we could have the
following situation:
- gnt-node migrate + failover is run
- gnt-node evacuate is run, which schedules a long-running 6-opcode
job for the node
- partway through, a new job comes in that runs an iallocator script,
which finds the above node as empty and a very good candidate
- gnt-node evacuate has finished, but now it has to be run again, to
clean the above instance(s)
In order to prevent this situation, and to be able to get nodes into
proper offline status easily, a new *drained* flag was added to the
nodes.
This flag (which actually means "is being, or was drained, and is
expected to go offline"), will prevent allocations on the node, but
otherwise all other operations (start/stop instance, query, etc.) are
working without any restrictions.
Interaction between flags
+++++++++++++++++++++++++
While these flags are implemented as separate flags, they are
mutually-exclusive and are acting together with the master node role
as a single *node status* value. In other words, a flag is only in one
of these roles at a given time. The lack of any of these flags denote
a regular node.
The current node status is visible in the ``gnt-cluster verify``
output, and the individual flags can be examined via separate flags in
the ``gnt-node list`` output.
These new flags will be exported in both the iallocator input message
and via RAPI, see the respective man pages for the exact names.
Feature changes
---------------
The main feature-level changes will be:
- a number of disk related changes
- removal of fixed two-disk, one-nic per instance limitation
Disk handling changes
~~~~~~~~~~~~~~~~~~~~~
The storage options available in Ganeti 1.x were introduced based on
then-current software (first DRBD 0.7 then later DRBD 8) and the
estimated usage patterns. However, experience has later shown that some
assumptions made initially are not true and that more flexibility is
needed.
One main assumption made was that disk failures should be treated as
'rare' events, and that each of them needs to be manually handled in
order to ensure data safety; however, both these assumptions are false:
- disk failures can be a common occurrence, based on usage patterns or
cluster size
- our disk setup is robust enough (referring to DRBD8 + LVM) that we
could automate more of the recovery
Note that we still don't have fully-automated disk recovery as a goal,
but our goal is to reduce the manual work needed.
As such, we plan the following main changes:
- DRBD8 is much more flexible and stable than its previous version
(0.7), such that removing the support for the ``remote_raid1``
template and focusing only on DRBD8 is easier
- dynamic discovery of DRBD devices is not actually needed in a cluster
where the DRBD namespace is controlled by Ganeti; switching to a
static assignment (done at either instance creation time or change
secondary time) will change the disk activation time from O(n) to
O(1), which on big clusters is a significant gain
- remove the hard dependency on LVM (currently all available storage
types are ultimately backed by LVM volumes) by introducing file-based
storage
Additionally, a number of smaller enhancements are also planned:
- support variable number of disks
- support read-only disks
Future enhancements in the 2.x series, which do not require base design
changes, might include:
- enhancement of the LVM allocation method in order to try to keep
all of an instance's virtual disks on the same physical
disks
- add support for DRBD8 authentication at handshake time in
order to ensure each device connects to the correct peer
- remove the restrictions on failover only to the secondary
which creates very strict rules on cluster allocation
DRBD minor allocation
+++++++++++++++++++++
Currently, when trying to identify or activate a new DRBD (or MD)
device, the code scans all in-use devices in order to see if we find
one that looks similar to our parameters and is already in the desired
state or not. Since this needs external commands to be run, it is very
slow when more than a few devices are already present.
Therefore, we will change the discovery model from dynamic to
static. When a new device is logically created (added to the
configuration) a free minor number is computed from the list of
devices that should exist on that node and assigned to that
device.
At device activation, if the minor is already in use, we check if
it has our parameters; if not so, we just destroy the device (if
possible, otherwise we abort) and start it with our own
parameters.
This means that we in effect take ownership of the minor space for
that device type; if there's a user-created DRBD minor, it will be
automatically removed.
The change will have the effect of reducing the number of external
commands run per device from a constant number times the index of the
first free DRBD minor to just a constant number.
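Computing the statically assigned minor is then a simple "first free
integer" search over the device list kept in the configuration; a
sketch::

  def find_free_minor(used_minors):
    """Return the lowest DRBD minor number not yet used on a node."""
    used = set(used_minors)
    minor = 0
    while minor in used:
      minor += 1
    return minor

  # e.g. find_free_minor([0, 1, 3]) == 2; the result is then recorded in
  # the configuration for the new disk on that node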
Removal of obsolete device types (MD, DRBD7)
++++++++++++++++++++++++++++++++++++++++++++
We need to remove these device types because of two issues. First,
DRBD7 has bad failure modes in case of dual failures (both network and
disk): it cannot propagate the error up the device stack and instead
just panics. Second, due to the asymmetry between primary and
secondary in MD+DRBD mode, we cannot do live failover (not even if we
had MD+DRBD8).
File-based storage support
++++++++++++++++++++++++++
Using files instead of logical volumes for instance storage would
allow us to get rid of the hard requirement for volume groups for
testing clusters and it would also allow usage of SAN storage to do
live failover taking advantage of this storage solution.
Better LVM allocation
+++++++++++++++++++++
Currently, the LV to PV allocation mechanism is a very simple one: at
each new request for a logical volume, tell LVM to allocate the volume
in order based on the amount of free space. This is good for
simplicity and for keeping the usage equally spread over the available
physical disks, however it introduces a problem that an instance could
end up with its (currently) two drives on two physical disks, or
(worse) that the data and metadata for a DRBD device end up on
different drives.
This is bad because it causes unneeded ``replace-disks`` operations in
case of a physical failure.
The solution is to batch allocations for an instance and make the LVM
handling code try to allocate as close as possible all the storage of
one instance. We will still allow the logical volumes to spill over to
additional disks as needed.
Note that this clustered allocation can only be attempted at initial
instance creation, or at change secondary node time. At add disk time,
or at replacing individual disks, it's not easy enough to compute the
current disk map so we'll not attempt the clustering.
DRBD8 peer authentication at handshake
++++++++++++++++++++++++++++++++++++++
DRBD8 has a new feature that allows authentication of the peer at
connect time. We can use this to prevent connecting to the wrong peer,
more than to secure the connection. Even though we never had issues
with wrong connections, it would be good to implement this.
LVM self-repair (optional)
++++++++++++++++++++++++++
The complete failure of a physical disk is very tedious to
troubleshoot, mainly because of the many failure modes and the many
steps needed. We can safely automate some of the steps, more
specifically the ``vgreduce --removemissing`` using the following
method:
#. check if all nodes have consistent volume groups
#. if yes, and previous status was yes, do nothing
#. if yes, and previous status was no, save status and restart
#. if no, and previous status was no, do nothing
#. if no, and previous status was yes:
#. if more than one node is inconsistent, do nothing
#. if only one node is inconsistent:
#. run ``vgreduce --removemissing``
#. log this occurrence in the Ganeti log in a form that
can be used for monitoring
#. [FUTURE] run ``replace-disks`` for all
instances affected
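The decision logic above amounts to a small state machine keyed on the
previous and current consistency status; the sketch below stubs out the
actual command execution (the volume group name and the ssh invocation
are assumptions for the example)::

  import subprocess

  def maybe_repair_vg(prev_ok, inconsistent_nodes, vg_name="xenvg"):
    """Apply the rules above; returns the status to remember for next run."""
    if not inconsistent_nodes:
      # all nodes consistent: nothing to do, just record the good status
      return True
    if not prev_ok:
      # still inconsistent, already seen last time: do nothing
      return False
    # freshly inconsistent: act only if exactly one node is affected
    if len(inconsistent_nodes) == 1:
      node = inconsistent_nodes[0]
      subprocess.call(["ssh", node, "vgreduce", "--removemissing", vg_name])
      # a real implementation would also log this in a monitorable form
    return False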
Failover to any node
++++++++++++++++++++
With a modified disk activation sequence, we can implement the
*failover to any* functionality, removing many of the layout
restrictions of a cluster:
- the need to reserve memory on the current secondary: this gets reduced
to the need to reserve memory anywhere on the cluster
- the need to first failover and then replace secondary for an
instance: with failover-to-any, we can directly failover to
another node, which also does the replace disks at the same
step
In the following, we denote the current primary by P1, the current
secondary by S1, and the new primary and secondaries by P2 and S2. P2
is fixed to the node the user chooses, but the choice of S2 can be
made between P1 and S1. This choice can be constrained, depending on
which of P1 and S1 has failed.
- if P1 has failed, then S1 must become S2, and live migration is not
possible
- if S1 has failed, then P1 must become S2, and live migration could be
possible (in theory, but this is not a design goal for 2.0)
The algorithm for performing the failover is straightforward:
- verify that S2 (the node the user has chosen to keep as secondary) has
valid data (is consistent)
- tear down the current DRBD association and setup a DRBD pairing
between P2 (P2 is indicated by the user) and S2; since P2 has no data,
it will start re-syncing from S2
- as soon as P2 is in state SyncTarget (i.e. after the resync has
started but before it has finished), we can promote it to primary role
(r/w) and start the instance on P2
- as soon as the P2-S2 sync has finished, we can remove
the old data on the old node that has not been chosen for
S2
Caveats: during the P2-S2 sync, a (non-transient) network error
will cause I/O errors on the instance, so (if a longer instance
downtime is acceptable) we can postpone the restart of the instance
until the resync is done. However, disk I/O errors on S2 will cause
data loss, since we don't have a good copy of the data anymore, so in
this case waiting for the sync to complete is not an option. As such,
it is recommended that this feature is used only in conjunction with
proper disk monitoring.
Live migration note: While failover-to-any is possible for all choices
of S2, migration-to-any is possible only if we keep P1 as S2.
Caveats
+++++++
The dynamic device model, while more complex, has an advantage: it
will not reuse by mistake the DRBD device of another instance, since
it always looks for either our own or a free one.
The static one, in contrast, will assume that given a minor number N,
it's ours and we can take over. This needs careful implementation such
that if the minor is in use, either we are able to cleanly shut it
down, or we abort the startup. Otherwise, it could be that we start
syncing between two instance's disks, causing data loss.
Variable number of disk/NICs per instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Variable number of disks
++++++++++++++++++++++++
In order to support high-security scenarios (for example read-only sda
and read-write sdb), we need to make a fully flexible disk
definition. This has less impact than it might seem at first sight:
only the instance creation has a hard-coded number of disks, not the disk
handling code. The block device handling and most of the instance
handling code is already working with "the instance's disks" as
opposed to "the two disks of the instance", but some pieces are not
(e.g. import/export) and the code needs a review to ensure safety.
The objective is to be able to specify the number of disks at
instance creation, and to be able to toggle from read-only to
read-write a disk afterward.
Variable number of NICs
+++++++++++++++++++++++
Similar to the disk change, we need to allow multiple network
interfaces per instance. This will affect the internal code (some
function will have to stop assuming that ``instance.nics`` is a list
of length one), the OS API which currently can export/import only one
instance, and the command line interface.
Interface changes
-----------------
There are two areas of interface changes: API-level changes (the OS
interface and the RAPI interface) and the command line interface
changes.
OS interface
~~~~~~~~~~~~
The current Ganeti OS interface, version 5, is tailored for Ganeti 1.2.
The interface is composed of a series of scripts which get called with
certain parameters to perform OS-dependent operations on the cluster.
The current scripts are:
create
called when a new instance is added to the cluster
export
called to export an instance disk to a stream
import
called to import from a stream to a new instance
rename
called to perform the os-specific operations necessary for renaming an
instance
Currently these scripts suffer from the limitations of Ganeti 1.2: for
example they accept exactly one block and one swap device to operate
on, rather than any amount of generic block devices, they blindly assume
that an instance will have just one network interface to operate, and they
cannot be configured to optimise the instance for a particular
hypervisor.
Since in Ganeti 2.0 we want to support multiple hypervisors, and a
non-fixed number of network interfaces and disks, the OS interface needs to change to
transmit the appropriate amount of information about an instance to its
managing operating system, when operating on it. Moreover since some old
assumptions usually used in OS scripts are no longer valid we need to
re-establish a common knowledge on what can be assumed and what cannot
be regarding Ganeti environment.
When designing the new OS API our priorities are:
- ease of use
- future extensibility
- ease of porting from the old API
- modularity
As such we want to limit the number of scripts that must be written to
support an OS, and make it easy to share code between them by making
their input uniform. We will also leave the current script structure unchanged,
as far as we can, and make a few of the scripts (import, export and
rename) optional. Most information will be passed to the script through
environment variables, for ease of access and at the same time ease of
using only the information a script needs.
The Scripts
+++++++++++
As in Ganeti 1.2, every OS which wants to be installed in Ganeti needs
to support the following functionality, through scripts:
create:
used to create a new instance running that OS. This script should
prepare the block devices, and install them so that the new OS can
boot under the specified hypervisor.
export (optional):
used to export an installed instance using the given OS to a format
which can be used to import it back into a new instance.
import (optional):
used to import an exported instance into a new one. This script is
similar to create, but the new instance should have the content of the
export, rather than contain a pristine installation.
rename (optional):
used to perform the internal OS-specific operations needed to rename
an instance.
If any optional script is not implemented Ganeti will refuse to perform
the given operation on instances using the non-implementing OS. Of
course the create script is mandatory, and it doesn't make sense to
support either the export or the import operation but not both.
Incompatibilities with 1.2
__________________________
We expect the following incompatibilities between the OS scripts for 1.2
and the ones for 2.0:
- Input parameters: in 1.2 those were passed on the command line, in 2.0
we'll use environment variables, as there will be a lot more
information and not all OSes may care about all of it.
- Number of calls: export scripts will be called once for each device
the instance has, and import scripts once for every exported disk.
Imported instances will be forced to have a number of disks greater or
equal to the one of the export.
- Some scripts are not compulsory: if such a script is missing the
relevant operations will be forbidden for instances of that OS. This
makes it easier to distinguish between unsupported operations and
no-op ones (if any).
Input
_____
Rather than using command line flags, as they do now, scripts will
accept inputs from environment variables. We expect the following input
values:
OS_API_VERSION
The version of the OS API that the following parameters comply with;
this is used so that in the future we could have OSes supporting
multiple versions and thus Ganeti send the proper version in this
parameter
INSTANCE_NAME
Name of the instance acted on
HYPERVISOR
The hypervisor the instance should run on (e.g. 'xen-pvm', 'xen-hvm',
'kvm')
DISK_COUNT
The number of disks this instance will have
NIC_COUNT
The number of NICs this instance will have
DISK_<N>_PATH
Path to the Nth disk.
DISK_<N>_ACCESS
W if read/write, R if read only. OS scripts are not supposed to touch
read-only disks, but they are passed so the scripts know about them.
DISK_<N>_FRONTEND_TYPE
Type of the disk as seen by the instance. Can be 'scsi', 'ide',
'virtio'
DISK_<N>_BACKEND_TYPE
Type of the disk as seen from the node. Can be 'block', 'file:loop' or
'file:blktap'
NIC_<N>_MAC
MAC address of the Nth network interface
NIC_<N>_IP
IP address of the Nth network interface, if available
NIC_<N>_BRIDGE
Node bridge the Nth network interface will be connected to
NIC_<N>_FRONTEND_TYPE
Type of the Nth NIC as seen by the instance. For example 'virtio',
'rtl8139', etc.
DEBUG_LEVEL
Whether more output should be produced, for debugging purposes. Currently
the only valid values are 0 and 1.
These are only the basic variables we are thinking of now, but more
may come during the implementation and they will be documented in the
:manpage:`ganeti-os-interface(7)` man page. All these variables will be
available to all scripts.
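As an illustration only, a create script consuming these variables could
start out roughly like the sketch below; the variable names are the ones
listed above, everything else (including the installation steps) is
hypothetical::

  #!/usr/bin/env python
  # Hypothetical OS 'create' script sketch; real scripts also have to
  # format/partition the disks and install the actual operating system.
  import os
  import sys

  def main():
    instance = os.environ["INSTANCE_NAME"]
    hypervisor = os.environ["HYPERVISOR"]
    disk_count = int(os.environ["DISK_COUNT"])
    debug = os.environ.get("DEBUG_LEVEL", "0") == "1"

    # Collect the per-disk variables (DISK_%N_*)
    disks = []
    for idx in range(disk_count):
      disks.append({
        "path": os.environ["DISK_%d_PATH" % idx],
        "access": os.environ["DISK_%d_ACCESS" % idx],
      })

    if debug:
      # User-targeted information goes to stderr, never to stdout
      sys.stderr.write("Installing %s (%s) on %d disk(s)\n" %
                       (instance, hypervisor, disk_count))

    # ... format the writable disks and install the OS here ...
    return 0

  if __name__ == "__main__":
    sys.exit(main())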
Some scripts will need some additional information to work. These will have
per-script variables, such as for example:
OLD_INSTANCE_NAME
rename: the name the instance should be renamed from.
EXPORT_DEVICE
export: device to be exported, a snapshot of the actual device. The
data must be exported to stdout.
EXPORT_INDEX
export: sequential number of the instance device targeted.
IMPORT_DEVICE
import: device to send the data to, part of the new instance. The data
must be imported from stdin.
IMPORT_INDEX
import: sequential number of the instance device targeted.
(Rationale for INSTANCE_NAME as an environment variable: the instance
name is always needed and we could pass it on the command line. On the
other hand, though, this would force scripts to both access the
environment and parse the command line, so we'll move it for
uniformity.)
Output/Behaviour
________________
As discussed scripts should only send user-targeted information to
stderr. The create and import scripts are supposed to format/initialise
the given block devices and install the correct instance data. The
export script is supposed to export instance data to stdout in a format
understandable by the import script. The data will be compressed by
Ganeti, so no compression should be done. The rename script should only
modify the instance's knowledge of what its name is.
Other declarative style features
++++++++++++++++++++++++++++++++
Similar to Ganeti 1.2, OS specifications will need to provide a
'ganeti_api_version' file containing a list of numbers matching the
version(s) of the API they implement. Ganeti itself will always be
compatible with one version of the API and may maintain backwards
compatibility if it's feasible to do so. The numbers are one-per-line,
so an OS supporting both version 5 and version 20 will have a file
containing two lines. This is different from Ganeti 1.2, which only
supported one version number.
In addition to that an OS will be able to declare that it does support
only a subset of the Ganeti hypervisors, by declaring them in the
'hypervisors' file.
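A minimal sketch of how this declarative information could be read on the
Ganeti side; the helper below is illustrative only and not the actual
implementation::

  # Illustrative reader for an OS definition's declarative files
  import os

  def ReadOsDeclarations(os_dir):
    # "ganeti_api_version": one supported API version per line, e.g. "5\n20\n"
    with open(os.path.join(os_dir, "ganeti_api_version")) as fd:
      api_versions = [int(l) for l in fd.read().splitlines() if l.strip()]

    # "hypervisors": optional list of supported hypervisors, one per line
    hv_file = os.path.join(os_dir, "hypervisors")
    if os.path.exists(hv_file):
      with open(hv_file) as fd:
        hypervisors = [l.strip() for l in fd if l.strip()]
    else:
      hypervisors = None  # no restriction declared
    return api_versions, hypervisors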
Caveats/Notes
+++++++++++++
We might want to have a "default" import/export behaviour that just
dumps all disks and restores them. This can save work as most systems
will just do this, while allowing flexibility for different systems.
Environment variables are limited in size, but we expect that there will
be enough space to store the information we need. If we discover that
this is not the case we may want to go to a more complex API such as
storing that information on the filesystem and providing the OS script
with the path to a file where they are encoded in some format.
Remote API changes
~~~~~~~~~~~~~~~~~~
The first Ganeti remote API (RAPI) was designed and deployed with the
Ganeti 1.2.5 release. That version provided read-only access to the
cluster state. A fully functional read-write API demands significant
internal changes, which will be implemented in version 2.0.
We decided to implement the Ganeti RAPI in a RESTful way, which is
aligned with the key features we are looking for: it is a simple,
stateless, scalable and extensible paradigm of API implementation. As
transport it uses HTTP over SSL, and we are implementing it with JSON
encoding, but in a way that makes it possible to extend it and provide
any other encoding.
Design
++++++
The Ganeti RAPI is implemented as an independent daemon, running on the
same node and with the same permission level as the Ganeti master
daemon. Communication is done through the LUXI library to the master
daemon. In order to keep communication asynchronous, RAPI processes two
types of client requests:
- queries: server is able to answer immediately
- job submission: some time is required for a useful response
In the query case the requested data is sent back to the client in the
HTTP response body. Typical examples of queries would be: list of nodes,
instances, cluster info, etc.
In the case of job submission, the client receives a job ID, an
identifier which allows one to query the job progress in the job queue
(see `Job Queue`_).
Internally, each exported object has a version identifier, which is
used as a state identifier in the HTTP header E-Tag field for
requests/responses to avoid race conditions.
Resource representation
+++++++++++++++++++++++
The key difference of using REST instead of other API styles is that
REST requires separation of services via resources with unique URIs.
Each of them should have a limited amount of state and support the
standard HTTP methods: GET, POST, DELETE, PUT.
For example, in Ganeti's case we can have a set of URIs:
- ``/{clustername}/instances``
- ``/{clustername}/instances/{instancename}``
- ``/{clustername}/instances/{instancename}/tag``
- ``/{clustername}/tag``
A GET request to ``/{clustername}/instances`` will return the list of
instances, a POST to ``/{clustername}/instances`` should create a new
instance, a DELETE ``/{clustername}/instances/{instancename}`` should
delete the instance, and a GET ``/{clustername}/tag`` should return the
cluster tags.
Each resource URI will have a version prefix. The resource IDs are to
be determined.
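Purely for illustration, a read-only query against such a resource could be
issued from the Python standard library as sketched below; the port and the
exact URI layout used here are assumptions, not part of the design::

  # Hypothetical RAPI client sketch (standard library only)
  import base64
  import json
  try:
    from urllib.request import Request, urlopen   # Python 3
  except ImportError:
    from urllib2 import Request, urlopen          # Python 2

  def RapiGet(host, path, user, password, port=5080):
    req = Request("https://%s:%s%s" % (host, port, path))
    token = base64.b64encode(("%s:%s" % (user, password)).encode("ascii"))
    req.add_header("Authorization", "Basic " + token.decode("ascii"))
    req.add_header("Accept", "application/json")
    return json.loads(urlopen(req).read().decode("utf-8"))

  # e.g. listing instances; a POST to the same resource would create one
  # instances = RapiGet("cluster.example.com", "/instances", "user", "secret")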
Internal encoding might be JSON, XML, or any other. The JSON encoding
fits nicely in Ganeti RAPI needs. The client can request a specific
representation via the Accept field in the HTTP header.
REST uses HTTP as its transport and application protocol for resource
access. The set of possible responses is a subset of standard HTTP
responses.
The statelessness model provides additional reliability and
transparency to operations (e.g. only one request needs to be analyzed
to understand the in-progress operation, not a sequence of multiple
requests/responses).
Security
++++++++
With the write functionality, security becomes a much bigger issue.
The Ganeti RAPI uses basic HTTP authentication on top of an
SSL-secured connection to grant access to an exported resource. The
password is stored locally in an Apache-style ``.htpasswd`` file. Only
one level of privileges is supported.
Caveats
+++++++
The model detailed above for job submission requires the client to
poll periodically for updates to the job; an alternative would be to
allow the client to request a callback, or a 'wait for updates' call.
The callback model was not considered due to the following two issues:
- callbacks would require a new model of allowed callback URLs,
together with a method of managing these
- callbacks only work when the client and the master are in the same
security domain, and they fail in the other cases (e.g. when there is
a firewall between the client and the RAPI daemon that only allows
client-to-RAPI calls, which is usual in DMZ cases)
The 'wait for updates' method is not suited to the HTTP protocol,
where requests are supposed to be short-lived.
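For completeness, the polling model could look roughly like the loop below
on the client side; the job resource path and the status values are
assumptions used only for illustration::

  # Illustrative client-side polling loop for the job-submission model;
  # "get_resource" stands for any function performing a GET on the API.
  import time

  def WaitForJob(get_resource, job_id, poll_interval=5.0):
    while True:
      job = get_resource("/jobs/%s" % job_id)      # hypothetical resource
      if job.get("status") in ("success", "error", "canceled"):
        return job
      time.sleep(poll_interval)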
Command line changes
~~~~~~~~~~~~~~~~~~~~
Ganeti 2.0 introduces several new features as well as new ways to
handle instance resources like disks or network interfaces. This
requires some noticeable changes in the way command line arguments are
handled.
- extend and modify command line syntax to support new features
- ensure consistent patterns in command line arguments to reduce
cognitive load
The design changes that require these changes are, in no particular
order:
- flexible instance disk handling: support a variable number of disks
with varying properties per instance,
- flexible instance network interface handling: support a variable
number of network interfaces with varying properties per instance
- multiple hypervisors: multiple hypervisors can be active on the same
cluster, each supporting different parameters,
- support for device type CDROM (via ISO image)
As such, there are several areas of Ganeti where the command line
arguments will change:
- Cluster configuration
- cluster initialization
- cluster default configuration
- Instance configuration
- handling of network cards for instances,
- handling of disks for instances,
- handling of CDROM devices and
- handling of hypervisor specific options.
Notes about device removal/addition
+++++++++++++++++++++++++++++++++++
To avoid problems with device location changes (e.g. second network
interface of the instance becoming the first or third and the like)
the list of network/disk devices is treated as a stack, i.e. devices
can only be added/removed at the end of the list of devices of each
class (disk or network) for each instance.
gnt-instance commands
+++++++++++++++++++++
The commands for gnt-instance will be modified and extended to allow
for the new functionality:
- the add command will be extended to support the new device and
hypervisor options,
- the modify command continues to handle all modifications to
instances, but will be extended with new arguments for handling
devices.
Network Device Options
++++++++++++++++++++++
The generic format of the network device option is:
--net $DEVNUM[:$OPTION=$VALUE][,$OPTION=$VALUE]
:$DEVNUM: device number, unsigned integer, starting at 0,
:$OPTION: device option, string,
:$VALUE: device option value, string.
Currently, the following device options will be defined (open to
further changes):
:mac: MAC address of the network interface, accepts either a valid
MAC address or the string 'auto'. If 'auto' is specified, a new MAC
address will be generated randomly. If the mac device option is not
specified, the default value 'auto' is assumed.
:bridge: network bridge the network interface is connected
to. Accepts either a valid bridge name (the specified bridge must
exist on the node(s)) as string or the string 'auto'. If 'auto' is
specified, the default bridge is used. If the bridge option is not
specified, the default value 'auto' is assumed.
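A small parsing sketch for this option format (illustrative only, not the
actual gnt-* argument handling; the same structure applies to the disk
device option described next)::

  # Parses "--net" style values such as "0:mac=auto,bridge=br0"; the magic
  # strings "add"/"remove" used by "gnt-instance modify" come through
  # unchanged as the device number.
  def ParseNetOption(value):
    if ":" in value:
      devnum, rest = value.split(":", 1)
      options = dict(item.split("=", 1) for item in rest.split(",") if item)
    else:
      devnum, options = value, {}
    # Unspecified options default to 'auto', as described above
    options.setdefault("mac", "auto")
    options.setdefault("bridge", "auto")
    return devnum, options

  # ParseNetOption("0:mac=auto,bridge=br0")
  #   -> ("0", {"mac": "auto", "bridge": "br0"})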
Disk Device Options
+++++++++++++++++++
The generic format of the disk device option is:
--disk $DEVNUM[:$OPTION=$VALUE][,$OPTION=$VALUE]
:$DEVNUM: device number, unsigned integer, starting at 0,
:$OPTION: device option, string,
:$VALUE: device option value, string.
Currently, the following device options will be defined (open to
further changes):
:size: size of the disk device, either a positive number, specifying
the disk size in mebibytes, or a number followed by a magnitude suffix
(M for mebibytes, G for gibibytes). Also accepts the string 'auto' in
which case the default disk size will be used. If the size option is
not specified, 'auto' is assumed. This option is not valid for all
disk layout types.
:access: access mode of the disk device, a single letter, valid values
are:
- *w*: read/write access to the disk device or
- *r*: read-only access to the disk device.
If the access mode is not specified, the default mode of read/write
access will be configured.
:path: path to the image file for the disk device, string. No default
exists. This option is not valid for all disk layout types.
Adding devices
++++++++++++++
To add devices to an already existing instance, use the device type
specific option to gnt-instance modify. Currently, there are two
device type specific options supported:
:--net: for network interface cards
:--disk: for disk devices
The syntax to the device specific options is similar to the generic
device options, but instead of specifying a device number like for
gnt-instance add, you specify the magic string add. The new device
will always be appended at the end of the list of devices of this type
for the specified instance, e.g. if the instance has disk devices 0,1
and 2, the newly added disk device will be disk device 3.
Example: gnt-instance modify --net add:mac=auto test-instance
Removing devices
++++++++++++++++
Removing devices from an instance is done via gnt-instance
modify. The same device specific options as for adding devices are
used. Instead of a device number and further device options, only the
magic string remove is specified. It will always remove the last
device in the list of devices of this type for the instance specified,
e.g. if the instance has disk devices 0, 1, 2 and 3, the disk device
number 3 will be removed.
Example: gnt-instance modify --net remove test-instance
Modifying devices
+++++++++++++++++
Modifying devices is also done with device type specific options to
the gnt-instance modify command. There are currently two device type
options supported:
:--net: for network interface cards
:--disk: for disk devices
The syntax to the device specific options is similar to the generic
device options. The device number you specify identifies the device to
be modified.
Example::
gnt-instance modify --disk 2:access=r
Hypervisor Options
++++++++++++++++++
Ganeti 2.0 will support more than one hypervisor. Different
hypervisors have various options that only apply to a specific
hypervisor. Those hypervisor specific options are treated specially
via the ``--hypervisor`` option. The generic syntax of the hypervisor
option is as follows::
--hypervisor $HYPERVISOR:$OPTION=$VALUE[,$OPTION=$VALUE]
:$HYPERVISOR: symbolic name of the hypervisor to use, string,
has to match the supported hypervisors. Example: xen-pvm
:$OPTION: hypervisor option name, string
:$VALUE: hypervisor option value, string
The hypervisor option for an instance can be set at instance creation
time via the ``gnt-instance add`` command. If the hypervisor for an
instance is not specified upon instance creation, the default
hypervisor will be used.
Modifying hypervisor parameters
+++++++++++++++++++++++++++++++
The hypervisor parameters of an existing instance can be modified
using ``--hypervisor`` option of the ``gnt-instance modify``
command. However, the hypervisor type of an existing instance can not
be changed, only the particular hypervisor specific option can be
changed. Therefore, the format of the option parameters has been
simplified to omit the hypervisor name and only contain the comma
separated list of option-value pairs.
Example::
gnt-instance modify --hypervisor cdrom=/srv/boot.iso,boot_order=cdrom:network test-instance
gnt-cluster commands
++++++++++++++++++++
The command for gnt-cluster will be extended to allow setting and
changing the default parameters of the cluster:
- The init command will be extended to support the defaults option to
set the cluster defaults upon cluster initialization.
- The modify command will be added to modify the cluster
parameters. It will support the --defaults option to change the
cluster defaults.
Cluster defaults
++++++++++++++++
The generic format of the cluster default setting option is:
--defaults $OPTION=$VALUE[,$OPTION=$VALUE]
:$OPTION: cluster default option, string,
:$VALUE: cluster default option value, string.
Currently, the following cluster default options are defined (open to
further changes):
:hypervisor: the default hypervisor to use for new instances,
string. Must be a valid hypervisor known to and supported by the
cluster.
:disksize: the disksize for newly created instance disks, where
applicable. Must be either a positive number, in which case the unit
of megabyte is assumed, or a positive number followed by a supported
magnitude symbol (M for megabyte or G for gigabyte).
:bridge: the default network bridge to use for newly created instance
network interfaces, string. Must be a valid bridge name of a bridge
existing on the node(s).
Hypervisor cluster defaults
+++++++++++++++++++++++++++
The generic format of the hypervisor cluster wide default setting
option is::
--hypervisor-defaults $HYPERVISOR:$OPTION=$VALUE[,$OPTION=$VALUE]
:$HYPERVISOR: symbolic name of the hypervisor whose defaults you want
to set, string
:$OPTION: cluster default option, string,
:$VALUE: cluster default option value, string.
.. vim: set textwidth=72 :
=================
Ganeti 2.1 design
=================
This document describes the major changes in Ganeti 2.1 compared to
the 2.0 version.
The 2.1 version will be a relatively small release. Its main aim is to
avoid changing too much of the core code, while addressing issues and
adding new features and improvements over 2.0, in a timely fashion.
.. contents:: :depth: 4
Objective
=========
Ganeti 2.1 will add features to help further automatization of cluster
operations, further improve scalability to even bigger clusters, and
make it easier to debug the Ganeti core.
Detailed design
===============
As for 2.0 we divide the 2.1 design into three areas:
- core changes, which affect the master daemon/job queue/locking or
all/most logical units
- logical unit/feature changes
- external interface changes (e.g. command line, OS API, hooks, ...)
Core changes
------------
Storage units modelling
~~~~~~~~~~~~~~~~~~~~~~~
Currently, Ganeti has a good model of the block devices for instances
(e.g. LVM logical volumes, files, DRBD devices, etc.) but none of the
storage pools that are providing the space for these front-end
devices. For example, there are hardcoded inter-node RPC calls for
volume group listing, file storage creation/deletion, etc.
The storage units framework will implement a generic handling for all
kinds of storage backends:
- LVM physical volumes
- LVM volume groups
- File-based storage directories
- any other future storage method
There will be a generic list of methods that each storage unit type
will provide, like:
- list of storage units of this type
- check status of the storage unit
Additionally, there will be specific methods for each method, for
example:
- enable/disable allocations on a specific PV
- file storage directory creation/deletion
- VG consistency fixing
This will allow a much better modeling and unification of the various
RPC calls related to backend storage pool in the future. Ganeti 2.1 is
intended to add the basics of the framework, and not necessarily move
all the current VG/FileBased operations to it.
Note that while we model both LVM PVs and LVM VGs, the framework will
**not** model any relationship between the different types. In other
words, we model neither inheritance nor stacking, since this is
too complex for our needs. While a ``vgreduce`` operation on a LVM VG
could actually remove a PV from it, this will not be handled at the
framework level, but at individual operation level. The goal is that
this is a lightweight framework, for abstracting the different storage
operation, and not for modelling the storage hierarchy.
Locking improvements
~~~~~~~~~~~~~~~~~~~~
Current State and shortcomings
++++++++++++++++++++++++++++++
The class ``LockSet`` (see ``lib/locking.py``) is a container for one or
many ``SharedLock`` instances. It provides an interface to add/remove
locks and to acquire and subsequently release any number of those locks
contained in it.
Locks in a ``LockSet`` are always acquired in alphabetic order. Due to
the way we're using locks for nodes and instances (the single cluster
lock isn't affected by this issue) this can lead to long delays when
acquiring locks if another operation tries to acquire multiple locks but
has to wait for yet another operation.
In the following demonstration we assume to have the instance locks
``inst1``, ``inst2``, ``inst3`` and ``inst4``.
#. Operation A grabs lock for instance ``inst4``.
#. Operation B wants to acquire all instance locks in alphabetic order,
but it has to wait for ``inst4``.
#. Operation C tries to lock ``inst1``, but it has to wait until
Operation B (which is trying to acquire all locks) releases the lock
again.
#. Operation A finishes and releases lock on ``inst4``. Operation B can
continue and eventually releases all locks.
#. Operation C can get ``inst1`` lock and finishes.
Technically there's no need for Operation C to wait for Operation A, and
subsequently Operation B, to finish. Operation B can't continue until
Operation A is done (it has to wait for ``inst4``), anyway.
Proposed changes
++++++++++++++++
Non-blocking lock acquiring
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Acquiring locks for OpCode execution is always done in blocking mode.
They won't return until the lock has successfully been acquired (or an
error occurred, although we won't cover that case here).
``SharedLock`` and ``LockSet`` must be able to be acquired in a
non-blocking way. They must support a timeout and abort trying to
acquire the lock(s) after the specified amount of time.
Retry acquiring locks
^^^^^^^^^^^^^^^^^^^^^
To prevent other operations from waiting for a long time, such as
described in the demonstration before, ``LockSet`` must not keep locks
for a prolonged period of time when trying to acquire two or more locks.
Instead it should, with an increasing timeout for acquiring all locks,
release all locks again and sleep some time if it fails to acquire all
requested locks.
A good timeout value needs to be determined. In any case should
``LockSet`` proceed to acquire locks in blocking mode after a few
(unsuccessful) attempts to acquire all requested locks.
One proposal for the timeout is to use ``2**tries`` seconds, where
``tries`` is the number of unsuccessful tries.
In the demonstration before this would allow Operation C to continue
after Operation B unsuccessfully tried to acquire all locks and released
all acquired locks (``inst1``, ``inst2`` and ``inst3``) again.
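A condensed sketch of this retry behaviour, assuming a ``LockSet``-like
object whose ``acquire`` takes a timeout, returns ``False`` on expiry and
releases any partially acquired locks itself::

  # Sketch only; the real logic would live inside LockSet itself.
  def AcquireWithBackoff(lockset, names, max_tries=5):
    for tries in range(max_tries):
      # Timeout grows as 2**tries seconds, as proposed above
      if lockset.acquire(names, timeout=2 ** tries):
        return True
    # After a few unsuccessful attempts, fall back to blocking mode
    return lockset.acquire(names)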
Other solutions discussed
+++++++++++++++++++++++++
There was also some discussion on going one step further and extend the
job queue (see ``lib/jqueue.py``) to select the next task for a worker
depending on whether it can acquire the necessary locks. While this may
reduce the number of necessary worker threads and/or increase throughput
on large clusters with many jobs, it also brings many potential
problems, such as contention and increased memory usage, with it. As
this would be an extension of the changes proposed before it could be
implemented at a later point in time, but we decided to stay with the
simpler solution for now.
Implementation details
++++++++++++++++++++++
``SharedLock`` redesign
^^^^^^^^^^^^^^^^^^^^^^^
The current design of ``SharedLock`` is not good for supporting timeouts
when acquiring a lock and there are also minor fairness issues in it. We
plan to address both with a redesign. A proof of concept implementation
was written and resulted in significantly simpler code.
Currently ``SharedLock`` uses two separate queues for shared and
exclusive acquires and waiters get to run in turns. This means if an
exclusive acquire is released, the lock will allow shared waiters to run
and vice versa. Although it's still fair in the end there is a slight
bias towards shared waiters in the current implementation. The same
implementation with two shared queues can not support timeouts without
adding a lot of complexity.
Our proposed redesign changes ``SharedLock`` to have only one single
queue. There will be one condition (see Condition_ for a note about
performance) in the queue per exclusive acquire and two for all shared
acquires (see below for an explanation). The maximum queue length will
always be ``2 + (number of exclusive acquires waiting)``. The number of
queue entries for shared acquires can vary from 0 to 2.
The two conditions for shared acquires are a bit special. They will be
used in turn. When the lock is instantiated, no conditions are in the
queue. As soon as the first shared acquire arrives (and there are
holder(s) or waiting acquires; see Acquire_), the active condition is
added to the queue. Until it becomes the topmost condition in the queue
and has been notified, any shared acquire is added to this active
condition. When the active condition is notified, the conditions are
swapped and further shared acquires are added to the previously inactive
condition (which has now become the active condition). After all waiters
on the previously active (now inactive) and now notified condition
received the notification, it is removed from the queue of pending
acquires.
This means shared acquires will skip any exclusive acquire in the queue.
We believe it's better to improve parallelization on operations only
asking for shared (or read-only) locks. Exclusive operations holding the
same lock can not be parallelized.
Acquire
*******
For exclusive acquires a new condition is created and appended to the
queue. Shared acquires are added to the active condition for shared
acquires and if the condition is not yet on the queue, it's appended.
The next step is to wait for our condition to be on the top of the queue
(to guarantee fairness). If the timeout expired, we return to the caller
without acquiring the lock. On every notification we check whether the
lock has been deleted, in which case an error is returned to the caller.
The lock can be acquired if we're on top of the queue (there is no one
else ahead of us). For an exclusive acquire, there must not be other
exclusive or shared holders. For a shared acquire, there must not be an
exclusive holder. If these conditions are all true, the lock is
acquired and we return to the caller. In any other case we wait again on
the condition.
If it was the last waiter on a condition, the condition is removed from
the queue.
Optimization: There's no need to touch the queue if there are no pending
acquires and no current holders. The caller can have the lock
immediately.
.. digraph:: "design-2.1-lock-acquire"
graph[fontsize=8, fontname="Helvetica"]
node[fontsize=8, fontname="Helvetica", width="0", height="0"]
edge[fontsize=8, fontname="Helvetica"]
/* Actions */
abort[label="Abort\n(couldn't acquire)"]
acquire[label="Acquire lock"]
add_to_queue[label="Add condition to queue"]
wait[label="Wait for notification"]
remove_from_queue[label="Remove from queue"]
/* Conditions */
alone[label="Empty queue\nand can acquire?", shape=diamond]
have_timeout[label="Do I have\ntimeout?", shape=diamond]
top_of_queue_and_can_acquire[
label="On top of queue and\ncan acquire lock?",
shape=diamond,
]
/* Lines */
alone->acquire[label="Yes"]
alone->add_to_queue[label="No"]
have_timeout->abort[label="Yes"]
have_timeout->wait[label="No"]
top_of_queue_and_can_acquire->acquire[label="Yes"]
top_of_queue_and_can_acquire->have_timeout[label="No"]
add_to_queue->wait
wait->top_of_queue_and_can_acquire
acquire->remove_from_queue
Release
*******
First the lock removes the caller from the internal owner list. If there
are pending acquires in the queue, the first (the oldest) condition is
notified.
If the first condition was the active condition for shared acquires, the
inactive condition will be made active. This ensures fairness with
exclusive locks by forcing consecutive shared acquires to wait in the
queue.
.. digraph:: "design-2.1-lock-release"
graph[fontsize=8, fontname="Helvetica"]
node[fontsize=8, fontname="Helvetica", width="0", height="0"]
edge[fontsize=8, fontname="Helvetica"]
/* Actions */
remove_from_owners[label="Remove from owner list"]
notify[label="Notify topmost"]
swap_shared[label="Swap shared conditions"]
success[label="Success"]
/* Conditions */
have_pending[label="Any pending\nacquires?", shape=diamond]
was_active_queue[
label="Was active condition\nfor shared acquires?",
shape=diamond,
]
/* Lines */
remove_from_owners->have_pending
have_pending->notify[label="Yes"]
have_pending->success[label="No"]
notify->was_active_queue
was_active_queue->swap_shared[label="Yes"]
was_active_queue->success[label="No"]
swap_shared->success
Delete
******
The caller must either hold the lock in exclusive mode already or the
lock must be acquired in exclusive mode. Trying to delete a lock while
it's held in shared mode must fail.
After ensuring the lock is held in exclusive mode, the lock will mark
itself as deleted and continue to notify all pending acquires. They will
wake up, notice the deleted lock and return an error to the caller.
Condition
^^^^^^^^^
Note: This is not necessary for the locking changes above, but it may be
a good optimization (pending performance tests).
The existing locking code in Ganeti 2.0 uses Python's built-in
``threading.Condition`` class. Unfortunately ``Condition`` implements
timeouts by sleeping 1ms to 20ms between tries to acquire the condition
lock in non-blocking mode. This requires unnecessary context switches
and contention on the CPython GIL (Global Interpreter Lock).
By using POSIX pipes (see ``pipe(2)``) we can use the operating system's
support for timeouts on file descriptors (see ``select(2)``). A custom
condition class will have to be written for this.
On instantiation the class creates a pipe. After each notification the
previous pipe is abandoned and re-created (technically the old pipe
needs to stay around until all notifications have been delivered).
All waiting clients of the condition use ``select(2)`` or ``poll(2)`` to
wait for notifications, optionally with a timeout. A notification will
be signalled to the waiting clients by closing the pipe. If the pipe
wasn't closed during the timeout, the waiting function returns to its
caller nonetheless.
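A condensed sketch of the pipe-based notification primitive described
above; the real class additionally has to cooperate with the lock
protecting the shared state and re-create the pipe after each
notification::

  import os
  import select

  class PipeNotifier(object):
    def __init__(self):
      self._read_fd, self._write_fd = os.pipe()

    def wait(self, timeout=None):
      """Waits until notified or until the timeout (in seconds) expires."""
      # select() returns an empty list on timeout; closing the write end
      # makes the read end readable (EOF), i.e. "notified"
      readable, _, _ = select.select([self._read_fd], [], [], timeout)
      return bool(readable)

    def notifyAll(self):
      """Wakes up all waiters by closing the write side of the pipe."""
      os.close(self._write_fd)
      # A fresh pipe would be created here for subsequent waiters; the old
      # read end must stay open until every waiter saw the notification.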
Node daemon availability
~~~~~~~~~~~~~~~~~~~~~~~~
Current State and shortcomings
++++++++++++++++++++++++++++++
Currently, when a Ganeti node suffers serious system disk damage, the
migration/failover of an instance may not correctly shut down the virtual
machine on the broken node, causing instance duplication. The ``gnt-node
powercycle`` command can be used to force a node reboot and thus to
avoid duplicated instances. This command relies on node daemon
availability, though, and thus can fail if the node daemon has some
pages swapped out of RAM, for example.
Proposed changes
++++++++++++++++
The proposed solution forces node daemon to run exclusively in RAM. It
uses Python ctypes to call ``mlockall(MCL_CURRENT | MCL_FUTURE)`` on
the node daemon process and all its children. In addition another log
handler has been implemented for node daemon to redirect to
``/dev/console`` messages that cannot be written on the logfile.
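A minimal ctypes sketch of the call in question (not the actual node daemon
code; the constant values are the ones used on Linux)::

  import ctypes
  import ctypes.util
  import os

  MCL_CURRENT = 1   # lock all pages currently mapped
  MCL_FUTURE = 2    # and all pages mapped in the future

  def LockCurrentProcessMemory():
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
      err = ctypes.get_errno()
      raise OSError(err, "mlockall failed: %s" % os.strerror(err))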
With these changes node daemon can successfully run basic tasks such as
a powercycle request even when the system disk is heavily damaged and
reading/writing to disk fails constantly.
New Features
------------
Automated Ganeti Cluster Merger
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Current situation
+++++++++++++++++
Currently there's no easy way to merge two or more clusters together.
But in order to optimize resources this is a needed missing piece. The
goal of this design doc is to come up with an easy-to-use solution which
allows you to merge two or more clusters together.
Initial contact
+++++++++++++++
As the design of Ganeti is based on an autonomous system, Ganeti by
itself has no way to reach nodes outside of its cluster. To overcome
this situation we're required to prepare the cluster before we can go
ahead with the actual merge: We've to replace at least the ssh keys on
the affected nodes before we can do any operation within ``gnt-``
commands.
To make this an automated process we'll ask the user to provide us with
the root password of every cluster we've to merge. We use the password
to grab the current ``id_dsa`` key and then rely on that ssh key for any
further communication to be made until the cluster is fully merged.
Cluster merge
+++++++++++++
After initial contact we do the cluster merge:
1. Grab the list of nodes
2. On all nodes add our own ``id_dsa.pub`` key to ``authorized_keys``
3. Stop all instances running on the merging cluster
4. Disable ``ganeti-watcher`` as it tries to restart Ganeti daemons
5. Stop all Ganeti daemons on all merging nodes
6. Grab the ``config.data`` from the master of the merging cluster
7. Stop local ``ganeti-masterd``
8. Merge the config:
1. Open our own cluster ``config.data``
2. Open cluster ``config.data`` of the merging cluster
3. Grab all nodes of the merging cluster
4. Set ``master_candidate`` to false on all merging nodes
5. Add the nodes to our own cluster ``config.data``
6. Grab all the instances on the merging cluster
7. Adjust the port if the instance has drbd layout:
1. In ``logical_id`` (index 2)
2. In ``physical_id`` (index 1 and 3)
8. Add the instances to our own cluster ``config.data``
9. Start ``ganeti-masterd`` with ``--no-voting`` ``--yes-do-it``
10. ``gnt-node add --readd`` on all merging nodes
11. ``gnt-cluster redist-conf``
12. Restart ``ganeti-masterd`` normally
13. Enable ``ganeti-watcher`` again
14. Start all merging instances again
Rollback
++++++++
Until we actually (re)add any nodes we can abort and rollback the merge
at any point. After merging the config, though, we've to get the backup
copy of ``config.data`` (from another master candidate node). And for
security reasons it's a good idea to undo ``id_dsa.pub`` distribution by
going on every affected node and remove the ``id_dsa.pub`` key again.
Also we've to keep in mind that we've to start the Ganeti daemons and
start up the instances again.
Verification
++++++++++++
Last but not least we should verify that the merge was successful.
Therefore we run ``gnt-cluster verify``, which ensures that the cluster
overall is in a healthy state. Additionally it's also possible to compare
the list of instances/nodes with a list made prior to the upgrade to
make sure we didn't lose any data/instance/node.
Appendix
++++++++
cluster-merge.py
^^^^^^^^^^^^^^^^
Used to merge the cluster config. This is a POC and might differ from
actual production code.
::
  #!/usr/bin/python

  import sys
  from ganeti import config
  from ganeti import constants

  c_mine = config.ConfigWriter(offline=True)
  c_other = config.ConfigWriter(sys.argv[1])

  # Import the merging cluster's nodes, demoting them from master candidates
  fake_id = 0
  for node in c_other.GetNodeList():
    node_info = c_other.GetNodeInfo(node)
    node_info.master_candidate = False
    c_mine.AddNode(node_info, str(fake_id))
    fake_id += 1

  # Import the instances, re-allocating the DRBD ports in our own cluster
  for instance in c_other.GetInstanceList():
    instance_info = c_other.GetInstanceInfo(instance)
    for dsk in instance_info.disks:
      if dsk.dev_type in constants.LDS_DRBD:
        port = c_mine.AllocatePort()
        logical_id = list(dsk.logical_id)
        logical_id[2] = port
        dsk.logical_id = tuple(logical_id)
        physical_id = list(dsk.physical_id)
        physical_id[1] = physical_id[3] = port
        dsk.physical_id = tuple(physical_id)
    c_mine.AddInstance(instance_info, str(fake_id))
    fake_id += 1
Feature changes
---------------
Ganeti Confd
~~~~~~~~~~~~
Current State and shortcomings
++++++++++++++++++++++++++++++
In Ganeti 2.0 all nodes are equal, but some are more equal than others.
In particular they are divided between "master", "master candidates" and
"normal". (Moreover they can be offline or drained, but this is not
important for the current discussion). In general the whole
configuration is only replicated to master candidates, and some partial
information is spread to all nodes via ssconf.
This change was done so that the most frequent Ganeti operations didn't
need to contact all nodes, and so clusters could become bigger. If we
want more information to be available on all nodes, we need to add more
ssconf values, which is counter-balancing the change, or to talk with
the master node, which is not designed to happen now, and requires its
availability.
Information such as the instance->primary_node mapping will be needed on
all nodes, and we also want to make sure services external to the
cluster can query this information as well. This information must be
available at all times, so we can't query it through RAPI, which would
be a single point of failure, as it's only available on the master.
Proposed changes
++++++++++++++++
In order to allow fast and highly available access read-only to some
configuration values, we'll create a new ganeti-confd daemon, which will
run on master candidates. This daemon will talk via UDP, and
authenticate messages using HMAC with a cluster-wide shared key. This
key will be generated at cluster init time, and stored on the clusters
alongside the ganeti SSL keys, and readable only by root.
An interested client can query a value by making a request to a subset
of the cluster master candidates. It will then wait to get a few
responses, and use the one with the highest configuration serial number.
Since the configuration serial number is increased each time the ganeti
config is updated, and the serial number is included in all answers,
this can be used to make sure to use the most recent answer, in case
some master candidates are stale or in the middle of a configuration
update.
In order to prevent replay attacks queries will contain the current unix
timestamp according to the client, and the server will verify that its
timestamp is in the same 5 minutes range (this requires synchronized
clocks, which is a good idea anyway). Queries will also contain a "salt"
which they expect the answers to be sent with, and clients are supposed
to accept only answers which contain salt generated by them.
The configuration daemon will be able to answer simple queries such as:
- master candidates list
- master node
- offline nodes
- instance list
- instance primary nodes
Wire protocol
^^^^^^^^^^^^^
A confd query will look like this, on the wire::
  plj0{
    "msg": "{\"type\": 1,
             \"rsalt\": \"9aa6ce92-8336-11de-af38-001d093e835f\",
             \"protocol\": 1,
             \"query\": \"node1.example.com\"}\n",
    "salt": "1249637704",
    "hmac": "4a4139b2c3c5921f7e439469a0a45ad200aead0f"
  }
``plj0`` is a fourcc that details the message content. It stands for plain
json 0, and can be changed as we move on to different type of protocols
(for example protocol buffers, or encrypted json). What follows is a
json encoded string, with the following fields:
- ``msg`` contains a JSON-encoded query, its fields are:
- ``protocol``, integer, is the confd protocol version (initially
just ``constants.CONFD_PROTOCOL_VERSION``, with a value of 1)
- ``type``, integer, is the query type. For example "node role by
name" or "node primary ip by instance ip". Constants will be
provided for the actual available query types
- ``query`` is a multi-type field (depending on the ``type`` field):
- it can be missing, when the request is fully determined by the
``type`` field
- it can contain a string which denotes the search key: for
example an IP, or a node name
- it can contain a dictionary, in which case the actual details
vary further per request type
- ``rsalt``, string, is the required response salt; the client must
use it to recognize which answer it's getting.
- ``salt`` must be the current unix timestamp, according to the
client; servers should refuse messages which have a wrong timing,
according to their configuration and clock
- ``hmac`` is an hmac signature of salt+msg, with the cluster hmac key
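A client-side sketch of building such a query; the SHA-1 digest is only
inferred from the length of the example signature above, and the helper
itself is illustrative::

  import hmac
  import json
  import time
  import uuid
  from hashlib import sha1

  def BuildConfdQuery(hmac_key, query_type, query):
    """Builds a confd request following the wire format above.

    hmac_key must be the cluster-wide shared key (as bytes).
    """
    msg = json.dumps({
      "protocol": 1,                    # CONFD_PROTOCOL_VERSION
      "type": query_type,
      "rsalt": str(uuid.uuid4()),       # salt we expect in the answer
      "query": query,
    })
    salt = str(int(time.time()))        # replay protection
    sig = hmac.new(hmac_key, (salt + msg).encode("utf-8"), sha1).hexdigest()
    return "plj0" + json.dumps({"msg": msg, "salt": salt, "hmac": sig})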
If an answer comes back (which is optional, since confd works over UDP)
it will be in this format::
  plj0{
    "msg": "{\"status\": 0,
             \"answer\": 0,
             \"serial\": 42,
             \"protocol\": 1}\n",
    "salt": "9aa6ce92-8336-11de-af38-001d093e835f",
    "hmac": "aaeccc0dff9328fdf7967cb600b6a80a6a9332af"
  }
Where:
- ``plj0`` the message type magic fourcc, as discussed above
- ``msg`` contains a JSON-encoded answer, its fields are:
- ``protocol``, integer, is the confd protocol version (initially
just constants.CONFD_PROTOCOL_VERSION, with a value of 1)
- ``status``, integer, is the error code; initially just ``0`` for
'ok' or ``1`` for 'error' (in which case answer contains an error
detail, rather than an answer), but in the future it may be
expanded to have more meanings (e.g. ``2`` if the answer is
compressed)
- ``answer``, is the actual answer; its type and meaning is query
specific: for example for "node primary ip by instance ip" queries
it will be a string containing an IP address, for "node role by
name" queries it will be an integer which encodes the role
(master, candidate, drained, offline) according to constants
- ``salt`` is the requested salt from the query; a client can use it
to recognize what query the answer is answering.
- ``hmac`` is an hmac signature of salt+msg, with the cluster hmac key
Redistribute Config
~~~~~~~~~~~~~~~~~~~
Current State and shortcomings
++++++++++++++++++++++++++++++
Currently LUClusterRedistConf triggers a copy of the updated
configuration file to all master candidates and of the ssconf files to
all nodes. There are other files which are maintained manually but which
are important to keep in sync. These are:
- rapi SSL key certificate file (rapi.pem) (on master candidates)
- rapi user/password file rapi_users (on master candidates)
Furthermore there are some files which are hypervisor specific but we
may want to keep in sync:
- the xen-hvm hypervisor uses one shared file for all vnc passwords, and
copies the file once, during node add. This design is subject to
revision to be able to have different passwords for different groups
of instances via the use of hypervisor parameters, and to allow
xen-hvm and kvm to use an equal system to provide password-protected
vnc sessions. In general, though, it would be useful if the vnc
password files were copied as well, to avoid unwanted vnc password
changes on instance failover/migrate.
Optionally the admin may want to also ship files such as the global
xend.conf file, and the network scripts to all nodes.
Proposed changes
++++++++++++++++
RedistributeConfig will be changed to copy also the rapi files, and to
call every enabled hypervisor asking for a list of additional files to
copy. Users will have the possibility to populate a file containing a
list of files to be distributed; this file will be propagated as well.
Such a solution is really simple to implement and it's easily usable by
scripts.
This code will be also shared (via tasklets or by other means, if
tasklets are not ready for 2.1) with the AddNode and SetNodeParams LUs
(so that the relevant files will be automatically shipped to new master
candidates as they are set).
VNC Console Password
~~~~~~~~~~~~~~~~~~~~
Current State and shortcomings
++++++++++++++++++++++++++++++
Currently just the xen-hvm hypervisor supports setting a password to
connect to the instances' VNC console, and has one common password
stored in a file.
This doesn't allow different passwords for different instances/groups of
instances, and makes it necessary to remember to copy the file around
the cluster when the password changes.
Proposed changes
++++++++++++++++
We'll change the VNC password file to a vnc_password_file hypervisor
parameter. This way it can have a cluster default, but also a different
value for each instance. The VNC enabled hypervisors (xen and kvm) will
publish all the password files in use through the cluster so that a
redistribute-config will ship them to all nodes (see the Redistribute
Config proposed changes above).
The current VNC_PASSWORD_FILE constant will be removed, but its value
will be used as the default HV_VNC_PASSWORD_FILE value, thus retaining
backwards compatibility with 2.0.
The code to export the list of VNC password files from the hypervisors
to RedistributeConfig will be shared between the KVM and xen-hvm
hypervisors.
Disk/Net parameters
~~~~~~~~~~~~~~~~~~~
Current State and shortcomings
++++++++++++++++++++++++++++++
Currently disks and network interfaces have a few tweakable options and
all the rest is left to a default we chose. We're finding that we need
more and more to tweak some of these parameters, for example to disable
barriers for DRBD devices, or allow striping for the LVM volumes.
Moreover for many of these parameters it will be nice to have
cluster-wide defaults, and then be able to change them per
disk/interface.
Proposed changes
++++++++++++++++
We will add new cluster level diskparams and netparams, which will
contain all the tweakable parameters. All values which have a sensible
cluster-wide default will go into this new structure while parameters
which have unique values will not.
Example of network parameters:
- mode: bridge/route
- link: for mode "bridge" the bridge to connect to, for mode route it
can contain the routing table, or the destination interface
Example of disk parameters:
- stripe: lvm stripes
- stripe_size: lvm stripe size
- meta_flushes: drbd, enable/disable metadata "barriers"
- data_flushes: drbd, enable/disable data "barriers"
Some parameters are bound to be disk-type specific (drbd, vs lvm, vs
files) or hypervisor specific (nic models for example), but for now they
will all live in the same structure. Each component is supposed to
validate only the parameters it knows about, and ganeti itself will make
sure that no "globally unknown" parameters are added, and that no
parameters have overridden meanings for different components.
The parameters will be kept, as for the BEPARAMS into a "default"
category, which will allow us to expand on by creating instance
"classes" in the future. Instance classes is not a feature we plan
implementing in 2.1, though.
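Purely as an illustration of the intended shape, the cluster-level defaults
could look roughly like this (only the keys come from the examples above,
the values are made up)::

  # Illustrative "default" category of the proposed parameters
  cluster_netparams = {
    "default": {
      "mode": "bridge",       # or "route"
      "link": "xen-br0",      # bridge name, or routing table for "route"
    },
  }

  cluster_diskparams = {
    "default": {
      "stripe": 1,            # lvm stripes
      "stripe_size": 64,      # lvm stripe size
      "meta_flushes": True,   # drbd metadata "barriers"
      "data_flushes": True,   # drbd data "barriers"
    },
  }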
Global hypervisor parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Current State and shortcomings
++++++++++++++++++++++++++++++
Currently all hypervisor parameters are modifiable both globally
(cluster level) and at instance level. However, there is no other
framework to hold hypervisor-specific parameters, so if we want to add
a new class of hypervisor parameters that only makes sense on a global
level, we have to change the hvparams framework.
Proposed changes
++++++++++++++++
We add a new (global, not per-hypervisor) list of parameters which are
not changeable on a per-instance level. The create, modify and query
instance operations are changed to not allow/show these parameters.
Furthermore, to allow transition of parameters to the global list, and
to allow cleanup of inadvertently-customised parameters, the
``UpgradeConfig()`` method of instances will drop any such parameters
from their list of hvparams, such that a restart of the master daemon
is all that is needed for cleaning these up.
Also, the framework is simple enough that if we need to replicate it
at beparams level we can do so easily.
Non bridged instances support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Current State and shortcomings
++++++++++++++++++++++++++++++
Currently each instance NIC must be connected to a bridge, and if the
bridge is not specified the default cluster one is used. This makes it
impossible to use the vif-route xen network scripts, or other
alternative mechanisms that don't need a bridge to work.
Proposed changes
++++++++++++++++
The new "mode" network parameter will distinguish between bridged
interfaces and routed ones.
When mode is "bridge" the "link" parameter will contain the bridge the
instance should be connected to, effectively making things as today. The
value has been migrated from a nic field to a parameter to allow for an
easier manipulation of the cluster default.
When mode is "route" the ip field of the interface will become
mandatory, to allow for a route to be set. In the future we may want
also to accept multiple IPs or IP/mask values for this purpose. We will
evaluate possible meanings of the link parameter to signify a routing
table to be used, which would allow for insulation between instance
groups (as today happens for different bridges).
For now we won't add a parameter to specify which network script gets
called for which instance, so in a mixed cluster the network script must
be able to handle both cases. The default kvm vif script will be changed
to do so. (Xen doesn't have a ganeti provided script, so nothing will be
done for that hypervisor)
Introducing persistent UUIDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Current state and shortcomings
++++++++++++++++++++++++++++++
Some objects in the Ganeti configurations are tracked by their name
while also supporting renames. This creates an extra difficulty,
because neither Ganeti nor external management tools can then track
the actual entity, and due to the name change it behaves like a new
one.
Proposed changes part 1
+++++++++++++++++++++++
We will change Ganeti to use UUIDs for entity tracking, but in a
staggered way. In 2.1, we will simply add an "uuid" attribute to each
of the instances, nodes and cluster itself. This will be reported on
instance creation for nodes, and on node adds for the nodes. It will
be of course available for querying via the OpNodeQuery/Instance and
cluster information, and via RAPI as well.
Note that Ganeti will not provide any way to change this attribute.
Upgrading from Ganeti 2.0 will automatically add an 'uuid' attribute
to all entities missing it.
Proposed changes part 2
+++++++++++++++++++++++
In the next release (e.g. 2.2), the tracking of objects will change
from the name to the UUID internally, and externally Ganeti will
accept both forms of identification; e.g. an RAPI call would be made
either against ``/2/instances/foo.bar`` or against
``/2/instances/bb3b2e42…``. Since an FQDN must have at least a dot,
and dots are not valid characters in UUIDs, we will not have namespace
issues.
Another change here is that node identification (during cluster
operations/queries like master startup, "am I the master?" and
similar) could be done via UUIDs which is more stable than the current
hostname-based scheme.
Internal tracking refers to the way the configuration is stored; a
DRBD disk of an instance refers to the node name (so that IPs can be
changed easily), but this is still a problem for name changes; thus
these will be changed to point to the node UUID to ease renames.
The advantages of this change (after the second round of changes), is
that node rename becomes trivial, whereas today node rename would
require a complete lock of all instances.
Automated disk repairs infrastructure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Replacing defective disks in an automated fashion is quite difficult
with the current version of Ganeti. These changes will introduce
additional functionality and interfaces to simplify automating disk
replacements on a Ganeti node.
Fix node volume group
+++++++++++++++++++++
This is the most difficult addition, as it can lead to dataloss if it's
not properly safeguarded.
The operation must be done only when all the other nodes that have
instances in common with the target node are fine, i.e. this is the only
node with problems, and also we have to double-check that all instances
on this node have at least a good copy of the data.
This might mean that we have to enhance the GetMirrorStatus calls, and
introduce and a smarter version that can tell us more about the status
of an instance.
Stop allocation on a given PV
+++++++++++++++++++++++++++++
This is somewhat simple. First we need a "list PVs" opcode (and its
associated logical unit) and then a set PV status opcode/LU. These in
combination should allow both checking and changing the disk/PV status.
Instance disk status
++++++++++++++++++++
This new opcode or opcode change must list the instance-disk-index and
node combinations of the instance together with their status. This will
allow determining what part of the instance is broken (if any).
Repair instance
+++++++++++++++
This new opcode/LU/RAPI call will run ``replace-disks -p`` as needed, in
order to fix the instance status. It only affects primary instances;
secondaries can just be moved away.
Migrate node
++++++++++++
This new opcode/LU/RAPI call will take over the current ``gnt-node
migrate`` code and run migrate for all instances on the node.
Evacuate node
++++++++++++++
This new opcode/LU/RAPI call will take over the current ``gnt-node
evacuate`` code and run replace-secondary with an iallocator script for
all instances on the node.
User-id pool
~~~~~~~~~~~~
In order to allow running different processes under unique user-ids
on a node, we introduce the user-id pool concept.
The user-id pool is a cluster-wide configuration parameter.
It is a list of user-ids and/or user-id ranges that are reserved
for running Ganeti processes (including KVM instances).
The code guarantees that on a given node a given user-id is only
handed out if there is no other process running with that user-id.
Please note, that this can only be guaranteed if all processes in
the system - that run under a user-id belonging to the pool - are
started by reserving a user-id first. That can be accomplished
either by using the RequestUnusedUid() function to get an unused
user-id or by implementing the same locking mechanism.
Implementation
++++++++++++++
The functions that are specific to the user-id pool feature are located
in a separate module: ``lib/uidpool.py``.
Storage
^^^^^^^
The user-id pool is a single cluster parameter. It is stored in the
*Cluster* object under the ``uid_pool`` name as a list of integer
tuples. These tuples represent the boundaries of user-id ranges.
For single user-ids, the boundaries are equal.
The internal user-id pool representation is converted into a
string: a newline separated list of user-ids or user-id ranges.
This string representation is distributed to all the nodes via the
*ssconf* mechanism. This means that the user-id pool can be
accessed in a read-only way on any node without consulting the master
node or master candidate nodes.
Initial value
^^^^^^^^^^^^^
The value of the user-id pool cluster parameter can be initialized
at cluster initialization time using the
``gnt-cluster init --uid-pool ...``
command.
As there is no sensible default value for the user-id pool parameter,
it is initialized to an empty list if no ``--uid-pool`` option is
supplied at cluster init time.
If the user-id pool is empty, the user-id pool feature is considered
to be disabled.
Manipulation
^^^^^^^^^^^^
The user-id pool cluster parameter can be modified from the
command-line with the following commands:
- ``gnt-cluster modify --uid-pool <list of user-ids or user-id ranges>``
- ``gnt-cluster modify --add-uids <list of user-ids or user-id ranges>``
- ``gnt-cluster modify --remove-uids <list of user-ids or user-id ranges>``
The ``--uid-pool`` option overwrites the current setting with the
supplied ``<list of user-ids or user-id ranges>``, while
``--add-uids``/``--remove-uids`` adds/removes the listed uids
or uid-ranges from the pool.
The ``<list of user-ids or user-id ranges>`` should be a comma-separated
list of user-ids or user-id ranges. A range should be defined by a lower
and a higher boundary. The boundaries should be separated with a dash.
The boundaries are inclusive.
The ``<list of user-ids or user-id ranges>`` is parsed into the internal
representation, sanity-checked and stored in the ``uid_pool``
attribute of the *Cluster* object.
It is also immediately converted into a string (formatted in the
input format) and distributed to all nodes via the *ssconf* mechanism.
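A simplified stand-in for this parsing step (the real code lives in
``lib/uidpool.py``)::

  def ParseUidPoolSketch(value, separator=","):
    """Parses e.g. "1000-1010,1050" into [(1000, 1010), (1050, 1050)]."""
    ranges = []
    for item in value.split(separator):
      item = item.strip()
      if not item:
        continue
      if "-" in item:
        lower, higher = item.split("-", 1)
      else:
        lower = higher = item
      ranges.append((int(lower), int(higher)))
    return ranges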
Inspection
^^^^^^^^^^
The current value of the user-id pool cluster parameter is printed
by the ``gnt-cluster info`` command.
The output format is accepted by the ``gnt-cluster modify --uid-pool``
command.
Locking
^^^^^^^
The ``uidpool.py`` module provides a function (``RequestUnusedUid``)
for requesting an unused user-id from the pool.
This will try to find a random user-id that is not currently in use.
The algorithm is the following:
1) Randomize the list of user-ids in the user-id pool
2) Iterate over this randomized UID list
3) Create a lock file (it doesn't matter if it already exists)
4) Acquire an exclusive POSIX lock on the file, to provide mutual
exclusion for the following non-atomic operations
5) Check if there is a process in the system with the given UID
6) If there isn't, return the UID, otherwise unlock the file and
continue the iteration over the user-ids
The user can than start a new process with this user-id.
Once a process is successfully started, the exclusive POSIX lock can
be released, but the lock file will remain in the filesystem.
The presence of such a lock file means that the given user-id is most
probably in use. The lack of a uid lock file does not guarantee that
there are no processes with that user-id.
After acquiring the exclusive POSIX lock, ``RequestUnusedUid``
always performs a check to see if there is a process running with the
given uid.
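A condensed sketch of one reservation attempt; the real
``RequestUnusedUid`` additionally randomizes the candidate order, keeps the
lock object around for the caller and uses Ganeti's own run directory (the
path below is made up)::

  import fcntl
  import os

  def TryReserveUid(uid, lockdir="/var/run/ganeti/uid-pool"):
    """Returns a locked fd if the uid could be reserved, None otherwise."""
    fd = os.open(os.path.join(lockdir, str(uid)),
                 os.O_RDWR | os.O_CREAT, 0o600)
    try:
      fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError):
      os.close(fd)
      return None                       # somebody else holds the lock

    def _UidOf(pid):
      try:
        return os.stat("/proc/%s" % pid).st_uid
      except OSError:
        return None

    # With the lock held, check whether any process already runs as this uid
    if any(_UidOf(p) == uid for p in os.listdir("/proc") if p.isdigit()):
      os.close(fd)                      # closing also drops the flock
      return None
    return fd                           # keep it locked until process start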
A user-id can be returned to the pool, by calling the
``ReleaseUid`` function. This will remove the corresponding lock file.
Note, that it doesn't check if there is any process still running
with that user-id. The removal of the lock file only means that there
are most probably no processes with the given user-id. This helps
in speeding up the process of finding a user-id that is guaranteed to
be unused.
There is a convenience function, called ``ExecWithUnusedUid`` that
wraps the execution of a function (or any callable) that requires a
unique user-id. ``ExecWithUnusedUid`` takes care of requesting an
unused user-id and unlocking the lock file. It also automatically
returns the user-id to the pool if the callable raises an exception.
Code examples
+++++++++++++
Requesting a user-id from the pool:
::
  from ganeti import ssconf
  from ganeti import uidpool

  # Get list of all user-ids in the uid-pool from ssconf
  ss = ssconf.SimpleStore()
  uid_pool = uidpool.ParseUidPool(ss.GetUidPool(), separator="\n")
  all_uids = set(uidpool.ExpandUidPool(uid_pool))

  uid = uidpool.RequestUnusedUid(all_uids)
  try:
    # Once the process is started, we can release the file lock
    uid.Unlock()
  except ..., err:
    # Return the UID to the pool
    uidpool.ReleaseUid(uid)
Releasing a user-id:
::

  from ganeti import uidpool

  uid = <UID to release>
  uidpool.ReleaseUid(uid)
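The exact signature of ``ExecWithUnusedUid`` is not fixed by this
document; the following sketch only illustrates the behaviour
described above (the helper name and the way the UID is handed to the
callable are assumptions)::

  from ganeti import uidpool

  def run_with_unused_uid(fn, all_uids, *args, **kwargs):
    # Hypothetical wrapper in the spirit of ExecWithUnusedUid
    uid = uidpool.RequestUnusedUid(all_uids)
    try:
      result = fn(uid, *args, **kwargs)
    except Exception:
      # The callable failed: return the UID to the pool and re-raise
      uidpool.ReleaseUid(uid)
      raise
    # Success: the started process owns the UID now, release the file lock
    uid.Unlock()
    return result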
External interface changes
--------------------------
OS API
~~~~~~
The OS API of Ganeti 2.0 has been built with extensibility in mind.
Since we pass everything as environment variables it's a lot easier to
send new information to the OSes without breaking retrocompatibility.
This section of the design outlines the proposed extensions to the API
and their implementation.
API Version Compatibility Handling
++++++++++++++++++++++++++++++++++
In 2.1 there will be a new OS API version (e.g. 15), which should be
mostly compatible with API 10, except for some newly added variables.
Since it's easy not to pass some variables, we'll be able to handle
Ganeti 2.0 OSes by just filtering out the newly added pieces of
information. We will still encourage OSes to declare support for the new
API after checking that the new variables don't create any conflict for
them, and we will drop API 10 support after Ganeti 2.1 has been released.
New Environment variables
+++++++++++++++++++++++++
Some variables have never been added to the OS API but would definitely
be useful for the OSes. We plan to add an INSTANCE_HYPERVISOR variable
to allow the OS to make changes relevant to the virtualization the
instance is going to use. Since this field is immutable for each
instance, the OS can tailor the install to it, without having to make
sure the instance can run under any virtualization technology.
We also want the OS to know the particular hypervisor parameters, to be
able to customize the install even more. Since the parameters can
change, though, we will pass them only as an "FYI": if an OS ties some
instance functionality to the value of a particular hypervisor parameter
manual changes or a reinstall may be needed to adapt the instance to the
new environment. This is not a regression as of today, because even if
the OSes are left blind about this information, sometimes they still
need to make compromises and cannot satisfy all possible parameter
values.
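As a sketch only (the package names and the decision taken are purely
illustrative; only the ``INSTANCE_HYPERVISOR`` variable itself is part
of this proposal), an OS create script could branch on the new
variable like this::

  import os

  hypervisor = os.environ.get("INSTANCE_HYPERVISOR", "")

  if hypervisor == "kvm":
    # Fully virtualised guest: a stock kernel is enough (example choice)
    extra_packages = ["linux-image-amd64"]
  elif hypervisor.startswith("xen"):
    # Xen guest: pick a Xen-capable kernel instead (example choice)
    extra_packages = ["linux-image-xen-amd64"]
  else:
    # Unknown or unset: keep the install generic
    extra_packages = []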
OS Variants
+++++++++++
Currently we are seeing some degree of "OS proliferation" just to
change a simple installation behavior. This means that the same OS gets
installed on the cluster multiple times, with different names, to
customize just one installation behavior. Usually such OSes try to share
as much as possible through symlinks, but this still causes
complications on the user side, especially when multiple parameters must
be cross-matched.
For example today if you want to install debian etch, lenny or squeeze
you probably need to install the debootstrap OS multiple times, changing
its configuration file, and calling it debootstrap-etch,
debootstrap-lenny or debootstrap-squeeze. Furthermore if you have for
example a "server" and a "development" environment which installs
different packages/configuration files and must be available for all
installs you'll probably end up with debootstrap-etch-server,
debootstrap-etch-dev, debootstrap-lenny-server, debootstrap-lenny-dev,
etc. Crossing more than two parameters quickly becomes unmanageable.
In order to avoid this we plan to make OSes more customizable, by
allowing each OS to declare a list of variants which can be used to
customize it. The variants list is mandatory and must be written, one
variant per line, in the new "variants.list" file inside the main os
dir. At least one variant must be supported. When choosing the
OS, exactly one variant will have to be specified, and will be encoded
in the OS name as ``<os>+<variant>``. As is the case today, it will be
possible to change an instance's OS at creation or install time.
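For example (names purely illustrative), a ``debootstrap`` OS
declaring three variants would ship a ``variants.list`` containing::

  etch
  lenny
  squeeze

and an instance could then be installed with something like
``gnt-instance add -o debootstrap+lenny ...``.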
The 2.1 OS list will be the combination of each OS, plus its supported
variants. This will cause the name proliferation to remain, but at
least the internal OS code will be simplified to just parsing the passed
variant, without the need for symlinks or code duplication.
Also we expect the OSes to declare only "interesting" variants, but to
accept some non-declared ones which a user will be able to pass in by
overriding the checks ganeti does. This will be useful for allowing some
variations to be used without polluting the OS list (per-OS
documentation should list all supported variants). If a variant which is
not internally supported is forced through, the OS scripts should abort.
In the future (post 2.1) we may want to move to full fledged parameters
all orthogonal to each other (for example "architecture" (i386, amd64),
"suite" (lenny, squeeze, ...), etc). (As opposed to the variant, which
is a single parameter, and you need a different variant for all the set
of combinations you want to support). In this case we envision the
variants to be moved inside of Ganeti and be associated with lists
parameter->values associations, which will then be passed to the OS.
IAllocator changes
~~~~~~~~~~~~~~~~~~
Current State and shortcomings
++++++++++++++++++++++++++++++
The iallocator interface allows creation of instances without manually
specifying nodes, but instead by specifying plugins which will do the
required computations and produce a valid node list.
However, the interface is quite awkward to use:
- one cannot set a 'default' iallocator script
- one cannot use it to easily test if allocation would succeed
- some new functionality, such as rebalancing clusters and calculating
capacity estimates, is needed
Proposed changes
++++++++++++++++
There are two areas of improvement proposed:
- improving the use of the current interface
- extending the IAllocator API to cover more automation
Default iallocator names
^^^^^^^^^^^^^^^^^^^^^^^^
The cluster will hold, for each type of iallocator, a (possibly empty)
list of modules that will be used automatically.
If the list is empty, the behaviour will remain the same.
If the list has one entry, then ganeti will behave as if
``--iallocator`` was specified on the command line, i.e. use this
allocator by default. If the user however passed nodes, those will be
used in preference.
If the list has multiple entries, they will be tried in order until
one gives a successful answer.
Dry-run allocation
^^^^^^^^^^^^^^^^^^
The create instance LU will get a new 'dry-run' option that will just
simulate the placement, and return the chosen node-lists after running
all the usual checks.
Cluster balancing
^^^^^^^^^^^^^^^^^
Instance add/removals/moves can create a situation where load on the
nodes is not spread equally. For this, a new iallocator mode will be
implemented called ``balance`` in which the plugin, given the current
cluster state, and a maximum number of operations, will need to
compute the instance relocations needed in order to achieve a "better"
(for whatever the script believes it's better) cluster.
Cluster capacity calculation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In this mode, called ``capacity``, given an instance specification and
the current cluster state (similar to the ``allocate`` mode), the
plugin needs to return:
- how many instances can be allocated on the cluster with that
specification
- on which nodes these will be allocated (in order)
.. vim: set textwidth=72 :
ganeti-2.15.2/doc/design-2.10.rst 0000644 0000000 0000000 00000000732 12634264163 0016205 0 ustar 00root root 0000000 0000000 ==================
Ganeti 2.10 design
==================
The following design documents have been implemented in Ganeti 2.10.
- :doc:`design-cmdlib-unittests`
- :doc:`design-hotplug`
- :doc:`design-openvswitch`
- :doc:`design-performance-tests`
- :doc:`design-storagetypes`
- :doc:`design-upgrade`
The following designs have been partially implemented in Ganeti 2.10.
- :doc:`design-ceph-ganeti-support`
- :doc:`design-internal-shutdown`
- :doc:`design-query-splitting`
ganeti-2.15.2/doc/design-2.11.rst 0000644 0000000 0000000 00000000466 12634264163 0016212 0 ustar 00root root 0000000 0000000 ==================
Ganeti 2.11 design
==================
The following design documents have been implemented in Ganeti 2.11.
- :doc:`design-internal-shutdown`
- :doc:`design-kvmd`
The following designs have been partially implemented in Ganeti 2.11.
- :doc:`design-node-security`
- :doc:`design-hsqueeze`
ganeti-2.15.2/doc/design-2.12.rst 0000644 0000000 0000000 00000000534 12634264163 0016207 0 ustar 00root root 0000000 0000000 ==================
Ganeti 2.12 design
==================
The following design documents have been implemented in Ganeti 2.12.
- :doc:`design-daemons`
- :doc:`design-systemd`
- :doc:`design-cpu-speed`
The following designs have been partially implemented in Ganeti 2.12.
- :doc:`design-node-security`
- :doc:`design-hsqueeze`
- :doc:`design-os`
ganeti-2.15.2/doc/design-2.13.rst 0000644 0000000 0000000 00000000544 12634264163 0016211 0 ustar 00root root 0000000 0000000 ==================
Ganeti 2.13 design
==================
The following design documents have been implemented in Ganeti 2.13.
- :doc:`design-disk-conversion`
- :doc:`design-optables`
- :doc:`design-hsqueeze`
The following designs have been partially implemented in Ganeti 2.13.
- :doc:`design-location`
- :doc:`design-node-security`
- :doc:`design-os`
ganeti-2.15.2/doc/design-2.14.rst 0000644 0000000 0000000 00000000322 12634264163 0016204 0 ustar 00root root 0000000 0000000 ==================
Ganeti 2.14 design
==================
The following designs have been partially implemented in Ganeti 2.14.
- :doc:`design-location`
- :doc:`design-reservations`
- :doc:`design-configlock`
ganeti-2.15.2/doc/design-2.15.rst 0000644 0000000 0000000 00000000531 12634264163 0016207 0 ustar 00root root 0000000 0000000 ==================
Ganeti 2.15 design
==================
The following designs have been partially implemented in Ganeti 2.15.
- :doc:`design-configlock`
- :doc:`design-shared-storage-redundancy`
The following designs' implementations were completed in Ganeti 2.15.
- :doc:`design-allocation-efficiency`
- :doc:`design-dedicated-allocation`
ganeti-2.15.2/doc/design-2.2.rst 0000644 0000000 0000000 00000111047 12634264163 0016130 0 ustar 00root root 0000000 0000000 =================
Ganeti 2.2 design
=================
This document describes the major changes in Ganeti 2.2 compared to
the 2.1 version.
The 2.2 version will be a relatively small release. Its main aim is to
avoid changing too much of the core code, while addressing issues and
adding new features and improvements over 2.1, in a timely fashion.
.. contents:: :depth: 4
As for 2.1 we divide the 2.2 design into three areas:
- core changes, which affect the master daemon/job queue/locking or
all/most logical units
- logical unit/feature changes
- external interface changes (e.g. command line, OS API, hooks, ...)
Core changes
============
Master Daemon Scaling improvements
----------------------------------
Current state and shortcomings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Currently the Ganeti master daemon is based on four sets of threads:
- The main thread (1 thread) just accepts connections on the master
socket
- The client worker pool (16 threads) handles those connections,
one thread per connected socket, parses luxi requests, and sends data
back to the clients
- The job queue worker pool (25 threads) executes the actual jobs
submitted by the clients
- The rpc worker pool (10 threads) interacts with the nodes via
http-based-rpc
This means that every masterd currently runs 52 threads to do its job.
Being able to reduce the number of thread sets would make the master's
architecture a lot simpler. Moreover having less threads can help
decrease lock contention, log pollution and memory usage.
Also, with the current architecture, masterd suffers from quite a few
scalability issues:
Core daemon connection handling
+++++++++++++++++++++++++++++++
Since the 16 client worker threads handle one connection each, it's very
easy to exhaust them, by just connecting to masterd 16 times and not
sending any data. While we could perhaps make those pools resizable,
increasing the number of threads won't help with lock contention, nor
with better handling of long-running operations (making sure the client
is informed that everything is proceeding and doesn't need to time out).
Wait for job change
+++++++++++++++++++
The REQ_WAIT_FOR_JOB_CHANGE luxi operation makes the relevant client
thread block on its job for a relatively long time. This is another easy
way to exhaust the 16 client threads, and a place where clients often
time out; moreover this operation increases the job queue lock
contention (see below).
Job Queue lock
++++++++++++++
The job queue lock is quite heavily contended, and certain easily
reproducible workloads show that it's very easy to put masterd in
trouble: for example running ~15 background instance reinstall jobs,
results in a master daemon that, even without having finished the
client worker threads, can't answer simple job list requests, or
submit more jobs.
Currently the job queue lock is an exclusive non-fair lock insulating
the following job queue methods (called by the client workers).
- AddNode
- RemoveNode
- SubmitJob
- SubmitManyJobs
- WaitForJobChanges
- CancelJob
- ArchiveJob
- AutoArchiveJobs
- QueryJobs
- Shutdown
Moreover the job queue lock is acquired outside of the job queue in two
other classes:
- jqueue._JobQueueWorker (in RunTask) before executing the opcode, after
finishing its execution and when handling an exception.
- jqueue._OpExecCallbacks (in NotifyStart and Feedback) when the
processor (mcpu.Processor) is about to start working on the opcode
(after acquiring the necessary locks) and when any data is sent back
via the feedback function.
Of those the major critical points are:
- Submit[Many]Job, QueryJobs, WaitForJobChanges, which can easily slow
down and block client threads up to making the respective clients
time out.
- The code paths in NotifyStart, Feedback, and RunTask, which slow
down job processing between clients and otherwise non-related jobs.
To increase the pain:
- WaitForJobChanges is a bad offender because it's implemented with a
notified condition which awakes waiting threads, who then try to
acquire the global lock again
- Many should-be-fast code paths are slowed down by replicating the
change to remote nodes, and thus waiting, with the lock held, on
remote rpcs to complete (starting, finishing, and submitting jobs)
Proposed changes
~~~~~~~~~~~~~~~~
In order to be able to interact with the master daemon even when it's
under heavy load, and to make it simpler to add core functionality
(such as an asynchronous rpc client) we propose three subsequent levels
of changes to the master core architecture.
After making this change we'll be able to re-evaluate the size of our
thread pool, if we see that we can make most threads in the client
worker pool always idle. In the future we should also investigate making
the rpc client asynchronous as well, so that we can make masterd a lot
smaller in number of threads, and memory size, and thus also easier to
understand, debug, and scale.
Connection handling
+++++++++++++++++++
We'll move the main thread of ganeti-masterd to asyncore, so that it can
share the mainloop code with all other Ganeti daemons. Then all luxi
clients will be asyncore clients, and I/O to/from them will be handled
by the master thread asynchronously. Data will be read from the client
sockets as it becomes available, and kept in a buffer, then when a
complete message is found, it's passed to a client worker thread for
parsing and processing. The client worker thread is responsible for
serializing the reply, which can then be sent asynchronously by the main
thread on the socket.
Wait for job change
+++++++++++++++++++
The REQ_WAIT_FOR_JOB_CHANGE luxi request is changed to be
subscription-based, so that the executing thread doesn't have to be
waiting for the changes to arrive. Threads producing messages (job queue
executors) will make sure that when there is a change another thread is
awaken and delivers it to the waiting clients. This can be either a
dedicated "wait for job changes" thread or pool, or one of the client
workers, depending on what's easier to implement. In either case the
main asyncore thread will only be involved in pushing of the actual
data, and not in fetching/serializing it.
Other features to look at, when implementing this code are:
- Possibility not to need the job lock to know which updates to push:
if the thread producing the data pushed a copy of the update for the
waiting clients, the thread sending it won't need to acquire the
lock again to fetch the actual data.
- Possibility to signal clients about to time out, when no update has
been received, not to despair and to keep waiting (luxi level
keepalive).
- Possibility to defer updates if they are too frequent, providing
them at a maximum rate (lower priority).
Job Queue lock
++++++++++++++
In order to decrease the job queue lock contention, we will change the
code paths in the following ways, initially:
- A per-job lock will be introduced. All operations affecting only one
job (for example feedback, starting/finishing notifications,
subscribing to or watching a job) will only require the job lock.
This should be a leaf lock, but if a situation arises in which it
must be acquired together with the global job queue lock the global
one must always be acquired last (for the global section).
- The locks will be converted to a sharedlock. Any read-only operation
will be able to proceed in parallel.
- During remote update (which happens already per-job) we'll drop the
job lock level to shared mode, so that activities reading the lock
(for example job change notifications or QueryJobs calls) will be
able to proceed in parallel.
- The wait for job changes improvements proposed above will be
implemented.
In the future other improvements may include splitting off some of the
work (eg replication of a job to remote nodes) to a separate thread pool
or asynchronous thread, not tied with the code path for answering client
requests or the one executing the "real" work. This can be discussed
again after we used the more granular job queue in production and tested
its benefits.
Inter-cluster instance moves
----------------------------
Current state and shortcomings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With the current design of Ganeti, moving whole instances between
different clusters involves a lot of manual work. There are several ways
to move instances, one of them being to export the instance, manually
copying all data to the new cluster before importing it again. Manual
changes to the instances configuration, such as the IP address, may be
necessary in the new environment. The goal is to improve and automate
this process in Ganeti 2.2.
Proposed changes
~~~~~~~~~~~~~~~~
Authorization, Authentication and Security
++++++++++++++++++++++++++++++++++++++++++
Until now, each Ganeti cluster was a self-contained entity and wouldn't
talk to other Ganeti clusters. Nodes within clusters only had to trust
the other nodes in the same cluster and the network used for replication
was trusted, too (hence the ability to use a separate, local network
for replication).
For inter-cluster instance transfers this model must be weakened. Nodes
in one cluster will have to talk to nodes in other clusters, sometimes
in other locations and, very important, via untrusted network
connections.
Various options have been considered for securing and authenticating the
data transfer from one machine to another. To reduce the risk of
accidentally overwriting data due to software bugs, authenticating the
arriving data was considered critical. Eventually we decided to use
socat's OpenSSL options (``OPENSSL:``, ``OPENSSL-LISTEN:`` et al), which
provide us with encryption, authentication and authorization when used
with separate keys and certificates.
Combinations of OpenSSH, GnuPG and Netcat were deemed too complex to set
up from within Ganeti. Any solution involving OpenSSH would require a
dedicated user with a home directory and likely automated modifications
to the user's ``$HOME/.ssh/authorized_keys`` file. When using Netcat,
GnuPG or another encryption method would be necessary to transfer the
data over an untrusted network. socat combines both in one program and
is already a dependency.
Each of the two clusters will have to generate an RSA key. The public
parts are exchanged between the clusters by a third party, such as an
administrator or a system interacting with Ganeti via the remote API
("third party" from here on). After receiving each other's public key,
the clusters can start talking to each other.
All encrypted connections must be verified on both sides. Neither side
may accept unverified certificates. The generated certificate should
only be valid for the time necessary to move the instance.
For additional protection of the instance data, the two clusters can
verify the certificates and destination information exchanged via the
third party by checking an HMAC signature using a key shared among the
involved clusters. By default this secret key will be a random string
unique to the cluster, generated by running SHA1 over 20 bytes read from
``/dev/urandom`` and the administrator must synchronize the secrets
between clusters before instances can be moved. If the third party does
not know the secret, it can't forge the certificates or redirect the
data. Unless disabled by a new cluster parameter, verifying the HMAC
signatures must be mandatory. The HMAC signature for X509 certificates
will be prepended to the certificate similar to an :rfc:`822` header and
only covers the certificate (from ``-----BEGIN CERTIFICATE-----`` to
``-----END CERTIFICATE-----``). The header name will be
``X-Ganeti-Signature`` and its value will have the format
``$salt/$hash`` (salt and hash separated by slash). The salt may only
contain characters in the range ``[a-zA-Z0-9]``.
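A minimal sketch of how such a signature could be computed, assuming
the salt is simply prepended to the certificate text before applying
HMAC-SHA1 (the exact composition is left to the implementation in
``utils.SignX509Certificate``)::

  import hashlib
  import hmac

  def sign_certificate(cert_pem, cluster_secret, salt):
    # HMAC-SHA1 over the salted certificate text, "$salt/$hash" format
    digest = hmac.new(cluster_secret, salt + cert_pem,
                      hashlib.sha1).hexdigest()
    # Prepend the header to the certificate, as described above
    return "X-Ganeti-Signature: %s/%s\n%s" % (salt, digest, cert_pem)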
On the web, the destination cluster would be equivalent to an HTTPS
server requiring verifiable client certificates. The browser would be
equivalent to the source cluster and must verify the server's
certificate while providing a client certificate to the server.
Copying data
++++++++++++
To simplify the implementation, we decided to operate at a block-device
level only, allowing us to easily support non-DRBD instance moves.
Intra-cluster instance moves will re-use the existing export and import
scripts supplied by instance OS definitions. Unlike simply copying the
raw data, this allows one to use filesystem-specific utilities to dump
only used parts of the disk and to exclude certain disks from the move.
Compression should be used to further reduce the amount of data
transferred.
The export script writes all data to stdout and the import script reads
it from stdin again. To avoid copying data and reduce disk space
consumption, everything is read from the disk and sent over the network
directly, where it'll be written to the new block device directly again.
Workflow
++++++++
#. Third party tells source cluster to shut down instance, asks for the
instance specification and for the public part of an encryption key
- Instance information can already be retrieved using an existing API
(``OpInstanceQueryData``).
- An RSA encryption key and a corresponding self-signed X509
certificate is generated using the "openssl" command. This key will
be used to encrypt the data sent to the destination cluster.
- Private keys never leave the cluster.
- The public part (the X509 certificate) is signed using HMAC with
salting and a secret shared between Ganeti clusters.
#. Third party tells destination cluster to create an instance with the
same specifications as on source cluster and to prepare for an
instance move with the key received from the source cluster and
receives the public part of the destination's encryption key
- The current API to create instances (``OpInstanceCreate``) will be
extended to support an import from a remote cluster.
- A valid, unexpired X509 certificate signed with the destination
cluster's secret will be required. By verifying the signature, we
know the third party didn't modify the certificate.
- The private keys never leave their cluster, hence the third party
can not decrypt or intercept the instance's data by modifying the
IP address or port sent by the destination cluster.
- The destination cluster generates another key and certificate,
signs and sends it to the third party, who will have to pass it to
the API for exporting an instance (``OpBackupExport``). This
certificate is used to ensure we're sending the disk data to the
correct destination cluster.
- Once a disk can be imported, the API sends the destination
information (IP address and TCP port) together with an HMAC
signature to the third party.
#. Third party hands public part of the destination's encryption key
together with all necessary information to source cluster and tells
it to start the move
- The existing API for exporting instances (``OpBackupExport``)
will be extended to export instances to remote clusters.
#. Source cluster connects to destination cluster for each disk and
transfers its data using the instance OS definition's export and
import scripts
- Before starting, the source cluster must verify the HMAC signature
of the certificate and destination information (IP address and TCP
port).
- When connecting to the remote machine, strong certificate checks
must be employed.
#. Due to the asynchronous nature of the whole process, the destination
cluster checks whether all disks have been transferred every time
after transferring a single disk; if so, it destroys the encryption
key
#. After sending all disks, the source cluster destroys its key
#. Destination cluster runs OS definition's rename script to adjust
instance settings if needed (e.g. IP address)
#. Destination cluster starts the instance if requested at the beginning
by the third party
#. Source cluster removes the instance if requested
Instance move in pseudo code
++++++++++++++++++++++++++++
.. highlight:: python
The following pseudo code describes a script moving instances between
clusters and what happens on both clusters.
#. Script is started, gets the instance name and destination cluster::
(instance_name, dest_cluster_name) = sys.argv[1:]
# Get destination cluster object
dest_cluster = db.FindCluster(dest_cluster_name)
# Use database to find source cluster
src_cluster = db.FindClusterByInstance(instance_name)
#. Script tells source cluster to stop instance::
# Stop instance
src_cluster.StopInstance(instance_name)
# Get instance specification (memory, disk, etc.)
inst_spec = src_cluster.GetInstanceInfo(instance_name)
(src_key_name, src_cert) = src_cluster.CreateX509Certificate()
#. ``CreateX509Certificate`` on source cluster::
key_file = mkstemp()
cert_file = "%s.cert" % key_file
RunCmd(["/usr/bin/openssl", "req", "-new",
"-newkey", "rsa:1024", "-days", "1",
"-nodes", "-x509", "-batch",
"-keyout", key_file, "-out", cert_file])
plain_cert = utils.ReadFile(cert_file)
# HMAC sign using secret key, this adds a "X-Ganeti-Signature"
# header to the beginning of the certificate
signed_cert = utils.SignX509Certificate(plain_cert,
utils.ReadFile(constants.X509_SIGNKEY_FILE))
# The certificate now looks like the following:
#
# X-Ganeti-Signature: $1234$28676f0516c6ab68062b[…]
# -----BEGIN CERTIFICATE-----
# MIICsDCCAhmgAwIBAgI[…]
# -----END CERTIFICATE-----
# Return name of key file and signed certificate in PEM format
return (os.path.basename(key_file), signed_cert)
#. Script creates instance on destination cluster and waits for move to
finish::
dest_cluster.CreateInstance(mode=constants.REMOTE_IMPORT,
spec=inst_spec,
source_cert=src_cert)
# Wait until destination cluster gives us its certificate
dest_cert = None
disk_info = []
while not (dest_cert and len(disk_info) == len(inst_spec.disks)):
tmp = dest_cluster.WaitOutput()
if tmp is Certificate:
dest_cert = tmp
elif tmp is DiskInfo:
# DiskInfo contains destination address and port
disk_info[tmp.index] = tmp
# Tell source cluster to export disks
for disk in disk_info:
src_cluster.ExportDisk(instance_name, disk=disk,
key_name=src_key_name,
dest_cert=dest_cert)
print ("Instance %s sucessfully moved to %s" %
(instance_name, dest_cluster.name))
#. ``CreateInstance`` on destination cluster::
# …
if mode == constants.REMOTE_IMPORT:
# Make sure certificate was not modified since it was generated by
# source cluster (which must use the same secret)
if (not utils.VerifySignedX509Cert(source_cert,
utils.ReadFile(constants.X509_SIGNKEY_FILE))):
raise Error("Certificate not signed with this cluster's secret")
if utils.CheckExpiredX509Cert(source_cert):
raise Error("X509 certificate is expired")
source_cert_file = utils.WriteTempFile(source_cert)
# See above for X509 certificate generation and signing
(key_name, signed_cert) = CreateSignedX509Certificate()
SendToClient("x509-cert", signed_cert)
for disk in instance.disks:
# Start socat
RunCmd(("socat"
" OPENSSL-LISTEN:%s,…,key=%s,cert=%s,cafile=%s,verify=1"
" stdout > /dev/disk…") %
port, GetRsaKeyPath(key_name, private=True),
GetRsaKeyPath(key_name, private=False), src_cert_file)
SendToClient("send-disk-to", disk, ip_address, port)
DestroyX509Cert(key_name)
RunRenameScript(instance_name)
#. ``ExportDisk`` on source cluster::
# Make sure certificate was not modified since it was generated by
# destination cluster (which must use the same secret)
if (not utils.VerifySignedX509Cert(cert_pem,
utils.ReadFile(constants.X509_SIGNKEY_FILE))):
raise Error("Certificate not signed with this cluster's secret")
if utils.CheckExpiredX509Cert(cert_pem):
raise Error("X509 certificate is expired")
dest_cert_file = utils.WriteTempFile(cert_pem)
# Start socat
RunCmd(("socat stdin"
" OPENSSL:%s:%s,…,key=%s,cert=%s,cafile=%s,verify=1"
" < /dev/disk…") %
disk.host, disk.port,
GetRsaKeyPath(key_name, private=True),
GetRsaKeyPath(key_name, private=False), dest_cert_file)
if instance.all_disks_done:
DestroyX509Cert(key_name)
.. highlight:: text
Miscellaneous notes
+++++++++++++++++++
- A very similar system could also be used for instance exports within
the same cluster. Currently OpenSSH is being used, but could be
replaced by socat and SSL/TLS.
- During the design of intra-cluster instance moves we also discussed
encrypting instance exports using GnuPG.
- While most instances should have exactly the same configuration as
on the source cluster, setting them up with a different disk layout
might be helpful in some use-cases.
- A cleanup operation, similar to the one available for failed instance
migrations, should be provided.
- ``ganeti-watcher`` should remove instances pending a move from another
cluster after a certain amount of time. This takes care of failures
somewhere in the process.
- RSA keys can be generated using the existing
``bootstrap.GenerateSelfSignedSslCert`` function, though it might be
useful to not write both parts into a single file, requiring small
changes to the function. The public part always starts with
``-----BEGIN CERTIFICATE-----`` and ends with ``-----END
CERTIFICATE-----``.
- The source and destination cluster might be different when it comes
to available hypervisors, kernels, etc. The destination cluster should
refuse to accept an instance move if it can't fulfill an instance's
requirements.
Privilege separation
--------------------
Current state and shortcomings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All Ganeti daemons are run under the user root. This is not ideal from a
security perspective as for possible exploitation of any daemon the user
has full access to the system.
In order to overcome this situation we'll allow Ganeti to run its daemon
under different users and a dedicated group. This also will allow some
side effects, like letting the user run some ``gnt-*`` commands if one
is in the same group.
Implementation
~~~~~~~~~~~~~~
For Ganeti 2.2 the implementation will be focused on the RAPI daemon
only. This involves changes to ``daemons.py`` so it's possible to drop
privileges when daemonizing the process. However, this will be a
short-term solution, which will be replaced by a privilege drop
already at daemon startup in Ganeti 2.3.
It also needs changes in the master daemon to create the socket with new
permissions/owners to allow RAPI access. There will be no other
permission/owner changes in the file structure as the RAPI daemon is
started with root permissions. At startup it will read all needed files
and then drop privileges before contacting the master daemon.
Feature changes
===============
KVM Security
------------
Current state and shortcomings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Currently all kvm processes run as root. Taking ownership of the
hypervisor process, from inside a virtual machine, would mean a full
compromise of the whole Ganeti cluster, knowledge of all Ganeti
authentication secrets, full access to all running instances, and the
option of subverting other basic services on the cluster (eg: ssh).
Proposed changes
~~~~~~~~~~~~~~~~
We would like to decrease the attack surface available if a
hypervisor is compromised. We can do so by adding different features to
Ganeti which will restrict a broken hypervisor's ability, in the
absence of a local privilege escalation attack, to subvert the node.
Dropping privileges in kvm to a single user (easy)
++++++++++++++++++++++++++++++++++++++++++++++++++
By passing the ``-runas`` option to kvm, we can make it drop privileges.
The user can be chosen by a hypervisor parameter, so that each instance
can have its own user, but by default they will all run under the same
one. It should be very easy to implement, and can easily be backported
to 2.1.X.
This mode protects the Ganeti cluster from a subverted hypervisor, but
doesn't protect the instances between each other, unless care is taken
to specify a different user for each. This would prevent the worst
attacks, including:
- logging in to other nodes
- administering the Ganeti cluster
- subverting other services
But the following would remain an option:
- terminate other VMs (but not start them again, as that requires root
privileges to set up networking) (unless different users are used)
- trace other VMs, and probably subvert them and access their data
(unless different users are used)
- send network traffic from the node
- read unprotected data on the node filesystem
Running kvm in a chroot (slightly harder)
+++++++++++++++++++++++++++++++++++++++++
By passing the ``-chroot`` option to kvm, we can restrict the kvm
process in its own (possibly empty) root directory. We need to set this
area up so that the instance disks and control sockets are accessible,
so it would require slightly more work at the Ganeti level.
Breaking out in a chroot would mean:
- a lot less options to find a local privilege escalation vector
- the impossibility to write local data, if the chroot is set up
correctly
- the impossibility to read filesystem data on the host
It would still be possible though to:
- terminate other VMs
- trace other VMs, and possibly subvert them (if a tracer can be
installed in the chroot)
- send network traffic from the node
Running kvm with a pool of users (slightly harder)
++++++++++++++++++++++++++++++++++++++++++++++++++
If, rather than passing a single user as a hypervisor parameter, we have
a pool of usable ones, we can dynamically choose a free one to use and
thus guarantee that each machine will be separate from the others,
without putting the burden of this on the cluster administrator.
This would mean interfering between machines would be impossible, and
can still be combined with the chroot benefits.
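As an illustration only (the user name, chroot path and the omitted
options are made up for the example), combining both mechanisms could
result in KVM being started roughly like::

  kvm -runas ganeti-kvm-4007 \
      -chroot /srv/ganeti/kvm-chroot/instance1 \
      ...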
Running iptables rules to limit network interaction (easy)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
These don't need to be handled by Ganeti, but we can ship examples. If
the users used to run VMs would be blocked from sending some or all
network traffic, it would become impossible for a broken-into hypervisor
to send arbitrary data on the node network, which is especially useful
when the instance and the node network are separated (using ganeti-nbma
or a separate set of network interfaces), or when a separate replication
network is maintained. We need to experiment to see how much restriction
we can properly apply, without limiting the instance legitimate traffic.
Running kvm inside a container (even harder)
++++++++++++++++++++++++++++++++++++++++++++
Recent linux kernels support different process namespaces through
control groups. PIDs, users, filesystems and even network interfaces can
be separated. If we can set up ganeti to run kvm in a separate container
we could insulate all the host process from being even visible if the
hypervisor gets broken into. Most probably separating the network
namespace would require one extra hop in the host, through a veth
interface, thus reducing performance, so we may want to avoid that, and
just rely on iptables.
Implementation plan
~~~~~~~~~~~~~~~~~~~
We will first implement dropping privileges for kvm processes as a
single user, and most probably backport it to 2.1. Then we'll ship
example iptables rules to show how the user can be limited in its
network activities. After that we'll implement chroot restriction for
kvm processes, and extend the user limitation to use a user pool.
Finally we'll look into namespaces and containers, although that might
slip after the 2.2 release.
New OS states
-------------
Separate from the OS external changes, described below, we'll add some
internal changes to the OS.
Current state and shortcomings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are two issues related to the handling of the OSes.
First, it's impossible to disable an OS for new instances, since that
will also break reinstallations and renames of existing instances. To
phase out an OS definition, without actually having to modify the OS
scripts, it would be ideal to be able to restrict new installations but
keep the rest of the functionality available.
Second, ``gnt-instance reinstall --select-os`` shows all the OSes
available on the clusters. Some OSes might exist only for debugging and
diagnosis, and not for end-user availability. For this, it would be
useful to "hide" a set of OSes, but keep it otherwise functional.
Proposed changes
~~~~~~~~~~~~~~~~
Two new cluster-level attributes will be added, holding the list of OSes
hidden from the user and respectively the list of OSes which are
blacklisted from new installations.
These lists will be modifiable via ``gnt-os modify`` (implemented via
``OpClusterSetParams``), such that even not-yet-existing OSes can be
preseeded into a given state.
For the hidden OSes, they are fully functional except that they are not
returned in the default OS list (as computed via ``OpOsDiagnose``),
unless the hidden state is requested.
For the blacklisted OSes, they are also not shown (unless the
blacklisted state is requested), and they are also prevented from
installation via ``OpInstanceCreate`` (in create mode).
Both these attributes are per-OS, not per-variant. Thus they apply to
all of an OS' variants, and it's impossible to blacklist or hide just
one variant. Further improvements might allow a given OS variant to be
blacklisted, as opposed to whole OSes.
External interface changes
==========================
OS API
------
The OS variants implementation in Ganeti 2.1 didn't prove to be useful
enough to alleviate the need to hack around the Ganeti API in order to
provide flexible OS parameters.
As such, for Ganeti 2.2 we will provide support for arbitrary OS
parameters. However, since OSes are not registered in Ganeti, but
instead discovered at runtime, the interface is not entirely
straightforward.
Furthermore, to support the system administrator in keeping OSes
properly in sync across the nodes of a cluster, Ganeti will also verify
(if existing) the consistency of a new ``os_version`` file.
These changes to the OS API will bump the API version to 20.
OS version
~~~~~~~~~~
A new ``os_version`` file will be supported by Ganeti. This file is not
required, but if existing, its contents will be checked for consistency
across nodes. The file should hold only one line of text (any extra data
will be discarded), and its contents will be shown in the OS information
and diagnose commands.
It is recommended that OS authors update the contents of this file for
any changes; at a minimum, modifications that change the behaviour of
import/export scripts must increase the version, since they break
intra-cluster migration.
Parameters
~~~~~~~~~~
The interface between Ganeti and the OS scripts will be based on
environment variables, and as such the parameters and their values will
need to be valid in this context.
Names
+++++
The parameter names will be declared in a new file, ``parameters.list``,
together with a one-line documentation (whitespace-separated). Example::
$ cat parameters.list
ns1 Specifies the first name server to add to /etc/resolv.conf
extra_packages Specifies additional packages to install
rootfs_size Specifies the root filesystem size (the rest will be left unallocated)
track Specifies the distribution track, one of 'stable', 'testing' or 'unstable'
As seen above, the documentation can be separated via multiple
spaces/tabs from the names.
The parameter names as read from the file will be used for the command
line interface in lowercased form; as such, there shouldn't be any two
parameters which differ in case only.
Values
++++++
The values of the parameters are, from Ganeti's point of view,
completely freeform. If a given parameter has, from the OS' point of
view, a fixed set of valid values, these should be documented as such
and verified by the OS, but Ganeti will not handle such parameters
specially.
An empty value must be handled identically as a missing parameter. In
other words, the validation script should only test for non-empty
values, and not for declared versus undeclared parameters.
Furthermore, each parameter should have an (internal to the OS) default
value, that will be used if not passed from Ganeti. More precisely, it
should be possible for any parameter to specify a value that will have
the same effect as not passing the parameter, and in no case should
the absence of a parameter be treated as an exceptional case (outside
the value space).
Environment variables
^^^^^^^^^^^^^^^^^^^^^
The parameters will be exposed in the environment upper-case and
prefixed with the string ``OSP_``. For example, a parameter declared in
the 'parameters' file as ``ns1`` will appear in the environment as the
variable ``OSP_NS1``.
Validation
++++++++++
For the purpose of parameter name/value validation, the OS scripts
*must* provide an additional script, named ``verify``. This script will
be called with the argument ``parameters``, and all the parameters will
be passed in via environment variables, as described above.
The script should signify success/failure based on its exit code, and
show explanatory messages either on its standard output or standard
error. These messages will be passed on to the master, and stored in
the OpCode result/error message.
The parameters must be constructed to be independent of the instance
specifications. In general, the validation script will only be called
with the parameter variables set, but not with the normal per-instance
variables, in order for Ganeti to be able to validate default parameters
too, when they change. Validation will only be performed on one cluster
node, and it will be up to the ganeti administrator to keep the OS
scripts in sync between all nodes.
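A hypothetical ``verify`` script for the example parameters shown
earlier could look like the following (the concrete checks are of
course up to each OS, and the size format below is an assumption)::

  #!/usr/bin/python
  import os
  import re
  import sys

  def fail(msg):
    sys.stderr.write(msg + "\n")
    sys.exit(1)

  if len(sys.argv) != 2 or sys.argv[1] != "parameters":
    fail("Unsupported verify argument")

  # An empty value must behave exactly like a missing parameter
  rootfs_size = os.environ.get("OSP_ROOTFS_SIZE", "")
  if rootfs_size and not re.match(r"^\d+[MG]?$", rootfs_size):
    fail("rootfs_size must be a number, optionally followed by M or G")

  track = os.environ.get("OSP_TRACK", "")
  if track and track not in ("stable", "testing", "unstable"):
    fail("track must be one of 'stable', 'testing' or 'unstable'")

  sys.exit(0)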
Instance operations
+++++++++++++++++++
The parameters will be passed, as described above, to all the other
instance operations (creation, import, export). Ideally, these scripts
will not abort with parameter validation errors, if the ``verify``
script has verified them correctly.
Note: when changing an instance's OS type, any OS parameters defined at
instance level will be kept as-is. If the parameters differ between the
new and the old OS, the user should manually remove/update them as
needed.
Declaration and modification
++++++++++++++++++++++++++++
Since the OSes are not registered in Ganeti, we will only make a 'weak'
link between the parameters as declared in Ganeti and the actual OSes
existing on the cluster.
It will be possible to declare parameters either globally, per cluster
(where they are indexed per OS/variant), or individually, per
instance. The declaration of parameters will not be tied to current
existing OSes. When specifying a parameter, if the OS exists, it will be
validated; if not, then it will simply be stored as-is.
A special note is that it will not be possible to 'unset' at instance
level a parameter that is declared globally. Instead, at instance level
the parameter should be given an explicit value, or the default value as
explained above.
CLI interface
+++++++++++++
The modification of global (default) parameters will be done via the
``gnt-os`` command, and the per-instance parameters via the
``gnt-instance`` command. Both these commands will take an additional
``--os-parameters`` or ``-O`` flag that specifies the parameters in the
familiar comma-separated, key=value format. For removing a parameter, a
``-key`` syntax will be used, e.g.::
# initial modification
$ gnt-instance modify -O use_dhcp=true instance1
# later revert (to the cluster default, or the OS default if not
# defined at cluster level)
$ gnt-instance modify -O -use_dhcp instance1
Internal storage
++++++++++++++++
Internally, the OS parameters will be stored in a new ``osparams``
attribute. The global parameters will be stored on the cluster object,
and the value of this attribute will be a dictionary indexed by OS name
(this also accepts an OS+variant name, which will override a simple OS
name, see below), and for values the key/name dictionary. For the
instances, the value will be directly the key/name dictionary.
Overriding rules
++++++++++++++++
Any instance-specific parameters will override any variant-specific
parameters, which in turn will override any global parameters. The
global parameters, in turn, override the built-in defaults (of the OS
scripts).
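A sketch of the resulting lookup order, under the assumption that the
stored dictionaries are merged exactly in the order listed above (the
helper name and argument layout are illustrative)::

  def effective_osparams(os_defaults, cluster_osparams, instance_osparams,
                         os_name, variant):
    # Start from the OS scripts' built-in defaults...
    params = dict(os_defaults)
    # ...then apply cluster-level parameters for the plain OS name...
    params.update(cluster_osparams.get(os_name, {}))
    # ...then the more specific OS+variant entry, which overrides it...
    params.update(cluster_osparams.get("%s+%s" % (os_name, variant), {}))
    # ...and finally the per-instance parameters win over everything
    params.update(instance_osparams)
    return params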
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-2.3.rst 0000644 0000000 0000000 00000112501 12634264163 0016125 0 ustar 00root root 0000000 0000000 =================
Ganeti 2.3 design
=================
This document describes the major changes in Ganeti 2.3 compared to
the 2.2 version.
.. contents:: :depth: 4
As for 2.1 and 2.2 we divide the 2.3 design into three areas:
- core changes, which affect the master daemon/job queue/locking or
all/most logical units
- logical unit/feature changes
- external interface changes (e.g. command line, OS API, hooks, ...)
Core changes
============
Node Groups
-----------
Current state and shortcomings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Currently all nodes of a Ganeti cluster are considered as part of the
same pool, for allocation purposes: DRBD instances for example can be
allocated on any two nodes.
This does cause a problem in cases where nodes are not all equally
connected to each other. For example if a cluster is created over two
sets of machines, each connected to its own switch, the internal bandwidth
between machines connected to the same switch might be bigger than the
bandwidth for inter-switch connections.
Moreover, some operations inside a cluster require all nodes to be locked
together for inter-node consistency, and won't scale if we increase the
number of nodes to a few hundreds.
Proposed changes
~~~~~~~~~~~~~~~~
With this change we'll divide Ganeti nodes into groups. Nothing will
change for clusters with only one node group. Bigger clusters will be
able to have more than one group, and each node will belong to exactly
one.
Node group management
+++++++++++++++++++++
To manage node groups and the nodes belonging to them, the following new
commands and flags will be introduced::
  gnt-group add <group>            # add a new node group
  gnt-group remove <group>         # delete an empty node group
  gnt-group list                   # list node groups
  gnt-group rename <old> <new>     # rename a node group
  gnt-node {list,info} -g <group>  # list only nodes belonging to a node group
  gnt-node modify -g <group>       # assign a node to a node group
Node group attributes
+++++++++++++++++++++
In clusters with more than one node group, it may be desirable to
establish local policies regarding which groups should be preferred when
performing allocation of new instances, or inter-group instance migrations.
To help with this, we will provide an ``alloc_policy`` attribute for
node groups. Such attribute will be honored by iallocator plugins when
making automatic decisions regarding instance placement.
The ``alloc_policy`` attribute can have the following values:
- unallocable: the node group should not be a candidate for instance
allocations, and the operation should fail if only groups in this
state could be found that would satisfy the requirements.
- last_resort: the node group should not be used for instance
allocations, unless this would be the only way to have the operation
succeed. Prioritization among groups in this state will be deferred to
the iallocator plugin that's being used.
- preferred: the node group can be used freely for allocation of
instances (this is the default state for newly created node
groups). Note that prioritization among groups in this state will be
deferred to the iallocator plugin that's being used.
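For example, an iallocator plugin honouring this attribute could
narrow down its candidate groups with something along these lines (a
sketch only; the actual data layout passed to plugins is defined
elsewhere)::

  def candidate_groups(groups):
    # Never consider "unallocable" groups; use "last_resort" groups
    # only if no "preferred" group is available
    preferred = [g for g in groups if g["alloc_policy"] == "preferred"]
    if preferred:
      return preferred
    return [g for g in groups if g["alloc_policy"] == "last_resort"]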
Node group operations
+++++++++++++++++++++
One operation at the node group level will be initially provided::
  gnt-group drain <group>
The purpose of this operation is to migrate all instances in a given
node group to other groups in the cluster, e.g. to reclaim capacity if
there are enough free resources in other node groups that share a
storage pool with the evacuated group.
Instance level changes
++++++++++++++++++++++
With the introduction of node groups, instances will be required to live
in only one group at a time; this is mostly important for DRBD
instances, which will not be allowed to have their primary and secondary
nodes in different node groups. To support this, we envision the
following changes:
- The iallocator interface will be augmented, and node groups exposed,
so that plugins will be able to make a decision regarding the group
in which to place a new instance. By default, all node groups will
be considered, but it will be possible to include a list of groups
in the creation job, in which case the plugin will limit itself to
considering those; in both cases, the ``alloc_policy`` attribute
will be honored.
- If, on the other hand, a primary and secondary nodes are specified
for a new instance, they will be required to be on the same node
group.
- Moving an instance between groups can only happen via an explicit
operation, which for example in the case of DRBD will work by
performing internally a replace-disks, a migration, and a second
replace-disks. It will be possible to clean up an interrupted
group-move operation.
- Cluster verify will signal an error if an instance has nodes
belonging to different groups. Additionally, changing the group of a
given node will be initially only allowed if the node is empty, as a
straightforward mechanism to avoid creating such situation.
- Inter-group instance migration will have the same operation modes as
new instance allocation, defined above: letting an iallocator plugin
decide the target group, possibly restricting the set of node groups
to consider, or specifying a target primary and secondary nodes. In
both cases, the target group or nodes must be able to accept the
instance network- and storage-wise; the operation will fail
otherwise, though in the future we may be able to allow some
parameter to be changed together with the move (in the meantime, an
import/export will be required in this scenario).
Internal changes
++++++++++++++++
We expect the following changes for cluster management:
- Frequent multinode operations, such as os-diagnose or cluster-verify,
will act on one group at a time, which will have to be specified in
all cases, except for clusters with just one group. Command line
tools will also have a way to easily target all groups, by
generating one job per group.
- Groups will have a human-readable name, but will internally always
be referenced by a UUID, which will be immutable; for example, nodes
will contain the UUID of the group they belong to. This is done
to simplify referencing while keeping it easy to handle renames and
movements. If we see that this works well, we'll transition other
config objects (instances, nodes) to the same model.
- The addition of a new per-group lock will be evaluated, if we can
transition some operations now requiring the BGL to it.
- Master candidate status will be allowed to be spread among groups.
For the first version we won't add any restriction over how this is
done, although in the future we may have a minimum number of master
candidates which Ganeti will try to keep in each group, for example.
Other work and future changes
+++++++++++++++++++++++++++++
Commands like ``gnt-cluster command``/``gnt-cluster copyfile`` will
continue to work on the whole cluster, but it will be possible to target
one group only by specifying it.
Commands which allow selection of sets of resources (for example
``gnt-instance start``/``gnt-instance stop``) will be able to select
them by node group as well.
Initially node groups won't be taggable objects, to simplify the first
implementation, but we expect this to be easy to add in a future version
should we see it's useful.
We envision groups as a good place to enhance cluster scalability. In
the future we may want to use them as units for configuration diffusion,
to allow a better master scalability. For example it could be possible
to change some all-nodes RPCs to contact each group once, from the
master, and make one node in the group perform internal diffusion. We
won't implement this in the first version, but we'll evaluate it for the
future, if we see scalability problems on big multi-group clusters.
When Ganeti will support more storage models (e.g. SANs, Sheepdog, Ceph)
we expect groups to be the basis for this, allowing for example a
different Sheepdog/Ceph cluster, or a different SAN to be connected to
each group. In some cases this will mean that inter-group move operation
will be necessarily performed with instance downtime, unless the
hypervisor has block-migrate functionality, and we implement support for
it (this would be theoretically possible, today, with KVM, for example).
Scalability issues with big clusters
------------------------------------
Current and future issues
~~~~~~~~~~~~~~~~~~~~~~~~~
Assuming the node groups feature will enable bigger clusters, other
parts of Ganeti will be impacted even more by the (in effect) bigger
clusters.
While many areas will be impacted, one is the most important: the fact
that the watcher still needs to be able to repair instance data on the
current 5 minutes time-frame (a shorter time-frame would be even
better). This means that the watcher itself needs to have parallelism
when dealing with node groups.
Also, the iallocator plugins are being fed data from Ganeti but also
need access to the full cluster state, and in general we still rely on
being able to compute the full cluster state somewhat “cheaply” and
on-demand. This conflicts with the goal of disconnecting the different
node groups, and of keeping the same parallelism while growing the
cluster size.
Another issue is that the current capacity calculations are done
completely outside Ganeti (and they need access to the entire cluster
state), and this prevents keeping the capacity numbers in sync with the
cluster state. While this is still acceptable for smaller clusters where
a small number of allocations/removal are presumed to occur between two
periodic capacity calculations, on bigger clusters where we aim to
parallelize heavily between node groups this is no longer true.
As proposed changes, the main change is introducing a cluster state
cache (not serialised to disk), and to update many of the LUs and
cluster operations to account for it. Furthermore, the capacity
calculations will be integrated via a new OpCode/LU, so that we have
faster feedback (instead of periodic computation).
Cluster state cache
~~~~~~~~~~~~~~~~~~~
A new cluster state cache will be introduced. The cache relies on two
main ideas:
- the total node memory, CPU count are very seldom changing; the total
node disk space is also slow changing, but can change at runtime; the
free memory and free disk will change significantly for some jobs, but
on a short timescale; in general, these values will be mostly “constant”
during the lifetime of a job
- we already have a periodic set of jobs that query the node and
instance state, driven by the :command:`ganeti-watcher` command, and
we're just discarding the results after acting on them
Given the above, it makes sense to cache the results of node and instance
state (with a focus on the node state) inside the master daemon.
The cache will not be serialised to disk, and will be for the most part
transparent to the outside of the master daemon.
Cache structure
+++++++++++++++
The cache will be oriented with a focus on node groups, so that it will
be easy to invalidate an entire node group, or a subset of nodes, or the
entire cache. The instances will be stored in the node group of their
primary node.
Furthermore, since the node and instance properties determine the
capacity statistics in a deterministic way, the cache will also hold, at
each node group level, the total capacity as determined by the new
capacity iallocator mode.
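For illustration only, a minimal sketch of such a group-oriented cache could look like this (all names are hypothetical and not part of the actual Ganeti code)::

  import time

  class GroupCacheEntry(object):
    """Per-node-group bucket of cached state (illustrative only)."""
    def __init__(self):
      self.nodes = {}       # node name -> dict with memory, disk, cpu, ...
      self.instances = {}   # instance name -> dict with its state
      self.capacity = None  # tiered-spec capacity data for this group
      self.mtime = time.time()

  # The cache is keyed by node group UUID, so an entire group, a subset
  # of nodes, or the whole cache can be invalidated independently.
  cluster_cache = {}

  def invalidate_node(group_uuid, node_name):
    """Drop one node's state; the group capacity depends on it as well."""
    entry = cluster_cache.get(group_uuid)
    if entry is not None:
      entry.nodes.pop(node_name, None)
      entry.capacity = None
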
Cache updates
+++++++++++++
The cache will be updated whenever a query for a node state returns
“full” node information (so as to keep the cache state for a given node
consistent). Partial results will not update the cache (see next
paragraph).
Since there will be no way to feed the cache from outside, and we
would like to have a consistent cache view when driven by the watcher,
we'll introduce a new OpCode/LU for the watcher to run, instead of the
current separate opcodes (see below in the watcher section).
Updates to a node that change a node's specs “downward” (e.g. less
memory) will invalidate the capacity data. Updates that increase the
node's specs will not invalidate the capacity, as we're more interested
in “at least available” correctness, not “at most available”.
Cache invalidation
++++++++++++++++++
If a partial node query is done (e.g. just for the node free space), and
the returned values don't match with the cache, then the entire node
state will be invalidated.
By default, all LUs will invalidate the caches for all nodes and
instances they lock. If an LU uses the BGL, then it will invalidate the
entire cache. In time, it is expected that LUs will be modified to not
invalidate, if they are not expected to change the node's and/or
instance's state (e.g. ``LUInstanceConsole``, or
``LUInstanceActivateDisks``).
Invalidation of a node's properties will also invalidate the capacity
data associated with that node.
Cache lifetime
++++++++++++++
The cache elements will have an upper bound on their lifetime; the
proposal is to make this an hour, which should be a high enough value to
cover the watcher being blocked by a medium-term job (e.g. 20-30
minutes).
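A sketch of how this lifetime bound could be enforced on lookup (the helper names are made up; only the one-hour value comes from the proposal above)::

  import time

  MAX_CACHE_AGE = 3600.0  # proposed upper bound of one hour

  def cached_lookup(cache, group_uuid, refresh_fn, now=None):
    """Return a group's cached state, refreshing it once it expired."""
    if now is None:
      now = time.time()
    entry = cache.get(group_uuid)
    if entry is None or (now - entry["mtime"]) > MAX_CACHE_AGE:
      # refresh_fn stands in for whatever re-queries the node state
      entry = {"data": refresh_fn(group_uuid), "mtime": now}
      cache[group_uuid] = entry
    return entry["data"]
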
Cache usage
+++++++++++
The cache will be used by default for most queries (e.g. a Luxi call,
without locks, for the entire cluster). Since this will be a change from
the current behaviour, we'll need to allow non-cached responses,
e.g. via a ``--cache=off`` or similar argument (which will force the
query).
The cache will also be used for the iallocator runs, so that computing
the allocation solution can proceed independently from other jobs which lock
parts of the cluster. This is important as we need to separate
allocation on one group from exclusive blocking jobs on other node
groups.
The capacity calculations will also use the cache. This is detailed in
the respective sections.
Watcher operation
~~~~~~~~~~~~~~~~~
As detailed in the cluster cache section, the watcher also needs
improvements in order to scale with the cluster size.
As a first improvement, the proposal is to introduce a new OpCode/LU
pair that runs with locks held over the entire query sequence (the
current watcher runs a job with two opcodes, which grab and release the
locks individually). The new opcode will be called
``OpUpdateNodeGroupCache`` and will do the following:
- try to acquire all node/instance locks (to examine in more depth, and
possibly alter) in the given node group
- invalidate the cache for the node group
- acquire node and instance state (possibly via a new single RPC call
that combines node and instance information)
- update cache
- return the needed data
The reason for the per-node group query is that we don't want a busy
node group to prevent instance maintenance in other node
groups. Therefore, the watcher will introduce parallelism across node
groups, and it will be possible to have overlapping watcher runs. The new
execution sequence will be:
- the parent watcher process acquires global watcher lock
- query the list of node groups (lockless or very short locks only)
- fork N children, one for each node group
- release the global lock
- poll/wait for the children to finish
Each forked child will do the following:
- try to acquire the per-node group watcher lock
- if the lock cannot be acquired, exit with a special code telling the parent that the
node group is already being managed by a watcher process
- otherwise, submit an ``OpUpdateNodeGroupCache`` job
- get results (possibly after a long time, due to busy group)
- run the needed maintenance operations for the current group
This new mode of execution means that the master watcher processes might
overlap in running, but not the individual per-node group child
processes.
This change allows us to keep (almost) the same parallelism when using a
bigger cluster with node groups versus two separate clusters.
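The following sketch illustrates the parent/child flow described above; the helpers are trivial stand-ins for the real locking, job submission and maintenance code::

  import multiprocessing
  import sys

  EXIT_GROUP_BUSY = 11  # hypothetical exit code: group already handled

  def try_acquire_group_lock(group):      # stand-in for the per-group lock
    return True

  def run_update_group_cache_job(group):  # stand-in for the new opcode
    return {"group": group, "instances": []}

  def run_group_maintenance(group, results):  # stand-in for repairs
    pass

  def child(group):
    if not try_acquire_group_lock(group):
      sys.exit(EXIT_GROUP_BUSY)
    results = run_update_group_cache_job(group)
    run_group_maintenance(group, results)

  def parent(groups):
    # the global watcher lock is held only while the children are forked
    children = [multiprocessing.Process(target=child, args=(g,))
                for g in groups]
    for proc in children:
      proc.start()
    # global lock released here; then wait for all children to finish
    for proc in children:
      proc.join()

  if __name__ == "__main__":
    parent(["group-uuid-1", "group-uuid-2"])
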
Cost of periodic cache updating
+++++++++++++++++++++++++++++++
Currently the watcher only does “small” queries for the node and
instance state, and at first sight changing it to use the new OpCode
which populates the cache with the entire state might introduce
additional costs, which must be paid every five minutes.
However, the OpCodes that the watcher submits are using the so-called
dynamic fields (need to contact the remote nodes), and the LUs are not
selective—they always grab all the node and instance state. So in the
end, we have the same cost, it just becomes explicit rather than
implicit.
This “grab all node state” behaviour is what makes the cache worth
implementing.
Intra-node group scalability
++++++++++++++++++++++++++++
The design above only deals with inter-node group issues. It still makes
sense to run instance maintenance for nodes A and B if only node C is
locked (all being in the same node group).
This problem is commonly encountered in previous Ganeti versions, and it
should be handled similarly, by tweaking lock lifetime in long-duration
jobs.
TODO: add more ideas here.
State file maintenance
++++++++++++++++++++++
The splitting of node group maintenance to different children which will
run in parallel requires that the state file handling changes from
monolithic updates to partial ones.
There are two files that the watcher maintains:
- ``$LOCALSTATEDIR/lib/ganeti/watcher.data``, its internal state file,
used for deciding internal actions
- ``$LOCALSTATEDIR/run/ganeti/instance-status``, a file designed for
external consumption
For the first file, since it's used only internally to the watchers, we
can move to a per node group configuration.
For the second file, even if it's used as an external interface, we will
need to make some changes to it: because the different node groups can
return results at different times, we need to either split the file into
per-group files or keep the single file and add a per-instance timestamp
(currently the file holds only the instance name and state).
The proposal is that each child process maintains its own node group
file, and the master process will, right after querying the node group
list, delete any extra per-node group state file. This leaves the
consumers to run a simple ``cat instance-status.group-*`` to obtain the
entire list of instances and their states. If needed, the modification
timestamp of each file can be used to determine the age of the results.
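A small sketch of how a consumer could merge the per-group files and track the age of each group's results via the modification timestamp (file and directory names are illustrative)::

  import glob
  import os

  def merge_instance_status(directory="/var/run/ganeti"):
    """Concatenate all per-node-group instance status files."""
    pattern = os.path.join(directory, "instance-status.group-*")
    merged = []
    for path in sorted(glob.glob(pattern)):
      age = os.path.getmtime(path)  # when this group's data was written
      with open(path) as status_file:
        for line in status_file:
          if line.strip():
            merged.append((age, line.strip()))
    return merged
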
Capacity calculations
~~~~~~~~~~~~~~~~~~~~~
Currently, the capacity calculations are done completely outside
Ganeti. As explained in the current problems section, this needs to
account better for the cluster state changes.
Therefore a new OpCode will be introduced, ``OpComputeCapacity``, that
will either return the current capacity numbers (if available), or
trigger a new capacity calculation, via the iallocator framework, which
will get a new method called ``capacity``.
This method will feed the cluster state (for the complete set of node
groups, or alternatively just a subset) to the iallocator plugin (either
the specified one, or the default if none is specified), and return the
new capacity in the format currently exported by the htools suite and
known as the “tiered specs” (see :manpage:`hspace(1)`).
tspec cluster parameters
++++++++++++++++++++++++
Currently, the “tspec” calculations done in :command:`hspace` require
some additional parameters:
- maximum instance size
- type of instance storage
- maximum ratio of virtual CPUs per physical CPUs
- minimum disk free
For the integration in Ganeti, there are multiple ways to pass these:
- ignored by Ganeti, and being the responsibility of the iallocator
plugin whether to use these at all or not
- as input to the opcode
- as proper cluster parameters
Since the first option is not consistent with the intended changes, a
combination of the last two is proposed:
- at cluster level, we'll have cluster-wide defaults
- at node groups, we'll allow overriding the cluster defaults
- and if they are passed in via the opcode, they will override the
values for the current computation
Whenever the capacity is requested via different parameters, it will
invalidate the cache, even if otherwise the cache is up-to-date.
The new parameters are:
- max_inst_spec: (int, int, int), the maximum instance specification
accepted by this cluster or node group, in the order of memory, disk,
vcpus;
- default_template: string, the default disk template to use
- max_cpu_ratio: double, the maximum ratio of VCPUs/PCPUs
- max_disk_usage: double, the maximum disk usage (as a ratio)
These might also be used in instance creations (to be determined later,
after they are introduced).
OpCode details
++++++++++++++
Input:
- iallocator: string (optional, otherwise uses the cluster default)
- cached: boolean, optional, defaults to true, and denotes whether we
accept cached responses
- the above new parameters, optional; if they are passed, they will
overwrite all node groups' parameters
Output:
- cluster: list of tuples (memory, disk, vcpu, count), in decreasing
order of specifications; the first three members represent the
instance specification, the last one the count of how many instances
of this specification can be created on the cluster
- node_groups: a dictionary keyed by node group UUID, with values a
dictionary:
- tspecs: a list like the cluster one
- additionally, the new cluster parameters, denoting the input
parameters that were used for this node group
- ctime: the date the result has been computed; this represents the
oldest creation time amongst all node groups (so as to accurately
represent how much out-of-date the global response is)
Note that due to the way the tspecs are computed, for any given
specification, the total available count is the count for the given
entry, plus the sum of counts for higher specifications.
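As an example of this cumulative property, the total number of instances available at a given specification could be derived from the returned list as follows (a sketch with made-up numbers)::

  def total_available(tspecs, index):
    """Total count at tspecs[index], including all higher specifications.

    tspecs is a list of (memory, disk, vcpu, count) tuples in decreasing
    order of specification, as in the proposed opcode output.
    """
    return sum(count for _, _, _, count in tspecs[:index + 1])

  tspecs = [(8192, 51200, 4, 2), (4096, 20480, 2, 10)]
  assert total_available(tspecs, 0) == 2   # only the biggest spec, twice
  assert total_available(tspecs, 1) == 12  # 2 + 10 of the smaller spec
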
Node flags
----------
Current state and shortcomings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Currently all nodes are, from the point of view of their capabilities,
homogeneous. This means the cluster considers all nodes capable of
becoming master candidates, and of hosting instances.
This prevents some deployment scenarios: e.g. having a Ganeti instance
(in another cluster) be just a master candidate, in case all other
master candidates go down (but not, of course, host instances), or
having a node in a remote location just host instances but not become
master, etc.
Proposed changes
~~~~~~~~~~~~~~~~
Two new capability flags will be added to the node:
- master_capable, denoting whether the node can become a master
candidate or master
- vm_capable, denoting whether the node can host instances
In terms of the other flags, master_capable is a stronger version of
"not master candidate", and vm_capable is a stronger version of
"drained".
The master_capable flag will affect the auto-promotion code and node
modifications.
The vm_capable flag will affect the iallocator protocol, capacity
calculations, node checks in cluster verify, and will interact in novel
ways with locking (unfortunately).
It is envisaged that most nodes will be both vm_capable and
master_capable, and just a few will have one of these flags
removed. Ganeti itself will allow clearing of both flags, even though
this doesn't make much sense currently.
.. _jqueue-job-priority-design:
Job priorities
--------------
Current state and shortcomings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Currently all jobs and opcodes have the same priority. Once a job
started executing, its thread won't be released until all opcodes got
their locks and did their work. When a job is finished, the next job is
selected strictly by its incoming order. This does not mean jobs are run
in their incoming order—locks and other delays can cause them to be
stalled for some time.
In some situations, e.g. an emergency shutdown, one may want to run a
job as soon as possible. This is not possible currently if there are
pending jobs in the queue.
Proposed changes
~~~~~~~~~~~~~~~~
Each opcode will be assigned a priority on submission. Opcode priorities
are integers and the lower the number, the higher the opcode's priority
is. Within the same priority, jobs and opcodes are initially processed
in their incoming order.
Submitted opcodes can have one of the priorities listed below. Other
priorities are reserved for internal use. The absolute range is
-20..+19. Opcodes submitted without a priority (e.g. by older clients)
are assigned the default priority.
- High (-10)
- Normal (0, default)
- Low (+10)
As a change from the current model where executing a job blocks one
thread for the whole duration, the new job processor must return the job
to the queue after each opcode and also if it can't get all locks in a
reasonable timeframe. This will allow opcodes of higher priority
submitted in the meantime to be processed or opcodes of the same
priority to try to get their locks. When added to the job queue's
workerpool, the priority is determined by the first unprocessed opcode
in the job.
If an opcode is deferred, the job will go back to the "queued" status,
even though it's just waiting to try to acquire its locks again later.
If an opcode can not be processed after a certain number of retries or a
certain amount of time, it should increase its priority. This will avoid
starvation.
A job's priority can never go below -20. If a job hits priority -20, it
must acquire its locks in blocking mode.
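A sketch of this escalation rule, using the priority values above (the step size of one is an assumption; the design only requires that the priority increases over time and never goes below -20)::

  OP_PRIO_HIGHEST = -20   # absolute bounds from the design
  OP_PRIO_LOWEST = +19

  OP_PRIO_HIGH = -10
  OP_PRIO_NORMAL = 0      # default for clients not sending a priority
  OP_PRIO_LOW = +10

  def escalate_priority(current):
    """Raise an opcode's priority after it failed to get its locks."""
    return max(OP_PRIO_HIGHEST, current - 1)

  def must_block(current):
    """At priority -20 the locks are acquired in blocking mode."""
    return current <= OP_PRIO_HIGHEST
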
Opcode priorities are synchronised to disk in order to be restored after
a restart or crash of the master daemon.
Priorities also need to be considered inside the locking library to
ensure opcodes with higher priorities get locks first. See
:ref:`locking priorities <locking-priorities>` for more details.
Worker pool
+++++++++++
To support job priorities in the job queue, the worker pool underlying
the job queue must be enhanced to support task priorities. Currently
tasks are processed in the order they are added to the queue (but, due
to their nature, they don't necessarily finish in that order). All tasks
are equal. To support tasks with higher or lower priority, a few changes
have to be made to the queue inside a worker pool.
Each task is assigned a priority when added to the queue. This priority
can not be changed until the task is executed (this is fine as in all
current use-cases, tasks are added to a pool and then forgotten about
until they're done).
A task's priority can be compared to Unix' process priorities. The lower
the priority number, the closer to the queue's front it is. A task with
priority 0 is going to be run before one with priority 10. Tasks with
the same priority are executed in the order in which they were added.
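This behaviour maps naturally onto a heap-based queue; a minimal sketch (not the actual worker pool code)::

  import heapq
  import itertools

  class PriorityTaskQueue(object):
    """Tasks ordered by priority, FIFO within the same priority."""

    def __init__(self):
      self._heap = []
      self._counter = itertools.count()  # tie-breaker: insertion order

    def add_task(self, priority, task):
      heapq.heappush(self._heap, (priority, next(self._counter), task))

    def pop_task(self):
      priority, _, task = heapq.heappop(self._heap)
      return priority, task

  queue = PriorityTaskQueue()
  queue.add_task(10, "low priority")
  queue.add_task(0, "normal A")
  queue.add_task(0, "normal B")
  assert queue.pop_task() == (0, "normal A")  # lower number runs first
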
While a task is running it can query its own priority. If it's not ready
yet for finishing, it can raise an exception to defer itself, optionally
changing its own priority. This is useful for the following cases:
- A task is trying to acquire locks, but those locks are still held by
other tasks. By deferring itself, the task gives others a chance to
run. This is especially useful when all workers are busy.
- If a task decides it hasn't gotten its locks in a long time, it can
start to increase its own priority.
- Tasks waiting for long-running operations running asynchronously could
defer themselves while waiting for a long-running operation.
With these changes, the job queue will be able to implement per-job
priorities.
.. _locking-priorities:
Locking
+++++++
In order to support priorities in Ganeti's own lock classes,
``locking.SharedLock`` and ``locking.LockSet``, the internal structure
of the former class needs to be changed. The last major change in this
area was done for Ganeti 2.1 and can be found in the respective
:doc:`design document <design-2.1>`.
The plain list (``[]``) used as a queue is replaced by a heap queue,
similar to the `worker pool`_. The heap or priority queue does automatic
sorting, thereby automatically taking care of priorities. For each
priority there's a plain list with pending acquires, like the single
queue of pending acquires before this change.
When the lock is released, the code locates the list of pending acquires
for the highest priority waiting. The first condition (index 0) is
notified. Once all waiting threads received the notification, the
condition is removed from the list. If the list of conditions is empty
it's removed from the heap queue.
Like before, shared acquires are grouped and skip ahead of exclusive
acquires if there's already an existing shared acquire for a priority.
To accomplish this, a separate dictionary of shared acquires per
priority is maintained.
To simplify the code and reduce memory consumption, the concept of the
"active" and "inactive" condition for shared acquires is abolished. The
lock can't predict what priorities the next acquires will use and even
keeping a cache can become computationally expensive for arguable
benefit (the underlying POSIX pipe, see ``pipe(2)``, needs to be
re-created for each notification anyway).
The following diagram shows a possible state of the internal queue from
a high-level view. Conditions are shown as (waiting) threads. Assuming
no modifications are made to the queue (e.g. more acquires or timeouts),
the lock would be acquired by the threads in this order (concurrent
acquires in parentheses): ``threadE1``, ``threadE2``, (``threadS1``,
``threadS2``, ``threadS3``), (``threadS4``, ``threadS5``), ``threadE3``,
``threadS6``, ``threadE4``, ``threadE5``.
::
[
(0, [exc/threadE1, exc/threadE2, shr/threadS1/threadS2/threadS3]),
(2, [shr/threadS4/threadS5]),
(10, [exc/threadE3]),
(33, [shr/threadS6, exc/threadE4, exc/threadE5]),
]
IPv6 support
------------
Currently Ganeti does not support IPv6. This is true for nodes as well
as instances. Since IPv4 address exhaustion is drawing near, the need
for IPv6 support is increasing, especially given that bigger and
bigger clusters are supported.
Supported IPv6 setup
~~~~~~~~~~~~~~~~~~~~
In Ganeti 2.3 we introduce, in addition to the ordinary pure IPv4
setup, a hybrid IPv6/IPv4 mode. The latter works as follows:
- all nodes in a cluster have a primary IPv6 address
- the master has an IPv6 address
- all nodes **must** have a secondary IPv4 address
The reason for this hybrid setup is that key components that Ganeti
depends on do not or only partially support IPv6. More precisely, Xen
does not support instance migration via IPv6 in version 3.4 and 4.0.
Similarly, KVM does not support instance migration nor VNC access for
IPv6 at the time of this writing.
This led to the decision of not supporting pure IPv6 Ganeti clusters, as
very important cluster operations would not have been possible. Using
IPv4 as secondary address does not affect any of the goals
of the IPv6 support: since secondary addresses do not need to be
publicly accessible, they need not be globally unique. In other words,
one can practically use private IPv4 secondary addresses just for
intra-cluster communication without propagating them across layer 3
boundaries.
netutils: Utilities for handling common network tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Currently common utility functions are kept in the ``utils`` module.
Since this module grows bigger and bigger network-related functions are
moved to a separate module named *netutils*. Additionally all these
utilities will be IPv6-enabled.
Cluster initialization
~~~~~~~~~~~~~~~~~~~~~~
As mentioned above there will be two different setups in terms of IP
addressing: pure IPv4 and hybrid IPv6/IPv4. To choose between them, a
new cluster init parameter *--primary-ip-version* is introduced. This is
needed as a given name can resolve to both an IPv4 and IPv6 address on a
dual-stack host effectively making it impossible to infer that bit.
Once a cluster is initialized and the primary IP version chosen all
nodes that join have to conform to that setup. In the case of our
IPv6/IPv4 setup all nodes *must* have a secondary IPv4 address.
Furthermore we store the primary IP version in ssconf which is consulted
every time a daemon starts to determine the default bind address (either
*0.0.0.0* or *::*). In an IPv6/IPv4 setup we need to bind the Ganeti
daemon listening on network sockets to the IPv6 address.
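In pseudo-Python, the start-up decision described above amounts to the following (the helper name is made up)::

  import socket

  def default_bind_address(primary_ip_version):
    """Pick the wildcard bind address from the primary IP version."""
    if primary_ip_version == 6:
      return "::", socket.AF_INET6    # hybrid IPv6/IPv4 cluster
    return "0.0.0.0", socket.AF_INET  # pure IPv4 cluster

  address, family = default_bind_address(4)
  sock = socket.socket(family, socket.SOCK_STREAM)
  sock.bind((address, 0))  # port 0 only for this illustration
  sock.close()
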
Node addition
~~~~~~~~~~~~~
When adding a new node to an IPv6/IPv4 cluster it must have an IPv6
address to be used as primary and an IPv4 address used as secondary. As
explained above, every time a daemon is started we use the cluster
primary IP version to determine which address to bind to. The
only exception to this is when a node is added to the cluster. In this
case there is no ssconf available when noded is started and therefore
the correct address needs to be passed to it.
Name resolution
~~~~~~~~~~~~~~~
Since the gethostbyname*() functions do not support IPv6, name resolution
will be done by using the recommended getaddrinfo().
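For illustration, resolving a name in an address-family-agnostic way with getaddrinfo() looks like this::

  import socket

  def resolve(name, port=None):
    """Return all addresses a name resolves to, IPv6 and IPv4 alike."""
    # Unlike gethostbyname(), getaddrinfo() covers both address families.
    infos = socket.getaddrinfo(name, port, socket.AF_UNSPEC,
                               socket.SOCK_STREAM)
    return [sockaddr[0] for _, _, _, _, sockaddr in infos]

  print(resolve("localhost"))  # typically ['::1', '127.0.0.1']
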
IPv4-only components
~~~~~~~~~~~~~~~~~~~~
============================ =================== ====================
Component IPv6 Status Planned Version
============================ =================== ====================
Xen instance migration Not supported Xen 4.1: libxenlight
KVM instance migration Not supported Unknown
KVM VNC access Not supported Unknown
============================ =================== ====================
Privilege Separation
--------------------
Current state and shortcomings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In Ganeti 2.2 we introduced privilege separation for the RAPI daemon.
This was done directly in the daemon's code in the process of
daemonizing itself. Doing so leads to several potential issues. For
example, a file could be opened while the code is still running as
``root`` and for some reason not be closed again. Even after changing
the user ID, the file descriptor can be written to.
Implementation
~~~~~~~~~~~~~~
To address these shortcomings, daemons will be started under the target
user right away. The ``start-stop-daemon`` utility used to start daemons
supports the ``--chuid`` option to change user and group ID before
starting the executable.
The intermediate solution for the RAPI daemon from Ganeti 2.2 will be
removed again.
Files written by the daemons may need to have an explicit owner and
group set (easily done through ``utils.WriteFile``).
All SSH-related code is removed from the ``ganeti.bootstrap`` module and
core components and moved to a separate script. The core code will
simply assume a working SSH setup to be in place.
Security Domains
~~~~~~~~~~~~~~~~
In order to separate the permissions of file sets we separate them
into the following 3 overall security domain chunks:
1. Public: ``0755`` respectively ``0644``
2. Ganeti wide: shared between the daemons (gntdaemons)
3. Secret files: shared among a specific set of daemons/users
So for point 3 this table shows the correlation of the sets to groups
and their users:
=== ========== ============================== ==========================
Set Group Users Description
=== ========== ============================== ==========================
A gntrapi gntrapi, gntmasterd Share data between
gntrapi and gntmasterd
B gntadmins gntrapi, gntmasterd, *users* Shared between users who
needs to call gntmasterd
C gntconfd gntconfd, gntmasterd Share data between
gntconfd and gntmasterd
D gntmasterd gntmasterd masterd only; Currently
only to redistribute the
configuration, has access
to all files under
``lib/ganeti``
E gntdaemons gntmasterd, gntrapi, gntconfd Shared between the various
Ganeti daemons to exchange
data
=== ========== ============================== ==========================
Restricted commands
~~~~~~~~~~~~~~~~~~~
The following commands still require root permissions to fulfill their
functions:
::
gnt-cluster {init|destroy|command|copyfile|rename|masterfailover|renew-crypto}
gnt-node {add|remove}
gnt-instance {console}
Directory structure and permissions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here's how we propose to change the filesystem hierarchy and their
permissions.
Assuming it follows the defaults: ``gnt${daemon}`` for user and
the groups from the section `Security Domains`_::
${localstatedir}/lib/ganeti/ (0755; gntmasterd:gntmasterd)
cluster-domain-secret (0600; gntmasterd:gntmasterd)
config.data (0640; gntmasterd:gntconfd)
hmac.key (0440; gntmasterd:gntconfd)
known_host (0644; gntmasterd:gntmasterd)
queue/ (0700; gntmasterd:gntmasterd)
archive/ (0700; gntmasterd:gntmasterd)
* (0600; gntmasterd:gntmasterd)
* (0600; gntmasterd:gntmasterd)
rapi.pem (0440; gntrapi:gntrapi)
rapi_users (0640; gntrapi:gntrapi)
server.pem (0440; gntmasterd:gntmasterd)
ssconf_* (0444; root:gntmasterd)
uidpool/ (0750; root:gntmasterd)
watcher.data (0600; root:gntmasterd)
${localstatedir}/run/ganeti/ (0770; gntmasterd:gntdaemons)
socket/ (0750; gntmasterd:gntadmins)
ganeti-master (0770; gntmasterd:gntadmins)
${localstatedir}/log/ganeti/ (0770; gntmasterd:gntdaemons)
master-daemon.log (0600; gntmasterd:gntdaemons)
rapi-daemon.log (0600; gntrapi:gntdaemons)
conf-daemon.log (0600; gntconfd:gntdaemons)
node-daemon.log (0600; gntnoded:gntdaemons)
Feature changes
===============
External interface changes
==========================
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-2.4.rst 0000644 0000000 0000000 00000000377 12634264163 0016135 0 ustar 00root root 0000000 0000000 =================
Ganeti 2.4 design
=================
The following design documents have been implemented in Ganeti 2.4:
- :doc:`design-oob`
- :doc:`design-query2`
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-2.5.rst 0000644 0000000 0000000 00000000516 12634264163 0016131 0 ustar 00root root 0000000 0000000 =================
Ganeti 2.5 design
=================
The following design documents have been implemented in Ganeti 2.5:
- :doc:`design-lu-generated-jobs`
- :doc:`design-chained-jobs`
- :doc:`design-multi-reloc`
- :doc:`design-shared-storage`
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-2.6.rst 0000644 0000000 0000000 00000000414 12634264163 0016127 0 ustar 00root root 0000000 0000000 =================
Ganeti 2.6 design
=================
The following design documents have been implemented in Ganeti 2.6:
- :doc:`design-cpu-pinning`
- :doc:`design-ovf-support`
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-2.7.rst 0000644 0000000 0000000 00000001475 12634264163 0016140 0 ustar 00root root 0000000 0000000 =================
Ganeti 2.7 design
=================
The following design documents have been implemented in Ganeti 2.7:
- :doc:`design-bulk-create`
- :doc:`design-opportunistic-locking`
- :doc:`design-restricted-commands`
- :doc:`design-node-add`
- :doc:`design-virtual-clusters`
- :doc:`design-network`
- :doc:`design-linuxha`
- :doc:`design-shared-storage` (Updated to reflect the new ExtStorage
Interface)
The following designs have been partially implemented in Ganeti 2.7:
- :doc:`design-query-splitting`: only queries not needing RPC are
supported, through confd
- :doc:`design-partitioned`: only exclusive use of disks is implemented
- :doc:`design-monitoring-agent`: an example standalone DRBD data
collector is included
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-2.8.rst 0000644 0000000 0000000 00000001363 12634264163 0016135 0 ustar 00root root 0000000 0000000 =================
Ganeti 2.8 design
=================
The following design documents have been implemented in Ganeti 2.8:
- :doc:`design-reason-trail`
- :doc:`design-autorepair`
- :doc:`design-device-uuid-name`
The following designs have been partially implemented in Ganeti 2.8:
- :doc:`design-storagetypes`
- :doc:`design-hroller`
- :doc:`design-query-splitting`: everything except instance queries.
- :doc:`design-partitioned`: "Constrained instance sizes" implemented.
- :doc:`design-monitoring-agent`: implementation of all the core functionalities
of the monitoring agent. Reason trail implemented as part of the work for the
instance status collector.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-2.9.rst 0000644 0000000 0000000 00000000623 12634264163 0016134 0 ustar 00root root 0000000 0000000 =================
Ganeti 2.9 design
=================
The following design documents have been implemented in Ganeti 2.9.
- :doc:`design-hroller`
- :doc:`design-partitioned`
- :doc:`design-monitoring-agent`
- :doc:`design-reason-trail`
- :doc:`design-query-splitting`
- :doc:`design-device-uuid-name`
The following designs have been partially implemented in Ganeti 2.9.
- :doc:`design-storagetypes`
ganeti-2.15.2/doc/design-allocation-efficiency.rst 0000644 0000000 0000000 00000005555 12634264163 0022064 0 ustar 00root root 0000000 0000000 =========================================================================
Improving allocation efficiency by considering the total reserved memory
=========================================================================
This document describes a change to the cluster metric to enhance
the allocation efficiency of Ganeti's ``htools``.
.. contents:: :depth: 4
Current state and shortcomings
==============================
Ganeti's ``htools``, which typically make all allocation and balancing
decisions, greedily try to improve the cluster metric. So it is important
that the cluster metric faithfully reflects the objectives of these operations.
Currently the cluster metric is composed of counting violations (instances on
offline nodes, nodes that are not N+1 redundant, etc) and the sum of standard
deviations of relative resource usage of the individual nodes. The latter
component is to ensure that all nodes equally bear the load of the instances.
This is reasonable for resources where the total usage is independent of
its distribution, as it is the case for CPU, disk, and total RAM. It is,
however, not true for reserved memory. By distributing its secondaries
more widespread over the cluster, a node can reduce its reserved memory
without increasing it on other nodes. Not taking this aspect into account
has lead to quite inefficient allocation of instances on the cluster (see
example below).
Proposed changes
================
A new additive component is added to the cluster metric. It is the sum over
all nodes of the fraction of reserved memory. This way, moves and allocations
that reduce the amount of memory reserved to ensure N+1 redundancy are favored.
Note that this component does not have the scaling of standard deviations of
fractions, but instead counts nodes reserved for N+1 redundancy. In an ideal
allocation, this will not exceed 1. But bad allocations will violate this
property. As waste of reserved memory is a more future-oriented problem than,
e.g., current N+1 violations, we give the new component a relatively small
weight of 0.25, so that counting current violations still dominates.
Another consequence of this metric change is that the value 0 is no longer
obtainable: as soon as we have a DRBD instance, we have to reserve memory.
However, in most cases only differences of scores influence the decisions made.
In the few cases where absolute values of the cluster score are specified,
they are interpreted as relative to the theoretical minimum of the reserved
memory score.
Example
=======
Consider the capacity of an empty cluster of 6 nodes, each capable of holding
10 instances; this can be measured, e.g., by
``hspace --simulate=p,6,204801,10241,21 --disk-template=drbd
--standard-alloc=10240,1024,2``. Without the metric change 34 standard
instances are allocated. With the metric change, 48 standard instances
are allocated. This is a 41% increase in utilization.
ganeti-2.15.2/doc/design-autorepair.rst 0000644 0000000 0000000 00000037361 12634264163 0020010 0 ustar 00root root 0000000 0000000 ====================
Instance auto-repair
====================
.. contents:: :depth: 4
This is a design document detailing the implementation of self-repair and
recreation of instances in Ganeti. It also discusses ideas that might be useful
for more future self-repair situations.
Current state and shortcomings
==============================
Ganeti currently doesn't do any sort of self-repair or self-recreate of
instances:
- If a drbd instance is broken (its primary or secondary nodes go
offline or need to be drained) an admin or an external tool must fail
it over if necessary, and then trigger a disk replacement.
- If a plain instance is broken (or both nodes of a drbd instance are)
an admin or an external tool must recreate its disk and reinstall it.
Moreover in an oversubscribed cluster operations mentioned above might
fail for lack of capacity until a node is repaired or a new one added.
In this case an external tool would also need to go through any
"pending-recreate" or "pending-repair" instances and fix them.
Proposed changes
================
We'd like to increase the self-repair capabilities of Ganeti, at least
with regards to instances. In order to do so we plan to add mechanisms
to mark an instance as "due for being repaired" and then the relevant
repair to be performed as soon as it's possible, on the cluster.
The self repair will be written as part of ganeti-watcher or as an extra
watcher component that is called less often.
As the first version we'll only handle the case in which an instance
lives on an offline or drained node. In the future we may add more
self-repair capabilities for errors ganeti can detect.
New attributes (or tags)
------------------------
In order to know when to perform a self-repair operation we need to know
whether they are allowed by the cluster administrator.
This can be implemented as either new attributes or tags. Tags could be
acceptable as they would only be read and interpreted by the self-repair tool
(part of the watcher), and not by the ganeti core opcodes and node rpcs. The
following tags would be needed:
ganeti:watcher:autorepair:<type>
++++++++++++++++++++++++++++++++
(instance/nodegroup/cluster)
Allow repairs to happen on an instance that has the tag, or that lives
in a cluster or nodegroup which does. Types of repair are in order of
perceived risk, lower to higher, and each type includes allowing the
operations in the lower ones:
- ``fix-storage`` allows a disk replacement or another operation that
fixes the instance backend storage without affecting the instance
itself. This can for example recover from a broken drbd secondary, but
risks data loss if something is wrong on the primary but the secondary
was somehow recoverable.
- ``migrate`` allows an instance migration. This can recover from a
drained primary, but can cause an instance crash in some cases (bugs).
- ``failover`` allows instance reboot on the secondary. This can recover
from an offline primary, but the instance will lose its running state.
- ``reinstall`` allows disks to be recreated and an instance to be
reinstalled. This can recover from primary&secondary both being
offline, or from an offline primary in the case of non-redundant
instances. It causes data loss.
Each repair type allows all the operations in the previous types, in the
order above, in order to ensure a repair can be completed fully. As such
a repair of a lower type might not be able to proceed if it detects an
error condition that requires a more risky or drastic solution, but
never vice versa (if a worse solution is allowed then so is a better
one).
If there are multiple ``ganeti:watcher:autorepair:<type>`` tags in an
object (cluster, node group or instance), the least destructive tag
takes precedence. When multiplicity happens across objects, the nearest
tag wins. For example, if in a cluster with two instances, *I1* and
*I2*, *I1* has ``failover``, and the cluster itself has both
``fix-storage`` and ``reinstall``, *I1* will end up with ``failover``
and *I2* with ``fix-storage``.
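A sketch of this precedence rule (a simplified helper, not the actual watcher code)::

  # Repair types ordered from least to most destructive, as listed above.
  AUTOREPAIR_TYPES = ["fix-storage", "migrate", "failover", "reinstall"]
  TAG_PREFIX = "ganeti:watcher:autorepair:"

  def effective_repair_type(instance_tags, group_tags, cluster_tags):
    """Return the repair type applying to an instance, or None.

    The nearest object carrying any autorepair tag wins; within one
    object the least destructive type takes precedence.
    """
    for tags in (instance_tags, group_tags, cluster_tags):  # nearest first
      types = [t[len(TAG_PREFIX):] for t in tags
               if t.startswith(TAG_PREFIX)]
      types = [t for t in types if t in AUTOREPAIR_TYPES]
      if types:
        return min(types, key=AUTOREPAIR_TYPES.index)
    return None

  # The I1/I2 example above:
  cluster = [TAG_PREFIX + "fix-storage", TAG_PREFIX + "reinstall"]
  assert effective_repair_type([TAG_PREFIX + "failover"], [], cluster) == \
      "failover"
  assert effective_repair_type([], [], cluster) == "fix-storage"
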
ganeti:watcher:autorepair:suspend[:<timestamp>]
+++++++++++++++++++++++++++++++++++++++++++++++
(instance/nodegroup/cluster)
If this tag is encountered no autorepair operations will start for the
instance (or for any instance, if present at the cluster or group
level). Any job which already started will be allowed to finish, but
then the autorepair system will not proceed further until this tag is
removed, or the timestamp passes (in which case the tag will be removed
automatically by the watcher).
Note that depending on how this tag is used there might still be race
conditions related to it for an external tool that uses it
programmatically, as no "lock tag" or tag "test-and-set" operation is
present at this time. While this is known we won't solve these race
conditions in the first version.
It might also be useful to easily have an operation that tags all
instances matching a filter on some characteristic. But again, this
wouldn't be specific to this tag.
If there are multiple
``ganeti:watcher:autorepair:suspend[:<timestamp>]`` tags in an object,
the form without timestamp takes precedence (permanent suspension); or,
if all object tags have a timestamp, the one with the highest timestamp.
When multiplicity happens across objects, the nearest tag wins, as
above. This makes it possible to suspend cluster-enabled repairs with a
single tag in the cluster object; or to suspend them only for a certain
node group or instance. At the same time, it is possible to re-enable
cluster-suspended repairs in a particular instance or group by applying
an enable tag to them.
ganeti:watcher:autorepair:pending:<type>:<id>:<timestamp>:<jobs>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(instance)
If this tag is present a repair of type ``type`` is pending on the
target instance. This means that either jobs are being run, or it's
waiting for resource availability. ``id`` is the unique id identifying
this repair, ``timestamp`` is the time when this tag was first applied
to this instance for this ``id`` (we will "update" the tag by adding a
"new copy" of it and removing the old version as we run more jobs, but
the timestamp will never change for the same repair)
``jobs`` is the list of jobs already run or being run to repair the
instance (separated by a plus sign, *+*). If the instance has just
been put in pending state but no job has run yet, this list is empty.
This tag will be set by ganeti if an equivalent autorepair tag is
present and a repair is needed, or can be set by an external tool to
request a repair as a "once off".
If multiple instances of this tag are present they will be handled in
order of timestamp.
ganeti:watcher:autorepair:result:<type>:<id>:<timestamp>:<result>:<jobs>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(instance)
If this tag is present a repair of type ``type`` has been performed on
the instance and has been completed by ``timestamp``. The result is
either ``success``, ``failure`` or ``enoperm``, and jobs is a
*+*-separated list of jobs that were executed for this repair.
An ``enoperm`` result is returned when the repair was carried out as far
as possible, but the repair type does not allow it to proceed further.
Possible states, and transitions
--------------------------------
At any point an instance can be in one of the following health states:
Healthy
+++++++
The instance lives on only online nodes. The autorepair system will
never touch these instances. Any ``repair:pending`` tags will be removed
and marked ``success`` with no jobs attached to them.
This state can transition to:
- Needs-repair, repair disallowed (node offlined or drained, no
autorepair tag)
- Needs-repair, autorepair allowed (node offlined or drained, autorepair
tag present)
- Suspended (a suspend tag is added)
Suspended
+++++++++
Whenever a ``repair:suspend`` tag is added the autorepair code won't
touch the instance until the timestamp on the tag has passed, if
present. The tag will be removed afterwards (and the instance will
transition to its correct state, depending on its health and other
tags).
Note that when an instance is suspended any pending repair is
interrupted, but jobs which were submitted before the suspension are
allowed to finish.
Needs-repair, repair disallowed
+++++++++++++++++++++++++++++++
The instance lives on an offline or drained node, but no autorepair tag
is set, or the autorepair tag set is of a type not powerful enough to
finish the repair. The autorepair system will never touch these
instances, and they can transition to:
- Healthy (manual repair)
- Pending repair (a ``repair:pending`` tag is added)
- Needs-repair, repair allowed always (an autorepair always tag is added)
- Suspended (a suspend tag is added)
Needs-repair, repair allowed always
+++++++++++++++++++++++++++++++++++
A ``repair:pending`` tag is added, and the instance transitions to the
Pending Repair state. The autorepair tag is preserved.
Of course if a ``repair:suspended`` tag is found no pending tag will be
added, and the instance will instead transition to the Suspended state.
Pending repair
++++++++++++++
When an instance is in this stage the following will happen:
If a ``repair:suspended`` tag is found the instance won't be touched and
moved to the Suspended state. Any jobs which were already running will
be left untouched.
If there are still jobs running related to the instance and scheduled by
this repair they will be given more time to run, and the instance will
be checked again later. The state transitions to itself.
If no jobs are running and the instance is detected to be healthy, the
``repair:result`` tag will be added, and the current active
``repair:pending`` tag will be removed. It will then transition to the
Healthy state if there are no ``repair:pending`` tags, or to the Pending
state otherwise: there, the instance being healthy, those tags will be
resolved without any operation as well (note that this is the same as
transitioning to the Healthy state, where ``repair:pending`` tags would
also be resolved).
If no jobs are running and the instance still has issues:
- if the last job(s) failed it can either be retried a few times, if
deemed to be safe, or the repair can transition to the Failed state.
The ``repair:result`` tag will be added, and the active
``repair:pending`` tag will be removed (further ``repair:pending``
tags will not be able to proceed, as explained by the Failed state,
until the failure state is cleared)
- if the last job(s) succeeded but there are not enough resources to
proceed, the state will transition to itself and no jobs are
scheduled. The tag is left untouched (and later checked again). This
basically just delays any repairs, the current ``pending`` tag stays
active, and any others are untouched.
- if the last job(s) succeeded but the repair type cannot allow to
proceed any further the ``repair:result`` tag is added with an
``enoperm`` result, and the current ``repair:pending`` tag is removed.
The instance is now back to "Needs-repair, repair disallowed",
"Needs-repair, autorepair allowed", or "Pending" if there is already a
future tag that can repair the instance.
- if the last job(s) succeeded and the repair can continue new job(s)
can be submitted, and the ``repair:pending`` tag can be updated.
Failed
++++++
If repairing an instance has failed a ``repair:result:failure`` is
added. The presence of this tag is used to detect that an instance is in
this state, and it will not be touched until the failure is investigated
and the tag is removed.
An external tool or person needs to investigate the state of the
instance and remove this tag when he is sure the instance is repaired
and safe to turn back to the normal autorepair system.
(Alternatively we can use the suspended state (indefinitely or
temporarily) to mark the instance as "not touch" when we think a human
needs to look at it. To be decided).
A graph with the possible transitions follows; note that in the graph,
following the implementation, the two ``Needs repair`` states have been
coalesced into one; and the ``Suspended`` state disappears, for it
becomes an attribute of the instance object (its auto-repair policy).
.. digraph:: "auto-repair-states"
node [shape=circle, style=filled, fillcolor="#BEDEF1",
width=2, fixedsize=true];
healthy [label="Healthy"];
needsrep [label="Needs repair"];
pendrep [label="Pending repair"];
failed [label="Failed repair"];
disabled [label="(no state)", width=1.25];
{rank=same; needsrep}
{rank=same; healthy}
{rank=same; pendrep}
{rank=same; failed}
{rank=same; disabled}
// These nodes are needed to be the "origin" of the "initial state" arrows.
node [width=.5, label="", style=invis];
inih;
inin;
inip;
inif;
inix;
edge [fontsize=10, fontname="Arial Bold", fontcolor=blue]
inih -> healthy [label="No tags or\nresult:success"];
inip -> pendrep [label="Tag:\nautorepair:pending"];
inif -> failed [label="Tag:\nresult:failure"];
inix -> disabled [fontcolor=black, label="ArNotEnabled"];
edge [fontcolor="orange"];
healthy -> healthy [label="No problems\ndetected"];
healthy -> needsrep [
label="Brokeness\ndetected in\nfirst half of\nthe tool run"];
pendrep -> healthy [
label="All jobs\ncompleted\nsuccessfully /\ninstance healthy"];
pendrep -> failed [label="Some job(s)\nfailed"];
edge [fontcolor="red"];
needsrep -> pendrep [
label="Repair\nallowed and\ninitial job(s)\nsubmitted"];
needsrep -> needsrep [
label="Repairs suspended\n(no-op) or enabled\nbut not powerful enough\n(result: enoperm)"];
pendrep -> pendrep [label="More jobs\nsubmitted"];
Repair operation
----------------
Possible repairs are:
- Replace-disks (drbd, if the secondary is down), (or other storage
specific fixes)
- Migrate (shared storage, rbd, drbd, if the primary is drained)
- Failover (shared storage, rbd, drbd, if the primary is down)
- Recreate disks + reinstall (all nodes down, plain, files or drbd)
Note that more than one of these operations may need to happen before a
full repair is completed (eg. if a drbd primary goes offline first a
failover will happen, then a replace-disks).
The self-repair tool will first take care of all needs-repair instance
that can be brought into ``pending`` state, and transition them as
described above.
Then it will go through any ``repair:pending`` instances and handle them
as described above.
Note that the repair tool MAY "group" instances by performing common
repair jobs for them (eg: node evacuate).
Staging of work
---------------
First version: recreate-disks + reinstall (2.6.1)
Second version: failover and migrate repairs (2.7)
Third version: replace disks repair (2.7 or 2.8)
Future work
===========
One important piece of work will be reporting what the autorepair system
is "thinking" and exporting this in a form that can be read by an
outside user or system. In order to do this we need a better
communication system than embedding this information into tags. This
should be thought in an extensible way that can be used in general for
Ganeti to provide "advisory" information about entities it manages, and
for an external system to "advise" ganeti over what it can do, but in a
less direct manner than submitting individual jobs.
Note that cluster verify checks some errors that are actually instance
specific, (eg. a missing backend disk on a drbd node) or node-specific
(eg. an extra lvm device). If we were to split these into "instance
verify", "node verify" and "cluster verify", then we could easily use
this tool to perform some of those repairs as well.
Finally self-repairs could also be extended to the cluster level, for
example concepts like "N+1 failures", missing master candidates, etc. or
node level for some specific types of errors.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-bulk-create.rst 0000644 0000000 0000000 00000006360 12634264163 0020026 0 ustar 00root root 0000000 0000000 ==================
Ganeti Bulk Create
==================
.. contents:: :depth: 4
.. highlight:: python
Current state and shortcomings
==============================
Creation of instances happens a lot. A fair load is done by just
creating instances and due to bad allocation shifting them around later
again. Additionally, if you turn up a new cluster you already know a
bunch of instances, which need to exists on the cluster. Doing this
one-by-one is not only cumbersome but might also fail, due to lack of
resources or lead to badly balanced clusters.
Since the early Ganeti 2.0 alpha version there is a ``gnt-instance
batch-create`` command to allocate a bunch of instances based on a json
file. This feature, however, doesn't take any advantage of the iallocator
and submits jobs in a serialized manner.
Proposed changes
----------------
To overcome this shortcoming we would extend the current iallocator
interface to allow bulk requests. On the Ganeti side, a new opcode is
introduced to handle the bulk creation and return the resulting
placement from the IAllocator_.
Problems
--------
Due to the design of chained jobs, we can guarantee that, with the state
at which the ``multi-alloc`` opcode is run, all of the instances will
fit (or all won't). But we can't guarantee that once the instance
creation requests were submitted, no other jobs have sneaked in between.
This might still lead to failing jobs because the resources have changed
in the meantime.
Implementation
==============
IAllocator
----------
A new additional ``type`` will be added called ``multi-allocate`` to
distinguish between normal and bulk operation. For the bulk operation
the ``request`` will be a finite list of request dicts.
If ``multi-allocate`` is declared, ``request`` must exist and is a list
of ``request`` dicts as described in :doc:`Operation specific input
<iallocator>`. The ``result`` then is a list of instance name and node
placements in the order of the ``request`` field.
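An illustrative ``multi-allocate`` request, written here as the Python dictionary that would be serialised into the iallocator input; the per-instance fields are only loosely modelled on the existing ``allocate`` request and are not authoritative::

  bulk_request = {
    "type": "multi-allocate",
    "request": [
      # each element is an ordinary allocation request dict
      {"name": "inst1.example.com", "memory": 1024, "vcpus": 1,
       "disk_template": "drbd", "required_nodes": 2},
      {"name": "inst2.example.com", "memory": 2048, "vcpus": 2,
       "disk_template": "plain", "required_nodes": 1},
    ],
  }
  # The expected result would then be a list of (instance name, node
  # placement) entries in the same order as the "request" list.
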
In addition, the old ``allocate`` request type will be deprecated and,
at the latest in Ganeti 2.8, incorporated into this new request. Current
code will need slight adaptation to work with the new request. This needs
careful testing.
OpInstanceBulkAdd
-----------------
We add a new opcode ``OpInstanceBulkAdd``. It receives a list of
``OpInstanceCreate`` on the ``instances`` field. This is done to make
sure that these two loosely coupled opcodes do not get out of sync. On
the RAPI side, however, this just is a list of instance create
definitions. And the client is adapted accordingly.
The opcode itself does some sanity checks on the instance creation
opcodes, which include:
* ``mode`` is not set
* ``pnode`` and ``snodes`` are not set
* ``iallocator`` is not set
Any of the above errors will cause the opcode to be aborted with
``OpPrereqError``. Once the
list has been verified it is handed to the ``iallocator`` as described
in IAllocator_. Upon success we then return the result of the
IAllocator_ call.
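A sketch of that sanity check (``OpPrereqError`` is stubbed locally; real opcode handling is more involved)::

  class OpPrereqError(Exception):
    """Local stand-in for Ganeti's real exception class."""

  def check_bulk_create(create_opcodes):
    """Reject create opcodes that would bypass the bulk allocation."""
    for op in create_opcodes:
      for attr in ("mode", "pnode", "snodes", "iallocator"):
        if getattr(op, attr, None) is not None:
          raise OpPrereqError("%s must not be set for bulk creation" % attr)
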
At this point the current instance allocation would work with the
resources available on the cluster as perceived upon
``OpInstanceBulkAdd`` invocation. However, there might be corner cases
where this is not true as described in Problems_.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-ceph-ganeti-support.rst 0000644 0000000 0000000 00000014611 12634264163 0021524 0 ustar 00root root 0000000 0000000 ============================
RADOS/Ceph support in Ganeti
============================
.. contents:: :depth: 4
Objective
=========
The project aims to improve Ceph RBD support in Ganeti. It can be
primarily divided into following tasks.
- Use Qemu/KVM RBD driver to provide instances with direct RBD
support. [implemented as of Ganeti 2.10]
- Allow Ceph RBDs' configuration through Ganeti. [unimplemented]
- Write a data collector to monitor Ceph nodes. [unimplemented]
Background
==========
Ceph RBD
--------
Ceph is a distributed storage system which provides data access as
files, objects and blocks. As part of this project, we're interested in
integrating ceph's block device (RBD) directly with Qemu/KVM.
Primary components/daemons of Ceph.
- Monitor - Serve as authentication point for clients.
- Metadata - Store all the filesystem metadata (Not configured here as
they are not required for RBD)
- OSD - Object storage devices. One daemon for each drive/location.
RBD support in Ganeti
---------------------
Currently, Ganeti supports RBD volumes on a pre-configured Ceph cluster.
This is enabled through RBD disk templates. These templates allow RBD
volume's access through RBD Linux driver. The volumes are mapped to host
as local block devices which are then attached to the instances. This
method incurs an additional overhead. We plan to resolve it by using
Qemu's RBD driver to enable direct access to RBD volumes for KVM
instances.
Also, Ganeti currently uses RBD volumes on a pre-configured ceph cluster.
Allowing configuration of ceph nodes through Ganeti will be a good
addition to its prime features.
Qemu/KVM Direct RBD Integration
===============================
A new disk param ``access`` is introduced. It's added at
cluster/node-group level to simplify prototype implementation.
It will specify the access method either as ``userspace`` or
``kernelspace``. It's accessible to StartInstance() in hv_kvm.py. The
device path, ``rbd:<pool>/<image>``, is generated by RADOSBlockDevice
and is added to the params dictionary as ``kvm_dev_path``.
This approach ensures that no disk template specific changes are
required in hv_kvm.py, allowing easy integration of other distributed
storage systems (like Gluster).
Note that the RBD volume is mapped as a local block device as before.
The local mapping won't be used during instance operation in the
``userspace`` access mode, but can be used by administrators and OS
scripts.
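A sketch of how the new disk parameter could be consumed when building the device path for KVM (names are illustrative, not the actual hv_kvm code)::

  def kvm_disk_path(access, pool, image, local_dev):
    """Return the path QEMU should open for an RBD-backed disk.

    In "userspace" mode QEMU talks to the Ceph cluster directly via its
    RBD driver; in "kernelspace" mode the existing local block device
    mapping is used instead.
    """
    if access == "userspace":
      return "rbd:%s/%s" % (pool, image)
    return local_dev  # e.g. /dev/rbd0, mapped by the kernel RBD driver

  assert kvm_disk_path("userspace", "pool", "disk0",
                       "/dev/rbd0") == "rbd:pool/disk0"
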
Updated commands
----------------
::
$ gnt-instance info
``access:userspace/kernelspace`` will be added to the Disks category. This
output applies to KVM based instances only.
Ceph configuration on Ganeti nodes
==================================
This document proposes configuration of distributed storage
pool (Ceph or Gluster) through ganeti. Currently, this design document
focuses on configuring a Ceph cluster. A prerequisite of this setup
would be installation of ceph packages on all the concerned nodes.
At Ganeti Cluster init, the user will set distributed-storage specific
options which will be stored at cluster level. The Storage cluster
will be initialized using ``gnt-storage``. For the prototype, only a
single storage pool/node-group is configured.
Following steps take place when a node-group is initialized as a storage
cluster.
- Check for an existing ceph cluster through /etc/ceph/ceph.conf file
on each node.
- Fetch cluster configuration parameters and create a distributed
storage object accordingly.
- Issue an 'init distributed storage' RPC to group nodes (if any).
- On each node, ``ceph`` cli tool will run appropriate services.
- Mark nodes as well as the node-group as distributed-storage-enabled.
The storage cluster will operate at a node-group level. The ceph
cluster will be initiated using gnt-storage. A new sub-command
``init-distributed-storage`` will be added to it.
The configuration of the nodes will be handled through an init function
called by the node daemons running on the respective nodes. A new RPC is
introduced to handle the calls.
A new object will be created to send the storage parameters to the node
- storage_type, devices, node_role (mon/osd) etc.
A new node can be directly assigned to the storage enabled node-group.
During the 'gnt-node add' process, required ceph daemons will be started
and the node will be added to the ceph cluster.
Only an offline node can be assigned to storage enabled node-group.
``gnt-node --readd`` needs to be performed to issue RPCs for spawning
appropriate services on the newly assigned node.
Updated Commands
----------------
Following are the affected commands.::
$ gnt-cluster init -S ceph:disk=/dev/sdb,option=value...
During cluster initialization, ceph specific options are provided which
apply at cluster-level.::
$ gnt-cluster modify -S ceph:option=value2...
For now, cluster modification will be allowed when there is no
initialized storage cluster.::
$ gnt-storage init-distributed-storage -s{--storage-type} ceph \
Ensure that no other node-group is configured as distributed storage
cluster and configure ceph on the specified node-group. If there is no
node in the node-group, it'll only be marked as distributed storage
enabled and no action will be taken.::
$ gnt-group assign-nodes
It ensures that the node is offline if the node-group specified is
distributed storage capable. Ceph configuration on the newly assigned
node is not performed at this step.::
$ gnt-node --offline
If the node is part of storage node-group, an offline call will stop/remove
ceph daemons.::
$ gnt-node add --readd
If the node is now part of the storage node-group, issue init
distributed storage RPC to the respective node. This step is required
after assigning a node to the storage enabled node-group::
$ gnt-node remove
A warning will be issued stating that the node is part of distributed
storage, mark it offline before removal.
Data collector for Ceph
-----------------------
TBD
Future Work
-----------
Due to the loopback bug in ceph, one may run into daemon hang issues
while performing writes to an RBD volume through the block device mapping.
This bug is applicable only when the RBD volume is stored on the OSD
running on the local node. In order to mitigate this issue, we can
create storage pools on different nodegroups and access RBD
volumes on different pools.
http://tracker.ceph.com/issues/3076
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-chained-jobs.rst 0000644 0000000 0000000 00000020303 12634264163 0020147 0 ustar 00root root 0000000 0000000 ============
Chained jobs
============
.. contents:: :depth: 4
This is a design document about the innards of Ganeti's job processing.
Readers are advised to study previous design documents on the topic:
- :ref:`Original job queue `
- :ref:`Job priorities `
- :doc:`LU-generated jobs `
Current state and shortcomings
==============================
Ever since the introduction of the job queue with Ganeti 2.0 there have
been situations where we wanted to run several jobs in a specific order.
Due to the job queue's current design, such a guarantee can not be
given. Jobs are run according to their priority, their ability to
acquire all necessary locks and other factors.
One way to work around this limitation is to do some kind of job
grouping in the client code. Once all jobs of a group have finished, the
next group is submitted and waited for. There are different kinds of
clients for Ganeti, some of which don't share code (e.g. Python clients
vs. htools). This design proposes a solution which would be implemented
as part of the job queue in the master daemon.
Proposed changes
================
With the implementation of :ref:`job priorities
` the processing code was re-architected
and became a lot more versatile. It now returns jobs to the queue in
case the locks for an opcode can't be acquired, allowing other
jobs/opcodes to be run in the meantime.
The proposal is to add a new, optional property to opcodes to define
dependencies on other jobs. Job X could define opcodes with a dependency
on the success of job Y and would only be run once job Y is finished. If
there's a dependency on success and job Y failed, job X would fail as
well. Since such dependencies would use job IDs, the jobs still need to
be submitted in the right order.
.. pyassert::
  # Update description below if finalized job status change
  constants.JOBS_FINALIZED == frozenset([
    constants.JOB_STATUS_CANCELED,
    constants.JOB_STATUS_SUCCESS,
    constants.JOB_STATUS_ERROR,
    ])
The new attribute's value would be a list of two-valued tuples. Each
tuple contains a job ID and a list of requested statuses for the job
depended upon. Only final statuses are accepted
(:pyeval:`utils.CommaJoin(constants.JOBS_FINALIZED)`). An empty list is
equivalent to specifying all final statuses (except
:pyeval:`constants.JOB_STATUS_CANCELED`, which is treated specially).
An opcode runs only once all its dependency requirements have been
fulfilled.
Any job referring to a cancelled job is also cancelled unless it
explicitly lists :pyeval:`constants.JOB_STATUS_CANCELED` as a requested
status.
In case a referenced job can not be found in the normal queue or the
archive, referring jobs fail as the status of the referenced job can't
be determined.
With this change, clients can submit all wanted jobs in the right order
and proceed to wait for changes on all these jobs (see
``cli.JobExecutor``). The master daemon will take care of executing them
in the right order, while still presenting the client with a simple
interface.
Clients using the ``SubmitManyJobs`` interface can use relative job IDs
(negative integers) to refer to jobs in the same submission.
.. highlight:: javascript
Example data structures::
  # First job
  {
    "job_id": "6151",
    "ops": [
      { "OP_ID": "OP_INSTANCE_REPLACE_DISKS", ..., },
      { "OP_ID": "OP_INSTANCE_FAILOVER", ..., },
      ],
  }

  # Second job, runs in parallel with first job
  {
    "job_id": "7687",
    "ops": [
      { "OP_ID": "OP_INSTANCE_MIGRATE", ..., },
      ],
  }

  # Third job, depending on success of previous jobs
  {
    "job_id": "9218",
    "ops": [
      { "OP_ID": "OP_NODE_SET_PARAMS",
        "depend": [
          [6151, ["success"]],
          [7687, ["success"]],
          ],
        "offline": True, },
      ],
  }
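Relative job IDs (as mentioned above for ``SubmitManyJobs``) would allow
the third job to express the same dependencies without knowing absolute
job IDs in advance. The following variant is illustrative only and assumes
that ``-1`` refers to the job submitted immediately before the current one
and ``-2`` to the one before that::

  # Third job, submitted together with the two previous jobs in a
  # single SubmitManyJobs request
  {
    "ops": [
      { "OP_ID": "OP_NODE_SET_PARAMS",
        "depend": [
          [-2, ["success"]],
          [-1, ["success"]],
          ],
        "offline": True, },
      ],
  }
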
Implementation details
----------------------
Status while waiting for dependencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jobs waiting for dependencies are certainly not in the queue anymore and
therefore need to change their status from "queued". While waiting for
opcode locks the job is in the "waiting" status (the constant is named
``JOB_STATUS_WAITLOCK``, but the actual value is ``waiting``). There are the
following possibilities:
#. Introduce a new status, e.g. "waitdeps".
Pro:
- Clients know for sure a job is waiting for dependencies, not locks
Con:
- Code and tests would have to be updated/extended for the new status
- List of possible state transitions certainly wouldn't get simpler
- Breaks backwards compatibility, older clients might get confused
#. Use existing "waiting" status.
Pro:
- No client changes necessary, less code churn (note that there are
clients which don't live in Ganeti core)
- Clients don't need to know the difference between waiting for a job
and waiting for a lock; it doesn't make a difference
- Fewer state transitions (see commit ``5fd6b69479c0``, which removed
many state transitions and disk writes)
Con:
- Not immediately visible what a job is waiting for, but it's the
same issue with locks; this is the reason why the lock monitor
(``gnt-debug locks``) was introduced; job dependencies can be shown
as "locks" in the monitor
Based on these arguments, the proposal is to do the following:
- Rename ``JOB_STATUS_WAITLOCK`` constant to ``JOB_STATUS_WAITING`` to
reflect its actual meaning: the job is waiting for something
- While waiting for dependencies and locks, jobs are in the "waiting"
status
- Export dependency information in lock monitor; example output::
  Name      Mode Owner Pending
  job/27491 -    -     success:job/34709,job/21459
  job/21459 -    -     success,error:job/14513
Cost of deserialization
~~~~~~~~~~~~~~~~~~~~~~~
To determine the status of a dependency job the job queue must have
access to its data structure. Other queue operations already do this,
e.g. archiving, watching a job's progress and querying jobs.
Initially (Ganeti 2.0/2.1) the job queue shared the job objects
in memory and protected them using locks. Ganeti 2.2 (see :doc:`design
document `) changed the queue to read and deserialize jobs
from disk. This significantly reduced locking and code complexity.
Nowadays inotify is used to wait for changes on job files when watching
a job's progress.
Reading from disk and deserializing certainly has some cost associated
with it, but it's a significantly simpler architecture than
synchronizing in memory with locks. At the stage where dependencies are
evaluated the queue lock is held in shared mode, so different workers
can read at the same time (deliberately ignoring CPython's interpreter
lock).
It is expected that the majority of executed jobs won't use
dependencies and therefore won't be affected.
Other discussed solutions
=========================
Job-level attribute
-------------------
At a first look it might seem to be better to put dependencies on
previous jobs at a job level. However, it turns out that having the
option of defining only a single opcode in a job as having such a
dependency can be useful as well. The code complexity in the job queue
is equivalent if not simpler.
Since opcodes are guaranteed to run in order, clients can just define
the dependency on the first opcode.
Another reason for the choice of an opcode-level attribute is that the
current LUXI interface for submitting jobs is a bit restricted and would
need to be changed to allow the addition of job-level attributes,
potentially requiring changes in all LUXI clients and/or breaking
backwards compatibility.
Client-side logic
-----------------
There's at least one implementation of a batched job executor twisted
into the ``burnin`` tool's code. While certainly possible, a client-side
solution should be avoided due to the different clients already in use.
For one, the :doc:`remote API ` client shouldn't import
non-standard modules. htools are written in Haskell and can't use Python
modules. A batched job executor contains quite some logic. Even if
cleanly abstracted in a (Python) library, sharing code between different
clients is difficult if not impossible.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-cmdlib-unittests.rst 0000644 0000000 0000000 00000015134 12634264163 0021121 0 ustar 00root root 0000000 0000000 =====================================
Unit tests for cmdlib / LogicalUnit's
=====================================
.. contents:: :depth: 4
This is a design document describing unit tests for the cmdlib module.
Other modules are deliberately omitted, as LU's contain the most complex
logic and are only sparingly tested.
Current state and shortcomings
==============================
The current test coverage of the cmdlib module is at only ~14%. Given
the complexity of the code this is clearly too little.
The reasons for this low coverage are numerous. There are organisational
reasons, like no strict requirements for unit tests for each feature.
But there are also design and technical reasons, which this design
document wants to address. First, it's not clear which parts of LU's
should be tested by unit tests, i.e. the test boundaries are not clearly
defined. And secondly, it's too hard to actually write unit tests for
LU's. There exists no good framework or set of tools to write easy to
understand and concise tests.
Proposed changes
================
This design document consists of two parts. Initially, the test
boundaries for cmdlib are laid out, and considerations about writing
unit tests are given. Then the test framework is described, together
with a rough overview of the individual parts and how they are meant
to be used.
Test boundaries
---------------
For the cmdlib module, every LogicalUnit is seen as a unit for testing.
Unit tests for LU's may only execute the LU but make sure that no side
effect (like filesystem access, network access or the like) takes
place. Smaller test units (like individual methods) are sensible and
will be supported by the test framework. However, they are not the main
scope of this document.
LU's require the following environment to be provided by the test code
in order to be executed:
An input opcode
LU's get all the user provided input and parameters from the opcode.
The command processor
Used to get the execution context id and to output logging messages.
It also drives the execution of LU's by calling the appropriate
methods in the right order.
The Ganeti context
Provides node-management methods and contains
* The configuration. This gives access to the cluster configuration.
* The Ganeti Lock Manager. Manages locks during the execution.
The RPC runner
Used to communicate with node daemons on other nodes and to perform
operations on them.
The IAllocator runner
Calls the IAllocator with a given request.
All of those components have to be replaced/adapted by the test
framework.
The goal of unit tests at the LU level is to exercise every possible
code path in the LU at least once. Shared methods which are used by
multiple LU's should be made testable by themselves and explicit unit
tests should be written for them.
Ultimately, the code coverage for the cmdlib module should be higher
than 90%. As Python is a dynamic language, a portion of those tests
only exists to exercise the code without actually asserting for
anything in the test. They merely make sure that no type errors exist
and that potential typos etc. are caught at unit test time.
Test framework
--------------
The test framework will make it possible to write short and concise
tests for LU's. In the simplest case, only an opcode has to be provided
by the test. The framework will then use default values, like an almost
empty configuration with only the master node and no instances.
All aspects of the test environment will be configurable by individual
tests.
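To illustrate the intended level of conciseness, a test for a simple LU
might look roughly like the following. All class names and helper methods
are hypothetical and only meant to convey the idea, not to describe an
existing API::

  class TestLUInstanceStartup(CmdlibTestCase):
    def testStartupDownInstance(self):
      # the default mocked configuration contains only the master node;
      # add a single instance to it
      inst = self.cfg.AddNewInstance()
      op = opcodes.OpInstanceStartup(instance_name=inst.name)

      self.ExecOpCode(op)

      # assert on the mocked RPC runner and the captured log output
      self.rpc.AssertCallCount("call_instance_start", 1)
      self.mcpu.assertLogIsEmpty()
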
MCPU mocking
************
The MCPU drives the execution of LU's. It has to perform its usual
sequence of actions, but additionally it has to provide easy access to
the log output of LU's. It will contain utility assertion methods on the
output.
The mock will be a sub-class of ``mcpu.Processor`` which overrides
portions of it in order to support the additional functionality. The
advantage of being a sub-class of the original processor is the
automatic compatibility with the code running in real clusters.
Configuration mocking
*********************
Per default, the mocked configuration will contain only the master node,
no instances and default parameters. However, convenience methods for
the following use cases will be provided:
- "Shortcut" methods to add objects to the configuration.
- Helper methods to quickly create standard nodes/instances/etc.
- Pre-populated default configurations for standard use-cases (i.e.
cluster with three nodes, five instances, etc.).
- Convenience assertion methods for checking the configuration.
Lock mocking
************
Initially, the mocked lock manager always grants all locks. It performs
the following tasks:
- It keeps track of requested/released locks.
- Provides utility assertion methods for checking locks (current and
already released ones).
In the future, this component might be extended to prevent locks from
being granted. This could eventually be used to test optimistic locking.
RPC mocking
***********
No actual RPC can be made during unit tests. Therefore, those calls have
to be replaced and their results mocked. As this will entail a large
portion of work when writing tests, mocking RPC's will be made as easy as
possible. This entails:
- Easy construction of RPC results.
- Easy mocking of RPC calls (also multiple ones of the same type during
one LU execution).
- Asserting for RPC calls (including arguments, affected nodes, etc.).
IAllocator mocking
******************
Calls (also multiple ones during the execution of a LU) to the
IAllocator interface have to be mocked. The framework will provide,
similarly to the RPC mocking, provide means to specify the mocked result
and to assert on the IAllocator requests.
Future work
===========
With unit tests for cmdlib in place, further unit testing for other
modules can and should be added. The test boundaries therefore should be
aligned with the boundaries from cmdlib.
The mocked locking module can be extended to allow testing of optimistic
locking in LU's. In this case, not all requested locks are actually
granted to the LU, so it has to adapt for this situation correctly.
A higher test coverage for LU's will increase confidence in our code and
tests. Refactorings will be easier to make as more problems are caught
during tests.
After a baseline of unit tests is established for cmdlib, efficient
testing guidelines could be put in place. For example, new code could be
required to not lower the test coverage in cmdlib. Additionally, every
bug fix could be required to include a test which triggered the bug
before the fix is created.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-configlock.rst 0000644 0000000 0000000 00000015174 12634264163 0017751 0 ustar 00root root 0000000 0000000 ===================================
Removal of the Config Lock Overhead
===================================
.. contents:: :depth: 4
This is a design document detailing how the adverse effect of
the config lock can be removed in an incremental way.
Current state and shortcomings
==============================
As a result of the :doc:`design-daemons`, the configuration is held
in a process different from the processes carrying out the Ganeti
jobs. Therefore, job processes have to contact WConfD in order to
change the configuration. Of course, these modifications of the
configuration need to be synchronised.
The current form of synchronisation is via ``ConfigLock``. Exclusive
possession of this lock guarantees that no one else modifies the
configuration. In other words, the current procedure for a job to
update the configuration is to
- acquire the ``ConfigLock`` from WConfD,
- read the configuration,
- write the modified configuration, and
- release ``ConfigLock``.
The current procedure has some drawbacks. These also affect the
overall throughput of jobs in a Ganeti cluster.
- At each configuration update, the whole configuration is
transferred between the job and WConfD.
- More importantly, however, jobs can only release the ``ConfigLock`` after
the write; the write, in turn, is only confirmed once the configuration
is written on disk. In particular, we can only have one update per
configuration write. Also, having the ``ConfigLock`` is only confirmed
to the job, once the new lock status is written to disk.
Additional overhead is caused by the fact that reads are synchronised over
a shared config lock. This used to make sense when the configuration was
modifiable in the same process to ensure consistent read. With the new
structure, all access to the configuration via WConfD are consistent
anyway, and local modifications by other jobs do not happen.
Proposed changes for an incremental improvement
===============================================
Ideally, jobs would just send patches for the configuration to WConfD
that are applied by means of atomically updating the respective ``IORef``.
This, however, would require changing all of Ganeti's logical units in
one big change. Therefore, we propose to keep the ``ConfigLock`` and,
step by step, reduce its impact till it eventually will be just used
internally in the WConfD process.
Unlocked Reads
--------------
In a first step, all configuration operations that are synchronised over
a shared config lock, and therefore necessarily read-only, will instead
use WConfD's ``readConfig`` to obtain a snapshot of the configuration.
This will be done without modifying the locks. It is sound, as reads to
a Haskell ``IORef`` always yield a consistent value. From that snapshot
the required view is computed locally. This saves two lock-configuration
write cycles per read and, additionally, does not block any concurrent
modifications.
In a second step, more specialised read functions will be added to ``WConfD``.
This will reduce the traffic for reads.
Cached Reads
------------
As jobs synchronize with each other by means of regular locks, the parts
of the configuration relevant for a job can only change while a job waits
for new locks. So, if a job has a copy of the configuration and has not asked
for locks afterwards, all read-only access can be done from that copy. While
this will not affect the ``ConfigLock``, it saves traffic.
Set-and-release action
----------------------
A typical pattern is to change the configuration and afterwards release
the ``ConfigLock``. To avoid unnecessary RPC call overhead, WConfD will offer
a combined call. To make that call retryable, it will do nothing if the
``ConfigLock`` is not held by the caller; in the return value, it will indicate
whether the config lock was held when the call was made.
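As an illustration of the intended retry semantics, a client could wrap
the combined call as sketched below; ``write_config_and_unlock`` is a
placeholder name, and only the behaviour described above (a no-op and a
negative result when the caller no longer holds the ``ConfigLock``) is
assumed::

  def write_config_and_release(wconfd, config_data):
    """Write the configuration and release the ConfigLock, retrying safely."""
    while True:
      try:
        had_lock = wconfd.write_config_and_unlock(config_data)
        # A negative result means the lock was not held any more, i.e. an
        # earlier attempt already succeeded and released it.
        return had_lock
      except ConnectionError:
        # The request may or may not have reached WConfD; retrying is safe
        # precisely because the call is a no-op without the lock.
        continue
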
Short-lived ``ConfigLock``
--------------------------
For a lot of operations, the regular locks already ensure that only
one job can modify a certain part of the configuration. For example,
only jobs with an exclusive lock on an instance will modify that
instance. Therefore, it can update that entity atomically,
without relying on the configuration lock to achieve consistency.
``WConfD`` will provide such operations. To
avoid interference with non-atomic operations that still take the
config lock and write the configuration as a whole, this operation
will only be carried out at times the config lock is not taken. To
ensure this, the thread handling the request will take the config lock
itself (hence no one else has it, if that succeeds) before the change
and release afterwards; both operations will be done without
triggering a writeout of the lock status.
Note that the thread handling the request has to take the lock in its
own name and not in that of the requesting job. A writeout of the lock
status can still happen, triggered by other requests. Now, if
``WConfD`` gets restarted after the lock acquisition, if that happened
in the name of the job, it would own a lock without knowing about it,
and hence that lock would never get released.
Approaches considered, but not working
======================================
Set-and-release action with asynchronous writes
-----------------------------------------------
Approach
~~~~~~~~
A typical pattern is to change the configuration and afterwards release
the ``ConfigLock``. To avoid unnecessary delay in this operation (the next
modification of the configuration can already happen while the last change
is written out), WConfD will offer a combined command that will
- set the configuration to the specified value,
- release the config lock,
- and only then wait for the configuration write to finish; it will not
wait for confirmation of the lock-release write.
If jobs use this combined command instead of the sequential set followed
by release, new configuration changes can come in during writeout of the
current change; in particular, a writeout can contain more than one change.
Problem
~~~~~~~
This approach works fine, as long as always either ``WConfD`` can do an ordered
shutdown or the calling process dies as well. If however, we allow random kill
signals to be sent to individual daemons (e.g., by an out-of-memory killer),
the following race occurs. A process can ask for a combined write-and-unlock
operation; while the configuration is still written out, the write out of the
updated lock status already finishes. Now, if ``WConfD`` forcefully gets killed
in that very moment, a restarted ``WConfD`` will read the old configuration but
the new lock status. This will make the calling process believe that its call,
while it didn't get an answer, succeeded nevertheless, thus resulting in a
wrong configuration state.
ganeti-2.15.2/doc/design-cpu-pinning.rst 0000644 0000000 0000000 00000017623 12634264163 0020063 0 ustar 00root root 0000000 0000000 Ganeti CPU Pinning
==================
Objective
---------
This document defines Ganeti's support for CPU pinning (aka CPU
affinity).
CPU pinning enables mapping and unmapping entire virtual machines or a
specific virtual CPU (vCPU), to a physical CPU or a range of CPUs.
At this stage Pinning will be implemented for Xen and KVM.
Command Line
------------
Suggested command line parameters for controlling CPU pinning are as
follows::
gnt-instance modify -H cpu_mask=
cpu-pinning-info can be any of the following:
* One vCPU mapping, which can be the word "all" or a combination
of CPU numbers and ranges separated by comma. In this case, all
vCPUs will be mapped to the indicated list.
* A list of vCPU mappings, separated by a colon ':'. In this case
each vCPU is mapped to an entry in the list, and the size of the
list must match the number of vCPUs defined for the instance. This
is enforced when setting CPU pinning or when setting the number of
vCPUs using ``-B vcpus=#``.
The mapping list is matched to consecutive virtual CPUs, so the first entry
would be the CPU pinning information for vCPU 0, the second entry
for vCPU 1, etc.
The default setting for new instances is "all", which maps the entire
instance to all CPUs, thus effectively turning off CPU pinning.
Here are some usage examples::
  # Map vCPU 0 to physical CPU 1 and vCPU 1 to CPU 3 (assuming 2 vCPUs)
  gnt-instance modify -H cpu_mask=1:3 my-inst

  # Pin vCPU 0 to CPUs 1 or 2, and vCPU 1 to any CPU
  gnt-instance modify -H cpu_mask=1-2:all my-inst

  # Pin vCPU 0 to any CPU, vCPU 1 to CPUs 1, 3, 4 or 5, and CPU 2 to
  # CPU 0
  gnt-instance modify -H cpu_mask=all:1\\,3-5:0 my-inst

  # Pin entire VM to CPU 0
  gnt-instance modify -H cpu_mask=0 my-inst

  # Turn off CPU pinning (default setting)
  gnt-instance modify -H cpu_mask=all my-inst
Assuming an instance has 3 vCPUs, the following commands will fail::
  # not enough mappings
  gnt-instance modify -H cpu_mask=0:1 my-inst

  # too many
  gnt-instance modify -H cpu_mask=2:1:1:all my-inst
Validation
----------
CPU pinning information is validated by making sure it matches the
number of vCPUs. This validation happens when changing either the
cpu_mask or vcpus parameters.
Changing either parameter in a way that conflicts with the other will
fail with a proper error message.
To make such a change, both parameters should be modified at the same
time. For example:
``gnt-instance modify -B vcpus=4 -H cpu_mask=1:1:2-3:4\\,6 my-inst``
Besides validating CPU configuration, i.e. the number of vCPUs matches
the requested CPU pinning, Ganeti will also verify the number of
physical CPUs is enough to support the required configuration. For
example, trying to run a configuration of vcpus=2,cpu_mask=0:4 on
a node with 4 cores will fail (Note: CPU numbers are 0-based).
This validation should repeat every time an instance is started or
migrated live. See more details under Migration below.
Cluster verification should also test the compatibility of other nodes in
the cluster to required configuration and alert if a minimum requirement
is not met.
Failover
--------
CPU pinning configuration can be transferred from node to node, unless
the number of physical CPUs is smaller than what the configuration calls
for. It is suggested that unless this is the case, all transfers and
migrations will succeed.
In case the number of physical CPUs is smaller than the numbers
indicated by CPU pinning information, instance failover will fail.
In case of emergency, to force failover to ignore mismatching CPU
information, the following switch can be used:
``gnt-instance failover --fix-cpu-mismatch my-inst``.
This command will try to failover the instance with the current cpu mask,
but if that fails, it will change the mask to be "all".
Migration
---------
In case of live migration, and in addition to failover considerations,
it is required to remap CPU pinning after migration. This can be done in
realtime for instances for both Xen and KVM, and only depends on the
number of physical CPUs being sufficient to support the migrated
instance.
Data
----
Pinning information will be kept as a list of integers per vCPU.
To mark a mapping of any CPU, we will use (-1).
A single entry, no matter what the number of vCPUs is, will always mean
that all vCPUs have the same mapping.
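For illustration only, a parser along the following lines could turn the
textual ``cpu_mask`` into that representation; the actual Ganeti code may
differ in details and error handling::

  CPU_ALL = -1  # "any CPU"

  def parse_cpu_mask(cpu_mask):
    """Parse e.g. "1:2,4-7:0-1" into [[1], [2, 4, 5, 6, 7], [0, 1]]."""
    result = []
    for vcpu_entry in cpu_mask.split(":"):
      if vcpu_entry.lower() == "all":
        result.append([CPU_ALL])
        continue
      cpus = []
      for part in vcpu_entry.split(","):
        if "-" in part:
          lower, upper = (int(x) for x in part.split("-"))
          cpus.extend(range(lower, upper + 1))
        else:
          cpus.append(int(part))
      result.append(cpus)
    return result
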
Configuration file
------------------
The pinning information is kept for each instance's hypervisor
params section of the configuration file as the original string.
Xen
---
There are 2 ways to control pinning in Xen, either via the command line
or through the configuration file.
The commands to make direct pinning changes are the following::
  # To pin a vCPU to a specific CPU
  xm vcpu-pin

  # To unpin a vCPU
  xm vcpu-pin all

  # To get the current pinning status
  xm vcpu-list
Since currently controlling Xen in Ganeti is done in the configuration
file, it is straightforward to use the same method for CPU pinning.
There are 2 different parameters that control Xen's CPU pinning and
configuration:
vcpus
controls the number of vCPUs
cpus
maps vCPUs to physical CPUs
When no pinning is required (pinning information is "all"), the
"cpus" entry is removed from the configuration file.
For all other cases, the configuration is "translated" to Xen, which
expects either ``cpus = "a"`` or ``cpus = [ "a", "b", "c", ...]``,
where each a, b or c are a physical CPU number, CPU range, or a
combination, and the number of entries (if a list is used) must match
the number of vCPUs, and are mapped in order.
For example, CPU pinning information of ``1:2,4-7:0-1`` is translated
to this entry in Xen's configuration ``cpus = [ "1", "2,4-7", "0-1" ]``
KVM
---
Controlling pinning in KVM is a little more complicated as there is no
configuration to control pinning before instances are started.
The way to change or assign CPU pinning under KVM is to use ``taskset`` or
its underlying system call ``sched_setaffinity``. Setting the affinity for
the VM process will change CPU pinning for the entire VM, and setting it
for specific vCPU threads will control specific vCPUs.
The sequence of commands to control pinning is this: start the instance
with the ``-S`` switch, so it halts before starting execution, get the
process ID or identify thread IDs of each vCPU by sending ``info cpus``
to the monitor, map vCPUs as required by the cpu-pinning information,
and issue a ``cont`` command on the KVM monitor to allow the instance
to start execution.
For example, a sequence of commands to control CPU affinity under KVM
may be:
* Start KVM: ``/usr/bin/kvm … … -S``
* Use socat to connect to monitor
* send ``info cpus`` to monitor to get thread/vCPU information
* call ``sched_setaffinity`` for each thread with the CPU mask
* send ``cont`` to KVM's monitor
A CPU mask is a hexadecimal bit mask where each bit represents one
physical CPU. See man page for :manpage:`sched_setaffinity(2)` for more
details.
For example, to run a specific thread-id on CPUs 1 or 3 the mask is
0x0000000A.
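As a small illustration of that mask arithmetic::

  def cpu_list_to_mask(cpus):
    """Turn a list of physical CPU numbers into an affinity bit mask."""
    mask = 0
    for cpu in cpus:
      mask |= 1 << cpu
    return mask

  assert cpu_list_to_mask([1, 3]) == 0x0000000A  # matches the example above
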
As of 2.12, the psutil python package
(https://github.com/giampaolo/psutil) will be used to control process
and thread affinity. The affinity python package
(http://pypi.python.org/pypi/affinity) was used before, but it was not
invoking the two underlying system calls appropriately, using a cast
instead of the CPU_SET macro, causing failures for masks referencing
more than 63 CPUs.
Alternative Design Options
--------------------------
1. There's an option to ignore the limitations of the underlying
hypervisor and instead of requiring explicit pinning information
for *all* vCPUs, assume a mapping of "all" to vCPUs not mentioned.
This can lead to inadvertent missing information, but either way,
since using cpu-pinning options is probably not going to be
frequent, there's no real advantage.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-cpu-speed.rst 0000644 0000000 0000000 00000003274 12634264163 0017516 0 ustar 00root root 0000000 0000000 ======================================
Taking relative CPU speed into account
======================================
.. contents:: :depth: 4
This document describes the suggested addition of a new
node-parameter, describing the CPU speed of a node,
relative to that of a normal node in the node group.
Current state and shortcomings
==============================
Currently, for balancing a cluster, for most resources (disk, memory),
the ratio between the amount used and the amount available is taken as
a measure of load for that resource. As ``hbal`` tries to even out the
load in terms of these measures, larger nodes get a larger share of the
instances, even for a cluster not running at full capacity.
For one resource, however, hardware differences are not taken into
account: CPU speed. For CPU, the load is measured by the ratio of used virtual
to physical CPUs on the node. Balancing this measure implicitly assumes
equal speed of all CPUs.
Proposed changes
================
It is proposed to add a new node parameter, ``cpu_speed``, that is a
floating-point number, with default value ``1.0``. It can be modified in the
same ways, as all other node parameters.
The cluster metric used by ``htools`` will be changed to use the ratio
of virtual to physical cpus weighted by speed, rather than the plain
virtual-to-physical ratio. So, when balancing, nodes will be
considered as if they had physical cpus equal to ``cpu_speed`` times
the actual number.
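As an illustration (not the actual htools code, which is written in
Haskell), the proposed per-node CPU load measure would roughly be::

  def cpu_load(used_vcpus, physical_cpus, cpu_speed=1.0):
    """Virtual-to-physical CPU ratio, weighted by relative CPU speed."""
    return float(used_vcpus) / (cpu_speed * physical_cpus)

  # A node with 8 physical CPUs and cpu_speed=2.0 is treated like a node
  # with 16 physical CPUs: 8 used vCPUs give a load of 0.5 instead of 1.0.
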
Finally, it should be noted that for IO load, in non-dedicated Ganeti, the
``spindle_count`` already serves the same purpose as the newly proposed
``cpu_speed``. It is a parameter to measure the amount of IO a node can handle
in arbitrary units.
ganeti-2.15.2/doc/design-daemons.rst 0000644 0000000 0000000 00000064321 12634264163 0017257 0 ustar 00root root 0000000 0000000 ==========================
Ganeti daemons refactoring
==========================
.. contents:: :depth: 2
This is a design document detailing the plan for refactoring the internal
structure of Ganeti, and particularly the set of daemons it is divided into.
Current state and shortcomings
==============================
Ganeti is comprised of a growing number of daemons, each dealing with part of
the tasks the cluster has to face, and communicating with the other daemons
using a variety of protocols.
Specifically, as of Ganeti 2.8, the situation is as follows:
``Master daemon (MasterD)``
It is responsible for managing the entire cluster, and it's written in Python.
It is executed on a single node (the master node). It receives the commands
given by the cluster administrator (through the remote API daemon or the
command line tools) over the LUXI protocol. The master daemon is responsible
for creating and managing the jobs that will execute such commands, and for
managing the locks that ensure the cluster will not incur in race conditions.
Each job is managed by a separate Python thread, that interacts with the node
daemons via RPC calls.
The master daemon is also responsible for managing the configuration of the
cluster, changing it when required by some job. It is also responsible for
copying the configuration to the other master candidates after updating it.
``RAPI daemon (RapiD)``
It is written in Python and runs on the master node only. It waits for
requests issued remotely through the remote API protocol. Then, it forwards
them, using the LUXI protocol, to the master daemon (if they are commands) or
to the query daemon if they are queries about the configuration (including
live status) of the cluster.
``Node daemon (NodeD)``
It is written in Python. It runs on all the nodes. It is responsible for
receiving the master requests over RPC and execute them, using the appropriate
backend (hypervisors, DRBD, LVM, etc.). It also receives requests over RPC for
the execution of queries gathering live data on behalf of the query daemon.
``Configuration daemon (ConfD)``
It is written in Haskell. It runs on all the master candidates. Since the
configuration is replicated only on the master node, this daemon exists in
order to provide information about the configuration to nodes needing them.
The requests are done through ConfD's own protocol, HMAC signed,
implemented over UDP, and meant to be used by querying in parallel all the
master candidates (or a subset thereof) and getting the most up to date
answer. This is meant as a way to provide a robust service even in case master
is temporarily unavailable.
``Query daemon (QueryD)``
It is written in Haskell. It runs on all the master candidates. It replies
to Luxi queries about the current status of the system, including live data it
obtains by querying the node daemons through RPCs.
``Monitoring daemon (MonD)``
It is written in Haskell. It runs on all nodes, including the ones that are
not vm-capable. It is meant to provide information on the status of the
system. Such information is related only to the specific node the daemon is
running on, and it is provided as JSON encoded data over HTTP, to be easily
readable by external tools.
The monitoring daemon communicates with ConfD to get information about the
configuration of the cluster. The choice of communicating with ConfD instead
of MasterD allows it to obtain configuration information even when the cluster
is heavily degraded (e.g.: when master and some, but not all, of the master
candidates are unreachable).
The current structure of the Ganeti daemons is inefficient because there are
many different protocols involved, and each daemon needs to be able to use
multiple ones, and has to deal with doing different things, thus making
sometimes unclear which daemon is responsible for performing a specific task.
Also, with the current configuration, jobs are managed by the master daemon
using python threads. This makes terminating a job after it has started a
difficult operation, and it is the main reason why this is not possible yet.
The master daemon currently has too many different tasks, that could be handled
better if split among different daemons.
Proposed changes
================
In order to improve on the current situation, a new daemon subdivision is
proposed, and presented hereafter.
.. digraph:: "new-daemons-structure"
{rank=same; RConfD LuxiD;}
{rank=same; Jobs rconfigdata;}
node [shape=box]
RapiD [label="RapiD [M]"]
LuxiD [label="LuxiD [M]"]
WConfD [label="WConfD [M]"]
Jobs [label="Jobs [M]"]
RConfD [label="RConfD [MC]"]
MonD [label="MonD [All]"]
NodeD [label="NodeD [All]"]
Clients [label="gnt-*\nclients [M]"]
p1 [shape=none, label=""]
p2 [shape=none, label=""]
p3 [shape=none, label=""]
p4 [shape=none, label=""]
configdata [shape=none, label="config.data"]
rconfigdata [shape=none, label="config.data\n[MC copy]"]
locksdata [shape=none, label="locks.data"]
RapiD -> LuxiD [label="LUXI"]
LuxiD -> WConfD [label="WConfD\nproto"]
LuxiD -> Jobs [label="fork/exec"]
Jobs -> WConfD [label="WConfD\nproto"]
Jobs -> NodeD [label="RPC"]
LuxiD -> NodeD [label="RPC"]
rconfigdata -> RConfD
configdata -> rconfigdata [label="sync via\nNodeD RPC"]
WConfD -> NodeD [label="RPC"]
WConfD -> configdata
WConfD -> locksdata
MonD -> RConfD [label="RConfD\nproto"]
Clients -> LuxiD [label="LUXI"]
p1 -> MonD [label="MonD proto"]
p2 -> RapiD [label="RAPI"]
p3 -> RConfD [label="RConfD\nproto"]
p4 -> Clients [label="CLI"]
``LUXI daemon (LuxiD)``
It will be written in Haskell. It will run on the master node and it will be
the only LUXI server, replying to all the LUXI queries. These includes both
the queries about the live configuration of the cluster, previously served by
QueryD, and the commands actually changing the status of the cluster by
submitting jobs. Therefore, this daemon will also be the one responsible with
managing the job queue. When a job needs to be executed, the LuxiD will spawn
a separate process tasked with the execution of that specific job, thus making
it easier to terminate the job itself, if needed. When a job requires locks,
LuxiD will request them from WConfD.
In order to keep availability of the cluster in case of failure of the master
node, LuxiD will replicate the job queue to the other master candidates, by
RPCs to the NodeD running there (the choice of RPCs for this task might be
reviewed at a second time, after implementing this design).
``Configuration management daemon (WConfD)``
It will run on the master node and it will be responsible for the management
of the authoritative copy of the cluster configuration (that is, it will be
the daemon actually modifying the ``config.data`` file). All the requests of
configuration changes will have to pass through this daemon, and will be
performed using a LUXI-like protocol ("WConfD proto" in the graph. The exact
protocol will be defined in the separate design document that will detail the
WConfD separation). Having a single point of configuration management will
also allow Ganeti to get rid of possible race conditions due to concurrent
modifications of the configuration. When the configuration is updated, it
will have to push the received changes to the other master candidates, via
RPCs, so that RConfD daemons and (in case of a failure on the master node)
the WConfD daemon on the new master can access an up-to-date version of it
(the choice of RPCs for this task might be reviewed at a second time). This
daemon will also be the one responsible for managing the locks, granting them
to the jobs requesting them, and taking care of freeing them up if the jobs
holding them crash or are terminated before releasing them. In order to do
this, each job, after being spawned by LuxiD, will open a local unix socket
that will be used to communicate with it, and will be destroyed when the job
terminates. LuxiD will be able to check, after a timeout, whether the job is
still running by connecting here, and to ask WConfD to forcefully remove the
locks if the socket is closed.
Also, WConfD should hold a serialized list of the locks and their owners in a
file (``locks.data``), so that it can keep track of their status in case it
crashes and needs to be restarted (by asking LuxiD which of them are still
running).
Interaction with this daemon will be performed using Unix sockets.
``Configuration query daemon (RConfD)``
It is written in Haskell, and it corresponds to the old ConfD. It will run on
all the master candidates and it will serve information about the static
configuration of the cluster (the one contained in ``config.data``). The
provided information will be highly available (as in: a response will be
available as long as a stable-enough connection between the client and at
least one working master candidate is available) and its freshness will be
best effort (the most recent reply from any of the master candidates will be
returned, but it might still be older than the one available through WConfD).
The information will be served through the ConfD protocol.
``Rapi daemon (RapiD)``
It remains basically unchanged, with the only difference that all of its LUXI
query are directed towards LuxiD instead of being split between MasterD and
QueryD.
``Monitoring daemon (MonD)``
It remains unaffected by the changes in this design document. It will just get
some of the data it needs from RConfD instead of the old ConfD, but the
interfaces of the two are identical.
``Node daemon (NodeD)``
It remains unaffected by the changes proposed in the design document. The only
difference being that it will receive its RPCs from LuxiD (for job queue
replication), from WConfD (for configuration replication) and for the
processes executing single jobs (for all the operations to be performed by
nodes) instead of receiving them just from MasterD.
This restructuring will allow us to reorganize and improve the codebase,
introducing cleaner interfaces and giving well defined and more restricted tasks
to each daemon.
Furthermore, having more well-defined interfaces will allow us to have easier
upgrade procedures, and to work towards the possibility of upgrading single
components of a cluster one at a time, without the need for immediately
upgrading the entire cluster in a single step.
Implementation
==============
While performing this refactoring, we aim to increase the amount of
Haskell code, thus benefiting from the additional type safety provided by its
wide compile-time checks. In particular, all the job queue management and the
configuration management daemon will be written in Haskell, taking over the role
currently fulfilled by Python code executed as part of MasterD.
The changes described by this design document are quite extensive, therefore they
will not be implemented all at the same time, but through a sequence of steps,
leaving the codebase in a consistent and usable state.
#. Rename QueryD to LuxiD.
A part of LuxiD, the one replying to configuration
queries including live information about the system, already exists in the
form of QueryD. This is being renamed to LuxiD, and will form the first part
of the new daemon. NB: this is happening starting from Ganeti 2.8. At the
beginning, only the already existing queries will be replied to by LuxiD.
More queries will be implemented in the next versions.
#. Let LuxiD be the interface for the queries and MasterD be their executor.
Currently, MasterD is the only responsible for receiving and executing LUXI
queries, and for managing the jobs they create.
Receiving the queries and managing the job queue will be extracted from
MasterD into LuxiD.
Actually executing jobs will still be done by MasterD, that contains all the
logic for doing that and for properly managing locks and the configuration.
At this stage, scheduling will simply consist in starting jobs until a fixed
maximum number of simultaneously running jobs is reached.
#. Extract WConfD from MasterD.
The logic for managing the configuration file is factored out to the
dedicated WConfD daemon. All configuration changes, currently executed
directly by MasterD, will be changed to be IPC requests sent to the new
daemon.
#. Extract locking management from MasterD.
The logic for managing and granting locks is extracted to WConfD as well.
Locks will not be taken directly anymore, but asked via IPC to WConfD.
This step can be executed on its own or at the same time as the previous one.
#. Jobs are executed as processes.
The logic for running jobs is rewritten so that each job can be managed by an
independent process. LuxiD will spawn a new (Python) process for every single
job. The RPCs will remain unchanged, and the LU code will stay as is as much
as possible.
MasterD will cease to exist as a daemon on its own at this point, but not
before.
#. Improve job scheduling algorithm.
The simple algorithm for scheduling jobs will be replaced by a more
intelligent one. Also, the implementation of :doc:`design-optables` can be
started.
Job death detection
-------------------
**Requirements:**
- It must be possible to reliably detect a death of a process even under
uncommon conditions such as very heavy system load.
- A daemon must be able to detect a death of a process even if the
daemon is restarted while the process is running.
- The solution must not rely on being able to communicate with
a process.
- The solution must work for the current situation where multiple jobs
run in a single process.
- It must be POSIX compliant.
These conditions rule out simple solutions like checking a process ID
(because the process might be eventually replaced by another process
with the same ID) or keeping an open connection to a process.
**Solution:** As a job process is spawned, before attempting to
communicate with any other process, it will create a designated empty
lock file, open it, acquire an *exclusive* lock on it, and keep it open.
When connecting to a daemon, the job process will provide it with the
path of the file. If the process dies unexpectedly, the operating system
kernel automatically cleans up the lock.
Therefore, daemons can check if a process is dead by trying to acquire
a *shared* lock on the lock file in a non-blocking mode:
- If the locking operation succeeds, it means that the exclusive lock is
missing, therefore the process has died, but the lock
file hasn't been cleaned up yet. The daemon should release the lock
immediately. Optionally, the daemon may delete the lock file.
- If the file is missing, the process has died and the lock file has
been cleaned up.
- If the locking operation fails due to a lock conflict, it means
the process is alive.
Using shared locks for querying lock files ensures that the detection
works correctly even if multiple daemons query a file at the same time.
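A minimal sketch of such a liveness check, using POSIX advisory locks via
Python's ``fcntl`` module (paths and error handling are simplified)::

  import errno
  import fcntl
  import os

  def job_process_is_alive(lock_file):
    """Check whether the process owning ``lock_file`` is still alive."""
    try:
      fd = os.open(lock_file, os.O_RDONLY)
    except OSError as err:
      if err.errno == errno.ENOENT:
        return False  # lock file already cleaned up, the process is dead
      raise
    try:
      try:
        # Shared, non-blocking lock: succeeds only if the exclusive lock
        # of the job process is gone, i.e. the process has died
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
      except IOError:
        return True  # lock conflict: the job process is still alive
      fcntl.flock(fd, fcntl.LOCK_UN)  # release the probe lock immediately
      return False
    finally:
      os.close(fd)
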
A job should close and remove its lock file when it completely finishes.
The WConfD daemon will be responsible for removing stale lock files of
jobs that didn't remove their lock files themselves.
**Statelessness of the protocol:** To keep our protocols stateless,
the job id and the path to the lock file are sent as part of every
request that deals with resources, in particular the Ganeti Locks.
All resources are owned by the pair (job id, lock file). In this way,
several jobs can live in the same process (as it will be in the
transition period), but owner death detection still only depends on the
owner of the resource. In particular, no additional lookup table is
needed to obtain the lock file for a given owner.
**Considered alternatives:** An alternative to creating a separate lock
file would be to lock the job status file. However, file locks are kept
only as long as the file is open. Therefore any operation followed by
closing the file would cause the process to release the lock. In
particular, with jobs as threads, the master daemon wouldn't be able to
keep locks and operate on job files at the same time.
Job execution
-------------
As the Luxi daemon will be responsible for executing jobs, it needs to
start jobs in such a way that it can properly detect if the job dies
under any circumstances (such as Luxi daemon being restarted in the
process).
The name of the lock file will be stored in the corresponding job file
so that anybody is able to check the status of the process corresponding
to a job.
The proposed procedure:
#. The Luxi daemon saves the name of its own lock file into the job file.
#. The Luxi daemon forks, creating a bi-directional pipe with the child
process.
#. The child process creates and locks its own, proper lock file and
hands its name over to the Luxi daemon through the pipe.
#. The Luxi daemon saves the name of the lock file into the job file and
confirms it to the child process.
#. Only then can the child process replace itself with the actual job
process.
If the child process detects that the pipe is broken before receiving the
confirmation, it must terminate, not starting the actual job.
This way, the actual job is only started if it is ensured that its lock
file name is written to the job file.
If the Luxi daemon detects that the pipe is broken before successfully
sending the confirmation in step 4., it assumes that the job has failed.
If the pipe gets broken after sending the confirmation, no further
action is necessary. If the child doesn't receive the confirmation,
it will die and its death will be detected by Luxid eventually.
If the Luxi daemon dies before completing the procedure, the job will
not be started. If the job file contained the daemon's lock file name,
it will be detected as dead (because the daemon process died). If the
job file already contained its proper lock file, it will also be
detected as dead (because the child process won't start the actual job
and die).
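The child's side of this handshake could look roughly as follows; the pipe
protocol and the helper are illustrative only::

  import fcntl
  import os
  import sys

  def child_prepare_and_exec(pipe_r, pipe_w, lock_file, job_argv):
    # Step 3: create the job's own lock file, lock it exclusively and keep
    # the descriptor open; it must stay inheritable so that the lock
    # survives the exec below (Python 3; in Python 2 descriptors are
    # inheritable by default)
    fd = os.open(lock_file, os.O_CREAT | os.O_RDWR, 0o600)
    os.set_inheritable(fd, True)
    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    os.write(pipe_w, lock_file.encode("utf-8") + b"\n")
    # Step 4: wait for LuxiD's confirmation that the name was written to
    # the job file; an empty read means the pipe broke and we must not start
    if not os.read(pipe_r, 1):
      sys.exit(1)
    # Step 5: replace ourselves with the actual job process
    os.execv(job_argv[0], job_argv)
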
WConfD details
--------------
WConfD will communicate with its clients through a Unix domain socket for both
configuration management and locking. Clients can issue multiple RPC calls
through one socket. For each such call the client sends a JSON request
document with a remote function name and data for its arguments. The server
replies with a JSON response document either containing the result or
signalling a failure.
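A minimal sketch of such a call from the client side; the socket path,
method name and envelope fields are illustrative and not the final wire
format::

  import json
  import socket

  def wconfd_call(socket_path, method, args):
    """Send one JSON request to WConfD and return the decoded response."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
      sock.connect(socket_path)
      request = json.dumps({"method": method, "args": args})
      sock.sendall(request.encode("utf-8") + b"\n")
      data = b""
      while not data.endswith(b"\n"):  # read one newline-terminated reply
        chunk = sock.recv(4096)
        if not chunk:
          break
        data += chunk
      return json.loads(data.decode("utf-8"))
    finally:
      sock.close()
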
Any state associated with client processes will be mirrored on persistent
storage and linked to the identity of processes so that the WConfD daemon will
be able to resume its operation at any point after a restart or a crash. WConfD
will track each client's process start time along with its process ID to be
able to detect if a process dies and its process ID is reused. WConfD will clear
all locks and other state associated with a client if it detects that its process
no longer exists.
Configuration management
++++++++++++++++++++++++
The new configuration management protocol will be implemented in the following
steps:
Step 1:
#. Implement the following functions in WConfD and export them through
RPC:
- Obtain a single internal lock, either in shared or
exclusive mode. This lock will substitute the current lock
``_config_lock`` in config.py.
- Release the lock.
- Return the whole configuration data to a client.
- Receive the whole configuration data from a client and replace the
current configuration with it. Distribute it to master candidates
and distribute the corresponding *ssconf*.
WConfD must detect deaths of its clients (see `Job death
detection`_) and release locks automatically.
#. In config.py modify public methods that access configuration:
- Instead of acquiring a local lock, obtain a lock from WConfD
using the above functions
- Fetch the current configuration from WConfD.
- Use it to perform the method's task.
- If the configuration was modified, send it to WConfD at the end.
- Release the lock to WConfD.
This will decouple the configuration management from the master daemon,
even though the specific configuration tasks will still performed by
individual jobs.
After this step it'll be possible to access the configuration from separate
processes.
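A sketch of the resulting access pattern in config.py; the method and RPC
names are placeholders, not the actual WConfD or ``ConfigWriter``
interfaces::

  def update_instance_status(wconfd, instance_name, new_status):
    wconfd.lock_config(shared=False)      # exclusive config lock from WConfD
    try:
      config_data = wconfd.read_config()  # fetch the whole configuration
      config_data["instances"][instance_name]["status"] = new_status
      wconfd.write_config(config_data)    # push back the modified version
    finally:
      wconfd.unlock_config()              # always hand the lock back
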
Step 2:
#. Reimplement all current methods of ``ConfigWriter`` for reading and
writing the configuration of a cluster in WConfD.
#. Expose each of those functions in WConfD as a separate RPC function.
This will allow easy future extensions or modifications.
#. Replace ``ConfigWriter`` with a stub (preferably automatically
generated from the Haskell code) that will contain the same methods
as the current ``ConfigWriter`` and delegate all calls to its
methods to WConfD.
Step 3:
In a later step, the impact of the config lock will be reduced by moving
it more and more into an internal detail of WConfD. This forthcoming process
of :doc:`design-configlock` is described separately.
Locking
+++++++
The new locking protocol will be implemented as follows:
Re-implement the current locking mechanism in WConfD and expose it for RPC
calls. All current locks will be mapped into a data structure that will
uniquely identify them (storing lock's level together with it's name).
WConfD will impose a linear order on locks. The order will be compatible
with the current ordering of lock levels so that existing code will work
without changes.
WConfD will keep the set of currently held locks for each client. The
protocol will allow the following operations on the set:
*Update:*
Update the current set of locks according to a given list. The list contains
locks and their desired level (release / shared / exclusive). To prevent
deadlocks, WConfD will check that all newly requested locks (or already held
locks requested to be upgraded to *exclusive*) are greater in the sense of
the linear order than all currently held locks, and fail the operation if
not. Only the locks in the list will be updated, other locks already held
will be left intact. If the operation fails, the client's lock set will be
left intact.
*Opportunistic union:*
Add as much as possible locks from a given set to the current set within a
given timeout. WConfD will again check the proper order of locks and
acquire only the ones that are allowed wrt. the current set. Returns the
set of acquired locks, possibly empty. Immediate. Never fails. (It would also
be possible to extend the operation to try to wait until a given number of
locks is available, or a given timeout elapses.)
*List:*
List the current set of held locks. Immediate, never fails.
*Intersection:*
Retain only a given set of locks in the current one. This function is
provided for convenience, it's redundant wrt. *list* and *update*. Immediate,
never fails.
Additional restrictions due to lock implications:
Ganeti supports locks that act as if a lock on a whole group (like all nodes)
were held. To avoid dead locks caused by the additional blockage of those
group locks, we impose certain restrictions. Whenever `A` is a group lock and
`B` belongs to `A`, then the following holds.
- `A` is in lock order before `B`.
- All locks that are in the lock order between `A` and `B` also belong to `A`.
- It is considered a lock-order violation to ask for an exclusive lock on `B`
while holding a shared lock on `A`.
After this step it'll be possible to use locks from jobs running as separate processes.
The above set of operations allows the clients to use various work-flows. In particular:
Pessimistic strategy:
Lock all potentially relevant resources (for example all nodes), determine
which will be needed, and release all the others.
Optimistic strategy:
Determine what locks need to be acquired without holding any. Lock the
required set of locks. Determine the set of required locks again and check if
they are all held. If not, release everything and restart.
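A sketch of the optimistic strategy expressed in terms of the operations
above; the function names merely stand in for the corresponding RPC calls::

  def run_with_optimistic_locks(wconfd, compute_needed_locks, do_work):
    while True:
      wanted = compute_needed_locks()     # determined without holding locks
      wconfd.update_locks([(lock, "shared") for lock in wanted])
      if set(compute_needed_locks()) <= set(wconfd.list_locks()):
        break                             # nothing relevant changed, proceed
      # something changed in between: release everything and try again
      wconfd.update_locks([(lock, "release") for lock in wanted])
    try:
      return do_work()
    finally:
      wconfd.intersect_locks([])          # retain nothing, i.e. release all
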
.. COMMENTED OUT:
Start with the smallest set of locks and when determining what more
relevant resources will be needed, expand the set. If an *union* operation
fails, release all locks, acquire the desired union and restart the
operation so that all preconditions and possible concurrent changes are
checked again.
Future aims:
- Add more fine-grained locks to prevent unnecessary blocking of jobs. This
could include locks on parameters of entities or locks on their states (so that
a node remains online, but otherwise can change, etc.). In particular,
adding, moving and removing instances currently blocks the whole node.
- Add checks that all modified configuration parameters belong to entities
the client has locked and log violations.
- Make the above checks mandatory.
- Automate optimistic locking and checking the locks in logical units.
For example, this could be accomplished by allowing some of the initial
phases of `LogicalUnit` (such as `ExpandNames` and `DeclareLocks`) to be run
repeatedly, checking if the set of locks requested the second time is
contained in the set acquired after the first pass.
- Add the possibility for a job to reserve hardware resources such as disk
space or memory on nodes, most likely as a new, special kind of instance
that would only block its resources and could later be converted to a
regular instance. This would allow long-running jobs such as instance
creation or move to lock the corresponding nodes, acquire the resources and
turn the locks into shared ones, keeping an exclusive lock only on the
instance.
- Use a more sophisticated algorithm for preventing deadlocks, such as a
`wait-for graph`_. This would result in fewer *union* failures and allow a
more optimistic, scalable acquisition of locks.
.. _`wait-for graph`: http://en.wikipedia.org/wiki/Wait-for_graph
Further considerations
======================
There is a possibility that a job will finish performing its task while LuxiD
and/or WConfD are not available.
In order to deal with this situation, each job will update its job file
in the queue. This is race free, as LuxiD will no longer touch the job file,
once the job is started; a corollary of this is that the job also has to
take care of replicating updates to the job file. LuxiD will watch job files for
changes to determine when a job was cleanly finished. To determine jobs
that died without having the chance of updating the job file, the `Job death
detection`_ mechanism will be used.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-dedicated-allocation.rst 0000644 0000000 0000000 00000007406 12634264163 0021663 0 ustar 00root root 0000000 0000000 =================================
Allocation for Partitioned Ganeti
=================================
.. contents:: :depth: 4
Current state and shortcomings
==============================
The introduction of :doc:`design-partitioned` made it possible to
dedicate resources, in particular storage, exclusively to
an instance. The advantage is that such instances have
guaranteed latency that is not affected by other
instances. Typically, those instances are created once
and never moved. Also, typically large chunks (full, half,
or quarter) of a node are handed out to individual
partitioned instances.
Ganeti's allocation strategy is to keep the cluster as
balanced as possible. In particular, as long as empty nodes
are available, new instances, regardless of their size,
will be placed there. Therefore, if a couple of small
instances are placed on the cluster first, it will no longer
be possible to place a big instance on the cluster despite
the total usage of the cluster being low.
Proposed changes
================
We propose to change the allocation strategy of hail for
node groups that have the ``exclusive_storage`` flag set,
as detailed below; nothing will be changed for non-exclusive
node groups. The new strategy will try to keep the cluster
as available for new instances as possible.
Dedicated Allocation Metric
---------------------------
The instance policy is a set of intervals in which the resources
of the instance have to be. Typical choices for dedicated clusters
have disjoint intervals with the same monotonicity in every dimension.
In this case, the order is obvious. In order to make it well-defined
in every case, we specify that we sort the intervals by the lower
bound of the disk size. This is motivated by the fact that disk is
the most critical aspect of partitioned Ganeti.
For a node, the *allocation vector* is the vector containing, for each
instance policy interval in decreasing order, the number of
instances minimally compliant with that interval that can still
be placed on that node. For the drbd template, it is assumed
that all newly placed instances have new secondaries.
The *lost-allocations vector* for an instance on a node is the
difference of the allocation vectors for that node before and
after placing that instance on that node. Lost-allocation vectors
are ordered lexicographically, i.e., the loss of an allocation of a
larger instance size dominates the loss of allocations of smaller
instance sizes.
If allocating in a node group with ``exclusive_storage`` set
to true, hail will try to minimise the pair of the lost-allocations
vector and the remaining disk space on the node after the allocation,
ordered lexicographically.
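The following sketch illustrates, in Python and purely for illustration
(hail itself is implemented in Haskell), how the lost-allocations vector
and the resulting comparison key could be computed; the helpers
``count_min_instances`` (how many instances minimally compliant with a
policy interval still fit on a node) and ``free_disk`` are assumed:

.. code-block:: python

  def allocation_vector(node, intervals, count_min_instances):
    # intervals are sorted by decreasing lower bound of the disk size
    return [count_min_instances(node, ival) for ival in intervals]

  def lost_allocations(node_before, node_after, intervals,
                       count_min_instances):
    before = allocation_vector(node_before, intervals,
                               count_min_instances)
    after = allocation_vector(node_after, intervals,
                              count_min_instances)
    return [b - a for b, a in zip(before, after)]

  def allocation_key(node_before, node_after, intervals,
                     count_min_instances, free_disk):
    # hail minimises this pair, comparing it lexicographically
    return (lost_allocations(node_before, node_after, intervals,
                             count_min_instances),
            free_disk(node_after))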
Example
-------
Consider the already mentioned scenario were only full, half, and quarter
nodes are given to instances. Here, for the placement of a
quarter-node--sized instance we would prefer a three-quarter-filled node (lost
allocations: 0, 0, 1 and no left overs) over a quarter-filled node (lost
allocations: 0, 0, 1 and half a node left over)
over a half-filled node (lost allocations: 0, 1, 1) over an empty
node (lost allocations: 1, 1, 1). A half-node sized instance, however,
would prefer a half-filled node (lost allocations: 0, 1, 2 and no left-overs)
over a quarter-filled node (lost allocations: 0, 1, 2 and a quarter node left
over) over an empty node (lost allocations: 1, 1, 2).
Note that the presence of additional policy intervals affects the preferences
of instances of other sizes as well. This is by design, as additional available
instance sizes make additional remaining node sizes attractive. If, in the
given example, we also allowed three-quarter-node--sized instances, it would
now be better for a quarter-node--sized instance to be placed on a
half-full node (lost allocations: 0, 0, 1, 1) than on a quarter-filled
node (lost allocations: 0, 1, 0, 1).
ganeti-2.15.2/doc/design-device-uuid-name.rst 0000644 0000000 0000000 00000005541 12634264163 0020751 0 ustar 00root root 0000000 0000000 ==========================================
Design for adding UUID and name to devices
==========================================
.. contents:: :depth: 4
This is a design document about adding UUID and name to instance devices
(Disks/NICs) and the ability to reference them by those identifiers.
Current state and shortcomings
==============================
Currently, the only way to refer to a device (Disk/NIC) is by its index
inside the VM (e.g. gnt-instance modify --disk 2:remove).
Using indices as identifiers has the drawback that addition/removal of a
device results in changing the identifiers(indices) of other devices and
makes the net effect of commands depend on their strict ordering. A
device reference is not absolute, meaning an external entity controlling
Ganeti, e.g., over RAPI, cannot keep permanent identifiers for referring
to devices, nor can it have more than one outstanding commands, since
their order of execution is not guaranteed.
Proposed Changes
================
To be able to reference a device in a unique way, we propose to extend
Disks and NICs by assigning to them a UUID and a name. The UUID will be
assigned by Ganeti upon creation, while the name will be an optional
user parameter. Renaming a device will also be supported.
Commands (e.g. `gnt-instance modify`) will be able to reference each
device by its index, UUID, or name. To be able to refer to devices by
name, we must guarantee that device names are unique. Unlike other
objects (instances, networks, nodegroups, etc.), NIC and Disk objects
will not have unique names across the cluster, since they are still not
independent entities, but rather part of the instance object. This makes
global uniqueness of names hard to achieve at this point. Instead their
names will be unique at instance level.
Apart from unique device names, we must also guarantee that a device
name can not be the UUID of another device. Also, to remove ambiguity
while supporting both indices and names as identifiers, we forbid purely
numeric device names.
Implementation Details
======================
Modify OpInstanceSetParams to accept not only indices, but also device
names and UUIDs. So, the accepted NIC and disk modifications will have
the following format::

  identifier:action,key=value

where, from now on, identifier can be an index (-1 for the last device),
UUID, or name, and action should be add, modify, or remove.
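For illustration only, resolving such an identifier could work along the
lines of the following sketch (not actual Ganeti code):

.. code-block:: python

  def find_device(devices, identifier):
    """Resolve an index, UUID or name to a NIC/Disk object."""
    try:
      # purely numeric identifiers are treated as indices;
      # -1 addresses the last device
      return devices[int(identifier)]
    except ValueError:
      pass
    for dev in devices:
      if identifier in (dev.uuid, dev.name):
        return dev
    raise ValueError("No device with identifier '%s'" % identifier)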
Configuration Changes
~~~~~~~~~~~~~~~~~~~~~
Disk and NIC config objects get two extra slots:
- uuid
- name
Instance Queries
~~~~~~~~~~~~~~~~~
We will extend the query mechanism to expose names and UUIDs of NICs and
Disks.
Hook Variables
~~~~~~~~~~~~~~
We will expose the name of NICs and Disks to the hook environment of
instance-related operations:
``INSTANCE_NIC%d_NAME``
``INSTANCE_DISK%d_NAME``
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-disk-conversion.rst 0000644 0000000 0000000 00000031347 12634264163 0020750 0 ustar 00root root 0000000 0000000 =================================
Conversion between disk templates
=================================
.. contents:: :depth: 4
This design document describes the support for generic disk template
conversion in Ganeti. The logic used is disk template agnostic and
targets to cover the majority of conversions among the supported disk
templates.
Current state and shortcomings
==============================
Currently, Ganeti supports choosing among different disk templates when
creating an instance. However, converting the disk template of an
existing instance is possible only between the ``plain`` and ``drbd``
templates. This feature has been present in Ganeti since its early versions,
when the number of supported disk templates was limited. Now that Ganeti
supports plenty of choices, this feature should be extended to provide
more flexibility to the user.
The procedure for converting from the plain to the drbd disk template
works as follows. Firstly, a completely new disk template is generated
matching the size, mode, and the count of the current instance's disks.
The missing volumes are created manually both in the primary (meta disk)
and the secondary node. The original LVs running on the primary node are
renamed to match the new names. The last step is to manually associate
the DRBD devices with their mirror block device pairs. The conversion
from the drbd to the plain disk template is much simpler than the
opposite. Firstly, the DRBD mirroring is manually disabled. Then the
unnecessary volumes including the meta disk(s) of the primary node, and
the meta and data disk(s) from the previously secondary node are
removed.
Proposed changes
================
This design proposes the creation of a unified interface for handling
the disk template conversions in Ganeti. Currently, there is no such
interface and each one of the supported conversions uses a separate code
path.
This proposal introduces a single, disk-agnostic interface for handling
the disk template conversions in Ganeti, keeping in mind that we want it
to be as generic as possible. An exception case will be the currently
supported conversions between the LVM-based disk templates. Their basic
functionality will not be affected and will diverge from the rest of the
disk template conversions. The target is to provide support for conversions
among the majority of the available disk templates, and also to create
a mechanism that will easily support any new templates that may be
added to Ganeti at a future point.
Design decisions
================
Currently, the supported conversions for the LVM-based templates are
handled by the ``LUInstanceSetParams`` LU. Our implementation will
follow the same approach. From a high-level point-of-view this design
can be split in two parts:
* The extension of the LU's checks to cover all the supported template
conversions
* The new functionality which will be introduced to provide the new
feature
The instance must be stopped before starting the disk template
conversion, as is currently the case; otherwise the operation will fail.
The new mechanism will need to copy the disks' data for the conversion to be
possible. We propose using the Unix ``dd`` command to copy the
instance's data. It can be used to copy data from source to destination,
block-by-block, regardless of their filesystem types, making it a
convenient tool for the case. Since the conversion will be done via data
copy it will take a long time for bigger disks to copy their data and
consequently for the instance to switch to the new template.
Some template conversions can be done faster without explicitly copying
their disks' data. A use case is the conversions between the LVM-based
templates, i.e., ``drbd`` and ``plain``, which will be done as they are
now, without using the ``dd`` command. Also, this implementation will
provide partial support for the ``blockdev`` disk template, which will
act only as a source template. Since those volumes are adopted
pre-existing block devices, we will not support conversions targeting
this template. Another exception case will be the ``diskless`` template.
Since it is a testing template that creates instances with no disks we
will not provide support for conversions that include this template
type.
We divide the design into the following parts:
* Block device changes, that include the new methods which will be
introduced and will be responsible for building the commands for the
data copy from/to the requested devices
* Backend changes, that include a new RPC call which will concatenate
the output of the above two methods and will execute the data copy
command
* Core changes, that include the modifications in the Logical Unit
* User interface changes, i.e., command line changes
Block device changes
--------------------
The block device abstract class will be extended with two new methods,
named ``Import`` and ``Export``. Those methods will be responsible for
building the commands that will be used for the data copy between the
corresponding devices. The ``Export`` method will build the command
which will export the data from the source device, while the ``Import``
method will do the opposite. It will import the data to the newly
created target device. Those two methods will not perform the actual
data copy; they will simply return the requested commands for
transferring the data from/to the individual devices. The output of the
two methods will be combined using a pipe ("|") by the caller method in
the backend level.
By default the data import and export will be done using the ``dd``
command. All the inherited classes will use the base functionality
unless there is a faster way to convert to. In that case the underlying
block device will overwrite those methods with its specific
functionality. A use case will be the Ceph/RADOS block devices which
will make use of the ``rbd import`` and ``rbd export`` commands to copy
their data instead of using the default ``dd`` command.
Keeping the data copy functionality in the block device layer, provides
us with a generic mechanism that works between almost all conversions
and furthermore can be easily extended for new disk templates. It also
covers the devices that support the ``access=userspace`` parameter and
solves this problem in a generic way, by implementing the logic in the
right level where we know what is the best to do for each device.
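The following simplified sketch shows the intended shape of the default
``Export``/``Import`` implementations and how their output could be
combined into a single command; block sizes, quoting and the class
layout are illustrative only:

.. code-block:: python

  import shlex

  class BlockDev(object):
    dev_path = None  # set by the concrete block device class

    def Export(self):
      # Command that reads the device's data and writes it to stdout
      return ["dd", "if=%s" % self.dev_path, "bs=1M"]

    def Import(self):
      # Command that reads data from stdin and writes it to the device
      return ["dd", "of=%s" % self.dev_path, "bs=1M", "conv=fsync"]

  def build_copy_command(src_dev, dest_dev):
    # The caller in the backend concatenates the two commands with a
    # pipe; the result is then executed on the node.
    export_cmd = " ".join(map(shlex.quote, src_dev.Export()))
    import_cmd = " ".join(map(shlex.quote, dest_dev.Import()))
    return "%s | %s" % (export_cmd, import_cmd)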
Backend changes
---------------
Introduce a new RPC call:
* blockdev_convert(src_disk, dest_disk)
where ``src_disk`` and ``dest_disk`` are the original and the new disk
objects respectively. First, the actual device instances will be
computed and then they will be used to build the export and import
commands for the data copy. The output of those methods will be
concatenated using a pipe, following an approach similar to that of the
impexp daemon. Finally, the unified data copy command will be executed, at
this level, by the ``nodeD``.
Core changes
------------
The main modifications will be made in the ``LUInstanceSetParams`` LU.
The implementation of the conversion mechanism will be split into the
following parts:
* The generation of the new disk template for the instance. The new
disks will match the size, mode, and name of the original volumes.
Those parameters and any others needed, i.e., the provider's name for
the ExtStorage conversions, will be computed by a new method which we
will introduce, named ``ComputeDisksInfo``. The output of that
function will be used as the ``disk_info`` argument of the
``GenerateDiskTemplate`` method.
* The creation of the new block devices. We will make use of the
``CreateDisks`` method which creates and attaches the new block
devices.
* The data copy for each disk of the instance from the original to the
newly created volume. The data copy will be made by the ``nodeD`` with
the rpc call we have introduced earlier in this design. In case some
disks fail to copy their data the operation will fail and the newly
created disks will be removed. The instance will remain intact.
* The detachment of the original disks of the instance when the data
copy operation successfully completes by calling the
``RemoveInstanceDisk`` method for each instance's disk.
* The attachment of the new disks to the instance by calling the
``AddInstanceDisk`` method for each disk we have created.
* The update of the configuration file with the new values.
* The removal of the original block devices from the node using the
``BlockdevRemove`` method for each one of the old disks.
User interface changes
----------------------
The ``-t`` (``--disk-template``) option from the gnt-instance modify
command will specify the disk template to convert *to*, as it happens
now. The remaining disk options, such as size, mode, and name, will
be computed from the original volumes by the conversion mechanism, and
the user will not explicitly provide them.
ExtStorage conversions
~~~~~~~~~~~~~~~~~~~~~~
When converting to an ExtStorage disk template the
``provider=*PROVIDER*`` option which specifies the ExtStorage provider
will be mandatory. Also, arbitrary parameters can be passed to the
ExtStorage provider. Those parameters will be optional and could be
passed as additional comma separated options. Since it is not allowed to
convert the disk template of an instance and make use of the ``--disk``
option at the same time, we propose to introduce a new option named
``--ext-params`` to handle the ``ext`` template conversions.
::
gnt-instance modify -t ext --ext-params provider=pvdr1 test_vm
gnt-instance modify -t ext --ext-params provider=pvdr1,param1=val1,param2=val2 test_vm
File-based conversions
~~~~~~~~~~~~~~~~~~~~~~
For conversions *to* a file-based template the ``--file-storage-dir``
and the ``--file-driver`` options could be used, similarly to the
**add** command, to manually configure the storage directory and the
preferred driver for the file-based disks.
::
gnt-instance modify -t file --file-storage-dir=mysubdir test_vm
Supported template conversions
==============================
This is a summary of the disk template conversions that the conversion
mechanism will support:
+--------------+-----------------------------------------------------------------------------------+
| Source | Target Disk Template |
| Disk +---------+-------+------+------------+---------+------+------+----------+----------+
| Template | Plain | DRBD | File | Sharedfile | Gluster | RBD | Ext | BlockDev | Diskless |
+==============+=========+=======+======+============+=========+======+======+==========+==========+
| Plain | - | Yes. | Yes. | Yes. | Yes. | Yes. | Yes. | No. | No. |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| DRBD | Yes. | - | Yes. | Yes. | Yes. | Yes. | Yes. | No. | No. |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| File | Yes. | Yes. | - | Yes. | Yes. | Yes. | Yes. | No. | No. |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| Sharedfile | Yes. | Yes. | Yes. | - | Yes. | Yes. | Yes. | No. | No. |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| Gluster | Yes. | Yes. | Yes. | Yes. | - | Yes. | Yes. | No. | No. |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| RBD | Yes. | Yes. | Yes. | Yes. | Yes. | - | Yes. | No. | No. |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| Ext | Yes. | Yes. | Yes. | Yes. | Yes. | Yes. | - | No. | No. |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| BlockDev | Yes. | Yes. | Yes. | Yes. | Yes. | Yes. | Yes. | - | No. |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| Diskless | No. | No. | No. | No. | No. | No. | No. | No. | - |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
Future Work
===========
Expand the conversion mechanism to provide a visual indication of the
data copy operation. We could monitor the progress of the data sent via
a pipe, and provide to the user information such as the time elapsed,
percentage completed (probably with a progress bar), total data
transferred, and so on, similar to the progress tracking that is
currently done by the impexp daemon.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-disks.rst 0000644 0000000 0000000 00000024574 12634264163 0016754 0 ustar 00root root 0000000 0000000 =====
Disks
=====
.. contents:: :depth: 4
This is a design document detailing the implementation of disks as a new
top-level citizen in the config file (just like instances, nodes etc).
Current state and shortcomings
==============================
Currently, Disks are stored in Ganeti's config file as a list
(container) of Disk objects under the Instance in which they belong.
This implementation imposes a number of limitations:
* A Disk object cannot live outside an Instance. This means that one
cannot detach a disk from an instance (without destroying the disk)
and then reattach it (to the same or even to a different instance).
* Disks are not taggable objects, as only top-level citizens of the
config file can be made taggable. Having taggable disks will allow for
further customizations.
* All disks of an instance have to be of the same template. Dropping
this constraint would allow mixing different kinds of storage (e.g. an
instance might have a local ``plain`` storage for the OS and a
remotely replicated ``sharedstorage`` for the data).
Proposed changes
================
The implementation is going to be split in four parts:
* Make disks a top-level citizen in config file. The Instance object
will no longer contain a list of Disk objects, but a list of disk
UUIDs.
* Add locks for Disk objects and make them taggable.
* Allow to attach/detach an existing disk to/from an instance.
* Allow creation/modification/deletion of disks that are not attached to
any instance (requires new LUs for disks).
* Allow disks of a single instance to be of different templates.
* Remove all unnecessary distinction between disk templates and disk
types.
Design decisions
================
Disks as config top-level citizens
----------------------------------
The first patch-series is going to add a new top-level citizen in the
config object (namely ``disks``) and separate the disk objects from the
instances. In doing so there are a number of problems that we have to
overcome:
* How the Disk object will be represented in the config file and how it
is going to be connected with the instance it belongs to.
* How the existing code will get the disks belonging to an instance.
* What it means for a disk to be attached/detached to/from an instance.
* How disks are going to be created/deleted, attached/detached using
the existing code.
Disk representation
~~~~~~~~~~~~~~~~~~~
The ``Disk`` object gets two extra slots, ``_TIMESTAMPS`` and
``serial_no``.
The ``Instance`` object will no longer contain the list of disk objects
that are attached to it or a disk template.
Instead, an Instance object will refer to its
disks using their UUIDs and the disks will contain their own template.
Since the order in which the disks are attached
to an instance is important we are going to have a list of disk UUIDs
under the Instance object which will denote the disks attached to the
instance and their order at the same time. So the Instance's ``disks``
slot is going to be a list of disk UUIDs. The `Disk` object is not going
to have a slot pointing to the `Instance` in which it belongs since this
is redundant.
Get instance's disks
~~~~~~~~~~~~~~~~~~~~
A new function ``GetInstanceDisks`` will be added to the config which, given an
instance, will return a list of Disk objects with the disks attached to this
instance. This list will be exactly the same as 'instance.disks' was before.
Everywhere in the code we are going to replace 'instance.disks' (which from
now on will contain a list of disk UUIDs) with the function
``GetInstanceDisks``.
Since disks will not be part of the `Instance` object any more, 'all_nodes' and
'secondary_nodes' can not be `Instance`'s properties. Instead we will use the
functions ``GetInstanceNodes`` and ``GetInstanceSecondaryNodes`` from the
config to compute these values.
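A minimal sketch of the new accessor (assuming, for illustration, that
the configuration keeps the ``Disk`` objects in a dict keyed by their
UUID) could be:

.. code-block:: python

  def GetInstanceDisks(config, instance):
    """Return the Disk objects attached to an instance, in order."""
    # instance.disks now holds UUIDs; the objects live at top level
    return [config.disks[uuid] for uuid in instance.disks]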
Configuration changes
~~~~~~~~~~~~~~~~~~~~~
The ``ConfigData`` object gets one extra slot: ``disks``. Also there
will be two new functions, ``AddDisk`` and ``RemoveDisk`` that will
create/remove a disk objects from the config.
The ``VerifyConfig`` function will be changed so it can check that there
are no dangling pointers from instances to disks (i.e. an instance
points to a disk that doesn't exist in the config).
The 'upgrade' operation for the config should check whether disks are top-level
citizens and, if not, extract the disk objects from the instances,
replace them with their UUIDs, and copy the disk template. In case of the 'downgrade' operation (where
disks will be made again part of the `Instance` object) all disks that are not
attached to any instance at all will be ignored (removed from config).
The disk template of the
instance is set to the disk template of any disk attached to it. If
there are multiple disk templates present, the downgrade fails and the
user is requested to detach disks from the instances.
Apply Disk modifications
~~~~~~~~~~~~~~~~~~~~~~~~
There are four operations that can be performed to a `Disk` object:
* Create a new `Disk` object of a given template and save it to the
config.
* Remove an existing `Disk` object from the config.
* Attach an existing `Disk` to an existing `Instance`.
* Detach an existing `Disk` from an existing `Instance`.
The first two operations will be performed using the config functions
``AddDisk`` and ``RemoveDisk`` respectively where the last two operations
will be performed using the functions ``AttachInstanceDisk`` and
``DetachInstanceDisk``.
More specifically, the `add` operation will add and attach a disk at the same
time, using a wrapper that calls the ``AddDisk`` and ``AttachInstanceDisk``
functions. On the same vein, the `remove` operation will detach and remove a
disk using a wrapper that calls the ``DetachInstanceDisk`` and
``RemoveInstanceDisk``. The `attach` and `detach` operations are simpler, in
the sense that they only call the ``AttachInstanceDisk`` and
``DetachInstanceDisk`` functions respectively.
It is important to note that the `detach` operation introduces the notion of
disks that are not attached to any instance. For this reason, the configuration
checks for detached disks will be removed, as the detached disks can be handled
by the code.
In addition, since Ganeti doesn't allow a `Disk` object to be attached to
more than one `Instance` at once, when attaching a disk to an instance we have
to make sure that the disk is not attached anywhere else.
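For illustration, the wrappers could be structured roughly as follows;
the exact signatures of the configuration functions are assumptions:

.. code-block:: python

  def AddNewDisk(config, instance, disk, idx=-1):
    # the `add` operation: create the disk, then attach it
    config.AddDisk(disk)
    config.AttachInstanceDisk(instance, disk.uuid, idx)

  def RemoveAttachedDisk(config, instance, disk_uuid):
    # the `remove` operation: detach first, then drop it from the config
    config.DetachInstanceDisk(instance, disk_uuid)
    config.RemoveDisk(disk_uuid)

  def AttachExistingDisk(config, instance, disk_uuid, idx=-1):
    # a disk may be attached to at most one instance at a time
    if any(disk_uuid in inst.disks
           for inst in config.instances.values()):
      raise ValueError("disk %s is already attached" % disk_uuid)
    config.AttachInstanceDisk(instance, disk_uuid, idx)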
Backend changes
~~~~~~~~~~~~~~~
The backend needs access to the disks of an `Instance` but doesn't have access to
the `GetInstanceDisks` function from the config file. Thus we will create a new
`Instance` slot (namely ``disks_info``) that will get annotated (during RPC)
with the instance's disk objects. So in the backend we will only have to
replace the ``disks`` slot with ``disks_info``.
Supporting the old interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The current interface is designed with a uniform disk type in mind and
this interface should still be supported to not break tools and
workflows downstream.
The behaviour is fully compatible for instances with constantly
attached, uniform disks.
Whenever an operation operates on an instance, the operation will only
consider the disks attached. If the operation is specific to a disk
type, it will throw an error if any disks of a type not supported are
attached.
When setting the disk template of an instance, we convert all currently
attached disks to that template. This means that all disk types
currently attached must be convertible to the new template.
Since the disk template as a configuration value is going away, it needs
to be replaced for queries. If the instance has no disks, the
disk_template will be 'diskless', if it has disks of a single type, its
disk_template will be that type, and if it has disks of multiple types,
the new disk template 'mixed' will be returned.
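The rule for the reported value could look like the following sketch
(assuming ``dev_type`` carries a disk's template):

.. code-block:: python

  def query_disk_template(attached_disks):
    templates = set(disk.dev_type for disk in attached_disks)
    if not templates:
      return "diskless"
    if len(templates) == 1:
      return templates.pop()
    return "mixed"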
Eliminating the disk template from the instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In order to remove the disk template from the instance model, all
current uses of the disk template there need to be replaced. These uses
fall into the following general categories:
1. The configuration needs to reflect the new model. `cfgupgrade` and
`bootstrap` need to be fixed, creating and modifying instances and
disks for instances needs to be fixed.
2. The query interface will no longer be able to return an instance disk
template.
3. Several checks for the DISKLESS template will be replaced by checking
if any disks are attached.
4. If an operation works disk by disk, the operation will dispatch for
the functionality by disk instead of by instance. If an operation
requires that all disks are of the same kind (e.g. a query if the
instance is DRBD backed) then the assumption is checked beforehand.
Since this is a user visible change, it will have to be announced in
the NEWS file specifying the calls changed.
5. Operations that operate on the instance and extract the disk template
e.g. for creation of a new disk will require an additional parameter
for the disk template. Several instances already provide an optional
parameter to override the instance setting, those will become
required. This is incompatible as well and will need to be listed in
the NEWS file.
Attach/Detach disks from cli
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The `attach`/`detach` options should be available through the command
``gnt-instance modify``. Like the `add`/`remove` options, the `attach`/`detach`
options can be invoked using the legacy syntax or the new syntax that supports
indexes. For the attach option, we can refer to the disk using either its
`name` or `uuid`. The detach option on the other hand has the same syntax as
the remove option, and we can refer to a disk by its `name`, `uuid` or `index`
in the instance.
The attach/detach syntax can be seen below:
* **Legacy syntax**
.. code-block:: bash
gnt-instance modify --disk attach,name=*NAME* *INSTANCE*
gnt-instance modify --disk attach,uuid=*UUID* *INSTANCE*
gnt-instance modify --disk detach *INSTANCE*
* **New syntax**
.. code-block:: bash
gnt-instance modify --disk *N*:attach,name=*NAME* *INSTANCE*
gnt-instance modify --disk *N*:attach,uuid=*UUID* *INSTANCE*
gnt-instance modify --disk *N*:detach *INSTANCE*
gnt-instance modify --disk *NAME*:detach *INSTANCE*
gnt-instance modify --disk *UUID*:detach *INSTANCE*
.. TODO: Locks for Disk objects
.. TODO: LUs for disks
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-draft.rst 0000644 0000000 0000000 00000001423 12634264163 0016723 0 ustar 00root root 0000000 0000000 ======================
Design document drafts
======================
.. Last updated for Ganeti 2.15
.. toctree::
:maxdepth: 2
design-x509-ca.rst
design-http-server.rst
design-impexp2.rst
design-resource-model.rst
design-storagetypes.rst
design-glusterfs-ganeti-support.rst
design-hugepages-support.rst
design-ceph-ganeti-support.rst
design-os.rst
design-move-instance-improvements.rst
design-node-security.rst
design-ifdown.rst
design-location.rst
design-reservations.rst
design-sync-rate-throttling.rst
design-network2.rst
design-configlock.rst
design-multi-storage-htools.rst
design-shared-storage-redundancy.rst
design-disks.rst
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-file-based-disks-ownership.rst 0000644 0000000 0000000 00000005371 12634264163 0022753 0 ustar 00root root 0000000 0000000 =================================
Ganeti file-based disks ownership
=================================
.. contents:: :depth: 2
This design document explains the issue that emerges from the usage of the
`detach` operation to file-based disks and provides a simple solution to it.
Note that this design document applies only to disks of template `file` and
`sharedfile`, but not `gluster`. However, for brevity reasons these templates
will go under the umbrella term `file-based`.
Current state and shortcomings
==============================
When creating a file-based disk, Ganeti stores it inside a specific directory,
called `file_storage_dir`. Inside this directory, there is a folder for each
file-based instance and inside each folder are the files for the instance's
disks (e.g. ``<file_storage_dir>/<instance_name>/<disk_name>``). This way of
storing disks seems simple enough, but the
`detach` operation does not work well with it. The reason is that if a disk is
detached from an instance and attached to another one, the file will remain in
the folder of the original instance.
This means that if we try to destroy an instance with detached disks, Ganeti
will correctly complain that the instance folder still has disk data. In more
high-level terms, we need to find a way to resolve the issue of disk ownership
at the filesystem level for file-based instances.
Proposed changes
================
The change we propose is simple. Once a disk is detached from an instance, it
will be moved out of the instance's folder. The new location will be the
`file_storage_dir`, i.e. the disk will reside on the same level as the instance
folders. In order to maintain a consistent configuration, the logical_id of the
disk will be updated to point to the new path.
Similarly, on the `attach` operation, the file name and logical id will change
and the disk will be moved under the new instance's directory.
Implementation details
======================
Detach operation
~~~~~~~~~~~~~~~~
Before detaching a disk from an instance, we do the following:
1. Transform the current path to the new one:
   ``<file_storage_dir>/<instance_name>/<disk_name>`` -->
   ``<file_storage_dir>/<disk_name>``
2. Use the rpc call ``call_blockdev_rename`` to move the disk to the new path.
3. Store the new ``logical_id`` to the configuration.
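A sketch of step 1, the path transformation, is given below; it only
manipulates the path string, while the actual renaming is left to the
existing RPC:

.. code-block:: python

  import os

  def detached_disk_path(current_path, file_storage_dir):
    # <file_storage_dir>/<instance_name>/<disk_name>
    #   --> <file_storage_dir>/<disk_name>
    disk_name = os.path.basename(current_path)
    return os.path.join(file_storage_dir, disk_name)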
Attach operation
~~~~~~~~~~~~~~~~
Before attaching a disk to an instance, we do the following:
1. Create the new path for the file disk. In order to construct it properly,
use the ``GenerateDiskTemplate`` function to create a dummy disk template
and get its ``logical_id``. The new ``logical_id`` contains the new path for
the file disk.
2. Use the rpc call ``call_blockdev_rename`` to move the disk to the new path.
3. Store the new ``logical_id`` to the configuration.
ganeti-2.15.2/doc/design-file-based-storage.rst 0000644 0000000 0000000 00000023647 12634264163 0021274 0 ustar 00root root 0000000 0000000 ==================
File-based Storage
==================
This page describes the proposed file-based storage for the 2.0 version
of Ganeti. The project consists of extending Ganeti in order to support
a filesystem image as Virtual Block Device (VBD) in Dom0 as the primary
storage for a VM.
Objective
=========
Goals:
* file-based storage for virtual machines running in a Xen-based
Ganeti cluster
* failover of file-based virtual machines between cluster-nodes
* export/import file-based virtual machines
* reuse existing image files
* allow Ganeti to initialize the cluster without checking for a volume
group (e.g. xenvg)
Non Goals:
* any kind of data mirroring between clusters for file-based instances
(this should be achieved by using shared storage)
* special support for live-migration
* encryption of VBDs
* compression of VBDs
Background
==========
Ganeti is a virtual server management software tool built on top of Xen
VM monitor and other Open Source software.
Since Ganeti currently supports only block devices as storage backend
for virtual machines, the wish came up to provide a file-based backend.
Using this file-based option provides the possibility to store the VBDs
on basically every filesystem and therefore allows to deploy external
data storages (e.g. SAN, NAS, etc.) in clusters.
Overview
========
Introduction
++++++++++++
Xen (and other hypervisors) provide(s) the possibility to use a file as
the primary storage for a VM. One file represents one VBD.
Advantages/Disadvantages
++++++++++++++++++++++++
Advantages of file-backed VBD:
* support of sparse allocation
* easy from a management/backup point of view (e.g. you can just copy
the files around)
* external storage (e.g. SAN, NAS) can be used to store VMs
Disadvantages of file-backed VBD:
* possible performance loss for I/O-intensive workloads
* using sparse files requires care to ensure the sparseness is
preserved when copying, and there is no header in which metadata
relating back to the VM can be stored
Xen-related specifications
++++++++++++++++++++++++++
Driver
~~~~~~
There are several ways to realize the required functionality with an
underlying Xen hypervisor.
1) loopback driver
^^^^^^^^^^^^^^^^^^
Advantages:
* available in most precompiled kernels
* stable, since it is in kernel tree for a long time
* easy to set up
Disadvantages:
* buffer writes very aggressively, which can affect guest filesystem
correctness in the event of a host crash
* can even cause out-of-memory kernel crashes in Dom0 under heavy
write load
* substantial slowdowns under heavy I/O workloads
* the default number of supported loopdevices is only 8
* doesn't support QCOW files
2) ``blktap`` driver
^^^^^^^^^^^^^^^^^^^^
Advantages:
* higher performance than loopback driver
* more scalable
* better safety properties for VBD data
* Xen-team strongly encourages use
* already in Xen tree
* supports QCOW files
* asynchronous driver (i.e. high performance)
Disadvantages:
* not enabled in most precompiled kernels
* stable, but not as much tested as loopback driver
3) ``ublkback`` driver
^^^^^^^^^^^^^^^^^^^^^^
The Xen Roadmap states "Work is well under way to implement a
``ublkback`` driver that supports all of the various qemu file format
plugins".
Furthermore, the Roadmap includes the following:
"... A special high-performance qcow plugin is also under
development, that supports better metadata caching, asynchronous IO,
and allows request reordering with appropriate safety barriers to
enforce correctness. It remains both forward and backward compatible
with existing qcow disk images, but makes adjustments to qemu's
default allocation policy when creating new disks such as to
optimize performance."
File types
~~~~~~~~~~
Raw disk image file
^^^^^^^^^^^^^^^^^^^
Advantages:
* Resizing supported
* Sparse file (filesystem dependend)
* simple and easily exportable
Disadvantages:
* Underlying filesystem needs to support sparse files (most
filesystems do, though)
QCOW disk image file
^^^^^^^^^^^^^^^^^^^^
Advantages:
* Smaller file size, even on filesystems which don't support holes
(i.e. sparse files)
* Snapshot support, where the image only represents changes made to an
underlying disk image
* Optional zlib based compression
* Optional AES encryption
Disadvantages:
* Resizing not supported yet (it's on the way)
VMDK disk image file
^^^^^^^^^^^^^^^^^^^^
This file format is directly based on the qemu vmdk driver, which is
synchronous and thus slow.
Detailed Design
===============
Terminology
+++++++++++
* **VBD** (Virtual Block Device): Persistent storage available to a
virtual machine, providing the abstraction of an actual block
storage device. VBDs may be actual block devices, filesystem images,
or remote/network storage.
* **Dom0** (Domain 0): The first domain to be started on a Xen
machine. Domain 0 is responsible for managing the system.
* **VM** (Virtual Machine): The environment in which a hosted
operating system runs, providing the abstraction of a dedicated
machine. A VM may be identical to the underlying hardware (as in
full virtualization, or it may differ, as in paravirtualization). In
the case of Xen the domU (unprivileged domain) instance is meant.
* **QCOW**: QEMU (a processor emulator) image format.
Implementation
++++++++++++++
Managing file-based instances
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The option for file-based storage will be added to the 'gnt-instance'
utility.
Add Instance
^^^^^^^^^^^^
Example:
gnt-instance add -t file:[path\ =[,driver=loop[,reuse[,...]]]] \
--disk 0:size=5G --disk 1:size=10G -n node -o debian-etch instance2
This will create a file-based instance with e.g. the following files:
* ``<instance-dir>/sda`` -> 5GB
* ``<instance-dir>/sdb`` -> 10GB
The default directory where files will be stored is
``/srv/ganeti/file-storage/``. This can be changed by setting the
``path`` option. This option denotes the full path to the directory
where the files are stored. The filetype will be "raw" for the first
release of Ganeti 2.0. However, the code will be extensible to more
file types, since Ganeti will store information about the file type of
each image file. Internally Ganeti will keep track of the used driver,
the file-type and the full path to the file for every VBD. Example:
``"logical_id" : [FD_LOOP, FT_RAW, "/instance1/sda"]``. If the
``--reuse`` flag is set, Ganeti checks for existing files in the
corresponding directory (e.g. ``/xen/instance2/``). If one or more
files in this directory are present and correctly named (the naming
conventions will be defined in Ganeti version 2.0), Ganeti will set a
VM up with these. If no file can be found or the names are invalid, the
operation will be aborted.
Remove instance
^^^^^^^^^^^^^^^
The instance removal will just differ from the actual one by deleting
the VBD-files instead of the corresponding block device (e.g. a logical
volume).
Starting/Stopping Instance
^^^^^^^^^^^^^^^^^^^^^^^^^^
Here nothing has to be changed, as the xen tools don't differentiate
between file-based or blockdevice-based instances in this case.
Export/Import instance
^^^^^^^^^^^^^^^^^^^^^^
Provided "dump/restore" is used in the "export" and "import" guest-os
scripts, there are no modifications needed when file-based instances are
exported/imported. If any other backup-tool (which requires access to
the mounted file-system) is used then the image file can be temporarily
mounted. This can be done in different ways:
Mount a raw image file via loopback driver::
mount -o loop /srv/ganeti/file-storage/instance1/sda1 /mnt/disk
Mount a raw image file via blkfront driver (Dom0 kernel needs this
module to do the following operation)::
xm block-attach 0 tap:aio:/srv/ganeti/file-storage/instance1/sda1 /dev/xvda1 w 0
mount /dev/xvda1 /mnt/disk
Mount a qcow image file via blkfront driver (Dom0 kernel needs this
module to do the following operation)::
xm block-attach 0 tap:qcow:/srv/ganeti/file-storage/instance1/sda1 /dev/xvda1 w 0
mount /dev/xvda1 /mnt/disk
High availability features with file-based instances
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Failing over an instance
^^^^^^^^^^^^^^^^^^^^^^^^
Failover is done in the same way as with block device backends. The
instance gets stopped on the primary node and started on the secondary.
The roles of primary and secondary get swapped. Note: If a failover is
done, Ganeti will assume that the corresponding VBD(s) location (i.e.
directory) is the same on the source and destination node. In case one
or more corresponding file(s) are not present on the destination node,
Ganeti will abort the operation.
Replacing an instance disks
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Since there is no data mirroring for file-backed VM there is no such
operation.
Evacuation of a node
^^^^^^^^^^^^^^^^^^^^
Since there is no data mirroring for file-backed VMs there is no such
operation.
Live migration
^^^^^^^^^^^^^^
Live migration is possible using file-backed VBDs. However, the
administrator has to make sure that the corresponding files are exactly
the same on the source and destination node.
Xen Setup
+++++++++
File creation
~~~~~~~~~~~~~
Creation of a raw file is simple. Example of creating a sparse file of 2
Gigabytes. The option "seek" instructs "dd" to create a sparse file::
dd if=/dev/zero of=vm1disk bs=1k seek=2048k count=1
Creation of QCOW image files can be done with the "qemu-img" utility (in
debian it comes with the "qemu" package).
Config file
~~~~~~~~~~~
The Xen config file will have the following modification if one chooses
the file-based disk-template.
1) loopback driver and raw file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
::
disk = ['file:,sda1,w']
2) blktap driver and raw file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
::
disk = ['tap:aio:,sda1,w']
3) blktap driver and qcow file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
::
disk = ['tap:qcow:,sda1,w']
Other hypervisors
+++++++++++++++++
Other hypervisors have mostly different ways to make storage available
to their virtual instances/machines. This is beyond the scope of this
document.
ganeti-2.15.2/doc/design-glusterfs-ganeti-support.rst 0000644 0000000 0000000 00000031143 12634264163 0022622 0 ustar 00root root 0000000 0000000 ========================
GlusterFS Ganeti support
========================
This document describes the plan for adding GlusterFS support inside Ganeti.
.. contents:: :depth: 4
.. highlight:: shell-example
Gluster overview
================
Gluster is a "brick" "translation" service that can turn a number of LVM logical
volume or disks (so-called "bricks") into an unified "volume" that can be
mounted over the network through FUSE or NFS.
This is a simplified view of what components are at play and how they
interconnect as data flows from the actual disks to the instances. The parts in
grey are available for Ganeti to use and included for completeness but not
targeted for implementation at this stage.
.. digraph:: "gluster-ganeti-overview"
graph [ spline=ortho ]
node [ shape=rect ]
{
node [ shape=none ]
_volume [ label=volume ]
bricks -> translators -> _volume
_volume -> network [label=transport]
network -> instances
}
{ rank=same; brick1 [ shape=oval ]
brick2 [ shape=oval ]
brick3 [ shape=oval ]
bricks }
{ rank=same; translators distribute }
{ rank=same; volume [ shape=oval ]
_volume }
{ rank=same; instances instanceA instanceB instanceC instanceD }
{ rank=same; network FUSE NFS QEMUC QEMUD }
{
node [ shape=oval ]
brick1 [ label=brick ]
brick2 [ label=brick ]
brick3 [ label=brick ]
}
{
node [ shape=oval ]
volume
}
brick1 -> distribute
brick2 -> distribute
brick3 -> distribute -> volume
volume -> FUSE [ label="TCP, UDP"
color="black:grey" ]
NFS [ color=grey fontcolor=grey ]
volume -> NFS [ label="TCP" color=grey fontcolor=grey ]
NFS -> mountpoint [ color=grey fontcolor=grey ]
mountpoint [ shape=oval ]
FUSE -> mountpoint
instanceA [ label=instances ]
instanceB [ label=instances ]
mountpoint -> instanceA
mountpoint -> instanceB
mountpoint [ shape=oval ]
QEMUC [ label=QEMU ]
QEMUD [ label=QEMU ]
{
instanceC [ label=instances ]
instanceD [ label=instances ]
}
volume -> QEMUC [ label="TCP, UDP"
color="black:grey" ]
volume -> QEMUD [ label="TCP, UDP"
color="black:grey" ]
QEMUC -> instanceC
QEMUD -> instanceD
brick:
The unit of storage in gluster. Typically a drive or LVM logical volume
formatted using, for example, XFS.
distribute:
One of the translators in Gluster, it assigns files to bricks based on the
hash of their full path inside the volume.
volume:
A filesystem you can mount on multiple machines; all machines see the same
directory tree and files.
FUSE/NFS:
Gluster offers two ways to mount volumes: through FUSE or a custom NFS server
that is incompatible with other NFS servers. FUSE is more compatible with
other services running on the storage nodes; NFS gives better performance.
For now, FUSE is a priority.
QEMU:
QEMU 1.3 has the ability to use Gluster volumes directly in userspace without
the need for mounting anything. Ganeti still needs kernelspace access at disk
creation and OS install time.
transport:
FUSE and QEMU allow you to connect using TCP and UDP, whereas NFS only
supports TCP. Those protocols are called transports in Gluster. For now, TCP
is a priority.
It is the administrator's duty to set up the bricks, the translators and thus
the volume as they see fit. Ganeti will take care of connecting the instances to
a given volume.
.. note::
The gluster mountpoint must be whitelisted by the administrator in
``/etc/ganeti/file-storage-paths`` for security reasons in order to allow
Ganeti to modify the filesystem.
Why not use a ``sharedfile`` disk template?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gluster volumes `can` be used by Ganeti using the generic shared file disk
template. There are a number of reasons why that is probably not a good idea,
however:
* Shared file, being a generic solution, cannot offer userspace access support.
* Even with userspace support, Ganeti still needs kernelspace access in order to
create disks and install OSes on them. Ganeti can manage the mounting for you
so that the Gluster servers only have as many connections as necessary.
* Experiments showed that you can't trust ``mount.glusterfs`` to give useful
return codes or error messages. Ganeti can work around its oddities so
administrators don't have to.
* The shared file folder scheme (``../{instance.name}/disk{disk.id}``) does not
work well with Gluster. The ``distribute`` translator distributes files across
bricks, but directories need to be replicated on `all` bricks. As a result, if
we have a dozen hundred instances, that means a dozen hundred folders being
replicated on all bricks. This does not scale well.
* This frees up the shared file disk template to use a different, unsupported
replication scheme together with Gluster. (Storage pools are the long term
solution for this, however.)
So, while gluster `is` a shared file disk template, essentially, Ganeti can
provide better support for it than that.
Implementation strategy
=======================
Working with GlusterFS in kernel space essentially boils down to:
1. Ask FUSE to mount the Gluster volume.
2. Check that the mount succeeded.
3. Use files stored in the volume as instance disks, just like sharedfile does.
4. When the instances are spun down, attempt unmounting the volume. If the
gluster connection is still required, the mountpoint is allowed to remain.
However, since it is not strictly necessary for Gluster to mount the disk if
all that is needed is userspace access, it is inappropriate for the Gluster
storage class to inherit from FileStorage. So the implementation should resort to
composition rather than inheritance:
1. Extract the ``FileStorage`` disk-facing logic into a ``FileDeviceHelper``
class.
* In order not to further inflate bdev.py, Filestorage should join its helper
functions in filestorage.py (thus reducing their visibility) and add Gluster
to its own file, gluster.py. Moving the other classes to their own files
(as has been done in ``lib/hypervisor/``) is not addressed as part of this
design.
2. Use the ``FileDeviceHelper`` class to implement a ``GlusterStorage`` class in
much the same way.
3. Add Gluster as a disk template that behaves like SharedFile in every way.
4. Provide Ganeti with knowledge of what a ``GlusterVolume`` is and how to
mount, unmount and reference one (a sketch of the mount handling follows
this list).
* Before attempting a mount, we should check if the volume is not mounted
already. Linux allows mounting partitions multiple times, but then you also
have to unmount them as many times as you mounted them to actually free the
resources; this also makes the output of commands such as ``mount`` less
useful.
* Every time the device could be released (after instance shutdown, OS
installation scripts or file creation), a single unmount is attempted. If
the device is still busy (e.g. from other instances, jobs or open
administrator shells), the failure is ignored.
5. Modify ``GlusterStorage`` and customize the disk template behavior to fit
Gluster's needs.
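In a heavily simplified form, the mount handling of point 4 could look
like the sketch below; the actual code would live with the
``GlusterVolume`` handling and use Ganeti's own process utilities rather
than ``subprocess``:

.. code-block:: python

  import os
  import subprocess

  def ensure_mounted(server, volume, mount_point):
    if os.path.ismount(mount_point):
      return  # never stack multiple mounts of the same volume
    if not os.path.isdir(mount_point):
      os.makedirs(mount_point)
    # mount.glusterfs cannot be trusted to report errors reliably, so
    # the mount point is checked again afterwards instead.
    subprocess.call(["mount", "-t", "glusterfs",
                     "%s:/%s" % (server, volume), mount_point])
    if not os.path.ismount(mount_point):
      raise RuntimeError("mounting gluster volume %s failed" % volume)

  def try_unmount(mount_point):
    # Attempted whenever the volume might no longer be needed; failure
    # (e.g. the volume is still busy) is deliberately ignored.
    subprocess.call(["umount", mount_point])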
Directory structure
~~~~~~~~~~~~~~~~~~~
In order to address the shortcomings of the generic shared file handling of
instance disk directory structure, Gluster uses a different scheme for
determining a disk's logical id and therefore path on the file system.
The naming scheme is::
/ganeti/{instance.uuid}.{disk.id}
...bringing the actual path on a node's file system to::
/var/run/ganeti/gluster/ganeti/{instance.uuid}.{disk.id}
This means Ganeti only uses one folder on the Gluster volume (allowing other
uses of the Gluster volume in the meantime) and works better with how Gluster
distributes storage over its bricks.
Changes to the storage types system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ganeti has a number of storage types that abstract over disk templates. This
matters mainly in terms of disk space reporting. Gluster support is improved by
a rethinking of how disk templates are assigned to storage types in Ganeti.
This is the summary of the changes:
+--------------+---------+---------+-------------------------------------------+
| Disk | Current | New | Does it report storage information to... |
| template | storage | storage +-------------+----------------+------------+
| | type | type | ``gnt-node | ``gnt-node | iallocator |
| | | | list`` | list-storage`` | |
+==============+=========+=========+=============+================+============+
| File | File | File | Yes. | Yes. | Yes. |
+--------------+---------+---------+-------------+----------------+------------+
| Shared file | File | Shared | No. | Yes. | No. |
+--------------+---------+ file | | | |
| Gluster (new)| N/A | (new) | | | |
+--------------+---------+---------+-------------+----------------+------------+
| RBD (for | RBD | No. | No. | No. |
| reference) | | | | |
+--------------+-------------------+-------------+----------------+------------+
Gluster or Shared File should not, like RBD, report storage information to
gnt-node list or to IAllocators. Regrettably, the simplest way to do so right
now is by claiming that storage reporting for the relevant storage type is not
implemented. An effort was made to claim that the shared storage type did support
disk reporting while refusing to provide any value, but it was not successful
(``hail`` does not support this combination.)
To do so without breaking the File disk template, a new storage type must be
added. Like RBD, it does not claim to support disk reporting. However, we can
still make an effort of reporting stats to ``gnt-node list-storage``.
The rationale is simple. For shared file and gluster storage, disk space is not
a function of any one node. If storage types with disk space reporting are used,
Hail expects them to give useful numbers for allocation purposes, but a shared
storage system means disk balancing is not affected by node-instance allocation
any longer. Moreover, it would be wasteful to mount a Gluster volume on each
node just for running statvfs() if no machine was actually running gluster VMs.
As a result, Gluster support for gnt-node list-storage is necessarily limited
and nodes on which Gluster is available but not in use will report failures.
Additionally, running ``gnt-node list`` will give an output like this::
Node DTotal DFree MTotal MNode MFree Pinst Sinst
node1.example.com ? ? 744M 273M 477M 0 0
node2.example.com ? ? 744M 273M 477M 0 0
This is expected and consistent with behaviour in RBD.
An alternative would have been to report DTotal and DFree as 0 in order to allow
``hail`` to ignore the disk information, but this incorrectly populates the
``gnt-node list`` DTotal and DFree fields with 0s as well.
New configuration switches
~~~~~~~~~~~~~~~~~~~~~~~~~~
Configurable at the cluster and node group level (``gnt-cluster modify``,
``gnt-group modify`` and other commands that support the `-D` switch to edit
disk parameters):
``gluster:host``
The IP address or hostname of the Gluster server to connect to. In the default
deployment of Gluster, that is any machine that is hosting a brick.
Default: ``"127.0.0.1"``
``gluster:port``
The port the Gluster server is listening on.
Default: ``24007``
``gluster:volume``
The volume Ganeti should use.
Default: ``"gv0"``
Configurable at the cluster level only (``gnt-cluster init``) and stored in
ssconf for all nodes to read (just like shared file):
``--gluster-dir``
Where the Gluster volume should be mounted.
Default: ``/var/run/ganeti/gluster``
The default values work if all of the Ganeti nodes also host Gluster bricks.
This is possible, but `not` recommended as it can cause the host to hardlock due
to deadlocks in the kernel memory (much in the same way RBD works).
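For illustration, the defaults described above could be represented as a
simple mapping (a sketch only; the key names just mirror the switches listed
above and are not taken from Ganeti's source)::

  GLUSTER_DISK_PARAM_DEFAULTS = {
      "gluster:host": "127.0.0.1",   # any machine hosting a brick
      "gluster:port": 24007,         # port the Gluster server listens on
      "gluster:volume": "gv0",       # volume Ganeti should use
  }

  # Cluster-level only, stored in ssconf for all nodes to read:
  GLUSTER_DIR_DEFAULT = "/var/run/ganeti/gluster"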
Future work
===========
In no particular order:
* Support the UDP transport.
* Support mounting through NFS.
* Filter ``gnt-node list`` so DTotal and DFree are not shown for RBD and shared
file disk types, or otherwise report the disk storage values as "-" or some
other special value to clearly distinguish it from the result of a
communication failure between nodes.
* Allow configuring the in-volume path Ganeti uses.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-hotplug.rst 0000644 0000000 0000000 00000025317 12634264163 0017315 0 ustar 00root root 0000000 0000000 =======
Hotplug
=======
.. contents:: :depth: 4
This is a design document detailing the implementation of device
hotplugging in Ganeti. The logic used is hypervisor agnostic but still
the initial implementation will target the KVM hypervisor. The
implementation adds ``python-fdsend`` as a new dependency. In case
it is not installed hotplug will not be possible and the user will
be notified with a warning.
Current state and shortcomings
==============================
Currently, Ganeti supports addition/removal/modification of devices
(NICs, Disks) but the actual modification takes place only after
rebooting the instance. To this end an instance cannot change network,
get a new disk etc. without a hard reboot.
Until now, in case of KVM hypervisor, code does not name devices nor
places them in specific PCI slots. Devices are appended in the KVM
command and Ganeti lets KVM decide where to place them. This means that
there is a possibility that a device residing in PCI slot 5 will, after a
reboot (due to another device's removal), be moved to another PCI slot
and probably get renamed too (due to udev rules, etc.).
In order for a migration to succeed, the process on the target node
should be started with exactly the same machine version, CPU
architecture and PCI configuration as the running process. During
instance creation/startup ganeti creates a KVM runtime file with all the
necessary information to generate the KVM command. This runtime file is
used during instance migration to start a new identical KVM process. The
current format includes the fixed part of the final KVM command, a list
of NICs, and the hvparams dict. It does not favor easy manipulations
concerning disks, because they are encapsulated in the fixed KVM
command.
Proposed changes
================
For the case of the KVM hypervisor, QEMU exposes 32 PCI slots to the
instance. Disks and NICs occupy some of these slots. Recent versions of
QEMU have introduced monitor commands that allow addition/removal of PCI
devices. Devices are referenced based on their name or position on the
virtual PCI bus. To be able to use these commands, we need to be able to
assign each device a unique name.
To keep track where each device is plugged into, we add the
``pci`` slot to Disk and NIC objects, but we save it only in runtime
files, since it is hypervisor specific info. This is added for easy
object manipulation and is ensured not to be written back to the config.
We propose to make use of QEMU 1.7 QMP commands so that
modifications to devices take effect instantly without the need for hard
reboot. The only change exposed to the end-user will be the addition of
a ``--hotplug`` option to the ``gnt-instance modify`` command.
Upon hotplugging the PCI configuration of an instance is changed.
Runtime files should be updated correspondingly. Currently this is
impossible in case of disk hotplug because disks are included in command
line entry of the runtime file, contrary to NICs that are correctly
treated separately. We change the format of runtime files: we remove
disks from the fixed KVM command and create a new entry containing only
them. KVM options concerning disks are generated during
``_ExecuteKVMCommand()``, just like for NICs.
Design decisions
================
Which should be each device ID? Currently KVM does not support arbitrary
IDs for devices; supported are only names starting with a letter, max 32
chars length, and only including '.' '_' '-' special chars.
For debugging purposes and in order to be more informative, each device will
be named after: <device type>-<part of uuid>-pci-<slot>.
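A minimal sketch of such a naming helper, assuming the name is built from the
device type, part of its UUID and the PCI slot (the exact format below is
illustrative, not taken from the implementation)::

  def kvm_device_name(dev_type, dev_uuid, pci_slot):
      # e.g. "nic-d6023bba-pci-6"; starts with a letter, stays below 32
      # characters and uses only the allowed '.', '_', '-' special chars
      name = "%s-%s-pci-%d" % (dev_type.lower(), dev_uuid.split("-")[0],
                               pci_slot)
      return name[:32]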
Who decides where to hotplug each device? As long as this is a
hypervisor specific matter, there is no point for the master node to
decide such a thing. Master node just has to request noded to hotplug a
device. To this end, hypervisor specific code should parse the current
PCI configuration (i.e. ``query-pci`` QMP command), find the first
available slot and hotplug the device. Having noded to decide where to
hotplug a device we ensure that no error will occur due to duplicate
slot assignment (if masterd keeps track of PCI reservations and noded
fails to return the PCI slot that the device was plugged into then next
hotplug will fail).
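A rough sketch of that selection, assuming ``query-pci`` returns a list of
buses, each carrying a ``devices`` list whose entries have a ``slot`` member
(the helper name and data handling are assumptions)::

  def first_free_pci_slot(query_pci_output, max_slots=32):
      used = set()
      for bus in query_pci_output:
          for dev in bus.get("devices", []):
              used.add(dev["slot"])
      for slot in range(max_slots):
          if slot not in used:
              return slot
      raise RuntimeError("No free PCI slot left on the virtual bus")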
Where should we keep track of devices' PCI slots? As already mentioned,
we must keep track of devices PCI slots to successfully migrate
instances. First option is to save this info to config data, which would
allow us to place each device at the same PCI slot after reboot. This
would require to make the hypervisor return the PCI slot chosen for each
device, and storing this information to config data. Additionally the
whole instance configuration should be returned with PCI slots filled
after instance start and each instance should keep track of current PCI
reservations. We decide not to go towards this direction in order to
keep it simple and do not add hypervisor specific info to configuration
data (``pci_reservations`` at instance level and ``pci`` at device
level). For the aforementioned reason, we decide to store this info only
in KVM runtime files.
Where to place the devices upon instance startup? QEMU has by default 4
pre-occupied PCI slots. So, hypervisor can use the remaining ones for
disks and NICs. Currently, PCI configuration is not preserved after
reboot. Each time an instance starts, KVM assigns PCI slots to devices
based on their ordering in Ganeti configuration, i.e. the second disk
will be placed after the first, the third NIC after the second, etc.
Since we decided that there is no need to keep track of devices PCI
slots, there is no need to change current functionality.
How to deal with existing instances? Hotplug depends on runtime file
manipulation: the runtime file stores the PCI info of every device the KVM
process is currently using. Existing files have no PCI info in devices and
have block devices encapsulated inside the kvm_cmd entry. Thus hotplugging of
existing devices will not be possible. Still, migration and hotplugging of new
devices will succeed. The workaround will happen upon loading the KVM runtime:
if we detect the old-style format we will add an empty list for block devices,
and upon saving the KVM runtime we will include this empty list as well.
Switching entirely to the new
format will happen upon instance reboot.
Configuration changes
---------------------
The ``NIC`` and ``Disk`` objects get one extra slot: ``pci``. It refers to
PCI slot that the device gets plugged into.
In order to be able to live migrate successfully, runtime files should
be updated every time a live modification (hotplug) takes place. To this
end we change the format of runtime files. The KVM options referring to
instance's disks are no longer recorded as part of the KVM command line.
Disks are treated separately, just as we treat NICs right now. We insert
and remove entries to reflect the current PCI configuration.
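The new layout can be illustrated roughly as follows (a simplified,
hypothetical serialization; the exact entries and field names are not part of
this design)::

  import json

  kvm_runtime = [
      "/usr/bin/kvm -name instance1 ...",   # fixed command, no disk options
      [{"uuid": "nic-uuid", "pci": 6}],     # NICs together with their PCI slots
      {"acpi": True},                       # hvparams dict
      [{"uuid": "disk-uuid", "pci": 7}],    # new: separate block device entry
  ]

  serialized = json.dumps(kvm_runtime)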
Backend changes
---------------
Introduce one new RPC call:
- hotplug_device(DEVICE_TYPE, ACTION, device, ...)
where DEVICE_TYPE can be either NIC or Disk, and ACTION either REMOVE or ADD.
Hypervisor changes
------------------
We implement hotplug on top of the KVM hypervisor. We take advantage of
QEMU 1.7 QMP commands (``device_add``, ``device_del``,
``blockdev-add``, ``netdev_add``, ``netdev_del``). Since ``drive_del``
is not yet implemented in QMP we use the one of HMP. QEMU
refers to devices based on their id. We use ``uuid`` to name them
properly. If a device is about to be hotplugged we parse the output of
``query-pci`` and find the occupied PCI slots. We choose the first
available and the whole device object is appended to the corresponding
entry in the runtime file.
Concerning NIC handling, we build on the top of the existing logic
(first create a tap with _OpenTap() and then pass its file descriptor to
the KVM process). To this end we need to pass access rights to the
corresponding file descriptor over the QMP socket (UNIX domain
socket). The open file is passed as a socket-level control message
(SCM), using the ``fdsend`` python library.
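A sketch of that file-descriptor passing, assuming an already connected QMP
UNIX socket and using the ``fdsend`` library (the command string and helper
name are illustrative)::

  import fdsend

  def send_tap_fd(qmp_sock, tap_fd, fd_name):
      # QEMU's "getfd" command picks up the descriptor attached to the
      # message and makes it available under the given name
      cmd = '{"execute": "getfd", "arguments": {"fdname": "%s"}}' % fd_name
      fdsend.sendfds(qmp_sock, cmd, fds=[tap_fd])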
User interface
--------------
The new ``--hotplug`` option to gnt-instance modify is introduced, which
forces live modifications.
Enabling hotplug
++++++++++++++++
Hotplug will be optional during gnt-instance modify. For existing
instances, after installing a version that supports hotplugging, we
have the restriction that hotplug will not be supported for existing
devices. The reason is that old runtime files lack:
1. Device pci configuration info.
2. Separate block device entry.
Hotplug will be supported only for KVM in the first implementation. For
all other hypervisors, the backend will raise an exception in case hotplug is
requested.
NIC Hotplug
+++++++++++
The user can add/modify/remove NICs either with hotplugging or not. If a
NIC is to be added, a tap is created first and configured properly with
the kvm-vif-bridge script. Then the instance gets a new network interface.
Since there is no QEMU monitor command to modify a NIC, we modify a NIC
by temporarily removing the existing one and adding a new one with the new
configuration. When removing a NIC the corresponding tap gets removed as
well.
::
gnt-instance modify --net add --hotplug test
gnt-instance modify --net 1:mac=aa:00:00:55:44:33 --hotplug test
gnt-instance modify --net 1:remove --hotplug test
Disk Hotplug
++++++++++++
The user can add and remove disks with hotplugging or not. QEMU monitor
supports resizing of disks, however the initial implementation will
support only disk addition/deletion.
::
gnt-instance modify --disk add:size=1G --hotplug test
gnt-instance modify --disk 1:remove --hotplug test
Dealing with chroot and uid pool (and disks in general)
-------------------------------------------------------
The design so far covers all issues that arise without addressing the
case where the kvm process will not run with root privileges.
Specifically:
- in case of chroot, the kvm process cannot see the newly created device
- in case of uid pool security model, the kvm process is not allowed
to access the device
For NIC hotplug we address this problem by using the ``getfd`` QMP
command and passing the file descriptor to the kvm process over the
monitor socket using SCM_RIGHTS. For disk hotplug and in case of uid
pool we can let the hypervisor code temporarily ``chown()`` the device
before the actual hotplug. Still this is insufficient in case of chroot.
In this case, we need to ``mknod()`` the device inside the chroot. Both
workarounds can be avoided, if we make use of the ``add-fd``
QMP command, that was introduced in version 1.7. This command is the
equivalent of NICs' ``getfd`` for disks and will allow disk hotplug in
every case. So, if the QMP does not support the ``add-fd``
command, we will not allow disk hotplug
and notify the user with the corresponding warning.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-hroller.rst 0000644 0000000 0000000 00000016315 12634264163 0017300 0 ustar 00root root 0000000 0000000 ============
HRoller tool
============
.. contents:: :depth: 4
This is a design document detailing the cluster maintenance scheduler,
HRoller.
Current state and shortcomings
==============================
To enable automating cluster-wide reboots a new htool, called HRoller,
was added to Ganeti starting from version 2.7. This tool helps
parallelizing cluster offline maintenances by calculating which nodes
are not both primary and secondary for a DRBD instance, and thus can be
rebooted at the same time, when all instances are down.
The way this is done is documented in the :manpage:`hroller(1)` manpage.
We would now like to perform online maintenance on the cluster by
rebooting nodes after evacuating their primary instances (rolling
reboots).
Proposed changes
================
New options
-----------
- HRoller should be able to operate on single nodegroups (-G flag) or
select its target node through some other mean (eg. via a tag, or a
regexp). (Note that individual node selection is already possible via
the -O flag, that makes hroller ignore a node altogether).
- HRoller should handle non-redundant instances: currently these are
ignored but there should be a way to select its behavior between "it's
ok to reboot a node when a non-redundant instance is on it" or "skip
nodes with non-redundant instances". This will only be selectable
globally, and not per instance.
- Hroller will make sure to keep any instance which is up in its current
state, via live migrations, unless explicitly overridden. The
algorithm that will be used to calculate the rolling reboot with live
migrations is described below, and any override on considering the
instance status will only be possible on the whole run, and not
per-instance.
Calculating rolling maintenances
--------------------------------
In order to perform rolling maintenance we need to migrate instances off
the nodes before a reboot. How this can be done depends on the
instance's disk template and status:
Down instances
++++++++++++++
If an instance was shutdown when the maintenance started it will be
considered for avoiding contemporary reboot of its primary and secondary
nodes, but will *not* be considered as a target for the node evacuation.
This allows avoiding needlessly moving its primary around, since it
won't suffer a downtime anyway.
Note that a node with non-redundant instances will only ever be
considered good for rolling-reboot if these are down (or the checking of
status is overridden) *and* an explicit option to allow it is set.
DRBD
++++
Each node must migrate all instances off to their secondaries, and then
can either be rebooted, or the secondaries can be evacuated as well.
Since currently doing a ``replace-disks`` on DRBD breaks redundancy,
it's not any safer than temporarily rebooting a node with secondaries on
them (citation needed). As such we'll implement for now just the
"migrate+reboot" mode, and focus later on replace-disks as well.
In order to do that we can use the following algorithm:
1) Compute node sets that don't contain both the primary and the
secondary of any instance, and also don't contain the primary
nodes of two instances that have the same node as secondary. These
can be obtained by computing a coloring of the graph with nodes
as vertexes and an edge between two nodes, if either condition
prevents simultaneous maintenance. (This is the current algorithm of
:manpage:`hroller(1)` with the extension that the graph to be colored
has additional edges between the primary nodes of two instances sharing
their secondary node.)
2) It is then possible to migrate in parallel all nodes in a set
created at step 1, and then reboot/perform maintenance on them, and
migrate back their original primaries, which allows the computation
above to be reused for each following set without N+1 failures
being triggered, if none were present before. See below about the
actual execution of the maintenance.
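A minimal sketch of the node-set computation of step 1 above, using a plain
greedy colouring over an explicit conflict map (the data model and helper
name are assumptions, not hroller's actual code)::

  def reboot_sets(nodes, drbd_instances):
      """Split nodes into sets that can be rebooted simultaneously.

      drbd_instances is an iterable of (primary, secondary) node pairs.
      """
      conflicts = dict((n, set()) for n in nodes)
      by_secondary = {}
      for pri, sec in drbd_instances:
          # never reboot a primary together with its secondary
          conflicts[pri].add(sec)
          conflicts[sec].add(pri)
          by_secondary.setdefault(sec, set()).add(pri)
      # primaries sharing a secondary must not be rebooted together either
      for primaries in by_secondary.values():
          for node in primaries:
              conflicts[node] |= primaries - set([node])
      sets = []
      for node in sorted(nodes):
          for candidate in sets:
              if conflicts[node].isdisjoint(candidate):
                  candidate.add(node)
                  break
          else:
              sets.append(set([node]))
      return sets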
Non-DRBD
++++++++
All non-DRBD disk templates that can be migrated have no "secondary"
concept. As such instances can be migrated to any node (in the same
nodegroup). In order to do the job we can either:
- Perform migrations on one node at a time, perform the maintenance on
that node, and proceed (the node will then be targeted again to host
instances automatically, as hail chooses targets for the instances
between all nodes in a group). Nodes in different nodegroups can be
handled in parallel.
- Perform migrations on one node at a time, but without waiting for the
first node to come back before proceeding. This allows us to continue,
restricting the cluster, until no more capacity in the nodegroup is
available, and then having to wait for some nodes to come back so that
capacity is available again for the last few nodes.
- Pre-Calculate sets of nodes that can be migrated together (probably
with a greedy algorithm) and parallelize between them, with the
migrate-back approach discussed for DRBD to perform the calculation
only once.
Note that for non-DRBD disks that still use local storage (eg. RBD and
plain) redundancy might break anyway, and nothing except the first
algorithm might be safe. This perhaps would be a good reason to consider
managing better RBD pools, if those are implemented on top of nodes
storage, rather than on dedicated storage machines.
Full-Evacuation
+++++++++++++++
If full evacuation of the nodes to be rebooted is desired, a simple
migration is not enough for the DRBD instances. To keep the number of
disk operations small, we restrict moves to ``migrate, replace-secondary``.
That is, after migrating instances out of the nodes to be rebooted,
replacement secondaries are searched for, for all instances that have
their then secondary on one of the rebooted nodes. This is done by a
greedy algorithm, refining the initial reboot partition, if necessary.
Future work
===========
Hroller should become able to execute rolling maintenances, rather than
just calculate them. For this to succeed properly one of the following
must happen:
- HRoller handles rolling maintenances that happen at the same time as
unrelated cluster jobs, and thus recalculates the maintenance at each
step
- HRoller can selectively drain the cluster so it's sure that only the
rolling maintenance can be going on
DRBD nodes' ``replace-disks``' functionality should be implemented. Note
that when we will support a DRBD version that allows multi-secondary
this can be done safely, without losing replication at any time, by
adding a temporary secondary and only when the sync is finished dropping
the previous one.
Non-redundant (plain or file) instances should have a way to be moved
off as well via plain storage live migration or ``gnt-instance move``
(which requires downtime).
If/when RBD pools can be managed inside Ganeti, care can be taken so
that the pool is evacuated as well from a node before it's put into
maintenance. This is equivalent to evacuating DRBD secondaries.
Master failovers during the maintenance should be performed by hroller.
This requires RPC/RAPI support for master failover. Hroller should also
be modified to better support running on the master itself and
continuing on the new master.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-hsqueeze.rst 0000644 0000000 0000000 00000013206 12634264163 0017456 0 ustar 00root root 0000000 0000000 =============
HSqueeze tool
=============
.. contents:: :depth: 4
This is a design document detailing the node-freeing scheduler, HSqueeze.
Current state and shortcomings
==============================
Externally-mirrored instances can be moved between nodes at low
cost. Therefore, it is attractive to free up nodes and power them down
at times of low usage, even for small periods of time, like nights or
weekends.
Currently, the best way to find out a suitable set of nodes to shut down
is to use the property of our balancedness metric to move instances
away from drained nodes. So, one would manually drain more and more
nodes and see if `hbal` could find a solution freeing up all those
drained nodes.
Proposed changes
================
We propose the addition of a new htool command-line tool, called
`hsqueeze`, that aims at keeping resource usage at a constant high
level by evacuating and powering down nodes, or powering up nodes and
rebalancing, as appropriate. By default, only externally-mirrored
instances are moved, but options are provided to additionally take
DRBD instances (which can be moved without downtimes), or even all
instances into consideration.
Tagging of standby nodes
------------------------
Powering down nodes that are technically healthy effectively creates a
new node state: nodes on standby. To avoid further state
proliferation, and as this information is only used by `hsqueeze`,
this information is recorded in node tags. `hsqueeze` will assume
that offline nodes having a tag with prefix `htools:standby:` can
easily be powered on at any time.
Minimum available resources
---------------------------
To keep the squeezed cluster functional, a minimal amount of resources
will be left available on every node. While the precise amount will
be specifiable via command-line options, a sensible default is chosen,
like enough resource to start an additional instance at standard
allocation on each node. If the available resources fall below this
limit, `hsqueeze` will, in fact, try to power on more nodes, till
enough resources are available, or all standby nodes are online.
To avoid flapping behavior, a second, higher, amount of reserve
resources can be specified, and `hsqueeze` will only power down nodes,
if after the power down this higher amount of reserve resources is
still available.
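The two thresholds can be sketched as a simple hysteresis rule (the option
names and the scalar resource model below are assumptions, not `hsqueeze`'s
actual interface)::

  def next_action(free_resources, lower_reserve, upper_reserve):
      assert lower_reserve <= upper_reserve
      if free_resources < lower_reserve:
          return "power-up-standby-nodes"
      if free_resources > upper_reserve:
          # only power down if the upper reserve still holds afterwards
          return "consider-powering-down"
      return "keep-current-node-set"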
Computation of the set to free up
---------------------------------
To determine which nodes can be powered down, `hsqueeze` basically
follows the same algorithm as the manual process. It greedily goes
through all non-master nodes and tries if the algorithm used by `hbal`
would find a solution (with the appropriate move restriction) that
frees up the extended set of nodes to be drained, while keeping enough
resources free. Being based on the algorithm used by `hbal`, all
restrictions respected by `hbal`, in particular memory reservation
for N+1 redundancy, are also respected by `hsqueeze`.
The order in which the nodes are tried is chosen by a
suitable heuristic, like trying the nodes in order of increasing
number of instances; the hope is that this reduces the number of
instances that actually have to be moved.
If the amount of free resources has fallen below the lower limit,
`hsqueeze` will determine the set of nodes to power up in a similar
way; it will hypothetically add more and more of the standby
nodes (in some suitable order) till the algorithm used by `hbal` will
finally balance the cluster in a way that enough resources are available,
or all standby nodes are online.
Instance moves and execution
----------------------------
Once the final set of nodes to power down is determined, the instance
moves are determined by the algorithm used by `hbal`. If
requested by the `-X` option, the nodes freed up are drained, and the
instance moves are executed in the same way as `hbal` does. Finally,
those of the freed-up nodes that do not already have a
`htools:standby:` tag are tagged as `htools:standby:auto`, all free-up
nodes are marked as offline and powered down via the
:doc:`design-oob`.
Similarly, if it is determined that nodes need to be added, then first
the nodes are powered up via the :doc:`design-oob`, then they're marked
as online and finally,
the cluster is balanced in the same way, as `hbal` would do. For the
newly powered up nodes, the `htools:standby:auto` tag, if present, is
removed, but no other tags are removed (including other
`htools:standby:` tags).
Design choices
==============
The proposed algorithm builds on top of the already present balancing
algorithm, instead of greedily packing nodes as full as possible. The
reason is, that in the end, a balanced cluster is needed anyway;
therefore, basing on the balancing algorithm reduces the number of
instance moves. Additionally, the final configuration will also
benefit from all improvements to the balancing algorithm, like taking
dynamic CPU data into account.
We decided to have a separate program instead of adding an option to
`hbal` to keep the interfaces, especially that of `hbal`, cleaner. It is
not unlikely that, over time, additional `hsqueeze`-specific options
might be added, specifying, e.g., which nodes to prefer for
shutdown. With the approach of the `htools` of having a single binary
showing different behaviors, having an additional program also does not
introduce significant additional cost.
We decided to have a whole prefix instead of a single tag reserved
for marking standby nodes (we consider all tags starting with
`htools:standby:` as serving only this purpose). This is not only in
accordance with the tag
reservations for other tools, but it also allows for further extension
(like specifying priorities on which nodes to power up first) without
changing name spaces.
ganeti-2.15.2/doc/design-htools-2.3.rst 0000644 0000000 0000000 00000031435 12634264163 0017441 0 ustar 00root root 0000000 0000000 ====================================
Synchronising htools to Ganeti 2.3
====================================
Ganeti 2.3 introduces a number of new features that change the cluster
internals significantly enough that the htools suite needs to be
updated accordingly in order to function correctly.
Shared storage support
======================
Currently, the htools algorithms presume a model where all of an
instance's resources are served from within the cluster, more
specifically from the nodes comprising the cluster. While this is
usual for memory and CPU, deployments which use shared storage will
invalidate this assumption for storage.
To account for this, we need to move some assumptions from being
implicit (and hardcoded) to being explicitly exported from Ganeti.
New instance parameters
-----------------------
It is presumed that Ganeti will export for all instances a new
``storage_type`` parameter, that will denote either internal storage
(e.g. *plain* or *drbd*), or external storage.
Furthermore, a new ``storage_pool`` parameter will classify, for both
internal and external storage, the pool out of which the storage is
allocated. For internal storage, this will be either ``lvm`` (the pool
that provides space to both ``plain`` and ``drbd`` instances) or
``file`` (for file-storage-based instances). For external storage,
this will be the respective NAS/SAN/cloud storage that backs up the
instance. Note that for htools, external storage pools are opaque; we
only care that they have an identifier, so that we can distinguish
between two different pools.
If these two parameters are not present, the instances will be
presumed to be ``internal/lvm``.
New node parameters
-------------------
For each node, it is expected that Ganeti will export what storage
types it supports and pools it has access to. So a classic 2.2 cluster
will have all nodes supporting ``internal/lvm`` and/or
``internal/file``, whereas a new shared storage only 2.3 cluster could
have ``external/my-nas`` storage.
Whatever the mechanism that Ganeti will use internally to configure
the associations between nodes and storage pools, we consider that
we'll have available two node attributes inside htools: the list of internal
and external storage pools.
External storage and instances
------------------------------
Currently, for an instance we allow one cheap move type: failover to
the current secondary, if it is a healthy node, and four other
“expensive” (as in, including data copies) moves that involve changing
either the secondary or the primary node or both.
In presence of an external storage type, the following things will
change:
- the disk-based moves will be disallowed; this is already a feature
in the algorithm, controlled by a boolean switch, so adapting
external storage here will be trivial
- instead of the current one secondary node, the secondaries will
become a list of potential secondaries, based on access to the
instance's storage pool
Except for this, the basic move algorithm remains unchanged.
External storage and nodes
--------------------------
Two separate areas will have to change for nodes and external storage.
First, when allocating instances (either as part of a move or a new
allocation), if the instance is using external storage, then the
internal disk metrics should be ignored (for both the primary and
secondary cases).
Second, the per-node metrics used in the cluster scoring must take
into account that nodes might not have internal storage at all, and
handle this as a well-balanced case (score 0).
N+1 status
----------
Currently, computing the N+1 status of a node is simple:
- group the current secondary instances by their primary node, and
compute the sum of each instance group memory
- choose the maximum sum, and check if it's smaller than the current
available memory on this node
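Sketched in a simplified form (the data model below is illustrative, not the
one used by htools)::

  def node_n1_ok(free_memory, secondary_instances):
      """secondary_instances: (primary_node, memory) pairs of the instances
      having this node as their secondary."""
      per_primary = {}
      for primary, memory in secondary_instances:
          per_primary[primary] = per_primary.get(primary, 0) + memory
      worst = max(per_primary.values()) if per_primary else 0
      # the node passes if even the worst-case failover fits into free memory
      return worst < free_memory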
In effect, computing the N+1 status is a per-node matter. However,
with shared storage, we don't have secondary nodes, just potential
secondaries. Thus computing the N+1 status will be a cluster-level
matter, and much more expensive.
A simple version of the N+1 checks would be that for each instance
having said node as primary, we have enough memory in the cluster for
relocation. This means we would actually need to run allocation
checks, and update the cluster status from within allocation on one
node, while being careful that we don't recursively check N+1 status
during this relocation, which is too expensive.
However, the shared storage model has some properties that changes the
rules of the computation. Speaking broadly (and ignoring hard
restrictions like tag based exclusion and CPU limits), the exact
location of an instance in the cluster doesn't matter as long as
memory is available. This results in two changes:
- simply tracking the amount of free memory buckets is enough,
cluster-wide
- moving an instance from one node to another would not change the N+1
status of any node, and only allocation needs to deal with N+1
checks
Unfortunately, this very cheap solution fails in case of any other
exclusion or prevention factors.
TODO: find a solution for N+1 checks.
Node groups support
===================
The addition of node groups has a small impact on the actual
algorithms, which will simply operate at node group level instead of
cluster level, but it requires the addition of new algorithms for
inter-node group operations.
The following two definitions will be used in the following
paragraphs:
local group
The local group refers to a node's own node group, or when speaking
about an instance, the node group of its primary node
regular cluster
A cluster composed of a single node group, or pre-2.3 cluster
super cluster
This term refers to a cluster which comprises multiple node groups,
as opposed to a 2.2 and earlier cluster with a single node group
In all the below operations, it's assumed that Ganeti can gather the
entire super cluster state cheaply.
Balancing changes
-----------------
Balancing will move from cluster-level balancing to group
balancing. In order to achieve a reasonable improvement in a super
cluster, without needing to keep state of what groups have been
already balanced previously, the balancing algorithm will run as
follows:
#. the cluster data is gathered
#. if this is a regular cluster, as opposed to a super cluster,
balancing will proceed normally as previously
#. otherwise, compute the cluster scores for all groups
#. choose the group with the worst score and see if we can improve it;
if not choose the next-worst group, so on
#. once a group has been identified, run the balancing for it
Of course, explicit selection of a group will be allowed.
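The group selection loop could look roughly like this (a sketch only, with
assumed helper functions and the convention that a higher score is worse)::

  def balance_super_cluster(groups, compute_score, balance_group):
      # try the worst-scoring group first, then the next-worst, and so on
      for group in sorted(groups, key=compute_score, reverse=True):
          moves = balance_group(group)   # the normal per-group balancing
          if moves:
              return group, moves
      return None, []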
Super cluster operations
++++++++++++++++++++++++
Beside the regular group balancing, in a super cluster we have more
operations.
Redistribution
^^^^^^^^^^^^^^
In a regular cluster, once we run out of resources (offline nodes
which can't be fully evacuated, N+1 failures, etc.) there is nothing
we can do unless nodes are added or instances are removed.
In a super cluster however, there might be resources available in
another group, so there is the possibility of relocating instances
between groups to re-establish N+1 success within each group.
One difficulty in the presence of both super clusters and shared
storage is that the move paths of instances are quite complicated;
basically an instance can move inside its local group, and to any
other groups which have access to the same storage type and storage
pool pair. In effect, the super cluster is composed of multiple
‘partitions’, each containing one or more groups, but a node is
simultaneously present in multiple partitions, one for each storage
type and storage pool it supports. As such, the interactions between
the individual partitions are too complex for non-trivial clusters to
assume we can compute a perfect solution: we might need to move some
instances using shared storage pool ‘A’ in order to clear some more
memory to accept an instance using local storage, which will further
clear more VCPUs in a third partition, etc. As such, we'll limit
ourselves at simple relocation steps within a single partition.
Algorithm:
#. read super cluster data, and exit if cluster doesn't allow
inter-group moves
#. filter out any groups that are “alone” in their partition
(i.e. no other group sharing at least one storage method)
#. determine list of healthy versus unhealthy groups:
#. a group which contains offline nodes still hosting instances is
definitely not healthy
#. a group which has nodes failing N+1 is ‘weakly’ unhealthy
#. if either list is empty, exit (no work to do, or no way to fix problems)
#. for each unhealthy group:
#. compute the instances that are causing the problems: all
instances living on offline nodes, all instances living as
secondary on N+1 failing nodes, all instances living as primaries
on N+1 failing nodes (in this order)
#. remove instances, one by one, until the source group is healthy
again
#. try to run a standard allocation procedure for each instance on
all potential groups in its partition
#. if all instances were relocated successfully, it means we have a
solution for repairing the original group
Compression
^^^^^^^^^^^
In a super cluster which has had many instance reclamations, it is
possible that while none of the groups is empty, overall there is
enough empty capacity that an entire group could be removed.
The algorithm for “compressing” the super cluster is as follows:
#. read super cluster data
#. compute total *(memory, disk, cpu)*, and free *(memory, disk, cpu)*
for the super-cluster
#. compute per-group used and free *(memory, disk, cpu)*
#. select candidate groups for evacuation:
#. they must be connected to other groups via a common storage type
and pool
#. they must have fewer used resources than the global free
resources (minus their own free resources)
#. for each of these groups, try to relocate all its instances to
connected peer groups
#. report the list of groups that could be evacuated, or if instructed
so, perform the evacuation of the group with the largest free
resources (i.e. in order to reclaim the most capacity)
Load balancing
^^^^^^^^^^^^^^
Assuming a super cluster using shared storage, where instance failover
is cheap, it should be possible to do a load-based balancing across
groups.
As opposed to the normal balancing, where we want to balance on all
node attributes, here we should look only at the load attributes; in
other words, compare the available (total) node capacity with the
(total) load generated by instances in a given group, and computing
such scores for all groups, trying to see if we have any outliers.
Once a reliable load-weighting method for groups exists, we can apply
a modified version of the cluster scoring method to score not
imbalances across nodes, but imbalances across groups which result in
a super cluster load-related score.
Allocation changes
------------------
It is important to keep the allocation method across groups internal
(in the Ganeti/Iallocator combination), instead of delegating it to an
external party (e.g. a RAPI client). For this, the IAllocator protocol
should be extended to provide proper group support.
For htools, the new algorithm will work as follows:
#. read/receive cluster data from Ganeti
#. filter out any groups that do not support the requested storage
method
#. for remaining groups, try allocation and compute scores after
allocation
#. sort valid allocation solutions accordingly and return the entire
list to Ganeti
The rationale for returning the entire group list, and not only the
best choice, is that we anyway have the list, and Ganeti might have
other criteria (e.g. the best group might be busy/locked down, etc.)
so even if from the point of view of resources it is the best choice,
it might not be the overall best one.
Node evacuation changes
-----------------------
While the basic concept in the ``multi-evac`` iallocator
mode remains unchanged (it's a simple local group issue), when failing
to evacuate and running in a super cluster, we could have resources
available elsewhere in the cluster for evacuation.
The algorithm for computing this will be the same as the one for super
cluster compression and redistribution, except that the list of
instances is fixed to the ones living on the nodes to-be-evacuated.
If the inter-group relocation is successful, the result to Ganeti will
not be a local group evacuation target, but instead (for each
instance) a pair *(remote group, nodes)*. Ganeti itself will have to
decide (based on user input) whether to continue with inter-group
evacuation or not.
In case that Ganeti doesn't provide complete cluster data, just the
local group, the inter-group relocation won't be attempted.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-http-server.rst 0000644 0000000 0000000 00000012671 12634264163 0020115 0 ustar 00root root 0000000 0000000 =========================================
Design for replacing Ganeti's HTTP server
=========================================
.. contents:: :depth: 4
.. _http-srv-shortcomings:
Current state and shortcomings
------------------------------
The :doc:`new design for import/export <design-impexp2>` depends on an
HTTP server. Ganeti includes a home-grown HTTP server based on Python's
``BaseHTTPServer``. While it served us well so far, it only implements
the very basics of the HTTP protocol. It is, for example, not structured
well enough to support chunked transfers (:rfc:`2616`, section 3.6.1),
which would have some advantages. In addition, it has not been designed
for sending large responses.
In the case of the node daemon the HTTP server can not easily be
separated from the actual backend code and therefore must run as "root".
The RAPI daemon does request parsing in the same process as talking to
the master daemon via LUXI.
Proposed changes
----------------
The proposal is to start using a full-fledged HTTP server in Ganeti and
to run Ganeti's code as `FastCGI `_
applications. Reasons:
- Simplify Ganeti's code by delegating the details of HTTP and SSL to
another piece of software
- Run HTTP frontend and handler backend as separate processes and users
(esp. useful for node daemon, but also import/export and Remote API)
- Allows implementation of :ref:`rpc-feedback`
Software choice
+++++++++++++++
Theoretically any server able of speaking FastCGI to a backend process
could be used. However, to keep the number of steps required for setting
up a new cluster at roughly the same level, the implementation will be
geared for one specific HTTP server at the beginning. Support for other
HTTP servers can still be implemented.
After a rough selection of available HTTP servers `lighttpd
`_ and `nginx `_ were
the most likely candidates. Both are `widely used`_ and tested.
.. _widely used: http://news.netcraft.com/archives/2011/01/12/
january-2011-web-server-survey-4.html
Nginx' `original documentation `_ is in
Russian, translations are `available in a Wiki
`_. Nginx does not support old-style CGI
programs.
The author found `lighttpd's documentation
`_ easier to understand and
was able to configure a test server quickly. This, together with the
support for more technologies, made deciding easier.
With its use as a public-facing web server on a large number of websites
(and possibly more behind proxies), lighttpd should be a safe choice.
Unlike other webservers, such as the Apache HTTP Server, lighttpd's
codebase is of manageable size.
Initially the HTTP server would only be used for import/export
transfers, but its use can be expanded to the Remote API and node
daemon (see :ref:`rpc-feedback`).
To reduce the attack surface, an option will be provided to configure
services (e.g. import/export) to only listen on certain network
interfaces.
.. _rpc-feedback:
RPC feedback
++++++++++++
HTTP/1.1 supports chunked transfers (:rfc:`2616`, section 3.6.1). They
could be used to provide feedback from node daemons to the master,
similar to the feedback from jobs. A good use would be to provide
feedback to the user during long-running operations, e.g. downloading an
instance's data from another cluster.
.. _requirement: http://www.python.org/dev/peps/pep-0333/
#buffering-and-streaming
WSGI 1.0 (:pep:`333`) includes the following `requirement`_:
WSGI servers, gateways, and middleware **must not** delay the
transmission of any block; they **must** either fully transmit the
block to the client, or guarantee that they will continue transmission
even while the application is producing its next block
This behaviour was confirmed to work with lighttpd and the
:ref:`flup <http-software-req>` library. FastCGI by itself has no such
guarantee; webservers with buffering might require artificial padding to
force the message to be transmitted.
The node daemon can send JSON-encoded messages back to the master daemon
by separating them using a predefined character (see :ref:`LUXI
`). The final message contains the method's result. pycURL passes
each received chunk to the callback set as ``CURLOPT_WRITEFUNCTION``.
Once a message is complete, the master daemon can pass it to a callback
function inside the job, which then decides on what to do (e.g. forward
it as job feedback to the user).
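A sketch of the receiving side, assuming JSON messages separated by a single,
predefined byte (the separator value and the class below are illustrative)::

  import json

  class FeedbackReader(object):
      SEPARATOR = "\3"   # assumed message separator

      def __init__(self, callback):
          self._buffer = ""
          self._callback = callback

      def __call__(self, chunk):
          # registered as pycURL's CURLOPT_WRITEFUNCTION callback
          self._buffer += chunk
          while self.SEPARATOR in self._buffer:
              (raw, self._buffer) = self._buffer.split(self.SEPARATOR, 1)
              self._callback(json.loads(raw))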
A more detailed design may have to be written before deciding whether to
implement RPC feedback.
.. _http-software-req:
Software requirements
+++++++++++++++++++++
- lighttpd 1.4.24 or above built with OpenSSL support (earlier versions
`don't support SSL client certificates
`_)
- `flup `_ for FastCGI
Lighttpd SSL configuration
++++++++++++++++++++++++++
.. highlight:: lighttpd
The following sample shows how to configure SSL with client certificates
in Lighttpd::
$SERVER["socket"] == ":443" {
ssl.engine = "enable"
ssl.pemfile = "server.pem"
ssl.ca-file = "ca.pem"
ssl.use-sslv2 = "disable"
ssl.cipher-list = "HIGH:-DES:-3DES:-EXPORT:-ADH"
ssl.verifyclient.activate = "enable"
ssl.verifyclient.enforce = "enable"
ssl.verifyclient.exportcert = "enable"
ssl.verifyclient.username = "SSL_CLIENT_S_DN_CN"
}
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-hugepages-support.rst 0000644 0000000 0000000 00000007215 12634264163 0021312 0 ustar 00root root 0000000 0000000 ===============================
Huge Pages Support for Ganeti
===============================
This is a design document about implementing support for huge pages in
Ganeti. (Please note that Ganeti works with Transparent Huge Pages i.e.
THP and any reference in this document to Huge Pages refers to explicit
Huge Pages).
Current State and Shortcomings:
-------------------------------
The Linux kernel allows using pages of larger size by setting aside a
portion of the memory. Using larger page size may enhance the
performance of applications that require a lot of memory by improving
page hits. To use huge pages, memory has to be reserved beforehand. This
portion of memory is subtracted from free memory and is considered as in
use. Currently Ganeti cannot take proper advantage of huge pages. On a
node, if huge pages are reserved and are available to fulfill the VM
request, Ganeti fails to recognize huge pages and considers the memory
reserved for huge pages as used memory. This leads to failure of
launching VMs on a node where memory is available in the form of huge
pages rather than normal pages.
Proposed Changes:
-----------------
The following components will be changed in order for Ganeti to take
advantage of Huge Pages.
Hypervisor Parameters:
----------------------
Currently, it is possible to set or modify the huge pages mount point at
cluster level via the hypervisor parameter ``mem_path`` as::
$ gnt-cluster init \
>--enabled-hypervisors=kvm --nic-parameters link=br100 \
> -H kvm:mem_path=/mount/point/for/hugepages
This hypervisor parameter is inherited by all the instances as
default, although it can be overridden at the instance level.
The following changes will be made to the inheritance behaviour.
- The hypervisor parameter ``mem_path`` and all other hypervisor
parameters will be made available at the node group level (in
addition to the cluster level), so that users can set defaults for
the node group::
$ gnt-group add/modify\
> -H hv:parameter=value
This changes the hypervisor inheritance level as::
cluster -> group -> OS -> instance
- Furthermore, the hypervisor parameter ``mem_path`` will be changeable
only at the cluster or node group level and users must not be able to
override this at OS or instance level. The following command must
produce an error message that ``mem_path`` may only be set at either
the cluster or the node group level::
$ gnt-instance add -H kvm:mem_path=/mount/point/for/hugepages
Memory Pools:
-------------
Memory management of Ganeti will be improved by creating separate pools
for memory used by the node itself, memory used by the hypervisor and
the memory reserved for huge pages as:
- mtotal/xen (Xen memory)
- mfree/xen (Xen unused memory)
- mtotal/hp (Memory reserved for Huge Pages)
- mfree/hp (Memory available from unused huge pages)
- mpgsize/hp (Size of a huge page)
mfree and mtotal will be changed to mean "the total and free memory for
the default method in this cluster/nodegroup". Note that the default
method depends both on the default hypervisor and its parameters.
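As an illustration, a node could then report memory along these lines (the
keys follow the list above; the values are made up)::

  node_memory_report = {
      "mtotal/xen": 16384,   # MiB managed by the hypervisor
      "mfree/xen": 4096,
      "mtotal/hp": 8192,     # MiB reserved for huge pages
      "mfree/hp": 6144,      # MiB available from unused huge pages
      "mpgsize/hp": 2,       # size of one huge page, in MiB
  }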
iAllocator Changes:
-------------------
If huge pages are set as default for a cluster or node group, then
iAllocator must consider the huge pages memory on the nodes, as a
parameter when trying to find the best node for the VM.
Note that the iallocator will also be changed to use the correct
parameter depending on the cluster/group.
hbal Changes:
-------------
The cluster balancer (hbal) will be changed to use the default memory
pool and recognize memory reserved for huge pages when trying to
rebalance the cluster.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-ifdown.rst 0000644 0000000 0000000 00000015403 12634264163 0017114 0 ustar 00root root 0000000 0000000 ======================================
Design for adding ifdown script to KVM
======================================
.. contents:: :depth: 4
This is a design document about adding support for an ifdown script responsible
for deconfiguring network devices and cleanup changes made by the ifup script. The
first implementation will target KVM but it could be ported to Xen as well
especially when hotplug gets implemented.
Current state and shortcomings
==============================
Currently, before instance startup, instance migration and NIC hotplug, KVM
creates a tap and explicitly invokes the kvm-ifup script with the relevant
environment (INTERFACE, MAC, IP, MODE, LINK, TAGS, and all the network info if
any; NETWORK\_SUBNET, NETWORK\_TAGS, etc).
For Xen we have the `vif-ganeti` script (associated with vif-script hypervisor
parameter). The main difference is that Xen calls it by itself by passing it as
an extra option in the configuration file.
This ifup script can do several things: bridge a tap to a bridge, add IP rules,
update an external DNS or DHCP server, enable proxy ARP or proxy NDP, issue
openvswitch commands, etc. In general we can divide those actions into two
categories:
1) Commands that change the state of the host
2) Commands that change the state of external components.
Currently those changes do not get cleaned up or modified upon instance
shutdown, remove, migrate, or NIC hot-unplug. Thus we have stale entries in
hosts and most important might have stale/invalid configuration on external
components like routers that could affect connectivity.
A workaround could be hooks but:
1) During migrate hooks the environment is the one held in config data
and not in runtime files. The NIC configuration might have changed on
master but not on the running KVM process (unless hotplug is used).
Plus the NIC order in config data might not be the same one on the KVM
process.
2) On instance modification, changes are not available in hooks. In
other words, we do not know the configuration before and after the modification.
Since Ganeti is the orchestrator and is the one who explicitly configures
host devices (tap, vif) it should be the one responsible for cleanup/
deconfiguration. Especially on a SDN approach this kind of script might
be useful to cleanup flows in the cluster in order to ensure correct paths
without ping pongs between hosts or connectivity loss for the instance.
Proposed Changes
================
We add a new script, kvm-ifdown, that is explicitly invoked after:
1) instance shutdown on primary node
2) successful instance migration on source node
3) failed instance migration on target node
4) successful NIC hot-remove on primary node
If an administrator's custom ifdown script exists (e.g. `kvm-ifdown-custom`),
the `kvm-ifdown` script executes that script, as happens with `kvm-ifup`.
Along with that change we should rename custom ifup script from
`kvm-vif-bridge` (which does not make any sense) to `kvm-ifup-custom`.
In contrary to `kvm-ifup`, one cannot rely on `kvm-ifdown` script to be
called. A node might die just after a successful migration or after an
instance shutdown. In that case, all "undo" operations will not be invoked.
Thus, this script should work "on a best effort basis" and the network
should not rely on the script being called or being successful. Additionally
it should modify *only* the node local dynamic configs (routes, arp entries,
SDN, firewalls, etc.), whereas static ones (DNS, DHCP, etc.) should be modified
via hooks.
Implementation Details
======================
1) Where to get the NIC info?
We cannot rely on config data since it might have changed. So the only
place we keep our valid data is inside the runtime file. During instance
modifications (NIC hot-remove, hot-modify) we have the NIC object from
the RPC. We take its UUID and search for the corresponding entry in the
runtime file to get further info. After instance shutdown and migration
we just take all NICs from the runtime file and invoke the ifdown script
for each one
2) Where to find the corresponding TAP?
Currently TAP names are kept under
/var/run/ganeti/kvm-hypervisor/nics//.
This is not enough. As mentioned above, a NIC's index might change during an instance's
life. An example will make things clear:
* The admin starts an instance with three NICs.
* The admin removes the second without hotplug.
* The admin removes the first with hotplug.
The index that will arrive with the RPC will be 1 and, if we read the relevant
NIC file, we will get the tap of the NIC that has been removed in the second
step but still exists in the KVM process.
So upon TAP creation we write another file with the same info but named
after the NIC's UUID. The one named after its index can be left
for compatibility (Ganeti does not use it; external tools might)
Obviously this info will not be available for old instances in the cluster.
The ifdown script should be aware of this corner case.
3) What should we cleanup/deconfigure?
Upon NIC hot-remove we obviously want to wipe everything. But on instance
migration we don't want to reset external configuration like DNS. So we choose
to pass an extra positional argument to the ifdown script (it already has the
TAP name) that will reflect the context it was invoked with. Please note that
de-configuration of external components is not encouraged and should be
done via hooks. Still we could easily support it via this extra argument.
4) What will be the script environment?
In general the same environment passed to ifup script. Except instance's
tags. Those are the only info not kept in runtime file and it can
change between ifup and ifdown script execution. The ifdown
script must be aware of it and should cleanup everything that ifup script
might setup depending on instance tags (e.g. firewalls, etc)
Configuration Changes
~~~~~~~~~~~~~~~~~~~~~
1) The `kvm-ifdown` script will be an extra file installed under the same dir
`kvm-ifup` resides. We could have a single script (and symbolic links to it)
that shares the same code, where a second positional argument or an extra
environment variable would define if we are bringing the interface up or
down. Still this is not the best practice since it is not equivalent
with how KVM uses `script` and `downscript` in the `netdev` option; scripts
are different files that get the tap name as positional argument. Of course
common code will go in `net-common` so that it can be sourced from either
Xen or KVM specific scripts.
2) An extra file written upon TAP creation named after the NIC's UUID and
including the TAP's name. Since this should be the correct file to keep
backwards compatibility we create a symbolic link named after the NIC's
index and pointing to this new file.
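A sketch of writing that information, with assumed paths and helper name (the
real implementation lives in the KVM hypervisor code)::

  import os

  def write_tap_info(nic_dir, nic_uuid, nic_index, tap_name):
      uuid_file = os.path.join(nic_dir, nic_uuid)
      with open(uuid_file, "w") as fd:
          fd.write(tap_name)
      # compatibility: keep the index-based name as a symlink to the new file
      index_link = os.path.join(nic_dir, str(nic_index))
      if os.path.lexists(index_link):
          os.unlink(index_link)
      os.symlink(nic_uuid, index_link)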
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-impexp2.rst 0000644 0000000 0000000 00000054362 12634264163 0017221 0 ustar 00root root 0000000 0000000 ==================================
Design for import/export version 2
==================================
.. contents:: :depth: 4
Current state and shortcomings
------------------------------
Ganeti 2.2 introduced :doc:`inter-cluster instance moves `
and replaced the import/export mechanism with the same technology. It's
since shown that the chosen implementation was too complicated and
can be difficult to debug.
The old implementation is henceforth called "version 1". It used
``socat`` in combination with a rather complex tree of ``bash`` and
Python utilities to move instances between clusters and import/export
them inside the cluster. Due to protocol limitations, the master daemon
starts a daemon on the involved nodes and then keeps polling a status
file for updates. A non-trivial number of timeouts ensures that jobs
don't freeze.
In version 1, the destination node would start a daemon listening on a
random TCP port. Upon receiving the destination information, the source
node would temporarily stop the instance, create snapshots, and start
exporting the data by connecting to the destination. The random TCP port
is chosen by the operating system by binding the socket to port 0.
While this is a somewhat elegant solution, it causes problems in setups
with restricted connectivity (e.g. iptables).
Another issue encountered was with dual-stack IPv6 setups. ``socat`` can
only listen on one protocol, IPv4 or IPv6, at a time. The connecting
node can not simply resolve the DNS name, but it must be told the exact
IP address.
Instance OS definitions can provide custom import/export scripts. They
were working well in the early days when a filesystem was usually
created directly on the block device. Around Ganeti 2.0 there was a
transition to using partitions on the block devices. Import/export
scripts could no longer use simple ``dump`` and ``restore`` commands,
but usually ended up doing raw data dumps.
Proposed changes
----------------
Unlike in version 1, in version 2 the destination node will connect to
the source. The active side is swapped. This design assumes the
following design documents have been implemented:
- :doc:`design-x509-ca`
- :doc:`design-http-server`
The following design is mostly targeted at inter-cluster instance
moves. Intra-cluster import and export use the same technology, but do
so in a less complicated way (e.g. reusing the node daemon certificate
in version 1).
Support for instance OS import/export scripts, which have been in Ganeti
since the beginning, will be dropped with this design. Should the need
arise, they can be re-added later.
Software requirements
+++++++++++++++++++++
- HTTP client: cURL/pycURL (already used for inter-node RPC and RAPI
client)
- Authentication: X509 certificates (server and client)
Transport
+++++++++
Instead of a home-grown, mostly raw protocol the widely used HTTP
protocol will be used. Ganeti already uses HTTP for its :doc:`Remote API
` and inter-node communication. Encryption and authentication will
be implemented using SSL and X509 certificates.
SSL certificates
++++++++++++++++
The source machine will identify connecting clients by their SSL
certificate. Unknown certificates will be refused.
Version 1 created a new self-signed certificate per instance
import/export, allowing the certificate to be used as a Certificate
Authority (CA). This worked by means of starting a new ``socat``
instance per instance import/export.
Under the version 2 model, a continuously running HTTP server will be
used. This disallows the use of self-signed certificates for
authentication as the CA needs to be the same for all issued
certificates.
See the :doc:`separate design document for more details on how the
certificate authority will be implemented `.
Local imports/exports will, like version 1, use the node daemon's
certificate/key. Doing so allows the verification of local connections.
The client's certificate can be exported to the CGI/FastCGI handler
using lighttpd's ``ssl.verifyclient.exportcert`` setting. If a
cluster-local import/export is being done, the handler verifies whether the
certificate used matches the local node daemon key.
Source
++++++
The source can be the same physical machine as the destination, another
node in the same cluster, or a node in another cluster. A
physical-to-virtual migration mechanism could be implemented as an
alternative source.
In the case of a traditional import, the source is usually a file on the
source machine. For exports and remote imports, the source is an
instance's raw disk data. In all cases the transported data is opaque to
Ganeti.
All nodes of a cluster will run an instance of Lighttpd. The
configuration is automatically generated when starting Ganeti. The HTTP
server is configured to listen on IPv4 and IPv6 simultaneously.
Imports/exports will use a dedicated TCP port, similar to the Remote
API.
See the separate :ref:`HTTP server design document
` for why Ganeti's existing, built-in HTTP server
is not a good choice.
The source cluster is provided with an X509 Certificate Signing Request
(CSR) for a key private to the destination cluster.
After shutting down the instance, creating snapshots and restarting the
instance, the master will sign the destination's X509 certificate using
the :doc:`X509 CA ` once per instance disk. Instead of
using another identifier, the certificate's serial number (:ref:`never
reused `) and fingerprint are used to identify incoming
requests. Once ready, the master will call an RPC method on the source
node and provide it with the input information (e.g. file paths or block
devices) and the certificate identities.
The RPC method will write the identities to a place accessible by the
HTTP request handler, generate unique transfer IDs and return them to
the master. The transfer ID could be a filename containing the
certificate's serial number, fingerprint and some disk information. The
file containing the per-transfer information is signed using the node
daemon key and the signature written to a separate file.
Once everything is in place, the master sends the certificates, the data
and notification URLs (which include the transfer IDs) and the public
part of the source's CA to the job submitter. Like in version 1,
everything will be signed using the cluster domain secret.
Upon receiving a request, the handler verifies the identity and
continues to stream the instance data. The serial number and fingerprint
contained in the transfer ID should be matched with the certificate
used. If a cluster-local import/export was requested, the remote's
certificate is verified with the local node daemon key. The signature of
the information file from which the handler takes the path of the block
device (and more) is verified using the local node daemon certificate.
There are two options for handling requests, :ref:`CGI
` and :ref:`FastCGI `.
To wait for all requests to finish, the master calls another RPC method.
The destination should notify the source once it's done with downloading
the data. Since this notification may never arrive (e.g. network
issues), an additional timeout needs to be used.
There is no good way to avoid polling as the HTTP requests will be
handled asynchronously in another process. Once, and if, implemented
:ref:`RPC feedback ` could be used to combine the two RPC
methods.
Upon completion of the transfer requests, the instance is removed if
requested.
.. _lighttpd-cgi-opt:
Option 1: CGI
~~~~~~~~~~~~~
While easier to implement, this option requires the HTTP server to
either run as "root" or a so-called SUID binary to elevate the started
process to run as "root".
The export data can be sent directly to the HTTP server without any
further processing.
.. _lighttpd-fastcgi-opt:
Option 2: FastCGI
~~~~~~~~~~~~~~~~~
Unlike plain CGI, FastCGI scripts are run separately from the webserver.
The webserver talks to them via a Unix socket. Webserver and scripts can
run as separate users. Unlike for CGI, there are almost no bootstrap
costs attached to each request.
The FastCGI protocol requires data to be sent in length-prefixed
packets, something which wouldn't be very efficient to do in Python for
large amounts of data (instance imports/exports can be hundreds of
gigabytes). For this reason the proposal is to use a wrapper program
written in C (e.g. `fcgiwrap
`_) and to write the handler
like an old-style CGI program with standard input/output. If data should
be copied from a file, ``cat``, ``dd`` or ``socat`` can be used (see
note about :ref:`sendfile(2)/splice(2) with Python `).
The bootstrap cost associated with starting a Python interpreter for
a disk export is expected to be negligible.
The `spawn-fcgi `_
program will be used to start the CGI wrapper as "root".
FastCGI is, in the author's opinion, the better choice as it allows user
separation. As a first implementation step the export handler can be run
as a standard CGI program. User separation can be implemented as a
second step.
Destination
+++++++++++
The destination can be the same physical machine as the source, another
node in the same cluster, or a node in another cluster. While not
considered in this design document, instances could be exported from the
cluster by implementing an external client for exports.
For traditional exports the destination is usually a file on the
destination machine. For imports and remote exports, the destination is
an instance's disks. All transported data is opaque to Ganeti.
Before an import can be started, an RSA key and corresponding
Certificate Signing Request (CSR) must be generated using the new opcode
``OpInstanceImportPrepare``. The returned information is signed using
the cluster domain secret. The RSA key backing the CSR must not leave
the destination cluster. After being passed through a third party, the
source cluster will generate signed certificates from the CSR.
Once the request for creating the instance arrives at the master daemon,
it'll create the instance and call an RPC method on the instance's
primary node to download all data. The RPC method does not return until
the transfer is complete or failed (see :ref:`EXP_SIZE_FD `
and :ref:`RPC feedback `).
The node will use pycURL to connect to the source machine and identify
itself with the signed certificate received. pycURL will be configured
to write directly to a file descriptor pointing to either a regular file
or block device. The file descriptor needs to point to the correct
offset for resuming downloads.
Using cURL's multi interface, more than one transfer can be made at the
same time. While parallel transfers are used by the version 1
import/export, it can be decided at a later time whether to use them in
version 2 too. More investigation is necessary to determine whether
``CURLOPT_MAXCONNECTS`` is enough to limit the number of connections or
whether more logic is necessary.
If a transfer fails before it's finished (e.g. timeout or network
issues) it should be retried using an exponential backoff delay. The
opcode submitter can specify for how long the transfer should be
retried.
At the end of a transfer, successful or not, the source cluster must be
notified. At the same time the RSA key needs to be destroyed.
Support for HTTP proxies can be implemented by setting
``CURLOPT_PROXY``. Proxies could be used for moving instances in/out of
restricted network environments or across protocol borders (e.g. IPv4
networks unable to talk to IPv6 networks).
The big picture for instance moves
----------------------------------
#. ``OpInstanceImportPrepare`` (destination cluster)
Create RSA key and CSR (certificate signing request), return signed
with cluster domain secret.
#. ``OpBackupPrepare`` (source cluster)
Becomes a no-op in version 2, but see :ref:`backwards-compat`.
#. ``OpBackupExport`` (source cluster)
- Receives destination cluster's CSR, verifies signature using
cluster domain secret.
- Creates certificates using CSR and :doc:`cluster CA
`, one for each disk
- Stop instance, create snapshots, start instance
- Prepare HTTP resources on node
- Send certificates, URLs and CA certificate to job submitter using
feedback mechanism
- Wait for all transfers to finish or fail (with timeout)
- Remove snapshots
#. ``OpInstanceCreate`` (destination cluster)
- Receives certificates signed by destination cluster, verifies
certificates and URLs using cluster domain secret
Note that the parameters should be implemented in a generic way
allowing future extensions, e.g. to download disk images from a
public, remote server. The cluster domain secret allows Ganeti to
check data received from a third party, but since this won't work
with such extensions, other checks will have to be designed.
- Create block devices
- Download every disk from source, verified using remote's CA and
authenticated using signed certificates
- Destroy RSA key and certificates
- Start instance
.. TODO: separate create from import?
.. _impexp2-http-resources:
HTTP resources on source
------------------------
The HTTP resources listed below will be made available by the source
machine. The transfer ID is generated while preparing the export and is
unique per disk and instance. No caching should be used and the
``Pragma`` (HTTP/1.0) and ``Cache-Control`` (HTTP/1.1) headers set
accordingly by the server.
``GET /transfers/[transfer_id]/contents``
Dump disk contents. Important request headers:
``Accept`` (:rfc:`2616`, section 14.1)
Specify preferred media types. Only one type is supported in the
initial implementation:
``application/octet-stream``
Request raw disk content.
If support for more media types were to be implemented in the
future, the "q" parameter used for "indicating a relative quality
factor" needs to be used. In the meantime parameters need to be
expected, but can be ignored.
If support for OS scripts were to be re-added in the future, the
MIME type ``application/x-ganeti-instance-export`` is hereby
reserved for disk dumps using an export script.
If the source can not satisfy the request the response status code
will be 406 (Not Acceptable). Successful requests will specify the
used media type using the ``Content-Type`` header. Unless only
exactly one media type is requested, the client must handle the
different response types.
``Accept-Encoding`` (:rfc:`2616`, section 14.3)
Specify desired content coding. Supported are ``identity`` for
uncompressed data, ``gzip`` for compressed data and ``*`` for any.
The response will include a ``Content-Encoding`` header with the
actual coding used. If the client specifies an unknown coding, the
response status code will be 406 (Not Acceptable).
If the client specifically needs compressed data (see
:ref:`impexp2-compression`) but only gets ``identity``, it can
either compress locally or abort the request.
``Range`` (:rfc:`2616`, section 14.35)
Raw disk dumps can be resumed using this header (e.g. after a
network issue).
If this header was given in the request and the source supports
resuming, the status code of the response will be 206 (Partial
Content) and it'll include the ``Content-Range`` header as per
:rfc:`2616`. If it does not support resuming or the request was not
specifying a range, the status code will be 200 (OK).
Only a single byte range is supported. cURL does not support
``multipart/byteranges`` responses by itself. Even if they could be
somehow implemented, doing so would be of doubtful benefit for
import/export.
For raw data dumps handling ranges is pretty straightforward by just
dumping the requested range.
cURL will fail with the error code ``CURLE_RANGE_ERROR`` if a
request included a range but the server can't handle it. The request
must be retried without a range.
``POST /transfers/[transfer_id]/done``
Use this resource to notify the source when transfer is finished (even
if not successful). The status code will be 204 (No Content).
Code samples
------------
pycURL to file
++++++++++++++
.. highlight:: python
The following code sample shows how to write downloaded data directly to
a file without pumping it through Python::
curl = pycurl.Curl()
curl.setopt(pycurl.URL, "http://www.google.com/")
curl.setopt(pycurl.WRITEDATA, open("googlecom.html", "w"))
curl.perform()
This works equally well if the file descriptor is a pipe to another
process.
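Resuming a download
+++++++++++++++++++

The following sketch shows how a transfer could be resumed at a given
offset; the URL, file name and offset are made up for illustration and
error handling is omitted::

  import pycurl

  offset = 512 * 1024 * 1024                 # resume after 512 MiB (example)
  output = open("disk0.data", "r+b")         # partially downloaded file
  output.seek(offset)

  curl = pycurl.Curl()
  curl.setopt(pycurl.URL, "https://source.example.com/transfers/x/contents")
  curl.setopt(pycurl.RANGE, "%d-" % offset)  # request only the remaining bytes
  curl.setopt(pycurl.WRITEDATA, output)      # data is written from the offset
  curl.perform()

If the source answers with 200 (OK) instead of 206 (Partial Content) it did
not honour the range; the client has to detect this (e.g. via
``curl.getinfo(pycurl.RESPONSE_CODE)``) and restart writing from the
beginning of the file.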
.. _backwards-compat:
Backwards compatibility
-----------------------
.. _backwards-compat-v1:
Version 1
+++++++++
The old inter-cluster import/export implementation described in the
:doc:`Ganeti 2.2 design document ` will be supported for at
least one minor (2.x) release. Intra-cluster imports/exports will use
the new version right away.
.. _exp-size-fd:
``EXP_SIZE_FD``
+++++++++++++++
Together with the improved import/export infrastructure Ganeti 2.2
allowed instance export scripts to report the expected data size. This
was then used to provide the user with an estimated remaining time.
Version 2 no longer supports OS import/export scripts and therefore
``EXP_SIZE_FD`` is no longer needed.
.. _impexp2-compression:
Compression
+++++++++++
Version 1 used explicit compression using ``gzip`` for transporting
data, but the dumped files didn't use any compression. Version 2 will
allow the destination to specify which encoding should be used. This way
the transported data is already compressed and can be directly used by
the client (see :ref:`impexp2-http-resources`). The cURL option
``CURLOPT_ENCODING`` can be used to set the ``Accept-Encoding`` header.
cURL will not decompress received data when
``CURLOPT_HTTP_CONTENT_DECODING`` is set to zero (if another HTTP client
library were used which doesn't support disabling transparent
compression, a custom content-coding type could be defined, e.g.
``x-ganeti-gzip``).
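With pycURL this could look as follows (assuming the installed pycURL
version exposes both options; any ``curl`` setup other than the URL is
omitted)::

  import pycurl

  curl = pycurl.Curl()
  curl.setopt(pycurl.URL, "https://source.example.com/transfers/x/contents")
  # Request gzip-compressed data ("Accept-Encoding: gzip") ...
  curl.setopt(pycurl.ENCODING, "gzip")
  # ... but keep it compressed instead of transparently decompressing it
  curl.setopt(pycurl.HTTP_CONTENT_DECODING, 0)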
Notes
-----
The HTTP/1.1 protocol (:rfc:`2616`) defines trailing headers for chunked
transfers in section 3.6.1. This could be used to transfer a checksum at
the end of an import/export. cURL supports trailing headers since
version 7.14.1. Lighttpd doesn't seem to support them for FastCGI, but
they appear to be usable in combination with an NPH CGI (No Parsed
Headers).
.. _lighttp-sendfile:
Lighttpd allows FastCGI applications to send the special headers
``X-Sendfile`` and ``X-Sendfile2`` (the latter with a range). Using
these headers applications can send response headers and tell the
webserver to serve regular file stored on the file system as a response
body. The webserver will then take care of sending that file.
Unfortunately this mechanism is restricted to regular files and can not
be used for data from programs, neither direct nor via named pipes,
without writing to a file first. The latter is not an option as instance
data can be very large. Theoretically ``X-Sendfile`` could be used for
sending the input for a file-based instance import, but that'd require
the webserver to run as "root".
.. _python-sendfile:
Python does not include interfaces for the ``sendfile(2)`` or
``splice(2)`` system calls. The latter can be useful for faster copying
of data between file descriptors. There are some 3rd-party modules (e.g.
http://pypi.python.org/pypi/py-sendfile/) and discussions
(http://bugs.python.org/issue10882) for including support for
``sendfile(2)``, but the latter is certainly not going to happen for the
Python versions supported by Ganeti. Calling the function using the
``ctypes`` module might be possible.
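A minimal sketch of such a ``ctypes``-based call on Linux (purely
illustrative; error handling is minimal and portability concerns, e.g. the
size of ``off_t``, are ignored)::

  import ctypes
  import ctypes.util
  import os

  _libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
  _libc.sendfile.restype = ctypes.c_ssize_t
  _libc.sendfile.argtypes = [ctypes.c_int, ctypes.c_int,
                             ctypes.POINTER(ctypes.c_int64), ctypes.c_size_t]

  def Sendfile(out_fd, in_fd, offset, count):
    """Copy count bytes from in_fd (starting at offset) to out_fd in-kernel."""
    off = ctypes.c_int64(offset)
    sent = _libc.sendfile(out_fd, in_fd, ctypes.byref(off), count)
    if sent < 0:
      errno = ctypes.get_errno()
      raise OSError(errno, os.strerror(errno))
    return sent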
Performance considerations
--------------------------
The design described above was confirmed to be one of the better choices
in terms of download performance with bigger block sizes. All numbers
were gathered on the same physical machine with a single CPU and 1 GB of
RAM while downloading 2 GB of zeros read from ``/dev/zero``. ``wget``
(version 1.10.2) was used as the client, ``lighttpd`` (version 1.4.28)
as the server. The numbers in the first line of each row are transfer rates
in megabytes per second. The second line in each row is the CPU time spent
in userland and in the system respectively (measured for the CGI/FastCGI
program using ``time -v``).
::
  ----------------------------------------------------------------------
  Block size                     4 KB     64 KB    128 KB   1 MB     4 MB
  ======================================================================
  Plain CGI script reading       83       174      180      122      120
  from ``/dev/zero``
                                 0.6/3.9  0.1/2.4  0.1/2.2  0.0/1.9  0.0/2.1
  ----------------------------------------------------------------------
  FastCGI with ``fcgiwrap``,     86       167      170      177      174
  ``dd`` reading from
  ``/dev/zero``                  1.1/5    0.5/2.9  0.5/2.7  0.7/3.1  0.7/2.8
  ----------------------------------------------------------------------
  FastCGI with ``fcgiwrap``,     68       146      150      170      170
  Python script copying from
  ``/dev/zero`` to stdout
                                 1.3/5.1  0.8/3.7  0.7/3.3  0.9/2.9  0.8/3
  ----------------------------------------------------------------------
  FastCGI, Python script using   31       48       47       5        1
  ``flup`` library (version
  1.0.2) reading from
  ``/dev/zero``
                                 23.5/9.8 14.3/8.5 16.1/8   -        -
  ----------------------------------------------------------------------
It should be mentioned that the ``flup`` library is not implemented in
the most efficient way, but even with some changes it doesn't get much
faster. It is fine for small amounts of data, but not for huge
transfers.
Other considered solutions
--------------------------
Another possible solution considered was to use ``socat`` like version 1
did. Due to the changing model, a large part of the code would've
required a rewrite anyway, while still not fixing all shortcomings. For
example, ``socat`` could still listen on only one protocol, IPv4 or
IPv6. Running two separate instances might have fixed that, but it'd get
more complicated. Using an existing HTTP server will provide us with a
number of other benefits as well, such as easier user separation between
server and backend.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-internal-shutdown.rst 0000644 0000000 0000000 00000014343 12634264163 0021315 0 ustar 00root root 0000000 0000000 ============================================================
Detection of user-initiated shutdown from inside an instance
============================================================
.. contents:: :depth: 2
This is a design document detailing the implementation of a way for Ganeti to
detect whether an instance marked as up but not running was shutdown gracefully
by the user from inside the instance itself.
Current state and shortcomings
==============================
Ganeti keeps track of the desired status of instances in order to be able to
take proper action (e.g.: reboot) on the instances that happen to crash.
Currently, the only way to properly shut down an instance is through Ganeti's
own commands, which can be used to mark an instance as ``ADMIN_down``.
If a user shuts down an instance from inside, through the proper command of the
operating system it is running, the instance will be shutdown gracefully, but
Ganeti is not aware of that: the desired status of the instance will still be
marked as ``running``, so when the watcher realises that the instance is down,
it will restart it. This behaviour is usually not what the user expects.
Proposed changes
================
We propose to modify Ganeti in such a way that it will detect when an instance
was shutdown as a result of an explicit request from the user. When such a
situation is detected, instead of presenting an error as it happens now, either
the state of the instance will be set to ``ADMIN_down``, or the instance will be
automatically rebooted, depending on an instance-specific configuration value.
The default behavior in case no such parameter is found will be to follow the
apparent will of the user, and setting to ``ADMIN_down`` an instance that was
shut down correctly from inside.
The rest of this design document details the implementation of instance shutdown
detection for Xen. The KVM implementation is detailed in :doc:`design-kvmd`.
Implementation
==============
Xen knows why a domain is being shut down (a crash or an explicit shutdown
or poweroff request), but such information is not usually readily available
externally, because all such cases lead to the virtual machine being destroyed
immediately after the event is detected.
Still, Xen allows the instance configuration file to define what action to be
taken in all those cases through the ``on_poweroff``, ``on_shutdown`` and
``on_crash`` variables. By setting them to ``preserve``, Xen will avoid
destroying the domains automatically.
When the domain is not destroyed, it can be viewed by using ``xm list`` (or ``xl
list`` in newer Xen versions), and the ``State`` field of the output will
provide useful information.
If the state is ``----c-`` it means the instance has crashed.
If the state is ``---s--`` it means the instance was properly shutdown.
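As an illustration, the classification the watcher has to perform on the
``State`` column could look like the following simplified function (the
real implementation will live in Ganeti's Xen hypervisor abstraction)::

  def ClassifyXenState(state):
    """Interpret the state field of "xm list"/"xl list" output."""
    if "c" in state:
      return "crashed"
    elif "s" in state:
      return "shutdown"
    else:
      return "running"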
If the instance was properly shutdown and it is still marked as ``running`` by
Ganeti, it means that it was shutdown from inside by the user, and the Ganeti
status of the instance needs to be changed to ``ADMIN_down``.
This will be done at regular intervals by the group watcher, just before
deciding which instances to reboot.
On top of that, at the same time, the watcher will also need to issue ``xm
destroy`` commands for all the domains that are in a crashed or shutdown state,
since this will not be done automatically by Xen anymore because of the
``preserve`` setting in their config files.
This behavior will be limited to the domains shut down from inside, because it
will actually keep the resources of the domain busy until the watcher performs
the cleanup (which, with the default settings, happens up to every 5 minutes).
Still, this is considered acceptable, because it is not frequent for a domain to
be shut down this way. The cleanup function will also be run automatically just
before performing any job that requires resources to be available (such as when
creating a new instance), in order to ensure that the new resource allocation
happens starting from a clean state. Functionalities that only query the state
of instances will not run the cleanup function.
The cleanup operation includes both node-specific operations (the actual
destruction of the stopped domains) and configuration changes, to be performed
on the master node (marking as offline an instance that was shut down
internally). The watcher, on the master node, will fetch the list of instances
that have been shutdown from inside (recognizable by their ``oper_state``
as described below). It will then submit a series of ``InstanceShutdown`` jobs
that will mark such instances as ``ADMIN_down`` and clean them up (after
the functionality of ``InstanceShutdown`` will have been extended as specified
in the rest of this design document).
LUs performing operations other than an explicit cleanup will have to be
modified to perform the cleanup as well, either by submitting a job to perform
the cleanup (to be completed before actually performing the task at hand) or by
explicitly performing the cleanup themselves through the RPC calls.
Other required changes
++++++++++++++++++++++
The implementation of this design document will require some commands to be
changed in order to cope with the new shutdown procedure.
With the default shutdown action in Xen set to ``preserve``, the Ganeti
command for shutting down instances would leave them in a shutdown but
preserved state. Therefore, it will have to be changed in such a way to
immediately perform the cleanup of the instance after verifying its correct
shutdown. Also, it will correctly deal with instances that have been shutdown
from inside but are still active according to Ganeti, by detecting this
situation, destroying the instance and carrying out the rest of the Ganeti
shutdown procedure as usual.
The ``gnt-instance list`` command will need to be able to handle the situation
where an instance was shutdown internally but not yet cleaned up. The
``admin_state`` field will maintain the current meaning unchanged. The
``oper_state`` field will get a new possible state, ``S``, meaning that the
instance was shutdown internally.
The ``gnt-instance info`` command ``State`` field, in such case, will show a
message stating that the instance was supposed to be run but was shut down
internally.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-kvmd.rst 0000644 0000000 0000000 00000025413 12634264163 0016571 0 ustar 00root root 0000000 0000000 ==========
KVM daemon
==========
.. toctree::
:maxdepth: 2
This design document describes the KVM daemon, which is responsible for
determining whether a given KVM instance was shutdown by an
administrator or a user.
Current state and shortcomings
==============================
This design document describes the KVM daemon which addresses the KVM
side of the user-initiated shutdown problem introduced in
:doc:`design-internal-shutdown`. We are also interested in keeping this
functionality optional. That is, an administrator does not necessarily
have to run the KVM daemon, either because the cluster runs only Xen or
because, even when running KVM, instance shutdown detection is not wanted.
This requirement is important because it means the KVM daemon should
be a modular component in the overall Ganeti design, i.e., it should
be easy to enable and disable it.
Proposed changes
================
The instance shutdown feature for KVM requires listening on events from
the Qemu Machine Protocol (QMP) Unix socket, which is created together
with a KVM instance. A QMP socket typically looks like
``/var/run/ganeti/kvm-hypervisor/ctrl/.qmp`` and implements
the QMP protocol. This is a bidirectional protocol that allows Ganeti
to send commands, such as, system powerdown, as well as, receive events,
such as, the powerdown and shutdown events.
Listening in on these events allows Ganeti to determine whether a given
KVM instance was shutdown by an administrator, either through
``gnt-instance stop|remove `` or ``kill -KILL
``, or by a user, through ``poweroff`` from inside the
instance. Upon an administrator powerdown, the QMP protocol sends two
events, namely, a powerdown event and a shutdown event, whereas upon a
user shutdown only the shutdown event is sent. This is enough to
distinguish between an administrator and a user shutdown. However,
there is one limitation, which is, ``kill -TERM ``. Even
though this is an action performed by the administrator, it will be
considered a user shutdown by the approach described in this document.
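The following sketch illustrates the event sequence described above; the
framing details and the surrounding daemon logic are simplified, and the
real daemon will be structured differently::

  import json
  import socket

  def WatchInstanceShutdown(qmp_socket_path):
    """Return "admin" or "user" depending on how the instance was stopped."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(qmp_socket_path)
    sock.sendall('{"execute": "qmp_capabilities"}\r\n')
    saw_powerdown = False
    buf = ""
    while True:
      buf += sock.recv(4096)
      while "\r\n" in buf:
        (line, buf) = buf.split("\r\n", 1)
        if not line:
          continue
        msg = json.loads(line)
        if msg.get("event") == "POWERDOWN":
          saw_powerdown = True      # only sent for administrator shutdowns
        elif msg.get("event") == "SHUTDOWN":
          return "admin" if saw_powerdown else "user"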
Several design strategies were considered. Most of these strategies
consisted of spawning some process listening on the QMP socket when a
KVM instance is created. However, having a listener process per KVM
instance is not scalable. Therefore, a different strategy is proposed,
namely, having a single process, called the KVM daemon, listening on the
QMP sockets of all KVM instances within a node. That also means there
is an instance of the KVM daemon on each node.
In order to implement the KVM daemon, two problems need to be addressed,
namely, how the KVM daemon knows when to open a connection to a given
QMP socket and how the KVM daemon communicates with Ganeti whether a
given instance was shutdown by an administrator or a user.
QMP connections management
--------------------------
As mentioned before, the QMP sockets reside in the KVM control
directory, which is usually located under
``/var/run/ganeti/kvm-hypervisor/ctrl/``. When a KVM instance is
created, a new QMP socket for this instance is also created in this
directory.
In order to simplify the design of the KVM daemon, instead of having
Ganeti communicate to this daemon through a pipe or socket the creation
of a new KVM instance, and thus a new QMP socket, this daemon will
monitor the KVM control directory using ``inotify``. As a result, the
daemon is not only able to deal with KVM instances being created and
removed, but also capable of overcoming other problematic situations
concerning the filesystem, such as, the case when the KVM control
directory does not exist because, for example, Ganeti was not yet
started, or the KVM control directory was removed, for example, as a
result of a Ganeti reinstallation.
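A sketch of how the control directory could be watched, assuming the
``pyinotify`` library (which Ganeti already uses elsewhere) and ignoring
the handling of missing parent directories described above::

  import pyinotify

  KVM_CTRL_DIR = "/var/run/ganeti/kvm-hypervisor/ctrl"

  class _QmpSocketHandler(pyinotify.ProcessEvent):
    def process_IN_CREATE(self, event):
      if event.pathname.endswith(".qmp"):
        # here the daemon would open a QMP connection for the new instance
        print("New QMP socket: %s" % event.pathname)

    def process_IN_DELETE(self, event):
      if event.pathname.endswith(".qmp"):
        # ... and here it would drop the corresponding connection
        print("QMP socket removed: %s" % event.pathname)

  wm = pyinotify.WatchManager()
  wm.add_watch(KVM_CTRL_DIR, pyinotify.IN_CREATE | pyinotify.IN_DELETE)
  pyinotify.Notifier(wm, _QmpSocketHandler()).loop()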
Shutdown detection
------------------
As mentioned before, the KVM daemon is responsible for opening a
connection to the QMP socket of a given instance and listening in on the
shutdown and powerdown events, which allow the KVM daemon to determine
whether the instance stopped because of an administrator or user
shutdown. Once the instance is stopped, the KVM daemon needs to
communicate to Ganeti whether the user was responsible for shutting down
the instance.
In order to achieve this, the KVM daemon writes an empty file, called
the shutdown file, in the KVM control directory with a name similar to
the QMP socket file but with the extension ``.qmp`` replaced with
``.shutdown``. The presence of this file indicates that the shutdown
was initiated by a user, whereas the absence of this file indicates that
the shutdown was caused by an administrator. This strategy also ensures
that crashes and signals, such as ``SIGKILL``, are handled correctly,
given that in these cases the KVM daemon never receives the powerdown
and shutdown events and, therefore, never creates the shutdown file.
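Deriving the shutdown file name from the QMP socket name and creating the
marker is then trivial (sketch)::

  def MarkUserShutdown(qmp_socket_path):
    """Create the empty shutdown marker file next to the QMP socket."""
    assert qmp_socket_path.endswith(".qmp")
    marker = qmp_socket_path[:-len(".qmp")] + ".shutdown"
    open(marker, "w").close()   # an empty file; its presence is the signal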
KVM daemon launch
-----------------
With the above issues addressed, a question remains as to when the KVM
daemon should be started. The KVM daemon is different from other Ganeti
daemons, which start together with the Ganeti service, because the KVM
daemon is optional, given that it is specific to KVM and should not be
run on installations containing only Xen, and, even in a KVM
installation, the user might still choose not to enable it. Finally, the
KVM daemon is not really necessary until the first KVM instance is
started. For these reasons, the KVM daemon is started from within Ganeti
when a KVM instance is started, and the job process spawned by the node
daemon is responsible for starting the KVM daemon.
Given the current design of Ganeti, in which the node daemon spawns a
job process to handle the creation of the instance, when launching the
KVM daemon it is necessary to first check whether an instance of this
daemon is already running and, if this is not the case, then the KVM
daemon can be safely started.
Design alternatives
===================
At first, it might seem natural to include the instance shutdown
detection for KVM in the node daemon. After all, the node daemon is
already responsible for managing instances, for example, starting and
stopping an instance. Nevertheless, the node daemon is more complicated
than it might seem at first.
The node daemon is composed of the main loop, which runs in the main
thread and is responsible for receiving requests and spawning jobs for
handling these requests, and the jobs, which are independent processes
spawned for executing the actual tasks, such as, creating an instance.
Including instance shutdown detection in the node daemon is not viable
because adding it to the main loop would cause KVM specific code to
taint the generality of the node daemon. In order to add it to the job
processes, it would be possible to spawn either a foreground or a
background process. However, these options are also not viable because
they would lead to the situation described before where there would be a
monitoring process per instance, which is not scalable. Moreover, the
foreground process has an additional disadvantage: it would require
modifications to the node daemon so that it would not expect a terminating
job, which is what the current node daemon design assumes.
There is another design issue to have in mind. We could reconsider the
place where to write the data that tell Ganeti whether an instance was
shutdown by an administrator or the user. Instead of using the KVM
shutdown files presented above, in which the presence of the file
indicates a user shutdown and its absence an administrator shutdown, we
could store a value in the KVM runtime state file, which is where the
relevant KVM state information is. The advantage of this approach is
that it would keep the KVM related information in one place, thus making
it easier to manage. However, it would lead to a more complex
implementation and, in the context of the general transition in Ganeti
from Python to Haskell, a simpler implementation is preferred.
Finally, it should be noted that the KVM runtime state file benefits
from automatic migration. That is, when an instance is migrated so is
the KVM state file. However, the instance shutdown detection for KVM
does not require this feature and, in fact, migrating the instance
shutdown state would be incorrect.
Further considerations
======================
There are potential race conditions between Ganeti and the KVM daemon,
however, in practice they seem unlikely. For example, the KVM daemon
needs to add and remove watches to the parent directories of the KVM
control directory until this directory is finally created. It is
possible that Ganeti creates this directory and a KVM instance before
the KVM daemon has a chance to add a watch to the KVM control directory,
thus causing this daemon to miss the ``inotify`` creation event for the
QMP socket.
There are other problems which arise from the limitations of
``inotify``. For example, if the KVM daemon is started after the first
Ganeti instance has been created, then the ``inotify`` will not produce
any event for the creation of the QMP socket. This can happen, for
example, if the KVM daemon needs to be restarted or upgraded. As a
result, it might be necessary to have an additional mechanism that runs
at KVM daemon startup or at regular intervals to ensure that the current
KVM internal state is consistent with the actual contents of the KVM
control directory.
Another race condition occurs when Ganeti shuts down a KVM instance
using force. Ganeti uses ``TERM`` signals to stop KVM instances when
force is specified or ACPI is not enabled. However, as mentioned
before, ``TERM`` signals are interpreted by the KVM daemon as a user
shutdown. As a result, the KVM daemon creates a shutdown file which
then must be removed by Ganeti. The race condition occurs because the
KVM daemon might create the shutdown file after the hypervisor code that
tries to remove this file has already run. In practice, the race
condition seems unlikely because Ganeti stops the KVM instance in a
retry loop, which allows Ganeti to stop the instance and cleanup its
runtime information.
It is possible to determine if a process, in this particular case the
KVM process, was terminated by a ``TERM`` signal, using the `proc
connector and socket filters
`_.
The proc connector is a socket connected between a userspace process and
the kernel through the netlink protocol and can be used to receive
notifications of process events, and the socket filters is a mechanism
for subscribing only to events that are relevant. There are several
`process events `_ which can be
subscribed to, however, in this case, we are interested only in the exit
event, which carries information about the exit signal.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-linuxha.rst 0000644 0000000 0000000 00000014431 12634264163 0017276 0 ustar 00root root 0000000 0000000 ====================
Linux HA integration
====================
.. contents:: :depth: 4
This is a design document detailing the integration of Ganeti and Linux HA.
Current state and shortcomings
==============================
Ganeti doesn't currently support any self-healing or self-monitoring.
We are now working on trying to improve the situation in this regard:
- The :doc:`autorepair system ` will take care
of self repairing a cluster in the presence of offline nodes.
- The :doc:`monitoring agent ` will take care
of exporting data to monitoring.
What is still missing is a way to self-detect "obvious" failures rapidly
and to:
- Maintain the master role active.
- Offline resources that are obviously faulty so that the autorepair
system can perform its work.
Proposed changes
================
Linux-HA provides software that can be used to provide high availability
of services through automatic failover of resources. In particular
Pacemaker can be used together with Heartbeat or Corosync to make sure a
resource is kept active on a self-monitoring cluster.
Ganeti OCF agents
-----------------
The Ganeti agents will be slightly special in the HA world. The
following will apply:
- The agents will be able to be configured cluster-wise by tags (which
will be read on the nodes via ssconf_cluster_tags) and locally by
files on the filesystem that will allow them to "simulate" a
particular condition (eg. simulate a failure even if none is
detected).
- The agents will be able to run in "full" or "partial" mode: in
"partial" mode they will always succeed, and thus never fail a
resource as long as a node is online, is running the linux HA software
and is responding to the network. In "full" mode they will also check
resources like the cluster master ip or master daemon, and act if they
are missing
Note that for what Ganeti does OCF agents are needed: simply relying on
the LSB scripts will not work for the Ganeti service.
Master role agent
-----------------
This agent will manage the Ganeti master role. It needs to be configured
as a sticky resource (you don't want to flap the master role around, do
you?) that is active on only one node. You can require quorum or fencing
to protect your cluster from multiple masters.
The agent will implement a stateless resource that considers itself
"started" only the master node, "stopped" on all master candidates and
in error mode for all other nodes.
Note that if not all your nodes are master candidates this resource
might have problems:
- if all nodes are configured to run the resource, heartbeat may decide
to "fence" (aka stonith) all your non-master-candidate nodes if told
to do so. This might not be what you want.
- if only master candidates are configured as nodes for the resource,
beware of promotions and demotions, as nothing will automatically update
pacemaker should a change happen at the Ganeti level.
Other solutions, such as reporting the resource just as "stopped" on non
master candidates as well might mean that pacemaker would choose the
"wrong" node to promote to master, which is also a bad idea.
Future improvements
+++++++++++++++++++
- Ability to work better with non-master-candidate nodes
- Stateful resource that can "safely" transfer the master role between
online nodes (with queue drain and such)
- Implement "full" mode, with detection of the cluster IP and the master
node daemon.
Node role agent
---------------
This agent will manage the Ganeti node role. It needs to be configured
as a cloned resource that is active on all nodes.
In partial mode it will always return success (and thus trigger a
failure only upon an HA level or network failure). Full mode, which
initially will not be implemented, could also check for the node daemon
being unresponsive or other local conditions (TBD).
When a failure happens the HA notification system will trigger on all
other nodes, including the master. The master will then be able to
offline the node. Any other work to restore instance availability should
then be done by the autorepair system.
The following cluster tags are supported:
- ``ocf:node-offline:use-powercycle``: Try to powercycle a node using
``gnt-node powercycle`` when offlining.
- ``ocf:node-offline:use-poweroff``: Try to power off a node using
``gnt-node power off`` when offlining (requires OOB support).
Future improvements
+++++++++++++++++++
- Handle draining differently than offlining
- Handle different modes of "stopping" the service
- Implement "full" mode
Risks
-----
Running Ganeti with Pacemaker increases the stability risks for your
Ganeti cluster. Events like:
- stopping heartbeat or corosync on a node
- corosync or heartbeat being killed for any reason
- temporary failure in a node's networking
will trigger potentially dangerous operations such as node offlining or
master role failover. Moreover if the autorepair system will be working
they will be able to also trigger instance failovers or migrations, and
disk replaces.
Also note that operations like: master-failover, or manual node-modify
might interact badly with this setup depending on the way your HA system
is configured (see below).
This of course is an inherent problem with any Linux-HA installation,
but is probably more visible with Ganeti given that our resources tend
to be more heavyweight than many others managed in HA clusters (eg. an
IP address).
Code status
-----------
This code is heavily experimental, and Linux-HA is a very complex
subsystem. *We might not be able to help you* if you decide to run this
code: please make sure you understand fully high availability on your
production machines. Ganeti only ships this code as an example but it
might need customization or complex configurations on your side for it
to run properly.
*Ganeti does not automate HA configuration for your cluster*. You need
to do this job by hand. Good luck, don't get it wrong.
Future work
===========
- Integrate the agents better with the ganeti monitoring
- Add hooks for managing HA at node add/remove/modify/master-failover
operations
- Provide a stonith system through Ganeti's OOB system
- Provide an OOB system that does "shunning" of offline nodes, for
emulating a real OOB, at least on all nodes
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-location.rst 0000644 0000000 0000000 00000010701 12634264163 0017432 0 ustar 00root root 0000000 0000000 ======================================
Improving location awareness of Ganeti
======================================
This document describes an enhancement of Ganeti's instance
placement by taking into account that some nodes are vulnerable
to common failures.
.. contents:: :depth: 4
Current state and shortcomings
==============================
Currently, Ganeti considers all nodes in a single node group as
equal. However, this is not true in some setups. Nodes might share
common causes of failure or even be located in different places,
with spatial redundancy being a desired feature.
The similar problem for instances, i.e., instances providing the
same external service should not be placed on the same nodes, is
solved by means of exclusion tags. However, there is no mechanism
for a good choice of node pairs for a single instance. Moreover,
while instances providing the same service run on different nodes,
they are not spread out location wise.
Proposed changes
================
We propose to extend the cluster metric (as used, e.g., by ``hbal`` and ``hail``)
to honor additional node tags indicating nodes that might have a common
cause of failure.
Failure tags
------------
As for exclusion tags, cluster tags will determine which tags are considered
to denote a source of common failure. More precisely, a cluster tag of the
form *htools:nlocation:x* will make node tags starting with *x:* indicate a
common cause of failure, that redundant instances should avoid.
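For example (the ``rack`` prefix and the rack names below are made up for
illustration), a setup could use the following tags::

  # cluster tag, declaring "rack:*" node tags as common-failure tags
  htools:nlocation:rack

  # node tags, one per node according to its physical location
  rack:r1
  rack:r2

With these tags, placing both the primary and the secondary node of an
instance in the same rack will be counted as a violation by the metric
described below.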
Metric changes
--------------
The following components will be added to the cluster metric, weighted appropriately.
- The number of pairs of an instance and a common-failure tag, where primary
and secondary node both have this tag.
- The number of pairs of exclusion tags and common-failure tags where there
exist at least two instances with the given exclusion tag with the primary
node having the given common-failure tag.
The weights for these components might have to be tuned as experience with these
setups grows, but as a starting point, both components will have a weight of
1.0 each. In this way, any common-failure violations are less important than
any hard constraints missed (like instances on offline nodes) so that
the hard constraints will be restored first when balancing a cluster.
Nevertheless, with weight 1.0 the new common-failure components will
still be significantly more important than all the balancedness components
(cpu, disk, memory), as the latter are standard deviations of fractions.
It will also dominate the disk load component which, when only taking
static information into account, essentially amounts to counting disks. In
this way, Ganeti will be willing to sacrifice equal numbers of disks on every
node in order to fulfill location requirements.
Apart from changing the balancedness metric, common-failure tags will
not have any other effect. In particular, as opposed to exclusion tags,
no hard guarantees are made: ``hail`` will try to allocate an instance in
a common-failure-avoiding way if possible, but will still allocate the instance
if not.
Additional migration restrictions
=================================
Inequality between nodes can also restrict the set of instance migrations
possible. Here, the most prominent example is updating the hypervisor where
usually migrations from the new to the old hypervisor version are not possible.
Migration tags
--------------
As for exclusion tags, cluster tags will determine which tags are considered
restricting migration. More precisely, a cluster tag of the form
*htools:migration:x* will make node tags starting with *x:* a migration relevant
node property. Additionally, cluster tags of the form
*htools:allowmigration:y::z* where *y* and *z* are migration tags not containing
*::* specify a unidirectional migration possibility from *y* to *z*.
Restriction
-----------
An instance migration will only be considered by ``htools``, if for all
migration tags *y* present on the node migrated from, either the tag
is also present on the node migrated to or there is a cluster tag
*htools:allowmigration:y::z* and the target node is tagged *z* (or both).
Example
-------
For the simple hypervisor upgrade, where migration from old to new is possible,
but not the other way round, tagging all already upgraded nodes suffices.
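For example (the ``xen-version`` prefix and the version number are purely
illustrative), this case could be expressed with the following tags::

  # cluster tag, declaring "xen-version:*" node tags as migration relevant
  htools:migration:xen-version

  # node tag, added to each node as soon as it has been upgraded
  xen-version:4.4

Nodes not yet upgraded carry no ``xen-version:*`` tag, so migrations from
an upgraded node back onto them will not be considered, while all other
migrations remain possible.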
Advise only
-----------
These tags are of advisory nature only. That is, all ``htools`` will strictly
obey the restrictions imposed by those tags, but Ganeti will not prevent users
from manually instructing other migrations.
ganeti-2.15.2/doc/design-lu-generated-jobs.rst 0000644 0000000 0000000 00000006663 12634264163 0021145 0 ustar 00root root 0000000 0000000 ==================================
Submitting jobs from logical units
==================================
.. contents:: :depth: 4
This is a design document about the innards of Ganeti's job processing.
Readers are advised to study previous design documents on the topic:
- :ref:`Original job queue `
- :ref:`Job priorities `
Current state and shortcomings
==============================
Some Ganeti operations want to execute as many operations in parallel as
possible. Examples are evacuating or failing over a node (``gnt-node
evacuate``/``gnt-node failover``). Without changing large parts of the
code, e.g. the RPC layer, to be asynchronous, or using threads inside a
logical unit, only a single operation can be executed at a time per job.
Currently clients work around this limitation by retrieving the list of
desired targets and then re-submitting a number of jobs. This requires
logic to be kept in the client, in some cases leading to duplication
(e.g. CLI and RAPI).
Proposed changes
================
The job queue lock is guaranteed to be released while executing an
opcode/logical unit. This means an opcode can talk to the job queue and
submit more jobs. It then receives the job IDs, like any job submitter
using the LUXI interface would. These job IDs are returned to the
client, who will then proceed to wait for the jobs to finish.
Technically, the job queue already passes a number of callbacks to the
opcode processor. These are used for giving user feedback, notifying the
job queue of an opcode having gotten its locks, and checking whether the
opcode has been cancelled. A new callback function is added to submit
jobs. Its signature and result will be equivalent to the job queue's
existing ``SubmitManyJobs`` function.
Logical units can submit jobs by returning an instance of a special
container class with a list of jobs, each of which is a list of opcodes
(e.g. ``[[op1, op2], [op3]]``). The opcode processor will recognize
instances of the special class when used as a return value and will submit
the contained jobs. The submission status and job IDs returned by the
submission callback are used as the opcode's result. It should be
encapsulated in a dictionary allowing for future extensions.
.. highlight:: javascript
Example::
{
"jobs": [
(True, "8149"),
(True, "21019"),
(False, "Submission failed"),
(True, "31594"),
],
}
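.. highlight:: python

On the logical unit side the mechanism could be used roughly as follows;
this is a sketch only, and the container class name and helper functions
are placeholders rather than the final API::

  class ResultWithJobs:             # placeholder name for the special container
    def __init__(self, jobs):
      self.jobs = jobs              # list of jobs, each a list of opcodes

  def _NodeEvacuateExec(instance_names, make_failover_opcode):
    # One single-opcode job per instance; the opcode processor recognizes the
    # container as a return value and submits all jobs through the callback.
    return ResultWithJobs([[make_failover_opcode(name)]
                           for name in instance_names])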
Job submissions can fail for a variety of reasons, e.g. a full or drained
job queue. Lists of jobs can not be submitted atomically, meaning some
might fail while others succeed. The client is responsible for handling
such cases.
Other discussed solutions
=========================
Instead of requiring the client to wait for the returned jobs, another
idea was to do so from within the submitting opcode in the master
daemon. While technically possible, doing so would have two major
drawbacks:
- Opcodes waiting for other jobs to finish block one job queue worker
thread
- All locks must be released before starting the waiting process,
failure to do so can lead to deadlocks
Instead of returning the job IDs as part of the normal opcode result,
introducing a new opcode field, e.g. ``op_jobids``, was discussed and
dismissed. A new field would touch many areas and possibly break some
assumptions. There were also questions about the semantics.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-monitoring-agent.rst 0000644 0000000 0000000 00000072745 12634264163 0021123 0 ustar 00root root 0000000 0000000 =======================
Ganeti monitoring agent
=======================
.. contents:: :depth: 4
This is a design document detailing the implementation of a Ganeti
monitoring agent report system, that can be queried by a monitoring
system to calculate health information for a Ganeti cluster.
Current state and shortcomings
==============================
There is currently no monitoring support in Ganeti. While we don't want
to build something like Nagios or Pacemaker as part of Ganeti, it would
be useful if such tools could easily extract information from a Ganeti
machine in order to take actions (example actions include logging an
outage for future reporting or alerting a person or system about it).
Proposed changes
================
Each Ganeti node should export a status page that can be queried by a
monitoring system. Such status page will be exported on a network port
and will be encoded in JSON (simple text) over HTTP.
The choice of JSON is obvious as we already depend on it in Ganeti and
thus we don't need to add extra libraries to use it, as opposed to what
would happen for XML or some other markup format.
Location of agent report
------------------------
The report will be available from all nodes, and be concerned for all
node-local resources. This allows more real-time information to be
available, at the cost of querying all nodes.
Information reported
--------------------
The monitoring agent system will report on the following basic information:
- Instance status
- Instance disk status
- Status of storage for instances
- Ganeti daemons status, CPU usage, memory footprint
- Hypervisor resources report (memory, CPU, network interfaces)
- Node OS resources report (memory, CPU, network interfaces)
- Node OS CPU load average report
- Information from a plugin system
.. _monitoring-agent-format-of-the-report:
Format of the report
--------------------
The report of the will be in JSON format, and it will present an array
of report objects.
Each report object will be produced by a specific data collector.
Each report object includes some mandatory fields, to be provided by all
the data collectors:
``name``
The name of the data collector that produced this part of the report.
It is supposed to be unique inside a report.
``version``
The version of the data collector that produces this part of the
report. Built-in data collectors (as opposed to those implemented as
plugins) should have "B" as the version number.
``format_version``
The format of what is represented in the "data" field for each data
collector might change over time. Every time this happens, the
format_version should be changed, so that who reads the report knows
what format to expect, and how to correctly interpret it.
``timestamp``
The time when the reported data were gathered. It has to be expressed
in nanoseconds since the unix epoch (0:00:00 January 01, 1970). If not
enough precision is available (or needed) it can be padded with
zeroes. If a report object needs multiple timestamps, it can add more
and/or override this one inside its own "data" section.
``category``
A collector can belong to a given category of collectors (e.g.: storage
collectors, daemon collector). This means that it will have to provide a
minimum set of prescribed fields, as documented for each category.
This field will contain the name of the category the collector belongs to,
if any, or just the ``null`` value.
``kind``
Two kinds of collectors are possible:
`Performance reporting collectors`_ and `Status reporting collectors`_.
The respective paragraphs will describe them and the value of this field.
``data``
This field contains all the data generated by the specific data collector,
in its own independently defined format. The monitoring agent could check
this syntactically (according to the JSON specifications) but not
semantically.
Here follows a minimal example of a report::
[
{
"name" : "TheCollectorIdentifier",
"version" : "1.2",
"format_version" : 1,
"timestamp" : 1351607182000000000,
"category" : null,
"kind" : 0,
"data" : { "plugin_specific_data" : "go_here" }
},
{
"name" : "AnotherDataCollector",
"version" : "B",
"format_version" : 7,
"timestamp" : 1351609526123854000,
"category" : "storage",
"kind" : 1,
"data" : { "status" : { "code" : 1,
"message" : "Error on disk 2"
},
"plugin_specific" : "data",
"some_late_data" : { "timestamp" : 1351609526123942720,
...
}
}
}
]
Performance reporting collectors
++++++++++++++++++++++++++++++++
These collectors only provide data about some component of the system, without
giving any interpretation over their meaning.
The value of the ``kind`` field of the report will be ``0``.
Status reporting collectors
+++++++++++++++++++++++++++
These collectors will provide information about the status of some
component of ganeti, or managed by ganeti.
The value of their ``kind`` field will be ``1``.
The rationale behind this kind of collectors is that there are some situations
where exporting data about the underlying subsystems would expose potential
issues. But if Ganeti itself is able (and going) to fix the problem, conflicts
might arise between Ganeti and something/somebody else trying to fix the same
problem.
Also, some external monitoring systems might not be aware of the internals of a
particular subsystem (e.g.: DRBD) and might only exploit the high level
response of its data collector, alerting an administrator if anything is wrong.
Still, completely hiding the underlying data is not a good idea, as they might
still be of use in some cases. So status reporting plugins will provide two
output modes: one just exporting a high level information about the status,
and one also exporting all the data they gathered.
The default output mode will be the status-only one. Through a command line
parameter (for stand-alone data collectors) or through the HTTP request to the
monitoring agent
(when collectors are executed as part of it) the verbose output mode providing
all the data can be selected.
When exporting just the status, each status reporting collector will provide,
in its ``data`` section, at least the following field:
``status``
summarizes the status of the component being monitored and consists of two
subfields:
``code``
It assumes a numeric value, encoded in such a way as to allow using a bitset
to easily distinguish which states are currently present in the whole
cluster. If the bitwise OR of all the ``status`` fields is 0, the cluster
is completely healthy.
The status codes are as follows:
``0``
The collector can determine that everything is working as
intended.
``1``
Something is temporarily wrong but it is being automatically fixed by
Ganeti.
There is no need of external intervention.
``2``
The collector has failed to understand whether the status is good or
bad. Further analysis is required. Interpret this status as a
potentially dangerous situation.
``4``
The collector can determine that something is wrong and Ganeti has no
way to fix it autonomously. External intervention is required.
``message``
A message to better explain the reason of the status.
The exact format of the message string is data collector dependent.
The field is mandatory, but the content can be an empty string if the
``code`` is ``0`` (working as intended) or ``1`` (being fixed
automatically).
If the status code is ``2``, the message should explain why it was not
possible to determine a proper status.
If the status code is ``4``, the message should specify what has gone
wrong.
The ``data`` section will also contain all the fields describing the gathered
data, according to a collector-specific format.
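Since the non-zero status codes above are powers of two, a consumer of the
report can summarize the state of a node or of the whole cluster by OR-ing
the codes together, as described for the ``code`` field. The following sketch
is purely illustrative and not part of the interface; the function name and
the shape of the parsed input are assumptions:

.. code-block:: python

  def aggregate_status_codes(report):
    """Bitwise OR of the ``status`` codes of all status reporting collectors.

    ``report`` is assumed to be the parsed JSON report, i.e. a list of
    collector objects as in the example above. A result of 0 means that
    every status reporting collector considers its component healthy.
    """
    result = 0
    for collector in report:
      if collector.get("kind") == 1:
        result |= collector["data"]["status"]["code"]
    return result
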
Instance status
+++++++++++++++
At the moment each node knows which instances are running on it and which
instances it is primary for, but not why an instance might not be
running. On the other hand we don't want to distribute full instance
"admin" status information to all nodes, because of the performance
impact this would have.
As such we propose that:
- Any operation that can affect instance status will have an optional
"reason" attached to it (at opcode level). This can be used for
example to distinguish an admin request, from a scheduled maintenance
or an automated tool's work. If this reason is not passed, Ganeti will
just use the information it has about the source of the request.
This reason information will be structured according to the
:doc:`Ganeti reason trail <design-reason-trail>` design document.
- RPCs that affect the instance status will be changed so that the
"reason" and the version of the config object they ran on is passed to
them. They will then export the new expected instance status, together
with the associated reason and object version to the status report
system, which then will export those themselves.
Monitoring and auditing systems can then use the reason to understand
the cause of an instance status, and they can use the timestamp to
understand the freshness of their data even in the absence of an atomic
cross-node reporting: for example if they see an instance "up" on a node
after seeing it running on a previous one, they can compare these values
to understand which data is freshest, and repoll the "older" node. Of
course if they keep seeing this status this represents an error (either
an instance continuously "flapping" between nodes, or an instance is
constantly up on more than one), which should be reported and acted
upon.
The instance status will be reported by each node, for the instances it
is primary for, and the ``data`` section of the report will contain a
list of instances, named ``instances``, with at least the following
fields for each instance:
``name``
The name of the instance.
``uuid``
The UUID of the instance (stable on name change).
``admin_state``
The status of the instance (up/down/offline) as requested by the admin.
``actual_state``
The actual status of the instance. It can be ``up``, ``down``, or
``hung`` if the instance is up but it appears to be completely stuck.
``uptime``
The uptime of the instance (if it is up, "null" otherwise).
``mtime``
The timestamp of the last known change to the instance state.
``state_reason``
The last known reason for state change of the instance, described according
to the JSON representation of a reason trail, as detailed in the
:doc:`reason trail design document <design-reason-trail>`.
``status``
It represents the status of the instance, and its format is the same as that
of the ``status`` field of `Status reporting collectors`_.
Each hypervisor should provide its own instance status data collector, possibly
with the addition of more, specific, fields.
The ``category`` field of all of them will be ``instance``.
The ``kind`` field will be ``1``.
Note that as soon as a node knows it's not the primary anymore for an
instance it will stop reporting status for it: this means the instance
will either disappear, if it has been deleted, or appear on another
node, if it's been moved.
The ``code`` of the ``status`` field of the report of the Instance status data
collector will be:
``0``
if ``status`` is ``0`` for all the instances it is reporting about.
``1``
otherwise.
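For illustration only, the ``data`` section of this collector could look like
the following; every value, including the shape of the reason trail entry, is
invented for the example::

  { "instances": [
      { "name": "instance1.example.com",
        "uuid": "aabbccdd-eeff-0011-2233-445566778899",
        "admin_state": "up",
        "actual_state": "up",
        "uptime": 4310,
        "mtime": 1351607182000000000,
        "state_reason": [["user", "gnt-instance startup", 1351607181000000000]],
        "status": { "code": 0, "message": "" }
      }
    ]
  }
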
Storage collectors
++++++++++++++++++
The storage collectors will be a series of data collectors
that will gather data about storage for the current node. The collection
will be performed at different granularity and abstraction levels, from
the physical disks, to partitions, logical volumes and to the specific
storage types used by Ganeti itself (drbd, rbd, plain, file).
The ``name`` of each of these collectors will reflect what storage type each of
them refers to.
The ``category`` field of these collectors will be ``storage``.
The ``kind`` field will depend on the specific collector.
Each ``storage`` collector's ``data`` section will provide collector-specific
fields.
The various storage collectors will provide keys to join the data they provide,
in order to allow the user to get a better understanding of the system. E.g.:
through device names, or instance names.
Diskstats collector
*******************
This storage data collector will gather information about the status of the
disks installed in the system, as listed in the /proc/diskstats file. This means
that not only physical hard drives, but also ramdisks and loopback devices will
be listed.
Its ``kind`` in the report will be ``0`` (`Performance reporting collectors`_).
Its ``category`` field in the report will contain the value ``storage``.
When executed in verbose mode, the ``data`` section of the report of this
collector will be a list of items, each representing one disk, each providing
the following fields:
``major``
The major number of the device.
``minor``
The minor number of the device.
``name``
The name of the device.
``readsNum``
This is the total number of reads completed successfully.
``mergedReads``
Reads which are adjacent to each other may be merged for efficiency. Thus
two 4K reads may become one 8K read before it is ultimately handed to the
disk, and so it will be counted (and queued) as only one I/O. This field
specifies how often this was done.
``secRead``
This is the total number of sectors read successfully.
``timeRead``
This is the total number of milliseconds spent by all reads.
``writes``
This is the total number of writes completed successfully.
``mergedWrites``
Writes which are adjacent to each other may be merged for efficiency. Thus
two 4K writes may become one 8K write before it is ultimately handed to the
disk, and so it will be counted (and queued) as only one I/O. This field
specifies how often this was done.
``secWritten``
This is the total number of sectors written successfully.
``timeWrite``
This is the total number of milliseconds spent by all writes.
``ios``
The number of I/Os currently in progress.
The only field that should go to zero, it is incremented as requests are
given to appropriate struct request_queue and decremented as they finish.
``timeIO``
The number of milliseconds spent doing I/Os. This field increases so long
as field ``IOs`` is nonzero.
``wIOmillis``
The weighted number of milliseconds spent doing I/Os.
This field is incremented at each I/O start, I/O completion, I/O merge,
or read of these stats by the number of I/Os in progress (field ``IOs``)
times the number of milliseconds spent doing I/O since the last update of
this field. This can provide an easy measure of both I/O completion time
and the backlog that may be accumulating.
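The fields above correspond, in order, to the per-device counters exposed by
``/proc/diskstats``. A minimal parsing sketch follows; the function name and
the resulting layout are illustrative only, and extra columns added by newer
kernels are ignored:

.. code-block:: python

  DISKSTATS_FIELDS = ["major", "minor", "name", "readsNum", "mergedReads",
                      "secRead", "timeRead", "writes", "mergedWrites",
                      "secWritten", "timeWrite", "ios", "timeIO", "wIOmillis"]

  def parse_diskstats(path="/proc/diskstats"):
    """Parse /proc/diskstats into the per-disk items described above."""
    items = []
    with open(path) as statfile:
      for line in statfile:
        values = line.split()
        if len(values) < len(DISKSTATS_FIELDS):
          continue
        item = dict(zip(DISKSTATS_FIELDS, values[:len(DISKSTATS_FIELDS)]))
        for key in DISKSTATS_FIELDS:
          if key != "name":
            item[key] = int(item[key])
        items.append(item)
    return items
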
Logical Volume collector
************************
This data collector will gather information about the attributes of logical
volumes present in the system.
Its ``kind`` in the report will be ``0`` (`Performance reporting collectors`_).
Its ``category`` field in the report will contain the value ``storage``.
The ``data`` section of the report of this collector will be a list of items,
each representing one logical volume and providing the following fields:
``uuid``
The UUID of the logical volume.
``name``
The name of the logical volume.
``attr``
The attributes of the logical volume.
``major``
Persistent major number or -1 if not persistent.
``minor``
Persistent minor number or -1 if not persistent.
``kernel_major``
Currently assigned major number or -1 if LV is not active.
``kernel_minor``
Currently assigned minor number or -1 if LV is not active.
``size``
Size of LV in bytes.
``seg_count``
Number of segments in LV.
``tags``
Tags, if any.
``modules``
Kernel device-mapper modules required for this LV, if any.
``vg_uuid``
Unique identifier of the volume group.
``vg_name``
Name of the volume group.
``segtype``
Type of LV segment.
``seg_start``
Offset within the LV to the start of the segment in bytes.
``seg_start_pe``
Offset within the LV to the start of the segment in physical extents.
``seg_size``
Size of the segment in bytes.
``seg_tags``
Tags for the segment, if any.
``seg_pe_ranges``
Ranges of Physical Extents of underlying devices in lvs command line format.
``devices``
Underlying devices used with starting extent numbers.
``instance``
The name of the instance this LV is used by, or ``null`` if it was not
possible to determine it.
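A single item of that list could look as follows, with every value invented
for the example::

  { "uuid": "gCZlDZ-0tCy-6Hk5-gsFU-WDmp-jUfK-r8V2cz",
    "name": "4a86ef29-disk0",
    "attr": "-wi-ao",
    "major": -1,
    "minor": -1,
    "kernel_major": 253,
    "kernel_minor": 3,
    "size": 10737418240,
    "seg_count": 1,
    "tags": "",
    "modules": "",
    "vg_uuid": "O9cgUO-JTFC-iUgl-TiKC-aP6v-5nbe-Kn0YlH",
    "vg_name": "xenvg",
    "segtype": "linear",
    "seg_start": 0,
    "seg_start_pe": 0,
    "seg_size": 10737418240,
    "seg_tags": "",
    "seg_pe_ranges": "/dev/sda3:0-2559",
    "devices": "/dev/sda3(0)",
    "instance": "instance1.example.com"
  }
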
DRBD status
***********
This data collector will run only on nodes where DRBD is actually
present and it will gather information about DRBD devices.
Its ``kind`` in the report will be ``1`` (`Status reporting collectors`_).
Its ``category`` field in the report will contain the value ``storage``.
When executed in verbose mode, the ``data`` section of the report of this
collector will provide the following fields:
``versionInfo``
Information about the DRBD version number, given by a combination of
any (but at least one) of the following fields:
``version``
The DRBD driver version.
``api``
The API version number.
``proto``
The protocol version.
``srcversion``
The version of the source files.
``gitHash``
Git hash of the source files.
``buildBy``
Who built the binary, and, optionally, when.
``device``
A list of structures, each describing a DRBD device (a minor) and containing
the following fields:
``minor``
The device minor number.
``connectionState``
The state of the connection. If it is "Unconfigured", all the following
fields are not present.
``localRole``
The role of the local resource.
``remoteRole``
The role of the remote resource.
``localState``
The status of the local disk.
``remoteState``
The status of the remote disk.
``replicationProtocol``
The replication protocol being used.
``ioFlags``
The input/output flags.
``perfIndicators``
The performance indicators. This field will contain the following
sub-fields:
``networkSend``
KiB of data sent on the network.
``networkReceive``
KiB of data received from the network.
``diskWrite``
KiB of data written on local disk.
``diskRead``
KiB of data read from the local disk.
``activityLog``
Number of updates of the activity log.
``bitMap``
Number of updates to the bitmap area of the metadata.
``localCount``
Number of open requests to the local I/O subsystem.
``pending``
Number of requests sent to the partner but not yet answered.
``unacknowledged``
Number of requests received by the partner but still to be answered.
``applicationPending``
Number of block input/output requests forwarded to DRBD but not yet
answered.
``epochs``
(Optional) Number of epoch objects. Not provided by all DRBD versions.
``writeOrder``
(Optional) Currently used write ordering method. Not provided by all DRBD
versions.
``outOfSync``
(Optional) KiB of storage currently out of sync. Not provided by all DRBD
versions.
``syncStatus``
(Optional) The status of the synchronization of the disk. This is present
only if the disk is being synchronized, and includes the following fields:
``percentage``
The percentage of synchronized data.
``progress``
How far the synchronization is. Written as "x/y", where x and y are
integer numbers expressed in the measurement unit stated in
``progressUnit``
``progressUnit``
The measurement unit for the progress indicator.
``timeToFinish``
The expected time before finishing the synchronization.
``speed``
The speed of the synchronization.
``want``
The desired speed of the synchronization.
``speedUnit``
The measurement unit of the ``speed`` and ``want`` values. Expressed
as "size/time".
``instance``
The name of the Ganeti instance this disk is associated to.
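Purely as an illustration, a trimmed-down verbose ``data`` section could look
like this, with all values invented::

  { "versionInfo": { "version": "8.3.11", "api": "88", "proto": "86-96" },
    "device": [
      { "minor": 0,
        "connectionState": "Connected",
        "localRole": "Primary",
        "remoteRole": "Secondary",
        "localState": "UpToDate",
        "remoteState": "UpToDate",
        "replicationProtocol": "C",
        "ioFlags": "r-----",
        "perfIndicators": { "networkSend": 1024, "networkReceive": 2048,
                            "diskWrite": 4096, "diskRead": 512,
                            "activityLog": 10, "bitMap": 0, "localCount": 0,
                            "pending": 0, "unacknowledged": 0,
                            "applicationPending": 0 },
        "instance": "instance1.example.com"
      },
      { "minor": 1, "connectionState": "Unconfigured" }
    ]
  }
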
Ganeti daemons status
+++++++++++++++++++++
Ganeti will report what information it has about its own daemons.
This should allow identifying possible problems with the Ganeti system itself:
for example memory leaks, crashes and high resource utilization should be
evident by analyzing this information.
The ``kind`` field will be ``1`` (`Status reporting collectors`_).
Each daemon will have its own data collector, and each of them will have
a ``category`` field valued ``daemon``.
When executed in verbose mode, their data section will include at least:
``memory``
The amount of used memory.
``size_unit``
The measurement unit used for the memory.
``uptime``
The uptime of the daemon.
``CPU usage``
How much CPU the daemon is using (percentage).
Any other daemon-specific information can be included as well in the ``data``
section.
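As an illustration, the verbose ``data`` section for a single daemon could
look like the following; the values and the extra daemon-specific field are
invented::

  { "memory": 42.5,
    "size_unit": "MiB",
    "uptime": 86400,
    "CPU usage": 0.5,
    "queued_jobs": 3
  }
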
Hypervisor resources report
+++++++++++++++++++++++++++
Each hypervisor has a view of system resources that sometimes is
different than the one the OS sees (for example in Xen the Node OS,
running as Dom0, has access to only part of those resources). In this
section we'll report all information we can in a "non hypervisor
specific" way. Each hypervisor can then add extra specific information
that is not generic enough to be abstracted.
The ``kind`` field will be ``0`` (`Performance reporting collectors`_).
Each of the hypervisor data collectors will be of ``category``: ``hypervisor``.
Node OS resources report
++++++++++++++++++++++++
Since Ganeti assumes it's running on Linux, it's useful to export some
basic information as seen by the host system.
The ``category`` field of the report will be ``null``.
The ``kind`` field will be ``0`` (`Performance reporting collectors`_).
The ``data`` section will include:
``cpu_number``
The number of available cpus.
``cpus``
A list with one element per cpu, showing its average load.
``memory``
The current view of memory (free, used, cached, etc.)
``filesystem``
A list with one element per filesystem, showing a summary of the
total/available space.
``NICs``
A list with one element per network interface, showing the amount of
sent/received data, error rate, IP address of the interface, etc.
``versions``
A map using the name of a component Ganeti interacts with (Linux, drbd,
hypervisor, etc.) as the key and its version number as the value.
Note that we won't go into any hardware specific details (e.g. querying a
node RAID is outside the scope of this, and can be implemented as a
plugin) but we can easily just report the information above, since it's
standard enough across all systems.
Node OS CPU load average report
+++++++++++++++++++++++++++++++
This data collector will export CPU load statistics as seen by the host
system. Apart from using the data from an external monitoring system we
can also use the data to improve instance allocation and/or the Ganeti
cluster balance. To compute the CPU load average we will use a number of
values collected inside a time window. The collection process will be
done by an independent thread (see `Mode of Operation`_).
This report is a subset of the previous report (`Node OS resources
report`_) and they might eventually get merged, once reporting for the
other fields (memory, filesystem, NICs) gets implemented too.
Specifically:
The ``category`` field of the report will be ``null``.
The ``kind`` field will be ``0`` (`Performance reporting collectors`_).
The ``data`` section will include:
``cpu_number``
The number of available cpus.
``cpus``
A list with one element per cpu, showing its average load.
``cpu_total``
The total CPU load average as a sum of all the separate cpus.
The CPU load report function will get N values, collected by the
CPU load collection function, and calculate the above averages. Please
see the section `Mode of Operation`_ for more information on how the
two functions of the data collector interact.
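A minimal sketch of the report-side computation follows; the window size and
the shape of the stored samples are assumptions made for the example:

.. code-block:: python

  def cpu_load_report(samples, window=60):
    """Average the last ``window`` per-cpu load samples.

    ``samples`` is assumed to be the list filled in by the collection
    function, each element holding one load value per cpu.
    """
    recent = samples[-window:]
    if not recent:
      return None
    cpu_number = len(recent[0])
    cpus = [sum(sample[idx] for sample in recent) / float(len(recent))
            for idx in range(cpu_number)]
    return {
      "cpu_number": cpu_number,
      "cpus": cpus,
      "cpu_total": sum(cpus),
      }
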
Format of the query
-------------------
.. include:: monitoring-query-format.rst
Instance disk status propagation
--------------------------------
As for the instance status, Ganeti currently has only partial information
about its instance disks: in particular, each node is unaware of the
disk-to-instance mapping, which exists only on the master.
For this design doc we plan to fix this by changing all RPCs that create
a backend storage or that put an already existing one in use and passing
the relevant instance to the node. The node can then export these to the
status reporting tool.
While we haven't implemented these RPC changes yet, we'll use Confd to
fetch this information in the data collectors.
Plugin system
-------------
The monitoring system will be equipped with a plugin system that can
export specific local information through it.
The plugin system is expected to be used by local installations to
export any installation specific information that they want to be
monitored, about either hardware or software on their systems.
The plugin system will be in the form of either scripts or binaries whose output
will be inserted in the report.
Eventually support for other kinds of plugins might be added as well, such as
plain text files which will be inserted into the report, or local unix or
network sockets from which the information has to be read. This should allow
most flexibility for implementing an efficient system, while being able to keep
it as simple as possible.
Data collectors
---------------
In order to ease testing as well as to make it simple to reuse this
subsystem it will be possible to run just the "data collectors" on each
node without passing through the agent daemon.
If a data collector is run independently, it should print on stdout its
report, according to the format corresponding to a single data collector
report object, as described in the previous paragraphs.
Mode of operation
-----------------
In order to be able to report information fast the monitoring agent
daemon will keep an in-memory or on-disk cache of the status, which will
be returned when queries are made. The status system will then
periodically check resources to make sure the status is up to date.
Different parts of the report will be queried at different speeds. These
will depend on:
- how often they vary (or we expect them to vary)
- how fast they are to query
- how important their freshness is
Of course the last parameter is installation specific, and while we'll
try to have defaults, it will be configurable. The first two, instead,
can be used adaptively to query a certain resource faster or slower
depending on those two parameters.
When run as stand-alone binaries, the data collectors will not use any
caching system, and will just fetch and return the data immediately.
Since some performance collectors have to operate on a number of values
collected in previous times, we need a mechanism independent of the data
collector which will trigger the collection of those values and also
store them, so that they are available for calculation by the data
collectors.
To collect data periodically, a thread will be created by the monitoring
agent which will run the collection function of every data collector
that provides one. The values returned by the collection function of
the data collector will be saved in an appropriate map, associating each
value to the corresponding collector, using the collector's name as the
key of the map. This map will be stored in mond's memory.
The collectors are divided in two categories:
- stateless collectors, collectors who have immediate access to the
reported information
- stateful collectors, collectors whose report is based on data collected
in a previous time window
For example: the collection function of the CPU load collector will
collect a CPU load value and save it in the map mentioned above. The
collection function will be called by the collector thread every t
milliseconds. When the report function of the collector is called, it
will process the last N values of the map and calculate the
corresponding average.
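A rough sketch of the collection thread follows; the names, the collector
interface and the locking are assumptions made for the example, and the
actual daemon is implemented in Haskell rather than Python:

.. code-block:: python

  import threading
  import time

  def collection_thread(collectors, storage, interval, lock):
    """Periodically run the collection function of stateful collectors.

    ``collectors`` maps collector names to objects providing a
    ``collect()`` method; ``storage`` maps collector names to the lists
    of values later consumed by the report functions.
    """
    while True:
      for name, collector in collectors.items():
        value = collector.collect()
        with lock:
          storage.setdefault(name, []).append(value)
      time.sleep(interval)

  # Example wiring, also illustrative only:
  #   storage = {}
  #   lock = threading.Lock()
  #   thread = threading.Thread(target=collection_thread,
  #                             args=(collectors, storage, 1.0, lock))
  #   thread.daemon = True
  #   thread.start()
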
Implementation place
--------------------
The status daemon will be implemented as a standalone Haskell daemon. In
the future it should be easy to merge multiple daemons into one with
multiple entry points, should we find out it saves resources and doesn't
impact functionality.
The libekg library should be looked at for easily providing metrics in
json format.
Implementation order
--------------------
We will implement the agent system in this order:
- initial example data collectors (eg. for drbd and instance status).
- initial daemon for exporting data, integrating the existing collectors
- plugin system
- RPC updates for instance status reasons and disk to instance mapping
- cache layer for the daemon
- more data collectors
Future work
===========
As a future step it can be useful to "centralize" all this reporting
data on a single place. This for example can be just the master node, or
all the master candidates. We will evaluate doing this after the first
node-local version has been developed and tested.
Another possible change is replacing the "read-only" RPCs with queries
to the agent system, thus having only one way of collecting information
from the nodes from a monitoring system and for Ganeti itself.
One extra feature we may need is a way to query for only sub-parts of
the report (e.g. instance status only). This can be done by passing
arguments to the HTTP GET, which will be defined when we get to this
functionality.
Finally the :doc:`autorepair system <design-autorepair>` (see its design)
can be expanded to use the monitoring agent system as a source of
information to decide which repairs it can perform.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-move-instance-improvements.rst 0000644 0000000 0000000 00000047422 12634264163 0023132 0 ustar 00root root 0000000 0000000 ========================================
Instance move improvements
========================================
.. contents:: :depth: 3
Ganeti provides tools for moving instances within and between clusters. Through
special export and import calls, a new instance is created with the disk data of
the existing one.
The tools work correctly and reliably, but depending on bandwidth and priority,
an instance disk of considerable size requires a long time to transfer. The
length of the transfer is inconvenient at best, but the problem becomes only
worse if excessive locking causes a move operation to be delayed for a longer
period of time, or to block other operations.
The performance of moves is a complex topic, with available bandwidth,
compression, and encryption all being candidates for choke points that bog down
a transfer. Depending on the environment a move is performed in, tuning these
can have significant performance benefits, but Ganeti does not expose many
options needed for such tuning. The details of what to expose and what tradeoffs
can be made will be presented in this document.
Apart from existing functionality, some beneficial features can be introduced to
help with instance moves. Zeroing empty space on instance disks can be useful
for drastically improving the qualities of compression, effectively not needing
to transfer unused disk space during moves. Compression itself can be improved
by using different tools. The encryption used can be weakened or eliminated for
certain moves. Using opportunistic locking during instance moves results in
greater parallelization. As all of these approaches aim to tackle two different
aspects of the problem, they do not exclude each other and will be presented
independently.
The performance of Ganeti moves
===============================
In the current implementation, there are three possible factors limiting the
speed of an instance move. The first is the network bandwidth, which Ganeti can
exploit better by using compression. The second is the encryption, which is
obligatory, and which can throttle an otherwise fast connection. The third is
surprisingly the compression, which can cause the connection to be
underutilized.
Example 1: some numbers present during an intra-cluster instance move:
* Network bandwidth: 105MB/s, courtesy of a gigabit switch
* Encryption performance: 40MB/s, provided by OpenSSL
* Compression performance: 22.3MB/s input, 7.1MB/s gzip compressed output
As can be seen in this example, the obligatory encryption results in 62% of
available bandwidth being wasted, while using compression further lowers the
throughput to 55% of what the encryption would allow. The following sections
will talk about these numbers in more detail, and suggest improvements and best
practices.
Encryption and Ganeti security
++++++++++++++++++++++++++++++
Turning compression and encryption off would allow for an immediate improvement,
and while that is possible for compression, there are good reasons why
encryption is currently not a feature a user can disable.
While it is impossible to secure instance data if an attacker gains SSH access
to a node, the RAPI was designed to never allow user data to be accessed through
it in case of being compromised. If moves could be performed unencrypted, this
property would be broken. Instance moves can take place in environments which
may be hostile, and where unencrypted traffic could be intercepted. As they can
be instigated through the RAPI, an attacker could access all data on all
instances in a cluster by moving them unencrypted and intercepting the data in
flight. This is one of the few situations where the current speed of instance
moves could be considered a perk.
The performance of encryption can be increased by either using a less secure
form of encryption, including no encryption, or using a faster encryption
algorithm. The example listed above utilizes AES-256, one of the few ciphers
that Ganeti deems secure enough to use. AES-128, also allowed by Ganeti's
current settings, is weaker but 46% faster. A cipher that is not allowed due to
its flaws, such as RC4, could offer a 208% increase in speed. On the other hand,
using an OS capable of utilizing the AES_NI chip present on modern hardware
can double the performance of AES, making it the best tradeoff between security
and performance.
Ganeti cannot and should not detect all the factors listed above, but should
rather give its users some leeway in what to choose. A precedent already exists,
as intra-cluster DRBD replication is already performed unencrypted, albeit on a
separate VLAN. For intra-cluster moves, Ganeti should allow its users to set
OpenSSL ciphers at will, while still enforcing high-security settings for moves
between clusters.
Thus, two settings will be introduced:
* a cluster-level setting called ``--allow-cipher-bypassing``, a boolean that
cannot be set over RAPI
* a gnt-instance move setting called ``--ciphers-to-use``, bypassing the default
cipher list with the given ciphers, which are filtered to ensure that no other
OpenSSL options are passed in with them
This change will serve to address the issues with moving non-redundant instances
within the cluster, while keeping Ganeti security at its current level.
Compression
+++++++++++
Support for disk compression during instance moves was partially present before,
but cleaned up and unified under the ``--compress`` option only as of Ganeti
2.11. The only option offered by Ganeti is gzip with no options passed to it,
resulting in a good compression ratio, but bad compression speed.
As compression can affect the speed of instance moves significantly, it is
worthwhile to explore alternatives. To test compression tool performance, an 8GB
drive filled with data matching the expected usage patterns (taken from a
workstation) was compressed by using various tools with various settings. The
two top performers were ``lzop`` and, surprisingly, ``gzip``. The improvement in
the performance of ``gzip`` was obtained by explicitly optimizing for speed
rather than compression.
* ``gzip -6``: 22.3MB/s in, 7.1MB/s out
* ``gzip -1``: 44.1MB/s in, 15.1MB/s out
* ``lzop``: 71.9MB/s in, 28.1MB/s out
If encryption is the limiting factor, and as in the example, limits the
bandwidth to 40MB/s, ``lzop`` allows for an effective 79% increase in transfer
speed. The fast ``gzip`` would also prove to be beneficial, but much less than
``lzop``. It should also be noted that as a rule of thumb, tools with a lower
compression ratio had a lesser workload, with ``lzop`` straining the CPU much
less than any of the competitors.
With the test results present here, it is clear that ``lzop`` would be a very
worthwhile addition to the compression options present in Ganeti, yet the
problem is that it is not available by default on all distributions, as the
option's presence might imply. In general, Ganeti may know how to use several
tools, and check for their presence, but should add some way of at least hinting
at which tools are available.
Additionally, the user might want to use a tool that Ganeti did not account for.
Allowing the tool to be named is also helpful, both for cases when multiple
custom tools are to be used, and for distinguishing between various tools in
case of e.g. inter-cluster moves.
To this end, the ``--compression-tools`` cluster parameter will be added to
Ganeti. It contains a list of names of compression tools that can be supplied as
the parameter of ``--compress``, and by default it contains all the tools
Ganeti knows how to use. The user can change the list as desired, removing
entries that are not or should not be available on the cluster, and adding
custom tools.
Every custom tool is identified by its name, and Ganeti expects the name to
correspond to a script invoking the compression tool. Without arguments, the
script compresses input on stdin, outputting it on stdout. With the -d argument,
the script does the same, only while decompressing. The -h argument is used to
check for the presence of the script, and in this case, only the error code is
examined. This syntax matches the ``gzip`` syntax well, which should allow most
compression tools to be adapted to it easily.
Ganeti will not allow arbitrary parameters to be passed to a compression tool,
and will restrict the names to contain only a small but assuredly safe subset of
characters - alphanumeric values and dashes and underscores. This minimizes the
risk of security issues that could arise from an attacker smuggling a malicious
command through RAPI. Common variations, like the speed/compression tradeoff of
``gzip``, will be handled by aliases, e.g. ``gzip-fast`` or ``gzip-slow``.
It should also be noted that for some purposes - e.g. the writing of OVF files,
``gzip`` is the only allowed means of compression, and an appropriate error
message should be displayed if the user attempts to use one of the other
provided tools.
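As an illustration of the expected interface (this is not a tool shipped with
Ganeti), a custom entry added to ``--compression-tools`` could point to a small
executable script along the following lines; ``zlib`` merely stands in for a
real external compressor to keep the sketch self-contained:

.. code-block:: python

  #!/usr/bin/env python
  # Hypothetical custom compression tool: no arguments compresses stdin to
  # stdout, -d decompresses, and -h only reports availability through the
  # exit code.
  import sys
  import zlib

  def main():
    args = sys.argv[1:]
    if args == ["-h"]:
      return 0
    data = sys.stdin.read()
    if args == ["-d"]:
      sys.stdout.write(zlib.decompress(data))
    elif not args:
      sys.stdout.write(zlib.compress(data))
    else:
      sys.stderr.write("Unsupported arguments: %r\n" % args)
      return 1
    return 0

  if __name__ == "__main__":
    sys.exit(main())
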
Zeroing instance disks
======================
While compression lowers the amount of data sent, further reductions can be
achieved by taking advantage of the structure of the disk - namely, sending only
used disk sectors.
There is no direct way to achieve this, as it would require that the
move-instance tool is aware of the structure of the file system. Mounting the
filesystem is not an option, primarily due to security issues. A disk primed to
take advantage of a disk driver exploit could cause an attacker to breach
instance isolation and gain control of a Ganeti node.
An indirect way for this performance gain to be achieved is the zeroing of any
hard disk space not in use. While this primarily means empty space, swap
partitions can be zeroed as well.
Sequences of zeroes can be compressed and thus transferred very efficiently, all
without the host knowing that these are empty space. This approach can also be
dangerous if a sparse disk is zeroed in this way, causing ballooning. As Ganeti
does not seem to make special concessions for moving sparse disks, the only
difference should be the disk space utilization on the current node.
Zeroing approaches
++++++++++++++++++
Zeroing is a feasible approach, but the node cannot perform it as it cannot
mount the disk. Only virtualization-based options remain, and of those, using
Ganeti's own virtualization capabilities makes the most sense. There are two
ways of doing this - creating a new helper instance, temporary or persistent, or
reusing the target instance.
Both approaches have their disadvantages. Creating a new helper instance
requires managing its lifecycle, taking special care to make sure no helper
instance remains left over due to a failed operation. Even if this were to be
taken care of, disks are not yet separate entities in Ganeti, making the
temporary transfer of disks between instances hard to implement and even harder
to make robust. The reuse can be done by modifying the OS running on the
instance to perform the zeroing itself when notified via the new instance
communication mechanism, but this approach is neither generic, nor particularly
safe. There is no guarantee that the zeroing operation will not interfere with
the normal operation of the instance, nor that it will be completed if a
user-initiated shutdown occurs.
A better solution can be found by combining the two approaches - re-using the
virtualized environment, but with a specifically crafted OS image. With the
instance shut down as it should be in preparation for the move, it can be
extended with an additional disk with the OS image on it. By prepending the
disk and changing some instance parameters, the instance can boot from it. The
OS can be configured to perform the zeroing on startup, attempting to mount any
partitions with a filesystem present, and creating and deleting a zero-filled
file on them. After the zeroing is complete, the OS should shut down, and the
master should note the shutdown and restore the instance to its previous state.
Note that the requirements above are very similar to the notion of a helper VM
suggested in the OS install document. Some potentially unsafe actions are
performed within a virtualized environment, acting on disks that belong or will
belong to the instance. The mechanisms used will thus be developed with both
approaches in mind.
Implementation
++++++++++++++
There are two components to this solution - the Ganeti changes needed to boot
the OS, and the OS image used for the zeroing. Due to the variety of filesystems
and architectures that instances can use, no single ready-to-run disk image can
satisfy the needs of all the Ganeti users. Instead, the instance-debootstrap
scripts can be used to generate a zeroing-capable OS image. This might not be
ideal, as there are lightweight distributions that take up less space and boot
up more quickly. Generating those with the right set of drivers for the
virtualization platform of choice is not easy. Thus we do not provide a script
for this purpose, but the user is free to provide any OS image which performs
the necessary steps: zero out all virtualization-provided devices on startup,
shutdown immediately. The cluster-wide parameter controlling the image to be
used would be called ``--zeroing-image``.
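A sketch of the startup logic such an image could run is shown below; the
device names, the mount handling and the shutdown command are assumptions,
and a real image would be hypervisor and distribution specific:

.. code-block:: python

  import os
  import subprocess

  def zero_free_space(mountpoint):
    """Fill the free space of a mounted filesystem with zeroes."""
    zerofile = os.path.join(mountpoint, "ganeti-zero-file")
    try:
      with open(zerofile, "wb") as out:
        block = b"\0" * (4 * 1024 * 1024)
        while True:
          out.write(block)
    except IOError:
      pass  # "No space left on device" is the expected way to stop
    finally:
      if os.path.exists(zerofile):
        os.unlink(zerofile)

  def main():
    # /dev/xvdb1 is only a placeholder for the partitions found on the
    # devices exposed to the helper VM (excluding its own boot disk).
    for idx, device in enumerate(["/dev/xvdb1"]):
      mountpoint = "/mnt/zero%d" % idx
      os.makedirs(mountpoint)
      if subprocess.call(["mount", device, mountpoint]) == 0:
        zero_free_space(mountpoint)
        subprocess.call(["umount", mountpoint])
    subprocess.call(["poweroff"])

  if __name__ == "__main__":
    main()
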
The modifications to Ganeti code needed are minor. The zeroing functionality
should be implemented as an extension of the instance export, and exposed as the
``--zero-free-space`` option. Prior to beginning the export, the instance
configuration is temporarily extended with a new read-only disk of sufficient
size to host the zeroing image, and the changes necessary for the image to be
used as the boot drive. The temporary nature of the configuration changes
requires that they are not propagated to other nodes. While this would normally
not be feasible with an instance using a disk template offering multi-node
redundancy, experiments with the code have shown that the restriction on
diverse disk templates can be bypassed to temporarily allow a plain
disk-template disk to host the zeroing image. Given that one of the planned
changes in Ganeti is to have instance disks as separate entities, with no
restriction on templates, this assumption is useful rather than harmful by
asserting the desired behavior. The image is dumped to the disk, and the
instance is started up.
Once the instance is started up, the zeroing will proceed until completion, when
a self-initiated shutdown will occur. The instance-shutdown detection
capabilities of 2.11 should prevent the watcher from restarting the instance
once this happens, allowing the host to take it as a sign the zeroing was
completed. Either way, the host waits until the instance is shut down, or a
timeout has been reached and the instance is forcibly shut down. As the time
needed to zero an instance is dependent on the size of the disk of the instance,
the user can provide both a fixed and a per-size timeout, the latter recommended
to correspond to twice the time needed to fill the disk at the maximum write
speed of the device hosting the instance.
Better progress monitoring can be implemented with the instance-host
communication channel proposed by the OS install design document. The first
version will most likely use only the shutdown detection, and will be improved
to account for the available communication channel at a later time.
After the shutdown, the temporary disk is destroyed and the instance
configuration is reverted to its original state. The very same action is done if
any error is encountered during the zeroing process. In the case that the
zeroing is interrupted while the zero-filled file is being written, the file may
remain on the disk of the instance. The script that performs the zeroing will be
made to react to system signals by deleting the zero-filled file, but there is
little else that can be done to recover.
When to use zeroing
+++++++++++++++++++
The question of when it is useful to use zeroing is hard to answer because the
effectiveness of the approach depends on many factors. All compression tools
compress zeroes to almost nothingness, but compressing them takes time. If the
time needed to compress zeroes were equal to zero, the approach would boil down
to whether it is faster to zero unused space out, performing writes to disk, or
to transfer it compressed. For the example used above, the average compression
ratio, and the write speeds of current disk drives, the answer would almost
always be in favour of zeroing.
With a more realistic setup, where zeroes take time to compress, yet less time
than ordinary data, the gains depend on the previously mentioned tradeoff and
the free space available. Zeroing will definitely lessen the amount of bandwidth
used, but it can lead to the connection being underutilized due to the time
spent compressing data. It is up to the user to make these tradeoffs, but
zeroing should be seen primarily as a means of further reducing the amount of
data sent while increasing disk activity, with possible speed gains that should
not be relied upon.
In the future, the VM created for zeroing could also undertake other tasks
related to the move, such as compression and encryption, and produce a stream
of data rather than just modifying the disk. This would lessen the strain on
the resources of the hypervisor, both disk I/O and CPU usage, and allow moves to
obey the resource constraints placed on the instance being moved.
Lock reduction
==============
An instance move as executed by the move-instance tool consists of several
preparatory RAPI calls, leading up to two long-lasting opcodes: OpCreateInstance
and OpBackupExport. While OpBackupExport locks only the instance, the locks of
OpCreateInstance require more attention.
When executed, this opcode attempts to lock all nodes on which the instance may
be created and obtain shared locks on the groups they belong to. In the case
that an IAllocator is used, this means all nodes must be locked. Any operation
that requires a node lock to be present can delay the move operation, and there
is no shortage of these.
The concept of opportunistic locking has been introduced to remedy exactly this
situation, allowing the IAllocator to lock as many nodes as possible. Depending
whether the allocation can be made on these nodes, the operation either proceeds
as expected, or fails noting that it is temporarily infeasible. The failure case
would change the semantics of the move-instance tool, which is expected to fail
only if the move is impossible. To yield the benefits of opportunistic locking
yet satisfy this constraint, the move-instance tool can be extended with the
``--opportunistic-tries`` and ``--opportunistic-try-delay`` options. A number of
opportunistic instance creations are attempted, with a delay between attempts.
The delay is slightly altered every time to avoid timing issues. Should all
attempts fail, a normal instance creation is requested, which blocks until all
the locks can be acquired.
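The retry logic could look roughly like the following; the creation callable,
the exception handling and the jitter applied to the delay are placeholders
rather than the final implementation:

.. code-block:: python

  import random
  import time

  def create_with_retries(create_fn, tries, delay):
    """Attempt opportunistic creations, falling back to a blocking one.

    ``create_fn(opportunistic)`` is assumed to raise an exception when an
    opportunistic attempt cannot acquire enough locks.
    """
    for _ in range(tries):
      try:
        return create_fn(opportunistic=True)
      except Exception:  # a real implementation would catch a specific error
        # Alter the delay slightly so that retries do not stay synchronized
        # with whatever is holding the locks.
        time.sleep(delay * random.uniform(0.8, 1.2))
    return create_fn(opportunistic=False)
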
While it may seem excessive to grab so many node locks, the early release
mechanism is used to make the situation less dire, releasing all nodes that were
not chosen as candidates for allocation. This is taken to the extreme as all the
locks acquired are released prior to the start of the transfer, barring the
newly-acquired lock over the new instance. This works because all operations
that alter the node in a way which could affect the transfer:
* are prevented by the instance lock or instance presence, e.g. gnt-node remove,
gnt-node evacuate,
* do not interrupt the transfer, e.g. a PV on the node can be set as
unallocatable, and the transfer still proceeds as expected,
* do not care, e.g. a gnt-node powercycle explicitly ignores all locks.
This invariant should be kept in mind, and perhaps verified through tests.
All in all, there is very little space to reduce the number of locks used, and
the only improvement that can be made is introducing opportunistic locking as an
option of move-instance.
Introduction of changes
=======================
All the changes noted will be implemented in Ganeti 2.12, in the way described
in the previous chapters. They will be implemented as separate changes, first
the lock reduction, then the instance zeroing, then the compression
improvements, and finally the encryption changes.
ganeti-2.15.2/doc/design-multi-reloc.rst 0000644 0000000 0000000 00000011701 12634264163 0020057 0 ustar 00root root 0000000 0000000 ====================================
Moving instances across node groups
====================================
This design document explains the changes needed in Ganeti to perform
instance moves across node groups. Reader familiarity with the following
existing documents is advised:
- :doc:`Current IAllocator specification <iallocator>`
- :doc:`Shared storage model in 2.3+ <design-shared-storage>`
Motivation and design proposal
==================================
At the moment, moving instances away from their primary or secondary
nodes with the ``relocate`` and ``multi-evacuate`` IAllocator calls
restricts target nodes to those on the same node group. This ensures a
mobility domain is never crossed, and allows normal operation of each
node group to be confined within itself.
It is desirable, however, to have a way of moving instances across node
groups so that, for example, it is possible to move a set of instances
to another group for policy reasons, or completely empty a given group
to perform maintenance operations.
To implement this, we propose the addition of new IAllocator calls to
compute inter-group instance moves and group-aware node evacuation,
taking into account mobility domains as appropriate. The interface
proposed below should be enough to cover the use cases mentioned above.
With the implementation of this design proposal, the previous
``multi-evacuate`` mode will be deprecated.
.. _multi-reloc-detailed-design:
Detailed design
===============
All requests honor the groups' ``alloc_policy`` attribute.
Changing instance's groups
--------------------------
Takes a list of instances and a list of node group UUIDs; the instances
will be moved away from their current group, to any of the groups in the
target list. All instances need to have their primary node in the same
group, which may not be a target group. If the target group list is
empty, the request is simply "change group" and the instances are placed
in any group but their original one.
Node evacuation
---------------
Evacuates instances off their primary nodes. The evacuation mode
can be given as ``primary-only``, ``secondary-only`` or
``all``. The call is given a list of instances whose primary nodes need
to be in the same node group. The returned nodes need to be in the same
group as the original primary node.
.. _multi-reloc-result:
Result
------
In all storage models, an inter-group move can be modeled as a sequence
of **replace secondary**, **migration** and **failover** operations
(when shared storage is used, they will all be failover or migration
operations within the corresponding mobility domain).
The result of the operations described above must contain two lists of
instances and a list of jobs (each of which is a list of serialized
opcodes) to actually execute the operation. :doc:`Job dependencies
<design-chained-jobs>` can be used to force jobs to run in a certain
order while still making use of parallelism.
The two lists of instances describe which instances could be
moved/migrated and which couldn't for some reason ("unsuccessful"). The
union of the instances in the two lists must be equal to the set of
instances given in the original request. The successful list of
instances contains elements as follows::
(instance name, target group name, [chosen node names])
The choice of names is simply for readability reasons (for example,
Ganeti could log the computed solution in the job information) and for
being able to check (manually) for consistency that the generated
opcodes match the intended target groups/nodes. Note that for the
node-evacuate operation, the group is not changed, but it should still
be returned as such (as it's easier to have the same return type for
both operations).
The unsuccessful list of instances contains elements as follows::
(instance name, explanation)
where ``explanation`` is a string describing why the plugin was not able
to relocate the instance.
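For example, with invented names, the successful list could contain::

  [("inst1.example.com", "group2", ["node3.example.com", "node4.example.com"]),
   ("inst2.example.com", "group2", ["node4.example.com"])]

while the unsuccessful list could be::

  [("inst8.example.com", "not enough memory on any node in the target groups")]
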
The client is given a list of job IDs (see the :doc:`design for
LU-generated jobs <design-lu-generated-jobs>`) which it can watch.
Failures should be reported to the user.
.. highlight:: python
Example job list::
[
# First job
[
{ "OP_ID": "OP_INSTANCE_MIGRATE",
"instance_name": "inst1.example.com",
},
{ "OP_ID": "OP_INSTANCE_MIGRATE",
"instance_name": "inst2.example.com",
},
],
# Second job
[
{ "OP_ID": "OP_INSTANCE_REPLACE_DISKS",
"depends": [
[-1, ["success"]],
],
"instance_name": "inst2.example.com",
"mode": "replace_new_secondary",
"remote_node": "node4.example.com",
},
],
# Third job
[
{ "OP_ID": "OP_INSTANCE_FAILOVER",
"depends": [
[-2, []],
],
"instance_name": "inst8.example.com",
},
],
]
Accepted opcodes:
- ``OP_INSTANCE_FAILOVER``
- ``OP_INSTANCE_MIGRATE``
- ``OP_INSTANCE_REPLACE_DISKS``
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-multi-storage-htools.rst 0000644 0000000 0000000 00000014230 12634264163 0021725 0 ustar 00root root 0000000 0000000 ==================================================
HTools support for multiple storage units per node
==================================================
.. contents:: :depth: 4
This design document describes changes to hbal and related components (first
and foremost LUXI) that will allow them to handle nodes that can't be considered
monolithic in regard to disk layout, for example because they have multiple
different storage units available.
Current state and shortcomings
==============================
Currently the htools assume that there is one storage unit per node and that it can
be arbitrarily split among instances. This leads to problems in clusters
where multiple storage units are present: There might be 10GB DRBD and 10GB
plain storage available on a node, for a total of 20GB. If an instance that
uses 15GB of a single type of storage is requested, it can't actually fit on
the node, but the current implementation of hail doesn't notice this.
This behaviour is clearly wrong, but the problem doesn't arise often in current
setup, due to the fact that instances currently only have a single
storage type and that users typically use node groups to differentiate between
different node storage layouts.
For the node show action, RAPI only returns
* ``dfree``: The total amount of free disk space
* ``dtotal``: The total amount of disk space
which is insufficient for the same reasons.
Proposed changes
================
Definitions
-----------
* All disks have exactly one *desired storage unit*, which determines where and
how the disk can be stored. If the disk is transferred, the desired storage
unit remains unchanged. The desired storage unit includes specifics like the
volume group in the case of LVM based storage.
* A *storage unit* is a specific storage location on a specific node. Storage
units have exactly one desired storage unit they can contain. A storage unit
further has an identifier (containing the storage type, a key and possibly
parameters), a total capacity, and a free capacity. A node cannot
contain multiple storage units of the same desired storage unit.
* For the purposes of this document a *disk* has a desired storage unit and a size.
* A *disk can be moved* to a node, if there is at least one storage unit on
that node which can contain the desired storage unit of the disk and if the
free capacity is at least the size of the disk.
* An *instance can be moved* to a node, if all its disks can be moved there
one-by-one.
LUXI and IAllocator protocol extension
--------------------------------------
The LUXI and IAllocator protocols are extended to include in the ``node``:
* ``storage``: a list of objects (storage units) with
#. Storage unit, containing in order:
#. storage type
#. storage key (e.g. volume group name)
#. extra parameters (e.g. flag for exclusive storage) as a list.
#. Amount free in MiB
#. Amount total in MiB
.. code-block:: javascript
{
"storage": [
{ "sunit": ["drbd8", "xenvg", []]
, "free": 2000,
, "total": 4000
},
{ "sunit": ["file", "/path/to/storage1", []]
, "free": 5000,
, "total": 10000
},
{ "sunit": ["file", "/path/to/storage2", []]
, "free": 1000,
, "total": 20000
},
{ "sunit": ["lvm-vg", "xenssdvg", [false]]
, "free": 1024,
, "total": 1024
}
]
}
is a node with an LVM volume group mirrored over DRBD, two file storage
directories, one half full, one mostly full, and a non-mirrored volume group.
The storage type ``drbd8`` needs to be added in order to differentiate between
mirrored storage and non-mirrored storage.
The storage key signals the volume group used and the storage unit takes no
additional parameters.
Text protocol extension
-----------------------
The same field is optionally present in the HTools text protocol:
* a new "storage" column is added to the node section, which is a semicolon
separated list of comma separated fields in the order
#. ``free``
#. ``total``
#. ``sunit``, which in itself contains
#. the storage type
#. the storage key
#. extra arguments
For example:
2000,4000,drbd,xenvg;5000,10000,file,/path/to/storage1;1000,20000;
[...]
Interpretation
--------------
``hbal`` and ``hail`` will use this information only if available; if the data
file doesn't contain the ``storage`` field, the old algorithm is used.
If the node information contains the ``storage`` field, hbal and hail will
assume that only the space compatible with the disk's requirements is
available. For an instance to fit a node, all its disks need to fit there
separately. For a disk to fit a node, a storage unit of the type of
the disk needs to have enough free space to contain it. The total free storage
is not taken into consideration.
Ignoring the old information will in theory introduce a backwards
incompatibility: If the total free storage is smaller than the sum of the
free storage reported in the ``storage`` field a previously illegal move will
become legal.
Balancing
---------
In order to determine a storage location for an instance, we collect analogous
metrics to the current total node free space metric -- namely the standard deviation
statistic of the free space per storage unit.
The *standard deviation metric* of a desired storage unit is the sample standard
deviation of the percentage of free space of the storage units compatible with it.
The *full storage metric* is an average of the standard deviation metrics of the
desired storage units.
This is backwards compatible insofar as:
#. For a single storage unit per node it will have the same value.
#. The weight of the storage versus the other metrics remains unchanged.
Further this retains the property that scarce resources with low total will
tend to have bigger impact on the metric than those with large totals, because
in the latter case the relative differences will not make for a large standard
deviation.
Ignoring nodes that do not contain the desired storage unit additionally
boosts the importance of the scarce desired storage units, because having more
storage units of a desired storage unit will tend to make the standard
deviation metric smaller.
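A sketch of the metric computation follows; Python is used only for
illustration, as the actual implementation belongs in the Haskell htools
codebase, and the node layout mirrors the LUXI extension above:

.. code-block:: python

  def std_dev(values):
    """Sample standard deviation, 0 for fewer than two values."""
    if len(values) < 2:
      return 0.0
    mean = sum(values) / float(len(values))
    return (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5

  def full_storage_metric(nodes):
    """Average of the per-desired-storage-unit standard deviation metrics."""
    per_sunit = {}
    for node in nodes:
      for unit in node["storage"]:
        # Key by storage type and key; extra parameters are ignored here.
        key = tuple(unit["sunit"][:2])
        free_pct = 100.0 * unit["free"] / unit["total"]
        per_sunit.setdefault(key, []).append(free_pct)
    metrics = [std_dev(percentages) for percentages in per_sunit.values()]
    return sum(metrics) / len(metrics) if metrics else 0.0
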
ganeti-2.15.2/doc/design-multi-version-tests.rst 0000644 0000000 0000000 00000016161 12634264163 0021605 0 ustar 00root root 0000000 0000000 ===================
Multi-version tests
===================
.. contents:: :depth: 4
This is a design document describing how tests which use multiple
versions of Ganeti can be introduced into the current build
infrastructure.
Desired improvements
====================
The testing of Ganeti is currently done by using two different
approaches - unit tests and QA. While the former are useful for ensuring
that the individual parts of the system work as expected, most errors
are discovered only when all the components of Ganeti interact during
QA.
However useful otherwise, until now the QA has failed to provide support
for testing upgrades and version compatibility as it was limited to
using only one version of Ganeti. While these can be tested for every
release manually, a systematic approach is preferred and none can exist
with this restriction in place. To lift it, the buildbot scripts and QA
utilities must be extended to allow a way of specifying and using
diverse multi-version checks.
Required use cases
==================
There are two classes of multi-version tests that are interesting in
Ganeti, and this chapter provides an example from each to highlight what
should be accounted for in the design.
Compatibility tests
-------------------
One interface Ganeti exposes to clients interested in interacting with
it is the RAPI. Its stability has always been a design principle
followed during implementation, but whether it held true in practice was
not asserted through tests.
An automatic test of RAPI compatibility would have to take a diverse set
of RAPI requests and perform them on two clusters of different versions,
one of which would be the reference version. If the clusters had been
identically configured, all of the commands successfully executed on the
reference version should succeed on the newer version as well.
To achieve this, two versions of Ganeti can be run separately on a
cleanly set up cluster. With no guarantee that the versions can coexist,
the deployment of these has to be separate. A proxy placed between the
client and Ganeti records all the requests and responses. Using this
data, a testing utility can decide if the newer version is compatible or
not, and provide additional information to assist with debugging.
Upgrade / downgrade tests
-------------------------
An upgrade / downgrade test serves to examine whether the state of the
cluster is unchanged after its configuration has been upgraded or
downgraded to another version of Ganeti.
The test works with two consecutive versions of Ganeti, both installed
on the same machine. It examines whether the configuration data and
instances survive the downgrade and upgrade procedures. This is done by
creating a cluster with the newer version, downgrading it to the older
one, and upgrading it to the newer one again. After every step, the
integrity of the cluster is checked by running various operations and
ensuring everything still works.
Design and implementation
=========================
Although the previous examples have not been selected to show use cases
as diverse as possible, they still show a number of dissimilarities:
- Parallel installation vs sequential deployments
- Comparing with reference version vs comparing consecutive versions
- Examining result dumps vs trying a sequence of operations
With the first two real use cases demonstrating such diversity, it does
not make sense to design multi-version test classes. Instead, the
programmability of buildbot's configuration files can be leveraged to
implement each test as a separate builder with a custom sequence of
steps. The individual steps such as checking out a given or previous
version, or installing and removing Ganeti, will be provided as utility
functions for any test writer to use.
Current state
-------------
An upgrade / downgrade test is a part of the QA suite as of commit
aa104b5e. The test and the corresponding buildbot changes are a very
good first step, both by showing that multi-version tests can be done,
and by providing utilities needed for builds of multiple branches.
Previously, the same folder was used as the base directory of any build,
and now a directory structure more accommodating to multiple builds is
in place.
The builder running the test has one flaw - regardless of the branch
submitted, it compares versions 2.10 and 2.11 (current master). This
behaviour is different from any of the other builders, which may
restrict the branches a test can be performed on, but do not
differentiate between them otherwise. While additional builders for
different version pairs may be added, this is not a good long-term
solution.
The test can be improved by making it compare the current and the
previous version. As the buildbot has no notion of what a previous
version is, additional utilities to handle this logic will have to be
introduced.
Planned changes
---------------
The upgrade / downgrade test should be generalized to work for any
version which can be downgraded from and upgraded to automatically,
meaning versions from 2.11 onwards. This will be made challenging by
the fact that the previous version has to be checked out by reading the
version of the currently checked out code, identifying the previous
version, and then making yet another checkout.
The major and minor version can be read from a Ganeti repository in
multiple ways. The two are present as constants defined in source files,
but due to refactorings shifting constants from the Python to the
Haskell side, their position varies across versions. A more reliable way
of fetching them is by examining the news file, as it obeys strict
formatting restrictions.
With the version found, a script that acts as a previous version
lookup table can be invoked. This script can be constructed dynamically
upon buildbot startup, and specified as a build step. The checkout
following it proceeds as expected.
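A hypothetical sketch of such a lookup is given below; both the NEWS parsing
and the table contents are assumptions for illustration, not the actual
buildbot code::

  import re

  def current_version(news_path="NEWS"):
      # Every release section in the NEWS file starts with a line such as
      # "Version 2.11.0", so the first match yields the current version.
      with open(news_path) as news:
          match = re.search(r"^Version (\d+)\.(\d+)", news.read(), re.MULTILINE)
      return int(match.group(1)), int(match.group(2))

  # Table generated dynamically upon buildbot startup (illustrative values).
  PREVIOUS_VERSION = {(2, 12): (2, 11), (2, 11): (2, 10)}

  def previous_version(version):
      return PREVIOUS_VERSION[version]
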
The RAPI compatibility test should be added as a separate builder
afterwards. As the test requires additional comparison and proxy logic
to be used, it will be enabled only on 2.11 onwards, comparing the
versions to 2.6 - the reference version for the RAPI. Details on the
design of this test will be added in a separate document.
Potential issues
================
While there are many advantages to having a single builder representing
a multi-version test, working on every branch, there is at least one
disadvantage: the need to define a base or reference version, which is
the only version that can be used to trigger the test, and the only one
on which code changes can be tried.
If an error is detected while running a test, and the issue lies with
a version other than the one used to invoke the test, the fix would
have to make it into the repository before the test could be tried
again.
For simple tests, the issue might be mitigated by running them locally.
However, the multi-version tests are more likely to be complicated than
not, and it could be difficult to reproduce a test by hand.
The situation can be made simpler by requiring that any multi-version
test can use only versions lower than the reference version. As errors
are more likely to be found in new rather than old code, this would at
least reduce the number of troublesome cases.
ganeti-2.15.2/doc/design-network.rst
==================
Network management
==================
.. contents:: :depth: 4
This is a design document detailing the implementation of network resource
management in Ganeti.
Current state and shortcomings
==============================
Currently Ganeti supports two configuration modes for instance NICs:
routed and bridged mode. The ``ip`` NIC parameter, which is mandatory
for routed NICs and optional for bridged ones, holds the given NIC's IP
address and may be filled either manually, or via a DNS lookup for the
instance's hostname.
This approach presents some shortcomings:
a) It relies on external systems to perform network resource
management. Although large organizations may already have IP pool
management software in place, this is not usually the case with
stand-alone deployments. For smaller installations it makes sense to
allocate a pool of IP addresses to Ganeti and let it transparently
assign these IPs to instances as appropriate.
b) The NIC network information is incomplete, lacking netmask and
gateway. Operating system providers could for example use the
complete network information to fully configure an instance's
network parameters upon its creation.
Furthermore, having full network configuration information would
enable Ganeti nodes to become more self-contained and be able to
infer system configuration (e.g. /etc/network/interfaces content)
from Ganeti configuration. This should make configuration of
newly-added nodes a lot easier and less dependent on external
tools/procedures.
c) Instance placement must explicitly take network availability in
different node groups into account; the same ``link`` is implicitly
expected to connect to the same network across the whole cluster,
which may not always be the case with large clusters with multiple
node groups.
Proposed changes
----------------
In order to deal with the above shortcomings, we propose to extend
Ganeti with high-level network management logic, which consists of a new
NIC slot called ``network``, a new ``Network`` configuration object
(cluster level) and logic to perform IP address pool management, i.e.
maintain a set of available and occupied IP addresses.
Configuration changes
+++++++++++++++++++++
We propose the introduction of a new high-level Network object,
containing (at least) the following data:
- Symbolic name
- UUID
- Network in CIDR notation (IPv4 + IPv6)
- Default gateway, if one exists (IPv4 + IPv6)
- IP pool management data (reservations)
- Default NIC connectivity mode (bridged, routed). This is the
functional equivalent of the current NIC ``mode``.
- Default host interface (e.g. br0). This is the functional equivalent
of the current NIC ``link``.
- Tags
Each network will be connected to any number of node groups. During the
connection of a network to a nodegroup, we define the corresponding
connectivity mode (bridged or routed) and the host interface (br100 or
routing_table_200). This is achieved by adding a ``networks`` slot to
the NodeGroup object and using the networks' UUIDs as keys. The value
for each key is a dictionary containing the network's ``mode`` and
``link`` (netparams). Every NIC assigned to the network will eventually
inherit the network's netparams, as its nicparams.
IP pool management
++++++++++++++++++
A new helper library is introduced, wrapping around Network objects to
give IP pool management capabilities. A network's pool is defined by two
bitfields, each with a length equal to the network size:
``reservations``
This field holds all IP addresses reserved by Ganeti instances.
``external reservations``
This field holds all IP addresses that are manually reserved by the
administrator (external gateway, IPs of external servers, etc) or
automatically by ganeti (the network/broadcast addresses,
Cluster IPs (node addresses + cluster master)). These IPs are excluded
from the IP pool and cannot be assigned automatically by ganeti to
instances (via ip=pool).
The bitfields are implemented using the python-bitarray package for
space efficiency and their binary value stored base64-encoded for JSON
compatibility. This approach gives relatively compact representations
even for large IPv4 networks (e.g. /20).
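For illustration, the encoding could look roughly like this (a minimal sketch
using the python-bitarray package; the helper names are invented)::

  import base64
  from bitarray import bitarray

  def new_pool(network_size):
      # One bit per address in the network; 0 means free, 1 means reserved.
      bits = bitarray(network_size)
      bits.setall(False)
      return bits

  def serialize_pool(bits):
      # The binary value is stored base64-encoded for JSON compatibility.
      return base64.b64encode(bits.tobytes()).decode("ascii")

  def deserialize_pool(data, network_size):
      bits = bitarray()
      bits.frombytes(base64.b64decode(data))
      return bits[:network_size]  # drop the padding added by tobytes()
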
Cluster IP addresses (node + master IPs) are reserved automatically
as external if the cluster's data network itself is placed under
pool management.
Helper ConfigWriter methods provide free IP address generation and
reservation, using a TemporaryReservationManager.
It should be noted that IP pool management is performed only for IPv4
networks, as they are expected to be densely populated. IPv6 networks
can use different approaches, e.g. sequential address assignment or
EUI-64 addresses.
New NIC parameter: network
++++++++++++++++++++++++++
In order to be able to use the new network facility while maintaining
compatibility with the current networking model, a new NIC parameter is
introduced, called ``network`` to reflect the fact that the given NIC
belongs to the given network and its configuration is managed by Ganeti
itself. To keep backwards compatibility, existing code is executed if
the ``network`` value is 'none' or omitted during NIC creation. If we
want our NIC to be assigned to a network, then only the ip (optional)
and the network parameters should be passed. Mode and link are inherited
from the network-nodegroup mapping configuration (netparams). This
provides the desired abstraction between the VM's network and the
node-specific underlying infrastructure.
We also introduce a new ``ip`` address value, ``constants.NIC_IP_POOL``,
that specifies that a given NIC's IP address should be obtained using
the first available IP address inside the pool of the specified network.
(reservations OR external_reservations). This value is only valid
for NICs belonging to a network. A NIC's IP address can also be
specified manually, as long as it is contained in the network the NIC
is connected to. In case this IP is externally reserved, Ganeti will produce
an error which the user can override if explicitly requested. Of course
this IP will be reserved and cannot be assigned to another
instance.
Hooks
+++++
Introduce new hooks concerning network operations:
``OP_NETWORK_ADD``
Add a network to Ganeti
:directory: network-add
:pre-execution: master node
:post-execution: master node
``OP_NETWORK_REMOVE``
Remove a network from Ganeti
:directory: network-remove
:pre-execution: master node
:post-execution: master node
``OP_NETWORK_SET_PARAMS``
Modify a network
:directory: network-modify
:pre-execution: master node
:post-execution: master node
For connect/disconnect operations use existing:
``OP_GROUP_SET_PARAMS``
Modify a nodegroup
:directory: group-modify
:pre-execution: master node
:post-execution: master node
Hook variables
^^^^^^^^^^^^^^
During instance related operations:
``INSTANCE_NICn_NETWORK``
The friendly name of the network
During network related operations:
``NETWORK_NAME``
The friendly name of the network
``NETWORK_SUBNET``
The ip range of the network
``NETWORK_GATEWAY``
The gateway of the network
During nodegroup related operations:
``GROUP_NETWORK``
The friendly name of the network
``GROUP_NETWORK_MODE``
The mode (bridged or routed) of the netparams
``GROUP_NETWORK_LINK``
The link of the netparams
Backend changes
+++++++++++++++
To keep the hypervisor-visible changes to a minimum, and maintain
compatibility with the existing network configuration scripts, the
instance's hypervisor configuration will have host-level mode and link
replaced by the *connectivity mode* and *host interface* (netparams) of
the given network on the current node group.
Network configuration scripts detect if a NIC is assigned to a Network
by the presence of the new environment variable:
Network configuration script variables
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``NETWORK``
The friendly name of the network
Conflicting IPs
+++++++++++++++
To ensure IP uniqueness inside a nodegroup, we introduce the term
``conflicting ips``. Conflicting IPs occur: (a) when creating a
networkless NIC with IP contained in a network already connected to the
instance's nodegroup (b) when connecting/disconnecting a network
to/from a nodegroup and at the same time instances with IPs inside the
network's range still exist. Conflicting IPs produce prereq errors.
Handling of conflicting IP with --force option:
For case (a) reserve the IP and assign the NIC to the Network.
For case (b) during connect same as (a), during disconnect release IP and
reset NIC's network parameter to None
Userland interface
++++++++++++++++++
A new client script is introduced, ``gnt-network``, which handles
network-related configuration in Ganeti.
Network addition/deletion
^^^^^^^^^^^^^^^^^^^^^^^^^
::
gnt-network add --network=192.168.100.0/28 --gateway=192.168.100.1 \
--network6=2001:db8:2ffc::/64 --gateway6=2001:db8:2ffc::1 \
--add-reserved-ips=192.168.100.10,192.168.100.11 net100
(Checks for an already existing name and valid IP values)
gnt-network remove network_name
(Checks if not connected to any nodegroup)
Network modification
^^^^^^^^^^^^^^^^^^^^
::
gnt-network modify --gateway=192.168.100.5 net100
(Changes the gateway only if ip is available)
gnt-network modify --add-reserved-ips=192.168.100.11 net100
(Adds externally reserved ip)
gnt-network modify --remove-reserved-ips=192.168.100.11 net100
(Removes externally reserved ip)
Assignment to node groups
^^^^^^^^^^^^^^^^^^^^^^^^^
::
gnt-network connect net100 nodegroup1 bridged br100
(Checks for an existing bridge among the nodegroup's nodes)
gnt-network connect net100 nodegroup2 routed rt_table
(Checks for conflicting IPs)
gnt-network disconnect net101 nodegroup1
(Checks for conflicting IPs)
Network listing
^^^^^^^^^^^^^^^
::
gnt-network list
Network Subnet Gateway NodeGroups GroupList
net100 192.168.100.0/28 192.168.100.1 1 default(bridged, br100)
net101 192.168.101.0/28 192.168.101.1 1 default(routed, rt_tab)
Network information
^^^^^^^^^^^^^^^^^^^
::
gnt-network info testnet1
Network name: testnet1
subnet: 192.168.100.0/28
gateway: 192.168.100.1
size: 16
free: 10 (62.50%)
usage map:
0 XXXXX..........X 63
(X) used (.) free
externally reserved IPs:
192.168.100.0, 192.168.100.1, 192.168.100.15
connected to node groups:
default(bridged, br100)
used by 3 instances:
test1 : 0:192.168.100.4
test2 : 0:192.168.100.2
test3 : 0:192.168.100.3
IAllocator changes
++++++++++++++++++
The IAllocator protocol can be made network-aware, i.e. also consider
network availability for node group selection. Networks, as well as
future shared storage pools, can be seen as constraints used to rule out
the placement on certain node groups.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-network2.rst
============================
Network Management (revised)
============================
.. contents:: :depth: 4
This is a design document detailing how to extend the existing network
management and make it more flexible and able to deal with more generic
use cases.
Current state and shortcomings
------------------------------
Currently in Ganeti, networks are tightly connected with IP pools,
since creation of a network implies the existence of one subnet
and the corresponding IP pool. This design does not allow common
scenarios like:
- L2 only networks
- IPv6 only networks
- Networks without an IP pool
- Networks with an IPv6 pool
- Networks with multiple IP pools (alternative to externally reserving
IPs)
Additionally one cannot have multiple IP pools inside one network.
Finally, from the instance perspective, a NIC cannot get more than one
IPs (v4 and v6).
Proposed changes
----------------
In order to deal with the above shortcomings, we propose to extend
the existing networks in Ganeti and support:
a) Networks with multiple subnets
b) Subnets with multiple IP pools
c) NICs with multiple IPs from various subnets of a single network
These changes bring up some design and implementation issues that we
discuss in the following sections.
Semantics
++++++++++
Quoting the initial network management design doc "an IP pool consists
of two bitarrays. Specifically the ``reservations`` bitarray which holds
all IP addresses reserved by Ganeti instances and the ``external
reservations`` bitarray with all IPs that are excluded from the IP pool
and cannot be assigned automatically by Ganeti to instances (via
ip=pool)".
Without violating those semantics, here, we clarify the following
definitions.
**network**: A cluster level taggable configuration object with a
user-provided name (e.g. network1, network2), UUID and MAC prefix.
**L2**: The `mode` and `link` with which we connect a network to a
nodegroup. A NIC attached to a network will inherit this info, just like
connecting an Ethernet cable to a physical NIC. In this sense we only
have one L2 info per NIC.
**L3**: A CIDR and a gateway related to the network. Since a NIC can
have multiple IPs on the same cable each network can have multiple L3
info with the restriction that they do not overlap with each other.
The gateway is optional (just like with current implementation). No
gateway can be used for private networks that do not have a default
route.
**subnet**: A subnet is the above L3 info plus some additional information
(see below).
**ip**: A valid IP should reside in a network's subnet, and should not
be used by more than one instance. An IP can be either obtained dynamically
from a pool or requested explicitly from a subnet (or a pool).
**range**: Sequential IPs inside one subnet calculated either from the
first IP and a size (e.g. start=192.0.2.10, size=10) or the first IP and
the last IP (e.g. start=192.0.2.10, end=192.0.2.19). A single IP can
also be thought of as an IP range with size=1 (see configuration
changes).
**reservations**: All IPs that are used by instances in the cluster at
any time.
**external reservations**: All IPs that are supposed to be reserved
by the admin for either some external component or specific instances.
If one instance requests an external IP explicitly (ip=192.0.2.100),
Ganeti will allow the operation only if ``--force`` is given. Still, the
admin can externally reserve an IP that is already in use by an
instance, as happens now. This helps to reserve an IP for future use and
at the same time prevent any possible race between the instance that
releases this IP and another that tries to retrieve it.
**pool**: A (range, reservations, name) tuple from which instances can
dynamically obtain an IP. Reservations is a bitarray with
length the size of the range, and is needed so that we know which IPs
are available at any time without querying all instances. The use of
name is explained below. A subnet can have multiple pools.
Split L2 from L3
++++++++++++++++
Currently networks in Ganeti do not separate L2 from L3. This means
that one cannot use L2 only networks. The reason is because the CIDR
(passed currently with the ``--network`` option) and the derived IP pool
are mandatory. This design makes L3 info optional. This way we can have
an L2 only network just by connecting a Ganeti network to a nodegroup
with the desired `mode` and `link`. Then one could add one or more subnets
to the existing network.
Multiple Subnets per Network
++++++++++++++++++++++++++++
Currently the IPv4 CIDR is mandatory for a network. Also a network can
obtain at most one IPv4 CIDR and one IPv6 CIDR. These restrictions will
be lifted.
This design doc introduces support for multiple subnets per network. The
L3 info will be moved inside the subnet. A subnet will have a `name` and
a `uuid` just like NIC and Disk config objects. Additionally it will contain
the `dhcp` flag which is explained below, and the `pools` and `external`
fields which are mentioned in the next section. Only the `cidr` will be
mandatory.
Any subnet related actions will be done via the new ``--subnet`` option.
Its syntax will be similar to ``--net``.
The network's subnets must not overlap with each other. Logic will
validate any operations related to reserving/releasing of IPs and check
whether a requested IP is included inside one of the network's subnets.
Just like currently, the L3 info will be exported to NIC configuration
hooks and scripts as environment variables. The example below adds
subnets to a network:
::
gnt-network modify --subnet add:cidr=10.0.0.0/24,gateway=10.0.0.1,dhcp=true net1
gnt-network modify --subnet add:cidr=2001::/64,gateway=2001::1,dhcp=true net1
To remove a subnet from a network one should use:
::
gnt-network modify --subnet some-ident:remove network1
where some-ident can be either a CIDR, a name or a UUID. Ganeti will
allow this operation only if no instances use IPs from this subnet.
Since DHCP is allowed only for a single CIDR on the same cable, the
subnet must have a `dhcp` flag. Logic must not allow more than one
subnet of the same version (4 or 6) in the same network to have DHCP enabled.
To modify a subnet's name or the dhcp flag one could use:
::
gnt-network modify --subnet some-ident:modify,dhcp=false,name=foo network1
This would search for a registered subnet that matches the identifier,
disable DHCP on it and change its name.
The ``dhcp`` parameter is used only for validation purposes and does not
make Ganeti start a DHCP service. It will just be exported to
external scripts (ifup and hooks) and handled accordingly.
Changing the CIDR or the gateway of a subnet should also be supported.
::
gnt-network modify --subnet some-ident:modify,cidr=192.0.2.0/22 net1
gnt-network modify --subnet some-ident:modify,cidr=192.0.2.32/28 net1
gnt-network modify --subnet some-ident:modify,gateway=192.0.2.40 net1
Before expanding a subnet, logic should check for overlapping
subnets. Shrinking the subnet should be allowed only if the ranges
that are about to be trimmed are not included either in pool
reservations or external ranges.
Multiple IP pools per Subnet
++++++++++++++++++++++++++++
Currently IP pools are automatically created during network creation and
include the whole subnet. Some IPs can be excluded from the pool by
passing them explicitly with ``--add-reserved-ips`` option.
Still for IPv6 subnets or even big IPv4 ones this might be insufficient.
It is impossible to have two bitarrays for a /64 prefix. Even for IPv4
networks a /20 subnet currently requires 8K long bitarrays. And the
second 4K is practically useless since the external reservations are way
less than the actual reservations.
This design extracts IP pool management from the network logic, and pools
will become optional. Currently the pool is created based on the
network's CIDR. With multiple subnets per network, we should be able to
create and add IP pools to a network (and eventually to the
corresponding subnet). Each pool will have an optional user friendly
`name` so that the end user can refer to it (see instance related
operations).
The user will be able to obtain dynamically an IP only if we have
already defined a pool for a network's subnet. One would use ``ip=pool``
for the first available IP of the first available pool, or
``ip=some-pool-name`` for the first available IP of a specific pool.
Any pool related actions will be done via the new ``--pool`` option.
In order to add a pool a relevant subnet should pre-exist. Overlapping
pools won't be allowed. For example:
::
gnt-network modify --pool add:192.0.2.10-192.0.2.100,name=pool1 net1
gnt-network modify --pool add:10.0.0.7-10.0.0.20 net1
gnt-network modify --pool add:10.0.0.100 net1
will first parse and find the ranges. Then for each range, Ganeti will
try to find a matching subnet meaning that a pool must be a subrange of
the subnet. If found, the range with empty reservations will be appended
to the list of the subnet's pools. Moreover, logic must be added to
reserve the IPs that are currently in use by instances of this network.
Adding a pool can be easier if we associate it directly with a subnet.
For example, one could use the following shortcuts:
::
gnt-network modify --subnet add:cidr=10.0.0.0/27,pool net1
gnt-network modify --pool add:subnet=some-ident
gnt-network modify --pool add:10.0.0.0/27 net1
During pool removal, logic should be added to split pools if the given
ranges overlap existing ones. For example:
::
gnt-network modify --pool remove:192.0.2.20-192.0.2.50 net1
will split the pool previously added (10-100) into two new ones:
10-19 and 51-100. The corresponding bitarrays will be trimmed
accordingly. The name will be preserved.
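A simplified sketch of the splitting logic, operating on integer-converted
addresses and assuming the removed range overlaps the pool (the helper name is
invented)::

  def split_pool(pool_start, pool_end, rm_start, rm_end):
      # Remove [rm_start, rm_end] from the pool [pool_start, pool_end] and
      # return the remaining sub-ranges; the trimmed bitarrays and the pool
      # name would be carried over to the resulting pools accordingly.
      remaining = []
      if rm_start > pool_start:
          remaining.append((pool_start, rm_start - 1))
      if rm_end < pool_end:
          remaining.append((rm_end + 1, pool_end))
      return remaining
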
The same things apply to external reservations. Just like now,
modifications will take place via the ``--add|remove-reserved-ips``
option. Logic must be added to support IP ranges.
::
gnt-network modify --add-reserved-ips 192.0.2.20-192.0.2.50 net1
Based on the aforementioned we propose the following changes:
1) Change the IP pool representation in config data.
Existing `reservations` and `external_reservations` bitarrays will be
removed. Instead, for each subnet we will have:
* `pools`: List of (IP range, reservations bitarray) tuples.
* `external`: List of IP ranges
For external ranges the reservations bitarray is not needed
since this will be all 1's.
A configuration example could be::
net1 {
subnets [
uuid1 {
name: subnet1
cidr: 192.0.2.0/24
pools: [
{range:Range(192.0.2.10, 192.0.2.15), reservations: 00000, name:pool1}
]
reserved: [192.0.2.15]
}
uuid2 {
name: subnet2
cidr: 10.0.0.0/24
pools: [
{range:10.0.0.8/29, reservations: 00000000, name:pool3}
{range:10.0.0.40-10.0.0.45, reservations: 000000, name:pool3}
]
reserved: [Range(10.0.0.8, 10.0.0.15), 10.2.4.5]
}
]
}
Range(start, end) will be some json representation of an IPRange().
We decide not to store external reservations as pools (and in the
same list) since we get the following advantages:
- Keep the existing semantics for pools and external reservations.
- Each list has similar entries: one has pools the other has ranges.
The pool must have a bitarray, and has an optional name. It is
meaningless to add a name and a bitarray to external ranges.
- Each list must not have overlapping ranges. Still external
reservations can overlap with pools.
- The --pool option supports the add|remove|modify commands just like
`--net` and `--disk` and operates on single entities (a restriction that
is not needed for external reservations).
- Another thing, and probably the most important, is that in order to
get the first available IP, only the reserved list must be checked for
conflicts. The ipaddr.summarize_address_range(first, last) could be very
helpful.
2) Change the network module logic.
The above changes should be done in the network module and be transparent
to the rest of the Ganeti code. If a random IP from the networks is
requested, Ganeti searches for an available IP from the first pool of
the first subnet. If it is full it gets to the next pool. Then to the
next subnet and so on. Of course the `external` IP ranges will be
excluded. If an IP is explicitly requested, Ganeti will try to find a
matching subnet. Its pools and external will be checked for
availability. All this logic will be extracted in a separate class
with helper methods for easier manipulation of IP ranges and
bitarrays.
Bitarray processing can be optimized too. The usage of bitarrays will
be reduced since (a) we no longer have `external_reservations` and (b)
pools will have shorter bitarrays (i.e. will not have to cover the whole
subnet). Besides that, we could keep the bitarrays in memory, so that
in most cases (e.g. adding/removing reservations, querying), we don't
keep converting strings to bitarrays and vice versa. Also, the Haskell
code could as well keep this in memory as a bitarray, and validate it
on load.
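The lookup of a dynamically requested IP could then roughly follow the pattern
below (a sketch only; the helper class, its method names and
``in_external_ranges`` are assumptions)::

  def generate_free_ip(network):
      # Walk the subnets and their pools in order and return the first IP
      # that is neither reserved in a pool nor part of an external range.
      for subnet in network.subnets:
          for pool in subnet.pools:
              for offset, taken in enumerate(pool.reservations):
                  ip = pool.range.start + offset
                  if not taken and not in_external_ranges(ip, subnet.external):
                      return ip
      raise LookupError("No more free IP addresses in this network")
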
3) Changes in config module.
We should not have instances with the same IP inside the same network.
We introduce the _AllIPs() helper config method that will return all existing
(IP, network) tuples. Config logic will check this list as well
before passing it to TemporaryReservationManager.
4) Change the query mechanism.
Since we have more than one subnet, the new `subnets` field will
include a list of:
* cidr: IPv4 or IPv6 CIDR
* gateway: IPv4 or IPv6 address
* dhcp: True or False
* name: The user friendly name for the subnet
Since we want to support small pools inside big subnets, current query
results are not practical as far as the `map` field is concerned. It
should be replaced with the new `pools` field for each subnet, which will
contain:
* start: The first IP of the pool
* end: The last IP of the pool
* map: A string with 'X' for reserved IPs (either external or not) and
with '.' for all available ones inside the pool
Multiple IPs per NIC
++++++++++++++++++++
Currently IP is a simple string inside the NIC object and there is a
one-to-one mapping between the `ip` and the `network` slots. The whole
logic behind this is that a NIC belongs to a network (cable) and
inherits its mode and link. This rational will not change.
Since this design adds support for multiple subnets per network, a NIC
must be able to obtain multiple IPs from various subnets of the same
network. Thus we change the `ip` slot into list.
We introduce a new `ipX` attribute. For backwards compatibility `ip`
will denote `ip0`.
During instance related operations one could use something like:
::
gnt-instance add --net 0:ip0=192.0.2.4,ip1=pool,ip2=some-pool-name,network=network1 inst1
gnt-instance add --net 0:ip=pool,network1 inst1
This will be parsed, converted to a proper list (e.g. ip = [192.0.2.4,
"pool", "some-pool-name"]) and finally passed to the corresponding opcode.
Based on the previous example, here the first IP will match subnet1, the
second IP will be retrieved from the first available pool of the first
available subnet, and the third from the pool with the some-pool name.
During instance modification, the `ip` option will refer to the first IP
of the NIC, whereas the `ipX` will refer to the X'th IP. As with NICs
we start counting from 0 so `ip1` will refer to the second IP. For example
one should pass:
::
--net 0:modify,ip1=1.2.3.10
to change the second IP of the first NIC to 1.2.3.10,
::
--net -1:add,ip0=pool,ip1=1.2.3.4,network=test
to add a new NIC with two IPs, and
::
--net 1:modify,ip1=none
to remove the second IP of the second NIC.
Configuration changes
---------------------
IPRange config object:
Introduce new config object that will hold ranges needed by pools, and
reservations. It will be either a tuple of (start, size, end) or a
simple string. The `end` is redundant and can derive from start and
size in runtime, but will appear in the representation for readability
and debug reasons (a sketch of these objects follows at the end of this section).
Pool config object:
Introduce new config object to represent a single subnet's pool. It
will have the `range`, `reservations`, `name` slots. The range slot
will be an IPRange config object, the reservations a bitarray and the
name a simple string.
Subnet config object:
Introduce new config object with slots: `name`, `uuid`, `cidr`,
`gateway`, `dhcp`, `pools`, `external`. Pools is a list of Pool config
objects. External is a list of IPRange config objects. All ranges must
reside inside the subnet's CIDR. Only `cidr` will be mandatory. The
`dhcp` attribute will be False by default.
Network config objects:
The L3 and the IP pool representation will change. Specifically all
slots besides `name`, `mac_prefix`, and `tag` will be removed. Instead
the slot `subnets` with a list of Subnet config objects will be added.
NIC config objects:
NIC's network slot will be removed and the `ip` slot will be modified
to a list of strings.
KVM runtime files:
Any change done in config data must be done also in KVM runtime files.
For this purpose the existing _UpgradeSerializedRuntime() can be used.
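A rough sketch of the IPRange, Pool and Subnet objects described above (plain
Python classes for illustration; the real objects would derive from Ganeti's
ConfigObject and use its slots machinery)::

  class IPRange(object):
      def __init__(self, start, size):
          self.start = start
          self.size = size
          self.end = None  # redundant, derived from start and size at runtime

  class Pool(object):
      def __init__(self, ip_range, reservations, name=None):
          self.range = ip_range             # an IPRange
          self.reservations = reservations  # bitarray, one bit per address
          self.name = name

  class Subnet(object):
      def __init__(self, cidr, gateway=None, dhcp=False, name=None, uuid=None):
          self.cidr = cidr        # the only mandatory slot
          self.gateway = gateway
          self.dhcp = dhcp
          self.name = name
          self.uuid = uuid
          self.pools = []         # list of Pool objects
          self.external = []      # list of IPRange objects
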
Exported variables
------------------
The exported variables during instance related operations will be just
like Linux uses aliases for interfaces. Specifically:
``IP:i`` for the ith IP.
``NETWORK_*:i`` for the ith subnet. * is SUBNET, GATEWAY, DHCP.
In case of hooks those variables will be prefixed with ``INSTANCE_NICn``
for the nth NIC.
Backwards Compatibility
-----------------------
The existing networks representation will be internally modified.
They will obtain one subnet, and one pool with range the whole subnet.
During `gnt-network add` if the deprecated ``--network`` option is passed
will still create a network with one subnet, and one IP pool with the
size of the subnet. Otherwise ``--subnet`` and ``--pool`` options
will be needed.
The query mechanism will also include the deprecated `map` field. For the
newly created network this will contain only the mapping of the first
pool. The deprecated `network`, `gateway`, `network6`, `gateway6` fields
will point to the first IPv4 and IPv6 subnet accordingly.
During instance related operation the `ip` argument of the ``--net``
option will refer to the first IP of the NIC.
Hooks and scripts will still have the same environment exported in case
of single IP per NIC.
This design allows more fine-grained configurations which in turn yields
more flexibility and a wider coverage of use cases. Still basic cases
(the ones that are currently available) should be easy to set up.
Documentation will be enriched with examples for both typical and
advanced use cases of gnt-network.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-node-add.rst
Design for adding a node to a cluster
=====================================
.. contents:: :depth: 3
Note
----
Closely related to this design is the more recent design
:doc:`node security <design-node-security>` which extends and changes
some of the aspects mentioned in this document. Make sure that you
read the more recent design as well to get an up to date picture of
Ganeti's procedure for adding new nodes.
Current state and shortcomings
------------------------------
Before a node can be added to a cluster, its SSH daemon must be
re-configured to use the cluster-wide SSH host key. Ganeti 2.3.0 changed
the way this is done by moving all related code to a separate script,
``tools/setup-ssh``, using Paramiko. Before all such configuration was
done from ``lib/bootstrap.py`` using the system's own SSH client and a
shell script given to said client through parameters.
Both solutions controlled all actions on the connecting machine; the
newly added node was merely executing commands. This implies and
requires a tight coupling and equality between nodes (e.g. paths to
files being the same). Most of the logic and error handling is also done
on the connecting machine.
Once a node's SSH daemon has been configured, more than 25 files need to
be copied using ``scp`` before the node daemon can be started. No
verification is being done before files are copied. Once the node daemon
is started, an opcode is submitted to the master daemon, which will then
copy more files, such as the configuration and job queue for master
candidates, using RPC. This process is somewhat fragile and requires
initiating many SSH connections.
Proposed changes
----------------
SSH
~~~
The main goal is to move more logic to the newly added node. Instead of
having a relatively large script executed on the master node, most of it
is moved over to the added node.
A new script named ``prepare-node-join`` is added. It receives a JSON
data structure (defined :ref:`below <prepare-node-join-json>`) on its
standard input. Once the data has been successfully decoded, it proceeds
to configure the local node's SSH daemon and root's SSH settings, after
which the SSH daemon is restarted.
All the master node has to do to add a new node is to gather all
required data, build the data structure, and invoke the script on the
node to be added. This will enable us to once again use the system's own
SSH client and to drop the dependency on Paramiko for Ganeti itself
(``ganeti-listrunner`` is going to continue using Paramiko).
Eventually ``setup-ssh`` can be removed.
Node daemon
~~~~~~~~~~~
Similar to SSH setup changes, the process of copying files and starting
the node daemon will be moved into a dedicated program. On its standard
input it will receive a standardized JSON structure (defined :ref:`below
<node-daemon-setup-json>`). Once the input data has been successfully
decoded and the received values were verified for sanity, the program
proceeds to write the values to files and then starts the node daemon
(``ganeti-noded``).
To add a new node to the cluster, the master node will have to gather
all values, build the data structure, and then invoke the newly added
``node-daemon-setup`` program via SSH. In this way only a single SSH
connection is needed and the values can be verified before being written
to files.
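A sketch of how the master side could drive this over a single SSH connection
(the program name on the remote path and the exact invocation are assumptions
for illustration)::

  import json
  import subprocess

  def run_node_daemon_setup(node, data):
      # "data" is the JSON structure described below (cluster name, node
      # daemon certificate, ssconf values, ...); it is piped to the remote
      # program's standard input over one SSH connection.
      proc = subprocess.Popen(["ssh", "root@%s" % node, "node-daemon-setup"],
                              stdin=subprocess.PIPE)
      proc.communicate(json.dumps(data).encode("utf-8"))
      if proc.returncode != 0:
          raise RuntimeError("node-daemon-setup failed on %s" % node)
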
If the program exits successfully, the node is ready to be added to the
master daemon's configuration. The node daemon will be running, but
``OpNodeAdd`` needs to be run before it becomes a full node. The opcode
will copy more files, such as the :doc:`RAPI certificate <rapi>`.
Data structures
---------------
.. _prepare-node-join-json:
JSON structure for SSH setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The data is given in an object containing the keys described below.
Unless specified otherwise, all entries are optional.
``cluster_name``
Required string with the cluster name. If a local cluster name is
found, the join process is aborted unless the passed cluster name
matches the local name.
``node_daemon_certificate``
Public part of cluster's node daemon certificate in PEM format. If a
local node certificate and key is found, the join process is aborted
unless this passed public part can be verified with the local key.
``ssh_host_key``
List containing public and private parts of SSH host key. See below
for definition.
``ssh_root_key``
List containing public and private parts of root's key for SSH
authorization. See below for definition.
Lists of SSH keys use a tuple with three values. The first describes the
key variant (``rsa`` or ``dsa``). The second and third are the private
and public part of the key. Example:
.. highlight:: javascript
::
[
("rsa", "-----BEGIN RSA PRIVATE KEY-----...", "ssh-rss AAAA..."),
("dsa", "-----BEGIN DSA PRIVATE KEY-----...", "ssh-dss AAAA..."),
]
.. _node-daemon-setup-json:
JSON structure for node daemon setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The data is given in an object containing the keys described below.
Unless specified otherwise, all entries are optional.
``cluster_name``
Required string with the cluster name. If a local cluster name is
found, the join process is aborted unless the passed cluster name
matches the local name. The cluster name is also included in the
dictionary given via the ``ssconf`` entry.
``node_daemon_certificate``
Public and private part of cluster's node daemon certificate in PEM
format. If a local node certificate is found, the process is aborted
unless it matches.
``ssconf``
Dictionary with ssconf names and their values. Both are strings.
Example:
.. highlight:: javascript
::
{
"cluster_name": "cluster.example.com",
"master_ip": "192.168.2.1",
"master_netdev": "br0",
# …
}
``start_node_daemon``
Boolean denoting whether the node daemon should be started (or
restarted if it was running for some reason).
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
ganeti-2.15.2/doc/design-node-security.rst
=============================
Improvements of Node Security
=============================
This document describes an enhancement of Ganeti's security by restricting
the distribution of security-sensitive data to the master and master
candidates only.
Note: In this document, we will use the term 'normal node' for a node that
is neither master nor master-candidate.
.. contents:: :depth: 4
Objective
=========
Up till 2.10, Ganeti distributes security-relevant keys to all nodes,
including nodes that are neither master nor master-candidates. Those
keys are the private and public SSH keys for node communication and the
SSL certificate and private key for RPC communication. The objective of this
design is to limit the set of nodes that can establish ssh and RPC
connections to the master and master candidates.
As pointed out in
`issue 377 `_, this
is a security risk. Since all nodes have these keys, compromising
any of those nodes would possibly give an attacker access to all other
machines in the cluster. Reducing the set of nodes that are able to
make ssh and RPC connections to the master and master candidates would
significantly reduce the risk simply because fewer machines would be a
valuable target for attackers.
Note: For bigger installations of Ganeti, it is advisable to run master
candidate nodes as non-vm-capable nodes. This would reduce the attack
surface for the hypervisor exploitation.
Detailed design
===============
Current state and shortcomings
------------------------------
Currently (as of 2.10), all nodes hold the following information:
- the ssh host keys (public and private)
- the ssh root keys (public and private)
- node daemon certificate (the SSL client certificate and its
corresponding private key)
Concerning ssh, this setup contains the following security issue. Since
all nodes of a cluster can ssh as root into any other cluster node, one
compromised node can harm all other nodes of a cluster.
Regarding the SSL encryption of the RPC communication with the node
daemon, we currently have the following setup. There is only one
certificate which is used as both, client and server certificate. Besides
the SSL client verification, we check if the used client certificate is
the same as the certificate stored on the server.
This means that any node running a node daemon can also act as an RPC
client and use it to issue RPC calls to other cluster nodes. This in
turn means that any compromised node could be used to make RPC calls to
any node (including itself) to gain full control over VMs. This could
be used by an attacker to for example bring down the VMs or exploit bugs
in the virtualization stacks to gain access to the host machines as well.
Proposal concerning SSH host key distribution
---------------------------------------------
We propose the following design regarding the SSH host key handling. The
root keys are untouched by this design.
Each node gets its own ssh private/public key pair, but only the public
keys of the master candidates get added to the ``authorized_keys`` file
of all nodes. This has the advantages, that:
- Only master candidates can ssh into other nodes, thus compromised
nodes cannot compromise the cluster further.
- One can remove a compromised master candidate from a cluster
(including removing its public key from all nodes' ``authorized_keys``
file) without having to regenerate and distribute new ssh keys for all
master candidates. (Even though it is good practice to do that anyway,
since the compromising of the other master candidates might have taken
place already.)
- If a (uncompromised) master candidate is offlined to be sent for
repair due to a hardware failure before Ganeti can remove any keys
from it (for example when the network adapter of the machine is broken),
we don't have to worry about the keys being on a machine that is
physically accessible.
To ensure security while transferring public key information and
updating the ``authorized_keys``, there are several other changes
necessary:
- Any distribution of keys (in this case only public keys) is done via
SSH and not via RPC. An attacker who has RPC control should not be
able to get SSH access where he did not have SSH access before
already.
- The only RPC calls that are made in this context are from the master
daemon to the node daemon on its own host and noded ensures as much
as possible that the change to be made does not harm the cluster's
security boundary.
- The nodes that are potential master candidates keep a list of public
keys of potential master candidates of the cluster in a separate
file called ``ganeti_pub_keys`` to keep track of which keys could
possibly be added ``authorized_keys`` files of the nodes. We come
to what "potential" means in this case in the next section. The key
list is only transferred via SSH or written directly by noded. It
is not stored in the cluster config, because the config is
distributed via RPC.
The following sections describe in detail which Ganeti commands are
affected by the proposed changes.
RAPI
~~~~
The design goal to limit SSH powers to master candidates conflicts with
the current powers a user of the RAPI interface would have. The
``master_capable`` flag of nodes can be modified via RAPI.
That means, an attacker that has access to the RAPI interface, can make
all non-master-capable nodes master-capable, and then increase the master
candidate pool size till all machines are master candidates (or at least
a particular machine that he is aiming for). This means that with RAPI
access and a compromised normal node, one can make this node a master
candidate and then still have the power to compromise the whole cluster.
To mitigate this issue, we propose the following changes:
- Add a flag ``master_capability_rapi_modifiable`` to the cluster
configuration which indicates whether or not it should be possible
to modify the ``master_capable`` flag of nodes via RAPI. The flag is
set to ``False`` by default and can itself only be changed on the
commandline. In this design doc, we refer to the flag as the
"rapi flag" from here on.
- Only if the ``master_capability_rapi_modifiable`` switch is set to
``True``, it is possible to modify the master-capability flag of
nodes.
With this setup, there are the following definitions of "potential
master candidates" depending on the rapi flag:
- If the rapi flag is set to ``True``, all cluster nodes are potential
master candidates, because as described above, all of them can
eventually be made master candidates via RAPI and thus security-wise,
we haven't won anything above the current SSH handling.
- If the rapi flag is set to ``False``, only the master capable nodes
are considered potential master candidates, as it is not possible to
make them master candidates via RAPI at all.
Note that when the rapi flag is changed, the state of the
``ganeti_pub_keys`` file on all nodes has to be updated accordingly.
This should be done in the client script ``gnt_cluster`` before the
RPC call to update the configuration is made, because this way, if
someone would try to perform that RPC call on master to trick it into
thinking that the flag is enabled, this would not help as the content of
the ``ganeti_pub_keys`` file is a crucial part in the design of the
distribution of the SSH keys.
Note: One could think of always allowing to disable the master-capability
via RAPI and just restrict the enabling of it, thus making it possible
to RAPI-"freeze" the nodes' master-capability state once it disabled.
However, we think these are rather confusing semantics of the involved
flags and thus we go with proposed design.
Note that this change will break RAPI compatibility, at least if the
rapi flag is not explicitly set to ``True``. We made this choice to
have the more secure option as default, because otherwise it is
unlikely to be widely used.
Cluster initialization
~~~~~~~~~~~~~~~~~~~~~~
On cluster initialization, the following steps are taken in
bootstrap.py:
- A public/private key pair is generated (as before), but only used
by the first (and thus master) node. In particular, the private key
never leaves the node.
- A mapping of node UUIDs to public SSH keys is created and stored
as text file in ``/var/lib/ganeti/ganeti_pub_keys`` only accessible
by root (permissions 0600). The master node's uuid and its public
key is added as first entry. The format of the file is one
line per node, each line composed as ``node_uuid ssh_key`` (see the sketch after this list).
- The node's public key is added to its own ``authorized_keys`` file.
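For illustration, reading and extending the key file could look like this (a
sketch only, not the actual tools code)::

  GANETI_PUB_KEYS = "/var/lib/ganeti/ganeti_pub_keys"

  def read_pub_keys(path=GANETI_PUB_KEYS):
      # One line per node: "<node_uuid> <ssh public key>".
      keys = {}
      with open(path) as keyfile:
          for line in keyfile:
              node_uuid, ssh_key = line.strip().split(None, 1)
              keys[node_uuid] = ssh_key
      return keys

  def add_pub_key(node_uuid, ssh_key, path=GANETI_PUB_KEYS):
      with open(path, "a") as keyfile:
          keyfile.write("%s %s\n" % (node_uuid, ssh_key))
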
(Re-)Adding nodes to a cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
According to :doc:`design-node-add`, Ganeti transfers the ssh keys to
every node that gets added to the cluster.
Adding a new node will require the following steps.
In gnt_node.py:
- On the new node, a new public/private SSH key pair is generated.
- The public key of the new node is fetched (via SSH) to the master
node and if it is a potential master candidate (see definition above),
it is added to the ``ganeti_pub_keys`` list on the master node.
- The public keys of all current master candidates are added to the
new node's ``authorized_keys`` file (also via SSH).
In LUNodeAdd in cmdlib/node.py:
- The LUNodeAdd determines whether or not the new node is a master
candidate and in any case updates the cluster's configuration with the
new node's information. (This is not changed by the proposed design.)
- If the new node is a master candidate, we make an RPC call to the node
daemon of the master node to add the new node's public key to all
nodes' ``authorized_keys`` files. The implementation of this RPC call
has to be extra careful as described in the next steps, because
compromised RPC security should not compromise SSH security.
RPC call execution in noded (on master node):
- Check that the public key of the new node is in the
``ganeti_pub_keys`` file of the master node to make sure that no keys
of nodes outside the Ganeti cluster and no keys that are not potential
master candidates gain SSH access in the cluster.
- Via SSH, transfer the new node's public key to all nodes (including
the new node) and add it to their ``authorized_keys`` file.
- The ``ganeti_pub_keys`` file is transferred via SSH to all
potential master candidates nodes except the master node
(including the new one).
In case of readding a node that used to be in the cluster before,
handling of the SSH keys would basically be the same, in particular also
a new SSH key pair is generated for the node, because we cannot be sure
that the old key pair has not been compromised while the node was
offlined. Note that for reasons of data hygiene, a node's
``ganeti_pub_keys`` file is cleared before the node is readded.
Also, Ganeti attempts to remove any Ganeti keys from the ``authorized_keys``
file before the node is readded. However, since Ganeti does not keep a list
of all keys ever used in the cluster, this applies only to keys which
are currently used in the cluster. Note that Ganeti won't touch any keys
that were added to the ``authorized_keys`` by other systems than Ganeti.
Pro- and demoting a node to/from master candidate
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the role of a node is changed from 'normal' to 'master_candidate',
the procedure is the same as for adding nodes from the step "In
LUNodeAdd ..." on.
If a node gets demoted to 'normal', the master daemon makes a similar
RPC call to the master node's node daemon as for adding a node.
In the RPC call, noded will perform the following steps:
- Check that the public key of the node to be demoted is indeed in the
``ganeti_pub_keys`` file to avoid deleting ssh keys of machines that
don't belong to the cluster (and thus potentially lock out the
administrator).
- Via SSH, remove the key from all node's ``authorized_keys`` files.
This affects the behavior of the following commands:
::
gnt-node modify --master-candidate=yes
gnt-node modify --master-candidate=no [--auto-promote]
If the node was already a master candidate before the command to promote
it was issued, Ganeti does not do anything.
Note that when you demote a node from master candidate to normal node, another
master-capable and normal node will be promoted to master candidate. For this
newly promoted node, the same changes apply as if it was explicitly promoted.
The same behavior should be ensured for the corresponding rapi command.
Offlining and onlining a node
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When offlining a node, it immediately loses its role as master or master
candidate as well. When it is onlined again, it will become master
candidate again if it was so before. The handling of the keys should be done
in the same way as when the node is explicitly promoted or demoted to or from
master candidate. See the previous section for details.
This affects the command:
::
gnt-node modify --offline=yes
gnt-node modify --offline=no [--auto-promote]
For offlining, the removal of the keys is particularly important, as the
detection of a compromised node might be the very reason for the offlining.
Of course we cannot guarantee that removal of the key is always successful,
because the node might not be reachable anymore. Even though it is a
best-effort operation, it is still an improvement over the status quo,
because currently Ganeti does not even try to remove any keys.
The same behavior should be ensured for the corresponding rapi command.
Cluster verify
~~~~~~~~~~~~~~
So far, ``gnt-cluster verify`` checks the SSH connectivity of all nodes to
all other nodes. We propose to replace this by the following checks:
- For all master candidates, we check if they can connect to any other node
in the cluster (other master candidates and normal nodes).
- We check if the ``ganeti_pub_keys`` file contains keys of nodes that
are no longer in the cluster or that are not potential master
candidates.
- For all normal nodes, we check if their key does not appear in other
node's ``authorized_keys``. For now, we will only emit a warning
rather than an error if this check fails, because Ganeti might be
run in a setup where Ganeti is not the only system manipulating the
SSH keys.
Upgrades
~~~~~~~~
When upgrading from a version that has the previous SSH setup to the one
proposed in this design, the upgrade procedure has to involve the
following steps in the post-upgrade hook:
- For all nodes, new SSH key pairs are generated.
- All nodes and their public keys are added to the ``ganeti_pub_keys``
file and the file is copied to all nodes.
- All keys of master candidate nodes are added to the
``authorized_keys`` files of all other nodes.
Since this upgrade significantly changes the configuration of the
clusters' nodes, we will add a note to the UPGRADE notes to make the
administrator aware of this fact (in case he intends to enable access
from normal nodes to master candidates for other reasons than Ganeti
uses the machines).
Also, in any operation where Ganeti creates new SSH keys, the old keys
will be backed up and not simply overridden.
Downgrades
~~~~~~~~~~
These downgrading steps will be implemented for the downgrade from 2.13 to 2.12:
- The master node's private/public key pair will be distributed to all
nodes (via SSH) and the individual SSH keys will be backed up.
- The obsolete individual ssh keys will be removed from all nodes'
``authorized_keys`` file.
Renew-Crypto
~~~~~~~~~~~~
The ``gnt-cluster renew-crypto`` command will be extended by a new
option ``--new-ssh-keys``, which will renew all SSH keys on all nodes
and rebuild the ``authorized_keys`` files and the ``ganeti_pub_keys``
files according to the previous sections. This operation will respect
the security considerations stated above, for example by minimizing RPC
calls and distributing keys via SSH only.
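For example, after a suspected key compromise an administrator could run
(illustrative invocation of the option described above):

::

  gnt-cluster renew-crypto --new-ssh-keys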
Compliance to --no-ssh-init and --no-node-setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With this design, Ganeti will do more manipulations of SSH keys and
``authorized_keys`` files than before. If this is not feasible in
a Ganeti environment, the administrator has the option to prevent
Ganeti from performing any manipulations on the SSH setup of the nodes.
The options for doing so are ``--no-ssh-init`` for ``gnt-cluster
init``, and ``--no-node-setup`` for ``gnt-node add``. Note that
these options already existed before the implementation of this
design; we merely confirm that they will be respected by the
new design as well.
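For illustration, a cluster whose SSH setup is managed entirely outside
of Ganeti could be built along these lines (host names are placeholders):

::

  gnt-cluster init --no-ssh-init cluster.example.com
  gnt-node add --no-node-setup node2.example.com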
Proposal regarding node daemon certificates
-------------------------------------------
Regarding the node daemon certificates, we propose the following changes
in the design.
- Instead of using the same certificate for all nodes as both server
  and client certificate, we generate a common server certificate (and
  the corresponding private key) for all nodes and a different client
  certificate (and the corresponding private key) for each node. The
  server certificate will be self-signed. The client certificate will
  be signed by the server certificate. The client certificates will
  use the node UUID as serial number to ensure uniqueness within the
  cluster. They will use the host's hostname as the certificate
  common name (CN).
- In addition, we store a mapping of
(node UUID, client certificate digest) in the cluster's configuration
and ssconf for hosts that are master or master candidate.
The client certificate digest is a hash of the client certificate.
We suggest a 'sha1' hash here. We will call this mapping 'candidate map'
from here on.
- The node daemon will be modified in a way that on an incoming RPC
request, it first performs a client verification (same as before) to
ensure that the requesting host is indeed the holder of the
corresponding private key. Additionally, it compares the digest of
the certificate of the incoming request to the respective entry of
the candidate map. If the digest does not match the entry of the host
in the mapping or is not included in the mapping at all, the SSL
connection is refused.
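To make the candidate map and the verification step more concrete, the
following minimal Python sketch shows an illustrative map layout and the
digest comparison performed by the node daemon. The structure, the
example values and the function are assumptions for illustration only,
and the lookup is simplified to a pure membership test.

::

  import hashlib

  # Illustrative candidate map as stored in the configuration and ssconf:
  # node UUID -> sha1 digest of that node's client certificate.
  candidate_map = {
      "5c45a3e8-0000-0000-0000-000000000001":
          "3f786850e387550fdab836ed7e6dc881de23001b",
  }

  def accept_rpc_connection(peer_cert_der, candidate_map):
      """Accept the connection only if the peer's certificate digest is
      listed in the candidate map; the SSL client verification against
      the server certificate is assumed to have succeeded already."""
      digest = hashlib.sha1(peer_cert_der).hexdigest()
      return digest in candidate_map.values()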
This design has the following advantages:
- A compromised normal node cannot issue RPC calls, because it will
not be in the candidate map. (See the ``Drawbacks`` section regarding
an indirect way of achieving this though.)
- A compromised master candidate would be able to issue RPC requests,
but on detection of its compromised state, it can be removed from the
cluster (and thus from the candidate map) without the need for
redistribution of any certificates, because the other master candidates
  can continue using their own certificates. However, it is best
  practice to issue a complete key renewal even in this case, unless one
  can ensure that no actions compromising other nodes have already been
  carried out.
- A compromised node would not be able to use the other (possibly master
candidate) nodes' information from the candidate map to issue RPCs,
because the config just stores the digests and not the certificate
itself.
- A compromised node would be able to obtain another node's certificate
by waiting for incoming RPCs from this other node. However, the node
cannot use the certificate to issue RPC calls, because the SSL client
verification would require the node to hold the corresponding private
key as well.
Drawbacks of this design:
- Complexity of node and certificate management will be increased (see
following sections for details).
- If the candidate map is not distributed fast enough to all nodes after
an update of the configuration, it might be possible to issue RPC calls
from a compromised master candidate node that has been removed
from the Ganeti cluster already. However, this is still a better
situation than before and an inherent problem when one wants to
distinguish between master candidates and normal nodes.
- A compromised master candidate would still be able to issue RPC calls,
if it uses ssh to retrieve another master candidate's client
certificate and the corresponding private SSL key. This is an issue
even with the first part of the improved handling of ssh keys in this
design (limiting ssh keys to master candidates), but it will be
eliminated with the second part of the design (separate ssh keys for
each master candidate).
- Even though this proposal is an improvement over the previous
  situation in Ganeti, it still does not use the full power of SSL. For
further improvements, see Section "Related and future work".
- Signing the client certificates with the server certificate will
  increase the complexity of ``gnt-cluster renew-crypto``, as a renewal
  of the server certificate requires the renewal (and signing) of all
  client certificates as well.
Alternative proposals:
- The initial version of this document described a setup where the
  client certificates were also self-signed. This led to a serious
  problem (Issue 1094), which could only have been solved by
  distributing all client certificates to all nodes and loading them
  as trusted CAs. As this would have resulted in having to restart
  noded on all nodes every time a node is added, removed, demoted
  or promoted, it was not feasible and we switched to client
  certificates which are signed by the server certificate.
- Instead of generating a client certificate per node, one could think
of just generating two different client certificates, one for normal
nodes and one for master candidates. Noded could then just check if
the requesting node has the master candidate certificate. Drawback of
this proposal is that once one master candidate gets compromised, all
master candidates would need to get a new certificate even if the
compromised master candidate had not yet fetched the certificates
from the other master candidates via ssh.
- In addition to our main proposal, one could think of including a
piece of data (for example the node's host name or UUID) in the RPC
call which is encrypted with the requesting node's private key. The
node daemon could check if the datum can be decrypted using the node's
  certificate. However, this would only provide functionality similar to
  SSL's built-in client verification while adding significant complexity
  to Ganeti's RPC protocol.
In the following sections, we describe how our design affects various
Ganeti operations.
Cluster initialization
~~~~~~~~~~~~~~~~~~~~~~
On cluster initialization, so far only the node daemon certificate was
created. With our design, two certificates (and corresponding keys)
need to be created: a server certificate to be distributed to all nodes
and a client certificate to be used only by this particular node. In the
following, we use the term node daemon certificate for the server
certificate only.
In the cluster configuration, the candidate map is created. It is
populated with the respective entry for the master node. It is also
written to ssconf.
(Re-)Adding nodes
~~~~~~~~~~~~~~~~~
When a node is added, the server certificate is copied to the node (as
before). Additionally, a new client certificate (and the corresponding
private key) is created on the new node, to be used only by the new node
as its client certificate.
If the new node is a master candidate, the candidate map is extended by
the new node's data. As before, the updated configuration is distributed
to all nodes (as complete configuration on the master candidates and
ssconf on all nodes). Note that distribution of the configuration after
adding a node is already implemented, since all nodes hold the list of
nodes in the cluster in ssconf anyway.
If the configuration for whatever reason already holds an entry for this
node, it will be overridden.
When re-adding a node, the procedure is the same as for adding a node.
Promotion and demotion of master candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When a normal node gets promoted to be master candidate, an entry to the
candidate map has to be added and the updated configuration has to be
distributed to all nodes. If there was already an entry for the node,
we override it.
On demotion of a master candidate, the node's entry in the candidate map
is removed and the updated configuration is redistributed.
The same procedure applies to onlining and offlining master candidates.
Cluster verify
~~~~~~~~~~~~~~
Cluster verify will be extended by the following checks:
- Whether each entry in the candidate map indeed corresponds to a master
candidate.
- Whether the master candidates' certificate digests match their entries
  in the candidate map.
- Whether no node tries to use the certificate of another node. In
particular, it is important to check that no normal node tries to
use the certificate of a master candidate.
- Whether there are still self-signed client certificates in use (from
a pre 2.12.4 Ganeti version).
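A similarly hedged Python sketch of the first two checks, reusing the
illustrative structures from the previous sections (these are not
Ganeti's actual configuration objects):

::

  import hashlib

  def verify_candidate_map(candidate_map, nodes):
      """Return error messages for stale or mismatching map entries."""
      errors = []
      for uuid, digest in candidate_map.items():
          node = nodes.get(uuid)
          if node is None or not node["master_candidate"]:
              errors.append("Stale candidate map entry for %s" % uuid)
          elif hashlib.sha1(node["client_cert"]).hexdigest() != digest:
              errors.append("Digest mismatch for node %s" % node["name"])
      return errors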
Crypto renewal
~~~~~~~~~~~~~~
Currently, when the cluster's cryptographic tokens are renewed using the
``gnt-cluster renew-crypto`` command the node daemon certificate is
renewed (among others). Option ``--new-cluster-certificate`` renews the
node daemon certificate only.
By adding an option ``--new-node-certificates`` we offer to renew the
client certificate. Whenever the client certificates are renewed, the
candidate map has to be updated and redistributed.
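An illustrative invocation of the new option:

::

  gnt-cluster renew-crypto --new-node-certificates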
If, for whatever reason, the candidate map becomes inconsistent (for
example due to inconsistent updating after a demotion or offlining), the
user can use this option to renew the client certificates and update the
candidate certificate map.
Note that renewing the server certificate requires all client
certificates to be renewed and signed by the new server certificate,
because otherwise their signatures cannot be verified by a server that
only holds the new server certificate.
As there was a different design in place in Ganeti 2.12.4 and previous
versions, we have to ensure that renew-crypto works on pre 2.12 versions and
2.12.1-4. Users that got hit by Issue 1094 will be encouraged to run
renew-crypto at least once after switching to 2.12.5. Those who have not
encountered this bug yet will still be gently reminded by ``gnt-cluster
verify``.
Further considerations
----------------------
Watcher
~~~~~~~
The watcher is a script that is run on all nodes at regular intervals. The
changes proposed in this design will not affect the watcher's implementation,
because it behaves differently on the master than on non-master nodes.
Only on the master does it issue query calls, which would require a client
certificate of a node in the candidate mapping; this is the case for the
master node. On non-master nodes, its only external communication is done
via the ConfD protocol, which uses the hmac key that is present on all nodes.
Besides that, the watcher does not make any ssh connections, and thus is
not affected by the changes in ssh key handling either.
Other Keys and Daemons
~~~~~~~~~~~~~~~~~~~~~~
Ganeti handles a couple of other keys/certificates that have not been mentioned
in this design so far. Also, other daemons than the ones mentioned so far
perform intra-cluster communication. Neither the keys nor the daemons will
be affected by this design for several reasons:
- The hmac key used by ConfD (see :doc:`design-2.1`): the hmac key is still
distributed to all nodes, because it was designed to be used for
communicating with ConfD, which should be possible from all nodes.
For example, the monitoring daemon which runs on all nodes uses it to
retrieve information from ConfD. However, since communication with ConfD
is read-only, a compromised node holding the hmac key does not enable an
attacker to change the cluster's state.
- The WConfD daemon writes the configuration to all master candidates
  via RPC. Since it only runs on the master node, its ability to issue
  RPC requests is maintained with this design.
- The rapi SSL key certificate and rapi user/password file 'rapi_users' are
  already copied only to the master candidates (see :doc:`design-2.1`,
Section ``Redistribute Config``).
- The spice certificates are still distributed to all nodes, since it should
be possible to use spice to access VMs on any cluster node.
- The cluster domain secret is used for inter-cluster instance moves.
Since instances can be moved from any normal node of the source cluster to
any normal node of the destination cluster, the presence of this
secret on all nodes is necessary.
Related and Future Work
~~~~~~~~~~~~~~~~~~~~~~~
There are a couple of suggestions on how to improve the SSL setup even more.
As a trade-off with respect to complexity and implementation effort, we did
not implement them yet (as of version 2.11) but describe them here for
future reference.
- The server certificate is currently self-signed and the client certificates
are signed by the server certificate. It would increase the security if they
were signed by a common CA. There is already a design doc for a Ganeti CA
which was suggested in a different context (related to import/export).
This would also be a benefit for the RPC calls. See design doc
  :doc:`design-impexp2` for more information. Implementing a CA is rather
  complex, because it would also mean supporting renewal of the CA
  certificate and providing and maintaining infrastructure to revoke
  compromised certificates.
- An extension of the previous suggestion would be to even enable the
system administrator to use an external CA. Especially in bigger
  setups, where an SSL infrastructure already exists, it would be useful
  if Ganeti could simply be integrated with it, rather than forcing the
user to use the Ganeti CA.
- Ganeti RPC calls are currently done without checking if the hostname
  of the node matches the common name of the certificate. This
might be a desirable feature, but would increase the effort when a
node is renamed.
- The typical use case for SSL is to have one certificate per node
rather than one shared certificate (Ganeti's noded server certificate)
and a client certificate. One could change the design in a way that
only one certificate per node is used, but this would require a common
CA so that the validity of the certificate can be established by every
node in the cluster.
- With the proposed design, the serial numbers of the client
  certificates are set to the node UUIDs. This is technically also not
  compliant with how SSL is supposed to be used, as the serial numbers
  should reflect the enumeration of certificates created by the CA. Once
  a CA is implemented, it might be reasonable to change this
  accordingly. The implementation of the proposed design also has the
  drawback that the serial number does not change even if the certificate
  is replaced by a new one (for example when calling ``gnt-cluster
  renew-crypto``), which likewise does not comply with the way SSL was
  designed to be used.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
Ganeti Node OOB Management Framework
====================================
Objective
---------
Extend Ganeti with Out of Band (:term:`OOB`) Cluster Node Management
Capabilities.
Background
----------
Ganeti currently has no support for Out of Band management of the nodes
in a cluster. It relies on the OS running on the nodes and has therefore
limited possibilities when the OS is not responding. The command
``gnt-node powercycle`` can be issued to attempt a reboot of a node that
crashed but there are no means to power a node off and power it back
on. Supporting this is very handy in the following situations:
* **Emergency Power Off**: During emergencies, time is critical and
manual tasks just add latency which can be avoided through
automation. If a server room overheats, halting the OS on the nodes
is not enough. The nodes need to be powered off cleanly to prevent
damage to equipment.
* **Repairs**: In most cases, repairing a node means that the node has
to be powered off.
* **Crashes**: Software bugs may crash a node. Having an OS
independent way to power-cycle a node helps to recover the node
without human intervention.
Overview
--------
Ganeti will be extended with OOB capabilities by adding a new
**Cluster Parameter** (``--oob-program``), a new **Node Property**
(``--oob-program``), a new **Node State (powered)** and support in
``gnt-node`` for invoking an **External Helper Command** which executes
the actual OOB command (``gnt-node <command> nodename ...``). The
supported commands are: ``power on``, ``power off``, ``power cycle``,
``power status`` and ``health``.
.. note::
The new **Node State (powered)** is a **State of Record**
(:term:`SoR`), not a **State of World** (:term:`SoW`). The maximum
execution time of the **External Helper Command** will be limited to
60s to prevent the cluster from getting locked for an undefined amount
of time.
Detailed Design
---------------
New ``gnt-cluster`` Parameter
+++++++++++++++++++++++++++++
| Program: ``gnt-cluster``
| Command: ``modify|init``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)
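An illustrative invocation (the helper path is a placeholder):

::

  gnt-cluster modify --oob-program=/usr/lib/ganeti/tools/oob-helper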
New ``gnt-cluster epo`` Command
+++++++++++++++++++++++++++++++
| Program: ``gnt-cluster``
| Command: ``epo``
| Parameter: ``--on`` ``--force`` ``--groups`` ``--all``
| Options: ``--on``: By default epo turns off, with ``--on`` it tries to get the
| cluster back online
| ``--force``: To force the operation without asking for confirmation
| ``--groups``: To operate on groups instead of nodes
| ``--all``: To operate on the whole cluster
This is a convenience command to allow easy emergency power off of a
whole cluster or part of it. It takes care of all steps needed to get
the cluster into a sane state to turn off the nodes.
With ``--on`` it does the reverse and tries to bring the rest of the
cluster back to life.
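Illustrative usage, first powering off the whole cluster and later
bringing it back online:

::

  gnt-cluster epo --all
  gnt-cluster epo --on --all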
.. note::
  The master node is not able to shut itself down cleanly. Therefore,
  this command will not do all the work on single node clusters. On
  multi node clusters, the command tries to find another master or, if
  that is not possible, prepares everything to the point where the user
  has to shut down the master node themselves; this also applies to the
  single node cluster configuration.
New ``gnt-node`` Property
+++++++++++++++++++++++++
| Program: ``gnt-node``
| Command: ``modify|add``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)
.. note::
  If ``--oob-program`` is set to ``!``, then the node has no OOB
  capabilities. Otherwise, the value is inherited from the node group
  or, respectively, the cluster-wide setting. I.e., nodes have to opt
  out of OOB capabilities.
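Illustrative examples (paths and node names are placeholders); the
second command opts a node out of OOB management:

::

  gnt-node modify --oob-program=/usr/lib/ganeti/tools/oob-helper node1.example.com
  gnt-node modify --oob-program=! node2.example.com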
Addition to ``gnt-cluster verify``
++++++++++++++++++++++++++++++++++
| Program: ``gnt-cluster``
| Command: ``verify``
| Parameter: None
| Option: None
| Additional Checks:
1. existence and execution flag of OOB program on all Master
Candidates if the cluster parameter ``--oob-program`` is set or at
least one node has the property ``--oob-program`` set. The OOB
helper is just invoked on the master
2. check if node state powered matches actual power state of the
machine for those nodes where ``--oob-program`` is set
New Node State
++++++++++++++
Ganeti supports the following two boolean states related to the nodes:
**drained**
The cluster still communicates with drained nodes but excludes them
from allocation operations
**offline**
if offline, the cluster does not communicate with offline nodes;
useful for nodes that are not reachable in order to avoid delays
And will extend this list with the following boolean state:
**powered**
  if not powered, the cluster does not communicate with not-powered
  nodes; if the node property ``--oob-program`` is not set, the state
  powered is not displayed
Additionally modify the meaning of the offline state as follows:
**offline**
if offline, the cluster does not communicate with offline nodes
(**with the exception of OOB commands for nodes where**
``--oob-program`` **is set**); useful for nodes that are not reachable
in order to avoid delays
The corresponding command extensions are:
| Program: ``gnt-node``
| Command: ``info``
| Parameter: [ ``nodename`` ... ]
| Option: None
Additional Output (:term:`SoR`, omitted if node property
``--oob-program`` is not set):
powered: ``[True|False]``
| Program: ``gnt-node``
| Command: ``modify``
| Parameter: nodename
| Option: [ ``--powered=yes|no`` ]
| Reasoning: sometimes you will need to sync the :term:`SoR` with the :term:`SoW` manually
| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
| the node in question
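For example, to manually record that a node was powered off outside of
Ganeti (node name is a placeholder):

::

  gnt-node modify --powered=no node3.example.com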
New ``gnt-node`` commands: ``power [on|off|cycle|status]``
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| Program: ``gnt-node``
| Command: ``power [on|off|cycle|status]``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Caveats:
* If no nodenames are passed to ``power [on|off|cycle]``, the user
will be prompted with ``"Do you really want to power [on|off|cycle]
the following nodes: