drbd-8.4.4/.gitignore

/autom4te.cache
/config.log
/config.status
/configure
/drbd-*.tar.gz
/drbd.spec
/drbd-kernel.spec
/drbd-km.spec
/ID
/TODO
/tags
/Makefile
user/Makefile
scripts/Makefile
documentation/Makefile
./.filelist
./drbd_config.h
*.gcda
*.gcno
*.o
drbd/drbd.ko
drbd/drbd.ko.unsigned
drbd/.*.cmd
drbd/compat/.*.cmd
drbd/.compat.h.d
drbd/.config.timestamp
drbd/.kernel.config.gz
drbd/.drbd_kernelrelease
drbd/.drbd_kernelrelease.new
drbd/.tmp_versions
drbd/Module.symvers
drbd/compat.h
drbd/drbd.mod.c
drbd/drbd_buildtag.c
drbd/modules.order
drbd/linux/drbd_config.h.orig
user/config.h
user/config.h.in
user/drbd_buildtag.c
user/drbd_strings.c
user/drbdadm
user/drbdadm_scanner.c
user/drbdmeta
user/drbdmeta_scanner.c
user/drbdsetup
documentation/drbd.8
documentation/drbd.conf.5
documentation/drbdadm.8
documentation/drbddisk.8
documentation/drbdmeta.8
documentation/drbdsetup.8
documentation/manpage.links
documentation/manpage.refs
documentation/drbdsetup_*.xml
benchmark/dm

drbd-8.4.4/COPYING

GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc. 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.
When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. 
GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. 
You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. 
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. 
However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. 
If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. 
If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS Appendix: How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. <one line to give the program's name and a brief idea of what it does.> Copyright (C) 19yy <name of author> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) 19yy name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. <signature of Ty Coon>, 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License.
drbd-8.4.4/ChangeLog

Latest:
------
 For even more detail, use "git log" or visit http://git.drbd.org/.

8.4.4 (api:genl1/proto:86-101)
--------
 * Changes from 8.3.16
 * fix potential deadlock when concurrently fencing and establishing a connection
 * fix potential distributed deadlock during verify or resync
 * fix decoding of bitmap vli rle for device sizes > 64 TB
 * fix for deadlock when using automatic split-brain-recovery
 * only fail empty flushes if no good data is reachable
 * avoid shrinking max_bio_size due to peer re-configuration
 * fix resume-io after reconnect with broken fence-peer handler
 * crm-fence-peer: less cib polling, peer-alive detection using crmadmin, and other improvements
 * OCF RA: avoid too tight pacemaker driven recovery loop on failed promote, implement adjust_master_score parameter
 * fix too large bursts for very slow resync rates
 * don't let application IO throttle resync to a stall
 * Fixed a hole in the challenge-response implementation
 * Fixed a theoretical IO deadlock (Only triggers with unusually small AL i.e.
7)
 * Fixed attaching to disks with fixed size external meta-data (This regression was introduced with 8.4.3)
 * Fixed a crash when the connection broke at a very exact point in time while a graceful disconnect executes
 * Support for REQ_DISCARD
 * Allow parallel promote/demote
 * Allow online changing of al-stripes and al-stripe-size with the resize command
 * drbdadm adjust now deals with IP-address changes correctly
 * Align the internal object model with drbd-9.0 (away from drbd-8.3)
 * New drbd-proxy option "bwlimit"; allow proxy options on host level
 * Compiles on Linux v3.10

8.4.3 (api:genl1/proto:86-101)
--------
 * Changes from drbd-8.3.14 and drbd-8.3.15
 * Do not end up in Unconnected state if the connection breaks at a certain point during the handshake (was introduced with 8.3.12)
 * Closed a race condition between promoting and connection handshake, that could lead to an inconsistent view of the peer's UUIDs
 * Fixed a race condition that could lead to a full sync, if connection breaks at a certain point while starting a regular resync
 * Fixed crm-fence-peer.sh for pacemaker 1.1.8
 * Fixed a list corruption for read requests that complete with an error after they were aborted
 * Fixed a kernel panic if the device size was an exact multiple of 128MiB
 * Fixed a potential memory allocation during deadlock online resize
 * Improve the handling of read-errors: Make sure that sectors that had a read error are overwritten with data from the peer on the next resync
 * Expose the data-generation-uuids through /sys/block/drbdXX/drbd/
 * The new flag --peer-max-bio-size for drbdmeta create-md is of use if you plan to use the device for a long time without a peer
 * Fixed a potential protocol error and resulting disconnect/reconnect, when a disk fails on the secondary node.
(Introduced with 8.4.2)
 * Do not fail empty flushes on diskless, primary node
 * Fixed calculation of master scores for single volume and unconfigured resources in the drbd.ocf RA
 * When the connection broke during a graceful disconnect, DRBD failed to call the fence-peer handler, if one was configured. Fixed.
 * Obey md-barrier setting when changed via the disk-options command
 * Non-blocking queuing of AL-updates; This change significantly improves the number of IOPs in case the workload does not fit into the configured AL size. (Backport from drbd-9)
 * New options for drbdmeta create-md --al-stripes and --al-stripe-size to create a larger and striped AL
 * Fixed drbdadm adjust of max-bio-bvecs
 * Fixed a memory leak of 336 bytes per attach/detach cycle
 * Fix a potential null-pointer dereference when configuring invalid resync-after dependencies
 * Compiles on Linux v3.7

8.4.2 (api:genl1/proto:86-101)
--------
 * Fixed IO resuming after connection was established before fence peer handler returned
 * Fixed an issue in the state engine that could cause state lockup with multiple volumes
 * Write all pages of the bitmap if it gets moved during an online resize operation. (This issue was introduced with 8.3.10)
 * Fixed a race condition that could cause DRBD to go through a NetworkFailure state during disconnect
 * Fixed a race condition in the disconnect code path that could lead to a BUG() (introduced with 8.4.0)
 * Fixed a write ordering problem on SyncTarget nodes for a write to a block that gets resynced at the same time.
The bug can only be triggered with a device that has a firmware that actually reorders writes to the same block (merged from 8.3.13)
 * Fixed a potential deadlock during restart of conflicting writes
 * Disable the write ordering method "barrier" by default, since it is not possible for a driver to find out if it works reliably since 2.6.36
 * All fixes that went into 8.3.13
 * Removed a null pointer access when using on-congestion policy on a diskless device
 * In case of a graceful detach under IO load, wait for the outstanding IO. (As opposed to aborting IOs as a forceful detach does)
 * Reinstate disabling AL updates with invalidate-remote (8.4.0 regression)
 * Reinstate the 'disk-barrier no', 'disk-flushes no', and 'disk-drain no' switches (8.4.0 regression)
 * Backported the request code from DRBD-9. Improves handling of many corner cases.
 * Support FLUSH/FUA bio flags
 * Made the establishing of connections faster
 * New option 'al-updates no' to disable writing transactions into the activity log. It is useful if you prefer a full sync after a primary crash, for improved performance of a spread out random write workload
 * Expose the data generation identifiers via sysfs
 * "--stop" option for online verify to specify a stop sector

8.4.1 (api:genl1/proto:86-100)
--------
 * Fixed a bug that might cause in-kernel list corruption triggered by simultaneous IO on multiple volumes in a single resource
 * Fixed a bug that might cause a kernel OOPS in the worker thread while the receiver tried to establish a connection (drbd-8.4.0 regression)
 * Fixed an issue in the receiver that could cause connection triggered by simultaneous IO on multiple volumes in a single resource
 * Consider the discard-my-data flag for all volumes
 * Fixed attaching to backing devices that do not support barriers/flushes, when barriers/flushes are not disabled by the configuration.
(drbd-8.4.0 regression)
 * Fixed a rare compatibility issue with DRBD versions older than 8.3.7 when negotiating the bio_size
 * Fixed a rare race condition where an empty resync could stall if pause/unpause events happen in parallel
 * Made the re-establishing of connections quicker, if it got a broken pipe once. Previously there was a bug in the code that caused it to waste the first successfully established connection after a broken pipe event.
 * crm-fence-peer.sh: Can now deal with multiple DRBD instances being in a master/slave group
 * Optional load balancing for read requests: new keyword "read-balance"

8.4.0 (api:genl1/proto:86-100)
--------
 * Fixed handling of read errors during online verify
 * Fix for connecting on high latency network links
 * Fixed state transitions if fence-peer handler returns after connection was established again
 * Go into inconsistent disk state with on-io-error=pass-on policy
 * Timeouts for requests processing on the peer (previously that worked only if the data socket was congested)
 * Reworked Linux backward compatibility mechanism
 * Conflicting write detection is now based on an interval tree, removed the hash-tables (necessary for the unlimited BIO sizes)
 * Removed the tracing framework
 * Support for multiple volumes (minors, block devices) per connection; up to 65536 volumes per connection supported
 * Reduced IO latencies during some state changes (esp.
start resync)
 * New on-disk format for the AL: double capacity; 4k aligned IO; same space
 * Multiple AL changes in a single transaction (precondition for unlimited BIO sizes)
 * DRBD no longer imposes any limit on BIO sizes
 * Removed DRBD's limits on the number of minor devices
 * DRBD's minors can now be removed (not only unconfigured)
 * Switched the user space interface from connector to generic netlink
 * drbdadm, configuration changes: volume sections; syncer section removed; bool options got yes/no values, that improves option inheritance; resource options
 * drbdsetup: new commands for creating and removing resources and minors
 * drbdsetup: new commands for changing disk options while the disk is attached; ...for changing net options while the connection is established
 * drbdsetup/drbdadm: the wire-protocol is now a regular connection option
 * Removed drbdadm option --force
 * IO freezing/thawing is done on connection (all volumes) level
 * fencing is done on connection (all volumes) level
 * Enforce application of activity log after primary crash in user space
 * Features from drbd-8.3: Allow detach from frozen backing devices with the new --force option; configurable timeout for backing devices by the new disk-timeout option
 * Renamed --dry-run of connect to --tentative; plus alias in drbdsetup
 * drbdadm got a "help" sub command, that shows the specific options
 * drbdadm now knows all drbdsetup options, and verify ...
 * drbdadm can now process all options in random order, and ignores the "--" separator; compatibility aliases with the old calling conventions; now it is compatible with the pre 8.4 way of calling.
 * New default values (compared to drbd-8.3) for: minor-count, ko-count, al-extents, c-plan-ahead, c-fill-target, c-min-rate, use-rle, on-io-error

8.3.10 (api:88/proto:86-96)
--------
 * Fixed a subtle performance degradation that might have affected synchronous workloads (databases) (introduced in 8.3.9)
 * Fixed a locking regression (introduced in 8.3.9)
 * Fixed on-no-data-accessible for Primary, SyncTarget nodes (Bugz 332)
 * Progress bar for online verify
 * Optionally use the resync speed control loop code for the online verify process as well
 * Added code to detect false positives when using data-integrity-alg
 * New config option on-congestion and new connection states ahead and behind
 * Reduced IO latencies during resync, bitmap exchange and temporal states
 * Only build a single kernel module package on distributions that provide the infrastructure to have kernel version independent modules
 * On 64bit architectures allow device sizes up to one petabyte

8.3.9 (api:88/proto:86-95)
--------
 * Fix for possible deadlock on IO error during resync
 * Fixed a race condition between adding and removing network configuration. Led to a BUG_ON() when triggered.
 * Fixed spurious full syncs that could happen after an empty resync and concurrent connection loss.
 * Fixed spurious full syncs that happened when connection got lost while one node was in WFSyncUUID state (Bugz 318)
 * Fixed a race in the meta-data update code path, that could lead to forgotten updates to the meta-data. That in fact could lead to unexpected behavior at the next connect
 * Fixed potential deadlock on detach
 * Fixed potential data divergence after multiple failures
 * Implicitly create unconfigured devices which are referenced in sync-after dependencies.
 * OCF RA now also works with pacemaker 1.1
 * Allow BIO sizes of up to 128kByte. Note: In case drbd-proxy is used, at least version 1.0.16 of drbd-proxy is required.
 * New configuration keyword on-no-data-accessible. Possible values io-error, and suspend-io.
The default is "io-error", which matches the previous behavior.
 * If the fencing policy is set to resource-and-stonith, the primary node creates the new current UUID _after_ the fencing handler returned. (Previously it did so immediately)
 * Rewrote the resync speed control loop code. New configuration parameters c-plan-ahead, c-fill-target, c-delay-target, c-max-rate, c-min-rate.
 * Disable activity log updates when all blocks of an unconnected device are out of sync. That can be activated by using "invalidate-remote" on an unconnected primary.
 * Improved IPv6 support: link local addresses
 * Improved resync speed display in /proc/drbd

8.3.8 (api:88/proto:86-94)
--------
 * Do not expose failed local READs to upper layers, regression introduced in 8.3.3
 * Fixed support for devices with 4k hard sector size (again)
 * Fixed a potential Oops in the disconnect code
 * Fixed a race condition that could cause DRBD to consider the peer's disk as Inconsistent after resync instead of UpToDate (Bugz 271)
 * Fixed a race condition that could cause DRBD to consider the peer's disk as Outdated instead of Inconsistent during resync (Bugz 277)
 * Disallow starting a resync with invalidate / invalidate-remote when the source disk is not UpToDate
 * Forcing primary works now also for Consistent, not only for Outdated and Inconsistent (Bugz 266)
 * Improved robustness against corrupt or malicious sector addresses when receiving data
 * Added the initial-split-brain handler; it also gets called if the split-brain gets automatically resolved
 * Added the --assume-clean option for the resize command, it causes drbd to not resync the new storage after an online grow operation
 * drbdadm: Do not segfault if stacked-on-top-of refers to an undefined res
 * drbdadm: Do not consider configs with invalid after statements as invalid
 * drbdadm: Do not segfault if the peer's proxy section is missing
 * drbdadm: Allow nullglob in include statement
 * drbdadm: Fixed the use of waitpid
 * init script: fix insserv headers
(Debian 576901)
 * Gave the receiving code the ability to use multiple BIOs for writing a single data packet; now DRBD works with BIOs up to 32kByte also on LVM devices; from now on the use_bmbv config option does nothing
 * New command check-resize, that allows DRBD to detect offline resizing and to move internal meta-data accordingly
 * Added a control loop that allows DRBD to auto-tune the resync speed on connections with large queues (drbd-proxy)
 * --dry-run option for connect; disconnects after sync handshake
 * --overwrite-data-of-peer got an alias named --force
 * Improvements to crm-fence-peer
 * Fixed option parsing and stacking in snapshot-resync-target-lvm.sh
 * Compiles on 2.6.33 and 2.6.34

8.3.7 (api:88/proto:86-91)
--------
 * Lots of fixes to the new RPM packaging
 * Lots of fixes to the autoconfig stuff
 * Following the rename of CONFIG_LBD to CONFIG_LBDAF
 * Silenced an assert. Could trigger after changing write ordering (Bugz 261)
 * Fixed a race condition between detach and ongoing IO. Very hard to trigger, caused an OOPS in make_request/drbd_make_request. (Bugz 262)
 * Fixed a regression in the resync handshake code introduced before 8.3.3. That bug causes DRBD to block during the initial handshake when a partial resync is not possible but a full resync is necessary. Happens very rarely. (Bugz 260)
 * Do not drop into StandAlone mode when connection is lost during authentication
 * Corrected a null test in the authentication code, found by coccinelle, thanks to upstream integration. The chance to trigger that was probably 10^-9.
* crm-fence-peer.sh is now also usable if DRBD is managed from the xen block helper script
* Fixes to the init script's dependencies
* Backported cleanups that were contributed to the in-kernel DRBD
* Allow online resizing of disconnected devices, new option to drbdsetup: drbdsetup /dev/drbdX resize --assume-peer-has-space
* Allow multiple after options in the syncer section for stacked setups
* Correctly process relative paths in include statements in drbd.conf
* New option (-t) for drbdadm to test syntax of config snippets
* Following Linux upstream changes 2.6.32 (SHASH and in_flight issues)
* New /etc/drbd.conf example that suggests the use of /etc/drbd.d/xxx.res

8.3.6 (api:88/proto:86-91)
--------
* Make sure that we ship all unplug events
* Introduced autoconf, new RPM packaging

8.3.5 (api:88/proto:86-91)
--------
* Fixed a regression introduced shortly before 8.3.3, which might cause a deadlock in DRBD's disconnect code path. (Bugz 258)
* Fixed drbdsetup X resume-io which is needed for the recovery from the effects of broken fence-peer scripts. (Bugz 256)
* Do not reduce master score of a current Primary on connection loss, to avoid unnecessary migrations
* Do not display the usage count dialog for /etc/init.d/drbd status

8.3.4 (api:88/proto:86-91)
--------
* Fixed a regression in the connector backport introduced with 8.3.3. Affected only kernels older than 2.6.14. I.e. RHEL4 and SLES9.
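
Each release entry above opens with a header such as `8.3.4 (api:88/proto:86-91)`: a version string, an API number, and either a single protocol number or a MIN-MAX range. The following is a minimal sketch for splitting such a header into its fields, assuming that layout holds for every entry. `parse_release_header` is a hypothetical helper name used only for illustration; it is not part of DRBD or its tools.

```shell
# Hypothetical helper (not part of DRBD): split a ChangeLog release header
# like "8.3.4 (api:88/proto:86-91)" into "version api proto_min proto_max".
# Uses only POSIX sh parameter expansion; variables are global because
# POSIX sh has no "local".
parse_release_header() {
    line=$1
    # Reject anything that does not look like a release header.
    case $line in
        [0-9]*" (api:"*"/proto:"*")"*) ;;
        *) return 1 ;;
    esac
    version=${line%% *}          # text before the first space
    rest=${line#*"(api:"}        # drop everything up to "(api:"
    api=${rest%%/*}              # digits before the "/"
    rest=${rest#*"/proto:"}      # drop everything up to "/proto:"
    proto=${rest%%")"*}          # "86-91" or just "86"
    proto_min=${proto%%-*}       # part before the dash (or the whole value)
    proto_max=${proto##*-}       # part after the dash (or the whole value)
    printf '%s %s %s %s\n' "$version" "$api" "$proto_min" "$proto_max"
}

parse_release_header '8.3.4 (api:88/proto:86-91)'
```

When a header carries a single protocol number, as in `8.0.16 (api:86/proto:86)`, the sketch reports that number as both minimum and maximum. Bullet lines are rejected with a non-zero exit status.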
8.3.3 (api:88/proto:86-91)
--------
* Correctly deal with large bitmaps (Bugz 239, 240)
* Fixed a segfault in drbdadm's parser for unknown sync-after dependencies
* DRBD_PEER was not set for handlers (introduced in 8.3.2) (Bugz 241)
* Fixed a bug that could cause reads off diskless DRBD devices to get very slow
* Fixed a deadlock possible when IO errors occur during resync (Bugz 224)
* Do not do a full sync in case P_SYNC_UUID packet gets lost (Bugz 244)
* Do not forget a resync in case the last ACK of a resync gets lost
* The UUID compare function now handles more cases when connection/disk got lost during UUID updates (Bugz 251, 254)
* If a resource gets renamed (only) update its /dev/
* drbdsetup get-gi/show-gi sometimes warned about unknown tags (Bugz 253)
* Autotune sndbuf-size and rcvbuf-size by default
* Fixed many spelling errors
* Improvements on the crm-fence-peer Pacemaker integration
* Do not upgrade a Consistent disk to UpToDate when the fence-peer handler can not reach the peer (Bugz 198)
* Support for Infiniband via SDP (sockets direct protocol)
* Install bash completion stuff on SLES11
* Following Linux upstream changes 2.6.31

8.3.2 (api:88/proto:86-90)
--------
* Fixed the "Not a digest" issue for hash functions already ported to shash
* Fixed a race condition between device configuration and de-configuration
* Fixed: The sync-after logic modified flags of an unconfigured device. This caused very weird symptoms later. (Bugz 214)
* Fixed a possible imbalance of the 'pe' counter during online verify
* Fixed activity-log reading; could have been (partially) ignored, leading to incomplete resync after primary crash.
* Fixed a deadlock when using automatic split brain recovery
* Fixed a possible kernel crash in DRBD on highmem kernels. Was triggered by reading in the bitmap on one device, and writing data to another, disconnected device at the same time.
* Fix for potential segfaults in drbdadm's sh-status & status commands
* Correctly clean up resync status if a detach interrupts a resync (Bugz 223)
* Reasonable error reporting if 'drbdadm invalidate' fails because resync already runs
* New module parameter: disable_sendpage. Workaround for a Xen on DRBD issue
* Allow detach while being SyncTarget (Bugz 221)
* Optional RLE compression of the bitmap exchange ('use-rle' keyword)
* Rewrite of the LRU code
* Allow skipping the initial resync: 'drbdadm -- --clear-bitmap new-current-uuid'
* Upon request of Dolphin NICS: The 'sci' keyword is now called 'ssocks'
* Allow more than two host ('on' keyword) sections, new drbdadm option '--peer'
* New, alternate keyword 'floating' for host sections, for matching by ip-address
* An OCF resource agent for pacemaker [Marked as BETA for this release]
* crm-fence-peer.sh: A fence peer handler using pacemaker constraints
* /etc/init.d/drbd stop now works completely independently of the config file
* Arbitrary custom device names (prefix drbd_ required)
* Code cleanups for Linux mainline submission (no functional change)
* Using Linux's own tracing framework instead of our own
* Compatibility with Linux 2.6.30 and 2.6.31-rc1

8.3.1 (api:88/proto:86-89)
--------
* Fixed drbdadm invalidate on disconnected devices (reg in 8.2.7)
* Fixed a hard to trigger spinlock deadlock when using device stacking with the upper device having a smaller minor number than the lower device.
(Bugz 210)
* Adding a missing range check in ensure_mdev()
* Implemented a congested_fn; the kernel can keep its pdflushes running now
* Improvements to the connection code for high latency links
* Fix for several potential memory leaks when allocating a device
* Use an additional meta data bit to store the fact of an old crashed primary
* Udev rule that populates /dev/drbd/by-res/ and /dev/drbd/by-disk/
* New timeout option: outdated-wfc-timeout
* New drbdmeta option: --ignore-sanity-checks
* Include statement for drbd.conf
* Improvements to drbd-overview.pl
* Fixed snapshot-resync-target-lvm.sh to work with more than 10 devices
* Do not force a full resync after a detach on a primary node
* Compatibility with Linux 2.6.27, 2.6.28 and 2.6.29

8.3.0 (api:88/proto:86-89)
--------
* Fixed 'sleep with spinlock held' in case online verify found a difference
* Fixed error code paths in request processing.
* Fix for stack smashing in drbdmeta
* Fixed a bug that could lead to a crash when detaching/attaching on the primary under heavy IO (Bugz 171)
* Fixed a bug in the new epoch code (introduced with 8.2.7). Might cause crash at reconnect after connection loss during heavy IO (Bugz 160)
* Fixed a bug in drbdsetup that could cause drbdsetup wait-connect to miss the connection event.
* Fixed a race condition in the new barrier code.
(Reordered barrier ACKs)
* Do not rely on blkdev_issue_flush() returning ENOTSUPP
* bitmap in unmapped pages = support for devices > 4TByte (was DRBD+)
* checksum based resync (was DRBD+)
* support for stacked resource (was DRBD+)
* Added support for stacked resources to the bash completion stuff
* Added missing documentation (manpages)
* Fixed drbdadm handlers for stacked resources
* Support of drbd-proxy in stacked setups
* RedHat cluster suite (rgmanager) integration scripts
* Renamed 'state' to 'role'
* More build compatibility with older vendor kernels
* Added drbd-overview.pl to the packages

8.2.7 (api:88/proto:86-88)
--------
* Fixed possible Oops on connection loss during sync handshake
* Fixed various possible deadlocks in the disconnect/reconnect and online-verify code
* Fixed possible access-after-free
* Added support for TCP buffer autotuning
* Added support for AF_SCI aka "Super Sockets"
* Added support for IPv6
* latency improvements
* Support for using barriers to enforce write ordering on the secondary node. New config options: no-disk-barrier, no-disk-drain
* Merged all changes from 8.0.12 -> 8.0.14 into 8.2

8.2.6 (api:88/proto:86-88)
--------
* The details of the LRU data structures are now hidden from /proc/drbd but can be re-enabled by echoing 1 to /sys/module/drbd/parameters/proc_details
* Bash completion support for drbdadm (Installs on all distributions but RedHat, since they lack the /etc/bash_completion.d directory)
* Fixed the out-of-sync handler; it never fired.
* Added the before-resync-target handler, and a default implementation of a handler that snapshots the target of a resynchronisation in case it is an LVM2 logical volume.
* Improved error messages and documentation in regards to false positives of online verify and data-integrity-alg mechanisms in the presence of ReiserFS or the Linux swap code.
* Added the max-bio-bvecs option to work around issues in a stack of DRBD/LVM/Xen.
* Merged all changes from 8.0.11 -> 8.0.12 into 8.2
* Fixed online resizing in case it is triggered from the secondary node.

8.2.5 (api:88/proto:86-88)
--------
* Fixed a race between online-verify and application writes. It caused drbd to report false positives, and very likely deadlocked immediately afterwards.
* When DRBD is built for kernels older than 2.6.16, mutexes are provided by a wrapper include file which is shipped with DRBD. We had a bug in there that caused one of DRBD's threads to lock up after the first connection loss. Fixed.
* Merged all changes from 8.0.8 -> 8.0.11 into 8.2

8.2.4 (api:88/proto:86-88)
--------
* Fixed the online-verify and data-integrity-alg features. While preparing DRBD for Linux-2.6.24 a bug was introduced that rendered most digest based functionality in DRBD useless.

8.2.3 (api:88/proto:86-88)
--------
* Released the online-verify feature from DRBD+ into drbd-8.2
* Fixed the data-integrity-alg feature to work correctly in 'allow-two-primaries' mode.
* Significant latency improvement: Implemented sane default CPU bindings (affinity mask) of threads, and added the tuning option 'cpu-mask'.
* A completely new drbdmeta - finally dealing with all drbd-0.7 to drbd-8.x conversion cases correctly.
* Merged all changes from 8.0.7 -> 8.0.8 into 8.2

8.2.1 (api:86/proto:86-87)
--------
* Fixed the OOPS when _not_ using a shared secret on kernels before Linux-2.6.19 (=old crypto API).
* Support for drbd-proxy's configuration directives in drbdadm.
* Look for /etc/drbd-82.conf before drbd-08.conf before drbd.conf
* drbdadm now allows one to move the device, disk and meta-disk statements out of the host ("on") sections into the resource section. This allows you to shorten your config files.
* Merged all changes from 8.0.6 -> 8.0.7 to 8.2

8.2.0 (api:86/proto:86-87)
--------
* Branch for new features after 8.1 and 8.0. We will do a number of features that do not fiddle with the general architecture of DRBD.
This will be like the current Linux-2.6 development model.
* Implemented the data-integrity-alg option. When this is set to one of the kernel's hash algorithms, such a hash is shipped with every user-data carrying packet. In case user-data is corrupted on the network DRBD protests by dropping the connection.

Changelog for fixes propagated from 8.0.x:
------------------------------------------

8.0.16 (api:86/proto:86)
--------
* Fixed the init script to work with all flavors of LSB

8.0.15 (api:86/proto:86)
--------
* Adding a missing range check in ensure_mdev()
* Fix for several potential memory leaks when allocating a device (Bugz 135)
* Fixed error code paths in request processing (Bugz 149)
* Fixed a minor issue with the exposed data UUID logic (Bugz 164)
* Fixed tight attach/detach loops (Bugz 171)
* Fixed drbdadm adjust, when a running config contains "--discard-my-data" (Bugz 201)
* Fix stack smashing in drbdmeta
* Adding a missing range check in ensure_mdev() (Bugz 199)
* Compatibility with Linux 2.6.27 and 2.6.28

8.0.14 (api:86/proto:86)
--------
* fixed some access-after-free corner cases
* fixed some potential deadlocks on state changes
* don't use bounce buffers (reduce unnecessary buffer page bounces on 32bit)
* properly serialize IO operations on the whole bitmap
* reduce race possibilities in our thread restart code
* linux 2.6.27 compatibility
* latency improvements using TCP_QUICKACK
* reduced spurious coding differences between drbd-8.0 and drbd-8.2
* drbdsetup now checks for /proc/drbd instead of playing netlink ping-pong to determine whether the drbd kernel module is present.
* fixed (harmless but annoying) stack corruption in drbdadm
* adjusted timeouts on 'detach' and 'down'
* fixed unit conversion of disk size configuration parameter
* fixed drbdadm/drbdsetup default unit mismatch for disk size configuration parameter
* drbd.spec file update
* documentation update

8.0.13 (api:86/proto:86)
--------
* Fixed online resizing if there is application IO on the fly when the resize is triggered.
* Fixed online resizing if it is triggered from the secondary node.
* Fixed a possible deadlock in case "become-primary-on-both" is used, and a resync starts
* Fixed the invocation of the pri-on-incon-degr handler
* Fixed the exit codes of drbdsetup
* sock_create_lite() to avoid a socket->sk leak
* Auto-tune socket buffers if sndbuf-size is set to zero
* Made it compile on Linux-2.6.26

8.0.12 (api:86/proto:86)
--------
* Corrected lock-out of application IO during bitmap IO. (Only triggered issues with multi-terabyte volumes)
* If an attach would cause a split-brain, abort the attach, do not drop the connection
* A node without data (no disk, no connection) only accepts data (attach or connect) if that data matches the last-known data
* Fixed various race conditions between state transitions
* Various bugfixes to issues found by using the sparse tool
* Corrected the exit codes of drbdsetup/drbdadm to match the expectations of dopd (drbd-outdate-peer-daemon)
* Corrected the online changing of the number of AL extents while application IO is in flight.
* Two new config options no-disk-flushes and no-md-flushes to disable the use of io subsystem flushes and barrier BIOs.
* Make it compile on Linux-2.6.25
* Support for standard disk stats
* Work on stalling issues of the resync process
* drbdsetup /dev/drbdX down no longer fails for non-existing minors
* Added wipe-md to drbdadm

8.0.11 (api:86/proto:86)
--------
* If we had no IO while a device was connected it could happen that the pending count was erroneously decreased to -1 at disconnect. Fixed.
* Fixed a race that could deadlock worker and receiver while disconnecting a diskless node from a primary peer.
* Fixed a minimal memory leak, upon each module unload of DRBD.

8.0.10 (api:86/proto:86)
--------
* Fixed a race condition in the disconnect code path that could cause the pending count to not return to zero. This means that the next role change will block forever.

8.0.9 (api:86/proto:86)
--------
* In case our backing devices support write barriers and cache flushes, we use these means to ensure data integrity in the presence of volatile disk write caches and power outages.
* Fixed DRBD to no longer log "Sync bitmap getting lost (drbd_bm_set_bits_in_irq: (!b->bm))" endlessly after the local disk was lost.
* Fixed protocol A for BIOs larger than the page size. If you hit the bug, DRBD would simply lose connection.
* Don't accidentally truncate the device in case a secondary with a too small disk gets connected.
* Changed state processing so that state changes visible via the drbdsetup events interface are always in the right order.
* Made drbddisk's exit codes more LSB compliant.
* Back-ported the new drbdmeta to drbd-8.0 (from drbd-8.2).
* More robustness to the code that establishes connections.
* Made '/etc/init.d/drbd status' print a nice overview.
* Made it compile on Linux-2.6.24.

8.0.8 (api:86/proto:86)
--------
* Improvements to the bitmap code. (The old code might re-enable interrupts by accident)
* Fixed an endianness issue in the write protocol. C bit-fields might be laid out differently on little/big endian machines.
* Drbdadm's adjust sometimes forgot to adjust option values that were inherited from the common section, fixed that.
* Removed dopd. It gets (and should be) shipped with heartbeat.
* When peer authentication is enabled, you could trick drbd into sending a state report packet before it authenticated itself. Fixed that.
* Added robustness to resync pause/continue.
* Drbdsetup should not report a random error if no netlink answer is received from the drbd module.
* Fixes to error code paths. ( drbd_drain_block() and lc_alloc() )
* Fixed a possible OOPS in case one manages to lose disk and network concurrently. (iSCSI disk and network over same switch)
* Fixed the broadcasting of state change events to userspace.

8.0.7 (api:86/proto:86)
--------
* Fixed drbdmeta's conversion of 07 style meta data.
* Handle the failure of vmalloc() in the bitmap code more gracefully.
* Do not pause resync on unconfigured devices.
* Added missing pieces of the implementation of the "pri-lost" handler.
* Added the "split-brain" handler.
* Drop the network config after failure to authenticate.
* Made it compile on Linux-2.6.24-rc1.
* Fixed an unlikely race that could cause a device to get stuck in SyncPause.
* Online resizing failed to start resync properly (although it set up all the meta data correctly). Fixed that.
* Minor improvements to documentation and error messages.

8.0.6 (api:86/proto:86)
--------
* Fixed DRBD to not deadlock while doing bitmap updates on Linux 2.6.22 and later.
* Make it compile on Linux-2.6.22 and later.
* Removed a hard coded path to docbook DTDs from our SGML files, maybe improving the situation with building the documentation.
* When a drbd connect attempt is accepted by another program that simply closes the socket, drbd stays for some seconds in the "BrokenPipe" network state. When one removed the network config during that time, drbd OOPSed. This is fixed now.
* drbdmeta can now also initialize meta data on meta devices smaller than 128MB.
* Added an explicit NULL argument to our ioctl() calls in drbdmeta.
* Added scripts/block-drbd, which is a nice way to hand over DRBD role assignment to Xen, allowing one to do Xen live migrations in a sane way.
* Added scripts/pretty-proc-drbd.sh
* Added an option to drbd.conf which instructs the init script to promote DRBD devices to primary role upon machine start up.
8.0.5 (api:86/proto:86)
--------
* Changed the default behavior of the init script. Now the init script terminates in case the devices refuse to connect because they had a split brain. Introduced an option to preserve the old behavior.
* Fixed a bug where the local_cnt could get imbalanced upon a state change.
* Fixed a bug in the UUID algorithm that could lead to both sides being in cs:WFBitMapT state. It was triggered when the disk on the SyncTarget gets detached and attached.
* Implemented proper size checking on strings that get communicated with DRBD's netlink protocol.
* Changed the maximal length of device names from 32 characters to 128 characters. (udev based disk names might be very long nowadays)
* Fixed the after-sb-0pri policies discard-younger/discard-older
* When the resync speed was changed to a considerably lower value while resync was running, it could happen that we erroneously decremented the local_cnt too often.
* Fixed a bug in the UUID code that caused drbd to erroneously report a split brain after changing the role of a diskless node multiple times.
* Both nodes ended up in SyncSource when a state change occurred on one node while the disk state on the other node was in the temporary 'Negotiating' state. Fixed.
* drbdmeta's parse/scan code for meta-data dumps got fixed for huge devices, and got improved error reporting.
* drbdadm had difficulties with option values of type string that start with a digit. Fixed.
* Fixed a code path that should make it possible to unload the module even in case some of our receive buffers leaked.
* The usermode helper program is now user definable. It is no longer hardcoded to 'drbdadm'.

8.0.4 (api:86/proto:86)
--------
* Fixed an OOPS in case you do an invalidate on a diskless device. And made invalidates on diskless devices possible by using drbdmeta.
* Fix for a possible OOPS in drbd_al_to_on_disk_bm().
* Fix for a possible OOPS.
This issue was triggered when you do an attach very soon (ms) after the disk was dropped.
* Fix for a race condition in receive_state(). Symptom was that the resync stalls at 100% on a node.
* Some block devices fail requests by clearing the BIO_UPTODATE flag (that is ok), but not returning an error (that is strange). We now deal with that correctly.
* Drbdadm's parser will now reject config files with resources with missing "on" sections. (Instead of segfaulting)
* Init script continues in case the setup of a single resource fails.
* Improvements to the "drbdsetup events" interface: Updates about the resync progress and initial state of all devices if called with "-a".
* The benchmark/dm program can now also create custom bandwidth loads.

8.0.3 (api:86/proto:86)
--------
* Fixed a race condition that could cause us to continue to traverse a bio after it was freed. (led to an OOPS)
* Fixed a race condition that could cause us to use members of an ee, after it was freed. (led to various weirdness)
* Language fixes for the man pages.
* The drbdsetup commands (events, wait-connect,...) release the lock now.
* Minor fixes and updates to the user land tools and to the peer outdater.

8.0.2 (api:86/proto:86)
--------
* Removed a bug that could cause an OOPS in drbd_al_to_on_disk_bm()
* Improved the robustness of the UUID based algorithm that decides about the resync direction.
* Fixed the error handling in case the open() of a backing block device fails.
* Fixed a race condition that could cause a "drbdadm disconnect" to hang.
* More verbosity in case we cannot claim a backing block device.
* More verbosity and paranoia in the bitmap area.
* On some vendor kernels 8.0.1 did not load because of kzalloc. Fixed.
* Fault injection can now not only be turned on or off, but can be enabled on a per device basis.
* Fixed the scripts and files needed to build drbd into a kernel.
8.0.1 (api:86/proto:86)
--------
* Fixed some race conditions that could trigger an OOPS when the local disk fails and DRBD detaches itself from the failing disk.
* Added a missing call to drbd_try_outdate_peer().
* LVM's LVs expose ambiguous queue settings. When a RAID-0 (md) PV is used they present a segment size of 64k but at the same time allow only 8 sectors. Fixed DRBD to deal with that fact correctly.
* New option "always-asbp" to also use the after-split-brain-policy handlers, even if it is not possible to determine from the UUIDs that the data of the two nodes was related in the past.
* More verbosity in case a bio_add_page() fails.
* Replaced kmalloc()/memset() with kzalloc(), and a wrapper for older kernels
* A fast version of drbd_al_to_on_disk_bm(). This is necessary for short (even sub-second) switchover times while having large "al-extents" settings.
* Fixed drbdadm's array overflows (of on stack objects)
* drbdsetup can now dump its usage in an XML format
* New init script for gentoo
* Fixed typos in the usage of /proc/sysrq-trigger in the example config.

8.0.0 (api:86/proto:86)
--------
* No effective changes to rc2.

8.0rc2 (api:86/proto:86)
--------
* Added the well-known automagic adjustment of drbd_config.h to make drbd compile on every kernel defaced by vendor backports. ( Linux-2.6.x only of course )
* Fixed races with starting and finishing resync processes while heavy application IO is going on.
* Ported DRBD to the new crypto API (and added an emulation of the new API on top of the old one for older 2.6.x kernels)
* Code to perform better on Ethernet networks with jumbo frames.
* Bugfixes to our request code (race conditions).
* Every error code that is returned by drbdsetup has a textual description by now.

8.0rc1 (api:86/proto:85)
--------
* The drbd-peer-outdater got updated to work in multi node heartbeat clusters. (But we still have not succeeded in getting this accepted into Heartbeat's repository.)
* Fixed resync decision after a crash in a pri-pri cluster.
* Implemented the ping-timeout option for "sub-second" failover clusters.
* Implemented all the "violently" options in the reconnect handling.
* Updated man pages of drbd.conf and drbdsetup.
* Removed the "self-claiming" on secondary nodes.
* Fixed an uncountable number of bugs.

8.0pre6 (api:85/proto:85)
--------
* All panic() calls were removed from DRBD.
* IO errors while accessing the backing storage device are now handled correctly.
* Conflict detection for two primaries is in place and tested.
* More tracing stuff
* Lots of bugs found and fixed

8.0pre5 (api:84/proto:83)
--------
* The request code was completely rewritten.
* The write conflict detection code for primary-primary is currently broken, but will be fixed soon.
* drbdsetup is no longer based on IOCTL but works now via netlink/connector.
* drbd_panic() is on its way out.
* A runtime configurable tracing framework got added.
* A lot of effort was put into finding and fixing bugs.

8.0pre4 (api:83/proto:82)
--------
* Added the "drbd-peer-outdater" heartbeat plugin.
* New ("cluster wide") state changes. (Cluster wide serialization of major state changes, like becoming primary, invalidating a disk etc...)
* Write requests are now sent by the worker instead of from the context of the process that calls make_request().
* The worker thread no longer gets restarted upon loss of connection.
* A test suite developed by students of 'FH Hagenberg' was added.

8.0pre3 (api:82/proto:80)
--------
* Now it works on device mapper (LVM) as well as on "real" block devices.
* Finally (after years) a sane "drbdadm adjust" implementation, which is really really robust.
* Fixes for 64bit kernel / 32 bit userland environments
* Fixes in the sys-v init script
* Renamed "--do-what-I-say" to "--overwrite-data-of-peer". Hopefully people now understand what this option does.
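
The `proto:` field in the headers above is either a single number or a MIN-MAX range, and it climbs with almost every 8.0 pre-release. The sketch below checks whether two such fields share at least one protocol number. It rests on an assumption that this changelog does not state: that an overlap in the ranges is what matters when comparing two releases. `proto_ranges_overlap` is a hypothetical helper name, not a DRBD command.

```shell
# Hypothetical helper (not part of DRBD): succeed if two "proto" fields
# share at least one protocol number.
# Each argument is "N" or "MIN-MAX", as in "proto:86" or "proto:86-91".
# Assumption: overlapping ranges are the property of interest.
proto_ranges_overlap() {
    a_min=${1%%-*}; a_max=${1##*-}   # a bare "N" yields min = max = N
    b_min=${2%%-*}; b_max=${2##*-}
    [ "$a_min" -le "$b_max" ] && [ "$b_min" -le "$a_max" ]
}

# Values taken from the 8.3.7 and 8.0.16 headers in this changelog.
if proto_ranges_overlap 86-91 86; then
    echo "8.3.7 and 8.0.16 share a protocol version"
fi
```

By the same check, `proto:80` (8.0pre3) and `proto:82` (8.0pre4) do not overlap.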
8.0-pre2 (api:81/proto:80)
--------
* Removed the "on-disconnect" and "split-brain-fix" config options and added the "fencing" config option instead.
* Updated all man pages to cover drbd-8.0
* /proc/drbd shows the whole drbd_state_t, and the logging of state changes now shows every field of drbd_state_t as well.
* Deactivated most of the TCQ code for now, since it changed again in the mainline kernel.
* Minor other fixes.

8.0_pre1 (api:80/proto:80)
--------
* Removed support for Linux-2.4.x
* Cleanup of the wire protocol.
* Added optional peer authentication with a shared secret.
* Consolidated state changes into a central function.
* Improved, tunable after-split-brain recovery strategies.
* Always verify all IDs used in the protocol that are used as pointers.
* Introduced the "outdate" disk state, and commands for managing it.
* Introduced the "drbdmeta" command, and require the user to create meta-data explicitly.
* Support for primary/primary (for OCFS2, GFS...)
* Replaced the sync-groups with the sync-after mechanism.
* The "common" section in the configuration file.
* Replaced the generation counters (GCs) with data-generation-UUIDs
* Improved performance by using Linux-2.6's BIOs with up to 32k per IO request. Before we transferred only up to 4k per IO request.
* A warning if the disk sizes are more than 10% different.
* A connection teardown packet to differentiate between a crash of the peer and a peer that is shut down gracefully.
* Externally imposable SyncPause states, to serialize DRBD's resynchronization with the resynchronization of backing storage's RAID configurations.
* Backing storage can be hot added to diskless nodes.
* Prepared for advanced integration to Heartbeat-2.0
* Changed internal APIs so that missed writes of the meta-data super block are reported as they happen.
* The http://usage.drbd.org sub project.
* Rewrote the scanner/parser of drbd.conf. 10 times smaller/faster and easier to maintain.
* Asynchronous meta-data IO
[ Code drop from the DRBD+ branch ]
drbd-8.4.4/Makefile.in0000664000000000000000000002204112132747531013213 0ustar rootroot
# Makefile for drbd
#
# This file is part of DRBD by Philipp Reisner and Lars Ellenberg.
#
# Copyright (C) 2001-2008, LINBIT Information Technologies GmbH.
# Copyright (C) 2001-2008, Philipp Reisner .
# Copyright (C) 2002-2008, Lars Ellenberg .
#
# drbd is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# drbd is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with drbd
# the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
#

# TODO move some of the more cryptic bash scriptlets here into scripts/*
# and call those from here. -- lge

# variables set by configure
GIT = @GIT@
LN_S = @LN_S@
PREFIX = @prefix@
RPMBUILD = @RPMBUILD@
SED = @SED@

# features enabled or disabled by configure
WITH_UTILS = @WITH_UTILS@
WITH_KM = @WITH_KM@
WITH_UDEV = @WITH_UDEV@
WITH_XEN = @WITH_XEN@
WITH_PACEMAKER = @WITH_PACEMAKER@
WITH_HEARTBEAT = @WITH_HEARTBEAT@
WITH_RGMANAGER = @WITH_RGMANAGER@
WITH_BASHCOMPLETION = @WITH_BASHCOMPLETION@

# default for KDIR/KVER
ifndef KVER
 ifndef KDIR
KVER = `uname -r`
KDIR = /lib/modules/$(KVER)/build
 else
KVER := $(shell make -s -C $(KDIR) kernelrelease)
 endif
endif
KDIR ?= /lib/modules/$(KVER)/build

# for some reason some of the commands below only work correctly in bash,
# and not in e.g. dash. I'm too lazy to fix it to be compatible.
SHELL=/bin/bash

SUBDIRS = user scripts documentation drbd

REL_VERSION := $(shell $(SED) -ne '/^\#define REL_VERSION/{s/^[^"]*"\([^ "]*\).*/\1/;p;q;}' drbd/linux/drbd_config.h)
ifdef FORCE
#
# NOTE to generate a tgz even if too lazy to update the changelogs,
# or to forcefully include the FIXME to be done: latest change date;
# for now, include the git hash of the latest commit
# in the tgz name:
#   make distclean doc tgz FORCE=1
#
REL_VERSION := $(REL_VERSION)-$(shell $(GIT) rev-parse HEAD)
endif

DIST_VERSION := $(REL_VERSION)
FDIST_VERSION := $(shell test -e .filelist && sed -ne 's,^drbd-\([^/]*\)/.*,\1,p;q' < .filelist)
ifeq ($(FDIST_VERSION),)
FDIST_VERSION := $(DIST_VERSION)
endif

all: tools module

.PHONY: check-kdir
check-kdir:
ifeq ($(WITH_KM),yes)
	@if ! test -e $(KDIR)/Makefile ; then \
		echo " SORRY, kernel makefile not found." ;\
		echo " You need to tell me a correct KDIR," ;\
		echo " Or install the necessary kernel source packages." ;\
		echo "" ;\
		false;\
	fi
endif

.PHONY: module
module: check-kdir
ifeq ($(WITH_KM),yes)
	@ $(MAKE) -C drbd KVER=$(KVER) KDIR=$(KDIR)
	@ echo -e "\n\tModule build was successful."
endif

.PHONY: tools
tools:
ifeq ($(WITH_UTILS),yes)
	@ set -e; for i in $(patsubst drbd,,$(SUBDIRS)); do $(MAKE) -C $$i ; done
	@ echo -e "\n\tUserland tools build was successful."
endif

doc:
	$(MAKE) -C documentation doc

doc-clean:
	$(MAKE) -C documentation doc-clean

install: install-tools
ifeq ($(WITH_KM),yes)
	$(MAKE) -C drbd install
endif

install-tools:
	@ set -e; for i in $(patsubst drbd,,$(SUBDIRS)); do $(MAKE) -C $$i install; done

clean:
	@ set -e; for i in $(SUBDIRS); do $(MAKE) -C $$i clean; done
	rm -f *~
	rm -rf dist

distclean:
	@ set -e; for i in $(SUBDIRS); do $(MAKE) -C $$i distclean; done
	rm -f *~ .filelist
	rm -rf dist
	rm -f config.log
	rm -f config.status

uninstall:
	@ set -e; for i in $(SUBDIRS); do $(MAKE) -C $$i uninstall; done

check_changelogs_up2date:
	@ up2date=true; dver_re=$(DIST_VERSION); dver_re=$${dver_re//./\\.}; \
	echo "checking for presence of $$dver_re in various changelog files"; \
	in_changelog=$$(sed -n -e '0,/^%changelog/d' \
		-e '/- '"$$dver_re"'-/p' < drbd.spec.in) ; \
	if test -z "$$in_changelog" ; \
	then \
		echo -e "\n\t%changelog in drbd.spec.in needs update"; \
		up2date=false; fi; \
	in_changelog=$$(sed -n -e '0,/^%changelog/d' \
		-e '/- '"$$dver_re"'-/p' < drbd-km.spec.in) ; \
	if test -z "$$in_changelog" ; \
	then \
		echo -e "\n\t%changelog in drbd-km.spec.in needs update"; \
		up2date=false; fi; \
	in_changelog=$$(sed -n -e '0,/^%changelog/d' \
		-e '/- '"$$dver_re"'-/p' < drbd-kernel.spec.in) ; \
	if test -z "$$in_changelog" ; \
	then \
		echo -e "\n\t%changelog in drbd-kernel.spec.in needs update"; \
		up2date=false; fi; \
	if ! grep "^$$dver_re\>" >/dev/null 2>&1 ChangeLog; \
	then \
		echo -e "\n\tChangeLog needs update"; \
		up2date=false; fi ; \
	if ! grep "^AC_INIT(DRBD, $$dver_re" >/dev/null 2>&1 configure.ac; \
	then \
		echo -e "\n\tconfigure.ac needs update"; \
		up2date=false; fi ; \
	if ! grep "^drbd8 (2:$$dver_re-" >/dev/null 2>&1 debian/changelog; \
	then \
		echo -e "\n\tdebian/changelog needs update [ignored]\n"; \
		: do not fail the build because of outdated debian/changelog ; fi ; \
	$$up2date

# XXX this is newly created whenever the toplevel makefile does something.
# however it is NOT updated when you just do a make in user/ or drbd/ ... # # update of drbd_buildtag.c is forced: .PHONY: drbd/drbd_buildtag.c drbd/drbd_buildtag.c: $(MAKE) -C drbd drbd_buildtag.c # update of .filelist is forced: .PHONY: .filelist .filelist: @$(GIT) ls-files | sed '$(if $(PRESERVE_DEBIAN),,/^debian/d);s#^#drbd-$(DIST_VERSION)/#' > .filelist @[ -s .filelist ] # assert there is something in .filelist now @find documentation -name "[^.]*.[58]" -o -name "*.html" | \ sed "s/^/drbd-$(DIST_VERSION)\//" >> .filelist ; \ echo drbd-$(DIST_VERSION)/drbd_config.h >> .filelist ; \ echo drbd-$(DIST_VERSION)/drbd/drbd_buildtag.c >> .filelist ; \ echo drbd-$(DIST_VERSION)/.filelist >> .filelist ; \ echo drbd-$(DIST_VERSION)/configure >> .filelist ; \ echo drbd-$(DIST_VERSION)/user/config.h.in >> .filelist ; \ echo "./.filelist updated." # tgz will no longer automatically update .filelist, # so the tgz and therefore rpm target will work within # an extracted tarball, too. # to generate a distribution tarball, use make tarball, # which will regenerate .filelist tgz: test -e .filelist $(LN_S) -f drbd/linux/drbd_config.h drbd_config.h rm -f drbd-$(FDIST_VERSION) $(LN_S) . 
drbd-$(FDIST_VERSION)
	for f in $$(<.filelist) ; do [ -e $$f ] && continue ; echo missing: $$f ; exit 1; done
	grep debian .filelist >/dev/null 2>&1 && _DEB=-debian || _DEB="" ; \
	tar --owner=0 --group=0 -czf - -T .filelist > drbd-$(FDIST_VERSION)$$_DEB.tar.gz
	rm drbd-$(FDIST_VERSION)

ifeq ($(FORCE),)
tgz: check_changelogs_up2date doc
endif

check_all_committed:
	@$(if $(FORCE),-,)modified=`$(GIT) ls-files -m -t`; \
	if test -n "$$modified" ; then \
		echo "$$modified"; \
		false; \
	fi

prepare_release:
	$(MAKE) tarball
	$(MAKE) tarball PRESERVE_DEBIAN=1

configure: configure.ac
	autoheader
	autoconf

tarball: check_all_committed distclean doc configure .filelist
	$(MAKE) tgz

all module tools doc .filelist: drbd/drbd_buildtag.c

kernel-patch: drbd/drbd_buildtag.c
	set -o errexit; \
	kbase=$$(basename $(KDIR)); \
	d=patch-$$kbase-drbd-$(DIST_VERSION); \
	test -e $$d && cp -fav --backup=numbered $$d $$d; \
	bash scripts/patch-kernel $(KDIR) . > $$d

ifdef RPMBUILD
drbd.spec: drbd.spec.in configure
	./configure --enable-spec

drbd-km.spec: drbd-km.spec.in configure
	./configure --enable-spec --without-utils --with-km

drbd-kernel.spec: drbd-kernel.spec.in configure
	./configure --enable-spec --without-utils --with-km

.PHONY: rpm
rpm: tgz drbd.spec
	cp drbd-$(FDIST_VERSION).tar.gz `rpm -E "%_sourcedir"`
	$(RPMBUILD) -bb \
		$(RPMOPT) \
		drbd.spec
	@echo "You have now:" ; find `rpm -E "%_rpmdir"` -name *.rpm

.PHONY: km-rpm
km-rpm: check-kdir tgz drbd-km.spec
	cp drbd-$(FDIST_VERSION).tar.gz `rpm -E "%_sourcedir"`
	$(RPMBUILD) -bb \
		--define "kernelversion $(KVER)" \
		--define "kdir $(KDIR)" \
		$(RPMOPT) \
		drbd-km.spec
	@echo "You have now:" ; find `rpm -E "%_rpmdir"` -name *.rpm

# kernel module package using the system macros.
# result is kABI aware and uses the weak-updates mechanism.
# Only define %kernel_version, if it was set outside of this file,
# i.e. was inherited from environment, or set explicitly on command line.
# If unset, the macro will figure it out internally, and not depend on # uname -r, which may be wrong in a chroot build environment. .PHONY: kmp-rpm kmp-rpm: tgz drbd-kernel.spec cp drbd-$(FDIST_VERSION).tar.gz `rpm -E "%_sourcedir"` $(RPMBUILD) -bb \ $(if $(filter file,$(origin KVER)), --define "kernel_version $(KVER)") \ $(RPMOPT) \ drbd-kernel.spec @echo "You have now:" ; find `rpm -E "%_rpmdir"` -name *.rpm .PHONY: srpm srpm: tgz drbd.spec drbd-km.spec cp drbd-$(FDIST_VERSION).tar.gz `rpm -E "%_sourcedir"` $(RPMBUILD) -bs \ --define "kernelversion $(KVER)" \ --define "kernel_version $(KVER)" \ --define "kdir $(KDIR)" \ $(RPMOPT) \ drbd.spec drbd-km.spec drbd-kernel.spec @echo "You have now:" ; find `rpm -E "%_srcrpmdir"` -name *.src.rpm endif drbd-8.4.4/README0000664000000000000000000000065111101361566012024 0ustar rootroot DRBD ====== by Philipp Reisner and Lars Ellenberg LINBIT Information Technologies Reference documentation is included in the documentation directory. Please refer to the web pages at http://www.drbd.org/ http://www.drbd.org/docs/introduction/ to find maintained information. drbd-8.4.4/autogen.sh0000775000000000000000000000046311516050234013143 0ustar rootroot#!/bin/sh # for those that expect an autogen.sh, # here it is. 
autoheader
autoconf

echo "
suggested configure parameters:
# prepare for rpmbuild, only generate spec files
./configure --with-km --enable-spec
# or prepare for direct build
./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc
"
drbd-8.4.4/benchmark/Makefile0000664000000000000000000000027711114021040014522 0ustar rootroot
CFLAGS=-Wall $(OPTFLAGS)
OPTFLAGS=-O2

all: dm io-latency-test

io-latency-test: io-latency-test.c
	$(CC) -pthread -lm -o $@ $^

install:

clean:
	rm -f dm io-latency-test

distclean: clean
drbd-8.4.4/benchmark/README0000664000000000000000000000013711101361566013755 0ustar rootroot
Needs to be reviewed
is untouched since at least the 0.6 days
does not work with 0.7 currently
drbd-8.4.4/benchmark/dm.c0000664000000000000000000002367511101361566013655 0ustar rootroot
/*
   dm.c

   Copyright 2001-2008 LINBIT Information Technologies
   Philipp Reisner, Lars Ellenberg

   dm is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2, or (at your option)
   any later version.

   dm is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with dm; see the file COPYING. If not, write to
   the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
 */

#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include #include #include #include #include #include #include #include #include #include #include #include #include #include #include
#define min(a,b) ( (a) < (b) ?
(a) : (b) )

unsigned long long fsize(int in_fd)
{
	struct stat dm_stat;
	unsigned long long size;

	if (fstat(in_fd, &dm_stat)) {
		fprintf(stderr, "Can not fstat\n");
		exit(20);
	}
	if (S_ISBLK(dm_stat.st_mode)) {
		unsigned long ls;
		if (ioctl(in_fd, BLKGETSIZE, &ls)) {
			fprintf(stderr, "Can not ioctl(BLKGETSIZE)\n");
			exit(20);
		}
		size = ((unsigned long long)ls) * 512;
	} else if (S_ISREG(dm_stat.st_mode)) {
		size = dm_stat.st_size;
	} else
		size = -1;
	return size;
}

unsigned long long m_strtol(const char *s)
{
	char *e = (char *)s;
	unsigned long long r;

	r = strtol(s, &e, 0);
	switch (*e) {
	case 0:
		return r;
	case 's':
	case 'S':
		return r << 9;	// * 512;
	case 'K':
	case 'k':
		return r << 10;	// * 1024;
	case 'M':
	case 'm':
		return r << 20;	// * 1024 * 1024;
	case 'G':
	case 'g':
		return r << 30;	// * 1024 * 1024 * 1024;
	default:
		fprintf(stderr, "%s is not a valid number\n", s);
		exit(20);
	}
}

void usage(char *prgname)
{
	fprintf(stderr, "USAGE: %s [options] \n"
		" Available options:\n"
		" --input-pattern val -a val \n"
		" --input-file val -i val \n"
		" --output-file val -o val\n"
		" --connect-port val -P val\n"
		" --connect-ip val -c val\n"
		" instead of -o you might use -P and -c\n"
		" --buffer-size val -b val\n"
		" --seek-input val -k val\n"
		" --seek-output val -l val\n"
		" --size val -s val\n"
		" --o_direct -x\n"
		" affect -i and -o \n"
		" --bandwidth -w val byte/second \n"
		" --sync -y\n"
		" --progress -m\n"
		" --performance -p\n"
		" --dialog -d\n"
		" --help -h\n",
		prgname);
	exit(20);
}

int connect_to (char *ip, int port)
{
	int fd;
	struct sockaddr_in addr;
	int ret;

	if((fd = socket (PF_INET, SOCK_STREAM, 0)) < 0) {
		perror("socket");
		exit(20);
	}

	addr.sin_family = AF_INET;
	addr.sin_port = htons (port);
	if (inet_aton (ip, &addr.sin_addr) == 0) {
		fprintf(stderr, "Error in inet_aton (%s).\n", ip);
		close (fd);
		exit(20);
	}

	if ((ret = connect (fd, (struct sockaddr*) &addr, sizeof (addr))) < 0) {
		perror("connect");
		close (fd);
		exit(20);
	}
	return fd;
}

int main(int argc, char **argv)
{
	void *buffer;
	size_t rr, ww;
unsigned long long seek_offs_i = 0; unsigned long long seek_offs_o = 0; unsigned long long size = -1, rsize; int in_fd = 0, out_fd = 1; unsigned long buffer_size = 65536; unsigned long long target_bw = 0; int o_direct = 0; int do_sync = 0; int show_progress = 0; int show_performance = 0; struct timeval tv1, tv2; int use_pattern = 0; int pattern; int dialog = 0, show_input_size = 0; int last_percentage = 0; char *input_file_name = NULL; char *output_file_name = NULL; char *connect_target = NULL; int connect_port = 0; int c; static struct option options[] = { {"input-pattern", required_argument, 0, 'a'}, {"input-file", required_argument, 0, 'i'}, {"output-file", required_argument, 0, 'o'}, {"connect-ip", required_argument, 0, 'c'}, {"connect-port", required_argument, 0, 'P'}, {"buffer-size", required_argument, 0, 'b'}, {"seek-input", required_argument, 0, 'k'}, {"seek-output", required_argument, 0, 'l'}, {"size", required_argument, 0, 's'}, {"o_direct", no_argument, 0, 'x'}, {"bandwidth", required_argument, 0, 'w'}, {"sync", no_argument, 0, 'y'}, {"progress", no_argument, 0, 'm'}, {"performance", no_argument, 0, 'p'}, {"dialog", no_argument, 0, 'd'}, {"help", no_argument, 0, 'h'}, {"show-input-size", no_argument, 0, 'z'}, {0, 0, 0, 0} }; if (argc == 1) usage(argv[0]); while (1) { c = getopt_long(argc, argv, "i:o:c:P:b:k:l:s:w:xympha:dz", options, 0); if (c == -1) break; switch (c) { case 'i': input_file_name = optarg; break; case 'o': output_file_name = optarg; break; case 'c': connect_target = optarg; break; case 'P': connect_port = m_strtol(optarg); break; case 'b': buffer_size = m_strtol(optarg); break; case 'k': seek_offs_i = m_strtol(optarg); break; case 'l': seek_offs_o = m_strtol(optarg); break; case 's': size = m_strtol(optarg); break; case 'x': o_direct = 1; break; case 'y': do_sync = 1; break; case 'm': show_progress = 1; break; case 'p': show_performance = 1; break; case 'h': usage(argv[0]); break; case 'a': use_pattern = 1; pattern = m_strtol(optarg); 
break; case 'd': dialog = 1; break; case 'z': show_input_size = 1; break; case 'w': target_bw = m_strtol(optarg); break; } } if( output_file_name && connect_target ) { fprintf(stderr, "Both connect target and an output file name given.\n" "That is too much.\n"); exit(20); } if(input_file_name) { in_fd = open(input_file_name, O_RDONLY | (o_direct ? O_DIRECT : 0)); /* if EINVAL, and o_direct, try again without it! */ if (in_fd == -1 && errno == EINVAL && o_direct) { in_fd = open(input_file_name, O_RDONLY); if (in_fd >= 0) fprintf(stderr, "NOT using O_DIRECT for input file %s\n", input_file_name); } if (in_fd == -1) { fprintf(stderr, "Can not open input file/device\n"); exit(20); } } if(output_file_name) { out_fd = open(output_file_name, O_WRONLY | O_CREAT | O_TRUNC | (o_direct? O_DIRECT : 0) , 0664); /* if EINVAL, and o_direct, try again without it! */ if (out_fd == -1 && errno == EINVAL && o_direct) { out_fd = open(output_file_name, O_WRONLY | O_CREAT | O_TRUNC, 0664); if (out_fd >= 0) fprintf(stderr, "NOT using O_DIRECT for output file %s\n", output_file_name); } if (out_fd == -1) { fprintf(stderr, "Can not open output file/device\n"); exit(20); } } if(connect_target) { out_fd = connect_to (connect_target, connect_port); } (void)posix_memalign(&buffer, sysconf(_SC_PAGESIZE), buffer_size); if (!buffer) { fprintf(stderr, "Can not allocate the Buffer memory\n"); exit(20); } if (seek_offs_i) { if (lseek(in_fd, seek_offs_i, SEEK_SET) == -1) { fprintf(stderr, "Can not lseek(2) in input file/device\n"); exit(20); } } if (seek_offs_o) { if (lseek(out_fd, seek_offs_o, SEEK_SET) == -1) { fprintf(stderr, "Can not lseek(2) in output file/device\n"); exit(20); } } if (use_pattern) { memset(buffer, pattern, buffer_size); } if (dialog && size == -1) { size = min(fsize(in_fd), fsize(out_fd)); if (size == -1) { fprintf(stderr, "Can not determine the size\n"); exit(20); } if (size == 0) { fprintf(stderr, "Nothing to do?\n"); exit(20); } } if (show_input_size) { size = fsize(in_fd); 
		if (size == -1) {
			fprintf(stderr, "Can not determine the size\n");
			exit(20);
		}
		printf("%lldK\n", size / 1024);
		exit(0);
	}

	rsize = size;
	gettimeofday(&tv1, NULL);
	while (1) {
		if (use_pattern)
			rr = min(buffer_size, rsize);
		else
			rr = read(in_fd, buffer, (size_t) min(buffer_size, rsize));
		if (rr == 0)
			break;
		if (rr == -1) {
			if (errno == EINVAL && o_direct) {
				fprintf(stderr, "either leave off --o_direct,"
					" or fix the alignment of the buffer/size/offset!\n");
			}
			perror("Read failed");
			break;
		}

		if (show_progress) {
			printf(rr == buffer_size ? "R" : "r");
			fflush(stdout);
		}

		ww = write(out_fd, buffer, rr);
		if (ww == -1) {
			if (errno == EINVAL && o_direct) {
				fprintf(stderr, "either leave off --o_direct,"
					" or fix the alignment of the buffer/size/offset!\n");
			}
			perror("Write failed");
			break;
		}
		rsize = rsize - ww;
		if (dialog) {
			int new_percentage = (int)(100.0 * (size - rsize) / size);
			if (new_percentage != last_percentage) {
				printf("\r%3d", new_percentage);
				fflush(stdout);
				last_percentage = new_percentage;
			}
		}
		if (target_bw) {
			gettimeofday(&tv2, NULL);
			long sec = tv2.tv_sec - tv1.tv_sec;
			long usec = tv2.tv_usec - tv1.tv_usec;
			double bps;
			double time_should;
			int time_wait;

			if (usec < 0) {
				sec--;
				usec += 1000000;
			}

			bps = ((double)(size - rsize)) / (sec + ((double)usec) / 1000000);
			if ( bps > target_bw ) {
				time_should = ((double)(size - rsize)) * 1000 / target_bw; // milliseconds.
time_wait = (int) (time_should - (sec*1000 + ((double)usec) / 1000)); poll(NULL,0,time_wait); } } if (ww != rr) break; } if (do_sync) fsync(out_fd); gettimeofday(&tv2, NULL); if (show_progress || dialog) printf("\n"); if (show_performance) { long sec = tv2.tv_sec - tv1.tv_sec; long usec = tv2.tv_usec - tv1.tv_usec; double mps; if (usec < 0) { sec--; usec += 1000000; } mps = (((double)(size - rsize)) / (1 << 20)) / (sec + ((double)usec) / 1000000); printf("%.2f MB/sec (%llu B / ", mps, size - rsize); printf("%02ld:%02ld.%06ld)\n", sec / 60, sec % 60, usec); } if (size != -1 && rsize) fprintf(stderr, "Could transfer only %lld Byte.\n", (size - rsize)); return 0; } drbd-8.4.4/benchmark/io-latency-test.c0000664000000000000000000002136411516050234016264 0ustar rootroot/* io-latency-test.c By Philipp Reisner. Copyright (C) 2006, Philipp Reisner . Initial author. io-latency-test is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. io-latency-test is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with dm; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ /* In case this crashes (in your UML) touch /etc/ld.so.nohwcap */ // compile with gcc -pthread -o io-latency-test io-latency-test.c #include #include #include #include #include #include #include #define _GNU_SOURCE #include #include #define MONITOR_TIME 300000 // Check every 300 milliseconds. 
// (3.33 times per second)
#define RECORD_TIME 20000 // Try to write a record every 20 milliseconds (50 per second)
#define PERCENTILE 90

unsigned int monitor_time=MONITOR_TIME;
unsigned int record_time=RECORD_TIME;
unsigned long records=0;

struct shared_data {
	pthread_mutex_t mutex;
	unsigned long record_nr;
	unsigned int write_duration_us;
	unsigned int write_duration_records;
	unsigned int max_write_duration_us;
	double avg_write_duration;
};

void* wd_thread(void *arg)
{
	struct shared_data *data = (struct shared_data*) arg;
	unsigned long last_record_nr=-1, current_record_nr=0;
	unsigned int avg_write,wd,wr,mwd;
	double avg_write_duration;

	enum { IO_RUNNING, IO_BLOCKED } io_state = IO_RUNNING;

	while(1) {
		usleep(monitor_time); // sleep some milliseconds

		pthread_mutex_lock(&data->mutex);
		current_record_nr = data->record_nr;
		wd = data->write_duration_us;
		wr = data->write_duration_records;
		mwd = data->max_write_duration_us;
		data->write_duration_us = 0;
		data->write_duration_records = 0;
		data->max_write_duration_us = 0;
		avg_write_duration = data->avg_write_duration;
		pthread_mutex_unlock(&data->mutex);

		if( records && current_record_nr == records)
			break;

		switch(io_state) {
		case IO_RUNNING:
			if(current_record_nr == last_record_nr) {
				printf("IO got frozen. Last completely "
				       "written record: %lu"
				       " \n",
				       last_record_nr);
				io_state = IO_BLOCKED;
			} else {
				if(wr==0)
					wr=1;
				avg_write = wd/wr;

				printf("Current record: %lu "
				       "( cur. write duration %d.%02dms; "
				       "avg. wd. %.2fms)\r",
				       current_record_nr,
				       avg_write/1000,(avg_write%1000)/10,
				       avg_write_duration/1000);
				fflush(stdout);
			}
			last_record_nr = current_record_nr;
			/* falls through; the check below is a no-op here
			 * because last_record_nr was just updated */
		case IO_BLOCKED:
			if(current_record_nr != last_record_nr) {
				printf("IO just resumed.
Blocked for %d.%02dms\n",
				       mwd/1000, (mwd%1000)/10);
				io_state = IO_RUNNING;
			}
		}
	}
	if(io_state == IO_RUNNING)
		printf("\n");
	return NULL;
}

void usage(char *prgname)
{
	fprintf(stderr, "USAGE: %s [options] recordfile\n"
		" Available options:\n"
		" --records val -n val\n"
		" --record-interval-ms -r val\n"
		" --monitor-interval-ms -m val\n",
		prgname);
	exit(20);
}

int cmp_int(const void *v1, const void *v2)
{
	const int *i1 = (int *) v1;
	const int *i2 = (int *) v2;

	return *i1 == *i2 ? 0 : ( *i1 < *i2 ? -1 : +1 );
}

int main(int argc, char** argv)
{
	pthread_t watch_dog;
	unsigned long record_nr=0;
	FILE* record_f;

	struct timeval now_tv, then_tv;
	struct tm now_tm;
	int write_duration_us=0;
	int min_wd=(1<<30), max_wd=0;
	double avg_write_duration = 0; /* running average, must start at 0 */
	int avg_wd_nr=0,c;
	int *all_write_durations = NULL;
	int median=0;
	double std_deviation=0;
	int rp;

	struct shared_data data;

	static struct option options[] = {
		{"records", required_argument, 0, 'n'},
		{"record-interval-ms", required_argument, 0, 'r'},
		{"monitor-interval-ms", required_argument, 0, 'm'},
		{0, 0, 0, 0 }
	};

	while (1) {
		c = getopt_long(argc, argv, "n:r:m:", options, 0);
		if (c == -1)
			break;
		switch (c) {
		case 'n':
			records = atol(optarg);
			break;
		case 'r':
			record_time = atoi(optarg) * 1000;
			break;
		case 'm':
			monitor_time = atoi(optarg) * 1000;
			break;
		default:
			usage(argv[0]);
		}
	}

	if(optind != argc-1) {
		usage(argv[0]);
	}

	if(!(record_f = fopen(argv[optind],"w"))) {
		perror("fopen:");
		fprintf(stderr,"Failed to open '%s' for writing\n",
			argv[optind]);
		return 10;
	}

	if (records) {
		all_write_durations = calloc(records, sizeof(int));
		if (all_write_durations == NULL) {
			fprintf(stderr, "Malloc failed\n");
			return 10;
		}
	}

	printf("\n"
	       "This program writes records to a file, shows the write latency\n"
	       "of the file system and block device combination and informs\n"
	       "you in case IO completely stalls.\n\n"
	       " Due to the nature of the 'D' process state on Linux\n"
	       " (and other Unix operating systems) you can not kill this\n"
	       " test program while IO is frozen.
You have to kill it with\n"
	       " Ctrl-C (SIGINT) while IO is running.\n\n"
	       "In case the record file's block device freezes, this "
	       "program will\n"
	       "inform you here which record was completely written before it "
	       "froze.\n\n"
	       );

	pthread_mutex_init(&data.mutex,NULL);
	data.record_nr = record_nr;
	data.write_duration_us = 0;
	data.write_duration_records = 1;
	data.max_write_duration_us = 0;
	data.avg_write_duration = 0; /* the watchdog reads this before the first write completes */
	pthread_create(&watch_dog,NULL,wd_thread,&data);

	for( ; !records || record_nr < records ; record_nr++) {
		gettimeofday(&now_tv, NULL);
		localtime_r(&now_tv.tv_sec,&now_tm);

		fprintf(record_f,
			"%04d-%02d-%02d %02d:%02d:%02d.%06ld: "
			"Record number: %-6lu "
			"(L.r.w.t.: %d.%02dms)\n",
			1900+ now_tm.tm_year,
			1+ now_tm.tm_mon,
			now_tm.tm_mday,
			now_tm.tm_hour,
			now_tm.tm_min,
			now_tm.tm_sec,
			now_tv.tv_usec,
			record_nr,
			write_duration_us/1000,
			(write_duration_us%1000)/10);

		if(fflush(record_f)) { // flush it from glibc to the kernel.
			perror("fflush:");
			return 10;
		}

		if(fdatasync(fileno(record_f))) { // from buffer cache to disk.
			perror("fdatasync:");
			return 10;
		}

		// eventually wait for full record_time
		gettimeofday(&then_tv, NULL);
		write_duration_us =
			( (then_tv.tv_sec - now_tv.tv_sec ) * 1000000 +
			  (then_tv.tv_usec - now_tv.tv_usec) );

		if (write_duration_us < monitor_time) {
			if(write_duration_us < min_wd)
				min_wd = write_duration_us;
			if(write_duration_us > max_wd)
				max_wd = write_duration_us;

			/* update the running average; the increment is a separate
			 * statement so avg_wd_nr is not read and modified in the
			 * same expression */
			avg_write_duration =
				(avg_write_duration * avg_wd_nr +
				 write_duration_us) / (avg_wd_nr + 1);
			avg_wd_nr++;

			if (all_write_durations)
				all_write_durations[record_nr] = write_duration_us;
		}

		pthread_mutex_lock(&data.mutex);
		data.record_nr = record_nr;
		data.write_duration_us += write_duration_us;
		data.write_duration_records++;
		data.avg_write_duration = avg_write_duration;
		if (write_duration_us > data.max_write_duration_us)
			data.max_write_duration_us = write_duration_us;
		pthread_mutex_unlock(&data.mutex);

		if(write_duration_us < record_time ) {
			usleep(record_time - write_duration_us);
		}
	}

	pthread_mutex_lock(&data.mutex);
	data.record_nr = record_nr;
	pthread_mutex_unlock(&data.mutex);

	pthread_join(watch_dog,NULL);

	if (all_write_durations) {
		qsort(all_write_durations, records, sizeof(int), &cmp_int);
		median = all_write_durations[records/2];
		printf("median = %5.2f\n", (double)median/1000);

		rp = records * (100-PERCENTILE) / 100;
		for (record_nr = rp/2; record_nr < records - rp/2; record_nr++) {
			/* printf("records[%lu] = %5.2f \n", record_nr,
			   (double)all_write_durations[record_nr]/1000); */
			std_deviation += pow((double)(all_write_durations[record_nr] - median)/1000, 2);
		}
		std_deviation = sqrt(std_deviation / (records - rp) );
	}

	printf( "STATS:\n"
		" +---------------------------------< records written [ 1 ]\n"
		" | +----------------------------< average (arithmetic) [ ms ]\n"
		" | | +-----------------------< shortest write [ ms ]\n"
		" | | | +------------------< longest write (<%dms) [ ms ]\n"
		" | | | | +-------------< %d%% percentile median [ ms ]\n"
		" | | | | | +--------< %d%% percentile standard deviation [ ms ]\n"
		" ^ ^ ^ ^ ^ ^\n"
		"
%4lu, %5.2f, %5.2f, %5.2f, %5.2f, %5.2f\n", monitor_time/1000, PERCENTILE, PERCENTILE, records, avg_write_duration/1000, (double)min_wd/1000, (double)max_wd/1000, (double)median/1000, std_deviation); } drbd-8.4.4/configure.ac0000664000000000000000000002532112226007136013432 0ustar rootrootdnl dnl autoconf for DRBD dnl dnl License: GNU General Public License Version 2 (GPLv2) dnl Minimum autoconf version we require AC_PREREQ(2.53) dnl What we are, our version, who to bug in case of problems AC_INIT(DRBD, 8.4.4, [drbd-dev@lists.linbit.com]) dnl Sanitize $prefix. Autoconf does this by itself, but so late in the dnl generated configure script that the expansion does not occur until dnl after our eval magic below. if test "$prefix" = "NONE"; then prefix=$ac_default_prefix fi exec_prefix=$prefix dnl Expand autoconf variables so that we dont end up with '${prefix}' dnl in #defines dnl Autoconf deliberately leaves them unexpanded to allow make dnl exec_prefix=/foo install. DRBD supports only DESTDIR, KDIR and dnl KVER to be invoked with make. 
prefix="`eval echo ${prefix}`" exec_prefix="`eval echo ${exec_prefix}`" bindir="`eval echo ${bindir}`" sbindir="`eval echo ${sbindir}`" libexecdir="`eval echo ${libexecdir}`" datarootdir="`eval echo ${datarootdir}`" datadir="`eval echo ${datadir}`" sysconfdir="`eval echo ${sysconfdir}`" sharedstatedir="`eval echo ${sharedstatedir}`" localstatedir="`eval echo ${localstatedir}`" libdir="`eval echo ${libdir}`" includedir="`eval echo ${includedir}`" oldincludedir="`eval echo ${oldincludedir}`" infodir="`eval echo ${infodir}`" mandir="`eval echo ${mandir}`" docdir="`eval echo ${docdir}`" dnl "--with-" options (all except km enabled by default, pass --without- to disable) WITH_UTILS=yes WITH_LEGACY_UTILS=yes WITH_KM=no WITH_UDEV=yes WITH_XEN=yes WITH_PACEMAKER=yes WITH_HEARTBEAT=yes WITH_RGMANAGER=no WITH_BASHCOMPLETION=yes WITH_NOARCH_SUBPACKAGES=no AC_ARG_WITH([utils], [AS_HELP_STRING([--with-utils], [Enable management utilities])], [WITH_UTILS=$withval]) AC_ARG_WITH([legacy_utils], [AS_HELP_STRING([--without-legacy_utils], [Do not include legacy <= 8.3 drbdsetup/drbdadm])], [WITH_LEGACY_UTILS=$withval]) AC_ARG_WITH([km], [AS_HELP_STRING([--with-km], [Enable kernel module])], [WITH_KM=$withval]) AC_ARG_WITH([udev], [AS_HELP_STRING([--with-udev], [Enable udev integration])], [WITH_UDEV=$withval]) AC_ARG_WITH([xen], [AS_HELP_STRING([--with-xen], [Enable Xen integration])], [WITH_XEN=$withval]) AC_ARG_WITH([pacemaker], [AS_HELP_STRING([--with-pacemaker], [Enable Pacemaker integration])], [WITH_PACEMAKER=$withval]) AC_ARG_WITH([heartbeat], [AS_HELP_STRING([--with-heartbeat], [Enable Heartbeat integration])], [WITH_HEARTBEAT=$withval]) AC_ARG_WITH([rgmanager], [AS_HELP_STRING([--with-rgmanager], [Enable Red Hat Cluster Suite integration])], [WITH_RGMANAGER=$withval]) AC_ARG_WITH([bashcompletion], [AS_HELP_STRING([--with-bashcompletion], [Enable programmable bash completion])], [WITH_BASHCOMPLETION=$withval]) AC_ARG_WITH([distro], [AS_HELP_STRING([--with-distro], [Configure 
for a specific distribution (supported values: generic, redhat, suse, debian, gentoo, slackware; default is to autodetect)])], [DISTRO=$withval]) AC_ARG_WITH([initdir], [AS_HELP_STRING([--with-initdir], [Override directory for init scripts (default is distribution-specific)])], [INITDIR=$withval]) AC_ARG_WITH([noarchsubpkg], [AS_HELP_STRING([--with-noarchsubpkg], [Build subpackages that support it for the "noarch" architecture (makes sense only with --enable-spec, supported by RPM from 4.6.0 forward)])], [WITH_NOARCH_SUBPACKAGES=$withval]) AC_ARG_ENABLE([spec], [AS_HELP_STRING([--enable-spec], [Rather than creating Makefiles, create an RPM spec file only])], [SPECMODE=$enableval], [SPECMODE=""]) AC_SUBST(WITH_UTILS) AC_SUBST(WITH_LEGACY_UTILS) AC_SUBST(WITH_KM) AC_SUBST(WITH_UDEV) AC_SUBST(WITH_XEN) AC_SUBST(WITH_PACEMAKER) AC_SUBST(WITH_HEARTBEAT) AC_SUBST(WITH_RGMANAGER) AC_SUBST(WITH_BASHCOMPLETION) dnl Checks for programs AC_PROG_CC AC_PROG_LN_S AC_PATH_PROG(SED, sed) AC_PATH_PROG(GREP, grep) AC_PATH_PROG(FLEX, flex) AC_PATH_PROG(RPMBUILD, rpmbuild) AC_PATH_PROG(XSLTPROC, xsltproc) AC_PATH_PROG(TAR, tar) AC_PATH_PROG(GIT, git) AC_PATH_PROG(DPKG_BUILDPACKAGE, dpkg-buildpackage) AC_PATH_PROG(UDEVADM, udevadm, [false], [/sbin$PATH_SEPARATOR$PATH]) AC_PATH_PROG(UDEVINFO, udevinfo, [false], [/sbin$PATH_SEPARATOR$PATH]) if test -z "$CC"; then if test "$WITH_UTILS" = "yes"; then AC_MSG_ERROR([Cannot build utils without a C compiler, either install a compiler or pass the --without-utils option.]) fi if test "$WITH_KM" = "yes"; then AC_MSG_ERROR([Cannot build kernel module without a C compiler, either install a compiler or pass the --without-km option.]) fi fi if test -z $FLEX; then if test "$WITH_UTILS" = "yes"; then AC_MSG_ERROR([Cannot build utils without flex, either install flex or pass the --without-utils option.]) fi fi if test -z $RPMBUILD; then AC_MSG_WARN([No rpmbuild found, building RPM packages is disabled.]) fi if test -z $DPKG_BUILDPACKAGE; then 
AC_MSG_WARN([No dpkg-buildpackage found, building Debian packages is disabled.]) fi if test -z $XSLTPROC; then AC_MSG_WARN([Cannot build man pages without xsltproc. You may safely ignore this warning when building from a tarball.]) dnl default to some sane value at least, dnl so the error message about command not found makes sense dnl otherwise you get "--xinclude ... command not found" :-/ XSLTPROC=xsltproc fi if test -z $GIT; then AC_MSG_WARN(Cannot update buildtag without git. You may safely ignore this warning when building from a tarball.) fi if test $UDEVADM = false && test $UDEVINFO = false; then if test "$WITH_UDEV" = "yes"; then AC_MSG_WARN([udev support enabled, but neither udevadm nor udevinfo found on this system.]) fi fi dnl Checks for system services BASH_COMPLETION_SUFFIX="" UDEV_RULE_SUFFIX="" RPM_DIST_TAG="" RPM_CONFLICTS_KM="" RPM_BUILDREQ_DEFAULT="gcc flex glibc-devel make" RPM_BUILDREQ_KM="" RPM_SUBPACKAGE_NOARCH="" RPM_REQ_PACEMAKER="" RPM_REQ_HEARTBEAT="" RPM_REQ_BASH_COMPLETION="" RPM_REQ_XEN="" RPM_REQ_CHKCONFIG_POST="" RPM_REQ_CHKCONFIG_PREUN="" dnl figure out the distribution we're running on, and set some variables accordingly if test -z $DISTRO; then AC_CHECK_FILE(/etc/gentoo-release, [DISTRO="gentoo"]) AC_CHECK_FILE(/etc/redhat-release, [DISTRO="redhat"]) AC_CHECK_FILE(/etc/slackware-version, [DISTRO="slackware"]) AC_CHECK_FILE(/etc/debian_version, [DISTRO="debian"]) AC_CHECK_FILE(/etc/SuSE-release, [DISTRO="suse"]) fi case "$DISTRO" in gentoo) AC_MSG_NOTICE([configured for Gentoo.]) ;; redhat) test -z $INITDIR && INITDIR="$sysconfdir/rc.d/init.d" RPM_DIST_TAG="%{?dist}" RPM_CONFLICTS_KM="drbd-kmod <= %{version}_3" dnl Fedora/Red Hat packaging guidelines mandate that packages dnl belonging to the "minimal build system" should not be dnl listed in BuildRequires RPM_BUILDREQ_DEFAULT="flex" RPM_BUILDREQ_KM="kernel-devel" RPM_REQ_CHKCONFIG_POST="Requires(post): chkconfig" RPM_REQ_CHKCONFIG_PREUN="Requires(preun): chkconfig" 
AC_MSG_NOTICE([configured for Red Hat (includes Fedora, RHEL, CentOS).]) AC_CHECK_FILE(/etc/fedora-release, [SUB_DISTRO="fedora"], [SUB_DISTRO="RHEL"]) if test "$SUB_DISTRO" = "fedora"; then # pacemaker, heartbeat and bash-completion are not available in RHEL # Xen: Be relaxed on RHEL (hassle free update). Be strict on Fedora RPM_REQ_PACEMAKER="Requires: pacemaker" RPM_REQ_HEARTBEAT="Requires: heartbeat" RPM_REQ_BASH_COMPLETION="Requires: bash-completion" RPM_REQ_XEN="Requires: xen" fi ;; slackware) test -z $INITDIR && INITDIR="$sysconfdir/rc.d" AC_MSG_NOTICE([configured for Slackware.]) ;; debian) AC_MSG_NOTICE([configured for Debian (includes Ubuntu).]) ;; suse) BASH_COMPLETION_SUFFIX=".sh" RPM_CONFLICTS_KM="km_drbd, drbd-kmp <= %{version}_3" RPM_BUILDREQ_KM="kernel-syms" # RPM_REQ_CHKCONFIG_POST="" chkconfig is part of aaa_base on suse # RPM_REQ_CHKCONFIG_PREUN="" chkconfig is part of aaa_base on suse AC_MSG_NOTICE([configured for SUSE (includes openSUSE, SLES).]) RPM_REQ_BASH_COMPLETION="Requires: bash" # The following are disabled for hassle free updates: # RPM_REQ_XEN="Requires: xen" # RPM_REQ_PACEMAKER="Requires: pacemaker" # RPM_REQ_HEARTBEAT="Requires: heartbeat" # Unfortunately gcc on SLES9 is broken with -O2. Works with -O1 if grep -q 'VERSION = 9' /etc/SuSE-release; then CFLAGS="-g -O1" fi ;; "") AC_MSG_WARN([Unable to determine what distribution we are running on. Distribution-specific features will be disabled.]) ;; esac dnl INITDIR may be set with --with-initdir, or set in the distro dnl detection magic above. If unset down to here, use a sensible dnl default. test -z $INITDIR && INITDIR="$sysconfdir/init.d" dnl Our udev rules file is known to work only with udev >= 85 if test "$WITH_UDEV" = "yes"; then udev_version=`$UDEVADM version 2>/dev/null` || udev_version=`$UDEVINFO -V | cut -d " " -f 3` if test -z $udev_version || test $udev_version -lt 85; then UDEV_RULE_SUFFIX=".disabled" AC_MSG_WARN([Obsolete or unknown udev version. 
Installing disabled udev rules.]) fi fi dnl Our sub-packages can be built for noarch, but RPM only supports dnl this from version 4.6.0 forward if test "$WITH_NOARCH_SUBPACKAGES" = "yes"; then RPM_SUBPACKAGE_NOARCH="BuildArch: noarch" fi AC_SUBST(DISTRO) AC_SUBST(INITDIR) AC_SUBST(BASH_COMPLETION_SUFFIX) AC_SUBST(UDEV_RULE_SUFFIX) AC_SUBST(RPM_DIST_TAG) AC_SUBST(RPM_CONFLICTS_KM) AC_SUBST(RPM_BUILDREQ_DEFAULT) AC_SUBST(RPM_BUILDREQ_KM) AC_SUBST(RPM_SUBPACKAGE_NOARCH) AC_SUBST(RPM_REQ_PACEMAKER) AC_SUBST(RPM_REQ_HEARTBEAT) AC_SUBST(RPM_REQ_BASH_COMPLETION) AC_SUBST(RPM_REQ_XEN) AC_SUBST(RPM_REQ_CHKCONFIG_POST) AC_SUBST(RPM_REQ_CHKCONFIG_PREUN) AH_TEMPLATE(DRBD_LIB_DIR, [Local state directory. Commonly /var/lib/drbd or /usr/local/var/lib/drbd]) AH_TEMPLATE(DRBD_RUN_DIR, [Runtime state directory. Commonly /var/run/drbd or /usr/local/var/run/drbd]) AH_TEMPLATE(DRBD_LOCK_DIR, [Local lock directory. Commonly /var/lock or /usr/local/var/lock]) AH_TEMPLATE(DRBD_CONFIG_DIR, [Local configuration directory. 
Commonly /etc or /usr/local/etc]) AH_TEMPLATE(DRBD_LEGACY_83, [Include support for drbd-8.3 kernel code]) AC_DEFINE_UNQUOTED(DRBD_LIB_DIR, ["$localstatedir/lib/$PACKAGE_TARNAME"]) AC_DEFINE_UNQUOTED(DRBD_RUN_DIR, ["$localstatedir/run/$PACKAGE_TARNAME"]) AC_DEFINE_UNQUOTED(DRBD_LOCK_DIR, ["$localstatedir/lock"]) AC_DEFINE_UNQUOTED(DRBD_CONFIG_DIR, ["$sysconfdir"]) if test "$WITH_LEGACY_UTILS" = "yes"; then AC_DEFINE(DRBD_LEGACY_83, [1]) fi dnl The configuration files we create (from their .in template) if test -z $SPECMODE; then AC_CONFIG_FILES(Makefile user/Makefile user/legacy/Makefile scripts/Makefile documentation/Makefile) AC_CONFIG_HEADERS(user/config.h user/legacy/config.h) else if test "$WITH_UTILS" = "yes"; then AC_CONFIG_FILES(drbd.spec) fi if test "$WITH_KM" = "yes"; then AC_CONFIG_FILES(drbd-km.spec drbd-kernel.spec) fi fi dnl output AC_OUTPUT drbd-8.4.4/documentation/Makefile.in0000664000000000000000000001003512221261130016045 0ustar rootroot# Makefile in documentation directory # # This file is part of DRBD by Philipp Reisner and Lars Ellenberg. # # drbd is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2, or (at your option) # any later version. # # drbd is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with drbd; see the file COPYING. If not, write to # the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. 
# variables set by configure DISTRO = @DISTRO@ prefix = @prefix@ exec_prefix = @exec_prefix@ localstatedir = @localstatedir@ datarootdir = @datarootdir@ datadir = @datadir@ sbindir = @sbindir@ sysconfdir = @sysconfdir@ mandir = @mandir@ BASH_COMPLETION_SUFFIX = @BASH_COMPLETION_SUFFIX@ UDEV_RULE_SUFFIX = @UDEV_RULE_SUFFIX@ INITDIR = @INITDIR@ LIBDIR = @prefix@/lib/@PACKAGE_TARNAME@ CC = @CC@ CFLAGS = @CFLAGS@ XSLTPROC = @XSLTPROC@ # features enabled or disabled by configure WITH_UTILS = @WITH_UTILS@ WITH_KM = @WITH_KM@ WITH_UDEV = @WITH_UDEV@ WITH_XEN = @WITH_XEN@ WITH_PACEMAKER = @WITH_PACEMAKER@ WITH_HEARTBEAT = @WITH_HEARTBEAT@ WITH_RGMANAGER = @WITH_RGMANAGER@ WITH_BASHCOMPLETION = @WITH_BASHCOMPLETION@ # variables meant to be overridden from the make command line DESTDIR ?= / ifeq ($(WITH_UTILS),yes) MANPAGES := drbdsetup.8 drbd.conf.5 drbd.8 drbdadm.8 drbdmeta.8 endif ifeq ($(WITH_HEARTBEAT),yes) MANPAGES += drbddisk.8 endif SOURCES := $(wildcard *.xml) STYLESHEET_PREFIX ?= http://docbook.sourceforge.net/release/xsl/current MANPAGES_STYLESHEET ?= $(STYLESHEET_PREFIX)/manpages/docbook.xsl HTML_STYLESHEET ?= $(STYLESHEET_PREFIX)/xhtml/docbook.xsl FO_STYLESHEET ?= $(STYLESHEET_PREFIX)/fo/docbook.xsl XSLTPROC_OPTIONS ?= --xinclude XSLTPROC_MANPAGES_OPTIONS ?= $(XSLTPROC_OPTIONS) XSLTPROC_HTML_OPTIONS ?= $(XSLTPROC_OPTIONS) XSLTPROC_FO_OPTIONS ?= $(XSLTPROC_OPTIONS) DRBDSETUP_CMDS = new-resource new-minor del-resource del-minor DRBDSETUP_CMDS += attach connect disk-options net-options resource-options DRBDSETUP_CMDS += disconnect detach primary secondary verify invalidate invalidate-remote DRBDSETUP_CMDS += down wait-connect wait-sync role cstate dstate DRBDSETUP_CMDS += resize check-resize pause-sync resume-sync DRBDSETUP_CMDS += outdate show-gi get-gi show events DRBDSETUP_CMDS += suspend-io resume-io new-current-uuid all: @echo "To (re)make the documentation: make doc" clean: @echo "To clean the documentation: make doc-clean" doc: man doc-clean: distclean 
####### Implicit rules .SUFFIXES: .sgml .5 .8 .html .pdf .ps %.5 %.8: %.xml ifeq ($(WITH_UTILS),yes) $(XSLTPROC) \ $(XSLTPROC_MANPAGES_OPTIONS) \ $(MANPAGES_STYLESHEET) $< endif %.html: %.xml ifeq ($(WITH_UTILS),yes) $(XSLTPROC) -o $@ \ $(XSLTPROC_HTML_OPTIONS) \ $(HTML_STYLESHEET) $< endif %.fo: %.xml ifeq ($(WITH_UTILS),yes) $(XSLTPROC) -o $@ \ $(XSLTPROC_FO_OPTIONS) \ $(FO_STYLESHEET) $< endif ../user/drbdsetup: (cd ../user; make drbdsetup) drbdsetup_xml-help_%.xml: ../user/drbdsetup ../user/drbdsetup xml-help $* > $@ drbdsetup_%.xml: drbdsetup_xml-help_%.xml xml-usage-to-docbook.xsl $(XSLTPROC) -o $@ xml-usage-to-docbook.xsl $< distclean: rm -f *.[58] manpage.links manpage.refs *~ manpage.log rm -f *.ps.gz *.pdf *.ps *.html pod2htm* rm -f drbdsetup_*.xml ####### man: $(MANPAGES) install: @ok=true; for f in $(MANPAGES) ; \ do [ -e $$f ] || { echo $$f missing ; ok=false; } ; \ done ; $$ok set -e; for f in $(MANPAGES) ; do \ s=$${f##*.}; \ install -v -D -m 644 $$f $(DESTDIR)$(mandir)/man$$s/$$f ; \ done uninstall: @ set -e; for f in $(MANPAGES) ; do \ s=$${f##*.}; \ rm -vf $(DESTDIR)$(mandir)/man$$s/$$f ; \ done html: $(SOURCES:.xml=.html) pdf: $(SOURCES:.xml=.pdf) ps: $(SOURCES:.xml=.ps) drbdsetup.8: drbdsetup.xml $(patsubst %,drbdsetup_%.xml,$(DRBDSETUP_CMDS)) drbd-8.4.4/documentation/Makefile.lang0000664000000000000000000000545411101361566016403 0ustar rootroot# Makefile in documentation directory # # This file is part of DRBD by Philipp Reisner and Lars Ellenberg. # # drbd is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2, or (at your option) # any later version. # # drbd is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with drbd; see the file COPYING. If not, write to # the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. # # C_LANG=$(shell basename `pwd`) MANPAGES=drbd.conf.5 drbdsetup.8 drbd.8 # # Docbook Magic for SuSE, Worksforme... # Needs to be explicitly enabled with make doc DIST=SuSE-9.0 # requires these packages and their dependencies: # docbook-dsssl-stylesheets docbook-toys # docbook-utils docbook-utils opensp perl-SGMLS # ifeq ($(DIST),SuSE-9.0) # dockbook2man just does not work. # this does: DB2MAN :=\ @_db2man() { in=$$1 ;\ echo " [DB2MAN] $$in" ;\ nsgmls "$$in" |\ sgmlspl /usr/share/sgml/docbook/utils-0.6.6/helpers/docbook2man-spec.pl;\ }; _db2man # dockbook2html produces errors and swallows tables. # db2html works, but in contrast do dockbook2man creates # a subdirectory and some extra files ... # this works for me: DB2HTML :=\ _db2html() { in=$$1; base=$${in%.sgml} ;\ echo " [DB2HTML] $$in" ;\ db2html "$$in" ;\ ln -sf t1.html $$base/index.html ;\ }; _db2html else DB2MAN := $(shell which 2>/dev/null docbook2man) DB2HTML := $(shell which 2>/dev/null docbook2html) DB2PDF := $(shell which 2>/dev/null docbook2pdf) DB2PS := $(shell which 2>/dev/null docbook2ps) endif ####### Implicit rules .SUFFIXES: .sgml .5 .8 .html .pdf .ps .sgml.5: $(DB2MAN) $< .sgml.8: $(DB2MAN) $< .sgml.html: $(DB2HTML) $< mv index.html $@ .sgml.pdf: $(DB2PDF) $< .sgml.ps: $(DB2PS) $< gzip -c $@ > $@.gz ####### man: $(MANPAGES) all: man clean: rm -f *.[58] manpage.links manpage.refs *~ manpage.log rm -f *.ps.gz *.pdf *.ps *.html install: install -D -m 644 drbdsetup.8 $(PREFIX)/usr/share/man/$(C_LANG)/man8/drbdsetup.8 install -D -m 644 drbd.conf.5 $(PREFIX)/usr/share/man/$(C_LANG)/man5/drbd.conf.5 install -D -m 644 drbd.8 $(PREFIX)/usr/share/man/$(C_LANG)/man8/drbd.8 uninstall: rm $(PREFIX)/usr/share/man/$(C_LANG)/man8/drbdsetup.8 rm $(PREFIX)/usr/share/man/$(C_LANG)/man5/drbd.conf.5 rm 
$(PREFIX)/usr/share/man/$(C_LANG)/man8/drbd.8 html: $(shell ls *.sgml | sed s/sgml/html/g) pdf: $(shell ls *.sgml | sed s/sgml/pdf/g) ps: $(shell ls *.sgml | sed s/sgml/ps/g) drbd-8.4.4/documentation/aspell.en.per0000664000000000000000000000374211516050234016407 0ustar rootrootpersonal_ws-1.1 en 285 ArbitraryCnt BIOs BLKFLSBUF BLKGETSIZE BLKSSZGET BarrierAck Bitmap's BrokenPipe ConnectedCnt ConnectedInd DRBD's DUnknown EAGAIN EBUSY EINVAL ENOMEM EOPNOTSUPP Ellenberg FIXME GBit GCs GPL GmbH HumanCnt IDE JFS KDIR KOBJ LANANA LBD LINBIT MDF NIC NICs NUL NUM NetworkFailure PausedSyncS PausedSyncT Philipp PingAck ProtocolError RAIDs RHEL RLE ReIsErFs Reisner SETLKW SIGALRM SVN SWAPSPACE StandAlone StartingSync StartingSyncS StartingSyncT SyncParam SyncSource SyncTarget SyncUUID TCQ TK TODO TearDown TimeoutCnt UI UUIDs UnknownMandatoryTag UpToDate VerifyS VerifyT WFBitMapS WFBitMapT WFConnection WFReportParams WFSyncUUID WantFullSync ack acked acks actlog addr adm al alg api argc args argv asbp asender asprintf bm bmbv bnum boolean bsize buildtag bvec cB calloc canonicalize cfg cgi chdir checksum cmd cmdname cn conf config conv cpu crypto cstate ctl degr dereference dev devfs devname diskless diskstats dont dopd drbd drbdX drbdadm drbdmeta drbdsetup drbdtool ds dstate dt endian endianness enums etext evictable exa extraversion fcntl fd fdopen fgets filesystem fprintf fs fstat fstype gc gcc gethostbyname gi goto haclient hacluster hdr hexdump hmac hostname http idx incon init inline io ip ipv irq kb kmalloc ko lastState len lge libdisk linux lld llu ln longjmp longoptions lookup lr lru lu lvm malloc md mem memalign memset metadata mkdir modprobe multihomed mutex netlink nodenames noheadings nop nosuffix nv ok online oopsie optind ord outdate outdating pagesize param parms pathname pid plaintext posix pre prepends pri printf proc pv pvs recurse recurses recv refcnt refcount reiser reiserfs resize resync resynced rr runlength sb sbin scmd sendpage sizeof sndbuf spinlock 
ssocks startup stderr stdout stonith str strtoll struct subcommand sublevel superblock suse symlinks syncer sys syscall sysconf tl toc tracepoint tri tty udev unconfigure unconfigured uniq urandom userland userspace usr uuid uuids varname vasprintf vmalloc wfc wget writeout xfs xfsprogs xml yylval drbd-8.4.4/documentation/drbd.conf.xml0000664000000000000000000025027512221261130016375 0ustar rootroot 6 May 2011 DRBD 8.4.0 drbd.conf 5 Configuration Files drbd.conf Configuration file for DRBD's devices drbd.conf Introduction The file is read by . The file format was designed as to allow to have a verbatim copy of the file on both nodes of the cluster. It is highly recommended to do so in order to keep your configuration manageable. The file should be the same on both nodes of the cluster. Changes to do not apply immediately. By convention the main config contains two include statements. The first one includes the file , the second one all file with a suffix. A small example.res file resource r0 { net { protocol C; cram-hmac-alg sha1; shared-secret "FooFunFactory"; } disk { resync-rate 10M; } on alice { volume 0 { device minor 1; disk /dev/sda7; meta-disk internal; } address 10.1.1.31:7789; } on bob { volume 0 { device minor 1; disk /dev/sda7; meta-disk internal; } address 10.1.1.32:7789; } } In this example, there is a single DRBD resource (called r0) which uses protocol C for the connection between its devices. It contains a single volume which runs on host alice uses /dev/drbd1 as devices for its application, and /dev/sda7 as low-level storage for the data. The IP addresses are used to specify the networking interfaces to be used. An eventually running resync process should use about 10MByte/second of IO bandwidth. This sync-rate statement is valid for volume 0, but would also be valid for further volumes. In this example it assigns full 10MByte/second to each volume. There may be multiple resource sections in a single drbd.conf file. 
For more examples, please have a look at the DRBD User's Guide. File Format The file consists of sections and parameters. A section begins with a keyword, sometimes an additional name, and an opening brace ({). A section ends with a closing brace (}. The braces enclose the parameters. section [name] { parameter value; [...] } A parameter starts with the identifier of the parameter followed by whitespace. Every subsequent character is considered as part of the parameter's value. A special case are Boolean parameters which consist only of the identifier. Parameters are terminated by a semicolon (;). Some parameter values have default units which might be overruled by K, M or G. These units are defined in the usual way (K = 2^10 = 1024, M = 1024 K, G = 1024 M). Comments may be placed into the configuration file and must begin with a hash sign (#). Subsequent characters are ignored until the end of the line. Sections drbd.conf skip Comments out chunks of text, even spanning more than one line. Characters between the keyword and the opening brace ({) are ignored. Everything enclosed by the braces is skipped. This comes in handy, if you just want to comment out some '' section: just precede it with ''. drbd.conf global Configures some global parameters. Currently only , , and are allowed here. You may only have one global section, preferably as the first section. drbd.conf common All resources inherit the options set in this section. The common section might have a , a , a , a and a section. drbd.conf resource Configures a DRBD resource. Each resource section needs to have two (or more) sections and may have a , a , a , a and a section. It might contain s sections. drbd.conf on Carries the necessary configuration parameters for a DRBD device of the enclosing resource. host-name is mandatory and must match the Linux host name (uname -n) of one of the nodes. 
You may list more than one host name here, in case you want to use the same parameters on several hosts (you'd have to move the IP around usually). Or you may list more than two such sections. resource r1 { protocol C; device minor 1; meta-disk internal; on alice bob { address 10.2.2.100:7801; disk /dev/mapper/some-san; } on charlie { address 10.2.2.101:7801; disk /dev/mapper/other-san; } on daisy { address 10.2.2.103:7801; disk /dev/mapper/other-san-as-seen-from-daisy; } } See also the section keyword. Required statements in this section: and . Note for backward compatibility and convenience it is valid to embed the statements of a single volume directly into the host section. drbd.conf volume Defines a volume within a connection. The minor numbers of a replicated volume might be different on different hosts, the volume number (vnr) is what groups them together. Required parameters in this section: , , . drbd.conf stacked-on-top-of For a stacked DRBD setup (3 or 4 nodes), a is used instead of an section. Required parameters in this section: and . drbd.conf on Carries the necessary configuration parameters for a DRBD device of the enclosing resource. This section is very similar to the section. The difference to the section is that the matching of the host sections to machines is done by the IP-address instead of the node name. Required parameters in this section: , , , all of which may be inherited from the resource section, in which case you may shorten this section down to just the address identifier. resource r2 { protocol C; device minor 2; disk /dev/sda7; meta-disk internal; # short form, device, disk and meta-disk inherited floating 10.1.1.31:7802; # longer form, only device inherited floating 10.1.1.32:7802 { disk /dev/sdb; meta-disk /dev/sdc8; } } drbd.conf disk This section is used to fine tune DRBD's properties in respect to the low level storage. Please refer to drbdsetup 8 for detailed description of the parameters. 
Optional parameters: , , , , , , , , , , , , , , , , , , . drbd.conf net This section is used to fine tune DRBD's properties. Please refer to drbdsetup 8 for a detailed description of this section's parameters. Optional parameters: , , , , , , , , , , , , , , , , , , , , , , , . drbd.conf startup This section is used to fine tune DRBD's properties. Please refer to drbdsetup 8 for a detailed description of this section's parameters. Optional parameters: , , , , and . drbd.conf options This section is used to fine tune the behaviour of the resource object. Please refer to drbdsetup 8 for a detailed description of this section's parameters. Optional parameters: , and . drbd.conf handlers In this section you can define handlers (executables) that are started by the DRBD system in response to certain events. Optional parameters: , , , (formerly oudate-peer), , , , , . The interface is done via environment variables: is the name of the resource is the minor number of the DRBD device, in decimal. is the path to the primary configuration file; if you split your configuration into multiple files (e.g. in ), this will not be helpful. , , are the address family (e.g. ), the peer's address and hostnames. is deprecated. Please note that not all of these might be set for all handlers, and that some values might not be useable for a definition. Parameters drbd.conf minor-count count may be a number from 1 to 1048575. Minor-count is a sizing hint for DRBD. It helps to right-size various memory pools. It should be set in the in the same order of magnitude than the actual number of minors you use. Per default the module loads with 11 more resources than you have currently in your config but at least 32. drbd.conf dialog-refresh time may be 0 or a positive number. The user dialog redraws the second count every time seconds (or does no redraws if time is 0). The default value is 1. 
drbd.conf disable-ip-verification Use disable-ip-verification if, for some obscure reasons, drbdadm can/might not use or to do a sanity check for the IP address. You can disable the IP verification with this option. drbd.conf usage-count Please participate in DRBD's online usage counter. The most convenient way to do so is to set this option to . Valid options are: , and . drbd.conf protocol On the TCP/IP link the specified protocol is used. Valid protocol specifiers are A, B, and C. Protocol A: write IO is reported as completed, if it has reached local disk and local TCP send buffer. Protocol B: write IO is reported as completed, if it has reached local disk and remote buffer cache. Protocol C: write IO is reported as completed, if it has reached both local and remote disk. drbd.conf device The name of the block device node of the resource being described. You must use this device with your application (file system) and you must not use the low level block device which is specified with the parameter. One can either omit the name or and the minor number. If you omit the name a default of /dev/drbdminor will be used. Udev will create additional symlinks in /dev/drbd/by-res and /dev/drbd/by-disk. drbd.conf disk DRBD uses this block device to actually store and retrieve the data. Never access such a device while DRBD is running on top of it. This also holds true for dumpe2fs 8 and similar commands. drbd.conf address A resource needs one IP address per device, which is used to wait for incoming connections from the partner device and to reach the partner device. AF must be one of , , or (for compatibility reasons is an alias for ). It may be omitted for IPv4 addresses. The actual IPv6 address that follows the keyword must be placed inside brackets: ipv6 [fd01:2345:6789:abcd::1]:7800. Each DRBD resource needs a TCP port which is used to connect to the node's partner device. Two different DRBD resources may not use the same addr:port combination on the same node.
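The addr:port uniqueness rule above can be checked mechanically before a configuration is deployed. The following is a minimal sketch, not part of DRBD: the function name find_address_conflicts and the input layout (resource name, then host name, then an "ip:port" string) are assumptions made for illustration only.

```python
from collections import defaultdict


def find_address_conflicts(resources):
    """Return {(host, "ip:port"): [resource, ...]} for every address that
    more than one resource uses on the same host.

    `resources` is a hypothetical in-memory layout, not a DRBD format:
    {resource_name: {host_name: "ip:port"}}.
    """
    seen = defaultdict(list)
    for res_name, hosts in resources.items():
        for host, addr in hosts.items():
            seen[(host, addr)].append(res_name)
    # Keep only the addr:port combinations claimed by two or more resources.
    return {key: names for key, names in seen.items() if len(names) > 1}


if __name__ == "__main__":
    config = {
        "r0": {"alice": "10.1.1.31:7789", "bob": "10.1.1.32:7789"},
        "r1": {"alice": "10.1.1.31:7789", "bob": "10.1.1.32:7790"},
    }
    for (host, addr), names in find_address_conflicts(config).items():
        print(f"{host}: {addr} is used by {', '.join(names)}")
```

In the usage example, r0 and r1 both claim 10.1.1.31:7789 on alice, so that pair is reported; the two resources use distinct ports on bob, which is accepted.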
drbd.conf meta-disk Internal means that the last part of the backing device is used to store the meta-data. The size of the meta-data is computed based on the size of the device. When a device is specified, either with or without an index, DRBD stores the meta-data on this device. Without index, the size of the meta-data is determined by the size of the data device. This is usually used with LVM, which allows many variable-sized block devices. The meta-data size is 36kB + Backing-Storage-size / 32k, rounded up to the next 4kB boundary. (Rule of thumb: 32kByte per 1GByte of storage, rounded up to the next MB.) When an index is specified, each index number refers to a fixed slot of meta-data of 128 MB, which allows a maximum data size of 4 GB. This way, multiple DRBD devices can share the same meta-data device. For example, if /dev/sde6[0] and /dev/sde6[1] are used, /dev/sde6 must be at least 256 MB big. Because of the hard size limit, use of meta-disk indexes is discouraged. drbd.conf on-io-error handler is taken, if the lower level device reports io-errors to the upper layers. handler may be , or : The node downgrades the disk status to inconsistent, marks the erroneous block as inconsistent in the bitmap and retries the IO on the remote node. : Call the handler script . : The node drops its low level device, and continues in diskless mode. drbd.conf fencing By we understand preventive measures to avoid situations where both nodes are primary and disconnected (AKA split brain). Valid fencing policies are: This is the default policy. No fencing actions are taken. If a node becomes a disconnected primary, it tries to fence the peer's disk. This is done by calling the handler. The handler is supposed to reach the other node over alternative communication paths and call '' there. If a node becomes a disconnected primary, it freezes all its IO operations and calls its fence-peer handler.
The fence-peer handler is supposed to reach the peer over alternative communication paths and call 'drbdadm outdate res' there. In case it cannot reach the peer it should stonith the peer. IO is resumed as soon as the situation is resolved. In case your handler fails, you can resume IO with the command. drbd.conf disk-barrier drbd.conf disk-flushes drbd.conf disk-drain DRBD has four implementations to express write-after-write dependencies to its backing storage device. DRBD will use the first method that is supported by the backing storage device and that is not disabled. By default the flush method is used. Since drbd-8.4.2 is disabled by default because since linux-2.6.36 (or 2.6.32 RHEL6) there is no reliable way to determine if queuing of IO-barriers works. Dangerous: only enable it if you are told to do so by someone who knows for sure. When selecting the method you should not only base your decision on the measurable performance. In case your backing storage device has a volatile write cache (plain disks, RAID of plain disks) you should use one of the first two. In case your backing storage device has battery-backed write cache you may go with option 3. Option 4 (disable everything, use "none") is dangerous on most IO stacks, may result in write-reordering, and if so, can theoretically be the reason for data corruption, or disturb the DRBD protocol, causing spurious disconnect/reconnect cycles. Do not use . Unfortunately device mapper (LVM) might not support barriers. The letter after "wo:" in /proc/drbd indicates which method is currently in use for a device: , , , . The implementations are: barrier The first requires that the driver of the backing storage device support barriers (called 'tagged command queuing' in SCSI and 'native command queuing' in SATA speak). The use of this method can be enabled by setting the options to . flush The second requires that the backing device support disk flushes (called 'force unit access' in the drive vendors speak).
The use of this method can be disabled by setting to . drain The third method is simply to let write requests drain before write requests of a new reordering domain are issued. This was the only implementation before 8.0.9. none The fourth method is to not express write-after-write dependencies to the backing store at all, by also specifying . This is dangerous on most IO stacks, may result in write-reordering, and if so, can theoretically be the reason for data corruption, or disturb the DRBD protocol, causing spurious disconnect/reconnect cycles. Do not use . drbd.conf md-flushes Disables the use of disk flushes and barrier BIOs when accessing the meta data device. See the notes on . drbd.conf max-bio-bvecs In some special circumstances the device mapper stack manages to pass BIOs to DRBD that violate the constraints that are set forth by DRBD's merge_bvec() function and which have more than one bvec. A known example is: phys-disk -> DRBD -> LVM -> Xen -> misaligned partition (63) -> DomU FS. Then you might see "bio would need to, but cannot, be split:" in the Dom0's kernel log. The best workaround is to properly align the partition within the VM (e.g. start it at sector 1024). This costs 480 KiB of storage. Unfortunately the default of most Linux partitioning tools is to start the first partition at an odd number (63). Therefore most distributions' install helpers for virtual Linux machines will end up with misaligned partitions. The second best workaround is to limit DRBD's max bvecs per BIO (= ) to 1, but that might cost performance. The default value of is 0, which means that there is no user imposed limitation. drbd.conf disk-timeout If the driver of the lower_device does not finish an IO request within disk_timeout, DRBD considers the disk as failed. If DRBD is connected to a remote host, it will reissue local pending IO requests to the peer, and ship all new IO requests to the peer only.
The disk state advances to diskless, as soon as the backing block device has finished all IO requests. The default value of is 0, which means that no timeout is enforced. The default unit is 100ms. This option is available since 8.3.12. drbd.conf read-balancing The supported methods for load balancing of read requests are , , , , , , , , , and . The default value of is . This option is available since 8.4.1. drbd.conf sndbuf-size size is the size of the TCP socket send buffer. The default value is 0, i.e. autotune. You can specify smaller or larger values. Larger values are appropriate for reasonable write throughput with protocol A over high latency networks. Values below 32K do not make sense. Since 8.0.13 resp. 8.2.7, setting the size value to 0 means that the kernel should autotune this. drbd.conf rcvbuf-size size is the size of the TCP socket receive buffer. The default value is 0, i.e. autotune. You can specify smaller or larger values. Usually this should be left at its default. Setting the size value to 0 means that the kernel should autotune this. drbd.conf timeout If the partner node fails to send an expected response packet within time tenths of a second, the partner node is considered dead and therefore the TCP/IP connection is abandoned. This must be lower than connect-int and ping-int. The default value is 60 = 6 seconds, the unit 0.1 seconds. drbd.conf connect-int In case it is not possible to connect to the remote DRBD device immediately, DRBD keeps on trying to connect. With this option you can set the time between two retries. The default value is 10 seconds, the unit is 1 second. drbd.conf ping-int If the TCP/IP connection linking a DRBD device pair is idle for more than time seconds, DRBD will generate a keep-alive packet to check if its partner is still alive. The default is 10 seconds, the unit is 1 second. drbd.conf ping-timeout The time the peer has time to answer to a keep-alive packet. 
In case the peer's reply is not received within this time period, it is considered as dead. The default value is 500ms, the default unit are tenths of a second. drbd.conf max-buffers Limits the memory usage per DRBD minor device on the receiving side, or for internal buffers during resync or online-verify. Unit is PAGE_SIZE, which is 4 KiB on most systems. The minimum possible setting is hard coded to 32 (=128 KiB). These buffers are used to hold data blocks while they are written to/read from disk. To avoid possible distributed deadlocks on congestion, this setting is used as a throttle threshold rather than a hard limit. Once more than max-buffers pages are in use, further allocation from this pool is throttled. You want to increase max-buffers if you cannot saturate the IO backend on the receiving side. drbd.conf ko-count In case the secondary node fails to complete a single write request for count times the timeout, it is expelled from the cluster. (I.e. the primary node goes into mode.) The default value is 0, which disables this feature. drbd.conf max-epoch-size The highest number of data blocks between two write barriers. If you set this smaller than 10, you might decrease your performance. drbd.conf allow-two-primaries With this option set you may assign the primary role to both nodes. You only should use this option if you use a shared storage file system on top of DRBD. At the time of writing the only ones are: OCFS2 and GFS. If you use this option with any other file system, you are going to crash your nodes and to corrupt your data! drbd.conf unplug-watermark This setting has no effect with recent kernels that use explicit on-stack plugging (upstream Linux kernel 2.6.39, distributions may have backported). When the number of pending write requests on the standby (secondary) node exceeds the , we trigger the request processing of our backing storage device. 
Some storage controllers deliver better performance with small values, others deliver best performance when the value is set to the same value as max-buffers, yet others don't feel much effect at all. Minimum 16, default 128, maximum 131072. drbd.conf cram-hmac-alg You need to specify the HMAC algorithm to enable peer authentication at all. You are strongly encouraged to use peer authentication. The HMAC algorithm will be used for the challenge response authentication of the peer. You may specify any digest algorithm that is named in . drbd.conf shared-secret The shared secret used in peer authentication. May be up to 64 characters. Note that peer authentication is disabled as long as no (see above) is specified. policy drbd.conf after-sb-0pri possible policies are: No automatic resynchronization, simply disconnect. Auto sync from the node that was primary before the split-brain situation happened. Auto sync from the node that became primary as second during the split-brain situation. In case one node did not write anything since the split brain became evident, sync from the node that wrote something to the node that did not write anything. In case none wrote anything this policy uses a random decision to perform a "resync" of 0 blocks. In case both have written something this policy disconnects the nodes. Auto sync from the node that touched more blocks during the split brain situation. Auto sync to the named node. policy drbd.conf after-sb-1pri possible policies are: No automatic resynchronization, simply disconnect. Discard the version of the secondary if the outcome of the algorithm would also destroy the current secondary's data. Otherwise disconnect. Always take the decision of the algorithm, even if that causes an erratic change of the primary's view of the data. This is only useful if you use a one-node FS (i.e. not OCFS2 or GFS) with the flag, AND if you really know what you are doing. 
This is DANGEROUS and MAY CRASH YOUR MACHINE if you have an FS mounted on the primary node. Discard the secondary's version. Always honor the outcome of the algorithm. In case it decides the current secondary has the right data, it calls the "pri-lost-after-sb" handler on the current primary. policy drbd.conf after-sb-2pri possible policies are: No automatic resynchronization, simply disconnect. Always take the decision of the algorithm, even if that causes an erratic change of the primary's view of the data. This is only useful if you use a one-node FS (i.e. not OCFS2 or GFS) with the flag, AND if you really know what you are doing. This is DANGEROUS and MAY CRASH YOUR MACHINE if you have an FS mounted on the primary node. Call the "pri-lost-after-sb" helper program on one of the machines. This program is expected to reboot the machine, i.e. make it secondary. Normally the automatic after-split-brain policies are only used if current states of the UUIDs do not indicate the presence of a third node. With this option you request that the automatic after-split-brain policies are used as long as the data sets of the nodes are somehow related. This might cause a full sync, if the UUIDs indicate the presence of a third node. (Or double faults led to strange UUID sets.) policy drbd.conf rr-conflict This option helps to solve the cases when the outcome of the resync decision is incompatible with the current role assignment in the cluster. No automatic resynchronization, simply disconnect. Sync to the primary node is allowed, violating the assumption that data on a block device are stable for one of the nodes. Dangerous, do not use. Call the "pri-lost" helper program on one of the machines. This program is expected to reboot the machine, i.e. make it secondary. alg drbd.conf data-integrity-alg DRBD can ensure the data integrity of the user's data on the network by comparing hash values. Normally this is ensured by the 16 bit checksums in the headers of TCP/IP packets. 
This option can be set to any of the kernel's data digest algorithms. In a typical kernel configuration you should have at least one of , , and available. By default this is not enabled. See also the notes on data integrity. drbd.conf tcp-cork DRBD usually uses the TCP socket option TCP_CORK to hint to the network stack when it can expect more data, and when it should flush out what it has in its send queue. It turned out that there is at least one network stack that performs worse when one uses this hinting method. Therefore we introduced this option. By setting to you can disable the setting and clearing of the TCP_CORK socket option by DRBD. By default DRBD blocks when the available TCP send queue becomes full. That means it will slow down the application that generates the write requests that cause DRBD to send more data down that TCP connection. When DRBD is deployed with DRBD-proxy it might be more desirable that DRBD goes into AHEAD/BEHIND mode shortly before the send queue becomes full. In AHEAD/BEHIND mode DRBD no longer replicates data, but still keeps the connection open. The advantage of the AHEAD/BEHIND mode is that the application is not slowed down, even if DRBD-proxy's buffer is not sufficient to buffer all write requests. The downside is that the peer node falls behind, and that a resync will be necessary to bring it back into sync. During that resync the peer node will have an inconsistent disk. Available congestion_policys are and . The default is . Fill_threshold may be in the range of 0 to 10GiBytes. The default is 0 which disables the check. Active_extents_threshold has the same limits as . The AHEAD/BEHIND mode and its settings are available since DRBD 8.3.10. Wait for connection timeout. drbd.conf wfc-timeout The init script drbd 8 blocks the boot process until the DRBD resources are connected. When the cluster manager starts later, it does not see a resource with internal split-brain.
In case you want to limit the wait time, do it here. Default is 0, which means unlimited. The unit is seconds. drbd.conf degr-wfc-timeout Wait for connection timeout, if this node was a degraded cluster. In case a degraded cluster (= cluster with only one node left) is rebooted, this timeout value is used instead of wfc-timeout, because the peer is less likely to show up in time, if it had been dead before. Value 0 means unlimited. drbd.conf outdated-wfc-timeout Wait for connection timeout, if the peer was outdated. In case a degraded cluster (= cluster with only one node left) with an outdated peer disk is rebooted, this timeout value is used instead of wfc-timeout, because the peer is not allowed to become primary in the meantime. Value 0 means unlimited. By setting this option you can make the init script to continue to wait even if the device pair had a split brain situation and therefore refuses to connect. Sets on which node the device should be promoted to primary role by the init script. The node-name might either be a host name or the keyword . When this option is not set the devices stay in secondary role on both nodes. Usually one delegates the role assignment to a cluster manager (e.g. heartbeat). Usually and are ignored for stacked devices, instead twice the amount of is used for the connection timeouts. With the keyword you disable this, and force DRBD to mind the and statements. Only do that if the peer of the stacked resource is usually not available or will usually not become primary. By using this option incorrectly, you run the risk of causing unexpected split brain. drbd.conf resync-rate To ensure a smooth operation of the application on top of DRBD, it is possible to limit the bandwidth which may be used by background synchronizations. The default is 250 KB/sec, the default unit is KB/sec. Optional suffixes K, M, G are allowed. 
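The unit handling of resync-rate can be illustrated with a small sketch. It is not part of DRBD; the helper name resync_rate_kib is hypothetical, and it assumes that the K, M and G suffixes are binary multipliers (K = 2^10 = 1024), as the drbdsetup manual page states for its own values.

```python
# Multipliers relative to the default unit (KB/sec).
# Assumption: binary suffixes, K = 2^10, as stated on the drbdsetup page.
_SUFFIX_TO_KIB = {"K": 1, "M": 1024, "G": 1024 * 1024}

# Documented default of resync-rate: 250 KB/sec.
DEFAULT_RESYNC_RATE_KIB = 250


def resync_rate_kib(value=None):
    """Return a resync-rate setting in KB/sec (hypothetical helper).

    A bare number is taken in the default unit; an optional trailing
    K, M or G scales it accordingly.
    """
    if value is None:
        return DEFAULT_RESYNC_RATE_KIB
    text = str(value).strip()
    if not text:
        raise ValueError("empty resync-rate value")
    suffix = text[-1].upper()
    if suffix in _SUFFIX_TO_KIB:
        return int(text[:-1]) * _SUFFIX_TO_KIB[suffix]
    return int(text)
```

Under these assumptions a setting of 10M corresponds to 10240 KB/sec, and an unset option falls back to 250 KB/sec.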
drbd.conf use-rle During resync-handshake, the dirty-bitmaps of the nodes are exchanged and merged (using bit-or), so the nodes will have the same understanding of which blocks are dirty. On large devices, the fine grained dirty-bitmap can become large as well, and the bitmap exchange can take quite some time on low-bandwidth links. Because the bitmap typically contains compact areas where all bits are unset (clean) or set (dirty), a simple run-length encoding scheme can considerably reduce the network traffic necessary for the bitmap exchange. For backward compatibility reasons, and because on fast links this possibly does not improve transfer time but consumes CPU cycles, this defaults to off. drbd.conf resync-after By default, resynchronization of all devices would run in parallel. By defining a resync-after dependency, the resynchronization of this resource will start only if the resource res-name is already in connected state (i.e., has finished its resynchronization). drbd.conf al-extents DRBD automatically performs hot area detection. With this parameter you control how big the hot area (= active set) can get. Each extent marks 4M of the backing storage (= low-level device). In case a primary node leaves the cluster unexpectedly, the areas covered by the active set must be resynced upon rejoining of the failed node. The data structure is stored in the meta-data area, therefore each change of the active set is a write operation to the meta-data device. A higher number of extents gives longer resync times but fewer updates to the meta-data. The default number of extents is 1237. (Minimum: 7, Maximum: 65534) Note that the effective maximum may be smaller, depending on how you created the device meta data, see also drbdmeta8. The effective maximum is 919 * (available on-disk activity-log ring-buffer area/4kB -1), the default 32kB ring-buffer yields a maximum of 6433 (covers more than 25 GiB of data).
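The arithmetic behind these numbers can be checked with a short sketch. The function and constant names are hypothetical; the formula, the 4M extent size and the limits are taken from the text above, and capping the result at the documented maximum of 65534 is an assumption about how the two limits combine.

```python
AL_EXTENT_BYTES = 4 * 1024 * 1024  # each extent marks 4M of backing storage
AL_BLOCK_BYTES = 4 * 1024          # ring-buffer area is counted in 4kB units
EXTENTS_PER_BLOCK = 919            # factor from the formula above
AL_EXTENTS_HARD_MAX = 65534        # documented maximum of al-extents


def effective_max_al_extents(ring_buffer_bytes=32 * 1024):
    """919 * (ring-buffer area / 4kB - 1), capped at the documented maximum."""
    blocks = ring_buffer_bytes // AL_BLOCK_BYTES
    return min(EXTENTS_PER_BLOCK * (blocks - 1), AL_EXTENTS_HARD_MAX)


def covered_gib(extents):
    """Amount of backing storage covered by the active set, in GiB."""
    return extents * AL_EXTENT_BYTES / 1024 ** 3


if __name__ == "__main__":
    extents = effective_max_al_extents()
    print(extents, round(covered_gib(extents), 2))
```

For the default 32kB ring-buffer this gives 919 * 7 = 6433 extents, which at 4M per extent is slightly more than 25 GiB.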
We recommend to keep this well within the amount your backend storage and replication link are able to resync inside of about 5 minutes. drbd.conf al-updates DRBD's activity log transaction writing makes it possible, that after the crash of a primary node a partial (bit-map based) resync is sufficient to bring the node back to up-to-date. Setting to might increase normal operation performance but causes DRBD to do a full resync when a crashed primary gets reconnected. The default value is . During online verification (as initiated by the verify sub-command), rather than doing a bit-wise comparison, DRBD applies a hash function to the contents of every block being verified, and compares that hash with the peer. This option defines the hash algorithm being used for that purpose. It can be set to any of the kernel's data digest algorithms. In a typical kernel configuration you should have at least one of , , and available. By default this is not enabled; you must set this option explicitly in order to be able to use on-line device verification. See also the notes on data integrity. A resync process sends all marked data blocks from the source to the destination node, as long as no is given. When one is specified the resync process exchanges hash values of all marked blocks first, and sends only those data blocks that have different hash values. This setting is useful for DRBD setups with low bandwidth links. During the restart of a crashed primary node, all blocks covered by the activity log are marked for resync. But a large part of those will actually be still in sync, therefore using will lower the required bandwidth in exchange for CPU cycles. The dynamic resync speed controller gets enabled with setting plan_time to a positive value. It aims to fill the buffers along the data path with either a constant amount of data fill_target, or aims to have a constant delay time of delay_target along the path. The controller has an upper bound of max_rate. 
By plan_time the agility of the controller is configured. Higher values yield slower responses of the controller to deviation from the target value. It should be at least 5 times RTT. For regular data paths a fill_target in the area of 4k to 100k is appropriate. For a setup that contains drbd-proxy it is advisable to use delay_target instead. Only when fill_target is set to 0 will the controller use delay_target. 5 times RTT is a reasonable starting value. Max_rate should be set to the bandwidth available between the DRBD-hosts and the machines hosting DRBD-proxy, or to the available disk-bandwidth. The default value of plan_time is 0, the default unit is 0.1 seconds. Fill_target defaults to 0; its default unit is sectors. Delay_target defaults to 1 (100ms); its default unit is 0.1 seconds. Max_rate defaults to 102400 (100MiB/s); its default unit is KiB/s. The dynamic resync speed controller and its settings are available since DRBD 8.3.9. A node that is primary and sync-source has to schedule application IO requests and resync IO requests. The min_rate tells DRBD to use only up to min_rate for resync IO and to dedicate all other available IO bandwidth to application requests. Note: The value 0 has a special meaning. It disables the limitation of resync IO completely, which might slow down application IO considerably. Set it to a value of 1, if you prefer that resync IO never slows down application IO. Note: Although the name might suggest that it is a lower bound for the dynamic resync speed controller, it is not. If the DRBD-proxy buffer is full, the dynamic resync speed controller is free to lower the resync speed down to 0, completely independent of the setting. Min_rate defaults to 4096 (4MiB/s); its default unit is KiB/s. This setting controls what happens to IO requests on a degraded, diskless node (i.e. no data store is reachable). The available policies are and .
If ond-policy is set to you can either resume IO by attaching/connecting the last lost data storage, or by the drbdadm resume-io res command. The latter will result in IO errors of course. The default is . This setting is available since DRBD 8.3.9. drbd.conf cpu-mask Sets the cpu-affinity-mask for DRBD's kernel threads of this device. The default value of cpu-mask is 0, which means that DRBD's kernel threads should be spread over all CPUs of the machine. This value must be given in hexadecimal notation. If it is too big it will be truncated. drbd.conf pri-on-incon-degr This handler is called if the node is primary, degraded and if the local copy of the data is inconsistent. drbd.conf pri-lost-after-sb The node is currently primary, but lost the after-split-brain auto recovery procedure. As a consequence, it should be abandoned. drbd.conf pri-lost The node is currently primary, but DRBD's algorithm thinks that it should become sync target. As a consequence it should give up its primary role. drbd.conf fence-peer The handler is part of the mechanism. This handler is called in case the node needs to fence the peer's disk. It should use other communication paths than DRBD's network link. drbd.conf local-io-error DRBD got an IO error from the local IO subsystem. drbd.conf initial-split-brain DRBD has connected and detected a split brain situation. This handler can alert someone in all cases of split brain, not just those that go unresolved. drbd.conf split-brain DRBD detected a split brain situation that remains unresolved. Manual recovery is necessary. This handler should alert someone on duty. drbd.conf before-resync-target DRBD calls this handler just before a resync begins on the node that becomes resync target. It might be used to take a snapshot of the backing block device.
drbd.conf after-resync-target DRBD calls this handler just after a resync operation finished on the node whose disk just became consistent after being inconsistent for the duration of the resync. It might be used to remove a snapshot of the backing device that was created by the handler. Other Keywords drbd.conf include Include all files matching the wildcard pattern file-pattern. The statement is only allowed on the top level, i.e. it is not allowed inside any section. Notes on data integrity There are two independent methods in DRBD to ensure the integrity of the mirrored data. The online-verify mechanism and the of the section. Both mechanisms might deliver false positives if the user of DRBD modifies the data which gets written to disk while the transfer goes on. This may happen for swap, or for certain append while global sync, or truncate/rewrite workloads, and not necessarily poses a problem for the integrity of the data. Usually when the initiator of the data transfer does this, it already knows that that data block will not be part of an on disk data structure, or will be resubmitted with correct data soon enough. The causes the receiving side to log an error about "Digest integrity check FAILED: Ns +x\n", where N is the sector offset, and x is the size of the request in bytes. It will then disconnect, and reconnect, thus causing a quick resync. If the sending side at the same time detected a modification, it warns about "Digest mismatch, buffer modified by upper layers during write: Ns +x\n", which shows that this was a false positive. The sending side may detect these buffer modifications immediately after the unmodified data has been copied to the tcp buffers, in which case the receiving side won't notice it. The most recent (2007) example of systematic corruption was an issue with the TCP offloading engine and the driver of a certain type of GBit NIC. The actual corruption happened on the DMA transfer from core memory to the card. 
Since the TCP checksum gets calculated on the card, this type of corruption stays undetected as long as you do not use either the online or the . We suggest to use the only during a pre-production phase due to its CPU costs. Further we suggest to do online runs regularly e.g. once a month during a low load period. Version This document was revised for version 8.4.0 of the DRBD distribution. Author Written by Philipp Reisner philipp.reisner@linbit.com and Lars Ellenberg lars.ellenberg@linbit.com. Reporting Bugs Report bugs to drbd-user@lists.linbit.com. Copyright Copyright 2001-2008 LINBIT Information Technologies, Philipp Reisner, Lars Ellenberg. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See Also drbd8, drbddisk8, drbdsetup8, drbdmeta8, drbdadm8, DRBD User's Guide, DRBD web site drbd-8.4.4/documentation/drbd.xml0000664000000000000000000000670212132747531015462 0ustar rootroot drbd The start and stop script for DRBD DRBD 8.3.2 15 Oct 2008 drbd 8 System Administration /etc/init.d/drbd resource start stop status reload restart force-reload Introduction The script is used to start and stop drbd on a system V style init system. In order to use you must define a resource, a host, and any other configuration options in the drbd configuration file. See for details. If resource is omitted, then all of the resources listed in the config file are configured. This script might ask you Do you want to abort waiting for other server and make this one primary? Only answer this question with yes if you are sure that it is impossible to repair the other node. Version This document was revised for version 8.3.2 of the DRBD distribution. Author Written by Philipp Reisner philipp.reisner@linbit.com and Lars Ellenberg lars.ellenberg@linbit.com. Reporting Bugs Report bugs to drbd-user@lists.linbit.com. 
Copyright Copyright 2001-2008 LINBIT Information Technologies, Philipp Reisner, Lars Ellenberg. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See Also drbd.conf5, drbddisk8, drbdsetup8, drbdadm8, DRBD Homepage drbd-8.4.4/documentation/drbdadm.xml0000664000000000000000000005501212221261130016123 0ustar rootroot 6 May 2011 DRBD 8.4.0 drbdadm 8 System Administration drbdadm Administration tool for DRBD drbdadm drbdadm -d -cfile -tfile -scmd -mcmd -S -hhost --backend-options command all resource/volume Description is the high level tool of the DRBD program suite. is to and what / is to . reads its configuration file and performs the specified commands by calling the and/or the program. can operate on whole resources or on individual volumes in a resource. The sub commands: , , , , , , , , , , , , , , , , , , work on whole resources and on individual volumes. Resource level only commands are: , , , , and . Options , Just prints the calls of to stdout, but does not run the commands. , file Specifies the configuration file drbdadm will use. If this parameter is not specified, drbdadm will look for , , and . , file Specifies an additional configuration file for drbdadm to check. This option is only allowed with the dump and the sh-nop commands. , file Specifies the full path to the program. If this option is omitted, drbdadm will look for and . , file Specifies the full path to the program. If this option is omitted, drbdadm will look for and . , Specifies that this command should be performed on a stacked resource. , Specifies to which peer node to connect. Only necessary if there are more than two host sections in the resource you are working on. backend-options All options following the double hyphen are considered backend-options. These are passed through to the backend command. I.e. to , or .
Commands attach Attaches a local backing block device to the DRBD resource's device. detach drbdadm detach Removes the backing storage device from a DRBD resource's device. connect drbdadm connect Sets up the network configuration of the resource's device. If the peer device is already configured, the two DRBD devices will connect. If there are more than two host sections in the resource you need to use the option to select the peer you want to connect to. disconnect drbdadm disconnect Removes the network configuration from the resource. The device will then go into StandAlone state. syncer drbdadm syncer Loads the resynchronization parameters into the device. up drbdadm up Is a shortcut for attach and connect. down drbdadm down Is a shortcut for disconnect and detach. primary drbdadm primary Promote the resource's device into primary role. You need to do this before any access to the device, such as creating or mounting a file system. secondary drbdadm secondary Brings the device back into secondary role. This is needed since in a connected DRBD device pair, only one of the two peers may have primary role (except if is explicitly set in the configuration file). invalidate drbdadm invalidate Forces DRBD to consider the data on the local backing storage device as out-of-sync. Therefore DRBD will copy each and every block from its peer, to bring the local storage device back in sync. To avoid races, you need an established replication link, or be disconnected Secondary. invalidate-remote drbdadm invalidate-remote This command is similar to the invalidate command, however, the peer's backing storage is invalidated and hence rewritten with the data of the local node. To avoid races, you need an established replication link, or be disconnected Primary. resize drbdadm resize Causes DRBD to re-examine all sizing constraints, and resize the resource's device accordingly. 
For example, if you increased the size of your backing storage devices (on both nodes, of course), then DRBD will adopt the new size after you call this command on one of your nodes. Since new storage space must be synchronised this command only works if there is at least one primary node present. The option can be used to online shrink the usable size of a drbd device. It's the user's responsibility to make sure that a file system on the device is not truncated by that operation. The allows you to resize a device which is currently not connected to the peer. Use with care, since if you do not resize the peer's disk as well, further connect attempts of the two will fail. The allows you to resize an existing device and avoid syncing the new space. This is useful when adding additional blank storage to your device. Example: # drbdadm -- --assume-clean resize r0 The options and may be used to change the layout of the activity log online. In case of internal meta data this may involve shrinking the user visible size at the same time (using the ) or increasing the available space on the backing devices. check-resize drbdadm check-resize Calls drbdmeta to move internal meta data if necessary. If the backing device was resized, while DRBD was not running, meta data has to be moved to the end of the device, so that the next command can succeed. create-md drbdadm create-md Initializes the meta data storage. This needs to be done before a DRBD resource can be taken online for the first time. In case of issues with that command have a look at drbdmeta 8. get-gi drbdadm get-gi Shows a short textual representation of the data generation identifiers. show-gi drbdadm show-gi Prints a textual representation of the data generation identifiers including explanatory information. dump-md drbdadm dump-md Dumps the whole contents of the meta data storage, including the stored bit-map and activity-log, in a textual representation.
outdate drbdadm outdate Sets the outdated flag in the meta data. adjust drbdadm adjust Synchronizes the configuration of the device with your configuration file. You should always examine the output of the dry-run mode before actually executing this command. wait-connect drbdadm wait-connect Waits until the device is connected to its peer device. role drbdadm role Shows the current roles of the devices (local/peer). E.g. Primary/Secondary state drbdadm state Deprecated alias for "role", see above. cstate drbdadm cstate Shows the current connection state of the devices. dump drbdadm dump Just parse the configuration file and dump it to stdout. May be used to check the configuration file for syntactic correctness. outdate drbdadm outdate Used to mark the node's data as outdated. Usually used by the peer's fence-peer handler. verify drbdadm verify Starts online verify. During online verify, data on both nodes is compared for equality. See /proc/drbd for online verify progress. If out-of-sync blocks are found, they are not resynchronized automatically. To do that, disconnect and connect the resource when verification has completed. See also the notes on data integrity on the drbd.conf manpage. pause-sync drbdadm pause-sync Temporarily suspend an ongoing resynchronization by setting the local pause flag. Resync only progresses if neither the local nor the remote pause flag is set. It might be desirable to postpone DRBD's resynchronization until after any resynchronization of the backing storage's RAID setup. resume-sync drbdadm resume-sync Unset the local sync pause flag. new-current-uuid drbdadm new-current-uuid Generates a new current UUID and rotates all other UUID values. This can be used to shorten the initial resync of a cluster. See the manpage for more details. dstate drbdadm dstate Show the current state of the backing storage devices. (local/peer) hidden-commands Shows all commands undocumented on purpose.
Version This document was revised for version 8.4.0 of the DRBD distribution. Author Written by Philipp Reisner philipp.reisner@linbit.com and Lars Ellenberg lars.ellenberg@linbit.com Reporting Bugs Report bugs to drbd-user@lists.linbit.com. Copyright Copyright 2001-2011 LINBIT Information Technologies, Philipp Reisner, Lars Ellenberg. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See Also drbd.conf 5 , drbd 8 , drbddisk 8 , drbdsetup 8 , drbdmeta 8 and the DRBD project web site drbd-8.4.4/documentation/drbddisk.xml0000664000000000000000000000615012132747531016332 0ustar rootroot drbddisk Script to mark devices as primary and mount file systems 15 Oct 2008 DRBD 8.3.2 drbddisk 8 System Administration /etc/ha.d/resource.d/drbddisk resource start stop status Introduction The script brings the local device of resource into primary role. It is designed to be used by Heartbeat. In order to use you must define a resource, a host, and any other configuration options in the DRBD configuration file. See for details. If resource is omitted, then all of the resources listed in the config file are affected. Version This document was revised for version 8.0.14 of the DRBD distribution. Author Written by Philipp Reisner philipp.reisner@linbit.com and Lars Ellenberg lars.ellenberg@linbit.com. Reporting Bugs Report bugs to drbd-user@lists.linbit.com. Copyright Copyright 2001-2008 LINBIT Information Technologies, Philipp Reisner, Lars Ellenberg. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See Also drbd.conf5, drbd8, drbdsetup8, drbdadm8, DRBD Homepage drbd-8.4.4/documentation/drbdmeta.xml0000664000000000000000000002416312132747531016332 0ustar rootroot 15 Oct 2008 DRBD 8.3.2 drbdmeta 8 System Administration drbdmeta DRBD's meta data management tool drbdmeta drbdmeta --force --ignore-sanity-checks device v06 minor v07 meta_dev index v08 meta_dev index command cmd args Description Drbdmeta is used to create, display and modify the contents of DRBD's meta data storage. Usually you do not want to use this command directly, but start it via the frontend drbdadm8. This command only works if the DRBD resource is currently down, or at least detached from its backing storage. The first parameter is the device node associated to the resource. With the second parameter you can select the version of the meta data. Currently all major DRBD releases (0.6, 0.7 and 8) are supported. Options --force drbdmeta --force All questions that get asked by drbdmeta are treated as if the user answered 'yes'. --ignore-sanity-checks drbdmeta --ignore-sanity-checks Some sanity checks cause drbdmeta to terminate. E.g. if a file system image would get destroyed by creating the meta data. By using that option you can force drbdmeta to ignore these checks. Commands create-md drbdmeta create-md Create-md initializes the meta data storage. This needs to be done before a DRBD resource can be taken online for the first time. In case there is already a meta data signature of an older format in place, drbdmeta will ask you if it should convert the older format to the selected format. If you use the resource before it is connected to its peer for the first time DRBD may perform better if you use the option. For DRBD versions of the peer use up to these values: <8.3.7 -> 4k, 8.3.8 -> 32k, 8.3.9 -> 128k, 8.4.0 -> 1M. If you want to use more than 6433 activity log extents, or live on top of a striped RAID, you may specify the number of stripes (, default 1), and the stripe size (, default 32).
To just use a larger linear on-disk ring-buffer, leave the number of stripes at 1, and increase the size only: drbdmeta 0 v08 /dev/vg23/lv42 internal create-md --al-stripe-size 1M To avoid a single "spindle" from becoming a bottleneck, increase the number of stripes, to achieve an interleaved layout of the on-disk activity-log transactions. What you give as "stripe-size" should be what is also known as "chunk size" or "granularity" or "stripe unit": the minimum skip to the next "spindle". drbdmeta 0 v08 /dev/vg23/lv42 internal create-md --al-stripes 7 --al-stripe-size 64k get-gi drbdmeta get-gi Get-gi shows a short textual representation of the data generation identifier. In version 0.6 and 0.7 these are generation counters, while in version 8 it is a set of UUIDs. show-gi drbdmeta show-gi Show-gi prints a textual representation of the data generation identifiers including explanatory information. dump-md drbdmeta dump-md Dumps the whole contents of the meta data storage including the stored bit-map and activity-log in a textual representation. outdate drbdmeta outdate Sets the outdated flag in the meta data. This is used by the peer node when it wants to become primary, but cannot communicate with the DRBD stack on this host. dstate drbdmeta dstate Prints the state of the data on the backing storage. The output is always followed by '/DUnknown' since drbdmeta only looks at the local meta data. check-resize drbdmeta check-resize Examines the device size of a backing device, and its last known device size, recorded in a file /var/lib/drbd/drbd-minor-??.lkbd. In case the size of the backing device changed, and the meta data can be found at the old position, it moves the meta data to the right position at the end of the block device. Expert's commands Drbdmeta allows you to modify the meta data as well. This is intentionally omitted from the command's usage output, since you should only use it if you really know what you are doing.
By setting the generation identifiers to wrong values, you risk overwriting your up-to-date data with an older version of your data. set-gi gi drbdmeta set-gi Set-gi allows you to set the generation identifier. Gi needs to be a generation counter for the 0.6 and 0.7 format, and a UUID set for 8.x. Specify it in the same way as get-gi shows it. restore-md dump_file drbdmeta restore-md Reads the dump_file and writes it to the meta data. Version This document was revised for version 8.3.2 of the DRBD distribution. Author Written by Philipp Reisner philipp.reisner@linbit.com and Lars Ellenberg lars.ellenberg@linbit.com. Reporting Bugs Report bugs to drbd-user@lists.linbit.com. Copyright Copyright 2001-2008 LINBIT Information Technologies, Philipp Reisner, Lars Ellenberg. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See Also drbdadm 8 drbd-8.4.4/documentation/drbdsetup.xml0000664000000000000000000022056512221261130016531 0ustar rootroot 6 May 2011 DRBD 8.4.0 drbdsetup 8 System Administration drbdsetup Setup tool for DRBD drbdsetup Description drbdsetup is used to associate DRBD devices with their backing block devices, to set up DRBD device pairs to mirror their backing block devices, and to inspect the configuration of running DRBD devices. Note drbdsetup is a low level tool of the DRBD program suite. It is used by the data disk and drbd scripts to communicate with the device driver. Commands Each drbdsetup sub-command might require arguments and bring its own set of options. All values have default units which might be overridden by K, M or G. These units are defined in the usual way (e.g. K = 2^10 = 1024). Common options All drbdsetup sub-commands accept these two options. In case the specified DRBD device (minor number) does not exist yet, create it implicitly. new-resource Resources are the primary objects of any DRBD configuration.
A resource must be created with the command before any volumes or minor devices can be created. Connections are referenced by name. new-minor A minor is used as a synonym for replicated block device. It is represented in the /dev/ directory by a block device. It is the application's interface to the DRBD-replicated block devices. These block devices get addressed by their minor numbers on the drbdsetup commandline. A pair of replicated block devices may have different minor numbers on the two machines. They are associated by a common volume-number. Volume numbers are local to each connection. Minor numbers are global on one node. del-resource Destroys a resource object. This is only possible if the resource has no volumes. del-minor Minors can only be destroyed if its disk is detached. attach, disk-options drbdsetup disk Attach associates device with lower_device to store its data blocks on. The (or ) should only be used if you wish not to use as much as possible from the backing block devices. If you do not use , the device is only ready for use as soon as it was connected to its peer once. (See the command.) With the disk-options command it is possible to change the options of a minor while it is attached. You can override DRBD's size determination method with this option. If you need to use the device before it was ever connected to its peer, use this option to pass the size of the DRBD device to the driver. Default unit is sectors (1s = 512 bytes). If you use the size parameter in drbd.conf, we strongly recommend to add an explicit unit postfix. drbdadm and drbdsetup used to have mismatching default units. If the driver of the lower_device reports an error to DRBD, DRBD will mark the disk as inconsistent, call a helper program, or detach the device from its backing storage and perform all further IO by requesting it from the peer. The valid err_handlers are: , and . 
Under we understand preventive measures to avoid situations where both nodes are primary and disconnected (AKA split brain). Valid fencing policies are: This is the default policy. No fencing actions are done. If a node becomes a disconnected primary, it tries to outdate the peer's disk. This is done by calling the fence-peer handler. The handler is supposed to reach the other node over alternative communication paths and call 'drbdadm outdate res' there. If a node becomes a disconnected primary, it freezes all its IO operations and calls its fence-peer handler. The fence-peer handler is supposed to reach the peer over alternative communication paths and call 'drbdadm outdate res' there. In case it cannot reach the peer, it should stonith the peer. IO is resumed as soon as the situation is resolved. In case your handler fails, you can resume IO with the command. DRBD has four implementations to express write-after-write dependencies to its backing storage device. DRBD will use the first method that is supported by the backing storage device and that is not disabled. By default the flush method is used. Since drbd-8.4.2 is disabled by default because since linux-2.6.36 (or 2.6.32 RHEL6) there is no reliable way to determine if queuing of IO-barriers works. This is dangerous; only enable it if you are told to do so by someone who knows for sure. When selecting the method you should not only base your decision on the measurable performance. In case your backing storage device has a volatile write cache (plain disks, RAID of plain disks) you should use one of the first two. In case your backing storage device has battery-backed write cache you may go with option 3. Option 4 (disable everything, use "none") is dangerous on most IO stacks, may result in write-reordering, and if so, can theoretically be the reason for data corruption, or disturb the DRBD protocol, causing spurious disconnect/reconnect cycles. Do not use . Unfortunately device mapper (LVM) might not support barriers.
The letter after "wo:" in /proc/drbd indicates which method is currently in use for a device: b, f, d, n. The implementations: barrier The first requires that the driver of the backing storage device support barriers (called 'tagged command queuing' in SCSI and 'native command queuing' in SATA speak). The use of this method can be enabled by setting the options to . flush The second requires that the backing device support disk flushes (called 'force unit access' in the drive vendors' speak). The use of this method can be disabled by setting to . drain The third method is simply to let write requests drain before write requests of a new reordering domain are issued. That was the only implementation before 8.0.9. none The fourth method is to not express write-after-write dependencies to the backing store at all, by also specifying . This is dangerous on most IO stacks, may result in write-reordering, and if so, can theoretically be the reason for data corruption, or disturb the DRBD protocol, causing spurious disconnect/reconnect cycles. Do not use . Disables the use of disk flushes and barrier BIOs when accessing the meta data device. See the notes on . In some special circumstances the device mapper stack manages to pass BIOs to DRBD that violate the constraints that are set forth by DRBD's merge_bvec() function and which have more than one bvec. A known example is: phys-disk -> DRBD -> LVM -> Xen -> misaligned partition (63) -> DomU FS. Then you might see "bio would need to, but cannot, be split:" in the Dom0's kernel log. The best workaround is to properly align the partition within the VM (e.g. start it at sector 1024). That costs 480 KiB of storage. Unfortunately the default of most Linux partitioning tools is to start the first partition at an odd number (63). Therefore most distributions' install helpers for virtual Linux machines will end up with misaligned partitions.
The second best workaround is to limit DRBD's max bvecs per BIO (i.e., the option) to 1, but that might cost performance. The default value of is 0, which means that there is no user imposed limitation. To ensure smooth operation of the application on top of DRBD, it is possible to limit the bandwidth that may be used by background synchronization. The default is 250 KiB/sec, the default unit is KiB/sec. Start resync on this device only if the device with minor is already in connected state. Otherwise this device waits in SyncPause state. DRBD automatically performs hot area detection. With this parameter you control how big the hot area (=active set) can get. Each extent marks 4M of the backing storage. In case a primary node leaves the cluster unexpectedly, the areas covered by the active set must be resynced upon rejoining of the failed node. The data structure is stored in the meta-data area, therefore each change of the active set is a write operation to the meta-data device. A higher number of extents gives longer resync times but less updates to the meta-data. The default number of extents is 1237. (Minimum: 7, Maximum: 65534) See also drbd.conf5 and drbdmeta8 for additional limitations and necessary preparation. DRBD's activity log transaction writing makes it possible, that after the crash of a primary node a partial (bit-map based) resync is sufficient to bring the node back to up-to-date. Setting to might increase normal operation performance but causes DRBD to do a full resync when a crashed primary gets reconnected. The default value is . The dynamic resync speed controller gets enabled with setting plan_time to a positive value. It aims to fill the buffers along the data path with either a constant amount of data fill_target, or aims to have a constant delay time of delay_target along the path. The controller has an upper bound of max_rate. By plan_time the agility of the controller is configured. 
Higher values yield slower responses of the controller to deviations from the target value. It should be at least 5 times RTT. For regular data paths a fill_target in the area of 4k to 100k is appropriate. For a setup that contains drbd-proxy it is advisable to use delay_target instead. The controller uses delay_target only when fill_target is set to 0. 5 times RTT is a reasonable starting value. Max_rate should be set to the bandwidth available between the DRBD-hosts and the machines hosting DRBD-proxy, or to the available disk-bandwidth. The default value of plan_time is 0, the default unit is 0.1 seconds. The default value of fill_target is 0, the default unit is sectors. The default value of delay_target is 1 (100ms), the default unit is 0.1 seconds. The default value of max_rate is 102400 (100MiB/s), the default unit is KiB/s. We track the disk IO rate caused by the resync, so we can detect non-resync IO on the lower level device. If the lower level device seems to be busy, and the current resync rate is above min_rate, we throttle the resync. The default value of min_rate is 4M, the default unit is k. If you do not want to throttle at all, set it to zero; if you want to throttle always, set it to one. , If the driver of the lower_device does not finish an IO request within disk_timeout, DRBD considers the disk as failed. If DRBD is connected to a remote host, it will reissue local pending IO requests to the peer, and ship all new IO requests to the peer only. The disk state advances to diskless, as soon as the backing block device has finished all IO requests. The default value of is 0, which means that no timeout is enforced. The default unit is 100ms. This option is available since 8.3.12. The supported methods for load balancing of read requests are , , , and , , , , , and . The default value of is . This option is available since 8.4.1. connect, net-options drbdsetup net Connect sets up the device to listen on af:local_addr:port for incoming connections and to try to connect to af:remote_addr:port.
If port is omitted, 7788 is used as default. If af is omitted gets used. Other supported address families are , for Dolphin Interconnect Solutions' "super sockets" and for Sockets Direct Protocol (Infiniband). The net-options command allows you to change options while the connection is established. On the TCP/IP link the specified protocol is used. Valid protocol specifiers are A, B, and C. Protocol A: write IO is reported as completed, if it has reached local disk and local TCP send buffer. Protocol B: write IO is reported as completed, if it has reached local disk and remote buffer cache. Protocol C: write IO is reported as completed, if it has reached both local and remote disk. In case it is not possible to connect to the remote DRBD device immediately, DRBD keeps on trying to connect. With this option you can set the time between two retries. The default value is 10. The unit is seconds. If the TCP/IP connection linking a DRBD device pair is idle for more than time seconds, DRBD will generate a keep-alive packet to check if its partner is still alive. The default value is 10. The unit is seconds. If the partner node fails to send an expected response packet within val tenths of a second, the partner node is considered dead and therefore the TCP/IP connection is abandoned. The default value is 60 (= 6 seconds). The socket send buffer is used to store packets sent to the secondary node, which are not yet acknowledged (from a network point of view) by the secondary node. When using protocol A, it might be necessary to increase the size of this data structure in order to increase asynchronicity between primary and secondary nodes. But keep in mind that more asynchronicity is synonymous with more data loss in the case of a primary node failure. Since 8.0.13 resp. 8.2.7 setting the size value to 0 means that the kernel should autotune this. The default size is 0, i.e. autotune. Packets received from the network are stored in the socket receive buffer first. 
From there they are consumed by DRBD. Before 8.3.2 the receive buffer's size was always set to the size of the socket send buffer. Since 8.3.2 they can be tuned independently. A value of 0 means that the kernel should autotune this. The default size is 0, i.e. autotune. In case the secondary node fails to complete a single write request for count times the timeout, it is expelled from the cluster, i.e. the primary node goes into StandAlone mode. The default is 0, which disables this feature. With this option the maximal number of write requests between two barriers is limited. Typically set to the same as , or the allowed maximum. Values smaller than 10 can lead to degraded performance. The default value is 2048. With this option the maximal number of buffer pages allocated by DRBD's receiver thread is limited. Typically set to the same as . Small values could lead to degraded performance. The default value is 2048, the minimum 32. Increase this if you cannot saturate the IO backend of the receiving side during linear write or during resync while otherwise idle. See also drbd.conf5 This setting has no effect with recent kernels that use explicit on-stack plugging (upstream Linux kernel 2.6.39, distributions may have backported). When the number of pending write requests on the standby (secondary) node exceeds the unplug-watermark, we trigger the request processing of our backing storage device. Some storage controllers deliver better performance with small values, others deliver best performance when the value is set to the same value as max-buffers, yet others don't feel much effect at all. Minimum 16, default 128, maximum 131072. With this option set you may assign primary role to both nodes. You only should use this option if you use a shared storage file system on top of DRBD. At the time of writing the only ones are: OCFS2 and GFS. If you use this option with any other file system, you are going to crash your nodes and to corrupt your data! 
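The expulsion rule described earlier in this section (a secondary is expelled once a single write request has been outstanding for count times the timeout, where the timeout is given in tenths of a second and defaults to 60) comes down to simple arithmetic. A minimal Python sketch follows; the function and parameter names are made up for illustration and are not part of any DRBD interface.

```python
from typing import Optional


def expel_after_seconds(count: int, timeout_tenths: int = 60) -> Optional[float]:
    """Hypothetical helper: seconds until an unresponsive secondary is expelled.

    count          -- the count value; 0 disables the feature (the default)
    timeout_tenths -- network timeout in tenths of a second (default 60 = 6 s)
    """
    if count == 0:
        return None  # feature disabled: the peer is never expelled for this reason
    return count * timeout_tenths / 10.0


if __name__ == "__main__":
    print(expel_after_seconds(0))      # feature disabled
    print(expel_after_seconds(7))      # 7 * 6 s
    print(expel_after_seconds(4, 30))  # 4 * 3 s
```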
You need to specify the HMAC algorithm to enable peer authentication at all. You are strongly encouraged to use peer authentication. The HMAC algorithm will be used for the challenge response authentication of the peer. You may specify any digest algorithm that is named in /proc/crypto. The shared secret used in peer authentication. May be up to 64 characters. possible policies are: No automatic resynchronization, simply disconnect. Auto sync from the node that was primary before the split-brain situation occurred. Auto sync from the node that became primary as second during the split-brain situation. In case one node did not write anything since the split brain became evident, sync from the node that wrote something to the node that did not write anything. In case none wrote anything this policy uses a random decision to perform a "resync" of 0 blocks. In case both have written something this policy disconnects the nodes. Auto sync from the node that touched more blocks during the split brain situation. Auto sync to the named node. possible policies are: No automatic resynchronization, simply disconnect. Discard the version of the secondary if the outcome of the algorithm would also destroy the current secondary's data. Otherwise disconnect. Discard the secondary's version. Always honor the outcome of the algorithm. In case it decides the current secondary has the correct data, call the on the current primary. Always honor the outcome of the algorithm. In case it decides the current secondary has the correct data, accept a possible instantaneous change of the primary's data. possible policies are: No automatic resynchronization, simply disconnect. Always honor the outcome of the algorithm. In case it decides the current secondary has the right data, call the on the current primary. Always honor the outcome of the algorithm. In case it decides the current secondary has the right data, accept a possible instantaneous change of the primary's data. 
Normally the automatic after-split-brain policies are only used if current states of the UUIDs do not indicate the presence of a third node. With this option you request that the automatic after-split-brain policies are used as long as the data sets of the nodes are somehow related. This might cause a full sync, if the UUIDs indicate the presence of a third node. (Or double faults have led to strange UUID sets.) This option sets DRBD's behavior when DRBD deduces from its meta data that a resynchronization is needed, and the SyncTarget node is already primary. The possible settings are: , and . While speaks for itself, with the setting the handler is called which is expected to either change the role of the node to secondary, or remove the node from the cluster. The default is . With the setting you allow DRBD to force a primary node into SyncTarget state. This means that the data exposed by DRBD changes to the SyncSource's version of the data instantaneously. USE THIS OPTION ONLY IF YOU KNOW WHAT YOU ARE DOING. DRBD can ensure the data integrity of the user's data on the network by comparing hash values. Normally this is ensured by the 16 bit checksums in the headers of TCP/IP packets. This option can be set to any of the kernel's data digest algorithms. In a typical kernel configuration you should have at least one of , , and available. By default this is not enabled. See also the notes on data integrity on the drbd.conf manpage. DRBD usually uses the TCP socket option TCP_CORK to hint to the network stack when it can expect more data, and when it should flush out what it has in its send queue. There is at least one network stack that performs worse when one uses this hinting method. Therefore we introduced this option, which disable the setting and clearing of the TCP_CORK socket option by DRBD. The time the peer has to answer to a keep-alive packet. In case the peer's reply is not received within this time period, it is considered dead. 
The default unit is tenths of a second, the default value is 5 (for half a second). Use this option to manually recover from a split-brain situation. In case you do not have any automatic after-split-brain policies selected, the nodes refuse to connect. By passing this option you make this node a sync target immediately after successful connect. Causes DRBD to abort the connection process after the resync handshake, i.e. no resync gets performed. You can find out which resync DRBD would perform by looking at the kernel's log file. By default DRBD blocks when the available TCP send queue becomes full. That means it will slow down the application that generates the write requests that cause DRBD to send more data down that TCP connection. When DRBD is deployed with DRBD-proxy it might be more desirable that DRBD goes into AHEAD/BEHIND mode shortly before the send queue becomes full. In AHEAD/BEHIND mode DRBD no longer replicates data, but still keeps the connection open. The advantage of the AHEAD/BEHIND mode is that the application is not slowed down, even if DRBD-proxy's buffer is not sufficient to buffer all write requests. The downside is that the peer node falls behind, and that a resync will be necessary to bring it back into sync. During that resync the peer node will have an inconsistent disk. Available congestion_policy values are and . The default is . Fill_threshold might be in the range of 0 to 10GiBytes. The default is 0 which disables the check. Active_extents_threshold has the same limits as . The AHEAD/BEHIND mode and its settings are available since DRBD 8.3.10. During online verification (as initiated by the verify sub-command), rather than doing a bit-wise comparison, DRBD applies a hash function to the contents of every block being verified, and compares that hash with the peer. This option defines the hash algorithm being used for that purpose. It can be set to any of the kernel's data digest algorithms.
In a typical kernel configuration you should have at least one of , , and available. By default this is not enabled; you must set this option explicitly in order to be able to use on-line device verification. See also the notes on data integrity on the drbd.conf manpage. A resync process sends all marked data blocks from the source to the destination node, as long as no is given. When one is specified the resync process exchanges hash values of all marked blocks first, and sends over only those data blocks that have different hash values. This setting is useful for DRBD setups with low bandwidth links. During the restart of a crashed primary node, all blocks covered by the activity log are marked for resync. But a large part of those will actually be still in sync, therefore using will lower the required bandwidth in exchange for CPU cycles. During resync-handshake, the dirty-bitmaps of the nodes are exchanged and merged (using bit-or), so the nodes will have the same understanding of which blocks are dirty. On large devices, the fine grained dirty-bitmap can become large as well, and the bitmap exchange can take quite some time on low-bandwidth links. Because the bitmap typically contains compact areas where all bits are unset (clean) or set (dirty), a simple run-length encoding scheme can considerably reduce the network traffic necessary for the bitmap exchange. For backward compatibility reasons, and because on fast links this possibly does not improve transfer time but consumes CPU cycles, this defaults to off. Introduced in 8.3.2. resource-options drbdsetup resource-options Changes the options of the resource at runtime. Sets the cpu-affinity-mask for DRBD's kernel threads of this device. The default value of cpu-mask is 0, which means that DRBD's kernel threads should be spread over all CPUs of the machine. This value must be given in hexadecimal notation. If it is too big it will be truncated.
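Since cpu-mask has to be given in hexadecimal notation, it can help to derive the value from a list of CPU numbers. The Python sketch below assumes the conventional affinity-mask layout in which bit N stands for CPU N; that layout and the helper name are assumptions, not something the text above spells out.

```python
from typing import Iterable


def cpu_mask_hex(cpus: Iterable[int]) -> str:
    """Hypothetical helper: build a hexadecimal mask string from CPU numbers.

    Assumption: bit N of the mask selects CPU N. An empty list yields "0",
    which per the text means "spread over all CPUs of the machine".
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu  # set the bit for this CPU
    return format(mask, "x")  # hexadecimal, without a 0x prefix


if __name__ == "__main__":
    print(cpu_mask_hex([0, 1]))  # CPUs 0 and 1
    print(cpu_mask_hex([2, 3]))  # CPUs 2 and 3
    print(cpu_mask_hex([]))      # default: no restriction
```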
This setting controls what happens to IO requests on a degraded, diskless node (i.e. no data store is reachable). The available policies are and . If ond-policy is set to you can either resume IO by attaching/connecting the last lost data storage, or by the drbdadm resume-io res command. The latter will result in IO errors of course. The default is . This setting is available since DRBD 8.3.9. primary drbdsetup primary Sets the device into primary role. This means that applications (e.g. a file system) may open the device for read and write access. Data written to the device in primary role are mirrored to the device in secondary role. Normally it is not possible to set both devices of a connected DRBD device pair to primary role. By using the option, you override this behavior and instruct DRBD to allow two primaries. Alias for --force. Becoming primary fails if the local replica is not up-to-date, i.e. when it is inconsistent, outdated or consistent. By using this option you can force it into primary role anyway. USE THIS OPTION ONLY IF YOU KNOW WHAT YOU ARE DOING. secondary drbdsetup secondary Brings the device into secondary role. This operation fails as long as at least one application (or file system) has opened the device. It is possible that both devices of a connected DRBD device pair are secondary. verify drbdsetup verify This initiates on-line device verification. During on-line verification, the contents of every block on the local node are compared to those on the peer node. Device verification progress can be monitored via /proc/drbd. Any blocks whose content differs from that of the corresponding block on the peer node will be marked out-of-sync in DRBD's on-disk bitmap; they are not brought back in sync automatically. To do that, simply disconnect and reconnect the resource. If on-line verification is already in progress (and this node is "VerifyS"), this command silently "succeeds".
In this case, any start-sector (see below) will be ignored, and any stop-sector (see below) will be honored. This can be used to stop a running verify, or to update/shorten/extend the coverage of the currently running verify. This command will fail if the device is not part of a connected device pair. See also the notes on data integrity on the drbd.conf manpage. Since version 8.3.2, on-line verification should resume from the last position after connection loss. It may also be started from an arbitrary position by setting this option. If you had reached some stop-sector before, and you do not specify an explicit start-sector, verify should resume from the previous stop-sector. Default unit is sectors. You may also specify a unit explicitly. The will be rounded down to a multiple of 8 sectors (4kB). , Since version 8.3.14, on-line verification can be stopped before it reaches end-of-device. Default unit is sectors. You may also specify a unit explicitly. The may be updated by issuing an additional drbdsetup verify command on the same node while the verify is running. This can be used to stop a running verify, or to update/shorten/extend the coverage of the currently running verify. invalidate drbdsetup invalidate This forces the local device of a pair of connected DRBD devices into SyncTarget state, which means that all data blocks of the device are copied over from the peer. This command will fail if the device is not either part of a connected device pair, or disconnected Secondary. invalidate-remote drbdsetup invalidate-remote This forces the local device of a pair of connected DRBD devices into SyncSource state, which means that all data blocks of the device are copied to the peer. On a disconnected Primary device, this will set all bits in the out of sync bitmap. As a side effect this suspends updates to the on disk activity log. Updates to the on disk activity log resume automatically when necessary.
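The rounding rule mentioned in the verify section (the start position is rounded down to a multiple of 8 sectors, i.e. 4 KiB with 512-byte sectors) can be checked with a few lines of Python. The helper name is hypothetical; the sketch only reproduces the arithmetic, not DRBD's implementation.

```python
SECTOR_BYTES = 512    # one sector, as used throughout this manual
ALIGN_SECTORS = 8     # 8 sectors * 512 bytes = 4096 bytes


def round_start_sector(sector: int) -> int:
    """Hypothetical helper: round a start sector down to a multiple of 8."""
    if sector < 0:
        raise ValueError("sector must be non-negative")
    return sector - (sector % ALIGN_SECTORS)


if __name__ == "__main__":
    for requested in (0, 7, 8, 1003):
        effective = round_start_sector(requested)
        print(requested, "->", effective, "(byte offset", effective * SECTOR_BYTES, ")")
```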
wait-connect drbdsetup wait-connect Returns as soon as the device can communicate with its partner device. This command will fail if the device cannot communicate with its partner for timeout seconds. If the peer was working before this node was rebooted, the wfc_timeout is used. If the peer was already down before this node was rebooted, the degr_wfc_timeout is used. If the peer was successfully outdated before this node was rebooted the outdated_wfc_timeout is used. The default value for all those timeout values is 0 which means to wait forever. The unit is seconds. In case the connection status goes down to StandAlone because the peer appeared but the devices had a split brain situation, the default for the command is to terminate. You can change this behavior with the option. wait-sync drbdsetup wait-sync Returns as soon as the device leaves any synchronization state and enters connected state. The options are the same as with the wait-connect command. disconnect drbdsetup disconnect Removes the information set by the command from the device. This means that the device goes into unconnected state and will no longer listen for incoming connections. detach drbdsetup detach Removes the information set by the command from the device. This means that the device is detached from its backing storage device. , A regular detach returns after the disk state finally reached diskless. As a consequence detaching from a frozen backing block device never terminates. A forced detach, on the other hand, returns immediately. It allows you to detach DRBD from a frozen backing block device. Please note that the disk will be marked as failed until all pending IO requests are finished by the backing block device. down drbdsetup down Removes all configuration information from the device and forces it back to unconfigured state. role drbdsetup role Shows the current roles of the device and its peer, as local/peer.
state drbdsetup state Deprecated alias for "role" cstate drbdsetup cstate Shows the current connection state of the device. dstate drbdsetup dstate Shows the current states of the backing storage devices, as local/peer. resize drbdsetup resize This causes DRBD to reexamine the size of the device's backing storage device. To actually do online growing you need to extend the backing storages on both devices and call the command on one of your nodes. The option can be used to online shrink the usable size of a drbd device. It is the user's responsibility to make sure that a file system on the device is not truncated by that operation. The allows you to resize a device which is currently not connected to the peer. Use with care, since if you do not resize the peer's disk as well, further connect attempts of the two will fail. When the option is given DRBD will skip the resync of the new storage. Only do this if you know that the new storage was initialized to the same content by other means. The options and may be used to change the layout of the activity log online. In case of internal meta data this may involve shrinking the user visible size at the same time (using the ) or increasing the available space on the backing devices. check-resize drbdsetup check-resize To enable DRBD to detect offline resizing of backing devices this command may be used to record the current size of backing devices. The size is stored in files in /var/lib/drbd/ named drbd-minor-??.lkbd This command is called by drbdadm resize res after drbdsetup device resize returned. pause-sync drbdsetup pause-sync Temporarily suspend an ongoing resynchronization by setting the local pause flag. Resync only progresses if neither the local nor the remote pause flag is set. It might be desirable to postpone DRBD's resynchronization until after any resynchronization of the backing storage's RAID setup. resume-sync drbdsetup resume-sync Unset the local sync pause flag.
outdate drbdsetup outdate Mark the data on the local backing storage as outdated. An outdated device refuses to become primary. This is used in conjunction with and by the peer's handler. show-gi drbdsetup show-gi Displays the device's data generation identifiers verbosely. get-gi drbdsetup get-gi Displays the device's data generation identifiers. show drbdsetup show Shows all available configuration information of the device. suspend-io drbdsetup suspend-io This command is of no apparent use and just provided for the sake of completeness. resume-io drbdsetup resume-io If the fence-peer handler fails to stonith the peer node, and your policy is set to resource-and-stonith, you can unfreeze IO operations with this command. events drbdsetup events Displays every state change of DRBD and all calls to helper programs. This might be used to get notified of DRBD's state changes by piping the output to another program. Display the events of all DRBD minors. This is a debugging aid that displays the content of all received netlink messages. new-current-uuid drbdsetup new-current-uuid Generates a new current UUID and rotates all other UUID values. This has at least two use cases, namely to skip the initial sync, and to reduce network bandwidth when starting in a single node configuration and then later (re-)integrating a remote site. Available option: Clears the sync bitmap in addition to generating a new current UUID. This can be used to skip the initial sync, if you want to start from scratch. This use-case does only work on "Just Created" meta data. Necessary steps: On both nodes, initialize meta data and configure the device. drbdadm create-md --force res They need to do the initial handshake, so they know their sizes. drbdadm up res They are now Connected Secondary/Secondary Inconsistent/Inconsistent. Generate a new current-uuid and clear the dirty bitmap. drbdadm new-current-uuid --clear-bitmap res They are now Connected Secondary/Secondary UpToDate/UpToDate. 
Make one side primary and create a file system. drbdadm primary res mkfs -t fs-type $(drbdadm sh-dev res) One obvious side-effect is that the replica is full of old garbage (unless you made them identical using other means), so any online-verify is expected to find any number of out-of-sync blocks. You must not use this on pre-existing data! Even though it may appear to work at first glance, once you switch to the other node, your data is toast, as it never got replicated. So do not leave out the mkfs (or equivalent). This can also be used to shorten the initial resync of a cluster where the second node is added after the first node has gone into production, by means of disk shipping. This use-case works on disconnected devices only, the device may be in primary or secondary role. The necessary steps on the current active server are: drbdsetup new-current-uuid --clear-bitmap minor Take the copy of the current active server. E.g. by pulling a disk out of the RAID1 controller, or by copying with dd. You need to copy the actual data, and the meta data. drbdsetup new-current-uuid minor Now add the disk to the new secondary node, and join it to the cluster. You will get a resync of the parts that were changed since the first call to drbdsetup in step 1. Examples For examples, please have a look at the DRBD User's Guide. Version This document was revised for version 8.3.2 of the DRBD distribution. Author Written by Philipp Reisner philipp.reisner@linbit.com and Lars Ellenberg lars.ellenberg@linbit.com Reporting Bugs Report bugs to drbd-user@lists.linbit.com. Copyright Copyright 2001-2008 LINBIT Information Technologies, Philipp Reisner, Lars Ellenberg. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See Also drbd.conf 5 , drbd 8 , drbddisk 8 , drbdadm 8 , DRBD User's Guide, DRBD web site drbd-8.4.4/documentation/fencing-by-constraints.txt0000664000000000000000000001002712221261130021130 0ustar rootroot# vim: set foldenable foldmethod=indent sw=4 ts=8 : # Copyright 2013 Linbit HA Solutions GmbH # Lars Ellenberg @ linbit.com TODO: someone convert this into proper ascii doc please ;-) ... and draw some pictures ... How crm-fence-peer.sh, pacemaker, and the OCF Linbit DRBD resource agent are supposed to work together. Two node cluster is the trickier one, because it has no real quorum. Relative Timeouts --dc-timeout > dead-time resp. stonith-timeout if stonith enabled, --timeout >= --dc-timeout if no stonith, then timeout may be small. Pacemaker operations timeouts monitor and promote action timeout > max(dc_timeout, timeout) Node reboot, possibly because of crash or stonith due to communication loss no peer reachable [no delay] crm may decide to elect itself, shoot the peer, and start services. If DRBD peer disk state is known Outdated or worse, DRBD will switch itself to UpToDate, allowing it to be promoted, without further fencing actions. If DRBD peer disk state is DUnknown, DRBD will be only Consistent. In case crm decides to promote this instance, the fence-peer callback runs, finds the peer "unreachable", finds itself Consistent only, does NOT set any constraint, and DRBD refuses to be promoted. CRM will now try in an endless loop to promote this instance. Avoid this by adding param adjust_master_score="0 10 1000 10000" to the DRBD resource definition. no replication link CRM can see both nodes. [delay: crmadmin -S $peer] If currently both nodes are Secondary Consistent, CRM will decide to promote one instance. The fence-peer callback will find the other node still reachable after timeout, and set the constraint.
If there is already one Primary, and this is a node rejoining the cluster, there should already be a constraint preventing this node from being promoted. Only Replication link breaks during normal operation Single Primary [delay: crmadmin -S $peer] fence-peer callback finds DC, crmadmin -S confirms peer still "reachable", and sets the constraint. Dual Primary both fence-peer callbacks find DC, both see node_state "reachable", optionally delay for --network-hickup timeout, and if DRBD is still disconnected, both try to set the constraint. Only one succeeds. The loser should probably commit suicide, to reduce the overall recovery time. --suicide-on-failure-if-primary Node crash surviving node is Secondary, [no delay] If not DC, triggers DC election, elects itself. Is DC now. If stonith enabled, shoots the peer. Promotes this node. During promotion, fence-peer callback finds a DC, and a node_state "unreachable", so sets the constraint "immediately". surviving node is Primary (DC) [delay up to timeout] If stonith enabled, shoots the peer. fence-peer callback finds DC, after some time sees node_state "unreachable", or times out while node_state is still "reachable". Either way still sets the constraint. surviving node is Primary (not DC) [delay up to max(dc_timeout,timeout)] fence-peer callback loops trying to contact DC. eventually this node is elected DC. If stonith enabled, shoots the peer. Fence-peer callback either times out while no DC is available, thus fails. Make sure you choose a suitable --dc-timeout. Or it finds the other node "unreachable", and sets the constraint. Total communication loss To the single node, this looks like node crash, so see above. The difference is the potential of data divergence. If DRBD was configured for "fencing resource-and-stonith", IO on any Primary is frozen while the fence-peer callback runs. 
If stonith is enabled, timeouts should be selected so that we are shot while waiting for the DC to confirm node_state "unreachable" of the peer, thus combined with freezing IO, no harmful data divergence can happen at this time. If there is no stonith enabled, data divergence is unavoidable. ==> Multi-Primary *requires* both node level fencing (stonith) AND drbd resource level fencing Again: Multi-Primary REQUIRES stonith enabled and working. drbd-8.4.4/documentation/xml-usage-to-docbook.xsl0000664000000000000000000000323412132747531020512 0ustar rootroot drbdsetup -- val -- -- drbd-8.4.4/drbd-kernel.spec.in0000664000000000000000000000773512226007136014627 0ustar rootrootName: drbd-kernel Summary: Kernel driver for DRBD Version: @PACKAGE_VERSION@ Release: 2%{?dist} Source: http://oss.linbit.com/%{name}/8.3/drbd-%{version}.tar.gz License: GPLv2+ Group: System Environment/Kernel URL: http://www.drbd.org/ BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX) %if ! %{defined suse_version} BuildRequires: redhat-rpm-config %endif %if %{defined kernel_module_package_buildreqs} BuildRequires: %kernel_module_package_buildreqs %endif %description This module is the kernel-dependent driver for DRBD. This is split out so that multiple kernel driver versions can be installed, one for each installed kernel. %prep %setup -q -n drbd-%{version} %if %{defined suse_kernel_module_package} # Support also sles10, where kernel_module_package was not yet defined. # In sles11, suse_k_m_p became a wrapper around k_m_p. %if 0%{?suse_version} < 1110 # We need to exclude some flavours on sles10 etc, # or we hit an rpm internal buffer limit. 
%suse_kernel_module_package -n drbd -p preamble -f filelist-suse kdump kdumppae vmi vmipae um %else %suse_kernel_module_package -n drbd -p preamble -f filelist-suse %endif %else # Concept stolen from sles kernel-module-subpackage: # include the kernel version in the package version, # so we can have more than one kmod-drbd. 
# Needed, because even though kABI is still "compatible" in RHEL 6.0 to 6.1, # the actual functionality differs very much: 6.1 does no longer do BARRIERS, # but wants FLUSH/FUA instead. # For convenience, we want both 6.0 and 6.1 in the same repository, # and have yum/rpm figure out via dependencies, which kmod version should be installed. # This is a dirty hack, non generic, and should probably be enclosed in some "if-on-rhel6". %define _this_kmp_version %{version}_%(echo %kernel_version | sed -r 'y/-/_/; s/\.el.\.(x86_64|i.86)$//;') %kernel_module_package -v %_this_kmp_version -n drbd -p preamble -f filelist-redhat %endif %build rm -rf obj mkdir obj ln -s ../scripts obj/ for flavor in %flavors_to_build; do cp -r drbd obj/$flavor #make -C %{kernel_source $flavor} M=$PWD/obj/$flavor make -C obj/$flavor %{_smp_mflags} all KDIR=%{kernel_source $flavor} done %install export INSTALL_MOD_PATH=$RPM_BUILD_ROOT %if %{defined kernel_module_package_moddir} export INSTALL_MOD_DIR=%{kernel_module_package_moddir drbd} %else %if %{defined suse_kernel_module_package} export INSTALL_MOD_DIR=updates %else export INSTALL_MOD_DIR=extra/drbd %endif %endif # Very likely kernel_module_package_moddir did ignore the parameter, # so we just append it here. The weak-modules magic expects that location. [ $INSTALL_MOD_DIR = extra ] && INSTALL_MOD_DIR=extra/drbd for flavor in %flavors_to_build ; do make -C %{kernel_source $flavor} modules_install \ M=$PWD/obj/$flavor kernelrelease=$(make -s -C %{kernel_source $flavor} kernelrelease) mv obj/$flavor/.kernel.config.gz obj/k-config-$kernelrelease.gz done %if %{defined suse_kernel_module_package} # On SUSE, putting the modules into the default path determined by # %kernel_module_package_moddir is enough to give them priority over # shipped modules. 
rm -f drbd.conf %else mkdir -p $RPM_BUILD_ROOT/etc/depmod.d echo "override drbd * weak-updates" \ > $RPM_BUILD_ROOT/etc/depmod.d/drbd.conf %endif %clean rm -rf %{buildroot} %changelog * Fri Oct 11 2013 Philipp Reisner - 8.4.4-1 - New upstream release. * Tue Feb 5 2013 Philipp Reisner - 8.4.3-1 - New upstream release. * Thu Sep 6 2012 Philipp Reisner - 8.4.2-1 - New upstream release. * Tue Feb 21 2012 Lars Ellenberg - 8.4.1-2 - Build fix for RHEL 6 and ubuntu lucid * Tue Dec 20 2011 Philipp Reisner - 8.4.1-1 - New upstream release. * Mon Jul 18 2011 Philipp Reisner - 8.4.0-1 - New upstream release. * Fri Jan 28 2011 Philipp Reisner - 8.3.10-1 - New upstream release. * Wed Nov 25 2010 Andreas Gruenbacher - 8.3.9-1 - Convert to a Kernel Module Package. drbd-8.4.4/drbd-km.spec.in0000664000000000000000000001347412226007136013753 0ustar rootroot# "uname -r" output of the kernel to build for, the running one # if none was specified with "--define 'kernelversion '" # PLEASE: provide both (correctly) or none!! %{!?kernelversion: %{expand: %%define kernelversion %(uname -r)}} %{!?kdir: %{expand: %%define kdir /lib/modules/%(uname -r)/build}} # encode - to _ to be able to include that in a package name or release "number" %global krelver %(echo %{kernelversion} | tr -s '-' '_') Name: @PACKAGE_TARNAME@-km Summary: DRBD driver for Linux Version: @PACKAGE_VERSION@ Release: 2@RPM_DIST_TAG@ Source: http://oss.linbit.com/%{name}/8.3/drbd-%{version}.tar.gz License: GPLv2+ ExclusiveOS: linux Group: System Environment/Kernel URL: http://www.drbd.org/ BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX) BuildRequires: gcc, @RPM_BUILDREQ_KM@ %description DRBD mirrors a block device over the network to another machine. Think of it as networked raid 1. It is a building block for setting up high availability (HA) clusters. # I choose to have the kernelversion as part of the package name! # drbd-km is prepended... %package %{krelver} Summary: Kernel driver for DRBD. 
Group: System Environment/Kernel Conflicts: @RPM_CONFLICTS_KM@ # always require a suitable userland and depmod. Requires: drbd-utils = %{version}, /sbin/depmod # to be able to override from build scripts which flavor of kernel we are building against. Requires: %{expand: %(echo ${DRBD_KMOD_REQUIRES:-kernel})} # TODO: break up this generic .spec file into per distribution ones, # and use the distribution specific naming and build conventions for kernel modules. %description %{krelver} This module is the kernel-dependent driver for DRBD. This is split out so that multiple kernel driver versions can be installed, one for each installed kernel. %files %{krelver} %defattr(-,root,root) /lib/modules/%{kernelversion}/ %doc COPYING %doc ChangeLog %doc drbd/k-config-%{kernelversion}.gz %prep %setup -q -n drbd-%{version} test -d %{kdir}/. test "$(KDIR=%{kdir} scripts/get_uts_release.sh)" = %{kernelversion} %build %configure \ --without-utils \ --with-km \ --without-udev \ --without-xen \ --without-pacemaker \ --without-heartbeat \ --without-rgmanager \ --without-bashcompletion echo kernelversion=%{kernelversion} echo kversion=%{kversion} echo krelver=%{krelver} make %{_smp_mflags} module KDIR=%{kdir} %install rm -rf %{buildroot} make install DESTDIR=%{buildroot} cd drbd mv .kernel.config.gz k-config-%{kernelversion}.gz %clean rm -rf %{buildroot} %preun %{krelver} lsmod | grep drbd > /dev/null 2>&1 if [ $? 
-eq 0 ]; then rmmod drbd fi %post %{krelver} # hack for distribution kernel packages, # which already contain some (probably outdated) drbd module EXTRA_DRBD_KO=/lib/modules/%{kernelversion}/extra/drbd.ko if test -e $EXTRA_DRBD_KO; then mv $EXTRA_DRBD_KO $EXTRA_DRBD_KO.orig fi uname -r | grep BOOT || /sbin/depmod -a -F /boot/System.map-%{kernelversion} %{kernelversion} >/dev/null 2>&1 || true %postun %{krelver} /sbin/depmod -a -F /boot/System.map-%{kernelversion} %{kernelversion} >/dev/null 2>&1 || true %changelog * Fri Oct 11 2013 Philipp Reisner - 8.4.4-1 - New upstream release. * Tue Feb 5 2013 Philipp Reisner - 8.4.3-1 - New upstream release. * Thu Sep 6 2012 Philipp Reisner - 8.4.2-1 - New upstream release. * Tue Feb 21 2012 Lars Ellenberg - 8.4.1-2 - Build fix for RHEL 6 and ubuntu lucid * Tue Dec 20 2011 Philipp Reisner - 8.4.1-1 - New upstream release. * Mon Jul 18 2011 Philipp Reisner - 8.4.0-1 - New upstream release. * Fri Jan 28 2011 Philipp Reisner - 8.3.10-1 - New upstream release. * Fri Oct 22 2010 Philipp Reisner - 8.3.9-1 - New upstream release. * Wed Jun 2 2010 Philipp Reisner - 8.3.8-1 - New upstream release. * Thu Jan 13 2010 Philipp Reisner - 8.3.7-1 - New upstream release. * Thu Nov 8 2009 Philipp Reisner - 8.3.6-1 - New upstream release. * Thu Oct 27 2009 Philipp Reisner - 8.3.5-1 - New upstream release. * Wed Oct 21 2009 Florian Haas - 8.3.4-12 - Packaging makeover. * Thu Oct 6 2009 Philipp Reisner - 8.3.4-1 - New upstream release. * Thu Oct 5 2009 Philipp Reisner - 8.3.3-1 - New upstream release. * Fri Jul 3 2009 Philipp Reisner - 8.3.2-1 - New upstream release. * Fri Mar 27 2009 Philipp Reisner - 8.3.1-1 - New upstream release. * Thu Dec 18 2008 Philipp Reisner - 8.3.0-1 - New upstream release. * Thu Nov 12 2008 Philipp Reisner - 8.2.7-1 - New upstream release. * Fri May 30 2008 Philipp Reisner - 8.2.6-1 - New upstream release. * Tue Feb 12 2008 Philipp Reisner - 8.2.5-1 - New upstream release. 
* Fri Jan 11 2008 Philipp Reisner - 8.2.4-1 - New upstream release. * Wed Jan 9 2008 Philipp Reisner - 8.2.3-1 - New upstream release. * Fri Nov 2 2007 Philipp Reisner - 8.2.1-1 - New upstream release. * Fri Sep 28 2007 Philipp Reisner - 8.2.0-1 - New upstream release. * Mon Sep 3 2007 Philipp Reisner - 8.0.6-1 - New upstream release. * Fri Aug 3 2007 Philipp Reisner - 8.0.5-1 - New upstream release. * Wed Jun 27 2007 Philipp Reisner - 8.0.4-1 - New upstream release. * Mon May 7 2007 Philipp Reisner - 8.0.3-1 - New upstream release. * Fri Apr 6 2007 Philipp Reisner - 8.0.2-1 - New upstream release. * Mon Mar 3 2007 Philipp Reisner - 8.0.1-1 - New upstream release. * Wed Jan 24 2007 Philipp Reisner - 8.0.0-1 - New upstream release. drbd-8.4.4/drbd.spec.in0000664000000000000000000003062112226007136013337 0ustar rootroot# Define init script directory. %{_initddir} is available from Fedora # 9 forward; CentOS knows 5 only %{_initrddir}. Neither are known to # autoconf... %{!?_initddir: %{expand: %%global _initddir %{_initrddir}}} # Compatibility macro wrappers for legacy RPM versions that do not # support conditional builds %{!?bcond_without: %{expand: %%global bcond_without() %%{expand:%%%%{!?_without_%%{1}:%%%%global with_%%{1} 1}}}} %{!?bcond_with: %{expand: %%global bcond_with() %%{expand:%%%%{?_with_%%{1}:%%%%global with_%%{1} 1}}}} %{!?with: %{expand: %%global with() %%{expand:%%%%{?with_%%{1}:1}%%%%{!?with_%%{1}:0}}}} %{!?without: %{expand: %%global without() %%{expand:%%%%{?with_%%{1}:0}%%%%{!?with_%%{1}:1}}}} # Conditionals # Invoke "rpmbuild --without " or "rpmbuild --with " # to disable or enable specific features %bcond_without udev %bcond_without pacemaker %bcond_with rgmanager %bcond_without heartbeat # conditionals may not contain "-" nor "_", hence "bashcompletion" %bcond_without bashcompletion # --with xen is ignored on any non-x86 architecture %bcond_without xen %bcond_without legacy_utils %ifnarch %{ix86} x86_64 %global _without_xen --without-xen 
%endif Name: @PACKAGE_TARNAME@ Summary: DRBD driver for Linux Version: @PACKAGE_VERSION@ Release: 2@RPM_DIST_TAG@ Source: http://oss.linbit.com/%{name}/8.3/%{name}-%{version}.tar.gz License: GPLv2+ ExclusiveOS: linux Group: System Environment/Kernel URL: http://www.drbd.org/ BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX) BuildRequires: @RPM_BUILDREQ_DEFAULT@ Requires: %{name}-utils = %{version} %ifarch %{ix86} x86_64 %if %{with xen} Requires: %{name}-xen = %{version} %endif %endif %if %{with udev} Requires: %{name}-udev = %{version} BuildRequires: udev %endif %if %{with pacemaker} Requires: %{name}-pacemaker = %{version} %endif ## %if %{with rgmanager} ## ## No. ## ## We don't want to annoy the majority of our userbase on pacemaker ## ## by pulling in the full rgmanager stack via drbd-rgmanager as well. ## Requires: %{name}-rgmanager = %{version} ## %endif %if %{with heartbeat} Requires: %{name}-heartbeat = %{version} %endif %if %{with bashcompletion} Requires: %{name}-bash-completion = %{version} %endif %description DRBD mirrors a block device over the network to another machine. Think of it as networked raid 1. It is a building block for setting up high availability (HA) clusters. This is a virtual package, installing the full DRBD userland suite. # Just a few docs go into the "drbd" package. Everything else is part # of one of the drbd-* packages. %files %defattr(-,root,root,-) %doc COPYING %doc ChangeLog %doc README %package utils Summary: Management utilities for DRBD Group: System Environment/Kernel # We used to have one monolithic userland package. # Since all other packages require drbd-utils, # it should be sufficient to add the conflict here. Conflicts: drbd < 8.3.6 # These exist in centos extras: Conflicts: drbd82 drbd83 @RPM_REQ_CHKCONFIG_POST@ @RPM_REQ_CHKCONFIG_PREUN@ %description utils DRBD mirrors a block device over the network to another machine. Think of it as networked raid 1. 
It is a building block for setting up high availability (HA) clusters. This packages includes the DRBD administration tools. %files utils %defattr(755,root,root,-) %{_sbindir}/drbdsetup %{_sbindir}/drbdadm %{_sbindir}/drbdmeta %if %{with legacy_utils} %dir /lib/drbd/ /lib/drbd/drbdsetup-83 /lib/drbd/drbdadm-83 %endif %{_initddir}/%{name} %{_sbindir}/drbd-overview %dir %{_prefix}/lib/%{name} %{_prefix}/lib/%{name}/outdate-peer.sh %{_prefix}/lib/%{name}/snapshot-resync-target-lvm.sh %{_prefix}/lib/%{name}/unsnapshot-resync-target-lvm.sh %{_prefix}/lib/%{name}/notify-out-of-sync.sh %{_prefix}/lib/%{name}/notify-split-brain.sh %{_prefix}/lib/%{name}/notify-emergency-reboot.sh %{_prefix}/lib/%{name}/notify-emergency-shutdown.sh %{_prefix}/lib/%{name}/notify-io-error.sh %{_prefix}/lib/%{name}/notify-pri-lost-after-sb.sh %{_prefix}/lib/%{name}/notify-pri-lost.sh %{_prefix}/lib/%{name}/notify-pri-on-incon-degr.sh %{_prefix}/lib/%{name}/notify.sh %defattr(-,root,root,-) %dir %{_var}/lib/%{name} %config(noreplace) %{_sysconfdir}/drbd.conf %dir %{_sysconfdir}/drbd.d %config(noreplace) %{_sysconfdir}/drbd.d/global_common.conf %{_mandir}/man8/drbd.8.* %{_mandir}/man8/drbdsetup.8.* %{_mandir}/man8/drbdadm.8.* %{_mandir}/man5/drbd.conf.5.* %{_mandir}/man8/drbdmeta.8.* %doc scripts/drbd.conf.example %doc COPYING %doc ChangeLog %doc README %ifarch %{ix86} x86_64 %if %{with xen} %package xen Summary: Xen block device management script for DRBD Group: System Environment/Kernel Requires: %{name}-utils = %{version}-%{release} @RPM_REQ_XEN@ @RPM_SUBPACKAGE_NOARCH@ %description xen This package contains a Xen block device helper script for DRBD, capable of promoting and demoting DRBD resources as necessary. 
%files xen %defattr(755,root,root,-) %{_sysconfdir}/xen/scripts/block-drbd %endif # with xen %endif # arch %{ix86} x86_64 %if %{with udev} %package udev Summary: udev integration scripts for DRBD Group: System Environment/Kernel Requires: %{name}-utils = %{version}-%{release}, udev @RPM_SUBPACKAGE_NOARCH@ %description udev This package contains udev helper scripts for DRBD, managing symlinks to DRBD devices in /dev/drbd/by-res and /dev/drbd/by-disk. %files udev %defattr(-,root,root,-) %config %{_sysconfdir}/udev/rules.d/65-drbd.rules* %endif # with udev %if %{with pacemaker} %package pacemaker Summary: Pacemaker resource agent for DRBD Group: System Environment/Base Requires: %{name}-utils = %{version}-%{release} @RPM_REQ_PACEMAKER@ License: GPLv2 @RPM_SUBPACKAGE_NOARCH@ %description pacemaker This package contains the master/slave DRBD resource agent for the Pacemaker High Availability cluster manager. %files pacemaker %defattr(755,root,root,-) %{_prefix}/lib/%{name}/crm-fence-peer.sh %{_prefix}/lib/%{name}/crm-unfence-peer.sh %{_prefix}/lib/%{name}/stonith_admin-fence-peer.sh %{_prefix}/lib/ocf/resource.d/linbit/drbd %endif # with pacemaker # Dependencies for drbd-rgmanager are particularly awful. On RHEL 5 # and prior (and corresponding Fedora releases), %{_datadir}/cluster # was owned by rgmanager version 2, so we have to depend on that. # # With Red Hat Cluster 3.0.1 (around Fedora 12), the DRBD resource # agent was merged in, and it became part of the resource-agents 3 # package (which of course is different from resource-agents on all # other platforms -- go figure). So for resource-agents >= 3, we must # generally conflict. # # Then for RHEL 6, Red Hat in all their glory decided to keep the # packaging scheme, but kicked DRBD out of the resource-agents # package. Thus, for RHEL 6 specifically, we must not conflict with # resource-agents >=3, but instead require it. 
# # The saga continues: # In RHEL 6.1 they have listed the drbd resource agent as valid agent, # but do not include it in their resource-agents package. -> So we # drop any dependency regarding rgmanager's version. # # All of this for exactly two (2) files. %if %{with rgmanager} %package rgmanager Summary: Red Hat Cluster Suite agent for DRBD Group: System Environment/Base Requires: %{name}-utils = %{version}-%{release} @RPM_SUBPACKAGE_NOARCH@ %description rgmanager This package contains the DRBD resource agent for the Red Hat Cluster Suite resource manager. As of Red Hat Cluster Suite 3.0.1, the DRBD resource agent is included in the Cluster distribution. %files rgmanager %defattr(755,root,root,-) %{_datadir}/cluster/drbd.sh %{_prefix}/lib/%{name}/rhcs_fence %defattr(-,root,root,-) %{_datadir}/cluster/drbd.metadata %endif # with rgmanager %if %{with heartbeat} %package heartbeat Summary: Heartbeat resource agent for DRBD Group: System Environment/Base Requires: %{name}-utils = %{version}-%{release} @RPM_REQ_HEARTBEAT@ License: GPLv2 @RPM_SUBPACKAGE_NOARCH@ %description heartbeat This package contains the DRBD resource agents for the Heartbeat cluster resource manager (in v1 compatibility mode). %files heartbeat %defattr(755,root,root,-) %{_sysconfdir}/ha.d/resource.d/drbddisk %{_sysconfdir}/ha.d/resource.d/drbdupper %defattr(-,root,root,-) %{_mandir}/man8/drbddisk.8.* %endif # with heartbeat %if %{with bashcompletion} %package bash-completion Summary: Programmable bash completion support for drbdadm Group: System Environment/Base Requires: %{name}-utils = %{version}-%{release} @RPM_REQ_BASH_COMPLETION@ @RPM_SUBPACKAGE_NOARCH@ %description bash-completion This package contains programmable bash completion support for the drbdadm management utility. 
%files bash-completion %defattr(-,root,root,-) %config %{_sysconfdir}/bash_completion.d/drbdadm* %endif # with bashcompletion %prep %setup -q %build %configure \ --with-utils \ --without-km \ %{?_without_udev} \ %{?_without_xen} \ %{?_without_pacemaker} \ %{?_without_heartbeat} \ %{?_with_rgmanager} \ %{?_without_bashcompletion} \ %{?_without_legacy_utils} \ --with-initdir=%{_initddir} make %{?_smp_mflags} %install rm -rf %{buildroot} make install DESTDIR=%{buildroot} %clean rm -rf %{buildroot} %post utils chkconfig --add drbd %if %{without udev} for i in `seq 0 15` ; do test -b /dev/drbd$i || mknod -m 0660 /dev/drbd$i b 147 $i; done %endif #without udev # compat: we used to live in /sbin/ # there may be many hardcoded /sbin/drbd* out there, # including variants of our own scripts. if ! test /sbin -ef %{_sbindir}; then ln -sf %{_sbindir}/drbdsetup /sbin/ ln -sf %{_sbindir}/drbdmeta /sbin/ ln -sf %{_sbindir}/drbdadm /sbin/ fi %preun utils if [ $1 -eq 0 ]; then %{_initrddir}/drbd stop >/dev/null 2>&1 /sbin/chkconfig --del drbd # remove compat symlinks installed by post if ! test /sbin -ef %{_sbindir}; then rm -f /sbin/drbdsetup rm -f /sbin/drbdmeta rm -f /sbin/drbdadm fi fi %changelog * Fri Oct 11 2013 Philipp Reisner - 8.4.4-1 - New upstream release. * Tue Feb 5 2013 Philipp Reisner - 8.4.3-1 - New upstream release. * Thu Sep 6 2012 Philipp Reisner - 8.4.2-1 - New upstream release. * Tue Feb 21 2012 Lars Ellenberg - 8.4.1-2 - Build fix for RHEL 6 and ubuntu lucid * Tue Dec 20 2011 Philipp Reisner - 8.4.1-1 - New upstream release. * Wed Jul 15 2011 Philipp Reisner - 8.4.0-1 - New upstream release. * Fri Jan 28 2011 Philipp Reisner - 8.3.10-1 - New upstream release. * Fri Oct 22 2010 Philipp Reisner - 8.3.9-1 - New upstream release. * Wed Jun 2 2010 Philipp Reisner - 8.3.8-1 - New upstream release. * Thu Jan 13 2010 Philipp Reisner - 8.3.7-1 - New upstream release. * Thu Nov 8 2009 Philipp Reisner - 8.3.6-1 - New upstream release. 
* Thu Oct 27 2009 Philipp Reisner - 8.3.5-1 - New upstream release. * Wed Oct 21 2009 Florian Haas - 8.3.4-12 - Packaging makeover. * Thu Oct 6 2009 Philipp Reisner - 8.3.4-1 - New upstream release. * Thu Oct 5 2009 Philipp Reisner - 8.3.3-1 - New upstream release. * Fri Jul 3 2009 Philipp Reisner - 8.3.2-1 - New upstream release. * Fri Mar 27 2009 Philipp Reisner - 8.3.1-1 - New upstream release. * Thu Dec 18 2008 Philipp Reisner - 8.3.0-1 - New upstream release. * Thu Nov 12 2008 Philipp Reisner - 8.2.7-1 - New upstream release. * Fri May 30 2008 Philipp Reisner - 8.2.6-1 - New upstream release. * Tue Feb 12 2008 Philipp Reisner - 8.2.5-1 - New upstream release. * Fri Jan 11 2008 Philipp Reisner - 8.2.4-1 - New upstream release. * Wed Jan 9 2008 Philipp Reisner - 8.2.3-1 - New upstream release. * Fri Nov 2 2007 Philipp Reisner - 8.2.1-1 - New upstream release. * Fri Sep 28 2007 Philipp Reisner - 8.2.0-1 - New upstream release. * Mon Sep 3 2007 Philipp Reisner - 8.0.6-1 - New upstream release. * Fri Aug 3 2007 Philipp Reisner - 8.0.5-1 - New upstream release. * Wed Jun 27 2007 Philipp Reisner - 8.0.4-1 - New upstream release. * Mon May 7 2007 Philipp Reisner - 8.0.3-1 - New upstream release. * Fri Apr 6 2007 Philipp Reisner - 8.0.2-1 - New upstream release. * Mon Mar 3 2007 Philipp Reisner - 8.0.1-1 - New upstream release. * Wed Jan 24 2007 Philipp Reisner - 8.0.0-1 - New upstream release. drbd-8.4.4/drbd/Kbuild0000664000000000000000000000652212221331406013211 0ustar rootrootobj-m := drbd.o clean-files := compat.h .config.timestamp LINUXINCLUDE := -I$(src) $(LINUXINCLUDE) # Files in the standard include directories take precendence over files # in the compat directory. # # Add -I$(src) to EXTRA_CFLAGS again: some (rhel5, maybe other) kbuild does not # yet use LINUXINCLUDE like we expect it to ;( fortunately it does not contain # in-tree drbd either yet, so precedence of include files is not important. 
# # override: we absolutely need this, even if EXTRA_CFLAGS originates from make # command line or environment override EXTRA_CFLAGS += -I$(src) -I$(src)/compat # The augmented rbtree helper functions are not exported at least until kernel # version 2.6.38-rc2. ifeq ($(shell grep -e '\' \ -e '\' \ -e '\' \ $(objtree)/Module.symvers | wc -l),3) override EXTRA_CFLAGS += -DAUGMENTED_RBTREE_SYMBOLS_EXPORTED endif ifeq ($(shell grep -e '\' \ $(objtree)/Module.symvers | wc -l),1) override EXTRA_CFLAGS += -DIDR_GET_NEXT_EXPORTED else compat_objs += compat/idr.o endif ifeq ($(shell grep -e '\' \ $(objtree)/Module.symvers | wc -l),1) override EXTRA_CFLAGS += -DKOBJECT_CREATE_AND_ADD_EXPORTED else compat_objs += compat/kobject.o endif ifeq ($(shell grep -e '\' \ $(objtree)/Module.symvers | wc -l),1) override EXTRA_CFLAGS += -DBLKDEV_ISSUE_ZEROOUT_EXPORTED else compat_objs += compat/blkdev_issue_zeroout.o endif drbd-y := drbd_buildtag.o drbd_bitmap.o drbd_proc.o drbd-y += drbd_worker.o drbd_receiver.o drbd_req.o drbd_actlog.o drbd-y += lru_cache.o drbd_main.o drbd_strings.o drbd_nl.o drbd-y += drbd_interval.o drbd_state.o $(compat_objs) drbd-y += drbd_nla.o drbd_sysfs.o ifndef CONFIG_CONNECTOR drbd-y += connector.o cn_queue.o endif $(patsubst %,$(obj)/%,$(drbd-y)): $(obj)/compat.h obj-$(CONFIG_BLK_DEV_DRBD) += drbd.o # ====================================================================== # remember KERNELRELEASE for install target # .kernelversion can be included in Makefile as well as # sourced from shell $(shell echo -e "VERSION=$(VERSION)\n" \ "PATCHLEVEL=$(PATCHLEVEL)\n" \ "SUBLEVEL=$(SUBLEVEL)\n" \ "EXTRAVERSION=$(EXTRAVERSION)\n" \ "LOCALVERSION=$(LOCALVERSION)\n" \ "KERNELRELEASE=$(KERNELRELEASE)\n" \ "KERNELVERSION=$(KERNELVERSION)" \ > $(src)/.drbd_kernelrelease.new \ ) # Are we in stage 2 of the build (modpost)? 
KBUILD_STAGE ?= $(if $(filter $(srctree)/scripts/Makefile.modpost,$(MAKEFILE_LIST)),modpost) ifneq ($(shell date -r $(objtree)/.config),$(shell date -r $(obj)/.config.timestamp 2> /dev/null)) COMPAT_FORCE := FORCE endif ifneq ($(KBUILD_STAGE),modpost) $(obj)/compat.h: $(wildcard $(src)/compat/tests/*.c) $(COMPAT_FORCE) $(call filechk,compat.h) $(Q)touch $@ $(Q)touch -r $(objtree)/.config $(obj)/.config.timestamp endif filechk_compat.h = \ for cfg in $(sort $(filter-out FORCE,$^)); do \ var=`echo $$cfg | \ sed -e "s,.*/,COMPAT_," -e "s,\.c,," | \ tr -- -a-z _A-Z | \ tr -dc A-Z0-9_`; \ if $(CC) $(c_flags) $(COMPAT_CFLAGS) -c -o $(obj)/dummy.o $$cfg \ > /dev/null $(if $(quiet),2>&1); then \ echo "\#define $$var"; \ rm -f $(obj)/dummy.{o,gcda,gcno}; \ else \ echo "/* \#undef $$var */"; \ fi; \ done drbd-8.4.4/drbd/Kconfig0000664000000000000000000000460312132747531013370 0ustar rootroot# # DRBD device driver configuration # comment "DRBD disabled because PROC_FS or INET not selected" depends on PROC_FS='n' || INET='n' config BLK_DEV_DRBD tristate "DRBD Distributed Replicated Block Device support" depends on PROC_FS && INET select LRU_CACHE select LIBCRC32C default n help NOTE: In order to authenticate connections you have to select CRYPTO_HMAC and a hash function as well. DRBD is a shared-nothing, synchronously replicated block device. It is designed to serve as a building block for high availability clusters and in this context, is a "drop-in" replacement for shared storage. Simplistically, you could see it as a network RAID 1. Each minor device has a role, which can be 'primary' or 'secondary'. On the node with the primary device the application is supposed to run and to access the device (/dev/drbdX). Every write is sent to the local 'lower level block device' and, across the network, to the node with the device in 'secondary' state. The secondary device simply writes the data to its lower level block device. 
DRBD can also be used in dual-Primary mode (device writable on both nodes), which means it can exhibit shared disk semantics in a shared-nothing cluster. Needless to say, on top of dual-Primary DRBD, a cluster file system is necessary to maintain cache coherency. For automatic failover you need a cluster manager (e.g. heartbeat). See also: http://www.drbd.org/, http://www.linux-ha.org If unsure, say N. config DRBD_FAULT_INJECTION bool "DRBD fault injection" depends on BLK_DEV_DRBD help Say Y here if you want to simulate IO errors, in order to test DRBD's behavior. The actual simulation of IO errors is done by writing 3 values to /sys/module/drbd/parameters/ enable_faults: bitmask of... 1 meta data write 2 read 4 resync data write 8 read 16 data write 32 data read 64 read ahead 128 kmalloc of bitmap 256 allocation of peer_requests 512 insert data corruption on receiving side fault_devs: bitmask of minor numbers fault_rate: frequency in percent Example: Simulate data write errors on /dev/drbd0 with a probability of 5%. echo 16 > /sys/module/drbd/parameters/enable_faults echo 1 > /sys/module/drbd/parameters/fault_devs echo 5 > /sys/module/drbd/parameters/fault_rate If unsure, say N. drbd-8.4.4/drbd/Makefile0000664000000000000000000001515312221261130013510 0ustar rootroot# makefile for drbd for linux 2.4 // 2.6 # # By Lars Ellenberg. # # drbd is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2, or (at your option) # any later version. # # drbd is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with drbd; see the file COPYING. 
If not, write to # the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. # # usage: make [ KDIR=/path/to/kernel/source ] # # this file is read twice: # the first invocation calls out to the toplevel Makefile in the # kernel source tree, which then in turn will call this file again # as subdir Makefile, with all appropriate vars and macros set. # # note: if you get strange make errors when ARCH=um, you # probably need to "make mrproper" in the KDIR first... # The destination "root" directory. Meant to be overridden by # packaging scripts. DESTDIR ?= / # since 2.6.16, KERNELRELEASE may be empty, # e.g. when building agains some (broken?) linux-header package. # Lets test on PATCHLEVEL, that won't change too soon... ifneq ($(PATCHLEVEL),) ifneq ($(VERSION),3) ifneq ($(VERSION),2) $(error "won't compile with this kernel version") endif ifneq ($(PATCHLEVEL),6) $(error "won't compile with this kernel version") endif endif CONFIG_BLK_DEV_DRBD := m include $(src)/Kbuild else # called from command line in current directory # for some reason some of the commands below only work correctly in bash, # and not in e.g. dash. I'm too lazy to fix it to be compatible. SHELL=/bin/bash DRBDSRC := $(shell pwd) # to be overridden on command line: PREFIX := / ifneq ($(wildcard ../build-for-uml),) #### for Philipp's convenience :) ARCH_UM := "ARCH=um" KDIR := /usr/src/linux-um else ifeq ($(wildcard /lib/modules/$(shell uname -r)/source),) KDIR := /lib/modules/$(shell uname -r)/build else KDIR := /lib/modules/$(shell uname -r)/source ifneq ("$(origin KDIR)", "command line") ifneq ($(wildcard /lib/modules/$(shell uname -r)/build),) O := /lib/modules/$(shell uname -r)/build endif endif endif endif .PHONY: drbd.o default all greeting clean kbuild install dep tags drbd.o: greeting kbuild default: drbd.o all: drbd.o greeting: @echo "" ;\ echo " Calling toplevel makefile of kernel source tree, which I believe is in" ;\ echo " KDIR=$(KDIR)" ; \ echo ""; @if ! 
test -e $(KDIR)/Makefile ; then \ echo -e " SORRY, kernel makefile not found. You need to tell me a correct KDIR!\n" ;\ false;\ fi .PHONY: drbd_buildtag.c drbd_buildtag.c: @set -e; exec > $@.new; \ echo -e "/* automatically generated. DO NOT EDIT. */"; \ echo -e "#include "; \ echo -e "const char *drbd_buildtag(void)\n{"; \ if test -e ../.git && GITHEAD=$$(git rev-parse HEAD); then \ GITDIFF=$$(cd .. && git diff --name-only HEAD | \ tr -s '\t\n' ' ' | \ sed -e 's/^/ /;s/ *$$//'); \ echo -e "\treturn \"GIT-hash: $$GITHEAD$$GITDIFF\""; \ elif ! test -e $@ ; then \ echo >&2 "$@ not found."; \ test -e ../.git && \ >&2 printf "%s\n" \ "git did not work, but this looks like a git checkout?" \ "Install git and try again." || \ echo >&2 "Your DRBD source tree is broken. Unpack again."; \ exit 1; \ else \ grep return $@ ; \ fi ; \ echo -e "\t\t\" build by $$USER@$$HOSTNAME, `date "+%F %T"`\";\n}"; \ mv --force $@.new $@ kbuild: drbd_buildtag.c @rm -f .drbd_kernelrelease* # previous to 2.6.6 (suse: 2.6.5-dunno), this should be: $(MAKE) -C $(KDIR) $(if $(O),O=$(O),) SUBDIRS=$(DRBDSRC) $(ARCH_UM) modules # $(MAKE) -C $(KDIR) M=$(DRBDSRC) $(ARCH_UM) modules -mv .drbd_kernelrelease.new .drbd_kernelrelease @echo -n "Memorizing module configuration ... " @config=$$( (for x in $(KDIR)/.config $(O)/.config ; do \ if test -e $$x ; then echo $$x ; exit 0; fi ; \ done; echo $(KDIR)/.config) | sed -e 's,//,/,g') ; \ { echo -e "#\n# drbd.o was compiled with" ; \ echo "# `gcc -v 2>&1 | tail -1`" ; \ echo "# against this kernelrelease:" ; \ sed 's/^/# /' .drbd_kernelrelease ; \ echo "# kernel .config from" ; \ echo -n "# $$config" ; \ test -L "$${config%/.config}" && echo " alias" && \ echo "# $$(readlink $${config%/.config})/.config" || echo "" ; \ echo -e "# follows\n#\n" ; \ cat $$config ; } | gzip > .kernel.config.gz @echo "done." 
clean: rm -rf .tmp_versions Module.markers Module.symvers modules.order rm -f *.[oas] *.ko .*.cmd .*.d .*.tmp *.mod.c .*.flags .depend .kernel* rm -f compat/*.[oas] compat/.*.cmd distclean: clean @if git show HEAD:drbd/linux/drbd_config.h > linux/drbd_config.h.tmp \ && ! diff -s -U0 linux/drbd_config.h.tmp linux/drbd_config.h ; then \ mv linux/drbd_config.h.tmp linux/drbd_config.h ; \ else \ rm -f linux/drbd_config.h.tmp ; \ fi tags: ctags -R -I__initdata,__exitdata,__acquires,__releases \ -I __must_hold,__protected_by,__protected_read_by,__protected_write_by \ -I BIO_ENDIO_ARGS ifneq ($(wildcard .drbd_kernelrelease),) # for VERSION, PATCHLEVEL, SUBLEVEL, EXTRAVERSION, KERNELRELEASE include .drbd_kernelrelease MODOBJ := drbd.ko MODSUBDIR := updates LINUX := $(wildcard /lib/modules/$(KERNELRELEASE)/build) install: @if ! [ -e $(MODOBJ) ] ; then \ echo "No $(MODOBJ): nothing to install??"; false ; \ fi install -d $(DESTDIR)/lib/modules/$(KERNELRELEASE)/$(MODSUBDIR) install -m 644 $(MODOBJ) $(DESTDIR)/lib/modules/$(KERNELRELEASE)/$(MODSUBDIR) ifeq ($(DESTDIR),/) ifeq ($(shell uname -r),$(KERNELRELEASE)) /sbin/depmod -a || /sbin/depmod -e $(MODOBJ) 2>&1 >/dev/null || true else [ -e $(LINUX)/System.map ] && \ /sbin/depmod -F $(LINUX)/System.map -e ./$(MODOBJ) 2>&1 >/dev/null || true endif endif else install: @echo "No .drbd_kernelrelease found. Do you need to 'make' the module first?" 
@false endif depmod: [ -e $(KDIR)/System.map ] && [ -e ./$(MODOBJ) ] && \ /sbin/depmod -F $(KDIR)/System.map -n -e ./$(MODOBJ) # 2>&1 >/dev/null endif uninstall: spell: for f in $(wildcard *.c); do \ aspell --save-repl --dont-backup --personal=./../documentation/aspell.en.per check $$f; \ done drbd-8.4.4/drbd/compat/asm-generic/bitops/le.h0000664000000000000000000000355411605310253017606 0ustar rootroot#ifndef _ASM_GENERIC_BITOPS_LE_H_ #define _ASM_GENERIC_BITOPS_LE_H_ #include #include #if defined(__LITTLE_ENDIAN) #define BITOP_LE_SWIZZLE 0 static inline unsigned long find_next_zero_bit_le(const void *addr, unsigned long size, unsigned long offset) { return find_next_zero_bit(addr, size, offset); } static inline unsigned long find_next_bit_le(const void *addr, unsigned long size, unsigned long offset) { return find_next_bit(addr, size, offset); } static inline unsigned long find_first_zero_bit_le(const void *addr, unsigned long size) { return find_first_zero_bit(addr, size); } #elif defined(__BIG_ENDIAN) #define BITOP_LE_SWIZZLE ((BITS_PER_LONG-1) & ~0x7) extern unsigned long find_next_zero_bit_le(const void *addr, unsigned long size, unsigned long offset); extern unsigned long find_next_bit_le(const void *addr, unsigned long size, unsigned long offset); #define find_first_zero_bit_le(addr, size) \ find_next_zero_bit_le((addr), (size), 0) #else #error "Please fix " #endif static inline int test_bit_le(int nr, const void *addr) { return test_bit(nr ^ BITOP_LE_SWIZZLE, addr); } static inline void __set_bit_le(int nr, void *addr) { __set_bit(nr ^ BITOP_LE_SWIZZLE, addr); } static inline void __clear_bit_le(int nr, void *addr) { __clear_bit(nr ^ BITOP_LE_SWIZZLE, addr); } static inline int test_and_set_bit_le(int nr, void *addr) { return test_and_set_bit(nr ^ BITOP_LE_SWIZZLE, addr); } static inline int test_and_clear_bit_le(int nr, void *addr) { return test_and_clear_bit(nr ^ BITOP_LE_SWIZZLE, addr); } static inline int __test_and_set_bit_le(int nr, void *addr) { 
return __test_and_set_bit(nr ^ BITOP_LE_SWIZZLE, addr); } static inline int __test_and_clear_bit_le(int nr, void *addr) { return __test_and_clear_bit(nr ^ BITOP_LE_SWIZZLE, addr); } #endif /* _ASM_GENERIC_BITOPS_LE_H_ */ drbd-8.4.4/drbd/compat/asm/barrier.h0000664000000000000000000000013711751757715015740 0ustar rootroot/* used to be part of asm/system.h, before that was "Disintegrated" */ #include drbd-8.4.4/drbd/compat/bitops.h0000664000000000000000000000371711605310253015015 0ustar rootroot#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,25) /* did not yet include generic_find_next_le_bit() {{{ */ #if defined(__LITTLE_ENDIAN) #define generic_find_next_le_bit(addr, size, offset) \ find_next_bit(addr, size, offset) #elif defined(__BIG_ENDIAN) /* from 2.6.33 lib/find_bit.c */ /* include/linux/byteorder does not support "unsigned long" type */ static inline unsigned long ext2_swabp(const unsigned long * x) { #if BITS_PER_LONG == 64 return (unsigned long) __swab64p((u64 *) x); #elif BITS_PER_LONG == 32 return (unsigned long) __swab32p((u32 *) x); #else #error BITS_PER_LONG not defined #endif } /* include/linux/byteorder doesn't support "unsigned long" type */ static inline unsigned long ext2_swab(const unsigned long y) { #if BITS_PER_LONG == 64 return (unsigned long) __swab64((u64) y); #elif BITS_PER_LONG == 32 return (unsigned long) __swab32((u32) y); #else #error BITS_PER_LONG not defined #endif } unsigned long generic_find_next_le_bit(const unsigned long *addr, unsigned long size, unsigned long offset) { const unsigned long *p = addr + BITOP_WORD(offset); unsigned long result = offset & ~(BITS_PER_LONG - 1); unsigned long tmp; if (offset >= size) return size; size -= result; offset &= (BITS_PER_LONG - 1UL); if (offset) { tmp = ext2_swabp(p++); tmp &= (~0UL << offset); if (size < BITS_PER_LONG) goto found_first; if (tmp) goto found_middle; size -= BITS_PER_LONG; result += BITS_PER_LONG; } while (size & ~(BITS_PER_LONG - 1)) { tmp = *(p++); if (tmp) goto 
found_middle_swap; result += BITS_PER_LONG; size -= BITS_PER_LONG; } if (!size) return result; tmp = ext2_swabp(p); found_first: tmp &= (~0UL >> (BITS_PER_LONG - size)); if (tmp == 0UL) /* Are any bits set? */ return result + size; /* Nope. */ found_middle: return result + __ffs(tmp); found_middle_swap: return result + __ffs(ext2_swab(tmp)); } #else #error "unknown byte order" #endif #endif /* LINUX_VERSION_CODE < KERNEL_VERSION(2,6,25) */ drbd-8.4.4/drbd/compat/blkdev_issue_zeroout.c0000664000000000000000000000373112226007136017755 0ustar rootroot//#include #include #include #include #include "drbd_wrappers.h" // Taken from blk-lib.c struct bio_batch { atomic_t done; unsigned long flags; struct completion *wait; }; BIO_ENDIO_TYPE bio_batch_end_io(struct bio *bio, int err) { struct bio_batch *bb = bio->bi_private; BIO_ENDIO_FN_START; if (err && (err != -EOPNOTSUPP)) clear_bit(BIO_UPTODATE, &bb->flags); if (atomic_dec_and_test(&bb->done)) complete(bb->wait); bio_put(bio); BIO_ENDIO_FN_RETURN; } /** * blkdev_issue_zeroout - zero-fill a block range * @bdev: blockdev to write * @sector: start sector * @nr_sects: number of sectors to write * @gfp_mask: memory allocation flags (for bio_alloc) * * Description: * Generate and issue a number of bios with zero-filled pages.
*/ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask) { int ret; struct bio *bio; struct bio_batch bb; unsigned int sz; struct page *page; DECLARE_COMPLETION_ONSTACK(wait); page = alloc_page(gfp_mask | __GFP_ZERO); if (!page) return -ENOMEM; atomic_set(&bb.done, 1); bb.flags = 1 << BIO_UPTODATE; bb.wait = &wait; ret = 0; while (nr_sects != 0) { bio = bio_alloc(gfp_mask, min(nr_sects, (sector_t)BIO_MAX_PAGES)); if (!bio) { ret = -ENOMEM; break; } bio->bi_sector = sector; bio->bi_bdev = bdev; bio->bi_end_io = bio_batch_end_io; bio->bi_private = &bb; while (nr_sects != 0) { sz = min((sector_t) PAGE_SIZE >> 9 , nr_sects); ret = bio_add_page(bio, page, sz << 9, 0); nr_sects -= ret >> 9; sector += ret >> 9; if (ret < (sz << 9)) break; } ret = 0; atomic_inc(&bb.done); submit_bio(WRITE, bio); } /* Wait for bios in-flight */ if (!atomic_dec_and_test(&bb.done)) wait_for_completion(&wait); if (!test_bit(BIO_UPTODATE, &bb.flags)) /* One of bios in the batch was completed with error.*/ ret = -EIO; put_page(page); return ret; } drbd-8.4.4/drbd/compat/idr.c0000664000000000000000000000205311605310253014256 0ustar rootroot#include #include #include #include #include /* The idr_get_next() function exists since 2009-04-02 Linux-2.6.29 (commit 38460b48) but is exported for use in modules since 2010-01-29 Linux-2.6.35 (commit 4d1ee80f) */ #ifndef IDR_GET_NEXT_EXPORTED #ifndef rcu_dereference_raw /* see c26d34a rcu: Add lockdep-enabled variants of rcu_dereference() */ #define rcu_dereference_raw(p) rcu_dereference(p) #endif void *idr_get_next(struct idr *idp, int *nextidp) { struct idr_layer *p, *pa[MAX_LEVEL]; struct idr_layer **paa = &pa[0]; int id = *nextidp; int n, max; /* find first ent */ n = idp->layers * IDR_BITS; max = 1 << n; p = rcu_dereference_raw(idp->top); if (!p) return NULL; while (id < max) { while (n > 0 && p) { n -= IDR_BITS; *paa++ = p; p = rcu_dereference_raw(p->ary[(id >> n) & IDR_MASK]); } if (p) { *nextidp = 
id; return p; } id += 1 << n; while (n < fls(id)) { n += IDR_BITS; p = *--paa; } } return NULL; } #endif drbd-8.4.4/drbd/compat/kobject.c0000664000000000000000000000257012132747531015136 0ustar rootroot#include #include /* These functions mimic the post-2.6.24 kobject API on top of the pre-2.6.24 API */ static void dynamic_kobj_release(struct kobject *kobj) { pr_debug("kobject: (%p): %s\n", kobj, __func__); kfree(kobj); } static struct kobj_type dynamic_kobj_ktype = { .release = dynamic_kobj_release, .sysfs_ops = NULL, }; static struct kobject *kobject_create(void) { struct kobject *kobj; kobj = kzalloc(sizeof(*kobj), GFP_KERNEL); if (!kobj) return NULL; kobject_init(kobj); kobj->ktype = &dynamic_kobj_ktype; return kobj; } struct kobject *kobject_create_and_add(const char *name, struct kobject *parent) { struct kobject *kobj; int retval; kobj = kobject_create(); if (!kobj) return NULL; kobject_set_name(kobj, "%s", name); kobj->parent = parent; retval = kobject_add(kobj); if (retval) { printk(KERN_WARNING "%s: kobject_add error: %d\n", __func__, retval); kobject_put(kobj); kobj = NULL; } return kobj; } int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype, struct kobject *parent, const char *name) { int retval; kobject_init(kobj); kobj->ktype = ktype; kobject_set_name(kobj, "%s", name); kobj->parent = parent; retval = kobject_add(kobj); if (retval) { printk(KERN_WARNING "%s: kobject_add error: %d\n", __func__, retval); kobject_put(kobj); } return retval; } drbd-8.4.4/drbd/compat/linux/autoconf.h0000664000000000000000000000004511516050234016462 0ustar rootroot/* empty file, for compat reasons */ drbd-8.4.4/drbd/compat/linux/dynamic_debug.h0000664000000000000000000000020111516050234017430 0ustar rootroot#ifndef _DYNAMIC_DEBUG_H #define _DYNAMIC_DEBUG_H #ifndef dynamic_dev_dbg #define dynamic_dev_dbg(dev, fmt, ...) #endif #endif drbd-8.4.4/drbd/compat/linux/hardirq.h0000664000000000000000000000003211516050234016272 0ustar rootroot/* Just an empty file.
*/ drbd-8.4.4/drbd/compat/linux/memcontrol.h0000664000000000000000000000020511516050234017021 0ustar rootroot/* just an empty file * memcontrol.h did not exist prior to 2.6.25. * but it needs more recent kernels for mm_inline.h to work. */ drbd-8.4.4/drbd/compat/linux/mutex.h0000664000000000000000000000127611516050234016015 0ustar rootroot/* "Backport" of the mutex to older Linux-2.6.x kernels. */ #ifndef __LINUX_MUTEX_H #define __LINUX_MUTEX_H #include struct mutex { struct semaphore sem; }; static inline void mutex_init(struct mutex *m) { sema_init(&m->sem, 1); } static inline void mutex_lock(struct mutex *m) { down(&m->sem); } static inline int mutex_lock_interruptible(struct mutex *m) { return down_interruptible(&m->sem); } static inline void mutex_unlock(struct mutex *m) { up(&m->sem); } static inline int mutex_is_locked(struct mutex *lock) { return atomic_read(&lock->sem.count) != 1; } static inline int mutex_trylock(struct mutex *lock) { return !down_trylock(&lock->sem); } #endif drbd-8.4.4/drbd/compat/linux/tracepoint.h0000664000000000000000000000002311516050234017010 0ustar rootrootstruct tracepoint; drbd-8.4.4/drbd/compat/tests/bio_split_has_bio_split_pool_parameter.c0000664000000000000000000000034612226001623024617 0ustar rootroot#include /* * bio_split() had a memory pool parameter until commit 6feef53 (2.6.28-rc1). */ void test(void) { struct bio *bio = NULL; struct bio_pair *bio_pair; bio_pair = bio_split(bio, bio_split_pool, 0); } drbd-8.4.4/drbd/compat/tests/bioset_create_has_three_parameters.c0000664000000000000000000000032112226001623023711 0ustar rootroot#include /* * Note that up until 2.6.21 inclusive, it was * struct bio_set *bioset_create(int bio_pool_size, int bvec_pool_size, int scale) */ void dummy(void) { bioset_create(16, 16, 4); } drbd-8.4.4/drbd/compat/tests/blkdev_issue_zeroout_has_5_paramters.c0000664000000000000000000000032112226001623024237 0ustar rootroot#include /* In 2.6.34 and 2.6.35 this function had 5 parameters. 
Later the flags parameter was dropped */ int foo() { int r; r = blkdev_issue_zeroout(NULL, 0, 0, 0, BLKDEV_IFL_WAIT); } drbd-8.4.4/drbd/compat/tests/drbd_release_returns_void.c0000664000000000000000000000047712226001623022066 0ustar rootroot// #include #include #ifndef __same_type # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) #endif void dummy(void) { struct block_device_operations ops; void (*release) (struct gendisk *, fmode_t); BUILD_BUG_ON(!(__same_type(ops.release, release))); } drbd-8.4.4/drbd/compat/tests/have_IS_ERR_OR_NULL.c0000664000000000000000000000012412226001623020115 0ustar rootroot#include int foo(void) { void *x = 0; return IS_ERR_OR_NULL(x); } drbd-8.4.4/drbd/compat/tests/have_atomic_in_flight.c0000664000000000000000000000032512226001623021142 0ustar rootroot#include #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,39) #include static struct hd_struct hd; void dummy(void) { BUILD_BUG_ON(!__same_type(atomic_t, hd.in_flight[0])); } #endif drbd-8.4.4/drbd/compat/tests/have_bio_bi_destructor.c0000664000000000000000000000013112226001623021337 0ustar rootroot#include void dummy(void) { struct bio bio; bio.bi_destructor = NULL; } drbd-8.4.4/drbd/compat/tests/have_bioset_create_front_pad.c0000664000000000000000000000203612226001623022510 0ustar rootroot#include /* * upstream commit (included in 2.6.29) * commit bb799ca0202a360fa74d5f17039b9100caebdde7 * Author: Jens Axboe * Date: Wed Dec 10 15:35:05 2008 +0100 * * bio: allow individual slabs in the bio_set * * does * -struct bio_set *bioset_create(int bio_pool_size, int bvec_pool_size) * +struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad) * * Note that up until 2.6.21 inclusive, it was * struct bio_set *bioset_create(int bio_pool_size, int bvec_pool_size, int scale) * so if we want to support old kernels (RHEL5), we will need an additional compat check. 
* * This also means that we must not use the front_pad trick as long as we want * to keep compatibility with < 2.6.29. */ extern struct bio_set *compat_check_bioset_create(unsigned int, unsigned int); #ifndef __same_type # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) #endif void dummy(void) { BUILD_BUG_ON(!__same_type(&compat_check_bioset_create, &bioset_create)); } drbd-8.4.4/drbd/compat/tests/have_blk_queue_max_hw_sectors.c0000664000000000000000000000014712226001623022726 0ustar rootroot#include #ifndef blk_queue_max_hw_sectors void *p = blk_queue_max_hw_sectors; #endif drbd-8.4.4/drbd/compat/tests/have_blk_queue_max_segments.c0000664000000000000000000000014312226001623022367 0ustar rootroot#include #ifndef blk_queue_max_segments void *p = blk_queue_max_segments; #endif drbd-8.4.4/drbd/compat/tests/have_blk_set_stacking_limits.c0000664000000000000000000000016012226001651022530 0ustar rootroot#include void foo(void) { struct queue_limits *lim = NULL; blk_set_stacking_limits(lim); } drbd-8.4.4/drbd/compat/tests/have_blkdev_get_by_path.c0000664000000000000000000000041612226001623021460 0ustar rootroot#include /* * In kernel version 2.6.38-rc1, open_bdev_exclusive() was replaced by * blkdev_get_by_path(); see commits e525fd89 and d4d77629. */ void foo(void) { struct block_device *blkdev; blkdev = blkdev_get_by_path("", (fmode_t) 0, (void *) 0); } drbd-8.4.4/drbd/compat/tests/have_bool_type.c0000664000000000000000000000004212226001623017633 0ustar rootroot#include bool x; drbd-8.4.4/drbd/compat/tests/have_clear_bit_unlock.c0000664000000000000000000000071012226001623021140 0ustar rootroot#include /* Including asm/barrier.h is necessary for s390. They define smp_mb__before_clear_bit() in asm/system.h From asm/bitops.h they include asm-generic/bitops/lock.h The macro defining clear_bit_unlock() in asm-generic/bitops/lock.h needs smp_mb__before_clear_bit(). 
They fail to include asm/barrier.h from asm/bitops.h */ #include void foo(void) { unsigned long bar; clear_bit_unlock(0, &bar); } drbd-8.4.4/drbd/compat/tests/have_cn_netlink_skb_parms.c0000664000000000000000000000051012226001623022024 0ustar rootroot#include #include #ifndef __same_type # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) #endif void dummy(void) { void (*cb) (struct cn_msg *, struct netlink_skb_parms *) = NULL; struct cn_callback_data ccb; BUILD_BUG_ON(!(__same_type(ccb.callback, cb))); } drbd-8.4.4/drbd/compat/tests/have_cpumask_empty.c0000664000000000000000000000015312226001623020523 0ustar rootroot#include int main(void) { int e = cpumask_empty((struct cpumask *)NULL); return e; } drbd-8.4.4/drbd/compat/tests/have_ctrl_attr_mcast_groups.c0000664000000000000000000000012012226001623022420 0ustar rootroot#include void f(void) { int i = CTRL_ATTR_MCAST_GROUPS; } drbd-8.4.4/drbd/compat/tests/have_dst_groups.c0000664000000000000000000000020212226001623020026 0ustar rootroot#include #include void dummy(void) { static struct netlink_skb_parms p; p.dst_groups = 0; } drbd-8.4.4/drbd/compat/tests/have_find_next_zero_bit_le.c0000664000000000000000000000033712226001623022201 0ustar rootroot#include #include unsigned long func(void) { void *addr; unsigned long size, offset; addr = NULL; size = 0; offset = 0; return find_next_zero_bit_le(addr, size, offset); } drbd-8.4.4/drbd/compat/tests/have_fmode_t.c0000664000000000000000000000007412226001623017261 0ustar rootroot#include void foo(void) { fmode_t mode; } drbd-8.4.4/drbd/compat/tests/have_genl_lock.c0000664000000000000000000000024412226001623017600 0ustar rootroot#include /* genl_lock() is exported for modules since 2.6.34 */ void foo(void) { void (*genl_lock_ptr)(void); genl_lock_ptr = genl_lock; } drbd-8.4.4/drbd/compat/tests/have_genlmsg_msg_size.c0000664000000000000000000000017012226001623021175 0ustar rootroot#include void f(void) { int dummy; dummy = genlmsg_msg_size(0); dummy 
= genlmsg_total_size(0); } drbd-8.4.4/drbd/compat/tests/have_genlmsg_new.c0000664000000000000000000000015112226001623020145 0ustar rootroot#include void f(void) { struct sk_buff *skb; skb = genlmsg_new(123, GFP_KERNEL); } drbd-8.4.4/drbd/compat/tests/have_genlmsg_put_reply.c0000664000000000000000000000031612226001623021402 0ustar rootroot#include void f(void) { struct sk_buff *skb = NULL; struct genl_info *info = NULL; struct genl_family *family = NULL; void *ret; ret = genlmsg_put_reply(skb, info, family, 0, 0); } drbd-8.4.4/drbd/compat/tests/have_genlmsg_reply.c0000664000000000000000000000021712226001623020512 0ustar rootroot#include void f(void) { struct sk_buff *skb = NULL; struct genl_info *info = NULL; int ret = genlmsg_reply(skb, info); } drbd-8.4.4/drbd/compat/tests/have_idr_alloc.c0000664000000000000000000000022712226001623017574 0ustar rootroot#include #include void foo(void) { int i; struct idr idr; int n = 10; i = idr_alloc(&idr, &i, n, n+1, GFP_KERNEL); } drbd-8.4.4/drbd/compat/tests/have_idr_for_each.c0000664000000000000000000000026612226001623020253 0ustar rootroot#include static int idr_has_entry(int id, void *p, void *data) { return 1; } bool idr_is_empty(struct idr *idr) { return !idr_for_each(idr, idr_has_entry, NULL); } drbd-8.4.4/drbd/compat/tests/have_idr_for_each_entry.c0000664000000000000000000000017112226001623021467 0ustar rootroot#include void foo(void) { struct idr idr; struct bar *b; int i; idr_for_each_entry(&idr, b, i) ; } drbd-8.4.4/drbd/compat/tests/have_kref_sub.c0000664000000000000000000000012512226001623017441 0ustar rootroot#include void foo(void) { struct kref t; kref_sub(&t, 2, NULL); } drbd-8.4.4/drbd/compat/tests/have_linux_byteorder_swabb_h.c0000664000000000000000000000007612226001623022551 0ustar rootroot#include #include drbd-8.4.4/drbd/compat/tests/have_list_splice_tail_init.c0000664000000000000000000000023512226001623022211 0ustar rootroot#include void *p = list_splice_tail_init; void bar(void) { LIST_HEAD(list1); 
LIST_HEAD(list2); list_splice_tail_init(&list1, &list2); } drbd-8.4.4/drbd/compat/tests/have_netlink_skb_parms_portid.c0000664000000000000000000000014112226001623022725 0ustar rootroot#include void dummy(void) { struct netlink_skb_parms nsp; nsp.portid = 0; } drbd-8.4.4/drbd/compat/tests/have_nlmsg_hdr.c0000664000000000000000000000021412226001623017615 0ustar rootroot#include #include void f(void) { struct sk_buff *skb = NULL; struct nlmsghdr *hdr = nlmsg_hdr(skb); } drbd-8.4.4/drbd/compat/tests/have_nr_cpu_ids.c0000664000000000000000000000010412226001623017763 0ustar rootroot#include void foo(void) { int x = nr_cpu_ids; } drbd-8.4.4/drbd/compat/tests/have_open_bdev_exclusive.c0000664000000000000000000000042712226001623021676 0ustar rootroot#include #include /* * In kernel version v2.6.28-rc1, open_bdev_excl() was replaced by * open_bdev_exclusive(); see commit 30c40d2. */ void foo(void) { struct block_device *blkdev; blkdev = open_bdev_exclusive("", (fmode_t) 0, (void *) 0); } drbd-8.4.4/drbd/compat/tests/have_prandom_u32.c0000664000000000000000000000012112226001623017766 0ustar rootroot#include int main(void) { u32 r = prandom_u32(); return 0; } drbd-8.4.4/drbd/compat/tests/have_proc_create_data.c0000664000000000000000000000013012226001623021114 0ustar rootroot#include #ifndef proc_create_data void *p = proc_create_data; #endif drbd-8.4.4/drbd/compat/tests/have_proc_pde_data.c0000664000000000000000000000017612226001623020433 0ustar rootroot#include int main(void) { struct inode *inode = NULL; void *data; data = PDE_DATA(inode); return 0; } drbd-8.4.4/drbd/compat/tests/have_rb_augment_functions.c0000664000000000000000000000051712226001623022061 0ustar rootroot#include /* introduced with commit b945d6b2, Linux 2.6.35-rc5 */ void foo(void) { struct rb_node *n; rb_augment_insert((struct rb_node *) NULL, (rb_augment_f) NULL, NULL); n = rb_augment_erase_begin((struct rb_node *)NULL); rb_augment_erase_end((struct rb_node *) NULL, (rb_augment_f) NULL, NULL); } 
drbd-8.4.4/drbd/compat/tests/have_security_netlink_recv.c0000664000000000000000000000065512226001623022263 0ustar rootroot#include /* int f(void) { struct sk_buff *skb = NULL; return security_netlink_recv(skb, CAP_SYS_ADMIN); } gcc treats function calls of unknown functions as warnings. Therefore we compile the tests with -Werror=implicit-function-declaration but on gentoo users tend to disable all warnings system wide! But the following is a compiler error even on such a gentoo system: */ void *p = security_netlink_recv; drbd-8.4.4/drbd/compat/tests/have_sock_shutdown.c0000664000000000000000000000013412226001623020533 0ustar rootroot#include #ifndef kernel_sock_shutdown void *p = kernel_sock_shutdown; #endif drbd-8.4.4/drbd/compat/tests/have_struct_queue_limits.c0000664000000000000000000000014712226003604021757 0ustar rootroot#include struct queue_limits *foo(void) { struct queue_limits lim; return &lim; } drbd-8.4.4/drbd/compat/tests/have_task_pid_nr.c0000664000000000000000000000013712226001623020141 0ustar rootroot#include int main(void) { pid_t p = task_pid_nr(current); return (int)p; } drbd-8.4.4/drbd/compat/tests/have_umh_wait_proc.c0000664000000000000000000000010212226001623020474 0ustar rootroot#include void foo() { int bar = UMH_WAIT_PROC; } drbd-8.4.4/drbd/compat/tests/have_void_make_request.c0000664000000000000000000000102112226001623021343 0ustar rootroot#include /* hm. sometimes this pragma is ignored :( * use BUILD_BUG_ON instead. #pragma GCC diagnostic warning "-Werror" */ /* in Commit 5a7bbad27a410350e64a2d7f5ec18fc73836c14f (between Linux-3.1 and 3.2) make_request() becomes type void. Before it had type int.
*/ void drbd_make_request(struct request_queue *q, struct bio *bio) { } void foo(void) { struct request_queue *q = NULL; blk_queue_make_request(q, drbd_make_request); BUILD_BUG_ON(!(__same_type(drbd_make_request, make_request_fn))); } drbd-8.4.4/drbd/compat/tests/have_vzalloc.c0000664000000000000000000000010212226001623017306 0ustar rootroot#include void foo() { void *v = vzalloc(8); } drbd-8.4.4/drbd/compat/tests/hlist_for_each_entry_has_three_parameters.c0000664000000000000000000000057412226001623025305 0ustar rootroot#include #include struct element { struct hlist_node colision; int x; }; /* * Before Linux 3.9 it was hlist_for_each_entry(tpos, pos, head, member) * now it is hlist_for_each_entry(pos, head, member) */ void dummy(void) { struct element *e; struct hlist_head head; INIT_HLIST_HEAD(&head); hlist_for_each_entry(e, &head, colision) ; } drbd-8.4.4/drbd/compat/tests/init_work_has_three_arguments.c0000664000000000000000000000014412226001623022753 0ustar rootroot#include void f(void) { struct work_struct ws; INIT_WORK(&ws, NULL, NULL); } drbd-8.4.4/drbd/compat/tests/kmap_atomic_page_only.c0000664000000000000000000000023712226001623021163 0ustar rootroot#include /* see 980c19e3 * highmem: mark k[un]map_atomic() with two arguments as deprecated */ void *f(void) { return kmap_atomic(NULL); } drbd-8.4.4/drbd/compat/tests/need_genlmsg_multicast_wrapper.c0000664000000000000000000000017212226001623023114 0ustar rootroot#include void f(void) { struct sk_buff *skb = NULL; int ret; ret = genlmsg_multicast(skb, 0, 0); } drbd-8.4.4/drbd/compat/tests/queue_limits_has_discard_zeroes_data.c0000664000000000000000000000021612226001672024254 0ustar rootroot#include struct queue_limits *foo(void) { struct queue_limits *lim = NULL; lim->discard_zeroes_data = 1; return lim; } drbd-8.4.4/drbd/compat/tests/use_blk_queue_max_sectors_anyways.c0000664000000000000000000000072712226001623023660 0ustar rootroot#include #ifndef blk_queue_max_hw_sectors void *p = blk_queue_max_hw_sectors;
#endif

/* For kernel versions 2.6.31 to 2.6.33 inclusive, even though
 * blk_queue_max_hw_sectors is present, we actually need to use
 * blk_queue_max_sectors to set max_hw_sectors. :-(
 * RHEL6 2.6.32 chose to be different and already has eliminated
 * blk_queue_max_sectors as upstream 2.6.34 did.
 */
#ifndef blk_queue_max_sectors
void *q = blk_queue_max_sectors;
#endif
drbd-8.4.4/drbd/data-structure-v9.txt0000664000000000000000000000360712221261130016115 0ustar rootroot
This describes the in-kernel data structure for DRBD-9. Starting with
Linux v3.12 we are reorganizing DRBD to use this data structure.

Basic Data Structure
====================

A node has a number of DRBD resources. Each such resource has a number of
devices (aka volumes) and connections to other nodes ("peer nodes"). Each DRBD
device is represented by a block device locally.

The DRBD objects are interconnected to form a matrix as depicted below; a
drbd_peer_device object sits at each intersection between a drbd_device and a
drbd_connection:

  /--------------+---------------+.....+---------------\
  |   resource   |    device     |     |    device     |
  +--------------+---------------+.....+---------------+
  |  connection  |  peer_device  |     |  peer_device  |
  +--------------+---------------+.....+---------------+
  :              :               :     :               :
  :              :               :     :               :
  +--------------+---------------+.....+---------------+
  |  connection  |  peer_device  |     |  peer_device  |
  \--------------+---------------+.....+---------------/

In this table, horizontally, devices can be accessed from resources by their
volume number. Likewise, peer_devices can be accessed from connections by
their volume number. Objects in the vertical direction are connected by doubly
linked lists. There are back pointers from peer_devices to their connections
and devices, and from connections and devices to their resource.

All resources are in the drbd_resources doubly linked list. In addition, all
devices can be accessed by their minor device number via the drbd_devices idr.
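
The matrix described above can be modelled in a few lines of plain userspace
C. This is only an illustrative sketch, not the DRBD implementation: the
struct and function names (resource, device, connection, peer_device,
add_device, add_connection, add_peer_device, column_length, matrix_demo) and
the MAX_VOLUMES / NR_CONNECTIONS bounds are assumptions made for the example.
Fixed-size arrays stand in for lookup by volume number, singly linked lists
stand in for the doubly linked lists, and the drbd_resources list and the
drbd_devices idr are left out.

```c
/* Userspace model of the resource/device/connection/peer_device matrix.
 * All names and sizes are hypothetical; this is not DRBD code. */
#include <assert.h>
#include <stddef.h>

#define MAX_VOLUMES    2   /* assumed bound on volume numbers */
#define NR_CONNECTIONS 2   /* assumed number of peer nodes */

struct resource;
struct connection;
struct peer_device;

struct device {
	struct resource *resource;        /* back pointer to the resource */
	struct peer_device *peer_devices; /* vertical list (one column) */
	int vnr;                          /* volume number */
};

struct peer_device {
	struct device *device;            /* back pointer to the device */
	struct connection *connection;    /* back pointer to the connection */
	struct peer_device *next;         /* next peer_device of the same device */
};

struct connection {
	struct resource *resource;        /* back pointer to the resource */
	struct connection *next;          /* vertical list of connections */
	struct peer_device *peer_devices[MAX_VOLUMES]; /* by volume number */
};

struct resource {
	struct device *devices[MAX_VOLUMES]; /* by volume number */
	struct connection *connections;      /* head of the connection list */
};

static void add_device(struct resource *res, struct device *dev, int vnr)
{
	dev->resource = res;
	dev->peer_devices = NULL;
	dev->vnr = vnr;
	res->devices[vnr] = dev;
}

static void add_connection(struct resource *res, struct connection *conn)
{
	int vnr;

	conn->resource = res;
	conn->next = res->connections;
	res->connections = conn;
	for (vnr = 0; vnr < MAX_VOLUMES; vnr++)
		conn->peer_devices[vnr] = NULL;
}

/* Place pd at the intersection of conn (row) and dev (column). */
static void add_peer_device(struct connection *conn, struct device *dev,
			    struct peer_device *pd)
{
	pd->device = dev;
	pd->connection = conn;
	pd->next = dev->peer_devices;
	dev->peer_devices = pd;
	conn->peer_devices[dev->vnr] = pd;
}

/* Walk one column of the matrix and count its peer_devices. */
static int column_length(const struct device *dev)
{
	const struct peer_device *pd;
	int n = 0;

	for (pd = dev->peer_devices; pd; pd = pd->next)
		n++;
	return n;
}

/* Build the full matrix; return the number of violated invariants. */
int matrix_demo(void)
{
	struct resource res = { { NULL }, NULL };
	struct device dev[MAX_VOLUMES];
	struct connection conn[NR_CONNECTIONS];
	struct peer_device pd[NR_CONNECTIONS][MAX_VOLUMES];
	int c, v, errors = 0;

	for (v = 0; v < MAX_VOLUMES; v++)
		add_device(&res, &dev[v], v);
	for (c = 0; c < NR_CONNECTIONS; c++) {
		add_connection(&res, &conn[c]);
		for (v = 0; v < MAX_VOLUMES; v++)
			add_peer_device(&conn[c], &dev[v], &pd[c][v]);
	}

	for (c = 0; c < NR_CONNECTIONS; c++)
		for (v = 0; v < MAX_VOLUMES; v++) {
			struct peer_device *p = conn[c].peer_devices[v];

			errors += p->device != res.devices[v];
			errors += p->connection != &conn[c];
			errors += p->device->resource != p->connection->resource;
		}
	for (v = 0; v < MAX_VOLUMES; v++)
		errors += column_length(&dev[v]) != NR_CONNECTIONS;
	return errors;
}
```

Calling matrix_demo() from a small main() and checking that it returns 0
confirms that every peer_device is reachable both from its connection (by
volume number) and from its device (by walking the column), and that both
back pointers lead to the same resource.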
The drbd_resource, drbd_connection, and drbd_device objects are reference counted. The peer_device objects only serve to establish the links between devices and connections; their lifetime is determined by the lifetime of the device and connection which they reference. drbd-8.4.4/drbd/drbd_actlog.c0000664000000000000000000011666312221310730014471 0ustar rootroot/* drbd_actlog.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2003-2008, LINBIT Information Technologies GmbH. Copyright (C) 2003-2008, Philipp Reisner . Copyright (C) 2003-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #include #include #include #include #include #include "drbd_int.h" #include "drbd_wrappers.h" enum al_transaction_types { AL_TR_UPDATE = 0, AL_TR_INITIALIZED = 0xffff }; /* all fields on disc in big endian */ struct __packed al_transaction_on_disk { /* don't we all like magic */ __be32 magic; /* to identify the most recent transaction block * in the on disk ring buffer */ __be32 tr_number; /* checksum on the full 4k block, with this field set to 0. */ __be32 crc32c; /* type of transaction, special transaction types like: * purge-all, set-all-idle, set-all-active, ... to-be-defined * see also enum al_transaction_types */ __be16 transaction_type; /* we currently allow only a few thousand extents, * so 16bit will be enough for the slot number. 
*/ /* how many updates in this transaction */ __be16 n_updates; /* maximum slot number, "al-extents" in drbd.conf speak. * Having this in each transaction should make reconfiguration * of that parameter easier. */ __be16 context_size; /* slot number the context starts with */ __be16 context_start_slot_nr; /* Some reserved bytes. Expected usage is a 64bit counter of * sectors-written since device creation, and other data generation tag * supporting usage */ __be32 __reserved[4]; /* --- 36 byte used --- */ /* Reserve space for up to AL_UPDATES_PER_TRANSACTION changes * in one transaction, then use the remaining byte in the 4k block for * context information. "Flexible" number of updates per transaction * does not help, as we have to account for the case when all update * slots are used anyways, so it would only complicate code without * additional benefit. */ __be16 update_slot_nr[AL_UPDATES_PER_TRANSACTION]; /* but the extent number is 32bit, which at an extent size of 4 MiB * allows to cover device sizes of up to 2**54 Byte (16 PiB) */ __be32 update_extent_nr[AL_UPDATES_PER_TRANSACTION]; /* --- 420 bytes used (36 + 64*6) --- */ /* 4096 - 420 = 3676 = 919 * 4 */ __be32 context[AL_CONTEXT_PER_TRANSACTION]; }; struct update_odbm_work { struct drbd_work w; struct drbd_device *device; unsigned int enr; }; struct update_al_work { struct drbd_work w; struct drbd_device *device; struct completion event; int err; }; void *drbd_md_get_buffer(struct drbd_device *device) { int r; wait_event(device->misc_wait, (r = atomic_cmpxchg(&device->md_io_in_use, 0, 1)) == 0 || device->state.disk <= D_FAILED); return r ? 
NULL : page_address(device->md_io_page); } void drbd_md_put_buffer(struct drbd_device *device) { if (atomic_dec_and_test(&device->md_io_in_use)) wake_up(&device->misc_wait); } void wait_until_done_or_force_detached(struct drbd_device *device, struct drbd_backing_dev *bdev, unsigned int *done) { long dt; rcu_read_lock(); dt = rcu_dereference(bdev->disk_conf)->disk_timeout; rcu_read_unlock(); dt = dt * HZ / 10; if (dt == 0) dt = MAX_SCHEDULE_TIMEOUT; dt = wait_event_timeout(device->misc_wait, *done || test_bit(FORCE_DETACH, &device->flags), dt); if (dt == 0) { drbd_err(device, "meta-data IO operation timed out\n"); drbd_chk_io_error(device, 1, DRBD_FORCE_DETACH); } } static int _drbd_md_sync_page_io(struct drbd_device *device, struct drbd_backing_dev *bdev, struct page *page, sector_t sector, int rw, int size) { struct bio *bio; int err; if ((rw & WRITE) && !test_bit(MD_NO_BARRIER, &device->flags)) rw |= DRBD_REQ_FUA | DRBD_REQ_FLUSH; rw |= DRBD_REQ_UNPLUG | DRBD_REQ_SYNC; #ifndef REQ_FLUSH /* < 2.6.36, "barrier" semantic may fail with EOPNOTSUPP */ retry: #endif device->md_io.done = 0; device->md_io.error = -ENODEV; bio = bio_alloc_drbd(GFP_NOIO); bio->bi_bdev = bdev->md_bdev; bio->bi_sector = sector; err = -EIO; if (bio_add_page(bio, page, size, 0) != size) goto out; bio->bi_private = &device->md_io; bio->bi_end_io = drbd_md_io_complete; bio->bi_rw = rw; if (!(rw & WRITE) && device->state.disk == D_DISKLESS && device->ldev == NULL) /* special case, drbd_md_read() during drbd_adm_attach(): no get_ldev */ ; else if (!get_ldev_if_state(device, D_ATTACHING)) { /* Corresponding put_ldev in drbd_md_io_complete() */ drbd_err(device, "ASSERT FAILED: get_ldev_if_state() == 1 in _drbd_md_sync_page_io()\n"); err = -ENODEV; goto out; } bio_get(bio); /* one bio_put() is in the completion handler */ atomic_inc(&device->md_io_in_use); /* drbd_md_put_buffer() is in the completion handler */ if (drbd_insert_fault(device, (rw & WRITE) ? 
DRBD_FAULT_MD_WR : DRBD_FAULT_MD_RD)) bio_endio(bio, -EIO); else submit_bio(rw, bio); wait_until_done_or_force_detached(device, bdev, &device->md_io.done); if (bio_flagged(bio, BIO_UPTODATE)) err = device->md_io.error; #ifndef REQ_FLUSH /* check for unsupported barrier op. * would rather check on EOPNOTSUPP, but that is not reliable. * don't try again for ANY return value != 0 */ if (err && device->md_io.done && (bio->bi_rw & DRBD_REQ_HARDBARRIER)) { /* Try again with no barrier */ drbd_warn(device, "Barriers not supported on meta data device - disabling\n"); set_bit(MD_NO_BARRIER, &device->flags); rw &= ~DRBD_REQ_HARDBARRIER; bio_put(bio); goto retry; } #endif out: bio_put(bio); return err; } int drbd_md_sync_page_io(struct drbd_device *device, struct drbd_backing_dev *bdev, sector_t sector, int rw) { int err; struct page *iop = device->md_io_page; D_ASSERT(device, atomic_read(&device->md_io_in_use) == 1); if (!bdev->md_bdev) { if (DRBD_ratelimit(5*HZ, 5)) { drbd_err(device, "bdev->md_bdev==NULL\n"); dump_stack(); } return -EIO; } dynamic_drbd_dbg(device, "meta_data io: %s [%d]:%s(,%llus,%s) %pS\n", current->comm, current->pid, __func__, (unsigned long long)sector, (rw & WRITE) ? "WRITE" : "READ", (void*)_RET_IP_ ); if (sector < drbd_md_first_sector(bdev) || sector + 7 > drbd_md_last_sector(bdev)) drbd_alert(device, "%s [%d]:%s(,%llus,%s) out of range md access!\n", current->comm, current->pid, __func__, (unsigned long long)sector, (rw & WRITE) ? "WRITE" : "READ"); /* we do all our meta data IO in aligned 4k blocks. */ err = _drbd_md_sync_page_io(device, bdev, iop, sector, rw, 4096); if (err) { drbd_err(device, "drbd_md_sync_page_io(,%llus,%s) failed with error %d\n", (unsigned long long)sector, (rw & WRITE) ? 
"WRITE" : "READ", err); } return err; } static struct bm_extent * find_active_resync_extent(struct drbd_device *device, unsigned int enr) { struct lc_element *tmp; tmp = lc_find(device->resync, enr/AL_EXT_PER_BM_SECT); if (unlikely(tmp != NULL)) { struct bm_extent *bm_ext = lc_entry(tmp, struct bm_extent, lce); if (test_bit(BME_NO_WRITES, &bm_ext->flags)) return bm_ext; } return NULL; } static struct lc_element *_al_get(struct drbd_device *device, unsigned int enr, bool nonblock) { struct lc_element *al_ext; struct bm_extent *bm_ext; int wake; spin_lock_irq(&device->al_lock); bm_ext = find_active_resync_extent(device, enr); if (bm_ext) { wake = !test_and_set_bit(BME_PRIORITY, &bm_ext->flags); spin_unlock_irq(&device->al_lock); if (wake) wake_up(&device->al_wait); return NULL; } if (nonblock) al_ext = lc_try_get(device->act_log, enr); else al_ext = lc_get(device->act_log, enr); spin_unlock_irq(&device->al_lock); return al_ext; } bool drbd_al_begin_io_fastpath(struct drbd_device *device, struct drbd_interval *i) { /* for bios crossing activity log extent boundaries, * we may need to activate two extents in one go */ unsigned first = i->sector >> (AL_EXTENT_SHIFT-9); unsigned last = i->size == 0 ? first : (i->sector + (i->size >> 9) - 1) >> (AL_EXTENT_SHIFT-9); D_ASSERT(device, (unsigned)(last - first) <= 1); D_ASSERT(device, atomic_read(&device->local_cnt) > 0); /* FIXME figure out a fast path for bios crossing AL extent boundaries */ if (first != last) return false; return _al_get(device, first, true); } bool drbd_al_begin_io_prepare(struct drbd_device *device, struct drbd_interval *i) { /* for bios crossing activity log extent boundaries, * we may need to activate two extents in one go */ unsigned first = i->sector >> (AL_EXTENT_SHIFT-9); unsigned last = i->size == 0 ? 
first : (i->sector + (i->size >> 9) - 1) >> (AL_EXTENT_SHIFT-9); unsigned enr; bool need_transaction = false; D_ASSERT(device, first <= last); D_ASSERT(device, atomic_read(&device->local_cnt) > 0); for (enr = first; enr <= last; enr++) { struct lc_element *al_ext; wait_event(device->al_wait, (al_ext = _al_get(device, enr, false)) != NULL); if (al_ext->lc_number != enr) need_transaction = true; } return need_transaction; } static int al_write_transaction(struct drbd_device *device, bool delegate); /* When called through generic_make_request(), we must delegate * activity log I/O to the worker thread: a further request * submitted via generic_make_request() within the same task * would be queued on current->bio_list, and would only start * after this function returns (see generic_make_request()). * * However, if we *are* the worker, we must not delegate to ourselves. */ /* * @delegate: delegate activity log I/O to the worker thread */ void drbd_al_begin_io_commit(struct drbd_device *device, bool delegate) { bool locked = false; BUG_ON(delegate && current == first_peer_device(device)->connection->worker.task); /* Serialize multiple transactions. * This uses test_and_set_bit, memory barrier is implicit. */ wait_event(device->al_wait, device->act_log->pending_changes == 0 || (locked = lc_try_lock_for_transaction(device->act_log))); if (locked) { /* Double check: it may have been committed by someone else, * while we have been waiting for the lock. 
*/ if (device->act_log->pending_changes) { bool write_al_updates; rcu_read_lock(); write_al_updates = rcu_dereference(device->ldev->disk_conf)->al_updates; rcu_read_unlock(); if (write_al_updates) al_write_transaction(device, delegate); spin_lock_irq(&device->al_lock); /* FIXME if (err) we need an "lc_cancel" here; */ lc_committed(device->act_log); spin_unlock_irq(&device->al_lock); } lc_unlock(device->act_log); wake_up(&device->al_wait); } } /* * @delegate: delegate activity log I/O to the worker thread */ void drbd_al_begin_io(struct drbd_device *device, struct drbd_interval *i, bool delegate) { BUG_ON(delegate && current == first_peer_device(device)->connection->worker.task); if (drbd_al_begin_io_prepare(device, i)) drbd_al_begin_io_commit(device, delegate); } int drbd_al_begin_io_nonblock(struct drbd_device *device, struct drbd_interval *i) { struct lru_cache *al = device->act_log; /* for bios crossing activity log extent boundaries, * we may need to activate two extents in one go */ unsigned first = i->sector >> (AL_EXTENT_SHIFT-9); unsigned last = i->size == 0 ? first : (i->sector + (i->size >> 9) - 1) >> (AL_EXTENT_SHIFT-9); unsigned nr_al_extents; unsigned available_update_slots; unsigned enr; D_ASSERT(device, first <= last); nr_al_extents = 1 + last - first; /* worst case: all touched extents are cold. */ available_update_slots = min(al->nr_elements - al->used, al->max_pending_changes - al->pending_changes); /* We want all necessary updates for a given request within the same transaction. * We could first check how many updates are *actually* needed, * and use that instead of the worst-case nr_al_extents */ if (available_update_slots < nr_al_extents) return -EWOULDBLOCK; /* Is resync active in this area?
*/ for (enr = first; enr <= last; enr++) { struct lc_element *tmp; tmp = lc_find(device->resync, enr/AL_EXT_PER_BM_SECT); if (unlikely(tmp != NULL)) { struct bm_extent *bm_ext = lc_entry(tmp, struct bm_extent, lce); if (test_bit(BME_NO_WRITES, &bm_ext->flags)) { if (!test_and_set_bit(BME_PRIORITY, &bm_ext->flags)) return -EBUSY; return -EWOULDBLOCK; } } } /* Checkout the refcounts. * Given that we checked for available elements and update slots above, * this has to be successful. */ for (enr = first; enr <= last; enr++) { struct lc_element *al_ext; al_ext = lc_get_cumulative(device->act_log, enr); if (!al_ext) drbd_info(device, "LOGIC BUG for enr=%u\n", enr); } return 0; } void drbd_al_complete_io(struct drbd_device *device, struct drbd_interval *i) { /* for bios crossing activity log extent boundaries, * we may need to activate two extents in one go */ unsigned first = i->sector >> (AL_EXTENT_SHIFT-9); unsigned last = i->size == 0 ? first : (i->sector + (i->size >> 9) - 1) >> (AL_EXTENT_SHIFT-9); unsigned enr; struct lc_element *extent; unsigned long flags; D_ASSERT(device, first <= last); spin_lock_irqsave(&device->al_lock, flags); for (enr = first; enr <= last; enr++) { extent = lc_find(device->act_log, enr); if (!extent) { drbd_err(device, "al_complete_io() called on inactive extent %u\n", enr); continue; } lc_put(device->act_log, extent); } spin_unlock_irqrestore(&device->al_lock, flags); wake_up(&device->al_wait); } #if (PAGE_SHIFT + 3) < (AL_EXTENT_SHIFT - BM_BLOCK_SHIFT) /* Currently BM_BLOCK_SHIFT, BM_EXT_SHIFT and AL_EXTENT_SHIFT * are still coupled, or assume too much about their relation. * Code below will not work if this is violated. * Will be cleaned up with some followup patch. 
*/ # error FIXME #endif static unsigned int al_extent_to_bm_page(unsigned int al_enr) { return al_enr >> /* bit to page */ ((PAGE_SHIFT + 3) - /* al extent number to bit */ (AL_EXTENT_SHIFT - BM_BLOCK_SHIFT)); } static unsigned int rs_extent_to_bm_page(unsigned int rs_enr) { return rs_enr >> /* bit to page */ ((PAGE_SHIFT + 3) - /* resync extent number to bit */ (BM_EXT_SHIFT - BM_BLOCK_SHIFT)); } static sector_t al_tr_number_to_on_disk_sector(struct drbd_device *device) { const unsigned int stripes = device->ldev->md.al_stripes; const unsigned int stripe_size_4kB = device->ldev->md.al_stripe_size_4k; /* transaction number, modulo on-disk ring buffer wrap around */ unsigned int t = device->al_tr_number % (device->ldev->md.al_size_4k); /* ... to aligned 4k on disk block */ t = ((t % stripes) * stripe_size_4kB) + t/stripes; /* ... to 512 byte sector in activity log */ t *= 8; /* ... plus offset to the on disk position */ return device->ldev->md.md_offset + device->ldev->md.al_offset + t; } static int _al_write_transaction(struct drbd_device *device) { struct al_transaction_on_disk *buffer; struct lc_element *e; sector_t sector; int i, mx; unsigned extent_nr; unsigned crc = 0; int err = 0; if (!get_ldev(device)) { drbd_err(device, "disk is %s, cannot start al transaction\n", drbd_disk_str(device->state.disk)); return -EIO; } /* The bitmap write may have failed, causing a state change. */ if (device->state.disk < D_INCONSISTENT) { drbd_err(device, "disk is %s, cannot write al transaction\n", drbd_disk_str(device->state.disk)); put_ldev(device); return -EIO; } buffer = drbd_md_get_buffer(device); /* protects md_io_buffer, al_tr_cycle, ... 
*/ if (!buffer) { drbd_err(device, "disk failed while waiting for md_io buffer\n"); put_ldev(device); return -ENODEV; } memset(buffer, 0, sizeof(*buffer)); buffer->magic = cpu_to_be32(DRBD_AL_MAGIC); buffer->tr_number = cpu_to_be32(device->al_tr_number); i = 0; /* Even though no one can start to change this list * once we set the LC_LOCKED -- from drbd_al_begin_io(), * lc_try_lock_for_transaction() --, someone may still * be in the process of changing it. */ spin_lock_irq(&device->al_lock); list_for_each_entry(e, &device->act_log->to_be_changed, list) { if (i == AL_UPDATES_PER_TRANSACTION) { i++; break; } buffer->update_slot_nr[i] = cpu_to_be16(e->lc_index); buffer->update_extent_nr[i] = cpu_to_be32(e->lc_new_number); if (e->lc_number != LC_FREE) drbd_bm_mark_for_writeout(device, al_extent_to_bm_page(e->lc_number)); i++; } spin_unlock_irq(&device->al_lock); BUG_ON(i > AL_UPDATES_PER_TRANSACTION); buffer->n_updates = cpu_to_be16(i); for ( ; i < AL_UPDATES_PER_TRANSACTION; i++) { buffer->update_slot_nr[i] = cpu_to_be16(-1); buffer->update_extent_nr[i] = cpu_to_be32(LC_FREE); } buffer->context_size = cpu_to_be16(device->act_log->nr_elements); buffer->context_start_slot_nr = cpu_to_be16(device->al_tr_cycle); mx = min_t(int, AL_CONTEXT_PER_TRANSACTION, device->act_log->nr_elements - device->al_tr_cycle); for (i = 0; i < mx; i++) { unsigned idx = device->al_tr_cycle + i; extent_nr = lc_element_by_index(device->act_log, idx)->lc_number; buffer->context[i] = cpu_to_be32(extent_nr); } for (; i < AL_CONTEXT_PER_TRANSACTION; i++) buffer->context[i] = cpu_to_be32(LC_FREE); device->al_tr_cycle += AL_CONTEXT_PER_TRANSACTION; if (device->al_tr_cycle >= device->act_log->nr_elements) device->al_tr_cycle = 0; sector = al_tr_number_to_on_disk_sector(device); crc = crc32c(0, buffer, 4096); buffer->crc32c = cpu_to_be32(crc); if (drbd_bm_write_hinted(device)) err = -EIO; else { bool write_al_updates; rcu_read_lock(); write_al_updates = 
rcu_dereference(device->ldev->disk_conf)->al_updates; rcu_read_unlock(); if (write_al_updates) { if (drbd_md_sync_page_io(device, device->ldev, sector, WRITE)) { err = -EIO; drbd_chk_io_error(device, 1, DRBD_META_IO_ERROR); } else { device->al_tr_number++; device->al_writ_cnt++; } } } drbd_md_put_buffer(device); put_ldev(device); return err; } static int w_al_write_transaction(struct drbd_work *w, int unused) { struct update_al_work *aw = container_of(w, struct update_al_work, w); struct drbd_device *device = aw->device; int err; err = _al_write_transaction(device); aw->err = err; complete(&aw->event); return err != -EIO ? err : 0; } /* Calls from worker context (see w_restart_disk_io()) need to write the transaction directly. Others came through generic_make_request(); those need to delegate it to the worker. */ static int al_write_transaction(struct drbd_device *device, bool delegate) { if (delegate) { struct update_al_work al_work; init_completion(&al_work.event); al_work.w.cb = w_al_write_transaction; al_work.device = device; drbd_queue_work_front(&first_peer_device(device)->connection->sender_work, &al_work.w); wait_for_completion(&al_work.event); return al_work.err; } else return _al_write_transaction(device); } static int _try_lc_del(struct drbd_device *device, struct lc_element *al_ext) { int rv; spin_lock_irq(&device->al_lock); rv = (al_ext->refcnt == 0); if (likely(rv)) lc_del(device->act_log, al_ext); spin_unlock_irq(&device->al_lock); return rv; } /** * drbd_al_shrink() - Removes all active extents from the activity log * @device: DRBD device. * * Removes all active extents from the activity log, waiting until * the reference count of each entry has dropped to 0 first, of course.
* * You need to lock device->act_log with lc_try_lock() / lc_unlock() */ void drbd_al_shrink(struct drbd_device *device) { struct lc_element *al_ext; int i; D_ASSERT(device, test_bit(__LC_LOCKED, &device->act_log->flags)); for (i = 0; i < device->act_log->nr_elements; i++) { al_ext = lc_element_by_index(device->act_log, i); if (al_ext->lc_number == LC_FREE) continue; wait_event(device->al_wait, _try_lc_del(device, al_ext)); } wake_up(&device->al_wait); } int drbd_initialize_al(struct drbd_device *device, void *buffer) { struct al_transaction_on_disk *al = buffer; struct drbd_md *md = &device->ldev->md; sector_t al_base = md->md_offset + md->al_offset; int al_size_4k = md->al_stripes * md->al_stripe_size_4k; int i; memset(al, 0, 4096); al->magic = cpu_to_be32(DRBD_AL_MAGIC); al->transaction_type = cpu_to_be16(AL_TR_INITIALIZED); al->crc32c = cpu_to_be32(crc32c(0, al, 4096)); for (i = 0; i < al_size_4k; i++) { int err = drbd_md_sync_page_io(device, device->ldev, al_base + i * 8, WRITE); if (err) return err; } return 0; } static int w_update_odbm(struct drbd_work *w, int unused) { struct update_odbm_work *udw = container_of(w, struct update_odbm_work, w); struct drbd_device *device = udw->device; struct sib_info sib = { .sib_reason = SIB_SYNC_PROGRESS, }; if (!get_ldev(device)) { if (DRBD_ratelimit(5*HZ, 5)) drbd_warn(device, "Can not update on disk bitmap, local IO disabled.\n"); kfree(udw); return 0; } drbd_bm_write_page(device, rs_extent_to_bm_page(udw->enr)); put_ldev(device); kfree(udw); if (drbd_bm_total_weight(device) <= device->rs_failed) { switch (device->state.conn) { case C_SYNC_SOURCE: case C_SYNC_TARGET: case C_PAUSED_SYNC_S: case C_PAUSED_SYNC_T: drbd_resync_finished(device); default: /* nothing to do */ break; } } drbd_bcast_event(device, &sib); return 0; } /* ATTENTION. The AL's extents are 4MB each, while the extents in the * resync LRU-cache are 16MB each. * The caller of this function has to hold an get_ldev() reference. 
* * TODO will be obsoleted once we have a caching lru of the on disk bitmap */ static void drbd_try_clear_on_disk_bm(struct drbd_device *device, sector_t sector, int count, int success) { struct lc_element *e; struct update_odbm_work *udw; unsigned int enr; D_ASSERT(device, atomic_read(&device->local_cnt)); /* I simply assume that a sector/size pair never crosses * a 16 MB extent border. (Currently this is true...) */ enr = BM_SECT_TO_EXT(sector); e = lc_get(device->resync, enr); if (e) { struct bm_extent *ext = lc_entry(e, struct bm_extent, lce); if (ext->lce.lc_number == enr) { if (success) ext->rs_left -= count; else ext->rs_failed += count; if (ext->rs_left < ext->rs_failed) { drbd_warn(device, "BAD! sector=%llus enr=%u rs_left=%d " "rs_failed=%d count=%d cstate=%s\n", (unsigned long long)sector, ext->lce.lc_number, ext->rs_left, ext->rs_failed, count, drbd_conn_str(device->state.conn)); /* We don't expect to be able to clear more bits * than have been set when we originally counted * the set bits to cache that value in ext->rs_left. * Whatever the reason (disconnect during resync, * delayed local completion of an application write), * try to fix it up by recounting here. */ ext->rs_left = drbd_bm_e_weight(device, enr); } } else { /* Normally this element should be in the cache, * since drbd_rs_begin_io() pulled it already in. * * But maybe an application write finished, and we set * something outside the resync lru_cache in sync. */ int rs_left = drbd_bm_e_weight(device, enr); if (ext->flags != 0) { drbd_warn(device, "changing resync lce: %d[%u;%02lx]" " -> %d[%u;00]\n", ext->lce.lc_number, ext->rs_left, ext->flags, enr, rs_left); ext->flags = 0; } if (ext->rs_failed) { drbd_warn(device, "Kicking resync_lru element enr=%u " "out with rs_failed=%d\n", ext->lce.lc_number, ext->rs_failed); } ext->rs_left = rs_left; ext->rs_failed = success ? 0 : count; /* we don't keep a persistent log of the resync lru, * we can commit any change right away. 
*/ lc_committed(device->resync); } lc_put(device->resync, &ext->lce); /* no race, we are within the al_lock! */ if (ext->rs_left == ext->rs_failed) { ext->rs_failed = 0; udw = kmalloc(sizeof(*udw), GFP_ATOMIC); if (udw) { udw->enr = ext->lce.lc_number; udw->w.cb = w_update_odbm; udw->device = device; drbd_queue_work_front(&first_peer_device(device)->connection->sender_work, &udw->w); } else { drbd_warn(device, "Could not kmalloc an udw\n"); } } } else { drbd_err(device, "lc_get() failed! locked=%d/%d flags=%lu\n", device->resync_locked, device->resync->nr_elements, device->resync->flags); } } void drbd_advance_rs_marks(struct drbd_device *device, unsigned long still_to_go) { unsigned long now = jiffies; unsigned long last = device->rs_mark_time[device->rs_last_mark]; int next = (device->rs_last_mark + 1) % DRBD_SYNC_MARKS; if (time_after_eq(now, last + DRBD_SYNC_MARK_STEP)) { if (device->rs_mark_left[device->rs_last_mark] != still_to_go && device->state.conn != C_PAUSED_SYNC_T && device->state.conn != C_PAUSED_SYNC_S) { device->rs_mark_time[next] = now; device->rs_mark_left[next] = still_to_go; device->rs_last_mark = next; } } } /* clear the bit corresponding to the piece of storage in question: * size bytes of data starting from sector. Only clear the bits of the affected * one or more _aligned_ BM_BLOCK_SIZE blocks. * * called by worker on C_SYNC_TARGET and receiver on SyncSource.
* */ void __drbd_set_in_sync(struct drbd_device *device, sector_t sector, int size, const char *file, const unsigned int line) { /* Is called from worker and receiver context _only_ */ unsigned long sbnr, ebnr, lbnr; unsigned long count = 0; sector_t esector, nr_sectors; int wake_up = 0; unsigned long flags; if (size <= 0 || !IS_ALIGNED(size, 512) || size > DRBD_MAX_DISCARD_SIZE) { drbd_err(device, "drbd_set_in_sync: sector=%llus size=%d nonsense!\n", (unsigned long long)sector, size); return; } if (!get_ldev(device)) return; /* no disk, no metadata, no bitmap to clear bits in */ nr_sectors = drbd_get_capacity(device->this_bdev); esector = sector + (size >> 9) - 1; if (!expect(sector < nr_sectors)) goto out; if (!expect(esector < nr_sectors)) esector = nr_sectors - 1; lbnr = BM_SECT_TO_BIT(nr_sectors-1); /* we clear it (in sync). * round up start sector, round down end sector. we make sure we only * clear full, aligned, BM_BLOCK_SIZE (4K) blocks */ if (unlikely(esector < BM_SECT_PER_BIT-1)) goto out; if (unlikely(esector == (nr_sectors-1))) ebnr = lbnr; else ebnr = BM_SECT_TO_BIT(esector - (BM_SECT_PER_BIT-1)); sbnr = BM_SECT_TO_BIT(sector + BM_SECT_PER_BIT-1); if (sbnr > ebnr) goto out; /* * ok, (capacity & 7) != 0 sometimes, but who cares... * we count rs_{total,left} in bits, not sectors. */ count = drbd_bm_clear_bits(device, sbnr, ebnr); if (count) { drbd_advance_rs_marks(device, drbd_bm_total_weight(device)); spin_lock_irqsave(&device->al_lock, flags); drbd_try_clear_on_disk_bm(device, sector, count, true); spin_unlock_irqrestore(&device->al_lock, flags); /* just wake_up unconditional now, various lc_chaged(), * lc_put() in drbd_try_clear_on_disk_bm(). */ wake_up = 1; } out: put_ldev(device); if (wake_up) wake_up(&device->al_wait); } /* * this is intended to set one request worth of data out of sync. * affects at least 1 bit, * and at most 1+DRBD_MAX_BIO_SIZE/BM_BLOCK_SIZE bits. * * called by tl_clear and drbd_send_dblock (==drbd_make_request). 
* so this can be _any_ process. */ int __drbd_set_out_of_sync(struct drbd_device *device, sector_t sector, int size, const char *file, const unsigned int line) { unsigned long sbnr, ebnr, flags; sector_t esector, nr_sectors; unsigned int enr, count = 0; struct lc_element *e; /* this should be an empty REQ_FLUSH */ if (size == 0) return 0; if (size < 0 || !IS_ALIGNED(size, 512) || size > DRBD_MAX_DISCARD_SIZE) { drbd_err(device, "sector: %llus, size: %d\n", (unsigned long long)sector, size); return 0; } if (!get_ldev(device)) return 0; /* no disk, no metadata, no bitmap to set bits in */ nr_sectors = drbd_get_capacity(device->this_bdev); esector = sector + (size >> 9) - 1; if (!expect(sector < nr_sectors)) goto out; if (!expect(esector < nr_sectors)) esector = nr_sectors - 1; /* we set it out of sync, * we do not need to round anything here */ sbnr = BM_SECT_TO_BIT(sector); ebnr = BM_SECT_TO_BIT(esector); /* ok, (capacity & 7) != 0 sometimes, but who cares... * we count rs_{total,left} in bits, not sectors. */ spin_lock_irqsave(&device->al_lock, flags); count = drbd_bm_set_bits(device, sbnr, ebnr); enr = BM_SECT_TO_EXT(sector); e = lc_find(device->resync, enr); if (e) lc_entry(e, struct bm_extent, lce)->rs_left += count; spin_unlock_irqrestore(&device->al_lock, flags); out: put_ldev(device); return count; } static struct bm_extent *_bme_get(struct drbd_device *device, unsigned int enr) { struct lc_element *e; struct bm_extent *bm_ext; int wakeup = 0; unsigned long rs_flags; spin_lock_irq(&device->al_lock); if (device->resync_locked > device->resync->nr_elements/2) { spin_unlock_irq(&device->al_lock); return NULL; } e = lc_get(device->resync, enr); bm_ext = e ? 
lc_entry(e, struct bm_extent, lce) : NULL; if (bm_ext) { if (bm_ext->lce.lc_number != enr) { bm_ext->rs_left = drbd_bm_e_weight(device, enr); bm_ext->rs_failed = 0; lc_committed(device->resync); wakeup = 1; } if (bm_ext->lce.refcnt == 1) device->resync_locked++; set_bit(BME_NO_WRITES, &bm_ext->flags); } rs_flags = device->resync->flags; spin_unlock_irq(&device->al_lock); if (wakeup) wake_up(&device->al_wait); if (!bm_ext) { if (rs_flags & LC_STARVING) drbd_warn(device, "Have to wait for element" " (resync LRU too small?)\n"); BUG_ON(rs_flags & LC_LOCKED); } return bm_ext; } static int _is_in_al(struct drbd_device *device, unsigned int enr) { int rv; spin_lock_irq(&device->al_lock); rv = lc_is_used(device->act_log, enr); spin_unlock_irq(&device->al_lock); return rv; } /** * drbd_rs_begin_io() - Gets an extent in the resync LRU cache and sets it to BME_LOCKED * @device: DRBD device. * @sector: The sector number. * * This function sleeps on al_wait. Returns 0 on success, -EINTR if interrupted. */ int drbd_rs_begin_io(struct drbd_device *device, sector_t sector) { unsigned int enr = BM_SECT_TO_EXT(sector); struct bm_extent *bm_ext; int i, sig; bool sa; retry: sig = wait_event_interruptible(device->al_wait, (bm_ext = _bme_get(device, enr))); if (sig) return -EINTR; if (test_bit(BME_LOCKED, &bm_ext->flags)) return 0; /* step aside only while we are above c-min-rate; unless disabled.
*/ sa = drbd_rs_c_min_rate_throttle(device); for (i = 0; i < AL_EXT_PER_BM_SECT; i++) { sig = wait_event_interruptible(device->al_wait, !_is_in_al(device, enr * AL_EXT_PER_BM_SECT + i) || (sa && test_bit(BME_PRIORITY, &bm_ext->flags))); if (sig || (sa && test_bit(BME_PRIORITY, &bm_ext->flags))) { spin_lock_irq(&device->al_lock); if (lc_put(device->resync, &bm_ext->lce) == 0) { bm_ext->flags = 0; /* clears BME_NO_WRITES and possibly BME_PRIORITY */ device->resync_locked--; wake_up(&device->al_wait); } spin_unlock_irq(&device->al_lock); if (sig) return -EINTR; if (schedule_timeout_interruptible(HZ/10)) return -EINTR; goto retry; } } set_bit(BME_LOCKED, &bm_ext->flags); return 0; } /** * drbd_try_rs_begin_io() - Gets an extent in the resync LRU cache, does not sleep * @device: DRBD device. * @sector: The sector number. * * Gets an extent in the resync LRU cache, sets it to BME_NO_WRITES, then * tries to set it to BME_LOCKED. Returns 0 upon success, and -EAGAIN * if there is still application IO going on in this area. */ int drbd_try_rs_begin_io(struct drbd_device *device, sector_t sector) { unsigned int enr = BM_SECT_TO_EXT(sector); const unsigned int al_enr = enr*AL_EXT_PER_BM_SECT; struct lc_element *e; struct bm_extent *bm_ext; int i; spin_lock_irq(&device->al_lock); if (device->resync_wenr != LC_FREE && device->resync_wenr != enr) { /* in case you have very heavy scattered io, it may * stall the syncer indefinitely if we give up the ref count * when we try again and requeue. * * if we don't give up the refcount, but the next time * we are scheduled this extent has been "synced" by new * application writes, we'd miss the lc_put on the * extent we keep the refcount on. * so we remember which extent we had to try again, and * if the next requested one is something else, we do * the lc_put here... * we also have to wake_up */ e = lc_find(device->resync, device->resync_wenr); bm_ext = e ?
lc_entry(e, struct bm_extent, lce) : NULL; if (bm_ext) { D_ASSERT(device, !test_bit(BME_LOCKED, &bm_ext->flags)); D_ASSERT(device, test_bit(BME_NO_WRITES, &bm_ext->flags)); clear_bit(BME_NO_WRITES, &bm_ext->flags); device->resync_wenr = LC_FREE; if (lc_put(device->resync, &bm_ext->lce) == 0) device->resync_locked--; wake_up(&device->al_wait); } else { drbd_alert(device, "LOGIC BUG\n"); } } /* TRY. */ e = lc_try_get(device->resync, enr); bm_ext = e ? lc_entry(e, struct bm_extent, lce) : NULL; if (bm_ext) { if (test_bit(BME_LOCKED, &bm_ext->flags)) goto proceed; if (!test_and_set_bit(BME_NO_WRITES, &bm_ext->flags)) { device->resync_locked++; } else { /* we did set the BME_NO_WRITES, * but then could not set BME_LOCKED, * so we tried again. * drop the extra reference. */ bm_ext->lce.refcnt--; D_ASSERT(device, bm_ext->lce.refcnt > 0); } goto check_al; } else { /* do we rather want to try later? */ if (device->resync_locked > device->resync->nr_elements-3) goto try_again; /* Do or do not. There is no try. -- Yoda */ e = lc_get(device->resync, enr); bm_ext = e ? 
lc_entry(e, struct bm_extent, lce) : NULL; if (!bm_ext) { const unsigned long rs_flags = device->resync->flags; if (rs_flags & LC_STARVING) drbd_warn(device, "Have to wait for element" " (resync LRU too small?)\n"); BUG_ON(rs_flags & LC_LOCKED); goto try_again; } if (bm_ext->lce.lc_number != enr) { bm_ext->rs_left = drbd_bm_e_weight(device, enr); bm_ext->rs_failed = 0; lc_committed(device->resync); wake_up(&device->al_wait); D_ASSERT(device, test_bit(BME_LOCKED, &bm_ext->flags) == 0); } set_bit(BME_NO_WRITES, &bm_ext->flags); D_ASSERT(device, bm_ext->lce.refcnt == 1); device->resync_locked++; goto check_al; } check_al: for (i = 0; i < AL_EXT_PER_BM_SECT; i++) { if (lc_is_used(device->act_log, al_enr+i)) goto try_again; } set_bit(BME_LOCKED, &bm_ext->flags); proceed: device->resync_wenr = LC_FREE; spin_unlock_irq(&device->al_lock); return 0; try_again: if (bm_ext) device->resync_wenr = enr; spin_unlock_irq(&device->al_lock); return -EAGAIN; } void drbd_rs_complete_io(struct drbd_device *device, sector_t sector) { unsigned int enr = BM_SECT_TO_EXT(sector); struct lc_element *e; struct bm_extent *bm_ext; unsigned long flags; spin_lock_irqsave(&device->al_lock, flags); e = lc_find(device->resync, enr); bm_ext = e ? lc_entry(e, struct bm_extent, lce) : NULL; if (!bm_ext) { spin_unlock_irqrestore(&device->al_lock, flags); if (DRBD_ratelimit(5*HZ, 5)) drbd_err(device, "drbd_rs_complete_io() called, but extent not found\n"); return; } if (bm_ext->lce.refcnt == 0) { spin_unlock_irqrestore(&device->al_lock, flags); drbd_err(device, "drbd_rs_complete_io(,%llu [=%u]) called, " "but refcnt is 0!?\n", (unsigned long long)sector, enr); return; } if (lc_put(device->resync, &bm_ext->lce) == 0) { bm_ext->flags = 0; /* clear BME_LOCKED, BME_NO_WRITES and BME_PRIORITY */ device->resync_locked--; wake_up(&device->al_wait); } spin_unlock_irqrestore(&device->al_lock, flags); } /** * drbd_rs_cancel_all() - Removes all extents from the resync LRU (even BME_LOCKED) * @device: DRBD device. 
*/ void drbd_rs_cancel_all(struct drbd_device *device) { spin_lock_irq(&device->al_lock); if (get_ldev_if_state(device, D_FAILED)) { /* Makes sure ->resync is there. */ lc_reset(device->resync); put_ldev(device); } device->resync_locked = 0; device->resync_wenr = LC_FREE; spin_unlock_irq(&device->al_lock); wake_up(&device->al_wait); } /** * drbd_rs_del_all() - Gracefully remove all extents from the resync LRU * @device: DRBD device. * * Returns 0 upon success, -EAGAIN if at least one reference count was * not zero. */ int drbd_rs_del_all(struct drbd_device *device) { struct lc_element *e; struct bm_extent *bm_ext; int i; spin_lock_irq(&device->al_lock); if (get_ldev_if_state(device, D_FAILED)) { /* ok, ->resync is there. */ for (i = 0; i < device->resync->nr_elements; i++) { e = lc_element_by_index(device->resync, i); bm_ext = lc_entry(e, struct bm_extent, lce); if (bm_ext->lce.lc_number == LC_FREE) continue; if (bm_ext->lce.lc_number == device->resync_wenr) { drbd_info(device, "dropping %u in drbd_rs_del_all, apparently" " got 'synced' by application io\n", device->resync_wenr); D_ASSERT(device, !test_bit(BME_LOCKED, &bm_ext->flags)); D_ASSERT(device, test_bit(BME_NO_WRITES, &bm_ext->flags)); clear_bit(BME_NO_WRITES, &bm_ext->flags); device->resync_wenr = LC_FREE; lc_put(device->resync, &bm_ext->lce); } if (bm_ext->lce.refcnt != 0) { drbd_info(device, "Retrying drbd_rs_del_all() later. " "refcnt=%d\n", bm_ext->lce.refcnt); put_ldev(device); spin_unlock_irq(&device->al_lock); return -EAGAIN; } D_ASSERT(device, !test_bit(BME_LOCKED, &bm_ext->flags)); D_ASSERT(device, !test_bit(BME_NO_WRITES, &bm_ext->flags)); lc_del(device->resync, &bm_ext->lce); } D_ASSERT(device, device->resync->used == 0); put_ldev(device); } spin_unlock_irq(&device->al_lock); wake_up(&device->al_wait); return 0; } /** * drbd_rs_failed_io() - Record information on a failure to resync the specified blocks * @device: DRBD device. * @sector: The sector number. 
* @size: Size of failed IO operation, in byte. */ void drbd_rs_failed_io(struct drbd_device *device, sector_t sector, int size) { /* Is called from worker and receiver context _only_ */ unsigned long sbnr, ebnr, lbnr; unsigned long count; sector_t esector, nr_sectors; int wake_up = 0; if (size <= 0 || !IS_ALIGNED(size, 512) || size > DRBD_MAX_DISCARD_SIZE) { drbd_err(device, "drbd_rs_failed_io: sector=%llus size=%d nonsense!\n", (unsigned long long)sector, size); return; } nr_sectors = drbd_get_capacity(device->this_bdev); esector = sector + (size >> 9) - 1; if (!expect(sector < nr_sectors)) return; if (!expect(esector < nr_sectors)) esector = nr_sectors - 1; lbnr = BM_SECT_TO_BIT(nr_sectors-1); /* * round up start sector, round down end sector. we make sure we only * handle full, aligned, BM_BLOCK_SIZE (4K) blocks */ if (unlikely(esector < BM_SECT_PER_BIT-1)) return; if (unlikely(esector == (nr_sectors-1))) ebnr = lbnr; else ebnr = BM_SECT_TO_BIT(esector - (BM_SECT_PER_BIT-1)); sbnr = BM_SECT_TO_BIT(sector + BM_SECT_PER_BIT-1); if (sbnr > ebnr) return; /* * ok, (capacity & 7) != 0 sometimes, but who cares... * we count rs_{total,left} in bits, not sectors. */ spin_lock_irq(&device->al_lock); count = drbd_bm_count_bits(device, sbnr, ebnr); if (count) { device->rs_failed += count; if (get_ldev(device)) { drbd_try_clear_on_disk_bm(device, sector, count, false); put_ldev(device); } /* just wake_up unconditional now, various lc_chaged(), * lc_put() in drbd_try_clear_on_disk_bm(). */ wake_up = 1; } spin_unlock_irq(&device->al_lock); if (wake_up) wake_up(&device->al_wait); } drbd-8.4.4/drbd/drbd_bitmap.c0000664000000000000000000014172012221261130014463 0ustar rootroot/* drbd_bitmap.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2004-2008, LINBIT Information Technologies GmbH. Copyright (C) 2004-2008, Philipp Reisner . Copyright (C) 2004-2008, Lars Ellenberg . 
drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #include #include #include #include #include #include #include #include #include "drbd_int.h" /* See the ifdefs and comments inside that header file. * On recent kernels this is not needed. */ #include "compat/bitops.h" /* OPAQUE outside this file! * interface defined in drbd_int.h * convention: * function name drbd_bm_... => used elsewhere, "public". * function name bm_... => internal to implementation, "private". */ /* * LIMITATIONS: * We want to support >= peta byte of backend storage, while for now still using * a granularity of one bit per 4KiB of storage. * 1 << 50 bytes backend storage (1 PiB) * 1 << (50 - 12) bits needed * 38 --> we need u64 to index and count bits * 1 << (38 - 3) bitmap bytes needed * 35 --> we still need u64 to index and count bytes * (that's 32 GiB of bitmap for 1 PiB storage) * 1 << (35 - 2) 32bit longs needed * 33 --> we'd even need u64 to index and count 32bit long words. * 1 << (35 - 3) 64bit longs needed * 32 --> we could get away with a 32bit unsigned int to index and count * 64bit long words, but I rather stay with unsigned long for now. * We probably should neither count nor point to bytes or long words * directly, but either by bitnumber, or by page index and offset. * 1 << (35 - 12) * 22 --> we need that much 4KiB pages of bitmap. 
* 1 << (22 + 3) --> on a 64bit arch, * we need 32 MiB to store the array of page pointers. * * Because I'm lazy, and because the resulting patch was too large, too ugly * and still incomplete, on 32bit we still "only" support 16 TiB (minus some), * (1 << 32) bits * 4k storage. * * bitmap storage and IO: * Bitmap is stored little endian on disk, and is kept little endian in * core memory. Currently we still hold the full bitmap in core as long * as we are "attached" to a local disk, which at 32 GiB for 1PiB storage * seems excessive. * * We plan to reduce the amount of in-core bitmap pages by paging them in * and out against their on-disk location as necessary, but need to make * sure we don't cause too much meta data IO, and must not deadlock in * tight memory situations. This needs some more work. */ /* * NOTE * Access to the *bm_pages is protected by bm_lock. * It is safe to read the other members within the lock. * * drbd_bm_set_bits is called from bio_endio callbacks, * We may be called with irq already disabled, * so we need spin_lock_irqsave(). * And we need the kmap_atomic. */ struct drbd_bitmap { struct page **bm_pages; spinlock_t bm_lock; /* see LIMITATIONS: above */ unsigned long bm_set; /* nr of set bits; THINK maybe atomic_t? 
*/ unsigned long bm_bits; size_t bm_words; size_t bm_number_of_pages; sector_t bm_dev_capacity; struct mutex bm_change; /* serializes resize operations */ wait_queue_head_t bm_io_wait; /* used to serialize IO of single pages */ enum bm_flag bm_flags; /* debugging aid, in case we are still racy somewhere */ char *bm_why; struct task_struct *bm_task; }; #define bm_print_lock_info(m) __bm_print_lock_info(m, __func__) static void __bm_print_lock_info(struct drbd_device *device, const char *func) { struct drbd_bitmap *b = device->bitmap; if (!DRBD_ratelimit(5*HZ, 5)) return; drbd_err(device, "FIXME %s[%d] in %s, bitmap locked for '%s' by %s[%d]\n", current->comm, task_pid_nr(current), func, b->bm_why ?: "?", b->bm_task->comm, task_pid_nr(b->bm_task)); } void drbd_bm_lock(struct drbd_device *device, char *why, enum bm_flag flags) { struct drbd_bitmap *b = device->bitmap; int trylock_failed; if (!b) { drbd_err(device, "FIXME no bitmap in drbd_bm_lock!?\n"); return; } trylock_failed = !mutex_trylock(&b->bm_change); if (trylock_failed) { drbd_warn(device, "%s[%d] going to '%s' but bitmap already locked for '%s' by %s[%d]\n", current->comm, task_pid_nr(current), why, b->bm_why ?: "?", b->bm_task->comm, task_pid_nr(b->bm_task)); mutex_lock(&b->bm_change); } if (BM_LOCKED_MASK & b->bm_flags) drbd_err(device, "FIXME bitmap already locked in bm_lock\n"); b->bm_flags |= flags & BM_LOCKED_MASK; b->bm_why = why; b->bm_task = current; } void drbd_bm_unlock(struct drbd_device *device) { struct drbd_bitmap *b = device->bitmap; if (!b) { drbd_err(device, "FIXME no bitmap in drbd_bm_unlock!?\n"); return; } if (!(BM_LOCKED_MASK & device->bitmap->bm_flags)) drbd_err(device, "FIXME bitmap not locked in bm_unlock\n"); b->bm_flags &= ~BM_LOCKED_MASK; b->bm_why = NULL; b->bm_task = NULL; mutex_unlock(&b->bm_change); } /* we store some "meta" info about our pages in page->private */ /* at a granularity of 4k storage per bitmap bit: * one peta byte storage: 1<<50 byte, 1<<38 * 4k storage blocks 
 * 1<<38 bits, * 1<<23 4k bitmap pages. * Use 24 bits as page index, covers 2 peta byte storage * at a granularity of 4k per bit. * Used to report the failed page idx on io error from the endio handlers. */ #define BM_PAGE_IDX_MASK ((1UL<<24)-1) /* this page is currently read in, or written back */ #define BM_PAGE_IO_LOCK 31 /* if there has been an IO error for this page */ #define BM_PAGE_IO_ERROR 30 /* this is to be able to intelligently skip disk IO, * set if bits have been set since last IO. */ #define BM_PAGE_NEED_WRITEOUT 29 /* to mark for lazy writeout once syncer cleared all clearable bits, * set if bits have been cleared since last IO. */ #define BM_PAGE_LAZY_WRITEOUT 28 /* pages marked with this "HINT" will be considered for writeout * on activity log transactions */ #define BM_PAGE_HINT_WRITEOUT 27 /* store_page_idx uses non-atomic assignment. It is only used directly after * allocating the page. All other bm_set_page_* and bm_clear_page_* need to * use atomic bit manipulation, as set_out_of_sync (and therefore bitmap * changes) may happen from various contexts, and wait_on_bit/wake_up_bit * requires it all to be atomic as well. */ static void bm_store_page_idx(struct page *page, unsigned long idx) { BUG_ON(0 != (idx & ~BM_PAGE_IDX_MASK)); set_page_private(page, idx); } static unsigned long bm_page_to_idx(struct page *page) { return page_private(page) & BM_PAGE_IDX_MASK; } /* As it is very unlikely that the same page is under IO from more than one * context, we can get away with a bit per page and one wait queue per bitmap. 
*/ static void bm_page_lock_io(struct drbd_device *device, int page_nr) { struct drbd_bitmap *b = device->bitmap; void *addr = &page_private(b->bm_pages[page_nr]); wait_event(b->bm_io_wait, !test_and_set_bit(BM_PAGE_IO_LOCK, addr)); } static void bm_page_unlock_io(struct drbd_device *device, int page_nr) { struct drbd_bitmap *b = device->bitmap; void *addr = &page_private(b->bm_pages[page_nr]); clear_bit_unlock(BM_PAGE_IO_LOCK, addr); wake_up(&device->bitmap->bm_io_wait); } /* set _before_ submit_io, so it may be reset due to being changed * while this page is in flight... will get submitted later again */ static void bm_set_page_unchanged(struct page *page) { /* use cmpxchg? */ clear_bit(BM_PAGE_NEED_WRITEOUT, &page_private(page)); clear_bit(BM_PAGE_LAZY_WRITEOUT, &page_private(page)); } static void bm_set_page_need_writeout(struct page *page) { set_bit(BM_PAGE_NEED_WRITEOUT, &page_private(page)); } /** * drbd_bm_mark_for_writeout() - mark a page with a "hint" to be considered for writeout * @device: DRBD device. * @page_nr: the bitmap page to mark with the "hint" flag * * From within an activity log transaction, we mark a few pages with these * hints, then call drbd_bm_write_hinted(), which will only write out changed * pages which are flagged with this mark. 
 */ void drbd_bm_mark_for_writeout(struct drbd_device *device, int page_nr) { struct page *page; if (page_nr >= device->bitmap->bm_number_of_pages) { drbd_warn(device, "BAD: page_nr: %u, number_of_pages: %u\n", page_nr, (int)device->bitmap->bm_number_of_pages); return; } page = device->bitmap->bm_pages[page_nr]; set_bit(BM_PAGE_HINT_WRITEOUT, &page_private(page)); } static int bm_test_page_unchanged(struct page *page) { volatile const unsigned long *addr = &page_private(page); return (*addr & ((1UL<<BM_PAGE_NEED_WRITEOUT)|(1UL<<BM_PAGE_LAZY_WRITEOUT))) == 0; } static void bm_set_page_io_err(struct page *page) { set_bit(BM_PAGE_IO_ERROR, &page_private(page)); } static void bm_clear_page_io_err(struct page *page) { clear_bit(BM_PAGE_IO_ERROR, &page_private(page)); } static void bm_set_page_lazy_writeout(struct page *page) { set_bit(BM_PAGE_LAZY_WRITEOUT, &page_private(page)); } static int bm_test_page_lazy_writeout(struct page *page) { return test_bit(BM_PAGE_LAZY_WRITEOUT, &page_private(page)); } /* on a 32bit box, this would allow for exactly (2<<38) bits. */ static unsigned int bm_word_to_page_idx(struct drbd_bitmap *b, unsigned long long_nr) { /* page_nr = (word*sizeof(long)) >> PAGE_SHIFT; */ unsigned int page_nr = long_nr >> (PAGE_SHIFT - LN2_BPL + 3); BUG_ON(page_nr >= b->bm_number_of_pages); return page_nr; } static unsigned int bm_bit_to_page_idx(struct drbd_bitmap *b, u64 bitnr) { /* page_nr = (bitnr/8) >> PAGE_SHIFT; */ unsigned int page_nr = bitnr >> (PAGE_SHIFT + 3); BUG_ON(page_nr >= b->bm_number_of_pages); return page_nr; } #ifdef COMPAT_KMAP_ATOMIC_PAGE_ONLY #define __bm_map_pidx(b, idx, km) ___bm_map_pidx(b, idx) static unsigned long *___bm_map_pidx(struct drbd_bitmap *b, unsigned int idx) #else static unsigned long *__bm_map_pidx(struct drbd_bitmap *b, unsigned int idx, const enum km_type km) #endif { struct page *page = b->bm_pages[idx]; return (unsigned long *) drbd_kmap_atomic(page, km); } static unsigned long *bm_map_pidx(struct drbd_bitmap *b, unsigned int idx) { return __bm_map_pidx(b, idx, KM_IRQ1); } #ifdef COMPAT_KMAP_ATOMIC_PAGE_ONLY #define __bm_unmap(p_addr, km) ___bm_unmap(p_addr) static void ___bm_unmap(unsigned long *p_addr) #else static void __bm_unmap(unsigned long *p_addr, const enum km_type km) #endif { drbd_kunmap_atomic(p_addr, km); }; static void bm_unmap(unsigned long *p_addr) { return __bm_unmap(p_addr, KM_IRQ1); } /* long word offset of _bitmap_ sector */ #define S2W(s) ((s)<<(BM_EXT_SHIFT-BM_BLOCK_SHIFT-LN2_BPL)) /* word offset from start of bitmap to word number _in_page_ * modulo longs per page #define MLPP(X) ((X) % (PAGE_SIZE/sizeof(long)) hm, well, Philipp thinks gcc might not optimize the % into & 
(... - 1) so do it explicitly: */ #define MLPP(X) ((X) & ((PAGE_SIZE/sizeof(long))-1)) /* Long words per page */ #define LWPP (PAGE_SIZE/sizeof(long)) /* * actually most functions herein should take a struct drbd_bitmap*, not a * struct drbd_device*, but for the debug macros I like to have the device around * to be able to report device specific. */ static void bm_free_pages(struct page **pages, unsigned long number) { unsigned long i; if (!pages) return; for (i = 0; i < number; i++) { if (!pages[i]) { printk(KERN_ALERT "drbd: bm_free_pages tried to free " "a NULL pointer; i=%lu n=%lu\n", i, number); continue; } __free_page(pages[i]); pages[i] = NULL; } } static void bm_vk_free(void *ptr, int v) { if (v) vfree(ptr); else kfree(ptr); } /* * "have" and "want" are NUMBER OF PAGES. */ static struct page **bm_realloc_pages(struct drbd_bitmap *b, unsigned long want) { struct page **old_pages = b->bm_pages; struct page **new_pages, *page; unsigned int i, bytes, vmalloced = 0; unsigned long have = b->bm_number_of_pages; BUG_ON(have == 0 && old_pages != NULL); BUG_ON(have != 0 && old_pages == NULL); if (have == want) return old_pages; /* Trying kmalloc first, falling back to vmalloc. * GFP_NOIO, as this is called while drbd IO is "suspended", * and during resize or attach on diskless Primary, * we must not block on IO to ourselves. * Context is receiver thread or dmsetup. 
*/ bytes = sizeof(struct page *)*want; new_pages = kzalloc(bytes, GFP_NOIO); if (!new_pages) { new_pages = __vmalloc(bytes, GFP_NOIO | __GFP_HIGHMEM | __GFP_ZERO, PAGE_KERNEL); if (!new_pages) return NULL; vmalloced = 1; } if (want >= have) { for (i = 0; i < have; i++) new_pages[i] = old_pages[i]; for (; i < want; i++) { page = alloc_page(GFP_NOIO | __GFP_HIGHMEM); if (!page) { bm_free_pages(new_pages + have, i - have); bm_vk_free(new_pages, vmalloced); return NULL; } /* we want to know which page it is * from the endio handlers */ bm_store_page_idx(page, i); new_pages[i] = page; } } else { for (i = 0; i < want; i++) new_pages[i] = old_pages[i]; /* NOT HERE, we are outside the spinlock! bm_free_pages(old_pages + want, have - want); */ } if (vmalloced) b->bm_flags |= BM_P_VMALLOCED; else b->bm_flags &= ~BM_P_VMALLOCED; return new_pages; } /* * called on driver init only. TODO call when a device is created. * allocates the drbd_bitmap, and stores it in device->bitmap. */ int drbd_bm_init(struct drbd_device *device) { struct drbd_bitmap *b = device->bitmap; WARN_ON(b != NULL); b = kzalloc(sizeof(struct drbd_bitmap), GFP_KERNEL); if (!b) return -ENOMEM; spin_lock_init(&b->bm_lock); mutex_init(&b->bm_change); init_waitqueue_head(&b->bm_io_wait); device->bitmap = b; return 0; } sector_t drbd_bm_capacity(struct drbd_device *device) { if (!expect(device->bitmap)) return 0; return device->bitmap->bm_dev_capacity; } /* called on driver unload. TODO: call when a device is destroyed. */ void drbd_bm_cleanup(struct drbd_device *device) { if (!expect(device->bitmap)) return; bm_free_pages(device->bitmap->bm_pages, device->bitmap->bm_number_of_pages); bm_vk_free(device->bitmap->bm_pages, (BM_P_VMALLOCED & device->bitmap->bm_flags)); kfree(device->bitmap); device->bitmap = NULL; } /* * since (b->bm_bits % BITS_PER_LONG) != 0, * this masks out the remaining bits. * Returns the number of bits cleared. 
*/ #define BITS_PER_PAGE (1UL << (PAGE_SHIFT + 3)) #define BITS_PER_PAGE_MASK (BITS_PER_PAGE - 1) #define BITS_PER_LONG_MASK (BITS_PER_LONG - 1) static int bm_clear_surplus(struct drbd_bitmap *b) { unsigned long mask; unsigned long *p_addr, *bm; int tmp; int cleared = 0; /* number of bits modulo bits per page */ tmp = (b->bm_bits & BITS_PER_PAGE_MASK); /* mask the used bits of the word containing the last bit */ mask = (1UL << (tmp & BITS_PER_LONG_MASK)) -1; /* bitmap is always stored little endian, * on disk and in core memory alike */ mask = cpu_to_lel(mask); p_addr = bm_map_pidx(b, b->bm_number_of_pages - 1); bm = p_addr + (tmp/BITS_PER_LONG); if (mask) { /* If mask != 0, we are not exactly aligned, so bm now points * to the long containing the last bit. * If mask == 0, bm already points to the word immediately * after the last (long word aligned) bit. */ cleared = hweight_long(*bm & ~mask); *bm &= mask; bm++; } if (BITS_PER_LONG == 32 && ((bm - p_addr) & 1) == 1) { /* on a 32bit arch, we may need to zero out * a padding long to align with a 64bit remote */ cleared += hweight_long(*bm); *bm = 0; } bm_unmap(p_addr); return cleared; } static void bm_set_surplus(struct drbd_bitmap *b) { unsigned long mask; unsigned long *p_addr, *bm; int tmp; /* number of bits modulo bits per page */ tmp = (b->bm_bits & BITS_PER_PAGE_MASK); /* mask the used bits of the word containing the last bit */ mask = (1UL << (tmp & BITS_PER_LONG_MASK)) -1; /* bitmap is always stored little endian, * on disk and in core memory alike */ mask = cpu_to_lel(mask); p_addr = bm_map_pidx(b, b->bm_number_of_pages - 1); bm = p_addr + (tmp/BITS_PER_LONG); if (mask) { /* If mask != 0, we are not exactly aligned, so bm now points * to the long containing the last bit. * If mask == 0, bm already points to the word immediately * after the last (long word aligned) bit. 
 */ *bm |= ~mask; bm++; } if (BITS_PER_LONG == 32 && ((bm - p_addr) & 1) == 1) { /* on a 32bit arch, we may need to set all bits in * a padding long to align with a 64bit remote */ *bm = ~0UL; } bm_unmap(p_addr); } /* you better not modify the bitmap while this is running, * or its results will be stale */ static unsigned long bm_count_bits(struct drbd_bitmap *b) { unsigned long *p_addr; unsigned long bits = 0; unsigned long mask = (1UL << (b->bm_bits & BITS_PER_LONG_MASK)) -1; int idx, i, last_word; /* all but last page */ for (idx = 0; idx < b->bm_number_of_pages - 1; idx++) { p_addr = __bm_map_pidx(b, idx, KM_USER0); for (i = 0; i < LWPP; i++) bits += hweight_long(p_addr[i]); __bm_unmap(p_addr, KM_USER0); cond_resched(); } /* last (or only) page */ last_word = ((b->bm_bits - 1) & BITS_PER_PAGE_MASK) >> LN2_BPL; p_addr = __bm_map_pidx(b, idx, KM_USER0); for (i = 0; i < last_word; i++) bits += hweight_long(p_addr[i]); p_addr[last_word] &= cpu_to_lel(mask); bits += hweight_long(p_addr[last_word]); /* 32bit arch, may have an unused padding long */ if (BITS_PER_LONG == 32 && (last_word & 1) == 0) p_addr[last_word+1] = 0; __bm_unmap(p_addr, KM_USER0); return bits; } /* offset and len in long words.*/ static void bm_memset(struct drbd_bitmap *b, size_t offset, int c, size_t len) { unsigned long *p_addr, *bm; unsigned int idx; size_t do_now, end; end = offset + len; if (end > b->bm_words) { printk(KERN_ALERT "drbd: bm_memset end > bm_words\n"); return; } while (offset < end) { do_now = min_t(size_t, ALIGN(offset + 1, LWPP), end) - offset; idx = bm_word_to_page_idx(b, offset); p_addr = bm_map_pidx(b, idx); bm = p_addr + MLPP(offset); if (bm+do_now > p_addr + LWPP) { printk(KERN_ALERT "drbd: BUG BUG BUG! p_addr:%p bm:%p do_now:%d\n", p_addr, bm, (int)do_now); } else memset(bm, c, do_now * sizeof(long)); bm_unmap(p_addr); bm_set_page_need_writeout(b->bm_pages[idx]); offset += do_now; } } /* For the layout, see comment above drbd_md_set_sector_offsets(). 
*/ static u64 drbd_md_on_disk_bits(struct drbd_backing_dev *ldev) { u64 bitmap_sectors; if (ldev->md.al_offset == 8) bitmap_sectors = ldev->md.md_size_sect - ldev->md.bm_offset; else bitmap_sectors = ldev->md.al_offset - ldev->md.bm_offset; return bitmap_sectors << (9 + 3); } /* * make sure the bitmap has enough room for the attached storage, * if necessary, resize. * called whenever we may have changed the device size. * returns -ENOMEM if we could not allocate enough memory, 0 on success. * In case this is actually a resize, we copy the old bitmap into the new one. * Otherwise, the bitmap is initialized to all bits set. */ int drbd_bm_resize(struct drbd_device *device, sector_t capacity, int set_new_bits) { struct drbd_bitmap *b = device->bitmap; unsigned long bits, words, owords, obits; unsigned long want, have, onpages; /* number of pages */ struct page **npages, **opages = NULL; int err = 0, growing; int opages_vmalloced; if (!expect(b)) return -ENOMEM; drbd_bm_lock(device, "resize", BM_LOCKED_MASK); drbd_info(device, "drbd_bm_resize called with capacity == %llu\n", (unsigned long long)capacity); if (capacity == b->bm_dev_capacity) goto out; opages_vmalloced = (BM_P_VMALLOCED & b->bm_flags); if (capacity == 0) { spin_lock_irq(&b->bm_lock); opages = b->bm_pages; onpages = b->bm_number_of_pages; owords = b->bm_words; b->bm_pages = NULL; b->bm_number_of_pages = b->bm_set = b->bm_bits = b->bm_words = b->bm_dev_capacity = 0; spin_unlock_irq(&b->bm_lock); bm_free_pages(opages, onpages); bm_vk_free(opages, opages_vmalloced); goto out; } bits = BM_SECT_TO_BIT(ALIGN(capacity, BM_SECT_PER_BIT)); /* if we would use words = ALIGN(bits,BITS_PER_LONG) >> LN2_BPL; a 32bit host could present the wrong number of words to a 64bit host. 
*/ words = ALIGN(bits, 64) >> LN2_BPL; if (get_ldev(device)) { u64 bits_on_disk = drbd_md_on_disk_bits(device->ldev); put_ldev(device); if (bits > bits_on_disk) { drbd_err(device, "Not enough space for bitmap: %lu > %lu\n", (unsigned long)bits, (unsigned long)bits_on_disk); err = -ENOSPC; goto out; } } want = ALIGN(words*sizeof(long), PAGE_SIZE) >> PAGE_SHIFT; have = b->bm_number_of_pages; if (want == have) { D_ASSERT(device, b->bm_pages != NULL); npages = b->bm_pages; } else { if (drbd_insert_fault(device, DRBD_FAULT_BM_ALLOC)) npages = NULL; else npages = bm_realloc_pages(b, want); } if (!npages) { err = -ENOMEM; goto out; } spin_lock_irq(&b->bm_lock); opages = b->bm_pages; owords = b->bm_words; obits = b->bm_bits; growing = bits > obits; if (opages && growing && set_new_bits) bm_set_surplus(b); b->bm_pages = npages; b->bm_number_of_pages = want; b->bm_bits = bits; b->bm_words = words; b->bm_dev_capacity = capacity; if (growing) { if (set_new_bits) { bm_memset(b, owords, 0xff, words-owords); b->bm_set += bits - obits; } else bm_memset(b, owords, 0x00, words-owords); } if (want < have) { /* implicit: (opages != NULL) && (opages != npages) */ bm_free_pages(opages + want, have - want); } (void)bm_clear_surplus(b); spin_unlock_irq(&b->bm_lock); if (opages != npages) bm_vk_free(opages, opages_vmalloced); if (!growing) b->bm_set = bm_count_bits(b); drbd_info(device, "resync bitmap: bits=%lu words=%lu pages=%lu\n", bits, words, want); out: drbd_bm_unlock(device); return err; } /* inherently racy: * if not protected by other means, return value may be out of date when * leaving this function... * we still need to lock it, since it is important that this returns * bm_set == 0 precisely. * * maybe bm_set should be atomic_t ? 
*/ unsigned long _drbd_bm_total_weight(struct drbd_device *device) { struct drbd_bitmap *b = device->bitmap; unsigned long s; unsigned long flags; if (!expect(b)) return 0; if (!expect(b->bm_pages)) return 0; spin_lock_irqsave(&b->bm_lock, flags); s = b->bm_set; spin_unlock_irqrestore(&b->bm_lock, flags); return s; } unsigned long drbd_bm_total_weight(struct drbd_device *device) { unsigned long s; /* if I don't have a disk, I don't know about out-of-sync status */ if (!get_ldev_if_state(device, D_NEGOTIATING)) return 0; s = _drbd_bm_total_weight(device); put_ldev(device); return s; } size_t drbd_bm_words(struct drbd_device *device) { struct drbd_bitmap *b = device->bitmap; if (!expect(b)) return 0; if (!expect(b->bm_pages)) return 0; return b->bm_words; } unsigned long drbd_bm_bits(struct drbd_device *device) { struct drbd_bitmap *b = device->bitmap; if (!expect(b)) return 0; return b->bm_bits; } /* merge number words from buffer into the bitmap starting at offset. * buffer[i] is expected to be little endian unsigned long. * bitmap must be locked by drbd_bm_lock. * currently only used from receive_bitmap. 
*/ void drbd_bm_merge_lel(struct drbd_device *device, size_t offset, size_t number, unsigned long *buffer) { struct drbd_bitmap *b = device->bitmap; unsigned long *p_addr, *bm; unsigned long word, bits; unsigned int idx; size_t end, do_now; end = offset + number; if (!expect(b)) return; if (!expect(b->bm_pages)) return; if (number == 0) return; WARN_ON(offset >= b->bm_words); WARN_ON(end > b->bm_words); spin_lock_irq(&b->bm_lock); while (offset < end) { do_now = min_t(size_t, ALIGN(offset+1, LWPP), end) - offset; idx = bm_word_to_page_idx(b, offset); p_addr = bm_map_pidx(b, idx); bm = p_addr + MLPP(offset); offset += do_now; while (do_now--) { bits = hweight_long(*bm); word = *bm | *buffer++; *bm++ = word; b->bm_set += hweight_long(word) - bits; } bm_unmap(p_addr); bm_set_page_need_writeout(b->bm_pages[idx]); } /* with 32bit <-> 64bit cross-platform connect * this is only correct for current usage, * where we _know_ that we are 64 bit aligned, * and know that this function is used in this way, too... */ if (end == b->bm_words) b->bm_set -= bm_clear_surplus(b); spin_unlock_irq(&b->bm_lock); } /* copy number words from the bitmap starting at offset into the buffer. * buffer[i] will be little endian unsigned long. 
*/ void drbd_bm_get_lel(struct drbd_device *device, size_t offset, size_t number, unsigned long *buffer) { struct drbd_bitmap *b = device->bitmap; unsigned long *p_addr, *bm; size_t end, do_now; end = offset + number; if (!expect(b)) return; if (!expect(b->bm_pages)) return; spin_lock_irq(&b->bm_lock); if ((offset >= b->bm_words) || (end > b->bm_words) || (number <= 0)) drbd_err(device, "offset=%lu number=%lu bm_words=%lu\n", (unsigned long) offset, (unsigned long) number, (unsigned long) b->bm_words); else { while (offset < end) { do_now = min_t(size_t, ALIGN(offset+1, LWPP), end) - offset; p_addr = bm_map_pidx(b, bm_word_to_page_idx(b, offset)); bm = p_addr + MLPP(offset); offset += do_now; while (do_now--) *buffer++ = *bm++; bm_unmap(p_addr); } } spin_unlock_irq(&b->bm_lock); } /* set all bits in the bitmap */ void drbd_bm_set_all(struct drbd_device *device) { struct drbd_bitmap *b = device->bitmap; if (!expect(b)) return; if (!expect(b->bm_pages)) return; spin_lock_irq(&b->bm_lock); bm_memset(b, 0, 0xff, b->bm_words); (void)bm_clear_surplus(b); b->bm_set = b->bm_bits; spin_unlock_irq(&b->bm_lock); } /* clear all bits in the bitmap */ void drbd_bm_clear_all(struct drbd_device *device) { struct drbd_bitmap *b = device->bitmap; if (!expect(b)) return; if (!expect(b->bm_pages)) return; spin_lock_irq(&b->bm_lock); bm_memset(b, 0, 0, b->bm_words); b->bm_set = 0; spin_unlock_irq(&b->bm_lock); } struct bm_aio_ctx { struct drbd_device *device; atomic_t in_flight; unsigned int done; unsigned flags; #define BM_AIO_COPY_PAGES 1 #define BM_AIO_WRITE_HINTED 2 #define BM_WRITE_ALL_PAGES 4 int error; struct kref kref; }; static void bm_aio_ctx_destroy(struct kref *kref) { struct bm_aio_ctx *ctx = container_of(kref, struct bm_aio_ctx, kref); put_ldev(ctx->device); kfree(ctx); } /* bv_page may be a copy, or may be the original */ static BIO_ENDIO_TYPE bm_async_io_complete BIO_ENDIO_ARGS(struct bio *bio, int error) { struct bm_aio_ctx *ctx = bio->bi_private; struct drbd_device 
*device = ctx->device; struct drbd_bitmap *b = device->bitmap; unsigned int idx = bm_page_to_idx(bio->bi_io_vec[0].bv_page); int uptodate = bio_flagged(bio, BIO_UPTODATE); BIO_ENDIO_FN_START; /* strange behavior of some lower level drivers... * fail the request by clearing the uptodate flag, * but do not return any error?! * do we want to WARN() on this? */ if (!error && !uptodate) error = -EIO; if ((ctx->flags & BM_AIO_COPY_PAGES) == 0 && !bm_test_page_unchanged(b->bm_pages[idx])) drbd_warn(device, "bitmap page idx %u changed during IO!\n", idx); if (error) { /* ctx error will hold the completed-last non-zero error code, * in case error codes differ. */ ctx->error = error; bm_set_page_io_err(b->bm_pages[idx]); /* Not identical to on disk version of it. * Is BM_PAGE_IO_ERROR enough? */ if (DRBD_ratelimit(5*HZ, 5)) drbd_err(device, "IO ERROR %d on bitmap page idx %u\n", error, idx); } else { bm_clear_page_io_err(b->bm_pages[idx]); dynamic_drbd_dbg(device, "bitmap page idx %u completed\n", idx); } bm_page_unlock_io(device, idx); if (ctx->flags & BM_AIO_COPY_PAGES) mempool_free(bio->bi_io_vec[0].bv_page, drbd_md_io_page_pool); bio_put(bio); if (atomic_dec_and_test(&ctx->in_flight)) { ctx->done = 1; wake_up(&device->misc_wait); kref_put(&ctx->kref, &bm_aio_ctx_destroy); } BIO_ENDIO_FN_RETURN; } static void bm_page_io_async(struct bm_aio_ctx *ctx, int page_nr, int rw) __must_hold(local) { struct bio *bio = bio_alloc_drbd(GFP_NOIO); struct drbd_device *device = ctx->device; struct drbd_bitmap *b = device->bitmap; struct page *page; unsigned int len; sector_t on_disk_sector = device->ldev->md.md_offset + device->ldev->md.bm_offset; on_disk_sector += ((sector_t)page_nr) << (PAGE_SHIFT-9); /* this might happen with very small * flexible external meta data device, * or with PAGE_SIZE > 4k */ len = min_t(unsigned int, PAGE_SIZE, (drbd_md_last_sector(device->ldev) - on_disk_sector + 1)<<9); /* serialize IO on this page */ bm_page_lock_io(device, page_nr); /* before memcpy and 
submit, * so it can be redirtied any time */ bm_set_page_unchanged(b->bm_pages[page_nr]); if (ctx->flags & BM_AIO_COPY_PAGES) { page = mempool_alloc(drbd_md_io_page_pool, __GFP_HIGHMEM|__GFP_WAIT); copy_highpage(page, b->bm_pages[page_nr]); bm_store_page_idx(page, page_nr); } else page = b->bm_pages[page_nr]; bio->bi_bdev = device->ldev->md_bdev; bio->bi_sector = on_disk_sector; /* bio_add_page of a single page to an empty bio will always succeed, * according to api. Do we want to assert that? */ bio_add_page(bio, page, len, 0); bio->bi_private = ctx; bio->bi_end_io = bm_async_io_complete; if (drbd_insert_fault(device, (rw & WRITE) ? DRBD_FAULT_MD_WR : DRBD_FAULT_MD_RD)) { bio->bi_rw |= rw; bio_endio(bio, -EIO); } else { submit_bio(rw, bio); /* this should not count as user activity and cause the * resync to throttle -- see drbd_rs_should_slow_down(). */ atomic_add(len >> 9, &device->rs_sect_ev); } } /* * bm_rw: read/write the whole bitmap from/to its on disk location. */ static int bm_rw(struct drbd_device *device, int rw, unsigned flags, unsigned lazy_writeout_upper_idx) __must_hold(local) { struct bm_aio_ctx *ctx; struct drbd_bitmap *b = device->bitmap; int num_pages, i, count = 0; unsigned long now; char ppb[10]; int err = 0; /* * We are protected against bitmap disappearing/resizing by holding an * ldev reference (caller must have called get_ldev()). * For read/write, we are protected against changes to the bitmap by * the bitmap lock (see drbd_bitmap_io). * For lazy writeout, we don't care for ongoing changes to the bitmap, * as we submit copies of pages anyways. 
*/ ctx = kmalloc(sizeof(struct bm_aio_ctx), GFP_NOIO); if (!ctx) return -ENOMEM; *ctx = (struct bm_aio_ctx) { .device = device, .in_flight = ATOMIC_INIT(1), .done = 0, .flags = flags, .error = 0, .kref = { ATOMIC_INIT(2) }, }; if (!get_ldev_if_state(device, D_ATTACHING)) { /* put is in bm_aio_ctx_destroy() */ drbd_err(device, "ASSERT FAILED: get_ldev_if_state() == 1 in bm_rw()\n"); kfree(ctx); return -ENODEV; } if (!ctx->flags) WARN_ON(!(BM_LOCKED_MASK & b->bm_flags)); num_pages = b->bm_number_of_pages; now = jiffies; /* let the layers below us try to merge these bios... */ for (i = 0; i < num_pages; i++) { /* ignore completely unchanged pages */ if (lazy_writeout_upper_idx && i == lazy_writeout_upper_idx) break; if (rw & WRITE) { if ((flags & BM_AIO_WRITE_HINTED) && !test_and_clear_bit(BM_PAGE_HINT_WRITEOUT, &page_private(b->bm_pages[i]))) continue; if (!(flags & BM_WRITE_ALL_PAGES) && bm_test_page_unchanged(b->bm_pages[i])) { dynamic_drbd_dbg(device, "skipped bm write for idx %u\n", i); continue; } /* during lazy writeout, * ignore those pages not marked for lazy writeout. */ if (lazy_writeout_upper_idx && !bm_test_page_lazy_writeout(b->bm_pages[i])) { dynamic_drbd_dbg(device, "skipped bm lazy write for idx %u\n", i); continue; } } atomic_inc(&ctx->in_flight); bm_page_io_async(ctx, i, rw); ++count; cond_resched(); } /* * We initialize ctx->in_flight to one to make sure bm_async_io_complete * will not set ctx->done early, and decrement / test it here. If there * are still some bios in flight, we need to wait for them here. * If all IO is done already (or nothing had been submitted), there is * no need to wait. Still, we need to put the kref associated with the * "in_flight reached zero, all done" event. 
*/ if (!atomic_dec_and_test(&ctx->in_flight)) { drbd_blk_run_queue(bdev_get_queue(device->ldev->md_bdev)); wait_until_done_or_force_detached(device, device->ldev, &ctx->done); } else kref_put(&ctx->kref, &bm_aio_ctx_destroy); /* summary for global bitmap IO */ if (flags == 0) drbd_info(device, "bitmap %s of %u pages took %lu jiffies\n", rw == WRITE ? "WRITE" : "READ", count, jiffies - now); if (ctx->error) { drbd_alert(device, "we had at least one MD IO ERROR during bitmap IO\n"); drbd_chk_io_error(device, 1, DRBD_META_IO_ERROR); err = -EIO; /* ctx->error ? */ } if (atomic_read(&ctx->in_flight)) err = -EIO; /* Disk timeout/force-detach during IO... */ now = jiffies; if (rw == WRITE) { drbd_md_flush(device); } else /* rw == READ */ { b->bm_set = bm_count_bits(b); drbd_info(device, "recounting of set bits took additional %lu jiffies\n", jiffies - now); } now = b->bm_set; if (flags == 0) drbd_info(device, "%s (%lu bits) marked out-of-sync by on disk bit-map.\n", ppsize(ppb, now << (BM_BLOCK_SHIFT-10)), now); kref_put(&ctx->kref, &bm_aio_ctx_destroy); return err; } /** * drbd_bm_read() - Read the whole bitmap from its on disk location. * @device: DRBD device. */ int drbd_bm_read(struct drbd_device *device) __must_hold(local) { return bm_rw(device, READ, 0, 0); } /** * drbd_bm_write() - Write the whole bitmap to its on disk location. * @device: DRBD device. * * Will only write pages that have changed since last IO. */ int drbd_bm_write(struct drbd_device *device) __must_hold(local) { return bm_rw(device, WRITE, 0, 0); } /** * drbd_bm_write_all() - Write the whole bitmap to its on disk location. * @device: DRBD device. * * Will write all pages. */ int drbd_bm_write_all(struct drbd_device *device) __must_hold(local) { return bm_rw(device, WRITE, BM_WRITE_ALL_PAGES, 0); } /** * drbd_bm_lazy_write_out() - Write bitmap pages 0 to @upper_idx-1, if they have changed. * @device: DRBD device. 
* @upper_idx: 0: write all changed pages; +ve: page index to stop scanning for changed pages */ int drbd_bm_write_lazy(struct drbd_device *device, unsigned upper_idx) __must_hold(local) { return bm_rw(device, WRITE, BM_AIO_COPY_PAGES, upper_idx); } /** * drbd_bm_write_copy_pages() - Write the whole bitmap to its on disk location. * @device: DRBD device. * * Will only write pages that have changed since last IO. * In contrast to drbd_bm_write(), this will copy the bitmap pages * to temporary writeout pages. It is intended to trigger a full write-out * while still allowing the bitmap to change, for example if a resync or online * verify is aborted due to a failed peer disk, while local IO continues, or * pending resync acks are still being processed. */ int drbd_bm_write_copy_pages(struct drbd_device *device) __must_hold(local) { return bm_rw(device, WRITE, BM_AIO_COPY_PAGES, 0); } /** * drbd_bm_write_hinted() - Write bitmap pages with "hint" marks, if they have changed. * @device: DRBD device. */ int drbd_bm_write_hinted(struct drbd_device *device) __must_hold(local) { return bm_rw(device, WRITE, BM_AIO_WRITE_HINTED | BM_AIO_COPY_PAGES, 0); } /** * drbd_bm_write_page() - Writes a PAGE_SIZE aligned piece of bitmap * @device: DRBD device. * @idx: bitmap page index * * We don't want to special case on logical_block_size of the backend device, * so we submit PAGE_SIZE aligned pieces. * Note that on "most" systems, PAGE_SIZE is 4k. * * In case this becomes an issue on systems with larger PAGE_SIZE, * we may want to change this again to write 4k aligned 4k pieces. 
*/ int drbd_bm_write_page(struct drbd_device *device, unsigned int idx) __must_hold(local) { struct bm_aio_ctx *ctx; int err; if (bm_test_page_unchanged(device->bitmap->bm_pages[idx])) { dynamic_drbd_dbg(device, "skipped bm page write for idx %u\n", idx); return 0; } ctx = kmalloc(sizeof(struct bm_aio_ctx), GFP_NOIO); if (!ctx) return -ENOMEM; *ctx = (struct bm_aio_ctx) { .device = device, .in_flight = ATOMIC_INIT(1), .done = 0, .flags = BM_AIO_COPY_PAGES, .error = 0, .kref = { ATOMIC_INIT(2) }, }; if (!get_ldev_if_state(device, D_ATTACHING)) { /* put is in bm_aio_ctx_destroy() */ drbd_err(device, "ASSERT FAILED: get_ldev_if_state() == 1 in drbd_bm_write_page()\n"); kfree(ctx); return -ENODEV; } bm_page_io_async(ctx, idx, WRITE_SYNC); wait_until_done_or_force_detached(device, device->ldev, &ctx->done); if (ctx->error) drbd_chk_io_error(device, 1, DRBD_META_IO_ERROR); /* that causes us to detach, so the in memory bitmap will be * gone in a moment as well. */ device->bm_writ_cnt++; err = atomic_read(&ctx->in_flight) ? -EIO : ctx->error; kref_put(&ctx->kref, &bm_aio_ctx_destroy); return err; } /* NOTE * find_first_bit returns int, we return unsigned long. * For this to work on 32bit arch with bitnumbers > (1<<32), * we'd need to return u64, and get a whole lot of other places * fixed where we still use unsigned long. * * this returns a bit number, NOT a sector! 
*/ #ifdef COMPAT_KMAP_ATOMIC_PAGE_ONLY #define __bm_find_next(device, bm_fo, find_zero_bit, km) ___bm_find_next(device, bm_fo, find_zero_bit) static unsigned long ___bm_find_next(struct drbd_device *device, unsigned long bm_fo, const int find_zero_bit) #else static unsigned long __bm_find_next(struct drbd_device *device, unsigned long bm_fo, const int find_zero_bit, const enum km_type km) #endif { struct drbd_bitmap *b = device->bitmap; unsigned long *p_addr; unsigned long bit_offset; unsigned i; if (bm_fo > b->bm_bits) { drbd_err(device, "bm_fo=%lu bm_bits=%lu\n", bm_fo, b->bm_bits); bm_fo = DRBD_END_OF_BITMAP; } else { while (bm_fo < b->bm_bits) { /* bit offset of the first bit in the page */ bit_offset = bm_fo & ~BITS_PER_PAGE_MASK; p_addr = __bm_map_pidx(b, bm_bit_to_page_idx(b, bm_fo), km); if (find_zero_bit) i = find_next_zero_bit_le(p_addr, PAGE_SIZE*8, bm_fo & BITS_PER_PAGE_MASK); else i = find_next_bit_le(p_addr, PAGE_SIZE*8, bm_fo & BITS_PER_PAGE_MASK); __bm_unmap(p_addr, km); if (i < PAGE_SIZE*8) { bm_fo = bit_offset + i; if (bm_fo >= b->bm_bits) break; goto found; } bm_fo = bit_offset + PAGE_SIZE*8; } bm_fo = DRBD_END_OF_BITMAP; } found: return bm_fo; } static unsigned long bm_find_next(struct drbd_device *device, unsigned long bm_fo, const int find_zero_bit) { struct drbd_bitmap *b = device->bitmap; unsigned long i = DRBD_END_OF_BITMAP; if (!expect(b)) return i; if (!expect(b->bm_pages)) return i; spin_lock_irq(&b->bm_lock); if (BM_DONT_TEST & b->bm_flags) bm_print_lock_info(device); i = __bm_find_next(device, bm_fo, find_zero_bit, KM_IRQ1); spin_unlock_irq(&b->bm_lock); return i; } unsigned long drbd_bm_find_next(struct drbd_device *device, unsigned long bm_fo) { return bm_find_next(device, bm_fo, 0); } #if 0 /* not yet needed for anything. */ unsigned long drbd_bm_find_next_zero(struct drbd_device *device, unsigned long bm_fo) { return bm_find_next(device, bm_fo, 1); } #endif /* does not spin_lock_irqsave. 
* you must take drbd_bm_lock() first */ unsigned long _drbd_bm_find_next(struct drbd_device *device, unsigned long bm_fo) { /* WARN_ON(!(BM_DONT_SET & device->b->bm_flags)); */ return __bm_find_next(device, bm_fo, 0, KM_USER1); } unsigned long _drbd_bm_find_next_zero(struct drbd_device *device, unsigned long bm_fo) { /* WARN_ON(!(BM_DONT_SET & device->b->bm_flags)); */ return __bm_find_next(device, bm_fo, 1, KM_USER1); } /* returns number of bits actually changed. * for val != 0, we change 0 -> 1, return code positive * for val == 0, we change 1 -> 0, return code negative * wants bitnr, not sector. * expected to be called for only a few bits (e - s about BITS_PER_LONG). * Must hold bitmap lock already. */ static int __bm_change_bits_to(struct drbd_device *device, const unsigned long s, unsigned long e, int val) { struct drbd_bitmap *b = device->bitmap; unsigned long *p_addr = NULL; unsigned long bitnr; unsigned int last_page_nr = -1U; int c = 0; int changed_total = 0; if (e >= b->bm_bits) { drbd_err(device, "ASSERT FAILED: bit_s=%lu bit_e=%lu bm_bits=%lu\n", s, e, b->bm_bits); e = b->bm_bits ? b->bm_bits -1 : 0; } for (bitnr = s; bitnr <= e; bitnr++) { unsigned int page_nr = bm_bit_to_page_idx(b, bitnr); if (page_nr != last_page_nr) { if (p_addr) __bm_unmap(p_addr, KM_IRQ1); if (c < 0) bm_set_page_lazy_writeout(b->bm_pages[last_page_nr]); else if (c > 0) bm_set_page_need_writeout(b->bm_pages[last_page_nr]); changed_total += c; c = 0; p_addr = __bm_map_pidx(b, page_nr, KM_IRQ1); last_page_nr = page_nr; } if (val) c += (0 == __test_and_set_bit_le(bitnr & BITS_PER_PAGE_MASK, p_addr)); else c -= (0 != __test_and_clear_bit_le(bitnr & BITS_PER_PAGE_MASK, p_addr)); } if (p_addr) __bm_unmap(p_addr, KM_IRQ1); if (c < 0) bm_set_page_lazy_writeout(b->bm_pages[last_page_nr]); else if (c > 0) bm_set_page_need_writeout(b->bm_pages[last_page_nr]); changed_total += c; b->bm_set += changed_total; return changed_total; } /* returns number of bits actually changed. 
* for val != 0, we change 0 -> 1, return code positive * for val == 0, we change 1 -> 0, return code negative * wants bitnr, not sector */ static int bm_change_bits_to(struct drbd_device *device, const unsigned long s, const unsigned long e, int val) { unsigned long flags; struct drbd_bitmap *b = device->bitmap; int c = 0; if (!expect(b)) return 1; if (!expect(b->bm_pages)) return 0; spin_lock_irqsave(&b->bm_lock, flags); if ((val ? BM_DONT_SET : BM_DONT_CLEAR) & b->bm_flags) bm_print_lock_info(device); c = __bm_change_bits_to(device, s, e, val); spin_unlock_irqrestore(&b->bm_lock, flags); return c; } /* returns number of bits changed 0 -> 1 */ int drbd_bm_set_bits(struct drbd_device *device, const unsigned long s, const unsigned long e) { return bm_change_bits_to(device, s, e, 1); } /* returns number of bits changed 1 -> 0 */ int drbd_bm_clear_bits(struct drbd_device *device, const unsigned long s, const unsigned long e) { return -bm_change_bits_to(device, s, e, 0); } /* sets all bits in full words, * from first_word up to, but not including, last_word */ static inline void bm_set_full_words_within_one_page(struct drbd_bitmap *b, int page_nr, int first_word, int last_word) { int i; int bits; int changed = 0; unsigned long *paddr = drbd_kmap_atomic(b->bm_pages[page_nr], KM_IRQ1); for (i = first_word; i < last_word; i++) { bits = hweight_long(paddr[i]); paddr[i] = ~0UL; changed += BITS_PER_LONG - bits; } drbd_kunmap_atomic(paddr, KM_IRQ1); if (changed) { /* We only need lazy writeout, the information is still in the * remote bitmap as well, and is reconstructed during the next * bitmap exchange, if lost locally due to a crash. */ bm_set_page_lazy_writeout(b->bm_pages[page_nr]); b->bm_set += changed; } } /* Same thing as drbd_bm_set_bits, * but more efficient for a large bit range. * You must first drbd_bm_lock(). * Can be called to set the whole bitmap in one go. * Sets bits from s to e _inclusive_. 
*/ void _drbd_bm_set_bits(struct drbd_device *device, const unsigned long s, const unsigned long e) { /* First set_bit from the first bit (s) * up to the next long boundary (sl), * then assign full words up to the last long boundary (el), * then set_bit up to and including the last bit (e). * * Do not use memset, because we must account for changes, * so we need to loop over the words with hweight() anyways. */ struct drbd_bitmap *b = device->bitmap; unsigned long sl = ALIGN(s,BITS_PER_LONG); unsigned long el = (e+1) & ~((unsigned long)BITS_PER_LONG-1); int first_page; int last_page; int page_nr; int first_word; int last_word; if (e - s <= 3*BITS_PER_LONG) { /* don't bother; el and sl may even be wrong. */ spin_lock_irq(&b->bm_lock); __bm_change_bits_to(device, s, e, 1); spin_unlock_irq(&b->bm_lock); return; } /* difference is large enough that we can trust sl and el */ spin_lock_irq(&b->bm_lock); /* bits filling the current long */ if (sl) __bm_change_bits_to(device, s, sl-1, 1); first_page = sl >> (3 + PAGE_SHIFT); last_page = el >> (3 + PAGE_SHIFT); /* MLPP: modulo longs per page */ /* LWPP: long words per page */ first_word = MLPP(sl >> LN2_BPL); last_word = LWPP; /* first and full pages, unless first page == last page */ for (page_nr = first_page; page_nr < last_page; page_nr++) { bm_set_full_words_within_one_page(device->bitmap, page_nr, first_word, last_word); spin_unlock_irq(&b->bm_lock); cond_resched(); first_word = 0; spin_lock_irq(&b->bm_lock); } /* last page (respectively only page, for first page == last page) */ last_word = MLPP(el >> LN2_BPL); /* consider bitmap->bm_bits = 32768, bitmap->bm_number_of_pages = 1. (or multiples). * ==> e = 32767, el = 32768, last_page = 2, * and now last_word = 0. * We do not want to touch last_page in this case, * as we did not allocate it, it is not present in bitmap->bm_pages. */ if (last_word) bm_set_full_words_within_one_page(device->bitmap, last_page, first_word, last_word); /* possibly trailing bits. 
* example: (e & 63) == 63, el will be e+1. * if that even was the very last bit, * it would trigger an assert in __bm_change_bits_to() */ if (el <= e) __bm_change_bits_to(device, el, e, 1); spin_unlock_irq(&b->bm_lock); } /* returns bit state * wants bitnr, NOT sector. * inherently racy... area needs to be locked by means of {al,rs}_lru * 1 ... bit set * 0 ... bit not set * -1 ... first out of bounds access, stop testing for bits! */ int drbd_bm_test_bit(struct drbd_device *device, const unsigned long bitnr) { unsigned long flags; struct drbd_bitmap *b = device->bitmap; unsigned long *p_addr; int i; if (!expect(b)) return 0; if (!expect(b->bm_pages)) return 0; spin_lock_irqsave(&b->bm_lock, flags); if (BM_DONT_TEST & b->bm_flags) bm_print_lock_info(device); if (bitnr < b->bm_bits) { p_addr = bm_map_pidx(b, bm_bit_to_page_idx(b, bitnr)); i = test_bit_le(bitnr & BITS_PER_PAGE_MASK, p_addr) ? 1 : 0; bm_unmap(p_addr); } else if (bitnr == b->bm_bits) { i = -1; } else { /* (bitnr > b->bm_bits) */ drbd_err(device, "bitnr=%lu > bm_bits=%lu\n", bitnr, b->bm_bits); i = 0; } spin_unlock_irqrestore(&b->bm_lock, flags); return i; } /* returns number of bits set in the range [s, e] */ int drbd_bm_count_bits(struct drbd_device *device, const unsigned long s, const unsigned long e) { unsigned long flags; struct drbd_bitmap *b = device->bitmap; unsigned long *p_addr = NULL; unsigned long bitnr; unsigned int page_nr = -1U; int c = 0; /* If this is called without a bitmap, that is a bug. 
But just to be * robust in case we screwed up elsewhere, in that case pretend there * was one dirty bit in the requested area, so we won't try to do a * local read there (no bitmap probably implies no disk) */ if (!expect(b)) return 1; if (!expect(b->bm_pages)) return 1; spin_lock_irqsave(&b->bm_lock, flags); if (BM_DONT_TEST & b->bm_flags) bm_print_lock_info(device); for (bitnr = s; bitnr <= e; bitnr++) { unsigned int idx = bm_bit_to_page_idx(b, bitnr); if (page_nr != idx) { page_nr = idx; if (p_addr) bm_unmap(p_addr); p_addr = bm_map_pidx(b, idx); } if (expect(bitnr < b->bm_bits)) c += (0 != test_bit_le(bitnr - (page_nr << (PAGE_SHIFT+3)), p_addr)); else drbd_err(device, "bitnr=%lu bm_bits=%lu\n", bitnr, b->bm_bits); } if (p_addr) bm_unmap(p_addr); spin_unlock_irqrestore(&b->bm_lock, flags); return c; } /* inherently racy... * return value may be already out-of-date when this function returns. * but the general usage is that this is only used during a cstate when bits are * only cleared, not set, and callers typically only care about the case when the return * value is zero, or we already "locked" this "bitmap extent" by other means. * * enr is bm-extent number, since we chose to name one sector (512 bytes) * worth of the bitmap a "bitmap extent". * * TODO * I think since we use it like a reference count, we should use the real * reference count of some bitmap extent element from some lru instead...
* */ int drbd_bm_e_weight(struct drbd_device *device, unsigned long enr) { struct drbd_bitmap *b = device->bitmap; int count, s, e; unsigned long flags; unsigned long *p_addr, *bm; if (!expect(b)) return 0; if (!expect(b->bm_pages)) return 0; spin_lock_irqsave(&b->bm_lock, flags); if (BM_DONT_TEST & b->bm_flags) bm_print_lock_info(device); s = S2W(enr); e = min((size_t)S2W(enr+1), b->bm_words); count = 0; if (s < b->bm_words) { int n = e-s; p_addr = bm_map_pidx(b, bm_word_to_page_idx(b, s)); bm = p_addr + MLPP(s); while (n--) count += hweight_long(*bm++); bm_unmap(p_addr); } else { drbd_err(device, "start offset (%d) too large in drbd_bm_e_weight\n", s); } spin_unlock_irqrestore(&b->bm_lock, flags); #if DUMP_MD >= 3 drbd_info(device, "enr=%lu weight=%d e=%d s=%d\n", enr, count, e, s); #endif return count; } drbd-8.4.4/drbd/drbd_int.h0000664000000000000000000024103412225206427014021 0ustar rootroot/* drbd_int.h This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2001-2008, LINBIT Information Technologies GmbH. Copyright (C) 1999-2008, Philipp Reisner . Copyright (C) 2002-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. 
*/ #ifndef _DRBD_INT_H #define _DRBD_INT_H #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "drbd_wrappers.h" #include "drbd_strings.h" #include "drbd_state.h" #include "drbd_protocol.h" #ifdef __CHECKER__ # define __protected_by(x) __attribute__((require_context(x,1,999,"rdwr"))) # define __protected_read_by(x) __attribute__((require_context(x,1,999,"read"))) # define __protected_write_by(x) __attribute__((require_context(x,1,999,"write"))) # define __must_hold(x) __attribute__((context(x,1,1), require_context(x,1,999,"call"))) #else # define __protected_by(x) # define __protected_read_by(x) # define __protected_write_by(x) # define __must_hold(x) #endif #define __no_warn(lock, stmt) do { __acquire(lock); stmt; __release(lock); } while (0) /* Compatibility for older kernels */ #undef __cond_lock #ifdef __CHECKER__ # ifndef __acquires # define __acquires(x) __attribute__((context(x,0,1))) # define __releases(x) __attribute__((context(x,1,0))) # define __acquire(x) __context__(x,1) # define __release(x) __context__(x,-1) # endif # define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0) #else # ifndef __acquires # define __acquires(x) # define __releases(x) # define __acquire(x) (void)0 # define __release(x) (void)0 # endif # define __cond_lock(x,c) (c) #endif #ifdef NEED_BOOL_TYPE typedef _Bool bool; enum { false = 0, true = 1 }; #endif /* module parameter, defined in drbd_main.c */ extern unsigned int minor_count; extern bool disable_sendpage; extern bool allow_oos; #ifdef CONFIG_DRBD_FAULT_INJECTION extern int enable_faults; extern int fault_rate; extern int fault_devs; #endif extern char usermode_helper[]; #include #ifndef DRBD_MAJOR # define DRBD_MAJOR 147 #endif #include #include /* I don't remember why XCPU ... * This is used to wake the asender, * and to interrupt sending the sending task * on disconnect. 
*/ #define DRBD_SIG SIGXCPU /* This is used to stop/restart our threads. * Cannot use SIGTERM nor SIGKILL, since these * are sent out by init on runlevel changes * I choose SIGHUP for now. * * FIXME btw, we should register some reboot notifier. */ #define DRBD_SIGKILL SIGHUP #define ID_IN_SYNC (4711ULL) #define ID_OUT_OF_SYNC (4712ULL) #define ID_SYNCER (-1ULL) #define UUID_NEW_BM_OFFSET ((u64)0x0001000000000000ULL) struct drbd_device; struct drbd_connection; #if defined(dev_to_disk) && defined(disk_to_dev) #define __drbd_printk_device(level, device, fmt, args...) \ dev_printk(level, disk_to_dev((device)->vdisk), fmt, ## args) #define __drbd_printk_peer_device(level, peer_device, fmt, args...) \ dev_printk(level, disk_to_dev((peer_device)->device->vdisk), fmt, ## args) #else /* For kernels <= 2.6.24 */ #define __drbd_printk_device(level, device, fmt, args...) \ printk(level "block drbd%u: " fmt, (device)->minor, ## args) #define __drbd_printk_peer_device(level, peer_device, fmt, args...) \ printk(level "block drbd%u: " fmt, (peer_device)->device->minor, ## args) #endif #define __drbd_printk_resource(level, resource, fmt, args...) \ printk(level "drbd %s: " fmt, (resource)->name, ## args) #define __drbd_printk_connection(level, connection, fmt, args...) \ printk(level "drbd %s: " fmt, (connection)->resource->name, ## args) void drbd_printk_with_wrong_object_type(void); #define __drbd_printk_if_same_type(obj, type, func, level, fmt, args...) \ (__builtin_types_compatible_p(typeof(obj), type) || \ __builtin_types_compatible_p(typeof(obj), const type)), \ func(level, (const type)(obj), fmt, ## args) #define drbd_printk(level, obj, fmt, args...) 
\ __builtin_choose_expr( \ __drbd_printk_if_same_type(obj, struct drbd_device *, \ __drbd_printk_device, level, fmt, ## args), \ __builtin_choose_expr( \ __drbd_printk_if_same_type(obj, struct drbd_resource *, \ __drbd_printk_resource, level, fmt, ## args), \ __builtin_choose_expr( \ __drbd_printk_if_same_type(obj, struct drbd_connection *, \ __drbd_printk_connection, level, fmt, ## args), \ __builtin_choose_expr( \ __drbd_printk_if_same_type(obj, struct drbd_peer_device *, \ __drbd_printk_peer_device, level, fmt, ## args), \ drbd_printk_with_wrong_object_type())))) #define drbd_dbg(obj, fmt, args...) \ drbd_printk(KERN_DEBUG, obj, fmt, ## args) #define drbd_alert(obj, fmt, args...) \ drbd_printk(KERN_ALERT, obj, fmt, ## args) #define drbd_err(obj, fmt, args...) \ drbd_printk(KERN_ERR, obj, fmt, ## args) #define drbd_warn(obj, fmt, args...) \ drbd_printk(KERN_WARNING, obj, fmt, ## args) #define drbd_info(obj, fmt, args...) \ drbd_printk(KERN_INFO, obj, fmt, ## args) #define drbd_emerg(obj, fmt, args...) \ drbd_printk(KERN_EMERG, obj, fmt, ## args) #if defined(dev_to_disk) && defined(disk_to_dev) #define dynamic_drbd_dbg(device, fmt, args...) \ dynamic_dev_dbg(disk_to_dev(device->vdisk), fmt, ## args) #else #define dynamic_drbd_dbg(device, fmt, args...) #endif /* see kernel/printk.c:printk_ratelimit * macro, so it is easy to have independent rate limits at different locations * "initializer element not constant ..."
with kernel 2.4 :( * so I initialize toks to something large */ #define DRBD_ratelimit(ratelimit_jiffies, ratelimit_burst) \ ({ \ int __ret; \ static unsigned long toks = 0x80000000UL; \ static unsigned long last_msg; \ static int missed; \ unsigned long now = jiffies; \ toks += now - last_msg; \ last_msg = now; \ if (toks > (ratelimit_burst * ratelimit_jiffies)) \ toks = ratelimit_burst * ratelimit_jiffies; \ if (toks >= ratelimit_jiffies) { \ int lost = missed; \ missed = 0; \ toks -= ratelimit_jiffies; \ if (lost) \ drbd_warn(device, "%d messages suppressed in %s:%d.\n", \ lost, __FILE__, __LINE__); \ __ret = 1; \ } else { \ missed++; \ __ret = 0; \ } \ __ret; \ }) #ifdef DBG_ASSERTS extern void drbd_assert_breakpoint(struct drbd_device *, char *, char *, int); # define D_ASSERT(device, exp) if (!(exp)) \ drbd_assert_breakpoint(device, #exp, __FILE__, __LINE__) #else # define D_ASSERT(device, exp) if (!(exp)) \ drbd_err(device, "ASSERT( " #exp " ) in %s:%d\n", __FILE__, __LINE__) #endif /** * expect - Make an assertion * * Unlike the assert macro, this macro returns a boolean result. 
*/ #define expect(exp) ({ \ bool _bool = (exp); \ if (!_bool) \ drbd_err(device, "ASSERTION %s FAILED in %s\n", \ #exp, __func__); \ _bool; \ }) /* Defines to control fault insertion */ enum { DRBD_FAULT_MD_WR = 0, /* meta data write */ DRBD_FAULT_MD_RD = 1, /* read */ DRBD_FAULT_RS_WR = 2, /* resync */ DRBD_FAULT_RS_RD = 3, DRBD_FAULT_DT_WR = 4, /* data */ DRBD_FAULT_DT_RD = 5, DRBD_FAULT_DT_RA = 6, /* data read ahead */ DRBD_FAULT_BM_ALLOC = 7, /* bitmap allocation */ DRBD_FAULT_AL_EE = 8, /* alloc ee */ DRBD_FAULT_RECEIVE = 9, /* Changes some bytes upon receiving a [rs]data block */ DRBD_FAULT_MAX, }; extern unsigned int _drbd_insert_fault(struct drbd_device *device, unsigned int type); static inline int drbd_insert_fault(struct drbd_device *device, unsigned int type) { #ifdef CONFIG_DRBD_FAULT_INJECTION return fault_rate && (enable_faults & (1<magic = (long)(x) ^ DRBD_MAGIC; }) #define IS_VALID_MDEV(x) \ (typecheck(struct drbd_device*, x) && \ ((x) ? (((x)->magic ^ DRBD_MAGIC) == (long)(x)) : 0)) extern struct idr drbd_devices; /* RCU, updates: genl_lock() */ extern struct list_head drbd_resources; /* RCU, updates: genl_lock() */ extern const char *cmdname(enum drbd_packet cmd); /* for sending/receiving the bitmap, * possibly in some encoding scheme */ struct bm_xfer_ctx { /* "const" * stores total bits and long words * of the bitmap, so we don't need to * call the accessor functions over and again. */ unsigned long bm_bits; unsigned long bm_words; /* during xfer, current position within the bitmap */ unsigned long bit_offset; unsigned long word_offset; /* statistics; index: (h->command == P_BITMAP) */ unsigned packets[2]; unsigned bytes[2]; }; extern void INFO_bm_xfer_stats(struct drbd_device *device, const char *direction, struct bm_xfer_ctx *c); static inline void bm_xfer_ctx_bit_to_word_offset(struct bm_xfer_ctx *c) { /* word_offset counts "native long words" (32 or 64 bit), * aligned at 64 bit. * Encoded packet may end at an unaligned bit offset. 
* In case a fallback clear text packet is transmitted in * between, we adjust this offset back to the last 64bit * aligned "native long word", which makes coding and decoding * the plain text bitmap much more convenient. */ #if BITS_PER_LONG == 64 c->word_offset = c->bit_offset >> 6; #elif BITS_PER_LONG == 32 c->word_offset = c->bit_offset >> 5; c->word_offset &= ~(1UL); #else # error "unsupported BITS_PER_LONG" #endif } extern unsigned int drbd_header_size(struct drbd_connection *connection); /**********************************************************************/ enum drbd_thread_state { NONE, RUNNING, EXITING, RESTARTING }; struct drbd_thread { spinlock_t t_lock; struct task_struct *task; struct completion stop; enum drbd_thread_state t_state; int (*function) (struct drbd_thread *); struct drbd_resource *resource; struct drbd_connection *connection; int reset_cpu_mask; const char *name; }; static inline enum drbd_thread_state get_t_state(struct drbd_thread *thi) { /* THINK testing the t_state seems to be uncritical in all cases * (but thread_{start,stop}), so we can read it *without* the lock. * --lge */ smp_rmb(); return thi->t_state; } struct drbd_work { struct list_head list; int (*cb)(struct drbd_work *, int cancel); }; struct drbd_device_work { struct drbd_work w; struct drbd_device *device; }; #include "drbd_interval.h" extern int drbd_wait_misc(struct drbd_device *, struct drbd_interval *); extern bool idr_is_empty(struct idr *idr); struct drbd_request { struct drbd_work w; struct drbd_device *device; /* if local IO is not allowed, will be NULL. * if local IO _is_ allowed, holds the locally submitted bio clone, * or, after local IO completion, the ERR_PTR(error). * see drbd_request_endio(). */ struct bio *private_bio; struct drbd_interval i; /* epoch: used to check on "completion" whether this req was in * the current epoch, and we therefore have to close it, * causing a p_barrier packet to be sent, starting a new epoch.
* * This corresponds to "barrier" in struct p_barrier[_ack], * and to "barrier_nr" in struct drbd_epoch (and various * comments/function parameters/local variable names). */ unsigned int epoch; struct list_head tl_requests; /* ring list in the transfer log */ struct bio *master_bio; /* master bio pointer */ unsigned long start_time; /* once it hits 0, we may complete the master_bio */ atomic_t completion_ref; /* once it hits 0, we may destroy this drbd_request object */ struct kref kref; unsigned rq_state; /* see comments above _req_mod() */ }; struct drbd_epoch { struct drbd_connection *connection; struct list_head list; unsigned int barrier_nr; atomic_t epoch_size; /* increased on every request added. */ atomic_t active; /* increased on every req. added, and dec on every finished. */ unsigned long flags; }; /* drbd_epoch flag bits */ enum { DE_BARRIER_IN_NEXT_EPOCH_ISSUED, DE_BARRIER_IN_NEXT_EPOCH_DONE, DE_CONTAINS_A_BARRIER, DE_HAVE_BARRIER_NUMBER, DE_IS_FINISHING, }; enum epoch_event { EV_PUT, EV_GOT_BARRIER_NR, EV_BARRIER_DONE, EV_BECAME_LAST, EV_CLEANUP = 32, /* used as flag */ }; struct digest_info { int digest_size; void *digest; }; struct drbd_peer_request { struct drbd_work w; struct drbd_peer_device *peer_device; struct drbd_epoch *epoch; /* for writes */ struct page *pages; atomic_t pending_bios; struct drbd_interval i; /* see comments on ee flag bits below */ unsigned long flags; union { u64 block_id; struct digest_info *digest; }; }; /* ee flag bits. * While corresponding bios are in flight, the only modification will be * set_bit WAS_ERROR, which has to be atomic. * If no bios are in flight yet, or all have been completed, * non-atomic modification to ee->flags is ok. */ enum { __EE_CALL_AL_COMPLETE_IO, __EE_MAY_SET_IN_SYNC, /* This peer request closes an epoch using a barrier. * On successful completion, the epoch is released, * and the P_BARRIER_ACK is sent. */ __EE_IS_BARRIER, /* is this a TRIM aka REQ_DISCARD?
*/ __EE_IS_TRIM, /* our lower level cannot handle trim, * and we want to fall back to zeroout instead */ __EE_IS_TRIM_USE_ZEROOUT, /* In case a barrier failed, * we need to resubmit without the barrier flag. */ __EE_RESUBMITTED, /* we may have several bios per peer request. * if any of those fail, we set this flag atomically * from the endio callback */ __EE_WAS_ERROR, /* This ee has a pointer to a digest instead of a block id */ __EE_HAS_DIGEST, /* Conflicting local requests need to be restarted after this request */ __EE_RESTART_REQUESTS, /* The peer wants a write ACK for this (wire proto C) */ __EE_SEND_WRITE_ACK, /* Is set when net_conf had two_primaries set while creating this peer_req */ __EE_IN_INTERVAL_TREE, }; #define EE_CALL_AL_COMPLETE_IO (1<<__EE_CALL_AL_COMPLETE_IO) #define EE_MAY_SET_IN_SYNC (1<<__EE_MAY_SET_IN_SYNC) #define EE_IS_BARRIER (1<<__EE_IS_BARRIER) #define EE_IS_TRIM (1<<__EE_IS_TRIM) #define EE_IS_TRIM_USE_ZEROOUT (1<<__EE_IS_TRIM_USE_ZEROOUT) #define EE_RESUBMITTED (1<<__EE_RESUBMITTED) #define EE_WAS_ERROR (1<<__EE_WAS_ERROR) #define EE_HAS_DIGEST (1<<__EE_HAS_DIGEST) #define EE_RESTART_REQUESTS (1<<__EE_RESTART_REQUESTS) #define EE_SEND_WRITE_ACK (1<<__EE_SEND_WRITE_ACK) #define EE_IN_INTERVAL_TREE (1<<__EE_IN_INTERVAL_TREE) /* flag bits per device */ enum { UNPLUG_REMOTE, /* sending a "UnplugRemote" could help */ MD_DIRTY, /* current uuids and flags not yet on disk */ USE_DEGR_WFC_T, /* degr-wfc-timeout instead of wfc-timeout. */ CL_ST_CHG_SUCCESS, CL_ST_CHG_FAIL, CRASHED_PRIMARY, /* This node was a crashed primary. * Gets cleared when the state.conn * goes into C_CONNECTED state. */ CONSIDER_RESYNC, MD_NO_BARRIER, /* meta data device does not support barriers, so don't even try */ SUSPEND_IO, /* suspend application io */ BITMAP_IO, /* suspend application io; once no more io in flight, start bitmap io */ BITMAP_IO_QUEUED, /* Started bitmap IO */ GO_DISKLESS, /* Disk is being detached, on io-error or admin request. 
*/ WAS_IO_ERROR, /* Local disk failed, returned IO error */ WAS_READ_ERROR, /* Local disk READ failed (set additionally to the above) */ FORCE_DETACH, /* Force-detach from local disk, aborting any pending local IO */ RESYNC_AFTER_NEG, /* Resync after online grow after the attach&negotiate finished. */ RESIZE_PENDING, /* Size change detected locally, waiting for the response from * the peer, if it changed there as well. */ NEW_CUR_UUID, /* Create new current UUID when thawing IO */ AL_SUSPENDED, /* Activity logging is currently suspended. */ AHEAD_TO_SYNC_SOURCE, /* Ahead -> SyncSource queued */ B_RS_H_DONE, /* Before resync handler done (already executed) */ DISCARD_MY_DATA, /* discard_my_data flag per volume */ READ_BALANCE_RR, }; struct drbd_bitmap; /* opaque for drbd_device */ /* definition of bits in bm_flags to be used in drbd_bm_lock * and drbd_bitmap_io and friends. */ enum bm_flag { /* do we need to kfree, or vfree bm_pages? */ BM_P_VMALLOCED = 0x10000, /* internal use only, will be masked out */ /* currently locked for bulk operation */ BM_LOCKED_MASK = 0xf, /* in detail, that is: */ BM_DONT_CLEAR = 0x1, BM_DONT_SET = 0x2, BM_DONT_TEST = 0x4, /* so we can mark it locked for bulk operation, * and still allow all non-bulk operations */ BM_IS_LOCKED = 0x8, /* (test bit, count bit) allowed (common case) */ BM_LOCKED_TEST_ALLOWED = BM_DONT_CLEAR | BM_DONT_SET | BM_IS_LOCKED, /* testing bits, as well as setting new bits allowed, but clearing bits * would be unexpected. Used during bitmap receive. Setting new bits * requires sending of "out-of-sync" information, though. */ BM_LOCKED_SET_ALLOWED = BM_DONT_CLEAR | BM_IS_LOCKED, /* for drbd_bm_write_copy_pages, everything is allowed, * only concurrent bulk operations are locked out. */ BM_LOCKED_CHANGE_ALLOWED = BM_IS_LOCKED, }; struct drbd_work_queue { struct list_head q; spinlock_t q_lock; /* to protect the list. 
*/ wait_queue_head_t q_wait; }; struct drbd_socket { struct mutex mutex; struct socket *socket; /* this way we get our * send/receive buffers off the stack */ void *sbuf; void *rbuf; }; struct drbd_md { u64 md_offset; /* sector offset to 'super' block */ u64 la_size_sect; /* last agreed size, unit sectors */ spinlock_t uuid_lock; u64 uuid[UI_SIZE]; u64 device_uuid; u32 flags; u32 md_size_sect; s32 al_offset; /* signed relative sector offset to activity log */ s32 bm_offset; /* signed relative sector offset to bitmap */ /* cached value of bdev->disk_conf->meta_dev_idx (see below) */ s32 meta_dev_idx; /* see al_tr_number_to_on_disk_sector() */ u32 al_stripes; u32 al_stripe_size_4k; u32 al_size_4k; /* cached product of the above */ }; struct drbd_backing_dev { struct kobject kobject; struct block_device *backing_bdev; struct block_device *md_bdev; struct drbd_md md; struct disk_conf *disk_conf; /* RCU, for updates: resource->conf_update */ sector_t known_size; /* last known size of that backing device */ }; struct drbd_md_io { unsigned int done; int error; }; struct bm_io_work { struct drbd_work w; char *why; enum bm_flag flags; int (*io_fn)(struct drbd_device *device); void (*done)(struct drbd_device *device, int rv); }; enum write_ordering_e { WO_none, WO_drain_io, WO_bdev_flush, WO_bio_barrier }; struct fifo_buffer { unsigned int head_index; unsigned int size; int total; /* sum of all values */ int values[0]; }; extern struct fifo_buffer *fifo_alloc(int fifo_size); /* flag bits per connection */ enum { NET_CONGESTED, /* The data socket is congested */ RESOLVE_CONFLICTS, /* Set on one node, cleared on the peer! 
*/ SEND_PING, /* whether asender should send a ping asap */ SIGNAL_ASENDER, /* whether asender wants to be interrupted */ GOT_PING_ACK, /* set when we receive a ping_ack packet, ping_wait gets woken */ CONN_WD_ST_CHG_REQ, /* A cluster wide state change on the connection is active */ CONN_WD_ST_CHG_OKAY, CONN_WD_ST_CHG_FAIL, CONN_DRY_RUN, /* Expect disconnect after resync handshake. */ CREATE_BARRIER, /* next P_DATA is preceded by a P_BARRIER */ STATE_SENT, /* Do not change state/UUIDs while this is set */ CALLBACK_PENDING, /* Whether we have a call_usermodehelper(, UMH_WAIT_PROC) * pending, from drbd worker context. * If set, bdi_write_congested() returns true, * so shrink_page_list() would not recurse into, * and potentially deadlock on, this drbd worker. */ DISCONNECT_SENT, }; struct drbd_resource { char *name; struct kref kref; struct idr devices; /* volume number to device mapping */ struct list_head connections; struct list_head resources; struct res_opts res_opts; struct mutex conf_update; /* mutex for read-copy-update of net_conf and disk_conf */ spinlock_t req_lock; unsigned susp:1; /* IO suspended by user */ unsigned susp_nod:1; /* IO suspended because no data */ unsigned susp_fen:1; /* IO suspended because fence peer handler runs */ #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,30) && !defined(cpumask_bits) cpumask_t cpu_mask[1]; #else cpumask_var_t cpu_mask; #endif }; struct drbd_connection { struct list_head connections; struct drbd_resource *resource; struct kref kref; struct idr peer_devices; /* volume number to peer device mapping */ enum drbd_conns cstate; /* Only C_STANDALONE to C_WF_REPORT_PARAMS */ struct mutex cstate_mutex; /* Protects graceful disconnects */ unsigned int connect_cnt; /* Inc each time a connection is established */ unsigned long flags; struct net_conf *net_conf; /* content protected by rcu */ wait_queue_head_t ping_wait; /* Woken upon reception of a ping, and a state change */ struct sockaddr_storage my_addr; int my_addr_len; 
struct sockaddr_storage peer_addr; int peer_addr_len; struct drbd_socket data; /* data/barrier/cstate/parameter packets */ struct drbd_socket meta; /* ping/ack (metadata) packets */ int agreed_pro_version; /* actually used protocol version */ u32 agreed_features; unsigned long last_received; /* in jiffies, either socket */ unsigned int ko_count; struct list_head transfer_log; /* all requests not yet fully processed */ struct crypto_hash *cram_hmac_tfm; struct crypto_hash *integrity_tfm; /* checksums we compute, updates protected by connection->data->mutex */ struct crypto_hash *peer_integrity_tfm; /* checksums we verify, only accessed from receiver thread */ struct crypto_hash *csums_tfm; struct crypto_hash *verify_tfm; void *int_dig_in; void *int_dig_vv; /* receiver side */ struct drbd_epoch *current_epoch; spinlock_t epoch_lock; unsigned int epochs; enum write_ordering_e write_ordering; atomic_t current_tle_nr; /* transfer log epoch number */ unsigned current_tle_writes; /* writes seen within this tl epoch */ unsigned long last_reconnect_jif; struct drbd_thread receiver; struct drbd_thread worker; struct drbd_thread asender; /* sender side */ struct drbd_work_queue sender_work; struct { /* whether this sender thread * has processed a single write yet. */ bool seen_any_write_yet; /* Which barrier number to send with the next P_BARRIER */ int current_epoch_nr; /* how many write requests have been sent * with req->epoch == current_epoch_nr. * If none, no P_BARRIER will be sent. 
*/ unsigned current_epoch_writes; } send; }; struct submit_worker { struct workqueue_struct *wq; struct work_struct worker; spinlock_t lock; struct list_head writes; }; struct drbd_peer_device { struct list_head peer_devices; struct drbd_device *device; struct drbd_connection *connection; }; struct drbd_device { #ifdef PARANOIA long magic; #endif struct drbd_resource *resource; struct list_head peer_devices; int vnr; /* volume number within the connection */ struct kobject kobj; /* things that are stored as / read from meta data on disk */ unsigned long flags; /* configured by drbdsetup */ struct drbd_backing_dev *ldev __protected_by(local); sector_t p_size; /* partner's disk size */ struct request_queue *rq_queue; struct block_device *this_bdev; struct gendisk *vdisk; unsigned long last_reattach_jif; struct drbd_work resync_work; struct drbd_work unplug_work; struct drbd_work go_diskless; struct drbd_work md_sync_work; struct drbd_work start_resync_work; struct timer_list resync_timer; struct timer_list md_sync_timer; struct timer_list start_resync_timer; struct timer_list request_timer; #ifdef DRBD_DEBUG_MD_SYNC struct { unsigned int line; const char* func; } last_md_mark_dirty; #endif /* Used after attach while negotiating new disk state. */ union drbd_state new_state_tmp; union drbd_dev_state state; wait_queue_head_t misc_wait; wait_queue_head_t state_wait; /* upon each state change. 
*/ unsigned int send_cnt; unsigned int recv_cnt; unsigned int read_cnt; unsigned int writ_cnt; unsigned int al_writ_cnt; unsigned int bm_writ_cnt; atomic_t ap_bio_cnt; /* Requests we need to complete */ atomic_t ap_pending_cnt; /* AP data packets on the wire, ack expected */ atomic_t rs_pending_cnt; /* RS request/data packets on the wire */ atomic_t unacked_cnt; /* Need to send replies for */ atomic_t local_cnt; /* Waiting for local completion */ /* Interval tree of pending local write requests */ struct rb_root read_requests; struct rb_root write_requests; /* blocks to resync in this run [unit BM_BLOCK_SIZE] */ unsigned long rs_total; /* number of resync blocks that failed in this run */ unsigned long rs_failed; /* Syncer's start time [unit jiffies] */ unsigned long rs_start; /* cumulated time in PausedSyncX state [unit jiffies] */ unsigned long rs_paused; /* skipped because csum was equal [unit BM_BLOCK_SIZE] */ unsigned long rs_same_csum; #define DRBD_SYNC_MARKS 8 #define DRBD_SYNC_MARK_STEP (3*HZ) /* block not up-to-date at mark [unit BM_BLOCK_SIZE] */ unsigned long rs_mark_left[DRBD_SYNC_MARKS]; /* marks's time [unit jiffies] */ unsigned long rs_mark_time[DRBD_SYNC_MARKS]; /* current index into rs_mark_{left,time} */ int rs_last_mark; unsigned long rs_last_bcast; /* [unit jiffies] */ /* where does the admin want us to start? (sector) */ sector_t ov_start_sector; sector_t ov_stop_sector; /* where are we now? (sector) */ sector_t ov_position; /* Start sector of out of sync range (to merge printk reporting). */ sector_t ov_last_oos_start; /* size of out-of-sync range in sectors. */ sector_t ov_last_oos_size; unsigned long ov_left; /* in bits */ struct drbd_bitmap *bitmap; unsigned long bm_resync_fo; /* bit offset for drbd_bm_find_next */ /* Used to track operations of resync... 
*/ struct lru_cache *resync; /* Number of locked elements in resync LRU */ unsigned int resync_locked; /* resync extent number waiting for application requests */ unsigned int resync_wenr; int open_cnt; u64 *p_uuid; /* FIXME clean comments, restructure so it is more obvious which * members are protected by what */ struct list_head active_ee; /* IO in progress (P_DATA gets written to disk) */ struct list_head sync_ee; /* IO in progress (P_RS_DATA_REPLY gets written to disk) */ struct list_head done_ee; /* need to send P_WRITE_ACK */ struct list_head read_ee; /* [RS]P_DATA_REQUEST being read */ struct list_head net_ee; /* zero-copy network send in progress */ int next_barrier_nr; struct list_head resync_reads; atomic_t pp_in_use; /* allocated from page pool */ atomic_t pp_in_use_by_net; /* sendpage()d, still referenced by tcp */ wait_queue_head_t ee_wait; struct page *md_io_page; /* one page buffer for md_io */ struct drbd_md_io md_io; atomic_t md_io_in_use; /* protects the md_io, md_io_page and md_io_tmpp */ spinlock_t al_lock; wait_queue_head_t al_wait; struct lru_cache *act_log; /* activity log */ unsigned int al_tr_number; int al_tr_cycle; wait_queue_head_t seq_wait; atomic_t packet_seq; unsigned int peer_seq; spinlock_t peer_seq_lock; unsigned int minor; unsigned long comm_bm_set; /* communicated number of set bits. */ struct bm_io_work bm_io_work; u64 ed_uuid; /* UUID of the exposed data */ struct mutex own_state_mutex; struct mutex *state_mutex; /* either own_state_mutex or first_peer_device(device)->connection->cstate_mutex */ char congestion_reason; /* Why we were congested... */ atomic_t rs_sect_in; /* for incoming resync data rate, SyncTarget */ atomic_t rs_sect_ev; /* for submitted resync data rate, both */ int rs_last_sect_ev; /* counter to compare with */ int rs_last_events; /* counter of read or write "events" (unit sectors) * on the lower level device when we last looked. 
*/ int c_sync_rate; /* current resync rate after syncer throttle magic */ struct fifo_buffer *rs_plan_s; /* correction values of resync planer (RCU, connection->conn_update) */ int rs_in_flight; /* resync sectors in flight (to proxy, in proxy and from proxy) */ atomic_t ap_in_flight; /* App sectors in flight (waiting for ack) */ unsigned int peer_max_bio_size; unsigned int local_max_bio_size; /* any requests that would block in drbd_make_request() * are deferred to this single-threaded work queue */ struct submit_worker submit; }; struct drbd_config_context { /* assigned from drbd_genlmsghdr */ unsigned int minor; /* assigned from request attributes, if present */ unsigned int volume; #define VOLUME_UNSPECIFIED (-1U) /* pointer into the request skb, * limited lifetime! */ char *resource_name; struct nlattr *my_addr; struct nlattr *peer_addr; /* reply buffer */ struct sk_buff *reply_skb; /* pointer into reply buffer */ struct drbd_genlmsghdr *reply_dh; /* resolved from attributes, if possible */ struct drbd_device *device; struct drbd_resource *resource; struct drbd_connection *connection; }; static inline struct drbd_device *minor_to_device(unsigned int minor) { return (struct drbd_device *)idr_find(&drbd_devices, minor); } static inline struct drbd_peer_device *first_peer_device(struct drbd_device *device) { return list_first_entry(&device->peer_devices, struct drbd_peer_device, peer_devices); } #define for_each_resource(resource, _resources) \ list_for_each_entry(resource, _resources, resources) #define for_each_resource_rcu(resource, _resources) \ list_for_each_entry_rcu(resource, _resources, resources) #define for_each_resource_safe(resource, tmp, _resources) \ list_for_each_entry_safe(resource, tmp, _resources, resources) #define for_each_connection(connection, resource) \ list_for_each_entry(connection, &resource->connections, connections) #define for_each_connection_rcu(connection, resource) \ list_for_each_entry_rcu(connection, &resource->connections, 
connections) #define for_each_connection_safe(connection, tmp, resource) \ list_for_each_entry_safe(connection, tmp, &resource->connections, connections) #define for_each_peer_device(peer_device, device) \ list_for_each_entry(peer_device, &device->peer_devices, peer_devices) #define for_each_peer_device_rcu(peer_device, device) \ list_for_each_entry_rcu(peer_device, &device->peer_devices, peer_devices) #define for_each_peer_device_safe(peer_device, tmp, device) \ list_for_each_entry_safe(peer_device, tmp, &device->peer_devices, peer_devices) static inline unsigned int device_to_minor(struct drbd_device *device) { return device->minor; } /* * function declarations *************************/ /* drbd_main.c */ enum dds_flags { DDSF_FORCED = 1, DDSF_NO_RESYNC = 2, /* Do not run a resync for the new space */ }; extern void drbd_init_set_defaults(struct drbd_device *device); extern int drbd_thread_start(struct drbd_thread *thi); extern void _drbd_thread_stop(struct drbd_thread *thi, int restart, int wait); #ifdef CONFIG_SMP extern void drbd_thread_current_set_cpu(struct drbd_thread *thi); #else #define drbd_thread_current_set_cpu(A) ({}) #endif extern void tl_release(struct drbd_connection *, unsigned int barrier_nr, unsigned int set_size); extern void tl_clear(struct drbd_connection *); extern void drbd_free_sock(struct drbd_connection *connection); extern int drbd_send(struct drbd_connection *connection, struct socket *sock, void *buf, size_t size, unsigned msg_flags); extern int drbd_send_all(struct drbd_connection *, struct socket *, void *, size_t, unsigned); extern int __drbd_send_protocol(struct drbd_connection *connection, enum drbd_packet cmd); extern int drbd_send_protocol(struct drbd_connection *connection); extern int drbd_send_uuids(struct drbd_peer_device *); extern int drbd_send_uuids_skip_initial_sync(struct drbd_peer_device *); extern void drbd_gen_and_send_sync_uuid(struct drbd_peer_device *); extern int drbd_send_sizes(struct drbd_peer_device 
*peer_device, int trigger_reply, enum dds_flags flags); #define drbd_send_state(m, s) drbd_send_state_(m, s, __func__ , __LINE__ ) #define drbd_send_current_state(m) drbd_send_current_state_(m, __func__ , __LINE__ ) extern int drbd_send_state_(struct drbd_peer_device *peer_device, union drbd_state s, const char *func, unsigned int line); extern int drbd_send_current_state_(struct drbd_peer_device *peer_device, const char *func, unsigned int line); extern int drbd_send_sync_param(struct drbd_peer_device *peer_device); extern void drbd_send_b_ack(struct drbd_connection *connection, u32 barrier_nr, u32 set_size); extern int drbd_send_ack(struct drbd_peer_device *, enum drbd_packet, struct drbd_peer_request *); extern void drbd_send_ack_rp(struct drbd_peer_device *, enum drbd_packet, struct p_block_req *rp); extern void drbd_send_ack_dp(struct drbd_peer_device *, enum drbd_packet, struct p_data *dp, int data_size); extern int drbd_send_ack_ex(struct drbd_peer_device *, enum drbd_packet, sector_t sector, int blksize, u64 block_id); extern int drbd_send_out_of_sync(struct drbd_peer_device *, struct drbd_request *); extern int drbd_send_block(struct drbd_peer_device *, enum drbd_packet, struct drbd_peer_request *); extern int drbd_send_dblock(struct drbd_peer_device *, struct drbd_request *req); extern int drbd_send_drequest(struct drbd_peer_device *, int cmd, sector_t sector, int size, u64 block_id); extern int drbd_send_drequest_csum(struct drbd_peer_device *, sector_t sector, int size, void *digest, int digest_size, enum drbd_packet cmd); extern int drbd_send_ov_request(struct drbd_peer_device *, sector_t sector, int size); extern int drbd_send_bitmap(struct drbd_device *device); extern void drbd_send_sr_reply(struct drbd_peer_device *, enum drbd_state_rv retcode); extern void conn_send_sr_reply(struct drbd_connection *connection, enum drbd_state_rv retcode); extern void drbd_free_bc(struct drbd_backing_dev *ldev); extern void drbd_device_cleanup(struct drbd_device 
*device); void drbd_print_uuids(struct drbd_device *device, const char *text); extern void conn_md_sync(struct drbd_connection *connection); extern void drbd_md_write(struct drbd_device *device, void *buffer); extern void drbd_md_sync(struct drbd_device *device); extern int drbd_md_read(struct drbd_device *device, struct drbd_backing_dev *bdev); extern void drbd_uuid_set(struct drbd_device *device, int idx, u64 val) __must_hold(local); extern void _drbd_uuid_set(struct drbd_device *device, int idx, u64 val) __must_hold(local); extern void drbd_uuid_new_current(struct drbd_device *device) __must_hold(local); extern void drbd_uuid_set_bm(struct drbd_device *device, u64 val) __must_hold(local); extern void drbd_uuid_move_history(struct drbd_device *device) __must_hold(local); extern void __drbd_uuid_set(struct drbd_device *device, int idx, u64 val) __must_hold(local); extern void drbd_md_set_flag(struct drbd_device *device, int flags) __must_hold(local); extern void drbd_md_clear_flag(struct drbd_device *device, int flags)__must_hold(local); extern int drbd_md_test_flag(struct drbd_backing_dev *, int); #ifndef DRBD_DEBUG_MD_SYNC extern void drbd_md_mark_dirty(struct drbd_device *device); #else #define drbd_md_mark_dirty(m) drbd_md_mark_dirty_(m, __LINE__ , __func__ ) extern void drbd_md_mark_dirty_(struct drbd_device *device, unsigned int line, const char *func); #endif extern void drbd_queue_bitmap_io(struct drbd_device *device, int (*io_fn)(struct drbd_device *), void (*done)(struct drbd_device *, int), char *why, enum bm_flag flags); extern int drbd_bitmap_io(struct drbd_device *device, int (*io_fn)(struct drbd_device *), char *why, enum bm_flag flags); extern int drbd_bitmap_io_from_worker(struct drbd_device *device, int (*io_fn)(struct drbd_device *), char *why, enum bm_flag flags); extern int drbd_bmio_set_n_write(struct drbd_device *device); extern int drbd_bmio_clear_n_write(struct drbd_device *device); extern void drbd_ldev_destroy(struct drbd_device 
*device); /* Meta data layout * * We currently have two possible layouts. * Offsets in (512 byte) sectors. * external: * |----------- md_size_sect ------------------| * [ 4k superblock ][ activity log ][ Bitmap ] * | al_offset == 8 | * | bm_offset = al_offset + X | * ==> bitmap sectors = md_size_sect - bm_offset * * Variants: * old, indexed fixed size meta data: * * internal: * |----------- md_size_sect ------------------| * [data.....][ Bitmap ][ activity log ][ 4k superblock ][padding*] * | al_offset < 0 | * | bm_offset = al_offset - Y | * ==> bitmap sectors = Y = al_offset - bm_offset * * [padding*] are zero or up to 7 unused 512 Byte sectors to the * end of the device, so that the [4k superblock] will be 4k aligned. * * The activity log consists of 4k transaction blocks, * which are written in a ring-buffer, or striped ring-buffer like fashion. * The log size used to be fixed at 32kB, * but is about to become configurable. */ /* Our old fixed size meta data layout * allows up to about 3.8TB, so if you want more, * you need to use the "flexible" meta data format. */ #define MD_128MB_SECT (128LLU << 11) /* 128 MB, unit sectors */ #define MD_4kB_SECT 8 #define MD_32kB_SECT 64 /* One activity log extent represents 4M of storage */ #define AL_EXTENT_SHIFT 22 #define AL_EXTENT_SIZE (1<<AL_EXTENT_SHIFT) /* ... ==> we need 32 KB bitmap. * Bit 0 ==> local node thinks this block is binary identical on both nodes * Bit 1 ==> local node thinks this block needs to be synced. */ #define SLEEP_TIME (HZ/10) /* We do bitmap IO in units of 4k blocks. * We also still have a hardcoded 4k per bit relation. */ #define BM_BLOCK_SHIFT 12 /* 4k per bit */ #define BM_BLOCK_SIZE (1<<BM_BLOCK_SHIFT) /* ... */ #define BM_SECT_TO_BIT(x) ((x)>>(BM_BLOCK_SHIFT-9)) #define BM_BIT_TO_SECT(x) ((sector_t)(x)<<(BM_BLOCK_SHIFT-9)) #define BM_SECT_PER_BIT BM_BIT_TO_SECT(1) /* bit to represented kilo byte conversion */ #define Bit2KB(bits) ((bits)<<(BM_BLOCK_SHIFT-10)) /* in which _bitmap_ extent (resp. 
sector) the bit for a certain * _storage_ sector is located in */ #define BM_SECT_TO_EXT(x) ((x)>>(BM_EXT_SHIFT-9)) /* how many _storage_ sectors we have per bitmap sector */ #define BM_EXT_TO_SECT(x) ((sector_t)(x) << (BM_EXT_SHIFT-9)) #define BM_SECT_PER_EXT BM_EXT_TO_SECT(1) /* in one sector of the bitmap, we have this many activity_log extents. */ #define AL_EXT_PER_BM_SECT (1 << (BM_EXT_SHIFT - AL_EXTENT_SHIFT)) #define BM_BLOCKS_PER_BM_EXT_B (BM_EXT_SHIFT - BM_BLOCK_SHIFT) #define BM_BLOCKS_PER_BM_EXT_MASK ((1<<BM_BLOCKS_PER_BM_EXT_B) - 1) /* ... */ #if DRBD_MAX_BIO_SIZE > BIO_MAX_SIZE #error Architecture not supported: DRBD_MAX_BIO_SIZE > BIO_MAX_SIZE #endif #define DRBD_MAX_BIO_SIZE_SAFE (1U << 12) /* Works always = 4k */ #define DRBD_MAX_SIZE_H80_PACKET (1U << 15) /* Header 80 only allows packets up to 32KiB data */ #define DRBD_MAX_BIO_SIZE_P95 (1U << 17) /* Protocol 95 to 99 allows bios up to 128KiB */ /* For now, don't allow more than one activity log extent worth of data * to be discarded in one go. We may need to rework drbd_al_begin_io() * to allow for even larger discard ranges */ #define DRBD_MAX_DISCARD_SIZE AL_EXTENT_SIZE #define DRBD_MAX_DISCARD_SECTORS (DRBD_MAX_DISCARD_SIZE >> 9) extern int drbd_bm_init(struct drbd_device *device); extern int drbd_bm_resize(struct drbd_device *device, sector_t sectors, int set_new_bits); extern void drbd_bm_cleanup(struct drbd_device *device); extern void drbd_bm_set_all(struct drbd_device *device); extern void drbd_bm_clear_all(struct drbd_device *device); /* set/clear/test only a few bits at a time */ extern int drbd_bm_set_bits( struct drbd_device *device, unsigned long s, unsigned long e); extern int drbd_bm_clear_bits( struct drbd_device *device, unsigned long s, unsigned long e); extern int drbd_bm_count_bits( struct drbd_device *device, const unsigned long s, const unsigned long e); /* bm_set_bits variant for use while holding drbd_bm_lock, * may process the whole bitmap in one go */ extern void _drbd_bm_set_bits(struct drbd_device *device, const unsigned long s, 
const unsigned long e); extern int drbd_bm_test_bit(struct drbd_device *device, unsigned long bitnr); extern int drbd_bm_e_weight(struct drbd_device *device, unsigned long enr); extern int drbd_bm_write_page(struct drbd_device *device, unsigned int idx) __must_hold(local); extern int drbd_bm_read(struct drbd_device *device) __must_hold(local); extern void drbd_bm_mark_for_writeout(struct drbd_device *device, int page_nr); extern int drbd_bm_write(struct drbd_device *device) __must_hold(local); extern int drbd_bm_write_hinted(struct drbd_device *device) __must_hold(local); extern int drbd_bm_write_all(struct drbd_device *device) __must_hold(local); extern int drbd_bm_write_copy_pages(struct drbd_device *device) __must_hold(local); extern size_t drbd_bm_words(struct drbd_device *device); extern unsigned long drbd_bm_bits(struct drbd_device *device); extern sector_t drbd_bm_capacity(struct drbd_device *device); #define DRBD_END_OF_BITMAP (~(unsigned long)0) extern unsigned long drbd_bm_find_next(struct drbd_device *device, unsigned long bm_fo); /* bm_find_next variants for use while you hold drbd_bm_lock() */ extern unsigned long _drbd_bm_find_next(struct drbd_device *device, unsigned long bm_fo); extern unsigned long _drbd_bm_find_next_zero(struct drbd_device *device, unsigned long bm_fo); extern unsigned long _drbd_bm_total_weight(struct drbd_device *device); extern unsigned long drbd_bm_total_weight(struct drbd_device *device); extern int drbd_bm_rs_done(struct drbd_device *device); /* for receive_bitmap */ extern void drbd_bm_merge_lel(struct drbd_device *device, size_t offset, size_t number, unsigned long *buffer); /* for _drbd_send_bitmap */ extern void drbd_bm_get_lel(struct drbd_device *device, size_t offset, size_t number, unsigned long *buffer); extern void drbd_bm_lock(struct drbd_device *device, char *why, enum bm_flag flags); extern void drbd_bm_unlock(struct drbd_device *device); /* drbd_main.c */ extern struct kmem_cache *drbd_request_cache; extern 
struct kmem_cache *drbd_ee_cache; /* peer requests */ extern struct kmem_cache *drbd_bm_ext_cache; /* bitmap extents */ extern struct kmem_cache *drbd_al_ext_cache; /* activity log extents */ extern mempool_t *drbd_request_mempool; extern mempool_t *drbd_ee_mempool; /* drbd's page pool, used to buffer data received from the peer, * or data requested by the peer. * * This does not have an emergency reserve. * * When allocating from this pool, it first takes pages from the pool. * Only if the pool is depleted will it try to allocate from the system. * * The assumption is that pages taken from this pool will be processed, * and given back, "quickly", and then can be recycled, so we can avoid * frequent calls to alloc_page(), and still will be able to make progress even * under memory pressure. */ extern struct page *drbd_pp_pool; extern spinlock_t drbd_pp_lock; extern int drbd_pp_vacant; extern wait_queue_head_t drbd_pp_wait; /* We also need a standard (emergency-reserve backed) page pool * for meta data IO (activity log, bitmap). * We can keep it global, as long as it is used as "N pages at a time". * 128 should be plenty, currently we probably can get away with as few as 1. 
*/ #define DRBD_MIN_POOL_PAGES 128 extern mempool_t *drbd_md_io_page_pool; /* We also need to make sure we get a bio * when we need it for housekeeping purposes */ extern struct bio_set *drbd_md_io_bio_set; /* to allocate from that set */ extern struct bio *bio_alloc_drbd(gfp_t gfp_mask); extern rwlock_t global_state_lock; extern int conn_lowest_minor(struct drbd_connection *connection); extern enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsigned int minor); extern void drbd_delete_device(struct drbd_device *mdev); extern struct drbd_resource *drbd_create_resource(const char *name); extern void drbd_free_resource(struct drbd_resource *resource); extern int set_resource_options(struct drbd_resource *resource, struct res_opts *res_opts); extern struct drbd_connection *conn_create(const char *name, struct res_opts *res_opts); extern void drbd_destroy_connection(struct kref *kref); extern struct drbd_connection *conn_get_by_addrs(void *my_addr, int my_addr_len, void *peer_addr, int peer_addr_len); extern struct drbd_resource *drbd_find_resource(const char *name); extern void drbd_destroy_resource(struct kref *kref); extern void conn_free_crypto(struct drbd_connection *connection); extern int proc_details; /* drbd_req */ extern void do_submit(struct work_struct *ws); extern void __drbd_make_request(struct drbd_device *, struct bio *, unsigned long); extern MAKE_REQUEST_TYPE drbd_make_request(struct request_queue *q, struct bio *bio); extern int drbd_read_remote(struct drbd_device *device, struct drbd_request *req); extern int drbd_merge_bvec(struct request_queue *q, #ifdef HAVE_bvec_merge_data struct bvec_merge_data *bvm, #else struct bio *bvm, #endif struct bio_vec *bvec); extern int is_valid_ar_handle(struct drbd_request *, sector_t); /* drbd_nl.c */ extern int drbd_msg_put_info(struct sk_buff *skb, const char *info); extern void drbd_suspend_io(struct drbd_device *device); extern void drbd_resume_io(struct drbd_device *device); extern 
char *ppsize(char *buf, unsigned long long size); extern sector_t drbd_new_dev_size(struct drbd_device *, struct drbd_backing_dev *, sector_t, int); enum determine_dev_size { DS_ERROR_SHRINK = -3, DS_ERROR_SPACE_MD = -2, DS_ERROR = -1, DS_UNCHANGED = 0, DS_SHRUNK = 1, DS_GREW = 2, DS_GREW_FROM_ZERO = 3, }; extern enum determine_dev_size drbd_determine_dev_size(struct drbd_device *, enum dds_flags, struct resize_parms *) __must_hold(local); extern void resync_after_online_grow(struct drbd_device *); extern void drbd_reconsider_max_bio_size(struct drbd_device *device); extern enum drbd_state_rv drbd_set_role(struct drbd_device *device, enum drbd_role new_role, int force); extern bool conn_try_outdate_peer(struct drbd_connection *connection); extern void conn_try_outdate_peer_async(struct drbd_connection *connection); extern int drbd_khelper(struct drbd_device *device, char *cmd); /* drbd_worker.c */ extern int drbd_worker(struct drbd_thread *thi); enum drbd_ret_code drbd_resync_after_valid(struct drbd_device *device, int o_minor); void drbd_resync_after_changed(struct drbd_device *device); extern void drbd_start_resync(struct drbd_device *device, enum drbd_conns side); extern void resume_next_sg(struct drbd_device *device); extern void suspend_other_sg(struct drbd_device *device); extern int drbd_resync_finished(struct drbd_device *device); /* maybe rather drbd_main.c ? 
*/ extern void *drbd_md_get_buffer(struct drbd_device *device); extern void drbd_md_put_buffer(struct drbd_device *device); extern int drbd_md_sync_page_io(struct drbd_device *device, struct drbd_backing_dev *bdev, sector_t sector, int rw); extern void drbd_ov_out_of_sync_found(struct drbd_device *, sector_t, int); extern void wait_until_done_or_force_detached(struct drbd_device *device, struct drbd_backing_dev *bdev, unsigned int *done); extern void drbd_rs_controller_reset(struct drbd_device *device); static inline void ov_out_of_sync_print(struct drbd_device *device) { if (device->ov_last_oos_size) { drbd_err(device, "Out of sync: start=%llu, size=%lu (sectors)\n", (unsigned long long)device->ov_last_oos_start, (unsigned long)device->ov_last_oos_size); } device->ov_last_oos_size=0; } extern void drbd_csum_bio(struct crypto_hash *, struct bio *, void *); extern void drbd_csum_ee(struct crypto_hash *, struct drbd_peer_request *, void *); /* worker callbacks */ extern int w_e_end_data_req(struct drbd_work *, int); extern int w_e_end_rsdata_req(struct drbd_work *, int); extern int w_e_end_csum_rs_req(struct drbd_work *, int); extern int w_e_end_ov_reply(struct drbd_work *, int); extern int w_e_end_ov_req(struct drbd_work *, int); extern int w_ov_finished(struct drbd_work *, int); extern int w_resync_timer(struct drbd_work *, int); extern int w_send_write_hint(struct drbd_work *, int); extern int w_send_dblock(struct drbd_work *, int); extern int w_send_read_req(struct drbd_work *, int); extern int w_e_reissue(struct drbd_work *, int); extern int w_restart_disk_io(struct drbd_work *, int); extern int w_send_out_of_sync(struct drbd_work *, int); extern int w_start_resync(struct drbd_work *, int); extern void resync_timer_fn(unsigned long data); extern void start_resync_timer_fn(unsigned long data); extern void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req); /* drbd_receiver.c */ extern int drbd_receiver(struct drbd_thread *thi); extern int 
drbd_asender(struct drbd_thread *thi); extern bool drbd_rs_c_min_rate_throttle(struct drbd_device *device); extern bool drbd_rs_should_slow_down(struct drbd_device *device, sector_t sector); extern int drbd_submit_peer_request(struct drbd_device *, struct drbd_peer_request *, const unsigned, const int); extern int drbd_free_peer_reqs(struct drbd_device *, struct list_head *); extern struct drbd_peer_request *drbd_alloc_peer_req(struct drbd_peer_device *, u64, sector_t, unsigned int, bool, gfp_t) __must_hold(local); extern void __drbd_free_peer_req(struct drbd_device *, struct drbd_peer_request *, int); #define drbd_free_peer_req(m,e) __drbd_free_peer_req(m, e, 0) #define drbd_free_net_peer_req(m,e) __drbd_free_peer_req(m, e, 1) extern struct page *drbd_alloc_pages(struct drbd_peer_device *, unsigned int, bool); extern void drbd_set_recv_tcq(struct drbd_device *device, int tcq_enabled); extern void _drbd_clear_done_ee(struct drbd_device *device, struct list_head *to_be_freed); extern int drbd_connected(struct drbd_peer_device *); /* Yes, there is kernel_setsockopt, but only since 2.6.18. * So we have our own copy of it here. 
*/ static inline int drbd_setsockopt(struct socket *sock, int level, int optname, char *optval, int optlen) { mm_segment_t oldfs = get_fs(); char __user *uoptval; int err; uoptval = (char __user __force *)optval; set_fs(KERNEL_DS); if (level == SOL_SOCKET) err = sock_setsockopt(sock, level, optname, uoptval, optlen); else err = sock->ops->setsockopt(sock, level, optname, uoptval, optlen); set_fs(oldfs); return err; } static inline void drbd_tcp_cork(struct socket *sock) { int val = 1; (void) drbd_setsockopt(sock, SOL_TCP, TCP_CORK, (char*)&val, sizeof(val)); } static inline void drbd_tcp_uncork(struct socket *sock) { int val = 0; (void) drbd_setsockopt(sock, SOL_TCP, TCP_CORK, (char*)&val, sizeof(val)); } static inline void drbd_tcp_nodelay(struct socket *sock) { int val = 1; (void) drbd_setsockopt(sock, SOL_TCP, TCP_NODELAY, (char*)&val, sizeof(val)); } static inline void drbd_tcp_quickack(struct socket *sock) { int val = 2; (void) drbd_setsockopt(sock, SOL_TCP, TCP_QUICKACK, (char*)&val, sizeof(val)); } static inline sector_t drbd_get_capacity(struct block_device *bdev) { /* return bdev ? get_capacity(bdev->bd_disk) : 0; */ return bdev ? i_size_read(bdev->bd_inode) >> 9 : 0; } /* sets the number of 512 byte sectors of our virtual device */ static inline void drbd_set_my_capacity(struct drbd_device *device, sector_t size) { /* set_capacity(device->this_bdev->bd_disk, size); */ set_capacity(device->vdisk, size); device->this_bdev->bd_inode->i_size = (loff_t)size << 9; } static inline void drbd_kobject_uevent(struct drbd_device *device) { kobject_uevent(disk_to_kobj(device->vdisk), KOBJ_CHANGE); /* rhel4 / sles9 and older don't have this at all, * which means user space (udev) won't get events about possible changes of * corresponding resource + disk names after the initial drbd minor creation. 
*/ } /* * used to submit our private bio */ static inline void drbd_generic_make_request(struct drbd_device *device, int fault_type, struct bio *bio) { __release(local); if (!bio->bi_bdev) { printk(KERN_ERR "drbd%d: drbd_generic_make_request: " "bio->bi_bdev == NULL\n", device_to_minor(device)); dump_stack(); bio_endio(bio, -ENODEV); return; } if (drbd_insert_fault(device, fault_type)) bio_endio(bio, -EIO); else generic_make_request(bio); } void drbd_bump_write_ordering(struct drbd_connection *connection, enum write_ordering_e wo); /* drbd_proc.c */ extern struct proc_dir_entry *drbd_proc; extern const struct file_operations drbd_proc_fops; extern const char *drbd_conn_str(enum drbd_conns s); extern const char *drbd_role_str(enum drbd_role s); /* drbd_actlog.c */ extern bool drbd_al_begin_io_prepare(struct drbd_device *device, struct drbd_interval *i); extern int drbd_al_begin_io_nonblock(struct drbd_device *device, struct drbd_interval *i); extern void drbd_al_begin_io_commit(struct drbd_device *device, bool delegate); extern bool drbd_al_begin_io_fastpath(struct drbd_device *device, struct drbd_interval *i); extern void drbd_al_begin_io(struct drbd_device *device, struct drbd_interval *i, bool delegate); extern void drbd_al_complete_io(struct drbd_device *device, struct drbd_interval *i); extern void drbd_rs_complete_io(struct drbd_device *device, sector_t sector); extern int drbd_rs_begin_io(struct drbd_device *device, sector_t sector); extern int drbd_try_rs_begin_io(struct drbd_device *device, sector_t sector); extern void drbd_rs_cancel_all(struct drbd_device *device); extern int drbd_rs_del_all(struct drbd_device *device); extern void drbd_rs_failed_io(struct drbd_device *device, sector_t sector, int size); extern void drbd_advance_rs_marks(struct drbd_device *device, unsigned long still_to_go); extern void __drbd_set_in_sync(struct drbd_device *device, sector_t sector, int size, const char *file, const unsigned int line); #define drbd_set_in_sync(device, 
sector, size) \ __drbd_set_in_sync(device, sector, size, __FILE__, __LINE__) extern int __drbd_set_out_of_sync(struct drbd_device *device, sector_t sector, int size, const char *file, const unsigned int line); #define drbd_set_out_of_sync(device, sector, size) \ __drbd_set_out_of_sync(device, sector, size, __FILE__, __LINE__) extern void drbd_al_shrink(struct drbd_device *device); extern int drbd_initialize_al(struct drbd_device *, void *); /* drbd_sysfs.c */ extern struct kobj_type drbd_bdev_kobj_type; /* drbd_nl.c */ /* state info broadcast */ struct sib_info { enum drbd_state_info_bcast_reason sib_reason; union { struct { char *helper_name; unsigned helper_exit_code; }; struct { union drbd_state os; union drbd_state ns; }; }; }; void drbd_bcast_event(struct drbd_device *device, const struct sib_info *sib); /* * inline helper functions *************************/ /* see also page_chain_add and friends in drbd_receiver.c */ static inline struct page *page_chain_next(struct page *page) { return (struct page *)page_private(page); } #define page_chain_for_each(page) \ for (; page && ({ prefetch(page_chain_next(page)); 1; }); \ page = page_chain_next(page)) #define page_chain_for_each_safe(page, n) \ for (; page && ({ n = page_chain_next(page); 1; }); page = n) static inline int drbd_peer_req_has_active_page(struct drbd_peer_request *peer_req) { struct page *page = peer_req->pages; page_chain_for_each(page) { if (page_count(page) > 1) return 1; } return 0; } static inline enum drbd_state_rv _drbd_set_state(struct drbd_device *device, union drbd_state ns, enum chg_state_flags flags, struct completion *done) { enum drbd_state_rv rv; read_lock(&global_state_lock); rv = __drbd_set_state(device, ns, flags, done); read_unlock(&global_state_lock); return rv; } static inline union drbd_state drbd_read_state(struct drbd_device *device) { struct drbd_resource *resource = device->resource; union drbd_state rv; rv.i = device->state.i; rv.susp = resource->susp; rv.susp_nod = 
resource->susp_nod; rv.susp_fen = resource->susp_fen; return rv; } enum drbd_force_detach_flags { DRBD_READ_ERROR, DRBD_WRITE_ERROR, DRBD_META_IO_ERROR, DRBD_FORCE_DETACH, }; #define __drbd_chk_io_error(m,f) __drbd_chk_io_error_(m,f, __func__) static inline void __drbd_chk_io_error_(struct drbd_device *device, enum drbd_force_detach_flags df, const char *where) { enum drbd_io_error_p ep; rcu_read_lock(); ep = rcu_dereference(device->ldev->disk_conf)->on_io_error; rcu_read_unlock(); switch (ep) { case EP_PASS_ON: /* FIXME would this be better named "Ignore"? */ if (df == DRBD_READ_ERROR || df == DRBD_WRITE_ERROR) { if (DRBD_ratelimit(5*HZ, 5)) drbd_err(device, "Local IO failed in %s.\n", where); if (device->state.disk > D_INCONSISTENT) _drbd_set_state(_NS(device, disk, D_INCONSISTENT), CS_HARD, NULL); break; } /* NOTE fall through for DRBD_META_IO_ERROR or DRBD_FORCE_DETACH */ case EP_DETACH: case EP_CALL_HELPER: /* Remember whether we saw a READ or WRITE error. * * Recovery of the affected area for WRITE failure is covered * by the activity log. * READ errors may fall outside that area though. Certain READ * errors can be "healed" by writing good data to the affected * blocks, which triggers block re-allocation in lower layers. * * If we can not write the bitmap after a READ error, * we may need to trigger a full sync (see w_go_diskless()). * * Force-detach is not really an IO error, but rather a * desperate measure to try to deal with a completely * unresponsive lower level IO stack. * Still it should be treated as a WRITE error. * * Meta IO error is always WRITE error: * we read meta data only once during attach, * which will fail in case of errors. */ set_bit(WAS_IO_ERROR, &device->flags); if (df == DRBD_READ_ERROR) set_bit(WAS_READ_ERROR, &device->flags); if (df == DRBD_FORCE_DETACH) set_bit(FORCE_DETACH, &device->flags); if (device->state.disk > D_FAILED) { _drbd_set_state(_NS(device, disk, D_FAILED), CS_HARD, NULL); drbd_err(device, "Local IO failed in %s. 
Detaching...\n", where);
		}
		break;
	}
}

/**
 * drbd_chk_io_error: Handle the on_io_error setting, should be called from all io completion handlers
 * @device:	 DRBD device.
 * @error:	 Error code passed to the IO completion callback
 * @forcedetach: Force detach. I.e. the error happened while accessing the meta data
 *
 * See also drbd_main.c:after_state_ch() if (os.disk > D_FAILED && ns.disk == D_FAILED)
 */
#define drbd_chk_io_error(m,e,f) drbd_chk_io_error_(m,e,f, __func__)
static inline void drbd_chk_io_error_(struct drbd_device *device,
	int error, enum drbd_force_detach_flags forcedetach, const char *where)
{
	if (error) {
		unsigned long flags;
		spin_lock_irqsave(&device->resource->req_lock, flags);
		__drbd_chk_io_error_(device, forcedetach, where);
		spin_unlock_irqrestore(&device->resource->req_lock, flags);
	}
}

/**
 * drbd_md_first_sector() - Returns the first sector number of the meta data area
 * @bdev:	Meta data block device.
 *
 * BTW, for internal meta data, this happens to be the maximum capacity
 * we could agree upon with our peer node.
 */
static inline sector_t drbd_md_first_sector(struct drbd_backing_dev *bdev)
{
	switch (bdev->md.meta_dev_idx) {
	case DRBD_MD_INDEX_INTERNAL:
	case DRBD_MD_INDEX_FLEX_INT:
		return bdev->md.md_offset + bdev->md.bm_offset;
	case DRBD_MD_INDEX_FLEX_EXT:
	default:
		return bdev->md.md_offset;
	}
}

/**
 * drbd_md_last_sector() - Return the last sector number of the meta data area
 * @bdev:	Meta data block device.
 */
static inline sector_t drbd_md_last_sector(struct drbd_backing_dev *bdev)
{
	switch (bdev->md.meta_dev_idx) {
	case DRBD_MD_INDEX_INTERNAL:
	case DRBD_MD_INDEX_FLEX_INT:
		return bdev->md.md_offset + MD_4kB_SECT -1;
	case DRBD_MD_INDEX_FLEX_EXT:
	default:
		return bdev->md.md_offset + bdev->md.md_size_sect -1;
	}
}

/**
 * drbd_get_max_capacity() - Returns the capacity we announce to our peer
 * @bdev:	Meta data block device.
 *
 * returns the capacity we announce to our peer.
we clip ourselves at the * various MAX_SECTORS, because if we don't, current implementation will * oops sooner or later */ static inline sector_t drbd_get_max_capacity(struct drbd_backing_dev *bdev) { sector_t s; switch (bdev->md.meta_dev_idx) { case DRBD_MD_INDEX_INTERNAL: case DRBD_MD_INDEX_FLEX_INT: s = drbd_get_capacity(bdev->backing_bdev) ? min_t(sector_t, DRBD_MAX_SECTORS_FLEX, drbd_md_first_sector(bdev)) : 0; break; case DRBD_MD_INDEX_FLEX_EXT: s = min_t(sector_t, DRBD_MAX_SECTORS_FLEX, drbd_get_capacity(bdev->backing_bdev)); /* clip at maximum size the meta device can support */ s = min_t(sector_t, s, BM_EXT_TO_SECT(bdev->md.md_size_sect - bdev->md.bm_offset)); break; default: s = min_t(sector_t, DRBD_MAX_SECTORS, drbd_get_capacity(bdev->backing_bdev)); } return s; } /** * drbd_md_ss() - Return the sector number of our meta data super block * @bdev: Meta data block device. */ static inline sector_t drbd_md_ss(struct drbd_backing_dev *bdev) { const int meta_dev_idx = bdev->md.meta_dev_idx; if (meta_dev_idx == DRBD_MD_INDEX_FLEX_EXT) return 0; /* Since drbd08, internal meta data is always "flexible". 
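	 * For internal meta data, the return value below rounds the capacity
	 * down to a multiple of 8 sectors (4k) and then steps back 8 sectors.
	 * Worked example (illustrative value, not taken from a real device):
	 * a capacity of 1000005 sectors gives (1000005 & ~7ULL) - 8 = 999992,
	 * so the super block occupies sectors 999992 to 999999.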
* position: last 4k aligned block of 4k size */ if (meta_dev_idx == DRBD_MD_INDEX_INTERNAL || meta_dev_idx == DRBD_MD_INDEX_FLEX_INT) return (drbd_get_capacity(bdev->backing_bdev) & ~7ULL) - 8; /* external, some index; this is the old fixed size layout */ return MD_128MB_SECT * bdev->md.meta_dev_idx; } static inline void drbd_queue_work_front(struct drbd_work_queue *q, struct drbd_work *w) { unsigned long flags; spin_lock_irqsave(&q->q_lock, flags); list_add(&w->list, &q->q); spin_unlock_irqrestore(&q->q_lock, flags); wake_up(&q->q_wait); } static inline void drbd_queue_work(struct drbd_work_queue *q, struct drbd_work *w) { unsigned long flags; spin_lock_irqsave(&q->q_lock, flags); list_add_tail(&w->list, &q->q); spin_unlock_irqrestore(&q->q_lock, flags); wake_up(&q->q_wait); } extern void drbd_flush_workqueue(struct drbd_work_queue *work_queue); static inline void wake_asender(struct drbd_connection *connection) { if (test_bit(SIGNAL_ASENDER, &connection->flags)) force_sig(DRBD_SIG, connection->asender.task); } static inline void request_ping(struct drbd_connection *connection) { set_bit(SEND_PING, &connection->flags); wake_asender(connection); } extern void *conn_prepare_command(struct drbd_connection *, struct drbd_socket *); extern void *drbd_prepare_command(struct drbd_peer_device *, struct drbd_socket *); extern int conn_send_command(struct drbd_connection *, struct drbd_socket *, enum drbd_packet, unsigned int, void *, unsigned int); extern int drbd_send_command(struct drbd_peer_device *, struct drbd_socket *, enum drbd_packet, unsigned int, void *, unsigned int); extern int drbd_send_ping(struct drbd_connection *connection); extern int drbd_send_ping_ack(struct drbd_connection *connection); extern int drbd_send_state_req(struct drbd_peer_device *, union drbd_state, union drbd_state); extern int conn_send_state_req(struct drbd_connection *, union drbd_state, union drbd_state); static inline void drbd_thread_stop(struct drbd_thread *thi) { 
	_drbd_thread_stop(thi, false, true);
}

static inline void drbd_thread_stop_nowait(struct drbd_thread *thi)
{
	_drbd_thread_stop(thi, false, false);
}

static inline void drbd_thread_restart_nowait(struct drbd_thread *thi)
{
	_drbd_thread_stop(thi, true, false);
}

/* counts how many answer packets we expect from our peer,
 * for either explicit application requests,
 * or implicit barrier packets as necessary.
 * increased:
 *  w_send_barrier
 *  _req_mod(req, QUEUE_FOR_NET_WRITE or QUEUE_FOR_NET_READ);
 *    it is much easier and equally valid to count what we queue for the
 *    worker, even before it actually was queued or sent.
 *    (drbd_make_request_common; recovery path on read io-error)
 * decreased:
 *  got_BarrierAck (respective tl_clear, tl_clear_barrier)
 *  _req_mod(req, DATA_RECEIVED)
 *     [from receive_DataReply]
 *  _req_mod(req, WRITE_ACKED_BY_PEER or RECV_ACKED_BY_PEER or NEG_ACKED)
 *     [from got_BlockAck (P_WRITE_ACK, P_RECV_ACK)]
 *     FIXME
 *     for some reason it is NOT decreased in got_NegAck,
 *     but in the resulting cleanup code from report_params.
 *     we should try to remember the reason for that...
* _req_mod(req, SEND_FAILED or SEND_CANCELED) * _req_mod(req, CONNECTION_LOST_WHILE_PENDING) * [from tl_clear_barrier] */ static inline void inc_ap_pending(struct drbd_device *device) { atomic_inc(&device->ap_pending_cnt); } #define ERR_IF_CNT_IS_NEGATIVE(which, func, line) \ if (atomic_read(&device->which) < 0) \ drbd_err(device, "in %s:%d: " #which " = %d < 0 !\n", \ func, line, \ atomic_read(&device->which)) #define dec_ap_pending(device) _dec_ap_pending(device, __FUNCTION__, __LINE__) static inline void _dec_ap_pending(struct drbd_device *device, const char *func, int line) { if (atomic_dec_and_test(&device->ap_pending_cnt)) wake_up(&device->misc_wait); ERR_IF_CNT_IS_NEGATIVE(ap_pending_cnt, func, line); } /* counts how many resync-related answers we still expect from the peer * increase decrease * C_SYNC_TARGET sends P_RS_DATA_REQUEST (and expects P_RS_DATA_REPLY) * C_SYNC_SOURCE sends P_RS_DATA_REPLY (and expects P_WRITE_ACK with ID_SYNCER) * (or P_NEG_ACK with ID_SYNCER) */ static inline void inc_rs_pending(struct drbd_device *device) { atomic_inc(&device->rs_pending_cnt); } #define dec_rs_pending(device) _dec_rs_pending(device, __FUNCTION__, __LINE__) static inline void _dec_rs_pending(struct drbd_device *device, const char *func, int line) { atomic_dec(&device->rs_pending_cnt); ERR_IF_CNT_IS_NEGATIVE(rs_pending_cnt, func, line); } /* counts how many answers we still need to send to the peer. 
* increased on * receive_Data unless protocol A; * we need to send a P_RECV_ACK (proto B) * or P_WRITE_ACK (proto C) * receive_RSDataReply (recv_resync_read) we need to send a P_WRITE_ACK * receive_DataRequest (receive_RSDataRequest) we need to send back P_DATA * receive_Barrier_* we need to send a P_BARRIER_ACK */ static inline void inc_unacked(struct drbd_device *device) { atomic_inc(&device->unacked_cnt); } #define dec_unacked(device) _dec_unacked(device, __FUNCTION__, __LINE__) static inline void _dec_unacked(struct drbd_device *device, const char *func, int line) { atomic_dec(&device->unacked_cnt); ERR_IF_CNT_IS_NEGATIVE(unacked_cnt, func, line); } #define sub_unacked(device, n) _sub_unacked(device, n, __FUNCTION__, __LINE__) static inline void _sub_unacked(struct drbd_device *device, int n, const char *func, int line) { atomic_sub(n, &device->unacked_cnt); ERR_IF_CNT_IS_NEGATIVE(unacked_cnt, func, line); } /** * get_ldev() - Increase the ref count on device->ldev. Returns 0 if there is no ldev * @M: DRBD device. * * You have to call put_ldev() when finished working with device->ldev. */ #define get_ldev(M) __cond_lock(local, _get_ldev_if_state(M,D_INCONSISTENT)) #define get_ldev_if_state(M,MINS) __cond_lock(local, _get_ldev_if_state(M,MINS)) static inline void put_ldev(struct drbd_device *device) { int i = atomic_dec_return(&device->local_cnt); /* This may be called from some endio handler, * so we must not sleep here. */ __release(local); D_ASSERT(device, i >= 0); if (i == 0) { if (device->state.disk == D_DISKLESS) /* even internal references gone, safe to destroy */ drbd_ldev_destroy(device); if (device->state.disk == D_FAILED) { /* all application IO references gone. 
*/
			if (!test_and_set_bit(GO_DISKLESS, &device->flags))
				drbd_queue_work(&first_peer_device(device)->connection->sender_work,
						&device->go_diskless);
		}
		wake_up(&device->misc_wait);
	}
}

#ifndef __CHECKER__
static inline int _get_ldev_if_state(struct drbd_device *device, enum drbd_disk_state mins)
{
	int io_allowed;

	/* never get a reference while D_DISKLESS */
	if (device->state.disk == D_DISKLESS)
		return 0;

	atomic_inc(&device->local_cnt);
	io_allowed = (device->state.disk >= mins);
	if (!io_allowed)
		put_ldev(device);
	return io_allowed;
}
#else
extern int _get_ldev_if_state(struct drbd_device *device, enum drbd_disk_state mins);
#endif

/* you must have a "get_ldev" reference */
static inline void drbd_get_syncer_progress(struct drbd_device *device,
		unsigned long *bits_left, unsigned int *per_mil_done)
{
	/* this is to break it at compile time when we change that, in case we
	 * want to support more than (1<<32) bits on a 32bit arch. */
	typecheck(unsigned long, device->rs_total);

	/* note: both rs_total and rs_left are in bits, i.e. in
	 * units of BM_BLOCK_SIZE.
	 * for the percentage, we don't care. */

	if (device->state.conn == C_VERIFY_S || device->state.conn == C_VERIFY_T)
		*bits_left = device->ov_left;
	else
		*bits_left = drbd_bm_total_weight(device) - device->rs_failed;
	/* >> 10 to prevent overflow,
	 * +1 to prevent division by zero */
	if (*bits_left > device->rs_total) {
		/* doh. maybe a logic bug somewhere.
		 * may also be just a race condition
		 * between this and a disconnect during sync.
		 * for now, just prevent in-kernel buffer overflow.
		 */
		smp_rmb();
		drbd_warn(device, "cs:%s rs_left=%lu > rs_total=%lu (rs_failed %lu)\n",
				drbd_conn_str(device->state.conn),
				*bits_left, device->rs_total, device->rs_failed);
		*per_mil_done = 0;
	} else {
		/* Make sure the division happens in long context.
		 * We allow up to one petabyte storage right now,
		 * at a granularity of 4k per bit that is 2**38 bits.
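		 * (Worked arithmetic, reading one petabyte as 2**50 bytes:
		 * one bit covers 2**12 bytes, hence 2**50 / 2**12 = 2**38 bits.
		 * With the shift of 16 chosen below for rs_total > UINT_MAX,
		 * 2**38 >> 16 = 2**22 = 4194304, and 4194304 * 1000 =
		 * 4194304000, which is below 2**32 = 4294967296.)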
* After shift right and multiplication by 1000, * this should still fit easily into a 32bit long, * so we don't need a 64bit division on 32bit arch. * Note: currently we don't support such large bitmaps on 32bit * arch anyways, but no harm done to be prepared for it here. */ unsigned int shift = device->rs_total > UINT_MAX ? 16 : 10; unsigned long left = *bits_left >> shift; unsigned long total = 1UL + (device->rs_total >> shift); unsigned long tmp = 1000UL - left * 1000UL/total; *per_mil_done = tmp; } } /* this throttles on-the-fly application requests * according to max_buffers settings; * maybe re-implement using semaphores? */ static inline int drbd_get_max_buffers(struct drbd_device *device) { struct net_conf *nc; int mxb; rcu_read_lock(); nc = rcu_dereference(first_peer_device(device)->connection->net_conf); mxb = nc ? nc->max_buffers : 1000000; /* arbitrary limit on open requests */ rcu_read_unlock(); return mxb; } static inline int drbd_state_is_stable(struct drbd_device *device) { union drbd_dev_state s = device->state; /* DO NOT add a default clause, we want the compiler to warn us * for any newly introduced state we may have forgotten to add here */ switch ((enum drbd_conns)s.conn) { /* new io only accepted when there is no connection, ... */ case C_STANDALONE: case C_WF_CONNECTION: /* ... or there is a well established connection. 
*/ case C_CONNECTED: case C_SYNC_SOURCE: case C_SYNC_TARGET: case C_VERIFY_S: case C_VERIFY_T: case C_PAUSED_SYNC_S: case C_PAUSED_SYNC_T: case C_AHEAD: case C_BEHIND: /* transitional states, IO allowed */ case C_DISCONNECTING: case C_UNCONNECTED: case C_TIMEOUT: case C_BROKEN_PIPE: case C_NETWORK_FAILURE: case C_PROTOCOL_ERROR: case C_TEAR_DOWN: case C_WF_REPORT_PARAMS: case C_STARTING_SYNC_S: case C_STARTING_SYNC_T: break; /* Allow IO in BM exchange states with new protocols */ case C_WF_BITMAP_S: if (first_peer_device(device)->connection->agreed_pro_version < 96) return 0; break; /* no new io accepted in these states */ case C_WF_BITMAP_T: case C_WF_SYNC_UUID: case C_MASK: /* not "stable" */ return 0; } switch ((enum drbd_disk_state)s.disk) { case D_DISKLESS: case D_INCONSISTENT: case D_OUTDATED: case D_CONSISTENT: case D_UP_TO_DATE: case D_FAILED: /* disk state is stable as well. */ break; /* no new io accepted during transitional states */ case D_ATTACHING: case D_NEGOTIATING: case D_UNKNOWN: case D_MASK: /* not "stable" */ return 0; } return 1; } static inline int drbd_suspended(struct drbd_device *device) { struct drbd_resource *resource = device->resource; return resource->susp || resource->susp_fen || resource->susp_nod; } static inline bool may_inc_ap_bio(struct drbd_device *device) { int mxb = drbd_get_max_buffers(device); if (drbd_suspended(device)) return false; if (test_bit(SUSPEND_IO, &device->flags)) return false; /* to avoid potential deadlock or bitmap corruption, * in various places, we only allow new application io * to start during "stable" states. */ /* no new io accepted when attaching or detaching the disk */ if (!drbd_state_is_stable(device)) return false; /* since some older kernels don't have atomic_add_unless, * and we are within the spinlock anyways, we have this workaround. 
*/ if (atomic_read(&device->ap_bio_cnt) > mxb) return false; if (test_bit(BITMAP_IO, &device->flags)) return false; return true; } static inline bool inc_ap_bio_cond(struct drbd_device *device) { bool rv = false; spin_lock_irq(&device->resource->req_lock); rv = may_inc_ap_bio(device); if (rv) atomic_inc(&device->ap_bio_cnt); spin_unlock_irq(&device->resource->req_lock); return rv; } static inline void inc_ap_bio(struct drbd_device *device) { /* we wait here * as long as the device is suspended * until the bitmap is no longer on the fly during connection * handshake as long as we would exceed the max_buffer limit. * * to avoid races with the reconnect code, * we need to atomic_inc within the spinlock. */ wait_event(device->misc_wait, inc_ap_bio_cond(device)); } static inline void dec_ap_bio(struct drbd_device *device) { int mxb = drbd_get_max_buffers(device); int ap_bio = atomic_dec_return(&device->ap_bio_cnt); D_ASSERT(device, ap_bio >= 0); if (ap_bio == 0 && test_bit(BITMAP_IO, &device->flags)) { if (!test_and_set_bit(BITMAP_IO_QUEUED, &device->flags)) drbd_queue_work(&first_peer_device(device)-> connection->sender_work, &device->bm_io_work.w); } /* this currently does wake_up for every dec_ap_bio! * maybe rather introduce some type of hysteresis? * e.g. (ap_bio == mxb/2 || ap_bio == 0) ? 
*/ if (ap_bio < mxb) wake_up(&device->misc_wait); } static inline bool verify_can_do_stop_sector(struct drbd_device *device) { return first_peer_device(device)->connection->agreed_pro_version >= 97 && first_peer_device(device)->connection->agreed_pro_version != 100; } static inline int drbd_set_ed_uuid(struct drbd_device *device, u64 val) { int changed = device->ed_uuid != val; device->ed_uuid = val; return changed; } static inline int drbd_queue_order_type(struct drbd_device *device) { /* sorry, we currently have no working implementation * of distributed TCQ stuff */ #ifndef QUEUE_ORDERED_NONE #define QUEUE_ORDERED_NONE 0 #endif return QUEUE_ORDERED_NONE; } #ifdef blk_queue_plugged static inline void drbd_blk_run_queue(struct request_queue *q) { if (q && q->unplug_fn) q->unplug_fn(q); } static inline void drbd_kick_lo(struct drbd_device *device) { if (get_ldev(device)) { drbd_blk_run_queue(bdev_get_queue(device->ldev->backing_bdev)); put_ldev(device); } } #else static inline void drbd_blk_run_queue(struct request_queue *q) { } static inline void drbd_kick_lo(struct drbd_device *device) { } #endif static inline void drbd_md_flush(struct drbd_device *device) { int r; if (device->ldev == NULL) { drbd_warn(device, "device->ldev == NULL in drbd_md_flush\n"); return; } if (test_bit(MD_NO_BARRIER, &device->flags)) return; r = blkdev_issue_flush(device->ldev->md_bdev, GFP_NOIO, NULL); if (r) { set_bit(MD_NO_BARRIER, &device->flags); drbd_err(device, "meta data flush failed with status %d, disabling md-flushes\n", r); } } /* resync bitmap */ /* 16MB sized 'bitmap extent' to track syncer usage */ struct bm_extent { int rs_left; /* number of bits set (out of sync) in this extent. */ int rs_failed; /* number of failed resync requests in this extent. */ unsigned long flags; struct lc_element lce; }; #define BME_NO_WRITES 0 /* bm_extent.flags: no more requests on this one! */ #define BME_LOCKED 1 /* bm_extent.flags: syncer active on this one. 
*/
#define BME_PRIORITY  2  /* finish resync IO on this extent ASAP! App IO waiting! */

/* Used to be defined in drivers/md/md.h.
 * Since 3.8 it is available from wait.h */
#ifndef wait_event_lock_irq
#define __wait_event_lock_irq(wq, condition, lock, cmd) \
do { \
	wait_queue_t __wait; \
	init_waitqueue_entry(&__wait, current); \
	\
	add_wait_queue(&wq, &__wait); \
	for (;;) { \
		set_current_state(TASK_UNINTERRUPTIBLE); \
		if (condition) \
			break; \
		spin_unlock_irq(&lock); \
		cmd; \
		schedule(); \
		spin_lock_irq(&lock); \
	} \
	current->state = TASK_RUNNING; \
	remove_wait_queue(&wq, &__wait); \
} while (0)

#define wait_event_lock_irq(wq, condition, lock) \
do { \
	if (condition) \
		break; \
	__wait_event_lock_irq(wq, condition, lock, ); \
} while (0)
#endif

static inline struct drbd_connection *first_connection(struct drbd_resource *resource)
{
	return list_first_entry(&resource->connections,
				struct drbd_connection, connections);
}

#endif

drbd-8.4.4/drbd/drbd_interval.c

#include "drbd_interval.h"
#include "drbd_wrappers.h"

/**
 * interval_end  -  return end of @node
 */
static inline
sector_t interval_end(struct rb_node *node)
{
	struct drbd_interval *this = rb_entry(node, struct drbd_interval, rb);
	return this->end;
}

/**
 * update_interval_end  -  recompute end of @node
 *
 * The end of an interval is the highest (start + (size >> 9)) value of this
 * node and of its children.  Called for @node and its parents whenever the end
 * may have changed.
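 *
 * Worked example (illustrative values): a node with sector 100 and size 4096
 * covers [100, 108).  If its left subtree ends at 120 and its right subtree
 * ends at 116, the recomputed end is max(108, 120, 116) = 120.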
*/ static void update_interval_end(struct rb_node *node, void *__unused) { struct drbd_interval *this = rb_entry(node, struct drbd_interval, rb); sector_t end; end = this->sector + (this->size >> 9); if (node->rb_left) { sector_t left = interval_end(node->rb_left); if (left > end) end = left; } if (node->rb_right) { sector_t right = interval_end(node->rb_right); if (right > end) end = right; } this->end = end; } /** * drbd_insert_interval - insert a new interval into a tree */ bool drbd_insert_interval(struct rb_root *root, struct drbd_interval *this) { struct rb_node **new = &root->rb_node, *parent = NULL; BUG_ON(!IS_ALIGNED(this->size, 512)); while (*new) { struct drbd_interval *here = rb_entry(*new, struct drbd_interval, rb); parent = *new; if (this->sector < here->sector) new = &(*new)->rb_left; else if (this->sector > here->sector) new = &(*new)->rb_right; else if (this < here) new = &(*new)->rb_left; else if (this > here) new = &(*new)->rb_right; else return false; } rb_link_node(&this->rb, parent, new); rb_insert_color(&this->rb, root); rb_augment_insert(&this->rb, update_interval_end, NULL); return true; } /** * drbd_contains_interval - check if a tree contains a given interval * @sector: start sector of @interval * @interval: may not be a valid pointer * * Returns if the tree contains the node @interval with start sector @start. * Does not dereference @interval until @interval is known to be a valid object * in @tree. Returns %false if @interval is in the tree but with a different * sector number. 
*/ bool drbd_contains_interval(struct rb_root *root, sector_t sector, struct drbd_interval *interval) { struct rb_node *node = root->rb_node; while (node) { struct drbd_interval *here = rb_entry(node, struct drbd_interval, rb); if (sector < here->sector) node = node->rb_left; else if (sector > here->sector) node = node->rb_right; else if (interval < here) node = node->rb_left; else if (interval > here) node = node->rb_right; else return true; } return false; } /** * drbd_remove_interval - remove an interval from a tree */ void drbd_remove_interval(struct rb_root *root, struct drbd_interval *this) { struct rb_node *deepest; deepest = rb_augment_erase_begin(&this->rb); rb_erase(&this->rb, root); rb_augment_erase_end(deepest, update_interval_end, NULL); } /** * drbd_find_overlap - search for an interval overlapping with [sector, sector + size) * @sector: start sector * @size: size, aligned to 512 bytes * * Returns an interval overlapping with [sector, sector + size), or NULL if * there is none. When there is more than one overlapping interval in the * tree, the interval with the lowest start sector is returned, and all other * overlapping intervals will be on the right side of the tree, reachable with * rb_next(). 
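 *
 * Worked example (illustrative values): a request with sector 8 and size 4096
 * covers [8, 16).  An interval with sector 12 and size 4096 covers [12, 20)
 * and overlaps, because 12 < 16 and 8 < 20.  An interval with sector 16 does
 * not overlap, because the ranges are half-open.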
*/
struct drbd_interval *
drbd_find_overlap(struct rb_root *root, sector_t sector, unsigned int size)
{
	struct rb_node *node = root->rb_node;
	struct drbd_interval *overlap = NULL;
	sector_t end = sector + (size >> 9);

	BUG_ON(!IS_ALIGNED(size, 512));

	while (node) {
		struct drbd_interval *here =
			rb_entry(node, struct drbd_interval, rb);

		if (node->rb_left &&
		    sector < interval_end(node->rb_left)) {
			/* Overlap if any must be on left side */
			node = node->rb_left;
		} else if (here->sector < end &&
			   sector < here->sector + (here->size >> 9)) {
			overlap = here;
			break;
		} else if (sector >= here->sector) {
			/* Overlap if any must be on right side */
			node = node->rb_right;
		} else
			break;
	}
	return overlap;
}

struct drbd_interval *
drbd_next_overlap(struct drbd_interval *i, sector_t sector, unsigned int size)
{
	sector_t end = sector + (size >> 9);
	struct rb_node *node;

	for (;;) {
		node = rb_next(&i->rb);
		if (!node)
			return NULL;
		i = rb_entry(node, struct drbd_interval, rb);
		if (i->sector >= end)
			return NULL;
		if (sector < i->sector + (i->size >> 9))
			return i;
	}
}

drbd-8.4.4/drbd/drbd_interval.h

#ifndef __DRBD_INTERVAL_H
#define __DRBD_INTERVAL_H

#include
#include
#include

/* Compatibility code for 2.6.16 (SLES10) */
#ifndef rb_parent
#define rb_parent(r)   ((r)->rb_parent)
#endif

/*
 * Kernels between mainline commit dd67d051 (v2.6.18-rc1) and 10fd48f2
 * (v2.6.19-rc1) have a broken version of RB_EMPTY_NODE().
 *
 * RHEL5 kernels until at least 2.6.18-238.12.1.el5 have the broken definition.
*/
#if !defined(RB_EMPTY_NODE) || LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,19)
#undef RB_EMPTY_NODE
#define RB_EMPTY_NODE(node)     (rb_parent(node) == node)
#endif

#ifndef RB_CLEAR_NODE
static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p)
{
	rb->rb_parent = p;
}
#define RB_CLEAR_NODE(node)     (rb_set_parent(node, node))
#endif
/* /Compatibility code */

struct drbd_interval {
	struct rb_node rb;
	sector_t sector;	/* start sector of the interval */
	unsigned int size;	/* size in bytes */
	sector_t end;		/* highest interval end in subtree */
	int local:1		/* local or remote request? */;
	int waiting:1;
};

static inline void drbd_clear_interval(struct drbd_interval *i)
{
	RB_CLEAR_NODE(&i->rb);
}

static inline bool drbd_interval_empty(struct drbd_interval *i)
{
	return RB_EMPTY_NODE(&i->rb);
}

extern bool drbd_insert_interval(struct rb_root *, struct drbd_interval *);
extern bool drbd_contains_interval(struct rb_root *, sector_t,
				   struct drbd_interval *);
extern void drbd_remove_interval(struct rb_root *, struct drbd_interval *);
extern struct drbd_interval *drbd_find_overlap(struct rb_root *, sector_t,
					unsigned int);
extern struct drbd_interval *drbd_next_overlap(struct drbd_interval *, sector_t,
					unsigned int);

#define drbd_for_each_overlap(i, root, sector, size) \
	for (i = drbd_find_overlap(root, sector, size); \
	     i; \
	     i = drbd_next_overlap(i, sector, size))

#endif  /* __DRBD_INTERVAL_H */

drbd-8.4.4/drbd/drbd_main.c

/*
   drbd.c

   This file is part of DRBD by Philipp Reisner and Lars Ellenberg.

   Copyright (C) 2001-2008, LINBIT Information Technologies GmbH.
   Copyright (C) 1999-2008, Philipp Reisner .
   Copyright (C) 2002-2008, Lars Ellenberg .

   Thanks to Carter Burden, Bart Grantham and Gennadiy Nerubayev
   from Logicworks, Inc. for making SDP replication support possible.
drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define __KERNEL_SYSCALLS__ #include #include #include #include #include #include "drbd_int.h" #include "drbd_protocol.h" #include "drbd_req.h" /* only for _req_mod in tl_release and tl_clear */ #include "drbd_vli.h" #ifdef COMPAT_HAVE_LINUX_BYTEORDER_SWABB_H #include #else #include #endif #ifdef COMPAT_DRBD_RELEASE_RETURNS_VOID #define DRBD_RELEASE_RETURN void #else #define DRBD_RELEASE_RETURN int #endif #ifdef BD_OPS_USE_FMODE static int drbd_open(struct block_device *bdev, fmode_t mode); static DRBD_RELEASE_RETURN drbd_release(struct gendisk *gd, fmode_t mode); #else static int drbd_open(struct inode *inode, struct file *file); static DRBD_RELEASE_RETURN drbd_release(struct inode *inode, struct file *file); #endif static int w_md_sync(struct drbd_work *w, int unused); static void md_sync_timer_fn(unsigned long data); static int w_bitmap_io(struct drbd_work *w, int unused); static int w_go_diskless(struct drbd_work *w, int unused); static void drbd_destroy_device(struct kobject *kobj); MODULE_AUTHOR("Philipp Reisner , " "Lars Ellenberg "); MODULE_DESCRIPTION("drbd - Distributed Replicated Block Device v" REL_VERSION); MODULE_VERSION(REL_VERSION); MODULE_LICENSE("GPL"); 
MODULE_PARM_DESC(minor_count, "Approximate number of drbd devices (" __stringify(DRBD_MINOR_COUNT_MIN) "-" __stringify(DRBD_MINOR_COUNT_MAX) ")"); MODULE_ALIAS_BLOCKDEV_MAJOR(DRBD_MAJOR); #include /* allow_open_on_secondary */ MODULE_PARM_DESC(allow_oos, "DONT USE!"); /* thanks to these macros, if compiled into the kernel (not-module), * this becomes the boot parameter drbd.minor_count */ module_param(minor_count, uint, 0444); module_param(disable_sendpage, bool, 0644); module_param(allow_oos, bool, 0); module_param(proc_details, int, 0644); #ifdef CONFIG_DRBD_FAULT_INJECTION int enable_faults; int fault_rate; static int fault_count; int fault_devs; /* bitmap of enabled faults */ module_param(enable_faults, int, 0664); /* fault rate % value - applies to all enabled faults */ module_param(fault_rate, int, 0664); /* count of faults inserted */ module_param(fault_count, int, 0664); /* bitmap of devices to insert faults on */ module_param(fault_devs, int, 0644); #endif /* module parameter, defined */ unsigned int minor_count = DRBD_MINOR_COUNT_DEF; bool disable_sendpage; bool allow_oos; int proc_details; /* Detail level in proc drbd*/ /* Module parameter for setting the user mode helper program * to run. 
Default is /sbin/drbdadm */ char usermode_helper[80] = "/sbin/drbdadm"; module_param_string(usermode_helper, usermode_helper, sizeof(usermode_helper), 0644); /* in 2.6.x, our device mapping and config info contains our virtual gendisks * as member "struct gendisk *vdisk;" */ struct idr drbd_devices; struct list_head drbd_resources; struct kmem_cache *drbd_request_cache; struct kmem_cache *drbd_ee_cache; /* peer requests */ struct kmem_cache *drbd_bm_ext_cache; /* bitmap extents */ struct kmem_cache *drbd_al_ext_cache; /* activity log extents */ mempool_t *drbd_request_mempool; mempool_t *drbd_ee_mempool; mempool_t *drbd_md_io_page_pool; struct bio_set *drbd_md_io_bio_set; /* I do not use a standard mempool, because: 1) I want to hand out the pre-allocated objects first. 2) I want to be able to interrupt sleeping allocation with a signal. Note: This is a single linked list, the next pointer is the private member of struct page. */ struct page *drbd_pp_pool; spinlock_t drbd_pp_lock; int drbd_pp_vacant; wait_queue_head_t drbd_pp_wait; static const struct block_device_operations drbd_ops = { .owner = THIS_MODULE, .open = drbd_open, .release = drbd_release, }; static struct kobj_type drbd_device_kobj_type = { .release = drbd_destroy_device, }; #ifdef COMPAT_HAVE_BIO_BI_DESTRUCTOR static void bio_destructor_drbd(struct bio *bio) { bio_free(bio, drbd_md_io_bio_set); } struct bio *bio_alloc_drbd(gfp_t gfp_mask) { struct bio *bio; if (!drbd_md_io_bio_set) return bio_alloc(gfp_mask, 1); bio = bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set); if (!bio) return NULL; bio->bi_destructor = bio_destructor_drbd; return bio; } #else struct bio *bio_alloc_drbd(gfp_t gfp_mask) { struct bio *bio; if (!drbd_md_io_bio_set) return bio_alloc(gfp_mask, 1); bio = bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set); if (!bio) return NULL; return bio; } #endif #ifdef __CHECKER__ /* When checking with sparse, and this is an inline function, sparse will give tons of false positives. 
When this is a real function, sparse works. */
int _get_ldev_if_state(struct drbd_device *device, enum drbd_disk_state mins)
{
	int io_allowed;

	atomic_inc(&device->local_cnt);
	io_allowed = (device->state.disk >= mins);
	if (!io_allowed) {
		if (atomic_dec_and_test(&device->local_cnt))
			wake_up(&device->misc_wait);
	}
	return io_allowed;
}

#endif

/**
 * tl_release() - mark as BARRIER_ACKED all requests in the corresponding transfer log epoch
 * @connection:	DRBD connection.
 * @barrier_nr:	Expected identifier of the DRBD write barrier packet.
 * @set_size:	Expected number of requests before that barrier.
 *
 * In case the passed barrier_nr or set_size does not match the oldest
 * epoch of not yet barrier-acked requests, this function will cause a
 * termination of the connection.
 */
void tl_release(struct drbd_connection *connection, unsigned int barrier_nr,
		unsigned int set_size)
{
	struct drbd_request *r;
	struct drbd_request *req = NULL;
	int expect_epoch = 0;
	int expect_size = 0;

	spin_lock_irq(&connection->resource->req_lock);

	/* find oldest not yet barrier-acked write request,
	 * count writes in its epoch. */
	list_for_each_entry(r, &connection->transfer_log, tl_requests) {
		const unsigned s = r->rq_state;
		if (!req) {
			if (!(s & RQ_WRITE))
				continue;
			if (!(s & RQ_NET_MASK))
				continue;
			if (s & RQ_NET_DONE)
				continue;
			req = r;
			expect_epoch = req->epoch;
			expect_size++;
		} else {
			if (r->epoch != expect_epoch)
				break;
			if (!(s & RQ_WRITE))
				continue;
			/* if (s & RQ_DONE): not expected */
			/* if (!(s & RQ_NET_MASK)): not expected */
			expect_size++;
		}
	}

	/* first some paranoia code */
	if (req == NULL) {
		drbd_err(connection, "BAD! BarrierAck #%u received, but no epoch in tl!?\n",
			 barrier_nr);
		goto bail;
	}
	if (expect_epoch != barrier_nr) {
		drbd_err(connection, "BAD! BarrierAck #%u received, expected #%u!\n",
			 barrier_nr, expect_epoch);
		goto bail;
	}

	if (expect_size != set_size) {
		drbd_err(connection, "BAD! 
BarrierAck #%u received with n_writes=%u, expected n_writes=%u!\n",
			 barrier_nr, set_size, expect_size);
		goto bail;
	}

	/* Clean up list of requests processed during current epoch. */
	/* this extra list walk restart is paranoia,
	 * to catch requests being barrier-acked "unexpectedly".
	 * It usually should find the same req again, or some READ preceding it. */
	list_for_each_entry(req, &connection->transfer_log, tl_requests)
		if (req->epoch == expect_epoch)
			break;
	list_for_each_entry_safe_from(req, r, &connection->transfer_log, tl_requests) {
		if (req->epoch != expect_epoch)
			break;
		_req_mod(req, BARRIER_ACKED);
	}
	spin_unlock_irq(&connection->resource->req_lock);

	return;

bail:
	spin_unlock_irq(&connection->resource->req_lock);
	conn_request_state(connection, NS(conn, C_PROTOCOL_ERROR), CS_HARD);
}

/**
 * _tl_restart() - Walks the transfer log, and applies an action to all requests
 * @connection:	DRBD connection.
 * @what:	The action/event to perform with all request objects
 *
 * @what might be one of CONNECTION_LOST_WHILE_PENDING, RESEND, FAIL_FROZEN_DISK_IO,
 * RESTART_FROZEN_DISK_IO.
 */
/* must hold resource->req_lock */
void _tl_restart(struct drbd_connection *connection, enum drbd_req_event what)
{
	struct drbd_request *req, *r;

	list_for_each_entry_safe(req, r, &connection->transfer_log, tl_requests)
		_req_mod(req, what);
}

void tl_restart(struct drbd_connection *connection, enum drbd_req_event what)
{
	spin_lock_irq(&connection->resource->req_lock);
	_tl_restart(connection, what);
	spin_unlock_irq(&connection->resource->req_lock);
}

/**
 * tl_clear() - Clears all requests and &struct drbd_tl_epoch objects out of the TL
 * @connection:	DRBD connection.
 *
 * This is called after the connection to the peer was lost. The storage covered
 * by the requests in the transfer log gets marked as out of sync. Called from the
 * receiver thread and the worker thread.
 */
void tl_clear(struct drbd_connection *connection)
{
	tl_restart(connection, CONNECTION_LOST_WHILE_PENDING);
}

/**
 * tl_abort_disk_io() - Abort disk I/O for all requests for a certain device in the TL
 * @device:	DRBD device.
 */
void tl_abort_disk_io(struct drbd_device *device)
{
	struct drbd_connection *connection = first_peer_device(device)->connection;
	struct drbd_request *req, *r;

	spin_lock_irq(&connection->resource->req_lock);
	list_for_each_entry_safe(req, r, &connection->transfer_log, tl_requests) {
		if (!(req->rq_state & RQ_LOCAL_PENDING))
			continue;
		if (req->device != device)
			continue;
		_req_mod(req, ABORT_DISK_IO);
	}
	spin_unlock_irq(&connection->resource->req_lock);
}

static int drbd_thread_setup(void *arg)
{
	struct drbd_thread *thi = (struct drbd_thread *) arg;
	struct drbd_resource *resource = thi->resource;
	unsigned long flags;
	int retval;

restart:
	retval = thi->function(thi);

	spin_lock_irqsave(&thi->t_lock, flags);

	/* if the receiver has been "EXITING", the last thing it did
	 * was set the conn state to "StandAlone",
	 * if now a re-connect request comes in, conn state goes C_UNCONNECTED,
	 * and receiver thread will be "started".
	 * drbd_thread_start needs to set "RESTARTING" in that case.
	 * t_state check and assignment needs to be within the same spinlock,
	 * so either thread_start sees EXITING, and can remap to RESTARTING,
	 * or thread_start sees NONE, and can proceed as normal.
*/ if (thi->t_state == RESTARTING) { drbd_info(resource, "Restarting %s thread\n", thi->name); thi->t_state = RUNNING; spin_unlock_irqrestore(&thi->t_lock, flags); goto restart; } thi->task = NULL; thi->t_state = NONE; smp_mb(); complete_all(&thi->stop); spin_unlock_irqrestore(&thi->t_lock, flags); drbd_info(resource, "Terminating %s\n", current->comm); /* Release mod reference taken when thread was started */ if (thi->connection) kref_put(&thi->connection->kref, drbd_destroy_connection); kref_put(&resource->kref, drbd_destroy_resource); module_put(THIS_MODULE); return retval; } static void drbd_thread_init(struct drbd_resource *resource, struct drbd_thread *thi, int (*func) (struct drbd_thread *), const char *name) { spin_lock_init(&thi->t_lock); thi->task = NULL; thi->t_state = NONE; thi->function = func; thi->resource = resource; thi->connection = NULL; thi->name = name; } int drbd_thread_start(struct drbd_thread *thi) { struct drbd_resource *resource = thi->resource; struct task_struct *nt; unsigned long flags; /* is used from state engine doing drbd_thread_stop_nowait, * while holding the req lock irqsave */ spin_lock_irqsave(&thi->t_lock, flags); switch (thi->t_state) { case NONE: drbd_info(resource, "Starting %s thread (from %s [%d])\n", thi->name, current->comm, current->pid); /* Get ref on module for thread - this is released when thread exits */ if (!try_module_get(THIS_MODULE)) { drbd_err(resource, "Failed to get module reference in drbd_thread_start\n"); spin_unlock_irqrestore(&thi->t_lock, flags); return false; } kref_get(&resource->kref); if (thi->connection) kref_get(&thi->connection->kref); init_completion(&thi->stop); thi->reset_cpu_mask = 1; thi->t_state = RUNNING; spin_unlock_irqrestore(&thi->t_lock, flags); flush_signals(current); /* otherw. 
may get -ERESTARTNOINTR */ nt = kthread_create(drbd_thread_setup, (void *) thi, "drbd_%c_%s", thi->name[0], thi->resource->name); if (IS_ERR(nt)) { drbd_err(resource, "Couldn't start thread\n"); if (thi->connection) kref_put(&thi->connection->kref, drbd_destroy_connection); kref_put(&resource->kref, drbd_destroy_resource); module_put(THIS_MODULE); return false; } spin_lock_irqsave(&thi->t_lock, flags); thi->task = nt; thi->t_state = RUNNING; spin_unlock_irqrestore(&thi->t_lock, flags); wake_up_process(nt); break; case EXITING: thi->t_state = RESTARTING; drbd_info(resource, "Restarting %s thread (from %s [%d])\n", thi->name, current->comm, current->pid); /* fall through */ case RUNNING: case RESTARTING: default: spin_unlock_irqrestore(&thi->t_lock, flags); break; } return true; } void _drbd_thread_stop(struct drbd_thread *thi, int restart, int wait) { unsigned long flags; enum drbd_thread_state ns = restart ? RESTARTING : EXITING; /* may be called from state engine, holding the req lock irqsave */ spin_lock_irqsave(&thi->t_lock, flags); if (thi->t_state == NONE) { spin_unlock_irqrestore(&thi->t_lock, flags); if (restart) drbd_thread_start(thi); return; } if (thi->t_state != ns) { if (thi->task == NULL) { spin_unlock_irqrestore(&thi->t_lock, flags); return; } thi->t_state = ns; smp_mb(); init_completion(&thi->stop); if (thi->task != current) force_sig(DRBD_SIGKILL, thi->task); } spin_unlock_irqrestore(&thi->t_lock, flags); if (wait) wait_for_completion(&thi->stop); } int conn_lowest_minor(struct drbd_connection *connection) { struct drbd_peer_device *peer_device; int vnr = 0, minor = -1; rcu_read_lock(); peer_device = idr_get_next(&connection->peer_devices, &vnr); if (peer_device) minor = device_to_minor(peer_device->device); rcu_read_unlock(); return minor; } #ifdef CONFIG_SMP /** * drbd_calc_cpu_mask() - Generate CPU masks, spread over all CPUs * * Forces all threads of a resource onto the same CPU. This is beneficial for * DRBD's performance. 
May be overwritten by user's configuration. */ static void drbd_calc_cpu_mask(cpumask_var_t *cpu_mask) { unsigned int *resources_per_cpu, min_index = ~0; resources_per_cpu = kzalloc(nr_cpu_ids * sizeof(*resources_per_cpu), GFP_KERNEL); if (resources_per_cpu) { struct drbd_resource *resource; unsigned int cpu, min = ~0; rcu_read_lock(); for_each_resource_rcu(resource, &drbd_resources) { for_each_cpu(cpu, resource->cpu_mask) resources_per_cpu[cpu]++; } rcu_read_unlock(); for_each_online_cpu(cpu) { if (resources_per_cpu[cpu] < min) { min = resources_per_cpu[cpu]; min_index = cpu; } } kfree(resources_per_cpu); } if (min_index == ~0) { cpumask_setall(*cpu_mask); return; } cpumask_set_cpu(min_index, *cpu_mask); } /** * drbd_thread_current_set_cpu() - modifies the cpu mask of the _current_ thread * @device: DRBD device. * @thi: drbd_thread object * * call in the "main loop" of _all_ threads, no need for any mutex, current won't die * prematurely. */ void drbd_thread_current_set_cpu(struct drbd_thread *thi) { struct drbd_resource *resource = thi->resource; struct task_struct *p = current; if (!thi->reset_cpu_mask) return; thi->reset_cpu_mask = 0; set_cpus_allowed_ptr(p, resource->cpu_mask); } #else #define drbd_calc_cpu_mask(A) ({}) #endif /** * drbd_header_size - size of a packet header * * The header size is a multiple of 8, so any payload following the header is * word aligned on 64-bit architectures. (The bitmap send and receive code * relies on this.) 
*/ unsigned int drbd_header_size(struct drbd_connection *connection) { if (connection->agreed_pro_version >= 100) { BUILD_BUG_ON(!IS_ALIGNED(sizeof(struct p_header100), 8)); return sizeof(struct p_header100); } else { BUILD_BUG_ON(sizeof(struct p_header80) != sizeof(struct p_header95)); BUILD_BUG_ON(!IS_ALIGNED(sizeof(struct p_header80), 8)); return sizeof(struct p_header80); } } static unsigned int prepare_header80(struct p_header80 *h, enum drbd_packet cmd, int size) { h->magic = cpu_to_be32(DRBD_MAGIC); h->command = cpu_to_be16(cmd); h->length = cpu_to_be16(size); return sizeof(struct p_header80); } static unsigned int prepare_header95(struct p_header95 *h, enum drbd_packet cmd, int size) { h->magic = cpu_to_be16(DRBD_MAGIC_BIG); h->command = cpu_to_be16(cmd); h->length = cpu_to_be32(size); return sizeof(struct p_header95); } static unsigned int prepare_header100(struct p_header100 *h, enum drbd_packet cmd, int size, int vnr) { h->magic = cpu_to_be32(DRBD_MAGIC_100); h->volume = cpu_to_be16(vnr); h->command = cpu_to_be16(cmd); h->length = cpu_to_be32(size); h->pad = 0; return sizeof(struct p_header100); } static unsigned int prepare_header(struct drbd_connection *connection, int vnr, void *buffer, enum drbd_packet cmd, int size) { if (connection->agreed_pro_version >= 100) return prepare_header100(buffer, cmd, size, vnr); else if (connection->agreed_pro_version >= 95 && size > DRBD_MAX_SIZE_H80_PACKET) return prepare_header95(buffer, cmd, size); else return prepare_header80(buffer, cmd, size); } static void *__conn_prepare_command(struct drbd_connection *connection, struct drbd_socket *sock) { if (!sock->socket) return NULL; return sock->sbuf + drbd_header_size(connection); } void *conn_prepare_command(struct drbd_connection *connection, struct drbd_socket *sock) { void *p; mutex_lock(&sock->mutex); p = __conn_prepare_command(connection, sock); if (!p) mutex_unlock(&sock->mutex); return p; } void *drbd_prepare_command(struct drbd_peer_device *peer_device, struct 
drbd_socket *sock) { return conn_prepare_command(peer_device->connection, sock); } static int __send_command(struct drbd_connection *connection, int vnr, struct drbd_socket *sock, enum drbd_packet cmd, unsigned int header_size, void *data, unsigned int size) { int msg_flags; int err; /* * Called with @data == NULL and the size of the data blocks in @size * for commands that send data blocks. For those commands, omit the * MSG_MORE flag: this will increase the likelihood that data blocks * which are page aligned on the sender will end up page aligned on the * receiver. */ msg_flags = data ? MSG_MORE : 0; header_size += prepare_header(connection, vnr, sock->sbuf, cmd, header_size + size); err = drbd_send_all(connection, sock->socket, sock->sbuf, header_size, msg_flags); if (data && !err) err = drbd_send_all(connection, sock->socket, data, size, 0); return err; } static int __conn_send_command(struct drbd_connection *connection, struct drbd_socket *sock, enum drbd_packet cmd, unsigned int header_size, void *data, unsigned int size) { return __send_command(connection, 0, sock, cmd, header_size, data, size); } int conn_send_command(struct drbd_connection *connection, struct drbd_socket *sock, enum drbd_packet cmd, unsigned int header_size, void *data, unsigned int size) { int err; err = __conn_send_command(connection, sock, cmd, header_size, data, size); mutex_unlock(&sock->mutex); return err; } int drbd_send_command(struct drbd_peer_device *peer_device, struct drbd_socket *sock, enum drbd_packet cmd, unsigned int header_size, void *data, unsigned int size) { int err; err = __send_command(peer_device->connection, peer_device->device->vnr, sock, cmd, header_size, data, size); mutex_unlock(&sock->mutex); return err; } int drbd_send_ping(struct drbd_connection *connection) { struct drbd_socket *sock; sock = &connection->meta; if (!conn_prepare_command(connection, sock)) return -EIO; return conn_send_command(connection, sock, P_PING, 0, NULL, 0); } int 
drbd_send_ping_ack(struct drbd_connection *connection) { struct drbd_socket *sock; sock = &connection->meta; if (!conn_prepare_command(connection, sock)) return -EIO; return conn_send_command(connection, sock, P_PING_ACK, 0, NULL, 0); } int drbd_send_sync_param(struct drbd_peer_device *peer_device) { struct drbd_socket *sock; struct p_rs_param_95 *p; int size; const int apv = peer_device->connection->agreed_pro_version; enum drbd_packet cmd; struct net_conf *nc; struct disk_conf *dc; sock = &peer_device->connection->data; p = drbd_prepare_command(peer_device, sock); if (!p) return -EIO; rcu_read_lock(); nc = rcu_dereference(peer_device->connection->net_conf); size = apv <= 87 ? sizeof(struct p_rs_param) : apv == 88 ? sizeof(struct p_rs_param) + strlen(nc->verify_alg) + 1 : apv <= 94 ? sizeof(struct p_rs_param_89) : /* apv >= 95 */ sizeof(struct p_rs_param_95); cmd = apv >= 89 ? P_SYNC_PARAM89 : P_SYNC_PARAM; /* initialize verify_alg and csums_alg */ memset(p->verify_alg, 0, 2 * SHARED_SECRET_MAX); if (get_ldev(peer_device->device)) { dc = rcu_dereference(peer_device->device->ldev->disk_conf); p->resync_rate = cpu_to_be32(dc->resync_rate); p->c_plan_ahead = cpu_to_be32(dc->c_plan_ahead); p->c_delay_target = cpu_to_be32(dc->c_delay_target); p->c_fill_target = cpu_to_be32(dc->c_fill_target); p->c_max_rate = cpu_to_be32(dc->c_max_rate); put_ldev(peer_device->device); } else { p->resync_rate = cpu_to_be32(DRBD_RESYNC_RATE_DEF); p->c_plan_ahead = cpu_to_be32(DRBD_C_PLAN_AHEAD_DEF); p->c_delay_target = cpu_to_be32(DRBD_C_DELAY_TARGET_DEF); p->c_fill_target = cpu_to_be32(DRBD_C_FILL_TARGET_DEF); p->c_max_rate = cpu_to_be32(DRBD_C_MAX_RATE_DEF); } if (apv >= 88) strcpy(p->verify_alg, nc->verify_alg); if (apv >= 89) strcpy(p->csums_alg, nc->csums_alg); rcu_read_unlock(); return drbd_send_command(peer_device, sock, cmd, size, NULL, 0); } int __drbd_send_protocol(struct drbd_connection *connection, enum drbd_packet cmd) { struct drbd_socket *sock; struct p_protocol *p; struct 
net_conf *nc; int size, cf; sock = &connection->data; p = __conn_prepare_command(connection, sock); if (!p) return -EIO; rcu_read_lock(); nc = rcu_dereference(connection->net_conf); if (nc->tentative && connection->agreed_pro_version < 92) { rcu_read_unlock(); mutex_unlock(&sock->mutex); drbd_err(connection, "--dry-run is not supported by peer"); return -EOPNOTSUPP; } size = sizeof(*p); if (connection->agreed_pro_version >= 87) size += strlen(nc->integrity_alg) + 1; p->protocol = cpu_to_be32(nc->wire_protocol); p->after_sb_0p = cpu_to_be32(nc->after_sb_0p); p->after_sb_1p = cpu_to_be32(nc->after_sb_1p); p->after_sb_2p = cpu_to_be32(nc->after_sb_2p); p->two_primaries = cpu_to_be32(nc->two_primaries); cf = 0; if (nc->discard_my_data) cf |= CF_DISCARD_MY_DATA; if (nc->tentative) cf |= CF_DRY_RUN; p->conn_flags = cpu_to_be32(cf); if (connection->agreed_pro_version >= 87) strcpy(p->integrity_alg, nc->integrity_alg); rcu_read_unlock(); return __conn_send_command(connection, sock, cmd, size, NULL, 0); } int drbd_send_protocol(struct drbd_connection *connection) { int err; mutex_lock(&connection->data.mutex); err = __drbd_send_protocol(connection, P_PROTOCOL); mutex_unlock(&connection->data.mutex); return err; } static int _drbd_send_uuids(struct drbd_peer_device *peer_device, u64 uuid_flags) { struct drbd_device *device = peer_device->device; struct drbd_socket *sock; struct p_uuids *p; int i; if (!get_ldev_if_state(device, D_NEGOTIATING)) return 0; sock = &peer_device->connection->data; p = drbd_prepare_command(peer_device, sock); if (!p) { put_ldev(device); return -EIO; } spin_lock_irq(&device->ldev->md.uuid_lock); for (i = UI_CURRENT; i < UI_SIZE; i++) p->uuid[i] = cpu_to_be64(device->ldev->md.uuid[i]); spin_unlock_irq(&device->ldev->md.uuid_lock); device->comm_bm_set = drbd_bm_total_weight(device); p->uuid[UI_SIZE] = cpu_to_be64(device->comm_bm_set); rcu_read_lock(); uuid_flags |= rcu_dereference(peer_device->connection->net_conf)->discard_my_data ? 
1 : 0; rcu_read_unlock(); uuid_flags |= test_bit(CRASHED_PRIMARY, &device->flags) ? 2 : 0; uuid_flags |= device->new_state_tmp.disk == D_INCONSISTENT ? 4 : 0; p->uuid[UI_FLAGS] = cpu_to_be64(uuid_flags); put_ldev(device); return drbd_send_command(peer_device, sock, P_UUIDS, sizeof(*p), NULL, 0); } int drbd_send_uuids(struct drbd_peer_device *peer_device) { return _drbd_send_uuids(peer_device, 0); } int drbd_send_uuids_skip_initial_sync(struct drbd_peer_device *peer_device) { return _drbd_send_uuids(peer_device, 8); } void drbd_print_uuids(struct drbd_device *device, const char *text) { if (get_ldev_if_state(device, D_NEGOTIATING)) { u64 *uuid = device->ldev->md.uuid; drbd_info(device, "%s %016llX:%016llX:%016llX:%016llX\n", text, (unsigned long long)uuid[UI_CURRENT], (unsigned long long)uuid[UI_BITMAP], (unsigned long long)uuid[UI_HISTORY_START], (unsigned long long)uuid[UI_HISTORY_END]); put_ldev(device); } else { drbd_info(device, "%s effective data uuid: %016llX\n", text, (unsigned long long)device->ed_uuid); } } void drbd_gen_and_send_sync_uuid(struct drbd_peer_device *peer_device) { struct drbd_device *device = peer_device->device; struct drbd_socket *sock; struct p_rs_uuid *p; u64 uuid; D_ASSERT(device, device->state.disk == D_UP_TO_DATE); uuid = device->ldev->md.uuid[UI_BITMAP]; if (uuid && uuid != UUID_JUST_CREATED) uuid = uuid + UUID_NEW_BM_OFFSET; else get_random_bytes(&uuid, sizeof(u64)); drbd_uuid_set(device, UI_BITMAP, uuid); drbd_print_uuids(device, "updated sync UUID"); drbd_md_sync(device); sock = &peer_device->connection->data; p = drbd_prepare_command(peer_device, sock); if (p) { p->uuid = cpu_to_be64(uuid); drbd_send_command(peer_device, sock, P_SYNC_UUID, sizeof(*p), NULL, 0); } } int drbd_send_sizes(struct drbd_peer_device *peer_device, int trigger_reply, enum dds_flags flags) { struct drbd_device *device = peer_device->device; struct drbd_socket *sock; struct p_sizes *p; sector_t d_size, u_size; int q_order_type; unsigned int max_bio_size; if 
(get_ldev_if_state(device, D_NEGOTIATING)) { D_ASSERT(device, device->ldev->backing_bdev); d_size = drbd_get_max_capacity(device->ldev); rcu_read_lock(); u_size = rcu_dereference(device->ldev->disk_conf)->disk_size; rcu_read_unlock(); q_order_type = drbd_queue_order_type(device); max_bio_size = queue_max_hw_sectors(device->ldev->backing_bdev->bd_disk->queue) << 9; max_bio_size = min(max_bio_size, DRBD_MAX_BIO_SIZE); put_ldev(device); } else { d_size = 0; u_size = 0; q_order_type = QUEUE_ORDERED_NONE; max_bio_size = DRBD_MAX_BIO_SIZE; /* ... multiple BIOs per peer_request */ } sock = &peer_device->connection->data; p = drbd_prepare_command(peer_device, sock); if (!p) return -EIO; if (peer_device->connection->agreed_pro_version <= 94) max_bio_size = min(max_bio_size, DRBD_MAX_SIZE_H80_PACKET); else if (peer_device->connection->agreed_pro_version < 100) max_bio_size = min(max_bio_size, DRBD_MAX_BIO_SIZE_P95); p->d_size = cpu_to_be64(d_size); p->u_size = cpu_to_be64(u_size); p->c_size = cpu_to_be64(trigger_reply ? 0 : drbd_get_capacity(device->this_bdev)); p->max_bio_size = cpu_to_be32(max_bio_size); p->queue_order_type = cpu_to_be16(q_order_type); p->dds_flags = cpu_to_be16(flags); return drbd_send_command(peer_device, sock, P_SIZES, sizeof(*p), NULL, 0); } /** * drbd_send_current_state() - Sends the drbd state to the peer * @peer_device: DRBD peer device. */ int drbd_send_current_state_(struct drbd_peer_device *peer_device, const char *func, unsigned int line) { struct drbd_socket *sock; struct p_state *p; sock = &peer_device->connection->data; p = drbd_prepare_command(peer_device, sock); if (!p) return -EIO; p->state = cpu_to_be32(peer_device->device->state.i); /* Within the send mutex */ return drbd_send_command(peer_device, sock, P_STATE, sizeof(*p), NULL, 0); } /** * drbd_send_state() - After a state change, sends the new state to the peer * @peer_device: DRBD peer device. * @state: the state to send, not necessarily the current state. 
* * Each state change queues an "after_state_ch" work, which will eventually * send the resulting new state to the peer. If more state changes happen * between queuing and processing of the after_state_ch work, we still * want to send each intermediary state in the order it occurred. */ int drbd_send_state_(struct drbd_peer_device *peer_device, union drbd_state state, const char *func, unsigned int line) { struct drbd_socket *sock; struct p_state *p; sock = &peer_device->connection->data; p = drbd_prepare_command(peer_device, sock); if (!p) return -EIO; p->state = cpu_to_be32(state.i); /* Within the send mutex */ return drbd_send_command(peer_device, sock, P_STATE, sizeof(*p), NULL, 0); } int drbd_send_state_req(struct drbd_peer_device *peer_device, union drbd_state mask, union drbd_state val) { struct drbd_socket *sock; struct p_req_state *p; sock = &peer_device->connection->data; p = drbd_prepare_command(peer_device, sock); if (!p) return -EIO; p->mask = cpu_to_be32(mask.i); p->val = cpu_to_be32(val.i); return drbd_send_command(peer_device, sock, P_STATE_CHG_REQ, sizeof(*p), NULL, 0); } int conn_send_state_req(struct drbd_connection *connection, union drbd_state mask, union drbd_state val) { enum drbd_packet cmd; struct drbd_socket *sock; struct p_req_state *p; cmd = connection->agreed_pro_version < 100 ? 
P_STATE_CHG_REQ : P_CONN_ST_CHG_REQ; sock = &connection->data; p = conn_prepare_command(connection, sock); if (!p) return -EIO; p->mask = cpu_to_be32(mask.i); p->val = cpu_to_be32(val.i); return conn_send_command(connection, sock, cmd, sizeof(*p), NULL, 0); } void drbd_send_sr_reply(struct drbd_peer_device *peer_device, enum drbd_state_rv retcode) { struct drbd_socket *sock; struct p_req_state_reply *p; sock = &peer_device->connection->meta; p = drbd_prepare_command(peer_device, sock); if (p) { p->retcode = cpu_to_be32(retcode); drbd_send_command(peer_device, sock, P_STATE_CHG_REPLY, sizeof(*p), NULL, 0); } } void conn_send_sr_reply(struct drbd_connection *connection, enum drbd_state_rv retcode) { struct drbd_socket *sock; struct p_req_state_reply *p; enum drbd_packet cmd = connection->agreed_pro_version < 100 ? P_STATE_CHG_REPLY : P_CONN_ST_CHG_REPLY; sock = &connection->meta; p = conn_prepare_command(connection, sock); if (p) { p->retcode = cpu_to_be32(retcode); conn_send_command(connection, sock, cmd, sizeof(*p), NULL, 0); } } static void dcbp_set_code(struct p_compressed_bm *p, enum drbd_bitmap_code code) { BUG_ON(code & ~0xf); p->encoding = (p->encoding & ~0xf) | code; } static void dcbp_set_start(struct p_compressed_bm *p, int set) { p->encoding = (p->encoding & ~0x80) | (set ? 0x80 : 0); } static void dcbp_set_pad_bits(struct p_compressed_bm *p, int n) { BUG_ON(n & ~0x7); p->encoding = (p->encoding & (~0x7 << 4)) | (n << 4); } int fill_bitmap_rle_bits(struct drbd_device *device, struct p_compressed_bm *p, unsigned int size, struct bm_xfer_ctx *c) { struct bitstream bs; unsigned long plain_bits; unsigned long tmp; unsigned long rl; unsigned len; unsigned toggle; int bits, use_rle; /* may we use this feature? 
*/
	rcu_read_lock();
	use_rle = rcu_dereference(first_peer_device(device)->connection->net_conf)->use_rle;
	rcu_read_unlock();
	if (!use_rle || first_peer_device(device)->connection->agreed_pro_version < 90)
		return 0;

	if (c->bit_offset >= c->bm_bits)
		return 0; /* nothing to do. */

	/* use at most this many bytes */
	bitstream_init(&bs, p->code, size, 0);
	memset(p->code, 0, size);
	/* plain bits covered in this code string */
	plain_bits = 0;

	/* p->encoding & 0x80 stores whether the first run length is set.
	 * bit offset is implicit.
	 * start with toggle == 2 to be able to tell the first iteration */
	toggle = 2;

	/* see how many plain bits we can stuff into one packet
	 * using RLE and VLI. */
	do {
		tmp = (toggle == 0) ? _drbd_bm_find_next_zero(device, c->bit_offset)
				    : _drbd_bm_find_next(device, c->bit_offset);
		if (tmp == -1UL)
			tmp = c->bm_bits;
		rl = tmp - c->bit_offset;

		if (toggle == 2) { /* first iteration */
			if (rl == 0) {
				/* the first checked bit was set,
				 * store start value, */
				dcbp_set_start(p, 1);
				/* but skip encoding of zero run length */
				toggle = !toggle;
				continue;
			}
			dcbp_set_start(p, 0);
		}

		/* paranoia: catch zero runlength.
		 * can only happen if bitmap is modified while we scan it. */
		if (rl == 0) {
			drbd_err(device, "unexpected zero runlength while encoding bitmap "
			    "t:%u bo:%lu\n", toggle, c->bit_offset);
			return -1;
		}

		bits = vli_encode_bits(&bs, rl);
		if (bits == -ENOBUFS) /* buffer full */
			break;
		if (bits <= 0) {
			drbd_err(device, "error while encoding bitmap: %d\n", bits);
			return 0;
		}

		toggle = !toggle;
		plain_bits += rl;
		c->bit_offset = tmp;
	} while (c->bit_offset < c->bm_bits);

	len = bs.cur.b - p->code + !!bs.cur.bit;

	if (plain_bits < (len << 3)) {
		/* incompressible with this method.
		 * we need to rewind both word and bit position. */
		c->bit_offset -= plain_bits;
		bm_xfer_ctx_bit_to_word_offset(c);
		c->bit_offset = c->word_offset * BITS_PER_LONG;
		return 0;
	}

	/* RLE + VLI was able to compress it just fine.
	 * update c->word_offset.
*/ bm_xfer_ctx_bit_to_word_offset(c); /* store pad_bits */ dcbp_set_pad_bits(p, (8 - bs.cur.bit) & 0x7); return len; } /** * send_bitmap_rle_or_plain * * Return 0 when done, 1 when another iteration is needed, and a negative error * code upon failure. */ static int send_bitmap_rle_or_plain(struct drbd_device *device, struct bm_xfer_ctx *c) { struct drbd_socket *sock = &first_peer_device(device)->connection->data; unsigned int header_size = drbd_header_size(first_peer_device(device)->connection); struct p_compressed_bm *p = sock->sbuf + header_size; int len, err; len = fill_bitmap_rle_bits(device, p, DRBD_SOCKET_BUFFER_SIZE - header_size - sizeof(*p), c); if (len < 0) return -EIO; if (len) { dcbp_set_code(p, RLE_VLI_Bits); err = __send_command(first_peer_device(device)->connection, device->vnr, sock, P_COMPRESSED_BITMAP, sizeof(*p) + len, NULL, 0); c->packets[0]++; c->bytes[0] += header_size + sizeof(*p) + len; if (c->bit_offset >= c->bm_bits) len = 0; /* DONE */ } else { /* was not compressible. * send a buffer full of plain text bits instead. 
*/
		unsigned int data_size;
		unsigned long num_words;
		unsigned long *p = sock->sbuf + header_size;

		data_size = DRBD_SOCKET_BUFFER_SIZE - header_size;
		num_words = min_t(size_t, data_size / sizeof(*p),
				  c->bm_words - c->word_offset);
		len = num_words * sizeof(*p);
		if (len)
			drbd_bm_get_lel(device, c->word_offset, num_words, p);
		err = __send_command(first_peer_device(device)->connection, device->vnr,
				     sock, P_BITMAP, len, NULL, 0);
		c->word_offset += num_words;
		c->bit_offset = c->word_offset * BITS_PER_LONG;

		c->packets[1]++;
		c->bytes[1] += header_size + len;

		if (c->bit_offset > c->bm_bits)
			c->bit_offset = c->bm_bits;
	}
	if (!err) {
		if (len == 0) {
			INFO_bm_xfer_stats(device, "send", c);
			return 0;
		} else
			return 1;
	}
	return -EIO;
}

/* See the comment at receive_bitmap() */
static int _drbd_send_bitmap(struct drbd_device *device)
{
	struct bm_xfer_ctx c;
	int err;

	if (!expect(device->bitmap))
		return false;

	if (get_ldev(device)) {
		if (drbd_md_test_flag(device->ldev, MDF_FULL_SYNC)) {
			drbd_info(device, "Writing the whole bitmap, MDF_FullSync was set.\n");
			drbd_bm_set_all(device);
			if (drbd_bm_write(device)) {
				/* write_bm did fail! Leave full sync flag set in meta data
				 * but otherwise process as per normal - need to tell other
				 * side that a full resync is required!
*/ drbd_err(device, "Failed to write bitmap to disk!\n"); } else { drbd_md_clear_flag(device, MDF_FULL_SYNC); drbd_md_sync(device); } } put_ldev(device); } c = (struct bm_xfer_ctx) { .bm_bits = drbd_bm_bits(device), .bm_words = drbd_bm_words(device), }; do { err = send_bitmap_rle_or_plain(device, &c); } while (err > 0); return err == 0; } int drbd_send_bitmap(struct drbd_device *device) { struct drbd_socket *sock = &first_peer_device(device)->connection->data; int err = -1; mutex_lock(&sock->mutex); if (sock->socket) err = !_drbd_send_bitmap(device); mutex_unlock(&sock->mutex); return err; } void drbd_send_b_ack(struct drbd_connection *connection, u32 barrier_nr, u32 set_size) { struct drbd_socket *sock; struct p_barrier_ack *p; if (connection->cstate < C_WF_REPORT_PARAMS) return; sock = &connection->meta; p = conn_prepare_command(connection, sock); if (!p) return; p->barrier = barrier_nr; p->set_size = cpu_to_be32(set_size); conn_send_command(connection, sock, P_BARRIER_ACK, sizeof(*p), NULL, 0); } /** * _drbd_send_ack() - Sends an ack packet * @device: DRBD device. * @cmd: Packet command code. 
* @sector: sector, needs to be in big endian byte order * @blksize: size in bytes, needs to be in big endian byte order * @block_id: Id, big endian byte order */ static int _drbd_send_ack(struct drbd_peer_device *peer_device, enum drbd_packet cmd, u64 sector, u32 blksize, u64 block_id) { struct drbd_socket *sock; struct p_block_ack *p; if (peer_device->device->state.conn < C_CONNECTED) return -EIO; sock = &peer_device->connection->meta; p = drbd_prepare_command(peer_device, sock); if (!p) return -EIO; p->sector = sector; p->block_id = block_id; p->blksize = blksize; p->seq_num = cpu_to_be32(atomic_inc_return(&peer_device->device->packet_seq)); return drbd_send_command(peer_device, sock, cmd, sizeof(*p), NULL, 0); } /* dp->sector and dp->block_id already/still in network byte order, * data_size is payload size according to dp->head, * and may need to be corrected for digest size. */ void drbd_send_ack_dp(struct drbd_peer_device *peer_device, enum drbd_packet cmd, struct p_data *dp, int data_size) { if (peer_device->connection->peer_integrity_tfm) data_size -= crypto_hash_digestsize(peer_device->connection->peer_integrity_tfm); _drbd_send_ack(peer_device, cmd, dp->sector, cpu_to_be32(data_size), dp->block_id); } void drbd_send_ack_rp(struct drbd_peer_device *peer_device, enum drbd_packet cmd, struct p_block_req *rp) { _drbd_send_ack(peer_device, cmd, rp->sector, rp->blksize, rp->block_id); } /** * drbd_send_ack() - Sends an ack packet * @peer_device: DRBD peer device * @cmd: packet command code * @peer_req: peer request */ int drbd_send_ack(struct drbd_peer_device *peer_device, enum drbd_packet cmd, struct drbd_peer_request *peer_req) { return _drbd_send_ack(peer_device, cmd, cpu_to_be64(peer_req->i.sector), cpu_to_be32(peer_req->i.size), peer_req->block_id); } /* This function misuses the block_id field to signal if the blocks * are in sync or not. 
*/ int drbd_send_ack_ex(struct drbd_peer_device *peer_device, enum drbd_packet cmd, sector_t sector, int blksize, u64 block_id) { return _drbd_send_ack(peer_device, cmd, cpu_to_be64(sector), cpu_to_be32(blksize), cpu_to_be64(block_id)); } int drbd_send_drequest(struct drbd_peer_device *peer_device, int cmd, sector_t sector, int size, u64 block_id) { struct drbd_socket *sock; struct p_block_req *p; sock = &peer_device->connection->data; p = drbd_prepare_command(peer_device, sock); if (!p) return -EIO; p->sector = cpu_to_be64(sector); p->block_id = block_id; p->blksize = cpu_to_be32(size); return drbd_send_command(peer_device, sock, cmd, sizeof(*p), NULL, 0); } int drbd_send_drequest_csum(struct drbd_peer_device *peer_device, sector_t sector, int size, void *digest, int digest_size, enum drbd_packet cmd) { struct drbd_socket *sock; struct p_block_req *p; /* FIXME: Put the digest into the preallocated socket buffer. */ sock = &peer_device->connection->data; p = drbd_prepare_command(peer_device, sock); if (!p) return -EIO; p->sector = cpu_to_be64(sector); p->block_id = ID_SYNCER /* unused */; p->blksize = cpu_to_be32(size); return drbd_send_command(peer_device, sock, cmd, sizeof(*p), digest, digest_size); } int drbd_send_ov_request(struct drbd_peer_device *peer_device, sector_t sector, int size) { struct drbd_socket *sock; struct p_block_req *p; sock = &peer_device->connection->data; p = drbd_prepare_command(peer_device, sock); if (!p) return -EIO; p->sector = cpu_to_be64(sector); p->block_id = ID_SYNCER /* unused */; p->blksize = cpu_to_be32(size); return drbd_send_command(peer_device, sock, P_OV_REQUEST, sizeof(*p), NULL, 0); } /* called on sndtimeo * returns false if we should retry, * true if we think connection is dead */ static int we_should_drop_the_connection(struct drbd_connection *connection, struct socket *sock) { int drop_it; drop_it = connection->meta.socket == sock || !connection->asender.task || get_t_state(&connection->asender) != RUNNING || 
connection->cstate < C_WF_REPORT_PARAMS; if (drop_it) return true; drop_it = !--connection->ko_count; if (!drop_it) { drbd_err(connection, "[%s/%d] sock_sendmsg time expired, ko = %u\n", current->comm, current->pid, connection->ko_count); request_ping(connection); } return drop_it; /* && (device->state == R_PRIMARY) */ } static void drbd_update_congested(struct drbd_connection *connection) { struct sock *sk = connection->data.socket->sk; if (sk->sk_wmem_queued > sk->sk_sndbuf * 4 / 5) set_bit(NET_CONGESTED, &connection->flags); } /* The idea of sendpage seems to be to put some kind of reference * to the page into the skb, and to hand it over to the NIC. In * this process get_page() gets called. * * As soon as the page was really sent over the network put_page() * gets called by some part of the network layer. [ NIC driver? ] * * [ get_page() / put_page() increment/decrement the count. If count * reaches 0 the page will be freed. ] * * This works nicely with pages from FSs. * But this means that in protocol A we might signal IO completion too early! * * In order not to corrupt data during a resync we must make sure * that we do not reuse our own buffer pages (EEs) too early, therefore * we have the net_ee list. * * XFS seems to have problems, still, it submits pages with page_count == 0! * As a workaround, we disable sendpage on pages * with page_count == 0 or PageSlab. 
*/ static int _drbd_no_send_page(struct drbd_peer_device *peer_device, struct page *page, int offset, size_t size, unsigned msg_flags) { struct socket *socket; void *addr; int err; socket = peer_device->connection->data.socket; addr = kmap(page) + offset; err = drbd_send_all(peer_device->connection, socket, addr, size, msg_flags); kunmap(page); if (!err) peer_device->device->send_cnt += size >> 9; return err; } static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *page, int offset, size_t size, unsigned msg_flags) { struct socket *socket = peer_device->connection->data.socket; mm_segment_t oldfs = get_fs(); int len = size; int err = -EIO; /* e.g. XFS meta- & log-data is in slab pages, which have a * page_count of 0 and/or have PageSlab() set. * we cannot use send_page for those, as that does get_page(); * put_page(); and would cause either a VM_BUG directly, or * __page_cache_release a page that would actually still be referenced * by someone, leading to some obscure delayed Oops somewhere else. 
*/ if (disable_sendpage || (page_count(page) < 1) || PageSlab(page)) return _drbd_no_send_page(peer_device, page, offset, size, msg_flags); msg_flags |= MSG_NOSIGNAL; drbd_update_congested(peer_device->connection); set_fs(KERNEL_DS); do { int sent; sent = socket->ops->sendpage(socket, page, offset, len, msg_flags); if (sent <= 0) { if (sent == -EAGAIN) { if (we_should_drop_the_connection(peer_device->connection, socket)) break; continue; } drbd_warn(peer_device->device, "%s: size=%d len=%d sent=%d\n", __func__, (int)size, len, sent); if (sent < 0) err = sent; break; } len -= sent; offset += sent; } while (len > 0 /* THINK && device->cstate >= C_CONNECTED*/); set_fs(oldfs); clear_bit(NET_CONGESTED, &peer_device->connection->flags); if (len == 0) { err = 0; peer_device->device->send_cnt += size >> 9; } return err; } static int _drbd_send_bio(struct drbd_peer_device *peer_device, struct bio *bio) { struct bio_vec *bvec; int i; /* hint all but last page with MSG_MORE */ bio_for_each_segment(bvec, bio, i) { int err; err = _drbd_no_send_page(peer_device, bvec->bv_page, bvec->bv_offset, bvec->bv_len, i == bio->bi_vcnt - 1 ? 0 : MSG_MORE); if (err) return err; } return 0; } static int _drbd_send_zc_bio(struct drbd_peer_device *peer_device, struct bio *bio) { struct bio_vec *bvec; int i; /* hint all but last page with MSG_MORE */ bio_for_each_segment(bvec, bio, i) { int err; err = _drbd_send_page(peer_device, bvec->bv_page, bvec->bv_offset, bvec->bv_len, i == bio->bi_vcnt - 1 ? 0 : MSG_MORE); if (err) return err; } return 0; } static int _drbd_send_zc_ee(struct drbd_peer_device *peer_device, struct drbd_peer_request *peer_req) { struct page *page = peer_req->pages; unsigned len = peer_req->i.size; int err; /* hint all but last page with MSG_MORE */ page_chain_for_each(page) { unsigned l = min_t(unsigned, len, PAGE_SIZE); err = _drbd_send_page(peer_device, page, 0, l, page_chain_next(page) ? 
MSG_MORE : 0); if (err) return err; len -= l; } return 0; } /* see also wire_flags_to_bio() * DRBD_REQ_*, because we need to semantically map the flags to data packet * flags and back. We may replicate to other kernel versions. */ static u32 bio_flags_to_wire(struct drbd_connection *connection, unsigned long bi_rw) { if (connection->agreed_pro_version >= 95) return (bi_rw & DRBD_REQ_SYNC ? DP_RW_SYNC : 0) | (bi_rw & DRBD_REQ_UNPLUG ? DP_UNPLUG : 0) | (bi_rw & DRBD_REQ_FUA ? DP_FUA : 0) | (bi_rw & DRBD_REQ_FLUSH ? DP_FLUSH : 0) | (bi_rw & DRBD_REQ_DISCARD ? DP_DISCARD : 0); /* else: we used to communicate one bit only in older DRBD */ return bi_rw & (DRBD_REQ_SYNC | DRBD_REQ_UNPLUG) ? DP_RW_SYNC : 0; } /* Used to send write or TRIM aka REQ_DISCARD requests * R_PRIMARY -> Peer (P_DATA, P_TRIM) */ int drbd_send_dblock(struct drbd_peer_device *peer_device, struct drbd_request *req) { struct drbd_device *device = peer_device->device; struct drbd_socket *sock; struct p_data *p; unsigned int dp_flags = 0; int dgs; int err; sock = &peer_device->connection->data; p = drbd_prepare_command(peer_device, sock); dgs = peer_device->connection->integrity_tfm ? 
crypto_hash_digestsize(peer_device->connection->integrity_tfm) : 0; if (!p) return -EIO; p->sector = cpu_to_be64(req->i.sector); p->block_id = (unsigned long)req; p->seq_num = cpu_to_be32(atomic_inc_return(&device->packet_seq)); dp_flags = bio_flags_to_wire(peer_device->connection, req->master_bio->bi_rw); if (device->state.conn >= C_SYNC_SOURCE && device->state.conn <= C_PAUSED_SYNC_T) dp_flags |= DP_MAY_SET_IN_SYNC; if (peer_device->connection->agreed_pro_version >= 100) { if (req->rq_state & RQ_EXP_RECEIVE_ACK) dp_flags |= DP_SEND_RECEIVE_ACK; if (req->rq_state & RQ_EXP_WRITE_ACK) dp_flags |= DP_SEND_WRITE_ACK; } p->dp_flags = cpu_to_be32(dp_flags); if (dp_flags & DP_DISCARD) { struct p_trim *t = (struct p_trim*)p; t->size = cpu_to_be32(req->i.size); err = __send_command(peer_device->connection, device->vnr, sock, P_TRIM, sizeof(*t), NULL, 0); goto out; } /* our digest is still only over the payload. * TRIM does not carry any payload. */ if (dgs) drbd_csum_bio(peer_device->connection->integrity_tfm, req->master_bio, p + 1); err = __send_command(peer_device->connection, device->vnr, sock, P_DATA, sizeof(*p) + dgs, NULL, req->i.size); if (!err) { /* For protocol A, we have to memcpy the payload into * socket buffers, as we may complete right away * as soon as we handed it over to tcp, at which point the data * pages may become invalid. * * For data-integrity enabled, we copy it as well, so we can be * sure that even if the bio pages may still be modified, it * won't change the data on the wire, thus if the digest checks * out ok after sending on this side, but does not fit on the * receiving side, we sure have detected corruption elsewhere. */ if (!(req->rq_state & (RQ_EXP_RECEIVE_ACK | RQ_EXP_WRITE_ACK)) || dgs) err = _drbd_send_bio(peer_device, req->master_bio); else err = _drbd_send_zc_bio(peer_device, req->master_bio); /* double check digest, sometimes buffers have been modified in flight. 
*/ if (dgs > 0 && dgs <= 64) { /* 64 byte, 512 bit, is the largest digest size * currently supported in kernel crypto. */ unsigned char digest[64]; drbd_csum_bio(peer_device->connection->integrity_tfm, req->master_bio, digest); if (memcmp(p + 1, digest, dgs)) { drbd_warn(device, "Digest mismatch, buffer modified by upper layers during write: %llus +%u\n", (unsigned long long)req->i.sector, req->i.size); } } /* else if (dgs > 64) { ... Be noisy about digest too large ... } */ } out: mutex_unlock(&sock->mutex); /* locked by drbd_prepare_command() */ return err; } /* answer packet, used to send data back for read requests: * Peer -> (diskless) R_PRIMARY (P_DATA_REPLY) * C_SYNC_SOURCE -> C_SYNC_TARGET (P_RS_DATA_REPLY) */ int drbd_send_block(struct drbd_peer_device *peer_device, enum drbd_packet cmd, struct drbd_peer_request *peer_req) { struct drbd_device *device = peer_device->device; struct drbd_socket *sock; struct p_data *p; int err; int dgs; sock = &peer_device->connection->data; p = drbd_prepare_command(peer_device, sock); dgs = peer_device->connection->integrity_tfm ? 
crypto_hash_digestsize(peer_device->connection->integrity_tfm) : 0; if (!p) return -EIO; p->sector = cpu_to_be64(peer_req->i.sector); p->block_id = peer_req->block_id; p->seq_num = 0; /* unused */ p->dp_flags = 0; if (dgs) drbd_csum_ee(peer_device->connection->integrity_tfm, peer_req, p + 1); err = __send_command(peer_device->connection, device->vnr, sock, cmd, sizeof(*p) + dgs, NULL, peer_req->i.size); if (!err) err = _drbd_send_zc_ee(peer_device, peer_req); mutex_unlock(&sock->mutex); /* locked by drbd_prepare_command() */ return err; } int drbd_send_out_of_sync(struct drbd_peer_device *peer_device, struct drbd_request *req) { struct drbd_socket *sock; struct p_block_desc *p; sock = &peer_device->connection->data; p = drbd_prepare_command(peer_device, sock); if (!p) return -EIO; p->sector = cpu_to_be64(req->i.sector); p->blksize = cpu_to_be32(req->i.size); return drbd_send_command(peer_device, sock, P_OUT_OF_SYNC, sizeof(*p), NULL, 0); } /* drbd_send distinguishes two cases: Packets sent via the data socket "sock" and packets sent via the meta data socket "msock" sock msock -----------------+-------------------------+------------------------------ timeout conf.timeout / 2 conf.timeout / 2 timeout action send a ping via msock Abort communication and close all sockets */ /* * you must have down()ed the appropriate [m]sock_mutex elsewhere! */ int drbd_send(struct drbd_connection *connection, struct socket *sock, void *buf, size_t size, unsigned msg_flags) { struct kvec iov; struct msghdr msg; int rv, sent = 0; if (!sock) return -EBADR; /* THINK if (signal_pending) return ... ? 
*/ iov.iov_base = buf; iov.iov_len = size; msg.msg_name = NULL; msg.msg_namelen = 0; msg.msg_control = NULL; msg.msg_controllen = 0; msg.msg_flags = msg_flags | MSG_NOSIGNAL; if (sock == connection->data.socket) { rcu_read_lock(); connection->ko_count = rcu_dereference(connection->net_conf)->ko_count; rcu_read_unlock(); drbd_update_congested(connection); } do { /* STRANGE * tcp_sendmsg does _not_ use its size parameter at all ? * * -EAGAIN on timeout, -EINTR on signal. */ /* THINK * do we need to block DRBD_SIG if sock == &meta.socket ?? * otherwise wake_asender() might interrupt some send_*Ack ! */ rv = kernel_sendmsg(sock, &msg, &iov, 1, size); if (rv == -EAGAIN) { if (we_should_drop_the_connection(connection, sock)) break; else continue; } if (rv == -EINTR) { flush_signals(current); rv = 0; } if (rv < 0) break; sent += rv; iov.iov_base += rv; iov.iov_len -= rv; } while (sent < size); if (sock == connection->data.socket) clear_bit(NET_CONGESTED, &connection->flags); if (rv <= 0) { if (rv != -EAGAIN) { drbd_err(connection, "%s_sendmsg returned %d\n", sock == connection->meta.socket ? "msock" : "sock", rv); conn_request_state(connection, NS(conn, C_BROKEN_PIPE), CS_HARD); } else conn_request_state(connection, NS(conn, C_TIMEOUT), CS_HARD); } return sent; } /** * drbd_send_all - Send an entire buffer * * Returns 0 upon success and a negative error value otherwise. 
*/ int drbd_send_all(struct drbd_connection *connection, struct socket *sock, void *buffer, size_t size, unsigned msg_flags) { int err; err = drbd_send(connection, sock, buffer, size, msg_flags); if (err < 0) return err; if (err != size) return -EIO; return 0; } #ifdef BD_OPS_USE_FMODE static int drbd_open(struct block_device *bdev, fmode_t mode) #else static int drbd_open(struct inode *inode, struct file *file) #endif { #ifdef BD_OPS_USE_FMODE struct drbd_device *device = bdev->bd_disk->private_data; #else int mode = file->f_mode; struct drbd_device *device = inode->i_bdev->bd_disk->private_data; #endif unsigned long flags; int rv = 0; spin_lock_irqsave(&device->resource->req_lock, flags); /* to have a stable device->state.role * and no race with updating open_cnt */ if (device->state.role != R_PRIMARY) { if (mode & FMODE_WRITE) rv = -EROFS; else if (!allow_oos) rv = -EMEDIUMTYPE; } if (!rv) device->open_cnt++; spin_unlock_irqrestore(&device->resource->req_lock, flags); return rv; } #ifdef BD_OPS_USE_FMODE static DRBD_RELEASE_RETURN drbd_release(struct gendisk *gd, fmode_t mode) { struct drbd_device *device = gd->private_data; device->open_cnt--; #ifndef COMPAT_DRBD_RELEASE_RETURNS_VOID return 0; #endif } #else static DRBD_RELEASE_RETURN drbd_release(struct inode *inode, struct file *file) { struct drbd_device *device = inode->i_bdev->bd_disk->private_data; device->open_cnt--; #ifndef COMPAT_DRBD_RELEASE_RETURNS_VOID return 0; #endif } #endif #ifdef blk_queue_plugged static void drbd_unplug_fn(struct request_queue *q) { struct drbd_device *device = q->queuedata; /* unplug FIRST */ spin_lock_irq(q->queue_lock); blk_remove_plug(q); spin_unlock_irq(q->queue_lock); /* only if connected */ spin_lock_irq(&device->resource->req_lock); if (device->state.pdsk >= D_INCONSISTENT && device->state.conn >= C_CONNECTED) { D_ASSERT(device, device->state.role == R_PRIMARY); if (test_and_clear_bit(UNPLUG_REMOTE, &device->flags)) { /* add to the sender_work queue, * unless already 
queued. * XXX this might be a good addition to drbd_queue_work * anyways, to detect "double queuing" ... */ if (list_empty(&device->unplug_work.list)) drbd_queue_work(&first_peer_device(device)->connection->sender_work, &device->unplug_work); } } spin_unlock_irq(&device->resource->req_lock); if (device->state.disk >= D_INCONSISTENT) drbd_kick_lo(device); } #endif static void drbd_set_defaults(struct drbd_device *device) { /* Beware! The actual layout differs * between big endian and little endian */ device->state = (union drbd_dev_state) { { .role = R_SECONDARY, .peer = R_UNKNOWN, .conn = C_STANDALONE, .disk = D_DISKLESS, .pdsk = D_UNKNOWN, } }; } void drbd_init_set_defaults(struct drbd_device *device) { /* the memset(,0,) did most of this. * note: only assignments, no allocation in here */ #ifdef PARANOIA SET_MDEV_MAGIC(device); #endif drbd_set_defaults(device); atomic_set(&device->ap_bio_cnt, 0); atomic_set(&device->ap_pending_cnt, 0); atomic_set(&device->rs_pending_cnt, 0); atomic_set(&device->unacked_cnt, 0); atomic_set(&device->local_cnt, 0); atomic_set(&device->pp_in_use_by_net, 0); atomic_set(&device->rs_sect_in, 0); atomic_set(&device->rs_sect_ev, 0); atomic_set(&device->ap_in_flight, 0); atomic_set(&device->md_io_in_use, 0); mutex_init(&device->own_state_mutex); device->state_mutex = &device->own_state_mutex; spin_lock_init(&device->al_lock); spin_lock_init(&device->peer_seq_lock); INIT_LIST_HEAD(&device->active_ee); INIT_LIST_HEAD(&device->sync_ee); INIT_LIST_HEAD(&device->done_ee); INIT_LIST_HEAD(&device->read_ee); INIT_LIST_HEAD(&device->net_ee); INIT_LIST_HEAD(&device->resync_reads); INIT_LIST_HEAD(&device->resync_work.list); INIT_LIST_HEAD(&device->unplug_work.list); INIT_LIST_HEAD(&device->go_diskless.list); INIT_LIST_HEAD(&device->md_sync_work.list); INIT_LIST_HEAD(&device->start_resync_work.list); INIT_LIST_HEAD(&device->bm_io_work.w.list); device->resync_work.cb = w_resync_timer; device->unplug_work.cb = w_send_write_hint; device->go_diskless.cb = 
w_go_diskless; device->md_sync_work.cb = w_md_sync; device->bm_io_work.w.cb = w_bitmap_io; device->start_resync_work.cb = w_start_resync; init_timer(&device->resync_timer); init_timer(&device->md_sync_timer); init_timer(&device->start_resync_timer); init_timer(&device->request_timer); device->resync_timer.function = resync_timer_fn; device->resync_timer.data = (unsigned long) device; device->md_sync_timer.function = md_sync_timer_fn; device->md_sync_timer.data = (unsigned long) device; device->start_resync_timer.function = start_resync_timer_fn; device->start_resync_timer.data = (unsigned long) device; device->request_timer.function = request_timer_fn; device->request_timer.data = (unsigned long) device; init_waitqueue_head(&device->misc_wait); init_waitqueue_head(&device->state_wait); init_waitqueue_head(&device->ee_wait); init_waitqueue_head(&device->al_wait); init_waitqueue_head(&device->seq_wait); device->resync_wenr = LC_FREE; device->peer_max_bio_size = DRBD_MAX_BIO_SIZE_SAFE; device->local_max_bio_size = DRBD_MAX_BIO_SIZE_SAFE; } void drbd_device_cleanup(struct drbd_device *device) { int i; if (first_peer_device(device)->connection->receiver.t_state != NONE) drbd_err(device, "ASSERT FAILED: receiver t_state == %d expected 0.\n", first_peer_device(device)->connection->receiver.t_state); device->al_writ_cnt = device->bm_writ_cnt = device->read_cnt = device->recv_cnt = device->send_cnt = device->writ_cnt = device->p_size = device->rs_start = device->rs_total = device->rs_failed = 0; device->rs_last_events = 0; device->rs_last_sect_ev = 0; for (i = 0; i < DRBD_SYNC_MARKS; i++) { device->rs_mark_left[i] = 0; device->rs_mark_time[i] = 0; } D_ASSERT(device, first_peer_device(device)->connection->net_conf == NULL); drbd_set_my_capacity(device, 0); if (device->bitmap) { /* maybe never allocated. 
*/ drbd_bm_resize(device, 0, 1); drbd_bm_cleanup(device); } drbd_free_bc(device->ldev); device->ldev = NULL; clear_bit(AL_SUSPENDED, &device->flags); D_ASSERT(device, list_empty(&device->active_ee)); D_ASSERT(device, list_empty(&device->sync_ee)); D_ASSERT(device, list_empty(&device->done_ee)); D_ASSERT(device, list_empty(&device->read_ee)); D_ASSERT(device, list_empty(&device->net_ee)); D_ASSERT(device, list_empty(&device->resync_reads)); D_ASSERT(device, list_empty(&first_peer_device(device)->connection->sender_work.q)); D_ASSERT(device, list_empty(&device->resync_work.list)); D_ASSERT(device, list_empty(&device->unplug_work.list)); D_ASSERT(device, list_empty(&device->go_diskless.list)); drbd_set_defaults(device); } static void drbd_destroy_mempools(void) { struct page *page; while (drbd_pp_pool) { page = drbd_pp_pool; drbd_pp_pool = (struct page *)page_private(page); __free_page(page); drbd_pp_vacant--; } /* D_ASSERT(device, atomic_read(&drbd_pp_vacant)==0); */ if (drbd_md_io_bio_set) bioset_free(drbd_md_io_bio_set); if (drbd_md_io_page_pool) mempool_destroy(drbd_md_io_page_pool); if (drbd_ee_mempool) mempool_destroy(drbd_ee_mempool); if (drbd_request_mempool) mempool_destroy(drbd_request_mempool); if (drbd_ee_cache) kmem_cache_destroy(drbd_ee_cache); if (drbd_request_cache) kmem_cache_destroy(drbd_request_cache); if (drbd_bm_ext_cache) kmem_cache_destroy(drbd_bm_ext_cache); if (drbd_al_ext_cache) kmem_cache_destroy(drbd_al_ext_cache); drbd_md_io_bio_set = NULL; drbd_md_io_page_pool = NULL; drbd_ee_mempool = NULL; drbd_request_mempool = NULL; drbd_ee_cache = NULL; drbd_request_cache = NULL; drbd_bm_ext_cache = NULL; drbd_al_ext_cache = NULL; return; } static int drbd_create_mempools(void) { struct page *page; const int number = (DRBD_MAX_BIO_SIZE/PAGE_SIZE) * minor_count; int i; /* prepare our caches and mempools */ drbd_request_mempool = NULL; drbd_ee_cache = NULL; drbd_request_cache = NULL; drbd_bm_ext_cache = NULL; drbd_al_ext_cache = NULL; drbd_pp_pool = 
NULL; drbd_md_io_page_pool = NULL; drbd_md_io_bio_set = NULL; /* caches */ drbd_request_cache = kmem_cache_create( "drbd_req", sizeof(struct drbd_request), 0, 0, NULL); if (drbd_request_cache == NULL) goto Enomem; drbd_ee_cache = kmem_cache_create( "drbd_ee", sizeof(struct drbd_peer_request), 0, 0, NULL); if (drbd_ee_cache == NULL) goto Enomem; drbd_bm_ext_cache = kmem_cache_create( "drbd_bm", sizeof(struct bm_extent), 0, 0, NULL); if (drbd_bm_ext_cache == NULL) goto Enomem; drbd_al_ext_cache = kmem_cache_create( "drbd_al", sizeof(struct lc_element), 0, 0, NULL); if (drbd_al_ext_cache == NULL) goto Enomem; /* mempools */ drbd_md_io_bio_set = bioset_create(DRBD_MIN_POOL_PAGES, 0); if (drbd_md_io_bio_set == NULL) goto Enomem; drbd_md_io_page_pool = mempool_create_page_pool(DRBD_MIN_POOL_PAGES, 0); if (drbd_md_io_page_pool == NULL) goto Enomem; drbd_request_mempool = mempool_create(number, mempool_alloc_slab, mempool_free_slab, drbd_request_cache); if (drbd_request_mempool == NULL) goto Enomem; drbd_ee_mempool = mempool_create(number, mempool_alloc_slab, mempool_free_slab, drbd_ee_cache); if (drbd_ee_mempool == NULL) goto Enomem; /* drbd's page pool */ spin_lock_init(&drbd_pp_lock); for (i = 0; i < number; i++) { page = alloc_page(GFP_HIGHUSER); if (!page) goto Enomem; set_page_private(page, (unsigned long)drbd_pp_pool); drbd_pp_pool = page; } drbd_pp_vacant = number; return 0; Enomem: drbd_destroy_mempools(); /* in case we allocated some */ return -ENOMEM; } static int drbd_notify_sys(struct notifier_block *this, unsigned long code, void *unused) { /* just so we have it. you never know what interesting things we * might want to do here some day... 
*/ return NOTIFY_DONE; } static struct notifier_block drbd_notifier = { .notifier_call = drbd_notify_sys, }; static void drbd_release_all_peer_reqs(struct drbd_device *device) { int rr; rr = drbd_free_peer_reqs(device, &device->active_ee); if (rr) drbd_err(device, "%d EEs in active list found!\n", rr); rr = drbd_free_peer_reqs(device, &device->sync_ee); if (rr) drbd_err(device, "%d EEs in sync list found!\n", rr); rr = drbd_free_peer_reqs(device, &device->read_ee); if (rr) drbd_err(device, "%d EEs in read list found!\n", rr); rr = drbd_free_peer_reqs(device, &device->done_ee); if (rr) drbd_err(device, "%d EEs in done list found!\n", rr); rr = drbd_free_peer_reqs(device, &device->net_ee); if (rr) drbd_err(device, "%d EEs in net list found!\n", rr); } /* caution. no locking. */ static void drbd_destroy_device(struct kobject *kobj) { struct drbd_device *device = container_of(kobj, struct drbd_device, kobj); struct drbd_resource *resource = device->resource; struct drbd_connection *connection; del_timer_sync(&device->request_timer); /* paranoia asserts */ D_ASSERT(device, device->open_cnt == 0); /* end paranoia asserts */ /* cleanup stuff that may have been allocated during * device (re-)configuration or state changes */ if (device->this_bdev) bdput(device->this_bdev); drbd_free_bc(device->ldev); device->ldev = NULL; drbd_release_all_peer_reqs(device); lc_destroy(device->act_log); lc_destroy(device->resync); kfree(device->p_uuid); /* device->p_uuid = NULL; */ if (device->bitmap) /* should no longer be there. */ drbd_bm_cleanup(device); __free_page(device->md_io_page); put_disk(device->vdisk); blk_cleanup_queue(device->rq_queue); kfree(device->rs_plan_s); kfree(first_peer_device(device)); kfree(device); for_each_connection(connection, resource) kref_put(&connection->kref, drbd_destroy_connection); kref_put(&resource->kref, drbd_destroy_resource); } /* One global retry thread, if we need to push back some bio and have it * reinserted through our make request function. 
*/ static struct retry_worker { struct workqueue_struct *wq; struct work_struct worker; spinlock_t lock; struct list_head writes; } retry; static void do_retry(struct work_struct *ws) { struct retry_worker *retry = container_of(ws, struct retry_worker, worker); LIST_HEAD(writes); struct drbd_request *req, *tmp; spin_lock_irq(&retry->lock); list_splice_init(&retry->writes, &writes); spin_unlock_irq(&retry->lock); list_for_each_entry_safe(req, tmp, &writes, tl_requests) { struct drbd_device *device = req->device; struct bio *bio = req->master_bio; unsigned long start_time = req->start_time; bool expected; expected = expect(atomic_read(&req->completion_ref) == 0) && expect(req->rq_state & RQ_POSTPONED) && expect((req->rq_state & RQ_LOCAL_PENDING) == 0 || (req->rq_state & RQ_LOCAL_ABORTED) != 0); if (!expected) drbd_err(device, "req=%p completion_ref=%d rq_state=%x\n", req, atomic_read(&req->completion_ref), req->rq_state); /* We still need to put one kref associated with the * "completion_ref" going zero in the code path that queued it * here. The request object may still be referenced by a * frozen local req->private_bio, in case we force-detached. */ kref_put(&req->kref, drbd_req_destroy); /* A single suspended or otherwise blocking device may stall * all others as well. Fortunately, this code path is to * recover from a situation that "should not happen": * concurrent writes in multi-primary setup. * In a "normal" lifecycle, this workqueue is supposed to be * destroyed without ever doing anything. * If it turns out to be an issue anyways, we can do per * resource (replication group) or per device (minor) retry * workqueues instead. */ /* We are not just doing generic_make_request(), * as we want to keep the start_time information. 
*/ inc_ap_bio(device); __drbd_make_request(device, bio, start_time); } } void drbd_restart_request(struct drbd_request *req) { unsigned long flags; spin_lock_irqsave(&retry.lock, flags); list_move_tail(&req->tl_requests, &retry.writes); spin_unlock_irqrestore(&retry.lock, flags); /* Drop the extra reference that would otherwise * have been dropped by complete_master_bio. * do_retry() needs to grab a new one. */ dec_ap_bio(req->device); queue_work(retry.wq, &retry.worker); } void drbd_destroy_resource(struct kref *kref) { struct drbd_resource *resource = container_of(kref, struct drbd_resource, kref); idr_destroy(&resource->devices); free_cpumask_var(resource->cpu_mask); kfree(resource->name); kfree(resource); } void drbd_free_resource(struct drbd_resource *resource) { struct drbd_connection *connection, *tmp; for_each_connection_safe(connection, tmp, resource) { list_del(&connection->connections); kref_put(&connection->kref, drbd_destroy_connection); } kref_put(&resource->kref, drbd_destroy_resource); } static void drbd_cleanup(void) { unsigned int i; struct drbd_device *device; struct drbd_resource *resource, *tmp; unregister_reboot_notifier(&drbd_notifier); /* first remove proc, * drbdsetup uses its presence to detect * whether DRBD is loaded. * If we would get stuck in proc removal, * but have netlink already deregistered, * some drbdsetup commands may wait forever * for an answer. */ if (drbd_proc) remove_proc_entry("drbd", NULL); if (retry.wq) destroy_workqueue(retry.wq); drbd_genl_unregister(); idr_for_each_entry(&drbd_devices, device, i) drbd_delete_device(device); /* not _rcu since, no other updater anymore. 
Genl already unregistered */ for_each_resource_safe(resource, tmp, &drbd_resources) { list_del(&resource->resources); drbd_free_resource(resource); } drbd_destroy_mempools(); drbd_unregister_blkdev(DRBD_MAJOR, "drbd"); idr_destroy(&drbd_devices); printk(KERN_INFO "drbd: module cleanup done.\n"); } /** * drbd_congested() - Callback for the flusher thread * @congested_data: User data * @bdi_bits: Bits the BDI flusher thread is currently interested in * * Returns 1<connection->flags)) { r |= (1 << BDI_async_congested); /* Without good local data, we would need to read from remote, * and that would need the worker thread as well, which is * currently blocked waiting for that usermode helper to * finish. */ if (!get_ldev_if_state(device, D_UP_TO_DATE)) r |= (1 << BDI_sync_congested); else put_ldev(device); r &= bdi_bits; reason = 'c'; goto out; } if (get_ldev(device)) { q = bdev_get_queue(device->ldev->backing_bdev); r = bdi_congested(&q->backing_dev_info, bdi_bits); put_ldev(device); if (r) reason = 'b'; } if (bdi_bits & (1 << BDI_async_congested) && test_bit(NET_CONGESTED, &first_peer_device(device)->connection->flags)) { r |= (1 << BDI_async_congested); reason = reason == 'b' ? 
'a' : 'n'; } out: device->congestion_reason = reason; return r; } static void drbd_init_workqueue(struct drbd_work_queue* wq) { spin_lock_init(&wq->q_lock); INIT_LIST_HEAD(&wq->q); init_waitqueue_head(&wq->q_wait); } struct completion_work { struct drbd_work w; struct completion done; }; static int w_complete(struct drbd_work *w, int cancel) { struct completion_work *completion_work = container_of(w, struct completion_work, w); complete(&completion_work->done); return 0; } void drbd_flush_workqueue(struct drbd_work_queue *work_queue) { struct completion_work completion_work; completion_work.w.cb = w_complete; init_completion(&completion_work.done); drbd_queue_work(work_queue, &completion_work.w); wait_for_completion(&completion_work.done); } struct drbd_resource *drbd_find_resource(const char *name) { struct drbd_resource *resource; if (!name || !name[0]) return NULL; rcu_read_lock(); for_each_resource_rcu(resource, &drbd_resources) { if (!strcmp(resource->name, name)) { kref_get(&resource->kref); goto found; } } resource = NULL; found: rcu_read_unlock(); return resource; } struct drbd_connection *conn_get_by_addrs(void *my_addr, int my_addr_len, void *peer_addr, int peer_addr_len) { struct drbd_resource *resource; struct drbd_connection *connection; rcu_read_lock(); for_each_resource_rcu(resource, &drbd_resources) { for_each_connection_rcu(connection, resource) { if (connection->my_addr_len == my_addr_len && connection->peer_addr_len == peer_addr_len && !memcmp(&connection->my_addr, my_addr, my_addr_len) && !memcmp(&connection->peer_addr, peer_addr, peer_addr_len)) { kref_get(&connection->kref); goto found; } } } connection = NULL; found: rcu_read_unlock(); return connection; } static int drbd_alloc_socket(struct drbd_socket *socket) { socket->rbuf = (void *) __get_free_page(GFP_KERNEL); if (!socket->rbuf) return -ENOMEM; socket->sbuf = (void *) __get_free_page(GFP_KERNEL); if (!socket->sbuf) return -ENOMEM; return 0; } static void drbd_free_socket(struct 
drbd_socket *socket) { free_page((unsigned long) socket->sbuf); free_page((unsigned long) socket->rbuf); } void conn_free_crypto(struct drbd_connection *connection) { drbd_free_sock(connection); crypto_free_hash(connection->csums_tfm); crypto_free_hash(connection->verify_tfm); crypto_free_hash(connection->cram_hmac_tfm); crypto_free_hash(connection->integrity_tfm); crypto_free_hash(connection->peer_integrity_tfm); kfree(connection->int_dig_in); kfree(connection->int_dig_vv); connection->csums_tfm = NULL; connection->verify_tfm = NULL; connection->cram_hmac_tfm = NULL; connection->integrity_tfm = NULL; connection->peer_integrity_tfm = NULL; connection->int_dig_in = NULL; connection->int_dig_vv = NULL; } int set_resource_options(struct drbd_resource *resource, struct res_opts *res_opts) { struct drbd_connection *connection; cpumask_var_t new_cpu_mask; int err; if (!zalloc_cpumask_var(&new_cpu_mask, GFP_KERNEL)) return -ENOMEM; /* retcode = ERR_NOMEM; drbd_msg_put_info("unable to allocate cpumask"); */ /* silently ignore cpu mask on UP kernel */ if (nr_cpu_ids > 1 && res_opts->cpu_mask[0] != 0) { err = bitmap_parse(res_opts->cpu_mask, DRBD_CPU_MASK_SIZE, cpumask_bits(new_cpu_mask), nr_cpu_ids); if (err) { drbd_warn(resource, "bitmap_parse() failed with %d\n", err); /* retcode = ERR_CPU_MASK_PARSE; */ goto fail; } } resource->res_opts = *res_opts; if (cpumask_empty(new_cpu_mask)) drbd_calc_cpu_mask(&new_cpu_mask); if (!cpumask_equal(resource->cpu_mask, new_cpu_mask)) { cpumask_copy(resource->cpu_mask, new_cpu_mask); for_each_connection_rcu(connection, resource) { connection->receiver.reset_cpu_mask = 1; connection->asender.reset_cpu_mask = 1; connection->worker.reset_cpu_mask = 1; } } err = 0; fail: free_cpumask_var(new_cpu_mask); return err; } struct drbd_resource *drbd_create_resource(const char *name) { struct drbd_resource *resource; resource = kzalloc(sizeof(struct drbd_resource), GFP_KERNEL); if (!resource) goto fail; resource->name = kstrdup(name, GFP_KERNEL); 
if (!resource->name) goto fail_free_resource; if (!zalloc_cpumask_var(&resource->cpu_mask, GFP_KERNEL)) goto fail_free_name; kref_init(&resource->kref); idr_init(&resource->devices); INIT_LIST_HEAD(&resource->connections); list_add_tail_rcu(&resource->resources, &drbd_resources); mutex_init(&resource->conf_update); spin_lock_init(&resource->req_lock); return resource; fail_free_name: kfree(resource->name); fail_free_resource: kfree(resource); fail: return NULL; } /* caller must be under genl_lock() */ struct drbd_connection *conn_create(const char *name, struct res_opts *res_opts) { struct drbd_resource *resource; struct drbd_connection *connection; connection = kzalloc(sizeof(struct drbd_connection), GFP_KERNEL); if (!connection) return NULL; if (drbd_alloc_socket(&connection->data)) goto fail; if (drbd_alloc_socket(&connection->meta)) goto fail; connection->current_epoch = kzalloc(sizeof(struct drbd_epoch), GFP_KERNEL); if (!connection->current_epoch) goto fail; INIT_LIST_HEAD(&connection->transfer_log); INIT_LIST_HEAD(&connection->current_epoch->list); connection->epochs = 1; spin_lock_init(&connection->epoch_lock); connection->write_ordering = WO_bio_barrier; connection->send.seen_any_write_yet = false; connection->send.current_epoch_nr = 0; connection->send.current_epoch_writes = 0; resource = drbd_create_resource(name); if (!resource) goto fail; connection->cstate = C_STANDALONE; mutex_init(&connection->cstate_mutex); init_waitqueue_head(&connection->ping_wait); idr_init(&connection->peer_devices); drbd_init_workqueue(&connection->sender_work); mutex_init(&connection->data.mutex); mutex_init(&connection->meta.mutex); drbd_thread_init(resource, &connection->receiver, drbd_receiver, "receiver"); connection->receiver.connection = connection; drbd_thread_init(resource, &connection->worker, drbd_worker, "worker"); connection->worker.connection = connection; drbd_thread_init(resource, &connection->asender, drbd_asender, "asender"); connection->asender.connection = 
connection; kref_init(&connection->kref); connection->resource = resource; if (set_resource_options(resource, res_opts)) goto fail_resource; kref_get(&resource->kref); list_add_tail_rcu(&connection->connections, &resource->connections); return connection; fail_resource: list_del(&resource->resources); drbd_free_resource(resource); fail: kfree(connection->current_epoch); drbd_free_socket(&connection->meta); drbd_free_socket(&connection->data); kfree(connection); return NULL; } void drbd_destroy_connection(struct kref *kref) { struct drbd_connection *connection = container_of(kref, struct drbd_connection, kref); struct drbd_resource *resource = connection->resource; if (atomic_read(&connection->current_epoch->epoch_size) != 0) drbd_err(connection, "epoch_size:%d\n", atomic_read(&connection->current_epoch->epoch_size)); kfree(connection->current_epoch); idr_destroy(&connection->peer_devices); drbd_free_socket(&connection->meta); drbd_free_socket(&connection->data); kfree(connection->int_dig_in); kfree(connection->int_dig_vv); kfree(connection); kref_put(&resource->kref, drbd_destroy_resource); } int init_submitter(struct drbd_device *device) { #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,3,0) /* opencoded create_singlethread_workqueue(), * to be able to say "drbd%d", ..., minor */ device->submit.wq = alloc_workqueue("drbd%u_submit", WQ_UNBOUND | WQ_MEM_RECLAIM, 1, device->minor); #else device->submit.wq = create_singlethread_workqueue("drbd_submit"); #endif if (!device->submit.wq) return -ENOMEM; #ifdef COMPAT_INIT_WORK_HAS_THREE_ARGUMENTS INIT_WORK(&device->submit.worker, do_submit, &device->submit.worker); #else INIT_WORK(&device->submit.worker, do_submit); #endif spin_lock_init(&device->submit.lock); INIT_LIST_HEAD(&device->submit.writes); return 0; } enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsigned int minor) { struct drbd_resource *resource = adm_ctx->resource; struct kobject *parent; struct drbd_connection *connection; struct 
drbd_device *device; struct drbd_peer_device *peer_device, *tmp_peer_device; struct gendisk *disk; struct request_queue *q; int id, refs = 2; int vnr = adm_ctx->volume; enum drbd_ret_code err = ERR_NOMEM; device = minor_to_device(minor); if (device) return ERR_MINOR_EXISTS; /* GFP_KERNEL, we are outside of all write-out paths */ device = kzalloc(sizeof(struct drbd_device), GFP_KERNEL); if (!device) return ERR_NOMEM; kref_get(&resource->kref); device->resource = resource; device->minor = minor; device->vnr = vnr; drbd_init_set_defaults(device); q = blk_alloc_queue(GFP_KERNEL); if (!q) goto out_no_q; device->rq_queue = q; q->queuedata = device; disk = alloc_disk(1); if (!disk) goto out_no_disk; device->vdisk = disk; set_disk_ro(disk, true); disk->queue = q; disk->major = DRBD_MAJOR; disk->first_minor = minor; disk->fops = &drbd_ops; sprintf(disk->disk_name, "drbd%d", minor); disk->private_data = device; device->this_bdev = bdget(MKDEV(DRBD_MAJOR, minor)); /* we have no partitions. we contain only ourselves. */ device->this_bdev->bd_contains = device->this_bdev; q->backing_dev_info.congested_fn = drbd_congested; q->backing_dev_info.congested_data = device; blk_queue_make_request(q, drbd_make_request); #ifdef REQ_FLUSH blk_queue_flush(q, REQ_FLUSH | REQ_FUA); #endif /* Setting the max_hw_sectors to an odd value of 8kibyte here This triggers a max_bio_size message upon first attach or connect */ blk_queue_max_hw_sectors(q, DRBD_MAX_BIO_SIZE_SAFE >> 8); blk_queue_bounce_limit(q, BLK_BOUNCE_ANY); blk_queue_merge_bvec(q, drbd_merge_bvec); q->queue_lock = &resource->req_lock; #ifdef blk_queue_plugged /* plugging on a queue, that actually has no requests! 
*/ q->unplug_fn = drbd_unplug_fn; #endif device->md_io_page = alloc_page(GFP_KERNEL); if (!device->md_io_page) goto out_no_io_page; if (drbd_bm_init(device)) goto out_no_bitmap; device->read_requests = RB_ROOT; device->write_requests = RB_ROOT; id = idr_alloc(&drbd_devices, device, minor, minor + 1, GFP_KERNEL); if (id < 0) { if (id == -ENOSPC) { err = ERR_MINOR_EXISTS; drbd_msg_put_info(adm_ctx->reply_skb, "requested minor exists already"); } goto out_no_minor_idr; } id = idr_alloc(&resource->devices, device, vnr, vnr + 1, GFP_KERNEL); if (id < 0) { if (id == -ENOSPC) { err = ERR_MINOR_EXISTS; drbd_msg_put_info(adm_ctx->reply_skb, "requested minor exists already"); } goto out_idr_remove_minor; } INIT_LIST_HEAD(&device->peer_devices); for_each_connection(connection, resource) { peer_device = kzalloc(sizeof(struct drbd_peer_device), GFP_KERNEL); if (!peer_device) goto out_idr_remove_from_resource; peer_device->connection = connection; peer_device->device = device; list_add(&peer_device->peer_devices, &device->peer_devices); refs++; id = idr_alloc(&connection->peer_devices, peer_device, vnr, vnr + 1, GFP_KERNEL); if (id < 0) { if (id == -ENOSPC) { err = ERR_INVALID_REQUEST; drbd_msg_put_info(adm_ctx->reply_skb, "requested volume exists already"); } goto out_idr_remove_from_resource; } kref_get(&connection->kref); } if (init_submitter(device)) { err = ERR_NOMEM; drbd_msg_put_info(adm_ctx->reply_skb, "unable to create submit workqueue"); goto out_idr_remove_vol; } add_disk(disk); parent = drbd_kobj_of_disk(disk); /* one ref for both idrs and the add_disk */ if (kobject_init_and_add(&device->kobj, &drbd_device_kobj_type, parent, "drbd")) goto out_del_disk; while (refs--) kobject_get(&device->kobj); /* inherit the connection state */ device->state.conn = first_connection(resource)->cstate; if (device->state.conn == C_WF_REPORT_PARAMS) { for_each_peer_device(peer_device, device) drbd_connected(peer_device); } return NO_ERROR; out_del_disk: 
destroy_workqueue(device->submit.wq); del_gendisk(device->vdisk); out_idr_remove_vol: idr_remove(&connection->peer_devices, vnr); out_idr_remove_from_resource: for_each_connection(connection, resource) { peer_device = idr_find(&connection->peer_devices, vnr); if (peer_device) { idr_remove(&connection->peer_devices, vnr); kref_put(&connection->kref, drbd_destroy_connection); } } for_each_peer_device_safe(peer_device, tmp_peer_device, device) { list_del(&peer_device->peer_devices); kfree(peer_device); } idr_remove(&resource->devices, vnr); out_idr_remove_minor: idr_remove(&drbd_devices, minor); synchronize_rcu(); out_no_minor_idr: drbd_bm_cleanup(device); out_no_bitmap: __free_page(device->md_io_page); out_no_io_page: put_disk(disk); out_no_disk: blk_cleanup_queue(q); out_no_q: kref_put(&resource->kref, drbd_destroy_resource); kfree(device); return err; } void drbd_delete_device(struct drbd_device *device) { struct drbd_resource *resource = device->resource; struct drbd_connection *connection; for_each_connection(connection, resource) { idr_remove(&connection->peer_devices, device->vnr); kobject_put(&device->kobj); } idr_remove(&resource->devices, device->vnr); kobject_put(&device->kobj); idr_remove(&drbd_devices, device_to_minor(device)); kobject_put(&device->kobj); destroy_workqueue(device->submit.wq); del_gendisk(device->vdisk); kobject_del(&device->kobj); synchronize_rcu(); kobject_put(&device->kobj); } int __init drbd_init(void) { int err; if (minor_count < DRBD_MINOR_COUNT_MIN || minor_count > DRBD_MINOR_COUNT_MAX) { printk(KERN_ERR "drbd: invalid minor_count (%d)\n", minor_count); #ifdef MODULE return -EINVAL; #else minor_count = DRBD_MINOR_COUNT_DEF; #endif } err = register_blkdev(DRBD_MAJOR, "drbd"); if (err) { printk(KERN_ERR "drbd: unable to register block device major %d\n", DRBD_MAJOR); return err; } register_reboot_notifier(&drbd_notifier); /* * allocate all necessary structs */ init_waitqueue_head(&drbd_pp_wait); drbd_proc = NULL; /* play safe for 
drbd_cleanup */ idr_init(&drbd_devices); rwlock_init(&global_state_lock); INIT_LIST_HEAD(&drbd_resources); err = drbd_genl_register(); if (err) { printk(KERN_ERR "drbd: unable to register generic netlink family\n"); goto fail; } err = drbd_create_mempools(); if (err) goto fail; err = -ENOMEM; drbd_proc = proc_create_data("drbd", S_IFREG | S_IRUGO , NULL, &drbd_proc_fops, NULL); if (!drbd_proc) { printk(KERN_ERR "drbd: unable to register proc file\n"); goto fail; } retry.wq = create_singlethread_workqueue("drbd-reissue"); if (!retry.wq) { printk(KERN_ERR "drbd: unable to create retry workqueue\n"); goto fail; } #ifdef COMPAT_INIT_WORK_HAS_THREE_ARGUMENTS INIT_WORK(&retry.worker, do_retry, &retry.worker); #else INIT_WORK(&retry.worker, do_retry); #endif spin_lock_init(&retry.lock); INIT_LIST_HEAD(&retry.writes); printk(KERN_INFO "drbd: initialized. " "Version: " REL_VERSION " (api:%d/proto:%d-%d)\n", API_VERSION, PRO_VERSION_MIN, PRO_VERSION_MAX); printk(KERN_INFO "drbd: %s\n", drbd_buildtag()); printk(KERN_INFO "drbd: registered as block device major %d\n", DRBD_MAJOR); return 0; /* Success! 
*/ fail: drbd_cleanup(); if (err == -ENOMEM) printk(KERN_ERR "drbd: ran out of memory\n"); else printk(KERN_ERR "drbd: initialization failure\n"); return err; } void drbd_free_bc(struct drbd_backing_dev *ldev) { if (ldev == NULL) return; blkdev_put(ldev->backing_bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); blkdev_put(ldev->md_bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); kobject_del(&ldev->kobject); kobject_put(&ldev->kobject); } void drbd_free_sock(struct drbd_connection *connection) { if (connection->data.socket) { mutex_lock(&connection->data.mutex); kernel_sock_shutdown(connection->data.socket, SHUT_RDWR); sock_release(connection->data.socket); connection->data.socket = NULL; mutex_unlock(&connection->data.mutex); } if (connection->meta.socket) { mutex_lock(&connection->meta.mutex); kernel_sock_shutdown(connection->meta.socket, SHUT_RDWR); sock_release(connection->meta.socket); connection->meta.socket = NULL; mutex_unlock(&connection->meta.mutex); } } /* meta data management */ void conn_md_sync(struct drbd_connection *connection) { struct drbd_peer_device *peer_device; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; kobject_get(&device->kobj); rcu_read_unlock(); drbd_md_sync(device); kobject_put(&device->kobj); rcu_read_lock(); } rcu_read_unlock(); } /* aligned 4kByte */ struct meta_data_on_disk { u64 la_size_sect; /* last agreed size. */ u64 uuid[UI_SIZE]; /* UUIDs. 
*/ u64 device_uuid; u64 reserved_u64_1; u32 flags; /* MDF */ u32 magic; u32 md_size_sect; u32 al_offset; /* offset to this block */ u32 al_nr_extents; /* important for restoring the AL (userspace) */ /* `-- act_log->nr_elements <-- ldev->dc.al_extents */ u32 bm_offset; /* offset to the bitmap, from here */ u32 bm_bytes_per_bit; /* BM_BLOCK_SIZE */ u32 la_peer_max_bio_size; /* last peer max_bio_size */ /* see al_tr_number_to_on_disk_sector() */ u32 al_stripes; u32 al_stripe_size_4k; u8 reserved_u8[4096 - (7*8 + 10*4)]; } __packed; void drbd_md_write(struct drbd_device *device, void *b) { struct meta_data_on_disk *buffer = b; sector_t sector; int i; memset(buffer, 0, sizeof(*buffer)); buffer->la_size_sect = cpu_to_be64(drbd_get_capacity(device->this_bdev)); for (i = UI_CURRENT; i < UI_SIZE; i++) buffer->uuid[i] = cpu_to_be64(device->ldev->md.uuid[i]); buffer->flags = cpu_to_be32(device->ldev->md.flags); buffer->magic = cpu_to_be32(DRBD_MD_MAGIC_84_UNCLEAN); buffer->md_size_sect = cpu_to_be32(device->ldev->md.md_size_sect); buffer->al_offset = cpu_to_be32(device->ldev->md.al_offset); buffer->al_nr_extents = cpu_to_be32(device->act_log->nr_elements); buffer->bm_bytes_per_bit = cpu_to_be32(BM_BLOCK_SIZE); buffer->device_uuid = cpu_to_be64(device->ldev->md.device_uuid); buffer->bm_offset = cpu_to_be32(device->ldev->md.bm_offset); buffer->la_peer_max_bio_size = cpu_to_be32(device->peer_max_bio_size); buffer->al_stripes = cpu_to_be32(device->ldev->md.al_stripes); buffer->al_stripe_size_4k = cpu_to_be32(device->ldev->md.al_stripe_size_4k); D_ASSERT(device, drbd_md_ss(device->ldev) == device->ldev->md.md_offset); sector = device->ldev->md.md_offset; if (drbd_md_sync_page_io(device, device->ldev, sector, WRITE)) { /* this was a try anyways ... */ drbd_err(device, "meta data update failed!\n"); drbd_chk_io_error(device, 1, DRBD_META_IO_ERROR); } } /** * drbd_md_sync() - Writes the meta data super block if the MD_DIRTY flag bit is set * @device: DRBD device. 
*/ void drbd_md_sync(struct drbd_device *device) { struct meta_data_on_disk *buffer; /* Don't accidentally change the DRBD meta data layout. */ BUILD_BUG_ON(UI_SIZE != 4); BUILD_BUG_ON(sizeof(struct meta_data_on_disk) != 4096); del_timer(&device->md_sync_timer); /* timer may be rearmed by drbd_md_mark_dirty() now. */ if (!test_and_clear_bit(MD_DIRTY, &device->flags)) return; /* We use here D_FAILED and not D_ATTACHING because we try to write * metadata even if we detach due to a disk failure! */ if (!get_ldev_if_state(device, D_FAILED)) return; buffer = drbd_md_get_buffer(device); if (!buffer) goto out; drbd_md_write(device, buffer); /* Update device->ldev->md.la_size_sect, * since we updated it on metadata. */ device->ldev->md.la_size_sect = drbd_get_capacity(device->this_bdev); drbd_md_put_buffer(device); out: put_ldev(device); } static int check_activity_log_stripe_size(struct drbd_device *device, struct meta_data_on_disk *on_disk, struct drbd_md *in_core) { u32 al_stripes = be32_to_cpu(on_disk->al_stripes); u32 al_stripe_size_4k = be32_to_cpu(on_disk->al_stripe_size_4k); u64 al_size_4k; /* both not set: default to old fixed size activity log */ if (al_stripes == 0 && al_stripe_size_4k == 0) { al_stripes = 1; al_stripe_size_4k = MD_32kB_SECT/8; } /* some paranoia plausibility checks */ /* we need both values to be set */ if (al_stripes == 0 || al_stripe_size_4k == 0) goto err; al_size_4k = (u64)al_stripes * al_stripe_size_4k; /* Upper limit of activity log area, to avoid potential overflow * problems in al_tr_number_to_on_disk_sector(). 
As right now, more * than 72 * 4k blocks total only increases the amount of history, * limiting this arbitrarily to 16 GB is not a real limitation ;-) */ if (al_size_4k > (16 * 1024 * 1024/4)) goto err; /* Lower limit: we need at least 8 transaction slots (32kB) * to not break existing setups */ if (al_size_4k < MD_32kB_SECT/8) goto err; in_core->al_stripe_size_4k = al_stripe_size_4k; in_core->al_stripes = al_stripes; in_core->al_size_4k = al_size_4k; return 0; err: drbd_err(device, "invalid activity log striping: al_stripes=%u, al_stripe_size_4k=%u\n", al_stripes, al_stripe_size_4k); return -EINVAL; } static int check_offsets_and_sizes(struct drbd_device *device, struct drbd_backing_dev *bdev) { sector_t capacity = drbd_get_capacity(bdev->md_bdev); struct drbd_md *in_core = &bdev->md; s32 on_disk_al_sect; s32 on_disk_bm_sect; /* The on-disk size of the activity log, calculated from offsets, and * the size of the activity log calculated from the stripe settings, * should match. * Though we could relax this a bit: it is ok, if the striped activity log * fits in the available on-disk activity log size. * Right now, that would break how resize is implemented. * TODO: make drbd_determine_dev_size() (and the drbdmeta tool) aware * of possible unused padding space in the on disk layout. */ if (in_core->al_offset < 0) { if (in_core->bm_offset > in_core->al_offset) goto err; on_disk_al_sect = -in_core->al_offset; on_disk_bm_sect = in_core->al_offset - in_core->bm_offset; } else { if (in_core->al_offset != MD_4kB_SECT) goto err; if (in_core->bm_offset < in_core->al_offset + in_core->al_size_4k * MD_4kB_SECT) goto err; on_disk_al_sect = in_core->bm_offset - MD_4kB_SECT; on_disk_bm_sect = in_core->md_size_sect - in_core->bm_offset; } /* old fixed size meta data is exactly that: fixed. 
*/ if (in_core->meta_dev_idx >= 0) { if (in_core->md_size_sect != MD_128MB_SECT || in_core->al_offset != MD_4kB_SECT || in_core->bm_offset != MD_4kB_SECT + MD_32kB_SECT || in_core->al_stripes != 1 || in_core->al_stripe_size_4k != MD_32kB_SECT/8) goto err; } if (capacity < in_core->md_size_sect) goto err; if (capacity - in_core->md_size_sect < drbd_md_first_sector(bdev)) goto err; /* should be aligned, and at least 32k */ if ((on_disk_al_sect & 7) || (on_disk_al_sect < MD_32kB_SECT)) goto err; /* should fit (for now: exactly) into the available on-disk space; * overflow prevention is in check_activity_log_stripe_size() above. */ if (on_disk_al_sect != in_core->al_size_4k * MD_4kB_SECT) goto err; /* again, should be aligned */ if (in_core->bm_offset & 7) goto err; /* FIXME check for device grow with flex external meta data? */ /* can the available bitmap space cover the last agreed device size? */ if (on_disk_bm_sect < (in_core->la_size_sect+7)/MD_4kB_SECT/8/512) goto err; return 0; err: drbd_err(device, "meta data offsets don't make sense: idx=%d " "al_s=%u, al_sz4k=%u, al_offset=%d, bm_offset=%d, " "md_size_sect=%u, la_size=%llu, md_capacity=%llu\n", in_core->meta_dev_idx, in_core->al_stripes, in_core->al_stripe_size_4k, in_core->al_offset, in_core->bm_offset, in_core->md_size_sect, (unsigned long long)in_core->la_size_sect, (unsigned long long)capacity); return -EINVAL; } /** * drbd_md_read() - Reads in the meta data super block * @device: DRBD device. * @bdev: Device from which the meta data should be read in. * * Return NO_ERROR on success, and an enum drbd_ret_code in case * something goes wrong. * * Called exactly once during drbd_adm_attach(), while still being D_DISKLESS, * even before @bdev is assigned to @device->ldev. 
*/ int drbd_md_read(struct drbd_device *device, struct drbd_backing_dev *bdev) { struct meta_data_on_disk *buffer; u32 magic, flags; int i, rv = NO_ERROR; if (device->state.disk != D_DISKLESS) return ERR_DISK_CONFIGURED; buffer = drbd_md_get_buffer(device); if (!buffer) return ERR_NOMEM; /* First, figure out where our meta data superblock is located, * and read it. */ bdev->md.meta_dev_idx = bdev->disk_conf->meta_dev_idx; bdev->md.md_offset = drbd_md_ss(bdev); if (drbd_md_sync_page_io(device, bdev, bdev->md.md_offset, READ)) { /* NOTE: can't do normal error processing here as this is called BEFORE disk is attached */ drbd_err(device, "Error while reading metadata.\n"); rv = ERR_IO_MD_DISK; goto err; } magic = be32_to_cpu(buffer->magic); flags = be32_to_cpu(buffer->flags); if (magic == DRBD_MD_MAGIC_84_UNCLEAN || (magic == DRBD_MD_MAGIC_08 && !(flags & MDF_AL_CLEAN))) { /* btw: that's Activity Log clean, not "all" clean. */ drbd_err(device, "Found unclean meta data. Did you \"drbdadm apply-al\"?\n"); rv = ERR_MD_UNCLEAN; goto err; } rv = ERR_MD_INVALID; if (magic != DRBD_MD_MAGIC_08) { if (magic == DRBD_MD_MAGIC_07) drbd_err(device, "Found old (0.7) meta data magic. Did you \"drbdadm create-md\"?\n"); else drbd_err(device, "Meta data magic not found. 
Did you \"drbdadm create-md\"?\n"); goto err; } if (be32_to_cpu(buffer->bm_bytes_per_bit) != BM_BLOCK_SIZE) { drbd_err(device, "unexpected bm_bytes_per_bit: %u (expected %u)\n", be32_to_cpu(buffer->bm_bytes_per_bit), BM_BLOCK_SIZE); goto err; } /* convert to in_core endian */ bdev->md.la_size_sect = be64_to_cpu(buffer->la_size_sect); for (i = UI_CURRENT; i < UI_SIZE; i++) bdev->md.uuid[i] = be64_to_cpu(buffer->uuid[i]); bdev->md.flags = be32_to_cpu(buffer->flags); bdev->md.device_uuid = be64_to_cpu(buffer->device_uuid); bdev->md.md_size_sect = be32_to_cpu(buffer->md_size_sect); bdev->md.al_offset = be32_to_cpu(buffer->al_offset); bdev->md.bm_offset = be32_to_cpu(buffer->bm_offset); if (check_activity_log_stripe_size(device, buffer, &bdev->md)) goto err; if (check_offsets_and_sizes(device, bdev)) goto err; if (be32_to_cpu(buffer->bm_offset) != bdev->md.bm_offset) { drbd_err(device, "unexpected bm_offset: %d (expected %d)\n", be32_to_cpu(buffer->bm_offset), bdev->md.bm_offset); goto err; } if (be32_to_cpu(buffer->md_size_sect) != bdev->md.md_size_sect) { drbd_err(device, "unexpected md_size: %u (expected %u)\n", be32_to_cpu(buffer->md_size_sect), bdev->md.md_size_sect); goto err; } rv = NO_ERROR; spin_lock_irq(&device->resource->req_lock); if (device->state.conn < C_CONNECTED) { unsigned int peer; peer = be32_to_cpu(buffer->la_peer_max_bio_size); peer = max(peer, DRBD_MAX_BIO_SIZE_SAFE); device->peer_max_bio_size = peer; } spin_unlock_irq(&device->resource->req_lock); err: drbd_md_put_buffer(device); return rv; } /** * drbd_md_mark_dirty() - Mark meta data super block as dirty * @device: DRBD device. * * Call this function if you change anything that should be written to * the meta-data super block. This function sets MD_DIRTY, and starts a * timer that ensures that within five seconds you have to call drbd_md_sync(). 
*/ #ifdef DRBD_DEBUG_MD_SYNC void drbd_md_mark_dirty_(struct drbd_device *device, unsigned int line, const char *func) { if (!test_and_set_bit(MD_DIRTY, &device->flags)) { mod_timer(&device->md_sync_timer, jiffies + HZ); device->last_md_mark_dirty.line = line; device->last_md_mark_dirty.func = func; } } #else void drbd_md_mark_dirty(struct drbd_device *device) { if (!test_and_set_bit(MD_DIRTY, &device->flags)) mod_timer(&device->md_sync_timer, jiffies + 5*HZ); } #endif void drbd_uuid_move_history(struct drbd_device *device) __must_hold(local) { int i; for (i = UI_HISTORY_START; i < UI_HISTORY_END; i++) device->ldev->md.uuid[i+1] = device->ldev->md.uuid[i]; } void __drbd_uuid_set(struct drbd_device *device, int idx, u64 val) __must_hold(local) { if (idx == UI_CURRENT) { if (device->state.role == R_PRIMARY) val |= 1; else val &= ~((u64)1); drbd_set_ed_uuid(device, val); } device->ldev->md.uuid[idx] = val; drbd_md_mark_dirty(device); } void _drbd_uuid_set(struct drbd_device *device, int idx, u64 val) __must_hold(local) { unsigned long flags; spin_lock_irqsave(&device->ldev->md.uuid_lock, flags); __drbd_uuid_set(device, idx, val); spin_unlock_irqrestore(&device->ldev->md.uuid_lock, flags); } void drbd_uuid_set(struct drbd_device *device, int idx, u64 val) __must_hold(local) { unsigned long flags; spin_lock_irqsave(&device->ldev->md.uuid_lock, flags); if (device->ldev->md.uuid[idx]) { drbd_uuid_move_history(device); device->ldev->md.uuid[UI_HISTORY_START] = device->ldev->md.uuid[idx]; } __drbd_uuid_set(device, idx, val); spin_unlock_irqrestore(&device->ldev->md.uuid_lock, flags); } /** * drbd_uuid_new_current() - Creates a new current UUID * @device: DRBD device. * * Creates a new current UUID, and rotates the old current UUID into * the bitmap slot. Causes an incremental resync upon next connect. 
*/ void drbd_uuid_new_current(struct drbd_device *device) __must_hold(local) { u64 val; unsigned long long bm_uuid; get_random_bytes(&val, sizeof(u64)); spin_lock_irq(&device->ldev->md.uuid_lock); bm_uuid = device->ldev->md.uuid[UI_BITMAP]; if (bm_uuid) drbd_warn(device, "bm UUID was already set: %llX\n", bm_uuid); device->ldev->md.uuid[UI_BITMAP] = device->ldev->md.uuid[UI_CURRENT]; __drbd_uuid_set(device, UI_CURRENT, val); spin_unlock_irq(&device->ldev->md.uuid_lock); drbd_print_uuids(device, "new current UUID"); /* get it to stable storage _now_ */ drbd_md_sync(device); } void drbd_uuid_set_bm(struct drbd_device *device, u64 val) __must_hold(local) { unsigned long flags; if (device->ldev->md.uuid[UI_BITMAP] == 0 && val == 0) return; spin_lock_irqsave(&device->ldev->md.uuid_lock, flags); if (val == 0) { drbd_uuid_move_history(device); device->ldev->md.uuid[UI_HISTORY_START] = device->ldev->md.uuid[UI_BITMAP]; device->ldev->md.uuid[UI_BITMAP] = 0; } else { unsigned long long bm_uuid = device->ldev->md.uuid[UI_BITMAP]; if (bm_uuid) drbd_warn(device, "bm UUID was already set: %llX\n", bm_uuid); device->ldev->md.uuid[UI_BITMAP] = val & ~((u64)1); } spin_unlock_irqrestore(&device->ldev->md.uuid_lock, flags); drbd_md_mark_dirty(device); } /** * drbd_bmio_set_n_write() - io_fn for drbd_queue_bitmap_io() or drbd_bitmap_io() * @device: DRBD device. * * Sets all bits in the bitmap and writes the whole bitmap to stable storage. */ int drbd_bmio_set_n_write(struct drbd_device *device) { int rv = -EIO; if (get_ldev_if_state(device, D_ATTACHING)) { drbd_md_set_flag(device, MDF_FULL_SYNC); drbd_md_sync(device); drbd_bm_set_all(device); rv = drbd_bm_write(device); if (!rv) { drbd_md_clear_flag(device, MDF_FULL_SYNC); drbd_md_sync(device); } put_ldev(device); } return rv; } /** * drbd_bmio_clear_n_write() - io_fn for drbd_queue_bitmap_io() or drbd_bitmap_io() * @device: DRBD device. * * Clears all bits in the bitmap and writes the whole bitmap to stable storage. 
*/ int drbd_bmio_clear_n_write(struct drbd_device *device) { int rv = -EIO; drbd_resume_al(device); if (get_ldev_if_state(device, D_ATTACHING)) { drbd_bm_clear_all(device); rv = drbd_bm_write(device); put_ldev(device); } return rv; } static int w_bitmap_io(struct drbd_work *w, int unused) { struct drbd_device *device = container_of(w, struct drbd_device, bm_io_work.w); struct bm_io_work *work = &device->bm_io_work; int rv = -EIO; D_ASSERT(device, atomic_read(&device->ap_bio_cnt) == 0); if (get_ldev(device)) { drbd_bm_lock(device, work->why, work->flags); rv = work->io_fn(device); drbd_bm_unlock(device); put_ldev(device); } clear_bit_unlock(BITMAP_IO, &device->flags); wake_up(&device->misc_wait); if (work->done) work->done(device, rv); clear_bit(BITMAP_IO_QUEUED, &device->flags); work->why = NULL; work->flags = 0; return 0; } void drbd_ldev_destroy(struct drbd_device *device) { lc_destroy(device->resync); device->resync = NULL; lc_destroy(device->act_log); device->act_log = NULL; __no_warn(local, drbd_free_bc(device->ldev); device->ldev = NULL;); clear_bit(GO_DISKLESS, &device->flags); } static int w_go_diskless(struct drbd_work *w, int unused) { struct drbd_device *device = container_of(w, struct drbd_device, go_diskless); D_ASSERT(device, device->state.disk == D_FAILED); /* we cannot assert local_cnt == 0 here, as get_ldev_if_state will * inc/dec it frequently. Once we are D_DISKLESS, no one will touch * the protected members anymore, though, so once put_ldev reaches zero * again, it will be safe to free them. */ /* Try to write changed bitmap pages, read errors may have just * set some bits outside the area covered by the activity log. * * If we have an IO error during the bitmap writeout, * we will want a full sync next time, just in case. * (Do we want a specific meta data flag for this?) * * If that does not make it to stable storage either, * we cannot do anything about that anymore. 
* * We still need to check if both bitmap and ldev are present, we may * end up here after a failed attach, before ldev was even assigned. */ if (device->bitmap && device->ldev) { /* An interrupted resync or similar is allowed to recount bits * while we detach. * Any modifications would not be expected anymore, though. */ if (drbd_bitmap_io_from_worker(device, drbd_bm_write, "detach", BM_LOCKED_TEST_ALLOWED)) { if (test_bit(WAS_READ_ERROR, &device->flags)) { drbd_md_set_flag(device, MDF_FULL_SYNC); drbd_md_sync(device); } } } drbd_force_state(device, NS(disk, D_DISKLESS)); return 0; } /** * drbd_queue_bitmap_io() - Queues an IO operation on the whole bitmap * @device: DRBD device. * @io_fn: IO callback to be called when bitmap IO is possible * @done: callback to be called after the bitmap IO was performed * @why: Descriptive text of the reason for doing the IO * * While IO on the bitmap happens we freeze application IO thus we ensure * that drbd_set_out_of_sync() can not be called. This function MAY ONLY be * called from worker context. It MUST NOT be used while a previous such * work is still pending!
*/ void drbd_queue_bitmap_io(struct drbd_device *device, int (*io_fn)(struct drbd_device *), void (*done)(struct drbd_device *, int), char *why, enum bm_flag flags) { D_ASSERT(device, current == first_peer_device(device)->connection->worker.task); D_ASSERT(device, !test_bit(BITMAP_IO_QUEUED, &device->flags)); D_ASSERT(device, !test_bit(BITMAP_IO, &device->flags)); D_ASSERT(device, list_empty(&device->bm_io_work.w.list)); if (device->bm_io_work.why) drbd_err(device, "FIXME going to queue '%s' but '%s' still pending?\n", why, device->bm_io_work.why); device->bm_io_work.io_fn = io_fn; device->bm_io_work.done = done; device->bm_io_work.why = why; device->bm_io_work.flags = flags; spin_lock_irq(&device->resource->req_lock); set_bit(BITMAP_IO, &device->flags); if (atomic_read(&device->ap_bio_cnt) == 0) { if (!test_and_set_bit(BITMAP_IO_QUEUED, &device->flags)) drbd_queue_work(&first_peer_device(device)->connection->sender_work, &device->bm_io_work.w); } spin_unlock_irq(&device->resource->req_lock); } /** * drbd_bitmap_io() - Does an IO operation on the whole bitmap * @device: DRBD device. * @io_fn: IO callback to be called when bitmap IO is possible * @why: Descriptive text of the reason for doing the IO * * Freezes application IO while the actual IO operation runs. This * function MUST NOT be called from worker context.
*/ int drbd_bitmap_io(struct drbd_device *device, int (*io_fn)(struct drbd_device *), char *why, enum bm_flag flags) { int rv; D_ASSERT(device, current != first_peer_device(device)->connection->worker.task); if ((flags & BM_LOCKED_SET_ALLOWED) == 0) drbd_suspend_io(device); drbd_bm_lock(device, why, flags); rv = io_fn(device); drbd_bm_unlock(device); if ((flags & BM_LOCKED_SET_ALLOWED) == 0) drbd_resume_io(device); return rv; } void drbd_md_set_flag(struct drbd_device *device, int flag) __must_hold(local) { if ((device->ldev->md.flags & flag) != flag) { drbd_md_mark_dirty(device); device->ldev->md.flags |= flag; } } void drbd_md_clear_flag(struct drbd_device *device, int flag) __must_hold(local) { if ((device->ldev->md.flags & flag) != 0) { drbd_md_mark_dirty(device); device->ldev->md.flags &= ~flag; } } int drbd_md_test_flag(struct drbd_backing_dev *bdev, int flag) { return (bdev->md.flags & flag) != 0; } static void md_sync_timer_fn(unsigned long data) { struct drbd_device *device = (struct drbd_device *) data; /* must not double-queue! */ if (list_empty(&device->md_sync_work.list)) drbd_queue_work_front(&first_peer_device(device)->connection->sender_work, &device->md_sync_work); } static int w_md_sync(struct drbd_work *w, int unused) { struct drbd_device *device = container_of(w, struct drbd_device, md_sync_work); drbd_warn(device, "md_sync_timer expired! 
Worker calls drbd_md_sync().\n"); #ifdef DRBD_DEBUG_MD_SYNC drbd_warn(device, "last md_mark_dirty: %s:%u\n", device->last_md_mark_dirty.func, device->last_md_mark_dirty.line); #endif drbd_md_sync(device); return 0; } const char *cmdname(enum drbd_packet cmd) { /* THINK may need to become several global tables * when we want to support more than * one PRO_VERSION */ static const char *cmdnames[] = { [P_DATA] = "Data", [P_DATA_REPLY] = "DataReply", [P_RS_DATA_REPLY] = "RSDataReply", [P_BARRIER] = "Barrier", [P_BITMAP] = "ReportBitMap", [P_BECOME_SYNC_TARGET] = "BecomeSyncTarget", [P_BECOME_SYNC_SOURCE] = "BecomeSyncSource", [P_UNPLUG_REMOTE] = "UnplugRemote", [P_DATA_REQUEST] = "DataRequest", [P_RS_DATA_REQUEST] = "RSDataRequest", [P_SYNC_PARAM] = "SyncParam", [P_SYNC_PARAM89] = "SyncParam89", [P_PROTOCOL] = "ReportProtocol", [P_UUIDS] = "ReportUUIDs", [P_SIZES] = "ReportSizes", [P_STATE] = "ReportState", [P_SYNC_UUID] = "ReportSyncUUID", [P_AUTH_CHALLENGE] = "AuthChallenge", [P_AUTH_RESPONSE] = "AuthResponse", [P_PING] = "Ping", [P_PING_ACK] = "PingAck", [P_RECV_ACK] = "RecvAck", [P_WRITE_ACK] = "WriteAck", [P_RS_WRITE_ACK] = "RSWriteAck", [P_SUPERSEDED] = "Superseded", [P_NEG_ACK] = "NegAck", [P_NEG_DREPLY] = "NegDReply", [P_NEG_RS_DREPLY] = "NegRSDReply", [P_BARRIER_ACK] = "BarrierAck", [P_STATE_CHG_REQ] = "StateChgRequest", [P_STATE_CHG_REPLY] = "StateChgReply", [P_OV_REQUEST] = "OVRequest", [P_OV_REPLY] = "OVReply", [P_OV_RESULT] = "OVResult", [P_CSUM_RS_REQUEST] = "CsumRSRequest", [P_RS_IS_IN_SYNC] = "CsumRSIsInSync", [P_COMPRESSED_BITMAP] = "CBitmap", [P_DELAY_PROBE] = "DelayProbe", [P_OUT_OF_SYNC] = "OutOfSync", [P_RETRY_WRITE] = "RetryWrite", [P_RS_CANCEL] = "RSCancel", [P_CONN_ST_CHG_REQ] = "conn_st_chg_req", [P_CONN_ST_CHG_REPLY] = "conn_st_chg_reply", [P_RETRY_WRITE] = "retry_write", [P_PROTOCOL_UPDATE] = "protocol_update", /* enum drbd_packet, but not commands - obsoleted flags: * P_MAY_IGNORE * P_MAX_OPT_CMD */ }; /* too big for the array: 0xfffX */ if 
(cmd == P_INITIAL_META) return "InitialMeta"; if (cmd == P_INITIAL_DATA) return "InitialData"; if (cmd == P_CONNECTION_FEATURES) return "ConnectionFeatures"; if (cmd >= ARRAY_SIZE(cmdnames)) return "Unknown"; return cmdnames[cmd]; } /** * drbd_wait_misc - wait for a request to make progress * @device: device associated with the request * @i: the struct drbd_interval embedded in struct drbd_request or * struct drbd_peer_request */ int drbd_wait_misc(struct drbd_device *device, struct drbd_interval *i) { struct net_conf *nc; DEFINE_WAIT(wait); long timeout; rcu_read_lock(); nc = rcu_dereference(first_peer_device(device)->connection->net_conf); if (!nc) { rcu_read_unlock(); return -ETIMEDOUT; } timeout = nc->ko_count ? nc->timeout * HZ / 10 * nc->ko_count : MAX_SCHEDULE_TIMEOUT; rcu_read_unlock(); /* Indicate to wake up device->misc_wait on progress. */ i->waiting = true; prepare_to_wait(&device->misc_wait, &wait, TASK_INTERRUPTIBLE); spin_unlock_irq(&device->resource->req_lock); timeout = schedule_timeout(timeout); finish_wait(&device->misc_wait, &wait); spin_lock_irq(&device->resource->req_lock); if (!timeout || device->state.conn < C_CONNECTED) return -ETIMEDOUT; if (signal_pending(current)) return -ERESTARTSYS; return 0; } #ifdef COMPAT_HAVE_IDR_FOR_EACH static int idr_has_entry(int id, void *p, void *data) { return 1; } bool idr_is_empty(struct idr *idr) { return !idr_for_each(idr, idr_has_entry, NULL); } #else bool idr_is_empty(struct idr *idr) { int n = 0; void *p; idr_for_each_entry(idr, p, n) return false; return true; } #endif #ifdef CONFIG_DRBD_FAULT_INJECTION /* Fault insertion support including random number generator shamelessly * stolen from kernel/rcutorture.c */ struct fault_random_state { unsigned long state; unsigned long count; }; #define FAULT_RANDOM_MULT 39916801 /* prime */ #define FAULT_RANDOM_ADD 479001701 /* prime */ #define FAULT_RANDOM_REFRESH 10000 /* * Crude but fast random-number generator. 
Uses a linear congruential * generator, with occasional help from get_random_bytes(). */ static unsigned long _drbd_fault_random(struct fault_random_state *rsp) { long refresh; if (!rsp->count--) { get_random_bytes(&refresh, sizeof(refresh)); rsp->state += refresh; rsp->count = FAULT_RANDOM_REFRESH; } rsp->state = rsp->state * FAULT_RANDOM_MULT + FAULT_RANDOM_ADD; return swahw32(rsp->state); } static char * _drbd_fault_str(unsigned int type) { static char *_faults[] = { [DRBD_FAULT_MD_WR] = "Meta-data write", [DRBD_FAULT_MD_RD] = "Meta-data read", [DRBD_FAULT_RS_WR] = "Resync write", [DRBD_FAULT_RS_RD] = "Resync read", [DRBD_FAULT_DT_WR] = "Data write", [DRBD_FAULT_DT_RD] = "Data read", [DRBD_FAULT_DT_RA] = "Data read ahead", [DRBD_FAULT_BM_ALLOC] = "BM allocation", [DRBD_FAULT_AL_EE] = "EE allocation", [DRBD_FAULT_RECEIVE] = "receive data corruption", }; return (type < DRBD_FAULT_MAX) ? _faults[type] : "**Unknown**"; } unsigned int _drbd_insert_fault(struct drbd_device *device, unsigned int type) { static struct fault_random_state rrs = {0, 0}; unsigned int ret = ( (fault_devs == 0 || ((1 << device_to_minor(device)) & fault_devs) != 0) && (((_drbd_fault_random(&rrs) % 100) + 1) <= fault_rate)); if (ret) { fault_count++; if (DRBD_ratelimit(5*HZ, 5)) drbd_warn(device, "***Simulating %s failure\n", _drbd_fault_str(type)); } return ret; } #endif module_init(drbd_init) module_exit(drbd_cleanup) /* For drbd_tracing: */ EXPORT_SYMBOL(drbd_conn_str); EXPORT_SYMBOL(drbd_role_str); EXPORT_SYMBOL(drbd_disk_str); EXPORT_SYMBOL(drbd_set_st_err_str); drbd-8.4.4/drbd/drbd_nl.c0000664000000000000000000031747512226001711013636 0ustar rootroot/* drbd_nl.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2001-2008, LINBIT Information Technologies GmbH. Copyright (C) 1999-2008, Philipp Reisner . Copyright (C) 2002-2008, Lars Ellenberg . 
drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #include #include #include #include #include #include #include #include #include "drbd_int.h" #include "drbd_protocol.h" #include "drbd_req.h" #include #include #include #include #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,31) /* * copied from more recent kernel source */ int genl_register_family_with_ops(struct genl_family *family, struct genl_ops *ops, size_t n_ops) { int err, i; err = genl_register_family(family); if (err) return err; for (i = 0; i < n_ops; ++i, ++ops) { err = genl_register_ops(family, ops); if (err) goto err_out; } return 0; err_out: genl_unregister_family(family); return err; } #endif /* .doit */ // int drbd_adm_create_resource(struct sk_buff *skb, struct genl_info *info); // int drbd_adm_delete_resource(struct sk_buff *skb, struct genl_info *info); int drbd_adm_new_minor(struct sk_buff *skb, struct genl_info *info); int drbd_adm_del_minor(struct sk_buff *skb, struct genl_info *info); int drbd_adm_new_resource(struct sk_buff *skb, struct genl_info *info); int drbd_adm_del_resource(struct sk_buff *skb, struct genl_info *info); int drbd_adm_down(struct sk_buff *skb, struct genl_info *info); int drbd_adm_set_role(struct sk_buff *skb, struct genl_info *info); int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info); int drbd_adm_disk_opts(struct sk_buff *skb, struct genl_info *info); int drbd_adm_detach(struct 
sk_buff *skb, struct genl_info *info); int drbd_adm_connect(struct sk_buff *skb, struct genl_info *info); int drbd_adm_net_opts(struct sk_buff *skb, struct genl_info *info); int drbd_adm_resize(struct sk_buff *skb, struct genl_info *info); int drbd_adm_start_ov(struct sk_buff *skb, struct genl_info *info); int drbd_adm_new_c_uuid(struct sk_buff *skb, struct genl_info *info); int drbd_adm_disconnect(struct sk_buff *skb, struct genl_info *info); int drbd_adm_invalidate(struct sk_buff *skb, struct genl_info *info); int drbd_adm_invalidate_peer(struct sk_buff *skb, struct genl_info *info); int drbd_adm_pause_sync(struct sk_buff *skb, struct genl_info *info); int drbd_adm_resume_sync(struct sk_buff *skb, struct genl_info *info); int drbd_adm_suspend_io(struct sk_buff *skb, struct genl_info *info); int drbd_adm_resume_io(struct sk_buff *skb, struct genl_info *info); int drbd_adm_outdate(struct sk_buff *skb, struct genl_info *info); int drbd_adm_resource_opts(struct sk_buff *skb, struct genl_info *info); int drbd_adm_get_status(struct sk_buff *skb, struct genl_info *info); int drbd_adm_get_timeout_type(struct sk_buff *skb, struct genl_info *info); /* .dumpit */ int drbd_adm_get_status_all(struct sk_buff *skb, struct netlink_callback *cb); #include #include "drbd_nla.h" #include /* used blkdev_get_by_path, to claim our meta data device(s) */ static char *drbd_m_holder = "Hands off! this is DRBD's meta data device."; static void drbd_adm_send_reply(struct sk_buff *skb, struct genl_info *info) { genlmsg_end(skb, genlmsg_data(nlmsg_data(nlmsg_hdr(skb)))); if (genlmsg_reply(skb, info)) printk(KERN_ERR "drbd: error sending genl reply\n"); } /* Used on a fresh "drbd_adm_prepare"d reply_skb, this cannot fail: The only * reason it could fail was no space in skb, and there are 4k available. 
*/ int drbd_msg_put_info(struct sk_buff *skb, const char *info) { struct nlattr *nla; int err = -EMSGSIZE; if (!info || !info[0]) return 0; nla = nla_nest_start(skb, DRBD_NLA_CFG_REPLY); if (!nla) return err; err = nla_put_string(skb, T_info_text, info); if (err) { nla_nest_cancel(skb, nla); return err; } else nla_nest_end(skb, nla); return 0; } #ifdef COMPAT_HAVE_SECURITY_NETLINK_RECV #define drbd_security_netlink_recv(skb, cap) \ security_netlink_recv(skb, cap) #else /* see * fd77846 security: remove the security_netlink_recv hook as it is equivalent to capable() */ static inline bool drbd_security_netlink_recv(struct sk_buff *skb, int cap) { return !capable(cap); } #endif /* This would be a good candidate for a "pre_doit" hook, * and per-family private info->pointers. * But we need to stay compatible with older kernels. * If it returns successfully, adm_ctx members are valid. */ #define DRBD_ADM_NEED_MINOR 1 #define DRBD_ADM_NEED_RESOURCE 2 #define DRBD_ADM_NEED_CONNECTION 4 static int drbd_adm_prepare(struct drbd_config_context *adm_ctx, struct sk_buff *skb, struct genl_info *info, unsigned flags) { struct drbd_genlmsghdr *d_in = info->userhdr; const u8 cmd = info->genlhdr->cmd; int err; memset(adm_ctx, 0, sizeof(*adm_ctx)); /* genl_rcv_msg only checks for CAP_NET_ADMIN on "GENL_ADMIN_PERM" :( */ if (cmd != DRBD_ADM_GET_STATUS && drbd_security_netlink_recv(skb, CAP_SYS_ADMIN)) return -EPERM; adm_ctx->reply_skb = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL); if (!adm_ctx->reply_skb) { err = -ENOMEM; goto fail; } adm_ctx->reply_dh = genlmsg_put_reply(adm_ctx->reply_skb, info, &drbd_genl_family, 0, cmd); /* put of a few bytes into a fresh skb of >= 4k will always succeed. 
* but anyways */ if (!adm_ctx->reply_dh) { err = -ENOMEM; goto fail; } adm_ctx->reply_dh->minor = d_in->minor; adm_ctx->reply_dh->ret_code = NO_ERROR; adm_ctx->volume = VOLUME_UNSPECIFIED; if (info->attrs[DRBD_NLA_CFG_CONTEXT]) { struct nlattr *nla; /* parse and validate only */ err = drbd_cfg_context_from_attrs(NULL, info); if (err) goto fail; /* It was present, and valid, * copy it over to the reply skb. */ err = nla_put_nohdr(adm_ctx->reply_skb, info->attrs[DRBD_NLA_CFG_CONTEXT]->nla_len, info->attrs[DRBD_NLA_CFG_CONTEXT]); if (err) goto fail; /* and assign stuff to the global adm_ctx */ nla = nested_attr_tb[__nla_type(T_ctx_volume)]; if (nla) adm_ctx->volume = nla_get_u32(nla); nla = nested_attr_tb[__nla_type(T_ctx_resource_name)]; if (nla) adm_ctx->resource_name = nla_data(nla); adm_ctx->my_addr = nested_attr_tb[__nla_type(T_ctx_my_addr)]; adm_ctx->peer_addr = nested_attr_tb[__nla_type(T_ctx_peer_addr)]; if ((adm_ctx->my_addr && nla_len(adm_ctx->my_addr) > sizeof(adm_ctx->connection->my_addr)) || (adm_ctx->peer_addr && nla_len(adm_ctx->peer_addr) > sizeof(adm_ctx->connection->peer_addr))) { err = -EINVAL; goto fail; } } adm_ctx->minor = d_in->minor; adm_ctx->device = minor_to_device(d_in->minor); if (adm_ctx->resource_name) { adm_ctx->resource = drbd_find_resource(adm_ctx->resource_name); } if (!adm_ctx->device && (flags & DRBD_ADM_NEED_MINOR)) { drbd_msg_put_info(adm_ctx->reply_skb, "unknown minor"); return ERR_MINOR_INVALID; } if (!adm_ctx->resource && (flags & DRBD_ADM_NEED_RESOURCE)) { drbd_msg_put_info(adm_ctx->reply_skb, "unknown resource"); if (adm_ctx->resource_name) return ERR_RES_NOT_KNOWN; return ERR_INVALID_REQUEST; } if (flags & DRBD_ADM_NEED_CONNECTION) { if (adm_ctx->resource) { drbd_msg_put_info(adm_ctx->reply_skb, "no resource name expected"); return ERR_INVALID_REQUEST; } if (adm_ctx->device) { drbd_msg_put_info(adm_ctx->reply_skb, "no minor number expected"); return ERR_INVALID_REQUEST; } if (adm_ctx->my_addr && adm_ctx->peer_addr) 
adm_ctx->connection = conn_get_by_addrs(nla_data(adm_ctx->my_addr), nla_len(adm_ctx->my_addr), nla_data(adm_ctx->peer_addr), nla_len(adm_ctx->peer_addr)); if (!adm_ctx->connection) { drbd_msg_put_info(adm_ctx->reply_skb, "unknown connection"); return ERR_INVALID_REQUEST; } } /* some more paranoia, if the request was over-determined */ if (adm_ctx->device && adm_ctx->resource && adm_ctx->device->resource != adm_ctx->resource) { pr_warning("request: minor=%u, resource=%s; but that minor belongs to resource %s\n", adm_ctx->minor, adm_ctx->resource->name, adm_ctx->device->resource->name); drbd_msg_put_info(adm_ctx->reply_skb, "minor exists in different resource"); return ERR_INVALID_REQUEST; } if (adm_ctx->device && adm_ctx->volume != VOLUME_UNSPECIFIED && adm_ctx->volume != adm_ctx->device->vnr) { pr_warning("request: minor=%u, volume=%u; but that minor is volume %u in %s\n", adm_ctx->minor, adm_ctx->volume, adm_ctx->device->vnr, adm_ctx->device->resource->name); drbd_msg_put_info(adm_ctx->reply_skb, "minor exists as different volume"); return ERR_INVALID_REQUEST; } return NO_ERROR; fail: nlmsg_free(adm_ctx->reply_skb); adm_ctx->reply_skb = NULL; return err; } static int drbd_adm_finish(struct drbd_config_context *adm_ctx, struct genl_info *info, int retcode) { if (adm_ctx->connection) { kref_put(&adm_ctx->connection->kref, &drbd_destroy_connection); adm_ctx->connection = NULL; } if (adm_ctx->resource) { kref_put(&adm_ctx->resource->kref, drbd_destroy_resource); adm_ctx->resource = NULL; } if (!adm_ctx->reply_skb) return -ENOMEM; adm_ctx->reply_dh->ret_code = retcode; drbd_adm_send_reply(adm_ctx->reply_skb, info); return 0; } static void setup_khelper_env(struct drbd_connection *connection, char **envp) { char *afs; /* FIXME: A future version will not allow this case. 
*/ if (connection->my_addr_len == 0 || connection->peer_addr_len == 0) return; switch (((struct sockaddr *)&connection->peer_addr)->sa_family) { case AF_INET6: afs = "ipv6"; snprintf(envp[4], 60, "DRBD_PEER_ADDRESS=%pI6", &((struct sockaddr_in6 *)&connection->peer_addr)->sin6_addr); break; case AF_INET: afs = "ipv4"; snprintf(envp[4], 60, "DRBD_PEER_ADDRESS=%pI4", &((struct sockaddr_in *)&connection->peer_addr)->sin_addr); break; default: afs = "ssocks"; snprintf(envp[4], 60, "DRBD_PEER_ADDRESS=%pI4", &((struct sockaddr_in *)&connection->peer_addr)->sin_addr); } snprintf(envp[3], 20, "DRBD_PEER_AF=%s", afs); } int drbd_khelper(struct drbd_device *device, char *cmd) { char *envp[] = { "HOME=/", "TERM=linux", "PATH=/sbin:/usr/sbin:/bin:/usr/bin", (char[20]) { }, /* address family */ (char[60]) { }, /* address */ NULL }; char mb[12]; char *argv[] = {usermode_helper, cmd, mb, NULL }; struct drbd_connection *connection = first_peer_device(device)->connection; struct sib_info sib; int ret; if (current == connection->worker.task) set_bit(CALLBACK_PENDING, &connection->flags); snprintf(mb, 12, "minor-%d", device_to_minor(device)); setup_khelper_env(connection, envp); /* The helper may take some time. * write out any unsynced meta data changes now */ drbd_md_sync(device); drbd_info(device, "helper command: %s %s %s\n", usermode_helper, cmd, mb); sib.sib_reason = SIB_HELPER_PRE; sib.helper_name = cmd; drbd_bcast_event(device, &sib); ret = call_usermodehelper(usermode_helper, argv, envp, UMH_WAIT_PROC); if (ret) drbd_warn(device, "helper command: %s %s %s exit code %u (0x%x)\n", usermode_helper, cmd, mb, (ret >> 8) & 0xff, ret); else drbd_info(device, "helper command: %s %s %s exit code %u (0x%x)\n", usermode_helper, cmd, mb, (ret >> 8) & 0xff, ret); sib.sib_reason = SIB_HELPER_POST; sib.helper_exit_code = ret; drbd_bcast_event(device, &sib); if (current == connection->worker.task) clear_bit(CALLBACK_PENDING, &connection->flags); if (ret < 0) /* Ignore any ERRNOs we got. 
*/ ret = 0; return ret; } int conn_khelper(struct drbd_connection *connection, char *cmd) { char *envp[] = { "HOME=/", "TERM=linux", "PATH=/sbin:/usr/sbin:/bin:/usr/bin", (char[20]) { }, /* address family */ (char[60]) { }, /* address */ NULL }; char *resource_name = connection->resource->name; char *argv[] = {usermode_helper, cmd, resource_name, NULL }; int ret; setup_khelper_env(connection, envp); conn_md_sync(connection); drbd_info(connection, "helper command: %s %s %s\n", usermode_helper, cmd, resource_name); /* TODO: conn_bcast_event() ?? */ ret = call_usermodehelper(usermode_helper, argv, envp, UMH_WAIT_PROC); if (ret) drbd_warn(connection, "helper command: %s %s %s exit code %u (0x%x)\n", usermode_helper, cmd, resource_name, (ret >> 8) & 0xff, ret); else drbd_info(connection, "helper command: %s %s %s exit code %u (0x%x)\n", usermode_helper, cmd, resource_name, (ret >> 8) & 0xff, ret); /* TODO: conn_bcast_event() ?? */ if (ret < 0) /* Ignore any ERRNOs we got. */ ret = 0; return ret; } static enum drbd_fencing_p highest_fencing_policy(struct drbd_connection *connection) { enum drbd_fencing_p fp = FP_NOT_AVAIL; struct drbd_peer_device *peer_device; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; if (get_ldev_if_state(device, D_CONSISTENT)) { struct disk_conf *disk_conf = rcu_dereference(peer_device->device->ldev->disk_conf); fp = max_t(enum drbd_fencing_p, fp, disk_conf->fencing); put_ldev(device); } } rcu_read_unlock(); if (fp == FP_NOT_AVAIL) { /* IO Suspending works on the whole resource. Do it only for one device. 
*/ vnr = 0; peer_device = idr_get_next(&connection->peer_devices, &vnr); drbd_change_state(peer_device->device, CS_VERBOSE | CS_HARD, NS(susp_fen, 0)); } return fp; } bool conn_try_outdate_peer(struct drbd_connection *connection) { unsigned int connect_cnt; union drbd_state mask = { }; union drbd_state val = { }; enum drbd_fencing_p fp; char *ex_to_string; int r; if (connection->cstate >= C_WF_REPORT_PARAMS) { drbd_err(connection, "Expected cstate < C_WF_REPORT_PARAMS\n"); return false; } spin_lock_irq(&connection->resource->req_lock); connect_cnt = connection->connect_cnt; spin_unlock_irq(&connection->resource->req_lock); fp = highest_fencing_policy(connection); switch (fp) { case FP_NOT_AVAIL: drbd_warn(connection, "Not fencing peer, I'm not even Consistent myself.\n"); goto out; case FP_DONT_CARE: return true; default: ; } r = conn_khelper(connection, "fence-peer"); switch ((r>>8) & 0xff) { case 3: /* peer is inconsistent */ ex_to_string = "peer is inconsistent or worse"; mask.pdsk = D_MASK; val.pdsk = D_INCONSISTENT; break; case 4: /* peer got outdated, or was already outdated */ ex_to_string = "peer was fenced"; mask.pdsk = D_MASK; val.pdsk = D_OUTDATED; break; case 5: /* peer was down */ if (conn_highest_disk(connection) == D_UP_TO_DATE) { /* we will(have) create(d) a new UUID anyways... */ ex_to_string = "peer is unreachable, assumed to be dead"; mask.pdsk = D_MASK; val.pdsk = D_OUTDATED; } else { ex_to_string = "peer unreachable, doing nothing since disk != UpToDate"; } break; case 6: /* Peer is primary, voluntarily outdate myself. * This is useful when an unconnected R_SECONDARY is asked to * become R_PRIMARY, but finds the other peer being active. */ ex_to_string = "peer is active"; drbd_warn(connection, "Peer is primary, outdating myself.\n"); mask.disk = D_MASK; val.disk = D_OUTDATED; break; case 7: /* THINK: do we need to handle this * like case 4, or more like case 5? 
*/ if (fp != FP_STONITH) drbd_err(connection, "fence-peer() = 7 && fencing != Stonith !!!\n"); ex_to_string = "peer was stonithed"; mask.pdsk = D_MASK; val.pdsk = D_OUTDATED; break; default: /* The script is broken ... */ drbd_err(connection, "fence-peer helper broken, returned %d\n", (r>>8)&0xff); return false; /* Eventually leave IO frozen */ } drbd_info(connection, "fence-peer helper returned %d (%s)\n", (r>>8) & 0xff, ex_to_string); out: /* Not using conn_request_state(connection, mask, val, CS_VERBOSE); here, because we might have been able to re-establish the connection in the meantime. */ spin_lock_irq(&connection->resource->req_lock); if (connection->cstate < C_WF_REPORT_PARAMS && !test_bit(STATE_SENT, &connection->flags)) { if (connection->connect_cnt != connect_cnt) /* In case the connection was established and dropped while the fence-peer handler was running, ignore it */ drbd_info(connection, "Ignoring fence-peer exit code\n"); else _conn_request_state(connection, mask, val, CS_VERBOSE); } spin_unlock_irq(&connection->resource->req_lock); return conn_highest_pdsk(connection) <= D_OUTDATED; } static int _try_outdate_peer_async(void *data) { struct drbd_connection *connection = (struct drbd_connection *)data; conn_try_outdate_peer(connection); kref_put(&connection->kref, &drbd_destroy_connection); return 0; } void conn_try_outdate_peer_async(struct drbd_connection *connection) { struct task_struct *opa; kref_get(&connection->kref); opa = kthread_run(_try_outdate_peer_async, connection, "drbd_async_h"); if (IS_ERR(opa)) { drbd_err(connection, "out of mem, failed to invoke fence-peer helper\n"); kref_put(&connection->kref, drbd_destroy_connection); } } enum drbd_state_rv drbd_set_role(struct drbd_device *device, enum drbd_role new_role, int force) { const int max_tries = 4; enum drbd_state_rv rv = SS_UNKNOWN_ERROR; struct net_conf *nc; int try = 0; int forced = 0; union drbd_state mask, val; if (new_role == R_PRIMARY) { struct drbd_connection *connection; /*
Detect dead peers as soon as possible. */ rcu_read_lock(); for_each_connection(connection, device->resource) request_ping(connection); rcu_read_unlock(); } mutex_lock(device->state_mutex); mask.i = 0; mask.role = R_MASK; val.i = 0; val.role = new_role; while (try++ < max_tries) { rv = _drbd_request_state(device, mask, val, CS_WAIT_COMPLETE); /* in case we first succeeded to outdate, * but now suddenly could establish a connection */ if (rv == SS_CW_FAILED_BY_PEER && mask.pdsk != 0) { val.pdsk = 0; mask.pdsk = 0; continue; } if (rv == SS_NO_UP_TO_DATE_DISK && force && (device->state.disk < D_UP_TO_DATE && device->state.disk >= D_INCONSISTENT)) { mask.disk = D_MASK; val.disk = D_UP_TO_DATE; forced = 1; continue; } if (rv == SS_NO_UP_TO_DATE_DISK && device->state.disk == D_CONSISTENT && mask.pdsk == 0) { D_ASSERT(device, device->state.pdsk == D_UNKNOWN); if (conn_try_outdate_peer(first_peer_device(device)->connection)) { val.disk = D_UP_TO_DATE; mask.disk = D_MASK; } continue; } if (rv == SS_NOTHING_TO_DO) goto out; if (rv == SS_PRIMARY_NOP && mask.pdsk == 0) { if (!conn_try_outdate_peer(first_peer_device(device)->connection) && force) { drbd_warn(device, "Forced into split brain situation!\n"); mask.pdsk = D_MASK; val.pdsk = D_OUTDATED; } continue; } if (rv == SS_TWO_PRIMARIES) { /* Maybe the peer is detected as dead very soon... retry at most once more in this case. */ int timeo; rcu_read_lock(); nc = rcu_dereference(first_peer_device(device)->connection->net_conf); timeo = nc ? 
(nc->ping_timeo + 1) * HZ / 10 : 1; rcu_read_unlock(); schedule_timeout_interruptible(timeo); if (try < max_tries) try = max_tries - 1; continue; } if (rv < SS_SUCCESS) { rv = _drbd_request_state(device, mask, val, CS_VERBOSE + CS_WAIT_COMPLETE); if (rv < SS_SUCCESS) goto out; } break; } if (rv < SS_SUCCESS) goto out; if (forced) drbd_warn(device, "Forced to consider local data as UpToDate!\n"); /* Wait until nothing is on the fly :) */ wait_event(device->misc_wait, atomic_read(&device->ap_pending_cnt) == 0); /* FIXME also wait for all pending P_BARRIER_ACK? */ if (new_role == R_SECONDARY) { set_disk_ro(device->vdisk, true); if (get_ldev(device)) { device->ldev->md.uuid[UI_CURRENT] &= ~(u64)1; put_ldev(device); } } else { /* Called from drbd_adm_set_role only. * We are still holding the conf_update mutex. */ nc = first_peer_device(device)->connection->net_conf; if (nc) nc->discard_my_data = 0; /* without copy; single bit op is atomic */ set_disk_ro(device->vdisk, false); if (get_ldev(device)) { if (((device->state.conn < C_CONNECTED || device->state.pdsk <= D_FAILED) && device->ldev->md.uuid[UI_BITMAP] == 0) || forced) drbd_uuid_new_current(device); device->ldev->md.uuid[UI_CURRENT] |= (u64)1; put_ldev(device); } } /* writeout of activity log covered areas of the bitmap * to stable storage done in after state change already */ if (device->state.conn >= C_WF_REPORT_PARAMS) { /* if this was forced, we should consider sync */ if (forced) drbd_send_uuids(first_peer_device(device)); drbd_send_current_state(first_peer_device(device)); } drbd_md_sync(device); drbd_kobject_uevent(device); out: mutex_unlock(device->state_mutex); return rv; } static const char *from_attrs_err_to_txt(int err) { return err == -ENOMSG ? "required attribute missing" : err == -EOPNOTSUPP ? "unknown mandatory attribute" : err == -EEXIST ? 
"can not change invariant setting" : "invalid attribute value"; } int drbd_adm_set_role(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; struct set_role_parms parms; int err; enum drbd_ret_code retcode; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; memset(&parms, 0, sizeof(parms)); if (info->attrs[DRBD_NLA_SET_ROLE_PARMS]) { err = set_role_parms_from_attrs(&parms, info); if (err) { retcode = ERR_MANDATORY_TAG; drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); goto out; } } mutex_lock(&adm_ctx.device->resource->conf_update); genl_unlock(); if (info->genlhdr->cmd == DRBD_ADM_PRIMARY) retcode = drbd_set_role(adm_ctx.device, R_PRIMARY, parms.assume_uptodate); else retcode = drbd_set_role(adm_ctx.device, R_SECONDARY, 0); genl_lock(); mutex_unlock(&adm_ctx.device->resource->conf_update); out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } /* Initializes the md.*_offset members, so we are able to find * the on disk meta data. * * We currently have two possible layouts: * external: * |----------- md_size_sect ------------------| * [ 4k superblock ][ activity log ][ Bitmap ] * | al_offset == 8 | * | bm_offset = al_offset + X | * ==> bitmap sectors = md_size_sect - bm_offset * * internal: * |----------- md_size_sect ------------------| * [data.....][ Bitmap ][ activity log ][ 4k superblock ] * | al_offset < 0 | * | bm_offset = al_offset - Y | * ==> bitmap sectors = Y = al_offset - bm_offset * * Activity log size used to be fixed 32kB, * but is about to become configurable. 
*/ static void drbd_md_set_sector_offsets(struct drbd_device *device, struct drbd_backing_dev *bdev) { sector_t md_size_sect = 0; unsigned int al_size_sect = bdev->md.al_size_4k * 8; bdev->md.md_offset = drbd_md_ss(bdev); switch (bdev->md.meta_dev_idx) { default: /* v07 style fixed size indexed meta data */ bdev->md.md_size_sect = MD_128MB_SECT; bdev->md.al_offset = MD_4kB_SECT; bdev->md.bm_offset = MD_4kB_SECT + al_size_sect; break; case DRBD_MD_INDEX_FLEX_EXT: /* just occupy the full device; unit: sectors */ bdev->md.md_size_sect = drbd_get_capacity(bdev->md_bdev); bdev->md.al_offset = MD_4kB_SECT; bdev->md.bm_offset = MD_4kB_SECT + al_size_sect; break; case DRBD_MD_INDEX_INTERNAL: case DRBD_MD_INDEX_FLEX_INT: /* al size is still fixed */ bdev->md.al_offset = -al_size_sect; /* we need (slightly less than) ~ this many bitmap sectors: */ md_size_sect = drbd_get_capacity(bdev->backing_bdev); md_size_sect = ALIGN(md_size_sect, BM_SECT_PER_EXT); md_size_sect = BM_SECT_TO_EXT(md_size_sect); md_size_sect = ALIGN(md_size_sect, 8); /* plus the "drbd meta data super block", * and the activity log; */ md_size_sect += MD_4kB_SECT + al_size_sect; bdev->md.md_size_sect = md_size_sect; /* bitmap offset is adjusted by 'super' block size */ bdev->md.bm_offset = -md_size_sect + MD_4kB_SECT; break; } } /* input size is expected to be in KB */ char *ppsize(char *buf, unsigned long long size) { /* Needs 9 bytes at max including trailing NUL: * -1ULL ==> "16384 EB" */ static char units[] = { 'K', 'M', 'G', 'T', 'P', 'E' }; int base = 0; while (size >= 10000 && base < sizeof(units)-1) { /* shift + round */ size = (size >> 10) + !!(size & (1<<9)); base++; } sprintf(buf, "%u %cB", (unsigned)size, units[base]); return buf; } /* there is still a theoretical deadlock when called from receiver * on a D_INCONSISTENT R_PRIMARY: * remote READ does inc_ap_bio, receiver would need to receive answer * packet from remote to dec_ap_bio again.
 * receiver receive_sizes(), comes here,
 * waits for ap_bio_cnt == 0. -> deadlock.
 * but this cannot happen, actually, because:
 * R_PRIMARY D_INCONSISTENT, and peer's disk is unreachable
 * (not connected, or bad/no disk on peer):
 * see drbd_fail_request_early, ap_bio_cnt is zero.
 * R_PRIMARY D_INCONSISTENT, and C_SYNC_TARGET:
 * peer may not initiate a resize.
 */
/* Note these are not to be confused with
 * drbd_adm_suspend_io/drbd_adm_resume_io,
 * which are (sub) state changes triggered by admin (drbdsetup),
 * and can be long lived.
 * This changes a bit in device->flags, is triggered by drbd internals,
 * and should be short-lived. */
void drbd_suspend_io(struct drbd_device *device)
{
	set_bit(SUSPEND_IO, &device->flags);
	if (drbd_suspended(device))
		return;
	wait_event(device->misc_wait, !atomic_read(&device->ap_bio_cnt));
}

void drbd_resume_io(struct drbd_device *device)
{
	clear_bit(SUSPEND_IO, &device->flags);
	wake_up(&device->misc_wait);
}

/**
 * drbd_determine_dev_size() - Sets the right device size obeying all constraints
 * @device:	DRBD device.
 * @flags:	DDSF_* flags (for example DDSF_FORCED, DDSF_NO_RESYNC).
 * @rs:		If non-NULL, the new activity log layout to apply.
 *
 * Returns an enum determine_dev_size value; values less than or equal to
 * DS_ERROR indicate errors.
 * You should call drbd_md_sync() after calling this function.
 */
enum determine_dev_size
drbd_determine_dev_size(struct drbd_device *device, enum dds_flags flags, struct resize_parms *rs) __must_hold(local)
{
	sector_t prev_first_sect, prev_size; /* previous meta location */
	sector_t la_size_sect, u_size;
	struct drbd_md *md = &device->ldev->md;
	u32 prev_al_stripe_size_4k;
	u32 prev_al_stripes;
	sector_t size;
	char ppb[10];
	void *buffer;

	int md_moved, la_size_changed;
	enum determine_dev_size rv = DS_UNCHANGED;

	/* race:
	 * application request passes inc_ap_bio,
	 * but then cannot get an AL-reference.
	 * this function later may wait on ap_bio_cnt == 0. -> deadlock.
	 *
	 * to avoid that:
	 * Suspend IO right here.
	 * still lock the act_log to not trigger ASSERTs there.
*/ drbd_suspend_io(device); buffer = drbd_md_get_buffer(device); /* Lock meta-data IO */ if (!buffer) { drbd_resume_io(device); return DS_ERROR; } /* no wait necessary anymore, actually we could assert that */ wait_event(device->al_wait, lc_try_lock(device->act_log)); prev_first_sect = drbd_md_first_sector(device->ldev); prev_size = device->ldev->md.md_size_sect; la_size_sect = device->ldev->md.la_size_sect; if (rs) { /* rs is non NULL if we should change the AL layout only */ prev_al_stripes = md->al_stripes; prev_al_stripe_size_4k = md->al_stripe_size_4k; md->al_stripes = rs->al_stripes; md->al_stripe_size_4k = rs->al_stripe_size / 4; md->al_size_4k = (u64)rs->al_stripes * rs->al_stripe_size / 4; } drbd_md_set_sector_offsets(device, device->ldev); rcu_read_lock(); u_size = rcu_dereference(device->ldev->disk_conf)->disk_size; rcu_read_unlock(); size = drbd_new_dev_size(device, device->ldev, u_size, flags & DDSF_FORCED); if (size < la_size_sect) { if (rs && u_size == 0) { /* Remove "rs &&" later. This check should always be active, but right now the receiver expects the permissive behavior */ drbd_warn(device, "Implicit shrink not allowed. " "Use --size=%llus for explicit shrink.\n", (unsigned long long)size); rv = DS_ERROR_SHRINK; } if (u_size > size) rv = DS_ERROR_SPACE_MD; if (rv != DS_UNCHANGED) goto err_out; } if (drbd_get_capacity(device->this_bdev) != size || drbd_bm_capacity(device) != size) { int err; err = drbd_bm_resize(device, size, !(flags & DDSF_NO_RESYNC)); if (unlikely(err)) { /* currently there is only one error: ENOMEM! */ size = drbd_bm_capacity(device)>>1; if (size == 0) { drbd_err(device, "OUT OF MEMORY! " "Could not allocate bitmap!\n"); } else { drbd_err(device, "BM resizing failed. " "Leaving size unchanged at size = %lu KB\n", (unsigned long)size); } rv = DS_ERROR; } /* racy, see comments above. 
*/ drbd_set_my_capacity(device, size); device->ldev->md.la_size_sect = size; drbd_info(device, "size = %s (%llu KB)\n", ppsize(ppb, size>>1), (unsigned long long)size>>1); } if (rv <= DS_ERROR) goto err_out; la_size_changed = (la_size_sect != device->ldev->md.la_size_sect); md_moved = prev_first_sect != drbd_md_first_sector(device->ldev) || prev_size != device->ldev->md.md_size_sect; if (la_size_changed || md_moved || rs) { u32 prev_flags; drbd_al_shrink(device); /* All extents inactive. */ prev_flags = md->flags; md->flags &= ~MDF_PRIMARY_IND; drbd_md_write(device, buffer); drbd_info(device, "Writing the whole bitmap, %s\n", la_size_changed && md_moved ? "size changed and md moved" : la_size_changed ? "size changed" : "md moved"); /* next line implicitly does drbd_suspend_io()+drbd_resume_io() */ drbd_bitmap_io(device, md_moved ? &drbd_bm_write_all : &drbd_bm_write, "size changed", BM_LOCKED_MASK); drbd_initialize_al(device, buffer); md->flags = prev_flags; drbd_md_write(device, buffer); if (rs) drbd_info(device, "Changed AL layout to al-stripes = %d, al-stripe-size-kB = %d\n", md->al_stripes, md->al_stripe_size_4k * 4); } if (size > la_size_sect) rv = la_size_sect ? DS_GREW : DS_GREW_FROM_ZERO; if (size < la_size_sect) rv = DS_SHRUNK; if (0) { err_out: if (rs) { md->al_stripes = prev_al_stripes; md->al_stripe_size_4k = prev_al_stripe_size_4k; md->al_size_4k = (u64)prev_al_stripes * prev_al_stripe_size_4k; drbd_md_set_sector_offsets(device, device->ldev); } } lc_unlock(device->act_log); wake_up(&device->al_wait); drbd_md_put_buffer(device); drbd_resume_io(device); return rv; } sector_t drbd_new_dev_size(struct drbd_device *device, struct drbd_backing_dev *bdev, sector_t u_size, int assume_peer_has_space) { sector_t p_size = device->p_size; /* partner's disk size. */ sector_t la_size_sect = bdev->md.la_size_sect; /* last agreed size. 
*/ sector_t m_size; /* my size */ sector_t size = 0; m_size = drbd_get_max_capacity(bdev); if (device->state.conn < C_CONNECTED && assume_peer_has_space) { drbd_warn(device, "Resize while not connected was forced by the user!\n"); p_size = m_size; } if (p_size && m_size) { size = min_t(sector_t, p_size, m_size); } else { if (la_size_sect) { size = la_size_sect; if (m_size && m_size < size) size = m_size; if (p_size && p_size < size) size = p_size; } else { if (m_size) size = m_size; if (p_size) size = p_size; } } if (size == 0) drbd_err(device, "Both nodes diskless!\n"); if (u_size) { if (u_size > size) drbd_err(device, "Requested disk size is too big (%lu > %lu)\n", (unsigned long)u_size>>1, (unsigned long)size>>1); else size = u_size; } return size; } /** * drbd_check_al_size() - Ensures that the AL is of the right size * @device: DRBD device. * * Returns -EBUSY if current al lru is still used, -ENOMEM when allocation * failed, and 0 on success. You should call drbd_md_sync() after you called * this function. 
 */
static int drbd_check_al_size(struct drbd_device *device, struct disk_conf *dc)
{
	struct lru_cache *n, *t;
	struct lc_element *e;
	unsigned int in_use;
	int i;

	if (device->act_log &&
	    device->act_log->nr_elements == dc->al_extents)
		return 0;

	in_use = 0;
	t = device->act_log;
	n = lc_create("act_log", drbd_al_ext_cache, AL_UPDATES_PER_TRANSACTION,
		dc->al_extents, sizeof(struct lc_element), 0);

	if (n == NULL) {
		drbd_err(device, "Cannot allocate act_log lru!\n");
		return -ENOMEM;
	}
	spin_lock_irq(&device->al_lock);
	if (t) {
		for (i = 0; i < t->nr_elements; i++) {
			e = lc_element_by_index(t, i);
			if (e->refcnt)
				drbd_err(device, "refcnt(%d)==%d\n",
				    e->lc_number, e->refcnt);
			in_use += e->refcnt;
		}
	}
	if (!in_use)
		device->act_log = n;
	spin_unlock_irq(&device->al_lock);
	if (in_use) {
		drbd_err(device, "Activity log still in use!\n");
		lc_destroy(n);
		return -EBUSY;
	} else {
		if (t)
			lc_destroy(t);
	}
	drbd_md_mark_dirty(device); /* we changed device->act_log->nr_elements */
	return 0;
}

static void drbd_setup_queue_param(struct drbd_device *device, unsigned int max_bio_size)
{
	struct request_queue * const q = device->rq_queue;
	unsigned int max_hw_sectors = max_bio_size >> 9;
	unsigned int max_segments = 0;
	struct request_queue *b = NULL;

	if (get_ldev_if_state(device, D_ATTACHING)) {
		b = device->ldev->backing_bdev->bd_disk->queue;

		max_hw_sectors = min(queue_max_hw_sectors(b), max_bio_size >> 9);
		rcu_read_lock();
		max_segments = rcu_dereference(device->ldev->disk_conf)->max_bio_bvecs;
		rcu_read_unlock();

		blk_set_stacking_limits(DRBD_QUEUE_LIMITS(q));
	}

	blk_queue_logical_block_size(q, 512);
	blk_queue_max_hw_sectors(q, max_hw_sectors);
	/* This is the workaround for "bio would need to, but cannot, be split" */
	blk_queue_max_segments(q, max_segments ?
			      max_segments : BLK_MAX_SEGMENTS);
	blk_queue_segment_boundary(q, PAGE_CACHE_SIZE-1);

	if (b) {
		struct drbd_connection *connection = first_peer_device(device)->connection;

#if QUEUE_FLAG_DISCARD != -1 /* If this is not defined, there might be no q->limits */
		if (blk_queue_discard(b) &&
		    (connection->cstate < C_CONNECTED || connection->agreed_features & FF_TRIM)) {
			/* For now, don't allow more than one activity log extent worth of data
			 * to be discarded in one go. We may need to rework drbd_al_begin_io()
			 * to allow for even larger discard ranges */
			q->limits.max_discard_sectors = DRBD_MAX_DISCARD_SECTORS;

			queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
			/* REALLY? Is stacking secdiscard "legal"? */
			if (blk_queue_secdiscard(b))
				queue_flag_set_unlocked(QUEUE_FLAG_SECDISCARD, q);
		} else {
			q->limits.max_discard_sectors = 0;
			queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
			queue_flag_clear_unlocked(QUEUE_FLAG_SECDISCARD, q);
		}
#endif
		blk_queue_stack_limits(q, b);

		if (q->backing_dev_info.ra_pages != b->backing_dev_info.ra_pages) {
			drbd_info(device, "Adjusting my ra_pages to backing device's (%lu -> %lu)\n",
				 q->backing_dev_info.ra_pages,
				 b->backing_dev_info.ra_pages);
			q->backing_dev_info.ra_pages = b->backing_dev_info.ra_pages;
		}
		put_ldev(device);
	}
}

void drbd_reconsider_max_bio_size(struct drbd_device *device)
{
	unsigned int now, new, local, peer;

	now = queue_max_hw_sectors(device->rq_queue) << 9;
	local = device->local_max_bio_size; /* Eventually last known value, from volatile memory */
	peer = device->peer_max_bio_size; /* Eventually last known value, from meta data */

	if (get_ldev_if_state(device, D_ATTACHING)) {
		local = queue_max_hw_sectors(device->ldev->backing_bdev->bd_disk->queue) << 9;
		device->local_max_bio_size = local;
		put_ldev(device);
	}
	local = min(local, DRBD_MAX_BIO_SIZE);

	/* We may ignore peer limits if the peer is modern enough.
Because new from 8.3.8 onwards the peer can use multiple BIOs for a single peer_request */ if (device->state.conn >= C_WF_REPORT_PARAMS) { if (first_peer_device(device)->connection->agreed_pro_version < 94) peer = min(device->peer_max_bio_size, DRBD_MAX_SIZE_H80_PACKET); /* Correct old drbd (up to 8.3.7) if it believes it can do more than 32KiB */ else if (first_peer_device(device)->connection->agreed_pro_version == 94) peer = DRBD_MAX_SIZE_H80_PACKET; else if (first_peer_device(device)->connection->agreed_pro_version < 100) peer = DRBD_MAX_BIO_SIZE_P95; /* drbd 8.3.8 onwards, before 8.4.0 */ else peer = DRBD_MAX_BIO_SIZE; /* We may later detach and re-attach on a disconnected Primary. * Avoid this setting to jump back in that case. * We want to store what we know the peer DRBD can handle, * not what the peer IO backend can handle. */ if (peer > device->peer_max_bio_size) device->peer_max_bio_size = peer; } new = min(local, peer); if (device->state.role == R_PRIMARY && new < now) drbd_err(device, "ASSERT FAILED new < now; (%u < %u)\n", new, now); if (new != now) drbd_info(device, "max BIO size = %u\n", new); drbd_setup_queue_param(device, new); } /* Starts the worker thread */ static void conn_reconfig_start(struct drbd_connection *connection) { drbd_thread_start(&connection->worker); drbd_flush_workqueue(&connection->sender_work); } /* if still unconfigured, stops worker again. */ static void conn_reconfig_done(struct drbd_connection *connection) { bool stop_threads; spin_lock_irq(&connection->resource->req_lock); stop_threads = conn_all_vols_unconf(connection) && connection->cstate == C_STANDALONE; spin_unlock_irq(&connection->resource->req_lock); if (stop_threads) { /* asender is implicitly stopped by receiver * in conn_disconnect() */ drbd_thread_stop(&connection->receiver); drbd_thread_stop(&connection->worker); } } /* Make sure IO is suspended before calling this function(). 
*/ static void drbd_suspend_al(struct drbd_device *device) { int s = 0; if (!lc_try_lock(device->act_log)) { drbd_warn(device, "Failed to lock al in drbd_suspend_al()\n"); return; } drbd_al_shrink(device); spin_lock_irq(&device->resource->req_lock); if (device->state.conn < C_CONNECTED) s = !test_and_set_bit(AL_SUSPENDED, &device->flags); spin_unlock_irq(&device->resource->req_lock); lc_unlock(device->act_log); if (s) drbd_info(device, "Suspended AL updates\n"); } static bool should_set_defaults(struct genl_info *info) { unsigned flags = ((struct drbd_genlmsghdr*)info->userhdr)->flags; return 0 != (flags & DRBD_GENL_F_SET_DEFAULTS); } static unsigned int drbd_al_extents_max(struct drbd_backing_dev *bdev) { /* This is limited by 16 bit "slot" numbers, * and by available on-disk context storage. * * Also (u16)~0 is special (denotes a "free" extent). * * One transaction occupies one 4kB on-disk block, * we have n such blocks in the on disk ring buffer, * the "current" transaction may fail (n-1), * and there is 919 slot numbers context information per transaction. * * 72 transaction blocks amounts to more than 2**16 context slots, * so cap there first. 
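	 *
	 * Worked arithmetic for that cap (illustrative; it uses only the
	 * 919 slots per transaction and the 2**16 limit quoted above):
	 *   71 * 919 = 65249  <  65536 = 2**16
	 *   72 * 919 = 66168  >  65536 = 2**16
	 * so 72 is the smallest number of transaction blocks whose context
	 * slots exceed 2**16.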
*/ const unsigned int max_al_nr = DRBD_AL_EXTENTS_MAX; const unsigned int sufficient_on_disk = (max_al_nr + AL_CONTEXT_PER_TRANSACTION -1) /AL_CONTEXT_PER_TRANSACTION; unsigned int al_size_4k = bdev->md.al_size_4k; if (al_size_4k > sufficient_on_disk) return max_al_nr; return (al_size_4k - 1) * AL_CONTEXT_PER_TRANSACTION; } int drbd_adm_disk_opts(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; enum drbd_ret_code retcode; struct drbd_device *device; struct disk_conf *new_disk_conf, *old_disk_conf; struct fifo_buffer *old_plan = NULL, *new_plan = NULL; int err, fifo_size; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; device = adm_ctx.device; /* we also need a disk * to change the options on */ if (!get_ldev(device)) { retcode = ERR_NO_DISK; goto out; } new_disk_conf = kmalloc(sizeof(struct disk_conf), GFP_KERNEL); if (!new_disk_conf) { retcode = ERR_NOMEM; goto fail; } mutex_lock(&device->resource->conf_update); old_disk_conf = device->ldev->disk_conf; *new_disk_conf = *old_disk_conf; if (should_set_defaults(info)) set_disk_conf_defaults(new_disk_conf); err = disk_conf_from_attrs_for_change(new_disk_conf, info); if (err && err != -ENOMSG) { retcode = ERR_MANDATORY_TAG; drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); goto fail_unlock; } if (!expect(new_disk_conf->resync_rate >= 1)) new_disk_conf->resync_rate = 1; if (new_disk_conf->al_extents < DRBD_AL_EXTENTS_MIN) new_disk_conf->al_extents = DRBD_AL_EXTENTS_MIN; if (new_disk_conf->al_extents > drbd_al_extents_max(device->ldev)) new_disk_conf->al_extents = drbd_al_extents_max(device->ldev); if (new_disk_conf->c_plan_ahead > DRBD_C_PLAN_AHEAD_MAX) new_disk_conf->c_plan_ahead = DRBD_C_PLAN_AHEAD_MAX; fifo_size = (new_disk_conf->c_plan_ahead * 10 * SLEEP_TIME) / HZ; if (fifo_size != device->rs_plan_s->size) { new_plan = fifo_alloc(fifo_size); if (!new_plan) { 
drbd_err(device, "kmalloc of fifo_buffer failed"); retcode = ERR_NOMEM; goto fail_unlock; } } drbd_suspend_io(device); wait_event(device->al_wait, lc_try_lock(device->act_log)); drbd_al_shrink(device); err = drbd_check_al_size(device, new_disk_conf); lc_unlock(device->act_log); wake_up(&device->al_wait); drbd_resume_io(device); if (err) { retcode = ERR_NOMEM; goto fail_unlock; } write_lock_irq(&global_state_lock); retcode = drbd_resync_after_valid(device, new_disk_conf->resync_after); if (retcode == NO_ERROR) { rcu_assign_pointer(device->ldev->disk_conf, new_disk_conf); drbd_resync_after_changed(device); } write_unlock_irq(&global_state_lock); if (retcode != NO_ERROR) goto fail_unlock; if (new_plan) { old_plan = device->rs_plan_s; rcu_assign_pointer(device->rs_plan_s, new_plan); } mutex_unlock(&device->resource->conf_update); if (new_disk_conf->al_updates) device->ldev->md.flags &= ~MDF_AL_DISABLED; else device->ldev->md.flags |= MDF_AL_DISABLED; if (new_disk_conf->md_flushes) clear_bit(MD_NO_BARRIER, &device->flags); else set_bit(MD_NO_BARRIER, &device->flags); drbd_bump_write_ordering(first_peer_device(device)->connection, WO_bio_barrier); drbd_md_sync(device); if (device->state.conn >= C_CONNECTED) { struct drbd_peer_device *peer_device; for_each_peer_device(peer_device, device) drbd_send_sync_param(peer_device); } synchronize_rcu(); kfree(old_disk_conf); kfree(old_plan); mod_timer(&device->request_timer, jiffies + HZ); goto success; fail_unlock: mutex_unlock(&device->resource->conf_update); fail: kfree(new_disk_conf); kfree(new_plan); success: put_ldev(device); out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; struct drbd_device *device; int err; enum drbd_ret_code retcode; enum determine_dev_size dd; sector_t max_possible_sectors; sector_t min_md_device_sectors; struct drbd_backing_dev *nbc = NULL; /* new_backing_conf */ struct disk_conf 
*new_disk_conf = NULL; struct block_device *bdev; struct lru_cache *resync_lru = NULL; struct fifo_buffer *new_plan = NULL; union drbd_state ns, os; enum drbd_state_rv rv; struct net_conf *nc; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto finish; device = adm_ctx.device; conn_reconfig_start(first_peer_device(device)->connection); /* if you want to reconfigure, please tear down first */ if (device->state.disk > D_DISKLESS) { retcode = ERR_DISK_CONFIGURED; goto fail; } /* It may just now have detached because of IO error. Make sure * drbd_ldev_destroy is done already, we may end up here very fast, * e.g. if someone calls attach from the on-io-error handler, * to realize a "hot spare" feature (not that I'd recommend that) */ wait_event(device->misc_wait, !atomic_read(&device->local_cnt)); /* make sure there is no leftover from previous force-detach attempts */ clear_bit(FORCE_DETACH, &device->flags); clear_bit(WAS_IO_ERROR, &device->flags); clear_bit(WAS_READ_ERROR, &device->flags); /* and no leftover from previously aborted resync or verify, either */ device->rs_total = 0; device->rs_failed = 0; atomic_set(&device->rs_pending_cnt, 0); /* allocation not in the IO path, drbdsetup context */ nbc = kzalloc(sizeof(struct drbd_backing_dev), GFP_KERNEL); if (!nbc) { retcode = ERR_NOMEM; goto fail; } spin_lock_init(&nbc->md.uuid_lock); new_disk_conf = kzalloc(sizeof(struct disk_conf), GFP_KERNEL); if (!new_disk_conf) { retcode = ERR_NOMEM; goto fail; } nbc->disk_conf = new_disk_conf; set_disk_conf_defaults(new_disk_conf); err = disk_conf_from_attrs(new_disk_conf, info); if (err) { retcode = ERR_MANDATORY_TAG; drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); goto fail; } if (new_disk_conf->c_plan_ahead > DRBD_C_PLAN_AHEAD_MAX) new_disk_conf->c_plan_ahead = DRBD_C_PLAN_AHEAD_MAX; new_plan = fifo_alloc((new_disk_conf->c_plan_ahead * 10 * SLEEP_TIME) / HZ); if 
(!new_plan) { retcode = ERR_NOMEM; goto fail; } if (new_disk_conf->meta_dev_idx < DRBD_MD_INDEX_FLEX_INT) { retcode = ERR_MD_IDX_INVALID; goto fail; } write_lock_irq(&global_state_lock); retcode = drbd_resync_after_valid(device, new_disk_conf->resync_after); write_unlock_irq(&global_state_lock); if (retcode != NO_ERROR) goto fail; rcu_read_lock(); nc = rcu_dereference(first_peer_device(device)->connection->net_conf); if (nc) { if (new_disk_conf->fencing == FP_STONITH && nc->wire_protocol == DRBD_PROT_A) { rcu_read_unlock(); retcode = ERR_STONITH_AND_PROT_A; goto fail; } } rcu_read_unlock(); bdev = blkdev_get_by_path(new_disk_conf->backing_dev, FMODE_READ | FMODE_WRITE | FMODE_EXCL, device); if (IS_ERR(bdev)) { drbd_err(device, "open(\"%s\") failed with %ld\n", new_disk_conf->backing_dev, PTR_ERR(bdev)); retcode = ERR_OPEN_DISK; goto fail; } nbc->backing_bdev = bdev; /* * meta_dev_idx >= 0: external fixed size, possibly multiple * drbd sharing one meta device. TODO in that case, paranoia * check that [md_bdev, meta_dev_idx] is not yet used by some * other drbd minor! (if you use drbd.conf + drbdadm, that * should check it for you already; but if you don't, or * someone fooled it, we need to double check here) */ bdev = blkdev_get_by_path(new_disk_conf->meta_dev, FMODE_READ | FMODE_WRITE | FMODE_EXCL, (new_disk_conf->meta_dev_idx < 0) ? (void *)device : (void *)drbd_m_holder); if (IS_ERR(bdev)) { drbd_err(device, "open(\"%s\") failed with %ld\n", new_disk_conf->meta_dev, PTR_ERR(bdev)); retcode = ERR_OPEN_MD_DISK; goto fail; } nbc->md_bdev = bdev; if ((nbc->backing_bdev == nbc->md_bdev) != (new_disk_conf->meta_dev_idx == DRBD_MD_INDEX_INTERNAL || new_disk_conf->meta_dev_idx == DRBD_MD_INDEX_FLEX_INT)) { retcode = ERR_MD_IDX_INVALID; goto fail; } resync_lru = lc_create("resync", drbd_bm_ext_cache, 1, 61, sizeof(struct bm_extent), offsetof(struct bm_extent, lce)); if (!resync_lru) { retcode = ERR_NOMEM; goto fail; } /* Read our meta data super block early. 
* This also sets other on-disk offsets. */ retcode = drbd_md_read(device, nbc); if (retcode != NO_ERROR) goto fail; if (new_disk_conf->al_extents < DRBD_AL_EXTENTS_MIN) new_disk_conf->al_extents = DRBD_AL_EXTENTS_MIN; if (new_disk_conf->al_extents > drbd_al_extents_max(nbc)) new_disk_conf->al_extents = drbd_al_extents_max(nbc); if (drbd_get_max_capacity(nbc) < new_disk_conf->disk_size) { drbd_err(device, "max capacity %llu smaller than disk size %llu\n", (unsigned long long) drbd_get_max_capacity(nbc), (unsigned long long) new_disk_conf->disk_size); retcode = ERR_DISK_TOO_SMALL; goto fail; } if (new_disk_conf->meta_dev_idx < 0) { max_possible_sectors = DRBD_MAX_SECTORS_FLEX; /* at least one MB, otherwise it does not make sense */ min_md_device_sectors = (2<<10); } else { max_possible_sectors = DRBD_MAX_SECTORS; min_md_device_sectors = MD_128MB_SECT * (new_disk_conf->meta_dev_idx + 1); } if (drbd_get_capacity(nbc->md_bdev) < min_md_device_sectors) { retcode = ERR_MD_DISK_TOO_SMALL; drbd_warn(device, "refusing attach: md-device too small, " "at least %llu sectors needed for this meta-disk type\n", (unsigned long long) min_md_device_sectors); goto fail; } /* Make sure the new disk is big enough * (we may currently be R_PRIMARY with no local disk...) */ if (drbd_get_max_capacity(nbc) < drbd_get_capacity(device->this_bdev)) { retcode = ERR_DISK_TOO_SMALL; goto fail; } nbc->known_size = drbd_get_capacity(nbc->backing_bdev); if (nbc->known_size > max_possible_sectors) { drbd_warn(device, "==> truncating very big lower level device " "to currently maximum possible %llu sectors <==\n", (unsigned long long) max_possible_sectors); if (new_disk_conf->meta_dev_idx >= 0) drbd_warn(device, "==>> using internal or flexible " "meta data may help <<==\n"); } drbd_suspend_io(device); /* also wait for the last barrier ack. 
*/ /* FIXME see also https://daiquiri.linbit/cgi-bin/bugzilla/show_bug.cgi?id=171 * We need a way to either ignore barrier acks for barriers sent before a device * was attached, or a way to wait for all pending barrier acks to come in. * As barriers are counted per resource, * we'd need to suspend io on all devices of a resource. */ wait_event(device->misc_wait, !atomic_read(&device->ap_pending_cnt) || drbd_suspended(device)); /* and for any other previously queued work */ drbd_flush_workqueue(&first_peer_device(device)->connection->sender_work); rv = _drbd_request_state(device, NS(disk, D_ATTACHING), CS_VERBOSE); retcode = rv; /* FIXME: Type mismatch. */ drbd_resume_io(device); if (rv < SS_SUCCESS) goto fail; if (!get_ldev_if_state(device, D_ATTACHING)) goto force_diskless; if (!device->bitmap) { if (drbd_bm_init(device)) { retcode = ERR_NOMEM; goto force_diskless_dec; } } if (device->state.conn < C_CONNECTED && device->state.role == R_PRIMARY && (device->ed_uuid & ~((u64)1)) != (nbc->md.uuid[UI_CURRENT] & ~((u64)1))) { drbd_err(device, "Can only attach to data with current UUID=%016llX\n", (unsigned long long)device->ed_uuid); retcode = ERR_DATA_NOT_CURRENT; goto force_diskless_dec; } /* Since we are diskless, fix the activity log first... */ if (drbd_check_al_size(device, new_disk_conf)) { retcode = ERR_NOMEM; goto force_diskless_dec; } /* Prevent shrinking of consistent devices ! */ if (drbd_md_test_flag(nbc, MDF_CONSISTENT) && drbd_new_dev_size(device, nbc, nbc->disk_conf->disk_size, 0) < nbc->md.la_size_sect) { drbd_warn(device, "refusing to truncate a consistent device\n"); retcode = ERR_DISK_TOO_SMALL; goto force_diskless_dec; } if (kobject_init_and_add(&nbc->kobject, &drbd_bdev_kobj_type, &device->kobj, "meta_data")) { retcode = ERR_NOMEM; goto remove_kobject; } /* Reset the "barriers don't work" bits here, then force meta data to * be written, to ensure we determine if barriers are supported. 
*/ if (new_disk_conf->md_flushes) clear_bit(MD_NO_BARRIER, &device->flags); else set_bit(MD_NO_BARRIER, &device->flags); /* Point of no return reached. * Devices and memory are no longer released by error cleanup below. * now device takes over responsibility, and the state engine should * clean it up somewhere. */ D_ASSERT(device, device->ldev == NULL); device->ldev = nbc; device->resync = resync_lru; device->rs_plan_s = new_plan; nbc = NULL; resync_lru = NULL; new_disk_conf = NULL; new_plan = NULL; drbd_bump_write_ordering(first_peer_device(device)->connection, WO_bio_barrier); if (drbd_md_test_flag(device->ldev, MDF_CRASHED_PRIMARY)) set_bit(CRASHED_PRIMARY, &device->flags); else clear_bit(CRASHED_PRIMARY, &device->flags); if (drbd_md_test_flag(device->ldev, MDF_PRIMARY_IND) && !(device->state.role == R_PRIMARY && device->resource->susp_nod)) set_bit(CRASHED_PRIMARY, &device->flags); device->send_cnt = 0; device->recv_cnt = 0; device->read_cnt = 0; device->writ_cnt = 0; drbd_reconsider_max_bio_size(device); /* If I am currently not R_PRIMARY, * but meta data primary indicator is set, * I just now recover from a hard crash, * and have been R_PRIMARY before that crash. * * Now, if I had no connection before that crash * (have been degraded R_PRIMARY), chances are that * I won't find my peer now either. * * In that case, and _only_ in that case, * we use the degr-wfc-timeout instead of the default, * so we can automatically recover from a crash of a * degraded but active "cluster" after a certain timeout. 
*/ clear_bit(USE_DEGR_WFC_T, &device->flags); if (device->state.role != R_PRIMARY && drbd_md_test_flag(device->ldev, MDF_PRIMARY_IND) && !drbd_md_test_flag(device->ldev, MDF_CONNECTED_IND)) set_bit(USE_DEGR_WFC_T, &device->flags); dd = drbd_determine_dev_size(device, 0, NULL); if (dd <= DS_ERROR) { retcode = ERR_NOMEM_BITMAP; goto remove_kobject; } else if (dd == DS_GREW) set_bit(RESYNC_AFTER_NEG, &device->flags); if (drbd_md_test_flag(device->ldev, MDF_FULL_SYNC) || (test_bit(CRASHED_PRIMARY, &device->flags) && drbd_md_test_flag(device->ldev, MDF_AL_DISABLED))) { drbd_info(device, "Assuming that all blocks are out of sync " "(aka FullSync)\n"); if (drbd_bitmap_io(device, &drbd_bmio_set_n_write, "set_n_write from attaching", BM_LOCKED_MASK)) { retcode = ERR_IO_MD_DISK; goto remove_kobject; } } else { if (drbd_bitmap_io(device, &drbd_bm_read, "read from attaching", BM_LOCKED_MASK)) { retcode = ERR_IO_MD_DISK; goto remove_kobject; } } if (_drbd_bm_total_weight(device) == drbd_bm_bits(device)) drbd_suspend_al(device); /* IO is still suspended here... */ spin_lock_irq(&device->resource->req_lock); os = drbd_read_state(device); ns = os; /* If MDF_CONSISTENT is not set go into inconsistent state, otherwise investigate MDF_WasUpToDate... If MDF_WAS_UP_TO_DATE is not set go into D_OUTDATED disk state, otherwise into D_CONSISTENT state. */ if (drbd_md_test_flag(device->ldev, MDF_CONSISTENT)) { if (drbd_md_test_flag(device->ldev, MDF_WAS_UP_TO_DATE)) ns.disk = D_CONSISTENT; else ns.disk = D_OUTDATED; } else { ns.disk = D_INCONSISTENT; } if (drbd_md_test_flag(device->ldev, MDF_PEER_OUT_DATED)) ns.pdsk = D_OUTDATED; rcu_read_lock(); if (ns.disk == D_CONSISTENT && (ns.pdsk == D_OUTDATED || rcu_dereference(device->ldev->disk_conf)->fencing == FP_DONT_CARE)) ns.disk = D_UP_TO_DATE; /* All tests on MDF_PRIMARY_IND, MDF_CONNECTED_IND, MDF_CONSISTENT and MDF_WAS_UP_TO_DATE must happen before this point, because drbd_request_state() modifies these flags. 
*/ if (rcu_dereference(device->ldev->disk_conf)->al_updates) device->ldev->md.flags &= ~MDF_AL_DISABLED; else device->ldev->md.flags |= MDF_AL_DISABLED; rcu_read_unlock(); /* In case we are C_CONNECTED postpone any decision on the new disk state after the negotiation phase. */ if (device->state.conn == C_CONNECTED) { device->new_state_tmp.i = ns.i; ns.i = os.i; ns.disk = D_NEGOTIATING; /* We expect to receive up-to-date UUIDs soon. To avoid a race in receive_state, free p_uuid while holding req_lock. I.e. atomic with the state change */ kfree(device->p_uuid); device->p_uuid = NULL; } rv = _drbd_set_state(device, ns, CS_VERBOSE, NULL); spin_unlock_irq(&device->resource->req_lock); if (rv < SS_SUCCESS) goto remove_kobject; mod_timer(&device->request_timer, jiffies + HZ); if (device->state.role == R_PRIMARY) device->ldev->md.uuid[UI_CURRENT] |= (u64)1; else device->ldev->md.uuid[UI_CURRENT] &= ~(u64)1; drbd_md_mark_dirty(device); drbd_md_sync(device); drbd_kobject_uevent(device); put_ldev(device); conn_reconfig_done(first_peer_device(device)->connection); drbd_adm_finish(&adm_ctx, info, retcode); return 0; remove_kobject: drbd_free_bc(nbc); nbc = NULL; force_diskless_dec: put_ldev(device); force_diskless: drbd_force_state(device, NS(disk, D_DISKLESS)); drbd_md_sync(device); fail: conn_reconfig_done(first_peer_device(device)->connection); if (nbc) { if (nbc->backing_bdev) blkdev_put(nbc->backing_bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); if (nbc->md_bdev) blkdev_put(nbc->md_bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); kfree(nbc); } kfree(new_disk_conf); lc_destroy(resync_lru); kfree(new_plan); finish: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } static int adm_detach(struct drbd_device *device, int force) { enum drbd_state_rv retcode; int ret; if (force) { set_bit(FORCE_DETACH, &device->flags); drbd_force_state(device, NS(disk, D_FAILED)); retcode = SS_SUCCESS; goto out; } drbd_suspend_io(device); /* so no-one is stuck in drbd_al_begin_io */ 
drbd_md_get_buffer(device); /* make sure there is no in-flight meta-data IO */ retcode = drbd_request_state(device, NS(disk, D_FAILED)); drbd_md_put_buffer(device); /* D_FAILED will transition to DISKLESS. */ ret = wait_event_interruptible(device->misc_wait, device->state.disk != D_FAILED); drbd_resume_io(device); if (retcode == SS_IS_DISKLESS) retcode = SS_NOTHING_TO_DO; if (ret) retcode = ERR_INTR; out: return retcode; } /* Detaching the disk is a process in multiple stages. First we need to lock * out application IO, in-flight IO, IO stuck in drbd_al_begin_io. * Then we transition to D_DISKLESS, and wait for put_ldev() to return all * internal references as well. * Only then we have finally detached. */ int drbd_adm_detach(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; enum drbd_ret_code retcode; struct detach_parms parms = { }; int err; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; if (info->attrs[DRBD_NLA_DETACH_PARMS]) { err = detach_parms_from_attrs(&parms, info); if (err) { retcode = ERR_MANDATORY_TAG; drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); goto out; } } retcode = adm_detach(adm_ctx.device, parms.force_detach); out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } static bool conn_resync_running(struct drbd_connection *connection) { struct drbd_peer_device *peer_device; bool rv = false; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; if (device->state.conn == C_SYNC_SOURCE || device->state.conn == C_SYNC_TARGET || device->state.conn == C_PAUSED_SYNC_S || device->state.conn == C_PAUSED_SYNC_T) { rv = true; break; } } rcu_read_unlock(); return rv; } static bool conn_ov_running(struct drbd_connection *connection) { struct drbd_peer_device *peer_device; bool rv = false; int vnr; rcu_read_lock(); 
idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; if (device->state.conn == C_VERIFY_S || device->state.conn == C_VERIFY_T) { rv = true; break; } } rcu_read_unlock(); return rv; } static enum drbd_ret_code _check_net_options(struct drbd_connection *connection, struct net_conf *old_net_conf, struct net_conf *new_net_conf) { struct drbd_peer_device *peer_device; int i; if (old_net_conf && connection->cstate == C_WF_REPORT_PARAMS && connection->agreed_pro_version < 100) { if (new_net_conf->wire_protocol != old_net_conf->wire_protocol) return ERR_NEED_APV_100; if (new_net_conf->two_primaries != old_net_conf->two_primaries) return ERR_NEED_APV_100; if (strcmp(new_net_conf->integrity_alg, old_net_conf->integrity_alg)) return ERR_NEED_APV_100; } if (!new_net_conf->two_primaries && conn_highest_role(connection) == R_PRIMARY && conn_highest_peer(connection) == R_PRIMARY) return ERR_NEED_ALLOW_TWO_PRI; if (new_net_conf->two_primaries && (new_net_conf->wire_protocol != DRBD_PROT_C)) return ERR_NOT_PROTO_C; idr_for_each_entry(&connection->peer_devices, peer_device, i) { struct drbd_device *device = peer_device->device; if (get_ldev(device)) { enum drbd_fencing_p fp = rcu_dereference(device->ldev->disk_conf)->fencing; put_ldev(device); if (new_net_conf->wire_protocol == DRBD_PROT_A && fp == FP_STONITH) return ERR_STONITH_AND_PROT_A; } if (device->state.role == R_PRIMARY && new_net_conf->discard_my_data) return ERR_DISCARD_IMPOSSIBLE; } if (new_net_conf->on_congestion != OC_BLOCK && new_net_conf->wire_protocol != DRBD_PROT_A) return ERR_CONG_NOT_PROTO_A; return NO_ERROR; } static enum drbd_ret_code check_net_options(struct drbd_connection *connection, struct net_conf *new_net_conf) { static enum drbd_ret_code rv; struct drbd_peer_device *peer_device; int i; rcu_read_lock(); rv = _check_net_options(connection, rcu_dereference(connection->net_conf), new_net_conf); rcu_read_unlock(); /* connection->volumes protected 
by genl_lock() here */ idr_for_each_entry(&connection->peer_devices, peer_device, i) { struct drbd_device *device = peer_device->device; if (!device->bitmap) { if(drbd_bm_init(device)) return ERR_NOMEM; } } return rv; } struct crypto { struct crypto_hash *verify_tfm; struct crypto_hash *csums_tfm; struct crypto_hash *cram_hmac_tfm; struct crypto_hash *integrity_tfm; }; static int alloc_hash(struct crypto_hash **tfm, char *tfm_name, int err_alg) { if (!tfm_name[0]) return NO_ERROR; *tfm = crypto_alloc_hash(tfm_name, 0, CRYPTO_ALG_ASYNC); if (IS_ERR(*tfm)) { *tfm = NULL; return err_alg; } return NO_ERROR; } static enum drbd_ret_code alloc_crypto(struct crypto *crypto, struct net_conf *new_net_conf) { char hmac_name[CRYPTO_MAX_ALG_NAME]; enum drbd_ret_code rv; rv = alloc_hash(&crypto->csums_tfm, new_net_conf->csums_alg, ERR_CSUMS_ALG); if (rv != NO_ERROR) return rv; rv = alloc_hash(&crypto->verify_tfm, new_net_conf->verify_alg, ERR_VERIFY_ALG); if (rv != NO_ERROR) return rv; rv = alloc_hash(&crypto->integrity_tfm, new_net_conf->integrity_alg, ERR_INTEGRITY_ALG); if (rv != NO_ERROR) return rv; if (new_net_conf->cram_hmac_alg[0] != 0) { snprintf(hmac_name, CRYPTO_MAX_ALG_NAME, "hmac(%s)", new_net_conf->cram_hmac_alg); rv = alloc_hash(&crypto->cram_hmac_tfm, hmac_name, ERR_AUTH_ALG); } return rv; } static void free_crypto(struct crypto *crypto) { crypto_free_hash(crypto->cram_hmac_tfm); crypto_free_hash(crypto->integrity_tfm); crypto_free_hash(crypto->csums_tfm); crypto_free_hash(crypto->verify_tfm); } int drbd_adm_net_opts(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; enum drbd_ret_code retcode; struct drbd_connection *connection; struct net_conf *old_net_conf, *new_net_conf = NULL; int err; int ovr; /* online verify running */ int rsr; /* re-sync running */ struct crypto crypto = { }; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_CONNECTION); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; 
connection = adm_ctx.connection; new_net_conf = kzalloc(sizeof(struct net_conf), GFP_KERNEL); if (!new_net_conf) { retcode = ERR_NOMEM; goto out; } conn_reconfig_start(connection); mutex_lock(&connection->data.mutex); mutex_lock(&connection->resource->conf_update); old_net_conf = connection->net_conf; if (!old_net_conf) { drbd_msg_put_info(adm_ctx.reply_skb, "net conf missing, try connect"); retcode = ERR_INVALID_REQUEST; goto fail; } *new_net_conf = *old_net_conf; if (should_set_defaults(info)) set_net_conf_defaults(new_net_conf); err = net_conf_from_attrs_for_change(new_net_conf, info); if (err && err != -ENOMSG) { retcode = ERR_MANDATORY_TAG; drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); goto fail; } retcode = check_net_options(connection, new_net_conf); if (retcode != NO_ERROR) goto fail; /* re-sync running */ rsr = conn_resync_running(connection); if (rsr && strcmp(new_net_conf->csums_alg, old_net_conf->csums_alg)) { retcode = ERR_CSUMS_RESYNC_RUNNING; goto fail; } /* online verify running */ ovr = conn_ov_running(connection); if (ovr && strcmp(new_net_conf->verify_alg, old_net_conf->verify_alg)) { retcode = ERR_VERIFY_RUNNING; goto fail; } retcode = alloc_crypto(&crypto, new_net_conf); if (retcode != NO_ERROR) goto fail; rcu_assign_pointer(connection->net_conf, new_net_conf); if (!rsr) { crypto_free_hash(connection->csums_tfm); connection->csums_tfm = crypto.csums_tfm; crypto.csums_tfm = NULL; } if (!ovr) { crypto_free_hash(connection->verify_tfm); connection->verify_tfm = crypto.verify_tfm; crypto.verify_tfm = NULL; } crypto_free_hash(connection->integrity_tfm); connection->integrity_tfm = crypto.integrity_tfm; if (connection->cstate >= C_WF_REPORT_PARAMS && connection->agreed_pro_version >= 100) /* Do this without trying to take connection->data.mutex again. 
*/ __drbd_send_protocol(connection, P_PROTOCOL_UPDATE); crypto_free_hash(connection->cram_hmac_tfm); connection->cram_hmac_tfm = crypto.cram_hmac_tfm; mutex_unlock(&connection->resource->conf_update); mutex_unlock(&connection->data.mutex); synchronize_rcu(); kfree(old_net_conf); if (connection->cstate >= C_WF_REPORT_PARAMS) { struct drbd_peer_device *peer_device; int vnr; idr_for_each_entry(&connection->peer_devices, peer_device, vnr) drbd_send_sync_param(peer_device); } goto done; fail: mutex_unlock(&connection->resource->conf_update); mutex_unlock(&connection->data.mutex); free_crypto(&crypto); kfree(new_net_conf); done: conn_reconfig_done(connection); out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } int drbd_adm_connect(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; struct drbd_peer_device *peer_device; struct net_conf *old_net_conf, *new_net_conf = NULL; struct crypto crypto = { }; struct drbd_resource *resource; struct drbd_connection *connection; enum drbd_ret_code retcode; int i; int err; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_RESOURCE); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; if (!(adm_ctx.my_addr && adm_ctx.peer_addr)) { drbd_msg_put_info(adm_ctx.reply_skb, "connection endpoint(s) missing"); retcode = ERR_INVALID_REQUEST; goto out; } /* No need for _rcu here. All reconfiguration is * strictly serialized on genl_lock(). 
We are protected against * concurrent reconfiguration/addition/deletion */ for_each_resource(resource, &drbd_resources) { for_each_connection(connection, resource) { if (nla_len(adm_ctx.my_addr) == connection->my_addr_len && !memcmp(nla_data(adm_ctx.my_addr), &connection->my_addr, connection->my_addr_len)) { retcode = ERR_LOCAL_ADDR; goto out; } if (nla_len(adm_ctx.peer_addr) == connection->peer_addr_len && !memcmp(nla_data(adm_ctx.peer_addr), &connection->peer_addr, connection->peer_addr_len)) { retcode = ERR_PEER_ADDR; goto out; } } } connection = first_connection(adm_ctx.resource); conn_reconfig_start(connection); if (connection->cstate > C_STANDALONE) { retcode = ERR_NET_CONFIGURED; goto fail; } /* allocation not in the IO path, drbdsetup / netlink process context */ new_net_conf = kzalloc(sizeof(*new_net_conf), GFP_KERNEL); if (!new_net_conf) { retcode = ERR_NOMEM; goto fail; } set_net_conf_defaults(new_net_conf); err = net_conf_from_attrs(new_net_conf, info); if (err && err != -ENOMSG) { retcode = ERR_MANDATORY_TAG; drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); goto fail; } retcode = check_net_options(connection, new_net_conf); if (retcode != NO_ERROR) goto fail; retcode = alloc_crypto(&crypto, new_net_conf); if (retcode != NO_ERROR) goto fail; ((char *)new_net_conf->shared_secret)[SHARED_SECRET_MAX-1] = 0; drbd_flush_workqueue(&connection->sender_work); mutex_lock(&adm_ctx.resource->conf_update); old_net_conf = connection->net_conf; if (old_net_conf) { retcode = ERR_NET_CONFIGURED; mutex_unlock(&adm_ctx.resource->conf_update); goto fail; } rcu_assign_pointer(connection->net_conf, new_net_conf); conn_free_crypto(connection); connection->cram_hmac_tfm = crypto.cram_hmac_tfm; connection->integrity_tfm = crypto.integrity_tfm; connection->csums_tfm = crypto.csums_tfm; connection->verify_tfm = crypto.verify_tfm; connection->my_addr_len = nla_len(adm_ctx.my_addr); memcpy(&connection->my_addr, nla_data(adm_ctx.my_addr), connection->my_addr_len); 
connection->peer_addr_len = nla_len(adm_ctx.peer_addr); memcpy(&connection->peer_addr, nla_data(adm_ctx.peer_addr), connection->peer_addr_len); mutex_unlock(&adm_ctx.resource->conf_update); rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, i) { struct drbd_device *device = peer_device->device; device->send_cnt = 0; device->recv_cnt = 0; } rcu_read_unlock(); retcode = conn_request_state(connection, NS(conn, C_UNCONNECTED), CS_VERBOSE); conn_reconfig_done(connection); drbd_adm_finish(&adm_ctx, info, retcode); return 0; fail: free_crypto(&crypto); kfree(new_net_conf); conn_reconfig_done(connection); out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } static enum drbd_state_rv conn_try_disconnect(struct drbd_connection *connection, bool force) { enum drbd_state_rv rv; rv = conn_request_state(connection, NS(conn, C_DISCONNECTING), force ? CS_HARD : 0); switch (rv) { case SS_NOTHING_TO_DO: break; case SS_ALREADY_STANDALONE: return SS_SUCCESS; case SS_PRIMARY_NOP: /* Our state checking code wants to see the peer outdated. */ rv = conn_request_state(connection, NS2(conn, C_DISCONNECTING, pdsk, D_OUTDATED), 0); if (rv == SS_OUTDATE_WO_CONN) /* lost connection before graceful disconnect succeeded */ rv = conn_request_state(connection, NS(conn, C_DISCONNECTING), CS_VERBOSE); break; case SS_CW_FAILED_BY_PEER: /* The peer probably wants to see us outdated. */ rv = conn_request_state(connection, NS2(conn, C_DISCONNECTING, disk, D_OUTDATED), 0); if (rv == SS_IS_DISKLESS || rv == SS_LOWER_THAN_OUTDATED) { rv = conn_request_state(connection, NS(conn, C_DISCONNECTING), CS_HARD); } break; default:; /* no special handling necessary */ } if (rv >= SS_SUCCESS) { enum drbd_state_rv rv2; /* No one else can reconfigure the network while I am here. * The state handling only uses drbd_thread_stop_nowait(), * we want to really wait here until the receiver is no more. */ drbd_thread_stop(&connection->receiver); /* Race breaker. 
This additional state change request may be * necessary, if this was a forced disconnect during a receiver * restart. We may have "killed" the receiver thread just * after drbd_receiver() returned. Typically, we should be * C_STANDALONE already, now, and this becomes a no-op. */ rv2 = conn_request_state(connection, NS(conn, C_STANDALONE), CS_VERBOSE | CS_HARD); if (rv2 < SS_SUCCESS) drbd_err(connection, "unexpected rv2=%d in conn_try_disconnect()\n", rv2); } return rv; } int drbd_adm_disconnect(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; struct disconnect_parms parms; struct drbd_connection *connection; enum drbd_state_rv rv; enum drbd_ret_code retcode; int err; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_CONNECTION); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto fail; connection = adm_ctx.connection; memset(&parms, 0, sizeof(parms)); if (info->attrs[DRBD_NLA_DISCONNECT_PARMS]) { err = disconnect_parms_from_attrs(&parms, info); if (err) { retcode = ERR_MANDATORY_TAG; drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); goto fail; } } rv = conn_try_disconnect(connection, parms.force_disconnect); if (rv < SS_SUCCESS) retcode = rv; /* FIXME: Type mismatch. 
*/ else retcode = NO_ERROR; fail: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } void resync_after_online_grow(struct drbd_device *device) { int iass; /* I am sync source */ drbd_info(device, "Resync of new storage after online grow\n"); if (device->state.role != device->state.peer) iass = (device->state.role == R_PRIMARY); else iass = test_bit(RESOLVE_CONFLICTS, &first_peer_device(device)->connection->flags); if (iass) drbd_start_resync(device, C_SYNC_SOURCE); else _drbd_request_state(device, NS(conn, C_WF_SYNC_UUID), CS_VERBOSE + CS_SERIALIZE); } int drbd_adm_resize(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; struct disk_conf *old_disk_conf, *new_disk_conf = NULL; struct resize_parms rs; struct drbd_device *device; enum drbd_ret_code retcode; enum determine_dev_size dd; bool change_al_layout = false; enum dds_flags ddsf; sector_t u_size; int err; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto fail; device = adm_ctx.device; if (!get_ldev(device)) { retcode = ERR_NO_DISK; goto fail; } memset(&rs, 0, sizeof(struct resize_parms)); rs.al_stripes = device->ldev->md.al_stripes; rs.al_stripe_size = device->ldev->md.al_stripe_size_4k * 4; if (info->attrs[DRBD_NLA_RESIZE_PARMS]) { err = resize_parms_from_attrs(&rs, info); if (err) { retcode = ERR_MANDATORY_TAG; drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); goto fail_ldev; } } if (device->state.conn > C_CONNECTED) { retcode = ERR_RESIZE_RESYNC; goto fail_ldev; } if (device->state.role == R_SECONDARY && device->state.peer == R_SECONDARY) { retcode = ERR_NO_PRIMARY; goto fail_ldev; } if (rs.no_resync && first_peer_device(device)->connection->agreed_pro_version < 93) { retcode = ERR_NEED_APV_93; goto fail_ldev; } rcu_read_lock(); u_size = rcu_dereference(device->ldev->disk_conf)->disk_size; rcu_read_unlock(); if (u_size != (sector_t)rs.resize_size) { new_disk_conf = 
kmalloc(sizeof(struct disk_conf), GFP_KERNEL); if (!new_disk_conf) { retcode = ERR_NOMEM; goto fail_ldev; } } if (device->ldev->md.al_stripes != rs.al_stripes || device->ldev->md.al_stripe_size_4k != rs.al_stripe_size / 4) { u32 al_size_k = rs.al_stripes * rs.al_stripe_size; if (al_size_k > (16 * 1024 * 1024)) { retcode = ERR_MD_LAYOUT_TOO_BIG; goto fail_ldev; } if (al_size_k < MD_32kB_SECT/2) { retcode = ERR_MD_LAYOUT_TOO_SMALL; goto fail_ldev; } if (device->state.conn != C_CONNECTED && !rs.resize_force) { retcode = ERR_MD_LAYOUT_CONNECTED; goto fail_ldev; } change_al_layout = true; } if (device->ldev->known_size != drbd_get_capacity(device->ldev->backing_bdev)) device->ldev->known_size = drbd_get_capacity(device->ldev->backing_bdev); if (new_disk_conf) { mutex_lock(&device->resource->conf_update); old_disk_conf = device->ldev->disk_conf; *new_disk_conf = *old_disk_conf; new_disk_conf->disk_size = (sector_t)rs.resize_size; rcu_assign_pointer(device->ldev->disk_conf, new_disk_conf); mutex_unlock(&device->resource->conf_update); synchronize_rcu(); kfree(old_disk_conf); } ddsf = (rs.resize_force ? DDSF_FORCED : 0) | (rs.no_resync ? DDSF_NO_RESYNC : 0); dd = drbd_determine_dev_size(device, ddsf, change_al_layout ? 
&rs : NULL); drbd_md_sync(device); put_ldev(device); if (dd == DS_ERROR) { retcode = ERR_NOMEM_BITMAP; goto fail; } else if (dd == DS_ERROR_SPACE_MD) { retcode = ERR_MD_LAYOUT_NO_FIT; goto fail; } else if (dd == DS_ERROR_SHRINK) { retcode = ERR_IMPLICIT_SHRINK; goto fail; } if (device->state.conn == C_CONNECTED) { if (dd == DS_GREW) set_bit(RESIZE_PENDING, &device->flags); drbd_send_uuids(first_peer_device(device)); drbd_send_sizes(first_peer_device(device), 1, ddsf); } fail: drbd_adm_finish(&adm_ctx, info, retcode); return 0; fail_ldev: put_ldev(device); /* Only reached before new_disk_conf is published; free the unused copy. kfree(NULL) is a no-op. */ kfree(new_disk_conf); goto fail; } int drbd_adm_resource_opts(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; enum drbd_ret_code retcode; struct res_opts res_opts; int err; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_RESOURCE); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto fail; res_opts = adm_ctx.resource->res_opts; if (should_set_defaults(info)) set_res_opts_defaults(&res_opts); err = res_opts_from_attrs(&res_opts, info); if (err && err != -ENOMSG) { retcode = ERR_MANDATORY_TAG; drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); goto fail; } err = set_resource_options(adm_ctx.resource, &res_opts); if (err) { retcode = ERR_INVALID_REQUEST; if (err == -ENOMEM) retcode = ERR_NOMEM; } fail: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } int drbd_adm_invalidate(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; struct drbd_device *device; int retcode; /* enum drbd_ret_code rsp. enum drbd_state_rv */ retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; device = adm_ctx.device; /* If there is still bitmap IO pending, probably because of a previous * resync just being finished, wait for it before requesting a new resync. * Also wait for its after_state_ch().
*/ drbd_suspend_io(device); wait_event(device->misc_wait, !test_bit(BITMAP_IO, &device->flags)); drbd_flush_workqueue(&first_peer_device(device)->connection->sender_work); /* If we happen to be C_STANDALONE R_SECONDARY, just change to * D_INCONSISTENT, and set all bits in the bitmap. Otherwise, * try to start a resync handshake as sync target for full sync. */ if (device->state.conn == C_STANDALONE && device->state.role == R_SECONDARY) { retcode = drbd_request_state(device, NS(disk, D_INCONSISTENT)); if (retcode >= SS_SUCCESS) { if (drbd_bitmap_io(device, &drbd_bmio_set_n_write, "set_n_write from invalidate", BM_LOCKED_MASK)) retcode = ERR_IO_MD_DISK; } } else retcode = drbd_request_state(device, NS(conn, C_STARTING_SYNC_T)); drbd_resume_io(device); out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } static int drbd_adm_simple_request_state(struct sk_buff *skb, struct genl_info *info, union drbd_state mask, union drbd_state val) { struct drbd_config_context adm_ctx; enum drbd_ret_code retcode; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; retcode = drbd_request_state(adm_ctx.device, mask, val); out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } static int drbd_bmio_set_susp_al(struct drbd_device *device) { int rv; rv = drbd_bmio_set_n_write(device); drbd_suspend_al(device); return rv; } int drbd_adm_invalidate_peer(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; int retcode; /* drbd_ret_code, drbd_state_rv */ struct drbd_device *device; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; device = adm_ctx.device; /* If there is still bitmap IO pending, probably because of a previous * resync just being finished, wait for it before requesting a new resync. * Also wait for its after_state_ch().
*/ drbd_suspend_io(device); wait_event(device->misc_wait, !test_bit(BITMAP_IO, &device->flags)); drbd_flush_workqueue(&first_peer_device(device)->connection->sender_work); /* If we happen to be C_STANDALONE R_PRIMARY, just set all bits * in the bitmap. Otherwise, try to start a resync handshake * as sync source for full sync. */ if (device->state.conn == C_STANDALONE && device->state.role == R_PRIMARY) { /* The peer will get a resync upon connect anyways. Just make that into a full resync. */ retcode = drbd_request_state(device, NS(pdsk, D_INCONSISTENT)); if (retcode >= SS_SUCCESS) { if (drbd_bitmap_io(device, &drbd_bmio_set_susp_al, "set_n_write from invalidate_peer", BM_LOCKED_SET_ALLOWED)) retcode = ERR_IO_MD_DISK; } } else retcode = drbd_request_state(device, NS(conn, C_STARTING_SYNC_S)); drbd_resume_io(device); out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } int drbd_adm_pause_sync(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; enum drbd_ret_code retcode; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; if (drbd_request_state(adm_ctx.device, NS(user_isp, 1)) == SS_NOTHING_TO_DO) retcode = ERR_PAUSE_IS_SET; out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } int drbd_adm_resume_sync(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; union drbd_dev_state s; enum drbd_ret_code retcode; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; if (drbd_request_state(adm_ctx.device, NS(user_isp, 0)) == SS_NOTHING_TO_DO) { s = adm_ctx.device->state; if (s.conn == C_PAUSED_SYNC_S || s.conn == C_PAUSED_SYNC_T) { retcode = s.aftr_isp ? ERR_PIC_AFTER_DEP : s.peer_isp ? 
ERR_PIC_PEER_DEP : ERR_PAUSE_IS_CLEAR; } else { retcode = ERR_PAUSE_IS_CLEAR; } } out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } int drbd_adm_suspend_io(struct sk_buff *skb, struct genl_info *info) { return drbd_adm_simple_request_state(skb, info, NS(susp, 1)); } int drbd_adm_resume_io(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; struct drbd_device *device; int retcode; /* enum drbd_ret_code rsp. enum drbd_state_rv */ retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; device = adm_ctx.device; if (test_bit(NEW_CUR_UUID, &device->flags)) { drbd_uuid_new_current(device); clear_bit(NEW_CUR_UUID, &device->flags); } drbd_suspend_io(device); retcode = drbd_request_state(device, NS3(susp, 0, susp_nod, 0, susp_fen, 0)); if (retcode == SS_SUCCESS) { if (device->state.conn < C_CONNECTED) tl_clear(first_peer_device(device)->connection); if (device->state.disk == D_DISKLESS || device->state.disk == D_FAILED) tl_restart(first_peer_device(device)->connection, FAIL_FROZEN_DISK_IO); } drbd_resume_io(device); out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } int drbd_adm_outdate(struct sk_buff *skb, struct genl_info *info) { return drbd_adm_simple_request_state(skb, info, NS(disk, D_OUTDATED)); } static int nla_put_drbd_cfg_context(struct sk_buff *skb, struct drbd_resource *resource, struct drbd_connection *connection, struct drbd_device *device) { struct nlattr *nla; nla = nla_nest_start(skb, DRBD_NLA_CFG_CONTEXT); if (!nla) goto nla_put_failure; if (device && nla_put_u32(skb, T_ctx_volume, device->vnr)) goto nla_put_failure; /* connection may be NULL here (see the_only_connection()); take the name from resource instead */ if (nla_put_string(skb, T_ctx_resource_name, resource->name)) goto nla_put_failure; if (connection) { if (connection->my_addr_len && nla_put(skb, T_ctx_my_addr, connection->my_addr_len, &connection->my_addr)) goto nla_put_failure; if (connection->peer_addr_len && nla_put(skb, T_ctx_peer_addr,
connection->peer_addr_len, &connection->peer_addr)) goto nla_put_failure; } nla_nest_end(skb, nla); return 0; nla_put_failure: if (nla) nla_nest_cancel(skb, nla); return -EMSGSIZE; } /* * Return the connection of @resource if @resource has exactly one connection. */ static struct drbd_connection *the_only_connection(struct drbd_resource *resource) { struct list_head *connections = &resource->connections; if (list_empty(connections) || connections->next->next != connections) return NULL; return list_first_entry(&resource->connections, struct drbd_connection, connections); } int nla_put_status_info(struct sk_buff *skb, struct drbd_device *device, const struct sib_info *sib) { struct drbd_resource *resource = device->resource; struct state_info *si = NULL; /* for sizeof(si->member); */ struct nlattr *nla; int got_ldev; int err = 0; int exclude_sensitive; /* If sib != NULL, this is drbd_bcast_event, which anyone can listen * to. So we better exclude_sensitive information. * * If sib == NULL, this is drbd_adm_get_status, executed synchronously * in the context of the requesting user process. Exclude sensitive * information, unless current has superuser. * * NOTE: for drbd_adm_get_status_all(), this is a netlink dump, and * relies on the current implementation of netlink_dump(), which * executes the dump callback successively from netlink_recvmsg(), * always in the context of the receiving process */ exclude_sensitive = sib || !capable(CAP_SYS_ADMIN); got_ldev = get_ldev(device); /* We need to add connection name and volume number information still. * Minor number is in drbd_genlmsghdr. 
*/ if (nla_put_drbd_cfg_context(skb, resource, the_only_connection(resource), device)) goto nla_put_failure; if (res_opts_to_skb(skb, &device->resource->res_opts, exclude_sensitive)) goto nla_put_failure; rcu_read_lock(); if (got_ldev) { struct disk_conf *disk_conf; disk_conf = rcu_dereference(device->ldev->disk_conf); err = disk_conf_to_skb(skb, disk_conf, exclude_sensitive); } if (!err) { struct net_conf *nc; nc = rcu_dereference(first_peer_device(device)->connection->net_conf); if (nc) err = net_conf_to_skb(skb, nc, exclude_sensitive); } rcu_read_unlock(); if (err) goto nla_put_failure; nla = nla_nest_start(skb, DRBD_NLA_STATE_INFO); if (!nla) goto nla_put_failure; if (nla_put_u32(skb, T_sib_reason, sib ? sib->sib_reason : SIB_GET_STATUS_REPLY) || nla_put_u32(skb, T_current_state, device->state.i) || nla_put_u64(skb, T_ed_uuid, device->ed_uuid) || nla_put_u64(skb, T_capacity, drbd_get_capacity(device->this_bdev)) || nla_put_u64(skb, T_send_cnt, device->send_cnt) || nla_put_u64(skb, T_recv_cnt, device->recv_cnt) || nla_put_u64(skb, T_read_cnt, device->read_cnt) || nla_put_u64(skb, T_writ_cnt, device->writ_cnt) || nla_put_u64(skb, T_al_writ_cnt, device->al_writ_cnt) || nla_put_u64(skb, T_bm_writ_cnt, device->bm_writ_cnt) || nla_put_u32(skb, T_ap_bio_cnt, atomic_read(&device->ap_bio_cnt)) || nla_put_u32(skb, T_ap_pending_cnt, atomic_read(&device->ap_pending_cnt)) || nla_put_u32(skb, T_rs_pending_cnt, atomic_read(&device->rs_pending_cnt))) goto nla_put_failure; if (got_ldev) { int err; spin_lock_irq(&device->ldev->md.uuid_lock); err = nla_put(skb, T_uuids, sizeof(si->uuids), device->ldev->md.uuid); spin_unlock_irq(&device->ldev->md.uuid_lock); if (err) goto nla_put_failure; if (nla_put_u32(skb, T_disk_flags, device->ldev->md.flags) || nla_put_u64(skb, T_bits_total, drbd_bm_bits(device)) || nla_put_u64(skb, T_bits_oos, drbd_bm_total_weight(device))) goto nla_put_failure; if (C_SYNC_SOURCE <= device->state.conn && C_PAUSED_SYNC_T >= device->state.conn) { if 
(nla_put_u64(skb, T_bits_rs_total, device->rs_total) || nla_put_u64(skb, T_bits_rs_failed, device->rs_failed)) goto nla_put_failure; } } if (sib) { switch(sib->sib_reason) { case SIB_SYNC_PROGRESS: case SIB_GET_STATUS_REPLY: break; case SIB_STATE_CHANGE: if (nla_put_u32(skb, T_prev_state, sib->os.i) || nla_put_u32(skb, T_new_state, sib->ns.i)) goto nla_put_failure; break; case SIB_HELPER_POST: if (nla_put_u32(skb, T_helper_exit_code, sib->helper_exit_code)) goto nla_put_failure; /* fall through */ case SIB_HELPER_PRE: if (nla_put_string(skb, T_helper, sib->helper_name)) goto nla_put_failure; break; } } nla_nest_end(skb, nla); if (0) nla_put_failure: err = -EMSGSIZE; if (got_ldev) put_ldev(device); return err; } int drbd_adm_get_status(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; enum drbd_ret_code retcode; int err; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; err = nla_put_status_info(adm_ctx.reply_skb, adm_ctx.device, NULL); if (err) { nlmsg_free(adm_ctx.reply_skb); return err; } out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } static int get_one_status(struct sk_buff *skb, struct netlink_callback *cb) { struct drbd_device *device; struct drbd_genlmsghdr *dh; struct drbd_resource *pos = (struct drbd_resource *)cb->args[0]; struct drbd_resource *resource = NULL; struct drbd_resource *tmp; unsigned volume = cb->args[1]; /* Open coded, deferred, iteration: * for_each_resource_safe(resource, tmp, &drbd_resources) { * connection = "first connection of resource or undefined"; * idr_for_each_entry(&resource->devices, device, i) { * ... * } * } * where resource is cb->args[0]; * and i is cb->args[1]; * * cb->args[2] indicates if we shall loop over all resources, * or just dump all volumes of a single resource. * * This may miss entries inserted after this dump started, * or entries deleted before they are reached.
* * We need to make sure the device won't disappear while * we are looking at it, and revalidate our iterators * on each iteration. */ /* synchronize with conn_create()/drbd_destroy_connection() */ rcu_read_lock(); /* revalidate iterator position */ for_each_resource_rcu(tmp, &drbd_resources) { if (pos == NULL) { /* first iteration */ pos = tmp; resource = pos; break; } if (tmp == pos) { resource = pos; break; } } if (resource) { next_resource: device = idr_get_next(&resource->devices, &volume); if (!device) { /* No more volumes to dump on this resource. * Advance resource iterator. */ pos = list_entry_rcu(resource->resources.next, struct drbd_resource, resources); /* Did we dump any volume of this resource yet? */ if (volume != 0) { /* If we reached the end of the list, * or only a single resource dump was requested, * we are done. */ if (&pos->resources == &drbd_resources || cb->args[2]) goto out; volume = 0; resource = pos; goto next_resource; } } dh = genlmsg_put(skb, NETLINK_CB_PORTID(cb->skb), cb->nlh->nlmsg_seq, &drbd_genl_family, NLM_F_MULTI, DRBD_ADM_GET_STATUS); if (!dh) goto out; if (!device) { /* This is a connection without a single volume. * Suprisingly enough, it may have a network * configuration. */ struct drbd_connection *connection; dh->minor = -1U; dh->ret_code = NO_ERROR; connection = the_only_connection(resource); if (nla_put_drbd_cfg_context(skb, resource, connection, NULL)) goto cancel; if (connection) { struct net_conf *nc; nc = rcu_dereference(connection->net_conf); if (nc && net_conf_to_skb(skb, nc, 1) != 0) goto cancel; } goto done; } D_ASSERT(device, device->vnr == volume); D_ASSERT(device, device->resource == resource); dh->minor = device_to_minor(device); dh->ret_code = NO_ERROR; if (nla_put_status_info(skb, device, NULL)) { cancel: genlmsg_cancel(skb, dh); goto out; } done: genlmsg_end(skb, dh); } out: rcu_read_unlock(); /* where to start the next iteration */ cb->args[0] = (long)pos; cb->args[1] = (pos == resource) ? 
volume + 1 : 0; /* No more resources/volumes/minors found results in an empty skb. * Which will terminate the dump. */ return skb->len; } /* * Request status of all resources, or of all volumes within a single resource. * * This is a dump, as the answer may not fit in a single reply skb otherwise. * Which means we cannot use the family->attrbuf or other such members, because * dump is NOT protected by the genl_lock(). During dump, we only have access * to the incoming skb, and need to opencode "parsing" of the nlattr payload. * * Once things are setup properly, we call into get_one_status(). */ int drbd_adm_get_status_all(struct sk_buff *skb, struct netlink_callback *cb) { const unsigned hdrlen = GENL_HDRLEN + GENL_MAGIC_FAMILY_HDRSZ; struct nlattr *nla; const char *resource_name; struct drbd_resource *resource; int maxtype; /* Is this a followup call? */ if (cb->args[0]) { /* ... of a single resource dump, * and the resource iterator has been advanced already? */ if (cb->args[2] && cb->args[2] != cb->args[0]) return 0; /* DONE. */ goto dump; } /* First call (from netlink_dump_start). We need to figure out * which resource(s) the user wants us to dump. */ nla = nla_find(nlmsg_attrdata(cb->nlh, hdrlen), nlmsg_attrlen(cb->nlh, hdrlen), DRBD_NLA_CFG_CONTEXT); /* No explicit context given. Dump all. */ if (!nla) goto dump; maxtype = ARRAY_SIZE(drbd_cfg_context_nl_policy) - 1; nla = drbd_nla_find_nested(maxtype, nla, __nla_type(T_ctx_resource_name)); if (IS_ERR(nla)) return PTR_ERR(nla); /* context given, but no name present? */ if (!nla) return -EINVAL; resource_name = nla_data(nla); if (!*resource_name) return -ENODEV; resource = drbd_find_resource(resource_name); if (!resource) return -ENODEV; kref_put(&resource->kref, drbd_destroy_resource); /* get_one_status() revalidates the resource */ /* prime iterators, and set "filter" mode mark: * only dump this connection. */ cb->args[0] = (long)resource; /* cb->args[1] = 0; passed in this way. 
*/ cb->args[2] = (long)resource; dump: return get_one_status(skb, cb); } int drbd_adm_get_timeout_type(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; enum drbd_ret_code retcode; struct timeout_parms tp; int err; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; tp.timeout_type = adm_ctx.device->state.pdsk == D_OUTDATED ? UT_PEER_OUTDATED : test_bit(USE_DEGR_WFC_T, &adm_ctx.device->flags) ? UT_DEGRADED : UT_DEFAULT; err = timeout_parms_to_priv_skb(adm_ctx.reply_skb, &tp); if (err) { nlmsg_free(adm_ctx.reply_skb); return err; } out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } int drbd_adm_start_ov(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; struct drbd_device *device; enum drbd_ret_code retcode; struct start_ov_parms parms; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; device = adm_ctx.device; /* resume from last known position, if possible */ parms.ov_start_sector = device->ov_start_sector; parms.ov_stop_sector = ULLONG_MAX; if (info->attrs[DRBD_NLA_START_OV_PARMS]) { int err = start_ov_parms_from_attrs(&parms, info); if (err) { retcode = ERR_MANDATORY_TAG; drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); goto out; } } /* w_make_ov_request expects position to be aligned */ device->ov_start_sector = parms.ov_start_sector & ~(BM_SECT_PER_BIT-1); device->ov_stop_sector = parms.ov_stop_sector; /* If there is still bitmap IO pending, e.g. previous resync or verify * just being finished, wait for it before requesting a new resync. 
*/ drbd_suspend_io(device); wait_event(device->misc_wait, !test_bit(BITMAP_IO, &device->flags)); retcode = drbd_request_state(device,NS(conn,C_VERIFY_S)); drbd_resume_io(device); out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } int drbd_adm_new_c_uuid(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; struct drbd_device *device; enum drbd_ret_code retcode; int skip_initial_sync = 0; int err; struct new_c_uuid_parms args; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out_nolock; device = adm_ctx.device; memset(&args, 0, sizeof(args)); if (info->attrs[DRBD_NLA_NEW_C_UUID_PARMS]) { err = new_c_uuid_parms_from_attrs(&args, info); if (err) { retcode = ERR_MANDATORY_TAG; drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); goto out_nolock; } } mutex_lock(device->state_mutex); /* Protects us against serialized state changes. */ if (!get_ldev(device)) { retcode = ERR_NO_DISK; goto out; } /* this is "skip initial sync", assume to be clean */ if (device->state.conn == C_CONNECTED && first_peer_device(device)->connection->agreed_pro_version >= 90 && device->ldev->md.uuid[UI_CURRENT] == UUID_JUST_CREATED && args.clear_bm) { drbd_info(device, "Preparing to skip initial sync\n"); skip_initial_sync = 1; } else if (device->state.conn != C_STANDALONE) { retcode = ERR_CONNECTED; goto out_dec; } drbd_uuid_set(device, UI_BITMAP, 0); /* Rotate UI_BITMAP to History 1, etc... 
*/ drbd_uuid_new_current(device); /* New current, previous to UI_BITMAP */ if (args.clear_bm) { err = drbd_bitmap_io(device, &drbd_bmio_clear_n_write, "clear_n_write from new_c_uuid", BM_LOCKED_MASK); if (err) { drbd_err(device, "Writing bitmap failed with %d\n", err); retcode = ERR_IO_MD_DISK; } if (skip_initial_sync) { drbd_send_uuids_skip_initial_sync(first_peer_device(device)); _drbd_uuid_set(device, UI_BITMAP, 0); drbd_print_uuids(device, "cleared bitmap UUID"); spin_lock_irq(&device->resource->req_lock); _drbd_set_state(_NS2(device, disk, D_UP_TO_DATE, pdsk, D_UP_TO_DATE), CS_VERBOSE, NULL); spin_unlock_irq(&device->resource->req_lock); } } drbd_md_sync(device); out_dec: put_ldev(device); out: mutex_unlock(device->state_mutex); out_nolock: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } static enum drbd_ret_code drbd_check_resource_name(struct drbd_config_context *adm_ctx) { const char *name = adm_ctx->resource_name; if (!name || !name[0]) { drbd_msg_put_info(adm_ctx->reply_skb, "resource name missing"); return ERR_MANDATORY_TAG; } /* if we want to use these in sysfs/configfs/debugfs some day, * we must not allow slashes */ if (strchr(name, '/')) { drbd_msg_put_info(adm_ctx->reply_skb, "invalid resource name"); return ERR_INVALID_REQUEST; } return NO_ERROR; } int drbd_adm_new_resource(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; enum drbd_ret_code retcode; struct res_opts res_opts; int err; retcode = drbd_adm_prepare(&adm_ctx, skb, info, 0); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; set_res_opts_defaults(&res_opts); err = res_opts_from_attrs(&res_opts, info); if (err && err != -ENOMSG) { retcode = ERR_MANDATORY_TAG; drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); goto out; } retcode = drbd_check_resource_name(&adm_ctx); if (retcode != NO_ERROR) goto out; if (adm_ctx.resource) goto out; if (!conn_create(adm_ctx.resource_name, &res_opts)) retcode = ERR_NOMEM; out: 
drbd_adm_finish(&adm_ctx, info, retcode); return 0; } int drbd_adm_new_minor(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; struct drbd_genlmsghdr *dh = info->userhdr; enum drbd_ret_code retcode; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_RESOURCE); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; if (dh->minor > MINORMASK) { drbd_msg_put_info(adm_ctx.reply_skb, "requested minor out of range"); retcode = ERR_INVALID_REQUEST; goto out; } if (adm_ctx.volume > DRBD_VOLUME_MAX) { drbd_msg_put_info(adm_ctx.reply_skb, "requested volume id out of range"); retcode = ERR_INVALID_REQUEST; goto out; } if (adm_ctx.device) goto out; retcode = drbd_create_device(&adm_ctx, dh->minor); out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } static enum drbd_ret_code adm_del_minor(struct drbd_device *device) { if (device->state.disk == D_DISKLESS && /* no need to be device->state.conn == C_STANDALONE && * we may want to delete a minor from a live replication group. */ device->state.role == R_SECONDARY) { _drbd_request_state(device, NS(conn, C_WF_REPORT_PARAMS), CS_VERBOSE + CS_WAIT_COMPLETE); drbd_delete_device(device); return NO_ERROR; } else return ERR_MINOR_CONFIGURED; } int drbd_adm_del_minor(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; enum drbd_ret_code retcode; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; retcode = adm_del_minor(adm_ctx.device); out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } int drbd_adm_down(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; struct drbd_resource *resource; struct drbd_connection *connection; struct drbd_device *device; int retcode; /* enum drbd_ret_code rsp. 
enum drbd_state_rv */ unsigned i; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_RESOURCE); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; resource = adm_ctx.resource; /* demote */ for_each_connection(connection, resource) { struct drbd_peer_device *peer_device; idr_for_each_entry(&connection->peer_devices, peer_device, i) { retcode = drbd_set_role(peer_device->device, R_SECONDARY, 0); if (retcode < SS_SUCCESS) { drbd_msg_put_info(adm_ctx.reply_skb, "failed to demote"); goto out; } } retcode = conn_try_disconnect(connection, 0); if (retcode < SS_SUCCESS) { drbd_msg_put_info(adm_ctx.reply_skb, "failed to disconnect"); goto out; } } /* detach */ idr_for_each_entry(&resource->devices, device, i) { retcode = adm_detach(device, 0); if (retcode < SS_SUCCESS || retcode > NO_ERROR) { drbd_msg_put_info(adm_ctx.reply_skb, "failed to detach"); goto out; } } /* If we reach this, all volumes (of this connection) are Secondary, * Disconnected, Diskless, aka Unconfigured. Make sure all threads have * actually stopped, state handling only does drbd_thread_stop_nowait(). 
*/ for_each_connection(connection, resource) drbd_thread_stop(&connection->worker); /* Now, nothing can fail anymore */ /* delete volumes */ idr_for_each_entry(&resource->devices, device, i) { retcode = adm_del_minor(device); if (retcode != NO_ERROR) { /* "can not happen" */ drbd_msg_put_info(adm_ctx.reply_skb, "failed to delete volume"); goto out; } } list_del_rcu(&resource->resources); synchronize_rcu(); drbd_free_resource(resource); retcode = NO_ERROR; out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } int drbd_adm_del_resource(struct sk_buff *skb, struct genl_info *info) { struct drbd_config_context adm_ctx; struct drbd_resource *resource; struct drbd_connection *connection; enum drbd_ret_code retcode; retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_RESOURCE); if (!adm_ctx.reply_skb) return retcode; if (retcode != NO_ERROR) goto out; resource = adm_ctx.resource; for_each_connection(connection, resource) { if (connection->cstate > C_STANDALONE) { retcode = ERR_NET_CONFIGURED; goto out; } } if (!idr_is_empty(&resource->devices)) { retcode = ERR_RES_IN_USE; goto out; } list_del_rcu(&resource->resources); for_each_connection(connection, resource) drbd_thread_stop(&connection->worker); synchronize_rcu(); drbd_free_resource(resource); retcode = NO_ERROR; out: drbd_adm_finish(&adm_ctx, info, retcode); return 0; } void drbd_bcast_event(struct drbd_device *device, const struct sib_info *sib) { static atomic_t drbd_genl_seq = ATOMIC_INIT(2); /* two. */ struct sk_buff *msg; struct drbd_genlmsghdr *d_out; unsigned seq; int err = -ENOMEM; if (sib->sib_reason == SIB_SYNC_PROGRESS) { if (time_after(jiffies, device->rs_last_bcast + HZ)) device->rs_last_bcast = jiffies; else return; } seq = atomic_inc_return(&drbd_genl_seq); msg = genlmsg_new(NLMSG_GOODSIZE, GFP_NOIO); if (!msg) goto failed; err = -EMSGSIZE; d_out = genlmsg_put(msg, 0, seq, &drbd_genl_family, 0, DRBD_EVENT); if (!d_out) /* cannot happen, but anyways. 
*/ goto nla_put_failure; d_out->minor = device_to_minor(device); d_out->ret_code = NO_ERROR; if (nla_put_status_info(msg, device, sib)) goto nla_put_failure; genlmsg_end(msg, d_out); err = drbd_genl_multicast_events(msg, 0); /* msg has been consumed or freed in netlink_broadcast() */ if (err && err != -ESRCH) goto failed; return; nla_put_failure: nlmsg_free(msg); failed: drbd_err(device, "Error %d while broadcasting event. " "Event seq:%u sib_reason:%u\n", err, seq, sib->sib_reason); } drbd-8.4.4/drbd/drbd_nla.c0000664000000000000000000000263612176213144013776 0ustar rootroot#include "drbd_wrappers.h" #include #include #include #include "drbd_nla.h" static int drbd_nla_check_mandatory(int maxtype, struct nlattr *nla) { struct nlattr *head = nla_data(nla); int len = nla_len(nla); int rem; /* * validate_nla (called from nla_parse_nested) ignores attributes * beyond maxtype, and does not understand the DRBD_GENLA_F_MANDATORY flag. * In order to have it validate attributes with the DRBD_GENLA_F_MANDATORY * flag set also, check and remove that flag before calling * nla_parse_nested. */ nla_for_each_attr(nla, head, len, rem) { if (nla->nla_type & DRBD_GENLA_F_MANDATORY) { nla->nla_type &= ~DRBD_GENLA_F_MANDATORY; if (nla_type(nla) > maxtype) return -EOPNOTSUPP; } } return 0; } int drbd_nla_parse_nested(struct nlattr *tb[], int maxtype, struct nlattr *nla, const struct nla_policy *policy) { int err; err = drbd_nla_check_mandatory(maxtype, nla); if (!err) err = nla_parse_nested(tb, maxtype, nla, policy); return err; } struct nlattr *drbd_nla_find_nested(int maxtype, struct nlattr *nla, int attrtype) { int err; /* * If any nested attribute has the DRBD_GENLA_F_MANDATORY flag set and * we don't know about that attribute, reject all the nested * attributes. 
*/ err = drbd_nla_check_mandatory(maxtype, nla); if (err) return ERR_PTR(err); return nla_find_nested(nla, attrtype); } drbd-8.4.4/drbd/drbd_nla.h0000664000000000000000000000044012176213144013772 0ustar rootroot#ifndef __DRBD_NLA_H #define __DRBD_NLA_H extern int drbd_nla_parse_nested(struct nlattr *tb[], int maxtype, struct nlattr *nla, const struct nla_policy *policy); extern struct nlattr *drbd_nla_find_nested(int maxtype, struct nlattr *nla, int attrtype); #endif /* __DRBD_NLA_H */ drbd-8.4.4/drbd/drbd_proc.c0000664000000000000000000002402312221331365014156 0ustar rootroot/* drbd_proc.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2001-2008, LINBIT Information Technologies GmbH. Copyright (C) 1999-2008, Philipp Reisner . Copyright (C) 2002-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #include #include #include #include #include #include #include #include "drbd_int.h" static int drbd_proc_open(struct inode *inode, struct file *file); static int drbd_proc_release(struct inode *inode, struct file *file); struct proc_dir_entry *drbd_proc; const struct file_operations drbd_proc_fops = { .owner = THIS_MODULE, .open = drbd_proc_open, .read = seq_read, .llseek = seq_lseek, .release = drbd_proc_release, }; void seq_printf_with_thousands_grouping(struct seq_file *seq, long v) { /* v is in kB/sec. We don't expect TiByte/sec yet. 
*/ if (unlikely(v >= 1000000)) { /* cool: > GiByte/s */ seq_printf(seq, "%ld,", v / 1000000); v %= 1000000; seq_printf(seq, "%03ld,%03ld", v/1000, v % 1000); } else if (likely(v >= 1000)) seq_printf(seq, "%ld,%03ld", v/1000, v % 1000); else seq_printf(seq, "%ld", v); } /*lge * progress bars shamelessly adapted from drivers/md/md.c * output looks like * [=====>..............] 33.5% (23456/123456) * finish: 2:20:20 speed: 6,345 (6,456) K/sec */ static void drbd_syncer_progress(struct drbd_device *device, struct seq_file *seq) { unsigned long db, dt, dbdt, rt, rs_left; unsigned int res; int i, x, y; int stalled = 0; drbd_get_syncer_progress(device, &rs_left, &res); x = res/50; y = 20-x; seq_printf(seq, "\t["); for (i = 1; i < x; i++) seq_printf(seq, "="); seq_printf(seq, ">"); for (i = 0; i < y; i++) seq_printf(seq, "."); seq_printf(seq, "] "); if (device->state.conn == C_VERIFY_S || device->state.conn == C_VERIFY_T) seq_printf(seq, "verified:"); else seq_printf(seq, "sync'ed:"); seq_printf(seq, "%3u.%u%% ", res / 10, res % 10); /* if more than a few GB, display in MB */ if (device->rs_total > (4UL << (30 - BM_BLOCK_SHIFT))) seq_printf(seq, "(%lu/%lu)M", (unsigned long) Bit2KB(rs_left >> 10), (unsigned long) Bit2KB(device->rs_total >> 10)); else seq_printf(seq, "(%lu/%lu)K", (unsigned long) Bit2KB(rs_left), (unsigned long) Bit2KB(device->rs_total)); seq_printf(seq, "\n\t"); /* see drivers/md/md.c * We do not want to overflow, so the order of operands and * the * 100 / 100 trick are important. We do a +1 to be * safe against division by zero. We only estimate anyway. * * dt: time from mark until now * db: blocks written from mark until now * rt: remaining time */ /* Rolling marks. last_mark+1 may just now be modified. last_mark+2 is * at least (DRBD_SYNC_MARKS-2)*DRBD_SYNC_MARK_STEP old, and has at * least DRBD_SYNC_MARK_STEP time before it will be modified. 
*/ /* ------------------------ ~18s average ------------------------ */ i = (device->rs_last_mark + 2) % DRBD_SYNC_MARKS; dt = (jiffies - device->rs_mark_time[i]) / HZ; if (dt > 180) stalled = 1; if (!dt) dt++; db = device->rs_mark_left[i] - rs_left; rt = (dt * (rs_left / (db/100+1)))/100; /* seconds */ seq_printf(seq, "finish: %lu:%02lu:%02lu", rt / 3600, (rt % 3600) / 60, rt % 60); dbdt = Bit2KB(db/dt); seq_printf(seq, " speed: "); seq_printf_with_thousands_grouping(seq, dbdt); seq_printf(seq, " ("); /* ------------------------- ~3s average ------------------------ */ if (proc_details >= 1) { /* this is what drbd_rs_should_slow_down() uses */ i = (device->rs_last_mark + DRBD_SYNC_MARKS-1) % DRBD_SYNC_MARKS; dt = (jiffies - device->rs_mark_time[i]) / HZ; if (!dt) dt++; db = device->rs_mark_left[i] - rs_left; dbdt = Bit2KB(db/dt); seq_printf_with_thousands_grouping(seq, dbdt); seq_printf(seq, " -- "); } /* --------------------- long term average ---------------------- */ /* mean speed since syncer started * we do account for PausedSync periods */ dt = (jiffies - device->rs_start - device->rs_paused) / HZ; if (dt == 0) dt = 1; db = device->rs_total - rs_left; dbdt = Bit2KB(db/dt); seq_printf_with_thousands_grouping(seq, dbdt); seq_printf(seq, ")"); if (device->state.conn == C_SYNC_TARGET || device->state.conn == C_VERIFY_S) { seq_printf(seq, " want: "); seq_printf_with_thousands_grouping(seq, device->c_sync_rate); } seq_printf(seq, " K/sec%s\n", stalled ? " (stalled)" : ""); if (proc_details >= 1) { /* 64 bit: * we convert to sectors in the display below. */ unsigned long bm_bits = drbd_bm_bits(device); unsigned long bit_pos; unsigned long long stop_sector = 0; if (device->state.conn == C_VERIFY_S || device->state.conn == C_VERIFY_T) { bit_pos = bm_bits - device->ov_left; if (verify_can_do_stop_sector(device)) stop_sector = device->ov_stop_sector; } else bit_pos = device->bm_resync_fo; /* Total sectors may be slightly off for oddly * sized devices. So what. 
*/ seq_printf(seq, "\t%3d%% sector pos: %llu/%llu", (int)(bit_pos / (bm_bits/100+1)), (unsigned long long)bit_pos * BM_SECT_PER_BIT, (unsigned long long)bm_bits * BM_SECT_PER_BIT); if (stop_sector != 0 && stop_sector != ULLONG_MAX) seq_printf(seq, " stop sector: %llu", stop_sector); seq_printf(seq, "\n"); } } static void resync_dump_detail(struct seq_file *seq, struct lc_element *e) { struct bm_extent *bme = lc_entry(e, struct bm_extent, lce); seq_printf(seq, "%5d %s %s\n", bme->rs_left, bme->flags & BME_NO_WRITES ? "NO_WRITES" : "---------", bme->flags & BME_LOCKED ? "LOCKED" : "------" ); } static int drbd_seq_show(struct seq_file *seq, void *v) { int i, prev_i = -1; const char *sn; struct drbd_device *device; struct net_conf *nc; char wp; static char write_ordering_chars[] = { [WO_none] = 'n', [WO_drain_io] = 'd', [WO_bdev_flush] = 'f', [WO_bio_barrier] = 'b', }; seq_printf(seq, "version: " REL_VERSION " (api:%d/proto:%d-%d)\n%s\n", API_VERSION, PRO_VERSION_MIN, PRO_VERSION_MAX, drbd_buildtag()); /* cs .. connection state ro .. node role (local/remote) ds .. disk state (local/remote) protocol various flags ns .. network send nr .. network receive dw .. disk write dr .. disk read al .. activity log write count bm .. bitmap update write count pe .. pending (waiting for ack or data reply) ua .. unack'd (still need to send ack or data reply) ap .. application requests accepted, but not yet completed ep .. number of epochs currently "on the fly", P_BARRIER_ACK pending wo .. write ordering mode currently in use oos .. 
known out-of-sync kB */ rcu_read_lock(); idr_for_each_entry(&drbd_devices, device, i) { if (prev_i != i - 1) seq_printf(seq, "\n"); prev_i = i; sn = drbd_conn_str(device->state.conn); if (device->state.conn == C_STANDALONE && device->state.disk == D_DISKLESS && device->state.role == R_SECONDARY) { seq_printf(seq, "%2d: cs:Unconfigured\n", i); } else { /* reset device->congestion_reason */ bdi_rw_congested(&device->rq_queue->backing_dev_info); nc = rcu_dereference(first_peer_device(device)->connection->net_conf); wp = nc ? nc->wire_protocol - DRBD_PROT_A + 'A' : ' '; seq_printf(seq, "%2d: cs:%s ro:%s/%s ds:%s/%s %c %c%c%c%c%c%c\n" " ns:%u nr:%u dw:%u dr:%u al:%u bm:%u " "lo:%d pe:%d ua:%d ap:%d ep:%d wo:%c", i, sn, drbd_role_str(device->state.role), drbd_role_str(device->state.peer), drbd_disk_str(device->state.disk), drbd_disk_str(device->state.pdsk), wp, drbd_suspended(device) ? 's' : 'r', device->state.aftr_isp ? 'a' : '-', device->state.peer_isp ? 'p' : '-', device->state.user_isp ? 'u' : '-', device->congestion_reason ?: '-', test_bit(AL_SUSPENDED, &device->flags) ? 
's' : '-', device->send_cnt/2, device->recv_cnt/2, device->writ_cnt/2, device->read_cnt/2, device->al_writ_cnt, device->bm_writ_cnt, atomic_read(&device->local_cnt), atomic_read(&device->ap_pending_cnt) + atomic_read(&device->rs_pending_cnt), atomic_read(&device->unacked_cnt), atomic_read(&device->ap_bio_cnt), first_peer_device(device)->connection->epochs, write_ordering_chars[first_peer_device(device)->connection->write_ordering] ); seq_printf(seq, " oos:%llu\n", Bit2KB((unsigned long long) drbd_bm_total_weight(device))); } if (device->state.conn == C_SYNC_SOURCE || device->state.conn == C_SYNC_TARGET || device->state.conn == C_VERIFY_S || device->state.conn == C_VERIFY_T) drbd_syncer_progress(device, seq); if (proc_details >= 1 && get_ldev_if_state(device, D_FAILED)) { lc_seq_printf_stats(seq, device->resync); lc_seq_printf_stats(seq, device->act_log); put_ldev(device); } if (proc_details >= 2) { if (device->resync) { lc_seq_dump_details(seq, device->resync, "rs_left", resync_dump_detail); } } } rcu_read_unlock(); return 0; } static int drbd_proc_open(struct inode *inode, struct file *file) { int err; if (try_module_get(THIS_MODULE)) { err = single_open(file, drbd_seq_show, PDE_DATA(inode)); if (err) module_put(THIS_MODULE); return err; } return -ENODEV; } static int drbd_proc_release(struct inode *inode, struct file *file) { module_put(THIS_MODULE); return single_release(inode, file); } /* PROC FS stuff end */ drbd-8.4.4/drbd/drbd_protocol.h0000664000000000000000000002043612221314366015067 0ustar rootroot#ifndef __DRBD_PROTOCOL_H #define __DRBD_PROTOCOL_H enum drbd_packet { /* receiver (data socket) */ P_DATA = 0x00, P_DATA_REPLY = 0x01, /* Response to P_DATA_REQUEST */ P_RS_DATA_REPLY = 0x02, /* Response to P_RS_DATA_REQUEST */ P_BARRIER = 0x03, P_BITMAP = 0x04, P_BECOME_SYNC_TARGET = 0x05, P_BECOME_SYNC_SOURCE = 0x06, P_UNPLUG_REMOTE = 0x07, /* Used at various times to hint the peer */ P_DATA_REQUEST = 0x08, /* Used to ask for a data block */ P_RS_DATA_REQUEST 
= 0x09, /* Used to ask for a data block for resync */ P_SYNC_PARAM = 0x0a, P_PROTOCOL = 0x0b, P_UUIDS = 0x0c, P_SIZES = 0x0d, P_STATE = 0x0e, P_SYNC_UUID = 0x0f, P_AUTH_CHALLENGE = 0x10, P_AUTH_RESPONSE = 0x11, P_STATE_CHG_REQ = 0x12, /* asender (meta socket) */ P_PING = 0x13, P_PING_ACK = 0x14, P_RECV_ACK = 0x15, /* Used in protocol B */ P_WRITE_ACK = 0x16, /* Used in protocol C */ P_RS_WRITE_ACK = 0x17, /* Is a P_WRITE_ACK, additionally call set_in_sync(). */ P_SUPERSEDED = 0x18, /* Used in proto C, two-primaries conflict detection */ P_NEG_ACK = 0x19, /* Sent if local disk is unusable */ P_NEG_DREPLY = 0x1a, /* Local disk is broken... */ P_NEG_RS_DREPLY = 0x1b, /* Local disk is broken... */ P_BARRIER_ACK = 0x1c, P_STATE_CHG_REPLY = 0x1d, /* "new" commands, no longer fitting into the ordering scheme above */ P_OV_REQUEST = 0x1e, /* data socket */ P_OV_REPLY = 0x1f, P_OV_RESULT = 0x20, /* meta socket */ P_CSUM_RS_REQUEST = 0x21, /* data socket */ P_RS_IS_IN_SYNC = 0x22, /* meta socket */ P_SYNC_PARAM89 = 0x23, /* data socket, protocol version 89 replacement for P_SYNC_PARAM */ P_COMPRESSED_BITMAP = 0x24, /* compressed or otherwise encoded bitmap transfer */ /* P_CKPT_FENCE_REQ = 0x25, * currently reserved for protocol D */ /* P_CKPT_DISABLE_REQ = 0x26, * currently reserved for protocol D */ P_DELAY_PROBE = 0x27, /* is used on BOTH sockets */ P_OUT_OF_SYNC = 0x28, /* Mark as out of sync (Outrunning), data socket */ P_RS_CANCEL = 0x29, /* meta: Used to cancel RS_DATA_REQUEST packet by SyncSource */ P_CONN_ST_CHG_REQ = 0x2a, /* data sock: Connection wide state request */ P_CONN_ST_CHG_REPLY = 0x2b, /* meta sock: Connection side state req reply */ P_RETRY_WRITE = 0x2c, /* Protocol C: retry conflicting write request */ P_PROTOCOL_UPDATE = 0x2d, /* data sock: is used in established connections */ /* 0x2e to 0x30 reserved, used in drbd 9 */ /* REQ_DISCARD. We used "discard" in different contexts before, * which is why I chose TRIM here, to disambiguate. 
*/ P_TRIM = 0x31, P_MAY_IGNORE = 0x100, /* Flag to test if (cmd > P_MAY_IGNORE) ... */ P_MAX_OPT_CMD = 0x101, /* special command ids for handshake */ P_INITIAL_META = 0xfff1, /* First Packet on the MetaSock */ P_INITIAL_DATA = 0xfff2, /* First Packet on the Socket */ P_CONNECTION_FEATURES = 0xfffe /* FIXED for the next century! */ }; #ifndef __packed #define __packed __attribute__((packed)) #endif /* This is the layout for a packet on the wire. * The byteorder is the network byte order. * (except block_id and barrier fields. * these are pointers to local structs * and have no relevance for the partner, * which just echoes them as received.) * * NOTE that the payload starts at a long aligned offset, * regardless of 32 or 64 bit arch! */ struct p_header80 { u32 magic; u16 command; u16 length; /* bytes of data after this header */ } __packed; /* Header for big packets, Used for data packets exceeding 64kB */ struct p_header95 { u16 magic; /* use DRBD_MAGIC_BIG here */ u16 command; u32 length; } __packed; struct p_header100 { u32 magic; u16 volume; u16 command; u32 length; u32 pad; } __packed; /* these defines must not be changed without changing the protocol version */ #define DP_HARDBARRIER 1 /* no longer used */ #define DP_RW_SYNC 2 /* equals REQ_SYNC */ #define DP_MAY_SET_IN_SYNC 4 #define DP_UNPLUG 8 /* equals REQ_UNPLUG */ #define DP_FUA 16 /* equals REQ_FUA */ #define DP_FLUSH 32 /* equals REQ_FLUSH */ #define DP_DISCARD 64 /* equals REQ_DISCARD */ #define DP_SEND_RECEIVE_ACK 128 /* This is a proto B write request */ #define DP_SEND_WRITE_ACK 256 /* This is a proto C write request */ struct p_data { u64 sector; /* 64 bits sector number */ u64 block_id; /* to identify the request in protocol B&C */ u32 seq_num; u32 dp_flags; } __packed; struct p_trim { struct p_data p_data; u32 size; /* == bio->bi_size */ } __packed; /* * commands which share a struct: * p_block_ack: * P_RECV_ACK (proto B), P_WRITE_ACK (proto C), * P_SUPERSEDED (proto C, two-primaries conflict 
detection) * p_block_req: * P_DATA_REQUEST, P_RS_DATA_REQUEST */ struct p_block_ack { u64 sector; u64 block_id; u32 blksize; u32 seq_num; } __packed; struct p_block_req { u64 sector; u64 block_id; u32 blksize; u32 pad; /* to multiple of 8 Byte */ } __packed; /* * commands with their own struct for additional fields: * P_CONNECTION_FEATURES * P_BARRIER * P_BARRIER_ACK * P_SYNC_PARAM * ReportParams */ #define FF_TRIM 1 struct p_connection_features { u32 protocol_min; u32 feature_flags; u32 protocol_max; /* should be more than enough for future enhancements * for now, feature_flags and the reserved array shall be zero. */ u32 _pad; u64 reserved[7]; } __packed; struct p_barrier { u32 barrier; /* barrier number _handle_ only */ u32 pad; /* to multiple of 8 Byte */ } __packed; struct p_barrier_ack { u32 barrier; u32 set_size; } __packed; struct p_rs_param { u32 resync_rate; /* Since protocol version 88 and higher. */ char verify_alg[0]; } __packed; struct p_rs_param_89 { u32 resync_rate; /* protocol version 89: */ char verify_alg[SHARED_SECRET_MAX]; char csums_alg[SHARED_SECRET_MAX]; } __packed; struct p_rs_param_95 { u32 resync_rate; char verify_alg[SHARED_SECRET_MAX]; char csums_alg[SHARED_SECRET_MAX]; u32 c_plan_ahead; u32 c_delay_target; u32 c_fill_target; u32 c_max_rate; } __packed; enum drbd_conn_flags { CF_DISCARD_MY_DATA = 1, CF_DRY_RUN = 2, }; struct p_protocol { u32 protocol; u32 after_sb_0p; u32 after_sb_1p; u32 after_sb_2p; u32 conn_flags; u32 two_primaries; /* Since protocol version 87 and higher. */ char integrity_alg[0]; } __packed; struct p_uuids { u64 uuid[UI_EXTENDED_SIZE]; } __packed; struct p_rs_uuid { u64 uuid; } __packed; struct p_sizes { u64 d_size; /* size of disk */ u64 u_size; /* user requested size */ u64 c_size; /* current exported size */ u32 max_bio_size; /* Maximal size of a BIO */ u16 queue_order_type; /* not yet implemented in DRBD*/ u16 dds_flags; /* use enum dds_flags here. 
*/ } __packed; struct p_state { u32 state; } __packed; struct p_req_state { u32 mask; u32 val; } __packed; struct p_req_state_reply { u32 retcode; } __packed; struct p_drbd06_param { u64 size; u32 state; u32 blksize; u32 protocol; u32 version; u32 gen_cnt[5]; u32 bit_map_gen[5]; } __packed; struct p_block_desc { u64 sector; u32 blksize; u32 pad; /* to multiple of 8 Byte */ } __packed; /* Valid values for the encoding field. * Bump proto version when changing this. */ enum drbd_bitmap_code { /* RLE_VLI_Bytes = 0, * and other bit variants had been defined during * algorithm evaluation. */ RLE_VLI_Bits = 2, }; struct p_compressed_bm { /* (encoding & 0x0f): actual encoding, see enum drbd_bitmap_code * (encoding & 0x80): polarity (set/unset) of first runlength * ((encoding >> 4) & 0x07): pad_bits, number of trailing zero bits * used to pad up to head.length bytes */ u8 encoding; u8 code[0]; } __packed; struct p_delay_probe93 { u32 seq_num; /* sequence number to match the two probe packets */ u32 offset; /* usecs the probe got sent after the reference time point */ } __packed; /* * Bitmap packets need to fit within a single page on the sender and receiver, * so we are limited to 4 KiB (and not to PAGE_SIZE, which can be bigger). */ #define DRBD_SOCKET_BUFFER_SIZE 4096 #endif /* __DRBD_PROTOCOL_H */ drbd-8.4.4/drbd/drbd_receiver.c0000664000000000000000000051675512225234677015055 0ustar rootroot/* drbd_receiver.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2001-2008, LINBIT Information Technologies GmbH. Copyright (C) 1999-2008, Philipp Reisner . Copyright (C) 2002-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. 
drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #include #include #include #include #include #include #include #include #include #include #include #include #define __KERNEL_SYSCALLS__ #include #include #include #include "drbd_int.h" #include "drbd_protocol.h" #include "drbd_req.h" #include "drbd_vli.h" #include #define PRO_FEATURES (FF_TRIM) struct flush_work { struct drbd_work w; struct drbd_epoch *epoch; }; struct packet_info { enum drbd_packet cmd; unsigned int size; unsigned int vnr; void *data; }; enum finish_epoch { FE_STILL_LIVE, FE_DESTROYED, FE_RECYCLED, }; static int drbd_do_features(struct drbd_connection *connection); static int drbd_do_auth(struct drbd_connection *connection); static int drbd_disconnected(struct drbd_peer_device *); static enum finish_epoch drbd_may_finish_epoch(struct drbd_connection *, struct drbd_epoch *, enum epoch_event); static int e_end_block(struct drbd_work *, int); static struct drbd_epoch *previous_epoch(struct drbd_connection *connection, struct drbd_epoch *epoch) { struct drbd_epoch *prev; spin_lock(&connection->epoch_lock); prev = list_entry(epoch->list.prev, struct drbd_epoch, list); if (prev == epoch || prev == connection->current_epoch) prev = NULL; spin_unlock(&connection->epoch_lock); return prev; } #ifdef DBG_ASSERTS void drbd_assert_breakpoint(struct drbd_device *device, char *exp, char *file, int line) { drbd_err(device, "ASSERT( %s ) in %s:%d\n", exp, file, line); } #endif #define GFP_TRY (__GFP_HIGHMEM | __GFP_NOWARN) /* * some helper functions to deal with single linked page lists, * page->private being our "next" pointer. 
*/ /* If at least n pages are linked at head, get n pages off. * Otherwise, don't modify head, and return NULL. * Locking is the responsibility of the caller. */ static struct page *page_chain_del(struct page **head, int n) { struct page *page; struct page *tmp; BUG_ON(!n); BUG_ON(!head); page = *head; if (!page) return NULL; while (page) { tmp = page_chain_next(page); if (--n == 0) break; /* found sufficient pages */ if (tmp == NULL) /* insufficient pages, don't use any of them. */ return NULL; page = tmp; } /* add end of list marker for the returned list */ set_page_private(page, 0); /* actual return value, and adjustment of head */ page = *head; *head = tmp; return page; } /* may be used outside of locks to find the tail of a (usually short) * "private" page chain, before adding it back to a global chain head * with page_chain_add() under a spinlock. */ static struct page *page_chain_tail(struct page *page, int *len) { struct page *tmp; int i = 1; while ((tmp = page_chain_next(page))) ++i, page = tmp; if (len) *len = i; return page; } static int page_chain_free(struct page *page) { struct page *tmp; int i = 0; page_chain_for_each_safe(page, tmp) { put_page(page); ++i; } return i; } static void page_chain_add(struct page **head, struct page *chain_first, struct page *chain_last) { #if 1 struct page *tmp; tmp = page_chain_tail(chain_first, NULL); BUG_ON(tmp != chain_last); #endif /* add chain to head */ set_page_private(chain_last, (unsigned long)*head); *head = chain_first; } static struct page *__drbd_alloc_pages(struct drbd_device *device, unsigned int number) { struct page *page = NULL; struct page *tmp = NULL; unsigned int i = 0; /* Yes, testing drbd_pp_vacant outside the lock is racy. * So what. It saves a spin_lock. 
*/ if (drbd_pp_vacant >= number) { spin_lock(&drbd_pp_lock); page = page_chain_del(&drbd_pp_pool, number); if (page) drbd_pp_vacant -= number; spin_unlock(&drbd_pp_lock); if (page) return page; } /* GFP_TRY, because we must not cause arbitrary write-out: in a DRBD * "criss-cross" setup, that might cause write-out on some other DRBD, * which in turn might block on the other node at this very place. */ for (i = 0; i < number; i++) { tmp = alloc_page(GFP_TRY); if (!tmp) break; set_page_private(tmp, (unsigned long)page); page = tmp; } if (i == number) return page; /* Not enough pages immediately available this time. * No need to jump around here, drbd_alloc_pages will retry this * function "soon". */ if (page) { tmp = page_chain_tail(page, NULL); spin_lock(&drbd_pp_lock); page_chain_add(&drbd_pp_pool, page, tmp); drbd_pp_vacant += i; spin_unlock(&drbd_pp_lock); } return NULL; } /* kick lower level device, if we have more than (arbitrary number) * reference counts on it, which typically are locally submitted io * requests. don't use unacked_cnt, so we speed up proto A and B, too. */ static void maybe_kick_lo(struct drbd_device *device) { unsigned int watermark; struct net_conf *nc; rcu_read_lock(); nc = rcu_dereference(first_peer_device(device)->connection->net_conf); watermark = nc ? nc->unplug_watermark : 1000000; rcu_read_unlock(); if (atomic_read(&device->local_cnt) >= watermark) drbd_kick_lo(device); } static void reclaim_finished_net_peer_reqs(struct drbd_device *device, struct list_head *to_be_freed) { struct drbd_peer_request *peer_req, *tmp; /* The EEs are always appended to the end of the list. Since they are sent in order over the wire, they have to finish in order. As soon as we see the first not finished we can stop to examine the list... 
*/ list_for_each_entry_safe(peer_req, tmp, &device->net_ee, w.list) { if (drbd_peer_req_has_active_page(peer_req)) break; list_move(&peer_req->w.list, to_be_freed); } } static void drbd_kick_lo_and_reclaim_net(struct drbd_device *device) { LIST_HEAD(reclaimed); struct drbd_peer_request *peer_req, *t; maybe_kick_lo(device); spin_lock_irq(&device->resource->req_lock); reclaim_finished_net_peer_reqs(device, &reclaimed); spin_unlock_irq(&device->resource->req_lock); list_for_each_entry_safe(peer_req, t, &reclaimed, w.list) drbd_free_net_peer_req(device, peer_req); } /** * drbd_alloc_pages() - Returns @number pages, retries forever (or until signalled) if @retry is set * @peer_device: DRBD peer device. * @number: number of pages requested * @retry: whether to retry, if not enough pages are available right now * * Tries to allocate @number pages, first from our own page pool, then from * the kernel. * Possibly retry until DRBD frees sufficient pages somewhere else. * * If this allocation would exceed the max_buffers setting, we throttle * allocation (schedule_timeout) to give the system some room to breathe. * * We do not use max-buffers as hard limit, because it could lead to * congestion and further to a distributed deadlock during online-verify or * (checksum based) resync, if the max-buffers, socket buffer sizes and * resync-rate settings are mis-configured. * * Returns a page chain linked via page->private, or NULL if no pages could be allocated. */ struct page *drbd_alloc_pages(struct drbd_peer_device *peer_device, unsigned int number, bool retry) { struct drbd_device *device = peer_device->device; struct page *page = NULL; struct net_conf *nc; DEFINE_WAIT(wait); unsigned int mxb; rcu_read_lock(); nc = rcu_dereference(peer_device->connection->net_conf); mxb = nc ?
nc->max_buffers : 1000000; rcu_read_unlock(); if (atomic_read(&device->pp_in_use) < mxb) page = __drbd_alloc_pages(device, number); while (page == NULL) { prepare_to_wait(&drbd_pp_wait, &wait, TASK_INTERRUPTIBLE); drbd_kick_lo_and_reclaim_net(device); if (atomic_read(&device->pp_in_use) < mxb) { page = __drbd_alloc_pages(device, number); if (page) break; } if (!retry) break; if (signal_pending(current)) { drbd_warn(device, "drbd_alloc_pages interrupted!\n"); break; } if (schedule_timeout(HZ/10) == 0) mxb = UINT_MAX; } finish_wait(&drbd_pp_wait, &wait); if (page) atomic_add(number, &device->pp_in_use); return page; } /* Must not be used from irq, as that may deadlock: see drbd_alloc_pages. * Is also used from inside another spin_lock_irq(&resource->req_lock); * Either links the page chain back to the global pool, * or returns all pages to the system. */ static void drbd_free_pages(struct drbd_device *device, struct page *page, int is_net) { atomic_t *a = is_net ? &device->pp_in_use_by_net : &device->pp_in_use; int i; if (page == NULL) return; if (drbd_pp_vacant > (DRBD_MAX_BIO_SIZE/PAGE_SIZE) * minor_count) i = page_chain_free(page); else { struct page *tmp; tmp = page_chain_tail(page, &i); spin_lock(&drbd_pp_lock); page_chain_add(&drbd_pp_pool, page, tmp); drbd_pp_vacant += i; spin_unlock(&drbd_pp_lock); } i = atomic_sub_return(i, a); if (i < 0) drbd_warn(device, "ASSERTION FAILED: %s: %d < 0\n", is_net ?
"pp_in_use_by_net" : "pp_in_use", i); wake_up(&drbd_pp_wait); } /* You need to hold the req_lock: _drbd_wait_ee_list_empty() You must not have the req_lock: drbd_free_peer_req() drbd_alloc_peer_req() drbd_free_peer_reqs() drbd_ee_fix_bhs() drbd_finish_peer_reqs() drbd_clear_done_ee() drbd_wait_ee_list_empty() */ struct drbd_peer_request * drbd_alloc_peer_req(struct drbd_peer_device *peer_device, u64 id, sector_t sector, unsigned int data_size, bool has_payload, gfp_t gfp_mask) __must_hold(local) { struct drbd_device *device = peer_device->device; struct drbd_peer_request *peer_req; struct page *page = NULL; unsigned nr_pages = (data_size + PAGE_SIZE -1) >> PAGE_SHIFT; if (drbd_insert_fault(device, DRBD_FAULT_AL_EE)) return NULL; peer_req = mempool_alloc(drbd_ee_mempool, gfp_mask & ~__GFP_HIGHMEM); if (!peer_req) { if (!(gfp_mask & __GFP_NOWARN)) drbd_err(device, "%s: allocation failed\n", __func__); return NULL; } if (has_payload && data_size) { page = drbd_alloc_pages(peer_device, nr_pages, (gfp_mask & __GFP_WAIT)); if (!page) goto fail; } drbd_clear_interval(&peer_req->i); peer_req->i.size = data_size; peer_req->i.sector = sector; peer_req->i.local = false; peer_req->i.waiting = false; peer_req->epoch = NULL; peer_req->peer_device = peer_device; peer_req->pages = page; atomic_set(&peer_req->pending_bios, 0); peer_req->flags = 0; /* * The block_id is opaque to the receiver. It is not endianness * converted, and sent back to the sender unchanged. 
*/ peer_req->block_id = id; return peer_req; fail: mempool_free(peer_req, drbd_ee_mempool); return NULL; } void __drbd_free_peer_req(struct drbd_device *device, struct drbd_peer_request *peer_req, int is_net) { if (peer_req->flags & EE_HAS_DIGEST) kfree(peer_req->digest); drbd_free_pages(device, peer_req->pages, is_net); D_ASSERT(device, atomic_read(&peer_req->pending_bios) == 0); D_ASSERT(device, drbd_interval_empty(&peer_req->i)); mempool_free(peer_req, drbd_ee_mempool); } int drbd_free_peer_reqs(struct drbd_device *device, struct list_head *list) { LIST_HEAD(work_list); struct drbd_peer_request *peer_req, *t; int count = 0; int is_net = list == &device->net_ee; spin_lock_irq(&device->resource->req_lock); list_splice_init(list, &work_list); spin_unlock_irq(&device->resource->req_lock); list_for_each_entry_safe(peer_req, t, &work_list, w.list) { __drbd_free_peer_req(device, peer_req, is_net); count++; } return count; } /* * See also comments in _req_mod(,BARRIER_ACKED) and receive_Barrier. */ static int drbd_finish_peer_reqs(struct drbd_device *device) { LIST_HEAD(work_list); LIST_HEAD(reclaimed); struct drbd_peer_request *peer_req, *t; int err = 0; spin_lock_irq(&device->resource->req_lock); reclaim_finished_net_peer_reqs(device, &reclaimed); list_splice_init(&device->done_ee, &work_list); spin_unlock_irq(&device->resource->req_lock); list_for_each_entry_safe(peer_req, t, &reclaimed, w.list) drbd_free_net_peer_req(device, peer_req); /* possible callbacks here: * e_end_block, and e_end_resync_block, e_send_superseded. * all ignore the last argument. 
*/ list_for_each_entry_safe(peer_req, t, &work_list, w.list) { int err2; /* list_del not necessary, next/prev members not touched */ err2 = peer_req->w.cb(&peer_req->w, !!err); if (!err) err = err2; drbd_free_peer_req(device, peer_req); } wake_up(&device->ee_wait); return err; } static void _drbd_wait_ee_list_empty(struct drbd_device *device, struct list_head *head) { DEFINE_WAIT(wait); /* avoids spin_lock/unlock * and calling prepare_to_wait in the fast path */ while (!list_empty(head)) { prepare_to_wait(&device->ee_wait, &wait, TASK_UNINTERRUPTIBLE); spin_unlock_irq(&device->resource->req_lock); drbd_kick_lo(device); schedule(); finish_wait(&device->ee_wait, &wait); spin_lock_irq(&device->resource->req_lock); } } static void drbd_wait_ee_list_empty(struct drbd_device *device, struct list_head *head) { spin_lock_irq(&device->resource->req_lock); _drbd_wait_ee_list_empty(device, head); spin_unlock_irq(&device->resource->req_lock); } static int drbd_recv_short(struct socket *sock, void *buf, size_t size, int flags) { mm_segment_t oldfs; struct kvec iov = { .iov_base = buf, .iov_len = size, }; struct msghdr msg = { .msg_iovlen = 1, .msg_iov = (struct iovec *)&iov, .msg_flags = (flags ? 
flags : MSG_WAITALL | MSG_NOSIGNAL) }; int rv; oldfs = get_fs(); set_fs(KERNEL_DS); rv = sock_recvmsg(sock, &msg, size, msg.msg_flags); set_fs(oldfs); return rv; } static int drbd_recv(struct drbd_connection *connection, void *buf, size_t size) { int rv; rv = drbd_recv_short(connection->data.socket, buf, size, 0); if (rv < 0) { if (rv == -ECONNRESET) drbd_info(connection, "sock was reset by peer\n"); else if (rv != -ERESTARTSYS) drbd_err(connection, "sock_recvmsg returned %d\n", rv); } else if (rv == 0) { if (test_bit(DISCONNECT_SENT, &connection->flags)) { long t; rcu_read_lock(); t = rcu_dereference(connection->net_conf)->ping_timeo * HZ/10; rcu_read_unlock(); t = wait_event_timeout(connection->ping_wait, connection->cstate < C_WF_REPORT_PARAMS, t); if (t) goto out; } drbd_info(connection, "sock was shut down by peer\n"); } if (rv != size) conn_request_state(connection, NS(conn, C_BROKEN_PIPE), CS_HARD); out: return rv; } static int drbd_recv_all(struct drbd_connection *connection, void *buf, size_t size) { int err; err = drbd_recv(connection, buf, size); if (err != size) { if (err >= 0) err = -EIO; } else err = 0; return err; } static int drbd_recv_all_warn(struct drbd_connection *connection, void *buf, size_t size) { int err; err = drbd_recv_all(connection, buf, size); if (err && !signal_pending(current)) drbd_warn(connection, "short read (expected size %d)\n", (int)size); return err; } /* quoting tcp(7): * On individual connections, the socket buffer size must be set prior to the * listen(2) or connect(2) calls in order to have it take effect. * This is our wrapper to do so. 
*/ static void drbd_setbufsize(struct socket *sock, unsigned int snd, unsigned int rcv) { /* open coded SO_SNDBUF, SO_RCVBUF */ if (snd) { sock->sk->sk_sndbuf = snd; sock->sk->sk_userlocks |= SOCK_SNDBUF_LOCK; } if (rcv) { sock->sk->sk_rcvbuf = rcv; sock->sk->sk_userlocks |= SOCK_RCVBUF_LOCK; } } static struct socket *drbd_try_connect(struct drbd_connection *connection) { const char *what; struct socket *sock; struct sockaddr_in6 src_in6; struct sockaddr_in6 peer_in6; struct net_conf *nc; int err, peer_addr_len, my_addr_len; int sndbuf_size, rcvbuf_size, connect_int; int disconnect_on_error = 1; rcu_read_lock(); nc = rcu_dereference(connection->net_conf); if (!nc) { rcu_read_unlock(); return NULL; } sndbuf_size = nc->sndbuf_size; rcvbuf_size = nc->rcvbuf_size; connect_int = nc->connect_int; rcu_read_unlock(); my_addr_len = min_t(int, connection->my_addr_len, sizeof(src_in6)); memcpy(&src_in6, &connection->my_addr, my_addr_len); if (((struct sockaddr *)&connection->my_addr)->sa_family == AF_INET6) src_in6.sin6_port = 0; else ((struct sockaddr_in *)&src_in6)->sin_port = 0; /* AF_INET & AF_SCI */ peer_addr_len = min_t(int, connection->peer_addr_len, sizeof(src_in6)); memcpy(&peer_in6, &connection->peer_addr, peer_addr_len); what = "sock_create_kern"; err = sock_create_kern(((struct sockaddr *)&src_in6)->sa_family, SOCK_STREAM, IPPROTO_TCP, &sock); if (err < 0) { sock = NULL; goto out; } sock->sk->sk_rcvtimeo = sock->sk->sk_sndtimeo = connect_int * HZ; drbd_setbufsize(sock, sndbuf_size, rcvbuf_size); /* explicitly bind to the configured IP as source IP * for the outgoing connections. * This is needed for multihomed hosts and to be * able to use lo: interfaces for drbd. * Make sure to use 0 as port number, so linux selects * a free one dynamically. */ what = "bind before connect"; err = sock->ops->bind(sock, (struct sockaddr *) &src_in6, my_addr_len); if (err < 0) goto out; /* connect may fail, peer not yet available. * stay C_WF_CONNECTION, don't go Disconnecting! 
*/ disconnect_on_error = 0; what = "connect"; err = sock->ops->connect(sock, (struct sockaddr *) &peer_in6, peer_addr_len, 0); out: if (err < 0) { if (sock) { sock_release(sock); sock = NULL; } switch (-err) { /* timeout, busy, signal pending */ case ETIMEDOUT: case EAGAIN: case EINPROGRESS: case EINTR: case ERESTARTSYS: /* peer not (yet) available, network problem */ case ECONNREFUSED: case ENETUNREACH: case EHOSTDOWN: case EHOSTUNREACH: disconnect_on_error = 0; break; default: drbd_err(connection, "%s failed, err = %d\n", what, err); } if (disconnect_on_error) conn_request_state(connection, NS(conn, C_DISCONNECTING), CS_HARD); } return sock; } struct accept_wait_data { struct drbd_connection *connection; struct socket *s_listen; struct completion door_bell; void (*original_sk_state_change)(struct sock *sk); }; static void drbd_incoming_connection(struct sock *sk) { struct accept_wait_data *ad = sk->sk_user_data; void (*state_change)(struct sock *sk); state_change = ad->original_sk_state_change; if (sk->sk_state == TCP_ESTABLISHED) complete(&ad->door_bell); state_change(sk); } static int prepare_listen_socket(struct drbd_connection *connection, struct accept_wait_data *ad) { int err, sndbuf_size, rcvbuf_size, my_addr_len; struct sockaddr_in6 my_addr; struct socket *s_listen; struct net_conf *nc; const char *what; rcu_read_lock(); nc = rcu_dereference(connection->net_conf); if (!nc) { rcu_read_unlock(); return -EIO; } sndbuf_size = nc->sndbuf_size; rcvbuf_size = nc->rcvbuf_size; rcu_read_unlock(); my_addr_len = min_t(int, connection->my_addr_len, sizeof(struct sockaddr_in6)); memcpy(&my_addr, &connection->my_addr, my_addr_len); what = "sock_create_kern"; err = sock_create_kern(((struct sockaddr *)&my_addr)->sa_family, SOCK_STREAM, IPPROTO_TCP, &s_listen); if (err) { s_listen = NULL; goto out; } s_listen->sk->sk_reuse = SK_CAN_REUSE; /* SO_REUSEADDR */ drbd_setbufsize(s_listen, sndbuf_size, rcvbuf_size); what = "bind before listen"; err = 
s_listen->ops->bind(s_listen, (struct sockaddr *)&my_addr, my_addr_len); if (err < 0) goto out; ad->s_listen = s_listen; write_lock_bh(&s_listen->sk->sk_callback_lock); ad->original_sk_state_change = s_listen->sk->sk_state_change; s_listen->sk->sk_state_change = drbd_incoming_connection; s_listen->sk->sk_user_data = ad; write_unlock_bh(&s_listen->sk->sk_callback_lock); what = "listen"; err = s_listen->ops->listen(s_listen, 5); if (err < 0) goto out; return 0; out: if (s_listen) sock_release(s_listen); if (err < 0) { if (err != -EAGAIN && err != -EINTR && err != -ERESTARTSYS) { drbd_err(connection, "%s failed, err = %d\n", what, err); conn_request_state(connection, NS(conn, C_DISCONNECTING), CS_HARD); } } return -EIO; } static void unregister_state_change(struct sock *sk, struct accept_wait_data *ad) { write_lock_bh(&sk->sk_callback_lock); sk->sk_state_change = ad->original_sk_state_change; sk->sk_user_data = NULL; write_unlock_bh(&sk->sk_callback_lock); } static struct socket *drbd_wait_for_connect(struct drbd_connection *connection, struct accept_wait_data *ad) { int timeo, connect_int, err = 0; struct socket *s_estab = NULL; struct net_conf *nc; rcu_read_lock(); nc = rcu_dereference(connection->net_conf); if (!nc) { rcu_read_unlock(); return NULL; } connect_int = nc->connect_int; rcu_read_unlock(); timeo = connect_int * HZ; timeo += (prandom_u32() & 1) ? 
timeo / 7 : -timeo / 7; /* 28.5% random jitter */ err = wait_for_completion_interruptible_timeout(&ad->door_bell, timeo); if (err <= 0) return NULL; err = kernel_accept(ad->s_listen, &s_estab, 0); if (err < 0) { if (err != -EAGAIN && err != -EINTR && err != -ERESTARTSYS) { drbd_err(connection, "accept failed, err = %d\n", err); conn_request_state(connection, NS(conn, C_DISCONNECTING), CS_HARD); } } if (s_estab) unregister_state_change(s_estab->sk, ad); return s_estab; } static int decode_header(struct drbd_connection *, void *, struct packet_info *); static int send_first_packet(struct drbd_connection *connection, struct drbd_socket *sock, enum drbd_packet cmd) { if (!conn_prepare_command(connection, sock)) return -EIO; return conn_send_command(connection, sock, cmd, 0, NULL, 0); } static int receive_first_packet(struct drbd_connection *connection, struct socket *sock) { unsigned int header_size = drbd_header_size(connection); struct packet_info pi; int err; err = drbd_recv_short(sock, connection->data.rbuf, header_size, 0); if (err != header_size) { if (err >= 0) err = -EIO; return err; } err = decode_header(connection, connection->data.rbuf, &pi); if (err) return err; return pi.cmd; } /** * drbd_socket_okay() - Free the socket if its connection is not okay * @sock: pointer to the pointer to the socket. */ static int drbd_socket_okay(struct socket **sock) { int rr; char tb[4]; if (!*sock) return false; rr = drbd_recv_short(*sock, tb, 4, MSG_DONTWAIT | MSG_PEEK); if (rr > 0 || rr == -EAGAIN) { return true; } else { sock_release(*sock); *sock = NULL; return false; } } /* Gets called if a connection is established, or if a new minor gets created in a connection */ int drbd_connected(struct drbd_peer_device *peer_device) { struct drbd_device *device = peer_device->device; int err; atomic_set(&device->packet_seq, 0); device->peer_seq = 0; device->state_mutex = peer_device->connection->agreed_pro_version < 100 ? 
&peer_device->connection->cstate_mutex : &device->own_state_mutex; err = drbd_send_sync_param(peer_device); if (!err) err = drbd_send_sizes(peer_device, 0, 0); if (!err) err = drbd_send_uuids(peer_device); if (!err) err = drbd_send_current_state(peer_device); clear_bit(USE_DEGR_WFC_T, &device->flags); clear_bit(RESIZE_PENDING, &device->flags); atomic_set(&device->ap_in_flight, 0); mod_timer(&device->request_timer, jiffies + HZ); /* just start it here. */ return err; } /* * return values: * 1 yes, we have a valid connection * 0 oops, did not work out, please try again * -1 peer talks different language, * no point in trying again, please go standalone. * -2 We do not have a network config... */ static int conn_connect(struct drbd_connection *connection) { struct drbd_socket sock, msock; struct drbd_peer_device *peer_device; struct net_conf *nc; int vnr, timeout, h, ok; bool discard_my_data; enum drbd_state_rv rv; struct accept_wait_data ad = { .connection = connection, .door_bell = COMPLETION_INITIALIZER_ONSTACK(ad.door_bell), }; clear_bit(DISCONNECT_SENT, &connection->flags); if (conn_request_state(connection, NS(conn, C_WF_CONNECTION), CS_VERBOSE) < SS_SUCCESS) return -2; mutex_init(&sock.mutex); sock.sbuf = connection->data.sbuf; sock.rbuf = connection->data.rbuf; sock.socket = NULL; mutex_init(&msock.mutex); msock.sbuf = connection->meta.sbuf; msock.rbuf = connection->meta.rbuf; msock.socket = NULL; /* Assume that the peer only understands protocol 80 until we know better. 
*/ connection->agreed_pro_version = 80; if (prepare_listen_socket(connection, &ad)) return 0; do { struct socket *s; s = drbd_try_connect(connection); if (s) { if (!sock.socket) { sock.socket = s; send_first_packet(connection, &sock, P_INITIAL_DATA); } else if (!msock.socket) { clear_bit(RESOLVE_CONFLICTS, &connection->flags); msock.socket = s; send_first_packet(connection, &msock, P_INITIAL_META); } else { drbd_err(connection, "Logic error in conn_connect()\n"); goto out_release_sockets; } } if (sock.socket && msock.socket) { rcu_read_lock(); nc = rcu_dereference(connection->net_conf); timeout = nc->ping_timeo * HZ / 10; rcu_read_unlock(); schedule_timeout_interruptible(timeout); ok = drbd_socket_okay(&sock.socket); ok = drbd_socket_okay(&msock.socket) && ok; if (ok) break; } retry: s = drbd_wait_for_connect(connection, &ad); if (s) { int fp = receive_first_packet(connection, s); drbd_socket_okay(&sock.socket); drbd_socket_okay(&msock.socket); switch (fp) { case P_INITIAL_DATA: if (sock.socket) { drbd_warn(connection, "initial packet S crossed\n"); sock_release(sock.socket); sock.socket = s; goto randomize; } sock.socket = s; break; case P_INITIAL_META: set_bit(RESOLVE_CONFLICTS, &connection->flags); if (msock.socket) { drbd_warn(connection, "initial packet M crossed\n"); sock_release(msock.socket); msock.socket = s; goto randomize; } msock.socket = s; break; default: drbd_warn(connection, "Error receiving initial packet\n"); sock_release(s); randomize: if (prandom_u32() & 1) goto retry; } } if (connection->cstate <= C_DISCONNECTING) goto out_release_sockets; if (signal_pending(current)) { flush_signals(current); smp_rmb(); if (get_t_state(&connection->receiver) == EXITING) goto out_release_sockets; } ok = drbd_socket_okay(&sock.socket); ok = drbd_socket_okay(&msock.socket) && ok; } while (!ok); if (ad.s_listen) sock_release(ad.s_listen); sock.socket->sk->sk_reuse = SK_CAN_REUSE; /* SO_REUSEADDR */ msock.socket->sk->sk_reuse = SK_CAN_REUSE; /* SO_REUSEADDR */ 
sock.socket->sk->sk_allocation = GFP_NOIO; msock.socket->sk->sk_allocation = GFP_NOIO; sock.socket->sk->sk_priority = TC_PRIO_INTERACTIVE_BULK; msock.socket->sk->sk_priority = TC_PRIO_INTERACTIVE; /* NOT YET ... * sock.socket->sk->sk_sndtimeo = connection->net_conf->timeout*HZ/10; * sock.socket->sk->sk_rcvtimeo = MAX_SCHEDULE_TIMEOUT; * first set it to the P_CONNECTION_FEATURES timeout, * which we set to 4x the configured ping_timeout. */ rcu_read_lock(); nc = rcu_dereference(connection->net_conf); sock.socket->sk->sk_sndtimeo = sock.socket->sk->sk_rcvtimeo = nc->ping_timeo*4*HZ/10; msock.socket->sk->sk_rcvtimeo = nc->ping_int*HZ; timeout = nc->timeout * HZ / 10; discard_my_data = nc->discard_my_data; rcu_read_unlock(); msock.socket->sk->sk_sndtimeo = timeout; /* we don't want delays. * we use TCP_CORK where appropriate, though */ drbd_tcp_nodelay(sock.socket); drbd_tcp_nodelay(msock.socket); connection->data.socket = sock.socket; connection->meta.socket = msock.socket; connection->last_received = jiffies; h = drbd_do_features(connection); if (h <= 0) return h; if (connection->cram_hmac_tfm) { /* drbd_request_state(device, NS(conn, WFAuth)); */ switch (drbd_do_auth(connection)) { case -1: drbd_err(connection, "Authentication of peer failed\n"); return -1; case 0: drbd_err(connection, "Authentication of peer failed, trying again.\n"); return 0; } } connection->data.socket->sk->sk_sndtimeo = timeout; connection->data.socket->sk->sk_rcvtimeo = MAX_SCHEDULE_TIMEOUT; if (drbd_send_protocol(connection) == -EOPNOTSUPP) return -1; /* Prevent a race between resync-handshake and * being promoted to Primary. * * Grab and release the state mutex, so we know that any current * drbd_set_role() is finished, and any incoming drbd_set_role * will see the STATE_SENT flag, and wait for it to be cleared. 
*/ idr_for_each_entry(&connection->peer_devices, peer_device, vnr) mutex_lock(peer_device->device->state_mutex); set_bit(STATE_SENT, &connection->flags); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) mutex_unlock(peer_device->device->state_mutex); rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; kobject_get(&device->kobj); rcu_read_unlock(); if (discard_my_data) set_bit(DISCARD_MY_DATA, &device->flags); else clear_bit(DISCARD_MY_DATA, &device->flags); drbd_connected(peer_device); kobject_put(&device->kobj); rcu_read_lock(); } rcu_read_unlock(); rv = conn_request_state(connection, NS(conn, C_WF_REPORT_PARAMS), CS_VERBOSE); if (rv < SS_SUCCESS || connection->cstate != C_WF_REPORT_PARAMS) { clear_bit(STATE_SENT, &connection->flags); return 0; } drbd_thread_start(&connection->asender); mutex_lock(&connection->resource->conf_update); /* The discard_my_data flag is a single-shot modifier to the next * connection attempt, the handshake of which is now well underway. * No need for rcu style copying of the whole struct * just to clear a single value. 
*/ connection->net_conf->discard_my_data = 0; mutex_unlock(&connection->resource->conf_update); return h; out_release_sockets: if (ad.s_listen) sock_release(ad.s_listen); if (sock.socket) sock_release(sock.socket); if (msock.socket) sock_release(msock.socket); return -1; } static int decode_header(struct drbd_connection *connection, void *header, struct packet_info *pi) { unsigned int header_size = drbd_header_size(connection); if (header_size == sizeof(struct p_header100) && *(__be32 *)header == cpu_to_be32(DRBD_MAGIC_100)) { struct p_header100 *h = header; if (h->pad != 0) { drbd_err(connection, "Header padding is not zero\n"); return -EINVAL; } pi->vnr = be16_to_cpu(h->volume); pi->cmd = be16_to_cpu(h->command); pi->size = be32_to_cpu(h->length); } else if (header_size == sizeof(struct p_header95) && *(__be16 *)header == cpu_to_be16(DRBD_MAGIC_BIG)) { struct p_header95 *h = header; pi->cmd = be16_to_cpu(h->command); pi->size = be32_to_cpu(h->length); pi->vnr = 0; } else if (header_size == sizeof(struct p_header80) && *(__be32 *)header == cpu_to_be32(DRBD_MAGIC)) { struct p_header80 *h = header; pi->cmd = be16_to_cpu(h->command); pi->size = be16_to_cpu(h->length); pi->vnr = 0; } else { drbd_err(connection, "Wrong magic value 0x%08x in protocol version %d\n", be32_to_cpu(*(__be32 *)header), connection->agreed_pro_version); return -EINVAL; } pi->data = header + header_size; return 0; } static int drbd_recv_header(struct drbd_connection *connection, struct packet_info *pi) { void *buffer = connection->data.rbuf; int err; err = drbd_recv_all_warn(connection, buffer, drbd_header_size(connection)); if (err) return err; err = decode_header(connection, buffer, pi); connection->last_received = jiffies; return err; } static enum finish_epoch drbd_flush_after_epoch(struct drbd_connection *connection, struct drbd_epoch *epoch) { int rv; struct drbd_peer_device *peer_device; int vnr; if (connection->write_ordering >= WO_bdev_flush) { rcu_read_lock(); 
/* blkdev_issue_flush() may sleep, so the loop drops the RCU read lock around it; the kobject reference keeps the device alive in the meantime. */
idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; if (!get_ldev(device)) continue; kobject_get(&device->kobj); rcu_read_unlock(); rv = blkdev_issue_flush(device->ldev->backing_bdev, GFP_NOIO, NULL); if (rv) { drbd_info(device, "local disk flush failed with status %d\n", rv); /* would rather check on EOPNOTSUPP, but that is not reliable. * don't try again for ANY return value != 0 * if (rv == -EOPNOTSUPP) */ drbd_bump_write_ordering(connection, WO_drain_io); } put_ldev(device); kobject_put(&device->kobj); rcu_read_lock(); if (rv) break; } rcu_read_unlock(); } return drbd_may_finish_epoch(connection, epoch, EV_BARRIER_DONE); } static int w_flush(struct drbd_work *w, int cancel) { struct flush_work *fw = container_of(w, struct flush_work, w); struct drbd_epoch *epoch = fw->epoch; struct drbd_connection *connection = epoch->connection; kfree(fw); if (!test_and_set_bit(DE_BARRIER_IN_NEXT_EPOCH_ISSUED, &epoch->flags)) drbd_flush_after_epoch(connection, epoch); drbd_may_finish_epoch(connection, epoch, EV_PUT | (connection->cstate < C_WF_REPORT_PARAMS ? EV_CLEANUP : 0)); return 0; } /** * drbd_may_finish_epoch() - Applies an epoch_event to the epoch's state, possibly finishes it. * @connection: DRBD connection. * @epoch: Epoch object. * @ev: Epoch event.
*/ static enum finish_epoch drbd_may_finish_epoch(struct drbd_connection *connection, struct drbd_epoch *epoch, enum epoch_event ev) { int finish, epoch_size; struct drbd_epoch *next_epoch; int schedule_flush = 0; enum finish_epoch rv = FE_STILL_LIVE; spin_lock(&connection->epoch_lock); do { next_epoch = NULL; finish = 0; epoch_size = atomic_read(&epoch->epoch_size); switch (ev & ~EV_CLEANUP) { case EV_PUT: atomic_dec(&epoch->active); break; case EV_GOT_BARRIER_NR: set_bit(DE_HAVE_BARRIER_NUMBER, &epoch->flags); /* Special case: If we just switched from WO_bio_barrier to WO_bdev_flush we should not finish the current epoch */ if (test_bit(DE_CONTAINS_A_BARRIER, &epoch->flags) && epoch_size == 1 && connection->write_ordering != WO_bio_barrier && epoch == connection->current_epoch) clear_bit(DE_CONTAINS_A_BARRIER, &epoch->flags); break; case EV_BARRIER_DONE: set_bit(DE_BARRIER_IN_NEXT_EPOCH_DONE, &epoch->flags); break; case EV_BECAME_LAST: /* nothing to do*/ break; } if (epoch_size != 0 && atomic_read(&epoch->active) == 0 && (test_bit(DE_HAVE_BARRIER_NUMBER, &epoch->flags) || ev & EV_CLEANUP) && epoch->list.prev == &connection->current_epoch->list && !test_bit(DE_IS_FINISHING, &epoch->flags)) { /* Nearly all conditions are met to finish that epoch... */ if (test_bit(DE_BARRIER_IN_NEXT_EPOCH_DONE, &epoch->flags) || connection->write_ordering == WO_none || (epoch_size == 1 && test_bit(DE_CONTAINS_A_BARRIER, &epoch->flags)) || ev & EV_CLEANUP) { finish = 1; set_bit(DE_IS_FINISHING, &epoch->flags); } else if (!test_bit(DE_BARRIER_IN_NEXT_EPOCH_ISSUED, &epoch->flags) && connection->write_ordering == WO_bio_barrier) { atomic_inc(&epoch->active); schedule_flush = 1; } } if (finish) { if (!(ev & EV_CLEANUP)) { spin_unlock(&connection->epoch_lock); drbd_send_b_ack(epoch->connection, epoch->barrier_nr, epoch_size); spin_lock(&connection->epoch_lock); } #if 0 /* FIXME: dec unacked on connection, once we have * something to count pending connection packets in. 
*/ if (test_bit(DE_HAVE_BARRIER_NUMBER, &epoch->flags)) dec_unacked(epoch->connection); #endif if (connection->current_epoch != epoch) { next_epoch = list_entry(epoch->list.next, struct drbd_epoch, list); list_del(&epoch->list); ev = EV_BECAME_LAST | (ev & EV_CLEANUP); connection->epochs--; kfree(epoch); if (rv == FE_STILL_LIVE) rv = FE_DESTROYED; } else { epoch->flags = 0; atomic_set(&epoch->epoch_size, 0); /* atomic_set(&epoch->active, 0); is already zero */ if (rv == FE_STILL_LIVE) rv = FE_RECYCLED; } } if (!next_epoch) break; epoch = next_epoch; } while (1); spin_unlock(&connection->epoch_lock); if (schedule_flush) { struct flush_work *fw; fw = kmalloc(sizeof(*fw), GFP_ATOMIC); if (fw) { fw->w.cb = w_flush; fw->epoch = epoch; drbd_queue_work(&connection->sender_work, &fw->w); } else { drbd_warn(connection, "Could not kmalloc a flush_work obj\n"); set_bit(DE_BARRIER_IN_NEXT_EPOCH_ISSUED, &epoch->flags); /* That is not a recursion, only one level */ drbd_may_finish_epoch(connection, epoch, EV_BARRIER_DONE); drbd_may_finish_epoch(connection, epoch, EV_PUT); } } return rv; } /** * drbd_bump_write_ordering() - Fall back to another write ordering method * @connection: DRBD connection. * @wo: Write ordering method to try.
*/ void drbd_bump_write_ordering(struct drbd_connection *connection, enum write_ordering_e wo) { struct disk_conf *dc; struct drbd_peer_device *peer_device; enum write_ordering_e pwo; int vnr, i = 0; static char *write_ordering_str[] = { [WO_none] = "none", [WO_drain_io] = "drain", [WO_bdev_flush] = "flush", [WO_bio_barrier] = "barrier", }; pwo = connection->write_ordering; wo = min(pwo, wo); rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; if (i++ == 1 && wo == WO_bio_barrier) wo = WO_bdev_flush; /* WO = barrier does not handle multiple volumes */ if (!get_ldev_if_state(device, D_ATTACHING)) continue; dc = rcu_dereference(device->ldev->disk_conf); if (wo == WO_bio_barrier && !dc->disk_barrier) wo = WO_bdev_flush; if (wo == WO_bdev_flush && !dc->disk_flushes) wo = WO_drain_io; if (wo == WO_drain_io && !dc->disk_drain) wo = WO_none; put_ldev(device); } rcu_read_unlock(); connection->write_ordering = wo; if (pwo != connection->write_ordering || wo == WO_bio_barrier) drbd_info(connection, "Method to ensure write ordering: %s\n", write_ordering_str[connection->write_ordering]); } void conn_wait_active_ee_empty(struct drbd_connection *connection); /** * drbd_submit_peer_request() * @device: DRBD device. * @peer_req: peer request * @rw: flag field, see bio->bi_rw * * May spread the pages to multiple bios, * depending on bio_add_page restrictions. * * Returns 0 if all bios have been submitted, * -ENOMEM if we could not allocate enough bios, * -ENOSPC (any better suggestion?) if we have not been able to bio_add_page a * single page to an empty bio (which should never happen and likely indicates * that the lower level IO stack is in some way broken). This has been observed * on certain Xen deployments. */ /* TODO allocate from our own bio_set. 
*/ int drbd_submit_peer_request(struct drbd_device *device, struct drbd_peer_request *peer_req, const unsigned rw, const int fault_type) { struct bio *bios = NULL; struct bio *bio; struct page *page = peer_req->pages; sector_t sector = peer_req->i.sector; unsigned ds = peer_req->i.size; unsigned n_bios = 0; unsigned nr_pages = (ds + PAGE_SIZE -1) >> PAGE_SHIFT; int err = -ENOMEM; if (peer_req->flags & EE_IS_TRIM_USE_ZEROOUT) { /* wait for all pending IO completions, before we start * zeroing things out. */ conn_wait_active_ee_empty(first_peer_device(device)->connection); if (blkdev_issue_zeroout(device->ldev->backing_bdev, sector, ds >> 9, GFP_NOIO)) peer_req->flags |= EE_WAS_ERROR; drbd_endio_write_sec_final(peer_req); return 0; } if (peer_req->flags & EE_IS_TRIM) nr_pages = 0; /* discards don't have any payload. */ /* In most cases, we will only need one bio. But in case the lower * level restrictions happen to be different at this offset on this * side than those of the sending peer, we may need to submit the * request in more than one bio. * * Plain bio_alloc is good enough here, this is no DRBD internally * generated bio, but a bio allocated on behalf of the peer. */ next_bio: bio = bio_alloc(GFP_NOIO, nr_pages); if (!bio) { drbd_err(device, "submit_ee: Allocation of a bio failed (nr_pages=%u)\n", nr_pages); goto fail; } /* > peer_req->i.sector, unless this is the first bio */ bio->bi_sector = sector; bio->bi_bdev = device->ldev->backing_bdev; /* we special case some flags in the multi-bio case, see below * (REQ_UNPLUG, REQ_FLUSH, or BIO_RW_BARRIER in older kernels) */ bio->bi_rw = rw; bio->bi_private = peer_req; bio->bi_end_io = drbd_peer_request_endio; bio->bi_next = bios; bios = bio; ++n_bios; if (rw & DRBD_REQ_DISCARD) { bio->bi_size = ds; goto submit; } page_chain_for_each(page) { unsigned len = min_t(unsigned, ds, PAGE_SIZE); if (!bio_add_page(bio, page, len, 0)) { /* A single page must always be possible! 
* But in case it fails anyways, * we deal with it, and complain (below). */ if (bio->bi_vcnt == 0) { drbd_err(device, "bio_add_page failed for len=%u, " "bi_vcnt=0 (bi_sector=%llu)\n", len, (unsigned long long)bio->bi_sector); err = -ENOSPC; goto fail; } goto next_bio; } ds -= len; sector += len >> 9; --nr_pages; } D_ASSERT(device, ds == 0); submit: D_ASSERT(device, page == NULL); atomic_set(&peer_req->pending_bios, n_bios); do { bio = bios; bios = bios->bi_next; bio->bi_next = NULL; /* strip off REQ_UNPLUG unless it is the last bio */ if (bios) bio->bi_rw &= ~DRBD_REQ_UNPLUG; drbd_generic_make_request(device, fault_type, bio); /* strip off REQ_FLUSH, * unless it is the first or last bio */ if (bios && bios->bi_next) bios->bi_rw &= ~DRBD_REQ_FLUSH; } while (bios); maybe_kick_lo(device); return 0; fail: while (bios) { bio = bios; bios = bios->bi_next; bio_put(bio); } return err; } static void drbd_remove_epoch_entry_interval(struct drbd_device *device, struct drbd_peer_request *peer_req) { struct drbd_interval *i = &peer_req->i; drbd_remove_interval(&device->write_requests, i); drbd_clear_interval(i); /* Wake up any processes waiting for this peer request to complete. */ if (i->waiting) wake_up(&device->misc_wait); } /** * w_e_reissue() - Worker callback; Resubmit a bio, without REQ_HARDBARRIER set * @device: DRBD device. * @w: work object. * @cancel: The connection will be closed anyways (unused in this callback) */ int w_e_reissue(struct drbd_work *w, int cancel) __releases(local) { struct drbd_peer_request *peer_req = container_of(w, struct drbd_peer_request, w); struct drbd_device *device = peer_req->peer_device->device; int err; /* We leave DE_CONTAINS_A_BARRIER and EE_IS_BARRIER in place, (and DE_BARRIER_IN_NEXT_EPOCH_ISSUED in the previous Epoch) so that we can finish that epoch in drbd_may_finish_epoch(). 
That is necessary if we already have a long chain of Epochs, before we realize that BARRIER is actually not supported */ /* As long as the -ENOTSUPP on the barrier is reported immediately that will never trigger. If it is reported late, we will just print that warning and continue correctly for all future requests with WO_bdev_flush */ if (previous_epoch(first_peer_device(device)->connection, peer_req->epoch)) drbd_warn(device, "Write ordering was not enforced (one time event)\n"); /* we still have a local reference, * get_ldev was done in receive_Data. */ peer_req->w.cb = e_end_block; err = drbd_submit_peer_request(device, peer_req, WRITE, DRBD_FAULT_DT_WR); switch (err) { case -ENOMEM: peer_req->w.cb = w_e_reissue; drbd_queue_work(&first_peer_device(device)->connection->sender_work, &peer_req->w); /* retry later; fall through */ case 0: /* keep worker happy and connection up */ return 0; case -ENOSPC: /* no other error expected, but anyways: */ default: /* forget the object, * and cause a "Network failure" */ spin_lock_irq(&device->resource->req_lock); list_del(&peer_req->w.list); drbd_remove_epoch_entry_interval(device, peer_req); spin_unlock_irq(&device->resource->req_lock); if (peer_req->flags & EE_CALL_AL_COMPLETE_IO) drbd_al_complete_io(device, &peer_req->i); drbd_may_finish_epoch(first_peer_device(device)->connection, peer_req->epoch, EV_PUT + EV_CLEANUP); drbd_free_peer_req(device, peer_req); drbd_err(device, "submit failed, triggering re-connect\n"); return err; } } void conn_wait_active_ee_empty(struct drbd_connection *connection) { struct drbd_peer_device *peer_device; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; kobject_get(&device->kobj); rcu_read_unlock(); drbd_wait_ee_list_empty(device, &device->active_ee); kobject_put(&device->kobj); rcu_read_lock(); } rcu_read_unlock(); } void conn_wait_done_ee_empty(struct drbd_connection *connection) { struct 
drbd_peer_device *peer_device; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; kobject_get(&device->kobj); rcu_read_unlock(); drbd_wait_ee_list_empty(device, &device->done_ee); kobject_put(&device->kobj); rcu_read_lock(); } rcu_read_unlock(); } #ifdef blk_queue_plugged static void drbd_unplug_all_devices(struct drbd_connection *connection) { struct drbd_peer_device *peer_device; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; kobject_get(&device->kobj); rcu_read_unlock(); drbd_kick_lo(device); kobject_put(&device->kobj); rcu_read_lock(); } rcu_read_unlock(); } #else static void drbd_unplug_all_devices(struct drbd_connection *connection) { } #endif static struct drbd_peer_device * conn_peer_device(struct drbd_connection *connection, int volume_number) { return idr_find(&connection->peer_devices, volume_number); } static int receive_Barrier(struct drbd_connection *connection, struct packet_info *pi) { int rv, issue_flush; struct p_barrier *p = pi->data; struct drbd_epoch *epoch; drbd_unplug_all_devices(connection); /* FIXME these are unacked on connection, * not a specific (peer)device. */ connection->current_epoch->barrier_nr = p->barrier; connection->current_epoch->connection = connection; rv = drbd_may_finish_epoch(connection, connection->current_epoch, EV_GOT_BARRIER_NR); /* P_BARRIER_ACK may imply that the corresponding extent is dropped from * the activity log, which means it would not be resynced in case the * R_PRIMARY crashes now. * Therefore we must send the barrier_ack after the barrier request was * completed. 
*/ switch (connection->write_ordering) { case WO_bio_barrier: case WO_none: if (rv == FE_RECYCLED) return 0; break; case WO_bdev_flush: case WO_drain_io: if (rv == FE_STILL_LIVE) { set_bit(DE_BARRIER_IN_NEXT_EPOCH_ISSUED, &connection->current_epoch->flags); conn_wait_active_ee_empty(connection); rv = drbd_flush_after_epoch(connection, connection->current_epoch); } if (rv == FE_RECYCLED) return 0; /* The asender will send all the ACKs and barrier ACKs out, since all EEs moved from the active_ee to the done_ee. We need to provide a new epoch object for the EEs that come in soon */ break; } /* receiver context, in the writeout path of the other node. * avoid potential distributed deadlock */ epoch = kmalloc(sizeof(struct drbd_epoch), GFP_NOIO); if (!epoch) { issue_flush = !test_and_set_bit(DE_BARRIER_IN_NEXT_EPOCH_ISSUED, &connection->current_epoch->flags); conn_wait_active_ee_empty(connection); if (issue_flush) { rv = drbd_flush_after_epoch(connection, connection->current_epoch); if (rv == FE_RECYCLED) return 0; } conn_wait_done_ee_empty(connection); return 0; } epoch->flags = 0; atomic_set(&epoch->epoch_size, 0); atomic_set(&epoch->active, 0); spin_lock(&connection->epoch_lock); if (atomic_read(&connection->current_epoch->epoch_size)) { list_add(&epoch->list, &connection->current_epoch->list); connection->current_epoch = epoch; connection->epochs++; } else { /* The current_epoch got recycled while we allocated this one... 
*/ kfree(epoch); } spin_unlock(&connection->epoch_lock); return 0; } /* used from receive_RSDataReply (recv_resync_read) * and from receive_Data */ static struct drbd_peer_request * read_in_block(struct drbd_peer_device *peer_device, u64 id, sector_t sector, struct packet_info *pi) __must_hold(local) { struct drbd_device *device = peer_device->device; const sector_t capacity = drbd_get_capacity(device->this_bdev); struct drbd_peer_request *peer_req; struct page *page; int dgs, ds, err; int data_size = pi->size; void *dig_in = peer_device->connection->int_dig_in; void *dig_vv = peer_device->connection->int_dig_vv; unsigned long *data; struct p_trim *trim = (pi->cmd == P_TRIM) ? pi->data : NULL; dgs = 0; if (!trim && peer_device->connection->peer_integrity_tfm) { dgs = crypto_hash_digestsize(peer_device->connection->peer_integrity_tfm); /* * FIXME: Receive the incoming digest into the receive buffer * here, together with its struct p_data? */ err = drbd_recv_all_warn(peer_device->connection, dig_in, dgs); if (err) return NULL; data_size -= dgs; } if (trim) { D_ASSERT(peer_device, data_size == 0); data_size = be32_to_cpu(trim->size); } if (!expect(IS_ALIGNED(data_size, 512))) return NULL; /* prepare for larger trim requests. */ if (!trim && !expect(data_size <= DRBD_MAX_BIO_SIZE)) return NULL; /* even though we trust our peer, * we sometimes have to double check. */ if (sector + (data_size>>9) > capacity) { drbd_err(device, "request from peer beyond end of local disk: " "capacity: %llus < sector: %llus + size: %u\n", (unsigned long long)capacity, (unsigned long long)sector, data_size); return NULL; } /* GFP_NOIO, because we must not cause arbitrary write-out: in a DRBD * "criss-cross" setup, that might cause write-out on some other DRBD, * which in turn might block on the other node at this very place.
*/ peer_req = drbd_alloc_peer_req(peer_device, id, sector, data_size, trim == NULL, GFP_NOIO); if (!peer_req) return NULL; if (trim) return peer_req; ds = data_size; page = peer_req->pages; page_chain_for_each(page) { unsigned len = min_t(int, ds, PAGE_SIZE); data = kmap(page); err = drbd_recv_all_warn(peer_device->connection, data, len); if (drbd_insert_fault(device, DRBD_FAULT_RECEIVE)) { drbd_err(device, "Fault injection: Corrupting data on receive\n"); data[0] = data[0] ^ (unsigned long)-1; } kunmap(page); if (err) { drbd_free_peer_req(device, peer_req); return NULL; } ds -= len; } if (dgs) { drbd_csum_ee(peer_device->connection->peer_integrity_tfm, peer_req, dig_vv); if (memcmp(dig_in, dig_vv, dgs)) { drbd_err(device, "Digest integrity check FAILED: %llus +%u\n", (unsigned long long)sector, data_size); drbd_free_peer_req(device, peer_req); return NULL; } } device->recv_cnt += data_size>>9; return peer_req; } /* drbd_drain_block() just takes a data block * out of the socket input buffer, and discards it. 
*/ static int drbd_drain_block(struct drbd_peer_device *peer_device, int data_size) { struct page *page; int err = 0; void *data; if (!data_size) return 0; page = drbd_alloc_pages(peer_device, 1, 1); data = kmap(page); while (data_size) { unsigned int len = min_t(int, data_size, PAGE_SIZE); err = drbd_recv_all_warn(peer_device->connection, data, len); if (err) break; data_size -= len; } kunmap(page); drbd_free_pages(peer_device->device, page, 0); return err; } static int recv_dless_read(struct drbd_peer_device *peer_device, struct drbd_request *req, sector_t sector, int data_size) { struct bio_vec *bvec; struct bio *bio; int dgs, err, i, expect; void *dig_in = peer_device->connection->int_dig_in; void *dig_vv = peer_device->connection->int_dig_vv; dgs = 0; if (peer_device->connection->peer_integrity_tfm) { dgs = crypto_hash_digestsize(peer_device->connection->peer_integrity_tfm); err = drbd_recv_all_warn(peer_device->connection, dig_in, dgs); if (err) return err; data_size -= dgs; } /* optimistically update recv_cnt. if receiving fails below, * we disconnect anyways, and counters will be reset. */ peer_device->device->recv_cnt += data_size>>9; bio = req->master_bio; D_ASSERT(peer_device->device, sector == bio->bi_sector); bio_for_each_segment(bvec, bio, i) { void *mapped = kmap(bvec->bv_page) + bvec->bv_offset; expect = min_t(int, data_size, bvec->bv_len); err = drbd_recv_all_warn(peer_device->connection, mapped, expect); kunmap(bvec->bv_page); if (err) return err; data_size -= expect; } if (dgs) { drbd_csum_bio(peer_device->connection->peer_integrity_tfm, bio, dig_vv); if (memcmp(dig_in, dig_vv, dgs)) { drbd_err(peer_device, "Digest integrity check FAILED. Broken NICs?\n"); return -EINVAL; } } D_ASSERT(peer_device->device, data_size == 0); return 0; } /* * e_end_resync_block() is called in asender context via * drbd_finish_peer_reqs(). 
*/ static int e_end_resync_block(struct drbd_work *w, int unused) { struct drbd_peer_request *peer_req = container_of(w, struct drbd_peer_request, w); struct drbd_peer_device *peer_device = peer_req->peer_device; struct drbd_device *device = peer_device->device; sector_t sector = peer_req->i.sector; int err; D_ASSERT(device, drbd_interval_empty(&peer_req->i)); if (likely((peer_req->flags & EE_WAS_ERROR) == 0)) { drbd_set_in_sync(device, sector, peer_req->i.size); err = drbd_send_ack(peer_device, P_RS_WRITE_ACK, peer_req); } else { /* Record failure to sync */ drbd_rs_failed_io(device, sector, peer_req->i.size); err = drbd_send_ack(peer_device, P_NEG_ACK, peer_req); } dec_unacked(device); return err; } static int recv_resync_read(struct drbd_peer_device *peer_device, sector_t sector, struct packet_info *pi) __releases(local) { struct drbd_device *device = peer_device->device; struct drbd_peer_request *peer_req; peer_req = read_in_block(peer_device, ID_SYNCER, sector, pi); if (!peer_req) goto fail; dec_rs_pending(device); inc_unacked(device); /* corresponding dec_unacked() in e_end_resync_block() * respective _drbd_clear_done_ee */ peer_req->w.cb = e_end_resync_block; spin_lock_irq(&device->resource->req_lock); list_add(&peer_req->w.list, &device->sync_ee); spin_unlock_irq(&device->resource->req_lock); atomic_add(pi->size >> 9, &device->rs_sect_ev); if (drbd_submit_peer_request(device, peer_req, WRITE, DRBD_FAULT_RS_WR) == 0) return 0; /* don't care for the reason here */ drbd_err(device, "submit failed, triggering re-connect\n"); spin_lock_irq(&device->resource->req_lock); list_del(&peer_req->w.list); spin_unlock_irq(&device->resource->req_lock); drbd_free_peer_req(device, peer_req); fail: put_ldev(device); return -EIO; } static struct drbd_request * find_request(struct drbd_device *device, struct rb_root *root, u64 id, sector_t sector, bool missing_ok, const char *func) { struct drbd_request *req; /* Request object according to our peer */ req = (struct 
drbd_request *)(unsigned long)id; if (drbd_contains_interval(root, sector, &req->i) && req->i.local) return req; if (!missing_ok) { drbd_err(device, "%s: failed to find request 0x%lx, sector %llus\n", func, (unsigned long)id, (unsigned long long)sector); } return NULL; } static int receive_DataReply(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct drbd_request *req; sector_t sector; int err; struct p_data *p = pi->data; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; sector = be64_to_cpu(p->sector); spin_lock_irq(&device->resource->req_lock); req = find_request(device, &device->read_requests, p->block_id, sector, false, __func__); spin_unlock_irq(&device->resource->req_lock); if (unlikely(!req)) return -EIO; /* drbd_remove_request_interval() is done in _req_may_be_done, to avoid * special casing it there for the various failure cases. * still no race with drbd_fail_pending_reads */ err = recv_dless_read(peer_device, req, sector, pi->size); if (!err) req_mod(req, DATA_RECEIVED); /* else: nothing. handled from drbd_disconnect... * I don't think we may complete this just yet * in case we are "on-disconnect: freeze" */ return err; } static int receive_RSDataReply(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; sector_t sector; int err; struct p_data *p = pi->data; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; sector = be64_to_cpu(p->sector); D_ASSERT(device, p->block_id == ID_SYNCER); if (get_ldev(device)) { /* data is submitted to disk within recv_resync_read. * corresponding put_ldev done below on error, * or in drbd_peer_request_endio. 
*/ err = recv_resync_read(peer_device, sector, pi); } else { if (DRBD_ratelimit(5*HZ, 5)) drbd_err(device, "Can not write resync data to local disk.\n"); err = drbd_drain_block(peer_device, pi->size); drbd_send_ack_dp(peer_device, P_NEG_ACK, p, pi->size); } atomic_add(pi->size >> 9, &device->rs_sect_in); return err; } static void restart_conflicting_writes(struct drbd_device *device, sector_t sector, int size) { struct drbd_interval *i; struct drbd_request *req; drbd_for_each_overlap(i, &device->write_requests, sector, size) { if (!i->local) continue; req = container_of(i, struct drbd_request, i); if (req->rq_state & RQ_LOCAL_PENDING || !(req->rq_state & RQ_POSTPONED)) continue; /* as it is RQ_POSTPONED, this will cause it to * be queued on the retry workqueue. */ __req_mod(req, CONFLICT_RESOLVED, NULL); } } /* * e_end_block() is called in asender context via drbd_finish_peer_reqs(). */ static int e_end_block(struct drbd_work *w, int cancel) { struct drbd_peer_request *peer_req = container_of(w, struct drbd_peer_request, w); struct drbd_peer_device *peer_device = peer_req->peer_device; struct drbd_device *device = peer_device->device; sector_t sector = peer_req->i.sector; struct drbd_epoch *epoch; int err = 0, pcmd; if (peer_req->flags & EE_IS_BARRIER) { epoch = previous_epoch(first_peer_device(device)->connection, peer_req->epoch); if (epoch) drbd_may_finish_epoch(first_peer_device(device)->connection, epoch, EV_BARRIER_DONE + (cancel ? EV_CLEANUP : 0)); } if (peer_req->flags & EE_SEND_WRITE_ACK) { if (likely((peer_req->flags & EE_WAS_ERROR) == 0)) { pcmd = (device->state.conn >= C_SYNC_SOURCE && device->state.conn <= C_PAUSED_SYNC_T && peer_req->flags & EE_MAY_SET_IN_SYNC) ? P_RS_WRITE_ACK : P_WRITE_ACK; err = drbd_send_ack(peer_device, pcmd, peer_req); if (pcmd == P_RS_WRITE_ACK) drbd_set_in_sync(device, sector, peer_req->i.size); } else { err = drbd_send_ack(peer_device, P_NEG_ACK, peer_req); /* we expect it to be marked out of sync anyways... 
* maybe assert this? */ } dec_unacked(device); } /* we delete from the conflict detection hash _after_ we sent out the * P_WRITE_ACK / P_NEG_ACK, to get the sequence number right. */ if (peer_req->flags & EE_IN_INTERVAL_TREE) { spin_lock_irq(&device->resource->req_lock); D_ASSERT(device, !drbd_interval_empty(&peer_req->i)); drbd_remove_epoch_entry_interval(device, peer_req); if (peer_req->flags & EE_RESTART_REQUESTS) restart_conflicting_writes(device, sector, peer_req->i.size); spin_unlock_irq(&device->resource->req_lock); } else D_ASSERT(device, drbd_interval_empty(&peer_req->i)); drbd_may_finish_epoch(first_peer_device(device)->connection, peer_req->epoch, EV_PUT + (cancel ? EV_CLEANUP : 0)); return err; } static int e_send_ack(struct drbd_work *w, enum drbd_packet ack) { struct drbd_peer_request *peer_req = container_of(w, struct drbd_peer_request, w); struct drbd_peer_device *peer_device = peer_req->peer_device; int err; err = drbd_send_ack(peer_device, ack, peer_req); dec_unacked(peer_device->device); return err; } static int e_send_superseded(struct drbd_work *w, int unused) { return e_send_ack(w, P_SUPERSEDED); } static int e_send_retry_write(struct drbd_work *w, int unused) { struct drbd_peer_request *peer_req = container_of(w, struct drbd_peer_request, w); struct drbd_connection *connection = peer_req->peer_device->connection; return e_send_ack(w, connection->agreed_pro_version >= 100 ? P_RETRY_WRITE : P_SUPERSEDED); } static bool seq_greater(u32 a, u32 b) { /* * We assume 32-bit wrap-around here. * For 24-bit wrap-around, we would have to shift: * a <<= 8; b <<= 8; */ return (s32)a - (s32)b > 0; } static u32 seq_max(u32 a, u32 b) { return seq_greater(a, b) ? 
a : b; } static void update_peer_seq(struct drbd_peer_device *peer_device, unsigned int peer_seq) { struct drbd_device *device = peer_device->device; unsigned int newest_peer_seq; if (test_bit(RESOLVE_CONFLICTS, &peer_device->connection->flags)) { spin_lock(&device->peer_seq_lock); newest_peer_seq = seq_max(device->peer_seq, peer_seq); device->peer_seq = newest_peer_seq; spin_unlock(&device->peer_seq_lock); /* wake up only if we actually changed device->peer_seq */ if (peer_seq == newest_peer_seq) wake_up(&device->seq_wait); } } static inline int overlaps(sector_t s1, int l1, sector_t s2, int l2) { return !((s1 + (l1>>9) <= s2) || (s1 >= s2 + (l2>>9))); } /* maybe change sync_ee into interval trees as well? */ static bool overlapping_resync_write(struct drbd_device *device, struct drbd_peer_request *peer_req) { struct drbd_peer_request *rs_req; bool rv = 0; spin_lock_irq(&device->resource->req_lock); list_for_each_entry(rs_req, &device->sync_ee, w.list) { if (overlaps(peer_req->i.sector, peer_req->i.size, rs_req->i.sector, rs_req->i.size)) { rv = 1; break; } } spin_unlock_irq(&device->resource->req_lock); return rv; } /* Called from receive_Data. * Synchronize packets on sock with packets on msock. * * This is here so even when a P_DATA packet traveling via sock overtook an Ack * packet traveling on msock, they are still processed in the order they have * been sent. * * Note: we don't care for Ack packets overtaking P_DATA packets. * * In case packet_seq is larger than device->peer_seq number, there are * outstanding packets on the msock. We wait for them to arrive. * In case we are the logically next packet, we update device->peer_seq * ourselves. Correctly handles 32bit wrap around. * * Assume we have a 10 GBit connection, that is about 1<<30 bytes per second, * about 1<<21 sectors per second.
So "worst" case, we have 1<<3 == 8 seconds * for the 24bit wrap (historical atomic_t guarantee on some archs), and we have * 1<<9 == 512 seconds aka ages for the 32bit wrap around... * * returns 0 if we may process the packet, * -ERESTARTSYS if we were interrupted (by disconnect signal). */ static int wait_for_and_update_peer_seq(struct drbd_peer_device *peer_device, const u32 peer_seq) { struct drbd_device *device = peer_device->device; DEFINE_WAIT(wait); long timeout; int ret = 0, tp; if (!test_bit(RESOLVE_CONFLICTS, &peer_device->connection->flags)) return 0; spin_lock(&device->peer_seq_lock); for (;;) { if (!seq_greater(peer_seq - 1, device->peer_seq)) { device->peer_seq = seq_max(device->peer_seq, peer_seq); break; } if (signal_pending(current)) { ret = -ERESTARTSYS; break; } rcu_read_lock(); tp = rcu_dereference(first_peer_device(device)->connection->net_conf)->two_primaries; rcu_read_unlock(); if (!tp) break; /* Only need to wait if two_primaries is enabled */ prepare_to_wait(&device->seq_wait, &wait, TASK_INTERRUPTIBLE); spin_unlock(&device->peer_seq_lock); rcu_read_lock(); timeout = rcu_dereference(peer_device->connection->net_conf)->ping_timeo*HZ/10; rcu_read_unlock(); timeout = schedule_timeout(timeout); spin_lock(&device->peer_seq_lock); if (!timeout) { ret = -ETIMEDOUT; drbd_err(device, "Timed out waiting for missing ack packets; disconnecting\n"); break; } } spin_unlock(&device->peer_seq_lock); finish_wait(&device->seq_wait, &wait); return ret; } /* see also bio_flags_to_wire() * DRBD_REQ_*, because we need to semantically map the flags to data packet * flags and back. We may replicate to other kernel versions. */ static unsigned long wire_flags_to_bio(struct drbd_connection *connection, u32 dpf) { if (connection->agreed_pro_version >= 95) return (dpf & DP_RW_SYNC ? DRBD_REQ_SYNC : 0) | (dpf & DP_UNPLUG ? DRBD_REQ_UNPLUG : 0) | (dpf & DP_FUA ? DRBD_REQ_FUA : 0) | (dpf & DP_FLUSH ? DRBD_REQ_FLUSH : 0) | (dpf & DP_DISCARD ? 
DRBD_REQ_DISCARD : 0); /* else: we used to communicate one bit only in older DRBD */ return dpf & DP_RW_SYNC ? (DRBD_REQ_SYNC | DRBD_REQ_UNPLUG) : 0; } static void fail_postponed_requests(struct drbd_device *device, sector_t sector, unsigned int size) { struct drbd_interval *i; repeat: drbd_for_each_overlap(i, &device->write_requests, sector, size) { struct drbd_request *req; struct bio_and_error m; if (!i->local) continue; req = container_of(i, struct drbd_request, i); if (!(req->rq_state & RQ_POSTPONED)) continue; req->rq_state &= ~RQ_POSTPONED; __req_mod(req, NEG_ACKED, &m); spin_unlock_irq(&device->resource->req_lock); if (m.bio) complete_master_bio(device, &m); spin_lock_irq(&device->resource->req_lock); goto repeat; } } static int handle_write_conflicts(struct drbd_device *device, struct drbd_peer_request *peer_req) { struct drbd_connection *connection = peer_req->peer_device->connection; bool resolve_conflicts = test_bit(RESOLVE_CONFLICTS, &connection->flags); sector_t sector = peer_req->i.sector; const unsigned int size = peer_req->i.size; struct drbd_interval *i; bool equal; int err; /* * Inserting the peer request into the write_requests tree will prevent * new conflicting local requests from being added. */ drbd_insert_interval(&device->write_requests, &peer_req->i); repeat: drbd_for_each_overlap(i, &device->write_requests, sector, size) { if (i == &peer_req->i) continue; if (!i->local) { /* * Our peer has sent a conflicting remote request; this * should not happen in a two-node setup. Wait for the * earlier peer request to complete. */ err = drbd_wait_misc(device, i); if (err) goto out; goto repeat; } equal = i->sector == sector && i->size == size; if (resolve_conflicts) { /* * If the peer request is fully contained within the * overlapping request, it can be considered overwritten * and thus superseded; otherwise, it will be retried * once all overlapping requests have completed. 
*/ bool superseded = i->sector <= sector && i->sector + (i->size >> 9) >= sector + (size >> 9); if (!equal) drbd_alert(device, "Concurrent writes detected: " "local=%llus +%u, remote=%llus +%u, " "assuming %s came first\n", (unsigned long long)i->sector, i->size, (unsigned long long)sector, size, superseded ? "local" : "remote"); inc_unacked(device); peer_req->w.cb = superseded ? e_send_superseded : e_send_retry_write; list_add_tail(&peer_req->w.list, &device->done_ee); wake_asender(connection); err = -ENOENT; goto out; } else { struct drbd_request *req = container_of(i, struct drbd_request, i); if (!equal) drbd_alert(device, "Concurrent writes detected: " "local=%llus +%u, remote=%llus +%u\n", (unsigned long long)i->sector, i->size, (unsigned long long)sector, size); if (req->rq_state & RQ_LOCAL_PENDING || !(req->rq_state & RQ_POSTPONED)) { /* * Wait for the node with the discard flag to * decide if this request has been superseded * or needs to be retried. * Requests that have been superseded will * disappear from the write_requests tree. * * In addition, wait for the conflicting * request to finish locally before submitting * the conflicting peer request. */ err = drbd_wait_misc(device, &req->i); if (err) { _conn_request_state(connection, NS(conn, C_TIMEOUT), CS_HARD); fail_postponed_requests(device, sector, size); goto out; } goto repeat; } /* * Remember to restart the conflicting requests after * the new peer request has completed. 
*/ peer_req->flags |= EE_RESTART_REQUESTS; } } err = 0; out: if (err) drbd_remove_epoch_entry_interval(device, peer_req); return err; } /* mirrored write */ static int receive_Data(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; sector_t sector; struct drbd_peer_request *peer_req; struct p_data *p = pi->data; u32 peer_seq = be32_to_cpu(p->seq_num); int rw = WRITE; u32 dp_flags; int err, tp; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; if (!get_ldev(device)) { int err2; err = wait_for_and_update_peer_seq(peer_device, peer_seq); drbd_send_ack_dp(peer_device, P_NEG_ACK, p, pi->size); atomic_inc(&connection->current_epoch->epoch_size); err2 = drbd_drain_block(peer_device, pi->size); if (!err) err = err2; return err; } /* * Corresponding put_ldev done either below (on various errors), or in * drbd_peer_request_endio, if we successfully submit the data at the * end of this function. */ sector = be64_to_cpu(p->sector); peer_req = read_in_block(peer_device, p->block_id, sector, pi); if (!peer_req) { put_ldev(device); return -EIO; } peer_req->w.cb = e_end_block; dp_flags = be32_to_cpu(p->dp_flags); rw |= wire_flags_to_bio(connection, dp_flags); if (pi->cmd == P_TRIM) { struct request_queue *q = bdev_get_queue(device->ldev->backing_bdev); peer_req->flags |= EE_IS_TRIM; if (!blk_queue_discard(q)) peer_req->flags |= EE_IS_TRIM_USE_ZEROOUT; D_ASSERT(peer_device, peer_req->i.size > 0); D_ASSERT(peer_device, rw & DRBD_REQ_DISCARD); D_ASSERT(peer_device, peer_req->pages == NULL); } else if (peer_req->pages == NULL) { D_ASSERT(device, peer_req->i.size == 0); D_ASSERT(device, dp_flags & DP_FLUSH); } if (dp_flags & DP_MAY_SET_IN_SYNC) peer_req->flags |= EE_MAY_SET_IN_SYNC; /* last "fixes" to rw flags. * Strip off BIO_RW_BARRIER unconditionally, * it is not supposed to be here anyways. 
* (Was FUA or FLUSH on the peer, * and got translated to BARRIER on this side). * Note that the epoch handling code below * may add it again, though. */ rw &= ~DRBD_REQ_HARDBARRIER; spin_lock(&connection->epoch_lock); peer_req->epoch = connection->current_epoch; atomic_inc(&peer_req->epoch->epoch_size); atomic_inc(&peer_req->epoch->active); if (first_peer_device(device)->connection->write_ordering == WO_bio_barrier && atomic_read(&peer_req->epoch->epoch_size) == 1) { struct drbd_epoch *epoch; /* Issue a barrier if we start a new epoch, and the previous epoch was not an epoch containing a single request which already was a Barrier. */ epoch = list_entry(peer_req->epoch->list.prev, struct drbd_epoch, list); if (epoch == peer_req->epoch) { set_bit(DE_CONTAINS_A_BARRIER, &peer_req->epoch->flags); rw |= DRBD_REQ_FLUSH | DRBD_REQ_FUA; peer_req->flags |= EE_IS_BARRIER; } else { if (atomic_read(&epoch->epoch_size) > 1 || !test_bit(DE_CONTAINS_A_BARRIER, &epoch->flags)) { set_bit(DE_BARRIER_IN_NEXT_EPOCH_ISSUED, &epoch->flags); set_bit(DE_CONTAINS_A_BARRIER, &peer_req->epoch->flags); rw |= DRBD_REQ_FLUSH | DRBD_REQ_FUA; peer_req->flags |= EE_IS_BARRIER; } } } spin_unlock(&connection->epoch_lock); rcu_read_lock(); tp = rcu_dereference(peer_device->connection->net_conf)->two_primaries; rcu_read_unlock(); if (tp) { peer_req->flags |= EE_IN_INTERVAL_TREE; err = wait_for_and_update_peer_seq(peer_device, peer_seq); if (err) goto out_interrupted; spin_lock_irq(&device->resource->req_lock); err = handle_write_conflicts(device, peer_req); if (err) { spin_unlock_irq(&device->resource->req_lock); if (err == -ENOENT) { put_ldev(device); return 0; } goto out_interrupted; } } else { update_peer_seq(peer_device, peer_seq); spin_lock_irq(&device->resource->req_lock); } /* if we use the zeroout fallback code, we process synchronously * and we wait for all pending requests, respectively wait for * active_ee to become empty in drbd_submit_peer_request(); * better not add ourselves here. 
*/ if ((peer_req->flags & EE_IS_TRIM_USE_ZEROOUT) == 0) list_add(&peer_req->w.list, &device->active_ee); spin_unlock_irq(&device->resource->req_lock); if (device->state.conn == C_SYNC_TARGET) wait_event(device->ee_wait, !overlapping_resync_write(device, peer_req)); if (peer_device->connection->agreed_pro_version < 100) { rcu_read_lock(); switch (rcu_dereference(peer_device->connection->net_conf)->wire_protocol) { case DRBD_PROT_C: dp_flags |= DP_SEND_WRITE_ACK; break; case DRBD_PROT_B: dp_flags |= DP_SEND_RECEIVE_ACK; break; } rcu_read_unlock(); } if (dp_flags & DP_SEND_WRITE_ACK) { peer_req->flags |= EE_SEND_WRITE_ACK; inc_unacked(device); /* corresponding dec_unacked() in e_end_block() * respective _drbd_clear_done_ee */ } if (dp_flags & DP_SEND_RECEIVE_ACK) { /* I really don't like it that the receiver thread * sends on the msock, but anyways */ drbd_send_ack(first_peer_device(device), P_RECV_ACK, peer_req); } if (device->state.pdsk < D_INCONSISTENT) { /* In case we have the only disk of the cluster, */ drbd_set_out_of_sync(device, peer_req->i.sector, peer_req->i.size); peer_req->flags |= EE_CALL_AL_COMPLETE_IO; peer_req->flags &= ~EE_MAY_SET_IN_SYNC; drbd_al_begin_io(device, &peer_req->i, true); } err = drbd_submit_peer_request(device, peer_req, rw, DRBD_FAULT_DT_WR); if (!err) return 0; /* don't care for the reason here */ drbd_err(device, "submit failed, triggering re-connect\n"); spin_lock_irq(&device->resource->req_lock); list_del(&peer_req->w.list); drbd_remove_epoch_entry_interval(device, peer_req); spin_unlock_irq(&device->resource->req_lock); if (peer_req->flags & EE_CALL_AL_COMPLETE_IO) drbd_al_complete_io(device, &peer_req->i); out_interrupted: drbd_may_finish_epoch(connection, peer_req->epoch, EV_PUT + EV_CLEANUP); put_ldev(device); drbd_free_peer_req(device, peer_req); return err; } /* We may throttle resync, if the lower device seems to be busy, * and current sync rate is above c_min_rate. 
* * To decide whether or not the lower device is busy, we use a scheme similar * to MD RAID is_mddev_idle(): if the partition stats reveal "significant" * (more than 64 sectors) of activity we cannot account for with our own resync * activity, it obviously is "busy". * * The current sync rate used here uses only the most recent two step marks, * to have a short time average so we can react faster. */ bool drbd_rs_should_slow_down(struct drbd_device *device, sector_t sector) { struct lc_element *tmp; bool throttle = true; if (!drbd_rs_c_min_rate_throttle(device)) return false; spin_lock_irq(&device->al_lock); tmp = lc_find(device->resync, BM_SECT_TO_EXT(sector)); if (tmp) { struct bm_extent *bm_ext = lc_entry(tmp, struct bm_extent, lce); if (test_bit(BME_PRIORITY, &bm_ext->flags)) throttle = false; /* Do not slow down if app IO is already waiting for this extent */ } spin_unlock_irq(&device->al_lock); return throttle; } bool drbd_rs_c_min_rate_throttle(struct drbd_device *device) { unsigned long db, dt, dbdt; unsigned int c_min_rate; int curr_events; rcu_read_lock(); c_min_rate = rcu_dereference(device->ldev->disk_conf)->c_min_rate; rcu_read_unlock(); /* feature disabled? */ if (c_min_rate == 0) return false; curr_events = drbd_backing_bdev_events(device->ldev->backing_bdev->bd_contains->bd_disk) - atomic_read(&device->rs_sect_ev); if (!device->rs_last_events || curr_events - device->rs_last_events > 64) { unsigned long rs_left; int i; device->rs_last_events = curr_events; /* sync speed average over the last 2*DRBD_SYNC_MARK_STEP, * approx. 
*/ i = (device->rs_last_mark + DRBD_SYNC_MARKS-1) % DRBD_SYNC_MARKS; if (device->state.conn == C_VERIFY_S || device->state.conn == C_VERIFY_T) rs_left = device->ov_left; else rs_left = drbd_bm_total_weight(device) - device->rs_failed; dt = ((long)jiffies - (long)device->rs_mark_time[i]) / HZ; if (!dt) dt++; db = device->rs_mark_left[i] - rs_left; dbdt = Bit2KB(db/dt); if (dbdt > c_min_rate) return true; } return false; } static int receive_DataRequest(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; sector_t sector; sector_t capacity; struct drbd_peer_request *peer_req; struct digest_info *di = NULL; int size, verb; unsigned int fault_type; struct p_block_req *p = pi->data; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; capacity = drbd_get_capacity(device->this_bdev); sector = be64_to_cpu(p->sector); size = be32_to_cpu(p->blksize); if (size <= 0 || !IS_ALIGNED(size, 512) || size > DRBD_MAX_BIO_SIZE) { drbd_err(device, "%s:%d: sector: %llus, size: %u\n", __FILE__, __LINE__, (unsigned long long)sector, size); return -EINVAL; } if (sector + (size>>9) > capacity) { drbd_err(device, "%s:%d: sector: %llus, size: %u\n", __FILE__, __LINE__, (unsigned long long)sector, size); return -EINVAL; } if (!get_ldev_if_state(device, D_UP_TO_DATE)) { verb = 1; switch (pi->cmd) { case P_DATA_REQUEST: drbd_send_ack_rp(peer_device, P_NEG_DREPLY, p); break; case P_RS_DATA_REQUEST: case P_CSUM_RS_REQUEST: case P_OV_REQUEST: drbd_send_ack_rp(peer_device, P_NEG_RS_DREPLY , p); break; case P_OV_REPLY: verb = 0; dec_rs_pending(device); drbd_send_ack_ex(peer_device, P_OV_RESULT, sector, size, ID_IN_SYNC); break; default: BUG(); } if (verb && DRBD_ratelimit(5*HZ, 5)) drbd_err(device, "Can not satisfy peer's read request, " "no local data.\n"); /* drain possibly payload */ return drbd_drain_block(peer_device, pi->size); } /* GFP_NOIO, because we must 
not cause arbitrary write-out: in a DRBD * "criss-cross" setup, that might cause write-out on some other DRBD, * which in turn might block on the other node at this very place. */ peer_req = drbd_alloc_peer_req(peer_device, p->block_id, sector, size, true /* has real payload */, GFP_NOIO); if (!peer_req) { put_ldev(device); return -ENOMEM; } switch (pi->cmd) { case P_DATA_REQUEST: peer_req->w.cb = w_e_end_data_req; fault_type = DRBD_FAULT_DT_RD; /* application IO, don't drbd_rs_begin_io */ goto submit; case P_RS_DATA_REQUEST: peer_req->w.cb = w_e_end_rsdata_req; fault_type = DRBD_FAULT_RS_RD; /* used in the sector offset progress display */ device->bm_resync_fo = BM_SECT_TO_BIT(sector); break; case P_OV_REPLY: case P_CSUM_RS_REQUEST: fault_type = DRBD_FAULT_RS_RD; di = kmalloc(sizeof(*di) + pi->size, GFP_NOIO); if (!di) goto out_free_e; di->digest_size = pi->size; di->digest = (((char *)di)+sizeof(struct digest_info)); peer_req->digest = di; peer_req->flags |= EE_HAS_DIGEST; if (drbd_recv_all(peer_device->connection, di->digest, pi->size)) goto out_free_e; if (pi->cmd == P_CSUM_RS_REQUEST) { D_ASSERT(device, peer_device->connection->agreed_pro_version >= 89); peer_req->w.cb = w_e_end_csum_rs_req; /* used in the sector offset progress display */ device->bm_resync_fo = BM_SECT_TO_BIT(sector); } else if (pi->cmd == P_OV_REPLY) { /* track progress, we may need to throttle */ atomic_add(size >> 9, &device->rs_sect_in); peer_req->w.cb = w_e_end_ov_reply; dec_rs_pending(device); /* drbd_rs_begin_io done when we sent this request, * but accounting still needs to be done. 
*/ goto submit_for_resync; } break; case P_OV_REQUEST: if (device->ov_start_sector == ~(sector_t)0 && peer_device->connection->agreed_pro_version >= 90) { unsigned long now = jiffies; int i; device->ov_start_sector = sector; device->ov_position = sector; device->ov_left = drbd_bm_bits(device) - BM_SECT_TO_BIT(sector); device->rs_total = device->ov_left; for (i = 0; i < DRBD_SYNC_MARKS; i++) { device->rs_mark_left[i] = device->ov_left; device->rs_mark_time[i] = now; } drbd_info(device, "Online Verify start sector: %llu\n", (unsigned long long)sector); } peer_req->w.cb = w_e_end_ov_req; fault_type = DRBD_FAULT_RS_RD; break; default: BUG(); } /* Throttle, drbd_rs_begin_io and submit should become asynchronous * wrt the receiver, but it is not as straightforward as it may seem. * Various places in the resync start and stop logic assume resync * requests are processed in order, requeuing this on the worker thread * introduces a bunch of new code for synchronization between threads. * * Unlimited throttling before drbd_rs_begin_io may stall the resync * "forever", throttling after drbd_rs_begin_io will lock that extent * for application writes for the same time. For now, just throttle * here, where the rest of the code expects the receiver to sleep for * a while, anyways. */ /* Throttle before drbd_rs_begin_io, as that locks out application IO; * this defers syncer requests for some time, before letting at least * one request through. The resync controller on the receiving side * will adapt to the incoming rate accordingly. * * We cannot throttle here if remote is Primary/SyncTarget: * we would also throttle its application reads. * In that case, throttling is done on the SyncTarget only. 
*/ if (device->state.peer != R_PRIMARY && drbd_rs_should_slow_down(device, sector)) schedule_timeout_uninterruptible(HZ/10); if (drbd_rs_begin_io(device, sector)) goto out_free_e; submit_for_resync: atomic_add(size >> 9, &device->rs_sect_ev); submit: inc_unacked(device); spin_lock_irq(&device->resource->req_lock); list_add_tail(&peer_req->w.list, &device->read_ee); spin_unlock_irq(&device->resource->req_lock); if (drbd_submit_peer_request(device, peer_req, READ, fault_type) == 0) return 0; /* don't care for the reason here */ drbd_err(device, "submit failed, triggering re-connect\n"); spin_lock_irq(&device->resource->req_lock); list_del(&peer_req->w.list); spin_unlock_irq(&device->resource->req_lock); /* no drbd_rs_complete_io(), we are dropping the connection anyways */ out_free_e: put_ldev(device); drbd_free_peer_req(device, peer_req); return -EIO; } /** * drbd_asb_recover_0p - Recover after split-brain with no remaining primaries */ static int drbd_asb_recover_0p(struct drbd_peer_device *peer_device) __must_hold(local) { struct drbd_device *device = peer_device->device; int self, peer, rv = -100; unsigned long ch_self, ch_peer; enum drbd_after_sb_p after_sb_0p; self = device->ldev->md.uuid[UI_BITMAP] & 1; peer = device->p_uuid[UI_BITMAP] & 1; ch_peer = device->p_uuid[UI_SIZE]; ch_self = device->comm_bm_set; rcu_read_lock(); after_sb_0p = rcu_dereference(peer_device->connection->net_conf)->after_sb_0p; rcu_read_unlock(); switch (after_sb_0p) { case ASB_CONSENSUS: case ASB_DISCARD_SECONDARY: case ASB_CALL_HELPER: case ASB_VIOLENTLY: drbd_err(device, "Configuration error.\n"); break; case ASB_DISCONNECT: break; case ASB_DISCARD_YOUNGER_PRI: if (self == 0 && peer == 1) { rv = -1; break; } if (self == 1 && peer == 0) { rv = 1; break; } /* Else fall through to one of the other strategies... */ case ASB_DISCARD_OLDER_PRI: if (self == 0 && peer == 1) { rv = 1; break; } if (self == 1 && peer == 0) { rv = -1; break; } /* Else fall through to one of the other strategies... 
*/ drbd_warn(device, "Discard younger/older primary did not find a decision\n" "Using discard-least-changes instead\n"); case ASB_DISCARD_ZERO_CHG: if (ch_peer == 0 && ch_self == 0) { rv = test_bit(RESOLVE_CONFLICTS, &peer_device->connection->flags) ? -1 : 1; break; } else { if (ch_peer == 0) { rv = 1; break; } if (ch_self == 0) { rv = -1; break; } } if (after_sb_0p == ASB_DISCARD_ZERO_CHG) break; case ASB_DISCARD_LEAST_CHG: if (ch_self < ch_peer) rv = -1; else if (ch_self > ch_peer) rv = 1; else /* ( ch_self == ch_peer ) */ /* Well, then use something else. */ rv = test_bit(RESOLVE_CONFLICTS, &peer_device->connection->flags) ? -1 : 1; break; case ASB_DISCARD_LOCAL: rv = -1; break; case ASB_DISCARD_REMOTE: rv = 1; } return rv; } /** * drbd_asb_recover_1p - Recover after split-brain with one remaining primary */ static int drbd_asb_recover_1p(struct drbd_peer_device *peer_device) __must_hold(local) { struct drbd_device *device = peer_device->device; int hg, rv = -100; enum drbd_after_sb_p after_sb_1p; rcu_read_lock(); after_sb_1p = rcu_dereference(peer_device->connection->net_conf)->after_sb_1p; rcu_read_unlock(); switch (after_sb_1p) { case ASB_DISCARD_YOUNGER_PRI: case ASB_DISCARD_OLDER_PRI: case ASB_DISCARD_LEAST_CHG: case ASB_DISCARD_LOCAL: case ASB_DISCARD_REMOTE: case ASB_DISCARD_ZERO_CHG: drbd_err(device, "Configuration error.\n"); break; case ASB_DISCONNECT: break; case ASB_CONSENSUS: hg = drbd_asb_recover_0p(peer_device); if (hg == -1 && device->state.role == R_SECONDARY) rv = hg; if (hg == 1 && device->state.role == R_PRIMARY) rv = hg; break; case ASB_VIOLENTLY: rv = drbd_asb_recover_0p(peer_device); break; case ASB_DISCARD_SECONDARY: return device->state.role == R_PRIMARY ? 1 : -1; case ASB_CALL_HELPER: hg = drbd_asb_recover_0p(peer_device); if (hg == -1 && device->state.role == R_PRIMARY) { enum drbd_state_rv rv2; /* drbd_change_state() does not sleep while in SS_IN_TRANSIENT_STATE, * we might be here in C_WF_REPORT_PARAMS which is transient. 
* we do not need to wait for the after state change work either. */ rv2 = drbd_change_state(device, CS_VERBOSE, NS(role, R_SECONDARY)); if (rv2 != SS_SUCCESS) { drbd_khelper(device, "pri-lost-after-sb"); } else { drbd_warn(device, "Successfully gave up primary role.\n"); rv = hg; } } else rv = hg; } return rv; } /** * drbd_asb_recover_2p - Recover after split-brain with two remaining primaries */ static int drbd_asb_recover_2p(struct drbd_peer_device *peer_device) __must_hold(local) { struct drbd_device *device = peer_device->device; int hg, rv = -100; enum drbd_after_sb_p after_sb_2p; rcu_read_lock(); after_sb_2p = rcu_dereference(peer_device->connection->net_conf)->after_sb_2p; rcu_read_unlock(); switch (after_sb_2p) { case ASB_DISCARD_YOUNGER_PRI: case ASB_DISCARD_OLDER_PRI: case ASB_DISCARD_LEAST_CHG: case ASB_DISCARD_LOCAL: case ASB_DISCARD_REMOTE: case ASB_CONSENSUS: case ASB_DISCARD_SECONDARY: case ASB_DISCARD_ZERO_CHG: drbd_err(device, "Configuration error.\n"); break; case ASB_VIOLENTLY: rv = drbd_asb_recover_0p(peer_device); break; case ASB_DISCONNECT: break; case ASB_CALL_HELPER: hg = drbd_asb_recover_0p(peer_device); if (hg == -1) { enum drbd_state_rv rv2; /* drbd_change_state() does not sleep while in SS_IN_TRANSIENT_STATE, * we might be here in C_WF_REPORT_PARAMS which is transient. * we do not need to wait for the after state change work either. 
*/ rv2 = drbd_change_state(device, CS_VERBOSE, NS(role, R_SECONDARY)); if (rv2 != SS_SUCCESS) { drbd_khelper(device, "pri-lost-after-sb"); } else { drbd_warn(device, "Successfully gave up primary role.\n"); rv = hg; } } else rv = hg; } return rv; } static void drbd_uuid_dump(struct drbd_device *device, char *text, u64 *uuid, u64 bits, u64 flags) { if (!uuid) { drbd_info(device, "%s uuid info vanished while I was looking!\n", text); return; } drbd_info(device, "%s %016llX:%016llX:%016llX:%016llX bits:%llu flags:%llX\n", text, (unsigned long long)uuid[UI_CURRENT], (unsigned long long)uuid[UI_BITMAP], (unsigned long long)uuid[UI_HISTORY_START], (unsigned long long)uuid[UI_HISTORY_END], (unsigned long long)bits, (unsigned long long)flags); } /* 100 after split brain try auto recover 2 C_SYNC_SOURCE set BitMap 1 C_SYNC_SOURCE use BitMap 0 no Sync -1 C_SYNC_TARGET use BitMap -2 C_SYNC_TARGET set BitMap -100 after split brain, disconnect -1000 unrelated data -1091 requires proto 91 -1096 requires proto 96 */ static int drbd_uuid_compare(struct drbd_device *device, int *rule_nr) __must_hold(local) { u64 self, peer; int i, j; self = device->ldev->md.uuid[UI_CURRENT] & ~((u64)1); peer = device->p_uuid[UI_CURRENT] & ~((u64)1); *rule_nr = 10; if (self == UUID_JUST_CREATED && peer == UUID_JUST_CREATED) return 0; *rule_nr = 20; if ((self == UUID_JUST_CREATED || self == (u64)0) && peer != UUID_JUST_CREATED) return -2; *rule_nr = 30; if (self != UUID_JUST_CREATED && (peer == UUID_JUST_CREATED || peer == (u64)0)) return 2; if (self == peer) { int rct, dc; /* roles at crash time */ if (device->p_uuid[UI_BITMAP] == (u64)0 && device->ldev->md.uuid[UI_BITMAP] != (u64)0) { if (first_peer_device(device)->connection->agreed_pro_version < 91) return -1091; if ((device->ldev->md.uuid[UI_BITMAP] & ~((u64)1)) == (device->p_uuid[UI_HISTORY_START] & ~((u64)1)) && (device->ldev->md.uuid[UI_HISTORY_START] & ~((u64)1)) == (device->p_uuid[UI_HISTORY_START + 1] & ~((u64)1))) { drbd_info(device, "was 
SyncSource, missed the resync finished event, corrected myself:\n"); drbd_uuid_move_history(device); device->ldev->md.uuid[UI_HISTORY_START] = device->ldev->md.uuid[UI_BITMAP]; device->ldev->md.uuid[UI_BITMAP] = 0; drbd_uuid_dump(device, "self", device->ldev->md.uuid, device->state.disk >= D_NEGOTIATING ? drbd_bm_total_weight(device) : 0, 0); *rule_nr = 34; } else { drbd_info(device, "was SyncSource (peer failed to write sync_uuid)\n"); *rule_nr = 36; } return 1; } if (device->ldev->md.uuid[UI_BITMAP] == (u64)0 && device->p_uuid[UI_BITMAP] != (u64)0) { if (first_peer_device(device)->connection->agreed_pro_version < 91) return -1091; if ((device->ldev->md.uuid[UI_HISTORY_START] & ~((u64)1)) == (device->p_uuid[UI_BITMAP] & ~((u64)1)) && (device->ldev->md.uuid[UI_HISTORY_START + 1] & ~((u64)1)) == (device->p_uuid[UI_HISTORY_START] & ~((u64)1))) { drbd_info(device, "was SyncTarget, peer missed the resync finished event, corrected peer:\n"); device->p_uuid[UI_HISTORY_START + 1] = device->p_uuid[UI_HISTORY_START]; device->p_uuid[UI_HISTORY_START] = device->p_uuid[UI_BITMAP]; device->p_uuid[UI_BITMAP] = 0UL; drbd_uuid_dump(device, "peer", device->p_uuid, device->p_uuid[UI_SIZE], device->p_uuid[UI_FLAGS]); *rule_nr = 35; } else { drbd_info(device, "was SyncTarget (failed to write sync_uuid)\n"); *rule_nr = 37; } return -1; } /* Common power [off|failure] */ rct = (test_bit(CRASHED_PRIMARY, &device->flags) ? 1 : 0) + (device->p_uuid[UI_FLAGS] & 2); /* lowest bit is set when we were primary, * next bit (weight 2) is set when peer was primary */ *rule_nr = 40; switch (rct) { case 0: /* !self_pri && !peer_pri */ return 0; case 1: /* self_pri && !peer_pri */ return 1; case 2: /* !self_pri && peer_pri */ return -1; case 3: /* self_pri && peer_pri */ dc = test_bit(RESOLVE_CONFLICTS, &first_peer_device(device)->connection->flags); return dc ? 
-1 : 1; } } *rule_nr = 50; peer = device->p_uuid[UI_BITMAP] & ~((u64)1); if (self == peer) return -1; *rule_nr = 51; peer = device->p_uuid[UI_HISTORY_START] & ~((u64)1); if (self == peer) { if (first_peer_device(device)->connection->agreed_pro_version < 96 ? (device->ldev->md.uuid[UI_HISTORY_START] & ~((u64)1)) == (device->p_uuid[UI_HISTORY_START + 1] & ~((u64)1)) : peer + UUID_NEW_BM_OFFSET == (device->p_uuid[UI_BITMAP] & ~((u64)1))) { /* The last P_SYNC_UUID did not get through. Undo the last start of resync as sync source modifications of the peer's UUIDs. */ if (first_peer_device(device)->connection->agreed_pro_version < 91) return -1091; device->p_uuid[UI_BITMAP] = device->p_uuid[UI_HISTORY_START]; device->p_uuid[UI_HISTORY_START] = device->p_uuid[UI_HISTORY_START + 1]; drbd_info(device, "Lost last syncUUID packet, corrected:\n"); drbd_uuid_dump(device, "peer", device->p_uuid, device->p_uuid[UI_SIZE], device->p_uuid[UI_FLAGS]); return -1; } } *rule_nr = 60; self = device->ldev->md.uuid[UI_CURRENT] & ~((u64)1); for (i = UI_HISTORY_START; i <= UI_HISTORY_END; i++) { peer = device->p_uuid[i] & ~((u64)1); if (self == peer) return -2; } *rule_nr = 70; self = device->ldev->md.uuid[UI_BITMAP] & ~((u64)1); peer = device->p_uuid[UI_CURRENT] & ~((u64)1); if (self == peer) return 1; *rule_nr = 71; self = device->ldev->md.uuid[UI_HISTORY_START] & ~((u64)1); if (self == peer) { if (first_peer_device(device)->connection->agreed_pro_version < 96 ? (device->ldev->md.uuid[UI_HISTORY_START + 1] & ~((u64)1)) == (device->p_uuid[UI_HISTORY_START] & ~((u64)1)) : self + UUID_NEW_BM_OFFSET == (device->ldev->md.uuid[UI_BITMAP] & ~((u64)1))) { /* The last P_SYNC_UUID did not get through. Undo the last start of resync as sync source modifications of our UUIDs. 
*/ if (first_peer_device(device)->connection->agreed_pro_version < 91) return -1091; __drbd_uuid_set(device, UI_BITMAP, device->ldev->md.uuid[UI_HISTORY_START]); __drbd_uuid_set(device, UI_HISTORY_START, device->ldev->md.uuid[UI_HISTORY_START + 1]); drbd_info(device, "Last syncUUID did not get through, corrected:\n"); drbd_uuid_dump(device, "self", device->ldev->md.uuid, device->state.disk >= D_NEGOTIATING ? drbd_bm_total_weight(device) : 0, 0); return 1; } } *rule_nr = 80; peer = device->p_uuid[UI_CURRENT] & ~((u64)1); for (i = UI_HISTORY_START; i <= UI_HISTORY_END; i++) { self = device->ldev->md.uuid[i] & ~((u64)1); if (self == peer) return 2; } *rule_nr = 90; self = device->ldev->md.uuid[UI_BITMAP] & ~((u64)1); peer = device->p_uuid[UI_BITMAP] & ~((u64)1); if (self == peer && self != ((u64)0)) return 100; *rule_nr = 100; for (i = UI_HISTORY_START; i <= UI_HISTORY_END; i++) { self = device->ldev->md.uuid[i] & ~((u64)1); for (j = UI_HISTORY_START; j <= UI_HISTORY_END; j++) { peer = device->p_uuid[j] & ~((u64)1); if (self == peer) return -100; } } return -1000; } /* drbd_sync_handshake() returns the new conn state on success, or CONN_MASK (-1) on failure. 
*/ static enum drbd_conns drbd_sync_handshake(struct drbd_peer_device *peer_device, enum drbd_role peer_role, enum drbd_disk_state peer_disk) __must_hold(local) { struct drbd_device *device = peer_device->device; enum drbd_conns rv = C_MASK; enum drbd_disk_state mydisk; struct net_conf *nc; int hg, rule_nr, rr_conflict, tentative; mydisk = device->state.disk; if (mydisk == D_NEGOTIATING) mydisk = device->new_state_tmp.disk; drbd_info(device, "drbd_sync_handshake:\n"); spin_lock_irq(&device->ldev->md.uuid_lock); drbd_uuid_dump(device, "self", device->ldev->md.uuid, device->comm_bm_set, 0); drbd_uuid_dump(device, "peer", device->p_uuid, device->p_uuid[UI_SIZE], device->p_uuid[UI_FLAGS]); hg = drbd_uuid_compare(device, &rule_nr); spin_unlock_irq(&device->ldev->md.uuid_lock); drbd_info(device, "uuid_compare()=%d by rule %d\n", hg, rule_nr); if (hg == -1000) { drbd_alert(device, "Unrelated data, aborting!\n"); return C_MASK; } if (hg < -1000) { drbd_alert(device, "To resolve this both sides have to support at least protocol %d\n", -hg - 1000); return C_MASK; } if ((mydisk == D_INCONSISTENT && peer_disk > D_INCONSISTENT) || (peer_disk == D_INCONSISTENT && mydisk > D_INCONSISTENT)) { int f = (hg == -100) || abs(hg) == 2; hg = mydisk > D_INCONSISTENT ? 1 : -1; if (f) hg = hg*2; drbd_info(device, "Becoming sync %s due to disk states.\n", hg > 0 ? "source" : "target"); } if (abs(hg) == 100) drbd_khelper(device, "initial-split-brain"); rcu_read_lock(); nc = rcu_dereference(peer_device->connection->net_conf); if (hg == 100 || (hg == -100 && nc->always_asbp)) { int pcount = (device->state.role == R_PRIMARY) + (peer_role == R_PRIMARY); int forced = (hg == -100); switch (pcount) { case 0: hg = drbd_asb_recover_0p(peer_device); break; case 1: hg = drbd_asb_recover_1p(peer_device); break; case 2: hg = drbd_asb_recover_2p(peer_device); break; } if (abs(hg) < 100) { drbd_warn(device, "Split-Brain detected, %d primaries, " "automatically solved. 
Sync from %s node\n", pcount, (hg < 0) ? "peer" : "this"); if (forced) { drbd_warn(device, "Doing a full sync, since" " UUIDs were ambiguous.\n"); hg = hg*2; } } } if (hg == -100) { if (test_bit(DISCARD_MY_DATA, &device->flags) && !(device->p_uuid[UI_FLAGS]&1)) hg = -1; if (!test_bit(DISCARD_MY_DATA, &device->flags) && (device->p_uuid[UI_FLAGS]&1)) hg = 1; if (abs(hg) < 100) drbd_warn(device, "Split-Brain detected, manually solved. " "Sync from %s node\n", (hg < 0) ? "peer" : "this"); } rr_conflict = nc->rr_conflict; tentative = nc->tentative; rcu_read_unlock(); if (hg == -100) { /* FIXME this log message is not correct if we end up here * after an attempted attach on a diskless node. * We just refuse to attach -- well, we drop the "connection" * to that disk, in a way... */ drbd_alert(device, "Split-Brain detected but unresolved, dropping connection!\n"); drbd_khelper(device, "split-brain"); return C_MASK; } if (hg > 0 && mydisk <= D_INCONSISTENT) { drbd_err(device, "I shall become SyncSource, but I am inconsistent!\n"); return C_MASK; } if (hg < 0 && /* by intention we do not use mydisk here. */ device->state.role == R_PRIMARY && device->state.disk >= D_CONSISTENT) { switch (rr_conflict) { case ASB_CALL_HELPER: drbd_khelper(device, "pri-lost"); /* fall through */ case ASB_DISCONNECT: drbd_err(device, "I shall become SyncTarget, but I am primary!\n"); return C_MASK; case ASB_VIOLENTLY: drbd_warn(device, "Becoming SyncTarget, violating the stable-data " "assumption\n"); } } if (tentative || test_bit(CONN_DRY_RUN, &peer_device->connection->flags)) { if (hg == 0) drbd_info(device, "dry-run connect: No resync, would become Connected immediately.\n"); else drbd_info(device, "dry-run connect: Would become %s, doing a %s resync.\n", drbd_conn_str(hg > 0 ? C_SYNC_SOURCE : C_SYNC_TARGET), abs(hg) >= 2 ? 
"full" : "bit-map based"); return C_MASK; } if (abs(hg) >= 2) { drbd_info(device, "Writing the whole bitmap, full sync required after drbd_sync_handshake.\n"); if (drbd_bitmap_io(device, &drbd_bmio_set_n_write, "set_n_write from sync_handshake", BM_LOCKED_SET_ALLOWED)) return C_MASK; } if (hg > 0) { /* become sync source. */ rv = C_WF_BITMAP_S; } else if (hg < 0) { /* become sync target */ rv = C_WF_BITMAP_T; } else { rv = C_CONNECTED; if (drbd_bm_total_weight(device)) { drbd_info(device, "No resync, but %lu bits in bitmap!\n", drbd_bm_total_weight(device)); } } return rv; } static enum drbd_after_sb_p convert_after_sb(enum drbd_after_sb_p peer) { /* ASB_DISCARD_REMOTE - ASB_DISCARD_LOCAL is valid */ if (peer == ASB_DISCARD_REMOTE) return ASB_DISCARD_LOCAL; /* any other things with ASB_DISCARD_REMOTE or ASB_DISCARD_LOCAL are invalid */ if (peer == ASB_DISCARD_LOCAL) return ASB_DISCARD_REMOTE; /* everything else is valid if they are equal on both sides. */ return peer; } static int receive_protocol(struct drbd_connection *connection, struct packet_info *pi) { struct p_protocol *p = pi->data; enum drbd_after_sb_p p_after_sb_0p, p_after_sb_1p, p_after_sb_2p; int p_proto, p_discard_my_data, p_two_primaries, cf; struct net_conf *nc, *old_net_conf, *new_net_conf = NULL; char integrity_alg[SHARED_SECRET_MAX] = ""; struct crypto_hash *peer_integrity_tfm = NULL; void *int_dig_in = NULL, *int_dig_vv = NULL; p_proto = be32_to_cpu(p->protocol); p_after_sb_0p = be32_to_cpu(p->after_sb_0p); p_after_sb_1p = be32_to_cpu(p->after_sb_1p); p_after_sb_2p = be32_to_cpu(p->after_sb_2p); p_two_primaries = be32_to_cpu(p->two_primaries); cf = be32_to_cpu(p->conn_flags); p_discard_my_data = cf & CF_DISCARD_MY_DATA; if (connection->agreed_pro_version >= 87) { int err; if (pi->size > sizeof(integrity_alg)) return -EIO; err = drbd_recv_all(connection, integrity_alg, pi->size); if (err) return err; integrity_alg[SHARED_SECRET_MAX - 1] = 0; } if (pi->cmd != P_PROTOCOL_UPDATE) { 
clear_bit(CONN_DRY_RUN, &connection->flags); if (cf & CF_DRY_RUN) set_bit(CONN_DRY_RUN, &connection->flags); rcu_read_lock(); nc = rcu_dereference(connection->net_conf); if (p_proto != nc->wire_protocol) { drbd_err(connection, "incompatible %s settings\n", "protocol"); goto disconnect_rcu_unlock; } if (convert_after_sb(p_after_sb_0p) != nc->after_sb_0p) { drbd_err(connection, "incompatible %s settings\n", "after-sb-0pri"); goto disconnect_rcu_unlock; } if (convert_after_sb(p_after_sb_1p) != nc->after_sb_1p) { drbd_err(connection, "incompatible %s settings\n", "after-sb-1pri"); goto disconnect_rcu_unlock; } if (convert_after_sb(p_after_sb_2p) != nc->after_sb_2p) { drbd_err(connection, "incompatible %s settings\n", "after-sb-2pri"); goto disconnect_rcu_unlock; } if (p_discard_my_data && nc->discard_my_data) { drbd_err(connection, "incompatible %s settings\n", "discard-my-data"); goto disconnect_rcu_unlock; } if (p_two_primaries != nc->two_primaries) { drbd_err(connection, "incompatible %s settings\n", "allow-two-primaries"); goto disconnect_rcu_unlock; } if (strcmp(integrity_alg, nc->integrity_alg)) { drbd_err(connection, "incompatible %s settings\n", "data-integrity-alg"); goto disconnect_rcu_unlock; } rcu_read_unlock(); } if (integrity_alg[0]) { int hash_size; /* * We can only change the peer data integrity algorithm * here. Changing our own data integrity algorithm * requires that we send a P_PROTOCOL_UPDATE packet at * the same time; otherwise, the peer has no way to * tell between which packets the algorithm should * change. 
*/ peer_integrity_tfm = crypto_alloc_hash(integrity_alg, 0, CRYPTO_ALG_ASYNC); if (IS_ERR(peer_integrity_tfm)) { /* crypto_alloc_hash() reports failure via ERR_PTR(), not NULL; * reset the pointer so the disconnect path does not free an * error pointer. */ peer_integrity_tfm = NULL; drbd_err(connection, "peer data-integrity-alg %s not supported\n", integrity_alg); goto disconnect; } hash_size = crypto_hash_digestsize(peer_integrity_tfm); int_dig_in = kmalloc(hash_size, GFP_KERNEL); int_dig_vv = kmalloc(hash_size, GFP_KERNEL); if (!(int_dig_in && int_dig_vv)) { drbd_err(connection, "Allocation of buffers for data integrity checking failed\n"); goto disconnect; } } new_net_conf = kmalloc(sizeof(struct net_conf), GFP_KERNEL); if (!new_net_conf) { drbd_err(connection, "Allocation of new net_conf failed\n"); goto disconnect; } mutex_lock(&connection->data.mutex); mutex_lock(&connection->resource->conf_update); old_net_conf = connection->net_conf; *new_net_conf = *old_net_conf; new_net_conf->wire_protocol = p_proto; new_net_conf->after_sb_0p = convert_after_sb(p_after_sb_0p); new_net_conf->after_sb_1p = convert_after_sb(p_after_sb_1p); new_net_conf->after_sb_2p = convert_after_sb(p_after_sb_2p); new_net_conf->two_primaries = p_two_primaries; rcu_assign_pointer(connection->net_conf, new_net_conf); mutex_unlock(&connection->resource->conf_update); mutex_unlock(&connection->data.mutex); crypto_free_hash(connection->peer_integrity_tfm); kfree(connection->int_dig_in); kfree(connection->int_dig_vv); connection->peer_integrity_tfm = peer_integrity_tfm; connection->int_dig_in = int_dig_in; connection->int_dig_vv = int_dig_vv; if (strcmp(old_net_conf->integrity_alg, integrity_alg)) drbd_info(connection, "peer data-integrity-alg: %s\n", integrity_alg[0] ? 
integrity_alg : "(none)"); synchronize_rcu(); kfree(old_net_conf); return 0; disconnect_rcu_unlock: rcu_read_unlock(); disconnect: crypto_free_hash(peer_integrity_tfm); kfree(int_dig_in); kfree(int_dig_vv); conn_request_state(connection, NS(conn, C_DISCONNECTING), CS_HARD); return -EIO; } /* helper function * input: alg name, feature name * return: NULL (alg name was "") * ERR_PTR(error) if something goes wrong * or the crypto hash ptr, if it worked out ok. */ struct crypto_hash *drbd_crypto_alloc_digest_safe(const struct drbd_device *device, const char *alg, const char *name) { struct crypto_hash *tfm; if (!alg[0]) return NULL; tfm = crypto_alloc_hash(alg, 0, CRYPTO_ALG_ASYNC); if (IS_ERR(tfm)) { drbd_err(device, "Can not allocate \"%s\" as %s (reason: %ld)\n", alg, name, PTR_ERR(tfm)); return tfm; } return tfm; } static int ignore_remaining_packet(struct drbd_connection *connection, struct packet_info *pi) { void *buffer = connection->data.rbuf; int size = pi->size; while (size) { int s = min_t(int, size, DRBD_SOCKET_BUFFER_SIZE); s = drbd_recv(connection, buffer, s); if (s <= 0) { if (s < 0) return s; break; } size -= s; } if (size) return -EIO; return 0; } /* * config_unknown_volume - device configuration command for unknown volume * * When a device is added to an existing connection, the node on which the * device is added first will send configuration commands to its peer but the * peer will not know about the device yet. It will warn and ignore these * commands. Once the device is added on the second node, the second node will * send the same device configuration commands, but in the other direction. * * (We can also end up here if drbd is misconfigured.) 
*/ static int config_unknown_volume(struct drbd_connection *connection, struct packet_info *pi) { drbd_warn(connection, "%s packet received for volume %u, which is not configured locally\n", cmdname(pi->cmd), pi->vnr); return ignore_remaining_packet(connection, pi); } static int receive_SyncParam(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct p_rs_param_95 *p; unsigned int header_size, data_size, exp_max_sz; struct crypto_hash *verify_tfm = NULL; struct crypto_hash *csums_tfm = NULL; struct net_conf *old_net_conf, *new_net_conf = NULL; struct disk_conf *old_disk_conf = NULL, *new_disk_conf = NULL; const int apv = connection->agreed_pro_version; struct fifo_buffer *old_plan = NULL, *new_plan = NULL; int fifo_size = 0; int err; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return config_unknown_volume(connection, pi); device = peer_device->device; exp_max_sz = apv <= 87 ? sizeof(struct p_rs_param) : apv == 88 ? sizeof(struct p_rs_param) + SHARED_SECRET_MAX : apv <= 94 ? 
sizeof(struct p_rs_param_89) : /* apv >= 95 */ sizeof(struct p_rs_param_95); if (pi->size > exp_max_sz) { drbd_err(device, "SyncParam packet too long: received %u, expected <= %u bytes\n", pi->size, exp_max_sz); return -EIO; } if (apv <= 88) { header_size = sizeof(struct p_rs_param); data_size = pi->size - header_size; } else if (apv <= 94) { header_size = sizeof(struct p_rs_param_89); data_size = pi->size - header_size; D_ASSERT(device, data_size == 0); } else { header_size = sizeof(struct p_rs_param_95); data_size = pi->size - header_size; D_ASSERT(device, data_size == 0); } /* initialize verify_alg and csums_alg */ p = pi->data; memset(p->verify_alg, 0, 2 * SHARED_SECRET_MAX); err = drbd_recv_all(peer_device->connection, p, header_size); if (err) return err; mutex_lock(&connection->resource->conf_update); old_net_conf = peer_device->connection->net_conf; if (get_ldev(device)) { new_disk_conf = kzalloc(sizeof(struct disk_conf), GFP_KERNEL); if (!new_disk_conf) { put_ldev(device); mutex_unlock(&connection->resource->conf_update); drbd_err(device, "Allocation of new disk_conf failed\n"); return -ENOMEM; } old_disk_conf = device->ldev->disk_conf; *new_disk_conf = *old_disk_conf; new_disk_conf->resync_rate = be32_to_cpu(p->resync_rate); } if (apv >= 88) { if (apv == 88) { if (data_size > SHARED_SECRET_MAX || data_size == 0) { drbd_err(device, "verify-alg of wrong size, " "peer wants %u, accepting only up to %u byte\n", data_size, SHARED_SECRET_MAX); err = -EIO; goto reconnect; } err = drbd_recv_all(peer_device->connection, p->verify_alg, data_size); if (err) goto reconnect; /* we expect NUL terminated string */ /* but just in case someone tries to be evil */ D_ASSERT(device, p->verify_alg[data_size-1] == 0); p->verify_alg[data_size-1] = 0; } else /* apv >= 89 */ { /* we still expect NUL terminated strings */ /* but just in case someone tries to be evil */ D_ASSERT(device, p->verify_alg[SHARED_SECRET_MAX-1] == 0); D_ASSERT(device, p->csums_alg[SHARED_SECRET_MAX-1] == 
0); p->verify_alg[SHARED_SECRET_MAX-1] = 0; p->csums_alg[SHARED_SECRET_MAX-1] = 0; } if (strcmp(old_net_conf->verify_alg, p->verify_alg)) { if (device->state.conn == C_WF_REPORT_PARAMS) { drbd_err(device, "Different verify-alg settings. me=\"%s\" peer=\"%s\"\n", old_net_conf->verify_alg, p->verify_alg); goto disconnect; } verify_tfm = drbd_crypto_alloc_digest_safe(device, p->verify_alg, "verify-alg"); if (IS_ERR(verify_tfm)) { verify_tfm = NULL; goto disconnect; } } if (apv >= 89 && strcmp(old_net_conf->csums_alg, p->csums_alg)) { if (device->state.conn == C_WF_REPORT_PARAMS) { drbd_err(device, "Different csums-alg settings. me=\"%s\" peer=\"%s\"\n", old_net_conf->csums_alg, p->csums_alg); goto disconnect; } csums_tfm = drbd_crypto_alloc_digest_safe(device, p->csums_alg, "csums-alg"); if (IS_ERR(csums_tfm)) { csums_tfm = NULL; goto disconnect; } } if (apv > 94 && new_disk_conf) { new_disk_conf->c_plan_ahead = be32_to_cpu(p->c_plan_ahead); new_disk_conf->c_delay_target = be32_to_cpu(p->c_delay_target); new_disk_conf->c_fill_target = be32_to_cpu(p->c_fill_target); new_disk_conf->c_max_rate = be32_to_cpu(p->c_max_rate); fifo_size = (new_disk_conf->c_plan_ahead * 10 * SLEEP_TIME) / HZ; if (fifo_size != device->rs_plan_s->size) { new_plan = fifo_alloc(fifo_size); if (!new_plan) { drbd_err(device, "kmalloc of fifo_buffer failed\n"); /* no put_ldev() here: new_disk_conf is set, so the * disconnect path already drops the ldev reference */ goto disconnect; } } } if (verify_tfm || csums_tfm) { new_net_conf = kzalloc(sizeof(struct net_conf), GFP_KERNEL); if (!new_net_conf) { drbd_err(device, "Allocation of new net_conf failed\n"); goto disconnect; } *new_net_conf = *old_net_conf; if (verify_tfm) { strcpy(new_net_conf->verify_alg, p->verify_alg); new_net_conf->verify_alg_len = strlen(p->verify_alg) + 1; crypto_free_hash(peer_device->connection->verify_tfm); peer_device->connection->verify_tfm = verify_tfm; drbd_info(device, "using verify-alg: \"%s\"\n", p->verify_alg); } if (csums_tfm) { strcpy(new_net_conf->csums_alg, p->csums_alg); new_net_conf->csums_alg_len = 
strlen(p->csums_alg) + 1; crypto_free_hash(peer_device->connection->csums_tfm); peer_device->connection->csums_tfm = csums_tfm; drbd_info(device, "using csums-alg: \"%s\"\n", p->csums_alg); } rcu_assign_pointer(connection->net_conf, new_net_conf); } } if (new_disk_conf) { rcu_assign_pointer(device->ldev->disk_conf, new_disk_conf); put_ldev(device); } if (new_plan) { old_plan = device->rs_plan_s; rcu_assign_pointer(device->rs_plan_s, new_plan); } mutex_unlock(&connection->resource->conf_update); synchronize_rcu(); if (new_net_conf) kfree(old_net_conf); kfree(old_disk_conf); kfree(old_plan); return 0; reconnect: if (new_disk_conf) { put_ldev(device); kfree(new_disk_conf); } mutex_unlock(&connection->resource->conf_update); return -EIO; disconnect: kfree(new_plan); if (new_disk_conf) { put_ldev(device); kfree(new_disk_conf); } mutex_unlock(&connection->resource->conf_update); /* just for completeness: actually not needed, * as this is not reached if csums_tfm was ok. */ crypto_free_hash(csums_tfm); /* but free the verify_tfm again, if csums_tfm did not work out */ crypto_free_hash(verify_tfm); conn_request_state(peer_device->connection, NS(conn, C_DISCONNECTING), CS_HARD); return -EIO; } static void drbd_setup_order_type(struct drbd_device *device, int peer) { /* sorry, we currently have no working implementation * of distributed TCQ */ } /* warn if the arguments differ by more than 12.5% */ static void warn_if_differ_considerably(struct drbd_device *device, const char *s, sector_t a, sector_t b) { sector_t d; if (a == 0 || b == 0) return; d = (a > b) ? (a - b) : (b - a); if (d > (a>>3) || d > (b>>3)) drbd_warn(device, "Considerable difference in %s: %llus vs. 
%llus\n", s, (unsigned long long)a, (unsigned long long)b); } static int receive_sizes(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct p_sizes *p = pi->data; enum determine_dev_size dd = DS_UNCHANGED; sector_t p_size, p_usize, my_usize; int ldsc = 0; /* local disk size changed */ enum dds_flags ddsf; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return config_unknown_volume(connection, pi); device = peer_device->device; p_size = be64_to_cpu(p->d_size); p_usize = be64_to_cpu(p->u_size); /* just store the peer's disk size for now. * we still need to figure out whether we accept that. */ device->p_size = p_size; if (get_ldev(device)) { rcu_read_lock(); my_usize = rcu_dereference(device->ldev->disk_conf)->disk_size; rcu_read_unlock(); warn_if_differ_considerably(device, "lower level device sizes", p_size, drbd_get_max_capacity(device->ldev)); warn_if_differ_considerably(device, "user requested size", p_usize, my_usize); /* if this is the first connect, or an otherwise expected * param exchange, choose the minimum */ if (device->state.conn == C_WF_REPORT_PARAMS) p_usize = min_not_zero(my_usize, p_usize); /* Never shrink a device with usable data during connect. But allow online shrinking if we are connected. 
*/ if (drbd_new_dev_size(device, device->ldev, p_usize, 0) < drbd_get_capacity(device->this_bdev) && device->state.disk >= D_OUTDATED && device->state.conn < C_CONNECTED) { drbd_err(device, "The peer's disk size is too small!\n"); conn_request_state(peer_device->connection, NS(conn, C_DISCONNECTING), CS_HARD); put_ldev(device); return -EIO; } if (my_usize != p_usize) { struct disk_conf *old_disk_conf, *new_disk_conf = NULL; new_disk_conf = kzalloc(sizeof(struct disk_conf), GFP_KERNEL); if (!new_disk_conf) { drbd_err(device, "Allocation of new disk_conf failed\n"); put_ldev(device); return -ENOMEM; } mutex_lock(&connection->resource->conf_update); old_disk_conf = device->ldev->disk_conf; *new_disk_conf = *old_disk_conf; new_disk_conf->disk_size = p_usize; rcu_assign_pointer(device->ldev->disk_conf, new_disk_conf); mutex_unlock(&connection->resource->conf_update); synchronize_rcu(); kfree(old_disk_conf); /* report the size that was just applied, not the previous one */ drbd_info(device, "Peer sets u_size to %lu sectors\n", (unsigned long)p_usize); } put_ldev(device); } device->peer_max_bio_size = be32_to_cpu(p->max_bio_size); drbd_reconsider_max_bio_size(device); /* Leave drbd_reconsider_max_bio_size() before drbd_determine_dev_size(). In case we cleared the QUEUE_FLAG_DISCARD from our queue in drbd_reconsider_max_bio_size(), we can be sure that after drbd_determine_dev_size() no REQ_DISCARDs are in the queue. */ ddsf = be16_to_cpu(p->dds_flags); if (get_ldev(device)) { dd = drbd_determine_dev_size(device, ddsf, NULL); put_ldev(device); if (dd == DS_ERROR) return -EIO; drbd_md_sync(device); } else { /* I am diskless, need to accept the peer's size. 
*/ drbd_set_my_capacity(device, p_size); } if (get_ldev(device)) { if (device->ldev->known_size != drbd_get_capacity(device->ldev->backing_bdev)) { device->ldev->known_size = drbd_get_capacity(device->ldev->backing_bdev); ldsc = 1; } drbd_setup_order_type(device, be16_to_cpu(p->queue_order_type)); put_ldev(device); } if (device->state.conn > C_WF_REPORT_PARAMS) { if (be64_to_cpu(p->c_size) != drbd_get_capacity(device->this_bdev) || ldsc) { /* we have different sizes, probably peer * needs to know my new size... */ drbd_send_sizes(peer_device, 0, ddsf); } if (test_and_clear_bit(RESIZE_PENDING, &device->flags) || (dd == DS_GREW && device->state.conn == C_CONNECTED)) { if (device->state.pdsk >= D_INCONSISTENT && device->state.disk >= D_INCONSISTENT) { if (ddsf & DDSF_NO_RESYNC) drbd_info(device, "Resync of new storage suppressed with --assume-clean\n"); else resync_after_online_grow(device); } else set_bit(RESYNC_AFTER_NEG, &device->flags); } } return 0; } static int receive_uuids(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct p_uuids *p = pi->data; u64 *p_uuid; int i, updated_uuids = 0; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return config_unknown_volume(connection, pi); device = peer_device->device; p_uuid = kmalloc(sizeof(u64)*UI_EXTENDED_SIZE, GFP_NOIO); if (!p_uuid) { drbd_err(device, "kmalloc of p_uuid failed\n"); /* handlers return 0 on success; "false" would hide the failure */ return -ENOMEM; } for (i = UI_CURRENT; i < UI_EXTENDED_SIZE; i++) p_uuid[i] = be64_to_cpu(p->uuid[i]); kfree(device->p_uuid); device->p_uuid = p_uuid; if (device->state.conn < C_CONNECTED && device->state.disk < D_INCONSISTENT && device->state.role == R_PRIMARY && (device->ed_uuid & ~((u64)1)) != (p_uuid[UI_CURRENT] & ~((u64)1))) { drbd_err(device, "Can only connect to data with current UUID=%016llX\n", (unsigned long long)device->ed_uuid); conn_request_state(peer_device->connection, NS(conn, C_DISCONNECTING), CS_HARD); return -EIO; } if 
(get_ldev(device)) { int skip_initial_sync = device->state.conn == C_CONNECTED && peer_device->connection->agreed_pro_version >= 90 && device->ldev->md.uuid[UI_CURRENT] == UUID_JUST_CREATED && (p_uuid[UI_FLAGS] & 8); if (skip_initial_sync) { drbd_info(device, "Accepted new current UUID, preparing to skip initial sync\n"); drbd_bitmap_io(device, &drbd_bmio_clear_n_write, "clear_n_write from receive_uuids", BM_LOCKED_TEST_ALLOWED); _drbd_uuid_set(device, UI_CURRENT, p_uuid[UI_CURRENT]); _drbd_uuid_set(device, UI_BITMAP, 0); _drbd_set_state(_NS2(device, disk, D_UP_TO_DATE, pdsk, D_UP_TO_DATE), CS_VERBOSE, NULL); drbd_md_sync(device); updated_uuids = 1; } put_ldev(device); } else if (device->state.disk < D_INCONSISTENT && device->state.role == R_PRIMARY) { /* I am a diskless primary, the peer just created a new current UUID for me. */ updated_uuids = drbd_set_ed_uuid(device, p_uuid[UI_CURRENT]); } /* Before we test the disk state, we should wait until a possibly ongoing cluster-wide state change has finished. That is important if we are primary and are detaching from our disk. We need to see the new disk state... */ mutex_lock(device->state_mutex); mutex_unlock(device->state_mutex); if (device->state.conn >= C_CONNECTED && device->state.disk < D_INCONSISTENT) updated_uuids |= drbd_set_ed_uuid(device, p_uuid[UI_CURRENT]); if (updated_uuids) drbd_print_uuids(device, "receiver updated UUIDs to"); return 0; } /** * convert_state() - Converts the peer's view of the cluster state to our point of view * @ps: The state as seen by the peer. 
*/ static union drbd_state convert_state(union drbd_state ps) { union drbd_state ms; static enum drbd_conns c_tab[] = { [C_WF_REPORT_PARAMS] = C_WF_REPORT_PARAMS, [C_CONNECTED] = C_CONNECTED, [C_STARTING_SYNC_S] = C_STARTING_SYNC_T, [C_STARTING_SYNC_T] = C_STARTING_SYNC_S, [C_DISCONNECTING] = C_TEAR_DOWN, /* C_NETWORK_FAILURE, */ [C_VERIFY_S] = C_VERIFY_T, [C_MASK] = C_MASK, }; ms.i = ps.i; ms.conn = c_tab[ps.conn]; ms.peer = ps.role; ms.role = ps.peer; ms.pdsk = ps.disk; ms.disk = ps.pdsk; ms.peer_isp = (ps.aftr_isp | ps.user_isp); return ms; } static int receive_req_state(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct p_req_state *p = pi->data; union drbd_state mask, val; enum drbd_state_rv rv; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; mask.i = be32_to_cpu(p->mask); val.i = be32_to_cpu(p->val); if (test_bit(RESOLVE_CONFLICTS, &peer_device->connection->flags) && mutex_is_locked(device->state_mutex)) { drbd_send_sr_reply(peer_device, SS_CONCURRENT_ST_CHG); return 0; } mask = convert_state(mask); val = convert_state(val); rv = drbd_change_state(device, CS_VERBOSE, mask, val); drbd_send_sr_reply(peer_device, rv); drbd_md_sync(device); return 0; } static int receive_req_conn_state(struct drbd_connection *connection, struct packet_info *pi) { struct p_req_state *p = pi->data; union drbd_state mask, val; enum drbd_state_rv rv; mask.i = be32_to_cpu(p->mask); val.i = be32_to_cpu(p->val); if (test_bit(RESOLVE_CONFLICTS, &connection->flags) && mutex_is_locked(&connection->cstate_mutex)) { conn_send_sr_reply(connection, SS_CONCURRENT_ST_CHG); return 0; } mask = convert_state(mask); val = convert_state(val); rv = conn_request_state(connection, mask, val, CS_VERBOSE | CS_LOCAL_ONLY | CS_IGN_OUTD_FAIL); conn_send_sr_reply(connection, rv); return 0; } static int receive_state(struct drbd_connection *connection, struct 
packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct p_state *p = pi->data; union drbd_state os, ns, peer_state; enum drbd_disk_state real_peer_disk; enum chg_state_flags cs_flags; int rv; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return config_unknown_volume(connection, pi); device = peer_device->device; peer_state.i = be32_to_cpu(p->state); real_peer_disk = peer_state.disk; if (peer_state.disk == D_NEGOTIATING) { real_peer_disk = device->p_uuid[UI_FLAGS] & 4 ? D_INCONSISTENT : D_CONSISTENT; drbd_info(device, "real peer disk state = %s\n", drbd_disk_str(real_peer_disk)); } spin_lock_irq(&device->resource->req_lock); retry: os = ns = drbd_read_state(device); spin_unlock_irq(&device->resource->req_lock); /* If some other part of the code (asender thread, timeout) * already decided to close the connection again, * we must not "re-establish" it here. */ if (os.conn <= C_TEAR_DOWN) return -ECONNRESET; /* If this is the "end of sync" confirmation, usually the peer disk * transitions from D_INCONSISTENT to D_UP_TO_DATE. For empty (0 bits * set) resync started in PausedSyncT, or if the timing of pause-/ * unpause-sync events has been "just right", the peer disk may * transition from D_CONSISTENT to D_UP_TO_DATE as well. */ if ((os.pdsk == D_INCONSISTENT || os.pdsk == D_CONSISTENT) && real_peer_disk == D_UP_TO_DATE && os.conn > C_CONNECTED && os.disk == D_UP_TO_DATE) { /* If we are (becoming) SyncSource, but peer is still in sync * preparation, ignore its uptodate-ness to avoid flapping, it * will change to inconsistent once the peer reaches active * syncing states. * It may have changed syncer-paused flags, however, so we * cannot ignore this completely. */ if (peer_state.conn > C_CONNECTED && peer_state.conn < C_SYNC_SOURCE) real_peer_disk = D_INCONSISTENT; /* if peer_state changes to connected at the same time, * it explicitly notifies us that it finished resync. * Maybe we should finish it up, too? 
*/ else if (os.conn >= C_SYNC_SOURCE && peer_state.conn == C_CONNECTED) { if (drbd_bm_total_weight(device) <= device->rs_failed) drbd_resync_finished(device); return 0; } } /* explicit verify finished notification, stop sector reached. */ if (os.conn == C_VERIFY_T && os.disk == D_UP_TO_DATE && peer_state.conn == C_CONNECTED && real_peer_disk == D_UP_TO_DATE) { ov_out_of_sync_print(device); drbd_resync_finished(device); return 0; } /* peer says his disk is inconsistent, while we think it is uptodate, * and this happens while the peer still thinks we have a sync going on, * but we think we are already done with the sync. * We ignore this to avoid flapping pdsk. * This should not happen, if the peer is a recent version of drbd. */ if (os.pdsk == D_UP_TO_DATE && real_peer_disk == D_INCONSISTENT && os.conn == C_CONNECTED && peer_state.conn > C_SYNC_SOURCE) real_peer_disk = D_UP_TO_DATE; if (ns.conn == C_WF_REPORT_PARAMS) ns.conn = C_CONNECTED; if (peer_state.conn == C_AHEAD) ns.conn = C_BEHIND; if (device->p_uuid && peer_state.disk >= D_NEGOTIATING && get_ldev_if_state(device, D_NEGOTIATING)) { int cr; /* consider resync */ /* if we established a new connection */ cr = (os.conn < C_CONNECTED); /* if we had an established connection * and one of the nodes newly attaches a disk */ cr |= (os.conn == C_CONNECTED && (peer_state.disk == D_NEGOTIATING || os.disk == D_NEGOTIATING)); /* if we have both been inconsistent, and the peer has been * forced to be UpToDate with --overwrite-data */ cr |= test_bit(CONSIDER_RESYNC, &device->flags); /* if we had been plain connected, and the admin requested to * start a sync by "invalidate" or "invalidate-remote" */ cr |= (os.conn == C_CONNECTED && (peer_state.conn >= C_STARTING_SYNC_S && peer_state.conn <= C_WF_BITMAP_T)); if (cr) ns.conn = drbd_sync_handshake(peer_device, peer_state.role, real_peer_disk); put_ldev(device); if (ns.conn == C_MASK) { ns.conn = C_CONNECTED; if (device->state.disk == D_NEGOTIATING) { drbd_force_state(device, 
NS(disk, D_FAILED)); } else if (peer_state.disk == D_NEGOTIATING) { drbd_err(device, "Disk attach process on the peer node was aborted.\n"); peer_state.disk = D_DISKLESS; real_peer_disk = D_DISKLESS; } else { if (test_and_clear_bit(CONN_DRY_RUN, &peer_device->connection->flags)) return -EIO; D_ASSERT(device, os.conn == C_WF_REPORT_PARAMS); conn_request_state(peer_device->connection, NS(conn, C_DISCONNECTING), CS_HARD); return -EIO; } } } spin_lock_irq(&device->resource->req_lock); if (os.i != drbd_read_state(device).i) goto retry; clear_bit(CONSIDER_RESYNC, &device->flags); ns.peer = peer_state.role; ns.pdsk = real_peer_disk; ns.peer_isp = (peer_state.aftr_isp | peer_state.user_isp); if ((ns.conn == C_CONNECTED || ns.conn == C_WF_BITMAP_S) && ns.disk == D_NEGOTIATING) ns.disk = device->new_state_tmp.disk; cs_flags = CS_VERBOSE + (os.conn < C_CONNECTED && ns.conn >= C_CONNECTED ? 0 : CS_HARD); if (ns.pdsk == D_CONSISTENT && drbd_suspended(device) && ns.conn == C_CONNECTED && os.conn < C_CONNECTED && test_bit(NEW_CUR_UUID, &device->flags)) { /* Do not allow tl_restart(RESEND) for a rebooted peer. We can only allow this for temporary network outages! */ spin_unlock_irq(&device->resource->req_lock); drbd_err(device, "Aborting Connect, can not thaw IO with an only Consistent peer\n"); tl_clear(peer_device->connection); drbd_uuid_new_current(device); clear_bit(NEW_CUR_UUID, &device->flags); conn_request_state(peer_device->connection, NS2(conn, C_PROTOCOL_ERROR, susp, 0), CS_HARD); return -EIO; } rv = _drbd_set_state(device, ns, cs_flags, NULL); ns = drbd_read_state(device); spin_unlock_irq(&device->resource->req_lock); if (rv < SS_SUCCESS) { conn_request_state(peer_device->connection, NS(conn, C_DISCONNECTING), CS_HARD); return -EIO; } if (os.conn > C_WF_REPORT_PARAMS) { if (ns.conn > C_CONNECTED && peer_state.conn <= C_CONNECTED && peer_state.disk != D_NEGOTIATING ) { /* we want resync, peer has not yet decided to sync... 
*/ /* Nowadays only used when forcing a node into primary role and setting its disk to UpToDate with that */ drbd_send_uuids(peer_device); drbd_send_current_state(peer_device); } } clear_bit(DISCARD_MY_DATA, &device->flags); drbd_md_sync(device); /* update connected indicator, la_size_sect, ... */ return 0; } static int receive_sync_uuid(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct p_rs_uuid *p = pi->data; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; wait_event(device->misc_wait, device->state.conn == C_WF_SYNC_UUID || device->state.conn == C_BEHIND || device->state.conn < C_CONNECTED || device->state.disk < D_NEGOTIATING); /* D_ASSERT(device, device->state.conn == C_WF_SYNC_UUID ); */ /* Here the _drbd_uuid_ functions are right, current should _not_ be rotated into the history */ if (get_ldev_if_state(device, D_NEGOTIATING)) { _drbd_uuid_set(device, UI_CURRENT, be64_to_cpu(p->uuid)); _drbd_uuid_set(device, UI_BITMAP, 0UL); drbd_print_uuids(device, "updated sync uuid"); drbd_start_resync(device, C_SYNC_TARGET); put_ldev(device); } else drbd_err(device, "Ignoring SyncUUID packet!\n"); return 0; } /** * receive_bitmap_plain * * Return 0 when done, 1 when another iteration is needed, and a negative error * code upon failure. 
*/ static int receive_bitmap_plain(struct drbd_peer_device *peer_device, unsigned int size, unsigned long *p, struct bm_xfer_ctx *c) { unsigned int data_size = DRBD_SOCKET_BUFFER_SIZE - drbd_header_size(peer_device->connection); unsigned int num_words = min_t(size_t, data_size / sizeof(*p), c->bm_words - c->word_offset); unsigned int want = num_words * sizeof(*p); int err; if (want != size) { drbd_err(peer_device, "%s:want (%u) != size (%u)\n", __func__, want, size); return -EIO; } if (want == 0) return 0; err = drbd_recv_all(peer_device->connection, p, want); if (err) return err; drbd_bm_merge_lel(peer_device->device, c->word_offset, num_words, p); c->word_offset += num_words; c->bit_offset = c->word_offset * BITS_PER_LONG; if (c->bit_offset > c->bm_bits) c->bit_offset = c->bm_bits; return 1; } static enum drbd_bitmap_code dcbp_get_code(struct p_compressed_bm *p) { return (enum drbd_bitmap_code)(p->encoding & 0x0f); } static int dcbp_get_start(struct p_compressed_bm *p) { return (p->encoding & 0x80) != 0; } static int dcbp_get_pad_bits(struct p_compressed_bm *p) { return (p->encoding >> 4) & 0x7; } /** * recv_bm_rle_bits * * Return 0 when done, 1 when another iteration is needed, and a negative error * code upon failure. 
*/ static int recv_bm_rle_bits(struct drbd_peer_device *peer_device, struct p_compressed_bm *p, struct bm_xfer_ctx *c, unsigned int len) { struct bitstream bs; u64 look_ahead; u64 rl; u64 tmp; unsigned long s = c->bit_offset; unsigned long e; int toggle = dcbp_get_start(p); int have; int bits; bitstream_init(&bs, p->code, len, dcbp_get_pad_bits(p)); bits = bitstream_get_bits(&bs, &look_ahead, 64); if (bits < 0) return -EIO; for (have = bits; have > 0; s += rl, toggle = !toggle) { bits = vli_decode_bits(&rl, look_ahead); if (bits <= 0) return -EIO; if (toggle) { e = s + rl -1; if (e >= c->bm_bits) { drbd_err(peer_device, "bitmap overflow (e:%lu) while decoding bm RLE packet\n", e); return -EIO; } _drbd_bm_set_bits(peer_device->device, s, e); } if (have < bits) { drbd_err(peer_device, "bitmap decoding error: h:%d b:%d la:0x%08llx l:%u/%u\n", have, bits, look_ahead, (unsigned int)(bs.cur.b - p->code), (unsigned int)bs.buf_len); return -EIO; } /* if we consumed all 64 bits, assign 0; >> 64 is "undefined"; */ if (likely(bits < 64)) look_ahead >>= bits; else look_ahead = 0; have -= bits; bits = bitstream_get_bits(&bs, &tmp, 64 - have); if (bits < 0) return -EIO; look_ahead |= tmp << have; have += bits; } c->bit_offset = s; bm_xfer_ctx_bit_to_word_offset(c); return (s != c->bm_bits); } /** * decode_bitmap_c * * Return 0 when done, 1 when another iteration is needed, and a negative error * code upon failure. */ static int decode_bitmap_c(struct drbd_peer_device *peer_device, struct p_compressed_bm *p, struct bm_xfer_ctx *c, unsigned int len) { if (dcbp_get_code(p) == RLE_VLI_Bits) return recv_bm_rle_bits(peer_device, p, c, len - sizeof(*p)); /* other variants had been implemented for evaluation, * but have been dropped as this one turned out to be "best" * during all our tests. 
*/ drbd_err(peer_device, "receive_bitmap_c: unknown encoding %u\n", p->encoding); conn_request_state(peer_device->connection, NS(conn, C_PROTOCOL_ERROR), CS_HARD); return -EIO; } void INFO_bm_xfer_stats(struct drbd_device *device, const char *direction, struct bm_xfer_ctx *c) { /* what would it take to transfer it "plaintext" */ unsigned int header_size = drbd_header_size(first_peer_device(device)->connection); unsigned int data_size = DRBD_SOCKET_BUFFER_SIZE - header_size; unsigned int plain = header_size * (DIV_ROUND_UP(c->bm_words, data_size) + 1) + c->bm_words * sizeof(unsigned long); unsigned int total = c->bytes[0] + c->bytes[1]; unsigned int r; /* total can not be zero. but just in case: */ if (total == 0) return; /* don't report if not compressed */ if (total >= plain) return; /* total < plain. check for overflow, still */ r = (total > UINT_MAX/1000) ? (total / (plain/1000)) : (1000 * total / plain); if (r > 1000) r = 1000; r = 1000 - r; drbd_info(device, "%s bitmap stats [Bytes(packets)]: plain %u(%u), RLE %u(%u), " "total %u; compression: %u.%u%%\n", direction, c->bytes[1], c->packets[1], c->bytes[0], c->packets[0], total, r/10, r % 10); } /* Since we are processing the bitfield from lower addresses to higher, it does not matter if we process it in 32 bit chunks or 64 bit chunks as long as it is little endian. (Understand it as byte stream, beginning with the lowest byte...) If we would use big endian we would need to process it from the highest address to the lowest, in order to be agnostic to the 32 vs 64 bits issue. Returns 0 if the bitmap was received successfully, or a negative error code on failure. 
*/ static int receive_bitmap(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct bm_xfer_ctx c; int err; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; drbd_bm_lock(device, "receive bitmap", BM_LOCKED_SET_ALLOWED); /* you are supposed to send additional out-of-sync information * if you actually set bits during this phase */ c = (struct bm_xfer_ctx) { .bm_bits = drbd_bm_bits(device), .bm_words = drbd_bm_words(device), }; for(;;) { if (pi->cmd == P_BITMAP) err = receive_bitmap_plain(peer_device, pi->size, pi->data, &c); else if (pi->cmd == P_COMPRESSED_BITMAP) { /* MAYBE: sanity check that we speak proto >= 90, * and the feature is enabled! */ struct p_compressed_bm *p = pi->data; if (pi->size > DRBD_SOCKET_BUFFER_SIZE - drbd_header_size(connection)) { drbd_err(device, "ReportCBitmap packet too large\n"); err = -EIO; goto out; } if (pi->size <= sizeof(*p)) { drbd_err(device, "ReportCBitmap packet too small (l:%u)\n", pi->size); err = -EIO; goto out; } err = drbd_recv_all(peer_device->connection, p, pi->size); if (err) goto out; err = decode_bitmap_c(peer_device, p, &c, pi->size); } else { drbd_warn(device, "receive_bitmap: cmd neither ReportBitMap nor ReportCBitMap (is 0x%x)\n", pi->cmd); err = -EIO; goto out; } c.packets[pi->cmd == P_BITMAP]++; c.bytes[pi->cmd == P_BITMAP] += drbd_header_size(connection) + pi->size; if (err <= 0) { if (err < 0) goto out; break; } err = drbd_recv_header(peer_device->connection, pi); if (err) goto out; } INFO_bm_xfer_stats(device, "receive", &c); if (device->state.conn == C_WF_BITMAP_T) { enum drbd_state_rv rv; err = drbd_send_bitmap(device); if (err) goto out; /* Omit CS_ORDERED with this state transition to avoid deadlocks. 
*/ rv = _drbd_request_state(device, NS(conn, C_WF_SYNC_UUID), CS_VERBOSE); D_ASSERT(device, rv == SS_SUCCESS); } else if (device->state.conn != C_WF_BITMAP_S) { /* admin may have requested C_DISCONNECTING, * other threads may have noticed network errors */ drbd_info(device, "unexpected cstate (%s) in receive_bitmap\n", drbd_conn_str(device->state.conn)); } err = 0; out: drbd_bm_unlock(device); if (!err && device->state.conn == C_WF_BITMAP_S) drbd_start_resync(device, C_SYNC_SOURCE); return err; } static int receive_skip(struct drbd_connection *connection, struct packet_info *pi) { drbd_warn(connection, "skipping unknown optional packet type %d, l: %d!\n", pi->cmd, pi->size); return ignore_remaining_packet(connection, pi); } static int receive_UnplugRemote(struct drbd_connection *connection, struct packet_info *pi) { /* just unplug all devices always, regardless which volume number */ drbd_unplug_all_devices(connection); /* Make sure we've acked all the TCP data associated * with the data requests being unplugged */ drbd_tcp_quickack(connection->data.socket); return 0; } static int receive_out_of_sync(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct p_block_desc *p = pi->data; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; switch (device->state.conn) { case C_WF_SYNC_UUID: case C_WF_BITMAP_T: case C_BEHIND: break; default: drbd_err(device, "ASSERT FAILED cstate = %s, expected: WFSyncUUID|WFBitMapT|Behind\n", drbd_conn_str(device->state.conn)); } drbd_set_out_of_sync(device, be64_to_cpu(p->sector), be32_to_cpu(p->blksize)); return 0; } struct data_cmd { int expect_payload; size_t pkt_size; int (*fn)(struct drbd_connection *, struct packet_info *); }; static struct data_cmd drbd_cmd_handler[] = { [P_DATA] = { 1, sizeof(struct p_data), receive_Data }, [P_DATA_REPLY] = { 1, sizeof(struct p_data), receive_DataReply }, 
[P_RS_DATA_REPLY] = { 1, sizeof(struct p_data), receive_RSDataReply }, [P_BARRIER] = { 0, sizeof(struct p_barrier), receive_Barrier }, [P_BITMAP] = { 1, 0, receive_bitmap }, [P_COMPRESSED_BITMAP] = { 1, 0, receive_bitmap }, [P_UNPLUG_REMOTE] = { 0, 0, receive_UnplugRemote }, [P_DATA_REQUEST] = { 0, sizeof(struct p_block_req), receive_DataRequest }, [P_RS_DATA_REQUEST] = { 0, sizeof(struct p_block_req), receive_DataRequest }, [P_SYNC_PARAM] = { 1, 0, receive_SyncParam }, [P_SYNC_PARAM89] = { 1, 0, receive_SyncParam }, [P_PROTOCOL] = { 1, sizeof(struct p_protocol), receive_protocol }, [P_UUIDS] = { 0, sizeof(struct p_uuids), receive_uuids }, [P_SIZES] = { 0, sizeof(struct p_sizes), receive_sizes }, [P_STATE] = { 0, sizeof(struct p_state), receive_state }, [P_STATE_CHG_REQ] = { 0, sizeof(struct p_req_state), receive_req_state }, [P_SYNC_UUID] = { 0, sizeof(struct p_rs_uuid), receive_sync_uuid }, [P_OV_REQUEST] = { 0, sizeof(struct p_block_req), receive_DataRequest }, [P_OV_REPLY] = { 1, sizeof(struct p_block_req), receive_DataRequest }, [P_CSUM_RS_REQUEST] = { 1, sizeof(struct p_block_req), receive_DataRequest }, [P_DELAY_PROBE] = { 0, sizeof(struct p_delay_probe93), receive_skip }, [P_OUT_OF_SYNC] = { 0, sizeof(struct p_block_desc), receive_out_of_sync }, [P_CONN_ST_CHG_REQ] = { 0, sizeof(struct p_req_state), receive_req_conn_state }, [P_PROTOCOL_UPDATE] = { 1, sizeof(struct p_protocol), receive_protocol }, [P_TRIM] = { 0, sizeof(struct p_trim), receive_Data }, }; static void drbdd(struct drbd_connection *connection) { struct packet_info pi; size_t shs; /* sub header size */ int err; while (get_t_state(&connection->receiver) == RUNNING) { struct data_cmd *cmd; drbd_thread_current_set_cpu(&connection->receiver); if (drbd_recv_header(connection, &pi)) goto err_out; cmd = &drbd_cmd_handler[pi.cmd]; if (unlikely(pi.cmd >= ARRAY_SIZE(drbd_cmd_handler) || !cmd->fn)) { drbd_err(connection, "Unexpected data packet %s (0x%04x)\n", cmdname(pi.cmd), pi.cmd); goto err_out; } 
shs = cmd->pkt_size; if (pi.size > shs && !cmd->expect_payload) { drbd_err(connection, "No payload expected %s l:%d\n", cmdname(pi.cmd), pi.size); goto err_out; } if (shs) { err = drbd_recv_all_warn(connection, pi.data, shs); if (err) goto err_out; pi.size -= shs; } err = cmd->fn(connection, &pi); if (err) { drbd_err(connection, "error receiving %s, e: %d l: %d!\n", cmdname(pi.cmd), err, pi.size); goto err_out; } } return; err_out: conn_request_state(connection, NS(conn, C_PROTOCOL_ERROR), CS_HARD); } static void conn_disconnect(struct drbd_connection *connection) { struct drbd_peer_device *peer_device; enum drbd_conns oc; int vnr; if (connection->cstate == C_STANDALONE) return; /* We are about to start the cleanup after connection loss. * Make sure drbd_make_request knows about that. * Usually we should be in some network failure state already, * but just in case we are not, we fix it up here. */ conn_request_state(connection, NS(conn, C_NETWORK_FAILURE), CS_HARD); /* asender does not clean up anything. 
it must not interfere, either */ drbd_thread_stop(&connection->asender); drbd_free_sock(connection); rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; kobject_get(&device->kobj); rcu_read_unlock(); drbd_disconnected(peer_device); kobject_put(&device->kobj); rcu_read_lock(); } rcu_read_unlock(); if (!list_empty(&connection->current_epoch->list)) drbd_err(connection, "ASSERTION FAILED: connection->current_epoch->list not empty\n"); /* ok, no more ee's on the fly, it is safe to reset the epoch_size */ atomic_set(&connection->current_epoch->epoch_size, 0); connection->send.seen_any_write_yet = false; drbd_info(connection, "Connection closed\n"); if (conn_highest_role(connection) == R_PRIMARY && conn_highest_pdsk(connection) >= D_UNKNOWN) conn_try_outdate_peer_async(connection); spin_lock_irq(&connection->resource->req_lock); oc = connection->cstate; if (oc >= C_UNCONNECTED) _conn_request_state(connection, NS(conn, C_UNCONNECTED), CS_VERBOSE); spin_unlock_irq(&connection->resource->req_lock); if (oc == C_DISCONNECTING) conn_request_state(connection, NS(conn, C_STANDALONE), CS_VERBOSE | CS_HARD); } static int drbd_disconnected(struct drbd_peer_device *peer_device) { struct drbd_device *device = peer_device->device; unsigned int i; /* wait for current activity to cease. */ spin_lock_irq(&device->resource->req_lock); _drbd_wait_ee_list_empty(device, &device->active_ee); _drbd_wait_ee_list_empty(device, &device->sync_ee); _drbd_wait_ee_list_empty(device, &device->read_ee); spin_unlock_irq(&device->resource->req_lock); /* We do not have data structures that would allow us to * get the rs_pending_cnt down to 0 again. * * On C_SYNC_TARGET we do not have any data structures describing * the pending RSDataRequest's we have sent. * * On C_SYNC_SOURCE there is no data structure that tracks * the P_RS_DATA_REPLY blocks that we sent to the SyncTarget. 
 * And no, it is not the sum of the reference counts in the * resync_LRU. The resync_LRU tracks the whole operation including * the disk-IO, while the rs_pending_cnt only tracks the blocks * on the fly. */ drbd_rs_cancel_all(device); device->rs_total = 0; device->rs_failed = 0; atomic_set(&device->rs_pending_cnt, 0); wake_up(&device->misc_wait); del_timer_sync(&device->resync_timer); resync_timer_fn((unsigned long)device); /* wait for all w_e_end_data_req, w_e_end_rsdata_req, w_send_barrier, * w_make_resync_request etc. which may still be on the worker queue * to be "canceled" */ drbd_flush_workqueue(&peer_device->connection->sender_work); drbd_finish_peer_reqs(device); /* This second workqueue flush is necessary, since drbd_finish_peer_reqs() might have issued a work again. The one before drbd_finish_peer_reqs() is necessary to reclaim net_ee in drbd_finish_peer_reqs(). */ drbd_flush_workqueue(&peer_device->connection->sender_work); /* need to do it again, drbd_finish_peer_reqs() may have populated it * again via drbd_try_clear_on_disk_bm(). */ drbd_rs_cancel_all(device); kfree(device->p_uuid); device->p_uuid = NULL; if (!drbd_suspended(device)) tl_clear(peer_device->connection); drbd_md_sync(device); /* serialize with bitmap writeout triggered by the state change, * if any. */ wait_event(device->misc_wait, !test_bit(BITMAP_IO, &device->flags)); /* tcp_close and release of sendpage pages can be deferred. I don't * want to use SO_LINGER, because apparently it can be deferred for * more than 20 seconds (longest time I checked). * * Actually we don't care for exactly when the network stack does its * put_page(), but release our reference on these pages right here. 
*/ i = drbd_free_peer_reqs(device, &device->net_ee); if (i) drbd_info(device, "net_ee not empty, killed %u entries\n", i); i = atomic_read(&device->pp_in_use_by_net); if (i) drbd_info(device, "pp_in_use_by_net = %d, expected 0\n", i); i = atomic_read(&device->pp_in_use); if (i) drbd_info(device, "pp_in_use = %d, expected 0\n", i); D_ASSERT(device, list_empty(&device->read_ee)); D_ASSERT(device, list_empty(&device->active_ee)); D_ASSERT(device, list_empty(&device->sync_ee)); D_ASSERT(device, list_empty(&device->done_ee)); return 0; } /* * We support PRO_VERSION_MIN to PRO_VERSION_MAX. The protocol version * we can agree on is stored in agreed_pro_version. * * feature flags and the reserved array should be enough room for future * enhancements of the handshake protocol, and possible plugins... * * for now, they are expected to be zero, but ignored. */ static int drbd_send_features(struct drbd_connection *connection) { struct drbd_socket *sock; struct p_connection_features *p; sock = &connection->data; p = conn_prepare_command(connection, sock); if (!p) return -EIO; memset(p, 0, sizeof(*p)); p->protocol_min = cpu_to_be32(PRO_VERSION_MIN); p->protocol_max = cpu_to_be32(PRO_VERSION_MAX); p->feature_flags = cpu_to_be32(PRO_FEATURES); return conn_send_command(connection, sock, P_CONNECTION_FEATURES, sizeof(*p), NULL, 0); } /* * return values: * 1 yes, we have a valid connection * 0 oops, did not work out, please try again * -1 peer talks different language, * no point in trying again, please go standalone. */ static int drbd_do_features(struct drbd_connection *connection) { /* ASSERT current == connection->receiver ... 
*/ struct p_connection_features *p; const int expect = sizeof(struct p_connection_features); struct packet_info pi; int err; err = drbd_send_features(connection); if (err) return 0; err = drbd_recv_header(connection, &pi); if (err) return 0; if (pi.cmd != P_CONNECTION_FEATURES) { drbd_err(connection, "expected ConnectionFeatures packet, received: %s (0x%04x)\n", cmdname(pi.cmd), pi.cmd); return -1; } if (pi.size != expect) { drbd_err(connection, "expected ConnectionFeatures length: %u, received: %u\n", expect, pi.size); return -1; } p = pi.data; err = drbd_recv_all_warn(connection, p, expect); if (err) return 0; p->protocol_min = be32_to_cpu(p->protocol_min); p->protocol_max = be32_to_cpu(p->protocol_max); if (p->protocol_max == 0) p->protocol_max = p->protocol_min; if (PRO_VERSION_MAX < p->protocol_min || PRO_VERSION_MIN > p->protocol_max) goto incompat; connection->agreed_pro_version = min_t(int, PRO_VERSION_MAX, p->protocol_max); connection->agreed_features = PRO_FEATURES & be32_to_cpu(p->feature_flags); drbd_info(connection, "Handshake successful: " "Agreed network protocol version %d\n", connection->agreed_pro_version); drbd_info(connection, "Agreed to%ssupport TRIM on protocol level\n", connection->agreed_features & FF_TRIM ? " " : " not "); return 1; incompat: drbd_err(connection, "incompatible DRBD dialects: " "I support %d-%d, peer supports %d-%d\n", PRO_VERSION_MIN, PRO_VERSION_MAX, p->protocol_min, p->protocol_max); return -1; } #if !defined(CONFIG_CRYPTO_HMAC) && !defined(CONFIG_CRYPTO_HMAC_MODULE) static int drbd_do_auth(struct drbd_connection *connection) { drbd_err(connection, "This kernel was built without CONFIG_CRYPTO_HMAC.\n"); drbd_err(connection, "You need to disable 'cram-hmac-alg' in drbd.conf.\n"); return -1; } #else #define CHALLENGE_LEN 64 /* Return value: 1 - auth succeeded, 0 - failed, try again (network error), -1 - auth failed, don't try again. 
*/ static int drbd_do_auth(struct drbd_connection *connection) { struct drbd_socket *sock; char my_challenge[CHALLENGE_LEN]; /* 64 Bytes... */ struct scatterlist sg; char *response = NULL; char *right_response = NULL; char *peers_ch = NULL; unsigned int key_len; char secret[SHARED_SECRET_MAX]; /* 64 byte */ unsigned int resp_size; struct hash_desc desc; struct packet_info pi; struct net_conf *nc; int err, rv; /* FIXME: Put the challenge/response into the preallocated socket buffer. */ rcu_read_lock(); nc = rcu_dereference(connection->net_conf); key_len = strlen(nc->shared_secret); memcpy(secret, nc->shared_secret, key_len); rcu_read_unlock(); desc.tfm = connection->cram_hmac_tfm; desc.flags = 0; rv = crypto_hash_setkey(connection->cram_hmac_tfm, (u8 *)secret, key_len); if (rv) { drbd_err(connection, "crypto_hash_setkey() failed with %d\n", rv); rv = -1; goto fail; } get_random_bytes(my_challenge, CHALLENGE_LEN); sock = &connection->data; if (!conn_prepare_command(connection, sock)) { rv = 0; goto fail; } rv = !conn_send_command(connection, sock, P_AUTH_CHALLENGE, 0, my_challenge, CHALLENGE_LEN); if (!rv) goto fail; err = drbd_recv_header(connection, &pi); if (err) { rv = 0; goto fail; } if (pi.cmd != P_AUTH_CHALLENGE) { drbd_err(connection, "expected AuthChallenge packet, received: %s (0x%04x)\n", cmdname(pi.cmd), pi.cmd); rv = 0; goto fail; } if (pi.size > CHALLENGE_LEN * 2) { drbd_err(connection, "expected AuthChallenge payload too big.\n"); rv = -1; goto fail; } if (pi.size < CHALLENGE_LEN) { drbd_err(connection, "AuthChallenge payload too small.\n"); rv = -1; goto fail; } peers_ch = kmalloc(pi.size, GFP_NOIO); if (peers_ch == NULL) { drbd_err(connection, "kmalloc of peers_ch failed\n"); rv = -1; goto fail; } err = drbd_recv_all_warn(connection, peers_ch, pi.size); if (err) { rv = 0; goto fail; } if (!memcmp(my_challenge, peers_ch, CHALLENGE_LEN)) { drbd_err(connection, "Peer presented the same challenge!\n"); rv = -1; goto fail; } resp_size = 
crypto_hash_digestsize(connection->cram_hmac_tfm); response = kmalloc(resp_size, GFP_NOIO); if (response == NULL) { drbd_err(connection, "kmalloc of response failed\n"); rv = -1; goto fail; } sg_init_table(&sg, 1); sg_set_buf(&sg, peers_ch, pi.size); rv = crypto_hash_digest(&desc, &sg, sg.length, response); if (rv) { drbd_err(connection, "crypto_hash_digest() failed with %d\n", rv); rv = -1; goto fail; } if (!conn_prepare_command(connection, sock)) { rv = 0; goto fail; } rv = !conn_send_command(connection, sock, P_AUTH_RESPONSE, 0, response, resp_size); if (!rv) goto fail; err = drbd_recv_header(connection, &pi); if (err) { rv = 0; goto fail; } if (pi.cmd != P_AUTH_RESPONSE) { drbd_err(connection, "expected AuthResponse packet, received: %s (0x%04x)\n", cmdname(pi.cmd), pi.cmd); rv = 0; goto fail; } if (pi.size != resp_size) { drbd_err(connection, "expected AuthResponse payload of wrong size\n"); rv = 0; goto fail; } err = drbd_recv_all_warn(connection, response , resp_size); if (err) { rv = 0; goto fail; } right_response = kmalloc(resp_size, GFP_NOIO); if (right_response == NULL) { drbd_err(connection, "kmalloc of right_response failed\n"); rv = -1; goto fail; } sg_set_buf(&sg, my_challenge, CHALLENGE_LEN); rv = crypto_hash_digest(&desc, &sg, sg.length, right_response); if (rv) { drbd_err(connection, "crypto_hash_digest() failed with %d\n", rv); rv = -1; goto fail; } rv = !memcmp(response, right_response, resp_size); if (rv) drbd_info(connection, "Peer authenticated using %d bytes HMAC\n", resp_size); else rv = -1; fail: kfree(peers_ch); kfree(response); kfree(right_response); return rv; } #endif int drbd_receiver(struct drbd_thread *thi) { struct drbd_connection *connection = thi->connection; int h; drbd_info(connection, "receiver (re)started\n"); do { h = conn_connect(connection); if (h == 0) { conn_disconnect(connection); schedule_timeout_interruptible(HZ); } if (h == -1) { drbd_warn(connection, "Discarding network configuration.\n"); 
conn_request_state(connection, NS(conn, C_DISCONNECTING), CS_HARD); } } while (h == 0); if (h > 0) drbdd(connection); conn_disconnect(connection); drbd_info(connection, "receiver terminated\n"); return 0; } /* ********* acknowledge sender ******** */ static int got_conn_RqSReply(struct drbd_connection *connection, struct packet_info *pi) { struct p_req_state_reply *p = pi->data; int retcode = be32_to_cpu(p->retcode); if (retcode >= SS_SUCCESS) { set_bit(CONN_WD_ST_CHG_OKAY, &connection->flags); } else { set_bit(CONN_WD_ST_CHG_FAIL, &connection->flags); drbd_err(connection, "Requested state change failed by peer: %s (%d)\n", drbd_set_st_err_str(retcode), retcode); } wake_up(&connection->ping_wait); return 0; } static int got_RqSReply(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct p_req_state_reply *p = pi->data; int retcode = be32_to_cpu(p->retcode); peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; if (test_bit(CONN_WD_ST_CHG_REQ, &connection->flags)) { D_ASSERT(device, connection->agreed_pro_version < 100); return got_conn_RqSReply(connection, pi); } if (retcode >= SS_SUCCESS) { set_bit(CL_ST_CHG_SUCCESS, &device->flags); } else { set_bit(CL_ST_CHG_FAIL, &device->flags); drbd_err(device, "Requested state change failed by peer: %s (%d)\n", drbd_set_st_err_str(retcode), retcode); } wake_up(&device->state_wait); return 0; } static int got_Ping(struct drbd_connection *connection, struct packet_info *pi) { return drbd_send_ping_ack(connection); } static int got_PingAck(struct drbd_connection *connection, struct packet_info *pi) { if (!test_and_set_bit(GOT_PING_ACK, &connection->flags)) wake_up(&connection->ping_wait); return 0; } static int got_IsInSync(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct p_block_ack *p = pi->data; 
sector_t sector = be64_to_cpu(p->sector); int blksize = be32_to_cpu(p->blksize); peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; D_ASSERT(device, peer_device->connection->agreed_pro_version >= 89); update_peer_seq(peer_device, be32_to_cpu(p->seq_num)); if (get_ldev(device)) { drbd_rs_complete_io(device, sector); drbd_set_in_sync(device, sector, blksize); /* rs_same_csums is supposed to count in units of BM_BLOCK_SIZE */ device->rs_same_csum += (blksize >> BM_BLOCK_SHIFT); put_ldev(device); } dec_rs_pending(device); atomic_add(blksize >> 9, &device->rs_sect_in); return 0; } static int validate_req_change_req_state(struct drbd_device *device, u64 id, sector_t sector, struct rb_root *root, const char *func, enum drbd_req_event what, bool missing_ok) { struct drbd_request *req; struct bio_and_error m; spin_lock_irq(&device->resource->req_lock); req = find_request(device, root, id, sector, missing_ok, func); if (unlikely(!req)) { spin_unlock_irq(&device->resource->req_lock); return -EIO; } __req_mod(req, what, &m); spin_unlock_irq(&device->resource->req_lock); if (m.bio) complete_master_bio(device, &m); return 0; } static int got_BlockAck(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct p_block_ack *p = pi->data; sector_t sector = be64_to_cpu(p->sector); int blksize = be32_to_cpu(p->blksize); enum drbd_req_event what; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; update_peer_seq(peer_device, be32_to_cpu(p->seq_num)); if (p->block_id == ID_SYNCER) { drbd_set_in_sync(device, sector, blksize); dec_rs_pending(device); return 0; } switch (pi->cmd) { case P_RS_WRITE_ACK: what = WRITE_ACKED_BY_PEER_AND_SIS; break; case P_WRITE_ACK: what = WRITE_ACKED_BY_PEER; break; case P_RECV_ACK: what = RECV_ACKED_BY_PEER; break; case P_SUPERSEDED: what = CONFLICT_RESOLVED; 
break; case P_RETRY_WRITE: what = POSTPONE_WRITE; break; default: BUG(); } return validate_req_change_req_state(device, p->block_id, sector, &device->write_requests, __func__, what, false); } static int got_NegAck(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct p_block_ack *p = pi->data; sector_t sector = be64_to_cpu(p->sector); int size = be32_to_cpu(p->blksize); int err; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; update_peer_seq(peer_device, be32_to_cpu(p->seq_num)); if (p->block_id == ID_SYNCER) { dec_rs_pending(device); drbd_rs_failed_io(device, sector, size); return 0; } err = validate_req_change_req_state(device, p->block_id, sector, &device->write_requests, __func__, NEG_ACKED, true); if (err) { /* Protocol A has no P_WRITE_ACKs, but has P_NEG_ACKs. The master bio might already be completed, therefore the request is no longer in the collision hash. */ /* In Protocol B we might already have got a P_RECV_ACK but then get a P_NEG_ACK afterwards. 
*/ drbd_set_out_of_sync(device, sector, size); } return 0; } static int got_NegDReply(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct p_block_ack *p = pi->data; sector_t sector = be64_to_cpu(p->sector); peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; update_peer_seq(peer_device, be32_to_cpu(p->seq_num)); drbd_err(device, "Got NegDReply; Sector %llus, len %u.\n", (unsigned long long)sector, be32_to_cpu(p->blksize)); return validate_req_change_req_state(device, p->block_id, sector, &device->read_requests, __func__, NEG_ACKED, false); } static int got_NegRSDReply(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; sector_t sector; int size; struct p_block_ack *p = pi->data; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; sector = be64_to_cpu(p->sector); size = be32_to_cpu(p->blksize); update_peer_seq(peer_device, be32_to_cpu(p->seq_num)); dec_rs_pending(device); if (get_ldev_if_state(device, D_FAILED)) { drbd_rs_complete_io(device, sector); switch (pi->cmd) { case P_NEG_RS_DREPLY: drbd_rs_failed_io(device, sector, size); case P_RS_CANCEL: break; default: BUG(); } put_ldev(device); } return 0; } static int got_BarrierAck(struct drbd_connection *connection, struct packet_info *pi) { struct p_barrier_ack *p = pi->data; struct drbd_peer_device *peer_device; int vnr; tl_release(connection, p->barrier, be32_to_cpu(p->set_size)); rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; if (device->state.conn == C_AHEAD && atomic_read(&device->ap_in_flight) == 0 && !test_and_set_bit(AHEAD_TO_SYNC_SOURCE, &device->flags)) { device->start_resync_timer.expires = jiffies + HZ; add_timer(&device->start_resync_timer); 
} } rcu_read_unlock(); return 0; } static int got_OVResult(struct drbd_connection *connection, struct packet_info *pi) { struct drbd_peer_device *peer_device; struct drbd_device *device; struct p_block_ack *p = pi->data; struct drbd_device_work *dw; sector_t sector; int size; peer_device = conn_peer_device(connection, pi->vnr); if (!peer_device) return -EIO; device = peer_device->device; sector = be64_to_cpu(p->sector); size = be32_to_cpu(p->blksize); update_peer_seq(peer_device, be32_to_cpu(p->seq_num)); if (be64_to_cpu(p->block_id) == ID_OUT_OF_SYNC) drbd_ov_out_of_sync_found(device, sector, size); else ov_out_of_sync_print(device); if (!get_ldev(device)) return 0; drbd_rs_complete_io(device, sector); dec_rs_pending(device); --device->ov_left; /* let's advance progress step marks only for every other megabyte */ if ((device->ov_left & 0x200) == 0x200) drbd_advance_rs_marks(device, device->ov_left); if (device->ov_left == 0) { dw = kmalloc(sizeof(*dw), GFP_NOIO); if (dw) { dw->w.cb = w_ov_finished; dw->device = device; drbd_queue_work(&peer_device->connection->sender_work, &dw->w); } else { drbd_err(device, "kmalloc(dw) failed.\n"); ov_out_of_sync_print(device); drbd_resync_finished(device); } } put_ldev(device); return 0; } static int got_skip(struct drbd_connection *connection, struct packet_info *pi) { return 0; } static int connection_finish_peer_reqs(struct drbd_connection *connection) { struct drbd_peer_device *peer_device; int vnr, not_empty = 0; do { clear_bit(SIGNAL_ASENDER, &connection->flags); flush_signals(current); rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; kobject_get(&device->kobj); rcu_read_unlock(); if (drbd_finish_peer_reqs(device)) { kobject_put(&device->kobj); return 1; } kobject_put(&device->kobj); rcu_read_lock(); } set_bit(SIGNAL_ASENDER, &connection->flags); spin_lock_irq(&connection->resource->req_lock); idr_for_each_entry(&connection->peer_devices, 
peer_device, vnr) { struct drbd_device *device = peer_device->device; not_empty = !list_empty(&device->done_ee); if (not_empty) break; } spin_unlock_irq(&connection->resource->req_lock); rcu_read_unlock(); } while (not_empty); return 0; } struct asender_cmd { size_t pkt_size; int (*fn)(struct drbd_connection *connection, struct packet_info *); }; static struct asender_cmd asender_tbl[] = { [P_PING] = { 0, got_Ping }, [P_PING_ACK] = { 0, got_PingAck }, [P_RECV_ACK] = { sizeof(struct p_block_ack), got_BlockAck }, [P_WRITE_ACK] = { sizeof(struct p_block_ack), got_BlockAck }, [P_RS_WRITE_ACK] = { sizeof(struct p_block_ack), got_BlockAck }, [P_SUPERSEDED] = { sizeof(struct p_block_ack), got_BlockAck }, [P_NEG_ACK] = { sizeof(struct p_block_ack), got_NegAck }, [P_NEG_DREPLY] = { sizeof(struct p_block_ack), got_NegDReply }, [P_NEG_RS_DREPLY] = { sizeof(struct p_block_ack), got_NegRSDReply }, [P_OV_RESULT] = { sizeof(struct p_block_ack), got_OVResult }, [P_BARRIER_ACK] = { sizeof(struct p_barrier_ack), got_BarrierAck }, [P_STATE_CHG_REPLY] = { sizeof(struct p_req_state_reply), got_RqSReply }, [P_RS_IS_IN_SYNC] = { sizeof(struct p_block_ack), got_IsInSync }, [P_DELAY_PROBE] = { sizeof(struct p_delay_probe93), got_skip }, [P_RS_CANCEL] = { sizeof(struct p_block_ack), got_NegRSDReply }, [P_CONN_ST_CHG_REPLY]={ sizeof(struct p_req_state_reply), got_conn_RqSReply }, [P_RETRY_WRITE] = { sizeof(struct p_block_ack), got_BlockAck }, }; int drbd_asender(struct drbd_thread *thi) { struct drbd_connection *connection = thi->connection; struct asender_cmd *cmd = NULL; struct packet_info pi; int rv; void *buf = connection->meta.rbuf; int received = 0; unsigned int header_size = drbd_header_size(connection); int expect = header_size; bool ping_timeout_active = false; struct net_conf *nc; int ping_timeo, tcp_cork, ping_int; struct sched_param param = { .sched_priority = 2 }; rv = sched_setscheduler(current, SCHED_RR, &param); if (rv < 0) drbd_err(connection, "drbd_asender: ERROR set priority, 
ret=%d\n", rv); while (get_t_state(thi) == RUNNING) { drbd_thread_current_set_cpu(thi); rcu_read_lock(); nc = rcu_dereference(connection->net_conf); ping_timeo = nc->ping_timeo; tcp_cork = nc->tcp_cork; ping_int = nc->ping_int; rcu_read_unlock(); if (test_and_clear_bit(SEND_PING, &connection->flags)) { if (drbd_send_ping(connection)) { drbd_err(connection, "drbd_send_ping has failed\n"); goto reconnect; } connection->meta.socket->sk->sk_rcvtimeo = ping_timeo * HZ / 10; ping_timeout_active = true; } /* TODO: conditionally cork; it may hurt latency if we cork without much to send */ if (tcp_cork) drbd_tcp_cork(connection->meta.socket); if (connection_finish_peer_reqs(connection)) { drbd_err(connection, "connection_finish_peer_reqs() failed\n"); goto reconnect; } /* but unconditionally uncork unless disabled */ if (tcp_cork) drbd_tcp_uncork(connection->meta.socket); /* short circuit, recv_msg would return EINTR anyways. */ if (signal_pending(current)) continue; rv = drbd_recv_short(connection->meta.socket, buf, expect-received, 0); clear_bit(SIGNAL_ASENDER, &connection->flags); flush_signals(current); /* Note: * -EINTR (on meta) we got a signal * -EAGAIN (on meta) rcvtimeo expired * -ECONNRESET other side closed the connection * -ERESTARTSYS (on data) we got a signal * rv < 0 other than above: unexpected error! 
* rv == expected: full header or command * rv < expected: "woken" by signal during receive * rv == 0 : "connection shut down by peer" */ if (likely(rv > 0)) { received += rv; buf += rv; } else if (rv == 0) { if (test_bit(DISCONNECT_SENT, &connection->flags)) { long t; rcu_read_lock(); t = rcu_dereference(connection->net_conf)->ping_timeo * HZ/10; rcu_read_unlock(); t = wait_event_timeout(connection->ping_wait, connection->cstate < C_WF_REPORT_PARAMS, t); if (t) break; } drbd_err(connection, "meta connection shut down by peer.\n"); goto reconnect; } else if (rv == -EAGAIN) { /* If the data socket received something meanwhile, * that is good enough: peer is still alive. */ if (time_after(connection->last_received, jiffies - connection->meta.socket->sk->sk_rcvtimeo)) continue; if (ping_timeout_active) { drbd_err(connection, "PingAck did not arrive in time.\n"); goto reconnect; } set_bit(SEND_PING, &connection->flags); continue; } else if (rv == -EINTR) { continue; } else { drbd_err(connection, "sock_recvmsg returned %d\n", rv); goto reconnect; } if (received == expect && cmd == NULL) { if (decode_header(connection, connection->meta.rbuf, &pi)) goto reconnect; cmd = &asender_tbl[pi.cmd]; if (pi.cmd >= ARRAY_SIZE(asender_tbl) || !cmd->fn) { drbd_err(connection, "Unexpected meta packet %s (0x%04x)\n", cmdname(pi.cmd), pi.cmd); goto disconnect; } expect = header_size + cmd->pkt_size; if (pi.size != expect - header_size) { drbd_err(connection, "Wrong packet size on meta (c: %d, l: %d)\n", pi.cmd, pi.size); goto reconnect; } } if (received == expect) { bool err; err = cmd->fn(connection, &pi); if (err) { drbd_err(connection, "%pf failed\n", cmd->fn); goto reconnect; } connection->last_received = jiffies; if (cmd == &asender_tbl[P_PING_ACK]) { /* restore idle timeout */ connection->meta.socket->sk->sk_rcvtimeo = ping_int * HZ; ping_timeout_active = false; } buf = connection->meta.rbuf; received = 0; expect = header_size; cmd = NULL; } } if (0) { reconnect: 
conn_request_state(connection, NS(conn, C_NETWORK_FAILURE), CS_HARD); conn_md_sync(connection); } if (0) { disconnect: conn_request_state(connection, NS(conn, C_DISCONNECTING), CS_HARD); } clear_bit(SIGNAL_ASENDER, &connection->flags); drbd_info(connection, "asender terminated\n"); return 0; } drbd-8.4.4/drbd/drbd_req.c0000664000000000000000000013322412225736115014014 0ustar rootroot/* drbd_req.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2001-2008, LINBIT Information Technologies GmbH. Copyright (C) 1999-2008, Philipp Reisner . Copyright (C) 2002-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #include #include #include #include "drbd_int.h" #include "drbd_req.h" /* We only support diskstats for 2.6.16 and up. * see also commit commit a362357b6cd62643d4dda3b152639303d78473da * Author: Jens Axboe * Date: Tue Nov 1 09:26:16 2005 +0100 * [BLOCK] Unify the separate read/write io stat fields into arrays */ #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,16) #define _drbd_start_io_acct(...) do {} while (0) #define _drbd_end_io_acct(...) 
do {} while (0) #else static bool drbd_may_do_local_read(struct drbd_device *device, sector_t sector, int size); /* Update disk stats at start of I/O request */ static void _drbd_start_io_acct(struct drbd_device *device, struct drbd_request *req) { const int rw = bio_data_dir(req->master_bio); #ifndef __disk_stat_inc int cpu; #endif #ifndef COMPAT_HAVE_ATOMIC_IN_FLIGHT spin_lock_irq(&device->resource->req_lock); #endif #ifdef __disk_stat_inc __disk_stat_inc(device->vdisk, ios[rw]); __disk_stat_add(device->vdisk, sectors[rw], req->i.size >> 9); disk_round_stats(device->vdisk); device->vdisk->in_flight++; #else cpu = part_stat_lock(); part_round_stats(cpu, &device->vdisk->part0); part_stat_inc(cpu, &device->vdisk->part0, ios[rw]); part_stat_add(cpu, &device->vdisk->part0, sectors[rw], req->i.size >> 9); (void) cpu; /* The macro invocations above want the cpu argument, I do not like the compiler warning about cpu only assigned but never used... */ part_inc_in_flight(&device->vdisk->part0, rw); part_stat_unlock(); #endif #ifndef COMPAT_HAVE_ATOMIC_IN_FLIGHT spin_unlock_irq(&device->resource->req_lock); #endif } /* Update disk stats when completing request upwards */ static void _drbd_end_io_acct(struct drbd_device *device, struct drbd_request *req) { int rw = bio_data_dir(req->master_bio); unsigned long duration = jiffies - req->start_time; #ifndef __disk_stat_inc int cpu; #endif #ifdef __disk_stat_add __disk_stat_add(device->vdisk, ticks[rw], duration); disk_round_stats(device->vdisk); device->vdisk->in_flight--; #else cpu = part_stat_lock(); part_stat_add(cpu, &device->vdisk->part0, ticks[rw], duration); part_round_stats(cpu, &device->vdisk->part0); part_dec_in_flight(&device->vdisk->part0, rw); part_stat_unlock(); #endif } #endif static struct drbd_request *drbd_req_new(struct drbd_device *device, struct bio *bio_src) { struct drbd_request *req; req = mempool_alloc(drbd_request_mempool, GFP_NOIO); if (!req) return NULL; drbd_req_make_private_bio(req, bio_src); 
req->rq_state = bio_data_dir(bio_src) == WRITE ? RQ_WRITE : 0; req->device = device; req->master_bio = bio_src; req->epoch = 0; drbd_clear_interval(&req->i); req->i.sector = bio_src->bi_sector; req->i.size = bio_src->bi_size; req->i.local = true; req->i.waiting = false; INIT_LIST_HEAD(&req->tl_requests); INIT_LIST_HEAD(&req->w.list); /* one reference to be put by __drbd_make_request */ atomic_set(&req->completion_ref, 1); /* one kref as long as completion_ref > 0 */ kref_init(&req->kref); return req; } void drbd_req_destroy(struct kref *kref) { struct drbd_request *req = container_of(kref, struct drbd_request, kref); struct drbd_device *device = req->device; const unsigned s = req->rq_state; if ((req->master_bio && !(s & RQ_POSTPONED)) || atomic_read(&req->completion_ref) || (s & RQ_LOCAL_PENDING) || ((s & RQ_NET_MASK) && !(s & RQ_NET_DONE))) { drbd_err(device, "drbd_req_destroy: Logic BUG rq_state = 0x%x, completion_ref = %d\n", s, atomic_read(&req->completion_ref)); return; } /* remove it from the transfer log. * well, only if it had been there in the first * place... if it had not (local only or conflicting * and never sent), it should still be "empty" as * initialized in drbd_req_new(), so we can list_del() it * here unconditionally */ list_del_init(&req->tl_requests); /* if it was a write, we may have to set the corresponding * bit(s) out-of-sync first. If it had a local part, we need to * release the reference to the activity log. */ if (s & RQ_WRITE) { /* Set out-of-sync unless both OK flags are set * (local only or remote failed). * Other places where we set out-of-sync: * READ with local io-error */ /* There is a special case: * we may notice late that IO was suspended, * and postpone, or schedule for retry, a write, * before it even was submitted or sent. * In that case we do not want to touch the bitmap at all. 
*/ if ((s & (RQ_POSTPONED|RQ_LOCAL_MASK|RQ_NET_MASK)) != RQ_POSTPONED) { if (!(s & RQ_NET_OK) || !(s & RQ_LOCAL_OK)) drbd_set_out_of_sync(device, req->i.sector, req->i.size); if ((s & RQ_NET_OK) && (s & RQ_LOCAL_OK) && (s & RQ_NET_SIS)) drbd_set_in_sync(device, req->i.sector, req->i.size); } /* one might be tempted to move the drbd_al_complete_io * to the local io completion callback drbd_request_endio. * but, if this was a mirror write, we may only * drbd_al_complete_io after this is RQ_NET_DONE, * otherwise the extent could be dropped from the al * before it has actually been written on the peer. * if we crash before our peer knows about the request, * but after the extent has been dropped from the al, * we would forget to resync the corresponding extent. */ if (s & RQ_IN_ACT_LOG) { if (get_ldev_if_state(device, D_FAILED)) { drbd_al_complete_io(device, &req->i); put_ldev(device); } else if (DRBD_ratelimit(5*HZ, 3)) { drbd_warn(device, "Should have called drbd_al_complete_io(, %llu, %u), " "but my Disk seems to have failed :(\n", (unsigned long long) req->i.sector, req->i.size); } } } mempool_free(req, drbd_request_mempool); } static void wake_all_senders(struct drbd_connection *connection) { wake_up(&connection->sender_work.q_wait); } /* must hold resource->req_lock */ void start_new_tl_epoch(struct drbd_connection *connection) { /* no point closing an epoch, if it is empty, anyways. */ if (connection->current_tle_writes == 0) return; connection->current_tle_writes = 0; atomic_inc(&connection->current_tle_nr); wake_all_senders(connection); } void complete_master_bio(struct drbd_device *device, struct bio_and_error *m) { bio_endio(m->bio, m->error); dec_ap_bio(device); } static void drbd_remove_request_interval(struct rb_root *root, struct drbd_request *req) { struct drbd_device *device = req->device; struct drbd_interval *i = &req->i; drbd_remove_interval(root, i); /* Wake up any processes waiting for this request to complete. 
*/ if (i->waiting) wake_up(&device->misc_wait); } /* Helper for __req_mod(). * Set m->bio to the master bio, if it is fit to be completed, * or leave it alone (it is initialized to NULL in __req_mod), * if it has already been completed, or cannot be completed yet. * If m->bio is set, the error status to be returned is placed in m->error. */ static void drbd_req_complete(struct drbd_request *req, struct bio_and_error *m) { const unsigned s = req->rq_state; struct drbd_device *device = req->device; int rw; int error, ok; /* we must not complete the master bio, while it is * still being processed by _drbd_send_zc_bio (drbd_send_dblock) * not yet acknowledged by the peer * not yet completed by the local io subsystem * these flags may get cleared in any order by * the worker, * the receiver, * the bio_endio completion callbacks. */ if ((s & RQ_LOCAL_PENDING && !(s & RQ_LOCAL_ABORTED)) || (s & RQ_NET_QUEUED) || (s & RQ_NET_PENDING) || (s & RQ_COMPLETION_SUSP)) { drbd_err(device, "drbd_req_complete: Logic BUG rq_state = 0x%x\n", s); return; } if (!req->master_bio) { drbd_err(device, "drbd_req_complete: Logic BUG, master_bio == NULL!\n"); return; } rw = bio_rw(req->master_bio); /* * figure out whether to report success or failure. * * report success when at least one of the operations succeeded. * or, to put the other way, * only report failure, when both operations failed. * * what to do about the failures is handled elsewhere. * what we need to do here is just: complete the master_bio. * * local completion error, if any, has been stored as ERR_PTR * in private_bio within drbd_request_endio. 
*/ ok = (s & RQ_LOCAL_OK) || (s & RQ_NET_OK); error = PTR_ERR(req->private_bio); /* remove the request from the conflict detection * respective block_id verification hash */ if (!drbd_interval_empty(&req->i)) { struct rb_root *root; if (rw == WRITE) root = &device->write_requests; else root = &device->read_requests; drbd_remove_request_interval(root, req); } /* Before we can signal completion to the upper layers, * we may need to close the current transfer log epoch. * We are within the request lock, so we can simply compare * the request epoch number with the current transfer log * epoch number. If they match, increase the current_tle_nr, * and reset the transfer log epoch write_cnt. */ if (rw == WRITE && req->epoch == atomic_read(&first_peer_device(device)->connection->current_tle_nr)) start_new_tl_epoch(first_peer_device(device)->connection); /* Update disk stats */ _drbd_end_io_acct(device, req); /* If READ failed, * have it be pushed back to the retry work queue, * so it will re-enter __drbd_make_request(), * and be re-assigned to a suitable local or remote path, * or failed if we do not have access to good data anymore. * * Unless it was failed early by __drbd_make_request(), * because no path was available, in which case * it was not even added to the transfer_log. * * READA may fail, and will not be retried. * * WRITE should have used all available paths already. */ if (!ok && rw == READ && !list_empty(&req->tl_requests)) req->rq_state |= RQ_POSTPONED; if (!(req->rq_state & RQ_POSTPONED)) { m->error = ok ? 
0 : (error ?: -EIO); m->bio = req->master_bio; req->master_bio = NULL; } } static int drbd_req_put_completion_ref(struct drbd_request *req, struct bio_and_error *m, int put) { struct drbd_device *device = req->device; D_ASSERT(device, m || (req->rq_state & RQ_POSTPONED)); if (!atomic_sub_and_test(put, &req->completion_ref)) return 0; drbd_req_complete(req, m); if (req->rq_state & RQ_POSTPONED) { /* don't destroy the req object just yet, * but queue it for retry */ drbd_restart_request(req); return 0; } return 1; } /* I'd like this to be the only place that manipulates * req->completion_ref and req->kref. */ static void mod_rq_state(struct drbd_request *req, struct bio_and_error *m, int clear, int set) { struct drbd_device *device = req->device; unsigned s = req->rq_state; int c_put = 0; int k_put = 0; if (drbd_suspended(device) && !((s | clear) & RQ_COMPLETION_SUSP)) set |= RQ_COMPLETION_SUSP; /* apply */ req->rq_state &= ~clear; req->rq_state |= set; /* no change? */ if (req->rq_state == s) return; /* intent: get references */ if (!(s & RQ_LOCAL_PENDING) && (set & RQ_LOCAL_PENDING)) atomic_inc(&req->completion_ref); if (!(s & RQ_NET_PENDING) && (set & RQ_NET_PENDING)) { inc_ap_pending(device); atomic_inc(&req->completion_ref); } if (!(s & RQ_NET_QUEUED) && (set & RQ_NET_QUEUED)) atomic_inc(&req->completion_ref); if (!(s & RQ_EXP_BARR_ACK) && (set & RQ_EXP_BARR_ACK)) kref_get(&req->kref); /* wait for the DONE */ if (!(s & RQ_NET_SENT) && (set & RQ_NET_SENT)) atomic_add(req->i.size >> 9, &device->ap_in_flight); if (!(s & RQ_COMPLETION_SUSP) && (set & RQ_COMPLETION_SUSP)) atomic_inc(&req->completion_ref); /* progress: put references */ if ((s & RQ_COMPLETION_SUSP) && (clear & RQ_COMPLETION_SUSP)) ++c_put; if (!(s & RQ_LOCAL_ABORTED) && (set & RQ_LOCAL_ABORTED)) { D_ASSERT(device, req->rq_state & RQ_LOCAL_PENDING); /* local completion may still come in later, * we need to keep the req object around. 
*/ kref_get(&req->kref); ++c_put; } if ((s & RQ_LOCAL_PENDING) && (clear & RQ_LOCAL_PENDING)) { if (req->rq_state & RQ_LOCAL_ABORTED) ++k_put; else ++c_put; } if ((s & RQ_NET_PENDING) && (clear & RQ_NET_PENDING)) { dec_ap_pending(device); ++c_put; } if ((s & RQ_NET_QUEUED) && (clear & RQ_NET_QUEUED)) ++c_put; if ((s & RQ_EXP_BARR_ACK) && !(s & RQ_NET_DONE) && (set & RQ_NET_DONE)) { if (req->rq_state & RQ_NET_SENT) atomic_sub(req->i.size >> 9, &device->ap_in_flight); ++k_put; } /* potentially complete and destroy */ if (k_put || c_put) { /* Completion does its own kref_put. If we are going to * kref_sub below, we need req to be still around then. */ int at_least = k_put + !!c_put; int refcount = atomic_read(&req->kref.refcount); if (refcount < at_least) drbd_err(device, "mod_rq_state: Logic BUG: %x -> %x: refcount = %d, should be >= %d\n", s, req->rq_state, refcount, at_least); } /* If we made progress, retry conflicting peer requests, if any. */ if (req->i.waiting) wake_up(&device->misc_wait); if (c_put) k_put += drbd_req_put_completion_ref(req, m, c_put); if (k_put) kref_sub(&req->kref, k_put, drbd_req_destroy); } static void drbd_report_io_error(struct drbd_device *device, struct drbd_request *req) { char b[BDEVNAME_SIZE]; if (!DRBD_ratelimit(5*HZ, 3)) return; drbd_warn(device, "local %s IO error sector %llu+%u on %s\n", (req->rq_state & RQ_WRITE) ? "WRITE" : "READ", (unsigned long long)req->i.sector, req->i.size >> 9, bdevname(device->ldev->backing_bdev, b)); } /* obviously this could be coded as many single functions * instead of one huge switch, * or by putting the code directly in the respective locations * (as it has been before).
* * but having it this way * enforces that it is all in this one place, where it is easier to audit, * it makes it obvious that whatever "event" "happens" to a request should * happen "atomically" within the req_lock, * and it enforces that we have to think in a very structured manner * about the "events" that may happen to a request during its life time ... */ int __req_mod(struct drbd_request *req, enum drbd_req_event what, struct bio_and_error *m) { struct drbd_device *device = req->device; struct net_conf *nc; int p, rv = 0; if (m) m->bio = NULL; switch (what) { default: drbd_err(device, "LOGIC BUG in %s:%u\n", __FILE__ , __LINE__); break; /* does not happen... * initialization done in drbd_req_new case CREATED: break; */ case TO_BE_SENT: /* via network */ /* reached via __drbd_make_request * and from w_read_retry_remote */ D_ASSERT(device, !(req->rq_state & RQ_NET_MASK)); rcu_read_lock(); nc = rcu_dereference(first_peer_device(device)->connection->net_conf); p = nc->wire_protocol; rcu_read_unlock(); req->rq_state |= p == DRBD_PROT_C ? RQ_EXP_WRITE_ACK : p == DRBD_PROT_B ? RQ_EXP_RECEIVE_ACK : 0; mod_rq_state(req, m, 0, RQ_NET_PENDING); break; case TO_BE_SUBMITTED: /* locally */ /* reached via __drbd_make_request */ D_ASSERT(device, !(req->rq_state & RQ_LOCAL_MASK)); mod_rq_state(req, m, 0, RQ_LOCAL_PENDING); break; case COMPLETED_OK: if (req->rq_state & RQ_WRITE) device->writ_cnt += req->i.size >> 9; else device->read_cnt += req->i.size >> 9; mod_rq_state(req, m, RQ_LOCAL_PENDING, RQ_LOCAL_COMPLETED|RQ_LOCAL_OK); break; case ABORT_DISK_IO: mod_rq_state(req, m, 0, RQ_LOCAL_ABORTED); break; case WRITE_COMPLETED_WITH_ERROR: drbd_report_io_error(device, req); __drbd_chk_io_error(device, DRBD_WRITE_ERROR); mod_rq_state(req, m, RQ_LOCAL_PENDING, RQ_LOCAL_COMPLETED); break; case READ_COMPLETED_WITH_ERROR: drbd_set_out_of_sync(device, req->i.sector, req->i.size); drbd_report_io_error(device, req); __drbd_chk_io_error(device, DRBD_READ_ERROR); /* fall through. 
*/ case READ_AHEAD_COMPLETED_WITH_ERROR: /* it is legal to fail READA, no __drbd_chk_io_error in that case. */ mod_rq_state(req, m, RQ_LOCAL_PENDING, RQ_LOCAL_COMPLETED); break; case DISCARD_COMPLETED_NOTSUPP: case DISCARD_COMPLETED_WITH_ERROR: /* I'd rather not detach from local disk just because it * failed a REQ_DISCARD. */ mod_rq_state(req, m, RQ_LOCAL_PENDING, RQ_LOCAL_COMPLETED); break; case QUEUE_FOR_NET_READ: /* READ or READA, and * no local disk, * or target area marked as invalid, * or just got an io-error. */ /* from __drbd_make_request * or from bio_endio during read io-error recovery */ /* So we can verify the handle in the answer packet. * Corresponding drbd_remove_request_interval is in * drbd_req_complete() */ D_ASSERT(device, drbd_interval_empty(&req->i)); drbd_insert_interval(&device->read_requests, &req->i); set_bit(UNPLUG_REMOTE, &device->flags); D_ASSERT(device, req->rq_state & RQ_NET_PENDING); D_ASSERT(device, (req->rq_state & RQ_LOCAL_MASK) == 0); mod_rq_state(req, m, 0, RQ_NET_QUEUED); req->w.cb = w_send_read_req; drbd_queue_work(&first_peer_device(device)->connection->sender_work, &req->w); break; case QUEUE_FOR_NET_WRITE: /* assert something? */ /* from __drbd_make_request only */ /* Corresponding drbd_remove_request_interval is in * drbd_req_complete() */ D_ASSERT(device, drbd_interval_empty(&req->i)); drbd_insert_interval(&device->write_requests, &req->i); /* NOTE * In case the req ended up on the transfer log before being * queued on the worker, it could lead to this request being * missed during cleanup after connection loss. * So we have to do both operations here, * within the same lock that protects the transfer log. * * _req_add_to_epoch(req); this has to be after the * _maybe_start_new_epoch(req); which happened in * __drbd_make_request, because we now may set the bit * again ourselves to close the current epoch. * * Add req to the (now) current epoch (barrier). 
*/ /* otherwise we may lose an unplug, which may cause some remote * io-scheduler timeout to expire, increasing maximum latency, * hurting performance. */ set_bit(UNPLUG_REMOTE, &device->flags); /* queue work item to send data */ D_ASSERT(device, req->rq_state & RQ_NET_PENDING); mod_rq_state(req, m, 0, RQ_NET_QUEUED|RQ_EXP_BARR_ACK); req->w.cb = w_send_dblock; drbd_queue_work(&first_peer_device(device)->connection->sender_work, &req->w); /* close the epoch, in case it outgrew the limit */ rcu_read_lock(); nc = rcu_dereference(first_peer_device(device)->connection->net_conf); p = nc->max_epoch_size; rcu_read_unlock(); if (first_peer_device(device)->connection->current_tle_writes >= p) start_new_tl_epoch(first_peer_device(device)->connection); break; case QUEUE_FOR_SEND_OOS: mod_rq_state(req, m, 0, RQ_NET_QUEUED); req->w.cb = w_send_out_of_sync; drbd_queue_work(&first_peer_device(device)->connection->sender_work, &req->w); break; case READ_RETRY_REMOTE_CANCELED: case SEND_CANCELED: case SEND_FAILED: /* real cleanup will be done from tl_clear. just update flags * so it is no longer marked as on the worker queue */ mod_rq_state(req, m, RQ_NET_QUEUED, 0); break; case HANDED_OVER_TO_NETWORK: /* assert something? */ if (bio_data_dir(req->master_bio) == WRITE && !(req->rq_state & (RQ_EXP_RECEIVE_ACK | RQ_EXP_WRITE_ACK))) { /* this is what is dangerous about protocol A: * pretend it was successfully written on the peer. */ if (req->rq_state & RQ_NET_PENDING) mod_rq_state(req, m, RQ_NET_PENDING, RQ_NET_OK); /* else: neg-ack was faster... */ /* it is still not yet RQ_NET_DONE until the * corresponding epoch barrier got acked as well, * so we know what to dirty on connection loss */ } mod_rq_state(req, m, RQ_NET_QUEUED, RQ_NET_SENT); break; case OOS_HANDED_TO_NETWORK: /* Was not set PENDING, no longer QUEUED, so is now DONE * as far as this connection is concerned. 
*/ mod_rq_state(req, m, RQ_NET_QUEUED, RQ_NET_DONE); break; case CONNECTION_LOST_WHILE_PENDING: /* transfer log cleanup after connection loss */ mod_rq_state(req, m, RQ_NET_OK|RQ_NET_PENDING|RQ_COMPLETION_SUSP, RQ_NET_DONE); break; case CONFLICT_RESOLVED: /* for superseded conflicting writes of multiple primaries, * there is no need to keep anything in the tl, potential * node crashes are covered by the activity log. * * If this request had been marked as RQ_POSTPONED before, * it will actually not be completed, but "restarted", * resubmitted from the retry worker context. */ D_ASSERT(device, req->rq_state & RQ_NET_PENDING); D_ASSERT(device, req->rq_state & RQ_EXP_WRITE_ACK); mod_rq_state(req, m, RQ_NET_PENDING, RQ_NET_DONE|RQ_NET_OK); break; case WRITE_ACKED_BY_PEER_AND_SIS: req->rq_state |= RQ_NET_SIS; case WRITE_ACKED_BY_PEER: D_ASSERT(device, req->rq_state & RQ_EXP_WRITE_ACK); /* protocol C; successfully written on peer. * Nothing more to do here. * We want to keep the tl in place for all protocols, to cater * for volatile write-back caches on lower level devices. */ goto ack_common; case RECV_ACKED_BY_PEER: D_ASSERT(device, req->rq_state & RQ_EXP_RECEIVE_ACK); /* protocol B; pretends to be successfully written on peer. * see also notes above in HANDED_OVER_TO_NETWORK about * protocol != C */ ack_common: D_ASSERT(device, req->rq_state & RQ_NET_PENDING); mod_rq_state(req, m, RQ_NET_PENDING, RQ_NET_OK); break; case POSTPONE_WRITE: D_ASSERT(device, req->rq_state & RQ_EXP_WRITE_ACK); /* If this node has already detected the write conflict, the * worker will be waiting on misc_wait. Wake it up once this * request has completed locally. */ D_ASSERT(device, req->rq_state & RQ_NET_PENDING); req->rq_state |= RQ_POSTPONED; if (req->i.waiting) wake_up(&device->misc_wait); /* Do not clear RQ_NET_PENDING. This request will make further * progress via restart_conflicting_writes() or * fail_postponed_requests(). Hopefully. 
*/ break; case NEG_ACKED: mod_rq_state(req, m, RQ_NET_OK|RQ_NET_PENDING, 0); break; case FAIL_FROZEN_DISK_IO: if (!(req->rq_state & RQ_LOCAL_COMPLETED)) break; mod_rq_state(req, m, RQ_COMPLETION_SUSP, 0); break; case RESTART_FROZEN_DISK_IO: if (!(req->rq_state & RQ_LOCAL_COMPLETED)) break; mod_rq_state(req, m, RQ_COMPLETION_SUSP|RQ_LOCAL_COMPLETED, RQ_LOCAL_PENDING); rv = MR_READ; if (bio_data_dir(req->master_bio) == WRITE) rv = MR_WRITE; get_ldev(device); /* always succeeds in this call path */ req->w.cb = w_restart_disk_io; drbd_queue_work(&first_peer_device(device)->connection->sender_work, &req->w); break; case RESEND: /* Simply complete (local only) READs. */ if (!(req->rq_state & RQ_WRITE) && !req->w.cb) { mod_rq_state(req, m, RQ_COMPLETION_SUSP, 0); break; } /* If RQ_NET_OK is already set, we got a P_WRITE_ACK or P_RECV_ACK before the connection loss (B&C only); only P_BARRIER_ACK (or the local completion?) was missing when we suspended. Throwing them out of the TL here by pretending we got a BARRIER_ACK. During connection handshake, we ensure that the peer was not rebooted. */ if (!(req->rq_state & RQ_NET_OK)) { /* FIXME could this possibly be a req->dw.cb == w_send_out_of_sync? * in that case we must not set RQ_NET_PENDING. */ mod_rq_state(req, m, RQ_COMPLETION_SUSP, RQ_NET_QUEUED|RQ_NET_PENDING); if (req->w.cb) { drbd_queue_work(&first_peer_device(device)->connection->sender_work, &req->w); rv = req->rq_state & RQ_WRITE ? MR_WRITE : MR_READ; } /* else: FIXME can this happen? */ break; } /* else, fall through to BARRIER_ACKED */ case BARRIER_ACKED: /* barrier ack for READ requests does not make sense */ if (!(req->rq_state & RQ_WRITE)) break; if (req->rq_state & RQ_NET_PENDING) { /* barrier came in before all requests were acked. * this is bad, because if the connection is lost now, * we won't be able to clean them up... */ drbd_err(device, "FIXME (BARRIER_ACKED but pending)\n"); } /* Allowed to complete requests, even while suspended. 
* As this is called for all requests within a matching epoch, * we need to filter, and only set RQ_NET_DONE for those that * have actually been on the wire. */ mod_rq_state(req, m, RQ_COMPLETION_SUSP, (req->rq_state & RQ_NET_MASK) ? RQ_NET_DONE : 0); break; case DATA_RECEIVED: D_ASSERT(device, req->rq_state & RQ_NET_PENDING); mod_rq_state(req, m, RQ_NET_PENDING, RQ_NET_OK|RQ_NET_DONE); break; case QUEUE_AS_DRBD_BARRIER: start_new_tl_epoch(first_peer_device(device)->connection); mod_rq_state(req, m, 0, RQ_NET_OK|RQ_NET_DONE); break; }; return rv; } /* we may do a local read if: * - we are consistent (of course), * - or we are generally inconsistent, * BUT we are still/already IN SYNC for this area. * since size may be bigger than BM_BLOCK_SIZE, * we may need to check several bits. */ static bool drbd_may_do_local_read(struct drbd_device *device, sector_t sector, int size) { unsigned long sbnr, ebnr; sector_t esector, nr_sectors; if (device->state.disk == D_UP_TO_DATE) return true; if (device->state.disk != D_INCONSISTENT) return false; esector = sector + (size >> 9) - 1; nr_sectors = drbd_get_capacity(device->this_bdev); D_ASSERT(device, sector < nr_sectors); D_ASSERT(device, esector < nr_sectors); sbnr = BM_SECT_TO_BIT(sector); ebnr = BM_SECT_TO_BIT(esector); return drbd_bm_count_bits(device, sbnr, ebnr) == 0; } static bool remote_due_to_read_balancing(struct drbd_device *device, sector_t sector, enum drbd_read_balancing rbm) { struct backing_dev_info *bdi; int stripe_shift; switch (rbm) { case RB_CONGESTED_REMOTE: bdi = &device->ldev->backing_bdev->bd_disk->queue->backing_dev_info; return bdi_read_congested(bdi); case RB_LEAST_PENDING: return atomic_read(&device->local_cnt) > atomic_read(&device->ap_pending_cnt) + atomic_read(&device->rs_pending_cnt); case RB_32K_STRIPING: /* stripe_shift = 15 */ case RB_64K_STRIPING: case RB_128K_STRIPING: case RB_256K_STRIPING: case RB_512K_STRIPING: case RB_1M_STRIPING: /* stripe_shift = 20 */ stripe_shift = (rbm - 
RB_32K_STRIPING + 15); return (sector >> (stripe_shift - 9)) & 1; case RB_ROUND_ROBIN: return test_and_change_bit(READ_BALANCE_RR, &device->flags); case RB_PREFER_REMOTE: return true; case RB_PREFER_LOCAL: default: return false; } } /* * complete_conflicting_writes - wait for any conflicting write requests * * The write_requests tree contains all active write requests which we * currently know about. Wait for any requests to complete which conflict with * the new one. * * Only way out: remove the conflicting intervals from the tree. */ static void complete_conflicting_writes(struct drbd_request *req) { DEFINE_WAIT(wait); struct drbd_device *device = req->device; struct drbd_interval *i; sector_t sector = req->i.sector; int size = req->i.size; i = drbd_find_overlap(&device->write_requests, sector, size); if (!i) return; for (;;) { prepare_to_wait(&device->misc_wait, &wait, TASK_UNINTERRUPTIBLE); i = drbd_find_overlap(&device->write_requests, sector, size); if (!i) break; /* Indicate to wake up device->misc_wait on progress. */ i->waiting = true; spin_unlock_irq(&device->resource->req_lock); schedule(); spin_lock_irq(&device->resource->req_lock); } finish_wait(&device->misc_wait, &wait); } /* called within req_lock and rcu_read_lock() */ static void maybe_pull_ahead(struct drbd_device *device) { struct drbd_connection *connection = first_peer_device(device)->connection; struct net_conf *nc; bool congested = false; enum drbd_on_congestion on_congestion; rcu_read_lock(); nc = rcu_dereference(connection->net_conf); on_congestion = nc ? nc->on_congestion : OC_BLOCK; rcu_read_unlock(); if (on_congestion == OC_BLOCK || connection->agreed_pro_version < 96) return; /* If I don't even have good local storage, we can not reasonably try * to pull ahead of the peer. We also need the local reference to make * sure device->act_log is there. 
*/ if (!get_ldev_if_state(device, D_UP_TO_DATE)) return; if (nc->cong_fill && atomic_read(&device->ap_in_flight) >= nc->cong_fill) { drbd_info(device, "Congestion-fill threshold reached\n"); congested = true; } if (device->act_log->used >= nc->cong_extents) { drbd_info(device, "Congestion-extents threshold reached\n"); congested = true; } if (congested) { /* start a new epoch for non-mirrored writes */ start_new_tl_epoch(first_peer_device(device)->connection); if (on_congestion == OC_PULL_AHEAD) _drbd_set_state(_NS(device, conn, C_AHEAD), 0, NULL); else /*nc->on_congestion == OC_DISCONNECT */ _drbd_set_state(_NS(device, conn, C_DISCONNECTING), 0, NULL); } put_ldev(device); } /* If this returns false, and req->private_bio is still set, * this should be submitted locally. * * If it returns false, but req->private_bio is not set, * we do not have access to good data :( * * Otherwise, this destroys req->private_bio, if any, * and returns true. */ static bool do_remote_read(struct drbd_request *req) { struct drbd_device *device = req->device; enum drbd_read_balancing rbm; if (req->private_bio) { if (!drbd_may_do_local_read(device, req->i.sector, req->i.size)) { bio_put(req->private_bio); req->private_bio = NULL; put_ldev(device); } } if (device->state.pdsk != D_UP_TO_DATE) return false; if (req->private_bio == NULL) return true; /* TODO: improve read balancing decisions, take into account drbd * protocol, pending requests etc. */ rcu_read_lock(); rbm = rcu_dereference(device->ldev->disk_conf)->read_balancing; rcu_read_unlock(); if (rbm == RB_PREFER_LOCAL && req->private_bio) return false; /* submit locally */ if (remote_due_to_read_balancing(device, req->i.sector, rbm)) { if (req->private_bio) { bio_put(req->private_bio); req->private_bio = NULL; put_ldev(device); } return true; } return false; } /* returns number of connections (== 1, for drbd 8.4) * expected to actually write this data, * which does NOT include those that we are L_AHEAD for. 
*/ static int drbd_process_write_request(struct drbd_request *req) { struct drbd_device *device = req->device; int remote, send_oos; remote = drbd_should_do_remote(device->state); send_oos = drbd_should_send_out_of_sync(device->state); /* Need to replicate writes. Unless it is an empty flush, * which is better mapped to a DRBD P_BARRIER packet, * also for drbd wire protocol compatibility reasons. * If this was a flush, just start a new epoch. * Unless the current epoch was empty anyways, or we are not currently * replicating, in which case there is no point. */ if (unlikely(req->i.size == 0)) { /* The only size==0 bios we expect are empty flushes. */ D_ASSERT(device, req->master_bio->bi_rw & DRBD_REQ_FLUSH); if (remote) _req_mod(req, QUEUE_AS_DRBD_BARRIER); return remote; } if (!remote && !send_oos) return 0; D_ASSERT(device, !(remote && send_oos)); if (remote) { _req_mod(req, TO_BE_SENT); _req_mod(req, QUEUE_FOR_NET_WRITE); } else if (drbd_set_out_of_sync(device, req->i.sector, req->i.size)) _req_mod(req, QUEUE_FOR_SEND_OOS); return remote; } static void drbd_submit_req_private_bio(struct drbd_request *req) { struct drbd_device *device = req->device; struct bio *bio = req->private_bio; const int rw = bio_rw(bio); bio->bi_bdev = device->ldev->backing_bdev; /* State may have changed since we grabbed our reference on the * ->ldev member. Double check, and short-circuit to endio. * In case the last activity log transaction failed to get on * stable storage, and this is a WRITE, we may not even submit * this bio. */ if (get_ldev(device)) { if (drbd_insert_fault(device, rw == WRITE ? DRBD_FAULT_DT_WR : rw == READ ? 
DRBD_FAULT_DT_RD : DRBD_FAULT_DT_RA)) bio_endio(bio, -EIO); else generic_make_request(bio); put_ldev(device); } else bio_endio(bio, -EIO); } static void drbd_queue_write(struct drbd_device *device, struct drbd_request *req) { spin_lock(&device->submit.lock); list_add_tail(&req->tl_requests, &device->submit.writes); spin_unlock(&device->submit.lock); queue_work(device->submit.wq, &device->submit.worker); } /* returns the new drbd_request pointer, if the caller is expected to * drbd_send_and_submit() it (to save latency), or NULL if we queued the * request on the submitter thread. * Returns ERR_PTR(-ENOMEM) if we cannot allocate a drbd_request. */ struct drbd_request * drbd_request_prepare(struct drbd_device *device, struct bio *bio, unsigned long start_time) { const int rw = bio_data_dir(bio); struct drbd_request *req; /* allocate outside of all locks; */ req = drbd_req_new(device, bio); if (!req) { dec_ap_bio(device); /* only pass the error to the upper layers. * if user cannot handle io errors, that's not our business. */ drbd_err(device, "could not kmalloc() req\n"); bio_endio(bio, -ENOMEM); return ERR_PTR(-ENOMEM); } req->start_time = start_time; if (!get_ldev(device)) { bio_put(req->private_bio); req->private_bio = NULL; } /* Update disk stats */ _drbd_start_io_acct(device, req); if (rw == WRITE && req->private_bio && req->i.size && !test_bit(AL_SUSPENDED, &device->flags)) { if (!drbd_al_begin_io_fastpath(device, &req->i)) { drbd_queue_write(device, req); return NULL; } req->rq_state |= RQ_IN_ACT_LOG; } return req; } static void drbd_send_and_submit(struct drbd_device *device, struct drbd_request *req) { const int rw = bio_rw(req->master_bio); struct bio_and_error m = { NULL, }; bool no_remote = false; spin_lock_irq(&device->resource->req_lock); if (rw == WRITE) { /* This may temporarily give up the req_lock, * but will re-acquire it before it returns here.
* Needs to be before the check on drbd_suspended() */ complete_conflicting_writes(req); /* no more giving up req_lock from now on! */ /* check for congestion, and potentially stop sending * full data updates, but start sending "dirty bits" only. */ maybe_pull_ahead(device); } if (drbd_suspended(device)) { /* push back and retry: */ req->rq_state |= RQ_POSTPONED; if (req->private_bio) { bio_put(req->private_bio); req->private_bio = NULL; put_ldev(device); } goto out; } /* We fail READ/READA early, if we can not serve it. * We must do this before req is registered on any lists. * Otherwise, drbd_req_complete() will queue failed READ for retry. */ if (rw != WRITE) { if (!do_remote_read(req) && !req->private_bio) goto nodata; } /* which transfer log epoch does this belong to? */ req->epoch = atomic_read(&first_peer_device(device)->connection->current_tle_nr); /* no point in adding empty flushes to the transfer log, * they are mapped to drbd barriers already. */ if (likely(req->i.size!=0)) { if (rw == WRITE) first_peer_device(device)->connection->current_tle_writes++; list_add_tail(&req->tl_requests, &first_peer_device(device)->connection->transfer_log); } if (rw == WRITE) { if (!drbd_process_write_request(req)) no_remote = true; } else { /* We either have a private_bio, or we can read from remote. * Otherwise we had done the goto nodata above. 
*/ if (req->private_bio == NULL) { _req_mod(req, TO_BE_SENT); _req_mod(req, QUEUE_FOR_NET_READ); } else no_remote = true; } if (req->private_bio) { /* needs to be marked within the same spinlock */ _req_mod(req, TO_BE_SUBMITTED); /* but we need to give up the spinlock to submit */ spin_unlock_irq(&device->resource->req_lock); drbd_submit_req_private_bio(req); spin_lock_irq(&device->resource->req_lock); } else if (no_remote) { nodata: if (DRBD_ratelimit(5*HZ, 5)) drbd_err(device, "IO ERROR: neither local nor remote data, sector %llu+%u\n", (unsigned long long)req->i.sector, req->i.size >> 9); /* A write may have been queued for send_oos, however. * So we can not simply free it, we must go through drbd_req_put_completion_ref() */ } out: if (drbd_req_put_completion_ref(req, &m, 1)) kref_put(&req->kref, drbd_req_destroy); spin_unlock_irq(&device->resource->req_lock); if (m.bio) complete_master_bio(device, &m); } void __drbd_make_request(struct drbd_device *device, struct bio *bio, unsigned long start_time) { struct drbd_request *req = drbd_request_prepare(device, bio, start_time); if (IS_ERR_OR_NULL(req)) return; drbd_send_and_submit(device, req); } static void submit_fast_path(struct drbd_device *device, struct list_head *incoming) { struct drbd_request *req, *tmp; list_for_each_entry_safe(req, tmp, incoming, tl_requests) { const int rw = bio_data_dir(req->master_bio); if (rw == WRITE /* rw != WRITE should not even end up here! 
*/ && req->private_bio && req->i.size && !test_bit(AL_SUSPENDED, &device->flags)) { if (!drbd_al_begin_io_fastpath(device, &req->i)) continue; req->rq_state |= RQ_IN_ACT_LOG; } list_del_init(&req->tl_requests); drbd_send_and_submit(device, req); } } static bool prepare_al_transaction_nonblock(struct drbd_device *device, struct list_head *incoming, struct list_head *pending) { struct drbd_request *req, *tmp; int wake = 0; int err; spin_lock_irq(&device->al_lock); list_for_each_entry_safe(req, tmp, incoming, tl_requests) { err = drbd_al_begin_io_nonblock(device, &req->i); if (err == -EBUSY) wake = 1; if (err) continue; req->rq_state |= RQ_IN_ACT_LOG; list_move_tail(&req->tl_requests, pending); } spin_unlock_irq(&device->al_lock); if (wake) wake_up(&device->al_wait); return !list_empty(pending); } void do_submit(struct work_struct *ws) { struct drbd_device *device = container_of(ws, struct drbd_device, submit.worker); LIST_HEAD(incoming); LIST_HEAD(pending); struct drbd_request *req, *tmp; for (;;) { spin_lock(&device->submit.lock); list_splice_tail_init(&device->submit.writes, &incoming); spin_unlock(&device->submit.lock); submit_fast_path(device, &incoming); if (list_empty(&incoming)) break; skip_fast_path: wait_event(device->al_wait, prepare_al_transaction_nonblock(device, &incoming, &pending)); /* Maybe more was queued, while we prepared the transaction? * Try to stuff them into this transaction as well. * Be strictly non-blocking here, no wait_event, we already * have something to commit. * Stop if we don't make any more progress.
*/ for (;;) { LIST_HEAD(more_pending); LIST_HEAD(more_incoming); bool made_progress; /* It is ok to look outside the lock, * it's only an optimization anyways */ if (list_empty(&device->submit.writes)) break; spin_lock(&device->submit.lock); list_splice_tail_init(&device->submit.writes, &more_incoming); spin_unlock(&device->submit.lock); if (list_empty(&more_incoming)) break; made_progress = prepare_al_transaction_nonblock(device, &more_incoming, &more_pending); list_splice_tail_init(&more_pending, &pending); list_splice_tail_init(&more_incoming, &incoming); if (!made_progress) break; } drbd_al_begin_io_commit(device, false); list_for_each_entry_safe(req, tmp, &pending, tl_requests) { list_del_init(&req->tl_requests); drbd_send_and_submit(device, req); } /* If all currently hot activity log extents are kept busy by * incoming requests, we still must not totally starve new * requests to cold extents. In that case, prepare one request * in blocking mode. */ list_for_each_entry_safe(req, tmp, &incoming, tl_requests) { list_del_init(&req->tl_requests); req->rq_state |= RQ_IN_ACT_LOG; if (!drbd_al_begin_io_prepare(device, &req->i)) { /* Corresponding extent was hot after all? */ drbd_send_and_submit(device, req); } else { /* Found a request to a cold extent. * Put on "pending" list, * and try to cumulate with more. */ list_add(&req->tl_requests, &pending); goto skip_fast_path; } } } } MAKE_REQUEST_TYPE drbd_make_request(struct request_queue *q, struct bio *bio) { struct drbd_device *device = (struct drbd_device *) q->queuedata; unsigned long start_time; /* We never supported BIO_RW_BARRIER. * We don't need to, anymore, either: starting with kernel 2.6.36, * we have REQ_FUA and REQ_FLUSH, which will be handled transparently * by the block layer. 
*/ if (unlikely(bio->bi_rw & DRBD_REQ_HARDBARRIER)) { bio_endio(bio, -EOPNOTSUPP); MAKE_REQUEST_RETURN; } start_time = jiffies; /* * what we "blindly" assume: */ D_ASSERT(device, IS_ALIGNED(bio->bi_size, 512)); inc_ap_bio(device); __drbd_make_request(device, bio, start_time); MAKE_REQUEST_RETURN; } /* This is called by bio_add_page(). * * q->max_hw_sectors and other global limits are already enforced there. * * We need to call down to our lower level device, * in case it has special restrictions. * * We also may need to enforce configured max-bio-bvecs limits. * * As long as the BIO is empty we have to allow at least one bvec, * regardless of size and offset, so no need to ask lower levels. */ int drbd_merge_bvec(struct request_queue *q, #ifdef HAVE_bvec_merge_data struct bvec_merge_data *bvm, #else struct bio *bvm, #endif struct bio_vec *bvec) { struct drbd_device *device = (struct drbd_device *) q->queuedata; unsigned int bio_size = bvm->bi_size; int limit = DRBD_MAX_BIO_SIZE; int backing_limit; if (bio_size && get_ldev(device)) { unsigned int max_hw_sectors = queue_max_hw_sectors(q); struct request_queue * const b = device->ldev->backing_bdev->bd_disk->queue; if (b->merge_bvec_fn) { backing_limit = b->merge_bvec_fn(b, bvm, bvec); limit = min(limit, backing_limit); } put_ldev(device); if ((limit >> 9) > max_hw_sectors) limit = max_hw_sectors << 9; } return limit; } static void find_oldest_requests( struct drbd_connection *connection, struct drbd_device *device, struct drbd_request **oldest_req_waiting_for_peer, struct drbd_request **oldest_req_waiting_for_disk) { struct drbd_request *r; *oldest_req_waiting_for_peer = NULL; *oldest_req_waiting_for_disk = NULL; list_for_each_entry(r, &connection->transfer_log, tl_requests) { const unsigned s = r->rq_state; if (!*oldest_req_waiting_for_peer && ((s & RQ_NET_MASK) && !(s & RQ_NET_DONE))) *oldest_req_waiting_for_peer = r; if (!*oldest_req_waiting_for_disk && (s & RQ_LOCAL_PENDING) && r->device == device) 
*oldest_req_waiting_for_disk = r; if (*oldest_req_waiting_for_peer && *oldest_req_waiting_for_disk) break; } } void request_timer_fn(unsigned long data) { struct drbd_device *device = (struct drbd_device *) data; struct drbd_connection *connection = first_peer_device(device)->connection; struct drbd_request *req_disk, *req_peer; /* oldest request */ struct net_conf *nc; unsigned long ent = 0, dt = 0, et, nt; /* effective timeout = ko_count * timeout */ unsigned long now; rcu_read_lock(); nc = rcu_dereference(connection->net_conf); if (nc && device->state.conn >= C_WF_REPORT_PARAMS) ent = nc->timeout * HZ/10 * nc->ko_count; if (get_ldev(device)) { /* implicit state.disk >= D_INCONSISTENT */ dt = rcu_dereference(device->ldev->disk_conf)->disk_timeout * HZ / 10; put_ldev(device); } rcu_read_unlock(); et = min_not_zero(dt, ent); if (!et) return; /* Recurring timer stopped */ now = jiffies; spin_lock_irq(&device->resource->req_lock); find_oldest_requests(connection, device, &req_peer, &req_disk); if (req_peer == NULL && req_disk == NULL) { spin_unlock_irq(&device->resource->req_lock); mod_timer(&device->request_timer, now + et); return; } /* The request is considered timed out, if * - we have some effective timeout from the configuration, * with above state restrictions applied, * - the oldest request is waiting for a response from the network * resp. the local disk, * - the oldest request is in fact older than the effective timeout, * - the connection was established (resp. disk was attached) * for longer than the timeout already. * Note that for 32bit jiffies and very stable connections/disks, * we may have a wrap around, which is caught by * !time_in_range(now, last_..._jif, last_..._jif + timeout). * * Side effect: once per 32bit wrap-around interval, which means every * ~198 days with 250 HZ, we have a window where the timeout would need * to expire twice (worst case) to become effective. Good enough. 
*/ if (ent && req_peer && time_after(now, req_peer->start_time + ent) && !time_in_range(now, connection->last_reconnect_jif, connection->last_reconnect_jif + ent)) { drbd_warn(device, "Remote failed to finish a request within ko-count * timeout\n"); _drbd_set_state(_NS(device, conn, C_TIMEOUT), CS_VERBOSE | CS_HARD, NULL); } if (dt && req_disk && time_after(now, req_disk->start_time + dt) && !time_in_range(now, device->last_reattach_jif, device->last_reattach_jif + dt)) { drbd_warn(device, "Local backing device failed to meet the disk-timeout\n"); __drbd_chk_io_error(device, DRBD_FORCE_DETACH); } /* Reschedule timer for the nearest not already expired timeout. * Fallback to now + min(effective network timeout, disk timeout). */ ent = (ent && req_peer && time_before(now, req_peer->start_time + ent)) ? req_peer->start_time + ent : now + et; dt = (dt && req_disk && time_before(now, req_disk->start_time + dt)) ? req_disk->start_time + dt : now + et; nt = time_before(ent, dt) ? ent : dt; spin_unlock_irq(&connection->resource->req_lock); mod_timer(&device->request_timer, nt); } drbd-8.4.4/drbd/drbd_req.h0000664000000000000000000002621512225234676014027 0ustar rootroot/* drbd_req.h This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2006-2008, LINBIT Information Technologies GmbH. Copyright (C) 2006-2008, Lars Ellenberg . Copyright (C) 2006-2008, Philipp Reisner . DRBD is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. DRBD is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. 
If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #ifndef _DRBD_REQ_H #define _DRBD_REQ_H #include #include #include #include "drbd_int.h" /* The request callbacks will be called in irq context by the IDE drivers, and in Softirqs/Tasklets/BH context by the SCSI drivers, and by the receiver and worker in kernel-thread context. Try to get the locking right :) */ /* * Objects of type struct drbd_request do only exist on a R_PRIMARY node, and are * associated with IO requests originating from the block layer above us. * * There are quite a few things that may happen to a drbd request * during its lifetime. * * It will be created. * It will be marked with the intention to be * submitted to local disk and/or * sent via the network. * * It has to be placed on the transfer log and other housekeeping lists, * In case we have a network connection. * * It may be identified as a concurrent (write) request * and be handled accordingly. * * It may be handed over to the local disk subsystem. * It may be completed by the local disk subsystem, * either successfully or with io-error. * In case it is a READ request, and it failed locally, * it may be retried remotely. * * It may be queued for sending. * It may be handed over to the network stack, * which may fail. * It may be acknowledged by the "peer" according to the wire_protocol in use. * this may be a negative ack. * It may receive a faked ack when the network connection is lost and the * transfer log is cleaned up. * Sending may be canceled due to network connection loss. * When it finally has outlived its time, * corresponding dirty bits in the resync-bitmap may be cleared or set, * it will be destroyed, * and completion will be signalled to the originator, * with or without "success". */ enum drbd_req_event { CREATED, TO_BE_SENT, TO_BE_SUBMITTED, /* XXX yes, now I am inconsistent... * these are not "events" but "actions" * oh, well... 
*/ QUEUE_FOR_NET_WRITE, QUEUE_FOR_NET_READ, QUEUE_FOR_SEND_OOS, /* An empty flush is queued as P_BARRIER, * which will cause it to complete "successfully", * even if the local disk flush failed. * * Just like "real" requests, empty flushes (blkdev_issue_flush()) will * only see an error if neither local nor remote data is reachable. */ QUEUE_AS_DRBD_BARRIER, SEND_CANCELED, SEND_FAILED, HANDED_OVER_TO_NETWORK, OOS_HANDED_TO_NETWORK, CONNECTION_LOST_WHILE_PENDING, READ_RETRY_REMOTE_CANCELED, RECV_ACKED_BY_PEER, WRITE_ACKED_BY_PEER, WRITE_ACKED_BY_PEER_AND_SIS, /* and set_in_sync */ CONFLICT_RESOLVED, POSTPONE_WRITE, NEG_ACKED, BARRIER_ACKED, /* in protocol A and B */ DATA_RECEIVED, /* (remote read) */ COMPLETED_OK, READ_COMPLETED_WITH_ERROR, READ_AHEAD_COMPLETED_WITH_ERROR, WRITE_COMPLETED_WITH_ERROR, DISCARD_COMPLETED_NOTSUPP, DISCARD_COMPLETED_WITH_ERROR, ABORT_DISK_IO, RESEND, FAIL_FROZEN_DISK_IO, RESTART_FROZEN_DISK_IO, NOTHING, }; /* encoding of request states for now. we don't actually need that many bits. * we don't need to do atomic bit operations either, since most of the time we * need to look at the connection state and/or manipulate some lists at the * same time, so we should hold the request lock anyways. */ enum drbd_req_state_bits { /* 3210 * 0000: no local possible * 0001: to be submitted * UNUSED, we could map: 011: submitted, completion still pending * 0110: completed ok * 0010: completed with error * 1001: Aborted (before completion) * 1x10: Aborted and completed -> free */ __RQ_LOCAL_PENDING, __RQ_LOCAL_COMPLETED, __RQ_LOCAL_OK, __RQ_LOCAL_ABORTED, /* 87654 * 00000: no network possible * 00001: to be send * 00011: to be send, on worker queue * 00101: sent, expecting recv_ack (B) or write_ack (C) * 11101: sent, * recv_ack (B) or implicit "ack" (A), * still waiting for the barrier ack. * master_bio may already be completed and invalidated. 
* 11100: write acked (C), * data received (for remote read, any protocol) * or finally the barrier ack has arrived (B,A)... * request can be freed * 01100: neg-acked (write, protocol C) * or neg-d-acked (read, any protocol) * or killed from the transfer log * during cleanup after connection loss * request can be freed * 01000: canceled or send failed... * request can be freed */ /* if "SENT" is not set, yet, this can still fail or be canceled. * if "SENT" is set already, we still wait for an Ack packet. * when cleared, the master_bio may be completed. * in (B,A) the request object may still linger on the transaction log * until the corresponding barrier ack comes in */ __RQ_NET_PENDING, /* If it is QUEUED, and it is a WRITE, it is also registered in the * transfer log. Currently we need this flag to avoid conflicts between * worker canceling the request and tl_clear_barrier killing it from * transfer log. We should restructure the code so this conflict does * no longer occur. */ __RQ_NET_QUEUED, /* well, actually only "handed over to the network stack". * * TODO can potentially be dropped because of the similar meaning * of RQ_NET_SENT and ~RQ_NET_QUEUED. * however it is not exactly the same. before we drop it * we must ensure that we can tell a request with network part * from a request without, regardless of what happens to it. */ __RQ_NET_SENT, /* when set, the request may be freed (if RQ_NET_QUEUED is clear). * basically this means the corresponding P_BARRIER_ACK was received */ __RQ_NET_DONE, /* whether or not we know (C) or pretend (B,A) that the write * was successfully written on the peer. */ __RQ_NET_OK, /* peer called drbd_set_in_sync() for this write */ __RQ_NET_SIS, /* keep this last, its for the RQ_NET_MASK */ __RQ_NET_MAX, /* Set when this is a write, clear for a read */ __RQ_WRITE, /* Should call drbd_al_complete_io() for this request... 
*/ __RQ_IN_ACT_LOG, /* The peer has sent a retry ACK */ __RQ_POSTPONED, /* would have been completed, * but was not, because of drbd_suspended() */ __RQ_COMPLETION_SUSP, /* We expect a receive ACK (wire proto B) */ __RQ_EXP_RECEIVE_ACK, /* We expect a write ACK (wire proto C) */ __RQ_EXP_WRITE_ACK, /* waiting for a barrier ack, did an extra kref_get */ __RQ_EXP_BARR_ACK, }; #define RQ_LOCAL_PENDING (1UL << __RQ_LOCAL_PENDING) #define RQ_LOCAL_COMPLETED (1UL << __RQ_LOCAL_COMPLETED) #define RQ_LOCAL_OK (1UL << __RQ_LOCAL_OK) #define RQ_LOCAL_ABORTED (1UL << __RQ_LOCAL_ABORTED) #define RQ_LOCAL_MASK ((RQ_LOCAL_ABORTED << 1)-1) #define RQ_NET_PENDING (1UL << __RQ_NET_PENDING) #define RQ_NET_QUEUED (1UL << __RQ_NET_QUEUED) #define RQ_NET_SENT (1UL << __RQ_NET_SENT) #define RQ_NET_DONE (1UL << __RQ_NET_DONE) #define RQ_NET_OK (1UL << __RQ_NET_OK) #define RQ_NET_SIS (1UL << __RQ_NET_SIS) /* 0x1f8 */ #define RQ_NET_MASK (((1UL << __RQ_NET_MAX)-1) & ~RQ_LOCAL_MASK) #define RQ_WRITE (1UL << __RQ_WRITE) #define RQ_IN_ACT_LOG (1UL << __RQ_IN_ACT_LOG) #define RQ_POSTPONED (1UL << __RQ_POSTPONED) #define RQ_COMPLETION_SUSP (1UL << __RQ_COMPLETION_SUSP) #define RQ_EXP_RECEIVE_ACK (1UL << __RQ_EXP_RECEIVE_ACK) #define RQ_EXP_WRITE_ACK (1UL << __RQ_EXP_WRITE_ACK) #define RQ_EXP_BARR_ACK (1UL << __RQ_EXP_BARR_ACK) /* For waking up the frozen transfer log mod_req() has to return if the request should be counted in the epoch object*/ #define MR_WRITE 1 #define MR_READ 2 static inline void drbd_req_make_private_bio(struct drbd_request *req, struct bio *bio_src) { struct bio *bio; bio = bio_clone(bio_src, GFP_NOIO); /* XXX cannot fail?? */ req->private_bio = bio; bio->bi_private = req; bio->bi_end_io = drbd_request_endio; bio->bi_next = NULL; } /* Short lived temporary struct on the stack. * We could squirrel the error to be returned into * bio->bi_size, or similar. But that would be too ugly. 
*/ struct bio_and_error { struct bio *bio; int error; }; extern void start_new_tl_epoch(struct drbd_connection *connection); extern void drbd_req_destroy(struct kref *kref); extern void _req_may_be_done(struct drbd_request *req, struct bio_and_error *m); extern int __req_mod(struct drbd_request *req, enum drbd_req_event what, struct bio_and_error *m); extern void complete_master_bio(struct drbd_device *device, struct bio_and_error *m); extern void request_timer_fn(unsigned long data); extern void tl_restart(struct drbd_connection *connection, enum drbd_req_event what); extern void _tl_restart(struct drbd_connection *connection, enum drbd_req_event what); /* this is in drbd_main.c */ extern void drbd_restart_request(struct drbd_request *req); /* use this if you don't want to deal with calling complete_master_bio() * outside the spinlock, e.g. when walking some list on cleanup. */ static inline int _req_mod(struct drbd_request *req, enum drbd_req_event what) { struct drbd_device *device = req->device; struct bio_and_error m; int rv; /* __req_mod possibly frees req, do not touch req after that! */ rv = __req_mod(req, what, &m); if (m.bio) complete_master_bio(device, &m); return rv; } /* completion of master bio is outside of spinlock. * If you need it irqsave, do it your self! * Which means: don't use from bio endio callback. */ static inline int req_mod(struct drbd_request *req, enum drbd_req_event what) { struct drbd_device *device = req->device; struct bio_and_error m; int rv; spin_lock_irq(&device->resource->req_lock); rv = __req_mod(req, what, &m); spin_unlock_irq(&device->resource->req_lock); if (m.bio) complete_master_bio(device, &m); return rv; } static inline bool drbd_should_do_remote(union drbd_dev_state s) { return s.pdsk == D_UP_TO_DATE || (s.pdsk >= D_INCONSISTENT && s.conn >= C_WF_BITMAP_T && s.conn < C_AHEAD); /* Before proto 96 that was >= CONNECTED instead of >= C_WF_BITMAP_T. 
That is equivalent since before 96 IO was frozen in the C_WF_BITMAP* states. */ } static inline bool drbd_should_send_out_of_sync(union drbd_dev_state s) { return s.conn == C_AHEAD || s.conn == C_WF_BITMAP_S; /* pdsk = D_INCONSISTENT as a consequence. Protocol 96 check not necessary since we enter state C_AHEAD only if proto >= 96 */ } #endif drbd-8.4.4/drbd/drbd_state.c0000664000000000000000000016223612221261130014334 0ustar rootroot/* drbd_state.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2001-2008, LINBIT Information Technologies GmbH. Copyright (C) 1999-2008, Philipp Reisner . Copyright (C) 2002-2008, Lars Ellenberg . Thanks to Carter Burden, Bart Grantham and Gennadiy Nerubayev from Logicworks, Inc. for making SDP replication support possible. drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. 
*/ #include #include "drbd_int.h" #include "drbd_protocol.h" #include "drbd_req.h" /* in drbd_main.c */ extern void tl_abort_disk_io(struct drbd_device *device); struct after_state_chg_work { struct drbd_work w; struct drbd_device *device; union drbd_state os; union drbd_state ns; enum chg_state_flags flags; struct completion *done; }; enum sanitize_state_warnings { NO_WARNING, ABORTED_ONLINE_VERIFY, ABORTED_RESYNC, CONNECTION_LOST_NEGOTIATING, IMPLICITLY_UPGRADED_DISK, IMPLICITLY_UPGRADED_PDSK, }; static int w_after_state_ch(struct drbd_work *w, int unused); static void after_state_ch(struct drbd_device *device, union drbd_state os, union drbd_state ns, enum chg_state_flags flags); static enum drbd_state_rv is_valid_state(struct drbd_device *, union drbd_state); static enum drbd_state_rv is_valid_soft_transition(union drbd_state, union drbd_state, struct drbd_connection *); static enum drbd_state_rv is_valid_transition(union drbd_state os, union drbd_state ns); static union drbd_state sanitize_state(struct drbd_device *device, union drbd_state os, union drbd_state ns, enum sanitize_state_warnings *warn); static inline bool is_susp(union drbd_state s) { return s.susp || s.susp_nod || s.susp_fen; } bool conn_all_vols_unconf(struct drbd_connection *connection) { struct drbd_peer_device *peer_device; bool rv = true; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; if (device->state.disk != D_DISKLESS || device->state.conn != C_STANDALONE || device->state.role != R_SECONDARY) { rv = false; break; } } rcu_read_unlock(); return rv; } /* Unfortunately the states were not correctly ordered when they were defined, therefore we cannot use max_t() here. 
*/ static enum drbd_role max_role(enum drbd_role role1, enum drbd_role role2) { if (role1 == R_PRIMARY || role2 == R_PRIMARY) return R_PRIMARY; if (role1 == R_SECONDARY || role2 == R_SECONDARY) return R_SECONDARY; return R_UNKNOWN; } static enum drbd_role min_role(enum drbd_role role1, enum drbd_role role2) { if (role1 == R_UNKNOWN || role2 == R_UNKNOWN) return R_UNKNOWN; if (role1 == R_SECONDARY || role2 == R_SECONDARY) return R_SECONDARY; return R_PRIMARY; } enum drbd_role conn_highest_role(struct drbd_connection *connection) { enum drbd_role role = R_UNKNOWN; struct drbd_peer_device *peer_device; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; role = max_role(role, device->state.role); } rcu_read_unlock(); return role; } enum drbd_role conn_highest_peer(struct drbd_connection *connection) { enum drbd_role peer = R_UNKNOWN; struct drbd_peer_device *peer_device; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; peer = max_role(peer, device->state.peer); } rcu_read_unlock(); return peer; } enum drbd_disk_state conn_highest_disk(struct drbd_connection *connection) { enum drbd_disk_state ds = D_DISKLESS; struct drbd_peer_device *peer_device; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; ds = max_t(enum drbd_disk_state, ds, device->state.disk); } rcu_read_unlock(); return ds; } enum drbd_disk_state conn_lowest_disk(struct drbd_connection *connection) { enum drbd_disk_state ds = D_MASK; struct drbd_peer_device *peer_device; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; ds = min_t(enum drbd_disk_state, ds, device->state.disk); } rcu_read_unlock(); return ds; } enum drbd_disk_state 
conn_highest_pdsk(struct drbd_connection *connection) { enum drbd_disk_state ds = D_DISKLESS; struct drbd_peer_device *peer_device; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; ds = max_t(enum drbd_disk_state, ds, device->state.pdsk); } rcu_read_unlock(); return ds; } enum drbd_conns conn_lowest_conn(struct drbd_connection *connection) { enum drbd_conns conn = C_MASK; struct drbd_peer_device *peer_device; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; conn = min_t(enum drbd_conns, conn, device->state.conn); } rcu_read_unlock(); return conn; } static bool no_peer_wf_report_params(struct drbd_connection *connection) { struct drbd_peer_device *peer_device; int vnr; bool rv = true; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) if (peer_device->device->state.conn == C_WF_REPORT_PARAMS) { rv = false; break; } rcu_read_unlock(); return rv; } /** * cl_wide_st_chg() - true if the state change is a cluster wide one * @device: DRBD device. * @os: old (current) state. * @ns: new (wanted) state. 
*/ static int cl_wide_st_chg(struct drbd_device *device, union drbd_state os, union drbd_state ns) { return (os.conn >= C_CONNECTED && ns.conn >= C_CONNECTED && ((os.role != R_PRIMARY && ns.role == R_PRIMARY) || (os.conn != C_STARTING_SYNC_T && ns.conn == C_STARTING_SYNC_T) || (os.conn != C_STARTING_SYNC_S && ns.conn == C_STARTING_SYNC_S) || (os.disk != D_FAILED && ns.disk == D_FAILED))) || (os.conn >= C_CONNECTED && ns.conn == C_DISCONNECTING) || (os.conn == C_CONNECTED && ns.conn == C_VERIFY_S) || (os.conn == C_CONNECTED && ns.conn == C_WF_REPORT_PARAMS); } static union drbd_state apply_mask_val(union drbd_state os, union drbd_state mask, union drbd_state val) { union drbd_state ns; ns.i = (os.i & ~mask.i) | val.i; return ns; } enum drbd_state_rv drbd_change_state(struct drbd_device *device, enum chg_state_flags f, union drbd_state mask, union drbd_state val) { unsigned long flags; union drbd_state ns; enum drbd_state_rv rv; spin_lock_irqsave(&device->resource->req_lock, flags); ns = apply_mask_val(drbd_read_state(device), mask, val); rv = _drbd_set_state(device, ns, f, NULL); spin_unlock_irqrestore(&device->resource->req_lock, flags); return rv; } /** * drbd_force_state() - Impose a change which happens outside our control on our state * @device: DRBD device. * @mask: mask of state bits to change. * @val: value of new state bits. 
*/ void drbd_force_state(struct drbd_device *device, union drbd_state mask, union drbd_state val) { drbd_change_state(device, CS_HARD, mask, val); } static enum drbd_state_rv _req_st_cond(struct drbd_device *device, union drbd_state mask, union drbd_state val) { union drbd_state os, ns; unsigned long flags; enum drbd_state_rv rv; if (test_and_clear_bit(CL_ST_CHG_SUCCESS, &device->flags)) return SS_CW_SUCCESS; if (test_and_clear_bit(CL_ST_CHG_FAIL, &device->flags)) return SS_CW_FAILED_BY_PEER; spin_lock_irqsave(&device->resource->req_lock, flags); os = drbd_read_state(device); ns = sanitize_state(device, os, apply_mask_val(os, mask, val), NULL); rv = is_valid_transition(os, ns); if (rv >= SS_SUCCESS) rv = SS_UNKNOWN_ERROR; /* cont waiting, otherwise fail. */ if (!cl_wide_st_chg(device, os, ns)) rv = SS_CW_NO_NEED; if (rv == SS_UNKNOWN_ERROR) { rv = is_valid_state(device, ns); if (rv >= SS_SUCCESS) { rv = is_valid_soft_transition(os, ns, first_peer_device(device)->connection); if (rv >= SS_SUCCESS) rv = SS_UNKNOWN_ERROR; /* cont waiting, otherwise fail. */ } } spin_unlock_irqrestore(&device->resource->req_lock, flags); return rv; } /** * drbd_req_state() - Perform an eventually cluster wide state change * @device: DRBD device. * @mask: mask of state bits to change. * @val: value of new state bits. * @f: flags * * Should not be called directly, use drbd_request_state() or * _drbd_request_state(). 
*/ static enum drbd_state_rv drbd_req_state(struct drbd_device *device, union drbd_state mask, union drbd_state val, enum chg_state_flags f) { struct completion done; unsigned long flags; union drbd_state os, ns; enum drbd_state_rv rv; init_completion(&done); if (f & CS_SERIALIZE) mutex_lock(device->state_mutex); ns = val; /* assign debug info, if any */ spin_lock_irqsave(&device->resource->req_lock, flags); os = drbd_read_state(device); ns = sanitize_state(device, os, apply_mask_val(os, mask, val), NULL); rv = is_valid_transition(os, ns); if (rv < SS_SUCCESS) { spin_unlock_irqrestore(&device->resource->req_lock, flags); goto abort; } if (cl_wide_st_chg(device, os, ns)) { rv = is_valid_state(device, ns); if (rv == SS_SUCCESS) rv = is_valid_soft_transition(os, ns, first_peer_device(device)->connection); spin_unlock_irqrestore(&device->resource->req_lock, flags); if (rv < SS_SUCCESS) { if (f & CS_VERBOSE) print_st_err(device, os, ns, rv); goto abort; } if (drbd_send_state_req(first_peer_device(device), mask, val)) { rv = SS_CW_FAILED_BY_PEER; if (f & CS_VERBOSE) print_st_err(device, os, ns, rv); goto abort; } wait_event(device->state_wait, (rv = _req_st_cond(device, mask, val))); if (rv < SS_SUCCESS) { if (f & CS_VERBOSE) print_st_err(device, os, ns, rv); goto abort; } spin_lock_irqsave(&device->resource->req_lock, flags); ns = apply_mask_val(drbd_read_state(device), mask, val); rv = _drbd_set_state(device, ns, f, &done); } else { rv = _drbd_set_state(device, ns, f, &done); } spin_unlock_irqrestore(&device->resource->req_lock, flags); if (f & CS_WAIT_COMPLETE && rv == SS_SUCCESS) { D_ASSERT(device, current != first_peer_device(device)->connection->worker.task); wait_for_completion(&done); } abort: if (f & CS_SERIALIZE) mutex_unlock(device->state_mutex); return rv; } /** * _drbd_request_state() - Request a state change (with flags) * @device: DRBD device. * @mask: mask of state bits to change. * @val: value of new state bits. 
* @f: flags * * Cousin of drbd_request_state(), useful with the CS_WAIT_COMPLETE * flag, or when logging of failed state change requests is not desired. */ enum drbd_state_rv _drbd_request_state(struct drbd_device *device, union drbd_state mask, union drbd_state val, enum chg_state_flags f) { enum drbd_state_rv rv; wait_event(device->state_wait, (rv = drbd_req_state(device, mask, val, f)) != SS_IN_TRANSIENT_STATE); return rv; } /* pretty print of drbd internal state */ #define STATE_FMT " %s = { cs:%s ro:%s/%s ds:%s/%s %c%c%c%c%c%c }\n" #define STATE_ARGS(tag, s) \ tag, \ drbd_conn_str(s.conn), \ drbd_role_str(s.role), \ drbd_role_str(s.peer), \ drbd_disk_str(s.disk), \ drbd_disk_str(s.pdsk), \ is_susp(s) ? 's' : 'r', \ s.aftr_isp ? 'a' : '-', \ s.peer_isp ? 'p' : '-', \ s.user_isp ? 'u' : '-', \ s.susp_fen ? 'F' : '-', \ s.susp_nod ? 'N' : '-' void print_st(struct drbd_device *device, const char *tag, union drbd_state s) { drbd_err(device, STATE_FMT, STATE_ARGS(tag, s)); } void print_st_err(struct drbd_device *device, union drbd_state os, union drbd_state ns, enum drbd_state_rv err) { if (err == SS_IN_TRANSIENT_STATE) return; drbd_err(device, "State change failed: %s\n", drbd_set_st_err_str(err)); print_st(device, " state", os); print_st(device, "wanted", ns); } static long print_state_change(char *pb, union drbd_state os, union drbd_state ns, enum chg_state_flags flags) { char *pbp; pbp = pb; *pbp = 0; if (ns.role != os.role && flags & CS_DC_ROLE) pbp += sprintf(pbp, "role( %s -> %s ) ", drbd_role_str(os.role), drbd_role_str(ns.role)); if (ns.peer != os.peer && flags & CS_DC_PEER) pbp += sprintf(pbp, "peer( %s -> %s ) ", drbd_role_str(os.peer), drbd_role_str(ns.peer)); if (ns.conn != os.conn && flags & CS_DC_CONN) pbp += sprintf(pbp, "conn( %s -> %s ) ", drbd_conn_str(os.conn), drbd_conn_str(ns.conn)); if (ns.disk != os.disk && flags & CS_DC_DISK) pbp += sprintf(pbp, "disk( %s -> %s ) ", drbd_disk_str(os.disk), drbd_disk_str(ns.disk)); if (ns.pdsk != os.pdsk && 
flags & CS_DC_PDSK) pbp += sprintf(pbp, "pdsk( %s -> %s ) ", drbd_disk_str(os.pdsk), drbd_disk_str(ns.pdsk)); return pbp - pb; } static void drbd_pr_state_change(struct drbd_device *device, union drbd_state os, union drbd_state ns, enum chg_state_flags flags) { char pb[300]; char *pbp = pb; pbp += print_state_change(pbp, os, ns, flags ^ CS_DC_MASK); if (ns.aftr_isp != os.aftr_isp) pbp += sprintf(pbp, "aftr_isp( %d -> %d ) ", os.aftr_isp, ns.aftr_isp); if (ns.peer_isp != os.peer_isp) pbp += sprintf(pbp, "peer_isp( %d -> %d ) ", os.peer_isp, ns.peer_isp); if (ns.user_isp != os.user_isp) pbp += sprintf(pbp, "user_isp( %d -> %d ) ", os.user_isp, ns.user_isp); if (pbp != pb) drbd_info(device, "%s\n", pb); } static void conn_pr_state_change(struct drbd_connection *connection, union drbd_state os, union drbd_state ns, enum chg_state_flags flags) { char pb[300]; char *pbp = pb; pbp += print_state_change(pbp, os, ns, flags); if (is_susp(ns) != is_susp(os) && flags & CS_DC_SUSP) pbp += sprintf(pbp, "susp( %d -> %d ) ", is_susp(os), is_susp(ns)); if (pbp != pb) drbd_info(connection, "%s\n", pb); } /** * is_valid_state() - Returns an SS_ error code if ns is not valid * @device: DRBD device. * @ns: State to consider. 
*/ static enum drbd_state_rv is_valid_state(struct drbd_device *device, union drbd_state ns) { /* See drbd_state_sw_errors in drbd_strings.c */ enum drbd_fencing_p fp; enum drbd_state_rv rv = SS_SUCCESS; struct net_conf *nc; rcu_read_lock(); fp = FP_DONT_CARE; if (get_ldev(device)) { fp = rcu_dereference(device->ldev->disk_conf)->fencing; put_ldev(device); } nc = rcu_dereference(first_peer_device(device)->connection->net_conf); if (nc) { if (!nc->two_primaries && ns.role == R_PRIMARY) { if (ns.peer == R_PRIMARY) rv = SS_TWO_PRIMARIES; else if (conn_highest_peer(first_peer_device(device)->connection) == R_PRIMARY) rv = SS_O_VOL_PEER_PRI; } } if (rv <= 0) /* already found a reason to abort */; else if (ns.role == R_SECONDARY && device->open_cnt) rv = SS_DEVICE_IN_USE; else if (ns.role == R_PRIMARY && ns.conn < C_CONNECTED && ns.disk < D_UP_TO_DATE) rv = SS_NO_UP_TO_DATE_DISK; else if (fp >= FP_RESOURCE && ns.role == R_PRIMARY && ns.conn < C_CONNECTED && ns.pdsk >= D_UNKNOWN) rv = SS_PRIMARY_NOP; else if (ns.role == R_PRIMARY && ns.disk <= D_INCONSISTENT && ns.pdsk <= D_INCONSISTENT) rv = SS_NO_UP_TO_DATE_DISK; else if (ns.conn > C_CONNECTED && ns.disk < D_INCONSISTENT) rv = SS_NO_LOCAL_DISK; else if (ns.conn > C_CONNECTED && ns.pdsk < D_INCONSISTENT) rv = SS_NO_REMOTE_DISK; else if (ns.conn > C_CONNECTED && ns.disk < D_UP_TO_DATE && ns.pdsk < D_UP_TO_DATE) rv = SS_NO_UP_TO_DATE_DISK; else if ((ns.conn == C_CONNECTED || ns.conn == C_WF_BITMAP_S || ns.conn == C_SYNC_SOURCE || ns.conn == C_PAUSED_SYNC_S) && ns.disk == D_OUTDATED) rv = SS_CONNECTED_OUTDATES; else if ((ns.conn == C_VERIFY_S || ns.conn == C_VERIFY_T) && (nc->verify_alg[0] == 0)) rv = SS_NO_VERIFY_ALG; else if ((ns.conn == C_VERIFY_S || ns.conn == C_VERIFY_T) && first_peer_device(device)->connection->agreed_pro_version < 88) rv = SS_NOT_SUPPORTED; else if (ns.role == R_PRIMARY && ns.disk < D_UP_TO_DATE && ns.pdsk < D_UP_TO_DATE) rv = SS_NO_UP_TO_DATE_DISK; else if ((ns.conn == C_STARTING_SYNC_S || ns.conn 
== C_STARTING_SYNC_T) && ns.pdsk == D_UNKNOWN) rv = SS_NEED_CONNECTION; else if (ns.conn >= C_CONNECTED && ns.pdsk == D_UNKNOWN) rv = SS_CONNECTED_OUTDATES; rcu_read_unlock(); return rv; } /** * is_valid_soft_transition() - Returns an SS_ error code if the state transition is not possible * This function limits state transitions that may be declined by DRBD. I.e. * user requests (aka soft transitions). * @os: old state. * @ns: new state. * @connection: DRBD connection. */ static enum drbd_state_rv is_valid_soft_transition(union drbd_state os, union drbd_state ns, struct drbd_connection *connection) { enum drbd_state_rv rv = SS_SUCCESS; if ((ns.conn == C_STARTING_SYNC_T || ns.conn == C_STARTING_SYNC_S) && os.conn > C_CONNECTED) rv = SS_RESYNC_RUNNING; if (ns.conn == C_DISCONNECTING && os.conn == C_STANDALONE) rv = SS_ALREADY_STANDALONE; if (ns.disk > D_ATTACHING && os.disk == D_DISKLESS) rv = SS_IS_DISKLESS; if (ns.conn == C_WF_CONNECTION && os.conn < C_UNCONNECTED) rv = SS_NO_NET_CONFIG; if (ns.disk == D_OUTDATED && os.disk < D_OUTDATED && os.disk != D_ATTACHING) rv = SS_LOWER_THAN_OUTDATED; if (ns.conn == C_DISCONNECTING && os.conn == C_UNCONNECTED) rv = SS_IN_TRANSIENT_STATE; /* if (ns.conn == os.conn && ns.conn == C_WF_REPORT_PARAMS) rv = SS_IN_TRANSIENT_STATE; */ /* While establishing a connection only allow cstate to change. Delay/refuse role changes, detach attach etc... 
*/ if (test_bit(STATE_SENT, &connection->flags) && !(os.conn == C_WF_REPORT_PARAMS || (ns.conn == C_WF_REPORT_PARAMS && os.conn == C_WF_CONNECTION))) rv = SS_IN_TRANSIENT_STATE; if ((ns.conn == C_VERIFY_S || ns.conn == C_VERIFY_T) && os.conn < C_CONNECTED) rv = SS_NEED_CONNECTION; if ((ns.conn == C_VERIFY_S || ns.conn == C_VERIFY_T) && ns.conn != os.conn && os.conn > C_CONNECTED) rv = SS_RESYNC_RUNNING; if ((ns.conn == C_STARTING_SYNC_S || ns.conn == C_STARTING_SYNC_T) && os.conn < C_CONNECTED) rv = SS_NEED_CONNECTION; if ((ns.conn == C_SYNC_TARGET || ns.conn == C_SYNC_SOURCE) && os.conn < C_WF_REPORT_PARAMS) rv = SS_NEED_CONNECTION; /* No NetworkFailure -> SyncTarget etc... */ if (ns.conn == C_DISCONNECTING && ns.pdsk == D_OUTDATED && os.conn < C_CONNECTED && os.pdsk > D_OUTDATED) rv = SS_OUTDATE_WO_CONN; return rv; } static enum drbd_state_rv is_valid_conn_transition(enum drbd_conns oc, enum drbd_conns nc) { /* no change -> nothing to do, at least for the connection part */ if (oc == nc) return SS_NOTHING_TO_DO; /* disconnect of an unconfigured connection does not make sense */ if (oc == C_STANDALONE && nc == C_DISCONNECTING) return SS_ALREADY_STANDALONE; /* from C_STANDALONE, we start with C_UNCONNECTED */ if (oc == C_STANDALONE && nc != C_UNCONNECTED) return SS_NEED_CONNECTION; /* When establishing a connection we need to go through WF_REPORT_PARAMS! Necessary to do the right thing upon invalidate-remote on a disconnected resource */ if (oc < C_WF_REPORT_PARAMS && nc >= C_CONNECTED) return SS_NEED_CONNECTION; /* After a network error only C_UNCONNECTED or C_DISCONNECTING may follow. 
*/ if (oc >= C_TIMEOUT && oc <= C_TEAR_DOWN && nc != C_UNCONNECTED && nc != C_DISCONNECTING) return SS_IN_TRANSIENT_STATE; /* After C_DISCONNECTING only C_STANDALONE may follow */ if (oc == C_DISCONNECTING && nc != C_STANDALONE) return SS_IN_TRANSIENT_STATE; return SS_SUCCESS; } /** * is_valid_transition() - Returns an SS_ error code if the state transition is not possible * This limits hard state transitions. Hard state transitions are facts that are * imposed on DRBD by the environment. E.g. disk broke or network broke down. * But those hard state transitions are still not allowed to do everything. * @ns: new state. * @os: old state. */ static enum drbd_state_rv is_valid_transition(union drbd_state os, union drbd_state ns) { enum drbd_state_rv rv; rv = is_valid_conn_transition(os.conn, ns.conn); /* we cannot fail (again) if we already detached */ if (ns.disk == D_FAILED && os.disk == D_DISKLESS) rv = SS_IS_DISKLESS; return rv; } static void print_sanitize_warnings(struct drbd_device *device, enum sanitize_state_warnings warn) { static const char *msg_table[] = { [NO_WARNING] = "", [ABORTED_ONLINE_VERIFY] = "Online-verify aborted.", [ABORTED_RESYNC] = "Resync aborted.", [CONNECTION_LOST_NEGOTIATING] = "Connection lost while negotiating, no data!", [IMPLICITLY_UPGRADED_DISK] = "Implicitly upgraded disk", [IMPLICITLY_UPGRADED_PDSK] = "Implicitly upgraded pdsk", }; if (warn != NO_WARNING) drbd_warn(device, "%s\n", msg_table[warn]); } /** * sanitize_state() - Resolves implicitly necessary additional changes to a state transition * @device: DRBD device. * @os: old state. * @ns: new state. * @warn: Where to store a warning code for the caller; may be NULL. * * When we lose the connection, we have to set the state of the peer's disk (pdsk) * to D_UNKNOWN. This rule and many more along those lines are in this function. 
*/ static union drbd_state sanitize_state(struct drbd_device *device, union drbd_state os, union drbd_state ns, enum sanitize_state_warnings *warn) { enum drbd_fencing_p fp; enum drbd_disk_state disk_min, disk_max, pdsk_min, pdsk_max; if (warn) *warn = NO_WARNING; fp = FP_DONT_CARE; if (get_ldev(device)) { rcu_read_lock(); fp = rcu_dereference(device->ldev->disk_conf)->fencing; rcu_read_unlock(); put_ldev(device); } /* Implications from connection to peer and peer_isp */ if (ns.conn < C_CONNECTED) { ns.peer_isp = 0; ns.peer = R_UNKNOWN; if (ns.pdsk > D_UNKNOWN || ns.pdsk < D_INCONSISTENT) ns.pdsk = D_UNKNOWN; } /* Clear the aftr_isp when becoming unconfigured */ if (ns.conn == C_STANDALONE && ns.disk == D_DISKLESS && ns.role == R_SECONDARY) ns.aftr_isp = 0; /* An implication of the disk states onto the connection state */ /* Abort resync if a disk fails/detaches */ if (ns.conn > C_CONNECTED && (ns.disk <= D_FAILED || ns.pdsk <= D_FAILED)) { if (warn) *warn = ns.conn == C_VERIFY_S || ns.conn == C_VERIFY_T ? 
ABORTED_ONLINE_VERIFY : ABORTED_RESYNC; ns.conn = C_CONNECTED; } /* Connection breaks down before we finished "Negotiating" */ if (ns.conn < C_CONNECTED && ns.disk == D_NEGOTIATING && get_ldev_if_state(device, D_NEGOTIATING)) { if (device->ed_uuid == device->ldev->md.uuid[UI_CURRENT]) { ns.disk = device->new_state_tmp.disk; ns.pdsk = device->new_state_tmp.pdsk; } else { if (warn) *warn = CONNECTION_LOST_NEGOTIATING; ns.disk = D_DISKLESS; ns.pdsk = D_UNKNOWN; } put_ldev(device); } /* D_CONSISTENT and D_OUTDATED vanish when we get connected */ if (ns.conn >= C_CONNECTED && ns.conn < C_AHEAD) { if (ns.disk == D_CONSISTENT || ns.disk == D_OUTDATED) ns.disk = D_UP_TO_DATE; if (ns.pdsk == D_CONSISTENT || ns.pdsk == D_OUTDATED) ns.pdsk = D_UP_TO_DATE; } /* Implications of the connection state on the disk states */ disk_min = D_DISKLESS; disk_max = D_UP_TO_DATE; pdsk_min = D_INCONSISTENT; pdsk_max = D_UNKNOWN; switch ((enum drbd_conns)ns.conn) { case C_WF_BITMAP_T: case C_PAUSED_SYNC_T: case C_STARTING_SYNC_T: case C_WF_SYNC_UUID: case C_BEHIND: disk_min = D_INCONSISTENT; disk_max = D_OUTDATED; pdsk_min = D_UP_TO_DATE; pdsk_max = D_UP_TO_DATE; break; case C_VERIFY_S: case C_VERIFY_T: disk_min = D_UP_TO_DATE; disk_max = D_UP_TO_DATE; pdsk_min = D_UP_TO_DATE; pdsk_max = D_UP_TO_DATE; break; case C_CONNECTED: disk_min = D_DISKLESS; disk_max = D_UP_TO_DATE; pdsk_min = D_DISKLESS; pdsk_max = D_UP_TO_DATE; break; case C_WF_BITMAP_S: case C_PAUSED_SYNC_S: case C_STARTING_SYNC_S: case C_AHEAD: disk_min = D_UP_TO_DATE; disk_max = D_UP_TO_DATE; pdsk_min = D_INCONSISTENT; pdsk_max = D_CONSISTENT; /* D_OUTDATED would be nice. 
But explicit outdate necessary*/ break; case C_SYNC_TARGET: disk_min = D_INCONSISTENT; disk_max = D_INCONSISTENT; pdsk_min = D_UP_TO_DATE; pdsk_max = D_UP_TO_DATE; break; case C_SYNC_SOURCE: disk_min = D_UP_TO_DATE; disk_max = D_UP_TO_DATE; pdsk_min = D_INCONSISTENT; pdsk_max = D_INCONSISTENT; break; case C_STANDALONE: case C_DISCONNECTING: case C_UNCONNECTED: case C_TIMEOUT: case C_BROKEN_PIPE: case C_NETWORK_FAILURE: case C_PROTOCOL_ERROR: case C_TEAR_DOWN: case C_WF_CONNECTION: case C_WF_REPORT_PARAMS: case C_MASK: break; } if (ns.disk > disk_max) ns.disk = disk_max; if (ns.disk < disk_min) { if (warn) *warn = IMPLICITLY_UPGRADED_DISK; ns.disk = disk_min; } if (ns.pdsk > pdsk_max) ns.pdsk = pdsk_max; if (ns.pdsk < pdsk_min) { if (warn) *warn = IMPLICITLY_UPGRADED_PDSK; ns.pdsk = pdsk_min; } if (fp == FP_STONITH && (ns.role == R_PRIMARY && ns.conn < C_CONNECTED && ns.pdsk > D_OUTDATED) && !(os.role == R_PRIMARY && os.conn < C_CONNECTED && os.pdsk > D_OUTDATED)) ns.susp_fen = 1; /* Suspend IO while fence-peer handler runs (peer lost) */ if (device->resource->res_opts.on_no_data == OND_SUSPEND_IO && (ns.role == R_PRIMARY && ns.disk < D_UP_TO_DATE && ns.pdsk < D_UP_TO_DATE) && !(os.role == R_PRIMARY && os.disk < D_UP_TO_DATE && os.pdsk < D_UP_TO_DATE)) ns.susp_nod = 1; /* Suspend IO while no data available (no accessible data available) */ if (ns.aftr_isp || ns.peer_isp || ns.user_isp) { if (ns.conn == C_SYNC_SOURCE) ns.conn = C_PAUSED_SYNC_S; if (ns.conn == C_SYNC_TARGET) ns.conn = C_PAUSED_SYNC_T; } else { if (ns.conn == C_PAUSED_SYNC_S) ns.conn = C_SYNC_SOURCE; if (ns.conn == C_PAUSED_SYNC_T) ns.conn = C_SYNC_TARGET; } return ns; } void drbd_resume_al(struct drbd_device *device) { if (test_and_clear_bit(AL_SUSPENDED, &device->flags)) drbd_info(device, "Resumed AL updates\n"); } /* helper for __drbd_set_state */ static void set_ov_position(struct drbd_device *device, enum drbd_conns cs) { if (first_peer_device(device)->connection->agreed_pro_version < 90) 
device->ov_start_sector = 0; device->rs_total = drbd_bm_bits(device); device->ov_position = 0; if (cs == C_VERIFY_T) { /* starting online verify from an arbitrary position * does not fit well into the existing protocol. * on C_VERIFY_T, we initialize ov_left and friends * implicitly in receive_DataRequest once the * first P_OV_REQUEST is received */ device->ov_start_sector = ~(sector_t)0; } else { unsigned long bit = BM_SECT_TO_BIT(device->ov_start_sector); if (bit >= device->rs_total) { device->ov_start_sector = BM_BIT_TO_SECT(device->rs_total - 1); device->rs_total = 1; } else device->rs_total -= bit; device->ov_position = device->ov_start_sector; } device->ov_left = device->rs_total; } /** * __drbd_set_state() - Set a new DRBD state * @device: DRBD device. * @ns: new state. * @flags: Flags * @done: Optional completion, that will get completed after the after_state_ch() finished * * Caller needs to hold req_lock, and global_state_lock. Do not call directly. */ enum drbd_state_rv __drbd_set_state(struct drbd_device *device, union drbd_state ns, enum chg_state_flags flags, struct completion *done) { union drbd_state os; enum drbd_state_rv rv = SS_SUCCESS; enum sanitize_state_warnings ssw; struct after_state_chg_work *ascw; bool did_remote, should_do_remote; os = drbd_read_state(device); ns = sanitize_state(device, os, ns, &ssw); if (ns.i == os.i) return SS_NOTHING_TO_DO; rv = is_valid_transition(os, ns); if (rv < SS_SUCCESS) return rv; if (!(flags & CS_HARD)) { /* pre-state-change checks ; only look at ns */ /* See drbd_state_sw_errors in drbd_strings.c */ rv = is_valid_state(device, ns); if (rv < SS_SUCCESS) { /* If the old state was illegal as well, then let this happen...*/ if (is_valid_state(device, os) == rv) rv = is_valid_soft_transition(os, ns, first_peer_device(device)->connection); } else rv = is_valid_soft_transition(os, ns, first_peer_device(device)->connection); } if (rv < SS_SUCCESS) { if (flags & CS_VERBOSE) print_st_err(device, os, ns, rv); return 
rv; } print_sanitize_warnings(device, ssw); drbd_pr_state_change(device, os, ns, flags); /* Display changes to the susp* flags that were caused by the call to sanitize_state(). Only display it here if we were not called from _conn_request_state() */ if (!(flags & CS_DC_SUSP)) conn_pr_state_change(first_peer_device(device)->connection, os, ns, (flags & ~CS_DC_MASK) | CS_DC_SUSP); /* if we are going -> D_FAILED or D_DISKLESS, grab one extra reference * on the ldev here, to be sure the transition -> D_DISKLESS resp. * drbd_ldev_destroy() won't happen before our corresponding * after_state_ch works run, where we put_ldev again. */ if ((os.disk != D_FAILED && ns.disk == D_FAILED) || (os.disk != D_DISKLESS && ns.disk == D_DISKLESS)) atomic_inc(&device->local_cnt); did_remote = drbd_should_do_remote(device->state); device->state.i = ns.i; should_do_remote = drbd_should_do_remote(device->state); device->resource->susp = ns.susp; device->resource->susp_nod = ns.susp_nod; device->resource->susp_fen = ns.susp_fen; /* put replicated vs not-replicated requests in separate epochs */ if (did_remote != should_do_remote) start_new_tl_epoch(first_peer_device(device)->connection); if (os.disk == D_ATTACHING && ns.disk >= D_NEGOTIATING) drbd_print_uuids(device, "attached to UUIDs"); /* Wake up role changes, that were delayed because of connection establishing */ if (os.conn == C_WF_REPORT_PARAMS && ns.conn != C_WF_REPORT_PARAMS && no_peer_wf_report_params(first_peer_device(device)->connection)) clear_bit(STATE_SENT, &first_peer_device(device)->connection->flags); wake_up(&device->misc_wait); wake_up(&device->state_wait); wake_up(&first_peer_device(device)->connection->ping_wait); /* Aborted verify run, or we reached the stop sector. * Log the last position, unless end-of-device. 
*/ if ((os.conn == C_VERIFY_S || os.conn == C_VERIFY_T) && ns.conn <= C_CONNECTED) { device->ov_start_sector = BM_BIT_TO_SECT(drbd_bm_bits(device) - device->ov_left); if (device->ov_left) drbd_info(device, "Online Verify reached sector %llu\n", (unsigned long long)device->ov_start_sector); } if ((os.conn == C_PAUSED_SYNC_T || os.conn == C_PAUSED_SYNC_S) && (ns.conn == C_SYNC_TARGET || ns.conn == C_SYNC_SOURCE)) { drbd_info(device, "Syncer continues.\n"); device->rs_paused += (long)jiffies -(long)device->rs_mark_time[device->rs_last_mark]; if (ns.conn == C_SYNC_TARGET) mod_timer(&device->resync_timer, jiffies); } if ((os.conn == C_SYNC_TARGET || os.conn == C_SYNC_SOURCE) && (ns.conn == C_PAUSED_SYNC_T || ns.conn == C_PAUSED_SYNC_S)) { drbd_info(device, "Resync suspended\n"); device->rs_mark_time[device->rs_last_mark] = jiffies; } if (os.conn == C_CONNECTED && (ns.conn == C_VERIFY_S || ns.conn == C_VERIFY_T)) { unsigned long now = jiffies; int i; set_ov_position(device, ns.conn); device->rs_start = now; device->rs_last_events = 0; device->rs_last_sect_ev = 0; device->ov_last_oos_size = 0; device->ov_last_oos_start = 0; for (i = 0; i < DRBD_SYNC_MARKS; i++) { device->rs_mark_left[i] = device->ov_left; device->rs_mark_time[i] = now; } drbd_rs_controller_reset(device); if (ns.conn == C_VERIFY_S) { drbd_info(device, "Starting Online Verify from sector %llu\n", (unsigned long long)device->ov_position); mod_timer(&device->resync_timer, jiffies); } } if (get_ldev(device)) { u32 mdf = device->ldev->md.flags & ~(MDF_CONSISTENT|MDF_PRIMARY_IND| MDF_CONNECTED_IND|MDF_WAS_UP_TO_DATE| MDF_PEER_OUT_DATED|MDF_CRASHED_PRIMARY); mdf &= ~MDF_AL_CLEAN; if (test_bit(CRASHED_PRIMARY, &device->flags)) mdf |= MDF_CRASHED_PRIMARY; if (device->state.role == R_PRIMARY || (device->state.pdsk < D_INCONSISTENT && device->state.peer == R_PRIMARY)) mdf |= MDF_PRIMARY_IND; if (device->state.conn > C_WF_REPORT_PARAMS) mdf |= MDF_CONNECTED_IND; if (device->state.disk > D_INCONSISTENT) mdf |= 
MDF_CONSISTENT; if (device->state.disk > D_OUTDATED) mdf |= MDF_WAS_UP_TO_DATE; if (device->state.pdsk <= D_OUTDATED && device->state.pdsk >= D_INCONSISTENT) mdf |= MDF_PEER_OUT_DATED; if (mdf != device->ldev->md.flags) { device->ldev->md.flags = mdf; drbd_md_mark_dirty(device); } if (os.disk < D_CONSISTENT && ns.disk >= D_CONSISTENT) drbd_set_ed_uuid(device, device->ldev->md.uuid[UI_CURRENT]); put_ldev(device); } /* Peer was forced D_UP_TO_DATE & R_PRIMARY, consider to resync */ if (os.disk == D_INCONSISTENT && os.pdsk == D_INCONSISTENT && os.peer == R_SECONDARY && ns.peer == R_PRIMARY) set_bit(CONSIDER_RESYNC, &device->flags); /* Receiver should clean up itself */ if (os.conn != C_DISCONNECTING && ns.conn == C_DISCONNECTING) drbd_thread_stop_nowait(&first_peer_device(device)->connection->receiver); /* Now the receiver finished cleaning up itself, it should die */ if (os.conn != C_STANDALONE && ns.conn == C_STANDALONE) drbd_thread_stop_nowait(&first_peer_device(device)->connection->receiver); /* Upon network failure, we need to restart the receiver. 
*/ if (os.conn > C_WF_CONNECTION && ns.conn <= C_TEAR_DOWN && ns.conn >= C_TIMEOUT) drbd_thread_restart_nowait(&first_peer_device(device)->connection->receiver); /* Resume AL writing if we get a connection */ if (os.conn < C_CONNECTED && ns.conn >= C_CONNECTED) { drbd_resume_al(device); first_peer_device(device)->connection->connect_cnt++; } /* remember last attach time so request_timer_fn() won't * kill newly established sessions while we are still trying to thaw * previously frozen IO */ if ((os.disk == D_ATTACHING || os.disk == D_NEGOTIATING) && ns.disk > D_NEGOTIATING) device->last_reattach_jif = jiffies; ascw = kmalloc(sizeof(*ascw), GFP_ATOMIC); if (ascw) { ascw->os = os; ascw->ns = ns; ascw->flags = flags; ascw->w.cb = w_after_state_ch; ascw->device = device; ascw->done = done; drbd_queue_work(&first_peer_device(device)->connection->sender_work, &ascw->w); } else { drbd_err(device, "Could not kmalloc an ascw\n"); } return rv; } static int w_after_state_ch(struct drbd_work *w, int unused) { struct after_state_chg_work *ascw = container_of(w, struct after_state_chg_work, w); struct drbd_device *device = ascw->device; after_state_ch(device, ascw->os, ascw->ns, ascw->flags); if (ascw->flags & CS_WAIT_COMPLETE) complete(ascw->done); kfree(ascw); return 0; } static void abw_start_sync(struct drbd_device *device, int rv) { if (rv) { drbd_err(device, "Writing the bitmap failed not starting resync.\n"); _drbd_request_state(device, NS(conn, C_CONNECTED), CS_VERBOSE); return; } switch (device->state.conn) { case C_STARTING_SYNC_T: _drbd_request_state(device, NS(conn, C_WF_SYNC_UUID), CS_VERBOSE); break; case C_STARTING_SYNC_S: drbd_start_resync(device, C_SYNC_SOURCE); break; } } int drbd_bitmap_io_from_worker(struct drbd_device *device, int (*io_fn)(struct drbd_device *), char *why, enum bm_flag flags) { int rv; D_ASSERT(device, current == first_peer_device(device)->connection->worker.task); /* open coded non-blocking drbd_suspend_io(device); */ set_bit(SUSPEND_IO, 
&device->flags); drbd_bm_lock(device, why, flags); rv = io_fn(device); drbd_bm_unlock(device); drbd_resume_io(device); return rv; } /** * after_state_ch() - Perform after state change actions that may sleep * @device: DRBD device. * @os: old state. * @ns: new state. * @flags: Flags */ static void after_state_ch(struct drbd_device *device, union drbd_state os, union drbd_state ns, enum chg_state_flags flags) { struct drbd_resource *resource = device->resource; struct sib_info sib; sib.sib_reason = SIB_STATE_CHANGE; sib.os = os; sib.ns = ns; if (os.conn != C_CONNECTED && ns.conn == C_CONNECTED) { clear_bit(CRASHED_PRIMARY, &device->flags); if (device->p_uuid) device->p_uuid[UI_FLAGS] &= ~((u64)2); } /* Inform userspace about the change... */ drbd_bcast_event(device, &sib); if (!(os.role == R_PRIMARY && os.disk < D_UP_TO_DATE && os.pdsk < D_UP_TO_DATE) && (ns.role == R_PRIMARY && ns.disk < D_UP_TO_DATE && ns.pdsk < D_UP_TO_DATE)) drbd_khelper(device, "pri-on-incon-degr"); /* Here we have the actions that are performed after a state change. 
This function might sleep */ if (ns.susp_nod) { struct drbd_connection *connection = first_peer_device(device)->connection; enum drbd_req_event what = NOTHING; spin_lock_irq(&device->resource->req_lock); if (os.conn < C_CONNECTED && conn_lowest_conn(connection) >= C_CONNECTED) what = RESEND; if ((os.disk == D_ATTACHING || os.disk == D_NEGOTIATING) && conn_lowest_disk(connection) > D_NEGOTIATING) what = RESTART_FROZEN_DISK_IO; if (resource->susp_nod && what != NOTHING) { _tl_restart(connection, what); _conn_request_state(connection, (union drbd_state) { { .susp_nod = 1 } }, (union drbd_state) { { .susp_nod = 0 } }, CS_VERBOSE); } spin_unlock_irq(&device->resource->req_lock); } if (ns.susp_fen) { struct drbd_connection *connection = first_peer_device(device)->connection; spin_lock_irq(&device->resource->req_lock); if (resource->susp_fen && conn_lowest_conn(connection) >= C_CONNECTED) { /* case2: The connection was established again: */ struct drbd_peer_device *peer_device; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) clear_bit(NEW_CUR_UUID, &peer_device->device->flags); rcu_read_unlock(); _tl_restart(connection, RESEND); _conn_request_state(connection, (union drbd_state) { { .susp_fen = 1 } }, (union drbd_state) { { .susp_fen = 0 } }, CS_VERBOSE); } spin_unlock_irq(&device->resource->req_lock); } /* Became sync source. With protocol >= 96, we still need to send out * the sync uuid now. Need to do that before any drbd_send_state, or * the other side may go "paused sync" before receiving the sync uuids, * which is unexpected. */ if ((os.conn != C_SYNC_SOURCE && os.conn != C_PAUSED_SYNC_S) && (ns.conn == C_SYNC_SOURCE || ns.conn == C_PAUSED_SYNC_S) && first_peer_device(device)->connection->agreed_pro_version >= 96 && get_ldev(device)) { drbd_gen_and_send_sync_uuid(first_peer_device(device)); put_ldev(device); } /* Do not change the order of the if above and the two below... 
*/ if (os.pdsk == D_DISKLESS && ns.pdsk > D_DISKLESS && ns.pdsk != D_UNKNOWN) { /* attach on the peer */ /* we probably will start a resync soon. * make sure those things are properly reset. */ device->rs_total = 0; device->rs_failed = 0; atomic_set(&device->rs_pending_cnt, 0); drbd_rs_cancel_all(device); drbd_send_uuids(first_peer_device(device)); drbd_send_state(first_peer_device(device), ns); } /* No point in queuing send_bitmap if we don't have a connection * anymore, so check also the _current_ state, not only the new state * at the time this work was queued. */ if (os.conn != C_WF_BITMAP_S && ns.conn == C_WF_BITMAP_S && device->state.conn == C_WF_BITMAP_S) drbd_queue_bitmap_io(device, &drbd_send_bitmap, NULL, "send_bitmap (WFBitMapS)", BM_LOCKED_TEST_ALLOWED); /* Lost contact to peer's copy of the data */ if ((os.pdsk >= D_INCONSISTENT && os.pdsk != D_UNKNOWN && os.pdsk != D_OUTDATED) && (ns.pdsk < D_INCONSISTENT || ns.pdsk == D_UNKNOWN || ns.pdsk == D_OUTDATED)) { if (get_ldev(device)) { if ((ns.role == R_PRIMARY || ns.peer == R_PRIMARY) && device->ldev->md.uuid[UI_BITMAP] == 0 && ns.disk >= D_UP_TO_DATE) { if (drbd_suspended(device)) { set_bit(NEW_CUR_UUID, &device->flags); } else { drbd_uuid_new_current(device); drbd_send_uuids(first_peer_device(device)); } } put_ldev(device); } } if (ns.pdsk < D_INCONSISTENT && get_ldev(device)) { if (os.peer == R_SECONDARY && ns.peer == R_PRIMARY && device->ldev->md.uuid[UI_BITMAP] == 0 && ns.disk >= D_UP_TO_DATE) { drbd_uuid_new_current(device); drbd_send_uuids(first_peer_device(device)); } /* D_DISKLESS Peer becomes secondary */ if (os.peer == R_PRIMARY && ns.peer == R_SECONDARY) /* We may still be Primary ourselves. * No harm done if the bitmap still changes, * redirtied pages will follow later. */ drbd_bitmap_io_from_worker(device, &drbd_bm_write, "demote diskless peer", BM_LOCKED_SET_ALLOWED); put_ldev(device); } /* Write out all changed bits on demote. 
* Though, no need to do that just yet * if there is a resync going on still */ if (os.role == R_PRIMARY && ns.role == R_SECONDARY && device->state.conn <= C_CONNECTED && get_ldev(device)) { /* No changes to the bitmap expected this time, so assert that, * even though no harm was done if it did change. */ drbd_bitmap_io_from_worker(device, &drbd_bm_write, "demote", BM_LOCKED_TEST_ALLOWED); put_ldev(device); } /* Last part of the attaching process ... */ if (ns.conn >= C_CONNECTED && os.disk == D_ATTACHING && ns.disk == D_NEGOTIATING) { drbd_send_sizes(first_peer_device(device), 0, 0); /* to start sync... */ drbd_send_uuids(first_peer_device(device)); drbd_send_state(first_peer_device(device), ns); } /* We want to pause/continue resync, tell peer. */ if (ns.conn >= C_CONNECTED && ((os.aftr_isp != ns.aftr_isp) || (os.user_isp != ns.user_isp))) drbd_send_state(first_peer_device(device), ns); /* In case one of the isp bits got set, suspend other devices. */ if ((!os.aftr_isp && !os.peer_isp && !os.user_isp) && (ns.aftr_isp || ns.peer_isp || ns.user_isp)) suspend_other_sg(device); /* Make sure the peer gets informed about eventual state changes (ISP bits) while we were in WFReportParams. */ if (os.conn == C_WF_REPORT_PARAMS && ns.conn >= C_CONNECTED) drbd_send_state(first_peer_device(device), ns); if (os.conn != C_AHEAD && ns.conn == C_AHEAD) drbd_send_state(first_peer_device(device), ns); /* We are in the process of starting a full sync... 
*/ if ((os.conn != C_STARTING_SYNC_T && ns.conn == C_STARTING_SYNC_T) || (os.conn != C_STARTING_SYNC_S && ns.conn == C_STARTING_SYNC_S)) /* no other bitmap changes expected during this phase */ drbd_queue_bitmap_io(device, &drbd_bmio_set_n_write, &abw_start_sync, "set_n_write from StartingSync", BM_LOCKED_TEST_ALLOWED); /* first half of local IO error, failure to attach, * or administrative detach */ if (os.disk != D_FAILED && ns.disk == D_FAILED) { enum drbd_io_error_p eh = EP_PASS_ON; int was_io_error = 0; /* corresponding get_ldev was in __drbd_set_state, to serialize * our cleanup here with the transition to D_DISKLESS. * But it is still not safe to dereference ldev here, since * we might come from a failed Attach before ldev was set. */ if (device->ldev) { rcu_read_lock(); eh = rcu_dereference(device->ldev->disk_conf)->on_io_error; rcu_read_unlock(); was_io_error = test_and_clear_bit(WAS_IO_ERROR, &device->flags); if (was_io_error && eh == EP_CALL_HELPER) drbd_khelper(device, "local-io-error"); /* Immediately allow completion of all application IO, * that waits for completion from the local disk, * if this was a force-detach due to disk_timeout * or administrator request (drbdsetup detach --force). * Do NOT abort otherwise. * Aborting local requests may cause serious problems, * if requests are completed to upper layers already, * and then later the already submitted local bio completes. * This can cause DMA into former bio pages that meanwhile * have been re-used for other things. * So aborting local requests may cause crashes, * or even worse, silent data corruption. */ if (test_and_clear_bit(FORCE_DETACH, &device->flags)) tl_abort_disk_io(device); /* current state still has to be D_FAILED, * there is only one way out: to D_DISKLESS, * and that may only happen after our put_ldev below. 
*/ if (device->state.disk != D_FAILED) drbd_err(device, "ASSERT FAILED: disk is %s during detach\n", drbd_disk_str(device->state.disk)); if (ns.conn >= C_CONNECTED) drbd_send_state(first_peer_device(device), ns); drbd_rs_cancel_all(device); /* In case we want to get something to stable storage still, * this may be the last chance. * Following put_ldev may transition to D_DISKLESS. */ drbd_md_sync(device); } put_ldev(device); } /* second half of local IO error, failure to attach, * or administrative detach, * after local_cnt references have reached zero again */ if (os.disk != D_DISKLESS && ns.disk == D_DISKLESS) { /* We must still be diskless, * re-attach has to be serialized with this! */ if (device->state.disk != D_DISKLESS) drbd_err(device, "ASSERT FAILED: disk is %s while going diskless\n", drbd_disk_str(device->state.disk)); if (ns.conn >= C_CONNECTED) drbd_send_state(first_peer_device(device), ns); /* corresponding get_ldev in __drbd_set_state * this may finally trigger drbd_ldev_destroy. */ put_ldev(device); } /* Notify peer that I had a local IO error and did not detach. */ if (os.disk == D_UP_TO_DATE && ns.disk == D_INCONSISTENT && ns.conn >= C_CONNECTED) drbd_send_state(first_peer_device(device), ns); /* Disks got bigger while they were detached */ if (ns.disk > D_NEGOTIATING && ns.pdsk > D_NEGOTIATING && test_and_clear_bit(RESYNC_AFTER_NEG, &device->flags)) { if (ns.conn == C_CONNECTED) resync_after_online_grow(device); } /* A resync finished or aborted, wake paused devices... */ if ((os.conn > C_CONNECTED && ns.conn <= C_CONNECTED) || (os.peer_isp && !ns.peer_isp) || (os.user_isp && !ns.user_isp)) resume_next_sg(device); /* sync target done with resync. Explicitly notify peer, even though * it should (at least for non-empty resyncs) already know itself. */ if (os.disk < D_UP_TO_DATE && os.conn >= C_SYNC_SOURCE && ns.conn == C_CONNECTED) drbd_send_state(first_peer_device(device), ns); /* Verify finished, or reached stop sector. 
Peer did not know about * the stop sector, and we may even have changed the stop sector during * verify to interrupt/stop early. Send the new state. */ if (os.conn == C_VERIFY_S && ns.conn == C_CONNECTED && verify_can_do_stop_sector(device)) drbd_send_state(first_peer_device(device), ns); /* This triggers bitmap writeout of potentially still unwritten pages * if the resync finished cleanly, or aborted because of peer disk * failure, or because of connection loss. * For resync aborted because of local disk failure, we cannot do * any bitmap writeout anymore. * No harm done if some bits change during this phase. */ if (os.conn > C_CONNECTED && ns.conn <= C_CONNECTED && get_ldev(device)) { drbd_queue_bitmap_io(device, &drbd_bm_write_copy_pages, NULL, "write from resync_finished", BM_LOCKED_CHANGE_ALLOWED); put_ldev(device); } if (ns.disk == D_DISKLESS && ns.conn == C_STANDALONE && ns.role == R_SECONDARY) { if (os.aftr_isp != ns.aftr_isp) resume_next_sg(device); } drbd_md_sync(device); } struct after_conn_state_chg_work { struct drbd_work w; enum drbd_conns oc; union drbd_state ns_min; union drbd_state ns_max; /* new, max state, over all devices */ enum chg_state_flags flags; struct drbd_connection *connection; }; static int w_after_conn_state_ch(struct drbd_work *w, int unused) { struct after_conn_state_chg_work *acscw = container_of(w, struct after_conn_state_chg_work, w); struct drbd_connection *connection = acscw->connection; enum drbd_conns oc = acscw->oc; union drbd_state ns_max = acscw->ns_max; struct drbd_peer_device *peer_device; int vnr; kfree(acscw); /* Upon network configuration, we need to start the receiver */ if (oc == C_STANDALONE && ns_max.conn == C_UNCONNECTED) drbd_thread_start(&connection->receiver); if (oc == C_DISCONNECTING && ns_max.conn == C_STANDALONE) { struct net_conf *old_conf; mutex_lock(&connection->resource->conf_update); old_conf = connection->net_conf; connection->my_addr_len = 0; connection->peer_addr_len = 0; 
rcu_assign_pointer(connection->net_conf, NULL); conn_free_crypto(connection); mutex_unlock(&connection->resource->conf_update); synchronize_rcu(); kfree(old_conf); } if (ns_max.susp_fen) { /* case1: The outdate peer handler is successful: */ if (ns_max.pdsk <= D_OUTDATED) { rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; if (test_bit(NEW_CUR_UUID, &device->flags)) { drbd_uuid_new_current(device); clear_bit(NEW_CUR_UUID, &device->flags); } } rcu_read_unlock(); spin_lock_irq(&connection->resource->req_lock); _tl_restart(connection, CONNECTION_LOST_WHILE_PENDING); _conn_request_state(connection, (union drbd_state) { { .susp_fen = 1 } }, (union drbd_state) { { .susp_fen = 0 } }, CS_VERBOSE); spin_unlock_irq(&connection->resource->req_lock); } } kref_put(&connection->kref, &drbd_destroy_connection); conn_md_sync(connection); return 0; } void conn_old_common_state(struct drbd_connection *connection, union drbd_state *pcs, enum chg_state_flags *pf) { enum chg_state_flags flags = ~0; struct drbd_peer_device *peer_device; int vnr, first_vol = 1; union drbd_dev_state os, cs = { { .role = R_SECONDARY, .peer = R_UNKNOWN, .conn = connection->cstate, .disk = D_DISKLESS, .pdsk = D_UNKNOWN, } }; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; os = device->state; if (first_vol) { cs = os; first_vol = 0; continue; } if (cs.role != os.role) flags &= ~CS_DC_ROLE; if (cs.peer != os.peer) flags &= ~CS_DC_PEER; if (cs.conn != os.conn) flags &= ~CS_DC_CONN; if (cs.disk != os.disk) flags &= ~CS_DC_DISK; if (cs.pdsk != os.pdsk) flags &= ~CS_DC_PDSK; } rcu_read_unlock(); *pf |= CS_DC_MASK; *pf &= flags; (*pcs).i = cs.i; } static enum drbd_state_rv conn_is_valid_transition(struct drbd_connection *connection, union drbd_state mask, union drbd_state val, enum chg_state_flags flags) { enum drbd_state_rv rv = SS_SUCCESS; union 
drbd_state ns, os; struct drbd_peer_device *peer_device; int vnr; rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; os = drbd_read_state(device); ns = sanitize_state(device, os, apply_mask_val(os, mask, val), NULL); if (flags & CS_IGN_OUTD_FAIL && ns.disk == D_OUTDATED && os.disk < D_OUTDATED) ns.disk = os.disk; if (ns.i == os.i) continue; rv = is_valid_transition(os, ns); if (rv >= SS_SUCCESS && !(flags & CS_HARD)) { rv = is_valid_state(device, ns); if (rv < SS_SUCCESS) { if (is_valid_state(device, os) == rv) rv = is_valid_soft_transition(os, ns, connection); } else rv = is_valid_soft_transition(os, ns, connection); } if (rv < SS_SUCCESS) { if (flags & CS_VERBOSE) print_st_err(device, os, ns, rv); break; } } rcu_read_unlock(); return rv; } void conn_set_state(struct drbd_connection *connection, union drbd_state mask, union drbd_state val, union drbd_state *pns_min, union drbd_state *pns_max, enum chg_state_flags flags) { union drbd_state ns, os, ns_max = { }; union drbd_state ns_min = { { .role = R_MASK, .peer = R_MASK, .conn = val.conn, .disk = D_MASK, .pdsk = D_MASK } }; struct drbd_peer_device *peer_device; enum drbd_state_rv rv; int vnr, number_of_volumes = 0; if (mask.conn == C_MASK) { /* remember last connect time so request_timer_fn() won't * kill newly established sessions while we are still trying to thaw * previously frozen IO */ if (connection->cstate != C_WF_REPORT_PARAMS && val.conn == C_WF_REPORT_PARAMS) connection->last_reconnect_jif = jiffies; connection->cstate = val.conn; } rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; number_of_volumes++; os = drbd_read_state(device); ns = apply_mask_val(os, mask, val); ns = sanitize_state(device, os, ns, NULL); if (flags & CS_IGN_OUTD_FAIL && ns.disk == D_OUTDATED && os.disk < D_OUTDATED) ns.disk = os.disk; rv = __drbd_set_state(device, ns, 
flags, NULL); if (rv < SS_SUCCESS) BUG(); ns.i = device->state.i; ns_max.role = max_role(ns.role, ns_max.role); ns_max.peer = max_role(ns.peer, ns_max.peer); ns_max.conn = max_t(enum drbd_conns, ns.conn, ns_max.conn); ns_max.disk = max_t(enum drbd_disk_state, ns.disk, ns_max.disk); ns_max.pdsk = max_t(enum drbd_disk_state, ns.pdsk, ns_max.pdsk); ns_min.role = min_role(ns.role, ns_min.role); ns_min.peer = min_role(ns.peer, ns_min.peer); ns_min.conn = min_t(enum drbd_conns, ns.conn, ns_min.conn); ns_min.disk = min_t(enum drbd_disk_state, ns.disk, ns_min.disk); ns_min.pdsk = min_t(enum drbd_disk_state, ns.pdsk, ns_min.pdsk); } rcu_read_unlock(); if (number_of_volumes == 0) { ns_min = ns_max = (union drbd_state) { { .role = R_SECONDARY, .peer = R_UNKNOWN, .conn = val.conn, .disk = D_DISKLESS, .pdsk = D_UNKNOWN } }; } ns_min.susp = ns_max.susp = connection->resource->susp; ns_min.susp_nod = ns_max.susp_nod = connection->resource->susp_nod; ns_min.susp_fen = ns_max.susp_fen = connection->resource->susp_fen; *pns_min = ns_min; *pns_max = ns_max; } static enum drbd_state_rv _conn_rq_cond(struct drbd_connection *connection, union drbd_state mask, union drbd_state val) { enum drbd_state_rv err, rv = SS_UNKNOWN_ERROR; /* continue waiting */; if (test_and_clear_bit(CONN_WD_ST_CHG_OKAY, &connection->flags)) rv = SS_CW_SUCCESS; if (test_and_clear_bit(CONN_WD_ST_CHG_FAIL, &connection->flags)) rv = SS_CW_FAILED_BY_PEER; err = conn_is_valid_transition(connection, mask, val, 0); if (err == SS_SUCCESS && connection->cstate == C_WF_REPORT_PARAMS) return rv; return err; } enum drbd_state_rv _conn_request_state(struct drbd_connection *connection, union drbd_state mask, union drbd_state val, enum chg_state_flags flags) { enum drbd_state_rv rv = SS_SUCCESS; struct after_conn_state_chg_work *acscw; enum drbd_conns oc = connection->cstate; union drbd_state ns_max, ns_min, os; bool have_mutex = false; if (mask.conn) { rv = is_valid_conn_transition(oc, val.conn); if (rv < SS_SUCCESS) goto 
abort; } rv = conn_is_valid_transition(connection, mask, val, flags); if (rv < SS_SUCCESS) goto abort; if (oc == C_WF_REPORT_PARAMS && val.conn == C_DISCONNECTING && !(flags & (CS_LOCAL_ONLY | CS_HARD))) { /* This will be a cluster-wide state change. * Need to give up the spinlock, grab the mutex, * then send the state change request, ... */ spin_unlock_irq(&connection->resource->req_lock); mutex_lock(&connection->cstate_mutex); have_mutex = true; set_bit(CONN_WD_ST_CHG_REQ, &connection->flags); if (conn_send_state_req(connection, mask, val)) { /* sending failed. */ clear_bit(CONN_WD_ST_CHG_REQ, &connection->flags); rv = SS_CW_FAILED_BY_PEER; /* need to re-acquire the spin lock, though */ goto abort_unlocked; } if (val.conn == C_DISCONNECTING) set_bit(DISCONNECT_SENT, &connection->flags); /* ... and re-acquire the spinlock. * If _conn_rq_cond() returned >= SS_SUCCESS, we must call * conn_set_state() within the same spinlock. */ spin_lock_irq(&connection->resource->req_lock); wait_event_lock_irq(connection->ping_wait, (rv = _conn_rq_cond(connection, mask, val)), connection->resource->req_lock); clear_bit(CONN_WD_ST_CHG_REQ, &connection->flags); if (rv < SS_SUCCESS) goto abort; } conn_old_common_state(connection, &os, &flags); flags |= CS_DC_SUSP; conn_set_state(connection, mask, val, &ns_min, &ns_max, flags); conn_pr_state_change(connection, os, ns_max, flags); acscw = kmalloc(sizeof(*acscw), GFP_ATOMIC); if (acscw) { acscw->oc = os.conn; acscw->ns_min = ns_min; acscw->ns_max = ns_max; acscw->flags = flags; acscw->w.cb = w_after_conn_state_ch; kref_get(&connection->kref); acscw->connection = connection; drbd_queue_work(&connection->sender_work, &acscw->w); } else { drbd_err(connection, "Could not kmalloc an acscw\n"); } abort: if (have_mutex) { /* mutex_unlock() "... 
must not be used in interrupt context.", * so give up the spinlock, then re-acquire it */ spin_unlock_irq(&connection->resource->req_lock); abort_unlocked: mutex_unlock(&connection->cstate_mutex); spin_lock_irq(&connection->resource->req_lock); } if (rv < SS_SUCCESS && flags & CS_VERBOSE) { drbd_err(connection, "State change failed: %s\n", drbd_set_st_err_str(rv)); drbd_err(connection, " mask = 0x%x val = 0x%x\n", mask.i, val.i); drbd_err(connection, " old_conn:%s wanted_conn:%s\n", drbd_conn_str(oc), drbd_conn_str(val.conn)); } return rv; } enum drbd_state_rv conn_request_state(struct drbd_connection *connection, union drbd_state mask, union drbd_state val, enum chg_state_flags flags) { enum drbd_state_rv rv; spin_lock_irq(&connection->resource->req_lock); rv = _conn_request_state(connection, mask, val, flags); spin_unlock_irq(&connection->resource->req_lock); return rv; } drbd-8.4.4/drbd/drbd_state.h0000664000000000000000000001360212221261130014331 0ustar rootroot#ifndef DRBD_STATE_H #define DRBD_STATE_H struct drbd_device; struct drbd_connection; /** * DOC: DRBD State macros * * These macros are used to express state changes in easily readable form. * * The NS macros expand to a mask and a value that can be bitwise ORed onto the * current state as soon as the spinlock (req_lock) is taken. * * The _NS macros are used for state functions that get called with the * spinlock held. These macros expand directly to the new state value. * * Besides the basic forms NS() and _NS(), additional _?NS[23] are defined * to express state changes that affect more than one aspect of the state. * * E.g. NS2(conn, C_CONNECTED, peer, R_SECONDARY) * means that the network connection was established and that the peer * is in secondary role. 
*/ #define role_MASK R_MASK #define peer_MASK R_MASK #define disk_MASK D_MASK #define pdsk_MASK D_MASK #define conn_MASK C_MASK #define susp_MASK 1 #define user_isp_MASK 1 #define aftr_isp_MASK 1 #define susp_nod_MASK 1 #define susp_fen_MASK 1 #define NS(T, S) \ ({ union drbd_state mask; mask.i = 0; mask.T = T##_MASK; mask; }), \ ({ union drbd_state val; val.i = 0; val.T = (S); val; }) #define NS2(T1, S1, T2, S2) \ ({ union drbd_state mask; mask.i = 0; mask.T1 = T1##_MASK; \ mask.T2 = T2##_MASK; mask; }), \ ({ union drbd_state val; val.i = 0; val.T1 = (S1); \ val.T2 = (S2); val; }) #define NS3(T1, S1, T2, S2, T3, S3) \ ({ union drbd_state mask; mask.i = 0; mask.T1 = T1##_MASK; \ mask.T2 = T2##_MASK; mask.T3 = T3##_MASK; mask; }), \ ({ union drbd_state val; val.i = 0; val.T1 = (S1); \ val.T2 = (S2); val.T3 = (S3); val; }) #define _NS(D, T, S) \ D, ({ union drbd_state __ns; __ns = drbd_read_state(D); __ns.T = (S); __ns; }) #define _NS2(D, T1, S1, T2, S2) \ D, ({ union drbd_state __ns; __ns = drbd_read_state(D); __ns.T1 = (S1); \ __ns.T2 = (S2); __ns; }) #define _NS3(D, T1, S1, T2, S2, T3, S3) \ D, ({ union drbd_state __ns; __ns = drbd_read_state(D); __ns.T1 = (S1); \ __ns.T2 = (S2); __ns.T3 = (S3); __ns; }) enum chg_state_flags { CS_HARD = 1 << 0, CS_VERBOSE = 1 << 1, CS_WAIT_COMPLETE = 1 << 2, CS_SERIALIZE = 1 << 3, CS_ORDERED = CS_WAIT_COMPLETE + CS_SERIALIZE, CS_LOCAL_ONLY = 1 << 4, /* Do not consider a device pair wide state change */ CS_DC_ROLE = 1 << 5, /* DC = display as connection state change */ CS_DC_PEER = 1 << 6, CS_DC_CONN = 1 << 7, CS_DC_DISK = 1 << 8, CS_DC_PDSK = 1 << 9, CS_DC_SUSP = 1 << 10, CS_DC_MASK = CS_DC_ROLE + CS_DC_PEER + CS_DC_CONN + CS_DC_DISK + CS_DC_PDSK, CS_IGN_OUTD_FAIL = 1 << 11, }; /* drbd_dev_state and drbd_state are different types. This is to stress the small difference. There is no suspended flag (.susp), and no suspended while fence handler runs flag (susp_fen). 
*/ union drbd_dev_state { struct { #if defined(__LITTLE_ENDIAN_BITFIELD) unsigned role:2 ; /* 3/4 primary/secondary/unknown */ unsigned peer:2 ; /* 3/4 primary/secondary/unknown */ unsigned conn:5 ; /* 17/32 cstates */ unsigned disk:4 ; /* 8/16 from D_DISKLESS to D_UP_TO_DATE */ unsigned pdsk:4 ; /* 8/16 from D_DISKLESS to D_UP_TO_DATE */ unsigned _unused:1 ; unsigned aftr_isp:1 ; /* isp .. imposed sync pause */ unsigned peer_isp:1 ; unsigned user_isp:1 ; unsigned _pad:11; /* 0 unused */ #elif defined(__BIG_ENDIAN_BITFIELD) unsigned _pad:11; unsigned user_isp:1 ; unsigned peer_isp:1 ; unsigned aftr_isp:1 ; /* isp .. imposed sync pause */ unsigned _unused:1 ; unsigned pdsk:4 ; /* 8/16 from D_DISKLESS to D_UP_TO_DATE */ unsigned disk:4 ; /* 8/16 from D_DISKLESS to D_UP_TO_DATE */ unsigned conn:5 ; /* 17/32 cstates */ unsigned peer:2 ; /* 3/4 primary/secondary/unknown */ unsigned role:2 ; /* 3/4 primary/secondary/unknown */ #else # error "this endianess is not supported" #endif }; unsigned int i; }; extern enum drbd_state_rv drbd_change_state(struct drbd_device *device, enum chg_state_flags f, union drbd_state mask, union drbd_state val); extern void drbd_force_state(struct drbd_device *, union drbd_state, union drbd_state); extern enum drbd_state_rv _drbd_request_state(struct drbd_device *, union drbd_state, union drbd_state, enum chg_state_flags); extern enum drbd_state_rv __drbd_set_state(struct drbd_device *, union drbd_state, enum chg_state_flags, struct completion *done); extern void print_st_err(struct drbd_device *, union drbd_state, union drbd_state, int); enum drbd_state_rv _conn_request_state(struct drbd_connection *connection, union drbd_state mask, union drbd_state val, enum chg_state_flags flags); enum drbd_state_rv conn_request_state(struct drbd_connection *connection, union drbd_state mask, union drbd_state val, enum chg_state_flags flags); extern void drbd_resume_al(struct drbd_device *device); extern bool conn_all_vols_unconf(struct drbd_connection 
*connection); /** * drbd_request_state() - Request a state change * @device: DRBD device. * @mask: mask of state bits to change. * @val: value of new state bits. * * This is the most graceful way of requesting a state change. It is * quite verbose in case the state change is not possible, and all those * state changes are globally serialized. */ static inline int drbd_request_state(struct drbd_device *device, union drbd_state mask, union drbd_state val) { return _drbd_request_state(device, mask, val, CS_VERBOSE + CS_ORDERED); } enum drbd_role conn_highest_role(struct drbd_connection *connection); enum drbd_role conn_highest_peer(struct drbd_connection *connection); enum drbd_disk_state conn_highest_disk(struct drbd_connection *connection); enum drbd_disk_state conn_lowest_disk(struct drbd_connection *connection); enum drbd_disk_state conn_highest_pdsk(struct drbd_connection *connection); enum drbd_conns conn_lowest_conn(struct drbd_connection *connection); #endif drbd-8.4.4/drbd/drbd_strings.c0000664000000000000000000001054512221261130014700 0ustar rootroot/* drbd_strings.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2003-2008, LINBIT Information Technologies GmbH. Copyright (C) 2003-2008, Philipp Reisner . Copyright (C) 2003-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. 
*/ #include #include "drbd_strings.h" static const char *drbd_conn_s_names[] = { [C_STANDALONE] = "StandAlone", [C_DISCONNECTING] = "Disconnecting", [C_UNCONNECTED] = "Unconnected", [C_TIMEOUT] = "Timeout", [C_BROKEN_PIPE] = "BrokenPipe", [C_NETWORK_FAILURE] = "NetworkFailure", [C_PROTOCOL_ERROR] = "ProtocolError", [C_WF_CONNECTION] = "WFConnection", [C_WF_REPORT_PARAMS] = "WFReportParams", [C_TEAR_DOWN] = "TearDown", [C_CONNECTED] = "Connected", [C_STARTING_SYNC_S] = "StartingSyncS", [C_STARTING_SYNC_T] = "StartingSyncT", [C_WF_BITMAP_S] = "WFBitMapS", [C_WF_BITMAP_T] = "WFBitMapT", [C_WF_SYNC_UUID] = "WFSyncUUID", [C_SYNC_SOURCE] = "SyncSource", [C_SYNC_TARGET] = "SyncTarget", [C_PAUSED_SYNC_S] = "PausedSyncS", [C_PAUSED_SYNC_T] = "PausedSyncT", [C_VERIFY_S] = "VerifyS", [C_VERIFY_T] = "VerifyT", [C_AHEAD] = "Ahead", [C_BEHIND] = "Behind", }; static const char *drbd_role_s_names[] = { [R_PRIMARY] = "Primary", [R_SECONDARY] = "Secondary", [R_UNKNOWN] = "Unknown" }; static const char *drbd_disk_s_names[] = { [D_DISKLESS] = "Diskless", [D_ATTACHING] = "Attaching", [D_FAILED] = "Failed", [D_NEGOTIATING] = "Negotiating", [D_INCONSISTENT] = "Inconsistent", [D_OUTDATED] = "Outdated", [D_UNKNOWN] = "DUnknown", [D_CONSISTENT] = "Consistent", [D_UP_TO_DATE] = "UpToDate", }; static const char *drbd_state_sw_errors[] = { [-SS_TWO_PRIMARIES] = "Multiple primaries not allowed by config", [-SS_NO_UP_TO_DATE_DISK] = "Need access to UpToDate data", [-SS_NO_LOCAL_DISK] = "Can not resync without local disk", [-SS_NO_REMOTE_DISK] = "Can not resync without remote disk", [-SS_CONNECTED_OUTDATES] = "Refusing to be Outdated while Connected", [-SS_PRIMARY_NOP] = "Refusing to be Primary while peer is not outdated", [-SS_RESYNC_RUNNING] = "Can not start OV/resync since it is already active", [-SS_ALREADY_STANDALONE] = "Can not disconnect a StandAlone device", [-SS_CW_FAILED_BY_PEER] = "State change was refused by peer node", [-SS_IS_DISKLESS] = "Device is diskless, the requested operation 
requires a disk", [-SS_DEVICE_IN_USE] = "Device is held open by someone", [-SS_NO_NET_CONFIG] = "Have no net/connection configuration", [-SS_NO_VERIFY_ALG] = "Need a verify algorithm to start online verify", [-SS_NEED_CONNECTION] = "Need a connection to start verify or resync", [-SS_NOT_SUPPORTED] = "Peer does not support protocol", [-SS_LOWER_THAN_OUTDATED] = "Disk state is lower than outdated", [-SS_IN_TRANSIENT_STATE] = "In transient state, retry after next state change", [-SS_CONCURRENT_ST_CHG] = "Concurrent state changes detected and aborted", [-SS_OUTDATE_WO_CONN] = "Need a connection for a graceful disconnect/outdate peer", [-SS_O_VOL_PEER_PRI] = "Other vol primary on peer not allowed by config", }; const char *drbd_conn_str(enum drbd_conns s) { /* enums are unsigned... */ return s > C_BEHIND ? "TOO_LARGE" : drbd_conn_s_names[s]; } const char *drbd_role_str(enum drbd_role s) { return s > R_SECONDARY ? "TOO_LARGE" : drbd_role_s_names[s]; } const char *drbd_disk_str(enum drbd_disk_state s) { return s > D_UP_TO_DATE ? "TOO_LARGE" : drbd_disk_s_names[s]; } const char *drbd_set_st_err_str(enum drbd_state_rv err) { return err <= SS_AFTER_LAST_ERROR ? "TOO_SMALL" : err > SS_TWO_PRIMARIES ? "TOO_LARGE" : drbd_state_sw_errors[-err]; } drbd-8.4.4/drbd/drbd_strings.h0000664000000000000000000000045412221261130014703 0ustar rootroot#ifndef __DRBD_STRINGS_H #define __DRBD_STRINGS_H extern const char *drbd_conn_str(enum drbd_conns); extern const char *drbd_role_str(enum drbd_role); extern const char *drbd_disk_str(enum drbd_disk_state); extern const char *drbd_set_st_err_str(enum drbd_state_rv); #endif /* __DRBD_STRINGS_H */ drbd-8.4.4/drbd/drbd_sysfs.c0000664000000000000000000000532512176213144014371 0ustar rootroot/* drbd_sysfs.c This file is part of DRBD. Copyright (C) 2001-2008, LINBIT Information Technologies GmbH. Copyright (C) 1999-2008, Philipp Reisner . Copyright (C) 2002-2008, Lars Ellenberg . 
drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #include "linux/kobject.h" #include #include "drbd_int.h" struct drbd_md_attribute { struct attribute attr; ssize_t (*show)(struct drbd_backing_dev *bdev, char *buf); /* ssize_t (*store)(struct drbd_backing_dev *bdev, const char *buf, size_t count); */ }; static ssize_t drbd_md_attr_show(struct kobject *, struct attribute *, char *); static ssize_t data_gen_id_show(struct drbd_backing_dev *, char *); static void backing_dev_release(struct kobject *kobj); #define DRBD_MD_ATTR(_name) struct drbd_md_attribute drbd_md_attr_##_name = __ATTR_RO(_name) static DRBD_MD_ATTR(data_gen_id); static struct attribute *bdev_attrs[] = { &drbd_md_attr_data_gen_id.attr, NULL }; struct kobj_type drbd_bdev_kobj_type = { .release = backing_dev_release, .sysfs_ops = &(struct sysfs_ops) { .show = drbd_md_attr_show, .store = NULL, }, .default_attrs = bdev_attrs, }; static ssize_t drbd_md_attr_show(struct kobject *kobj, struct attribute *attr, char *buffer) { struct drbd_backing_dev *bdev = container_of(kobj, struct drbd_backing_dev, kobject); struct drbd_md_attribute *drbd_md_attr = container_of(attr, struct drbd_md_attribute, attr); return drbd_md_attr->show(bdev, buffer); } static ssize_t data_gen_id_show(struct drbd_backing_dev *bdev, char *buf) { unsigned long flags; enum drbd_uuid_index idx; char *b = buf; /* does this need to be _irqsave, or is _irq good 
enough */ spin_lock_irqsave(&bdev->md.uuid_lock, flags); for (idx = UI_CURRENT; idx <= UI_HISTORY_END; idx++) { b += sprintf(b, "0x%016llX\n", bdev->md.uuid[idx]); } spin_unlock_irqrestore(&bdev->md.uuid_lock, flags); return b - buf; } static void backing_dev_release(struct kobject *kobj) { struct drbd_backing_dev *bdev = container_of(kobj, struct drbd_backing_dev, kobject); kfree(bdev->disk_conf); kfree(bdev); } drbd-8.4.4/drbd/drbd_vli.h0000664000000000000000000002637212176213144014026 0ustar rootroot/* -*- linux-c -*- drbd_vli.h This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2001-2008, LINBIT Information Technologies GmbH. Copyright (C) 1999-2008, Philipp Reisner . Copyright (C) 2002-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #ifndef _DRBD_VLI_H #define _DRBD_VLI_H /* * At a granularity of 4KiB storage represented per bit, * and storage sizes of several TiB, * and possibly small-bandwidth replication, * the bitmap transfer time can take much too long, * if transmitted in plain text. * * We try to reduce the transferred bitmap information * by encoding runlengths of bit polarity. * * We never actually need to encode a "zero" (runlengths are positive). * But then we have to store the value of the first bit. * The first bit of information thus shall encode if the first runlength * gives the number of set or unset bits. 
* * We assume that large areas are either completely set or unset, * which gives good compression with any runlength method, * even when encoding the runlength as fixed size 32bit/64bit integers. * * Still, there may be areas where the polarity flips every few bits, * and encoding the runlength sequence of those areas with fixed size * integers would be much worse than plaintext. * * We want to encode small runlength values with minimum code length, * while still being able to encode a huge run of all zeros. * * Thus we need a Variable Length Integer encoding, VLI. * * For some cases, we produce more code bits than plaintext input. * We need to send incompressible chunks as plaintext, skip over them * and then see if the next chunk compresses better. * * We don't care too much about "excellent" compression ratio for large * runlengths (all set/all clear): whether we achieve a factor of 100 * or 1000 is not that much of an issue. * We do not want to waste too much on short runlengths in the "noisy" * parts of the bitmap, though. * * There are endless variants of VLI, we experimented with: * * simple byte-based * * various bit based with different code word length. * * To avoid yet another configuration parameter (choice of bitmap compression * algorithm) which was difficult to explain and tune, we just chose the one * variant that turned out best in all test cases. * Based on real world usage patterns, with device sizes ranging from a few GiB * to several TiB, file server/mailserver/webserver/mysql/postgres, * mostly idle to really busy, the all time winner (though sometimes only * marginally better) is: */ /* * encoding is "visualised" as * __little endian__ bitstream, least significant bit first (left most) * * this particular encoding is chosen so that the prefix code * starts as unary encoding the level, then modified so that * 10 levels can be described in 8bit, with minimal overhead * for the smaller levels. 
* * Number of data bits follow fibonacci sequence, with the exception of the * last level (+1 data bit, so it makes 64bit total). The only worse code when * encoding bit polarity runlength is 1 plain bits => 2 code bits. prefix data bits max val Nº data bits 0 x 0x2 1 10 x 0x4 1 110 xx 0x8 2 1110 xxx 0x10 3 11110 xxx xx 0x30 5 111110 xx xxxxxx 0x130 8 11111100 xxxxxxxx xxxxx 0x2130 13 11111110 xxxxxxxx xxxxxxxx xxxxx 0x202130 21 11111101 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xx 0x400202130 34 11111111 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx 56 * maximum encodable value: 0x100000400202130 == 2**56 + some */ /* compression "table": transmitted x 0.29 as plaintext x ........................ x ........................ x ........................ x 0.59 0.21........................ x ........................................................ x .. c ................................................... x 0.44.. o ................................................... x .......... d ................................................... x .......... e ................................................... X............. ................................................... x.............. b ................................................... 2.0x............... i ................................................... #X................ t ................................................... #................. s ........................... plain bits .......... -+----------------------------------------------------------------------- 1 16 32 64 */ /* LEVEL: (total bits, prefix bits, prefix value), * sorted ascending by number of total bits. * The rest of the code table is calculated at compiletime from this. */ /* fibonacci data 1, 1, ... 
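 *
 * A hand-worked sketch of how the LEVEL() table that follows maps small
 * runlengths to code words. This is an illustration derived from the table
 * entries and the arithmetic in __vli_encode_bits(); it is not part of the
 * original authors' commentary.
 *
 *   data bits per level (total bits - prefix bits):
 *     1, 1, 2, 3, 5, 8, 13, 21, 34, 56
 *
 *   in = 1: LEVEL(2, 1, 0x00), adj = 1, code = ((1 - 1) << 1) | 0x00 = 0x0, 2 bits
 *   in = 2: LEVEL(2, 1, 0x00), adj = 1, code = ((2 - 1) << 1) | 0x00 = 0x2, 2 bits
 *   in = 3: LEVEL(3, 2, 0x01), adj = 3, code = ((3 - 3) << 2) | 0x01 = 0x1, 3 bits
 *   in = 4: LEVEL(3, 2, 0x01), adj = 3, code = ((4 - 3) << 2) | 0x01 = 0x5, 3 bits
 *   in = 5: LEVEL(5, 3, 0x03), adj = 5, code = ((5 - 5) << 3) | 0x03 = 0x3, 5 bits
 *
 * Decoding reverses these steps: vli_decode_bits() matches the low prefix
 * bits, shifts them off, and adds the same adj. For code 0x5 this gives
 * ((0x5 & 0x7) >> 2) + 3 = 4.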
*/ #define VLI_L_1_1() do { \ LEVEL( 2, 1, 0x00); \ LEVEL( 3, 2, 0x01); \ LEVEL( 5, 3, 0x03); \ LEVEL( 7, 4, 0x07); \ LEVEL(10, 5, 0x0f); \ LEVEL(14, 6, 0x1f); \ LEVEL(21, 8, 0x3f); \ LEVEL(29, 8, 0x7f); \ LEVEL(42, 8, 0xbf); \ LEVEL(64, 8, 0xff); \ } while (0) /* finds a suitable level to decode the least significant part of in. * returns number of bits consumed. * * BUG() for bad input, as that would mean a buggy code table. */ static inline int vli_decode_bits(u64 *out, const u64 in) { u64 adj = 1; #define LEVEL(t,b,v) \ do { \ if ((in & ((1 << b) -1)) == v) { \ *out = ((in & ((~0ULL) >> (64-t))) >> b) + adj; \ return t; \ } \ adj += 1ULL << (t - b); \ } while (0) VLI_L_1_1(); /* NOT REACHED, if VLI_LEVELS code table is defined properly */ BUG(); #undef LEVEL } /* return number of code bits needed, * or negative error number */ static inline int __vli_encode_bits(u64 *out, const u64 in) { u64 max = 0; u64 adj = 1; if (in == 0) return -EINVAL; #define LEVEL(t,b,v) do { \ max += 1ULL << (t - b); \ if (in <= max) { \ if (out) \ *out = ((in - adj) << b) | v; \ return t; \ } \ adj = max + 1; \ } while (0) VLI_L_1_1(); return -EOVERFLOW; #undef LEVEL } #undef VLI_L_1_1 /* code from here down is independent of actually used bit code */ /* * Code length is determined by some unique (e.g. unary) prefix. * This encodes arbitrary bit length, not whole bytes: we have a bit-stream, * not a byte stream. */ /* for the bitstream, we need a cursor */ struct bitstream_cursor { /* the current byte */ u8 *b; /* the current bit within *b, normalized: 0..7 */ unsigned int bit; }; /* initialize cursor to point to first bit of stream */ static inline void bitstream_cursor_reset(struct bitstream_cursor *cur, void *s) { cur->b = s; cur->bit = 0; } /* advance cursor by that many bits; maximum expected input value: 64, * but depending on VLI implementation, it may be more. 
*/ static inline void bitstream_cursor_advance(struct bitstream_cursor *cur, unsigned int bits) { bits += cur->bit; cur->b = cur->b + (bits >> 3); cur->bit = bits & 7; } /* the bitstream itself knows its length */ struct bitstream { struct bitstream_cursor cur; unsigned char *buf; size_t buf_len; /* in bytes */ /* for input stream: * number of trailing 0 bits for padding * total number of valid bits in stream: buf_len * 8 - pad_bits */ unsigned int pad_bits; }; static inline void bitstream_init(struct bitstream *bs, void *s, size_t len, unsigned int pad_bits) { bs->buf = s; bs->buf_len = len; bs->pad_bits = pad_bits; bitstream_cursor_reset(&bs->cur, bs->buf); } static inline void bitstream_rewind(struct bitstream *bs) { bitstream_cursor_reset(&bs->cur, bs->buf); memset(bs->buf, 0, bs->buf_len); } /* Put (at most 64) least significant bits of val into bitstream, and advance cursor. * Ignores "pad_bits". * Returns zero if bits == 0 (nothing to do). * Returns number of bits used if successful. * * If there is not enough room left in bitstream, * leaves bitstream unchanged and returns -ENOBUFS. */ static inline int bitstream_put_bits(struct bitstream *bs, u64 val, const unsigned int bits) { unsigned char *b = bs->cur.b; unsigned int tmp; if (bits == 0) return 0; if ((bs->cur.b + ((bs->cur.bit + bits -1) >> 3)) - bs->buf >= bs->buf_len) return -ENOBUFS; /* paranoia: strip off hi bits; they should not be set anyways. */ if (bits < 64) val &= ~0ULL >> (64 - bits); *b++ |= (val & 0xff) << bs->cur.bit; for (tmp = 8 - bs->cur.bit; tmp < bits; tmp += 8) *b++ |= (val >> tmp) & 0xff; bitstream_cursor_advance(&bs->cur, bits); return bits; } /* Fetch (at most 64) bits from bitstream into *out, and advance cursor. * * If more than 64 bits are requested, returns -EINVAL and leave *out unchanged. * * If there are less than the requested number of valid bits left in the * bitstream, still fetches all available bits. * * Returns number of actually fetched bits. 
*/ static inline int bitstream_get_bits(struct bitstream *bs, u64 *out, int bits) { u64 val; unsigned int n; if (bits > 64) return -EINVAL; if (bs->cur.b + ((bs->cur.bit + bs->pad_bits + bits -1) >> 3) - bs->buf >= bs->buf_len) bits = ((bs->buf_len - (bs->cur.b - bs->buf)) << 3) - bs->cur.bit - bs->pad_bits; if (bits == 0) { *out = 0; return 0; } /* get the high bits */ val = 0; n = (bs->cur.bit + bits + 7) >> 3; /* n may be at most 9, if cur.bit + bits > 64 */ /* which means this copies at most 8 byte */ if (n) { memcpy(&val, bs->cur.b+1, n - 1); val = le64_to_cpu(val) << (8 - bs->cur.bit); } /* we still need the low bits */ val |= bs->cur.b[0] >> bs->cur.bit; /* and mask out bits we don't want */ val &= ~0ULL >> (64 - bits); bitstream_cursor_advance(&bs->cur, bits); *out = val; return bits; } /* encodes @in as vli into @bs; * return values * > 0: number of bits successfully stored in bitstream * -ENOBUFS @bs is full * -EINVAL input zero (invalid) * -EOVERFLOW input too large for this vli code (invalid) */ static inline int vli_encode_bits(struct bitstream *bs, u64 in) { u64 code = code; int bits = __vli_encode_bits(&code, in); if (bits <= 0) return bits; return bitstream_put_bits(bs, code, bits); } #endif drbd-8.4.4/drbd/drbd_worker.c0000664000000000000000000016345212225234676014551 0ustar rootroot/* drbd_worker.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2001-2008, LINBIT Information Technologies GmbH. Copyright (C) 1999-2008, Philipp Reisner . Copyright (C) 2002-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #include #include #include #include #include #include #include #include #include #include #include "drbd_int.h" #include "drbd_protocol.h" #include "drbd_req.h" static int make_ov_request(struct drbd_device *, int); static int make_resync_request(struct drbd_device *, int); /* endio handlers: * drbd_md_io_complete (defined here) * drbd_request_endio (defined here) * drbd_peer_request_endio (defined here) * bm_async_io_complete (defined in drbd_bitmap.c) * * For all these callbacks, note the following: * The callbacks will be called in irq context by the IDE drivers, * and in Softirqs/Tasklets/BH context by the SCSI drivers. * Try to get the locking right :) * */ /* About the global_state_lock Each state transition on a device holds a read lock. In case we have to evaluate the resync-after dependencies, we grab a write lock, because we need stable states on all devices for that. */ rwlock_t global_state_lock; /* used for synchronous meta data and bitmap IO * submitted by drbd_md_sync_page_io() */ BIO_ENDIO_TYPE drbd_md_io_complete BIO_ENDIO_ARGS(struct bio *bio, int error) { struct drbd_md_io *md_io; struct drbd_device *device; BIO_ENDIO_FN_START; md_io = (struct drbd_md_io *)bio->bi_private; device = container_of(md_io, struct drbd_device, md_io); md_io->error = error; /* We grabbed an extra reference in _drbd_md_sync_page_io() to be able * to timeout on the lower level device, and eventually detach from it. * If this io completion runs after that timeout expired, this * drbd_md_put_buffer() may allow us to finally try and re-attach. * During normal operation, this only puts that extra reference * down to 1 again. 
* Make sure we first drop the reference, and only then signal * completion, or we may (in drbd_al_read_log()) cycle so fast into the * next drbd_md_sync_page_io(), that we trigger the * ASSERT(atomic_read(&device->md_io_in_use) == 1) there. */ drbd_md_put_buffer(device); md_io->done = 1; wake_up(&device->misc_wait); bio_put(bio); if (device->ldev) /* special case: drbd_md_read() during drbd_adm_attach() */ put_ldev(device); BIO_ENDIO_FN_RETURN; } /* reads on behalf of the partner, * "submitted" by the receiver */ void drbd_endio_read_sec_final(struct drbd_peer_request *peer_req) __releases(local) { unsigned long flags = 0; struct drbd_peer_device *peer_device = peer_req->peer_device; struct drbd_device *device = peer_device->device; spin_lock_irqsave(&device->resource->req_lock, flags); device->read_cnt += peer_req->i.size >> 9; list_del(&peer_req->w.list); if (list_empty(&device->read_ee)) wake_up(&device->ee_wait); if (test_bit(__EE_WAS_ERROR, &peer_req->flags)) __drbd_chk_io_error(device, DRBD_READ_ERROR); spin_unlock_irqrestore(&device->resource->req_lock, flags); drbd_queue_work(&peer_device->connection->sender_work, &peer_req->w); put_ldev(device); } static int is_failed_barrier(int ee_flags) { return (ee_flags & (EE_IS_BARRIER|EE_WAS_ERROR|EE_RESUBMITTED|EE_IS_TRIM)) == (EE_IS_BARRIER|EE_WAS_ERROR); } /* writes on behalf of the partner, or resync writes, * "submitted" by the receiver, final stage. 
*/ void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req) __releases(local) { unsigned long flags = 0; struct drbd_peer_device *peer_device = peer_req->peer_device; struct drbd_device *device = peer_device->device; struct drbd_interval i; int do_wake; u64 block_id; int do_al_complete_io; /* if this is a failed barrier request, disable use of barriers, * and schedule for resubmission */ if (is_failed_barrier(peer_req->flags)) { drbd_bump_write_ordering(first_peer_device(device)->connection, WO_bdev_flush); spin_lock_irqsave(&device->resource->req_lock, flags); list_del(&peer_req->w.list); peer_req->flags = (peer_req->flags & ~EE_WAS_ERROR) | EE_RESUBMITTED; peer_req->w.cb = w_e_reissue; /* put_ldev actually happens below, once we come here again. */ __release(local); spin_unlock_irqrestore(&device->resource->req_lock, flags); drbd_queue_work(&first_peer_device(device)->connection->sender_work, &peer_req->w); return; } /* after we moved peer_req to done_ee, * we may no longer access it, * it may be freed/reused already! * (as soon as we release the req_lock) */ i = peer_req->i; do_al_complete_io = peer_req->flags & EE_CALL_AL_COMPLETE_IO; block_id = peer_req->block_id; spin_lock_irqsave(&device->resource->req_lock, flags); device->writ_cnt += peer_req->i.size >> 9; list_move_tail(&peer_req->w.list, &device->done_ee); /* * Do not remove from the write_requests tree here: we did not send the * Ack yet and did not wake possibly waiting conflicting requests. * Removed from the tree from "drbd_process_done_ee" within the * appropriate dw.cb (e_end_block/e_end_resync_block) or from * _drbd_clear_done_ee. */ do_wake = list_empty(block_id == ID_SYNCER ? &device->sync_ee : &device->active_ee); /* FIXME do we want to detach for failed REQ_DISCARD? 
* ((peer_req->flags & (EE_WAS_ERROR|EE_IS_TRIM)) == EE_WAS_ERROR) */ if (peer_req->flags & EE_WAS_ERROR) __drbd_chk_io_error(device, DRBD_WRITE_ERROR); spin_unlock_irqrestore(&device->resource->req_lock, flags); if (block_id == ID_SYNCER) drbd_rs_complete_io(device, i.sector); if (do_wake) wake_up(&device->ee_wait); if (do_al_complete_io) drbd_al_complete_io(device, &i); wake_asender(peer_device->connection); put_ldev(device); } /* writes on behalf of the partner, or resync writes, * "submitted" by the receiver. */ BIO_ENDIO_TYPE drbd_peer_request_endio BIO_ENDIO_ARGS(struct bio *bio, int error) { struct drbd_peer_request *peer_req = bio->bi_private; struct drbd_device *device = peer_req->peer_device->device; int uptodate = bio_flagged(bio, BIO_UPTODATE); int is_write = bio_data_dir(bio) == WRITE; int is_discard = !!(bio->bi_rw & DRBD_REQ_DISCARD); BIO_ENDIO_FN_START; if (error && DRBD_ratelimit(5*HZ, 5)) drbd_warn(device, "%s: error=%d s=%llus\n", is_write ? (is_discard ? "discard" : "write") : "read", error, (unsigned long long)peer_req->i.sector); if (!error && !uptodate) { if (DRBD_ratelimit(5*HZ, 5)) drbd_warn(device, "%s: setting error to -EIO s=%llus\n", is_write ? "write" : "read", (unsigned long long)peer_req->i.sector); /* strange behavior of some lower level drivers... * fail the request by clearing the uptodate flag, * but do not return any error?! 
*/ error = -EIO; } if (error) set_bit(__EE_WAS_ERROR, &peer_req->flags); bio_put(bio); /* no need for the bio anymore */ if (atomic_dec_and_test(&peer_req->pending_bios)) { if (is_write) drbd_endio_write_sec_final(peer_req); else drbd_endio_read_sec_final(peer_req); } BIO_ENDIO_FN_RETURN; } /* read, readA or write requests on R_PRIMARY coming from drbd_make_request */ BIO_ENDIO_TYPE drbd_request_endio BIO_ENDIO_ARGS(struct bio *bio, int error) { unsigned long flags; struct drbd_request *req = bio->bi_private; struct drbd_device *device = req->device; struct bio_and_error m; enum drbd_req_event what; int uptodate = bio_flagged(bio, BIO_UPTODATE); BIO_ENDIO_FN_START; if (!error && !uptodate) { drbd_warn(device, "p %s: setting error to -EIO\n", bio_data_dir(bio) == WRITE ? "write" : "read"); /* strange behavior of some lower level drivers... * fail the request by clearing the uptodate flag, * but do not return any error?! */ error = -EIO; } /* If this request was aborted locally before, * but now was completed "successfully", * chances are that this caused arbitrary data corruption. * * "aborting" requests, or force-detaching the disk, is intended for * completely blocked/hung local backing devices which no longer * complete requests at all, not even with error completions. In this * situation, usually a hard-reset and failover is the only way out. * * By "aborting", basically faking a local error-completion, * we allow for a more graceful switchover by cleanly migrating services. * Still the affected node has to be rebooted "soon". * * By completing these requests, we allow the upper layers to re-use * the associated data pages. * * If later the local backing device "recovers", and now DMAs some data * from disk into the original request pages, in the best case it will * just put random data into unused pages; but typically it will corrupt * meanwhile completely unrelated data, causing all sorts of damage.
* * Which means delayed successful completion, * especially for READ requests, * is a reason to panic(). * * We assume that a delayed *error* completion is OK, * though we still will complain noisily about it. */ if (unlikely(req->rq_state & RQ_LOCAL_ABORTED)) { if (DRBD_ratelimit(5*HZ, 5)) drbd_emerg(device, "delayed completion of aborted local request; disk-timeout may be too aggressive\n"); if (!error) panic("possible random memory corruption caused by delayed completion of aborted local request\n"); } /* to avoid recursion in __req_mod */ if (unlikely(error)) { if (bio->bi_rw & DRBD_REQ_DISCARD) what = (error == -EOPNOTSUPP) ? DISCARD_COMPLETED_NOTSUPP : DISCARD_COMPLETED_WITH_ERROR; else what = (bio_data_dir(bio) == WRITE) ? WRITE_COMPLETED_WITH_ERROR : (bio_rw(bio) == READ) ? READ_COMPLETED_WITH_ERROR : READ_AHEAD_COMPLETED_WITH_ERROR; } else what = COMPLETED_OK; bio_put(req->private_bio); req->private_bio = ERR_PTR(error); /* not req_mod(), we need irqsave here! */ spin_lock_irqsave(&device->resource->req_lock, flags); __req_mod(req, what, &m); spin_unlock_irqrestore(&device->resource->req_lock, flags); put_ldev(device); if (m.bio) complete_master_bio(device, &m); BIO_ENDIO_FN_RETURN; } void drbd_csum_ee(struct crypto_hash *tfm, struct drbd_peer_request *peer_req, void *digest) { struct hash_desc desc; struct scatterlist sg; struct page *page = peer_req->pages; struct page *tmp; unsigned len; desc.tfm = tfm; desc.flags = 0; sg_init_table(&sg, 1); crypto_hash_init(&desc); while ((tmp = page_chain_next(page))) { /* all but the last page will be fully used */ sg_set_page(&sg, page, PAGE_SIZE, 0); crypto_hash_update(&desc, &sg, sg.length); page = tmp; } /* and now the last, possibly only partially used page */ len = peer_req->i.size & (PAGE_SIZE - 1); sg_set_page(&sg, page, len ?: PAGE_SIZE, 0); crypto_hash_update(&desc, &sg, sg.length); crypto_hash_final(&desc, digest); } void drbd_csum_bio(struct crypto_hash *tfm, struct bio *bio, void *digest) { struct 
hash_desc desc; struct scatterlist sg; struct bio_vec *bvec; int i; desc.tfm = tfm; desc.flags = 0; sg_init_table(&sg, 1); crypto_hash_init(&desc); bio_for_each_segment(bvec, bio, i) { sg_set_page(&sg, bvec->bv_page, bvec->bv_len, bvec->bv_offset); crypto_hash_update(&desc, &sg, sg.length); } crypto_hash_final(&desc, digest); } /* MAYBE merge common code with w_e_end_ov_req */ static int w_e_send_csum(struct drbd_work *w, int cancel) { struct drbd_peer_request *peer_req = container_of(w, struct drbd_peer_request, w); struct drbd_peer_device *peer_device = peer_req->peer_device; struct drbd_device *device = peer_device->device; int digest_size; void *digest; int err = 0; if (unlikely(cancel)) goto out; if (unlikely((peer_req->flags & EE_WAS_ERROR) != 0)) goto out; digest_size = crypto_hash_digestsize(peer_device->connection->csums_tfm); digest = kmalloc(digest_size, GFP_NOIO); if (digest) { sector_t sector = peer_req->i.sector; unsigned int size = peer_req->i.size; drbd_csum_ee(peer_device->connection->csums_tfm, peer_req, digest); /* Free peer_req and pages before send. * In case we block on congestion, we could otherwise run into * some distributed deadlock, if the other side blocks on * congestion as well, because our receiver blocks in * drbd_alloc_pages due to pp_in_use > max_buffers. 
*/ drbd_free_peer_req(device, peer_req); peer_req = NULL; inc_rs_pending(device); err = drbd_send_drequest_csum(peer_device, sector, size, digest, digest_size, P_CSUM_RS_REQUEST); kfree(digest); } else { drbd_err(device, "kmalloc() of digest failed.\n"); err = -ENOMEM; } out: if (peer_req) drbd_free_peer_req(device, peer_req); if (unlikely(err)) drbd_err(device, "drbd_send_drequest(..., csum) failed\n"); return err; } #define GFP_TRY (__GFP_HIGHMEM | __GFP_NOWARN) static int read_for_csum(struct drbd_peer_device *peer_device, sector_t sector, int size) { struct drbd_device *device = peer_device->device; struct drbd_peer_request *peer_req; if (!get_ldev(device)) return -EIO; if (drbd_rs_should_slow_down(device, sector)) goto defer; /* GFP_TRY, because if there is no memory available right now, this may * be rescheduled for later. It is "only" background resync, after all. */ peer_req = drbd_alloc_peer_req(peer_device, ID_SYNCER /* unused */, sector, size, true /* has real payload */, GFP_TRY); if (!peer_req) goto defer; peer_req->w.cb = w_e_send_csum; spin_lock_irq(&device->resource->req_lock); list_add(&peer_req->w.list, &device->read_ee); spin_unlock_irq(&device->resource->req_lock); atomic_add(size >> 9, &device->rs_sect_ev); if (drbd_submit_peer_request(device, peer_req, READ, DRBD_FAULT_RS_RD) == 0) return 0; /* If it failed because of ENOMEM, retry should help. If it failed * because bio_add_page failed (probably broken lower level driver), * retry may or may not help. * If it does not, you may need to force disconnect. 
*/ spin_lock_irq(&device->resource->req_lock); list_del(&peer_req->w.list); spin_unlock_irq(&device->resource->req_lock); drbd_free_peer_req(device, peer_req); defer: put_ldev(device); return -EAGAIN; } int w_resync_timer(struct drbd_work *w, int cancel) { struct drbd_device *device = container_of(w, struct drbd_device, resync_work); switch (device->state.conn) { case C_VERIFY_S: make_ov_request(device, cancel); break; case C_SYNC_TARGET: make_resync_request(device, cancel); break; } return 0; } void resync_timer_fn(unsigned long data) { struct drbd_device *device = (struct drbd_device *) data; if (list_empty(&device->resync_work.list)) drbd_queue_work(&first_peer_device(device)->connection->sender_work, &device->resync_work); } static void fifo_set(struct fifo_buffer *fb, int value) { int i; for (i = 0; i < fb->size; i++) fb->values[i] = value; } static int fifo_push(struct fifo_buffer *fb, int value) { int ov; ov = fb->values[fb->head_index]; fb->values[fb->head_index++] = value; if (fb->head_index >= fb->size) fb->head_index = 0; return ov; } static void fifo_add_val(struct fifo_buffer *fb, int value) { int i; for (i = 0; i < fb->size; i++) fb->values[i] += value; } struct fifo_buffer *fifo_alloc(int fifo_size) { struct fifo_buffer *fb; fb = kzalloc(sizeof(struct fifo_buffer) + sizeof(int) * fifo_size, GFP_NOIO); if (!fb) return NULL; fb->head_index = 0; fb->size = fifo_size; fb->total = 0; return fb; } static int drbd_rs_controller(struct drbd_device *device, unsigned int sect_in) { struct disk_conf *dc; unsigned int want; /* The number of sectors we want in the proxy */ int req_sect; /* Number of sectors to request in this turn */ int correction; /* Number of sectors more we need in the proxy*/ int cps; /* correction per invocation of drbd_rs_controller() */ int steps; /* Number of time steps to plan ahead */ int curr_corr; int max_sect; struct fifo_buffer *plan; dc = rcu_dereference(device->ldev->disk_conf); plan = rcu_dereference(device->rs_plan_s); steps = 
plan->size; /* (dc->c_plan_ahead * 10 * SLEEP_TIME) / HZ; */ if (device->rs_in_flight + sect_in == 0) { /* At start of resync */ want = ((dc->resync_rate * 2 * SLEEP_TIME) / HZ) * steps; } else { /* normal path */ want = dc->c_fill_target ? dc->c_fill_target : sect_in * dc->c_delay_target * HZ / (SLEEP_TIME * 10); } correction = want - device->rs_in_flight - plan->total; /* Plan ahead */ cps = correction / steps; fifo_add_val(plan, cps); plan->total += cps * steps; /* What we do in this step */ curr_corr = fifo_push(plan, 0); plan->total -= curr_corr; req_sect = sect_in + curr_corr; if (req_sect < 0) req_sect = 0; max_sect = (dc->c_max_rate * 2 * SLEEP_TIME) / HZ; if (req_sect > max_sect) req_sect = max_sect; /* drbd_warn(device, "si=%u if=%d wa=%u co=%d st=%d cps=%d pl=%d cc=%d rs=%d\n", sect_in, device->rs_in_flight, want, correction, steps, cps, device->rs_planed, curr_corr, req_sect); */ return req_sect; } static int drbd_rs_number_requests(struct drbd_device *device) { unsigned int sect_in; /* Number of sectors that came in since the last turn */ int number, mxb; sect_in = atomic_xchg(&device->rs_sect_in, 0); device->rs_in_flight -= sect_in; rcu_read_lock(); mxb = drbd_get_max_buffers(device) / 2; if (rcu_dereference(device->rs_plan_s)->size) { number = drbd_rs_controller(device, sect_in) >> (BM_BLOCK_SHIFT - 9); device->c_sync_rate = number * HZ * (BM_BLOCK_SIZE / 1024) / SLEEP_TIME; } else { device->c_sync_rate = rcu_dereference(device->ldev->disk_conf)->resync_rate; number = SLEEP_TIME * device->c_sync_rate / ((BM_BLOCK_SIZE / 1024) * HZ); } rcu_read_unlock(); /* Don't have more than "max-buffers"/2 in-flight. * Otherwise we may cause the remote site to stall on drbd_alloc_pages(), * potentially causing a distributed deadlock on congestion during * online-verify or (checksum-based) resync, if max-buffers, * socket buffer sizes and resync rate settings are mis-configured. 
*/ if (mxb - device->rs_in_flight < number) number = mxb - device->rs_in_flight; return number; } static int make_resync_request(struct drbd_device *device, int cancel) { unsigned long bit; sector_t sector; const sector_t capacity = drbd_get_capacity(device->this_bdev); int max_bio_size; int number, rollback_i, size; int align, queued, sndbuf; int i = 0; #ifdef PARANOIA BUG_ON(w != &device->resync_work); #endif if (unlikely(cancel)) return 0; if (device->rs_total == 0) { /* empty resync? */ drbd_resync_finished(device); return 0; } if (!get_ldev(device)) { /* Since we only need to access device->rsync a get_ldev_if_state(device,D_FAILED) would be sufficient, but to continue resync with a broken disk makes no sense at all */ drbd_err(device, "Disk broke down during resync!\n"); return 0; } max_bio_size = queue_max_hw_sectors(device->rq_queue) << 9; number = drbd_rs_number_requests(device); if (number <= 0) goto requeue; for (i = 0; i < number; i++) { /* Stop generating RS requests, when half of the send buffer is filled */ mutex_lock(&first_peer_device(device)->connection->data.mutex); if (first_peer_device(device)->connection->data.socket) { queued = first_peer_device(device)->connection->data.socket->sk->sk_wmem_queued; sndbuf = first_peer_device(device)->connection->data.socket->sk->sk_sndbuf; } else { queued = 1; sndbuf = 0; } mutex_unlock(&first_peer_device(device)->connection->data.mutex); if (queued > sndbuf / 2) goto requeue; next_sector: size = BM_BLOCK_SIZE; bit = drbd_bm_find_next(device, device->bm_resync_fo); if (bit == DRBD_END_OF_BITMAP) { device->bm_resync_fo = drbd_bm_bits(device); put_ldev(device); return 0; } sector = BM_BIT_TO_SECT(bit); if (drbd_rs_should_slow_down(device, sector) || drbd_try_rs_begin_io(device, sector)) { device->bm_resync_fo = bit; goto requeue; } device->bm_resync_fo = bit + 1; if (unlikely(drbd_bm_test_bit(device, bit) == 0)) { drbd_rs_complete_io(device, sector); goto next_sector; } #if DRBD_MAX_BIO_SIZE > BM_BLOCK_SIZE /* 
try to find some adjacent bits. * we stop if we have already the maximum req size. * * Additionally always align bigger requests, in order to * be prepared for all stripe sizes of software RAIDs. */ align = 1; rollback_i = i; while (i < number) { if (size + BM_BLOCK_SIZE > max_bio_size) break; /* Be always aligned */ if (sector & ((1<<(align+3))-1)) break; /* do not cross extent boundaries */ if (((bit+1) & BM_BLOCKS_PER_BM_EXT_MASK) == 0) break; /* now, is it actually dirty, after all? * caution, drbd_bm_test_bit is tri-state for some * obscure reason; ( b == 0 ) would get the out-of-band * only accidentally right because of the "oddly sized" * adjustment below */ if (drbd_bm_test_bit(device, bit+1) != 1) break; bit++; size += BM_BLOCK_SIZE; if ((BM_BLOCK_SIZE << align) <= size) align++; i++; } /* if we merged some, * reset the offset to start the next drbd_bm_find_next from */ if (size > BM_BLOCK_SIZE) device->bm_resync_fo = bit + 1; #endif /* adjust very last sectors, in case we are oddly sized */ if (sector + (size>>9) > capacity) size = (capacity-sector)<<9; if (first_peer_device(device)->connection->agreed_pro_version >= 89 && first_peer_device(device)->connection->csums_tfm) { switch (read_for_csum(first_peer_device(device), sector, size)) { case -EIO: /* Disk failure */ put_ldev(device); return -EIO; case -EAGAIN: /* allocation failed, or ldev busy */ drbd_rs_complete_io(device, sector); device->bm_resync_fo = BM_SECT_TO_BIT(sector); i = rollback_i; goto requeue; case 0: /* everything ok */ break; default: BUG(); } } else { int err; inc_rs_pending(device); err = drbd_send_drequest(first_peer_device(device), P_RS_DATA_REQUEST, sector, size, ID_SYNCER); if (err) { drbd_err(device, "drbd_send_drequest() failed, aborting...\n"); dec_rs_pending(device); put_ldev(device); return err; } } } if (device->bm_resync_fo >= drbd_bm_bits(device)) { /* last syncer _request_ was sent, * but the P_RS_DATA_REPLY not yet received. 
sync will end (and * next sync group will resume), as soon as we receive the last * resync data block, and the last bit is cleared. * until then resync "work" is "inactive" ... */ put_ldev(device); return 0; } requeue: device->rs_in_flight += (i << (BM_BLOCK_SHIFT - 9)); mod_timer(&device->resync_timer, jiffies + SLEEP_TIME); put_ldev(device); return 0; } static int make_ov_request(struct drbd_device *device, int cancel) { int number, i, size; sector_t sector; const sector_t capacity = drbd_get_capacity(device->this_bdev); bool stop_sector_reached = false; if (unlikely(cancel)) return 1; number = drbd_rs_number_requests(device); sector = device->ov_position; for (i = 0; i < number; i++) { if (sector >= capacity) return 1; /* We check for "finished" only in the reply path: * w_e_end_ov_reply(). * We need to send at least one request out. */ stop_sector_reached = i > 0 && verify_can_do_stop_sector(device) && sector >= device->ov_stop_sector; if (stop_sector_reached) break; size = BM_BLOCK_SIZE; if (drbd_rs_should_slow_down(device, sector) || drbd_try_rs_begin_io(device, sector)) { device->ov_position = sector; goto requeue; } if (sector + (size>>9) > capacity) size = (capacity-sector)<<9; inc_rs_pending(device); if (drbd_send_ov_request(first_peer_device(device), sector, size)) { dec_rs_pending(device); return 0; } sector += BM_SECT_PER_BIT; } device->ov_position = sector; requeue: device->rs_in_flight += (i << (BM_BLOCK_SHIFT - 9)); if (i == 0 || !stop_sector_reached) mod_timer(&device->resync_timer, jiffies + SLEEP_TIME); return 1; } int w_ov_finished(struct drbd_work *w, int cancel) { struct drbd_device_work *dw = container_of(w, struct drbd_device_work, w); struct drbd_device *device = dw->device; kfree(dw); ov_out_of_sync_print(device); drbd_resync_finished(device); return 0; } static int w_resync_finished(struct drbd_work *w, int cancel) { struct drbd_device_work *dw = container_of(w, struct drbd_device_work, w); struct drbd_device *device = dw->device; 
kfree(dw); drbd_resync_finished(device); return 0; } static void ping_peer(struct drbd_device *device) { struct drbd_connection *connection = first_peer_device(device)->connection; clear_bit(GOT_PING_ACK, &connection->flags); request_ping(connection); wait_event(connection->ping_wait, test_bit(GOT_PING_ACK, &connection->flags) || device->state.conn < C_CONNECTED); } int drbd_resync_finished(struct drbd_device *device) { unsigned long db, dt, dbdt; unsigned long n_oos; union drbd_state os, ns; struct drbd_device_work *dw; char *khelper_cmd = NULL; int verify_done = 0; /* Remove all elements from the resync LRU. Since future actions * might set bits in the (main) bitmap, then the entries in the * resync LRU would be wrong. */ if (drbd_rs_del_all(device)) { /* In case this is not possible now, most probably because * there are P_RS_DATA_REPLY Packets lingering on the worker's * queue (or even the read operations for those packets * is not finished by now). Retry in 100ms. */ drbd_kick_lo(device); schedule_timeout_interruptible(HZ / 10); dw = kmalloc(sizeof(struct drbd_device_work), GFP_ATOMIC); if (dw) { dw->w.cb = w_resync_finished; dw->device = device; drbd_queue_work(&first_peer_device(device)->connection->sender_work, &dw->w); return 1; } drbd_err(device, "Warn failed to drbd_rs_del_all() and to kmalloc(dw).\n"); } dt = (jiffies - device->rs_start - device->rs_paused) / HZ; if (dt <= 0) dt = 1; db = device->rs_total; /* adjust for verify start and stop sectors, respective reached position */ if (device->state.conn == C_VERIFY_S || device->state.conn == C_VERIFY_T) db -= device->ov_left; dbdt = Bit2KB(db/dt); device->rs_paused /= HZ; if (!get_ldev(device)) goto out; ping_peer(device); spin_lock_irq(&device->resource->req_lock); os = drbd_read_state(device); verify_done = (os.conn == C_VERIFY_S || os.conn == C_VERIFY_T); /* This protects us against multiple calls (that can happen in the presence of application IO), and against connectivity loss just before we arrive 
here. */ if (os.conn <= C_CONNECTED) goto out_unlock; ns = os; ns.conn = C_CONNECTED; drbd_info(device, "%s done (total %lu sec; paused %lu sec; %lu K/sec)\n", verify_done ? "Online verify" : "Resync", dt + device->rs_paused, device->rs_paused, dbdt); n_oos = drbd_bm_total_weight(device); if (os.conn == C_VERIFY_S || os.conn == C_VERIFY_T) { if (n_oos) { drbd_alert(device, "Online verify found %lu %dk block out of sync!\n", n_oos, Bit2KB(1)); khelper_cmd = "out-of-sync"; } } else { D_ASSERT(device, (n_oos - device->rs_failed) == 0); if (os.conn == C_SYNC_TARGET || os.conn == C_PAUSED_SYNC_T) khelper_cmd = "after-resync-target"; if (first_peer_device(device)->connection->csums_tfm && device->rs_total) { const unsigned long s = device->rs_same_csum; const unsigned long t = device->rs_total; const int ratio = (t == 0) ? 0 : (t < 100000) ? ((s*100)/t) : (s/(t/100)); drbd_info(device, "%u %% had equal checksums, eliminated: %luK; " "transferred %luK total %luK\n", ratio, Bit2KB(device->rs_same_csum), Bit2KB(device->rs_total - device->rs_same_csum), Bit2KB(device->rs_total)); } } if (device->rs_failed) { drbd_info(device, " %lu failed blocks\n", device->rs_failed); if (os.conn == C_SYNC_TARGET || os.conn == C_PAUSED_SYNC_T) { ns.disk = D_INCONSISTENT; ns.pdsk = D_UP_TO_DATE; } else { ns.disk = D_UP_TO_DATE; ns.pdsk = D_INCONSISTENT; } } else { ns.disk = D_UP_TO_DATE; ns.pdsk = D_UP_TO_DATE; if (os.conn == C_SYNC_TARGET || os.conn == C_PAUSED_SYNC_T) { if (device->p_uuid) { int i; for (i = UI_BITMAP ; i <= UI_HISTORY_END ; i++) _drbd_uuid_set(device, i, device->p_uuid[i]); drbd_uuid_set(device, UI_BITMAP, device->ldev->md.uuid[UI_CURRENT]); _drbd_uuid_set(device, UI_CURRENT, device->p_uuid[UI_CURRENT]); } else { drbd_err(device, "device->p_uuid is NULL! BUG\n"); } } if (!(os.conn == C_VERIFY_S || os.conn == C_VERIFY_T)) { /* for verify runs, we don't update uuids here, * so there would be nothing to report. 
*/ drbd_uuid_set_bm(device, 0UL); drbd_print_uuids(device, "updated UUIDs"); if (device->p_uuid) { /* Now the two UUID sets are equal, update what we * know of the peer. */ int i; for (i = UI_CURRENT ; i <= UI_HISTORY_END ; i++) device->p_uuid[i] = device->ldev->md.uuid[i]; } } } _drbd_set_state(device, ns, CS_VERBOSE, NULL); out_unlock: spin_unlock_irq(&device->resource->req_lock); put_ldev(device); out: device->rs_total = 0; device->rs_failed = 0; device->rs_paused = 0; /* reset start sector, if we reached end of device */ if (verify_done && device->ov_left == 0) device->ov_start_sector = 0; drbd_md_sync(device); if (khelper_cmd) drbd_khelper(device, khelper_cmd); return 1; } /* helper */ static void move_to_net_ee_or_free(struct drbd_device *device, struct drbd_peer_request *peer_req) { if (drbd_peer_req_has_active_page(peer_req)) { /* This might happen if sendpage() has not finished */ int i = (peer_req->i.size + PAGE_SIZE -1) >> PAGE_SHIFT; atomic_add(i, &device->pp_in_use_by_net); atomic_sub(i, &device->pp_in_use); spin_lock_irq(&device->resource->req_lock); list_add_tail(&peer_req->w.list, &device->net_ee); spin_unlock_irq(&device->resource->req_lock); wake_up(&drbd_pp_wait); } else drbd_free_peer_req(device, peer_req); } /** * w_e_end_data_req() - Worker callback, to send a P_DATA_REPLY packet in response to a P_DATA_REQUEST * @device: DRBD device. * @w: work object. * @cancel: The connection will be closed anyways */ int w_e_end_data_req(struct drbd_work *w, int cancel) { struct drbd_peer_request *peer_req = container_of(w, struct drbd_peer_request, w); struct drbd_peer_device *peer_device = peer_req->peer_device; struct drbd_device *device = peer_device->device; int err; if (unlikely(cancel)) { drbd_free_peer_req(device, peer_req); dec_unacked(device); return 0; } if (likely((peer_req->flags & EE_WAS_ERROR) == 0)) { err = drbd_send_block(peer_device, P_DATA_REPLY, peer_req); } else { if (DRBD_ratelimit(5*HZ, 5)) drbd_err(device, "Sending NegDReply. 
sector=%llus.\n", (unsigned long long)peer_req->i.sector); err = drbd_send_ack(peer_device, P_NEG_DREPLY, peer_req); } dec_unacked(device); move_to_net_ee_or_free(device, peer_req); if (unlikely(err)) drbd_err(device, "drbd_send_block() failed\n"); return err; } /** * w_e_end_rsdata_req() - Worker callback to send a P_RS_DATA_REPLY packet in response to a P_RS_DATA_REQUEST * @w: work object. * @cancel: The connection will be closed anyways */ int w_e_end_rsdata_req(struct drbd_work *w, int cancel) { struct drbd_peer_request *peer_req = container_of(w, struct drbd_peer_request, w); struct drbd_peer_device *peer_device = peer_req->peer_device; struct drbd_device *device = peer_device->device; int err; if (unlikely(cancel)) { drbd_free_peer_req(device, peer_req); dec_unacked(device); return 0; } if (get_ldev_if_state(device, D_FAILED)) { drbd_rs_complete_io(device, peer_req->i.sector); put_ldev(device); } if (device->state.conn == C_AHEAD) { err = drbd_send_ack(peer_device, P_RS_CANCEL, peer_req); } else if (likely((peer_req->flags & EE_WAS_ERROR) == 0)) { if (likely(device->state.pdsk >= D_INCONSISTENT)) { inc_rs_pending(device); err = drbd_send_block(peer_device, P_RS_DATA_REPLY, peer_req); } else { if (DRBD_ratelimit(5*HZ, 5)) drbd_err(device, "Not sending RSDataReply, " "partner DISKLESS!\n"); err = 0; } } else { if (DRBD_ratelimit(5*HZ, 5)) drbd_err(device, "Sending NegRSDReply. 
sector %llus.\n", (unsigned long long)peer_req->i.sector); err = drbd_send_ack(peer_device, P_NEG_RS_DREPLY, peer_req); /* update resync data with failure */ drbd_rs_failed_io(device, peer_req->i.sector, peer_req->i.size); } dec_unacked(device); move_to_net_ee_or_free(device, peer_req); if (unlikely(err)) drbd_err(device, "drbd_send_block() failed\n"); return err; } int w_e_end_csum_rs_req(struct drbd_work *w, int cancel) { struct drbd_peer_request *peer_req = container_of(w, struct drbd_peer_request, w); struct drbd_peer_device *peer_device = peer_req->peer_device; struct drbd_device *device = peer_device->device; struct digest_info *di; int digest_size; void *digest = NULL; int err, eq = 0; if (unlikely(cancel)) { drbd_free_peer_req(device, peer_req); dec_unacked(device); return 0; } if (get_ldev(device)) { drbd_rs_complete_io(device, peer_req->i.sector); put_ldev(device); } di = peer_req->digest; if (likely((peer_req->flags & EE_WAS_ERROR) == 0)) { /* quick hack to try to avoid a race against reconfiguration. * a real fix would be much more involved, * introducing more locking mechanisms */ if (peer_device->connection->csums_tfm) { digest_size = crypto_hash_digestsize(peer_device->connection->csums_tfm); D_ASSERT(device, digest_size == di->digest_size); digest = kmalloc(digest_size, GFP_NOIO); } if (digest) { drbd_csum_ee(peer_device->connection->csums_tfm, peer_req, digest); eq = !memcmp(digest, di->digest, digest_size); kfree(digest); } if (eq) { drbd_set_in_sync(device, peer_req->i.sector, peer_req->i.size); /* rs_same_csums unit is BM_BLOCK_SIZE */ device->rs_same_csum += peer_req->i.size >> BM_BLOCK_SHIFT; err = drbd_send_ack(peer_device, P_RS_IS_IN_SYNC, peer_req); } else { inc_rs_pending(device); peer_req->block_id = ID_SYNCER; /* By setting block_id, digest pointer becomes invalid! 
*/ peer_req->flags &= ~EE_HAS_DIGEST; /* This peer request no longer has a digest pointer */ kfree(di); err = drbd_send_block(peer_device, P_RS_DATA_REPLY, peer_req); } } else { err = drbd_send_ack(peer_device, P_NEG_RS_DREPLY, peer_req); if (DRBD_ratelimit(5*HZ, 5)) drbd_err(device, "Sending NegDReply. I guess it gets messy.\n"); } dec_unacked(device); move_to_net_ee_or_free(device, peer_req); if (unlikely(err)) drbd_err(device, "drbd_send_block/ack() failed\n"); return err; } int w_e_end_ov_req(struct drbd_work *w, int cancel) { struct drbd_peer_request *peer_req = container_of(w, struct drbd_peer_request, w); struct drbd_peer_device *peer_device = peer_req->peer_device; struct drbd_device *device = peer_device->device; sector_t sector = peer_req->i.sector; unsigned int size = peer_req->i.size; int digest_size; void *digest; int err = 0; if (unlikely(cancel)) goto out; digest_size = crypto_hash_digestsize(peer_device->connection->verify_tfm); /* FIXME if this allocation fails, online verify will not terminate! */ digest = kmalloc(digest_size, GFP_NOIO); if (!digest) { err = -ENOMEM; goto out; } if (!(peer_req->flags & EE_WAS_ERROR)) drbd_csum_ee(peer_device->connection->verify_tfm, peer_req, digest); else memset(digest, 0, digest_size); /* Free peer_req and pages before send. * In case we block on congestion, we could otherwise run into * some distributed deadlock, if the other side blocks on * congestion as well, because our receiver blocks in * drbd_alloc_pages due to pp_in_use > max_buffers. 
*/ drbd_free_peer_req(device, peer_req); peer_req = NULL; inc_rs_pending(device); err = drbd_send_drequest_csum(peer_device, sector, size, digest, digest_size, P_OV_REPLY); if (err) dec_rs_pending(device); kfree(digest); out: if (peer_req) drbd_free_peer_req(device, peer_req); dec_unacked(device); return err; } void drbd_ov_out_of_sync_found(struct drbd_device *device, sector_t sector, int size) { if (device->ov_last_oos_start + device->ov_last_oos_size == sector) { device->ov_last_oos_size += size>>9; } else { device->ov_last_oos_start = sector; device->ov_last_oos_size = size>>9; } drbd_set_out_of_sync(device, sector, size); } int w_e_end_ov_reply(struct drbd_work *w, int cancel) { struct drbd_peer_request *peer_req = container_of(w, struct drbd_peer_request, w); struct drbd_peer_device *peer_device = peer_req->peer_device; struct drbd_device *device = peer_device->device; struct digest_info *di; void *digest; sector_t sector = peer_req->i.sector; unsigned int size = peer_req->i.size; int digest_size; int err, eq = 0; bool stop_sector_reached = false; if (unlikely(cancel)) { drbd_free_peer_req(device, peer_req); dec_unacked(device); return 0; } /* after "cancel", because after drbd_disconnect/drbd_rs_cancel_all * the resync lru has been cleaned up already */ if (get_ldev(device)) { drbd_rs_complete_io(device, peer_req->i.sector); put_ldev(device); } di = peer_req->digest; if (likely((peer_req->flags & EE_WAS_ERROR) == 0)) { digest_size = crypto_hash_digestsize(peer_device->connection->verify_tfm); digest = kmalloc(digest_size, GFP_NOIO); if (digest) { drbd_csum_ee(peer_device->connection->verify_tfm, peer_req, digest); D_ASSERT(device, digest_size == di->digest_size); eq = !memcmp(digest, di->digest, digest_size); kfree(digest); } } /* Free peer_req and pages before send. 
* In case we block on congestion, we could otherwise run into * some distributed deadlock, if the other side blocks on * congestion as well, because our receiver blocks in * drbd_alloc_pages due to pp_in_use > max_buffers. */ drbd_free_peer_req(device, peer_req); if (!eq) drbd_ov_out_of_sync_found(device, sector, size); else ov_out_of_sync_print(device); err = drbd_send_ack_ex(peer_device, P_OV_RESULT, sector, size, eq ? ID_IN_SYNC : ID_OUT_OF_SYNC); dec_unacked(device); --device->ov_left; /* let's advance progress step marks only for every other megabyte */ if ((device->ov_left & 0x200) == 0x200) drbd_advance_rs_marks(device, device->ov_left); stop_sector_reached = verify_can_do_stop_sector(device) && (sector + (size>>9)) >= device->ov_stop_sector; if (device->ov_left == 0 || stop_sector_reached) { ov_out_of_sync_print(device); drbd_resync_finished(device); } return err; } /* FIXME * We need to track the number of pending barrier acks, * and to be able to wait for them. * See also comment in drbd_adm_attach before drbd_suspend_io. 
 */
int drbd_send_barrier(struct drbd_connection *connection)
{
	struct p_barrier *p;
	struct drbd_socket *sock;

	sock = &connection->data;
	p = conn_prepare_command(connection, sock);
	if (!p)
		return -EIO;
	p->barrier = connection->send.current_epoch_nr;
	p->pad = 0;
	connection->send.current_epoch_writes = 0;

	return conn_send_command(connection, sock, P_BARRIER, sizeof(*p), NULL, 0);
}

int w_send_write_hint(struct drbd_work *w, int cancel)
{
	struct drbd_device *device =
		container_of(w, struct drbd_device, unplug_work);
	struct drbd_socket *sock;

	if (cancel)
		return 0;
	sock = &first_peer_device(device)->connection->data;
	if (!drbd_prepare_command(first_peer_device(device), sock))
		return -EIO;
	return drbd_send_command(first_peer_device(device), sock, P_UNPLUG_REMOTE, 0, NULL, 0);
}

static void re_init_if_first_write(struct drbd_connection *connection, unsigned int epoch)
{
	if (!connection->send.seen_any_write_yet) {
		connection->send.seen_any_write_yet = true;
		connection->send.current_epoch_nr = epoch;
		connection->send.current_epoch_writes = 0;
	}
}

static void maybe_send_barrier(struct drbd_connection *connection, unsigned int epoch)
{
	/* no epoch to close if we have not seen any write on this connection yet */
	if (!connection->send.seen_any_write_yet)
		return;
	if (connection->send.current_epoch_nr != epoch) {
		if (connection->send.current_epoch_writes)
			drbd_send_barrier(connection);
		connection->send.current_epoch_nr = epoch;
	}
}

int w_send_out_of_sync(struct drbd_work *w, int cancel)
{
	struct drbd_request *req = container_of(w, struct drbd_request, w);
	struct drbd_device *device = req->device;
	struct drbd_connection *connection = first_peer_device(device)->connection;
	int err;

	if (unlikely(cancel)) {
		req_mod(req, SEND_CANCELED);
		return 0;
	}

	/* this time, no connection->send.current_epoch_writes++;
	 * If it was sent, it was the closing barrier for the last
	 * replicated epoch, before we went into AHEAD mode.
	 * No more barriers will be sent, until we leave AHEAD mode again.
*/ maybe_send_barrier(connection, req->epoch); err = drbd_send_out_of_sync(first_peer_device(device), req); req_mod(req, OOS_HANDED_TO_NETWORK); return err; } /** * w_send_dblock() - Worker callback to send a P_DATA packet in order to mirror a write request * @w: work object. * @cancel: The connection will be closed anyways */ int w_send_dblock(struct drbd_work *w, int cancel) { struct drbd_request *req = container_of(w, struct drbd_request, w); struct drbd_device *device = req->device; struct drbd_connection *connection = first_peer_device(device)->connection; int err; if (unlikely(cancel)) { req_mod(req, SEND_CANCELED); return 0; } re_init_if_first_write(connection, req->epoch); maybe_send_barrier(connection, req->epoch); connection->send.current_epoch_writes++; err = drbd_send_dblock(first_peer_device(device), req); req_mod(req, err ? SEND_FAILED : HANDED_OVER_TO_NETWORK); return err; } /** * w_send_read_req() - Worker callback to send a read request (P_DATA_REQUEST) packet * @w: work object. * @cancel: The connection will be closed anyways */ int w_send_read_req(struct drbd_work *w, int cancel) { struct drbd_request *req = container_of(w, struct drbd_request, w); struct drbd_device *device = req->device; struct drbd_connection *connection = first_peer_device(device)->connection; int err; if (unlikely(cancel)) { req_mod(req, SEND_CANCELED); return 0; } /* Even read requests may close a write epoch, * if there was any yet. */ maybe_send_barrier(connection, req->epoch); err = drbd_send_drequest(first_peer_device(device), P_DATA_REQUEST, req->i.sector, req->i.size, (unsigned long)req); req_mod(req, err ? 
SEND_FAILED : HANDED_OVER_TO_NETWORK); return err; } int w_restart_disk_io(struct drbd_work *w, int cancel) { struct drbd_request *req = container_of(w, struct drbd_request, w); struct drbd_device *device = req->device; if (bio_data_dir(req->master_bio) == WRITE && req->rq_state & RQ_IN_ACT_LOG) drbd_al_begin_io(device, &req->i, false); drbd_req_make_private_bio(req, req->master_bio); req->private_bio->bi_bdev = device->ldev->backing_bdev; generic_make_request(req->private_bio); return 0; } static int _drbd_may_sync_now(struct drbd_device *device) { struct drbd_device *odev = device; int resync_after; while (1) { if (!odev->ldev || odev->state.disk == D_DISKLESS) return 1; rcu_read_lock(); resync_after = rcu_dereference(odev->ldev->disk_conf)->resync_after; rcu_read_unlock(); if (resync_after == -1) return 1; odev = minor_to_device(resync_after); if (!odev) return 1; if ((odev->state.conn >= C_SYNC_SOURCE && odev->state.conn <= C_PAUSED_SYNC_T) || odev->state.aftr_isp || odev->state.peer_isp || odev->state.user_isp) return 0; } } /** * _drbd_pause_after() - Pause resync on all devices that may not resync now * @device: DRBD device. * * Called from process context only (admin command and after_state_ch). */ static int _drbd_pause_after(struct drbd_device *device) { struct drbd_device *odev; int i, rv = 0; rcu_read_lock(); idr_for_each_entry(&drbd_devices, odev, i) { if (odev->state.conn == C_STANDALONE && odev->state.disk == D_DISKLESS) continue; if (!_drbd_may_sync_now(odev)) rv |= (__drbd_set_state(_NS(odev, aftr_isp, 1), CS_HARD, NULL) != SS_NOTHING_TO_DO); } rcu_read_unlock(); return rv; } /** * _drbd_resume_next() - Resume resync on all devices that may resync now * @device: DRBD device. * * Called from process context only (admin command and worker). 
*/ static int _drbd_resume_next(struct drbd_device *device) { struct drbd_device *odev; int i, rv = 0; rcu_read_lock(); idr_for_each_entry(&drbd_devices, odev, i) { if (odev->state.conn == C_STANDALONE && odev->state.disk == D_DISKLESS) continue; if (odev->state.aftr_isp) { if (_drbd_may_sync_now(odev)) rv |= (__drbd_set_state(_NS(odev, aftr_isp, 0), CS_HARD, NULL) != SS_NOTHING_TO_DO) ; } } rcu_read_unlock(); return rv; } void resume_next_sg(struct drbd_device *device) { write_lock_irq(&global_state_lock); _drbd_resume_next(device); write_unlock_irq(&global_state_lock); } void suspend_other_sg(struct drbd_device *device) { write_lock_irq(&global_state_lock); _drbd_pause_after(device); write_unlock_irq(&global_state_lock); } /* caller must hold global_state_lock */ enum drbd_ret_code drbd_resync_after_valid(struct drbd_device *device, int o_minor) { struct drbd_device *odev; int resync_after; if (o_minor == -1) return NO_ERROR; if (o_minor < -1 || o_minor > MINORMASK) return ERR_RESYNC_AFTER; /* check for loops */ odev = minor_to_device(o_minor); while (1) { if (odev == device) return ERR_RESYNC_AFTER_CYCLE; /* You are free to depend on diskless, non-existing, * or not yet/no longer existing minors. * We only reject dependency loops. * We cannot follow the dependency chain beyond a detached or * missing minor. */ if (!odev || !odev->ldev || odev->state.disk == D_DISKLESS) return NO_ERROR; rcu_read_lock(); resync_after = rcu_dereference(odev->ldev->disk_conf)->resync_after; rcu_read_unlock(); /* dependency chain ends here, no cycles. 
*/
		if (resync_after == -1)
			return NO_ERROR;

		/* follow the dependency chain */
		odev = minor_to_device(resync_after);
	}
}

/* caller must hold global_state_lock */
void drbd_resync_after_changed(struct drbd_device *device)
{
	int changes;

	do {
		changes  = _drbd_pause_after(device);
		changes |= _drbd_resume_next(device);
	} while (changes);
}

void drbd_rs_controller_reset(struct drbd_device *device)
{
	struct fifo_buffer *plan;

	atomic_set(&device->rs_sect_in, 0);
	atomic_set(&device->rs_sect_ev, 0);
	device->rs_in_flight = 0;

	/* Updating the RCU protected object in place is necessary since
	   this function gets called from atomic context.
	   It is valid since all other updates also lead to a completely
	   empty fifo */
	rcu_read_lock();
	plan = rcu_dereference(device->rs_plan_s);
	plan->total = 0;
	fifo_set(plan, 0);
	rcu_read_unlock();
}

void start_resync_timer_fn(unsigned long data)
{
	struct drbd_device *device = (struct drbd_device *) data;

	drbd_queue_work(&first_peer_device(device)->connection->sender_work,
			&device->start_resync_work);
}

int w_start_resync(struct drbd_work *w, int cancel)
{
	struct drbd_device *device =
		container_of(w, struct drbd_device, start_resync_work);

	if (atomic_read(&device->unacked_cnt) || atomic_read(&device->rs_pending_cnt)) {
		drbd_warn(device, "w_start_resync later...\n");
		device->start_resync_timer.expires = jiffies + HZ/10;
		add_timer(&device->start_resync_timer);
		return 0;
	}

	drbd_start_resync(device, C_SYNC_SOURCE);
	clear_bit(AHEAD_TO_SYNC_SOURCE, &device->flags);
	return 0;
}

/**
 * drbd_start_resync() - Start the resync process
 * @device:	DRBD device.
 * @side:	Either C_SYNC_SOURCE or C_SYNC_TARGET
 *
 * This function might bring you directly into one of the
 * C_PAUSED_SYNC_* states.
 */
void drbd_start_resync(struct drbd_device *device, enum drbd_conns side)
{
	union drbd_state ns;
	int r;

	if (device->state.conn >= C_SYNC_SOURCE && device->state.conn < C_AHEAD) {
		drbd_err(device, "Resync already running!\n");
		return;
	}

	if (!test_bit(B_RS_H_DONE, &device->flags)) {
		if (side == C_SYNC_TARGET) {
			/* Since application IO was locked out during C_WF_BITMAP_T and
			   C_WF_SYNC_UUID we are still unmodified. Before going to C_SYNC_TARGET
			   we ask the handler whether we may make the data inconsistent. */
			r = drbd_khelper(device, "before-resync-target");
			r = (r >> 8) & 0xff;
			if (r > 0) {
				drbd_info(device, "before-resync-target handler returned %d, "
					 "dropping connection.\n", r);
				conn_request_state(first_peer_device(device)->connection,
						   NS(conn, C_DISCONNECTING), CS_HARD);
				return;
			}
		} else /* C_SYNC_SOURCE */ {
			r = drbd_khelper(device, "before-resync-source");
			r = (r >> 8) & 0xff;
			if (r > 0) {
				if (r == 3) {
					drbd_info(device, "before-resync-source handler returned %d, "
						 "ignoring. Old userland tools?\n", r);
				} else {
					drbd_info(device, "before-resync-source handler returned %d, "
						 "dropping connection.\n", r);
					conn_request_state(first_peer_device(device)->connection,
							   NS(conn, C_DISCONNECTING), CS_HARD);
					return;
				}
			}
		}
	}

	if (current == first_peer_device(device)->connection->worker.task) {
		/* The worker should not sleep waiting for state_mutex,
		   that can take long */
		if (!mutex_trylock(device->state_mutex)) {
			set_bit(B_RS_H_DONE, &device->flags);
			device->start_resync_timer.expires = jiffies + HZ/5;
			add_timer(&device->start_resync_timer);
			return;
		}
	} else {
		mutex_lock(device->state_mutex);
	}
	clear_bit(B_RS_H_DONE, &device->flags);

	/* req_lock: serialize with drbd_send_and_submit() and others
	 * global_state_lock: for stable sync-after dependencies */
	spin_lock_irq(&device->resource->req_lock);
	write_lock(&global_state_lock);
	/* Did some connection breakage or IO error race with us?
	 */
	if (device->state.conn < C_CONNECTED
	|| !get_ldev_if_state(device, D_NEGOTIATING)) {
		write_unlock(&global_state_lock);
		spin_unlock_irq(&device->resource->req_lock);
		mutex_unlock(device->state_mutex);
		return;
	}

	ns = drbd_read_state(device);

	ns.aftr_isp = !_drbd_may_sync_now(device);

	ns.conn = side;

	if (side == C_SYNC_TARGET)
		ns.disk = D_INCONSISTENT;
	else /* side == C_SYNC_SOURCE */
		ns.pdsk = D_INCONSISTENT;

	r = __drbd_set_state(device, ns, CS_VERBOSE, NULL);
	ns = drbd_read_state(device);

	if (ns.conn < C_CONNECTED)
		r = SS_UNKNOWN_ERROR;

	if (r == SS_SUCCESS) {
		unsigned long tw = drbd_bm_total_weight(device);
		unsigned long now = jiffies;
		int i;

		device->rs_failed    = 0;
		device->rs_paused    = 0;
		device->rs_same_csum = 0;
		device->rs_last_events = 0;
		device->rs_last_sect_ev = 0;
		device->rs_total     = tw;
		device->rs_start     = now;
		for (i = 0; i < DRBD_SYNC_MARKS; i++) {
			device->rs_mark_left[i] = tw;
			device->rs_mark_time[i] = now;
		}
		_drbd_pause_after(device);
	}
	write_unlock(&global_state_lock);
	spin_unlock_irq(&device->resource->req_lock);

	if (r == SS_SUCCESS) {
		/* reset rs_last_bcast when a resync or verify is started,
		 * to deal with potential jiffies wrap. */
		device->rs_last_bcast = jiffies - HZ;

		drbd_info(device, "Began resync as %s (will sync %lu KB [%lu bits set]).\n",
		     drbd_conn_str(ns.conn),
		     (unsigned long) device->rs_total << (BM_BLOCK_SHIFT-10),
		     (unsigned long) device->rs_total);
		if (side == C_SYNC_TARGET)
			device->bm_resync_fo = 0;

		/* Since protocol 96, we must serialize drbd_gen_and_send_sync_uuid
		 * with w_send_oos, or the sync target will get confused as to
		 * how many bits to resync.  We cannot do that always, because for an
		 * empty resync and protocol < 95, we need to do it here, as we call
		 * drbd_resync_finished from here in that case.
		 * We drbd_gen_and_send_sync_uuid here for protocol < 96,
		 * and from after_state_ch otherwise.
*/ if (side == C_SYNC_SOURCE && first_peer_device(device)->connection->agreed_pro_version < 96) drbd_gen_and_send_sync_uuid(first_peer_device(device)); if (first_peer_device(device)->connection->agreed_pro_version < 95 && device->rs_total == 0) { /* This still has a race (about when exactly the peers * detect connection loss) that can lead to a full sync * on next handshake. In 8.3.9 we fixed this with explicit * resync-finished notifications, but the fix * introduces a protocol change. Sleeping for some * time longer than the ping interval + timeout on the * SyncSource, to give the SyncTarget the chance to * detect connection loss, then waiting for a ping * response (implicit in drbd_resync_finished) reduces * the race considerably, but does not solve it. */ if (side == C_SYNC_SOURCE) { struct net_conf *nc; int timeo; rcu_read_lock(); nc = rcu_dereference(first_peer_device(device)->connection->net_conf); timeo = nc->ping_int * HZ + nc->ping_timeo * HZ / 9; rcu_read_unlock(); schedule_timeout_interruptible(timeo); } drbd_resync_finished(device); } drbd_rs_controller_reset(device); /* ns.conn may already be != device->state.conn, * we may have been paused in between, or become paused until * the timer triggers. 
* No matter, that is handled in resync_timer_fn() */ if (ns.conn == C_SYNC_TARGET) mod_timer(&device->resync_timer, jiffies); drbd_md_sync(device); } put_ldev(device); mutex_unlock(device->state_mutex); } bool dequeue_work_batch(struct drbd_work_queue *queue, struct list_head *work_list) { spin_lock_irq(&queue->q_lock); list_splice_init(&queue->q, work_list); spin_unlock_irq(&queue->q_lock); return !list_empty(work_list); } bool dequeue_work_item(struct drbd_work_queue *queue, struct list_head *work_list) { spin_lock_irq(&queue->q_lock); if (!list_empty(&queue->q)) list_move(queue->q.next, work_list); spin_unlock_irq(&queue->q_lock); return !list_empty(work_list); } void wait_for_work(struct drbd_connection *connection, struct list_head *work_list) { DEFINE_WAIT(wait); struct net_conf *nc; int uncork, cork; dequeue_work_item(&connection->sender_work, work_list); if (!list_empty(work_list)) return; /* Still nothing to do? * Maybe we still need to close the current epoch, * even if no new requests are queued yet. * * Also, poke TCP, just in case. * Then wait for new work (or signal). */ rcu_read_lock(); nc = rcu_dereference(connection->net_conf); uncork = nc ? nc->tcp_cork : 0; rcu_read_unlock(); if (uncork) { mutex_lock(&connection->data.mutex); if (connection->data.socket) drbd_tcp_uncork(connection->data.socket); mutex_unlock(&connection->data.mutex); } for (;;) { int send_barrier; prepare_to_wait(&connection->sender_work.q_wait, &wait, TASK_INTERRUPTIBLE); spin_lock_irq(&connection->resource->req_lock); spin_lock(&connection->sender_work.q_lock); /* FIXME get rid of this one? */ /* dequeue single item only, * we still use drbd_queue_work_front() in some places */ if (!list_empty(&connection->sender_work.q)) list_move(connection->sender_work.q.next, work_list); spin_unlock(&connection->sender_work.q_lock); /* FIXME get rid of this one? 
*/ if (!list_empty(work_list) || signal_pending(current)) { spin_unlock_irq(&connection->resource->req_lock); break; } /* We found nothing new to do, no to-be-communicated request, * no other work item. We may still need to close the last * epoch. Next incoming request epoch will be connection -> * current transfer log epoch number. If that is different * from the epoch of the last request we communicated, it is * safe to send the epoch separating barrier now. */ send_barrier = atomic_read(&connection->current_tle_nr) != connection->send.current_epoch_nr; spin_unlock_irq(&connection->resource->req_lock); if (send_barrier) maybe_send_barrier(connection, connection->send.current_epoch_nr + 1); schedule(); /* may be woken up for other things but new work, too, * e.g. if the current epoch got closed. * In which case we send the barrier above. */ } finish_wait(&connection->sender_work.q_wait, &wait); /* someone may have changed the config while we have been waiting above. */ rcu_read_lock(); nc = rcu_dereference(connection->net_conf); cork = nc ? nc->tcp_cork : 0; rcu_read_unlock(); mutex_lock(&connection->data.mutex); if (connection->data.socket) { if (cork) drbd_tcp_cork(connection->data.socket); else if (!uncork) drbd_tcp_uncork(connection->data.socket); } mutex_unlock(&connection->data.mutex); } int drbd_worker(struct drbd_thread *thi) { struct drbd_connection *connection = thi->connection; struct drbd_work *w = NULL; struct drbd_peer_device *peer_device; LIST_HEAD(work_list); int vnr; while (get_t_state(thi) == RUNNING) { drbd_thread_current_set_cpu(thi); /* as long as we use drbd_queue_work_front(), * we may only dequeue single work items here, not batches. 
*/ if (list_empty(&work_list)) wait_for_work(connection, &work_list); if (signal_pending(current)) { flush_signals(current); if (get_t_state(thi) == RUNNING) { drbd_warn(connection, "Worker got an unexpected signal\n"); continue; } break; } if (get_t_state(thi) != RUNNING) break; while (!list_empty(&work_list)) { w = list_first_entry(&work_list, struct drbd_work, list); list_del_init(&w->list); if (w->cb(w, connection->cstate < C_WF_REPORT_PARAMS) == 0) continue; if (connection->cstate >= C_WF_REPORT_PARAMS) conn_request_state(connection, NS(conn, C_NETWORK_FAILURE), CS_HARD); } } do { while (!list_empty(&work_list)) { w = list_first_entry(&work_list, struct drbd_work, list); list_del_init(&w->list); w->cb(w, 1); } dequeue_work_batch(&connection->sender_work, &work_list); } while (!list_empty(&work_list)); rcu_read_lock(); idr_for_each_entry(&connection->peer_devices, peer_device, vnr) { struct drbd_device *device = peer_device->device; D_ASSERT(device, device->state.disk == D_DISKLESS && device->state.conn == C_STANDALONE); kobject_get(&device->kobj); rcu_read_unlock(); drbd_device_cleanup(device); kobject_put(&device->kobj); rcu_read_lock(); } rcu_read_unlock(); return 0; } drbd-8.4.4/drbd/drbd_wrappers.h0000664000000000000000000010226712226001711015064 0ustar rootroot#ifndef _DRBD_WRAPPERS_H #define _DRBD_WRAPPERS_H #include "compat.h" #include #include #include #include #include #include #include #include #include #include #include #include #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,18) # error "At least kernel version 2.6.18 (with patches) required" #endif /* The history of blkdev_issue_flush() It had 2 arguments before fbd9b09a177a481eda256447c881f014f29034fe, after it had 4 arguments. (With that commit came BLKDEV_IFL_WAIT) It had 4 arguments before dd3932eddf428571762596e17b65f5dc92ca361b, after it got 3 arguments. (With that commit came BLKDEV_DISCARD_SECURE and BLKDEV_IFL_WAIT disappeared again.) 
*/ #ifndef BLKDEV_IFL_WAIT #ifndef BLKDEV_DISCARD_SECURE /* before fbd9b09a177 */ #define blkdev_issue_flush(b, gfpf, s) blkdev_issue_flush(b, s) #endif /* after dd3932eddf4 no define at all */ #else /* between fbd9b09a177 and dd3932eddf4 */ #define blkdev_issue_flush(b, gfpf, s) blkdev_issue_flush(b, gfpf, s, BLKDEV_IFL_WAIT) #endif #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,31) static inline unsigned short queue_logical_block_size(struct request_queue *q) { int retval = 512; if (q && q->hardsect_size) retval = q->hardsect_size; return retval; } static inline sector_t bdev_logical_block_size(struct block_device *bdev) { return queue_logical_block_size(bdev_get_queue(bdev)); } static inline unsigned int queue_max_hw_sectors(struct request_queue *q) { return q->max_hw_sectors; } static inline unsigned int queue_max_sectors(struct request_queue *q) { return q->max_sectors; } static inline void blk_queue_logical_block_size(struct request_queue *q, unsigned short size) { q->hardsect_size = size; } #endif #ifdef COMPAT_HAVE_VOID_MAKE_REQUEST /* in Commit 5a7bbad27a410350e64a2d7f5ec18fc73836c14f (between Linux-3.1 and 3.2) make_request() becomes type void. Before it had type int. 
*/
#define MAKE_REQUEST_TYPE void
#define MAKE_REQUEST_RETURN return
#else
#define MAKE_REQUEST_TYPE int
#define MAKE_REQUEST_RETURN return 0
#endif

#ifndef COMPAT_HAVE_FMODE_T
typedef unsigned __bitwise__ fmode_t;
#endif

#ifndef COMPAT_HAVE_BLKDEV_GET_BY_PATH
/* see kernel 2.6.37,
 * d4d7762 block: clean up blkdev_get() wrappers and their users
 * e525fd8 block: make blkdev_get/put() handle exclusive access
 * and kernel 2.6.28
 * 30c40d2 [PATCH] propagate mode through open_bdev_excl/close_bdev_excl
 * Also note that there is no FMODE_EXCL before
 * 86d434d [PATCH] eliminate use of ->f_flags in block methods
 */
#ifndef COMPAT_HAVE_OPEN_BDEV_EXCLUSIVE
#ifndef FMODE_EXCL
#define FMODE_EXCL 0
#endif
static inline
struct block_device *open_bdev_exclusive(const char *path, fmode_t mode, void *holder)
{
	/* drbd does not open readonly, but try to be correct, anyways */
	return open_bdev_excl(path, (mode & FMODE_WRITE) ? 0 : MS_RDONLY, holder);
}
static inline
void close_bdev_exclusive(struct block_device *bdev, fmode_t mode)
{
	/* mode ignored. */
	close_bdev_excl(bdev);
}
#endif
static inline struct block_device *blkdev_get_by_path(const char *path,
		fmode_t mode, void *holder)
{
	return open_bdev_exclusive(path, mode, holder);
}

static inline int drbd_blkdev_put(struct block_device *bdev, fmode_t mode)
{
	/* blkdev_put != close_bdev_exclusive, in general, so this is obviously
	 * not correct, and there should be some if (mode & FMODE_EXCL) ...
	 * But this is the only way it is used in DRBD,
	 * and for <= 2.6.27, there is no FMODE_EXCL anyways. */
	close_bdev_exclusive(bdev, mode);

	/* blkdev_put seems to not have useful return values,
	 * close_bdev_exclusive is void. */
	return 0;
}
#define blkdev_put(b, m)	drbd_blkdev_put(b, m)
#endif

#define drbd_bio_uptodate(bio) bio_flagged(bio, BIO_UPTODATE)

#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,24)
/* Before Linux-2.6.24 bio_endio() had the size of the bio as second argument.
See 6712ecf8f648118c3363c142196418f89a510b90 */ #define bio_endio(B,E) bio_endio(B, (B)->bi_size, E) #define BIO_ENDIO_TYPE int #define BIO_ENDIO_ARGS(b,e) (b, unsigned int bytes_done, e) #define BIO_ENDIO_FN_START if (bio->bi_size) return 1 #define BIO_ENDIO_FN_RETURN return 0 #else #define BIO_ENDIO_TYPE void #define BIO_ENDIO_ARGS(b,e) (b,e) #define BIO_ENDIO_FN_START do {} while (0) #define BIO_ENDIO_FN_RETURN return #endif /* bi_end_io handlers */ extern BIO_ENDIO_TYPE drbd_md_io_complete BIO_ENDIO_ARGS(struct bio *bio, int error); extern BIO_ENDIO_TYPE drbd_peer_request_endio BIO_ENDIO_ARGS(struct bio *bio, int error); extern BIO_ENDIO_TYPE drbd_request_endio BIO_ENDIO_ARGS(struct bio *bio, int error); #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,32) #define part_inc_in_flight(A, B) part_inc_in_flight(A) #define part_dec_in_flight(A, B) part_dec_in_flight(A) #endif #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,23) /* Before 2.6.23 (with 20c2df83d25c6a95affe6157a4c9cac4cf5ffaac) kmem_cache_create had a ctor and a dtor */ #define kmem_cache_create(N,S,A,F,C) kmem_cache_create(N,S,A,F,C,NULL) #endif #if LINUX_VERSION_CODE > KERNEL_VERSION(2,6,26) # undef HAVE_bvec_merge_data # define HAVE_bvec_merge_data 1 #endif #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,24) static inline void sg_set_page(struct scatterlist *sg, struct page *page, unsigned int len, unsigned int offset) { sg->page = page; sg->offset = offset; sg->length = len; } #define sg_init_table(S,N) ({}) #endif #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,28) # define BD_OPS_USE_FMODE #endif /* how to get to the kobj of a gendisk. 
* see also upstream commits * edfaa7c36574f1bf09c65ad602412db9da5f96bf * ed9e1982347b36573cd622ee5f4e2a7ccd79b3fd * 548b10eb2959c96cef6fc29fc96e0931eeb53bc5 */ #ifndef dev_to_disk # define disk_to_kobj(disk) (&(disk)->kobj) #else # ifndef disk_to_dev # define disk_to_dev(disk) (&(disk)->dev) # endif # define disk_to_kobj(disk) (&disk_to_dev(disk)->kobj) #endif static inline int drbd_backing_bdev_events(struct gendisk *disk) { #if defined(__disk_stat_inc) /* older kernel */ return (int)disk_stat_read(disk, sectors[0]) + (int)disk_stat_read(disk, sectors[1]); #else /* recent kernel */ return (int)part_stat_read(&disk->part0, sectors[0]) + (int)part_stat_read(&disk->part0, sectors[1]); #endif } #ifndef COMPAT_HAVE_SOCK_SHUTDOWN #define COMPAT_HAVE_SOCK_SHUTDOWN 1 enum sock_shutdown_cmd { SHUT_RD = 0, SHUT_WR = 1, SHUT_RDWR = 2, }; static inline int kernel_sock_shutdown(struct socket *sock, enum sock_shutdown_cmd how) { return sock->ops->shutdown(sock, how); } #endif #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,23) static inline void drbd_unregister_blkdev(unsigned int major, const char *name) { int ret = unregister_blkdev(major, name); if (ret) printk(KERN_ERR "drbd: unregister of device failed\n"); } #else #define drbd_unregister_blkdev unregister_blkdev #endif #if !defined(CRYPTO_ALG_ASYNC) /* With Linux-2.6.19 the crypto API changed! */ /* This is not a generic backport of the new api, it just implements the corner case of "hmac(xxx)". */ #define CRYPTO_ALG_ASYNC 4711 #define CRYPTO_ALG_TYPE_HASH CRYPTO_ALG_TYPE_DIGEST struct crypto_hash { struct crypto_tfm *base; const u8 *key; int keylen; }; struct hash_desc { struct crypto_hash *tfm; u32 flags; }; static inline struct crypto_hash * crypto_alloc_hash(char *alg_name, u32 type, u32 mask) { struct crypto_hash *ch; char *closing_bracket; /* "hmac(xxx)" is in alg_name we need that xxx. 
*/ closing_bracket = strchr(alg_name, ')'); if (!closing_bracket) { ch = kmalloc(sizeof(struct crypto_hash), GFP_KERNEL); if (!ch) return ERR_PTR(-ENOMEM); ch->base = crypto_alloc_tfm(alg_name, 0); if (ch->base == NULL) { kfree(ch); return ERR_PTR(-ENOMEM); } return ch; } if (closing_bracket-alg_name < 6) return ERR_PTR(-ENOENT); ch = kmalloc(sizeof(struct crypto_hash), GFP_KERNEL); if (!ch) return ERR_PTR(-ENOMEM); *closing_bracket = 0; ch->base = crypto_alloc_tfm(alg_name + 5, 0); *closing_bracket = ')'; if (ch->base == NULL) { kfree(ch); return ERR_PTR(-ENOMEM); } return ch; } static inline int crypto_hash_setkey(struct crypto_hash *hash, const u8 *key, unsigned int keylen) { hash->key = key; hash->keylen = keylen; return 0; } static inline int crypto_hash_digest(struct hash_desc *desc, struct scatterlist *sg, unsigned int nbytes, u8 *out) { crypto_hmac(desc->tfm->base, (u8 *)desc->tfm->key, &desc->tfm->keylen, sg, 1 /* ! */ , out); /* ! this is not generic. Would need to convert nbytes -> nsg */ return 0; } static inline void crypto_free_hash(struct crypto_hash *tfm) { if (!tfm) return; crypto_free_tfm(tfm->base); kfree(tfm); } static inline unsigned int crypto_hash_digestsize(struct crypto_hash *tfm) { return crypto_tfm_alg_digestsize(tfm->base); } static inline struct crypto_tfm *crypto_hash_tfm(struct crypto_hash *tfm) { return tfm->base; } static inline int crypto_hash_init(struct hash_desc *desc) { crypto_digest_init(desc->tfm->base); return 0; } static inline int crypto_hash_update(struct hash_desc *desc, struct scatterlist *sg, unsigned int nbytes) { crypto_digest_update(desc->tfm->base,sg,1 /* ! */ ); /* ! this is not generic. 
Would need to convert nbytes -> nsg */ return 0; } static inline int crypto_hash_final(struct hash_desc *desc, u8 *out) { crypto_digest_final(desc->tfm->base, out); return 0; } #endif #ifndef COMPAT_HAVE_VZALLOC static inline void *vzalloc(unsigned long size) { void *rv = vmalloc(size); if (rv) memset(rv, 0, size); return rv; } #endif #ifndef COMPAT_HAVE_UMH_WAIT_PROC /* On Jul 17 2007 with commit 86313c4 usermodehelper: Tidy up waiting, * UMH_WAIT_PROC was added as an enum value of 1. * On Mar 23 2012 with commit 9d944ef3 that got changed to a define of 2. */ #define UMH_WAIT_PROC 1 #endif /* see upstream commit 2d3854a37e8b767a51aba38ed6d22817b0631e33 */ #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,30) #ifndef cpumask_bits #ifndef COMPAT_HAVE_NR_CPU_IDS #define nr_cpu_ids NR_CPUS #endif #define nr_cpumask_bits nr_cpu_ids typedef cpumask_t cpumask_var_t[1]; #define cpumask_bits(maskp) ((unsigned long*)(maskp)) #define cpu_online_mask &(cpu_online_map) static inline void cpumask_clear(cpumask_t *dstp) { bitmap_zero(cpumask_bits(dstp), NR_CPUS); } static inline int cpumask_equal(const cpumask_t *src1p, const cpumask_t *src2p) { return bitmap_equal(cpumask_bits(src1p), cpumask_bits(src2p), nr_cpumask_bits); } static inline void cpumask_copy(cpumask_t *dstp, cpumask_t *srcp) { bitmap_copy(cpumask_bits(dstp), cpumask_bits(srcp), nr_cpumask_bits); } static inline unsigned int cpumask_weight(const cpumask_t *srcp) { return bitmap_weight(cpumask_bits(srcp), nr_cpumask_bits); } static inline void cpumask_set_cpu(unsigned int cpu, cpumask_t *dstp) { set_bit(cpu, cpumask_bits(dstp)); } static inline void cpumask_setall(cpumask_t *dstp) { bitmap_fill(cpumask_bits(dstp), nr_cpumask_bits); } static inline void free_cpumask_var(cpumask_var_t mask) { } #endif /* see upstream commit 0281b5dc0350cbf6dd21ed558a33cccce77abc02 */ #ifdef CONFIG_CPUMASK_OFFSTACK static inline int zalloc_cpumask_var(cpumask_var_t *mask, gfp_t flags) { return alloc_cpumask_var(mask, flags | __GFP_ZERO); } 
#else static inline int zalloc_cpumask_var(cpumask_var_t *mask, gfp_t flags) { cpumask_clear(*mask); return 1; } #endif /* see upstream commit cd8ba7cd9be0192348c2836cb6645d9b2cd2bfd2 */ #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,26) /* As macro because RH has it in 2.6.18-128.4.1.el5, but not exported to modules !?!? */ #define set_cpus_allowed_ptr(P, NM) set_cpus_allowed(P, *NM) #endif #endif #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19) #define bitmap_parse(BUF, BUFLEN, MASKP, NMASK) \ backport_bitmap_parse(BUF, BUFLEN, 0, MASKP, NMASK) #define CHUNKSZ 32 #define nbits_to_hold_value(val) fls(val) #define unhex(c) (isdigit(c) ? (c - '0') : (toupper(c) - 'A' + 10)) static inline int backport_bitmap_parse(const char *buf, unsigned int buflen, int is_user, unsigned long *maskp, int nmaskbits) { int c, old_c, totaldigits, ndigits, nchunks, nbits; u32 chunk; const char __user *ubuf = buf; bitmap_zero(maskp, nmaskbits); nchunks = nbits = totaldigits = c = 0; do { chunk = ndigits = 0; /* Get the next chunk of the bitmap */ while (buflen) { old_c = c; if (is_user) { if (__get_user(c, ubuf++)) return -EFAULT; } else c = *buf++; buflen--; if (isspace(c)) continue; /* * If the last character was a space and the current * character isn't '\0', we've got embedded whitespace. * This is a no-no, so throw an error. */ if (totaldigits && c && isspace(old_c)) return -EINVAL; /* A '\0' or a ',' signal the end of the chunk */ if (c == '\0' || c == ',') break; if (!isxdigit(c)) return -EINVAL; /* * Make sure there are at least 4 free bits in 'chunk'. * If not, this hexdigit will overflow 'chunk', so * throw an error. */ if (chunk & ~((1UL << (CHUNKSZ - 4)) - 1)) return -EOVERFLOW; chunk = (chunk << 4) | unhex(c); ndigits++; totaldigits++; } if (ndigits == 0) return -EINVAL; if (nchunks == 0 && chunk == 0) continue; bitmap_shift_left(maskp, maskp, CHUNKSZ, nmaskbits); *maskp |= chunk; nchunks++; nbits += (nchunks == 1) ? 
nbits_to_hold_value(chunk) : CHUNKSZ; if (nbits > nmaskbits) return -EOVERFLOW; } while (buflen && c == ','); return 0; } #endif #ifndef net_random #define random32 net_random #endif #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,30) #define BDI_async_congested BDI_write_congested #define BDI_sync_congested BDI_read_congested #endif /* see upstream commits * 2d3a4e3666325a9709cc8ea2e88151394e8f20fc (in 2.6.25-rc1) * 59b7435149eab2dd06dd678742faff6049cb655f (in 2.6.26-rc1) * this "backport" does not close the race that led to the API change, * but only provides an equivalent function call. */ #ifndef COMPAT_HAVE_PROC_CREATE_DATA static inline struct proc_dir_entry *proc_create_data(const char *name, mode_t mode, struct proc_dir_entry *parent, struct file_operations *proc_fops, void *data) { struct proc_dir_entry *pde = create_proc_entry(name, mode, parent); if (pde) { pde->proc_fops = proc_fops; pde->data = data; } return pde; } #endif #ifndef COMPAT_HAVE_BLK_QUEUE_MAX_HW_SECTORS static inline void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max) { blk_queue_max_sectors(q, max); } #elif defined(COMPAT_USE_BLK_QUEUE_MAX_SECTORS_ANYWAYS) /* For kernel versions 2.6.31 to 2.6.33 inclusive, even though * blk_queue_max_hw_sectors is present, we actually need to use * blk_queue_max_sectors to set max_hw_sectors. :-( * RHEL6 2.6.32 chose to be different and already has eliminated * blk_queue_max_sectors as upstream 2.6.34 did. */ #define blk_queue_max_hw_sectors(q, max) blk_queue_max_sectors(q, max) #endif #ifndef COMPAT_HAVE_BLK_QUEUE_MAX_SEGMENTS static inline void blk_queue_max_segments(struct request_queue *q, unsigned short max_segments) { blk_queue_max_phys_segments(q, max_segments); blk_queue_max_hw_segments(q, max_segments); #define BLK_MAX_SEGMENTS MAX_HW_SEGMENTS /* or max MAX_PHYS_SEGMENTS. 
Probably does not matter */ } #endif /* REQ_* and BIO_RW_* flags have been moved around in the tree, * and have finally been "merged" with * 7b6d91daee5cac6402186ff224c3af39d79f4a0e and * 7cc015811ef8992dfcce314d0ed9642bc18143d1 * We communicate between different systems, * so we have to somehow semantically map the bi_rw flags * bi_rw (some kernel version) -> data packet flags -> bi_rw (other kernel version) */ /* RHEL 6.1 backported FLUSH/FUA as BIO_RW_FLUSH/FUA * and at that time also introduced the defines BIO_FLUSH/FUA. * There is also REQ_FLUSH/FUA, but these do NOT share * the same value space as the bio rw flags, yet. */ #ifdef BIO_FLUSH #define DRBD_REQ_FLUSH (1UL << BIO_RW_FLUSH) #define DRBD_REQ_FUA (1UL << BIO_RW_FUA) #define DRBD_REQ_HARDBARRIER (1UL << BIO_RW_BARRIER) #define DRBD_REQ_DISCARD (1UL << BIO_RW_DISCARD) #define DRBD_REQ_SYNC (1UL << BIO_RW_SYNCIO) #define DRBD_REQ_UNPLUG (1UL << BIO_RW_UNPLUG) #elif defined(REQ_FLUSH) /* introduced in 2.6.36, * now equivalent to bi_rw */ #define DRBD_REQ_SYNC REQ_SYNC #define DRBD_REQ_FLUSH REQ_FLUSH #define DRBD_REQ_FUA REQ_FUA #define DRBD_REQ_DISCARD REQ_DISCARD /* REQ_HARDBARRIER has been around for a long time, * without being directly related to bi_rw. * so the ifdef is only useful inside the ifdef REQ_FLUSH! * commit 7cc0158 (v2.6.36-rc1) made it a bi_rw flag, ... */ #ifdef REQ_HARDBARRIER #define DRBD_REQ_HARDBARRIER REQ_HARDBARRIER #else /* ... but REQ_HARDBARRIER was removed again in 02e031c (v2.6.37-rc4). 
*/ #define DRBD_REQ_HARDBARRIER 0 #endif /* again: testing on this _inside_ the ifdef REQ_FLUSH, * see 721a960 block: kill off REQ_UNPLUG */ #ifdef REQ_UNPLUG #define DRBD_REQ_UNPLUG REQ_UNPLUG #else #define DRBD_REQ_UNPLUG 0 #endif #else /* "older", and hopefully not * "partially backported" kernel */ #if defined(BIO_RW_SYNC) /* see upstream commits * 213d9417fec62ef4c3675621b9364a667954d4dd, * 93dbb393503d53cd226e5e1f0088fe8f4dbaa2b8 * later, the defines even became an enum ;-) */ #define DRBD_REQ_SYNC (1UL << BIO_RW_SYNC) #define DRBD_REQ_UNPLUG (1UL << BIO_RW_SYNC) #else /* cannot test on defined(BIO_RW_SYNCIO), it may be an enum */ #define DRBD_REQ_SYNC (1UL << BIO_RW_SYNCIO) #define DRBD_REQ_UNPLUG (1UL << BIO_RW_UNPLUG) #endif #define DRBD_REQ_FLUSH (1UL << BIO_RW_BARRIER) /* REQ_FUA has been around for a longer time, * without a direct equivalent in bi_rw. */ #define DRBD_REQ_FUA (1UL << BIO_RW_BARRIER) #define DRBD_REQ_HARDBARRIER (1UL << BIO_RW_BARRIER) /* we don't support DISCARDS yet, anyways. * cannot test on defined(BIO_RW_DISCARD), it may be an enum */ #define DRBD_REQ_DISCARD 0 #endif /* this results in: bi_rw -> dp_flags < 2.6.28 SYNC -> SYNC|UNPLUG BARRIER -> FUA|FLUSH there is no DISCARD 2.6.28 SYNC -> SYNC|UNPLUG BARRIER -> FUA|FLUSH DISCARD -> DISCARD 2.6.29 SYNCIO -> SYNC UNPLUG -> UNPLUG BARRIER -> FUA|FLUSH DISCARD -> DISCARD 2.6.36 SYNC -> SYNC UNPLUG -> UNPLUG FUA -> FUA FLUSH -> FLUSH DISCARD -> DISCARD -------------------------------------- dp_flags -> bi_rw < 2.6.28 SYNC -> SYNC (and unplug) UNPLUG -> SYNC (and unplug) FUA -> BARRIER FLUSH -> BARRIER there is no DISCARD, it will be silently ignored on the receiving side. 
2.6.28 SYNC -> SYNC (and unplug) UNPLUG -> SYNC (and unplug) FUA -> BARRIER FLUSH -> BARRIER DISCARD -> DISCARD (if that fails, we handle it like any other IO error) 2.6.29 SYNC -> SYNCIO UNPLUG -> UNPLUG FUA -> BARRIER FLUSH -> BARRIER DISCARD -> DISCARD 2.6.36 SYNC -> SYNC UNPLUG -> UNPLUG FUA -> FUA FLUSH -> FLUSH DISCARD -> DISCARD NOTE: DISCARDs likely need some work still. We should actually never see DISCARD requests, as our queue does not announce QUEUE_FLAG_DISCARD yet. */ #ifndef CONFIG_DYNAMIC_DEBUG /* At least in 2.6.34 the function macro dynamic_dev_dbg() is broken when compiling without CONFIG_DYNAMIC_DEBUG. It has 'format' in the argument list, it references to 'fmt' in its body. */ #ifdef dynamic_dev_dbg #undef dynamic_dev_dbg #define dynamic_dev_dbg(dev, fmt, ...) \ do { if (0) dev_printk(KERN_DEBUG, dev, fmt, ##__VA_ARGS__); } while (0) #endif #endif #ifndef min_not_zero #define min_not_zero(x, y) ({ \ typeof(x) __x = (x); \ typeof(y) __y = (y); \ __x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); }) #endif /* Introduced with 2.6.26. 
See include/linux/jiffies.h */ #ifndef time_is_before_eq_jiffies #define time_is_before_jiffies(a) time_after(jiffies, a) #define time_is_after_jiffies(a) time_before(jiffies, a) #define time_is_before_eq_jiffies(a) time_after_eq(jiffies, a) #define time_is_after_eq_jiffies(a) time_before_eq(jiffies, a) #endif #ifndef time_in_range #define time_in_range(a,b,c) \ (time_after_eq(a,b) && \ time_before_eq(a,c)) #endif #ifdef COMPAT_BIO_SPLIT_HAS_BIO_SPLIT_POOL_PARAMETER #define bio_split(bi, first_sectors) bio_split(bi, bio_split_pool, first_sectors) #endif #ifndef COMPAT_HAVE_BIOSET_CREATE_FRONT_PAD /* see comments in compat/tests/have_bioset_create_front_pad.c */ #ifdef COMPAT_BIOSET_CREATE_HAS_THREE_PARAMETERS #define bioset_create(pool_size, front_pad) bioset_create(pool_size, pool_size, 1) #else #define bioset_create(pool_size, front_pad) bioset_create(pool_size, 1) #endif #endif #if !(defined(COMPAT_HAVE_RB_AUGMENT_FUNCTIONS) && \ defined(AUGMENTED_RBTREE_SYMBOLS_EXPORTED)) /* * Make sure the replacements for the augmented rbtree helper functions do not * clash with functions the kernel implements but does not export. 
*/ #define rb_augment_f drbd_rb_augment_f #define rb_augment_path drbd_rb_augment_path #define rb_augment_insert drbd_rb_augment_insert #define rb_augment_erase_begin drbd_rb_augment_erase_begin #define rb_augment_erase_end drbd_rb_augment_erase_end typedef void (*rb_augment_f)(struct rb_node *node, void *data); static inline void rb_augment_path(struct rb_node *node, rb_augment_f func, void *data) { struct rb_node *parent; up: func(node, data); parent = rb_parent(node); if (!parent) return; if (node == parent->rb_left && parent->rb_right) func(parent->rb_right, data); else if (parent->rb_left) func(parent->rb_left, data); node = parent; goto up; } /* * after inserting @node into the tree, update the tree to account for * both the new entry and any damage done by rebalance */ static inline void rb_augment_insert(struct rb_node *node, rb_augment_f func, void *data) { if (node->rb_left) node = node->rb_left; else if (node->rb_right) node = node->rb_right; rb_augment_path(node, func, data); } /* * before removing the node, find the deepest node on the rebalance path * that will still be there after @node gets removed */ static inline struct rb_node *rb_augment_erase_begin(struct rb_node *node) { struct rb_node *deepest; if (!node->rb_right && !node->rb_left) deepest = rb_parent(node); else if (!node->rb_right) deepest = node->rb_left; else if (!node->rb_left) deepest = node->rb_right; else { deepest = rb_next(node); if (deepest->rb_right) deepest = deepest->rb_right; else if (rb_parent(deepest) != node) deepest = rb_parent(deepest); } return deepest; } /* * after removal, update the tree to account for the removed entry * and any rebalance damage. */ static inline void rb_augment_erase_end(struct rb_node *node, rb_augment_f func, void *data) { if (node) rb_augment_path(node, func, data); } #endif /* * In commit c4945b9e (v2.6.39-rc1), the little-endian bit operations have been * renamed to be less weird. 
*/ #ifndef COMPAT_HAVE_FIND_NEXT_ZERO_BIT_LE #define find_next_zero_bit_le(addr, size, offset) \ generic_find_next_zero_le_bit(addr, size, offset) #define find_next_bit_le(addr, size, offset) \ generic_find_next_le_bit(addr, size, offset) #define test_bit_le(nr, addr) \ generic_test_le_bit(nr, addr) #define __test_and_set_bit_le(nr, addr) \ generic___test_and_set_le_bit(nr, addr) #define __test_and_clear_bit_le(nr, addr) \ generic___test_and_clear_le_bit(nr, addr) #endif #ifndef IDR_GET_NEXT_EXPORTED /* Body in compat/idr.c */ extern void *idr_get_next(struct idr *idp, int *nextidp); #endif /* #ifndef COMPAT_HAVE_LIST_ENTRY_RCU */ #ifndef list_entry_rcu #ifndef rcu_dereference_raw /* see c26d34a rcu: Add lockdep-enabled variants of rcu_dereference() */ #define rcu_dereference_raw(p) rcu_dereference(p) #endif #define list_entry_rcu(ptr, type, member) \ ({typeof (*ptr) *__ptr = (typeof (*ptr) __force *)ptr; \ container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \ }) #endif /* * Introduced in 930631ed (v2.6.19-rc1). */ #ifndef DIV_ROUND_UP #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d)) #endif /* * IS_ALIGNED() was added to in mainline commit 0c0e6195 (and * improved in f10db627); 2.6.24-rc1. */ #ifndef IS_ALIGNED #define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0) #endif /* * NLA_TYPE_MASK and nla_type() were added to in mainline * commit 8f4c1f9b; v2.6.24-rc1. Before that, none of the nlattr->nla_type * flags had a special meaning. */ #ifndef NLA_TYPE_MASK #define NLA_TYPE_MASK ~0 static inline int nla_type(const struct nlattr *nla) { return nla->nla_type & NLA_TYPE_MASK; } #endif /* * nlmsg_hdr was added to in mainline commit b529ccf2 * (v2.6.22-rc1). */ #ifndef COMPAT_HAVE_NLMSG_HDR static inline struct nlmsghdr *nlmsg_hdr(const struct sk_buff *skb) { return (struct nlmsghdr *)skb->data; } #endif /* * genlmsg_reply() was added to in mainline commit 81878d27 * (v2.6.20-rc2). 
*/ #ifndef COMPAT_HAVE_GENLMSG_REPLY #include static inline int genlmsg_reply(struct sk_buff *skb, struct genl_info *info) { return genlmsg_unicast(skb, info->snd_pid); } #endif /* * genlmsg_msg_size() and genlmsg_total_size() were added to * in mainline commit 17db952c (v2.6.19-rc1). */ #ifndef COMPAT_HAVE_GENLMSG_MSG_SIZE #include #include static inline int genlmsg_msg_size(int payload) { return GENL_HDRLEN + payload; } static inline int genlmsg_total_size(int payload) { return NLMSG_ALIGN(genlmsg_msg_size(payload)); } #endif /* * genlmsg_new() was added to in mainline commit 3dabc715 * (v2.6.20-rc2). */ #ifndef COMPAT_HAVE_GENLMSG_NEW #include static inline struct sk_buff *genlmsg_new(size_t payload, gfp_t flags) { return nlmsg_new(genlmsg_total_size(payload), flags); } #endif /* * genlmsg_put() was introduced in mainline commit 482a8524 (v2.6.15-rc1) and * changed in 17c157c8 (v2.6.20-rc2). genlmsg_put_reply() was introduced in * 17c157c8. We replace the compat_genlmsg_put() from 482a8524. */ #ifndef COMPAT_HAVE_GENLMSG_PUT_REPLY #include static inline void *compat_genlmsg_put(struct sk_buff *skb, u32 pid, u32 seq, struct genl_family *family, int flags, u8 cmd) { struct nlmsghdr *nlh; struct genlmsghdr *hdr; nlh = nlmsg_put(skb, pid, seq, family->id, GENL_HDRLEN + family->hdrsize, flags); if (nlh == NULL) return NULL; hdr = nlmsg_data(nlh); hdr->cmd = cmd; hdr->version = family->version; hdr->reserved = 0; return (char *) hdr + GENL_HDRLEN; } #define genlmsg_put compat_genlmsg_put static inline void *genlmsg_put_reply(struct sk_buff *skb, struct genl_info *info, struct genl_family *family, int flags, u8 cmd) { return genlmsg_put(skb, info->snd_pid, info->snd_seq, family, flags, cmd); } #endif /* * compat_genlmsg_multicast() got a gfp_t parameter in mainline commit d387f6ad * (v2.6.19-rc1). 
*/ #ifdef COMPAT_NEED_GENLMSG_MULTICAST_WRAPPER #include static inline int compat_genlmsg_multicast(struct sk_buff *skb, u32 pid, unsigned int group, gfp_t flags) { return genlmsg_multicast(skb, pid, group); } #define genlmsg_multicast compat_genlmsg_multicast #endif /* * Dynamic generic netlink multicast groups were introduced in mainline commit * 2dbba6f7 (v2.6.23-rc1). Before that, netlink had a fixed number of 32 * multicast groups. Use an arbitrary hard-coded group number for that case. */ #ifndef COMPAT_HAVE_CTRL_ATTR_MCAST_GROUPS struct genl_multicast_group { struct genl_family *family; /* private */ struct list_head list; /* private */ char name[GENL_NAMSIZ]; u32 id; }; static inline int genl_register_mc_group(struct genl_family *family, struct genl_multicast_group *grp) { grp->id = 1; return 0; } static inline void genl_unregister_mc_group(struct genl_family *family, struct genl_multicast_group *grp) { } #endif /* pr_warning was introduced with 2.6.37 (commit 968ab183) */ #ifndef pr_fmt #define pr_fmt(fmt) fmt #endif #ifndef pr_warning #define pr_warning(fmt, ...) \ printk(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__) #endif #ifndef COMPAT_HAVE_IS_ERR_OR_NULL static inline long __must_check IS_ERR_OR_NULL(const void *ptr) { return !ptr || IS_ERR_VALUE((unsigned long)ptr); } #endif #ifndef disk_to_dev /* disk_to_dev was introduced with 2.6.27. Before that the kobj was directly in gendisk */ static inline struct kobject *drbd_kobj_of_disk(struct gendisk *disk) { return &disk->kobj; } #else static inline struct kobject *drbd_kobj_of_disk(struct gendisk *disk) { return &disk_to_dev(disk)->kobj; } #endif #ifndef ULLONG_MAX /* introduced in 2.6.18 */ #define ULLONG_MAX (~0ULL) #endif #ifndef SK_CAN_REUSE /* This constant was introduced by Pavel Emelyanov on Thu Apr 19 03:39:36 2012 +0000. 
Before the release of linux-3.5 commit 4a17fd52 sock: Introduce named constants for sk_reuse */ #define SK_CAN_REUSE 1 #endif #ifndef COMPAT_HAVE_KREF_SUB static inline int kref_sub(struct kref *kref, unsigned int count, void (*release)(struct kref *kref)) { WARN_ON(release == NULL); if (atomic_sub_and_test((int) count, &kref->refcount)) { release(kref); return 1; } return 0; } #endif #ifndef KOBJECT_CREATE_AND_ADD_EXPORTED struct kobject *kobject_create_and_add(const char *name, struct kobject *parent); int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype, struct kobject *parent, const char *name); #endif #ifdef COMPAT_KMAP_ATOMIC_PAGE_ONLY /* see 980c19e3 * highmem: mark k[un]map_atomic() with two arguments as deprecated */ #define drbd_kmap_atomic(page, km) kmap_atomic(page) #define drbd_kunmap_atomic(addr, km) kunmap_atomic(addr) #else #define drbd_kmap_atomic(page, km) kmap_atomic(page, km) #define drbd_kunmap_atomic(addr, km) kunmap_atomic(addr, km) #endif #ifdef COMPAT_HAVE_NETLINK_SKB_PARMS_PORTID #define NETLINK_CB_PORTID(skb) NETLINK_CB(skb).portid #else #define NETLINK_CB_PORTID(skb) NETLINK_CB(skb).pid #endif #ifndef COMPAT_HAVE_LIST_SPLICE_TAIL_INIT static inline void __backported_list_splice(const struct list_head *list, struct list_head *prev, struct list_head *next) { struct list_head *first = list->next; struct list_head *last = list->prev; first->prev = prev; prev->next = first; last->next = next; next->prev = last; } static inline void list_splice_tail_init(struct list_head *list, struct list_head *head) { if (!list_empty(list)) { __backported_list_splice(list, head->prev, head); INIT_LIST_HEAD(list); } } #endif #ifndef COMPAT_HAVE_IDR_ALLOC static inline int idr_alloc(struct idr *idr, void *ptr, int start, int end, gfp_t gfp_mask) { int rv, got; if (!idr_pre_get(idr, gfp_mask)) return -ENOMEM; rv = idr_get_new_above(idr, ptr, start, &got); if (rv < 0) return rv; if (got >= end) { idr_remove(idr, got); return -ENOSPC; } return 
got; } #endif #ifndef COMPAT_HAVE_IDR_FOR_EACH_ENTRY /** * idr_for_each_entry - iterate over an idr's elements of a given type * @idp: idr handle * @entry: the type * to use as cursor * @id: id entry's key * * @entry and @id do not need to be initialized before the loop, and * after normal termination @entry is left with the value NULL. This * is convenient for a "not found" value. */ #define idr_for_each_entry(idp, entry, id) \ for (id = 0; ((entry) = idr_get_next(idp, &(id))) != NULL; ++id) #endif #ifndef COMPAT_HAVE_PRANDOM_U32 static inline u32 prandom_u32(void) { return random32(); } #endif #ifndef COMPAT_HAVE_PROC_PDE_DATA #define PDE_DATA(inode) PDE(inode)->data #endif #ifndef COMPAT_HAVE_TASK_PID_NR #include static inline pid_t task_pid_nr(struct task_struct *tsk) { return tsk->pid; } #endif #ifndef for_each_cpu # define for_each_cpu(cpu, mask) for_each_cpu_mask(cpu, mask) #endif #ifndef COMPAT_HAVE_CPUMASK_EMPTY #include #define cpumask_empty(mask) cpus_empty(mask) #endif #if !defined(QUEUE_FLAG_DISCARD) || !defined(QUEUE_FLAG_SECDISCARD) # define queue_flag_set_unlocked(F, Q) \ ({ \ if ((F) != -1) \ queue_flag_set_unlocked(F, Q); \ }) # define queue_flag_clear_unlocked(F, Q) \ ({ \ if ((F) != -1) \ queue_flag_clear_unlocked(F, Q); \ }) # ifndef blk_queue_discard # define blk_queue_discard(q) (0) # define QUEUE_FLAG_DISCARD (-1) # endif # ifndef blk_queue_secdiscard # define blk_queue_secdiscard(q) (0) # define QUEUE_FLAG_SECDISCARD (-1) # endif #endif #ifndef BLKDEV_ISSUE_ZEROOUT_EXPORTED /* Was introduced with 2.6.34 */ extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask); #else #ifdef COMPAT_BLKDEV_ISSUE_ZEROOUT_HAS_5_PARAMTERS /* ... but in 2.6.34 and 2.6.35 it had 5 parameters. 
Later only 4 */ #define blkdev_issue_zeroout(BDEV, SS, NS, GFP) \ blkdev_issue_zeroout(BDEV, SS, NS, GFP, BLKDEV_IFL_WAIT) #endif #endif #ifndef COMPAT_HAVE_GENL_LOCK static inline void genl_lock(void) { } static inline void genl_unlock(void) { } #endif #ifdef COMPAT_HAVE_STRUCT_QUEUE_LIMITS #define DRBD_QUEUE_LIMITS(q) (&(q)->limits) #define LIMIT_TYPE struct queue_limits #else #define DRBD_QUEUE_LIMITS(q) (q) #define LIMIT_TYPE struct request_queue #endif #ifndef COMPAT_HAVE_BLK_SET_STACKING_LIMITS static inline void blk_set_stacking_limits(LIMIT_TYPE *lim) { # ifdef COMPAT_QUEUE_LIMITS_HAS_DISCARD_ZEROES_DATA lim->discard_zeroes_data = 1; # endif } #endif #endif drbd-8.4.4/drbd/linux/drbd.h0000664000000000000000000002415412221261130014274 0ustar rootroot/* drbd.h Kernel module for 2.6.x Kernels This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2001-2008, LINBIT Information Technologies GmbH. Copyright (C) 2001-2008, Philipp Reisner . Copyright (C) 2001-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. 
*/ #ifndef DRBD_H #define DRBD_H #include #include #ifdef __KERNEL__ #include #include #else #include #include #include /* Although the Linux source code makes a difference between generic endianness and the bitfields' endianness, there is no architecture as of Linux-2.6.24-rc4 where the bitfields' endianness does not match the generic endianness. */ #if __BYTE_ORDER == __LITTLE_ENDIAN #define __LITTLE_ENDIAN_BITFIELD #elif __BYTE_ORDER == __BIG_ENDIAN #define __BIG_ENDIAN_BITFIELD #else # error "sorry, weird endianness on this box" #endif #endif enum drbd_io_error_p { EP_PASS_ON, /* FIXME should the better be named "Ignore"? */ EP_CALL_HELPER, EP_DETACH }; enum drbd_fencing_p { FP_NOT_AVAIL = -1, /* Not a policy */ FP_DONT_CARE = 0, FP_RESOURCE, FP_STONITH }; enum drbd_disconnect_p { DP_RECONNECT, DP_DROP_NET_CONF, DP_FREEZE_IO }; enum drbd_after_sb_p { ASB_DISCONNECT, ASB_DISCARD_YOUNGER_PRI, ASB_DISCARD_OLDER_PRI, ASB_DISCARD_ZERO_CHG, ASB_DISCARD_LEAST_CHG, ASB_DISCARD_LOCAL, ASB_DISCARD_REMOTE, ASB_CONSENSUS, ASB_DISCARD_SECONDARY, ASB_CALL_HELPER, ASB_VIOLENTLY }; enum drbd_on_no_data { OND_IO_ERROR, OND_SUSPEND_IO }; enum drbd_on_congestion { OC_BLOCK, OC_PULL_AHEAD, OC_DISCONNECT, }; enum drbd_read_balancing { RB_PREFER_LOCAL, RB_PREFER_REMOTE, RB_ROUND_ROBIN, RB_LEAST_PENDING, RB_CONGESTED_REMOTE, RB_32K_STRIPING, RB_64K_STRIPING, RB_128K_STRIPING, RB_256K_STRIPING, RB_512K_STRIPING, RB_1M_STRIPING, }; /* KEEP the order, do not delete or insert. Only append. 
*/ enum drbd_ret_code { ERR_CODE_BASE = 100, NO_ERROR = 101, ERR_LOCAL_ADDR = 102, ERR_PEER_ADDR = 103, ERR_OPEN_DISK = 104, ERR_OPEN_MD_DISK = 105, ERR_DISK_NOT_BDEV = 107, ERR_MD_NOT_BDEV = 108, ERR_DISK_TOO_SMALL = 111, ERR_MD_DISK_TOO_SMALL = 112, ERR_BDCLAIM_DISK = 114, ERR_BDCLAIM_MD_DISK = 115, ERR_MD_IDX_INVALID = 116, ERR_IO_MD_DISK = 118, ERR_MD_INVALID = 119, ERR_AUTH_ALG = 120, ERR_AUTH_ALG_ND = 121, ERR_NOMEM = 122, ERR_DISCARD_IMPOSSIBLE = 123, ERR_DISK_CONFIGURED = 124, ERR_NET_CONFIGURED = 125, ERR_MANDATORY_TAG = 126, ERR_MINOR_INVALID = 127, ERR_INTR = 129, /* EINTR */ ERR_RESIZE_RESYNC = 130, ERR_NO_PRIMARY = 131, ERR_RESYNC_AFTER = 132, ERR_RESYNC_AFTER_CYCLE = 133, ERR_PAUSE_IS_SET = 134, ERR_PAUSE_IS_CLEAR = 135, ERR_PACKET_NR = 137, ERR_NO_DISK = 138, ERR_NOT_PROTO_C = 139, ERR_NOMEM_BITMAP = 140, ERR_INTEGRITY_ALG = 141, /* DRBD 8.2 only */ ERR_INTEGRITY_ALG_ND = 142, /* DRBD 8.2 only */ ERR_CPU_MASK_PARSE = 143, /* DRBD 8.2 only */ ERR_CSUMS_ALG = 144, /* DRBD 8.2 only */ ERR_CSUMS_ALG_ND = 145, /* DRBD 8.2 only */ ERR_VERIFY_ALG = 146, /* DRBD 8.2 only */ ERR_VERIFY_ALG_ND = 147, /* DRBD 8.2 only */ ERR_CSUMS_RESYNC_RUNNING= 148, /* DRBD 8.2 only */ ERR_VERIFY_RUNNING = 149, /* DRBD 8.2 only */ ERR_DATA_NOT_CURRENT = 150, ERR_CONNECTED = 151, /* DRBD 8.3 only */ ERR_PERM = 152, ERR_NEED_APV_93 = 153, ERR_STONITH_AND_PROT_A = 154, ERR_CONG_NOT_PROTO_A = 155, ERR_PIC_AFTER_DEP = 156, ERR_PIC_PEER_DEP = 157, ERR_RES_NOT_KNOWN = 158, ERR_RES_IN_USE = 159, ERR_MINOR_CONFIGURED = 160, ERR_MINOR_EXISTS = 161, ERR_INVALID_REQUEST = 162, ERR_NEED_APV_100 = 163, ERR_NEED_ALLOW_TWO_PRI = 164, ERR_MD_UNCLEAN = 165, ERR_MD_LAYOUT_CONNECTED = 166, ERR_MD_LAYOUT_TOO_BIG = 167, ERR_MD_LAYOUT_TOO_SMALL = 168, ERR_MD_LAYOUT_NO_FIT = 169, ERR_IMPLICIT_SHRINK = 170, /* insert new ones above this line */ AFTER_LAST_ERR_CODE }; #define DRBD_PROT_A 1 #define DRBD_PROT_B 2 #define DRBD_PROT_C 3 enum drbd_role { R_UNKNOWN = 0, R_PRIMARY = 1, /* role */ R_SECONDARY 
= 2, /* role */ R_MASK = 3, }; /* The order of these constants is important. * The lower ones (<C_WF_REPORT_PARAMS) indicate * that there is no socket! * >=C_WF_REPORT_PARAMS ==> There is a socket */ enum drbd_conns { C_STANDALONE, C_DISCONNECTING, /* Temporal state on the way to StandAlone. */ C_UNCONNECTED, /* >= C_UNCONNECTED -> inc_net() succeeds */ /* These temporal states are all used on the way * from >= C_CONNECTED to Unconnected. * The 'disconnect reason' states * I do not allow to change between them. */ C_TIMEOUT, C_BROKEN_PIPE, C_NETWORK_FAILURE, C_PROTOCOL_ERROR, C_TEAR_DOWN, C_WF_CONNECTION, C_WF_REPORT_PARAMS, /* we have a socket */ C_CONNECTED, /* we have introduced each other */ C_STARTING_SYNC_S, /* starting full sync by admin request. */ C_STARTING_SYNC_T, /* starting full sync by admin request. */ C_WF_BITMAP_S, C_WF_BITMAP_T, C_WF_SYNC_UUID, /* All SyncStates are tested with this comparison * xx >= C_SYNC_SOURCE && xx <= C_PAUSED_SYNC_T */ C_SYNC_SOURCE, C_SYNC_TARGET, C_VERIFY_S, C_VERIFY_T, C_PAUSED_SYNC_S, C_PAUSED_SYNC_T, C_AHEAD, C_BEHIND, C_MASK = 31 }; enum drbd_disk_state { D_DISKLESS, D_ATTACHING, /* In the process of reading the meta-data */ D_FAILED, /* Becomes D_DISKLESS as soon as we told it the peer */ /* when >= D_FAILED it is legal to access mdev->bc */ D_NEGOTIATING, /* Late attaching state, we need to talk to the peer */ D_INCONSISTENT, D_OUTDATED, D_UNKNOWN, /* Only used for the peer, never for myself */ D_CONSISTENT, /* Might be D_OUTDATED, might be D_UP_TO_DATE ... */ D_UP_TO_DATE, /* Only this disk state allows applications' IO ! */ D_MASK = 15 }; union drbd_state { /* According to gcc's docs is the ... * The order of allocation of bit-fields within a unit (C90 6.5.2.1, C99 6.7.2.1). * Determined by ABI. * pointed out by Maxim Uvarov q * even though we transmit as "cpu_to_be32(state)", * the offsets of the bitfields still need to be swapped * on different endianness. 
*/ struct { #if defined(__LITTLE_ENDIAN_BITFIELD) unsigned role:2 ; /* 3/4 primary/secondary/unknown */ unsigned peer:2 ; /* 3/4 primary/secondary/unknown */ unsigned conn:5 ; /* 17/32 cstates */ unsigned disk:4 ; /* 8/16 from D_DISKLESS to D_UP_TO_DATE */ unsigned pdsk:4 ; /* 8/16 from D_DISKLESS to D_UP_TO_DATE */ unsigned susp:1 ; /* 2/2 IO suspended no/yes (by user) */ unsigned aftr_isp:1 ; /* isp .. imposed sync pause */ unsigned peer_isp:1 ; unsigned user_isp:1 ; unsigned susp_nod:1 ; /* IO suspended because no data */ unsigned susp_fen:1 ; /* IO suspended because fence peer handler runs*/ unsigned _pad:9; /* 0 unused */ #elif defined(__BIG_ENDIAN_BITFIELD) unsigned _pad:9; unsigned susp_fen:1 ; unsigned susp_nod:1 ; unsigned user_isp:1 ; unsigned peer_isp:1 ; unsigned aftr_isp:1 ; /* isp .. imposed sync pause */ unsigned susp:1 ; /* 2/2 IO suspended no/yes */ unsigned pdsk:4 ; /* 8/16 from D_DISKLESS to D_UP_TO_DATE */ unsigned disk:4 ; /* 8/16 from D_DISKLESS to D_UP_TO_DATE */ unsigned conn:5 ; /* 17/32 cstates */ unsigned peer:2 ; /* 3/4 primary/secondary/unknown */ unsigned role:2 ; /* 3/4 primary/secondary/unknown */ #else # error "this endianness is not supported" #endif }; unsigned int i; }; enum drbd_state_rv { SS_CW_NO_NEED = 4, SS_CW_SUCCESS = 3, SS_NOTHING_TO_DO = 2, SS_SUCCESS = 1, SS_UNKNOWN_ERROR = 0, /* Used to sleep longer in _drbd_request_state */ SS_TWO_PRIMARIES = -1, SS_NO_UP_TO_DATE_DISK = -2, SS_NO_LOCAL_DISK = -4, SS_NO_REMOTE_DISK = -5, SS_CONNECTED_OUTDATES = -6, SS_PRIMARY_NOP = -7, SS_RESYNC_RUNNING = -8, SS_ALREADY_STANDALONE = -9, SS_CW_FAILED_BY_PEER = -10, SS_IS_DISKLESS = -11, SS_DEVICE_IN_USE = -12, SS_NO_NET_CONFIG = -13, SS_NO_VERIFY_ALG = -14, /* drbd-8.2 only */ SS_NEED_CONNECTION = -15, /* drbd-8.2 only */ SS_LOWER_THAN_OUTDATED = -16, SS_NOT_SUPPORTED = -17, /* drbd-8.2 only */ SS_IN_TRANSIENT_STATE = -18, /* Retry after the next state change */ SS_CONCURRENT_ST_CHG = -19, /* Concurrent cluster side state change! 
*/ SS_O_VOL_PEER_PRI = -20, SS_OUTDATE_WO_CONN = -21, SS_AFTER_LAST_ERROR = -22, /* Keep this at bottom */ }; #define SHARED_SECRET_MAX 64 #define MDF_CONSISTENT (1 << 0) #define MDF_PRIMARY_IND (1 << 1) #define MDF_CONNECTED_IND (1 << 2) #define MDF_FULL_SYNC (1 << 3) #define MDF_WAS_UP_TO_DATE (1 << 4) #define MDF_PEER_OUT_DATED (1 << 5) #define MDF_CRASHED_PRIMARY (1 << 6) #define MDF_AL_CLEAN (1 << 7) #define MDF_AL_DISABLED (1 << 8) enum drbd_uuid_index { UI_CURRENT, UI_BITMAP, UI_HISTORY_START, UI_HISTORY_END, UI_SIZE, /* nl-packet: number of dirty bits */ UI_FLAGS, /* nl-packet: flags */ UI_EXTENDED_SIZE /* Everything. */ }; enum drbd_timeout_flag { UT_DEFAULT = 0, UT_DEGRADED = 1, UT_PEER_OUTDATED = 2, }; #define UUID_JUST_CREATED ((__u64)4) /* magic numbers used in meta data and network packets */ #define DRBD_MAGIC 0x83740267 #define DRBD_MAGIC_BIG 0x835a #define DRBD_MAGIC_100 0x8620ec20 #define DRBD_MD_MAGIC_07 (DRBD_MAGIC+3) #define DRBD_MD_MAGIC_08 (DRBD_MAGIC+4) #define DRBD_MD_MAGIC_84_UNCLEAN (DRBD_MAGIC+5) /* how I came up with this magic? * base64 decode "actlog==" ;) */ #define DRBD_AL_MAGIC 0x69cb65a2 /* these are of type "int" */ #define DRBD_MD_INDEX_INTERNAL -1 #define DRBD_MD_INDEX_FLEX_EXT -2 #define DRBD_MD_INDEX_FLEX_INT -3 #define DRBD_CPU_MASK_SIZE 32 #endif drbd-8.4.4/drbd/linux/drbd_config.h0000664000000000000000000000312112226007136015621 0ustar rootroot/* drbd_config.h DRBD's compile time configuration. drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #ifndef DRBD_CONFIG_H #define DRBD_CONFIG_H extern const char *drbd_buildtag(void); /* Necessary to build the external module against >= Linux-2.6.33 */ #ifdef REL_VERSION #undef REL_VERSION #undef API_VERSION #undef PRO_VERSION_MIN #undef PRO_VERSION_MAX #endif /* End of external module for 2.6.33 stuff */ #define REL_VERSION "8.4.4" #define API_VERSION 1 #define PRO_VERSION_MIN 86 #define PRO_VERSION_MAX 101 #ifndef __CHECKER__ /* for a sparse run, we need all STATICs */ #define DBG_ALL_SYMBOLS /* no static functs, improves quality of OOPS traces */ #endif /* drbd_assert_breakpoint() function #define DBG_ASSERTS */ /* Dump all cstate changes */ #define DUMP_MD 2 /* some extra checks #define PARANOIA */ /* Enable fault insertion code */ #ifndef CONFIG_DRBD_FAULT_INJECTION #define CONFIG_DRBD_FAULT_INJECTION 1 #endif #ifdef __KERNEL__ #include "compat.h" #endif #endif drbd-8.4.4/drbd/linux/drbd_genl.h0000664000000000000000000003672512221331365015320 0ustar rootroot/* * General overview: * full generic netlink message: * |nlmsghdr|genlmsghdr| * * payload: * |optional fixed size family header| * * sequence of netlink attributes: * I chose to have all "top level" attributes NLA_NESTED, * corresponding to some real struct. * So we have a sequence of |tla, len| * * nested nla sequence: * may be empty, or contain a sequence of netlink attributes * representing the struct fields. * * The tag number of any field (regardless of containing struct) * will be available as T_ ## field_name, * so you cannot have the same field name in two different structs. * * The tag numbers themselves are per struct, though, * so should always begin at 1 (not 0, that is the special "NLA_UNSPEC" type, * which we won't use here). * The tag numbers are used as index in the respective nla_policy array. 
* * GENL_struct(tag_name, tag_number, struct name, struct fields) - struct and policy * genl_magic_struct.h * generates the struct declaration, * generates an entry in the tla enum, * genl_magic_func.h * generates an entry in the static tla policy * with .type = NLA_NESTED * generates the static _nl_policy definition, * and static conversion functions * * genl_magic_func.h * * GENL_mc_group(group) * genl_magic_struct.h * does nothing * genl_magic_func.h * defines and registers the mcast group, * and provides a send helper * * GENL_notification(op_name, op_num, mcast_group, tla list) * These are notifications to userspace. * * genl_magic_struct.h * generates an entry in the genl_ops enum, * genl_magic_func.h * does nothing * * mcast group: the name of the mcast group this notification should be * expected on * tla list: the list of expected top level attributes, * for documentation and sanity checking. * * GENL_op(op_name, op_num, flags and handler, tla list) - "genl operations" * These are requests from userspace. * * _op and _notification share the same "number space", * op_nr will be assigned to "genlmsghdr->cmd" * * genl_magic_struct.h * generates an entry in the genl_ops enum, * genl_magic_func.h * generates an entry in the static genl_ops array, * and static register/unregister functions to * genl_register_family_with_ops(). * * flags and handler: * GENL_op_init( .doit = x, .dumpit = y, .flags = something) * GENL_doit(x) => .dumpit = NULL, .flags = GENL_ADMIN_PERM * tla list: the list of expected top level attributes, * for documentation and sanity checking. */ /* * STRUCTS */ /* this is sent kernel -> userland on various error conditions, and contains * informational textual info, which is supposedly human readable. * The computer relevant return code is in the drbd_genlmsghdr. 
*/ GENL_struct(DRBD_NLA_CFG_REPLY, 1, drbd_cfg_reply, /* "arbitrary" size strings, nla_policy.len = 0 */ __str_field(1, DRBD_GENLA_F_MANDATORY, info_text, 0) ) /* Configuration requests typically need a context to operate on. * Possible keys are device minor (fits in the drbd_genlmsghdr), * the replication link (aka connection) name, * and/or the replication group (aka resource) name, * and the volume id within the resource. */ GENL_struct(DRBD_NLA_CFG_CONTEXT, 2, drbd_cfg_context, __u32_field(1, DRBD_GENLA_F_MANDATORY, ctx_volume) __str_field(2, DRBD_GENLA_F_MANDATORY, ctx_resource_name, 128) __bin_field(3, DRBD_GENLA_F_MANDATORY, ctx_my_addr, 128) __bin_field(4, DRBD_GENLA_F_MANDATORY, ctx_peer_addr, 128) ) GENL_struct(DRBD_NLA_DISK_CONF, 3, disk_conf, __str_field(1, DRBD_F_REQUIRED | DRBD_F_INVARIANT, backing_dev, 128) __str_field(2, DRBD_F_REQUIRED | DRBD_F_INVARIANT, meta_dev, 128) __s32_field(3, DRBD_F_REQUIRED | DRBD_F_INVARIANT, meta_dev_idx) /* use the resize command to try and change the disk_size */ __u64_field(4, DRBD_GENLA_F_MANDATORY | DRBD_F_INVARIANT, disk_size) /* we could change the max_bio_bvecs, * but it won't propagate through the stack */ __u32_field(5, DRBD_GENLA_F_MANDATORY | DRBD_F_INVARIANT, max_bio_bvecs) __u32_field_def(6, DRBD_GENLA_F_MANDATORY, on_io_error, DRBD_ON_IO_ERROR_DEF) __u32_field_def(7, DRBD_GENLA_F_MANDATORY, fencing, DRBD_FENCING_DEF) __u32_field_def(8, DRBD_GENLA_F_MANDATORY, resync_rate, DRBD_RESYNC_RATE_DEF) __s32_field_def(9, DRBD_GENLA_F_MANDATORY, resync_after, DRBD_MINOR_NUMBER_DEF) __u32_field_def(10, DRBD_GENLA_F_MANDATORY, al_extents, DRBD_AL_EXTENTS_DEF) __u32_field_def(11, DRBD_GENLA_F_MANDATORY, c_plan_ahead, DRBD_C_PLAN_AHEAD_DEF) __u32_field_def(12, DRBD_GENLA_F_MANDATORY, c_delay_target, DRBD_C_DELAY_TARGET_DEF) __u32_field_def(13, DRBD_GENLA_F_MANDATORY, c_fill_target, DRBD_C_FILL_TARGET_DEF) __u32_field_def(14, DRBD_GENLA_F_MANDATORY, c_max_rate, DRBD_C_MAX_RATE_DEF) __u32_field_def(15, 
DRBD_GENLA_F_MANDATORY, c_min_rate, DRBD_C_MIN_RATE_DEF) __flg_field_def(16, DRBD_GENLA_F_MANDATORY, disk_barrier, DRBD_DISK_BARRIER_DEF) __flg_field_def(17, DRBD_GENLA_F_MANDATORY, disk_flushes, DRBD_DISK_FLUSHES_DEF) __flg_field_def(18, DRBD_GENLA_F_MANDATORY, disk_drain, DRBD_DISK_DRAIN_DEF) __flg_field_def(19, DRBD_GENLA_F_MANDATORY, md_flushes, DRBD_MD_FLUSHES_DEF) __u32_field_def(20, DRBD_GENLA_F_MANDATORY, disk_timeout, DRBD_DISK_TIMEOUT_DEF) __u32_field_def(21, 0 /* OPTIONAL */, read_balancing, DRBD_READ_BALANCING_DEF) /* 9: __u32_field_def(22, DRBD_GENLA_F_MANDATORY, unplug_watermark, DRBD_UNPLUG_WATERMARK_DEF) */ __flg_field_def(23, 0 /* OPTIONAL */, al_updates, DRBD_AL_UPDATES_DEF) ) GENL_struct(DRBD_NLA_RESOURCE_OPTS, 4, res_opts, __str_field_def(1, DRBD_GENLA_F_MANDATORY, cpu_mask, DRBD_CPU_MASK_SIZE) __u32_field_def(2, DRBD_GENLA_F_MANDATORY, on_no_data, DRBD_ON_NO_DATA_DEF) ) GENL_struct(DRBD_NLA_NET_CONF, 5, net_conf, __str_field_def(1, DRBD_GENLA_F_MANDATORY | DRBD_F_SENSITIVE, shared_secret, SHARED_SECRET_MAX) __str_field_def(2, DRBD_GENLA_F_MANDATORY, cram_hmac_alg, SHARED_SECRET_MAX) __str_field_def(3, DRBD_GENLA_F_MANDATORY, integrity_alg, SHARED_SECRET_MAX) __str_field_def(4, DRBD_GENLA_F_MANDATORY, verify_alg, SHARED_SECRET_MAX) __str_field_def(5, DRBD_GENLA_F_MANDATORY, csums_alg, SHARED_SECRET_MAX) __u32_field_def(6, DRBD_GENLA_F_MANDATORY, wire_protocol, DRBD_PROTOCOL_DEF) __u32_field_def(7, DRBD_GENLA_F_MANDATORY, connect_int, DRBD_CONNECT_INT_DEF) __u32_field_def(8, DRBD_GENLA_F_MANDATORY, timeout, DRBD_TIMEOUT_DEF) __u32_field_def(9, DRBD_GENLA_F_MANDATORY, ping_int, DRBD_PING_INT_DEF) __u32_field_def(10, DRBD_GENLA_F_MANDATORY, ping_timeo, DRBD_PING_TIMEO_DEF) __u32_field_def(11, DRBD_GENLA_F_MANDATORY, sndbuf_size, DRBD_SNDBUF_SIZE_DEF) __u32_field_def(12, DRBD_GENLA_F_MANDATORY, rcvbuf_size, DRBD_RCVBUF_SIZE_DEF) __u32_field_def(13, DRBD_GENLA_F_MANDATORY, ko_count, DRBD_KO_COUNT_DEF) __u32_field_def(14, DRBD_GENLA_F_MANDATORY, 
max_buffers, DRBD_MAX_BUFFERS_DEF) __u32_field_def(15, DRBD_GENLA_F_MANDATORY, max_epoch_size, DRBD_MAX_EPOCH_SIZE_DEF) __u32_field_def(16, DRBD_GENLA_F_MANDATORY, unplug_watermark, DRBD_UNPLUG_WATERMARK_DEF) __u32_field_def(17, DRBD_GENLA_F_MANDATORY, after_sb_0p, DRBD_AFTER_SB_0P_DEF) __u32_field_def(18, DRBD_GENLA_F_MANDATORY, after_sb_1p, DRBD_AFTER_SB_1P_DEF) __u32_field_def(19, DRBD_GENLA_F_MANDATORY, after_sb_2p, DRBD_AFTER_SB_2P_DEF) __u32_field_def(20, DRBD_GENLA_F_MANDATORY, rr_conflict, DRBD_RR_CONFLICT_DEF) __u32_field_def(21, DRBD_GENLA_F_MANDATORY, on_congestion, DRBD_ON_CONGESTION_DEF) __u32_field_def(22, DRBD_GENLA_F_MANDATORY, cong_fill, DRBD_CONG_FILL_DEF) __u32_field_def(23, DRBD_GENLA_F_MANDATORY, cong_extents, DRBD_CONG_EXTENTS_DEF) __flg_field_def(24, DRBD_GENLA_F_MANDATORY, two_primaries, DRBD_ALLOW_TWO_PRIMARIES_DEF) __flg_field(25, DRBD_GENLA_F_MANDATORY | DRBD_F_INVARIANT, discard_my_data) __flg_field_def(26, DRBD_GENLA_F_MANDATORY, tcp_cork, DRBD_TCP_CORK_DEF) __flg_field_def(27, DRBD_GENLA_F_MANDATORY, always_asbp, DRBD_ALWAYS_ASBP_DEF) __flg_field(28, DRBD_GENLA_F_MANDATORY | DRBD_F_INVARIANT, tentative) __flg_field_def(29, DRBD_GENLA_F_MANDATORY, use_rle, DRBD_USE_RLE_DEF) /* 9: __u32_field_def(30, DRBD_GENLA_F_MANDATORY, fencing_policy, DRBD_FENCING_DEF) */ ) GENL_struct(DRBD_NLA_SET_ROLE_PARMS, 6, set_role_parms, __flg_field(1, DRBD_GENLA_F_MANDATORY, assume_uptodate) ) GENL_struct(DRBD_NLA_RESIZE_PARMS, 7, resize_parms, __u64_field(1, DRBD_GENLA_F_MANDATORY, resize_size) __flg_field(2, DRBD_GENLA_F_MANDATORY, resize_force) __flg_field(3, DRBD_GENLA_F_MANDATORY, no_resync) __u32_field_def(4, 0 /* OPTIONAL */, al_stripes, DRBD_AL_STRIPES_DEF) __u32_field_def(5, 0 /* OPTIONAL */, al_stripe_size, DRBD_AL_STRIPE_SIZE_DEF) ) GENL_struct(DRBD_NLA_STATE_INFO, 8, state_info, /* the reason of the broadcast, * if this is an event triggered broadcast. 
*/ __u32_field(1, DRBD_GENLA_F_MANDATORY, sib_reason) __u32_field(2, DRBD_F_REQUIRED, current_state) __u64_field(3, DRBD_GENLA_F_MANDATORY, capacity) __u64_field(4, DRBD_GENLA_F_MANDATORY, ed_uuid) /* These are for broadcast from after state change work. * prev_state and new_state are from the moment the state change took * place; new_state is not necessarily the same as current_state, * as there may have been more state changes since, which will be * broadcast soon, in their respective after state change work. */ __u32_field(5, DRBD_GENLA_F_MANDATORY, prev_state) __u32_field(6, DRBD_GENLA_F_MANDATORY, new_state) /* if we have a local disk: */ __bin_field(7, DRBD_GENLA_F_MANDATORY, uuids, (UI_SIZE*sizeof(__u64))) __u32_field(8, DRBD_GENLA_F_MANDATORY, disk_flags) __u64_field(9, DRBD_GENLA_F_MANDATORY, bits_total) __u64_field(10, DRBD_GENLA_F_MANDATORY, bits_oos) /* and in case resync or online verify is active */ __u64_field(11, DRBD_GENLA_F_MANDATORY, bits_rs_total) __u64_field(12, DRBD_GENLA_F_MANDATORY, bits_rs_failed) /* for pre and post notifications of helper execution */ __str_field(13, DRBD_GENLA_F_MANDATORY, helper, 32) __u32_field(14, DRBD_GENLA_F_MANDATORY, helper_exit_code) __u64_field(15, 0, send_cnt) __u64_field(16, 0, recv_cnt) __u64_field(17, 0, read_cnt) __u64_field(18, 0, writ_cnt) __u64_field(19, 0, al_writ_cnt) __u64_field(20, 0, bm_writ_cnt) __u32_field(21, 0, ap_bio_cnt) __u32_field(22, 0, ap_pending_cnt) __u32_field(23, 0, rs_pending_cnt) ) GENL_struct(DRBD_NLA_START_OV_PARMS, 9, start_ov_parms, __u64_field(1, DRBD_GENLA_F_MANDATORY, ov_start_sector) __u64_field(2, DRBD_GENLA_F_MANDATORY, ov_stop_sector) ) GENL_struct(DRBD_NLA_NEW_C_UUID_PARMS, 10, new_c_uuid_parms, __flg_field(1, DRBD_GENLA_F_MANDATORY, clear_bm) ) GENL_struct(DRBD_NLA_TIMEOUT_PARMS, 11, timeout_parms, __u32_field(1, DRBD_F_REQUIRED, timeout_type) ) GENL_struct(DRBD_NLA_DISCONNECT_PARMS, 12, disconnect_parms, __flg_field(1, DRBD_GENLA_F_MANDATORY, force_disconnect) ) 
GENL_struct(DRBD_NLA_DETACH_PARMS, 13, detach_parms, __flg_field(1, DRBD_GENLA_F_MANDATORY, force_detach) ) /* * Notifications and commands (genlmsghdr->cmd) */ GENL_mc_group(events) /* kernel -> userspace announcement of changes */ GENL_notification( DRBD_EVENT, 1, events, GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) GENL_tla_expected(DRBD_NLA_STATE_INFO, DRBD_F_REQUIRED) GENL_tla_expected(DRBD_NLA_NET_CONF, DRBD_GENLA_F_MANDATORY) GENL_tla_expected(DRBD_NLA_DISK_CONF, DRBD_GENLA_F_MANDATORY) GENL_tla_expected(DRBD_NLA_SYNCER_CONF, DRBD_GENLA_F_MANDATORY) ) /* query kernel for specific or all info */ GENL_op( DRBD_ADM_GET_STATUS, 2, GENL_op_init( .doit = drbd_adm_get_status, .dumpit = drbd_adm_get_status_all, /* anyone may ask for the status, * it is broadcasted anyways */ ), /* To select the object .doit. * Or a subset of objects in .dumpit. */ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY) ) /* add DRBD minor devices as volumes to resources */ GENL_op(DRBD_ADM_NEW_MINOR, 5, GENL_doit(drbd_adm_new_minor), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) GENL_op(DRBD_ADM_DEL_MINOR, 6, GENL_doit(drbd_adm_del_minor), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) /* add or delete resources */ GENL_op(DRBD_ADM_NEW_RESOURCE, 7, GENL_doit(drbd_adm_new_resource), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) GENL_op(DRBD_ADM_DEL_RESOURCE, 8, GENL_doit(drbd_adm_del_resource), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) GENL_op(DRBD_ADM_RESOURCE_OPTS, 9, GENL_doit(drbd_adm_resource_opts), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) GENL_tla_expected(DRBD_NLA_RESOURCE_OPTS, DRBD_GENLA_F_MANDATORY) ) GENL_op( DRBD_ADM_CONNECT, 10, GENL_doit(drbd_adm_connect), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) GENL_tla_expected(DRBD_NLA_NET_CONF, DRBD_F_REQUIRED) ) GENL_op( DRBD_ADM_CHG_NET_OPTS, 29, GENL_doit(drbd_adm_net_opts), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, 
DRBD_F_REQUIRED) GENL_tla_expected(DRBD_NLA_NET_CONF, DRBD_F_REQUIRED) ) GENL_op(DRBD_ADM_DISCONNECT, 11, GENL_doit(drbd_adm_disconnect), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) GENL_op(DRBD_ADM_ATTACH, 12, GENL_doit(drbd_adm_attach), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) GENL_tla_expected(DRBD_NLA_DISK_CONF, DRBD_F_REQUIRED) ) GENL_op(DRBD_ADM_CHG_DISK_OPTS, 28, GENL_doit(drbd_adm_disk_opts), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) GENL_tla_expected(DRBD_NLA_DISK_OPTS, DRBD_F_REQUIRED) ) GENL_op( DRBD_ADM_RESIZE, 13, GENL_doit(drbd_adm_resize), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) GENL_tla_expected(DRBD_NLA_RESIZE_PARMS, DRBD_GENLA_F_MANDATORY) ) GENL_op( DRBD_ADM_PRIMARY, 14, GENL_doit(drbd_adm_set_role), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) GENL_tla_expected(DRBD_NLA_SET_ROLE_PARMS, DRBD_F_REQUIRED) ) GENL_op( DRBD_ADM_SECONDARY, 15, GENL_doit(drbd_adm_set_role), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) GENL_tla_expected(DRBD_NLA_SET_ROLE_PARMS, DRBD_F_REQUIRED) ) GENL_op( DRBD_ADM_NEW_C_UUID, 16, GENL_doit(drbd_adm_new_c_uuid), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) GENL_tla_expected(DRBD_NLA_NEW_C_UUID_PARMS, DRBD_GENLA_F_MANDATORY) ) GENL_op( DRBD_ADM_START_OV, 17, GENL_doit(drbd_adm_start_ov), GENL_tla_expected(DRBD_NLA_START_OV_PARMS, DRBD_GENLA_F_MANDATORY) ) GENL_op(DRBD_ADM_DETACH, 18, GENL_doit(drbd_adm_detach), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) GENL_tla_expected(DRBD_NLA_DETACH_PARMS, DRBD_GENLA_F_MANDATORY)) GENL_op(DRBD_ADM_INVALIDATE, 19, GENL_doit(drbd_adm_invalidate), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) GENL_op(DRBD_ADM_INVAL_PEER, 20, GENL_doit(drbd_adm_invalidate_peer), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) GENL_op(DRBD_ADM_PAUSE_SYNC, 21, GENL_doit(drbd_adm_pause_sync), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) 
GENL_op(DRBD_ADM_RESUME_SYNC, 22, GENL_doit(drbd_adm_resume_sync), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) GENL_op(DRBD_ADM_SUSPEND_IO, 23, GENL_doit(drbd_adm_suspend_io), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) GENL_op(DRBD_ADM_RESUME_IO, 24, GENL_doit(drbd_adm_resume_io), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) GENL_op(DRBD_ADM_OUTDATE, 25, GENL_doit(drbd_adm_outdate), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) GENL_op(DRBD_ADM_GET_TIMEOUT_TYPE, 26, GENL_doit(drbd_adm_get_timeout_type), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) GENL_op(DRBD_ADM_DOWN, 27, GENL_doit(drbd_adm_down), GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) drbd-8.4.4/drbd/linux/drbd_genl_api.h0000664000000000000000000000335112132747531016145 0ustar rootroot#ifndef DRBD_GENL_STRUCT_H #define DRBD_GENL_STRUCT_H /** * struct drbd_genlmsghdr - DRBD specific header used in NETLINK_GENERIC requests * @minor: * For admin requests (user -> kernel): which minor device to operate on. * For (unicast) replies or informational (broadcast) messages * (kernel -> user): which minor device the information is about. * If we do not operate on minors, but on connections or resources, * the minor value shall be (~0), and the attribute DRBD_NLA_CFG_CONTEXT * is used instead. * @flags: possible operation modifiers (relevant only for user->kernel): * DRBD_GENL_F_SET_DEFAULTS * @volume: * When creating a new minor (adding it to a resource), the resource needs * to know which volume number within the resource this is supposed to be. * The volume number corresponds to the same volume number on the remote side, * whereas the minor number on the remote side may be different * (union with flags). 
* @ret_code: kernel->userland unicast cfg reply return code (union with flags); */ struct drbd_genlmsghdr { __u32 minor; union { __u32 flags; __s32 ret_code; }; }; /* To be used in drbd_genlmsghdr.flags */ enum { DRBD_GENL_F_SET_DEFAULTS = 1, }; enum drbd_state_info_bcast_reason { SIB_GET_STATUS_REPLY = 1, SIB_STATE_CHANGE = 2, SIB_HELPER_PRE = 3, SIB_HELPER_POST = 4, SIB_SYNC_PROGRESS = 5, }; /* hack around predefined gcc/cpp "linux=1", * we cannot possibly include <1/drbd_genl.h> */ #undef linux #include #define GENL_MAGIC_VERSION API_VERSION #define GENL_MAGIC_FAMILY drbd #define GENL_MAGIC_FAMILY_HDRSZ sizeof(struct drbd_genlmsghdr) #define GENL_MAGIC_INCLUDE_FILE #include #endif drbd-8.4.4/drbd/linux/drbd_limits.h0000664000000000000000000001560712221261130015660 0ustar rootroot/* drbd_limits.h This file is part of DRBD by Philipp Reisner and Lars Ellenberg. */ /* * Our current limitations. * Some of them are hard limits, * some of them are arbitrary range limits, that make it easier to provide * feedback about nonsense settings for certain configurable values. 
*/ #ifndef DRBD_LIMITS_H #define DRBD_LIMITS_H 1 #define DEBUG_RANGE_CHECK 0 #define DRBD_MINOR_COUNT_MIN 1 #define DRBD_MINOR_COUNT_MAX 255 #define DRBD_MINOR_COUNT_DEF 32 #define DRBD_MINOR_COUNT_SCALE '1' #define DRBD_VOLUME_MAX 65535 #define DRBD_DIALOG_REFRESH_MIN 0 #define DRBD_DIALOG_REFRESH_MAX 600 #define DRBD_DIALOG_REFRESH_SCALE '1' /* valid port number */ #define DRBD_PORT_MIN 1 #define DRBD_PORT_MAX 0xffff #define DRBD_PORT_SCALE '1' /* startup { */ /* if you want more than 3.4 days, disable */ #define DRBD_WFC_TIMEOUT_MIN 0 #define DRBD_WFC_TIMEOUT_MAX 300000 #define DRBD_WFC_TIMEOUT_DEF 0 #define DRBD_WFC_TIMEOUT_SCALE '1' #define DRBD_DEGR_WFC_TIMEOUT_MIN 0 #define DRBD_DEGR_WFC_TIMEOUT_MAX 300000 #define DRBD_DEGR_WFC_TIMEOUT_DEF 0 #define DRBD_DEGR_WFC_TIMEOUT_SCALE '1' #define DRBD_OUTDATED_WFC_TIMEOUT_MIN 0 #define DRBD_OUTDATED_WFC_TIMEOUT_MAX 300000 #define DRBD_OUTDATED_WFC_TIMEOUT_DEF 0 #define DRBD_OUTDATED_WFC_TIMEOUT_SCALE '1' /* }*/ /* net { */ /* timeout, unit centi seconds * more than one minute timeout is not useful */ #define DRBD_TIMEOUT_MIN 1 #define DRBD_TIMEOUT_MAX 600 #define DRBD_TIMEOUT_DEF 60 /* 6 seconds */ #define DRBD_TIMEOUT_SCALE '1' /* If backing disk takes longer than disk_timeout, mark the disk as failed */ #define DRBD_DISK_TIMEOUT_MIN 0 /* 0 = disabled */ #define DRBD_DISK_TIMEOUT_MAX 6000 /* 10 Minutes */ #define DRBD_DISK_TIMEOUT_DEF 0 /* disabled */ #define DRBD_DISK_TIMEOUT_SCALE '1' /* active connection retries when C_WF_CONNECTION */ #define DRBD_CONNECT_INT_MIN 1 #define DRBD_CONNECT_INT_MAX 120 #define DRBD_CONNECT_INT_DEF 10 /* seconds */ #define DRBD_CONNECT_INT_SCALE '1' /* keep-alive probes when idle */ #define DRBD_PING_INT_MIN 1 #define DRBD_PING_INT_MAX 120 #define DRBD_PING_INT_DEF 10 #define DRBD_PING_INT_SCALE '1' /* timeout for the ping packets.*/ #define DRBD_PING_TIMEO_MIN 1 #define DRBD_PING_TIMEO_MAX 300 #define DRBD_PING_TIMEO_DEF 5 #define DRBD_PING_TIMEO_SCALE '1' /* max number of write 
requests between write barriers */ #define DRBD_MAX_EPOCH_SIZE_MIN 1 #define DRBD_MAX_EPOCH_SIZE_MAX 20000 #define DRBD_MAX_EPOCH_SIZE_DEF 2048 #define DRBD_MAX_EPOCH_SIZE_SCALE '1' /* I don't think that a tcp send buffer of more than 10M is useful */ #define DRBD_SNDBUF_SIZE_MIN 0 #define DRBD_SNDBUF_SIZE_MAX (10<<20) #define DRBD_SNDBUF_SIZE_DEF 0 #define DRBD_SNDBUF_SIZE_SCALE '1' #define DRBD_RCVBUF_SIZE_MIN 0 #define DRBD_RCVBUF_SIZE_MAX (10<<20) #define DRBD_RCVBUF_SIZE_DEF 0 #define DRBD_RCVBUF_SIZE_SCALE '1' /* @4k PageSize -> 128kB - 512MB */ #define DRBD_MAX_BUFFERS_MIN 32 #define DRBD_MAX_BUFFERS_MAX 131072 #define DRBD_MAX_BUFFERS_DEF 2048 #define DRBD_MAX_BUFFERS_SCALE '1' /* @4k PageSize -> 4kB - 512MB */ #define DRBD_UNPLUG_WATERMARK_MIN 1 #define DRBD_UNPLUG_WATERMARK_MAX 131072 #define DRBD_UNPLUG_WATERMARK_DEF (DRBD_MAX_BUFFERS_DEF/16) #define DRBD_UNPLUG_WATERMARK_SCALE '1' /* 0 is disabled. * 200 should be more than enough even for very short timeouts */ #define DRBD_KO_COUNT_MIN 0 #define DRBD_KO_COUNT_MAX 200 #define DRBD_KO_COUNT_DEF 7 #define DRBD_KO_COUNT_SCALE '1' /* } */ /* syncer { */ /* FIXME allow rate to be zero? */ #define DRBD_RESYNC_RATE_MIN 1 /* channel bonding 10 GbE, or other hardware */ #define DRBD_RESYNC_RATE_MAX (4 << 20) #define DRBD_RESYNC_RATE_DEF 250 #define DRBD_RESYNC_RATE_SCALE 'k' /* kilobytes */ /* less than 7 would hit performance unnecessarily. */ #define DRBD_AL_EXTENTS_MIN 7 /* we use u16 as "slot number", (u16)~0 is "FREE". * If you use >= 292 kB on-disk ring buffer, * this is the maximum you can use: */ #define DRBD_AL_EXTENTS_MAX 0xfffe #define DRBD_AL_EXTENTS_DEF 1237 #define DRBD_AL_EXTENTS_SCALE '1' #define DRBD_MINOR_NUMBER_MIN -1 #define DRBD_MINOR_NUMBER_MAX ((1 << 20) - 1) #define DRBD_MINOR_NUMBER_DEF -1 #define DRBD_MINOR_NUMBER_SCALE '1' /* } */ /* drbdsetup XY resize -d Z * you are free to reduce the device size to nothing, if you want to. 
* the upper limit with 64bit kernel, enough ram and flexible meta data * is 1 PiB, currently. */ /* DRBD_MAX_SECTORS */ #define DRBD_DISK_SIZE_MIN 0 #define DRBD_DISK_SIZE_MAX (1 * (2LLU << 40)) #define DRBD_DISK_SIZE_DEF 0 /* = disabled = no user size... */ #define DRBD_DISK_SIZE_SCALE 's' /* sectors */ #define DRBD_ON_IO_ERROR_DEF EP_DETACH #define DRBD_FENCING_DEF FP_DONT_CARE #define DRBD_AFTER_SB_0P_DEF ASB_DISCONNECT #define DRBD_AFTER_SB_1P_DEF ASB_DISCONNECT #define DRBD_AFTER_SB_2P_DEF ASB_DISCONNECT #define DRBD_RR_CONFLICT_DEF ASB_DISCONNECT #define DRBD_ON_NO_DATA_DEF OND_IO_ERROR #define DRBD_ON_CONGESTION_DEF OC_BLOCK #define DRBD_READ_BALANCING_DEF RB_PREFER_LOCAL #define DRBD_MAX_BIO_BVECS_MIN 0 #define DRBD_MAX_BIO_BVECS_MAX 128 #define DRBD_MAX_BIO_BVECS_DEF 0 #define DRBD_MAX_BIO_BVECS_SCALE '1' #define DRBD_C_PLAN_AHEAD_MIN 0 #define DRBD_C_PLAN_AHEAD_MAX 300 #define DRBD_C_PLAN_AHEAD_DEF 20 #define DRBD_C_PLAN_AHEAD_SCALE '1' #define DRBD_C_DELAY_TARGET_MIN 1 #define DRBD_C_DELAY_TARGET_MAX 100 #define DRBD_C_DELAY_TARGET_DEF 10 #define DRBD_C_DELAY_TARGET_SCALE '1' #define DRBD_C_FILL_TARGET_MIN 0 #define DRBD_C_FILL_TARGET_MAX (1<<20) /* 500MByte in sec */ #define DRBD_C_FILL_TARGET_DEF 100 /* Try to place 50KiB in socket send buffer during resync */ #define DRBD_C_FILL_TARGET_SCALE 's' /* sectors */ #define DRBD_C_MAX_RATE_MIN 250 #define DRBD_C_MAX_RATE_MAX (4 << 20) #define DRBD_C_MAX_RATE_DEF 102400 #define DRBD_C_MAX_RATE_SCALE 'k' /* kilobytes */ #define DRBD_C_MIN_RATE_MIN 0 #define DRBD_C_MIN_RATE_MAX (4 << 20) #define DRBD_C_MIN_RATE_DEF 250 #define DRBD_C_MIN_RATE_SCALE 'k' /* kilobytes */ #define DRBD_CONG_FILL_MIN 0 #define DRBD_CONG_FILL_MAX (10<<21) /* 10GByte in sectors */ #define DRBD_CONG_FILL_DEF 0 #define DRBD_CONG_FILL_SCALE 's' /* sectors */ #define DRBD_CONG_EXTENTS_MIN DRBD_AL_EXTENTS_MIN #define DRBD_CONG_EXTENTS_MAX DRBD_AL_EXTENTS_MAX #define DRBD_CONG_EXTENTS_DEF DRBD_AL_EXTENTS_DEF #define DRBD_CONG_EXTENTS_SCALE 
DRBD_AL_EXTENTS_SCALE #define DRBD_PROTOCOL_DEF DRBD_PROT_C #define DRBD_DISK_BARRIER_DEF 0 #define DRBD_DISK_FLUSHES_DEF 1 #define DRBD_DISK_DRAIN_DEF 1 #define DRBD_MD_FLUSHES_DEF 1 #define DRBD_TCP_CORK_DEF 1 #define DRBD_AL_UPDATES_DEF 1 #define DRBD_ALLOW_TWO_PRIMARIES_DEF 0 #define DRBD_ALWAYS_ASBP_DEF 0 #define DRBD_USE_RLE_DEF 1 #define DRBD_AL_STRIPES_MIN 1 #define DRBD_AL_STRIPES_MAX 1024 #define DRBD_AL_STRIPES_DEF 1 #define DRBD_AL_STRIPES_SCALE '1' #define DRBD_AL_STRIPE_SIZE_MIN 4 #define DRBD_AL_STRIPE_SIZE_MAX 16777216 #define DRBD_AL_STRIPE_SIZE_DEF 32 #define DRBD_AL_STRIPE_SIZE_SCALE 'k' /* kilobytes */ #endif drbd-8.4.4/drbd/linux/genl_magic_func.h0000664000000000000000000003027012132747531016474 0ustar rootroot#ifndef GENL_MAGIC_FUNC_H #define GENL_MAGIC_FUNC_H #include /* * Magic: declare tla policy {{{1 * Magic: declare nested policies * {{{2 */ #undef GENL_mc_group #define GENL_mc_group(group) #undef GENL_notification #define GENL_notification(op_name, op_num, mcast_group, tla_list) #undef GENL_op #define GENL_op(op_name, op_num, handler, tla_list) #undef GENL_struct #define GENL_struct(tag_name, tag_number, s_name, s_fields) \ [tag_name] = { .type = NLA_NESTED }, static struct nla_policy CONCAT_(GENL_MAGIC_FAMILY, _tla_nl_policy)[] \ __attribute__((unused)) = { #include GENL_MAGIC_INCLUDE_FILE }; #undef GENL_struct #define GENL_struct(tag_name, tag_number, s_name, s_fields) \ static struct nla_policy s_name ## _nl_policy[] __read_mostly = \ { s_fields }; #undef __field #define __field(attr_nr, attr_flag, name, nla_type, _type, __get, \ __put, __is_signed) \ [attr_nr] = { .type = nla_type }, #undef __array #define __array(attr_nr, attr_flag, name, nla_type, _type, maxlen, \ __get, __put, __is_signed) \ [attr_nr] = { .type = nla_type, \ .len = maxlen - (nla_type == NLA_NUL_STRING) }, #include GENL_MAGIC_INCLUDE_FILE #ifndef __KERNEL__ #ifndef pr_info #define pr_info(args...) 
fprintf(stderr, args); #endif #endif #ifdef GENL_MAGIC_DEBUG static void dprint_field(const char *dir, int nla_type, const char *name, void *valp) { __u64 val = valp ? *(__u32 *)valp : 1; switch (nla_type) { case NLA_U8: val = (__u8)val; case NLA_U16: val = (__u16)val; case NLA_U32: val = (__u32)val; pr_info("%s attr %s: %d 0x%08x\n", dir, name, (int)val, (unsigned)val); break; case NLA_U64: val = *(__u64*)valp; pr_info("%s attr %s: %lld 0x%08llx\n", dir, name, (long long)val, (unsigned long long)val); break; case NLA_FLAG: if (val) pr_info("%s attr %s: set\n", dir, name); break; } } static void dprint_array(const char *dir, int nla_type, const char *name, const char *val, unsigned len) { switch (nla_type) { case NLA_NUL_STRING: if (len && val[len-1] == '\0') len--; pr_info("%s attr %s: [len:%u] '%s'\n", dir, name, len, val); break; default: /* we can always show 4 bytes, * that's what nlattrs are aligned to. */ pr_info("%s attr %s: [len:%u] %02x%02x%02x%02x ...\n", dir, name, len, val[0], val[1], val[2], val[3]); } } #define DPRINT_TLA(a, op, b) pr_info("%s %s %s\n", a, op, b); /* Name is a member field name of the struct s. * If s is NULL (only parsing, no copy requested in *_from_attrs()), * nla is supposed to point to the attribute containing the information * corresponding to that struct member. */ #define DPRINT_FIELD(dir, nla_type, name, s, nla) \ do { \ if (s) \ dprint_field(dir, nla_type, #name, &s->name); \ else if (nla) \ dprint_field(dir, nla_type, #name, \ (nla_type == NLA_FLAG) ? 
NULL \ : nla_data(nla)); \ } while (0) #define DPRINT_ARRAY(dir, nla_type, name, s, nla) \ do { \ if (s) \ dprint_array(dir, nla_type, #name, \ s->name, s->name ## _len); \ else if (nla) \ dprint_array(dir, nla_type, #name, \ nla_data(nla), nla_len(nla)); \ } while (0) #else #define DPRINT_TLA(a, op, b) do {} while (0) #define DPRINT_FIELD(dir, nla_type, name, s, nla) do {} while (0) #define DPRINT_ARRAY(dir, nla_type, name, s, nla) do {} while (0) #endif /* * Magic: provide conversion functions {{{1 * populate struct from attribute table: * {{{2 */ /* processing of generic netlink messages is serialized. * use one static buffer for parsing of nested attributes */ static struct nlattr *nested_attr_tb[128]; #ifndef BUILD_BUG_ON /* Force a compilation error if condition is true */ #define BUILD_BUG_ON(condition) ((void)BUILD_BUG_ON_ZERO(condition)) /* Force a compilation error if condition is true, but also produce a result (of value 0 and type size_t), so the expression can be used e.g. in a structure initializer (or where-ever else comma expressions aren't permitted). 
*/ #define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); })) #define BUILD_BUG_ON_NULL(e) ((void *)sizeof(struct { int:-!!(e); })) #endif #undef GENL_struct #define GENL_struct(tag_name, tag_number, s_name, s_fields) \ static int __ ## s_name ## _from_attrs(struct s_name *s, \ struct genl_info *info, bool exclude_invariants) \ { \ const int maxtype = ARRAY_SIZE(s_name ## _nl_policy)-1; \ struct nlattr *tla = info->attrs[tag_number]; \ struct nlattr **ntb = nested_attr_tb; \ struct nlattr *nla; \ int err; \ BUILD_BUG_ON(ARRAY_SIZE(s_name ## _nl_policy) > ARRAY_SIZE(nested_attr_tb)); \ if (!tla) \ return -ENOMSG; \ DPRINT_TLA(#s_name, "<=-", #tag_name); \ err = drbd_nla_parse_nested(ntb, maxtype, tla, s_name ## _nl_policy); \ if (err) \ return err; \ \ s_fields \ return 0; \ } __attribute__((unused)) \ static int s_name ## _from_attrs(struct s_name *s, \ struct genl_info *info) \ { \ return __ ## s_name ## _from_attrs(s, info, false); \ } __attribute__((unused)) \ static int s_name ## _from_attrs_for_change(struct s_name *s, \ struct genl_info *info) \ { \ return __ ## s_name ## _from_attrs(s, info, true); \ } __attribute__((unused)) \ #define __assign(attr_nr, attr_flag, name, nla_type, type, assignment...) \ nla = ntb[attr_nr]; \ if (nla) { \ if (exclude_invariants && ((attr_flag) & DRBD_F_INVARIANT)) { \ pr_info("<< must not change invariant attr: %s\n", #name); \ return -EEXIST; \ } \ assignment; \ } else if (exclude_invariants && ((attr_flag) & DRBD_F_INVARIANT)) { \ /* attribute missing from payload, */ \ /* which was expected */ \ } else if ((attr_flag) & DRBD_F_REQUIRED) { \ pr_info("<< missing attr: %s\n", #name); \ return -ENOMSG; \ } #undef __field #define __field(attr_nr, attr_flag, name, nla_type, type, __get, __put, \ __is_signed) \ __assign(attr_nr, attr_flag, name, nla_type, type, \ if (s) \ s->name = __get(nla); \ DPRINT_FIELD("<<", nla_type, name, s, nla)) /* validate_nla() already checked nla_len <= maxlen appropriately. 
*/ #undef __array #define __array(attr_nr, attr_flag, name, nla_type, type, maxlen, \ __get, __put, __is_signed) \ __assign(attr_nr, attr_flag, name, nla_type, type, \ if (s) \ s->name ## _len = \ __get(s->name, nla, maxlen); \ DPRINT_ARRAY("<<", nla_type, name, s, nla)) #include GENL_MAGIC_INCLUDE_FILE #undef GENL_struct #define GENL_struct(tag_name, tag_number, s_name, s_fields) /* * Magic: define op number to op name mapping {{{1 * {{{2 */ static const char *CONCAT_(GENL_MAGIC_FAMILY, _genl_cmd_to_str)(__u8 cmd) __attribute__ ((unused)); static const char *CONCAT_(GENL_MAGIC_FAMILY, _genl_cmd_to_str)(__u8 cmd) { switch (cmd) { #undef GENL_op #define GENL_op(op_name, op_num, handler, tla_list) \ case op_num: return #op_name; #include GENL_MAGIC_INCLUDE_FILE default: return "unknown"; } } #ifdef __KERNEL__ #include /* * Magic: define genl_ops {{{1 * {{{2 */ #undef GENL_op #define GENL_op(op_name, op_num, handler, tla_list) \ { \ handler \ .cmd = op_name, \ .policy = CONCAT_(GENL_MAGIC_FAMILY, _tla_nl_policy), \ }, #define ZZZ_genl_ops CONCAT_(GENL_MAGIC_FAMILY, _genl_ops) static struct genl_ops ZZZ_genl_ops[] __read_mostly = { #include GENL_MAGIC_INCLUDE_FILE }; #undef GENL_op #define GENL_op(op_name, op_num, handler, tla_list) /* * Define the genl_family, multicast groups, {{{1 * and provide register/unregister functions. 
* {{{2 */ #define ZZZ_genl_family CONCAT_(GENL_MAGIC_FAMILY, _genl_family) static struct genl_family ZZZ_genl_family __read_mostly = { .id = GENL_ID_GENERATE, .name = __stringify(GENL_MAGIC_FAMILY), .version = GENL_MAGIC_VERSION, #ifdef GENL_MAGIC_FAMILY_HDRSZ .hdrsize = NLA_ALIGN(GENL_MAGIC_FAMILY_HDRSZ), #endif .maxattr = ARRAY_SIZE(drbd_tla_nl_policy)-1, }; /* * Magic: define multicast groups * Magic: define multicast group registration helper */ #undef GENL_mc_group #define GENL_mc_group(group) \ static struct genl_multicast_group \ CONCAT_(GENL_MAGIC_FAMILY, _mcg_ ## group) __read_mostly = { \ .name = #group, \ }; \ static int CONCAT_(GENL_MAGIC_FAMILY, _genl_multicast_ ## group)( \ struct sk_buff *skb, gfp_t flags) \ { \ unsigned int group_id = \ CONCAT_(GENL_MAGIC_FAMILY, _mcg_ ## group).id; \ if (!group_id) \ return -EINVAL; \ return genlmsg_multicast(skb, 0, group_id, flags); \ } #include GENL_MAGIC_INCLUDE_FILE int CONCAT_(GENL_MAGIC_FAMILY, _genl_register)(void) { int err = genl_register_family_with_ops(&ZZZ_genl_family, ZZZ_genl_ops, ARRAY_SIZE(ZZZ_genl_ops)); if (err) return err; #undef GENL_mc_group #define GENL_mc_group(group) \ err = genl_register_mc_group(&ZZZ_genl_family, \ &CONCAT_(GENL_MAGIC_FAMILY, _mcg_ ## group)); \ if (err) \ goto fail; \ else \ pr_info("%s: mcg %s: %u\n", #group, \ __stringify(GENL_MAGIC_FAMILY), \ CONCAT_(GENL_MAGIC_FAMILY, _mcg_ ## group).id); #include GENL_MAGIC_INCLUDE_FILE #undef GENL_mc_group #define GENL_mc_group(group) return 0; fail: genl_unregister_family(&ZZZ_genl_family); return err; } void CONCAT_(GENL_MAGIC_FAMILY, _genl_unregister)(void) { genl_unregister_family(&ZZZ_genl_family); } /* * Magic: provide conversion functions {{{1 * populate skb from struct. 
* {{{2 */ #undef GENL_op #define GENL_op(op_name, op_num, handler, tla_list) #undef GENL_struct #define GENL_struct(tag_name, tag_number, s_name, s_fields) \ static int s_name ## _to_skb(struct sk_buff *skb, struct s_name *s, \ const bool exclude_sensitive) \ { \ struct nlattr *tla = nla_nest_start(skb, tag_number); \ if (!tla) \ goto nla_put_failure; \ DPRINT_TLA(#s_name, "-=>", #tag_name); \ s_fields \ nla_nest_end(skb, tla); \ return 0; \ \ nla_put_failure: \ if (tla) \ nla_nest_cancel(skb, tla); \ return -EMSGSIZE; \ } \ static inline int s_name ## _to_priv_skb(struct sk_buff *skb, \ struct s_name *s) \ { \ return s_name ## _to_skb(skb, s, 0); \ } \ static inline int s_name ## _to_unpriv_skb(struct sk_buff *skb, \ struct s_name *s) \ { \ return s_name ## _to_skb(skb, s, 1); \ } #undef __field #define __field(attr_nr, attr_flag, name, nla_type, type, __get, __put, \ __is_signed) \ if (!exclude_sensitive || !((attr_flag) & DRBD_F_SENSITIVE)) { \ DPRINT_FIELD(">>", nla_type, name, s, NULL); \ if (__put(skb, attr_nr, s->name)) \ goto nla_put_failure; \ } #undef __array #define __array(attr_nr, attr_flag, name, nla_type, type, maxlen, \ __get, __put, __is_signed) \ if (!exclude_sensitive || !((attr_flag) & DRBD_F_SENSITIVE)) { \ DPRINT_ARRAY(">>",nla_type, name, s, NULL); \ if (__put(skb, attr_nr, min_t(int, maxlen, \ s->name ## _len + (nla_type == NLA_NUL_STRING)),\ s->name)) \ goto nla_put_failure; \ } #include GENL_MAGIC_INCLUDE_FILE /* Functions for initializing structs to default values. 
*/ #undef __field #define __field(attr_nr, attr_flag, name, nla_type, type, __get, __put, \ __is_signed) #undef __array #define __array(attr_nr, attr_flag, name, nla_type, type, maxlen, \ __get, __put, __is_signed) #undef __u32_field_def #define __u32_field_def(attr_nr, attr_flag, name, default) \ x->name = default; #undef __s32_field_def #define __s32_field_def(attr_nr, attr_flag, name, default) \ x->name = default; #undef __flg_field_def #define __flg_field_def(attr_nr, attr_flag, name, default) \ x->name = default; #undef __str_field_def #define __str_field_def(attr_nr, attr_flag, name, maxlen) \ memset(x->name, 0, sizeof(x->name)); \ x->name ## _len = 0; #undef GENL_struct #define GENL_struct(tag_name, tag_number, s_name, s_fields) \ static void set_ ## s_name ## _defaults(struct s_name *x) __attribute__((unused)); \ static void set_ ## s_name ## _defaults(struct s_name *x) { \ s_fields \ } #include GENL_MAGIC_INCLUDE_FILE #endif /* __KERNEL__ */ /* }}}1 */ #endif /* GENL_MAGIC_FUNC_H */ /* vim: set foldmethod=marker foldlevel=1 nofoldenable : */ drbd-8.4.4/drbd/linux/genl_magic_struct.h0000664000000000000000000001673012001220573017055 0ustar rootroot#ifndef GENL_MAGIC_STRUCT_H #define GENL_MAGIC_STRUCT_H #ifndef GENL_MAGIC_FAMILY # error "you need to define GENL_MAGIC_FAMILY before inclusion" #endif #ifndef GENL_MAGIC_VERSION # error "you need to define GENL_MAGIC_VERSION before inclusion" #endif #ifndef GENL_MAGIC_INCLUDE_FILE # error "you need to define GENL_MAGIC_INCLUDE_FILE before inclusion" #endif #include #include #include #define CONCAT__(a,b) a ## b #define CONCAT_(a,b) CONCAT__(a,b) extern int CONCAT_(GENL_MAGIC_FAMILY, _genl_register)(void); extern void CONCAT_(GENL_MAGIC_FAMILY, _genl_unregister)(void); /* * Extension of genl attribute validation policies {{{2 */ /* * @DRBD_GENLA_F_MANDATORY: By default, netlink ignores attributes it does not * know about. 
This flag can be set in nlattr->nla_type to indicate that this * attribute must not be ignored. * * We check and remove this flag in drbd_nla_check_mandatory() before * validating the attribute types and lengths via nla_parse_nested(). */ #define DRBD_GENLA_F_MANDATORY (1 << 14) /* * Flags specific to drbd and not visible at the netlink layer, used in * _from_attrs and _to_skb: * * @DRBD_F_REQUIRED: Attribute is required; a request without this attribute is * invalid. * * @DRBD_F_SENSITIVE: Attribute includes sensitive information and must not be * included in unprivileged get requests or broadcasts. * * @DRBD_F_INVARIANT: Attribute is set when an object is initially created, but * cannot subsequently be changed. */ #define DRBD_F_REQUIRED (1 << 0) #define DRBD_F_SENSITIVE (1 << 1) #define DRBD_F_INVARIANT (1 << 2) #define __nla_type(x) ((__u16)((x) & NLA_TYPE_MASK & ~DRBD_GENLA_F_MANDATORY)) /* }}}1 * MAGIC * multi-include macro expansion magic starts here */ /* MAGIC helpers {{{2 */ /* possible field types */ #define __flg_field(attr_nr, attr_flag, name) \ __field(attr_nr, attr_flag, name, NLA_U8, char, \ nla_get_u8, nla_put_u8, false) #define __u8_field(attr_nr, attr_flag, name) \ __field(attr_nr, attr_flag, name, NLA_U8, unsigned char, \ nla_get_u8, nla_put_u8, false) #define __u16_field(attr_nr, attr_flag, name) \ __field(attr_nr, attr_flag, name, NLA_U16, __u16, \ nla_get_u16, nla_put_u16, false) #define __u32_field(attr_nr, attr_flag, name) \ __field(attr_nr, attr_flag, name, NLA_U32, __u32, \ nla_get_u32, nla_put_u32, false) #define __s32_field(attr_nr, attr_flag, name) \ __field(attr_nr, attr_flag, name, NLA_U32, __s32, \ nla_get_u32, nla_put_u32, true) #define __u64_field(attr_nr, attr_flag, name) \ __field(attr_nr, attr_flag, name, NLA_U64, __u64, \ nla_get_u64, nla_put_u64, false) #define __str_field(attr_nr, attr_flag, name, maxlen) \ __array(attr_nr, attr_flag, name, NLA_NUL_STRING, char, maxlen, \ nla_strlcpy, nla_put, false) #define
__bin_field(attr_nr, attr_flag, name, maxlen) \ __array(attr_nr, attr_flag, name, NLA_BINARY, char, maxlen, \ nla_memcpy, nla_put, false) /* fields with default values */ #define __flg_field_def(attr_nr, attr_flag, name, default) \ __flg_field(attr_nr, attr_flag, name) #define __u32_field_def(attr_nr, attr_flag, name, default) \ __u32_field(attr_nr, attr_flag, name) #define __s32_field_def(attr_nr, attr_flag, name, default) \ __s32_field(attr_nr, attr_flag, name) #define __str_field_def(attr_nr, attr_flag, name, maxlen) \ __str_field(attr_nr, attr_flag, name, maxlen) #define GENL_op_init(args...) args #define GENL_doit(handler) \ .doit = handler, \ .flags = GENL_ADMIN_PERM, #define GENL_dumpit(handler) \ .dumpit = handler, \ .flags = GENL_ADMIN_PERM, /* }}}1 * Magic: define the enum symbols for genl_ops * Magic: define the enum symbols for top level attributes * Magic: define the enum symbols for nested attributes * {{{2 */ #undef GENL_struct #define GENL_struct(tag_name, tag_number, s_name, s_fields) #undef GENL_mc_group #define GENL_mc_group(group) #undef GENL_notification #define GENL_notification(op_name, op_num, mcast_group, tla_list) \ op_name = op_num, #undef GENL_op #define GENL_op(op_name, op_num, handler, tla_list) \ op_name = op_num, enum { #include GENL_MAGIC_INCLUDE_FILE }; #undef GENL_notification #define GENL_notification(op_name, op_num, mcast_group, tla_list) #undef GENL_op #define GENL_op(op_name, op_num, handler, attr_list) #undef GENL_struct #define GENL_struct(tag_name, tag_number, s_name, s_fields) \ tag_name = tag_number, enum { #include GENL_MAGIC_INCLUDE_FILE }; #undef GENL_struct #define GENL_struct(tag_name, tag_number, s_name, s_fields) \ enum { \ s_fields \ }; #undef __field #define __field(attr_nr, attr_flag, name, nla_type, type, \ __get, __put, __is_signed) \ T_ ## name = (__u16)(attr_nr | ((attr_flag) & DRBD_GENLA_F_MANDATORY)), #undef __array #define __array(attr_nr, attr_flag, name, nla_type, type, \ maxlen, __get, __put, 
__is_signed) \ T_ ## name = (__u16)(attr_nr | ((attr_flag) & DRBD_GENLA_F_MANDATORY)), #include GENL_MAGIC_INCLUDE_FILE /* }}}1 * Magic: compile time assert unique numbers for operations * Magic: -"- unique numbers for top level attributes * Magic: -"- unique numbers for nested attributes * {{{2 */ #undef GENL_struct #define GENL_struct(tag_name, tag_number, s_name, s_fields) #undef GENL_op #define GENL_op(op_name, op_num, handler, attr_list) \ case op_name: #undef GENL_notification #define GENL_notification(op_name, op_num, mcast_group, tla_list) \ case op_name: static inline void ct_assert_unique_operations(void) { switch (0) { #include GENL_MAGIC_INCLUDE_FILE ; } } #undef GENL_op #define GENL_op(op_name, op_num, handler, attr_list) #undef GENL_notification #define GENL_notification(op_name, op_num, mcast_group, tla_list) #undef GENL_struct #define GENL_struct(tag_name, tag_number, s_name, s_fields) \ case tag_number: static inline void ct_assert_unique_top_level_attributes(void) { switch (0) { #include GENL_MAGIC_INCLUDE_FILE ; } } #undef GENL_struct #define GENL_struct(tag_name, tag_number, s_name, s_fields) \ static inline void ct_assert_unique_ ## s_name ## _attributes(void) \ { \ switch (0) { \ s_fields \ ; \ } \ } #undef __field #define __field(attr_nr, attr_flag, name, nla_type, type, __get, __put, \ __is_signed) \ case attr_nr: #undef __array #define __array(attr_nr, attr_flag, name, nla_type, type, maxlen, \ __get, __put, __is_signed) \ case attr_nr: #include GENL_MAGIC_INCLUDE_FILE /* }}}1 * Magic: declare structs * struct { * fields * }; * {{{2 */ #undef GENL_struct #define GENL_struct(tag_name, tag_number, s_name, s_fields) \ struct s_name { s_fields }; #undef __field #define __field(attr_nr, attr_flag, name, nla_type, type, __get, __put, \ __is_signed) \ type name; #undef __array #define __array(attr_nr, attr_flag, name, nla_type, type, maxlen, \ __get, __put, __is_signed) \ type name[maxlen]; \ __u32 name ## _len; #include GENL_MAGIC_INCLUDE_FILE 
#undef GENL_struct #define GENL_struct(tag_name, tag_number, s_name, s_fields) \ enum { \ s_fields \ }; #undef __field #define __field(attr_nr, attr_flag, name, nla_type, type, __get, __put, \ is_signed) \ F_ ## name ## _IS_SIGNED = is_signed, #undef __array #define __array(attr_nr, attr_flag, name, nla_type, type, maxlen, \ __get, __put, is_signed) \ F_ ## name ## _IS_SIGNED = is_signed, #include GENL_MAGIC_INCLUDE_FILE /* }}}1 */ #endif /* GENL_MAGIC_STRUCT_H */ /* vim: set foldmethod=marker nofoldenable : */ drbd-8.4.4/drbd/linux/lru_cache.h0000664000000000000000000003265712221261130015315 0ustar rootroot/* lru_cache.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2003-2008, LINBIT Information Technologies GmbH. Copyright (C) 2003-2008, Philipp Reisner . Copyright (C) 2003-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. 
*/ #ifndef LRU_CACHE_H #define LRU_CACHE_H #include #include #include #include /* for memset */ #include /* Compatibility code */ #include "compat.h" #ifndef COMPAT_HAVE_CLEAR_BIT_UNLOCK static inline void clear_bit_unlock(unsigned nr, volatile unsigned long *addr) { #if defined(__x86_64__) || defined(__i386__) || defined(__arch_um__) barrier(); #else smp_mb(); /* Be on the safe side for alpha, and others */ #endif clear_bit(nr, addr); } #endif #ifndef COMPAT_HAVE_BOOL_TYPE typedef _Bool bool; enum { false = 0, true = 1 }; #define COMPAT_HAVE_BOOL_TYPE #endif #ifndef COMPAT_HLIST_FOR_EACH_ENTRY_HAS_THREE_PARAMETERS #define hlist_entry_safe(ptr, type, member) \ ((ptr) ? hlist_entry(ptr, type, member) : NULL) #ifdef hlist_for_each_entry #undef hlist_for_each_entry #endif #define hlist_for_each_entry(pos, head, member) \ for (pos = hlist_entry_safe((head)->first, typeof(*(pos)), member);\ pos; \ pos = hlist_entry_safe((pos)->member.next, typeof(*(pos)), member)) #define COMPAT_HLIST_FOR_EACH_ENTRY_HAS_THREE_PARAMETERS #endif /* End of Compatibility code */ /* This header file (and its .c file; kernel-doc of functions see there) define a helper framework to easily keep track of index:label associations, and changes to an "active set" of objects, as well as pending transactions, to persistently record those changes. We use an LRU policy if it is necessary to "cool down" a region currently in the active set before we can "heat" a previously unused region. Because of this latter property, it is called "lru_cache". As it actually Tracks Objects in an Active SeT, we could also call it toast (incidentally that is what may happen to the data on the backend storage upon next resync, if we don't get it right). What for? We replicate IO (more or less synchronously) to local and remote disk.
For crash recovery after replication node failure, we need to resync all regions that have been target of in-flight WRITE IO (in use, or "hot", regions), as we don't know whether or not those WRITEs have made it to stable storage. To avoid a "full resync", we need to persistently track these regions. This is known as "write intent log", and can be implemented as on-disk (coarse or fine grained) bitmap, or other meta data. To avoid the overhead of frequent extra writes to this meta data area, usually the condition is softened to regions that _may_ have been target of in-flight WRITE IO, e.g. by only lazily clearing the on-disk write-intent bitmap, trading frequency of meta data transactions against amount of (possibly unnecessary) resync traffic. If we set a hard limit on the area that may be "hot" at any given time, we limit the amount of resync traffic needed for crash recovery. For recovery after replication link failure, we need to resync all blocks that have been changed on the other replica in the mean time, or, if both replica have been changed independently [*], all blocks that have been changed on either replica in the mean time. [*] usually as a result of a cluster split-brain and insufficient protection. but there are valid use cases to do this on purpose. Tracking those blocks can be implemented as "dirty bitmap". Having it fine-grained reduces the amount of resync traffic. It should also be persistent, to allow for reboots (or crashes) while the replication link is down. There are various possible implementations for persistently storing write intent log information, three of which are mentioned here. "Chunk dirtying" The on-disk "dirty bitmap" may be re-used as "write-intent" bitmap as well. 
To reduce the frequency of bitmap updates for write-intent log purposes, one could dirty "chunks" (of some size) at a time of the (fine grained) on-disk bitmap, while keeping the in-memory "dirty" bitmap as clean as possible, flushing it to disk again when a previously "hot" (and on-disk dirtied as full chunk) area "cools down" again (no IO in flight anymore, and none expected in the near future either). "Explicit (coarse) write intent bitmap" Another implementation could choose a (probably coarse) explicit bitmap, for write-intent log purposes, in addition to the fine grained dirty bitmap. "Activity log" Yet another implementation may keep track of the hot regions, by starting with an empty set, and writing down a journal of region numbers that have become "hot", or have "cooled down" again. To be able to use a ring buffer for this journal of changes to the active set, we not only record the actual changes to that set, but also record the unchanged members of the set in a round robin fashion. To do so, we use a fixed (but configurable) number of slots which we can identify by index, and associate region numbers (labels) with these indices. For each transaction recording a change to the active set, we record the change itself (index: -old_label, +new_label), and which index is associated with which label (index: current_label) within a certain sliding window that is moved further over the available indices with each such transaction. Thus, for crash recovery, if the ringbuffer is sufficiently large, we can accurately reconstruct the active set. Sufficiently large depends only on maximum number of active objects, and the size of the sliding window recording "index: current_label" associations within each transaction. This is what we call the "activity log". Currently we need one activity log transaction per single label change, which does not give much benefit over the "dirty chunks of bitmap" approach, other than potentially fewer seeks.
We plan to change the transaction format to support multiple changes per transaction, which then would reduce several (disjoint, "random") updates to the bitmap into one transaction to the activity log ring buffer. */ /* this defines an element in a tracked set * .colision is for hash table lookup. * When we process a new IO request, we know its sector, thus can deduce the * region number (label) easily. To do the label -> object lookup without a * full list walk, we use a simple hash table. * * .list is on one of three lists: * in_use: currently in use (refcnt > 0, lc_number != LC_FREE) * lru: unused but ready to be reused or recycled * (lc_refcnt == 0, lc_number != LC_FREE), * free: unused but ready to be recycled * (lc_refcnt == 0, lc_number == LC_FREE), * * an element is said to be "in the active set", * if either on "in_use" or "lru", i.e. lc_number != LC_FREE. * * DRBD currently (May 2009) only uses 61 elements on the resync lru_cache * (total memory usage 2 pages), and up to 3833 elements on the act_log * lru_cache, totalling ~215 kB for 64bit architecture, ~53 pages. * * We usually do not actually free these objects again, but only "recycle" * them, as the change "index: -old_label, +LC_FREE" would need a transaction * as well. Which also means that using a kmem_cache to allocate the objects * from wastes some resources. * But it avoids high order page allocations in kmalloc. 
*/ struct lc_element { struct hlist_node colision; struct list_head list; /* LRU list or free list */ unsigned refcnt; /* back "pointer" into lru_cache->lc_element[index], * for paranoia, and for lc_index_of() */ unsigned lc_index; /* if we want to track a larger set of objects, * it needs to become arch independent u64 */ unsigned lc_number; /* special label when on free list */ #define LC_FREE (~0U) /* for pending changes */ unsigned lc_new_number; }; struct lru_cache { /* the least recently used item is kept at lru->prev */ struct list_head lru; struct list_head free; struct list_head in_use; struct list_head to_be_changed; /* the pre-created kmem cache to allocate the objects from */ struct kmem_cache *lc_cache; /* size of tracked objects, used to memset(,0,) them in lc_reset */ size_t element_size; /* offset of struct lc_element member in the tracked object */ size_t element_off; /* number of elements (indices) */ unsigned int nr_elements; /* Arbitrary limit on maximum tracked objects. Practical limit is much * lower due to allocation failures, probably. For typical use cases, * nr_elements should be a few thousand at most. * This also limits the maximum value of lc_element.lc_index, allowing the * 8 high bits of .lc_index to be overloaded with flags in the future. */ #define LC_MAX_ACTIVE (1<<24) /* allow to accumulate a few (index:label) changes, * but no more than max_pending_changes */ unsigned int max_pending_changes; /* number of elements currently on to_be_changed list */ unsigned int pending_changes; /* statistics */ unsigned used; /* number of elements currently on in_use list */ unsigned long hits, misses, starving, locked, changed; /* see below: flag-bits for lru_cache */ unsigned long flags; void *lc_private; const char *name; /* nr_elements there */ struct hlist_head *lc_slot; struct lc_element **lc_element; }; /* flag-bits for lru_cache */ enum { /* debugging aid, to catch concurrent access early.
* user needs to guarantee exclusive access by proper locking! */ __LC_PARANOIA, /* annotate that the set is "dirty", possibly accumulating further * changes, until a transaction is finally triggered */ __LC_DIRTY, /* Locked, no further changes allowed. * Also used to serialize changing transactions. */ __LC_LOCKED, /* if we need to change the set, but currently there is no free nor * unused element available, we are "starving", and must not give out * further references, to guarantee that eventually some refcnt will * drop to zero and we will be able to make progress again, changing * the set, writing the transaction. * if the statistics say we are frequently starving, * nr_elements is too small. */ __LC_STARVING, }; #define LC_PARANOIA (1<<__LC_PARANOIA) #define LC_DIRTY (1<<__LC_DIRTY) #define LC_LOCKED (1<<__LC_LOCKED) #define LC_STARVING (1<<__LC_STARVING) extern struct lru_cache *lc_create(const char *name, struct kmem_cache *cache, unsigned max_pending_changes, unsigned e_count, size_t e_size, size_t e_off); extern void lc_reset(struct lru_cache *lc); extern void lc_destroy(struct lru_cache *lc); extern void lc_set(struct lru_cache *lc, unsigned int enr, int index); extern void lc_del(struct lru_cache *lc, struct lc_element *element); extern struct lc_element *lc_get_cumulative(struct lru_cache *lc, unsigned int enr); extern struct lc_element *lc_try_get(struct lru_cache *lc, unsigned int enr); extern struct lc_element *lc_find(struct lru_cache *lc, unsigned int enr); extern struct lc_element *lc_get(struct lru_cache *lc, unsigned int enr); extern unsigned int lc_put(struct lru_cache *lc, struct lc_element *e); extern void lc_committed(struct lru_cache *lc); struct seq_file; extern size_t lc_seq_printf_stats(struct seq_file *seq, struct lru_cache *lc); extern void lc_seq_dump_details(struct seq_file *seq, struct lru_cache *lc, char *utext, void (*detail) (struct seq_file *, struct lc_element *)); /** * lc_try_lock_for_transaction - can be used to stop 
lc_get() from changing the tracked set * @lc: the lru cache to operate on * * Allows (expects) the set to be "dirty". Note that the reference counts and * order on the active and lru lists may still change. Used to serialize * changing transactions. Returns true if we acquired the lock. */ static inline int lc_try_lock_for_transaction(struct lru_cache *lc) { return !test_and_set_bit(__LC_LOCKED, &lc->flags); } /** * lc_try_lock - variant to stop lc_get() from changing the tracked set * @lc: the lru cache to operate on * * Note that the reference counts and order on the active and lru lists may * still change. Only works on a "clean" set. Returns true if we acquired the * lock, which means there are no pending changes, and any further attempt to * change the set will not succeed until the next lc_unlock(). */ extern int lc_try_lock(struct lru_cache *lc); /** * lc_unlock - unlock @lc, allow lc_get() to change the set again * @lc: the lru cache to operate on */ static inline void lc_unlock(struct lru_cache *lc) { clear_bit(__LC_DIRTY, &lc->flags); clear_bit_unlock(__LC_LOCKED, &lc->flags); } extern bool lc_is_used(struct lru_cache *lc, unsigned int enr); #define lc_entry(ptr, type, member) \ container_of(ptr, type, member) extern struct lc_element *lc_element_by_index(struct lru_cache *lc, unsigned i); extern unsigned int lc_index_of(struct lru_cache *lc, struct lc_element *e); #endif drbd-8.4.4/drbd/lru_cache.c0000664000000000000000000004507012221261130014142 0ustar rootroot/* lru_cache.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2003-2008, LINBIT Information Technologies GmbH. Copyright (C) 2003-2008, Philipp Reisner . Copyright (C) 2003-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.
drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #include #include #include #include /* for memset */ #include /* for seq_printf */ #include /* this is developers aid only. * it catches concurrent access (lack of locking on the users part) */ #define PARANOIA_ENTRY() do { \ BUG_ON(!lc); \ BUG_ON(!lc->nr_elements); \ BUG_ON(test_and_set_bit(__LC_PARANOIA, &lc->flags)); \ } while (0) #define RETURN(x...) do { \ clear_bit_unlock(__LC_PARANOIA, &lc->flags); \ return x ; } while (0) /* BUG() if e is not one of the elements tracked by lc */ #define PARANOIA_LC_ELEMENT(lc, e) do { \ struct lru_cache *lc_ = (lc); \ struct lc_element *e_ = (e); \ unsigned i = e_->lc_index; \ BUG_ON(i >= lc_->nr_elements); \ BUG_ON(lc_->lc_element[i] != e_); } while (0) /* We need to atomically * - try to grab the lock (set LC_LOCKED) * - only if there is no pending transaction * (neither LC_DIRTY nor LC_STARVING is set) * Because of PARANOIA_ENTRY() above abusing lc->flags as well, * it is not sufficient to just say * return 0 == cmpxchg(&lc->flags, 0, LC_LOCKED); */ int lc_try_lock(struct lru_cache *lc) { unsigned long val; do { val = cmpxchg(&lc->flags, 0, LC_LOCKED); } while (unlikely (val == LC_PARANOIA)); /* Spin until no-one is inside a PARANOIA_ENTRY()/RETURN() section. */ return 0 == val; #if 0 /* Alternative approach, spin in case someone enters or leaves a * PARANOIA_ENTRY()/RETURN() section. 
*/ unsigned long old, new, val; do { old = lc->flags & LC_PARANOIA; new = old | LC_LOCKED; val = cmpxchg(&lc->flags, old, new); } while (unlikely (val == (old ^ LC_PARANOIA))); return old == val; #endif } /** * lc_create - prepares to track objects in an active set * @name: descriptive name only used in lc_seq_printf_stats and lc_seq_dump_details * @max_pending_changes: maximum changes to accumulate until a transaction is required * @e_count: number of elements allowed to be active simultaneously * @e_size: size of the tracked objects * @e_off: offset to the &struct lc_element member in a tracked object * * Returns a pointer to a newly initialized struct lru_cache on success, * or NULL on (allocation) failure. */ struct lru_cache *lc_create(const char *name, struct kmem_cache *cache, unsigned max_pending_changes, unsigned e_count, size_t e_size, size_t e_off) { struct hlist_head *slot = NULL; struct lc_element **element = NULL; struct lru_cache *lc; struct lc_element *e; unsigned cache_obj_size = kmem_cache_size(cache); unsigned i; WARN_ON(cache_obj_size < e_size); if (cache_obj_size < e_size) return NULL; /* e_count too big; would probably fail the allocation below anyways. * for typical use cases, e_count should be few thousand at most. 
*/ if (e_count > LC_MAX_ACTIVE) return NULL; slot = kcalloc(e_count, sizeof(struct hlist_head), GFP_KERNEL); if (!slot) goto out_fail; element = kcalloc(e_count, sizeof(struct lc_element *), GFP_KERNEL); if (!element) goto out_fail; lc = kzalloc(sizeof(*lc), GFP_KERNEL); if (!lc) goto out_fail; INIT_LIST_HEAD(&lc->in_use); INIT_LIST_HEAD(&lc->lru); INIT_LIST_HEAD(&lc->free); INIT_LIST_HEAD(&lc->to_be_changed); lc->name = name; lc->element_size = e_size; lc->element_off = e_off; lc->nr_elements = e_count; lc->max_pending_changes = max_pending_changes; lc->lc_cache = cache; lc->lc_element = element; lc->lc_slot = slot; /* preallocate all objects */ for (i = 0; i < e_count; i++) { void *p = kmem_cache_alloc(cache, GFP_KERNEL); if (!p) break; memset(p, 0, lc->element_size); e = p + e_off; e->lc_index = i; e->lc_number = LC_FREE; e->lc_new_number = LC_FREE; list_add(&e->list, &lc->free); element[i] = e; } if (i == e_count) return lc; /* else: could not allocate all elements, give up. * Free elements [0, i), including index 0; do nothing if i == 0. */ while (i) { void *p = element[--i]; kmem_cache_free(cache, p - e_off); } kfree(lc); out_fail: kfree(element); kfree(slot); return NULL; } void lc_free_by_index(struct lru_cache *lc, unsigned i) { void *p = lc->lc_element[i]; WARN_ON(!p); if (p) { p -= lc->element_off; kmem_cache_free(lc->lc_cache, p); } } /** * lc_destroy - frees memory allocated by lc_create() * @lc: the lru cache to destroy */ void lc_destroy(struct lru_cache *lc) { unsigned i; if (!lc) return; for (i = 0; i < lc->nr_elements; i++) lc_free_by_index(lc, i); kfree(lc->lc_element); kfree(lc->lc_slot); kfree(lc); } /** * lc_reset - does a full reset for @lc and the hash table slots.
* @lc: the lru cache to operate on * * It is roughly the equivalent of re-allocating a fresh lru_cache object, * basically a short cut to lc_destroy(lc); lc = lc_create(...); */ void lc_reset(struct lru_cache *lc) { unsigned i; INIT_LIST_HEAD(&lc->in_use); INIT_LIST_HEAD(&lc->lru); INIT_LIST_HEAD(&lc->free); INIT_LIST_HEAD(&lc->to_be_changed); lc->used = 0; lc->hits = 0; lc->misses = 0; lc->starving = 0; lc->locked = 0; lc->changed = 0; lc->pending_changes = 0; lc->flags = 0; memset(lc->lc_slot, 0, sizeof(struct hlist_head) * lc->nr_elements); for (i = 0; i < lc->nr_elements; i++) { struct lc_element *e = lc->lc_element[i]; void *p = e; p -= lc->element_off; memset(p, 0, lc->element_size); /* re-init it */ e->lc_index = i; e->lc_number = LC_FREE; e->lc_new_number = LC_FREE; list_add(&e->list, &lc->free); } } /** * lc_seq_printf_stats - print stats about @lc into @seq * @seq: the seq_file to print into * @lc: the lru cache to print statistics of */ size_t lc_seq_printf_stats(struct seq_file *seq, struct lru_cache *lc) { /* NOTE: * total calls to lc_get are * (starving + hits + misses) * misses include "locked" count (update from another thread in * progress) and "changed", when this in fact led to a successful * update of the cache. */ return seq_printf(seq, "\t%s: used:%u/%u " "hits:%lu misses:%lu starving:%lu locked:%lu changed:%lu\n", lc->name, lc->used, lc->nr_elements, lc->hits, lc->misses, lc->starving, lc->locked, lc->changed); } static struct hlist_head *lc_hash_slot(struct lru_cache *lc, unsigned int enr) { return lc->lc_slot + (enr % lc->nr_elements); } static struct lc_element *__lc_find(struct lru_cache *lc, unsigned int enr, bool include_changing) { struct lc_element *e; BUG_ON(!lc); BUG_ON(!lc->nr_elements); hlist_for_each_entry(e, lc_hash_slot(lc, enr), colision) { /* "about to be changed" elements, pending transaction commit, * are hashed by their "new number". "Normal" elements have * lc_number == lc_new_number.
*/ if (e->lc_new_number != enr) continue; if (e->lc_new_number == e->lc_number || include_changing) return e; break; } return NULL; } /** * lc_find - find element by label, if present in the hash table * @lc: The lru_cache object * @enr: element number * * Returns the pointer to an element, if the element with the requested * "label" or element number is present in the hash table, * or NULL if not found. Does not change the refcnt. * Ignores elements that are "about to be used", i.e. not yet in the active * set, but still pending transaction commit. */ struct lc_element *lc_find(struct lru_cache *lc, unsigned int enr) { return __lc_find(lc, enr, 0); } /** * lc_is_used - find element by label * @lc: The lru_cache object * @enr: element number * * Returns true, if the element with the requested "label" or element number is * present in the hash table, and is used (refcnt > 0). * Also finds elements that are not _currently_ used but only "about to be * used", i.e. on the "to_be_changed" list, pending transaction commit. */ bool lc_is_used(struct lru_cache *lc, unsigned int enr) { struct lc_element *e = __lc_find(lc, enr, 1); return e && e->refcnt; } /** * lc_del - removes an element from the cache * @lc: The lru_cache object * @e: The element to remove * * @e must be unused (refcnt == 0). Moves @e from "lru" to "free" list, * sets @e->lc_number and @e->lc_new_number to %LC_FREE.
*/ void lc_del(struct lru_cache *lc, struct lc_element *e) { PARANOIA_ENTRY(); PARANOIA_LC_ELEMENT(lc, e); BUG_ON(e->refcnt); e->lc_number = e->lc_new_number = LC_FREE; hlist_del_init(&e->colision); list_move(&e->list, &lc->free); RETURN(); } static struct lc_element *lc_prepare_for_change(struct lru_cache *lc, unsigned new_number) { struct list_head *n; struct lc_element *e; if (!list_empty(&lc->free)) n = lc->free.next; else if (!list_empty(&lc->lru)) n = lc->lru.prev; else return NULL; e = list_entry(n, struct lc_element, list); PARANOIA_LC_ELEMENT(lc, e); e->lc_new_number = new_number; if (!hlist_unhashed(&e->colision)) __hlist_del(&e->colision); hlist_add_head(&e->colision, lc_hash_slot(lc, new_number)); list_move(&e->list, &lc->to_be_changed); return e; } static int lc_unused_element_available(struct lru_cache *lc) { if (!list_empty(&lc->free)) return 1; /* something on the free list */ if (!list_empty(&lc->lru)) return 1; /* something to evict */ return 0; } /* used as internal flags to __lc_get */ enum { LC_GET_MAY_CHANGE = 1, LC_GET_MAY_USE_UNCOMMITTED = 2, }; static struct lc_element *__lc_get(struct lru_cache *lc, unsigned int enr, unsigned int flags) { struct lc_element *e; PARANOIA_ENTRY(); if (lc->flags & LC_STARVING) { ++lc->starving; RETURN(NULL); } e = __lc_find(lc, enr, 1); /* if lc_new_number != lc_number, * this enr is currently being pulled in already, * and will be available once the pending transaction * has been committed. */ if (e) { if (e->lc_new_number != e->lc_number) { /* It has been found above, but on the "to_be_changed" * list, not yet committed. Don't pull it in twice, * wait for the transaction, then try again... */ if (!(flags & LC_GET_MAY_USE_UNCOMMITTED)) RETURN(NULL); /* ... unless the caller is aware of the implications, * probably preparing a cumulative transaction. */ ++e->refcnt; ++lc->hits; RETURN(e); } /* else: lc_new_number == lc_number; a real hit. 
*/ ++lc->hits; if (e->refcnt++ == 0) lc->used++; list_move(&e->list, &lc->in_use); /* Not evictable... */ RETURN(e); } /* e == NULL */ ++lc->misses; if (!(flags & LC_GET_MAY_CHANGE)) RETURN(NULL); /* To avoid races with lc_try_lock(), first, mark us dirty * (using test_and_set_bit, as it implies memory barriers), ... */ test_and_set_bit(__LC_DIRTY, &lc->flags); /* ... only then check if it is locked anyways. If lc_unlock clears * the dirty bit again, that's not a problem, we will come here again. */ if (test_bit(__LC_LOCKED, &lc->flags)) { ++lc->locked; RETURN(NULL); } /* In case there is nothing available and we can not kick out * the LRU element, we have to wait ... */ if (!lc_unused_element_available(lc)) { __set_bit(__LC_STARVING, &lc->flags); RETURN(NULL); } /* It was not present in the active set. We are going to recycle an * unused (or even "free") element, but we won't accumulate more than * max_pending_changes changes. */ if (lc->pending_changes >= lc->max_pending_changes) RETURN(NULL); e = lc_prepare_for_change(lc, enr); BUG_ON(!e); clear_bit(__LC_STARVING, &lc->flags); BUG_ON(++e->refcnt != 1); lc->used++; lc->pending_changes++; RETURN(e); } /** * lc_get - get element by label, maybe change the active set * @lc: the lru cache to operate on * @enr: the label to look up * * Finds an element in the cache, increases its usage count, * "touches" and returns it. * * In case the requested number is not present, it needs to be added to the * cache. Therefore it is possible that another element is evicted from * the cache. In either case, the user is notified and can then, e.g., keep * a persistent log of the cache changes, and therefore the objects in use. * * Return values: * NULL * The cache was marked %LC_STARVING, * or the requested label was not in the active set * and a changing transaction is still pending (@lc was marked %LC_DIRTY).
* Or no unused or free element could be recycled (@lc will be marked as * %LC_STARVING, blocking further lc_get() operations). * * pointer to the element with the REQUESTED element number. * In this case, it can be used right away * * pointer to an UNUSED element with some different element number, * where that different number may also be %LC_FREE. * * In this case, the cache is marked %LC_DIRTY, * so lc_try_lock() will no longer succeed. * The returned element pointer is moved to the "to_be_changed" list, * and registered with the new element number on the hash collision chains, * so it is possible to pick it up from lc_is_used(). * Up to "max_pending_changes" (see lc_create()) can be accumulated. * The user now should do whatever housekeeping is necessary, * typically serialize on lc_try_lock_for_transaction(), then call * lc_committed(lc) and lc_unlock(), to finish the change. * * NOTE: The user needs to check the lc_number on EACH use, so he recognizes * any cache set change. */ struct lc_element *lc_get(struct lru_cache *lc, unsigned int enr) { return __lc_get(lc, enr, LC_GET_MAY_CHANGE); } /** * lc_get_cumulative - like lc_get; also finds to-be-changed elements * @lc: the lru cache to operate on * @enr: the label to look up * * Unlike lc_get this also returns the element for @enr, if it is belonging to * a pending transaction, so the return values are like for lc_get(), * plus: * * pointer to an element already on the "to_be_changed" list. * In this case, the cache was already marked %LC_DIRTY. * * Caller needs to make sure that the pending transaction is completed, * before proceeding to actually use this element. 
*/ struct lc_element *lc_get_cumulative(struct lru_cache *lc, unsigned int enr) { return __lc_get(lc, enr, LC_GET_MAY_CHANGE|LC_GET_MAY_USE_UNCOMMITTED); } /** * lc_try_get - get element by label, if present; do not change the active set * @lc: the lru cache to operate on * @enr: the label to look up * * Finds an element in the cache, increases its usage count, * "touches" and returns it. * * Return values: * NULL * The cache was marked %LC_STARVING, * or the requested label was not in the active set * * pointer to the element with the REQUESTED element number. * In this case, it can be used right away */ struct lc_element *lc_try_get(struct lru_cache *lc, unsigned int enr) { return __lc_get(lc, enr, 0); } /** * lc_committed - tell @lc that pending changes have been recorded * @lc: the lru cache to operate on * * User is expected to serialize on explicit lc_try_lock_for_transaction() * before the transaction is started, and later needs to lc_unlock() explicitly * as well. */ void lc_committed(struct lru_cache *lc) { struct lc_element *e, *tmp; PARANOIA_ENTRY(); list_for_each_entry_safe(e, tmp, &lc->to_be_changed, list) { /* count number of changes, not number of transactions */ ++lc->changed; e->lc_number = e->lc_new_number; list_move(&e->list, &lc->in_use); } lc->pending_changes = 0; RETURN(); } /** * lc_put - give up refcnt of @e * @lc: the lru cache to operate on * @e: the element to put * * If refcnt reaches zero, the element is moved to the lru list, * and a %LC_STARVING (if set) is cleared. * Returns the new (post-decrement) refcnt. */ unsigned int lc_put(struct lru_cache *lc, struct lc_element *e) { PARANOIA_ENTRY(); PARANOIA_LC_ELEMENT(lc, e); BUG_ON(e->refcnt == 0); BUG_ON(e->lc_number != e->lc_new_number); if (--e->refcnt == 0) { /* move it to the front of LRU. 
*/ list_move(&e->list, &lc->lru); lc->used--; clear_bit_unlock(__LC_STARVING, &lc->flags); } RETURN(e->refcnt); } /** * lc_element_by_index * @lc: the lru cache to operate on * @i: the index of the element to return */ struct lc_element *lc_element_by_index(struct lru_cache *lc, unsigned i) { BUG_ON(i >= lc->nr_elements); BUG_ON(lc->lc_element[i] == NULL); BUG_ON(lc->lc_element[i]->lc_index != i); return lc->lc_element[i]; } /** * lc_index_of * @lc: the lru cache to operate on * @e: the element to query for its index position in lc->element */ unsigned int lc_index_of(struct lru_cache *lc, struct lc_element *e) { PARANOIA_LC_ELEMENT(lc, e); return e->lc_index; } /** * lc_set - associate index with label * @lc: the lru cache to operate on * @enr: the label to set * @index: the element index to associate label with. * * Used to initialize the active set to some previously recorded state. */ void lc_set(struct lru_cache *lc, unsigned int enr, int index) { struct lc_element *e; struct list_head *lh; if (index < 0 || index >= lc->nr_elements) return; e = lc_element_by_index(lc, index); BUG_ON(e->lc_number != e->lc_new_number); BUG_ON(e->refcnt != 0); e->lc_number = e->lc_new_number = enr; hlist_del_init(&e->colision); if (enr == LC_FREE) lh = &lc->free; else { hlist_add_head(&e->colision, lc_hash_slot(lc, enr)); lh = &lc->lru; } list_move(&e->list, lh); } /** * lc_dump - Dump a complete LRU cache to seq in textual form. * @lc: the lru cache to operate on * @seq: the &struct seq_file pointer to seq_printf into * @utext: user supplied "heading" or other info * @detail: function pointer the user may provide to dump further details * of the object the lc_element is embedded in. 
*/ void lc_seq_dump_details(struct seq_file *seq, struct lru_cache *lc, char *utext, void (*detail) (struct seq_file *, struct lc_element *)) { unsigned int nr_elements = lc->nr_elements; struct lc_element *e; int i; seq_printf(seq, "\tnn: lc_number refcnt %s\n ", utext); for (i = 0; i < nr_elements; i++) { e = lc_element_by_index(lc, i); if (e->lc_number == LC_FREE) { seq_printf(seq, "\t%2d: FREE\n", i); } else { seq_printf(seq, "\t%2d: %4u %4u ", i, e->lc_number, e->refcnt); detail(seq, e); } } } drbd-8.4.4/filelist-redhat0000664000000000000000000000045111753207431014150 0ustar rootroot%defattr(644,root,root,755) %doc COPYING %doc ChangeLog %if 0%(grep -q "release 5" /etc/redhat-release && echo 1) /lib/modules/%verrel%variant %doc obj/k-config-%verrel%variant.gz %else /lib/modules/%verrel%dotvariant %doc obj/k-config-%verrel%dotvariant.gz %endif %config /etc/depmod.d/drbd.conf drbd-8.4.4/filelist-suse0000664000000000000000000000036011753207431013657 0ustar rootroot%defattr(-,root,root) %doc COPYING %doc ChangeLog %if %{defined 3} # on sles10, _suse_kernel_module_subpackage takes 3 arguments still /lib/modules/%3-%1 %doc obj/k-config-%3-%1.gz %else /lib/modules/%2-%1 %doc obj/k-config-%2-%1.gz %endif drbd-8.4.4/preamble0000664000000000000000000000074212004012032012637 0ustar rootroot# always require a suitable userland Requires: drbd-utils = %{version} %if %{defined suse_kernel_module_package} %if 0%{?sles_version} == 10 %{expand:%(cat %_sourcedir/preamble-sles10)} %else %if 0%{?sles_version} == 11 %{expand:%(cat %_sourcedir/preamble-sles11)} %endif %endif %else %if 0%((test -e /etc/redhat-release && grep -q "release 5" /etc/redhat-release) && echo 1) %{expand:%(cat %_sourcedir/preamble-rhel5)} # CentOS: Conflicts: kmod-drbd82 kmod-drbd83 %endif %endif drbd-8.4.4/preamble-rhel50000664000000000000000000001240411753207431013675 0ustar rootrootProvides: drbd-km-2.6.18_238.1.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_238.1.1.el5%variant < 8.3.10 Provides:
drbd-km-2.6.18_238.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_238.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_194.32.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_194.32.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_194.26.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_194.26.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_194.17.4.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_194.17.4.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_194.17.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_194.17.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_194.11.4.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_194.11.4.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_194.11.3.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_194.11.3.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_194.11.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_194.11.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_194.8.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_194.8.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_194.3.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_194.3.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_194.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_194.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_164.15.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_164.15.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_164.11.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_164.11.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_164.10.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_164.10.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_164.9.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_164.9.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_164.6.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_164.6.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_164.2.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_164.2.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_164.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_164.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_128.7.1.el5%variant = 
8.3.10 Obsoletes: drbd-km-2.6.18_128.7.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_128.4.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_128.4.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_128.2.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_128.2.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_128.1.16.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_128.1.16.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_128.1.14.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_128.1.14.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_128.1.10.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_128.1.10.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_128.1.6.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_128.1.6.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_128.1.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_128.1.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_128.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_128.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_92.1.22.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_92.1.22.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_92.1.18.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_92.1.18.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_92.1.13.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_92.1.13.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_92.1.10.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_92.1.10.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_92.1.6.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_92.1.6.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_92.1.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_92.1.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_92.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_92.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_53.1.21.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_53.1.21.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_53.1.19.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_53.1.19.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_53.1.14.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_53.1.14.el5%variant 
< 8.3.10 Provides: drbd-km-2.6.18_53.1.13.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_53.1.13.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_53.1.6.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_53.1.6.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_53.1.4.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_53.1.4.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_53.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_53.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_8.1.15.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_8.1.15.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_8.1.14.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_8.1.14.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_8.1.8.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_8.1.8.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_8.1.6.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_8.1.6.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_8.1.4.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_8.1.4.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_8.1.3.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_8.1.3.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_8.1.1.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_8.1.1.el5%variant < 8.3.10 Provides: drbd-km-2.6.18_8.el5%variant = 8.3.10 Obsoletes: drbd-km-2.6.18_8.el5%variant < 8.3.10 drbd-8.4.4/preamble-sles100000664000000000000000000000455011753207431013770 0ustar rootrootProvides: drbd-km-2.6.16.60_0.60.1_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.60.1_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.59.1_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.59.1_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.58.1_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.58.1_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.54.5_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.54.5_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.42.7_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.42.7_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.42.5_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.42.5_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.42.4_%1 = 8.3.10 Obsoletes: 
drbd-km-2.6.16.60_0.42.4_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.39.3_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.39.3_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.37_f594963d_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.37_f594963d_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.34_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.34_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.33_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.33_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.31_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.31_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.30_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.30_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.29_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.29_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.27_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.27_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.25_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.25_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.23_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.23_%1 < 8.3.10 Provides: drbd-km-2.6.16.60_0.21_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.60_0.21_%1 < 8.3.10 Provides: drbd-km-2.6.16.54_0.2.5_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.54_0.2.5_%1 < 8.3.10 Provides: drbd-km-2.6.16.54_0.2.3_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.54_0.2.3_%1 < 8.3.10 Provides: drbd-km-2.6.16.53_0.16_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.53_0.16_%1 < 8.3.10 Provides: drbd-km-2.6.16_53_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16_53_%1 < 8.3.10 Provides: drbd-km-2.6.16.46_0.14_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.46_0.14_%1 < 8.3.10 Provides: drbd-km-2.6.16.46_0.12_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.46_0.12_%1 < 8.3.10 Provides: drbd-km-2.6.16.21_0.15_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.21_0.15_%1 < 8.3.10 Provides: drbd-km-2.6.16.21_0.8_%1 = 8.3.10 Obsoletes: drbd-km-2.6.16.21_0.8_%1 < 8.3.10 drbd-8.4.4/preamble-sles110000664000000000000000000000277611753207431014001 0ustar rootroot# SLES 11 SP1 Provides: drbd-km-2.6.32.27_0.2_%1 = 8.3.10 Obsoletes: drbd-km-2.6.32.27_0.2_%1 < 8.3.10 Provides: 
drbd-km-2.6.32.24_0.2_%1 = 8.3.10 Obsoletes: drbd-km-2.6.32.24_0.2_%1 < 8.3.10 Provides: drbd-km-2.6.32.23_0.3_%1 = 8.3.10 Obsoletes: drbd-km-2.6.32.23_0.3_%1 < 8.3.10 Provides: drbd-km-2.6.32.19_0.3_%1 = 8.3.10 Obsoletes: drbd-km-2.6.32.19_0.3_%1 < 8.3.10 Provides: drbd-km-2.6.32.19_0.2_%1 = 8.3.10 Obsoletes: drbd-km-2.6.32.19_0.2_%1 < 8.3.10 Provides: drbd-km-2.6.32.13_0.5_%1 = 8.3.10 Obsoletes: drbd-km-2.6.32.13_0.5_%1 < 8.3.10 Provides: drbd-km-2.6.32.13_0.4_%1 = 8.3.10 Obsoletes: drbd-km-2.6.32.13_0.4_%1 < 8.3.10 Provides: drbd-km-2.6.32.12_0.7_%1 = 8.3.10 Obsoletes: drbd-km-2.6.32.12_0.7_%1 < 8.3.10 # SLES 11 Provides: drbd-km-2.6.27.45_0.1_%1 = 8.3.10 Obsoletes: drbd-km-2.6.27.45_0.1_%1 < 8.3.10 Provides: drbd-km-2.6.27.42_0.1_%1 = 8.3.10 Obsoletes: drbd-km-2.6.27.42_0.1_%1 < 8.3.10 Provides: drbd-km-2.6.27.39_0.3_%1 = 8.3.10 Obsoletes: drbd-km-2.6.27.39_0.3_%1 < 8.3.10 Provides: drbd-km-2.6.27.37_0.1_%1 = 8.3.10 Obsoletes: drbd-km-2.6.27.37_0.1_%1 < 8.3.10 Provides: drbd-km-2.6.27.29_0.1_%1 = 8.3.10 Obsoletes: drbd-km-2.6.27.29_0.1_%1 < 8.3.10 Provides: drbd-km-2.6.27.25_0.1_%1 = 8.3.10 Obsoletes: drbd-km-2.6.27.25_0.1_%1 < 8.3.10 Provides: drbd-km-2.6.27.23_0.1_%1 = 8.3.10 Obsoletes: drbd-km-2.6.27.23_0.1_%1 < 8.3.10 Provides: drbd-km-2.6.27.21_0.1_%1 = 8.3.10 Obsoletes: drbd-km-2.6.27.21_0.1_%1 < 8.3.10 Provides: drbd-km-2.6.27.19_5_%1 = 8.3.10 Obsoletes: drbd-km-2.6.27.19_5_%1 < 8.3.10 drbd-8.4.4/rpm-macro-fixes/README0000664000000000000000000000075411516050235015037 0ustar rootrootmacros.kernel-source.sles11-sp1.diff: Patch needed on SUSE products in order to allow building kernel module packages for a specific kernel version. See the patch for more detailed documentation. macros.kernel-source.sles11.diff: Same thing for sles11 (no sp1) suse_macros.sles10.diff: Similar thing for sles10 kmodtool.rhel5.diff Add filelist tag substitution capabilities to rhel5 kmodtool, and drop the dependency on a ...-kmod-common package, similar to what rhel6 does. 
drbd-8.4.4/rpm-macro-fixes/kmodtool.rhel5.diff0000664000000000000000000000170111516050235017650 0ustar rootroot--- /usr/lib/rpm/redhat/kmodtool +++ /usr/lib/rpm/redhat/kmodtool @@ -65,12 +65,19 @@ { local variant="${1}" local dashvariant="${variant:+-${variant}}" + local dotvariant="${variant:+.${variant}}" + case "$verrel" in *.el*) kdep="kernel${dashvariant}-%{_target_cpu} = ${verrel}" ;; *.EL*) kdep="kernel${dashvariant}-%{_target_cpu} = ${verrel}" ;; *) kdep="kernel-%{_target_cpu} = ${verrel}${variant}" ;; esac + echo "%global verrel $verrel" + echo "%global variant ${variant:-%nil}" + echo "%global dashvariant ${dashvariant:-%nil}" + echo "%global dotvariant ${dotvariant:-%nil}" + echo "%package -n kmod-${kmod_name}${dashvariant}" if [ -z "$kmp_provides_summary" ]; then @@ -100,7 +107,6 @@ fi cat <= %{?epoch:%{epoch}:}%{version} Requires(post): /sbin/depmod Requires(postun): /sbin/depmod EOF drbd-8.4.4/rpm-macro-fixes/macros.kernel-source.sles11-sp1.diff0000664000000000000000000000433611516050235022662 0ustar rootrootBy default, the %kernel_module_package will build packages for all kernel flavors it finds in /usr/src/linux-obj: this directory contains symlinks to the latest kernel-$flavor-devel packages installed. This default can be overridden by defining the %kernel_version macro on the rpmbuild command line. For example, you can build against version 2.6.32.19-0.2 with: rpmbuild --define 'kernel_version 2.6.32.19-0.2' When doing that, rpmbuild will iterate over the kernels defined in /usr/src/linux-%kernel_version-obj, instead. It is not possible to iterate over all installed kernel-$flavor-devel packages in one rpmbuild command: rpm only allows to build a single sub-package with a given name (for example, drbd-kmp-default), and cannot build two separate drbd-kmp-default sub-packages with different versions. 
Andreas Gruenbacher --- /etc/rpm/macros.kernel-source.orig +++ /etc/rpm/macros.kernel-source @@ -9,14 +9,14 @@ echo "%%define _suse_kernel_module_subpackage(n:v:r:f:p:) %%{expand:%%(cd %_sourcedir; cat $subpkg; echo %%%%nil)}" \ flavors_to_build= \ flavors="%*" \ - for flavor in $(ls /usr/src/linux-obj/%_target_cpu 2>/dev/null); do \ + for flavor in $(ls /usr/src/linux-%{?kernel_version:%kernel_version-}obj/%_target_cpu 2>/dev/null); do \ case " $flavors " in \ (*" $flavor "*) \ [ -n "%{-X}" ] && continue ;; \ (*) \ [ -z "%{-X}" -a -n "$flavors" ] && continue ;; \ esac \ - krel=$(make -s -C /usr/src/linux-obj/%_target_cpu/$flavor kernelrelease) \ + krel=$(make -s -C /usr/src/linux-%{?kernel_version:%kernel_version-}obj/%_target_cpu/$flavor kernelrelease) \ kver=${krel%%-*} \ [ -e /boot/symsets-$kver-$flavor.tar.gz ] || continue \ flavors_to_build="$flavors_to_build $flavor" \ @@ -24,7 +24,7 @@ done \ echo "%%global flavors_to_build${flavors_to_build:-%%nil}" \ echo "%%{expand:%%(test -z '%flavors_to_build' && echo %%%%internal_kmp_error)}" \ - echo "%%global kernel_source() /usr/src/linux-obj/%_target_cpu/%%%%{1}" \ + echo "%%global kernel_source() /usr/src/linux-%{?kernel_version:%kernel_version-}obj/%_target_cpu/%%%%{1}" \ echo "%%global kernel_module_package_moddir() updates" \ \ echo "%package -n %{-n*}%{!-n:%name}-kmp-_dummy_" \ drbd-8.4.4/rpm-macro-fixes/macros.kernel-source.sles11.diff0000664000000000000000000000263211516050235022156 0ustar rootrootSee comment in macros.kernel-source.sles11-sp1.diff --- /etc/rpm/macros.kernel-source.orig +++ /etc/rpm/macros.kernel-source @@ -9,14 +9,14 @@ echo "%%define _suse_kernel_module_subpackage(n:v:r:f:p:) %%{expand:%%(cd %_sourcedir; cat $subpkg; echo %%%%nil)}" \ flavors_to_build= \ flavors="%*" \ - for flavor in $(ls /usr/src/linux-obj/%_target_cpu 2>/dev/null); do \ + for flavor in $(ls /usr/src/linux-%{?kernel_version:%kernel_version-}obj/%_target_cpu 2>/dev/null); do \ case " $flavors " in \ (*" $flavor "*) \ [ 
-n "%{-X}" ] && continue ;; \ (*) \ [ -z "%{-X}" -a -n "$flavors" ] && continue ;; \ esac \ - krel=$(make -s -C /usr/src/linux-obj/%_target_cpu/$flavor kernelrelease) \ + krel=$(make -s -C /usr/src/linux-%{?kernel_version:%kernel_version-}obj/%_target_cpu/$flavor kernelrelease) \ kver=${krel%%-*} \ [ -e /boot/symsets-$kver-$flavor.tar.gz ] || continue \ flavors_to_build="$flavors_to_build $flavor" \ @@ -24,7 +24,7 @@ done \ echo "%%global flavors_to_build${flavors_to_build:-%%nil}" \ echo "%%{expand:%%(test -z '%flavors_to_build' && echo %%%%internal_kmp_error)}" \ - echo "%%global kernel_source() /usr/src/linux-obj/%_target_cpu/%%%%{1}" \ + echo "%%global kernel_source() /usr/src/linux-%{?kernel_version:%kernel_version-}obj/%_target_cpu/%%%%{1}" \ \ echo "%package -n %{-n*}%{!-n:%name}-kmp-_dummy_" \ echo "Version: %version" \ drbd-8.4.4/rpm-macro-fixes/macros.rhel5.diff0000664000000000000000000000126611516050235017312 0ustar rootroot--- /usr/lib/rpm/redhat/macros.orig +++ /usr/lib/rpm/redhat/macros @@ -170,8 +170,8 @@ %kernel_module_package(n:v:r:s:f:xp:) %{expand:%( \ %define kmodtool %{-s*}%{!-s:/usr/lib/rpm/redhat/kmodtool} \ - %define kmp_version %{-v*}%{!-v:%{version}} \ - %define kmp_release %{-r*}%{!-r:%{release}} \ + %global kmp_version %{-v*}%{!-v:%{version}} \ + %global kmp_release %{-r*}%{!-r:%{release}} \ %define latest_kernel %(rpm -q --qf '%{VERSION}-%{RELEASE}\\\\n' `rpm -q kernel-devel | /usr/lib/rpm/redhat/rpmsort -r | head -n 1` | head -n 1) \ %{!?kernel_version:%{expand:%%global kernel_version %{latest_kernel}}} \ %global kverrel %(%{kmodtool} verrel %{?kernel_version} 2>/dev/null) \ drbd-8.4.4/rpm-macro-fixes/suse_macros.sles10.diff0000664000000000000000000000406711517315301020442 0ustar rootrootSee comment in macros.kernel-source.sles11-sp1.diff --- /usr/lib/rpm/suse_macros.orig +++ /usr/lib/rpm/suse_macros @@ -473,12 +473,12 @@ # Defines %flavors_to_build as a side effect. 
%suse_kernel_module_package(n:v:r:s:f:xp:) \ -%{expand:%( \ +%{expand:%{expand:%( \ + ( \ subpkg=%{-s*}%{!-s:/usr/lib/rpm/rpm-suse-kernel-module-subpackage} \ echo "%%define _suse_kernel_module_subpackage(n:v:r:f:p:) %%{expand:%%(cd %_sourcedir; cat $subpkg; echo %%%%nil)}" \ - flavors="%{-x:%*}%{!-x:$(ls /usr/src/linux-obj/%_target_cpu 2>/dev/null)}" \ + flavors="%{-x:%*}%{!-x:$(ls /usr/src/linux-%{?kernel_version:%kernel_version-}obj/%_target_cpu 2>/dev/null)}" \ flavors_to_build= \ - kver=$(rpm -q --qf '%{VERSION}-%{RELEASE}' kernel-source) \ for flavor in $flavors; do \ if [ -z "%{-x}" ]; then \ case " %* " in \ @@ -486,19 +486,23 @@ continue ;; \ esac \ fi \ - krel=$(make -s -C /usr/src/linux-obj/%_target_cpu/$flavor kernelrelease) \ + krel=$(make -s -C /usr/src/linux-%{?kernel_version:%kernel_version-}obj/%_target_cpu/$flavor kernelrelease) \ + kver=${krel%%-*} \ [ -e /boot/symsets-$krel.tar.gz ] || continue \ flavors_to_build="$flavors_to_build $flavor" \ echo "%%_suse_kernel_module_subpackage -n %{-n*}%{!-n:%name}-kmp -v %{-v*}%{!-v:%version} -r %{-r*}%{!-r:%release} %{-p} $flavor $krel $kver" \ done \ echo "%%global flavors_to_build${flavors_to_build:-%%nil}" \ + echo "%%global kernel_source() /usr/src/linux-%{?kernel_version:%kernel_version-}obj/%_target_cpu/%%%%{1}" \ + echo "%%global kernel_module_package_moddir() updates" \ \ echo "%package -n %{-n*}%{!-n:%name}-kmp-_dummy_" \ echo "Version: %version" \ echo "Summary: %summary" \ echo "Group: %group" \ echo "%description -n %{-n*}%{!-n:%name}-kmp-_dummy_" \ - )} + ) | sed -e 's/%%/%%%%/g' \ + )}} %suse_version 1010 %sles_version 10 drbd-8.4.4/rpm-macro-fixes/symset-table.diff0000664000000000000000000000357211516050235017423 0ustar rootrootsymsets-xyz-tar.gz contain only the current symsets, and potentially compatible symsets. To be compatible by definition means to be a subset of the current symset. 
If we scan through the symsets in ascending order of their size in bytes, the first symset to match a particular symbol will be the "oldest", "most compatible". This way, even if the most recent kernel version provides some new symset containing new symbols, a kernel module package built against it will still only require the weakest symset(s) necessary, so will stay compatible on the rpm dependency level with all older kernels that provide the actually used symbols. Without the sorting and filtering, the resulting kmp would require all symsets the respective symbols are defined in, including the latest symset, even if only a subset of the contained symbols is actually used. Thus the kmp may become "incompatible" on the rpm level with older kernel versions, even though it works just fine with "weak-modules" on the actual symbol version level. --- /usr/lib/rpm/symset-table +++ /usr/lib/rpm/symset-table @@ -21,15 +21,26 @@ for symsets in *; do krel=${symsets#symsets-} - for symset in $symsets/*; do + for symset in $(ls -Sr $symsets/* ); do class=${symset##*/} ; class=${class%.*} hash=${symset##*.} awk ' BEGIN { FS = "\t" ; OFS = "\t" } { sub(/0x0*/, "", $1) - print krel "/" $1 "/" $2, class, hash } + print krel "/" $1, $2, class, hash } ' krel="$krel" class="$class" hash="$hash" $symset - done + done \ + | awk ' + # Filter out duplicate symbols. Since we went through the symset + # files in increasing size order, each symbol will remain in the + # table with the oldest symset it is defined in. + BEGIN { FS = "\t" ; OFS = "\t" } + { if ($2 in seen) + next + seen[$2]=1 + print $1 "/" $2, $3, $4 } + ' \ + | sort -t $'\t' -k 1,1 done # vim:shiftwidth=4 softtabstop=4 drbd-8.4.4/scripts/Makefile.in0000664000000000000000000001216312221413357014701 0ustar rootroot# Makefile for scripts # # This file is part of DRBD by Philipp Reisner & Lars Ellenberg. 
# # Copright 2001-2008 LINBIT Information Technologies # Philipp Reisner, Lars Ellenberg # # drbd is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2, or (at your option) # any later version. # # drbd is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with drbd; see the file COPYING. If not, write to # the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. # # for Debian: # update-rc.d drbd defaults 70 08 # # variables set by configure DISTRO = @DISTRO@ prefix = @prefix@ exec_prefix = @exec_prefix@ datarootdir = @datarootdir@ datadir = @datadir@ sbindir = @sbindir@ sysconfdir = @sysconfdir@ BASH_COMPLETION_SUFFIX = @BASH_COMPLETION_SUFFIX@ UDEV_RULE_SUFFIX = @UDEV_RULE_SUFFIX@ INITDIR = @INITDIR@ LIBDIR = @prefix@/lib/@PACKAGE_TARNAME@ LN_S = @LN_S@ # features enabled or disabled by configure WITH_UTILS = @WITH_UTILS@ WITH_KM = @WITH_KM@ WITH_UDEV = @WITH_UDEV@ WITH_XEN = @WITH_XEN@ WITH_PACEMAKER = @WITH_PACEMAKER@ WITH_HEARTBEAT = @WITH_HEARTBEAT@ WITH_RGMANAGER = @WITH_RGMANAGER@ WITH_BASHCOMPLETION = @WITH_BASHCOMPLETION@ # variables meant to be overridden from the make command line DESTDIR ?= / all: install: install-utils install-udev install-xen install-heartbeat install-pacemaker install-rgmanager install-bashcompletion install-utils: ifeq ($(WITH_UTILS),yes) install -d $(DESTDIR)$(INITDIR) install -m 755 drbd $(DESTDIR)$(INITDIR)/ @ if [ ! 
-e $(DESTDIR)$(sysconfdir)/drbd.conf ]; then \ install -d $(DESTDIR)$(sysconfdir)/; \ install -m 644 drbd.conf $(DESTDIR)$(sysconfdir)/; \ install -d $(DESTDIR)$(sysconfdir)/drbd.d; \ install -m 644 global_common.conf $(DESTDIR)$(sysconfdir)/drbd.d; \ fi install -d $(DESTDIR)$(LIBDIR) install -m 755 outdate-peer.sh $(DESTDIR)$(LIBDIR) install -m 755 snapshot-resync-target-lvm.sh $(DESTDIR)$(LIBDIR) install -m 755 notify.sh $(DESTDIR)$(LIBDIR) install -m 755 stonith_admin-fence-peer.sh $(DESTDIR)$(LIBDIR) ( set -e ; cd $(DESTDIR)$(LIBDIR) ;\ $(LN_S) -f snapshot-resync-target-lvm.sh unsnapshot-resync-target-lvm.sh ;\ $(LN_S) -f notify.sh notify-split-brain.sh ;\ $(LN_S) -f notify.sh notify-io-error.sh ;\ $(LN_S) -f notify.sh notify-pri-on-incon-degr.sh ;\ $(LN_S) -f notify.sh notify-pri-lost.sh ;\ $(LN_S) -f notify.sh notify-pri-lost-after-sb.sh ;\ $(LN_S) -f notify.sh notify-emergency-reboot.sh ;\ $(LN_S) -f notify.sh notify-emergency-shutdown.sh ;\ $(LN_S) -f notify.sh notify-out-of-sync.sh; ) install -d $(DESTDIR)$(sbindir) install -m 755 drbd-overview.pl $(DESTDIR)$(sbindir)/drbd-overview ifeq ($(DISTRO),debian) @ echo "Don't forget to run update-rc.d" else @ echo "Don't forget to run chkconfig" endif endif install-heartbeat: ifeq ($(WITH_HEARTBEAT),yes) mkdir -p $(DESTDIR)$(sysconfdir)/ha.d/resource.d install -m 755 drbddisk $(DESTDIR)$(sysconfdir)/ha.d/resource.d install -m 755 drbdupper $(DESTDIR)$(sysconfdir)/ha.d/resource.d endif # Do not use $(prefix) for the resource agent. The OCF standard # explicitly mandates where resource agents must live, # no matter what prefix is configured to. 
install-pacemaker: ifeq ($(WITH_PACEMAKER),yes) install -d $(DESTDIR)$(LIBDIR) install -m 755 crm-fence-peer.sh $(DESTDIR)$(LIBDIR) ( set -e ; cd $(DESTDIR)$(LIBDIR) ;\ $(LN_S) -f crm-fence-peer.sh crm-unfence-peer.sh ; ) mkdir -p $(DESTDIR)/usr/lib/ocf/resource.d/linbit install -m 755 drbd.ocf $(DESTDIR)/usr/lib/ocf/resource.d/linbit/drbd endif install-rgmanager: ifeq ($(WITH_RGMANAGER),yes) mkdir -p $(DESTDIR)$(datadir)/cluster install -m 755 drbd.sh.rhcs $(DESTDIR)$(datadir)/cluster/drbd.sh install -m 644 drbd.metadata.rhcs $(DESTDIR)$(datadir)/cluster/drbd.metadata install -d $(DESTDIR)$(LIBDIR) install -m 755 rhcs_fence $(DESTDIR)$(LIBDIR) endif install-xen: ifeq ($(WITH_XEN),yes) mkdir -p $(DESTDIR)$(sysconfdir)/xen/scripts install -m 755 block-drbd $(DESTDIR)$(sysconfdir)/xen/scripts endif install-udev: ifeq ($(WITH_UDEV),yes) mkdir -p $(DESTDIR)$(sysconfdir)/udev/rules.d install -m 644 drbd.rules $(DESTDIR)$(sysconfdir)/udev/rules.d/65-drbd.rules$(UDEV_RULE_SUFFIX) endif install-bashcompletion: ifeq ($(WITH_BASHCOMPLETION),yes) mkdir -p $(DESTDIR)$(sysconfdir)/bash_completion.d install -m 644 drbdadm.bash_completion $(DESTDIR)$(sysconfdir)/bash_completion.d/drbdadm$(BASH_COMPLETION_SUFFIX) endif clean: rm -f *~ rm -f datadisk distclean: clean uninstall: rm -f $(DESTDIR)$(INITDIR)/drbd rm -f $(DESTDIR)$(sysconfdir)/ha.d/resource.d/drbddisk rm -f $(DESTDIR)$(sysconfdir)/ha.d/resource.d/drbdupper rm -f $(DESTDIR)$(sysconfdir)/xen/scripts/block-drbd rm -f $(DESTDIR)$(sysconfdir)/bash_completion.d/drbdadm$(BASH_COMPLETION_SUFFIX) rm -f $(DESTDIR)$(sbindir)/drbd-overview ! 
test -L $(DESTDIR)/sbin/rcdrbd || rm $(DESTDIR)/sbin/rcdrbd drbd-8.4.4/scripts/README0000664000000000000000000000136312132747531013521 0ustar rootrootdrbd drbd sys-v type init script drbd.gentoo gentoo specific variation of the same drbdadm.bash_completion bash completion for the drbdadm command drbddisk resource script for heartbeat (R1 style) block-drbd xen "drbd" vbd type implementation drbd.sh.rhcs resource script for Red Hat Cluster Suite rgmanager drbd.metadata.rhcs resource script metadata for Red Hat Cluster Suite rgmanager outdate-peer.sh example implementation of the fence-peer mechanism using ssh drbd.conf commented example configuration drbd-overview.pl consolidates /proc/drbd, /etc/drbd.conf, xm list, lvs, df pretty-proc-drbd.sh example how to pretty-print (optionally with color) /proc/drbd build helpers: get_uts_release.sh adjust_drbd_config_h.sh patch-kernel drbd-8.4.4/scripts/README.rhcs_fence0000664000000000000000000000636712221261130015611 0ustar rootrootProgram: rhcs_fence Author: Madison Kelly (digimer@alteeve.com) Alteeve's Niche! - https://alteeve.com/w/ Date: 2013-03-13 Version: 0.2.6 License: GPL v2.0+ -=] Description: This script is designed to be used as DRBD's 'fence-peer' handler. It ties DRBD's fence call, using 'disk { fencing resource-and-stonith }' into Red Hat Cluster Service's fenced daemon. This allows you to configure fencing once in your cluster and use it for both the cluster and DRBD. This program was based heavily on Lon Hohberger "obliterate-peer.sh" script. This was created as a replacement fence handler designed to add some features and intelligence to his script, but became a new program in order to switch to perl. -=] Supported Environments This script supports two-node DRBD setups only, but the nodes themselves may be part of a larger cluster. This script should be used when 'fencing resource-and-stonith' is set only. The 'on { }' name *must* match the ' as well. 
-=] Limitations

This handler insists on seeing the local disk as 'UpToDate' before it will
proceed with a fence. Thus, if you have a simultaneous and complete cluster
crash followed by the recovery of only one node, the recovered node will be
in a 'Consistent' state, which will abort the fence call. This scenario
requires human intervention to recover from.

-=] Notes:

This program takes certain steps to avoid dual-fencing;

- First, Timing:
  This program will use the cluster's 'Node ID' as a base value for a delay
  prior to fencing. If a node has 'Node ID: 1' (as seen with
  'cman_tool status'), there will be no delay and the fence will occur
  immediately. All other nodes will sleep for ((node_id x 2) + 5) seconds,
  up to a maximum of 30 seconds. It is possible to override this behaviour
  by setting 'local_delay' in the head configuration section of the script.
  If this is a non-0 value, the script will pause for the defined number of
  seconds, ignoring the behaviour described above.

- Second, 'UpToDate' check;
  When a fence call is made, this program checks the disk state of the
  referenced resource's minor number to ensure it is 'UpToDate' as reported
  by '/proc/drbd'. The fence call will abort if the disk state is
  'Consistent' (or anything else). This helps prevent accidentally fencing
  the original surviving node when the cluster communication is up but the
  storage network is not, which avoids a fence-loop.

-=] Failure Modes:

This program follows the "Fail Early, Fail Often" ethos. It will *only*
fence the peer if several conditions are met. Please test the functionality
of this script before going into production!

Exit codes;

  1   - Fence failed, see syslog
  7   - Fence succeeded
  255 - End of script hit, likely a program error

If you run into any trouble, please enable 'debug' mode by setting the
internal 'debug' value to '1'. If you need help, please send the syslog
output of both nodes, with debug enabled, to the Linux Cluster mailing list
or the DRBD Users mailing list.

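The delay rule from the Notes section and the exit codes listed above can be
sketched in a few lines of shell. This is only an illustration under stated
assumptions: rhcs_fence itself is written in Perl, and the function names
fence_delay and explain_exit_code are hypothetical helpers invented for this
sketch, not part of the program.

```shell
#!/bin/bash
# Illustrative sketch only: rhcs_fence is a Perl program, and
# fence_delay / explain_exit_code are hypothetical helper names.

# fence_delay NODE_ID [LOCAL_DELAY]
# Prints the number of seconds to wait before fencing.
fence_delay() {
    local node_id=$1
    local local_delay=${2:-0}
    local delay

    # A non-zero local_delay replaces the node-ID based rule entirely.
    if (( local_delay != 0 )); then
        echo "$local_delay"
        return 0
    fi

    # Node ID 1 fences immediately.
    if (( node_id == 1 )); then
        echo 0
        return 0
    fi

    # All other nodes wait (node_id x 2) + 5 seconds, capped at 30.
    delay=$(( node_id * 2 + 5 ))
    if (( delay > 30 )); then
        delay=30
    fi
    echo "$delay"
}

# explain_exit_code CODE
# Maps the documented exit codes to their meaning.
explain_exit_code() {
    case $1 in
        1)   echo "fence failed, see syslog" ;;
        7)   echo "fence succeeded" ;;
        255) echo "end of script hit, likely a program error" ;;
        *)   echo "undocumented exit code" ;;
    esac
}
```

With these definitions, 'fence_delay 2' prints 9 and 'fence_delay 20' prints
30, while 'fence_delay 3 4' prints 4 because the local_delay override wins.
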
-=] Getting Help: By email: digimer@alteeve.com / https://alteeve.com IRC: #linux-cluster and #drbd on freenode.net Mailing list: https://www.redhat.com/mailman/listinfo/linux-cluster http://lists.linbit.com/mailman/listinfo/drbd-user drbd-8.4.4/scripts/block-drbd0000775000000000000000000001755711516050235014577 0ustar rootroot#!/bin/bash # # Copyright (c) 2007 LINBIT Information Technologies GmbH # Based on the original "block" VBD script by XenSource Inc. # # This script implements the "drbd" VBD type. To use a DRBD resource # as a virtual block device, include a line similar to this in your # domU configuration: # # disk = [ 'drbd:myresource,xvda1,w' ] # # This will direct Xen to put the DRBD resource named 'myresource' # into the Primary role, and configure it as device xvda1 in your # domU. You may use as many DRBD resources as you like. If you are # using DRBD in dual-Primary mode (available in DRBD versions 8.0 and # up), your DRBD-backed domU will be live migration capable. # # IMPORTANT: If you run DRBD in dual-Primary mode with Xen, you MUST # ensure that the only time the resource is accessed by # both nodes is during domain migration. If you fire up a # DRBD-backed domU simultaneously on two nodes, you WILL # wreck your VBD data. DO NOT attempt to set up a live # migration capable, DRBD-backed domU unless you # understand these implications. # # This script will not load the DRBD kernel module for you, nor will # it attach, detach, connect, or disconnect your resource. The init # script distributed with DRBD will do that for you. Make sure it is # started before attempting to start a DRBD-backed domU. # # Known limitations: # # - With 'file' and 'phy' VBD's, Xen will allow one block device to be # made available read-only to multiple domU's, or be mounted # read-only in the dom0 and be made available read-only to # domU's. This is not supported by the 'drbd' VBD type. # - Tested, thus far, only on Debian etch with Xen 3.0.3. 
# # For more information about DRBD, visit http://www.drbd.org/. # # # This library is free software; you can redistribute it and/or # modify it under the terms of version 2.1 of the GNU Lesser General Public # License as published by the Free Software Foundation. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # dir=$(dirname "$0") . "$dir/block-common.sh" ## # canonicalise_mode mode # # Takes the given mode, which may be r, w, ro, rw, w!, or rw!, or variations # thereof, and canonicalises them to one of # # 'r': perform checks for a new read-only mount; # 'w': perform checks for a read-write mount; or # '!': perform no checks at all. # canonicalise_mode() { local mode="$1" if ! expr index "$mode" 'w' >/dev/null then echo 'r' elif ! expr index "$mode" '!' >/dev/null then echo 'w' else echo '!' fi } ## # check_sharing device mode # # Check whether the device requested is already in use. To use the device in # read-only mode, it may be in use in read-only mode, but may not be in use in # read-write anywhere at all. To use the device in read-write mode, it must # not be in use anywhere at all. # # Prints one of # # 'local': the device may not be used because it is mounted in the current # (i.e. the privileged domain) in a way incompatible with the # requested mode; # 'guest': the device may not be used because it already mounted by a guest # in a way incompatible with the requested mode; or # 'ok': the device may be used. 
# check_sharing() { local dev="$1" local mode="$2" local devmm=$(device_major_minor "$dev") local file # Here, different from the original 'block' script, we don't check # explicitly for read/write mounts. See "known limitations" above. toskip="^$" for file in $(cat /proc/mounts | grep -v "$toskip" | cut -f 1 -d ' ') do if [ -e "$file" ] then local d=$(device_major_minor "$file") if [ "$d" = "$devmm" ] then echo 'local' return fi fi done local base_path="$XENBUS_BASE_PATH/$XENBUS_TYPE" for dom in $(xenstore-list "$base_path") do for dev in $(xenstore-list "$base_path/$dom") do d=$(xenstore_read_default "$base_path/$dom/$dev/physical-device" "") if [ "$d" = "$devmm" ] then # Here, different from the original 'block' script, we don't # check explicitly for read/write mounts. See "known # limitations" above. if ! same_vm $dom then echo 'guest' return fi fi done done echo 'ok' } same_vm() { local otherdom="$1" # Note that othervm can be MISSING here, because Xend will be racing with # the hotplug scripts -- the entries in /local/domain can be removed by # Xend before the hotplug scripts have removed the entry in # /local/domain/0/backend/. In this case, we want to pretend that the # VM is the same as FRONTEND_UUID, because that way the 'sharing' will be # allowed. local othervm=$(xenstore_read_default "/local/domain/$otherdom/vm" \ "$FRONTEND_UUID") [ "$FRONTEND_UUID" = "$othervm" ] } ## # check_device_sharing dev mode # # Perform the sharing check for the given physical device and mode. # check_device_sharing() { local dev="$1" local mode=$(canonicalise_mode "$2") local result if [ "x$mode" = 'x!' ] then return 0 fi result=$(check_sharing "$dev" "$mode") if [ "$result" != 'ok' ] then do_ebusy "Device $dev is mounted " "$mode" "$result" fi } ## # do_ebusy prefix mode result # # Helper function for check_device_sharing check_file_sharing, calling ebusy # with an error message constructed from the given prefix, mode, and result # from a call to check_sharing. 
# do_ebusy() { local prefix="$1" local mode="$2" local result="$3" if [ "$result" = 'guest' ] then dom='a guest ' when='now' else dom='the privileged ' when='by a guest' fi if [ "$mode" = 'w' ] then m1='' m2='' else m1='read-write ' m2='read-only ' fi release_lock "block" ebusy \ "${prefix}${m1}in ${dom}domain, and so cannot be mounted ${m2}${when}." } t=$(xenstore_read_default "$XENBUS_PATH/type" 'MISSING') case "$command" in add) phys=$(xenstore_read_default "$XENBUS_PATH/physical-device" 'MISSING') if [ "$phys" != 'MISSING' ] then # Depending upon the hotplug configuration, it is possible for this # script to be called twice, so just bail. exit 0 fi if [ -n "$t" ] then p=$(xenstore_read "$XENBUS_PATH/params") mode=$(xenstore_read "$XENBUS_PATH/mode") fi case $t in drbd) drbd_resource=$p drbd_role="$(/sbin/drbdadm role $drbd_resource)" drbd_lrole="${drbd_role%%/*}" drbd_dev="$(/sbin/drbdadm sh-dev $drbd_resource)" if [ "$drbd_lrole" != 'Primary' ]; then /sbin/drbdadm primary $drbd_resource fi dev=$drbd_dev FRONTEND_ID=$(xenstore_read "$XENBUS_PATH/frontend-id") FRONTEND_UUID=$(xenstore_read_default \ "/local/domain/$FRONTEND_ID/vm" 'unknown') claim_lock "block" check_device_sharing "$dev" "$mode" write_dev "$dev" release_lock "block" exit 0 ;; "") claim_lock "block" success release_lock "block" ;; esac ;; remove) case $t in drbd) p=$(xenstore_read "$XENBUS_PATH/params") drbd_resource=$p drbd_role="$(/sbin/drbdadm role $drbd_resource)" drbd_lrole="${drbd_role%%/*}" drbd_dev="$(/sbin/drbdadm sh-dev $drbd_resource)" if [ "$drbd_lrole" != 'Secondary' ]; then /sbin/drbdadm secondary $drbd_resource fi exit 0 ;; "") exit 0 ;; esac ;; esac drbd-8.4.4/scripts/crm-fence-peer.sh0000775000000000000000000007273412221261130015763 0ustar rootroot#!/bin/bash # sed_rsc_location_suitable_for_string_compare() { # expected input: exactly one tag per line: "^[[:space:]]*<.*/?>$" sed -ne ' # within the rsc_location constraint with that id, // { /<\/rsc_location>/q # done, if closing 
tag is found s/^[[:space:]]*// # trim spaces s/ *\bid="[^"]*"// # remove id tag # print each attribute on its own line, by : attr h # remember the current (tail of the) line # remove all but the first attribute, and print, s/^\([^[:space:]]*[[:space:]][^= ]*="[^"]*"\).*$/\1/p g # then restore the remembered line, # and remove the first attribute. s/^\([^[:space:]]*\)[[:space:]][^= ]*="[^"]*"\(.*\)$/\1\2/ # then repeat, until no more attributes are left t attr }' | sort } cibadmin_invocations=0 set_constraint() { cibadmin_invocations=$(( $cibadmin_invocations + 1 )) cibadmin -C -o constraints -X "$new_constraint" } remove_constraint() { cibadmin_invocations=$(( $cibadmin_invocations + 1 )) cibadmin -D -X "" } cib_xml="" get_cib_xml() { cibadmin_invocations=$(( $cibadmin_invocations + 1 )) cib_xml=$( set +x; cibadmin "$@" ) } # if not passed in, try to "guess" it from the cib # we only know the DRBD_RESOURCE. fence_peer_init() { # we know which instance we are: $OCF_RESOURCE_INSTANCE. # but we do not know the xml ID of the :( # cibadmin -Ql --xpath \ # '//master[primitive[@type="drbd" and instance_attributes/nvpair[@name = "drbd_resource" and @value="r0"]]]/@id' # but I'd have to pipe that through sed anyways, because @attribute # xpath queries are not supported. # and I'd be incompatible with older cibadmin not supporting --xpath. # be cool, sed it out: : ${master_id=$(set +x; echo "$cib_xml" | sed -ne '// { /= deadtime, dc-timeout > timeout # Intended use case: fencing resource-and-stonith, STONITH configured. # # Difference to a) # # If peer is still reachable according to the cib, # we first poll the cib/try to confirm with crmadmin, # until either crmadim confirms reachability, timeout has elapsed, # or the peer becomes definetely unreachable. # # This gives STONITH the chance to kill us. # With "fencing resource-and-stontith;" this protects us against # completing transactions to userland which might otherwise be lost. 
# # We then place the constraint (if we are UpToDate), as explained below, # and return reachable/unreachable according to our last cib status poll # or crmadmin -S result. # # # replication link loss, current Primary calls this handler: # We are UpToDate, but we potentially need to wait for a DC election. # Once we have contacted the DC, we poll the cib until the peer is # confirmed unreachable, or crmadmin -S confirms it as reachable, # or timeout expired. # Then we place the constraint, and are done. # # If it is complete communications loss, one will stonith the other. # For two-node clusters with no-quorum-policy=ignore, we will have a # deathmatch shoot-out, which the former DC is likely to win. # # In dual-primary setups, if it is only replication link loss, both nodes # will call this handler, but only one will succeed to place the # constraint. The other will then typically need to "commit suicide". # With stonith enabled, and --suicide-on-failure-if-primary, # we will trigger a node level fencing, telling # pacemaker to "terminate" that node, # and scheduling a reboot -f just in case. # # Primary crash, promotion of former Secondary: # DC-election, if any, will have taken place already. # We are UpToDate, we place the constraint, done. # # node or cluster crash, promotion of Secondary with replication link down: # We are "Only" Consistent. Usually any "init-dead-time" or similar has # expired already, and the cib node states are already authoritative # without doing additional waiting. If the peer is still reachable, we # place the constraint - if the peer had better data, it should have a # higher master score, and we should not have been asked to become # primary. If the peer is not reachable, we don't do anything, and drbd # will refuse to be promoted. This is neccessary to avoid problems # With data diversion, in case this "crash" was due to a STONITH operation, # maybe the reboot did not fix our cluster communications! 
# # Note that typically, if STONITH is in use, it has been done on any # unreachable node _before_ we are promoted, so the cib should already # know that the peer is dead - if it is. # # slightly different logic than crm_is_true crm_is_not_false() { case $1 in no|n|false|0|off) false ;; *) true ;; esac } check_cluster_properties() { local x properties=$(set +x; echo "$cib_xml" | sed -n -e '/ $SECONDS )) || break sleep $(( net_hickup_time - SECONDS )) done set_states_from_proc_drbd : == DEBUG == DRBD_peer=${DRBD_peer[*]} === case "${DRBD_peer[*]}" in *Secondary*|*Primary*) # WTF? We are supposed to fence the peer, # but the replication link is just fine? echo WARNING "peer is not Unknown, did not place the constraint!" rc=0 return ;; esac : == DEBUG == CTS_mode=$CTS_mode == : == DEBUG == DRBD_disk_all_consistent=$DRBD_disk_all_consistent == : == DEBUG == DRBD_disk_all_uptodate=$DRBD_disk_all_uptodate == : == DEBUG == $peer_state/${DRBD_disk[*]}/$unreachable_peer_is == if [[ ${#DRBD_disk[*]} = 0 ]]; then # Someone called this script, without the corresponding drbd # resource being configured. That's not very useful. echo WARNING "could not determine my disk state: did not place the constraint!" rc=0 # keep drbd_fence_peer_exit_code at "generic error", # which will cause a "script is broken" message in case it was # indeed called as handler from within drbd # No, NOT fenced/Consistent: # just because we have been able to shoot him # does not make our data any better. elif [[ $peer_state = reachable ]] && $DRBD_disk_all_consistent; then # = reachable ]] && $DRBD_disk_all_uptodate # is implicitly handled here as well. 
set_constraint && drbd_fence_peer_exit_code=4 rc=0 && echo INFO "peer is $peer_state, my disk is ${DRBD_disk[*]}: placed constraint '$id_prefix-$master_id'" elif [[ $peer_state = fenced ]] && $DRBD_disk_all_uptodate ; then set_constraint && drbd_fence_peer_exit_code=7 rc=0 && echo INFO "peer is $peer_state, my disk is $DRBD_disk: placed constraint '$id_prefix-$master_id'" # Peer is neither "reachable" nor "fenced" (above would have matched) # So we just hit some timeout. # As long as we are UpToDate, place the constraint and continue. # If you don't like that, use a ridiculously high timeout, # or patch this script. elif $DRBD_disk_all_uptodate ; then # We could differentiate between unreachable, # and DC-unreachable. In the latter case, placing the # constraint will fail anyways, and drbd_fence_peer_exit_code # will stay at "generic error". set_constraint && drbd_fence_peer_exit_code=5 rc=0 && echo INFO "peer is not reachable, my disk is UpToDate: placed constraint '$id_prefix-$master_id'" # This block is reachable by operator intervention only # (unless you are hacking this script and know what you are doing) elif [[ $peer_state != reachable ]] && [[ $unreachable_peer_is = outdated ]] && $DRBD_disk_all_consistent; then # If the peer is not reachable, but we are only Consistent, we # may need some way to still allow promotion. # Easy way out: --force primary with drbdsetup. # But that would not place the constraint, nor outdate the # peer. With this --unreachable-peer-is-outdated, we still try # to set the constraint. Next promotion attempt will find the # "correct" constraint, consider the peer as successfully # fenced, and continue. set_constraint && drbd_fence_peer_exit_code=5 rc=0 && echo WARNING "peer is unreachable, my disk is only Consistent: --unreachable-peer-is-outdated FORCED constraint '$id_prefix-$master_id'" && echo WARNING "This MAY RISK DATA INTEGRITY" # So I'm not UpToDate, and peer is not reachable. 
# Tell the module about "not reachable", and don't do anything else. else echo WARNING "peer is $peer_state, my disk is ${DRBD_disk[*]}: did not place the constraint!" drbd_fence_peer_exit_code=5 rc=0 # I'd like to return 6 here, otherwise pacemaker will retry # forever to promote, even though 6 is not strictly correct. fi return $rc } commit_suicide() { local reboot_timeout=20 local extra_msg if $stonith_enabled ; then # avoid double fence, tell pacemaker to kill me echo WARNING "trying to have pacemaker kill me now!" crm_attribute -t status -N $HOSTNAME -n terminate -v 1 echo WARNING "told pacemaker to kill me, but scheduling reboot -f in 300 seconds just in case" # ------------------------- echo WARNING $'\n'" told pacemaker to kill me,"\ $'\n'" but scheduling reboot -f in 300 seconds just in case."\ $'\n'" kill $$ # to cancel" | wall # ------------------------- reboot_timeout=300 extra_msg="Pacemaker terminate pending. If that fails, I'm " else # ------------------------- echo WARNING $'\n'" going to reboot -f in $reboot_timeout seconds"\ $'\n'" kill $$ # to cancel!" | wall # ------------------------- fi reboot_timeout=$(( reboot_timeout + SECONDS )) # pacemaker apparently cannot kill me. while (( $SECONDS < $reboot_timeout )); do echo WARNING "${extra_msg}going to reboot -f in $(( reboot_timeout - SECONDS )) seconds! To cancel: kill $$" sleep 2 done echo WARNING "going to reboot -f now!" reboot -f sleep 864000 } # drbd_peer_fencing fence|unfence drbd_peer_fencing() { local rc # input to fence_peer_init: # $DRBD_RESOURCE is set by command line of from environment. # $id_prefix is set by command line or default. # $master_id is set by command line or will be parsed from the cib. 
# output of fence_peer_init: local have_constraint new_constraint # if I cannot query the local cib, give up get_cib_xml -Ql || return fence_peer_init || return case $1 in fence) local startup_fencing stonith_enabled check_cluster_properties if [[ $fencing_attribute = "#uname" ]]; then fencing_value=$HOSTNAME elif ! fencing_value=$(crm_attribute -Q -t nodes -n $fencing_attribute 2>/dev/null); then fencing_attribute="#uname" fencing_value=$HOSTNAME fi # double negation: do not run but with my data. new_constraint="\ " if [[ -z $have_constraint ]] ; then # try to place it. # interessting: # In case this is a two-node cluster (still common with # drbd clusters) it does not have real quorum. # If it is configured to do stonith, and reboot, # and after reboot that stonithed node cluster comm is # still broken, it will shoot the still online node, # and try to go online with stale data. # Exactly what this "fence" hanler should prevent. # But setting contraints in a cluster partition with # "no-quorum-policy=ignore" will usually succeed. # # So we need to differentiate between node reachable or # not, and DRBD "Consistent" or "UpToDate". try_place_constraint && return # maybe callback and operator raced for the same constraint? # before we potentially trigger node level fencing # or keep IO frozen, double check. 
# try_place_constraint has updated cib_xml from DC have_constraint=$(set +x; echo "$cib_xml" | sed_rsc_location_suitable_for_string_compare "$id_prefix-$master_id") fi if [[ "$have_constraint" = "$(set +x; echo "$new_constraint" | sed_rsc_location_suitable_for_string_compare "$id_prefix-$master_id")" ]]; then echo INFO "suitable constraint already placed: '$id_prefix-$master_id'" drbd_fence_peer_exit_code=4 rc=0 elif [[ -n "$have_constraint" ]] ; then # if this id already exists, but looks different, we may have lost a shootout echo WARNING "constraint "$have_constraint" already exists" # anything != 0 will do; # 21 happend to be "The object already exists" with my cibadmin rc=21 # maybe: drbd_fence_peer_exit_code=6 # as this is not the constraint we'd like to set, # it is likely the inverse, so we probably can assume # that the peer is active primary, or at least has # better data than us, and wants us outdated. fi if [[ $rc != 0 ]]; then # at least we tried. # maybe it was already in place? echo WARNING "DATA INTEGRITY at RISK: could not place the fencing constraint!" fi # XXX policy decision: if $suicide_on_failure_if_primary && [[ $drbd_fence_peer_exit_code != [3457] ]]; then set_states_from_proc_drbd [[ "${DRBD_role[*]}" = *Primary* ]] && commit_suicide fi return $rc ;; unfence) if [[ -n $have_constraint ]]; then # remove it based on that id remove_constraint else return 0 fi esac } guess_if_pacemaker_will_fence() { # try to guess whether it is useful to wait and poll again, # (node fencing in progress...), # or if pacemaker thinks the node is "clean" dead. local x # "return values:" crmd= in_ccm= expected= join= will_fence=false # Older pacemaker has an "ha" attribute, too. # For stonith-enabled=false, the "crmd" attribute may stay "online", # but once ha="dead", we can stop waiting for changes. 
ha_dead=false for x in ${node_state%/>} ; do case $x in in_ccm=\"*\") x=${x#*=\"}; x=${x%\"}; in_ccm=$x ;; crmd=\"*\") x=${x#*=\"}; x=${x%\"}; crmd=$x ;; expected=\"*\") x=${x#*=\"}; x=${x%\"}; expected=$x ;; join=\"*\") x=${x#*=\"}; x=${x%\"}; join=$x ;; ha=\"dead\") ha_dead=true ;; esac done # if it is not enabled, no point in waiting for it. if ! $stonith_enabled ; then # "normalize" the rest of the logic # where this is called. # for stonith-enabled=false, and ha="dead", # reset crmd="offline". # Then we stop polling the cib for changes. $ha_dead && crmd="offline" return fi if [[ -z $node_state ]] ; then # if we don't know nothing about the peer, # and startup_fencing is explicitly disabled, # no fencing will take place. $startup_fencing || return fi # for further inspiration, see pacemaker:lib/pengine/unpack.c, determine_online_status_fencing() [[ -z $in_ccm ]] && will_fence=true [[ $crmd = "banned" ]] && will_fence=true if [[ ${expected-down} = "down" && $in_ccm = "false" && $crmd != "online" ]]; then : "pacemaker considers this as clean down" elif [[ $in_ccm = false ]] || [[ $crmd != "online" ]]; then will_fence=true fi } # return value in $peer_state: # DC-unreachable # We have not been able to contact the DC. # fenced # According to the node_state recorded in the cib, # the peer is offline and expected down # (which means successfully fenced, if stonith is enabled) # reachable # cib says it's online, and crmadmin -S says peer state is "ok" # unreachable # cib says it's offline (but does not yet say "expected" down) # and we reached the timeout # unknown # cib does not say it was offline (or we don't know who the peer is) # and we reached the timeout # check_peer_node_reachable() { # we are going to increase the cib timeout with every timeout (see below). # for the actual invocation, we use int(cibtimeout/10). # scaled by 5 / 4 with each iteration, # this results in a timeout sequence of 1 2 2 3 4 5 6 7 9 ... 
seconds local cibtimeout=18 local full_timeout local nr_other_nodes local other_node_uname_attrs # we have a cibadmin -Ql in cib_xml already # filter out $/\1/' | grep -v -F uname=\"$HOSTNAME\") set -- $other_node_uname_attrs nr_other_nodes=$# while :; do local state_lines= node_state= local crmd= in_ccm= expected= join= will_fence= ha_dead= while :; do local t=$SECONDS # # Update our view of the cib, ask the DC this time. # Timeout, in case no DC is available. # Caution, some cibadmin (pacemaker 0.6 and earlier) # apparently use -t use milliseconds, so will timeout # many times until a suitably long timeout is reached # by increasing below. # # Why not use the default timeout? # Because that would unecessarily wait for 30 seconds # or longer, even if the DC is re-elected right now, # and available within the next second. # get_cib_xml -Q -t $(( cibtimeout/10 )) && break # bash magic $SECONDS is seconds since shell invocation. if (( $SECONDS > $dc_timeout )) ; then # unreachable: cannot even reach the DC peer_state="DC-unreachable" return fi # avoid busy loop [[ $t = $SECONDS ]] && sleep 1 # try again, longer timeout. let "cibtimeout = cibtimeout * 5 / 4" done state_lines=$( set +x; echo "$cib_xml" | grep ' 2 cluster. # (yes, I've seen such beasts in the wild!) # As we don't know the peer, # we could only safely return here if *all* # potential peers are confirmed down. # Don't try to be smart, just wait for the full # timeout, which should allow STONITH to # complete. full_timeout=$(( $timeout - $SECONDS )) if (( $full_timeout > 0 )) ; then echo WARNING "don't know who my peer is; sleep $full_timeout seconds just in case" sleep $full_timeout fi # In the unlikely case that we don't know our DRBD peer, # there is no point in polling the cib again, # that won't teach us who our DRBD peer is. # # We waited $full_timeout seconds already, # to allow for node level fencing to shoot us. # # So if we are still alive, then obviously no-one has shot us. 
# peer_state="unknown" return fi # # we know the peer or/and are a two node cluster # node_state=$(set +x; echo "$state_lines" | grep -F uname=\"$DRBD_PEER\") # populates in_ccm, crmd, exxpected, join, will_fence=[false|true] guess_if_pacemaker_will_fence if ! $will_fence && [[ $crmd != "online" ]] ; then # "legacy" cman + pacemaker clusters older than 1.1.10 # may "forget" about startup fencing. # We can detect this because the "expected" attribute is missing. # Does not make much difference for our logic, though. [[ $expected/$in_ccm = "down/false" ]] && peer_state="fenced" || peer_state="unreachable" return fi # So the cib does still indicate the peer was reachable. # # try crmadmin; if we can sucessfully query the state of the remote crmd, # it is obviously reachable. # # Do this only after we have been able to reach a DC above. # Note: crmadmin timeout is in milli-seconds, and defaults to 30000 (30 seconds). # Our variable $cibtimeout should be in deci-seconds (see above) # (unless you use a very old version of pacemaker, so don't do that). # Convert deci-seconds to milli-seconds, and double it. if [[ $crmd = "online" ]] ; then local out if out=$( crmadmin -t $(( cibtimeout * 200 )) -S $DRBD_PEER ) \ && [[ $out = *"(ok)" ]]; then peer_state="reachable" return fi fi # We know our DRBD peer. # We are still not sure about its status, though. # # It is not (yet) "expected down" per the cib, but it is not # reliably reachable via crmadmin -S either. # # If we already polled for longer than timeout, give up. # # For a resource-and-stonith setup, or dual-primaries (which # you should only use with resource-and-stonith, anyways), # the recommended timeout is larger than the deadtime or # stonith timeout, and according to beekhof maybe should be # tuned up to the election-timeout (which, btw, defaults to 2 # minutes!). 
		#
		if (( $SECONDS >= $timeout )) ; then
			[[ $crmd = offline ]] && peer_state="unreachable" || peer_state="unknown"
			return
		fi

		# wait a bit before we poll the DC again
		sleep 2
	done
	# NOT REACHED
}

set_states_from_proc_drbd()
{
	local IFS line lines i disk

	# DRBD_MINOR exported by drbdadm since 8.3.3
	[[ $DRBD_MINOR ]] || DRBD_MINOR=$(drbdadm ${DRBD_CONF:+ -c "$DRBD_CONF"} sh-minor $DRBD_RESOURCE) || return

	# if we have more than one minor, do a word split, ...
	set -- $DRBD_MINOR
	# ... and convert into regex:
	IFS="|$IFS"; DRBD_MINOR="($*)"; IFS=${IFS#?}

	# We must not recurse into netlink,
	# this may be a callback triggered by "drbdsetup primary".
	# grep /proc/drbd instead
	# This magic does not work, if
	#
	DRBD_peer=()
	DRBD_role=()
	DRBD_disk=()
	DRBD_disk_all_uptodate=true
	DRBD_disk_all_consistent=true

	IFS=$'\n'
	lines=($(sed -nre "/^ *$DRBD_MINOR: cs:/ { s/:/ /g; p; }" /proc/drbd))
	IFS=$' \t\n'

	i=0
	for line in "${lines[@]}"; do
		set -- $line
		DRBD_peer[i]=${5#*/}
		DRBD_role[i]=${5%/*}
		disk=${7%/*}
		DRBD_disk[i]=${disk:-Unconfigured}
		case $disk in
		UpToDate) ;;
		Consistent)
			DRBD_disk_all_uptodate=false ;;
		*)
			DRBD_disk_all_uptodate=false
			DRBD_disk_all_consistent=false ;;
		esac
		let i++
	done
	# no matching minor found in /proc/drbd: nothing is UpToDate or Consistent
	if (( i == 0 )) ; then
		DRBD_disk_all_uptodate=false
		DRBD_disk_all_consistent=false
	fi
}

############################################################

# try to get possible output on stdout/err to syslog
PROG=${0##*/}
redirect_to_logger()
{
	local lf=${1:-local5}
	case $lf in
	# do we want to exclude some?
	auth|authpriv|cron|daemon|ftp|kern|lpr|mail|news|syslog|user|uucp|local[0-7])
		: OK ;;
	*)
		echo >&2 "invalid logfacility: $lf"
		return
		;;
	esac
	exec > >(2>&- ; logger -t "$PROG[$$]" -p $lf.info) 2>&1
}
if [[ $- != *x* ]]; then
	# you may override with --logfacility below
	redirect_to_logger local5
fi

# clean environment just in case.
unset fencing_attribute id_prefix timeout dc_timeout unreachable_peer_is CTS_mode=false suicide_on_failure_if_primary=false # poor mans command line argument parsing, # allow for command line overrides while [[ $# != 0 ]]; do case $1 in --logfacility=*) redirect_to_logger ${1#*=} ;; --logfacility) redirect_to_logger $2 shift ;; --resource=*) DRBD_RESOURCE=${1#*=} ;; -r|--resource) DRBD_RESOURCE=$2 shift ;; --master-id=*) master_id=${1#*=} ;; -i|--master-id) master_id=$2 shift ;; --role=*) role=${1#*=} ;; -l|--role) role=${2} shift ;; --fencing-attribute=*) fencing_attribute=${1#*=} ;; -a|--fencing-attribute) fencing_attribute=$2 shift ;; --id-prefix=*) id_prefix=${1#*=} ;; -p|--id-prefix) id_prefix=$2 shift ;; --timeout=*) timeout=${1#*=} ;; -t|--timeout) timeout=$2 shift ;; --dc-timeout=*) dc_timeout=${1#*=} ;; -d|--dc-timeout) dc_timeout=$2 shift ;; --net-hickup=*|--network-hickup=*) net_hickup_time=${1#*=} ;; --net-hickup|--network-hickup) net_hickup_time=$2 shift ;; --CTS-mode) CTS_mode=true ;; --unreachable-peer-is-outdated) # This is NOT to be scripted. # Or people will put this into the handler definition in # drbd.conf, and all this nice work was useless. test -t 0 && unreachable_peer_is=outdated ;; --suicide-on-failure-if-primary) suicide_on_failure_if_primary=true ;; -*) echo >&2 "ignoring unknown option $1" ;; *) echo >&2 "ignoring unexpected argument $1" ;; esac shift done # DRBD_RESOURCE: from environment # master_id: parsed from cib : "== unreachable_peer_is == ${unreachable_peer_is:=unknown}" # apply defaults: : "== fencing_attribute == ${fencing_attribute:="#uname"}" : "== id_prefix == ${id_prefix:="drbd-fence-by-handler"}" : "== role == ${role:="Master"}" # defaults suitable for most cases : "== net_hickup_time == ${net_hickup_time:=0}" : "== timeout == ${timeout:=90}" : "== dc_timeout == ${dc_timeout:=20}" # check envars normally passed in by drbdadm # TODO DRBD_CONF is also passed in. 
we may need to use it in the # xpath query, in case someone is crazy enough to use different # conf files with the _same_ resource name. # for now: do not do that, or hardcode the cib id of the master # in the handler section of your drbd conf file. for var in DRBD_RESOURCE; do if [ -z "${!var}" ]; then echo "Environment variable \$$var not found (this is normally passed in by drbdadm)." >&2 exit 1 fi done # Fixup id-prefix to include the resource name # There may be multiple drbd instances part of the same M/S Group, pointing to # the same master-id. Still they need to all have their own constraint, to be # able to unfence independently when they finish their resync independently. # Be nice to people who already explicitly configure an id prefix containing # the resource name. if [[ $id_prefix != *"-$DRBD_RESOURCE" ]] ; then id_prefix="$id_prefix-$DRBD_RESOURCE" : "== id_prefix == ${id_prefix}" fi # make sure it contains what we expect HOSTNAME=$(uname -n) echo "invoked for $DRBD_RESOURCE${master_id:+" (master-id: $master_id)"}" # to be set by drbd_peer_fencing() drbd_fence_peer_exit_code=1 case $PROG in crm-fence-peer.sh) if drbd_peer_fencing fence; then : == DEBUG == $cibadmin_invocations cibadmin calls == : == DEBUG == $SECONDS seconds == exit $drbd_fence_peer_exit_code fi ;; crm-unfence-peer.sh) if drbd_peer_fencing unfence; then : == DEBUG == $cibadmin_invocations cibadmin calls == : == DEBUG == $SECONDS seconds == exit 0 fi esac # 1: unexpected error exit 1 drbd-8.4.4/scripts/drbd0000775000000000000000000001366112132747531013506 0ustar rootroot#!/bin/bash # # chkconfig: - 70 08 # description: Loads and unloads the drbd module # # Copyright 2001-2010 LINBIT # # Philipp Reisner, Lars Ellenberg # ### BEGIN INIT INFO # Provides: drbd # Required-Start: $local_fs $network $syslog # Required-Stop: $local_fs $network $syslog # Should-Start: sshd multipathd # Should-Stop: sshd multipathd # Default-Start: 2 3 4 5 # Default-Stop: 0 1 6 # X-Start-Before: heartbeat 
corosync # X-Stop-After: heartbeat corosync # X-Interactive: true # Short-Description: Control drbd resources. ### END INIT INFO DEFAULTFILE="/etc/default/drbd" DRBDADM="/sbin/drbdadm" DRBDSETUP="/sbin/drbdsetup" PROC_DRBD="/proc/drbd" MODPROBE="/sbin/modprobe" RMMOD="/sbin/rmmod" UDEV_TIMEOUT=10 ADD_MOD_PARAM="" if [ -f $DEFAULTFILE ]; then . $DEFAULTFILE fi test -f $DRBDADM || exit 5 # we only use these two functions, define fallback versions of them ... log_daemon_msg() { echo -n "${1:-}: ${2:-}"; } log_end_msg() { echo "."; } # ... and let the lsb override them, if it thinks it knows better. if [ -f /lib/lsb/init-functions ]; then . /lib/lsb/init-functions fi function assure_module_is_loaded { [ -e "$PROC_DRBD" ] && return $MODPROBE -s drbd $ADD_MOD_PARAM || { echo "Can not load the drbd module."$'\n'; exit 20 } # tell klogd to reload module symbol information ... [ -e /var/run/klogd.pid ] && [ -x /sbin/klogd ] && /sbin/klogd -i } drbd_pretty_status() { local proc_drbd=$1 # add resource names if ! type column &> /dev/null || ! type paste &> /dev/null || ! type join &> /dev/null || ! type sed &> /dev/null || ! type tr &> /dev/null then cat "$proc_drbd" return fi sed -e '2q' < "$proc_drbd" sed_script=$( i=0; _sh_status_process() { let i++ ; stacked=${_stacked_on:+"^^${_stacked_on_minor:-${_stacked_on//[!a-zA-Z0-9_ -]/_}}"} printf "s|^ *%u:|%6u\t&%s%s|\n" \ $_minor $i \ "${_res_name//[!a-zA-Z0-9_ -]/_}" "$stacked" }; eval "$(drbdadm sh-status)" ) p() { sed -e "1,2d" \ -e "$sed_script" \ -e '/^ *[0-9]\+: cs:Unconfigured/d;' \ -e 's/^\(.* cs:.*[^ ]\) \([rs]...\)$/\1 - \2/g' \ -e 's/^\(.* \)cs:\([^ ]* \)st:\([^ ]* \)ds:\([^ ]*\)/\1\2\3\4/' \ -e 's/^\(.* \)cs:\([^ ]* \)ro:\([^ ]* \)ds:\([^ ]*\)/\1\2\3\4/' \ -e 's/^\(.* \)cs:\([^ ]*\)$/\1\2/' \ -e 's/^ *[0-9]\+:/ x &??not-found??/;' \ -e '/^$/d;/ns:.*nr:.*dw:/d;/resync:/d;/act_log:/d;' \ -e 's/^\(.\[.*\)\(sync.ed:\)/... ... \2/;/^.finish:/d;' \ -e 's/^\(.[0-9 %]*oos:\)/... ... 
\1/' \ < "$proc_drbd" | tr -s '\t ' ' ' } m() { join -1 2 -2 1 -o 1.1,2.2,2.3 \ <( ( drbdadm sh-dev all ; drbdadm -S sh-dev all ) | cat -n | sort -k2,2) \ <(sort < /proc/mounts ) | sort -n | tr -s '\t ' ' ' | sed -e 's/^ *//' } # echo "=== p ===" # p # echo "=== m ===" # m # echo "=========" # join -a1 <(p|sort) <(m|sort) # echo "=========" ( echo m:res cs ro ds p mounted fstype join -a1 <(p|sort) <(m|sort) | cut -d' ' -f2-6,8- | sort -k1,1n -k2,2 ) | column -t } # Try to settle regardless of udev version or presence, # so "/etc/init.d/drbd stop" is able to rmmod, without interfering # temporary module references caused by udev scanning the devices. # But don't wait too long. _udev_settle() { if udevadm version ; then # ok, we have udevadm, use it. udevadm settle --timeout=5 else # if udevsettle is not there, # no matter. udevsettle --timeout=5 fi } case "$1" in start) # Just in case drbdadm want to display any errors in the configuration # file, or we need to ask the user about registering this installation # at http://usage.drbd.org, we call drbdadm here without any IO # redirection. # If "no op" has a non-zero exit code, the config is unusable, # and every other command will fail. log_daemon_msg "Starting DRBD resources" if ! out=$($DRBDADM sh-nop 2>&1) ; then printf "\n%s\n" "$out" >&2 log_end_msg 1 exit 1 fi assure_module_is_loaded $DRBDADM adjust-with-progress all [[ $? -gt 1 ]] && exit 20 # make sure udev has time to create the device files # FIXME this probably should, on platforms that have it, # use udevadm settle --timeout=X --exit-if-exists=$DEVICE for DEVICE in `$DRBDADM sh-dev all`; do UDEV_TIMEOUT_LOCAL=$UDEV_TIMEOUT while [ ! 
-e $DEVICE ] && [ $UDEV_TIMEOUT_LOCAL -gt 0 ] ; do sleep 1 UDEV_TIMEOUT_LOCAL=$(( $UDEV_TIMEOUT_LOCAL-1 )) done done [ -d /var/lock/subsys ] && touch /var/lock/subsys/drbd # for RedHat $DRBDADM wait-con-int # User interruptible version of wait-connect all $DRBDADM sh-b-pri all # Become primary if configured log_end_msg 0 ;; stop) $DRBDADM sh-nop log_daemon_msg "Stopping all DRBD resources" for try in 1 2; do if [ -e $PROC_DRBD ] ; then [[ $try = 2 ]] && echo "Retrying once..." # bypass drbdadm and drbd config file and everything, # to avoid leaving devices around that are not referenced by # the current config file, or in case the current config file # does not parse for some reason. for d in /dev/drbd* ; do [ -L "$d" ] && continue [ -b "$d" ] || continue M=$(umount "$d" 2>&1) case $M in *" not mounted") :;; *) echo "$M" >&2 ;; esac done for res in $(drbdsetup all show | sed -ne 's/^resource \(.*\) {$/\1/p'); do drbdsetup "$res" down done _udev_settle &> /dev/null $RMMOD drbd && break fi done [ -f /var/lock/subsys/drbd ] && rm /var/lock/subsys/drbd log_end_msg 0 ;; status) # NEEDS to be heartbeat friendly... # so: put some "OK" in the output. if [ -e $PROC_DRBD ]; then echo "drbd driver loaded OK; device status:" drbd_pretty_status $PROC_DRBD 2>/dev/null exit 0 else echo >&2 "drbd not loaded" exit 3 fi ;; reload) $DRBDADM sh-nop log_daemon_msg "Reloading DRBD configuration" $DRBDADM adjust all log_end_msg 0 ;; restart|force-reload) ( . $0 stop ) ( . $0 start ) ;; *) echo "Usage: /etc/init.d/drbd {start|stop|status|reload|restart|force-reload}" exit 1 ;; esac exit 0 drbd-8.4.4/scripts/drbd-overview.pl0000775000000000000000000001622512221261130015744 0ustar rootroot#!/usr/bin/perl use strict; use warnings; ## MAYBE set 'sane' PATH ?? 
$ENV{LANG} = 'C'; $ENV{LC_ALL} = 'C'; $ENV{LANGUAGE} = 'C'; #use Data::Dumper; # globals my $PROC_DRBD = "/proc/drbd"; my $stderr_to_dev_null = 1; my $watch = 0; my %drbd; my %minor_of_name; my %xen_info; my %virsh_info; # sets $drbd{minor}->{name} (and possibly ->{ll_dev}) sub map_minor_to_resource_names() { my @drbdadm_sh_status = `drbdadm sh-status`; my ($ll_res, $ll_dev, $ll_minor, $conf_res, $conf_vnr, $minor, $name, $vnr); for (@drbdadm_sh_status) { # volumes only present in >= 8.4 # some things generated by drbdadm /^_conf_res_name=(.*)\n/ and $conf_res = $1, $name = $conf_res; /^_conf_volume=(\d+)\n/ and $conf_vnr = $1; /^_stacked_on=(.*?)\n/ and $ll_res = $1; # not always present: /^_stacked_on_device=(.*)\n/ and $ll_dev = $1; /^_stacked_on_minor=(\d+)\n/ and $ll_minor = $1; # rest generated by drbdsetup /^_minor=(.*?)\n/ and $minor = $1; /^_res_name=(.+?)\n/ and $name = $1; /^_volume=(\d+)\n/ and $vnr = $1; /^_sh_status_process/ or next; $drbd{$minor}{name} = $name; if (defined $conf_vnr) { # >= 8.4, append /volume to resource name. # If both are present, they should be the same. But # just in case, prefer the kernel volume number, if it # is present and positive. Else, use the volume number # from the config. $drbd{$minor}{name} .= defined $vnr ? "/$vnr" : "/$conf_vnr"; } $minor_of_name{$name} = $minor; $drbd{$minor}{ll_dev} = defined($ll_dev) ? $ll_minor : $ll_res if $ll_res; } # fix up hack for git versions 8.3.1 > x > 8.3.0: # _stacked_on_minor information is missing, # _stacked_on is resource name # may be defined (and reported) out of order. 
	for my $i (keys %drbd) {
		next unless exists $drbd{$i}->{ll_dev};
		my $lower = $drbd{$i}->{ll_dev};
		next if $lower =~ /^\d+$/;
		next unless exists $minor_of_name{$lower};
		$drbd{$i}->{ll_dev} = $minor_of_name{$lower};
	}

	# fix up to be able to report "lower dev of:"
	for my $i (keys %drbd) {
		next unless exists $drbd{$i}->{ll_dev};
		my $lower = $drbd{$i}->{ll_dev};
		$drbd{$lower}->{ll_dev_of} = $i;
	}
}

sub ll_dev_info {
	my $i = shift;
	( "ll-dev of:", $i, $drbd{$i}{name} )
}

# sets $drbd{minor}->{state} (and possibly ->{sync})
sub slurp_proc_drbd_or_exit() {
	unless (open(PD,$PROC_DRBD)) {
		print "drbd not loaded\n";
		exit 0;
	}
	my $minor;
	while (defined($_ = <PD>)) {
		chomp;
		/^ *(\d+):/ and do {
			# skip unconfigured devices
			$minor = $1;
			if (/^ *(\d+): cs:Unconfigured/) {
				next unless exists $drbd{$minor}
					and exists $drbd{$minor}{name};
			}
			# add "-" for protocol, in case it is missing
			s/^(.* cs:.*\S) ([rs]...)$/$1 - $2/;
			# strip off what will be in the heading
			s/^(.* )cs:([^ ]* )(?:st|ro):([^ ]* )ds:([^ ]*)/$1$2$3$4/;
			s/^(.* )cs:([^ ]* )(?:st|ro):([^ ]* )ld:([^ ]*)/$1$2$3$4/;
			s/^(.* )cs:([^ ]*)$/$1$2/;
			# strip off leading minor number
			s/^ *\d+:\s+//;
			# add alignment helpers for Unconfigured devices
			s/Unconfigured$/$& . . . ./;
			$drbd{$minor}{state} = $_;
		};
		/^\t\[.*sync.ed:/ and do {
			$drbd{$minor}{sync} = $_;
		};
		/^\t[0-9 %]+oos:/ and do {
			$drbd{$minor}{sync} = $_;
		};
	}
	close PD;
	for (values %drbd) {
		$_->{state} ||= "Unconfigured . . .
."; } } # sets $drbd{minor}->{pv_info} sub get_pv_info() { for (`pvs --noheadings --units g -o pv_name,vg_name,pv_size,pv_used`) { m{^\s*/dev/drbd(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$} or next; # PV VG PSize Used $drbd{$1}{pv_info} = { vg => $2, size => $3, used => $4 }; } } sub pv_info { my $t = shift; "lvm-pv:", @{$t}{qw(vg size used)}; } # sets $drbd{minor}->{df_info} sub get_df_info() { for (`df -TPhl -x tmpfs`) { # Filesystem Type Size Used Avail Use% Mounted on m{^/dev/drbd(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)} or next; $drbd{$1}{df_info} = { type => $2, size => $3, used => $4, avail => $5, use_percent => $6, mountpoint => $7 }; } } sub df_info { my $t = shift; @{$t}{qw(mountpoint type size used avail use_percent)}; } # sets $drbd{minor}->{xen_info} sub get_xen_info() { my $dom; my $running = 0; my %i; for (`xm list --long`) { /^\s+\(name ([^)\n]+)\)/ and $dom = $1; /drbd:([^)\n]+)/ and $i{$minor_of_name{$1}}++; m{phy:/dev/drbd(\d+)} and $i{$1}++; /^\s+\(state r/ and $running = 1; if (/^\)$/) { for (keys %i) { $drbd{$_}{xen_info} = $running ? 
"\*$dom" : "_$dom";
			}
			$running = 0;
			%i = ();
		}
	}
}

# set $drbd{minor}->{virsh_info}
sub get_virsh_info() {
	local $/ = undef;
	my $virsh_list = `virsh list --all`;
	#  Id Name                 State
	# ----------------------------------
	#   1 mail                 running
	#   2 support              running
	#   - debian-master        shut off
	#   - www                  shut off

	my %info;
	my $virsh_dumpxml;
	my $pid;

	$virsh_list =~ s/^\s+Id\s+Name\s+?State\s*-+\n//;
	while ($virsh_list =~ m{^\s*(\S+)\s+(\S+)\s+(\S.*?)\n}gm) {
		$info{$2} = { id => $1, name => $2, state => $3 };
		# print STDERR "$1, $2, $3\n";
	}
	for my $dom (keys %info) {
		# add error processing as above
		$pid = open(V, "-|");
		return unless defined $pid;
		if ($pid == 0) { # child
			exec("virsh", "dumpxml", $dom) or die "can't exec program: $!";
			# NOTREACHED
		}
		# parent
		$_ = <V>;
		close(V) or warn "virsh dumpxml exit code: $?\n";
		for (m{]*>.+?}gs) {
			m{} or next;
			my ($path,$dev) = ($1,$2);
			if ($dev !~ /^\d+$/) {
				my @stat = stat($path) or next;
				$dev = $stat[6] & 0xff;
			}
			m{ $info{$dom}->{state} eq 'running' ? "\*$dom" : "_$dom",
				vdev => $1,
				bus => $2,
			};
		}
	}
}

sub virsh_info {
	my $t = shift;
	@{$t}{qw(domname vdev bus)};
}

# very stupid option handling
# first, for debugging of this script and its regex'es,
# allow reading from a prepared file instead of /proc/drbd
if (@ARGV > 1 and $ARGV[0] eq '--proc-drbd') {
	$PROC_DRBD = $ARGV[1];
	splice @ARGV,0,2;
}
$stderr_to_dev_null = 0 if @ARGV and $ARGV[0] eq '-d';
open STDERR, "/dev/null" if $stderr_to_dev_null;

map_minor_to_resource_names;
slurp_proc_drbd_or_exit;
get_pv_info;
get_df_info;
get_xen_info;
get_virsh_info;

# generate output, adjust columns
my @out = [];
my @maxw = ();
my $line = 0;
for my $m (sort { $a <=> $b } keys %drbd) {
	my $t = $drbd{$m};
	my @used_by = exists $t->{xen_info} ? "xen-vbd: $t->{xen_info}"
		    : exists $t->{pv_info} ? pv_info $t->{pv_info}
		    : exists $t->{df_info} ? df_info $t->{df_info}
		    : exists $t->{virsh_info} ? virsh_info $t->{virsh_info}
		    : exists $t->{ll_dev_of} ?
ll_dev_info $t->{ll_dev_of} : (); $out[$line] = [ sprintf("%3u:%s", $m, $t->{name} || "??not-found??"), defined($t->{ll_dev}) ? "^^$t->{ll_dev}" : "", split(/\s+/, $t->{state}), @used_by ]; for (my $c = 0; $c < @{$out[$line]}; $c++) { my $l = length($out[$line][$c]) + 1; $maxw[$c] = $l unless $maxw[$c] and $l < $maxw[$c]; } ++$line; if (defined $t->{sync}) { $out[$line++] = [ $t->{sync} ]; } } my @fmt = map { "%-${_}s" } @maxw; for (@out) { for (my $c = 0; $c < @$_; $c++) { printf $fmt[$c], $_->[$c]; } print "\n"; } drbd-8.4.4/scripts/drbd.conf0000664000000000000000000000020511516050235014406 0ustar rootroot# You can find an example in /usr/share/doc/drbd.../drbd.conf.example include "drbd.d/global_common.conf"; include "drbd.d/*.res"; drbd-8.4.4/scripts/drbd.conf.example0000664000000000000000000000614312221261130016037 0ustar rootrootresource example { options { on-no-data-accessible suspend-io; } net { cram-hmac-alg "sha1"; shared-secret "secret_string"; } # The disk section is possible on resource level and in each # volume section disk { # If you have a resonable RAID controller # with non volatile write cache (BBWC, flash) disk-flushes no; disk-barrier no; md-flushes no; } # volume sections on resource level, are inherited to all node # sections. Place it here if the backing devices have the same # device names on all your nodes. volume 1 { device minor 1; disk /dev/sdb1; meta-disk internal; disk { resync-after example/0; } } on wurzel { address 192.168.47.1:7780; volume 0 { device minor 0; disk /dev/vg_wurzel/lg_example; meta-disk /dev/vg_wurzel/lv_example_md; } } on sepp { address 192.168.47.2:7780; volume 0 { device minor 0; disk /dev/vg_sepp/lg_example; meta-disk /dev/vg_sepp/lv_example_md; } } } resource "ipv6_example_res" { net { cram-hmac-alg "sha1"; shared-secret "ieho4CiiUmaes6Ai"; } volume 2 { device "/dev/drbd_fancy_name" minor 0; disk /dev/vg0/example2; meta-disk internal; } on amd { # Here is an example of ipv6. 
# If you want to use ipv4 in ipv6 i.e. something like [::ffff:192.168.22.11] # you have to set disable-ip-verification in the global section. address ipv6 [fd0c:39f4:f135:305:230:48ff:fe63:5c9a]:7789; } on alf { address ipv6 [fd0c:39f4:f135:305:230:48ff:fe63:5ebe]:7789; } } # # A two volume setup with a node for disaster recovery in an off-site location. # resource alpha-bravo { net { cram-hmac-alg "sha1"; shared-secret "Gei6mahcui4Ai0Oh"; } on alpha { volume 0 { device minor 0; disk /dev/foo; meta-disk /dev/bar; } volume 1 { device minor 1; disk /dev/foo1; meta-disk /dev/bar1; } address 192.168.23.21:7780; } on bravo { volume 0 { device minor 0; disk /dev/foo; meta-disk /dev/bar; } volume 1 { device minor 1; disk /dev/foo1; meta-disk /dev/bar1; } address 192.168.23.22:7780; } } resource stacked_multi_volume { net { protocol A; on-congestion pull-ahead; congestion-fill 400M; congestion-extents 1000; } disk { c-fill-target 10M; } volume 0 { device minor 10; } volume 1 { device minor 11; } proxy { memlimit 500M; plugin { lzma contexts 4 level 9; } } stacked-on-top-of alpha-bravo { address 192.168.23.23:7780; proxy on charly { # In the regular production site, there is a dedicated host to run # DRBD-proxy inside 192.168.23.24:7780; # for connections to DRBD outside 172.16.17.18:7780; # for connections over the WAN or VPN options { memlimit 1G; # Additional proxy options are possible here } } } on delta { volume 0 { device minor 0; disk /dev/foo; meta-disk /dev/bar; } volume 1 { device minor 1; disk /dev/foo1; meta-disk /dev/bar1; } address 127.0.0.2:7780; proxy on delta { # In the DR-site the proxy runs on the machine that stores the data inside 127.0.0.1:7780; outside 172.16.17.19:7780; } } } drbd-8.4.4/scripts/drbd.gentoo0000664000000000000000000000702511610000300014742 0ustar rootroot#!/sbin/runscript # Distributed under the terms of the GNU General Public License v2 # Copright 2001-2008 LINBIT Information Technologies # Philipp Reisner, Lars Ellenberg # Original 
# script adapted to gentoo environment

# I so do not see why gentoo would need its own init script.
# But if you think it does, well, you get to fix it.
# See what we do in the generic one.
echo "FIXME, contributors welcome. This is broken for 8.4" >&2
exit 255

depend() {
	use logger
	need net
	before heartbeat
	after sshd
}

opts="${opts} reload"

DEFAULTFILE="/etc/conf.d/drbd"
DRBDADM="/sbin/drbdadm"
PROC_DRBD="/proc/drbd"
MODPROBE="/sbin/modprobe"
RMMOD="/sbin/rmmod"
UDEV_TIMEOUT=10
ADD_MOD_PARAM=""

if [ -f $DEFAULTFILE ]; then
	. $DEFAULTFILE
fi

# Just in case drbdadm wants to display any errors in the configuration
# file, or we need to ask the user about registering this installation
# at http://usage.drbd.org, we call drbdadm here without any IO
# redirection.
$DRBDADM sh-nop

function assure_module_is_loaded {
	[ -e "$PROC_DRBD" ] && return
	ebegin "Loading drbd module"
	ret=0
	$MODPROBE -s drbd `$DRBDADM sh-mod-parms` $ADD_MOD_PARAM || ret=20
	eend $ret
	return $ret
}

function adjust_with_progress {
	IFS_O=$IFS
	NEWLINE='
'
	IFS=$NEWLINE
	local D=0
	local S=0
	local N=0

	einfon "Setting drbd parameters "
	COMMANDS=`$DRBDADM -d adjust all` || {
		eend 20 "Error executing drbdadm"
		return 20
	}
	echo -n "[ "

	for CMD in $COMMANDS; do
		if echo $CMD | grep -q disk; then echo -n "d$D "; D=$(( D+1 ));
		elif echo $CMD | grep -q syncer; then echo -n "s$S "; S=$(( S+1 ));
		elif echo $CMD | grep -q net; then echo -n "n$N "; N=$(( N+1 ));
		else echo echo -n ".. ";
		fi
		IFS=$IFS_O
		$CMD || {
			echo
			eend 20 "cmd $CMD failed!"
			return 20
		}
		IFS=$NEWLINE
	done
	echo "]"
	eend 0

	IFS=$IFS_O
}

start() {
	einfo "Starting DRBD resources:"
	eindent
	assure_module_is_loaded || return $?
	adjust_with_progress || return $?

	# make sure udev has time to create the device files
	ebegin "Waiting for udev device creation ..."
	for RESOURCE in `$DRBDADM sh-resources`; do
		for DEVICE in `$DRBDADM sh-dev $RESOURCE`; do
			UDEV_TIMEOUT_LOCAL=$UDEV_TIMEOUT
			while [ !
-e $DEVICE ] && [ $UDEV_TIMEOUT_LOCAL -gt 0 ] ; do sleep 1 UDEV_TIMEOUT_LOCAL=$(( $UDEV_TIMEOUT_LOCAL-1 )) done done done eend 0 einfon "Waiting for connection " $DRBDADM wait-con-int echo ret=$? eend $ret return $ret } stop() { ebegin "Stopping all DRBD resources" # Check for mounted drbd devices if ! grep -q '^/dev/drbd' /proc/mounts &>/dev/null; then if [ -e ${PROC_DRBD} ]; then ${DRBDADM} down all ${RMMOD} drbd fi ret=$? eend $ret return $ret else einfo "drbd devices mounted, please umount them before trying to stop drbd!" eend 1 return 1 fi } status() { # NEEDS to be heartbeat friendly... # so: put some "OK" in the output. if [ -e $PROC_DRBD ]; then ret=0 ebegin "drbd driver loaded OK; device status:" eend $ret cat $PROC_DRBD else ebegin "drbd not loaded" ret=3 eend $ret fi return $ret } reload() { ebegin "Reloading DRBD" ${DRBDADM} adjust all ret=$? eend $ret return $ret } drbd-8.4.4/scripts/drbd.metadata.rhcs0000664000000000000000000000261411516050235016205 0ustar rootroot 1.0 This is a DRBD resource. The resource must be configured in the configuration file (/etc/drbd.conf), and the DRBD kernel module must be loaded. This is a DRBD resource. Symbolic name for this resource. Cluster resource name The DRBD resource name, as specified in /etc/drbd.conf. DRBD resource name drbd-8.4.4/scripts/drbd.ocf0000664000000000000000000010076012221261130014227 0ustar rootroot#!/bin/bash # # # OCF Resource Agent compliant drbd resource script. # # Copyright (c) 2009 LINBIT HA-Solutions GmbH, # Copyright (c) 2009 Florian Haas, Lars Ellenberg # Based on the Heartbeat drbd OCF Resource Agent by Lars Marowsky-Bree # (though it turned out to be an almost complete rewrite) # # All Rights Reserved. # # This program is free software; you can redistribute it and/or modify # it under the terms of version 2 of the GNU General Public License as # published by the Free Software Foundation. 
# # This program is distributed in the hope that it would be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. # # Further, this software is distributed without any warranty that it is # free of the rightful claim of any third person regarding infringement # or the like. Any license provided herein, whether implied or # otherwise, applies only to this software file. Patent licenses, if # any, provided herein do not apply to combinations of this program with # other software, or any other product whatsoever. # # You should have received a copy of the GNU General Public License # along with this program; if not, write the Free Software Foundation, # Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA. # # # OCF instance parameters # OCF_RESKEY_drbd_resource # OCF_RESKEY_drbdconf # OCF_RESKEY_stop_outdates_secondary # OCF_RESKEY_adjust_master_score # # meta stuff this agent looks at: # OCF_RESKEY_CRM_meta_clone_max # OCF_RESKEY_CRM_meta_clone_node_max # OCF_RESKEY_CRM_meta_master_max # OCF_RESKEY_CRM_meta_master_node_max # # OCF_RESKEY_CRM_meta_interval # # OCF_RESKEY_CRM_meta_notify # OCF_RESKEY_CRM_meta_notify_active_uname # OCF_RESKEY_CRM_meta_notify_demote_uname # OCF_RESKEY_CRM_meta_notify_master_uname # OCF_RESKEY_CRM_meta_notify_operation # OCF_RESKEY_CRM_meta_notify_promote_uname # OCF_RESKEY_CRM_meta_notify_slave_uname # OCF_RESKEY_CRM_meta_notify_start_uname # OCF_RESKEY_CRM_meta_notify_stop_uname # OCF_RESKEY_CRM_meta_notify_type # ####################################################################### # Initialization: # Resource-agents have moved their ocf-shellfuncs file around. # There are supposed to be symlinks or wrapper files in the old location, # pointing to the new one, but people seem to get it wrong all the time. # Try several locations. if test -n "${OCF_FUNCTIONS_DIR}" ; then if test -e "${OCF_FUNCTIONS_DIR}/ocf-shellfuncs" ; then . 
		"${OCF_FUNCTIONS_DIR}/ocf-shellfuncs"
	elif test -e "${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs" ; then
		. "${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs"
	fi
else
	if test -e "${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs" ; then
		. "${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs"
	elif test -e "${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs"; then
		. "${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs"
	fi
fi

# Defaults
OCF_RESKEY_drbdconf_default="/etc/drbd.conf"

# The passed in OCF_CRM_meta_notify_* environment
# is not reliable with pacemaker up to at least
# 1.0.10 and 1.1.4. It should be fixed later.
# Until that is fixed, the "self-outdating feature" would base its actions on
# wrong information, and possibly not outdate when it should, or, even worse,
# outdate the last remaining valid copy.
# Disable.
OCF_RESKEY_stop_outdates_secondary_default="false"

OCF_RESKEY_adjust_master_score_default="5 10 1000 10000"
# ignored   | Consistent   | Unknown   -' |    |     |
# ignored   | NOT UpToDate | UpToDate  ---'    |     |
# Secondary | UpToDate     | unknown   --------'     |
# ignored   | UpToDate     | known     --------------+
# Primary   | UpToDate     | ignored   --------------'

: ${OCF_RESKEY_drbdconf:=${OCF_RESKEY_drbdconf_default}}
: ${OCF_RESKEY_stop_outdates_secondary:=${OCF_RESKEY_stop_outdates_secondary_default}}
: ${OCF_RESKEY_adjust_master_score:=${OCF_RESKEY_adjust_master_score_default}}

# Defaults according to "Configuration 1.0 Explained",
# "Multi-state resource configuration options"
: ${OCF_RESKEY_CRM_meta_clone_node_max=1}
: ${OCF_RESKEY_CRM_meta_master_max=1}
: ${OCF_RESKEY_CRM_meta_master_node_max=1}

#######################################################################
# for debugging this RA
DEBUG_LOG_DIR=/tmp/drbd.ocf.ra.debug
DEBUG_LOG=$DEBUG_LOG_DIR/log
USE_DEBUG_LOG=false
ls_stat_is_dir_0700_root() {
	set -- $(command ls -ldn "$1" 2>/dev/null);
	[[ $1/$3 = drwx?-??-?/0 ]]
}
# try to avoid symlink vuln.
if ls_stat_is_dir_0700_root $DEBUG_LOG_DIR &&
   [[ -w "$DEBUG_LOG" && !
-L "$DEBUG_LOG" ]] then USE_DEBUG_LOG=true exec 9>>"$DEBUG_LOG" date >&9 echo "$*" >&9 env | grep OCF_ | sort >&9 else exec 9>/dev/null fi # end of debugging aid ####################################################################### meta_data() { cat < 1.3 This resource agent manages a DRBD resource as a master/slave resource. DRBD is a shared-nothing replicated storage device. Note that you should configure resource level fencing in DRBD, this cannot be done from this resource agent. See the DRBD User's Guide for more information. http://www.drbd.org/docs/applications/ Manages a DRBD device as a Master/Slave resource The name of the drbd resource from the drbd.conf file. drbd resource name Full path to the drbd.conf file. Path to drbd.conf Space separated list of four master score adjustments for different scenarios: - only access to 'consistent' data - only remote access to 'uptodate' data - currently Secondary, local access to 'uptodate' data, but remote is unknown - local access to 'uptodate' data, and currently Primary or remote is known Numeric values are expected to be non-decreasing. Default are the previously hardcoded values. Set the first value to 0 (and configure proper fencing methods) to prevent pacemaker from trying to promote while it is unclear whether the data is really the most recent copy. (DRBD knows it is "consistent", but is unsure about "uptodate"ness). Advanced use: Adjust the other values to better fit into complex dependency score calculations. master score adjustments Recommended setting: leave at default (disabled). Note that this feature depends on the passed in information in OCF_RESKEY_CRM_meta_notify_master_uname to be correct, which unfortunately is not reliable for pacemaker versions up to at least 1.0.10 / 1.1.4. If a Secondary is stopped (unconfigured), it may be marked as outdated in the drbd meta data, if we know there is still a Primary running in the cluster. 
Note that this does not affect fencing policies set in drbd config, but is an additional safety feature of this resource agent only. You can enable this behaviour by setting the parameter to true. If this feature seems to not do what you expect, make sure you have defined fencing policies in the drbd configuration as well. outdate a secondary on stop END } do_cmd() { # Run a command, return its exit code, capture any output, and log # everything if appropriate. local cmd="$*" cmd_out ret ocf_log debug "$DRBD_RESOURCE: Calling $cmd" cmd_out=$( "$@" ) ret=$? if [ $ret != 0 ]; then ocf_log err "$DRBD_RESOURCE: Called $cmd" ocf_log err "$DRBD_RESOURCE: Exit code $ret" ocf_log err "$DRBD_RESOURCE: Command output: $cmd_out" else ocf_log debug "$DRBD_RESOURCE: Exit code $ret" ocf_log debug "$DRBD_RESOURCE: Command output: $cmd_out" fi echo "$cmd_out" return $ret } do_drbdadm() { local ret # Run drbdadm with appropriate command line options, and capture # its output. # $DRBDADM is defined during drbd_validate as "drbdadm" plus # appropriate command line options do_cmd $DRBDADM "$@" ret=$? # having the version mismatch warning once per RA invokation # should be enough. export DRBD_DONT_WARN_ON_VERSION_MISMATCH= return $ret } set_master_score() { # Use quiet mode (-Q) to quench logging. Actual score updates # will get logged by attrd anyway do_cmd ${HA_SBIN_DIR}/crm_master -Q -l reboot -v $1 } remove_master_score() { do_cmd ${HA_SBIN_DIR}/crm_master -l reboot -D } _sh_status_process() { # _volume not present should not happen, # but may help make this agent work even if it talks to drbd 8.3. 
: ${_volume:=0} # not-yet-created volumes are reported as -1 (( _volume >= 0 )) || _volume=$[1 << 16] DRBD_ROLE_LOCAL[$_volume]=${_role:-Unconfigured} DRBD_ROLE_REMOTE[$_volume]=${_peer:-Unknown} DRBD_CSTATE[$_volume]=$_cstate DRBD_DSTATE_LOCAL[$_volume]=${_disk:-Unconfigured} DRBD_DSTATE_REMOTE[$_volume]=${_pdsk:-DUnknown} } drbd_set_status_variables() { # drbdsetup sh-status prints these values to stdout, # and then prints _sh_status_process. # # if we eval that, we do not need several drbdadm/drbdsetup commands # to figure out the various aspects of the state. local _minor _res_name _known _cstate _role _peer _disk _pdsk local _volume local _flags_susp _flags_aftr_isp _flags_peer_isp _flags_user_isp local _resynced_percent # NOT local! but "return values" # since 8.4 supports multi volumes per resource, # these are shell arrays DRBD_ROLE_LOCAL=(Unconfigured) DRBD_ROLE_REMOTE=(Unknown) DRBD_CSTATE=(Unconfigured) DRBD_DSTATE_LOCAL=(Unconfigured) DRBD_DSTATE_REMOTE=(DUnknown) # Populates a set of variables relevant to DRBD's status eval "$($DRBDSETUP "$DRBD_RESOURCE" sh-status)" } # This is not the only fencing mechanism. # But in addition to the drbd "fence-peer" handler, which should be configured, # and is expected to place some appropriate constraints, this is used to # actually store the Outdated information in DRBD on-disk meta data. # # called after stop, and from post notification events. maybe_outdate_self() { # if you claim your right to go online with stale data, # there you are. ocf_is_true $OCF_RESKEY_stop_outdates_secondary || return 1 local host stop_uname # We ignore $OCF_RESKEY_CRM_meta_notify_promote_uname here # because: if demote and promote for a _stacked_ resource # (or a "floating" one, where DRBD sits on top of some SAN) # happen in the same transition, demote will see the promote # hostname here, and voluntarily outdate itself. Which would # result in promote failure, as it is using the same meta # data, which would then be outdated. 
# If that is not sufficient for you, you probably need to # configure fencing policies in the drbd configuration. host=$(printf "%s\n" $OCF_RESKEY_CRM_meta_notify_master_uname | grep -vix -m1 -e "$HOSTNAME" ) if [[ -z $host ]] ; then # no current master host found, do not outdate myself return 1 fi for stop_uname in $OCF_RESKEY_CRM_meta_notify_stop_uname; do [[ $host == "$stop_uname" ]] || continue # post notification for stop on that host. # hrmpf. crm passed in stale master_uname :( # ignore return 1 done # e.g. post/promote of some other peer. # Should not happen, fencing constraints should take care of that. # But in case it does, scream out loud. case "${DRBD_ROLE_LOCAL[*]}" in *Primary*) # I am Primary. # The other one is Primary (according to OCF_RESKEY_CRM_meta_notify_master_uname). # But we cannot talk to each other :( (otherwise this function was not called) # One of us has to die. # Which one, however, is not ours to decide. ocf_log crit "resource internal SPLIT BRAIN: both $HOSTNAME and $host are Primary for $DRBD_RESOURCE, but the replication link is down!" return 1 esac # OK, I am not Primary, but there is an other node Primary # Outdate myself ocf_log notice "outdating $DRBD_RESOURCE: according to OCF_RESKEY_CRM_meta_notify_master_uname, '$host' is still master" do_drbdadm outdate $DRBD_RESOURCE # on some pacemaker versions, -INFINITY may cause resource instance stop/start. # But in this case that is ok, it may even clear the replication link # problem. set_master_score -INFINITY return 0 } drbd_update_master_score() { set -- $OCF_RESKEY_adjust_master_score local only_consistent=$1 only_remote=$2 local_ok=$3 as_good_as_it_gets=$4 # NOTE # there may be constraint scores from rules on role=Master, # that in some ways can add to the node attribute based master score we # specify below. 
	# If you think you want to add personal preferences,
	# in case the scores given by this RA do not suffice, this is the
	# value space you can work with:
	# -INFINITY: Do not promote. Really. Won't work anyways.
	#        Too bad, at least with current (October 2009) Pacemaker,
	#        negative master scores cause instance stop; restart cycle :(
	# missing, zero: Do not promote.
	#        I think my data is not good enough.
	#        Though, of course, you may try, and it might even work.
	#     5: please, do not promote, unless this is your only option.
	#    10: promotion is probably a bad idea, our local data is no good,
	#        you'd probably run into severe performance problems, and risk
	#        application crashes or blocking IO in case you lose the
	#        replication connection.
	#  1000: Ok to be promoted, we have good data locally (though we don't
	#        know about the peer, so possibly it has even better data?).
	#        You should use the crm-fence-peer.sh handler or similar
	#        mechanism to avoid data divergence.
	# 10000: Please promote me/keep me Primary.
	#        I'm confident that my data is as good as it gets.
	#
	# For multi volume, we need to compare who is "better" a bit more sophisticated.
	# The ${XXX[*]//UpToDate}, without being in double quotes, results in a single space,
	# if all are UpToDate.
	: == DEBUG == ${DRBD_ROLE_LOCAL[*]}/${DRBD_DSTATE_LOCAL[*]//UpToDate/ }/${DRBD_DSTATE_REMOTE[*]//UpToDate/ }/ ==
	case ${DRBD_ROLE_LOCAL[*]}/${DRBD_DSTATE_LOCAL[*]//UpToDate/ }/${DRBD_DSTATE_REMOTE[*]//UpToDate/ }/ in
	*Primary*/\ /*/)
		# I am Primary, all local disks are UpToDate
		set_master_score $as_good_as_it_gets
		;;
	*/\ /*DUnknown*/)
		# all local disks are UpToDate,
		# but I'm not Primary,
		# and I'm not sure about the peer's disk state(s).
		# We may need to outdate ourselves?
		# But if we outdate in a MONITOR, and are disconnected
		# secondary because of a hard primary crash, before CRM noticed
		# that there is no more master, we'd make ourselves utterly useless!
        # Trust that the primary will also notice the disconnect,
        # and will place an appropriate fencing constraint via
        # its fence-peer handler callback.
        set_master_score $local_ok
        ;;
    */\ /*/)
        # We know something about our peer, which means that either the
        # replication link is established, or it was not even
        # consistent last time we talked to each other.
        # Also all our local disks are UpToDate, which means even if we are
        # currently synchronizing, we do so as SyncSource.
        set_master_score $as_good_as_it_gets
        ;;
    */*/\ /)
        # At least one of our local disks is not up to date.
        # But our peer is ALL OK.
        # We can expect to have access to useful
        # data, but must expect degraded performance.
        set_master_score $only_remote
        ;;
    */*Attaching*/*/|\
    */*Negotiating*/*/)
        # some transitional state.
        # just don't do anything
        :
        ;;
    Unconfigured*|\
    */*Diskless*/*/|\
    */*Failed*/*/|\
    */*Inconsistent*/*/|\
    */*Outdated*/*/)
        # ALWAYS put the cluster in MAINTENANCE MODE
        # if you add a volume to a live replication group,
        # because the new volume will typically come up as Inconsistent
        # the first time, which would cause a monitor to revoke the
        # master score!
        #
        # At least some of our local disks are not really usable.
        # Our peer is not all good either (or some previous case block
        # would have matched). We have no access to useful data.
        # DRBD would refuse to be promoted, anyways.
        #
        # set_master_score -INFINITY
        # Too bad, at least with current (October 2009) Pacemaker,
        # negative master scores cause instance stop; restart cycle :(
        # Hope that this will suffice.
        remove_master_score
        ;;
    *)
        # All local disks seem to be Consistent.
        # They _may_ be up to date, or not.
        # We hope that fencing mechanisms have put constraints in
        # place, so we won't be promoted with stale data.
        # But in case this was a cluster crash,
        # at least allow _someone_ to be promoted.
set_master_score $only_consistent ;; esac return $OCF_SUCCESS } is_drbd_enabled() { test -f /proc/drbd } ####################################################################### drbd_usage() { echo "\ usage: $0 {start|stop|monitor|validate-all|promote|demote|notify|meta-data} Expects to have a fully populated OCF RA-compliant environment set." } drbd_status() { local rc local dev rc=$OCF_NOT_RUNNING is_drbd_enabled || return $rc # Not running, if no block devices exist. # # FIXME what if some do, and some do not exist? # Adding/removing volumes to/from existing resources should only be # done with maintenance-mode enabled. # If someone does manually kill/remove only some of the volumes, # we tolerate that here. for dev in ${DRBD_DEVICES[@]} ""; do test -b $dev && break done [[ $dev ]] || return $rc # ok, module is loaded, block device nodes exist. # lets see the status drbd_set_status_variables case "${DRBD_ROLE_LOCAL[*]}" in *Primary*) rc=$OCF_RUNNING_MASTER ;; *Secondary*) rc=$OCF_SUCCESS ;; *Unconfigured*) rc=$OCF_NOT_RUNNING ;; *) ocf_log err "Unexpected role(s) >>${DRBD_ROLE_LOCAL[*]}<<" rc=$OCF_ERR_GENERIC esac return $rc } # I'm sorry, but there is no $OCF_DEGRADED_MASTER or similar yet. drbd_monitor() { local status DRBD_ROLE_LOCAL=(Unconfigured) DRBD_ROLE_REMOTE=(Unknown) DRBD_CSTATE=(Unconfigured) DRBD_DSTATE_LOCAL=(Unconfigured) DRBD_DSTATE_REMOTE=(DUnknown) drbd_status status=$? 
if [[ $status = $OCF_NOT_RUNNING ]] && ocf_is_probe ; then # see also linux-ha mailing list archives, # From: Andrew Beekhof # Subject: Re: pacemaker+drbd promotion delay # Date: 2012-04-13 01:47:37 GMT # e.g.: http://thread.gmane.org/gmane.linux.highavailability.user/37089/focus=37163 # --- : "do nothing" ; else drbd_update_master_score fi case $status in (0) : "OCF_SUCCESS" ;; (1) : "OCF_ERR_GENERIC" ;; (2) : "OCF_ERR_ARGS" ;; (3) : "OCF_ERR_UNIMPLEMENTED" ;; (4) : "OCF_ERR_PERM" ;; (5) : "OCF_ERR_INSTALLED" ;; (6) : "OCF_ERR_CONFIGURED" ;; (7) : "OCF_NOT_RUNNING" ;; (8) : "OCF_RUNNING_MASTER" ;; (9) : "OCF_FAILED_MASTER" ;; (*) : " WTF? $status " ;; esac return $status } figure_out_drbd_peer_uname() { # depending on whether or not the peer is currently # configured, slave, master, or about to be started, # it may be mentioned in various variables (or not at all) local x # intentionally not cared for stop_uname x=$(printf "%s\n" \ $OCF_RESKEY_CRM_meta_notify_start_uname \ $OCF_RESKEY_CRM_meta_notify_promote_uname \ $OCF_RESKEY_CRM_meta_notify_master_uname \ $OCF_RESKEY_CRM_meta_notify_slave_uname \ $OCF_RESKEY_CRM_meta_notify_demote_uname | grep -vix -m1 -e "$HOSTNAME" ) DRBD_TO_PEER=${x:+ --peer $x} } my_udevsettle() { for dev in ${DRBD_DEVICES[@]}; do while ! test -b $dev; do sleep 1; done done return 0 } create_device_udev_settle() { local dev if $DRBD_HAS_MULTI_VOLUME; then if do_drbdadm new-resource $DRBD_RESOURCE && do_drbdadm sh-new-minor $DRBD_RESOURCE; then my_udevsettle else return 1 fi elif do_drbdadm syncer $DRBD_RESOURCE ; then my_udevsettle else return 1 fi } drbd_start() { local rc local status local first_try=true rc=$OCF_ERR_GENERIC if ! is_drbd_enabled; then do_cmd modprobe -s drbd `$DRBDADM sh-mod-parms` || { ocf_log err "Cannot load the drbd module."; return $OCF_ERR_INSTALLED } ocf_log debug "$DRBD_RESOURCE start: Module loaded." 
fi # Keep trying to bring up the resource; # wait for the CRM to time us out if this fails while :; do drbd_status status=$? case "$status" in $OCF_SUCCESS) # Just in case we have to adjust something, this is a # good place to do it. Actually, we don't expect to be # called to "start" an already "running" resource, so # this is probably dead code. # Also, ignore the exit code of adjust, as we are # "running" already, anyways, right? figure_out_drbd_peer_uname do_drbdadm $DRBD_TO_PEER adjust $DRBD_RESOURCE rc=$OCF_SUCCESS break ;; $OCF_NOT_RUNNING) # Check for offline resize. If using internal meta data, # we may need to move it first to its expected location. $first_try && do_drbdadm check-resize $DRBD_RESOURCE figure_out_drbd_peer_uname if ! create_device_udev_settle; then # We cannot even create the objects exit $OCF_ERR_GENERIC fi if ! do_drbdadm $DRBD_TO_PEER attach $DRBD_RESOURCE ; then # If we cannot up it, even on the second try, # it is unlikely to get better. Don't wait for # this operation to timeout, but short circuit # exit with generic error. $first_try || exit $OCF_ERR_GENERIC sleep 1 fi ;; $OCF_RUNNING_MASTER) ocf_log warn "$DRBD_RESOURCE already Primary, demoting." do_drbdadm secondary $DRBD_RESOURCE esac $first_try || sleep 1 first_try=false done # in case someone does not configure monitor, # we must at least call it once after start. drbd_update_master_score return $rc } drbd_promote() { local rc local status local first_try=true rc=$OCF_ERR_GENERIC # Keep trying to promote the resource; # wait for the CRM to time us out if this fails while :; do drbd_status status=$? case "$status" in $OCF_SUCCESS) do_drbdadm primary $DRBD_RESOURCE if [[ $? = 17 ]]; then # All available disks are inconsistent, # or I am consistent, but failed to fence the peer. # Cannot become primary. # No need to retry indefinitely. 
                ocf_log crit "Refusing to be promoted to Primary without UpToDate data"
                break
            fi
            ;;
        $OCF_NOT_RUNNING)
            ocf_log error "Trying to promote a resource that was not started"
            break
            ;;
        $OCF_RUNNING_MASTER)
            rc=$OCF_SUCCESS
            break
        esac
        $first_try || sleep 1
        first_try=false
    done

    # avoid too tight pacemaker driven "recovery" loop,
    # if promotion keeps failing for some reason
    if [[ $rc != 0 ]] && (( $SECONDS < 15 )) ; then
        delay=$(( 15 - SECONDS ))
        ocf_log warn "promotion failed; sleep $delay # to prevent tight recovery loop"
        sleep $delay
    fi

    return $rc
}

drbd_demote() {
    local rc
    local status
    local first_try=true

    rc=$OCF_ERR_GENERIC

    # Keep trying to demote the resource;
    # wait for the CRM to time us out if this fails
    while :; do
        drbd_status
        status=$?
        case "$status" in
        $OCF_SUCCESS)
            rc=$OCF_SUCCESS
            break
            ;;
        $OCF_NOT_RUNNING)
            ocf_log error "Trying to demote a resource that was not started"
            break
            ;;
        $OCF_RUNNING_MASTER)
            do_drbdadm secondary $DRBD_RESOURCE
        esac
        $first_try || sleep 1
        first_try=false
    done

    return $rc
}

drbd_stop() {
    local rc=$OCF_ERR_GENERIC
    local first_try=true

    # Keep trying to bring down the resource;
    # wait for the CRM to time us out if this fails
    while :; do
        drbd_status
        status=$?
        case "$status" in
        $OCF_SUCCESS)
            do_drbdadm down $DRBD_RESOURCE
            ;;
        $OCF_NOT_RUNNING)
            # Just in case, down it anyways, in case it has been
            # deconfigured but not yet removed.
            # Relevant for >= 8.4.
            do_drbdadm down $DRBD_RESOURCE
            # But ignore any return codes,
            # we are not running, so stop is successful.
            rc=$OCF_SUCCESS
            break
            ;;
        $OCF_RUNNING_MASTER)
            ocf_log warn "$DRBD_RESOURCE still Primary, demoting."
            do_drbdadm secondary $DRBD_RESOURCE
        esac
        $first_try || sleep 1
        first_try=false
    done

    # if there is some Master (Primary) still around,
    # outdate myself in drbd on-disk meta data.
    maybe_outdate_self

    # do not leave old master scores lying around.
    # they may confuse crm if this node was set to standby.
    remove_master_score

    return $rc
}

drbd_notify() {
    local n_type=$OCF_RESKEY_CRM_meta_notify_type
    local n_op=$OCF_RESKEY_CRM_meta_notify_operation

    # active_* and *_resource not really interesting
    # : "== DEBUG == active = $OCF_RESKEY_CRM_meta_notify_active_uname"
    : "== DEBUG == slave = $OCF_RESKEY_CRM_meta_notify_slave_uname"
    : "== DEBUG == master = $OCF_RESKEY_CRM_meta_notify_master_uname"
    : "== DEBUG == start = $OCF_RESKEY_CRM_meta_notify_start_uname"
    : "== DEBUG == promote = $OCF_RESKEY_CRM_meta_notify_promote_uname"
    : "== DEBUG == stop = $OCF_RESKEY_CRM_meta_notify_stop_uname"
    : "== DEBUG == demote = $OCF_RESKEY_CRM_meta_notify_demote_uname"

    case $n_type/$n_op in
    */start)
        # We do not get a /pre/ start notification for ourself.
        # but we get a /pre/ start notification for the other side, unless both
        # are started from the same transition graph. If there are only two
        # peers (the "classic" two-node DRBD), this adjust is usually a no-op.
        #
        # In case of more than one _possible_ peer, we may still be StandAlone,
        # or configured for a meanwhile failed peer, and should now adjust our
        # network settings during pre-notification of start of the other node.
        #
        # We usually get /post/ notification for ourself and the peer.
        # In both cases adjust should be a no-op.
        drbd_set_status_variables
        figure_out_drbd_peer_uname
        do_drbdadm $DRBD_TO_PEER -v adjust $DRBD_RESOURCE
        ;;
    post/*)
        # After something has been done is a good time to
        # recheck our status:
        drbd_set_status_variables
        drbd_update_master_score

        : == DEBUG == ${DRBD_DSTATE_REMOTE[*]} ==
        case ${DRBD_DSTATE_REMOTE[*]} in
        *DUnknown*)
            # Still not communicating.
            # Maybe someone else is primary (too)?
            maybe_outdate_self
        esac
    esac

    return $OCF_SUCCESS
}

# "macro" to be able to give useful error messages
# on clone resource configuration error.
meta_expect() { local what=$1 whatvar=OCF_RESKEY_CRM_meta_${1//-/_} op=$2 expect=$3 local val=${!whatvar} if [[ -n $val ]]; then # [, not [[, or it won't work ;) [ $val $op $expect ] && return fi ocf_log err "meta parameter misconfigured, expected $what $op $expect, but found ${val:-unset}." exit $OCF_ERR_CONFIGURED } ls_stat_is_block_maj_147() { set -- $(command ls -L -l "$1" 2>/dev/null) [[ $1 = b* ]] && [[ $5 == 147,* ]] } check_crm_feature_set() { set -- ${OCF_RESKEY_crm_feature_set//[!0-9]/ } local a=${1:-0} b=${2:-0} c=${3:-0} (( a > 3 )) || (( a == 3 && b > 0 )) || (( a == 3 && b == 0 && c > 0 )) || ocf_log warn "You may be disappointed: This RA is intended for pacemaker 1.0 or better!" } drbd_validate_all () { DRBDADM="drbdadm" DRBDSETUP="drbdsetup" DRBD_HAS_MULTI_VOLUME=false # these will _exit_ if they don't find the binaries check_binary $DRBDADM check_binary $DRBDSETUP # XXX I really take cibadmin, sed, grep, etc. for granted. local VERSION DRBDADM_VERSION_CODE=0 if VERSION="$($DRBDADM --version 2>/dev/null)"; then eval $VERSION fi if (( $DRBDADM_VERSION_CODE >= 0x080400 )); then DRBD_HAS_MULTI_VOLUME=true fi check_crm_feature_set # Check clone and M/S options. meta_expect clone-max -le 2 meta_expect clone-node-max = 1 meta_expect master-node-max = 1 meta_expect master-max -le 2 # Rather than returning $OCF_ERR_CONFIGURED, we sometimes return # $OCF_ERR_INSTALLED here: the local config may be broken, but some # other node may have a valid config. # check drbdconf plausibility case "$OCF_RESKEY_drbdconf" in "") # this is actually ok. drbdadm has its own builtin defaults. # but as long as we assign an explicit default above, # this cannot happen anyways. : ;; *[!-%+./0-9:=@A-Z_a-z]*) # no, I do not trust the configurable cib parameters. ocf_log err "drbdconf name must only contain [-%+./0-9:=@A-Z_a-z]" return $OCF_ERR_CONFIGURED ;; *) # Check if we can read the configuration file. if [ ! 
-r "${OCF_RESKEY_drbdconf}" ]; then ocf_log err "Configuration file ${OCF_RESKEY_drbdconf} does not exist or is not readable!" return $OCF_ERR_INSTALLED fi DRBDADM="$DRBDADM -c $OCF_RESKEY_drbdconf" esac # check drbd_resource plausibility case "$OCF_RESKEY_drbd_resource" in "") ocf_log err "No resource name specified!" return $OCF_ERR_CONFIGURED ;; *[!-%+./0-9:=@A-Z_a-z]*) # no, I do not trust the configurable cib parameters. ocf_log err "Resource name must only contain [-%+./0-9:=@A-Z_a-z]" return $OCF_ERR_CONFIGURED esac # exporting this is useful for "drbdsetup show". # and it makes it all a little bit more readable. export DRBD_RESOURCE=$OCF_RESKEY_drbd_resource # The resource should appear in the config file, # otherwise something's fishy # NOTE # since 8.4 has multi volume support, # DRBD_DEVICES will be a shell array! # FIXME we should double check that we explicitly restrict the set of # valid characters in device names... if DRBD_DEVICES=($($DRBDADM --stacked sh-dev $DRBD_RESOURCE 2>/dev/null)); then # apparently a "stacked" resource. Remember for future DRBDADM calls. DRBDADM="$DRBDADM -S" elif DRBD_DEVICES=($($DRBDADM sh-dev $DRBD_RESOURCE 2>/dev/null)); then : # nothing to do. else if [[ $__OCF_ACTION = "monitor" && $OCF_RESKEY_CRM_meta_interval = 0 ]]; then # ok, this was a probe. That may happen on any node, # to enforce configuration. return $OCF_NOT_RUNNING else # hm. probably misconfigured constraint somewhere. # sorry. don't retry anywhere. ocf_log err "DRBD resource ${DRBD_RESOURCE} not found in configuration file ${OCF_RESKEY_drbdconf}." remove_master_score return $OCF_ERR_INSTALLED fi fi # check for master-max and allow-two-primaries on start|promote only, # so it could be stopped still, if someone re-configured while running. case $__OCF_ACTION:$OCF_RESKEY_CRM_meta_master_max in start:2|promote:2) if ! 
$DRBDADM -d -v dump $DRBD_RESOURCE 2>/dev/null | grep -q -Ee '^[[:space:]]*allow-two-primaries([[:space:]]+yes)?;$' then ocf_log err "master-max = 2, but DRBD resource $DRBD_RESOURCE does not allow-two-primaries." return $OCF_ERR_CONFIGURED fi esac # detect whether notify is configured or not. # for probes, the meta_notify* namespace is not exported. case $__OCF_ACTION in monitor|validate-all) :;; *) # Test if the environment variables for either the notify # enabled, or one of its effects, are set. # If both are unset, we complain. if ! ocf_is_true ${OCF_RESKEY_CRM_meta_notify} && [[ ${OCF_RESKEY_CRM_meta_notify_start_uname- NOT SET } = " NOT SET " ]]; then ocf_log err "you really should enable notify when using this RA" return $OCF_ERR_CONFIGURED fi esac local i j n=0 fallback=false for i in $OCF_RESKEY_adjust_master_score; do [[ $i = *[!0-9]* ]] && fallback=true && ocf_log err "BAD adjust_master_score value $i ; falling back to default" [[ $j && $i -lt $j ]] && fallback=true && ocf_log err "BAD adjust_master_score value $j > $i ; falling back to default" n=$(( n+1 )) done [[ $n != 4 ]] && fallback=true && ocf_log err "Not enough adjust_master_score values ($n != 4); falling back to default" $fallback && OCF_RESKEY_adjust_master_score=$OCF_RESKEY_adjust_master_score_default # we use it in various places, # just make sure it contains what we expect. HOSTNAME=`uname -n` return $OCF_SUCCESS } ####################################################################### if [ $# != 1 ]; then drbd_usage exit $OCF_ERR_ARGS fi # if $__OCF_ACTION = monitor, but meta_interval not set, # this is a "probe". we could change behaviour. 
: ${OCF_RESKEY_CRM_meta_interval=0}

case $__OCF_ACTION in
meta-data)
    meta_data
    exit $OCF_SUCCESS
    ;;
usage)
    drbd_usage
    exit $OCF_SUCCESS
esac

if $USE_DEBUG_LOG ; then
    exec 2>&9
    set -x
fi

# Everything except usage and meta-data must pass the validate test
drbd_validate_all || exit

case $__OCF_ACTION in
start)
    drbd_start
    ;;
stop)
    drbd_stop
    ;;
notify)
    drbd_notify
    ;;
promote)
    drbd_promote
    ;;
demote)
    drbd_demote
    ;;
status)
    drbd_status
    ;;
monitor)
    drbd_monitor
    ;;
validate-all)
    ;;
*)
    drbd_usage
    exit $OCF_ERR_UNIMPLEMENTED
esac
# exit code is the exit code (return code) of the last command (shell function)
drbd-8.4.4/scripts/drbd.rules0000664000000000000000000000123511736532437014634 0ustar rootroot
# This file contains the rules to create named DRBD devices.

# DO NOT WRAP THIS LINE
#
# old udev does not understand some of it,
# and would end up skipping only some lines, not the full rule.
# which can cause all sort of trouble with strange-named device nodes
# for completely unrelated devices,
# resulting in unusable network loopback, etc.
#
# in case this is "accidentally" installed on a system with old udev,
# having it as one single line avoids those problems.
#
# DO NOT WRAP THIS LINE
SUBSYSTEM=="block", KERNEL=="drbd*", IMPORT{program}="/sbin/drbdadm sh-udev minor-%m", NAME="$env{DEVICE}", SYMLINK="drbd/by-res/$env{RESOURCE} drbd/by-disk/$env{DISK}"
drbd-8.4.4/scripts/drbd.sh.rhcs0000775000000000000000000000657611610000300015034 0ustar rootroot
#!/bin/bash
#
# Copyright LINBIT, 2008
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 2, or (at your option) any
# later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
# # You should have received a copy of the GNU General Public License # along with this program; see the file COPYING. If not, write to the # Free Software Foundation, Inc., 675 Mass Ave, Cambridge, # MA 02139, USA. # # # DRBD resource management using the drbdadm utility. # LC_ALL=C LANG=C PATH=/bin:/sbin:/usr/bin:/usr/sbin export LC_ALL LANG PATH . $(dirname $0)/ocf-shellfuncs drbd_verify_all() { # Do we have the drbdadm utility? if ! which drbdadm >/dev/null 2>&1 ; then ocf_log error "drbdadm not installed, not found in PATH ($PATH), or not executable." return $OCF_ERR_INSTALLED fi # Is drbd loaded? if ! grep drbd /proc/modules >/dev/null 2>&1; then ocf_log error "drbd not found in /proc/modules. Do you need to modprobe?" return $OCF_ERR_INSTALLED fi # Do we have the "resource" parameter? if [ -n "$OCF_RESKEY_resource" ]; then # Can drbdadm parse the resource name? if ! drbdadm sh-dev $OCF_RESKEY_resource >/dev/null 2>&1; then ocf_log error "DRBD resource \"$OCF_RESKEY_resource\" not found." return $OCF_ERR_CONFIGURED fi # Is the backing device a locally available block device? backing_dev=$(drbdadm sh-ll-dev $OCF_RESKEY_resource) for dev in $backing_dev ; do [ -b $dev ] && continue; ocf_log error "Backing device for DRBD resource \"$OCF_RESKEY_resource\" ($dev) not found or not a block device." return $OCF_ERR_INSTALLED done fi return 0 } drbd_status() { local all_primary=true for role in $(drbdadm role $OCF_RESKEY_resource); do case $role in Primary/*) ;; Secondary/*) all_primary=false ;; *) return $OCF_ERR_GENERIC ;; esac done $all_primary && return $OCF_SUCCESS return $OCF_NOT_RUNNING } drbd_promote() { drbdadm primary $OCF_RESKEY_resource || return $? } drbd_demote() { drbdadm secondary $OCF_RESKEY_resource || return $? 
}

if [ -z "$OCF_CHECK_LEVEL" ]; then
    OCF_CHECK_LEVEL=0
fi

# This one doesn't need to pass the verify check
case $1 in
meta-data)
    cat `echo $0 | sed 's/^\(.*\)\.sh$/\1.metadata/'` && exit 0
    exit $OCF_ERR_GENERIC
    ;;
esac

# Everything else does
drbd_verify_all || exit $?

case $1 in
start)
    if drbd_status; then
        ocf_log debug "DRBD resource ${OCF_RESKEY_resource} already configured"
        exit 0
    fi
    drbd_promote
    if [ $? -ne 0 ]; then
        exit $OCF_ERR_GENERIC
    fi
    exit $?
    ;;
stop)
    if drbd_status; then
        drbd_demote
        if [ $? -ne 0 ]; then
            exit $OCF_ERR_GENERIC
        fi
    else
        ocf_log debug "DRBD resource ${OCF_RESKEY_resource} is not configured"
    fi
    exit 0
    ;;
status|monitor)
    drbd_status
    exit $?
    ;;
restart)
    $0 stop || exit $OCF_ERR_GENERIC
    $0 start || exit $OCF_ERR_GENERIC
    exit 0
    ;;
verify-all)
    exit 0
    ;;
*)
    echo "usage: $0 {start|stop|status|monitor|restart|meta-data|verify-all}"
    exit $OCF_ERR_GENERIC
    ;;
esac
drbd-8.4.4/scripts/drbdadm.bash_completion0000664000000000000000000001124311610000300017274 0ustar rootroot
#
# /etc/bash_completion.d/drbdadm
#
# Bash completion for the DRBD top-level management application, drbdadm.
#
# If you have bash completion enabled, this module will
#
# - provide tab completion for drbdadm sub-commands (up, down, primary,
#   secondary etc.);
#
# - try to detect your current resource state and provide appropriate
#   command completion for the sub-command you provided. For example,
#   when you have entered the "primary" sub-command, it will list
#   only those resources that are currently in the Secondary role;
#
# - differentiate between stacked and unstacked resources.
#
# This module does NOT guarantee that the DRBD state engine will in
# fact agree to do what you ask it to. For example, resources that are
# currently Primary and not connected are not excluded from the
# completion list for the "detach" sub-command.
# # Finally, this module is only capable of parsing resources correctly # if you are using the default location for your DRBD configuration # file (/etc/drbd.conf). __drbdadm_all_resources() { # Detects all resources currently listed in drbd.conf local resources="$(${DRBDADM} sh-resources) all" COMPREPLY=( $(compgen -W "$resources" -- "$current") ) } __drbdadm_resources_by_status() { # Detects only those resources that match a particular status local status_type="$1" shift 1 local status_filter="$*" local resources="$(${DRBDADM} sh-resources)" local filtered_resources local res for res in $resources; do local resource_status="$(${DRBDADM} $status_type $res 2>/dev/null)" # In case of multiple volumes, consider only the first line set -- $resource_status resource_status=$1 local i for i in $status_filter; do if [ "${resource_status%%/*}" = $i ]; then filtered_resources="$filtered_resources $res" fi done done COMPREPLY=( $(compgen -W "$filtered_resources" -- "$current") ) } __drbdadm_commands() { # Lists drbdadm sub-commands COMPREPLY=( $(compgen -W "$drbdadm_command_list" -- "$current") ) } __drbdadm_options() { # Lists global drbdadm options local options='-d --dry-run -v --verbose -S --stacked -t --config-to-test' COMPREPLY=( $(compgen -W "$options" -- "$current") ) } __drbdadm_subcmd_options() { local subcmd="$1" local options=($(drbdadm help $subcmd | sed -e '1,/OPTIONS FOR/ d;/^$/,$ d;s/ \(--[a-z-]*\).*/\1/')) local filtered local o have for o in ${options[@]}; do for have in ${COMP_WORDS[@]}; do [[ $o = "$have" ]] && continue 2 done filtered="$filtered $o" done COMPREPLY=( $(compgen -W "$filtered" -- "$current") ) } _drbdadm() { local DRBDADM=${COMP_WORDS[0]} local drbdadm_command_list=' attach disk-options detach connect net-options disconnect up resource-options down primary secondary invalidate invalidate-remote outdate verify pause-sync resume-sync resize adjust wait-connect role cstate dstate dump wait-connect wait-con-int create-md dump-md wipe-md get-gi 
    show-gi help apply-al hidden-commands '

    # Redefine the drbdadm we use in __drbdadm_all_resources and
    # __drbdadm_resources_by_status, if running in stacked mode
    case "$COMP_LINE " in
    *" -S "*|*" --stacked "*)
        DRBDADM="$DRBDADM --stacked"
        ;;
    esac

    local current previous
    # The word currently being evaluated for completion
    current=${COMP_WORDS[COMP_CWORD]}
    # The word that precedes the currently-evaluated one
    previous=${COMP_WORDS[COMP_CWORD-1]}

    case "$previous" in
    drbdadm)
        case "$current" in
        -*)
            __drbdadm_options
            ;;
        *)
            __drbdadm_commands
            ;;
        esac
        ;;
    primary)
        __drbdadm_resources_by_status "role" "Secondary"
        ;;
    secondary)
        __drbdadm_resources_by_status "role" "Primary"
        ;;
    detach|disk-options)
        __drbdadm_resources_by_status "dstate" "UpToDate" "Inconsistent" "Outdated"
        ;;
    outdate)
        __drbdadm_resources_by_status "dstate" "UpToDate"
        ;;
    attach|apply-al)
        __drbdadm_resources_by_status "dstate" "Diskless" "Unconfigured"
        ;;
    connect)
        __drbdadm_resources_by_status "cstate" "StandAlone" "Unconfigured"
        ;;
    invalidate-remote)
        __drbdadm_resources_by_status "cstate" "Connected"
        ;;
    disconnect|net-options)
        __drbdadm_resources_by_status "cstate" "Connected" "WFConnection" "VerifyT" "VerifyS"
        ;;
    verify)
        __drbdadm_resources_by_status "cstate" "Connected"
        ;;
    pause-sync)
        __drbdadm_resources_by_status "cstate" "SyncSource" "SyncTarget"
        ;;
    resume-sync)
        __drbdadm_resources_by_status "cstate" "PausedSyncS" "PausedSyncT"
        ;;
    *)
        if (( COMP_CWORD > 2 )); then
            local subcmd
            subcmd=${COMP_WORDS[1]}
            case "$drbdadm_command_list" in
            *" $subcmd "*)
                __drbdadm_subcmd_options $subcmd
                ;;
            esac
        else
            __drbdadm_all_resources
        fi
        ;;
    esac
}

complete -o default -F _drbdadm drbdadm
drbd-8.4.4/scripts/drbddisk0000775000000000000000000000613212132747531014354 0ustar rootroot
#!/bin/bash
#
# This script is intended to be used as resource script by heartbeat
#
# Copyright 2003-2008 LINBIT Information Technologies
# Philipp Reisner, Lars Ellenberg
#
###

DEFAULTFILE="/etc/default/drbd"
DRBDADM="/sbin/drbdadm"

if [ -f $DEFAULTFILE ];
then . $DEFAULTFILE fi if [ "$#" -eq 2 ]; then RES="$1" CMD="$2" else RES="all" CMD="$1" fi ## EXIT CODES # since this is a "legacy heartbeat R1 resource agent" script, # exit codes actually do not matter that much as long as we conform to # http://wiki.linux-ha.org/HeartbeatResourceAgent # but it does not hurt to conform to lsb init-script exit codes, # where we can. # http://refspecs.linux-foundation.org/LSB_3.1.0/ # LSB-Core-generic/LSB-Core-generic/iniscrptact.html #### drbd_set_role_from_proc_drbd() { local out if ! test -e /proc/drbd; then ROLE="Unconfigured" return fi dev=$( $DRBDADM sh-dev $RES ) minor=${dev#/dev/drbd} if [[ $minor = *[!0-9]* ]] ; then # sh-minor is only supported since drbd 8.3.1 minor=$( $DRBDADM sh-minor $RES ) fi if [[ -z $minor ]] || [[ $minor = *[!0-9]* ]] ; then ROLE=Unknown return fi if out=$(sed -ne "/^ *$minor: cs:/ { s/:/ /g; p; q; }" /proc/drbd); then set -- $out ROLE=${5%/*} : ${ROLE:=Unconfigured} # if it does not show up else ROLE=Unknown fi } case "$CMD" in start) # try several times, in case heartbeat deadtime # was smaller than drbd ping time try=6 while true; do $DRBDADM primary $RES && break let "--try" || exit 1 # LSB generic error sleep 1 done ;; stop) # heartbeat (haresources mode) will retry failed stop # for a number of times in addition to this internal retry. try=3 while true; do $DRBDADM secondary $RES && break # We used to lie here, and pretend success for anything != 11, # to avoid the reboot on failed stop recovery for "simple # config errors" and such. But that is incorrect. # Don't lie to your cluster manager. # And don't do config errors... let --try || exit 1 # LSB generic error sleep 1 done ;; status) if [ "$RES" = "all" ]; then echo "A resource name is required for status inquiries." exit 10 fi ST=$( $DRBDADM role $RES ) ROLE=${ST%/*} case $ROLE in Primary|Secondary|Unconfigured) # expected ;; *) # unexpected. whatever... 
        # If we are unsure about the state of a resource, we need to
        # report it as possibly running, so heartbeat can, after failed
        # stop, do a recovery by reboot.
        # drbdsetup may fail for obscure reasons, e.g. if /var/lock/ is
        # suddenly readonly. So we retry by parsing /proc/drbd.
        drbd_set_role_from_proc_drbd
    esac
    case $ROLE in
    Primary)
        echo "running (Primary)"
        exit 0 # LSB status "service is OK"
        ;;
    Secondary|Unconfigured)
        echo "stopped ($ROLE)"
        exit 3 # LSB status "service is not running"
        ;;
    *)
        # NOTE the "running" in below message.
        # this is a "heartbeat" resource script,
        # the exit code is _ignored_.
        echo "cannot determine status, may be running ($ROLE)"
        exit 4 # LSB status "service status is unknown"
        ;;
    esac
    ;;
*)
    echo "Usage: drbddisk [resource] {start|stop|status}"
    exit 1
    ;;
esac

exit 0
drbd-8.4.4/scripts/drbdupper0000664000000000000000000000221712132747531014552 0ustar rootroot
#!/bin/bash
#
# This script is intended to be used as resource script by heartbeat
#
# Dec 2005 by Philipp Reisner.
#
# In heartbeat's haresources you should have:
#   IPAddr::XXX drbdupper::resource Filesystem::XXX
# in other words, you have to allocate the service IP before you
# try to activate the upper DRBD resource.
###

DEFAULTFILE="/etc/default/drbd"
DRBDADM="/sbin/drbdadm"

if [ -f $DEFAULTFILE ]; then
    . $DEFAULTFILE
fi

if [ "$#" -eq 2 ]; then
    RES="$1"
    CMD="$2"
else
    echo "A resource name is required."
    exit 10
fi

if [ "$RES" = "all" ]; then
    echo "A resource name is required."
    exit 10
fi

case "$CMD" in
start)
    set -e # exit if one of these fails
    $DRBDADM primary `$DRBDADM -S sh-lr-of $RES`
    $DRBDADM -S check-resize $RES || true # may fail
    $DRBDADM -S adjust $RES
    $DRBDADM -S wait-connect $RES || true # may fail
    $DRBDADM -S primary $RES
    ;;
stop)
    $DRBDADM -S down $RES
    $DRBDADM secondary `$DRBDADM -S sh-lr-of $RES`
    ;;
status)
    if $DRBDADM -S role $RES | grep -q "Primary/"; then
        echo "running"
    else
        echo "stopped"
    fi
    ;;
*)
    echo "Usage: drbdupper {resource} {start|stop|status}"
    exit 1
    ;;
esac

exit 0
drbd-8.4.4/scripts/get_uts_release.sh0000775000000000000000000000055511661146603016353 0ustar rootroot
#!/bin/bash
{
    for x in include/generated/utsrelease.h include/linux/{utsrelease,version}.h; do
        for d in $KDIR $O; do
            test -e "$d/$x" || continue;
            echo "#include \"$d/$x\"";
        done;
    done;
    echo "drbd_kernel_release UTS_RELEASE"
} | gcc -nostdinc -E -P - |
    sed -ne 's/^drbd_kernel_release "\(.*\)".*/\1/p'
drbd-8.4.4/scripts/global_common.conf0000664000000000000000000000345412132747531016323 0ustar rootroot
global {
    usage-count yes;
    # minor-count dialog-refresh disable-ip-verification
}

common {
    handlers {
        # These are EXAMPLE handlers only.
        # They may have severe implications,
        # like hard resetting the node under certain circumstances.
        # Be careful when choosing your poison.
# pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f"; # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f"; # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f"; # fence-peer "/usr/lib/drbd/crm-fence-peer.sh"; # split-brain "/usr/lib/drbd/notify-split-brain.sh root"; # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root"; # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k"; # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh; } startup { # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb } options { # cpu-mask on-no-data-accessible } disk { # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes # disk-drain md-flushes resync-rate resync-after al-extents # c-plan-ahead c-delay-target c-fill-target c-max-rate # c-min-rate disk-timeout } net { # protocol timeout max-epoch-size max-buffers unplug-watermark # connect-int ping-int sndbuf-size rcvbuf-size ko-count # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri # after-sb-1pri after-sb-2pri always-asbp rr-conflict # ping-timeout data-integrity-alg tcp-cork on-congestion # congestion-fill congestion-extents csums-alg verify-alg # use-rle } } drbd-8.4.4/scripts/notify.sh0000775000000000000000000000710111610000300014454 0ustar rootroot#!/bin/bash # # notify.sh -- a notification handler for various DRBD events. # This is meant to be invoked via a symlink in /usr/lib/drbd, # by drbdadm's userspace callouts. 
# try to get possible output on stdout/err to syslog PROG=${0##*/} exec > >(2>&- ; logger -t "$PROG[$$]" -p local5.info) 2>&1 if [[ $DRBD_VOLUME ]]; then pretty_print="$DRBD_RESOURCE/$DRBD_VOLUME (drbd$DRBD_MINOR)" else pretty_print="$DRBD_RESOURCE" fi echo "invoked for $pretty_print" # Default to sending email to root, unless otherwise specified RECIPIENT=${1:-root} # check arguments specified on command line if [ -z "$RECIPIENT" ]; then echo "You must specify a notification recipient when using this handler." >&2 exit 1 fi # check envars normally passed in by drbdadm for var in DRBD_RESOURCE DRBD_PEER; do if [ -z "${!var}" ]; then echo "Environment variable \$$var not found (this is normally passed in by drbdadm)." >&2 exit 1 fi done : ${DRBD_CONF:="usually /etc/drbd.conf"} DRBD_LOCAL_HOST=$(hostname) case "$0" in *split-brain.sh) SUBJECT="DRBD split brain on resource $pretty_print" BODY=" DRBD has detected split brain on resource $pretty_print between $DRBD_LOCAL_HOST and $DRBD_PEER. Please rectify this immediately. Please see http://www.drbd.org/users-guide/s-resolve-split-brain.html for details on doing so." ;; *out-of-sync.sh) SUBJECT="DRBD resource $pretty_print has out-of-sync blocks" BODY=" DRBD has detected out-of-sync blocks on resource $pretty_print between $DRBD_LOCAL_HOST and $DRBD_PEER. Please see the system logs for details." ;; *io-error.sh) SUBJECT="DRBD resource $pretty_print detected a local I/O error" BODY=" DRBD has detected an I/O error on resource $pretty_print on $DRBD_LOCAL_HOST. Please see the system logs for details." ;; *pri-lost.sh) SUBJECT="DRBD resource $pretty_print is currently Primary, but is to become SyncTarget on $DRBD_LOCAL_HOST" BODY=" The DRBD resource $pretty_print is currently in the Primary role on host $DRBD_LOCAL_HOST, but lost the SyncSource election process." 
;; *pri-lost-after-sb.sh) SUBJECT="DRBD resource $pretty_print is currently Primary, but lost split brain auto recovery on $DRBD_LOCAL_HOST" BODY=" The DRBD resource $pretty_print is currently in the Primary role on host $DRBD_LOCAL_HOST, but was selected as the split brain victim in a post split brain auto-recovery." ;; *pri-on-incon-degr.sh) SUBJECT="DRBD resource $pretty_print no longer has access to valid data on $DRBD_LOCAL_HOST" BODY=" DRBD has detected that the resource $pretty_print on $DRBD_LOCAL_HOST has lost access to its backing device, and has also lost connection to its peer, $DRBD_PEER. This resource now no longer has access to valid data." ;; *emergency-reboot.sh) SUBJECT="DRBD initiating emergency reboot of node $DRBD_LOCAL_HOST" BODY=" Due to an emergency condition, DRBD is about to issue a reboot of node $DRBD_LOCAL_HOST. If this is unintended, please check your DRBD configuration file ($DRBD_CONF)." ;; *emergency-shutdown.sh) SUBJECT="DRBD initiating emergency shutdown of node $DRBD_LOCAL_HOST" BODY=" Due to an emergency condition, DRBD is about to shut down node $DRBD_LOCAL_HOST. If this is unintended, please check your DRBD configuration file ($DRBD_CONF)." ;; *) SUBJECT="Unspecified DRBD notification" BODY=" DRBD on $DRBD_LOCAL_HOST was configured to launch a notification handler for resource $pretty_print, but no specific notification event was set. This is most likely due to DRBD misconfiguration. Please check your configuration file ($DRBD_CONF)." ;; esac echo "$BODY" | mail -s "$SUBJECT" $RECIPIENT drbd-8.4.4/scripts/outdate-peer.sh0000775000000000000000000000570711101361567015600 0ustar rootroot#!/bin/bash # # outdate-peer.sh # This file is part of DRBD by Philipp Reisner and Lars Ellenberg. # # # It is expected that your cluster manager of choice brings its own # implementation of this ... E.g. Heartbeat's variant should be able # to use all of Heartbeat's communication paths, including the # serial connections.
# # This script requires that there is a passwordless ssh key for # root. You should not use such keys on a bigger scale. Only use # it with the "from" option! # # How to set up SSH: # # 1. ssh-keygen -t dsa (as root, on the first machine) # no passphrase! # # 2. go to the second machine, edit the file .ssh/authorized_keys2 # Start a line with from="10.9.9.181,10.99.99.1" [content of id_dsa.pub] # Put the IPs of your first machine here; the id_dsa.pub is also # from the first machine. Everything needs to be on a single line. # # 3. ssh from the first machine to the second one, do this for all # IP addresses of the second machine. When doing this the first # time it asks you if it should add the fingerprint to the list # of known hosts: Say yes here. # # 4. Do this a second time for each IP address, now it should not ask # any questions... # # Repeat these 4 steps for the other direction. BTW, you cannot # copy the file over, since you have two distinct keys, and also # the IP addresses in the from="" part are different. # # # The caller (drbdadm) sets DRBD_RESOURCE and DRBD_PEER for us. # TIMEOUT=6 for P in "$@"; do if [ "$P" = "on" ]; then EXP_HOST_NAME=1 EXP_PEER_IP=0 EXP_OWN_IP=0 else if [ "$EXP_PEER_IP" = "1" ]; then PEER_IP="$PEER_IP $P" fi; if [ "$EXP_OWN_IP" = "1" ]; then OWN_IP="$OWN_IP $P" fi; if [ "$EXP_HOST_NAME" = "1" ]; then if [ "$P" != `uname -n` ]; then EXP_PEER_IP=1 else EXP_OWN_IP=1 fi EXP_HOST_NAME=0 fi fi done if [ -z "$PEER_IP" -o -z "$OWN_IP" ]; then echo "USAGE: outdate-peer.sh on host1 IP IP ... on host2 IP IP ..." exit 10 fi for IP in $PEER_IP; do ssh $IP drbdadm outdate ${DRBD_RESOURCE:-r0} & SSH_PID="$SSH_PID $!" done SSH_CMDS_RUNNING=1 while [ "$SSH_CMDS_RUNNING" = "1" ] && [ $TIMEOUT -gt 0 ]; do sleep 1 SSH_CMDS_RUNNING=0 for P in $SSH_PID; do if [ -d /proc/$P ]; then SSH_CMDS_RUNNING=1; fi done TIMEOUT=$(( $TIMEOUT - 1 )) done RV=5 for P in $SSH_PID; do if [ -d /proc/$P ]; then kill $P wait $P else wait $P EXIT_CODE=$?
# exit codes of drbdmeta outdate: # 5 -> is inconsistent # 0 -> is outdated # 17 -> outdate failed because peer is primary. # Unfortunately 20 can have other reasons too.... if [ $EXIT_CODE -eq 5 ]; then RV=3; else if [ $EXIT_CODE -eq 17 ]; then RV=6; else if [ $EXIT_CODE -eq 0 ]; then RV=4; else echo "do not know about this exit code" fi fi fi fi done # We return to DRBD - kernel driver: # # 6 -> peer is primary (and UpToDate) # 5 -> peer is down / unreachable. # 4 -> peer is outdated # 3 -> peer is inconsistent exit $RV drbd-8.4.4/scripts/pretty-proc-drbd.sh0000664000000000000000000000742011610000300016346 0ustar rootroot#!/bin/bash PATH=/sbin:$PATH DEFAULTS=/etc/defaults/drbd-pretty-status # for highlighting see console_codes(4) colorize=false short=true # node role: Primary Secondary Unknown c_pri_1=$'\e[44m' c_pri_0=$'\e[49m' #c_sec_1=$'\e[7m' c_sec_0=$'\e[27m' c_sec_1="" c_sec_0="" c_unk_1=$'\e[43m' c_unk_0=$'\e[49m' # connection state: # Unconfigured # # StandAlone c_sta_1=$'\e[34m' c_sta_0=$'\e[39m' # Disconnecting Unconnected Timeout BrokenPipe NetworkFailure ProtocolError TearDown c_net_bad_1=$'\e[41m' c_net_bad_0=$'\e[49m' # WFConnection WFReportParams c_wfc_1=$'\e[36m' c_wfc_0=$'\e[39m' # Connected c_con_1=$'\e[32m' c_con_0=$'\e[39m' # StartingSyncS StartingSyncT WFBitMapS WFBitMapT WFSyncUUID c_ssy_1=$'\e[35m' c_ssy_0=$'\e[39m' # SyncSource PausedSyncS c_src_1=$'\e[46m' c_src_0=$'\e[49m' # SyncTarget PausedSyncT c_tgt_1=$'\e[41m' c_tgt_0=$'\e[49m' # disk state: # Attaching Negotiating DUnknown Consistent # uncolored for now # # Diskless Failed Inconsistent c_dsk_bad_1=$'\e[41m' c_dsk_bad_0=$'\e[49m' # Outdated c_out_1=$'\e[43m' c_out_0=$'\e[44m' # UpToDate c_u2d_1=$'\e[32m' c_u2d_0=$'\e[39m' while true; do case "$1" in -c) colorize=true; shift;; -v) short=false; shift;; *) break;; esac done drbd_pretty_status() { if ! $short || ! type column &> /dev/null || ! type paste &> /dev/null || ! type join &> /dev/null || ! type sed &> /dev/null || !
type tr &> /dev/null then cat /proc/drbd else sed -e '2q' < /proc/drbd sed_script=$( i=0; _sh_status_process() { let i++ ; stacked=${_stacked_on:+"^^${_stacked_on_minor:-${_stacked_on//[!a-zA-Z0-9_ -]/_}}"} printf "s|^ *%u:|%6u\t&%s%s|\n" \ $_minor $i \ "${_res_name//[!a-zA-Z0-9_ -]/_}" "$stacked" }; eval "$(drbdadm sh-status)" ) p() { sed -e "1,2d" \ -e "$sed_script" \ -e '/^ *[0-9]\+: cs:Unconfigured/d;' \ -e 's/^\(.* cs:.*[^ ]\) \([rs]...\)$/\1 - \2/g' \ -e 's/^\(.* \)cs:\([^ ]* \)st:\([^ ]* \)ds:\([^ ]*\)/\1\2\3\4/' \ -e 's/^\(.* \)cs:\([^ ]* \)ro:\([^ ]* \)ds:\([^ ]*\)/\1\2\3\4/' \ -e 's/^\(.* \)cs:\([^ ]*\)$/\1\2/' \ -e 's/^ *[0-9]\+:/ x &??not-found??/;' \ -e '/^$/d;/ns:.*nr:.*dw:/d;/resync:/d;/act_log:/d;' \ -e 's/^\(.\[.*\)\(sync.ed:\)/... ... \2/;/^.finish:/d;' \ -e 's/^\(.[0-9 %]*oos:\)/... ... \1/' \ < "/proc/drbd" | tr -s '\t ' ' ' } m() { join -1 2 -2 1 -o 1.1,2.2,2.3 \ <( ( drbdadm sh-dev all ; drbdadm -S sh-dev all ) | cat -n | sort -k2,2) \ <(sort < /proc/mounts ) | sort -n | tr -s '\t ' ' ' | sed -e 's/^ *//' } # echo "=== p ===" # p # echo "=== m ===" # m # echo "=========" # join -a1 <(p|sort) <(m|sort) # echo "=========" ( echo m:res cs ro ds p mounted fstype join -a1 <(p|sort) <(m|sort) | cut -d' ' -f2-6,8- | sort -k1,1n -k2,2 ) | column -t fi | if [[ $colorize != true ]]; then cat else c_bold=$'\e[1m' c_norm=$'\e[0m' sed -e " s/^??not-found??/$c_dsk_bad_1&$c_dsk_bad_0/g; s/^[^\t ]\+/$c_bold&$c_norm/; s/Primary/$c_pri_1&$c_pri_0/g; s/Secondary/$c_sec_1&$c_sec_0/g; s/\'s # obliterate-peer.sh script. # # Exit Codes (as per; http://osdir.com/ml/linux.kernel.drbd.devel/2006-11/msg00005.html) # - 3 -> peer is inconsistent # - 4 -> peer is outdated (this handler outdated it) [ resource fencing ] # - 5 -> peer was down / unreachable # - 6 -> peer is primary # - 7 -> peer got stonithed [ node fencing ] # This program uses; # - 1 = Something failed # - 7 = Fence succeeded # - 255 = End of program hit... should never happen. 
# # Features # - Clusters > 2 nodes supported, provided use strict; use warnings; use IO::Handle; my $THIS_FILE="rhcs_fence"; my $conf={ # If a program isn't at the defined path, $ENV{PATH} will be searched. path => { cman_tool => "/usr/sbin/cman_tool", fence_node => "/usr/sbin/fence_node", uname => "/bin/uname", }, # General settings. sys => { # Set 'debug' to '1' for DEBUG output. debug => 0, # Set 'local_delay' to the number of seconds to wait before # fencing the other node. If left to '0', Node with ID = 1 will # have no delay and all other nodes will wait: # ((node ID * 2) + 5) seconds. local_delay => 0, # Local host name. host_name => "", }, # The script will set this. cluster => { this_node => "", }, # These are the environment variables set by DRBD. See 'man drbd.conf' # -> 'handlers'. env => { # The resource triggering the fence. 'DRBD_RESOURCE' => $ENV{DRBD_RESOURCE}, # The resource minor number. 'DRBD_MINOR' => $ENV{DRBD_MINOR}, # The peer(s) hostname(s), space separated. 'DRBD_PEERS' => $ENV{DRBD_PEERS}, }, }; # Find executables. find_executables($conf); # Something for the logs to_log($conf, 0, __LINE__, "Attempting to fence peer using RHCS from DRBD..."); # Record the environment variables foreach my $key (keys %{$conf->{env}}) { if (not defined $conf->{env}{$key}) { $conf->{env}{$key}=""; } if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Environment variable: [$key] == [$conf->{env}{$key}]"); } } # Who am I? get_local_node_name($conf); # Am I up to date? get_local_resource_state($conf); # Who else is here? get_info_on_other_nodes($conf); # Who shall I kill? get_target($conf); # Sleep a bit to avoid a double-fence. sleep_a_bit($conf); # Eject the target, if I can. eject_target($conf); # Kill the target. 
kill_target($conf); exit(255); ############################################################################### # Functions # ############################################################################### # This checks the given paths and, if something isn't found, it searches PATH # trying to find it. sub find_executables { my ($conf)=@_; # Variables. my $check=""; my $bad=0; # Log entries can only happen if I've found 'logger', so an extra check # will be made on 'to_log' calls. my @dirs=split/:/, $ENV{PATH}; foreach my $exe (sort {$b cmp $a} keys %{$conf->{path}}) { if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Checking if: [$exe] is at: [$conf->{path}{$exe}]"); } if ( not -e $conf->{path}{$exe} ) { if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: It is not!"); } foreach my $path (@dirs) { $check="$path/$exe"; $check=~s/\/\//\//g; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Checking: [$check]"); } if ( -e $check ) { if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Found!"); } if (-e $conf->{path}{logger}) { if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Changed path for: [$exe] from: [$conf->{path}{$exe}] to: [$check]"); } } else { warn "DEBUG: Changed path for: [$exe] from: [$conf->{path}{$exe}] to: [$check]\n"; } $conf->{path}{$exe}=$check; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Set 'path::$exe' to: [$conf->{path}{$exe}]"); } } else { if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Not found!"); } } } } else { if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Found!"); } next; } # Make sure it exists now. if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Checking again if: [$exe] is at: [$conf->{path}{$exe}]."); } if ( not -e $conf->{path}{$exe} ) { $bad=1; if (-e $conf->{path}{logger}) { if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "Failed to find executable: [$exe]. 
Unable to proceed."); } } else { warn "Failed to find executable: [$exe]. Unable to proceed.\n"; } } } if ($bad) { nice_exit($conf, 1); } return(0); } # This is an artificial delay to help avoid a double-fence situation when both # nodes are alive, but comms failed. sub sleep_a_bit { my ($conf)=@_; # Variables my $i_am=$conf->{sys}{this_node}; my $my_id=$conf->{nodes}{$i_am}{id}; my $delay=$my_id; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: I am: [$i_am] and my id is: [$my_id]"); } # I don't want to fail because I don't have an ID number. if (not $my_id) { $my_id=int(rand(10)); } # Calculate the delay. $delay=(($my_id * 2) + 5); # But never wait more than 30 seconds. if ($delay > 30) { $delay=30; } # A user value trumps all. if ($conf->{sys}{local_delay}) { $delay=$conf->{sys}{local_delay}; } # Don't wait if this is node ID 1 unless the user has defined a delay. if (($my_id > 1) or ($conf->{sys}{local_delay})) { if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "Delaying for: [$delay] seconds to avoid dual-fencing..."); } sleep $delay; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Right then, break over."); } } else { if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "I am the first node, so I won't delay."); } } return(0); } # This kills remote node. sub kill_target { my ($conf)=@_; # Variables my $remote_node=$conf->{env}{DRBD_PEERS}; my $sc=""; my $shell_call=""; my $line=""; my $sc_exit=""; # Hug it and squeeze it and call it George. 
if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "Fencing target: [$remote_node]..."); } $sc=IO::Handle->new(); $shell_call="$conf->{path}{fence_node} -v $remote_node"; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: shell call: [$shell_call]"); } open ($sc, "$shell_call 2>&1 |") or to_log($conf, 1, __LINE__, "Failed to call: [$sc], error was: $!"); while(<$sc>) { chomp; $line=$_; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: $line"); } if ($line=~/fence .*? success/) { if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "'fence_node $remote_node' appears to have succeeded!"); } } else { if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "'fence_node $remote_node' appears to have failed!"); } if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "Read: [$line]"); } } } $sc->close(); $sc_exit = $?; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Attempt to fence node: [$remote_node] exited with: [$sc_exit]"); } # Exit. if ($sc_exit) { if ($conf->{sys}{debug}) { to_log($conf, 1, __LINE__, "Attempt to fence: [$remote_node] failed!"); } nice_exit($conf, 1); } else { if ($conf->{sys}{debug}) { to_log($conf, 7, __LINE__, "Fencing of: [$remote_node] succeeded!"); } nice_exit($conf, 7); } # This should not be reachable. return(0); } # This ejects the remote node from the cluster, if cluster comms are still up. sub eject_target { my ($conf)=@_; # Variables; my $remote_node=""; my $sc=""; my $sc_exit=""; my $shell_call=""; my $line=""; ### I don't know if I really want to use/keep this. # If the node is still a cluster member, kick it out. $remote_node=$conf->{env}{DRBD_PEERS}; if ($conf->{nodes}{$remote_node}{member} eq "M") { # It is, kick it out. If cluster comms are up, this will # trigger a fence in a few moment, regardless of what we do # next. 
if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "Target node: [$remote_node] is a cluster member, attempting to eject."); } $sc=IO::Handle->new(); $shell_call="$conf->{path}{cman_tool} kill -n $remote_node"; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: shell call: [$shell_call]"); } open ($sc, "$shell_call 2>&1 |") or to_log($conf, 1, __LINE__, "Failed to call: [$sc], error was: $!"); while(<$sc>) { chomp; $line=$_; } $sc->close(); $sc_exit=$?; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Attempt to force-remove node: [$remote_node] exited with: [$sc_exit]"); } } else { if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "Target node: [$remote_node] is *not* a cluster member (state: [$conf->{nodes}{$remote_node}{member}]). Not ejecting."); } } return(0); } # This identifies the remote node. sub get_target { my ($conf)=@_; # Variables my $remote_node=$conf->{env}{DRBD_PEERS}; # Make sure I know my target. if ( not exists $conf->{nodes}{$remote_node} ) { # Try the short name (strip everything from the first dot on). $remote_node=~s/^(.*?)\..*$/$1/; if ( not exists $conf->{nodes}{$remote_node} ) { if ($conf->{sys}{debug}) { to_log($conf, 1, __LINE__, "I didn't see the other node: [$conf->{env}{DRBD_PEERS} ($remote_node)] in cman's node list. I can't fence this node."); } } # Update the peer. $conf->{env}{DRBD_PEERS}=$remote_node; } if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "I have identified my target: [$remote_node]"); } return(0); } # This uses 'cman_tool' to get the information on the other node(s) in the # cluster.
sub get_info_on_other_nodes { my ($conf)=@_; # Variables my $node_count=0; my $sc=""; my $shell_call=""; my $sc_exit=""; my $line=""; my $node_id=""; my $node_name=""; my $member=""; my $address=""; $sc=IO::Handle->new(); $shell_call="$conf->{path}{cman_tool} -a -F id,name,type,addr nodes"; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: shell call: [$shell_call]"); } open ($sc, "$shell_call 2>&1 |") or to_log($conf, 1, __LINE__, "Failed to call: [$sc], error was: $!"); while(<$sc>) { chomp; $line=$_; ($node_id, $node_name, $member, $address)=(split/ /, $line); if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: id: [$node_id], name: [$node_name], member: [$member], address: [$address]"); } $conf->{nodes}{$node_name}{member}=$member; $conf->{nodes}{$node_name}{id}=$node_id; $conf->{nodes}{$node_name}{address}=$address; $node_count++; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: output: $line"); } } $sc->close(); $sc_exit=$?; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Attempt to gather cluster member information exited with: [$sc_exit]"); } return(0); } # This reads /proc/drbd and pulls out the state of the defined resource sub get_local_resource_state { my ($conf)=@_; # Variables my $minor=$conf->{env}{DRBD_MINOR}; my $sc=""; my $shell_call=""; my $sc_exit=""; my $line=""; my $state=""; # Minor may well be '0', so I need to check for an empty string here. if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Checking the state of resource with minor number: [$conf->{env}{DRBD_MINOR}]"); } if ($conf->{env}{DRBD_MINOR} eq "") { to_log($conf, 1, __LINE__, "Resource minor number not defined! 
Unable to proceed."); } if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: minor: [$minor]"); } $sc=IO::Handle->new(); $shell_call="< /proc/drbd"; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: shell call: [$shell_call]"); } open ($sc, "$shell_call") or to_log($conf, 1, __LINE__, "Failed to call: [$sc], error was: $!"); while(<$sc>) { chomp; $line=$_; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: output: $line"); } $line=~s/^\s+//; if ($line=~/^$minor: .*? ds:(.*?)\//) { $state=$1; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: read state of minor: [$minor] as: [$state]"); } $conf->{sys}{local_res_uptodate}=$state eq "UpToDate" ? 1 : 0; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: sys::local_res_uptodate: [$conf->{sys}{local_res_uptodate}]"); } } } $sc->close(); $sc_exit=$?; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Attempt to collect UpToDate information device with minor: [$minor] exited with: [$sc_exit]"); } if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: UpToDate: [$conf->{sys}{local_res_uptodate}]"); } if (not $conf->{sys}{local_res_uptodate}) { to_log($conf, 1, __LINE__, "Local resource: [$conf->{env}{DRBD_RESOURCE}], minor: [$minor] is NOT 'UpToDate', will not fence peer."); } return(0); } # This reads in and sets the local node's name.
sub get_local_node_name { my ($conf)=@_; # Variables my $sc=""; my $shell_call=""; my $sc_exit=""; my $line=""; $sc=IO::Handle->new(); $shell_call="$conf->{path}{cman_tool} status"; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: shell call: [$shell_call]"); } open ($sc, "$shell_call 2>&1 |") or to_log($conf, 1, __LINE__, "Failed to call: [$sc], error was: $!"); while(<$sc>) { chomp; $line=$_; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: output: $line"); } if ($line=~/Node name: (.*)/) { $conf->{sys}{this_node}=$1; last; } } $sc->close(); $sc_exit=$?; if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: Attempt to get local node name via 'cman_tool status' exited with: [$sc_exit]"); } if ($conf->{sys}{debug}) { to_log($conf, 0, __LINE__, "DEBUG: I am: [$conf->{sys}{this_node}]"); } if (not $conf->{sys}{this_node}) { to_log($conf, 1, __LINE__, "Unable to find local node name."); } return(0); } # Log file entries sub to_log { my ($conf, $exit, $line_num, $message)=@_; # Variables my $sc=""; my $shell_call=""; my $sc_exit=""; my $line=""; my $now=localtime; # I want the line number in DEBUG mode. if ($conf->{sys}{debug}) { $message="$line_num; $message"; } if (not $conf->{handle}{'log'}) { ### Logging in here causes deep recursion, so don't. # First log entry, a little setup $sc=IO::Handle->new(); $shell_call="$conf->{path}{uname} -n"; open ($sc, "$shell_call 2>&1 |") or to_log($conf, 1, __LINE__, "Failed to call: [$sc], error was: $!"); while(<$sc>) { chomp; $line=$_; $conf->{sys}{host_name}=$line; if ($conf->{sys}{host_name} =~ /\./) { $conf->{sys}{host_name}=~s/^(.*?)\..*$/$1/; } } $sc->close(); $sc_exit=$?; # Open a (hot) handle to syslog. $conf->{handle}{'log'}=IO::Handle->new(); open ($conf->{handle}{'log'}, ">>/var/log/messages") || die "Failed to append to syslog; $!\n"; $conf->{handle}{'log'}->autoflush(1); } # Setup the time and then the string. $now=~s/^.\w+ (.*?) 
\d+$/$1/; print {$conf->{handle}{'log'}} "$now $conf->{sys}{host_name} $THIS_FILE: $message\n"; if ($exit) { nice_exit($conf, $exit); } return(0); } # Cleanly exit. sub nice_exit { my ($conf, $code)=@_; if ($conf->{handle}{'log'}) { $conf->{handle}{'log'}->close(); } exit ($code); } drbd-8.4.4/scripts/snapshot-resync-target-lvm.sh0000775000000000000000000000561411737541514020426 0ustar rootroot#!/bin/bash # # snapshot-resync-target-lvm.sh # This file is part of DRBD by Philipp Reisner and Lars Ellenberg. # # The caller (drbdadm) sets for us: # DRBD_RESOURCE, DRBD_VOLUME, DRBD_MINOR, DRBD_LL_DISK etc. # ########### # # There will be no resync if this script terminates with an # exit code != 0. So be careful with the exit code! # export LC_ALL=C LANG=C if [[ -z "$DRBD_RESOURCE" || -z "$DRBD_LL_DISK" ]]; then echo "DRBD_RESOURCE/DRBD_LL_DISK is not set. This script is supposed to" echo "get called by drbdadm as a handler script" exit 0 fi PROG=$(basename $0) exec > >(exec 2>&- ; logger -t "$PROG[$$]" -p local5.info) 2>&1 echo "invoked for $DRBD_RESOURCE/$DRBD_VOLUME (drbd$DRBD_MINOR)" TEMP=$(getopt -o p:a:nv --long percent:,additional:,disconnect-on-error,verbose -- "$@") if [ $? != 0 ]; then echo "getopt failed" exit 0 fi if BACKING_BDEV=$(drbdadm sh-ll-dev "$DRBD_RESOURCE/$DRBD_VOLUME"); then is_stacked=false elif BACKING_BDEV=$(drbdadm sh-ll-dev "$(drbdadm -S sh-lr-of "$DRBD_RESOURCE")/$DRBD_VOLUME"); then is_stacked=true else echo "Cannot determine lower level device of resource $DRBD_RESOURCE/$DRBD_VOLUME, sorry." exit 0 fi set_vg_lv_size() { local X if ! X=$(lvs --noheadings --nosuffix --units s -o vg_name,lv_name,lv_size "$BACKING_BDEV") ; then # if lvs cannot tell me the info I need, # this is: echo "Cannot create snapshot of $BACKING_BDEV, apparently no LVM LV."
return 1 fi set -- $X VG_NAME=$1 LV_NAME=$2 LV_SIZE_K=$[$3 / 2] return 0 } set_vg_lv_size || exit 0 # clean exit if not an lvm lv SNAP_PERC=10 SNAP_ADDITIONAL=10240 DISCONNECT_ON_ERROR=0 LVC_OPTIONS="" BE_VERBOSE=0 SNAP_NAME=$LV_NAME-before-resync $is_stacked && SNAP_NAME=$SNAP_NAME-stacked DEFAULTFILE="/etc/default/drbd-snapshot" if [ -f $DEFAULTFILE ]; then . $DEFAULTFILE fi ## command line parameters override default file eval set -- "$TEMP" while true; do case $1 in -p|--percent) SNAP_PERC="$2" shift ;; -a|--additional) SNAP_ADDITIONAL="$2" shift ;; -n|--disconnect-on-error) DISCONNECT_ON_ERROR=1 ;; -v|--verbose) BE_VERBOSE=1 ;; --) break ;; esac shift done shift # the -- LVC_OPTIONS="$@" if [[ $0 == *unsnapshot* ]]; then [ $BE_VERBOSE = 1 ] && set -x lvremove -f $VG_NAME/$SNAP_NAME exit 0 else ( set -e [ $BE_VERBOSE = 1 ] && set -x case $DRBD_MINOR in *[!0-9]*|"") if $is_stacked; then DRBD_MINOR=$(drbdadm -S sh-minor "$DRBD_RESOURCE") else DRBD_MINOR=$(drbdadm sh-minor "$DRBD_RESOURCE") fi ;; *) :;; # ok, already exported by drbdadm esac OUT_OF_SYNC=$(sed -ne "/^ *$DRBD_MINOR:/ "'{ n; s/^.* oos:\([0-9]*\).*$/\1/; s/^$/0/; # default if not found p; q; }' < /proc/drbd) # unit KiB SNAP_SIZE=$((OUT_OF_SYNC + SNAP_ADDITIONAL + LV_SIZE_K * SNAP_PERC / 100)) lvcreate -s -n $SNAP_NAME -L ${SNAP_SIZE}k $LVC_OPTIONS $VG_NAME/$LV_NAME ) RV=$? [ $DISCONNECT_ON_ERROR = 0 ] && exit 0 exit $RV fi drbd-8.4.4/scripts/stonith_admin-fence-peer.sh0000775000000000000000000000205311736532437020052 0ustar rootroot#!/bin/sh # # DRBD fence-peer handler for Pacemaker 1.1 clusters # (via stonith-ng). # # Requires that the cluster is running with STONITH # enabled, and has configured and functional STONITH # agents. # # Also requires that the DRBD disk fencing policy # is at least "resource-only", but "resource-and-stonith" # is more likely to be useful as most people will # use this in dual-Primary configurations. 
# # Returns 7 on success (DRBD fence-peer exit code # for "yes, I've managed to fence this node"). # Returns 1 on any error (undefined generic error code, # causes DRBD devices with the "resource-and-stonith" # fencing policy to remain suspended). log() { local msg msg="$1" logger -i -t "`basename $0`" -s "$msg" } if [ -z "$DRBD_PEERS" ]; then log "DRBD_PEERS is empty or unset, cannot continue." exit 1 fi for p in $DRBD_PEERS; do stonith_admin --fence $p rc=$? if [ $rc -eq 0 ]; then log "stonith_admin successfully fenced peer $p." else log "Failed to fence peer $p. stonith_admin returned $rc." exit 1 fi done exit 7 drbd-8.4.4/scripts/unsnapshot-resync-target-lvm.sh0000777000000000000000000000000011101361567026520 2snapshot-resync-target-lvm.shustar rootrootdrbd-8.4.4/user/Makefile.in0000664000000000000000000001222012221261130014150 0ustar rootroot# Makefile for drbd.o # # This file is part of DRBD by Philipp Reisner and Lars Ellenberg. # # drbd is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2, or (at your option) # any later version. # # drbd is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with drbd; see the file COPYING. If not, write to # the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
# # variables set by configure DISTRO = @DISTRO@ prefix = @prefix@ exec_prefix = @exec_prefix@ localstatedir = @localstatedir@ datarootdir = @datarootdir@ datadir = @datadir@ sbindir = @sbindir@ sysconfdir = @sysconfdir@ BASH_COMPLETION_SUFFIX = @BASH_COMPLETION_SUFFIX@ UDEV_RULE_SUFFIX = @UDEV_RULE_SUFFIX@ INITDIR = @INITDIR@ LIBDIR = @prefix@/lib/@PACKAGE_TARNAME@ CC = @CC@ CFLAGS = @CFLAGS@ LDFLAGS = @LDFLAGS@ LN_S = @LN_S@ # features enabled or disabled by configure WITH_UTILS = @WITH_UTILS@ WITH_LEGACY_UTILS = @WITH_LEGACY_UTILS@ WITH_KM = @WITH_KM@ WITH_UDEV = @WITH_UDEV@ WITH_XEN = @WITH_XEN@ WITH_PACEMAKER = @WITH_PACEMAKER@ WITH_HEARTBEAT = @WITH_HEARTBEAT@ WITH_RGMANAGER = @WITH_RGMANAGER@ WITH_BASHCOMPLETION = @WITH_BASHCOMPLETION@ # variables meant to be overridden from the make command line DESTDIR ?= / CFLAGS += -Wall -I../drbd -I../drbd/compat GENETLINK_H := /usr/include/linux/genetlink.h libgenl.o: CFLAGS += $(shell for w in CTRL_ATTR_VERSION CTRL_ATTR_HDRSIZE CTRL_ATTR_MCAST_GROUPS; do grep -qw $$w $(GENETLINK_H) && echo -DHAVE_$$w; done) drbdadm-obj = drbdadm_scanner.o drbdadm_parser.o drbdadm_main.o \ drbdadm_adjust.o drbdtool_common.o drbdadm_usage_cnt.o \ drbd_buildtag.o registry.o config_flags.o libgenl.o \ drbd_nla.o drbdsetup-obj = libgenl.o registry.o drbdsetup.o drbdtool_common.o \ drbd_buildtag.o drbd_strings.o config_flags.o drbd_nla.o \ wrap_printf.o drbdmeta-obj = drbdmeta.o drbdmeta_scanner.o drbdtool_common.o drbd_buildtag.o all: tools ifeq ($(WITH_UTILS),yes) tools: drbdadm drbdmeta drbdsetup legacy-tools else tools: endif drbd_buildtag.c: ../drbd/drbd_buildtag.c cp $^ $@ drbd_strings.c: ../drbd/drbd_strings.c cp $^ $@ drbd_strings.h: ../drbd/drbd_strings.h cp $^ $@ drbdadm: $(drbdadm-obj) $(LINK.c) $(LDFLAGS) -o $@ $^ drbdadm_scanner.c: drbdadm_scanner.fl drbdadm_parser.h flex -s -odrbdadm_scanner.c drbdadm_scanner.fl drbdmeta_scanner.c: drbdmeta_scanner.fl drbdmeta_parser.h flex -s -odrbdmeta_scanner.c drbdmeta_scanner.fl 
drbdsetup: $(drbdsetup-obj) $(LINK.c) $(LDFLAGS) -o $@ $^ drbdmeta: $(drbdmeta-obj) $(LINK.c) $(LDFLAGS) -o $@ $^ legacy-tools: ifeq ($(WITH_LEGACY_UTILS),yes) $(MAKE) -C legacy ln -f -s legacy/drbdadm-83 ln -f -s legacy/drbdsetup-83 endif clean: rm -f drbdadm_scanner.c drbdmeta_scanner.c rm -f drbdsetup drbdadm drbdmeta *.o rm -f drbd_buildtag.c drbd_strings.c rm -f drbdadm-83 drbdsetup-83 rm -f *~ $(MAKE) -C legacy clean distclean: clean install: ifeq ($(WITH_UTILS),yes) install -d $(DESTDIR)$(sbindir) install -d $(DESTDIR)$(localstatedir)/lib/drbd install -d $(DESTDIR)$(localstatedir)/lock if getent group haclient > /dev/null 2> /dev/null ; then \ install -g haclient -m 4750 drbdsetup $(DESTDIR)$(sbindir) ; \ install -g haclient -m 4750 drbdmeta $(DESTDIR)$(sbindir) ; \ install -m 755 drbdadm $(DESTDIR)$(sbindir) ; \ else \ install -m 755 drbdsetup $(DESTDIR)$(sbindir) ; \ install -m 755 drbdmeta $(DESTDIR)$(sbindir) ; \ install -m 755 drbdadm $(DESTDIR)$(sbindir) ; \ fi if test -d $(DESTDIR)/sbin && \ ! 
test $(DESTDIR)/sbin -ef $(DESTDIR)$(sbindir) ; then \ ln -sf $(sbindir)/drbdsetup $(DESTDIR)/sbin ; \ ln -sf $(sbindir)/drbdmeta $(DESTDIR)/sbin ; \ ln -sf $(sbindir)/drbdadm $(DESTDIR)/sbin ; \ fi $(MAKE) -C legacy install endif uninstall: rm -f $(DESTDIR)$(sbindir)/{drbdsetup,drbdadm,drbdmeta} rm -f $(DESTDIR)/sbin/{drbdsetup,drbdadm,drbdmeta} $(MAKE) -C legacy uninstall spell: for f in drbdadm_adjust.c drbdadm_main.c drbdadm_parser.c drbdadm_usage_cnt.c drbdmeta.c drbdsetup.c drbdtool_common.c; do \ aspell --save-repl --dont-backup --personal=./../documentation/aspell.en.per check $$f; \ done ###dependencies drbdset.o: drbdtool_common.h ../drbd/linux/drbd_limits.h drbdsetup.o: ../drbd/linux/drbd_genl.h ../drbd/linux/drbd.h drbdsetup.o: ../drbd/linux/genl_magic_struct.h ../drbd/linux/genl_magic_func.h drbdsetup.o: libgenl.h config_flags.h libgenl.o: libgenl.h drbdtool_common.o: drbdtool_common.h drbdadm_main.o: drbdtool_common.h drbdadm.h drbdadm_adjust.o: drbdtool_common.h drbdadm.h drbdadm_parser.o: drbdtool_common.h drbdadm.h ../drbd/linux/drbd_limits.h drbdadm_scanner.o: drbdtool_common.h drbdadm.h drbdadm_parser.h drbdsetup.o: drbdtool_common.h ../drbd/linux/drbd_limits.h drbdmeta.o: drbdtool_common.h drbd_endian.h drbdadm_usage_cnt.o: drbdtool_common.h drbdadm.h drbd_endian.h drbd-8.4.4/user/config_flags.c0000664000000000000000000004670012221331365014712 0ustar rootroot#include #include #include #include #include #include #include "libgenl.h" #include #include #include #include #include "drbd_nla.h" #include #include "drbdtool_common.h" #include "config_flags.h" #ifndef ARRAY_SIZE #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])) #endif #define NLA_POLICY(p) \ .nla_policy = p ## _nl_policy, \ .nla_policy_size = ARRAY_SIZE(p ## _nl_policy) /* ============================================================================================== */ static int enum_string_to_int(const char **map, int size, const char *value, int (*strcmp)(const char *, const 
char *)) { int n; if (!value) return -1; for (n = 0; n < size; n++) { if (map[n] && !strcmp(value, map[n])) return n; } return -1; } static bool enum_is_default(struct field_def *field, const char *value) { int n; n = enum_string_to_int(field->u.e.map, field->u.e.size, value, strcmp); return n == field->u.e.def; } static bool enum_is_equal(struct field_def *field, const char *a, const char *b) { return !strcmp(a, b); } static int type_of_field(struct context_def *ctx, struct field_def *field) { return ctx->nla_policy[__nla_type(field->nla_type)].type; } static int len_of_field(struct context_def *ctx, struct field_def *field) { return ctx->nla_policy[__nla_type(field->nla_type)].len; } static const char *get_enum(struct context_def *ctx, struct field_def *field, struct nlattr *nla) { int i; assert(type_of_field(ctx, field) == NLA_U32); i = nla_get_u32(nla); if (i < 0 || i >= field->u.e.size) return NULL; return field->u.e.map[i]; } static bool put_enum(struct context_def *ctx, struct field_def *field, struct msg_buff *msg, const char *value) { int n; n = enum_string_to_int(field->u.e.map, field->u.e.size, value, strcmp); if (n == -1) return false; assert(type_of_field(ctx, field) == NLA_U32); nla_put_u32(msg, field->nla_type, n); return true; } static int enum_usage(struct field_def *field, char *str, int size) { const char** map = field->u.e.map; char sep = '{'; int n, len = 0, l; l = snprintf(str, size, "[--%s=", field->name); len += l; size -= l; for (n = 0; n < field->u.e.size; n++) { if (!map[n]) continue; l = snprintf(str + len, size, "%c%s", sep, map[n]); len += l; size -= l; sep = '|'; } assert (sep != '{'); l = snprintf(str+len, size, "}]"); len += l; size -= l; return len; } static bool enum_is_default_nocase(struct field_def *field, const char *value) { int n; n = enum_string_to_int(field->u.e.map, field->u.e.size, value, strcasecmp); return n == field->u.e.def; } static bool enum_is_equal_nocase(struct field_def *field, const char *a, const char *b) { 
return !strcasecmp(a, b); } static bool put_enum_nocase(struct context_def *ctx, struct field_def *field, struct msg_buff *msg, const char *value) { int n; n = enum_string_to_int(field->u.e.map, field->u.e.size, value, strcasecmp); if (n == -1) return false; assert(type_of_field(ctx, field) == NLA_U32); nla_put_u32(msg, field->nla_type, n); return true; } static void enum_describe_xml(struct field_def *field) { const char **map = field->u.e.map; int n; printf("\t\n"); } /* ---------------------------------------------------------------------------------------------- */ static bool numeric_is_default(struct field_def *field, const char *value) { long long l; /* FIXME: unsigned long long values are broken. */ l = m_strtoll(value, field->u.n.scale); return l == field->u.n.def; } static bool numeric_is_equal(struct field_def *field, const char *a, const char *b) { long long la, lb; /* FIXME: unsigned long long values are broken. */ la = m_strtoll(a, field->u.n.scale); lb = m_strtoll(b, field->u.n.scale); return la == lb; } static const char *get_numeric(struct context_def *ctx, struct field_def *field, struct nlattr *nla) { static char buffer[1 + 20 + 2]; char scale = field->u.n.scale; unsigned long long l; int n; switch(type_of_field(ctx, field)) { case NLA_U8: l = nla_get_u8(nla); break; case NLA_U16: l = nla_get_u16(nla); break; case NLA_U32: l = nla_get_u32(nla); break; case NLA_U64: l = nla_get_u64(nla); break; default: return NULL; } if (field->u.n.is_signed) { /* Sign extend. */ switch(type_of_field(ctx, field)) { case NLA_U8: l = (int8_t)l; break; case NLA_U16: l = (int16_t)l; break; case NLA_U32: l = (int32_t)l; break; case NLA_U64: l = (int64_t)l; break; } n = snprintf(buffer, sizeof(buffer), "%lld%c", l, scale == '1' ? 0 : scale); } else n = snprintf(buffer, sizeof(buffer), "%llu%c", l, scale == '1' ? 
0 : scale); assert(n < sizeof(buffer)); return buffer; } static bool put_numeric(struct context_def *ctx, struct field_def *field, struct msg_buff *msg, const char *value) { long long l; /* FIXME: unsigned long long values are broken. */ l = m_strtoll(value, field->u.n.scale); switch(type_of_field(ctx, field)) { case NLA_U8: nla_put_u8(msg, field->nla_type, l); break; case NLA_U16: nla_put_u16(msg, field->nla_type, l); break; case NLA_U32: nla_put_u32(msg, field->nla_type, l); break; case NLA_U64: nla_put_u64(msg, field->nla_type, l); break; default: return false; } return true; } static int numeric_usage(struct field_def *field, char *str, int size) { return snprintf(str, size,"[--%s=(%lld ... %lld)]", field->name, field->u.n.min, field->u.n.max); } static void numeric_describe_xml(struct field_def *field) { printf("\t\n"); } /* ---------------------------------------------------------------------------------------------- */ static int boolean_string_to_int(const char *value) { if (!value || !strcmp(value, "yes")) return 1; else if (!strcmp(value, "no")) return 0; else return -1; } static bool boolean_is_default(struct field_def *field, const char *value) { int yesno; yesno = boolean_string_to_int(value); return yesno == field->u.b.def; } static bool boolean_is_equal(struct field_def *field, const char *a, const char *b) { return boolean_string_to_int(a) == boolean_string_to_int(b); } static const char *get_boolean(struct context_def *ctx, struct field_def *field, struct nlattr *nla) { int i; assert(type_of_field(ctx, field) == NLA_U8); i = nla_get_u8(nla); return i ? 
"yes" : "no"; } static bool put_boolean(struct context_def *ctx, struct field_def *field, struct msg_buff *msg, const char *value) { int yesno; yesno = boolean_string_to_int(value); if (yesno == -1) return false; assert(type_of_field(ctx, field) == NLA_U8); nla_put_u8(msg, field->nla_type, yesno); return true; } static bool put_flag(struct context_def *ctx, struct field_def *field, struct msg_buff *msg, const char *value) { int yesno; yesno = boolean_string_to_int(value); if (yesno == -1) return false; assert(type_of_field(ctx, field) == NLA_U8); if (yesno) nla_put_u8(msg, field->nla_type, yesno); return true; } static int boolean_usage(struct field_def *field, char *str, int size) { return snprintf(str, size,"[--%s={yes|no}]", field->name); } static void boolean_describe_xml(struct field_def *field) { printf("\t\n", field->name, field->u.b.def ? "yes" : "no"); } /* ---------------------------------------------------------------------------------------------- */ static bool string_is_default(struct field_def *field, const char *value) { return value && !strcmp(value, ""); } static bool string_is_equal(struct field_def *field, const char *a, const char *b) { return !strcmp(a, b); } static const char *get_string(struct context_def *ctx, struct field_def *field, struct nlattr *nla) { char *str; int len; assert(type_of_field(ctx, field) == NLA_NUL_STRING); str = (char *)nla_data(nla); len = len_of_field(ctx, field); assert(memchr(str, 0, len + 1) != NULL); return str; } static bool put_string(struct context_def *ctx, struct field_def *field, struct msg_buff *msg, const char *value) { assert(type_of_field(ctx, field) == NLA_NUL_STRING); nla_put_string(msg, field->nla_type, value); return true; } static int string_usage(struct field_def *field, char *str, int size) { return snprintf(str, size,"[--%s=]", field->name); } static void string_describe_xml(struct field_def *field) { printf("\t\n", field->name); } const char *double_quote_string(const char *str) { static char 
*buffer; const char *s; char *b; int len = 0; for (s = str; *s; s++) { if (*s == '\\' || *s == '"') len++; len++; } b = realloc(buffer, len + 3); if (!b) return NULL; buffer = b; *b++ = '"'; for (s = str; *s; s++) { if (*s == '\\' || *s == '"') *b++ = '\\'; *b++ = *s; } *b++ = '"'; *b++ = 0; return buffer; } /* ============================================================================================== */ #define ENUM(f, d) \ .nla_type = T_ ## f, \ .is_default = enum_is_default, \ .is_equal = enum_is_equal, \ .get = get_enum, \ .put = put_enum, \ .usage = enum_usage, \ .describe_xml = enum_describe_xml, \ .u = { .e = { \ .map = f ## _map, \ .size = ARRAY_SIZE(f ## _map), \ .def = DRBD_ ## d ## _DEF } } #define ENUM_NOCASE(f, d) \ .nla_type = T_ ## f, \ .is_default = enum_is_default_nocase, \ .is_equal = enum_is_equal_nocase, \ .get = get_enum, \ .put = put_enum_nocase, \ .usage = enum_usage, \ .describe_xml = enum_describe_xml, \ .u = { .e = { \ .map = f ## _map, \ .size = ARRAY_SIZE(f ## _map), \ .def = DRBD_ ## d ## _DEF } } #define NUMERIC(f, d) \ .nla_type = T_ ## f, \ .is_default = numeric_is_default, \ .is_equal = numeric_is_equal, \ .get = get_numeric, \ .put = put_numeric, \ .usage = numeric_usage, \ .describe_xml = numeric_describe_xml, \ .u = { .n = { \ .min = DRBD_ ## d ## _MIN, \ .max = DRBD_ ## d ## _MAX, \ .def = DRBD_ ## d ## _DEF, \ .is_signed = F_ ## f ## _IS_SIGNED, \ .scale = DRBD_ ## d ## _SCALE } } #define BOOLEAN(f, d) \ .nla_type = T_ ## f, \ .is_default = boolean_is_default, \ .is_equal = boolean_is_equal, \ .get = get_boolean, \ .put = put_boolean, \ .usage = boolean_usage, \ .describe_xml = boolean_describe_xml, \ .u = { .b = { \ .def = DRBD_ ## d ## _DEF } }, \ .argument_is_optional = true #define FLAG(f) \ .nla_type = T_ ## f, \ .is_default = boolean_is_default, \ .is_equal = boolean_is_equal, \ .get = get_boolean, \ .put = put_flag, \ .usage = boolean_usage, \ .describe_xml = boolean_describe_xml, \ .u = { .b = { \ .def = false } }, \ 
.argument_is_optional = true #define STRING(f) \ .nla_type = T_ ## f, \ .is_default = string_is_default, \ .is_equal = string_is_equal, \ .get = get_string, \ .put = put_string, \ .usage = string_usage, \ .describe_xml = string_describe_xml, \ .needs_double_quoting = true /* ============================================================================================== */ const char *wire_protocol_map[] = { [DRBD_PROT_A] = "A", [DRBD_PROT_B] = "B", [DRBD_PROT_C] = "C", }; const char *on_io_error_map[] = { [EP_PASS_ON] = "pass_on", [EP_CALL_HELPER] = "call-local-io-error", [EP_DETACH] = "detach", }; const char *fencing_map[] = { [FP_DONT_CARE] = "dont-care", [FP_RESOURCE] = "resource-only", [FP_STONITH] = "resource-and-stonith", }; const char *after_sb_0p_map[] = { [ASB_DISCONNECT] = "disconnect", [ASB_DISCARD_YOUNGER_PRI] = "discard-younger-primary", [ASB_DISCARD_OLDER_PRI] = "discard-older-primary", [ASB_DISCARD_ZERO_CHG] = "discard-zero-changes", [ASB_DISCARD_LEAST_CHG] = "discard-least-changes", [ASB_DISCARD_LOCAL] = "discard-local", [ASB_DISCARD_REMOTE] = "discard-remote", }; const char *after_sb_1p_map[] = { [ASB_DISCONNECT] = "disconnect", [ASB_CONSENSUS] = "consensus", [ASB_VIOLENTLY] = "violently-as0p", [ASB_DISCARD_SECONDARY] = "discard-secondary", [ASB_CALL_HELPER] = "call-pri-lost-after-sb", }; const char *after_sb_2p_map[] = { [ASB_DISCONNECT] = "disconnect", [ASB_VIOLENTLY] = "violently-as0p", [ASB_CALL_HELPER] = "call-pri-lost-after-sb", }; const char *rr_conflict_map[] = { [ASB_DISCONNECT] = "disconnect", [ASB_VIOLENTLY] = "violently", [ASB_CALL_HELPER] = "call-pri-lost", }; const char *on_no_data_map[] = { [OND_IO_ERROR] = "io-error", [OND_SUSPEND_IO] = "suspend-io", }; const char *on_congestion_map[] = { [OC_BLOCK] = "block", [OC_PULL_AHEAD] = "pull-ahead", [OC_DISCONNECT] = "disconnect", }; const char *read_balancing_map[] = { [RB_PREFER_LOCAL] = "prefer-local", [RB_PREFER_REMOTE] = "prefer-remote", [RB_ROUND_ROBIN] = "round-robin", 
[RB_LEAST_PENDING] = "least-pending", [RB_CONGESTED_REMOTE] = "when-congested-remote", [RB_32K_STRIPING] = "32K-striping", [RB_64K_STRIPING] = "64K-striping", [RB_128K_STRIPING] = "128K-striping", [RB_256K_STRIPING] = "256K-striping", [RB_512K_STRIPING] = "512K-striping", [RB_1M_STRIPING] = "1M-striping" }; #define CHANGEABLE_DISK_OPTIONS \ { "on-io-error", ENUM(on_io_error, ON_IO_ERROR) }, \ { "fencing", ENUM(fencing, FENCING) }, \ { "disk-barrier", BOOLEAN(disk_barrier, DISK_BARRIER) }, \ { "disk-flushes", BOOLEAN(disk_flushes, DISK_FLUSHES) }, \ { "disk-drain", BOOLEAN(disk_drain, DISK_DRAIN) }, \ { "md-flushes", BOOLEAN(md_flushes, MD_FLUSHES) }, \ { "resync-rate", NUMERIC(resync_rate, RESYNC_RATE), \ .unit = "bytes/second" }, \ { "resync-after", NUMERIC(resync_after, MINOR_NUMBER) }, \ { "al-extents", NUMERIC(al_extents, AL_EXTENTS) }, \ { "al-updates", BOOLEAN(al_updates, AL_UPDATES) }, \ { "c-plan-ahead", NUMERIC(c_plan_ahead, C_PLAN_AHEAD), \ .unit = "1/10 seconds" }, \ { "c-delay-target", NUMERIC(c_delay_target, C_DELAY_TARGET), \ .unit = "1/10 seconds" }, \ { "c-fill-target", NUMERIC(c_fill_target, C_FILL_TARGET), \ .unit = "bytes" }, \ { "c-max-rate", NUMERIC(c_max_rate, C_MAX_RATE), \ .unit = "bytes/second" }, \ { "c-min-rate", NUMERIC(c_min_rate, C_MIN_RATE), \ .unit = "bytes/second" }, \ { "disk-timeout", NUMERIC(disk_timeout, DISK_TIMEOUT), \ .unit = "1/10 seconds" }, \ { "read-balancing", ENUM(read_balancing, READ_BALANCING) } \ #define CHANGEABLE_NET_OPTIONS \ { "protocol", ENUM_NOCASE(wire_protocol, PROTOCOL) }, \ { "timeout", NUMERIC(timeout, TIMEOUT), \ .unit = "1/10 seconds" }, \ { "max-epoch-size", NUMERIC(max_epoch_size, MAX_EPOCH_SIZE) }, \ { "max-buffers", NUMERIC(max_buffers, MAX_BUFFERS) }, \ { "unplug-watermark", NUMERIC(unplug_watermark, UNPLUG_WATERMARK) }, \ { "connect-int", NUMERIC(connect_int, CONNECT_INT), \ .unit = "seconds" }, \ { "ping-int", NUMERIC(ping_int, PING_INT), \ .unit = "seconds" }, \ { "sndbuf-size", 
NUMERIC(sndbuf_size, SNDBUF_SIZE), \ .unit = "bytes" }, \ { "rcvbuf-size", NUMERIC(rcvbuf_size, RCVBUF_SIZE), \ .unit = "bytes" }, \ { "ko-count", NUMERIC(ko_count, KO_COUNT) }, \ { "allow-two-primaries", BOOLEAN(two_primaries, ALLOW_TWO_PRIMARIES) }, \ { "cram-hmac-alg", STRING(cram_hmac_alg) }, \ { "shared-secret", STRING(shared_secret) }, \ { "after-sb-0pri", ENUM(after_sb_0p, AFTER_SB_0P) }, \ { "after-sb-1pri", ENUM(after_sb_1p, AFTER_SB_1P) }, \ { "after-sb-2pri", ENUM(after_sb_2p, AFTER_SB_2P) }, \ { "always-asbp", BOOLEAN(always_asbp, ALWAYS_ASBP) }, \ { "rr-conflict", ENUM(rr_conflict, RR_CONFLICT) }, \ { "ping-timeout", NUMERIC(ping_timeo, PING_TIMEO), \ .unit = "1/10 seconds" }, \ { "data-integrity-alg", STRING(integrity_alg) }, \ { "tcp-cork", BOOLEAN(tcp_cork, TCP_CORK) }, \ { "on-congestion", ENUM(on_congestion, ON_CONGESTION) }, \ { "congestion-fill", NUMERIC(cong_fill, CONG_FILL), \ .unit = "bytes" }, \ { "congestion-extents", NUMERIC(cong_extents, CONG_EXTENTS) }, \ { "csums-alg", STRING(csums_alg) }, \ { "verify-alg", STRING(verify_alg) }, \ { "use-rle", BOOLEAN(use_rle, USE_RLE) } struct context_def disk_options_ctx = { NLA_POLICY(disk_conf), .fields = { CHANGEABLE_DISK_OPTIONS, { } }, }; struct context_def net_options_ctx = { NLA_POLICY(net_conf), .fields = { CHANGEABLE_NET_OPTIONS, { } }, }; struct context_def primary_cmd_ctx = { NLA_POLICY(set_role_parms), .fields = { { "force", FLAG(assume_uptodate) }, { } }, }; struct context_def attach_cmd_ctx = { NLA_POLICY(disk_conf), .fields = { { "size", NUMERIC(disk_size, DISK_SIZE), .unit = "bytes" }, { "max-bio-bvecs", NUMERIC(max_bio_bvecs, MAX_BIO_BVECS) }, CHANGEABLE_DISK_OPTIONS, /* { "*", STRING(backing_dev) }, */ /* { "*", STRING(meta_dev) }, */ /* { "*", NUMERIC(meta_dev_idx, MINOR_NUMBER) }, */ { } }, }; struct context_def detach_cmd_ctx = { NLA_POLICY(detach_parms), .fields = { { "force", FLAG(force_detach) }, { } }, }; struct context_def connect_cmd_ctx = { NLA_POLICY(net_conf), .fields = { 
{ "tentative", FLAG(tentative) }, { "discard-my-data", FLAG(discard_my_data) }, CHANGEABLE_NET_OPTIONS, { } }, }; struct context_def disconnect_cmd_ctx = { NLA_POLICY(disconnect_parms), .fields = { { "force", FLAG(force_disconnect) }, { } }, }; struct context_def resize_cmd_ctx = { NLA_POLICY(resize_parms), .fields = { { "size", NUMERIC(resize_size, DISK_SIZE), .unit = "bytes" }, { "assume-peer-has-space", FLAG(resize_force) }, { "assume-clean", FLAG(no_resync) }, { "al-stripes", NUMERIC(al_stripes, AL_STRIPES) }, { "al-stripe-size-kB", NUMERIC(al_stripe_size, AL_STRIPE_SIZE) }, { } }, }; struct context_def resource_options_cmd_ctx = { NLA_POLICY(res_opts), .fields = { { "cpu-mask", STRING(cpu_mask) }, { "on-no-data-accessible", ENUM(on_no_data, ON_NO_DATA) }, { } }, }; struct context_def new_current_uuid_cmd_ctx = { NLA_POLICY(new_c_uuid_parms), .fields = { { "clear-bitmap", FLAG(clear_bm) }, { } }, }; struct context_def verify_cmd_ctx = { NLA_POLICY(start_ov_parms), .fields = { { "start", NUMERIC(ov_start_sector, DISK_SIZE), .unit = "bytes" }, { "stop", NUMERIC(ov_stop_sector, DISK_SIZE), .unit = "bytes" }, { } }, }; struct context_def new_minor_cmd_ctx = { NLA_POLICY(drbd_cfg_context), .fields = { /* { "*", STRING(ctx_resource_name) }, */ /* { "*", NUMERIC(ctx_volume, >= 0) }, */ /* { "*", BINARY(ctx_my_addr) }, */ /* { "*", BINARY(ctx_peer_addr) }, */ { } }, }; drbd-8.4.4/user/config_flags.h0000664000000000000000000000316112221331365014711 0ustar rootroot#ifndef __DRBD_CONFIG_FLAGS_H #define __DRBD_CONFIG_FLAGS_H struct msg_buff; struct nlattr; struct context_def; struct field_def { const char *name; unsigned short nla_type; bool (*is_default)(struct field_def *, const char *); bool (*is_equal)(struct field_def *, const char *, const char *); const char *(*get)(struct context_def *, struct field_def *, struct nlattr *); bool (*put)(struct context_def *, struct field_def *, struct msg_buff *, const char *); int (*usage)(struct field_def *, char *, int); void 
(*describe_xml)(struct field_def *); union { struct { const char **map; int size; int def; } e; /* ENUM, ENUM_NOCASE */ struct { long long min; long long max; long long def; bool is_signed; char scale; } n; /* NUMERIC */ struct { bool def; } b; /* BOOLEAN */ } u; bool needs_double_quoting; bool argument_is_optional; const char *unit; }; struct context_def { struct nla_policy *nla_policy; int nla_policy_size; struct field_def fields[]; }; extern struct context_def disk_options_ctx; extern struct context_def net_options_ctx; extern struct context_def primary_cmd_ctx; extern struct context_def attach_cmd_ctx; extern struct context_def detach_cmd_ctx; extern struct context_def connect_cmd_ctx; extern struct context_def disconnect_cmd_ctx; extern struct context_def resize_cmd_ctx; extern struct context_def resource_options_cmd_ctx; extern struct context_def new_current_uuid_cmd_ctx; extern struct context_def verify_cmd_ctx; extern struct context_def new_minor_cmd_ctx; extern const char *double_quote_string(const char *str); #endif /* __DRBD_CONFIG_FLAGS_H */ drbd-8.4.4/user/drbd_endian.h0000664000000000000000000001173112216604252014525 0ustar rootroot#ifndef DRBD_ENDIAN_H #define DRBD_ENDIAN_H 1 /* * we don't want additional dependencies on other packages, * and we want to avoid introducing incompatibilities by including kernel * headers from user space. * * we need the uint32_t and uint64_t types, * the hamming weight functions, * and the cpu_to_le etc. endianness conversion functions. */ #include #include #ifndef BITS_PER_LONG # define BITS_PER_LONG __WORDSIZE #endif /* linux/byteorder/swab.h */ /* casts are necessary for constants, because we never know for sure * how U/UL/ULL map to __u16, uint32_t, uint64_t. At least not in a portable way. */ /* * __asm__("bswap %0" : "=r" (x) : "0" (x)); * oh, well... 
*/ #define __swab16(x) \ ({ \ __u16 __x = (x); \ ((__u16)( \ (((__u16)(__x) & (__u16)0x00ffUL) << 8) | \ (((__u16)(__x) & (__u16)0xff00UL) >> 8) )); \ }) #define __swab32(x) \ ({ \ uint32_t __x = (x); \ ((uint32_t)( \ (((uint32_t)(__x) & (uint32_t)0x000000ffUL) << 24) | \ (((uint32_t)(__x) & (uint32_t)0x0000ff00UL) << 8) | \ (((uint32_t)(__x) & (uint32_t)0x00ff0000UL) >> 8) | \ (((uint32_t)(__x) & (uint32_t)0xff000000UL) >> 24) )); \ }) #define __swab64(x) \ ({ \ uint64_t __x = (x); \ ((uint64_t)( \ (uint64_t)(((uint64_t)(__x) & (uint64_t)0x00000000000000ffULL) << 56) | \ (uint64_t)(((uint64_t)(__x) & (uint64_t)0x000000000000ff00ULL) << 40) | \ (uint64_t)(((uint64_t)(__x) & (uint64_t)0x0000000000ff0000ULL) << 24) | \ (uint64_t)(((uint64_t)(__x) & (uint64_t)0x00000000ff000000ULL) << 8) | \ (uint64_t)(((uint64_t)(__x) & (uint64_t)0x000000ff00000000ULL) >> 8) | \ (uint64_t)(((uint64_t)(__x) & (uint64_t)0x0000ff0000000000ULL) >> 24) | \ (uint64_t)(((uint64_t)(__x) & (uint64_t)0x00ff000000000000ULL) >> 40) | \ (uint64_t)(((uint64_t)(__x) & (uint64_t)0xff00000000000000ULL) >> 56) )); \ }) /* * linux/byteorder/little_endian.h * linux/byteorder/big_endian.h */ #if __BYTE_ORDER == __LITTLE_ENDIAN #define cpu_to_le64(x) ((uint64_t)(x)) #define le64_to_cpu(x) ((uint64_t)(x)) #define cpu_to_le32(x) ((uint32_t)(x)) #define le32_to_cpu(x) ((uint32_t)(x)) #define cpu_to_le16(x) ((__u16)(x)) #define le16_to_cpu(x) ((__u16)(x)) #define cpu_to_be64(x) __swab64((x)) #define be64_to_cpu(x) __swab64((x)) #define cpu_to_be32(x) __swab32((x)) #define be32_to_cpu(x) __swab32((x)) #define cpu_to_be16(x) __swab16((x)) #define be16_to_cpu(x) __swab16((x)) #elif __BYTE_ORDER == __BIG_ENDIAN # define cpu_to_le64(x) __swab64((x)) # define le64_to_cpu(x) __swab64((x)) # define cpu_to_le32(x) __swab32((x)) # define le32_to_cpu(x) __swab32((x)) # define cpu_to_le16(x) __swab16((x)) # define le16_to_cpu(x) __swab16((x)) # define cpu_to_be64(x) ((uint64_t)(x)) # define be64_to_cpu(x) ((uint64_t)(x)) 
# define cpu_to_be32(x) ((uint32_t)(x)) # define be32_to_cpu(x) ((uint32_t)(x)) # define cpu_to_be16(x) ((__u16)(x)) # define be16_to_cpu(x) ((__u16)(x)) #else # error "sorry, weird endianness on this box" #endif #if BITS_PER_LONG == 32 # define LN2_BPL 5 # define cpu_to_le_long cpu_to_le32 # define le_long_to_cpu le32_to_cpu #elif BITS_PER_LONG == 64 # define LN2_BPL 6 # define cpu_to_le_long cpu_to_le64 # define le_long_to_cpu le64_to_cpu #else # error "LN2 of BITS_PER_LONG unknown!" #endif /* linux/bitops.h */ /* * hweightN: returns the hamming weight (i.e. the number * of bits set) of a N-bit word */ static inline unsigned int generic_hweight32(unsigned int w) { unsigned int res = (w & 0x55555555) + ((w >> 1) & 0x55555555); res = (res & 0x33333333) + ((res >> 2) & 0x33333333); res = (res & 0x0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F); res = (res & 0x00FF00FF) + ((res >> 8) & 0x00FF00FF); return (res & 0x0000FFFF) + ((res >> 16) & 0x0000FFFF); } static inline unsigned long generic_hweight64(uint64_t w) { #if BITS_PER_LONG < 64 return generic_hweight32((unsigned int)(w >> 32)) + generic_hweight32((unsigned int)w); #else uint64_t res; res = (w & 0x5555555555555555) + ((w >> 1) & 0x5555555555555555); res = (res & 0x3333333333333333) + ((res >> 2) & 0x3333333333333333); res = (res & 0x0F0F0F0F0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F0F0F0F0F); res = (res & 0x00FF00FF00FF00FF) + ((res >> 8) & 0x00FF00FF00FF00FF); res = (res & 0x0000FFFF0000FFFF) + ((res >> 16) & 0x0000FFFF0000FFFF); return (res & 0x00000000FFFFFFFF) + ((res >> 32) & 0x00000000FFFFFFFF); #endif } static inline unsigned long hweight_long(unsigned long w) { return sizeof(w) == 4 ? 
generic_hweight32(w) : generic_hweight64(w); } /* * Format macros for printf() */ #if BITS_PER_LONG == 32 # define X32(a) "%"#a"X" # define X64(a) "%"#a"llX" # define D32 "%d" # define D64 "%lld" # define U32 "%u" # define U64 "%llu" #elif BITS_PER_LONG == 64 # define X32(a) "%"#a"X" # define X64(a) "%"#a"lX" # define D32 "%d" # define D64 "%ld" # define U32 "%u" # define U64 "%lu" #else # error "sorry, unsupported word length on this box" #endif #if BITS_PER_LONG == 32 # define strto_u64 strtoull #elif BITS_PER_LONG == 64 # define strto_u64 strtoul #else # error "sorry, unsupported word length on this box" #endif #endif drbd-8.4.4/user/drbd_nla.c0000664000000000000000000000232512216604252014033 0ustar rootroot#include #include "libgenl.h" #include #include "drbd_nla.h" static int drbd_nla_check_mandatory(int maxtype, struct nlattr *nla) { struct nlattr *head = nla_data(nla); int len = nla_len(nla); int rem; /* * validate_nla (called from nla_parse_nested) ignores attributes * beyond maxtype, and does not understand the DRBD_GENLA_F_MANDATORY flag. * In order to have it validate attributes with the DRBD_GENLA_F_MANDATORY * flag set also, check and remove that flag before calling * nla_parse_nested. 
*/ nla_for_each_attr(nla, head, len, rem) { if (nla->nla_type & DRBD_GENLA_F_MANDATORY) { nla->nla_type &= ~DRBD_GENLA_F_MANDATORY; if (nla_type(nla) > maxtype) return -EOPNOTSUPP; } } return 0; } int drbd_nla_parse_nested(struct nlattr *tb[], int maxtype, struct nlattr *nla, const struct nla_policy *policy) { int err; err = drbd_nla_check_mandatory(maxtype, nla); if (!err) err = nla_parse_nested(tb, maxtype, nla, policy); return err; } struct nlattr *drbd_nla_find_nested(int maxtype, struct nlattr *nla, int attrtype) { int err; err = drbd_nla_check_mandatory(maxtype, nla); if (err) /* ignore */; return nla_find_nested(nla, attrtype); } drbd-8.4.4/user/drbd_nla.h0000664000000000000000000000044012216604252014034 0ustar rootroot#ifndef __DRBD_NLA_H #define __DRBD_NLA_H extern int drbd_nla_parse_nested(struct nlattr *tb[], int maxtype, struct nlattr *nla, const struct nla_policy *policy); extern struct nlattr *drbd_nla_find_nested(int maxtype, struct nlattr *nla, int attrtype); #endif /* __DRBD_NLA_H */ drbd-8.4.4/user/drbdadm.h0000664000000000000000000002554212221331365013674 0ustar rootroot#ifndef DRBDADM_H #define DRBDADM_H #include #include #include #include #include #include #include #include "config.h" #define E_syntax 2 #define E_usage 3 #define E_config_invalid 10 #define E_exec_error 20 #define E_thinko 42 /* :) */ enum { SLEEPS_FINITE = 1, SLEEPS_SHORT = 2+1, SLEEPS_LONG = 4+1, SLEEPS_VERY_LONG = 8+1, SLEEPS_MASK = 15, RETURN_PID = 2, SLEEPS_FOREVER = 4, SUPRESS_STDERR = 0x10, RETURN_STDOUT_FD = 0x20, RETURN_STDERR_FD = 0x40, DONT_REPORT_FAILED = 0x80, }; /* for check_uniq(): Check for uniqueness of certain values... * comment out if you want to NOT choke on the first conflict */ #define EXIT_ON_CONFLICT 1 /* for verify_ips(): are IPs that cannot be verified fatal? 
*/ #define INVALID_IP_IS_INVALID_CONF 1 enum usage_count_type { UC_YES, UC_NO, UC_ASK, }; enum pp_flags { MATCH_ON_PROXY = 1, }; struct d_globals { int disable_io_hints; int disable_ip_verification; int minor_count; int dialog_refresh; enum usage_count_type usage_count; }; #define IFI_HADDR 8 #define IFI_ALIAS 1 struct ifi_info { char ifi_name[IFNAMSIZ]; /* interface name, nul terminated */ uint8_t ifi_haddr[IFI_HADDR]; /* hardware address */ uint16_t ifi_hlen; /* bytes in hardware address, 0, 6, 8 */ short ifi_flags; /* IFF_xxx constants from */ short ifi_myflags; /* our own IFI_xxx flags */ struct sockaddr *ifi_addr; /* primary address */ struct ifi_info *ifi_next; /* next ifi_info structure */ }; struct d_name { char *name; struct d_name *next; }; struct d_proxy_info { struct d_name *on_hosts; char* inside_addr; char* inside_port; char* inside_af; char* outside_addr; char* outside_port; char* outside_af; struct d_option *options; /* named proxy_options in other places */ struct d_option *plugins; /* named proxy_plugins in other places */ }; struct d_volume { unsigned vnr; char* device; unsigned device_minor; char* disk; char* meta_disk; char* meta_index; int meta_major; int meta_minor; struct d_volume *next; struct d_option* disk_options; /* Additional per volume options */ /* Do not dump an explicit volume section */ unsigned int implicit :1 ; /* flags for "drbdadm adjust" */ unsigned int adj_del_minor :1; unsigned int adj_add_minor :1; unsigned int adj_detach :1; unsigned int adj_attach :1; unsigned int adj_resize :1; unsigned int adj_disk_opts :1; }; struct d_host_info { struct d_name *on_hosts; struct d_volume *volumes; char* address; char* port; char* address_family; struct d_proxy_info *proxy; struct d_host_info* next; struct d_resource* lower; /* for device stacking */ char *lower_name; /* for device stacking, before bind_stacked_res() */ int config_line; unsigned int by_address:1; /* Match to machines by address, not by names (=on_hosts) */ struct 
d_option* res_options; /* Additional per host options */ }; struct d_option { char* name; char* value; struct d_option* next; unsigned int mentioned :1 ; /* for the adjust command. */ unsigned int is_escaped :1 ; unsigned int adj_skip :1; }; struct d_resource { char* name; struct d_volume *volumes; /* gets propagated to host_info sections later. */ struct d_host_info* me; struct d_host_info* peer; struct d_host_info* all_hosts; struct d_option* net_options; struct d_option* disk_options; struct d_option* res_options; struct d_option* startup_options; struct d_option* handlers; struct d_option* proxy_options; struct d_option* proxy_plugins; struct d_resource* next; struct d_name *become_primary_on; char *config_file; /* The config file this resource is defined in. */ int start_line; unsigned int stacked_timeouts:1; unsigned int ignore:1; unsigned int stacked:1; /* Stacked on this node */ unsigned int stacked_on_one:1; /* Stacked either on me or on peer */ /* if a prerequisite command failed, don't try any further commands. 
* see run_deferred_cmds() */ unsigned int skip_further_deferred_command:1; }; struct adm_cmd; struct cfg_ctx { /* res == NULL: does not care for resources, or iterates over all * resources in the global "struct d_resource *config" */ struct d_resource *res; /* vol == NULL: operate on the resource itself, or iterates over all * volumes in res */ struct d_volume *vol; const char *arg; }; extern char *canonify_path(char *path); extern int adm_adjust(struct cfg_ctx *); extern int adm_new_minor(struct cfg_ctx *ctx); extern int adm_new_resource(struct cfg_ctx *); extern int adm_res_options(struct cfg_ctx *); extern int adm_set_default_res_options(struct cfg_ctx *); extern int adm_attach(struct cfg_ctx *); extern int adm_disk_options(struct cfg_ctx *); extern int adm_set_default_disk_options(struct cfg_ctx *); extern int adm_resize(struct cfg_ctx *); extern int adm_connect(struct cfg_ctx *); extern int adm_net_options(struct cfg_ctx *); extern int adm_set_default_net_options(struct cfg_ctx *); extern int adm_disconnect(struct cfg_ctx *); extern int adm_generic_s(struct cfg_ctx *); extern int adm_create_md(struct cfg_ctx *); extern int _admm_generic(struct cfg_ctx *, int flags); extern void m__system(char **argv, int flags, const char *res_name, pid_t *kid, int *fd, int *ex); static inline int m_system_ex(char **argv, int flags, const char *res_name) { int ex; m__system(argv, flags, res_name, NULL, NULL, &ex); return ex; } extern struct d_option* find_opt(struct d_option*,char*); extern void validate_resource(struct d_resource *, enum pp_flags); /* stages of configuration, as performed on "drbdadm up" * or "drbdadm adjust": */ enum drbd_cfg_stage { /* prerequisite stage: create objects, start daemons, ... */ CFG_PREREQ, /* run time changeable settings of resources */ CFG_RESOURCE, /* detach/attach local disks, */ CFG_DISK_PREREQ, CFG_DISK, /* The stage to discard network configuration, during adjust. 
* This is after the DISK stage, because we don't want to cut access to * good data while in primary role. And before the SETTINGS stage, as * some proxy or syncer settings may cause side effects and additional * handshakes while we have an established connection. */ CFG_NET_PREREQ, /* discard/set connection parameters */ CFG_NET, __CFG_LAST }; extern void schedule_deferred_cmd( int (*function)(struct cfg_ctx *), struct cfg_ctx *ctx, const char *arg, enum drbd_cfg_stage stage); extern int version_code_kernel(void); extern int version_code_userland(void); extern void warn_on_version_mismatch(void); extern void maybe_exec_drbdadm_83(char **argv); extern void uc_node(enum usage_count_type type); extern void convert_discard_opt(struct d_resource* res); extern void convert_after_option(struct d_resource* res); extern int have_ip(const char *af, const char *ip); enum pr_flags { NoneHAllowed = 4, PARSE_FOR_ADJUST = 8 }; extern struct d_resource* parse_resource_for_adjust(struct cfg_ctx *ctx); extern struct d_resource* parse_resource(char*, enum pr_flags); extern void post_parse(struct d_resource *config, enum pp_flags); extern struct d_option *new_opt(char *name, char *value); extern int name_in_names(char *name, struct d_name *names); extern char *_names_to_str(char* buffer, struct d_name *names); extern char *_names_to_str_c(char* buffer, struct d_name *names, char c); #define NAMES_STR_SIZE 255 #define names_to_str(N) _names_to_str(alloca(NAMES_STR_SIZE+1), N) #define names_to_str_c(N, C) _names_to_str_c(alloca(NAMES_STR_SIZE+1), N, C) extern void free_names(struct d_name *names); extern void set_me_in_resource(struct d_resource* res, int match_on_proxy); extern void set_peer_in_resource(struct d_resource* res, int peer_required); extern void set_on_hosts_in_res(struct d_resource *res); extern void set_disk_in_res(struct d_resource *res); extern int _proxy_connect_name_len(struct d_resource *res); extern char *_proxy_connection_name(char *conn_name, struct d_resource 
*res); #define proxy_connection_name(RES) \ _proxy_connection_name(alloca(_proxy_connect_name_len(RES)), RES) int parse_proxy_options_section(struct d_resource *res); /* conn_name is optional and mostly for compatibility with dcmd */ int do_proxy_conn_up(struct cfg_ctx *ctx); int do_proxy_conn_down(struct cfg_ctx *ctx); int do_proxy_conn_plugins(struct cfg_ctx *ctx); extern char *config_file; extern char *config_save; extern int config_valid; extern struct d_resource* config; extern struct d_resource* common; extern struct d_globals global_options; extern int line, fline; extern struct hsearch_data global_htable; extern int no_tty; extern int dry_run; extern int verbose; extern char* drbdsetup; extern char* drbd_proxy_ctl; extern char* drbdadm_83; extern char ss_buffer[1024]; extern struct utsname nodeinfo; extern char* connect_to_host; struct setup_option { bool explicit; char *option; }; /* declared here, defined once in drbdadm_main.c */ extern struct setup_option *setup_options; extern void add_setup_option(bool explicit, char *option); /* ssprintf() places the result of the printf in the current stack frame and sets ptr to the resulting string. If the current stack frame is destroyed (=function returns), the allocated memory is freed automatically */ /* // This is the nicer version, that does not need the ss_buffer. // But it only works with very new glibcs. #define ssprintf(...) \ ({ int _ss_size = snprintf(0, 0, ##__VA_ARGS__); \ char *_ss_ret = __builtin_alloca(_ss_size+1); \ snprintf(_ss_ret, _ss_size+1, ##__VA_ARGS__); \ _ss_ret; }) */ #define ssprintf(ptr,...) \ ptr=strcpy(alloca(snprintf(ss_buffer,sizeof(ss_buffer),##__VA_ARGS__)+1),ss_buffer) /* CAUTION: arguments may not have side effects!
*/ #define for_each_resource(res,tmp,config) \ for (res = (config); res && (tmp = res->next, 1); res = tmp) #define for_each_volume(v_,volumes_) \ for (v_ = volumes_; v_; v_ = v_->next) #define APPEND(LIST,ITEM) ({ \ typeof((LIST)) _l = (LIST); \ typeof((ITEM)) _i = (ITEM); \ typeof((ITEM)) _t; \ _i->next = NULL; \ if (_l == NULL) { _l = _i; } \ else { \ for (_t = _l; _t->next; _t = _t->next); \ _t->next = _i; \ }; \ _l; \ }) #define INSERT_SORTED(LIST,ITEM,SORT) ({ \ typeof((LIST)) _l = (LIST); \ typeof((ITEM)) _i = (ITEM); \ typeof((ITEM)) _t, _p = NULL; \ for (_t = _l; _t && _t->SORT <= _i->SORT; _p = _t, _t = _t->next); \ if (_p) \ _p->next = _i; \ else \ _l = _i; \ _i->next = _t; \ _l; \ }) #define SPLICE(LIST,ITEMS) ({ \ typeof((LIST)) _l = (LIST); \ typeof((ITEMS)) _i = (ITEMS); \ typeof((ITEMS)) _t; \ if (_l == NULL) { _l = _i; } \ else { \ for (_t = _l; _t->next; _t = _t->next); \ _t->next = _i; \ }; \ _l; \ }) #endif drbd-8.4.4/user/drbdadm_adjust.c0000664000000000000000000005025612221261130015231 0ustar rootroot/* drbdadm_adjust.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2003-2008, LINBIT Information Technologies GmbH. Copyright (C) 2003-2008, Philipp Reisner . Copyright (C) 2003-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. 
*/ #define _GNU_SOURCE #define _XOPEN_SOURCE 600 #define _FILE_OFFSET_BITS 64 #include #include #include #include #include #include #include #include #include #include #include #include #include "drbdadm.h" #include "drbdtool_common.h" #include "drbdadm_parser.h" #include "config_flags.h" /* drbdsetup show might complain that the device minor does not exist at all. Redirect stderr to /dev/null therefore. */ static FILE *m_popen(int *pid,char** argv) { int mpid; int pipes[2]; int dev_null; if(pipe(pipes)) { perror("Creation of pipes failed"); exit(E_exec_error); } dev_null = open("/dev/null", O_WRONLY); if (dev_null == -1) { perror("Opening /dev/null failed"); exit(E_exec_error); } mpid = fork(); if(mpid == -1) { fprintf(stderr,"Can not fork"); exit(E_exec_error); } if(mpid == 0) { close(pipes[0]); // close reading end dup2(pipes[1], fileno(stdout)); close(pipes[1]); dup2(dev_null, fileno(stderr)); close(dev_null); execvp(argv[0],argv); fprintf(stderr,"Can not exec"); exit(E_exec_error); } close(pipes[1]); // close writing end close(dev_null); *pid=mpid; return fdopen(pipes[0],"r"); } static int is_equal(struct context_def *ctx, struct d_option *a, struct d_option *b) { struct field_def *field; for (field = ctx->fields; field->name; field++) { if (!strcmp(field->name, a->name)) return field->is_equal(field, a->value, b->value); } fprintf(stderr, "Internal error: option '%s' not known in this context\n", a->name); abort(); } static bool is_default(struct context_def *ctx, struct d_option *opt) { struct field_def *field; for (field = ctx->fields; field->name; field++) { if (strcmp(field->name, opt->name)) continue; return field->is_default(field, opt->value); } return false; } static int opts_equal(struct context_def *ctx, struct d_option* conf, struct d_option* running) { struct d_option* opt; while(running) { if((opt=find_opt(conf,running->name))) { if(!is_equal(ctx, running, opt)) { if (verbose > 2) fprintf(stderr, "Value of '%s' differs: r=%s c=%s\n", 
opt->name,running->value,opt->value); return 0; } if (verbose > 3) fprintf(stderr, "Value of '%s' equal: r=%s c=%s\n", opt->name,running->value,opt->value); opt->mentioned=1; } else { if(!is_default(ctx, running)) { if (verbose > 2) fprintf(stderr, "Only in running config %s: %s\n", running->name,running->value); return 0; } if (verbose > 3) fprintf(stderr, "Is default: '%s' equal: r=%s\n", running->name,running->value); } running=running->next; } while(conf) { if(conf->mentioned==0 && !is_default(ctx, conf)) { if (verbose > 2) fprintf(stderr, "Only in config file %s: %s\n", conf->name,conf->value); return 0; } conf=conf->next; } return 1; } static int addr_equal(struct d_resource* conf, struct d_resource* running) { int equal; char *peer_addr, *peer_af, *peer_port; if (conf->peer == NULL && running->peer == NULL) return 1; if (running->peer == NULL) return 0; /* no configured peer can never match a running one; * bail out before conf->peer is dereferenced below */ if (conf->peer == NULL) return 0; equal = !strcmp(conf->me->address, running->me->address) && !strcmp(conf->me->port, running->me->port) && !strcmp(conf->me->address_family, running->me->address_family); if(conf->me->proxy) { peer_addr = conf->me->proxy->inside_addr; peer_port = conf->me->proxy->inside_port; peer_af = conf->me->proxy->inside_af; } else { peer_addr = conf->peer->address; peer_port = conf->peer->port; peer_af = conf->peer->address_family; } equal = equal && conf->peer && !strcmp(peer_addr, running->peer->address) && !strcmp(peer_port, running->peer->port) && !strcmp(peer_af, running->peer->address_family); /* only report a difference if there actually is one */ if (!equal && verbose > 2) fprintf(stderr, "Network addresses differ:\n" "\trunning: %s:%s:%s -- %s:%s:%s\n" "\t config: %s:%s:%s -- %s:%s:%s\n", running->me->address_family, running->me->address, running->me->port, running->peer->address_family, running->peer->address, running->peer->port, conf->me->address_family, conf->me->address, conf->me->port, peer_af, peer_addr, peer_port); return equal; } /* Are both internal, or are both not internal.
*/ static int int_eq(char* m_conf, char* m_running) { return !strcmp(m_conf,"internal") == !strcmp(m_running,"internal"); } static int disk_equal(struct d_volume *conf, struct d_volume *running) { int eq = 1; if (conf->disk == NULL && running->disk == NULL) return 1; if (conf->disk == NULL || running->disk == NULL) return 0; eq &= !strcmp(conf->disk, running->disk); eq &= int_eq(conf->meta_disk, running->meta_disk); if (!strcmp(conf->meta_disk, "internal")) return eq; eq &= !strcmp(conf->meta_disk, running->meta_disk); return eq; } /* NULL terminated */ static void find_option_in_resources(char *name, struct d_option *list, struct d_option **opt, ...) { va_list va; va_start(va, opt); /* We need to keep setting *opt to NULL, even if a list == NULL. */ while (list || opt) { while (list) { if (strcmp(list->name, name) == 0) break; list = list->next; } *opt = list; list = va_arg(va, struct d_option*); opt = va_arg(va, struct d_option**); } va_end(va); } static int do_proxy_reconf(struct cfg_ctx *ctx) { int rv; char *argv[4] = { drbd_proxy_ctl, "-c", (char*)ctx->arg, NULL }; rv = m_system_ex(argv, SLEEPS_SHORT, ctx->res->name); return rv; } #define MAX_PLUGINS (10) #define MAX_PLUGIN_NAME (16) /* The new name is appended to the alist. 
*/ int _is_plugin_in_list(char *string, char slist[MAX_PLUGINS][MAX_PLUGIN_NAME], char alist[MAX_PLUGINS][MAX_PLUGIN_NAME], int list_len) { int word_len, i; char *copy; for(word_len=0; string[word_len]; word_len++) if (isspace(string[word_len])) break; if (word_len+1 >= MAX_PLUGIN_NAME) { fprintf(stderr, "Wrong proxy plugin name %*.*s", word_len, word_len, string); exit(E_config_invalid); } copy = alist[list_len]; strncpy(copy, string, word_len); copy[word_len] = 0; for(i=0; i= MAX_PLUGINS) { fprintf(stderr, "Too many proxy plugins."); exit(E_config_invalid); } return 0; } static int proxy_reconf(struct cfg_ctx *ctx, struct d_resource *running) { int reconn = 0; struct d_resource *res = ctx->res; struct d_option* res_o, *run_o; unsigned long long v1, v2, minimum; char *plugin_changes[MAX_PLUGINS], *cp, *conn_name; /* It's less memory usage when we're storing char[]. malloc overhead for * the few bytes + pointers is much more. */ char p_res[MAX_PLUGINS][MAX_PLUGIN_NAME], p_run[MAX_PLUGINS][MAX_PLUGIN_NAME]; int used, i, re_do; reconn = 0; if (!running) goto redo_whole_conn; find_option_in_resources("memlimit", res->me->proxy->options, &res_o, running->proxy_options, &run_o, NULL, NULL); v1 = res_o ? m_strtoll(res_o->value, 1) : 0; v2 = run_o ? m_strtoll(run_o->value, 1) : 0; minimum = v1 < v2 ? v1 : v2; /* We allow an epsilon of 2%, so that small (rounding) deviations do * not cause the connection to be re-established. * v1 and v2 are unsigned, so compute the absolute difference explicitly * instead of passing a possibly wrapped value to abs(). */ if (res_o && (!run_o || (v1 > v2 ? v1 - v2 : v2 - v1)/(float)minimum > 0.02)) { redo_whole_conn: /* As the memory is in use while the connection is allocated we have to * completely destroy and rebuild the connection. */ schedule_deferred_cmd( do_proxy_conn_down, ctx, NULL, CFG_NET_PREREQ); schedule_deferred_cmd( do_proxy_conn_up, ctx, NULL, CFG_NET_PREREQ); schedule_deferred_cmd( do_proxy_conn_plugins, ctx, NULL, CFG_NET_PREREQ); /* With connection cleanup and reopen everything is rebuilt anyway, and * DRBD will get a reconnect too.
*/ return 0; } res_o = res->me->proxy->plugins; run_o = running->proxy_plugins; used = 0; conn_name = proxy_connection_name(res); for(i=0; i= sizeof(plugin_changes)-1) { fprintf(stderr, "Too many proxy plugin changes"); exit(E_config_invalid); } /* Now we can be sure that we can store another pointer. */ if (!res_o) { if (run_o) { /* More plugins running than configured - just stop here. */ m_asprintf(&cp, "set plugin %s %d end", conn_name, i); plugin_changes[used++] = cp; } else { /* Both at the end? ok, quit loop */ } break; } /* res_o != NULL. */ if (!run_o) { p_run[i][0] = 0; if (_is_plugin_in_list(res_o->name, p_run, p_res, i)) { /* Current plugin was already active, just at another position. * Redo the whole connection. */ goto redo_whole_conn; } /* More configured than running - just add it, if it's not already * somewhere else. */ m_asprintf(&cp, "set plugin %s %d %s", conn_name, i, res_o->name); plugin_changes[used++] = cp; } else { /* If we get here, both lists have been filled in parallel, so we * can simply use the common counter. */ re_do = _is_plugin_in_list(res_o->name, p_run, p_res, i) || _is_plugin_in_list(run_o->name, p_res, p_run, i); if (re_do) { /* Plugin(s) were moved, not simply reconfigured. * Redo the whole connection. */ goto redo_whole_conn; } /* TODO: We don't (yet) account for possible different ordering of * the parameters to the plugin. * plugin A 1 B 2 * should be treated as equal to * plugin B 2 A 1. */ if (strcmp(run_o->name, res_o->name) != 0) { /* Either a different plugin, or just different settings * - plugin can be overwritten. */ m_asprintf(&cp, "set plugin %s %d %s", conn_name, i, res_o->name); plugin_changes[used++] = cp; } } if (res_o) res_o = res_o->next; if (run_o) run_o = run_o->next; } /* change only a few plugin settings. */ for(i=0; iname); err = stat("/dev/drbd/by-res", &sbuf); if (err) /* probably no udev rules in use */ return 0; err = stat(link_name, &sbuf); if (err) /* resource link cannot be stat()ed.
*/ return 1; /* double check device information */ if (!S_ISBLK(sbuf.st_mode)) return 1; if (major(sbuf.st_rdev) != DRBD_MAJOR) return 1; if (minor(sbuf.st_rdev) != res->me->volumes->device_minor) return 1; /* Link exists, and is expected block major:minor. * Do nothing. */ return 0; } /* moves option to the head of the singly linked option list, * and marks it as to be skipped for "adjust only" commands * like disk-options; see e.g. adm_attach_and_or_disk_options(). */ static void move_opt_to_head(struct d_option **head, struct d_option *o) { struct d_option *t; if (!o) return; o->adj_skip = 1; if (o == *head) return; for (t = *head; t->next != o; t = t->next) ; t->next = o->next; o->next = *head; *head = o; } void compare_max_bio_bvecs(struct d_volume *conf, struct d_volume *kern) { struct d_option *c = find_opt(conf->disk_options, "max-bio-bvecs"); struct d_option *k = find_opt(kern->disk_options, "max-bio-bvecs"); /* move to front of list, so we can skip it * for the following opts_equal */ move_opt_to_head(&conf->disk_options, c); move_opt_to_head(&kern->disk_options, k); /* simplify logic below, would otherwise have to * (!x || is_default(x)) all the time.
*/ if (k && is_default(&attach_cmd_ctx, k)) k = NULL; /* there was a bvec restriction set, * but it is no longer in config, or vice versa */ if (!k != !c) conf->adj_attach = 1; /* restrictions differ */ if (k && c && !is_equal(&attach_cmd_ctx, k, c)) conf->adj_attach = 1; } /* similar to compare_max_bio_bvecs above */ void compare_size(struct d_volume *conf, struct d_volume *kern) { struct d_option *c = find_opt(conf->disk_options, "size"); struct d_option *k = find_opt(kern->disk_options, "size"); move_opt_to_head(&conf->disk_options, c); move_opt_to_head(&kern->disk_options, k); if (k && is_default(&attach_cmd_ctx, k)) k = NULL; if (!k != !c) conf->adj_resize = 1; if (k && c && !is_equal(&attach_cmd_ctx, c, k)) conf->adj_resize = 1; } void compare_volume(struct d_volume *conf, struct d_volume *kern) { /* Special-case "max-bio-bvecs", we do not allow to change that * while attached, yet. * Also special case "size", we need to issue a resize command to change that. * Move both options to the head of the disk_options list, * so we can easily skip them in the opts_equal, later. */ struct d_option *c, *k; /* do we need to do a full attach, * potentially with a detach first? */ conf->adj_attach = (conf->device_minor != kern->device_minor) || !disk_equal(conf, kern); /* do we need to do a full (detach/)attach, * because max_bio_bvec setting differs? */ compare_max_bio_bvecs(conf, kern); /* do we need to resize? */ compare_size(conf, kern); /* skip these two options (if present) for the opts_equal below. * These have been move_opt_to_head()ed before already. */ k = kern->disk_options; while (k && (!strcmp(k->name, "size") || !strcmp(k->name, "max-bio-bvecs"))) k = k->next; c = conf->disk_options; while (c && (!strcmp(c->name, "size") || !strcmp(c->name, "max-bio-bvecs"))) c = c->next; /* is it sufficient to only adjust the disk options? 
*/ if (!conf->adj_attach) conf->adj_disk_opts = !opts_equal(&disk_options_ctx, c, k); if (conf->adj_attach && kern->disk) conf->adj_detach = 1; } struct d_volume *new_to_be_deleted_minor_from_template(struct d_volume *kern) { /* need to delete it from kernel. * Create a minimal volume, * and flag it as "del_minor". */ struct d_volume *conf = calloc(1, sizeof(*conf)); conf->vnr = kern->vnr; /* conf->device: no need */ conf->device_minor = kern->device_minor; conf->disk = strdup(kern->disk); conf->meta_disk = strdup(kern->meta_disk); conf->meta_index = strdup(kern->meta_index); conf->adj_detach = 1; conf->adj_del_minor = 1; return conf; } #define ASSERT(x) do { if (!(x)) { \ fprintf(stderr, "%s:%u:%s: ASSERT(%s) failed.\n", \ __FILE__ , __LINE__ , __func__ , #x ); \ abort(); } \ } while (0) /* Both conf and kern are single linked lists * supposed to be ordered by ->vnr; * We may need to conjure dummy volumes to issue "del-minor" on, * and insert these into the conf list. * The resulting new conf list head is returned. */ struct d_volume *compare_volumes(struct d_volume *conf, struct d_volume *kern) { struct d_volume *to_be_deleted = NULL; struct d_volume *conf_head = conf; while (conf || kern) { if (kern && (conf == NULL || kern->vnr < conf->vnr)) { to_be_deleted = INSERT_SORTED(to_be_deleted, new_to_be_deleted_minor_from_template(kern), vnr); kern = kern->next; } else if (conf && (kern == NULL || kern->vnr > conf->vnr)) { conf->adj_add_minor = 1; conf->adj_attach = 1; conf = conf->next; } else { ASSERT(conf); ASSERT(kern); ASSERT(conf->vnr == kern->vnr); compare_volume(conf, kern); conf = conf->next; kern = kern->next; } } for_each_volume(conf, to_be_deleted) conf_head = INSERT_SORTED(conf_head, conf, vnr); return conf_head; } /* * CAUTION this modifies global static char * config_file! 
*/ int adm_adjust(struct cfg_ctx *ctx) { char* argv[20]; int pid,argc, i; struct d_resource* running; struct d_volume *vol; /* necessary per resource actions */ int do_res_options = 0; /* necessary per connection actions * (currently we still only have one connection per resource */ int do_net_options = 0; int do_disconnect = 0; int do_connect = 0; /* necessary per volume actions are flagged * in the vol->adj_* members. */ int can_do_proxy = 1; char config_file_dummy[250]; char show_conn[128]; char *resource_name; /* disable check_uniq, so it won't interfere * with parsing of drbdsetup show output */ config_valid = 2; /* setup error reporting context for the parsing routines */ line = 1; sprintf(config_file_dummy,"drbdsetup show %s", ctx->res->name); config_file = config_file_dummy; argc=0; argv[argc++]=drbdsetup; argv[argc++]="show"; ssprintf(argv[argc++], "%s", ctx->res->name); argv[argc++]=0; /* actually parse drbdsetup show output */ yyin = m_popen(&pid,argv); running = parse_resource_for_adjust(ctx); fclose(yyin); waitpid(pid, 0, 0); if (running) { /* Sets "me" and "peer" pointer */ post_parse(running, 0); set_peer_in_resource(running, 0); } /* Parse proxy settings, if this host has a proxy definition. * FIXME what about "zombie" proxy settings, if we remove proxy * settings from the config file without prior proxy-down, this won't * clean them from the proxy. 
*/ if (ctx->res->me->proxy) { line = 1; resource_name = proxy_connection_name(ctx->res); i=snprintf(show_conn, sizeof(show_conn), "show proxy-settings %s", resource_name); if (i>= sizeof(show_conn)-1) { fprintf(stderr,"connection name too long"); exit(E_thinko); } sprintf(config_file_dummy,"drbd-proxy-ctl -c '%s'", show_conn); config_file = config_file_dummy; argc=0; argv[argc++]=drbd_proxy_ctl; argv[argc++]="-c"; argv[argc++]=show_conn; argv[argc++]=0; /* actually parse "drbd-proxy-ctl show" output */ yyin = m_popen(&pid,argv); can_do_proxy = !parse_proxy_options_section(running); fclose(yyin); waitpid(pid,0,0); } ctx->res->me->volumes = compare_volumes(ctx->res->me->volumes, running ? running->me->volumes : NULL); if (running) { do_connect = !addr_equal(ctx->res,running); do_net_options = !opts_equal(&net_options_ctx, ctx->res->net_options, running->net_options); do_res_options = !opts_equal(&resource_options_cmd_ctx, ctx->res->res_options, running->res_options); } else { do_res_options = 0; do_connect = 1; schedule_deferred_cmd(adm_new_resource, ctx, "new-resource", CFG_PREREQ); } if (ctx->res->me->proxy && can_do_proxy) do_connect |= proxy_reconf(ctx, running); do_disconnect = do_connect && running && (running->peer || running->net_options); if (do_res_options) schedule_deferred_cmd(adm_set_default_res_options, ctx, "resource-options", CFG_RESOURCE); /* do we need to attach, * do we need to detach first, * or is this just some attribute change? 
*/ for_each_volume(vol, ctx->res->me->volumes) { struct cfg_ctx tmp_ctx = { .res = ctx->res, .vol = vol }; if (vol->adj_detach) schedule_deferred_cmd(adm_generic_s, &tmp_ctx, "detach", CFG_PREREQ); if (vol->adj_del_minor) schedule_deferred_cmd(adm_generic_s, &tmp_ctx, "del-minor", CFG_PREREQ); if (vol->adj_add_minor) schedule_deferred_cmd(adm_new_minor, &tmp_ctx, "new-minor", CFG_DISK_PREREQ); if (vol->adj_attach) schedule_deferred_cmd(adm_attach, &tmp_ctx, "attach", CFG_DISK); if (vol->adj_disk_opts) schedule_deferred_cmd(adm_set_default_disk_options, &tmp_ctx, "disk-options", CFG_DISK); if (vol->adj_resize) schedule_deferred_cmd(adm_resize, &tmp_ctx, "resize", CFG_DISK); } if (do_connect) { /* "disconnect" specifying the end-point addresses currently in-use, * before "connect"ing with the addresses currently in-config-file. */ if (do_disconnect) { struct cfg_ctx tmp_ctx = { .res = running, .vol = vol, }; schedule_deferred_cmd(adm_disconnect, &tmp_ctx, "disconnect", CFG_NET_PREREQ); } schedule_deferred_cmd(adm_connect, ctx, "connect", CFG_NET); do_net_options = 0; } if (do_net_options) schedule_deferred_cmd(adm_set_default_net_options, ctx, "net-options", CFG_NET); return 0; } drbd-8.4.4/user/drbdadm_main.c0000664000000000000000000031016612221331365014672 0ustar rootroot/* drbdadm_main.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2002-2008, LINBIT Information Technologies GmbH. Copyright (C) 2002-2008, Philipp Reisner . Copyright (C) 2002-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #define _GNU_SOURCE #define _XOPEN_SOURCE 600 #define _FILE_OFFSET_BITS 64 #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "linux/drbd_limits.h" #include "drbdtool_common.h" #include "drbdadm.h" #include "registry.h" #include "config_flags.h" #define MAX_ARGS 40 static int indent = 0; #define INDENT_WIDTH 4 #define BFMT "%s;\n" #define IPV4FMT "%-16s %s %s:%s;\n" #define IPV6FMT "%-16s %s [%s]:%s;\n" #define MDISK "%-16s %s;\n" #define MDISKI "%-16s %s [%s];\n" #define printI(fmt, args... ) printf("%*s" fmt,INDENT_WIDTH * indent,"" , ## args ) #define printA(name, val ) \ printf("%*s%*s %3s;\n", \ INDENT_WIDTH * indent,"" , \ -24+INDENT_WIDTH * indent, \ name, val ) char *progname; struct adm_cmd { const char *name; int (*function) (struct cfg_ctx *); /* which level this command is for. * 0: don't show this command, ever * 1: normal administrative commands, shown in normal help * 2-4: shown on "drbdadm hidden-commands" * 2: useful for shell scripts * 3: callbacks potentially called from kernel module on certain events * 4: advanced, experts and developers only */ unsigned int show_in_usage:3; /* if set, command requires an explicit resource name */ unsigned int res_name_required:1; /* if set, command requires an explicit volume number as well */ unsigned int vol_id_required:1; /* most commands need to iterate over all volumes in the resource */ unsigned int iterate_volumes:1; /* error out if the ip specified is not available/active now */ unsigned int verify_ips:1; /* if set, use the "cache" in /var/lib/drbd to figure out * which config file to use. 
* This is necessary for handlers (callbacks from kernel) to work * when using "drbdadm -c /some/other/config/file" */ unsigned int use_cached_config_file:1; unsigned int need_peer:1; unsigned int is_proxy_cmd:1; unsigned int uc_dialog:1; /* May show usage count dialog */ unsigned int test_config:1; /* Allow -t option */ const struct context_def *drbdsetup_ctx; }; struct deferred_cmd { int (*function) (struct cfg_ctx *); struct cfg_ctx ctx; struct deferred_cmd *next; }; struct option general_admopt[] = { {"stacked", no_argument, 0, 'S'}, {"dry-run", no_argument, 0, 'd'}, {"verbose", no_argument, 0, 'v'}, {"config-file", required_argument, 0, 'c'}, {"config-to-test", required_argument, 0, 't'}, {"drbdsetup", required_argument, 0, 's'}, {"drbdmeta", required_argument, 0, 'm'}, {"drbd-proxy-ctl", required_argument, 0, 'p'}, {"sh-varname", required_argument, 0, 'n'}, {"peer", required_argument, 0, 'P'}, {"version", no_argument, 0, 'V'}, {"setup-option", required_argument, 0, 'W'}, {"help", no_argument, 0, 'h'}, {0, 0, 0, 0} }; struct option *admopt = general_admopt; extern int my_parse(); extern int yydebug; extern FILE *yyin; static int adm_generic_l(struct cfg_ctx *); static int adm_up(struct cfg_ctx *); static int adm_dump(struct cfg_ctx *); static int adm_dump_xml(struct cfg_ctx *); static int adm_wait_c(struct cfg_ctx *); static int adm_wait_ci(struct cfg_ctx *); static int adm_proxy_up(struct cfg_ctx *); static int adm_proxy_down(struct cfg_ctx *); static int sh_nop(struct cfg_ctx *); static int sh_resources(struct cfg_ctx *); static int sh_resource(struct cfg_ctx *); static int sh_mod_parms(struct cfg_ctx *); static int sh_dev(struct cfg_ctx *); static int sh_udev(struct cfg_ctx *); static int sh_minor(struct cfg_ctx *); static int sh_ip(struct cfg_ctx *); static int sh_lres(struct cfg_ctx *); static int sh_ll_dev(struct cfg_ctx *); static int sh_md_dev(struct cfg_ctx *); static int sh_md_idx(struct cfg_ctx *); static int sh_b_pri(struct cfg_ctx *); static int 
sh_status(struct cfg_ctx *); static int admm_generic(struct cfg_ctx *); static int adm_khelper(struct cfg_ctx *); static int adm_generic_b(struct cfg_ctx *); static int hidden_cmds(struct cfg_ctx *); static int adm_outdate(struct cfg_ctx *); static int adm_chk_resize(struct cfg_ctx *); static void dump_options(char *name, struct d_option *opts); struct d_volume *volume_by_vnr(struct d_volume *volumes, int vnr); struct d_resource *res_by_name(const char *name); int ctx_by_name(struct cfg_ctx *ctx, const char *id); int ctx_set_implicit_volume(struct cfg_ctx *ctx); static char *get_opt_val(struct d_option *, const char *, char *); static struct ifreq *get_ifreq(); char ss_buffer[1024]; struct utsname nodeinfo; int line = 1; int fline; struct d_globals global_options = { 0, 0, 0, 1, UC_ASK }; char *config_file = NULL; char *config_save = NULL; char *config_test = NULL; struct d_resource *config = NULL; struct d_resource *common = NULL; struct ifreq *ifreq_list = NULL; int is_drbd_top; enum { NORMAL, STACKED, IGNORED, __N_RESOURCE_TYPES }; int nr_resources[__N_RESOURCE_TYPES]; int nr_volumes[__N_RESOURCE_TYPES]; int highest_minor; int number_of_minors = 0; int config_from_stdin = 0; int config_valid = 1; int no_tty; int dry_run = 0; int verbose = 0; int adjust_with_progress = 0; bool help; int do_verify_ips = 0; int do_register = 1; /* whether drbdadm was called with "all" instead of resource name(s) */ int all_resources = 0; char *drbdsetup = NULL; char *drbdmeta = NULL; char *drbdadm_83 = NULL; char *drbd_proxy_ctl; char *sh_varname = NULL; struct setup_option *setup_options; char *connect_to_host = NULL; volatile int alarm_raised; struct deferred_cmd *deferred_cmds[__CFG_LAST] = { NULL, }; struct deferred_cmd *deferred_cmds_tail[__CFG_LAST] = { NULL, }; void add_setup_option(bool explicit, char *option) { int n = 0; if (setup_options) { while (setup_options[n].option) n++; } setup_options = realloc(setup_options, (n + 2) * sizeof(*setup_options)); if (!setup_options) 
{ /* ... */ } setup_options[n].explicit = explicit; setup_options[n].option = option; n++; setup_options[n].option = NULL; } int adm_adjust_wp(struct cfg_ctx *ctx) { if (!verbose && !dry_run) adjust_with_progress = 1; return adm_adjust(ctx); } /* DRBD adm_cmd flags shortcuts, * to avoid merge conflicts and unreadable diffs * when we add the next flag */ #define DRBD_acf1_default \ .show_in_usage = 1, \ .res_name_required = 1, \ .iterate_volumes = 1, \ .verify_ips = 0, \ .uc_dialog = 1, \ #define DRBD_acf1_resname \ .show_in_usage = 1, \ .res_name_required = 1, \ .uc_dialog = 1, \ #define DRBD_acf1_connect \ .show_in_usage = 1, \ .res_name_required = 1, \ .iterate_volumes = 0, \ .verify_ips = 1, \ .need_peer = 1, \ .uc_dialog = 1, \ #define DRBD_acf1_up \ .show_in_usage = 1, \ .res_name_required = 1, \ .iterate_volumes = 1, \ .verify_ips = 1, \ .need_peer = 1, \ .uc_dialog = 1, \ #define DRBD_acf1_defnet \ .show_in_usage = 1, \ .res_name_required = 1, \ .iterate_volumes = 1, \ .verify_ips = 1, \ .uc_dialog = 1, \ #define DRBD_acf3_handler \ .show_in_usage = 3, \ .res_name_required = 1, \ .iterate_volumes = 0, \ .vol_id_required = 1, \ .verify_ips = 0, \ .use_cached_config_file = 1, \ #define DRBD_acf3_res_handler \ .show_in_usage = 3, \ .res_name_required = 1, \ .iterate_volumes = 0, \ .vol_id_required = 0, \ .verify_ips = 0, \ .use_cached_config_file = 1, \ #define DRBD_acf4_advanced \ .show_in_usage = 4, \ .res_name_required = 1, \ .iterate_volumes = 1, \ .verify_ips = 0, \ .uc_dialog = 1, \ #define DRBD_acf4_advanced_need_vol \ .show_in_usage = 4, \ .res_name_required = 1, \ .iterate_volumes = 0, \ .vol_id_required = 1, \ .verify_ips = 0, \ .uc_dialog = 1, \ #define DRBD_acf1_dump \ .show_in_usage = 1, \ .res_name_required = 1, \ .verify_ips = 1, \ .uc_dialog = 1, \ .test_config = 1, \ #define DRBD_acf2_shell \ .show_in_usage = 2, \ .iterate_volumes = 1, \ .res_name_required = 1, \ .verify_ips = 0, \ #define DRBD_acf2_sh_resname \ .show_in_usage = 2, \ 
.iterate_volumes = 0, \ .res_name_required = 1, \ .verify_ips = 0, \ #define DRBD_acf2_proxy \ .show_in_usage = 2, \ .res_name_required = 1, \ .verify_ips = 0, \ .need_peer = 1, \ .is_proxy_cmd = 1, \ #define DRBD_acf2_hook \ .show_in_usage = 2, \ .res_name_required = 1, \ .verify_ips = 0, \ .use_cached_config_file = 1, \ #define DRBD_acf2_gen_shell \ .show_in_usage = 2, \ .res_name_required = 0, \ .verify_ips = 0, \ struct adm_cmd cmds[] = { /* name, function, flags * sort order: * - normal config commands, * - normal meta data manipulation * - sh-* * - handler * - advanced ***/ {"attach", adm_attach, DRBD_acf1_default .drbdsetup_ctx = &attach_cmd_ctx, }, {"disk-options", adm_disk_options, DRBD_acf1_default .drbdsetup_ctx = &disk_options_ctx, }, {"detach", adm_generic_l, DRBD_acf1_default .drbdsetup_ctx = &detach_cmd_ctx, }, {"connect", adm_connect, DRBD_acf1_connect .drbdsetup_ctx = &connect_cmd_ctx, }, {"net-options", adm_net_options, DRBD_acf1_connect .drbdsetup_ctx = &net_options_ctx, }, {"disconnect", adm_disconnect, DRBD_acf1_resname .drbdsetup_ctx = &disconnect_cmd_ctx, }, {"up", adm_up, DRBD_acf1_up}, {"resource-options", adm_res_options, DRBD_acf1_resname .drbdsetup_ctx = &resource_options_cmd_ctx, }, {"down", adm_generic_l, DRBD_acf1_resname}, {"primary", adm_generic_l, DRBD_acf1_default .drbdsetup_ctx = &primary_cmd_ctx, }, {"secondary", adm_generic_l, DRBD_acf1_default}, {"invalidate", adm_generic_b, DRBD_acf1_default}, {"invalidate-remote", adm_generic_l, DRBD_acf1_defnet}, {"outdate", adm_outdate, DRBD_acf1_default}, {"resize", adm_resize, DRBD_acf1_defnet}, {"verify", adm_generic_s, DRBD_acf1_defnet}, {"pause-sync", adm_generic_s, DRBD_acf1_defnet}, {"resume-sync", adm_generic_s, DRBD_acf1_defnet}, {"adjust", adm_adjust, DRBD_acf1_connect}, {"adjust-with-progress", adm_adjust_wp, DRBD_acf1_connect}, {"wait-connect", adm_wait_c, DRBD_acf1_defnet}, {"wait-con-int", adm_wait_ci, .show_in_usage = 1,.verify_ips = 1,}, {"role", adm_generic_s, 
DRBD_acf1_default}, {"cstate", adm_generic_s, DRBD_acf1_default}, {"dstate", adm_generic_b, DRBD_acf1_default}, {"dump", adm_dump, DRBD_acf1_dump}, {"dump-xml", adm_dump_xml, DRBD_acf1_dump}, {"create-md", adm_create_md, DRBD_acf1_default}, {"show-gi", adm_generic_b, DRBD_acf1_default}, {"get-gi", adm_generic_b, DRBD_acf1_default}, {"dump-md", admm_generic, DRBD_acf1_default}, {"wipe-md", admm_generic, DRBD_acf1_default}, {"apply-al", admm_generic, DRBD_acf1_default}, {"hidden-commands", hidden_cmds,.show_in_usage = 1,}, {"sh-nop", sh_nop, DRBD_acf2_gen_shell .uc_dialog = 1, .test_config = 1}, {"sh-resources", sh_resources, DRBD_acf2_gen_shell}, {"sh-resource", sh_resource, DRBD_acf2_sh_resname}, {"sh-mod-parms", sh_mod_parms, DRBD_acf2_gen_shell}, {"sh-dev", sh_dev, DRBD_acf2_shell}, {"sh-udev", sh_udev, .vol_id_required = 1, DRBD_acf2_hook}, {"sh-minor", sh_minor, DRBD_acf2_shell}, {"sh-ll-dev", sh_ll_dev, DRBD_acf2_shell}, {"sh-md-dev", sh_md_dev, DRBD_acf2_shell}, {"sh-md-idx", sh_md_idx, DRBD_acf2_shell}, {"sh-ip", sh_ip, DRBD_acf2_shell}, {"sh-lr-of", sh_lres, DRBD_acf2_shell}, {"sh-b-pri", sh_b_pri, DRBD_acf2_shell}, {"sh-status", sh_status, DRBD_acf2_gen_shell}, {"proxy-up", adm_proxy_up, DRBD_acf2_proxy}, {"proxy-down", adm_proxy_down, DRBD_acf2_proxy}, {"new-resource", adm_new_resource, DRBD_acf2_sh_resname}, {"sh-new-minor", adm_new_minor, DRBD_acf4_advanced}, {"before-resync-target", adm_khelper, DRBD_acf3_handler}, {"after-resync-target", adm_khelper, DRBD_acf3_handler}, {"before-resync-source", adm_khelper, DRBD_acf3_handler}, {"pri-on-incon-degr", adm_khelper, DRBD_acf3_handler}, {"pri-lost-after-sb", adm_khelper, DRBD_acf3_handler}, {"fence-peer", adm_khelper, DRBD_acf3_res_handler}, {"local-io-error", adm_khelper, DRBD_acf3_handler}, {"pri-lost", adm_khelper, DRBD_acf3_handler}, {"initial-split-brain", adm_khelper, DRBD_acf3_handler}, {"split-brain", adm_khelper, DRBD_acf3_handler}, {"out-of-sync", adm_khelper, DRBD_acf3_handler}, {"suspend-io", 
adm_generic_s, DRBD_acf4_advanced}, {"resume-io", adm_generic_s, DRBD_acf4_advanced}, {"set-gi", admm_generic, DRBD_acf4_advanced_need_vol}, {"new-current-uuid", adm_generic_s, DRBD_acf4_advanced_need_vol .drbdsetup_ctx = &new_current_uuid_cmd_ctx, }, {"check-resize", adm_chk_resize, DRBD_acf4_advanced}, }; void schedule_deferred_cmd(int (*function) (struct cfg_ctx *), struct cfg_ctx *ctx, const char *arg, enum drbd_cfg_stage stage) { struct deferred_cmd *d, *t; d = calloc(1, sizeof(struct deferred_cmd)); if (d == NULL) { perror("calloc"); exit(E_exec_error); } d->function = function; d->ctx.res = ctx->res; d->ctx.vol = ctx->vol; d->ctx.arg = arg; /* first to come is head */ if (!deferred_cmds[stage]) deferred_cmds[stage] = d; /* link it in at tail */ t = deferred_cmds_tail[stage]; if (t) t->next = d; /* advance tail */ deferred_cmds_tail[stage] = d; } enum on_error { KEEP_RUNNING, EXIT_ON_FAIL }; int call_cmd_fn(int (*function) (struct cfg_ctx *), struct cfg_ctx *ctx, enum on_error on_error) { int rv; rv = function(ctx); if (rv >= 20) { if (on_error == EXIT_ON_FAIL) exit(rv); } return rv; } /* If ctx->vol is NULL, and cmd->iterate_volumes is set, * iterate over all volumes in ctx->res. * Else, just pass it on. * */ int call_cmd(struct adm_cmd *cmd, struct cfg_ctx *ctx, enum on_error on_error) { struct cfg_ctx tmp_ctx = *ctx; struct d_resource *res = ctx->res; struct d_volume *vol; int ret; if (!res->peer) set_peer_in_resource(res, cmd->need_peer); if (!cmd->iterate_volumes || ctx->vol != NULL) return call_cmd_fn(cmd->function, &tmp_ctx, on_error); for_each_volume(vol, res->me->volumes) { tmp_ctx.vol = vol; ret = call_cmd_fn(cmd->function, &tmp_ctx, on_error); /* FIXME: Do we want to keep running? * When? * How would we determine which return value to return? 
*/ if (ret) return ret; } return 0; } static char *drbd_cfg_stage_string[] = { [CFG_PREREQ] = "create res", [CFG_RESOURCE] = "adjust res", [CFG_DISK_PREREQ] = "prepare disk", [CFG_DISK] = "adjust disk", [CFG_NET_PREREQ] = "prepare net", [CFG_NET] = "adjust net", }; int _run_deferred_cmds(enum drbd_cfg_stage stage) { struct d_resource *last_res = NULL; struct deferred_cmd *d = deferred_cmds[stage]; struct deferred_cmd *t; int r; int rv = 0; if (d && adjust_with_progress) { printf("\n%15s:", drbd_cfg_stage_string[stage]); fflush(stdout); } while (d) { if (d->ctx.res->skip_further_deferred_command) { if (adjust_with_progress) { if (d->ctx.res != last_res) printf(" [skipped:%s]", d->ctx.res->name); } else fprintf(stderr, "%s: %s %s: skipped due to earlier error\n", progname, d->ctx.arg, d->ctx.res->name); r = 0; } else { if (adjust_with_progress) { if (d->ctx.res != last_res) printf(" %s", d->ctx.res->name); } r = call_cmd_fn(d->function, &d->ctx, KEEP_RUNNING); if (r) { /* If something in the "prerequisite" stages failed, * there is no point in trying to continue. * However if we just failed to adjust some * options, or failed to attach, we still want * to adjust other options, or try to connect. */ if (stage == CFG_PREREQ || stage == CFG_DISK_PREREQ) d->ctx.res->skip_further_deferred_command = 1; if (adjust_with_progress) printf(":failed(%s:%u)", d->ctx.arg, r); } } last_res = d->ctx.res; t = d->next; free(d); d = t; if (r > rv) rv = r; } return rv; } int run_deferred_cmds(void) { enum drbd_cfg_stage stage; int r; int ret = 0; if (adjust_with_progress) printf("["); for (stage = CFG_PREREQ; stage < __CFG_LAST; stage++) { r = _run_deferred_cmds(stage); if (r) { if (!adjust_with_progress) return 1; /* FIXME r? 
*/ ret = 1; } } if (adjust_with_progress) printf("\n]\n"); return ret; } /*** These functions are used to print the config ***/ static char *esc(char *str) { static char buffer[1024]; char *ue = str, *e = buffer; if (!str || !str[0]) { return "\"\""; } if (strchr(str, ' ') || strchr(str, '\t') || strchr(str, '\\')) { *e++ = '"'; while (*ue) { if (*ue == '"' || *ue == '\\') { *e++ = '\\'; } if (e - buffer >= 1022) { fprintf(stderr, "string too long.\n"); exit(E_syntax); } *e++ = *ue++; if (e - buffer >= 1022) { fprintf(stderr, "string too long.\n"); exit(E_syntax); } } *e++ = '"'; *e++ = '\0'; return buffer; } return str; } static char *esc_xml(char *str) { static char buffer[1024]; char *ue = str, *e = buffer; if (!str || !str[0]) { return ""; } if (strchr(str, '"') || strchr(str, '\'') || strchr(str, '<') || strchr(str, '>') || strchr(str, '&') || strchr(str, '\\')) { while (*ue) { if (*ue == '"' || *ue == '\\') { *e++ = '\\'; if (e - buffer >= 1021) { fprintf(stderr, "string too long.\n"); exit(E_syntax); } *e++ = *ue++; } else if (*ue == '\'' || *ue == '<' || *ue == '>' || *ue == '&') { if (*ue == '\'' && e - buffer < 1017) { strcpy(e, "&apos;"); e += 6; } else if (*ue == '<' && e - buffer < 1019) { strcpy(e, "&lt;"); e += 4; } else if (*ue == '>' && e - buffer < 1019) { strcpy(e, "&gt;"); e += 4; } else if (*ue == '&' && e - buffer < 1018) { strcpy(e, "&amp;"); e += 5; } else { fprintf(stderr, "string too long.\n"); exit(E_syntax); } ue++; } else { *e++ = *ue++; if (e - buffer >= 1022) { fprintf(stderr, "string too long.\n"); exit(E_syntax); } } } *e++ = '\0'; return buffer; } return str; } static void dump_options2(char *name, struct d_option *opts, void(*within)(void*), void *ctx) { if (!opts && !(within && ctx)) return; printI("%s {\n", name); ++indent; while (opts) { if (opts->value) printA(opts->name, opts->is_escaped ? 
opts->value : esc(opts-> value)); else printI(BFMT, opts->name); opts = opts->next; } if (within) within(ctx); --indent; printI("}\n"); } static void dump_options(char *name, struct d_option *opts) { dump_options2(name, opts, NULL, NULL); } void dump_proxy_plugins(void *ctx) { struct d_option *opt = ctx; dump_options("plugin", opt); } static void dump_global_info() { if (!global_options.minor_count && !global_options.disable_ip_verification && global_options.dialog_refresh == 1) return; printI("global {\n"); ++indent; if (global_options.disable_ip_verification) printI("disable-ip-verification;\n"); if (global_options.minor_count) printI("minor-count %i;\n", global_options.minor_count); if (global_options.dialog_refresh != 1) printI("dialog-refresh %i;\n", global_options.dialog_refresh); --indent; printI("}\n\n"); } static void fake_startup_options(struct d_resource *res); static void dump_common_info() { if (!common) return; printI("common {\n"); ++indent; fake_startup_options(common); dump_options("options", common->res_options); dump_options("net", common->net_options); dump_options("disk", common->disk_options); dump_options("startup", common->startup_options); dump_options2("proxy", common->proxy_options, dump_proxy_plugins, common->proxy_plugins); dump_options("handlers", common->handlers); --indent; printf("}\n\n"); } static void dump_address(char *name, char *addr, char *port, char *af) { if (!strcmp(af, "ipv6")) printI(IPV6FMT, name, af, addr, port); else printI(IPV4FMT, name, af, addr, port); } static void dump_proxy_info(struct d_proxy_info *pi) { printI("proxy on %s {\n", names_to_str(pi->on_hosts)); ++indent; dump_address("inside", pi->inside_addr, pi->inside_port, pi->inside_af); dump_address("outside", pi->outside_addr, pi->outside_port, pi->outside_af); dump_options2("options", pi->options, dump_proxy_plugins, pi->plugins); --indent; printI("}\n"); } static void dump_volume(int has_lower, struct d_volume *vol) { if (!vol->implicit) { printI("volume 
%d {\n", vol->vnr); ++indent; } dump_options("disk", vol->disk_options); printI("device%*s", -19 + INDENT_WIDTH * indent, ""); if (vol->device) printf("%s ", esc(vol->device)); printf("minor %d;\n", vol->device_minor); if (!has_lower) printA("disk", esc(vol->disk)); if (!has_lower) { if (!strcmp(vol->meta_index, "flexible")) printI(MDISK, "meta-disk", esc(vol->meta_disk)); else if (!strcmp(vol->meta_index, "internal")) printA("meta-disk", "internal"); else printI(MDISKI, "meta-disk", esc(vol->meta_disk), vol->meta_index); } if (!vol->implicit) { --indent; printI("}\n"); } } static void dump_host_info(struct d_host_info *hi) { struct d_volume *vol; if (!hi) { printI(" # No host section data available.\n"); return; } if (hi->lower) { printI("stacked-on-top-of %s {\n", esc(hi->lower->name)); ++indent; printI("# on %s \n", names_to_str(hi->on_hosts)); } else if (hi->by_address) { if (!strcmp(hi->address_family, "ipv6")) printI("floating ipv6 [%s]:%s {\n", hi->address, hi->port); else printI("floating %s %s:%s {\n", hi->address_family, hi->address, hi->port); ++indent; } else { printI("on %s {\n", names_to_str(hi->on_hosts)); ++indent; } dump_options("options", hi->res_options); for_each_volume(vol, hi->volumes) dump_volume(!!hi->lower, vol); if (!hi->by_address) dump_address("address", hi->address, hi->port, hi->address_family); if (hi->proxy) dump_proxy_info(hi->proxy); --indent; printI("}\n"); } static void dump_options_xml2(char *name, struct d_option *opts, void(*within)(void*), void *ctx) { if (!opts && !(within && ctx)) return; printI("
\n", name); ++indent; while (opts) { if (opts->value) printI("
\n"); } static void dump_options_xml(char *name, struct d_option *opts) { dump_options_xml2(name, opts, NULL, NULL); } void dump_proxy_plugins_xml(void *ctx) { struct d_option *opt = ctx; dump_options_xml("plugin", opt); } static void dump_global_info_xml() { if (!global_options.minor_count && !global_options.disable_ip_verification && global_options.dialog_refresh == 1) return; printI("\n"); ++indent; if (global_options.disable_ip_verification) printI("\n"); if (global_options.minor_count) printI("\n", global_options.minor_count); if (global_options.dialog_refresh != 1) printI("\n", global_options.dialog_refresh); --indent; printI("\n"); } static void dump_common_info_xml() { if (!common) return; printI("\n"); ++indent; fake_startup_options(common); dump_options_xml("options", common->res_options); dump_options_xml("net", common->net_options); dump_options_xml("disk", common->disk_options); dump_options_xml("startup", common->startup_options); dump_options_xml2("proxy", common->proxy_options, dump_proxy_plugins_xml, common->proxy_plugins); dump_options_xml("handlers", common->handlers); --indent; printI("\n"); } static void dump_proxy_info_xml(struct d_proxy_info *pi) { printI("\n", names_to_str(pi->on_hosts)); ++indent; printI("%s\n", pi->inside_af, pi->inside_port, pi->inside_addr); printI("%s\n", pi->outside_af, pi->outside_port, pi->outside_addr); dump_options_xml2("options", pi->options, dump_proxy_plugins_xml, pi->plugins); --indent; printI("\n"); } static void dump_volume_xml(struct d_volume *vol) { printI("\n", vol->vnr); ++indent; dump_options_xml("disk", vol->disk_options); printI("%s\n", vol->device_minor, esc_xml(vol->device)); printI("%s\n", esc_xml(vol->disk)); if (!strcmp(vol->meta_index, "flexible")) printI("%s\n", esc_xml(vol->meta_disk)); else if (!strcmp(vol->meta_index, "internal")) printI("internal\n"); else { printI("%s\n", vol->meta_index, esc_xml(vol->meta_disk)); } --indent; printI("\n"); } static void dump_host_info_xml(struct d_host_info 
*hi) { struct d_volume *vol; if (!hi) { printI("\n"); return; } if (hi->by_address) printI("\n"); else printI("\n", names_to_str(hi->on_hosts)); ++indent; dump_options_xml("options", hi->res_options); for_each_volume(vol, hi->volumes) dump_volume_xml(vol); printI("
%s
\n", hi->address_family, hi->port, hi->address); if (hi->proxy) dump_proxy_info_xml(hi->proxy); --indent; printI("
\n"); } static void fake_startup_options(struct d_resource *res) { struct d_option *opt; char *val; if (res->stacked_timeouts) { opt = new_opt(strdup("stacked-timeouts"), NULL); res->startup_options = APPEND(res->startup_options, opt); } if (res->become_primary_on) { val = strdup(names_to_str(res->become_primary_on)); opt = new_opt(strdup("become-primary-on"), val); opt->is_escaped = 1; res->startup_options = APPEND(res->startup_options, opt); } } static int adm_dump(struct cfg_ctx *ctx) { struct d_host_info *host; struct d_resource *res = ctx->res; printI("# resource %s on %s: %s, %s\n", esc(res->name), nodeinfo.nodename, res->ignore ? "ignored" : "not ignored", res->stacked ? "stacked" : "not stacked"); printI("# defined at %s:%u\n", res->config_file, res->start_line); printI("resource %s {\n", esc(res->name)); ++indent; for (host = res->all_hosts; host; host = host->next) dump_host_info(host); fake_startup_options(res); dump_options("options", res->res_options); dump_options("net", res->net_options); dump_options("disk", res->disk_options); dump_options("startup", res->startup_options); dump_options2("proxy", res->proxy_options, dump_proxy_plugins, res->proxy_plugins); dump_options("handlers", res->handlers); --indent; printf("}\n\n"); return 0; } static int adm_dump_xml(struct cfg_ctx *ctx) { struct d_host_info *host; struct d_resource *res = ctx->res; printI("\n", esc_xml(res->name), esc_xml(res->config_file), res->start_line); ++indent; // else if (common && common->protocol) printA("# common protocol", common->protocol); for (host = res->all_hosts; host; host = host->next) dump_host_info_xml(host); fake_startup_options(res); dump_options_xml("options", res->res_options); dump_options_xml("net", res->net_options); dump_options_xml("disk", res->disk_options); dump_options_xml("startup", res->startup_options); dump_options_xml2("proxy", res->proxy_options, dump_proxy_plugins_xml, res->proxy_plugins); dump_options_xml("handlers", res->handlers); --indent; 
printI("\n"); return 0; } static int sh_nop(struct cfg_ctx *ctx) { return 0; } static int sh_resources(struct cfg_ctx *ctx) { struct d_resource *res, *t; int first = 1; for_each_resource(res, t, config) { if (res->ignore) continue; if (is_drbd_top != res->stacked) continue; printf(first ? "%s" : " %s", esc(res->name)); first = 0; } if (!first) printf("\n"); return 0; } static int sh_resource(struct cfg_ctx *ctx) { printf("%s\n", ctx->res->name); return 0; } static int sh_dev(struct cfg_ctx *ctx) { printf("%s\n", ctx->vol->device); return 0; } static int sh_udev(struct cfg_ctx *ctx) { struct d_resource *res = ctx->res; struct d_volume *vol = ctx->vol; /* No shell escape necessary. Udev does not handle it anyways... */ if (!vol) { fprintf(stderr, "volume not specified\n"); return 1; } if (vol->implicit) printf("RESOURCE=%s\n", res->name); else printf("RESOURCE=%s/%u\n", res->name, vol->vnr); if (!strncmp(vol->device, "/dev/drbd", 9)) printf("DEVICE=%s\n", vol->device + 5); else printf("DEVICE=drbd%u\n", vol->device_minor); if (!strncmp(vol->disk, "/dev/", 5)) printf("DISK=%s\n", vol->disk + 5); else printf("DISK=%s\n", vol->disk); return 0; } static int sh_minor(struct cfg_ctx *ctx) { printf("%d\n", ctx->vol->device_minor); return 0; } static int sh_ip(struct cfg_ctx *ctx) { printf("%s\n", ctx->res->me->address); return 0; } static int sh_lres(struct cfg_ctx *ctx) { struct d_resource *res = ctx->res; if (!is_drbd_top) { fprintf(stderr, "sh-lower-resource only available in stacked mode\n"); exit(E_usage); } if (!res->stacked) { fprintf(stderr, "'%s' is not stacked on this host (%s)\n", res->name, nodeinfo.nodename); exit(E_usage); } printf("%s\n", res->me->lower->name); return 0; } static int sh_ll_dev(struct cfg_ctx *ctx) { printf("%s\n", ctx->vol->disk); return 0; } static int sh_md_dev(struct cfg_ctx *ctx) { struct d_volume *vol = ctx->vol; char *r; if (strcmp("internal", vol->meta_disk) == 0) r = vol->disk; else r = vol->meta_disk; printf("%s\n", r); return 0; } 
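/*
 * Illustrative sketch only, not part of drbdadm: sh_md_dev() above prints the
 * backing disk when the meta-disk is "internal", and the dedicated meta-disk
 * device otherwise. The hypothetical helper below isolates that choice so it
 * can be exercised without a struct d_volume. Its name and its plain string
 * parameters are assumptions made for this example.
 */
#include <string.h>

static __attribute__((unused)) const char *
example_md_dev_for(const char *disk, const char *meta_disk)
{
	/* "internal" means the meta data lives on the backing device itself */
	if (strcmp("internal", meta_disk) == 0)
		return disk;
	/* otherwise the meta data is kept on a separate device */
	return meta_disk;
}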
static int sh_md_idx(struct cfg_ctx *ctx) { printf("%s\n", ctx->vol->meta_index); return 0; } static int sh_b_pri(struct cfg_ctx *ctx) { struct d_resource *res = ctx->res; int i, rv; if (name_in_names(nodeinfo.nodename, res->become_primary_on) || name_in_names("both", res->become_primary_on)) { /* Upon connect, resync starts and both sides become primary at the same time. One side's attempt might be declined because another state transition is in progress. Retry. */ for (i = 0; i < 5; i++) { const char *old_arg = ctx->arg; ctx->arg = "primary"; rv = adm_generic_s(ctx); ctx->arg = old_arg; if (rv == 0) return rv; sleep(1); } return rv; } return 0; } /* FIXME this module parameter will go */ static int sh_mod_parms(struct cfg_ctx *ctx) { int mc = global_options.minor_count; if (mc == 0) { mc = number_of_minors + 3; if (mc > DRBD_MINOR_COUNT_MAX) mc = DRBD_MINOR_COUNT_MAX; if (mc < DRBD_MINOR_COUNT_DEF) mc = DRBD_MINOR_COUNT_DEF; } printf("minor_count=%d\n", mc); return 0; } static void free_volume(struct d_volume *vol) { if (!vol) return; free(vol->device); free(vol->disk); free(vol->meta_disk); free(vol->meta_index); free(vol); } static void free_host_info(struct d_host_info *hi) { struct d_volume *vol, *next; if (!hi) return; free_names(hi->on_hosts); /* save the successor before freeing, so we never read vol->next from freed memory */ for (vol = hi->volumes; vol; vol = next) { next = vol->next; free_volume(vol); } free(hi->address); free(hi->address_family); free(hi->port); } static void free_options(struct d_option *opts) { struct d_option *f; while (opts) { free(opts->name); free(opts->value); f = opts; opts = opts->next; free(f); } } static void free_config(struct d_resource *res) { struct d_resource *f, *t; struct d_host_info *host; for_each_resource(f, t, res) { free(f->name); free_volume(f->volumes); for (host = f->all_hosts; host; host = host->next) free_host_info(host); free_options(f->net_options); free_options(f->disk_options); free_options(f->startup_options); free_options(f->proxy_options); free_options(f->handlers); free(f); } if (common) { free_options(common->net_options); 
free_options(common->disk_options); free_options(common->startup_options); free_options(common->proxy_options); free_options(common->handlers); free(common); } if (ifreq_list) free(ifreq_list); } static void expand_opts(struct d_option *co, struct d_option **opts) { struct d_option *no; while (co) { if (!find_opt(*opts, co->name)) { // prepend new item to opts no = new_opt(strdup(co->name), co->value ? strdup(co->value) : NULL); no->next = *opts; *opts = no; } co = co->next; } } static void expand_common(void) { struct d_resource *res, *tmp; struct d_volume *vol, *host_vol; struct d_host_info *h; /* make sure vol->device is non-NULL */ for_each_resource(res, tmp, config) { for (h = res->all_hosts; h; h = h->next) { for_each_volume(vol, h->volumes) { if (!vol->device) m_asprintf(&vol->device, "/dev/drbd%u", vol->device_minor); } } } for_each_resource(res, tmp, config) { if (!common) break; expand_opts(common->net_options, &res->net_options); expand_opts(common->disk_options, &res->disk_options); expand_opts(common->startup_options, &res->startup_options); expand_opts(common->proxy_options, &res->proxy_options); expand_opts(common->handlers, &res->handlers); expand_opts(common->res_options, &res->res_options); if (common->stacked_timeouts) res->stacked_timeouts = 1; if (!res->become_primary_on) res->become_primary_on = common->become_primary_on; if (common->proxy_plugins && !res->proxy_plugins) expand_opts(common->proxy_plugins, &res->proxy_plugins); } /* now that common disk options (if any) have been propagated to the * resource level, further propagate them to the volume level. 
*/ for_each_resource(res, tmp, config) { for (h = res->all_hosts; h; h = h->next) { for_each_volume(vol, h->volumes) { expand_opts(res->disk_options, &vol->disk_options); } if (h->proxy) { expand_opts(res->proxy_options, &h->proxy->options); expand_opts(res->proxy_plugins, &h->proxy->plugins); } } } /* now from all volume/disk-options on resource level to host level */ for_each_resource(res, tmp, config) { for_each_volume(vol, res->volumes) { for (h = res->all_hosts; h; h = h->next) { host_vol = volume_by_vnr(h->volumes, vol->vnr); expand_opts(vol->disk_options, &host_vol->disk_options); } } } } static void find_drbdcmd(char **cmd, char **pathes) { char **path; path = pathes; while (*path) { if (access(*path, X_OK) == 0) { *cmd = *path; return; } path++; } fprintf(stderr, "Can not find command (drbdsetup/drbdmeta)\n"); exit(E_exec_error); } static void alarm_handler(int __attribute((unused)) signo) { alarm_raised = 1; } void m__system(char **argv, int flags, const char *res_name, pid_t *kid, int *fd, int *ex) { pid_t pid; int status, rv = -1; int timeout = 0; char **cmdline = argv; int pipe_fds[2]; struct sigaction so; struct sigaction sa; sa.sa_handler = &alarm_handler; sigemptyset(&sa.sa_mask); sa.sa_flags = 0; if (dry_run || verbose) { if (sh_varname && *cmdline) printf("%s=%s\n", sh_varname, res_name ? shell_escape(res_name) : ""); while (*cmdline) { printf("%s ", shell_escape(*cmdline++)); } printf("\n"); if (dry_run) { if (kid) *kid = -1; if (fd) *fd = 0; if (ex) *ex = 0; return; } } /* flush stdout and stderr, so output of drbdadm * and helper binaries is reported in order! 
*/ fflush(stdout); fflush(stderr); if (adjust_with_progress && !(flags & RETURN_STDERR_FD)) flags |= SUPRESS_STDERR; if (flags & (RETURN_STDOUT_FD | RETURN_STDERR_FD)) { if (pipe(pipe_fds) < 0) { perror("pipe"); fprintf(stderr, "Error in pipe, giving up.\n"); exit(E_exec_error); } } pid = fork(); if (pid == -1) { fprintf(stderr, "Can not fork\n"); exit(E_exec_error); } if (pid == 0) { if (flags & RETURN_STDOUT_FD) { close(pipe_fds[0]); dup2(pipe_fds[1], 1); } if (flags & RETURN_STDERR_FD) { close(pipe_fds[0]); dup2(pipe_fds[1], 2); } if (flags & SUPRESS_STDERR) fclose(stderr); execvp(argv[0], argv); fprintf(stderr, "Can not exec\n"); exit(E_exec_error); } if (flags & (RETURN_STDOUT_FD | RETURN_STDERR_FD)) close(pipe_fds[1]); if (flags & SLEEPS_FINITE) { sigaction(SIGALRM, &sa, &so); alarm_raised = 0; switch (flags & SLEEPS_MASK) { case SLEEPS_SHORT: timeout = 5; break; case SLEEPS_LONG: timeout = COMM_TIMEOUT + 1; break; case SLEEPS_VERY_LONG: timeout = 600; break; default: fprintf(stderr, "logic bug in %s:%d\n", __FILE__, __LINE__); exit(E_thinko); } alarm(timeout); } if (kid) *kid = pid; if (fd) *fd = pipe_fds[0]; if (flags & (RETURN_STDOUT_FD | RETURN_STDERR_FD) || flags == RETURN_PID) return; while (1) { if (waitpid(pid, &status, 0) == -1) { if (errno != EINTR) break; if (alarm_raised) { alarm(0); sigaction(SIGALRM, &so, NULL); rv = 0x100; break; } else { fprintf(stderr, "logic bug in %s:%d\n", __FILE__, __LINE__); exit(E_exec_error); } } else { if (WIFEXITED(status)) { rv = WEXITSTATUS(status); break; } } } if (flags & SLEEPS_FINITE) { if (rv >= 10 && !(flags & (DONT_REPORT_FAILED | SUPRESS_STDERR))) { fprintf(stderr, "Command '"); for (cmdline = argv; *cmdline; cmdline++) { fprintf(stderr, "%s", *cmdline); if (cmdline[1]) fputc(' ', stderr); } if (alarm_raised) { fprintf(stderr, "' did not terminate within %u seconds\n", timeout); exit(E_exec_error); } else { fprintf(stderr, "' terminated with exit code %d\n", rv); } } } fflush(stdout); fflush(stderr); if 
(ex) *ex = rv; } #define NA(ARGC) \ ({ if((ARGC) >= MAX_ARGS) { fprintf(stderr,"MAX_ARGS too small\n"); \ exit(E_thinko); \ } \ (ARGC)++; \ }) static void add_setup_options(char **argv, int *argcp) { int argc = *argcp; int i; if (!setup_options) return; for (i = 0; setup_options[i].option; i++) argv[NA(argc)] = setup_options[i].option; *argcp = argc; } #define make_options(OPT) \ while(OPT) { \ if(OPT->value) { \ ssprintf(argv[NA(argc)],"--%s=%s",OPT->name,OPT->value); \ } else { \ ssprintf(argv[NA(argc)],"--%s",OPT->name); \ } \ OPT=OPT->next; \ } /* FIXME: Don't leak the memory allocated by asprintf. */ #define make_address(ADDR, PORT, AF) \ if (!strcmp(AF, "ipv6")) { \ m_asprintf(&argv[NA(argc)], "%s:[%s]:%s", AF, ADDR, PORT); \ } else { \ m_asprintf(&argv[NA(argc)], "%s:%s:%s", AF, ADDR, PORT); \ } static int adm_attach_or_disk_options(struct cfg_ctx *ctx, bool do_attach, bool reset) { struct d_volume *vol = ctx->vol; char *argv[MAX_ARGS]; struct d_option *opt; int argc = 0; argv[NA(argc)] = drbdsetup; argv[NA(argc)] = do_attach ? 
"attach" : "disk-options"; ssprintf(argv[NA(argc)], "%d", vol->device_minor); if (do_attach) { argv[NA(argc)] = vol->disk; if (!strcmp(vol->meta_disk, "internal")) { argv[NA(argc)] = vol->disk; } else { argv[NA(argc)] = vol->meta_disk; } argv[NA(argc)] = vol->meta_index; } if (reset) argv[NA(argc)] = "--set-defaults"; if (reset || do_attach) { opt = ctx->vol->disk_options; if (!do_attach) { while (opt && opt->adj_skip) opt = opt->next; } make_options(opt); } add_setup_options(argv, &argc); argv[NA(argc)] = 0; return m_system_ex(argv, SLEEPS_LONG, ctx->res->name); } int adm_attach(struct cfg_ctx *ctx) { int rv; ctx->arg = "apply-al"; rv = admm_generic(ctx); if (rv) return rv; ctx->arg = "attach"; return adm_attach_or_disk_options(ctx, true, false); } int adm_disk_options(struct cfg_ctx *ctx) { return adm_attach_or_disk_options(ctx, false, false); } int adm_set_default_disk_options(struct cfg_ctx *ctx) { return adm_attach_or_disk_options(ctx, false, true); } struct d_option *find_opt(struct d_option *base, char *name) { while (base) { if (!strcmp(base->name, name)) { return base; } base = base->next; } return 0; } int adm_new_minor(struct cfg_ctx *ctx) { char *argv[MAX_ARGS]; int argc = 0, ex; argv[NA(argc)] = drbdsetup; argv[NA(argc)] = "new-minor"; ssprintf(argv[NA(argc)], "%s", ctx->res->name); ssprintf(argv[NA(argc)], "%u", ctx->vol->device_minor); ssprintf(argv[NA(argc)], "%u", ctx->vol->vnr); argv[NA(argc)] = NULL; ex = m_system_ex(argv, SLEEPS_SHORT, ctx->res->name); if (!ex && do_register) register_minor(ctx->vol->device_minor, config_save); return ex; } static int adm_new_resource_or_res_options(struct cfg_ctx *ctx, bool do_new_resource, bool reset) { char *argv[MAX_ARGS]; int argc = 0, ex; argv[NA(argc)] = drbdsetup; argv[NA(argc)] = do_new_resource ? 
"new-resource" : "resource-options"; ssprintf(argv[NA(argc)], "%s", ctx->res->name); if (reset) argv[NA(argc)] = "--set-defaults"; if (reset || do_new_resource) make_options(ctx->res->res_options); add_setup_options(argv, &argc); argv[NA(argc)] = NULL; ex = m_system_ex(argv, SLEEPS_SHORT, ctx->res->name); if (!ex && do_new_resource && do_register) register_resource(ctx->res->name, config_save); return ex; } int adm_new_resource(struct cfg_ctx *ctx) { return adm_new_resource_or_res_options(ctx, true, false); } int adm_res_options(struct cfg_ctx *ctx) { return adm_new_resource_or_res_options(ctx, false, false); } int adm_set_default_res_options(struct cfg_ctx *ctx) { return adm_new_resource_or_res_options(ctx, false, true); } int adm_resize(struct cfg_ctx *ctx) { char *argv[MAX_ARGS]; struct d_option *opt; int argc = 0; int silent; int ex; argv[NA(argc)] = drbdsetup; argv[NA(argc)] = "resize"; ssprintf(argv[NA(argc)], "%d", ctx->vol->device_minor); opt = find_opt(ctx->vol->disk_options, "size"); if (!opt) opt = find_opt(ctx->res->disk_options, "size"); if (opt) ssprintf(argv[NA(argc)], "--%s=%s", opt->name, opt->value); add_setup_options(argv, &argc); argv[NA(argc)] = 0; /* if this is not "resize", but "check-resize", be silent! */ silent = !strcmp(ctx->arg, "check-resize") ? SUPRESS_STDERR : 0; ex = m_system_ex(argv, SLEEPS_SHORT | silent, ctx->res->name); if (ex) return ex; /* Record last-known bdev info. * Unfortunately drbdsetup did not have enough information * when doing the "resize", and in theory, _our_ information * about the backing device may even be wrong. * Call drbdsetup again, tell it to ask the kernel for * current config, and update the last known bdev info * according to that. 
*/ /* argv[0] = drbdsetup; */ argv[1] = "check-resize"; /* argv[2] = minor; */ argv[3] = NULL; /* ignore exit code */ m_system_ex(argv, SLEEPS_SHORT | silent, ctx->res->name); return 0; } int _admm_generic(struct cfg_ctx *ctx, int flags) { struct d_volume *vol = ctx->vol; char *argv[MAX_ARGS]; int argc = 0; argv[NA(argc)] = drbdmeta; ssprintf(argv[NA(argc)], "%d", vol->device_minor); argv[NA(argc)] = "v08"; if (!strcmp(vol->meta_disk, "internal")) { argv[NA(argc)] = vol->disk; } else { argv[NA(argc)] = vol->meta_disk; } if (!strcmp(vol->meta_index, "flexible")) { if (!strcmp(vol->meta_disk, "internal")) { argv[NA(argc)] = "flex-internal"; } else { argv[NA(argc)] = "flex-external"; } } else { argv[NA(argc)] = vol->meta_index; } argv[NA(argc)] = (char *)ctx->arg; add_setup_options(argv, &argc); argv[NA(argc)] = 0; return m_system_ex(argv, flags, ctx->res->name); } static int admm_generic(struct cfg_ctx *ctx) { return _admm_generic(ctx, SLEEPS_VERY_LONG); } static void _adm_generic(struct cfg_ctx *ctx, int flags, pid_t *pid, int *fd, int *ex) { char *argv[MAX_ARGS]; int argc = 0; if (!ctx->res) { /* ASSERT */ fprintf(stderr, "sorry, need at least a resource name to call drbdsetup\n"); abort(); } argv[NA(argc)] = drbdsetup; argv[NA(argc)] = (char *)ctx->arg; if (ctx->vol) ssprintf(argv[NA(argc)], "%d", ctx->vol->device_minor); else ssprintf(argv[NA(argc)], "%s", ctx->res->name); add_setup_options(argv, &argc); argv[NA(argc)] = 0; setenv("DRBD_RESOURCE", ctx->res->name, 1); m__system(argv, flags, ctx->res->name, pid, fd, ex); } static int adm_generic(struct cfg_ctx *ctx, int flags) { int ex; _adm_generic(ctx, flags, NULL, NULL, &ex); return ex; } int adm_generic_s(struct cfg_ctx *ctx) { return adm_generic(ctx, SLEEPS_SHORT); } int sh_status(struct cfg_ctx *ctx) { struct d_resource *r, *t; struct d_volume *vol, *lower_vol; int rv = 0; if (!dry_run) { printf("_drbd_version=%s\n_drbd_api=%u\n", shell_escape(REL_VERSION), API_VERSION); printf("_config_file=%s\n\n\n", 
shell_escape(config_save)); } for_each_resource(r, t, config) { if (r->ignore) continue; ctx->res = r; printf("_conf_res_name=%s\n", shell_escape(r->name)); printf("_conf_file_line=%s:%u\n\n", shell_escape(r->config_file), r->start_line); if (r->stacked && r->me->lower) { printf("_stacked_on=%s\n", shell_escape(r->me->lower->name)); lower_vol = r->me->lower->me->volumes; } else { /* reset stuff */ printf("_stacked_on=\n"); printf("_stacked_on_device=\n"); printf("_stacked_on_minor=\n"); lower_vol = NULL; } /* TODO: remove this loop, have drbdsetup use dump * and optionally filter on resource name. * "stacked" information is not directly known to drbdsetup, though. */ for_each_volume(vol, r->me->volumes) { /* do not continue in this loop, * or lower_vol will get out of sync */ if (lower_vol) { printf("_stacked_on_device=%s\n", shell_escape(lower_vol->device)); printf("_stacked_on_minor=%d\n", lower_vol->device_minor); } else if (r->stacked && r->me->lower) { /* ASSERT */ fprintf(stderr, "in %s: stacked volume[%u] without lower volume\n", r->name, vol->vnr); abort(); } printf("_conf_volume=%d\n", vol->vnr); ctx->vol = vol; rv = adm_generic(ctx, SLEEPS_SHORT); if (rv) return rv; if (lower_vol) lower_vol = lower_vol->next; /* vol is advanced by for_each_volume */ } } return 0; } int adm_generic_l(struct cfg_ctx *ctx) { return adm_generic(ctx, SLEEPS_LONG); } static int adm_outdate(struct cfg_ctx *ctx) { int rv; rv = adm_generic(ctx, SLEEPS_SHORT | SUPRESS_STDERR); /* special cases for outdate: * 17: drbdsetup outdate, but is primary and thus cannot be outdated. * 5: drbdsetup outdate, and is inconsistent or worse anyways. */ if (rv == 17) return rv; if (rv == 5) { /* That might mean it is diskless. 
*/ rv = admm_generic(ctx); if (rv) rv = 5; return rv; } if (rv || dry_run) { rv = admm_generic(ctx); } return rv; } /* shell equivalent: * ( drbdsetup resize && drbdsetup check-resize ) || drbdmeta check-resize */ static int adm_chk_resize(struct cfg_ctx *ctx) { /* drbdsetup resize && drbdsetup check-resize */ int ex = adm_resize(ctx); if (ex == 0) return 0; /* try drbdmeta check-resize */ return admm_generic(ctx); } static int adm_generic_b(struct cfg_ctx *ctx) { char buffer[4096]; int fd, status, rv = 0, rr, s = 0; pid_t pid; _adm_generic(ctx, SLEEPS_SHORT | RETURN_STDERR_FD, &pid, &fd, NULL); if (fd < 0) { fprintf(stderr, "Strange: got negative fd.\n"); exit(E_thinko); } if (!dry_run) { while (1) { rr = read(fd, buffer + s, 4096 - s); if (rr <= 0) break; s += rr; } close(fd); rr = waitpid(pid, &status, 0); alarm(0); if (WIFEXITED(status)) rv = WEXITSTATUS(status); if (alarm_raised) { rv = 0x100; } } /* see drbdsetup.c, print_config_error(): * 11: some unspecific state change error * 17: SS_NO_UP_TO_DATE_DISK * In both cases, we don't need to retry with drbdmeta, * it would fail anyways with "Device is configured!" */ if (rv == 11 || rv == 17) { /* Some state transition error, report it ... */ rr = write(fileno(stderr), buffer, s); return rv; } if (rv || dry_run) { /* On other errors rv = 10 .. no minor allocated rv = 20 .. module not loaded rv = 16 .. we are diskless here retry with drbdmeta. */ rv = admm_generic(ctx); } return rv; } static int adm_khelper(struct cfg_ctx *ctx) { struct d_resource *res = ctx->res; struct d_volume *vol = ctx->vol; int rv = 0; char *sh_cmd; char minor_string[8]; char volume_string[8]; char *argv[] = { "/bin/sh", "-c", NULL, NULL }; if (!res->peer) { /* Since 8.3.2 we get DRBD_PEER_AF and DRBD_PEER_ADDRESS from the kernel. If we do not know the peer by now, use these to find the peer. 
*/ struct d_host_info *host; char *peer_address = getenv("DRBD_PEER_ADDRESS"); char *peer_af = getenv("DRBD_PEER_AF"); if (peer_address && peer_af) { for (host = res->all_hosts; host; host = host->next) { if (!strcmp(host->address_family, peer_af) && !strcmp(host->address, peer_address)) { res->peer = host; break; } } } } if (res->peer) { setenv("DRBD_PEER_AF", res->peer->address_family, 1); /* since 8.3.0 */ setenv("DRBD_PEER_ADDRESS", res->peer->address, 1); /* since 8.3.0 */ setenv("DRBD_PEER", res->peer->on_hosts->name, 1); /* deprecated */ setenv("DRBD_PEERS", names_to_str(res->peer->on_hosts), 1); /* since 8.3.0, but not usable when using a config with "floating" statements. */ } if (vol) { snprintf(minor_string, sizeof(minor_string), "%u", vol->device_minor); snprintf(volume_string, sizeof(volume_string), "%u", vol->vnr); setenv("DRBD_MINOR", minor_string, 1); setenv("DRBD_VOLUME", volume_string, 1); setenv("DRBD_LL_DISK", vol->disk, 1); } else { char *minor_list; char *separator = ""; char *pos; int volumes = 0; int bufsize; int n; for_each_volume(vol, res->me->volumes) volumes++; /* max minor number is 2**20 - 1, which is 7 decimal digits. * plus separator respective trailing zero. */ bufsize = volumes * 8 + 1; minor_list = alloca(bufsize); pos = minor_list; for_each_volume(vol, res->me->volumes) { n = snprintf(pos, bufsize, "%s%d", separator, vol->device_minor); if (n >= bufsize) { /* "can not happen" */ fprintf(stderr, "buffer too small when generating the minor list\n"); abort(); break; } bufsize -= n; pos += n; separator = " "; } setenv("DRBD_MINOR", minor_list, 1); } setenv("DRBD_RESOURCE", res->name, 1); setenv("DRBD_CONF", config_save, 1); if ((sh_cmd = get_opt_val(res->handlers, ctx->arg, NULL))) { argv[2] = sh_cmd; rv = m_system_ex(argv, SLEEPS_VERY_LONG, res->name); } return rv; } // need to convert discard-node-nodename to discard-local or discard-remote. 
void convert_discard_opt(struct d_resource *res) { struct d_option *opt; if (res == NULL) return; if ((opt = find_opt(res->net_options, "after-sb-0pri"))) { if (!strncmp(opt->value, "discard-node-", 13)) { if (!strcmp(nodeinfo.nodename, opt->value + 13)) { free(opt->value); opt->value = strdup("discard-local"); } else { free(opt->value); opt->value = strdup("discard-remote"); } } } } static int add_connection_endpoints(char **argv, int *argcp, struct d_resource *res) { int argc = *argcp; make_address(res->me->address, res->me->port, res->me->address_family); if (res->me->proxy) { make_address(res->me->proxy->inside_addr, res->me->proxy->inside_port, res->me->proxy->inside_af); } else if (res->peer) { make_address(res->peer->address, res->peer->port, res->peer->address_family); } else if (dry_run) { argv[NA(argc)] = "N/A"; } else { fprintf(stderr, "resource %s: cannot configure network without knowing my peer.\n", res->name); return 20; } *argcp = argc; return 0; } static int adm_connect_or_net_options(struct cfg_ctx *ctx, bool do_connect, bool reset) { struct d_resource *res = ctx->res; char *argv[MAX_ARGS]; struct d_option *opt; int argc = 0; int err; argv[NA(argc)] = drbdsetup; argv[NA(argc)] = do_connect ? 
"connect" : "net-options"; if (do_connect) ssprintf(argv[NA(argc)], "%s", res->name); err = add_connection_endpoints(argv, &argc, res); if (err) return err; if (reset) argv[NA(argc)] = "--set-defaults"; if (reset || do_connect) { opt = res->net_options; make_options(opt); } add_setup_options(argv, &argc); argv[NA(argc)] = 0; return m_system_ex(argv, SLEEPS_SHORT, res->name); } int adm_connect(struct cfg_ctx *ctx) { return adm_connect_or_net_options(ctx, true, false); } int adm_net_options(struct cfg_ctx *ctx) { return adm_connect_or_net_options(ctx, false, false); } int adm_set_default_net_options(struct cfg_ctx *ctx) { return adm_connect_or_net_options(ctx, false, true); } int adm_disconnect(struct cfg_ctx *ctx) { char *argv[MAX_ARGS]; int argc = 0; if (!ctx->res) { /* ASSERT */ fprintf(stderr, "sorry, need at least a resource name to call drbdsetup\n"); abort(); } argv[NA(argc)] = drbdsetup; argv[NA(argc)] = (char *)ctx->arg; add_connection_endpoints(argv, &argc, ctx->res); add_setup_options(argv, &argc); argv[NA(argc)] = 0; setenv("DRBD_RESOURCE", ctx->res->name, 1); return m_system_ex(argv, SLEEPS_SHORT, ctx->res->name); } struct d_option *del_opt(struct d_option *base, struct d_option *item) { struct d_option *i; if (base == item) { base = item->next; free(item->name); free(item->value); free(item); return base; } for (i = base; i; i = i->next) { if (i->next == item) { i->next = item->next; free(item->name); free(item->value); free(item); return base; } } return base; } // Need to convert after from resourcename to minor_number. 
void _convert_after_option(struct d_resource *res, struct d_volume *vol)
{
	struct d_option *opt, *next;
	struct cfg_ctx depends_on_ctx = { };
	int volumes;

	if (res == NULL)
		return;

	opt = vol->disk_options;
	while ((opt = find_opt(opt, "resync-after"))) {
		next = opt->next;
		ctx_by_name(&depends_on_ctx, opt->value);
		volumes = ctx_set_implicit_volume(&depends_on_ctx);
		if (volumes > 1) {
			fprintf(stderr,
				"%s:%d: in resource %s:\n\t"
				"resync-after contains '%s', which is ambiguous, since it contains %d volumes\n",
				res->config_file, res->start_line, res->name,
				opt->value, volumes);
			config_valid = 0;
			return;
		}

		if (!depends_on_ctx.res || depends_on_ctx.res->ignore) {
			vol->disk_options = del_opt(vol->disk_options, opt);
		} else {
			free(opt->value);
			m_asprintf(&opt->value, "%d", depends_on_ctx.vol->device_minor);
		}
		opt = next;
	}
}

// Need to convert after from resourcename/volume to minor_number.
void convert_after_option(struct d_resource *res)
{
	struct d_volume *vol;
	struct d_host_info *h;

	for (h = res->all_hosts; h; h = h->next)
		for_each_volume(vol, h->volumes)
			_convert_after_option(res, vol);
}

int _proxy_connect_name_len(struct d_resource *res)
{
	return strlen(res->name) +
		strlen(names_to_str_c(res->peer->proxy->on_hosts, '_')) +
		strlen(names_to_str_c(res->me->proxy->on_hosts, '_')) +
		3 /* for the two dashes and the trailing 0 character */;
}

char *_proxy_connection_name(char *conn_name, struct d_resource *res)
{
	sprintf(conn_name, "%s-%s-%s",
		res->name,
		names_to_str_c(res->peer->proxy->on_hosts, '_'),
		names_to_str_c(res->me->proxy->on_hosts, '_'));
	return conn_name;
}

int do_proxy_conn_up(struct cfg_ctx *ctx)
{
	struct d_resource *res = ctx->res;
	char *argv[4] = { drbd_proxy_ctl, "-c", NULL, NULL };
	char *conn_name;
	int rv;

	conn_name = proxy_connection_name(res);

	ssprintf(argv[2],
		 "add connection %s %s:%s %s:%s %s:%s %s:%s",
		 conn_name,
		 res->me->proxy->inside_addr,
		 res->me->proxy->inside_port,
		 res->peer->proxy->outside_addr,
		 res->peer->proxy->outside_port,
res->me->proxy->outside_addr, res->me->proxy->outside_port, res->me->address, res->me->port); rv = m_system_ex(argv, SLEEPS_SHORT, res->name); return rv; } int do_proxy_conn_plugins(struct cfg_ctx *ctx) { struct d_resource *res = ctx->res; char *argv[MAX_ARGS]; char *conn_name; int argc = 0; struct d_option *opt; int counter; conn_name = proxy_connection_name(res); argc = 0; argv[NA(argc)] = drbd_proxy_ctl; opt = res->me->proxy->options; while (opt) { argv[NA(argc)] = "-c"; ssprintf(argv[NA(argc)], "set %s %s %s", opt->name, conn_name, opt->value); opt = opt->next; } counter = 0; opt = res->me->proxy->plugins; /* Don't send the "set plugin ... END" line if no plugins are defined * - that's incompatible with the drbd proxy version 1. */ if (opt) { while (1) { argv[NA(argc)] = "-c"; ssprintf(argv[NA(argc)], "set plugin %s %d %s", conn_name, counter, opt ? opt->name : "END"); if (!opt) break; opt = opt->next; counter ++; } } argv[NA(argc)] = 0; if (argc > 2) return m_system_ex(argv, SLEEPS_SHORT, res->name); return 0; } int do_proxy_conn_down(struct cfg_ctx *ctx) { struct d_resource *res = ctx->res; char *conn_name; char *argv[4] = { drbd_proxy_ctl, "-c", NULL, NULL}; int rv; conn_name = proxy_connection_name(res); ssprintf(argv[2], "del connection %s", conn_name); rv = m_system_ex(argv, SLEEPS_SHORT, res->name); return rv; } static int check_proxy(struct cfg_ctx *ctx, int do_up) { struct d_resource *res = ctx->res; int rv; if (!res->me->proxy) { if (all_resources) return 0; fprintf(stderr, "There is no proxy config for host %s in resource %s.\n", nodeinfo.nodename, res->name); exit(E_config_invalid); } if (!name_in_names(nodeinfo.nodename, res->me->proxy->on_hosts)) { if (all_resources) return 0; fprintf(stderr, "The proxy config in resource %s is not for %s.\n", res->name, nodeinfo.nodename); exit(E_config_invalid); } if (!res->peer) { fprintf(stderr, "Cannot determine the peer in resource %s.\n", res->name); exit(E_config_invalid); } if (!res->peer->proxy) { 
		fprintf(stderr,
			"There is no proxy config for the peer in resource %s.\n",
			res->name);
		if (all_resources)
			return 0;
		exit(E_config_invalid);
	}

	if (do_up) {
		rv = do_proxy_conn_up(ctx);
		if (!rv)
			rv = do_proxy_conn_plugins(ctx);
	} else
		rv = do_proxy_conn_down(ctx);

	return rv;
}

static int adm_proxy_up(struct cfg_ctx *ctx)
{
	return check_proxy(ctx, 1);
}

static int adm_proxy_down(struct cfg_ctx *ctx)
{
	return check_proxy(ctx, 0);
}

/* The "main" loop iterates over resources.
 * This "sorts" the drbdsetup commands to bring those up
 * so we will later first create all objects,
 * then attach all local disks,
 * adjust various settings,
 * and then configure the network part */
static int adm_up(struct cfg_ctx *ctx)
{
	static char *current_res_name;

	if (!current_res_name || strcmp(current_res_name, ctx->res->name)) {
		free(current_res_name);
		current_res_name = strdup(ctx->res->name);

		schedule_deferred_cmd(adm_new_resource, ctx, "new-resource", CFG_PREREQ);
		schedule_deferred_cmd(adm_connect, ctx, "connect", CFG_NET);
	}
	schedule_deferred_cmd(adm_new_minor, ctx, "new-minor", CFG_PREREQ);
	schedule_deferred_cmd(adm_attach, ctx, "attach", CFG_DISK);

	return 0;
}

/* The stacked-timeouts switch in the startup sections allows us
   to enforce the use of the specified timeouts instead of
   a sane value. Should only be used if the third node should
   never become primary. */
static int adm_wait_c(struct cfg_ctx *ctx)
{
	struct d_resource *res = ctx->res;
	struct d_volume *vol = ctx->vol;
	char *argv[MAX_ARGS];
	struct d_option *opt;
	int argc = 0, rv;

	argv[NA(argc)] = drbdsetup;
	argv[NA(argc)] = "wait-connect";
	ssprintf(argv[NA(argc)], "%d", vol->device_minor);
	if (is_drbd_top && !res->stacked_timeouts) {
		unsigned long timeout = 20;
		if ((opt = find_opt(res->net_options, "connect-int"))) {
			timeout = strtoul(opt->value, NULL, 10);
			// one connect-interval? two?
timeout *= 2; } argv[argc++] = "-t"; ssprintf(argv[argc], "%lu", timeout); argc++; } else { opt = res->startup_options; make_options(opt); } argv[NA(argc)] = 0; rv = m_system_ex(argv, SLEEPS_FOREVER, res->name); return rv; } static unsigned minor_by_id(const char *id) { if (strncmp(id, "minor-", 6)) return -1U; return m_strtoll(id + 6, 1); } int ctx_by_minor(struct cfg_ctx *ctx, const char *id) { struct d_resource *res, *t; struct d_volume *vol; unsigned int mm; mm = minor_by_id(id); if (mm == -1U) return -ENOENT; for_each_resource(res, t, config) { if (res->ignore) continue; for_each_volume(vol, res->me->volumes) { if (mm == vol->device_minor) { is_drbd_top = res->stacked; ctx->res = res; ctx->vol = vol; return 0; } } } return -ENOENT; } struct d_resource *res_by_name(const char *name) { struct d_resource *res, *t; for_each_resource(res, t, config) { if (strcmp(name, res->name) == 0) return res; } return NULL; } struct d_volume *volume_by_vnr(struct d_volume *volumes, int vnr) { struct d_volume *vol; for_each_volume(vol, volumes) if (vnr == vol->vnr) return vol; return NULL; } int ctx_by_name(struct cfg_ctx *ctx, const char *id) { struct d_resource *res, *t; struct d_volume *vol; char *name = strdupa(id); char *vol_id = strchr(name, '/'); unsigned vol_nr = ~0U; if (vol_id) { *vol_id++ = '\0'; vol_nr = m_strtoll(vol_id, 0); } for_each_resource(res, t, config) { if (res->ignore) continue; if (strcmp(name, res->name) == 0) break; } if (!res) return -ENOENT; if (!vol_id) { /* We could assign implicit volumes here. * But that broke "drbdadm up specific-resource". 
		 */
		ctx->res = res;
		ctx->vol = NULL;
		return 0;
	}

	vol = volume_by_vnr(res->me->volumes, vol_nr);
	if (vol) {
		ctx->res = res;
		ctx->vol = vol;
		return 0;
	}

	return -ENOENT;
}

int ctx_set_implicit_volume(struct cfg_ctx *ctx)
{
	struct d_volume *vol, *v;
	int volumes = 0;

	if (ctx->vol || !ctx->res)
		return 0;

	if (!ctx->res->me) {
		return 0;
	}

	for_each_volume(vol, ctx->res->me->volumes) {
		volumes++;
		v = vol;
	}

	if (volumes == 1)
		ctx->vol = v;

	return volumes;
}

/* In case a child exited, or exits, its return code is stored as
   negative number in the pids[i] array */
static int childs_running(pid_t * pids, int opts)
{
	int i = 0, wr, rv = 0, status;
	int N = nr_volumes[is_drbd_top ? STACKED : NORMAL];

	for (i = 0; i < N; i++) {
		if (pids[i] <= 0)
			continue;
		wr = waitpid(pids[i], &status, opts);
		if (wr == -1) {	// Wait error.
			if (errno == ECHILD) {
				printf("No exit code for %d\n", pids[i]);
				pids[i] = 0;	// Child exited before ?
				continue;
			}
			perror("waitpid");
			exit(E_exec_error);
		}
		if (wr == 0)
			rv = 1;	// Child still running.
		if (wr > 0) {
			pids[i] = 0;
			if (WIFEXITED(status))
				pids[i] = -WEXITSTATUS(status);
			if (WIFSIGNALED(status))
				pids[i] = -1000;
		}
	}
	return rv;
}

static void kill_childs(pid_t * pids)
{
	int i;
	int N = nr_volumes[is_drbd_top ? STACKED : NORMAL];

	for (i = 0; i < N; i++) {
		if (pids[i] <= 0)
			continue;
		kill(pids[i], SIGINT);
	}
}

/*
  returns:
  -1 ... all childs terminated
   0 ... timeout expired
   1 ... a string was read
 */
int gets_timeout(pid_t * pids, char *s, int size, int timeout)
{
	int pr, rr, n = 0;
	struct pollfd pfd;

	if (s) {
		pfd.fd = fileno(stdin);
		pfd.events = POLLIN | POLLHUP | POLLERR | POLLNVAL;
		n = 1;
	}

	if (!childs_running(pids, WNOHANG)) {
		pr = -1;
		goto out;
	}

	do {
		pr = poll(&pfd, n, timeout);

		if (pr == -1) {	// Poll error.
			if (errno == EINTR) {
				if (childs_running(pids, WNOHANG))
					continue;
				goto out;	// pr = -1 here.
			}
			perror("poll");
			exit(E_exec_error);
		}
	} while (pr == -1);

	if (pr == 1) {	// Input available.
		rr = read(fileno(stdin), s, size - 1);
		if (rr == -1) {
			perror("read");
			exit(E_exec_error);
		}
		s[rr] = 0;
	}

out:
	return pr;
}

static char *get_opt_val(struct d_option *base, const char *name, char *def)
{
	while (base) {
		if (!strcmp(base->name, name)) {
			return base->value;
		}
		base = base->next;
	}
	return def;
}

void chld_sig_hand(int __attribute((unused)) unused)
{
	// do nothing. But interrupt system calls :)
}

static int check_exit_codes(pid_t * pids)
{
	struct d_resource *res, *t;
	int i = 0, rv = 0;

	for_each_resource(res, t, config) {
		if (res->ignore)
			continue;
		if (is_drbd_top != res->stacked)
			continue;
		if (pids[i] == -5 || pids[i] == -1000) {
			pids[i] = 0;
		}
		if (pids[i] == -20)
			rv = 20;
		i++;
	}
	return rv;
}

static int adm_wait_ci(struct cfg_ctx *ctx)
{
	struct d_resource *res, *t;
	char *argv[20], answer[40];
	pid_t *pids;
	struct d_option *opt;
	int rr, wtime, argc, i = 0;
	time_t start;
	int saved_stdin, saved_stdout, fd;
	int N;
	struct sigaction so, sa;

	saved_stdin = -1;
	saved_stdout = -1;
	if (no_tty) {
		fprintf(stderr, "WARN: stdin/stdout is not a TTY; using /dev/console");
		fprintf(stdout, "WARN: stdin/stdout is not a TTY; using /dev/console");
		saved_stdin = dup(fileno(stdin));
		if (saved_stdin == -1)
			perror("dup(stdin)");
		saved_stdout = dup(fileno(stdout));
		/* check the descriptor that was just duplicated, not stdin again */
		if (saved_stdout == -1)
			perror("dup(stdout)");
		fd = open("/dev/console", O_RDONLY);
		if (fd == -1)
			perror("open('/dev/console', O_RDONLY)");
		dup2(fd, fileno(stdin));
		fd = open("/dev/console", O_WRONLY);
		if (fd == -1)
			perror("open('/dev/console', O_WRONLY)");
		dup2(fd, fileno(stdout));
	}

	sa.sa_handler = chld_sig_hand;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_NOCLDSTOP;
	sigaction(SIGCHLD, &sa, &so);

	N = nr_volumes[is_drbd_top ? STACKED : NORMAL];
	pids = alloca(N * sizeof(pid_t));
	/* alloca can not fail, it can "only" overflow the stack :)
	 * but it needs to be initialized anyways!
	 */
	memset(pids, 0, N * sizeof(pid_t));

	for_each_resource(res, t, config) {
		struct d_volume *vol;
		if (res->ignore)
			continue;
		if (is_drbd_top != res->stacked)
			continue;

		for_each_volume(vol, res->me->volumes) {
			/* ctx is not used */
			argc = 0;
			argv[NA(argc)] = drbdsetup;
			argv[NA(argc)] = "wait-connect";
			ssprintf(argv[NA(argc)], "%u", vol->device_minor);
			opt = res->startup_options;
			make_options(opt);
			argv[NA(argc)] = 0;

			m__system(argv, RETURN_PID, res->name, &pids[i++], NULL, NULL);
		}
	}

	wtime = global_options.dialog_refresh ? : -1;

	start = time(0);
	for (i = 0; i < 10; i++) {
		// no string, but timeout
		rr = gets_timeout(pids, 0, 0, 1 * 1000);
		if (rr < 0)
			break;
		putchar('.');
		fflush(stdout);
		check_exit_codes(pids);
	}

	if (rr == 0) {
		/* track a "yes", as well as ctrl-d and ctrl-c,
		 * in case our tty is stuck in "raw" mode, and
		 * we get it one character a time (-icanon) */
		char yes_string[] = "yes\n";
		char *yes_expect = yes_string;
		int ctrl_c_count = 0;
		int ctrl_d_count = 0;

		/* Just in case, if plymouth or usplash is running,
		 * tell them to step aside.
		 * Also try to force canonical tty mode. */
		if (system("exec > /dev/null 2>&1; plymouth quit ; usplash_write QUIT ; "
			   "stty echo icanon icrnl"))
			/* Ignore return value. Cannot do anything about it anyways. */;

		printf
		    ("\n***************************************************************\n"
		     " DRBD's startup script waits for the peer node(s) to appear.\n"
		     " - In case this node was already a degraded cluster before the\n"
		     " reboot the timeout is %s seconds. [degr-wfc-timeout]\n"
		     " - If the peer was available before the reboot the timeout will\n"
		     " expire after %s seconds. [wfc-timeout]\n"
		     " (These values are for resource '%s'; 0 sec -> wait forever)\n",
		     get_opt_val(config->startup_options, "degr-wfc-timeout", "0"),
		     get_opt_val(config->startup_options, "wfc-timeout", "0"),
		     config->name);

		printf(" To abort waiting enter 'yes' [ -- ]: ");

		do {
			printf("\e[s\e[31G[%4d]:\e[u", (int)(time(0) - start));	// Redraw sec.
fflush(stdout); rr = gets_timeout(pids, answer, 40, wtime * 1000); check_exit_codes(pids); if (rr != 1) continue; /* If our tty is in "sane" or "canonical" mode, * we get whole lines. * If it still is in "raw" mode, even though we * tried to set ICANON above, possibly some other * "boot splash thingy" is in operation. * We may be lucky to get single characters. * If a sysadmin sees things stuck during boot, * I expect that ctrl-c or ctrl-d will be one * of the first things that are tried. * In raw mode, we get these characters directly. * But I want them to try that three times ;) */ if (answer[0] && answer[1] == 0) { if (answer[0] == '\3') ++ctrl_c_count; if (answer[0] == '\4') ++ctrl_d_count; if (yes_expect && answer[0] == *yes_expect) ++yes_expect; else if (answer[0] == '\n') yes_expect = yes_string; else yes_expect = NULL; } if (!strcmp(answer, "yes\n") || (yes_expect && *yes_expect == '\0') || ctrl_c_count >= 3 || ctrl_d_count >= 3) { kill_childs(pids); childs_running(pids, 0); check_exit_codes(pids); break; } printf(" To abort waiting enter 'yes' [ -- ]:"); } while (rr != -1); printf("\n"); } if (saved_stdin != -1) { dup2(saved_stdin, fileno(stdin)); dup2(saved_stdout, fileno(stdout)); } return 0; } static void print_cmds(int level) { size_t i; int j = 0; for (i = 0; i < ARRAY_SIZE(cmds); i++) { if (cmds[i].show_in_usage != level) continue; if (j++ % 2) { printf("%-35s\n", cmds[i].name); } else { printf(" %-35s", cmds[i].name); } } if (j % 2) printf("\n"); } static int hidden_cmds(struct cfg_ctx *ignored __attribute((unused))) { printf("\nThese additional commands might be useful for writing\n" "nifty shell scripts around drbdadm:\n\n"); print_cmds(2); printf("\nThese commands are used by the kernel part of DRBD to\n" "invoke user mode helper programs:\n\n"); print_cmds(3); printf ("\nThese commands ought to be used by experts and developers:\n\n"); print_cmds(4); printf("\n"); exit(0); } static void field_to_option(const struct field_def *field, struct 
option *option) { option->name = field->name; option->has_arg = field->argument_is_optional ? optional_argument : required_argument; option->flag = NULL; option->val = 257; } static void print_option(struct option *opt) { if (opt->has_arg == required_argument) { printf(" --%s=...", opt->name); if (opt->val > 1 && opt->val < 256) printf(", -%c ...", opt->val); printf("\n"); } else if (opt->has_arg == optional_argument) { printf(" --%s[=...]", opt->name); if (opt->val > 1 && opt->val < 256) printf(", -%c...", opt->val); printf("\n"); } else { printf(" --%s", opt->name); if (opt->val > 1 && opt->val < 256) printf(", -%c", opt->val); printf("\n"); } } void print_usage_and_exit(struct adm_cmd *cmd, const char *addinfo, int status) { struct option *opt; printf("\nUSAGE: %s %s [OPTION...] {all|RESOURCE...}\n\n" "GENERAL OPTIONS:\n", progname, cmd ? cmd->name : "COMMAND"); for (opt = general_admopt; opt->name; opt++) print_option(opt); if (cmd && cmd->drbdsetup_ctx) { const struct field_def *field; printf("\nOPTIONS FOR %s:\n", cmd->name); for (field = cmd->drbdsetup_ctx->fields; field->name; field++) { struct option opt; field_to_option(field, &opt); print_option(&opt); } } if (!cmd) { printf("\nCOMMANDS:\n"); print_cmds(1); } printf("\nVersion: " REL_VERSION " (api:%d)\n%s\n", API_VERSION, drbd_buildtag()); if (addinfo) printf("\n%s\n", addinfo); exit(status); } /* * I'd really rather parse the output of * ip -o a s * once, and be done. * But anyways.... 
 */
static struct ifreq *get_ifreq(void)
{
	int sockfd, num_ifaces;
	struct ifreq *ifr;
	struct ifconf ifc;
	size_t buf_size;

	if (0 > (sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP))) {
		perror("Cannot open socket");
		exit(EXIT_FAILURE);
	}

	num_ifaces = 0;
	ifc.ifc_req = NULL;

	/* realloc buffer size until no overflow occurs */
	do {
		num_ifaces += 16;	/* initial guess and increment */
		buf_size = ++num_ifaces * sizeof(struct ifreq);
		ifc.ifc_len = buf_size;
		if (NULL == (ifc.ifc_req = realloc(ifc.ifc_req, ifc.ifc_len))) {
			fprintf(stderr, "Out of memory.\n");
			return NULL;
		}
		if (ioctl(sockfd, SIOCGIFCONF, &ifc)) {
			perror("ioctl SIOCGIFCONF");
			free(ifc.ifc_req);
			return NULL;
		}
	} while (buf_size <= (size_t) ifc.ifc_len);

	num_ifaces = ifc.ifc_len / sizeof(struct ifreq);
	/* Since we allocated at least one more than necessary,
	 * this serves as a stop marker for the iteration in
	 * have_ip() */
	ifc.ifc_req[num_ifaces].ifr_name[0] = 0;
	for (ifr = ifc.ifc_req; ifr->ifr_name[0] != 0; ifr++) {
		/* we only want to look up the presence or absence of a certain address
		 * here. but we want to skip "down" interfaces. if an interface is down,
		 * we store an invalid sa_family, so the lookup will skip it.
		 */
		struct ifreq ifr_for_flags = *ifr;	/* get a copy to work with */
		if (ioctl(sockfd, SIOCGIFFLAGS, &ifr_for_flags) < 0) {
			perror("ioctl SIOCGIFFLAGS");
			ifr->ifr_addr.sa_family = -1;	/* what's wrong here?
							   anyways: skip */
			continue;
		}
		if (!(ifr_for_flags.ifr_flags & IFF_UP)) {
			ifr->ifr_addr.sa_family = -1;	/* is not up: skip */
			continue;
		}
	}
	close(sockfd);
	return ifc.ifc_req;
}

int have_ip_ipv4(const char *ip)
{
	struct ifreq *ifr;
	struct in_addr query_addr;

	query_addr.s_addr = inet_addr(ip);

	if (!ifreq_list)
		ifreq_list = get_ifreq();

	for (ifr = ifreq_list; ifr && ifr->ifr_name[0] != 0; ifr++) {
		/* SIOCGIFCONF only supports AF_INET */
		struct sockaddr_in *list_addr =
		    (struct sockaddr_in *)&ifr->ifr_addr;
		if (ifr->ifr_addr.sa_family != AF_INET)
			continue;
		if (query_addr.s_addr == list_addr->sin_addr.s_addr)
			return 1;
	}
	return 0;
}

int have_ip_ipv6(const char *ip)
{
	FILE *if_inet6;
	struct in6_addr addr6, query_addr;
	unsigned int b[4];
	char tmp_ip[INET6_ADDRSTRLEN+1];
	char name[20];	/* IFNAMSIZ aka IF_NAMESIZE is 16 */
	int i;

	/* don't want to do getaddrinfo lookup, but inet_pton gets confused by
	 * %eth0 link local scope specifiers. So we have a temporary copy
	 * without that part. */
	for (i=0; ip[i] && ip[i] != '%' && i < INET6_ADDRSTRLEN; i++)
		tmp_ip[i] = ip[i];
	tmp_ip[i] = 0;

	if (inet_pton(AF_INET6, tmp_ip, &query_addr) <= 0)
		return 0;

#define PROC_IF_INET6 "/proc/net/if_inet6"
	if_inet6 = fopen(PROC_IF_INET6, "r");
	if (!if_inet6) {
		if (errno != ENOENT)
			perror("open of " PROC_IF_INET6 " failed:");
#undef PROC_IF_INET6
		return 0;
	}

	while (fscanf
	       (if_inet6,
		X32(08) X32(08) X32(08) X32(08) " %*02x %*02x %*02x %*02x %s",
		b, b + 1, b + 2, b + 3, name) > 0) {
		for (i = 0; i < 4; i++)
			addr6.s6_addr32[i] = cpu_to_be32(b[i]);

		if (memcmp(&query_addr, &addr6, sizeof(struct in6_addr)) == 0) {
			fclose(if_inet6);
			return 1;
		}
	}
	fclose(if_inet6);
	return 0;
}

int have_ip(const char *af, const char *ip)
{
	if (!strcmp(af, "ipv4"))
		return have_ip_ipv4(ip);
	else if (!strcmp(af, "ipv6"))
		return have_ip_ipv6(ip);

	return 1;		/* SCI */
}

void verify_ips(struct d_resource *res)
{
	if (global_options.disable_ip_verification)
		return;
	if (dry_run == 1 || do_verify_ips == 0)
		return;
	if (res->ignore)
return; if (res->stacked && !is_drbd_top) return; if (!have_ip(res->me->address_family, res->me->address)) { ENTRY e, *ep; e.key = e.data = ep = NULL; m_asprintf(&e.key, "%s:%s", res->me->address, res->me->port); hsearch_r(e, FIND, &ep, &global_htable); fprintf(stderr, "%s: in resource %s, on %s:\n\t" "IP %s not found on this host.\n", ep ? (char *)ep->data : res->config_file, res->name, names_to_str(res->me->on_hosts), res->me->address); if (INVALID_IP_IS_INVALID_CONF) config_valid = 0; } } static char *conf_file[] = { DRBD_CONFIG_DIR "/drbd-84.conf", DRBD_CONFIG_DIR "/drbd-83.conf", DRBD_CONFIG_DIR "/drbd-82.conf", DRBD_CONFIG_DIR "/drbd-08.conf", DRBD_CONFIG_DIR "/drbd.conf", 0 }; int sanity_check_abs_cmd(char *cmd_name) { struct stat sb; if (stat(cmd_name, &sb)) { /* If stat fails, just ignore this sanity check, * we are still iterating over $PATH probably. */ return 0; } if (!(sb.st_mode & S_ISUID) || sb.st_mode & S_IXOTH || sb.st_gid == 0) { static int did_header = 0; if (!did_header) fprintf(stderr, "WARN:\n" " You are using the 'drbd-peer-outdater' as fence-peer program.\n" " If you use that mechanism the dopd heartbeat plugin program needs\n" " to be able to call drbdsetup and drbdmeta with root privileges.\n\n" " You need to fix this with these commands:\n"); did_header = 1; fprintf(stderr, " chgrp haclient %s\n" " chmod o-x %s\n" " chmod u+s %s\n\n", cmd_name, cmd_name, cmd_name); } return 1; } void sanity_check_cmd(char *cmd_name) { char *path, *pp, *c; char abs_path[100]; if (strchr(cmd_name, '/')) { sanity_check_abs_cmd(cmd_name); } else { path = pp = c = strdup(getenv("PATH")); while (1) { c = strchr(pp, ':'); if (c) *c = 0; snprintf(abs_path, 100, "%s/%s", pp, cmd_name); if (sanity_check_abs_cmd(abs_path)) break; if (!c) break; c++; if (!*c) break; pp = c; } free(path); } } /* if the config file is not readable by haclient, * dopd cannot work. * NOTE: we assume that any gid != 0 will be the group dopd will run as, * typically haclient. 
*/ void sanity_check_conf(char *c) { struct stat sb; /* if we cannot stat the config file, * we have other things to worry about. */ if (stat(c, &sb)) return; /* permissions are funny: if it is world readable, * but not group readable, and it belongs to my group, * I am denied access. * For the file to be readable by dopd (hacluster:haclient), * it is not enough to be world readable. */ /* ok if world readable, and NOT group haclient (see NOTE above) */ if (sb.st_mode & S_IROTH && sb.st_gid == 0) return; /* ok if group readable, and group haclient (see NOTE above) */ if (sb.st_mode & S_IRGRP && sb.st_gid != 0) return; fprintf(stderr, "WARN:\n" " You are using the 'drbd-peer-outdater' as fence-peer program.\n" " If you use that mechanism the dopd heartbeat plugin program needs\n" " to be able to read the drbd.config file.\n\n" " You need to fix this with these commands:\n" " chgrp haclient %s\n" " chmod g+r %s\n\n", c, c); } void sanity_check_perm() { static int checked = 0; if (checked) return; sanity_check_cmd(drbdsetup); sanity_check_cmd(drbdmeta); sanity_check_conf(config_file); checked = 1; } void validate_resource(struct d_resource *res, enum pp_flags flags) { struct d_option *opt, *next; struct d_name *bpo; /* there may be more than one "resync-after" statement, * see commit 89cd0585 */ opt = res->disk_options; while ((opt = find_opt(opt, "resync-after"))) { struct d_resource *rs_after_res = res_by_name(opt->value); next = opt->next; if (rs_after_res == NULL || (rs_after_res->ignore && !(flags & MATCH_ON_PROXY))) { fprintf(stderr, "%s:%d: in resource %s:\n\tresource '%s' mentioned in " "'resync-after' option is not known%s.\n", res->config_file, res->start_line, res->name, opt->value, rs_after_res ? " on this host" : ""); /* Non-fatal if run from some script. * When deleting resources, it is an easily made * oversight to leave references to the deleted * resources in resync-after statements. 
			   Don't fail on
			 * every pacemaker-induced action, as it would
			 * ultimately lead to all nodes committing suicide. */
			if (no_tty)
				res->disk_options = del_opt(res->disk_options, opt);
			else
				config_valid = 0;
		}
		opt = next;
	}
	if (res->ignore)
		return;
	if (!res->me) {
		fprintf(stderr,
			"%s:%d: in resource %s:\n\tmissing section 'on %s { ... }'.\n",
			res->config_file, res->start_line, res->name,
			nodeinfo.nodename);
		config_valid = 0;
	}
	// need to verify that in the discard-node-nodename options only known
	// nodenames are mentioned.
	if ((opt = find_opt(res->net_options, "after-sb-0pri"))) {
		if (!strncmp(opt->value, "discard-node-", 13)) {
			if (res->peer &&
			    !name_in_names(opt->value + 13, res->peer->on_hosts) &&
			    !name_in_names(opt->value + 13, res->me->on_hosts)) {
				fprintf(stderr,
					"%s:%d: in resource %s:\n\t"
					"the nodename in the '%s' option is "
					"not known.\n\t"
					"valid nodenames are: '%s %s'.\n",
					res->config_file, res->start_line,
					res->name, opt->value,
					names_to_str(res->me->on_hosts),
					names_to_str(res->peer->on_hosts));
				config_valid = 0;
			}
		}
	}

	if ((opt = find_opt(res->handlers, "fence-peer"))) {
		if (strstr(opt->value, "drbd-peer-outdater"))
			sanity_check_perm();
	}

	opt = find_opt(res->net_options, "allow-two-primaries");
	if (name_in_names("both", res->become_primary_on) && opt == NULL) {
		fprintf(stderr,
			"%s:%d: in resource %s:\n"
			"become-primary-on is set to both, but allow-two-primaries "
			"is not set.\n",
			res->config_file, res->start_line, res->name);
		config_valid = 0;
	}

	if (!res->peer)
		set_peer_in_resource(res, 0);

	if (res->peer
	    && ((res->me->proxy == NULL) != (res->peer->proxy == NULL))) {
		fprintf(stderr,
			"%s:%d: in resource %s:\n\t"
			"Either both 'on' sections must contain a proxy subsection, or none.\n",
			res->config_file, res->start_line, res->name);
		config_valid = 0;
	}

	for (bpo = res->become_primary_on; bpo; bpo = bpo->next) {
		if (res->peer &&
		    !name_in_names(bpo->name, res->me->on_hosts) &&
		    !name_in_names(bpo->name, res->peer->on_hosts) &&
		    strcmp(bpo->name, "both")) {
fprintf(stderr, "%s:%d: in resource %s:\n\t" "become-primary-on contains '%s', which is not named with the 'on' sections.\n", res->config_file, res->start_line, res->name, bpo->name); config_valid = 0; } } } static void global_validate_maybe_expand_die_if_invalid(int expand, enum pp_flags flags) { struct d_resource *res, *tmp; for_each_resource(res, tmp, config) { validate_resource(res, flags); if (!config_valid) exit(E_config_invalid); if (expand) { convert_after_option(res); convert_discard_opt(res); } if (!config_valid) exit(E_config_invalid); } } /* * returns a pointer to an malloced area that contains * an absolute, canonical, version of path. * aborts if any allocation or syscall fails. * return value should be free()d, once no longer needed. */ char *canonify_path(char *path) { int cwd_fd = -1; char *last_slash; char *tmp; char *that_wd; char *abs_path; if (!path || !path[0]) { fprintf(stderr, "cannot canonify an empty path\n"); exit(E_usage); } tmp = strdupa(path); last_slash = strrchr(tmp, '/'); if (last_slash) { *last_slash++ = '\0'; cwd_fd = open(".", O_RDONLY); if (cwd_fd < 0) { fprintf(stderr, "open(\".\") failed: %m\n"); exit(E_usage); } if (chdir(tmp)) { fprintf(stderr, "chdir(\"%s\") failed: %m\n", tmp); exit(E_usage); } } else { last_slash = tmp; } that_wd = getcwd(NULL, 0); if (!that_wd) { fprintf(stderr, "getcwd() failed: %m\n"); exit(E_usage); } if (!strcmp("/", that_wd)) m_asprintf(&abs_path, "/%s", last_slash); else m_asprintf(&abs_path, "%s/%s", that_wd, last_slash); free(that_wd); if (cwd_fd >= 0) { if (fchdir(cwd_fd) < 0) { fprintf(stderr, "fchdir() failed: %m\n"); exit(E_usage); } } return abs_path; } void assign_command_names_from_argv0(char **argv) { struct cmd_helper { char *name; char **var; }; static struct cmd_helper helpers[] = { {"drbdsetup", &drbdsetup}, {"drbdmeta", &drbdmeta}, {"drbd-proxy-ctl", &drbd_proxy_ctl}, {"drbdadm-83", &drbdadm_83}, {NULL, NULL} }; struct cmd_helper *c; /* in case drbdadm is called with an absolute or 
relative pathname * look for the drbdsetup binary in the same location, * otherwise, just let execvp sort it out... */ if ((progname = strrchr(argv[0], '/')) == NULL) { progname = argv[0]; for (c = helpers; c->name; ++c) *(c->var) = strdup(c->name); } else { size_t len_dir, l; ++progname; len_dir = progname - argv[0]; for (c = helpers; c->name; ++c) { l = len_dir + strlen(c->name) + 1; *(c->var) = malloc(l); if (*(c->var)) { strncpy(*(c->var), argv[0], len_dir); strcpy(*(c->var) + len_dir, c->name); if (access(*(c->var), X_OK)) strcpy(*(c->var), c->name); /* see add_lib_drbd_to_path() */ } } /* for pretty printing, truncate to basename */ argv[0] = progname; } } static void recognize_all_drbdsetup_options(void) { int i; for (i = 0; i < ARRAY_SIZE(cmds); i++) { const struct adm_cmd *cmd = &cmds[i]; const struct field_def *field; if (!cmd->drbdsetup_ctx) continue; for (field = cmd->drbdsetup_ctx->fields; field->name; field++) { struct option opt; int n; field_to_option(field, &opt); for (n = 0; admopt[n].name; n++) { if (!strcmp(admopt[n].name, field->name)) { if (admopt[n].val == 257) assert (admopt[n].has_arg == opt.has_arg); else { fprintf(stderr, "Warning: drbdsetup %s option --%s " "can only be passed as -W--%s\n", cmd->name, admopt[n].name, admopt[n].name); goto skip; } } } if (admopt == general_admopt) { admopt = malloc((n + 2) * sizeof(*admopt)); memcpy(admopt, general_admopt, (n + 1) * sizeof(*admopt)); } else admopt = realloc(admopt, (n + 2) * sizeof(*admopt)); memcpy(&admopt[n+1], &admopt[n], sizeof(*admopt)); admopt[n] = opt; skip: /* dummy statement required because of label */ ; } } } struct adm_cmd *find_cmd(char *cmdname); int parse_options(int argc, char **argv, struct adm_cmd **cmd, char ***resource_names) { const char *optstring = make_optstring(admopt); int longindex, first_arg_index; int i; *cmd = NULL; *resource_names = malloc(sizeof(char *)); (*resource_names)[0] = NULL; opterr = 1; optind = 0; while (1) { int c; c = getopt_long(argc, argv, 
optstring, admopt, &longindex); if (c == -1) break; switch (c) { case 257: /* drbdsetup option */ { struct option *option = &admopt[longindex]; char *opt; int len; len = strlen(option->name) + 2; if (optarg) len += 1 + strlen(optarg); opt = malloc(len + 1); if (optarg) sprintf(opt, "--%s=%s", option->name, optarg); else sprintf(opt, "--%s", option->name); add_setup_option(false, opt); } break; case 'S': is_drbd_top = 1; break; case 'v': verbose++; break; case 'd': dry_run++; break; case 'c': if (!strcmp(optarg, "-")) { yyin = stdin; if (asprintf(&config_file, "STDIN") < 0) { fprintf(stderr, "asprintf(config_file): %m\n"); return 20; } config_from_stdin = 1; } else { yyin = fopen(optarg, "r"); if (!yyin) { fprintf(stderr, "Can not open '%s'.\n", optarg); exit(E_exec_error); } if (asprintf(&config_file, "%s", optarg) < 0) { fprintf(stderr, "asprintf(config_file): %m\n"); return 20; } } break; case 't': config_test = optarg; break; case 's': { char *pathes[2]; pathes[0] = optarg; pathes[1] = 0; find_drbdcmd(&drbdsetup, pathes); } break; case 'm': { char *pathes[2]; pathes[0] = optarg; pathes[1] = 0; find_drbdcmd(&drbdmeta, pathes); } break; case 'p': { char *pathes[2]; pathes[0] = optarg; pathes[1] = 0; find_drbdcmd(&drbd_proxy_ctl, pathes); } break; case 'n': { char *c; int shell_var_name_ok = 1; for (c = optarg; *c && shell_var_name_ok; c++) { switch (*c) { case 'a'...'z': case 'A'...'Z': case '0'...'9': case '_': break; default: shell_var_name_ok = 0; } } if (shell_var_name_ok) sh_varname = optarg; else fprintf(stderr, "ignored --sh-varname=%s: " "contains suspect characters, allowed set is [a-zA-Z0-9_]\n", optarg); } break; case 'V': printf("DRBDADM_BUILDTAG=%s\n", shell_escape(drbd_buildtag())); printf("DRBDADM_API_VERSION=%u\n", API_VERSION); printf("DRBD_KERNEL_VERSION_CODE=0x%06x\n", version_code_kernel()); printf("DRBDADM_VERSION_CODE=0x%06x\n", version_code_userland()); printf("DRBDADM_VERSION=%s\n", shell_escape(REL_VERSION)); exit(0); break; case 'P': 
connect_to_host = optarg; break; case 'W': add_setup_option(true, optarg); break; case 'h': help = true; break; case '?': goto help; } } first_arg_index = optind; for (; optind < argc; optind++) { optarg = argv[optind]; if (*cmd) { int n; for (n = 0; (*resource_names)[n]; n++) /* do nothing */ ; *resource_names = realloc(*resource_names, (n + 2) * sizeof(char *)); (*resource_names)[n++] = optarg; (*resource_names)[n] = NULL; } else if (!strcmp(optarg, "help")) help = true; else { *cmd = find_cmd(optarg); if (!*cmd) { /* Passing drbdsetup options like this is discouraged! */ add_setup_option(true, optarg); } } } if (help) print_usage_and_exit(*cmd, 0, 0); if (*cmd == NULL) { if (first_arg_index < argc) { fprintf(stderr, "%s: Unknown command '%s'\n", progname, argv[first_arg_index]); return E_usage; } print_usage_and_exit(*cmd, "No command specified", E_usage); } if (setup_options) { /* * The drbdsetup options are command specific. Make sure that only * setup options that this command recognizes are used. 
*/ for (i = 0; setup_options[i].option; i++) { const struct field_def *field; const char *option; int len; if (setup_options[i].explicit) continue; option = setup_options[i].option; for (len = 0; option[len]; len++) if (option[len] == '=') break; field = NULL; if (option[0] == '-' && option[1] == '-' && (*cmd)->drbdsetup_ctx) { for (field = (*cmd)->drbdsetup_ctx->fields; field->name; field++) { if (strlen(field->name) == len - 2 && !strncmp(option + 2, field->name, len - 2)) break; } if (!field->name) field = NULL; } if (!field) { fprintf(stderr, "%s: unrecognized option '%.*s'\n", progname, len, option); goto help; } } } return 0; help: if (*cmd) fprintf(stderr, "try '%s help %s'\n", progname, (*cmd)->name); else fprintf(stderr, "try '%s help'\n", progname); return E_usage; } static void substitute_deprecated_cmd(char **c, char *deprecated, char *substitution) { if (!strcmp(*c, deprecated)) { fprintf(stderr, "'%s %s' is deprecated, use '%s %s' instead.\n", progname, deprecated, progname, substitution); *c = substitution; } } struct adm_cmd *find_cmd(char *cmdname) { struct adm_cmd *cmd = NULL; unsigned int i; if (!strcmp("hidden-commands", cmdname)) { // before parsing the configuration file... hidden_cmds(NULL); exit(0); } /* R_PRIMARY / R_SECONDARY is not a state, but a role. Whatever that * means, actually. But anyways, we decided to start using _role_ as * the terminus of choice, and deprecate "state". */ substitute_deprecated_cmd(&cmdname, "state", "role"); /* "outdate-peer" got renamed to fence-peer, * it is not required to actually outdate the peer, * depending on situation it may be sufficient to power-reset it * or do some other fencing action, or even call out to "meatware". * The name of the handler should not imply something that is not done. 
*/ substitute_deprecated_cmd(&cmdname, "outdate-peer", "fence-peer"); for (i = 0; i < ARRAY_SIZE(cmds); i++) { if (!strcmp(cmds[i].name, cmdname)) { cmd = cmds + i; break; } } return cmd; } char *config_file_from_arg(char *arg) { char *f; int minor = minor_by_id(arg); if (minor >= 0) { f = lookup_minor(minor); if (!f) { fprintf(stderr, "Don't know which config file belongs " "to minor %d, trying default ones...\n", minor); return NULL; } } else { f = lookup_resource(arg); if (!f) { fprintf(stderr, "Don't know which config file belongs " "to resource %s, trying default " "ones...\n", arg); return NULL; } } yyin = fopen(f, "r"); if (yyin == NULL) { /* report the file we actually tried to open; config_file is still unset here */ fprintf(stderr, "Couldn't open file %s for reading, reason: %m\n" "trying default config file...\n", f); return NULL; } return f; } void assign_default_config_file(void) { int i; for (i = 0; conf_file[i]; i++) { yyin = fopen(conf_file[i], "r"); if (yyin) { config_file = conf_file[i]; break; } } if (!config_file) { fprintf(stderr, "Can not open '%s': %m\n", conf_file[i - 1]); exit(E_config_invalid); } } void count_resources_or_die(void) { int m, mc = global_options.minor_count; struct d_resource *res, *tmp; struct d_volume *vol; highest_minor = 0; number_of_minors = 0; for_each_resource(res, tmp, config) { if (res->ignore) { nr_resources[IGNORED]++; /* How can we count ignored volumes? * Do we want to? */ continue; } else if (res->stacked) nr_resources[STACKED]++; else nr_resources[NORMAL]++; for_each_volume(vol, res->me->volumes) { number_of_minors++; m = vol->device_minor; if (m > highest_minor) highest_minor = m; if (res->stacked) nr_volumes[STACKED]++; /* res->ignored won't come here */ else nr_volumes[NORMAL]++; } } // Just for the case that minor_of_res() returned 0 for all devices. 
if (nr_volumes[NORMAL]+nr_volumes[STACKED] > (highest_minor + 1)) highest_minor = nr_volumes[NORMAL] + nr_volumes[STACKED] -1; if (mc && mc < (highest_minor + 1)) { fprintf(stderr, "The highest minor you have in your config is %d, " "but a minor_count of %d in your config!\n", highest_minor, mc); exit(E_usage); } } void die_if_no_resources(void) { if (!is_drbd_top && nr_resources[IGNORED] > 0 && nr_resources[NORMAL] == 0) { fprintf(stderr, "WARN: no normal resources defined for this host (%s)!?\n" "Misspelled name of the local machine with the 'on' keyword ?\n", nodeinfo.nodename); exit(E_usage); } if (!is_drbd_top && nr_resources[NORMAL] == 0) { fprintf(stderr, "WARN: no normal resources defined for this host (%s)!?\n", nodeinfo.nodename); exit(E_usage); } if (is_drbd_top && nr_resources[STACKED] == 0) { fprintf(stderr, "WARN: nothing stacked for this host (%s), " "nothing to do in stacked mode!\n", nodeinfo.nodename); exit(E_usage); } } void print_dump_xml_header(void) { printf("\n", config_save); ++indent; dump_global_info_xml(); dump_common_info_xml(); } void print_dump_header(void) { printf("# %s\n", config_save); dump_global_info(); dump_common_info(); } int main(int argc, char **argv) { size_t i; int rv = 0; struct adm_cmd *cmd = NULL; char **resource_names = NULL; struct d_resource *res, *tmp; char *env_drbd_nodename = NULL; int is_dump_xml; int is_dump; struct cfg_ctx ctx = { .arg = NULL }; yyin = NULL; uname(&nodeinfo); /* FIXME maybe fold to lower case ? 
*/ no_tty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout))); env_drbd_nodename = getenv("__DRBD_NODE__"); if (env_drbd_nodename && *env_drbd_nodename) { strncpy(nodeinfo.nodename, env_drbd_nodename, sizeof(nodeinfo.nodename) - 1); nodeinfo.nodename[sizeof(nodeinfo.nodename) - 1] = 0; fprintf(stderr, "\n" " found __DRBD_NODE__ in environment\n" " PRETENDING that I am >>%s<<\n\n", nodeinfo.nodename); } assign_command_names_from_argv0(argv); if (drbdsetup == NULL || drbdmeta == NULL || drbd_proxy_ctl == NULL) { fprintf(stderr, "could not strdup argv[0].\n"); exit(E_exec_error); } if (!getenv("DRBD_DONT_WARN_ON_VERSION_MISMATCH")) warn_on_version_mismatch(); maybe_exec_drbdadm_83(argv); recognize_all_drbdsetup_options(); rv = parse_options(argc, argv, &cmd, &resource_names); if (rv) return rv; if (config_test && !cmd->test_config) { fprintf(stderr, "The --config-to-test (-t) option is only allowed " "with the dump and sh-nop commands\n"); exit(E_usage); } do_verify_ips = cmd->verify_ips; is_dump_xml = (cmd->function == adm_dump_xml); is_dump = (is_dump_xml || cmd->function == adm_dump); if (!resource_names[0]) { if (is_dump) all_resources = 1; else if (cmd->res_name_required) print_usage_and_exit(cmd, "No resource names specified", E_usage); } else if (resource_names[0] && resource_names[1]) { if (!cmd->res_name_required) fprintf(stderr, "This command will ignore resource names!\n"); else if (cmd->use_cached_config_file) fprintf(stderr, "You should not use this command with multiple resources!\n"); } if (!config_file && cmd->use_cached_config_file) config_file = config_file_from_arg(resource_names[0]); if (!config_file) /* may exit if no config file can be used! */ assign_default_config_file(); /* for error-reporting reasons config_file may be re-assigned by adm_adjust, * we need the current value for register_minor, though. * save that. 
*/ if (config_from_stdin) config_save = config_file; else config_save = canonify_path(config_file); my_parse(); if (config_test) { char *saved_config_file = config_file; char *saved_config_save = config_save; config_file = config_test; config_save = canonify_path(config_test); fclose(yyin); yyin = fopen(config_test, "r"); if (!yyin) { fprintf(stderr, "Can not open '%s'.\n", config_test); exit(E_exec_error); } my_parse(); config_file = saved_config_file; config_save = saved_config_save; } if (!config_valid) exit(E_config_invalid); post_parse(config, cmd->is_proxy_cmd ? MATCH_ON_PROXY : 0); if (!is_dump || dry_run || verbose) expand_common(); if (dry_run || config_from_stdin) do_register = 0; count_resources_or_die(); if (cmd->uc_dialog) uc_node(global_options.usage_count); ctx.arg = cmd->name; if (cmd->res_name_required) { if (config == NULL) { fprintf(stderr, "no resources defined!\n"); exit(E_usage); } global_validate_maybe_expand_die_if_invalid(!is_dump, cmd->is_proxy_cmd ? MATCH_ON_PROXY : 0); if (!resource_names[0] || !strcmp(resource_names[0], "all")) { /* either no resource arguments at all, * but command is dump / dump-xml, so implicit "all", * or an explicit "all" argument is given */ all_resources = 1; if (!is_dump) die_if_no_resources(); /* verify ips first, for all of them */ for_each_resource(res, tmp, config) { verify_ips(res); } if (!config_valid) exit(E_config_invalid); if (is_dump_xml) print_dump_xml_header(); else if (is_dump) print_dump_header(); for_each_resource(res, tmp, config) { if (!is_dump && res->ignore) continue; if (!is_dump && is_drbd_top != res->stacked) continue; ctx.res = res; ctx.vol = NULL; int r = call_cmd(cmd, &ctx, EXIT_ON_FAIL); /* does exit for r >= 20! */ /* this super positioning of return values is soo ugly * anyone any better idea? 
*/ if (r > rv) rv = r; } if (is_dump_xml) { --indent; printf("\n"); } } else { /* explicit list of resources to work on */ for (i = 0; resource_names[i]; i++) { ctx.res = NULL; ctx.vol = NULL; ctx_by_name(&ctx, resource_names[i]); if (!ctx.res) ctx_by_minor(&ctx, resource_names[i]); if (!ctx.res) { fprintf(stderr, "'%s' not defined in your config (for this host).\n", resource_names[i]); exit(E_usage); } if (!cmd->vol_id_required && !cmd->iterate_volumes && ctx.vol != NULL) { if (ctx.vol->implicit) ctx.vol = NULL; else { fprintf(stderr, "%s operates on whole resources, but you specified a specific volume!\n", cmd->name); exit(E_usage); } } if (cmd->vol_id_required && !ctx.vol && ctx.res->me->volumes->implicit) ctx.vol = ctx.res->me->volumes; if (cmd->vol_id_required && !ctx.vol) { fprintf(stderr, "%s requires a specific volume id, but none is specified.\n" "Try '%s minor-' or '%s %s/'\n", cmd->name, cmd->name, cmd->name, resource_names[i]); exit(E_usage); } if (ctx.res->ignore && !is_dump) { fprintf(stderr, "'%s' ignored, since this host (%s) is not mentioned with an 'on' keyword.\n", ctx.res->name, nodeinfo.nodename); rv = E_usage; continue; } if (is_drbd_top != ctx.res->stacked && !is_dump) { fprintf(stderr, "'%s' is a %s resource, and not available in %s mode.\n", ctx.res->name, ctx.res->stacked ? "stacked" : "normal", is_drbd_top ? "stacked" : "normal"); rv = E_usage; continue; } verify_ips(ctx.res); if (!is_dump && !config_valid) exit(E_config_invalid); rv = call_cmd(cmd, &ctx, EXIT_ON_FAIL); /* does exit for rv >= 20! */ } } } else { // Commands which do not need a resource name /* no call_cmd, as that implies register_minor, * which does not make sense for resource independent commands. * It does also not need to iterate over volumes: it does not even know the resource. */ rv = cmd->function(&ctx); if (rv >= 10) { /* why do we special case the "generic sh-*" commands? 
*/ fprintf(stderr, "command %s exited with code %d\n", cmd->name, rv); exit(rv); } } /* do we really have to bitor the exit code? * it is even only a Boolean value in this case! */ rv |= run_deferred_cmds(); free_config(config); free(resource_names); if (admopt != general_admopt) free(admopt); return rv; } void yyerror(char *text) { fprintf(stderr, "%s:%d: %s\n", config_file, line, text); exit(E_syntax); } drbd-8.4.4/user/drbdadm_parser.c0000664000000000000000000014623412221331365015245 0ustar rootroot/* * drbdadm_parser.c a hand crafted parser This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2006-2008, LINBIT Information Technologies GmbH Copyright (C) 2006-2008, Philipp Reisner Copyright (C) 2006-2008, Lars Ellenberg drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. 
*/ #define _GNU_SOURCE #define _XOPEN_SOURCE 600 #define _FILE_OFFSET_BITS 64 #include #include #include #include #include #include #include #include #include "drbdadm.h" #include "linux/drbd_limits.h" #include "drbdtool_common.h" #include "drbdadm_parser.h" YYSTYPE yylval; ///////////////////// static int c_section_start; static int parse_proxy_options(struct d_option **, struct d_option **); void my_parse(void); struct d_name *names_from_str(char* str) { struct d_name *names; names = malloc(sizeof(struct d_name)); names->next = NULL; names->name = strdup(str); return names; } char *_names_to_str_c(char* buffer, struct d_name *names, char c) { int n = 0; if (!names) { snprintf(buffer, NAMES_STR_SIZE, "UNKNOWN"); return buffer; } while (1) { n += snprintf(buffer + n, NAMES_STR_SIZE - n, "%s", names->name); names = names->next; if (!names) return buffer; n += snprintf(buffer + n, NAMES_STR_SIZE - n, "%c", c); } } char *_names_to_str(char* buffer, struct d_name *names) { return _names_to_str_c(buffer, names, ' '); } int name_in_names(char *name, struct d_name *names) { while (names) { if (!strcmp(names->name, name)) return 1; names = names->next; } return 0; } void free_names(struct d_name *names) { struct d_name *nf; while (names) { nf = names->next; free(names->name); free(names); names = nf; } } static void append_names(struct d_name **head, struct d_name ***last, struct d_name *to_copy) { struct d_name *new; while (to_copy) { new = malloc(sizeof(struct d_name)); if (!*head) *head = new; new->name = strdup(to_copy->name); new->next = NULL; if (*last) **last = new; *last = &new->next; to_copy = to_copy->next; } } struct d_name *concat_names(struct d_name *to_copy1, struct d_name *to_copy2) { struct d_name *head = NULL, **last = NULL; append_names(&head, &last, to_copy1); append_names(&head, &last, to_copy2); return head; } void m_strtoll_range(const char *s, char def_unit, const char *name, unsigned long long min, unsigned long long max) { unsigned long long r = 
m_strtoll(s, def_unit); char unit[] = { def_unit != '1' ? def_unit : 0, 0 }; if (min > r || r > max) { fprintf(stderr, "%s:%d: %s %s => %llu%s out of range [%llu..%llu]%s.\n", config_file, fline, name, s, r, unit, min, max, unit); if (config_valid <= 1) { config_valid = 0; return; } } if (DEBUG_RANGE_CHECK) { fprintf(stderr, "%s:%d: %s %s => %llu%s in range [%llu..%llu]%s.\n", config_file, fline, name, s, r, unit, min, max, unit); } } void range_check(const enum range_checks what, const char *name, char *value) { char proto = 0; /* * FIXME: Handle signed/unsigned values correctly by checking the * F_field_name_IS_SIGNED defines. */ #define M_STRTOLL_RANGE(x) \ m_strtoll_range(value, DRBD_ ## x ## _SCALE, name, \ DRBD_ ## x ## _MIN, \ DRBD_ ## x ## _MAX) switch (what) { case R_NO_CHECK: break; default: fprintf(stderr, "%s:%d: unknown range for %s => %s\n", config_file, fline, name, value); break; case R_MINOR_COUNT: M_STRTOLL_RANGE(MINOR_COUNT); break; case R_DIALOG_REFRESH: M_STRTOLL_RANGE(DIALOG_REFRESH); break; case R_DISK_SIZE: M_STRTOLL_RANGE(DISK_SIZE); break; case R_TIMEOUT: M_STRTOLL_RANGE(TIMEOUT); break; case R_CONNECT_INT: M_STRTOLL_RANGE(CONNECT_INT); break; case R_PING_INT: M_STRTOLL_RANGE(PING_INT); break; case R_MAX_BUFFERS: M_STRTOLL_RANGE(MAX_BUFFERS); break; case R_MAX_EPOCH_SIZE: M_STRTOLL_RANGE(MAX_EPOCH_SIZE); break; case R_SNDBUF_SIZE: M_STRTOLL_RANGE(SNDBUF_SIZE); break; case R_RCVBUF_SIZE: M_STRTOLL_RANGE(RCVBUF_SIZE); break; case R_KO_COUNT: M_STRTOLL_RANGE(KO_COUNT); break; case R_RATE: M_STRTOLL_RANGE(RESYNC_RATE); break; case R_AL_EXTENTS: M_STRTOLL_RANGE(AL_EXTENTS); break; case R_PORT: M_STRTOLL_RANGE(PORT); break; /* FIXME not yet implemented! 
case R_META_IDX: M_STRTOLL_RANGE(META_IDX); break; */ case R_WFC_TIMEOUT: M_STRTOLL_RANGE(WFC_TIMEOUT); break; case R_DEGR_WFC_TIMEOUT: M_STRTOLL_RANGE(DEGR_WFC_TIMEOUT); break; case R_OUTDATED_WFC_TIMEOUT: M_STRTOLL_RANGE(OUTDATED_WFC_TIMEOUT); break; case R_C_PLAN_AHEAD: M_STRTOLL_RANGE(C_PLAN_AHEAD); break; case R_C_DELAY_TARGET: M_STRTOLL_RANGE(C_DELAY_TARGET); break; case R_C_FILL_TARGET: M_STRTOLL_RANGE(C_FILL_TARGET); break; case R_C_MAX_RATE: M_STRTOLL_RANGE(C_MAX_RATE); break; case R_C_MIN_RATE: M_STRTOLL_RANGE(C_MIN_RATE); break; case R_CONG_FILL: M_STRTOLL_RANGE(CONG_FILL); break; case R_CONG_EXTENTS: M_STRTOLL_RANGE(CONG_EXTENTS); break; case R_PROTOCOL: if (value && value[0] && value[1] == 0) { proto = value[0] & ~0x20; /* toupper */ if (proto == 'A' || proto == 'B' || proto == 'C') value[0] = proto; else proto = 0; } if (!proto && config_valid <= 1) { config_valid = 0; fprintf(stderr, "unknown protocol '%s', should be one of A,B,C\n", value); } } } struct d_option *new_opt(char *name, char *value) { struct d_option *cn = calloc(1, sizeof(struct d_option)); /* fprintf(stderr,"%s:%d: %s = %s\n",config_file,line,name,value); */ cn->name = name; cn->value = value; return cn; } static void derror(struct d_host_info *host, struct d_resource *res, char *text) { config_valid = 0; fprintf(stderr, "%s:%d: in resource %s, on %s { ... }:" " '%s' keyword missing.\n", config_file, c_section_start, res->name, names_to_str(host->on_hosts), text); } void pdperror(char *text) { config_valid = 0; fprintf(stderr, "%s:%d: in proxy plugin section: %s.\n", config_file, line, text); exit(E_config_invalid); } static void pperror(struct d_host_info *host, struct d_proxy_info *proxy, char *text) { config_valid = 0; fprintf(stderr, "%s:%d: in section: on %s { proxy on %s { ... 
} }:" " '%s' keyword missing.\n", config_file, c_section_start, names_to_str(host->on_hosts), names_to_str(proxy->on_hosts), text); } #define typecheck(type,x) \ ({ type __dummy; \ typeof(x) __dummy2; \ (void)(&__dummy == &__dummy2); \ 1; \ }) #define for_each_host(h_,hosts_) \ for ( ({ typecheck(struct d_name*, h_); \ h_ = hosts_; }); \ h_; h_ = h_->next) /* * for check_uniq: check uniqueness of * resource names, ip:port, node:disk and node:device combinations * as well as resource:section ... * hash table to test for uniqueness of these values... * 256 (max minors) * *( * 2 (host sections) * 4 (res ip:port node:disk node:device) * + 4 (other sections) * + some more, * if we want to check for scoped uniqueness of *every* option * ) * since nobody (?) will actually use more than a dozen minors, * this should be more than enough. */ struct hsearch_data global_htable; void check_uniq_init(void) { memset(&global_htable, 0, sizeof(global_htable)); if (!hcreate_r(256 * ((2 * 4) + 4), &global_htable)) { fprintf(stderr, "Insufficient memory.\n"); exit(E_exec_error); }; } /* some settings need only be unique within one resource definition. * we need currently about 8 + (number of host) * 8 entries, * 200 should be much more than enough. 
*/ struct hsearch_data per_resource_htable; void check_upr_init(void) { static int created = 0; if (config_valid >= 2) return; if (created) hdestroy_r(&per_resource_htable); memset(&per_resource_htable, 0, sizeof(per_resource_htable)); if (!hcreate_r(256, &per_resource_htable)) { fprintf(stderr, "Insufficient memory.\n"); exit(E_exec_error); }; created = 1; } /* FIXME * strictly speaking we don't need to check for uniqueness of disk and device names, * but for uniqueness of their major:minor numbers ;-) */ int vcheck_uniq(struct hsearch_data *ht, const char *what, const char *fmt, va_list ap) { int rv; ENTRY e, *ep; e.key = e.data = ep = NULL; /* if we are done parsing the config file, * switch off this paranoia */ if (config_valid >= 2) return 1; rv = vasprintf(&e.key, fmt, ap); if (rv < 0) { perror("vasprintf"); exit(E_thinko); } if (EXIT_ON_CONFLICT && !what) { fprintf(stderr, "Oops, unset argument in %s:%d.\n", __FILE__, __LINE__); exit(E_thinko); } m_asprintf((char **)&e.data, "%s:%u", config_file, fline); hsearch_r(e, FIND, &ep, ht); //fprintf(stderr, "FIND %s: %p\n", e.key, ep); if (ep) { if (what) { fprintf(stderr, "%s: conflicting use of %s '%s' ...\n" "%s: %s '%s' first used here.\n", (char *)e.data, what, ep->key, (char *)ep->data, what, ep->key); } free(e.key); free(e.data); config_valid = 0; } else { //fprintf(stderr, "ENTER %s\t=>\t%s\n", e.key, (char *)e.data); hsearch_r(e, ENTER, &ep, ht); if (!ep) { fprintf(stderr, "hash table entry (%s => %s) failed\n", e.key, (char *)e.data); exit(E_thinko); } ep = NULL; } if (EXIT_ON_CONFLICT && ep) exit(E_config_invalid); return !ep; } int check_uniq(const char *what, const char *fmt, ...) { int rv; va_list ap; va_start(ap, fmt); rv = vcheck_uniq(&global_htable, what, fmt, ap); va_end(ap); return rv; } /* unique per resource */ int check_upr(const char *what, const char *fmt, ...) 
{ int rv; va_list ap; va_start(ap, fmt); rv = vcheck_uniq(&per_resource_htable, what, fmt, ap); va_end(ap); return rv; } void check_meta_disk(struct d_volume *vol, struct d_host_info *host) { struct d_name *h; /* when parsing "drbdsetup show[-all]" output, * a detached volume will only have device/minor, * but no disk or meta disk. */ if (vol->meta_disk == NULL) return; if (strcmp(vol->meta_disk, "internal") != 0) { /* index either some number, or "flexible" */ for_each_host(h, host->on_hosts) check_uniq("meta-disk", "%s:%s[%s]", h->name, vol->meta_disk, vol->meta_index); } } static void pe_expected(const char *exp) { const char *s = yytext; fprintf(stderr, "%s:%u: Parse error: '%s' expected,\n\t" "but got '%.20s%s'\n", config_file, line, exp, s, strlen(s) > 20 ? "..." : ""); exit(E_config_invalid); } static void check_string_error(int got) { const char *msg; switch(got) { case TK_ERR_STRING_TOO_LONG: msg = "Token too long"; break; case TK_ERR_DQSTRING_TOO_LONG: msg = "Double quoted string too long"; break; case TK_ERR_DQSTRING: msg = "Unterminated double quoted string\n we don't allow embedded newlines\n "; break; default: return; } fprintf(stderr,"%s:%u: %s >>>%.20s...<<<\n", config_file, line, msg, yytext); exit(E_config_invalid); } static void pe_expected_got(const char *exp, int got) { static char tmp[2] = "\0"; const char *s = yytext; if (exp[0] == '\'' && exp[1] && exp[2] == '\'' && exp[3] == 0) { tmp[0] = exp[1]; } fprintf(stderr, "%s:%u: Parse error: '%s' expected,\n\t" "but got '%.20s%s' (TK %d)\n", config_file, line, tmp[0] ? tmp : exp, s, strlen(s) > 20 ? "..." 
: "", got); exit(E_config_invalid); } #define EXP(TOKEN1) \ ({ \ int token; \ token = yylex(); \ if (token != TOKEN1) { \ if (TOKEN1 == TK_STRING) \ check_string_error(token); \ pe_expected_got( #TOKEN1, token); \ } \ token; \ }) static void expect_STRING_or_INT(void) { int token = yylex(); switch(token) { case TK_INTEGER: case TK_STRING: break; case TK_ON: yylval.txt = strdup(yytext); break; default: check_string_error(token); pe_expected_got("TK_STRING | TK_INTEGER", token); } } static void parse_global(void) { fline = line; check_uniq("global section", "global"); if (config) { fprintf(stderr, "%s:%u: You should put the global {} section\n\t" "in front of any resource {} section\n", config_file, line); } EXP('{'); while (1) { int token = yylex(); fline = line; switch (token) { case TK_DISABLE_IP_VERIFICATION: global_options.disable_ip_verification = 1; break; case TK_MINOR_COUNT: EXP(TK_INTEGER); range_check(R_MINOR_COUNT, "minor-count", yylval.txt); global_options.minor_count = atoi(yylval.txt); break; case TK_DIALOG_REFRESH: EXP(TK_INTEGER); range_check(R_DIALOG_REFRESH, "dialog-refresh", yylval.txt); global_options.dialog_refresh = atoi(yylval.txt); break; case TK_USAGE_COUNT: switch (yylex()) { case TK_YES: global_options.usage_count = UC_YES; break; case TK_NO: global_options.usage_count = UC_NO; break; case TK_ASK: global_options.usage_count = UC_ASK; break; default: pe_expected("yes | no | ask"); } break; case '}': return; default: pe_expected("dialog-refresh | minor-count | " "disable-ip-verification"); } EXP(';'); } } static void check_and_change_deprecated_alias(char **name, int token) { int i; static struct { enum yytokentype token; char *old_name, *new_name; } table[] = { { TK_HANDLER_OPTION, "outdate-peer", "fence-peer" }, { TK_DISK_OPTION, "rate", "resync-rate" }, { TK_DISK_OPTION, "after", "resync-after" }, }; for (i = 0; i < ARRAY_SIZE(table); i++) { if (table[i].token == token && !strcmp(table[i].old_name, *name)) { free(*name); *name = 
strdup(table[i].new_name); } } } /* The syncer section is deprecated. Distribute the options to the disk or net options. */ void parse_options_syncer(struct d_resource *res) { char *opt_name; int token; enum range_checks rc; struct d_option **options = NULL, *current_option = NULL; c_section_start = line; fline = line; while (1) { token = yylex(); fline = line; if (token >= TK_GLOBAL && !(token & TK_SYNCER_OLD_OPT)) pe_expected("a syncer option keyword"); token &= ~TK_SYNCER_OLD_OPT; switch (token) { case TK_NET_FLAG: case TK_NET_NO_FLAG: case TK_NET_OPTION: options = &res->net_options; break; case TK_DISK_FLAG: case TK_DISK_NO_FLAG: case TK_DISK_OPTION: options = &res->disk_options; break; case TK_RES_OPTION: options = &res->res_options; break; case '}': return; default: pe_expected("a syncer option keyword"); } opt_name = yylval.txt; switch (token) { case TK_NET_FLAG: case TK_DISK_FLAG: token = yylex(); switch(token) { case TK_NO: current_option = new_opt(opt_name, strdup("no")); *options = APPEND(*options, current_option); token = yylex(); break; default: current_option = new_opt(opt_name, strdup("yes")); *options = APPEND(*options, current_option); if (token == TK_YES) token = yylex(); break; } break; case TK_NET_NO_FLAG: case TK_DISK_NO_FLAG: /* Backward compatibility with the old config file syntax. 
*/ assert(!strncmp(opt_name, "no-", 3)); current_option = new_opt(strdup(opt_name + 3), strdup("no")); *options = APPEND(*options, current_option); free(opt_name); token = yylex(); break; case TK_NET_OPTION: case TK_DISK_OPTION: case TK_RES_OPTION: check_and_change_deprecated_alias(&opt_name, token); rc = yylval.rc; expect_STRING_or_INT(); range_check(rc, opt_name, yylval.txt); current_option = new_opt(opt_name, yylval.txt); *options = APPEND(*options, current_option); token = yylex(); break; } switch (token) { case ';': break; default: pe_expected(";"); } } } static struct d_option *parse_options_d(int token_flag, int token_no_flag, int token_option, int token_delegate, void (*delegate)(void*), void *ctx) { char *opt_name; int token, token_group; enum range_checks rc; struct d_option *options = NULL, *current_option = NULL; c_section_start = line; fline = line; while (1) { token_group = yylex(); /* Keep the higher bits in token_option, remove them from token. */ token = REMOVE_GROUP_FROM_TOKEN(token_group); fline = line; opt_name = yylval.txt; if (token == token_flag) { switch(yylex()) { case TK_YES: current_option = new_opt(opt_name, strdup("yes")); options = APPEND(options, current_option); break; case TK_NO: current_option = new_opt(opt_name, strdup("no")); options = APPEND(options, current_option); break; case ';': /* Flag value missing; assume yes. */ options = APPEND(options, new_opt(opt_name, strdup("yes"))); continue; default: pe_expected("yes | no | ;"); } } else if (token == token_no_flag) { /* Backward compatibility with the old config file syntax. 
*/ assert(!strncmp(opt_name, "no-", 3)); current_option = new_opt(strdup(opt_name + 3), strdup("no")); options = APPEND(options, current_option); free(opt_name); } else if (token == token_option || GET_TOKEN_GROUP(token_option & token_group)) { check_and_change_deprecated_alias(&opt_name, token_option); rc = yylval.rc; expect_STRING_or_INT(); range_check(rc, opt_name, yylval.txt); current_option = new_opt(opt_name, yylval.txt); options = APPEND(options, current_option); } else if (token == token_delegate || GET_TOKEN_GROUP(token_delegate & token_group)) { delegate(ctx); continue; } else if (token == TK_DEPRECATED_OPTION) { /* fprintf(stderr, "Warn: Ignoring deprecated option '%s'\n", yylval.txt); */ expect_STRING_or_INT(); } else if (token == '}') { return options; } else { pe_expected("an option keyword"); } EXP(';'); } } static struct d_option *parse_options(int token_flag, int token_no_flag, int token_option) { return parse_options_d(token_flag, token_no_flag, token_option, 0, NULL, NULL); } static void __parse_address(char** addr, char** port, char** af) { switch(yylex()) { case TK_SCI: /* 'ssocks' was named 'sci' before.
*/ if (af) *af = strdup("ssocks"); EXP(TK_IPADDR); break; case TK_SSOCKS: case TK_SDP: case TK_IPV4: if (af) *af = yylval.txt; EXP(TK_IPADDR); break; case TK_IPV6: if (af) *af = yylval.txt; EXP('['); EXP(TK_IPADDR6); break; case TK_IPADDR: if (af) *af = strdup("ipv4"); break; /* case '[': // Do not foster people's laziness ;) EXP(TK_IPADDR6); *af = strdup("ipv6"); break; */ default: pe_expected("ssocks | sdp | ipv4 | ipv6 | "); } if (addr) *addr = yylval.txt; if (af && !strcmp(*af, "ipv6")) EXP(']'); EXP(':'); EXP(TK_INTEGER); if (port) *port = yylval.txt; range_check(R_PORT, "port", yylval.txt); } static void parse_address(struct d_name *on_hosts, char** addr, char** port, char** af) { struct d_name *h; __parse_address(addr, port, af); if (!strcmp(*addr, "127.0.0.1") || !strcmp(*addr, "::1")) for_each_host(h, on_hosts) check_uniq("IP", "%s:%s:%s", h->name, *addr, *port); else check_uniq("IP", "%s:%s", *addr, *port); EXP(';'); } static void parse_hosts(struct d_name **pnp, char delimeter) { char errstr[20]; struct d_name *name; int hosts = 0; int token; while (1) { token = yylex(); switch (token) { case TK_STRING: name = malloc(sizeof(struct d_name)); name->name = yylval.txt; name->next = NULL; *pnp = name; pnp = &name->next; hosts++; break; default: if (token == delimeter) { if (!hosts) pe_expected_got("TK_STRING", token); return; } else { sprintf(errstr, "TK_STRING | '%c'", delimeter); pe_expected_got(errstr, token); } } } } static void parse_proxy_section(struct d_host_info *host) { struct d_proxy_info *proxy; proxy=calloc(1,sizeof(struct d_proxy_info)); host->proxy = proxy; EXP(TK_ON); parse_hosts(&proxy->on_hosts, '{'); while (1) { switch (yylex()) { case TK_INSIDE: parse_address(proxy->on_hosts, &proxy->inside_addr, &proxy->inside_port, &proxy->inside_af); break; case TK_OUTSIDE: parse_address(proxy->on_hosts, &proxy->outside_addr, &proxy->outside_port, &proxy->outside_af); break; case TK_OPTIONS: parse_proxy_options(&proxy->options, &proxy->plugins); break; 
case '}': goto break_loop; default: pe_expected("inside | outside"); } } break_loop: if (!proxy->inside_addr) pperror(host, proxy, "inside"); if (!proxy->outside_addr) pperror(host, proxy, "outside"); return; } void parse_meta_disk(struct d_volume *vol) { EXP(TK_STRING); vol->meta_disk = yylval.txt; if (strcmp("internal", yylval.txt) == 0) { /* internal, flexible size */ vol->meta_index = strdup("internal"); EXP(';'); } else { switch(yylex()) { case '[': EXP(TK_INTEGER); /* external, static size */ vol->meta_index = yylval.txt; EXP(']'); EXP(';'); break; case ';': /* external, flexible size */ vol->meta_index = strdup("flexible"); break; default: pe_expected("[ | ;"); } } } static void check_minor_nonsense(const char *devname, const int explicit_minor) { if (!devname) return; /* if devname is set, it starts with /dev/drbd */ if (only_digits(devname + 9)) { int m = strtol(devname + 9, NULL, 10); if (m == explicit_minor) return; fprintf(stderr, "%s:%d: explicit minor number must match with device name\n" "\tTry \"device /dev/drbd%u minor %u;\",\n" "\tor leave off either device name or explicit minor.\n" "\tArbitrary device names must start with /dev/drbd_\n" "\tmind the '_'! (/dev/ is optional, but drbd_ is required)\n", config_file, fline, explicit_minor, explicit_minor); config_valid = 0; return; } else if (devname[9] == '_') return; fprintf(stderr, "%s:%d: arbitrary device name must start with /dev/drbd_\n" "\tmind the '_'! 
(/dev/ is optional, but drbd_ is required)\n", config_file, fline); config_valid = 0; return; } static void parse_device(struct d_name* on_hosts, struct d_volume *vol) { struct d_name *h; int m; switch (yylex()) { case TK_STRING: if (!strncmp("drbd", yylval.txt, 4)) { m_asprintf(&vol->device, "/dev/%s", yylval.txt); free(yylval.txt); } else vol->device = yylval.txt; if (strncmp("/dev/drbd", vol->device, 9)) { fprintf(stderr, "%s:%d: device name must start with /dev/drbd\n" "\t(/dev/ is optional, but drbd is required)\n", config_file, fline); config_valid = 0; /* no goto out yet, * as that would additionally throw a parse error */ } switch (yylex()) { default: pe_expected("minor | ;"); /* fall through */ case ';': m = dt_minor_of_dev(vol->device); if (m < 0) { fprintf(stderr, "%s:%d: no minor given nor device name contains a minor number\n", config_file, fline); config_valid = 0; } vol->device_minor = m; goto out; case TK_MINOR: ; /* double fall through */ } case TK_MINOR: EXP(TK_INTEGER); vol->device_minor = atoi(yylval.txt); EXP(';'); /* if both device name and minor number are explicitly given, * force /dev/drbd or /dev/drbd_ */ check_minor_nonsense(vol->device, vol->device_minor); } out: for_each_host(h, on_hosts) { check_uniq("device-minor", "device-minor:%s:%u", h->name, vol->device_minor); if (vol->device) check_uniq("device", "device:%s:%s", h->name, vol->device); } } struct d_volume *find_volume(struct d_volume *vol, int vnr) { while (vol) { if (vol->vnr == vnr) return vol; vol = vol->next; } return NULL; } struct d_volume *volume0(struct d_volume **volp) { struct d_volume *vol; if (!*volp) { vol = calloc(1, sizeof(struct d_volume)); vol->device_minor = -1; *volp = vol; vol->implicit = 1; return vol; } else { vol = *volp; if (vol->vnr == 0 && vol->next == NULL && vol->implicit) return vol; config_valid = 0; fprintf(stderr, "%s:%d: Explicit and implicit volumes not allowed\n", config_file, line); return vol; } } int parse_volume_stmt(struct d_volume *vol, 
struct d_name* on_hosts, int token) { switch (token) { case TK_DISK: token = yylex(); switch (token) { case TK_STRING: vol->disk = yylval.txt; EXP(';'); break; case '{': vol->disk_options = parse_options(TK_DISK_FLAG, TK_DISK_NO_FLAG, TK_DISK_OPTION); break; default: check_string_error(token); pe_expected_got( "TK_STRING | {", token); } break; case TK_DEVICE: parse_device(on_hosts, vol); break; case TK_META_DISK: parse_meta_disk(vol); break; case TK_FLEX_META_DISK: EXP(TK_STRING); vol->meta_disk = yylval.txt; if (strcmp("internal", yylval.txt) != 0) { /* external, flexible size */ vol->meta_index = strdup("flexible"); } else { /* internal, flexible size */ vol->meta_index = strdup("internal"); } EXP(';'); break; default: return 0; } return 1; } struct d_volume *parse_volume(int vnr, struct d_name* on_hosts) { struct d_volume *vol; int token; vol = calloc(1,sizeof(struct d_volume)); vol->device_minor = -1; vol->vnr = vnr; EXP('{'); while (1) { token = yylex(); if (token == '}') break; if (!parse_volume_stmt(vol, on_hosts, token)) pe_expected_got("device | disk | meta-disk | flex-meta-disk | }", token); } return vol; } struct d_volume *parse_stacked_volume(int vnr) { struct d_volume *vol; vol = calloc(1,sizeof(struct d_volume)); vol->device_minor = -1; vol->vnr = vnr; EXP('{'); EXP(TK_DEVICE); parse_device(NULL, vol); EXP('}'); vol->meta_disk = strdup("internal"); vol->meta_index = strdup("internal"); return vol; } void inherit_volumes(struct d_volume *from, struct d_host_info *host) { struct d_volume *s, *t; struct d_name *h; for (s = from; s != NULL ; s = s->next) { t = find_volume(host->volumes, s->vnr); if (!t) { t = calloc(1, sizeof(struct d_volume)); t->device_minor = -1; t->vnr = s->vnr; host->volumes = INSERT_SORTED(host->volumes, t, vnr); } if (!t->disk && s->disk) { t->disk = strdup(s->disk); for_each_host(h, host->on_hosts) check_uniq("disk", "disk:%s:%s", h->name, t->disk); } if (!t->device && s->device) t->device = strdup(s->device); if (t->device_minor
== -1U && s->device_minor != -1U) { t->device_minor = s->device_minor; for_each_host(h, host->on_hosts) check_uniq("device-minor", "device-minor:%s:%d", h->name, t->device_minor); } if (!t->meta_disk && s->meta_disk) { t->meta_disk = strdup(s->meta_disk); if (s->meta_index) t->meta_index = strdup(s->meta_index); } } } void check_volume_complete(struct d_resource *res, struct d_host_info *host, struct d_volume *vol) { if (!vol->device && vol->device_minor == -1U) derror(host, res, "device"); if (!vol->disk) derror(host, res, "disk"); if (!vol->meta_disk) derror(host, res, "meta-disk"); if (!vol->meta_index) derror(host, res, "meta-index"); } void check_volumes_complete(struct d_resource *res, struct d_host_info *host) { struct d_volume *vol = host->volumes; unsigned vnr = -1U; while (vol) { if (vnr == -1U || vnr < vol->vnr) vnr = vol->vnr; else fprintf(stderr, "internal error: in %s: unsorted volumes list\n", res->name); check_volume_complete(res, host, vol); vol = vol->next; } } void check_volume_sets_equal(struct d_resource *res, struct d_host_info *host1, struct d_host_info *host2) { struct d_volume *a, *b; /* change the error output, if we have been called to * compare stacked with lower resource volumes */ int compare_stacked = host1->lower && host1->lower->me == host2; a = host1->volumes; b = host2->volumes; /* volume lists are supposed to be sorted on vnr */ while (a || b) { while (a && (!b || a->vnr < b->vnr)) { fprintf(stderr, "%s:%d: in resource %s, on %s { ... }: " "volume %d not defined on %s\n", config_file, line, res->name, names_to_str(host1->on_hosts), a->vnr, compare_stacked ? host1->lower->name : names_to_str(host2->on_hosts)); a = a->next; config_valid = 0; } while (b && (!a || a->vnr > b->vnr)) { /* Though unusual, it is "legal" for a lower resource * to have more volumes than the resource stacked on * top of it. Warn (if we have a terminal), * but consider it as valid. 
*/ if (!(compare_stacked && no_tty)) fprintf(stderr, "%s:%d: in resource %s, on %s { ... }: " "volume %d missing (present on %s)\n", config_file, line, res->name, names_to_str(host1->on_hosts), b->vnr, compare_stacked ? host1->lower->name : names_to_str(host2->on_hosts)); if (!compare_stacked) config_valid = 0; b = b->next; } if (a && b && a->vnr == b->vnr) { a = a->next; b = b->next; } } } /* Ensure that in all host sections the same volumes are defined */ void check_volumes_hosts(struct d_resource *res) { struct d_host_info *host1, *host2; host1 = res->all_hosts; if (!host1) return; for (host2 = host1->next; host2; host2 = host2->next) check_volume_sets_equal(res, host1, host2); } enum parse_host_section_flags { REQUIRE_ALL = 1, BY_ADDRESS = 2, }; void parse_host_section(struct d_resource *res, struct d_name* on_hosts, enum parse_host_section_flags flags) { struct d_host_info *host; struct d_volume *vol; struct d_name *h; int in_braces = 1; c_section_start = line; fline = line; host = calloc(1,sizeof(struct d_host_info)); host->on_hosts = on_hosts; host->config_line = c_section_start; if (flags & BY_ADDRESS) { /* floating
{} */ char *fake_uname = NULL; int token; host->by_address = 1; __parse_address(&host->address, &host->port, &host->address_family); check_uniq("IP", "%s:%s", host->address, host->port); if (!strcmp(host->address_family, "ipv6")) m_asprintf(&fake_uname, "ipv6 [%s]:%s", host->address, host->port); else m_asprintf(&fake_uname, "%s:%s", host->address, host->port); on_hosts = names_from_str(fake_uname); host->on_hosts = on_hosts; token = yylex(); switch(token) { case '{': break; case ';': in_braces = 0; break; default: pe_expected_got("{ | ;", token); } } for_each_host(h, on_hosts) check_upr("host section", "%s: on %s", res->name, h->name); res->all_hosts = APPEND(res->all_hosts, host); while (in_braces) { int token = yylex(); fline = line; switch (token) { case TK_DISK: for_each_host(h, on_hosts) check_upr("disk statement", "%s:%s:disk", res->name, h->name); goto vol0stmt; /* for_each_host(h, on_hosts) check_uniq("disk", "disk:%s:%s", h->name, yylval.txt); */ case TK_DEVICE: for_each_host(h, on_hosts) check_upr("device statement", "%s:%s:device", res->name, h->name); goto vol0stmt; case TK_META_DISK: for_each_host(h, on_hosts) check_upr("meta-disk statement", "%s:%s:meta-disk", res->name, h->name); goto vol0stmt; case TK_FLEX_META_DISK: for_each_host(h, on_hosts) check_upr("meta-disk statement", "%s:%s:meta-disk", res->name, h->name); goto vol0stmt; break; case TK_ADDRESS: if (host->by_address) { fprintf(stderr, "%s:%d: address statement not allowed for floating {} host sections\n", config_file, fline); config_valid = 0; exit(E_config_invalid); } for_each_host(h, on_hosts) check_upr("address statement", "%s:%s:address", res->name, h->name); parse_address(on_hosts, &host->address, &host->port, &host->address_family); range_check(R_PORT, "port", host->port); break; case TK_PROXY: parse_proxy_section(host); break; case TK_VOLUME: EXP(TK_INTEGER); host->volumes = INSERT_SORTED(host->volumes, parse_volume(atoi(yylval.txt), on_hosts), vnr); break; case TK_OPTIONS: EXP('{'); 
host->res_options = parse_options(0, 0, TK_RES_OPTION); break; case '}': in_braces = 0; break; vol0stmt: if (parse_volume_stmt(volume0(&host->volumes), on_hosts, token)) break; /* else fall through */ default: pe_expected("disk | device | address | meta-disk " "| flexible-meta-disk"); } } inherit_volumes(res->volumes, host); for_each_volume(vol, host->volumes) check_meta_disk(vol, host); if (!(flags & REQUIRE_ALL)) return; if (!host->address) derror(host, res, "address"); check_volumes_complete(res, host); } void parse_skip() { int level; int token; fline = line; token = yylex(); switch (token) { case TK_STRING: EXP('{'); break; case '{': break; default: check_string_error(token); pe_expected("[ some_text ] {"); } level = 1; while (level) { switch (yylex()) { case '{': /* if you really want to, you can wrap this with a GB size config file :) */ level++; break; case '}': level--; break; case 0: fprintf(stderr, "%s:%u: reached eof " "while parsing this skip block.\n", config_file, fline); exit(E_config_invalid); } } while (level) ; } void parse_stacked_section(struct d_resource* res) { struct d_host_info *host; struct d_name *h; c_section_start = line; fline = line; host=calloc(1,sizeof(struct d_host_info)); res->all_hosts = APPEND(res->all_hosts, host); EXP(TK_STRING); check_uniq("stacked-on-top-of", "stacked:%s", yylval.txt); host->lower_name = yylval.txt; EXP('{'); while (1) { switch(yylex()) { case TK_DEVICE: /* for_each_host(h, host->on_hosts) check_upr("device statement", "%s:%s:device", res->name, h->name); */ parse_device(host->on_hosts, volume0(&host->volumes)); volume0(&host->volumes)->meta_disk = strdup("internal"); volume0(&host->volumes)->meta_index = strdup("internal"); break; case TK_ADDRESS: for_each_host(h, host->on_hosts) check_upr("address statement", "%s:%s:address", res->name, h->name); parse_address(NULL, &host->address, &host->port, &host->address_family); range_check(R_PORT, "port", yylval.txt); break; case TK_PROXY: parse_proxy_section(host); 
break; case TK_VOLUME: EXP(TK_INTEGER); host->volumes = INSERT_SORTED(host->volumes, parse_stacked_volume(atoi(yylval.txt)), vnr); break; case '}': goto break_loop; default: pe_expected("device | address | proxy"); } } break_loop: res->stacked_on_one = 1; inherit_volumes(res->volumes, host); if (!host->address) derror(host,res,"address"); } void startup_delegate(void *ctx) { struct d_resource *res = (struct d_resource *)ctx; if (!strcmp(yytext, "become-primary-on")) { parse_hosts(&res->become_primary_on, ';'); } else if (!strcmp(yytext, "stacked-timeouts")) { res->stacked_timeouts = 1; EXP(';'); } else pe_expected(" | become-primary-on | stacked-timeouts"); } void net_delegate(void *ctx) { enum pr_flags flags = (enum pr_flags)ctx; if (!strcmp(yytext, "discard-my-data") && flags & PARSE_FOR_ADJUST) { switch(yylex()) { case TK_YES: case TK_NO: /* Ignore this option. */ EXP(';'); break; case ';': /* Ignore this option. */ return; default: pe_expected("yes | no | ;"); } } else pe_expected("an option keyword"); } void set_me_in_resource(struct d_resource* res, int match_on_proxy) { struct d_host_info *host; /* Determine the local host section */ for (host = res->all_hosts; host; host=host->next) { /* do we match this host? */ if (match_on_proxy) { if (!host->proxy || !name_in_names(nodeinfo.nodename, host->proxy->on_hosts)) continue; } else if (host->by_address) { if (!have_ip(host->address_family, host->address) && /* for debugging only, e.g. __DRBD_NODE__=10.0.0.1 */ strcmp(nodeinfo.nodename, host->address)) continue; } else if (host->lower) { if (!host->lower->me) continue; } else if (!host->on_hosts) { /* huh? a resource without hosts to run on?! */ continue; } else { if (!name_in_names(nodeinfo.nodename, host->on_hosts) && strcmp("_this_host", host->on_hosts->name)) continue; } /* we matched. */ if (res->ignore) { config_valid = 0; fprintf(stderr, "%s:%d: in resource %s, %s %s { ... 
}:\n" "\tYou cannot ignore and define at the same time.\n", res->config_file, host->config_line, res->name, host->lower ? "stacked-on-top-of" : "on", host->lower ? host->lower->name : names_to_str(host->on_hosts)); } if (res->me) { config_valid = 0; fprintf(stderr, "%s:%d: in resource %s, %s %s { ... } ... %s %s { ... }:\n" "\tThere are multiple host sections for this node.\n", res->config_file, host->config_line, res->name, res->me->lower ? "stacked-on-top-of" : "on", res->me->lower ? res->me->lower->name : names_to_str(res->me->on_hosts), host->lower ? "stacked-on-top-of" : "on", host->lower ? host->lower->name : names_to_str(host->on_hosts)); } res->me = host; if (host->lower) res->stacked = 1; } /* If there is no me, implicitly ignore that resource */ if (!res->me) { res->ignore = 1; return; } } void set_peer_in_resource(struct d_resource* res, int peer_required) { struct d_host_info *host = NULL; if (res->ignore) return; /* me must be already set */ if (!res->me) { /* should have been implicitly ignored. */ fprintf(stderr, "%s:%d: in resource %s:\n" "\tcannot determine the peer, don't even know myself!\n", res->config_file, res->start_line, res->name); exit(E_thinko); } /* only one host section? */ if (!res->all_hosts->next) { if (peer_required) { fprintf(stderr, "%s:%d: in resource %s:\n" "\tMissing section 'on { ... }'.\n", res->config_file, res->start_line, res->name); config_valid = 0; } return; } /* short cut for exactly two host sections. * silently ignore any --peer connect_to_host option. */ if (res->all_hosts->next->next == NULL) { res->peer = res->all_hosts == res->me ? res->all_hosts->next : res->all_hosts; if (dry_run > 1 && connect_to_host) fprintf(stderr, "%s:%d: in resource %s:\n" "\tIgnoring --peer '%s': there are only two host sections.\n", res->config_file, res->start_line, res->name, connect_to_host); return; } /* Multiple peer hosts to choose from. * we need some help! 
*/ if (!connect_to_host) { if (peer_required) { fprintf(stderr, "%s:%d: in resource %s:\n" "\tThere are multiple host sections for the peer node.\n" "\tUse the --peer option to select which peer section to use.\n", res->config_file, res->start_line, res->name); config_valid = 0; } return; } for (host = res->all_hosts; host; host=host->next) { if (host->by_address && strcmp(connect_to_host, host->address)) continue; if (host->proxy && !name_in_names(nodeinfo.nodename, host->proxy->on_hosts)) continue; if (!name_in_names(connect_to_host, host->on_hosts)) continue; if (host == res->me) { fprintf(stderr, "%s:%d: in resource %s\n" "\tInvoked with --peer '%s', but that matches myself!\n", res->config_file, res->start_line, res->name, connect_to_host); res->peer = NULL; break; } if (res->peer) { fprintf(stderr, "%s:%d: in resource %s:\n" "\tInvoked with --peer '%s', but that matches multiple times!\n", res->config_file, res->start_line, res->name, connect_to_host); res->peer = NULL; break; } res->peer = host; } if (peer_required && !res->peer) { config_valid = 0; if (!host) fprintf(stderr, "%s:%d: in resource %s:\n" "\tNo host ('on' or 'floating') section matches --peer '%s'\n", res->config_file, res->start_line, res->name, connect_to_host); } } void set_on_hosts_in_res(struct d_resource *res) { struct d_resource *l_res, *tmp; struct d_host_info *host, *host2; struct d_name *h, **last; for (host = res->all_hosts; host; host=host->next) { if (host->lower_name) { for_each_resource(l_res, tmp, config) { if (!strcmp(l_res->name, host->lower_name)) break; } if (l_res == NULL) { fprintf(stderr, "%s:%d: in resource %s, " "referenced resource '%s' not defined.\n", res->config_file, res->start_line, res->name, host->lower_name); config_valid = 0; continue; } /* Simple: host->on_hosts = concat_names(l_res->me->on_hosts, l_res->peer->on_hosts); */ last = NULL; for (host2 = l_res->all_hosts; host2; host2 = host2->next) if (!host2->lower_name) { append_names(&host->on_hosts, &last, 
host2->on_hosts); for_each_host(h, host2->on_hosts) { struct d_volume *vol; for_each_volume(vol, host->volumes) check_uniq("device-minor", "device-minor:%s:%u", h->name, vol->device_minor); for_each_volume(vol, host->volumes) if (vol->device) check_uniq("device", "device:%s:%s", h->name, vol->device); } } host->lower = l_res; /* */ if (!strcmp(host->address, "127.0.0.1") || !strcmp(host->address, "::1")) for_each_host(h, host->on_hosts) check_uniq("IP", "%s:%s:%s", h->name, host->address, host->port); } } } void set_disk_in_res(struct d_resource *res) { struct d_host_info *host; struct d_volume *a, *b; if (res->ignore) return; for (host = res->all_hosts; host; host=host->next) { if (!host->lower) continue; if (host->lower->ignore) continue; check_volume_sets_equal(res, host, host->lower->me); if (!config_valid) /* don't even bother for broken config. */ continue; /* volume lists are sorted on vnr */ a = host->volumes; b = host->lower->me->volumes; while (a) { while (b && a->vnr > b->vnr) { /* Lower resource has more volumes. * Probably unusual, but we decided * that it should be legal. 
* Skip those that do not match */ b = b->next; } if (a && b && a->vnr == b->vnr) { if (b->device) m_asprintf(&a->disk, "%s", b->device); else m_asprintf(&a->disk, "/dev/drbd%u", b->device_minor); /* stacked implicit volumes need internal meta data, too */ if (!a->meta_disk) m_asprintf(&a->meta_disk, "internal"); if (!a->meta_index) m_asprintf(&a->meta_index, "internal"); a = a->next; b = b->next; } else { /* config_invalid should have been set * by check_volume_sets_equal */ assert(0); } } } } void proxy_delegate(void *ctx) { struct d_option **proxy_plugins = (struct d_option **)ctx; int token; struct d_option *options, *opt; struct d_name *line, *word, **pnp; opt = NULL; token = yylex(); if (token != '{') { fprintf(stderr, "%s:%d: expected \"{\" after \"proxy\" keyword\n", config_file, fline); exit(E_config_invalid); } options = NULL; while (1) { pnp = &line; while (1) { yylval.txt = NULL; token = yylex(); if (token == ';') break; if (token == '}') { if (pnp == &line) goto out; fprintf(stderr, "%s:%d: Missing \";\" before \"}\"\n", config_file, fline); exit(E_config_invalid); } word = malloc(sizeof(struct d_name)); if (!word) pdperror("out of memory."); word->name = yylval.txt ? 
yylval.txt : strdup(yytext); word->next = NULL; *pnp = word; pnp = &word->next; } opt = calloc(1, sizeof(struct d_option)); if (!opt) pdperror("out of memory."); opt->name = strdup(names_to_str(line)); options = APPEND(options, opt); free_names(line); } out: if (proxy_plugins) *proxy_plugins = options; } static int parse_proxy_options(struct d_option **proxy_options, struct d_option **proxy_plugins) { struct d_option *opts; EXP('{'); opts = parse_options_d(0, 0, TK_PROXY_OPTION | TK_PROXY_GROUP, TK_PROXY_DELEGATE, proxy_delegate, proxy_plugins); if (proxy_options) *proxy_options = opts; return 0; } int parse_proxy_options_section(struct d_resource *res) { int token; token = yylex(); if (token != TK_PROXY) { yyrestart(yyin); /* flushes flex's buffers */ return 1; } return parse_proxy_options(&res->proxy_options, &res->proxy_plugins); } struct d_resource* parse_resource(char* res_name, enum pr_flags flags) { struct d_resource* res; struct d_name *host_names; char *opt_name; int token; check_upr_init(); check_uniq("resource section", res_name); res=calloc(1,sizeof(struct d_resource)); res->name = res_name; res->config_file = config_save; res->start_line = line; while(1) { token = yylex(); fline = line; switch(token) { case TK_NET_OPTION: if (strcmp(yylval.txt, "protocol")) goto goto_default; check_upr("protocol statement","%s: protocol",res->name); opt_name = yylval.txt; EXP(TK_STRING); range_check(R_PROTOCOL, opt_name, yylval.txt); res->net_options = APPEND(res->net_options, new_opt(opt_name, yylval.txt)); EXP(';'); break; case TK_ON: parse_hosts(&host_names, '{'); parse_host_section(res, host_names, REQUIRE_ALL); break; case TK_STACKED: parse_stacked_section(res); break; case TK_IGNORE: if (res->me || res->peer) { fprintf(stderr, "%s:%d: in resource %s, " "'ignore-on' statement must precede any real host section (on ... { ... 
}).\n", config_file, line, res->name); exit(E_config_invalid); } EXP(TK_STRING); fprintf(stderr, "%s:%d: in resource %s, " "WARN: The 'ignore-on' keyword is deprecated.\n", config_file, line, res->name); EXP(';'); break; case TK__THIS_HOST: EXP('{'); host_names = names_from_str("_this_host"); parse_host_section(res, host_names, 0); break; case TK__REMOTE_HOST: EXP('{'); host_names = names_from_str("_remote_host"); parse_host_section(res, host_names, 0); break; case TK_FLOATING: parse_host_section(res, NULL, REQUIRE_ALL + BY_ADDRESS); break; case TK_DISK: switch (token=yylex()) { case TK_STRING: /* open coded parse_volume_stmt() */ volume0(&res->volumes)->disk = yylval.txt; EXP(';'); break; case '{': check_upr("disk section", "%s:disk", res->name); res->disk_options = SPLICE(res->disk_options, parse_options(TK_DISK_FLAG, TK_DISK_NO_FLAG, TK_DISK_OPTION)); break; default: check_string_error(token); pe_expected_got( "TK_STRING | {", token); } break; case TK_NET: check_upr("net section", "%s:net", res->name); EXP('{'); res->net_options = SPLICE(res->net_options, parse_options_d(TK_NET_FLAG, TK_NET_NO_FLAG, TK_NET_OPTION, TK_NET_DELEGATE, &net_delegate, (void *)flags)); break; case TK_SYNCER: check_upr("syncer section", "%s:syncer", res->name); EXP('{'); parse_options_syncer(res); break; case TK_STARTUP: check_upr("startup section", "%s:startup", res->name); EXP('{'); res->startup_options = parse_options_d(TK_STARTUP_FLAG, 0, TK_STARTUP_OPTION, TK_STARTUP_DELEGATE, &startup_delegate, res); break; case TK_HANDLER: check_upr("handlers section", "%s:handlers", res->name); EXP('{'); res->handlers = parse_options(0, 0, TK_HANDLER_OPTION); break; case TK_PROXY: check_upr("proxy section", "%s:proxy", res->name); parse_proxy_options(&res->proxy_options, &res->proxy_plugins); break; case TK_DEVICE: check_upr("device statement", "%s:device", res->name); case TK_META_DISK: case TK_FLEX_META_DISK: parse_volume_stmt(volume0(&res->volumes), NULL, token); break; case TK_VOLUME: 
EXP(TK_INTEGER); res->volumes = INSERT_SORTED(res->volumes, parse_volume(atoi(yylval.txt), NULL), vnr); break; case TK_OPTIONS: check_upr("resource options section", "%s:res_options", res->name); EXP('{'); res->res_options = SPLICE(res->res_options, parse_options(0, 0, TK_RES_OPTION)); break; case '}': case 0: goto exit_loop; default: goto_default: pe_expected_got("protocol | on | disk | net | syncer |" " startup | handlers |" " ignore-on | stacked-on-top-of",token); } } exit_loop: if (flags == NoneHAllowed && res->all_hosts) { config_valid = 0; fprintf(stderr, "%s:%d: in the %s section, there are no host sections" " allowed.\n", config_file, c_section_start, res->name); } if (!(flags & PARSE_FOR_ADJUST)) check_volumes_hosts(res); return res; } struct d_resource* parse_resource_for_adjust(struct cfg_ctx *ctx) { int token; token = yylex(); if (token != TK_RESOURCE) return NULL; token = yylex(); if (token != TK_STRING) return NULL; /* FIXME assert that string and ctx->res->name match? */ token = yylex(); if (token != '{') return NULL; return parse_resource(ctx->res->name, PARSE_FOR_ADJUST); } void post_parse(struct d_resource *config, enum pp_flags flags) { struct d_resource *res,*tmp; for_each_resource(res, tmp, config) if (res->stacked_on_one) set_on_hosts_in_res(res); /* sets on_hosts and host->lower */ /* Needs "on_hosts" and host->lower already set */ for_each_resource(res, tmp, config) if (!res->stacked_on_one) set_me_in_resource(res, flags & MATCH_ON_PROXY); /* Needs host->lower->me already set */ for_each_resource(res, tmp, config) if (res->stacked_on_one) set_me_in_resource(res, flags & MATCH_ON_PROXY); // Needs "me" set already for_each_resource(res, tmp, config) if (res->stacked_on_one) set_disk_in_res(res); } void include_file(FILE *f, char *name) { int saved_line; char *saved_config_file, *saved_config_save; saved_line = line; saved_config_file = config_file; saved_config_save = config_save; line = 1; config_file = name; config_save = 
canonify_path(name); my_yypush_buffer_state(f); my_parse(); yypop_buffer_state(); line = saved_line; config_file = saved_config_file; config_save = saved_config_save; } void include_stmt(char *str) { char *last_slash, *tmp; glob_t glob_buf; int cwd_fd; FILE *f; size_t i; int r; /* in order to allow relative paths in include statements we change directory to the location of the current configuration file. */ cwd_fd = open(".", O_RDONLY); if (cwd_fd < 0) { fprintf(stderr, "open(\".\") failed: %m\n"); exit(E_usage); } tmp = strdupa(config_save); last_slash = strrchr(tmp, '/'); if (last_slash) *last_slash = 0; if (chdir(tmp)) { fprintf(stderr, "chdir(\"%s\") failed: %m\n", tmp); exit(E_usage); } r = glob(str, 0, NULL, &glob_buf); if (r == 0) { for (i=0; i Copyright (C) 2006-2008, Lars Ellenberg drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. 
*/ enum range_checks { R_NO_CHECK, R_MINOR_COUNT, R_DIALOG_REFRESH, R_DISK_SIZE, R_TIMEOUT, R_CONNECT_INT, R_PING_INT, R_MAX_BUFFERS, R_MAX_EPOCH_SIZE, R_SNDBUF_SIZE, R_RCVBUF_SIZE, R_KO_COUNT, R_RATE, R_GROUP, R_AL_EXTENTS, R_PORT, R_META_IDX, R_WFC_TIMEOUT, R_DEGR_WFC_TIMEOUT, R_OUTDATED_WFC_TIMEOUT, R_C_PLAN_AHEAD, R_C_DELAY_TARGET, R_C_FILL_TARGET, R_C_MAX_RATE, R_C_MIN_RATE, R_CONG_FILL, R_CONG_EXTENTS, R_PROTOCOL, }; enum yytokentype { TK_GLOBAL = 258, TK_RESOURCE, TK_ON, TK_STACKED, TK_IGNORE, TK_NET, TK_DISK, TK_SKIP, TK_SYNCER, /* deprecated after 8.3 */ TK_STARTUP, TK_DISABLE_IP_VERIFICATION, TK_DIALOG_REFRESH, TK_PROTOCOL, TK_HANDLER, TK_COMMON, TK_ADDRESS, TK_DEVICE, TK_MINOR, TK_META_DISK, TK_FLEX_META_DISK, TK_MINOR_COUNT, TK_IPADDR, TK_INTEGER, TK_STRING, TK_ELSE, TK_DISK_FLAG, TK_DISK_NO_FLAG, TK_DISK_OPTION, TK_NET_FLAG, TK_NET_NO_FLAG, TK_NET_OPTION, TK_SYNCER_FLAG, TK_SYNCER_OPTION, TK_STARTUP_FLAG, TK_STARTUP_OPTION, TK_STARTUP_DELEGATE, TK_HANDLER_OPTION, TK_USAGE_COUNT, TK_ASK, TK_YES, TK_NO, TK__THIS_HOST, TK__REMOTE_HOST, TK_PROXY, TK_INSIDE, TK_OUTSIDE, TK_MEMLIMIT, TK_PROXY_OPTION, TK_PROXY_DELEGATE, TK_ERR_STRING_TOO_LONG, TK_ERR_DQSTRING_TOO_LONG, TK_ERR_DQSTRING, TK_SCI, TK_SDP, TK_SSOCKS, TK_IPV4, TK_IPV6, TK_IPADDR6, TK_NET_DELEGATE, TK_INCLUDE, TK_BWLIMIT, TK_FLOATING, TK_DEPRECATED_OPTION, TK_VOLUME, TK_RES_OPTION, TK_OPTIONS, TK__GROUPING_BASE = 0x1000, TK_SYNCER_OLD_OPT = 0x2000, /* Might be or'ed to TK_[NET|DISK]_[OPTION|SWITCH] */ TK_PROXY_GROUP = 0x3000, /* Gets or'ed to some options */ }; /* The higher bits define one or more token groups. 
*/ #define GET_TOKEN_GROUP(__x) ((__x) & ~(TK__GROUPING_BASE - 1)) #define REMOVE_GROUP_FROM_TOKEN(__x) ((__x) & (TK__GROUPING_BASE - 1)) typedef struct YYSTYPE { char* txt; enum range_checks rc; } YYSTYPE; #define yystype YYSTYPE /* obsolescent; will be withdrawn */ #define YYSTYPE_IS_DECLARED 1 #define YYSTYPE_IS_TRIVIAL 1 extern yystype yylval; extern char* yytext; extern FILE* yyin; /* avoid compiler warnings about implicit declaration */ int yylex(void); void my_yypush_buffer_state(FILE *f); void yypop_buffer_state (void ); void yyrestart(FILE *input_file); drbd-8.4.4/user/drbdadm_scanner.fl0000664000000000000000000002207412221331365015554 0ustar rootroot%{ #include #include #include #include "drbdadm_parser.h" #include "drbdadm.h" #include "drbdtool_common.h" void long_string(char* text); void long_dqstring(char* text); void err_dqstring(char* text); #if 0 #define DP printf("'%s' ",yytext) #else #define DP #endif #define CP yylval.txt = strdup(yytext); yylval.rc = R_NO_CHECK #define RC(N) yylval.rc = R_ ## N #define YY_NO_INPUT 1 #define YY_NO_UNPUT 1 static void yyunput (int c, register char * yy_bp ) __attribute((unused)); #ifndef YY_FLEX_SUBMINOR_VERSION #define MAX_INCLUDE_DEPTH 10 YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH]; int include_stack_ptr = 0; #endif %} %option noyywrap NUM [0-9]{1,8}[MKGs]? SNUMB [0-9]{1,3} IPV4ADDR ({SNUMB}"."){3}{SNUMB} HEX4 [0-9a-fA-F]{1,4} IPV6ADDR ((({HEX4}":"){0,5}{HEX4})?":"{HEX4}?":"({HEX4}(":"{HEX4}){0,5})?("%"{STRING})?)|("::"[fF]{4}":"{IPV4ADDR}) WS [ \t\r] OPCHAR [{};\[\]:] DQSTRING \"([^\"\\\n]|\\[^\n]){0,255}\" LONG_DQSTRING \"([^\"\\\n]|\\[^\n]){255}. 
ERR_DQSTRING \"([^\"\\\n]|\\[^\n]){0,255}[\\\n] STRING [a-zA-Z0-9/._-]{1,80} LONG_STRING [a-zA-Z0-9/._-]{81} %% \n { line++; } \#.* /* ignore comments */ {WS} /* ignore whitespaces */ {OPCHAR} { DP; return yytext[0]; } on { DP; return TK_ON; } ignore-on { DP; return TK_IGNORE; } stacked-on-top-of { DP; return TK_STACKED; } floating { DP; return TK_FLOATING; } no { DP; return TK_NO; } net { DP; return TK_NET; } yes { DP; return TK_YES; } ask { DP; return TK_ASK; } skip { DP; return TK_SKIP; } disk { DP; return TK_DISK; } proxy { DP; return TK_PROXY; } minor { DP; return TK_MINOR; } inside { DP; return TK_INSIDE; } volume { DP; return TK_VOLUME; } syncer { DP; return TK_SYNCER; } device { DP; return TK_DEVICE; } global { DP; return TK_GLOBAL; } common { DP; return TK_COMMON; } options { DP; return TK_OPTIONS; } outside { DP; return TK_OUTSIDE; } address { DP; return TK_ADDRESS; } startup { DP; return TK_STARTUP; } include { DP; return TK_INCLUDE; } handlers { DP; return TK_HANDLER; } minor-count { DP; return TK_MINOR_COUNT; } disable-ip-verification { DP; return TK_DISABLE_IP_VERIFICATION;} dialog-refresh { DP; return TK_DIALOG_REFRESH; } resource { DP; return TK_RESOURCE; } meta-disk { DP; return TK_META_DISK; } flexible-meta-disk { DP; return TK_FLEX_META_DISK; } usage-count { DP; return TK_USAGE_COUNT; } _this_host { DP; return TK__THIS_HOST; } _remote_host { DP; return TK__REMOTE_HOST; } sci { DP; CP; return TK_SCI; } ssocks { DP; CP; return TK_SSOCKS; } sdp { DP; CP; return TK_SDP; } ipv4 { DP; CP; return TK_IPV4; } ipv6 { DP; CP; return TK_IPV6; } size { DP; CP; RC(DISK_SIZE); return TK_DISK_OPTION; } on-io-error { DP; CP; return TK_DISK_OPTION; } fencing { DP; CP; return TK_DISK_OPTION; } max-bio-bvecs { DP; CP; return TK_DISK_OPTION; } disk-timeout { DP; CP; return TK_DISK_OPTION; } read-balancing { DP; CP; return TK_DISK_OPTION; } use-bmbv { DP; CP; return TK_DISK_FLAG; } disk-barrier { DP; CP; return TK_DISK_FLAG; } disk-flushes { DP; CP; return 
TK_DISK_FLAG; } disk-drain { DP; CP; return TK_DISK_FLAG; } md-flushes { DP; CP; return TK_DISK_FLAG; } no-disk-barrier { DP; CP; return TK_DISK_NO_FLAG; } no-disk-flushes { DP; CP; return TK_DISK_NO_FLAG; } no-disk-drain { DP; CP; return TK_DISK_NO_FLAG; } no-md-flushes { DP; CP; return TK_DISK_NO_FLAG; } timeout { DP; CP; RC(TIMEOUT); return TK_NET_OPTION; } protocol { DP; CP; RC(PROTOCOL); return TK_NET_OPTION; } ko-count { DP; CP; RC(KO_COUNT); return TK_NET_OPTION; } ping-int { DP; CP; RC(PING_INT); return TK_NET_OPTION; } max-buffers { DP; CP; RC(MAX_BUFFERS); return TK_NET_OPTION;} sndbuf-size { DP; CP; RC(SNDBUF_SIZE); return TK_NET_OPTION | TK_PROXY_GROUP;} rcvbuf-size { DP; CP; RC(RCVBUF_SIZE); return TK_NET_OPTION | TK_PROXY_GROUP;} connect-int { DP; CP; RC(CONNECT_INT); return TK_NET_OPTION;} cram-hmac-alg { DP; CP; return TK_NET_OPTION; } shared-secret { DP; CP; return TK_NET_OPTION; } max-epoch-size { DP; CP; RC(MAX_EPOCH_SIZE); return TK_NET_OPTION;} after-sb-[012]pri { DP; CP; return TK_NET_OPTION; } rr-conflict { DP; CP; return TK_NET_OPTION; } ping-timeout { DP; CP; return TK_NET_OPTION | TK_PROXY_GROUP;} unplug-watermark { DP; CP; return TK_NET_OPTION; } data-integrity-alg { DP; CP; return TK_NET_OPTION; } on-congestion { DP; CP; return TK_NET_OPTION; } congestion-fill { DP; CP; RC(CONG_FILL); return TK_NET_OPTION; } congestion-extents { DP; CP; RC(CONG_EXTENTS); return TK_NET_OPTION;} allow-two-primaries { DP; CP; return TK_NET_FLAG; } always-asbp { DP; CP; return TK_NET_FLAG; } no-tcp-cork { DP; CP; return TK_NET_NO_FLAG; } tcp-cork { DP; CP; return TK_NET_FLAG; } discard-my-data { DP; CP; return TK_NET_DELEGATE; } rate { DP; CP; RC(RATE); return TK_SYNCER_OLD_OPT | TK_DISK_OPTION; } resync-rate { DP; CP; RC(RATE); return TK_DISK_OPTION; } after { DP; CP; return TK_SYNCER_OLD_OPT | TK_DISK_OPTION; } resync-after { DP; CP; return TK_DISK_OPTION; } verify-alg { DP; CP; return TK_SYNCER_OLD_OPT | TK_NET_OPTION; } csums-alg { DP; CP; return 
TK_SYNCER_OLD_OPT | TK_NET_OPTION; } al-extents { DP; CP; RC(AL_EXTENTS); return TK_SYNCER_OLD_OPT | TK_DISK_OPTION;} al-updates { DP; CP; return TK_DISK_FLAG; } cpu-mask { DP; CP; return TK_SYNCER_OLD_OPT | TK_RES_OPTION; } use-rle { DP; CP; return TK_SYNCER_OLD_OPT | TK_NET_FLAG; } delay-probe-volume { DP; CP; return TK_DEPRECATED_OPTION; } delay-probe-interval { DP; CP; return TK_DEPRECATED_OPTION; } c-plan-ahead { DP; CP; RC(C_PLAN_AHEAD); return TK_SYNCER_OLD_OPT | TK_DISK_OPTION; } c-delay-target { DP; CP; RC(C_DELAY_TARGET); return TK_SYNCER_OLD_OPT | TK_DISK_OPTION; } c-fill-target { DP; CP; RC(C_FILL_TARGET); return TK_SYNCER_OLD_OPT | TK_DISK_OPTION; } c-max-rate { DP; CP; RC(C_MAX_RATE); return TK_SYNCER_OLD_OPT | TK_DISK_OPTION; } c-min-rate { DP; CP; RC(C_MIN_RATE); return TK_SYNCER_OLD_OPT | TK_DISK_OPTION; } throttle-threshold { DP; CP; return TK_DEPRECATED_OPTION; } hold-off-threshold { DP; CP; return TK_DEPRECATED_OPTION; } on-no-data-accessible { DP; CP; return TK_SYNCER_OLD_OPT | TK_RES_OPTION; } wfc-timeout { DP; CP; RC(WFC_TIMEOUT); return TK_STARTUP_OPTION;} degr-wfc-timeout { DP; CP; RC(DEGR_WFC_TIMEOUT); return TK_STARTUP_OPTION;} outdated-wfc-timeout { DP; CP; RC(OUTDATED_WFC_TIMEOUT); return TK_STARTUP_OPTION;} stacked-timeouts { DP; return TK_STARTUP_DELEGATE; } become-primary-on { DP; return TK_STARTUP_DELEGATE; } wait-after-sb { DP; CP; return TK_STARTUP_FLAG; } pri-on-incon-degr { DP; CP; return TK_HANDLER_OPTION; } pri-lost-after-sb { DP; CP; return TK_HANDLER_OPTION; } pri-lost { DP; CP; return TK_HANDLER_OPTION; } initial-split-brain { DP; CP; return TK_HANDLER_OPTION; } split-brain { DP; CP; return TK_HANDLER_OPTION; } outdate-peer { DP; CP; return TK_HANDLER_OPTION; } fence-peer { DP; CP; return TK_HANDLER_OPTION; } local-io-error { DP; CP; return TK_HANDLER_OPTION; } before-resync-target { DP; CP; return TK_HANDLER_OPTION; } after-resync-target { DP; CP; return TK_HANDLER_OPTION; } before-resync-source { DP; CP; return 
TK_HANDLER_OPTION; } memlimit { DP; CP; return TK_PROXY_OPTION | TK_PROXY_GROUP; } read-loops { DP; CP; return TK_PROXY_OPTION | TK_PROXY_GROUP; } compression { DP; CP; return TK_PROXY_OPTION | TK_PROXY_GROUP; } bwlimit { DP; CP; return TK_PROXY_OPTION | TK_PROXY_GROUP; } plugin { DP; CP; return TK_PROXY_DELEGATE; } out-of-sync { DP; CP; return TK_HANDLER_OPTION; } {IPV4ADDR} { DP; CP; return TK_IPADDR; } {IPV6ADDR} { DP; CP; return TK_IPADDR6; } {NUM} { DP; CP; return TK_INTEGER; } {DQSTRING} { unescape(yytext); DP; CP; return TK_STRING; } {STRING} { DP; CP; return TK_STRING; } {LONG_STRING} { return TK_ERR_STRING_TOO_LONG; } {LONG_DQSTRING} { return TK_ERR_DQSTRING_TOO_LONG; } {ERR_DQSTRING} { return TK_ERR_DQSTRING; } . { DP; return TK_ELSE; } %% /* Compatibility cruft for flex version 2.5.4a */ #ifndef YY_FLEX_SUBMINOR_VERSION /** Pushes the new state onto the stack. The new state becomes * the current state. This function will allocate the stack * if necessary. * @param new_buffer The new state. * */ void yypush_buffer_state (YY_BUFFER_STATE new_buffer ) { if (new_buffer == NULL) return; if ( include_stack_ptr >= MAX_INCLUDE_DEPTH ) { fprintf( stderr, "Includes nested too deeply" ); exit( 1 ); } include_stack[include_stack_ptr++] = YY_CURRENT_BUFFER; yy_switch_to_buffer(new_buffer); BEGIN(INITIAL); } /** Removes and deletes the top of the stack, if present. * The next element becomes the new top. 
* */ void yypop_buffer_state (void) { if (!YY_CURRENT_BUFFER) return; if ( --include_stack_ptr < 0 ) { fprintf( stderr, "error in flex compat code\n" ); exit( 1 ); } yy_delete_buffer(YY_CURRENT_BUFFER ); yy_switch_to_buffer(include_stack[include_stack_ptr]); } #endif void my_yypush_buffer_state(FILE *f) { /* Since we do not have YY_BUF_SIZE outside of the flex generated file.*/ yypush_buffer_state(yy_create_buffer(f, YY_BUF_SIZE)); } drbd-8.4.4/user/drbdadm_usage_cnt.c0000664000000000000000000004516412216604252015723 0ustar rootroot/* drbdadm_usage_cnt.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2006-2008, LINBIT Information Technologies GmbH Copyright (C) 2006-2008, Philipp Reisner Copyright (C) 2006-2008, Lars Ellenberg drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "drbdadm.h" #include "drbdtool_common.h" #include "drbd_endian.h" #include "linux/drbd.h" /* only use DRBD_MAGIC from here! */ #define HTTP_PORT 80 #define HTTP_HOST "usage.drbd.org" #define HTTP_ADDR "212.69.161.111" #define NODE_ID_FILE DRBD_LIB_DIR"/node_id" #define GIT_HASH_BYTE 20 #define SRCVERSION_BYTE 12 /* actually 11 and a half. 
*/ #define SRCVERSION_PAD (GIT_HASH_BYTE - SRCVERSION_BYTE) #define SVN_STYLE_OD 16 struct vcs_rel { uint32_t svn_revision; char git_hash[GIT_HASH_BYTE]; struct { unsigned major, minor, sublvl; } version; unsigned version_code; }; struct node_info { uint64_t node_uuid; struct vcs_rel rev; }; struct node_info_od { uint32_t magic; struct node_info ni; } __packed; /* For our purpose (finding the revision) SLURP_SIZE is always enough. */ static char* slurp_proc_drbd() { const int SLURP_SIZE = 4096; char* buffer; int rr, fd; fd = open("/proc/drbd",O_RDONLY); if( fd == -1) return 0; buffer = malloc(SLURP_SIZE); if(!buffer) return 0; rr = read(fd, buffer, SLURP_SIZE-1); if( rr == -1) { free(buffer); return 0; } buffer[rr]=0; close(fd); return buffer; } void read_hex(char* dst, char* src, int dst_size, int src_size) { int dst_i, u, src_i=0; for(dst_i=0;dst_i= src_size) break; if(src[src_i] == 0) break; if(++src_i >= src_size) break; } } void vcs_ver_from_str(struct vcs_rel *rel, const char *token) { char *dot; long maj, min, sub; maj = strtol(token, &dot, 10); if (*dot != '.') return; min = strtol(dot+1, &dot, 10); if (*dot != '.') return; sub = strtol(dot+1, &dot, 10); /* don't check on *dot == 0, * we may want to add some extraversion tag sometime if (*dot != 0) return; */ rel->version.major = maj; rel->version.minor = min; rel->version.sublvl = sub; rel->version_code = (maj << 16) + (min << 8) + sub; } void vcs_from_str(struct vcs_rel *rel, const char *text) { char token[80]; int plus=0; enum { begin, f_ver, f_svn, f_rev, f_git, f_srcv } ex = begin; while (sget_token(token, sizeof(token), &text) != EOF) { switch(ex) { case begin: if(!strcmp(token,"version:")) ex = f_ver; if(!strcmp(token,"SVN")) ex = f_svn; if(!strcmp(token,"GIT-hash:")) ex = f_git; if(!strcmp(token,"srcversion:")) ex = f_srcv; break; case f_ver: if(!strcmp(token,"plus")) plus = 1; /* still waiting for version */ else { vcs_ver_from_str(rel, token); ex = begin; } break; case f_svn: 
if(!strcmp(token,"Revision:")) ex = f_rev; break; case f_rev: rel->svn_revision = atol(token) * 10; if( plus ) rel->svn_revision += 1; memset(rel->git_hash, 0, GIT_HASH_BYTE); return; case f_git: read_hex(rel->git_hash, token, GIT_HASH_BYTE, strlen(token)); rel->svn_revision = 0; return; case f_srcv: memset(rel->git_hash, 0, SRCVERSION_PAD); read_hex(rel->git_hash + SRCVERSION_PAD, token, SRCVERSION_BYTE, strlen(token)); rel->svn_revision = 0; return; } } } static int current_vcs_is_from_proc_drbd; static struct vcs_rel current_vcs_rel; static struct vcs_rel userland_version; static void vcs_get_current(void) { char* version_txt; if (current_vcs_rel.version_code) return; version_txt = slurp_proc_drbd(); if(version_txt) { vcs_from_str(&current_vcs_rel, version_txt); current_vcs_is_from_proc_drbd = 1; free(version_txt); } else { vcs_from_str(&current_vcs_rel, drbd_buildtag()); vcs_ver_from_str(&current_vcs_rel, REL_VERSION); } } static void vcs_get_userland(void) { if (userland_version.version_code) return; vcs_ver_from_str(&userland_version, REL_VERSION); } int version_code_kernel(void) { vcs_get_current(); return current_vcs_is_from_proc_drbd ? current_vcs_rel.version_code : 0; } int version_code_userland(void) { vcs_get_userland(); return userland_version.version_code; } static int vcs_eq(struct vcs_rel *rev1, struct vcs_rel *rev2) { if( rev1->svn_revision || rev2->svn_revision ) { return rev1->svn_revision == rev2->svn_revision; } else { return !memcmp(rev1->git_hash,rev2->git_hash,GIT_HASH_BYTE); } } static int vcs_ver_cmp(struct vcs_rel *rev1, struct vcs_rel *rev2) { return rev1->version_code - rev2->version_code; } void warn_on_version_mismatch(void) { char *msg; int cmp; /* get the kernel module version from /proc/drbd */ vcs_get_current(); /* get the userland version from REL_VERSION */ vcs_get_userland(); cmp = vcs_ver_cmp(&userland_version, &current_vcs_rel); /* no message if equal */ if (cmp == 0) return; if (cmp > 0xffff || cmp < -0xffff) /* major version differs! 
*/ msg = "mixing different major numbers will not work!"; else if (cmp < 0) /* userland is older. always warn. */ msg = "you should upgrade your drbd tools!"; else if (cmp & 0xff00) /* userland is newer minor version */ msg = "please don't mix different DRBD series."; else /* userland is newer, but differs only in sublevel. */ msg = "preferably kernel and userland versions should match."; fprintf(stderr, "DRBD module version: %u.%u.%u\n" " userland version: %u.%u.%u\n%s\n", current_vcs_rel.version.major, current_vcs_rel.version.minor, current_vcs_rel.version.sublvl, userland_version.version.major, userland_version.version.minor, userland_version.version.sublvl, msg); } void add_lib_drbd_to_path(void) { char *new_path = NULL; char *old_path = getenv("PATH"); m_asprintf(&new_path, "%s%s%s", old_path, old_path ? ":" : "", "/lib/drbd"); setenv("PATH", new_path, 1); } void maybe_exec_drbdadm_83(char **argv) { if (current_vcs_rel.version.major == 8 && current_vcs_rel.version.minor == 3) { #ifdef DRBD_LEGACY_83 /* This drbdadm warned already... */ setenv("DRBD_DONT_WARN_ON_VERSION_MISMATCH", "1", 0); add_lib_drbd_to_path(); execvp(drbdadm_83, argv); fprintf(stderr, "execvp() failed to exec %s: %m\n", drbdadm_83); #else fprintf(stderr, "This drbdadm was built without support for legacy\n" "drbd kernel code. Consider rebuilding your userland\n" "tools with ./configure --with-legacy-connector\n"); #endif exit(E_exec_error); } } static char *vcs_to_str(struct vcs_rel *rev) { static char buffer[80]; // Not generic, sufficient for the purpose. 
if( rev->svn_revision ) { snprintf(buffer,80,"nv="U32,rev->svn_revision); } else { int len=20,p; unsigned char *bytes; p = sprintf(buffer,"git="); bytes = (unsigned char*)rev->git_hash; while(len--) p += sprintf(buffer+p,"%02x",*bytes++); } return buffer; } static void write_node_id(struct node_info *ni) { int fd; struct node_info_od on_disk; int size; fd = open(NODE_ID_FILE,O_WRONLY|O_CREAT,S_IRUSR|S_IWUSR); if( fd == -1 && errno == ENOENT) { mkdir(DRBD_LIB_DIR,S_IRWXU); fd = open(NODE_ID_FILE,O_WRONLY|O_CREAT,S_IRUSR|S_IWUSR); } if( fd == -1) { perror("Creation of "NODE_ID_FILE" failed."); exit(20); } if(ni->rev.svn_revision != 0) { // SVN style (old) on_disk.magic = cpu_to_be32(DRBD_MAGIC); on_disk.ni.node_uuid = cpu_to_be64(ni->node_uuid); on_disk.ni.rev.svn_revision = cpu_to_be32(ni->rev.svn_revision); memset(on_disk.ni.rev.git_hash,0,GIT_HASH_BYTE); size = SVN_STYLE_OD; } else { on_disk.magic = cpu_to_be32(DRBD_MAGIC+1); on_disk.ni.node_uuid = cpu_to_be64(ni->node_uuid); on_disk.ni.rev.svn_revision = 0; memcpy(on_disk.ni.rev.git_hash,ni->rev.git_hash,GIT_HASH_BYTE); size = sizeof(on_disk); } if( write(fd,&on_disk, size) != size) { perror("Write to "NODE_ID_FILE" failed."); exit(20); } close(fd); } static int read_node_id(struct node_info *ni) { int rr,fd; struct node_info_od on_disk; fd = open(NODE_ID_FILE,O_RDONLY); if( fd == -1) { return 0; } rr = read(fd,&on_disk, sizeof(on_disk)); if( rr != sizeof(on_disk) && rr != SVN_STYLE_OD ) { close(fd); return 0; } switch(be32_to_cpu(on_disk.magic)) { case DRBD_MAGIC: ni->node_uuid = be64_to_cpu(on_disk.ni.node_uuid); ni->rev.svn_revision = be32_to_cpu(on_disk.ni.rev.svn_revision); memset(ni->rev.git_hash,0,GIT_HASH_BYTE); break; case DRBD_MAGIC+1: ni->node_uuid = be64_to_cpu(on_disk.ni.node_uuid); ni->rev.svn_revision = 0; memcpy(ni->rev.git_hash,on_disk.ni.rev.git_hash,GIT_HASH_BYTE); break; default: /* unknown magic: close the descriptor before bailing out */ close(fd); return 0; } close(fd); return 1; } /* to interrupt gethostbyname, * we not only need a signal, * but also the long 
jump: * gethostbyname would otherwise just restart the syscall * and timeout again. */ static jmp_buf timed_out; static void alarm_handler(int __attribute((unused)) signo) { longjmp(timed_out, 1); } #define DNS_TIMEOUT 3 /* seconds */ #define SOCKET_TIMEOUT 3 /* seconds */ struct hostent *my_gethostbyname(const char *name) { struct sigaction sa; struct sigaction so; struct hostent *h; alarm(0); sa.sa_handler = &alarm_handler; sigemptyset(&sa.sa_mask); sa.sa_flags = 0; sigaction(SIGALRM, &sa, &so); if (!setjmp(timed_out)) { alarm(DNS_TIMEOUT); h = gethostbyname(name); } else /* timed out, longjmp of SIGALRM jumped here */ h = NULL; alarm(0); sigaction(SIGALRM, &so, NULL); return h; } /** * insert_usage_with_socket: * * Return codes: * * 0 - success * 1 - failed to create socket * 2 - unknown server * 3 - cannot connect to server * 5 - other error */ static int make_get_request(char *uri) { struct sockaddr_in server; struct hostent *host_info; unsigned long addr; int sock; char *req_buf; char *http_host = HTTP_HOST; int buf_len = 1024; char buffer[buf_len]; FILE *sockfd; int writeit; struct timeval timeout = { .tv_sec = SOCKET_TIMEOUT }; sock = socket( PF_INET, SOCK_STREAM, 0); if (sock < 0) return 1; setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout)); setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, &timeout, sizeof(timeout)); memset (&server, 0, sizeof(server)); /* convert host name to ip */ host_info = my_gethostbyname(http_host); if (host_info == NULL) { /* unknown host, try with ip */ if ((addr = inet_addr( HTTP_ADDR )) != INADDR_NONE) memcpy((char *)&server.sin_addr, &addr, sizeof(addr)); else { close(sock); return 2; } } else { memcpy((char *)&server.sin_addr, host_info->h_addr, host_info->h_length); } ssprintf(req_buf, "GET %s HTTP/1.0\r\n" "Host: "HTTP_HOST"\r\n" "User-Agent: drbdadm/"REL_VERSION" (%s; %s; %s; %s)\r\n" "\r\n", uri, nodeinfo.sysname, nodeinfo.release, nodeinfo.version, nodeinfo.machine); server.sin_family = AF_INET; server.sin_port 
= htons(HTTP_PORT); if (connect(sock, (struct sockaddr*)&server, sizeof(server))<0) { /* cannot connect to server */ close(sock); return 3; } if ((sockfd = fdopen(sock, "r+")) == NULL) { close(sock); return 5; } if (fputs(req_buf, sockfd) == EOF) { /* fclose() also closes the underlying descriptor */ fclose(sockfd); return 5; } writeit = 0; while (fgets(buffer, buf_len, sockfd) != NULL) { /* ignore http headers */ if (writeit == 0) { if (buffer[0] == '\r' || buffer[0] == '\n') writeit = 1; } else { fprintf(stderr,"%s", buffer); } } /* fclose() also closes the underlying descriptor */ fclose(sockfd); return 0; } static void url_encode(char* in, char* out) { char *h = "0123456789abcdef"; unsigned char c; /* unsigned, so that c >> 4 is a valid index into h */ while( (c = *in++) != 0 ) { if( c == '\n' ) break; if( ( 'a' <= c && c <= 'z' ) || ( 'A' <= c && c <= 'Z' ) || ( '0' <= c && c <= '9' ) || c == '-' || c == '_' || c == '.' ) *out++ = c; else if( c == ' ' ) *out++ = '+'; else { *out++ = '%'; *out++ = h[c >> 4]; *out++ = h[c & 0x0f]; } } *out = 0; } /* Ensure that the node is counted on http://usage.drbd.org */ #define ANSWER_SIZE 80 void uc_node(enum usage_count_type type) { struct node_info ni; char *uri; int send = 0; int update = 0; char answer[ANSWER_SIZE]; char n_comment[ANSWER_SIZE*3]; char *r; if( type == UC_NO ) return; if( getuid() != 0 ) return; /* not when running directly from init, * or if stdout is no tty. * you do not want to have the "user information message" * as output from `drbdadm sh-resources all` */ if (getenv("INIT_VERSION")) return; if (no_tty) return; vcs_get_current(); if( ! read_node_id(&ni) ) { get_random_bytes(&ni.node_uuid,sizeof(ni.node_uuid)); ni.rev = current_vcs_rel; send = 1; } else { // read_node_id() was successful if (!vcs_eq(&ni.rev,&current_vcs_rel)) { ni.rev = current_vcs_rel; update = 1; send = 1; } } if(!send) return; n_comment[0]=0; if (type == UC_ASK ) { fprintf(stderr, "\n" "\t\t--== This is %s of DRBD ==--\n" "Please take part in the global DRBD usage count at http://"HTTP_HOST".\n\n" "The counter works anonymously. 
It creates a random number to identify\n" "your machine and sends that random number, along with the kernel and\n" "DRBD version, to "HTTP_HOST".\n\n" "The benefits for you are:\n" " * In response to your submission, the server ("HTTP_HOST") will tell you\n" " how many users before you have installed this version (%s).\n" " * With a high counter LINBIT has a strong motivation to\n" " continue funding DRBD's development.\n\n" "http://"HTTP_HOST"/cgi-bin/insert_usage.pl?nu="U64"&%s\n\n" "In case you want to participate but know that this machine is firewalled,\n" "simply issue the query string with your favorite web browser or wget.\n" "You can control all of this by setting 'usage-count' in your drbd.conf.\n\n" "* You may enter a free form comment about your machine, that gets\n" " used on "HTTP_HOST" instead of the big random number.\n" "* If you wish to opt out entirely, simply enter 'no'.\n" "* To count this node without comment, just press [RETURN]\n", update ? "an update" : "a new installation", REL_VERSION,ni.node_uuid, vcs_to_str(&ni.rev)); r = fgets(answer, ANSWER_SIZE, stdin); if(r && !strcmp(answer,"no\n")) send = 0; url_encode(answer,n_comment); } ssprintf(uri,"http://"HTTP_HOST"/cgi-bin/insert_usage.pl?nu="U64"&%s%s%s", ni.node_uuid, vcs_to_str(&ni.rev), n_comment[0] ? "&nc=" : "", n_comment); if (send) { write_node_id(&ni); fprintf(stderr, "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n" " --== Thank you for participating in the global usage survey ==--\n" "The server's response is:\n\n"); make_get_request(uri); if (type == UC_ASK) { fprintf(stderr, "\n" "From now on, drbdadm will contact "HTTP_HOST" only when you update\n" "DRBD or when you use 'drbdadm create-md'. Of course it will continue\n" "to ask you for confirmation as long as 'usage-count' is at its default\n" "value of 'ask'.\n\n" "Just press [RETURN] to continue: "); r = fgets(answer, 9, stdin); } } } /* For our purpose (finding the revision) SLURP_SIZE is always enough. 
*/ static char* run_admm_generic(struct cfg_ctx *ctx, const char *arg_override) { const int SLURP_SIZE = 4096; int rr,pipes[2]; char* buffer; pid_t pid; buffer = malloc(SLURP_SIZE); if(!buffer) return 0; if(pipe(pipes)) return 0; pid = fork(); if(pid == -1) { fprintf(stderr,"Can not fork\n"); exit(E_exec_error); } if(pid == 0) { // child close(pipes[0]); // close reading end dup2(pipes[1],1); // 1 = stdout close(pipes[1]); /* local modification in child, * no propagation to parent */ ctx->arg = arg_override; rr = _admm_generic(ctx, SLEEPS_VERY_LONG|SUPRESS_STDERR| DONT_REPORT_FAILED); exit(rr); } close(pipes[1]); // close writing end rr = read(pipes[0], buffer, SLURP_SIZE-1); if( rr == -1) { free(buffer); // FIXME cleanup return 0; } buffer[rr]=0; close(pipes[0]); waitpid(pid,0,0); return buffer; } int adm_create_md(struct cfg_ctx *ctx) { char answer[ANSWER_SIZE]; struct node_info ni; uint64_t device_uuid=0; uint64_t device_size=0; char *uri; int send=0; char *tb; int rv,fd; char *r; tb = run_admm_generic(ctx, "read-dev-uuid"); device_uuid = strto_u64(tb,NULL,16); free(tb); /* this is "drbdmeta ... create-md" */ rv = _admm_generic(ctx, SLEEPS_VERY_LONG); if(rv || dry_run) return rv; fd = open(ctx->vol->disk,O_RDONLY); if( fd != -1) { device_size = bdev_size(fd); close(fd); } if( read_node_id(&ni) && device_size && !device_uuid) { get_random_bytes(&device_uuid, sizeof(uint64_t)); if( global_options.usage_count == UC_YES ) send = 1; if( global_options.usage_count == UC_ASK ) { fprintf(stderr, "\n" "\t\t--== Creating metadata ==--\n" "As with nodes, we count the total number of devices mirrored by DRBD\n" "at http://"HTTP_HOST".\n\n" "The counter works anonymously. 
It creates a random number to identify\n" "the device and sends that random number, along with the kernel and\n" "DRBD version, to "HTTP_HOST".\n\n" "http://"HTTP_HOST"/cgi-bin/insert_usage.pl?nu="U64"&ru="U64"&rs="U64"\n\n" "* If you wish to opt out entirely, simply enter 'no'.\n" "* To continue, just press [RETURN]\n", ni.node_uuid,device_uuid,device_size ); r = fgets(answer, ANSWER_SIZE, stdin); if(r && strcmp(answer,"no\n")) send = 1; } } if(!device_uuid) { get_random_bytes(&device_uuid, sizeof(uint64_t)); } if (send) { ssprintf(uri,"http://"HTTP_HOST"/cgi-bin/insert_usage.pl?" "nu="U64"&ru="U64"&rs="U64, ni.node_uuid, device_uuid, device_size); make_get_request(uri); } /* HACK */ { struct cfg_ctx local_ctx = *ctx; struct setup_option *old_setup_options; char *opt; ssprintf(opt, X64(016), device_uuid); old_setup_options = setup_options; setup_options = NULL; add_setup_option(false, opt); local_ctx.arg = "write-dev-uuid"; _admm_generic(&local_ctx, SLEEPS_VERY_LONG); free(setup_options); setup_options = old_setup_options; } return rv; } drbd-8.4.4/user/drbdmeta.c0000664000000000000000000037304412221310571014053 0ustar rootroot/* drbdmeta.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2004-2008, LINBIT Information Technologies GmbH Copyright (C) 2004-2008, Philipp Reisner Copyright (C) 2004-2008, Lars Ellenberg drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. 
If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ /* have the first, otherwise you get e.g. "redefined" types from * sys/types.h and other weird stuff */ #define INITIALIZE_BITMAP 0 #define _GNU_SOURCE #define _XOPEN_SOURCE 600 #define _FILE_OFFSET_BITS 64 #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* only use DRBD_MAGIC from here! */ #include /* for BLKFLSBUF */ #include "drbd_endian.h" #include "drbdtool_common.h" #include "drbdmeta_parser.h" #include "config.h" extern FILE* yyin; YYSTYPE yylval; int force = 0; int verbose = 0; int ignore_sanity_checks = 0; int dry_run = 0; int option_peer_max_bio_size = 0; unsigned option_al_stripes = 1; unsigned option_al_stripe_size_4k = 8; unsigned option_al_stripes_used = 0; struct option metaopt[] = { { "ignore-sanity-checks", no_argument, &ignore_sanity_checks, 1000 }, { "dry-run", no_argument, &dry_run, 1000 }, { "force", no_argument, 0, 'f' }, { "verbose", no_argument, 0, 'v' }, { "peer-max-bio-size", required_argument, NULL, 'p' }, { "al-stripes", required_argument, NULL, 's' }, { "al-stripe-size-kB", required_argument, NULL, 'z' }, { NULL, 0, 0, 0 }, }; /* FIXME? should use sector_t and off_t, not long/uint64_t ... */ /* Note RETURN VALUES: * exit code convention: int vXY_something() and meta_blah return some negative * error code, usually -1, when failed, 0 for success. * * FIXME some of the return -1; probably should better be exit(something); * or some of the exit() should be rather some return? * * AND, the exit codes should follow some defined scheme. 
*/ #if 0 #define ASSERT(x) ((void)(0)) #define d_expect(x) (x) #else #define ASSERT(x) do { if (!(x)) { \ fprintf(stderr, "%s:%u:%s: ASSERT(%s) failed.\n", \ __FILE__ , __LINE__ , __func__ , #x ); \ abort(); } \ } while (0) #define d_expect(x) ({ \ int _x = (x); \ if (!_x) \ fprintf(stderr, "%s:%u:%s: ASSERT(%s) failed.\n",\ __FILE__ , __LINE__ , __func__ , #x ); \ _x; }) #endif static int confirmed(const char *text) { const char yes[] = "yes"; const ssize_t N = sizeof(yes); char *answer = NULL; size_t n = 0; int ok; printf("\n%s\n", text); if (force) { printf("*** confirmation forced via --force option ***\n"); ok = 1; } else { printf("[need to type '%s' to confirm] ", yes); ok = getline(&answer,&n,stdin) == N && strncmp(answer,yes,N-1) == 0; if (answer) free(answer); printf("\n"); } return ok; } /* * FIXME * * when configuring a drbd device: * * Require valid drbd meta data at the respective location. A meta data * block would only be created by the drbdmeta command. * * (How) do we want to implement this: A meta data block contains some * reference to the physical device it belongs. Refuse to attach not * corresponding meta data. * * THINK: put a checksum within the on-disk meta data block, too? * * When asked to create a new meta data block, the drbdmeta command * warns loudly if either the data device or the meta data device seem * to contain some data, and requires explicit confirmation anyways. * * See current implementation in check_for_existing_data below. * * XXX should also be done for meta-data != internal, i.e. refuse to * create meta data blocks on a device that seems to be in use for * something else. * * Maybe with an external meta data device, we want to require a "meta * data device super block", which could also serve as TOC to the meta * data, once we have variable size meta data. Other option could be a * /var/lib/drbd/md-toc plain file, and some magic block on every device * that serves as md storage. 
* * For certain content on the lower level device, we should refuse * always. e.g. refuse to be created on top of a LVM2 physical volume, * or on top of swap space. This would require people to do an dd * if=/dev/zero of=device. Protects them from shooting themselves, * and blaming us... */ /* reiserfs sb offset is 64k plus * align it to 4k, in case someone has unusual hard sect size (!= 512), * otherwise direct io will fail with EINVAL */ #define SO_MUCH (68*1024) /* * I think this block of declarations and definitions should be * in some common.h, too. * { */ #ifndef ALIGN # define ALIGN(x,a) ( ((x) + (a)-1) &~ ((a)-1) ) #endif #define MD_AL_OFFSET_07 8 #define MD_AL_MAX_SECT_07 64 #define MD_BM_OFFSET_07 (MD_AL_OFFSET_07 + MD_AL_MAX_SECT_07) #define MD_RESERVED_SECT_07 ( (uint64_t)(128ULL << 11) ) #define MD_BM_MAX_BYTE_07 ( (uint64_t)(MD_RESERVED_SECT_07 - MD_BM_OFFSET_07)*512 ) #if BITS_PER_LONG == 32 #define MD_BM_MAX_BYTE_FLEX ( (uint64_t)(1ULL << (32-3)) ) #else #define MD_BM_MAX_BYTE_FLEX ( (uint64_t)(1ULL << (38-3)) ) #endif #define DEFAULT_BM_BLOCK_SIZE (1<<12) #define DRBD_MD_MAGIC_06 (DRBD_MAGIC+2) #define DRBD_MD_MAGIC_07 (DRBD_MAGIC+3) #define DRBD_MD_MAGIC_08 (DRBD_MAGIC+4) #define DRBD_MD_MAGIC_84_UNCLEAN (DRBD_MAGIC+5) /* * } * end of should-be-shared */ /* * global variables and data types */ /* buffer_size has to be a multiple of 4096, and at least 32k. * Pending a "nice" implementation of replay_al_84 for striped activity log, * I chose a big buffer hopefully large enough to hold the whole activity log, * even with "large" number of stripes and stripe sizes. */ const size_t buffer_size = 32 * 1024 * 1024; size_t pagesize; /* = sysconf(_SC_PAGESIZE) */ int opened_odirect = 1; void *on_disk_buffer = NULL; int global_argc; char **global_argv; enum Known_Formats { Drbd_06, Drbd_07, Drbd_08, Drbd_Unknown, }; /* let gcc help us get it right. 
* some explicit endian types */ typedef struct { uint64_t le; } le_u64; typedef struct { uint64_t be; } be_u64; typedef struct { uint32_t le; } le_u32; typedef struct { uint32_t be; } be_u32; typedef struct { int32_t be; } be_s32; typedef struct { uint16_t be; } be_u16; typedef struct { unsigned long le; } le_ulong; typedef struct { unsigned long be; } be_ulong; /* NOTE that this structure does not need to be packed, * aligned, nor does it need to be in the same order as the on_disk variants. */ struct md_cpu { /* present since drbd 0.6 */ uint32_t gc[GEN_CNT_SIZE]; /* generation counter */ uint32_t magic; /* added in drbd 0.7; * 0.7 stores la_size_sect on disk as kb, 0.8 in units of sectors. * we use sectors in our general working structure here */ uint64_t la_sect; /* last agreed size. */ uint32_t md_size_sect; int32_t al_offset; /* signed sector offset to this block */ uint32_t al_nr_extents; /* important for restoring the AL */ int32_t bm_offset; /* signed sector offset to the bitmap, from here */ /* Since DRBD 0.8 we have uuid instead of gc */ uint64_t uuid[UI_SIZE]; uint32_t flags; uint64_t device_uuid; uint32_t bm_bytes_per_bit; uint32_t la_peer_max_bio_size; uint32_t al_stripes; uint32_t al_stripe_size_4k; }; /* * drbdmeta specific types */ struct format_ops; struct format { const struct format_ops *ops; char *md_device_name; /* well, in 06 it is file name */ char *drbd_dev_name; unsigned minor; /* cache, determined from drbd_dev_name */ int lock_fd; int drbd_fd; /* no longer used! */ int ll_fd; /* not yet used here */ int md_fd; int md_hard_sect_size; /* unused in 06 */ int md_index; unsigned int bm_bytes; unsigned int bits_set; /* 32 bit should be enough. @4k ==> 16TB */ int bits_counted:1; int update_lk_bdev:1; /* need to update the last known bdev info? 
*/ struct md_cpu md; /* _byte_ offsets of our "super block" and other data, within fd */ uint64_t md_offset; uint64_t al_offset; uint64_t bm_offset; /* if create_md actually does convert, * we want to wipe the old meta data block _after_ conversion. */ uint64_t wipe_fixed; uint64_t wipe_flex; /* convenience */ uint64_t bd_size; /* size of block device for internal meta data */ /* size limit due to available on-disk bitmap */ uint64_t max_usable_sect; /* last-known bdev info, * to increase the chance of finding internal meta data in case the * lower level device has been resized without telling DRBD. * Loaded from file for internal metadata */ struct bdev_info lk_bd; }; /* - parse is expected to exit() if it does not work out. * - open is expected to read the respective on_disk members, * and copy the "superblock" meta data into the struct md_cpu * FIXME describe rest of them, and when they should exit, * return error or success. */ struct format_ops { const char *name; char **args; int (*parse) (struct format *, char **, int, int *); int (*open) (struct format *); int (*close) (struct format *); int (*md_initialize) (struct format *); int (*md_disk_to_cpu) (struct format *); int (*md_cpu_to_disk) (struct format *); void (*get_gi) (struct md_cpu *md); void (*show_gi) (struct md_cpu *md); void (*set_gi) (struct md_cpu *md, char **argv, int argc); int (*outdate_gi) (struct md_cpu *md); int (*invalidate_gi) (struct md_cpu *md); }; /* * -- DRBD 0.6 -------------------------------------- */ struct __packed md_on_disk_06 { be_u32 gc[GEN_CNT_SIZE]; /* generation counter */ be_u32 magic; }; void md_disk_06_to_cpu(struct md_cpu *cpu, const struct md_on_disk_06 *disk) { int i; memset(cpu, 0, sizeof(*cpu)); for (i = 0; i < GEN_CNT_SIZE; i++) cpu->gc[i] = be32_to_cpu(disk->gc[i].be); cpu->magic = be32_to_cpu(disk->magic.be); } void md_cpu_to_disk_06(struct md_on_disk_06 *disk, struct md_cpu *cpu) { int i; for (i = 0; i < GEN_CNT_SIZE; i++) disk->gc[i].be = 
cpu_to_be32(cpu->gc[i]); disk->magic.be = cpu_to_be32(cpu->magic); } int v06_validate_md(struct format *cfg) { if (cfg->md.magic != DRBD_MD_MAGIC_06) { fprintf(stderr, "v06 Magic number not found\n"); return -1; } return 0; } /* * -- DRBD 0.7 -------------------------------------- */ struct __packed md_on_disk_07 { be_u64 la_kb; /* last agreed size. */ be_u32 gc[GEN_CNT_SIZE]; /* generation counter */ be_u32 magic; be_u32 md_size_kb; be_s32 al_offset; /* signed sector offset to this block */ be_u32 al_nr_extents; /* important for restoring the AL */ be_s32 bm_offset; /* signed sector offset to the bitmap, from here */ char reserved[8 * 512 - 48]; }; void md_disk_07_to_cpu(struct md_cpu *cpu, const struct md_on_disk_07 *disk) { int i; memset(cpu, 0, sizeof(*cpu)); cpu->la_sect = be64_to_cpu(disk->la_kb.be) << 1; for (i = 0; i < GEN_CNT_SIZE; i++) cpu->gc[i] = be32_to_cpu(disk->gc[i].be); cpu->magic = be32_to_cpu(disk->magic.be); cpu->md_size_sect = be32_to_cpu(disk->md_size_kb.be) << 1; cpu->al_offset = be32_to_cpu(disk->al_offset.be); cpu->al_nr_extents = be32_to_cpu(disk->al_nr_extents.be); cpu->bm_offset = be32_to_cpu(disk->bm_offset.be); cpu->bm_bytes_per_bit = 4096; cpu->al_stripes = 1; cpu->al_stripe_size_4k = 8; } void md_cpu_to_disk_07(struct md_on_disk_07 *disk, const struct md_cpu const *cpu) { int i; disk->la_kb.be = cpu_to_be64(cpu->la_sect >> 1); for (i = 0; i < GEN_CNT_SIZE; i++) disk->gc[i].be = cpu_to_be32(cpu->gc[i]); disk->magic.be = cpu_to_be32(cpu->magic); disk->md_size_kb.be = cpu_to_be32(cpu->md_size_sect >> 1); disk->al_offset.be = cpu_to_be32(cpu->al_offset); disk->al_nr_extents.be = cpu_to_be32(cpu->al_nr_extents); disk->bm_offset.be = cpu_to_be32(cpu->bm_offset); memset(disk->reserved, 0, sizeof(disk->reserved)); } int is_valid_md(int f, const struct md_cpu const *md, const int md_index, const uint64_t ll_size) { uint64_t md_size_sect; int al_size_sect; char *v = (f == Drbd_07) ? 
"v07" : "v08"; ASSERT(f == Drbd_07 || f == Drbd_08); if ((f == Drbd_07 && md->magic != DRBD_MD_MAGIC_07) || (f == Drbd_08 && md->magic != DRBD_MD_MAGIC_08 && md->magic != DRBD_MD_MAGIC_84_UNCLEAN)) { if (verbose >= 1) fprintf(stderr, "%s Magic number not found\n", v); return 0; } al_size_sect = md->al_stripes * md->al_stripe_size_4k * 8; switch(md_index) { default: case DRBD_MD_INDEX_INTERNAL: case DRBD_MD_INDEX_FLEX_EXT: if (md->al_offset != MD_AL_OFFSET_07) { fprintf(stderr, "%s Magic number (al_offset) not found\n", v); fprintf(stderr, "\texpected: %d, found %d\n", MD_AL_OFFSET_07, md->al_offset); return 0; } if (md->bm_offset != MD_AL_OFFSET_07 + al_size_sect) { fprintf(stderr, "%s bm_offset: expected %d, found %d\n", v, MD_AL_OFFSET_07 + al_size_sect, md->bm_offset); return 0; } break; case DRBD_MD_INDEX_FLEX_INT: if (md->al_offset != -al_size_sect) { fprintf(stderr, "%s al_offset: expected %d, found %d\n", v, -al_size_sect, md->al_offset); return 0; } /* we need (slightly less than) ~ this much bitmap sectors: */ md_size_sect = (ll_size + (1UL<<24)-1) >> 24; /* BM_EXT_SIZE_B */ md_size_sect = (md_size_sect + 7) & ~7ULL; /* align on 4K blocks */ /* plus the "drbd meta data super block", * and the activity log; unit still sectors */ md_size_sect += MD_AL_OFFSET_07 + al_size_sect; if (md->bm_offset != -(int64_t)md_size_sect + MD_AL_OFFSET_07) { fprintf(stderr, "strange bm_offset %d (expected: "D64")\n", md->bm_offset, -(int64_t)md_size_sect + MD_AL_OFFSET_07); return 0; }; if (md->md_size_sect != md_size_sect) { fprintf(stderr, "strange md_size_sect %u (expected: "U64")\n", md->md_size_sect, md_size_sect); if (f == Drbd_08) return 0; /* else not an error, * was inconsistently implemented in v07 */ } break; } /* FIXME consistency check, la_size_sect < ll_device_size, * no overlap with internal meta data, * no overlap of flexible meta data offsets/sizes * ... 
*/ return 1; /* VALID */ } /* * these stay the same for 0.8, too: */ struct al_sector_cpu { uint32_t magic; uint32_t tr_number; struct { uint32_t pos; uint32_t extent; } updates[62]; uint32_t xor_sum; }; struct __packed al_sector_on_disk { be_u32 magic; be_u32 tr_number; struct __packed { be_u32 pos; be_u32 extent; } updates[62]; be_u32 xor_sum; be_u32 pad; }; int v07_al_disk_to_cpu(struct al_sector_cpu *al_cpu, struct al_sector_on_disk *al_disk) { uint32_t xor_sum = 0; int i; al_cpu->magic = be32_to_cpu(al_disk->magic.be); al_cpu->tr_number = be32_to_cpu(al_disk->tr_number.be); for (i = 0; i < 62; i++) { al_cpu->updates[i].pos = be32_to_cpu(al_disk->updates[i].pos.be); al_cpu->updates[i].extent = be32_to_cpu(al_disk->updates[i].extent.be); xor_sum ^= al_cpu->updates[i].extent; } al_cpu->xor_sum = be32_to_cpu(al_disk->xor_sum.be); return al_cpu->magic == DRBD_MAGIC && al_cpu->xor_sum == xor_sum; } /* * -- DRBD 8.0, 8.2, 8.3 -------------------------------------- */ struct __packed md_on_disk_08 { be_u64 la_sect; /* last agreed size. */ be_u64 uuid[UI_SIZE]; // UUIDs. 
be_u64 device_uuid; be_u64 reserved_u64_1; be_u32 flags; be_u32 magic; be_u32 md_size_sect; be_s32 al_offset; /* signed sector offset to this block */ be_u32 al_nr_extents; /* important for restoring the AL */ be_s32 bm_offset; /* signed sector offset to the bitmap, from here */ be_u32 bm_bytes_per_bit; be_u32 la_peer_max_bio_size; /* last peer max_bio_size */ /* see al_tr_number_to_on_disk_sector() */ be_u32 al_stripes; be_u32 al_stripe_size_4k; be_u32 reserved_u32[1]; char reserved[8 * 512 - (8*(UI_SIZE+3)+4*11)]; }; void md_disk_08_to_cpu(struct md_cpu *cpu, const struct md_on_disk_08 *disk) { int i; memset(cpu, 0, sizeof(*cpu)); cpu->la_sect = be64_to_cpu(disk->la_sect.be); for ( i=UI_CURRENT ; i<UI_SIZE ; i++ ) cpu->uuid[i] = be64_to_cpu(disk->uuid[i].be); cpu->device_uuid = be64_to_cpu(disk->device_uuid.be); cpu->flags = be32_to_cpu(disk->flags.be); cpu->magic = be32_to_cpu(disk->magic.be); cpu->md_size_sect = be32_to_cpu(disk->md_size_sect.be); cpu->al_offset = be32_to_cpu(disk->al_offset.be); cpu->al_nr_extents = be32_to_cpu(disk->al_nr_extents.be); cpu->bm_offset = be32_to_cpu(disk->bm_offset.be); cpu->bm_bytes_per_bit = be32_to_cpu(disk->bm_bytes_per_bit.be); cpu->la_peer_max_bio_size = be32_to_cpu(disk->la_peer_max_bio_size.be); cpu->al_stripes = be32_to_cpu(disk->al_stripes.be); cpu->al_stripe_size_4k = be32_to_cpu(disk->al_stripe_size_4k.be); /* not set? 
--> default to old fixed size activity log */ if (cpu->al_stripes == 0 && cpu->al_stripe_size_4k == 0) { cpu->al_stripes = 1; cpu->al_stripe_size_4k = 8; } } void md_cpu_to_disk_08(struct md_on_disk_08 *disk, const struct md_cpu *cpu) { int i; memset(disk, 0, sizeof(*disk)); disk->la_sect.be = cpu_to_be64(cpu->la_sect); for ( i=UI_CURRENT ; i<UI_SIZE ; i++ ) { disk->uuid[i].be = cpu_to_be64(cpu->uuid[i]); } disk->device_uuid.be = cpu_to_be64(cpu->device_uuid); disk->flags.be = cpu_to_be32(cpu->flags); disk->magic.be = cpu_to_be32(cpu->magic); disk->md_size_sect.be = cpu_to_be32(cpu->md_size_sect); disk->al_offset.be = cpu_to_be32(cpu->al_offset); disk->al_nr_extents.be = cpu_to_be32(cpu->al_nr_extents); disk->bm_offset.be = cpu_to_be32(cpu->bm_offset); disk->bm_bytes_per_bit.be = cpu_to_be32(cpu->bm_bytes_per_bit); disk->la_peer_max_bio_size.be = cpu_to_be32(cpu->la_peer_max_bio_size); disk->al_stripes.be = cpu_to_be32(cpu->al_stripes); disk->al_stripe_size_4k.be = cpu_to_be32(cpu->al_stripe_size_4k); } /* * -- DRBD 8.4 -------------------------------------- */ /* new in 8.4: 4k al transaction blocks */ #define AL_UPDATES_PER_TRANSACTION 64 #define AL_CONTEXT_PER_TRANSACTION 919 /* from DRBD 8.4 linux/drbd/drbd_limits.h, DRBD_AL_EXTENTS_MAX */ #define AL_EXTENTS_MAX 65534 enum al_transaction_types { AL_TR_UPDATE = 0, AL_TR_INITIALIZED = 0xffff }; struct __packed al_4k_transaction_on_disk { /* don't we all like magic */ be_u32 magic; /* to identify the most recent transaction block * in the on disk ring buffer */ be_u32 tr_number; /* checksum on the full 4k block, with this field set to 0. */ be_u32 crc32c; /* type of transaction, special transaction types like: * purge-all, set-all-idle, set-all-active, ... to-be-defined * see also enum al_transaction_types */ be_u16 transaction_type; /* we currently allow only a few thousand extents, * so 16bit will be enough for the slot number. 
*/ /* how many updates in this transaction */ be_u16 n_updates; /* maximum slot number, "al-extents" in drbd.conf speak. * Having this in each transaction should make reconfiguration * of that parameter easier. */ be_u16 context_size; /* slot number the context starts with */ be_u16 context_start_slot_nr; /* Some reserved bytes. Expected usage is a 64bit counter of * sectors-written since device creation, and other data generation tag * supporting usage */ be_u32 __reserved[4]; /* --- 36 byte used --- */ /* Reserve space for up to AL_UPDATES_PER_TRANSACTION changes * in one transaction, then use the remaining byte in the 4k block for * context information. "Flexible" number of updates per transaction * does not help, as we have to account for the case when all update * slots are used anyways, so it would only complicate code without * additional benefit. */ be_u16 update_slot_nr[AL_UPDATES_PER_TRANSACTION]; /* but the extent number is 32bit, which at an extent size of 4 MiB * allows to cover device sizes of up to 2**54 Byte (16 PiB) */ be_u32 update_extent_nr[AL_UPDATES_PER_TRANSACTION]; /* --- 420 bytes used (36 + 64*6) --- */ /* 4096 - 420 = 3676 = 919 * 4 */ be_u32 context[AL_CONTEXT_PER_TRANSACTION]; }; struct al_4k_cpu { uint32_t magic; uint32_t tr_number; uint32_t crc32c; uint16_t transaction_type; uint16_t n_updates; uint16_t context_size; uint16_t context_start_slot_nr; uint32_t __reserved[4]; uint16_t update_slot_nr[AL_UPDATES_PER_TRANSACTION]; uint32_t update_extent_nr[AL_UPDATES_PER_TRANSACTION]; uint32_t context[AL_CONTEXT_PER_TRANSACTION]; uint32_t is_valid; }; /* from linux/crypto/crc32.c */ static const uint32_t crc32c_table[256] = { 0x00000000L, 0xF26B8303L, 0xE13B70F7L, 0x1350F3F4L, 0xC79A971FL, 0x35F1141CL, 0x26A1E7E8L, 0xD4CA64EBL, 0x8AD958CFL, 0x78B2DBCCL, 0x6BE22838L, 0x9989AB3BL, 0x4D43CFD0L, 0xBF284CD3L, 0xAC78BF27L, 0x5E133C24L, 0x105EC76FL, 0xE235446CL, 0xF165B798L, 0x030E349BL, 0xD7C45070L, 0x25AFD373L, 0x36FF2087L, 0xC494A384L, 
0x9A879FA0L, 0x68EC1CA3L, 0x7BBCEF57L, 0x89D76C54L, 0x5D1D08BFL, 0xAF768BBCL, 0xBC267848L, 0x4E4DFB4BL, 0x20BD8EDEL, 0xD2D60DDDL, 0xC186FE29L, 0x33ED7D2AL, 0xE72719C1L, 0x154C9AC2L, 0x061C6936L, 0xF477EA35L, 0xAA64D611L, 0x580F5512L, 0x4B5FA6E6L, 0xB93425E5L, 0x6DFE410EL, 0x9F95C20DL, 0x8CC531F9L, 0x7EAEB2FAL, 0x30E349B1L, 0xC288CAB2L, 0xD1D83946L, 0x23B3BA45L, 0xF779DEAEL, 0x05125DADL, 0x1642AE59L, 0xE4292D5AL, 0xBA3A117EL, 0x4851927DL, 0x5B016189L, 0xA96AE28AL, 0x7DA08661L, 0x8FCB0562L, 0x9C9BF696L, 0x6EF07595L, 0x417B1DBCL, 0xB3109EBFL, 0xA0406D4BL, 0x522BEE48L, 0x86E18AA3L, 0x748A09A0L, 0x67DAFA54L, 0x95B17957L, 0xCBA24573L, 0x39C9C670L, 0x2A993584L, 0xD8F2B687L, 0x0C38D26CL, 0xFE53516FL, 0xED03A29BL, 0x1F682198L, 0x5125DAD3L, 0xA34E59D0L, 0xB01EAA24L, 0x42752927L, 0x96BF4DCCL, 0x64D4CECFL, 0x77843D3BL, 0x85EFBE38L, 0xDBFC821CL, 0x2997011FL, 0x3AC7F2EBL, 0xC8AC71E8L, 0x1C661503L, 0xEE0D9600L, 0xFD5D65F4L, 0x0F36E6F7L, 0x61C69362L, 0x93AD1061L, 0x80FDE395L, 0x72966096L, 0xA65C047DL, 0x5437877EL, 0x4767748AL, 0xB50CF789L, 0xEB1FCBADL, 0x197448AEL, 0x0A24BB5AL, 0xF84F3859L, 0x2C855CB2L, 0xDEEEDFB1L, 0xCDBE2C45L, 0x3FD5AF46L, 0x7198540DL, 0x83F3D70EL, 0x90A324FAL, 0x62C8A7F9L, 0xB602C312L, 0x44694011L, 0x5739B3E5L, 0xA55230E6L, 0xFB410CC2L, 0x092A8FC1L, 0x1A7A7C35L, 0xE811FF36L, 0x3CDB9BDDL, 0xCEB018DEL, 0xDDE0EB2AL, 0x2F8B6829L, 0x82F63B78L, 0x709DB87BL, 0x63CD4B8FL, 0x91A6C88CL, 0x456CAC67L, 0xB7072F64L, 0xA457DC90L, 0x563C5F93L, 0x082F63B7L, 0xFA44E0B4L, 0xE9141340L, 0x1B7F9043L, 0xCFB5F4A8L, 0x3DDE77ABL, 0x2E8E845FL, 0xDCE5075CL, 0x92A8FC17L, 0x60C37F14L, 0x73938CE0L, 0x81F80FE3L, 0x55326B08L, 0xA759E80BL, 0xB4091BFFL, 0x466298FCL, 0x1871A4D8L, 0xEA1A27DBL, 0xF94AD42FL, 0x0B21572CL, 0xDFEB33C7L, 0x2D80B0C4L, 0x3ED04330L, 0xCCBBC033L, 0xA24BB5A6L, 0x502036A5L, 0x4370C551L, 0xB11B4652L, 0x65D122B9L, 0x97BAA1BAL, 0x84EA524EL, 0x7681D14DL, 0x2892ED69L, 0xDAF96E6AL, 0xC9A99D9EL, 0x3BC21E9DL, 0xEF087A76L, 0x1D63F975L, 0x0E330A81L, 0xFC588982L, 0xB21572C9L, 
0x407EF1CAL, 0x532E023EL, 0xA145813DL, 0x758FE5D6L, 0x87E466D5L, 0x94B49521L, 0x66DF1622L, 0x38CC2A06L, 0xCAA7A905L, 0xD9F75AF1L, 0x2B9CD9F2L, 0xFF56BD19L, 0x0D3D3E1AL, 0x1E6DCDEEL, 0xEC064EEDL, 0xC38D26C4L, 0x31E6A5C7L, 0x22B65633L, 0xD0DDD530L, 0x0417B1DBL, 0xF67C32D8L, 0xE52CC12CL, 0x1747422FL, 0x49547E0BL, 0xBB3FFD08L, 0xA86F0EFCL, 0x5A048DFFL, 0x8ECEE914L, 0x7CA56A17L, 0x6FF599E3L, 0x9D9E1AE0L, 0xD3D3E1ABL, 0x21B862A8L, 0x32E8915CL, 0xC083125FL, 0x144976B4L, 0xE622F5B7L, 0xF5720643L, 0x07198540L, 0x590AB964L, 0xAB613A67L, 0xB831C993L, 0x4A5A4A90L, 0x9E902E7BL, 0x6CFBAD78L, 0x7FAB5E8CL, 0x8DC0DD8FL, 0xE330A81AL, 0x115B2B19L, 0x020BD8EDL, 0xF0605BEEL, 0x24AA3F05L, 0xD6C1BC06L, 0xC5914FF2L, 0x37FACCF1L, 0x69E9F0D5L, 0x9B8273D6L, 0x88D28022L, 0x7AB90321L, 0xAE7367CAL, 0x5C18E4C9L, 0x4F48173DL, 0xBD23943EL, 0xF36E6F75L, 0x0105EC76L, 0x12551F82L, 0xE03E9C81L, 0x34F4F86AL, 0xC69F7B69L, 0xD5CF889DL, 0x27A40B9EL, 0x79B737BAL, 0x8BDCB4B9L, 0x988C474DL, 0x6AE7C44EL, 0xBE2DA0A5L, 0x4C4623A6L, 0x5F16D052L, 0xAD7D5351L }; /* * Steps through buffer one byte at at time, calculates reflected * crc using table. 
*/ static uint32_t crc32c(uint32_t crc, const uint8_t *data, unsigned int length) { while (length--) crc = crc32c_table[(crc ^ *data++) & 0xFFL] ^ (crc >> 8); return crc; } /* --- */ int v84_al_disk_to_cpu(struct al_4k_cpu *al_cpu, struct al_4k_transaction_on_disk *al_disk) { unsigned crc = 0; unsigned i; al_cpu->magic = be32_to_cpu(al_disk->magic.be); al_cpu->tr_number = be32_to_cpu(al_disk->tr_number.be); al_cpu->crc32c = be32_to_cpu(al_disk->crc32c.be); al_cpu->transaction_type = be16_to_cpu(al_disk->transaction_type.be); al_cpu->n_updates = be16_to_cpu(al_disk->n_updates.be); al_cpu->context_size = be16_to_cpu(al_disk->context_size.be); al_cpu->context_start_slot_nr = be16_to_cpu(al_disk->context_start_slot_nr.be); /* reserved al_disk->__reserved[4] */ for (i=0; i < AL_UPDATES_PER_TRANSACTION; i++) al_cpu->update_slot_nr[i] = be16_to_cpu(al_disk->update_slot_nr[i].be); for (i=0; i < AL_UPDATES_PER_TRANSACTION; i++) al_cpu->update_extent_nr[i] = be32_to_cpu(al_disk->update_extent_nr[i].be); for (i=0; i < AL_CONTEXT_PER_TRANSACTION; i++) al_cpu->context[i] = be32_to_cpu(al_disk->context[i].be); al_disk->crc32c.be = 0; crc = crc32c(crc, (void*)al_disk, 4096); al_cpu->is_valid = (al_cpu->magic == DRBD_AL_MAGIC && al_cpu->crc32c == crc); return al_cpu->is_valid; } /* * -------------------------------------------------- */ /* pre declarations */ void m_get_gc(struct md_cpu *md); void m_show_gc(struct md_cpu *md); void m_set_gc(struct md_cpu *md, char **argv, int argc); int m_outdate_gc(struct md_cpu *md); int m_invalidate_gc(struct md_cpu *md); void m_get_uuid(struct md_cpu *md); void m_show_uuid(struct md_cpu *md); void m_set_uuid(struct md_cpu *md, char **argv, int argc); int m_outdate_uuid(struct md_cpu *md); int m_invalidate_uuid(struct md_cpu *md); int generic_md_close(struct format *cfg); int v06_md_cpu_to_disk(struct format *cfg); int v06_md_disk_to_cpu(struct format *cfg); int v06_parse(struct format *cfg, char **argv, int argc, int *ai); int 
v06_md_open(struct format *cfg); int v06_md_initialize(struct format *cfg); int v07_md_cpu_to_disk(struct format *cfg); int v07_md_disk_to_cpu(struct format *cfg); int v07_parse(struct format *cfg, char **argv, int argc, int *ai); int v07_md_initialize(struct format *cfg); int v07_style_md_open(struct format *cfg); int v08_md_open(struct format *cfg); int v08_md_cpu_to_disk(struct format *cfg); int v08_md_disk_to_cpu(struct format *cfg); int v08_md_initialize(struct format *cfg); int v08_md_close(struct format *cfg); /* return codes for md_open */ enum { VALID_MD_FOUND = 0, NO_VALID_MD_FOUND = -1, VALID_MD_FOUND_AT_LAST_KNOWN_LOCATION = -2, }; struct format_ops f_ops[] = { [Drbd_06] = { .name = "v06", .args = (char *[]){"minor", NULL}, .parse = v06_parse, .open = v06_md_open, .close = generic_md_close, .md_initialize = v06_md_initialize, .md_disk_to_cpu = v06_md_disk_to_cpu, .md_cpu_to_disk = v06_md_cpu_to_disk, .get_gi = m_get_gc, .show_gi = m_show_gc, .set_gi = m_set_gc, .outdate_gi = m_outdate_gc, .invalidate_gi = m_invalidate_gc, }, [Drbd_07] = { .name = "v07", .args = (char *[]){"device", "index", NULL}, .parse = v07_parse, .open = v07_style_md_open, .close = generic_md_close, .md_initialize = v07_md_initialize, .md_disk_to_cpu = v07_md_disk_to_cpu, .md_cpu_to_disk = v07_md_cpu_to_disk, .get_gi = m_get_gc, .show_gi = m_show_gc, .set_gi = m_set_gc, .outdate_gi = m_outdate_gc, .invalidate_gi = m_invalidate_gc, }, [Drbd_08] = { .name = "v08", .args = (char *[]){"device", "index", NULL}, .parse = v07_parse, .open = v08_md_open, .close = v08_md_close, .md_initialize = v08_md_initialize, .md_disk_to_cpu = v08_md_disk_to_cpu, .md_cpu_to_disk = v08_md_cpu_to_disk, .get_gi = m_get_uuid, .show_gi = m_show_uuid, .set_gi = m_set_uuid, .outdate_gi = m_outdate_uuid, .invalidate_gi = m_invalidate_uuid, }, }; static inline enum Known_Formats format_version(struct format *cfg) { return (cfg->ops - f_ops); } static inline int is_v06(struct format *cfg) { return 
format_version(cfg) == Drbd_06; } static inline int is_v07(struct format *cfg) { return format_version(cfg) == Drbd_07; } static inline int is_v08(struct format *cfg) { return format_version(cfg) == Drbd_08; } /****************************************** Commands we know about: ******************************************/ struct meta_cmd { const char *name; const char *args; int (*function) (struct format *, char **argv, int argc); int show_in_usage; }; /* Global command pointer, to be able to change behavior in helper functions * based on which top-level command is being processed. */ static struct meta_cmd *command; /* pre declarations */ int meta_get_gi(struct format *cfg, char **argv, int argc); int meta_show_gi(struct format *cfg, char **argv, int argc); int meta_dump_md(struct format *cfg, char **argv, int argc); int meta_apply_al(struct format *cfg, char **argv, int argc); int meta_restore_md(struct format *cfg, char **argv, int argc); int meta_verify_dump_file(struct format *cfg, char **argv, int argc); int meta_create_md(struct format *cfg, char **argv, int argc); int meta_wipe_md(struct format *cfg, char **argv, int argc); int meta_outdate(struct format *cfg, char **argv, int argc); int meta_invalidate(struct format *cfg, char **argv, int argc); int meta_set_gi(struct format *cfg, char **argv, int argc); int meta_read_dev_uuid(struct format *cfg, char **argv, int argc); int meta_write_dev_uuid(struct format *cfg, char **argv, int argc); int meta_dstate(struct format *cfg, char **argv, int argc); int meta_chk_offline_resize(struct format *cfg, char **argv, int argc); struct meta_cmd cmds[] = { {"get-gi", 0, meta_get_gi, 1}, {"show-gi", 0, meta_show_gi, 1}, {"dump-md", 0, meta_dump_md, 1}, {"restore-md", "file", meta_restore_md, 1}, {"verify-dump", "file", meta_verify_dump_file, 1}, {"apply-al", 0, meta_apply_al, 1}, {"wipe-md", 0, meta_wipe_md, 1}, {"outdate", 0, meta_outdate, 1}, {"invalidate", 0, meta_invalidate, 1}, {"dstate", 0, meta_dstate, 1}, 
{"read-dev-uuid", "VAL", meta_read_dev_uuid, 0}, {"write-dev-uuid", "VAL", meta_write_dev_uuid, 0}, {"set-gi", ":::VAL:VAL:...", meta_set_gi, 0}, {"check-resize", 0, meta_chk_offline_resize, 1}, {"create-md", "[--peer-max-bio-size {val}] " "[--al-stripes {val}] " "[--al-stripe-size-kB {val}]", meta_create_md, 1}, }; /* * generic helpers */ #define PREAD(a,b,c,d) pread_or_die((a),(b),(c),(d), __func__ ) #define PWRITE(a,b,c,d) pwrite_or_die((a),(b),(c),(d), __func__ ) /* Do we want to exit() right here, * or do we want to duplicate the error handling everywhere? */ void pread_or_die(int fd, void *buf, size_t count, off_t offset, const char* tag) { ssize_t c = pread(fd, buf, count, offset); if (verbose >= 2) { fflush(stdout); fprintf(stderr, " %-26s: pread(%u, ...,%6lu,%12llu)\n", tag, fd, (unsigned long)count, (unsigned long long)offset); if (count & ((1<<12)-1)) fprintf(stderr, "\tcount will cause EINVAL on hard sect size != 512\n"); if (offset & ((1<<12)-1)) fprintf(stderr, "\toffset will cause EINVAL on hard sect size != 512\n"); } if (c < 0) { fprintf(stderr,"pread(%u,...,%lu,%llu) in %s failed: %s\n", fd, (unsigned long)count, (unsigned long long)offset, tag, strerror(errno)); exit(10); } else if ((size_t)c != count) { fprintf(stderr,"confused in %s: expected to read %d bytes," " actually read %d\n", tag, (int)count, (int)c); exit(10); } if (verbose > 10) fprintf_hex(stderr, offset, buf, count); } static unsigned n_writes = 0; void pwrite_or_die(int fd, const void *buf, size_t count, off_t offset, const char* tag) { ssize_t c; ++n_writes; if (dry_run) { fprintf(stderr, " %-26s: pwrite(%u, ...,%6lu,%12llu) SKIPPED DUE TO DRY-RUN\n", tag, fd, (unsigned long)count, (unsigned long long)offset); if (verbose > 10) fprintf_hex(stderr, offset, buf, count); return; } c = pwrite(fd, buf, count, offset); if (verbose >= 2) { fflush(stdout); fprintf(stderr, " %-26s: pwrite(%u, ...,%6lu,%12llu)\n", tag, fd, (unsigned long)count, (unsigned long long)offset); if (count & 
((1<<12)-1)) fprintf(stderr, "\tcount will cause EINVAL on hard sect size != 512\n"); if (offset & ((1<<12)-1)) fprintf(stderr, "\toffset will cause EINVAL on hard sect size != 512\n"); } if (c < 0) { fprintf(stderr,"pwrite(%u,...,%lu,%llu) in %s failed: %s\n", fd, (unsigned long)count, (unsigned long long)offset, tag, strerror(errno)); exit(10); } else if ((size_t)c != count) { /* FIXME we might just now have corrupted the on-disk data */ fprintf(stderr,"confused in %s: expected to write %d bytes," " actually wrote %d\n", tag, (int)count, (int)c); exit(10); } } size_t pwrite_with_limit_or_die(int fd, const void *buf, size_t count, off_t offset, off_t limit, const char* tag) { if (offset >= limit) { fprintf(stderr,"confused in %s: offset (%llu) > limit (%llu)\n", tag, (unsigned long long)offset, (unsigned long long)limit); exit(10); } if (count > limit - offset) { fprintf(stderr,"in %s: truncating byte count from %lu to %lu\n", tag, (unsigned long)count, (unsigned long)(limit -offset)); count = limit - offset; } pwrite_or_die(fd, buf, count, offset, tag); return count; } void m_get_gc(struct md_cpu *md) { dt_print_gc(md->gc); } void m_show_gc(struct md_cpu *md) { dt_pretty_print_gc(md->gc); } void m_get_uuid(struct md_cpu *md) { dt_print_uuids(md->uuid,md->flags); } void m_show_uuid(struct md_cpu *md) { dt_pretty_print_uuids(md->uuid,md->flags); } /* Note: strsep() advances *s past the token (or sets it to NULL), * so error messages must print the token t, not *s. */ int m_strsep_u32(char **s, uint32_t *val) { char *t, *e; unsigned long v; if ((t = strsep(s, ":"))) { if (strlen(t)) { e = t; errno = 0; v = strtoul(t, &e, 0); if (*e != 0) { fprintf(stderr, "'%s' is not a number.\n", t); exit(10); } if (errno) { fprintf(stderr, "'%s': ", t); perror(0); exit(10); } if (v > 0xFFffFFffUL) { fprintf(stderr, "'%s' is out of range (max 0xFFffFFff).\n", t); exit(10); } *val = (uint32_t)v; } return 1; } return 0; } int m_strsep_u64(char **s, uint64_t *val) { char *t, *e; uint64_t v; if ((t = strsep(s, ":"))) { if (strlen(t)) { e = t; errno = 0; v = strto_u64(t, &e, 16); if (*e != 0) { 
fprintf(stderr, "'%s' is not a number.\n", t); exit(10); } if (errno) { fprintf(stderr, "'%s': ", t); perror(0); exit(10); } *val = v; } return 1; } return 0; } int m_strsep_bit(char **s, uint32_t *val, int mask) { uint32_t d; int rv; d = *val & mask ? 1 : 0; rv = m_strsep_u32(s, &d); if (d > 1) { fprintf(stderr, "'%d' is not 0 or 1.\n", d); exit(10); } if (d) *val |= mask; else *val &= ~mask; return rv; } void m_set_gc(struct md_cpu *md, char **argv, int argc __attribute((unused))) { char **str; str = &argv[0]; do { if (!m_strsep_bit(str, &md->gc[Flags], MDF_CONSISTENT)) break; if (!m_strsep_u32(str, &md->gc[HumanCnt])) break; if (!m_strsep_u32(str, &md->gc[TimeoutCnt])) break; if (!m_strsep_u32(str, &md->gc[ConnectedCnt])) break; if (!m_strsep_u32(str, &md->gc[ArbitraryCnt])) break; if (!m_strsep_bit(str, &md->gc[Flags], MDF_PRIMARY_IND)) break; if (!m_strsep_bit(str, &md->gc[Flags], MDF_CONNECTED_IND)) break; if (!m_strsep_bit(str, &md->gc[Flags], MDF_FULL_SYNC)) break; } while (0); } void m_set_uuid(struct md_cpu *md, char **argv, int argc __attribute((unused))) { char **str; int i; str = &argv[0]; do { for ( i=UI_CURRENT ; i<UI_SIZE ; i++ ) { if (!m_strsep_u64(str, &md->uuid[i])) return; } if (!m_strsep_bit(str, &md->flags, MDF_CONSISTENT)) break; if (!m_strsep_bit(str, &md->flags, MDF_WAS_UP_TO_DATE)) break; if (!m_strsep_bit(str, &md->flags, MDF_PRIMARY_IND)) break; if (!m_strsep_bit(str, &md->flags, MDF_CONNECTED_IND)) break; if (!m_strsep_bit(str, &md->flags, MDF_FULL_SYNC)) break; if (!m_strsep_bit(str, &md->flags, MDF_PEER_OUT_DATED)) break; if (!m_strsep_bit(str, &md->flags, MDF_CRASHED_PRIMARY)) break; } while (0); } int m_outdate_gc(struct md_cpu *md __attribute((unused))) { fprintf(stderr, "Can not outdate GC based meta data!\n"); return 5; } int m_outdate_uuid(struct md_cpu *md) { if ( !(md->flags & MDF_CONSISTENT) ) { return 5; } md->flags &= ~MDF_WAS_UP_TO_DATE; return 0; } int m_invalidate_gc(struct md_cpu *md) { md->gc[Flags] &= ~MDF_CONSISTENT; md->gc[Flags] |= MDF_FULL_SYNC; return 5; } 
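/*
 * Illustration only, not part of the original drbdmeta source.
 *
 * A minimal, portable sketch of the colon-separated field convention that
 * the m_strsep_*() helpers above implement for "set-gi": an empty field
 * keeps the previous value, a non-empty field overwrites it, and running
 * out of fields ends the parse. Assumptions made for this sketch: the
 * example_* names are hypothetical, strchr() is used in place of strsep()
 * so that the code builds as plain ISO C, and malformed input is reported
 * with a -1 return value where the real helpers call exit(10).
 *
 * Possible usage: with the input "5::0x10" and three variables a, b, c,
 * three calls to example_next_u32() set a to 5, leave b untouched and set
 * c to 16; a fourth call returns 0 because the input is exhausted.
 */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if a field was consumed, 0 if the input is exhausted,
 * -1 if the field is not a valid 32 bit number. */
static int example_next_u32(char **cursor, uint32_t *val)
{
	char *field, *colon, *end;
	unsigned long v;

	assert(cursor != NULL && val != NULL);
	field = *cursor;
	if (field == NULL)
		return 0;	/* no more fields */

	/* Cut the field at the next ':' and advance the cursor past it. */
	colon = strchr(field, ':');
	if (colon != NULL) {
		*colon = '\0';
		*cursor = colon + 1;
	} else {
		*cursor = NULL;	/* this was the last field */
	}

	if (*field == '\0')
		return 1;	/* empty field: keep the old value */

	errno = 0;
	v = strtoul(field, &end, 0);
	if (*end != '\0' || errno != 0 || v > 0xFFFFFFFFUL)
		return -1;

	*val = (uint32_t)v;
	return 1;
}

/* Same convention for a single flag bit, in the spirit of m_strsep_bit():
 * the current state of the bit is the default for an empty field. */
static int example_next_bit(char **cursor, uint32_t *flags, uint32_t mask)
{
	uint32_t bit = (*flags & mask) ? 1 : 0;
	int rv = example_next_u32(cursor, &bit);

	if (rv < 0 || bit > 1)
		return -1;	/* not a number, or neither 0 nor 1 */
	if (bit)
		*flags |= mask;
	else
		*flags &= ~mask;
	return rv;
}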
int m_invalidate_uuid(struct md_cpu *md) { md->flags &= ~MDF_CONSISTENT; md->flags &= ~MDF_WAS_UP_TO_DATE; md->flags |= MDF_FULL_SYNC; return 0; } /****************************************** begin of v06 {{{ ******************************************/ int v06_md_disk_to_cpu(struct format *cfg) { PREAD(cfg->md_fd, on_disk_buffer, sizeof(struct md_on_disk_06), cfg->md_offset); md_disk_06_to_cpu(&cfg->md, (struct md_on_disk_06*)on_disk_buffer); return v06_validate_md(cfg); } int v06_md_cpu_to_disk(struct format *cfg) { if (v06_validate_md(cfg)) return -1; md_cpu_to_disk_06(on_disk_buffer, &cfg->md); PWRITE(cfg->md_fd, on_disk_buffer, sizeof(struct md_on_disk_06), cfg->md_offset); return 0; } int v06_parse(struct format *cfg, char **argv, int argc, int *ai) { unsigned long minor; char *e; if (argc < 1) { fprintf(stderr, "Too few arguments for format\n"); exit(20); } e = argv[0]; minor = strtol(argv[0], &e, 0); if (*e != 0 || minor > 255UL) { fprintf(stderr, "'%s' is not a valid minor number.\n", argv[0]); exit(20); } if (asprintf(&e, "%s/drbd%lu", DRBD_LIB_DIR, minor) <= 18) { fprintf(stderr, "asprintf() failed.\n"); exit(20); }; cfg->md_device_name = e; *ai += 1; return 0; } int v06_md_open(struct format *cfg) { struct stat sb; cfg->md_fd = open(cfg->md_device_name, O_RDWR); if (cfg->md_fd == -1) { PERROR("open(%s) failed", cfg->md_device_name); return NO_VALID_MD_FOUND; } if (fstat(cfg->md_fd, &sb)) { PERROR("fstat() failed"); return NO_VALID_MD_FOUND; } if (!S_ISREG(sb.st_mode)) { fprintf(stderr, "'%s' is not a plain file!\n", cfg->md_device_name); return NO_VALID_MD_FOUND; } if (cfg->ops->md_disk_to_cpu(cfg)) { return NO_VALID_MD_FOUND; } return VALID_MD_FOUND; } int generic_md_close(struct format *cfg) { /* On /dev/ram0 we may not use O_SYNC for some kernels (eg. RHEL6 2.6.32), * and fsync() returns EIO, too. So we don't do error checking here. 
*/ fsync(cfg->md_fd); if (close(cfg->md_fd)) { PERROR("close() failed"); return -1; } return 0; } int v06_md_initialize(struct format *cfg) { cfg->md.gc[Flags] = 0; cfg->md.gc[HumanCnt] = 1; /* THINK 0? 1? */ cfg->md.gc[TimeoutCnt] = 1; cfg->md.gc[ConnectedCnt] = 1; cfg->md.gc[ArbitraryCnt] = 1; cfg->md.magic = DRBD_MD_MAGIC_06; return 0; } /****************************************** }}} end of v06 ******************************************/ static uint64_t max_usable_sectors(struct format *cfg) { #define min(x,y) ((x) < (y) ? (x) : (y)) /* We currently have two possible layouts: * external: * |----------- md_size_sect ------------------| * [ 4k superblock ][ activity log ][ Bitmap ] * | al_offset == 8 | * | bm_offset = al_offset + X | * ==> bitmap sectors = md_size_sect - bm_offset * * internal: * |----------- md_size_sect ------------------| * [data.....][ Bitmap ][ activity log ][ 4k superblock ] * | al_offset < 0 | * | bm_offset = al_offset - Y | * ==> bitmap sectors = Y = al_offset - bm_offset * * There also used to be the fixed size internal meta data, * which covers the last 128 MB of the device, * and has the same layout as the "external:" above. */ if(cfg->md_index == DRBD_MD_INDEX_INTERNAL || cfg->md_index == DRBD_MD_INDEX_FLEX_INT) { /* for internal meta data, the available storage is limited by * the first meta data sector, even if the available bitmap * space would support more. */ return min( cfg->md_offset, min( cfg->al_offset, cfg->bm_offset )) >> 9; } else { /* For external meta data, * we are limited by available on-disk bitmap space. 
* Ok, and by the lower level storage device; * which we don't know about here :-( */ ASSERT(cfg->md.bm_bytes_per_bit == 4096); return /* bitmap sectors */ (uint64_t)(cfg->md.md_size_sect - cfg->md.bm_offset) * 512 /* sector size */ * 8 /* bits per byte */ /* storage bytes per bitmap bit; * currently always 4096 */ * cfg->md.bm_bytes_per_bit / 512; /* and back to sectors */; } #undef min } void re_initialize_md_offsets(struct format *cfg) { uint64_t md_size_sect; int al_size_sect; if (is_v08(cfg)) al_size_sect = cfg->md.al_stripes * cfg->md.al_stripe_size_4k * 8; else al_size_sect = MD_AL_MAX_SECT_07; switch(cfg->md_index) { default: cfg->md.md_size_sect = MD_RESERVED_SECT_07; cfg->md.al_offset = MD_AL_OFFSET_07; cfg->md.bm_offset = cfg->md.al_offset + al_size_sect; break; case DRBD_MD_INDEX_FLEX_EXT: /* just occupy the full device; unit: sectors */ cfg->md.md_size_sect = cfg->bd_size >> 9; cfg->md.al_offset = MD_AL_OFFSET_07; cfg->md.bm_offset = cfg->md.al_offset + al_size_sect; break; case DRBD_MD_INDEX_INTERNAL: /* only v07 */ cfg->md.md_size_sect = MD_RESERVED_SECT_07; cfg->md.al_offset = MD_AL_OFFSET_07; cfg->md.bm_offset = MD_BM_OFFSET_07; break; case DRBD_MD_INDEX_FLEX_INT: /* al size is still fixed */ cfg->md.al_offset = -al_size_sect; /* we need (slightly less than) ~ this much bitmap sectors: */ md_size_sect = (cfg->bd_size + (1UL<<24)-1) >> 24; /* BM_EXT_SIZE_B */ md_size_sect = (md_size_sect + 7) & ~7ULL; /* align on 4K blocks */ if (md_size_sect > (MD_BM_MAX_BYTE_FLEX>>9)) { char ppbuf[10]; fprintf(stderr, "Device too large. 
We only support up to %s.\n", ppsize(ppbuf, MD_BM_MAX_BYTE_FLEX << (3+2))); if (BITS_PER_LONG == 32) fprintf(stderr, "Maybe try a 64bit arch?\n"); exit(10); } /* plus the "drbd meta data super block", * and the activity log; unit still sectors */ md_size_sect += MD_AL_OFFSET_07 + al_size_sect; cfg->md.md_size_sect = md_size_sect; cfg->md.bm_offset = -md_size_sect + MD_AL_OFFSET_07; break; } cfg->al_offset = cfg->md_offset + cfg->md.al_offset * 512LL; cfg->bm_offset = cfg->md_offset + cfg->md.bm_offset * 512LL; cfg->max_usable_sect = max_usable_sectors(cfg); if (verbose >= 2) { fprintf(stderr,"md_offset: "U64"\n", cfg->md_offset); fprintf(stderr,"al_offset: "U64" (%d)\n", cfg->al_offset, cfg->md.al_offset); fprintf(stderr,"bm_offset: "U64" (%d)\n", cfg->bm_offset, cfg->md.bm_offset); fprintf(stderr,"md_size_sect: "U32"\n", cfg->md.md_size_sect); fprintf(stderr,"max_usable_sect: "U64"\n", cfg->max_usable_sect); } } void initialize_al(struct format *cfg) { unsigned int mx = cfg->md.al_stripes * cfg->md.al_stripe_size_4k; size_t al_size = mx * 4096; memset(on_disk_buffer, 0x00, al_size); if (format_version(cfg) == Drbd_08) { /* DRBD <= 8.3 does not care if it is all zero, * or otherwise wrong magic. * * For 8.4, we initialize to something that is * valid magic, valid crc, and transaction_type = 0xffff. */ struct al_4k_transaction_on_disk *al = on_disk_buffer; unsigned crc_be = 0; int i; for (i = 0; i < mx; i++, al++) { al->magic.be = cpu_to_be32(DRBD_AL_MAGIC); al->transaction_type.be = cpu_to_be16(AL_TR_INITIALIZED); /* crc calculated once */ if (i == 0) crc_be = cpu_to_be32(crc32c(0, (void*)al, 4096)); al->crc32c.be = crc_be; } } pwrite_or_die(cfg->md_fd, on_disk_buffer, al_size, cfg->al_offset, "md_initialize_common:AL"); } void check_for_existing_data(struct format *cfg); /* MAYBE DOES DISK WRITES!! */ int md_initialize_common(struct format *cfg, int do_disk_writes) { cfg->md.al_nr_extents = 257; /* arbitrary. 
*/ cfg->md.bm_bytes_per_bit = DEFAULT_BM_BLOCK_SIZE; re_initialize_md_offsets(cfg); if (!do_disk_writes) return 0; check_for_existing_data(cfg); /* do you want to initialize al to something more useful? */ printf("initializing activity log\n"); if (MD_AL_MAX_SECT_07*512 > buffer_size) { fprintf(stderr, "%s:%u: LOGIC BUG\n" , __FILE__ , __LINE__ ); exit(111); } initialize_al(cfg); /* THINK * do we really need to initialize the bitmap? */ if (INITIALIZE_BITMAP) { /* need to sector-align this for O_DIRECT. * "sector" here means hard-sect size, which may be != 512. * Note that even though ALIGN does round up, for sector sizes * of 512, 1024, 2048, 4096 Bytes, this will be fully within * the claimed meta data area, since we already align all * "interesting" parts of that to 4kB */ const size_t bm_bytes = ALIGN(cfg->bm_bytes, cfg->md_hard_sect_size); size_t i = bm_bytes; off_t bm_on_disk_off = cfg->bm_offset; unsigned int percent_done = 0; unsigned int percent_last_report = 0; size_t chunk; fprintf(stderr,"initializing bitmap (%u KB)\n", (unsigned int)(bm_bytes>>10)); memset(on_disk_buffer, 0xff, buffer_size); while (i) { chunk = buffer_size < i ? 
buffer_size : i; pwrite_or_die(cfg->md_fd, on_disk_buffer, chunk, bm_on_disk_off, "md_initialize_common:BM"); bm_on_disk_off += chunk; i -= chunk; percent_done = 100*(bm_bytes-i)/bm_bytes; if (percent_done != percent_last_report) { fprintf(stderr,"\r%u%%", percent_done); percent_last_report = percent_done; } } fprintf(stderr,"\r100%%\n"); } else { fprintf(stderr,"NOT initializing bitmap\n"); } return 0; } /****************************************** begin of v07 {{{ ******************************************/ uint64_t v07_style_md_get_byte_offset(const int idx, const uint64_t bd_size) { uint64_t offset; switch(idx) { default: /* external, some index */ offset = MD_RESERVED_SECT_07 * idx * 512; break; case DRBD_MD_INDEX_INTERNAL: offset = (bd_size & ~4095LLU) - MD_RESERVED_SECT_07 * 512; break; case DRBD_MD_INDEX_FLEX_INT: /* sizeof(struct md_on_disk_07) == 4k * position: last 4k aligned block of 4k size */ offset = bd_size - 4096LLU; offset &= ~4095LLU; break; case DRBD_MD_INDEX_FLEX_EXT: offset = 0; break; } return offset; } void printf_al_07(struct format *cfg, struct al_sector_on_disk *al_disk) { struct al_sector_cpu al_cpu; unsigned s, i; unsigned max_slot_nr = 0; for (s = 0; s < MD_AL_MAX_SECT_07; s++) { int ok = v07_al_disk_to_cpu(&al_cpu, al_disk + s); printf("# sector %2u { %s\n", s, ok ? 
"valid" : "invalid"); printf("# \tmagic: 0x%08x\n", al_cpu.magic); printf("# \ttr: %10u\n", al_cpu.tr_number); for (i = 0; i < 62; i++) { printf("# \t%2u: %10u %10u\n", i, al_cpu.updates[i].pos, al_cpu.updates[i].extent); if (al_cpu.updates[i].pos > max_slot_nr && al_cpu.updates[i].pos != -1U) max_slot_nr = al_cpu.updates[i].pos; } printf("# \txor: 0x%08x\n", al_cpu.xor_sum); printf("# }\n"); } if (max_slot_nr >= cfg->md.al_nr_extents) printf( "### CAUTION: maximum slot number found in AL: %u\n" "### CAUTION: but 'super-block' al-extents is: %u\n", max_slot_nr, cfg->md.al_nr_extents); } void printf_al_84(struct format *cfg, struct al_4k_transaction_on_disk *al_disk, unsigned block_nr_offset, unsigned N) { struct al_4k_cpu al_cpu; unsigned b, i; unsigned max_slot_nr = 0; for (b = 0; b < N; b++) { int ok = v84_al_disk_to_cpu(&al_cpu, al_disk + b); if (!ok) { printf("# block %2u { INVALID }\n", b + block_nr_offset); continue; } if (al_cpu.transaction_type == 0xffff) { printf("# block %2u { INITIALIZED }\n", b + block_nr_offset); continue; } printf("# block %2u {\n", b + block_nr_offset); printf("# \tmagic: 0x%08x\n", al_cpu.magic); printf("# \ttype: 0x%04x\n", al_cpu.transaction_type); printf("# \ttr: %10u\n", al_cpu.tr_number); printf("# \tactive set size: %u\n", al_cpu.context_size); if (al_cpu.context_size -1 > max_slot_nr) max_slot_nr = al_cpu.context_size -1; for (i = 0; i < AL_CONTEXT_PER_TRANSACTION; i++) { unsigned slot = al_cpu.context_start_slot_nr + i; if (al_cpu.context[i] == ~0U && slot >= al_cpu.context_size) continue; if (slot > max_slot_nr) max_slot_nr = slot; printf("# \t%2u: %10u %10u\n", i, slot, al_cpu.context[i]); } printf("# \tupdates: %u\n", al_cpu.n_updates); for (i = 0; i < AL_UPDATES_PER_TRANSACTION; i++) { if (i >= al_cpu.n_updates && al_cpu.update_slot_nr[i] == (uint16_t)(~0U)) continue; printf("# \t%2u: %10u %10u\n", i, al_cpu.update_slot_nr[i], al_cpu.update_extent_nr[i]); if (al_cpu.update_slot_nr[i] > max_slot_nr) max_slot_nr = 
al_cpu.update_slot_nr[i]; } printf("# \tcrc32c: 0x%08x\n", al_cpu.crc32c); printf("# }\n"); } if (max_slot_nr >= cfg->md.al_nr_extents) printf( "### CAUTION: maximum slot number found in AL: %u\n" "### CAUTION: but 'super-block' al-extents is: %u\n", max_slot_nr, cfg->md.al_nr_extents); } void printf_al(struct format *cfg) { off_t al_on_disk_off = cfg->al_offset; off_t al_size = cfg->md.al_stripes * cfg->md.al_stripe_size_4k * 4096; struct al_sector_on_disk *al_512_disk = on_disk_buffer; struct al_4k_transaction_on_disk *al_4k_disk = on_disk_buffer; unsigned block_nr_offset = 0; unsigned N; int is_al_84 = is_v08(cfg) && (cfg->md.al_stripes != 1 || cfg->md.al_stripe_size_4k != 8); printf("# al {\n"); while (al_size) { off_t chunk = al_size; if (chunk > buffer_size) chunk = buffer_size; ASSERT(chunk); pread_or_die(cfg->md_fd, on_disk_buffer, chunk, al_on_disk_off, "printf_al"); al_on_disk_off += chunk; al_size -= chunk; N = chunk/4096; /* FIXME * we should introduce a new meta data "super block" magic, so we won't * have the same super block with two different activity log * transaction layouts */ if (format_version(cfg) < Drbd_08) printf_al_07(cfg, al_512_disk); /* looks like we have the new al format */ else if (is_al_84 || DRBD_AL_MAGIC == be32_to_cpu(al_4k_disk[0].magic.be) || DRBD_AL_MAGIC == be32_to_cpu(al_4k_disk[1].magic.be)) { is_al_84 = 1; printf_al_84(cfg, al_4k_disk, block_nr_offset, N); } /* try the old al format anyways */ else printf_al_07(cfg, al_512_disk); block_nr_offset += N; if (al_size && !is_al_84) { printf("### UNEXPECTED ACTIVITY LOG SIZE!\n"); } } printf("# }\n"); } /* One activity log extent represents 4M of storage, * one bit corresponds to 4k. 
* 4M / 4k / 8bit per byte */ #define BM_BYTES_PER_AL_EXT (1UL << (22 - 12 - 3)) struct al_cursor { unsigned i; uint32_t tr_number; }; static int replay_al_07(struct format *cfg, uint32_t *hot_extent) { unsigned int mx; struct al_sector_cpu al_cpu[MD_AL_MAX_SECT_07]; unsigned char valid[MD_AL_MAX_SECT_07]; struct al_sector_on_disk *al_disk = on_disk_buffer; unsigned b, i; int found_valid = 0; struct al_cursor oldest = { 0, }; struct al_cursor newest = { 0, }; /* Endian convert, validate, and find oldest to newest log range. * In contrast to the 8.4 log format, this log format does NOT * use all log space, but only as many sectors as absolutely necessary. * * We need to trust the "al_nr_extents" setting in the "super block". */ #define AL_EXTENTS_PT 61 /* mx = 1 + div_ceil(al_nr_extents, AL_EXTENTS_PT); */ mx = 1 + (cfg->md.al_nr_extents + AL_EXTENTS_PT -1) / AL_EXTENTS_PT; for (b = 0; b < mx; b++) { valid[b] = v07_al_disk_to_cpu(al_cpu + b, al_disk + b); if (!valid[b]) continue; if (++found_valid == 1) { oldest.i = b; oldest.tr_number = al_cpu[b].tr_number; newest = oldest; continue; } d_expect(al_cpu[b].tr_number != oldest.tr_number); d_expect(al_cpu[b].tr_number != newest.tr_number); if ((int)al_cpu[b].tr_number - (int)oldest.tr_number < 0) { d_expect(oldest.tr_number - al_cpu[b].tr_number + b - oldest.i == mx); oldest.i = b; oldest.tr_number = al_cpu[b].tr_number; } if ((int)al_cpu[b].tr_number - (int)newest.tr_number > 0) { d_expect(al_cpu[b].tr_number - newest.tr_number == b - newest.i); newest.i = b; newest.tr_number = al_cpu[b].tr_number; } } if (!found_valid) { /* not even one transaction was valid. * Has this ever been initialized correctly? */ fprintf(stderr, "No usable activity log found.\n"); /* with up to 8.3 style activity log, this is NOT an error. */ return 0; } /* we do expect at most one corrupt transaction, and only in case * things went wrong during transaction write. 
*/ if (found_valid != mx) { fprintf(stderr, "%u corrupt or uninitialized AL transactions found\n", mx - found_valid); fprintf(stderr, "You can safely ignore this if this node was cleanly stopped (no crash).\n"); } /* Any other paranoia checks possible with this log format? */ /* Ok, so we found valid update transactions. Reconstruct the "active * set" at the time of the newest transaction. */ /* wrap around */ if (newest.i < oldest.i) newest.i += mx; for (b = oldest.i; b <= newest.i; b++) { unsigned idx = b % mx; if (!valid[idx]) continue; /* This loop processes both "context" and "update" information. * There is only one update, on index 0, * which is why this loop counts down. */ for (i = AL_EXTENTS_PT; (int)i >= 0; i--) { unsigned slot = al_cpu[idx].updates[i].pos; if (al_cpu[idx].updates[i].extent == ~0U) continue; if (slot >= AL_EXTENTS_MAX) { fprintf(stderr, "slot number out of range: tr:%u slot:%u\n", idx, slot); continue; } hot_extent[slot] = al_cpu[idx].updates[i].extent; } } return found_valid; } static unsigned int al_tr_number_to_on_disk_slot(struct format *cfg, unsigned int b, unsigned int mx) { const unsigned int stripes = cfg->md.al_stripes; const unsigned int stripe_size_4kB = cfg->md.al_stripe_size_4k; /* transaction number, modulo on-disk ring buffer wrap around */ unsigned int t = b % mx; /* ... to aligned 4k on disk block */ t = ((t % stripes) * stripe_size_4kB) + t/stripes; return t; } /* Expects the AL to be read into on_disk_buffer already. * Returns negative error code for non-interpretable data, * 0 for "just mark me clean, nothing more to do", * and positive if we have to apply something. 
*/ int replay_al_84(struct format *cfg, uint32_t *hot_extent) { const unsigned int mx = cfg->md.al_stripes * cfg->md.al_stripe_size_4k; struct al_4k_transaction_on_disk *al_disk = on_disk_buffer; struct al_4k_cpu *al_cpu = NULL; unsigned b, o, i; int found_valid = 0; int found_valid_updates = 0; struct al_cursor oldest = { 0, }; struct al_cursor newest = { 0, }; al_cpu = calloc(mx, sizeof(*al_cpu)); if (!al_cpu) { fprintf(stderr, "Could not calloc(%u, sizeof(*al_cpu))\n", mx); exit(30); /* FIXME sane exit codes */ } /* endian convert, validate, and find oldest to newest log range */ for (b = 0; b < mx; b++) { o = al_tr_number_to_on_disk_slot(cfg, b, mx); if (!v84_al_disk_to_cpu(al_cpu + b, al_disk + o)) continue; ++found_valid; if (al_cpu[b].transaction_type == AL_TR_INITIALIZED) continue; d_expect(al_cpu[b].transaction_type == AL_TR_UPDATE); if (++found_valid_updates == 1) { oldest.i = b; oldest.tr_number = al_cpu[b].tr_number; newest = oldest; continue; } d_expect(al_cpu[b].tr_number != oldest.tr_number); d_expect(al_cpu[b].tr_number != newest.tr_number); if ((int)al_cpu[b].tr_number - (int)oldest.tr_number < 0) { d_expect(oldest.tr_number - al_cpu[b].tr_number + b - oldest.i == mx); oldest.i = b; oldest.tr_number = al_cpu[b].tr_number; } if ((int)al_cpu[b].tr_number - (int)newest.tr_number > 0) { d_expect(al_cpu[b].tr_number - newest.tr_number == b - newest.i); newest.i = b; newest.tr_number = al_cpu[b].tr_number; } } if (!found_valid) { /* not even one transaction was valid. * Has this ever been initialized correctly? */ fprintf(stderr, "No usable activity log found. Do you need to create-md?\n"); return -ENODATA; } /* we do expect at most one corrupt transaction, and only in case * things went wrong during transaction write. 
*/ if (found_valid != mx) fprintf(stderr, "%u corrupt AL transactions found\n", mx - found_valid); if (!found_valid_updates) { if (found_valid == mx) /* nothing to do, all slots are valid AL_TR_INITIALIZED */ return 0; /* this is only expected, in case the _first_ transaction * somehow failed. */ if (!al_cpu[0].is_valid && found_valid == mx - 1) return 0; /* Hmm. Some transactions are valid. * Some are not. * This is not expected. */ /* FIXME how do we want to handle this? */ fprintf(stderr, "No valid AL update transaction found.\n"); return -EINVAL; } /* FIXME what do we do * about more than one corrupt transaction? * about corrupt transaction in the middle of the oldest -> newest range? */ /* Ok, so we found valid update transactions. Reconstruct the "active * set" at the time of the newest transaction. */ /* wrap around */ if (newest.i < oldest.i) newest.i += mx; for (b = oldest.i; b <= newest.i; b++) { unsigned idx = b % mx; if (!al_cpu[idx].is_valid || al_cpu[idx].transaction_type == AL_TR_INITIALIZED) continue; for (i = 0; i < AL_CONTEXT_PER_TRANSACTION; i++) { unsigned slot = al_cpu[idx].context_start_slot_nr + i; if (al_cpu[idx].context[i] == ~0U && slot >= al_cpu[idx].context_size) continue; if (slot >= AL_EXTENTS_MAX) { fprintf(stderr, "slot number out of range: tr:%u slot:%u\n", idx, slot); continue; } hot_extent[slot] = al_cpu[idx].context[i]; } for (i = 0; i < AL_UPDATES_PER_TRANSACTION; i++) { unsigned slot = al_cpu[idx].update_slot_nr[i]; if (i >= al_cpu[idx].n_updates && slot == (uint16_t)(~0U)) continue; if (slot >= AL_EXTENTS_MAX) { fprintf(stderr, "update slot number out of range: tr:%u slot:%u\n", idx, slot); continue; } hot_extent[slot] = al_cpu[idx].update_extent_nr[i]; } } return found_valid_updates; } int cmp_u32(const void *p1, const void *p2) { const unsigned a = *(unsigned *)p1; const unsigned b = *(unsigned *)p2; /* how best to deal with 32bit wrap? */ return a < b ? -1 : a == b ? 
0 : 1; } void apply_al(struct format *cfg, uint32_t *hot_extent) { const size_t bm_bytes = ALIGN(cfg->bm_bytes, cfg->md_hard_sect_size); off_t bm_on_disk_off = cfg->bm_offset; size_t bm_on_disk_pos = 0; size_t chunk = 0; int i, j; /* can only be AL_EXTENTS_MAX * BM_BYTES_PER_AL_EXT * 8, * which currently is 65534 * 128 * 8 == 67106816 * fits easily into 32bit. */ unsigned additional_bits_set = 0; uint64_t *w; char ppb[10]; /* Now, actually apply this stuff to the on-disk bitmap. * Since one AL extent corresponds to 128 Byte of bitmap, * we need to do some read/modify/write cycles here. * * Note that this can be slow due to the use of O_DIRECT, * worst case it does 65534 (AL_EXTENTS_MAX) cycles of * - read 128 kByte (buffer_size) * - memset 128 Bytes (BM_BYTES_PER_AL_EXT) to 0xff * - write 128 kByte * This implementation could be optimized in various ways: * - don't use direct IO; has other drawbacks * - first scan hot_extents for extent ranges, * and optimize the IO size. * - use aio with multiple buffers * - ... */ for (i = 0; i < AL_EXTENTS_MAX; i++) { size_t bm_pos; unsigned bits_set = 0; if (hot_extent[i] == ~0U) break; bm_pos = hot_extent[i] * BM_BYTES_PER_AL_EXT; if (bm_pos >= bm_bytes) { fprintf(stderr, "extent %u beyond end of bitmap!\n", hot_extent[i]); /* could break or return error here, * but I'll just print a warning, and skip, each of them. */ continue; } /* On first iteration, or when the current position in the bitmap * exceeds the current buffer, write out the current buffer, if any, * and read in the next (at most buffer_size) chunk of bitmap, * containing the currently processed bitmap region. */ if (i == 0 || bm_pos + BM_BYTES_PER_AL_EXT >= bm_on_disk_pos + chunk) { if (i != 0) pwrite_or_die(cfg->md_fd, on_disk_buffer, chunk, bm_on_disk_off + bm_on_disk_pos, "apply_al"); /* don't special case logical sector size != 512, * operate in 4k always. 
*/ bm_on_disk_pos = bm_pos & ~(off_t)(4095); chunk = bm_bytes - bm_on_disk_pos; if (chunk > buffer_size) chunk = buffer_size; pread_or_die(cfg->md_fd, on_disk_buffer, chunk, bm_on_disk_off + bm_on_disk_pos, "apply_al"); } ASSERT(bm_pos - bm_on_disk_pos <= chunk - BM_BYTES_PER_AL_EXT); ASSERT((bm_pos - bm_on_disk_pos) % sizeof(uint64_t) == 0); w = (uint64_t *)on_disk_buffer + (bm_pos - bm_on_disk_pos)/sizeof(uint64_t); for (j = 0; j < BM_BYTES_PER_AL_EXT/sizeof(uint64_t); j++) bits_set += generic_hweight64(w[j]); additional_bits_set += BM_BYTES_PER_AL_EXT * 8 - bits_set; memset((char*)on_disk_buffer + (bm_pos - bm_on_disk_pos), 0xff, BM_BYTES_PER_AL_EXT); } /* we still need to write out the buffer of the last iteration */ if (i != 0) { pwrite_or_die(cfg->md_fd, on_disk_buffer, chunk, bm_on_disk_off + bm_on_disk_pos, "apply_al"); fprintf(stderr, "Marked additional %s as out-of-sync based on AL.\n", ppsize(ppb, additional_bits_set * 4)); } else fprintf(stderr, "Nothing to do.\n"); } int need_to_apply_al(struct format *cfg) { if (is_v08(cfg)) return cfg->md.flags & MDF_PRIMARY_IND; else if (is_v07(cfg)) return cfg->md.gc[Flags] & MDF_PRIMARY_IND; else return 0; /* there was no activity log in 0.6, right? 
*/ } int v08_move_internal_md_after_resize(struct format *cfg); int meta_apply_al(struct format *cfg, char **argv __attribute((unused)), int argc) { off_t al_size; struct al_4k_transaction_on_disk *al_4k_disk = on_disk_buffer; uint32_t hot_extent[AL_EXTENTS_MAX]; int need_to_update_md_flags = 0; int re_initialize_anyways = 0; int err; if (argc > 0) fprintf(stderr, "Ignoring additional arguments\n"); if (format_version(cfg) < Drbd_07) { fprintf(stderr, "apply-al only implemented for DRBD >= 0.7\n"); return -1; } err = cfg->ops->open(cfg); if (err == VALID_MD_FOUND_AT_LAST_KNOWN_LOCATION) { if (v08_move_internal_md_after_resize(cfg) == 0) err = cfg->ops->open(cfg); } if (err != VALID_MD_FOUND) { fprintf(stderr, "No valid meta data found\n"); return -1; } al_size = cfg->md.al_stripes * cfg->md.al_stripe_size_4k * 4096; /* read in first chunk (which is actually the whole AL * for old fixed size 32k activity log */ pread_or_die(cfg->md_fd, on_disk_buffer, al_size < buffer_size ? al_size : buffer_size, cfg->al_offset, "apply_al"); /* init all extent numbers to -1U aka "unused" */ memset(hot_extent, 0xff, sizeof(hot_extent)); /* replay al */ if (is_v07(cfg)) err = replay_al_07(cfg, hot_extent); /* FIXME * we should introduce a new meta data "super block" magic, so we won't * have the same super block with two different activity log * transaction layouts */ else if (DRBD_MD_MAGIC_84_UNCLEAN == cfg->md.magic || DRBD_AL_MAGIC == be32_to_cpu(al_4k_disk[0].magic.be) || DRBD_AL_MAGIC == be32_to_cpu(al_4k_disk[1].magic.be) || cfg->md.al_stripes != 1 || cfg->md.al_stripe_size_4k != 8) { err = replay_al_84(cfg, hot_extent); } else { /* try the old al format anyways, this may be the first time we * run after upgrading from < 8.4 to 8.4, and we need to * transparently "convert" the activity log format. */ err = replay_al_07(cfg, hot_extent); re_initialize_anyways = 1; } if (err < 0) { /* ENODATA: * most likely this is an uninitialized, * or at least non-8.4-style activity log. 
* Cannot do anything about that. * * EINVAL: * Some valid 8.4 style INITIALIZED transactions found, * but others have been corrupt, and no single "usable" * update transaction was found. * FIXME: what to do about that? * We probably need some "FORCE" mode as well. */ if (need_to_apply_al(cfg)) { /* 1, 2, 10, 20? FIXME sane exit codes! */ if (err == -ENODATA) return 1; return 2; } else if (is_v08(cfg)) { fprintf(stderr, "Error ignored, no need to apply the AL\n"); re_initialize_anyways = 1; } } /* do we need to actually apply it? */ if (err > 0 && need_to_apply_al(cfg)) { /* process hot extents in order, to reduce disk seeks. */ qsort(hot_extent, ARRAY_SIZE(hot_extent), sizeof(hot_extent[0]), cmp_u32); apply_al(cfg, hot_extent); need_to_update_md_flags = 1; } /* (Re-)initialize the activity log. * This is needed on 8.4, and does not hurt on < 8.4. * It may cause a "No usable activity log found" kernel message * if it is attached to < 8.4, but that is cosmetic. * We can skip this, if it was clean anyways (err == 0), * or if we know that this is for 0.7. */ if (re_initialize_anyways || (err > 0 && !is_v07(cfg))) initialize_al(cfg); if (is_v08(cfg) && ((cfg->md.flags & MDF_AL_CLEAN) == 0 || cfg->md.magic != DRBD_MD_MAGIC_08)) need_to_update_md_flags = 1; err = 0; if (need_to_update_md_flags) { /* Must not touch MDF_PRIMARY_IND. * This flag is used in-kernel to determine which * "wait-for-connection-timeout" is to be used. * Maybe it is time to reconsider the concept or * current implementation of "degr-wfc-timeout". * RFC: * If we set MDF_CRASHED_PRIMARY, in case MDF_PRIMARY_IND * was set, and clear MDF_PRIMARY_IND here, we can then * USE_DEGR_WFC_T as long as MDF_CRASHED_PRIMARY is set. * Maybe that even results in better semantics. 
*/ if (is_v08(cfg)) { cfg->md.flags |= MDF_AL_CLEAN; cfg->md.magic = DRBD_MD_MAGIC_08; } err = cfg->ops->md_cpu_to_disk(cfg); err = cfg->ops->close(cfg) || err; if (err) fprintf(stderr, "update of super block flags failed\n"); } return err; } unsigned long bm_words(uint64_t sectors, int bytes_per_bit) { unsigned long long bits; unsigned long long words; bits = ALIGN(sectors, 8) / (bytes_per_bit / 512); words = ALIGN(bits, 64) >> LN2_BPL; return words; } static void printf_bm_eol(unsigned int i) { if ((i & 31) == 0) printf("\n # at %llukB\n ", (256LLU * i)); else printf("\n "); } /* le_u64, because we want to be able to hexdump it reliably * regardless of sizeof(long) */ void printf_bm(struct format *cfg) { off_t bm_on_disk_off = cfg->bm_offset; le_u64 const *bm = on_disk_buffer; le_u64 cw; /* current word for rl encoding */ const unsigned int n = cfg->bm_bytes/sizeof(*bm); unsigned int count = 0; unsigned int bits_set = 0; unsigned int n_buffer = 0; unsigned int r; /* real offset */ unsigned int i; /* in-buffer offset */ unsigned int j; i=0; r=0; cw.le = 0; /* silence compiler warning */ printf("bm {"); while (r < n) { /* need to read on first iteration, * and on buffer wrap */ if (r*sizeof(*bm) % buffer_size == 0) { size_t chunk = ALIGN( (n-r)*sizeof(*bm), cfg->md_hard_sect_size ); if (chunk > buffer_size) chunk = buffer_size; ASSERT(chunk); pread_or_die(cfg->md_fd, on_disk_buffer, chunk, bm_on_disk_off, "printf_bm"); bm_on_disk_off += chunk; i = 0; n_buffer = chunk/sizeof(*bm); } next: ASSERT(i < n_buffer); if (count == 0) cw = bm[i]; if ((i & 3) == 0) { if (!count) printf_bm_eol(r); /* j = i, because it may be continuation after buffer wrap */ for (j = i; j < n_buffer && cw.le == bm[j].le; j++) ; j &= ~3; // round down to a multiple of 4 unsigned int tmp = (j-i); if (tmp > 4) { count += tmp; r += tmp; i = j; if (j == n_buffer && r < n) continue; } if (count) { printf(" %u times 0x"X64(016)";", count, le64_to_cpu(cw.le)); bits_set += count * 
generic_hweight64(cw.le); count = 0; if (r >= n) break; /* don't "continue;", we may have not advanced i after buffer wrap, * so that would be treated as an other buffer wrap */ goto next; } } ASSERT(i < n_buffer); printf(" 0x"X64(016)";", le64_to_cpu(bm[i].le)); bits_set += generic_hweight64(bm[i].le); r++; i++; } printf("\n}\n"); cfg->bits_set = bits_set; } static void clip_la_sect_and_bm_bytes(struct format *cfg) { if (cfg->md.la_sect > cfg->max_usable_sect) { printf("# la-size-sect was too big (%llu), truncated (%llu)!\n", (unsigned long long)cfg->md.la_sect, (unsigned long long)cfg->max_usable_sect); cfg->md.la_sect = cfg->max_usable_sect; } cfg->bm_bytes = sizeof(long) * bm_words(cfg->md.la_sect, cfg->md.bm_bytes_per_bit); } int v07_style_md_open(struct format *cfg) { struct stat sb; unsigned long hard_sect_size = 0; int ioctl_err; int open_flags = O_RDWR | O_DIRECT; /* For old-style fixed size indexed external meta data, * we cannot really use O_EXCL, we have to trust the given minor. * * For internal, or "flexible" external meta data, we open O_EXCL to * avoid accidentally damaging otherwise in-use data, just because * someone had a typo in the command line. */ if (cfg->md_index < 0) open_flags |= O_EXCL; retry: cfg->md_fd = open(cfg->md_device_name, open_flags ); if (cfg->md_fd == -1) { int save_errno = errno; PERROR("open(%s) failed", cfg->md_device_name); if (save_errno == EBUSY && (open_flags & O_EXCL)) { if ((!force && command->function == &meta_apply_al) || !confirmed("Exclusive open failed. Do it anyways?")) { printf("Operation canceled.\n"); exit(20); } open_flags &= ~O_EXCL; goto retry; } if (save_errno == EINVAL && (open_flags & O_DIRECT)) { /* shoo. O_DIRECT is not supported? 
* retry, but remember this, so we can * BLKFLSBUF appropriately */ fprintf(stderr, "could not open with O_DIRECT, retrying without\n"); open_flags &= ~O_DIRECT; opened_odirect = 0; goto retry; } exit(20); } if (fstat(cfg->md_fd, &sb)) { PERROR("fstat(%s) failed", cfg->md_device_name); exit(20); } if (!S_ISBLK(sb.st_mode)) { fprintf(stderr, "'%s' is not a block device!\n", cfg->md_device_name); exit(20); } if (is_v08(cfg)) { ASSERT(cfg->md_index != DRBD_MD_INDEX_INTERNAL); } ioctl_err = ioctl(cfg->md_fd, BLKSSZGET, &hard_sect_size); if (ioctl_err) { fprintf(stderr, "ioctl(md_fd, BLKSSZGET) returned %d, " "assuming hard_sect_size is 512 Byte\n", ioctl_err); cfg->md_hard_sect_size = 512; } else { cfg->md_hard_sect_size = hard_sect_size; if (verbose >= 2) fprintf(stderr, "hard_sect_size is %d Byte\n", cfg->md_hard_sect_size); } cfg->bd_size = bdev_size(cfg->md_fd); if ((cfg->bd_size >> 9) < MD_BM_OFFSET_07) { fprintf(stderr, "%s is only %llu bytes. That's not enough.\n", cfg->md_device_name, (long long unsigned)cfg->bd_size); exit(10); } cfg->md_offset = v07_style_md_get_byte_offset(cfg->md_index, cfg->bd_size); if (cfg->md_offset > cfg->bd_size - 4096) { fprintf(stderr, "Device too small: expecting meta data block at\n" "byte offset %lld, but %s is only %llu bytes.\n", (signed long long)cfg->md_offset, cfg->md_device_name, (long long unsigned)cfg->bd_size); exit(10); } if (!opened_odirect && (MAJOR(sb.st_rdev) != RAMDISK_MAJOR)) { ioctl_err = ioctl(cfg->md_fd, BLKFLSBUF); /* report error, but otherwise ignore. we could not open * O_DIRECT, it is a "strange" device anyways. */ if (ioctl_err) fprintf(stderr, "ioctl(md_fd, BLKFLSBUF) returned %d, " "we may read stale data\n", ioctl_err); } if (cfg->ops->md_disk_to_cpu(cfg)) return NO_VALID_MD_FOUND; if(cfg->md.bm_bytes_per_bit == 0 ) { printf("bm-byte-per-bit was 0, fixed. 
(Set to 4096)\n"); cfg->md.bm_bytes_per_bit = 4096; } cfg->al_offset = cfg->md_offset + cfg->md.al_offset * 512LL; cfg->bm_offset = cfg->md_offset + cfg->md.bm_offset * 512LL; cfg->max_usable_sect = max_usable_sectors(cfg); clip_la_sect_and_bm_bytes(cfg); cfg->bits_set = -1U; /* FIXME paranoia verify that unused bits and words are unset... */ /* FIXME paranoia verify that unused bits and words are unset... */ return VALID_MD_FOUND; } int v07_md_disk_to_cpu(struct format *cfg) { struct md_cpu md; int ok; PREAD(cfg->md_fd, on_disk_buffer, sizeof(struct md_on_disk_07), cfg->md_offset); md_disk_07_to_cpu(&md, (struct md_on_disk_07*)on_disk_buffer); ok = is_valid_md(Drbd_07, &md, cfg->md_index, cfg->bd_size); if (ok) cfg->md = md; return ok ? 0 : -1; } int v07_md_cpu_to_disk(struct format *cfg) { if (!is_valid_md(Drbd_07, &cfg->md, cfg->md_index, cfg->bd_size)) return -1; md_cpu_to_disk_07(on_disk_buffer, &cfg->md); PWRITE(cfg->md_fd, on_disk_buffer, sizeof(struct md_on_disk_07), cfg->md_offset); return 0; } int v07_parse(struct format *cfg, char **argv, int argc, int *ai) { long index; char *e; if (argc < 2) { fprintf(stderr, "Too few arguments for format\n"); return -1; } cfg->md_device_name = strdup(argv[0]); if (!strcmp(argv[1],"internal")) { index = is_v07(cfg) ? DRBD_MD_INDEX_INTERNAL : DRBD_MD_INDEX_FLEX_INT; } else if (!strcmp(argv[1],"flex-external")) { index = DRBD_MD_INDEX_FLEX_EXT; } else if (!strcmp(argv[1],"flex-internal")) { index = DRBD_MD_INDEX_FLEX_INT; } else { e = argv[1]; errno = 0; index = strtol(argv[1], &e, 0); if (*e != 0 || 0 > index || index > 255 || errno != 0) { fprintf(stderr, "'%s' is not a valid index number.\n", argv[1]); return -1; } } cfg->md_index = index; *ai += 2; return 0; } int _v07_md_initialize(struct format *cfg, int do_disk_writes) { memset(&cfg->md, 0, sizeof(cfg->md)); cfg->md.la_sect = 0; cfg->md.gc[Flags] = 0; cfg->md.gc[HumanCnt] = 1; /* THINK 0? 1? 
*/ cfg->md.gc[TimeoutCnt] = 1; cfg->md.gc[ConnectedCnt] = 1; cfg->md.gc[ArbitraryCnt] = 1; cfg->md.magic = DRBD_MD_MAGIC_07; return md_initialize_common(cfg, do_disk_writes); } int v07_md_initialize(struct format *cfg) { return _v07_md_initialize(cfg, 1); } /****************************************** }}} end of v07 ******************************************/ /****************************************** begin of v08 {{{ ******************************************/ /* if this returns with something != 0 in cfg->lk_bd.bd_size, * caller knows he must move the meta data to actually find it. */ void v08_check_for_resize(struct format *cfg) { struct md_cpu md_08; off_t flex_offset; int found = 0; /* you should not call me if you already found something. */ ASSERT(cfg->md.magic == 0); /* check for resized lower level device ... only check for drbd 8 */ if (!is_v08(cfg)) return; if (cfg->md_index != DRBD_MD_INDEX_FLEX_INT) return; /* Do we know anything? Maybe it never was stored. */ if (lk_bdev_load(cfg->minor, &cfg->lk_bd)) { if (verbose) fprintf(stderr, "no last-known offset information available.\n"); return; } if (verbose) { fprintf(stderr, " last known info: %llu %s\n", (unsigned long long)cfg->lk_bd.bd_size, cfg->lk_bd.bd_name ?: "-unknown device name-"); if (cfg->lk_bd.bd_uuid) fprintf(stderr, " last known uuid: "X64(016)"\n", cfg->lk_bd.bd_uuid); } /* I just checked that offset, nothing to see there. */ if (cfg->lk_bd.bd_size == cfg->bd_size) return; flex_offset = v07_style_md_get_byte_offset( DRBD_MD_INDEX_FLEX_INT, cfg->lk_bd.bd_size); /* actually check that offset, if it is accessible. */ /* If someone shrunk that device, I won't be able to read it! 
 */
	if (flex_offset < cfg->bd_size) {
		PREAD(cfg->md_fd, on_disk_buffer, 4096, flex_offset);
		md_disk_08_to_cpu(&md_08, (struct md_on_disk_08*)on_disk_buffer);
		found = is_valid_md(Drbd_08, &md_08, DRBD_MD_INDEX_FLEX_INT, cfg->lk_bd.bd_size);
	}

	if (verbose) {
		fprintf(stderr,
			"While checking for internal meta data for drbd%u on %s,\n"
			"it appears that it may have been relocated.\n"
			"It used to be ", cfg->minor, cfg->md_device_name);
		if (cfg->lk_bd.bd_name &&
			strcmp(cfg->lk_bd.bd_name, cfg->md_device_name)) {
			fprintf(stderr, "on %s ", cfg->lk_bd.bd_name);
		}
		fprintf(stderr, "at byte offset %llu",
			(unsigned long long)flex_offset);

		if (!found) {
			fprintf(stderr, ", but I cannot find it now.\n");
			if (flex_offset >= cfg->bd_size)
				fprintf(stderr, "Device is too small now!\n");
		} else
			fprintf(stderr, ", and seems to still be valid.\n");
	}

	if (found) {
		if (cfg->lk_bd.bd_uuid && md_08.device_uuid != cfg->lk_bd.bd_uuid) {
			/* report the uuid that was compared above (md_08);
			 * cfg->md has not been assigned yet at this point */
			fprintf(stderr, "Last known and found uuid differ!?\n"
					X64(016)" != "X64(016)"\n",
					cfg->lk_bd.bd_uuid, md_08.device_uuid);
			if (!force) {
				found = 0;
				fprintf(stderr, "You may --force me to ignore that.\n");
			} else
				fprintf(stderr, "You --force'ed me to ignore that.\n");
		}
	}
	if (found)
		cfg->md = md_08;

	return;
}

int v08_md_open(struct format *cfg) {
	int r = v07_style_md_open(cfg);
	if (r == VALID_MD_FOUND)
		return r;

	v08_check_for_resize(cfg);
	if (!cfg->lk_bd.bd_size || !cfg->md.magic)
		return NO_VALID_MD_FOUND;
	else
		return VALID_MD_FOUND_AT_LAST_KNOWN_LOCATION;
}

int v08_md_disk_to_cpu(struct format *cfg)
{
	struct md_cpu md;
	int ok;
	PREAD(cfg->md_fd, on_disk_buffer,
		sizeof(struct md_on_disk_08), cfg->md_offset);
	md_disk_08_to_cpu(&md, (struct md_on_disk_08*)on_disk_buffer);
	ok = is_valid_md(Drbd_08, &md, cfg->md_index, cfg->bd_size);
	if (ok)
		cfg->md = md;
	if (verbose >= 3 + !!ok && verbose <= 10)
		fprintf_hex(stderr, cfg->md_offset, on_disk_buffer, 4096);
	return ok ?
0 : -1; } int v08_md_cpu_to_disk(struct format *cfg) { if (!is_valid_md(Drbd_08, &cfg->md, cfg->md_index, cfg->bd_size)) return -1; md_cpu_to_disk_08((struct md_on_disk_08 *)on_disk_buffer, &cfg->md); PWRITE(cfg->md_fd, on_disk_buffer, sizeof(struct md_on_disk_08), cfg->md_offset); cfg->update_lk_bdev = 1; return 0; } int _v08_md_initialize(struct format *cfg, int do_disk_writes) { size_t i; memset(&cfg->md, 0, sizeof(cfg->md)); cfg->md.la_sect = 0; cfg->md.uuid[UI_CURRENT] = UUID_JUST_CREATED; cfg->md.uuid[UI_BITMAP] = 0; for ( i=UI_HISTORY_START ; i<=UI_HISTORY_END ; i++ ) { cfg->md.uuid[i]=0; } cfg->md.flags = MDF_AL_CLEAN; cfg->md.magic = DRBD_MD_MAGIC_08; cfg->md.al_stripes = option_al_stripes; cfg->md.al_stripe_size_4k = option_al_stripe_size_4k; return md_initialize_common(cfg, do_disk_writes); } int v08_md_initialize(struct format *cfg) { return _v08_md_initialize(cfg, 1); } int v08_md_close(struct format *cfg) { /* update last known info, if we changed anything, * or if explicitly requested. 
*/ if (cfg->update_lk_bdev && !dry_run) { if (cfg->md_index != DRBD_MD_INDEX_FLEX_INT) lk_bdev_delete(cfg->minor); else { cfg->lk_bd.bd_size = cfg->bd_size; cfg->lk_bd.bd_uuid = cfg->md.device_uuid; cfg->lk_bd.bd_name = cfg->md_device_name; lk_bdev_save(cfg->minor, &cfg->lk_bd); } } return generic_md_close(cfg); } /****************************************** }}} end of v08 ******************************************/ int meta_get_gi(struct format *cfg, char **argv __attribute((unused)), int argc) { if (argc > 0) { fprintf(stderr, "Ignoring additional arguments\n"); } if (cfg->ops->open(cfg)) return -1; cfg->ops->get_gi(&cfg->md); return cfg->ops->close(cfg); } int meta_show_gi(struct format *cfg, char **argv __attribute((unused)), int argc) { char ppb[10]; if (argc > 0) { fprintf(stderr, "Ignoring additional arguments\n"); } if (cfg->ops->open(cfg)) return -1; cfg->ops->show_gi(&cfg->md); if (cfg->md.la_sect) { printf("last agreed size: %s (%llu sectors)\n", ppsize(ppb, cfg->md.la_sect >> 1), (unsigned long long)cfg->md.la_sect); printf("last agreed max bio size: %u Byte\n", cfg->md.la_peer_max_bio_size); #if 0 /* FIXME implement count_bits() */ printf("%u bits set in the bitmap [ %s out of sync ]\n", cfg->bits_set, ppsize(ppb, cfg->bits_set * 4)); #endif } else { printf("zero size device -- never seen peer yet?\n"); } return cfg->ops->close(cfg); } int meta_dstate(struct format *cfg, char **argv __attribute((unused)), int argc) { if (argc > 0) { fprintf(stderr, "Ignoring additional arguments\n"); } if (cfg->ops->open(cfg)) { fprintf(stderr, "No valid meta data found\n"); return -1; } if(cfg->md.flags & MDF_CONSISTENT) { if(cfg->md.flags & MDF_WAS_UP_TO_DATE) { if (cfg->md.flags & MDF_PEER_OUT_DATED) printf("UpToDate/Outdated\n"); else printf("Consistent/DUnknown\n"); } else { printf("Outdated/DUnknown\n"); } } else { printf("Inconsistent/DUnknown\n"); } return cfg->ops->close(cfg); } int meta_set_gi(struct format *cfg, char **argv, int argc) { struct md_cpu tmp; int 
err; if (argc > 1) { fprintf(stderr, "Ignoring additional arguments\n"); } if (argc < 1) { fprintf(stderr, "Required Argument missing\n"); exit(10); } if (cfg->ops->open(cfg)) return -1; tmp = cfg->md; cfg->ops->set_gi(&tmp,argv,argc); printf("previously "); cfg->ops->get_gi(&cfg->md); printf("set GI to "); cfg->ops->get_gi(&tmp); if (!confirmed("Write new GI to disk?")) { printf("Operation canceled.\n"); exit(0); } cfg->md = tmp; err = cfg->ops->md_cpu_to_disk(cfg); err = cfg->ops->close(cfg) || err; if (err) fprintf(stderr, "update failed\n"); return err; } void print_dump_header() { char time_str[60]; struct utsname nodeinfo; time_t t = time(NULL); int i; strftime(time_str, sizeof(time_str), "%F %T %z [%s]", localtime(&t)); uname(&nodeinfo); printf("# DRBD meta data dump\n# %s\n# %s>", time_str, nodeinfo.nodename); for (i=0; i < global_argc; i++) printf(" %s",global_argv[i]); printf("\n#\n\n"); } int meta_dump_md(struct format *cfg, char **argv __attribute((unused)), int argc) { int i; if (argc > 0) { fprintf(stderr, "Ignoring additional arguments\n"); } i = cfg->ops->open(cfg); if (i == NO_VALID_MD_FOUND) { fprintf(stderr, "No valid meta data found\n"); return -1; } if (DRBD_MD_MAGIC_84_UNCLEAN == cfg->md.magic) { fprintf(stderr, "Found meta data is \"unclean\", please apply-al first\n"); if (!force) return -1; } print_dump_header(); printf("version \"%s\";\n\n", cfg->ops->name); if (DRBD_MD_MAGIC_84_UNCLEAN == cfg->md.magic) { printf("This_is_an_unclean_meta_data_dump._Don't_trust_the_bitmap.\n\n"); } printf("# md_size_sect %llu\n", (long long unsigned)cfg->md.md_size_sect); if (i == VALID_MD_FOUND_AT_LAST_KNOWN_LOCATION) { printf("#\n" "### Device seems to have been resized!\n" "### dumping meta data from the last known position\n" "### current size of %s: %llu byte\n" "### expected position of meta data:\n", cfg->md_device_name, (unsigned long long)cfg->bd_size); printf("## md_offset %llu\n", (long long unsigned)cfg->md_offset); printf("## al_offset %llu\n", 
		(long long unsigned)cfg->al_offset);
		printf("## bm_offset %llu\n", (long long unsigned)cfg->bm_offset);
		printf(
		"### last known size of %s: %llu byte\n"
		"### adjusted position of meta data:\n",
		cfg->lk_bd.bd_name ?: "-?-",
		(unsigned long long)cfg->lk_bd.bd_size);
		cfg->md_offset = v07_style_md_get_byte_offset(
			DRBD_MD_INDEX_FLEX_INT, cfg->lk_bd.bd_size);
		cfg->al_offset = cfg->md_offset + cfg->md.al_offset * 512LL;
		cfg->bm_offset = cfg->md_offset + cfg->md.bm_offset * 512LL;
		cfg->bm_bytes = sizeof(long) *
			bm_words(cfg->md.la_sect, cfg->md.bm_bytes_per_bit);
	}
	printf("# md_offset %llu\n", (long long unsigned)cfg->md_offset);
	printf("# al_offset %llu\n", (long long unsigned)cfg->al_offset);
	printf("# bm_offset %llu\n", (long long unsigned)cfg->bm_offset);
	printf("\n");

	if (format_version(cfg) < Drbd_08) {
		printf("gc {\n ");
		for (i = 0; i < GEN_CNT_SIZE; i++) {
			printf(" %d;", cfg->md.gc[i]);
		}
		printf("\n}\n");
	} else { /* >= 08 */
		printf("uuid {\n ");
		for ( i=UI_CURRENT ; i<UI_SIZE ; i++ ) {
			printf(" 0x"X64(016)";", cfg->md.uuid[i]);
		}
		printf("\n");
		printf(" flags 0x"X32(08)";\n",cfg->md.flags);
		printf("}\n");
	}

	if (format_version(cfg) >= Drbd_07) {
		printf("# al-extents %u;\n", cfg->md.al_nr_extents);
		printf("la-size-sect "U64";\n", cfg->md.la_sect);
		if (format_version(cfg) >= Drbd_08) {
			printf("bm-byte-per-bit "U32";\n",
				cfg->md.bm_bytes_per_bit);
			printf("device-uuid 0x"X64(016)";\n",
				cfg->md.device_uuid);
			printf("la-peer-max-bio-size %d;\n",
				cfg->md.la_peer_max_bio_size);
			if (cfg->md.al_stripes != 1 ||
				cfg->md.al_stripe_size_4k != 8) {
				printf("al-stripes "U32";\n",
					cfg->md.al_stripes);
				printf("al-stripe-size-4k "U32";\n",
					cfg->md.al_stripe_size_4k);
			}
		}
		printf("# bm-bytes %u;\n", cfg->bm_bytes);
		printf_bm(cfg); /* pretty prints the whole bitmap */
		printf("# bits-set %u;\n", cfg->bits_set);

		/* This is half assed, still. Hide it.
 */
		if (verbose >= 10)
			printf_al(cfg);
	}

	return cfg->ops->close(cfg);
}

void md_parse_error(int expected_token, int seen_token,const char *etext)
{
	if (!etext) {
		switch(expected_token) {
		/* leading space indicates to strip off "expected" below */
		default : etext = " invalid/unexpected token!"; break;
		case 0 : etext = "end of file"; break;
		case ';': etext = "semicolon (;)"; break;
		case '{': etext = "opening brace ({)"; break;
		case '}': etext = "closing brace (})"; break;
		case TK_BM:
			etext = "keyword 'bm'"; break;
		case TK_BM_BYTE_PER_BIT:
			etext = "keyword 'bm-byte-per-bit'"; break;
		case TK_DEVICE_UUID:
			etext = "keyword 'device-uuid'"; break;
		case TK_FLAGS:
			etext = "keyword 'flags'"; break;
		case TK_GC:
			etext = "keyword 'gc'"; break;
		case TK_LA_SIZE:
			etext = "keyword 'la-size-sect'"; break;
		case TK_TIMES:
			etext = "keyword 'times'"; break;
		case TK_UUID:
			etext = "keyword 'uuid'"; break;
		case TK_VERSION:
			etext = "keyword 'version'"; break;
		case TK_NUM:
			etext = "number ([0-9], up to 20 digits)"; break;
		case TK_STRING:
			etext = "short quoted string "
				"(\"..up to 20 characters, no newline..\")";
			break;
		case TK_U32:
			etext = "an 8-digit hex number"; break;
		case TK_U64:
			etext = "a 16-digit hex number"; break;
		}
	}
	fflush(stdout);
	fprintf(stderr,"Parse error in line %u: %s%s",
		yylineno, etext,
		(etext[0] == ' ' ? ":" : " expected")
		);

	switch(seen_token) {
	case 0:
		fprintf(stderr, ", but end of file encountered\n"); break;

	case 1 ... 58: /* ord(';') == 59 */
	case 60 ... 122: /* ord('{') == 123 */
	case 124: /* ord('}') == 125 */
	case 126 ... 257:
		/* oopsie. these should never be returned!
 */
		fprintf(stderr, "; got token value %u (this should never happen!)\n", seen_token);
		break;

	case TK_INVALID_CHAR:
		fprintf(stderr,"; got invalid input character '\\x%02x' [%c]\n",
			(unsigned char)yylval.txt[0], yylval.txt[0]);
		break;
	case ';': case '{': case '}':
		fprintf(stderr, ", not '%c'\n", seen_token);
		break;
	case TK_NUM:
	case TK_U32:
	case TK_U64:
		fprintf(stderr, ", not some number\n");
		break;
	case TK_INVALID:
		/* already reported by scanner */
		fprintf(stderr,"\n");
		break;
	default:
		fprintf(stderr, ", not '%s'\n", yylval.txt);
	}
	exit(10);
}

static void EXP(int expected_token)
{
	int tok = yylex();
	if (tok != expected_token)
		md_parse_error(expected_token, tok, NULL);
}

int verify_dumpfile_or_restore(struct format *cfg, char **argv, int argc, int parse_only)
{
	int i,times;
	int err;
	off_t bm_on_disk_off;
	off_t bm_max_on_disk_off;
	le_u64 *bm, value;

	if (argc > 0) {
		yyin = fopen(argv[0],"r");
		if(yyin == NULL) {
			fprintf(stderr, "open of '%s' failed.\n",argv[0]);
			exit(20);
		}
	}

	if (!parse_only) {
		if (!cfg->ops->open(cfg)) {
			if (!confirmed("Valid meta-data in place, overwrite?"))
				return -1;
		} else {
			check_for_existing_data(cfg);
			ASSERT(!is_v06(cfg));
		}
		fprintf(stderr, "initializing with defaults\n");
		if (is_v07(cfg))
			_v07_md_initialize(cfg,0);
		else
			_v08_md_initialize(cfg,0);
	}

	EXP(TK_VERSION); EXP(TK_STRING);
	if(strcmp(yylval.txt,cfg->ops->name)) {
		fprintf(stderr,"dump is '%s' you requested '%s'.\n",
			yylval.txt,cfg->ops->name);
		exit(10);
	}
	EXP(';');
	if (format_version(cfg) < Drbd_08) {
		EXP(TK_GC); EXP('{');
		for (i = 0; i < GEN_CNT_SIZE; i++) {
			EXP(TK_NUM); EXP(';');
			cfg->md.gc[i] = yylval.u64;
		}
		EXP('}');
	} else { /* >= 08 */
		EXP(TK_UUID); EXP('{');
		for ( i=UI_CURRENT ; i<UI_SIZE ; i++ ) {
			EXP(TK_U64); EXP(';');
			cfg->md.uuid[i] = yylval.u64;
		}
		EXP(TK_FLAGS); EXP(TK_U32); EXP(';');
		cfg->md.flags = (uint32_t)yylval.u64;
		EXP('}');
	}
	EXP(TK_LA_SIZE); EXP(TK_NUM); EXP(';');
	cfg->md.la_sect = yylval.u64;
	if (format_version(cfg) >= Drbd_08) {
		EXP(TK_BM_BYTE_PER_BIT); EXP(TK_NUM); EXP(';');
		cfg->md.bm_bytes_per_bit =
yylval.u64; EXP(TK_DEVICE_UUID); EXP(TK_U64); EXP(';'); cfg->md.device_uuid = yylval.u64; int tok = yylex(); switch(tok) { case TK_LA_BIO_SIZE: EXP(TK_NUM); EXP(';'); cfg->md.la_peer_max_bio_size = yylval.u64; break; case TK_BM: goto start_of_bm; default: md_parse_error(TK_BM, 0, "keyword 'bm' or 'la-peer-max-bio-size'"); } /* do we have al stripe settings? */ tok = yylex(); switch(tok) { case TK_AL_STRIPES: EXP(TK_NUM); EXP(';'); cfg->md.al_stripes = yylval.u64; EXP(TK_AL_STRIPE_SIZE_4K); EXP(TK_NUM); EXP(';'); cfg->md.al_stripe_size_4k = yylval.u64; /* FIXME reject invalid values */ break; case TK_BM: goto start_of_bm; default: md_parse_error(TK_BM, 0, "keyword 'bm' or 'al-stripe-size'"); } } else { cfg->md.bm_bytes_per_bit = 4096; } EXP(TK_BM); start_of_bm: if (option_al_stripes != cfg->md.al_stripes || option_al_stripe_size_4k != cfg->md.al_stripe_size_4k) { if (option_al_stripes_used) { fprintf(stderr, "override activity log striping from commandline\n"); cfg->md.al_stripes = option_al_stripes; cfg->md.al_stripe_size_4k = option_al_stripe_size_4k; } if (verbose >= 2) fprintf(stderr, "adjusting activity-log and bitmap offsets\n"); re_initialize_md_offsets(cfg); } clip_la_sect_and_bm_bytes(cfg); EXP('{'); bm = (le_u64 *)on_disk_buffer; i = 0; bm_on_disk_off = cfg->bm_offset; bm_max_on_disk_off = bm_on_disk_off + ALIGN(cfg->bm_bytes, 4096); while(1) { int tok = yylex(); switch(tok) { case TK_U64: EXP(';'); /* NOTE: * even though this EXP(';'); already advanced * to the next token, yylval will *not* be updated * for * ';', so it is still valid. 
* * This seemed to be the least ugly way to implement a * "parse_only" functionality without ugly if-branches * or the maintenance nightmare of code duplication */ if (parse_only) break; bm[i].le = cpu_to_le64(yylval.u64); if ((unsigned)++i == buffer_size/sizeof(*bm)) { size_t s = pwrite_with_limit_or_die(cfg->md_fd, on_disk_buffer, buffer_size, bm_on_disk_off, bm_max_on_disk_off, "meta_restore_md:bitmap:TK_U64"); bm_on_disk_off += s; i = 0; if (s != buffer_size) { fprintf(stderr, "Bitmap info too large, truncated!\n"); if (parse_only) goto break_loop; else goto close; } } break; case TK_NUM: times = yylval.u64; EXP(TK_TIMES); EXP(TK_U64); EXP(';'); if (parse_only) break; value.le = cpu_to_le64(yylval.u64); while(times--) { bm[i] = value; if ((unsigned)++i == buffer_size/sizeof(*bm)) { size_t s = pwrite_with_limit_or_die(cfg->md_fd, on_disk_buffer, buffer_size, bm_on_disk_off, bm_max_on_disk_off, "meta_restore_md:bitmap:TK_NUM"); bm_on_disk_off += s; i = 0; if (s != buffer_size) { fprintf(stderr, "Bitmap info too large, truncated!\n"); if (parse_only) goto break_loop; else goto close; } } } break; case '}': goto break_loop; default: md_parse_error(0 /* ignored, since etext is set */, tok, "repeat count, 16-digit hex number, or closing brace (})"); goto break_loop; } } break_loop: /* there should be no trailing garbage in the input file */ EXP(0); if (parse_only) { printf("input file parsed ok\n"); return 0; } /* not reached if parse_only */ if (i) { size_t s = i * sizeof(*bm); memset(bm+i, 0x00, buffer_size - s); /* need to sector-align this for O_DIRECT. to be * generic, maybe we even need to PAGE align it? 
*/ s = ALIGN(s, cfg->md_hard_sect_size); pwrite_with_limit_or_die(cfg->md_fd, on_disk_buffer, s, bm_on_disk_off, bm_max_on_disk_off, "meta_restore_md:bitmap:tail"); } close: err = cfg->ops->md_cpu_to_disk(cfg); err = cfg->ops->close(cfg) || err; if (err) { fprintf(stderr, "Writing failed\n"); return -1; } printf("Successfully restored meta data\n"); return 0; } int meta_restore_md(struct format *cfg, char **argv, int argc) { return verify_dumpfile_or_restore(cfg,argv,argc,0); } int meta_verify_dump_file(struct format *cfg, char **argv, int argc) { return verify_dumpfile_or_restore(cfg,argv,argc,1); } void md_convert_07_to_08(struct format *cfg) { int i,j; /* * FIXME * what about the UI_BITMAP, and the Activity Log? * how to bring them over for internal meta data? * * maybe just refuse to convert anything that is not * "clean"? how to detect that? * * FIXME: if I am a crashed R_PRIMARY, or D_INCONSISTENT, * or Want-Full-Sync or the like, * refuse, and indicate how to solve this */ printf("Converting meta data...\n"); //if (!cfg->bits_counted) count_bits(cfg); /* FIXME: * if this is "internal" meta data, and I have bits set, * either move the bitmap into the newly expected place, * or refuse, and indicate how to solve this */ /* KB <-> sectors is done in the md disk<->cpu functions. * We only need to adjust the magic here. */ cfg->md.magic = DRBD_MD_MAGIC_08; // The MDF Flags are (nearly) the same in 07 and 08 cfg->md.flags = cfg->md.gc[Flags]; cfg->md.uuid[UI_CURRENT] = (uint64_t)(cfg->md.gc[HumanCnt] & 0xffff) << 48 | (uint64_t)(cfg->md.gc[TimeoutCnt] & 0xffff) << 32 | (uint64_t)((cfg->md.gc[ConnectedCnt]+cfg->md.gc[ArbitraryCnt]) & 0xffff) << 16 | (uint64_t)0xbabe; cfg->md.uuid[UI_BITMAP] = (uint64_t)0; for (i = cfg->bits_set ? 
UI_BITMAP : UI_HISTORY_START, j = 1; i <= UI_HISTORY_END ; i++, j++) cfg->md.uuid[i] = cfg->md.uuid[UI_CURRENT] - j*0x10000; /* unconditionally re-initialize offsets, * not necessary if fixed size external, * necessary if flex external or internal */ re_initialize_md_offsets(cfg); if (!is_valid_md(Drbd_08, &cfg->md, cfg->md_index, cfg->bd_size)) { fprintf(stderr, "Conversion failed.\nThis is a bug :(\n"); exit(111); } } void md_convert_08_to_07(struct format *cfg) { /* * FIXME * what about the UI_BITMAP, and the Activity Log? * how to bring them over for internal meta data? * * maybe just refuse to convert anything that is not * "clean"? how to detect that? * * FIXME: if I am a crashed R_PRIMARY, or D_INCONSISTENT, * or Want-Full-Sync or the like, * refuse, and indicate how to solve this */ printf("Converting meta data...\n"); //if (!cfg->bits_counted) count_bits(cfg); /* FIXME: * if this is "internal" meta data, and I have bits set, * either move the bitmap into the newly expected place, * or refuse, and indicate how to solve this */ /* KB <-> sectors is done in the md disk<->cpu functions. * We only need to adjust the magic here. */ cfg->md.magic = DRBD_MD_MAGIC_07; /* FIXME somehow generate GCs in a sane way */ /* FIXME convert the flags? */ printf("Conversion v08 -> v07 is BROKEN!\n" "Be prepared to manually intervene!\n"); /* FIXME put some more helpful text here, indicating what exactly is to * be done to make this work as expected. */ /* unconditionally re-initialize offsets, * not necessary if fixed size external, * necessary if flex external or internal */ re_initialize_md_offsets(cfg); if (!is_valid_md(Drbd_07, &cfg->md, cfg->md_index, cfg->bd_size)) { fprintf(stderr, "Conversion failed.\nThis is a bug :(\n"); exit(111); } } /* if on the physical device we find some data we can interpret, * print some informational message about what we found, * and what we think how much room it needs. 
 *
 * look into /usr/share/misc/magic for inspiration
 * also consider e.g. xfsprogs/libdisk/fstype.c,
 * and of course the linux kernel headers...
 */
struct fstype_s {
	const char * type;
	unsigned long long bnum, bsize;
};

int may_be_extX(const char *data, struct fstype_s *f)
{
	unsigned int size;
	if (le16_to_cpu(*(uint16_t*)(data+0x438)) == 0xEF53) {
		/* read the full 32 bit compat feature word;
		 * bit 0x4 means "has journal" */
		if ( (le32_to_cpu(*(uint32_t*)(data+0x45c)) & 4) == 4 )
			f->type = "ext3 filesystem";
		else
			f->type = "ext2 filesystem";
		f->bnum = le32_to_cpu(*(uint32_t*)(data+0x404));
		size = le32_to_cpu(*(uint32_t*)(data+0x418));
		f->bsize =
			size == 0 ? 1024 :
			size == 1 ? 2048 :
			size == 2 ? 4096 :
			4096; /* DEFAULT */
		return 1;
	}
	return 0;
}

int may_be_xfs(const char *data, struct fstype_s *f)
{
	if (be32_to_cpu(*(uint32_t*)(data+0)) == 0x58465342) {
		f->type = "xfs filesystem";
		f->bsize = be32_to_cpu(*(uint32_t*)(data+4));
		f->bnum = be64_to_cpu(*(uint64_t*)(data+8));
		return 1;
	}
	return 0;
}

int may_be_reiserfs(const char *data, struct fstype_s *f)
{
	if (strncmp("ReIsErFs",data+0x10034,8) == 0 ||
		strncmp("ReIsEr2Fs",data+0x10034,9) == 0) {
		f->type = "reiser filesystem";
		f->bnum = le32_to_cpu(*(uint32_t*)(data+0x10000));
		f->bsize = le16_to_cpu(*(uint16_t*)(data+0x1002c));
		return 1;
	}
	return 0;
}

int may_be_jfs(const char *data, struct fstype_s *f)
{
	if (strncmp("JFS1",data+0x8000,4) == 0) {
		f->type = "JFS filesystem";
		f->bnum = le64_to_cpu(*(uint64_t*)(data+0x8008));
		f->bsize = le32_to_cpu(*(uint32_t*)(data+0x8018));
		return 1;
	}
	return 0;
}

/* really large block size,
 * will always refuse */
#define REFUSE_BSIZE 0xFFFFffffFFFF0000LLU
#define ERR_BSIZE 0xFFFFffffFFFF0001LLU
#define REFUSE_IT() do { f->bnum = 1; f->bsize = REFUSE_BSIZE; } while(0)
#define REFUSE_IT_ERR() do { f->bnum = 1; f->bsize = ERR_BSIZE; } while(0)

int may_be_swap(const char *data, struct fstype_s *f)
{
	int looks_like_swap =
		strncmp(data+(1<<12)-10, "SWAP-SPACE", 10) == 0 ||
		strncmp(data+(1<<12)-10, "SWAPSPACE2", 10) == 0 ||
		strncmp(data+(1<<13)-10, "SWAP-SPACE", 10) == 0 ||
		strncmp(data+(1<<13)-10, "SWAPSPACE2", 10) == 0;
	if (looks_like_swap) {
		f->type = "swap space signature";
		REFUSE_IT();
		return 1;
	}
	return 0;
}

#define N_ERR_LINES 4
#define MAX_ERR_LINE_LEN 1024
int guessed_size_from_pvs(struct fstype_s *f, char *dev_name)
{
	char buf_in[200];
	char *buf_err[N_ERR_LINES];
	ssize_t c; /* read() returns -1 on error, so this must be signed */
	unsigned long long bnum;
	int pipes[3][2];
	int err_lines = 0;
	FILE *child_err = NULL;
	int i;
	int ret = 0;
	pid_t pid;

	buf_err[0] = calloc(N_ERR_LINES, MAX_ERR_LINE_LEN);
	if (!buf_err[0])
		return 0;
	for (i = 1; i < N_ERR_LINES; i++)
		buf_err[i] = buf_err[i-1] + MAX_ERR_LINE_LEN;

	for (i = 0; i < 3; i++) {
		if (pipe(pipes[i])) {
			/* pipes[i] was not created;
			 * clean up only the earlier ones */
			i--;
			goto out;
		}
	}

	pid = fork();
	if (pid < 0) {
		i = 2; /* all three pipes exist, close them all */
		goto out;
	}

	setenv("dev_name", dev_name, 1);
	if (pid == 0) {
		/* child */
		char *argv[] = {
			"sh", "-vxc",
			"pvs -vvv --noheadings --nosuffix --units s -o pv_size"
			" --config \"devices { write_cache_state=0 filter = [ 'a|$dev_name|', 'r|.|' ] }\"",
			NULL,
		};
		close(pipes[0][1]); /* close unused pipe ends */
		close(pipes[1][0]);
		close(pipes[2][0]);

		dup2(pipes[0][0],0); /* map to expected stdin/out/err */
		dup2(pipes[1][1],1);
		dup2(pipes[2][1],2);

		close(0); /* we do not use stdin */
		execvp(argv[0], argv);
		_exit(0);
	}
	/* parent */
	close(pipes[0][0]); /* close unused pipe ends */
	close(pipes[1][1]);
	close(pipes[2][1]);

	close(pipes[0][1]); /* we do not use stdin in child */

	/* We use blocking IO on pipes. This could deadlock,
	 * If the child process would do something unexpected.
	 * We do know the behaviour of pvs, though,
	 * and expect only a few bytes on stdout,
	 * and quite a few debug messages on stderr.
	 *
	 * First drain stderr, keeping the last N_ERR_LINES,
	 * then read stdout.
*/ child_err = fdopen(pipes[2][0], "r"); if (child_err) { char *b; do { err_lines = (err_lines + 1) % N_ERR_LINES; b = fgets(buf_err[err_lines], MAX_ERR_LINE_LEN, child_err); } while (b); } c = read(pipes[1][0], buf_in, sizeof(buf_in)-1); if (c > 0) { buf_in[c] = 0; if (1 == sscanf(buf_in, " %llu\n", &bnum)) { f->bnum = bnum; f->bsize = 512; ret = 1; } } if (!ret) { for (i = 0; i < N_ERR_LINES; i++) { char *b = buf_err[(err_lines + i) % N_ERR_LINES]; if (b[0] == 0) continue; fprintf(stderr, "pvs stderr:%s", b); } fprintf(stderr, "\n"); } i = 2; out: for ( ; i >= 0; i--) { close(pipes[i][0]); close(pipes[i][1]); } if (child_err) fclose(child_err); free(buf_err[0]); return ret; } int may_be_LVM(const char *data, struct fstype_s *f, char *dev_name) { if (strncmp("LVM2",data+0x218,4) == 0) { f->type = "LVM2 physical volume signature"; if (!guessed_size_from_pvs(f, dev_name)) REFUSE_IT_ERR(); return 1; } return 0; } /* XXX should all this output go to stderr? */ void check_for_existing_data(struct format *cfg) { struct fstype_s f; size_t i; uint64_t fs_kB; uint64_t max_usable_kB; PREAD(cfg->md_fd, on_disk_buffer, SO_MUCH, 0); for (i = 0; i < SO_MUCH/sizeof(long); i++) { if (((long*)(on_disk_buffer))[i] != 0LU) break; } /* all zeros? no message */ if (i == SO_MUCH/sizeof(long)) return; f.type = "some data"; f.bnum = 0; f.bsize = 0; /* FIXME add more detection magic. * Or, rather, use some lib. */ (void)( may_be_swap (on_disk_buffer,&f) || may_be_LVM (on_disk_buffer,&f, cfg->md_device_name) || may_be_extX (on_disk_buffer,&f) || may_be_xfs (on_disk_buffer,&f) || may_be_jfs (on_disk_buffer,&f) || may_be_reiserfs (on_disk_buffer,&f) ); /* FIXME * some of the messages below only make sense for internal meta data. * for external meta data, we now only checked the meta-disk. 
* we should still check the actual lower level storage area for * existing data, too, and give appropriate warnings when it would * appear to be truncated by too small external meta data */ printf("md_offset %llu\n", (long long unsigned)cfg->md_offset); printf("al_offset %llu\n", (long long unsigned)cfg->al_offset); printf("bm_offset %llu\n", (long long unsigned)cfg->bm_offset); printf("\nFound %s\n", f.type); /* FIXME overflow check missing! * relevant for ln2(bsize) + ln2(bnum) >= 64, thus only for * device sizes of more than several exa byte. * seems irrelevant to me for now. */ fs_kB = ((f.bsize * f.bnum) + (1<<10)-1) >> 10; max_usable_kB = max_usable_sectors(cfg) >> 1; if (f.bnum) { if (cfg->md_index >= 0 || cfg->md_index == DRBD_MD_INDEX_FLEX_EXT) { printf("\nThis would corrupt existing data.\n"); if (ignore_sanity_checks) { printf("\nIgnoring sanity check on user request.\n\n"); return; } printf( "If you want me to do this, you need to zero out the first part\n" "of the device (destroy the content).\n" "You should be very sure that you mean it.\n" "Operation refused.\n\n"); exit(40); /* FIXME sane exit code! */ } if (f.bsize < REFUSE_BSIZE) printf("%12llu kB data area apparently used\n", (unsigned long long)fs_kB); printf("%12llu kB left usable by current configuration\n", (unsigned long long)max_usable_kB); if (f.bsize == ERR_BSIZE) printf( "Could not determine the size of the actually used data area.\n\n"); if (f.bsize >= REFUSE_BSIZE) { printf( "Device size would be truncated, which\n" "would corrupt data and result in\n" "'access beyond end of device' errors.\n"); if (ignore_sanity_checks) { printf("\nIgnoring sanity check on user request.\n\n"); return; } printf( "If you want me to do this, you need to zero out the first part\n" "of the device (destroy the content).\n" "You should be very sure that you mean it.\n" "Operation refused.\n\n"); exit(40); /* FIXME sane exit code! 
*/ } /* looks like file system data */ if (fs_kB > max_usable_kB) { printf( "\nDevice size would be truncated, which\n" "would corrupt data and result in\n" "'access beyond end of device' errors.\n" "You need to either\n" " * use external meta data (recommended)\n" " * shrink that filesystem first\n" " * zero out the device (destroy the filesystem)\n" "Operation refused.\n\n"); exit(40); /* FIXME sane exit code! */ } else { printf( "\nEven though it looks like this would place the new meta data into\n" "unused space, you still need to confirm, as this is only a guess.\n"); } } else printf("\n ==> This might destroy existing data! <==\n"); if (!confirmed("Do you want to proceed?")) { printf("Operation canceled.\n"); exit(1); // 1 to avoid online resource counting } } void check_internal_md_flavours(struct format * cfg) { struct md_cpu md_07; struct md_cpu md_07p; struct md_cpu md_08; off_t fixed_offset, flex_offset; int have_fixed_v07 = 0; int have_flex_v07 = 0; int have_flex_v08 = 0; ASSERT( cfg->md_index == DRBD_MD_INDEX_INTERNAL || cfg->md_index == DRBD_MD_INDEX_FLEX_INT ); fixed_offset = v07_style_md_get_byte_offset( DRBD_MD_INDEX_INTERNAL, cfg->bd_size); flex_offset = v07_style_md_get_byte_offset( DRBD_MD_INDEX_FLEX_INT, cfg->bd_size); /* printf("%lld\n%lld\n%lld\n", (long long unsigned)cfg->bd_size, (long long unsigned)fixed_offset, (long long unsigned)flex_offset); */ if (0 <= fixed_offset && fixed_offset < (off_t)cfg->bd_size - 4096) { /* ... v07 fixed-size internal meta data? */ PREAD(cfg->md_fd, on_disk_buffer, 4096, fixed_offset); md_disk_07_to_cpu(&md_07, (struct md_on_disk_07*)on_disk_buffer); have_fixed_v07 = is_valid_md(Drbd_07, &md_07, DRBD_MD_INDEX_INTERNAL, cfg->bd_size); } PREAD(cfg->md_fd, on_disk_buffer, 4096, flex_offset); /* ... v07 (plus) flex-internal meta data? */ md_disk_07_to_cpu(&md_07p, (struct md_on_disk_07*)on_disk_buffer); have_flex_v07 = is_valid_md(Drbd_07, &md_07p, DRBD_MD_INDEX_FLEX_INT, cfg->bd_size); /* ... 
v08 flex-internal meta data? * (same offset, same on disk data) */ md_disk_08_to_cpu(&md_08, (struct md_on_disk_08*)on_disk_buffer); have_flex_v08 = is_valid_md(Drbd_08, &md_08, DRBD_MD_INDEX_FLEX_INT, cfg->bd_size); if (!(have_fixed_v07 || have_flex_v07 || have_flex_v08)) return; ASSERT(have_flex_v07 == 0 || have_flex_v08 == 0); /* :-) */ fprintf(stderr, "You want me to create a %s%s style %s internal meta data block.\n", cfg->ops->name, (is_v07(cfg) && cfg->md_index == DRBD_MD_INDEX_FLEX_INT) ? "(plus)" : "", cfg->md_index == DRBD_MD_INDEX_FLEX_INT ? "flexible-size" : "fixed-size"); if (have_fixed_v07) { fprintf(stderr, "There appears to be a v07 fixed-size internal meta data block\n" "already in place on %s at byte offset %llu\n", cfg->md_device_name, (long long unsigned)fixed_offset); } if (have_flex_v07) { fprintf(stderr, "There appears to be a v07(plus) flexible-size internal meta data block\n" "already in place on %s at byte offset %llu", cfg->md_device_name, (long long unsigned)flex_offset); } if (have_flex_v08) { fprintf(stderr, "There appears to be a v08 flexible-size internal meta data block\n" "already in place on %s at byte offset %llu", cfg->md_device_name, (long long unsigned)flex_offset); } if (have_fixed_v07 && have_flex_v07) { fprintf(stderr, "Don't know what to do now. 
If you want this to work,\n" "Please wipe out at least one of these.\n"); exit(10); } if (is_v08(cfg)) { if (have_flex_v08) { if (cfg->md.al_stripes != option_al_stripes || cfg->md.al_stripe_size_4k != option_al_stripe_size_4k) { if (confirmed("Do you want to change the activity log stripe settings only?")) { fprintf(stderr, "sorry, not yet fully implemented\n"); exit(30); cfg->md.al_stripes = option_al_stripes; cfg->md.al_stripe_size_4k = option_al_stripe_size_4k; re_initialize_md_offsets(cfg); have_flex_v08 = 0; /* do not wipe this */ goto out; } } if (!confirmed("Do you really want to overwrite the existing v08 meta-data?")) { printf("Operation cancelled.\n"); exit(1); // 1 to avoid online resource counting } /* no need to wipe flex offset, * will be overwritten with new data */ cfg->md.magic = 0; have_flex_v08 = 0; } if ( (have_fixed_v07||have_flex_v07) ) { if (confirmed("Convert the existing v07 meta-data to v08?")) { cfg->md = have_fixed_v07 ? md_07 : md_07p; md_convert_07_to_08(cfg); /* goto wipe; */ } else if (!confirmed("So you want me to wipe out the v07 meta-data?")) { printf("Operation cancelled.\n"); exit(1); // 1 to avoid online resource counting } } } else { /* is_v07(cfg) */ if (have_fixed_v07 || have_flex_v07) { if (!confirmed("Do you really want to overwrite the existing v07 meta-data?")) { printf("Operation cancelled.\n"); exit(1); // 1 to avoid online resource counting } /* no need to wipe the requested flavor, * will be overwritten with new data */ cfg->md.magic = 0; if (cfg->md_index == DRBD_MD_INDEX_INTERNAL) have_fixed_v07 = 0; else have_flex_v07 = 0; } if (have_flex_v08) { if (confirmed("Valid v08 meta-data found, convert back to v07?")) { cfg->md = md_08; md_convert_08_to_07(cfg); if (cfg->md_index == DRBD_MD_INDEX_FLEX_INT) have_flex_v08 = 0; /* goto wipe; */ } } } out: if (have_fixed_v07) cfg->wipe_fixed = fixed_offset; if (have_flex_v08 || have_flex_v07) cfg->wipe_flex = flex_offset; } void wipe_after_convert(struct format *cfg) { 
memset(on_disk_buffer, 0x00, 4096); if (cfg->wipe_fixed) pwrite_or_die(cfg->md_fd, on_disk_buffer, 4096, cfg->wipe_fixed, "wipe fixed-size v07 internal md"); if (cfg->wipe_flex) pwrite_or_die(cfg->md_fd, on_disk_buffer, 4096, cfg->wipe_flex, "wipe flexible-size internal md"); } void check_external_md_flavours(struct format * cfg) { struct md_cpu md_07; struct md_cpu md_08; ASSERT( cfg->md_index >= 0 || cfg->md_index == DRBD_MD_INDEX_FLEX_EXT ); if (cfg->md.magic) { if (!confirmed("Valid meta data seems to be in place.\n" "Do you really want to overwrite?")) { printf("Operation cancelled.\n"); exit(1); } cfg->md.magic = 0; /* will be re-initialized below */ return; } PREAD(cfg->md_fd, on_disk_buffer, 4096, cfg->md_offset); if (is_v08(cfg)) { md_disk_07_to_cpu(&md_07, (struct md_on_disk_07*)on_disk_buffer); if (!is_valid_md(Drbd_07, &md_07, cfg->md_index, cfg->bd_size)) return; if (confirmed("Valid v07 meta-data found, convert to v08?")) { cfg->md = md_07; md_convert_07_to_08(cfg); return; } if (!confirmed("So you want me to replace the v07 meta-data\n" "with newly initialized v08 meta-data?")) { printf("Operation cancelled.\n"); exit(1); } } else if (is_v07(cfg)) { md_disk_08_to_cpu(&md_08, (struct md_on_disk_08*)on_disk_buffer); if (!is_valid_md(Drbd_08, &md_08, cfg->md_index, cfg->bd_size)) return; if (confirmed("Valid v08 meta-data found, convert back to v07?")) { cfg->md = md_08; md_convert_08_to_07(cfg); return; } if (!confirmed("So you want me to replace the v08 meta-data\n" "with newly initialized v07 meta-data?")) { printf("Operation cancelled.\n"); exit(1); } } } /* ok, so there is no valid meta data at the end of the device, * but there is valid internal meta data at the "last known" * position. Move the stuff. * Areas may overlap: * |--...~//~[BITMAP][AL][SB]| <<- last known * |--.......~//~[BITMAP][AL][SB]| <<- what it should look like now * So we move it in chunks. 
 */
int v08_move_internal_md_after_resize(struct format *cfg)
{
	off_t old_offset;
	off_t old_bm_offset;
	off_t cur_offset;
	off_t last_chunk_size;
	int err;

	ASSERT(is_v08(cfg));
	ASSERT(cfg->md_index == DRBD_MD_INDEX_FLEX_INT);
	ASSERT(cfg->lk_bd.bd_size <= cfg->bd_size);

	/* we just read it in v08_check_for_resize().
	 * no need to do it again, but ASSERT this. */
	old_offset = v07_style_md_get_byte_offset(DRBD_MD_INDEX_FLEX_INT, cfg->lk_bd.bd_size);
	/*
	PREAD(cfg->md_fd, on_disk_buffer, 4096, old_offset);
	md_disk_08_to_cpu(&md_08, (struct md_on_disk_08*)on_disk_buffer);
	*/
	ASSERT(is_valid_md(Drbd_08, &cfg->md, DRBD_MD_INDEX_FLEX_INT, cfg->lk_bd.bd_size));

	fprintf(stderr, "Moving the internal meta data to its proper location\n");

	/* FIXME
	 * If the new meta data area overlaps the old "super block",
	 * and we crash before we successfully wrote the new super block,
	 * but after we overwrote the old, we are out of luck!
	 * But I don't want to write the new superblock early, either.
	 */

	/* move activity log, fixed size immediately preceding the "super block". */
	cur_offset = old_offset + cfg->md.al_offset * 512LL;
	PREAD(cfg->md_fd, on_disk_buffer, old_offset - cur_offset, cur_offset);
	PWRITE(cfg->md_fd, on_disk_buffer, old_offset - cur_offset, cfg->al_offset);

	/* The AL was of fixed size.
	 * Bitmap is of flexible size, new bitmap is likely larger.
	 * We do not initialize that part, we just leave "garbage" in there.
	 * Once DRBD "agrees" on the new lower level device size, that part of
	 * the bitmap will be handled by the module, anyways. */
	old_bm_offset = old_offset + cfg->md.bm_offset * 512LL;

	/* move bitmap, in chunks, peel off from the end. */
	cur_offset = old_offset + cfg->md.al_offset * 512LL - buffer_size;
	while (cur_offset > old_bm_offset) {
		PREAD(cfg->md_fd, on_disk_buffer, buffer_size, cur_offset);
		PWRITE(cfg->md_fd, on_disk_buffer, buffer_size,
		       cfg->bm_offset + (cur_offset - old_bm_offset));
		cur_offset -= buffer_size;
	}

	/* Adjust for last, possibly partial buffer.
*/ last_chunk_size = buffer_size - (old_bm_offset - cur_offset); PREAD(cfg->md_fd, on_disk_buffer, last_chunk_size, old_bm_offset); PWRITE(cfg->md_fd, on_disk_buffer, last_chunk_size, cfg->bm_offset); /* fix bitmap offset in meta data, * and rewrite the "super block" */ re_initialize_md_offsets(cfg); err = cfg->ops->md_cpu_to_disk(cfg); if (!err) printf("Internal drbd meta data successfully moved.\n"); if (!err && old_offset < cfg->bm_offset) { /* wipe out previous meta data block, it has been superseded. */ memset(on_disk_buffer, 0, 4096); PWRITE(cfg->md_fd, on_disk_buffer, 4096, old_offset); } err = cfg->ops->close(cfg) || err; if (err) fprintf(stderr, "operation failed\n"); return err; } int meta_create_md(struct format *cfg, char **argv __attribute((unused)), int argc) { int err = 0; if (argc > 0) { fprintf(stderr, "Ignoring additional arguments\n"); } err = cfg->ops->open(cfg); /* Suggest to move existing meta data after offline resize. Though, if * you --force create-md, you probably mean it, so we don't even ask. * If you want to automatically move it, use check-resize. */ if (err == VALID_MD_FOUND_AT_LAST_KNOWN_LOCATION) { if (option_al_stripes_used) { if (option_al_stripes != cfg->md.al_stripes || option_al_stripe_size_4k != cfg->md.al_stripe_size_4k) { fprintf(stderr, "Cannot move after offline resize and change AL-striping at the same time, yet.\n"); exit(20); } } if (!force && confirmed("Move internal meta data from last-known position?\n")) { /* Maybe we want to use some library that provides detection of * fs/partition/usage types? */ check_for_existing_data(cfg); return v08_move_internal_md_after_resize(cfg); } /* else: reset cfg->md, it needs to be re-initialized below */ memset(&cfg->md, 0, sizeof(cfg->md)); } /* the offset of v07 fixed-size internal meta data is different from * the offset of the flexible-size v07 ("plus") and v08 (default) * internal meta data. 
	 * to avoid the situation where we would have "valid" meta data blocks
	 * of different versions at different offsets, we also need to check
	 * the other format, and the other offset.
	 *
	 * on a request to create v07 fixed-size internal meta data, we also
	 * check flex-internal v08 [and v07 (plus)] at the other offset.
	 *
	 * on a request to create v08 flex-internal meta data (or v07 plus, for
	 * that matter), we also check the same offset for the respective other
	 * flex-internal format version, as well as the v07 fixed-size internal
	 * meta data offset for its flavor of meta data.
	 */
	if (cfg->md_index == DRBD_MD_INDEX_INTERNAL ||
	    cfg->md_index == DRBD_MD_INDEX_FLEX_INT)
		check_internal_md_flavours(cfg);
	else
		check_external_md_flavours(cfg);

	printf("Writing meta data...\n");
	if (!cfg->md.magic) /* not converted: initialize */
		/* calls check_for_existing_data() internally */
		err = cfg->ops->md_initialize(cfg); /* Clears on disk AL implicitly */
	else {
		err = 0; /* we have successfully converted something */
		check_for_existing_data(cfg);
	}

	cfg->md.la_peer_max_bio_size = option_peer_max_bio_size;

	/* FIXME
	 * if this converted fixed-size 128MB internal meta data
	 * to flexible size, we'd need to move the AL and bitmap
	 * over to the new location!
	 * But the upgrade procedure in such case is documented to first get
	 * the previous DRBD into "clean" C_CONNECTED R_SECONDARY/R_SECONDARY, so AL
	 * and bitmap should be empty anyways.
	 */
	err = err || cfg->ops->md_cpu_to_disk(cfg); // <- short circuit
	if (!err)
		wipe_after_convert(cfg);
	err = cfg->ops->close(cfg) || err; // <- close always
	if (err)
		fprintf(stderr, "operation failed\n");
	else
		printf("New drbd meta data block successfully created.\n");

	return err;
}

int meta_wipe_md(struct format *cfg, char **argv __attribute((unused)), int argc)
{
	int virgin, err;

	if (argc > 0) {
		fprintf(stderr, "Ignoring additional arguments\n");
	}

	virgin = cfg->ops->open(cfg);
	if (virgin) {
		fprintf(stderr,"There appears to be no drbd meta data to wipe out?\n");
		return 0;
	}

	if (!confirmed("Do you really want to wipe out the DRBD meta data?")) {
		printf("Operation cancelled.\n");
		exit(1);
	}

	printf("Wiping meta data...\n");
	memset(on_disk_buffer, 0, 4096);
	PWRITE(cfg->md_fd, on_disk_buffer, 4096, cfg->md_offset);

	err = cfg->ops->close(cfg);
	if (err)
		fprintf(stderr, "operation failed\n");
	else
		printf("DRBD meta data block successfully wiped out.\n");

	/* delete last-known bdev info, it is of no use now.
*/ lk_bdev_delete(cfg->minor); return err; } int meta_outdate(struct format *cfg, char **argv __attribute((unused)), int argc) { int err; if (argc > 0) { fprintf(stderr, "Ignoring additional arguments\n"); } if (cfg->ops->open(cfg)) return -1; if (cfg->ops->outdate_gi(&cfg->md)) { fprintf(stderr, "Device is inconsistent.\n"); exit(5); } err = cfg->ops->md_cpu_to_disk(cfg); err = cfg->ops->close(cfg) || err; // <- close always if (err) fprintf(stderr, "update failed\n"); return err; } int meta_invalidate(struct format *cfg, char **argv __attribute((unused)), int argc) { int err; if (argc > 0) { fprintf(stderr, "Ignoring additional arguments\n"); } if (cfg->ops->open(cfg)) return -1; cfg->ops->invalidate_gi(&cfg->md); err = cfg->ops->md_cpu_to_disk(cfg); err = cfg->ops->close(cfg) || err; // <- close always if (err) fprintf(stderr, "update failed\n"); return err; } int meta_read_dev_uuid(struct format *cfg, char **argv __attribute((unused)), int argc) { if (argc > 0) { fprintf(stderr, "Ignoring additional arguments\n"); } if (cfg->ops->open(cfg)) return -1; printf(X64(016)"\n",cfg->md.device_uuid); return cfg->ops->close(cfg); } int meta_write_dev_uuid(struct format *cfg, char **argv, int argc) { int err; if (argc > 1) { fprintf(stderr, "Ignoring additional arguments\n"); } if (argc < 1) { fprintf(stderr, "Required Argument missing\n"); exit(10); } if (cfg->ops->open(cfg)) return -1; cfg->md.device_uuid = strto_u64(argv[0],NULL,16); err = cfg->ops->md_cpu_to_disk(cfg); err = cfg->ops->close(cfg) || err; if (err) fprintf(stderr, "update failed\n"); return err; } char *progname = NULL; void print_usage_and_exit() { char **args; size_t i; printf ("\nUSAGE: %s [--force] DEVICE FORMAT [FORMAT ARGS...] 
COMMAND [CMD ARGS...]\n", progname); printf("\nFORMATS:\n"); for (i = Drbd_06; i < Drbd_Unknown; i++) { printf(" %s", f_ops[i].name); if ((args = f_ops[i].args)) { while (*args) { printf(" %s", *args++); } } printf("\n"); } printf("\nCOMMANDS:\n"); for (i = 0; i < ARRAY_SIZE(cmds); i++) { if (!cmds[i].show_in_usage) continue; printf(" %s %s\n", cmds[i].name, cmds[i].args ? cmds[i].args : ""); } exit(20); } int parse_format(struct format *cfg, char **argv, int argc, int *ai) { enum Known_Formats f; if (argc < 1) { fprintf(stderr, "Format identifier missing\n"); return -1; } for (f = Drbd_06; f < Drbd_Unknown; f++) { if (!strcmp(f_ops[f].name, argv[0])) break; } if (f == Drbd_Unknown) { fprintf(stderr, "Unknown format '%s'.\n", argv[0]); return -1; } (*ai)++; cfg->ops = f_ops + f; return cfg->ops->parse(cfg, argv + 1, argc - 1, ai); } int is_attached(int minor) { FILE *pr; char token[128]; /* longest interesting token is 40 Byte (git hash) */ int rv = -1; long m, cm = -1; char *p; pr = fopen("/proc/drbd", "r"); if (!pr) return 0; while (fget_token(token, sizeof(token), pr) != EOF) { m = strtol(token, &p, 10); /* keep track of currently parsed minor */ if (p[0] == ':' && p[1] == 0) cm = m; /* we found the minor number that was asked for */ if (cm == minor) { /* first, assume it is attached */ if (rv == -1) rv = 1; /* unless, of course, it is unconfigured or diskless */ if (!strcmp(token, "cs:Unconfigured")) rv = 0; if (!strncmp(token, "ds:Diskless", 11)) rv = 0; } } fclose(pr); if (rv == -1) rv = 0; // minor not found -> not attached. return rv; } int meta_chk_offline_resize(struct format *cfg, char **argv, int argc) { int err; err = cfg->ops->open(cfg); /* this is first, so that lk-bdev-info files are removed/updated * if we find valid meta data in the expected place. 
*/ if (err == VALID_MD_FOUND) { /* Do not clutter the output of the init script printf("Found valid meta data in the expected location, %llu bytes into %s.\n", (unsigned long long)cfg->md_offset, cfg->md_device_name); */ /* create, delete or update the last known info */ err = lk_bdev_load(cfg->minor, &cfg->lk_bd); if (cfg->md_index != DRBD_MD_INDEX_FLEX_INT) lk_bdev_delete(cfg->minor); else if (cfg->lk_bd.bd_size != cfg->bd_size || cfg->lk_bd.bd_uuid != cfg->md.device_uuid) cfg->update_lk_bdev = 1; return cfg->ops->close(cfg); } else if (err == NO_VALID_MD_FOUND) { if (!is_v08(cfg) || cfg->md_index != DRBD_MD_INDEX_FLEX_INT) { fprintf(stderr, "Operation only supported for v8 internal meta data\n"); return -1; } fprintf(stderr, "no suitable meta data found :(\n"); return -1; /* sorry :( */ } ASSERT(is_v08(cfg)); ASSERT(cfg->md_index == DRBD_MD_INDEX_FLEX_INT); ASSERT(cfg->lk_bd.bd_size); ASSERT(cfg->md.magic); return v08_move_internal_md_after_resize(cfg); } /* CALL ONLY ONCE as long as on_disk_buffer is global! 
*/ struct format *new_cfg() { int err; struct format *cfg; errno = 0; pagesize = sysconf(_SC_PAGESIZE); if (errno) { perror("could not determine pagesize"); exit(20); } cfg = calloc(1, sizeof(struct format)); if (!cfg) { fprintf(stderr, "could not calloc() cfg\n"); exit(20); } err = posix_memalign(&on_disk_buffer,pagesize, (buffer_size+pagesize-1)/pagesize*pagesize); if (err) { fprintf(stderr, "could not posix_memalign() on_disk_buffer\n"); exit(20); } return cfg; } int main(int argc, char **argv) { struct format *cfg; size_t i; int ai; #if 1 if (sizeof(struct md_on_disk_07) != 4096) { fprintf(stderr, "Where did you get this broken build!?\n" "sizeof(md_on_disk_07) == %lu, should be 4096\n", (unsigned long)sizeof(struct md_on_disk_07)); exit(111); } if (sizeof(struct md_on_disk_08) != 4096) { fprintf(stderr, "Where did you get this broken build!?\n" "sizeof(md_on_disk_08) == %lu, should be 4096\n", (unsigned long)sizeof(struct md_on_disk_08)); exit(111); } #if 0 printf("v07: al_offset: %u\n", (int)&(((struct md_on_disk_07*)0)->al_offset)); printf("v07: bm_offset: %u\n", (int)&(((struct md_on_disk_07*)0)->bm_offset)); printf("v08: al_offset: %u\n", (int)&(((struct md_on_disk_08*)0)->al_offset)); printf("v08: bm_offset: %u\n", (int)&(((struct md_on_disk_08*)0)->bm_offset)); exit(0); #endif #endif if ((progname = strrchr(argv[0], '/'))) { argv[0] = ++progname; } else { progname = argv[0]; } if (argc < 4) print_usage_and_exit(); /* so dump_md can write a nice header */ global_argc = argc; global_argv = argv; /* Check for options (e.g. 
	   --force) */
	while (1) {
		int c = getopt_long(argc, argv, make_optstring(metaopt), metaopt, 0);

		if (c == -1)
			break;

		switch (c) {
		case 0:
			break;
		case 'f':
			force = 1;
			break;
		case 'v':
			verbose++;
			break;
		case 'p':
			option_peer_max_bio_size = m_strtoll(optarg, 1);
			if (option_peer_max_bio_size < 0 ||
			    option_peer_max_bio_size > 128 * 1024) {
				fprintf(stderr, "peer-max-bio-size out of range (0...128k)\n");
				exit(10);
			}
			break;
		case 's':
			option_al_stripes = m_strtoll(optarg, 1);
			option_al_stripes_used = 1;
			break;
		case 'z':
			option_al_stripe_size_4k = m_strtoll(optarg, 'k')/4;
			option_al_stripes_used = 1;
			break;
		default:
			print_usage_and_exit();
			break;
		}
	}

	// Next argument to process is specified by optind...
	ai = optind;

	cfg = new_cfg();
	cfg->drbd_dev_name = argv[ai++];

	if (parse_format(cfg, argv + ai, argc - ai, &ai)) {
		/* parse has already printed some error message */
		exit(20);
	}

	if (ai >= argc) {
		fprintf(stderr, "command missing\n");
		exit(20);
	}

	for (i = 0; i < ARRAY_SIZE(cmds); i++) {
		if (!strcmp(cmds[i].name, argv[ai])) {
			command = cmds + i;
			break;
		}
	}
	if (command == NULL) {
		fprintf(stderr, "Unknown command '%s'.\n", argv[ai]);
		exit(20);
	}
	ai++;

	/* does exit() unless we acquired the lock.
	 * unlock happens implicitly when the process dies,
	 * but may be requested explicitly */
	cfg->minor = dt_minor_of_dev(cfg->drbd_dev_name);
	if (cfg->minor < 0) {
		fprintf(stderr, "Cannot determine minor device number of "
			"drbd device '%s'\n",
			cfg->drbd_dev_name);
		exit(20);
	}
	cfg->lock_fd = dt_lock_drbd(cfg->minor);

	/* unconditionally check whether this is in use */
	if (is_attached(cfg->minor)) {
		if (!(force && (command->function == meta_dump_md))) {
			fprintf(stderr, "Device '%s' is configured!\n",
				cfg->drbd_dev_name);
			exit(20);
		}
	}

	if (option_peer_max_bio_size &&
	    command->function != &meta_create_md) {
		fprintf(stderr, "The --peer-max-bio-size option is only allowed with create-md\n");
		exit(10);
	}
	if (option_al_stripes_used &&
	    command->function != &meta_create_md &&
	    command->function != &meta_restore_md) {
		fprintf(stderr, "The --al-stripe* options are only allowed with create-md and restore-md\n");
		exit(10);
	}

	/* at some point I'd like to go for this: (16*1024*1024/4) */
	if ((uint64_t)option_al_stripes * option_al_stripe_size_4k > (buffer_size/4096)) {
		fprintf(stderr, "invalid (too large) al-stripe* settings\n");
		exit(10);
	}
	if (option_al_stripes * option_al_stripe_size_4k < 32/4) {
		fprintf(stderr, "invalid (too small) al-stripe* settings\n");
		exit(10);
	}

	return command->function(cfg, argv + ai, argc - ai);
	/* and if we want an explicit free,
	 * this would be the place for it.
	 * free(cfg->md_device_name), free(cfg) ...
*/ } drbd-8.4.4/user/drbdmeta_parser.h0000664000000000000000000000102712216604252015427 0ustar rootroottypedef union YYSTYPE { char* txt; uint64_t u64; } YYSTYPE; #define YYSTYPE_IS_DECLARED 1 #define YYSTYPE_IS_TRIVIAL 1 #define YY_NO_UNPUT 1 extern YYSTYPE yylval; extern int yylineno; enum yytokentype { TK_STRING = 258, TK_U64, TK_U32, TK_NUM, TK_GC, TK_BM, TK_UUID, TK_VERSION, TK_LA_SIZE, TK_BM_BYTE_PER_BIT, TK_DEVICE_UUID, TK_TIMES, TK_FLAGS, TK_INVALID, TK_INVALID_CHAR, TK_LA_BIO_SIZE, TK_AL_STRIPES, TK_AL_STRIPE_SIZE_4K, }; /* avoid compiler warnings about implicit declaration */ int yylex(void); drbd-8.4.4/user/drbdmeta_scanner.fl0000664000000000000000000000425012132747531015744 0ustar rootroot%{ #include "drbd_endian.h" #include "drbdmeta_parser.h" #include "drbdtool_common.h" static void bad_token(char*); //#define DP printf("%s ",yytext); #define DP #define CP yylval.txt=yytext #define YY_NO_INPUT 1 #define YY_NO_UNPUT 1 static void yyunput (int c, register char * yy_bp ) __attribute((unused)); %} %option noyywrap %option yylineno /* remember to avoid backing up. * tell user about bad/unexpected tokens. */ OP [{};] WS [ \r\t\n] COMMENT \#[^\n]* /* 1<<63 is 19 digits. has to be enough. * 20 digits would risk overflow of 64bit unsigned int */ NUM [0-9]{1,19} NUM_TOO_LONG {NUM}[0-9] U64 0x[0-9A-Fa-f]{16} U32 0x[0-9A-Fa-f]{8} INVALID_HEX 0x[0-9A-Fa-f]{0,17} STRING \"[^\"\r\n]{1,20}\" EMPTY_STRING \"\"? 
INVALID_STRING \"[^\"\r\n]{1,20} INVALID_TOKEN [-_a-zA-Z0-9]{1,100} INVALID_CHAR [^-_a-zA-Z0-9 \t\r\n\";{}] %% {WS} /* skip silently */ {COMMENT} /* skip silently */ {OP} DP; return yytext[0]; {STRING} unescape(yytext); DP; CP; return TK_STRING; {U64} yylval.u64 = strto_u64(yytext, NULL, 16); DP; return TK_U64; {U32} yylval.u64 = strto_u64(yytext, NULL, 16); DP; return TK_U32; {NUM} yylval.u64 = strto_u64(yytext, NULL, 10); DP; return TK_NUM; gc DP; CP; return TK_GC; bm DP; CP; return TK_BM; uuid DP; CP; return TK_UUID; version DP; CP; return TK_VERSION; la-size-sect DP; CP; return TK_LA_SIZE; bm-byte-per-bit DP; CP; return TK_BM_BYTE_PER_BIT; device-uuid DP; CP; return TK_DEVICE_UUID; times DP; CP; return TK_TIMES; flags DP; CP; return TK_FLAGS; al-stripes DP; CP; return TK_AL_STRIPES; al-stripe-size-4k DP; CP; return TK_AL_STRIPE_SIZE_4K; la-peer-max-bio-size DP; CP; return TK_LA_BIO_SIZE; {INVALID_STRING} CP; bad_token("invalid string"); return TK_INVALID; {EMPTY_STRING} CP; bad_token("invalid string"); return TK_INVALID; {INVALID_HEX} CP; bad_token("invalid hex number (only 8 or 16 hex digits accepted)"); return TK_INVALID; {NUM_TOO_LONG} CP; bad_token("number too big"); return TK_INVALID; {INVALID_TOKEN} CP; bad_token("unknown token"); return TK_INVALID; {INVALID_CHAR} CP; return TK_INVALID_CHAR; %% static void bad_token(char *msg) { fflush(stdout); fprintf(stderr,"line %u: %s: %s ...\n", yylineno, msg, yytext); } drbd-8.4.4/user/drbdsetup.c0000664000000000000000000021416512221331365014267 0ustar rootroot/* * DRBD setup via genetlink * * This file is part of DRBD by Philipp Reisner and Lars Ellenberg. * * Copyright (C) 2001-2008, LINBIT Information Technologies GmbH. * Copyright (C) 1999-2008, Philipp Reisner . * Copyright (C) 2002-2008, Lars Ellenberg . 
* * drbd is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2, or (at your option) * any later version. * * drbd is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with drbd; see the file COPYING. If not, write to * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #define _GNU_SOURCE #define _XOPEN_SOURCE 600 #define _FILE_OFFSET_BITS 64 #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define EXIT_NOMEM 20 #define EXIT_NO_FAMILY 20 #define EXIT_SEND_ERR 20 #define EXIT_RECV_ERR 20 #define EXIT_TIMED_OUT 20 #define EXIT_NOSOCK 30 #define EXIT_THINKO 42 /* * We are not using libnl, * using its API for the few things we want to do * ends up being almost as much lines of code as * coding the necessary bits right here. */ #include "libgenl.h" #include "drbd_nla.h" #include #include #include #include #include "drbdtool_common.h" #include "registry.h" #include "config.h" #include "config_flags.h" #include "wrap_printf.h" #include "drbd_strings.h" char *progname; /* for parsing of messages */ static struct nlattr *global_attrs[128]; /* there is an other table, nested_attr_tb, defined in genl_magic_func.h, * which can be used after _from_attrs, * to check for presence of struct fields. 
 */
#define ntb(t)	nested_attr_tb[__nla_type(t)]

#ifdef PRINT_NLMSG_LEN
/* I'm too lazy to check the maximum possible nlmsg length by hand */
int main(void)
{
	static __u16 nla_attr_minlen[NLA_TYPE_MAX+1] __read_mostly = {
		[NLA_U8]	= sizeof(__u8),
		[NLA_U16]	= sizeof(__u16),
		[NLA_U32]	= sizeof(__u32),
		[NLA_U64]	= sizeof(__u64),
		[NLA_NESTED]	= NLA_HDRLEN,
	};
	int i;
	int sum_total = 0;
#define LEN__(policy) do {					\
	int sum = 0;						\
	for (i = 0; i < ARRAY_SIZE(policy); i++) {		\
		sum += nla_total_size(policy[i].len ?:		\
			nla_attr_minlen[policy[i].type]);	\
								\
	}							\
	sum += 4;						\
	sum_total += sum;					\
	printf("%-30s %4u [%4u]\n",				\
			#policy ":", sum, sum_total);		\
} while (0)
#define LEN_(p) LEN__(p ## _nl_policy)
	LEN_(disk_conf);
	LEN_(syncer_conf);
	LEN_(net_conf);
	LEN_(set_role_parms);
	LEN_(resize_parms);
	LEN_(state_info);
	LEN_(start_ov_parms);
	LEN_(new_c_uuid_parms);
	sum_total += sizeof(struct nlmsghdr) + sizeof(struct genlmsghdr) + sizeof(struct drbd_genlmsghdr);
	printf("sum total inclusive hdr overhead: %4u\n", sum_total);
	return 0;
}
#else

#ifndef AF_INET_SDP
#define AF_INET_SDP 27
#define PF_INET_SDP AF_INET_SDP
#endif

/* pretty print helpers */
static int indent = 0;
#define INDENT_WIDTH 4
#define printI(fmt, args... ) printf("%*s" fmt,INDENT_WIDTH * indent,"" , ## args )

enum usage_type {
	BRIEF,
	FULL,
	XML,
};

struct drbd_argument {
	const char* name;
	__u16 nla_type;
	int (*convert_function)(struct drbd_argument *,
				struct msg_buff *,
				struct drbd_genlmsghdr *dhdr,
				char *);
};

/* Configuration requests typically need a context to operate on.
 * Possible keys are device minor/volume id (both fit in the drbd_genlmsghdr),
 * the replication link (aka connection) name,
 * and/or the replication group (aka resource) name */
enum cfg_ctx_key {
	/* Only one of these can be present in a command: */
	CTX_MINOR = 1,
	CTX_RESOURCE = 2,
	CTX_ALL = 4,
	CTX_CONNECTION = 8,

	CTX_RESOURCE_AND_CONNECTION = 16,
};

struct drbd_cmd {
	const char* cmd;
	const enum cfg_ctx_key ctx_key;
	const int cmd_id;
	const int tla_id; /* top level attribute id */
	int (*function)(struct drbd_cmd *, int, char **);
	struct drbd_argument *drbd_args;
	int (*show_function)(struct drbd_cmd*, struct genl_info *);
	struct option *options;
	bool missing_ok;
	bool continuous_poll;
	bool wait_for_connect_timeouts;
	bool set_defaults;
	struct context_def *ctx;
};

// other functions
static int get_af_ssocks(int warn);
static void print_command_usage(struct drbd_cmd *cm, enum usage_type);

// command functions
static int generic_config_cmd(struct drbd_cmd *cm, int argc, char **argv);
static int down_cmd(struct drbd_cmd *cm, int argc, char **argv);
static int generic_get_cmd(struct drbd_cmd *cm, int argc, char **argv);
static int del_minor_cmd(struct drbd_cmd *cm, int argc, char **argv);
static int del_resource_cmd(struct drbd_cmd *cm, int argc, char **argv);

// sub commands for generic_get_cmd
static int show_scmd(struct drbd_cmd *cm, struct genl_info *info);
static int role_scmd(struct drbd_cmd *cm, struct genl_info *info);
static int sh_status_scmd(struct drbd_cmd *cm, struct genl_info *info);
static int cstate_scmd(struct drbd_cmd *cm, struct genl_info *info);
static int dstate_scmd(struct drbd_cmd *cm, struct genl_info *info);
static int uuids_scmd(struct drbd_cmd *cm, struct genl_info *info);
static int lk_bdev_scmd(struct drbd_cmd *cm, struct genl_info *info);
static int print_broadcast_events(struct drbd_cmd *, struct genl_info *);
static int w_connected_state(struct drbd_cmd *, struct genl_info *);
static int w_synced_state(struct drbd_cmd *, struct genl_info *);

// convert functions for arguments static int conv_block_dev(struct drbd_argument *ad, struct msg_buff *msg, struct drbd_genlmsghdr *dhdr, char* arg); static int conv_md_idx(struct drbd_argument *ad, struct msg_buff *msg, struct drbd_genlmsghdr *dhdr, char* arg); static int conv_resource_name(struct drbd_argument *ad, struct msg_buff *msg, struct drbd_genlmsghdr *dhdr, char* arg); static int conv_volume(struct drbd_argument *ad, struct msg_buff *msg, struct drbd_genlmsghdr *dhdr, char* arg); static int conv_minor(struct drbd_argument *ad, struct msg_buff *msg, struct drbd_genlmsghdr *dhdr, char* arg); struct option wait_cmds_options[] = { { "wfc-timeout",required_argument, 0, 't' }, { "degr-wfc-timeout",required_argument,0,'d'}, { "outdated-wfc-timeout",required_argument,0,'o'}, { "wait-after-sb",optional_argument,0,'w'}, { 0, 0, 0, 0 } }; struct option show_cmd_options[] = { { "show-defaults", no_argument, 0, 'D' }, { } }; #define F_CONFIG_CMD generic_config_cmd #define NO_PAYLOAD 0 #define F_GET_CMD(scmd) DRBD_ADM_GET_STATUS, NO_PAYLOAD, generic_get_cmd, \ .show_function = scmd struct drbd_cmd commands[] = { {"primary", CTX_MINOR, DRBD_ADM_PRIMARY, DRBD_NLA_SET_ROLE_PARMS, F_CONFIG_CMD, .ctx = &primary_cmd_ctx }, {"secondary", CTX_MINOR, DRBD_ADM_SECONDARY, NO_PAYLOAD, F_CONFIG_CMD }, {"attach", CTX_MINOR, DRBD_ADM_ATTACH, DRBD_NLA_DISK_CONF, F_CONFIG_CMD, .drbd_args = (struct drbd_argument[]) { { "lower_dev", T_backing_dev, conv_block_dev }, { "meta_data_dev", T_meta_dev, conv_block_dev }, { "meta_data_index", T_meta_dev_idx, conv_md_idx }, { } }, .ctx = &attach_cmd_ctx }, {"disk-options", CTX_MINOR, DRBD_ADM_CHG_DISK_OPTS, DRBD_NLA_DISK_CONF, F_CONFIG_CMD, .set_defaults = true, .ctx = &disk_options_ctx }, {"detach", CTX_MINOR, DRBD_ADM_DETACH, DRBD_NLA_DETACH_PARMS, F_CONFIG_CMD, .ctx = &detach_cmd_ctx }, {"connect", CTX_RESOURCE_AND_CONNECTION, DRBD_ADM_CONNECT, DRBD_NLA_NET_CONF, F_CONFIG_CMD, .ctx = &connect_cmd_ctx }, {"net-options", CTX_CONNECTION, 
DRBD_ADM_CHG_NET_OPTS, DRBD_NLA_NET_CONF, F_CONFIG_CMD, .set_defaults = true, .ctx = &net_options_ctx }, {"disconnect", CTX_CONNECTION, DRBD_ADM_DISCONNECT, DRBD_NLA_DISCONNECT_PARMS, F_CONFIG_CMD, .ctx = &disconnect_cmd_ctx }, {"resize", CTX_MINOR, DRBD_ADM_RESIZE, DRBD_NLA_RESIZE_PARMS, F_CONFIG_CMD, .ctx = &resize_cmd_ctx }, {"resource-options", CTX_RESOURCE, DRBD_ADM_RESOURCE_OPTS, DRBD_NLA_RESOURCE_OPTS, F_CONFIG_CMD, .set_defaults = true, .ctx = &resource_options_cmd_ctx }, {"new-current-uuid", CTX_MINOR, DRBD_ADM_NEW_C_UUID, DRBD_NLA_NEW_C_UUID_PARMS, F_CONFIG_CMD, .ctx = &new_current_uuid_cmd_ctx }, {"invalidate", CTX_MINOR, DRBD_ADM_INVALIDATE, NO_PAYLOAD, F_CONFIG_CMD, }, {"invalidate-remote", CTX_MINOR, DRBD_ADM_INVAL_PEER, NO_PAYLOAD, F_CONFIG_CMD, }, {"pause-sync", CTX_MINOR, DRBD_ADM_PAUSE_SYNC, NO_PAYLOAD, F_CONFIG_CMD, }, {"resume-sync", CTX_MINOR, DRBD_ADM_RESUME_SYNC, NO_PAYLOAD, F_CONFIG_CMD, }, {"suspend-io", CTX_MINOR, DRBD_ADM_SUSPEND_IO, NO_PAYLOAD, F_CONFIG_CMD, }, {"resume-io", CTX_MINOR, DRBD_ADM_RESUME_IO, NO_PAYLOAD, F_CONFIG_CMD, }, {"outdate", CTX_MINOR, DRBD_ADM_OUTDATE, NO_PAYLOAD, F_CONFIG_CMD, }, {"verify", CTX_MINOR, DRBD_ADM_START_OV, DRBD_NLA_START_OV_PARMS, F_CONFIG_CMD, .ctx = &verify_cmd_ctx }, {"down", CTX_RESOURCE, DRBD_ADM_DOWN, NO_PAYLOAD, down_cmd, .missing_ok = true, }, {"state", CTX_MINOR, F_GET_CMD(role_scmd) }, {"role", CTX_MINOR, F_GET_CMD(role_scmd) }, {"sh-status", CTX_MINOR | CTX_RESOURCE | CTX_ALL, F_GET_CMD(sh_status_scmd), .missing_ok = true, }, {"cstate", CTX_MINOR, F_GET_CMD(cstate_scmd) }, {"dstate", CTX_MINOR, F_GET_CMD(dstate_scmd) }, {"show-gi", CTX_MINOR, F_GET_CMD(uuids_scmd) }, {"get-gi", CTX_MINOR, F_GET_CMD(uuids_scmd) }, {"show", CTX_MINOR | CTX_RESOURCE | CTX_ALL, F_GET_CMD(show_scmd), .options = show_cmd_options }, {"check-resize", CTX_MINOR, F_GET_CMD(lk_bdev_scmd) }, {"events", CTX_MINOR | CTX_RESOURCE | CTX_ALL, F_GET_CMD(print_broadcast_events), .missing_ok = true, .continuous_poll = true, }, 
{"wait-connect", CTX_MINOR, F_GET_CMD(w_connected_state), .options = wait_cmds_options, .continuous_poll = true, .wait_for_connect_timeouts = true, }, {"wait-sync", CTX_MINOR, F_GET_CMD(w_synced_state), .options = wait_cmds_options, .continuous_poll = true, .wait_for_connect_timeouts = true, }, {"new-resource", CTX_RESOURCE, DRBD_ADM_NEW_RESOURCE, DRBD_NLA_RESOURCE_OPTS, F_CONFIG_CMD, .ctx = &resource_options_cmd_ctx }, /* only payload is resource name and volume number */ {"new-minor", 0, DRBD_ADM_NEW_MINOR, DRBD_NLA_CFG_CONTEXT, F_CONFIG_CMD, .drbd_args = (struct drbd_argument[]) { { "resource", T_ctx_resource_name, conv_resource_name }, { "minor", 0, conv_minor }, { "volume", T_ctx_volume, conv_volume }, { } }, .ctx = &new_minor_cmd_ctx }, {"del-minor", CTX_MINOR, DRBD_ADM_DEL_MINOR, NO_PAYLOAD, del_minor_cmd, }, {"del-resource", CTX_RESOURCE, DRBD_ADM_DEL_RESOURCE, NO_PAYLOAD, del_resource_cmd, } }; bool show_defaults; bool wait_after_split_brain; #define OTHER_ERROR 900 #define EM(C) [ C - ERR_CODE_BASE ] /* The EM(123) are used for old error messages. */ static const char *error_messages[] = { EM(NO_ERROR) = "No further Information available.", EM(ERR_LOCAL_ADDR) = "Local address(port) already in use.", EM(ERR_PEER_ADDR) = "Remote address(port) already in use.", EM(ERR_OPEN_DISK) = "Can not open backing device.", EM(ERR_OPEN_MD_DISK) = "Can not open meta device.", EM(106) = "Lower device already in use.", EM(ERR_DISK_NOT_BDEV) = "Lower device is not a block device.", EM(ERR_MD_NOT_BDEV) = "Meta device is not a block device.", EM(109) = "Open of lower device failed.", EM(110) = "Open of meta device failed.", EM(ERR_DISK_TOO_SMALL) = "Low.dev. smaller than requested DRBD-dev. size.", EM(ERR_MD_DISK_TOO_SMALL) = "Meta device too small.", EM(113) = "You have to use the disk command first.", EM(ERR_BDCLAIM_DISK) = "Lower device is already claimed. This usually means it is mounted.", EM(ERR_BDCLAIM_MD_DISK) = "Meta device is already claimed. 
This usually means it is mounted.", EM(ERR_MD_IDX_INVALID) = "Lower device / meta device / index combination invalid.", EM(117) = "Currently we only support devices up to 3.998TB.\n" "(up to 2TB in case you do not have CONFIG_LBD set)\n" "Contact office@linbit.com, if you need more.", EM(ERR_IO_MD_DISK) = "IO error(s) occurred during initial access to meta-data.\n", EM(ERR_MD_UNCLEAN) = "Unclean meta-data found.\nYou need to 'drbdadm apply-al res'\n", EM(ERR_MD_INVALID) = "No valid meta-data signature found.\n\n" "\t==> Use 'drbdadm create-md res' to initialize meta-data area. <==\n", EM(ERR_AUTH_ALG) = "The 'cram-hmac-alg' you specified is not known in " "the kernel. (Maybe you need to modprobe it, or modprobe hmac?)", EM(ERR_AUTH_ALG_ND) = "The 'cram-hmac-alg' you specified is not a digest.", EM(ERR_NOMEM) = "kmalloc() failed. Out of memory?", EM(ERR_DISCARD_IMPOSSIBLE) = "--discard-my-data not allowed when primary.", EM(ERR_DISK_CONFIGURED) = "Device is attached to a disk (use detach first)", EM(ERR_NET_CONFIGURED) = "Device has a net-config (use disconnect first)", EM(ERR_MANDATORY_TAG) = "UnknownMandatoryTag", EM(ERR_MINOR_INVALID) = "Device minor not allocated", EM(128) = "Resulting device state would be invalid", EM(ERR_INTR) = "Interrupted by Signal", EM(ERR_RESIZE_RESYNC) = "Resize not allowed during resync.", EM(ERR_NO_PRIMARY) = "Need one Primary node to resize.", EM(ERR_RESYNC_AFTER) = "The resync-after minor number is invalid", EM(ERR_RESYNC_AFTER_CYCLE) = "This would cause a resync-after dependency cycle", EM(ERR_PAUSE_IS_SET) = "Sync-pause flag is already set", EM(ERR_PAUSE_IS_CLEAR) = "Sync-pause flag is already cleared", EM(136) = "Disk state is lower than outdated", EM(ERR_PACKET_NR) = "Kernel does not know how to handle your request.\n" "Maybe API_VERSION mismatch?", EM(ERR_NO_DISK) = "Device does not have a disk-config", EM(ERR_NOT_PROTO_C) = "Protocol C required", EM(ERR_NOMEM_BITMAP) = "vmalloc() failed. 
Out of memory?", EM(ERR_INTEGRITY_ALG) = "The 'data-integrity-alg' you specified is not known in " "the kernel. (Maybe you need to modprobe it, or modprobe hmac?)", EM(ERR_INTEGRITY_ALG_ND) = "The 'data-integrity-alg' you specified is not a digest.", EM(ERR_CPU_MASK_PARSE) = "Invalid cpu-mask.", EM(ERR_VERIFY_ALG) = "VERIFYAlgNotAvail", EM(ERR_VERIFY_ALG_ND) = "VERIFYAlgNotDigest", EM(ERR_VERIFY_RUNNING) = "Can not change verify-alg while online verify runs", EM(ERR_DATA_NOT_CURRENT) = "Can only attach to the data we lost last (see kernel log).", EM(ERR_CONNECTED) = "Need to be StandAlone", EM(ERR_CSUMS_ALG) = "CSUMSAlgNotAvail", EM(ERR_CSUMS_ALG_ND) = "CSUMSAlgNotDigest", EM(ERR_CSUMS_RESYNC_RUNNING) = "Can not change csums-alg while resync is in progress", EM(ERR_PERM) = "Permission denied. CAP_SYS_ADMIN necessary", EM(ERR_NEED_APV_93) = "Protocol version 93 required to use --assume-clean", EM(ERR_STONITH_AND_PROT_A) = "Fencing policy resource-and-stonith only with prot B or C allowed", EM(ERR_CONG_NOT_PROTO_A) = "on-congestion policy pull-ahead only with prot A allowed", EM(ERR_PIC_AFTER_DEP) = "Sync-pause flag is already cleared.\n" "Note: Resync pause caused by a local resync-after dependency.", EM(ERR_PIC_PEER_DEP) = "Sync-pause flag is already cleared.\n" "Note: Resync pause caused by the peer node.", EM(ERR_RES_NOT_KNOWN) = "Unknown resource", EM(ERR_RES_IN_USE) = "Resource still in use (delete all minors first)", EM(ERR_MINOR_CONFIGURED) = "Minor still configured (down it first)", EM(ERR_MINOR_EXISTS) = "Minor exists already (delete it first)", EM(ERR_INVALID_REQUEST) = "Invalid configuration request", EM(ERR_NEED_APV_100) = "Prot version 100 required in order to change\n" "these network options while connected", EM(ERR_NEED_ALLOW_TWO_PRI) = "Can not clear allow_two_primaries as long as\n" "there a primaries on both sides", EM(ERR_MD_LAYOUT_CONNECTED) = "DRBD need to be connected for online MD layout change\n", EM(ERR_MD_LAYOUT_TOO_BIG) = "Resulting AL 
area too big\n",
	EM(ERR_MD_LAYOUT_TOO_SMALL) = "Resulting AL area too small\n",
	EM(ERR_MD_LAYOUT_NO_FIT) = "Resulting AL does not fit into available meta data space\n",
	EM(ERR_IMPLICIT_SHRINK) = "Implicit device shrinking not allowed. See kernel log.\n",
};
#define MAX_ERROR (sizeof(error_messages)/sizeof(*error_messages))
const char * error_to_string(int err_no)
{
	const unsigned int idx = err_no - ERR_CODE_BASE;
	if (idx >= MAX_ERROR)
		return "Unknown... maybe API_VERSION mismatch?";
	return error_messages[idx];
}
#undef MAX_ERROR

char *cmdname = NULL; /* "drbdsetup" for reporting in usage etc. */

/*
 * In CTX_MINOR, CTX_RESOURCE, CTX_ALL, objname and minor refer to the object
 * the command operates on.
 */
char *objname;
unsigned minor = -1U;
enum cfg_ctx_key context;

int debug_dump_argv = 0; /* enabled by setting DRBD_DEBUG_DUMP_ARGV in the environment */
int lock_fd;

struct genl_sock *drbd_sock = NULL;
int try_genl = 1;

struct genl_family drbd_genl_family = {
	.name = "drbd",
	.version = GENL_MAGIC_VERSION,
	.hdrsize = GENL_MAGIC_FAMILY_HDRSZ,
};

static int conv_block_dev(struct drbd_argument *ad, struct msg_buff *msg,
			  struct drbd_genlmsghdr *dhdr, char* arg)
{
	struct stat sb;
	int device_fd;
	int err;

	if ((device_fd = open(arg,O_RDWR))==-1) {
		PERROR("Can not open device '%s'", arg);
		return OTHER_ERROR;
	}

	if ( (err=fstat(device_fd, &sb)) ) {
		PERROR("fstat(%s) failed", arg);
		close(device_fd); /* do not leak the descriptor on the error path */
		return OTHER_ERROR;
	}

	if(!S_ISBLK(sb.st_mode)) {
		fprintf(stderr, "%s is not a block device!\n", arg);
		close(device_fd); /* do not leak the descriptor on the error path */
		return OTHER_ERROR;
	}

	close(device_fd);

	nla_put_string(msg, ad->nla_type, arg);

	return NO_ERROR;
}

static int conv_md_idx(struct drbd_argument *ad, struct msg_buff *msg,
		       struct drbd_genlmsghdr *dhdr, char* arg)
{
	int idx;

	if(!strcmp(arg,"internal"))
		idx = DRBD_MD_INDEX_FLEX_INT;
	else if(!strcmp(arg,"flexible"))
		idx = DRBD_MD_INDEX_FLEX_EXT;
	else
		idx = m_strtoll(arg,1);

	nla_put_u32(msg, ad->nla_type, idx);

	return NO_ERROR;
}

static int conv_resource_name(struct drbd_argument *ad, struct msg_buff *msg,
struct drbd_genlmsghdr *dhdr, char* arg) { /* additional sanity checks? */ nla_put_string(msg, T_ctx_resource_name, arg); return NO_ERROR; } static int conv_volume(struct drbd_argument *ad, struct msg_buff *msg, struct drbd_genlmsghdr *dhdr, char* arg) { unsigned vol = m_strtoll(arg,1); /* sanity check on vol < 256? */ nla_put_u32(msg, T_ctx_volume, vol); return NO_ERROR; } static int conv_minor(struct drbd_argument *ad, struct msg_buff *msg, struct drbd_genlmsghdr *dhdr, char* arg) { unsigned minor = dt_minor_of_dev(arg); if (minor == -1U) { fprintf(stderr, "Cannot determine minor device number of " "device '%s'\n", arg); return OTHER_ERROR; } dhdr->minor = minor; return NO_ERROR; } static void resolv6(char *name, struct sockaddr_in6 *addr) { struct addrinfo hints, *res, *tmp; int err; memset(&hints, 0, sizeof(hints)); hints.ai_family = AF_INET6; hints.ai_socktype = SOCK_STREAM; hints.ai_protocol = IPPROTO_TCP; err = getaddrinfo(name, 0, &hints, &res); if (err) { fprintf(stderr, "getaddrinfo %s: %s\n", name, gai_strerror(err)); exit(20); } /* Yes, it is a list. We use only the first result. 
The loop is only * there to document that we know it is a list */ for (tmp = res; tmp; tmp = tmp->ai_next) { memcpy(addr, tmp->ai_addr, sizeof(*addr)); break; } freeaddrinfo(res); if (0) { /* debug output */ char ip[INET6_ADDRSTRLEN]; inet_ntop(AF_INET6, &addr->sin6_addr, ip, sizeof(ip)); fprintf(stderr, "%s -> %02x %04x %08x %s %08x\n", name, addr->sin6_family, addr->sin6_port, addr->sin6_flowinfo, ip, addr->sin6_scope_id); } } static unsigned long resolv(const char* name) { unsigned long retval; if((retval = inet_addr(name)) == INADDR_NONE ) { struct hostent *he; he = gethostbyname(name); if (!he) { fprintf(stderr, "can not resolve the hostname: gethostbyname(%s): %s\n", name, hstrerror(h_errno)); exit(20); } retval = ((struct in_addr *)(he->h_addr_list[0]))->s_addr; } return retval; } static void split_ipv6_addr(char **address, int *port) { /* ipv6:[fe80::0234:5678:9abc:def1]:8000; */ char *b = strrchr(*address,']'); if (address[0][0] != '[' || b == NULL || (b[1] != ':' && b[1] != '\0')) { fprintf(stderr, "unexpected ipv6 format: %s\n", *address); exit(20); } *b = 0; *address += 1; /* skip '[' */ if (b[1] == ':') *port = m_strtoll(b+2,1); /* b+2: "]:" */ else *port = 7788; /* will we ever get rid of that default port? */ } static void split_address(char* text, int *af, char** address, int* port) { static struct { char* text; int af; } afs[] = { { "ipv4:", AF_INET }, { "ipv6:", AF_INET6 }, { "sdp:", AF_INET_SDP }, { "ssocks:", -1 }, }; unsigned int i; char *b; *af=AF_INET; *address = text; for (i=0; i 0) return af; fd = open(PROC_NET_AF_SSOCKS_FAMILY, O_RDONLY); if (fd < 0) fd = open(PROC_NET_AF_SCI_FAMILY, O_RDONLY); if (fd < 0) { if (warn_and_use_default) { fprintf(stderr, "open(" PROC_NET_AF_SSOCKS_FAMILY ") " "failed: %m\n WARNING: assuming AF_SSOCKS = 27. 
" "Socket creation may fail.\n"); af = 27; } return af; } c = read(fd, buf, sizeof(buf)-1); if (c > 0) { buf[c] = 0; if (buf[c-1] == '\n') buf[c-1] = 0; af = m_strtoll(buf,1); } else { if (warn_and_use_default) { fprintf(stderr, "read(" PROC_NET_AF_SSOCKS_FAMILY ") " "failed: %m\n WARNING: assuming AF_SSOCKS = 27. " "Socket creation may fail.\n"); af = 27; } } close(fd); return af; } static struct option *make_longoptions(struct drbd_cmd *cm) { static struct option buffer[42]; int i = 0; int primary_force_index = -1; int connect_tentative_index = -1; if (cm->ctx) { struct field_def *field; /* * Make sure to keep cm->ctx->fields first: we use the index * returned by getopt_long() to access cm->ctx->fields. */ for (field = cm->ctx->fields; field->name; field++) { assert(i < ARRAY_SIZE(buffer)); buffer[i].name = field->name; buffer[i].has_arg = field->argument_is_optional ? optional_argument : required_argument; buffer[i].flag = NULL; buffer[i].val = 0; if (!strcmp(cm->cmd, "primary") && !strcmp(field->name, "force")) primary_force_index = i; if (!strcmp(cm->cmd, "connect") && !strcmp(field->name, "tentative")) connect_tentative_index = i; i++; } assert(field - cm->ctx->fields == i); } if (primary_force_index != -1) { /* * For backward compatibility, add --overwrite-data-of-peer as * an alias to --force. */ assert(i < ARRAY_SIZE(buffer)); buffer[i] = buffer[primary_force_index]; buffer[i].name = "overwrite-data-of-peer"; buffer[i].val = 1000 + primary_force_index; i++; } if (connect_tentative_index != -1) { /* * For backward compatibility, add --dry-run as an alias to * --tentative. 
*/ assert(i < ARRAY_SIZE(buffer)); buffer[i] = buffer[connect_tentative_index]; buffer[i].name = "dry-run"; buffer[i].val = 1000 + connect_tentative_index; i++; } if (cm->set_defaults) { assert(i < ARRAY_SIZE(buffer)); buffer[i].name = "set-defaults"; buffer[i].has_arg = 0; buffer[i].flag = NULL; buffer[i].val = '('; i++; } assert(i < ARRAY_SIZE(buffer)); buffer[i].name = NULL; buffer[i].has_arg = 0; buffer[i].flag = NULL; buffer[i].val = 0; return buffer; } /* prepends global objname to output (if any) */ static int print_config_error(int err_no, char *desc) { int rv=0; if (err_no == NO_ERROR || err_no == SS_SUCCESS) return 0; if (err_no == OTHER_ERROR) { if (desc) fprintf(stderr,"%s: %s\n", objname, desc); return 20; } if ( ( err_no >= AFTER_LAST_ERR_CODE || err_no <= ERR_CODE_BASE ) && ( err_no > SS_CW_NO_NEED || err_no <= SS_AFTER_LAST_ERROR) ) { fprintf(stderr,"%s: Error code %d unknown.\n" "You should update the drbd userland tools.\n", objname, err_no); rv = 20; } else { if(err_no > ERR_CODE_BASE ) { fprintf(stderr,"%s: Failure: (%d) %s\n", objname, err_no, desc ?: error_to_string(err_no)); rv = 10; } else if (err_no == SS_UNKNOWN_ERROR) { fprintf(stderr,"%s: State change failed: (%d)" "unknown error.\n", objname, err_no); rv = 11; } else if (err_no > SS_TWO_PRIMARIES) { // Ignore SS_SUCCESS, SS_NOTHING_TO_DO, SS_CW_Success... } else { fprintf(stderr,"%s: State change failed: (%d) %s\n", objname, err_no, drbd_set_st_err_str(err_no)); if (err_no == SS_NO_UP_TO_DATE_DISK) { /* all available disks are inconsistent, * or I am consistent, but cannot outdate the peer. 
*/
				rv = 17;
			} else if (err_no == SS_LOWER_THAN_OUTDATED) {
				/* was inconsistent anyways */
				rv = 5;
			} else if (err_no == SS_NO_LOCAL_DISK) {
				/* Can not start resync, no local disks, try with drbdmeta */
				rv = 16;
			} else {
				rv = 11;
			}
		}
	}
	if (global_attrs[DRBD_NLA_CFG_REPLY] &&
	    global_attrs[DRBD_NLA_CFG_REPLY]->nla_len) {
		struct nlattr *nla;
		int rem;
		fprintf(stderr, "additional info from kernel:\n");
		nla_for_each_nested(nla, global_attrs[DRBD_NLA_CFG_REPLY], rem) {
			if (nla_type(nla) == __nla_type(T_info_text))
				fprintf(stderr, "%s\n", (char*)nla_data(nla));
		}
	}
	return rv;
}

static void warn_print_excess_args(int argc, char **argv, int i)
{
	fprintf(stderr, "Excess arguments:");
	for (; i < argc; i++)
		fprintf(stderr, " %s", argv[i]);
	/* terminate the line on stderr, where the rest of the warning went */
	fprintf(stderr, "\n");
}

static void dump_argv(int argc, char **argv, int first_non_option, int n_known_args)
{
	int i;
	if (!debug_dump_argv)
		return;
	fprintf(stderr, ",-- ARGV dump (optind %d, known_args %d, argc %u):\n",
		first_non_option, n_known_args, argc);
	for (i = 0; i < argc; i++) {
		if (i == 1)
			fprintf(stderr, "-- consumed options:");
		if (i == first_non_option)
			fprintf(stderr, "-- known args:");
		if (i == (first_non_option + n_known_args))
			fprintf(stderr, "-- unexpected args:");
		fprintf(stderr, "| %2u: %s\n", i, argv[i]);
	}
	fprintf(stderr, "`--\n");
}

int drbd_tla_parse(struct nlmsghdr *nlh)
{
	return nla_parse(global_attrs, ARRAY_SIZE(drbd_tla_nl_policy)-1,
		nlmsg_attrdata(nlh, GENL_HDRLEN + drbd_genl_family.hdrsize),
		nlmsg_attrlen(nlh, GENL_HDRLEN + drbd_genl_family.hdrsize),
		drbd_tla_nl_policy);
}

#define ASSERT(exp) if (!(exp)) \
		fprintf(stderr,"ASSERT( " #exp " ) in %s:%d\n", __FILE__,__LINE__);

static int _generic_config_cmd(struct drbd_cmd *cm, int argc,
			       char **argv, int quiet)
{
	struct drbd_argument *ad = cm->drbd_args;
	struct nlattr *nla;
	struct option *lo;
	int c, i;
	int n_args;
	int rv = NO_ERROR;
	char *desc = NULL; /* error description from kernel reply message */

	struct drbd_genlmsghdr *dhdr;
	struct msg_buff *smsg;
	struct iovec iov;

	/* pre
allocate request message and reply buffer */ iov.iov_len = DEFAULT_MSG_SIZE; iov.iov_base = malloc(iov.iov_len); smsg = msg_new(DEFAULT_MSG_SIZE); if (!smsg || !iov.iov_base) { desc = "could not allocate netlink messages"; rv = OTHER_ERROR; goto error; } dhdr = genlmsg_put(smsg, &drbd_genl_family, 0, cm->cmd_id); dhdr->minor = -1; dhdr->flags = 0; i = 1; if (context & (CTX_RESOURCE | CTX_CONNECTION)) { nla = nla_nest_start(smsg, DRBD_NLA_CFG_CONTEXT); if (context & CTX_RESOURCE) nla_put_string(smsg, T_ctx_resource_name, argv[i++]); if (context & CTX_CONNECTION) { nla_put_address(smsg, T_ctx_my_addr, argv[i++]); nla_put_address(smsg, T_ctx_peer_addr, argv[i++]); } nla_nest_end(smsg, nla); } else if (context & CTX_MINOR) { dhdr->minor = minor; i++; } nla = NULL; for (ad = cm->drbd_args; ad && ad->name; i++) { if (argc < i + 1) { fprintf(stderr, "Missing argument '%s'\n", ad->name); print_command_usage(cm, FULL); rv = OTHER_ERROR; goto error; } if (!nla) { assert (cm->tla_id != NO_PAYLOAD); nla = nla_nest_start(smsg, cm->tla_id); } rv = ad->convert_function(ad, smsg, dhdr, argv[i]); if (rv != NO_ERROR) goto error; ad++; } n_args = i - 1; /* command name "doesn't count" here */ /* dhdr->minor may have been set by one of the convert functions. */ minor = dhdr->minor; lo = make_longoptions(cm); for (;;) { int idx; c = getopt_long(argc, argv, "(", lo, &idx); if (c == -1) break; if (c >= 1000) { /* This is a field alias. 
*/ idx = c - 1000; c = 0; } if (c == 0) { struct field_def *field = &cm->ctx->fields[idx]; assert (field->name == lo[idx].name); if (!nla) { assert (cm->tla_id != NO_PAYLOAD); nla = nla_nest_start(smsg, cm->tla_id); } if (!field->put(cm->ctx, field, smsg, optarg)) { rv = OTHER_ERROR; goto error; } } else if (c == '(') dhdr->flags |= DRBD_GENL_F_SET_DEFAULTS; else { rv = OTHER_ERROR; goto error; } } /* argc should be cmd + n options + n args; * if it is more, we did not understand some */ if (n_args + optind < argc) { warn_print_excess_args(argc, argv, optind + n_args); rv = OTHER_ERROR; goto error; } dump_argv(argc, argv, optind, i - 1); if (rv == NO_ERROR) { int received; if (nla) nla_nest_end(smsg, nla); if (genl_send(drbd_sock, smsg)) { desc = "error sending config command"; rv = OTHER_ERROR; goto error; } retry_recv: /* reduce timeout! limit retries */ received = genl_recv_msgs(drbd_sock, &iov, &desc, 120000); if (received > 0) { struct nlmsghdr *nlh = (struct nlmsghdr*)iov.iov_base; struct drbd_genlmsghdr *dh = genlmsg_data(nlmsg_data(nlh)); ASSERT(dh->minor == minor); rv = dh->ret_code; if (rv == ERR_RES_NOT_KNOWN && cm->missing_ok) rv = NO_ERROR; drbd_tla_parse(nlh); } else { if (received == -E_RCV_ERROR_REPLY && !errno) goto retry_recv; if (!desc) desc = "error receiving config reply"; rv = OTHER_ERROR; } } error: msg_free(smsg); if (!quiet) rv = print_config_error(rv, desc); free(iov.iov_base); return rv; } static int generic_config_cmd(struct drbd_cmd *cm, int argc, char **argv) { return _generic_config_cmd(cm, argc, argv, 0); } static int del_minor_cmd(struct drbd_cmd *cm, int argc, char **argv) { int rv; rv = generic_config_cmd(cm, argc, argv); if (!rv) unregister_minor(minor); return rv; } static int del_resource_cmd(struct drbd_cmd *cm, int argc, char **argv) { int rv; rv = generic_config_cmd(cm, argc, argv); if (!rv) unregister_resource(objname); return rv; } static struct drbd_cmd *find_cmd_by_name(const char *name) { unsigned int i; for (i = 0; i < 
ARRAY_SIZE(commands); i++) { if (!strcmp(name, commands[i].cmd)) { return commands + i; } } return NULL; } static void print_options(const char *cmd_name, const char *sect_name) { struct drbd_cmd *cmd; struct field_def *field; int opened = 0; cmd = find_cmd_by_name(cmd_name); if (!cmd) { fprintf(stderr, "%s internal error, no such cmd %s\n", cmdname, cmd_name); abort(); } if (!global_attrs[cmd->tla_id]) return; if (drbd_nla_parse_nested(nested_attr_tb, cmd->ctx->nla_policy_size - 1, global_attrs[cmd->tla_id], cmd->ctx->nla_policy)) { fprintf(stderr, "nla_policy violation for %s payload!\n", sect_name); /* still, print those that validated ok */ } if (!cmd->ctx) return; for (field = cmd->ctx->fields; field->name; field++) { struct nlattr *nlattr; const char *str; bool is_default; nlattr = ntb(field->nla_type); if (!nlattr) continue; if (!opened) { opened=1; printI("%s {\n",sect_name); ++indent; } str = field->get(cmd->ctx, field, nlattr); is_default = field->is_default(field, str); if (is_default && !show_defaults) continue; if (field->needs_double_quoting) str = double_quote_string(str); printI("%-16s\t%s;",field->name, str); if (field->unit || is_default) { printf(" # "); if (field->unit) printf("%s", field->unit); if (field->unit && is_default) printf(", "); if (is_default) printf("default"); } printf("\n"); } if(opened) { --indent; printI("}\n"); } } struct choose_timo_ctx { unsigned minor; struct msg_buff *smsg; struct iovec *iov; int timeout; int wfc_timeout; int degr_wfc_timeout; int outdated_wfc_timeout; }; int choose_timeout(struct choose_timo_ctx *ctx) { char *desc = NULL; struct drbd_genlmsghdr *dhdr; int rr; if (0 < ctx->wfc_timeout && (ctx->wfc_timeout < ctx->degr_wfc_timeout || ctx->degr_wfc_timeout == 0)) { ctx->degr_wfc_timeout = ctx->wfc_timeout; fprintf(stderr, "degr-wfc-timeout has to be shorter than wfc-timeout\n" "degr-wfc-timeout implicitly set to wfc-timeout (%ds)\n", ctx->degr_wfc_timeout); } if (0 < ctx->degr_wfc_timeout && 
	    (ctx->degr_wfc_timeout < ctx->outdated_wfc_timeout || ctx->outdated_wfc_timeout == 0)) {
		/* clamp to degr-wfc-timeout, which is what the message below reports */
		ctx->outdated_wfc_timeout = ctx->degr_wfc_timeout;
		fprintf(stderr, "outdated-wfc-timeout has to be shorter than degr-wfc-timeout\n"
				"outdated-wfc-timeout implicitly set to degr-wfc-timeout (%ds)\n",
				ctx->degr_wfc_timeout);
	}
	dhdr = genlmsg_put(ctx->smsg, &drbd_genl_family, 0, DRBD_ADM_GET_TIMEOUT_TYPE);
	dhdr->minor = ctx->minor;
	dhdr->flags = 0;

	if (genl_send(drbd_sock, ctx->smsg)) {
		desc = "error sending config command";
		goto error;
	}

	rr = genl_recv_msgs(drbd_sock, ctx->iov, &desc, 120000);
	if (rr > 0) {
		struct nlmsghdr *nlh = (struct nlmsghdr*)ctx->iov->iov_base;
		struct genl_info info = {
			.seq = nlh->nlmsg_seq,
			.nlhdr = nlh,
			.genlhdr = nlmsg_data(nlh),
			.userhdr = genlmsg_data(nlmsg_data(nlh)),
			.attrs = global_attrs,
		};
		struct drbd_genlmsghdr *dh = info.userhdr;
		struct timeout_parms parms;
		ASSERT(dh->minor == ctx->minor);
		rr = dh->ret_code;
		if (rr == ERR_MINOR_INVALID) {
			desc = "minor not available";
			goto error;
		}
		if (rr != NO_ERROR)
			goto error;
		if (drbd_tla_parse(nlh)
		|| timeout_parms_from_attrs(&parms, &info)) {
			desc = "reply did not validate - "
				"do you need to upgrade your userland tools?";
			goto error;
		}
		rr = parms.timeout_type;
		ctx->timeout =
			(rr == UT_DEGRADED) ? ctx->degr_wfc_timeout :
			(rr == UT_PEER_OUTDATED) ?
			ctx->outdated_wfc_timeout : ctx->wfc_timeout;
		return 0;
	}
error:
	if (!desc)
		desc = "error receiving netlink reply";
	fprintf(stderr, "error determining which timeout to use: %s\n", desc);
	return 20;
}

#include <sys/utsname.h>
static bool kernel_older_than(int version, int patchlevel, int sublevel)
{
	struct utsname utsname;
	char *rel;
	int l;

	if (uname(&utsname) != 0)
		return false;
	rel = utsname.release;
	l = strtol(rel, &rel, 10);
	if (l > version)
		return false;
	else if (l < version || *rel == 0)
		return true;
	l = strtol(rel + 1, &rel, 10);
	if (l > patchlevel)
		return false;
	else if (l < patchlevel || *rel == 0)
		return true;
	l = strtol(rel + 1, &rel, 10);
	if (l >= sublevel)
		return false;
	return true;
}

static int generic_get_cmd(struct drbd_cmd *cm, int argc, char **argv)
{
	char *desc = NULL;
	struct drbd_genlmsghdr *dhdr;
	struct msg_buff *smsg;
	struct iovec iov;
	struct choose_timo_ctx timeo_ctx = {
		.wfc_timeout = DRBD_WFC_TIMEOUT_DEF,
		.degr_wfc_timeout = DRBD_DEGR_WFC_TIMEOUT_DEF,
		.outdated_wfc_timeout = DRBD_OUTDATED_WFC_TIMEOUT_DEF,
	};
	int timeout_ms = -1; /* "infinite" */
	int flags;
	int rv = NO_ERROR;
	int err = 0;
	int n_args;

	/* pre allocate request message and reply buffer */
	iov.iov_len = 8192;
	iov.iov_base = malloc(iov.iov_len);
	smsg = msg_new(DEFAULT_MSG_SIZE);
	if (!smsg || !iov.iov_base) {
		desc = "could not allocate netlink messages";
		rv = OTHER_ERROR;
		goto out;
	}

	struct option *options = cm->options;
	if (!options) {
		static struct option none[] = { { } };
		options = none;
	}
	const char *opts = make_optstring(options);
	int c;

	for(;;) {
		c = getopt_long(argc, argv, opts, options, 0);
		if (c == -1)
			break;
		switch(c) {
		default:
		case '?':
			return 20;
		case 't':
			timeo_ctx.wfc_timeout = m_strtoll(optarg, 1);
			if(DRBD_WFC_TIMEOUT_MIN > timeo_ctx.wfc_timeout ||
			   timeo_ctx.wfc_timeout > DRBD_WFC_TIMEOUT_MAX) {
				fprintf(stderr, "wfc_timeout => %d"
					" out of range [%d..%d]\n",
					timeo_ctx.wfc_timeout,
					DRBD_WFC_TIMEOUT_MIN,
					DRBD_WFC_TIMEOUT_MAX);
				return 20;
			}
			break;
		case 'd':
			timeo_ctx.degr_wfc_timeout =
m_strtoll(optarg, 1); if(DRBD_DEGR_WFC_TIMEOUT_MIN > timeo_ctx.degr_wfc_timeout || timeo_ctx.degr_wfc_timeout > DRBD_DEGR_WFC_TIMEOUT_MAX) { fprintf(stderr, "degr_wfc_timeout => %d" " out of range [%d..%d]\n", timeo_ctx.degr_wfc_timeout, DRBD_DEGR_WFC_TIMEOUT_MIN, DRBD_DEGR_WFC_TIMEOUT_MAX); return 20; } break; case 'o': timeo_ctx.outdated_wfc_timeout = m_strtoll(optarg, 1); if(DRBD_OUTDATED_WFC_TIMEOUT_MIN > timeo_ctx.outdated_wfc_timeout || timeo_ctx.outdated_wfc_timeout > DRBD_OUTDATED_WFC_TIMEOUT_MAX) { fprintf(stderr, "outdated_wfc_timeout => %d" " out of range [%d..%d]\n", timeo_ctx.outdated_wfc_timeout, DRBD_OUTDATED_WFC_TIMEOUT_MIN, DRBD_OUTDATED_WFC_TIMEOUT_MAX); return 20; } break; case 'w': if (!optarg || !strcmp(optarg, "yes")) wait_after_split_brain = true; break; case 'D': show_defaults = true; } } n_args = 1; if (n_args + optind < argc) { warn_print_excess_args(argc, argv, optind + n_args); return 20; } dump_argv(argc, argv, optind, 0); /* otherwise we need to change handling/parsing * of expected replies */ ASSERT(cm->cmd_id == DRBD_ADM_GET_STATUS); if (cm->wait_for_connect_timeouts) { /* wait-connect, wait-sync */ int rr; timeo_ctx.minor = minor; timeo_ctx.smsg = smsg; timeo_ctx.iov = &iov; rr = choose_timeout(&timeo_ctx); if (rr) return rr; if (timeo_ctx.timeout) timeout_ms = timeo_ctx.timeout * 1000; /* rewind send message buffer */ smsg->tail = smsg->data; } else if (!cm->continuous_poll) /* normal "get" request, or "show" */ timeout_ms = 120000; /* else: events command, defaults to "infinity" */ if (cm->continuous_poll) { if (genl_join_mc_group(drbd_sock, "events") && !kernel_older_than(2, 6, 23)) { fprintf(stderr, "unable to join drbd events multicast group\n"); return 20; } } flags = 0; if (minor == -1U) flags |= NLM_F_DUMP; dhdr = genlmsg_put(smsg, &drbd_genl_family, flags, cm->cmd_id); dhdr->minor = minor; dhdr->flags = 0; if (minor == -1U && strcmp(objname, "all")) { /* Restrict the dump to a single resource. 
*/ struct nlattr *nla; nla = nla_nest_start(smsg, DRBD_NLA_CFG_CONTEXT); nla_put_string(smsg, T_ctx_resource_name, objname); nla_nest_end(smsg, nla); } if (genl_send(drbd_sock, smsg)) { desc = "error sending config command"; rv = OTHER_ERROR; goto out2; } /* disable sequence number check in genl_recv_msgs */ drbd_sock->s_seq_expect = 0; for (;;) { int received, rem; struct nlmsghdr *nlh = (struct nlmsghdr *)iov.iov_base; struct timeval before; if (timeout_ms != -1) gettimeofday(&before, NULL); received = genl_recv_msgs(drbd_sock, &iov, &desc, timeout_ms); if (received < 0) { switch(received) { case E_RCV_TIMEDOUT: err = 5; goto out2; case -E_RCV_FAILED: err = 20; goto out2; case -E_RCV_NO_SOURCE_ADDR: continue; /* ignore invalid message */ case -E_RCV_SEQ_MISMATCH: /* we disabled it, so it should not happen */ err = 20; goto out2; case -E_RCV_MSG_TRUNC: continue; case -E_RCV_UNEXPECTED_TYPE: continue; case -E_RCV_NLMSG_DONE: if (cm->continuous_poll) continue; err = cm->show_function(cm, NULL); if (err) goto out2; err = -*(int*)nlmsg_data(nlh); if (err && (err != ENODEV || !cm->missing_ok)) { fprintf(stderr, "received netlink error reply: %s\n", strerror(err)); err = 20; } goto out2; case -E_RCV_ERROR_REPLY: if (!errno) /* positive ACK message */ continue; if (!desc) desc = strerror(errno); fprintf(stderr, "received netlink error reply: %s\n", desc); err = 20; goto out2; default: if (!desc) desc = "error receiving config reply"; err = 20; goto out2; } } if (timeout_ms != -1) { struct timeval after; gettimeofday(&after, NULL); timeout_ms -= (after.tv_sec - before.tv_sec) * 1000 + (after.tv_usec - before.tv_usec) / 1000; if (timeout_ms <= 0) { err = 5; goto out2; } } /* There may be multiple messages in one datagram (for dump replies). 
*/ nlmsg_for_each_msg(nlh, nlh, received, rem) { struct drbd_genlmsghdr *dh = genlmsg_data(nlmsg_data(nlh)); struct genl_info info = (struct genl_info){ .seq = nlh->nlmsg_seq, .nlhdr = nlh, .genlhdr = nlmsg_data(nlh), .userhdr = genlmsg_data(nlmsg_data(nlh)), .attrs = global_attrs, }; /* parse early, otherwise drbd_cfg_context_from_attrs * can not work */ if (drbd_tla_parse(nlh)) { /* FIXME * should continuous_poll continue? */ desc = "reply did not validate - " "do you need to upgrade your userland tools?"; rv = OTHER_ERROR; goto out2; } if (cm->continuous_poll) { /* * We will receive all events and have to * filter for what we want ourself. */ /* FIXME * Do we want to ignore broadcasts until the * initial get/dump requests is done? */ if (minor != -1U) { /* Assert that, for an unicast reply, * reply minor matches request minor. * "unsolicited" kernel broadcasts are "pid=0" (netlink "port id") * (and expected to be genlmsghdr.cmd == DRBD_EVENT) */ if (minor != dh->minor) { if (info.nlhdr->nlmsg_pid != 0) dbg(1, "received netlink packet for minor %u, while expecting %u\n", dh->minor, minor); continue; } } else if (strcmp(objname, "all")) { struct drbd_cfg_context ctx = { .ctx_volume = -1U }; drbd_cfg_context_from_attrs(&ctx, &info); if (ctx.ctx_volume == -1U || strcmp(objname, ctx.ctx_resource_name)) continue; } } rv = dh->ret_code; if (rv == ERR_MINOR_INVALID && cm->missing_ok) rv = NO_ERROR; if (rv != NO_ERROR) goto out2; err = cm->show_function(cm, &info); if (err) { if (err < 0) err = 0; goto out2; } } if (!cm->continuous_poll && !(flags & NLM_F_DUMP)) { /* There will be no more reply packets. */ err = cm->show_function(cm, NULL); goto out2; } } out2: msg_free(smsg); out: if (rv != NO_ERROR) err = print_config_error(rv, desc); free(iov.iov_base); return err; } static char *af_to_str(int af) { if (af == AF_INET) return "ipv4"; else if (af == AF_INET6) return "ipv6"; /* AF_SSOCKS typically is 27, the same as AF_INET_SDP. 
* But with warn_and_use_default = 0, it will stay at -1 if not available. * Just keep the test on ssocks before the one on SDP (which is hard-coded), * and all should be fine. */ else if (af == get_af_ssocks(0)) return "ssocks"; else if (af == AF_INET_SDP) return "sdp"; else return "unknown"; } static void show_address(void* address, int addr_len) { union { struct sockaddr addr; struct sockaddr_in addr4; struct sockaddr_in6 addr6; } a; char buffer[INET6_ADDRSTRLEN]; /* avoid alignment issues on certain platforms (e.g. armel) */ memset(&a, 0, sizeof(a)); memcpy(&a.addr, address, addr_len); if (a.addr.sa_family == AF_INET || a.addr.sa_family == get_af_ssocks(0) || a.addr.sa_family == AF_INET_SDP) { printI("address\t\t\t%s %s:%d;\n", af_to_str(a.addr4.sin_family), inet_ntoa(a.addr4.sin_addr), ntohs(a.addr4.sin_port)); } else if (a.addr.sa_family == AF_INET6) { printI("address\t\t\t%s [%s]:%d;\n", af_to_str(a.addr6.sin6_family), inet_ntop(a.addr6.sin6_family, &a.addr6.sin6_addr, buffer, INET6_ADDRSTRLEN), ntohs(a.addr6.sin6_port)); } else { printI("address\t\t\t[unknown af=%d, len=%d]\n", a.addr.sa_family, addr_len); } } struct minors_list { struct minors_list *next; unsigned minor; }; struct minors_list *__remembered_minors; static int remember_minor(struct drbd_cmd *cmd, struct genl_info *info) { struct drbd_cfg_context cfg = { .ctx_volume = -1U }; if (!info) return 0; drbd_cfg_context_from_attrs(&cfg, info); if (cfg.ctx_volume != -1U) { unsigned minor = ((struct drbd_genlmsghdr*)(info->userhdr))->minor; struct minors_list *m = malloc(sizeof(*m)); m->next = __remembered_minors; m->minor = minor; __remembered_minors = m; } return 0; } static void free_minors(struct minors_list *minors) { while (minors) { struct minors_list *m = minors; minors = minors->next; free(m); } } /* * Expects objname to be set to the resource name or "all". 
*/
static struct minors_list *enumerate_minors(void)
{
	struct drbd_cmd cmd = {
		.cmd_id = DRBD_ADM_GET_STATUS,
		.show_function = remember_minor,
		.missing_ok = true,
	};
	struct minors_list *m;
	int err;

	err = generic_get_cmd(&cmd, 0, NULL);
	m = __remembered_minors;
	__remembered_minors = NULL;
	if (err) {
		free_minors(m);
		m = NULL;
	}
	return m;
}

/* may be called for a "show" of a single minor device.
 * prints all available configuration information in that case.
 *
 * may also be called iteratively for a "show-all", which should try to not
 * print redundant configuration information for the same resource (tconn).
 */
static int show_scmd(struct drbd_cmd *cm, struct genl_info *info)
{
	/* FIXME need some define for max len here */
	static char last_ctx_resource_name[128];
	static int call_count;

	struct drbd_cfg_context cfg = { .ctx_volume = -1U };
	struct disk_conf dc = { .disk_size = 0, };
	struct net_conf nc = { .timeout = 0, };

	if (!info) {
		if (call_count) {
			--indent;
			printI("}\n"); /* close _this_host */
			--indent;
			printI("}\n"); /* close resource */
		}
		fflush(stdout);
		return 0;
	}
	call_count++;

	/* FIXME: Is the following check needed?
*/ if (!global_attrs[DRBD_NLA_CFG_CONTEXT]) dbg(1, "unexpected packet, configuration context missing!\n"); drbd_cfg_context_from_attrs(&cfg, info); disk_conf_from_attrs(&dc, info); net_conf_from_attrs(&nc, info); if (strncmp(last_ctx_resource_name, cfg.ctx_resource_name, sizeof(last_ctx_resource_name))) { if (strncmp(last_ctx_resource_name, "", sizeof(last_ctx_resource_name))) { --indent; printI("}\n"); /* close _this_host */ --indent; printI("}\n\n"); } strncpy(last_ctx_resource_name, cfg.ctx_resource_name, sizeof(last_ctx_resource_name)); printI("resource %s {\n", cfg.ctx_resource_name); ++indent; print_options("resource-options", "options"); print_options("net-options", "net"); if (cfg.ctx_peer_addr_len) { printI("_remote_host {\n"); ++indent; show_address(cfg.ctx_peer_addr, cfg.ctx_peer_addr_len); --indent; printI("}\n"); } printI("_this_host {\n"); ++indent; if (cfg.ctx_my_addr_len) show_address(cfg.ctx_my_addr, cfg.ctx_my_addr_len); } if (cfg.ctx_volume != -1U) { unsigned minor = ((struct drbd_genlmsghdr*)(info->userhdr))->minor; printI("volume %d {\n", cfg.ctx_volume); ++indent; printI("device\t\t\tminor %d;\n", minor); if (global_attrs[DRBD_NLA_DISK_CONF]) { if (dc.backing_dev[0]) { printI("disk\t\t\t\"%s\";\n", dc.backing_dev); printI("meta-disk\t\t\t"); switch(dc.meta_dev_idx) { case DRBD_MD_INDEX_INTERNAL: case DRBD_MD_INDEX_FLEX_INT: printf("internal;\n"); break; case DRBD_MD_INDEX_FLEX_EXT: printf("%s;\n", double_quote_string(dc.meta_dev)); break; default: printf("%s [ %d ];\n", double_quote_string(dc.meta_dev), dc.meta_dev_idx); } } } print_options("attach", "disk"); --indent; printI("}\n"); /* close volume */ } return 0; } static int lk_bdev_scmd(struct drbd_cmd *cm, struct genl_info *info) { unsigned minor; struct disk_conf dc = { .disk_size = 0, }; struct bdev_info bd = { 0, }; uint64_t bd_size; int fd; if (!info) return 0; minor = ((struct drbd_genlmsghdr*)(info->userhdr))->minor; disk_conf_from_attrs(&dc, info); if (!dc.backing_dev) { 
fprintf(stderr, "Has no disk config, try with drbdmeta.\n"); return 1; } if (dc.meta_dev_idx >= 0 || dc.meta_dev_idx == DRBD_MD_INDEX_FLEX_EXT) { lk_bdev_delete(minor); return 0; } fd = open(dc.backing_dev, O_RDONLY); if (fd == -1) { fprintf(stderr, "Could not open %s: %m.\n", dc.backing_dev); return 1; } bd_size = bdev_size(fd); close(fd); if (lk_bdev_load(minor, &bd) == 0 && bd.bd_size == bd_size && bd.bd_name && !strcmp(bd.bd_name, dc.backing_dev)) return 0; /* nothing changed. */ bd.bd_size = bd_size; bd.bd_name = dc.backing_dev; lk_bdev_save(minor, &bd); return 0; } static int sh_status_scmd(struct drbd_cmd *cm __attribute((unused)), struct genl_info *info) { unsigned minor; struct drbd_cfg_context cfg = { .ctx_volume = -1U }; struct state_info si = { .current_state = 0, }; union drbd_state state; int available = 0; if (!info) return 0; minor = ((struct drbd_genlmsghdr*)(info->userhdr))->minor; /* variable prefix; maybe rather make that a command line parameter? * or use "drbd_sh_status"? */ #define _P "" printf("%s_minor=%u\n", _P, minor); drbd_cfg_context_from_attrs(&cfg, info); if (cfg.ctx_resource_name) printf("%s_res_name=%s\n", _P, shell_escape(cfg.ctx_resource_name)); printf("%s_volume=%d\n", _P, cfg.ctx_volume); if (state_info_from_attrs(&si, info) == 0) available = 1; state.i = si.current_state; if (state.conn == C_STANDALONE && state.disk == D_DISKLESS) { printf("%s_known=%s\n\n", _P, available ? 
"Unconfigured" : "NA # not available or not yet created"); printf("%s_cstate=Unconfigured\n", _P); printf("%s_role=\n", _P); printf("%s_peer=\n", _P); printf("%s_disk=\n", _P); printf("%s_pdsk=\n", _P); printf("%s_flags_susp=\n", _P); printf("%s_flags_aftr_isp=\n", _P); printf("%s_flags_peer_isp=\n", _P); printf("%s_flags_user_isp=\n", _P); printf("%s_resynced_percent=\n", _P); } else { printf( "%s_known=Configured\n\n" /* connection state */ "%s_cstate=%s\n" /* role */ "%s_role=%s\n" "%s_peer=%s\n" /* disk state */ "%s_disk=%s\n" "%s_pdsk=%s\n\n", _P, _P, drbd_conn_str(state.conn), _P, drbd_role_str(state.role), _P, drbd_role_str(state.peer), _P, drbd_disk_str(state.disk), _P, drbd_disk_str(state.pdsk)); /* io suspended ? */ printf("%s_flags_susp=%s\n", _P, state.susp ? "1" : ""); /* reason why sync is paused */ printf("%s_flags_aftr_isp=%s\n", _P, state.aftr_isp ? "1" : ""); printf("%s_flags_peer_isp=%s\n", _P, state.peer_isp ? "1" : ""); printf("%s_flags_user_isp=%s\n\n", _P, state.user_isp ? "1" : ""); printf("%s_resynced_percent=", _P); if (ntb(T_bits_rs_total)) { uint32_t shift = si.bits_rs_total >= (1ULL << 32) ? 16 : 10; uint64_t left = (si.bits_oos - si.bits_rs_failed) >> shift; uint64_t total = 1UL + (si.bits_rs_total >> shift); uint64_t tmp = 1000UL - left * 1000UL/total; unsigned synced = tmp; printf("%i.%i\n", synced / 10, synced % 10); /* what else? everything available! */ } else printf("\n"); } printf("\n%s_sh_status_process\n\n\n", _P); fflush(stdout); return 0; #undef _P } static int role_scmd(struct drbd_cmd *cm __attribute((unused)), struct genl_info *info) { union drbd_state state = { .i = 0 }; if (!strcmp(cm->cmd, "state")) { fprintf(stderr, "'%s ... state' is deprecated, use '%s ... 
role' instead.\n", cmdname, cmdname); } if (!info) return 0; if (global_attrs[DRBD_NLA_STATE_INFO]) { drbd_nla_parse_nested(nested_attr_tb, ARRAY_SIZE(state_info_nl_policy) - 1, global_attrs[DRBD_NLA_STATE_INFO], state_info_nl_policy); if (ntb(T_current_state)) state.i = nla_get_u32(ntb(T_current_state)); } if (state.conn == C_STANDALONE && state.disk == D_DISKLESS) { printf("Unconfigured\n"); } else { printf("%s/%s\n",drbd_role_str(state.role),drbd_role_str(state.peer)); } return 0; } static int cstate_scmd(struct drbd_cmd *cm __attribute((unused)), struct genl_info *info) { union drbd_state state = { .i = 0 }; if (!info) return 0; if (global_attrs[DRBD_NLA_STATE_INFO]) { drbd_nla_parse_nested(nested_attr_tb, ARRAY_SIZE(state_info_nl_policy) - 1, global_attrs[DRBD_NLA_STATE_INFO], state_info_nl_policy); if (ntb(T_current_state)) state.i = nla_get_u32(ntb(T_current_state)); } if (state.conn == C_STANDALONE && state.disk == D_DISKLESS) { printf("Unconfigured\n"); } else { printf("%s\n",drbd_conn_str(state.conn)); } return 0; } static int dstate_scmd(struct drbd_cmd *cm __attribute((unused)), struct genl_info *info) { union drbd_state state = { .i = 0 }; if (!info) return 0; if (global_attrs[DRBD_NLA_STATE_INFO]) { drbd_nla_parse_nested(nested_attr_tb, ARRAY_SIZE(state_info_nl_policy)-1, global_attrs[DRBD_NLA_STATE_INFO], state_info_nl_policy); if (ntb(T_current_state)) state.i = nla_get_u32(ntb(T_current_state)); } if ( state.conn == C_STANDALONE && state.disk == D_DISKLESS) { printf("Unconfigured\n"); } else { printf("%s/%s\n",drbd_disk_str(state.disk),drbd_disk_str(state.pdsk)); } return 0; } static int uuids_scmd(struct drbd_cmd *cm, struct genl_info *info) { union drbd_state state = { .i = 0 }; uint64_t ed_uuid; uint64_t *uuids = NULL; int flags = flags; if (!info) return 0; if (global_attrs[DRBD_NLA_STATE_INFO]) { drbd_nla_parse_nested(nested_attr_tb, ARRAY_SIZE(state_info_nl_policy)-1, global_attrs[DRBD_NLA_STATE_INFO], state_info_nl_policy); if 
(ntb(T_current_state)) state.i = nla_get_u32(ntb(T_current_state)); if (ntb(T_uuids)) uuids = nla_data(ntb(T_uuids)); if (ntb(T_disk_flags)) flags = nla_get_u32(ntb(T_disk_flags)); if (ntb(T_ed_uuid)) ed_uuid = nla_get_u64(ntb(T_ed_uuid)); } if (state.conn == C_STANDALONE && state.disk == D_DISKLESS) { fprintf(stderr, "Device is unconfigured\n"); return 1; } if (state.disk == D_DISKLESS) { /* XXX we could print the ed_uuid anyways: */ if (0) printf(X64(016)"\n", ed_uuid); fprintf(stderr, "Device has no disk\n"); return 1; } if (uuids) { if(!strcmp(cm->cmd,"show-gi")) { dt_pretty_print_uuids(uuids,flags); } else if(!strcmp(cm->cmd,"get-gi")) { dt_print_uuids(uuids,flags); } else { ASSERT( 0 ); } } else { fprintf(stderr, "No uuids found in reply!\n" "Maybe you need to upgrade your userland tools?\n"); } return 0; } static int down_cmd(struct drbd_cmd *cm, int argc, char **argv) { struct minors_list *minors, *m; int rv; int success; if(argc > 2) { warn_print_excess_args(argc, argv, 2); return OTHER_ERROR; } minors = enumerate_minors(); rv = _generic_config_cmd(cm, argc, argv, 1); success = (rv >= SS_SUCCESS && rv < ERR_CODE_BASE) || rv == NO_ERROR; if (success) { for (m = minors; m; m = m->next) unregister_minor(m->minor); free_minors(minors); unregister_resource(objname); } else { free_minors(minors); return print_config_error(rv, NULL); } return 0; } /* printf format for minor, resource name, volume */ #define MNV_FMT "%d,%s[%d]" static void print_state(char *tag, unsigned seq, unsigned minor, const char *resource_name, unsigned vnr, __u32 state_i) { union drbd_state s = { .i = state_i }; printf("%u %s " MNV_FMT " { cs:%s ro:%s/%s ds:%s/%s %c%c%c%c }\n", seq, tag, minor, resource_name, vnr, drbd_conn_str(s.conn), drbd_role_str(s.role), drbd_role_str(s.peer), drbd_disk_str(s.disk), drbd_disk_str(s.pdsk), s.susp ? 's' : 'r', s.aftr_isp ? 'a' : '-', s.peer_isp ? 'p' : '-', s.user_isp ? 
'u' : '-' ); } static int print_broadcast_events(struct drbd_cmd *cm, struct genl_info *info) { struct drbd_cfg_context cfg = { .ctx_volume = -1U }; struct state_info si = { .current_state = 0 }; struct disk_conf dc = { .disk_size = 0, }; struct net_conf nc = { .timeout = 0, }; struct drbd_genlmsghdr *dh; /* End of initial dump. Ignore. Maybe: print some marker? */ if (!info) return 0; dh = info->userhdr; if (dh->ret_code == ERR_MINOR_INVALID && cm->missing_ok) return 0; if (drbd_cfg_context_from_attrs(&cfg, info)) { dbg(1, "unexpected packet, configuration context missing!\n"); /* keep running anyways. */ struct nlattr *nla = NULL; if (info->attrs[DRBD_NLA_CFG_REPLY]) nla = drbd_nla_find_nested(ARRAY_SIZE(drbd_cfg_reply_nl_policy) - 1, info->attrs[DRBD_NLA_CFG_REPLY], T_info_text); if (nla) { char *txt = nla_data(nla); char *c; for (c = txt; *c; c++) if (*c == '\n') *c = '_'; printf("%u # %s\n", info->seq, txt); } goto out; } if (state_info_from_attrs(&si, info)) { /* this is a DRBD_ADM_GET_STATUS reply * with information about a resource without any volumes */ printf("%u R - %s\n", info->seq, cfg.ctx_resource_name); goto out; } disk_conf_from_attrs(&dc, info); net_conf_from_attrs(&nc, info); switch (si.sib_reason) { case SIB_STATE_CHANGE: print_state("ST-prev", info->seq, dh->minor, cfg.ctx_resource_name, cfg.ctx_volume, si.prev_state); print_state("ST-new", info->seq, dh->minor, cfg.ctx_resource_name, cfg.ctx_volume, si.new_state); /* fall through */ case SIB_GET_STATUS_REPLY: print_state("ST", info->seq, dh->minor, cfg.ctx_resource_name, cfg.ctx_volume, si.current_state); break; case SIB_HELPER_PRE: printf("%u UH " MNV_FMT " %s\n", info->seq, dh->minor, cfg.ctx_resource_name, cfg.ctx_volume, si.helper); break; case SIB_HELPER_POST: printf("%u UH-post " MNV_FMT " %s 0x%04x\n", info->seq, dh->minor, cfg.ctx_resource_name, cfg.ctx_volume, si.helper, si.helper_exit_code); break; case SIB_SYNC_PROGRESS: { uint32_t shift = si.bits_rs_total >= (1ULL << 32) ? 
16 : 10; uint64_t left = (si.bits_oos - si.bits_rs_failed) >> shift; uint64_t total = 1UL + (si.bits_rs_total >> shift); uint64_t tmp = 1000UL - left * 1000UL/total; unsigned synced = tmp; printf("%u SP " MNV_FMT " %i.%i\n", info->seq, dh->minor, cfg.ctx_resource_name, cfg.ctx_volume, synced / 10, synced % 10); } break; default: /* we could add the si.reason */ printf("%u ?? " MNV_FMT " \n", info->seq, dh->minor, cfg.ctx_resource_name, cfg.ctx_volume, si.sib_reason); break; } out: fflush(stdout); return 0; } static int w_connected_state(struct drbd_cmd *cm, struct genl_info *info) { struct state_info si = { .current_state = 0 }; union drbd_state state; if (!info) return 0; if (!global_attrs[DRBD_NLA_STATE_INFO]) return 0; if (state_info_from_attrs(&si, info)) { fprintf(stderr,"nla_policy violation!?\n"); return 0; } if (si.sib_reason != SIB_STATE_CHANGE && si.sib_reason != SIB_GET_STATUS_REPLY) return 0; state.i = si.current_state; if (state.conn >= C_CONNECTED) return -1; /* done waiting */ if (state.conn < C_UNCONNECTED) { struct drbd_genlmsghdr *dhdr = info->userhdr; struct drbd_cfg_context cfg = { .ctx_volume = -1U }; if (!wait_after_split_brain) return -1; /* done waiting */ drbd_cfg_context_from_attrs(&cfg, info); fprintf(stderr, "\ndrbd%u (%s[%u]) is %s, " "but I'm configured to wait anyway (--wait-after-sb)\n", dhdr->minor, cfg.ctx_resource_name, cfg.ctx_volume, drbd_conn_str(state.conn)); } return 0; } static int w_synced_state(struct drbd_cmd *cm, struct genl_info *info) { struct state_info si = { .current_state = 0 }; union drbd_state state; if (!info) return 0; if (!global_attrs[DRBD_NLA_STATE_INFO]) return 0; if (state_info_from_attrs(&si, info)) { fprintf(stderr,"nla_policy violation!?\n"); return 0; } if (si.sib_reason != SIB_STATE_CHANGE && si.sib_reason != SIB_GET_STATUS_REPLY) return 0; state.i = si.current_state; if (state.conn == C_CONNECTED) return -1; /* done waiting */ if (!wait_after_split_brain && state.conn < C_UNCONNECTED) return -1; /* 
done waiting */ return 0; } /* * Check if an integer is a power of two. */ static bool power_of_two(int i) { return i && !(i & (i - 1)); } static void print_command_usage(struct drbd_cmd *cm, enum usage_type ut) { struct drbd_argument *args; if(ut == XML) { enum cfg_ctx_key ctx = cm->ctx_key; printf("\n", cm->cmd); if (ctx & CTX_RESOURCE_AND_CONNECTION) ctx = CTX_RESOURCE | CTX_CONNECTION; if (ctx & (CTX_RESOURCE | CTX_MINOR | CTX_ALL)) { bool more_than_one_choice = !power_of_two(ctx & (CTX_RESOURCE | CTX_MINOR | CTX_ALL)); const char *indent = "\t\t" + !more_than_one_choice; if (more_than_one_choice) printf("\t\n"); if (ctx & CTX_RESOURCE) printf("%sresource\n", indent); if (ctx & CTX_MINOR) printf("%sminor\n", indent); if (ctx & CTX_ALL) printf("%sall\n", indent); if (more_than_one_choice) printf("\t\n"); } if (ctx & CTX_CONNECTION) { printf("\tlocal_addr\n"); printf("\tremote_addr\n"); } if(cm->drbd_args) { for (args = cm->drbd_args; args->name; args++) { printf("\t%s\n", args->name); } } if (cm->options) { struct option *option; for (option = cm->options; option->name; option++) { /* * The "string" options here really are * timeouts, but we can't describe them * in a resonable way here. */ printf("\t\n", option->name, option->has_arg == no_argument ? 
"flag" : "string"); } } if (cm->set_defaults) printf("\t\n"); if (cm->ctx) { struct field_def *field; for (field = cm->ctx->fields; field->name; field++) field->describe_xml(field); } printf("\n"); return; } if (ut == BRIEF) wrap_printf(4, "%-18s ", cm->cmd); else { wrap_printf(0, "USAGE:\n"); wrap_printf(1, "%s %s", progname, cm->cmd); if (cm->ctx_key && ut != BRIEF) { enum cfg_ctx_key ctx = cm->ctx_key; if (ctx & CTX_RESOURCE_AND_CONNECTION) ctx = CTX_RESOURCE | CTX_CONNECTION; if (ctx & (CTX_RESOURCE | CTX_MINOR | CTX_ALL)) { bool first = true; wrap_printf(4, " {"); if (ctx & CTX_RESOURCE) { wrap_printf(4, "|resource" + first); first = false; } if (ctx & CTX_MINOR) { wrap_printf(4, "|minor" + first); first = false; } if (ctx & CTX_ALL) { wrap_printf(4, "|all" + first); first = false; } wrap_printf(4, "}"); } if (ctx & CTX_CONNECTION) { wrap_printf(4, " [{af}:]{local_addr}[:{port}]"); wrap_printf(4, " [{af}:]{remote_addr}[:{port}]"); } } if (cm->drbd_args) { for (args = cm->drbd_args; args->name; args++) wrap_printf(4, " {%s}", args->name); } if (cm->options) { struct option *option; for (option = cm->options; option->name; option++) wrap_printf(4, " [--%s%s]", option->name, option->has_arg == no_argument ? 
"" : "=..."); } if (cm->set_defaults) wrap_printf(4, " [--set-defaults]"); if (cm->ctx) { struct field_def *field; for (field = cm->ctx->fields; field->name; field++) { char buffer[300]; int n; n = field->usage(field, buffer, sizeof(buffer)); assert(n < sizeof(buffer)); wrap_printf(4, " %s", buffer); } } wrap_printf(4, "\n"); } } static void print_usage_and_exit(const char* addinfo) { size_t i; printf("\nUSAGE: %s command device arguments options\n\n" "Device is usually /dev/drbdX or /dev/drbd/X.\n" "\nCommands are:\n",cmdname); for (i = 0; i < ARRAY_SIZE(commands); i++) print_command_usage(&commands[i], BRIEF); printf("\n\n" "To get more details about a command issue " "'drbdsetup help cmd'.\n" "\n"); /* printf("\n\nVersion: "REL_VERSION" (api:%d)\n%s\n", API_VERSION, drbd_buildtag()); */ if (addinfo) printf("\n%s\n",addinfo); exit(20); } static int modprobe_drbd(void) { struct stat sb; int ret, retries = 10; ret = stat("/proc/drbd", &sb); if (ret && errno == ENOENT) { system("/sbin/modprobe drbd"); for(;;) { struct timespec ts = { .tv_nsec = 1000000, }; ret = stat("/proc/drbd", &sb); if (!ret || retries-- == 0) break; nanosleep(&ts, NULL); } } if (ret) { fprintf(stderr, "Could not stat /proc/drbd: %m\n"); fprintf(stderr, "Make sure that the DRBD kernel module is installed " "and can be loaded!\n"); } return ret == 0; } void exec_legacy_drbdsetup(char **argv) { #ifdef DRBD_LEGACY_83 static const char * const legacy_drbdsetup = "drbdsetup-83"; char *progname, *drbdsetup; /* in case drbdsetup is called with an absolute or relative pathname * look for the legacy drbdsetup binary in the same location, * otherwise, just let execvp sort it out... 
 */ if ((progname = strrchr(argv[0], '/')) == 0) { drbdsetup = strdup(legacy_drbdsetup); } else { size_t len_dir, l; ++progname; len_dir = progname - argv[0]; l = len_dir + strlen(legacy_drbdsetup) + 1; drbdsetup = malloc(l); if (!drbdsetup) { fprintf(stderr, "Malloc() failed\n"); exit(20); } strncpy(drbdsetup, argv[0], len_dir); strcpy(drbdsetup + len_dir, legacy_drbdsetup); } execvp(drbdsetup, argv); #else fprintf(stderr, "This drbdsetup was not built with support for legacy drbd-8.3\n" "Consider rebuilding with ./configure --with-legacy-connector\n"); #endif } int main(int argc, char **argv) { struct drbd_cmd *cmd; int rv=0; progname = basename(argv[0]); if (chdir("/")) { /* highly unlikely, but gcc is picky */ perror("cannot chdir /"); return -111; } cmdname = strrchr(argv[0],'/'); if (cmdname) argv[0] = ++cmdname; else cmdname = argv[0]; if (argc > 2 && (!strcmp(argv[2], "--help") || !strcmp(argv[2], "-h"))) { char *swap = argv[1]; argv[1] = argv[2]; argv[2] = swap; } if (argc > 1 && (!strcmp(argv[1], "help") || !strcmp(argv[1], "xml-help") || !strcmp(argv[1], "--help") || !strcmp(argv[1], "-h"))) { enum usage_type usage_type = !strcmp(argv[1], "xml-help") ? XML : FULL; if(argc > 2) { cmd = find_cmd_by_name(argv[2]); if(cmd) { print_command_usage(cmd, usage_type); exit(0); } else print_usage_and_exit("unknown command"); } else print_usage_and_exit(0); } /* * drbdsetup previously took the object to operate on as its first argument, * followed by the command. For backwards compatibility, still support this. 
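 * Illustration only (the device name is an assumed example): an old-style
 * call "drbdsetup /dev/drbd0 down" is handled like "drbdsetup down /dev/drbd0",
 * because argv[1] is not a command name while argv[2] is, so the two are swapped.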
*/ if (argc >= 3 && !find_cmd_by_name(argv[1]) && find_cmd_by_name(argv[2])) { char *swap = argv[1]; argv[1] = argv[2]; argv[2] = swap; } /* it is enough to set it, value is ignored */ if (getenv("DRBD_DEBUG_DUMP_ARGV")) debug_dump_argv = 1; if (argc < 2) print_usage_and_exit(0); cmd = find_cmd_by_name(argv[1]); if (!cmd) print_usage_and_exit("invalid command"); if (!modprobe_drbd()) { if (!strcmp(argv[1], "down") || !strcmp(argv[1], "secondary") || !strcmp(argv[1], "disconnect") || !strcmp(argv[1], "detach")) return 0; /* "down" succeeds even if drbd is missing */ return 20; } if (try_genl) { if (cmd->continuous_poll) drbd_genl_family.nl_groups = -1; drbd_sock = genl_connect_to_family(&drbd_genl_family); if (!drbd_sock) { try_genl = 0; exec_legacy_drbdsetup(argv); /* Only reached in case exec() failed... */ fprintf(stderr, "Could not connect to 'drbd' generic netlink family\n"); return 20; } if (drbd_genl_family.version != API_VERSION || drbd_genl_family.hdrsize != sizeof(struct drbd_genlmsghdr)) { fprintf(stderr, "API mismatch!\n\t" "API version drbdsetup: %u kernel: %u\n\t" "header size drbdsetup: %u kernel: %u\n", API_VERSION, drbd_genl_family.version, (unsigned)sizeof(struct drbd_genlmsghdr), drbd_genl_family.hdrsize); return 20; } } context = 0; if (cmd->ctx_key & (CTX_MINOR | CTX_RESOURCE | CTX_ALL | CTX_RESOURCE_AND_CONNECTION)) { if (argc < 3) { fprintf(stderr, "Missing first argument\n"); print_command_usage(cmd, FULL); exit(20); } objname = argv[2]; if (!strcmp(objname, "all")) { if (!(cmd->ctx_key & CTX_ALL)) print_usage_and_exit("command does not accept argument 'all'"); context = CTX_ALL; } else if (cmd->ctx_key & CTX_MINOR) { minor = dt_minor_of_dev(objname); if (minor == -1U && !(cmd->ctx_key & (CTX_RESOURCE | CTX_RESOURCE_AND_CONNECTION))) { fprintf(stderr, "Cannot determine minor device number of " "device '%s'\n", objname); exit(20); } if (cmd->cmd_id != DRBD_ADM_GET_STATUS) lock_fd = dt_lock_drbd(minor); context = CTX_MINOR; } else context = 
CTX_RESOURCE; } if (cmd->ctx_key & (CTX_CONNECTION | CTX_RESOURCE_AND_CONNECTION)) { if (argc < 4 + !!context) { fprintf(stderr, "Missing connection endpoint argument\n"); print_command_usage(cmd, FULL); exit(20); } context |= CTX_CONNECTION; } if (objname == NULL && (cmd->ctx_key & CTX_CONNECTION)) { objname = getenv("DRBD_RESOURCE"); if (objname == NULL) m_asprintf(&objname, "connection %s %s", argv[2], argv[3]); } if (objname == NULL && cmd->ctx == &new_minor_cmd_ctx) objname = argv[2]; if (objname == NULL) objname = "??"; /* Make it so that argv[0] is the command name. */ rv = cmd->function(cmd, argc - 1, argv + 1); dt_unlock_drbd(lock_fd); return rv; } #endif drbd-8.4.4/user/drbdtool_common.c0000664000000000000000000004374412216604252015461 0ustar rootroot#define _GNU_SOURCE #define _XOPEN_SOURCE 600 #define _FILE_OFFSET_BITS 64 #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* for BLKGETSIZE64 */ #include #include "drbdtool_common.h" #include "config.h" /* In-place unescape double quotes and backslash escape sequences from a * double quoted string. Note: backslash is only useful to quote itself, or * double quote, no special treatment to any c-style escape sequences. 
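 * Illustration only (hypothetical input): the text "a\"b\\c", surrounding
 * quotes included, becomes a"b\c. Unescaped double quotes are dropped, and
 * a backslash makes the character that follows it literal.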
*/ void unescape(char *txt) { char *ue, *e; e = ue = txt; for (;;) { if (*ue == '"') { ue++; continue; } if (*ue == '\\') ue++; if (!*ue) break; *e++ = *ue++; } *e = '\0'; } /* input size is expected to be in KB */ char *ppsize(char *buf, unsigned long long size) { /* Needs 9 bytes at max including trailing NUL: * -1ULL ==> "16384 EB" */ static char units[] = { 'K', 'M', 'G', 'T', 'P', 'E' }; int base = 0; while (size >= 10000 && base < sizeof(units)-1) { /* shift + round */ size = (size >> 10) + !!(size & (1<<9)); base++; } sprintf(buf, "%u %cB", (unsigned)size, units[base]); return buf; } const char *make_optstring(struct option *options) { static char buffer[200]; char seen[256]; struct option *opt; char *c; memset(seen, 0, sizeof(seen)); opt = options; c = buffer; while (opt->name) { if (0 < opt->val && opt->val < 256) { if (seen[opt->val]++) { fprintf(stderr, "internal error: --%s has duplicate opt->val '%c'\n", opt->name, opt->val); abort(); } *c++ = opt->val; if (opt->has_arg != no_argument) { *c++ = ':'; if (opt->has_arg == optional_argument) *c++ = ':'; } } opt++; } *c = 0; return buffer; } int new_strtoll(const char *s, const char def_unit, unsigned long long *rv) { char unit = 0; char dummy = 0; int shift, c; switch (def_unit) { default: return MSE_DEFAULT_UNIT; case 0: case 1: case '1': shift = 0; break; case 'K': case 'k': shift = -10; break; case 's': shift = -9; // sectors break; /* case 'M': case 'm': case 'G': case 'g': */ } if (!s || !*s) return MSE_MISSING_NUMBER; c = sscanf(s, "%llu%c%c", rv, &unit, &dummy); if (c != 1 && c != 2) return MSE_INVALID_NUMBER; switch (unit) { case 0: return MSE_OK; case 'K': case 'k': shift += 10; break; case 'M': case 'm': shift += 20; break; case 'G': case 'g': shift += 30; break; case 's': shift += 9; break; default: return MSE_INVALID_UNIT; } /* if shift is negative (e.g. default unit 'K', actual unit 's'), * convert to positive, and shift right, rounding up. 
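 * Worked example (values chosen for illustration): default unit 'K' sets
 * shift = -10; the input "3s" adds 9, leaving shift = -1. After negation,
 * *rv = (3 + (1ULL << 1) - 1) >> 1 = 2, so 3 sectors (1536 bytes)
 * round up to 2 KiB.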
 */ if (shift < 0) { shift = -shift; *rv = (*rv + (1ULL << shift) - 1) >> shift; return MSE_OK; } /* if shift is positive, first check for overflow */ if (*rv > (~0ULL >> shift)) return MSE_OUT_OF_RANGE; /* then convert */ *rv = *rv << shift; return MSE_OK; } unsigned long long m_strtoll(const char *s, const char def_unit) { unsigned long long r; switch(new_strtoll(s, def_unit, &r)) { case MSE_OK: return r; case MSE_DEFAULT_UNIT: fprintf(stderr, "unexpected default unit: %d\n",def_unit); exit(100); case MSE_MISSING_NUMBER: fprintf(stderr, "missing number argument\n"); exit(100); case MSE_INVALID_NUMBER: fprintf(stderr, "%s is not a valid number\n", s); exit(20); case MSE_INVALID_UNIT: fprintf(stderr, "%s is not a valid number\n", s); exit(20); case MSE_OUT_OF_RANGE: fprintf(stderr, "%s: out of range\n", s); exit(20); default: fprintf(stderr, "m_strtoll() is confused\n"); exit(20); } } void alarm_handler(int __attribute((unused)) signo) { /* nothing. just interrupt F_SETLKW */ } /* it is implicitly unlocked when the process dies. * but if you want to explicitly unlock it, just close it. */ int unlock_fd(int fd) { return close(fd); } int get_fd_lockfile_timeout(const char *path, int seconds) { int fd, err; struct sigaction sa,so; struct flock fl = { .l_type = F_WRLCK, .l_whence = 0, .l_start = 0, .l_len = 0 }; if ((fd = open(path, O_RDWR | O_CREAT, 0600)) < 0) { fprintf(stderr,"open(%s): %m\n",path); return -1; } if (seconds) { sa.sa_handler=alarm_handler; sigemptyset(&sa.sa_mask); sa.sa_flags=0; sigaction(SIGALRM,&sa,&so); alarm(seconds); err = fcntl(fd,F_SETLKW,&fl); if (err) err = errno; alarm(0); sigaction(SIGALRM,&so,NULL); } else { err = fcntl(fd,F_SETLK,&fl); if (err) err = errno; } if (!err) return fd; if (err != EINTR && err != EAGAIN) { close(fd); errno = err; fprintf(stderr,"fcntl(%s,...): %m\n", path); return -1; } /* do we want to know this? 
 */ if (!fcntl(fd,F_GETLK,&fl)) { fprintf(stderr,"lock on %s currently held by pid:%u\n", path, fl.l_pid); } close(fd); return -1; } int dt_minor_of_dev(const char *device) { struct stat sb; long m; int digits_only = only_digits(device); const char *c = device; /* On udev/devfs based systems the device nodes do not * exist before the drbd device is created. * * If the device name starts with /dev/drbd followed by * only digits, or if only digits are given, * those digits are the minor number. * * Otherwise, we cannot reliably determine the minor number! * * We allow "arbitrary" device names in drbd.conf, * and those may well contain digits. * Interpreting any digits as minor number is dangerous! */ if (!digits_only) { if (!strncmp("/dev/drbd", device, 9) && only_digits(device + 9)) c = device + 9; /* if the device node exists, * and is a block device with the correct major, * do not enforce further naming conventions. * people without udev, and not using drbdadm * may do whatever they like. */ else if (!stat(device,&sb) && S_ISBLK(sb.st_mode) && major(sb.st_rdev) == LANANA_DRBD_MAJOR) return minor(sb.st_rdev); else return -1; } /* ^[0-9]+$ or ^/dev/drbd[0-9]+$ */ errno = 0; m = strtol(c, NULL, 10); if (!errno) return m; return -1; } int only_digits(const char *s) { const char *c; for (c = s; isdigit(*c); c++) ; return c != s && *c == 0; } int dt_lock_drbd(int minor) { int sz, lfd; char *lfname; /* THINK. * maybe we should also place a fcntl lock on the * _physical_device_ we open later... * * This lock is to prevent a drbd minor from being configured * by drbdsetup while drbdmeta is about to mess with its meta data. * * If you happen to mess with the meta data of one device, * pretending it belongs to another, you'll screw up completely. * * We should store something in the meta data to detect such abuses. */ /* NOTE that /var/lock/drbd-*-* may not be "secure", * maybe we should rather use /var/lock/drbd/drbd-*-*, * and make sure that /var/lock/drbd is drwx.-..-. 
root:root ... */ sz = asprintf(&lfname, DRBD_LOCK_DIR "/drbd-%d-%d", LANANA_DRBD_MAJOR, minor); if (sz < 0) { perror(""); exit(20); } lfd = get_fd_lockfile_timeout(lfname, 1); free (lfname); if (lfd < 0) exit(20); return lfd; } /* ignore errors */ void dt_unlock_drbd(int lock_fd) { if (lock_fd >= 0) unlock_fd(lock_fd); } void dt_print_gc(const uint32_t* gen_cnt) { printf("%d:%d:%d:%d:%d:%d:%d:%d\n", gen_cnt[Flags] & MDF_CONSISTENT ? 1 : 0, gen_cnt[HumanCnt], gen_cnt[TimeoutCnt], gen_cnt[ConnectedCnt], gen_cnt[ArbitraryCnt], gen_cnt[Flags] & MDF_PRIMARY_IND ? 1 : 0, gen_cnt[Flags] & MDF_CONNECTED_IND ? 1 : 0, gen_cnt[Flags] & MDF_FULL_SYNC ? 1 : 0); } void dt_pretty_print_gc(const uint32_t* gen_cnt) { printf("\n" " WantFullSync |\n" " ConnectedInd | |\n" " lastState | | |\n" " ArbitraryCnt | | | |\n" " ConnectedCnt | | | | |\n" " TimeoutCnt | | | | | |\n" " HumanCnt | | | | | | |\n" "Consistent | | | | | | | |\n" " --------+-----+-----+-----+-----+-----+-----+-----+\n" " %3s | %3d | %3d | %3d | %3d | %3s | %3s | %3s \n" "\n", gen_cnt[Flags] & MDF_CONSISTENT ? "1/c" : "0/i", gen_cnt[HumanCnt], gen_cnt[TimeoutCnt], gen_cnt[ConnectedCnt], gen_cnt[ArbitraryCnt], gen_cnt[Flags] & MDF_PRIMARY_IND ? "1/p" : "0/s", gen_cnt[Flags] & MDF_CONNECTED_IND ? "1/c" : "0/n", gen_cnt[Flags] & MDF_FULL_SYNC ? "1/y" : "0/n"); } void dt_print_uuids(const uint64_t* uuid, unsigned int flags) { int i; printf(X64(016)":"X64(016)":", uuid[UI_CURRENT], uuid[UI_BITMAP]); for ( i=UI_HISTORY_START ; i<=UI_HISTORY_END ; i++ ) { printf(X64(016)":", uuid[i]); } printf("%d:%d:%d:%d:%d:%d:%d\n", flags & MDF_CONSISTENT ? 1 : 0, flags & MDF_WAS_UP_TO_DATE ? 1 : 0, flags & MDF_PRIMARY_IND ? 1 : 0, flags & MDF_CONNECTED_IND ? 1 : 0, flags & MDF_FULL_SYNC ? 1 : 0, flags & MDF_PEER_OUT_DATED ? 1 : 0, flags & MDF_CRASHED_PRIMARY ? 
1 : 0); } void dt_pretty_print_uuids(const uint64_t* uuid, unsigned int flags) { printf( "\n" " +--< Current data generation UUID >-\n" " | +--< Bitmap's base data generation UUID >-\n" " | | +--< younger history UUID >-\n" " | | | +-< older history >-\n" " V V V V\n"); dt_print_uuids(uuid, flags); printf( " ^ ^ ^ ^ ^ ^ ^\n" " -< Data consistency flag >--+ | | | | | |\n" " -< Data was/is currently up-to-date >--+ | | | | |\n" " -< Node was/is currently primary >--+ | | | |\n" " -< Node was/is currently connected >--+ | | |\n" " -< Node was in the progress of setting all bits in the bitmap >--+ | |\n" " -< The peer's disk was out-dated or inconsistent >--+ |\n" " -< This node was a crashed primary, and has not seen its peer since >--+\n" "\n"); printf("flags:%s %s, %s, %s%s%s\n", (flags & MDF_CRASHED_PRIMARY) ? " crashed" : "", (flags & MDF_PRIMARY_IND) ? "Primary" : "Secondary", (flags & MDF_CONNECTED_IND) ? "Connected" : "StandAlone", (flags & MDF_CONSISTENT) ? ((flags & MDF_WAS_UP_TO_DATE) ? "UpToDate" : "Outdated") : "Inconsistent", (flags & MDF_FULL_SYNC) ? ", need full sync" : "", (flags & MDF_PEER_OUT_DATED) ? ", peer Outdated" : ""); printf("meta-data: %s\n", (flags & MDF_AL_CLEAN) ? "clean" : "need apply-al"); } /* s: token buffer * size: size of s, _including_ the terminating NUL * stream: to read from. * s is guaranteed to be NUL terminated * if a token (including the NUL) needs more size bytes, * s will contain only a truncated token, and the next call will * return the next size-1 non-white-space bytes of stream. */ int fget_token(char *s, int size, FILE* stream) { int c; char* sp = s; *sp = 0; /* terminate even if nothing is found */ --size; /* account for the terminating NUL */ do { // eat white spaces in front. 
c = getc(stream); if( c == EOF) return EOF; } while (!isgraph(c)); do { /* read the first word into s */ *sp++ = c; c = getc(stream); if ( c == EOF) break; } while (isgraph(c) && --size); *sp=0; return 1; } int sget_token(char *s, int size, const char** text) { int c; char* sp = s; *sp = 0; /* terminate even if nothing is found */ --size; /* account for the terminating NUL */ do { /* eat white spaces in front. */ c = *(*text)++; if( c == 0) return EOF; } while (!isgraph(c)); do { /* read the first word into s */ *sp++ = c; c = *(*text)++; if ( c == 0) break; } while (isgraph(c) && --size); *sp=0; return 1; } uint64_t bdev_size(int fd) { uint64_t size64; /* size in bytes. */ long size; /* size in sectors. */ int err; err = ioctl(fd, BLKGETSIZE64, &size64); if (err) { if (errno == EINVAL) { printf("INFO: falling back to BLKGETSIZE\n"); err = ioctl(fd, BLKGETSIZE, &size); if (err) { perror("ioctl(,BLKGETSIZE,) failed"); exit(20); } size64 = (uint64_t)512 *size; } else { perror("ioctl(,BLKGETSIZE64,) failed"); exit(20); } } return size64; } char *lk_bdev_path(unsigned minor) { char *path; m_asprintf(&path, "%s/drbd-minor-%d.lkbd", DRBD_LIB_DIR, minor); return path; } /* If the lower level device is resized, * and DRBD did not move its "internal" meta data in time, * the next time we try to attach, we won't find our meta data. * * Some helpers for storing and retrieving "last known" * information, to be able to find it regardless, * without scanning the full device for magic numbers. */ /* these return 0 on success, error code if something goes wrong. */ /* NOTE: file format for now: * one line, starting with size in bytes, followed by a tab, * followed by device name, followed by newline. 
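 * Illustration only (size and device path are hypothetical):
 *   "10737418240\t/dev/sdb1\n"
 * When bd_uuid is set, lk_bdev_save() appends a second line of the
 * form "uuid:\t" followed by 16 hex digits and a newline.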
*/ int lk_bdev_save(const unsigned minor, const struct bdev_info *bd) { FILE *fp; char *path = lk_bdev_path(minor); int ok = 0; fp = fopen(path, "w"); if (!fp) goto fail; ok = fprintf(fp, "%llu\t%s\n", (unsigned long long) bd->bd_size, bd->bd_name); if (ok <= 0) goto fail; if (bd->bd_uuid) fprintf(fp, "uuid:\t"X64(016)"\n", bd->bd_uuid); ok = 0 == fflush(fp); ok = ok && 0 == fsync(fileno(fp)); ok = ok && 0 == fclose(fp); if (!ok) fail: /* MAYBE: unlink. But maybe partial info is better than no info? */ fprintf(stderr, "lk_bdev_save(%s) failed: %m\n", path); free(path); return ok <= 0 ? -1 : 0; } /* we may want to remove all stored information */ int lk_bdev_delete(const unsigned minor) { char *path = lk_bdev_path(minor); int rc = unlink(path); if (rc && errno != ENOENT) fprintf(stderr, "lk_bdev_delete(%s) failed: %m\n", path); free(path); return rc; } /* load info from that file. * caller should free(bd->bd_name) once it is no longer needed. */ int lk_bdev_load(const unsigned minor, struct bdev_info *bd) { FILE *fp; char *path; char *bd_name; unsigned long long bd_size; unsigned long long bd_uuid; char nl[2]; int rc = -1; if (!bd) return -1; path = lk_bdev_path(minor); fp = fopen(path, "r"); if (!fp) { if (errno != ENOENT) fprintf(stderr, "lk_bdev_load(%s) failed: %m\n", path); goto out; } /* GNU format extension: %as: * malloc buffer space for the resulting char */ rc = fscanf(fp, "%llu %as%[\n]uuid: %llx%[\n]", &bd_size, &bd_name, nl, &bd_uuid, nl); /* rc == 5: successfully converted two lines. * == 4: newline not found, possibly truncated uuid * == 3: first line complete, uuid missing. * == 2: new line not found, possibly truncated pathname, * or early whitespace * == 1: found some number, but no more. * incomplete file? try anyways. */ bd->bd_uuid = (rc >= 4) ? bd_uuid : 0; bd->bd_name = (rc >= 2) ? bd_name : NULL; bd->bd_size = (rc >= 1) ? 
bd_size : 0; if (rc < 1) { fprintf(stderr, "lk_bdev_load(%s): parse error\n", path); rc = -1; } else rc = 0; fclose(fp); out: free(path); return rc; } void get_random_bytes(void* buffer, int len) { int fd; fd = open("/dev/urandom",O_RDONLY); if( fd == -1) { perror("Open of /dev/urandom failed"); exit(20); } if(read(fd,buffer,len) != len) { fprintf(stderr,"Reading from /dev/urandom failed\n"); exit(20); } close(fd); } const char* shell_escape(const char* s) { /* ugly static buffer. so what. */ static char buffer[1024]; char *c = buffer; if (s == NULL) return s; while (*s) { if (buffer + sizeof(buffer) < c+2) break; switch(*s) { /* set of 'clean' characters */ case '%': case '+': case '-': case '.': case '/': case '0' ... '9': case ':': case '=': case '@': case 'A' ... 'Z': case '_': case 'a' ... 'z': break; /* escape everything else */ default: *c++ = '\\'; } *c++ = *s++; } *c = '\0'; return buffer; } int m_asprintf(char **strp, const char *fmt, ...) { int r; va_list ap; va_start(ap, fmt); r = vasprintf(strp, fmt, ap); va_end(ap); if (r == -1) { fprintf(stderr, "vasprintf() failed. Out of memory?\n"); exit(10); } return r; } /* print len bytes from buf in the format of well known "hd", * adjust displayed offset by file_offset */ void fprintf_hex(FILE *fp, off_t file_offset, const void *buf, unsigned len) { const unsigned char *c = buf; unsigned o; int skipped = 0; for (o = 0; o + 16 < len; o += 16, c += 16) { if (o && !memcmp(c - 16, c, 16)) { skipped = 1; continue; } if (skipped) { skipped = 0; fprintf(fp, "*\n"); } /* no error check here, don't know what to do about errors */ fprintf(fp, /* offset */ "%08llx" /* two times 8 byte as byte stream, on disk order */ " %02x %02x %02x %02x %02x %02x %02x %02x" " %02x %02x %02x %02x %02x %02x %02x %02x" /* the same as printable char or '.' 
*/ " |%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c|\n", (unsigned long long)o + file_offset, c[0], c[1], c[2], c[3], c[4], c[5], c[6], c[7], c[8], c[9], c[10], c[11], c[12], c[13], c[14], c[15], #define p_(x) (isprint(x) ? x : '.') #define p(a,b,c,d,e,f,g,h) \ p_(a), p_(b), p_(c), p_(d), p_(e), p_(f), p_(g), p_(h) p(c[0], c[1], c[2], c[3], c[4], c[5], c[6], c[7]), p(c[8], c[9], c[10], c[11], c[12], c[13], c[14], c[15]) ); } if (skipped) { skipped = 0; fprintf(fp, "*\n"); } if (o < len) { unsigned remaining = len - o; unsigned i; fprintf(fp, "%08llx ", (unsigned long long)o + file_offset); for (i = 0; i < remaining; i++) { if (i == 8) fprintf(fp, " "); fprintf(fp, " %02x", c[i]); } fprintf(fp, "%*s |", (16 - i)*3 + (i < 8), ""); for (i = 0; i < remaining; i++) fprintf(fp, "%c", p_(c[i])); #undef p #undef p_ fprintf(fp, "|\n"); } fprintf(fp, "%08llx\n", (unsigned long long)len + file_offset); } drbd-8.4.4/user/drbdtool_common.h0000664000000000000000000000736012216604252015460 0ustar rootroot#ifndef DRBDTOOL_COMMON_H #define DRBDTOOL_COMMON_H #include "drbd_endian.h" #include #include #include #include #define LANANA_DRBD_MAJOR 147 /* we should get this into linux/major.h */ #ifndef DRBD_MAJOR #define DRBD_MAJOR LANANA_DRBD_MAJOR #elif (DRBD_MAJOR != LANANA_DRBD_MAJOR) # error "FIXME unexpected DRBD_MAJOR" #endif #ifndef __packed #define __packed __attribute__((packed)) #endif #ifndef ARRAY_SIZE #define ARRAY_SIZE(A) (sizeof(A)/sizeof(A[0])) #endif #define COMM_TIMEOUT 120 /* MetaDataIndex for v06 / v07 style meta data blocks */ enum MetaDataIndex { Flags, /* Consistency flag,connected-ind,primary-ind */ HumanCnt, /* human-intervention-count */ TimeoutCnt, /* timeout-count */ ConnectedCnt, /* connected-count */ ArbitraryCnt, /* arbitrary-count */ GEN_CNT_SIZE /* MUST BE LAST! (and Flags must stay first...) */ }; /* #define PERROR(fmt, args...) \ do { fprintf(stderr,fmt ": " , ##args); perror(0); } while (0) */ #define PERROR(fmt, args...) 
fprintf(stderr, fmt ": %m\n" , ##args); enum new_strtoll_errs { MSE_OK, MSE_DEFAULT_UNIT, MSE_MISSING_NUMBER, MSE_INVALID_NUMBER, MSE_INVALID_UNIT, MSE_OUT_OF_RANGE, }; struct option; extern int only_digits(const char *s); extern int dt_lock_drbd(int minor); extern void dt_unlock_drbd(int lock_fd); extern void dt_release_lockfile(int drbd_fd); extern int dt_minor_of_dev(const char *device); extern int new_strtoll(const char *s, const char def_unit, unsigned long long *rv); extern unsigned long long m_strtoll(const char* s,const char def_unit); extern const char* make_optstring(struct option *options); extern char* ppsize(char* buf, unsigned long long size); extern void dt_print_gc(const uint32_t* gen_cnt); extern void dt_pretty_print_gc(const uint32_t* gen_cnt); extern void dt_print_uuids(const uint64_t* uuid, unsigned int flags); extern void dt_pretty_print_uuids(const uint64_t* uuid, unsigned int flags); extern int fget_token(char *s, int size, FILE* stream); extern int sget_token(char *s, int size, const char** text); extern uint64_t bdev_size(int fd); extern void get_random_bytes(void* buffer, int len); extern const char* shell_escape(const char* s); /* In-place unescape double quotes and backslash escape sequences from a * double quoted string. Note: backslash is only useful to quote itself, or * double quote, no special treatment to any c-style escape sequences. */ extern void unescape(char *txt); /* Since glibc 2.8~20080505-0ubuntu7 asprintf() is declared with the warn_unused_result attribute.... */ extern int m_asprintf(char **strp, const char *fmt, ...); extern void fprintf_hex(FILE *fp, off_t file_offset, const void *buf, unsigned len); /* If the lower level device is resized, * and DRBD did not move its "internal" meta data in time, * the next time we try to attach, we won't find our meta data. * * Some helpers for storing and retrieving "last known" * information, to be able to find it regardless, * without scanning the full device for magic numbers. 
*/ /* We may want to store more things later... if so, we can easily change to * some NULL terminated tag-value list format then. * For now: store the last known lower level block device size, * and its /dev/ */ struct bdev_info { uint64_t bd_size; uint64_t bd_uuid; char *bd_name; }; /* these return 0 on success, error code if something goes wrong. */ /* create (update) the last-known-bdev-info file */ extern int lk_bdev_save(const unsigned minor, const struct bdev_info *bd); /* we may want to remove all stored information */ extern int lk_bdev_delete(const unsigned minor); /* load info from that file. * caller should free(bd->bd_name) once it is no longer needed. */ extern int lk_bdev_load(const unsigned minor, struct bdev_info *bd); #endif drbd-8.4.4/user/legacy/.gitignore0000664000000000000000000000010211605310253015341 0ustar rootrootMakefile config.h drbd_buildtag.c drbd_endian.h drbdadm_scanner.c drbd-8.4.4/user/legacy/Makefile.in0000664000000000000000000000720112221261130015417 0ustar rootroot# Makefile for drbd.o # # This file is part of DRBD by Philipp Reisner and Lars Ellenberg. # # drbd is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2, or (at your option) # any later version. # # drbd is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with drbd; see the file COPYING. If not, write to # the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. 
# # variables set by configure DISTRO = @DISTRO@ prefix = @prefix@ exec_prefix = @exec_prefix@ localstatedir = @localstatedir@ datarootdir = @datarootdir@ datadir = @datadir@ sbindir = @sbindir@ sysconfdir = @sysconfdir@ BASH_COMPLETION_SUFFIX = @BASH_COMPLETION_SUFFIX@ UDEV_RULE_SUFFIX = @UDEV_RULE_SUFFIX@ INITDIR = @INITDIR@ LIBDIR = @prefix@/lib/@PACKAGE_TARNAME@ CC = @CC@ CFLAGS = @CFLAGS@ LDFLAGS = @LDFLAGS@ LN_S = @LN_S@ # features enabled or disabled by configure WITH_UTILS = @WITH_UTILS@ WITH_LEGACY_UTILS = @WITH_LEGACY_UTILS@ WITH_KM = @WITH_KM@ WITH_UDEV = @WITH_UDEV@ WITH_XEN = @WITH_XEN@ WITH_PACEMAKER = @WITH_PACEMAKER@ WITH_HEARTBEAT = @WITH_HEARTBEAT@ WITH_RGMANAGER = @WITH_RGMANAGER@ WITH_BASHCOMPLETION = @WITH_BASHCOMPLETION@ # variables meant to be overridden from the make command line DESTDIR ?= / CFLAGS += -Wall -I. -I../drbd -I../drbd/compat drbdadm-obj = drbdadm_scanner.o drbdadm_parser.o drbdadm_main.o \ drbdadm_adjust.o drbdtool_common.o drbdadm_usage_cnt.o \ drbd_buildtag.o drbdadm_minor_table.o drbdsetup-obj = drbdsetup.o drbdtool_common.o drbd_buildtag.o \ drbd_strings.o all: tools ifeq ($(WITH_LEGACY_UTILS),yes) tools: drbdadm-83 drbdsetup-83 else tools: endif drbd_buildtag.c: ../../drbd/drbd_buildtag.c cp $< $@ drbd_endian.h: ../drbd_endian.h cp $< $@ drbdtool_common.h: drbd_endian.h drbdadm-83: $(drbdadm-obj) $(LINK.c) $(LDFLAGS) -o $@ $^ drbdadm_scanner.c: drbdadm_scanner.fl drbdadm_parser.h flex -s -odrbdadm_scanner.c drbdadm_scanner.fl drbdsetup-83: $(drbdsetup-obj) $(LINK.c) $(LDFLAGS) -o $@ $^ clean: rm -f drbdadm_scanner.c rm -f drbdsetup-83 drbdadm-83 *.o rm -f drbd_buildtag.c drbd_endian.h rm -f *~ distclean: clean install: ifeq ($(WITH_UTILS),yes) ifeq ($(WITH_LEGACY_UTILS),yes) install -d $(DESTDIR)$(localstatedir)/lib/drbd install -d $(DESTDIR)$(localstatedir)/lock install -d $(DESTDIR)/lib/drbd/ if getent group haclient > /dev/null 2> /dev/null ; then \ install -g haclient -m 4750 drbdsetup-83 $(DESTDIR)/lib/drbd/ ; \ 
install -m 755 drbdadm-83 $(DESTDIR)/lib/drbd/ ; \ else \ install -m 755 drbdsetup-83 $(DESTDIR)/lib/drbd/ ; \ install -m 755 drbdadm-83 $(DESTDIR)/lib/drbd/ ; \ fi endif endif uninstall: rm -f $(DESTDIR)/lib/drbd/drbdsetup-83 rm -f $(DESTDIR)/lib/drbd/drbdadm-83 ###dependencies drbdsetup.o: drbdtool_common.h linux/drbd_limits.h drbdsetup.o: linux/drbd_tag_magic.h linux/drbd.h drbdsetup.o: linux/drbd_config.h linux/drbd_nl.h drbdsetup.o: unaligned.h drbdtool_common.o: drbdtool_common.h drbdadm_main.o: drbdtool_common.h drbdadm.h drbdadm_adjust.o: drbdtool_common.h drbdadm.h drbdadm_parser.o: drbdtool_common.h drbdadm.h linux/drbd_limits.h drbd_endian.h drbdadm_scanner.o: drbdtool_common.h drbdadm.h drbdadm_parser.h drbdsetup.o: drbdtool_common.h linux/drbd_limits.h drbdadm_usage_cnt.o: drbdtool_common.h drbdadm.h drbd_endian.h drbd-8.4.4/user/legacy/config.h.in0000664000000000000000000000147511605310253015412 0ustar rootroot/* user/config.h.in. Generated from configure.ac by autoheader. */ /* Local configuration directory. Commonly /etc or /usr/local/etc */ #undef DRBD_CONFIG_DIR /* Local state directory. Commonly /var/lib/drbd or /usr/local/var/lib/drbd */ #undef DRBD_LIB_DIR /* Local lock directory. Commonly /var/lock or /usr/local/var/lock */ #undef DRBD_LOCK_DIR /* Define to the address where bug reports for this package should be sent. */ #undef PACKAGE_BUGREPORT /* Define to the full name of this package. */ #undef PACKAGE_NAME /* Define to the full name and version of this package. */ #undef PACKAGE_STRING /* Define to the one symbol short name of this package. */ #undef PACKAGE_TARNAME /* Define to the home page for this package. */ #undef PACKAGE_URL /* Define to the version of this package. */ #undef PACKAGE_VERSION drbd-8.4.4/user/legacy/drbd_strings.c0000664000000000000000000001025311605310253016211 0ustar rootroot/* drbd.h This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2003-2008, LINBIT Information Technologies GmbH. 
Copyright (C) 2003-2008, Philipp Reisner . Copyright (C) 2003-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #include static const char *drbd_conn_s_names[] = { [C_STANDALONE] = "StandAlone", [C_DISCONNECTING] = "Disconnecting", [C_UNCONNECTED] = "Unconnected", [C_TIMEOUT] = "Timeout", [C_BROKEN_PIPE] = "BrokenPipe", [C_NETWORK_FAILURE] = "NetworkFailure", [C_PROTOCOL_ERROR] = "ProtocolError", [C_WF_CONNECTION] = "WFConnection", [C_WF_REPORT_PARAMS] = "WFReportParams", [C_TEAR_DOWN] = "TearDown", [C_CONNECTED] = "Connected", [C_STARTING_SYNC_S] = "StartingSyncS", [C_STARTING_SYNC_T] = "StartingSyncT", [C_WF_BITMAP_S] = "WFBitMapS", [C_WF_BITMAP_T] = "WFBitMapT", [C_WF_SYNC_UUID] = "WFSyncUUID", [C_SYNC_SOURCE] = "SyncSource", [C_SYNC_TARGET] = "SyncTarget", [C_PAUSED_SYNC_S] = "PausedSyncS", [C_PAUSED_SYNC_T] = "PausedSyncT", [C_VERIFY_S] = "VerifyS", [C_VERIFY_T] = "VerifyT", [C_AHEAD] = "Ahead", [C_BEHIND] = "Behind", }; static const char *drbd_role_s_names[] = { [R_PRIMARY] = "Primary", [R_SECONDARY] = "Secondary", [R_UNKNOWN] = "Unknown" }; static const char *drbd_disk_s_names[] = { [D_DISKLESS] = "Diskless", [D_ATTACHING] = "Attaching", [D_FAILED] = "Failed", [D_NEGOTIATING] = "Negotiating", [D_INCONSISTENT] = "Inconsistent", [D_OUTDATED] = "Outdated", [D_UNKNOWN] = "DUnknown", [D_CONSISTENT] = "Consistent", [D_UP_TO_DATE] = "UpToDate", }; static const char 
*drbd_state_sw_errors[] = { [-SS_TWO_PRIMARIES] = "Multiple primaries not allowed by config", [-SS_NO_UP_TO_DATE_DISK] = "Need access to UpToDate data", [-SS_NO_LOCAL_DISK] = "Can not resync without local disk", [-SS_NO_REMOTE_DISK] = "Can not resync without remote disk", [-SS_CONNECTED_OUTDATES] = "Refusing to be Outdated while Connected", [-SS_PRIMARY_NOP] = "Refusing to be Primary while peer is not outdated", [-SS_RESYNC_RUNNING] = "Can not start OV/resync since it is already active", [-SS_ALREADY_STANDALONE] = "Can not disconnect a StandAlone device", [-SS_CW_FAILED_BY_PEER] = "State change was refused by peer node", [-SS_IS_DISKLESS] = "Device is diskless, the requested operation requires a disk", [-SS_DEVICE_IN_USE] = "Device is held open by someone", [-SS_NO_NET_CONFIG] = "Have no net/connection configuration", [-SS_NO_VERIFY_ALG] = "Need a verify algorithm to start online verify", [-SS_NEED_CONNECTION] = "Need a connection to start verify or resync", [-SS_NOT_SUPPORTED] = "Peer does not support protocol", [-SS_LOWER_THAN_OUTDATED] = "Disk state is lower than outdated", [-SS_IN_TRANSIENT_STATE] = "In transient state, retry after next state change", [-SS_CONCURRENT_ST_CHG] = "Concurrent state changes detected and aborted", }; const char *drbd_conn_str(enum drbd_conns s) { /* enums are unsigned... */ return s > C_BEHIND ? "TOO_LARGE" : drbd_conn_s_names[s]; } const char *drbd_role_str(enum drbd_role s) { return s > R_SECONDARY ? "TOO_LARGE" : drbd_role_s_names[s]; } const char *drbd_disk_str(enum drbd_disk_state s) { return s > D_UP_TO_DATE ? "TOO_LARGE" : drbd_disk_s_names[s]; } const char *drbd_set_st_err_str(enum drbd_state_rv err) { return err <= SS_AFTER_LAST_ERROR ? "TOO_SMALL" : err > SS_TWO_PRIMARIES ? 
"TOO_LARGE" : drbd_state_sw_errors[-err]; } drbd-8.4.4/user/legacy/drbdadm.h0000664000000000000000000001772111605310253015136 0ustar rootroot#ifndef DRBDADM_H #define DRBDADM_H #include #include #include #include #include #include #include "config.h" #define E_syntax 2 #define E_usage 3 #define E_config_invalid 10 #define E_exec_error 20 #define E_thinko 42 /* :) */ enum { SLEEPS_FINITE = 1, SLEEPS_SHORT = 2+1, SLEEPS_LONG = 4+1, SLEEPS_VERY_LONG = 8+1, SLEEPS_MASK = 15, RETURN_PID = 2, SLEEPS_FOREVER = 4, SUPRESS_STDERR = 0x10, RETURN_STDOUT_FD = 0x20, RETURN_STDERR_FD = 0x40, DONT_REPORT_FAILED = 0x80, }; /* for check_uniq(): Check for uniqueness of certain values... * comment out if you want to NOT choke on the first conflict */ #define EXIT_ON_CONFLICT 1 /* for verify_ips(): are not verifyable ips fatal? */ #define INVALID_IP_IS_INVALID_CONF 1 enum usage_count_type { UC_YES, UC_NO, UC_ASK, }; struct d_globals { int disable_io_hints; int disable_ip_verification; int minor_count; int dialog_refresh; enum usage_count_type usage_count; }; #define IFI_HADDR 8 #define IFI_ALIAS 1 struct ifi_info { char ifi_name[IFNAMSIZ]; /* interface name, nul terminated */ uint8_t ifi_haddr[IFI_HADDR]; /* hardware address */ uint16_t ifi_hlen; /* bytes in hardware address, 0, 6, 8 */ short ifi_flags; /* IFF_xxx constants from */ short ifi_myflags; /* our own IFI_xxx flags */ struct sockaddr *ifi_addr; /* primary address */ struct ifi_info *ifi_next; /* next ifi_info structure */ }; struct d_name { char *name; struct d_name *next; }; struct d_proxy_info { struct d_name *on_hosts; char* inside_addr; char* inside_port; char* inside_af; char* outside_addr; char* outside_port; char* outside_af; }; struct d_host_info { struct d_name *on_hosts; char* device; unsigned device_minor; char* disk; char* address; char* port; char* meta_disk; char* address_family; int meta_major; int meta_minor; char* meta_index; struct d_proxy_info *proxy; struct d_host_info* next; struct d_resource* lower; /* 
for device stacking */ char *lower_name; /* for device stacking, before bind_stacked_res() */ int config_line; unsigned int by_address:1; /* Match to machines by address, not by names (=on_hosts) */ }; struct d_option { char* name; char* value; struct d_option* next; unsigned int mentioned :1 ; // for the adjust command. unsigned int is_default :1 ; // for the adjust command. unsigned int is_escaped :1 ; }; struct d_resource { char* name; char* protocol; /* these get propagated to host_info sections later. */ char* device; unsigned device_minor; char* disk; char* meta_disk; char* meta_index; struct d_host_info* me; struct d_host_info* peer; struct d_host_info* all_hosts; struct d_option* net_options; struct d_option* disk_options; struct d_option* sync_options; struct d_option* startup_options; struct d_option* handlers; struct d_option* proxy_options; struct d_option* proxy_plugins; struct d_resource* next; struct d_name *become_primary_on; char *config_file; /* The config file this resource is defined in. */ int start_line; unsigned int stacked_timeouts:1; unsigned int ignore:1; unsigned int stacked:1; /* Stacked on this node */ unsigned int stacked_on_one:1; /* Stacked either on me or on peer */ }; extern char *canonify_path(char *path); extern int adm_attach(struct d_resource* ,const char* ); extern int adm_connect(struct d_resource* ,const char* ); extern int adm_resize(struct d_resource* ,const char* ); extern int adm_syncer(struct d_resource* ,const char* ); extern int adm_generic_s(struct d_resource* ,const char* ); extern int _admm_generic(struct d_resource* ,const char*, int flags); extern void m__system(char **argv, int flags, struct d_resource *res, pid_t *kid, int *fd, int *ex); static inline int m_system_ex(char **argv, int flags, struct d_resource *res) { int ex; m__system(argv, flags, res, NULL, NULL, &ex); return ex; } extern struct d_option* find_opt(struct d_option*,char*); extern void validate_resource(struct d_resource *); extern void 
schedule_dcmd( int (* function)(struct d_resource*,const char* ), struct d_resource* res, char* arg, int order); extern int version_code_kernel(void); extern int version_code_userland(void); extern void warn_on_version_mismatch(void); extern void uc_node(enum usage_count_type type); extern int adm_create_md(struct d_resource* res ,const char* cmd); extern void convert_discard_opt(struct d_resource* res); extern void convert_after_option(struct d_resource* res); extern int have_ip(const char *af, const char *ip); /* See drbdadm_minor_table.c */ extern int register_minor(int minor, const char *path); extern int unregister_minor(int minor); extern char *lookup_minor(int minor); enum pr_flags { NoneHAllowed = 4, IgnDiscardMyData = 8 }; enum pp_flags { match_on_proxy = 1, }; extern struct d_resource* parse_resource(char*, enum pr_flags); extern void post_parse(struct d_resource *config, enum pp_flags); extern struct d_option *new_opt(char *name, char *value); extern int name_in_names(char *name, struct d_name *names); extern char *_names_to_str(char* buffer, struct d_name *names); extern char *_names_to_str_c(char* buffer, struct d_name *names, char c); #define NAMES_STR_SIZE 255 #define names_to_str(N) _names_to_str(alloca(NAMES_STR_SIZE+1), N) #define names_to_str_c(N, C) _names_to_str_c(alloca(NAMES_STR_SIZE+1), N, C) extern void free_names(struct d_name *names); extern void set_me_in_resource(struct d_resource* res, int match_on_proxy); extern void set_peer_in_resource(struct d_resource* res, int peer_required); extern void set_on_hosts_in_res(struct d_resource *res); extern void set_disk_in_res(struct d_resource *res); extern char *proxy_connection_name(struct d_resource *res); int parse_proxy_settings(struct d_resource *res, int check_proxy_token); /* conn_name is optional and mostly for compatibility with dcmd */ int do_proxy_conn_up(struct d_resource *res, const char *conn_name); int do_proxy_conn_down(struct d_resource *res, const char *conn_name); int 
do_proxy_conn_plugins(struct d_resource *res, const char *conn_name); extern char *config_file; extern char *config_save; extern int config_valid; extern struct d_resource* config; extern struct d_resource* common; extern struct d_globals global_options; extern int line, fline; extern struct hsearch_data global_htable; extern int no_tty; extern int dry_run; extern int verbose; extern char* drbdsetup; extern char* drbd_proxy_ctl; extern char ss_buffer[1024]; extern struct utsname nodeinfo; extern char* setup_opts[10]; extern char* connect_to_host; extern int soi; /* ssprintf() places the result of the printf in the current stack frame and sets ptr to the resulting string. If the current stack frame is destroyed (=function returns), the allocated memory is freed automatically */ /* // This is the nicer version, that does not need the ss_buffer. // But it only works with very new glibcs. #define ssprintf(...) \ ({ int _ss_size = snprintf(0, 0, ##__VA_ARGS__); \ char *_ss_ret = __builtin_alloca(_ss_size+1); \ snprintf(_ss_ret, _ss_size+1, ##__VA_ARGS__); \ _ss_ret; }) */ #define ssprintf(ptr,...) \ ptr=strcpy(alloca(snprintf(ss_buffer,sizeof(ss_buffer),##__VA_ARGS__)+1),ss_buffer) /* CAUTION: arguments may not have side effects! */ #define for_each_resource(res,tmp,config) \ for (res = (config); res && (tmp = res->next, 1); res = tmp) #endif #define APPEND(LIST,ITEM) ({ \ typeof((LIST)) _l = (LIST); \ typeof((ITEM)) _i = (ITEM); \ typeof((ITEM)) _t; \ _i->next = NULL; \ if (_l == NULL) { _l = _i; } \ else { \ for (_t = _l; _t->next; _t = _t->next); \ _t->next = _i; \ }; \ _l; \ }) #define PARSER_CHECK_PROXY_KEYWORD (1) #define PARSER_STOP_IF_INVALID (2) drbd-8.4.4/user/legacy/drbdadm_adjust.c0000664000000000000000000003247011605310253016501 0ustar rootroot/* drbdadm_adjust.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2003-2008, LINBIT Information Technologies GmbH. Copyright (C) 2003-2008, Philipp Reisner . 
Copyright (C) 2003-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #define _GNU_SOURCE #include #include #include #include #include #include #include #include #include #include #include #include "drbdadm.h" #include "drbdtool_common.h" #include "drbdadm_parser.h" /* drbdsetup show might complain that the device minor does not exist at all. Redirect stderr to /dev/null therefore. */ static FILE *m_popen(int *pid,char** argv) { int mpid; int pipes[2]; int dev_null; if(pipe(pipes)) { perror("Creation of pipes failed"); exit(E_exec_error); } dev_null = open("/dev/null", O_WRONLY); if (dev_null == -1) { perror("Opening /dev/null failed"); exit(E_exec_error); } mpid = fork(); if(mpid == -1) { fprintf(stderr,"Can not fork"); exit(E_exec_error); } if(mpid == 0) { close(pipes[0]); // close reading end dup2(pipes[1], fileno(stdout)); close(pipes[1]); dup2(dev_null, fileno(stderr)); close(dev_null); execvp(argv[0],argv); fprintf(stderr,"Can not exec"); exit(E_exec_error); } close(pipes[1]); // close writing end close(dev_null); *pid=mpid; return fdopen(pipes[0],"r"); } /* option value equal? 
*/ static int ov_eq(char* val1, char* val2) { unsigned long long v1,v2; if(val1 == NULL && val2 == NULL) return 1; if(val1 == NULL || val2 == NULL) return 0; if(new_strtoll(val1,0,&v1) == MSE_OK && new_strtoll(val2,0,&v2) == MSE_OK) return v1 == v2; return !strcmp(val1,val2); } static int opts_equal(struct d_option* conf, struct d_option* running) { struct d_option* opt; while(running) { if((opt=find_opt(conf,running->name))) { if(!ov_eq(running->value,opt->value)) { /* printf("Value of '%s' differs: r=%s c=%s\n", opt->name,running->value,opt->value); */ return 0; } opt->mentioned=1; } else { if(!running->is_default) { /*printf("Only in running config %s: %s\n", running->name,running->value);*/ return 0; } } running=running->next; } while(conf) { if(conf->mentioned==0) { /*printf("Only in config file %s: %s\n", conf->name,conf->value);*/ return 0; } conf=conf->next; } return 1; } static int addr_equal(struct d_resource* conf, struct d_resource* running) { int equal; if (conf->peer == NULL && running->peer == NULL) return 1; if (running->peer == NULL) return 0; equal = !strcmp(conf->me->address, running->me->address) && !strcmp(conf->me->port, running->me->port) && !strcmp(conf->me->address_family, running->me->address_family); if(conf->me->proxy) equal = equal && !strcmp(conf->me->proxy->inside_addr, running->peer->address) && !strcmp(conf->me->proxy->inside_port, running->peer->port) && !strcmp(conf->me->proxy->inside_af, running->peer->address_family); else equal = equal && conf->peer && !strcmp(conf->peer->address, running->peer->address) && !strcmp(conf->peer->port, running->peer->port) && !strcmp(conf->peer->address_family, running->peer->address_family); return equal; } static int proto_equal(struct d_resource* conf, struct d_resource* running) { if (conf->protocol == NULL && running->protocol == NULL) return 1; if (conf->protocol == NULL || running->protocol == NULL) return 0; return !strcmp(conf->protocol, running->protocol); } /* Are both internal, or are 
both not internal. */ static int int_eq(char* m_conf, char* m_running) { return !strcmp(m_conf,"internal") == !strcmp(m_running,"internal"); } static int disk_equal(struct d_host_info* conf, struct d_host_info* running) { int eq = 1; if (conf->disk == NULL && running->disk == NULL) return 1; if (conf->disk == NULL || running->disk == NULL) return 0; eq &= !strcmp(conf->disk,running->disk); eq &= int_eq(conf->meta_disk,running->meta_disk); if(!strcmp(conf->meta_disk,"internal")) return eq; eq &= !strcmp(conf->meta_disk,running->meta_disk); return eq; } /* NULL terminated */ static void find_option_in_resources(char *name, struct d_option *list, struct d_option **opt, ...) { va_list va; va_start(va, opt); /* We need to keep setting *opt to NULL, even if a list == NULL. */ while (list || opt) { while (list) { if (strcmp(list->name, name) == 0) break; list = list->next; } *opt = list; list = va_arg(va, struct d_option*); opt = va_arg(va, struct d_option**); } } static int do_proxy_reconf(struct d_resource *res, const char *cmd) { int rv; char *argv[4] = { drbd_proxy_ctl, "-c", (char*)cmd, NULL }; rv = m_system_ex(argv, SLEEPS_SHORT, res); return rv; } #define MAX_PLUGINS (10) #define MAX_PLUGIN_NAME (16) /* The new name is appended to the alist. 
*/ int _is_plugin_in_list(char *string, char slist[MAX_PLUGINS][MAX_PLUGIN_NAME], char alist[MAX_PLUGINS][MAX_PLUGIN_NAME], int list_len) { int word_len, i; char *copy; for(word_len=0; string[word_len]; word_len++) if (isspace(string[word_len])) break; if (word_len+1 >= MAX_PLUGIN_NAME) { fprintf(stderr, "Wrong proxy plugin name %*.*s", word_len, word_len, string); exit(E_config_invalid); } copy = alist[list_len]; strncpy(copy, string, word_len); copy[word_len] = 0; for(i=0; i= MAX_PLUGINS) { fprintf(stderr, "Too many proxy plugins."); exit(E_config_invalid); } return 0; } static int proxy_reconf(struct d_resource *res, struct d_resource *running) { int reconn = 0; struct d_option* res_o, *run_o; unsigned long long v1, v2, minimum; char *plugin_changes[MAX_PLUGINS], *cp, *conn_name; /* It's less memory usage when we're storing char[]. malloc overhead for * the few bytes + pointers is much more. */ char p_res[MAX_PLUGINS][MAX_PLUGIN_NAME], p_run[MAX_PLUGINS][MAX_PLUGIN_NAME]; int used, i, re_do; reconn = 0; find_option_in_resources("memlimit", res->proxy_options, &res_o, running->proxy_options, &run_o, NULL, NULL); v1 = res_o ? m_strtoll(res_o->value, 1) : 0; v2 = run_o ? m_strtoll(run_o->value, 1) : 0; minimum = v1 < v2 ? v1 : v2; /* We allow an ε [epsilon] of 2%, so that small (rounding) deviations do * not cause the connection to be re-established. * The difference is computed without abs(), which takes an int and * would truncate the unsigned long long operands. */ if (res_o && (!run_o || (v1 > v2 ? v1 - v2 : v2 - v1)/(float)minimum > 0.02)) { redo_whole_conn: /* As the memory is in use while the connection is allocated we have to * completely destroy and rebuild the connection. */ schedule_dcmd( do_proxy_conn_down, res, NULL, 0); schedule_dcmd( do_proxy_conn_up, res, NULL, 1); schedule_dcmd( do_proxy_conn_plugins, res, NULL, 2); /* With connection cleanup and reopen everything is rebuilt anyway, and * DRBD will get a reconnect too. 
*/ return 0; } res_o = res->proxy_plugins; run_o = running->proxy_plugins; used = 0; conn_name = proxy_connection_name(res); for(i=0; i= sizeof(plugin_changes)-1) { fprintf(stderr, "Too many proxy plugin changes"); exit(E_config_invalid); } /* Now we can be sure that we can store another pointer. */ if (!res_o) { if (run_o) { /* More plugins running than configured - just stop here. */ m_asprintf(&cp, "set plugin %s %d end", conn_name, i); plugin_changes[used++] = cp; } else { /* Both at the end? ok, quit loop */ } break; } /* res_o != NULL. */ if (!run_o) { p_run[i][0] = 0; if (_is_plugin_in_list(res_o->name, p_run, p_res, i)) { /* Current plugin was already active, just at another position. * Redo the whole connection. */ goto redo_whole_conn; } /* More configured than running - just add it, if it's not already * somewhere else. */ m_asprintf(&cp, "set plugin %s %d %s", conn_name, i, res_o->name); plugin_changes[used++] = cp; } else { /* If we get here, both lists have been filled in parallel, so we * can simply use the common counter. */ re_do = _is_plugin_in_list(res_o->name, p_run, p_res, i) || _is_plugin_in_list(run_o->name, p_res, p_run, i); if (re_do) { /* Plugin(s) were moved, not simply reconfigured. * Re-do the whole connection. */ goto redo_whole_conn; } /* TODO: We don't (yet) account for possible different ordering of * the parameters to the plugin. * plugin A 1 B 2 * should be treated as equal to * plugin B 2 A 1. */ if (strcmp(run_o->name, res_o->name) != 0) { /* Either a different plugin, or just different settings * - plugin can be overwritten. */ m_asprintf(&cp, "set plugin %s %d %s", conn_name, i, res_o->name); plugin_changes[used++] = cp; } } if (res_o) res_o = res_o->next; if (run_o) run_o = run_o->next; } /* change only a few plugin settings. */ for(i=0; iname); err = stat("/dev/drbd/by-res", &sbuf); if (err) /* probably no udev rules in use */ return 0; err = stat(link_name, &sbuf); if (err) /* resource link cannot be stat()ed. 
*/ return 1; /* double check device information */ if (!S_ISBLK(sbuf.st_mode)) return 1; if (major(sbuf.st_rdev) != DRBD_MAJOR) return 1; if (minor(sbuf.st_rdev) != res->me->device_minor) return 1; /* Link exists, and is expected block major:minor. * Do nothing. */ return 0; } /* * CAUTION this modifies global static char * config_file! */ int adm_adjust(struct d_resource* res,char* unused __attribute((unused))) { char* argv[20]; int pid,argc, i; struct d_resource* running; int do_attach=0,do_connect=0,do_syncer=0; int have_disk=0,have_net=0,can_do_proxy=1; char config_file_dummy[250], *conn_name, show_conn[128]; /* disable check_uniq, so it won't interfere * with parsing of drbdsetup show output */ config_valid = 2; /* setup error reporting context for the parsing routines */ line = 1; sprintf(config_file_dummy,"drbdsetup %u show", res->me->device_minor); config_file = config_file_dummy; argc=0; argv[argc++]=drbdsetup; argv[argc++]=res->me->device; argv[argc++]="show"; argv[argc++]=0; /* actually parse drbdsetup show output */ yyin = m_popen(&pid,argv); running = parse_resource(res->name, IgnDiscardMyData); fclose(yyin); waitpid(pid,0,0); /* Sets "me" and "peer" pointer */ post_parse(running, 0); set_peer_in_resource(running, 0); /* Parse proxy settings, if this host has a proxy definition */ if (res->me->proxy) { line = 1; conn_name = proxy_connection_name(res); i=snprintf(show_conn, sizeof(show_conn), "show proxy-settings %s", conn_name); if (i>= sizeof(show_conn)-1) { fprintf(stderr,"connection name too long"); exit(E_thinko); } sprintf(config_file_dummy,"drbd-proxy-ctl -c '%s'", show_conn); config_file = config_file_dummy; argc=0; argv[argc++]=drbd_proxy_ctl; argv[argc++]="-c"; argv[argc++]=show_conn; argv[argc++]=0; /* actually parse "drbd-proxy-ctl show" output */ yyin = m_popen(&pid,argv); can_do_proxy = !parse_proxy_settings(running, PARSER_CHECK_PROXY_KEYWORD | PARSER_STOP_IF_INVALID); fclose(yyin); waitpid(pid,0,0); } do_attach = 
!opts_equal(res->disk_options, running->disk_options); if(running->me) { do_attach |= (res->me->device_minor != running->me->device_minor); do_attach |= !disk_equal(res->me, running->me); have_disk = (running->me->disk != NULL); } else do_attach |= 1; do_connect = !opts_equal(res->net_options, running->net_options); do_connect |= !addr_equal(res,running); do_connect |= !proto_equal(res,running); /* No adjust support for drbd proxy version 1. */ if (res->me->proxy && can_do_proxy) do_connect |= proxy_reconf(res,running); have_net = (running->protocol != NULL); do_syncer = !opts_equal(res->sync_options, running->sync_options); /* Special case: nothing changed, but the resource name. * Trigger a no-op syncer request, which will cause a KOBJ_CHANGE * to be broadcast, so udev may pick up the resource name change * and update its symlinks. */ if (!(do_attach || do_syncer || do_connect)) do_syncer = need_trigger_kobj_change(running); if(do_attach) { if(have_disk) schedule_dcmd(adm_generic_s,res,"detach",0); schedule_dcmd(adm_attach,res,"attach",0); } if(do_syncer) schedule_dcmd(adm_syncer,res,"syncer",1); if(do_connect) { if (have_net && res->peer) schedule_dcmd(adm_generic_s,res,"disconnect",0); schedule_dcmd(adm_connect,res,"connect",2); } return 0; } drbd-8.4.4/user/legacy/drbdadm_main.c0000664000000000000000000025270412132747531016150 0ustar rootroot/* drbdadm_main.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2002-2008, LINBIT Information Technologies GmbH. Copyright (C) 2002-2008, Philipp Reisner . Copyright (C) 2002-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #define _GNU_SOURCE #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "linux/drbd_limits.h" #include "drbdtool_common.h" #include "drbdadm.h" #define MAX_ARGS 40 static int indent = 0; #define INDENT_WIDTH 4 #define BFMT "%s;\n" #define IPV4FMT "%-16s %s %s:%s;\n" #define IPV6FMT "%-16s %s [%s]:%s;\n" #define MDISK "%-16s %s [%s];\n" #define FMDISK "%-16s %s;\n" #define printI(fmt, args... ) printf("%*s" fmt,INDENT_WIDTH * indent,"" , ## args ) #define printA(name, val ) \ printf("%*s%*s %3s;\n", \ INDENT_WIDTH * indent,"" , \ -24+INDENT_WIDTH * indent, \ name, val ) char *progname; struct adm_cmd { const char *name; int (*function) (struct d_resource *, const char *); /* which level this command is for. * 0: don't show this command, ever * 1: normal administrative commands, shown in normal help * 2-4: shown on "drbdadm hidden-commands" * 2: useful for shell scripts * 3: callbacks potentially called from kernel module on certain events * 4: advanced, experts and developers only */ unsigned int show_in_usage:3; /* if set, command requires an explicit resource name */ unsigned int res_name_required:1; /* error out if the ip specified is not available/active now */ unsigned int verify_ips:1; /* if set, use the "cache" in /var/lib/drbd to figure out * which config file to use. 
* This is necessary for handlers (callbacks from kernel) to work * when using "drbdadm -c /some/other/config/file" */ unsigned int use_cached_config_file:1; unsigned int need_peer:1; unsigned int is_proxy_cmd:1; unsigned int uc_dialog:1; /* May show usage count dialog */ unsigned int test_config:1; /* Allow -t option */ }; struct deferred_cmd { int (*function) (struct d_resource *, const char *); const char *arg; struct d_resource *res; struct deferred_cmd *next; }; struct option admopt[] = { {"stacked", no_argument, 0, 'S'}, {"dry-run", no_argument, 0, 'd'}, {"verbose", no_argument, 0, 'v'}, {"config-file", required_argument, 0, 'c'}, {"config-to-test", required_argument, 0, 't'}, {"drbdsetup", required_argument, 0, 's'}, {"drbdmeta", required_argument, 0, 'm'}, {"drbd-proxy-ctl", required_argument, 0, 'p'}, {"sh-varname", required_argument, 0, 'n'}, {"force", no_argument, 0, 'f'}, {"peer", required_argument, 0, 'P'}, {"version", no_argument, 0, 'V'}, {0, 0, 0, 0} }; extern int my_parse(); extern int yydebug; extern FILE *yyin; int adm_attach(struct d_resource *, const char *); int adm_connect(struct d_resource *, const char *); int adm_generic_s(struct d_resource *, const char *); int adm_status_xml(struct d_resource *, const char *); int adm_generic_l(struct d_resource *, const char *); int adm_resize(struct d_resource *, const char *); int adm_syncer(struct d_resource *, const char *); static int adm_up(struct d_resource *, const char *); extern int adm_adjust(struct d_resource *, const char *); static int adm_dump(struct d_resource *, const char *); static int adm_dump_xml(struct d_resource *, const char *); static int adm_wait_c(struct d_resource *, const char *); static int adm_wait_ci(struct d_resource *, const char *); static int adm_proxy_up(struct d_resource *, const char *); static int adm_proxy_down(struct d_resource *, const char *); static int sh_nop(struct d_resource *, const char *); static int sh_resources(struct d_resource *, const char *); 
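/*
 * Illustration only, not part of drbdadm: a minimal sketch of the
 * "deferred command" idea behind struct deferred_cmd, schedule_dcmd()
 * and run_dcmds(). Commands are appended to one of three per-order
 * queues and run later, lowest order first, in the order they were
 * scheduled. All names here (demo_cmd_fn, demo_schedule, demo_run_order,
 * demo_run_all, demo_record, demo_trace) are hypothetical, and the
 * callback takes only a string where the real one also takes a resource.
 * One deliberate difference: demo_run_all() always runs every order and
 * returns the highest return value, whereas run_dcmds() chains the three
 * calls with "||".
 *
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NR_ORDERS 3

// Hypothetical callback type; a sketch, not the drbdadm signature.
typedef int (*demo_cmd_fn)(const char *arg);

struct demo_deferred {
	demo_cmd_fn fn;
	const char *arg;
	struct demo_deferred *next;
};

static struct demo_deferred *demo_head[DEMO_NR_ORDERS];
static struct demo_deferred *demo_tail[DEMO_NR_ORDERS];
static char demo_trace[64];

// Sample command: appends its argument to demo_trace and reports success.
int demo_record(const char *arg)
{
	strncat(demo_trace, arg, sizeof(demo_trace) - strlen(demo_trace) - 1);
	return 0;
}

// Append one command to the queue of the given order; 0 on success.
int demo_schedule(demo_cmd_fn fn, const char *arg, int order)
{
	struct demo_deferred *d;

	assert(order >= 0 && order < DEMO_NR_ORDERS);
	d = calloc(1, sizeof(*d));
	if (!d)
		return -1;
	d->fn = fn;
	d->arg = arg;
	if (!demo_head[order])		// first to come is head
		demo_head[order] = d;
	if (demo_tail[order])		// link it in at the tail
		demo_tail[order]->next = d;
	demo_tail[order] = d;		// advance the tail
	return 0;
}

// Run and free everything queued for one order; returns the highest code.
int demo_run_order(int order)
{
	struct demo_deferred *d = demo_head[order], *next;
	int rv = 0, r;

	while (d) {
		r = d->fn(d->arg);
		next = d->next;
		free(d);
		d = next;
		if (r > rv)
			rv = r;
	}
	demo_head[order] = demo_tail[order] = NULL;
	return rv;
}

// Assumption of this sketch: run all orders even if an earlier one failed.
int demo_run_all(void)
{
	int order, rv = 0, r;

	for (order = 0; order < DEMO_NR_ORDERS; order++) {
		r = demo_run_order(order);
		if (r > rv)
			rv = r;
	}
	return rv;
}
```
 *
 * Scheduling "connect" work in order 2 and "attach" work in order 0 this
 * way means the attach commands run first, regardless of the order in
 * which the resources were examined.
 */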
static int sh_resource(struct d_resource *, const char *); static int sh_mod_parms(struct d_resource *, const char *); static int sh_dev(struct d_resource *, const char *); static int sh_udev(struct d_resource *, const char *); static int sh_minor(struct d_resource *, const char *); static int sh_ip(struct d_resource *, const char *); static int sh_lres(struct d_resource *, const char *); static int sh_ll_dev(struct d_resource *, const char *); static int sh_md_dev(struct d_resource *, const char *); static int sh_md_idx(struct d_resource *, const char *); static int sh_b_pri(struct d_resource *, const char *); static int sh_status(struct d_resource *, const char *); static int admm_generic(struct d_resource *, const char *); static int adm_khelper(struct d_resource *, const char *); static int adm_generic_b(struct d_resource *, const char *); static int hidden_cmds(struct d_resource *, const char *); static int adm_outdate(struct d_resource *, const char *); static int adm_chk_resize(struct d_resource *res, const char *cmd); static void dump_options(char *name, struct d_option *opts); static char *get_opt_val(struct d_option *, const char *, char *); static void register_config_file(struct d_resource *res, const char *cfname); static struct ifreq *get_ifreq(); char ss_buffer[1024]; struct utsname nodeinfo; int line = 1; int fline; struct d_globals global_options = { 0, 0, 0, 1, UC_ASK }; char *config_file = NULL; char *config_save = NULL; char *config_test = NULL; struct d_resource *config = NULL; struct d_resource *common = NULL; struct ifreq *ifreq_list = NULL; int is_drbd_top; int nr_resources; int nr_stacked; int nr_normal; int nr_ignore; int highest_minor; int config_from_stdin = 0; int config_valid = 1; int no_tty; int dry_run = 0; int verbose = 0; int do_verify_ips = 0; int do_register_minor = 1; /* whether drbdadm was called with "all" instead of resource name(s) */ int all_resources = 0; char *drbdsetup = NULL; char *drbdmeta = NULL; char *drbd_proxy_ctl; 
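/*
 * Illustration only, not part of drbdadm: a minimal sketch of how the
 * DRBD_acf* flag shortcuts work in the cmds[] table. Each shortcut
 * expands to a list of C99 designated initializers that ends in a comma,
 * so an entry can append further designators directly after the macro
 * (as the "sh-nop" entry does), and any bit-field that is not named is
 * zero. The struct, macros and lookup helper here (demo_cmd,
 * DEMO_acf_default, DEMO_acf_shell, demo_cmds, demo_find) are
 * hypothetical and carry only a subset of the real flags.
 *
```c
#include <stddef.h>
#include <string.h>

// Hypothetical, reduced version of a command descriptor with flag bits.
struct demo_cmd {
	const char *name;
	unsigned int show_in_usage:3;
	unsigned int res_name_required:1;
	unsigned int verify_ips:1;
	unsigned int test_config:1;
};

// Each shortcut ends in a comma so more designators may follow it.
#define DEMO_acf_default \
	.show_in_usage = 1, \
	.res_name_required = 1,

#define DEMO_acf_shell \
	.show_in_usage = 2,

const struct demo_cmd demo_cmds[] = {
	{"attach", DEMO_acf_default},
	{"dump", DEMO_acf_default .verify_ips = 1, .test_config = 1},
	{"sh-nop", DEMO_acf_shell .test_config = 1},
};

// Look a command up by name; returns NULL when the name is unknown.
const struct demo_cmd *demo_find(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(demo_cmds) / sizeof(demo_cmds[0]); i++)
		if (!strcmp(demo_cmds[i].name, name))
			return &demo_cmds[i];
	return NULL;
}
```
 *
 * Adding a new flag then only means touching the shortcuts that should
 * set it; table entries that rely on the zero default stay unchanged.
 */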
char *sh_varname = NULL; char *setup_opts[10]; char *connect_to_host = NULL; int soi = 0; volatile int alarm_raised; struct deferred_cmd *deferred_cmds[3] = { NULL, NULL, NULL }; struct deferred_cmd *deferred_cmds_tail[3] = { NULL, NULL, NULL }; /* DRBD adm_cmd flags shortcuts, * to avoid merge conflicts and unreadable diffs * when we add the next flag */ #define DRBD_acf1_default \ .show_in_usage = 1, \ .res_name_required = 1, \ .verify_ips = 0, \ .uc_dialog = 1, \ #define DRBD_acf1_connect \ .show_in_usage = 1, \ .res_name_required = 1, \ .verify_ips = 1, \ .need_peer = 1, \ .uc_dialog = 1, \ #define DRBD_acf1_defnet \ .show_in_usage = 1, \ .res_name_required = 1, \ .verify_ips = 1, \ .uc_dialog = 1, \ #define DRBD_acf3_handler \ .show_in_usage = 3, \ .res_name_required = 1, \ .verify_ips = 0, \ .use_cached_config_file = 1, \ #define DRBD_acf4_advanced \ .show_in_usage = 4, \ .res_name_required = 1, \ .verify_ips = 0, \ .uc_dialog = 1, \ #define DRBD_acf1_dump \ .show_in_usage = 1, \ .res_name_required = 1, \ .verify_ips = 1, \ .uc_dialog = 1, \ .test_config = 1, \ #define DRBD_acf2_shell \ .show_in_usage = 2, \ .res_name_required = 1, \ .verify_ips = 0, \ #define DRBD_acf2_proxy \ .show_in_usage = 2, \ .res_name_required = 1, \ .verify_ips = 0, \ .need_peer = 1, \ .is_proxy_cmd = 1, \ #define DRBD_acf2_hook \ .show_in_usage = 2, \ .res_name_required = 1, \ .verify_ips = 0, \ .use_cached_config_file = 1, \ #define DRBD_acf2_gen_shell \ .show_in_usage = 2, \ .res_name_required = 0, \ .verify_ips = 0, \ struct adm_cmd cmds[] = { /* name, function, flags * sort order: * - normal config commands, * - normal meta data manipulation * - sh-* * - handler * - advanced ***/ {"attach", adm_attach, DRBD_acf1_default}, {"detach", adm_generic_l, DRBD_acf1_default}, {"connect", adm_connect, DRBD_acf1_connect}, {"disconnect", adm_generic_s, DRBD_acf1_default}, {"up", adm_up, DRBD_acf1_connect}, {"down", adm_generic_l, DRBD_acf1_default}, {"primary", adm_generic_l, 
DRBD_acf1_default}, {"secondary", adm_generic_l, DRBD_acf1_default}, {"invalidate", adm_generic_b, DRBD_acf1_default}, {"invalidate-remote", adm_generic_l, DRBD_acf1_defnet}, {"outdate", adm_outdate, DRBD_acf1_default}, {"resize", adm_resize, DRBD_acf1_defnet}, {"syncer", adm_syncer, DRBD_acf1_defnet}, {"verify", adm_generic_s, DRBD_acf1_defnet}, {"pause-sync", adm_generic_s, DRBD_acf1_defnet}, {"resume-sync", adm_generic_s, DRBD_acf1_defnet}, {"adjust", adm_adjust, DRBD_acf1_connect}, {"wait-connect", adm_wait_c, DRBD_acf1_defnet}, {"wait-con-int", adm_wait_ci, .show_in_usage = 1,.verify_ips = 1,}, {"status", adm_status_xml, DRBD_acf2_gen_shell}, {"role", adm_generic_s, DRBD_acf1_default}, {"cstate", adm_generic_s, DRBD_acf1_default}, {"dstate", adm_generic_b, DRBD_acf1_default}, {"dump", adm_dump, DRBD_acf1_dump}, {"dump-xml", adm_dump_xml, DRBD_acf1_dump}, {"create-md", adm_create_md, DRBD_acf1_default}, {"show-gi", adm_generic_b, DRBD_acf1_default}, {"get-gi", adm_generic_b, DRBD_acf1_default}, {"dump-md", admm_generic, DRBD_acf1_default}, {"wipe-md", admm_generic, DRBD_acf1_default}, {"hidden-commands", hidden_cmds,.show_in_usage = 1,}, {"sh-nop", sh_nop, DRBD_acf2_gen_shell .uc_dialog = 1, .test_config = 1}, {"sh-resources", sh_resources, DRBD_acf2_gen_shell}, {"sh-resource", sh_resource, DRBD_acf2_shell}, {"sh-mod-parms", sh_mod_parms, DRBD_acf2_gen_shell}, {"sh-dev", sh_dev, DRBD_acf2_shell}, {"sh-udev", sh_udev, DRBD_acf2_hook}, {"sh-minor", sh_minor, DRBD_acf2_shell}, {"sh-ll-dev", sh_ll_dev, DRBD_acf2_shell}, {"sh-md-dev", sh_md_dev, DRBD_acf2_shell}, {"sh-md-idx", sh_md_idx, DRBD_acf2_shell}, {"sh-ip", sh_ip, DRBD_acf2_shell}, {"sh-lr-of", sh_lres, DRBD_acf2_shell}, {"sh-b-pri", sh_b_pri, DRBD_acf2_shell}, {"sh-status", sh_status, DRBD_acf2_gen_shell}, {"proxy-up", adm_proxy_up, DRBD_acf2_proxy}, {"proxy-down", adm_proxy_down, DRBD_acf2_proxy}, {"before-resync-target", adm_khelper, DRBD_acf3_handler}, {"after-resync-target", adm_khelper, 
DRBD_acf3_handler}, {"before-resync-source", adm_khelper, DRBD_acf3_handler}, {"pri-on-incon-degr", adm_khelper, DRBD_acf3_handler}, {"pri-lost-after-sb", adm_khelper, DRBD_acf3_handler}, {"fence-peer", adm_khelper, DRBD_acf3_handler}, {"local-io-error", adm_khelper, DRBD_acf3_handler}, {"pri-lost", adm_khelper, DRBD_acf3_handler}, {"initial-split-brain", adm_khelper, DRBD_acf3_handler}, {"split-brain", adm_khelper, DRBD_acf3_handler}, {"out-of-sync", adm_khelper, DRBD_acf3_handler}, {"suspend-io", adm_generic_s, DRBD_acf4_advanced}, {"resume-io", adm_generic_s, DRBD_acf4_advanced}, {"set-gi", admm_generic, DRBD_acf4_advanced}, {"new-current-uuid", adm_generic_s, DRBD_acf4_advanced}, {"check-resize", adm_chk_resize, DRBD_acf4_advanced}, }; void schedule_dcmd(int (*function) (struct d_resource *, const char *), struct d_resource *res, char *arg, int order) { struct deferred_cmd *d, *t; d = calloc(1, sizeof(struct deferred_cmd)); if (d == NULL) { perror("calloc"); exit(E_exec_error); } d->function = function; d->res = res; d->arg = arg; /* first to come is head */ if (!deferred_cmds[order]) deferred_cmds[order] = d; /* link it in at tail */ t = deferred_cmds_tail[order]; if (t) t->next = d; /* advance tail */ deferred_cmds_tail[order] = d; } static void _adm_generic(struct d_resource *res, const char *cmd, int flags, pid_t *pid, int *fd, int *ex); /* Returns non-zero if the resource is down. 
*/ static int test_if_resource_is_down(struct d_resource *res) { char buf[1024]; int rr, s = 0; int fd; pid_t pid; int old_verbose = verbose; if (dry_run) { fprintf(stderr, "Logic bug: should not be dry-running here.\n"); exit(E_thinko); } if (verbose == 1) verbose = 0; _adm_generic(res, "role", SLEEPS_SHORT | RETURN_STDOUT_FD | SUPRESS_STDERR, &pid, &fd, NULL); verbose = old_verbose; if (fd < 0) { fprintf(stderr, "Strange: got negative fd.\n"); exit(E_thinko); } while (1) { rr = read(fd, buf + s, sizeof(buf) - s); if (rr <= 0) break; s += rr; } close(fd); waitpid(pid, NULL, 0); /* Reap the child process, do not leave a zombie around. */ alarm(0); if (s == 0 || strncmp(buf, "Unconfigured", strlen("Unconfigured")) == 0) return 1; return 0; } enum do_register { SAME_ANYWAYS, DO_REGISTER }; enum do_register if_conf_differs_confirm_or_abort(struct d_resource *res) { int minor = res->me->device_minor; char *f; /* if the resource was down, * just register the new config file */ if (test_if_resource_is_down(res)) { unregister_minor(minor); return DO_REGISTER; } f = lookup_minor(minor); /* if there was nothing registered before, * there is nothing to compare to */ if (!f) return DO_REGISTER; /* no need to register the same thing again */ if (strcmp(f, config_save) == 0) return SAME_ANYWAYS; fprintf(stderr, "Warning: resource %s\n" "last used config file: %s\n" " current config file: %s\n", res->name, f, config_save); /* implicitly force if we don't have a tty */ if (no_tty) force = 1; if (!confirmed("Do you want to proceed " "and register the current config file?")) { printf("Operation canceled.\n"); exit(E_usage); } return DO_REGISTER; } static void register_config_file(struct d_resource *res, const char *cfname) { int minor = res->me->device_minor; if (test_if_resource_is_down(res)) unregister_minor(minor); else register_minor(minor, cfname); } enum on_error { KEEP_RUNNING, EXIT_ON_FAIL }; int call_cmd_fn(int (*function) (struct d_resource *, const char *), const char 
*fn_name, struct d_resource *res, enum on_error on_error) { int rv; int really_register = do_register_minor && DO_REGISTER == if_conf_differs_confirm_or_abort(res) && /* adm_up and adm_adjust only * "schedule" the commands, don't register yet! */ function != adm_up && function != adm_adjust; rv = function(res, fn_name); if (rv >= 20) { fprintf(stderr, "%s %s %s: exited with code %d\n", progname, fn_name, res->name, rv); if (on_error == EXIT_ON_FAIL) exit(rv); } if (rv == 0 && really_register) register_config_file(res, config_save); return rv; } int call_cmd(struct adm_cmd *cmd, struct d_resource *res, enum on_error on_error) { if (!res->peer) set_peer_in_resource(res, cmd->need_peer); return call_cmd_fn(cmd->function, cmd->name, res, on_error); } int _run_dcmds(int order) { struct deferred_cmd *d = deferred_cmds[order]; struct deferred_cmd *t; int r = 0; int rv = 0; while (d) { r = call_cmd_fn(d->function, d->arg, d->res, KEEP_RUNNING); t = d->next; free(d); d = t; if (r > rv) rv = r; } return rv; } int run_dcmds(void) { return _run_dcmds(0) || _run_dcmds(1) || _run_dcmds(2); } /*** These functions are used to print the config ***/ static char *esc(char *str) { static char buffer[1024]; char *ue = str, *e = buffer; if (!str || !str[0]) { return "\"\""; } if (strchr(str, ' ') || strchr(str, '\t') || strchr(str, '\\')) { *e++ = '"'; while (*ue) { if (*ue == '"' || *ue == '\\') { *e++ = '\\'; } if (e - buffer >= 1022) { fprintf(stderr, "string too long.\n"); exit(E_syntax); } *e++ = *ue++; if (e - buffer >= 1022) { fprintf(stderr, "string too long.\n"); exit(E_syntax); } } *e++ = '"'; *e++ = '\0'; return buffer; } return str; } static char *esc_xml(char *str) { static char buffer[1024]; char *ue = str, *e = buffer; if (!str || !str[0]) { return ""; } if (strchr(str, '"') || strchr(str, '\'') || strchr(str, '<') || strchr(str, '>') || strchr(str, '&') || strchr(str, '\\')) { while (*ue) { if (*ue == '"' || *ue == '\\') { *e++ = '\\'; if (e - buffer >= 1021) { 
fprintf(stderr, "string too long.\n"); exit(E_syntax); } *e++ = *ue++; } else if (*ue == '\'' || *ue == '<' || *ue == '>' || *ue == '&') { if (*ue == '\'' && e - buffer < 1017) { strcpy(e, "&apos;"); e += 6; } else if (*ue == '<' && e - buffer < 1019) { strcpy(e, "&lt;"); e += 4; } else if (*ue == '>' && e - buffer < 1019) { strcpy(e, "&gt;"); e += 4; } else if (*ue == '&' && e - buffer < 1018) { strcpy(e, "&amp;"); e += 5; } else { fprintf(stderr, "string too long.\n"); exit(E_syntax); } ue++; } else { *e++ = *ue++; if (e - buffer >= 1022) { fprintf(stderr, "string too long.\n"); exit(E_syntax); } } } *e++ = '\0'; return buffer; } return str; } static void dump_options2(char *name, struct d_option *opts, void(*within)(void*), void *ctx) { if (!opts && !(within && ctx)) return; printI("%s {\n", name); ++indent; while (opts) { if (opts->value) printA(opts->name, opts->is_escaped ? opts->value : esc(opts->value)); else printI(BFMT, opts->name); opts = opts->next; } if (within) within(ctx); --indent; printI("}\n"); } static void dump_options(char *name, struct d_option *opts) { dump_options2(name, opts, NULL, NULL); } void dump_proxy_plugins(void *ctx) { struct d_option *opt = ctx; dump_options("plugin", opt); } static void dump_global_info() { if (!global_options.minor_count && !global_options.disable_ip_verification && global_options.dialog_refresh == 1) return; printI("global {\n"); ++indent; if (global_options.disable_ip_verification) printI("disable-ip-verification;\n"); if (global_options.minor_count) printI("minor-count %i;\n", global_options.minor_count); if (global_options.dialog_refresh != 1) printI("dialog-refresh %i;\n", global_options.dialog_refresh); --indent; printI("}\n\n"); } static void fake_startup_options(struct d_resource *res); static void dump_common_info() { if (!common) return; printI("common {\n"); ++indent; if (common->protocol) printA("protocol", common->protocol); fake_startup_options(common); dump_options("net", common->net_options); 
dump_options("disk", common->disk_options); dump_options("syncer", common->sync_options); dump_options("startup", common->startup_options); dump_options2("proxy", common->proxy_options, dump_proxy_plugins, common->proxy_plugins); dump_options("handlers", common->handlers); --indent; printf("}\n\n"); } static void dump_address(char *name, char *addr, char *port, char *af) { if (!strcmp(af, "ipv6")) printI(IPV6FMT, name, af, addr, port); else printI(IPV4FMT, name, af, addr, port); } static void dump_proxy_info(struct d_proxy_info *pi) { printI("proxy on %s {\n", names_to_str(pi->on_hosts)); ++indent; dump_address("inside", pi->inside_addr, pi->inside_port, pi->inside_af); dump_address("outside", pi->outside_addr, pi->outside_port, pi->outside_af); --indent; printI("}\n"); } static void dump_host_info(struct d_host_info *hi) { if (!hi) { printI(" # No host section data available.\n"); return; } if (hi->lower) { printI("stacked-on-top-of %s {\n", esc(hi->lower->name)); ++indent; printI("# on %s \n", names_to_str(hi->on_hosts)); } else if (hi->by_address) { if (!strcmp(hi->address_family, "ipv6")) printI("floating ipv6 [%s]:%s {\n", hi->address, hi->port); else printI("floating %s %s:%s {\n", hi->address_family, hi->address, hi->port); ++indent; } else { printI("on %s {\n", names_to_str(hi->on_hosts)); ++indent; } printI("device%*s", -19 + INDENT_WIDTH * indent, ""); if (hi->device) printf("%s ", esc(hi->device)); printf("minor %d;\n", hi->device_minor); if (!hi->lower) printA("disk", esc(hi->disk)); if (!hi->by_address) dump_address("address", hi->address, hi->port, hi->address_family); if (!hi->lower) { if (!strncmp(hi->meta_index, "flex", 4)) printI(FMDISK, "flexible-meta-disk", esc(hi->meta_disk)); else if (!strcmp(hi->meta_index, "internal")) printA("meta-disk", "internal"); else printI(MDISK, "meta-disk", esc(hi->meta_disk), hi->meta_index); } if (hi->proxy) dump_proxy_info(hi->proxy); --indent; printI("}\n"); } static void dump_options_xml2(char *name, struct 
d_option *opts, void(*within)(void*), void *ctx) { if (!opts && !(within && ctx)) return; printI("
\n", name); ++indent; while (opts) { if (opts->value) printI("
\n"); } static void dump_options_xml(char *name, struct d_option *opts) { dump_options_xml2(name, opts, NULL, NULL); } void dump_proxy_plugins_xml(void *ctx) { struct d_option *opt = ctx; dump_options_xml("plugin", opt); } static void dump_global_info_xml() { if (!global_options.minor_count && !global_options.disable_ip_verification && global_options.dialog_refresh == 1) return; printI("\n"); ++indent; if (global_options.disable_ip_verification) printI("\n"); if (global_options.minor_count) printI("\n", global_options.minor_count); if (global_options.dialog_refresh != 1) printI("\n", global_options.dialog_refresh); --indent; printI("\n"); } static void dump_common_info_xml() { if (!common) return; printI("protocol) printf(" protocol=\"%s\"", common->protocol); printf(">\n"); ++indent; fake_startup_options(common); dump_options_xml("net", common->net_options); dump_options_xml("disk", common->disk_options); dump_options_xml("syncer", common->sync_options); dump_options_xml("startup", common->startup_options); dump_options2("proxy", common->proxy_options, dump_proxy_plugins, common->proxy_plugins); dump_options_xml("handlers", common->handlers); --indent; printI("\n"); } static void dump_proxy_info_xml(struct d_proxy_info *pi) { printI("\n", names_to_str(pi->on_hosts)); ++indent; printI("%s\n", pi->inside_af, pi->inside_port, pi->inside_addr); printI("%s\n", pi->outside_af, pi->outside_port, pi->outside_addr); --indent; printI("\n"); } static void dump_host_info_xml(struct d_host_info *hi) { if (!hi) { printI("\n"); return; } if (hi->by_address) printI("\n"); else printI("\n", names_to_str(hi->on_hosts)); ++indent; printI("%s\n", hi->device_minor, esc_xml(hi->device)); printI("%s\n", esc_xml(hi->disk)); printI("
%s
\n", hi->address_family, hi->port, hi->address); if (!strncmp(hi->meta_index, "flex", 4)) printI("%s\n", esc_xml(hi->meta_disk)); else if (!strcmp(hi->meta_index, "internal")) printI("internal\n"); else { printI("%s\n", hi->meta_index, esc_xml(hi->meta_disk)); } if (hi->proxy) dump_proxy_info_xml(hi->proxy); --indent; printI("
\n"); } static void fake_startup_options(struct d_resource *res) { struct d_option *opt; char *val; if (res->stacked_timeouts) { opt = new_opt(strdup("stacked-timeouts"), NULL); res->startup_options = APPEND(res->startup_options, opt); } if (res->become_primary_on) { val = strdup(names_to_str(res->become_primary_on)); opt = new_opt(strdup("become-primary-on"), val); opt->is_escaped = 1; res->startup_options = APPEND(res->startup_options, opt); } } static int adm_dump(struct d_resource *res, const char *unused __attribute((unused))) { struct d_host_info *host; printI("# resource %s on %s: %s, %s\n", esc(res->name), nodeinfo.nodename, res->ignore ? "ignored" : "not ignored", res->stacked ? "stacked" : "not stacked"); printI("resource %s {\n", esc(res->name)); ++indent; if (res->protocol) printA("protocol", res->protocol); for (host = res->all_hosts; host; host = host->next) dump_host_info(host); fake_startup_options(res); dump_options("net", res->net_options); dump_options("disk", res->disk_options); dump_options("syncer", res->sync_options); dump_options("startup", res->startup_options); dump_options2("proxy", res->proxy_options, dump_proxy_plugins, res->proxy_plugins); dump_options("handlers", res->handlers); --indent; printf("}\n\n"); return 0; } static int adm_dump_xml(struct d_resource *res, const char *unused __attribute((unused))) { struct d_host_info *host; printI("name)); if (res->protocol) printf(" protocol=\"%s\"", res->protocol); printf(">\n"); ++indent; // else if (common && common->protocol) printA("# common protocol", common->protocol); for (host = res->all_hosts; host; host = host->next) dump_host_info_xml(host); fake_startup_options(res); dump_options_xml("net", res->net_options); dump_options_xml("disk", res->disk_options); dump_options_xml("syncer", res->sync_options); dump_options_xml("startup", res->startup_options); dump_options_xml2("proxy", res->proxy_options, dump_proxy_plugins_xml, res->proxy_plugins); dump_options_xml("handlers", 
res->handlers); --indent; printI("\n"); return 0; } static int sh_nop(struct d_resource *ignored __attribute((unused)), const char *unused __attribute((unused))) { return 0; } static int sh_resources(struct d_resource *ignored __attribute((unused)), const char *unused __attribute((unused))) { struct d_resource *res, *t; int first = 1; for_each_resource(res, t, config) { if (res->ignore) continue; if (is_drbd_top != res->stacked) continue; printf(first ? "%s" : " %s", esc(res->name)); first = 0; } if (!first) printf("\n"); return 0; } static int sh_resource(struct d_resource *res, const char *unused __attribute((unused))) { printf("%s\n", res->name); return 0; } static int sh_dev(struct d_resource *res, const char *unused __attribute((unused))) { printf("%s\n", res->me->device); return 0; } static int sh_udev(struct d_resource *res, const char *unused __attribute((unused))) { /* No shell escape necessary. Udev does not handle it anyways... */ printf("RESOURCE=%s\n", res->name); if (!strncmp(res->me->device, "/dev/drbd", 9)) printf("DEVICE=%s\n", res->me->device + 5); else printf("DEVICE=drbd%u\n", res->me->device_minor); if (!strncmp(res->me->disk, "/dev/", 5)) printf("DISK=%s\n", res->me->disk + 5); else printf("DISK=%s\n", res->me->disk); return 0; } static int sh_minor(struct d_resource *res, const char *unused __attribute((unused))) { printf("%d\n", res->me->device_minor); return 0; } static int sh_ip(struct d_resource *res, const char *unused __attribute((unused))) { printf("%s\n", res->me->address); return 0; } static int sh_lres(struct d_resource *res, const char *unused __attribute((unused))) { if (!is_drbd_top) { fprintf(stderr, "sh-lower-resource only available in stacked mode\n"); exit(E_usage); } if (!res->stacked) { fprintf(stderr, "'%s' is not stacked on this host (%s)\n", res->name, nodeinfo.nodename); exit(E_usage); } printf("%s\n", res->me->lower->name); return 0; } static int sh_ll_dev(struct d_resource *res, const char *unused 
__attribute((unused))) { printf("%s\n", res->me->disk); return 0; } static int sh_md_dev(struct d_resource *res, const char *unused __attribute((unused))) { char *r; if (strcmp("internal", res->me->meta_disk) == 0) r = res->me->disk; else r = res->me->meta_disk; printf("%s\n", r); return 0; } static int sh_md_idx(struct d_resource *res, const char *unused __attribute((unused))) { printf("%s\n", res->me->meta_index); return 0; } static int sh_b_pri(struct d_resource *res, const char *unused __attribute((unused))) { int i, rv; if (name_in_names(nodeinfo.nodename, res->become_primary_on) || name_in_names("both", res->become_primary_on)) { /* Upon connect, resync starts and both sides try to become primary at the same time. One side's attempt might be declined because another state transition is in progress. Retry. */ for (i = 0; i < 5; i++) { rv = adm_generic_s(res, "primary"); if (rv == 0) return rv; sleep(1); } return rv; } return 0; } static int sh_mod_parms(struct d_resource *res __attribute((unused)), const char *unused __attribute((unused))) { int mc = global_options.minor_count; if (mc == 0) { mc = highest_minor + 11; if (mc > DRBD_MINOR_COUNT_MAX) mc = DRBD_MINOR_COUNT_MAX; if (mc < DRBD_MINOR_COUNT_DEF) mc = DRBD_MINOR_COUNT_DEF; } printf("minor_count=%d\n", mc); return 0; } static void free_host_info(struct d_host_info *hi) { if (!hi) return; free_names(hi->on_hosts); free(hi->device); free(hi->disk); free(hi->address); free(hi->address_family); free(hi->port); free(hi->meta_disk); free(hi->meta_index); } static void free_options(struct d_option *opts) { struct d_option *f; while (opts) { free(opts->name); free(opts->value); f = opts; opts = opts->next; free(f); } } static void free_config(struct d_resource *res) { struct d_resource *f, *t; struct d_host_info *host; for_each_resource(f, t, res) { free(f->name); free(f->protocol); free(f->device); free(f->disk); free(f->meta_disk); free(f->meta_index); for (host = f->all_hosts; host; host = host->next) free_host_info(host); 
free_options(f->net_options); free_options(f->disk_options); free_options(f->sync_options); free_options(f->startup_options); free_options(f->proxy_options); free_options(f->handlers); free(f); } if (common) { free_options(common->net_options); free_options(common->disk_options); free_options(common->sync_options); free_options(common->startup_options); free_options(common->proxy_options); free_options(common->handlers); free(common); } if (ifreq_list) free(ifreq_list); } static void expand_opts(struct d_option *co, struct d_option **opts) { struct d_option *no; while (co) { if (!find_opt(*opts, co->name)) { // prepend new item to opts no = new_opt(strdup(co->name), co->value ? strdup(co->value) : NULL); no->next = *opts; *opts = no; } co = co->next; } } static void expand_common(void) { struct d_resource *res, *tmp; struct d_host_info *h; for_each_resource(res, tmp, config) { for (h = res->all_hosts; h; h = h->next) { if (!h->device) m_asprintf(&h->device, "/dev/drbd%u", h->device_minor); } } if (!common) return; for_each_resource(res, tmp, config) { expand_opts(common->net_options, &res->net_options); expand_opts(common->disk_options, &res->disk_options); expand_opts(common->sync_options, &res->sync_options); expand_opts(common->startup_options, &res->startup_options); expand_opts(common->proxy_options, &res->proxy_options); expand_opts(common->handlers, &res->handlers); if (common->protocol && !res->protocol) res->protocol = strdup(common->protocol); if (common->stacked_timeouts) res->stacked_timeouts = 1; if (!res->become_primary_on) res->become_primary_on = common->become_primary_on; if (common->proxy_plugins && !res->proxy_plugins) expand_opts(common->proxy_plugins, &res->proxy_plugins); } } static void find_drbdcmd(char **cmd, char **pathes) { char **path; path = pathes; while (*path) { if (access(*path, X_OK) == 0) { *cmd = *path; return; } path++; } fprintf(stderr, "Can not find command (drbdsetup/drbdmeta)\n"); exit(E_exec_error); } static void 
alarm_handler(int __attribute((unused)) signo) { alarm_raised = 1; } void m__system(char **argv, int flags, struct d_resource *res, pid_t *kid, int *fd, int *ex) { pid_t pid; int status, rv = -1; int timeout = 0; char **cmdline = argv; int pipe_fds[2]; struct sigaction so; struct sigaction sa; sa.sa_handler = &alarm_handler; sigemptyset(&sa.sa_mask); sa.sa_flags = 0; if (dry_run || verbose) { if (sh_varname && *cmdline) printf("%s=%s\n", sh_varname, shell_escape(res->name)); while (*cmdline) { printf("%s ", shell_escape(*cmdline++)); } printf("\n"); if (dry_run) { if (kid) *kid = -1; if (fd) *fd = 0; if (ex) *ex = 0; return; } } /* flush stdout and stderr, so output of drbdadm * and helper binaries is reported in order! */ fflush(stdout); fflush(stderr); if (flags & (RETURN_STDOUT_FD | RETURN_STDERR_FD)) { if (pipe(pipe_fds) < 0) { perror("pipe"); fprintf(stderr, "Error in pipe, giving up.\n"); exit(E_exec_error); } } pid = fork(); if (pid == -1) { fprintf(stderr, "Can not fork\n"); exit(E_exec_error); } if (pid == 0) { if (flags & RETURN_STDOUT_FD) { close(pipe_fds[0]); dup2(pipe_fds[1], 1); } if (flags & RETURN_STDERR_FD) { close(pipe_fds[0]); dup2(pipe_fds[1], 2); } if (flags & SUPRESS_STDERR) fclose(stderr); execvp(argv[0], argv); fprintf(stderr, "Can not exec\n"); exit(E_exec_error); } if (flags & (RETURN_STDOUT_FD | RETURN_STDERR_FD)) close(pipe_fds[1]); if (flags & SLEEPS_FINITE) { sigaction(SIGALRM, &sa, &so); alarm_raised = 0; switch (flags & SLEEPS_MASK) { case SLEEPS_SHORT: timeout = 5; break; case SLEEPS_LONG: timeout = COMM_TIMEOUT + 1; break; case SLEEPS_VERY_LONG: timeout = 600; break; default: fprintf(stderr, "logic bug in %s:%d\n", __FILE__, __LINE__); exit(E_thinko); } alarm(timeout); } if (kid) *kid = pid; if (fd) *fd = pipe_fds[0]; if (flags & (RETURN_STDOUT_FD | RETURN_STDERR_FD) || flags == RETURN_PID) return; while (1) { if (waitpid(pid, &status, 0) == -1) { if (errno != EINTR) break; if (alarm_raised) { alarm(0); sigaction(SIGALRM, &so, 
NULL); rv = 0x100; break; } else { fprintf(stderr, "logic bug in %s:%d\n", __FILE__, __LINE__); exit(E_exec_error); } } else { if (WIFEXITED(status)) { rv = WEXITSTATUS(status); break; } } } if (flags & SLEEPS_FINITE) { if (rv >= 10 && !(flags & (DONT_REPORT_FAILED | SUPRESS_STDERR))) { fprintf(stderr, "Command '"); for (cmdline = argv; *cmdline; cmdline++) { fprintf(stderr, "%s", *cmdline); if (cmdline[1]) fputc(' ', stderr); } if (alarm_raised) { fprintf(stderr, "' did not terminate within %u seconds\n", timeout); exit(E_exec_error); } else { fprintf(stderr, "' terminated with exit code %d\n", rv); } } } fflush(stdout); fflush(stderr); if (ex) *ex = rv; } #define NA(ARGC) \ ({ if((ARGC) >= MAX_ARGS) { fprintf(stderr,"MAX_ARGS too small\n"); \ exit(E_thinko); \ } \ (ARGC)++; \ }) #define make_options(OPT) \ while(OPT) { \ if(OPT->value) { \ ssprintf(argv[NA(argc)],"--%s=%s",OPT->name,OPT->value); \ } else { \ ssprintf(argv[NA(argc)],"--%s",OPT->name); \ } \ OPT=OPT->next; \ } #define make_address(ADDR, PORT, AF) \ if (!strcmp(AF, "ipv6")) { \ ssprintf(argv[NA(argc)],"%s:[%s]:%s", AF, ADDR, PORT); \ } else { \ ssprintf(argv[NA(argc)],"%s:%s:%s", AF, ADDR, PORT); \ } int adm_attach(struct d_resource *res, const char *unused __attribute((unused))) { char *argv[MAX_ARGS]; struct d_option *opt; int argc = 0; argv[NA(argc)] = drbdsetup; ssprintf(argv[NA(argc)], "%d", res->me->device_minor); argv[NA(argc)] = "disk"; argv[NA(argc)] = res->me->disk; if (!strcmp(res->me->meta_disk, "internal")) { argv[NA(argc)] = res->me->disk; } else { argv[NA(argc)] = res->me->meta_disk; } argv[NA(argc)] = res->me->meta_index; argv[NA(argc)] = "--set-defaults"; argv[NA(argc)] = "--create-device"; opt = res->disk_options; make_options(opt); argv[NA(argc)] = 0; return m_system_ex(argv, SLEEPS_LONG, res); } struct d_option *find_opt(struct d_option *base, char *name) { while (base) { if (!strcmp(base->name, name)) { return base; } base = base->next; } return 0; } int adm_resize(struct 
d_resource *res, const char *cmd) { char *argv[MAX_ARGS]; struct d_option *opt; int i, argc = 0; int silent; int ex; argv[NA(argc)] = drbdsetup; ssprintf(argv[NA(argc)], "%d", res->me->device_minor); argv[NA(argc)] = "resize"; opt = find_opt(res->disk_options, "size"); if (opt) ssprintf(argv[NA(argc)], "--%s=%s", opt->name, opt->value); for (i = 0; i < soi; i++) argv[NA(argc)] = setup_opts[i]; argv[NA(argc)] = 0; /* if this is not "resize", but "check-resize", be silent! */ silent = strcmp(cmd, "resize") ? SUPRESS_STDERR : 0; ex = m_system_ex(argv, SLEEPS_SHORT | silent, res); if (ex) return ex; /* Record last-known bdev info. * Unfortunately drbdsetup did not have enough information * when doing the "resize", and in theory, _our_ information * about the backing device may even be wrong. * Call drbdsetup again, tell it to ask the kernel for * current config, and update the last known bdev info * according to that. */ /* argv[0] = drbdsetup; * argv[1] = minor; */ argv[2] = "check-resize"; argv[3] = NULL; /* ignore exit code */ m_system_ex(argv, SLEEPS_SHORT | silent, res); return 0; } int _admm_generic(struct d_resource *res, const char *cmd, int flags) { char *argv[MAX_ARGS]; int argc = 0, i; argv[NA(argc)] = drbdmeta; ssprintf(argv[NA(argc)], "%d", res->me->device_minor); argv[NA(argc)] = "v08"; if (!strcmp(res->me->meta_disk, "internal")) { argv[NA(argc)] = res->me->disk; } else { argv[NA(argc)] = res->me->meta_disk; } if (!strcmp(res->me->meta_index, "flexible")) { if (!strcmp(res->me->meta_disk, "internal")) { argv[NA(argc)] = "flex-internal"; } else { argv[NA(argc)] = "flex-external"; } } else { argv[NA(argc)] = res->me->meta_index; } argv[NA(argc)] = (char *)cmd; for (i = 0; i < soi; i++) { argv[NA(argc)] = setup_opts[i]; } argv[NA(argc)] = 0; return m_system_ex(argv, flags, res); } static int admm_generic(struct d_resource *res, const char *cmd) { return _admm_generic(res, cmd, SLEEPS_VERY_LONG); } static void _adm_generic(struct d_resource *res, const char 
*cmd, int flags, pid_t *pid, int *fd, int *ex) { char *argv[MAX_ARGS]; int argc = 0, i; argv[NA(argc)] = drbdsetup; ssprintf(argv[NA(argc)], "%d", res->me->device_minor); argv[NA(argc)] = (char *)cmd; for (i = 0; i < soi; i++) { argv[NA(argc)] = setup_opts[i]; } argv[NA(argc)] = 0; setenv("DRBD_RESOURCE", res->name, 1); m__system(argv, flags, res, pid, fd, ex); } static int adm_generic(struct d_resource *res, const char *cmd, int flags) { int ex; _adm_generic(res, cmd, flags, NULL, NULL, &ex); return ex; } int adm_generic_s(struct d_resource *res, const char *cmd) { return adm_generic(res, cmd, SLEEPS_SHORT); } int adm_status_xml(struct d_resource *res, const char *cmd) { struct d_resource *r, *t; int rv = 0; if (!dry_run) { printf("<drbd-status version=\"%s\" api=\"%u\">\n", REL_VERSION, API_VERSION); printf("<resources config_file=\"%s\">\n", config_save); } for_each_resource(r, t, res) { if (r->ignore) continue; rv = adm_generic(r, cmd, SLEEPS_SHORT); if (rv) break; } if (!dry_run) printf("</resources>\n</drbd-status>\n"); return rv; } int sh_status(struct d_resource *res, const char *cmd) { struct d_resource *r, *t; int rv = 0; if (!dry_run) { printf("_drbd_version=%s\n_drbd_api=%u\n", shell_escape(REL_VERSION), API_VERSION); printf("_config_file=%s\n\n", shell_escape(config_save)); } for_each_resource(r, t, res) { if (r->ignore) continue; printf("_stacked_on=%s\n", r->stacked && r->me->lower ? shell_escape(r->me->lower->name) : ""); printf("_stacked_on_device=%s\n", r->stacked && r->me->lower ? 
shell_escape(r->me->lower->me->device) : ""); if (r->stacked && r->me->lower) printf("_stacked_on_minor=%d\n", r->me->lower->me->device_minor); else printf("_stacked_on_minor=\n"); rv = adm_generic(r, cmd, SLEEPS_SHORT); if (rv) break; } return rv; } int adm_generic_l(struct d_resource *res, const char *cmd) { return adm_generic(res, cmd, SLEEPS_LONG); } static int adm_outdate(struct d_resource *res, const char *cmd) { int rv; rv = adm_generic(res, cmd, SLEEPS_SHORT | SUPRESS_STDERR); /* special cases for outdate: * 17: drbdsetup outdate, but is primary and thus cannot be outdated. * 5: drbdsetup outdate, and is inconsistent or worse anyways. */ if (rv == 17) return rv; if (rv == 5) { /* That might mean it is diskless. */ rv = admm_generic(res, cmd); if (rv) rv = 5; return rv; } if (rv || dry_run) { rv = admm_generic(res, cmd); } return rv; } /* shell equivalent: * ( drbdsetup resize && drbdsetup check-resize ) || drbdmeta check-resize */ static int adm_chk_resize(struct d_resource *res, const char *cmd) { /* drbdsetup resize && drbdsetup check-resize */ int ex = adm_resize(res, cmd); if (ex == 0) return 0; /* try drbdmeta check-resize */ return admm_generic(res, cmd); } static int adm_generic_b(struct d_resource *res, const char *cmd) { char buffer[4096]; int fd, status, rv = 0, rr, s = 0; pid_t pid; _adm_generic(res, cmd, SLEEPS_SHORT | RETURN_STDERR_FD, &pid, &fd, NULL); if (fd < 0) { fprintf(stderr, "Strange: got negative fd.\n"); exit(E_thinko); } if (!dry_run) { while (1) { rr = read(fd, buffer + s, 4096 - s); if (rr <= 0) break; s += rr; } close(fd); rr = waitpid(pid, &status, 0); alarm(0); if (WIFEXITED(status)) rv = WEXITSTATUS(status); if (alarm_raised) { rv = 0x100; } } /* see drbdsetup.c, print_config_error(): * 11: some unspecific state change error * 17: SS_NO_UP_TO_DATE_DISK * In both cases, we don't need to retry with drbdmeta, * it would fail anyways with "Device is configured!" 
*/ if (rv == 11 || rv == 17) { /* Some state transition error, report it ... */ rr = write(fileno(stderr), buffer, s); return rv; } if (rv || dry_run) { /* On other errors rv = 10 .. no minor allocated rv = 20 .. module not loaded rv = 16 .. we are diskless here retry with drbdmeta. */ rv = admm_generic(res, cmd); } return rv; } static int adm_khelper(struct d_resource *res, const char *cmd) { int rv = 0; char *sh_cmd; char minor_string[8]; char *argv[] = { "/bin/sh", "-c", NULL, NULL }; if (!res->peer) { /* Since 8.3.2 we get DRBD_PEER_AF and DRBD_PEER_ADDRESS from the kernel. If we do not know the peer by now, use these to find the peer. */ struct d_host_info *host; char *peer_address = getenv("DRBD_PEER_ADDRESS"); char *peer_af = getenv("DRBD_PEER_AF"); if (peer_address && peer_af) { for (host = res->all_hosts; host; host = host->next) { if (!strcmp(host->address_family, peer_af) && !strcmp(host->address, peer_address)) { res->peer = host; break; } } } } if (res->peer) { setenv("DRBD_PEER_AF", res->peer->address_family, 1); /* since 8.3.0 */ setenv("DRBD_PEER_ADDRESS", res->peer->address, 1); /* since 8.3.0 */ setenv("DRBD_PEER", res->peer->on_hosts->name, 1); /* deprecated */ setenv("DRBD_PEERS", names_to_str(res->peer->on_hosts), 1); /* since 8.3.0, but not usable when using a config with "floating" statements. */ } snprintf(minor_string, sizeof(minor_string), "%u", res->me->device_minor); setenv("DRBD_RESOURCE", res->name, 1); setenv("DRBD_MINOR", minor_string, 1); setenv("DRBD_CONF", config_save, 1); if ((sh_cmd = get_opt_val(res->handlers, cmd, NULL))) { argv[2] = sh_cmd; rv = m_system_ex(argv, SLEEPS_VERY_LONG, res); } return rv; } // need to convert discard-node-nodename to discard-local or discard-remote. 
void convert_discard_opt(struct d_resource *res) { struct d_option *opt; if (res == NULL) return; if ((opt = find_opt(res->net_options, "after-sb-0pri"))) { if (!strncmp(opt->value, "discard-node-", 13)) { if (!strcmp(nodeinfo.nodename, opt->value + 13)) { free(opt->value); opt->value = strdup("discard-local"); } else { free(opt->value); opt->value = strdup("discard-remote"); } } } } int adm_connect(struct d_resource *res, const char *unused __attribute((unused))) { char *argv[MAX_ARGS]; struct d_option *opt; int i; int argc = 0; argv[NA(argc)] = drbdsetup; ssprintf(argv[NA(argc)], "%d", res->me->device_minor); argv[NA(argc)] = "net"; make_address(res->me->address, res->me->port, res->me->address_family); if (res->me->proxy) { make_address(res->me->proxy->inside_addr, res->me->proxy->inside_port, res->me->proxy->inside_af); } else if (res->peer) { make_address(res->peer->address, res->peer->port, res->peer->address_family); } else if (dry_run > 1) { argv[NA(argc)] = "N/A"; } else { fprintf(stderr, "resource %s: cannot change network config without knowing my peer.\n", res->name); return dry_run ? 0 : 20; } argv[NA(argc)] = res->protocol; argv[NA(argc)] = "--set-defaults"; argv[NA(argc)] = "--create-device"; opt = res->net_options; make_options(opt); for (i = 0; i < soi; i++) { argv[NA(argc)] = setup_opts[i]; } argv[NA(argc)] = 0; return m_system_ex(argv, SLEEPS_SHORT, res); } struct d_resource *res_by_name(const char *name); struct d_option *del_opt(struct d_option *base, struct d_option *item) { struct d_option *i; if (base == item) { base = item->next; free(item->name); free(item->value); free(item); return base; } for (i = base; i; i = i->next) { if (i->next == item) { i->next = item->next; free(item->name); free(item->value); free(item); return base; } } return base; } // Need to convert after from resourcename to minor_number. 
void convert_after_option(struct d_resource *res) { struct d_option *opt, *next; struct d_resource *depends_on_res; if (res == NULL) return; opt = res->sync_options; while ((opt = find_opt(opt, "after"))) { next = opt->next; depends_on_res = res_by_name(opt->value); if (!depends_on_res || depends_on_res->ignore) { res->sync_options = del_opt(res->sync_options, opt); } else { free(opt->value); m_asprintf(&opt->value, "%d", depends_on_res->me->device_minor); } opt = next; } } char *proxy_connection_name(struct d_resource *res) { static char conn_name[128]; int counter; counter = snprintf(conn_name, sizeof(conn_name), "%s-%s-%s", names_to_str_c(res->me->proxy->on_hosts, '_'), res->name, names_to_str_c(res->peer->proxy->on_hosts, '_')); if (counter >= sizeof(conn_name)-3) { fprintf(stderr, "The connection name in resource %s got too long.\n", res->name); exit(E_config_invalid); } return conn_name; } int do_proxy_conn_up(struct d_resource *res, const char *conn_name) { char *argv[4] = { drbd_proxy_ctl, "-c", NULL, NULL }; int rv; if (!conn_name) conn_name = proxy_connection_name(res); ssprintf(argv[2], "add connection %s %s:%s %s:%s %s:%s %s:%s", conn_name, res->me->proxy->inside_addr, res->me->proxy->inside_port, res->peer->proxy->outside_addr, res->peer->proxy->outside_port, res->me->proxy->outside_addr, res->me->proxy->outside_port, res->me->address, res->me->port); rv = m_system_ex(argv, SLEEPS_SHORT, res); return rv; } int do_proxy_conn_plugins(struct d_resource *res, const char *conn_name) { char *argv[MAX_ARGS]; int argc = 0; struct d_option *opt; int counter; if (!conn_name) conn_name = proxy_connection_name(res); argc = 0; argv[NA(argc)] = drbd_proxy_ctl; opt = res->proxy_options; while (opt) { argv[NA(argc)] = "-c"; ssprintf(argv[NA(argc)], "set %s %s %s", opt->name, conn_name, opt->value); opt = opt->next; } counter = 0; opt = res->proxy_plugins; /* Don't send the "set plugin ... 
END" line if no plugins are defined * - that's incompatible with the drbd proxy version 1. */ if (opt) { while (1) { argv[NA(argc)] = "-c"; ssprintf(argv[NA(argc)], "set plugin %s %d %s", conn_name, counter, opt ? opt->name : "END"); if (!opt) break; opt = opt->next; counter ++; } } argv[NA(argc)] = 0; if (argc > 2) return m_system_ex(argv, SLEEPS_SHORT, res); return 0; } int do_proxy_conn_down(struct d_resource *res, const char *conn_name) { char *argv[4] = { drbd_proxy_ctl, "-c", NULL, NULL}; int rv; if (!conn_name) conn_name = proxy_connection_name(res); ssprintf(argv[2], "del connection %s", conn_name); rv = m_system_ex(argv, SLEEPS_SHORT, res); return rv; } static int check_proxy(struct d_resource *res, int do_up) { int rv; if (!res->me->proxy) { if (all_resources) return 0; fprintf(stderr, "There is no proxy config for host %s in resource %s.\n", nodeinfo.nodename, res->name); exit(E_config_invalid); } if (!name_in_names(nodeinfo.nodename, res->me->proxy->on_hosts)) { if (all_resources) return 0; fprintf(stderr, "The proxy config in resource %s is not for %s.\n", res->name, nodeinfo.nodename); exit(E_config_invalid); } if (!res->peer) { fprintf(stderr, "Cannot determine the peer in resource %s.\n", res->name); exit(E_config_invalid); } if (!res->peer->proxy) { fprintf(stderr, "There is no proxy config for the peer in resource %s.\n", res->name); if (all_resources) return 0; exit(E_config_invalid); } if (do_up) { rv = do_proxy_conn_up(res, NULL); if (!rv) rv = do_proxy_conn_plugins(res, NULL); } else rv = do_proxy_conn_down(res, NULL); return rv; } static int adm_proxy_up(struct d_resource *res, const char *unused __attribute((unused))) { return check_proxy(res, 1); } static int adm_proxy_down(struct d_resource *res, const char *unused __attribute((unused))) { return check_proxy(res, 0); } int adm_syncer(struct d_resource *res, const char *unused __attribute((unused))) { char *argv[MAX_ARGS]; struct d_option *opt; int i, argc = 0; argv[NA(argc)] = drbdsetup; 
ssprintf(argv[NA(argc)], "%d", res->me->device_minor); argv[NA(argc)] = "syncer"; argv[NA(argc)] = "--set-defaults"; argv[NA(argc)] = "--create-device"; opt = res->sync_options; make_options(opt); for (i = 0; i < soi; i++) { argv[NA(argc)] = setup_opts[i]; } argv[NA(argc)] = 0; return m_system_ex(argv, SLEEPS_SHORT, res); } static int adm_up(struct d_resource *res, const char *unused __attribute((unused))) { schedule_dcmd(adm_attach, res, "attach", 0); schedule_dcmd(adm_syncer, res, "syncer", 1); schedule_dcmd(adm_connect, res, "connect", 2); return 0; } /* The stacked-timeouts switch in the startup sections allows us to enforce the use of the specified timeouts instead of a sane value. Should only be used if the third node should never become primary. */ static int adm_wait_c(struct d_resource *res, const char *unused __attribute((unused))) { char *argv[MAX_ARGS]; struct d_option *opt; int argc = 0, rv; argv[NA(argc)] = drbdsetup; ssprintf(argv[NA(argc)], "%d", res->me->device_minor); argv[NA(argc)] = "wait-connect"; if (is_drbd_top && !res->stacked_timeouts) { unsigned long timeout = 20; if ((opt = find_opt(res->net_options, "connect-int"))) { timeout = strtoul(opt->value, NULL, 10); // one connect-interval? two? 
timeout *= 2; } argv[argc++] = "-t"; ssprintf(argv[argc], "%lu", timeout); argc++; } else { opt = res->startup_options; make_options(opt); } argv[NA(argc)] = 0; rv = m_system_ex(argv, SLEEPS_FOREVER, res); return rv; } static unsigned minor_by_id(const char *id) { if (strncmp(id, "minor-", 6)) return -1U; return m_strtoll(id + 6, 1); } struct d_resource *res_by_minor(const char *id) { struct d_resource *res, *t; unsigned int mm; mm = minor_by_id(id); if (mm == -1U) return NULL; for_each_resource(res, t, config) { if (res->ignore) continue; if (mm == res->me->device_minor) { is_drbd_top = res->stacked; return res; } } return NULL; } struct d_resource *res_by_name(const char *name) { struct d_resource *res, *t; for_each_resource(res, t, config) { if (strcmp(name, res->name) == 0) return res; } return NULL; } /* In case a child exited, or exits, its return code is stored as negative number in the pids[i] array */ static int childs_running(pid_t * pids, int opts) { int i = 0, wr, rv = 0, status; for (i = 0; i < nr_resources; i++) { if (pids[i] <= 0) continue; wr = waitpid(pids[i], &status, opts); if (wr == -1) { // Wait error. if (errno == ECHILD) { printf("No exit code for %d\n", pids[i]); pids[i] = 0; // Child exited before ? continue; } perror("waitpid"); exit(E_exec_error); } if (wr == 0) rv = 1; // Child still running. if (wr > 0) { pids[i] = 0; if (WIFEXITED(status)) pids[i] = -WEXITSTATUS(status); if (WIFSIGNALED(status)) pids[i] = -1000; } } return rv; } static void kill_childs(pid_t * pids) { int i; for (i = 0; i < nr_resources; i++) { if (pids[i] <= 0) continue; kill(pids[i], SIGINT); } } /* returns: -1 ... all childs terminated 0 ... timeout expired 1 ... 
a string was read */ int gets_timeout(pid_t * pids, char *s, int size, int timeout) { int pr, rr, n = 0; struct pollfd pfd; if (s) { pfd.fd = fileno(stdin); pfd.events = POLLIN | POLLHUP | POLLERR | POLLNVAL; n = 1; } if (!childs_running(pids, WNOHANG)) { pr = -1; goto out; } do { pr = poll(&pfd, n, timeout); if (pr == -1) { // Poll error. if (errno == EINTR) { if (childs_running(pids, WNOHANG)) continue; goto out; // pr = -1 here. } perror("poll"); exit(E_exec_error); } } while (pr == -1); if (pr == 1) { // Input available. rr = read(fileno(stdin), s, size - 1); if (rr == -1) { perror("read"); exit(E_exec_error); } s[rr] = 0; } out: return pr; } static char *get_opt_val(struct d_option *base, const char *name, char *def) { while (base) { if (!strcmp(base->name, name)) { return base->value; } base = base->next; } return def; } void chld_sig_hand(int __attribute((unused)) unused) { // do nothing. But interrupt systemcalls :) } static int check_exit_codes(pid_t * pids) { struct d_resource *res, *t; int i = 0, rv = 0; for_each_resource(res, t, config) { if (res->ignore) continue; if (is_drbd_top != res->stacked) continue; if (pids[i] == -5 || pids[i] == -1000) { pids[i] = 0; } if (pids[i] == -20) rv = 20; i++; } return rv; } static int adm_wait_ci(struct d_resource *ignored __attribute((unused)), const char *unused __attribute((unused))) { struct d_resource *res, *t; char *argv[20], answer[40]; pid_t *pids; struct d_option *opt; int rr, wtime, argc, i = 0; time_t start; int saved_stdin, saved_stdout, fd; struct sigaction so, sa; saved_stdin = -1; saved_stdout = -1; if (no_tty) { fprintf(stderr, "WARN: stdin/stdout is not a TTY; using /dev/console"); fprintf(stdout, "WARN: stdin/stdout is not a TTY; using /dev/console"); saved_stdin = dup(fileno(stdin)); if (saved_stdin == -1) perror("dup(stdin)"); saved_stdout = dup(fileno(stdout)); if (saved_stdout == -1) perror("dup(stdout)"); fd = open("/dev/console", O_RDONLY); if (fd == -1) perror("open('/dev/console', O_RDONLY)"); 
dup2(fd, fileno(stdin)); fd = open("/dev/console", O_WRONLY); if (fd == -1) perror("open('/dev/console', O_WRONLY)"); dup2(fd, fileno(stdout)); } sa.sa_handler = chld_sig_hand; sigemptyset(&sa.sa_mask); sa.sa_flags = SA_NOCLDSTOP; sigaction(SIGCHLD, &sa, &so); pids = alloca(nr_resources * sizeof(pid_t)); /* alloca can not fail, it can "only" overflow the stack :) * but it needs to be initialized anyways! */ memset(pids, 0, nr_resources * sizeof(pid_t)); for_each_resource(res, t, config) { if (res->ignore) continue; if (is_drbd_top != res->stacked) continue; argc = 0; argv[NA(argc)] = drbdsetup; ssprintf(argv[NA(argc)], "%d", res->me->device_minor); argv[NA(argc)] = "wait-connect"; opt = res->startup_options; make_options(opt); argv[NA(argc)] = 0; m__system(argv, RETURN_PID, res, &pids[i++], NULL, NULL); } wtime = global_options.dialog_refresh ? : -1; start = time(0); for (i = 0; i < 10; i++) { // no string, but timeout rr = gets_timeout(pids, 0, 0, 1 * 1000); if (rr < 0) break; putchar('.'); fflush(stdout); check_exit_codes(pids); } if (rr == 0) { /* track a "yes", as well as ctrl-d and ctrl-c, * in case our tty is stuck in "raw" mode, and * we get it one character at a time (-icanon) */ char yes_string[] = "yes\n"; char *yes_expect = yes_string; int ctrl_c_count = 0; int ctrl_d_count = 0; /* Just in case, if plymouth or usplash is running, * tell them to step aside. * Also try to force canonical tty mode. */ if (system("exec > /dev/null 2>&1; plymouth quit ; usplash_write QUIT ; " "stty echo icanon icrnl")) /* Ignore return value. Cannot do anything about it anyways. */; printf ("\n***************************************************************\n" " DRBD's startup script waits for the peer node(s) to appear.\n" " - In case this node was already a degraded cluster before the\n" " reboot the timeout is %s seconds. [degr-wfc-timeout]\n" " - If the peer was available before the reboot the timeout will\n" " expire after %s seconds. 
[wfc-timeout]\n" " (These values are for resource '%s'; 0 sec -> wait forever)\n", get_opt_val(config->startup_options, "degr-wfc-timeout", "0"), get_opt_val(config->startup_options, "wfc-timeout", "0"), config->name); printf(" To abort waiting enter 'yes' [ -- ]:"); do { printf("\e[s\e[31G[%4d]:\e[u", (int)(time(0) - start)); // Redraw sec. fflush(stdout); rr = gets_timeout(pids, answer, 40, wtime * 1000); check_exit_codes(pids); if (rr != 1) continue; /* If our tty is in "sane" or "canonical" mode, * we get whole lines. * If it still is in "raw" mode, even though we * tried to set ICANON above, possibly some other * "boot splash thingy" is in operation. * We may be lucky to get single characters. * If a sysadmin sees things stuck during boot, * I expect that ctrl-c or ctrl-d will be one * of the first things that are tried. * In raw mode, we get these characters directly. * But I want them to try that three times ;) */ if (answer[0] && answer[1] == 0) { if (answer[0] == '\3') ++ctrl_c_count; if (answer[0] == '\4') ++ctrl_d_count; if (yes_expect && answer[0] == *yes_expect) ++yes_expect; else if (answer[0] == '\n') yes_expect = yes_string; else yes_expect = NULL; } if (!strcmp(answer, "yes\n") || (yes_expect && *yes_expect == '\0') || ctrl_c_count >= 3 || ctrl_d_count >= 3) { kill_childs(pids); childs_running(pids, 0); check_exit_codes(pids); break; } printf(" To abort waiting enter 'yes' [ -- ]:"); } while (rr != -1); printf("\n"); } if (saved_stdin != -1) { dup2(saved_stdin, fileno(stdin)); dup2(saved_stdout, fileno(stdout)); } return 0; } static void print_cmds(int level) { size_t i; int j = 0; for (i = 0; i < ARRAY_SIZE(cmds); i++) { if (cmds[i].show_in_usage != level) continue; if (j++ % 2) { printf("%-35s\n", cmds[i].name); } else { printf(" %-35s", cmds[i].name); } } if (j % 2) printf("\n"); } static int hidden_cmds(struct d_resource *ignored __attribute((unused)), const char *ignored2 __attribute((unused))) { printf("\nThese additional commands might be 
useful for writing\n" "nifty shell scripts around drbdadm:\n\n"); print_cmds(2); printf("\nThese commands are used by the kernel part of DRBD to\n" "invoke user mode helper programs:\n\n"); print_cmds(3); printf ("\nThese commands ought to be used by experts and developers:\n\n"); print_cmds(4); printf("\n"); exit(0); } void print_usage_and_exit(const char *addinfo) { struct option *opt; printf("\nUSAGE: %s [OPTION...] [-- DRBDSETUP-OPTION...] COMMAND " "{all|RESOURCE...}\n\n" "OPTIONS:\n", progname); opt = admopt; while (opt->name) { if (opt->has_arg == required_argument) printf(" {--%s|-%c} val\n", opt->name, opt->val); else printf(" {--%s|-%c}\n", opt->name, opt->val); opt++; } printf("\nCOMMANDS:\n"); print_cmds(1); printf("\nVersion: " REL_VERSION " (api:%d)\n%s\n", API_VERSION, drbd_buildtag()); if (addinfo) printf("\n%s\n", addinfo); exit(E_usage); } /* * I'd really rather parse the output of * ip -o a s * once, and be done. * But anyways.... */ static struct ifreq *get_ifreq(void) { int sockfd, num_ifaces; struct ifreq *ifr; struct ifconf ifc; size_t buf_size; if (0 > (sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP))) { perror("Cannot open socket"); exit(EXIT_FAILURE); } num_ifaces = 0; ifc.ifc_req = NULL; /* realloc buffer size until no overflow occurs */ do { num_ifaces += 16; /* initial guess and increment */ buf_size = ++num_ifaces * sizeof(struct ifreq); ifc.ifc_len = buf_size; if (NULL == (ifc.ifc_req = realloc(ifc.ifc_req, ifc.ifc_len))) { fprintf(stderr, "Out of memory.\n"); return NULL; } if (ioctl(sockfd, SIOCGIFCONF, &ifc)) { perror("ioctl SIOCGIFCONF"); free(ifc.ifc_req); return NULL; } } while (buf_size <= (size_t) ifc.ifc_len); num_ifaces = ifc.ifc_len / sizeof(struct ifreq); /* Since we allocated at least one more than necessary, * this serves as a stop marker for the iteration in * have_ip() */ ifc.ifc_req[num_ifaces].ifr_name[0] = 0; for (ifr = ifc.ifc_req; ifr->ifr_name[0] != 0; ifr++) { /* we only want to look up the presence or 
absence of a certain address * here. but we want to skip "down" interfaces. if an interface is down, * we store an invalid sa_family, so the lookup will skip it. */ struct ifreq ifr_for_flags = *ifr; /* get a copy to work with */ if (ioctl(sockfd, SIOCGIFFLAGS, &ifr_for_flags) < 0) { perror("ioctl SIOCGIFFLAGS"); ifr->ifr_addr.sa_family = -1; /* what's wrong here? anyways: skip */ continue; } if (!(ifr_for_flags.ifr_flags & IFF_UP)) { ifr->ifr_addr.sa_family = -1; /* is not up: skip */ continue; } } close(sockfd); return ifc.ifc_req; } int have_ip_ipv4(const char *ip) { struct ifreq *ifr; struct in_addr query_addr; query_addr.s_addr = inet_addr(ip); if (!ifreq_list) ifreq_list = get_ifreq(); for (ifr = ifreq_list; ifr && ifr->ifr_name[0] != 0; ifr++) { /* SIOCGIFCONF only supports AF_INET */ struct sockaddr_in *list_addr = (struct sockaddr_in *)&ifr->ifr_addr; if (ifr->ifr_addr.sa_family != AF_INET) continue; if (query_addr.s_addr == list_addr->sin_addr.s_addr) return 1; } return 0; } int have_ip_ipv6(const char *ip) { FILE *if_inet6; struct in6_addr addr6, query_addr; unsigned int b[4]; char tmp_ip[INET6_ADDRSTRLEN+1]; char name[20]; /* IFNAMSIZ aka IF_NAMESIZE is 16 */ int i; /* don't want to do getaddrinfo lookup, but inet_pton gets confused by * %eth0 link local scope specifiers. So we have a temporary copy * without that part. 
*/ for (i=0; ip[i] && ip[i] != '%' && i < INET6_ADDRSTRLEN; i++) tmp_ip[i] = ip[i]; tmp_ip[i] = 0; if (inet_pton(AF_INET6, tmp_ip, &query_addr) <= 0) return 0; #define PROC_IF_INET6 "/proc/net/if_inet6" if_inet6 = fopen(PROC_IF_INET6, "r"); if (!if_inet6) { if (errno != ENOENT) perror("open of " PROC_IF_INET6 " failed:"); #undef PROC_IF_INET6 return 0; } while (fscanf (if_inet6, X32(08) X32(08) X32(08) X32(08) " %*02x %*02x %*02x %*02x %s", b, b + 1, b + 2, b + 3, name) > 0) { for (i = 0; i < 4; i++) addr6.s6_addr32[i] = cpu_to_be32(b[i]); if (memcmp(&query_addr, &addr6, sizeof(struct in6_addr)) == 0) { fclose(if_inet6); return 1; } } fclose(if_inet6); return 0; } int have_ip(const char *af, const char *ip) { if (!strcmp(af, "ipv4")) return have_ip_ipv4(ip); else if (!strcmp(af, "ipv6")) return have_ip_ipv6(ip); return 1; /* SCI */ } void verify_ips(struct d_resource *res) { if (global_options.disable_ip_verification) return; if (dry_run == 1 || do_verify_ips == 0) return; if (res->ignore) return; if (res->stacked && !is_drbd_top) return; if (!have_ip(res->me->address_family, res->me->address)) { ENTRY e, *ep; e.key = e.data = ep = NULL; m_asprintf(&e.key, "%s:%s", res->me->address, res->me->port); hsearch_r(e, FIND, &ep, &global_htable); fprintf(stderr, "%s: in resource %s, on %s:\n\t" "IP %s not found on this host.\n", ep ? (char *)ep->data : res->config_file, res->name, names_to_str(res->me->on_hosts), res->me->address); if (INVALID_IP_IS_INVALID_CONF) config_valid = 0; } } static char *conf_file[] = { DRBD_CONFIG_DIR "/drbd-83.conf", DRBD_CONFIG_DIR "/drbd-82.conf", DRBD_CONFIG_DIR "/drbd-08.conf", DRBD_CONFIG_DIR "/drbd.conf", 0 }; int sanity_check_abs_cmd(char *cmd_name) { struct stat sb; if (stat(cmd_name, &sb)) { /* If stat fails, just ignore this sanity check, * we are still iterating over $PATH probably. 
*/ return 0; } if (!(sb.st_mode & S_ISUID) || sb.st_mode & S_IXOTH || sb.st_gid == 0) { static int did_header = 0; if (!did_header) fprintf(stderr, "WARN:\n" " You are using the 'drbd-peer-outdater' as fence-peer program.\n" " If you use that mechanism the dopd heartbeat plugin program needs\n" " to be able to call drbdsetup and drbdmeta with root privileges.\n\n" " You need to fix this with these commands:\n"); did_header = 1; fprintf(stderr, " chgrp haclient %s\n" " chmod o-x %s\n" " chmod u+s %s\n\n", cmd_name, cmd_name, cmd_name); } return 1; } void sanity_check_cmd(char *cmd_name) { char *path, *pp, *c; char abs_path[100]; if (strchr(cmd_name, '/')) { sanity_check_abs_cmd(cmd_name); } else { path = pp = c = strdup(getenv("PATH")); while (1) { c = strchr(pp, ':'); if (c) *c = 0; snprintf(abs_path, 100, "%s/%s", pp, cmd_name); if (sanity_check_abs_cmd(abs_path)) break; if (!c) break; c++; if (!*c) break; pp = c; } free(path); } } /* if the config file is not readable by haclient, * dopd cannot work. * NOTE: we assume that any gid != 0 will be the group dopd will run as, * typically haclient. */ void sanity_check_conf(char *c) { struct stat sb; /* if we cannot stat the config file, * we have other things to worry about. */ if (stat(c, &sb)) return; /* permissions are funny: if it is world readable, * but not group readable, and it belongs to my group, * I am denied access. * For the file to be readable by dopd (hacluster:haclient), * it is not enough to be world readable. 
*/ /* ok if world readable, and NOT group haclient (see NOTE above) */ if (sb.st_mode & S_IROTH && sb.st_gid == 0) return; /* ok if group readable, and group haclient (see NOTE above) */ if (sb.st_mode & S_IRGRP && sb.st_gid != 0) return; fprintf(stderr, "WARN:\n" " You are using the 'drbd-peer-outdater' as fence-peer program.\n" " If you use that mechanism the dopd heartbeat plugin program needs\n" " to be able to read the drbd.config file.\n\n" " You need to fix this with these commands:\n" " chgrp haclient %s\n" " chmod g+r %s\n\n", c, c); } void sanity_check_perm() { static int checked = 0; if (checked) return; sanity_check_cmd(drbdsetup); sanity_check_cmd(drbdmeta); sanity_check_conf(config_file); checked = 1; } void validate_resource(struct d_resource *res) { struct d_option *opt, *next; struct d_name *bpo; if (!res->protocol) { if (!common || !common->protocol) { fprintf(stderr, "%s:%d: in resource %s:\n\tprotocol definition missing.\n", res->config_file, res->start_line, res->name); config_valid = 0; } /* else: * may not have been expanded yet for "dump" subcommand */ } else { res->protocol[0] = toupper(res->protocol[0]); } /* there may be more than one "after" statement, * see commit 89cd0585 */ opt = res->sync_options; while ((opt = find_opt(opt, "after"))) { next = opt->next; if (res_by_name(opt->value) == NULL) { fprintf(stderr, "%s:%d: in resource %s:\n\tresource '%s' mentioned in " "'after' option is not known.\n", res->config_file, res->start_line, res->name, opt->value); /* Non-fatal if run from some script. * When deleting resources, it is an easily made * oversight to leave references to the deleted * resources in sync-after statements. Don't fail on * every pacemaker-induced action, as it would * ultimately lead to all nodes committing suicide. 
*/ if (no_tty) res->sync_options = del_opt(res->sync_options, opt); else config_valid = 0; } opt = next; } if (res->ignore) return; if (!res->me) { fprintf(stderr, "%s:%d: in resource %s:\n\tmissing section 'on %s { ... }'.\n", res->config_file, res->start_line, res->name, nodeinfo.nodename); config_valid = 0; } // need to verify that in the discard-node-nodename options only known // nodenames are mentioned. if ((opt = find_opt(res->net_options, "after-sb-0pri"))) { if (!strncmp(opt->value, "discard-node-", 13)) { if (res->peer && !name_in_names(opt->value + 13, res->peer->on_hosts) && !name_in_names(opt->value + 13, res->me->on_hosts)) { fprintf(stderr, "%s:%d: in resource %s:\n\t" "the nodename in the '%s' option is " "not known.\n\t" "valid nodenames are: '%s %s'.\n", res->config_file, res->start_line, res->name, opt->value, names_to_str(res->me->on_hosts), names_to_str(res->peer->on_hosts)); config_valid = 0; } } } if ((opt = find_opt(res->handlers, "fence-peer"))) { if (strstr(opt->value, "drbd-peer-outdater")) sanity_check_perm(); } opt = find_opt(res->net_options, "allow-two-primaries"); if (name_in_names("both", res->become_primary_on) && opt == NULL) { fprintf(stderr, "%s:%d: in resource %s:\n" "become-primary-on is set to both, but allow-two-primaries " "is not set.\n", res->config_file, res->start_line, res->name); config_valid = 0; } if (!res->peer) set_peer_in_resource(res, 0); if (res->peer && ((res->me->proxy == NULL) != (res->peer->proxy == NULL))) { fprintf(stderr, "%s:%d: in resource %s:\n\t" "Either both 'on' sections must contain a proxy subsection, or none.\n", res->config_file, res->start_line, res->name); config_valid = 0; } for (bpo = res->become_primary_on; bpo; bpo = bpo->next) { if (res->peer && !name_in_names(bpo->name, res->me->on_hosts) && !name_in_names(bpo->name, res->peer->on_hosts) && strcmp(bpo->name, "both")) { fprintf(stderr, "%s:%d: in resource %s:\n\t" "become-primary-on contains '%s', which is not named with the 'on' 
sections.\n", res->config_file, res->start_line, res->name, bpo->name); config_valid = 0; } } } static void global_validate_maybe_expand_die_if_invalid(int expand) { struct d_resource *res, *tmp; for_each_resource(res, tmp, config) { validate_resource(res); if (!config_valid) exit(E_config_invalid); if (expand) { convert_after_option(res); convert_discard_opt(res); } } } /* * returns a pointer to a malloc'ed area that contains * an absolute, canonical version of path. * aborts if any allocation or syscall fails. * return value should be free()d, once no longer needed. */ char *canonify_path(char *path) { int cwd_fd = -1; char *last_slash; char *tmp; char *that_wd; char *abs_path; if (!path || !path[0]) { fprintf(stderr, "cannot canonify an empty path\n"); exit(E_usage); } tmp = strdupa(path); last_slash = strrchr(tmp, '/'); if (last_slash) { *last_slash++ = '\0'; cwd_fd = open(".", O_RDONLY); if (cwd_fd < 0) { fprintf(stderr, "open(\".\") failed: %m\n"); exit(E_usage); } if (chdir(tmp)) { fprintf(stderr, "chdir(\"%s\") failed: %m\n", tmp); exit(E_usage); } } else { last_slash = tmp; } that_wd = getcwd(NULL, 0); if (!that_wd) { fprintf(stderr, "getcwd() failed: %m\n"); exit(E_usage); } if (!strcmp("/", that_wd)) m_asprintf(&abs_path, "/%s", last_slash); else m_asprintf(&abs_path, "%s/%s", that_wd, last_slash); free(that_wd); if (cwd_fd >= 0) { if (fchdir(cwd_fd) < 0) { fprintf(stderr, "fchdir() failed: %m\n"); exit(E_usage); } /* do not leak the descriptor of the original working directory */ close(cwd_fd); } return abs_path; } void assign_command_names_from_argv0(char **argv) { /* in case drbdadm is called with an absolute or relative pathname * look for the drbdsetup binary in the same location, * otherwise, just let execvp sort it out... 
*/ if ((progname = strrchr(argv[0], '/')) == 0) { progname = argv[0]; drbdsetup = strdup("drbdsetup-83"); drbdmeta = strdup("drbdmeta"); drbd_proxy_ctl = strdup("drbd-proxy-ctl"); } else { struct cmd_helper { char *name; char **var; }; struct cmd_helper helpers[] = { {"drbdsetup-83", &drbdsetup}, {"drbdmeta", &drbdmeta}, {"drbd-proxy-ctl", &drbd_proxy_ctl}, {NULL, NULL} }; size_t len_dir, l; struct cmd_helper *c; ++progname; len_dir = progname - argv[0]; for (c = helpers; c->name; ++c) { l = len_dir + strlen(c->name) + 1; *(c->var) = malloc(l); if (*(c->var)) { strncpy(*(c->var), argv[0], len_dir); strcpy(*(c->var) + len_dir, c->name); } } /* for pretty printing, truncate to basename */ argv[0] = progname; } } int parse_options(int argc, char **argv) { opterr = 1; optind = 0; while (1) { int c; c = getopt_long(argc, argv, make_optstring(admopt), admopt, 0); if (c == -1) break; switch (c) { case 'S': is_drbd_top = 1; break; case 'v': verbose++; break; case 'd': dry_run++; break; case 'c': if (!strcmp(optarg, "-")) { yyin = stdin; if (asprintf(&config_file, "STDIN") < 0) { fprintf(stderr, "asprintf(config_file): %m\n"); return 20; } config_from_stdin = 1; } else { yyin = fopen(optarg, "r"); if (!yyin) { fprintf(stderr, "Can not open '%s'.\n", optarg); exit(E_exec_error); } if (asprintf(&config_file, "%s", optarg) < 0) { fprintf(stderr, "asprintf(config_file): %m\n"); return 20; } } break; case 't': config_test = optarg; break; case 's': { char *pathes[2]; pathes[0] = optarg; pathes[1] = 0; find_drbdcmd(&drbdsetup, pathes); } break; case 'm': { char *pathes[2]; pathes[0] = optarg; pathes[1] = 0; find_drbdcmd(&drbdmeta, pathes); } break; case 'p': { char *pathes[2]; pathes[0] = optarg; pathes[1] = 0; find_drbdcmd(&drbd_proxy_ctl, pathes); } break; case 'n': { char *c; int shell_var_name_ok = 1; for (c = optarg; *c && shell_var_name_ok; c++) { switch (*c) { case 'a'...'z': case 'A'...'Z': case '0'...'9': case '_': break; default: shell_var_name_ok = 0; } } if 
(shell_var_name_ok) sh_varname = optarg; else fprintf(stderr, "ignored --sh-varname=%s: " "contains suspect characters, allowed set is [a-zA-Z0-9_]\n", optarg); } break; case 'f': force = 1; break; case 'V': printf("DRBDADM_BUILDTAG=%s\n", shell_escape(drbd_buildtag())); printf("DRBDADM_API_VERSION=%u\n", API_VERSION); printf("DRBD_KERNEL_VERSION_CODE=0x%06x\n", version_code_kernel()); printf("DRBDADM_VERSION_CODE=0x%06x\n", version_code_userland()); printf("DRBDADM_VERSION=%s\n", shell_escape(REL_VERSION)); exit(0); break; case 'P': connect_to_host = optarg; break; case '?': /* commented out, since opterr=1 * fprintf(stderr,"Unknown option %s\n",argv[optind-1]); */ fprintf(stderr, "try '%s help'\n", progname); return 20; break; } } return 0; } static void substitute_deprecated_cmd(char **c, char *deprecated, char *substitution) { if (!strcmp(*c, deprecated)) { fprintf(stderr, "'%s %s' is deprecated, use '%s %s' instead.\n", progname, deprecated, progname, substitution); *c = substitution; } } struct adm_cmd *find_cmd(char *cmdname) { struct adm_cmd *cmd = NULL; unsigned int i; if (!strcmp("hidden-commands", cmdname)) { // before parsing the configuration file... hidden_cmds(NULL, NULL); exit(0); } if (!strncmp("help", cmdname, 5)) print_usage_and_exit(0); /* R_PRIMARY / R_SECONDARY is not a state, but a role. Whatever that * means, actually. But anyways, we decided to start using _role_ as * the terminus of choice, and deprecate "state". */ substitute_deprecated_cmd(&cmdname, "state", "role"); /* "outdate-peer" got renamed to fence-peer, * it is not required to actually outdate the peer, * depending on situation it may be sufficient to power-reset it * or do some other fencing action, or even call out to "meatware". * The name of the handler should not imply something that is not done. 
*/ substitute_deprecated_cmd(&cmdname, "outdate-peer", "fence-peer"); for (i = 0; i < ARRAY_SIZE(cmds); i++) { if (!strcmp(cmds[i].name, cmdname)) { cmd = cmds + i; break; } } return cmd; } char *config_file_from_arg(char *arg) { char *f; int minor = minor_by_id(arg); if (minor < 0) { /* this is expected, if someone wants to test the configured * handlers from the command line, using resource names */ fprintf(stderr, "Couldn't find minor from id %s, " "expecting minor-<minor> as id. " "Trying default config files.\n", arg); return NULL; } f = lookup_minor(minor); if (!f) { fprintf(stderr, "Don't know which config file belongs to minor %d, " "trying default ones...\n", minor); } else { yyin = fopen(f, "r"); if (yyin == NULL) { /* report the file we tried to open; config_file is still unset here */ fprintf(stderr, "Couldn't open file %s for reading, reason: %m\n" "trying default config file...\n", f); /* return NULL, so the caller really falls back to the defaults */ f = NULL; } } return f; } void assign_default_config_file(void) { int i; for (i = 0; conf_file[i]; i++) { yyin = fopen(conf_file[i], "r"); if (yyin) { config_file = conf_file[i]; break; } } if (!config_file) { fprintf(stderr, "Can not open '%s': %m\n", conf_file[i - 1]); exit(E_config_invalid); } } void count_resources_or_die(void) { int m, mc = global_options.minor_count; struct d_resource *res, *tmp; highest_minor = 0; for_each_resource(res, tmp, config) { /* count ignored resources before skipping them, * die_if_no_resources() relies on nr_ignore */ if (res->ignore) { nr_ignore++; continue; } m = res->me->device_minor; if (m > highest_minor) highest_minor = m; nr_resources++; if (res->stacked) nr_stacked++; else nr_normal++; } // Just for the case that minor_of_res() returned 0 for all devices. 
if (nr_resources > (highest_minor + 1)) highest_minor = nr_resources - 1; if (mc && mc < (highest_minor + 1)) { fprintf(stderr, "The highest minor you have in your config is %d, " "but a minor_count of %d in your config!\n", highest_minor, mc); exit(E_usage); } } void die_if_no_resources(void) { if (!is_drbd_top && nr_ignore > 0 && nr_normal == 0) { fprintf(stderr, "WARN: no normal resources defined for this host (%s)!?\n" "Misspelled name of the local machine with the 'on' keyword ?\n", nodeinfo.nodename); exit(E_usage); } if (!is_drbd_top && nr_normal == 0) { fprintf(stderr, "WARN: no normal resources defined for this host (%s)!?\n", nodeinfo.nodename); exit(E_usage); } if (is_drbd_top && nr_stacked == 0) { fprintf(stderr, "WARN: nothing stacked for this host (%s), " "nothing to do in stacked mode!\n", nodeinfo.nodename); exit(E_usage); } } void print_dump_xml_header(void) { printf("<config file=\"%s\">\n", config_save); ++indent; dump_global_info_xml(); dump_common_info_xml(); } void print_dump_header(void) { printf("# %s\n", config_save); dump_global_info(); dump_common_info(); } int main(int argc, char **argv) { size_t i; int rv = 0; struct adm_cmd *cmd = NULL; struct d_resource *res, *tmp; char *env_drbd_nodename = NULL; int is_dump_xml; int is_dump; yyin = NULL; uname(&nodeinfo); /* FIXME maybe fold to lower case ? */ no_tty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout))); env_drbd_nodename = getenv("__DRBD_NODE__"); if (env_drbd_nodename && *env_drbd_nodename) { strncpy(nodeinfo.nodename, env_drbd_nodename, sizeof(nodeinfo.nodename) - 1); nodeinfo.nodename[sizeof(nodeinfo.nodename) - 1] = 0; fprintf(stderr, "\n" " found __DRBD_NODE__ in environment\n" " PRETENDING that I am >>%s<<\n\n", nodeinfo.nodename); } assign_command_names_from_argv0(argv); if (argc == 1) print_usage_and_exit("missing arguments"); // arguments missing. 
if (drbdsetup == NULL || drbdmeta == NULL || drbd_proxy_ctl == NULL) { fprintf(stderr, "could not strdup argv[0].\n"); exit(E_exec_error); } if (!getenv("DRBD_DONT_WARN_ON_VERSION_MISMATCH")) warn_on_version_mismatch(); rv = parse_options(argc, argv); if (rv) return rv; /* store everything before the command name as pass through option/argument */ while (optind < argc) { cmd = find_cmd(argv[optind]); if (cmd) break; setup_opts[soi++] = argv[optind++]; } if (optind == argc) print_usage_and_exit(0); if (cmd == NULL) { fprintf(stderr, "Unknown command '%s'.\n", argv[optind]); exit(E_usage); } if (config_test && !cmd->test_config) { fprintf(stderr, "The --config-to-test (-t) option is only allowed " "with the dump and sh-nop commands\n"); exit(E_usage); } do_verify_ips = cmd->verify_ips; optind++; is_dump_xml = (cmd->function == adm_dump_xml); is_dump = (is_dump_xml || cmd->function == adm_dump); /* remaining argv are expected to be resource names * optind == argc: no resourcenames given. * optind + 1 == argc: exactly one resource name (or "all") given * optind + 1 < argc: multiple resource names given. */ if (optind == argc) { if (is_dump) all_resources = 1; else if (cmd->res_name_required) print_usage_and_exit("missing resourcename arguments"); } else if (optind + 1 < argc) { if (!cmd->res_name_required) fprintf(stderr, "this command will ignore resource names!\n"); else if (cmd->use_cached_config_file) fprintf(stderr, "You should not use this command with multiple resources!\n"); } if (!config_file && cmd->use_cached_config_file) config_file = config_file_from_arg(argv[optind]); if (!config_file) /* may exit if no config file can be used! */ assign_default_config_file(); /* for error-reporting reasons config_file may be re-assigned by adm_adjust, * we need the current value for register_minor, though. * save that. 
*/ if (config_from_stdin) config_save = config_file; else config_save = canonify_path(config_file); my_parse(); if (config_test) { char *saved_config_file = config_file; char *saved_config_save = config_save; config_file = config_test; config_save = canonify_path(config_test); fclose(yyin); yyin = fopen(config_test, "r"); if (!yyin) { fprintf(stderr, "Can not open '%s'.\n", config_test); exit(E_exec_error); } my_parse(); config_file = saved_config_file; config_save = saved_config_save; } if (!config_valid) exit(E_config_invalid); post_parse(config, cmd->is_proxy_cmd ? match_on_proxy : 0); if (!is_dump || dry_run || verbose) expand_common(); if (is_dump || dry_run || config_from_stdin) do_register_minor = 0; count_resources_or_die(); if (cmd->uc_dialog) uc_node(global_options.usage_count); if (cmd->res_name_required) { if (config == NULL) { fprintf(stderr, "no resources defined!\n"); exit(E_usage); } global_validate_maybe_expand_die_if_invalid(!is_dump); if (optind == argc || !strcmp(argv[optind], "all")) { /* either no resource arguments at all, * but command is dump / dump-xml, so implicit "all", * or an explicit "all" argument is given */ all_resources = 1; if (!is_dump || !force) die_if_no_resources(); /* verify ips first, for all of them */ for_each_resource(res, tmp, config) { verify_ips(res); } if (!config_valid) exit(E_config_invalid); if (is_dump_xml) print_dump_xml_header(); else if (is_dump) print_dump_header(); for_each_resource(res, tmp, config) { if (!is_dump && res->ignore) continue; if (!is_dump && is_drbd_top != res->stacked) continue; int r = call_cmd(cmd, res, EXIT_ON_FAIL); /* does exit for r >= 20! */ /* this superpositioning of return values is so ugly * anyone any better idea? 
*/ if (r > rv) rv = r; } if (is_dump_xml) { --indent; printf("</config>\n"); } } else { /* explicit list of resources to work on */ for (i = optind; (int)i < argc; i++) { res = res_by_name(argv[i]); if (!res) res = res_by_minor(argv[i]); if (!res) { fprintf(stderr, "'%s' not defined in your config.\n", argv[i]); exit(E_usage); } if (res->ignore && !is_dump) { fprintf(stderr, "'%s' ignored, since this host (%s) is not mentioned with an 'on' keyword.\n", res->name, nodeinfo.nodename); rv = E_usage; continue; } if (is_drbd_top != res->stacked && !is_dump) { fprintf(stderr, "'%s' is a %s resource, and not available in %s mode.\n", res->name, res->stacked ? "stacked" : "normal", is_drbd_top ? "stacked" : "normal"); rv = E_usage; continue; } verify_ips(res); if (!is_dump && !config_valid) exit(E_config_invalid); rv = call_cmd(cmd, res, EXIT_ON_FAIL); /* does exit for rv >= 20! */ } } } else { // Commands which do not need a resource name /* no call_cmd, as that implies register_minor, * which does not make sense for resource independent commands */ rv = cmd->function(config, cmd->name); if (rv >= 10) { /* why do we special case the "generic sh-*" commands? */ fprintf(stderr, "command %s exited with code %d\n", cmd->name, rv); exit(rv); } } /* do we really have to bitor the exit code? * it is even only a Boolean value in this case! */ rv |= run_dcmds(); free_config(config); return rv; } void yyerror(char *text) { fprintf(stderr, "%s:%d: %s\n", config_file, line, text); exit(E_syntax); } drbd-8.4.4/user/legacy/drbdadm_minor_table.c0000664000000000000000000001030211753207431017477 0ustar rootroot/* drbdadm_minor_table.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. It was written by Johannes Thoma. Copyright (C) 2002-2008, LINBIT Information Technologies GmbH. Copyright (C) 2002-2008, Philipp Reisner. Copyright (C) 2002-2008, Lars Ellenberg. 
drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ /* This keeps track of which DRBD minor was configured in which * config file. This is required to have alternative config files * (-c switch) and userland event handlers. */ #include <stdio.h> #include <string.h> #include <errno.h> #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include "config.h" #define MAX_MINOR 256 #define MAX_REGISTER_PATH_LEN 1024 /* buf has to be big enough to hold that path. 
* it is assumed that sprintf cannot fail :-] */ void linkname_from_minor(char *buf, int minor) { sprintf(buf, "%s/drbd-minor-%d.conf", DRBD_LIB_DIR, minor); } int unregister_minor(int minor) { char buf[255]; if (minor >= MAX_MINOR || minor < 0) { fprintf(stderr, "unregister_minor: minor too big (%d).\n", minor); return -1; } linkname_from_minor(buf, minor); if (unlink(buf) < 0) { if (errno != ENOENT) { perror("unlink"); return -1; } } return 0; } int register_minor(int minor, const char *path) { char buf[255]; struct stat stat_buf; int err = -1; if (minor >= MAX_MINOR || minor < 0) { fprintf(stderr, "register_minor: minor too big (%d).\n", minor); return -1; } linkname_from_minor(buf, minor); if (!path || !path[0]) fprintf(stderr, "Cannot register an empty path.\n"); else if (path[0] != '/') fprintf(stderr, "Absolute path expected, " "won't register relative path (%s).\n", path); else if (strlen(path) >= MAX_REGISTER_PATH_LEN) fprintf(stderr, "path (%s):\ntoo long to be registered, " "max path len supported: %u\n", path, MAX_REGISTER_PATH_LEN-1); else if (stat(path, &stat_buf) < 0) fprintf(stderr, "stat(%s): %m\n", path); else if (unlink(buf) < 0 && errno != ENOENT) fprintf(stderr, "unlink(%s): %m\n", buf); else if (symlink(path, buf) < 0) fprintf(stderr, "symlink(%s, %s): %m\n", path, buf); else /* it did work out after all! */ err = 0; return err; } /* This returns a static buffer containing the real * configuration file known to be used last for this minor. * If you need the return value longer, stuff it away with strdup. 
*/ char *lookup_minor(int minor) { static char buf[255]; static char resolved_path[MAX_REGISTER_PATH_LEN+1]; struct stat stat_buf; ssize_t len; if (minor >= MAX_MINOR || minor < 0) { fprintf(stderr, "lookup_minor: minor out of range (%d).\n", minor); return NULL; } linkname_from_minor(buf, minor); if (stat(buf, &stat_buf) < 0) { if (errno != ENOENT) fprintf(stderr, "stat(%s): %m\n", buf); return NULL; } len = readlink(buf, resolved_path, sizeof(resolved_path)-1); if (len < 0) { perror("readlink"); return NULL; } if (len >= MAX_REGISTER_PATH_LEN) fprintf(stderr, "readlink(%s): result has probably been truncated\n", buf); resolved_path[len] = '\0'; return resolved_path; } #ifdef TEST int main(int argc, char ** argv) { register_minor(1, "/etc/drbd-xy.conf"); register_minor(15, "/etc/drbd-82.conf"); register_minor(14, "/../../../../../../etc/drbd-82.conf"); printf("Minor 1 is %s.\n", lookup_minor(1)); printf("Minor 2 is %s.\n", lookup_minor(2)); printf("Minor 14 is %s.\n", lookup_minor(14)); printf("Minor 15 is %s.\n", lookup_minor(15)); return 0; } #endif drbd-8.4.4/user/legacy/drbdadm_parser.c0000664000000000000000000012464512132747531016522 0ustar rootroot/* * drbdadm_parser.c a hand crafted parser This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2006-2008, LINBIT Information Technologies GmbH Copyright (C) 2006-2008, Philipp Reisner Copyright (C) 2006-2008, Lars Ellenberg drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. 
If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #define _GNU_SOURCE #include #include #include #include #include #include #include #include "drbdadm.h" #include "linux/drbd_limits.h" #include "drbdtool_common.h" #include "drbdadm_parser.h" YYSTYPE yylval; ///////////////////// static int c_section_start; void my_parse(void); struct d_name *names_from_str(char* str) { struct d_name *names; names = malloc(sizeof(struct d_name)); names->next = NULL; names->name = strdup(str); return names; } char *_names_to_str_c(char* buffer, struct d_name *names, char c) { int n = 0; if (!names) return buffer; while (1) { n += snprintf(buffer + n, NAMES_STR_SIZE - n, "%s", names->name); names = names->next; if (!names) return buffer; n += snprintf(buffer + n, NAMES_STR_SIZE - n, "%c", c); } } char *_names_to_str(char* buffer, struct d_name *names) { return _names_to_str_c(buffer, names, ' '); } int name_in_names(char *name, struct d_name *names) { while (names) { if (!strcmp(names->name, name)) return 1; names = names->next; } return 0; } void free_names(struct d_name *names) { struct d_name *nf; while (names) { nf = names->next; free(names->name); free(names); names = nf; } } static void append_names(struct d_name **head, struct d_name ***last, struct d_name *to_copy) { struct d_name *new; while (to_copy) { new = malloc(sizeof(struct d_name)); if (!*head) *head = new; new->name = strdup(to_copy->name); new->next = NULL; if (*last) **last = new; *last = &new->next; to_copy = to_copy->next; } } struct d_name *concat_names(struct d_name *to_copy1, struct d_name *to_copy2) { struct d_name *head = NULL, **last = NULL; append_names(&head, &last, to_copy1); append_names(&head, &last, to_copy2); return head; } void m_strtoll_range(const char *s, char def_unit, const char *name, unsigned long long min, unsigned long long max) { unsigned long long r = m_strtoll(s, def_unit); char unit[] = { def_unit > '1' ? 
def_unit : 0, 0 }; if (min > r || r > max) { fprintf(stderr, "%s:%d: %s %s => %llu%s out of range [%llu..%llu]%s.\n", config_file, fline, name, s, r, unit, min, max, unit); if (config_valid <= 1) { config_valid = 0; return; } } if (DEBUG_RANGE_CHECK) { fprintf(stderr, "%s:%d: %s %s => %llu%s in range [%llu..%llu]%s.\n", config_file, fline, name, s, r, unit, min, max, unit); } } void range_check(const enum range_checks what, const char *name, const char *value) { switch (what) { case R_NO_CHECK: break; default: fprintf(stderr, "%s:%d: unknown range for %s => %s\n", config_file, fline, name, value); break; case R_MINOR_COUNT: m_strtoll_range(value, 1, name, DRBD_MINOR_COUNT_MIN, DRBD_MINOR_COUNT_MAX); break; case R_DIALOG_REFRESH: m_strtoll_range(value, 1, name, DRBD_DIALOG_REFRESH_MIN, DRBD_DIALOG_REFRESH_MAX); break; case R_DISK_SIZE: m_strtoll_range(value, 's', name, DRBD_DISK_SIZE_SECT_MIN, DRBD_DISK_SIZE_SECT_MAX); break; case R_TIMEOUT: m_strtoll_range(value, 1, name, DRBD_TIMEOUT_MIN, DRBD_TIMEOUT_MAX); break; case R_CONNECT_INT: m_strtoll_range(value, 1, name, DRBD_CONNECT_INT_MIN, DRBD_CONNECT_INT_MAX); break; case R_PING_INT: m_strtoll_range(value, 1, name, DRBD_PING_INT_MIN, DRBD_PING_INT_MAX); break; case R_MAX_BUFFERS: m_strtoll_range(value, 1, name, DRBD_MAX_BUFFERS_MIN, DRBD_MAX_BUFFERS_MAX); break; case R_MAX_EPOCH_SIZE: m_strtoll_range(value, 1, name, DRBD_MAX_EPOCH_SIZE_MIN, DRBD_MAX_EPOCH_SIZE_MAX); break; case R_SNDBUF_SIZE: m_strtoll_range(value, 1, name, DRBD_SNDBUF_SIZE_MIN, DRBD_SNDBUF_SIZE_MAX); break; case R_RCVBUF_SIZE: m_strtoll_range(value, 1, name, DRBD_RCVBUF_SIZE_MIN, DRBD_RCVBUF_SIZE_MAX); break; case R_KO_COUNT: m_strtoll_range(value, 1, name, DRBD_KO_COUNT_MIN, DRBD_KO_COUNT_MAX); break; case R_RATE: m_strtoll_range(value, 'K', name, DRBD_RATE_MIN, DRBD_RATE_MAX); break; case R_AL_EXTENTS: m_strtoll_range(value, 1, name, DRBD_AL_EXTENTS_MIN, DRBD_AL_EXTENTS_MAX); break; case R_PORT: m_strtoll_range(value, 1, name, DRBD_PORT_MIN, 
DRBD_PORT_MAX); break; /* FIXME not yet implemented! case R_META_IDX: m_strtoll_range(value, 1, name, DRBD_META_IDX_MIN, DRBD_META_IDX_MAX); break; */ case R_WFC_TIMEOUT: m_strtoll_range(value, 1, name, DRBD_WFC_TIMEOUT_MIN, DRBD_WFC_TIMEOUT_MAX); break; case R_DEGR_WFC_TIMEOUT: m_strtoll_range(value, 1, name, DRBD_DEGR_WFC_TIMEOUT_MIN, DRBD_DEGR_WFC_TIMEOUT_MAX); break; case R_OUTDATED_WFC_TIMEOUT: m_strtoll_range(value, 1, name, DRBD_OUTDATED_WFC_TIMEOUT_MIN, DRBD_OUTDATED_WFC_TIMEOUT_MAX); break; case R_C_PLAN_AHEAD: m_strtoll_range(value, 1, name, DRBD_C_PLAN_AHEAD_MIN, DRBD_C_PLAN_AHEAD_MAX); break; case R_C_DELAY_TARGET: m_strtoll_range(value, 1, name, DRBD_C_DELAY_TARGET_MIN, DRBD_C_DELAY_TARGET_MAX); break; case R_C_FILL_TARGET: m_strtoll_range(value, 's', name, DRBD_C_FILL_TARGET_MIN, DRBD_C_FILL_TARGET_MAX); break; case R_C_MAX_RATE: m_strtoll_range(value, 'k', name, DRBD_C_MAX_RATE_MIN, DRBD_C_MAX_RATE_MAX); break; case R_C_MIN_RATE: m_strtoll_range(value, 'k', name, DRBD_C_MIN_RATE_MIN, DRBD_C_MIN_RATE_MAX); break; case R_CONG_FILL: m_strtoll_range(value, 's', name, DRBD_CONG_FILL_MIN, DRBD_CONG_FILL_MAX); break; case R_CONG_EXTENTS: m_strtoll_range(value, 1, name, DRBD_CONG_EXTENTS_MIN, DRBD_CONG_EXTENTS_MAX); break; } } struct d_option *new_opt(char *name, char *value) { struct d_option *cn = malloc(sizeof(struct d_option)); /* fprintf(stderr,"%s:%d: %s = %s\n",config_file,line,name,value); */ cn->name = name; cn->value = value; cn->mentioned = 0; cn->is_default = 0; cn->is_escaped = 0; return cn; } static void derror(struct d_host_info *host, struct d_resource *res, char *text) { config_valid = 0; fprintf(stderr, "%s:%d: in resource %s, on %s { ... 
}:" " '%s' keyword missing.\n", config_file, c_section_start, res->name, names_to_str(host->on_hosts), text); } void pdperror(char *text) { config_valid = 0; fprintf(stderr, "%s:%d: in proxy plugin section: %s.\n", config_file, line, text); exit(E_config_invalid); } static void pperror(struct d_host_info *host, struct d_proxy_info *proxy, char *text) { config_valid = 0; fprintf(stderr, "%s:%d: in section: on %s { proxy on %s { ... } }:" " '%s' keyword missing.\n", config_file, c_section_start, names_to_str(host->on_hosts), names_to_str(proxy->on_hosts), text); } #define typecheck(type,x) \ ({ type __dummy; \ typeof(x) __dummy2; \ (void)(&__dummy == &__dummy2); \ 1; \ }) #define for_each_host(h_,hosts_) \ for ( ({ typecheck(struct d_name*, h_); \ h_ = hosts_; }); \ h_; h_ = h_->next) /* * for check_uniq: check uniqueness of * resource names, ip:port, node:disk and node:device combinations * as well as resource:section ... * hash table to test for uniqueness of these values... * 256 (max minors) * *( * 2 (host sections) * 4 (res ip:port node:disk node:device) * + 4 (other sections) * + some more, * if we want to check for scoped uniqueness of *every* option * ) * since nobody (?) will actually use more than a dozen minors, * this should be more than enough. */ struct hsearch_data global_htable; void check_uniq_init(void) { memset(&global_htable, 0, sizeof(global_htable)); if (!hcreate_r(256 * ((2 * 4) + 4), &global_htable)) { fprintf(stderr, "Insufficient memory.\n"); exit(E_exec_error); }; } /* some settings need only be unique within one resource definition. * we need currently about 8 + (number of host) * 8 entries, * 200 should be much more than enough. 
*/ struct hsearch_data per_resource_htable; void check_upr_init(void) { static int created = 0; if (config_valid >= 2) return; if (created) hdestroy_r(&per_resource_htable); memset(&per_resource_htable, 0, sizeof(per_resource_htable)); if (!hcreate_r(256, &per_resource_htable)) { fprintf(stderr, "Insufficient memory.\n"); exit(E_exec_error); }; created = 1; } /* FIXME * strictly speaking we don't need to check for uniqueness of disk and device names, * but for uniqueness of their major:minor numbers ;-) */ int vcheck_uniq(struct hsearch_data *ht, const char *what, const char *fmt, va_list ap) { int rv; ENTRY e, *ep; e.key = e.data = ep = NULL; /* if we are done parsing the config file, * switch off this paranoia */ if (config_valid >= 2) return 1; rv = vasprintf(&e.key, fmt, ap); if (rv < 0) { perror("vasprintf"); exit(E_thinko); } if (EXIT_ON_CONFLICT && !what) { fprintf(stderr, "Oops, unset argument in %s:%d.\n", __FILE__, __LINE__); exit(E_thinko); } m_asprintf((char **)&e.data, "%s:%u", config_file, fline); hsearch_r(e, FIND, &ep, ht); //fprintf(stderr, "FIND %s: %p\n", e.key, ep); if (ep) { if (what) { fprintf(stderr, "%s: conflicting use of %s '%s' ...\n" "%s: %s '%s' first used here.\n", (char *)e.data, what, ep->key, (char *)ep->data, what, ep->key); } free(e.key); free(e.data); config_valid = 0; } else { //fprintf(stderr, "ENTER %s\t=>\t%s\n", e.key, (char *)e.data); hsearch_r(e, ENTER, &ep, ht); if (!ep) { fprintf(stderr, "hash table entry (%s => %s) failed\n", e.key, (char *)e.data); exit(E_thinko); } ep = NULL; } if (EXIT_ON_CONFLICT && ep) exit(E_config_invalid); return !ep; } int check_uniq(const char *what, const char *fmt, ...) { int rv; va_list ap; va_start(ap, fmt); rv = vcheck_uniq(&global_htable, what, fmt, ap); va_end(ap); return rv; } /* unique per resource */ int check_upr(const char *what, const char *fmt, ...) 
{ int rv; va_list ap; va_start(ap, fmt); rv = vcheck_uniq(&per_resource_htable, what, fmt, ap); va_end(ap); return rv; } void check_meta_disk(struct d_host_info *host) { struct d_name *h; if (strcmp(host->meta_disk, "internal") != 0) { /* external */ if (host->meta_index == NULL) { fprintf(stderr, "%s:%d: expected 'meta-disk = %s [index]'.\n", config_file, fline, host->meta_disk); } /* index either some number, or "flexible" */ for_each_host(h, host->on_hosts) check_uniq("meta-disk", "%s:%s[%s]", h->name, host->meta_disk, host->meta_index); } else if (host->meta_index) { /* internal */ if (strcmp(host->meta_index, "flexible") != 0) { /* internal, not flexible, but index given: no sir! */ fprintf(stderr, "%s:%d: no index allowed with 'meta-disk = internal'.\n", config_file, fline); } /* else internal, flexible: fine */ } else { /* internal, not flexible */ host->meta_index = strdup("internal"); } } static void pe_expected(const char *exp) { const char *s = yytext; fprintf(stderr, "%s:%u: Parse error: '%s' expected,\n\t" "but got '%.20s%s'\n", config_file, line, exp, s, strlen(s) > 20 ? "..." : ""); exit(E_config_invalid); } static void check_string_error(int got) { const char *msg; switch(got) { case TK_ERR_STRING_TOO_LONG: msg = "Token too long"; break; case TK_ERR_DQSTRING_TOO_LONG: msg = "Double quoted string too long"; break; case TK_ERR_DQSTRING: msg = "Unterminated double quoted string\n we don't allow embedded newlines\n "; break; default: return; } fprintf(stderr,"%s:%u: %s >>>%.20s...<<<\n", config_file, line, msg, yytext); exit(E_config_invalid); } static void pe_expected_got(const char *exp, int got) { static char tmp[2] = "\0"; const char *s = yytext; if (exp[0] == '\'' && exp[1] && exp[2] == '\'' && exp[3] == 0) { tmp[0] = exp[1]; } fprintf(stderr, "%s:%u: Parse error: '%s' expected,\n\t" "but got '%.20s%s' (TK %d)\n", config_file, line, tmp[0] ? tmp : exp, s, strlen(s) > 20 ? "..." 
		: "", got);
	exit(E_config_invalid);
}

#define EXP(TOKEN1) \
({ \
	int token; \
	token = yylex(); \
	if (token != TOKEN1) { \
		if (TOKEN1 == TK_STRING) \
			check_string_error(token); \
		pe_expected_got( #TOKEN1, token); \
	} \
	token; \
})

static void expect_STRING_or_INT(void)
{
	int token = yylex();
	switch(token) {
	case TK_INTEGER:
	case TK_STRING:
		break;
	case TK_ON:
		yylval.txt = strdup(yytext);
		break;
	default:
		check_string_error(token);
		pe_expected_got("TK_STRING | TK_INTEGER", token);
	}
}

static void parse_global(void)
{
	fline = line;
	check_uniq("global section", "global");
	if (config) {
		fprintf(stderr,
			"%s:%u: You should put the global {} section\n\t"
			"in front of any resource {} section\n",
			config_file, line);
	}
	EXP('{');
	while (1) {
		int token = yylex();
		fline = line;
		switch (token) {
		case TK_DISABLE_IP_VERIFICATION:
			global_options.disable_ip_verification = 1;
			break;
		case TK_MINOR_COUNT:
			EXP(TK_INTEGER);
			range_check(R_MINOR_COUNT, "minor-count", yylval.txt);
			global_options.minor_count = atoi(yylval.txt);
			break;
		case TK_DIALOG_REFRESH:
			EXP(TK_INTEGER);
			range_check(R_DIALOG_REFRESH, "dialog-refresh", yylval.txt);
			global_options.dialog_refresh = atoi(yylval.txt);
			break;
		case TK_USAGE_COUNT:
			switch (yylex()) {
			case TK_YES:
				global_options.usage_count = UC_YES;
				break;
			case TK_NO:
				global_options.usage_count = UC_NO;
				break;
			case TK_ASK:
				global_options.usage_count = UC_ASK;
				break;
			default:
				pe_expected("yes | no | ask");
			}
			break;
		case '}':
			return;
		default:
			pe_expected("dialog-refresh | minor-count | "
				    "disable-ip-verification");
		}
		EXP(';');
	}
}

static void check_and_change_deprecated_alias(char **name, int token_option)
{
	if (token_option == TK_HANDLER_OPTION) {
		if (!strcmp(*name, "outdate-peer")) {
			/* fprintf(stderr, "config file:line: name is deprecated ...\n") */
			free(*name);
			*name = strdup("fence-peer");
		}
	}
}

static struct d_option *parse_options_d(int token_switch, int token_option,
		int token_delegate, void (*delegate)(void*), void *ctx)
{
	char *opt_name;
	int token,
	    token_group;
	enum range_checks rc;
	struct d_option *options = NULL, *ro = NULL;
	c_section_start = line;
	fline = line;

	while (1) {
		token_group = yylex();
		/* Keep the higher bits in token_option, remove them from token. */
		token = REMOVE_GROUP_FROM_TOKEN(token_group);
		fline = line;
		if (token == token_switch) {
			options = APPEND(options, new_opt(yylval.txt, NULL));
		} else if (token == token_option ||
				GET_TOKEN_GROUP(token_option & token_group)) {
			opt_name = yylval.txt;
			check_and_change_deprecated_alias(&opt_name, token_option);
			rc = yylval.rc;
			expect_STRING_or_INT();
			range_check(rc, opt_name, yylval.txt);
			ro = new_opt(opt_name, yylval.txt);
			options = APPEND(options, ro);
		} else if (token == token_delegate ||
				GET_TOKEN_GROUP(token_delegate & token_group)) {
			delegate(ctx);
			continue;
		} else if (token == TK_DEPRECATED_OPTION) {
			/* fprintf(stderr, "Warn: Ignoring deprecated option '%s'\n", yylval.txt); */
			expect_STRING_or_INT();
		} else if (token == '}') {
			return options;
		} else {
			pe_expected("an option keyword");
		}

		switch (yylex()) {
		case TK__IS_DEFAULT:
			ro->is_default = 1;
			EXP(';');
			break;
		case ';':
			break;
		default:
			pe_expected("_is_default | ;");
		}
	}
}

static struct d_option *parse_options(int token_switch, int token_option)
{
	return parse_options_d(token_switch, token_option, 0, NULL, NULL);
}

static void __parse_address(char** addr, char** port, char** af)
{
	switch(yylex()) {
	case TK_SCI:   /* 'ssocks' was named 'sci' before.
*/ if (af) *af = strdup("ssocks"); EXP(TK_IPADDR); break; case TK_SSOCKS: case TK_SDP: case TK_IPV4: if (af) *af = yylval.txt; EXP(TK_IPADDR); break; case TK_IPV6: if (af) *af = yylval.txt; EXP('['); EXP(TK_IPADDR6); break; case TK_IPADDR: if (af) *af = strdup("ipv4"); break; /* case '[': // Do not foster people's laziness ;) EXP(TK_IPADDR6); *af = strdup("ipv6"); break; */ default: pe_expected("ssocks | sdp | ipv4 | ipv6 | "); } if (addr) *addr = yylval.txt; if (af && !strcmp(*af, "ipv6")) EXP(']'); EXP(':'); EXP(TK_INTEGER); if (port) *port = yylval.txt; range_check(R_PORT, "port", yylval.txt); } static void parse_address(struct d_name *on_hosts, char** addr, char** port, char** af) { struct d_name *h; __parse_address(addr, port, af); if (!strcmp(*addr, "127.0.0.1") || !strcmp(*addr, "::1")) for_each_host(h, on_hosts) check_uniq("IP", "%s:%s:%s", h->name, *addr, *port); else check_uniq("IP", "%s:%s", *addr, *port); EXP(';'); } static void parse_hosts(struct d_name **pnp, char delimeter) { char errstr[20]; struct d_name *name; int hosts = 0; int token; while (1) { token = yylex(); switch (token) { case TK_STRING: name = malloc(sizeof(struct d_name)); name->name = yylval.txt; name->next = NULL; *pnp = name; pnp = &name->next; hosts++; break; default: if (token == delimeter) { if (!hosts) pe_expected_got("TK_STRING", token); return; } else { sprintf(errstr, "TK_STRING | '%c'", delimeter); pe_expected_got(errstr, token); } } } } static void parse_proxy_section(struct d_host_info *host) { struct d_proxy_info *proxy; proxy=calloc(1,sizeof(struct d_proxy_info)); host->proxy = proxy; EXP(TK_ON); parse_hosts(&proxy->on_hosts, '{'); while (1) { switch (yylex()) { case TK_INSIDE: parse_address(proxy->on_hosts, &proxy->inside_addr, &proxy->inside_port, &proxy->inside_af); break; case TK_OUTSIDE: parse_address(proxy->on_hosts, &proxy->outside_addr, &proxy->outside_port, &proxy->outside_af); break; case '}': goto break_loop; default: pe_expected("inside | outside"); } } 
break_loop: if (!proxy->inside_addr) pperror(host, proxy, "inside"); if (!proxy->outside_addr) pperror(host, proxy, "outside"); return; } static void parse_meta_disk(char **disk, char** index) { EXP(TK_STRING); *disk = yylval.txt; if (strcmp("internal", yylval.txt)) { EXP('['); EXP(TK_INTEGER); *index = yylval.txt; EXP(']'); EXP(';'); } else { EXP(';'); } } static void check_minor_nonsense(const char *devname, const int explicit_minor) { if (!devname) return; /* if devname is set, it starts with /dev/drbd */ if (only_digits(devname + 9)) { int m = strtol(devname + 9, NULL, 10); if (m == explicit_minor) return; fprintf(stderr, "%s:%d: explicit minor number must match with device name\n" "\tTry \"device /dev/drbd%u minor %u;\",\n" "\tor leave off either device name or explicit minor.\n" "\tArbitrary device names must start with /dev/drbd_\n" "\tmind the '_'! (/dev/ is optional, but drbd_ is required)\n", config_file, fline, explicit_minor, explicit_minor); config_valid = 0; return; } else if (devname[9] == '_') return; fprintf(stderr, "%s:%d: arbitrary device name must start with /dev/drbd_\n" "\tmind the '_'! 
(/dev/ is optional, but drbd_ is required)\n", config_file, fline); config_valid = 0; return; } static void parse_device(struct d_name* on_hosts, unsigned *minor, char **device) { struct d_name *h; int m; switch (yylex()) { case TK_STRING: if (!strncmp("drbd", yylval.txt, 4)) { m_asprintf(device, "/dev/%s", yylval.txt); free(yylval.txt); } else *device = yylval.txt; if (strncmp("/dev/drbd", *device, 9)) { fprintf(stderr, "%s:%d: device name must start with /dev/drbd\n" "\t(/dev/ is optional, but drbd is required)\n", config_file, fline); config_valid = 0; /* no goto out yet, * as that would additionally throw a parse error */ } switch (yylex()) { default: pe_expected("minor | ;"); /* fall through */ case ';': m = dt_minor_of_dev(*device); if (m < 0) { fprintf(stderr, "%s:%d: no minor given nor device name contains a minor number\n", config_file, fline); config_valid = 0; } *minor = m; goto out; case TK_MINOR: ; /* double fall through */ } case TK_MINOR: EXP(TK_INTEGER); *minor = atoi(yylval.txt); EXP(';'); /* if both device name and minor number are explicitly given, * force /dev/drbd or /dev/drbd_ */ check_minor_nonsense(*device, *minor); } out: for_each_host(h, on_hosts) { check_uniq("device-minor", "device-minor:%s:%u", h->name, *minor); if (*device) check_uniq("device", "device:%s:%s", h->name, *device); } } enum parse_host_section_flags { REQUIRE_ALL = 1, BY_ADDRESS = 2, }; static void parse_host_section(struct d_resource *res, struct d_name* on_hosts, enum parse_host_section_flags flags) { struct d_host_info *host; struct d_name *h; int in_braces = 1; c_section_start = line; fline = line; host=calloc(1,sizeof(struct d_host_info)); host->on_hosts = on_hosts; host->config_line = c_section_start; host->device_minor = -1; if (flags & BY_ADDRESS) { /* floating
{} */ char *fake_uname = NULL; int token; host->by_address = 1; __parse_address(&host->address, &host->port, &host->address_family); check_uniq("IP", "%s:%s", host->address, host->port); if (!strcmp(host->address_family, "ipv6")) m_asprintf(&fake_uname, "ipv6 [%s]:%s", host->address, host->port); else m_asprintf(&fake_uname, "%s:%s", host->address, host->port); on_hosts = names_from_str(fake_uname); host->on_hosts = on_hosts; token = yylex(); switch(token) { case '{': break; case ';': in_braces = 0; break; default: pe_expected_got("{ | ;", token); } } for_each_host(h, on_hosts) check_upr("host section", "%s: on %s", res->name, h->name); res->all_hosts = APPEND(res->all_hosts, host); while (in_braces) { int token = yylex(); fline = line; switch (token) { case TK_DISK: for_each_host(h, on_hosts) check_upr("disk statement", "%s:%s:disk", res->name, h->name); EXP(TK_STRING); host->disk = yylval.txt; for_each_host(h, on_hosts) check_uniq("disk", "disk:%s:%s", h->name, yylval.txt); EXP(';'); break; case TK_DEVICE: for_each_host(h, on_hosts) check_upr("device statement", "%s:%s:device", res->name, h->name); parse_device(on_hosts, &host->device_minor, &host->device); break; case TK_ADDRESS: if (host->by_address) { fprintf(stderr, "%s:%d: address statement not allowed for floating {} host sections\n", config_file, fline); config_valid = 0; exit(E_config_invalid); } for_each_host(h, on_hosts) check_upr("address statement", "%s:%s:address", res->name, h->name); parse_address(on_hosts, &host->address, &host->port, &host->address_family); range_check(R_PORT, "port", host->port); break; case TK_META_DISK: for_each_host(h, on_hosts) check_upr("meta-disk statement", "%s:%s:meta-disk", res->name, h->name); parse_meta_disk(&host->meta_disk, &host->meta_index); check_meta_disk(host); break; case TK_FLEX_META_DISK: for_each_host(h, on_hosts) check_upr("meta-disk statement", "%s:%s:meta-disk", res->name, h->name); EXP(TK_STRING); host->meta_disk = yylval.txt; if (strcmp("internal", 
yylval.txt)) { host->meta_index = strdup("flexible"); } check_meta_disk(host); EXP(';'); break; case TK_PROXY: parse_proxy_section(host); break; case '}': in_braces = 0; break; default: pe_expected("disk | device | address | meta-disk " "| flexible-meta-disk"); } } /* Inherit device, disk, meta_disk and meta_index from the resource. */ if(!host->disk && res->disk) { host->disk = strdup(res->disk); for_each_host(h, on_hosts) check_uniq("disk", "disk:%s:%s", h->name, host->disk); } if(!host->device && res->device) { host->device = strdup(res->device); } if (host->device_minor == -1U && res->device_minor != -1U) { host->device_minor = res->device_minor; for_each_host(h, on_hosts) check_uniq("device-minor", "device-minor:%s:%d", h->name, host->device_minor); } if(!host->meta_disk && res->meta_disk) { host->meta_disk = strdup(res->meta_disk); if(res->meta_index) host->meta_index = strdup(res->meta_index); check_meta_disk(host); } if (!(flags & REQUIRE_ALL)) return; if (!host->device && host->device_minor == -1U) derror(host, res, "device"); if (!host->disk) derror(host, res, "disk"); if (!host->address) derror(host, res, "address"); if (!host->meta_disk) derror(host, res, "meta-disk"); } void parse_skip() { int level; int token; fline = line; token = yylex(); switch (token) { case TK_STRING: EXP('{'); break; case '{': break; default: check_string_error(token); pe_expected("[ some_text ] {"); } level = 1; while (level) { switch (yylex()) { case '{': /* if you really want to, you can wrap this with a GB size config file :) */ level++; break; case '}': level--; break; case 0: fprintf(stderr, "%s:%u: reached eof " "while parsing this skip block.\n", config_file, fline); exit(E_config_invalid); } } while (level) ; } static void parse_stacked_section(struct d_resource* res) { struct d_host_info *host; struct d_name *h; c_section_start = line; fline = line; host=calloc(1,sizeof(struct d_host_info)); host->device_minor = -1; res->all_hosts = APPEND(res->all_hosts, host); 
EXP(TK_STRING); check_uniq("stacked-on-top-of", "stacked:%s", yylval.txt); host->lower_name = yylval.txt; m_asprintf(&host->meta_disk, "%s", "internal"); m_asprintf(&host->meta_index, "%s", "internal"); EXP('{'); while (1) { switch(yylex()) { case TK_DEVICE: for_each_host(h, host->on_hosts) check_upr("device statement", "%s:%s:device", res->name, h->name); parse_device(host->on_hosts, &host->device_minor, &host->device); break; case TK_ADDRESS: for_each_host(h, host->on_hosts) check_upr("address statement", "%s:%s:address", res->name, h->name); parse_address(NULL, &host->address, &host->port, &host->address_family); range_check(R_PORT, "port", yylval.txt); break; case TK_PROXY: parse_proxy_section(host); break; case '}': goto break_loop; default: pe_expected("device | address | proxy"); } } break_loop: res->stacked_on_one = 1; /* inherit device */ if (!host->device && res->device) { host->device = strdup(res->device); for_each_host(h, host->on_hosts) { if (host->device) check_uniq("device", "device:%s:%s", h->name, host->device); } } if (host->device_minor == -1U && res->device_minor != -1U) { host->device_minor = res->device_minor; for_each_host(h, host->on_hosts) check_uniq("device-minor", "device-minor:%s:%d", h->name, host->device_minor); } if (!host->device && host->device_minor == -1U) derror(host, res, "device"); if (!host->address) derror(host,res,"address"); if (!host->meta_disk) derror(host,res,"meta-disk"); } void startup_delegate(void *ctx) { struct d_resource *res = (struct d_resource *)ctx; if (!strcmp(yytext, "become-primary-on")) { parse_hosts(&res->become_primary_on, ';'); } else if (!strcmp(yytext, "stacked-timeouts")) { res->stacked_timeouts = 1; EXP(';'); } else pe_expected(" | become-primary-on | stacked-timeouts"); } void net_delegate(void *ctx) { enum pr_flags flags = (enum pr_flags)ctx; if (!strcmp(yytext, "discard-my-data") && flags & IgnDiscardMyData) EXP(';'); else pe_expected("an option keyword"); } void set_me_in_resource(struct 
d_resource* res, int match_on_proxy) { struct d_host_info *host; /* Determine the local host section */ for (host = res->all_hosts; host; host=host->next) { /* do we match this host? */ if (match_on_proxy) { if (!host->proxy || !name_in_names(nodeinfo.nodename, host->proxy->on_hosts)) continue; } else if (host->by_address) { if (!have_ip(host->address_family, host->address) && /* for debugging only, e.g. __DRBD_NODE__=10.0.0.1 */ strcmp(nodeinfo.nodename, host->address)) continue; } else if (host->lower) { if (!host->lower->me) continue; } else if (!host->on_hosts) { /* huh? a resource without hosts to run on?! */ continue; } else { if (!name_in_names(nodeinfo.nodename, host->on_hosts) && strcmp("_this_host", host->on_hosts->name)) continue; } /* we matched. */ if (res->ignore) { config_valid = 0; fprintf(stderr, "%s:%d: in resource %s, %s %s { ... }:\n" "\tYou cannot ignore and define at the same time.\n", res->config_file, host->config_line, res->name, host->lower ? "stacked-on-top-of" : "on", host->lower ? host->lower->name : names_to_str(host->on_hosts)); } if (res->me) { config_valid = 0; fprintf(stderr, "%s:%d: in resource %s, %s %s { ... } ... %s %s { ... }:\n" "\tThere are multiple host sections for this node.\n", res->config_file, host->config_line, res->name, res->me->lower ? "stacked-on-top-of" : "on", res->me->lower ? res->me->lower->name : names_to_str(res->me->on_hosts), host->lower ? "stacked-on-top-of" : "on", host->lower ? host->lower->name : names_to_str(host->on_hosts)); } res->me = host; if (host->lower) res->stacked = 1; } /* If there is no me, implicitly ignore that resource */ if (!res->me) { res->ignore = 1; return; } } void set_peer_in_resource(struct d_resource* res, int peer_required) { struct d_host_info *host = NULL; if (res->ignore) return; /* me must be already set */ if (!res->me) { /* should have been implicitly ignored. 
*/ fprintf(stderr, "%s:%d: in resource %s:\n" "\tcannot determine the peer, don't even know myself!\n", res->config_file, res->start_line, res->name); exit(E_thinko); } /* only one host section? */ if (!res->all_hosts->next) { if (peer_required) { fprintf(stderr, "%s:%d: in resource %s:\n" "\tMissing section 'on { ... }'.\n", res->config_file, res->start_line, res->name); config_valid = 0; } return; } /* short cut for exactly two host sections. * silently ignore any --peer connect_to_host option. */ if (res->all_hosts->next->next == NULL) { res->peer = res->all_hosts == res->me ? res->all_hosts->next : res->all_hosts; if (dry_run > 1 && connect_to_host) fprintf(stderr, "%s:%d: in resource %s:\n" "\tIgnoring --peer '%s': there are only two host sections.\n", res->config_file, res->start_line, res->name, connect_to_host); return; } /* Multiple peer hosts to choose from. * we need some help! */ if (!connect_to_host) { if (peer_required) { fprintf(stderr, "%s:%d: in resource %s:\n" "\tThere are multiple host sections for the peer node.\n" "\tUse the --peer option to select which peer section to use.\n", res->config_file, res->start_line, res->name); config_valid = 0; } return; } for (host = res->all_hosts; host; host=host->next) { if (host->by_address && strcmp(connect_to_host, host->address)) continue; if (host->proxy && !name_in_names(nodeinfo.nodename, host->proxy->on_hosts)) continue; if (!name_in_names(connect_to_host, host->on_hosts)) continue; if (host == res->me) { fprintf(stderr, "%s:%d: in resource %s\n" "\tInvoked with --peer '%s', but that matches myself!\n", res->config_file, res->start_line, res->name, connect_to_host); res->peer = NULL; break; } if (res->peer) { fprintf(stderr, "%s:%d: in resource %s:\n" "\tInvoked with --peer '%s', but that matches multiple times!\n", res->config_file, res->start_line, res->name, connect_to_host); res->peer = NULL; break; } res->peer = host; } if (peer_required && !res->peer) { config_valid = 0; if (!host) 
fprintf(stderr, "%s:%d: in resource %s:\n" "\tNo host ('on' or 'floating') section matches --peer '%s'\n", res->config_file, res->start_line, res->name, connect_to_host); } } void set_on_hosts_in_res(struct d_resource *res) { struct d_resource *l_res, *tmp; struct d_host_info *host, *host2; struct d_name *h, **last; for (host = res->all_hosts; host; host=host->next) { if (host->lower_name) { for_each_resource(l_res, tmp, config) { if (!strcmp(l_res->name, host->lower_name)) break; } if (l_res == NULL) { fprintf(stderr, "%s:%d: in resource %s, " "referenced resource '%s' not defined.\n", res->config_file, res->start_line, res->name, host->lower_name); config_valid = 0; continue; } /* Simple: host->on_hosts = concat_names(l_res->me->on_hosts, l_res->peer->on_hosts); */ last = NULL; for (host2 = l_res->all_hosts; host2; host2 = host2->next) if (!host2->lower_name) { append_names(&host->on_hosts, &last, host2->on_hosts); for_each_host(h, host2->on_hosts) { check_uniq("device-minor", "device-minor:%s:%u", h->name, host->device_minor); if (host->device) check_uniq("device", "device:%s:%s", h->name, host->device); } } host->lower = l_res; /* */ if (!strcmp(host->address, "127.0.0.1") || !strcmp(host->address, "::1")) for_each_host(h, host->on_hosts) check_uniq("IP", "%s:%s:%s", h->name, host->address, host->port); } } } void set_disk_in_res(struct d_resource *res) { struct d_host_info *host; if (res->ignore) return; for (host = res->all_hosts; host; host=host->next) { if (host->lower) { if (res->stacked && host->lower->stacked) { fprintf(stderr, "%s:%d: in resource %s, stacked-on-top-of %s { ... }:\n" "\tFIXME. 
I won't stack stacked resources.\n", res->config_file, res->start_line, res->name, host->lower_name); config_valid = 0; } if (host->lower->ignore) continue; if (host->lower->me->device) m_asprintf(&host->disk, "%s", host->lower->me->device); else m_asprintf(&host->disk, "/dev/drbd%u", host->lower->me->device_minor); if (!host->disk) derror(host,res,"disk"); } } } void proxy_delegate(void *ctx) { struct d_resource *res = (struct d_resource *)ctx; int token; struct d_option *options, *opt; struct d_name *line, *word, **pnp; opt = NULL; token = yylex(); if (token != '{') { fprintf(stderr, "%s:%d: expected \"{\" after \"proxy\" keyword\n", config_file, fline); exit(E_config_invalid); } options = NULL; while (1) { pnp = &line; while (1) { token = yylex(); if (token == ';') break; if (token == '}') { if (pnp == &line) goto out; fprintf(stderr, "%s:%d: Missing \";\" before \"}\"\n", config_file, fline); exit(E_config_invalid); } word = malloc(sizeof(struct d_name)); if (!word) pdperror("out of memory."); word->name = yylval.txt; word->next = NULL; *pnp = word; pnp = &word->next; } opt = calloc(1, sizeof(struct d_option)); if (!opt) pdperror("out of memory."); opt->name = strdup(names_to_str(line)); options = APPEND(options, opt); free_names(line); } out: res->proxy_plugins = options; } int parse_proxy_settings(struct d_resource *res, int flags) { int token; if (flags & PARSER_CHECK_PROXY_KEYWORD) { token = yylex(); if (token != TK_PROXY) { if (flags & PARSER_STOP_IF_INVALID) { yyrestart(yyin); /* flushes flex's buffers */ return 1; } pe_expected_got("proxy", token); } } EXP('{'); res->proxy_options = parse_options_d(TK_PROXY_SWITCH, TK_PROXY_OPTION | TK_PROXY_GROUP, TK_PROXY_DELEGATE, proxy_delegate, res); return 0; } struct d_resource* parse_resource(char* res_name, enum pr_flags flags) { struct d_resource* res; struct d_name *host_names; int token; check_upr_init(); check_uniq("resource section", res_name); res=calloc(1,sizeof(struct d_resource)); res->name = res_name; 
res->device_minor = -1; res->config_file = config_file; res->start_line = line; while(1) { token = yylex(); fline = line; switch(token) { case TK_PROTOCOL: check_upr("protocol statement","%s: protocol",res->name); EXP(TK_STRING); res->protocol=yylval.txt; EXP(';'); break; case TK_ON: parse_hosts(&host_names, '{'); parse_host_section(res, host_names, REQUIRE_ALL); break; case TK_STACKED: parse_stacked_section(res); break; case TK_IGNORE: if (res->me || res->peer) { fprintf(stderr, "%s:%d: in resource %s, " "'ignore-on' statement must precede any real host section (on ... { ... }).\n", config_file, line, res->name); exit(E_config_invalid); } EXP(TK_STRING); fprintf(stderr, "%s:%d: in resource %s, " "WARN: The 'ignore-on' keyword is deprecated.\n", config_file, line, res->name); EXP(';'); break; case TK__THIS_HOST: EXP('{'); host_names = names_from_str("_this_host"); parse_host_section(res, host_names, 0); break; case TK__REMOTE_HOST: EXP('{'); host_names = names_from_str("_remote_host"); parse_host_section(res, host_names, 0); break; case TK_FLOATING: parse_host_section(res, NULL, REQUIRE_ALL + BY_ADDRESS); break; case TK_DISK: switch (token=yylex()) { case TK_STRING: res->disk = yylval.txt; EXP(';'); break; case '{': check_upr("disk section", "%s:disk", res->name); res->disk_options = parse_options(TK_DISK_SWITCH, TK_DISK_OPTION); break; default: check_string_error(token); pe_expected_got( "TK_STRING | {", token); } break; case TK_NET: check_upr("net section", "%s:net", res->name); EXP('{'); res->net_options = parse_options_d(TK_NET_SWITCH, TK_NET_OPTION, TK_NET_DELEGATE, &net_delegate, (void *)flags); break; case TK_SYNCER: check_upr("syncer section", "%s:syncer", res->name); EXP('{'); res->sync_options = parse_options(TK_SYNCER_SWITCH, TK_SYNCER_OPTION); break; case TK_STARTUP: check_upr("startup section", "%s:startup", res->name); EXP('{'); res->startup_options=parse_options_d(TK_STARTUP_SWITCH, TK_STARTUP_OPTION, TK_STARTUP_DELEGATE, &startup_delegate, res); 
			break;
		case TK_HANDLER:
			check_upr("handlers section", "%s:handlers", res->name);
			EXP('{');
			res->handlers = parse_options(0, TK_HANDLER_OPTION);
			break;
		case TK_PROXY:
			check_upr("proxy section", "%s:proxy", res->name);
			parse_proxy_settings(res, 0);
			break;
		case TK_DEVICE:
			check_upr("device statement", "%s:device", res->name);
			parse_device(NULL, &res->device_minor, &res->device);
			break;
		case TK_META_DISK:
			parse_meta_disk(&res->meta_disk, &res->meta_index);
			break;
		case TK_FLEX_META_DISK:
			EXP(TK_STRING);
			res->meta_disk = yylval.txt;
			if (strcmp("internal", yylval.txt)) {
				res->meta_index = strdup("flexible");
			}
			EXP(';');
			break;
		case '}':
		case 0:
			goto exit_loop;
		default:
			pe_expected_got("protocol | on | disk | net | syncer |"
					" startup | handlers |"
					" ignore-on | stacked-on-top-of",token);
		}
	}

 exit_loop:

	if (flags == NoneHAllowed && res->all_hosts) {
		config_valid = 0;

		fprintf(stderr,
			"%s:%d: in the %s section, there are no host sections"
			" allowed.\n",
			config_file, c_section_start, res->name);
	}

	return res;
}

void post_parse(struct d_resource *config, enum pp_flags flags)
{
	struct d_resource *res,*tmp;

	for_each_resource(res, tmp, config)
		if (res->stacked_on_one)
			set_on_hosts_in_res(res); /* sets on_hosts and host->lower */

	/* Needs "on_hosts" and host->lower already set */
	for_each_resource(res, tmp, config)
		if (!res->stacked_on_one)
			set_me_in_resource(res, flags & match_on_proxy);

	/* Needs host->lower->me already set */
	for_each_resource(res, tmp, config)
		if (res->stacked_on_one)
			set_me_in_resource(res, flags & match_on_proxy);

	// Needs "me" set already
	for_each_resource(res, tmp, config)
		if (res->stacked_on_one)
			set_disk_in_res(res);
}

void include_file(FILE *f, char *name)
{
	int saved_line;
	char *saved_config_file, *saved_config_save;

	saved_line = line;
	saved_config_file = config_file;
	saved_config_save = config_save;
	line = 1;
	config_file = name;
	config_save = canonify_path(name);

	my_yypush_buffer_state(f);
	my_parse();
	yypop_buffer_state();

	line = saved_line;
config_file = saved_config_file; config_save = saved_config_save; } void include_stmt(char *str) { char *last_slash, *tmp; glob_t glob_buf; int cwd_fd; FILE *f; size_t i; int r; /* in order to allow relative paths in include statements we change directory to the location of the current configuration file. */ cwd_fd = open(".", O_RDONLY); if (cwd_fd < 0) { fprintf(stderr, "open(\".\") failed: %m\n"); exit(E_usage); } tmp = strdupa(config_save); last_slash = strrchr(tmp, '/'); if (last_slash) *last_slash = 0; if (chdir(tmp)) { fprintf(stderr, "chdir(\"%s\") failed: %m\n", tmp); exit(E_usage); } r = glob(str, 0, NULL, &glob_buf); if (r == 0) { for (i=0; i Copyright (C) 2006-2008, Lars Ellenberg drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. 
*/ enum range_checks { R_NO_CHECK, R_MINOR_COUNT, R_DIALOG_REFRESH, R_DISK_SIZE, R_TIMEOUT, R_CONNECT_INT, R_PING_INT, R_MAX_BUFFERS, R_MAX_EPOCH_SIZE, R_SNDBUF_SIZE, R_RCVBUF_SIZE, R_KO_COUNT, R_RATE, R_GROUP, R_AL_EXTENTS, R_PORT, R_META_IDX, R_WFC_TIMEOUT, R_DEGR_WFC_TIMEOUT, R_OUTDATED_WFC_TIMEOUT, R_C_PLAN_AHEAD, R_C_DELAY_TARGET, R_C_FILL_TARGET, R_C_MAX_RATE, R_C_MIN_RATE, R_CONG_FILL, R_CONG_EXTENTS, }; enum yytokentype { TK_GLOBAL = 258, TK_RESOURCE, TK_ON, TK_STACKED, TK_IGNORE, TK_NET, TK_DISK, TK_SKIP, TK_SYNCER, TK_STARTUP, TK_DISABLE_IP_VERIFICATION, TK_DIALOG_REFRESH, TK_PROTOCOL, TK_HANDLER, TK_COMMON, TK_ADDRESS, TK_DEVICE, TK_MINOR, TK_META_DISK, TK_FLEX_META_DISK, TK_MINOR_COUNT, TK_IPADDR, TK_INTEGER, TK_STRING, TK_ELSE, TK_DISK_SWITCH, TK_DISK_OPTION, TK_NET_SWITCH, TK_NET_OPTION, TK_SYNCER_SWITCH, TK_SYNCER_OPTION, TK_STARTUP_SWITCH, TK_STARTUP_OPTION, TK_STARTUP_DELEGATE, TK_HANDLER_OPTION, TK_USAGE_COUNT, TK_ASK, TK_YES, TK_NO, TK__IS_DEFAULT, TK__THIS_HOST, TK__REMOTE_HOST, TK_PROXY, TK_INSIDE, TK_OUTSIDE, TK_MEMLIMIT, TK_PROXY_OPTION, TK_PROXY_SWITCH, TK_PROXY_DELEGATE, TK_ERR_STRING_TOO_LONG, TK_ERR_DQSTRING_TOO_LONG, TK_ERR_DQSTRING, TK_SCI, TK_SDP, TK_SSOCKS, TK_IPV4, TK_IPV6, TK_IPADDR6, TK_NET_DELEGATE, TK_INCLUDE, TK_FLOATING, TK_DEPRECATED_OPTION, TK__GROUPING_BASE = 0x1000, TK_PROXY_GROUP = 0x2000, /* Gets or'ed to some options */ }; /* The higher bits define one or more token groups. 
*/
#define GET_TOKEN_GROUP(__x) ((__x) & ~(TK__GROUPING_BASE - 1))
#define REMOVE_GROUP_FROM_TOKEN(__x) ((__x) & (TK__GROUPING_BASE - 1))

typedef struct YYSTYPE {
	char* txt;
	enum range_checks rc;
} YYSTYPE;

#define yystype YYSTYPE /* obsolescent; will be withdrawn */
#define YYSTYPE_IS_DECLARED 1
#define YYSTYPE_IS_TRIVIAL 1

extern yystype yylval;
extern char* yytext;
extern FILE* yyin;

/* avoid compiler warnings about implicit declaration */
int yylex(void);
void my_yypush_buffer_state(FILE *f);
void yypop_buffer_state (void );
void yyrestart(FILE *input_file);
drbd-8.4.4/user/legacy/drbdadm_scanner.fl0000664000000000000000000002045212132747531017025 0ustar rootroot
%{

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "drbdadm_parser.h"
#include "drbdadm.h"
#include "drbdtool_common.h"

void long_string(char* text);
void long_dqstring(char* text);
void err_dqstring(char* text);

#if 0
#define DP printf("'%s' ",yytext)
#else
#define DP
#endif

#define CP yylval.txt = strdup(yytext); yylval.rc = R_NO_CHECK
#define RC(N) yylval.rc = R_ ## N

#define YY_NO_INPUT 1
#define YY_NO_UNPUT 1
static void yyunput (int c, register char * yy_bp ) __attribute((unused));

#ifndef YY_FLEX_SUBMINOR_VERSION
#define MAX_INCLUDE_DEPTH 10
YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
int include_stack_ptr = 0;
#endif

%}

%option noyywrap

NUM		[0-9]{1,8}[MKGs]?
SNUMB		[0-9]{1,3}
IPV4ADDR	({SNUMB}"."){3}{SNUMB}
HEX4		[0-9a-fA-F]{1,4}
IPV6ADDR	((({HEX4}":"){0,5}{HEX4})?":"{HEX4}?":"({HEX4}(":"{HEX4}){0,5})?("%"{STRING})?)|("::"[fF]{4}":"{IPV4ADDR})
WS		[ \t\r]
OPCHAR		[{};\[\]:]
DQSTRING	\"([^\"\\\n]|\\[^\n]){0,255}\"
LONG_DQSTRING	\"([^\"\\\n]|\\[^\n]){255}.
ERR_DQSTRING	\"([^\"\\\n]|\\[^\n]){0,255}[\\\n]
STRING		[a-zA-Z0-9/._-]{1,80}
LONG_STRING	[a-zA-Z0-9/._-]{81}

%%

\n	{ line++; }
\#.*	/* ignore comments */
{WS}	/* ignore whitespaces */
{OPCHAR}	{ DP; return yytext[0]; }
on	{ DP; return TK_ON; }
ignore-on	{ DP; return TK_IGNORE; }
stacked-on-top-of	{ DP; return TK_STACKED; }
floating	{ DP; return TK_FLOATING; }
no	{ DP; return TK_NO; }
net	{ DP; return TK_NET; }
yes	{ DP; return TK_YES; }
ask	{ DP; return TK_ASK; }
skip	{ DP; return TK_SKIP; }
disk	{ DP; return TK_DISK; }
proxy	{ DP; return TK_PROXY; }
minor	{ DP; return TK_MINOR; }
inside	{ DP; return TK_INSIDE; }
syncer	{ DP; return TK_SYNCER; }
device	{ DP; return TK_DEVICE; }
global	{ DP; return TK_GLOBAL; }
common	{ DP; return TK_COMMON; }
outside	{ DP; return TK_OUTSIDE; }
address	{ DP; return TK_ADDRESS; }
startup	{ DP; return TK_STARTUP; }
include	{ DP; return TK_INCLUDE; }
handlers	{ DP; return TK_HANDLER; }
protocol	{ DP; return TK_PROTOCOL; }
minor-count	{ DP; return TK_MINOR_COUNT; }
disable-ip-verification	{ DP; return TK_DISABLE_IP_VERIFICATION;}
dialog-refresh	{ DP; return TK_DIALOG_REFRESH; }
resource	{ DP; return TK_RESOURCE; }
meta-disk	{ DP; return TK_META_DISK; }
flexible-meta-disk	{ DP; return TK_FLEX_META_DISK; }
usage-count	{ DP; return TK_USAGE_COUNT; }
_is_default	{ DP; return TK__IS_DEFAULT; }
_this_host	{ DP; return TK__THIS_HOST; }
_remote_host	{ DP; return TK__REMOTE_HOST; }
sci	{ DP; CP; return TK_SCI; }
ssocks	{ DP; CP; return TK_SSOCKS; }
sdp	{ DP; CP; return TK_SDP; }
ipv4	{ DP; CP; return TK_IPV4; }
ipv6	{ DP; CP; return TK_IPV6; }
size	{ DP; CP; RC(DISK_SIZE); return TK_DISK_OPTION; }
on-io-error	{ DP; CP; return TK_DISK_OPTION; }
fencing	{ DP; CP; return TK_DISK_OPTION; }
max-bio-bvecs	{ DP; CP; return TK_DISK_OPTION; }
disk-timeout	{ DP; CP; return TK_DISK_OPTION; }
use-bmbv	{ DP; CP; return TK_DISK_SWITCH; }
no-disk-barrier	{ DP; CP; return TK_DISK_SWITCH; }
no-disk-flushes	{ DP; CP; return TK_DISK_SWITCH; }
no-disk-drain	{ DP;
CP; return TK_DISK_SWITCH; } no-md-flushes { DP; CP; return TK_DISK_SWITCH; } timeout { DP; CP; RC(TIMEOUT); return TK_NET_OPTION; } ko-count { DP; CP; RC(KO_COUNT); return TK_NET_OPTION; } ping-int { DP; CP; RC(PING_INT); return TK_NET_OPTION; } max-buffers { DP; CP; RC(MAX_BUFFERS); return TK_NET_OPTION;} sndbuf-size { DP; CP; RC(SNDBUF_SIZE); return TK_NET_OPTION | TK_PROXY_GROUP;} rcvbuf-size { DP; CP; RC(RCVBUF_SIZE); return TK_NET_OPTION | TK_PROXY_GROUP;} connect-int { DP; CP; RC(CONNECT_INT); return TK_NET_OPTION;} cram-hmac-alg { DP; CP; return TK_NET_OPTION; } shared-secret { DP; CP; return TK_NET_OPTION; } max-epoch-size { DP; CP; RC(MAX_EPOCH_SIZE); return TK_NET_OPTION;} after-sb-[012]pri { DP; CP; return TK_NET_OPTION; } rr-conflict { DP; CP; return TK_NET_OPTION; } ping-timeout { DP; CP; return TK_NET_OPTION | TK_PROXY_GROUP;} unplug-watermark { DP; CP; return TK_NET_OPTION; } data-integrity-alg { DP; CP; return TK_NET_OPTION; } on-congestion { DP; CP; return TK_NET_OPTION; } congestion-fill { DP; CP; RC(CONG_FILL); return TK_NET_OPTION; } congestion-extents { DP; CP; RC(CONG_EXTENTS); return TK_NET_OPTION;} allow-two-primaries { DP; CP; return TK_NET_SWITCH; } always-asbp { DP; CP; return TK_NET_SWITCH; } no-tcp-cork { DP; CP; return TK_NET_SWITCH; } discard-my-data { DP; CP; return TK_NET_DELEGATE; } rate { DP; CP; RC(RATE); return TK_SYNCER_OPTION; } after { DP; CP; return TK_SYNCER_OPTION; } verify-alg { DP; CP; return TK_SYNCER_OPTION; } csums-alg { DP; CP; return TK_SYNCER_OPTION; } al-extents { DP; CP; RC(AL_EXTENTS); return TK_SYNCER_OPTION;} cpu-mask { DP; CP; return TK_SYNCER_OPTION; } use-rle { DP; CP; return TK_SYNCER_SWITCH; } delay-probe-volume { DP; CP; return TK_DEPRECATED_OPTION; } delay-probe-interval { DP; CP; return TK_DEPRECATED_OPTION; } c-plan-ahead { DP; CP; RC(C_PLAN_AHEAD); return TK_SYNCER_OPTION; } c-delay-target { DP; CP; RC(C_DELAY_TARGET); return TK_SYNCER_OPTION; } c-fill-target { DP; CP; RC(C_FILL_TARGET); return 
TK_SYNCER_OPTION; } c-max-rate { DP; CP; RC(C_MAX_RATE); return TK_SYNCER_OPTION; } c-min-rate { DP; CP; RC(C_MIN_RATE); return TK_SYNCER_OPTION; } throttle-threshold { DP; CP; return TK_DEPRECATED_OPTION; } hold-off-threshold { DP; CP; return TK_DEPRECATED_OPTION; } on-no-data-accessible { DP; CP; return TK_SYNCER_OPTION; } wfc-timeout { DP; CP; RC(WFC_TIMEOUT); return TK_STARTUP_OPTION;} degr-wfc-timeout { DP; CP; RC(DEGR_WFC_TIMEOUT); return TK_STARTUP_OPTION;} outdated-wfc-timeout { DP; CP; RC(OUTDATED_WFC_TIMEOUT); return TK_STARTUP_OPTION;} stacked-timeouts { DP; return TK_STARTUP_DELEGATE; } become-primary-on { DP; return TK_STARTUP_DELEGATE; } wait-after-sb { DP; CP; return TK_STARTUP_SWITCH; } pri-on-incon-degr { DP; CP; return TK_HANDLER_OPTION; } pri-lost-after-sb { DP; CP; return TK_HANDLER_OPTION; } pri-lost { DP; CP; return TK_HANDLER_OPTION; } initial-split-brain { DP; CP; return TK_HANDLER_OPTION; } split-brain { DP; CP; return TK_HANDLER_OPTION; } outdate-peer { DP; CP; return TK_HANDLER_OPTION; } fence-peer { DP; CP; return TK_HANDLER_OPTION; } local-io-error { DP; CP; return TK_HANDLER_OPTION; } before-resync-target { DP; CP; return TK_HANDLER_OPTION; } after-resync-target { DP; CP; return TK_HANDLER_OPTION; } before-resync-source { DP; CP; return TK_HANDLER_OPTION; } memlimit { DP; CP; return TK_PROXY_OPTION | TK_PROXY_GROUP; } read-loops { DP; CP; return TK_PROXY_OPTION | TK_PROXY_GROUP; } compression { DP; CP; return TK_PROXY_OPTION | TK_PROXY_GROUP; } plugin { DP; CP; return TK_PROXY_DELEGATE; } out-of-sync { DP; CP; return TK_HANDLER_OPTION; } {IPV4ADDR} { DP; CP; return TK_IPADDR; } {IPV6ADDR} { DP; CP; return TK_IPADDR6; } {NUM} { DP; CP; return TK_INTEGER; } {DQSTRING} { unescape(yytext); DP; CP; return TK_STRING; } {STRING} { DP; CP; return TK_STRING; } {LONG_STRING} { return TK_ERR_STRING_TOO_LONG; } {LONG_DQSTRING} { return TK_ERR_DQSTRING_TOO_LONG; } {ERR_DQSTRING} { return TK_ERR_DQSTRING; } . 
{ DP; return TK_ELSE; } %% /* Compatibility cruft for flex version 2.5.4a */ #ifndef YY_FLEX_SUBMINOR_VERSION /** Pushes the new state onto the stack. The new state becomes * the current state. This function will allocate the stack * if necessary. * @param new_buffer The new state. * */ void yypush_buffer_state (YY_BUFFER_STATE new_buffer ) { if (new_buffer == NULL) return; if ( include_stack_ptr >= MAX_INCLUDE_DEPTH ) { fprintf( stderr, "Includes nested too deeply" ); exit( 1 ); } include_stack[include_stack_ptr++] = YY_CURRENT_BUFFER; yy_switch_to_buffer(new_buffer); BEGIN(INITIAL); } /** Removes and deletes the top of the stack, if present. * The next element becomes the new top. * */ void yypop_buffer_state (void) { if (!YY_CURRENT_BUFFER) return; if ( --include_stack_ptr < 0 ) { fprintf( stderr, "error in flex compat code\n" ); exit( 1 ); } yy_delete_buffer(YY_CURRENT_BUFFER ); yy_switch_to_buffer(include_stack[include_stack_ptr]); } #endif void my_yypush_buffer_state(FILE *f) { /* Since we do not have YY_BUF_SIZE outside of the flex generated file.*/ yypush_buffer_state(yy_create_buffer(f, YY_BUF_SIZE)); }
drbd-8.4.4/user/legacy/drbdadm_usage_cnt.c
/* drbdadm_usage_cnt.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2006-2008, LINBIT Information Technologies GmbH Copyright (C) 2006-2008, Philipp Reisner Copyright (C) 2006-2008, Lars Ellenberg drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "drbdadm.h" #include "drbdtool_common.h" #include "drbd_endian.h" #include "linux/drbd.h" /* only use DRBD_MAGIC from here! */ #define HTTP_PORT 80 #define HTTP_HOST "usage.drbd.org" #define HTTP_ADDR "212.69.161.111" #define NODE_ID_FILE DRBD_LIB_DIR"/node_id" #define GIT_HASH_BYTE 20 #define SRCVERSION_BYTE 12 /* actually 11 and a half. */ #define SRCVERSION_PAD (GIT_HASH_BYTE - SRCVERSION_BYTE) #define SVN_STYLE_OD 16 struct vcs_rel { uint32_t svn_revision; char git_hash[GIT_HASH_BYTE]; struct { unsigned major, minor, sublvl; } version; unsigned version_code; }; struct node_info { uint64_t node_uuid; struct vcs_rel rev; }; struct node_info_od { uint32_t magic; struct node_info ni; } __packed; /* For our purpose (finding the revision) SLURP_SIZE is always enough. 
*/ static char* slurp_proc_drbd() { const int SLURP_SIZE = 4096; char* buffer; int rr, fd; fd = open("/proc/drbd",O_RDONLY); if( fd == -1) return 0; buffer = malloc(SLURP_SIZE); if(!buffer) { close(fd); return 0; } rr = read(fd, buffer, SLURP_SIZE-1); if( rr == -1) { free(buffer); close(fd); return 0; } buffer[rr]=0; close(fd); return buffer; } void read_hex(char* dst, char* src, int dst_size, int src_size) { int dst_i, u, src_i=0; for(dst_i=0;dst_i<dst_size;dst_i++) { if (src[src_i] == 0) break; if (src_size - src_i < 2) { sscanf(src+src_i, "%1x", &u); dst[dst_i] = u << 4; } else { sscanf(src+src_i, "%2x", &u); dst[dst_i] = u; } if (++src_i >= src_size) break; if(src[src_i] == 0) break; if(++src_i >= src_size) break; } } void vcs_ver_from_str(struct vcs_rel *rel, const char *token) { char *dot; long maj, min, sub; maj = strtol(token, &dot, 10); if (*dot != '.') return; min = strtol(dot+1, &dot, 10); if (*dot != '.') return; sub = strtol(dot+1, &dot, 10); /* don't check on *dot == 0, * we may want to add some extraversion tag sometime if (*dot != 0) return; */ rel->version.major = maj; rel->version.minor = min; rel->version.sublvl = sub; rel->version_code = (maj << 16) + (min << 8) + sub; } void vcs_from_str(struct vcs_rel *rel, const char *text) { char token[80]; int plus=0; enum { begin, f_ver, f_svn, f_rev, f_git, f_srcv } ex = begin; while (sget_token(token, sizeof(token), &text) != EOF) { switch(ex) { case begin: if(!strcmp(token,"version:")) ex = f_ver; if(!strcmp(token,"SVN")) ex = f_svn; if(!strcmp(token,"GIT-hash:")) ex = f_git; if(!strcmp(token,"srcversion:")) ex = f_srcv; break; case f_ver: if(!strcmp(token,"plus")) plus = 1; /* still waiting for version */ else { vcs_ver_from_str(rel, token); ex = begin; } break; case f_svn: if(!strcmp(token,"Revision:")) ex = f_rev; break; case f_rev: rel->svn_revision = atol(token) * 10; if( plus ) rel->svn_revision += 1; memset(rel->git_hash, 0, GIT_HASH_BYTE); return; case f_git: read_hex(rel->git_hash, token, GIT_HASH_BYTE, strlen(token)); rel->svn_revision = 0; return; case f_srcv: memset(rel->git_hash, 0, SRCVERSION_PAD); read_hex(rel->git_hash + SRCVERSION_PAD, token, SRCVERSION_BYTE, strlen(token)); rel->svn_revision = 0; return;
} } } static int current_vcs_is_from_proc_drbd; static struct vcs_rel current_vcs_rel; static struct vcs_rel userland_version; static void vcs_get_current(void) { char* version_txt; if (current_vcs_rel.version_code) return; version_txt = slurp_proc_drbd(); if(version_txt) { vcs_from_str(&current_vcs_rel, version_txt); current_vcs_is_from_proc_drbd = 1; free(version_txt); } else { vcs_from_str(&current_vcs_rel, drbd_buildtag()); vcs_ver_from_str(&current_vcs_rel, REL_VERSION); } } static void vcs_get_userland(void) { if (userland_version.version_code) return; vcs_ver_from_str(&userland_version, REL_VERSION); } int version_code_kernel(void) { vcs_get_current(); return current_vcs_is_from_proc_drbd ? current_vcs_rel.version_code : 0; } int version_code_userland(void) { vcs_get_userland(); return userland_version.version_code; } static int vcs_eq(struct vcs_rel *rev1, struct vcs_rel *rev2) { if( rev1->svn_revision || rev2->svn_revision ) { return rev1->svn_revision == rev2->svn_revision; } else { return !memcmp(rev1->git_hash,rev2->git_hash,GIT_HASH_BYTE); } } static int vcs_ver_cmp(struct vcs_rel *rev1, struct vcs_rel *rev2) { return rev1->version_code - rev2->version_code; } void warn_on_version_mismatch(void) { char *msg; int cmp; /* get the kernel module version from /proc/drbd */ vcs_get_current(); /* get the userland version from REL_VERSION */ vcs_get_userland(); cmp = vcs_ver_cmp(&userland_version, &current_vcs_rel); /* no message if equal */ if (cmp == 0) return; if (cmp > 0xffff || cmp < -0xffff) /* major version differs! */ msg = "mixing different major numbers will not work!"; else if (cmp < 0) /* userland is older. always warn. */ msg = "you should upgrade your drbd tools!"; else if (cmp & 0xff00) /* userland is newer minor version */ msg = "please don't mix different DRBD series."; else /* userland is newer, but only differ in sublevel.
*/ msg = "preferably kernel and userland versions should match."; fprintf(stderr, "DRBD module version: %u.%u.%u\n" " userland version: %u.%u.%u\n%s\n", current_vcs_rel.version.major, current_vcs_rel.version.minor, current_vcs_rel.version.sublvl, userland_version.version.major, userland_version.version.minor, userland_version.version.sublvl, msg); } static char *vcs_to_str(struct vcs_rel *rev) { static char buffer[80]; // Not generic, sufficient for the purpose. if( rev->svn_revision ) { snprintf(buffer,80,"nv="U32,rev->svn_revision); } else { int len=20,p; unsigned char *bytes; p = sprintf(buffer,"git="); bytes = (unsigned char*)rev->git_hash; while(len--) p += sprintf(buffer+p,"%02x",*bytes++); } return buffer; } static void write_node_id(struct node_info *ni) { int fd; struct node_info_od on_disk; int size; fd = open(NODE_ID_FILE,O_WRONLY|O_CREAT,S_IRUSR|S_IWUSR); if( fd == -1 && errno == ENOENT) { mkdir(DRBD_LIB_DIR,S_IRWXU); fd = open(NODE_ID_FILE,O_WRONLY|O_CREAT,S_IRUSR|S_IWUSR); } if( fd == -1) { perror("Creation of "NODE_ID_FILE" failed."); exit(20); } if(ni->rev.svn_revision != 0) { // SVN style (old) on_disk.magic = cpu_to_be32(DRBD_MAGIC); on_disk.ni.node_uuid = cpu_to_be64(ni->node_uuid); on_disk.ni.rev.svn_revision = cpu_to_be32(ni->rev.svn_revision); memset(on_disk.ni.rev.git_hash,0,GIT_HASH_BYTE); size = SVN_STYLE_OD; } else { on_disk.magic = cpu_to_be32(DRBD_MAGIC+1); on_disk.ni.node_uuid = cpu_to_be64(ni->node_uuid); on_disk.ni.rev.svn_revision = 0; memcpy(on_disk.ni.rev.git_hash,ni->rev.git_hash,GIT_HASH_BYTE); size = sizeof(on_disk); } if( write(fd,&on_disk, size) != size) { perror("Write to "NODE_ID_FILE" failed."); exit(20); } close(fd); } static int read_node_id(struct node_info *ni) { int rr,fd; struct node_info_od on_disk; fd = open(NODE_ID_FILE,O_RDONLY); if( fd == -1) { return 0; } rr = read(fd,&on_disk, sizeof(on_disk)); if( rr != sizeof(on_disk) && rr != SVN_STYLE_OD ) { close(fd); return 0; } switch(be32_to_cpu(on_disk.magic)) { case 
DRBD_MAGIC: ni->node_uuid = be64_to_cpu(on_disk.ni.node_uuid); ni->rev.svn_revision = be32_to_cpu(on_disk.ni.rev.svn_revision); memset(ni->rev.git_hash,0,GIT_HASH_BYTE); break; case DRBD_MAGIC+1: ni->node_uuid = be64_to_cpu(on_disk.ni.node_uuid); ni->rev.svn_revision = 0; memcpy(ni->rev.git_hash,on_disk.ni.rev.git_hash,GIT_HASH_BYTE); break; default: return 0; } close(fd); return 1; } /* to interrupt gethostbyname, * we not only need a signal, * but also the long jump: * gethostbyname would otherwise just restart the syscall * and timeout again. */ static jmp_buf timed_out; static void alarm_handler(int __attribute((unused)) signo) { longjmp(timed_out, 1); } #define DNS_TIMEOUT 3 /* seconds */ #define SOCKET_TIMEOUT 3 /* seconds */ struct hostent *my_gethostbyname(const char *name) { struct sigaction sa; struct sigaction so; struct hostent *h; alarm(0); sa.sa_handler = &alarm_handler; sigemptyset(&sa.sa_mask); sa.sa_flags = 0; sigaction(SIGALRM, &sa, &so); if (!setjmp(timed_out)) { alarm(DNS_TIMEOUT); h = gethostbyname(name); } else /* timed out, longjmp of SIGALRM jumped here */ h = NULL; alarm(0); sigaction(SIGALRM, &so, NULL); return h; } /** * insert_usage_with_socket: * * Return codes: * * 0 - success * 1 - failed to create socket * 2 - unknown server * 3 - cannot connect to server * 5 - other error */ static int make_get_request(char *uri) { struct sockaddr_in server; struct hostent *host_info; unsigned long addr; int sock; char *req_buf; char *http_host = HTTP_HOST; int buf_len = 1024; char buffer[buf_len]; FILE *sockfd; int writeit; struct timeval timeout = { .tv_sec = SOCKET_TIMEOUT }; sock = socket( PF_INET, SOCK_STREAM, 0); if (sock < 0) return 1; setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout)); setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, &timeout, sizeof(timeout)); memset (&server, 0, sizeof(server)); /* convert host name to ip */ host_info = my_gethostbyname(http_host); if (host_info == NULL) { /* unknown host, try with ip */ if 
((addr = inet_addr( HTTP_ADDR )) != INADDR_NONE) memcpy((char *)&server.sin_addr, &addr, sizeof(addr)); else { close(sock); return 2; } } else { memcpy((char *)&server.sin_addr, host_info->h_addr, host_info->h_length); } ssprintf(req_buf, "GET %s HTTP/1.0\r\n" "Host: "HTTP_HOST"\r\n" "User-Agent: drbdadm/"REL_VERSION" (%s; %s; %s; %s)\r\n" "\r\n", uri, nodeinfo.sysname, nodeinfo.release, nodeinfo.version, nodeinfo.machine); server.sin_family = AF_INET; server.sin_port = htons(HTTP_PORT); if (connect(sock, (struct sockaddr*)&server, sizeof(server))<0) { /* cannot connect to server */ close(sock); return 3; } if ((sockfd = fdopen(sock, "r+")) == NULL) { close(sock); return 5; } if (fputs(req_buf, sockfd) == EOF) { fclose(sockfd); close(sock); return 5; } writeit = 0; while (fgets(buffer, buf_len, sockfd) != NULL) { /* ignore http headers */ if (writeit == 0) { if (buffer[0] == '\r' || buffer[0] == '\n') writeit = 1; } else { fprintf(stderr,"%s", buffer); } } fclose(sockfd); close(sock); return 0; } static void url_encode(char* in, char* out) { char *h = "0123456789abcdef"; char c; while( (c = *in++) != 0 ) { if( c == '\n' ) break; if( ( 'a' <= c && c <= 'z' ) || ( 'A' <= c && c <= 'Z' ) || ( '0' <= c && c <= '9' ) || c == '-' || c == '_' || c == '.' ) *out++ = c; else if( c == ' ' ) *out++ = '+'; else { *out++ = '%'; *out++ = h[c >> 4]; *out++ = h[c & 0x0f]; } } *out = 0; } /* Ensure that the node is counted on http://usage.drbd.org */ #define ANSWER_SIZE 80 void uc_node(enum usage_count_type type) { struct node_info ni; char *uri; int send = 0; int update = 0; char answer[ANSWER_SIZE]; char n_comment[ANSWER_SIZE*3]; char *r; if( type == UC_NO ) return; if( getuid() != 0 ) return; /* not when running directly from init, * or if stdout is no tty. * you do not want to have the "user information message" * as output from `drbdadm sh-resources all` */ if (getenv("INIT_VERSION")) return; if (no_tty) return; vcs_get_current(); if( ! 
read_node_id(&ni) ) { get_random_bytes(&ni.node_uuid,sizeof(ni.node_uuid)); ni.rev = current_vcs_rel; send = 1; } else { // read_node_id() was successful if (!vcs_eq(&ni.rev,&current_vcs_rel)) { ni.rev = current_vcs_rel; update = 1; send = 1; } } if(!send) return; n_comment[0]=0; if (type == UC_ASK ) { fprintf(stderr, "\n" "\t\t--== This is %s of DRBD ==--\n" "Please take part in the global DRBD usage count at http://"HTTP_HOST".\n\n" "The counter works anonymously. It creates a random number to identify\n" "your machine and sends that random number, along with the kernel and\n" "DRBD version, to "HTTP_HOST".\n\n" "The benefits for you are:\n" " * In response to your submission, the server ("HTTP_HOST") will tell you\n" " how many users before you have installed this version (%s).\n" " * With a high counter LINBIT has a strong motivation to\n" " continue funding DRBD's development.\n\n" "http://"HTTP_HOST"/cgi-bin/insert_usage.pl?nu="U64"&%s\n\n" "In case you want to participate but know that this machine is firewalled,\n" "simply issue the query string with your favorite web browser or wget.\n" "You can control all of this by setting 'usage-count' in your drbd.conf.\n\n" "* You may enter a free form comment about your machine, that gets\n" " used on "HTTP_HOST" instead of the big random number.\n" "* If you wish to opt out entirely, simply enter 'no'.\n" "* To count this node without comment, just press [RETURN]\n", update ? "an update" : "a new installation", REL_VERSION,ni.node_uuid, vcs_to_str(&ni.rev)); r = fgets(answer, ANSWER_SIZE, stdin); if(r && !strcmp(answer,"no\n")) send = 0; url_encode(answer,n_comment); } ssprintf(uri,"http://"HTTP_HOST"/cgi-bin/insert_usage.pl?nu="U64"&%s%s%s", ni.node_uuid, vcs_to_str(&ni.rev), n_comment[0] ?
"&nc=" : "", n_comment); if (send) { write_node_id(&ni); fprintf(stderr, "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n" " --== Thank you for participating in the global usage survey ==--\n" "The server's response is:\n\n"); make_get_request(uri); if (type == UC_ASK) { fprintf(stderr, "\n" "From now on, drbdadm will contact "HTTP_HOST" only when you update\n" "DRBD or when you use 'drbdadm create-md'. Of course it will continue\n" "to ask you for confirmation as long as 'usage-count' is at its default\n" "value of 'ask'.\n\n" "Just press [RETURN] to continue: "); r = fgets(answer, 9, stdin); } } } /* For our purpose (finding the revision) SLURP_SIZE is always enough. */ char* run_admm_generic(struct d_resource* res ,const char* cmd) { const int SLURP_SIZE = 4096; int rr,pipes[2]; char* buffer; pid_t pid; buffer = malloc(SLURP_SIZE); if(!buffer) return 0; if(pipe(pipes)) return 0; pid = fork(); if(pid == -1) { fprintf(stderr,"Can not fork\n"); exit(E_exec_error); } if(pid == 0) { // child close(pipes[0]); // close reading end dup2(pipes[1],1); // 1 = stdout close(pipes[1]); exit(_admm_generic(res,cmd, SLEEPS_VERY_LONG|SUPRESS_STDERR| DONT_REPORT_FAILED)); } close(pipes[1]); // close writing end rr = read(pipes[0], buffer, SLURP_SIZE-1); if( rr == -1) { free(buffer); // FIXME cleanup return 0; } buffer[rr]=0; close(pipes[0]); waitpid(pid,0,0); return buffer; } int adm_create_md(struct d_resource* res ,const char* cmd) { char answer[ANSWER_SIZE]; struct node_info ni; uint64_t device_uuid=0; uint64_t device_size=0; char *uri; int send=0; char *tb; int rv,fd; int soi_tmp; char *setup_opts_0_tmp; char *r; tb = run_admm_generic(res, "read-dev-uuid"); device_uuid = strto_u64(tb,NULL,16); free(tb); rv = _admm_generic(res, cmd, SLEEPS_VERY_LONG); // cmd is "create-md". 
if(rv || dry_run) return rv; fd = open(res->me->disk,O_RDONLY); if( fd != -1) { device_size = bdev_size(fd); close(fd); } if( read_node_id(&ni) && device_size && !device_uuid) { get_random_bytes(&device_uuid, sizeof(uint64_t)); if( global_options.usage_count == UC_YES ) send = 1; if( global_options.usage_count == UC_ASK ) { fprintf(stderr, "\n" "\t\t--== Creating metadata ==--\n" "As with nodes, we count the total number of devices mirrored by DRBD\n" "at http://"HTTP_HOST".\n\n" "The counter works anonymously. It creates a random number to identify\n" "the device and sends that random number, along with the kernel and\n" "DRBD version, to "HTTP_HOST".\n\n" "http://"HTTP_HOST"/cgi-bin/insert_usage.pl?nu="U64"&ru="U64"&rs="U64"\n\n" "* If you wish to opt out entirely, simply enter 'no'.\n" "* To continue, just press [RETURN]\n", ni.node_uuid,device_uuid,device_size ); r = fgets(answer, ANSWER_SIZE, stdin); if(r && strcmp(answer,"no\n")) send = 1; } } if(!device_uuid) { get_random_bytes(&device_uuid, sizeof(uint64_t)); } if (send) { ssprintf(uri,"http://"HTTP_HOST"/cgi-bin/insert_usage.pl?" "nu="U64"&ru="U64"&rs="U64, ni.node_uuid, device_uuid, device_size); make_get_request(uri); } /* HACK */ soi_tmp = soi; setup_opts_0_tmp = setup_opts[0]; setup_opts[0] = NULL; ssprintf( setup_opts[0], X64(016), device_uuid); soi=1; _admm_generic(res, "write-dev-uuid", SLEEPS_VERY_LONG); setup_opts[0] = setup_opts_0_tmp; soi = soi_tmp; return rv; } drbd-8.4.4/user/legacy/drbdsetup.c0000664000000000000000000023161712132747531015543 0ustar rootroot/* drbdsetup.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. Copyright (C) 2001-2008, LINBIT Information Technologies GmbH. Copyright (C) 1999-2008, Philipp Reisner . Copyright (C) 2002-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. 
drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #define _GNU_SOURCE #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define __bitwise /* Build-workaround for broken RHEL4 kernels (2.6.9_78.0.1) */ #include #include #include #include #include #include #include "unaligned.h" #include "drbdtool_common.h" #ifndef __CONNECTOR_H #error "You need to set KDIR while building drbdsetup." #endif #ifndef AF_INET_SDP #define AF_INET_SDP 27 #define PF_INET_SDP AF_INET_SDP #endif enum usage_type { BRIEF, FULL, XML, }; struct drbd_tag_list { struct nlmsghdr *nl_header; struct cn_msg *cn_header; struct drbd_nl_cfg_req* drbd_p_header; unsigned short *tag_list_start; unsigned short *tag_list_cpos; int tag_size; }; struct drbd_argument { const char* name; const enum drbd_tags tag; int (*convert_function)(struct drbd_argument *, struct drbd_tag_list *, char *); }; struct drbd_option { const char* name; const char short_name; const enum drbd_tags tag; int (*convert_function)(struct drbd_option *, struct drbd_tag_list *, char *); void (*show_function)(struct drbd_option *,unsigned short*); int (*usage_function)(struct drbd_option *, char*, int); void (*xml_function)(struct drbd_option *); union { struct { const long long min; const long long max; const long long def; const unsigned char unit_prefix; const char* unit; } numeric_param; // for conv_numeric struct { const char** handler_names; const int number_of_handlers; const int def; } handler_param; // conv_handler }; }; struct drbd_cmd { const char* 
cmd; const int packet_id; int (*function)(struct drbd_cmd *, unsigned, int, char **); void (*usage)(struct drbd_cmd *, enum usage_type); union { struct { struct drbd_argument *args; struct drbd_option *options; } cp; // for generic_config_cmd, config_usage struct { int (*show_function)(struct drbd_cmd *, unsigned, unsigned short* ); } gp; // for generic_get_cmd, get_usage struct { struct option *options; int (*proc_event)(unsigned int, int, struct drbd_nl_cfg_reply *); } ep; // for events_cmd, events_usage }; }; // Connector functions #define NL_TIME (COMM_TIMEOUT*1000) static int open_cn(); static int send_cn(int sk_nl, struct nlmsghdr* nl_hdr, int size); static int receive_cn(int sk_nl, struct nlmsghdr* nl_hdr, int size, int timeout_ms); static int call_drbd(int sk_nl, struct drbd_tag_list *tl, struct nlmsghdr* nl_hdr, int size, int timeout_ms); static void close_cn(int sk_nl); // other functions static int get_af_ssocks(int warn); static void print_command_usage(int i, const char *addinfo, enum usage_type); // command functions static int generic_config_cmd(struct drbd_cmd *cm, unsigned minor, int argc, char **argv); static int down_cmd(struct drbd_cmd *cm, unsigned minor, int argc, char **argv); static int generic_get_cmd(struct drbd_cmd *cm, unsigned minor, int argc, char **argv); static int events_cmd(struct drbd_cmd *cm, unsigned minor, int argc,char **argv); // usage functions static void config_usage(struct drbd_cmd *cm, enum usage_type); static void get_usage(struct drbd_cmd *cm, enum usage_type); static void events_usage(struct drbd_cmd *cm, enum usage_type); // sub usage functions for config_usage static int numeric_opt_usage(struct drbd_option *option, char* str, int strlen); static int handler_opt_usage(struct drbd_option *option, char* str, int strlen); static int bit_opt_usage(struct drbd_option *option, char* str, int strlen); static int string_opt_usage(struct drbd_option *option, char* str, int strlen); // sub usage function for config_usage as 
xml static void numeric_opt_xml(struct drbd_option *option); static void handler_opt_xml(struct drbd_option *option); static void bit_opt_xml(struct drbd_option *option); static void string_opt_xml(struct drbd_option *option); // sub commands for generic_get_cmd static int show_scmd(struct drbd_cmd *cm, unsigned minor, unsigned short *rtl); static int role_scmd(struct drbd_cmd *cm, unsigned minor, unsigned short *rtl); static int status_xml_scmd(struct drbd_cmd *cm, unsigned minor, unsigned short *rtl); static int sh_status_scmd(struct drbd_cmd *cm, unsigned minor, unsigned short *rtl); static int cstate_scmd(struct drbd_cmd *cm, unsigned minor, unsigned short *rtl); static int dstate_scmd(struct drbd_cmd *cm, unsigned minor, unsigned short *rtl); static int uuids_scmd(struct drbd_cmd *cm, unsigned minor, unsigned short *rtl); static int lk_bdev_scmd(struct drbd_cmd *cm, unsigned minor, unsigned short *rtl); // convert functions for arguments static int conv_block_dev(struct drbd_argument *ad, struct drbd_tag_list *tl, char* arg); static int conv_md_idx(struct drbd_argument *ad, struct drbd_tag_list *tl, char* arg); static int conv_address(struct drbd_argument *ad, struct drbd_tag_list *tl, char* arg); static int conv_protocol(struct drbd_argument *ad, struct drbd_tag_list *tl, char* arg); // convert functions for options static int conv_numeric(struct drbd_option *od, struct drbd_tag_list *tl, char* arg); static int conv_sndbuf(struct drbd_option *od, struct drbd_tag_list *tl, char* arg); static int conv_handler(struct drbd_option *od, struct drbd_tag_list *tl, char* arg); static int conv_bit(struct drbd_option *od, struct drbd_tag_list *tl, char* arg); static int conv_string(struct drbd_option *od, struct drbd_tag_list *tl, char* arg); // show functions for options (used by show_scmd) static void show_numeric(struct drbd_option *od, unsigned short* tp); static void show_handler(struct drbd_option *od, unsigned short* tp); static void show_bit(struct drbd_option 
*od, unsigned short* tp); static void show_string(struct drbd_option *od, unsigned short* tp); // sub functions for events_cmd static int print_broadcast_events(unsigned int seq, int, struct drbd_nl_cfg_reply *reply); static int w_connected_state(unsigned int seq, int, struct drbd_nl_cfg_reply *reply); static int w_synced_state(unsigned int seq, int, struct drbd_nl_cfg_reply *reply); const char *on_error[] = { [EP_PASS_ON] = "pass_on", [EP_CALL_HELPER] = "call-local-io-error", [EP_DETACH] = "detach", }; const char *fencing_n[] = { [FP_DONT_CARE] = "dont-care", [FP_RESOURCE] = "resource-only", [FP_STONITH] = "resource-and-stonith", }; const char *asb0p_n[] = { [ASB_DISCONNECT] = "disconnect", [ASB_DISCARD_YOUNGER_PRI] = "discard-younger-primary", [ASB_DISCARD_OLDER_PRI] = "discard-older-primary", [ASB_DISCARD_ZERO_CHG] = "discard-zero-changes", [ASB_DISCARD_LEAST_CHG] = "discard-least-changes", [ASB_DISCARD_LOCAL] = "discard-local", [ASB_DISCARD_REMOTE] = "discard-remote" }; const char *asb1p_n[] = { [ASB_DISCONNECT] = "disconnect", [ASB_CONSENSUS] = "consensus", [ASB_VIOLENTLY] = "violently-as0p", [ASB_DISCARD_SECONDARY] = "discard-secondary", [ASB_CALL_HELPER] = "call-pri-lost-after-sb" }; const char *asb2p_n[] = { [ASB_DISCONNECT] = "disconnect", [ASB_VIOLENTLY] = "violently-as0p", [ASB_CALL_HELPER] = "call-pri-lost-after-sb" }; const char *rrcf_n[] = { [ASB_DISCONNECT] = "disconnect", [ASB_VIOLENTLY] = "violently", [ASB_CALL_HELPER] = "call-pri-lost" }; const char *on_no_data_n[] = { [OND_IO_ERROR] = "io-error", [OND_SUSPEND_IO] = "suspend-io" }; const char *on_congestion_n[] = { [OC_BLOCK] = "block", [OC_PULL_AHEAD] = "pull-ahead", [OC_DISCONNECT] = "disconnect" }; struct option wait_cmds_options[] = { { "wfc-timeout",required_argument, 0, 't' }, { "degr-wfc-timeout",required_argument,0,'d'}, { "outdated-wfc-timeout",required_argument,0,'o'}, { "wait-after-sb",no_argument,0,'w'}, { 0, 0, 0, 0 } }; #define EN(N,U,UN) \ conv_numeric, show_numeric, 
numeric_opt_usage, numeric_opt_xml, \ { .numeric_param = { DRBD_ ## N ## _MIN, DRBD_ ## N ## _MAX, \ DRBD_ ## N ## _DEF ,U,UN } } #define EN_sndbuf(N,U,UN) \ conv_sndbuf, show_numeric, numeric_opt_usage, numeric_opt_xml, \ { .numeric_param = { DRBD_ ## N ## _MIN, DRBD_ ## N ## _MAX, \ DRBD_ ## N ## _DEF ,U,UN } } #define EH(N,D) \ conv_handler, show_handler, handler_opt_usage, handler_opt_xml, \ { .handler_param = { N, ARRAY_SIZE(N), \ DRBD_ ## D ## _DEF } } #define EB conv_bit, show_bit, bit_opt_usage, bit_opt_xml, { } #define ES conv_string, show_string, string_opt_usage, string_opt_xml, { } #define CLOSE_OPTIONS { NULL,0,0,NULL,NULL,NULL, NULL, { } } #define F_CONFIG_CMD generic_config_cmd, config_usage #define F_GET_CMD generic_get_cmd, get_usage #define F_EVENTS_CMD events_cmd, events_usage struct drbd_cmd commands[] = { {"primary", P_primary, F_CONFIG_CMD, {{ NULL, (struct drbd_option[]) { { "overwrite-data-of-peer",'o',T_primary_force, EB }, /* legacy name */ { "force",'f', T_primary_force, EB }, CLOSE_OPTIONS }} }, }, {"secondary", P_secondary, F_CONFIG_CMD, {{NULL, NULL}} }, {"disk", P_disk_conf, F_CONFIG_CMD, {{ (struct drbd_argument[]) { { "lower_dev", T_backing_dev, conv_block_dev }, { "meta_data_dev", T_meta_dev, conv_block_dev }, { "meta_data_index", T_meta_dev_idx, conv_md_idx }, { NULL, 0, NULL}, }, (struct drbd_option[]) { { "size",'d', T_disk_size, EN(DISK_SIZE_SECT,'s',"bytes") }, { "on-io-error",'e', T_on_io_error, EH(on_error,ON_IO_ERROR) }, { "fencing",'f', T_fencing, EH(fencing_n,FENCING) }, { "use-bmbv",'b', T_use_bmbv, EB }, { "no-disk-barrier",'a',T_no_disk_barrier,EB }, { "no-disk-flushes",'i',T_no_disk_flush,EB }, { "no-disk-drain",'D', T_no_disk_drain,EB }, { "no-md-flushes",'m', T_no_md_flush, EB }, { "max-bio-bvecs",'s', T_max_bio_bvecs,EN(MAX_BIO_BVECS,1,NULL) }, { "disk-timeout",'t', T_disk_timeout, EN(DISK_TIMEOUT,1,"1/10 seconds") }, CLOSE_OPTIONS }} }, }, {"detach", P_detach, F_CONFIG_CMD, {{NULL, (struct drbd_option[]) { { 
"force",'f', T_detach_force, EB }, CLOSE_OPTIONS }} }, }, {"net", P_net_conf, F_CONFIG_CMD, {{ (struct drbd_argument[]) { { "[af:]local_addr[:port]",T_my_addr, conv_address }, { "[af:]remote_addr[:port]",T_peer_addr,conv_address }, { "protocol", T_wire_protocol,conv_protocol }, { NULL, 0, NULL}, }, (struct drbd_option[]) { { "timeout",'t', T_timeout, EN(TIMEOUT,1,"1/10 seconds") }, { "max-epoch-size",'e',T_max_epoch_size,EN(MAX_EPOCH_SIZE,1,NULL) }, { "max-buffers",'b', T_max_buffers, EN(MAX_BUFFERS,1,NULL) }, { "unplug-watermark",'u',T_unplug_watermark, EN(UNPLUG_WATERMARK,1,NULL) }, { "connect-int",'c', T_try_connect_int, EN(CONNECT_INT,1,"seconds") }, { "ping-int",'i', T_ping_int, EN(PING_INT,1,"seconds") }, { "sndbuf-size",'S', T_sndbuf_size, EN_sndbuf(SNDBUF_SIZE,1,"bytes") }, { "rcvbuf-size",'r', T_rcvbuf_size, EN_sndbuf(RCVBUF_SIZE,1,"bytes") }, { "ko-count",'k', T_ko_count, EN(KO_COUNT,1,NULL) }, { "allow-two-primaries",'m',T_two_primaries, EB }, { "cram-hmac-alg",'a', T_cram_hmac_alg, ES }, { "shared-secret",'x', T_shared_secret, ES }, { "after-sb-0pri",'A', T_after_sb_0p,EH(asb0p_n,AFTER_SB_0P) }, { "after-sb-1pri",'B', T_after_sb_1p,EH(asb1p_n,AFTER_SB_1P) }, { "after-sb-2pri",'C', T_after_sb_2p,EH(asb2p_n,AFTER_SB_2P) }, { "always-asbp",'P', T_always_asbp, EB }, { "rr-conflict",'R', T_rr_conflict,EH(rrcf_n,RR_CONFLICT) }, { "ping-timeout",'p', T_ping_timeo, EN(PING_TIMEO,1,"1/10 seconds") }, { "discard-my-data",'D', T_want_lose, EB }, { "data-integrity-alg",'d', T_integrity_alg, ES }, { "no-tcp-cork",'o', T_no_cork, EB }, { "dry-run",'n', T_dry_run, EB }, { "on-congestion", 'g', T_on_congestion, EH(on_congestion_n,ON_CONGESTION) }, { "congestion-fill", 'f', T_cong_fill, EN(CONG_FILL,'s',"byte") }, { "congestion-extents", 'h', T_cong_extents, EN(CONG_EXTENTS,1,NULL) }, CLOSE_OPTIONS }} }, }, {"disconnect", P_disconnect, F_CONFIG_CMD, {{NULL, (struct drbd_option[]) { { "force", 'F', T_force, EB }, CLOSE_OPTIONS }} }, }, {"resize", P_resize, F_CONFIG_CMD, 
{{ NULL, (struct drbd_option[]) { { "size",'s',T_resize_size, EN(DISK_SIZE_SECT,'s',"bytes") }, { "assume-peer-has-space",'f',T_resize_force, EB }, { "assume-clean", 'c', T_no_resync, EB }, CLOSE_OPTIONS }} }, }, {"syncer", P_syncer_conf, F_CONFIG_CMD, {{ NULL, (struct drbd_option[]) { { "rate",'r',T_rate, EN(RATE,'k',"bytes/second") }, { "after",'a',T_after, EN(AFTER,1,NULL) }, { "al-extents",'e',T_al_extents, EN(AL_EXTENTS,1,NULL) }, { "csums-alg", 'C',T_csums_alg, ES }, { "verify-alg", 'v',T_verify_alg, ES }, { "cpu-mask",'c',T_cpu_mask, ES }, { "use-rle",'R',T_use_rle, EB }, { "on-no-data-accessible",'n', T_on_no_data, EH(on_no_data_n,ON_NO_DATA) }, { "c-plan-ahead", 'p', T_c_plan_ahead, EN(C_PLAN_AHEAD,1,"1/10 seconds") }, { "c-delay-target", 'd', T_c_delay_target, EN(C_DELAY_TARGET,1,"1/10 seconds") }, { "c-fill-target", 's', T_c_fill_target, EN(C_FILL_TARGET,'s',"bytes") }, { "c-max-rate", 'M', T_c_max_rate, EN(C_MAX_RATE,'k',"bytes/second") }, { "c-min-rate", 'm', T_c_min_rate, EN(C_MIN_RATE,'k',"bytes/second") }, CLOSE_OPTIONS }} }, }, {"new-current-uuid", P_new_c_uuid, F_CONFIG_CMD, {{NULL, (struct drbd_option[]) { { "clear-bitmap",'c',T_clear_bm, EB }, CLOSE_OPTIONS }} }, }, {"invalidate", P_invalidate, F_CONFIG_CMD, {{ NULL, NULL }} }, {"invalidate-remote", P_invalidate_peer, F_CONFIG_CMD, {{NULL, NULL}} }, {"pause-sync", P_pause_sync, F_CONFIG_CMD, {{ NULL, NULL }} }, {"resume-sync", P_resume_sync, F_CONFIG_CMD, {{ NULL, NULL }} }, {"suspend-io", P_suspend_io, F_CONFIG_CMD, {{ NULL, NULL }} }, {"resume-io", P_resume_io, F_CONFIG_CMD, {{ NULL, NULL }} }, {"outdate", P_outdate, F_CONFIG_CMD, {{ NULL, NULL }} }, {"verify", P_start_ov, F_CONFIG_CMD, {{ NULL, (struct drbd_option[]) { { "start",'s',T_start_sector, EN(DISK_SIZE_SECT,'s',"bytes") }, { "stop",'S',T_stop_sector, EN(DISK_SIZE_SECT,'s',"bytes") }, CLOSE_OPTIONS }} }, }, {"down", 0, down_cmd, get_usage, { {NULL, NULL }} }, {"state", P_get_state, F_GET_CMD, { .gp={ role_scmd} } }, {"role", 
P_get_state, F_GET_CMD, { .gp={ role_scmd} } }, {"status", P_get_state, F_GET_CMD, {.gp={ status_xml_scmd } } }, {"sh-status", P_get_state, F_GET_CMD, {.gp={ sh_status_scmd } } }, {"cstate", P_get_state, F_GET_CMD, {.gp={ cstate_scmd} } }, {"dstate", P_get_state, F_GET_CMD, {.gp={ dstate_scmd} } }, {"show-gi", P_get_uuids, F_GET_CMD, {.gp={ uuids_scmd} }}, {"get-gi", P_get_uuids, F_GET_CMD, {.gp={ uuids_scmd} } }, {"show", P_get_config, F_GET_CMD, {.gp={ show_scmd} } }, {"check-resize", P_get_config, F_GET_CMD, {.gp={ lk_bdev_scmd} } }, {"events", 0, F_EVENTS_CMD, { .ep = { (struct option[]) { { "unfiltered", no_argument, 0, 'u' }, { "all-devices",no_argument, 0, 'a' }, { 0, 0, 0, 0 } }, print_broadcast_events } } }, {"wait-connect", 0, F_EVENTS_CMD, { .ep = { wait_cmds_options, w_connected_state } } }, {"wait-sync", 0, F_EVENTS_CMD, { .ep = { wait_cmds_options, w_synced_state } } }, }; #define OTHER_ERROR 900 #define EM(C) [ C - ERR_CODE_BASE ] /* The EM(123) are used for old error messages. */ static const char *error_messages[] = { EM(NO_ERROR) = "No further Information available.", EM(ERR_LOCAL_ADDR) = "Local address(port) already in use.", EM(ERR_PEER_ADDR) = "Remote address(port) already in use.", EM(ERR_OPEN_DISK) = "Can not open backing device.", EM(ERR_OPEN_MD_DISK) = "Can not open meta device.", EM(106) = "Lower device already in use.", EM(ERR_DISK_NOT_BDEV) = "Lower device is not a block device.", EM(ERR_MD_NOT_BDEV) = "Meta device is not a block device.", EM(109) = "Open of lower device failed.", EM(110) = "Open of meta device failed.", EM(ERR_DISK_TOO_SMALL) = "Low.dev. smaller than requested DRBD-dev. size.", EM(ERR_MD_DISK_TOO_SMALL) = "Meta device too small.", EM(113) = "You have to use the disk command first.", EM(ERR_BDCLAIM_DISK) = "Lower device is already claimed. This usually means it is mounted.", EM(ERR_BDCLAIM_MD_DISK) = "Meta device is already claimed. 
This usually means it is mounted.", EM(ERR_MD_IDX_INVALID) = "Lower device / meta device / index combination invalid.", EM(117) = "Currently we only support devices up to 3.998TB.\n" "(up to 2TB in case you do not have CONFIG_LBD set)\n" "Contact office@linbit.com, if you need more.", EM(ERR_IO_MD_DISK) = "IO error(s) occurred during initial access to meta-data.\n", EM(ERR_MD_INVALID) = "No valid meta-data signature found.\n\n" "\t==> Use 'drbdadm create-md res' to initialize meta-data area. <==\n", EM(ERR_AUTH_ALG) = "The 'cram-hmac-alg' you specified is not known in " "the kernel. (Maybe you need to modprobe it, or modprobe hmac?)", EM(ERR_AUTH_ALG_ND) = "The 'cram-hmac-alg' you specified is not a digest.", EM(ERR_NOMEM) = "kmalloc() failed. Out of memory?", EM(ERR_DISCARD_IMPOSSIBLE) = "--discard-my-data not allowed when primary.", EM(ERR_DISK_CONFIGURED) = "Device is attached to a disk (use detach first)", EM(ERR_NET_CONFIGURED) = "Device has a net-config (use disconnect first)", EM(ERR_MANDATORY_TAG) = "UnknownMandatoryTag", EM(ERR_MINOR_INVALID) = "Device minor not allocated", EM(128) = "Resulting device state would be invalid", EM(ERR_INTR) = "Interrupted by Signal", EM(ERR_RESIZE_RESYNC) = "Resize not allowed during resync.", EM(ERR_NO_PRIMARY) = "Need one Primary node to resize.", EM(ERR_SYNC_AFTER) = "The sync-after minor number is invalid", EM(ERR_SYNC_AFTER_CYCLE) = "This would cause a sync-after dependency cycle", EM(ERR_PAUSE_IS_SET) = "Sync-pause flag is already set", EM(ERR_PAUSE_IS_CLEAR) = "Sync-pause flag is already cleared", EM(136) = "Disk state is lower than outdated", EM(ERR_PACKET_NR) = "Kernel does not know how to handle your request.\n" "Maybe API_VERSION mismatch?", EM(ERR_NO_DISK) = "Device does not have a disk-config", EM(ERR_NOT_PROTO_C) = "Protocol C required", EM(ERR_NOMEM_BITMAP) = "vmalloc() failed. Out of memory?", EM(ERR_INTEGRITY_ALG) = "The 'data-integrity-alg' you specified is not known in " "the kernel. 
(Maybe you need to modprobe it, or modprobe hmac?)", EM(ERR_INTEGRITY_ALG_ND) = "The 'data-integrity-alg' you specified is not a digest.", EM(ERR_CPU_MASK_PARSE) = "Invalid cpu-mask.", EM(ERR_VERIFY_ALG) = "VERIFYAlgNotAvail", EM(ERR_VERIFY_ALG_ND) = "VERIFYAlgNotDigest", EM(ERR_VERIFY_RUNNING) = "Can not change verify-alg while online verify runs", EM(ERR_DATA_NOT_CURRENT) = "Can only attach to the data we lost last (see kernel log).", EM(ERR_CONNECTED) = "Need to be StandAlone", EM(ERR_CSUMS_ALG) = "CSUMSAlgNotAvail", EM(ERR_CSUMS_ALG_ND) = "CSUMSAlgNotDigest", EM(ERR_CSUMS_RESYNC_RUNNING) = "Can not change csums-alg while resync is in progress", EM(ERR_PERM) = "Permission denied. CAP_SYS_ADMIN necessary", EM(ERR_NEED_APV_93) = "Protocol version 93 required to use --assume-clean", EM(ERR_STONITH_AND_PROT_A) = "Fencing policy resource-and-stonith only with prot B or C allowed", EM(ERR_CONG_NOT_PROTO_A) = "on-congestion policy pull-ahead only with prot A allowed", EM(ERR_PIC_AFTER_DEP) = "Sync-pause flag is already cleared.\n" "Note: Resync pause caused by a local sync-after dependency.", EM(ERR_PIC_PEER_DEP) = "Sync-pause flag is already cleared.\n" "Note: Resync pause caused by the peer node.", }; #define MAX_ERROR (sizeof(error_messages)/sizeof(*error_messages)) const char * error_to_string(int err_no) { const unsigned int idx = err_no - ERR_CODE_BASE; if (idx >= MAX_ERROR) return "Unknown... maybe API_VERSION mismatch?"; return error_messages[idx]; } #undef MAX_ERROR char *cmdname = NULL; /* "drbdsetup" for reporting in usage etc. 
*/ char *devname = NULL; /* "/dev/drbd12" for reporting in print_config_error */ char *resname = NULL; /* for pretty printing in "status" only, taken from environment variable DRBD_RESOURCE */ int debug_dump_argv = 0; /* enabled by setting DRBD_DEBUG_DUMP_ARGV in the environment */ int lock_fd = -1; unsigned int cn_idx; static int dump_tag_list(unsigned short *tlc) { enum drbd_tags tag; unsigned int tag_nr; int len; int integer; char bit; uint64_t int64; const char* string; int found_unknown=0; while( (tag = *tlc++ ) != TT_END) { len = *tlc++; if(tag == TT_REMOVED) goto skip; tag_nr = tag_number(tag); if(tag_nrnl_header = malloc(NLMSG_SPACE( sizeof(struct cn_msg) + sizeof(struct drbd_nl_cfg_req) + size) ); tl->cn_header = NLMSG_DATA(tl->nl_header); tl->drbd_p_header = (struct drbd_nl_cfg_req*) tl->cn_header->data; tl->tag_list_start = tl->drbd_p_header->tag_list; tl->tag_list_cpos = tl->tag_list_start; tl->tag_size = size; return tl; } static void add_tag(struct drbd_tag_list *tl, short int tag, void *data, short int data_len) { if(data_len > tag_descriptions[tag_number(tag)].max_len) { fprintf(stderr, "The value for %s may only be %d byte long." 
" You requested %d.\n", tag_descriptions[tag_number(tag)].name, tag_descriptions[tag_number(tag)].max_len, data_len); exit(20); } if( (tl->tag_list_cpos - tl->tag_list_start) + data_len > tl->tag_size ) { fprintf(stderr, "Tag list size exceeded!\n"); exit(20); } put_unaligned(tag, tl->tag_list_cpos++); put_unaligned(data_len, tl->tag_list_cpos++); memcpy(tl->tag_list_cpos, data, data_len); tl->tag_list_cpos = (unsigned short*)((char*)tl->tag_list_cpos + data_len); } static void free_tag_list(struct drbd_tag_list *tl) { free(tl->nl_header); free(tl); } static int conv_block_dev(struct drbd_argument *ad, struct drbd_tag_list *tl, char* arg) { struct stat sb; int device_fd; int err; if ((device_fd = open(arg,O_RDWR))==-1) { PERROR("Can not open device '%s'", arg); return OTHER_ERROR; } if ( (err=fstat(device_fd, &sb)) ) { PERROR("fstat(%s) failed", arg); return OTHER_ERROR; } if(!S_ISBLK(sb.st_mode)) { fprintf(stderr, "%s is not a block device!\n", arg); return OTHER_ERROR; } close(device_fd); add_tag(tl,ad->tag,arg,strlen(arg)+1); // include the null byte. return NO_ERROR; } static int conv_md_idx(struct drbd_argument *ad, struct drbd_tag_list *tl, char* arg) { int idx; if(!strcmp(arg,"internal")) idx = DRBD_MD_INDEX_FLEX_INT; else if(!strcmp(arg,"flexible")) idx = DRBD_MD_INDEX_FLEX_EXT; else idx = m_strtoll(arg,1); add_tag(tl,ad->tag,&idx,sizeof(idx)); return NO_ERROR; } static void resolv6(char *name, struct sockaddr_in6 *addr) { struct addrinfo hints, *res, *tmp; int err; memset(&hints, 0, sizeof(hints)); hints.ai_family = AF_INET6; hints.ai_socktype = SOCK_STREAM; hints.ai_protocol = IPPROTO_TCP; err = getaddrinfo(name, 0, &hints, &res); if (err) { fprintf(stderr, "getaddrinfo %s: %s\n", name, gai_strerror(err)); exit(20); } /* Yes, it is a list. We use only the first result. 
The loop is only * there to document that we know it is a list */ for (tmp = res; tmp; tmp = tmp->ai_next) { memcpy(addr, tmp->ai_addr, sizeof(*addr)); break; } freeaddrinfo(res); if (0) { /* debug output */ char ip[INET6_ADDRSTRLEN]; inet_ntop(AF_INET6, &addr->sin6_addr, ip, sizeof(ip)); fprintf(stderr, "%s -> %02x %04x %08x %s %08x\n", name, addr->sin6_family, addr->sin6_port, addr->sin6_flowinfo, ip, addr->sin6_scope_id); } } static unsigned long resolv(const char* name) { unsigned long retval; if((retval = inet_addr(name)) == INADDR_NONE ) { struct hostent *he; he = gethostbyname(name); if (!he) { fprintf(stderr, "can not resolve the hostname: gethostbyname(%s): %s\n", name, hstrerror(h_errno)); exit(20); } retval = ((struct in_addr *)(he->h_addr_list[0]))->s_addr; } return retval; } static void split_ipv6_addr(char **address, int *port) { /* ipv6:[fe80::0234:5678:9abc:def1]:8000; */ char *b = strrchr(*address,']'); if (address[0][0] != '[' || b == NULL || (b[1] != ':' && b[1] != '\0')) { fprintf(stderr, "unexpected ipv6 format: %s\n", *address); exit(20); } *b = 0; *address += 1; /* skip '[' */ if (b[1] == ':') *port = m_strtoll(b+2,1); /* b+2: "]:" */ else *port = 7788; /* will we ever get rid of that default port? 
*/ } static void split_address(char* text, int *af, char** address, int* port) { static struct { char* text; int af; } afs[] = { { "ipv4:", AF_INET }, { "ipv6:", AF_INET6 }, { "sdp:", AF_INET_SDP }, { "ssocks:", -1 }, }; unsigned int i; char *b; *af=AF_INET; *address = text; for (i=0; itag,&addr6,sizeof(addr6)); } else { /* AF_INET, AF_SDP, AF_SSOCKS, * all use the IPv4 addressing scheme */ addr.sin_port = htons(port); addr.sin_family = af; addr.sin_addr.s_addr = resolv(address); add_tag(tl,ad->tag,&addr,sizeof(addr)); } return NO_ERROR; } static int conv_protocol(struct drbd_argument *ad, struct drbd_tag_list *tl, char* arg) { int prot; if(!strcmp(arg,"A") || !strcmp(arg,"a")) { prot=DRBD_PROT_A; } else if (!strcmp(arg,"B") || !strcmp(arg,"b")) { prot=DRBD_PROT_B; } else if (!strcmp(arg,"C") || !strcmp(arg,"c")) { prot=DRBD_PROT_C; } else { fprintf(stderr, "'%s' is no valid protocol.\n", arg); return OTHER_ERROR; } add_tag(tl,ad->tag,&prot,sizeof(prot)); return NO_ERROR; } static int conv_bit(struct drbd_option *od, struct drbd_tag_list *tl, char* arg __attribute((unused))) { char bit=1; add_tag(tl,od->tag,&bit,sizeof(bit)); return NO_ERROR; } /* It will only print the WARNING if the warn flag is set with the _first_ call! */ #define PROC_NET_AF_SCI_FAMILY "/proc/net/af_sci/family" #define PROC_NET_AF_SSOCKS_FAMILY "/proc/net/af_ssocks/family" static int get_af_ssocks(int warn_and_use_default) { char buf[16]; int c, fd; static int af = -1; if (af > 0) return af; fd = open(PROC_NET_AF_SSOCKS_FAMILY, O_RDONLY); if (fd < 0) fd = open(PROC_NET_AF_SCI_FAMILY, O_RDONLY); if (fd < 0) { if (warn_and_use_default) { fprintf(stderr, "open(" PROC_NET_AF_SSOCKS_FAMILY ") " "failed: %m\n WARNING: assuming AF_SSOCKS = 27. 
" "Socket creation may fail.\n"); af = 27; } return af; } c = read(fd, buf, sizeof(buf)-1); if (c > 0) { buf[c] = 0; if (buf[c-1] == '\n') buf[c-1] = 0; af = m_strtoll(buf,1); } else { if (warn_and_use_default) { fprintf(stderr, "read(" PROC_NET_AF_SSOCKS_FAMILY ") " "failed: %m\n WARNING: assuming AF_SSOCKS = 27. " "Socket creation may fail.\n"); af = 27; } } close(fd); return af; } static int conv_sndbuf(struct drbd_option *od, struct drbd_tag_list *tl, char* arg) { int err = conv_numeric(od, tl, arg); long long l = m_strtoll(arg, 0); char bit = 0; if (err != NO_ERROR || l != 0) return err; /* this is a mandatory bit, * to avoid newer userland to configure older modules with * a sndbuf size of zero, which would lead to Oops. */ add_tag(tl, T_auto_sndbuf_size, &bit, sizeof(bit)); return NO_ERROR; } static int conv_numeric(struct drbd_option *od, struct drbd_tag_list *tl, char* arg) { const long long min = od->numeric_param.min; const long long max = od->numeric_param.max; const unsigned char unit_prefix = od->numeric_param.unit_prefix; long long l; int i; char unit[] = {0,0}; l = m_strtoll(arg, unit_prefix); if (min > l || l > max) { unit[0] = unit_prefix > 1 ? 
unit_prefix : 0;
		fprintf(stderr,"%s %s => %lld%s out of range [%lld..%lld]%s\n",
			od->name, arg, l, unit, min, max, unit);
		return OTHER_ERROR;
	}

	switch(tag_type(od->tag)) {
	case TT_INT64:
		add_tag(tl,od->tag,&l,sizeof(l));
		break;
	case TT_INTEGER:
		i=l;
		add_tag(tl,od->tag,&i,sizeof(i));
		break;
	default:
		fprintf(stderr, "internal error in conv_numeric()\n");
	}
	return NO_ERROR;
}

static int conv_handler(struct drbd_option *od, struct drbd_tag_list *tl, char* arg)
{
	const char** handler_names = od->handler_param.handler_names;
	const int number_of_handlers = od->handler_param.number_of_handlers;
	int i;

	for(i=0;i<number_of_handlers;i++) {
		/* the name tables may contain unused (NULL) slots; skip them */
		if(handler_names[i]==NULL) continue;
		if(strcmp(arg,handler_names[i])==0) {
			add_tag(tl,od->tag,&i,sizeof(i));
			return NO_ERROR;
		}
	}

	fprintf(stderr, "%s-handler '%s' not known\n", od->name, arg);
	fprintf(stderr, "known %s-handlers:\n", od->name);
	for (i = 0; i < number_of_handlers; i++) {
		if (handler_names[i])
			fprintf(stderr, "\t%s\n", handler_names[i]);
	}
	return OTHER_ERROR;
}

static int conv_string(struct drbd_option *od, struct drbd_tag_list *tl, char* arg)
{
	add_tag(tl,od->tag,arg,strlen(arg)+1);
	return NO_ERROR;
}

static struct option * make_longoptions(struct drbd_option* od)
{
	/* room for up to N options,
	 * plus set-defaults, create-device, and the terminating NULL */
#define N 30
	static struct option buffer[N+3];
	int i=0;

	while(od && od->name) {
		buffer[i].name = od->name;
		buffer[i].has_arg = tag_type(od->tag) == TT_BIT ?
no_argument : required_argument ;
		buffer[i].flag = NULL;
		buffer[i].val = od->short_name;
		if (i++ == N) {
			/* we must not leave this loop with i > N */
			fprintf(stderr,"buffer in make_longoptions too small.\n");
			abort();
		}
		od++;
	}
#undef N

	// The two omnipresent options:
	buffer[i].name = "set-defaults";
	buffer[i].has_arg = 0;
	buffer[i].flag = NULL;
	buffer[i].val = '(';
	i++;

	buffer[i].name = "create-device";
	buffer[i].has_arg = 0;
	buffer[i].flag = NULL;
	buffer[i].val = ')';
	i++;

	buffer[i].name = NULL;
	buffer[i].has_arg = 0;
	buffer[i].flag = NULL;
	buffer[i].val = 0;

	return buffer;
}

static struct drbd_option *find_opt_by_short_name(struct drbd_option *od, int c)
{
	if(!od) return NULL;
	while(od->name) {
		if(od->short_name == c) return od;
		od++;
	}

	return NULL;
}

/* prepends global devname to output (if any) */
static int print_config_error(int err_no)
{
	int rv=0;

	if (err_no == NO_ERROR || err_no == SS_SUCCESS)
		return 0;

	if (err_no == OTHER_ERROR)
		return 20;

	if ( ( err_no >= AFTER_LAST_ERR_CODE || err_no <= ERR_CODE_BASE ) &&
	     ( err_no > SS_CW_NO_NEED || err_no <= SS_AFTER_LAST_ERROR) ) {
		fprintf(stderr,"Error code %d unknown.\n"
			"You should update the drbd userland tools.\n",err_no);
		rv = 20;
	} else {
		if(err_no > ERR_CODE_BASE ) {
			fprintf(stderr,"%s: Failure: (%d) %s\n",
				devname, err_no, error_to_string(err_no));
			rv = 10;
		} else if (err_no == SS_UNKNOWN_ERROR) {
			fprintf(stderr,"%s: State change failed: (%d) "
				"unknown error.\n", devname, err_no);
			rv = 11;
		} else if (err_no > SS_TWO_PRIMARIES) {
			// Ignore SS_SUCCESS, SS_NOTHING_TO_DO, SS_CW_Success...
		} else {
			fprintf(stderr,"%s: State change failed: (%d) %s\n",
				devname, err_no, drbd_set_st_err_str(err_no));
			if (err_no == SS_NO_UP_TO_DATE_DISK) {
				/* all available disks are inconsistent,
				 * or I am consistent, but cannot outdate the peer.
				 */
				rv = 17;
			} else if (err_no == SS_LOWER_THAN_OUTDATED) {
				/* was inconsistent anyways */
				rv = 5;
			} else if (err_no == SS_NO_LOCAL_DISK) {
				/* Can not start resync, no local disks, try with drbdmeta */
				rv = 16;
			} else {
				rv = 11;
			}
		}
	}
	return rv;
}

#define RCV_SIZE NLMSG_SPACE(sizeof(struct cn_msg)+sizeof(struct drbd_nl_cfg_reply))

static void warn_print_excess_args(int argc, char **argv, int i)
{
	fprintf(stderr, "Excess arguments:");
	for (; i < argc; i++)
		fprintf(stderr, " %s", argv[i]);
	/* terminate the line on stderr as well, not on stdout */
	fprintf(stderr, "\n");
}

static void dump_argv(int argc, char **argv, int first_non_option, int n_known_args)
{
	int i;
	if (!debug_dump_argv)
		return;
	fprintf(stderr, ",-- ARGV dump (optind %d, known_args %d, argc %u):\n",
		first_non_option, n_known_args, argc);
	for (i = 0; i < argc; i++) {
		if (i == 1)
			fprintf(stderr, "-- consumed options:");
		if (i == first_non_option)
			fprintf(stderr, "-- known args:");
		if (i == (first_non_option + n_known_args))
			fprintf(stderr, "-- unexpected args:");
		fprintf(stderr, "| %2u: %s\n", i, argv[i]);
	}
	fprintf(stderr, "`--\n");
}

static int _generic_config_cmd(struct drbd_cmd *cm, unsigned minor, int argc,
			       char **argv)
{
	char buffer[ RCV_SIZE ];
	struct drbd_nl_cfg_reply *reply;
	struct drbd_argument *ad = cm->cp.args;
	struct drbd_option *od;
	struct option *lo;
	struct drbd_tag_list *tl;
	int c,i=1,rv=NO_ERROR,sk_nl;
	int flags=0;
	int n_args;

	tl = create_tag_list(4096);

	/* positional arguments come first, in the order the command declares them */
	while(ad && ad->name) {
		if(argc < i+1) {
			fprintf(stderr,"Missing argument '%s'\n", ad->name);
			print_command_usage(cm-commands, "",FULL);
			rv = OTHER_ERROR;
			goto error;
		}
		rv = ad->convert_function(ad,tl,argv[i++]);
		if (rv != NO_ERROR)
			goto error;
		ad++;
	}
	n_args = i - 1;

	lo = make_longoptions(cm->cp.options);
	if (!lo) {
		static struct option none[] = { { } };
		lo = none;
	}

	for(;;) {
		c = getopt_long(argc, argv, make_optstring(lo), lo, 0);
		if (c == -1)
			break;
		od = find_opt_by_short_name(cm->cp.options,c);
		if (od)
			rv = od->convert_function(od,tl,optarg);
		else {
			if(c=='(') flags |= DRBD_NL_SET_DEFAULTS;
			else
if(c==')') flags |= DRBD_NL_CREATE_DEVICE; else { rv = OTHER_ERROR; goto error; } } if (rv != NO_ERROR) goto error; } /* argc should be cmd + n options + n args; * if it is more, we did not understand some */ if (n_args + optind < argc) { warn_print_excess_args(argc, argv, optind + n_args); rv = OTHER_ERROR; goto error; } dump_argv(argc, argv, optind, i - 1); add_tag(tl,TT_END,NULL,0); // close the tag list if(rv == NO_ERROR) { //dump_tag_list(tl->tag_list_start); int received; sk_nl = open_cn(); if (sk_nl < 0) { rv = OTHER_ERROR; goto error; } tl->drbd_p_header->packet_type = cm->packet_id; tl->drbd_p_header->drbd_minor = minor; tl->drbd_p_header->flags = flags; received = call_drbd(sk_nl,tl, (struct nlmsghdr*)buffer,RCV_SIZE,NL_TIME); close_cn(sk_nl); if (received >= 0) { reply = (struct drbd_nl_cfg_reply *) ((struct cn_msg *)NLMSG_DATA(buffer))->data; rv = reply->ret_code; } } error: free_tag_list(tl); return rv; } static int generic_config_cmd(struct drbd_cmd *cm, unsigned minor, int argc, char **argv) { return print_config_error(_generic_config_cmd(cm, minor, argc, argv)); } #define ASSERT(exp) if (!(exp)) \ fprintf(stderr,"ASSERT( " #exp " ) in %s:%d\n", __FILE__,__LINE__); static void show_numeric(struct drbd_option *od, unsigned short* tp) { long long val; const unsigned char unit_prefix = od->numeric_param.unit_prefix; switch(tag_type(get_unaligned(tp++))) { case TT_INTEGER: ASSERT( get_unaligned(tp++) == sizeof(int) ); val = get_unaligned((int*)tp); break; case TT_INT64: ASSERT( get_unaligned(tp++) == sizeof(uint64_t) ); val = get_unaligned((uint64_t*)tp); break; default: ASSERT(0); val=0; } if(unit_prefix == 1) printf("\t%-16s\t%lld",od->name,val); else printf("\t%-16s\t%lld%c",od->name,val,unit_prefix); if(val == (long long) od->numeric_param.def) printf(" _is_default"); if(od->numeric_param.unit) { printf("; # %s\n",od->numeric_param.unit); } else { printf(";\n"); } } static void show_handler(struct drbd_option *od, unsigned short* tp) { const char** 
handler_names = od->handler_param.handler_names; int i; ASSERT( tag_type(get_unaligned(tp++)) == TT_INTEGER ); ASSERT( get_unaligned(tp++) == sizeof(int) ); i = get_unaligned((int*)tp); printf("\t%-16s\t%s",od->name,handler_names[i]); if( i == (long long)od->numeric_param.def) printf(" _is_default"); printf(";\n"); } static void show_bit(struct drbd_option *od, unsigned short* tp) { ASSERT( tag_type(get_unaligned(tp++)) == TT_BIT ); ASSERT( get_unaligned(tp++) == sizeof(char) ); if(get_unaligned((char*)tp)) printf("\t%-16s;\n",od->name); } static void show_string(struct drbd_option *od, unsigned short* tp) { ASSERT( tag_type(get_unaligned(tp++)) == TT_STRING ); if( get_unaligned(tp++) > 0 && get_unaligned((char*)tp)) printf("\t%-16s\t\"%s\";\n",od->name,(char*)tp); } static unsigned short *look_for_tag(unsigned short *tlc, unsigned short tag) { enum drbd_tags t; int len; while( (t = get_unaligned(tlc)) != TT_END ) { if(t == tag) return tlc; tlc++; len = get_unaligned(tlc++); tlc = (unsigned short*)((char*)tlc + len); } return NULL; } static void print_options(struct drbd_option *od, unsigned short *tlc, const char* sect_name) { unsigned short *tp; int opened = 0; while(od->name) { tp = look_for_tag(tlc,od->tag); if(tp) { if(!opened) { opened=1; printf("%s {\n",sect_name); } od->show_function(od,tp); put_unaligned(TT_REMOVED, tp); } od++; } if(opened) { printf("}\n"); } } static void consume_everything(unsigned short *tlc) { enum drbd_tags t; int len; while( (t = get_unaligned(tlc)) != TT_END ) { put_unaligned(TT_REMOVED, tlc++); len = get_unaligned(tlc++); tlc = (unsigned short*)((char*)tlc + len); } } static int consume_tag_blob(enum drbd_tags tag, unsigned short *tlc, char** val, unsigned int* len) { unsigned short *tp; tp = look_for_tag(tlc,tag); if(tp) { put_unaligned(TT_REMOVED, tp++); *len = get_unaligned(tp++); *val = (char*)tp; return 1; } return 0; } static int consume_tag_string(enum drbd_tags tag, unsigned short *tlc, char** val) { unsigned short *tp; tp 
= look_for_tag(tlc,tag); if(tp) { put_unaligned(TT_REMOVED, tp++); if( get_unaligned(tp++) > 0 ) *val = (char*)tp; else *val = ""; return 1; } return 0; } static int consume_tag_int(enum drbd_tags tag, unsigned short *tlc, int* val) { unsigned short *tp; tp = look_for_tag(tlc,tag); if(tp) { put_unaligned(TT_REMOVED, tp++); tp++; *val = get_unaligned((int *)tp); return 1; } return 0; } static int consume_tag_u64(enum drbd_tags tag, unsigned short *tlc, unsigned long long* val) { unsigned short *tp; unsigned short len; tp = look_for_tag(tlc, tag); if(tp) { put_unaligned(TT_REMOVED, tp++); len = get_unaligned(tp++); /* check the data size. * actually it has to be long long, but I'm paranoid */ if (len == sizeof(int)) *val = get_unaligned((unsigned int*)tp); else if (len == sizeof(long)) *val = get_unaligned((unsigned long *)tp); else if (len == sizeof(long long)) *val = get_unaligned((unsigned long long *)tp); else { fprintf(stderr, "%s: unexpected tag len: %u\n", __func__ , len); return 0; } return 1; } return 0; } static int consume_tag_bit(enum drbd_tags tag, unsigned short *tlc, int* val) { unsigned short *tp; tp = look_for_tag(tlc,tag); if(tp) { put_unaligned(TT_REMOVED, tp++); tp++; *val = (int)(*(char *)tp); return 1; } return 0; } static int generic_get_cmd(struct drbd_cmd *cm, unsigned minor, int argc, char **argv __attribute((unused))) { char buffer[ 4096 ]; struct drbd_tag_list *tl; struct drbd_nl_cfg_reply *reply; int sk_nl,rv; int ignore_minor_not_known; int dummy; if (argc > 1) { warn_print_excess_args(argc, argv, 1); return 20; } dump_argv(argc, argv, 1, 0); tl = create_tag_list(2); add_tag(tl,TT_END,NULL,0); // close the tag list sk_nl = open_cn(); if(sk_nl < 0) return 20; tl->drbd_p_header->packet_type = cm->packet_id; tl->drbd_p_header->drbd_minor = minor; tl->drbd_p_header->flags = 0; memset(buffer,0,sizeof(buffer)); call_drbd(sk_nl,tl, (struct nlmsghdr*)buffer,4096,NL_TIME); close_cn(sk_nl); reply = (struct drbd_nl_cfg_reply *) ((struct cn_msg 
*)NLMSG_DATA(buffer))->data; /* if there was an error, report and abort -- * unless it was "this device is not there", * and command was "status" */ ignore_minor_not_known = cm->gp.show_function == status_xml_scmd || cm->gp.show_function == sh_status_scmd; if (reply->ret_code != NO_ERROR && !(reply->ret_code == ERR_MINOR_INVALID && ignore_minor_not_known)) return print_config_error(reply->ret_code); rv = cm->gp.show_function(cm,minor,reply->tag_list); /* in case cm->packet_id == P_get_state, and the gp.show_function did * nothing with the sync_progress info, consume it here, so it won't * confuse users because it gets dumped below. */ consume_tag_int(T_sync_progress, reply->tag_list, &dummy); if(dump_tag_list(reply->tag_list)) { printf("# Found unknown tags, you should update your\n" "# userland tools\n"); } return rv; } static char *af_to_str(int af) { if (af == AF_INET) return "ipv4"; else if (af == AF_INET6) return "ipv6"; /* AF_SSOCKS typically is 27, the same as AF_INET_SDP. * But with warn_and_use_default = 0, it will stay at -1 if not available. * Just keep the test on ssocks before the one on SDP (which is hard-coded), * and all should be fine. */ else if (af == get_af_ssocks(0)) return "ssocks"; else if (af == AF_INET_SDP) return "sdp"; else return "unknown"; } static void show_address(void* address, int addr_len) { union { struct sockaddr addr; struct sockaddr_in addr4; struct sockaddr_in6 addr6; } a; char buffer[INET6_ADDRSTRLEN]; /* avoid alignment issues on certain platforms (e.g. 
armel) */ memset(&a, 0, sizeof(a)); memcpy(&a.addr, address, addr_len); if (a.addr.sa_family == AF_INET || a.addr.sa_family == get_af_ssocks(0) || a.addr.sa_family == AF_INET_SDP) { printf("\taddress\t\t\t%s %s:%d;\n", af_to_str(a.addr4.sin_family), inet_ntoa(a.addr4.sin_addr), ntohs(a.addr4.sin_port)); } else if (a.addr.sa_family == AF_INET6) { printf("\taddress\t\t\t%s [%s]:%d;\n", af_to_str(a.addr6.sin6_family), inet_ntop(a.addr6.sin6_family, &a.addr6.sin6_addr, buffer, INET6_ADDRSTRLEN), ntohs(a.addr6.sin6_port)); } else { printf("\taddress\t\t\t[unknown af=%d, len=%d]\n", a.addr.sa_family, addr_len); } } static int show_scmd(struct drbd_cmd *cm, unsigned minor, unsigned short *rtl) { int idx = idx; char *str = NULL, *backing_dev, *address; unsigned int addr_len = 0; // find all commands that have options and print those... for ( cm = commands ; cm < commands + ARRAY_SIZE(commands) ; cm++ ) { if(cm->function == generic_config_cmd && cm->cp.options ) print_options(cm->cp.options, rtl, cm->cmd); } // start of spaghetti code... 
if(consume_tag_int(T_wire_protocol,rtl,&idx)) printf("protocol %c;\n",'A'+idx-1); backing_dev = address = NULL; consume_tag_string(T_backing_dev,rtl,&backing_dev); consume_tag_blob(T_my_addr, rtl, &address, &addr_len); if(backing_dev || address) { printf("_this_host {\n"); printf("\tdevice\t\t\tminor %d;\n",minor); if(backing_dev) { printf("\tdisk\t\t\t\"%s\";\n",backing_dev); consume_tag_int(T_meta_dev_idx,rtl,&idx); consume_tag_string(T_meta_dev,rtl,&str); switch(idx) { case DRBD_MD_INDEX_INTERNAL: case DRBD_MD_INDEX_FLEX_INT: printf("\tmeta-disk\t\tinternal;\n"); break; case DRBD_MD_INDEX_FLEX_EXT: printf("\tflexible-meta-disk\t\"%s\";\n",str); break; default: printf("\tmeta-disk\t\t\"%s\" [ %d ];\n",str, idx); } } if(address) show_address(address, addr_len); printf("}\n"); } if(consume_tag_blob(T_peer_addr, rtl, &address, &addr_len)) { printf("_remote_host {\n"); show_address(address, addr_len); printf("}\n"); } consume_tag_bit(T_mind_af, rtl, &idx); /* consume it, its value has no relevance */ consume_tag_bit(T_auto_sndbuf_size, rtl, &idx); /* consume it, its value has no relevance */ return 0; } static int lk_bdev_scmd(struct drbd_cmd *cm, unsigned minor, unsigned short *rtl) { struct bdev_info bd = { 0, }; char *backing_dev = NULL; uint64_t bd_size; int fd; int idx = idx; int index_valid = 0; consume_tag_string(T_backing_dev, rtl, &backing_dev); index_valid = consume_tag_int(T_meta_dev_idx, rtl, &idx); /* consume everything */ consume_everything(rtl); if (!backing_dev) { fprintf(stderr, "Has no disk config, try with drbdmeta.\n"); return 1; } if (!index_valid) { /* cannot happen, right? 
;-) */ fprintf(stderr, "No meta data index!?\n"); return 1; } if (idx >= 0 || idx == DRBD_MD_INDEX_FLEX_EXT) { lk_bdev_delete(minor); return 0; } fd = open(backing_dev, O_RDONLY); if (fd == -1) { fprintf(stderr, "Could not open %s: %m.\n", backing_dev); return 1; } bd_size = bdev_size(fd); close(fd); if (lk_bdev_load(minor, &bd) == 0 && bd.bd_size == bd_size && bd.bd_name && !strcmp(bd.bd_name, backing_dev)) return 0; /* nothing changed. */ bd.bd_size = bd_size; bd.bd_name = backing_dev; lk_bdev_save(minor, &bd); return 0; } static int status_xml_scmd(struct drbd_cmd *cm __attribute((unused)), unsigned minor, unsigned short *rtl) { union drbd_state state = { .i = 0 }; int synced = 0; if (!consume_tag_int(T_state_i,rtl,(int*)&state.i)) { printf( "\n"); return 0; } printf("\n"); return 0; } printf( /* connection state */ " cs=\"%s\"" /* role */ " ro1=\"%s\" ro2=\"%s\"" /* disk state */ " ds1=\"%s\" ds2=\"%s\"", drbd_conn_str(state.conn), drbd_role_str(state.role), drbd_role_str(state.peer), drbd_disk_str(state.disk), drbd_disk_str(state.pdsk)); /* io suspended ? */ if (state.susp) printf(" suspended"); /* reason why sync is paused */ if (state.aftr_isp) printf(" aftr_isp"); if (state.peer_isp) printf(" peer_isp"); if (state.user_isp) printf(" user_isp"); if (consume_tag_int(T_sync_progress, rtl, &synced)) printf(" resynced_percent=\"%i.%i\"", synced / 10, synced % 10); printf(" />\n"); return 0; } static int sh_status_scmd(struct drbd_cmd *cm __attribute((unused)), unsigned minor, unsigned short *rtl) { /* variable prefix; maybe rather make that a command line parameter? * or use "drbd_sh_status"? */ #define _P "" union drbd_state state = { .i = 0 }; int available = 0; int synced = 0; printf("%s_minor=%u\n", _P, minor); printf("%s_res_name=%s\n", _P, shell_escape(resname ?: "UNKNOWN")); available = consume_tag_int(T_state_i,rtl,(int*)&state.i); if (state.conn == C_STANDALONE && state.disk == D_DISKLESS) { printf("%s_known=%s\n\n", _P, available ? 
"Unconfigured" : "NA # not available or not yet created"); printf("%s_cstate=Unconfigured\n", _P); printf("%s_role=\n", _P); printf("%s_peer=\n", _P); printf("%s_disk=\n", _P); printf("%s_pdsk=\n", _P); printf("%s_flags_susp=\n", _P); printf("%s_flags_aftr_isp=\n", _P); printf("%s_flags_peer_isp=\n", _P); printf("%s_flags_user_isp=\n", _P); printf("%s_resynced_percent=\n", _P); } else { printf( "%s_known=Configured\n\n" /* connection state */ "%s_cstate=%s\n" /* role */ "%s_role=%s\n" "%s_peer=%s\n" /* disk state */ "%s_disk=%s\n" "%s_pdsk=%s\n\n", _P, _P, drbd_conn_str(state.conn), _P, drbd_role_str(state.role), _P, drbd_role_str(state.peer), _P, drbd_disk_str(state.disk), _P, drbd_disk_str(state.pdsk)); /* io suspended ? */ printf("%s_flags_susp=%s\n", _P, state.susp ? "1" : ""); /* reason why sync is paused */ printf("%s_flags_aftr_isp=%s\n", _P, state.aftr_isp ? "1" : ""); printf("%s_flags_peer_isp=%s\n", _P, state.peer_isp ? "1" : ""); printf("%s_flags_user_isp=%s\n\n", _P, state.user_isp ? "1" : ""); printf("%s_resynced_percent=", _P); if (consume_tag_int(T_sync_progress, rtl, &synced)) printf("%i.%i\n", synced / 10, synced % 10); else printf("\n"); } printf("\n%s_sh_status_process\n\n\n", _P); fflush(stdout); return 0; #undef _P } static int role_scmd(struct drbd_cmd *cm __attribute((unused)), unsigned minor __attribute((unused)), unsigned short *rtl) { union drbd_state state = { .i = 0 }; if (!strcmp(cm->cmd, "state")) { fprintf(stderr, "'%s ... state' is deprecated, use '%s ... 
role' instead.\n", cmdname, cmdname); } consume_tag_int(T_state_i,rtl,(int*)&state.i); if ( state.conn == C_STANDALONE && state.disk == D_DISKLESS) { printf("Unconfigured\n"); } else { printf("%s/%s\n",drbd_role_str(state.role),drbd_role_str(state.peer)); } return 0; } static int cstate_scmd(struct drbd_cmd *cm __attribute((unused)), unsigned minor __attribute((unused)), unsigned short *rtl) { union drbd_state state = { .i = 0 }; consume_tag_int(T_state_i,rtl,(int*)&state.i); if ( state.conn == C_STANDALONE && state.disk == D_DISKLESS) { printf("Unconfigured\n"); } else { printf("%s\n",drbd_conn_str(state.conn)); } return 0; } static int dstate_scmd(struct drbd_cmd *cm __attribute((unused)), unsigned minor __attribute((unused)), unsigned short *rtl) { union drbd_state state = { .i = 0 }; consume_tag_int(T_state_i,rtl,(int*)&state.i); if ( state.conn == C_STANDALONE && state.disk == D_DISKLESS) { printf("Unconfigured\n"); } else { printf("%s/%s\n",drbd_disk_str(state.disk),drbd_disk_str(state.pdsk)); } return 0; } static int uuids_scmd(struct drbd_cmd *cm, unsigned minor __attribute((unused)), unsigned short *rtl) { uint64_t uuids[UI_SIZE]; char *tl_uuids; int flags = flags; unsigned int len; if (!consume_tag_blob(T_uuids, rtl, &tl_uuids, &len)) { fprintf(stderr,"Reply payload did not carry an uuid-tag,\n" "Probably the device has no disk!\n"); return 1; } consume_tag_int(T_uuids_flags,rtl,&flags); if( len == UI_SIZE * sizeof(uint64_t)) { memcpy(uuids, tl_uuids, len); if(!strcmp(cm->cmd,"show-gi")) { dt_pretty_print_uuids(uuids,flags); } else if(!strcmp(cm->cmd,"get-gi")) { dt_print_uuids(uuids,flags); } else { ASSERT( 0 ); } } else { fprintf(stderr, "Unexpected length of T_uuids tag. 
" "You should upgrade your userland tools\n"); } return 0; } static struct drbd_cmd *find_cmd_by_name(char *name) { unsigned int i; for (i = 0; i < ARRAY_SIZE(commands); i++) { if (!strcmp(name, commands[i].cmd)) { return commands + i; } } return NULL; } static int down_cmd(struct drbd_cmd *cm, unsigned minor, int argc, char **argv) { int rv; int success; if(argc > 1) { fprintf(stderr,"Ignoring excess arguments\n"); } cm = find_cmd_by_name("secondary"); rv = _generic_config_cmd(cm, minor, argc, argv); // No error messages if (rv == ERR_MINOR_INVALID) return 0; success = (rv >= SS_SUCCESS && rv < ERR_CODE_BASE) || rv == NO_ERROR; if (!success) return print_config_error(rv); cm = find_cmd_by_name("disconnect"); cm->function(cm,minor,argc,argv); cm = find_cmd_by_name("detach"); rv = cm->function(cm,minor,argc,argv); return rv; } static void print_digest(const char* label, const int len, const unsigned char *hash) { int i; printf("\t%s: ", label); for (i = 0; i < len; i++) printf("%02x",hash[i]); printf("\n"); } static char printable_or_dot(char c) { return (' ' < c && c <= '~') ? 
c : '.'; } static void print_hex_line(int offset, unsigned char *data) { printf( " %04x:" " %02x %02x %02x %02x %02x %02x %02x %02x " " %02x %02x %02x %02x %02x %02x %02x %02x" " %c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c\n", offset, data[0], data[1], data[2], data[3], data[4], data[5], data[6], data[7], data[8], data[9], data[10], data[11], data[12], data[13], data[14], data[15], printable_or_dot(data[0]), printable_or_dot(data[1]), printable_or_dot(data[2]), printable_or_dot(data[3]), printable_or_dot(data[4]), printable_or_dot(data[5]), printable_or_dot(data[6]), printable_or_dot(data[7]), printable_or_dot(data[8]), printable_or_dot(data[9]), printable_or_dot(data[10]), printable_or_dot(data[11]), printable_or_dot(data[12]), printable_or_dot(data[13]), printable_or_dot(data[14]), printable_or_dot(data[15])); } /* successive identical lines are collapsed into just printing one star */ static void print_hex_dump(int len, void *data) { int i; int star = 0; for (i = 0; i < len-15; i += 16) { if (i == 0 || memcmp(data + i, data + i - 16, 16)) { print_hex_line(i, data + i); star = 0; } else if (!star) { printf(" *\n"); star = 1; } } /* yes, I ignore remainders of len not modulo 16 here. * so what, usage is currently to dump bios, which are * multiple of 512. 
*/ /* for good measure, print the total size as offset now, * last line may have been a '*' */ printf(" %04x.\n", len); } static void print_dump_ee(struct drbd_nl_cfg_reply *reply) { unsigned long long sector = -1ULL; unsigned long long block_id = 0; char *reason = "UNKNOWN REASON"; char *dig_in = NULL; char *dig_vv = NULL; unsigned int dgs_in = 0, dgs_vv = 0; unsigned int size = 0; char *data = NULL; if (!consume_tag_string(T_dump_ee_reason, reply->tag_list, &reason)) printf("\tno reason?\n"); if (!consume_tag_blob(T_seen_digest, reply->tag_list, &dig_in, &dgs_in)) printf("\tno digest in?\n"); if (!consume_tag_blob(T_calc_digest, reply->tag_list, &dig_vv, &dgs_vv)) printf("\tno digest out?\n"); if (!consume_tag_u64(T_ee_sector, reply->tag_list, &sector)) printf("\tno sector?\n"); if (!consume_tag_u64(T_ee_block_id, reply->tag_list, &block_id)) printf("\tno block_id?\n"); if (!consume_tag_blob(T_ee_data, reply->tag_list, &data, &size)) printf("\tno data?\n"); printf("\tdumping ee, reason: %s\n", reason); printf("\tsector: %llu block_id: 0x%llx size: %u\n", sector, block_id, size); /* "input sanitation". Did I mention yet that I'm paranoid? */ if (!data) size = 0; if (!dig_in) dgs_in = 0; if (!dig_vv) dgs_vv = 0; if (dgs_in > SHARED_SECRET_MAX) dgs_in = SHARED_SECRET_MAX; if (dgs_vv > SHARED_SECRET_MAX) dgs_vv = SHARED_SECRET_MAX; print_digest("received digest", dgs_in, (unsigned char*)dig_in); print_digest("verified digest", dgs_vv, (unsigned char*)dig_vv); /* dump at most 32 K */ if (size > 0x8000) { size = 0x8000; printf("\tWARNING truncating data to %u!\n", 0x8000); } print_hex_dump(size,data); } /* this is not pretty; but it's api... ;-( */ const char *pretty_print_return_code(int e) { return e == NO_ERROR ? "No error" : e > ERR_CODE_BASE ? error_to_string(e) : e > SS_AFTER_LAST_ERROR && e <= SS_TWO_PRIMARIES ? drbd_set_st_err_str(e) : e == SS_CW_NO_NEED ? "Cluster wide state change: nothing to do" : e == SS_CW_SUCCESS ? 
"Cluster wide state change successful" : e == SS_NOTHING_TO_DO ? "State change: nothing to do" : e == SS_SUCCESS ? "State change successful" : e == SS_UNKNOWN_ERROR ? "Unspecified error" : "Unknown return code"; } static int print_broadcast_events(unsigned int seq, int u __attribute((unused)), struct drbd_nl_cfg_reply *reply) { union drbd_state state; char* str; int synced = 0; switch (reply->packet_type) { case 0: /* used to be this way in drbd_nl.c for some responses :-( */ case P_return_code_only: /* used by drbd_nl.c for most "empty" responses */ printf("%u ZZ %d ret_code: %d %s\n", seq, reply->minor, reply->ret_code, pretty_print_return_code(reply->ret_code)); break; case P_get_state: if(consume_tag_int(T_state_i,reply->tag_list,(int*)&state.i)) { printf("%u ST %d { cs:%s ro:%s/%s ds:%s/%s %c%c%c%c }\n", seq, reply->minor, drbd_conn_str(state.conn), drbd_role_str(state.role), drbd_role_str(state.peer), drbd_disk_str(state.disk), drbd_disk_str(state.pdsk), state.susp ? 's' : 'r', state.aftr_isp ? 'a' : '-', state.peer_isp ? 'p' : '-', state.user_isp ? 'u' : '-' ); } else fprintf(stderr,"Missing tag !?\n"); break; case P_call_helper: if(consume_tag_string(T_helper,reply->tag_list,&str)) { printf("%u UH %d %s\n", seq, reply->minor, str); } else fprintf(stderr,"Missing tag !?\n"); break; case P_sync_progress: if (consume_tag_int(T_sync_progress, reply->tag_list, &synced)) { printf("%u SP %d %i.%i\n", seq, reply->minor, synced / 10, synced % 10); } else fprintf(stderr,"Missing tag !?\n"); break; case P_dump_ee: printf("%u DE %d\n", seq, reply->minor); print_dump_ee(reply); break; default: printf("%u ?? 
%d \n",seq, reply->minor, reply->packet_type); break; } fflush(stdout); return 1; } void print_failure_code(int ret_code) { if (ret_code > ERR_CODE_BASE) fprintf(stderr,"%s: Failure: (%d) %s\n", devname, ret_code, error_to_string(ret_code)); else fprintf(stderr,"%s: Failure: (ret_code=%d)\n", devname, ret_code); } static int w_connected_state(unsigned int seq __attribute((unused)), int wait_after_sb, struct drbd_nl_cfg_reply *reply) { union drbd_state state; if (reply->ret_code != NO_ERROR) { print_failure_code(reply->ret_code); return 0; } if(reply->packet_type == P_get_state) { if(consume_tag_int(T_state_i,reply->tag_list,(int*)&state.i)) { if(state.conn >= C_CONNECTED) return 0; if(!wait_after_sb && state.conn < C_UNCONNECTED) return 0; } else fprintf(stderr,"Missing tag !?\n"); } return 1; } static int w_synced_state(unsigned int seq __attribute((unused)), int wait_after_sb, struct drbd_nl_cfg_reply *reply) { union drbd_state state; if (reply->ret_code != NO_ERROR) { print_failure_code(reply->ret_code); return 0; } if(reply->packet_type == P_get_state) { if(consume_tag_int(T_state_i,reply->tag_list,(int*)&state.i)) { if(state.conn == C_CONNECTED) return 0; if(!wait_after_sb && state.conn < C_UNCONNECTED) return 0; } else fprintf(stderr,"Missing tag !?\n"); } return 1; } static int events_cmd(struct drbd_cmd *cm, unsigned minor, int argc ,char **argv) { void *buffer; struct cn_msg *cn_reply; struct drbd_nl_cfg_reply *reply; struct drbd_tag_list *tl; struct option *lo; unsigned int b_seq=0, r_seq=0; int sk_nl,c,cont=1,rr = rr,i,last; int unfiltered=0, all_devices=0, timeout_ms=0; int wfc_timeout=DRBD_WFC_TIMEOUT_DEF; int degr_wfc_timeout=DRBD_DEGR_WFC_TIMEOUT_DEF; int outdated_wfc_timeout=DRBD_OUTDATED_WFC_TIMEOUT_DEF; struct timeval before,after; int wasb=0; lo = cm->ep.options; if (!lo) { static struct option none[] = { { } }; lo = none; } for(;;) { c = getopt_long(argc, argv, make_optstring(lo), lo, 0); if (c == -1) break; switch(c) { default: case '?': return 
20; case 'u': unfiltered=1; break; case 'a': all_devices=1; break; case 't': wfc_timeout=m_strtoll(optarg,1); if(DRBD_WFC_TIMEOUT_MIN > wfc_timeout || wfc_timeout > DRBD_WFC_TIMEOUT_MAX) { fprintf(stderr, "wfc_timeout => %d" " out of range [%d..%d]\n", wfc_timeout, DRBD_WFC_TIMEOUT_MIN, DRBD_WFC_TIMEOUT_MAX); return 20; } break; case 'd': degr_wfc_timeout=m_strtoll(optarg,1); if(DRBD_DEGR_WFC_TIMEOUT_MIN > degr_wfc_timeout || degr_wfc_timeout > DRBD_DEGR_WFC_TIMEOUT_MAX) { fprintf(stderr, "degr_wfc_timeout => %d" " out of range [%d..%d]\n", degr_wfc_timeout, DRBD_DEGR_WFC_TIMEOUT_MIN, DRBD_DEGR_WFC_TIMEOUT_MAX); return 20; } break; case 'o': outdated_wfc_timeout=m_strtoll(optarg,1); /* range-check the value that was just parsed, not degr_wfc_timeout */ if(DRBD_OUTDATED_WFC_TIMEOUT_MIN > outdated_wfc_timeout || outdated_wfc_timeout > DRBD_OUTDATED_WFC_TIMEOUT_MAX) { fprintf(stderr, "outdated_wfc_timeout => %d" " out of range [%d..%d]\n", outdated_wfc_timeout, DRBD_OUTDATED_WFC_TIMEOUT_MIN, DRBD_OUTDATED_WFC_TIMEOUT_MAX); return 20; } break; case 'w': wasb=1; break; } } if (optind < argc) { warn_print_excess_args(argc, argv, optind); return 20; } dump_argv(argc, argv, optind, 0); tl = create_tag_list(2); add_tag(tl,TT_END,NULL,0); // close the tag list sk_nl = open_cn(); if(sk_nl < 0) return 20; /* allocate 64k to be on the safe side. */ #define NL_BUFFER_SIZE (64 << 10) buffer = malloc(NL_BUFFER_SIZE); if (!buffer) { fprintf(stderr, "could not allocate buffer of %u bytes\n", NL_BUFFER_SIZE); exit(20); } /* drbdsetup events should not ask for timeout "type", * this is only useful with wait-sync and wait-connected callbacks. */ if (cm->ep.proc_event != print_broadcast_events) { // Find out which timeout value to use. 
tl->drbd_p_header->packet_type = P_get_timeout_flag; tl->drbd_p_header->drbd_minor = minor; tl->drbd_p_header->flags = 0; if (0 >= call_drbd(sk_nl,tl, buffer, NL_BUFFER_SIZE, NL_TIME)) exit(20); cn_reply = (struct cn_msg *)NLMSG_DATA(buffer); reply = (struct drbd_nl_cfg_reply *)cn_reply->data; if (reply->ret_code != NO_ERROR) return print_config_error(reply->ret_code); consume_tag_bit(T_use_degraded,reply->tag_list,&rr); if (rr != UT_DEFAULT) { if (0 < wfc_timeout && (wfc_timeout < degr_wfc_timeout || degr_wfc_timeout == 0)) { degr_wfc_timeout = wfc_timeout; fprintf(stderr, "degr-wfc-timeout has to be shorter than wfc-timeout\n" "degr-wfc-timeout implicitly set to wfc-timeout (%ds)\n", degr_wfc_timeout); } if (0 < degr_wfc_timeout && (degr_wfc_timeout < outdated_wfc_timeout || outdated_wfc_timeout == 0)) { /* clamp to degr_wfc_timeout, matching the message printed below */ outdated_wfc_timeout = degr_wfc_timeout; fprintf(stderr, "outdated-wfc-timeout has to be shorter than degr-wfc-timeout\n" "outdated-wfc-timeout implicitly set to degr-wfc-timeout (%ds)\n", degr_wfc_timeout); } } switch (rr) { case UT_DEFAULT: timeout_ms = wfc_timeout; break; case UT_DEGRADED: timeout_ms = degr_wfc_timeout; break; case UT_PEER_OUTDATED: timeout_ms = outdated_wfc_timeout; break; } } timeout_ms = timeout_ms * 1000 - 1; /* 0 -> -1 "infinite", 1000 -> 999, nobody cares... */ // ask for the current state before waiting for state updates... if (all_devices) { i = 0; last = 255; } else { i = last = minor; } while (i <= last) { tl->drbd_p_header->packet_type = P_get_state; tl->drbd_p_header->drbd_minor = i; tl->drbd_p_header->flags = 0; send_cn(sk_nl,tl->nl_header,(char*)tl->tag_list_cpos-(char*)tl->nl_header); i++; } dt_unlock_drbd(lock_fd); lock_fd=-1; do { gettimeofday(&before,NULL); rr = receive_cn(sk_nl, buffer, NL_BUFFER_SIZE, timeout_ms); gettimeofday(&after,NULL); if(rr == -2) break; // timeout expired. 
if(timeout_ms > 0 ) { timeout_ms -= ( (after.tv_sec - before.tv_sec) * 1000 + (after.tv_usec - before.tv_usec) / 1000 ); } cn_reply = (struct cn_msg *)NLMSG_DATA(buffer); reply = (struct drbd_nl_cfg_reply *)cn_reply->data; // dump_tag_list(reply->tag_list); /* There are two value spaces for sequence numbers. The first is the one created by this drbdsetup instance, the kernel's reply packets simply echo those sequence numbers. The second is created by the kernel's broadcast packets. */ if (!unfiltered) { if (cn_reply->ack == 0) { // broadcasts /* Careful, potential wrap around! * Will skip a lot of packets if you * unload/reload the module in between, * but keep this drbdsetup events running. * So don't do that. */ if ((int)(cn_reply->seq - b_seq) <= 0) continue; b_seq = cn_reply->seq; } else if ((all_devices || minor == reply->minor) && cn_reply->ack == (uint32_t)getpid() + 1) { // replies to drbdsetup packets and for this device. if ((int)(cn_reply->seq - r_seq) <= 0) continue; r_seq = cn_reply->seq; } else { /* or reply to configuration request of other drbdsetup */ continue; } } if( all_devices || minor == reply->minor ) { cont=cm->ep.proc_event(cn_reply->seq, wasb, reply); } } while(cont); free(buffer); close_cn(sk_nl); /* return code becomes exit code. * timeout? => exit 5 * else => exit 0 */ return (rr == -2) ? 5 : 0; } static int numeric_opt_usage(struct drbd_option *option, char* str, int strlen) { return snprintf(str,strlen," [{--%s|-%c} %lld ... 
%lld]", option->name, option->short_name, option->numeric_param.min, option->numeric_param.max); } static int handler_opt_usage(struct drbd_option *option, char* str, int strlen) { const char** handlers; int i, chars=0,first=1; chars += snprintf(str,strlen," [{--%s|-%c} {", option->name, option->short_name); handlers = option->handler_param.handler_names; for(i=0;i<option->handler_param.number_of_handlers;i++) { if(handlers[i]) { if(!first) chars += snprintf(str+chars,strlen,"|"); first=0; chars += snprintf(str+chars,strlen, "%s",handlers[i]); } } chars += snprintf(str+chars,strlen,"}]"); return chars; } static int bit_opt_usage(struct drbd_option *option, char* str, int strlen) { return snprintf(str,strlen," [{--%s|-%c}]", option->name, option->short_name); } static int string_opt_usage(struct drbd_option *option, char* str, int strlen) { return snprintf(str,strlen," [{--%s|-%c} ]", option->name, option->short_name); } static void numeric_opt_xml(struct drbd_option *option) { printf("\t\n"); } static void handler_opt_xml(struct drbd_option *option) { const char** handlers; int i; printf("\t\n"); } static void bit_opt_xml(struct drbd_option *option) { printf("\t\n"); } static void string_opt_xml(struct drbd_option *option) { printf("\t\n"); } static void config_usage(struct drbd_cmd *cm, enum usage_type ut) { struct drbd_argument *args; struct drbd_option *options; static char line[300]; int maxcol,col,prevcol,startcol,toolong; char *colstr; if(ut == XML) { printf("\n",cm->cmd); if( (args = cm->cp.args) ) { while (args->name) { printf("\t%s\n", args->name); args++; } } options = cm->cp.options; while (options && options->name) { options->xml_function(options); options++; } printf("\n"); return; } prevcol=col=0; maxcol=100; if((colstr=getenv("COLUMNS"))) maxcol=atoi(colstr)-1; col += snprintf(line+col, maxcol-col, " %s", cm->cmd); if( (args = cm->cp.args) ) { if(ut == BRIEF) { col += snprintf(line+col, maxcol-col, " [args...]"); } else { while (args->name) { col += 
snprintf(line+col, maxcol-col, " %s", args->name); args++; } } } if (col > maxcol) { printf("%s\n",line); col=0; } startcol=prevcol=col; options = cm->cp.options; if(ut == BRIEF) { if(options) col += snprintf(line+col, maxcol-col, " [opts...]"); printf("%-40s",line); return; } while (options && options->name) { col += options->usage_function(options, line+col, maxcol-col); if (col >= maxcol) { toolong = (prevcol == startcol); if( !toolong ) line[prevcol]=0; printf("%s\n",line); startcol=prevcol=col = sprintf(line," "); if( toolong) options++; } else { prevcol=col; options++; } } line[col]=0; printf("%s\n",line); } static void get_usage(struct drbd_cmd *cm, enum usage_type ut) { if(ut == BRIEF) { printf(" %-39s", cm->cmd); } else { printf(" %s\n", cm->cmd); } } static void events_usage(struct drbd_cmd *cm, enum usage_type ut) { struct option *lo; char line[41]; if(ut == BRIEF) { sprintf(line,"%s [opts...]", cm->cmd); printf(" %-39s",line); } else { printf(" %s", cm->cmd); lo = cm->ep.options; while(lo && lo->name) { printf(" [{--%s|-%c}]",lo->name,lo->val); lo++; } printf("\n"); } } static void print_command_usage(int i, const char *addinfo, enum usage_type ut) { if(ut != XML) printf("USAGE:\n"); commands[i].usage(commands+i,ut); if (addinfo) { printf("%s\n",addinfo); exit(20); } } static void print_usage(const char* addinfo) { size_t i; printf("\nUSAGE: %s device command arguments options\n\n" "Device is usually /dev/drbdX or /dev/drbd/X.\n" "General options: --create-device, --set-defaults\n" "\nCommands are:\n",cmdname); for (i = 0; i < ARRAY_SIZE(commands); i++) { commands[i].usage(commands+i,BRIEF); if(i%2==1) printf("\n"); } printf("\n\n" "To get more details about a command issue " "'drbdsetup help cmd'.\n" "\n"); /* printf("\n\nVersion: "REL_VERSION" (api:%d)\n%s\n", API_VERSION, drbd_buildtag()); */ if (addinfo) printf("\n%s\n",addinfo); exit(20); } static int open_cn() { int sk_nl; int err; struct sockaddr_nl my_nla; sk_nl = socket(AF_NETLINK, SOCK_DGRAM, 
NETLINK_CONNECTOR); if (sk_nl == -1) { perror("socket() failed"); return -1; } my_nla.nl_family = AF_NETLINK; my_nla.nl_groups = -1; //cn_idx my_nla.nl_pid = getpid(); err = bind(sk_nl, (struct sockaddr *)&my_nla, sizeof(my_nla)); if (err == -1) { err = errno; perror("bind() failed"); switch(err) { case ENOENT: fprintf(stderr,"Connector module not loaded? Try 'modprobe cn'.\n"); break; case EPERM: fprintf(stderr,"Missing privileges? You should run this as root.\n"); break; } return -1; } return sk_nl; } static void prepare_nl_header(struct nlmsghdr* nl_hdr, int size) { static uint32_t cn_seq = 1; struct cn_msg *cn_hdr; cn_hdr = (struct cn_msg *)NLMSG_DATA(nl_hdr); /* fill the netlink header */ nl_hdr->nlmsg_len = NLMSG_LENGTH(size - sizeof(struct nlmsghdr)); nl_hdr->nlmsg_type = NLMSG_DONE; nl_hdr->nlmsg_flags = 0; nl_hdr->nlmsg_seq = cn_seq; nl_hdr->nlmsg_pid = getpid(); /* fill the connector header */ cn_hdr->id.val = CN_VAL_DRBD; cn_hdr->id.idx = cn_idx; cn_hdr->seq = cn_seq++; cn_hdr->ack = getpid(); cn_hdr->len = size - sizeof(struct nlmsghdr) - sizeof(struct cn_msg); } static int send_cn(int sk_nl, struct nlmsghdr* nl_hdr, int size) { int rr; prepare_nl_header(nl_hdr,size); rr = send(sk_nl,nl_hdr,nl_hdr->nlmsg_len,0); if( rr != (ssize_t)nl_hdr->nlmsg_len) { perror("send() failed"); return -1; } return rr; } static int receive_cn(int sk_nl, struct nlmsghdr* nl_hdr, int size, int timeout_ms) { struct pollfd pfd; int rr; pfd.fd = sk_nl; pfd.events = POLLIN; rr = poll(&pfd,1,timeout_ms); if(rr == 0) return -2; // timeout expired. 
rr = recv(sk_nl,nl_hdr,size,0); if( rr < 0 ) { perror("recv() failed"); return -1; } return rr; } int receive_reply_cn(int sk_nl, struct drbd_tag_list *tl, struct nlmsghdr* nl_hdr, int size, int timeout_ms) { struct cn_msg *request_cn_hdr; struct cn_msg *reply_cn_hdr; int rr; request_cn_hdr = (struct cn_msg *)NLMSG_DATA(tl->nl_header); reply_cn_hdr = (struct cn_msg *)NLMSG_DATA(nl_hdr); while(1) { rr = receive_cn(sk_nl,nl_hdr,size,timeout_ms); if( rr < 0 ) return rr; if(reply_cn_hdr->seq == request_cn_hdr->seq && reply_cn_hdr->ack == request_cn_hdr->ack+1 ) return rr; /* printf("INFO: got other message \n" "got seq: %d ; ack %d \n" "exp seq: %d ; ack %d \n", reply_cn_hdr->seq,reply_cn_hdr->ack, request_cn_hdr->seq,request_cn_hdr->ack); */ } return rr; } static int call_drbd(int sk_nl, struct drbd_tag_list *tl, struct nlmsghdr* nl_hdr, int size, int timeout_ms) { int rr; prepare_nl_header(tl->nl_header, (char*)tl->tag_list_cpos - (char*)tl->nl_header); rr = send(sk_nl,tl->nl_header,tl->nl_header->nlmsg_len,0); if( rr != (ssize_t)tl->nl_header->nlmsg_len) { perror("send() failed"); return -1; } rr = receive_reply_cn(sk_nl,tl,nl_hdr,size,timeout_ms); if( rr == -2) { fprintf(stderr,"No response from the DRBD driver!" 
" Is the module loaded?\n"); } return rr; } static void close_cn(int sk_nl) { close(sk_nl); } static int is_drbd_driver_missing(void) { struct stat sb; FILE *cn_idx_file; int err; cn_idx = CN_IDX_DRBD; cn_idx_file = fopen("/sys/module/drbd/parameters/cn_idx", "r"); if (cn_idx_file) { unsigned int idx; /* gcc is picky */ if (fscanf(cn_idx_file, "%u", &idx)) cn_idx = idx; fclose(cn_idx_file); } err = stat("/proc/drbd", &sb); if (!err) return 0; /* stat() returns -1 on failure; the reason is in errno */ if (errno == ENOENT) fprintf(stderr, "DRBD driver appears to be missing\n"); else fprintf(stderr, "Could not stat(\"/proc/drbd\"): %m\n"); return 1; } int main(int argc, char** argv) { unsigned minor; struct drbd_cmd *cmd; int rv=0; if (chdir("/")) { /* highly unlikely, but gcc is picky */ perror("cannot chdir /"); return -111; } cmdname = strrchr(argv[0],'/'); if (cmdname) argv[0] = ++cmdname; else cmdname = argv[0]; /* == '-' catches -h, --help, and similar */ if (argc > 1 && (!strcmp(argv[1],"help") || argv[1][0] == '-')) { if(argc >= 3) { cmd=find_cmd_by_name(argv[2]); if(cmd) print_command_usage(cmd-commands,NULL,FULL); else print_usage("unknown command"); exit(0); } } /* * The legacy drbdsetup takes the object to operate on as its first argument, * followed by the command. For forward compatibility, check if we got the * command name first. */ if (argc >= 3 && !find_cmd_by_name(argv[2]) && find_cmd_by_name(argv[1])) { char *swap = argv[1]; argv[1] = argv[2]; argv[2] = swap; } /* it is enough to set it, value is ignored */ if (getenv("DRBD_DEBUG_DUMP_ARGV")) debug_dump_argv = 1; resname = getenv("DRBD_RESOURCE"); if (argc > 1 && (!strcmp(argv[1],"xml"))) { if(argc >= 3) { cmd=find_cmd_by_name(argv[2]); if(cmd) print_command_usage(cmd-commands,NULL,XML); else print_usage("unknown command"); exit(0); } } if (argc < 3) print_usage(argc==1 ? 
0 : " Insufficient arguments"); cmd=find_cmd_by_name(argv[2]); if (is_drbd_driver_missing()) { if (!strcmp(argv[2], "down") || !strcmp(argv[2], "secondary") || !strcmp(argv[2], "disconnect") || !strcmp(argv[2], "detach")) return 0; /* "down" succeeds even if drbd is missing */ fprintf(stderr, "do you need to load the module?\n" "try: modprobe drbd\n"); return 20; } if(cmd) { minor = dt_minor_of_dev(argv[1]); /* minor is unsigned; cast so the -1 error return is actually detected */ if ((int)minor < 0) { fprintf(stderr, "Cannot determine minor device number of " "drbd device '%s'\n", argv[1]); exit(20); } lock_fd = dt_lock_drbd(minor); /* maybe rather canonicalize, using asprintf? */ devname = argv[1]; // by passing argc-2, argv+2 the function has the command name // in argv[0], e.g. "syncer" rv = cmd->function(cmd,minor,argc-2,argv+2); dt_unlock_drbd(lock_fd); } else { print_usage("invalid command"); } return rv; } drbd-8.4.4/user/legacy/drbdtool_common.c0000664000000000000000000004456511605310253016723 0ustar rootroot#define _GNU_SOURCE #define _XOPEN_SOURCE 600 #define _FILE_OFFSET_BITS 64 #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* for BLKGETSIZE64 */ #include #include "drbdtool_common.h" #include "config.h" int force = 0; int confirmed(const char *text) { const char yes[] = "yes"; const ssize_t N = sizeof(yes); char *answer = NULL; size_t n = 0; int ok; printf("\n%s\n", text); if (force) { printf("*** confirmation forced via --force option ***\n"); ok = 1; } else { printf("[need to type '%s' to confirm] ", yes); ok = getline(&answer,&n,stdin) == N && strncmp(answer,yes,N-1) == 0; if (answer) free(answer); printf("\n"); } return ok; } /* In-place unescape double quotes and backslash escape sequences from a * double quoted string. Note: backslash is only useful to quote itself, or * double quote, no special treatment to any c-style escape sequences. 
*/ void unescape(char *txt) { char *ue, *e; e = ue = txt; for (;;) { if (*ue == '"') { ue++; continue; } if (*ue == '\\') ue++; if (!*ue) break; *e++ = *ue++; } *e = '\0'; } /* input size is expected to be in KB */ char *ppsize(char *buf, unsigned long long size) { /* Needs 9 bytes at max including trailing NUL: * -1ULL ==> "16384 EB" */ static char units[] = { 'K', 'M', 'G', 'T', 'P', 'E' }; int base = 0; while (size >= 10000 && base < sizeof(units)-1) { /* shift + round */ size = (size >> 10) + !!(size & (1<<9)); base++; } sprintf(buf, "%u %cB", (unsigned)size, units[base]); return buf; } const char *make_optstring(struct option *options) { static char buffer[200]; char seen[256]; struct option *opt; char *c; memset(seen, 0, sizeof(seen)); opt = options; c = buffer; while (opt->name) { if (0 < opt->val && opt->val < 256) { if (seen[opt->val]++) { fprintf(stderr, "internal error: --%s has duplicate opt->val '%c'\n", opt->name, opt->val); abort(); } *c++ = opt->val; if (opt->has_arg != no_argument) { *c++ = ':'; if (opt->has_arg == optional_argument) *c++ = ':'; } } opt++; } *c = 0; return buffer; } int new_strtoll(const char *s, const char def_unit, unsigned long long *rv) { char unit = 0; char dummy = 0; int shift, c; switch (def_unit) { default: return MSE_DEFAULT_UNIT; case 0: case 1: case '1': shift = 0; break; case 'K': case 'k': shift = -10; break; case 's': shift = -9; // sectors break; /* case 'M': case 'm': case 'G': case 'g': */ } if (!s || !*s) return MSE_MISSING_NUMBER; c = sscanf(s, "%llu%c%c", rv, &unit, &dummy); if (c != 1 && c != 2) return MSE_INVALID_NUMBER; switch (unit) { case 0: return MSE_OK; case 'K': case 'k': shift += 10; break; case 'M': case 'm': shift += 20; break; case 'G': case 'g': shift += 30; break; case 's': shift += 9; break; default: return MSE_INVALID_UNIT; } /* if shift is negative (e.g. default unit 'K', actual unit 's'), * convert to positive, and shift right, rounding up. 
*/ if (shift < 0) { shift = -shift; *rv = (*rv + (1ULL << shift) - 1) >> shift; return MSE_OK; } /* if shift is positive, first check for overflow */ if (*rv > (~0ULL >> shift)) return MSE_OUT_OF_RANGE; /* then convert */ *rv = *rv << shift; return MSE_OK; } unsigned long long m_strtoll(const char *s, const char def_unit) { unsigned long long r; switch(new_strtoll(s, def_unit, &r)) { case MSE_OK: return r; case MSE_DEFAULT_UNIT: fprintf(stderr, "unexpected default unit: %d\n",def_unit); exit(100); case MSE_MISSING_NUMBER: fprintf(stderr, "missing number argument\n"); exit(100); case MSE_INVALID_NUMBER: fprintf(stderr, "%s is not a valid number\n", s); exit(20); case MSE_INVALID_UNIT: fprintf(stderr, "%s has an invalid unit\n", s); exit(20); case MSE_OUT_OF_RANGE: fprintf(stderr, "%s: out of range\n", s); exit(20); default: fprintf(stderr, "m_strtoll() is confused\n"); exit(20); } } void alarm_handler(int __attribute((unused)) signo) { /* nothing. just interrupt F_SETLKW */ } /* it is implicitly unlocked when the process dies. * but if you want to explicitly unlock it, just close it. */ int unlock_fd(int fd) { return close(fd); } int get_fd_lockfile_timeout(const char *path, int seconds) { int fd, err; struct sigaction sa,so; struct flock fl = { .l_type = F_WRLCK, .l_whence = 0, .l_start = 0, .l_len = 0 }; if ((fd = open(path, O_RDWR | O_CREAT, 0600)) < 0) { fprintf(stderr,"open(%s): %m\n",path); return -1; } if (seconds) { sa.sa_handler=alarm_handler; sigemptyset(&sa.sa_mask); sa.sa_flags=0; sigaction(SIGALRM,&sa,&so); alarm(seconds); err = fcntl(fd,F_SETLKW,&fl); if (err) err = errno; alarm(0); sigaction(SIGALRM,&so,NULL); } else { err = fcntl(fd,F_SETLK,&fl); if (err) err = errno; } if (!err) return fd; if (err != EINTR && err != EAGAIN) { close(fd); errno = err; fprintf(stderr,"fcntl(%s,...): %m\n", path); return -1; } /* do we want to know this? 
*/ if (!fcntl(fd,F_GETLK,&fl)) { fprintf(stderr,"lock on %s currently held by pid:%u\n", path, fl.l_pid); } close(fd); return -1; } int dt_minor_of_dev(const char *device) { struct stat sb; long m; int digits_only = only_digits(device); const char *c = device; /* On udev/devfs based systems the device nodes do not * exist before the drbd device is created. * * If the device name starts with /dev/drbd followed by * only digits, or if only digits are given, * those digits are the minor number. * * Otherwise, we cannot reliably determine the minor number! * * We allow "arbitrary" device names in drbd.conf, * and those may well contain digits. * Interpreting any digits as minor number is dangerous! */ if (!digits_only) { if (!strncmp("/dev/drbd", device, 9) && only_digits(device + 9)) c = device + 9; /* if the device node exists, * and is a block device with the correct major, * do not enforce further naming conventions. * people without udev, and not using drbdadm * may do whatever they like. */ else if (!stat(device,&sb) && S_ISBLK(sb.st_mode) && major(sb.st_rdev) == LANANA_DRBD_MAJOR) return minor(sb.st_rdev); else return -1; } /* ^[0-9]+$ or ^/dev/drbd[0-9]+$ */ errno = 0; m = strtol(c, NULL, 10); if (!errno) return m; return -1; } int only_digits(const char *s) { const char *c; for (c = s; isdigit(*c); c++) ; return c != s && *c == 0; } int dt_lock_drbd(int minor) { int sz, lfd; char *lfname; /* THINK. * maybe we should also place a fcntl lock on the * _physical_device_ we open later... * * This lock is to prevent a drbd minor from being configured * by drbdsetup while drbdmeta is about to mess with its meta data. * * If you happen to mess with the meta data of one device, * pretending it belongs to another, you'll screw up completely. * * We should store something in the meta data to detect such abuses. */ /* NOTE that /var/lock/drbd-*-* may not be "secure", * maybe we should rather use /var/lock/drbd/drbd-*-*, * and make sure that /var/lock/drbd is drwx.-..-. 
root:root ... */ sz = asprintf(&lfname, DRBD_LOCK_DIR "/drbd-%d-%d", LANANA_DRBD_MAJOR, minor); if (sz < 0) { perror(""); exit(20); } lfd = get_fd_lockfile_timeout(lfname, 1); free (lfname); if (lfd < 0) exit(20); return lfd; } /* ignore errors */ void dt_unlock_drbd(int lock_fd) { if (lock_fd >= 0) unlock_fd(lock_fd); } void dt_print_gc(const uint32_t* gen_cnt) { printf("%d:%d:%d:%d:%d:%d:%d:%d\n", gen_cnt[Flags] & MDF_CONSISTENT ? 1 : 0, gen_cnt[HumanCnt], gen_cnt[TimeoutCnt], gen_cnt[ConnectedCnt], gen_cnt[ArbitraryCnt], gen_cnt[Flags] & MDF_PRIMARY_IND ? 1 : 0, gen_cnt[Flags] & MDF_CONNECTED_IND ? 1 : 0, gen_cnt[Flags] & MDF_FULL_SYNC ? 1 : 0); } void dt_pretty_print_gc(const uint32_t* gen_cnt) { printf("\n" " WantFullSync |\n" " ConnectedInd | |\n" " lastState | | |\n" " ArbitraryCnt | | | |\n" " ConnectedCnt | | | | |\n" " TimeoutCnt | | | | | |\n" " HumanCnt | | | | | | |\n" "Consistent | | | | | | | |\n" " --------+-----+-----+-----+-----+-----+-----+-----+\n" " %3s | %3d | %3d | %3d | %3d | %3s | %3s | %3s \n" "\n", gen_cnt[Flags] & MDF_CONSISTENT ? "1/c" : "0/i", gen_cnt[HumanCnt], gen_cnt[TimeoutCnt], gen_cnt[ConnectedCnt], gen_cnt[ArbitraryCnt], gen_cnt[Flags] & MDF_PRIMARY_IND ? "1/p" : "0/s", gen_cnt[Flags] & MDF_CONNECTED_IND ? "1/c" : "0/n", gen_cnt[Flags] & MDF_FULL_SYNC ? "1/y" : "0/n"); } void dt_print_uuids(const uint64_t* uuid, unsigned int flags) { int i; printf(X64(016)":"X64(016)":", uuid[UI_CURRENT], uuid[UI_BITMAP]); for ( i=UI_HISTORY_START ; i<=UI_HISTORY_END ; i++ ) { printf(X64(016)":", uuid[i]); } printf("%d:%d:%d:%d:%d:%d:%d\n", flags & MDF_CONSISTENT ? 1 : 0, flags & MDF_WAS_UP_TO_DATE ? 1 : 0, flags & MDF_PRIMARY_IND ? 1 : 0, flags & MDF_CONNECTED_IND ? 1 : 0, flags & MDF_FULL_SYNC ? 1 : 0, flags & MDF_PEER_OUT_DATED ? 1 : 0, flags & MDF_CRASHED_PRIMARY ? 
1 : 0); } void dt_pretty_print_uuids(const uint64_t* uuid, unsigned int flags) { printf( "\n" " +--< Current data generation UUID >-\n" " | +--< Bitmap's base data generation UUID >-\n" " | | +--< younger history UUID >-\n" " | | | +-< older history >-\n" " V V V V\n"); dt_print_uuids(uuid, flags); printf( " ^ ^ ^ ^ ^ ^ ^\n" " -< Data consistency flag >--+ | | | | | |\n" " -< Data was/is currently up-to-date >--+ | | | | |\n" " -< Node was/is currently primary >--+ | | | |\n" " -< Node was/is currently connected >--+ | | |\n" " -< Node was in the progress of setting all bits in the bitmap >--+ | |\n" " -< The peer's disk was out-dated or inconsistent >--+ |\n" " -< This node was a crashed primary, and has not seen its peer since >--+\n" "\n"); printf("flags:%s %s, %s, %s%s%s\n", (flags & MDF_CRASHED_PRIMARY) ? " crashed" : "", (flags & MDF_PRIMARY_IND) ? "Primary" : "Secondary", (flags & MDF_CONNECTED_IND) ? "Connected" : "StandAlone", (flags & MDF_CONSISTENT) ? ((flags & MDF_WAS_UP_TO_DATE) ? "UpToDate" : "Outdated") : "Inconsistent", (flags & MDF_FULL_SYNC) ? ", need full sync" : "", (flags & MDF_PEER_OUT_DATED) ? ", peer Outdated" : ""); } /* s: token buffer * size: size of s, _including_ the terminating NUL * stream: to read from. * s is guaranteed to be NUL terminated * if a token (including the NUL) needs more size bytes, * s will contain only a truncated token, and the next call will * return the next size-1 non-white-space bytes of stream. */ int fget_token(char *s, int size, FILE* stream) { int c; char* sp = s; *sp = 0; /* terminate even if nothing is found */ --size; /* account for the terminating NUL */ do { // eat white spaces in front. 
c = getc(stream); if( c == EOF) return EOF; } while (!isgraph(c)); do { // read the first word into s *sp++ = c; c = getc(stream); if ( c == EOF) break; } while (isgraph(c) && --size); *sp=0; return 1; } int sget_token(char *s, int size, const char** text) { int c; char* sp = s; *sp = 0; /* terminate even if nothing is found */ --size; /* account for the terminating NUL */ do { // eat white spaces in front. c = *(*text)++; if( c == 0) return EOF; } while (!isgraph(c)); do { // read the first word into s *sp++ = c; c = *(*text)++; if ( c == 0) break; } while (isgraph(c) && --size); *sp=0; return 1; } uint64_t bdev_size(int fd) { uint64_t size64; /* size in byte. */ long size; /* size in sectors. */ int err; err = ioctl(fd, BLKGETSIZE64, &size64); if (err) { if (errno == EINVAL) { printf("INFO: falling back to BLKGETSIZE\n"); err = ioctl(fd, BLKGETSIZE, &size); if (err) { perror("ioctl(,BLKGETSIZE,) failed"); exit(20); } size64 = (uint64_t)512 *size; } else { perror("ioctl(,BLKGETSIZE64,) failed"); exit(20); } } return size64; } char *lk_bdev_path(unsigned minor) { char *path; m_asprintf(&path, "%s/drbd-minor-%d.lkbd", DRBD_LIB_DIR, minor); return path; } /* If the lower level device is resized, * and DRBD did not move its "internal" meta data in time, * the next time we try to attach, we won't find our meta data. * * Some helpers for storing and retrieving "last known" * information, to be able to find it regardless, * without scanning the full device for magic numbers. */ /* these return 0 on sucess, error code if something goes wrong. */ /* NOTE: file format for now: * one line, starting with size in byte, followed by tab, * followed by device name, followed by newline. 
*/ int lk_bdev_save(const unsigned minor, const struct bdev_info *bd) { FILE *fp; char *path = lk_bdev_path(minor); int ok = 0; fp = fopen(path, "w"); if (!fp) goto fail; ok = fprintf(fp, "%llu\t%s\n", (unsigned long long) bd->bd_size, bd->bd_name); if (ok <= 0) goto fail; if (bd->bd_uuid) fprintf(fp, "uuid:\t"X64(016)"\n", bd->bd_uuid); ok = 0 == fflush(fp); ok = ok && 0 == fsync(fileno(fp)); ok = ok && 0 == fclose(fp); if (!ok) fail: /* MAYBE: unlink. But maybe partial info is better than no info? */ fprintf(stderr, "lk_bdev_save(%s) failed: %m\n", path); free(path); return ok <= 0 ? -1 : 0; } /* we may want to remove all stored information */ int lk_bdev_delete(const unsigned minor) { char *path = lk_bdev_path(minor); int rc = unlink(path); if (rc && errno != ENOENT) fprintf(stderr, "lk_bdev_delete(%s) failed: %m\n", path); free(path); return rc; } /* load info from that file. * caller should free(bd->bd_name) once it is no longer needed. */ int lk_bdev_load(const unsigned minor, struct bdev_info *bd) { FILE *fp; char *path; char *bd_name; unsigned long long bd_size; unsigned long long bd_uuid; char nl[2]; int rc = -1; if (!bd) return -1; path = lk_bdev_path(minor); fp = fopen(path, "r"); if (!fp) { if (errno != ENOENT) fprintf(stderr, "lk_bdev_load(%s) failed: %m\n", path); goto out; } /* GNU format extension: %as: * malloc buffer space for the resulting char */ rc = fscanf(fp, "%llu %as%[\n]uuid: %llx%[\n]", &bd_size, &bd_name, nl, &bd_uuid, nl); /* rc == 5: successfully converted two lines. * == 4: newline not found, possibly truncated uuid * == 3: first line complete, uuid missing. * == 2: new line not found, possibly truncated pathname, * or early whitespace * == 1: found some number, but no more. * incomplete file? try anyways. */ bd->bd_uuid = (rc >= 4) ? bd_uuid : 0; bd->bd_name = (rc >= 2) ? bd_name : NULL; bd->bd_size = (rc >= 1) ? 
bd_size : 0;
	if (rc < 1) {
		fprintf(stderr, "lk_bdev_load(%s): parse error\n", path);
		rc = -1;
	} else
		rc = 0;

	fclose(fp);
out:
	free(path);
	return rc;
}

void get_random_bytes(void* buffer, int len)
{
	int fd;

	fd = open("/dev/urandom",O_RDONLY);
	if( fd == -1) {
		perror("Open of /dev/urandom failed");
		exit(20);
	}
	if(read(fd,buffer,len) != len) {
		fprintf(stderr,"Reading from /dev/urandom failed\n");
		exit(20);
	}
	close(fd);
}

const char* shell_escape(const char* s)
{
	/* ugly static buffer. so what. */
	static char buffer[1024];
	char *c = buffer;

	if (s == NULL)
		return s;

	while (*s) {
		/* each round may store an escape character plus the character
		 * itself, and the terminating NUL must still fit afterwards */
		if (buffer + sizeof(buffer) < c+3)
			break;

		switch(*s) {
		/* set of 'clean' characters */
		case '%': case '+': case '-': case '.': case '/':
		case '0' ... '9':
		case ':': case '=': case '@':
		case 'A' ... 'Z':
		case '_':
		case 'a' ... 'z':
			break;
		/* escape everything else */
		default:
			*c++ = '\\';
		}
		*c++ = *s++;
	}
	*c = '\0';
	return buffer;
}

int m_asprintf(char **strp, const char *fmt, ...)
{
	int r;
	va_list ap;

	va_start(ap, fmt);
	r = vasprintf(strp, fmt, ap);
	va_end(ap);

	if (r == -1) {
		fprintf(stderr, "vasprintf() failed. Out of memory?\n");
		exit(10);
	}

	return r;
}

/* print len bytes from buf in the format of well known "hd",
 * adjust displayed offset by file_offset */
void fprintf_hex(FILE *fp, off_t file_offset, const void *buf, unsigned len)
{
	const unsigned char *c = buf;
	unsigned o;
	int skipped = 0;

	for (o = 0; o + 16 < len; o += 16, c += 16) {
		if (o && !memcmp(c - 16, c, 16)) {
			skipped = 1;
			continue;
		}
		if (skipped) {
			skipped = 0;
			fprintf(fp, "*\n");
		}
		/* no error check here, don't know what to do about errors */
		fprintf(fp,
			/* offset */
			"%08llx"
			/* two times 8 byte as byte stream, on disk order */
			" %02x %02x %02x %02x %02x %02x %02x %02x"
			" %02x %02x %02x %02x %02x %02x %02x %02x"
			/* the same as printable char or '.'
*/ " |%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c|\n", (unsigned long long)o + file_offset, c[0], c[1], c[2], c[3], c[4], c[5], c[6], c[7], c[8], c[9], c[10], c[11], c[12], c[13], c[14], c[15], #define p_(x) (isprint(x) ? x : '.') #define p(a,b,c,d,e,f,g,h) \ p_(a), p_(b), p_(c), p_(d), p_(e), p_(f), p_(g), p_(h) p(c[0], c[1], c[2], c[3], c[4], c[5], c[6], c[7]), p(c[8], c[9], c[10], c[11], c[12], c[13], c[14], c[15]) ); } if (skipped) { skipped = 0; fprintf(fp, "*\n"); } if (o < len) { unsigned remaining = len - o; unsigned i; fprintf(fp, "%08llx ", (unsigned long long)o + file_offset); for (i = 0; i < remaining; i++) { if (i == 8) fprintf(fp, " "); fprintf(fp, " %02x", c[i]); } fprintf(fp, "%*s |", (16 - i)*3 + (i < 8), ""); for (i = 0; i < remaining; i++) fprintf(fp, "%c", p_(c[i])); #undef p #undef p_ fprintf(fp, "|\n"); } fprintf(fp, "%08llx\n", (unsigned long long)len + file_offset); } drbd-8.4.4/user/legacy/drbdtool_common.h0000664000000000000000000000753611605310253016725 0ustar rootroot#ifndef DRBDTOOL_COMMON_H #define DRBDTOOL_COMMON_H #include "drbd_endian.h" #include #include #include #include #define LANANA_DRBD_MAJOR 147 /* we should get this into linux/major.h */ #ifndef DRBD_MAJOR #define DRBD_MAJOR LANANA_DRBD_MAJOR #elif (DRBD_MAJOR != LANANA_DRBD_MAJOR) # error "FIXME unexpected DRBD_MAJOR" #endif #ifndef __packed #define __packed __attribute__((packed)) #endif #ifndef ARRAY_SIZE #define ARRAY_SIZE(A) (sizeof(A)/sizeof(A[0])) #endif #define COMM_TIMEOUT 120 /* MetaDataIndex for v06 / v07 style meta data blocks */ enum MetaDataIndex { Flags, /* Consistency flag,connected-ind,primary-ind */ HumanCnt, /* human-intervention-count */ TimeoutCnt, /* timout-count */ ConnectedCnt, /* connected-count */ ArbitraryCnt, /* arbitrary-count */ GEN_CNT_SIZE /* MUST BE LAST! (and Flags must stay first...) */ }; /* #define PERROR(fmt, args...) \ do { fprintf(stderr,fmt ": " , ##args); perror(0); } while (0) */ #define PERROR(fmt, args...) 
fprintf(stderr, fmt ": %m\n" , ##args); enum new_strtoll_errs { MSE_OK, MSE_DEFAULT_UNIT, MSE_MISSING_NUMBER, MSE_INVALID_NUMBER, MSE_INVALID_UNIT, MSE_OUT_OF_RANGE, }; struct option; extern int only_digits(const char *s); extern int dt_lock_drbd(int minor); extern void dt_unlock_drbd(int lock_fd); extern void dt_release_lockfile(int drbd_fd); extern int dt_minor_of_dev(const char *device); extern int new_strtoll(const char *s, const char def_unit, unsigned long long *rv); extern unsigned long long m_strtoll(const char* s,const char def_unit); extern const char* make_optstring(struct option *options); extern char* ppsize(char* buf, unsigned long long size); extern void dt_print_gc(const uint32_t* gen_cnt); extern void dt_pretty_print_gc(const uint32_t* gen_cnt); extern void dt_print_uuids(const uint64_t* uuid, unsigned int flags); extern void dt_pretty_print_uuids(const uint64_t* uuid, unsigned int flags); extern int fget_token(char *s, int size, FILE* stream); extern int sget_token(char *s, int size, const char** text); extern uint64_t bdev_size(int fd); extern void get_random_bytes(void* buffer, int len); extern int force; /* global option to force implicit confirmation */ extern int confirmed(const char *text); extern const char* shell_escape(const char* s); /* In-place unescape double quotes and backslash escape sequences from a * double quoted string. Note: backslash is only useful to quote itself, or * double quote, no special treatment to any c-style escape sequences. */ extern void unescape(char *txt); /* Since glibc 2.8~20080505-0ubuntu7 asprintf() is declared with the warn_unused_result attribute.... */ extern int m_asprintf(char **strp, const char *fmt, ...); extern void fprintf_hex(FILE *fp, off_t file_offset, const void *buf, unsigned len); /* If the lower level device is resized, * and DRBD did not move its "internal" meta data in time, * the next time we try to attach, we won't find our meta data. 
 *
 * Some helpers for storing and retrieving "last known"
 * information, to be able to find it regardless,
 * without scanning the full device for magic numbers.
 */

/* We may want to store more things later... if so, we can easily change to
 * some NULL terminated tag-value list format then.
 * For now: store the last known lower level block device size,
 * and its /dev/ */
struct bdev_info {
	uint64_t bd_size;
	uint64_t bd_uuid;
	char *bd_name;
};

/* these return 0 on success, error code if something goes wrong. */
/* create (update) the last-known-bdev-info file */
extern int lk_bdev_save(const unsigned minor, const struct bdev_info *bd);
/* we may want to remove all stored information */
extern int lk_bdev_delete(const unsigned minor);
/* load info from that file.
 * caller should free(bd->bd_name) once it is no longer needed. */
extern int lk_bdev_load(const unsigned minor, struct bdev_info *bd);

#endif
drbd-8.4.4/user/legacy/linux/drbd.h0000664000000000000000000002454712132747531015630 0ustar rootroot/*
  drbd.h
  Kernel module for 2.6.x Kernels

  This file is part of DRBD by Philipp Reisner and Lars Ellenberg.

  Copyright (C) 2001-2008, LINBIT Information Technologies GmbH.
  Copyright (C) 2001-2008, Philipp Reisner .
  Copyright (C) 2001-2008, Lars Ellenberg .

  drbd is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2, or (at your option)
  any later version.

  drbd is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with drbd; see the file COPYING. If not, write to
  the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
*/
#ifndef DRBD_H
#define DRBD_H
#include
#include

#ifdef __KERNEL__
#include
#include
#else
#include
#include
#include

/* Although the Linux source code makes a difference between
   generic endianness and the bitfields' endianness, there is no
   architecture as of Linux-2.6.24-rc4 where the bitfields' endianness
   does not match the generic endianness. */

#if __BYTE_ORDER == __LITTLE_ENDIAN
#define __LITTLE_ENDIAN_BITFIELD
#elif __BYTE_ORDER == __BIG_ENDIAN
#define __BIG_ENDIAN_BITFIELD
#else
# error "sorry, weird endianness on this box"
#endif

#endif

enum drbd_io_error_p {
	EP_PASS_ON, /* FIXME should this better be named "Ignore"? */
	EP_CALL_HELPER,
	EP_DETACH
};

enum drbd_fencing_p {
	FP_DONT_CARE,
	FP_RESOURCE,
	FP_STONITH
};

enum drbd_disconnect_p {
	DP_RECONNECT,
	DP_DROP_NET_CONF,
	DP_FREEZE_IO
};

enum drbd_after_sb_p {
	ASB_DISCONNECT,
	ASB_DISCARD_YOUNGER_PRI,
	ASB_DISCARD_OLDER_PRI,
	ASB_DISCARD_ZERO_CHG,
	ASB_DISCARD_LEAST_CHG,
	ASB_DISCARD_LOCAL,
	ASB_DISCARD_REMOTE,
	ASB_CONSENSUS,
	ASB_DISCARD_SECONDARY,
	ASB_CALL_HELPER,
	ASB_VIOLENTLY
};

enum drbd_on_no_data {
	OND_IO_ERROR,
	OND_SUSPEND_IO
};

enum drbd_on_congestion {
	OC_BLOCK,
	OC_PULL_AHEAD,
	OC_DISCONNECT,
};

/* KEEP the order, do not delete or insert. Only append.
*/ enum drbd_ret_code { ERR_CODE_BASE = 100, NO_ERROR = 101, ERR_LOCAL_ADDR = 102, ERR_PEER_ADDR = 103, ERR_OPEN_DISK = 104, ERR_OPEN_MD_DISK = 105, ERR_DISK_NOT_BDEV = 107, ERR_MD_NOT_BDEV = 108, ERR_DISK_TOO_SMALL = 111, ERR_MD_DISK_TOO_SMALL = 112, ERR_BDCLAIM_DISK = 114, ERR_BDCLAIM_MD_DISK = 115, ERR_MD_IDX_INVALID = 116, ERR_IO_MD_DISK = 118, ERR_MD_INVALID = 119, ERR_AUTH_ALG = 120, ERR_AUTH_ALG_ND = 121, ERR_NOMEM = 122, ERR_DISCARD_IMPOSSIBLE = 123, ERR_DISK_CONFIGURED = 124, ERR_NET_CONFIGURED = 125, ERR_MANDATORY_TAG = 126, ERR_MINOR_INVALID = 127, ERR_INTR = 129, /* EINTR */ ERR_RESIZE_RESYNC = 130, ERR_NO_PRIMARY = 131, ERR_SYNC_AFTER = 132, ERR_SYNC_AFTER_CYCLE = 133, ERR_PAUSE_IS_SET = 134, ERR_PAUSE_IS_CLEAR = 135, ERR_PACKET_NR = 137, ERR_NO_DISK = 138, ERR_NOT_PROTO_C = 139, ERR_NOMEM_BITMAP = 140, ERR_INTEGRITY_ALG = 141, /* DRBD 8.2 only */ ERR_INTEGRITY_ALG_ND = 142, /* DRBD 8.2 only */ ERR_CPU_MASK_PARSE = 143, /* DRBD 8.2 only */ ERR_CSUMS_ALG = 144, /* DRBD 8.2 only */ ERR_CSUMS_ALG_ND = 145, /* DRBD 8.2 only */ ERR_VERIFY_ALG = 146, /* DRBD 8.2 only */ ERR_VERIFY_ALG_ND = 147, /* DRBD 8.2 only */ ERR_CSUMS_RESYNC_RUNNING= 148, /* DRBD 8.2 only */ ERR_VERIFY_RUNNING = 149, /* DRBD 8.2 only */ ERR_DATA_NOT_CURRENT = 150, ERR_CONNECTED = 151, /* DRBD 8.3 only */ ERR_PERM = 152, ERR_NEED_APV_93 = 153, ERR_STONITH_AND_PROT_A = 154, ERR_CONG_NOT_PROTO_A = 155, ERR_PIC_AFTER_DEP = 156, ERR_PIC_PEER_DEP = 157, /* insert new ones above this line */ AFTER_LAST_ERR_CODE }; #define DRBD_PROT_A 1 #define DRBD_PROT_B 2 #define DRBD_PROT_C 3 enum drbd_role { R_UNKNOWN = 0, R_PRIMARY = 1, /* role */ R_SECONDARY = 2, /* role */ R_MASK = 3, }; /* The order of these constants is important. * The lower ones (=C_WF_REPORT_PARAMS ==> There is a socket */ enum drbd_conns { C_STANDALONE, C_DISCONNECTING, /* Temporal state on the way to StandAlone. 
*/
	C_UNCONNECTED, /* >= C_UNCONNECTED -> inc_net() succeeds */

	/* These temporal states are all used on the way
	 * from >= C_CONNECTED to Unconnected.
	 * The 'disconnect reason' states
	 * I do not allow to change between them. */
	C_TIMEOUT,
	C_BROKEN_PIPE,
	C_NETWORK_FAILURE,
	C_PROTOCOL_ERROR,
	C_TEAR_DOWN,

	C_WF_CONNECTION,
	C_WF_REPORT_PARAMS, /* we have a socket */
	C_CONNECTED, /* we have introduced each other */
	C_STARTING_SYNC_S, /* starting full sync by admin request. */
	C_STARTING_SYNC_T, /* starting full sync by admin request. */
	C_WF_BITMAP_S,
	C_WF_BITMAP_T,
	C_WF_SYNC_UUID,

	/* All SyncStates are tested with this comparison
	 * xx >= C_SYNC_SOURCE && xx <= C_PAUSED_SYNC_T */
	C_SYNC_SOURCE,
	C_SYNC_TARGET,
	C_VERIFY_S,
	C_VERIFY_T,
	C_PAUSED_SYNC_S,
	C_PAUSED_SYNC_T,

	C_AHEAD,
	C_BEHIND,

	C_MASK = 31
};

enum drbd_disk_state {
	D_DISKLESS,
	D_ATTACHING, /* In the process of reading the meta-data */
	D_FAILED, /* Becomes D_DISKLESS as soon as we told it to the peer */
	/* when >= D_FAILED it is legal to access mdev->bc */
	D_NEGOTIATING, /* Late attaching state, we need to talk to the peer */
	D_INCONSISTENT,
	D_OUTDATED,
	D_UNKNOWN, /* Only used for the peer, never for myself */
	D_CONSISTENT, /* Might be D_OUTDATED, might be D_UP_TO_DATE ... */
	D_UP_TO_DATE, /* Only this disk state allows applications' IO ! */
	D_MASK = 15
};

union drbd_state {
/* According to gcc's docs is the ...
 * The order of allocation of bit-fields within a unit (C90 6.5.2.1, C99 6.7.2.1).
 * Determined by ABI.
 * pointed out by Maxim Uvarov q
 * even though we transmit as "cpu_to_be32(state)",
 * the offsets of the bitfields still need to be swapped
 * on different endianness.
*/ struct { #if defined(__LITTLE_ENDIAN_BITFIELD) unsigned role:2 ; /* 3/4 primary/secondary/unknown */ unsigned peer:2 ; /* 3/4 primary/secondary/unknown */ unsigned conn:5 ; /* 17/32 cstates */ unsigned disk:4 ; /* 8/16 from D_DISKLESS to D_UP_TO_DATE */ unsigned pdsk:4 ; /* 8/16 from D_DISKLESS to D_UP_TO_DATE */ unsigned susp:1 ; /* 2/2 IO suspended no/yes (by user) */ unsigned aftr_isp:1 ; /* isp .. imposed sync pause */ unsigned peer_isp:1 ; unsigned user_isp:1 ; unsigned susp_nod:1 ; /* IO suspended because no data */ unsigned susp_fen:1 ; /* IO suspended because fence peer handler runs*/ unsigned _pad:9; /* 0 unused */ #elif defined(__BIG_ENDIAN_BITFIELD) unsigned _pad:9; unsigned susp_fen:1 ; unsigned susp_nod:1 ; unsigned user_isp:1 ; unsigned peer_isp:1 ; unsigned aftr_isp:1 ; /* isp .. imposed sync pause */ unsigned susp:1 ; /* 2/2 IO suspended no/yes */ unsigned pdsk:4 ; /* 8/16 from D_DISKLESS to D_UP_TO_DATE */ unsigned disk:4 ; /* 8/16 from D_DISKLESS to D_UP_TO_DATE */ unsigned conn:5 ; /* 17/32 cstates */ unsigned peer:2 ; /* 3/4 primary/secondary/unknown */ unsigned role:2 ; /* 3/4 primary/secondary/unknown */ #else # error "this endianess is not supported" #endif #ifndef DRBD_DEBUG_STATE_CHANGES # ifdef CONFIG_DYNAMIC_DEBUG # define DRBD_DEBUG_STATE_CHANGES 1 # else # define DRBD_DEBUG_STATE_CHANGES 0 # endif #endif #if DRBD_DEBUG_STATE_CHANGES unsigned int line; const char *func; unsigned long long seq; #endif }; unsigned int i; }; enum drbd_state_rv { SS_CW_NO_NEED = 4, SS_CW_SUCCESS = 3, SS_NOTHING_TO_DO = 2, SS_SUCCESS = 1, SS_UNKNOWN_ERROR = 0, /* Used to sleep longer in _drbd_request_state */ SS_TWO_PRIMARIES = -1, SS_NO_UP_TO_DATE_DISK = -2, SS_NO_LOCAL_DISK = -4, SS_NO_REMOTE_DISK = -5, SS_CONNECTED_OUTDATES = -6, SS_PRIMARY_NOP = -7, SS_RESYNC_RUNNING = -8, SS_ALREADY_STANDALONE = -9, SS_CW_FAILED_BY_PEER = -10, SS_IS_DISKLESS = -11, SS_DEVICE_IN_USE = -12, SS_NO_NET_CONFIG = -13, SS_NO_VERIFY_ALG = -14, /* drbd-8.2 only */ 
SS_NEED_CONNECTION = -15, /* drbd-8.2 only */ SS_LOWER_THAN_OUTDATED = -16, SS_NOT_SUPPORTED = -17, /* drbd-8.2 only */ SS_IN_TRANSIENT_STATE = -18, /* Retry after the next state change */ SS_CONCURRENT_ST_CHG = -19, /* Concurrent cluster side state change! */ SS_AFTER_LAST_ERROR = -20, /* Keep this at bottom */ }; /* from drbd_strings.c */ extern const char *drbd_conn_str(enum drbd_conns); extern const char *drbd_role_str(enum drbd_role); extern const char *drbd_disk_str(enum drbd_disk_state); extern const char *drbd_set_st_err_str(enum drbd_state_rv); #define SHARED_SECRET_MAX 64 #define MDF_CONSISTENT (1 << 0) #define MDF_PRIMARY_IND (1 << 1) #define MDF_CONNECTED_IND (1 << 2) #define MDF_FULL_SYNC (1 << 3) #define MDF_WAS_UP_TO_DATE (1 << 4) #define MDF_PEER_OUT_DATED (1 << 5) #define MDF_CRASHED_PRIMARY (1 << 6) enum drbd_uuid_index { UI_CURRENT, UI_BITMAP, UI_HISTORY_START, UI_HISTORY_END, UI_SIZE, /* nl-packet: number of dirty bits */ UI_FLAGS, /* nl-packet: flags */ UI_EXTENDED_SIZE /* Everything. 
*/ }; enum drbd_timeout_flag { UT_DEFAULT = 0, UT_DEGRADED = 1, UT_PEER_OUTDATED = 2, }; #define UUID_JUST_CREATED ((__u64)4) #define DRBD_MAGIC 0x83740267 #define BE_DRBD_MAGIC __constant_cpu_to_be32(DRBD_MAGIC) #define DRBD_MAGIC_BIG 0x835a #define BE_DRBD_MAGIC_BIG __constant_cpu_to_be16(DRBD_MAGIC_BIG) /* these are of type "int" */ #define DRBD_MD_INDEX_INTERNAL -1 #define DRBD_MD_INDEX_FLEX_EXT -2 #define DRBD_MD_INDEX_FLEX_INT -3 /* Start of the new netlink/connector stuff */ #define DRBD_NL_CREATE_DEVICE 0x01 #define DRBD_NL_SET_DEFAULTS 0x02 /* The following line should be moved over to linux/connector.h * when the time comes */ #ifndef CN_IDX_DRBD # define CN_IDX_DRBD 0x4 /* Ubuntu "intrepid ibex" release defined CN_IDX_DRBD as 0x6 */ #endif #define CN_VAL_DRBD 0x1 /* For searching a vacant cn_idx value */ #define CN_IDX_STEP 6977 struct drbd_nl_cfg_req { int packet_type; unsigned int drbd_minor; int flags; unsigned short tag_list[]; }; struct drbd_nl_cfg_reply { int packet_type; unsigned int minor; /* FIXME: This is super ugly. */ int ret_code; /* enum drbd_ret_code or enum drbd_state_rv */ unsigned short tag_list[]; /* only used with get_* calls */ }; #endif drbd-8.4.4/user/legacy/linux/drbd_config.h0000664000000000000000000001133511605310253017133 0ustar rootroot/* drbd_config.h DRBD's compile time configuration. drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. 
*/ #ifndef DRBD_CONFIG_H #define DRBD_CONFIG_H extern const char *drbd_buildtag(void); /* Necessary to build the external module against >= Linux-2.6.33 */ #ifdef REL_VERSION #undef REL_VERSION #undef API_VERSION #undef PRO_VERSION_MIN #undef PRO_VERSION_MAX #endif /* End of external module for 2.6.33 stuff */ #define REL_VERSION "8.3.10" #define API_VERSION 88 #define PRO_VERSION_MIN 86 #define PRO_VERSION_MAX 96 #ifndef __CHECKER__ /* for a sparse run, we need all STATICs */ #define DBG_ALL_SYMBOLS /* no static functs, improves quality of OOPS traces */ #endif /* drbd_assert_breakpoint() function #define DBG_ASSERTS */ /* Dump all cstate changes */ #define DUMP_MD 2 /* some extra checks #define PARANOIA */ /* Enable fault insertion code */ #define DRBD_ENABLE_FAULTS /* RedHat's 2.6.9 kernels have the gfp_t type. Mainline has this feature * since 2.6.16. If you build for RedHat enable the line below. */ #define KERNEL_HAS_GFP_T /* kernel.org has atomic_add_return since 2.6.10. some vendor kernels * have it backported, though. Others don't. */ //#define NEED_BACKPORT_OF_ATOMIC_ADD /* 2.6.something has deprecated kmem_cache_t * some older still use it. * some have it defined as struct kmem_cache_s, some as struct kmem_cache */ //#define USE_KMEM_CACHE_S /* 2.6.something has sock_create_kern (SE-linux security context stuff) * some older distribution kernels don't. */ //#define DEFINE_SOCK_CREATE_KERN /* 2.6.24 and later have kernel_sock_shutdown. * some older distribution kernels may also have a backport. */ //#define DEFINE_KERNEL_SOCK_SHUTDOWN /* in older kernels (vanilla < 2.6.16) struct netlink_skb_parms has a * member called dst_groups. Later it is called dst_group (without 's'). 
*/
//#define DRBD_NL_DST_GROUPS

/* in older kernels (vanilla < 2.6.14) there is no kzalloc() */
//#define NEED_BACKPORT_OF_KZALLOC

// some vendor kernels have it, some don't
//#define NEED_SG_SET_BUF
#define HAVE_LINUX_SCATTERLIST_H

/* 2.6.29 and up no longer have swabb.h */
//#define HAVE_LINUX_BYTEORDER_SWABB_H

/* some vendor kernels have it backported. */
#define HAVE_SET_CPUS_ALLOWED_PTR

/* Some vendor kernels < 2.6.7 might define msleep in one or
 * another way .. */

#define KERNEL_HAS_MSLEEP

/* Some other kernels < 2.6.8 do not have struct kvec,
 * others do.. */

#define KERNEL_HAS_KVEC

/* Actually available since 2.6.26, but vendors have backported... */
#define KERNEL_HAS_PROC_CREATE_DATA

/* In 2.6.32 we finally fixed connector to pass netlink_skb_parms to the callback */
#define KERNEL_HAS_CN_SKB_PARMS

/* In the 2.6.34 mergewindow blk_queue_max_sectors() got blk_queue_max_hw_sectors() and
   blk_queue_max_(phys|hw)_segments() got blk_queue_max_segments()
   See Linux commits: 086fa5ff0854c676ec333 8a78362c4eefc1deddbef */
//#define NEED_BLK_QUEUE_MAX_HW_SECTORS
//#define NEED_BLK_QUEUE_MAX_SEGMENTS

/* For kernel versions 2.6.31 to 2.6.33 inclusive, even though
 * blk_queue_max_hw_sectors is present, we actually need to use
 * blk_queue_max_sectors to set max_hw_sectors. :-(
 * RHEL6 2.6.32 chose to be different and already has eliminated
 * blk_queue_max_sectors as upstream 2.6.34 did.
 * I check it into the git repo as defined,
 * because if someone does not run our compat adjust magic, it otherwise would
 * silently compile broken code on affected kernel versions, which is worse
 * than the compile error it may cause on more recent kernels.
 */
#define USE_BLK_QUEUE_MAX_SECTORS_ANYWAYS

/* For kernel versions > 2.6.38, open_bdev_excl has been replaced with
 * blkdev_get_by_path.
 See e525fd89 and d4d77629 */
//#define COMPAT_HAVE_BLKDEV_GET_BY_PATH

/* before open_bdev_exclusive, there was an open_bdev_excl,
 * see 30c40d2 */
#define COMPAT_HAVE_OPEN_BDEV_EXCLUSIVE

/* some old kernels do not have atomic_add_unless() */
//#define NEED_ATOMIC_ADD_UNLESS

/* some old kernels do not have the bool type */
//#define NEED_BOOL_TYPE

/* some older kernels do not have schedule_timeout_interruptible() */
//#define NEED_SCHEDULE_TIMEOUT_INTERR

/* Stone old kernels lack the fmode_t type */
#define COMPAT_HAVE_FMODE_T

#endif
drbd-8.4.4/user/legacy/linux/drbd_limits.h0000664000000000000000000001210511605310253017163 0ustar rootroot/*
  drbd_limits.h
  This file is part of DRBD by Philipp Reisner and Lars Ellenberg.
*/

/*
 * Our current limitations.
 * Some of them are hard limits,
 * some of them are arbitrary range limits, that make it easier to provide
 * feedback about nonsense settings for certain configurable values.
 */

#ifndef DRBD_LIMITS_H
#define DRBD_LIMITS_H 1

#define DEBUG_RANGE_CHECK 0

#define DRBD_MINOR_COUNT_MIN 1
#define DRBD_MINOR_COUNT_MAX 256
#define DRBD_MINOR_COUNT_DEF 32

#define DRBD_DIALOG_REFRESH_MIN 0
#define DRBD_DIALOG_REFRESH_MAX 600

/* valid port number */
#define DRBD_PORT_MIN 1
#define DRBD_PORT_MAX 0xffff

/* startup { */
/* if you want more than 3.4 days, disable */
#define DRBD_WFC_TIMEOUT_MIN 0
#define DRBD_WFC_TIMEOUT_MAX 300000
#define DRBD_WFC_TIMEOUT_DEF 0

#define DRBD_DEGR_WFC_TIMEOUT_MIN 0
#define DRBD_DEGR_WFC_TIMEOUT_MAX 300000
#define DRBD_DEGR_WFC_TIMEOUT_DEF 0

#define DRBD_OUTDATED_WFC_TIMEOUT_MIN 0
#define DRBD_OUTDATED_WFC_TIMEOUT_MAX 300000
#define DRBD_OUTDATED_WFC_TIMEOUT_DEF 0
/* }*/

/* net { */
/* timeout, unit centi seconds
 * more than one minute timeout is not useful */
#define DRBD_TIMEOUT_MIN 1
#define DRBD_TIMEOUT_MAX 600
#define DRBD_TIMEOUT_DEF 60 /* 6 seconds */

/* If backing disk takes longer than disk_timeout, mark the disk as failed */
#define DRBD_DISK_TIMEOUT_MIN 0 /* 0 = disabled */
#define
DRBD_DISK_TIMEOUT_MAX 6000 /* 10 Minutes */
#define DRBD_DISK_TIMEOUT_DEF 0 /* disabled */

/* active connection retries when C_WF_CONNECTION */
#define DRBD_CONNECT_INT_MIN 1
#define DRBD_CONNECT_INT_MAX 120
#define DRBD_CONNECT_INT_DEF 10 /* seconds */

/* keep-alive probes when idle */
#define DRBD_PING_INT_MIN 1
#define DRBD_PING_INT_MAX 120
#define DRBD_PING_INT_DEF 10

/* timeout for the ping packets.*/
#define DRBD_PING_TIMEO_MIN 1
#define DRBD_PING_TIMEO_MAX 100
#define DRBD_PING_TIMEO_DEF 5

/* max number of write requests between write barriers */
#define DRBD_MAX_EPOCH_SIZE_MIN 1
#define DRBD_MAX_EPOCH_SIZE_MAX 20000
#define DRBD_MAX_EPOCH_SIZE_DEF 2048

/* I don't think that a tcp send buffer of more than 10M is useful */
#define DRBD_SNDBUF_SIZE_MIN 0
#define DRBD_SNDBUF_SIZE_MAX (10<<20)
#define DRBD_SNDBUF_SIZE_DEF 0

#define DRBD_RCVBUF_SIZE_MIN 0
#define DRBD_RCVBUF_SIZE_MAX (10<<20)
#define DRBD_RCVBUF_SIZE_DEF 0

/* @4k PageSize -> 128kB - 512MB */
#define DRBD_MAX_BUFFERS_MIN 32
#define DRBD_MAX_BUFFERS_MAX 131072
#define DRBD_MAX_BUFFERS_DEF 2048

/* @4k PageSize -> 4kB - 512MB */
#define DRBD_UNPLUG_WATERMARK_MIN 1
#define DRBD_UNPLUG_WATERMARK_MAX 131072
#define DRBD_UNPLUG_WATERMARK_DEF (DRBD_MAX_BUFFERS_DEF/16)

/* 0 is disabled.
 * 200 should be more than enough even for very short timeouts */
#define DRBD_KO_COUNT_MIN 0
#define DRBD_KO_COUNT_MAX 200
#define DRBD_KO_COUNT_DEF 0
/* } */

/* syncer { */
/* FIXME allow rate to be zero? */
#define DRBD_RATE_MIN 1
/* channel bonding 10 GbE, or other hardware */
#define DRBD_RATE_MAX (4 << 20)
#define DRBD_RATE_DEF 250 /* kb/second */

/* less than 7 would hit performance unnecessarily.
* 3833 is the largest prime that still does fit * into 64 sectors of activity log */ #define DRBD_AL_EXTENTS_MIN 7 #define DRBD_AL_EXTENTS_MAX 3833 #define DRBD_AL_EXTENTS_DEF 127 #define DRBD_AFTER_MIN -1 #define DRBD_AFTER_MAX 255 #define DRBD_AFTER_DEF -1 /* } */ /* drbdsetup XY resize -d Z * you are free to reduce the device size to nothing, if you want to. * the upper limit with 64bit kernel, enough ram and flexible meta data * is 16 TB, currently. */ /* DRBD_MAX_SECTORS */ #define DRBD_DISK_SIZE_SECT_MIN 0 #define DRBD_DISK_SIZE_SECT_MAX (16 * (2LLU << 30)) #define DRBD_DISK_SIZE_SECT_DEF 0 /* = disabled = no user size... */ #define DRBD_ON_IO_ERROR_DEF EP_PASS_ON #define DRBD_FENCING_DEF FP_DONT_CARE #define DRBD_AFTER_SB_0P_DEF ASB_DISCONNECT #define DRBD_AFTER_SB_1P_DEF ASB_DISCONNECT #define DRBD_AFTER_SB_2P_DEF ASB_DISCONNECT #define DRBD_RR_CONFLICT_DEF ASB_DISCONNECT #define DRBD_ON_NO_DATA_DEF OND_IO_ERROR #define DRBD_ON_CONGESTION_DEF OC_BLOCK #define DRBD_MAX_BIO_BVECS_MIN 0 #define DRBD_MAX_BIO_BVECS_MAX 128 #define DRBD_MAX_BIO_BVECS_DEF 0 #define DRBD_C_PLAN_AHEAD_MIN 0 #define DRBD_C_PLAN_AHEAD_MAX 300 #define DRBD_C_PLAN_AHEAD_DEF 0 /* RS rate controller disabled by default */ #define DRBD_C_DELAY_TARGET_MIN 1 #define DRBD_C_DELAY_TARGET_MAX 100 #define DRBD_C_DELAY_TARGET_DEF 10 #define DRBD_C_FILL_TARGET_MIN 0 #define DRBD_C_FILL_TARGET_MAX (1<<20) /* 500MByte in sec */ #define DRBD_C_FILL_TARGET_DEF 0 /* By default disabled -> controlled by delay_target */ #define DRBD_C_MAX_RATE_MIN 250 /* kByte/sec */ #define DRBD_C_MAX_RATE_MAX (4 << 20) #define DRBD_C_MAX_RATE_DEF 102400 #define DRBD_C_MIN_RATE_MIN 0 /* kByte/sec */ #define DRBD_C_MIN_RATE_MAX (4 << 20) #define DRBD_C_MIN_RATE_DEF 4096 #define DRBD_CONG_FILL_MIN 0 #define DRBD_CONG_FILL_MAX (10<<21) /* 10GByte in sectors */ #define DRBD_CONG_FILL_DEF 0 #define DRBD_CONG_EXTENTS_MIN DRBD_AL_EXTENTS_MIN #define DRBD_CONG_EXTENTS_MAX DRBD_AL_EXTENTS_MAX #define DRBD_CONG_EXTENTS_DEF 
DRBD_AL_EXTENTS_DEF

#endif
drbd-8.4.4/user/legacy/linux/drbd_nl.h
/*
   PACKET( name,
	  TYPE ( pn, pr, member )
	  ...
   )

   You may never reissue one of the pn arguments
*/

#if !defined(NL_PACKET) || !defined(NL_STRING) || !defined(NL_INTEGER) || !defined(NL_BIT) || !defined(NL_INT64)
#error "The macros NL_PACKET, NL_STRING, NL_INTEGER, NL_INT64 and NL_BIT need to be defined"
#endif

NL_PACKET(primary, 1,
	NL_BIT(		1,	T_MAY_IGNORE,	primary_force)
)

NL_PACKET(secondary, 2, )

NL_PACKET(disk_conf, 3,
	NL_INT64(	2,	T_MAY_IGNORE,	disk_size)
	NL_STRING(	3,	T_MANDATORY,	backing_dev,	128)
	NL_STRING(	4,	T_MANDATORY,	meta_dev,	128)
	NL_INTEGER(	5,	T_MANDATORY,	meta_dev_idx)
	NL_INTEGER(	6,	T_MAY_IGNORE,	on_io_error)
	NL_INTEGER(	7,	T_MAY_IGNORE,	fencing)
	NL_BIT(		37,	T_MAY_IGNORE,	use_bmbv)
	NL_BIT(		53,	T_MAY_IGNORE,	no_disk_flush)
	NL_BIT(		54,	T_MAY_IGNORE,	no_md_flush)
	  /*  55 max_bio_size was available in 8.2.6rc2 */
	NL_INTEGER(	56,	T_MAY_IGNORE,	max_bio_bvecs)
	NL_BIT(		57,	T_MAY_IGNORE,	no_disk_barrier)
	NL_BIT(		58,	T_MAY_IGNORE,	no_disk_drain)
	NL_INTEGER(	89,	T_MAY_IGNORE,	disk_timeout)
)

NL_PACKET(detach, 4,
	NL_BIT(		88,	T_MANDATORY,	detach_force)
)

NL_PACKET(net_conf, 5,
	NL_STRING(	8,	T_MANDATORY,	my_addr,	128)
	NL_STRING(	9,	T_MANDATORY,	peer_addr,	128)
	NL_STRING(	10,	T_MAY_IGNORE,	shared_secret,	SHARED_SECRET_MAX)
	NL_STRING(	11,	T_MAY_IGNORE,	cram_hmac_alg,	SHARED_SECRET_MAX)
	NL_STRING(	44,	T_MAY_IGNORE,	integrity_alg,	SHARED_SECRET_MAX)
	NL_INTEGER(	14,	T_MAY_IGNORE,	timeout)
	NL_INTEGER(	15,	T_MANDATORY,	wire_protocol)
	NL_INTEGER(	16,	T_MAY_IGNORE,	try_connect_int)
	NL_INTEGER(	17,	T_MAY_IGNORE,	ping_int)
	NL_INTEGER(	18,	T_MAY_IGNORE,	max_epoch_size)
	NL_INTEGER(	19,	T_MAY_IGNORE,	max_buffers)
	NL_INTEGER(	20,	T_MAY_IGNORE,	unplug_watermark)
	NL_INTEGER(	21,	T_MAY_IGNORE,	sndbuf_size)
	NL_INTEGER(	22,	T_MAY_IGNORE,	ko_count)
	NL_INTEGER(	24,	T_MAY_IGNORE,	after_sb_0p)
	NL_INTEGER(	25,	T_MAY_IGNORE,	after_sb_1p)
	NL_INTEGER(	26,
T_MAY_IGNORE, after_sb_2p) NL_INTEGER( 39, T_MAY_IGNORE, rr_conflict) NL_INTEGER( 40, T_MAY_IGNORE, ping_timeo) NL_INTEGER( 67, T_MAY_IGNORE, rcvbuf_size) NL_INTEGER( 81, T_MAY_IGNORE, on_congestion) NL_INTEGER( 82, T_MAY_IGNORE, cong_fill) NL_INTEGER( 83, T_MAY_IGNORE, cong_extents) /* 59 addr_family was available in GIT, never released */ NL_BIT( 60, T_MANDATORY, mind_af) NL_BIT( 27, T_MAY_IGNORE, want_lose) NL_BIT( 28, T_MAY_IGNORE, two_primaries) NL_BIT( 41, T_MAY_IGNORE, always_asbp) NL_BIT( 61, T_MAY_IGNORE, no_cork) NL_BIT( 62, T_MANDATORY, auto_sndbuf_size) NL_BIT( 70, T_MANDATORY, dry_run) ) NL_PACKET(disconnect, 6, NL_BIT( 84, T_MAY_IGNORE, force) ) NL_PACKET(resize, 7, NL_INT64( 29, T_MAY_IGNORE, resize_size) NL_BIT( 68, T_MAY_IGNORE, resize_force) NL_BIT( 69, T_MANDATORY, no_resync) ) NL_PACKET(syncer_conf, 8, NL_INTEGER( 30, T_MAY_IGNORE, rate) NL_INTEGER( 31, T_MAY_IGNORE, after) NL_INTEGER( 32, T_MAY_IGNORE, al_extents) /* NL_INTEGER( 71, T_MAY_IGNORE, dp_volume) */ /* NL_INTEGER( 72, T_MAY_IGNORE, dp_interval) */ /* NL_INTEGER( 73, T_MAY_IGNORE, throttle_th) removed */ /* NL_INTEGER( 74, T_MAY_IGNORE, hold_off_th) removed */ NL_STRING( 52, T_MAY_IGNORE, verify_alg, SHARED_SECRET_MAX) NL_STRING( 51, T_MAY_IGNORE, cpu_mask, 32) NL_STRING( 64, T_MAY_IGNORE, csums_alg, SHARED_SECRET_MAX) NL_BIT( 65, T_MAY_IGNORE, use_rle) NL_INTEGER( 75, T_MAY_IGNORE, on_no_data) NL_INTEGER( 76, T_MAY_IGNORE, c_plan_ahead) NL_INTEGER( 77, T_MAY_IGNORE, c_delay_target) NL_INTEGER( 78, T_MAY_IGNORE, c_fill_target) NL_INTEGER( 79, T_MAY_IGNORE, c_max_rate) NL_INTEGER( 80, T_MAY_IGNORE, c_min_rate) ) NL_PACKET(invalidate, 9, ) NL_PACKET(invalidate_peer, 10, ) NL_PACKET(pause_sync, 11, ) NL_PACKET(resume_sync, 12, ) NL_PACKET(suspend_io, 13, ) NL_PACKET(resume_io, 14, ) NL_PACKET(outdate, 15, ) NL_PACKET(get_config, 16, ) NL_PACKET(get_state, 17, NL_INTEGER( 33, T_MAY_IGNORE, state_i) ) NL_PACKET(get_uuids, 18, NL_STRING( 34, T_MAY_IGNORE, uuids, (UI_SIZE*sizeof(__u64))) 
NL_INTEGER( 35, T_MAY_IGNORE, uuids_flags) ) NL_PACKET(get_timeout_flag, 19, NL_BIT( 36, T_MAY_IGNORE, use_degraded) ) NL_PACKET(call_helper, 20, NL_STRING( 38, T_MAY_IGNORE, helper, 32) ) /* Tag nr 42 already allocated in drbd-8.1 development. */ NL_PACKET(sync_progress, 23, NL_INTEGER( 43, T_MAY_IGNORE, sync_progress) ) NL_PACKET(dump_ee, 24, NL_STRING( 45, T_MAY_IGNORE, dump_ee_reason, 32) NL_STRING( 46, T_MAY_IGNORE, seen_digest, SHARED_SECRET_MAX) NL_STRING( 47, T_MAY_IGNORE, calc_digest, SHARED_SECRET_MAX) NL_INT64( 48, T_MAY_IGNORE, ee_sector) NL_INT64( 49, T_MAY_IGNORE, ee_block_id) NL_STRING( 50, T_MAY_IGNORE, ee_data, 32 << 10) ) NL_PACKET(start_ov, 25, NL_INT64( 66, T_MAY_IGNORE, start_sector) NL_INT64( 90, T_MANDATORY, stop_sector) ) NL_PACKET(new_c_uuid, 26, NL_BIT( 63, T_MANDATORY, clear_bm) ) #ifdef NL_RESPONSE NL_RESPONSE(return_code_only, 27) #endif #undef NL_PACKET #undef NL_INTEGER #undef NL_INT64 #undef NL_BIT #undef NL_STRING #undef NL_RESPONSE drbd-8.4.4/user/legacy/linux/drbd_tag_magic.h0000664000000000000000000000540511753207431017611 0ustar rootroot#ifndef DRBD_TAG_MAGIC_H #define DRBD_TAG_MAGIC_H #define TT_END 0 #define TT_REMOVED 0xE000 /* declare packet_type enums */ enum packet_types { #define NL_PACKET(name, number, fields) P_ ## name = number, #define NL_RESPONSE(name, number) P_ ## name = number, #define NL_INTEGER(pn, pr, member) #define NL_INT64(pn, pr, member) #define NL_BIT(pn, pr, member) #define NL_STRING(pn, pr, member, len) #include "drbd_nl.h" P_nl_after_last_packet, }; /* These struct are used to deduce the size of the tag lists: */ #define NL_PACKET(name, number, fields) \ struct name ## _tag_len_struct { fields }; #define NL_INTEGER(pn, pr, member) \ int member; int tag_and_len ## member; #define NL_INT64(pn, pr, member) \ __u64 member; int tag_and_len ## member; #define NL_BIT(pn, pr, member) \ unsigned char member:1; int tag_and_len ## member; #define NL_STRING(pn, pr, member, len) \ unsigned char member[len]; int 
member ## _len; \ int tag_and_len ## member; #include "drbd_nl.h" /* declare tag-list-sizes */ static const int tag_list_sizes[] = { #define NL_PACKET(name, number, fields) 2 fields , #define NL_INTEGER(pn, pr, member) + 4 + 4 #define NL_INT64(pn, pr, member) + 4 + 8 #define NL_BIT(pn, pr, member) + 4 + 1 #define NL_STRING(pn, pr, member, len) + 4 + (len) #include "drbd_nl.h" }; /* The two highest bits are used for the tag type */ #define TT_MASK 0xC000 #define TT_INTEGER 0x0000 #define TT_INT64 0x4000 #define TT_BIT 0x8000 #define TT_STRING 0xC000 /* The next bit indicates if processing of the tag is mandatory */ #define T_MANDATORY 0x2000 #define T_MAY_IGNORE 0x0000 #define TN_MASK 0x1fff /* The remaining 13 bits are used to enumerate the tags */ #define tag_type(T) ((T) & TT_MASK) #define tag_number(T) ((T) & TN_MASK) /* declare tag enums */ #define NL_PACKET(name, number, fields) fields enum drbd_tags { #define NL_INTEGER(pn, pr, member) T_ ## member = pn | TT_INTEGER | pr , #define NL_INT64(pn, pr, member) T_ ## member = pn | TT_INT64 | pr , #define NL_BIT(pn, pr, member) T_ ## member = pn | TT_BIT | pr , #define NL_STRING(pn, pr, member, len) T_ ## member = pn | TT_STRING | pr , #include "drbd_nl.h" }; struct tag { const char *name; int type_n_flags; int max_len; }; /* declare tag names */ #define NL_PACKET(name, number, fields) fields static const struct tag tag_descriptions[] = { #define NL_INTEGER(pn, pr, member) [ pn ] = { #member, TT_INTEGER | pr, sizeof(int) }, #define NL_INT64(pn, pr, member) [ pn ] = { #member, TT_INT64 | pr, sizeof(__u64) }, #define NL_BIT(pn, pr, member) [ pn ] = { #member, TT_BIT | pr, sizeof(int) }, #define NL_STRING(pn, pr, member, len) [ pn ] = { #member, TT_STRING | pr, (len) }, #include "drbd_nl.h" }; #endif drbd-8.4.4/user/legacy/unaligned.h0000664000000000000000000000336711753207431015517 0ustar rootroot#ifndef UNALIGNED_H #define UNALIGNED_H #include #if defined(__i386__) || defined(__x86_64__) #define 
UNALIGNED_ACCESS_SUPPORTED
#endif

#ifndef UNALIGNED_ACCESS_SUPPORTED
#warning "Assuming that your architecture can not do unaligned memory accesses."
#warning "Enabling extra code for unaligned memory accesses."
#endif

#ifdef UNALIGNED_ACCESS_SUPPORTED

/* On some architectures the hardware (or microcode) does it */

#define get_unaligned(ptr)		*(ptr)
#define put_unaligned(val, ptr)		*(ptr) = (val)

#else

/* on some architectures we have to do it in program code */

/* Better not use memcpy(). gcc generates broken code on ARM at higher
   optimisation levels
*/

#define __bad_unaligned_access_size() ({ \
	fprintf(stderr, "bad unaligned access. abort()\n"); \
	abort(); \
	})

/* Byte-wise copy; the switch cases fall through on purpose so that
 * exactly sizeof(v) bytes are copied. */
#define get_unaligned(ptr) ((typeof(*(ptr)))({ \
	typeof(*(ptr)) v; \
	unsigned char *s = (unsigned char*)(ptr); \
	unsigned char *d = (unsigned char*)&v; \
	switch (sizeof(v)) { \
	case 8: *d++ = *s++; \
		*d++ = *s++; \
		*d++ = *s++; \
		*d++ = *s++; \
	case 4: *d++ = *s++; \
		*d++ = *s++; \
	case 2: *d++ = *s++; \
	case 1: *d++ = *s++; \
		break; \
	default: \
		__bad_unaligned_access_size(); \
		break; \
	} \
	v; }))

#define put_unaligned(val, ptr) ({ \
	typeof(*(ptr)) v = (val); \
	unsigned char *d = (unsigned char*)(ptr); \
	unsigned char *s = (unsigned char*)&v; \
	switch (sizeof(v)) { \
	case 8: *d++ = *s++; \
		*d++ = *s++; \
		*d++ = *s++; \
		*d++ = *s++; \
	case 4: *d++ = *s++; \
		*d++ = *s++; \
	case 2: *d++ = *s++; \
	case 1: *d++ = *s++; \
		break; \
	default: \
		__bad_unaligned_access_size(); \
		break; \
	} \
	(void)0; })

#endif

#endif
drbd-8.4.4/user/libgenl.c
#include "libgenl.h"

#include
#include
#include
#include

int genl_join_mc_group(struct genl_sock *s, const char *name) {
	int g_id;
	int i;

	BUG_ON(!s || !s->s_family);
	for (i = 0; i < 32; i++) {
		if (!s->s_family->mc_groups[i].id)
			continue;
		if (strcmp(s->s_family->mc_groups[i].name, name))
			continue;

		g_id = s->s_family->mc_groups[i].id;
		return setsockopt(s->s_fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
				&g_id, sizeof(g_id));
	}
	return -2;
}

#define DO_OR_LOG_AND_FAIL(x) \
	do {							\
		int err = x;					\
		if (err) {					\
			dbg(1, "%s failed: %d %s\n",		\
				#x, err, strerror(errno));	\
			goto fail;				\
		}						\
	} while(0)

static struct genl_sock *genl_connect(__u32 nl_groups)
{
	struct genl_sock *s = calloc(1, sizeof(*s));
	socklen_t sock_len;
	int bsz = 2 << 10;

	if (!s)
		return NULL;

	/* autobind; kernel is responsible to give us something unique
	 * in bind() below. */
	s->s_local.nl_pid = 0;
	s->s_local.nl_family = AF_NETLINK;

	/*
	 * If we want to receive multicast traffic on this socket, kernels
	 * before v2.6.23-rc1 require us to indicate which multicast groups we
	 * are interested in in nl_groups.
	 */
	s->s_local.nl_groups = nl_groups;
	s->s_peer.nl_family = AF_NETLINK;
	/* start with some sane sequence number */
	s->s_seq_expect = s->s_seq_next = time(0);

	s->s_fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_GENERIC);
	if (s->s_fd == -1)
		goto fail;

	sock_len = sizeof(s->s_local);
	DO_OR_LOG_AND_FAIL(setsockopt(s->s_fd, SOL_SOCKET, SO_SNDBUF, &bsz, sizeof(bsz)));
	DO_OR_LOG_AND_FAIL(setsockopt(s->s_fd, SOL_SOCKET, SO_RCVBUF, &bsz, sizeof(bsz)));
	DO_OR_LOG_AND_FAIL(bind(s->s_fd, (struct sockaddr*) &s->s_local, sizeof(s->s_local)));
	DO_OR_LOG_AND_FAIL(getsockname(s->s_fd, (struct sockaddr*) &s->s_local, &sock_len));

	dbg(3, "bound socket to nl_pid:%u, my pid:%u, len:%u, sizeof:%u\n",
		s->s_local.nl_pid, getpid(),
		(unsigned)sock_len, (unsigned)sizeof(s->s_local));

	return s;

fail:
	/* do not leak the descriptor if socket() succeeded
	 * but a later setup step failed */
	if (s->s_fd != -1)
		close(s->s_fd);
	free(s);
	return NULL;
}
#undef DO_OR_LOG_AND_FAIL

static int do_send(int fd, const void *buf, int len)
{
	int c;
	while ((c = write(fd, buf, len)) < len) {
		if (c == -1) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += c;
		len -= c;
	}
	return 0;
}

int genl_send(struct genl_sock *s, struct msg_buff *msg)
{
	struct nlmsghdr *n = (struct nlmsghdr *)msg->data;
	struct genlmsghdr *g;

	n->nlmsg_len = msg->tail - msg->data;
	n->nlmsg_flags |= NLM_F_REQUEST;
	n->nlmsg_seq = s->s_seq_expect = s->s_seq_next++;
	n->nlmsg_pid = s->s_local.nl_pid;

	g = nlmsg_data(n);

	dbg(3, "sending
%smessage, pid:%u seq:%u, g.cmd/version:%u/%u", n->nlmsg_type == GENL_ID_CTRL ? "ctrl " : "", n->nlmsg_pid, n->nlmsg_seq, g->cmd, g->version); return do_send(s->s_fd, msg->data, n->nlmsg_len); } /* "inspired" by libnl nl_recv() * You pass in one iovec, which may contain pre-allocated buffer space, * obtained by malloc(). It will be realloc()ed on demand. * Caller is responsible for free()ing it up on return, * regardless of return code. */ int genl_recv_timeout(struct genl_sock *s, struct iovec *iov, int timeout_ms) { struct sockaddr_nl addr; struct pollfd pfd; int flags; struct msghdr msg = { .msg_name = &addr, .msg_namelen = sizeof(struct sockaddr_nl), .msg_iov = iov, .msg_iovlen = 1, .msg_control = NULL, .msg_controllen = 0, .msg_flags = 0, }; int n; if (!iov->iov_len) { iov->iov_len = 8192; iov->iov_base = malloc(iov->iov_len); } flags = MSG_PEEK; retry: pfd.fd = s->s_fd; pfd.events = POLLIN; if ((poll(&pfd, 1, timeout_ms) != 1) || !(pfd.revents & POLLIN)) return 0; /* which is E_RCV_TIMEDOUT */ /* for most cases this method will memcopy twice, as the default buffer * is large enough. But for those few other cases, we now have a * chance to realloc before the rest of the datagram is discarded. */ n = recvmsg(s->s_fd, &msg, flags); if (!n) return 0; else if (n < 0) { if (errno == EINTR) { dbg(3, "recvmsg() returned EINTR, retrying\n"); goto retry; } else if (errno == EAGAIN) { dbg(3, "recvmsg() returned EAGAIN, aborting\n"); return 0; } else return -E_RCV_FAILED; } if (iov->iov_len < (unsigned)n || msg.msg_flags & MSG_TRUNC) { /* Provided buffer is not long enough, enlarge it * and try again. 
*/ iov->iov_len *= 2; iov->iov_base = realloc(iov->iov_base, iov->iov_len); goto retry; } else if (flags != 0) { /* Buffer is big enough, do the actual reading */ flags = 0; goto retry; } if (msg.msg_namelen != sizeof(struct sockaddr_nl)) return -E_RCV_NO_SOURCE_ADDR; if (addr.nl_pid != 0) { dbg(3, "ignoring message from sender pid %u != 0\n", addr.nl_pid); goto retry; } return n; } /* Note that one datagram may contain multiple netlink messages * (e.g. for a dump response). This only checks the _first_ message, * caller has to iterate over multiple messages with nlmsg_for_each_msg() * when necessary. */ int genl_recv_msgs(struct genl_sock *s, struct iovec *iov, char **err_desc, int timeout_ms) { struct nlmsghdr *nlh; int c = genl_recv_timeout(s, iov, timeout_ms); if (c <= 0) { if (err_desc) *err_desc = (c == -E_RCV_TIMEDOUT) ? "timed out waiting for reply" : (c == -E_RCV_NO_SOURCE_ADDR) ? "no source address!" : "failed to receive netlink reply"; return c; } nlh = (struct nlmsghdr*)iov->iov_base; if (!nlmsg_ok(nlh, c)) { if (err_desc) *err_desc = "truncated message in netlink reply"; return -E_RCV_MSG_TRUNC; } if (s->s_seq_expect && nlh->nlmsg_seq != s->s_seq_expect) { if (err_desc) *err_desc = "sequence mismatch in netlink reply"; return -E_RCV_SEQ_MISMATCH; } if (nlh->nlmsg_type == NLMSG_NOOP || nlh->nlmsg_type == NLMSG_OVERRUN) { if (err_desc) *err_desc = "unexpected message type in reply"; return -E_RCV_UNEXPECTED_TYPE; } if (nlh->nlmsg_type == NLMSG_DONE) return -E_RCV_NLMSG_DONE; if (nlh->nlmsg_type == NLMSG_ERROR) { struct nlmsgerr *e = nlmsg_data(nlh); errno = -e->error; if (!errno) /* happens if you request NLM_F_ACK */ dbg(3, "got a positive ACK message for seq:%u", s->s_seq_expect); else { dbg(3, "got a NACK message for seq:%u, error:%d", s->s_seq_expect, e->error); if (err_desc) *err_desc = strerror(errno); } return -E_RCV_ERROR_REPLY; } /* good reply message(s) */ dbg(3, "received a good message for seq:%u", s->s_seq_expect); return c; } static struct 
genl_family genl_ctrl = {
        .id = GENL_ID_CTRL,
        .name = "nlctrl",
        .version = 0x2,
        .maxattr = CTRL_ATTR_MAX,
};

struct genl_sock *genl_connect_to_family(struct genl_family *family)
{
	struct genl_sock *s = NULL;
	struct msg_buff *msg;
	struct nlmsghdr *nlh;
	struct nlattr *nla;
	struct iovec iov = { .iov_len = 0 };
	int rem;

	BUG_ON(!family);
	BUG_ON(!strlen(family->name));

	msg = msg_new(DEFAULT_MSG_SIZE);
	if (!msg) {
		dbg(1, "could not allocate genl message");
		goto out;
	}

	s = genl_connect(family->nl_groups);
	if (!s) {
		dbg(1, "error creating netlink socket");
		goto out;
	}
	genlmsg_put(msg, &genl_ctrl, 0, CTRL_CMD_GETFAMILY);

	nla_put_string(msg, CTRL_ATTR_FAMILY_NAME, family->name);
	if (genl_send(s, msg)) {
		dbg(1, "failed to send netlink message");
		/* close the socket here as well, as the receive
		 * failure path below does */
		close(s->s_fd);
		free(s);
		s = NULL;
		goto out;
	}

	if (genl_recv_msgs(s, &iov, NULL, 3000) <= 0) {
		close(s->s_fd);
		free(s);
		s = NULL;
		goto out;
	}

	nlh = (struct nlmsghdr*)iov.iov_base;
	nla_for_each_attr(nla, nlmsg_attrdata(nlh, GENL_HDRLEN),
			nlmsg_attrlen(nlh, GENL_HDRLEN), rem) {
		switch (nla_type(nla)) {
		case CTRL_ATTR_FAMILY_ID:
			family->id = nla_get_u16(nla);
			dbg(2, "'%s' genl family id: %d", family->name, family->id);
			break;
		case CTRL_ATTR_FAMILY_NAME:
			break;
#ifdef HAVE_CTRL_ATTR_VERSION
		case CTRL_ATTR_VERSION:
			family->version = nla_get_u32(nla);
			dbg(2, "'%s' genl family version: %d", family->name, family->version);
			break;
#endif
#ifdef HAVE_CTRL_ATTR_HDRSIZE
		case CTRL_ATTR_HDRSIZE:
			family->hdrsize = nla_get_u32(nla);
			dbg(2, "'%s' genl family hdrsize: %d", family->name, family->hdrsize);
			break;
#endif
#ifdef HAVE_CTRL_ATTR_MCAST_GROUPS
		case CTRL_ATTR_MCAST_GROUPS:
			{
			static struct nla_policy policy[] = {
				[CTRL_ATTR_MCAST_GRP_NAME] = { .type = NLA_NUL_STRING, .len = GENL_NAMSIZ },
				[CTRL_ATTR_MCAST_GRP_ID] = { .type = NLA_U32 },
			};
			struct nlattr *ntb[__CTRL_ATTR_MCAST_GRP_MAX];
			struct nlattr *idx;
			int tmp;
			int i = 0;
			nla_for_each_nested(idx, nla, tmp) {
				BUG_ON(i >= 32);
				nla_parse_nested(ntb, CTRL_ATTR_MCAST_GRP_MAX, idx, policy);
				if
(ntb[CTRL_ATTR_MCAST_GRP_NAME] && ntb[CTRL_ATTR_MCAST_GRP_ID]) { struct genl_multicast_group *grp = &family->mc_groups[i++]; grp->id = nla_get_u32(ntb[CTRL_ATTR_MCAST_GRP_ID]); nla_strlcpy(grp->name, ntb[CTRL_ATTR_MCAST_GRP_NAME], sizeof(grp->name)); dbg(2, "'%s'-'%s' multicast group found (id: %u)\n", family->name, grp->name, grp->id); } } break; }; #endif default: ; } } if (!family->id) dbg(1, "genl family '%s' not found", family->name); else s->s_family = family; out: free(iov.iov_base); msg_free(msg); return s; } /* * Stripped down copy from linux-2.6.32/lib/nlattr.c * skb -> "msg_buff" * - Lars Ellenberg * * NETLINK Netlink attributes * * Authors: Thomas Graf * Alexey Kuznetsov */ #include #include static __u16 nla_attr_minlen[NLA_TYPE_MAX+1] __read_mostly = { [NLA_U8] = sizeof(__u8), [NLA_U16] = sizeof(__u16), [NLA_U32] = sizeof(__u32), [NLA_U64] = sizeof(__u64), [NLA_NESTED] = NLA_HDRLEN, }; static int validate_nla(struct nlattr *nla, int maxtype, const struct nla_policy *policy) { const struct nla_policy *pt; int minlen = 0, attrlen = nla_len(nla), type = nla_type(nla); if (type <= 0 || type > maxtype) return 0; pt = &policy[type]; BUG_ON(pt->type > NLA_TYPE_MAX); switch (pt->type) { case NLA_FLAG: if (attrlen > 0) return -ERANGE; break; case NLA_NUL_STRING: if (pt->len) minlen = min_t(int, attrlen, pt->len + 1); else minlen = attrlen; if (!minlen || memchr(nla_data(nla), '\0', minlen) == NULL) return -EINVAL; /* fall through */ case NLA_STRING: if (attrlen < 1) return -ERANGE; if (pt->len) { char *buf = nla_data(nla); if (buf[attrlen - 1] == '\0') attrlen--; if (attrlen > pt->len) return -ERANGE; } break; case NLA_BINARY: if (pt->len && attrlen > pt->len) return -ERANGE; break; case NLA_NESTED_COMPAT: if (attrlen < pt->len) return -ERANGE; if (attrlen < NLA_ALIGN(pt->len)) break; if (attrlen < NLA_ALIGN(pt->len) + NLA_HDRLEN) return -ERANGE; nla = nla_data(nla) + NLA_ALIGN(pt->len); if (attrlen < NLA_ALIGN(pt->len) + NLA_HDRLEN + nla_len(nla)) return 
				-ERANGE;
		break;
	case NLA_NESTED:
		/* a nested attribute is allowed to be empty; if it's not,
		 * it must have a size of at least NLA_HDRLEN.
		 */
		if (attrlen == 0)
			break;
		/* fall through */
	default:
		if (pt->len)
			minlen = pt->len;
		else if (pt->type != NLA_UNSPEC)
			minlen = nla_attr_minlen[pt->type];

		if (attrlen < minlen)
			return -ERANGE;
	}

	return 0;
}

/**
 * nla_validate - Validate a stream of attributes
 * @head: head of attribute stream
 * @len: length of attribute stream
 * @maxtype: maximum attribute type to be expected
 * @policy: validation policy
 *
 * Validates all attributes in the specified attribute stream against the
 * specified policy. Attributes with a type exceeding maxtype will be
 * ignored. See documentation of struct nla_policy for more details.
 *
 * Returns 0 on success or a negative error code.
 */
int nla_validate(struct nlattr *head, int len, int maxtype,
		 const struct nla_policy *policy)
{
	struct nlattr *nla;
	int rem, err;

	nla_for_each_attr(nla, head, len, rem) {
		err = validate_nla(nla, maxtype, policy);
		if (err < 0)
			goto errout;
	}

	err = 0;
errout:
	return err;
}

/**
 * nla_policy_len - Determine the max. length of a policy
 * @policy: policy to use
 * @n: number of policies
 *
 * Determines the max. length of the policy.  It is currently used
 * to allocate Netlink buffers roughly the size of the actual
 * message.
 *
 * Returns the computed maximum length in bytes.
 */
int nla_policy_len(const struct nla_policy *p, int n)
{
	int i, len = 0;

	for (i = 0; i < n; i++, p++) {
		if (p->len)
			len += nla_total_size(p->len);
		else if (nla_attr_minlen[p->type])
			len += nla_total_size(nla_attr_minlen[p->type]);
	}

	return len;
}

/**
 * nla_parse - Parse a stream of attributes into a tb buffer
 * @tb: destination array with maxtype+1 elements
 * @maxtype: maximum attribute type to be expected
 * @head: head of attribute stream
 * @len: length of attribute stream
 * @policy: validation policy
 *
 * Parses a stream of attributes and stores a pointer to each attribute in
 * the tb array accessible via the attribute type. Attributes with a type
 * exceeding maxtype will be silently ignored for backwards compatibility
 * reasons. policy may be set to NULL if no validation is required.
 *
 * Returns 0 on success or a negative error code.
 */
int nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head, int len,
	      const struct nla_policy *policy)
{
	struct nlattr *nla;
	int rem, err;

	memset(tb, 0, sizeof(struct nlattr *) * (maxtype + 1));

	nla_for_each_attr(nla, head, len, rem) {
		__u16 type = nla_type(nla);

		if (type > 0 && type <= maxtype) {
			if (policy) {
				err = validate_nla(nla, maxtype, policy);
				if (err < 0)
					goto errout;
			}

			tb[type] = nla;
		}
	}

	if (unlikely(rem > 0))
		dbg(1, "netlink: %d bytes leftover after parsing "
		       "attributes.\n", rem);

	err = 0;
errout:
	if (err)
		dbg(1, "netlink: policy violation t:%d[%x] e:%d\n",
				nla_type(nla), nla->nla_type, err);
	return err;
}

/**
 * nla_find - Find a specific attribute in a stream of attributes
 * @head: head of attribute stream
 * @len: length of attribute stream
 * @attrtype: type of attribute to look for
 *
 * Returns the first attribute in the stream matching the specified type.
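 *
 * Usage sketch (illustrative only, not part of the original source):
 * nlh is assumed to point at a received generic netlink message and
 * family_id is a hypothetical local variable. The helpers used here
 * are the ones genl_connect_to_family() already calls:
 *
 *	struct nlattr *a = nla_find(nlmsg_attrdata(nlh, GENL_HDRLEN),
 *				    nlmsg_attrlen(nlh, GENL_HDRLEN),
 *				    CTRL_ATTR_FAMILY_ID);
 *	if (a)
 *		family_id = nla_get_u16(a);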
*/ struct nlattr *nla_find(struct nlattr *head, int len, int attrtype) { struct nlattr *nla; int rem; nla_for_each_attr(nla, head, len, rem) if (nla_type(nla) == attrtype) return nla; return NULL; } /** * nla_strlcpy - Copy string attribute payload into a sized buffer * @dst: where to copy the string to * @nla: attribute to copy the string from * @dstsize: size of destination buffer * * Copies at most dstsize - 1 bytes into the destination buffer. * The result is always a valid NUL-terminated string. Unlike * strlcpy the destination buffer is always padded out. * * Returns the length of the source buffer. */ size_t nla_strlcpy(char *dst, const struct nlattr *nla, size_t dstsize) { size_t srclen = nla_len(nla); char *src = nla_data(nla); if (srclen > 0 && src[srclen - 1] == '\0') srclen--; if (dstsize > 0) { size_t len = (srclen >= dstsize) ? dstsize - 1 : srclen; memset(dst, 0, dstsize); memcpy(dst, src, len); } return srclen; } /** * nla_memcpy - Copy a netlink attribute into another memory area * @dest: where to copy to memcpy * @src: netlink attribute to copy from * @count: size of the destination area * * Note: The number of bytes copied is limited by the length of * attribute's payload. memcpy * * Returns the number of bytes copied. 
*/ int nla_memcpy(void *dest, const struct nlattr *src, int count) { int minlen = min_t(int, count, nla_len(src)); memcpy(dest, nla_data(src), minlen); return minlen; } /** * nla_memcmp - Compare an attribute with sized memory area * @nla: netlink attribute * @data: memory area * @size: size of memory area */ int nla_memcmp(const struct nlattr *nla, const void *data, size_t size) { int d = nla_len(nla) - size; if (d == 0) d = memcmp(nla_data(nla), data, size); return d; } /** * nla_strcmp - Compare a string attribute against a string * @nla: netlink string attribute * @str: another string */ int nla_strcmp(const struct nlattr *nla, const char *str) { int len = strlen(str) + 1; int d = nla_len(nla) - len; if (d == 0) d = memcmp(nla_data(nla), str, len); return d; } /** * __nla_reserve - reserve room for attribute on the msg * @msg: message buffer to reserve room on * @attrtype: attribute type * @attrlen: length of attribute payload * * Adds a netlink attribute header to a message buffer and reserves * room for the payload but does not copy it. * * The caller is responsible to ensure that the msg provides enough * tailroom for the attribute header and payload. */ struct nlattr *__nla_reserve(struct msg_buff *msg, int attrtype, int attrlen) { struct nlattr *nla; nla = (struct nlattr *) msg_put(msg, nla_total_size(attrlen)); nla->nla_type = attrtype; nla->nla_len = nla_attr_size(attrlen); memset((unsigned char *) nla + nla->nla_len, 0, nla_padlen(attrlen)); return nla; } /** * __nla_reserve_nohdr - reserve room for attribute without header * @msg: message buffer to reserve room on * @attrlen: length of attribute payload * * Reserves room for attribute payload without a header. * * The caller is responsible to ensure that the msg provides enough * tailroom for the payload. 
*/ void *__nla_reserve_nohdr(struct msg_buff *msg, int attrlen) { void *start; start = msg_put(msg, NLA_ALIGN(attrlen)); memset(start, 0, NLA_ALIGN(attrlen)); return start; } /** * nla_reserve - reserve room for attribute on the msg * @msg: message buffer to reserve room on * @attrtype: attribute type * @attrlen: length of attribute payload * * Adds a netlink attribute header to a message buffer and reserves * room for the payload but does not copy it. * * Returns NULL if the tailroom of the msg is insufficient to store * the attribute header and payload. */ struct nlattr *nla_reserve(struct msg_buff *msg, int attrtype, int attrlen) { if (unlikely(msg_tailroom(msg) < nla_total_size(attrlen))) return NULL; return __nla_reserve(msg, attrtype, attrlen); } /** * nla_reserve_nohdr - reserve room for attribute without header * @msg: message buffer to reserve room on * @attrlen: length of attribute payload * * Reserves room for attribute payload without a header. * * Returns NULL if the tailroom of the msg is insufficient to store * the attribute payload. */ void *nla_reserve_nohdr(struct msg_buff *msg, int attrlen) { if (unlikely(msg_tailroom(msg) < NLA_ALIGN(attrlen))) return NULL; return __nla_reserve_nohdr(msg, attrlen); } /** * __nla_put - Add a netlink attribute to a message buffer * @msg: message buffer to add attribute to * @attrtype: attribute type * @attrlen: length of attribute payload * @data: head of attribute payload * * The caller is responsible to ensure that the msg provides enough * tailroom for the attribute header and payload. 
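 *
 * Usage sketch (illustrative only, not part of the original source):
 * the caller checks the tailroom first, which mirrors what the checked
 * variant nla_put() does before delegating to __nla_put():
 *
 *	if (msg_tailroom(msg) < nla_total_size(attrlen))
 *		return -EMSGSIZE;
 *	__nla_put(msg, attrtype, attrlen, data);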
*/ void __nla_put(struct msg_buff *msg, int attrtype, int attrlen, const void *data) { struct nlattr *nla; nla = __nla_reserve(msg, attrtype, attrlen); memcpy(nla_data(nla), data, attrlen); } /** * __nla_put_nohdr - Add a netlink attribute without header * @msg: message buffer to add attribute to * @attrlen: length of attribute payload * @data: head of attribute payload * * The caller is responsible to ensure that the msg provides enough * tailroom for the attribute payload. */ void __nla_put_nohdr(struct msg_buff *msg, int attrlen, const void *data) { void *start; start = __nla_reserve_nohdr(msg, attrlen); memcpy(start, data, attrlen); } /** * nla_put - Add a netlink attribute to a message buffer * @msg: message buffer to add attribute to * @attrtype: attribute type * @attrlen: length of attribute payload * @data: head of attribute payload * * Returns -EMSGSIZE if the tailroom of the msg is insufficient to store * the attribute header and payload. */ int nla_put(struct msg_buff *msg, int attrtype, int attrlen, const void *data) { if (unlikely(msg_tailroom(msg) < nla_total_size(attrlen))) return -EMSGSIZE; __nla_put(msg, attrtype, attrlen, data); return 0; } /** * nla_put_nohdr - Add a netlink attribute without header * @msg: message buffer to add attribute to * @attrlen: length of attribute payload * @data: head of attribute payload * * Returns -EMSGSIZE if the tailroom of the msg is insufficient to store * the attribute payload. */ int nla_put_nohdr(struct msg_buff *msg, int attrlen, const void *data) { if (unlikely(msg_tailroom(msg) < NLA_ALIGN(attrlen))) return -EMSGSIZE; __nla_put_nohdr(msg, attrlen, data); return 0; } /** * nla_append - Add a netlink attribute without header or padding * @msg: message buffer to add attribute to * @attrlen: length of attribute payload * @data: head of attribute payload * * Returns -EMSGSIZE if the tailroom of the msg is insufficient to store * the attribute payload. 
*/ int nla_append(struct msg_buff *msg, int attrlen, const void *data) { if (unlikely(msg_tailroom(msg) < NLA_ALIGN(attrlen))) return -EMSGSIZE; memcpy(msg_put(msg, attrlen), data, attrlen); return 0; } drbd-8.4.4/user/libgenl.h0000664000000000000000000007404012216604252013712 0ustar rootroot#ifndef LIBGENL_H #define LIBGENL_H /* * stripped down copy of * linux-2.6.32/include/net/netlink.h and * linux-2.6.32/include/net/genetlink.h * * sk_buff -> "msg_buff" */ #include #include #include #include #include #include #include #include #include #ifndef SOL_NETLINK #define SOL_NETLINK 270 #endif #define DEBUG_LEVEL 1 #define dbg(lvl, fmt, arg...) \ do { \ if (lvl <= DEBUG_LEVEL) \ fprintf(stderr, "<%d>" fmt "\n", \ lvl , ##arg); \ } while (0) #define BUG_ON(cond) \ do { \ int __cond = (cond); \ if (!__cond) \ break; \ fprintf(stderr, "BUG: %s:%d: %s == %u\n", \ __FILE__, __LINE__, \ #cond, __cond); \ abort(); \ } while (0) #define min_t(type, x, y) ({ \ type __min1 = (x); \ type __min2 = (y); \ __min1 < __min2 ? 
__min1: __min2; }) #ifndef __unused #define __unused __attribute((unused)) #endif #ifndef __read_mostly #define __read_mostly #endif #ifndef unlikely #define unlikely(arg) (arg) #endif #ifndef ARRAY_SIZE #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])) #endif struct msg_buff { /* housekeeping */ unsigned char *tail; unsigned char *end; /* start of data to be send(), * or received into */ unsigned char data[0]; }; #define DEFAULT_MSG_SIZE 8192 static inline int msg_tailroom(struct msg_buff *msg) { return msg->end - msg->tail; } static inline struct msg_buff *msg_new(size_t size) { struct msg_buff *m = calloc(1, sizeof(*m) + size); if (!m) return NULL; m->tail = m->data; m->end = m->tail + size; return m; } static inline void msg_free(struct msg_buff *m) { free(m); } static inline void *msg_put(struct msg_buff *msg, unsigned int len) { void *tmp = msg->tail; msg->tail += len; BUG_ON(msg->tail > msg->end); return (void*)tmp; } /* ======================================================================== * Netlink Messages and Attributes Interface (As Seen On TV) * ------------------------------------------------------------------------ * Messages Interface * ------------------------------------------------------------------------ * * Message Format: * <--- nlmsg_total_size(payload) ---> * <-- nlmsg_msg_size(payload) -> * +----------+- - -+-------------+- - -+-------- - - * | nlmsghdr | Pad | Payload | Pad | nlmsghdr * +----------+- - -+-------------+- - -+-------- - - * nlmsg_data(nlh)---^ ^ * nlmsg_next(nlh)-----------------------+ * * Payload Format: * <---------------------- nlmsg_len(nlh) ---------------------> * <------ hdrlen ------> <- nlmsg_attrlen(nlh, hdrlen) -> * +----------------------+- - -+--------------------------------+ * | Family Header | Pad | Attributes | * +----------------------+- - -+--------------------------------+ * nlmsg_attrdata(nlh, hdrlen)---^ * * Data Structures: * struct nlmsghdr netlink message header * * Message Construction: * 
nlmsg_new() create a new netlink message * nlmsg_put() add a netlink message to an msg * nlmsg_put_answer() callback based nlmsg_put() * nlmsg_end() finalize netlink message * nlmsg_get_pos() return current position in message * nlmsg_trim() trim part of message * nlmsg_cancel() cancel message construction * nlmsg_free() free a netlink message * * Message Sending: * nlmsg_multicast() multicast message to several groups * nlmsg_unicast() unicast a message to a single socket * nlmsg_notify() send notification message * * Message Length Calculations: * nlmsg_msg_size(payload) length of message w/o padding * nlmsg_total_size(payload) length of message w/ padding * nlmsg_padlen(payload) length of padding at tail * * Message Payload Access: * nlmsg_data(nlh) head of message payload * nlmsg_len(nlh) length of message payload * nlmsg_attrdata(nlh, hdrlen) head of attributes data * nlmsg_attrlen(nlh, hdrlen) length of attributes data * * Message Parsing: * nlmsg_ok(nlh, remaining) does nlh fit into remaining bytes? * nlmsg_next(nlh, remaining) get next netlink message * nlmsg_parse() parse attributes of a message * nlmsg_find_attr() find an attribute in a message * nlmsg_for_each_msg() loop over all messages * nlmsg_validate() validate netlink message incl. attrs * nlmsg_for_each_attr() loop over all attributes * * Misc: * nlmsg_report() report back to application? 
* * ------------------------------------------------------------------------ * Attributes Interface * ------------------------------------------------------------------------ * * Attribute Format: * <------- nla_total_size(payload) -------> * <---- nla_attr_size(payload) -----> * +----------+- - -+- - - - - - - - - +- - -+-------- - - * | Header | Pad | Payload | Pad | Header * +----------+- - -+- - - - - - - - - +- - -+-------- - - * <- nla_len(nla) -> ^ * nla_data(nla)----^ | * nla_next(nla)-----------------------------' * * Data Structures: * struct nlattr netlink attribute header * * Attribute Construction: * nla_reserve(msg, type, len) reserve room for an attribute * nla_reserve_nohdr(msg, len) reserve room for an attribute w/o hdr * nla_put(msg, type, len, data) add attribute to msg * nla_put_nohdr(msg, len, data) add attribute w/o hdr * nla_append(msg, len, data) append data to msg * * Attribute Construction for Basic Types: * nla_put_u8(msg, type, value) add u8 attribute to msg * nla_put_u16(msg, type, value) add u16 attribute to msg * nla_put_u32(msg, type, value) add u32 attribute to msg * nla_put_u64(msg, type, value) add u64 attribute to msg * nla_put_string(msg, type, str) add string attribute to msg * nla_put_flag(msg, type) add flag attribute to msg * nla_put_msecs(msg, type, jiffies) add msecs attribute to msg * * Exceptions Based Attribute Construction: * NLA_PUT(msg, type, len, data) add attribute to msg * NLA_PUT_U8(msg, type, value) add u8 attribute to msg * NLA_PUT_U16(msg, type, value) add u16 attribute to msg * NLA_PUT_U32(msg, type, value) add u32 attribute to msg * NLA_PUT_U64(msg, type, value) add u64 attribute to msg * NLA_PUT_STRING(msg, type, str) add string attribute to msg * NLA_PUT_FLAG(msg, type) add flag attribute to msg * NLA_PUT_MSECS(msg, type, jiffies) add msecs attribute to msg * * The meaning of these functions is equal to their lower case * variants but they jump to the label nla_put_failure in case * of a failure. 
* * Nested Attributes Construction: * nla_nest_start(msg, type) start a nested attribute * nla_nest_end(msg, nla) finalize a nested attribute * nla_nest_cancel(msg, nla) cancel nested attribute construction * * Attribute Length Calculations: * nla_attr_size(payload) length of attribute w/o padding * nla_total_size(payload) length of attribute w/ padding * nla_padlen(payload) length of padding * * Attribute Payload Access: * nla_data(nla) head of attribute payload * nla_len(nla) length of attribute payload * * Attribute Payload Access for Basic Types: * nla_get_u8(nla) get payload for a u8 attribute * nla_get_u16(nla) get payload for a u16 attribute * nla_get_u32(nla) get payload for a u32 attribute * nla_get_u64(nla) get payload for a u64 attribute * nla_get_flag(nla) return 1 if flag is true * nla_get_msecs(nla) get payload for a msecs attribute * * Attribute Misc: * nla_memcpy(dest, nla, count) copy attribute into memory * nla_memcmp(nla, data, size) compare attribute with memory area * nla_strlcpy(dst, nla, size) copy attribute to a sized string * nla_strcmp(nla, str) compare attribute with string * * Attribute Parsing: * nla_ok(nla, remaining) does nla fit into remaining bytes? 
* nla_next(nla, remaining) get next netlink attribute * nla_validate() validate a stream of attributes * nla_validate_nested() validate a stream of nested attributes * nla_find() find attribute in stream of attributes * nla_find_nested() find attribute in nested attributes * nla_parse() parse and validate stream of attrs * nla_parse_nested() parse nested attributes * nla_for_each_attr() loop over all attributes * nla_for_each_nested() loop over the nested attributes *========================================================================= */ /** * Standard attribute types to specify validation policy */ enum { NLA_UNSPEC, NLA_U8, NLA_U16, NLA_U32, NLA_U64, NLA_STRING, NLA_FLAG, NLA_MSECS, NLA_NESTED, NLA_NESTED_COMPAT, NLA_NUL_STRING, NLA_BINARY, __NLA_TYPE_MAX, }; #define NLA_TYPE_MAX (__NLA_TYPE_MAX - 1) /** * struct nla_policy - attribute validation policy * @type: Type of attribute or NLA_UNSPEC * @len: Type specific length of payload * * Policies are defined as arrays of this struct, the array must be * accessible by attribute type up to the highest identifier to be expected. 
* * Meaning of `len' field: * NLA_STRING Maximum length of string * NLA_NUL_STRING Maximum length of string (excluding NUL) * NLA_FLAG Unused * NLA_BINARY Maximum length of attribute payload * NLA_NESTED_COMPAT Exact length of structure payload * All other Exact length of attribute payload * * Example: * static struct nla_policy my_policy[ATTR_MAX+1] __read_mostly = { * [ATTR_FOO] = { .type = NLA_U16 }, * [ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ }, * [ATTR_BAZ] = { .len = sizeof(struct mystruct) }, * }; */ struct nla_policy { __u16 type; __u16 len; }; extern int nla_validate(struct nlattr *head, int len, int maxtype, const struct nla_policy *policy); extern int nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head, int len, const struct nla_policy *policy); extern int nla_policy_len(const struct nla_policy *, int); extern struct nlattr * nla_find(struct nlattr *head, int len, int attrtype); extern size_t nla_strlcpy(char *dst, const struct nlattr *nla, size_t dstsize); extern int nla_memcpy(void *dest, const struct nlattr *src, int count); extern int nla_memcmp(const struct nlattr *nla, const void *data, size_t size); extern int nla_strcmp(const struct nlattr *nla, const char *str); extern struct nlattr * __nla_reserve(struct msg_buff *msg, int attrtype, int attrlen); extern void * __nla_reserve_nohdr(struct msg_buff *msg, int attrlen); extern struct nlattr * nla_reserve(struct msg_buff *msg, int attrtype, int attrlen); extern void * nla_reserve_nohdr(struct msg_buff *msg, int attrlen); extern void __nla_put(struct msg_buff *msg, int attrtype, int attrlen, const void *data); extern void __nla_put_nohdr(struct msg_buff *msg, int attrlen, const void *data); extern int nla_put(struct msg_buff *msg, int attrtype, int attrlen, const void *data); extern int nla_put_nohdr(struct msg_buff *msg, int attrlen, const void *data); extern int nla_append(struct msg_buff *msg, int attrlen, const void *data); 
/************************************************************************** * Netlink Messages **************************************************************************/ /** * nlmsg_msg_size - length of netlink message not including padding * @payload: length of message payload */ static inline int nlmsg_msg_size(int payload) { return NLMSG_HDRLEN + payload; } /** * nlmsg_total_size - length of netlink message including padding * @payload: length of message payload */ static inline int nlmsg_total_size(int payload) { return NLMSG_ALIGN(nlmsg_msg_size(payload)); } /** * nlmsg_padlen - length of padding at the message's tail * @payload: length of message payload */ static inline int nlmsg_padlen(int payload) { return nlmsg_total_size(payload) - nlmsg_msg_size(payload); } /** * nlmsg_data - head of message payload * @nlh: netlink message header */ static inline void *nlmsg_data(const struct nlmsghdr *nlh) { return (unsigned char *) nlh + NLMSG_HDRLEN; } /** * nlmsg_len - length of message payload * @nlh: netlink message header */ static inline int nlmsg_len(const struct nlmsghdr *nlh) { return nlh->nlmsg_len - NLMSG_HDRLEN; } /** * nlmsg_attrdata - head of attributes data * @nlh: netlink message header * @hdrlen: length of family specific header */ static inline struct nlattr *nlmsg_attrdata(const struct nlmsghdr *nlh, int hdrlen) { unsigned char *data = nlmsg_data(nlh); return (struct nlattr *) (data + NLMSG_ALIGN(hdrlen)); } /** * nlmsg_attrlen - length of attributes data * @nlh: netlink message header * @hdrlen: length of family specific header */ static inline int nlmsg_attrlen(const struct nlmsghdr *nlh, int hdrlen) { return nlmsg_len(nlh) - NLMSG_ALIGN(hdrlen); } /** * nlmsg_ok - check if the netlink message fits into the remaining bytes * @nlh: netlink message header * @remaining: number of bytes remaining in message stream */ static inline int nlmsg_ok(const struct nlmsghdr *nlh, int remaining) { return (remaining >= (int) sizeof(struct nlmsghdr) && 
nlh->nlmsg_len >= sizeof(struct nlmsghdr) && nlh->nlmsg_len <= (__u32)remaining); } /** * nlmsg_next - next netlink message in message stream * @nlh: netlink message header * @remaining: number of bytes remaining in message stream * * Returns the next netlink message in the message stream and * decrements remaining by the size of the current message. */ static inline struct nlmsghdr *nlmsg_next(struct nlmsghdr *nlh, int *remaining) { int totlen = NLMSG_ALIGN(nlh->nlmsg_len); *remaining -= totlen; return (struct nlmsghdr *) ((unsigned char *) nlh + totlen); } /** * nlmsg_parse - parse attributes of a netlink message * @nlh: netlink message header * @hdrlen: length of family specific header * @tb: destination array with maxtype+1 elements * @maxtype: maximum attribute type to be expected * @policy: validation policy * * See nla_parse() */ static inline int nlmsg_parse(const struct nlmsghdr *nlh, int hdrlen, struct nlattr *tb[], int maxtype, const struct nla_policy *policy) { if (nlh->nlmsg_len < (__u32)nlmsg_msg_size(hdrlen)) return -EINVAL; return nla_parse(tb, maxtype, nlmsg_attrdata(nlh, hdrlen), nlmsg_attrlen(nlh, hdrlen), policy); } /** * nlmsg_find_attr - find a specific attribute in a netlink message * @nlh: netlink message header * @hdrlen: length of family specific header * @attrtype: type of attribute to look for * * Returns the first attribute which matches the specified type. 
*/ static inline struct nlattr *nlmsg_find_attr(struct nlmsghdr *nlh, int hdrlen, int attrtype) { return nla_find(nlmsg_attrdata(nlh, hdrlen), nlmsg_attrlen(nlh, hdrlen), attrtype); } /** * nlmsg_validate - validate a netlink message including attributes * @nlh: netlink message header * @hdrlen: length of family specific header * @maxtype: maximum attribute type to be expected * @policy: validation policy */ static inline int nlmsg_validate(struct nlmsghdr *nlh, int hdrlen, int maxtype, const struct nla_policy *policy) { if (nlh->nlmsg_len < (__u32)nlmsg_msg_size(hdrlen)) return -EINVAL; return nla_validate(nlmsg_attrdata(nlh, hdrlen), nlmsg_attrlen(nlh, hdrlen), maxtype, policy); } /** * nlmsg_report - need to report back to application? * @nlh: netlink message header * * Returns 1 if a report back to the application is requested. */ static inline int nlmsg_report(const struct nlmsghdr *nlh) { return !!(nlh->nlmsg_flags & NLM_F_ECHO); } /** * nlmsg_for_each_attr - iterate over a stream of attributes * @pos: loop counter, set to current attribute * @nlh: netlink message header * @hdrlen: length of family specific header * @rem: initialized to len, holds bytes currently remaining in stream */ #define nlmsg_for_each_attr(pos, nlh, hdrlen, rem) \ nla_for_each_attr(pos, nlmsg_attrdata(nlh, hdrlen), \ nlmsg_attrlen(nlh, hdrlen), rem) /** * nlmsg_for_each_msg - iterate over a stream of messages * @pos: loop counter, set to current message * @head: head of message stream * @len: length of message stream * @rem: initialized to len, holds bytes currently remaining in stream */ #define nlmsg_for_each_msg(pos, head, len, rem) \ for (pos = head, rem = len; \ nlmsg_ok(pos, rem); \ pos = nlmsg_next(pos, &(rem))) /************************************************************************** * Netlink Attributes **************************************************************************/ /** * nla_attr_size - length of attribute not including padding * @payload: length of payload 
*/ static inline int nla_attr_size(int payload) { return NLA_HDRLEN + payload; } /** * nla_total_size - total length of attribute including padding * @payload: length of payload */ static inline int nla_total_size(int payload) { return NLA_ALIGN(nla_attr_size(payload)); } /** * nla_padlen - length of padding at the tail of attribute * @payload: length of payload */ static inline int nla_padlen(int payload) { return nla_total_size(payload) - nla_attr_size(payload); } #ifndef NLA_TYPE_MASK #define NLA_TYPE_MASK ~0 #endif /** * nla_type - attribute type * @nla: netlink attribute */ static inline int nla_type(const struct nlattr *nla) { return nla->nla_type & NLA_TYPE_MASK; } /** * nla_data - head of payload * @nla: netlink attribute */ static inline void *nla_data(const struct nlattr *nla) { return (char *) nla + NLA_HDRLEN; } /** * nla_len - length of payload * @nla: netlink attribute */ static inline int nla_len(const struct nlattr *nla) { return nla->nla_len - NLA_HDRLEN; } /** * nla_ok - check if the netlink attribute fits into the remaining bytes * @nla: netlink attribute * @remaining: number of bytes remaining in attribute stream */ static inline int nla_ok(const struct nlattr *nla, int remaining) { return remaining >= (int) sizeof(*nla) && nla->nla_len >= sizeof(*nla) && nla->nla_len <= remaining; } /** * nla_next - next netlink attribute in attribute stream * @nla: netlink attribute * @remaining: number of bytes remaining in attribute stream * * Returns the next netlink attribute in the attribute stream and * decrements remaining by the size of the current attribute. 
*/ static inline struct nlattr *nla_next(const struct nlattr *nla, int *remaining) { int totlen = NLA_ALIGN(nla->nla_len); *remaining -= totlen; return (struct nlattr *) ((char *) nla + totlen); } /** * nla_find_nested - find attribute in a set of nested attributes * @nla: attribute containing the nested attributes * @attrtype: type of attribute to look for * * Returns the first attribute which matches the specified type. */ static inline struct nlattr *nla_find_nested(struct nlattr *nla, int attrtype) { return nla_find(nla_data(nla), nla_len(nla), attrtype); } /** * nla_parse_nested - parse nested attributes * @tb: destination array with maxtype+1 elements * @maxtype: maximum attribute type to be expected * @nla: attribute containing the nested attributes * @policy: validation policy * * See nla_parse() */ static inline int nla_parse_nested(struct nlattr *tb[], int maxtype, const struct nlattr *nla, const struct nla_policy *policy) { return nla_parse(tb, maxtype, nla_data(nla), nla_len(nla), policy); } /** * nla_put_u8 - Add a u8 netlink attribute to a message buffer * @msg: message buffer to add attribute to * @attrtype: attribute type * @value: numeric value */ static inline int nla_put_u8(struct msg_buff *msg, int attrtype, __u8 value) { return nla_put(msg, attrtype, sizeof(__u8), &value); } /** * nla_put_u16 - Add a u16 netlink attribute to a message buffer * @msg: message buffer to add attribute to * @attrtype: attribute type * @value: numeric value */ static inline int nla_put_u16(struct msg_buff *msg, int attrtype, __u16 value) { return nla_put(msg, attrtype, sizeof(__u16), &value); } /** * nla_put_u32 - Add a u32 netlink attribute to a message buffer * @msg: message buffer to add attribute to * @attrtype: attribute type * @value: numeric value */ static inline int nla_put_u32(struct msg_buff *msg, int attrtype, __u32 value) { return nla_put(msg, attrtype, sizeof(__u32), &value); } /** * nla_put_u64 - Add a u64 netlink attribute to a message buffer * @msg: 
message buffer to add attribute to * @attrtype: attribute type * @value: numeric value */ static inline int nla_put_u64(struct msg_buff *msg, int attrtype, __u64 value) { return nla_put(msg, attrtype, sizeof(__u64), &value); } /** * nla_put_string - Add a string netlink attribute to a message buffer * @msg: message buffer to add attribute to * @attrtype: attribute type * @str: NUL terminated string */ static inline int nla_put_string(struct msg_buff *msg, int attrtype, const char *str) { return nla_put(msg, attrtype, strlen(str) + 1, str); } /** * nla_put_flag - Add a flag netlink attribute to a message buffer * @msg: message buffer to add attribute to * @attrtype: attribute type */ static inline int nla_put_flag(struct msg_buff *msg, int attrtype) { return nla_put(msg, attrtype, 0, NULL); } #define NLA_PUT(msg, attrtype, attrlen, data) \ do { \ if (unlikely(nla_put(msg, attrtype, attrlen, data) < 0)) \ goto nla_put_failure; \ } while(0) #define NLA_PUT_TYPE(msg, type, attrtype, value) \ do { \ type __tmp = value; \ NLA_PUT(msg, attrtype, sizeof(type), &__tmp); \ } while(0) #define NLA_PUT_U8(msg, attrtype, value) \ NLA_PUT_TYPE(msg, __u8, attrtype, value) #define NLA_PUT_U16(msg, attrtype, value) \ NLA_PUT_TYPE(msg, __u16, attrtype, value) #define NLA_PUT_LE16(msg, attrtype, value) \ NLA_PUT_TYPE(msg, __le16, attrtype, value) #define NLA_PUT_BE16(msg, attrtype, value) \ NLA_PUT_TYPE(msg, __be16, attrtype, value) #define NLA_PUT_U32(msg, attrtype, value) \ NLA_PUT_TYPE(msg, __u32, attrtype, value) #define NLA_PUT_BE32(msg, attrtype, value) \ NLA_PUT_TYPE(msg, __be32, attrtype, value) #define NLA_PUT_U64(msg, attrtype, value) \ NLA_PUT_TYPE(msg, __u64, attrtype, value) #define NLA_PUT_BE64(msg, attrtype, value) \ NLA_PUT_TYPE(msg, __be64, attrtype, value) #define NLA_PUT_STRING(msg, attrtype, value) \ NLA_PUT(msg, attrtype, strlen(value) + 1, value) #define NLA_PUT_FLAG(msg, attrtype) \ NLA_PUT(msg, attrtype, 0, NULL) /** * nla_get_u32 - return payload of u32 
attribute * @nla: u32 netlink attribute */ static inline __u32 nla_get_u32(const struct nlattr *nla) { return *(__u32 *) nla_data(nla); } /** * nla_get_be32 - return payload of __be32 attribute * @nla: __be32 netlink attribute */ static inline __be32 nla_get_be32(const struct nlattr *nla) { return *(__be32 *) nla_data(nla); } /** * nla_get_u16 - return payload of u16 attribute * @nla: u16 netlink attribute */ static inline __u16 nla_get_u16(const struct nlattr *nla) { return *(__u16 *) nla_data(nla); } /** * nla_get_be16 - return payload of __be16 attribute * @nla: __be16 netlink attribute */ static inline __be16 nla_get_be16(const struct nlattr *nla) { return *(__be16 *) nla_data(nla); } /** * nla_get_le16 - return payload of __le16 attribute * @nla: __le16 netlink attribute */ static inline __le16 nla_get_le16(const struct nlattr *nla) { return *(__le16 *) nla_data(nla); } /** * nla_get_u8 - return payload of u8 attribute * @nla: u8 netlink attribute */ static inline __u8 nla_get_u8(const struct nlattr *nla) { return *(__u8 *) nla_data(nla); } /** * nla_get_u64 - return payload of u64 attribute * @nla: u64 netlink attribute */ static inline __u64 nla_get_u64(const struct nlattr *nla) { __u64 tmp; nla_memcpy(&tmp, nla, sizeof(tmp)); return tmp; } /** * nla_get_be64 - return payload of __be64 attribute * @nla: __be64 netlink attribute */ static inline __be64 nla_get_be64(const struct nlattr *nla) { return *(__be64 *) nla_data(nla); } /** * nla_get_flag - return payload of flag attribute * @nla: flag netlink attribute */ static inline int nla_get_flag(const struct nlattr *nla) { return !!nla; } /** * nla_nest_start - Start a new level of nested attributes * @msg: message buffer to add attributes to * @attrtype: attribute type of container * * Returns the container attribute */ static inline struct nlattr *nla_nest_start(struct msg_buff *msg, int attrtype) { struct nlattr *start = (struct nlattr *)msg->tail; if (nla_put(msg, attrtype, 0, NULL) < 0) return NULL; 
return start; } /** * nla_nest_end - Finalize nesting of attributes * @msg: message buffer the attributes are stored in * @start: container attribute * * Corrects the container attribute header to include all the * appended attributes. * * Returns the total data length of the msg. */ static inline int nla_nest_end(struct msg_buff *msg, struct nlattr *start) { start->nla_len = msg->tail - (unsigned char *)start; return msg->tail - msg->data; } /** * nla_validate_nested - Validate a stream of nested attributes * @start: container attribute * @maxtype: maximum attribute type to be expected * @policy: validation policy * * Validates all attributes in the nested attribute stream against the * specified policy. Attributes with a type exceeding maxtype will be * ignored. See documentation of struct nla_policy for more details. * * Returns 0 on success or a negative error code. */ static inline int nla_validate_nested(struct nlattr *start, int maxtype, const struct nla_policy *policy) { return nla_validate(nla_data(start), nla_len(start), maxtype, policy); } /** * nla_for_each_attr - iterate over a stream of attributes * @pos: loop counter, set to current attribute * @head: head of attribute stream * @len: length of attribute stream * @rem: initialized to len, holds bytes currently remaining in stream */ #define nla_for_each_attr(pos, head, len, rem) \ for (pos = head, rem = len; \ nla_ok(pos, rem); \ pos = nla_next(pos, &(rem))) /** * nla_for_each_nested - iterate over nested attributes * @pos: loop counter, set to current attribute * @nla: attribute containing the nested attributes * @rem: initialized to len, holds bytes currently remaining in stream */ #define nla_for_each_nested(pos, nla, rem) \ nla_for_each_attr(pos, nla_data(nla), nla_len(nla), rem) /** * struct genl_multicast_group - generic netlink multicast group * @name: name of the multicast group, names are per-family * @id: multicast group ID, assigned by the core, to use with * genlmsg_multicast(). 
*/ struct genl_multicast_group { char name[GENL_NAMSIZ]; __u32 id; }; /** * struct genl_family - generic netlink family * @id: protocol family identifier * @hdrsize: length of user specific header in bytes * @name: name of family * @version: protocol version * @maxattr: maximum number of attributes supported * @attrbuf: buffer to store parsed attributes * @ops_list: list of all assigned operations * @mc_groups: multicast groups list */ struct genl_family { unsigned int id; unsigned int hdrsize; char name[GENL_NAMSIZ]; unsigned int version; unsigned int maxattr; /* 32 should be enough for most genl families */ struct genl_multicast_group mc_groups[32]; __u32 nl_groups; }; /** * struct genl_info - receiving information * @seq: sending sequence number * @nlhdr: netlink message header * @genlhdr: generic netlink message header * @userhdr: user specific header * @attrs: netlink attributes */ struct genl_info { __u32 seq; struct nlmsghdr * nlhdr; struct genlmsghdr * genlhdr; void * userhdr; struct nlattr ** attrs; }; /** * genlmsg_put - Add generic netlink header to netlink message * @msg: message buffer holding the message * @family: generic netlink family * @flags: netlink message flags * @cmd: generic netlink command * * Returns pointer to user specific header */ static inline void *genlmsg_put(struct msg_buff *msg, struct genl_family *family, int flags, __u8 cmd) { const unsigned hdrsize = NLMSG_HDRLEN + GENL_HDRLEN + family->hdrsize; struct nlmsghdr *nlh; struct genlmsghdr *hdr; if (unlikely(msg_tailroom(msg) < nlmsg_total_size(hdrsize))) return NULL; nlh = msg_put(msg, hdrsize); nlh->nlmsg_type = family->id; nlh->nlmsg_flags = flags; /* pid and seq will be reassigned in genl_send() */ nlh->nlmsg_pid = 0; nlh->nlmsg_seq = 0; hdr = nlmsg_data(nlh); hdr->cmd = cmd; hdr->version = family->version; /* truncated to u8! 
*/ hdr->reserved = 0; return (char *) hdr + GENL_HDRLEN; } /** * genlmsg_data - head of message payload * @gnlh: genetlink message header */ static inline void *genlmsg_data(const struct genlmsghdr *gnlh) { return ((unsigned char *) gnlh + GENL_HDRLEN); } /** * genlmsg_len - length of message payload * @gnlh: genetlink message header */ static inline int genlmsg_len(const struct genlmsghdr *gnlh) { struct nlmsghdr *nlh = (struct nlmsghdr *)((unsigned char *)gnlh - NLMSG_HDRLEN); return (nlh->nlmsg_len - GENL_HDRLEN - NLMSG_HDRLEN); } /** * genlmsg_msg_size - length of genetlink message not including padding * @payload: length of message payload */ static inline int genlmsg_msg_size(int payload) { return GENL_HDRLEN + payload; } /** * genlmsg_total_size - length of genetlink message including padding * @payload: length of message payload */ static inline int genlmsg_total_size(int payload) { return NLMSG_ALIGN(genlmsg_msg_size(payload)); } /* * Some helpers to simplify communicating with a particular family */ struct genl_sock { struct sockaddr_nl s_local; struct sockaddr_nl s_peer; int s_fd; unsigned int s_seq_next; unsigned int s_seq_expect; unsigned int s_flags; struct genl_family *s_family; }; extern struct genl_sock *genl_connect_to_family(struct genl_family *family); extern int genl_join_mc_group(struct genl_sock *s, const char *name); extern int genl_send(struct genl_sock *s, struct msg_buff *msg); enum { E_RCV_TIMEDOUT = 0, E_RCV_FAILED, E_RCV_NO_SOURCE_ADDR, E_RCV_SEQ_MISMATCH, E_RCV_MSG_TRUNC, E_RCV_UNEXPECTED_TYPE, E_RCV_NLMSG_DONE, E_RCV_ERROR_REPLY, }; /* returns negative E_RCV_*, or length of message */ extern int genl_recv_msgs(struct genl_sock *s, struct iovec *iov, char **err_desc, int timeout_ms); #endif /* LIBGENL_H */ drbd-8.4.4/user/registry.c0000664000000000000000000001136012216604252014135 0ustar rootroot/* drbdadm_registry.c This file is part of DRBD by Philipp Reisner and Lars Ellenberg. 
It was written by Johannes Thoma Copyright (C) 2002-2008, LINBIT Information Technologies GmbH. Copyright (C) 2002-2008, Philipp Reisner . Copyright (C) 2002-2008, Lars Ellenberg . drbd is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. drbd is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with drbd; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ /* This keeps track of which DRBD minor was configured in which * config file. This is required to have alternative config files * (-c switch) and userland event handlers. */ #include #include #include #include #include #include #include #include #include #include #include "config.h" #include "registry.h" static void linkname_from_minor(char *linkname, int minor) { sprintf(linkname, "%s/drbd-minor-%d.conf", DRBD_RUN_DIR, minor); } int unregister_minor(int minor) { char linkname[PATH_MAX]; linkname_from_minor(linkname, minor); if (unlink(linkname) < 0) { if (errno != ENOENT) { perror("unlink"); return -1; } } return 0; } static ssize_t __readlink(const char *path, char *buf, size_t bufsiz) { ssize_t ret; ret = readlink(path, buf, bufsiz); if (ret >= 0) { if (ret >= bufsiz) { errno = ENAMETOOLONG; return -1; } buf[ret] = 0; } return ret; } static int register_path(const char *linkname, const char *path) { char target[PATH_MAX]; if (path[0] != '/') { fprintf(stderr, "File %s: absolute path expected; won't " "register relative path.", path); return -1; } /* safeguard against symlink loops in DRBD_RUN_DIR */ if (!strncmp(path, DRBD_RUN_DIR "/", strlen(DRBD_RUN_DIR 
"/"))) return -1; if (__readlink(linkname, target, sizeof(target)) >= 0 && !strcmp(target, path)) return 0; if (unlink(linkname) != 0 && errno != ENOENT) { perror(linkname); return -1; } if (mkdir(DRBD_RUN_DIR, S_IRWXU) != 0 && errno != EEXIST) { perror(DRBD_RUN_DIR); return -1; } if (symlink(path, linkname) != 0) { fprintf(stderr, "symlink(%s, %s): %m\n", path, linkname); return -1; } return 0; } int register_minor(int minor, const char *path) { char linkname[PATH_MAX]; linkname_from_minor(linkname, minor); return register_path(linkname, path); } static char *resolve_symlink(const char *linkname) { static char target[PATH_MAX]; if (__readlink(linkname, target, sizeof(target)) < 0) return NULL; return target; } char *lookup_minor(int minor) { static char linkname[PATH_MAX]; struct stat stat_buf; linkname_from_minor(linkname, minor); if (stat(linkname, &stat_buf) != 0) { if (errno != ENOENT) perror(linkname); return NULL; } return resolve_symlink(linkname); } static void linkname_from_resource_name(char *linkname, const char *name) { sprintf(linkname, "%s/drbd-resource-%s.conf", DRBD_RUN_DIR, name); } int unregister_resource(const char *name) { char linkname[PATH_MAX]; linkname_from_resource_name(linkname, name); if (unlink(linkname) != 0) { if (errno != ENOENT) { perror(linkname); return -1; } } return 0; } int register_resource(const char *name, const char *path) { char linkname[PATH_MAX]; linkname_from_resource_name(linkname, name); return register_path(linkname, path); } /* This returns a static buffer containing the real * configuration file known to be used last for this minor. * If you need the return value longer, stuff it away with strdup. 
*/ char *lookup_resource(const char *name) { static char linkname[PATH_MAX]; struct stat stat_buf; linkname_from_resource_name(linkname, name); if (stat(linkname, &stat_buf) != 0) { if (errno != ENOENT) perror(linkname); return NULL; } return resolve_symlink(linkname); } #ifdef TEST int main(int argc, char ** argv) { register_minor(1, "/etc/drbd-xy.conf"); register_minor(15, "/etc/drbd-82.conf"); register_minor(14, "/../../../../../../etc/drbd-82.conf"); printf("Minor 1 is %s.\n", lookup_minor(1)); printf("Minor 2 is %s.\n", lookup_minor(2)); printf("Minor 14 is %s.\n", lookup_minor(14)); printf("Minor 15 is %s.\n", lookup_minor(15)); return 0; } #endif drbd-8.4.4/user/registry.h0000664000000000000000000000056112216604252014143 0ustar rootroot#ifndef __REGISTRY_H #define __REGISTRY_H extern int register_minor(int minor, const char *path); extern int unregister_minor(int minor); extern char *lookup_minor(int minor); extern int unregister_resource(const char *name); extern int register_resource(const char *name, const char *path); extern char *lookup_resource(const char *name); #endif /* __REGISTRY_H */ drbd-8.4.4/user/wrap_printf.c0000664000000000000000000000140412216604252014616 0ustar rootroot#include #include #include #include #include int wrap_printf(int indent, char *format, ...) 
{ static int columns, col; va_list ap1, ap2; int n; const char *nl; if (columns == 0) { struct winsize ws = { }; ioctl(1, TIOCGWINSZ, &ws); columns = ws.ws_col; if (columns <= 0) columns = 80; } va_start(ap1, format); va_copy(ap2, ap1); n = vsnprintf(NULL, 0, format, ap1); va_end(ap1); if (col + n > columns) { putchar('\n'); col = 0; } if (col == 0) { while (*format == ' ') format++; col += indent; while (indent--) putchar(' '); } n = vprintf(format, ap2); va_end(ap2); if (n > 0) col += n; nl = strrchr(format, '\n'); if (nl && nl[1] == 0) col = 0; return n; } drbd-8.4.4/user/wrap_printf.h0000664000000000000000000000020112216604252014615 0ustar rootroot#ifndef __WRAP_PRINTF #define __WRAP_PRINTF extern int wrap_printf(int indent, char *format, ...); #endif /* __WRAP_PRINTF */ drbd-8.4.4/documentation/drbdsetup.80000664000000000000000000014512712226007146016112 0ustar rootroot'\" t .\" Title: drbdsetup .\" Author: [see the "Author" section] .\" Generator: DocBook XSL Stylesheets v1.76.1 .\" Date: 6 May 2011 .\" Manual: System Administration .\" Source: DRBD 8.4.0 .\" Language: English .\" .TH "DRBDSETUP" "8" "6 May 2011" "DRBD 8.4.0" "System Administration" .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .\" http://bugs.debian.org/507673 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" ----------------------------------------------------------------- .\" * set default formatting .\" ----------------------------------------------------------------- .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .\" ----------------------------------------------------------------- .\" * MAIN CONTENT STARTS HERE * .\" 
----------------------------------------------------------------- .SH "NAME" drbdsetup \- Setup tool for DRBD .\" drbdsetup .SH "SYNOPSIS" .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR new\-resource \fIresource\fR [\-\-cpu\-mask\ {\fIval\fR}] [\-\-on\-no\-data\-accessible\ {io\-error\ |\ suspend\-io}] .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR new\-minor \fIresource\fR \fIminor\fR \fIvolume\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR del\-resource \fIresource\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR del\-minor \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR attach \fIminor\fR \fIlower_dev\fR \fImeta_data_dev\fR \fImeta_data_index\fR [\-\-size\ {\fIval\fR}] [\-\-max\-bio\-bvecs\ {\fIval\fR}] [\-\-on\-io\-error\ {pass_on\ |\ call\-local\-io\-error\ |\ detach}] [\-\-fencing\ {dont\-care\ |\ resource\-only\ |\ resource\-and\-stonith}] [\-\-disk\-barrier] [\-\-disk\-flushes] [\-\-disk\-drain] [\-\-md\-flushes] [\-\-resync\-rate\ {\fIval\fR}] [\-\-resync\-after\ {\fIval\fR}] [\-\-al\-extents\ {\fIval\fR}] [\-\-al\-updates] [\-\-c\-plan\-ahead\ {\fIval\fR}] [\-\-c\-delay\-target\ {\fIval\fR}] [\-\-c\-fill\-target\ {\fIval\fR}] [\-\-c\-max\-rate\ {\fIval\fR}] [\-\-c\-min\-rate\ {\fIval\fR}] [\-\-disk\-timeout\ {\fIval\fR}] [\-\-read\-balancing\ {prefer\-local\ |\ prefer\-remote\ |\ round\-robin\ |\ least\-pending\ |\ when\-congested\-remote\ |\ 32K\-striping\ |\ 64K\-striping\ |\ 128K\-striping\ |\ 256K\-striping\ |\ 512K\-striping\ |\ 1M\-striping}] .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR connect \fIresource\fR \fIlocal_addr\fR \fIremote_addr\fR [\-\-tentative] [\-\-discard\-my\-data] [\-\-protocol\ {A\ |\ B\ |\ C}] [\-\-timeout\ {\fIval\fR}] [\-\-max\-epoch\-size\ {\fIval\fR}] [\-\-max\-buffers\ {\fIval\fR}] [\-\-unplug\-watermark\ {\fIval\fR}] [\-\-connect\-int\ {\fIval\fR}] [\-\-ping\-int\ {\fIval\fR}] [\-\-sndbuf\-size\ {\fIval\fR}] [\-\-rcvbuf\-size\ {\fIval\fR}] [\-\-ko\-count\ {\fIval\fR}] [\-\-allow\-two\-primaries] [\-\-cram\-hmac\-alg\ {\fIval\fR}] 
[\-\-shared\-secret\ {\fIval\fR}] [\-\-after\-sb\-0pri\ {disconnect\ |\ discard\-younger\-primary\ |\ discard\-older\-primary\ |\ discard\-zero\-changes\ |\ discard\-least\-changes\ |\ discard\-local\ |\ discard\-remote}] [\-\-after\-sb\-1pri\ {disconnect\ |\ consensus\ |\ discard\-secondary\ |\ call\-pri\-lost\-after\-sb\ |\ violently\-as0p}] [\-\-after\-sb\-2pri\ {disconnect\ |\ call\-pri\-lost\-after\-sb\ |\ violently\-as0p}] [\-\-always\-asbp] [\-\-rr\-conflict\ {disconnect\ |\ call\-pri\-lost\ |\ violently}] [\-\-ping\-timeout\ {\fIval\fR}] [\-\-data\-integrity\-alg\ {\fIval\fR}] [\-\-tcp\-cork] [\-\-on\-congestion\ {block\ |\ pull\-ahead\ |\ disconnect}] [\-\-congestion\-fill\ {\fIval\fR}] [\-\-congestion\-extents\ {\fIval\fR}] [\-\-csums\-alg\ {\fIval\fR}] [\-\-verify\-alg\ {\fIval\fR}] [\-\-use\-rle] .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR disk\-options \fIminor\fR [\-\-on\-io\-error\ {pass_on\ |\ call\-local\-io\-error\ |\ detach}] [\-\-fencing\ {dont\-care\ |\ resource\-only\ |\ resource\-and\-stonith}] [\-\-disk\-barrier] [\-\-disk\-flushes] [\-\-disk\-drain] [\-\-md\-flushes] [\-\-resync\-rate\ {\fIval\fR}] [\-\-resync\-after\ {\fIval\fR}] [\-\-al\-extents\ {\fIval\fR}] [\-\-al\-updates] [\-\-c\-plan\-ahead\ {\fIval\fR}] [\-\-c\-delay\-target\ {\fIval\fR}] [\-\-c\-fill\-target\ {\fIval\fR}] [\-\-c\-max\-rate\ {\fIval\fR}] [\-\-c\-min\-rate\ {\fIval\fR}] [\-\-disk\-timeout\ {\fIval\fR}] [\-\-read\-balancing\ {prefer\-local\ |\ prefer\-remote\ |\ round\-robin\ |\ least\-pending\ |\ when\-congested\-remote\ |\ 32K\-striping\ |\ 64K\-striping\ |\ 128K\-striping\ |\ 256K\-striping\ |\ 512K\-striping\ |\ 1M\-striping}] .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR net\-options \fIlocal_addr\fR \fIremote_addr\fR [\-\-protocol\ {A\ |\ B\ |\ C}] [\-\-timeout\ {\fIval\fR}] [\-\-max\-epoch\-size\ {\fIval\fR}] [\-\-max\-buffers\ {\fIval\fR}] [\-\-unplug\-watermark\ {\fIval\fR}] [\-\-connect\-int\ {\fIval\fR}] [\-\-ping\-int\ {\fIval\fR}] [\-\-sndbuf\-size\ 
{\fIval\fR}] [\-\-rcvbuf\-size\ {\fIval\fR}] [\-\-ko\-count\ {\fIval\fR}] [\-\-allow\-two\-primaries] [\-\-cram\-hmac\-alg\ {\fIval\fR}] [\-\-shared\-secret\ {\fIval\fR}] [\-\-after\-sb\-0pri\ {disconnect\ |\ discard\-younger\-primary\ |\ discard\-older\-primary\ |\ discard\-zero\-changes\ |\ discard\-least\-changes\ |\ discard\-local\ |\ discard\-remote}] [\-\-after\-sb\-1pri\ {disconnect\ |\ consensus\ |\ discard\-secondary\ |\ call\-pri\-lost\-after\-sb\ |\ violently\-as0p}] [\-\-after\-sb\-2pri\ {disconnect\ |\ call\-pri\-lost\-after\-sb\ |\ violently\-as0p}] [\-\-always\-asbp] [\-\-rr\-conflict\ {disconnect\ |\ call\-pri\-lost\ |\ violently}] [\-\-ping\-timeout\ {\fIval\fR}] [\-\-data\-integrity\-alg\ {\fIval\fR}] [\-\-tcp\-cork] [\-\-on\-congestion\ {block\ |\ pull\-ahead\ |\ disconnect}] [\-\-congestion\-fill\ {\fIval\fR}] [\-\-congestion\-extents\ {\fIval\fR}] [\-\-csums\-alg\ {\fIval\fR}] [\-\-verify\-alg\ {\fIval\fR}] [\-\-use\-rle] .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR resource\-options \fIresource\fR [\-\-cpu\-mask\ {\fIval\fR}] [\-\-on\-no\-data\-accessible\ {io\-error\ |\ suspend\-io}] .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR disconnect \fIlocal_addr\fR \fIremote_addr\fR [\-\-force] .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR detach \fIminor\fR [\-\-force] .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR primary \fIminor\fR [\-\-force] .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR secondary \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR down \fIresource\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR verify \fIminor\fR [\-\-start\ {\fIval\fR}] [\-\-stop\ {\fIval\fR}] .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR invalidate \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR invalidate\-remote \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR wait\-connect \fIminor\fR [\-\-wfc\-timeout\ {\fIval\fR}] [\-\-degr\-wfc\-timeout\ {\fIval\fR}] [\-\-outdated\-wfc\-timeout\ {\fIval\fR}] [\-\-wait\-after\-sb\ {\fIval\fR}] .HP \w'\fBdrbdsetup\fR\ 'u 
\fBdrbdsetup\fR wait\-sync \fIminor\fR [\-\-wfc\-timeout\ {\fIval\fR}] [\-\-degr\-wfc\-timeout\ {\fIval\fR}] [\-\-outdated\-wfc\-timeout\ {\fIval\fR}] [\-\-wait\-after\-sb\ {\fIval\fR}] .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR role \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR cstate \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR dstate \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR resize \fIminor\fR [\-\-size\ {\fIval\fR}] [\-\-assume\-peer\-has\-space] [\-\-assume\-clean] [\-\-al\-stripes\ {\fIval\fR}] [\-\-al\-stripe\-size\-kB\ {\fIval\fR}] .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR check\-resize \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR pause\-sync \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR resume\-sync \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR outdate \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR show\-gi \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR get\-gi \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR show {\fIresource\fR | \fIminor\fR | \fIall\fR} .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR suspend\-io \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR resume\-io \fIminor\fR .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR events {\fIresource\fR | \fIminor\fR | \fIall\fR} .HP \w'\fBdrbdsetup\fR\ 'u \fBdrbdsetup\fR new\-current\-uuid \fIminor\fR [\-\-clear\-bitmap] .SH "DESCRIPTION" .PP drbdsetup is used to associate DRBD devices with their backing block devices, to set up DRBD device pairs to mirror their backing block devices, and to inspect the configuration of running DRBD devices\&. .SH "NOTE" .PP drbdsetup is a low level tool of the DRBD program suite\&. It is used by the data disk and drbd scripts to communicate with the device driver\&. .SH "COMMANDS" .PP Each drbdsetup sub\-command might require arguments and bring its own set of options\&. All values have default units which might be overruled by K, M or G\&. These units are defined in the usual way (e\&.g\&. 
K = 2^10 = 1024)\&. .SS "Common options" .PP All drbdsetup sub\-commands accept the following option: .PP \fB\-\-create\-device\fR .RS 4 In case the specified DRBD device (minor number) does not exist yet, create it implicitly\&. .RE .SS "new\-resource" .PP Resources are the primary objects of any DRBD configuration\&. A resource must be created with the \fBnew\-resource\fR command before any volumes or minor devices can be created\&. Connections are referenced by name\&. .SS "new\-minor" .PP A \fIminor\fR is used as a synonym for replicated block device\&. It is represented in the /dev/ directory by a block device\&. It is the application\*(Aqs interface to the DRBD\-replicated block devices\&. These block devices get addressed by their minor numbers on the drbdsetup commandline\&. .PP A pair of replicated block devices may have different minor numbers on the two machines\&. They are associated by a common \fIvolume\-number\fR\&. Volume numbers are local to each connection\&. Minor numbers are global on one node\&. .SS "del\-resource" .PP Destroys a resource object\&. This is only possible if the resource has no volumes\&. .SS "del\-minor" .PP A minor can only be destroyed if its disk is detached\&. .SS "attach, disk\-options" .\" drbdsetup: disk .PP Attach associates \fIdevice\fR with \fIlower_device\fR to store its data blocks on\&. The \fB\-d\fR (or \fB\-\-disk\-size\fR) should only be used if you do not want to use all the available space of the backing block devices\&. If you do not use \fB\-d\fR, the \fIdevice\fR is not ready for use until it has been connected to its peer at least once\&. (See the \fBnet\fR command\&.) .PP With the disk\-options command it is possible to change the options of a minor while it is attached\&. .PP \fB\-\-disk\-size \fR\fB\fIsize\fR\fR .RS 4 You can override DRBD\*(Aqs size determination method with this option\&. 
If you need to use the device before it was ever connected to its peer, use this option to pass the \fIsize\fR of the DRBD device to the driver\&. Default unit is sectors (1s = 512 bytes)\&. .sp If you use the \fIsize\fR parameter in drbd\&.conf, we strongly recommend to add an explicit unit postfix\&. drbdadm and drbdsetup used to have mismatching default units\&. .RE .PP \fB\-\-on\-io\-error \fR\fB\fIerr_handler\fR\fR .RS 4 If the driver of the \fIlower_device\fR reports an error to DRBD, DRBD will mark the disk as inconsistent, call a helper program, or detach the device from its backing storage and perform all further IO by requesting it from the peer\&. The valid \fIerr_handlers\fR are: \fBpass_on\fR, \fBcall\-local\-io\-error\fR and \fBdetach\fR\&. .RE .PP \fB\-\-fencing \fR\fB\fIfencing_policy\fR\fR .RS 4 Under \fBfencing\fR we understand preventive measures to avoid situations where both nodes are primary and disconnected (AKA split brain)\&. .sp Valid fencing policies are: .PP \fBdont\-care\fR .RS 4 This is the default policy\&. No fencing actions are done\&. .RE .PP \fBresource\-only\fR .RS 4 If a node becomes a disconnected primary, it tries to outdate the peer\*(Aqs disk\&. This is done by calling the fence\-peer handler\&. The handler is supposed to reach the other node over alternative communication paths and call \*(Aqdrbdadm outdate res\*(Aq there\&. .RE .PP \fBresource\-and\-stonith\fR .RS 4 If a node becomes a disconnected primary, it freezes all its IO operations and calls its fence\-peer handler\&. The fence\-peer handler is supposed to reach the peer over alternative communication paths and call \*(Aqdrbdadm outdate res\*(Aq there\&. In case it cannot reach the peer, it should stonith the peer\&. IO is resumed as soon as the situation is resolved\&. In case your handler fails, you can resume IO with the \fBresume\-io\fR command\&. 
.RE .RE .PP \fB\-\-disk\-barrier\fR, \fB\-\-disk\-flushes\fR, \fB\-\-disk\-drain\fR .RS 4 DRBD has four implementations to express write\-after\-write dependencies to its backing storage device\&. DRBD will use the first method that is supported by the backing storage device and that is not disabled\&. By default the \fIflush\fR method is used\&. .sp Since drbd\-8\&.4\&.2, \fBdisk\-barrier\fR is disabled by default, because since linux\-2\&.6\&.36 (or 2\&.6\&.32 RHEL6) there is no reliable way to determine if queuing of IO\-barriers works\&. \fIDangerous\fR: enable it only if you are told to do so by someone who knows for sure\&. .sp When selecting the method you should not only base your decision on the measurable performance\&. In case your backing storage device has a volatile write cache (plain disks, RAID of plain disks) you should use one of the first two\&. In case your backing storage device has battery\-backed write cache you may go with option 3\&. Option 4 (disable everything, use "none") \fIis dangerous\fR on most IO stacks, may result in write\-reordering, and if so, can theoretically be the reason for data corruption, or disturb the DRBD protocol, causing spurious disconnect/reconnect cycles\&. \fIDo not use\fR \fBno\-disk\-drain\fR\&. .sp Unfortunately device mapper (LVM) might not support barriers\&. .sp The letter after "wo:" in /proc/drbd indicates which method is currently in use for a device: b, f, d, n\&. The implementations: .PP barrier .RS 4 The first requires that the driver of the backing storage device support barriers (called \*(Aqtagged command queuing\*(Aq in SCSI and \*(Aqnative command queuing\*(Aq in SATA speak)\&. The use of this method can be enabled by setting the \fBdisk\-barrier\fR option to \fByes\fR\&. .RE .PP flush .RS 4 The second requires that the backing device support disk flushes (called \*(Aqforce unit access\*(Aq in drive vendors\*(Aq speak)\&. The use of this method can be disabled by setting \fBdisk\-flushes\fR to \fBno\fR\&. 
.RE .PP drain .RS 4 The third method is simply to let write requests drain before write requests of a new reordering domain are issued\&. That was the only implementation before 8\&.0\&.9\&. .RE .PP none .RS 4 The fourth method is to not express write\-after\-write dependencies to the backing store at all, by also specifying \fB\-\-no\-disk\-drain\fR\&. This \fIis dangerous\fR on most IO stacks, may result in write\-reordering, and if so, can theoretically be the reason for data corruption, or disturb the DRBD protocol, causing spurious disconnect/reconnect cycles\&. \fIDo not use\fR \fB\-\-no\-disk\-drain\fR\&. .RE .RE .PP \fB\-\-md\-flushes\fR .RS 4 Disables the use of disk flushes and barrier BIOs when accessing the meta data device\&. See the notes on \fB\-\-disk\-flushes\fR\&. .RE .PP \fB\-\-max\-bio\-bvecs\fR .RS 4 In some special circumstances the device mapper stack manages to pass BIOs to DRBD that violate the constraints that are set forth by DRBD\*(Aqs merge_bvec() function and which have more than one bvec\&. A known example is: phys\-disk \-> DRBD \-> LVM \-> Xen \-> misaligned partition (63) \-> DomU FS\&. Then you might see "bio would need to, but cannot, be split:" in the Dom0\*(Aqs kernel log\&. .sp The best workaround is to properly align the partition within the VM (e\&.g\&. start it at sector 1024)\&. That costs 480 KiB of storage\&. Unfortunately the default of most Linux partitioning tools is to start the first partition at an odd number (63)\&. Therefore the install helpers of most distributions will leave virtual Linux machines with misaligned partitions\&. The second best workaround is to limit DRBD\*(Aqs max bvecs per BIO (i\&.e\&., the \fBmax\-bio\-bvecs\fR option) to 1, but that might cost performance\&. .sp The default value of \fBmax\-bio\-bvecs\fR is 0, which means that there is no user imposed limitation\&. 
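.sp
The storage cost quoted for the alignment workaround can be checked with a little arithmetic: moving the partition start from sector 63 to sector 1024 leaves 961 sectors of 512 bytes unused, i\&.e\&. 492032 bytes or roughly 480 KiB\&. The following C sketch is only an illustration and not part of DRBD; the helper names alignment_cost_bytes and is_aligned, and the choice of an 8\-sector (4 KiB) boundary in the usage note, are assumptions made for this example\&.

```c
#include <stdint.h>

/* Sector size assumed by this sketch (1s = 512 bytes, as elsewhere on this page). */
#define SECTOR_BYTES 512u

/* Hypothetical helper: bytes left unused in front of the first partition
 * when it starts at aligned_start instead of legacy_start (both in sectors). */
static uint64_t alignment_cost_bytes(uint64_t legacy_start, uint64_t aligned_start)
{
	return (aligned_start - legacy_start) * SECTOR_BYTES;
}

/* Hypothetical helper: nonzero if start_sector is a multiple of align_sectors. */
static int is_aligned(uint64_t start_sector, uint64_t align_sectors)
{
	return align_sectors != 0 && start_sector % align_sectors == 0;
}
```

For example, alignment_cost_bytes(63, 1024) yields 492032 bytes; is_aligned(63, 8) is false, while is_aligned(1024, 8) is true\&.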
.RE .PP \fB\-\-resync\-rate \fR\fB\fIrate\fR\fR .RS 4 To ensure smooth operation of the application on top of DRBD, it is possible to limit the bandwidth that may be used by background synchronization\&. The default is 250 KiB/sec, the default unit is KiB/sec\&. .RE .PP \fB\-\-resync\-after \fR\fB\fIminor\fR\fR .RS 4 Start resync on this device only if the device with \fIminor\fR is already in connected state\&. Otherwise this device waits in SyncPause state\&. .RE .PP \fB\-\-al\-extents \fR\fB\fIextents\fR\fR .RS 4 DRBD automatically performs hot area detection\&. With this parameter you control how big the hot area (=active set) can get\&. Each extent marks 4M of the backing storage\&. In case a primary node leaves the cluster unexpectedly, the areas covered by the active set must be resynced upon rejoining of the failed node\&. The data structure is stored in the meta\-data area, therefore each change of the active set is a write operation to the meta\-data device\&. A higher number of extents gives longer resync times but fewer updates to the meta\-data\&. The default number of \fIextents\fR is 1237\&. (Minimum: 7, Maximum: 65534) .sp See also \fBdrbd.conf\fR(5) and \fBdrbdmeta\fR(8) for additional limitations and necessary preparation\&. .RE .PP \fB\-\-al\-updates \fR\fB{yes | no}\fR .RS 4 DRBD\*(Aqs activity log transaction writing ensures that, after the crash of a primary node, a partial (bitmap\-based) resync is sufficient to bring the node back up to date\&. Setting \fBal\-updates\fR to \fBno\fR might increase normal operation performance but causes DRBD to do a full resync when a crashed primary gets reconnected\&. The default value is \fByes\fR\&. 
.RE .PP \fB\-\-c\-plan\-ahead \fR\fB\fIplan_time\fR\fR, \fB\-\-c\-fill\-target \fR\fB\fIfill_target\fR\fR, \fB\-\-c\-delay\-target \fR\fB\fIdelay_target\fR\fR, \fB\-\-c\-max\-rate \fR\fB\fImax_rate\fR\fR .RS 4 The dynamic resync speed controller is enabled by setting \fIplan_time\fR to a positive value\&. It aims to fill the buffers along the data path with either a constant amount of data \fIfill_target\fR, or aims to have a constant delay time of \fIdelay_target\fR along the path\&. The controller has an upper bound of \fImax_rate\fR\&. .sp \fIPlan_time\fR configures the agility of the controller\&. Higher values yield slower responses of the controller to deviation from the target value\&. It should be at least 5 times RTT\&. For regular data paths a \fIfill_target\fR in the area of 4k to 100k is appropriate\&. For a setup that contains drbd\-proxy it is advisable to use \fIdelay_target\fR instead\&. The controller uses \fIdelay_target\fR only when \fIfill_target\fR is set to 0\&. 5 times RTT is a reasonable starting value\&. \fIMax_rate\fR should be set to the bandwidth available between the DRBD\-hosts and the machines hosting DRBD\-proxy, or to the available disk\-bandwidth\&. .sp The default value of \fIplan_time\fR is 0, the default unit is 0\&.1 seconds\&. \fIFill_target\fR defaults to 0, its default unit is sectors\&. \fIDelay_target\fR defaults to 1 (100ms), its default unit is 0\&.1 seconds\&. \fIMax_rate\fR defaults to 102400 (100MiB/s), its default unit is KiB/s\&. .RE .PP \fB\-\-c\-min\-rate \fR\fB\fImin_rate\fR\fR .RS 4 We track the disk IO rate caused by the resync, so we can detect non\-resync IO on the lower level device\&. If the lower level device seems to be busy, and the current resync rate is above \fImin_rate\fR, we throttle the resync\&. .sp The default value of \fImin_rate\fR is 4M, the default unit is k\&. If you do not want to throttle at all, set it to zero; if you want to throttle always, set it to one\&. 
.RE .PP \fB\-t\fR, \fB\-\-disk\-timeout \fR\fB\fIdisk_timeout\fR\fR .RS 4 If the driver of the \fIlower_device\fR does not finish an IO request within \fIdisk_timeout\fR, DRBD considers the disk to have failed\&. If DRBD is connected to a remote host, it will reissue local pending IO requests to the peer, and ship all new IO requests to the peer only\&. The disk state advances to diskless as soon as the backing block device has finished all IO requests\&. .sp The default value of \fIdisk_timeout\fR is 0, which means that no timeout is enforced\&. The default unit is 100ms\&. This option is available since 8\&.3\&.12\&. .RE .PP \fB\-\-read\-balancing \fR\fB\fImethod\fR\fR .RS 4 The supported \fImethods\fR for load balancing of read requests are \fBprefer\-local\fR, \fBprefer\-remote\fR, \fBround\-robin\fR, \fBleast\-pending\fR, \fBwhen\-congested\-remote\fR, \fB32K\-striping\fR, \fB64K\-striping\fR, \fB128K\-striping\fR, \fB256K\-striping\fR, \fB512K\-striping\fR and \fB1M\-striping\fR\&. .sp The default \fImethod\fR is \fBprefer\-local\fR\&. This option is available since 8\&.4\&.1\&. .RE .SS "connect, net\-options" .\" drbdsetup: net .PP Connect sets up the \fIdevice\fR to listen on \fIaf:local_addr:port\fR for incoming connections and to try to connect to \fIaf:remote_addr:port\fR\&. If \fIport\fR is omitted, 7788 is used as default\&. If \fIaf\fR is omitted \fBipv4\fR gets used\&. Other supported address families are \fBipv6\fR, \fBssocks\fR for Dolphin Interconnect Solutions\*(Aq "super sockets" and \fBsdp\fR for Sockets Direct Protocol (Infiniband)\&. .PP The net\-options command allows you to change options while the connection is established\&. .PP \fB\-\-protocol \fR\fB\fIprotocol\fR\fR .RS 4 On the TCP/IP link the specified \fIprotocol\fR is used\&. Valid protocol specifiers are A, B, and C\&. .sp Protocol A: write IO is reported as completed, if it has reached local disk and local TCP send buffer\&. 
.sp Protocol B: write IO is reported as completed, if it has reached local disk and remote buffer cache\&. .sp Protocol C: write IO is reported as completed, if it has reached both local and remote disk\&. .RE .PP \fB\-\-connect\-int \fR\fB\fItime\fR\fR .RS 4 In case it is not possible to connect to the remote DRBD device immediately, DRBD keeps on trying to connect\&. With this option you can set the time between two retries\&. The default value is 10\&. The unit is seconds\&. .RE .PP \fB\-\-ping\-int \fR\fB\fItime\fR\fR .RS 4 If the TCP/IP connection linking a DRBD device pair is idle for more than \fItime\fR seconds, DRBD will generate a keep\-alive packet to check if its partner is still alive\&. The default value is 10\&. The unit is seconds\&. .RE .PP \fB\-\-timeout \fR\fB\fIval\fR\fR .RS 4 If the partner node fails to send an expected response packet within \fIval\fR tenths of a second, the partner node is considered dead and therefore the TCP/IP connection is abandoned\&. The default value is 60 (= 6 seconds)\&. .RE .PP \fB\-\-sndbuf\-size \fR\fB\fIsize\fR\fR .RS 4 The socket send buffer is used to store packets sent to the secondary node, which are not yet acknowledged (from a network point of view) by the secondary node\&. When using protocol A, it might be necessary to increase the size of this data structure in order to increase asynchronicity between primary and secondary nodes\&. But keep in mind that more asynchronicity is synonymous with more data loss in the case of a primary node failure\&. Since 8\&.0\&.13 resp\&. 8\&.2\&.7 setting the \fIsize\fR value to 0 means that the kernel should autotune this\&. The default \fIsize\fR is 0, i\&.e\&. autotune\&. .RE .PP \fB\-\-rcvbuf\-size \fR\fB\fIsize\fR\fR .RS 4 Packets received from the network are stored in the socket receive buffer first\&. From there they are consumed by DRBD\&. Before 8\&.3\&.2 the receive buffer\*(Aqs size was always set to the size of the socket send buffer\&. 
Since 8\&.3\&.2 they can be tuned independently\&. A value of 0 means that the kernel should autotune this\&. The default \fIsize\fR is 0, i\&.e\&. autotune\&. .RE .PP \fB\-\-ko\-count \fR\fB\fIcount\fR\fR .RS 4 In case the secondary node fails to complete a single write request for \fIcount\fR times the \fItimeout\fR, it is expelled from the cluster, i\&.e\&. the primary node goes into StandAlone mode\&. The default is 0, which disables this feature\&. .RE .PP \fB\-\-max\-epoch\-size \fR\fB\fIval\fR\fR .RS 4 With this option the maximal number of write requests between two barriers is limited\&. Typically set to the same as \fB\-\-max\-buffers\fR, or the allowed maximum\&. Values smaller than 10 can lead to degraded performance\&. The default value is 2048\&. .RE .PP \fB\-\-max\-buffers \fR\fB\fIval\fR\fR .RS 4 With this option the maximal number of buffer pages allocated by DRBD\*(Aqs receiver thread is limited\&. Typically set to the same as \fB\-\-max\-epoch\-size\fR\&. Small values could lead to degraded performance\&. The default value is 2048, the minimum 32\&. Increase this if you cannot saturate the IO backend of the receiving side during linear write or during resync while otherwise idle\&. .sp See also \fBdrbd.conf\fR(5) .RE .PP \fB\-\-unplug\-watermark \fR\fB\fIval\fR\fR .RS 4 This setting has no effect with recent kernels that use explicit on\-stack plugging (upstream Linux kernel 2\&.6\&.39, distributions may have backported)\&. .sp When the number of pending write requests on the standby (secondary) node exceeds the unplug\-watermark, we trigger the request processing of our backing storage device\&. Some storage controllers deliver better performance with small values, others deliver best performance when the value is set to the same value as max\-buffers, yet others don\*(Aqt feel much effect at all\&. Minimum 16, default 128, maximum 131072\&. 
.RE .PP \fB\-\-allow\-two\-primaries \fR .RS 4 With this option set you may assign primary role to both nodes\&. You only should use this option if you use a shared storage file system on top of DRBD\&. At the time of writing the only ones are: OCFS2 and GFS\&. If you use this option with any other file system, you are going to crash your nodes and to corrupt your data! .RE .PP \fB\-\-cram\-hmac\-alg \fR\fB\fIalg\fR\fR .RS 4 You need to specify the HMAC algorithm to enable peer authentication at all\&. You are strongly encouraged to use peer authentication\&. The HMAC algorithm will be used for the challenge response authentication of the peer\&. You may specify any digest algorithm that is named in /proc/crypto\&. .RE .PP \fB\-\-shared\-secret \fR\fB\fIsecret\fR\fR .RS 4 The shared secret used in peer authentication\&. May be up to 64 characters\&. .RE .PP \fB\-\-after\-sb\-0pri \fR\fB\fIasb\-0p\-policy\fR\fR .RS 4 possible policies are: .PP \fBdisconnect\fR .RS 4 No automatic resynchronization, simply disconnect\&. .RE .PP \fBdiscard\-younger\-primary\fR .RS 4 Auto sync from the node that was primary before the split\-brain situation occurred\&. .RE .PP \fBdiscard\-older\-primary\fR .RS 4 Auto sync from the node that became primary as second during the split\-brain situation\&. .RE .PP \fBdiscard\-zero\-changes\fR .RS 4 In case one node did not write anything since the split brain became evident, sync from the node that wrote something to the node that did not write anything\&. In case none wrote anything this policy uses a random decision to perform a "resync" of 0 blocks\&. In case both have written something this policy disconnects the nodes\&. .RE .PP \fBdiscard\-least\-changes\fR .RS 4 Auto sync from the node that touched more blocks during the split brain situation\&. .RE .PP \fBdiscard\-node\-NODENAME\fR .RS 4 Auto sync to the named node\&. 
.RE .RE .PP \fB\-\-after\-sb\-1pri \fR\fB\fIasb\-1p\-policy\fR\fR .RS 4 possible policies are: .PP \fBdisconnect\fR .RS 4 No automatic resynchronization, simply disconnect\&. .RE .PP \fBconsensus\fR .RS 4 Discard the version of the secondary if the outcome of the \fBafter\-sb\-0pri\fR algorithm would also destroy the current secondary\*(Aqs data\&. Otherwise disconnect\&. .RE .PP \fBdiscard\-secondary\fR .RS 4 Discard the secondary\*(Aqs version\&. .RE .PP \fBcall\-pri\-lost\-after\-sb\fR .RS 4 Always honor the outcome of the \fBafter\-sb\-0pri \fR algorithm\&. In case it decides the current secondary has the correct data, call the \fBpri\-lost\-after\-sb\fR on the current primary\&. .RE .PP \fBviolently\-as0p\fR .RS 4 Always honor the outcome of the \fBafter\-sb\-0pri \fR algorithm\&. In case it decides the current secondary has the correct data, accept a possible instantaneous change of the primary\*(Aqs data\&. .RE .RE .PP \fB\-\-after\-sb\-2pri \fR\fB\fIasb\-2p\-policy\fR\fR .RS 4 possible policies are: .PP \fBdisconnect\fR .RS 4 No automatic resynchronization, simply disconnect\&. .RE .PP \fBcall\-pri\-lost\-after\-sb\fR .RS 4 Always honor the outcome of the \fBafter\-sb\-0pri \fR algorithm\&. In case it decides the current secondary has the right data, call the \fBpri\-lost\-after\-sb\fR on the current primary\&. .RE .PP \fBviolently\-as0p\fR .RS 4 Always honor the outcome of the \fBafter\-sb\-0pri \fR algorithm\&. In case it decides the current secondary has the right data, accept a possible instantaneous change of the primary\*(Aqs data\&. .RE .RE .PP \fB\-\-always\-asbp\fR .RS 4 Normally the automatic after\-split\-brain policies are only used if current states of the UUIDs do not indicate the presence of a third node\&. .sp With this option you request that the automatic after\-split\-brain policies are used as long as the data sets of the nodes are somehow related\&. This might cause a full sync, if the UUIDs indicate the presence of a third node\&. 
(Or double faults have led to strange UUID sets\&.) .RE .PP \fB\-\-rr\-conflict \fR\fB\fIrole\-resync\-conflict\-policy\fR\fR .RS 4 This option sets DRBD\*(Aqs behavior when DRBD deduces from its meta data that a resynchronization is needed, and the SyncTarget node is already primary\&. The possible settings are: \fBdisconnect\fR, \fBcall\-pri\-lost\fR and \fBviolently\fR\&. While \fBdisconnect\fR speaks for itself, with the \fBcall\-pri\-lost\fR setting the \fBpri\-lost\fR handler is called, which is expected to either change the role of the node to secondary, or remove the node from the cluster\&. The default is \fBdisconnect\fR\&. .sp With the \fBviolently\fR setting you allow DRBD to force a primary node into SyncTarget state\&. This means that the data exposed by DRBD changes to the SyncSource\*(Aqs version of the data instantaneously\&. USE THIS OPTION ONLY IF YOU KNOW WHAT YOU ARE DOING\&. .RE .PP \fB\-\-data\-integrity\-alg \fR\fB\fIhash_alg\fR\fR .RS 4 DRBD can ensure the data integrity of the user\*(Aqs data on the network by comparing hash values\&. Normally this is ensured by the 16 bit checksums in the headers of TCP/IP packets\&. This option can be set to any of the kernel\*(Aqs data digest algorithms\&. In a typical kernel configuration you should have at least one of \fBmd5\fR, \fBsha1\fR, and \fBcrc32c\fR available\&. By default this is not enabled\&. .sp See also the notes on data integrity on the drbd\&.conf manpage\&. .RE .PP \fB\-\-no\-tcp\-cork\fR .RS 4 DRBD usually uses the TCP socket option TCP_CORK to hint to the network stack when it can expect more data, and when it should flush out what it has in its send queue\&. There is at least one network stack that performs worse when one uses this hinting method\&. Therefore we introduced this option, which disables the setting and clearing of the TCP_CORK socket option by DRBD\&. 
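.sp
To illustrate the hinting mechanism itself, the following C sketch shows how a user\-space program on Linux can set and clear TCP_CORK around a burst of writes\&. It is not DRBD\*(Aqs kernel code (DRBD operates on kernel sockets); the helper names set_cork, get_cork and send_corked are hypothetical and exist only for this example\&.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set (on = 1) or clear (on = 0) TCP_CORK on fd.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int set_cork(int fd, int on)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Hypothetical helper: read back the current TCP_CORK setting.
 * Returns 1 if corked, 0 if not, -1 on error. */
static int get_cork(int fd)
{
	int on = 0;
	socklen_t len = sizeof(on);

	if (getsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, &len) != 0)
		return -1;
	return on != 0;
}

/* Hypothetical helper: send a header and a payload as one corked burst.
 * Setting the option tells the stack that more data will follow;
 * clearing it asks the stack to flush what is queued. */
static int send_corked(int fd, const void *hdr, size_t hlen,
		       const void *data, size_t dlen)
{
	if (set_cork(fd, 1) != 0)
		return -1;
	if (write(fd, hdr, hlen) < 0 || write(fd, data, dlen) < 0) {
		set_cork(fd, 0);
		return -1;
	}
	return set_cork(fd, 0);
}
```

A program that skips the set_cork calls entirely behaves like a connection configured with this option: data is handed to the network stack without any hint about what follows\&.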
.RE .PP \fB\-\-ping\-timeout \fR\fB\fIping_timeout\fR\fR .RS 4 The time the peer has to answer a keep\-alive packet\&. In case the peer\*(Aqs reply is not received within this time period, it is considered dead\&. The default unit is tenths of a second, the default value is 5 (for half a second)\&. .RE .PP \fB\-\-discard\-my\-data\fR .RS 4 Use this option to manually recover from a split\-brain situation\&. In case you do not have any automatic after\-split\-brain policies selected, the nodes refuse to connect\&. By passing this option you make this node a sync target immediately after a successful connect\&. .RE .PP \fB\-\-tentative\fR .RS 4 Causes DRBD to abort the connection process after the resync handshake, i\&.e\&. no resync gets performed\&. You can find out which resync DRBD would perform by looking at the kernel\*(Aqs log file\&. .RE .PP \fB\-\-on\-congestion \fR\fB\fIcongestion_policy\fR\fR, \fB\-\-congestion\-fill \fR\fB\fIfill_threshold\fR\fR, \fB\-\-congestion\-extents \fR\fB\fIactive_extents_threshold\fR\fR .RS 4 By default DRBD blocks when the available TCP send queue becomes full\&. That means it will slow down the application that generates the write requests that cause DRBD to send more data down that TCP connection\&. .sp When DRBD is deployed with DRBD\-proxy it might be more desirable that DRBD goes into AHEAD/BEHIND mode shortly before the send queue becomes full\&. In AHEAD/BEHIND mode DRBD no longer replicates data, but still keeps the connection open\&. .sp The advantage of the AHEAD/BEHIND mode is that the application is not slowed down, even if DRBD\-proxy\*(Aqs buffer is not sufficient to buffer all write requests\&. The downside is that the peer node falls behind, and that a resync will be necessary to bring it back into sync\&. During that resync the peer node will have an inconsistent disk\&. .sp Available \fIcongestion_policy\fRs are \fBblock\fR and \fBpull\-ahead\fR\&. The default is \fBblock\fR\&. 
\fIFill_threshold\fR might be in the range of 0 to 10GiBytes\&. The default is 0 which disables the check\&. \fIActive_extents_threshold\fR has the same limits as \fBal\-extents\fR\&. .sp The AHEAD/BEHIND mode and its settings are available since DRBD 8\&.3\&.10\&. .RE .PP \fB\-\-verify\-alg \fR\fB\fIhash\-alg\fR\fR .RS 4 During online verification (as initiated by the \fBverify\fR sub\-command), rather than doing a bit\-wise comparison, DRBD applies a hash function to the contents of every block being verified, and compares that hash with the peer\&. This option defines the hash algorithm being used for that purpose\&. It can be set to any of the kernel\*(Aqs data digest algorithms\&. In a typical kernel configuration you should have at least one of \fBmd5\fR, \fBsha1\fR, and \fBcrc32c\fR available\&. By default this is not enabled; you must set this option explicitly in order to be able to use on\-line device verification\&. .sp See also the notes on data integrity on the drbd\&.conf manpage\&. .RE .PP \fB\-\-csums\-alg \fR\fB\fIhash\-alg\fR\fR .RS 4 A resync process sends all marked data blocks from the source to the destination node, as long as no \fBcsums\-alg\fR is given\&. When one is specified the resync process exchanges hash values of all marked blocks first, and sends over only those data blocks that have different hash values\&. .sp This setting is useful for DRBD setups with low bandwidth links\&. During the restart of a crashed primary node, all blocks covered by the activity log are marked for resync\&. But a large part of those will actually be still in sync, therefore using \fBcsums\-alg\fR will lower the required bandwidth in exchange for CPU cycles\&. .RE .PP \fB\-\-use\-rle\fR .RS 4 During resync\-handshake, the dirty\-bitmaps of the nodes are exchanged and merged (using bit\-or), so the nodes will have the same understanding of which blocks are dirty\&.
On large devices, the fine grained dirty\-bitmap can become large as well, and the bitmap exchange can take quite some time on low\-bandwidth links\&. .sp Because the bitmap typically contains compact areas where all bits are unset (clean) or set (dirty), a simple run\-length encoding scheme can considerably reduce the network traffic necessary for the bitmap exchange\&. .sp For backward compatibility reasons, and because on fast links this possibly does not improve transfer time but consumes cpu cycles, this defaults to off\&. .sp Introduced in 8\&.3\&.2\&. .RE .SS "resource\-options" .\" drbdsetup: resource-options .PP Changes the options of the resource at runtime\&. .PP \fB\-\-cpu\-mask \fR\fB\fIcpu\-mask\fR\fR .RS 4 Sets the cpu\-affinity\-mask for DRBD\*(Aqs kernel threads of this device\&. The default value of \fIcpu\-mask\fR is 0, which means that DRBD\*(Aqs kernel threads should be spread over all CPUs of the machine\&. This value must be given in hexadecimal notation\&. If it is too big it will be truncated\&. .RE .PP \fB\-\-on\-no\-data\-accessible \fR\fB\fIond\-policy\fR\fR .RS 4 This setting controls what happens to IO requests on a degraded, diskless node (i\&.e\&. no data store is reachable)\&. The available policies are \fBio\-error\fR and \fBsuspend\-io\fR\&. .sp If \fIond\-policy\fR is set to \fBsuspend\-io\fR you can either resume IO by attaching/connecting the last lost data storage, or by the \fBdrbdadm resume\-io \fR\fB\fIres\fR\fR command\&. The latter will result in IO errors of course\&. .sp The default is \fBio\-error\fR\&. This setting is available since DRBD 8\&.3\&.9\&. .RE .SS "primary" .\" drbdsetup: primary .PP Sets the \fIdevice\fR into primary role\&. This means that applications (e\&.g\&. a file system) may open the \fIdevice\fR for read and write access\&. Data written to the \fIdevice\fR in primary role are mirrored to the device in secondary role\&.
.PP Normally it is not possible to set both devices of a connected DRBD device pair to primary role\&. By using the \fB\-\-allow\-two\-primaries\fR option, you override this behavior and instruct DRBD to allow two primaries\&. .PP \fB\-\-overwrite\-data\-of\-peer\fR .RS 4 Alias for \-\-force\&. .RE .PP \fB\-\-force\fR .RS 4 Becoming primary fails if the local replica is not up\-to\-date, i\&.e\&. when it is inconsistent, outdated or consistent\&. By using this option you can force it into primary role anyway\&. USE THIS OPTION ONLY IF YOU KNOW WHAT YOU ARE DOING\&. .RE .SS "secondary" .\" drbdsetup: secondary .PP Brings the \fIdevice\fR into secondary role\&. This operation fails as long as at least one application (or file system) has opened the device\&. .PP It is possible that both devices of a connected DRBD device pair are secondary\&. .SS "verify" .\" drbdsetup: verify .PP This initiates on\-line device verification\&. During on\-line verification, the contents of every block on the local node are compared to those on the peer node\&. Device verification progress can be monitored via /proc/drbd\&. Any blocks whose content differs from that of the corresponding block on the peer node will be marked out\-of\-sync in DRBD\*(Aqs on\-disk bitmap; they are \fInot\fR brought back in sync automatically\&. To do that, simply disconnect and reconnect the resource\&. .PP If on\-line verification is already in progress (and this node is "VerifyS"), this command silently "succeeds"\&. In this case, any start\-sector (see below) will be ignored, and any stop\-sector (see below) will be honored\&. This can be used to stop a running verify, or to update/shorten/extend the coverage of the currently running verify\&. .PP This command will fail if the \fIdevice\fR is not part of a connected device pair\&. .PP See also the notes on data integrity on the drbd\&.conf manpage\&.
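.PP
The rounding that the \fB\-\-start\fR option of this command applies to a start\-sector can be illustrated with a little shell arithmetic\&. This is only a sketch: the function name \fBround_start_sector\fR is hypothetical and not part of DRBD, and it assumes that the value is given in 512\-byte sectors, so that 8 sectors make up 4kB\&.
.sp
.nf
```shell
# Hypothetical helper, not part of DRBD: round a start sector
# (assumed to be in 512-byte units) down to a multiple of 8 sectors.
round_start_sector() {
    echo $(( $1 / 8 * 8 ))
}

round_start_sector 1003
```
.fi
.sp
Under these assumptions a requested start\-sector of 1003 is treated as 1000\&.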
.PP \fB\-\-start \fR\fB\fIstart\-sector\fR\fR .RS 4 Since version 8\&.3\&.2, on\-line verification should resume from the last position after connection loss\&. It may also be started from an arbitrary position by setting this option\&. If you had reached some stop\-sector before, and you do not specify an explicit start\-sector, verify should resume from the previous stop\-sector\&. .sp Default unit is sectors\&. You may also specify a unit explicitly\&. The \fBstart\-sector\fR will be rounded down to a multiple of 8 sectors (4kB)\&. .RE .PP \fB\-S\fR, \fB\-\-stop \fR\fB\fIstop\-sector\fR\fR .RS 4 Since version 8\&.3\&.14, on\-line verification can be stopped before it reaches end\-of\-device\&. .sp Default unit is sectors\&. You may also specify a unit explicitly\&. The \fBstop\-sector\fR may be updated by issuing an additional drbdsetup verify command on the same node while the verify is running\&. This can be used to stop a running verify, or to update/shorten/extend the coverage of the currently running verify\&. .RE .SS "invalidate" .\" drbdsetup: invalidate .PP This forces the local device of a pair of connected DRBD devices into SyncTarget state, which means that all data blocks of the device are copied over from the peer\&. .PP This command will fail if the \fIdevice\fR is neither part of a connected device pair nor a disconnected Secondary\&. .SS "invalidate\-remote" .\" drbdsetup: invalidate-remote .PP This forces the local device of a pair of connected DRBD devices into SyncSource state, which means that all data blocks of the device are copied to the peer\&. .PP On a disconnected Primary device, this will set all bits in the out of sync bitmap\&. As a side effect this suspends updates to the on disk activity log\&. Updates to the on disk activity log resume automatically when necessary\&. .SS "wait\-connect" .\" drbdsetup: wait-connect .PP Returns as soon as the \fIdevice\fR can communicate with its partner device\&.
.PP \fB\-\-wfc\-timeout \fR\fB\fIwfc_timeout\fR\fR, \fB\-\-degr\-wfc\-timeout \fR\fB\fIdegr_wfc_timeout\fR\fR, \fB\-\-outdated\-wfc\-timeout \fR\fB\fIoutdated_wfc_timeout\fR\fR, \fB\-\-wait\-after\-sb\fR .RS 4 This command will fail if the \fIdevice\fR cannot communicate with its partner for \fItimeout\fR seconds\&. If the peer was working before this node was rebooted, the \fIwfc_timeout\fR is used\&. If the peer was already down before this node was rebooted, the \fIdegr_wfc_timeout\fR is used\&. If the peer was successfully outdated before this node was rebooted the \fIoutdated_wfc_timeout\fR is used\&. The default value for all those timeout values is 0 which means to wait forever\&. The unit is seconds\&. In case the connection status goes down to StandAlone because the peer appeared but the devices had a split brain situation, the default for the command is to terminate\&. You can change this behavior with the \fB\-\-wait\-after\-sb\fR option\&. .RE .SS "wait\-sync" .\" drbdsetup: wait-sync .PP Returns as soon as the \fIdevice\fR leaves any synchronization into connected state\&. The options are the same as with the \fIwait\-connect\fR command\&. .SS "disconnect" .\" drbdsetup: disconnect .PP Removes the information set by the \fBnet\fR command from the \fIdevice\fR\&. This means that the \fIdevice\fR goes into unconnected state and will no longer listen for incoming connections\&. .SS "detach" .\" drbdsetup: detach .PP Removes the information set by the \fBdisk\fR command from the \fIdevice\fR\&. This means that the \fIdevice\fR is detached from its backing storage device\&. .PP \fB\-f\fR, \fB\-\-force\fR .RS 4 A regular detach returns after the disk state finally reached diskless\&. As a consequence detaching from a frozen backing block device never terminates\&. .sp A forced detach, on the other hand, returns immediately\&. It allows you to detach DRBD from a frozen backing block device\&.
Please note that the disk will be marked as failed until all pending IO requests have been finished by the backing block device\&. .RE .SS "down" .\" drbdsetup: down .PP Removes all configuration information from the \fIdevice\fR and forces it back to unconfigured state\&. .SS "role" .\" drbdsetup: role .PP Shows the current roles of the \fIdevice\fR and its peer, as \fIlocal\fR/\fIpeer\fR\&. .SS "state" .\" drbdsetup: state .PP Deprecated alias for "role"\&. .SS "cstate" .\" drbdsetup: cstate .PP Shows the current connection state of the \fIdevice\fR\&. .SS "dstate" .\" drbdsetup: dstate .PP Shows the current states of the backing storage devices, as \fIlocal\fR/\fIpeer\fR\&. .SS "resize" .\" drbdsetup: resize .PP This causes DRBD to reexamine the size of the \fIdevice\fR\*(Aqs backing storage device\&. To actually do online growing you need to extend the backing storages on both devices and call the \fBresize\fR command on one of your nodes\&. .PP The \fB\-\-size\fR option can be used to online shrink the usable size of a drbd device\&. It\*(Aqs the user\*(Aqs responsibility to make sure that a file system on the device is not truncated by that operation\&. .PP The \fB\-\-assume\-peer\-has\-space\fR allows you to resize a device which is currently not connected to the peer\&. Use with care, since if you do not resize the peer\*(Aqs disk as well, further connect attempts of the two will fail\&. .PP When the \fB\-\-assume\-clean\fR option is given DRBD will skip the resync of the new storage\&. Only do this if you know that the new storage was initialized to the same content by other means\&. .PP The options \fB\-\-al\-stripes\fR and \fB\-\-al\-stripe\-size\-kB\fR may be used to change the layout of the activity log online\&. In case of internal meta data this may involve shrinking the user visible size at the same time (using the \fB\-\-size\fR option) or increasing the available space on the backing devices\&.
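.PP
Because shrinking with \fB\-\-size\fR leaves it to the user to protect the file system, a small pre\-check before calling \fBresize\fR may be worthwhile\&. The following is a minimal sketch, not part of DRBD: the function name \fBshrink_is_safe\fR is hypothetical, both sizes are assumed to be given in KiB, and how the current file system size is obtained is left to the tools of the file system in use\&.
.sp
.nf
```shell
# Hypothetical pre-check, not part of DRBD: succeed only if the new
# device size (in KiB) is at least as large as the file system on it.
shrink_is_safe() {
    fs_kib=$1
    new_kib=$2
    [ "$new_kib" -ge "$fs_kib" ]
}

if shrink_is_safe 1048576 2097152; then
    echo "new size still holds the file system"
fi
```
.fi
.sp
If the check fails, the file system would have to be shrunk first, before the device size is reduced\&.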
.SS "check\-resize" .\" drbdsetup: check-resize .PP To enable DRBD to detect offline resizing of backing devices this command may be used to record the current size of backing devices\&. The size is stored in files in /var/lib/drbd/ named drbd\-minor\-??\&.lkbd\&. .PP This command is called by \fBdrbdadm resize \fR\fB\fIres\fR\fR after \fBdrbdsetup \fR\fB\fIdevice\fR\fR\fB resize\fR returned\&. .SS "pause\-sync" .\" drbdsetup: pause-sync .PP Temporarily suspend an ongoing resynchronization by setting the local pause flag\&. Resync only progresses if neither the local nor the remote pause flag is set\&. It might be desirable to postpone DRBD\*(Aqs resynchronization until after any resynchronization of the backing storage\*(Aqs RAID setup\&. .SS "resume\-sync" .\" drbdsetup: resume-sync .PP Unset the local sync pause flag\&. .SS "outdate" .\" drbdsetup: outdate .PP Mark the data on the local backing storage as outdated\&. An outdated device refuses to become primary\&. This is used in conjunction with \fBfencing\fR and by the peer\*(Aqs \fBfence\-peer\fR handler\&. .SS "show\-gi" .\" drbdsetup: show-gi .PP Displays the device\*(Aqs data generation identifiers verbosely\&. .SS "get\-gi" .\" drbdsetup: get-gi .PP Displays the device\*(Aqs data generation identifiers\&. .SS "show" .\" drbdsetup: show .PP Shows all available configuration information of the \fIdevice\fR\&. .SS "suspend\-io" .\" drbdsetup: suspend-io .PP This command is of no apparent use and just provided for the sake of completeness\&. .SS "resume\-io" .\" drbdsetup: resume-io .PP If the fence\-peer handler fails to stonith the peer node, and your \fBfencing\fR policy is set to resource\-and\-stonith, you can unfreeze IO operations with this command\&. .SS "events" .\" drbdsetup: events .PP Displays every state change of DRBD and all calls to helper programs\&. This might be used to get notified of DRBD\*(Aqs state changes by piping the output to another program\&.
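.PP
Piping the output to another program can be as simple as a shell loop that reads one line at a time\&. The sketch below is not part of DRBD: the function name \fBtag_events\fR is hypothetical, and since the exact line format is not described here, each line is treated as opaque text and merely prefixed with a tag\&. Two \fBecho\fR commands stand in for the event stream\&.
.sp
.nf
```shell
# Hypothetical consumer, not part of DRBD: prefix every line read
# from standard input with a tag given as the first argument.
tag_events() {
    while IFS= read -r line; do
        echo "$1: $line"
    done
}

{ echo first; echo second; } | tag_events drbd
```
.fi
.sp
In real use the output of the \fBevents\fR command would be piped into the loop instead, and the loop body could call a notification program rather than \fBecho\fR\&.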
.PP \fB\-\-all\-devices\fR .RS 4 Display the events of all DRBD minors\&. .RE .PP \fB\-\-unfiltered\fR .RS 4 This is a debugging aid that displays the content of all received netlink messages\&. .RE .SS "new\-current\-uuid" .\" drbdsetup: new-current-uuid .PP Generates a new current UUID and rotates all other UUID values\&. This has at least two use cases, namely to skip the initial sync, and to reduce network bandwidth when starting in a single node configuration and then later (re\-)integrating a remote site\&. .PP Available option: .PP \fB\-\-clear\-bitmap\fR .RS 4 Clears the sync bitmap in addition to generating a new current UUID\&. .RE .PP This can be used to skip the initial sync, if you want to start from scratch\&. This use\-case works only on "Just Created" meta data\&. Necessary steps: .sp .RS 4 .ie n \{\ \h'-04' 1.\h'+01'\c .\} .el \{\ .sp -1 .IP " 1." 4.2 .\} On \fIboth\fR nodes, initialize meta data and configure the device\&. .sp \fBdrbdadm create\-md \-\-force \fR\fB\fIres\fR\fR .RE .sp .RS 4 .ie n \{\ \h'-04' 2.\h'+01'\c .\} .el \{\ .sp -1 .IP " 2." 4.2 .\} They need to do the initial handshake, so they know their sizes\&. .sp \fBdrbdadm up \fR\fB\fIres\fR\fR .RE .sp .RS 4 .ie n \{\ \h'-04' 3.\h'+01'\c .\} .el \{\ .sp -1 .IP " 3." 4.2 .\} They are now Connected Secondary/Secondary Inconsistent/Inconsistent\&. Generate a new current\-uuid and clear the dirty bitmap\&. .sp \fBdrbdadm new\-current\-uuid \-\-clear\-bitmap \fR\fB\fIres\fR\fR .RE .sp .RS 4 .ie n \{\ \h'-04' 4.\h'+01'\c .\} .el \{\ .sp -1 .IP " 4." 4.2 .\} They are now Connected Secondary/Secondary UpToDate/UpToDate\&. Make one side primary and create a file system\&.
.sp \fBdrbdadm primary \fR\fB\fIres\fR\fR .sp \fBmkfs \-t \fR\fB\fIfs\-type\fR\fR\fB $(drbdadm sh\-dev \fR\fB\fIres\fR\fR\fB)\fR .RE .PP One obvious side\-effect is that the replica is full of old garbage (unless you made them identical using other means), so any online\-verify is expected to find any number of out\-of\-sync blocks\&. .PP \fIYou must not use this on pre\-existing data!\fR Even though it may appear to work at first glance, once you switch to the other node, your data is toast, as it never got replicated\&. So \fIdo not leave out the mkfs\fR (or equivalent)\&. .PP This can also be used to shorten the initial resync of a cluster where the second node is added after the first node has gone into production, by means of disk shipping\&. This use\-case works on disconnected devices only; the device may be in primary or secondary role\&. .PP The necessary steps on the current active server are: .sp .RS 4 .ie n \{\ \h'-04' 1.\h'+01'\c .\} .el \{\ .sp -1 .IP " 1." 4.2 .\} \fBdrbdsetup new\-current\-uuid \-\-clear\-bitmap \fR\fB\fIminor\fR\fR\fB \fR .RE .sp .RS 4 .ie n \{\ \h'-04' 2.\h'+01'\c .\} .el \{\ .sp -1 .IP " 2." 4.2 .\} Take the copy of the current active server\&. E\&.g\&. by pulling a disk out of the RAID1 controller, or by copying with dd\&. You need to copy the actual data, and the meta data\&. .RE .sp .RS 4 .ie n \{\ \h'-04' 3.\h'+01'\c .\} .el \{\ .sp -1 .IP " 3." 4.2 .\} \fBdrbdsetup new\-current\-uuid \fR\fB\fIminor\fR\fR\fB \fR .RE .sp Now add the disk to the new secondary node, and join it to the cluster\&. You will get a resync of those parts that were changed since the first call to \fBdrbdsetup\fR in step 1\&. .SH "EXAMPLES" .PP For examples, please have a look at the \m[blue]\fBDRBD User\*(Aqs Guide\fR\m[]\&\s-2\u[1]\d\s+2\&. .SH "VERSION" .sp This document was revised for version 8\&.3\&.2 of the DRBD distribution\&.
.SH "AUTHOR" .sp Written by Philipp Reisner philipp\&.reisner@linbit\&.com and Lars Ellenberg lars\&.ellenberg@linbit\&.com .SH "REPORTING BUGS" .sp Report bugs to drbd\-user@lists\&.linbit\&.com\&. .SH "COPYRIGHT" .sp Copyright 2001\-2008 LINBIT Information Technologies, Philipp Reisner, Lars Ellenberg\&. This is free software; see the source for copying conditions\&. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE\&. .SH "SEE ALSO" .PP \fBdrbd.conf\fR(5), \fBdrbd\fR(8), \fBdrbddisk\fR(8), \fBdrbdadm\fR(8), \m[blue]\fBDRBD User\*(Aqs Guide\fR\m[]\&\s-2\u[1]\d\s+2, \m[blue]\fBDRBD web site\fR\m[]\&\s-2\u[2]\d\s+2 .SH "NOTES" .IP " 1." 4 DRBD User's Guide .RS 4 \%http://www.drbd.org/users-guide/ .RE .IP " 2." 4 DRBD web site .RS 4 \%http://www.drbd.org/ .RE drbd-8.4.4/documentation/drbdadm.80000664000000000000000000002700512226007147015506 0ustar rootroot'\" t .\" Title: drbdadm .\" Author: [see the "Author" section] .\" Generator: DocBook XSL Stylesheets v1.76.1 .\" Date: 6 May 2011 .\" Manual: System Administration .\" Source: DRBD 8.4.0 .\" Language: English .\" .TH "DRBDADM" "8" "6 May 2011" "DRBD 8.4.0" "System Administration" .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .\" http://bugs.debian.org/507673 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" ----------------------------------------------------------------- .\" * set default formatting .\" ----------------------------------------------------------------- .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .\" ----------------------------------------------------------------- .\" * MAIN CONTENT STARTS 
HERE * .\" ----------------------------------------------------------------- .SH "NAME" drbdadm \- Administration tool for DRBD .\" drbdadm .SH "SYNOPSIS" .HP \w'\fBdrbdadm\fR\ 'u \fBdrbdadm\fR [\-d] [\-c\ {\fIfile\fR}] [\-t\ {\fIfile\fR}] [\-s\ {\fIcmd\fR}] [\-m\ {\fIcmd\fR}] [\-S] [\-h\ {\fIhost\fR}] [\-\-\ {\fIbackend\-options\fR}] {\fIcommand\fR} [{all} | {\fIresource\fR\fI[/volume]\fR...}] .SH "DESCRIPTION" .PP \fBDrbdadm\fR is the high level tool of the DRBD program suite\&. \fBDrbdadm\fR is to \fBdrbdsetup\fR and \fBdrbdmeta\fR what \fBifup\fR/\fBifdown\fR is to \fBifconfig\fR\&. \fBDrbdadm\fR reads its configuration file and performs the specified commands by calling the \fBdrbdsetup\fR and/or the \fBdrbdmeta\fR program\&. .PP \fBDrbdadm\fR can operate on whole resources or on individual volumes in a resource\&. The sub commands: \fBattach\fR, \fBdetach\fR, \fBprimary\fR, \fBsecondary\fR, \fBinvalidate\fR, \fBinvalidate\-remote\fR, \fBoutdate\fR, \fBresize\fR, \fBverify\fR, \fBpause\-sync\fR, \fBresume\-sync\fR, \fBrole\fR, \fBcstate\fR, \fBdstate\fR, \fBcreate\-md\fR, \fBshow\-gi\fR, \fBget\-gi\fR, \fBdump\-md\fR, \fBwipe\-md\fR work on whole resources and on individual volumes\&. .PP Resource level only commands are: \fBconnect\fR, \fBdisconnect\fR, \fBup\fR, \fBdown\fR, \fBwait\-connect\fR and \fBdump\fR\&. .SH "OPTIONS" .PP \fB\-d\fR, \fB\-\-dry\-run\fR .RS 4 Just prints the calls of \fBdrbdsetup\fR to stdout, but does not run the commands\&. .RE .PP \fB\-c\fR, \fB\-\-config\-file\fR \fIfile\fR .RS 4 Specifies the configuration file drbdadm will use\&. If this parameter is not specified, drbdadm will look for \fB/etc/drbd\-84\&.conf\fR, \fB/etc/drbd\-83\&.conf\fR, \fB/etc/drbd\-08\&.conf\fR and \fB/etc/drbd\&.conf\fR\&. .RE .PP \fB\-t\fR, \fB\-\-config\-to\-test\fR \fIfile\fR .RS 4 Specifies an additional configuration file for drbdadm to check\&. This option is only allowed with the dump and the sh\-nop commands\&.
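.sp
A candidate file could, for instance, be checked from a script before it is put in place\&. The wrapper below is only a sketch: the function name \fBcheck_candidate_config\fR and the \fBDRBDADM\fR override variable are hypothetical, and it assumes that \fBdrbdadm\fR reports a file it cannot parse through a non\-zero exit status\&.
.sp
.nf
```shell
# Hypothetical wrapper, not part of DRBD: parse a candidate file with
# the dump command and pass on the exit status of drbdadm.
# DRBDADM may be overridden, e.g. to try the wrapper without DRBD.
check_candidate_config() {
    "${DRBDADM:-drbdadm}" -t "$1" dump > /dev/null
}
```
.fi
.sp
A call such as \fBcheck_candidate_config\fR \fIfile\fR then succeeds or fails together with the underlying \fBdrbdadm\fR invocation\&.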
.RE .PP \fB\-s\fR, \fB\-\-drbdsetup\fR \fIfile\fR .RS 4 Specifies the full path to the \fBdrbdsetup\fR program\&. If this option is omitted, drbdadm will look for \fB/sbin/drbdsetup\fR and \fB\&./drbdsetup\fR\&. .RE .PP \fB\-m\fR, \fB\-\-drbdmeta\fR \fIfile\fR .RS 4 Specifies the full path to the \fBdrbdmeta\fR program\&. If this option is omitted, drbdadm will look for \fB/sbin/drbdmeta\fR and \fB\&./drbdmeta\fR\&. .RE .PP \fB\-S\fR, \fB\-\-stacked\fR .RS 4 Specifies that this command should be performed on a stacked resource\&. .RE .PP \fB\-P\fR, \fB\-\-peer\fR .RS 4 Specifies to which peer node to connect\&. Only necessary if there are more than two host sections in the resource you are working on\&. .RE .PP \fB\-\-\fR \fIbackend\-options\fR .RS 4 All options following the double hyphen are considered \fIbackend\-options\fR\&. These are passed through to the backend command, i\&.e\&. to \fBdrbdsetup\fR, \fBdrbdmeta\fR or \fBdrbd\-proxy\-ctl\fR\&. .RE .SH "COMMANDS" .PP attach .RS 4 Attaches a local backing block device to the DRBD resource\*(Aqs device\&. .RE .PP detach .RS 4 .\" drbdadm: detach Removes the backing storage device from a DRBD resource\*(Aqs device\&. .RE .PP connect .RS 4 .\" drbdadm: connect Sets up the network configuration of the resource\*(Aqs device\&. If the peer device is already configured, the two DRBD devices will connect\&. If there are more than two host sections in the resource you need to use the \fB\-\-peer\fR option to select the peer you want to connect to\&. .RE .PP disconnect .RS 4 .\" drbdadm: disconnect Removes the network configuration from the resource\&. The device will then go into StandAlone state\&. .RE .PP syncer .RS 4 .\" drbdadm: syncer Loads the resynchronization parameters into the device\&. .RE .PP up .RS 4 .\" drbdadm: up Is a shortcut for attach and connect\&. .RE .PP down .RS 4 .\" drbdadm: down Is a shortcut for disconnect and detach\&.
.RE .PP primary .RS 4 .\" drbdadm: primary Promote the resource\*(Aqs device into primary role\&. You need to do this before any access to the device, such as creating or mounting a file system\&. .RE .PP secondary .RS 4 .\" drbdadm: secondary Brings the device back into secondary role\&. This is needed since in a connected DRBD device pair, only one of the two peers may have primary role (except if \fBallow\-two\-primaries\fR is explicitly set in the configuration file)\&. .RE .PP invalidate .RS 4 .\" drbdadm: invalidate Forces DRBD to consider the data on the \fIlocal\fR backing storage device as out\-of\-sync\&. Therefore DRBD will copy each and every block from its peer, to bring the local storage device back in sync\&. To avoid races, you need an established replication link, or be disconnected Secondary\&. .RE .PP invalidate\-remote .RS 4 .\" drbdadm: invalidate-remote This command is similar to the invalidate command, however, the \fIpeer\*(Aqs\fR backing storage is invalidated and hence rewritten with the data of the local node\&. To avoid races, you need an established replication link, or be disconnected Primary\&. .RE .PP resize .RS 4 .\" drbdadm: resize Causes DRBD to re\-examine all sizing constraints, and resize the resource\*(Aqs device accordingly\&. For example, if you increased the size of your backing storage devices (on both nodes, of course), then DRBD will adopt the new size after you called this command on one of your nodes\&. Since new storage space must be synchronised this command only works if there is at least one primary node present\&. .sp The \fB\-\-size\fR option can be used to online shrink the usable size of a drbd device\&. It\*(Aqs the user\*(Aqs responsibility to make sure that a file system on the device is not truncated by that operation\&. .sp The \fB\-\-assume\-peer\-has\-space\fR allows you to resize a device which is currently not connected to the peer\&.
Use with care, since if you do not resize the peer\*(Aqs disk as well, further connect attempts of the two will fail\&. .sp The \fB\-\-assume\-clean\fR allows you to resize an existing device and avoid syncing the new space\&. This is useful when adding additional blank storage to your device\&. Example: .sp .if n \{\ .RS 4 .\} .nf # drbdadm \-\- \-\-assume\-clean resize r0 .fi .if n \{\ .RE .\} .sp The options \fB\-\-al\-stripes\fR and \fB\-\-al\-stripe\-size\-kB\fR may be used to change the layout of the activity log online\&. In case of internal meta data this may involve shrinking the user visible size at the same time (using the \fB\-\-size\fR option) or increasing the available space on the backing devices\&. .RE .PP check\-resize .RS 4 .\" drbdadm: check-resize Calls drbdmeta to move internal meta data if necessary\&. If the backing device was resized, while DRBD was not running, meta data has to be moved to the end of the device, so that the next \fBattach\fR command can succeed\&. .RE .PP create\-md .RS 4 .\" drbdadm: create-md Initializes the meta data storage\&. This needs to be done before a DRBD resource can be taken online for the first time\&. In case of issues with that command have a look at \fBdrbdmeta\fR(8)\&. .RE .PP get\-gi .RS 4 .\" drbdadm: get-gi Shows a short textual representation of the data generation identifiers\&. .RE .PP show\-gi .RS 4 .\" drbdadm: show-gi Prints a textual representation of the data generation identifiers including explanatory information\&. .RE .PP dump\-md .RS 4 .\" drbdadm: dump-md Dumps the whole contents of the meta data storage, including the stored bit\-map and activity\-log, in a textual representation\&. .RE .PP outdate .RS 4 .\" drbdadm: outdate Sets the outdated flag in the meta data\&. .RE .PP adjust .RS 4 .\" drbdadm: adjust Synchronizes the configuration of the device with your configuration file\&. You should always examine the output of the dry\-run mode before actually executing this command\&.
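.sp
The dry\-run preview can be wrapped in a small shell function so that it is not forgotten\&. This is a sketch only: the function name \fBpreview_adjust\fR and the \fBDRBDADM\fR override variable are hypothetical; the \fB\-d\fR option and the \fBadjust\fR command are the ones documented on this page\&.
.sp
.nf
```shell
# Hypothetical wrapper, not part of DRBD: show what "adjust" would do
# for one resource by running drbdadm in dry-run mode (-d).
# DRBDADM may be overridden, e.g. to try the wrapper without DRBD.
preview_adjust() {
    "${DRBDADM:-drbdadm}" -d adjust "$1"
}
```
.fi
.sp
After reviewing the printed calls for a resource, the same command can be run again without \fB\-d\fR to apply the changes\&.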
.RE .PP wait\-connect .RS 4 .\" drbdadm: wait-connect Waits until the device is connected to its peer device\&. .RE .PP role .RS 4 .\" drbdadm: role Shows the current roles of the devices (local/peer)\&. E\&.g\&. Primary/Secondary .RE .PP state .RS 4 .\" drbdadm: state Deprecated alias for "role", see above\&. .RE .PP cstate .RS 4 .\" drbdadm: cstate Shows the current connection state of the devices\&. .RE .PP dump .RS 4 .\" drbdadm: dump Just parse the configuration file and dump it to stdout\&. May be used to check the configuration file for syntactic correctness\&. .RE .PP outdate .RS 4 .\" drbdadm: outdate Used to mark the node\*(Aqs data as outdated\&. Usually used by the peer\*(Aqs fence\-peer handler\&. .RE .PP verify .RS 4 .\" drbdadm: verify Starts online verify\&. During online verify, data on both nodes is compared for equality\&. See /proc/drbd for online verify progress\&. If out\-of\-sync blocks are found, they are \fInot\fR resynchronized automatically\&. To do that, \fBdisconnect\fR and \fBconnect\fR the resource when verification has completed\&. .sp See also the notes on data integrity on the drbd\&.conf manpage\&. .RE .PP pause\-sync .RS 4 .\" drbdadm: pause-sync Temporarily suspend an ongoing resynchronization by setting the local pause flag\&. Resync only progresses if neither the local nor the remote pause flag is set\&. It might be desirable to postpone DRBD\*(Aqs resynchronization until after any resynchronization of the backing storage\*(Aqs RAID setup\&. .RE .PP resume\-sync .RS 4 .\" drbdadm: resume-sync Unset the local sync pause flag\&. .RE .PP new\-current\-uuid .RS 4 .\" drbdadm: new-current-uuid Generates a new current UUID and rotates all other UUID values\&. .sp This can be used to shorten the initial resync of a cluster\&. See the \fBdrbdsetup\fR manpage for more details\&. .RE .PP dstate .RS 4 .\" drbdadm: dstate Show the current state of the backing storage devices\&.
(local/peer) .RE .PP hidden\-commands .RS 4 Shows all commands undocumented on purpose\&. .RE .SH "VERSION" .sp This document was revised for version 8\&.4\&.0 of the DRBD distribution\&. .SH "AUTHOR" .sp Written by Philipp Reisner philipp\&.reisner@linbit\&.com and Lars Ellenberg lars\&.ellenberg@linbit\&.com .SH "REPORTING BUGS" .sp Report bugs to drbd\-user@lists\&.linbit\&.com\&. .SH "COPYRIGHT" .sp Copyright 2001\-2011 LINBIT Information Technologies, Philipp Reisner, Lars Ellenberg\&. This is free software; see the source for copying conditions\&. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE\&. .SH "SEE ALSO" .PP \fBdrbd.conf\fR(5), \fBdrbd\fR(8), \fBdrbddisk\fR(8), \fBdrbdsetup\fR(8), \fBdrbdmeta\fR(8) and the \m[blue]\fBDRBD project web site\fR\m[]\&\s-2\u[1]\d\s+2 .SH "NOTES" .IP " 1." 4 DRBD project web site .RS 4 \%http://www.drbd.org/ .RE drbd-8.4.4/documentation/drbdmeta.80000664000000000000000000001526512226007147015700 0ustar rootroot'\" t .\" Title: drbdmeta .\" Author: [see the "Author" section] .\" Generator: DocBook XSL Stylesheets v1.76.1 .\" Date: 15 Oct 2008 .\" Manual: System Administration .\" Source: DRBD 8.3.2 .\" Language: English .\" .TH "DRBDMETA" "8" "15 Oct 2008" "DRBD 8.3.2" "System Administration" .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .\" http://bugs.debian.org/507673 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" ----------------------------------------------------------------- .\" * set default formatting .\" ----------------------------------------------------------------- .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) 
.ad l .\" ----------------------------------------------------------------- .\" * MAIN CONTENT STARTS HERE * .\" ----------------------------------------------------------------- .SH "NAME" drbdmeta \- DRBD\*(Aqs meta data management tool .\" drbdmeta .SH "SYNOPSIS" .HP \w'\fBdrbdmeta\fR\ 'u \fBdrbdmeta\fR [\-\-force] [\-\-ignore\-sanity\-checks] {\fIdevice\fR} {v06\ \fIminor\fR | v07\ \fImeta_dev\ index\fR | v08\ \fImeta_dev\ index\fR} {\fIcommand\fR} [\fIcmd\ args\fR...] .SH "DESCRIPTION" .PP Drbdmeta is used to create, display and modify the contents of DRBD\*(Aqs meta data storage\&. Usually you do not want to use this command directly, but start it via the frontend \fBdrbdadm\fR(8)\&. .PP This command only works if the DRBD resource is currently down, or at least detached from its backing storage\&. The first parameter is the device node associated to the resource\&. With the second parameter you can select the version of the meta data\&. Currently all major DRBD releases (0\&.6, 0\&.7 and 8) are supported\&. .SH "OPTIONS" .PP \-\-force .RS 4 .\" drbdmeta: --force All questions that get asked by drbdmeta are treated as if the user answered \*(Aqyes\*(Aq\&. .RE .PP \-\-ignore\-sanity\-checks .RS 4 .\" drbdmeta: --ignore-sanity-checks Some sanity checks cause drbdmeta to terminate\&. E\&.g\&. if a file system image would get destroyed by creating the meta data\&. By using that option you can force drbdmeta to ignore these checks\&. .RE .SH "COMMANDS" .PP create\-md \fB\-\-peer\-max\-bio\-size \fR\fB\fIval\fR\fR \fB\-\-al\-stripes \fR\fB\fIval\fR\fR \fB\-\-al\-stripe\-size\-kB \fR\fB\fIval\fR\fR .RS 4 .\" drbdmeta: create-md Create\-md initializes the meta data storage\&. This needs to be done before a DRBD resource can be taken online for the first time\&. In case there is already a meta data signature of an older format in place, drbdmeta will ask you if it should convert the older format to the selected format\&. 
.sp If you will use the resource before it is connected to its peer for the first time, DRBD may perform better if you use the \fB\-\-peer\-max\-bio\-size\fR option\&. For DRBD versions of the peer use up to these values: <8\&.3\&.7 \-> 4k, 8\&.3\&.8 \-> 32k, 8\&.3\&.9 \-> 128k, 8\&.4\&.0 \-> 1M\&. .sp If you want to use more than 6433 activity log extents, or live on top of a striped RAID, you may specify the number of stripes (\fB\-\-al\-stripes\fR, default 1), and the stripe size (\fB\-\-al\-stripe\-size\-kB\fR, default 32)\&. To just use a larger linear on\-disk ring\-buffer, leave the number of stripes at 1, and increase the size only: \fBdrbdmeta 0 v08 /dev/vg23/lv42 internal create\-md \-\-al\-stripe\-size 1M\fR .sp To keep a single "spindle" from becoming a bottleneck, increase the number of stripes, to achieve an interleaved layout of the on\-disk activity\-log transactions\&. What you give as "stripe\-size" should be what is a\&.k\&.a\&. "chunk size" or "granularity" or "strip unit": the minimum skip to the next "spindle"\&. \fBdrbdmeta 0 v08 /dev/vg23/lv42 internal create\-md \-\-al\-stripes 7 \-\-al\-stripe\-size 64k\fR .RE .PP get\-gi .RS 4 .\" drbdmeta: get-gi Get\-gi shows a short textual representation of the data generation identifier\&. In version 0\&.6 and 0\&.7 these are generation counters, while in version 8 it is a set of UUIDs\&. .RE .PP show\-gi .RS 4 .\" drbdmeta: show-gi Show\-gi prints a textual representation of the data generation identifiers including explanatory information\&. .RE .PP dump\-md .RS 4 .\" drbdmeta: dump-md Dumps the whole contents of the meta data storage including the stored bit\-map and activity\-log in a textual representation\&. .RE .PP outdate .RS 4 .\" drbdmeta: outdate Sets the outdated flag in the meta data\&. This is used by the peer node when it wants to become primary, but cannot communicate with the DRBD stack on this host\&.
.RE .PP dstate .RS 4 .\" drbdmeta: dstate Prints the state of the data on the backing storage\&. The output is always followed by \*(Aq/DUnknown\*(Aq since drbdmeta only looks at the local meta data\&. .RE .PP check\-resize .RS 4 .\" drbdmeta: check-resize Examines the device size of a backing device, and its last known device size, recorded in a file /var/lib/drbd/drbd\-minor\-??\&.lkbd\&. In case the size of the backing device changed, and the meta data can be found at the old position, it moves the meta data to the right position at the end of the block device\&. .RE .SH "EXPERT'S COMMANDS" .PP Drbdmeta allows you to modify the meta data as well\&. This is intentionally omitted from the command\*(Aqs usage output, since you should only use it if you really know what you are doing\&. By setting the generation identifiers to wrong values, you risk overwriting your up\-to\-date data with an older version of your data\&. .PP set\-gi \fIgi\fR .RS 4 .\" drbdmeta: set-gi Set\-gi allows you to set the generation identifier\&. \fIGi\fR needs to be a generation counter for the 0\&.6 and 0\&.7 format, and a UUID set for 8\&.x\&. Specify it in the same way as get\-gi shows it\&. .RE .PP restore\-md \fIdump_file\fR .RS 4 .\" drbdmeta: restore-md Reads the \fIdump_file\fR and writes it to the meta data\&. .RE .SH "VERSION" .sp This document was revised for version 8\&.3\&.2 of the DRBD distribution\&. .SH "AUTHOR" .sp Written by Philipp Reisner philipp\&.reisner@linbit\&.com and Lars Ellenberg lars\&.ellenberg@linbit\&.com\&. .SH "REPORTING BUGS" .sp Report bugs to drbd\-user@lists\&.linbit\&.com\&. .SH "COPYRIGHT" .sp Copyright 2001\-2008 LINBIT Information Technologies, Philipp Reisner, Lars Ellenberg\&. This is free software; see the source for copying conditions\&. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE\&.
.SH "SEE ALSO" .PP \fBdrbdadm\fR(8) drbd-8.4.4/documentation/drbd.conf.50000664000000000000000000014245712226007146015755 0ustar rootroot'\" t .\" Title: drbd.conf .\" Author: [see the "Author" section] .\" Generator: DocBook XSL Stylesheets v1.76.1 .\" Date: 6 May 2011 .\" Manual: Configuration Files .\" Source: DRBD 8.4.0 .\" Language: English .\" .TH "DRBD\&.CONF" "5" "6 May 2011" "DRBD 8.4.0" "Configuration Files" .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .\" http://bugs.debian.org/507673 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" ----------------------------------------------------------------- .\" * set default formatting .\" ----------------------------------------------------------------- .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .\" ----------------------------------------------------------------- .\" * MAIN CONTENT STARTS HERE * .\" ----------------------------------------------------------------- .SH "NAME" drbd.conf \- Configuration file for DRBD\*(Aqs devices .\" drbd.conf .SH "INTRODUCTION" .PP The file \fB/etc/drbd\&.conf\fR is read by \fBdrbdadm\fR\&. .PP The file format was designed as to allow to have a verbatim copy of the file on both nodes of the cluster\&. It is highly recommended to do so in order to keep your configuration manageable\&. The file \fB/etc/drbd\&.conf\fR should be the same on both nodes of the cluster\&. Changes to \fB/etc/drbd\&.conf\fR do not apply immediately\&. .PP By convention the main config contains two include statements\&. 
The first one includes the file \fB/etc/drbd\&.d/global_common\&.conf\fR, the second one all files with a \fB\&.res\fR suffix\&. .PP .PP \fBExample\ \&1.\ \&A small example.res file\fR .sp .if n \{\ .RS 4 .\} .nf resource r0 { net { protocol C; cram\-hmac\-alg sha1; shared\-secret "FooFunFactory"; } disk { resync\-rate 10M; } on alice { volume 0 { device minor 1; disk /dev/sda7; meta\-disk internal; } address 10\&.1\&.1\&.31:7789; } on bob { volume 0 { device minor 1; disk /dev/sda7; meta\-disk internal; } address 10\&.1\&.1\&.32:7789; } } .fi .if n \{\ .RE .\}In this example, there is a single DRBD resource (called r0) which uses protocol C for the connection between its devices\&. It contains a single volume which, on host \fIalice\fR, uses \fI/dev/drbd1\fR as the device for its application, and \fI/dev/sda7\fR as low\-level storage for the data\&. The IP addresses are used to specify the networking interfaces to be used\&. A resync process, if one is running, should use about 10MByte/second of IO bandwidth\&. This resync\-rate statement is valid for volume 0, but would also be valid for further volumes\&. In this example it assigns full 10MByte/second to each volume\&. .PP There may be multiple resource sections in a single drbd\&.conf file\&. For more examples, please have a look at the \m[blue]\fBDRBD User\*(Aqs Guide\fR\m[]\&\s-2\u[1]\d\s+2\&. .SH "FILE FORMAT" .PP The file consists of sections and parameters\&. A section begins with a keyword, sometimes an additional name, and an opening brace (\(lq{\(rq)\&. A section ends with a closing brace (\(lq}\(rq)\&. The braces enclose the parameters\&. .PP section [name] { parameter value; [\&.\&.\&.] } .PP A parameter starts with the identifier of the parameter followed by whitespace\&. Every subsequent character is considered as part of the parameter\*(Aqs value\&. A special case are Boolean parameters which consist only of the identifier\&. Parameters are terminated by a semicolon (\(lq;\(rq)\&.
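.PP
As a minimal sketch of the syntax just described (the chosen values are illustrative only), the following global section combines a parameter that takes a value with a Boolean parameter that consists only of its identifier:
.sp
.if n \{\
.RS 4
.\}
.nf
global {
    usage\-count no;            # identifier, whitespace, value, semicolon
    disable\-ip\-verification;   # Boolean parameter: identifier and semicolon only
}
.fi
.if n \{\
.RE
.\}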
.PP Some parameter values have default units which might be overruled by K, M or G\&. These units are defined in the usual way (K = 2^10 = 1024, M = 1024 K, G = 1024 M)\&. .PP Comments may be placed into the configuration file and must begin with a hash sign (\(lq#\(rq)\&. Subsequent characters are ignored until the end of the line\&. .SS "Sections" .PP \fBskip\fR .RS 4 .\" drbd.conf: skip Comments out chunks of text, even spanning more than one line\&. Characters between the keyword \fBskip\fR and the opening brace (\(lq{\(rq) are ignored\&. Everything enclosed by the braces is skipped\&. This comes in handy, if you just want to comment out some \*(Aq\fBresource [name] {\&.\&.\&.}\fR\*(Aq section: just precede it with \*(Aq\fBskip\fR\*(Aq\&. .RE .PP \fBglobal\fR .RS 4 .\" drbd.conf: global Configures some global parameters\&. Currently only \fBminor\-count\fR, \fBdialog\-refresh\fR, \fBdisable\-ip\-verification\fR and \fBusage\-count\fR are allowed here\&. You may only have one global section, preferably as the first section\&. .RE .PP \fBcommon\fR .RS 4 .\" drbd.conf: common All resources inherit the options set in this section\&. The common section might have a \fBstartup\fR, a \fBoptions\fR, a \fBhandlers\fR, a \fBnet\fR and a \fBdisk\fR section\&. .RE .PP \fBresource \fR\fB\fIname\fR\fR .RS 4 .\" drbd.conf: resource Configures a DRBD resource\&. Each resource section needs to have two (or more) \fBon \fR\fB\fIhost\fR\fR sections and may have a \fBstartup\fR, a \fBoptions\fR, a \fBhandlers\fR, a \fBnet\fR and a \fBdisk\fR section\&. It might contain \fBvolume\fRs sections\&. .RE .PP \fBon \fR\fB\fIhost\-name\fR\fR .RS 4 .\" drbd.conf: on Carries the necessary configuration parameters for a DRBD device of the enclosing resource\&. \fIhost\-name\fR is mandatory and must match the Linux host name (uname \-n) of one of the nodes\&. 
You may list more than one host name here, in case you want to use the same parameters on several hosts (you\*(Aqd have to move the IP around usually)\&. Or you may list more than two such sections\&. .sp .if n \{\ .RS 4 .\} .nf resource r1 { protocol C; device minor 1; meta\-disk internal; on alice bob { address 10\&.2\&.2\&.100:7801; disk /dev/mapper/some\-san; } on charlie { address 10\&.2\&.2\&.101:7801; disk /dev/mapper/other\-san; } on daisy { address 10\&.2\&.2\&.103:7801; disk /dev/mapper/other\-san\-as\-seen\-from\-daisy; } } .fi .if n \{\ .RE .\} .sp See also the \fBfloating\fR section keyword\&. Required statements in this section: \fBaddress\fR and \fBvolume\fR\&. Note for backward compatibility and convenience it is valid to embed the statements of a single volume directly into the host section\&. .RE .PP \fBvolume \fR\fB\fIvnr\fR\fR .RS 4 .\" drbd.conf: volume Defines a volume within a connection\&. The minor numbers of a replicated volume might be different on different hosts, the volume number (\fIvnr\fR) is what groups them together\&. Required parameters in this section: \fBdevice\fR, \fBdisk\fR, \fBmeta\-disk\fR\&. .RE .PP \fBstacked\-on\-top\-of \fR\fB\fIresource\fR\fR .RS 4 .\" drbd.conf: stacked-on-top-of For a stacked DRBD setup (3 or 4 nodes), a \fBstacked\-on\-top\-of\fR is used instead of an \fBon\fR section\&. Required parameters in this section: \fBdevice\fR and \fBaddress\fR\&. .RE .PP \fBfloating \fR\fB\fIAF addr:port\fR\fR .RS 4 .\" drbd.conf: on Carries the necessary configuration parameters for a DRBD device of the enclosing resource\&. This section is very similar to the \fBon\fR section\&. The difference to the \fBon\fR section is that the matching of the host sections to machines is done by the IP\-address instead of the node name\&. 
Required parameters in this section: \fBdevice\fR, \fBdisk\fR, \fBmeta\-disk\fR, all of which \fImay\fR be inherited from the resource section, in which case you may shorten this section down to just the address identifier\&. .sp .if n \{\ .RS 4 .\} .nf resource r2 { protocol C; device minor 2; disk /dev/sda7; meta\-disk internal; # short form, device, disk and meta\-disk inherited floating 10\&.1\&.1\&.31:7802; # longer form, only device inherited floating 10\&.1\&.1\&.32:7802 { disk /dev/sdb; meta\-disk /dev/sdc8; } } .fi .if n \{\ .RE .\} .RE .PP \fBdisk\fR .RS 4 .\" drbd.conf: disk This section is used to fine tune DRBD\*(Aqs properties in respect to the low level storage\&. Please refer to \fBdrbdsetup\fR(8) for detailed description of the parameters\&. Optional parameters: \fBon\-io\-error\fR, \fBsize\fR, \fBfencing\fR, \fBdisk\-barrier\fR, \fBdisk\-flushes\fR, \fBdisk\-drain\fR, \fBmd\-flushes\fR, \fBmax\-bio\-bvecs\fR, \fBresync\-rate\fR, \fBresync\-after\fR, \fBal\-extents\fR, \fBal\-updates\fR, \fBc\-plan\-ahead\fR, \fBc\-fill\-target\fR, \fBc\-delay\-target\fR, \fBc\-max\-rate\fR, \fBc\-min\-rate\fR, \fBdisk\-timeout\fR, \fBread\-balancing\fR\&. .RE .PP \fBnet\fR .RS 4 .\" drbd.conf: net This section is used to fine tune DRBD\*(Aqs properties\&. Please refer to \fBdrbdsetup\fR(8) for a detailed description of this section\*(Aqs parameters\&. Optional parameters: \fBprotocol\fR, \fBsndbuf\-size\fR, \fBrcvbuf\-size\fR, \fBtimeout\fR, \fBconnect\-int\fR, \fBping\-int\fR, \fBping\-timeout\fR, \fBmax\-buffers\fR, \fBmax\-epoch\-size\fR, \fBko\-count\fR, \fBallow\-two\-primaries\fR, \fBcram\-hmac\-alg\fR, \fBshared\-secret\fR, \fBafter\-sb\-0pri\fR, \fBafter\-sb\-1pri\fR, \fBafter\-sb\-2pri\fR, \fBdata\-integrity\-alg\fR, \fBno\-tcp\-cork\fR, \fBon\-congestion\fR, \fBcongestion\-fill\fR, \fBcongestion\-extents\fR, \fBverify\-alg\fR, \fBuse\-rle\fR, \fBcsums\-alg\fR\&. 
.RE .PP \fBstartup\fR .RS 4 .\" drbd.conf: startup This section is used to fine tune DRBD\*(Aqs properties\&. Please refer to \fBdrbdsetup\fR(8) for a detailed description of this section\*(Aqs parameters\&. Optional parameters: \fBwfc\-timeout\fR, \fBdegr\-wfc\-timeout\fR, \fBoutdated\-wfc\-timeout\fR, \fBwait\-after\-sb\fR, \fBstacked\-timeouts\fR and \fBbecome\-primary\-on\fR\&. .RE .PP \fBoptions\fR .RS 4 .\" drbd.conf: options This section is used to fine tune the behaviour of the resource object\&. Please refer to \fBdrbdsetup\fR(8) for a detailed description of this section\*(Aqs parameters\&. Optional parameters: \fBcpu\-mask\fR, and \fBon\-no\-data\-accessible\fR\&. .RE .PP \fBhandlers\fR .RS 4 .\" drbd.conf: handlers In this section you can define handlers (executables) that are started by the DRBD system in response to certain events\&. Optional parameters: \fBpri\-on\-incon\-degr\fR, \fBpri\-lost\-after\-sb\fR, \fBpri\-lost\fR, \fBfence\-peer\fR (formerly outdate\-peer), \fBlocal\-io\-error\fR, \fBinitial\-split\-brain\fR, \fBsplit\-brain\fR, \fBbefore\-resync\-target\fR, \fBafter\-resync\-target\fR\&. .sp The interface is done via environment variables: .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} \fBDRBD_RESOURCE\fR is the name of the resource .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} \fBDRBD_MINOR\fR is the minor number of the DRBD device, in decimal\&. .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} \fBDRBD_CONF\fR is the path to the primary configuration file; if you split your configuration into multiple files (e\&.g\&. in \fB/etc/drbd\&.conf\&.d/\fR), this will not be helpful\&. .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} \fBDRBD_PEER_AF\fR , \fBDRBD_PEER_ADDRESS\fR , \fBDRBD_PEERS\fR are the address family (e\&.g\&. \fBipv6\fR), the peer\*(Aqs address and hostnames\&. .RE .sp \fBDRBD_PEER\fR is deprecated\&.
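.sp
As an illustration of this interface, a handlers section might look like the following sketch\&. The script path is hypothetical and not part of the DRBD distribution; such a script would read \fBDRBD_RESOURCE\fR and the other variables listed above from its environment\&.
.sp
.if n \{\
.RS 4
.\}
.nf
handlers {
    # hypothetical script; DRBD passes DRBD_RESOURCE, DRBD_MINOR, \&.\&.\&. in its environment
    fence\-peer "/usr/local/sbin/fence\-peer\-example\&.sh";
}
.fi
.if n \{\
.RE
.\}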
.sp Please note that not all of these might be set for all handlers, and that some values might not be usable for a \fBfloating\fR definition\&. .RE .SS "Parameters" .PP \fBminor\-count \fR\fB\fIcount\fR\fR .RS 4 .\" drbd.conf: minor-count\fIcount\fR may be a number from 1 to 1048575\&. .sp \fIMinor\-count\fR is a sizing hint for DRBD\&. It helps to right\-size various memory pools\&. It should be set to the same order of magnitude as the actual number of minors you use\&. By default the module loads with 11 more resources than you have currently in your config but at least 32\&. .RE .PP \fBdialog\-refresh \fR\fB\fItime\fR\fR .RS 4 .\" drbd.conf: dialog-refresh\fItime\fR may be 0 or a positive number\&. .sp The user dialog redraws the second count every \fItime\fR seconds (or does no redraws if \fItime\fR is 0)\&. The default value is 1\&. .RE .PP \fBdisable\-ip\-verification\fR .RS 4 .\" drbd.conf: disable-ip-verification Use \fIdisable\-ip\-verification\fR if, for some obscure reasons, drbdadm can/might not use \fBip\fR or \fBifconfig\fR to do a sanity check for the IP address\&. You can disable the IP verification with this option\&. .RE .PP \fBusage\-count \fR\fB\fIval\fR\fR .RS 4 .\" drbd.conf: usage-count Please participate in \m[blue]\fBDRBD\*(Aqs online usage counter\fR\m[]\&\s-2\u[2]\d\s+2\&. The most convenient way to do so is to set this option to \fByes\fR\&. Valid options are: \fByes\fR, \fBno\fR and \fBask\fR\&. .RE .PP \fBprotocol \fR\fB\fIprot\-id\fR\fR .RS 4 .\" drbd.conf: protocol On the TCP/IP link the specified \fIprotocol\fR is used\&. Valid protocol specifiers are A, B, and C\&. .sp Protocol A: write IO is reported as completed, if it has reached local disk and local TCP send buffer\&. .sp Protocol B: write IO is reported as completed, if it has reached local disk and remote buffer cache\&. .sp Protocol C: write IO is reported as completed, if it has reached both local and remote disk\&.
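.sp
A minimal sketch of selecting a protocol in a net section; the commented\-out alternative only illustrates the trade\-off described above and is not a recommendation:
.sp
.if n \{\
.RS 4
.\}
.nf
net {
    protocol C;     # completed once both local and remote disk have the write
    # protocol A;   # completed once local disk and local TCP send buffer have it
}
.fi
.if n \{\
.RE
.\}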
.RE .PP \fBdevice \fR\fB\fIname\fR\fR\fB minor \fR\fB\fInr\fR\fR .RS 4 .\" drbd.conf: device The name of the block device node of the resource being described\&. You must use this device with your application (file system) and you must not use the low level block device which is specified with the \fBdisk\fR parameter\&. .sp One can either omit the \fIname\fR or \fBminor\fR and the \fIminor number\fR\&. If you omit the \fIname\fR a default of /dev/drbd\fIminor\fR will be used\&. .sp Udev will create additional symlinks in /dev/drbd/by\-res and /dev/drbd/by\-disk\&. .RE .PP \fBdisk \fR\fB\fIname\fR\fR .RS 4 .\" drbd.conf: disk DRBD uses this block device to actually store and retrieve the data\&. Never access such a device while DRBD is running on top of it\&. This also holds true for \fBdumpe2fs\fR(8) and similar commands\&. .RE .PP \fBaddress \fR\fB\fIAF addr:port\fR\fR .RS 4 .\" drbd.conf: address A resource needs one \fIIP\fR address per device, which is used to wait for incoming connections from the partner device respectively to reach the partner device\&. \fIAF\fR must be one of \fBipv4\fR, \fBipv6\fR, \fBssocks\fR or \fBsdp\fR (for compatibility reasons \fBsci\fR is an alias for \fBssocks\fR)\&. It may be omitted for IPv4 addresses\&. The actual IPv6 address that follows the \fBipv6\fR keyword must be placed inside brackets: ipv6 [fd01:2345:6789:abcd::1]:7800\&. .sp Each DRBD resource needs a TCP \fIport\fR which is used to connect to the node\*(Aqs partner device\&. Two different DRBD resources may not use the same \fIaddr:port\fR combination on the same node\&. .RE .PP \fBmeta\-disk internal\fR, \fBmeta\-disk \fR\fB\fIdevice\fR\fR, \fBmeta\-disk \fR\fB\fIdevice\fR\fR\fB [\fR\fB\fIindex\fR\fR\fB]\fR .RS 4 .\" drbd.conf: meta-disk Internal means that the last part of the backing device is used to store the meta\-data\&. The size of the meta\-data is computed based on the size of the device\&.
.sp When a \fIdevice\fR is specified, either with or without an \fIindex\fR, DRBD stores the meta\-data on this device\&. Without \fIindex\fR, the size of the meta\-data is determined by the size of the data device\&. This is usually used with LVM, which allows many variable sized block devices\&. The meta\-data size is 36kB + Backing\-Storage\-size / 32k, rounded up to the next 4kB boundary\&. (Rule of thumb: 32kByte per 1GByte of storage, rounded up to the next MB\&.) .sp When an \fIindex\fR is specified, each index number refers to a fixed slot of meta\-data of 128 MB, which allows a maximum data size of 4 TiB\&. This way, multiple DRBD devices can share the same meta\-data device\&. For example, if /dev/sde6[0] and /dev/sde6[1] are used, /dev/sde6 must be at least 256 MB big\&. Because of the hard size limit, use of meta\-disk indexes is discouraged\&. .RE .PP \fBon\-io\-error \fR\fB\fIhandler\fR\fR .RS 4 .\" drbd.conf: on-io-error\fIhandler\fR is taken, if the lower level device reports io\-errors to the upper layers\&. .sp \fIhandler\fR may be \fBpass_on\fR, \fBcall\-local\-io\-error\fR or \fBdetach\&.\fR .sp \fBpass_on\fR: The node downgrades the disk status to inconsistent, marks the erroneous block as inconsistent in the bitmap and retries the IO on the remote node\&. .sp \fBcall\-local\-io\-error\fR: Call the handler script \fBlocal\-io\-error\fR\&. .sp \fBdetach\fR: The node drops its low level device, and continues in diskless mode\&. .RE .PP \fBfencing \fR\fB\fIfencing_policy\fR\fR .RS 4 .\" drbd.conf: fencing By \fBfencing\fR we understand preventive measures to avoid situations where both nodes are primary and disconnected (AKA split brain)\&. .sp Valid fencing policies are: .PP \fBdont\-care\fR .RS 4 This is the default policy\&. No fencing actions are taken\&. .RE .PP \fBresource\-only\fR .RS 4 If a node becomes a disconnected primary, it tries to fence the peer\*(Aqs disk\&. This is done by calling the \fBfence\-peer\fR handler\&.
The handler is supposed to reach the other node over alternative communication paths and call \*(Aq\fBdrbdadm outdate res\fR\*(Aq there\&. .RE .PP \fBresource\-and\-stonith\fR .RS 4 If a node becomes a disconnected primary, it freezes all its IO operations and calls its fence\-peer handler\&. The fence\-peer handler is supposed to reach the peer over alternative communication paths and call \*(Aqdrbdadm outdate res\*(Aq there\&. In case it cannot reach the peer it should stonith the peer\&. IO is resumed as soon as the situation is resolved\&. In case your handler fails, you can resume IO with the \fBresume\-io\fR command\&. .RE .RE .PP \fBdisk\-barrier\fR, \fBdisk\-flushes\fR, \fBdisk\-drain\fR .RS 4 .\" drbd.conf: disk-barrier .\" drbd.conf: disk-flushes .\" drbd.conf: disk-drain DRBD has four implementations to express write\-after\-write dependencies to its backing storage device\&. DRBD will use the first method that is supported by the backing storage device and that is not disabled\&. By default the \fIflush\fR method is used\&. .sp Since drbd\-8\&.4\&.2 \fBdisk\-barrier\fR is disabled by default because since linux\-2\&.6\&.36 (or 2\&.6\&.32 RHEL6) there is no reliable way to determine if queuing of IO\-barriers works\&. \fIDangerous\fR only enable if you are told so by one that knows for sure\&. .sp When selecting the method you should not only base your decision on the measurable performance\&. In case your backing storage device has a volatile write cache (plain disks, RAID of plain disks) you should use one of the first two\&. In case your backing storage device has battery\-backed write cache you may go with option 3\&. Option 4 (disable everything, use "none") \fIis dangerous\fR on most IO stacks, may result in write\-reordering, and if so, can theoretically be the reason for data corruption, or disturb the DRBD protocol, causing spurious disconnect/reconnect cycles\&. \fIDo not use\fR \fBno\-disk\-drain\fR\&. 
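.sp
As a hedged sketch, a disk section for a backing device with a battery\-backed write cache could leave only the drain method (option 3 above) active\&. Whether this is safe depends on your hardware; the settings are assumptions for illustration, not a recommendation\&.
.sp
.if n \{\
.RS 4
.\}
.nf
disk {
    disk\-barrier no;   # already the default since drbd\-8\&.4\&.2
    disk\-flushes no;   # disable the flush method
    # disk\-drain is not disabled here, so the drain method is used
}
.fi
.if n \{\
.RE
.\}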
.sp Unfortunately device mapper (LVM) might not support barriers\&. .sp The letter after "wo:" in /proc/drbd indicates which method is currently in use for a device: \fBb\fR, \fBf\fR, \fBd\fR, \fBn\fR\&. The implementations are: .PP barrier .RS 4 The first requires that the driver of the backing storage device support barriers (called \*(Aqtagged command queuing\*(Aq in SCSI and \*(Aqnative command queuing\*(Aq in SATA speak)\&. The use of this method can be enabled by setting the \fBdisk\-barrier\fR option to \fByes\fR\&. .RE .PP flush .RS 4 The second requires that the backing device support disk flushes (called \*(Aqforce unit access\*(Aq in drive vendors\*(Aq speak)\&. The use of this method can be disabled by setting \fBdisk\-flushes\fR to \fBno\fR\&. .RE .PP drain .RS 4 The third method is simply to let write requests drain before write requests of a new reordering domain are issued\&. This was the only implementation before 8\&.0\&.9\&. .RE .PP none .RS 4 The fourth method is to not express write\-after\-write dependencies to the backing store at all, by also specifying \fBno\-disk\-drain\fR\&. This \fIis dangerous\fR on most IO stacks, may result in write\-reordering, and if so, can theoretically be the reason for data corruption, or disturb the DRBD protocol, causing spurious disconnect/reconnect cycles\&. \fIDo not use\fR \fBno\-disk\-drain\fR\&. .RE .RE .PP \fBmd\-flushes\fR .RS 4 .\" drbd.conf: md-flushes Disables the use of disk flushes and barrier BIOs when accessing the meta data device\&. See the notes on \fBdisk\-flushes\fR\&. .RE .PP \fBmax\-bio\-bvecs\fR .RS 4 .\" drbd.conf: max-bio-bvecs In some special circumstances the device mapper stack manages to pass BIOs to DRBD that violate the constraints that are set forth by DRBD\*(Aqs merge_bvec() function and which have more than one bvec\&. A known example is: phys\-disk \-> DRBD \-> LVM \-> Xen \-> misaligned partition (63) \-> DomU FS\&.
Then you might see "bio would need to, but cannot, be split:" in the Dom0\*(Aqs kernel log\&. .sp The best workaround is to properly align the partition within the VM (e\&.g\&. start it at sector 1024)\&. This costs 480 KiB of storage\&. Unfortunately the default of most Linux partitioning tools is to start the first partition at an odd number (63)\&. Therefore most distributions\*(Aq install helpers for virtual linux machines will end up with misaligned partitions\&. The second best workaround is to limit DRBD\*(Aqs max bvecs per BIO (= \fBmax\-bio\-bvecs\fR) to 1, but that might cost performance\&. .sp The default value of \fBmax\-bio\-bvecs\fR is 0, which means that there is no user imposed limitation\&. .RE .PP \fBdisk\-timeout\fR .RS 4 .\" drbd.conf: disk-timeout If the driver of the \fIlower_device\fR does not finish an IO request within \fIdisk_timeout\fR, DRBD considers the disk as failed\&. If DRBD is connected to a remote host, it will reissue local pending IO requests to the peer, and ship all new IO requests to the peer only\&. The disk state advances to diskless, as soon as the backing block device has finished all IO requests\&. .sp The default value is 0, which means that no timeout is enforced\&. The default unit is 100ms\&. This option is available since 8\&.3\&.12\&. .RE .PP \fBread\-balancing \fR\fB\fImethod\fR\fR .RS 4 .\" drbd.conf: read-balancing The supported \fImethods\fR for load balancing of read requests are \fBprefer\-local\fR, \fBprefer\-remote\fR, \fBround\-robin\fR, \fBleast\-pending\fR, \fBwhen\-congested\-remote\fR, \fB32K\-striping\fR, \fB64K\-striping\fR, \fB128K\-striping\fR, \fB256K\-striping\fR, \fB512K\-striping\fR and \fB1M\-striping\fR\&. .sp The default value is \fBprefer\-local\fR\&. This option is available since 8\&.4\&.1\&. .RE .PP \fBsndbuf\-size \fR\fB\fIsize\fR\fR .RS 4 .\" drbd.conf: sndbuf-size\fIsize\fR is the size of the TCP socket send buffer\&. The default value is 0, i\&.e\&. autotune\&.
You can specify smaller or larger values\&. Larger values are appropriate for reasonable write throughput with protocol A over high latency networks\&. Values below 32K do not make sense\&. Since 8\&.0\&.13 and 8\&.2\&.7, respectively, setting the \fIsize\fR value to 0 means that the kernel should autotune this\&. .RE .PP \fBrcvbuf\-size \fR\fB\fIsize\fR\fR .RS 4 .\" drbd.conf: rcvbuf-size\fIsize\fR is the size of the TCP socket receive buffer\&. The default value is 0, i\&.e\&. autotune\&. You can specify smaller or larger values\&. Usually this should be left at its default\&. Setting the \fIsize\fR value to 0 means that the kernel should autotune this\&. .RE .PP \fBtimeout \fR\fB\fItime\fR\fR .RS 4 .\" drbd.conf: timeout If the partner node fails to send an expected response packet within \fItime\fR tenths of a second, the partner node is considered dead and therefore the TCP/IP connection is abandoned\&. This must be lower than \fIconnect\-int\fR and \fIping\-int\fR\&. The default value is 60 = 6 seconds, the unit 0\&.1 seconds\&. .RE .PP \fBconnect\-int \fR\fB\fItime\fR\fR .RS 4 .\" drbd.conf: connect-int In case it is not possible to connect to the remote DRBD device immediately, DRBD keeps on trying to connect\&. With this option you can set the time between two retries\&. The default value is 10 seconds, the unit is 1 second\&. .RE .PP \fBping\-int \fR\fB\fItime\fR\fR .RS 4 .\" drbd.conf: ping-int If the TCP/IP connection linking a DRBD device pair is idle for more than \fItime\fR seconds, DRBD will generate a keep\-alive packet to check if its partner is still alive\&. The default is 10 seconds, the unit is 1 second\&. .RE .PP \fBping\-timeout \fR\fB\fItime\fR\fR .RS 4 .\" drbd.conf: ping-timeout The time the peer has to answer a keep\-alive packet\&. In case the peer\*(Aqs reply is not received within this time period, it is considered as dead\&. The default value is 500ms, the default unit is tenths of a second\&.
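.sp
To make the units concrete, here is a sketch of a net section that simply spells out the default values given above\&. Note that \fBtimeout\fR and \fBping\-timeout\fR count tenths of a second, while \fBconnect\-int\fR and \fBping\-int\fR count seconds\&.
.sp
.if n \{\
.RS 4
.\}
.nf
net {
    timeout       60;   # 60 x 0\&.1 s = 6 seconds
    connect\-int   10;   # 10 seconds between two connection attempts
    ping\-int      10;   # keep\-alive packet after 10 seconds of idle time
    ping\-timeout   5;   # 5 x 0\&.1 s = 500 ms
}
.fi
.if n \{\
.RE
.\}
.sp
With these values \fBtimeout\fR (6 seconds) stays below both \fBconnect\-int\fR and \fBping\-int\fR (10 seconds each), as required\&.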
.RE .PP \fBmax\-buffers \fR\fB\fInumber\fR\fR .RS 4 .\" drbd.conf: max-buffers Limits the memory usage per DRBD minor device on the receiving side, or for internal buffers during resync or online\-verify\&. Unit is PAGE_SIZE, which is 4 KiB on most systems\&. The minimum possible setting is hard coded to 32 (=128 KiB)\&. These buffers are used to hold data blocks while they are written to/read from disk\&. To avoid possible distributed deadlocks on congestion, this setting is used as a throttle threshold rather than a hard limit\&. Once more than max\-buffers pages are in use, further allocation from this pool is throttled\&. You want to increase max\-buffers if you cannot saturate the IO backend on the receiving side\&. .RE .PP \fBko\-count \fR\fB\fInumber\fR\fR .RS 4 .\" drbd.conf: ko-count In case the secondary node fails to complete a single write request for \fIcount\fR times the \fItimeout\fR, it is expelled from the cluster\&. (I\&.e\&. the primary node goes into \fBStandAlone\fR mode\&.) The default value is 0, which disables this feature\&. .RE .PP \fBmax\-epoch\-size \fR\fB\fInumber\fR\fR .RS 4 .\" drbd.conf: max-epoch-size The highest number of data blocks between two write barriers\&. If you set this smaller than 10, you might decrease your performance\&. .RE .PP \fBallow\-two\-primaries\fR .RS 4 .\" drbd.conf: allow-two-primaries With this option set you may assign the primary role to both nodes\&. You only should use this option if you use a shared storage file system on top of DRBD\&. At the time of writing the only ones are: OCFS2 and GFS\&. If you use this option with any other file system, you are going to crash your nodes and to corrupt your data! .RE .PP \fBunplug\-watermark \fR\fB\fInumber\fR\fR .RS 4 .\" drbd.conf: unplug-watermark This setting has no effect with recent kernels that use explicit on\-stack plugging (upstream Linux kernel 2\&.6\&.39, distributions may have backported)\&. 
.sp When the number of pending write requests on the standby (secondary) node exceeds the \fBunplug\-watermark\fR, we trigger the request processing of our backing storage device\&. Some storage controllers deliver better performance with small values, others deliver best performance when the value is set to the same value as max\-buffers, yet others don\*(Aqt feel much effect at all\&. Minimum 16, default 128, maximum 131072\&. .RE .PP \fBcram\-hmac\-alg\fR .RS 4 .\" drbd.conf: cram-hmac-alg You need to specify the HMAC algorithm to enable peer authentication at all\&. You are strongly encouraged to use peer authentication\&. The HMAC algorithm will be used for the challenge response authentication of the peer\&. You may specify any digest algorithm that is named in \fB/proc/crypto\fR\&. .RE .PP \fBshared\-secret\fR .RS 4 .\" drbd.conf: shared-secret The shared secret used in peer authentication\&. May be up to 64 characters\&. Note that peer authentication is disabled as long as no \fBcram\-hmac\-alg\fR (see above) is specified\&. .RE .PP \fBafter\-sb\-0pri \fR \fIpolicy\fR .RS 4 .\" drbd.conf: after-sb-0pri possible policies are: .PP \fBdisconnect\fR .RS 4 No automatic resynchronization, simply disconnect\&. .RE .PP \fBdiscard\-younger\-primary\fR .RS 4 Auto sync from the node that was primary before the split\-brain situation happened\&. .RE .PP \fBdiscard\-older\-primary\fR .RS 4 Auto sync from the node that became primary as second during the split\-brain situation\&. .RE .PP \fBdiscard\-zero\-changes\fR .RS 4 In case one node did not write anything since the split brain became evident, sync from the node that wrote something to the node that did not write anything\&. In case none wrote anything this policy uses a random decision to perform a "resync" of 0 blocks\&. In case both have written something this policy disconnects the nodes\&. 
.RE .PP \fBdiscard\-least\-changes\fR .RS 4 Auto sync from the node that touched more blocks during the split brain situation\&. .RE .PP \fBdiscard\-node\-NODENAME\fR .RS 4 Auto sync to the named node\&. .RE .RE .PP \fBafter\-sb\-1pri \fR \fIpolicy\fR .RS 4 .\" drbd.conf: after-sb-1pri possible policies are: .PP \fBdisconnect\fR .RS 4 No automatic resynchronization, simply disconnect\&. .RE .PP \fBconsensus\fR .RS 4 Discard the version of the secondary if the outcome of the \fBafter\-sb\-0pri\fR algorithm would also destroy the current secondary\*(Aqs data\&. Otherwise disconnect\&. .RE .PP \fBviolently\-as0p\fR .RS 4 Always take the decision of the \fBafter\-sb\-0pri\fR algorithm, even if that causes an erratic change of the primary\*(Aqs view of the data\&. This is only useful if you use a one\-node FS (i\&.e\&. not OCFS2 or GFS) with the \fBallow\-two\-primaries\fR flag, \fIAND\fR if you really know what you are doing\&. This is \fIDANGEROUS and MAY CRASH YOUR MACHINE\fR if you have an FS mounted on the primary node\&. .RE .PP \fBdiscard\-secondary\fR .RS 4 Discard the secondary\*(Aqs version\&. .RE .PP \fBcall\-pri\-lost\-after\-sb\fR .RS 4 Always honor the outcome of the \fBafter\-sb\-0pri \fR algorithm\&. In case it decides the current secondary has the right data, it calls the "pri\-lost\-after\-sb" handler on the current primary\&. .RE .RE .PP \fBafter\-sb\-2pri \fR \fIpolicy\fR .RS 4 .\" drbd.conf: after-sb-2pri possible policies are: .PP \fBdisconnect\fR .RS 4 No automatic resynchronization, simply disconnect\&. .RE .PP \fBviolently\-as0p\fR .RS 4 Always take the decision of the \fBafter\-sb\-0pri\fR algorithm, even if that causes an erratic change of the primary\*(Aqs view of the data\&. This is only useful if you use a one\-node FS (i\&.e\&. not OCFS2 or GFS) with the \fBallow\-two\-primaries\fR flag, \fIAND\fR if you really know what you are doing\&. This is \fIDANGEROUS and MAY CRASH YOUR MACHINE\fR if you have an FS mounted on the primary node\&. 
.RE .PP \fBcall\-pri\-lost\-after\-sb\fR .RS 4 Call the "pri\-lost\-after\-sb" helper program on one of the machines\&. This program is expected to reboot the machine, i\&.e\&. make it secondary\&. .RE .RE .PP \fBalways\-asbp\fR .RS 4 Normally the automatic after\-split\-brain policies are only used if current states of the UUIDs do not indicate the presence of a third node\&. .sp With this option you request that the automatic after\-split\-brain policies are used as long as the data sets of the nodes are somehow related\&. This might cause a full sync, if the UUIDs indicate the presence of a third node\&. (Or double faults led to strange UUID sets\&.) .RE .PP \fBrr\-conflict \fR \fIpolicy\fR .RS 4 .\" drbd.conf: rr-conflict This option helps to solve the cases when the outcome of the resync decision is incompatible with the current role assignment in the cluster\&. .PP \fBdisconnect\fR .RS 4 No automatic resynchronization, simply disconnect\&. .RE .PP \fBviolently\fR .RS 4 Sync to the primary node is allowed, violating the assumption that data on a block device are stable for one of the nodes\&. \fIDangerous, do not use\&.\fR .RE .PP \fBcall\-pri\-lost\fR .RS 4 Call the "pri\-lost" helper program on one of the machines\&. This program is expected to reboot the machine, i\&.e\&. make it secondary\&. .RE .RE .PP \fBdata\-integrity\-alg \fR \fIalg\fR .RS 4 .\" drbd.conf: data-integrity-alg DRBD can ensure the data integrity of the user\*(Aqs data on the network by comparing hash values\&. Normally this is ensured by the 16 bit checksums in the headers of TCP/IP packets\&. .sp This option can be set to any of the kernel\*(Aqs data digest algorithms\&. In a typical kernel configuration you should have at least one of \fBmd5\fR, \fBsha1\fR, and \fBcrc32c\fR available\&. By default this is not enabled\&. .sp See also the notes on data integrity\&. 
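.sp
As a hedged illustration of the option described above, the following sketch enables the check in the \fBnet\fR section\&. The resource name r0 is hypothetical, and sha1 is only an assumed choice; use a digest that \fB/proc/crypto\fR lists on both nodes\&.
.sp
.nf
# hypothetical resource name; sha1 is an assumed digest choice
resource r0 {
  net {
    data\-integrity\-alg sha1;
  }
}
.fi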
.RE .PP \fBtcp\-cork\fR .RS 4 .\" drbd.conf: tcp-cork DRBD usually uses the TCP socket option TCP_CORK to hint to the network stack when it can expect more data, and when it should flush out what it has in its send queue\&. It turned out that there is at least one network stack that performs worse when one uses this hinting method\&. Therefore we introduced this option\&. By setting \fBtcp\-cork\fR to \fBno\fR you can disable the setting and clearing of the TCP_CORK socket option by DRBD\&. .RE .PP \fBon\-congestion \fR\fB\fIcongestion_policy\fR\fR, \fBcongestion\-fill \fR\fB\fIfill_threshold\fR\fR, \fBcongestion\-extents \fR\fB\fIactive_extents_threshold\fR\fR .RS 4 By default DRBD blocks when the available TCP send queue becomes full\&. That means it will slow down the application that generates the write requests that cause DRBD to send more data down that TCP connection\&. .sp When DRBD is deployed with DRBD\-proxy it might be more desirable that DRBD goes into AHEAD/BEHIND mode shortly before the send queue becomes full\&. In AHEAD/BEHIND mode DRBD no longer replicates data, but still keeps the connection open\&. .sp The advantage of the AHEAD/BEHIND mode is that the application is not slowed down, even if DRBD\-proxy\*(Aqs buffer is not sufficient to buffer all write requests\&. The downside is that the peer node falls behind, and that a resync will be necessary to bring it back into sync\&. During that resync the peer node will have an inconsistent disk\&. .sp Available \fIcongestion_policy\fRs are \fBblock\fR and \fBpull\-ahead\fR\&. The default is \fBblock\fR\&. \fIFill_threshold\fR might be in the range of 0 to 10GiBytes\&. The default is 0 which disables the check\&. \fIActive_extents_threshold\fR has the same limits as \fBal\-extents\fR\&. .sp The AHEAD/BEHIND mode and its settings are available since DRBD 8\&.3\&.10\&. .RE .PP \fBwfc\-timeout \fR\fB\fItime\fR\fR .RS 4 Wait for connection timeout\&.
.\" drbd.conf: wfc-timeout The init script \fBdrbd\fR(8) blocks the boot process until the DRBD resources are connected\&. When the cluster manager starts later, it does not see a resource with internal split\-brain\&. In case you want to limit the wait time, do it here\&. Default is 0, which means unlimited\&. The unit is seconds\&. .RE .PP \fBdegr\-wfc\-timeout \fR\fB\fItime\fR\fR .RS 4 .\" drbd.conf: degr-wfc-timeout Wait for connection timeout, if this node was part of a degraded cluster\&. In case a degraded cluster (= cluster with only one node left) is rebooted, this timeout value is used instead of wfc\-timeout, because the peer is less likely to show up in time, if it had been dead before\&. Value 0 means unlimited\&. .RE .PP \fBoutdated\-wfc\-timeout \fR\fB\fItime\fR\fR .RS 4 .\" drbd.conf: outdated-wfc-timeout Wait for connection timeout, if the peer was outdated\&. In case a degraded cluster (= cluster with only one node left) with an outdated peer disk is rebooted, this timeout value is used instead of wfc\-timeout, because the peer is not allowed to become primary in the meantime\&. Value 0 means unlimited\&. .RE .PP \fBwait\-after\-sb\fR .RS 4 By setting this option you can make the init script continue to wait even if the device pair had a split brain situation and therefore refuses to connect\&. .RE .PP \fBbecome\-primary\-on \fR\fB\fInode\-name\fR\fR .RS 4 Sets on which node the device should be promoted to primary role by the init script\&. The \fInode\-name\fR might either be a host name or the keyword \fBboth\fR\&. When this option is not set the devices stay in secondary role on both nodes\&. Usually one delegates the role assignment to a cluster manager (e\&.g\&. heartbeat)\&. .RE .PP \fBstacked\-timeouts\fR .RS 4 Usually \fBwfc\-timeout\fR and \fBdegr\-wfc\-timeout\fR are ignored for stacked devices; instead, twice the amount of \fBconnect\-int\fR is used for the connection timeouts\&.
With the \fBstacked\-timeouts\fR keyword you disable this, and force DRBD to mind the \fBwfc\-timeout\fR and \fBdegr\-wfc\-timeout\fR statements\&. Only do that if the peer of the stacked resource is usually not available or will usually not become primary\&. By using this option incorrectly, you run the risk of causing unexpected split brain\&. .RE .PP \fBresync\-rate \fR\fB\fIrate\fR\fR .RS 4 .\" drbd.conf: resync-rate To ensure a smooth operation of the application on top of DRBD, it is possible to limit the bandwidth which may be used by background synchronizations\&. The default is 250 KB/sec, the default unit is KB/sec\&. Optional suffixes K, M, G are allowed\&. .RE .PP \fBuse\-rle\fR .RS 4 .\" drbd.conf: use-rle During resync\-handshake, the dirty\-bitmaps of the nodes are exchanged and merged (using bit\-or), so the nodes will have the same understanding of which blocks are dirty\&. On large devices, the fine grained dirty\-bitmap can become large as well, and the bitmap exchange can take quite some time on low\-bandwidth links\&. .sp Because the bitmap typically contains compact areas where all bits are unset (clean) or set (dirty), a simple run\-length encoding scheme can considerably reduce the network traffic necessary for the bitmap exchange\&. .sp For backward compatibility reasons, and because on fast links this possibly does not improve transfer time but consumes CPU cycles, this defaults to off\&. .RE .PP \fBresync\-after \fR\fB\fIres\-name\fR\fR .RS 4 .\" drbd.conf: resync-after By default, resynchronization of all devices runs in parallel\&. By defining a resync\-after dependency, the resynchronization of this resource will start only if the resource \fIres\-name\fR is already in connected state (i\&.e\&., has finished its resynchronization)\&. .RE .PP \fBal\-extents \fR\fB\fIextents\fR\fR .RS 4 .\" drbd.conf: al-extents DRBD automatically performs hot area detection\&.
With this parameter you control how big the hot area (= active set) can get\&. Each extent marks 4M of the backing storage (= low\-level device)\&. In case a primary node leaves the cluster unexpectedly, the areas covered by the active set must be resynced upon rejoining of the failed node\&. The data structure is stored in the meta\-data area, therefore each change of the active set is a write operation to the meta\-data device\&. A higher number of extents gives longer resync times but fewer updates to the meta\-data\&. The default number of \fIextents\fR is 1237\&. (Minimum: 7, Maximum: 65534) .sp Note that the effective maximum may be smaller, depending on how you created the device meta data, see also \fBdrbdmeta\fR(8)\&. The effective maximum is 919 * (available on\-disk activity\-log ring\-buffer area/4kB \-1), the default 32kB ring\-buffer effects a maximum of 6433 (covers more than 25 GiB of data)\&. We recommend keeping this well within the amount your backend storage and replication link are able to resync inside of about 5 minutes\&. .RE .PP \fBal\-updates \fR\fB{yes | no}\fR .RS 4 .\" drbd.conf: al-updates DRBD\*(Aqs activity log transaction writing makes it possible that, after the crash of a primary node, a partial (bitmap\-based) resync is sufficient to bring the node back to up\-to\-date\&. Setting \fBal\-updates\fR to \fBno\fR might increase normal operation performance but causes DRBD to do a full resync when a crashed primary gets reconnected\&. The default value is \fByes\fR\&. .RE .PP \fBverify\-alg \fR\fB\fIhash\-alg\fR\fR .RS 4 During online verification (as initiated by the \fBverify\fR sub\-command), rather than doing a bit\-wise comparison, DRBD applies a hash function to the contents of every block being verified, and compares that hash with the peer\&. This option defines the hash algorithm being used for that purpose\&. It can be set to any of the kernel\*(Aqs data digest algorithms\&.
In a typical kernel configuration you should have at least one of \fBmd5\fR, \fBsha1\fR, and \fBcrc32c\fR available\&. By default this is not enabled; you must set this option explicitly in order to be able to use on\-line device verification\&. .sp See also the notes on data integrity\&. .RE .PP \fBcsums\-alg \fR\fB\fIhash\-alg\fR\fR .RS 4 A resync process sends all marked data blocks from the source to the destination node, as long as no \fBcsums\-alg\fR is given\&. When one is specified the resync process exchanges hash values of all marked blocks first, and sends only those data blocks that have different hash values\&. .sp This setting is useful for DRBD setups with low bandwidth links\&. During the restart of a crashed primary node, all blocks covered by the activity log are marked for resync\&. But a large part of those will actually be still in sync, therefore using \fBcsums\-alg\fR will lower the required bandwidth in exchange for CPU cycles\&. .RE .PP \fBc\-plan\-ahead \fR\fB\fIplan_time\fR\fR, \fBc\-fill\-target \fR\fB\fIfill_target\fR\fR, \fBc\-delay\-target \fR\fB\fIdelay_target\fR\fR, \fBc\-max\-rate \fR\fB\fImax_rate\fR\fR .RS 4 The dynamic resync speed controller is enabled by setting \fIplan_time\fR to a positive value\&. It aims to fill the buffers along the data path with either a constant amount of data \fIfill_target\fR, or aims to have a constant delay time of \fIdelay_target\fR along the path\&. The controller has an upper bound of \fImax_rate\fR\&. .sp \fIPlan_time\fR configures the agility of the controller\&. Higher values yield slower responses of the controller to deviations from the target value\&. It should be at least 5 times RTT\&. For regular data paths a \fIfill_target\fR in the area of 4k to 100k is appropriate\&. For a setup that contains drbd\-proxy it is advisable to use \fIdelay_target\fR instead\&. The controller uses \fIdelay_target\fR only when \fIfill_target\fR is set to 0\&.
5 times RTT is a reasonable starting value\&. \fIMax_rate\fR should be set to the bandwidth available between the DRBD\-hosts and the machines hosting DRBD\-proxy, or to the available disk\-bandwidth\&. .sp The default value of \fIplan_time\fR is 0, the default unit is 0\&.1 seconds\&. \fIFill_target\fR defaults to 0, with sectors as the default unit\&. \fIDelay_target\fR defaults to 1 (100ms), with 0\&.1 seconds as the default unit\&. \fIMax_rate\fR defaults to 102400 (100MiB/s), with KiB/s as the default unit\&. .sp The dynamic resync speed controller and its settings are available since DRBD 8\&.3\&.9\&. .RE .PP \fBc\-min\-rate \fR\fB\fImin_rate\fR\fR .RS 4 A node that is primary and sync\-source has to schedule application IO requests and resync IO requests\&. The \fImin_rate\fR tells DRBD to use only up to min_rate for resync IO and to dedicate all other available IO bandwidth to application requests\&. .sp Note: The value 0 has a special meaning\&. It disables the limitation of resync IO completely, which might slow down application IO considerably\&. Set it to a value of 1, if you prefer that resync IO never slows down application IO\&. .sp Note: Although the name might suggest that it is a lower bound for the dynamic resync speed controller, it is not\&. If the DRBD\-proxy buffer is full, the dynamic resync speed controller is free to lower the resync speed down to 0, completely independent of the \fBc\-min\-rate\fR setting\&. .sp \fIMin_rate\fR defaults to 4096 (4MiB/s), with KiB/s as the default unit\&. .RE .PP \fBon\-no\-data\-accessible \fR\fB\fIond\-policy\fR\fR .RS 4 This setting controls what happens to IO requests on a degraded, diskless node (i\&.e\&. no data store is reachable)\&. The available policies are \fBio\-error\fR and \fBsuspend\-io\fR\&. .sp If \fIond\-policy\fR is set to \fBsuspend\-io\fR you can either resume IO by attaching/connecting the last lost data storage, or by the \fBdrbdadm resume\-io \fR\fB\fIres\fR\fR command\&. The latter will result in IO errors of course\&.
.sp The default is \fBio\-error\fR\&. This setting is available since DRBD 8\&.3\&.9\&. .RE .PP \fBcpu\-mask \fR\fB\fIcpu\-mask\fR\fR .RS 4 .\" drbd.conf: cpu-mask Sets the cpu\-affinity\-mask for DRBD\*(Aqs kernel threads of this device\&. The default value of \fIcpu\-mask\fR is 0, which means that DRBD\*(Aqs kernel threads should be spread over all CPUs of the machine\&. This value must be given in hexadecimal notation\&. If it is too big it will be truncated\&. .RE .PP \fBpri\-on\-incon\-degr \fR\fB\fIcmd\fR\fR .RS 4 .\" drbd.conf: pri-on-incon-degr This handler is called if the node is primary, degraded and if the local copy of the data is inconsistent\&. .RE .PP \fBpri\-lost\-after\-sb \fR\fB\fIcmd\fR\fR .RS 4 .\" drbd.conf: pri-lost-after-sb The node is currently primary, but lost the after\-split\-brain auto recovery procedure\&. As a consequence, it should be abandoned\&. .RE .PP \fBpri\-lost \fR\fB\fIcmd\fR\fR .RS 4 .\" drbd.conf: pri-lost The node is currently primary, but DRBD\*(Aqs algorithm thinks that it should become sync target\&. As a consequence it should give up its primary role\&. .RE .PP \fBfence\-peer \fR\fB\fIcmd\fR\fR .RS 4 .\" drbd.conf: fence-peer The handler is part of the \fBfencing\fR mechanism\&. This handler is called in case the node needs to fence the peer\*(Aqs disk\&. It should use other communication paths than DRBD\*(Aqs network link\&. .RE .PP \fBlocal\-io\-error \fR\fB\fIcmd\fR\fR .RS 4 .\" drbd.conf: local-io-error DRBD got an IO error from the local IO subsystem\&. .RE .PP \fBinitial\-split\-brain \fR\fB\fIcmd\fR\fR .RS 4 .\" drbd.conf: initial-split-brain DRBD has connected and detected a split brain situation\&. This handler can alert someone in all cases of split brain, not just those that go unresolved\&. .RE .PP \fBsplit\-brain \fR\fB\fIcmd\fR\fR .RS 4 .\" drbd.conf: split-brain DRBD detected a split brain situation that remains unresolved\&. Manual recovery is necessary\&. This handler should alert someone on duty\&.
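.sp
A minimal sketch of a \fBhandlers\fR section that wires up this handler\&. The resource name, the script path and the recipient argument are assumptions made for illustration, not defaults shipped with DRBD:
.sp
.nf
# hypothetical resource name
resource r0 {
  handlers {
    # hypothetical notification script and recipient
    split\-brain "/usr/local/sbin/notify\-split\-brain\&.sh root";
  }
}
.fi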
.RE .PP \fBbefore\-resync\-target \fR\fB\fIcmd\fR\fR .RS 4 .\" drbd.conf: before-resync-target DRBD calls this handler just before a resync begins on the node that becomes resync target\&. It might be used to take a snapshot of the backing block device\&. .RE .PP \fBafter\-resync\-target \fR\fB\fIcmd\fR\fR .RS 4 .\" drbd.conf: after-resync-target DRBD calls this handler just after a resync operation finished on the node whose disk just became consistent after being inconsistent for the duration of the resync\&. It might be used to remove a snapshot of the backing device that was created by the \fBbefore\-resync\-target\fR handler\&. .RE .SS "Other Keywords" .PP \fBinclude \fR\fB\fIfile\-pattern\fR\fR .RS 4 .\" drbd.conf: include Include all files matching the wildcard pattern \fIfile\-pattern\fR\&. The \fBinclude\fR statement is only allowed on the top level, i\&.e\&. it is not allowed inside any section\&. .RE .SH "NOTES ON DATA INTEGRITY" .PP There are two independent methods in DRBD to ensure the integrity of the mirrored data\&. The online\-verify mechanism and the \fBdata\-integrity\-alg\fR of the \fBnetwork\fR section\&. .PP Both mechanisms might deliver false positives if the user of DRBD modifies the data which gets written to disk while the transfer goes on\&. This may happen for swap, or for certain append while global sync, or truncate/rewrite workloads, and not necessarily poses a problem for the integrity of the data\&. Usually when the initiator of the data transfer does this, it already knows that that data block will not be part of an on disk data structure, or will be resubmitted with correct data soon enough\&. .PP The \fBdata\-integrity\-alg\fR causes the receiving side to log an error about "Digest integrity check FAILED: Ns +x\en", where N is the sector offset, and x is the size of the request in bytes\&. It will then disconnect, and reconnect, thus causing a quick resync\&. 
If the sending side at the same time detected a modification, it warns about "Digest mismatch, buffer modified by upper layers during write: Ns +x\en", which shows that this was a false positive\&. The sending side may detect these buffer modifications immediately after the unmodified data has been copied to the tcp buffers, in which case the receiving side won\*(Aqt notice it\&. .PP The most recent (2007) example of systematic corruption was an issue with the TCP offloading engine and the driver of a certain type of GBit NIC\&. The actual corruption happened on the DMA transfer from core memory to the card\&. Since the TCP checksum gets calculated on the card, this type of corruption stays undetected as long as you do not use either the online \fBverify\fR or the \fBdata\-integrity\-alg\fR\&. .PP We suggest to use the \fBdata\-integrity\-alg\fR only during a pre\-production phase due to its CPU costs\&. Further we suggest to do online \fBverify\fR runs regularly e\&.g\&. once a month during a low load period\&. .SH "VERSION" .sp This document was revised for version 8\&.4\&.0 of the DRBD distribution\&. .SH "AUTHOR" .sp Written by Philipp Reisner philipp\&.reisner@linbit\&.com and Lars Ellenberg lars\&.ellenberg@linbit\&.com\&. .SH "REPORTING BUGS" .sp Report bugs to drbd\-user@lists\&.linbit\&.com\&. .SH "COPYRIGHT" .sp Copyright 2001\-2008 LINBIT Information Technologies, Philipp Reisner, Lars Ellenberg\&. This is free software; see the source for copying conditions\&. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE\&. .SH "SEE ALSO" .PP \fBdrbd\fR(8), \fBdrbddisk\fR(8), \fBdrbdsetup\fR(8), \fBdrbdmeta\fR(8), \fBdrbdadm\fR(8), \m[blue]\fBDRBD User\*(Aqs Guide\fR\m[]\&\s-2\u[1]\d\s+2, \m[blue]\fBDRBD web site\fR\m[]\&\s-2\u[3]\d\s+2 .SH "NOTES" .IP " 1." 4 DRBD User's Guide .RS 4 \%http://www.drbd.org/users-guide/ .RE .IP " 2." 4 DRBD's online usage counter .RS 4 \%http://usage.drbd.org .RE .IP " 3." 
4 DRBD web site .RS 4 \%http://www.drbd.org/ .RE drbd-8.4.4/documentation/drbddisk.80000664000000000000000000000523212226007147015675 0ustar rootroot'\" t .\" Title: drbddisk .\" Author: [see the "Author" section] .\" Generator: DocBook XSL Stylesheets v1.76.1 .\" Date: 15 Oct 2008 .\" Manual: System Administration .\" Source: DRBD 8.3.2 .\" Language: English .\" .TH "DRBDDISK" "8" "15 Oct 2008" "DRBD 8.3.2" "System Administration" .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .\" http://bugs.debian.org/507673 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" ----------------------------------------------------------------- .\" * set default formatting .\" ----------------------------------------------------------------- .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .\" ----------------------------------------------------------------- .\" * MAIN CONTENT STARTS HERE * .\" ----------------------------------------------------------------- .SH "NAME" drbddisk \- Script to mark devices as primary and mount file systems .SH "SYNOPSIS" .HP \w'\fB/etc/ha\&.d/resource\&.d/drbddisk\fR\ 'u \fB/etc/ha\&.d/resource\&.d/drbddisk\fR [\fIresource\fR] {{start}\ |\ {stop}\ |\ {status}} .SH "INTRODUCTION" .PP The \fB/etc/ha\&.d/resource\&.d/drbddisk\fR script brings the local device of \fIresource\fR into primary role\&. It is designed to be used by Heartbeat\&. .PP In order to use \fB/etc/ha\&.d/resource\&.d/drbddisk\fR you must define a resource, a host, and any other configuration options in the DRBD configuration file\&. See \fB/etc/drbd\&.conf\fR for details\&. 
If \fIresource\fR is omitted, then all of the resources listed in the config file are affected\&. .SH "VERSION" .sp This document was revised for version 8\&.0\&.14 of the DRBD distribution\&. .SH "AUTHOR" .sp Written by Philipp Reisner philipp\&.reisner@linbit\&.com and Lars Ellenberg lars\&.ellenberg@linbit\&.com\&. .SH "REPORTING BUGS" .sp Report bugs to drbd\-user@lists\&.linbit\&.com\&. .SH "COPYRIGHT" .sp Copyright 2001\-2008 LINBIT Information Technologies, Philipp Reisner, Lars Ellenberg\&. This is free software; see the source for copying conditions\&. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE\&. .SH "SEE ALSO" .PP \fBdrbd.conf\fR(5), \fBdrbd\fR(8), \fBdrbdsetup\fR(8), \fBdrbdadm\fR(8), \m[blue]\fBDRBD Homepage\fR\m[]\&\s-2\u[1]\d\s+2 .SH "NOTES" .IP " 1." 4 DRBD Homepage .RS 4 \%http://www.drbd.org/ .RE drbd-8.4.4/documentation/drbd.80000664000000000000000000000545112226007146015024 0ustar rootroot'\" t .\" Title: drbd .\" Author: [see the "Author" section] .\" Generator: DocBook XSL Stylesheets v1.76.1 .\" Date: 15 Oct 2008 .\" Manual: System Administration .\" Source: DRBD 8.3.2 .\" Language: English .\" .TH "DRBD" "8" "15 Oct 2008" "DRBD 8.3.2" "System Administration" .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .\" http://bugs.debian.org/507673 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" ----------------------------------------------------------------- .\" * set default formatting .\" ----------------------------------------------------------------- .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .\"
----------------------------------------------------------------- .\" * MAIN CONTENT STARTS HERE * .\" ----------------------------------------------------------------- .SH "NAME" drbd \- The start and stop script for DRBD .SH "SYNOPSIS" .HP \w'\fB/etc/init\&.d/drbd\fR\ 'u \fB/etc/init\&.d/drbd\fR [\fIresource\fR] {{start}\ |\ {stop}\ |\ {status}\ |\ {reload}\ |\ {restart}\ |\ {force\-reload}} .SH "INTRODUCTION" .PP The \fB/etc/init\&.d/drbd\fR script is used to start and stop drbd on a System V style init system\&. .PP In order to use \fB/etc/init\&.d/drbd\fR you must define a resource, a host, and any other configuration options in the drbd configuration file\&. See \fB/etc/drbd\&.conf\fR for details\&. If \fIresource\fR is omitted, then all of the resources listed in the config file are configured\&. .PP This script might ask you \(lqDo you want to abort waiting for other server and make this one primary?\(rq .PP Only answer this question with \(lqyes\(rq if you are sure that it is impossible to repair the other node\&. .SH "VERSION" .sp This document was revised for version 8\&.3\&.2 of the DRBD distribution\&. .SH "AUTHOR" .sp Written by Philipp Reisner philipp\&.reisner@linbit\&.com and Lars Ellenberg lars\&.ellenberg@linbit\&.com\&. .SH "REPORTING BUGS" .sp Report bugs to drbd\-user@lists\&.linbit\&.com\&. .SH "COPYRIGHT" .sp Copyright 2001\-2008 LINBIT Information Technologies, Philipp Reisner, Lars Ellenberg\&. This is free software; see the source for copying conditions\&. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE\&. .SH "SEE ALSO" .PP \fBdrbd.conf\fR(5), \fBdrbddisk\fR(8), \fBdrbdsetup\fR(8), \fBdrbdadm\fR(8), \m[blue]\fBDRBD Homepage\fR\m[]\&\s-2\u[1]\d\s+2 .SH "NOTES" .IP " 1."
4 DRBD Homepage .RS 4 \%http://www.drbd.org/ .RE drbd-8.4.4/drbd_config.h0000777000000000000000000000000012226007150020224 2drbd/linux/drbd_config.hustar rootrootdrbd-8.4.4/drbd/drbd_buildtag.c0000664000000000000000000000032712226007150015004 0ustar rootroot/* automatically generated. DO NOT EDIT. */ #include const char *drbd_buildtag(void) { return "GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd" " build by phil@fat-tyre, 2013-10-11 16:42:48"; } drbd-8.4.4/.filelist0000664000000000000000000002047512226007150012761 0ustar rootrootdrbd-8.4.4/.gitignore drbd-8.4.4/COPYING drbd-8.4.4/ChangeLog drbd-8.4.4/Makefile.in drbd-8.4.4/README drbd-8.4.4/autogen.sh drbd-8.4.4/benchmark/Makefile drbd-8.4.4/benchmark/README drbd-8.4.4/benchmark/dm.c drbd-8.4.4/benchmark/io-latency-test.c drbd-8.4.4/configure.ac drbd-8.4.4/documentation/Makefile.in drbd-8.4.4/documentation/Makefile.lang drbd-8.4.4/documentation/aspell.en.per drbd-8.4.4/documentation/drbd.conf.xml drbd-8.4.4/documentation/drbd.xml drbd-8.4.4/documentation/drbdadm.xml drbd-8.4.4/documentation/drbddisk.xml drbd-8.4.4/documentation/drbdmeta.xml drbd-8.4.4/documentation/drbdsetup.xml drbd-8.4.4/documentation/fencing-by-constraints.txt drbd-8.4.4/documentation/xml-usage-to-docbook.xsl drbd-8.4.4/drbd-kernel.spec.in drbd-8.4.4/drbd-km.spec.in drbd-8.4.4/drbd.spec.in drbd-8.4.4/drbd/Kbuild drbd-8.4.4/drbd/Kconfig drbd-8.4.4/drbd/Makefile drbd-8.4.4/drbd/compat/asm-generic/bitops/le.h drbd-8.4.4/drbd/compat/asm/barrier.h drbd-8.4.4/drbd/compat/bitops.h drbd-8.4.4/drbd/compat/blkdev_issue_zeroout.c drbd-8.4.4/drbd/compat/idr.c drbd-8.4.4/drbd/compat/kobject.c drbd-8.4.4/drbd/compat/linux/autoconf.h drbd-8.4.4/drbd/compat/linux/dynamic_debug.h drbd-8.4.4/drbd/compat/linux/hardirq.h drbd-8.4.4/drbd/compat/linux/memcontrol.h drbd-8.4.4/drbd/compat/linux/mutex.h drbd-8.4.4/drbd/compat/linux/tracepoint.h drbd-8.4.4/drbd/compat/tests/bio_split_has_bio_split_pool_parameter.c 
drbd-8.4.4/drbd/compat/tests/bioset_create_has_three_parameters.c drbd-8.4.4/drbd/compat/tests/blkdev_issue_zeroout_has_5_paramters.c drbd-8.4.4/drbd/compat/tests/drbd_release_returns_void.c drbd-8.4.4/drbd/compat/tests/have_IS_ERR_OR_NULL.c drbd-8.4.4/drbd/compat/tests/have_atomic_in_flight.c drbd-8.4.4/drbd/compat/tests/have_bio_bi_destructor.c drbd-8.4.4/drbd/compat/tests/have_bioset_create_front_pad.c drbd-8.4.4/drbd/compat/tests/have_blk_queue_max_hw_sectors.c drbd-8.4.4/drbd/compat/tests/have_blk_queue_max_segments.c drbd-8.4.4/drbd/compat/tests/have_blk_set_stacking_limits.c drbd-8.4.4/drbd/compat/tests/have_blkdev_get_by_path.c drbd-8.4.4/drbd/compat/tests/have_bool_type.c drbd-8.4.4/drbd/compat/tests/have_clear_bit_unlock.c drbd-8.4.4/drbd/compat/tests/have_cn_netlink_skb_parms.c drbd-8.4.4/drbd/compat/tests/have_cpumask_empty.c drbd-8.4.4/drbd/compat/tests/have_ctrl_attr_mcast_groups.c drbd-8.4.4/drbd/compat/tests/have_dst_groups.c drbd-8.4.4/drbd/compat/tests/have_find_next_zero_bit_le.c drbd-8.4.4/drbd/compat/tests/have_fmode_t.c drbd-8.4.4/drbd/compat/tests/have_genl_lock.c drbd-8.4.4/drbd/compat/tests/have_genlmsg_msg_size.c drbd-8.4.4/drbd/compat/tests/have_genlmsg_new.c drbd-8.4.4/drbd/compat/tests/have_genlmsg_put_reply.c drbd-8.4.4/drbd/compat/tests/have_genlmsg_reply.c drbd-8.4.4/drbd/compat/tests/have_idr_alloc.c drbd-8.4.4/drbd/compat/tests/have_idr_for_each.c drbd-8.4.4/drbd/compat/tests/have_idr_for_each_entry.c drbd-8.4.4/drbd/compat/tests/have_kref_sub.c drbd-8.4.4/drbd/compat/tests/have_linux_byteorder_swabb_h.c drbd-8.4.4/drbd/compat/tests/have_list_splice_tail_init.c drbd-8.4.4/drbd/compat/tests/have_netlink_skb_parms_portid.c drbd-8.4.4/drbd/compat/tests/have_nlmsg_hdr.c drbd-8.4.4/drbd/compat/tests/have_nr_cpu_ids.c drbd-8.4.4/drbd/compat/tests/have_open_bdev_exclusive.c drbd-8.4.4/drbd/compat/tests/have_prandom_u32.c drbd-8.4.4/drbd/compat/tests/have_proc_create_data.c drbd-8.4.4/drbd/compat/tests/have_proc_pde_data.c 
drbd-8.4.4/drbd/compat/tests/have_rb_augment_functions.c drbd-8.4.4/drbd/compat/tests/have_security_netlink_recv.c drbd-8.4.4/drbd/compat/tests/have_sock_shutdown.c drbd-8.4.4/drbd/compat/tests/have_struct_queue_limits.c drbd-8.4.4/drbd/compat/tests/have_task_pid_nr.c drbd-8.4.4/drbd/compat/tests/have_umh_wait_proc.c drbd-8.4.4/drbd/compat/tests/have_void_make_request.c drbd-8.4.4/drbd/compat/tests/have_vzalloc.c drbd-8.4.4/drbd/compat/tests/hlist_for_each_entry_has_three_parameters.c drbd-8.4.4/drbd/compat/tests/init_work_has_three_arguments.c drbd-8.4.4/drbd/compat/tests/kmap_atomic_page_only.c drbd-8.4.4/drbd/compat/tests/need_genlmsg_multicast_wrapper.c drbd-8.4.4/drbd/compat/tests/queue_limits_has_discard_zeroes_data.c drbd-8.4.4/drbd/compat/tests/use_blk_queue_max_sectors_anyways.c drbd-8.4.4/drbd/data-structure-v9.txt drbd-8.4.4/drbd/drbd_actlog.c drbd-8.4.4/drbd/drbd_bitmap.c drbd-8.4.4/drbd/drbd_int.h drbd-8.4.4/drbd/drbd_interval.c drbd-8.4.4/drbd/drbd_interval.h drbd-8.4.4/drbd/drbd_main.c drbd-8.4.4/drbd/drbd_nl.c drbd-8.4.4/drbd/drbd_nla.c drbd-8.4.4/drbd/drbd_nla.h drbd-8.4.4/drbd/drbd_proc.c drbd-8.4.4/drbd/drbd_protocol.h drbd-8.4.4/drbd/drbd_receiver.c drbd-8.4.4/drbd/drbd_req.c drbd-8.4.4/drbd/drbd_req.h drbd-8.4.4/drbd/drbd_state.c drbd-8.4.4/drbd/drbd_state.h drbd-8.4.4/drbd/drbd_strings.c drbd-8.4.4/drbd/drbd_strings.h drbd-8.4.4/drbd/drbd_sysfs.c drbd-8.4.4/drbd/drbd_vli.h drbd-8.4.4/drbd/drbd_worker.c drbd-8.4.4/drbd/drbd_wrappers.h drbd-8.4.4/drbd/linux/drbd.h drbd-8.4.4/drbd/linux/drbd_config.h drbd-8.4.4/drbd/linux/drbd_genl.h drbd-8.4.4/drbd/linux/drbd_genl_api.h drbd-8.4.4/drbd/linux/drbd_limits.h drbd-8.4.4/drbd/linux/genl_magic_func.h drbd-8.4.4/drbd/linux/genl_magic_struct.h drbd-8.4.4/drbd/linux/lru_cache.h drbd-8.4.4/drbd/lru_cache.c drbd-8.4.4/filelist-redhat drbd-8.4.4/filelist-suse drbd-8.4.4/preamble drbd-8.4.4/preamble-rhel5 drbd-8.4.4/preamble-sles10 drbd-8.4.4/preamble-sles11 drbd-8.4.4/rpm-macro-fixes/README 
drbd-8.4.4/rpm-macro-fixes/kmodtool.rhel5.diff drbd-8.4.4/rpm-macro-fixes/macros.kernel-source.sles11-sp1.diff drbd-8.4.4/rpm-macro-fixes/macros.kernel-source.sles11.diff drbd-8.4.4/rpm-macro-fixes/macros.rhel5.diff drbd-8.4.4/rpm-macro-fixes/suse_macros.sles10.diff drbd-8.4.4/rpm-macro-fixes/symset-table.diff drbd-8.4.4/scripts/Makefile.in drbd-8.4.4/scripts/README drbd-8.4.4/scripts/README.rhcs_fence drbd-8.4.4/scripts/block-drbd drbd-8.4.4/scripts/crm-fence-peer.sh drbd-8.4.4/scripts/drbd drbd-8.4.4/scripts/drbd-overview.pl drbd-8.4.4/scripts/drbd.conf drbd-8.4.4/scripts/drbd.conf.example drbd-8.4.4/scripts/drbd.gentoo drbd-8.4.4/scripts/drbd.metadata.rhcs drbd-8.4.4/scripts/drbd.ocf drbd-8.4.4/scripts/drbd.rules drbd-8.4.4/scripts/drbd.sh.rhcs drbd-8.4.4/scripts/drbdadm.bash_completion drbd-8.4.4/scripts/drbddisk drbd-8.4.4/scripts/drbdupper drbd-8.4.4/scripts/get_uts_release.sh drbd-8.4.4/scripts/global_common.conf drbd-8.4.4/scripts/notify.sh drbd-8.4.4/scripts/outdate-peer.sh drbd-8.4.4/scripts/pretty-proc-drbd.sh drbd-8.4.4/scripts/rhcs_fence drbd-8.4.4/scripts/snapshot-resync-target-lvm.sh drbd-8.4.4/scripts/stonith_admin-fence-peer.sh drbd-8.4.4/scripts/unsnapshot-resync-target-lvm.sh drbd-8.4.4/user/Makefile.in drbd-8.4.4/user/config_flags.c drbd-8.4.4/user/config_flags.h drbd-8.4.4/user/drbd_endian.h drbd-8.4.4/user/drbd_nla.c drbd-8.4.4/user/drbd_nla.h drbd-8.4.4/user/drbdadm.h drbd-8.4.4/user/drbdadm_adjust.c drbd-8.4.4/user/drbdadm_main.c drbd-8.4.4/user/drbdadm_parser.c drbd-8.4.4/user/drbdadm_parser.h drbd-8.4.4/user/drbdadm_scanner.fl drbd-8.4.4/user/drbdadm_usage_cnt.c drbd-8.4.4/user/drbdmeta.c drbd-8.4.4/user/drbdmeta_parser.h drbd-8.4.4/user/drbdmeta_scanner.fl drbd-8.4.4/user/drbdsetup.c drbd-8.4.4/user/drbdtool_common.c drbd-8.4.4/user/drbdtool_common.h drbd-8.4.4/user/legacy/.gitignore drbd-8.4.4/user/legacy/Makefile.in drbd-8.4.4/user/legacy/config.h.in drbd-8.4.4/user/legacy/drbd_strings.c drbd-8.4.4/user/legacy/drbdadm.h 
drbd-8.4.4/user/legacy/drbdadm_adjust.c drbd-8.4.4/user/legacy/drbdadm_main.c drbd-8.4.4/user/legacy/drbdadm_minor_table.c drbd-8.4.4/user/legacy/drbdadm_parser.c drbd-8.4.4/user/legacy/drbdadm_parser.h drbd-8.4.4/user/legacy/drbdadm_scanner.fl drbd-8.4.4/user/legacy/drbdadm_usage_cnt.c drbd-8.4.4/user/legacy/drbdsetup.c drbd-8.4.4/user/legacy/drbdtool_common.c drbd-8.4.4/user/legacy/drbdtool_common.h drbd-8.4.4/user/legacy/linux/drbd.h drbd-8.4.4/user/legacy/linux/drbd_config.h drbd-8.4.4/user/legacy/linux/drbd_limits.h drbd-8.4.4/user/legacy/linux/drbd_nl.h drbd-8.4.4/user/legacy/linux/drbd_tag_magic.h drbd-8.4.4/user/legacy/unaligned.h drbd-8.4.4/user/libgenl.c drbd-8.4.4/user/libgenl.h drbd-8.4.4/user/registry.c drbd-8.4.4/user/registry.h drbd-8.4.4/user/wrap_printf.c drbd-8.4.4/user/wrap_printf.h drbd-8.4.4/documentation/drbdsetup.8 drbd-8.4.4/documentation/drbdadm.8 drbd-8.4.4/documentation/drbdmeta.8 drbd-8.4.4/documentation/drbd.conf.5 drbd-8.4.4/documentation/drbddisk.8 drbd-8.4.4/documentation/drbd.8 drbd-8.4.4/drbd_config.h drbd-8.4.4/drbd/drbd_buildtag.c drbd-8.4.4/.filelist drbd-8.4.4/configure drbd-8.4.4/user/config.h.in drbd-8.4.4/configure0000775000000000000000000041621012226007150013050 0ustar rootroot#! /bin/sh # Guess values for system-dependent variables and create Makefiles. # Generated by GNU Autoconf 2.68 for DRBD 8.4.4. # # Report bugs to <drbd-dev@lists.linbit.com>. # # # Copyright (C) 1992, 1993, 1994, 1995, 1996, 1998, 1999, 2000, 2001, # 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 Free Software # Foundation, Inc. # # # This configure script is free software; the Free Software Foundation # gives unlimited permission to copy, distribute and modify it. ## -------------------- ## ## M4sh Initialization.
## ## -------------------- ## # Be more Bourne compatible DUALCASE=1; export DUALCASE # for MKS sh if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then : emulate sh NULLCMD=: # Pre-4.2 versions of Zsh do word splitting on ${1+"$@"}, which # is contrary to our usage. Disable this feature. alias -g '${1+"$@"}'='"$@"' setopt NO_GLOB_SUBST else case `(set -o) 2>/dev/null` in #( *posix*) : set -o posix ;; #( *) : ;; esac fi as_nl=' ' export as_nl # Printing a long string crashes Solaris 7 /usr/bin/printf. as_echo='\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\' as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo$as_echo # Prefer a ksh shell builtin over an external printf program on Solaris, # but without wasting forks for bash or zsh. if test -z "$BASH_VERSION$ZSH_VERSION" \ && (test "X`print -r -- $as_echo`" = "X$as_echo") 2>/dev/null; then as_echo='print -r --' as_echo_n='print -rn --' elif (test "X`printf %s $as_echo`" = "X$as_echo") 2>/dev/null; then as_echo='printf %s\n' as_echo_n='printf %s' else if test "X`(/usr/ucb/echo -n -n $as_echo) 2>/dev/null`" = "X-n $as_echo"; then as_echo_body='eval /usr/ucb/echo -n "$1$as_nl"' as_echo_n='/usr/ucb/echo -n' else as_echo_body='eval expr "X$1" : "X\\(.*\\)"' as_echo_n_body='eval arg=$1; case $arg in #( *"$as_nl"*) expr "X$arg" : "X\\(.*\\)$as_nl"; arg=`expr "X$arg" : ".*$as_nl\\(.*\\)"`;; esac; expr "X$arg" : "X\\(.*\\)" | tr -d "$as_nl" ' export as_echo_n_body as_echo_n='sh -c $as_echo_n_body as_echo' fi export as_echo_body as_echo='sh -c $as_echo_body as_echo' fi # The user is always right. if test "${PATH_SEPARATOR+set}" != set; then PATH_SEPARATOR=: (PATH='/bin;/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 && { (PATH='/bin:/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 || PATH_SEPARATOR=';' } fi # IFS # We need space, tab and new line, in precisely that order. 
Quoting is # there to prevent editors from complaining about space-tab. # (If _AS_PATH_WALK were called with IFS unset, it would disable word # splitting by setting IFS to empty value.) IFS=" "" $as_nl" # Find who we are. Look in the path if we contain no directory separator. as_myself= case $0 in #(( *[\\/]* ) as_myself=$0 ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. test -r "$as_dir/$0" && as_myself=$as_dir/$0 && break done IFS=$as_save_IFS ;; esac # We did not find ourselves, most probably we were run as `sh COMMAND' # in which case we are not to be found in the path. if test "x$as_myself" = x; then as_myself=$0 fi if test ! -f "$as_myself"; then $as_echo "$as_myself: error: cannot find myself; rerun with an absolute file name" >&2 exit 1 fi # Unset variables that we do not need and which cause bugs (e.g. in # pre-3.0 UWIN ksh). But do not cause bugs in bash 2.01; the "|| exit 1" # suppresses any "Segmentation fault" message there. '((' could # trigger a bug in pdksh 5.2.14. for as_var in BASH_ENV ENV MAIL MAILPATH do eval test x\${$as_var+set} = xset \ && ( (unset $as_var) || exit 1) >/dev/null 2>&1 && unset $as_var || : done PS1='$ ' PS2='> ' PS4='+ ' # NLS nuisances. LC_ALL=C export LC_ALL LANGUAGE=C export LANGUAGE # CDPATH. (unset CDPATH) >/dev/null 2>&1 && unset CDPATH if test "x$CONFIG_SHELL" = x; then as_bourne_compatible="if test -n \"\${ZSH_VERSION+set}\" && (emulate sh) >/dev/null 2>&1; then : emulate sh NULLCMD=: # Pre-4.2 versions of Zsh do word splitting on \${1+\"\$@\"}, which # is contrary to our usage. Disable this feature. 
alias -g '\${1+\"\$@\"}'='\"\$@\"' setopt NO_GLOB_SUBST else case \`(set -o) 2>/dev/null\` in #( *posix*) : set -o posix ;; #( *) : ;; esac fi " as_required="as_fn_return () { (exit \$1); } as_fn_success () { as_fn_return 0; } as_fn_failure () { as_fn_return 1; } as_fn_ret_success () { return 0; } as_fn_ret_failure () { return 1; } exitcode=0 as_fn_success || { exitcode=1; echo as_fn_success failed.; } as_fn_failure && { exitcode=1; echo as_fn_failure succeeded.; } as_fn_ret_success || { exitcode=1; echo as_fn_ret_success failed.; } as_fn_ret_failure && { exitcode=1; echo as_fn_ret_failure succeeded.; } if ( set x; as_fn_ret_success y && test x = \"\$1\" ); then : else exitcode=1; echo positional parameters were not saved. fi test x\$exitcode = x0 || exit 1" as_suggested=" as_lineno_1=";as_suggested=$as_suggested$LINENO;as_suggested=$as_suggested" as_lineno_1a=\$LINENO as_lineno_2=";as_suggested=$as_suggested$LINENO;as_suggested=$as_suggested" as_lineno_2a=\$LINENO eval 'test \"x\$as_lineno_1'\$as_run'\" != \"x\$as_lineno_2'\$as_run'\" && test \"x\`expr \$as_lineno_1'\$as_run' + 1\`\" = \"x\$as_lineno_2'\$as_run'\"' || exit 1" if (eval "$as_required") 2>/dev/null; then : as_have_required=yes else as_have_required=no fi if test x$as_have_required = xyes && (eval "$as_suggested") 2>/dev/null; then : else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR as_found=false for as_dir in /bin$PATH_SEPARATOR/usr/bin$PATH_SEPARATOR$PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. as_found=: case $as_dir in #( /*) for as_base in sh bash ksh sh5; do # Try only shells that exist, to save several forks. 
as_shell=$as_dir/$as_base if { test -f "$as_shell" || test -f "$as_shell.exe"; } && { $as_echo "$as_bourne_compatible""$as_required" | as_run=a "$as_shell"; } 2>/dev/null; then : CONFIG_SHELL=$as_shell as_have_required=yes if { $as_echo "$as_bourne_compatible""$as_suggested" | as_run=a "$as_shell"; } 2>/dev/null; then : break 2 fi fi done;; esac as_found=false done $as_found || { if { test -f "$SHELL" || test -f "$SHELL.exe"; } && { $as_echo "$as_bourne_compatible""$as_required" | as_run=a "$SHELL"; } 2>/dev/null; then : CONFIG_SHELL=$SHELL as_have_required=yes fi; } IFS=$as_save_IFS if test "x$CONFIG_SHELL" != x; then : # We cannot yet assume a decent shell, so we have to provide a # neutralization value for shells without unset; and this also # works around shells that cannot unset nonexistent variables. # Preserve -v and -x to the replacement shell. BASH_ENV=/dev/null ENV=/dev/null (unset BASH_ENV) >/dev/null 2>&1 && unset BASH_ENV ENV export CONFIG_SHELL case $- in # (((( *v*x* | *x*v* ) as_opts=-vx ;; *v* ) as_opts=-v ;; *x* ) as_opts=-x ;; * ) as_opts= ;; esac exec "$CONFIG_SHELL" $as_opts "$as_myself" ${1+"$@"} fi if test x$as_have_required = xno; then : $as_echo "$0: This script requires a shell more modern than all" $as_echo "$0: the shells that I found on your system." if test x${ZSH_VERSION+set} = xset ; then $as_echo "$0: In particular, zsh $ZSH_VERSION has bugs and should" $as_echo "$0: be upgraded to zsh 4.3.4 or later." else $as_echo "$0: Please tell bug-autoconf@gnu.org and $0: drbd-dev@lists.linbit.com about your system, including $0: any error possibly output before this message. Then $0: install a modern shell, or manually run the script $0: under such a shell if you do have one." fi exit 1 fi fi fi SHELL=${CONFIG_SHELL-/bin/sh} export SHELL # Unset more variables known to interfere with behavior of common tools. CLICOLOR_FORCE= GREP_OPTIONS= unset CLICOLOR_FORCE GREP_OPTIONS ## --------------------- ## ## M4sh Shell Functions. 
## ## --------------------- ## # as_fn_unset VAR # --------------- # Portably unset VAR. as_fn_unset () { { eval $1=; unset $1;} } as_unset=as_fn_unset # as_fn_set_status STATUS # ----------------------- # Set $? to STATUS, without forking. as_fn_set_status () { return $1 } # as_fn_set_status # as_fn_exit STATUS # ----------------- # Exit the shell with STATUS, even in a "trap 0" or "set -e" context. as_fn_exit () { set +e as_fn_set_status $1 exit $1 } # as_fn_exit # as_fn_mkdir_p # ------------- # Create "$as_dir" as a directory, including parents if necessary. as_fn_mkdir_p () { case $as_dir in #( -*) as_dir=./$as_dir;; esac test -d "$as_dir" || eval $as_mkdir_p || { as_dirs= while :; do case $as_dir in #( *\'*) as_qdir=`$as_echo "$as_dir" | sed "s/'/'\\\\\\\\''/g"`;; #'( *) as_qdir=$as_dir;; esac as_dirs="'$as_qdir' $as_dirs" as_dir=`$as_dirname -- "$as_dir" || $as_expr X"$as_dir" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_dir" : 'X\(//\)[^/]' \| \ X"$as_dir" : 'X\(//\)$' \| \ X"$as_dir" : 'X\(/\)' \| . 2>/dev/null || $as_echo X"$as_dir" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q } /^X\(\/\/\)[^/].*/{ s//\1/ q } /^X\(\/\/\)$/{ s//\1/ q } /^X\(\/\).*/{ s//\1/ q } s/.*/./; q'` test -d "$as_dir" && break done test -z "$as_dirs" || eval "mkdir $as_dirs" } || test -d "$as_dir" || as_fn_error $? "cannot create directory $as_dir" } # as_fn_mkdir_p # as_fn_append VAR VALUE # ---------------------- # Append the text in VALUE to the end of the definition contained in VAR. Take # advantage of any shell optimizations that allow amortized linear growth over # repeated appends, instead of the typical quadratic growth present in naive # implementations. if (eval "as_var=1; as_var+=2; test x\$as_var = x12") 2>/dev/null; then : eval 'as_fn_append () { eval $1+=\$2 }' else as_fn_append () { eval $1=\$$1\$2 } fi # as_fn_append # as_fn_arith ARG... # ------------------ # Perform arithmetic evaluation on the ARGs, and store the result in the # global $as_val. 
Take advantage of shells that can avoid forks. The arguments # must be portable across $(()) and expr. if (eval "test \$(( 1 + 1 )) = 2") 2>/dev/null; then : eval 'as_fn_arith () { as_val=$(( $* )) }' else as_fn_arith () { as_val=`expr "$@" || test $? -eq 1` } fi # as_fn_arith # as_fn_error STATUS ERROR [LINENO LOG_FD] # ---------------------------------------- # Output "`basename $0`: error: ERROR" to stderr. If LINENO and LOG_FD are # provided, also output the error to LOG_FD, referencing LINENO. Then exit the # script with STATUS, using 1 if that was 0. as_fn_error () { as_status=$1; test $as_status -eq 0 && as_status=1 if test "$4"; then as_lineno=${as_lineno-"$3"} as_lineno_stack=as_lineno_stack=$as_lineno_stack $as_echo "$as_me:${as_lineno-$LINENO}: error: $2" >&$4 fi $as_echo "$as_me: error: $2" >&2 as_fn_exit $as_status } # as_fn_error if expr a : '\(a\)' >/dev/null 2>&1 && test "X`expr 00001 : '.*\(...\)'`" = X001; then as_expr=expr else as_expr=false fi if (basename -- /) >/dev/null 2>&1 && test "X`basename -- / 2>&1`" = "X/"; then as_basename=basename else as_basename=false fi if (as_dir=`dirname -- /` && test "X$as_dir" = X/) >/dev/null 2>&1; then as_dirname=dirname else as_dirname=false fi as_me=`$as_basename -- "$0" || $as_expr X/"$0" : '.*/\([^/][^/]*\)/*$' \| \ X"$0" : 'X\(//\)$' \| \ X"$0" : 'X\(/\)' \| . 2>/dev/null || $as_echo X/"$0" | sed '/^.*\/\([^/][^/]*\)\/*$/{ s//\1/ q } /^X\/\(\/\/\)$/{ s//\1/ q } /^X\/\(\/\).*/{ s//\1/ q } s/.*/./; q'` # Avoid depending upon Character Ranges. as_cr_letters='abcdefghijklmnopqrstuvwxyz' as_cr_LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ' as_cr_Letters=$as_cr_letters$as_cr_LETTERS as_cr_digits='0123456789' as_cr_alnum=$as_cr_Letters$as_cr_digits as_lineno_1=$LINENO as_lineno_1a=$LINENO as_lineno_2=$LINENO as_lineno_2a=$LINENO eval 'test "x$as_lineno_1'$as_run'" != "x$as_lineno_2'$as_run'" && test "x`expr $as_lineno_1'$as_run' + 1`" = "x$as_lineno_2'$as_run'"' || { # Blame Lee E. McMahon (1931-1989) for sed's syntax. 
:-) sed -n ' p /[$]LINENO/= ' <$as_myself | sed ' s/[$]LINENO.*/&-/ t lineno b :lineno N :loop s/[$]LINENO\([^'$as_cr_alnum'_].*\n\)\(.*\)/\2\1\2/ t loop s/-\n.*// ' >$as_me.lineno && chmod +x "$as_me.lineno" || { $as_echo "$as_me: error: cannot create $as_me.lineno; rerun with a POSIX shell" >&2; as_fn_exit 1; } # Don't try to exec as it changes $[0], causing all sort of problems # (the dirname of $[0] is not the place where we might find the # original and so on. Autoconf is especially sensitive to this). . "./$as_me.lineno" # Exit status is that of the last command. exit } ECHO_C= ECHO_N= ECHO_T= case `echo -n x` in #((((( -n*) case `echo 'xy\c'` in *c*) ECHO_T=' ';; # ECHO_T is single tab character. xy) ECHO_C='\c';; *) echo `echo ksh88 bug on AIX 6.1` > /dev/null ECHO_T=' ';; esac;; *) ECHO_N='-n';; esac rm -f conf$$ conf$$.exe conf$$.file if test -d conf$$.dir; then rm -f conf$$.dir/conf$$.file else rm -f conf$$.dir mkdir conf$$.dir 2>/dev/null fi if (echo >conf$$.file) 2>/dev/null; then if ln -s conf$$.file conf$$ 2>/dev/null; then as_ln_s='ln -s' # ... but there are two gotchas: # 1) On MSYS, both `ln -s file dir' and `ln file dir' fail. # 2) DJGPP < 2.04 has no symlinks; `ln -s' creates a wrapper executable. # In both cases, we have to default to `cp -p'. ln -s conf$$.file conf$$.dir 2>/dev/null && test ! -f conf$$.exe || as_ln_s='cp -p' elif ln conf$$.file conf$$ 2>/dev/null; then as_ln_s=ln else as_ln_s='cp -p' fi else as_ln_s='cp -p' fi rm -f conf$$ conf$$.exe conf$$.dir/conf$$.file conf$$.file rmdir conf$$.dir 2>/dev/null if mkdir -p . 
2>/dev/null; then as_mkdir_p='mkdir -p "$as_dir"' else test -d ./-p && rmdir ./-p as_mkdir_p=false fi if test -x / >/dev/null 2>&1; then as_test_x='test -x' else if ls -dL / >/dev/null 2>&1; then as_ls_L_option=L else as_ls_L_option= fi as_test_x=' eval sh -c '\'' if test -d "$1"; then test -d "$1/."; else case $1 in #( -*)set "./$1";; esac; case `ls -ld'$as_ls_L_option' "$1" 2>/dev/null` in #(( ???[sx]*):;;*)false;;esac;fi '\'' sh ' fi as_executable_p=$as_test_x # Sed expression to map a string onto a valid CPP name. as_tr_cpp="eval sed 'y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g'" # Sed expression to map a string onto a valid variable name. as_tr_sh="eval sed 'y%*+%pp%;s%[^_$as_cr_alnum]%_%g'" test -n "$DJDIR" || exec 7<&0 </dev/null exec 6>&1 # Name of the host. # hostname on some systems (SVR3.2, old GNU/Linux) returns a bogus exit status, # so uname gets run too. ac_hostname=`(hostname || uname -n) 2>/dev/null | sed 1q` # # Initializations. # ac_default_prefix=/usr/local ac_clean_files= ac_config_libobj_dir=. LIBOBJS= cross_compiling=no subdirs= MFLAGS= MAKEFLAGS= # Identity of this package.
PACKAGE_NAME='DRBD' PACKAGE_TARNAME='drbd' PACKAGE_VERSION='8.4.4' PACKAGE_STRING='DRBD 8.4.4' PACKAGE_BUGREPORT='drbd-dev@lists.linbit.com' PACKAGE_URL='' ac_subst_vars='LTLIBOBJS LIBOBJS RPM_REQ_CHKCONFIG_PREUN RPM_REQ_CHKCONFIG_POST RPM_REQ_XEN RPM_REQ_BASH_COMPLETION RPM_REQ_HEARTBEAT RPM_REQ_PACEMAKER RPM_SUBPACKAGE_NOARCH RPM_BUILDREQ_KM RPM_BUILDREQ_DEFAULT RPM_CONFLICTS_KM RPM_DIST_TAG UDEV_RULE_SUFFIX BASH_COMPLETION_SUFFIX INITDIR DISTRO UDEVINFO UDEVADM DPKG_BUILDPACKAGE GIT TAR XSLTPROC RPMBUILD FLEX GREP SED LN_S OBJEXT EXEEXT ac_ct_CC CPPFLAGS LDFLAGS CFLAGS CC WITH_BASHCOMPLETION WITH_RGMANAGER WITH_HEARTBEAT WITH_PACEMAKER WITH_XEN WITH_UDEV WITH_KM WITH_LEGACY_UTILS WITH_UTILS target_alias host_alias build_alias LIBS ECHO_T ECHO_N ECHO_C DEFS mandir localedir libdir psdir pdfdir dvidir htmldir infodir docdir oldincludedir includedir localstatedir sharedstatedir sysconfdir datadir datarootdir libexecdir sbindir bindir program_transform_name prefix exec_prefix PACKAGE_URL PACKAGE_BUGREPORT PACKAGE_STRING PACKAGE_VERSION PACKAGE_TARNAME PACKAGE_NAME PATH_SEPARATOR SHELL' ac_subst_files='' ac_user_opts=' enable_option_checking with_utils with_legacy_utils with_km with_udev with_xen with_pacemaker with_heartbeat with_rgmanager with_bashcompletion with_distro with_initdir with_noarchsubpkg enable_spec ' ac_precious_vars='build_alias host_alias target_alias CC CFLAGS LDFLAGS LIBS CPPFLAGS' # Initialize some variables set by options. ac_init_help= ac_init_version=false ac_unrecognized_opts= ac_unrecognized_sep= # The variables have the same names as the options, with # dashes changed to underlines. cache_file=/dev/null exec_prefix=NONE no_create= no_recursion= prefix=NONE program_prefix=NONE program_suffix=NONE program_transform_name=s,x,x, silent= site= srcdir= verbose= x_includes=NONE x_libraries=NONE # Installation directory options. 
# These are left unexpanded so users can "make install exec_prefix=/foo" # and all the variables that are supposed to be based on exec_prefix # by default will actually change. # Use braces instead of parens because sh, perl, etc. also accept them. # (The list follows the same order as the GNU Coding Standards.) bindir='${exec_prefix}/bin' sbindir='${exec_prefix}/sbin' libexecdir='${exec_prefix}/libexec' datarootdir='${prefix}/share' datadir='${datarootdir}' sysconfdir='${prefix}/etc' sharedstatedir='${prefix}/com' localstatedir='${prefix}/var' includedir='${prefix}/include' oldincludedir='/usr/include' docdir='${datarootdir}/doc/${PACKAGE_TARNAME}' infodir='${datarootdir}/info' htmldir='${docdir}' dvidir='${docdir}' pdfdir='${docdir}' psdir='${docdir}' libdir='${exec_prefix}/lib' localedir='${datarootdir}/locale' mandir='${datarootdir}/man' ac_prev= ac_dashdash= for ac_option do # If the previous option needs an argument, assign it. if test -n "$ac_prev"; then eval $ac_prev=\$ac_option ac_prev= continue fi case $ac_option in *=?*) ac_optarg=`expr "X$ac_option" : '[^=]*=\(.*\)'` ;; *=) ac_optarg= ;; *) ac_optarg=yes ;; esac # Accept the important Cygnus configure options, so we can diagnose typos. 
case $ac_dashdash$ac_option in --) ac_dashdash=yes ;; -bindir | --bindir | --bindi | --bind | --bin | --bi) ac_prev=bindir ;; -bindir=* | --bindir=* | --bindi=* | --bind=* | --bin=* | --bi=*) bindir=$ac_optarg ;; -build | --build | --buil | --bui | --bu) ac_prev=build_alias ;; -build=* | --build=* | --buil=* | --bui=* | --bu=*) build_alias=$ac_optarg ;; -cache-file | --cache-file | --cache-fil | --cache-fi \ | --cache-f | --cache- | --cache | --cach | --cac | --ca | --c) ac_prev=cache_file ;; -cache-file=* | --cache-file=* | --cache-fil=* | --cache-fi=* \ | --cache-f=* | --cache-=* | --cache=* | --cach=* | --cac=* | --ca=* | --c=*) cache_file=$ac_optarg ;; --config-cache | -C) cache_file=config.cache ;; -datadir | --datadir | --datadi | --datad) ac_prev=datadir ;; -datadir=* | --datadir=* | --datadi=* | --datad=*) datadir=$ac_optarg ;; -datarootdir | --datarootdir | --datarootdi | --datarootd | --dataroot \ | --dataroo | --dataro | --datar) ac_prev=datarootdir ;; -datarootdir=* | --datarootdir=* | --datarootdi=* | --datarootd=* \ | --dataroot=* | --dataroo=* | --dataro=* | --datar=*) datarootdir=$ac_optarg ;; -disable-* | --disable-*) ac_useropt=`expr "x$ac_option" : 'x-*disable-\(.*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && as_fn_error $? 
"invalid feature name: $ac_useropt" ac_useropt_orig=$ac_useropt ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "enable_$ac_useropt" "*) ;; *) ac_unrecognized_opts="$ac_unrecognized_opts$ac_unrecognized_sep--disable-$ac_useropt_orig" ac_unrecognized_sep=', ';; esac eval enable_$ac_useropt=no ;; -docdir | --docdir | --docdi | --doc | --do) ac_prev=docdir ;; -docdir=* | --docdir=* | --docdi=* | --doc=* | --do=*) docdir=$ac_optarg ;; -dvidir | --dvidir | --dvidi | --dvid | --dvi | --dv) ac_prev=dvidir ;; -dvidir=* | --dvidir=* | --dvidi=* | --dvid=* | --dvi=* | --dv=*) dvidir=$ac_optarg ;; -enable-* | --enable-*) ac_useropt=`expr "x$ac_option" : 'x-*enable-\([^=]*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && as_fn_error $? "invalid feature name: $ac_useropt" ac_useropt_orig=$ac_useropt ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "enable_$ac_useropt" "*) ;; *) ac_unrecognized_opts="$ac_unrecognized_opts$ac_unrecognized_sep--enable-$ac_useropt_orig" ac_unrecognized_sep=', ';; esac eval enable_$ac_useropt=\$ac_optarg ;; -exec-prefix | --exec_prefix | --exec-prefix | --exec-prefi \ | --exec-pref | --exec-pre | --exec-pr | --exec-p | --exec- \ | --exec | --exe | --ex) ac_prev=exec_prefix ;; -exec-prefix=* | --exec_prefix=* | --exec-prefix=* | --exec-prefi=* \ | --exec-pref=* | --exec-pre=* | --exec-pr=* | --exec-p=* | --exec-=* \ | --exec=* | --exe=* | --ex=*) exec_prefix=$ac_optarg ;; -gas | --gas | --ga | --g) # Obsolete; use --with-gas. 
with_gas=yes ;; -help | --help | --hel | --he | -h) ac_init_help=long ;; -help=r* | --help=r* | --hel=r* | --he=r* | -hr*) ac_init_help=recursive ;; -help=s* | --help=s* | --hel=s* | --he=s* | -hs*) ac_init_help=short ;; -host | --host | --hos | --ho) ac_prev=host_alias ;; -host=* | --host=* | --hos=* | --ho=*) host_alias=$ac_optarg ;; -htmldir | --htmldir | --htmldi | --htmld | --html | --htm | --ht) ac_prev=htmldir ;; -htmldir=* | --htmldir=* | --htmldi=* | --htmld=* | --html=* | --htm=* \ | --ht=*) htmldir=$ac_optarg ;; -includedir | --includedir | --includedi | --included | --include \ | --includ | --inclu | --incl | --inc) ac_prev=includedir ;; -includedir=* | --includedir=* | --includedi=* | --included=* | --include=* \ | --includ=* | --inclu=* | --incl=* | --inc=*) includedir=$ac_optarg ;; -infodir | --infodir | --infodi | --infod | --info | --inf) ac_prev=infodir ;; -infodir=* | --infodir=* | --infodi=* | --infod=* | --info=* | --inf=*) infodir=$ac_optarg ;; -libdir | --libdir | --libdi | --libd) ac_prev=libdir ;; -libdir=* | --libdir=* | --libdi=* | --libd=*) libdir=$ac_optarg ;; -libexecdir | --libexecdir | --libexecdi | --libexecd | --libexec \ | --libexe | --libex | --libe) ac_prev=libexecdir ;; -libexecdir=* | --libexecdir=* | --libexecdi=* | --libexecd=* | --libexec=* \ | --libexe=* | --libex=* | --libe=*) libexecdir=$ac_optarg ;; -localedir | --localedir | --localedi | --localed | --locale) ac_prev=localedir ;; -localedir=* | --localedir=* | --localedi=* | --localed=* | --locale=*) localedir=$ac_optarg ;; -localstatedir | --localstatedir | --localstatedi | --localstated \ | --localstate | --localstat | --localsta | --localst | --locals) ac_prev=localstatedir ;; -localstatedir=* | --localstatedir=* | --localstatedi=* | --localstated=* \ | --localstate=* | --localstat=* | --localsta=* | --localst=* | --locals=*) localstatedir=$ac_optarg ;; -mandir | --mandir | --mandi | --mand | --man | --ma | --m) ac_prev=mandir ;; -mandir=* | --mandir=* | --mandi=* | 
--mand=* | --man=* | --ma=* | --m=*) mandir=$ac_optarg ;; -nfp | --nfp | --nf) # Obsolete; use --without-fp. with_fp=no ;; -no-create | --no-create | --no-creat | --no-crea | --no-cre \ | --no-cr | --no-c | -n) no_create=yes ;; -no-recursion | --no-recursion | --no-recursio | --no-recursi \ | --no-recurs | --no-recur | --no-recu | --no-rec | --no-re | --no-r) no_recursion=yes ;; -oldincludedir | --oldincludedir | --oldincludedi | --oldincluded \ | --oldinclude | --oldinclud | --oldinclu | --oldincl | --oldinc \ | --oldin | --oldi | --old | --ol | --o) ac_prev=oldincludedir ;; -oldincludedir=* | --oldincludedir=* | --oldincludedi=* | --oldincluded=* \ | --oldinclude=* | --oldinclud=* | --oldinclu=* | --oldincl=* | --oldinc=* \ | --oldin=* | --oldi=* | --old=* | --ol=* | --o=*) oldincludedir=$ac_optarg ;; -prefix | --prefix | --prefi | --pref | --pre | --pr | --p) ac_prev=prefix ;; -prefix=* | --prefix=* | --prefi=* | --pref=* | --pre=* | --pr=* | --p=*) prefix=$ac_optarg ;; -program-prefix | --program-prefix | --program-prefi | --program-pref \ | --program-pre | --program-pr | --program-p) ac_prev=program_prefix ;; -program-prefix=* | --program-prefix=* | --program-prefi=* \ | --program-pref=* | --program-pre=* | --program-pr=* | --program-p=*) program_prefix=$ac_optarg ;; -program-suffix | --program-suffix | --program-suffi | --program-suff \ | --program-suf | --program-su | --program-s) ac_prev=program_suffix ;; -program-suffix=* | --program-suffix=* | --program-suffi=* \ | --program-suff=* | --program-suf=* | --program-su=* | --program-s=*) program_suffix=$ac_optarg ;; -program-transform-name | --program-transform-name \ | --program-transform-nam | --program-transform-na \ | --program-transform-n | --program-transform- \ | --program-transform | --program-transfor \ | --program-transfo | --program-transf \ | --program-trans | --program-tran \ | --progr-tra | --program-tr | --program-t) ac_prev=program_transform_name ;; -program-transform-name=* | 
--program-transform-name=* \ | --program-transform-nam=* | --program-transform-na=* \ | --program-transform-n=* | --program-transform-=* \ | --program-transform=* | --program-transfor=* \ | --program-transfo=* | --program-transf=* \ | --program-trans=* | --program-tran=* \ | --progr-tra=* | --program-tr=* | --program-t=*) program_transform_name=$ac_optarg ;; -pdfdir | --pdfdir | --pdfdi | --pdfd | --pdf | --pd) ac_prev=pdfdir ;; -pdfdir=* | --pdfdir=* | --pdfdi=* | --pdfd=* | --pdf=* | --pd=*) pdfdir=$ac_optarg ;; -psdir | --psdir | --psdi | --psd | --ps) ac_prev=psdir ;; -psdir=* | --psdir=* | --psdi=* | --psd=* | --ps=*) psdir=$ac_optarg ;; -q | -quiet | --quiet | --quie | --qui | --qu | --q \ | -silent | --silent | --silen | --sile | --sil) silent=yes ;; -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb) ac_prev=sbindir ;; -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \ | --sbi=* | --sb=*) sbindir=$ac_optarg ;; -sharedstatedir | --sharedstatedir | --sharedstatedi \ | --sharedstated | --sharedstate | --sharedstat | --sharedsta \ | --sharedst | --shareds | --shared | --share | --shar \ | --sha | --sh) ac_prev=sharedstatedir ;; -sharedstatedir=* | --sharedstatedir=* | --sharedstatedi=* \ | --sharedstated=* | --sharedstate=* | --sharedstat=* | --sharedsta=* \ | --sharedst=* | --shareds=* | --shared=* | --share=* | --shar=* \ | --sha=* | --sh=*) sharedstatedir=$ac_optarg ;; -site | --site | --sit) ac_prev=site ;; -site=* | --site=* | --sit=*) site=$ac_optarg ;; -srcdir | --srcdir | --srcdi | --srcd | --src | --sr) ac_prev=srcdir ;; -srcdir=* | --srcdir=* | --srcdi=* | --srcd=* | --src=* | --sr=*) srcdir=$ac_optarg ;; -sysconfdir | --sysconfdir | --sysconfdi | --sysconfd | --sysconf \ | --syscon | --sysco | --sysc | --sys | --sy) ac_prev=sysconfdir ;; -sysconfdir=* | --sysconfdir=* | --sysconfdi=* | --sysconfd=* | --sysconf=* \ | --syscon=* | --sysco=* | --sysc=* | --sys=* | --sy=*) sysconfdir=$ac_optarg ;; -target | --target | --targe | 
--targ | --tar | --ta | --t) ac_prev=target_alias ;; -target=* | --target=* | --targe=* | --targ=* | --tar=* | --ta=* | --t=*) target_alias=$ac_optarg ;; -v | -verbose | --verbose | --verbos | --verbo | --verb) verbose=yes ;; -version | --version | --versio | --versi | --vers | -V) ac_init_version=: ;; -with-* | --with-*) ac_useropt=`expr "x$ac_option" : 'x-*with-\([^=]*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && as_fn_error $? "invalid package name: $ac_useropt" ac_useropt_orig=$ac_useropt ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "with_$ac_useropt" "*) ;; *) ac_unrecognized_opts="$ac_unrecognized_opts$ac_unrecognized_sep--with-$ac_useropt_orig" ac_unrecognized_sep=', ';; esac eval with_$ac_useropt=\$ac_optarg ;; -without-* | --without-*) ac_useropt=`expr "x$ac_option" : 'x-*without-\(.*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && as_fn_error $? "invalid package name: $ac_useropt" ac_useropt_orig=$ac_useropt ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "with_$ac_useropt" "*) ;; *) ac_unrecognized_opts="$ac_unrecognized_opts$ac_unrecognized_sep--without-$ac_useropt_orig" ac_unrecognized_sep=', ';; esac eval with_$ac_useropt=no ;; --x) # Obsolete; use --with-x. 
with_x=yes ;; -x-includes | --x-includes | --x-include | --x-includ | --x-inclu \ | --x-incl | --x-inc | --x-in | --x-i) ac_prev=x_includes ;; -x-includes=* | --x-includes=* | --x-include=* | --x-includ=* | --x-inclu=* \ | --x-incl=* | --x-inc=* | --x-in=* | --x-i=*) x_includes=$ac_optarg ;; -x-libraries | --x-libraries | --x-librarie | --x-librari \ | --x-librar | --x-libra | --x-libr | --x-lib | --x-li | --x-l) ac_prev=x_libraries ;; -x-libraries=* | --x-libraries=* | --x-librarie=* | --x-librari=* \ | --x-librar=* | --x-libra=* | --x-libr=* | --x-lib=* | --x-li=* | --x-l=*) x_libraries=$ac_optarg ;; -*) as_fn_error $? "unrecognized option: \`$ac_option' Try \`$0 --help' for more information" ;; *=*) ac_envvar=`expr "x$ac_option" : 'x\([^=]*\)='` # Reject names that are not valid shell variable names. case $ac_envvar in #( '' | [0-9]* | *[!_$as_cr_alnum]* ) as_fn_error $? "invalid variable name: \`$ac_envvar'" ;; esac eval $ac_envvar=\$ac_optarg export $ac_envvar ;; *) # FIXME: should be removed in autoconf 3.0. $as_echo "$as_me: WARNING: you should use --build, --host, --target" >&2 expr "x$ac_option" : ".*[^-._$as_cr_alnum]" >/dev/null && $as_echo "$as_me: WARNING: invalid host type: $ac_option" >&2 : "${build_alias=$ac_option} ${host_alias=$ac_option} ${target_alias=$ac_option}" ;; esac done if test -n "$ac_prev"; then ac_option=--`echo $ac_prev | sed 's/_/-/g'` as_fn_error $? "missing argument to $ac_option" fi if test -n "$ac_unrecognized_opts"; then case $enable_option_checking in no) ;; fatal) as_fn_error $? "unrecognized options: $ac_unrecognized_opts" ;; *) $as_echo "$as_me: WARNING: unrecognized options: $ac_unrecognized_opts" >&2 ;; esac fi # Check all directory arguments for consistency. for ac_var in exec_prefix prefix bindir sbindir libexecdir datarootdir \ datadir sysconfdir sharedstatedir localstatedir includedir \ oldincludedir docdir infodir htmldir dvidir pdfdir psdir \ libdir localedir mandir do eval ac_val=\$$ac_var # Remove trailing slashes. 
case $ac_val in */ ) ac_val=`expr "X$ac_val" : 'X\(.*[^/]\)' \| "X$ac_val" : 'X\(.*\)'` eval $ac_var=\$ac_val;; esac # Be sure to have absolute directory names. case $ac_val in [\\/$]* | ?:[\\/]* ) continue;; NONE | '' ) case $ac_var in *prefix ) continue;; esac;; esac as_fn_error $? "expected an absolute directory name for --$ac_var: $ac_val" done # There might be people who depend on the old broken behavior: `$host' # used to hold the argument of --host etc. # FIXME: To remove some day. build=$build_alias host=$host_alias target=$target_alias # FIXME: To remove some day. if test "x$host_alias" != x; then if test "x$build_alias" = x; then cross_compiling=maybe $as_echo "$as_me: WARNING: if you wanted to set the --build type, don't use --host. If a cross compiler is detected then cross compile mode will be used" >&2 elif test "x$build_alias" != "x$host_alias"; then cross_compiling=yes fi fi ac_tool_prefix= test -n "$host_alias" && ac_tool_prefix=$host_alias- test "$silent" = yes && exec 6>/dev/null ac_pwd=`pwd` && test -n "$ac_pwd" && ac_ls_di=`ls -di .` && ac_pwd_ls_di=`cd "$ac_pwd" && ls -di .` || as_fn_error $? "working directory cannot be determined" test "X$ac_ls_di" = "X$ac_pwd_ls_di" || as_fn_error $? "pwd does not report name of working directory" # Find the source files, if location was not specified. if test -z "$srcdir"; then ac_srcdir_defaulted=yes # Try the directory containing this script, then the parent directory. ac_confdir=`$as_dirname -- "$as_myself" || $as_expr X"$as_myself" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_myself" : 'X\(//\)[^/]' \| \ X"$as_myself" : 'X\(//\)$' \| \ X"$as_myself" : 'X\(/\)' \| . 2>/dev/null || $as_echo X"$as_myself" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q } /^X\(\/\/\)[^/].*/{ s//\1/ q } /^X\(\/\/\)$/{ s//\1/ q } /^X\(\/\).*/{ s//\1/ q } s/.*/./; q'` srcdir=$ac_confdir if test ! -r "$srcdir/$ac_unique_file"; then srcdir=.. fi else ac_srcdir_defaulted=no fi if test ! 
-r "$srcdir/$ac_unique_file"; then test "$ac_srcdir_defaulted" = yes && srcdir="$ac_confdir or .." as_fn_error $? "cannot find sources ($ac_unique_file) in $srcdir" fi ac_msg="sources are in $srcdir, but \`cd $srcdir' does not work" ac_abs_confdir=`( cd "$srcdir" && test -r "./$ac_unique_file" || as_fn_error $? "$ac_msg" pwd)` # When building in place, set srcdir=. if test "$ac_abs_confdir" = "$ac_pwd"; then srcdir=. fi # Remove unnecessary trailing slashes from srcdir. # Double slashes in file names in object file debugging info # mess up M-x gdb in Emacs. case $srcdir in */) srcdir=`expr "X$srcdir" : 'X\(.*[^/]\)' \| "X$srcdir" : 'X\(.*\)'`;; esac for ac_var in $ac_precious_vars; do eval ac_env_${ac_var}_set=\${${ac_var}+set} eval ac_env_${ac_var}_value=\$${ac_var} eval ac_cv_env_${ac_var}_set=\${${ac_var}+set} eval ac_cv_env_${ac_var}_value=\$${ac_var} done # # Report the --help message. # if test "$ac_init_help" = "long"; then # Omit some internal or obsolete options to make the list less imposing. # This message is too long to be a string in the A/UX 3.1 sh. cat <<_ACEOF \`configure' configures DRBD 8.4.4 to adapt to many kinds of systems. Usage: $0 [OPTION]... [VAR=VALUE]... To assign environment variables (e.g., CC, CFLAGS...), specify them as VAR=VALUE. See below for descriptions of some of the useful variables. Defaults for the options are specified in brackets. Configuration: -h, --help display this help and exit --help=short display options specific to this package --help=recursive display the short help of all the included packages -V, --version display version information and exit -q, --quiet, --silent do not print \`checking ...' 
messages --cache-file=FILE cache test results in FILE [disabled] -C, --config-cache alias for \`--cache-file=config.cache' -n, --no-create do not create output files --srcdir=DIR find the sources in DIR [configure dir or \`..'] Installation directories: --prefix=PREFIX install architecture-independent files in PREFIX [$ac_default_prefix] --exec-prefix=EPREFIX install architecture-dependent files in EPREFIX [PREFIX] By default, \`make install' will install all the files in \`$ac_default_prefix/bin', \`$ac_default_prefix/lib' etc. You can specify an installation prefix other than \`$ac_default_prefix' using \`--prefix', for instance \`--prefix=\$HOME'. For better control, use the options below. Fine tuning of the installation directories: --bindir=DIR user executables [EPREFIX/bin] --sbindir=DIR system admin executables [EPREFIX/sbin] --libexecdir=DIR program executables [EPREFIX/libexec] --sysconfdir=DIR read-only single-machine data [PREFIX/etc] --sharedstatedir=DIR modifiable architecture-independent data [PREFIX/com] --localstatedir=DIR modifiable single-machine data [PREFIX/var] --libdir=DIR object code libraries [EPREFIX/lib] --includedir=DIR C header files [PREFIX/include] --oldincludedir=DIR C header files for non-gcc [/usr/include] --datarootdir=DIR read-only arch.-independent data root [PREFIX/share] --datadir=DIR read-only architecture-independent data [DATAROOTDIR] --infodir=DIR info documentation [DATAROOTDIR/info] --localedir=DIR locale-dependent data [DATAROOTDIR/locale] --mandir=DIR man documentation [DATAROOTDIR/man] --docdir=DIR documentation root [DATAROOTDIR/doc/drbd] --htmldir=DIR html documentation [DOCDIR] --dvidir=DIR dvi documentation [DOCDIR] --pdfdir=DIR pdf documentation [DOCDIR] --psdir=DIR ps documentation [DOCDIR] _ACEOF cat <<\_ACEOF _ACEOF fi if test -n "$ac_init_help"; then case $ac_init_help in short | recursive ) echo "Configuration of DRBD 8.4.4:";; esac cat <<\_ACEOF Optional Features: --disable-option-checking ignore 
unrecognized --enable/--with options --disable-FEATURE do not include FEATURE (same as --enable-FEATURE=no) --enable-FEATURE[=ARG] include FEATURE [ARG=yes] --enable-spec Rather than creating Makefiles, create an RPM spec file only Optional Packages: --with-PACKAGE[=ARG] use PACKAGE [ARG=yes] --without-PACKAGE do not use PACKAGE (same as --with-PACKAGE=no) --with-utils Enable management utilities --without-legacy_utils Do not include legacy <= 8.3 drbdsetup/drbdadm --with-km Enable kernel module --with-udev Enable udev integration --with-xen Enable Xen integration --with-pacemaker Enable Pacemaker integration --with-heartbeat Enable Heartbeat integration --with-rgmanager Enable Red Hat Cluster Suite integration --with-bashcompletion Enable programmable bash completion --with-distro Configure for a specific distribution (supported values: generic, redhat, suse, debian, gentoo, slackware; default is to autodetect) --with-initdir Override directory for init scripts (default is distribution-specific) --with-noarchsubpkg Build subpackages that support it for the "noarch" architecture (makes sense only with --enable-spec, supported by RPM from 4.6.0 forward) Some influential environment variables: CC C compiler command CFLAGS C compiler flags LDFLAGS linker flags, e.g. -L<lib dir> if you have libraries in a nonstandard directory <lib dir> LIBS libraries to pass to the linker, e.g. -l<library> CPPFLAGS (Objective) C/C++ preprocessor flags, e.g. -I<include dir> if you have headers in a nonstandard directory <include dir> Use these variables to override the choices made by `configure' or to help it to find libraries and programs with nonstandard names/locations. Report bugs to . _ACEOF ac_status=$? fi if test "$ac_init_help" = "recursive"; then # If there are subdirs, report their specific --help. for ac_dir in : $ac_subdirs_all; do test "x$ac_dir" = x: && continue test -d "$ac_dir" || { cd "$srcdir" && ac_pwd=`pwd` && srcdir=. && test -d "$ac_dir"; } || continue ac_builddir=. case "$ac_dir" in .)
ac_dir_suffix= ac_top_builddir_sub=. ac_top_build_prefix= ;; *) ac_dir_suffix=/`$as_echo "$ac_dir" | sed 's|^\.[\\/]||'` # A ".." for each directory in $ac_dir_suffix. ac_top_builddir_sub=`$as_echo "$ac_dir_suffix" | sed 's|/[^\\/]*|/..|g;s|/||'` case $ac_top_builddir_sub in "") ac_top_builddir_sub=. ac_top_build_prefix= ;; *) ac_top_build_prefix=$ac_top_builddir_sub/ ;; esac ;; esac ac_abs_top_builddir=$ac_pwd ac_abs_builddir=$ac_pwd$ac_dir_suffix # for backward compatibility: ac_top_builddir=$ac_top_build_prefix case $srcdir in .) # We are building in place. ac_srcdir=. ac_top_srcdir=$ac_top_builddir_sub ac_abs_top_srcdir=$ac_pwd ;; [\\/]* | ?:[\\/]* ) # Absolute name. ac_srcdir=$srcdir$ac_dir_suffix; ac_top_srcdir=$srcdir ac_abs_top_srcdir=$srcdir ;; *) # Relative name. ac_srcdir=$ac_top_build_prefix$srcdir$ac_dir_suffix ac_top_srcdir=$ac_top_build_prefix$srcdir ac_abs_top_srcdir=$ac_pwd/$srcdir ;; esac ac_abs_srcdir=$ac_abs_top_srcdir$ac_dir_suffix cd "$ac_dir" || { ac_status=$?; continue; } # Check for guested configure. if test -f "$ac_srcdir/configure.gnu"; then echo && $SHELL "$ac_srcdir/configure.gnu" --help=recursive elif test -f "$ac_srcdir/configure"; then echo && $SHELL "$ac_srcdir/configure" --help=recursive else $as_echo "$as_me: WARNING: no configuration information is in $ac_dir" >&2 fi || ac_status=$? cd "$ac_pwd" || { ac_status=$?; break; } done fi test -n "$ac_init_help" && exit $ac_status if $ac_init_version; then cat <<\_ACEOF DRBD configure 8.4.4 generated by GNU Autoconf 2.68 Copyright (C) 2010 Free Software Foundation, Inc. This configure script is free software; the Free Software Foundation gives unlimited permission to copy, distribute and modify it. _ACEOF exit fi ## ------------------------ ## ## Autoconf initialization. ## ## ------------------------ ## # ac_fn_c_try_compile LINENO # -------------------------- # Try to compile conftest.$ac_ext, and return whether this succeeded. 
ac_fn_c_try_compile () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack rm -f conftest.$ac_objext if { { ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_compile") 2>conftest.err ac_status=$? if test -s conftest.err; then grep -v '^ *+' conftest.err >conftest.er1 cat conftest.er1 >&5 mv -f conftest.er1 conftest.err fi $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } && { test -z "$ac_c_werror_flag" || test ! -s conftest.err } && test -s conftest.$ac_objext; then : ac_retval=0 else $as_echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_retval=1 fi eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno as_fn_set_status $ac_retval } # ac_fn_c_try_compile cat >config.log <<_ACEOF This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. It was created by DRBD $as_me 8.4.4, which was generated by GNU Autoconf 2.68. Invocation command line was $ $0 $@ _ACEOF exec 5>>config.log { cat <<_ASUNAME ## --------- ## ## Platform. 
## ## --------- ## hostname = `(hostname || uname -n) 2>/dev/null | sed 1q` uname -m = `(uname -m) 2>/dev/null || echo unknown` uname -r = `(uname -r) 2>/dev/null || echo unknown` uname -s = `(uname -s) 2>/dev/null || echo unknown` uname -v = `(uname -v) 2>/dev/null || echo unknown` /usr/bin/uname -p = `(/usr/bin/uname -p) 2>/dev/null || echo unknown` /bin/uname -X = `(/bin/uname -X) 2>/dev/null || echo unknown` /bin/arch = `(/bin/arch) 2>/dev/null || echo unknown` /usr/bin/arch -k = `(/usr/bin/arch -k) 2>/dev/null || echo unknown` /usr/convex/getsysinfo = `(/usr/convex/getsysinfo) 2>/dev/null || echo unknown` /usr/bin/hostinfo = `(/usr/bin/hostinfo) 2>/dev/null || echo unknown` /bin/machine = `(/bin/machine) 2>/dev/null || echo unknown` /usr/bin/oslevel = `(/usr/bin/oslevel) 2>/dev/null || echo unknown` /bin/universe = `(/bin/universe) 2>/dev/null || echo unknown` _ASUNAME as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. $as_echo "PATH: $as_dir" done IFS=$as_save_IFS } >&5 cat >&5 <<_ACEOF ## ----------- ## ## Core tests. ## ## ----------- ## _ACEOF # Keep a trace of the command line. # Strip out --no-create and --no-recursion so they do not pile up. # Strip out --silent because we don't want to record it for future runs. # Also quote any args containing shell meta-characters. # Make two passes to allow for proper duplicate-argument suppression. 
ac_configure_args= ac_configure_args0= ac_configure_args1= ac_must_keep_next=false for ac_pass in 1 2 do for ac_arg do case $ac_arg in -no-create | --no-c* | -n | -no-recursion | --no-r*) continue ;; -q | -quiet | --quiet | --quie | --qui | --qu | --q \ | -silent | --silent | --silen | --sile | --sil) continue ;; *\'*) ac_arg=`$as_echo "$ac_arg" | sed "s/'/'\\\\\\\\''/g"` ;; esac case $ac_pass in 1) as_fn_append ac_configure_args0 " '$ac_arg'" ;; 2) as_fn_append ac_configure_args1 " '$ac_arg'" if test $ac_must_keep_next = true; then ac_must_keep_next=false # Got value, back to normal. else case $ac_arg in *=* | --config-cache | -C | -disable-* | --disable-* \ | -enable-* | --enable-* | -gas | --g* | -nfp | --nf* \ | -q | -quiet | --q* | -silent | --sil* | -v | -verb* \ | -with-* | --with-* | -without-* | --without-* | --x) case "$ac_configure_args0 " in "$ac_configure_args1"*" '$ac_arg' "* ) continue ;; esac ;; -* ) ac_must_keep_next=true ;; esac fi as_fn_append ac_configure_args " '$ac_arg'" ;; esac done done { ac_configure_args0=; unset ac_configure_args0;} { ac_configure_args1=; unset ac_configure_args1;} # When interrupted or exit'd, cleanup temporary files, and complete # config.log. We remove comments because anyway the quotes in there # would cause problems or look ugly. # WARNING: Use '\'' to represent an apostrophe within the trap. # WARNING: Do not start the trap code with a newline, due to a FreeBSD 4.0 bug. trap 'exit_status=$? # Save into config.log some information that might help in debugging. { echo $as_echo "## ---------------- ## ## Cache variables. 
## ## ---------------- ##" echo # The following way of writing the cache mishandles newlines in values, ( for ac_var in `(set) 2>&1 | sed -n '\''s/^\([a-zA-Z_][a-zA-Z0-9_]*\)=.*/\1/p'\''`; do eval ac_val=\$$ac_var case $ac_val in #( *${as_nl}*) case $ac_var in #( *_cv_*) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; esac case $ac_var in #( _ | IFS | as_nl) ;; #( BASH_ARGV | BASH_SOURCE) eval $ac_var= ;; #( *) { eval $ac_var=; unset $ac_var;} ;; esac ;; esac done (set) 2>&1 | case $as_nl`(ac_space='\'' '\''; set) 2>&1` in #( *${as_nl}ac_space=\ *) sed -n \ "s/'\''/'\''\\\\'\'''\''/g; s/^\\([_$as_cr_alnum]*_cv_[_$as_cr_alnum]*\\)=\\(.*\\)/\\1='\''\\2'\''/p" ;; #( *) sed -n "/^[_$as_cr_alnum]*_cv_[_$as_cr_alnum]*=/p" ;; esac | sort ) echo $as_echo "## ----------------- ## ## Output variables. ## ## ----------------- ##" echo for ac_var in $ac_subst_vars do eval ac_val=\$$ac_var case $ac_val in *\'\''*) ac_val=`$as_echo "$ac_val" | sed "s/'\''/'\''\\\\\\\\'\'''\''/g"`;; esac $as_echo "$ac_var='\''$ac_val'\''" done | sort echo if test -n "$ac_subst_files"; then $as_echo "## ------------------- ## ## File substitutions. ## ## ------------------- ##" echo for ac_var in $ac_subst_files do eval ac_val=\$$ac_var case $ac_val in *\'\''*) ac_val=`$as_echo "$ac_val" | sed "s/'\''/'\''\\\\\\\\'\'''\''/g"`;; esac $as_echo "$ac_var='\''$ac_val'\''" done | sort echo fi if test -s confdefs.h; then $as_echo "## ----------- ## ## confdefs.h. 
## ## ----------- ##" echo cat confdefs.h echo fi test "$ac_signal" != 0 && $as_echo "$as_me: caught signal $ac_signal" $as_echo "$as_me: exit $exit_status" } >&5 rm -f core *.core core.conftest.* && rm -f -r conftest* confdefs* conf$$* $ac_clean_files && exit $exit_status ' 0 for ac_signal in 1 2 13 15; do trap 'ac_signal='$ac_signal'; as_fn_exit 1' $ac_signal done ac_signal=0 # confdefs.h avoids OS command line length limits that DEFS can exceed. rm -f -r conftest* confdefs.h $as_echo "/* confdefs.h */" > confdefs.h # Predefined preprocessor variables. cat >>confdefs.h <<_ACEOF #define PACKAGE_NAME "$PACKAGE_NAME" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_TARNAME "$PACKAGE_TARNAME" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_VERSION "$PACKAGE_VERSION" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_STRING "$PACKAGE_STRING" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_BUGREPORT "$PACKAGE_BUGREPORT" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_URL "$PACKAGE_URL" _ACEOF # Let the site file select an alternate cache file if it wants to. # Prefer an explicitly selected file to automatically selected ones. ac_site_file1=NONE ac_site_file2=NONE if test -n "$CONFIG_SITE"; then # We do not want a PATH search for config.site. case $CONFIG_SITE in #(( -*) ac_site_file1=./$CONFIG_SITE;; */*) ac_site_file1=$CONFIG_SITE;; *) ac_site_file1=./$CONFIG_SITE;; esac elif test "x$prefix" != xNONE; then ac_site_file1=$prefix/share/config.site ac_site_file2=$prefix/etc/config.site else ac_site_file1=$ac_default_prefix/share/config.site ac_site_file2=$ac_default_prefix/etc/config.site fi for ac_site_file in "$ac_site_file1" "$ac_site_file2" do test "x$ac_site_file" = xNONE && continue if test /dev/null != "$ac_site_file" && test -r "$ac_site_file"; then { $as_echo "$as_me:${as_lineno-$LINENO}: loading site script $ac_site_file" >&5 $as_echo "$as_me: loading site script $ac_site_file" >&6;} sed 's/^/| /' "$ac_site_file" >&5 . 
"$ac_site_file" \ || { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 $as_echo "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "failed to load site script $ac_site_file See \`config.log' for more details" "$LINENO" 5; } fi done if test -r "$cache_file"; then # Some versions of bash will fail to source /dev/null (special files # actually), so we avoid doing that. DJGPP emulates it as a regular file. if test /dev/null != "$cache_file" && test -f "$cache_file"; then { $as_echo "$as_me:${as_lineno-$LINENO}: loading cache $cache_file" >&5 $as_echo "$as_me: loading cache $cache_file" >&6;} case $cache_file in [\\/]* | ?:[\\/]* ) . "$cache_file";; *) . "./$cache_file";; esac fi else { $as_echo "$as_me:${as_lineno-$LINENO}: creating cache $cache_file" >&5 $as_echo "$as_me: creating cache $cache_file" >&6;} >$cache_file fi # Check that the precious variables saved in the cache have kept the same # value. ac_cache_corrupted=false for ac_var in $ac_precious_vars; do eval ac_old_set=\$ac_cv_env_${ac_var}_set eval ac_new_set=\$ac_env_${ac_var}_set eval ac_old_val=\$ac_cv_env_${ac_var}_value eval ac_new_val=\$ac_env_${ac_var}_value case $ac_old_set,$ac_new_set in set,) { $as_echo "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&5 $as_echo "$as_me: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&2;} ac_cache_corrupted=: ;; ,set) { $as_echo "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' was not set in the previous run" >&5 $as_echo "$as_me: error: \`$ac_var' was not set in the previous run" >&2;} ac_cache_corrupted=: ;; ,);; *) if test "x$ac_old_val" != "x$ac_new_val"; then # differences in whitespace do not lead to failure. 
ac_old_val_w=`echo x $ac_old_val` ac_new_val_w=`echo x $ac_new_val` if test "$ac_old_val_w" != "$ac_new_val_w"; then { $as_echo "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' has changed since the previous run:" >&5 $as_echo "$as_me: error: \`$ac_var' has changed since the previous run:" >&2;} ac_cache_corrupted=: else { $as_echo "$as_me:${as_lineno-$LINENO}: warning: ignoring whitespace changes in \`$ac_var' since the previous run:" >&5 $as_echo "$as_me: warning: ignoring whitespace changes in \`$ac_var' since the previous run:" >&2;} eval $ac_var=\$ac_old_val fi { $as_echo "$as_me:${as_lineno-$LINENO}: former value: \`$ac_old_val'" >&5 $as_echo "$as_me: former value: \`$ac_old_val'" >&2;} { $as_echo "$as_me:${as_lineno-$LINENO}: current value: \`$ac_new_val'" >&5 $as_echo "$as_me: current value: \`$ac_new_val'" >&2;} fi;; esac # Pass precious variables to config.status. if test "$ac_new_set" = set; then case $ac_new_val in *\'*) ac_arg=$ac_var=`$as_echo "$ac_new_val" | sed "s/'/'\\\\\\\\''/g"` ;; *) ac_arg=$ac_var=$ac_new_val ;; esac case " $ac_configure_args " in *" '$ac_arg' "*) ;; # Avoid dups. Use of quotes ensures accuracy. *) as_fn_append ac_configure_args " '$ac_arg'" ;; esac fi done if $ac_cache_corrupted; then { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 $as_echo "$as_me: error: in \`$ac_pwd':" >&2;} { $as_echo "$as_me:${as_lineno-$LINENO}: error: changes in the environment can compromise the build" >&5 $as_echo "$as_me: error: changes in the environment can compromise the build" >&2;} as_fn_error $? "run \`make distclean' and/or \`rm $cache_file' and start over" "$LINENO" 5 fi ## -------------------- ## ## Main body of script. 
## ## -------------------- ## ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu if test "$prefix" = "NONE"; then prefix=$ac_default_prefix fi exec_prefix=$prefix prefix="`eval echo ${prefix}`" exec_prefix="`eval echo ${exec_prefix}`" bindir="`eval echo ${bindir}`" sbindir="`eval echo ${sbindir}`" libexecdir="`eval echo ${libexecdir}`" datarootdir="`eval echo ${datarootdir}`" datadir="`eval echo ${datadir}`" sysconfdir="`eval echo ${sysconfdir}`" sharedstatedir="`eval echo ${sharedstatedir}`" localstatedir="`eval echo ${localstatedir}`" libdir="`eval echo ${libdir}`" includedir="`eval echo ${includedir}`" oldincludedir="`eval echo ${oldincludedir}`" infodir="`eval echo ${infodir}`" mandir="`eval echo ${mandir}`" docdir="`eval echo ${docdir}`" WITH_UTILS=yes WITH_LEGACY_UTILS=yes WITH_KM=no WITH_UDEV=yes WITH_XEN=yes WITH_PACEMAKER=yes WITH_HEARTBEAT=yes WITH_RGMANAGER=no WITH_BASHCOMPLETION=yes WITH_NOARCH_SUBPACKAGES=no # Check whether --with-utils was given. if test "${with_utils+set}" = set; then : withval=$with_utils; WITH_UTILS=$withval fi # Check whether --with-legacy_utils was given. if test "${with_legacy_utils+set}" = set; then : withval=$with_legacy_utils; WITH_LEGACY_UTILS=$withval fi # Check whether --with-km was given. if test "${with_km+set}" = set; then : withval=$with_km; WITH_KM=$withval fi # Check whether --with-udev was given. if test "${with_udev+set}" = set; then : withval=$with_udev; WITH_UDEV=$withval fi # Check whether --with-xen was given. if test "${with_xen+set}" = set; then : withval=$with_xen; WITH_XEN=$withval fi # Check whether --with-pacemaker was given. if test "${with_pacemaker+set}" = set; then : withval=$with_pacemaker; WITH_PACEMAKER=$withval fi # Check whether --with-heartbeat was given. 
if test "${with_heartbeat+set}" = set; then : withval=$with_heartbeat; WITH_HEARTBEAT=$withval fi # Check whether --with-rgmanager was given. if test "${with_rgmanager+set}" = set; then : withval=$with_rgmanager; WITH_RGMANAGER=$withval fi # Check whether --with-bashcompletion was given. if test "${with_bashcompletion+set}" = set; then : withval=$with_bashcompletion; WITH_BASHCOMPLETION=$withval fi # Check whether --with-distro was given. if test "${with_distro+set}" = set; then : withval=$with_distro; DISTRO=$withval fi # Check whether --with-initdir was given. if test "${with_initdir+set}" = set; then : withval=$with_initdir; INITDIR=$withval fi # Check whether --with-noarchsubpkg was given. if test "${with_noarchsubpkg+set}" = set; then : withval=$with_noarchsubpkg; WITH_NOARCH_SUBPACKAGES=$withval fi # Check whether --enable-spec was given. if test "${enable_spec+set}" = set; then : enableval=$enable_spec; SPECMODE=$enableval else SPECMODE="" fi ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}gcc", so it can be a program name with args. set dummy ${ac_tool_prefix}gcc; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_prog_CC+:} false; then : $as_echo_n "(cached) " >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_prog_CC="${ac_tool_prefix}gcc" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 $as_echo "$CC" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi fi if test -z "$ac_cv_prog_CC"; then ac_ct_CC=$CC # Extract the first word of "gcc", so it can be a program name with args. set dummy gcc; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_prog_ac_ct_CC+:} false; then : $as_echo_n "(cached) " >&6 else if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_prog_ac_ct_CC="gcc" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CC" >&5 $as_echo "$ac_ct_CC" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi if test "x$ac_ct_CC" = x; then CC="" else case $cross_compiling:$ac_tool_warned in yes:) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 $as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} ac_tool_warned=yes ;; esac CC=$ac_ct_CC fi else CC="$ac_cv_prog_CC" fi if test -z "$CC"; then if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}cc", so it can be a program name with args. set dummy ${ac_tool_prefix}cc; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_prog_CC+:} false; then : $as_echo_n "(cached) " >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_prog_CC="${ac_tool_prefix}cc" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 $as_echo "$CC" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi fi fi if test -z "$CC"; then # Extract the first word of "cc", so it can be a program name with args. set dummy cc; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_prog_CC+:} false; then : $as_echo_n "(cached) " >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else ac_prog_rejected=no as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then if test "$as_dir/$ac_word$ac_exec_ext" = "/usr/ucb/cc"; then ac_prog_rejected=yes continue fi ac_cv_prog_CC="cc" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS if test $ac_prog_rejected = yes; then # We found a bogon in the path, so make sure we never use it. set dummy $ac_cv_prog_CC shift if test $# != 0; then # We chose a different compiler from the bogus one. # However, it has the same basename, so the bogon will be chosen # first if we set CC to just the basename; use the full file name. 
shift ac_cv_prog_CC="$as_dir/$ac_word${1+' '}$@" fi fi fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 $as_echo "$CC" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi fi if test -z "$CC"; then if test -n "$ac_tool_prefix"; then for ac_prog in cl.exe do # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. set dummy $ac_tool_prefix$ac_prog; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_prog_CC+:} false; then : $as_echo_n "(cached) " >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_prog_CC="$ac_tool_prefix$ac_prog" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 $as_echo "$CC" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi test -n "$CC" && break done fi if test -z "$CC"; then ac_ct_CC=$CC for ac_prog in cl.exe do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_prog_ac_ct_CC+:} false; then : $as_echo_n "(cached) " >&6 else if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_prog_ac_ct_CC="$ac_prog" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CC" >&5 $as_echo "$ac_ct_CC" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi test -n "$ac_ct_CC" && break done if test "x$ac_ct_CC" = x; then CC="" else case $cross_compiling:$ac_tool_warned in yes:) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 $as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} ac_tool_warned=yes ;; esac CC=$ac_ct_CC fi fi fi test -z "$CC" && { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 $as_echo "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "no acceptable C compiler found in \$PATH See \`config.log' for more details" "$LINENO" 5; } # Provide some information about the compiler. $as_echo "$as_me:${as_lineno-$LINENO}: checking for C compiler version" >&5 set X $ac_compile ac_compiler=$2 for ac_option in --version -v -V -qversion; do { { ac_try="$ac_compiler $ac_option >&5" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_compiler $ac_option >&5") 2>conftest.err ac_status=$? if test -s conftest.err; then sed '10a\ ... rest of stderr output deleted ... 10q' conftest.err >conftest.er1 cat conftest.er1 >&5 fi rm -f conftest.er1 conftest.err $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } done cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ int main () { ; return 0; } _ACEOF ac_clean_files_save=$ac_clean_files ac_clean_files="$ac_clean_files a.out a.out.dSYM a.exe b.out" # Try to create an executable without -o first, disregard a.out. # It will help us diagnose broken compilers, and finding out an intuition # of exeext. { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the C compiler works" >&5 $as_echo_n "checking whether the C compiler works... " >&6; } ac_link_default=`$as_echo "$ac_link" | sed 's/ -o *conftest[^ ]*//'` # The possible output files: ac_files="a.out conftest.exe conftest a.exe a_out.exe b.out conftest.*" ac_rmfiles= for ac_file in $ac_files do case $ac_file in *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf | *.dSYM | *.o | *.obj ) ;; * ) ac_rmfiles="$ac_rmfiles $ac_file";; esac done rm -f $ac_rmfiles if { { ac_try="$ac_link_default" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_link_default") 2>&5 ac_status=$? $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; }; then : # Autoconf-2.13 could set the ac_cv_exeext variable to `no'. # So ignore a value of `no', otherwise this would lead to `EXEEXT = no' # in a Makefile. We should not override ac_cv_exeext if it was cached, # so that the user can short-circuit this test for compilers unknown to # Autoconf. for ac_file in $ac_files '' do test -f "$ac_file" || continue case $ac_file in *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf | *.dSYM | *.o | *.obj ) ;; [ab].out ) # We found the default executable, but exeext='' is most # certainly right. 
break;; *.* ) if test "${ac_cv_exeext+set}" = set && test "$ac_cv_exeext" != no; then :; else ac_cv_exeext=`expr "$ac_file" : '[^.]*\(\..*\)'` fi # We set ac_cv_exeext here because the later test for it is not # safe: cross compilers may not add the suffix if given an `-o' # argument, so we may need to know it at that point already. # Even if this section looks crufty: it has the advantage of # actually working. break;; * ) break;; esac done test "$ac_cv_exeext" = no && ac_cv_exeext= else ac_file='' fi if test -z "$ac_file"; then : { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } $as_echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 $as_echo "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error 77 "C compiler cannot create executables See \`config.log' for more details" "$LINENO" 5; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 $as_echo "yes" >&6; } fi { $as_echo "$as_me:${as_lineno-$LINENO}: checking for C compiler default output file name" >&5 $as_echo_n "checking for C compiler default output file name... " >&6; } { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_file" >&5 $as_echo "$ac_file" >&6; } ac_exeext=$ac_cv_exeext rm -f -r a.out a.out.dSYM a.exe conftest$ac_cv_exeext b.out ac_clean_files=$ac_clean_files_save { $as_echo "$as_me:${as_lineno-$LINENO}: checking for suffix of executables" >&5 $as_echo_n "checking for suffix of executables... " >&6; } if { { ac_try="$ac_link" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_link") 2>&5 ac_status=$? $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; }; then : # If both `conftest.exe' and `conftest' are `present' (well, observable) # catch `conftest.exe'. 
# For instance with Cygwin, `ls conftest' will
# work properly (i.e., refer to `conftest.exe'), while it won't with
# `rm'.
for ac_file in conftest.exe conftest conftest.*; do
  test -f "$ac_file" || continue
  case $ac_file in
    *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf | *.dSYM | *.o | *.obj ) ;;
    *.* ) ac_cv_exeext=`expr "$ac_file" : '[^.]*\(\..*\)'`
          break;;
    * ) break;;
  esac
done
else
  { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
as_fn_error $? "cannot compute suffix of executables: cannot compile and link
See \`config.log' for more details" "$LINENO" 5; }
fi
rm -f conftest conftest$ac_cv_exeext
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_exeext" >&5
$as_echo "$ac_cv_exeext" >&6; }

rm -f conftest.$ac_ext
EXEEXT=$ac_cv_exeext
ac_exeext=$EXEEXT
cat confdefs.h - <<_ACEOF >conftest.$ac_ext
/* end confdefs.h. */
#include <stdio.h>
int
main ()
{
FILE *f = fopen ("conftest.out", "w");
 return ferror (f) || fclose (f) != 0;

  ;
  return 0;
}
_ACEOF
ac_clean_files="$ac_clean_files conftest.out"
# Check that the compiler produces executables we can run. If not, either
# the compiler is broken, or we cross compile.
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether we are cross compiling" >&5
$as_echo_n "checking whether we are cross compiling... " >&6; }
if test "$cross_compiling" != yes; then
  { { ac_try="$ac_link"
case "(($ac_try" in
  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
  *) ac_try_echo=$ac_try;;
esac
eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
$as_echo "$ac_try_echo"; } >&5
  (eval "$ac_link") 2>&5
  ac_status=$?
  $as_echo "$as_me:${as_lineno-$LINENO}: \$?
= $ac_status" >&5 test $ac_status = 0; } if { ac_try='./conftest$ac_cv_exeext' { { case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_try") 2>&5 ac_status=$? $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; }; }; then cross_compiling=no else if test "$cross_compiling" = maybe; then cross_compiling=yes else { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 $as_echo "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "cannot run C compiled programs. If you meant to cross compile, use \`--host'. See \`config.log' for more details" "$LINENO" 5; } fi fi fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $cross_compiling" >&5 $as_echo "$cross_compiling" >&6; } rm -f conftest.$ac_ext conftest$ac_cv_exeext conftest.out ac_clean_files=$ac_clean_files_save { $as_echo "$as_me:${as_lineno-$LINENO}: checking for suffix of object files" >&5 $as_echo_n "checking for suffix of object files... " >&6; } if ${ac_cv_objext+:} false; then : $as_echo_n "(cached) " >&6 else cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.o conftest.obj if { { ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_compile") 2>&5 ac_status=$? $as_echo "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 test $ac_status = 0; }; then : for ac_file in conftest.o conftest.obj conftest.*; do test -f "$ac_file" || continue; case $ac_file in *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf | *.dSYM ) ;; *) ac_cv_objext=`expr "$ac_file" : '.*\.\(.*\)'` break;; esac done else $as_echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 $as_echo "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "cannot compute suffix of object files: cannot compile See \`config.log' for more details" "$LINENO" 5; } fi rm -f conftest.$ac_cv_objext conftest.$ac_ext fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_objext" >&5 $as_echo "$ac_cv_objext" >&6; } OBJEXT=$ac_cv_objext ac_objext=$OBJEXT { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether we are using the GNU C compiler" >&5 $as_echo_n "checking whether we are using the GNU C compiler... " >&6; } if ${ac_cv_c_compiler_gnu+:} false; then : $as_echo_n "(cached) " >&6 else cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int main () { #ifndef __GNUC__ choke me #endif ; return 0; } _ACEOF if ac_fn_c_try_compile "$LINENO"; then : ac_compiler_gnu=yes else ac_compiler_gnu=no fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext ac_cv_c_compiler_gnu=$ac_compiler_gnu fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_compiler_gnu" >&5 $as_echo "$ac_cv_c_compiler_gnu" >&6; } if test $ac_compiler_gnu = yes; then GCC=yes else GCC= fi ac_test_CFLAGS=${CFLAGS+set} ac_save_CFLAGS=$CFLAGS { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CC accepts -g" >&5 $as_echo_n "checking whether $CC accepts -g... " >&6; } if ${ac_cv_prog_cc_g+:} false; then : $as_echo_n "(cached) " >&6 else ac_save_c_werror_flag=$ac_c_werror_flag ac_c_werror_flag=yes ac_cv_prog_cc_g=no CFLAGS="-g" cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/

int
main ()
{

  ;
  return 0;
}
_ACEOF
if ac_fn_c_try_compile "$LINENO"; then :
  ac_cv_prog_cc_g=yes
else
  CFLAGS=""
      cat confdefs.h - <<_ACEOF >conftest.$ac_ext
/* end confdefs.h. */

int
main ()
{

  ;
  return 0;
}
_ACEOF
if ac_fn_c_try_compile "$LINENO"; then :

else
  ac_c_werror_flag=$ac_save_c_werror_flag
         CFLAGS="-g"
         cat confdefs.h - <<_ACEOF >conftest.$ac_ext
/* end confdefs.h. */

int
main ()
{

  ;
  return 0;
}
_ACEOF
if ac_fn_c_try_compile "$LINENO"; then :
  ac_cv_prog_cc_g=yes
fi
rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
fi
rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
fi
rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
   ac_c_werror_flag=$ac_save_c_werror_flag
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_g" >&5
$as_echo "$ac_cv_prog_cc_g" >&6; }
if test "$ac_test_CFLAGS" = set; then
  CFLAGS=$ac_save_CFLAGS
elif test $ac_cv_prog_cc_g = yes; then
  if test "$GCC" = yes; then
    CFLAGS="-g -O2"
  else
    CFLAGS="-g"
  fi
else
  if test "$GCC" = yes; then
    CFLAGS="-O2"
  else
    CFLAGS=
  fi
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $CC option to accept ISO C89" >&5
$as_echo_n "checking for $CC option to accept ISO C89... " >&6; }
if ${ac_cv_prog_cc_c89+:} false; then :
  $as_echo_n "(cached) " >&6
else
  ac_cv_prog_cc_c89=no
ac_save_CC=$CC
cat confdefs.h - <<_ACEOF >conftest.$ac_ext
/* end confdefs.h. */
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
/* Most of the following tests are stolen from RCS 5.7's src/conf.sh. */
struct buf { int x; };
FILE * (*rcsopen) (struct buf *, struct stat *, int);
static char *e (p, i)
     char **p;
     int i;
{
  return p[i];
}
static char *f (char * (*g) (char **, int), char **p, ...)
{
  char *s;
  va_list v;
  va_start (v,p);
  s = g (p, va_arg (v,int));
  va_end (v);
  return s;
}

/* OSF 4.0 Compaq cc is some sort of almost-ANSI by default. It has
   function prototypes and stuff, but not '\xHH' hex character constants.
   These don't provoke an error unfortunately, instead are silently treated
   as 'x'.
   The following induces an error, until -std is added to get
   proper ANSI mode. Curiously '\x00'!='x' always comes out true, for an
   array size at least. It's necessary to write '\x00'==0 to get something
   that's true only with -std. */
int osf4_cc_array ['\x00' == 0 ? 1 : -1];

/* IBM C 6 for AIX is almost-ANSI by default, but it replaces macro parameters
   inside strings and character constants. */
#define FOO(x) 'x'
int xlc6_cc_array[FOO(a) == 'x' ? 1 : -1];

int test (int i, double x);
struct s1 {int (*f) (int a);};
struct s2 {int (*f) (double a);};
int pairnames (int, char **, FILE *(*)(struct buf *, struct stat *, int), int, int);
int argc;
char **argv;
int
main ()
{
return f (e, argv, 0) != argv[0]  ||  f (e, argv, 1) != argv[1];
  ;
  return 0;
}
_ACEOF
for ac_arg in '' -qlanglvl=extc89 -qlanglvl=ansi -std \
        -Ae "-Aa -D_HPUX_SOURCE" "-Xc -D__EXTENSIONS__"
do
  CC="$ac_save_CC $ac_arg"
  if ac_fn_c_try_compile "$LINENO"; then :
  ac_cv_prog_cc_c89=$ac_arg
fi
rm -f core conftest.err conftest.$ac_objext
  test "x$ac_cv_prog_cc_c89" != "xno" && break
done
rm -f conftest.$ac_ext
CC=$ac_save_CC

fi
# AC_CACHE_VAL
case "x$ac_cv_prog_cc_c89" in
  x)
    { $as_echo "$as_me:${as_lineno-$LINENO}: result: none needed" >&5
$as_echo "none needed" >&6; } ;;
  xno)
    { $as_echo "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5
$as_echo "unsupported" >&6; } ;;
  *)
    CC="$CC $ac_cv_prog_cc_c89"
    { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c89" >&5
$as_echo "$ac_cv_prog_cc_c89" >&6; } ;;
esac
if test "x$ac_cv_prog_cc_c89" != xno; then :

fi

ac_ext=c
ac_cpp='$CPP $CPPFLAGS'
ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
ac_compiler_gnu=$ac_cv_c_compiler_gnu

{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ln -s works" >&5
$as_echo_n "checking whether ln -s works...
" >&6; } LN_S=$as_ln_s if test "$LN_S" = "ln -s"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 $as_echo "yes" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no, using $LN_S" >&5 $as_echo "no, using $LN_S" >&6; } fi # Extract the first word of "sed", so it can be a program name with args. set dummy sed; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_path_SED+:} false; then : $as_echo_n "(cached) " >&6 else case $SED in [\\/]* | ?:[\\/]*) ac_cv_path_SED="$SED" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_path_SED="$as_dir/$ac_word$ac_exec_ext" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS ;; esac fi SED=$ac_cv_path_SED if test -n "$SED"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $SED" >&5 $as_echo "$SED" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi # Extract the first word of "grep", so it can be a program name with args. set dummy grep; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_path_GREP+:} false; then : $as_echo_n "(cached) " >&6 else case $GREP in [\\/]* | ?:[\\/]*) ac_cv_path_GREP="$GREP" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_path_GREP="$as_dir/$ac_word$ac_exec_ext" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS ;; esac fi GREP=$ac_cv_path_GREP if test -n "$GREP"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $GREP" >&5 $as_echo "$GREP" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi # Extract the first word of "flex", so it can be a program name with args. set dummy flex; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_path_FLEX+:} false; then : $as_echo_n "(cached) " >&6 else case $FLEX in [\\/]* | ?:[\\/]*) ac_cv_path_FLEX="$FLEX" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_path_FLEX="$as_dir/$ac_word$ac_exec_ext" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS ;; esac fi FLEX=$ac_cv_path_FLEX if test -n "$FLEX"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $FLEX" >&5 $as_echo "$FLEX" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi # Extract the first word of "rpmbuild", so it can be a program name with args. set dummy rpmbuild; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_path_RPMBUILD+:} false; then : $as_echo_n "(cached) " >&6 else case $RPMBUILD in [\\/]* | ?:[\\/]*) ac_cv_path_RPMBUILD="$RPMBUILD" # Let the user override the test with a path. 
;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_path_RPMBUILD="$as_dir/$ac_word$ac_exec_ext" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS ;; esac fi RPMBUILD=$ac_cv_path_RPMBUILD if test -n "$RPMBUILD"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $RPMBUILD" >&5 $as_echo "$RPMBUILD" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi # Extract the first word of "xsltproc", so it can be a program name with args. set dummy xsltproc; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_path_XSLTPROC+:} false; then : $as_echo_n "(cached) " >&6 else case $XSLTPROC in [\\/]* | ?:[\\/]*) ac_cv_path_XSLTPROC="$XSLTPROC" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_path_XSLTPROC="$as_dir/$ac_word$ac_exec_ext" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS ;; esac fi XSLTPROC=$ac_cv_path_XSLTPROC if test -n "$XSLTPROC"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $XSLTPROC" >&5 $as_echo "$XSLTPROC" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi # Extract the first word of "tar", so it can be a program name with args. set dummy tar; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... 
" >&6; } if ${ac_cv_path_TAR+:} false; then : $as_echo_n "(cached) " >&6 else case $TAR in [\\/]* | ?:[\\/]*) ac_cv_path_TAR="$TAR" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_path_TAR="$as_dir/$ac_word$ac_exec_ext" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS ;; esac fi TAR=$ac_cv_path_TAR if test -n "$TAR"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $TAR" >&5 $as_echo "$TAR" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi # Extract the first word of "git", so it can be a program name with args. set dummy git; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_path_GIT+:} false; then : $as_echo_n "(cached) " >&6 else case $GIT in [\\/]* | ?:[\\/]*) ac_cv_path_GIT="$GIT" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_path_GIT="$as_dir/$ac_word$ac_exec_ext" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS ;; esac fi GIT=$ac_cv_path_GIT if test -n "$GIT"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $GIT" >&5 $as_echo "$GIT" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi # Extract the first word of "dpkg-buildpackage", so it can be a program name with args. 
set dummy dpkg-buildpackage; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_path_DPKG_BUILDPACKAGE+:} false; then : $as_echo_n "(cached) " >&6 else case $DPKG_BUILDPACKAGE in [\\/]* | ?:[\\/]*) ac_cv_path_DPKG_BUILDPACKAGE="$DPKG_BUILDPACKAGE" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_path_DPKG_BUILDPACKAGE="$as_dir/$ac_word$ac_exec_ext" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS ;; esac fi DPKG_BUILDPACKAGE=$ac_cv_path_DPKG_BUILDPACKAGE if test -n "$DPKG_BUILDPACKAGE"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $DPKG_BUILDPACKAGE" >&5 $as_echo "$DPKG_BUILDPACKAGE" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi # Extract the first word of "udevadm", so it can be a program name with args. set dummy udevadm; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_path_UDEVADM+:} false; then : $as_echo_n "(cached) " >&6 else case $UDEVADM in [\\/]* | ?:[\\/]*) ac_cv_path_UDEVADM="$UDEVADM" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in /sbin$PATH_SEPARATOR$PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_path_UDEVADM="$as_dir/$ac_word$ac_exec_ext" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS test -z "$ac_cv_path_UDEVADM" && ac_cv_path_UDEVADM="false" ;; esac fi UDEVADM=$ac_cv_path_UDEVADM if test -n "$UDEVADM"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $UDEVADM" >&5 $as_echo "$UDEVADM" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi # Extract the first word of "udevinfo", so it can be a program name with args. set dummy udevinfo; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_path_UDEVINFO+:} false; then : $as_echo_n "(cached) " >&6 else case $UDEVINFO in [\\/]* | ?:[\\/]*) ac_cv_path_UDEVINFO="$UDEVINFO" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in /sbin$PATH_SEPARATOR$PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_path_UDEVINFO="$as_dir/$ac_word$ac_exec_ext" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS test -z "$ac_cv_path_UDEVINFO" && ac_cv_path_UDEVINFO="false" ;; esac fi UDEVINFO=$ac_cv_path_UDEVINFO if test -n "$UDEVINFO"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $UDEVINFO" >&5 $as_echo "$UDEVINFO" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi if test -z "$CC"; then if test "$WITH_UTILS" = "yes"; then as_fn_error $? "Cannot build utils without a C compiler, either install a compiler or pass the --without-utils option." 
"$LINENO" 5 fi if test "$WITH_KM" = "yes"; then as_fn_error $? "Cannot build kernel module without a C compiler, either install a compiler or pass the --without-km option." "$LINENO" 5 fi fi if test -z $FLEX; then if test "$WITH_UTILS" = "yes"; then as_fn_error $? "Cannot build utils without flex, either install flex or pass the --without-utils option." "$LINENO" 5 fi fi if test -z $RPMBUILD; then { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: No rpmbuild found, building RPM packages is disabled." >&5 $as_echo "$as_me: WARNING: No rpmbuild found, building RPM packages is disabled." >&2;} fi if test -z $DPKG_BUILDPACKAGE; then { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: No dpkg-buildpackage found, building Debian packages is disabled." >&5 $as_echo "$as_me: WARNING: No dpkg-buildpackage found, building Debian packages is disabled." >&2;} fi if test -z $XSLTPROC; then { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: Cannot build man pages without xsltproc. You may safely ignore this warning when building from a tarball." >&5 $as_echo "$as_me: WARNING: Cannot build man pages without xsltproc. You may safely ignore this warning when building from a tarball." >&2;} XSLTPROC=xsltproc fi if test -z $GIT; then { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: Cannot update buildtag without git. You may safely ignore this warning when building from a tarball." >&5 $as_echo "$as_me: WARNING: Cannot update buildtag without git. You may safely ignore this warning when building from a tarball." >&2;} fi if test $UDEVADM = false && test $UDEVINFO = false; then if test "$WITH_UDEV" = "yes"; then { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: udev support enabled, but neither udevadm nor udevinfo found on this system." >&5 $as_echo "$as_me: WARNING: udev support enabled, but neither udevadm nor udevinfo found on this system." 
>&2;} fi fi BASH_COMPLETION_SUFFIX="" UDEV_RULE_SUFFIX="" RPM_DIST_TAG="" RPM_CONFLICTS_KM="" RPM_BUILDREQ_DEFAULT="gcc flex glibc-devel make" RPM_BUILDREQ_KM="" RPM_SUBPACKAGE_NOARCH="" RPM_REQ_PACEMAKER="" RPM_REQ_HEARTBEAT="" RPM_REQ_BASH_COMPLETION="" RPM_REQ_XEN="" RPM_REQ_CHKCONFIG_POST="" RPM_REQ_CHKCONFIG_PREUN="" if test -z $DISTRO; then { $as_echo "$as_me:${as_lineno-$LINENO}: checking for /etc/gentoo-release" >&5 $as_echo_n "checking for /etc/gentoo-release... " >&6; } if ${ac_cv_file__etc_gentoo_release+:} false; then : $as_echo_n "(cached) " >&6 else test "$cross_compiling" = yes && as_fn_error $? "cannot check for file existence when cross compiling" "$LINENO" 5 if test -r "/etc/gentoo-release"; then ac_cv_file__etc_gentoo_release=yes else ac_cv_file__etc_gentoo_release=no fi fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_file__etc_gentoo_release" >&5 $as_echo "$ac_cv_file__etc_gentoo_release" >&6; } if test "x$ac_cv_file__etc_gentoo_release" = xyes; then : DISTRO="gentoo" fi { $as_echo "$as_me:${as_lineno-$LINENO}: checking for /etc/redhat-release" >&5 $as_echo_n "checking for /etc/redhat-release... " >&6; } if ${ac_cv_file__etc_redhat_release+:} false; then : $as_echo_n "(cached) " >&6 else test "$cross_compiling" = yes && as_fn_error $? "cannot check for file existence when cross compiling" "$LINENO" 5 if test -r "/etc/redhat-release"; then ac_cv_file__etc_redhat_release=yes else ac_cv_file__etc_redhat_release=no fi fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_file__etc_redhat_release" >&5 $as_echo "$ac_cv_file__etc_redhat_release" >&6; } if test "x$ac_cv_file__etc_redhat_release" = xyes; then : DISTRO="redhat" fi { $as_echo "$as_me:${as_lineno-$LINENO}: checking for /etc/slackware-version" >&5 $as_echo_n "checking for /etc/slackware-version... " >&6; } if ${ac_cv_file__etc_slackware_version+:} false; then : $as_echo_n "(cached) " >&6 else test "$cross_compiling" = yes && as_fn_error $? 
"cannot check for file existence when cross compiling" "$LINENO" 5 if test -r "/etc/slackware-version"; then ac_cv_file__etc_slackware_version=yes else ac_cv_file__etc_slackware_version=no fi fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_file__etc_slackware_version" >&5 $as_echo "$ac_cv_file__etc_slackware_version" >&6; } if test "x$ac_cv_file__etc_slackware_version" = xyes; then : DISTRO="slackware" fi { $as_echo "$as_me:${as_lineno-$LINENO}: checking for /etc/debian_version" >&5 $as_echo_n "checking for /etc/debian_version... " >&6; } if ${ac_cv_file__etc_debian_version+:} false; then : $as_echo_n "(cached) " >&6 else test "$cross_compiling" = yes && as_fn_error $? "cannot check for file existence when cross compiling" "$LINENO" 5 if test -r "/etc/debian_version"; then ac_cv_file__etc_debian_version=yes else ac_cv_file__etc_debian_version=no fi fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_file__etc_debian_version" >&5 $as_echo "$ac_cv_file__etc_debian_version" >&6; } if test "x$ac_cv_file__etc_debian_version" = xyes; then : DISTRO="debian" fi { $as_echo "$as_me:${as_lineno-$LINENO}: checking for /etc/SuSE-release" >&5 $as_echo_n "checking for /etc/SuSE-release... " >&6; } if ${ac_cv_file__etc_SuSE_release+:} false; then : $as_echo_n "(cached) " >&6 else test "$cross_compiling" = yes && as_fn_error $? "cannot check for file existence when cross compiling" "$LINENO" 5 if test -r "/etc/SuSE-release"; then ac_cv_file__etc_SuSE_release=yes else ac_cv_file__etc_SuSE_release=no fi fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_file__etc_SuSE_release" >&5 $as_echo "$ac_cv_file__etc_SuSE_release" >&6; } if test "x$ac_cv_file__etc_SuSE_release" = xyes; then : DISTRO="suse" fi fi case "$DISTRO" in gentoo) { $as_echo "$as_me:${as_lineno-$LINENO}: configured for Gentoo." >&5 $as_echo "$as_me: configured for Gentoo." 
>&6;} ;; redhat) test -z $INITDIR && INITDIR="$sysconfdir/rc.d/init.d" RPM_DIST_TAG="%{?dist}" RPM_CONFLICTS_KM="drbd-kmod <= %{version}_3" RPM_BUILDREQ_DEFAULT="flex" RPM_BUILDREQ_KM="kernel-devel" RPM_REQ_CHKCONFIG_POST="Requires(post): chkconfig" RPM_REQ_CHKCONFIG_PREUN="Requires(preun): chkconfig" { $as_echo "$as_me:${as_lineno-$LINENO}: configured for Red Hat (includes Fedora, RHEL, CentOS)." >&5 $as_echo "$as_me: configured for Red Hat (includes Fedora, RHEL, CentOS)." >&6;} { $as_echo "$as_me:${as_lineno-$LINENO}: checking for /etc/fedora-release" >&5 $as_echo_n "checking for /etc/fedora-release... " >&6; } if ${ac_cv_file__etc_fedora_release+:} false; then : $as_echo_n "(cached) " >&6 else test "$cross_compiling" = yes && as_fn_error $? "cannot check for file existence when cross compiling" "$LINENO" 5 if test -r "/etc/fedora-release"; then ac_cv_file__etc_fedora_release=yes else ac_cv_file__etc_fedora_release=no fi fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_file__etc_fedora_release" >&5 $as_echo "$ac_cv_file__etc_fedora_release" >&6; } if test "x$ac_cv_file__etc_fedora_release" = xyes; then : SUB_DISTRO="fedora" else SUB_DISTRO="RHEL" fi if test "$SUB_DISTRO" = "fedora"; then # pacemaker, heartbeat and bash-completion are not available in RHEL # Xen: Be relaxed on RHEL (hassle free update). Be strict on Fedora RPM_REQ_PACEMAKER="Requires: pacemaker" RPM_REQ_HEARTBEAT="Requires: heartbeat" RPM_REQ_BASH_COMPLETION="Requires: bash-completion" RPM_REQ_XEN="Requires: xen" fi ;; slackware) test -z $INITDIR && INITDIR="$sysconfdir/rc.d" { $as_echo "$as_me:${as_lineno-$LINENO}: configured for Slackware." >&5 $as_echo "$as_me: configured for Slackware." >&6;} ;; debian) { $as_echo "$as_me:${as_lineno-$LINENO}: configured for Debian (includes Ubuntu)." >&5 $as_echo "$as_me: configured for Debian (includes Ubuntu)." 
>&6;} ;; suse) BASH_COMPLETION_SUFFIX=".sh" RPM_CONFLICTS_KM="km_drbd, drbd-kmp <= %{version}_3" RPM_BUILDREQ_KM="kernel-syms" # RPM_REQ_CHKCONFIG_POST="" chkconfig is part of aaa_base on suse # RPM_REQ_CHKCONFIG_PREUN="" chkconfig is part of aaa_base on suse { $as_echo "$as_me:${as_lineno-$LINENO}: configured for SUSE (includes openSUSE, SLES)." >&5 $as_echo "$as_me: configured for SUSE (includes openSUSE, SLES)." >&6;} RPM_REQ_BASH_COMPLETION="Requires: bash" # The following are disabled for hassle free updates: # RPM_REQ_XEN="Requires: xen" # RPM_REQ_PACEMAKER="Requires: pacemaker" # RPM_REQ_HEARTBEAT="Requires: heartbeat" # Unfortunately gcc on SLES9 is broken with -O2. Works with -O1 if grep -q 'VERSION = 9' /etc/SuSE-release; then CFLAGS="-g -O1" fi ;; "") { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: Unable to determine what distribution we are running on. Distribution-specific features will be disabled." >&5 $as_echo "$as_me: WARNING: Unable to determine what distribution we are running on. Distribution-specific features will be disabled." >&2;} ;; esac test -z $INITDIR && INITDIR="$sysconfdir/init.d" if test "$WITH_UDEV" = "yes"; then udev_version=`$UDEVADM version 2>/dev/null` || udev_version=`$UDEVINFO -V | cut -d " " -f 3` if test -z $udev_version || test $udev_version -lt 85; then UDEV_RULE_SUFFIX=".disabled" { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: Obsolete or unknown udev version. Installing disabled udev rules." >&5 $as_echo "$as_me: WARNING: Obsolete or unknown udev version. Installing disabled udev rules." 
>&2;} fi fi if test "$WITH_NOARCH_SUBPACKAGES" = "yes"; then RPM_SUBPACKAGE_NOARCH="BuildArch: noarch" fi cat >>confdefs.h <<_ACEOF #define DRBD_LIB_DIR "$localstatedir/lib/$PACKAGE_TARNAME" _ACEOF cat >>confdefs.h <<_ACEOF #define DRBD_RUN_DIR "$localstatedir/run/$PACKAGE_TARNAME" _ACEOF cat >>confdefs.h <<_ACEOF #define DRBD_LOCK_DIR "$localstatedir/lock" _ACEOF cat >>confdefs.h <<_ACEOF #define DRBD_CONFIG_DIR "$sysconfdir" _ACEOF if test "$WITH_LEGACY_UTILS" = "yes"; then $as_echo "#define DRBD_LEGACY_83 1" >>confdefs.h fi if test -z $SPECMODE; then ac_config_files="$ac_config_files Makefile user/Makefile user/legacy/Makefile scripts/Makefile documentation/Makefile" ac_config_headers="$ac_config_headers user/config.h user/legacy/config.h" else if test "$WITH_UTILS" = "yes"; then ac_config_files="$ac_config_files drbd.spec" fi if test "$WITH_KM" = "yes"; then ac_config_files="$ac_config_files drbd-km.spec drbd-kernel.spec" fi fi cat >confcache <<\_ACEOF # This file is a shell script that caches the results of configure # tests run on this system so they can be shared between configure # scripts and configure runs, see configure's option --config-cache. # It is not useful on other systems. If it contains results you don't # want to keep, you may remove or edit it. # # config.status only pays attention to the cache file if you give it # the --recheck option to rerun configure. # # `ac_cv_env_foo' variables (set or unset) will be overridden when # loading this file, other *unset* `ac_cv_foo' will be assigned the # following values. _ACEOF # The following way of writing the cache mishandles newlines in values, # but we know of no workaround that is simple, portable, and efficient. # So, we kill variables containing newlines. # Ultrix sh set writes to stderr and can't be redirected directly, # and sets the high bit in the cache file unless we assign to the vars. 
( for ac_var in `(set) 2>&1 | sed -n 's/^\([a-zA-Z_][a-zA-Z0-9_]*\)=.*/\1/p'`; do eval ac_val=\$$ac_var case $ac_val in #( *${as_nl}*) case $ac_var in #( *_cv_*) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; esac case $ac_var in #( _ | IFS | as_nl) ;; #( BASH_ARGV | BASH_SOURCE) eval $ac_var= ;; #( *) { eval $ac_var=; unset $ac_var;} ;; esac ;; esac done (set) 2>&1 | case $as_nl`(ac_space=' '; set) 2>&1` in #( *${as_nl}ac_space=\ *) # `set' does not quote correctly, so add quotes: double-quote # substitution turns \\\\ into \\, and sed turns \\ into \. sed -n \ "s/'/'\\\\''/g; s/^\\([_$as_cr_alnum]*_cv_[_$as_cr_alnum]*\\)=\\(.*\\)/\\1='\\2'/p" ;; #( *) # `set' quotes correctly as required by POSIX, so do not add quotes. sed -n "/^[_$as_cr_alnum]*_cv_[_$as_cr_alnum]*=/p" ;; esac | sort ) | sed ' /^ac_cv_env_/b end t clear :clear s/^\([^=]*\)=\(.*[{}].*\)$/test "${\1+set}" = set || &/ t end s/^\([^=]*\)=\(.*\)$/\1=${\1=\2}/ :end' >>confcache if diff "$cache_file" confcache >/dev/null 2>&1; then :; else if test -w "$cache_file"; then if test "x$cache_file" != "x/dev/null"; then { $as_echo "$as_me:${as_lineno-$LINENO}: updating cache $cache_file" >&5 $as_echo "$as_me: updating cache $cache_file" >&6;} if test ! -f "$cache_file" || test -h "$cache_file"; then cat confcache >"$cache_file" else case $cache_file in #( */* | ?:*) mv -f confcache "$cache_file"$$ && mv -f "$cache_file"$$ "$cache_file" ;; #( *) mv -f confcache "$cache_file" ;; esac fi fi else { $as_echo "$as_me:${as_lineno-$LINENO}: not updating unwritable cache $cache_file" >&5 $as_echo "$as_me: not updating unwritable cache $cache_file" >&6;} fi fi rm -f confcache test "x$prefix" = xNONE && prefix=$ac_default_prefix # Let make expand exec_prefix. 
test "x$exec_prefix" = xNONE && exec_prefix='${prefix}' DEFS=-DHAVE_CONFIG_H ac_libobjs= ac_ltlibobjs= U= for ac_i in : $LIBOBJS; do test "x$ac_i" = x: && continue # 1. Remove the extension, and $U if already installed. ac_script='s/\$U\././;s/\.o$//;s/\.obj$//' ac_i=`$as_echo "$ac_i" | sed "$ac_script"` # 2. Prepend LIBOBJDIR. When used with automake>=1.10 LIBOBJDIR # will be set to the directory where LIBOBJS objects are built. as_fn_append ac_libobjs " \${LIBOBJDIR}$ac_i\$U.$ac_objext" as_fn_append ac_ltlibobjs " \${LIBOBJDIR}$ac_i"'$U.lo' done LIBOBJS=$ac_libobjs LTLIBOBJS=$ac_ltlibobjs : "${CONFIG_STATUS=./config.status}" ac_write_fail=0 ac_clean_files_save=$ac_clean_files ac_clean_files="$ac_clean_files $CONFIG_STATUS" { $as_echo "$as_me:${as_lineno-$LINENO}: creating $CONFIG_STATUS" >&5 $as_echo "$as_me: creating $CONFIG_STATUS" >&6;} as_write_fail=0 cat >$CONFIG_STATUS <<_ASEOF || as_write_fail=1 #! $SHELL # Generated by $as_me. # Run this file to recreate the current configuration. # Compiler output produced by configure, useful for debugging # configure, is in config.log if it exists. debug=false ac_cs_recheck=false ac_cs_silent=false SHELL=\${CONFIG_SHELL-$SHELL} export SHELL _ASEOF cat >>$CONFIG_STATUS <<\_ASEOF || as_write_fail=1 ## -------------------- ## ## M4sh Initialization. ## ## -------------------- ## # Be more Bourne compatible DUALCASE=1; export DUALCASE # for MKS sh if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then : emulate sh NULLCMD=: # Pre-4.2 versions of Zsh do word splitting on ${1+"$@"}, which # is contrary to our usage. Disable this feature. alias -g '${1+"$@"}'='"$@"' setopt NO_GLOB_SUBST else case `(set -o) 2>/dev/null` in #( *posix*) : set -o posix ;; #( *) : ;; esac fi as_nl=' ' export as_nl # Printing a long string crashes Solaris 7 /usr/bin/printf. 
as_echo='\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\' as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo$as_echo # Prefer a ksh shell builtin over an external printf program on Solaris, # but without wasting forks for bash or zsh. if test -z "$BASH_VERSION$ZSH_VERSION" \ && (test "X`print -r -- $as_echo`" = "X$as_echo") 2>/dev/null; then as_echo='print -r --' as_echo_n='print -rn --' elif (test "X`printf %s $as_echo`" = "X$as_echo") 2>/dev/null; then as_echo='printf %s\n' as_echo_n='printf %s' else if test "X`(/usr/ucb/echo -n -n $as_echo) 2>/dev/null`" = "X-n $as_echo"; then as_echo_body='eval /usr/ucb/echo -n "$1$as_nl"' as_echo_n='/usr/ucb/echo -n' else as_echo_body='eval expr "X$1" : "X\\(.*\\)"' as_echo_n_body='eval arg=$1; case $arg in #( *"$as_nl"*) expr "X$arg" : "X\\(.*\\)$as_nl"; arg=`expr "X$arg" : ".*$as_nl\\(.*\\)"`;; esac; expr "X$arg" : "X\\(.*\\)" | tr -d "$as_nl" ' export as_echo_n_body as_echo_n='sh -c $as_echo_n_body as_echo' fi export as_echo_body as_echo='sh -c $as_echo_body as_echo' fi # The user is always right. if test "${PATH_SEPARATOR+set}" != set; then PATH_SEPARATOR=: (PATH='/bin;/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 && { (PATH='/bin:/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 || PATH_SEPARATOR=';' } fi # IFS # We need space, tab and new line, in precisely that order. Quoting is # there to prevent editors from complaining about space-tab. # (If _AS_PATH_WALK were called with IFS unset, it would disable word # splitting by setting IFS to empty value.) IFS=" "" $as_nl" # Find who we are. Look in the path if we contain no directory separator. as_myself= case $0 in #(( *[\\/]* ) as_myself=$0 ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
test -r "$as_dir/$0" && as_myself=$as_dir/$0 && break done IFS=$as_save_IFS ;; esac # We did not find ourselves, most probably we were run as `sh COMMAND' # in which case we are not to be found in the path. if test "x$as_myself" = x; then as_myself=$0 fi if test ! -f "$as_myself"; then $as_echo "$as_myself: error: cannot find myself; rerun with an absolute file name" >&2 exit 1 fi # Unset variables that we do not need and which cause bugs (e.g. in # pre-3.0 UWIN ksh). But do not cause bugs in bash 2.01; the "|| exit 1" # suppresses any "Segmentation fault" message there. '((' could # trigger a bug in pdksh 5.2.14. for as_var in BASH_ENV ENV MAIL MAILPATH do eval test x\${$as_var+set} = xset \ && ( (unset $as_var) || exit 1) >/dev/null 2>&1 && unset $as_var || : done PS1='$ ' PS2='> ' PS4='+ ' # NLS nuisances. LC_ALL=C export LC_ALL LANGUAGE=C export LANGUAGE # CDPATH. (unset CDPATH) >/dev/null 2>&1 && unset CDPATH # as_fn_error STATUS ERROR [LINENO LOG_FD] # ---------------------------------------- # Output "`basename $0`: error: ERROR" to stderr. If LINENO and LOG_FD are # provided, also output the error to LOG_FD, referencing LINENO. Then exit the # script with STATUS, using 1 if that was 0. as_fn_error () { as_status=$1; test $as_status -eq 0 && as_status=1 if test "$4"; then as_lineno=${as_lineno-"$3"} as_lineno_stack=as_lineno_stack=$as_lineno_stack $as_echo "$as_me:${as_lineno-$LINENO}: error: $2" >&$4 fi $as_echo "$as_me: error: $2" >&2 as_fn_exit $as_status } # as_fn_error # as_fn_set_status STATUS # ----------------------- # Set $? to STATUS, without forking. as_fn_set_status () { return $1 } # as_fn_set_status # as_fn_exit STATUS # ----------------- # Exit the shell with STATUS, even in a "trap 0" or "set -e" context. as_fn_exit () { set +e as_fn_set_status $1 exit $1 } # as_fn_exit # as_fn_unset VAR # --------------- # Portably unset VAR. 
as_fn_unset () { { eval $1=; unset $1;} } as_unset=as_fn_unset # as_fn_append VAR VALUE # ---------------------- # Append the text in VALUE to the end of the definition contained in VAR. Take # advantage of any shell optimizations that allow amortized linear growth over # repeated appends, instead of the typical quadratic growth present in naive # implementations. if (eval "as_var=1; as_var+=2; test x\$as_var = x12") 2>/dev/null; then : eval 'as_fn_append () { eval $1+=\$2 }' else as_fn_append () { eval $1=\$$1\$2 } fi # as_fn_append # as_fn_arith ARG... # ------------------ # Perform arithmetic evaluation on the ARGs, and store the result in the # global $as_val. Take advantage of shells that can avoid forks. The arguments # must be portable across $(()) and expr. if (eval "test \$(( 1 + 1 )) = 2") 2>/dev/null; then : eval 'as_fn_arith () { as_val=$(( $* )) }' else as_fn_arith () { as_val=`expr "$@" || test $? -eq 1` } fi # as_fn_arith if expr a : '\(a\)' >/dev/null 2>&1 && test "X`expr 00001 : '.*\(...\)'`" = X001; then as_expr=expr else as_expr=false fi if (basename -- /) >/dev/null 2>&1 && test "X`basename -- / 2>&1`" = "X/"; then as_basename=basename else as_basename=false fi if (as_dir=`dirname -- /` && test "X$as_dir" = X/) >/dev/null 2>&1; then as_dirname=dirname else as_dirname=false fi as_me=`$as_basename -- "$0" || $as_expr X/"$0" : '.*/\([^/][^/]*\)/*$' \| \ X"$0" : 'X\(//\)$' \| \ X"$0" : 'X\(/\)' \| . 2>/dev/null || $as_echo X/"$0" | sed '/^.*\/\([^/][^/]*\)\/*$/{ s//\1/ q } /^X\/\(\/\/\)$/{ s//\1/ q } /^X\/\(\/\).*/{ s//\1/ q } s/.*/./; q'` # Avoid depending upon Character Ranges. as_cr_letters='abcdefghijklmnopqrstuvwxyz' as_cr_LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ' as_cr_Letters=$as_cr_letters$as_cr_LETTERS as_cr_digits='0123456789' as_cr_alnum=$as_cr_Letters$as_cr_digits ECHO_C= ECHO_N= ECHO_T= case `echo -n x` in #((((( -n*) case `echo 'xy\c'` in *c*) ECHO_T=' ';; # ECHO_T is single tab character. 
xy) ECHO_C='\c';; *) echo `echo ksh88 bug on AIX 6.1` > /dev/null ECHO_T=' ';; esac;; *) ECHO_N='-n';; esac rm -f conf$$ conf$$.exe conf$$.file if test -d conf$$.dir; then rm -f conf$$.dir/conf$$.file else rm -f conf$$.dir mkdir conf$$.dir 2>/dev/null fi if (echo >conf$$.file) 2>/dev/null; then if ln -s conf$$.file conf$$ 2>/dev/null; then as_ln_s='ln -s' # ... but there are two gotchas: # 1) On MSYS, both `ln -s file dir' and `ln file dir' fail. # 2) DJGPP < 2.04 has no symlinks; `ln -s' creates a wrapper executable. # In both cases, we have to default to `cp -p'. ln -s conf$$.file conf$$.dir 2>/dev/null && test ! -f conf$$.exe || as_ln_s='cp -p' elif ln conf$$.file conf$$ 2>/dev/null; then as_ln_s=ln else as_ln_s='cp -p' fi else as_ln_s='cp -p' fi rm -f conf$$ conf$$.exe conf$$.dir/conf$$.file conf$$.file rmdir conf$$.dir 2>/dev/null # as_fn_mkdir_p # ------------- # Create "$as_dir" as a directory, including parents if necessary. as_fn_mkdir_p () { case $as_dir in #( -*) as_dir=./$as_dir;; esac test -d "$as_dir" || eval $as_mkdir_p || { as_dirs= while :; do case $as_dir in #( *\'*) as_qdir=`$as_echo "$as_dir" | sed "s/'/'\\\\\\\\''/g"`;; #'( *) as_qdir=$as_dir;; esac as_dirs="'$as_qdir' $as_dirs" as_dir=`$as_dirname -- "$as_dir" || $as_expr X"$as_dir" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_dir" : 'X\(//\)[^/]' \| \ X"$as_dir" : 'X\(//\)$' \| \ X"$as_dir" : 'X\(/\)' \| . 2>/dev/null || $as_echo X"$as_dir" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q } /^X\(\/\/\)[^/].*/{ s//\1/ q } /^X\(\/\/\)$/{ s//\1/ q } /^X\(\/\).*/{ s//\1/ q } s/.*/./; q'` test -d "$as_dir" && break done test -z "$as_dirs" || eval "mkdir $as_dirs" } || test -d "$as_dir" || as_fn_error $? "cannot create directory $as_dir" } # as_fn_mkdir_p if mkdir -p . 
2>/dev/null; then as_mkdir_p='mkdir -p "$as_dir"' else test -d ./-p && rmdir ./-p as_mkdir_p=false fi if test -x / >/dev/null 2>&1; then as_test_x='test -x' else if ls -dL / >/dev/null 2>&1; then as_ls_L_option=L else as_ls_L_option= fi as_test_x=' eval sh -c '\'' if test -d "$1"; then test -d "$1/."; else case $1 in #( -*)set "./$1";; esac; case `ls -ld'$as_ls_L_option' "$1" 2>/dev/null` in #(( ???[sx]*):;;*)false;;esac;fi '\'' sh ' fi as_executable_p=$as_test_x # Sed expression to map a string onto a valid CPP name. as_tr_cpp="eval sed 'y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g'" # Sed expression to map a string onto a valid variable name. as_tr_sh="eval sed 'y%*+%pp%;s%[^_$as_cr_alnum]%_%g'" exec 6>&1 ## ----------------------------------- ## ## Main body of $CONFIG_STATUS script. ## ## ----------------------------------- ## _ASEOF test $as_write_fail = 0 && chmod +x $CONFIG_STATUS || ac_write_fail=1 cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 # Save the log message, to keep $0 and so on meaningful, and to # report actual input values of CONFIG_FILES etc. instead of their # values after options handling. ac_log=" This file was extended by DRBD $as_me 8.4.4, which was generated by GNU Autoconf 2.68. Invocation command line was CONFIG_FILES = $CONFIG_FILES CONFIG_HEADERS = $CONFIG_HEADERS CONFIG_LINKS = $CONFIG_LINKS CONFIG_COMMANDS = $CONFIG_COMMANDS $ $0 $@ on `(hostname || uname -n) 2>/dev/null | sed 1q` " _ACEOF case $ac_config_files in *" "*) set x $ac_config_files; shift; ac_config_files=$*;; esac case $ac_config_headers in *" "*) set x $ac_config_headers; shift; ac_config_headers=$*;; esac cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 # Files that config.status was made for. config_files="$ac_config_files" config_headers="$ac_config_headers" _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 ac_cs_usage="\ \`$as_me' instantiates files and other configuration actions from templates according to the current configuration. 
Unless the files and actions are specified as TAGs, all are instantiated by default. Usage: $0 [OPTION]... [TAG]... -h, --help print this help, then exit -V, --version print version number and configuration settings, then exit --config print configuration, then exit -q, --quiet, --silent do not print progress messages -d, --debug don't remove temporary files --recheck update $as_me by reconfiguring in the same conditions --file=FILE[:TEMPLATE] instantiate the configuration file FILE --header=FILE[:TEMPLATE] instantiate the configuration header FILE Configuration files: $config_files Configuration headers: $config_headers Report bugs to ." _ACEOF cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`" ac_cs_version="\\ DRBD config.status 8.4.4 configured by $0, generated by GNU Autoconf 2.68, with options \\"\$ac_cs_config\\" Copyright (C) 2010 Free Software Foundation, Inc. This config.status script is free software; the Free Software Foundation gives unlimited permission to copy, distribute and modify it." ac_pwd='$ac_pwd' srcdir='$srcdir' test -n "\$AWK" || AWK=awk _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 # The default lists apply if the user does not specify any file. ac_need_defaults=: while test $# != 0 do case $1 in --*=?*) ac_option=`expr "X$1" : 'X\([^=]*\)='` ac_optarg=`expr "X$1" : 'X[^=]*=\(.*\)'` ac_shift=: ;; --*=) ac_option=`expr "X$1" : 'X\([^=]*\)='` ac_optarg= ac_shift=: ;; *) ac_option=$1 ac_optarg=$2 ac_shift=shift ;; esac case $ac_option in # Handling of the options. 
-recheck | --recheck | --rechec | --reche | --rech | --rec | --re | --r) ac_cs_recheck=: ;; --version | --versio | --versi | --vers | --ver | --ve | --v | -V ) $as_echo "$ac_cs_version"; exit ;; --config | --confi | --conf | --con | --co | --c ) $as_echo "$ac_cs_config"; exit ;; --debug | --debu | --deb | --de | --d | -d ) debug=: ;; --file | --fil | --fi | --f ) $ac_shift case $ac_optarg in *\'*) ac_optarg=`$as_echo "$ac_optarg" | sed "s/'/'\\\\\\\\''/g"` ;; '') as_fn_error $? "missing file argument" ;; esac as_fn_append CONFIG_FILES " '$ac_optarg'" ac_need_defaults=false;; --header | --heade | --head | --hea ) $ac_shift case $ac_optarg in *\'*) ac_optarg=`$as_echo "$ac_optarg" | sed "s/'/'\\\\\\\\''/g"` ;; esac as_fn_append CONFIG_HEADERS " '$ac_optarg'" ac_need_defaults=false;; --he | --h) # Conflict between --help and --header as_fn_error $? "ambiguous option: \`$1' Try \`$0 --help' for more information.";; --help | --hel | -h ) $as_echo "$ac_cs_usage"; exit ;; -q | -quiet | --quiet | --quie | --qui | --qu | --q \ | -silent | --silent | --silen | --sile | --sil | --si | --s) ac_cs_silent=: ;; # This is an error. -*) as_fn_error $? "unrecognized option: \`$1' Try \`$0 --help' for more information." ;; *) as_fn_append ac_config_targets " $1" ac_need_defaults=false ;; esac shift done ac_configure_extra_args= if $ac_cs_silent; then exec 6>/dev/null ac_configure_extra_args="$ac_configure_extra_args --silent" fi _ACEOF cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 if \$ac_cs_recheck; then set X '$SHELL' '$0' $ac_configure_args \$ac_configure_extra_args --no-create --no-recursion shift \$as_echo "running CONFIG_SHELL=$SHELL \$*" >&6 CONFIG_SHELL='$SHELL' export CONFIG_SHELL exec "\$@" fi _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 exec 5>>config.log { echo sed 'h;s/./-/g;s/^.../## /;s/...$/ ##/;p;x;p;x' <<_ASBOX ## Running $as_me. 
## _ASBOX $as_echo "$ac_log" } >&5 _ACEOF cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 # Handling of arguments. for ac_config_target in $ac_config_targets do case $ac_config_target in "Makefile") CONFIG_FILES="$CONFIG_FILES Makefile" ;; "user/Makefile") CONFIG_FILES="$CONFIG_FILES user/Makefile" ;; "user/legacy/Makefile") CONFIG_FILES="$CONFIG_FILES user/legacy/Makefile" ;; "scripts/Makefile") CONFIG_FILES="$CONFIG_FILES scripts/Makefile" ;; "documentation/Makefile") CONFIG_FILES="$CONFIG_FILES documentation/Makefile" ;; "user/config.h") CONFIG_HEADERS="$CONFIG_HEADERS user/config.h" ;; "user/legacy/config.h") CONFIG_HEADERS="$CONFIG_HEADERS user/legacy/config.h" ;; "drbd.spec") CONFIG_FILES="$CONFIG_FILES drbd.spec" ;; "drbd-km.spec") CONFIG_FILES="$CONFIG_FILES drbd-km.spec" ;; "drbd-kernel.spec") CONFIG_FILES="$CONFIG_FILES drbd-kernel.spec" ;; *) as_fn_error $? "invalid argument: \`$ac_config_target'" "$LINENO" 5;; esac done # If the user did not use the arguments to specify the items to instantiate, # then the envvar interface is used. Set only those that are not. # We use the long form for the default assignment because of an extremely # bizarre bug on SunOS 4.1.3. if $ac_need_defaults; then test "${CONFIG_FILES+set}" = set || CONFIG_FILES=$config_files test "${CONFIG_HEADERS+set}" = set || CONFIG_HEADERS=$config_headers fi # Have a temporary directory for convenience. Make it in the build tree # simply because there is no reason against having it here, and in addition, # creating and moving files from /tmp can sometimes cause problems. # Hook for its removal unless debugging. # Note that there is a small window in which the directory will not be cleaned: # after its creation but before its name has been assigned to `$tmp'. $debug || { tmp= ac_tmp= trap 'exit_status=$? : "${ac_tmp:=$tmp}" { test ! 
-d "$ac_tmp" || rm -fr "$ac_tmp"; } && exit $exit_status ' 0 trap 'as_fn_exit 1' 1 2 13 15 } # Create a (secure) tmp directory for tmp files. { tmp=`(umask 077 && mktemp -d "./confXXXXXX") 2>/dev/null` && test -d "$tmp" } || { tmp=./conf$$-$RANDOM (umask 077 && mkdir "$tmp") } || as_fn_error $? "cannot create a temporary directory in ." "$LINENO" 5 ac_tmp=$tmp # Set up the scripts for CONFIG_FILES section. # No need to generate them if there are no CONFIG_FILES. # This happens for instance with `./config.status config.h'. if test -n "$CONFIG_FILES"; then ac_cr=`echo X | tr X '\015'` # On cygwin, bash can eat \r inside `` if the user requested igncr. # But we know of no other shell where ac_cr would be empty at this # point, so we can use a bashism as a fallback. if test "x$ac_cr" = x; then eval ac_cr=\$\'\\r\' fi ac_cs_awk_cr=`$AWK 'BEGIN { print "a\rb" }' </dev/null 2>/dev/null` if test "$ac_cs_awk_cr" = "a${ac_cr}b"; then ac_cs_awk_cr='\\r' else ac_cs_awk_cr=$ac_cr fi echo 'BEGIN {' >"$ac_tmp/subs1.awk" && _ACEOF { echo "cat >conf$$subs.awk <<_ACEOF" && echo "$ac_subst_vars" | sed 's/.*/&!$&$ac_delim/' && echo "_ACEOF" } >conf$$subs.sh || as_fn_error $? "could not make $CONFIG_STATUS" "$LINENO" 5 ac_delim_num=`echo "$ac_subst_vars" | grep -c '^'` ac_delim='%!_!# ' for ac_last_try in false false false false false :; do . ./conf$$subs.sh || as_fn_error $? "could not make $CONFIG_STATUS" "$LINENO" 5 ac_delim_n=`sed -n "s/.*$ac_delim\$/X/p" conf$$subs.awk | grep -c X` if test $ac_delim_n = $ac_delim_num; then break elif $ac_last_try; then as_fn_error $? "could not make $CONFIG_STATUS" "$LINENO" 5 else ac_delim="$ac_delim!$ac_delim _$ac_delim!! 
" fi done rm -f conf$$subs.sh cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 cat >>"\$ac_tmp/subs1.awk" <<\\_ACAWK && _ACEOF sed -n ' h s/^/S["/; s/!.*/"]=/ p g s/^[^!]*!// :repl t repl s/'"$ac_delim"'$// t delim :nl h s/\(.\{148\}\)..*/\1/ t more1 s/["\\]/\\&/g; s/^/"/; s/$/\\n"\\/ p n b repl :more1 s/["\\]/\\&/g; s/^/"/; s/$/"\\/ p g s/.\{148\}// t nl :delim h s/\(.\{148\}\)..*/\1/ t more2 s/["\\]/\\&/g; s/^/"/; s/$/"/ p b :more2 s/["\\]/\\&/g; s/^/"/; s/$/"\\/ p g s/.\{148\}// t delim ' <conf$$subs.awk | sed ' /^[^""]/{ N s/\n// } ' >>$CONFIG_STATUS || ac_write_fail=1 rm -f conf$$subs.awk cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 _ACAWK cat >>"\$ac_tmp/subs1.awk" <<_ACAWK && for (key in S) S_is_set[key] = 1 FS = "" } { line = $ 0 nfields = split(line, field, "@") substed = 0 len = length(field[1]) for (i = 2; i < nfields; i++) { key = field[i] keylen = length(key) if (S_is_set[key]) { value = S[key] line = substr(line, 1, len) "" value "" substr(line, len + keylen + 3) len += length(value) + length(field[++i]) substed = 1 } else len += 1 + keylen } print line } _ACAWK _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 if sed "s/$ac_cr//" < /dev/null > /dev/null 2>&1; then sed "s/$ac_cr\$//; s/$ac_cr/$ac_cs_awk_cr/g" else cat fi < "$ac_tmp/subs1.awk" > "$ac_tmp/subs.awk" \ || as_fn_error $? "could not setup config files machinery" "$LINENO" 5 _ACEOF # VPATH may cause trouble with some makes, so we remove sole $(srcdir), # ${srcdir} and @srcdir@ entries from VPATH if srcdir is ".", strip leading and # trailing colons and then remove the whole line if VPATH becomes empty # (actually we leave an empty line to preserve line numbers). 
if test "x$srcdir" = x.; then ac_vpsub='/^[ ]*VPATH[ ]*=[ ]*/{ h s/// s/^/:/ s/[ ]*$/:/ s/:\$(srcdir):/:/g s/:\${srcdir}:/:/g s/:@srcdir@:/:/g s/^:*// s/:*$// x s/\(=[ ]*\).*/\1/ G s/\n// s/^[^=]*=[ ]*$// }' fi cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 fi # test -n "$CONFIG_FILES" # Set up the scripts for CONFIG_HEADERS section.
# No need to generate them if there are no CONFIG_HEADERS. # This happens for instance with `./config.status Makefile'. if test -n "$CONFIG_HEADERS"; then cat >"$ac_tmp/defines.awk" <<\_ACAWK || BEGIN { _ACEOF # Transform confdefs.h into an awk script `defines.awk', embedded as # here-document in config.status, that substitutes the proper values into # config.h.in to produce config.h. # Create a delimiter string that does not exist in confdefs.h, to ease # handling of long lines. ac_delim='%!_!# ' for ac_last_try in false false :; do ac_tt=`sed -n "/$ac_delim/p" confdefs.h` if test -z "$ac_tt"; then break elif $ac_last_try; then as_fn_error $? "could not make $CONFIG_HEADERS" "$LINENO" 5 else ac_delim="$ac_delim!$ac_delim _$ac_delim!! " fi done # For the awk script, D is an array of macro values keyed by name, # likewise P contains macro parameters if any. Preserve backslash # newline sequences. 
ac_word_re=[_$as_cr_Letters][_$as_cr_alnum]* sed -n ' s/.\{148\}/&'"$ac_delim"'/g t rset :rset s/^[ ]*#[ ]*define[ ][ ]*/ / t def d :def s/\\$// t bsnl s/["\\]/\\&/g s/^ \('"$ac_word_re"'\)\(([^()]*)\)[ ]*\(.*\)/P["\1"]="\2"\ D["\1"]=" \3"/p s/^ \('"$ac_word_re"'\)[ ]*\(.*\)/D["\1"]=" \2"/p d :bsnl s/["\\]/\\&/g s/^ \('"$ac_word_re"'\)\(([^()]*)\)[ ]*\(.*\)/P["\1"]="\2"\ D["\1"]=" \3\\\\\\n"\\/p t cont s/^ \('"$ac_word_re"'\)[ ]*\(.*\)/D["\1"]=" \2\\\\\\n"\\/p t cont d :cont n s/.\{148\}/&'"$ac_delim"'/g t clear :clear s/\\$// t bsnlc s/["\\]/\\&/g; s/^/"/; s/$/"/p d :bsnlc s/["\\]/\\&/g; s/^/"/; s/$/\\\\\\n"\\/p b cont ' <confdefs.h | sed ' s/'"$ac_delim"'/"\\\ "/g' >>$CONFIG_STATUS || ac_write_fail=1 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 for (key in D) D_is_set[key] = 1 FS = "" } /^[\t ]*#[\t ]*(define|undef)[\t ]+$ac_word_re([\t (]|\$)/ { line = \$ 0 split(line, arg, " ") if (arg[1] == "#") { defundef = arg[2] mac1 = arg[3] } else { defundef = substr(arg[1], 2) mac1 = arg[2] } split(mac1, mac2, "(") #) macro = mac2[1] prefix = substr(line, 1, index(line, defundef) - 1) if (D_is_set[macro]) { # Preserve the
white space surrounding the "#". print prefix "define", macro P[macro] D[macro] next } else { # Replace #undef with comments. This is necessary, for example, # in the case of _POSIX_SOURCE, which is predefined and required # on some systems where configure will not decide to define it. if (defundef == "undef") { print "/*", prefix defundef, macro, "*/" next } } } { print } _ACAWK _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 as_fn_error $? "could not setup config headers machinery" "$LINENO" 5 fi # test -n "$CONFIG_HEADERS" eval set X " :F $CONFIG_FILES :H $CONFIG_HEADERS " shift for ac_tag do case $ac_tag in :[FHLC]) ac_mode=$ac_tag; continue;; esac case $ac_mode$ac_tag in :[FHL]*:*);; :L* | :C*:*) as_fn_error $? "invalid tag \`$ac_tag'" "$LINENO" 5;; :[FH]-) ac_tag=-:-;; :[FH]*) ac_tag=$ac_tag:$ac_tag.in;; esac ac_save_IFS=$IFS IFS=: set x $ac_tag IFS=$ac_save_IFS shift ac_file=$1 shift case $ac_mode in :L) ac_source=$1;; :[FH]) ac_file_inputs= for ac_f do case $ac_f in -) ac_f="$ac_tmp/stdin";; *) # Look for the file first in the build tree, then in the source tree # (if the path is not absolute). The absolute path cannot be DOS-style, # because $ac_f cannot contain `:'. test -f "$ac_f" || case $ac_f in [\\/$]*) false;; *) test -f "$srcdir/$ac_f" && ac_f="$srcdir/$ac_f";; esac || as_fn_error 1 "cannot find input file: \`$ac_f'" "$LINENO" 5;; esac case $ac_f in *\'*) ac_f=`$as_echo "$ac_f" | sed "s/'/'\\\\\\\\''/g"`;; esac as_fn_append ac_file_inputs " '$ac_f'" done # Let's still pretend it is `configure' which instantiates (i.e., don't # use $as_me), people would be surprised to read: # /* config.h. Generated by config.status. */ configure_input='Generated from '` $as_echo "$*" | sed 's|^[^:]*/||;s|:[^:]*/|, |g' `' by configure.' if test x"$ac_file" != x-; then configure_input="$ac_file. 
$configure_input" { $as_echo "$as_me:${as_lineno-$LINENO}: creating $ac_file" >&5 $as_echo "$as_me: creating $ac_file" >&6;} fi # Neutralize special characters interpreted by sed in replacement strings. case $configure_input in #( *\&* | *\|* | *\\* ) ac_sed_conf_input=`$as_echo "$configure_input" | sed 's/[\\\\&|]/\\\\&/g'`;; #( *) ac_sed_conf_input=$configure_input;; esac case $ac_tag in *:-:* | *:-) cat >"$ac_tmp/stdin" \ || as_fn_error $? "could not create $ac_file" "$LINENO" 5 ;; esac ;; esac ac_dir=`$as_dirname -- "$ac_file" || $as_expr X"$ac_file" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$ac_file" : 'X\(//\)[^/]' \| \ X"$ac_file" : 'X\(//\)$' \| \ X"$ac_file" : 'X\(/\)' \| . 2>/dev/null || $as_echo X"$ac_file" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q } /^X\(\/\/\)[^/].*/{ s//\1/ q } /^X\(\/\/\)$/{ s//\1/ q } /^X\(\/\).*/{ s//\1/ q } s/.*/./; q'` as_dir="$ac_dir"; as_fn_mkdir_p ac_builddir=. case "$ac_dir" in .) ac_dir_suffix= ac_top_builddir_sub=. ac_top_build_prefix= ;; *) ac_dir_suffix=/`$as_echo "$ac_dir" | sed 's|^\.[\\/]||'` # A ".." for each directory in $ac_dir_suffix. ac_top_builddir_sub=`$as_echo "$ac_dir_suffix" | sed 's|/[^\\/]*|/..|g;s|/||'` case $ac_top_builddir_sub in "") ac_top_builddir_sub=. ac_top_build_prefix= ;; *) ac_top_build_prefix=$ac_top_builddir_sub/ ;; esac ;; esac ac_abs_top_builddir=$ac_pwd ac_abs_builddir=$ac_pwd$ac_dir_suffix # for backward compatibility: ac_top_builddir=$ac_top_build_prefix case $srcdir in .) # We are building in place. ac_srcdir=. ac_top_srcdir=$ac_top_builddir_sub ac_abs_top_srcdir=$ac_pwd ;; [\\/]* | ?:[\\/]* ) # Absolute name. ac_srcdir=$srcdir$ac_dir_suffix; ac_top_srcdir=$srcdir ac_abs_top_srcdir=$srcdir ;; *) # Relative name. 
ac_srcdir=$ac_top_build_prefix$srcdir$ac_dir_suffix ac_top_srcdir=$ac_top_build_prefix$srcdir ac_abs_top_srcdir=$ac_pwd/$srcdir ;; esac ac_abs_srcdir=$ac_abs_top_srcdir$ac_dir_suffix case $ac_mode in :F) # # CONFIG_FILE # _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 # If the template does not know about datarootdir, expand it. # FIXME: This hack should be removed a few years after 2.60. ac_datarootdir_hack=; ac_datarootdir_seen= ac_sed_dataroot=' /datarootdir/ { p q } /@datadir@/p /@docdir@/p /@infodir@/p /@localedir@/p /@mandir@/p' case `eval "sed -n \"\$ac_sed_dataroot\" $ac_file_inputs"` in *datarootdir*) ac_datarootdir_seen=yes;; *@datadir@*|*@docdir@*|*@infodir@*|*@localedir@*|*@mandir@*) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $ac_file_inputs seems to ignore the --datarootdir setting" >&5 $as_echo "$as_me: WARNING: $ac_file_inputs seems to ignore the --datarootdir setting" >&2;} _ACEOF cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 ac_datarootdir_hack=' s&@datadir@&$datadir&g s&@docdir@&$docdir&g s&@infodir@&$infodir&g s&@localedir@&$localedir&g s&@mandir@&$mandir&g s&\\\${datarootdir}&$datarootdir&g' ;; esac _ACEOF # Neutralize VPATH when `$srcdir' = `.'. # Shell code in configure.ac might set extrasub. # FIXME: do we really want to maintain this feature? 
cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
ac_sed_extra="$ac_vpsub
$extrasub
_ACEOF
cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
:t
/@[a-zA-Z_][a-zA-Z_0-9]*@/!b
s|@configure_input@|$ac_sed_conf_input|;t t
s&@top_builddir@&$ac_top_builddir_sub&;t t
s&@top_build_prefix@&$ac_top_build_prefix&;t t
s&@srcdir@&$ac_srcdir&;t t
s&@abs_srcdir@&$ac_abs_srcdir&;t t
s&@top_srcdir@&$ac_top_srcdir&;t t
s&@abs_top_srcdir@&$ac_abs_top_srcdir&;t t
s&@builddir@&$ac_builddir&;t t
s&@abs_builddir@&$ac_abs_builddir&;t t
s&@abs_top_builddir@&$ac_abs_top_builddir&;t t
$ac_datarootdir_hack
"
eval sed \"\$ac_sed_extra\" "$ac_file_inputs" | $AWK -f "$ac_tmp/subs.awk" \
  >$ac_tmp/out || as_fn_error $? "could not create $ac_file" "$LINENO" 5

test -z "$ac_datarootdir_hack$ac_datarootdir_seen" &&
  { ac_out=`sed -n '/\${datarootdir}/p' "$ac_tmp/out"`; test -n "$ac_out"; } &&
  { ac_out=`sed -n '/^[ ]*datarootdir[ ]*:*=/p' \
      "$ac_tmp/out"`; test -z "$ac_out"; } &&
  { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $ac_file contains a reference to the variable \`datarootdir'
which seems to be undefined. Please make sure it is defined" >&5
$as_echo "$as_me: WARNING: $ac_file contains a reference to the variable \`datarootdir'
which seems to be undefined. Please make sure it is defined" >&2;}

  rm -f "$ac_tmp/stdin"
  case $ac_file in
  -) cat "$ac_tmp/out" && rm -f "$ac_tmp/out";;
  *) rm -f "$ac_file" && mv "$ac_tmp/out" "$ac_file";;
  esac \
  || as_fn_error $? "could not create $ac_file" "$LINENO" 5
 ;;
  :H)
  #
  # CONFIG_HEADER
  #
  if test x"$ac_file" != x-; then
    {
      $as_echo "/* $configure_input */" \
      && eval '$AWK -f "$ac_tmp/defines.awk"' "$ac_file_inputs"
    } >"$ac_tmp/config.h" \
      || as_fn_error $? "could not create $ac_file" "$LINENO" 5
    if diff "$ac_file" "$ac_tmp/config.h" >/dev/null 2>&1; then
      { $as_echo "$as_me:${as_lineno-$LINENO}: $ac_file is unchanged" >&5
$as_echo "$as_me: $ac_file is unchanged" >&6;}
    else
      rm -f "$ac_file"
      mv "$ac_tmp/config.h" "$ac_file" \
        || as_fn_error $? "could not create $ac_file" "$LINENO" 5
    fi
  else
    $as_echo "/* $configure_input */" \
      && eval '$AWK -f "$ac_tmp/defines.awk"' "$ac_file_inputs" \
      || as_fn_error $? "could not create -" "$LINENO" 5
  fi
 ;;

  esac

done # for ac_tag


as_fn_exit 0
_ACEOF
ac_clean_files=$ac_clean_files_save

test $ac_write_fail = 0 ||
  as_fn_error $? "write failure creating $CONFIG_STATUS" "$LINENO" 5


# configure is writing to config.log, and then calls config.status.
# config.status does its own redirection, appending to config.log.
# Unfortunately, on DOS this fails, as config.log is still kept open
# by configure, so config.status won't be able to write to it; its
# output is simply discarded.  So we exec the FD to /dev/null,
# effectively closing config.log, so it can be properly (re)opened and
# appended to by config.status.  When coming back to configure, we
# need to make the FD available again.
if test "$no_create" != yes; then
  ac_cs_success=:
  ac_config_status_args=
  test "$silent" = yes &&
    ac_config_status_args="$ac_config_status_args --quiet"
  exec 5>/dev/null
  $SHELL $CONFIG_STATUS $ac_config_status_args || ac_cs_success=false
  exec 5>>config.log
  # Use ||, not &&, to avoid exiting from the if with $? = 1, which
  # would make configure fail if this is the last instruction.
  $ac_cs_success || as_fn_exit 1
fi
if test -n "$ac_unrecognized_opts" && test "$enable_option_checking" != no; then
  { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: unrecognized options: $ac_unrecognized_opts" >&5
$as_echo "$as_me: WARNING: unrecognized options: $ac_unrecognized_opts" >&2;}
fi

drbd-8.4.4/user/config.h.in0000664000000000000000000000175411562773171014164 0ustar rootroot
/* user/config.h.in.  Generated from configure.ac by autoheader.  */

/* Local configuration directory. Commonly /etc or /usr/local/etc */
#undef DRBD_CONFIG_DIR

/* Include support for drbd-8.3 kernel code */
#undef DRBD_LEGACY_83

/* Local state directory. Commonly /var/lib/drbd or /usr/local/var/lib/drbd */
#undef DRBD_LIB_DIR

/* Local lock directory. Commonly /var/lock or /usr/local/var/lock */
#undef DRBD_LOCK_DIR

/* Runtime state directory. Commonly /var/run/drbd or /usr/local/var/run/drbd */
#undef DRBD_RUN_DIR

/* Define to the address where bug reports for this package should be sent. */
#undef PACKAGE_BUGREPORT

/* Define to the full name of this package. */
#undef PACKAGE_NAME

/* Define to the full name and version of this package. */
#undef PACKAGE_STRING

/* Define to the one symbol short name of this package. */
#undef PACKAGE_TARNAME

/* Define to the home page for this package. */
#undef PACKAGE_URL

/* Define to the version of this package. */
#undef PACKAGE_VERSION