pax_global_header
comment=453fb27059366b841e5a03a8cfb1c394847c716d

mairix-master/

mairix-master/.gitignore
*.o
*.swp
*.report
.RELEASES
Makefile
RELEASES
config.log
datescan.c
datescan.h
fromcheck.c
fromcheck.h
mairix
make.log
nvpscan.c
nvpscan.h
nvp.report
version.h

mairix-master/ACKNOWLEDGEMENTS
These people have contributed useful patches, ideas and suggestions:

Anand Kumria
André Costa
Andreas Amann
Andre Costa
Aredridel
Balázs Szabó
Bardur Arantsson
Benj. Mako Hill
Chris Mason
Christoph Dworzak
Christopher Rosado
Chung-chieh Shan
Claus Alboege
Corrin Lakeland
Dan Egnor
Daniel Jacobowitz
Dirk Huebner
Ed Blackman
Emil Sit
Felipe Gustavo de Almeida
Ico Doornekamp
Jaime Velasco Juan
James Leifer
Jerry Jorgenson
Joerg Desch
Johannes Schindelin
Johannes Weißl
John Arthur Kane
John Keener
Jonathan Kamens
Josh Purinton
Karsten Petersen
Kevin Rosenberg
Mark Hills
Martin Danielsson
Matthias Teege
Mikael Ylikoski
Mika Fischer
Oliver Braun
Paramjit Oberoi
Paul Fox
Peter Chines
Peter Jeremy
Robert Hofer
Roberto Boati
Samuel Tardieu
Sanjoy Mahajan
Satyaki Das
Steven Lumos
Tim Harder
Tom Doherty
Vincent Lefevre
Vladimir V. Kisil
Will Yardley
Wolfgang Weisselberg

I apologise to any contributors who have been omitted from this list!

mairix-master/COPYING
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. 
If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. 
You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. 
If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. 
(This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. 
Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. 
Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. 
Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. 
Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. 
If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License.

mairix-master/INSTALL
Installation of mairix goes as follows:

./configure
make
make install

You need to be root to run the final step unless you're installing under your own home directory somewhere.

However, you might want to tune the options further. The configure script shares its common options with the usual autoconf-generated scripts, even though it's not autoconf-generated itself. For example, a fuller build could use

CC=gcc CFLAGS="-O2 -Wall" ./configure \
    --prefix=/opt/mairix \
    --infodir=/usr/share/info
make
make install

The final step is to create a ~/.mairixrc file. An example is included in the file dotmairixrc.eg. Just copy that to ~/.mairixrc and edit it.

mairix-master/Makefile.in
#########################################################################
#
# mairix - message index builder and finder for maildir folders.
#
# Copyright (C) Richard P. Curnow 2002-2004,2006
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
#
# =======================================================================

#########################################################################
# Edit the following variables as required
CC=@cc@
CFLAGS=@cflags@ @defs@
CPPFLAGS=@CPPFLAGS@
LDFLAGS=@LDFLAGS@
LIBS=@LIBS@

#######################################################################
# If you're generating a package, you may want to use
#   make DESTDIR=temporary_dir install
# to get the software installed to a directory where you can create
# a tdl.tar.gz from it
DESTDIR=

#######################################################################

prefix=$(DESTDIR)@prefix@
bindir=$(DESTDIR)@bindir@
mandir=$(DESTDIR)@mandir@
man1dir=$(mandir)/man1
man5dir=$(mandir)/man5
infodir=$(DESTDIR)@infodir@
docdir=$(DESTDIR)@docdir@

#########################################################################
# Things below this point shouldn't need to be edited.

OBJ = mairix.o db.o rfc822.o tok.o hash.o dirscan.o writer.o \
      reader.o search.o stats.o dates.o datescan.o mbox.o md5.o \
      fromcheck.o glob.o dumper.o expandstr.o dotlock.o \
      nvp.o nvpscan.o

all : mairix

mairix : $(OBJ)
	$(CC) -o mairix $(CFLAGS) $(LDFLAGS) $(OBJ) $(LIBS)

%.o : %.c memmac.h mairix.h reader.h Makefile
	$(CC) -c $(CFLAGS) $(CPPFLAGS) -o $@ $<

datescan.c datescan.h : datescan.nfa ./dfasyn/dfasyn
	./dfasyn/dfasyn -o datescan.c -ho datescan.h -r datescan.report -v -u datescan.nfa

fromcheck.c fromcheck.h : fromcheck.nfa ./dfasyn/dfasyn
	./dfasyn/dfasyn -o fromcheck.c -ho fromcheck.h -r fromcheck.report -v -u fromcheck.nfa

nvpscan.c nvpscan.h : nvp.nfa ./dfasyn/dfasyn
	./dfasyn/dfasyn -o nvpscan.c -ho nvpscan.h -r nvpscan.report -v -u nvp.nfa

dates.o : datescan.h
mbox.o : fromcheck.h
nvp.o : nvpscan.h

version.h:
	./mkversion

./dfasyn/dfasyn:
	if [ -d dfasyn ]; then cd dfasyn ; $(MAKE) CC="$(CC)" CFLAGS="$(CFLAGS)" ; else echo "No dfasyn subdirectory?" ; exit 1 ; fi

clean:
	-rm -f *~ *.o mairix *.s core
	-rm -f mairix.cp mairix.fn mairix.aux mairix.log mairix.ky mairix.pg mairix.toc mairix.tp mairix.vr
	-rm -f fromcheck.[ch] datescan.[ch]
	-rm -f nvpscan.[ch]
	if [ -d dfasyn ]; then cd dfasyn ; $(MAKE) clean ; fi
	if [ -d test ]; then cd test ; $(MAKE) clean ; fi

distclean: clean
	if [ -d test ]; then cd test ; $(MAKE) distclean ; fi
	-rm -f Makefile config.log

install:
	[ -d $(prefix) ] || mkdir -p $(prefix)
	[ -d $(bindir) ] || mkdir -p $(bindir)
	[ -d $(mandir) ] || mkdir -p $(mandir)
	[ -d $(man1dir) ] || mkdir -p $(man1dir)
	[ -d $(man5dir) ] || mkdir -p $(man5dir)
	cp -f mairix $(bindir)
	chmod 555 $(bindir)/mairix
	cp -f mairix.1 $(man1dir)
	chmod 444 $(man1dir)/mairix.1
	cp -f mairixrc.5 $(man5dir)
	chmod 444 $(man5dir)/mairixrc.5

check: mairix
	if [ -d test ]; then cd test ; $(MAKE) CC="$(CC)" CFLAGS="$(CFLAGS)" check ; else echo "No test subdirectory?" ; exit 1 ; fi

.PHONY : all install clean distclean check

mairix.o : version.h

mairix-master/NEWS
NEW IN VERSION 0.23
===================
* Allow '=' in message-id search for RFC2822 conformance
* Add the option -H to force hardlinks
* Skip .gitignore files
* Do not interpret special characters [~,/=^] in Message-ID queries
* Fix faulty mbox message separators
* Improve reporting of unparsed MIME headers & remove code duplication
* Allow empty sections in MIME headers
* Add support for uuencoded attachments
* Improve the parsing of MIME boundaries
* Fix SEGV if mbox shrinks
* Add test suite
* Fix building in parallel

NEW IN VERSION 0.22
===================
* Skip symlinks when using mbox (R A Lichtensteiger)
* Update copyright year info throughout
* Update ACKNOWLEDGEMENTS and copyright headers where more credit was due
* Update FSF address in file headers
* Update COPYING to latest gpl-2.0.txt
* Improve error message if home directory cannot be determined
* Honour HOME environment variable
(Andreas Amann)
* MIME types are allowed to have "+" characters in them. (Jonathan Kamens)
* Fix deficiencies in the parsing of mbox From lines (Jonathan Kamens)
* Include the existing -x flag in the help message (Mark Hills)
* Fix documentation nits (Tom Doherty)
* Remove spurious message when the mtime of a message file has changed
* Do not export functions already exported through a callback structure. (Samuel Tardieu)
* Fix two manpage buglets. (Samuel Tardieu)
* When freeing a struct nvp, do not forget to free the struct nvp_entry. (Samuel Tardieu)
* Do not leak memory if duplicate fields are present. (Samuel Tardieu)
* Initialize the date header with a known value. (Samuel Tardieu)
* Merge two conflicting solutions for bad MIME encoding
* Fix segfault when last char is not a newline (Mika Fischer)
* Fix MIME-related crash (Paramjit Oberoi)
* Add support for claws-mail (Anand Kumria)
* Add MH sub-type support for ezmlm-archives (Claus Alboege)
* Detect a trailing -f or -o with no following argument
* Allow lines starting "From" to occur part-way through the header.
* Display message-ID in search -x mode
* Remove execute permission from source files
* Handle mbox from separators where email address is in angle brackets
* Fix a bug in rfc822.c: Some headers weren't correctly parsed. (Jaime Velasco Juan)

NEW IN VERSION 0.21
===================
* Fix make clean target in dfasyn/ (Benj. Mako Hill)
* Limit number of messages that are examined when an end boundary is missing in an mbox (Chung-chieh Shan)
* Avoid examining . and ..
  when traversing MH folder hierarchy (Steven Lumos)
* Fix various bugs in the name/value parser
* Add some RFC2231 support to the name/value parser (continuations)
* Fix indexing when existing database only contains 1 message

NEW IN VERSION 0.20
===================
* Cache uncompressed mbox data (Chris Mason, further work by me)
* Fix gaps in date ranges for search
* Unlock database if mairix is interrupted (Paul Fox)
* Add fast index option (-F)
* Fix conditional compilation errors for compressed mbox
* Reimplement MIME header parsing
* Add capability to search on names of attachments
* Add capability to search on state of message flags
* Create maildir-format mfolder filenames correctly with regard to flags
* Various bug fixes (Oliver Braun, Matthias Teege)

NEW IN VERSION 0.19
===================
* mairix.spec fixes (André Costa)
* Bug fix: freeing of message structures (Karsten Petersen)
* Add new -x (--excerpt-output) option, an alternative mode for searching. This displays the key headers from the matching messages on stdout.
* Add notes about the mairix-users mailing list and the SourceForge page to README.
* Fix configuration + compilation to allow building with gzip support but without bzlib support.
* Rename internal functions like zopen() to avoid name conflicts on MacOS X. (Vincent Lefevre)
* Remove a spurious ; in bison input file (Vincent Lefevre)
* Improve output given in various error conditions (based on patch by Karsten Petersen)

NEW IN VERSION 0.18
===================
* Support bzip2'd mbox folders
* Fix bugs in parsing mbox folders containing unquoted 'From ' lines inside MIME body parts
* Fix bug in parsing content-type data containing quotes preceded by whitespace
* Clone the message flags (when the source folder and mfolder are both of maildir type)
* New manpages mairix.1 and mairixrc.5 are included, and the old texinfo-based documentation is deprecated and moved into the old_docs/ directory.
* Upgrade scanners to new version of dfasyn
* Support Mew's MH folder subtype

NEW IN VERSION 0.17.1
=====================
* Fix detection of MH folder subtype used by nnml (Gnus)
* Fix filename format generated in the /cur/ directory for maildir mfolders.
* Syntax fix in configure script

NEW IN VERSION 0.17
===================
* Support gzipped mbox folders (any file matched by a mbox= line in the config file is considered as a gzipped mbox if its name ends in .gz)
* Rework directory traversal for the '...' construct to speed up indexing and the check that mfolder isn't going to overwrite a real folder when searching.
* Check whether database exists before attempting to do searching.
* Matched new maildir messages go in /new/ subdirectory of maildir mfolder.
* Fix lots of compiler warnings generated by gcc4.x
* Don't create and immediately scrub database entries for empty mbox folders.
* Fix usage() info for bare word in searching
* Allow '.' on the ends of numeric filenames in MH folders (to work with Evolution)
* Update .PHONY target so that 'make install' etc are more reliable.
* Add X-source-folder header to indicate the original folder of a match found in an mbox.
* Migration to git for revision control.

NEW IN VERSION 0.16.1
=====================
* Remove the lockfile if the program terminates for any reason.

NEW IN VERSION 0.16
===================
* Home directory (~) and environment variable ($foo / ${foo}) expansion in the .mairixrc file
* Add -Q flag to skip database integrity checks during indexing (equivalently the nochecks option in .mairixrc file). This speeds up indexing but loses some robustness.
* Add ^ word prefix to require substring search to be left-anchored
* Split 'make clean' into separate clean and clean_docs
* Improve some error messages
* Add online help entries for -o and -d
* Don't write out the database if there are no changes found during indexing.
* Fix stale information about the 'and' and 'or' delimiters in the online help.
* Add the capability to omit particular folders from indexing (omit keyword in .mairixrc file.) This allows broad wildcards to be used with selected folders removed from the wildcard which is much more convenient in many set-ups.
* Avoid writing matches to any folder on the list of folders to be indexed (affects both mfolder option and argument of -o command line switch.) This prevents disastrous loss of messages in the event of trying to overwrite a wanted folder with the matches.
* Implement dot-locking on the database file to prevent corruption due to concurrent updates. Add --unlock option to forcibly remove a stray lockfile.
* Display message path in warning messages from rfc822 parsing.

NEW IN VERSION 0.15
===================
* Migrate to GNU Arch for hosting the development archive
* In mbox parsing, handle return path in 'From ' line only being a local part (reported by several people)
* Don't output number of matched messages in raw mode (to make output more useful to scripts etc) (Samuel Tardieu)
* Fix vfolder->mfolder in dotmairixrc.eg (reported by several people)
* Handle spaces in multipart message boundary strings (Chung-chieh Shan)
* Be more tolerant of bad multipart message boundary separators (Chung-chieh Shan)
* Add rudimentary database dump command (-d/--dump)
* Fix bug in handling of per-database hash key
* Improve standards-compliance of maildir output file names (Jeff King)
* Remove most compiler warnings

NEW IN VERSION 0.14.1
=====================
* Bug fix: splitting of messages in mboxes was too strict regarding whitespace

NEW IN VERSION 0.14
===================
* Fix error in path (p:) searching for messages in mboxes.
* Improve usage() function

NEW IN VERSION 0.13
===================
* Fixes to support the mbox format used by Mozilla mail
* When creating vfolder directories for maildir/mh, remove existing non-directory at the same path, if present.
When creating mbox vfolder file, complain if there's already a directory at the same path and exit. * Switch from the term "virtual folder" to "match folder" * Fix bug in path matches (p:) containing upper-case letters - previously they matched on corresponding all lower-case paths. NEW IN VERSION 0.12 =================== ! Change in database file format - existing databases need to be destroyed and recreated. * Indexing of mbox folders in addition to the existing maildir & MH support * Output to mbox format vfolder * Return exit status 1 if no messages are matched in search mode, and exit status 2 for all error conditions. * Allow wildcards to be used in specifying maildir and mh folder paths. * Searching on messages having a particular Message-ID (m:msgid expression in search mode). * When indexing whole email addresses, '+' is now considered a valid character. * Use ',' instead of '+' in search expressions, and '/' instead of ','. This is to allow '+' to be used inside email addresses that are being searched for. The '/' character is traditionally associated with meaning 'or', so it made more sense to move ',' to mean 'and'. (Unfortunately, there were very few metacharacters left which don't have some special meaning to shells, and I wanted to avoid the need to quote or escape the search expressions.) * Bug fix checking return status of mmap. * Handle ">From " at the start of the message headers * Handle mis-formatted encoding strings "7 bit" and "8 bit" * Make every database use a random seed for the token hash function (to prevent denial of service attacks against mairix through carefully crafted messages.) * Rename some options in the mairixrc file, to put the folder formats on an equal footing. * Properly handle the case where a maildir vfolder exists but one or more of the new,tmp,cur subdirectories is missing. 
* Add configure script (not autoconf-based) NEW IN VERSION 0.11 =================== * Detect failed malloc (out of memory) conditions properly and report it. * Improved date specification syntax for d: option * Allow vfolder to be an absolute path or relative to current directory, instead of just relative to base directory. NEW IN VERSION 0.10 =================== * Add 'raw' mode for searching. * When purging, only print the pass[12] message in verbose mode * Add an ACKNOWLEDGEMENTS file. * Hack to handle missing NAME_MAX on various non-Linux systems * Improve mairix.spec file for RPM building * Change default value for prefix in Makefile to make it more standard. NEW IN VERSION 0.9 ================== * Fix problem with auditing headers if a uucp/mbox-style "from " header is present at the start. * Allow \: sequence in folder names to specify a : NEW IN VERSION 0.8 ================== * Fix bug : mairix used to crash if a message had corrupted RFC822 header lines NEW IN VERSION 0.7 ================== * Fix bug : mairix likely to crash if a nonexistent folder is listed in the conf file. * Allow multiple folders and mh_folders lines in the conf file for people who have many separate folders. * Print an extra 'comfort' message in verbose mode before starting to scan the directory tree. NEW IN VERSION 0.6 ================== * When an unrecognized encoding is found, ignore the body part instead of aborting the run. NEW IN VERSION 0.5 ================== * When -a option is used for search, avoid symlinking the same message twice if it matches more than one query. * Fixes to rpm spec file. * Fix handling of = in base64-encoded attachments. * Support non POSIX locales. * Support rfc2047 encoding in headers. * Create vfolder if it doesn't already exist. * Allow searching on complete email addresses as well as individual words in to, cc and from fields. * New -o option to allow vfolder name to be given on the command line.
NEW IN VERSION 0.4 ================== * Support for MH folders * Create database with mode 0600 instead of 0644 (better security). * Add Makefile target to install whichever forms of the documentation have been built. NEW IN VERSION 0.3 ================== * Various bug fixes NEW IN VERSION 0.2 ================== * Substrings of message paths can be used as search expressions (p:substring option) * = now used instead of / as the delimiter for number of errors in an approximate match (to help with path search) * Bug fix when using -t mode for search with unpurged dead messages still in the database. ================== # vim:comments-=mb\:*:comments+=fb\:* mairix-master/README000066400000000000000000000042431224450623700145300ustar00rootroot00000000000000mairix is a program for indexing and searching email messages stored in Maildir, MH or mbox folders. * Indexing is fast. It runs incrementally on new messages - any particular message only gets scanned once in the lifetime of the index file. * The search mode populates a "virtual" folder with symlinks(*) which point to the real messages. This folder can be opened as usual in your mail program. * The search mode is very fast. * Indexing and searching works on the basis of words. The index file tabulates which words occur in which parts (particular headers + body) of which messages. The program is a very useful complement to mail programs like mutt (http://www.mutt.org/, which supports Maildir, MH and mbox folders) and Sylpheed (which supports MH folders). [(*) where the input or output folder is an mbox, a copy of the message is made instead of symlinking.] See also the mairix.txt file. ********************************************************************* Copyright (C) Richard P. Curnow 2002-2004 This program is free software; you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation. 
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. ********************************************************************* Suggestions, bug reports, experiences, praise, complaints etc to the author please, at Since July 2006, there is a mairix-users mailing list. To subscribe or to view the archives, visit https://lists.sourceforge.net/lists/listinfo/mairix-users The main website for mairix is http://www.rc0.org.uk/mairix The SourceForge project page is http://www.sf.net/projects/mairix ACKNOWLEDGEMENTS ================ See the ACKNOWLEDGEMENTS file mairix-master/configure000077500000000000000000000205301224450623700155540ustar00rootroot00000000000000#!/bin/sh ######################################################################### # # mairix - message index builder and finder for maildir folders. # # Copyright (C) Richard P. Curnow 2003,2004,2005 # Copyright (C) Paramjit Oberoi 2005 # # This program is free software; you can redistribute it and/or modify # it under the terms of version 2 of the GNU General Public License as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
# # ======================================================================= if [ -f config.log ]; then rm -f config.log ; fi exec 5>config.log MYCC=${CC:-gcc} MYCFLAGS=${CFLAGS:--O2 -Wall} MYCPPFLAGS=${CPPFLAGS:-} MYLDFLAGS=${LDFLAGS:-} # ======================================================================= # Functions #{{{ cleanup cleanup () { if [ -f docheck.c ]; then rm -f docheck.c ; fi if [ -f docheck.o ]; then rm -f docheck.o ; fi if [ -f docheck ]; then rm -f docheck ; fi rm -rf docheck.c docheck.o docheck } #}}} #{{{ test_cc : basic compiler sanity check test_cc () { printf "Testing whether your compiler \"$MYCC $MYCFLAGS\" works : " cat >docheck.c <<EOF #include <stdio.h> int main (int argc, char **argv) { return 0; } EOF ${MYCC} ${MYCFLAGS} -o docheck docheck.c 1>&5 2>&5 if [ $? -eq 0 ] then printf "it works\n" else printf "it doesn't work\n" printf "Failed program was\n" 1>&5 cat docheck.c 1>&5 rm -f docheck.c docheck exit 1 fi cleanup } #}}} #{{{ test_for_stdint_h test_for_stdint_h () { cat >docheck.c <<EOF #include <stdint.h> int main(int argc, char **argv) { return 0; } EOF ${MYCC} ${MYCFLAGS} -c -o docheck.o docheck.c >/dev/null 2>&1 if [ $? -eq 0 ] then result=0 else result=1 fi rm -f docheck.c docheck.o echo $result } #}}} #{{{ test_for_inttypes_h test_for_inttypes_h () { cat >docheck.c <<EOF #include <inttypes.h> int main(int argc, char **argv) { return 0; } EOF ${MYCC} ${MYCFLAGS} -c -o docheck.o docheck.c >/dev/null 2>&1 if [ $? -eq 0 ] then result=0 else result=1 fi rm -f docheck.c docheck.o echo $result } #}}} #{{{ test_for_zlib test_for_zlib () { cat > docheck.c <<EOF #include <zlib.h> int main () { const char *foo; foo = zlibVersion(); return 0; } EOF echo "Test program is" 1>&5 cat docheck.c 1>&5 ${MYCC} ${MYCPPFLAGS} ${MYCFLAGS} ${MYLDFLAGS} -o docheck docheck.c -lz 1>&5 2>&1 if [ $? 
-eq 0 ] then result=0 else result=1 fi rm -f docheck.c docheck echo $result } #}}} #{{{ test_for_bzlib test_for_bzlib () { cat > docheck.c <<EOF #include <bzlib.h> int main () { const char *foo; foo = BZ2_bzlibVersion(); return 0; } EOF echo "Test program is" 1>&5 cat docheck.c 1>&5 ${MYCC} ${MYCPPFLAGS} ${MYCFLAGS} ${MYLDFLAGS} -o docheck docheck.c -lbz2 1>&5 2>&1 if [ $? -eq 0 ] then result=0 else result=1 fi rm -f docheck.c docheck echo $result } #}}} #{{{ test_for_bison test_for_bison () { bison --help > /dev/null if [ $? -eq 0 ] then result=0 else result=1 fi echo $result } #}}} #{{{ test_for_flex test_for_flex () { flex --help > /dev/null if [ $? -eq 0 ] then result=0 else result=1 fi echo $result } #}}} #{{{ usage usage () { cat < if you have header files in a nonstandard directory LDFLAGS linker flags, e.g. -L if you have libraries in a nonstandard directory Use these variables to override the choices made by \`configure' or to help it to find libraries and programs with nonstandard names/locations. Report bugs to . 
EOF } #}}} # ======================================================================= # Defaults for variables PREFIX=/usr/local use_readline=yes bad_options=no use_gzip_mbox=yes use_bzip_mbox=yes # Parse options to configure for option do case "$option" in --prefix=* | --install-prefix=* ) PREFIX=`echo $option | sed -e 's/[^=]*=//;'` ;; --bindir=* ) BINDIR=`echo $option | sed -e 's/[^=]*=//;'` ;; --mandir=* ) MANDIR=`echo $option | sed -e 's/[^=]*=//;'` ;; --infodir=* ) INFODIR=`echo $option | sed -e 's/[^=]*=//;'` ;; --docdir=* ) DOCDIR=`echo $option | sed -e 's/[^=]*=//;'` ;; --enable-gzip-mbox ) use_gzip_mbox=yes ;; --disable-gzip-mbox ) use_gzip_mbox=no ;; --enable-bzip-mbox ) use_bzip_mbox=yes ;; --disable-bzip-mbox ) use_bzip_mbox=no ;; -h | --help ) usage exit 1 ;; * ) printf "Unrecognized option : $option\n" bad_options=yes ;; esac done if [ ${bad_options} = yes ]; then exit 1 fi DEFS="" test_cc printf "Checking for <stdint.h> : " if [ `test_for_stdint_h` -eq 0 ]; then printf "Yes\n" DEFS="${DEFS} -DHAS_STDINT_H" else printf "No\n" fi printf "Checking for <inttypes.h> : " if [ `test_for_inttypes_h` -eq 0 ]; then printf "Yes\n" DEFS="${DEFS} -DHAS_INTTYPES_H" else printf "No\n" fi if [ $use_gzip_mbox = "yes" ]; then printf "Checking for zlib : " if [ `test_for_zlib` -eq 0 ]; then printf "Yes\n"; DEFS="${DEFS} -DUSE_GZIP_MBOX" LIBS="-lz" else printf "No (disabled gzipped mbox support)\n"; fi fi if [ $use_bzip_mbox = "yes" ]; then printf "Checking for bzlib : " if [ `test_for_bzlib` -eq 0 ]; then printf "Yes\n"; DEFS="${DEFS} -DUSE_BZIP_MBOX" LIBS="${LIBS} -lbz2" else printf "No (disabled bzip2ed mbox support)\n"; fi fi printf "Checking for bison : " if [ `test_for_bison` -eq 0 ]; then printf "Yes\n"; else printf "No\n"; exit 1; fi printf "Checking for flex : " if [ `test_for_flex` -eq 0 ]; then printf "Yes\n"; else printf "No\n"; exit 1; fi #{{{ Determine version number of the program. 
if [ -f version.txt ]; then revision=`cat version.txt` else revision="DEVELOPMENT" fi #}}} if [ "x" = "x${BINDIR}" ]; then BINDIR=${PREFIX}/bin ; fi if [ "x" = "x${MANDIR}" ]; then MANDIR=${PREFIX}/man ; fi if [ "x" = "x${INFODIR}" ]; then INFODIR=${PREFIX}/info ; fi if [ "x" = "x${DOCDIR}" ]; then DOCDIR=${PREFIX}/doc/mairix-${revision} ; fi echo "Generating Makefile" rm -f Makefile sed -e "s%@cc@%${MYCC}%; \ s%@defs@%${DEFS}%; \ s%@cflags@%${MYCFLAGS}%; \ s%@prefix@%${PREFIX}%; \ s%@bindir@%${BINDIR}%; \ s%@mandir@%${MANDIR}%; \ s%@infodir@%${INFODIR}%; \ s%@docdir@%${DOCDIR}%; \ s%@LIBS@%${LIBS}%; \ s%@CPPFLAGS@%${MYCPPFLAGS}%; \ s%@LDFLAGS@%${MYLDFLAGS}%; \ " < Makefile.in > Makefile # Avoid editing Makefile instead of Makefile.in chmod ugo-w Makefile # ======================================================================= # vim:et:sw=2:ht=2:sts=2:fdm=marker:cms=#%s mairix-master/dates.c000066400000000000000000000235171224450623700151210ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. Curnow 2002-2004,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
* ********************************************************************** */ #include #include #include #include #include #include "mairix.h" #include "dates.h" #include "datescan.h" static enum DATESCAN_TYPE discover_type(char *first, char *last)/*{{{*/ { int current_state = 0; int token; char *p; p = first; while (p < last) { token = datescan_char2tok[(int)*(unsigned char*)p]; current_state = datescan_next_state(current_state, token); if (current_state < 0) break; p++; } if (current_state < 0) { return DS_FAILURE; } else { return datescan_attr[current_state]; } } /*}}}*/ static int match_month(char *p)/*{{{*/ { if (!strncasecmp(p, "jan", 3)) return 1; if (!strncasecmp(p, "feb", 3)) return 2; if (!strncasecmp(p, "mar", 3)) return 3; if (!strncasecmp(p, "apr", 3)) return 4; if (!strncasecmp(p, "may", 3)) return 5; if (!strncasecmp(p, "jun", 3)) return 6; if (!strncasecmp(p, "jul", 3)) return 7; if (!strncasecmp(p, "aug", 3)) return 8; if (!strncasecmp(p, "sep", 3)) return 9; if (!strncasecmp(p, "oct", 3)) return 10; if (!strncasecmp(p, "nov", 3)) return 11; if (!strncasecmp(p, "dec", 3)) return 12; return 0; } /*}}}*/ static int year_fix(int y)/*{{{*/ { if (y>100) { return y-1900; } else if (y < 70) { /* 2000-2069 */ return y+100; } else { /* 1970-1999 */ return y; } } /*}}}*/ static int last_day(int mon, int y) {/*{{{*/ /* mon in [0,11], y=year-1900 */ static unsigned char days[12] = {31,28,31,30,31,30,31,31,30,31,30,31}; if (mon != 1) { return days[mon]; } else { /* Because 2000 was a leap year, we don't have to bother about the %100 * rule, at least not in this range of dates. 
*/ if ((y % 4) == 0) { return 29; } else { return 28; } } } /*}}}*/ static void set_day(struct tm *x, int y)/*{{{*/ { if (y > x->tm_mday) { /* Shorthand for that day in previous month */ if (x->tm_mon == 0) { x->tm_mon = 11; --x->tm_year; } else { --x->tm_mon; } } x->tm_mday = y; /* Always */ } /*}}}*/ static int is_later_dm(struct tm *x, int m, int d)/*{{{*/ { int m1 = m-1; return ((x->tm_mon < m1) || ((x->tm_mon == m1) && (x->tm_mday < d))); } /*}}}*/ static int scan_date_expr(char *first, char *last, struct tm *start, struct tm *end)/*{{{*/ { enum DATESCAN_TYPE type; time_t now; time(&now); type = discover_type(first, last); if (type == DS_SCALED) {/*{{{*/ int v; char *p; time_t then; p = first; v = 0; while (isdigit(*p)) { v = (v*10) + (*p - '0'); p++; } switch(*p) { case 'd': v *= 86400; break; case 'w': v *= 7*86400; break; case 'm': v *= 30*86400; break; case 'y': v *= 365*86400; break; default: fprintf(stderr, "Unrecognized relative date scaling '%c'\n", *p); return -1; } then = now - v; if (start) { *start = *localtime(&then); } if (end) { *end = *localtime(&then); }/*}}}*/ } else if (type == DS_FAILURE) { fputs("Cannot parse date expression [", stderr); fwrite(first, sizeof(char), last-first, stderr); fputs("]\n", stderr); return -1; } else { /* something else */ int v1, v3; int m2; /* decoded month */ char *p; v1 = v3 = m2 = 0; p = first; while (p < last && isdigit(*p)) { v1 = (v1*10) + (*p - '0'); p++; } if (p < last) { m2 = match_month(p); p += 3; if (m2 == 0) { return -1; /* failure */ } } while (p < last && isdigit(*p)) { v3 = (v3*10) + (*p - '0'); p++; } assert(p==last); /* should be true in all cases. 
*/ switch (type) { case DS_D:/*{{{*/ if (start) set_day(start, v1); if (end) set_day(end, v1); break; /*}}}*/ case DS_Y:/*{{{*/ if (start) { start->tm_mday = 1; start->tm_mon = 0; /* january */ start->tm_year = year_fix(v1); } if (end) { end->tm_mday = 31; end->tm_mon = 11; end->tm_year = year_fix(v1); } break; /*}}}*/ case DS_YYMMDD:/*{{{*/ if (start) { start->tm_mday = v1 % 100; start->tm_mon = ((v1 / 100) % 100) - 1; start->tm_year = year_fix(v1/10000); } if (end) { end->tm_mday = v1 % 100; end->tm_mon = ((v1 / 100) % 100) - 1; end->tm_year = year_fix(v1/10000); } break; /*}}}*/ case DS_M:/*{{{*/ if (start) { if (m2-1 > start->tm_mon) --start->tm_year; /* shorthand for previous year */ start->tm_mon = m2-1; start->tm_mday = 1; } if (end) { if (m2-1 > end->tm_mon) --end->tm_year; /* shorthand for previous year */ end->tm_mon = m2-1; end->tm_mday = last_day(m2-1, end->tm_year); } break; /*}}}*/ case DS_DM:/*{{{*/ if (start) { if (is_later_dm(start, m2, v1)) --start->tm_year; /* shorthand for previous year. */ start->tm_mon = m2-1; start->tm_mday = v1; } if (end) { if (is_later_dm(end, m2, v1)) --end->tm_year; /* shorthand for previous year. */ end->tm_mon = m2-1; end->tm_mday = v1; } break; /*}}}*/ case DS_MD:/*{{{*/ if (start) { if (is_later_dm(start, m2, v3)) --start->tm_year; /* shorthand for previous year. */ start->tm_mon = m2-1; start->tm_mday = v3; } if (end) { if (is_later_dm(end, m2, v3)) --end->tm_year; /* shorthand for previous year. 
*/ end->tm_mon = m2-1; end->tm_mday = v3; } break; /*}}}*/ case DS_DMY:/*{{{*/ if (start) { start->tm_mon = m2-1; start->tm_mday = v1; start->tm_year = year_fix(v3); } if (end) { end->tm_mon = m2-1; end->tm_mday = v1; end->tm_year = year_fix(v3); } break; /*}}}*/ case DS_YMD:/*{{{*/ if (start) { start->tm_mon = m2-1; start->tm_mday = v3; start->tm_year = year_fix(v1); } if (end) { end->tm_mon = m2-1; end->tm_mday = v3; end->tm_year = year_fix(v1); } break; /*}}}*/ case DS_MY:/*{{{*/ if (start) { start->tm_year = year_fix(v3); start->tm_mon = m2 - 1; start->tm_mday = 1; } if (end) { end->tm_year = year_fix(v3); end->tm_mon = m2 - 1; end->tm_mday = last_day(end->tm_mon, end->tm_year); } break; /*}}}*/ case DS_YM:/*{{{*/ if (start) { start->tm_year = year_fix(v1); start->tm_mon = m2 - 1; start->tm_mday = 1; } if (end) { end->tm_year = year_fix(v1); end->tm_mon = m2 - 1; end->tm_mday = last_day(end->tm_mon, end->tm_year); } break;/*}}}*/ case DS_FAILURE: return -1; break; case DS_SCALED: assert(0); break; } } return 0; } /*}}}*/ int scan_date_string(char *in, time_t *start, int *has_start, time_t *end, int *has_end)/*{{{*/ { char *hyphen; time_t now; struct tm start_tm, end_tm; char *nullchar; int status; *has_start = *has_end = 0; nullchar = in; while (*nullchar) nullchar++; time(&now); start_tm = end_tm = *localtime(&now); start_tm.tm_hour = 0; start_tm.tm_min = 0; start_tm.tm_sec = 0; end_tm.tm_hour = 23; end_tm.tm_min = 59; end_tm.tm_sec = 59; hyphen = strchr(in, '-'); if (!hyphen) { /* Start and end are the same. 
*/ *has_start = *has_end = 1; status = scan_date_expr(in, nullchar, &start_tm, &end_tm); if (status) return status; *start = mktime(&start_tm); *end = mktime(&end_tm); return 0; } else { if (hyphen+1 < nullchar) { *has_end = 1; status = scan_date_expr(hyphen+1, nullchar, NULL, &end_tm); if (status) return status; *end = mktime(&end_tm); start_tm = end_tm; } if (hyphen > in) { *has_start = 1; status = scan_date_expr(in, hyphen, &start_tm, NULL); if (status) return status; *start = mktime(&start_tm); } } return 0; } /*}}}*/ #ifdef TEST static void check(char *in)/*{{{*/ { time_t start, end; int has_start, has_end; int result; result = scan_date_string(in, &start, &has_start, &end, &has_end); if (result) printf("Conversion for <%s> failed\n", in); else { char buf1[128], buf2[128]; /* An open-ended range leaves one side unset, so show a placeholder for it */ strcpy(buf1, "(open)"); strcpy(buf2, "(open)"); if (has_start) strftime(buf1, 128, "%d-%b-%Y", localtime(&start)); if (has_end) strftime(buf2, 128, "%d-%b-%Y", localtime(&end)); printf("Computed range for <%s> : %s - %s\n", in, buf1, buf2); } } /*}}}*/ int main (int argc, char **argv)/*{{{*/ { check("2w-1w"); check("4m-1w"); check("2002-2003"); check("may2002-2003"); check("2002may-2003"); check("feb98-15may99"); check("feb98-15may1999"); check("2feb98-1y"); check("02feb98-1y"); check("970617-20010618"); return 0; } /*}}}*/ #endif mairix-master/dates.h000066400000000000000000000023271224450623700151220ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. Curnow 2002-2004 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. 
* * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #ifndef DATES_H #define DATES_H enum DATESCAN_TYPE { DS_FAILURE, DS_D, DS_Y, DS_YYMMDD, DS_SCALED, DS_M, DS_DM, DS_MD, DS_YM, DS_MY, DS_YMD, DS_DMY, }; extern int datescan_next_state(int current_state, int next_token); extern enum DATESCAN_TYPE datescan_exitval[]; #endif /* DATES_H */ mairix-master/datescan.nfa000066400000000000000000000055471224450623700161300ustar00rootroot00000000000000######################################################################### # # mairix - message index builder and finder for maildir folders. # # Copyright (C) Richard P. Curnow 2002-2004,2006 # # This program is free software; you can redistribute it and/or modify # it under the terms of version 2 of the GNU General Public License as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
# # ======================================================================= # NFA description for parsing dates # Stuff to pass through verbatim %{ #include "dates.h" %} Abbrev A = [a-zA-Z] BLOCK day { State in [12] ; [0-9] -> out [3] ; [01] -> out } # Match 2 digit year BLOCK year { State in [04-9] ; [0-9] -> out [3] ; [2-9] -> out } BLOCK month { State in A ; A ; A -> out } BLOCK scaled { State in [0-9] -> in, after_value State after_value A -> out } BLOCK ccyy { State in [1-9] ; [0-9] ; [0-9] ; [0-9] -> out } BLOCK main { State in [1-9] = DS_D <day:in->out> = DS_D <year:in->out> = DS_Y <ccyy:in->out> = DS_Y [0-9] ; [0-9] ; [0-9] ; [0-9] ; [0-9] ; [0-9] = DS_YYMMDD [0-9] ; [0-9] ; [0-9] ; [0-9] ; [0-9] ; [0-9] ; [0-9] ; [0-9] = DS_YYMMDD <scaled:in->out> = DS_SCALED <month:in->out> = DS_M [1-9] ; <month:in->out> = DS_DM <day:in->out> ; <month:in->out> = DS_DM <month:in->out> ; [1-9] = DS_MD <month:in->out> ; <day:in->out> = DS_MD <year:in->out> ; <month:in->out> = DS_YM <month:in->out> ; <year:in->out> = DS_MY <ccyy:in->out> ; <month:in->out> = DS_YM <month:in->out> ; <ccyy:in->out> = DS_MY <year:in->out> ; <month:in->out> ; [1-9] = DS_YMD <year:in->out> ; <month:in->out> ; <day:in->out> = DS_YMD [1-9] ; <month:in->out> ; <year:in->out> = DS_DMY <day:in->out> ; <month:in->out> ; <year:in->out> = DS_DMY <ccyy:in->out> ; <month:in->out> ; [1-9] = DS_YMD <ccyy:in->out> ; <month:in->out> ; <day:in->out> = DS_YMD [1-9] ; <month:in->out> ; <ccyy:in->out> = DS_DMY <day:in->out> ; <month:in->out> ; <ccyy:in->out> = DS_DMY } ATTR DS_D ATTR DS_Y ATTR DS_YYMMDD ATTR DS_SCALED ATTR DS_M ATTR DS_DM ATTR DS_MD ATTR DS_YM ATTR DS_MY ATTR DS_YMD ATTR DS_DMY DEFATTR DS_FAILURE TYPE "enum DATESCAN_TYPE" PREFIX datescan # vim:ft=txt:et:sw=4:sts=4:ht=4 mairix-master/db.c000066400000000000000000001115061224450623700144020ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. Curnow 2002,2003,2004,2005,2006,2007,2009 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. 
* * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ /* Handle complete database */ #include "mairix.h" #include "reader.h" #include #include #include #include struct sortable_token {/*{{{*/ char *text; int index; }; /*}}}*/ static int compare_sortable_tokens(const void *a, const void *b)/*{{{*/ { const struct sortable_token *aa = (const struct sortable_token *) a; const struct sortable_token *bb = (const struct sortable_token *) b; int foo; foo = strcmp(aa->text, bb->text); if (foo) { return foo; } else { if (aa->index < bb->index) return -1; else if (aa->index > bb->index) return +1; else return 0; } } /*}}}*/ static void check_toktable_enc_integrity(int n_msgs, struct toktable *table)/*{{{*/ { /* FIXME : Check reachability of tokens that are displaced from their natural * hash bucket (if deletions have occurred during purge). 
*/ int idx, incr; int i, k; unsigned char *j, *last_char; int broken_chains = 0; struct sortable_token *sort_list; int any_duplicates; for (i=0; i<table->size; i++) { struct token *tok = table->tokens[i]; if (tok) { idx = 0; incr = 0; last_char = tok->match0.msginfo + tok->match0.n; for (j = tok->match0.msginfo; j < last_char; ) { incr = read_increment(&j); idx += incr; } if (idx != tok->match0.highest) { fprintf(stderr, "broken encoding chain for token <%s>, highest=%ld\n", tok->text, tok->match0.highest); fflush(stderr); broken_chains = 1; } if (idx >= n_msgs) { fprintf(stderr, "end of chain higher than number of message paths (%d) for token <%s>\n", n_msgs, tok->text); fflush(stderr); broken_chains = 1; } } } assert(!broken_chains); /* Check there are no duplicated tokens in the table. */ sort_list = new_array(struct sortable_token, table->n); k = 0; for (i=0; i<table->size; i++) { struct token *tok = table->tokens[i]; if (tok) { sort_list[k].text = new_string(tok->text); sort_list[k].index = i; k++; } } assert(k == table->n); qsort(sort_list, table->n, sizeof(struct sortable_token), compare_sortable_tokens); /* Check for uniqueness of neighbouring token texts */ any_duplicates = 0; for (i=0; i<(table->n - 1); i++) { if (!strcmp(sort_list[i].text, sort_list[i+1].text)) { fprintf(stderr, "Token table contains duplicated token %s at indices %d and %d\n", sort_list[i].text, sort_list[i].index, sort_list[i+1].index); any_duplicates = 1; } } /* release */ for (i=0; i<table->n; i++) { free(sort_list[i].text); } free(sort_list); if (any_duplicates) { fprintf(stderr, "Token table contained duplicate entries, aborting\n"); assert(0); } } /*}}}*/ static int compare_strings(const void *a, const void *b)/*{{{*/ { const char **aa = (const char **) a; const char **bb = (const char **) b; return strcmp(*aa, *bb); } /*}}}*/ static void check_message_path_integrity(struct database *db)/*{{{*/ { /* TODO : for now only checks integrity of non-mbox paths. 
*/ /* Check there are no duplicates */ int i; int n; int has_duplicate = 0; char **paths; paths = new_array(char *, db->n_msgs); for (i=0, n=0; i<db->n_msgs; i++) { switch (db->type[i]) { case MTY_DEAD: case MTY_MBOX: break; case MTY_FILE: paths[n++] = db->msgs[i].src.mpf.path; break; } } qsort(paths, n, sizeof(char *), compare_strings); for (i=1; i<n; i++) { if (!strcmp(paths[i-1], paths[i])) { fprintf(stderr, "Path <%s> repeated\n", paths[i]); has_duplicate = 1; } } fflush(stderr); assert(!has_duplicate); free(paths); return; } /*}}}*/ void check_database_integrity(struct database *db)/*{{{*/ { if (verbose) fprintf(stderr, "Checking message path integrity\n"); check_message_path_integrity(db); /* Just check encoding chains for now */ if (verbose) fprintf(stderr, "Checking to\n"); check_toktable_enc_integrity(db->n_msgs, db->to); if (verbose) fprintf(stderr, "Checking cc\n"); check_toktable_enc_integrity(db->n_msgs, db->cc); if (verbose) fprintf(stderr, "Checking from\n"); check_toktable_enc_integrity(db->n_msgs, db->from); if (verbose) fprintf(stderr, "Checking subject\n"); check_toktable_enc_integrity(db->n_msgs, db->subject); if (verbose) fprintf(stderr, "Checking body\n"); check_toktable_enc_integrity(db->n_msgs, db->body); if (verbose) fprintf(stderr, "Checking attachment_name\n"); check_toktable_enc_integrity(db->n_msgs, db->attachment_name); } /*}}}*/ struct database *new_database(unsigned int hash_key)/*{{{*/ { struct database *result = new(struct database); struct timeval tv; pid_t pid; result->to = new_toktable(); result->cc = new_toktable(); result->from = new_toktable(); result->subject = new_toktable(); result->body = new_toktable(); result->attachment_name = new_toktable(); result->msg_ids = new_toktable2(); if ( hash_key == CREATE_RANDOM_DATABASE_HASH ) { gettimeofday(&tv, NULL); pid = getpid(); hash_key = tv.tv_sec ^ (pid ^ (tv.tv_usec << 15)); } result->hash_key = hash_key; result->msgs = NULL; result->type = NULL; result->n_msgs = 0; result->max_msgs = 0; result->mboxen = NULL; result->n_mboxen = 0; result->max_mboxen = 0; 
return result; } /*}}}*/ void free_database(struct database *db)/*{{{*/ { int i; free_toktable(db->to); free_toktable(db->cc); free_toktable(db->from); free_toktable(db->subject); free_toktable(db->body); free_toktable(db->attachment_name); free_toktable2(db->msg_ids); if (db->msgs) { for (i=0; i<db->n_msgs; i++) { switch (db->type[i]) { case MTY_DEAD: break; case MTY_MBOX: break; case MTY_FILE: assert(db->msgs[i].src.mpf.path); free(db->msgs[i].src.mpf.path); break; } } free(db->msgs); free(db->type); } free(db); } /*}}}*/ static int get_max (int a, int b) {/*{{{*/ return (a > b) ? a : b; } /*}}}*/ static void import_toktable(char *data, unsigned int hash_key, int n_msgs, struct toktable_db *in, struct toktable *out)/*{{{*/ { int n, size, i; n = in->n; size = 1; while (size < n) size <<= 1; size <<= 1; /* safe hash table size */ out->size = size; out->mask = size - 1; out->n = n; out->tokens = new_array(struct token *, size); memset(out->tokens, 0, size * sizeof(struct token *)); out->hwm = (n + size) >> 1; for (i=0; i<n; i++) { unsigned int hash; int index; char *text; unsigned char *enc; unsigned char *j; int enc_len, enc_hi; int idx, incr; struct token *nt; enc = (unsigned char *) data + in->enc_offsets[i]; idx = 0; for (j = enc; *j != 0xff; ) { incr = read_increment(&j); idx += incr; } enc_len = j - enc; enc_hi = idx; text = data + in->tok_offsets[i]; hash = hashfn((unsigned char *) text, strlen(text), hash_key); nt = new(struct token); nt->hashval = hash; nt->text = new_string(text); /* Allow a bit of headroom for adding more entries later */ nt->match0.max = get_max(16, enc_len + (enc_len >> 1)); nt->match0.n = enc_len; nt->match0.highest = enc_hi; assert(nt->match0.highest < n_msgs); nt->match0.msginfo = new_array(unsigned char, nt->match0.max); memcpy(nt->match0.msginfo, enc, nt->match0.n); index = hash & out->mask; while (out->tokens[index]) { /* Audit to look for corrupt database with multiple entries for the same * string. */ if (!strcmp(nt->text, out->tokens[index]->text)) { fprintf(stderr, "\n!!! 
Corrupt token table found in database, token <%s> duplicated, aborting\n", nt->text); fprintf(stderr, " Delete the database file and rebuild from scratch as a workaround\n"); /* No point going on - need to find out why the database got corrupted * in the 1st place. Workaround for user - rebuild database from * scratch by deleting it then rerunning. */ unlock_and_exit(1); } ++index; index &= out->mask; } out->tokens[index] = nt; } } /*}}}*/ static void import_toktable2(char *data, unsigned int hash_key, int n_msgs, struct toktable2_db *in, struct toktable2 *out)/*{{{*/ { int n, size, i; n = in->n; size = 1; while (size < n) size <<= 1; size <<= 1; /* safe hash table size */ out->size = size; out->mask = size - 1; out->n = n; out->tokens = new_array(struct token2 *, size); memset(out->tokens, 0, size * sizeof(struct token2 *)); out->hwm = (n + size) >> 1; for (i=0; i<n; i++) { unsigned int hash, index; char *text; unsigned char *enc0, *enc1, *j; int enc0_len, enc0_hi; int enc1_len, enc1_hi; int idx, incr; struct token2 *nt; /*{{{ do enc0*/ enc0 = (unsigned char *) data + in->enc0_offsets[i]; idx = 0; for (j = enc0; *j != 0xff; ) { incr = read_increment(&j); idx += incr; } enc0_len = j - enc0; enc0_hi = idx; /*}}}*/ /*{{{ do enc1*/ enc1 = (unsigned char *) data + in->enc1_offsets[i]; idx = 0; for (j = enc1; *j != 0xff; ) { incr = read_increment(&j); idx += incr; } enc1_len = j - enc1; enc1_hi = idx; /*}}}*/ text = data + in->tok_offsets[i]; hash = hashfn((unsigned char *) text, strlen(text), hash_key); nt = new(struct token2); nt->hashval = hash; nt->text = new_string(text); /* Allow a bit of headroom for adding more entries later */ /*{{{ set up match0 chain */ nt->match0.max = get_max(16, enc0_len + (enc0_len >> 1)); nt->match0.n = enc0_len; nt->match0.highest = enc0_hi; assert(nt->match0.highest < n_msgs); nt->match0.msginfo = new_array(unsigned char, nt->match0.max); memcpy(nt->match0.msginfo, enc0, nt->match0.n); /*}}}*/ /*{{{ set up match1 chain */ nt->match1.max = get_max(16, enc1_len + (enc1_len >> 1)); nt->match1.n = enc1_len; nt->match1.highest = enc1_hi; assert(nt->match1.highest < n_msgs); nt->match1.msginfo = new_array(unsigned char, nt->match1.max); 
memcpy(nt->match1.msginfo, enc1, nt->match1.n); /*}}}*/ index = hash & out->mask; while (out->tokens[index]) { ++index; index &= out->mask; } out->tokens[index] = nt; } } /*}}}*/ struct database *new_database_from_file(char *db_filename, int do_integrity_checks)/*{{{*/ { /* Read existing database from file for doing incremental update */ struct database *result; struct read_db *input; int i, n, N; result = new_database( CREATE_RANDOM_DATABASE_HASH ); input = open_db(db_filename); if (!input) { /* Nothing to initialise */ if (verbose) printf("Database file was empty, creating a new database\n"); return result; } /* Build pathname information */ n = result->n_msgs = input->n_msgs; result->max_msgs = input->n_msgs; /* let it be extended as-and-when */ result->msgs = new_array(struct msgpath, n); result->type = new_array(enum message_type, n); result->hash_key = input->hash_key; /* Set up mbox structures */ N = result->n_mboxen = result->max_mboxen = input->n_mboxen; result->mboxen = N ? (new_array(struct mbox, N)) : NULL; for (i=0; i<N; i++) { int nn; if (input->mbox_paths_table[i]) { result->mboxen[i].path = new_string(input->data + input->mbox_paths_table[i]); } else { /* mbox is dead. */ result->mboxen[i].path = NULL; } result->mboxen[i].file_mtime = input->mbox_mtime_table[i]; result->mboxen[i].file_size = input->mbox_size_table[i]; nn = result->mboxen[i].n_msgs = input->mbox_entries_table[i]; result->mboxen[i].max_msgs = nn; result->mboxen[i].start = new_array(off_t, nn); result->mboxen[i].len = new_array(size_t, nn); result->mboxen[i].check_all = new_array(checksum_t, nn); /* Copy the entire checksum table in one go. 
*/ memcpy(result->mboxen[i].check_all, input->data + input->mbox_checksum_table[i], nn * sizeof(checksum_t)); result->mboxen[i].n_so_far = 0; } for (i=0; itype[i] = MTY_DEAD; break; case DB_MSG_FILE: result->type[i] = MTY_FILE; result->msgs[i].src.mpf.path = new_string(input->data + input->path_offsets[i]); result->msgs[i].src.mpf.mtime = input->mtime_table[i]; result->msgs[i].src.mpf.size = input->size_table[i]; break; case DB_MSG_MBOX: { unsigned int mbi, msgi; int n; struct mbox *mb; result->type[i] = MTY_MBOX; decode_mbox_indices(input->path_offsets[i], &mbi, &msgi); result->msgs[i].src.mbox.file_index = mbi; mb = &result->mboxen[mbi]; assert(mb->n_so_far == msgi); n = mb->n_so_far; result->msgs[i].src.mbox.msg_index = n; mb->start[n] = input->mtime_table[i]; mb->len[n] = input->size_table[i]; ++mb->n_so_far; } break; } result->msgs[i].seen = (input->msg_type_and_flags[i] & FLAG_SEEN) ? 1:0; result->msgs[i].replied = (input->msg_type_and_flags[i] & FLAG_REPLIED) ? 1:0; result->msgs[i].flagged = (input->msg_type_and_flags[i] & FLAG_FLAGGED) ? 
1:0; result->msgs[i].date = input->date_table[i]; result->msgs[i].tid = input->tid_table[i]; } import_toktable(input->data, input->hash_key, result->n_msgs, &input->to, result->to); import_toktable(input->data, input->hash_key, result->n_msgs, &input->cc, result->cc); import_toktable(input->data, input->hash_key, result->n_msgs, &input->from, result->from); import_toktable(input->data, input->hash_key, result->n_msgs, &input->subject, result->subject); import_toktable(input->data, input->hash_key, result->n_msgs, &input->body, result->body); import_toktable(input->data, input->hash_key, result->n_msgs, &input->attachment_name, result->attachment_name); import_toktable2(input->data, input->hash_key, result->n_msgs, &input->msg_ids, result->msg_ids); close_db(input); if (do_integrity_checks) { check_database_integrity(result); } return result; } /*}}}*/ static void add_angled_terms(int file_index, unsigned int hash_key, struct toktable2 *table, int add_to_chain1, char *s)/*{{{*/ { char *left, *right; if (s) { left = strchr(s, '<'); while (left) { right = strchr(left, '>'); if (right) { *right = '\0'; add_token2_in_file(file_index, hash_key, left+1, table, add_to_chain1); *right = '>'; /* restore */ } else { break; } left = strchr(right, '<'); } } } /*}}}*/ /* Macro for what characters can make up token strings. The following characters have special meanings: 0x2b + 0x2d - 0x2e . 0x40 @ 0x5f _ since they can occur within email addresses and message IDs when considered as a whole rather than as individual words. Underscore (0x5f) is considered a word-character always too. 
*/ static unsigned char special_table[256] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 00-0f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 10-1f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 2, 0, /* 20-2f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 30-3f */ 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 40-4f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, /* 50-5f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 60-6f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 70-7f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 80-8f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 90-9f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* a0-af */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* b0-bf */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* c0-cf */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* d0-df */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* e0-ef */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 /* f0-ff */ }; #if 0 #define CHAR_VALID(x,mask) (isalnum((unsigned char) x) || (special_table[(unsigned int)(unsigned char) x] & mask)) #endif static inline int char_valid_p(char x, unsigned int mask)/*{{{*/ { unsigned char xx = (unsigned char) x; if (isalnum(xx)) return 1; else if (special_table[(unsigned int) xx] & mask) return 1; else return 0; } /*}}}*/ static void tokenise_string(int file_index, unsigned int hash_key, struct toktable *table, char *data, int match_mask)/*{{{*/ { char *ss, *es, old_es; ss = data; for (;;) { while (*ss && !char_valid_p(*ss,match_mask)) ss++; if (!*ss) break; es = ss + 1; while (*es && char_valid_p(*es,match_mask)) es++; /* deal with token [ss,es) */ old_es = *es; *es = '\0'; /* FIXME: Ought to do this by passing start and length - clean up later */ add_token_in_file(file_index, hash_key, ss, table); *es = old_es; if (!*es) break; ss = es; } } /*}}}*/ static void tokenise_html_string(int file_index, unsigned int hash_key, struct toktable *table, char 
*data)/*{{{*/ { char *ss, *es, old_es; /* FIXME : Probably want to rewrite this as an explicit FSM */ ss = data; for (;;) { /* Assume < and > are never valid token characters ! */ while (*ss && !char_valid_p(*ss, 1)) { if (*ss++ == '<') { /* Skip over HTML tag */ while (*ss && (*ss != '>')) ss++; } } if (!*ss) break; es = ss + 1; while (*es && char_valid_p(*es, 1)) es++; /* deal with token [ss,es) */ old_es = *es; *es = '\0'; /* FIXME: Ought to do this by passing start and length - clean up later */ add_token_in_file(file_index, hash_key, ss, table); *es = old_es; if (!*es) break; ss = es; } } /*}}}*/ void tokenise_message(int file_index, struct database *db, struct rfc822 *msg)/*{{{*/ { struct attachment *a; /* Match on whole addresses in these headers as well as the individual words */ if (msg->hdrs.to) { tokenise_string(file_index, db->hash_key, db->to, msg->hdrs.to, 1); tokenise_string(file_index, db->hash_key, db->to, msg->hdrs.to, 2); } if (msg->hdrs.cc) { tokenise_string(file_index, db->hash_key, db->cc, msg->hdrs.cc, 1); tokenise_string(file_index, db->hash_key, db->cc, msg->hdrs.cc, 2); } if (msg->hdrs.from) { tokenise_string(file_index, db->hash_key, db->from, msg->hdrs.from, 1); tokenise_string(file_index, db->hash_key, db->from, msg->hdrs.from, 2); } if (msg->hdrs.subject) tokenise_string(file_index, db->hash_key, db->subject, msg->hdrs.subject, 1); for (a=msg->atts.next; a!=&msg->atts; a=a->next) { switch (a->ct) { case CT_TEXT_PLAIN: tokenise_string(file_index, db->hash_key, db->body, a->data.normal.bytes, 1); break; case CT_TEXT_HTML: tokenise_html_string(file_index, db->hash_key, db->body, a->data.normal.bytes); break; case CT_MESSAGE_RFC822: /* Just recurse for now - maybe we should have separate token tables * for tokens occurring in embedded messages? */ if (a->data.rfc822) { tokenise_message(file_index, db, a->data.rfc822); } break; default: /* Don't do anything - unknown text format or some nasty binary stuff. 
* In future, we could have all kinds of 'plug-ins' here, e.g. * something that can parse PDF to get the basic text strings out of * the pages? */ break; } if (a->filename) { add_token_in_file(file_index, db->hash_key, a->filename, db->attachment_name); } } /* Deal with threading information */ add_angled_terms(file_index, db->hash_key, db->msg_ids, 1, msg->hdrs.message_id); add_angled_terms(file_index, db->hash_key, db->msg_ids, 0, msg->hdrs.in_reply_to); add_angled_terms(file_index, db->hash_key, db->msg_ids, 0, msg->hdrs.references); } /*}}}*/ static void scan_maildir_flags(struct msgpath *m)/*{{{*/ { const char *p, *start; start = m->src.mpf.path; m->seen = 0; m->replied = 0; m->flagged = 0; for (p=start; *p; p++) {} for (p--; (p >= start) && ((*p) != ':'); p--) {} if (p >= start) { if (!strncmp(p, ":2,", 3)) { p += 3; while (*p) { switch (*p) { case 'F': m->flagged = 1; break; case 'R': m->replied = 1; break; case 'S': m->seen = 1; break; default: break; } p++; } } } } /*}}}*/ static void scan_new_messages(struct database *db, int start_at)/*{{{*/ { int i; for (i=start_at; i<db->n_msgs; i++) { struct rfc822 *msg = NULL; int len = strlen(db->msgs[i].src.mpf.path); if (len > 10 && !strcmp(db->msgs[i].src.mpf.path + len - 11, "/.gitignore")) continue; switch (db->type[i]) { case MTY_DEAD: assert(0); break; case MTY_MBOX: assert(0); /* Should never get here - mbox messages are scanned elsewhere. 
*/ break; case MTY_FILE: if (verbose) fprintf(stderr, "Scanning <%s>\n", db->msgs[i].src.mpf.path); msg = make_rfc822(db->msgs[i].src.mpf.path); break; } if(msg) { db->msgs[i].date = msg->hdrs.date; scan_maildir_flags(&db->msgs[i]); tokenise_message(i, db, msg); free_rfc822(msg); } else fprintf(stderr, "Skipping %s (could not parse message)\n", db->msgs[i].src.mpf.path); } } /*}}}*/ static inline void set_bit(unsigned long *x, int n)/*{{{*/ { int set; unsigned long mask; set = (n >> 5); mask = (1UL << (n & 31)); x[set] |= mask; } /*}}}*/ static inline int isset_bit(unsigned long *x, int n)/*{{{*/ { int set; unsigned long mask; set = (n >> 5); mask = (1UL << (n & 31)); return (x[set] & mask) ? 1 : 0; } /*}}}*/ static int find_base(int *table, int index) {/*{{{*/ int a = index; /* TODO : make this compress the path lengths down to the base entry */ while (table[a] != a) { a = table[a]; } return a; } /*}}}*/ static void find_threading(struct database *db)/*{{{*/ { /* ix is a table mapping path array index to the lowest path array index that * is known to share at least one message ID in its hdrs somewhere (i.e. 
they * must be in the same thread) */ int *ix; int i, m, np, sm; int next_tid; np = db->n_msgs; sm = db->msg_ids->size; ix = new_array(int, np); for (i=0; imsg_ids->tokens[m]; if (tok) { unsigned char *j = tok->match0.msginfo; unsigned char *last_char = j + tok->match0.n; int cur = 0, incr, first=1; int new_base=-1, old_base; while (j < last_char) { incr = read_increment(&j); cur += incr; if (first) { new_base = find_base(ix, cur); first = 0; } else { old_base = find_base(ix, cur); if (old_base < new_base) { ix[new_base] = old_base; new_base = old_base; } else if (old_base > new_base) { assert(new_base != -1); ix[old_base] = new_base; } } } } } /* Now make each entry point directly to its base */ for (i=0; imsgs[i].tid = next_tid++; } else { db->msgs[i].tid = db->msgs[ix[i]].tid; } } free(ix); return; } /*}}}*/ static int lookup_msgpath(struct msgpath *sorted_paths, int n_msgs, char *key)/*{{{*/ { /* Implement bisection search */ int l, h, m, r; l = 0, h = n_msgs; m = -1; while (h > l) { m = (h + l) >> 1; /* Should only get called on 'file' type messages - TBC */ r = strcmp(sorted_paths[m].src.mpf.path, key); if (r == 0) break; if (l == m) return -1; if (r > 0) h = m; else l = m; } return m; } /*}}}*/ void maybe_grow_message_arrays(struct database *db)/*{{{*/ { if (db->n_msgs == db->max_msgs) { if (db->max_msgs <= 128) { db->max_msgs = 256; } else { db->max_msgs += (db->max_msgs >> 1); } db->msgs = grow_array(struct msgpath, db->max_msgs, db->msgs); db->type = grow_array(enum message_type, db->max_msgs, db->type); } } /*}}}*/ static void add_msg_path(struct database *db, char *path, time_t mtime, size_t message_size)/*{{{*/ { maybe_grow_message_arrays(db); db->type[db->n_msgs] = MTY_FILE; db->msgs[db->n_msgs].src.mpf.path = new_string(path); db->msgs[db->n_msgs].src.mpf.mtime = mtime; db->msgs[db->n_msgs].src.mpf.size = message_size; ++db->n_msgs; } /*}}}*/ static int do_stat(struct msgpath *mp)/*{{{*/ { struct stat sb; int status; status = stat(mp->src.mpf.path, 
&sb); if ((status < 0) || !S_ISREG(sb.st_mode)) { return 0; } else { mp->src.mpf.mtime = sb.st_mtime; mp->src.mpf.size = sb.st_size; return 1; } } /*}}}*/ int update_database(struct database *db, struct msgpath *sorted_paths, int n_msgs, int do_fast_index)/*{{{*/ { /* The incoming list must be sorted into order, to make binary searching * possible. We search for each existing path in the incoming sorted array. * If the date differs, or the file no longer exists, the existing database * entry for that file is nulled. (These are only recovered if the database * is actively compressed.) If the date differed, a new entry for the file * is put at the end of the list. Similarly, any new file goes at the end. * These new entries are all rescanned to find tokens and add them to the * database. */ char *file_in_db, *file_in_new_list; int matched_index; int i, new_entries_start_at; int any_new, n_newly_pruned, n_already_dead; int status; file_in_db = new_array(char, n_msgs); file_in_new_list = new_array(char, db->n_msgs); bzero(file_in_db, n_msgs); bzero(file_in_new_list, db->n_msgs); n_already_dead = 0; n_newly_pruned = 0; for (i=0; i<db->n_msgs; i++) { switch (db->type[i]) { case MTY_FILE: matched_index = lookup_msgpath(sorted_paths, n_msgs, db->msgs[i].src.mpf.path); if (matched_index >= 0) { if (do_fast_index) { /* Assume the presence of a matching path is good enough without * even bothering to stat the file that's there now. */ file_in_db[matched_index] = 1; file_in_new_list[i] = 1; } else { status = do_stat(sorted_paths + matched_index); if (status) { if (sorted_paths[matched_index].src.mpf.mtime == db->msgs[i].src.mpf.mtime) { /* Treat stale files as though the path has changed. */ file_in_db[matched_index] = 1; file_in_new_list[i] = 1; } } else { /* This path will get treated as dead, and be re-stated below. * When that stat fails, the path won't get added to the db. */ } } } break; case MTY_MBOX: /* Nothing to do on this pass. 
*/ break; case MTY_DEAD: break; } } /* Add new entries to database */ new_entries_start_at = db->n_msgs; for (i=0; i<db->n_msgs; i++) { /* Weed dead entries */ switch (db->type[i]) { case MTY_FILE: if (!file_in_new_list[i]) { free(db->msgs[i].src.mpf.path); db->msgs[i].src.mpf.path = NULL; db->type[i] = MTY_DEAD; ++n_newly_pruned; } break; case MTY_MBOX: { int msg_index, file_index, number_valid; int mbox_valid; msg_index = db->msgs[i].src.mbox.msg_index; file_index = db->msgs[i].src.mbox.file_index; assert (file_index < db->n_mboxen); mbox_valid = (db->mboxen[file_index].path) ? 1 : 0; number_valid = db->mboxen[file_index].n_old_msgs_valid; if (!mbox_valid || (msg_index >= number_valid)) { db->type[i] = MTY_DEAD; ++n_newly_pruned; } } break; case MTY_DEAD: /* already dead */ ++n_already_dead; break; } } if (verbose) { fprintf(stderr, "%d newly dead messages, %d messages now dead in total\n", n_newly_pruned, n_newly_pruned+n_already_dead); } any_new = 0; for (i=0; i 0); } /*}}}*/ static void recode_encoding(struct matches *m, int *new_idx)/*{{{*/ { unsigned char *new_enc, *old_enc; unsigned char *j, *last_char; int incr, idx, n_idx; old_enc = m->msginfo; j = old_enc; last_char = old_enc + m->n; new_enc = new_array(unsigned char, m->max); /* Probably not bigger than this. */ m->n = 0; m->highest = 0; m->msginfo = new_enc; idx = 0; while (j < last_char) { incr = read_increment(&j); idx += incr; n_idx = new_idx[idx]; if (n_idx >= 0) { check_and_enlarge_encoding(m); insert_index_on_encoding(m, n_idx); } } free(old_enc); } /*}}}*/ static void recode_toktable(struct toktable *tbl, int *new_idx)/*{{{*/ { /* Re-encode the vectors according to the new path indices */ int i; int any_dead = 0; int any_moved, pass; for (i=0; i<tbl->size; i++) { struct token *tok = tbl->tokens[i]; if (tok) { recode_encoding(&tok->match0, new_idx); if (tok->match0.n == 0) { /* Delete this token. 
Gotcha - there may be tokens further on in the * array that didn't get their natural hash bucket due to collisions. * Need to shuffle such tokens up to guarantee that the buckets between * the natural one and the one where they are now are all occupied, to * prevent their lookups failing. */ #if 0 fprintf(stderr, "Token <%s> (bucket %d) no longer has files containing it, deleting\n", tok->text, i); #endif free_token(tok); tbl->tokens[i] = NULL; --tbl->n; /* Maintain number in use counter */ any_dead = 1; } } } if (any_dead) { /* Now close gaps. This has to be done in a second pass, otherwise we get a * problem with moving entries that need deleting back before the current scan point. */ pass = 1; for (;;) { int i; if (verbose) { fprintf(stderr, "Pass %d\n", pass); } any_moved = 0; for (i=0; i<tbl->size; i++) { if (tbl->tokens[i]) { int nat_bucket_i; nat_bucket_i = tbl->tokens[i]->hashval & tbl->mask; if (nat_bucket_i != i) { /* Find earliest bucket that we could move i to */ int j = nat_bucket_i; while (j != i) { if (!tbl->tokens[j]) { /* put it here */ #if 0 fprintf(stderr, "Moved <%s> from bucket %d to %d (natural bucket %d)\n", tbl->tokens[i]->text, i, j, nat_bucket_i); #endif tbl->tokens[j] = tbl->tokens[i]; tbl->tokens[i] = NULL; any_moved = 1; break; } else { j++; j &= tbl->mask; } } if (tbl->tokens[i]) { #if 0 fprintf(stderr, "NOT moved <%s> from bucket %d (natural bucket %d)\n", tbl->tokens[i]->text, i, nat_bucket_i); #endif } } } } if (!any_moved) break; pass++; } } } /*}}}*/ static void recode_toktable2(struct toktable2 *tbl, int *new_idx)/*{{{*/ { /* Re-encode the vectors according to the new path indices */ int i; int any_dead = 0; int any_moved, pass; for (i=0; i<tbl->size; i++) { struct token2 *tok = tbl->tokens[i]; if (tok) { recode_encoding(&tok->match0, new_idx); recode_encoding(&tok->match1, new_idx); if ((tok->match0.n == 0) && (tok->match1.n == 0)) { /* Delete this token. 
Gotcha - there may be tokens further on in the * array that didn't get their natural hash bucket due to collisions. * Need to shuffle such tokens up to guarantee that the buckets between * the natural one and the one where they are now are all occupied, to * prevent their lookups failing. */ #if 0 fprintf(stderr, "Token <%s> (bucket %d) no longer has files containing it, deleting\n", tok->text, i); #endif free_token2(tok); tbl->tokens[i] = NULL; --tbl->n; /* Maintain number in use counter */ any_dead = 1; } } } if (any_dead) { /* Now close gaps. This has to be done in a second pass, otherwise we get a * problem with moving entries that need deleting back before the current scan point. */ pass = 1; for (;;) { int i; if (verbose) { fprintf(stderr, "Pass %d\n", pass); } any_moved = 0; for (i=0; i<tbl->size; i++) { if (tbl->tokens[i]) { int nat_bucket_i; nat_bucket_i = tbl->tokens[i]->hashval & tbl->mask; if (nat_bucket_i != i) { /* Find earliest bucket that we could move i to */ int j = nat_bucket_i; while (j != i) { if (!tbl->tokens[j]) { /* put it here */ #if 0 fprintf(stderr, "Moved <%s> from bucket %d to %d (natural bucket %d)\n", tbl->tokens[i]->text, i, j, nat_bucket_i); #endif tbl->tokens[j] = tbl->tokens[i]; tbl->tokens[i] = NULL; any_moved = 1; break; } else { j++; j &= tbl->mask; } } if (tbl->tokens[i]) { #if 0 fprintf(stderr, "NOT moved <%s> from bucket %d (natural bucket %d)\n", tbl->tokens[i]->text, i, nat_bucket_i); #endif } } } } if (!any_moved) break; pass++; } } } /*}}}*/ int cull_dead_messages(struct database *db, int do_integrity_checks)/*{{{*/ { /* Return true if any culled */ int *new_idx, i, j, n_old; int any_culled = 0; /* Check db is OK before we start on this. (Check afterwards is done in the * writer.c code.) 
*/ if (do_integrity_checks) { check_database_integrity(db); } if (verbose) { fprintf(stderr, "Culling dead messages\n"); } n_old = db->n_msgs; new_idx = new_array(int, n_old); for (i=0, j=0; i<n_old; i++) { switch (db->type[i]) { case MTY_FILE: case MTY_MBOX: new_idx[i] = j++; break; case MTY_DEAD: new_idx[i] = -1; any_culled = 1; break; } } recode_toktable(db->to, new_idx); recode_toktable(db->cc, new_idx); recode_toktable(db->from, new_idx); recode_toktable(db->subject, new_idx); recode_toktable(db->body, new_idx); recode_toktable(db->attachment_name, new_idx); recode_toktable2(db->msg_ids, new_idx); /* And crunch down the filename table */ for (i=0, j=0; i<n_old; i++) { switch (db->type[i]) { case MTY_DEAD: break; case MTY_FILE: case MTY_MBOX: if (i > j) { db->msgs[j] = db->msgs[i]; db->type[j] = db->type[i]; } j++; break; } } db->n_msgs = j; free(new_idx); /* .. and cull dead mboxen */ cull_dead_mboxen(db); return any_culled; } /*}}}*/ mairix-master/dfasyn/000077500000000000000000000000001224450623700151315ustar00rootroot00000000000000mairix-master/dfasyn/.gitignore000066400000000000000000000000521224450623700171160ustar00rootroot00000000000000*.o dfasyn parse.[ch] parse.output scan.c mairix-master/dfasyn/COPYING000066400000000000000000000431031224450623700161650ustar00rootroot00000000000000 GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. 
(Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. 
The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. 
You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. 
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. 
However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. 
If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. 
If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. 
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. 
mairix-master/dfasyn/INSTALL000066400000000000000000000005461224450623700161670ustar00rootroot00000000000000There is no real configure mechanism (yet). To build the program make To install the program (perhaps as root) make prefix=/usr/local install or as yourself you might do make prefix=$HOME install or if your distribution puts manpages in /usr/share/man, you might do make prefix=/usr/local mandir=/usr/share/man install # vim:et:sw=4 mairix-master/dfasyn/Makefile000066400000000000000000000030701224450623700165710ustar00rootroot00000000000000# Makefile for NFA->DFA conversion utility # # Copyright (C) Richard P. Curnow 2000-2001,2003,2005,2006,2007 # This program is free software; you can redistribute it and/or modify # it under the terms of version 2 of the GNU General Public License as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
# CC=gcc #CFLAGS=-g -Wall #CFLAGS=-O2 -pg CFLAGS=-Wall prefix?=/usr/local bindir=$(prefix)/bin mandir?=$(prefix)/man man1dir=$(mandir)/man1 man5dir=$(mandir)/man5 OBJ = dfasyn.o parse.o scan.o \ tokens.o abbrevs.o charclass.o \ stimulus.o \ blocks.o states.o \ n2d.o expr.o evaluator.o \ tabcompr.o compdfa.o all : dfasyn install : all [ -d $(bindir) ] || mkdir -p $(bindir) [ -d $(man1dir) ] || mkdir -p $(man1dir) [ -d $(man5dir) ] || mkdir -p $(man5dir) cp dfasyn $(bindir) cp dfasyn.1 $(man1dir) cp dfasyn.5 $(man5dir) dfasyn : $(OBJ) $(CC) $(CFLAGS) -o dfasyn $(OBJ) parse.c parse.h : parse.y bison -v -d -o parse.c parse.y parse.o : parse.c dfasyn.h scan.c : scan.l flex -t -s scan.l > scan.c scan.o : scan.c parse.h dfasyn.h $(OBJ) : dfasyn.h clean: rm -f dfasyn *.o scan.c parse.c parse.h parse.output mairix-master/dfasyn/NEWS000066400000000000000000000001061224450623700156250ustar00rootroot00000000000000New in version 0.2 ================== * Added README and NEWS files mairix-master/dfasyn/README000066400000000000000000000007341224450623700160150ustar00rootroot00000000000000dfasyn is a tool for constructing state machines. The input language allows a lot of generality. For example, it allows repeated elements to be specified where the items have constraints between the end of one and the start of the next. (I could not find a way to define such an automaton in the lex/flex input language, which prompted the writing of the tool.) Currently, you must do a fair amount of work yourself to build a parser around the resulting state machine. mairix-master/dfasyn/abbrevs.c000066400000000000000000000036401224450623700167240ustar00rootroot00000000000000/*************************************** Handle state-related stuff ***************************************/ /* ********************************************************************** * Copyright (C) Richard P. 
Curnow 2000-2003,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #include "dfasyn.h" static struct Abbrev *abbrevtable=NULL; static int nabbrevs = 0; static int maxabbrevs = 0; static void grow_abbrevs(void)/*{{{*/ { maxabbrevs += 32; abbrevtable = resize_array(struct Abbrev, abbrevtable, maxabbrevs); } /*}}}*/ struct Abbrev * create_abbrev(const char *name, struct StimulusList *stimuli)/*{{{*/ { struct Abbrev *result; if (nabbrevs == maxabbrevs) { grow_abbrevs(); } result = abbrevtable + (nabbrevs++); result->lhs = new_string(name); result->stimuli = stimuli; return result; } /*}}}*/ struct Abbrev * lookup_abbrev(char *name)/*{{{*/ { int found = -1; int i; struct Abbrev *result = NULL; /* Scan table in reverse order. If a name has been redefined, make sure the most recent definition is picked up. */ for (i=nabbrevs-1; i>=0; i--) { if (!strcmp(abbrevtable[i].lhs, name)) { found = i; result = abbrevtable + found; break; } } return result; } /*}}}*/ mairix-master/dfasyn/blocks.c000066400000000000000000000106261224450623700165570ustar00rootroot00000000000000/*************************************** Handle blocks ***************************************/ /* ********************************************************************** * Copyright (C) Richard P. 
Curnow 2000-2003,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #include "dfasyn.h" static Block **blocks = NULL; static int nblocks = 0; static int maxblocks = 0; /* ================================================================= */ static void grow_blocks(void)/*{{{*/ { maxblocks += 32; blocks = resize_array(Block*, blocks, maxblocks); } /*}}}*/ static Block * create_block(char *name)/*{{{*/ { Block *result; int i; if (nblocks == maxblocks) { grow_blocks(); } #if 0 /* Not especially useful to show this */ if (verbose) { fprintf(stderr, " %s", name); } #endif result = blocks[nblocks++] = new(Block); result->name = new_string(name); for (i=0; istate_hash[i].states = NULL; result->state_hash[i].nstates = 0; result->state_hash[i].maxstates = 0; } result->states = NULL; result->nstates = result->maxstates = 0; result->eclo = NULL; result->subcount = 1; result->subblockcount = 1; return result; } /*}}}*/ Block * lookup_block(char *name, int create)/*{{{*/ { Block *found = NULL; int i; for (i=0; i<nblocks; i++) { if (!strcmp(blocks[i]->name, name)) { found = blocks[i]; break; } } switch (create) { case USE_OLD_MUST_EXIST: if (!found) { fprintf(stderr, "Could not find block '%s' to instantiate\n", name); exit(1); } break; case CREATE_MUST_NOT_EXIST: if (found) { fprintf(stderr, "Already have a block called '%s', cannot redefine\n", name);
exit(1); } else { found = create_block(name); } break; case CREATE_OR_USE_OLD: if (!found) { found = create_block(name); } break; } return found; } /*}}}*/ /* ================================================================= */ void instantiate_block(Block *curblock, char *block_name, char *instance_name)/*{{{*/ { Block *master = lookup_block(block_name, USE_OLD_MUST_EXIST); char namebuf[1024]; int i; for (i=0; i<master->nstates; i++) { State *s = master->states[i]; State *new_state; TransList *tl; Stringlist *sl, *ex; strcpy(namebuf, instance_name); strcat(namebuf, "."); strcat(namebuf, s->name); /* In perverse circumstances, we might already have a state called this */ new_state = lookup_state(curblock, namebuf, CREATE_OR_USE_OLD); for (tl=s->transitions; tl; tl=tl->next) { TransList *new_tl = new(TransList); new_tl->type = tl->type; /* Might cause some dangling ref problem later... */ new_tl->x = tl->x; strcpy(namebuf, instance_name); strcat(namebuf, "."); strcat(namebuf, tl->ds_name); new_tl->ds_name = new_string(namebuf); new_tl->ds_ref = NULL; new_tl->next = new_state->transitions; new_state->transitions = new_tl; } /*{{{ Copy state tags */ ex = NULL; for (sl=s->tags; sl; sl=sl->next) { Stringlist *new_sl = new(Stringlist); new_sl->string = sl->string; new_sl->next = ex; ex = new_sl; } new_state->tags = ex; /*}}}*/ /* **DON'T** COPY ENTRIES : these are deliberately dropped if they occur * in a block that gets instantiated elsewhere.
*/ } } /*}}}*/ /* ================================================================= */ InlineBlock *create_inline_block(char *type, char *in, char *out)/*{{{*/ { InlineBlock *result; result = new(InlineBlock); result->type = new_string(type); result->in = new_string(in); result->out = new_string(out); return result; } /*}}}*/ mairix-master/dfasyn/charclass.c000066400000000000000000000176151224450623700172520ustar00rootroot00000000000000/*************************************** Handle character classes ***************************************/ /* ********************************************************************** * Copyright (C) Richard P. Curnow 2001-2003,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
* ********************************************************************** */ #include "dfasyn.h" #include struct cc_list { struct cc_list *next; CharClass *cc; }; static struct cc_list *cc_list = NULL; static short mapping[256]; int n_charclasses; static char *strings[256]; static void set_bit(unsigned long *bitmap, int entry)/*{{{*/ { int i, j, mask; i = (entry >> 5); j = entry & 31; mask = 1<> 5); j = entry & 31; mask = 1<> 5); j = entry & 31; mask = 1<is_used = 0; memset(result->char_bitmap, 0, sizeof(result->char_bitmap)); memset(result->group_bitmap, 0, sizeof(result->group_bitmap)); return result; } /*}}}*/ void free_charclass(CharClass *what)/*{{{*/ { free(what); } /*}}}*/ void add_charclass_to_list(CharClass *cc)/*{{{*/ { /* Add the cc to the master list for later processing. */ struct cc_list *elt = new(struct cc_list); elt->next = cc_list; elt->cc = cc; cc_list = elt; } /*}}}*/ void add_singleton_to_charclass(CharClass *towhat, char thechar)/*{{{*/ { int x; x = (int)(unsigned char) thechar; set_bit(towhat->char_bitmap, x); } /*}}}*/ void add_range_to_charclass(CharClass *towhat, char start, char end)/*{{{*/ { int sx, ex, t; sx = (int)(unsigned char) start; ex = (int)(unsigned char) end; if (sx > ex) { t = sx, sx = ex, ex = t; } for (t=sx; t<=ex; t++) { set_bit(towhat->char_bitmap, t); } } /*}}}*/ void invert_charclass(CharClass *what)/*{{{*/ { int i; for (i=0; ichar_bitmap[i] ^= 0xffffffffUL; } } /*}}}*/ void diff_charclasses(CharClass *left, CharClass *right)/*{{{*/ { /* Compute set difference */ int i; for (i=0; ichar_bitmap[i] &= ~(right->char_bitmap[i]); } } /*}}}*/ static char *emit_char (char *p, int i)/*{{{*/ { if (i == '\\') { *p++ = '\\'; *p++ = '\\'; } else if (isprint(i) && (i != '-')) { *p++ = i; } else if (i == '\n') { *p++ = '\\'; *p++ = 'n'; } else if (i == '\r') { *p++ = '\\'; *p++ = 'r'; } else if (i == '\f') { *p++ = '\\'; *p++ = 'f'; } else if (i == '\t') { *p++ = '\\'; *p++ = 't'; } else { p += sprintf(p, "\\%03o", i); } return p; } 
/*}}}*/ static void generate_string(int idx, const unsigned long *x)/*{{{*/ { int i, j; char buffer[4096]; char *p; p = buffer; *p++ = '['; /* Force '-' to be shown at the start. */ i = 0; do { while ((i < 256) && !cc_test_bit(x,i)) i++; if (i>=256) break; j = i + 1; while ((j < 256) && cc_test_bit(x,j)) j++; j--; p = emit_char(p, i); if (j == (i + 1)) { p = emit_char(p, j); } else if (j > (i + 1)) { *p++ = '-'; p = emit_char(p, j); } i = j + 1; } while (i < 256); *p++ = ']'; *p = 0; strings[idx] = new_string(buffer); return; } /*}}}*/ static void combine(unsigned long *into, const unsigned long *with)/*{{{*/ { int i; for (i=0; i>= 16; if (!(val & 0x00ff)) pos += 8, val >>= 8; if (!(val & 0x000f)) pos += 4, val >>= 4; if (!(val & 0x0003)) pos += 2, val >>= 2; if (!(val & 0x0001)) pos += 1; return (i << 5) + pos; } } return -1; } /*}}}*/ static void mark_used_in_block(const Block *b)/*{{{*/ { int i; for (i=0; instates; i++) { const State *s = b->states[i]; const TransList *tl; for (tl=s->transitions; tl; tl=tl->next) { switch (tl->type) { case TT_CHARCLASS: tl->x.char_class->is_used = 1; break; default: break; } } } } /*}}}*/ static void reduce_list(void)/*{{{*/ { struct cc_list *ccl, *next_ccl; ccl = cc_list; cc_list = NULL; while (ccl) { next_ccl = ccl->next; if (ccl->cc->is_used) { ccl->next = cc_list; cc_list = ccl; } else { free(ccl->cc); free(ccl); } ccl = next_ccl; } } /*}}}*/ void split_charclasses(const Block *b)/*{{{*/ { unsigned long cc_union[ULONGS_PER_CC]; struct cc_list *elt; int i; int any_left; mark_used_in_block(b); reduce_list(); n_charclasses = 0; if (!cc_list) { if (verbose) fprintf(stderr, "No charclasses used\n"); return; } /* Form union */ clear_all(cc_union); for (elt=cc_list; elt; elt=elt->next) { combine(cc_union, elt->cc->char_bitmap); } for (i=0; i<256; i++) mapping[i] = -1; do { int first_char; int i; unsigned long pos[ULONGS_PER_CC], neg[ULONGS_PER_CC]; first_char = find_lowest_bit_set(cc_union); set_all(pos); clear_all(neg); for 
(elt=cc_list; elt; elt=elt->next) { if (cc_test_bit(elt->cc->char_bitmap, first_char)) { for (i=0; icc->char_bitmap[i]; } else { for (i=0; icc->char_bitmap[i]; } } for (i=0; inext) { for (i=0; i<256; i++) { if (cc_test_bit(elt->cc->char_bitmap, i)) { set_bit(elt->cc->group_bitmap, mapping[i]); } } } fprintf(stderr, "Got %d character classes\n", n_charclasses); return; } /*}}}*/ void print_charclass_mapping(FILE *out, FILE *header_out, const char *prefix_under)/*{{{*/ { int i; if (!cc_list) return; fprintf(out, "short %schar2tok[256] = {", prefix_under); for (i=0; i<256; i++) { if (i > 0) fputs(", ", out); if ((i & 15) == 0) fputs("\n ", out); if (mapping[i] >= 0) { fprintf(out, "%3d", mapping[i] + ntokens); } else { fprintf(out, "%3d", mapping[i]); } } fputs("\n};\n", out); if (header_out) { fprintf(header_out, "extern short %schar2tok[256];\n", prefix_under); } return; } /*}}}*/ void print_charclass(FILE *out, int idx)/*{{{*/ { fprintf(out, "%d:%s", idx, strings[idx]); } /*}}}*/ mairix-master/dfasyn/compdfa.c000066400000000000000000000334071224450623700167150ustar00rootroot00000000000000/*************************************** Routines for compressing the DFA by commoning-up equivalent states ***************************************/ /* ********************************************************************** * Copyright (C) Richard P. Curnow 2001-2003,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. 
* * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ /* The input to this stage is the 'raw' DFA built from the NFA by the subset construction. Depending on the style of the NFA, there may be large chunks of the DFA that have equivalent functionality, in terms of resulting in the same attributes for the same sequence of input tokens, but which are reached by different prefixes. The idea of this stage is to common up such regions, to reduce the size of the DFA and hence the table sizes that are generated. Conceptually, the basis of the algorithm is to assign the DFA states to equivalence classes. If there are N different tag combinations, there are initially N+1 classes. All states that can exit with a particular value are placed in a class together, and all non-accepting states are placed together. Now, a pass is made over all pairs of states. Two states remain equivalent if for each token, their outbound transitions go to states in the same class. If the states do not stay equivalent, the class they were in is split accordingly. This is repeated again and again until no more bisections occur. The algorithm actually used is to assign an ordering to the states based on their current class and outbound transitions. The states are then sorted. This allows all checking to be done on near-neighbours in the sequence generated by the sort, which brings the execution time down to something finite. */ #include "dfasyn.h" static int last_eq_class; /* Highest class number assigned so far */ static int Nt; /* Number of tokens; has to be made static to be visible to comparison fn. */ /* To give 'general_compare' visibility of the current equiv.
classes of the destination states */ static DFANode **local_dfas; static void calculate_signatures(DFANode **seq, DFANode **dfas, int ndfas)/*{{{*/ /**** Determine state signatures based on transitions and current classes. ****/ { unsigned long sig; int i, t; for (i=0; imap[t]; if (di >= 0) { DFANode *d = dfas[di]; int deq_class = d->eq_class; sig = increment(sig, deq_class & 0xf); /* 16 bit pairs in sig */ } } s->signature = sig; } } /*}}}*/ static int general_compare(const void *a, const void *b)/*{{{*/ /************************* Do full compare on states *************************/ { Castderef (a, const DFANode *, aa); Castderef (b, const DFANode *, bb); if (aa->eq_class < bb->eq_class) { return -1; } else if (aa->eq_class > bb->eq_class) { return +1; } else if (aa->signature < bb->signature) { return -1; } else if (aa->signature > bb->signature) { return +1; } else { /* The hard way... */ int i; for (i=0; imap[i]; int bm = bb->map[i]; /* Map transition destinations to the current equivalence class of the destination state (otherwise compressor is very pessimistic). */ am = (am>=0) ? local_dfas[am]->eq_class: -1; bm = (bm>=0) ? 
local_dfas[bm]->eq_class: -1; if (am < bm) return -1; else if (am > bm) return +1; } } /* If you get here, the states are still equivalent */ return 0; } /*}}}*/ static int split_classes(DFANode **seq, DFANode **dfas, int ndfas)/*{{{*/ /*********************** Do one pass of class splitting ***********************/ { int i; int had_to_split = 0; calculate_signatures(seq, dfas, ndfas); qsort(seq, ndfas, sizeof(DFANode *), general_compare); seq[0]->new_eq_class = seq[0]->eq_class; for (i=1; inew_eq_class = seq[i]->eq_class; if (seq[i]->eq_class == seq[i-1]->eq_class) { /* May need to split, otherwise states were previously separated anyway */ if (general_compare(seq+i, seq+i-1) != 0) { /* Different transition pattern, split existing equivalent class */ had_to_split = 1; seq[i]->new_eq_class = ++last_eq_class; if (verbose) fprintf(stderr, "Found %d equivalence classes\r", last_eq_class+1); } else { /* This works even if seq[i-1] was assigned a new class due to splitting from seq[i-2] etc. */ seq[i]->new_eq_class = seq[i-1]->new_eq_class; } } } /* Set classes to new class values. */ for (i=0; ieq_class = seq[i]->new_eq_class; } return had_to_split; } /*}}}*/ static int initial_compare(const void *a, const void *b)/*{{{*/ /************************** Sort based on tags **************************/ { Castderef (a, const DFANode *, aa); Castderef (b, const DFANode *, bb); int status; int i; for (i=0; iattrs[i], *br = bb->attrs[i]; if (!ar) ar = get_defattr(i); if (!br) br = get_defattr(i); /* Sort so that states with identical attributes appear together. */ if (!ar && br) { return -1; } else if (ar && !br) { return +1; } else { if (ar && br) { status = strcmp(ar, br); if (status < 0) return -1; else if (status > 0) return +1; } /* So neither had an attribute at all, or both did and they were equal. * i.e. 
need to look at attributes further up the vectors */ } } /* Got here => both states were identical in terms of their attribute sets */ return 0; } /*}}}*/ static void assign_initial_classes(DFANode **seq, int ndfas)/*{{{*/ /******************* Determine initial equivalence classes. *******************/ { int i; qsort(seq, ndfas, sizeof(DFANode *), initial_compare); last_eq_class = 0; seq[0]->eq_class = last_eq_class; for (i=1; ieq_class = ++last_eq_class; } else { /* Same class as last entry */ seq[i]->eq_class = last_eq_class; } } } /*}}}*/ /*{{{ compress_states() */ static void compress_states(struct DFA *dfa, int n_dfa_entries, struct DFAEntry *dfa_entries) /***** Compress the DFA so there is precisely one state in each eq. class *****/ { int *reps; int i, j, t; int neqc; int new_index; if (verbose) fprintf(stderr, "%d DFA states before compression\n", dfa->n); if (report) { fprintf(report, "\n-----------------------------\n" "------ COMPRESSING DFA ------\n" "-----------------------------\n"); } neqc = 1 + last_eq_class; /* Array containing which state is the representative of each eq. class. Keep the state which had the lowest array index. */ reps = new_array(int, neqc); for (i=0; in; i++) { int eqc = dfa->s[i]->eq_class; if (reps[eqc] < 0) { reps[eqc] = i; dfa->s[i]->is_rep = 1; } else { dfa->s[i]->is_rep = 0; } } /* Go through DFA states and assign new indices. */ for (i=0, new_index=0; in; i++) { if (dfa->s[i]->is_dead) { dfa->s[i]->new_index = -1; if (report) fprintf(report, "Old DFA state %d becomes -1 (dead state)\n", i); } else if (dfa->s[i]->is_rep) { dfa->s[i]->new_index = new_index++; if (report) fprintf(report, "Old DFA state %d becomes %d\n", i, dfa->s[i]->new_index); } else { int eqc = dfa->s[i]->eq_class; int rep = reps[eqc]; /* This assignment works because the representative for the class must have been done earlier in the loop. 
*/ dfa->s[i]->new_index = dfa->s[rep]->new_index; if (report) fprintf(report, "Old DFA state %d becomes %d (formerly %d)\n", i, dfa->s[i]->new_index, rep); } } /* Go through all transitions and fix them up. */ for (i=0; in; i++) { DFANode *s = dfa->s[i]; for (t=0; tmap[t]; if (dest >= 0) { s->map[t] = dfa->s[dest]->new_index; } } } /* Go through the entries and fix their states */ for (i=0; is[dfa_entries[i].state_number]->new_index; if (report) { fprintf(report, "Entry <%s>, formerly state %d, now state %d\n", dfa_entries[i].entry_name, dfa_entries[i].state_number, ni); } dfa_entries[i].state_number = dfa->s[dfa_entries[i].state_number]->new_index; } /* Fix from_state */ for (i=0; in; i++) { int old_from_state, new_from_state; /* If we're not going to preserve the state, move along */ if (!dfa->s[i]->is_rep) continue; old_from_state = dfa->s[i]->from_state; /* Any entry state ..., move along */ if (old_from_state < 0) continue; new_from_state = dfa->s[reps[dfa->s[old_from_state]->eq_class]]->new_index; dfa->s[i]->from_state = new_from_state; } /* Go through and crunch the entries in the DFA array, fixing up the indices */ for (i=j=0; in; i++) { if (!dfa->s[i]->is_dead && dfa->s[i]->is_rep) { dfa->s[j] = dfa->s[i]; dfa->s[j]->index = dfa->s[j]->new_index; j++; } } free(reps); dfa->n = new_index; /* ignore dead states which are completely pruned. 
*/ if (verbose) fprintf(stderr, "%d DFA states after compression", dfa->n); } /*}}}*/ static void discard_nfa_bitmaps(struct DFA *dfa)/*{{{*/ /********** Discard the (now inaccurate) NFA bitmaps from the states **********/ { int i; for (i=0; in; i++) { free(dfa->s[i]->nfas); dfa->s[i]->nfas = NULL; } return; } /*}}}*/ static void print_classes(DFANode **dfas, int ndfas)/*{{{*/ { int i; #if 1 /* Comment out to print this stuff for debug */ return; #endif if (!report) return; fprintf(report, "Equivalence classes are :\n"); for (i=0; ieq_class); } fprintf(report, "\n"); return; } /*}}}*/ static int has_any_nondefault_attribute(const DFANode *x)/*{{{*/ { int result = 0; int i; for (i=0; iattrs[i]) { char *defattr; defattr = get_defattr(i); if (defattr && strcmp(defattr, x->attrs[i])) { result = 1; break; } } } return result; } /*}}}*/ static void find_dead_states(DFANode **dfas, int ndfas, int ntokens)/*{{{*/ { /* Find any state that has no transitions out of it and no attribute. * If you get there, you're guaranteed to be stuck. * Then, repeatedly look for states which are such that all transitions from * them lead to dead states. Mark these dead too. * Then, go through all the dead states and remove their transitions. * This will force them all into a single class later. */ int did_any; int i, j; /* Eventually, consider looking for results that are non-default. */ char *leads_to_result; int total_found = 0; leads_to_result = new_array(char, ndfas); memset(leads_to_result, 0, ndfas); if (report) { fprintf(report, "Searching for dead states...\n"); } do { did_any = 0; for (i=0; imap[j]; if ((next_state >= 0) && leads_to_result[next_state]) { leads_to_result[i] = 1; did_any = 1; goto do_next_dfa_state; } } } do_next_dfa_state: (void) 0; } } while (did_any); /* Now prune any transition to states that have no path to a result. 
*/ for (i=0; ifrom_state = -1; dfas[i]->via_token = -1; dfas[i]->is_dead = 1; } else { dfas[i]->is_dead = 0; } for (j=0; jmap[j]; if (leads_to_result[next_state] == 0) { dfas[i]->map[j] = -1; } } } free(leads_to_result); if (!total_found && report) { fprintf(report, "(no dead states found)\n"); } } /*}}}*/ /*{{{ compress_dfa() */ void compress_dfa(struct DFA *dfa, int ntokens, int n_dfa_entries, struct DFAEntry *dfa_entries) { DFANode **seq; /* Storage for node sequence */ int i; int had_to_split; /* Safety net */ if (dfa->n <= 0) return; local_dfas = dfa->s; Nt = ntokens; seq = new_array(DFANode *, dfa->n); for (i=0; in; i++) { seq[i] = dfa->s[i]; } find_dead_states(dfa->s, dfa->n, ntokens); assign_initial_classes(seq, dfa->n); do { print_classes(dfa->s, dfa->n); had_to_split = split_classes(seq, dfa->s, dfa->n); } while (had_to_split); print_classes(dfa->s, dfa->n); compress_states(dfa, n_dfa_entries, dfa_entries); discard_nfa_bitmaps(dfa); free(seq); return; } /*}}}*/ mairix-master/dfasyn/configure000077500000000000000000000000421224450623700170340ustar00rootroot00000000000000#!/bin/sh egrep -v '^#' INSTALL mairix-master/dfasyn/dfasyn.1000066400000000000000000000071101224450623700164760ustar00rootroot00000000000000.TH DFASYN 1 "" .SH NAME dfasyn \- generate deterministic finite automata .SH SYNOPSYS .B dfasyn [ .BR \-o | \-\-output .I C-filename ] [ .BR \-ho | \-\-header-output .I H-filename ] [ .BR \-r | \-\-report .I report-filename ] [ .BR \-p | \-\-prefix .I prefix ] [ .BR \-u | \-\-uncompressed-tables ] [ .BR \-ud | \-\-uncompressed-dfa ] [ .BR \-I | \-\-inline-function ] [ .BR \-v | \-\-verbose ] [ .BR \-h | \-\-help ] .I input-file .SH DESCRIPTION .B dfasyn generates a deterministic finite automaton (DFA) from a description file. .SH OPTIONS .SS Options controlling output files .TP .BI "-o " C-filename .br .ns .TP .BI "--output " C-filename .br Specify the name of the file to which the C program text will be written. 
If this option is not present, the C program text will be written to stdout. .TP .BI "-ho " H-filename .br .ns .TP .BI "--header-output " H-filename .br Specify the name of the file to which the header information will be written. .TP .BI "-r " report-filename .br .ns .TP .BI "--report " report-filename .br Specify the name of the file to which the report on the generated automaton will be written. If this option is not present, no report will be written. .TP .I input-file .br This is the name of the file containing the definition of the automaton. Refer to .BR dfasyn (5) for more information about the format of this file. .SS Options controlling the generated automaton .TP .BI "-p " prefix .br .ns .TP .BI "--prefix " prefix .br Specify the prefix to be prepended onto each symbol that .B dfasyn generates in the output file. This allows multiple automata to be linked into the same final program without namespace clashes. The string prepended is actually .I prefix followed by an underscore ('_'). .TP .BR -u ", " --uncompressed-tables .br Do not compress the transition tables. By default, .B dfasyn emits the transition tables compressed, and it emits a next-state function that uses a bisection algorithm to search the tables. By contrast, uncompressed tables use a simple array indexing algorithm in the next-state algorithm. However, the generated tables will be much larger, especially if there is a large set of input symbols and the transitions in the automaton are relatively sparse. This option therefore represents a speed versus space trade-off in the generated DFA. .TP .BR -ud ", " --uncompressed-dfa .br Do not compress the generated DFA. By default, .B dfasyn compresses the DFA to combine common states into a single state in the final DFA and to remove unreachable states. This option suppresses the compression. Giving this option can only be to the detriment of the final DFA, in terms of the array sizes of its tables. 
However, the option is useful for debugging .B dfasyn and will also reduce the run time of .B dfasyn since a potentially complex processing step can be omitted. .TP .BR -I ", " --inline-function .br This causes the next-state function to be emitted as an inline function in the header output. Specifying this option without .B -ho is nonsensical and .B dfasyn will complain in that situation. Normally, .B dfasyn will emit the next_state function in the C program text output. This will incur a function call overhead for each input symbol when the DFA is used at run-time. If this is significant to the final application, the .B -I option may be useful to allow the next-state function to be inlined. .SS General options .TP .BR -v ", " --verbose .br Make the output more verbose; provide more comfort messages whilst .B dfasyn is running. .TP .BR -h ", " --help .br Show usage summary and exit .SH "SEE ALSO" .BR dfasyn (5), .BR bison (1), .BR flex (1) mairix-master/dfasyn/dfasyn.5000066400000000000000000000366521224450623700165170ustar00rootroot00000000000000.TH DFASYN 5 "" .SH NAME dfasyn .SH SYNOPSIS This page describes the format of the .I input-file for the .B dfasyn deterministic finite automaton generator. .SH DESCRIPTION .SS Overview Reserved words may be given in all-lowercase, all-uppercase, initial capitals, or 'WikiWord' format (e.g. .B endblock may be given as .BR endblock ", " Endblock ", " EndBlock " or " ENDBLOCK ). .SS Block declaration A .B block declaration is used to group together a set of state declarations. Blocks are useful if there are blocks of states and their interconnections that occur more than once in the NFA. In this case it is useful to declare a block, allowing that block to be instantiated more than once elsewhere in the input file. Since state declarations are only allowed inside blocks, there must be at least one block declaration in any useful input file.
The syntax of a block declaration is .RS .B block .I block-name { .br .RS 2 [ .I instance-declarations ] .br [ .I state-declarations ] .RE .br } .RE .SS State declarations A .B state declaration gives rise to a state in the input NFA. The syntax of a state declaration is .RS .B state .I state-name [ .B entry .I entry-name ] .br .RS 2 [ .I transitions ] .RE .RE States are implicitly terminated by the beginning of another type of construct. .B entry .I entry-name (if present) defines the name of an entry point into the scanner. In the resulting C-code, a symbol called .I entry-name will be declared. Its value will be the DFA state number of the state containing just this NFA state (plus its epsilon closure). This allows for multiple scanners to be generated from the same input file. For example, if one scanner is the same as another but with some extra text that must match at the beginning, two different .B entry states can be declared to represent this. .B dfasyn will be able to common-up all of the common part of the DFA's transition tables. If there are no .B entry directives anywhere in the input file, .B dfasyn defaults to the last mentioned state in the last block being the entry state. .I transitions is a whitespace-separated sequence of zero or more transitions. These define which of the automaton's input symbols cause a transition from this state, and to which other states. The same state may be declared more than once inside its block. In this case, the transitions given in the second declaration will be merged with those given in the first, as though all the transitions had been given in the first place. .SS Instance declarations A block may be instantiated inside another block. This is useful if there is a block of states with their transitions that occurs in more than one place within the NFA.
The syntax for an instance declaration is .RS .I instance-name : .I block-name .RE where .I instance-name is the name of the new instance, and .I block-name is the name of the block that is being instantiated. This block .B must have been declared earlier in the input file. For one thing, this prevents mutually recursive definitions. When such an instance has been created, the states inside it may be referred to within the enclosing block by prefixing their names with the .I instance-name followed by a period. .SS Transitions A state-to-state transition is specified as follows. .RS .I transition -> .I destinations .RE .I destinations is a comma-separated list of one or more fully-qualified state names. These are the states to which the NFA moves if the .I transition is matched next in the input. The destination state names are allowed to be forward-references; just the name is stored during parsing, and a second pass later is used to resolve all the names. There is no need for a named destination to actually be declared with another state definition; a state just comes into being if it is named at all. A .I transition defines the inputs that are required to cause the scanner to move from one state to another. A .I transition is a semicolon-separated list of one or more .I stimuli. (If there is only one stimulus, no semicolon is required.) The transition matches as a whole if the stimuli are matched individually in sequential order from left to right. .SS Transitions to a tag Where a transition leads to a tagged exit state, the following syntax is used: .RS .I transition = .I tags .RE where .I tags is a comma-separated list of one or more tag names. Thus a construction like .RS state foo XXX = TAG1 .RE indicates that matching the token XXX leads to a state in which TAG1 applies. .SS Stimuli A .B stimulus is a pipe-separated list of alternatives. 
Each alternative may be one of the following: .IP "*" 7 the name of a token .IP "*" 7 a character class .IP "*" 7 the name of an abbreviation .IP "*" 7 an empty string (which gives rise to an .B epsilon transition ) .IP "*" 7 an inline block instance .SS Input symbols Input symbols can be defined in two ways. The first is to use ASCII characters directly. The second is to define a set of .I tokens and use a front-end module to generate these based on the actual input. You can actually mix both types of input symbol. For example, you might wish to use ASCII characters mostly, but detect \(dqend-of-file\(dq as an explicit symbol. .SS ASCII input and character classes. Single ASCII characters can be given in double-quotes. Sets of ASCII characters can be given in square brackets, similar to shell globbing. Character classes can be negated and differenced. .IP [a] 12 The character "a". .IP [abe-h] 12 Any of the characters "a", "b", "e", "f", "g", "h". .IP ~[abc] 12 Any of the 253 characters excluding "a", "b" and "c"; a negated character class. .IP [^abc] 12 Ditto - another way of expressing a negated character class. .IP [a-z]~[c] 12 Equivalent to [abd-z]. .PP The following special cases are available within the square brackets: .IP \(rs- 8 A hyphen. Normally the hyphen is used as a range separator. To get a literal hyphen, it must be escaped by a back-slash. .IP \(rs] 8 A closing square bracket. The escaping is required to prevent it being handled as the end of the character class. .IP \(rs\(rs 8 A literal backslash. .IP \(rs^ 8 A literal "^". .IP \(rsn 8 The same character as "\(rsn" in C. .IP \(rsr 8 The same character as "\(rsr" in C. .IP \(rsf 8 The same character as "\(rsf" in C. .IP \(rst 8 The same character as "\(rst" in C. .IP ^A 8 Generate a control character, in this case ASCII character 1. Defined for ^@ through to ^Z. .IP \(rsxa9 8 The ASCII character with hex value 0xa9. Upper or lower case hex may be used. 
.IP \(rs234 The ASCII character with octal value 0234. .SS Tokens To define non-ASCII inputs, at least one .B tokens directive must be used. The syntax is .PP .B tokens .I list-of-tokens .PP where .I list-of-tokens is a space-separated list of token names. Each token name is a string that will be acceptable as a C macro name when prefixed by the current prefix string plus an underscore. If more than one .B tokens line appears in the input file, the 2nd and subsequent lines are treated as though their entries were concatenated with the 1st line. .SS Abbreviations An .B abbreviation provides a convenient way to define a shorthand name for a frequently used .B stimulus. The syntax is .RS .B abbrev .I abbrev-name = .I stimulus .RE For example: .RS abbrev FOO = [aeiouAEIOU] | A_TOKEN | out> .RE .SS Inline block instances A .B stimulus may take the form of a block instance. This is a convenient shorthand when a complex sequence of input tokens needs to be matched as part of a transition. The syntax of an inline block instance is .RS .RI < block_name : entry_state "->" exit_state > .RE As an example, given a block .B double_a defined like this .RS block double_a state in A -> out .br endblock .RE the following construction .RS block x state foo <double_a:in->out> ; B ; <double_a:in->out> -> bar .br endblock .RE is equivalent to .RS block x aa1 : double_a aa2 : double_a state foo -> aa1.in state aa1.out B -> aa2.in state aa2.out -> bar .br endblock .RE Note that in the second example, where explicit instances have been created, they must have unique names. In the first case, .B dfasyn will create the two anonymous instances automatically and handle all the plumbing to connect up the in and out states. Note there is no requirement for the states to be named 'in' and 'out'; that is merely a convention. An instanced block may have multiple inputs, with different inputs being used in different instantiations of the block, for example.
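The escapes listed under "ASCII input and character classes" can be illustrated with a small C sketch. This is not how dfasyn itself decodes them; the function name decode_class_element is hypothetical, ASCII is assumed, and ranges ("a-z") and a leading "^" used for negation are deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not taken from dfasyn): decode one element of a
 * character class, starting at *p.  On success, return its value in the
 * range 0..255 and advance *p past the element.  On a malformed escape,
 * return -1 and leave *p unchanged. */
static int decode_class_element(const char **p)
{
  const char *s;
  int value = 0;
  int digits = 0;

  assert(p != NULL && *p != NULL);
  s = *p;

  if (s[0] == '^' && s[1] >= '@' && s[1] <= 'Z') {
    value = s[1] - '@';                /* ^@ .. ^Z map to 0 .. 26 (ASCII) */
    s += 2;
  } else if (s[0] == '\\') {
    s++;
    switch (*s) {
      case 'n': value = '\n'; s++; break;
      case 'r': value = '\r'; s++; break;
      case 'f': value = '\f'; s++; break;
      case 't': value = '\t'; s++; break;
      case '-': case ']': case '\\': case '^':
        value = (unsigned char) *s;    /* escaped literal character */
        s++;
        break;
      case 'x':                        /* \xHH, upper or lower case hex */
        s++;
        while (digits < 2 && isxdigit((unsigned char) *s)) {
          int c = tolower((unsigned char) *s);
          value = 16 * value + (isdigit(c) ? c - '0' : c - 'a' + 10);
          s++;
          digits++;
        }
        if (digits == 0) return -1;
        break;
      default:                         /* \OOO, up to three octal digits */
        while (digits < 3 && *s >= '0' && *s <= '7') {
          value = 8 * value + (*s - '0');
          s++;
          digits++;
        }
        if (digits == 0 || value > 255) return -1;
        break;
    }
  } else if (s[0] != '\0') {
    value = (unsigned char) s[0];      /* ordinary character */
    s++;
  } else {
    return -1;                         /* nothing left to decode */
  }

  *p = s;
  return value;
}
```

Under these assumptions, the text \(rsxa9 decodes to 169, \(rs234 to 156 (octal 0234), and ^A to 1.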
.SS Tags and attributes .B Tags are associated with the NFA states in the input. An NFA state may have an arbitrary number of tags associated with it, through what amounts to a list of strings. .B Attributes are attached to the DFA states in the output. In the generated C-file, the attributes are expressed in terms of an array which is indexed by the DFA state number and whose elements are the attribute values applying to the states. Once the DFA has been generated, .B dfasyn knows the NFA states that apply in each DFA state. From this, the tags associated with a DFA state are given by the union of all the tags applying in all the NFA states that apply in that DFA state. The input file defines how a set of tags applying in a DFA state is to be reduced to a single attribute value. A boolean expression language is provided for this purpose. Although the default is to generate a single attribute table, .B dfasyn can generate arbitrarily many tables if required. This is achieved by using .B attribute groups. The NFA tag namespace is shared across all such groups. The group syntax is as follows: .RS .B group .I groupname .B { .I declaration [ .RI ", " declaration \ ... ] .B } .RE where each .I declaration is one of the following: .RS .B attr .I attribute-name [ .RI ", " attribute-name \ ... ] .br .B attr .I attribute-name .B : .I expression .br .B early .B attr .I attribute-name [ .RI ", " attribute-name \ ... ] .br .B early .B attr .I attribute-name .B : .I expression .RE In the form with no expression, each .I attribute-name has an implicit expression consisting of just the tag with the same name as itself. .I expression is defined in the section .B Expressions later. The short form .RS .B attr foo .RE is short for .RS .B attr foo .B : foo .RE i.e. it allows an attribute to be defined which has the same name as a tag and which is active in the cases where precisely that tag is active.
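The reduction from tags to an attribute can be pictured with a short C sketch. It is only an illustration of the idea, not dfasyn's implementation: the tag bits, the attribute values, the two example declarations in the comment and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag bits.  dfasyn itself keeps tags as names; a bitmask is
 * used here only to keep the sketch short. */
enum { TAG_A = 1, TAG_B = 2 };

/* Hypothetical attribute values, standing in for the attribute strings. */
enum { ATTR_NONE = 0, ATTR_ONLY_A = 1, ATTR_BOTH = 2 };

/* The tags of a DFA state are the union of the tags of every NFA state
 * that applies in it. */
static unsigned union_of_tags(const unsigned *nfa_tags, size_t n_nfa)
{
  unsigned all = 0;
  size_t i;
  assert(nfa_tags != NULL || n_nfa == 0);
  for (i = 0; i < n_nfa; i++) {
    all |= nfa_tags[i];
  }
  return all;
}

/* Reduce a tag set to a single attribute, as if the input file contained
 *   attr BOTH   : TAG_A & TAG_B
 *   attr ONLY_A : TAG_A & ~TAG_B
 * with ATTR_NONE playing the part of the value used when no attribute
 * is active. */
static int reduce_to_attr(unsigned tags)
{
  int a = (tags & TAG_A) != 0;
  int b = (tags & TAG_B) != 0;
  if (a && b) return ATTR_BOTH;
  if (a && !b) return ATTR_ONLY_A;
  return ATTR_NONE;
}
```

A DFA state built from two NFA states tagged TAG_A and TAG_B respectively would get ATTR_BOTH in this sketch, while a state whose only tag is TAG_B would get ATTR_NONE.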
If an attribute is prefixed by .BR early , it means that the C-code you provide to drive the DFA is going to stop scanning once this state attribute is detected. For example, this would apply if you were coding a "shortest match" scanner. .B dfasyn will prune all the transitions away from any DFA state having such an attribute. This may lead to greater opportunities for .B dfasyn to compress the DFA. A default attribute must be declared. This is used to fill all the entries in the attribute array for DFA states that end up with no explicit attribute defined. (It is also used in determining where the DFA may be optimised to remove "dead states".) The syntax is .RS .B defattr .I default-attribute-string .RE Finally, the C-type of the attribute must be declared. This becomes the base type of the array indexed by the DFA state number. The syntax is .RS .B type .I attribute-type-name .RE It is illegal for more than one attribute in a particular attribute group to be active in a DFA state. If this situation occurs, it indicates that the expression logic for that group is defective. .SS Expressions An .I expression defines an attribute in terms of a boolean relationship between one or more tags. An .I expression may be any one of the following: .RS .IR expression " & " expression .br .IR expression " | " expression .br .IR expression " ^ " expression .br .IR expression " ? " expression " : " expression .br .RI ( expression ) .br .RI "~" expression .br .RI "!" expression .br .I tag-name .RE Note that .RI "~" expression and .RI "!" expression both mean the negation of expression. The operator precedence is what a C programmer would expect. .SH Prefix specification The .B prefix used in the generated C-file can optionally be set in the input file using the following syntax: .RS .B prefix .I prefix-string .RE where .IR prefix-string _ (i.e. the specific string followed by an underscore) will occur at the start of each symbol name in the generated C-file.
If the prefix has been set via the command line using .BR -p , the .B prefix line in the input file will be ignored and a warning given. .SH "THE GENERATED C-FILE" The generated file exports the following symbols that can be used by the calling program: .TP .B short .IB prefix_ char2tok [256]; .br If character classes have been used, this table maps from ASCII values to the internal token numbers used by the generated DFA. This array will be defined in the generated C-file. If a header file is being generated, it will be declared in there also. .TP .B #define .IB prefix_ TOKEN .I numeric_value .br If a .B tokens directive has been used, each such token will be assigned a number. These assignments are emitted by .B dfasyn as a series of #define lines. Each token name from the input file will have the .I prefix and an underscore prepended to form the name of the symbol in the #define. If a header file is being generated .RB ( -ho ), these definitions are placed in the header file. Otherwise, they are placed in the main output C-file. .TP 7 .B int .IB prefix_ next_state (int current_state, int next_token); .br This is the prototype for the next state function which the calling program must invoke. If no .B -I option has been used, this function will be defined in the generated C-file. If a header file is being generated, it will be prototyped in there also. If .B -I has been used, the function will be defined in the header file. .TP .B int .IB prefix _ entry-name .br If the .B entrystruct directive has not been used, this format is used to define the DFA state numbers for the defined entry points. The calling program uses these values to set the .I current_state at the start of the scanning process, depending on which entry point is being used. If there is more than one entry, there will be more than one such line. .TP .B struct .I entrystruct-type { ...
} .I entrystruct-var .br If the .B entrystruct directive has been used, the DFA state numbers for the entry points are declared as elements of a struct. The struct member names are identical to the entry names used in the .B dfasyn input file. The declaration of the struct variable containing the state numbers will be in the generated C-file. If a header file is being generated .RB ( -ho ), the definition of the struct type will be in there. Otherwise, it will be in the C-file also. .TP 12 .I attr-type .IB prefix_ attr .RI [ #DFA-states ] .br This defines the attributes for each of the DFA states in the default attribute group. If no .B type .I attr-type declaration was in the input file, the default of .B short will be used. If other attribute groups are defined, there will be a similar array for each one: .TP 18 .I group-attr-type .I prefix_group-name .RI [ #DFA-states ] .br For the attribute group declared with .B group .I group-name in the input file, this defines the attribute of each of the DFA states in that group. .SH TEXT PASSTHROUGH To pass a block of literal text through to the output file without interpretation, enclose it in %{ ... %} like this: .RS %{ .br #include "foo.h" .br %} .RE The opening and closing patterns must be on lines on their own (trailing whitespace is allowed). .SH "SEE ALSO" .BR dfasyn (1) mairix-master/dfasyn/dfasyn.c000066400000000000000000000510511224450623700165630ustar00rootroot00000000000000/*************************************** Main program for NFA to DFA table builder program. ***************************************/ /* ********************************************************************** * Copyright (C) Richard P. Curnow 2000-2003,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. 
* * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #include "dfasyn.h" FILE *report = NULL; FILE *output = NULL; FILE *header_output = NULL; /* If non-null this gets prepended onto the names of the all the entities that * are generated in the output file. */ char *prefix = NULL; extern int yyparse(void); /* ================================================================= */ static char *entrystruct = NULL; static char *entryvar = NULL; void define_entrystruct(const char *s, const char *v)/*{{{*/ { if (!entrystruct) { entrystruct = new_string(s); entryvar = new_string(v); } else { fprintf(stderr, "Can't redefine entrystruct with <%s>\n", s); exit(1); } } /*}}}*/ /* ================================================================= */ static void print_token_table(void)/*{{{*/ { FILE *dest; int i; extern char *prefix; dest = header_output ? header_output : output; /* Not sure how it makes sense to write this to the C file : maybe if you're going * to include the C file into a bigger one it's reasonable? Anyway, the intention * is that you're more likely to use this for real if you're writing a header file. */ for (i=0; in; i++) { char *attr = dfa->s[i]->attrs[tab]; fprintf(output, " %s", attr ? attr : defattr); fputc ((i<(dfa->n - 1)) ? 
',' : ' ', output); fprintf(output, " /* State %d */\n", i); } fprintf(output, "};\n\n"); if (header_output) { fprintf(header_output, "extern %s %s%s[];\n", get_attr_type(tab), prefix_under, attrname); } } } /*}}}*/ static void check_default_attrs(void)/*{{{*/ { int tab; int fail = 0; for (tab=0; tab= %d) return -1;\n", Nt); fprintf(dest, " return %strans[%d*current_state + next_token];\n", prefix_under, Nt); fprintf(dest, "}\n"); if (!do_inline && header_output) { fprintf(header_output, "extern int %snext_state(int current_state, int next_token);\n", prefix_under); } } /*}}}*/ static void print_uncompressed_tables(struct DFA *dfa, int do_inline, const char *prefix_under)/*{{{*/ /* Print out the state/transition table uncompressed, i.e. every token has an array entry in every state. This is fast to access but quite wasteful on memory with many states and many tokens. */ { int Nt = ntokens + n_charclasses; int n, i, j; n = 0; fprintf(output, "%sshort %strans[] = {", do_inline ? "" : "static ", prefix_under); if (do_inline) { fprintf(header_output, "extern short %strans[];\n", prefix_under); } for (i=0; in; i++) { for (j=0; j0) fputc (',', output); if (n%8 == 0) { fprintf(output, "\n "); } else { fputc(' ', output); } n++; fprintf(output, "%4d", dfa->s[i]->map[j]); } } fprintf(output, "\n};\n\n"); write_next_state_function_uncompressed(Nt, do_inline, prefix_under); } /*}}}*/ static int check_include_char(struct DFA *dfa, int this_state, int token)/*{{{*/ { if (dfa->s[this_state]->defstate >= 0) { return (dfa->s[this_state]->map[token] != dfa->s[dfa->s[this_state]->defstate]->map[token]); } else { return (dfa->s[this_state]->map[token] >= 0); } } /*}}}*/ static void write_next_state_function_compressed(int do_inline, const char *prefix_under)/*{{{*/ /* Write the next_state function for traversing compressed tables into the output file. */ { FILE *dest; dest = do_inline ? 
header_output : output; fprintf(dest, "%sint %snext_state(int current_state, int next_token) {\n", do_inline ? "static inline " : "", prefix_under); fprintf(dest, " int h, l, m, xm;\n"); fprintf(dest, " while (current_state >= 0) {\n"); fprintf(dest, " l = %sbase[current_state], h = %sbase[current_state+1];\n", prefix_under, prefix_under); fprintf(dest, " while (h > l) {\n"); fprintf(dest, " m = (h + l) >> 1; xm = %stoken[m];\n", prefix_under); fprintf(dest, " if (xm == next_token) goto done;\n"); fprintf(dest, " if (m == l) break;\n"); fprintf(dest, " if (xm > next_token) h = m;\n"); fprintf(dest, " else l = m;\n"); fprintf(dest, " }\n"); fprintf(dest, " current_state = %sdefstate[current_state];\n", prefix_under); fprintf(dest, " }\n"); fprintf(dest, " return -1;\n"); fprintf(dest, " done:\n"); fprintf(dest, " return %snextstate[m];\n", prefix_under); fprintf(dest, "}\n"); if (!do_inline && header_output) { fprintf(header_output, "extern int %snext_state(int current_state, int next_token);\n", prefix_under); } } /*}}}*/ static void print_compressed_tables(struct DFA *dfa, int do_inline, const char *prefix_under)/*{{{*/ /* Print state/transition table in compressed form. This is more economical on storage, but requires a bisection search to find the next state for a given current state & token */ { int *basetab = new_array(int, dfa->n + 1); int Nt = ntokens + n_charclasses; int n, i, j; n = 0; fprintf(output, "%sunsigned char %stoken[] = {", do_inline ? "" : "static ", prefix_under); for (i=0; i<dfa->n; i++) { for (j=0; j<Nt; j++) { if (check_include_char(dfa, i, j)) { if (n>0) fputc (',', output); if (n%8 == 0) { fprintf(output, "\n "); } else { fputc(' ', output); } n++; fprintf(output, "%3d", j); } } } fprintf(output, "\n};\n\n"); n = 0; fprintf(output, "%sshort %snextstate[] = {", do_inline ?
"" : "static ", prefix_under); for (i=0; in; i++) { basetab[i] = n; for (j=0; j0) fputc (',', output); if (n%8 == 0) { fprintf(output, "\n "); } else { fputc(' ', output); } n++; fprintf(output, "%5d", dfa->s[i]->map[j]); } } } fprintf(output, "\n};\n\n"); basetab[dfa->n] = n; n = 0; fprintf(output, "%sunsigned short %sbase[] = {", do_inline ? "" : "static ", prefix_under); for (i=0; i<=dfa->n; i++) { if (n>0) fputc (',', output); if (n%8 == 0) { fprintf(output, "\n "); } else { fputc(' ', output); } n++; fprintf(output, "%5d", basetab[i]); } fprintf(output, "\n};\n\n"); n = 0; fprintf(output, "%sshort %sdefstate[] = {", do_inline ? "" : "static ", prefix_under); for (i=0; in; i++) { if (n>0) fputc (',', output); if (n%8 == 0) { fprintf(output, "\n "); } else { fputc(' ', output); } n++; fprintf(output, "%5d", dfa->s[i]->defstate); } fprintf(output, "\n};\n\n"); if (do_inline) { fprintf(header_output, "extern unsigned char %stoken[];\n", prefix_under); fprintf(header_output, "extern short %snextstate[];\n", prefix_under); fprintf(header_output, "extern unsigned short %sbase[];\n", prefix_under); fprintf(header_output, "extern short %sdefstate[];\n", prefix_under); } free(basetab); write_next_state_function_compressed(do_inline, prefix_under); } /*}}}*/ static void print_entries_table(const char *prefix_under)/*{{{*/ { int i; if (entrystruct) { int first; /* If we write the struct defn to the header file, we ought not to emit the * full struct defn again in the main output. This is tricky unless we can * guarantee the header will get included, though. 
*/ fprintf(output, "struct %s {\n", entrystruct); if (header_output) { fprintf(header_output, "extern struct %s {\n", entrystruct); } for (i=0; inext) Ne++; if (report) { fprintf(report, "Processing %d separate entry points\n", Ne); } blocks = new_array(Block*, Ne); for (Nb=0, e=entries; e; e=e->next) { int matched = 0; for (bi=0; bistate->parent == blocks[bi]) { matched = 1; break; } } if (!matched) { blocks[Nb++] = e->state->parent; } } for (Ns=0, bi=0; binstates; } if (report) { fprintf(report, "Entries in %d blocks, total of %d states\n", Nb, Ns); } jumbo = new(Block); jumbo->name = "(UNION OF MULTIPLE BLOCKS)"; jumbo->nstates = jumbo->maxstates = Ns; jumbo->states = new_array(State *, Ns); jumbo->eclo = NULL; for (bi=0, si=0; binstates; int i; int block_name_len; memcpy(jumbo->states + si, blocks[bi]->states, sizeof(State *) * ns); block_name_len = strlen(blocks[bi]->name); for (i=0; istates[si + i]; len = block_name_len + strlen(s->name) + 2; new_name = new_array(char, len); strcpy(new_name, blocks[bi]->name); strcat(new_name, "."); strcat(new_name, s->name); free(s->name); s->name = new_name; } si += ns; } /* Reindex all the states */ for (si=0; sistates[si]->index = si; } split_charclasses(jumbo); expand_charclass_transitions(jumbo); if (verbose) fprintf(stderr, "Computing epsilon closure...\n"); generate_epsilon_closure(jumbo); print_nfa(jumbo); build_transmap(jumbo); if (verbose) fprintf(stderr, "Building DFA...\n"); n_dfa_entries = Ne; dfa_entries = new_array(struct DFAEntry, Ne); for (e=entries, ei=0; e; e=e->next, ei++) { dfa_entries[ei].entry_name = new_string(e->entry_name); dfa_entries[ei].state_number = e->state->index; } *dfa = build_dfa(jumbo); *blk = jumbo; } /*}}}*/ /* ================================================================= */ static void usage(void)/*{{{*/ { fprintf(stderr, "dfasyn, Copyright (C) 2001-2003,2005,2006 Richard P. 
Curnow\n" "\n" "dfasyn comes with ABSOLUTELY NO WARRANTY.\n" "This is free software, and you are welcome to redistribute it\n" "under certain conditions; see the GNU General Public License for details.\n" "\n" "Usage: dfasyn [OPTION]... FILE\n" "Read state-machine description from FILE and generate a deterministic automaton.\n" "Write results to stdout unless options dictate otherwise.\n" "\n" "Output files:\n" " -o, --output FILE Define the name of the output file (e.g. foobar.c)\n" " -ho, --header-output FILE Define the name of the header output file (e.g. foobar.h)\n" " -r, --report FILE Define the name where the full generator report goes (e.g. foobar.report)\n" "\n" "Generated automaton:\n" " -p, --prefix PREFIX Specify a prefix for the variables and functions in the generated file(s)\n" " -u, --uncompressed-tables Don't compress the generated transition tables\n" " -ud, --uncompressed-dfa Don't common-up identical states in the DFA\n" " -I, --inline-function Make the next_state function inline (requires -ho)\n" "\n" "General:\n" " -v, --verbose Be verbose\n" " -h, --help Display this help message\n" ); } /*}}}*/ /* ================================================================= */ int main (int argc, char **argv)/*{{{*/ { int result; Block *main_block; char *input_name = NULL; char *output_name = NULL; char *header_output_name = NULL; char *report_name = NULL; int uncompressed_tables = 0; int uncompressed_dfa = 0; /* Useful for debug */ int do_inline = 0; extern char *prefix; char *prefix_under; FILE *input = NULL; struct DFA *dfa; verbose = 0; report = NULL; /*{{{ Parse cmd line arguments */ while (++argv, --argc) { if (!strcmp(*argv, "-h") || !strcmp(*argv, "--help")) { usage(); exit(0); } else if (!strcmp(*argv, "-v") || !strcmp(*argv, "--verbose")) { verbose = 1; } else if (!strcmp(*argv, "-o") || !strcmp(*argv, "--output")) { ++argv, --argc; output_name = *argv; } else if (!strcmp(*argv, "-ho") || !strcmp(*argv, "--header-output")) { ++argv, --argc; 
header_output_name = *argv; } else if (!strcmp(*argv, "-r") || !strcmp(*argv, "--report")) { ++argv, --argc; report_name = *argv; } else if (!strcmp(*argv, "-u") || !strcmp(*argv, "--uncompressed-tables")) { uncompressed_tables = 1; } else if (!strcmp(*argv, "-ud") || !strcmp(*argv, "--uncompressed-dfa")) { uncompressed_dfa = 1; } else if (!strcmp(*argv, "-I") || !strcmp(*argv, "--inline-function")) { do_inline = 1; } else if (!strcmp(*argv, "-p") || !strcmp(*argv, "--prefix")) { ++argv, --argc; prefix = *argv; } else if ((*argv)[0] == '-') { fprintf(stderr, "Unrecognized command line option %s\n", *argv); } else { input_name = *argv; } } /*}}}*/ if (do_inline && !header_output_name) {/*{{{*/ fprintf(stderr, "--------------------------------------------------------------\n" "It doesn't make sense to try inlining if you're not generating\n" "a separate header file.\n" "Not inlining the transition function.\n" "--------------------------------------------------------------\n" ); do_inline = 0; } /*}}}*/ if (input_name) {/*{{{*/ input = fopen(input_name, "r"); if (!input) { fprintf(stderr, "Can't open %s for input, exiting\n", input_name); exit(1); } } else { input = stdin; } /*}}}*/ if (output_name) {/*{{{*/ output = fopen(output_name, "w"); if (!output) { fprintf(stderr, "Can't open %s for writing, exiting\n", output_name); exit(1); } } else { output = stdout; } /*}}}*/ if (header_output_name) {/*{{{*/ header_output = fopen(header_output_name, "w"); if (!header_output) { fprintf(stderr, "Can't open %s for writing, exiting\n", header_output_name); exit(1); } } /* otherwise the header stuff just goes to the same fd as the main output. */ /*}}}*/ if (report_name) {/*{{{*/ report = fopen(report_name, "w"); if (!report) { fprintf(stderr, "Can't open %s for writing, no report will be created\n", report_name); } } /*}}}*/ if (verbose) { fprintf(stderr, "General-purpose automaton builder\n"); fprintf(stderr, "Copyright (C) Richard P. 
Curnow 2000-2003,2005,2006\n"); } eval_initialise(); if (verbose) fprintf(stderr, "Parsing input..."); yyin = input; /* Set yyout. This means that if anything leaks from the scanner, or appears in a %{ .. %} block, it goes to the right place. */ yyout = output; result = yyparse(); if (result > 0) exit(1); if (verbose) fprintf(stderr, "\n"); make_evaluator_array(); check_default_attrs(); if (!entries) { /* Support legacy method : the last state to be current in the input file * is the entry state of the NFA */ State *start_state; start_state = get_curstate(); main_block = start_state->parent; split_charclasses(main_block); expand_charclass_transitions(main_block); if (verbose) fprintf(stderr, "Computing epsilon closure...\n"); generate_epsilon_closure(main_block); print_nfa(main_block); build_transmap(main_block); if (verbose) fprintf(stderr, "Building DFA...\n"); { struct DFAEntry entry[1]; n_dfa_entries = 1; dfa_entries = entry; entry[0].entry_name = "(ONLY ENTRY)"; entry[0].state_number = start_state->index; dfa = build_dfa(main_block); } } else { /* Allow generation of multiple entry states, so you can use the same input file when * you need several automata that have a lot of logic in common. 
*/ deal_with_multiple_entries(&main_block, &dfa); } if (report) { fprintf(report, "--------------------------------\n" "DFA structure before compression\n" "--------------------------------\n"); } print_dfa(dfa); if (had_ambiguous_result) { fprintf(stderr, "No output written, there were ambiguous attribute values for accepting states\n"); exit(2); } if (!uncompressed_dfa) { if (verbose) fprintf(stderr, "\nCompressing DFA...\n"); compress_dfa(dfa, ntokens + n_charclasses, n_dfa_entries, dfa_entries); } if (verbose) fprintf(stderr, "\nCompressing transition tables...\n"); compress_transition_table(dfa, ntokens + n_charclasses); if (report) { fprintf(report, "-------------------------------\n" "DFA structure after compression\n" "-------------------------------\n"); } if (verbose) fprintf(stderr, "Writing outputs...\n"); print_dfa(dfa); if (prefix) { prefix_under = new_array(char, 2 + strlen(prefix)); strcpy(prefix_under, prefix); strcat(prefix_under, "_"); } else { prefix_under = ""; } if (header_output) { fprintf(header_output, "#ifndef %sHEADER_H\n", prefix_under); fprintf(header_output, "#define %sHEADER_H\n", prefix_under); } print_token_table(); print_charclass_mapping(output, header_output, prefix_under); print_attr_tables(dfa, prefix_under); if (uncompressed_tables) { print_uncompressed_tables(dfa, do_inline, prefix_under); } else { print_compressed_tables(dfa, do_inline, prefix_under); } if (entries) { /* Emit entry table */ print_entries_table(prefix_under); } else { /* Legacy behaviour - DFA state 0 is implicitly the single entry state. */ } if (report) { fclose(report); report = NULL; } report_unused_tags(); if (header_output) { fprintf(header_output, "#endif\n"); } return result; } /*}}}*/ mairix-master/dfasyn/dfasyn.h000066400000000000000000000251351224450623700165740ustar00rootroot00000000000000/*************************************** Header file for NFA->DFA conversion utility. 
***************************************/ /* ********************************************************************** * Copyright (C) Richard P. Curnow 2001-2003,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #ifndef N2D_H #define N2D_H #include #include #include #define new(T) ((T *) malloc(sizeof(T))) #define new_array(T,N) ((T *) malloc((N) * sizeof(T))) #define resize_array(T,arr,newN) ((T *) ((arr) ? realloc(arr,(newN)*sizeof(T)) : malloc((newN)*sizeof(T)))) #define new_string(s) strcpy((char *)malloc((strlen(s)+1)*sizeof(char)),s) /* For typecasting, especially useful for declarations of local ptrs to args of a qsort comparison fn */ #define Castdecl(x, T, nx) T nx = (T) x #define Castderef(x, T, nx) T nx = *(T*) x /* Globally visible options to control reporting */ extern FILE *report; extern FILE *report; extern FILE *output; extern FILE *header_output; /* Bison interface. */ extern FILE *yyin; extern FILE *yyout; extern int verbose; extern char *prefix; /* Temporary - this will be done better when the charclass stuff is * added. 
*/ extern char **toktable; extern int ntokens; extern int n_charclasses; extern int had_ambiguous_result; extern int n_dfa_entries; extern struct DFAEntry *dfa_entries; struct State; struct Block; struct StimulusList; struct Abbrev {/*{{{*/ char *lhs; /* Defined name */ struct StimulusList *stimuli; #if 0 char **rhs; /* Token/define */ int nrhs; int maxrhs; #endif }; /*}}}*/ typedef enum StimulusType {/*{{{*/ T_EPSILON, T_TOKEN, T_ABBREV, T_INLINEBLOCK, T_CHARCLASS } StimulusType; /*}}}*/ typedef struct InlineBlock {/*{{{*/ char *type; /* Block type */ char *in; /* Name of input node */ char *out; /* Name of output node */ } InlineBlock; /*}}}*/ #define ULONGS_PER_CC 8 typedef struct CharClass {/*{{{*/ int is_used; unsigned long char_bitmap[ULONGS_PER_CC]; unsigned long group_bitmap[ULONGS_PER_CC]; } CharClass; /*}}}*/ typedef struct Stimulus {/*{{{*/ StimulusType type; union { /* TODO : token should eventually become a struct ref ? */ int token; struct Abbrev *abbrev; /* placeholders */ InlineBlock *inline_block; CharClass *char_class; } x; } Stimulus; /*}}}*/ typedef struct StimulusList {/*{{{*/ struct StimulusList *next; Stimulus *stimulus; } StimulusList; /*}}}*/ typedef enum TransType {/*{{{*/ TT_EPSILON, TT_TOKEN, TT_CHARCLASS } TransType; /*}}}*/ typedef struct TransList {/*{{{*/ struct TransList *next; TransType type; union { int token; CharClass *char_class; } x; char *ds_name; struct State *ds_ref; } TransList; /*}}}*/ typedef struct Stringlist {/*{{{*/ struct Stringlist *next; char *string; } Stringlist; /*}}}*/ #if 0 typedef struct InlineBlockList {/*{{{*/ struct InlineBlockList *next; InlineBlock *ib; } InlineBlockList; /*}}}*/ #endif typedef struct State {/*{{{*/ char *name; int index; /* Array index in containing block */ struct Block *parent; TransList *transitions; Stringlist *tags; Stringlist *entries; /* Pointers to the nodes in the 'transitions' list, sorted into canonical order */ TransList **ordered_trans; int n_transitions; unsigned char 
removed; /* Flag indicating state has been pruned by compression stage */ } State; /*}}}*/ typedef struct S_Stateset {/*{{{*/ State **states; int nstates; int maxstates; } Stateset; /*}}}*/ #define HASH_BUCKETS 64 #define HASH_MASK (HASH_BUCKETS-1) typedef struct Block {/*{{{*/ char *name; /* The master table of states within this block. This has to be in a flat array because we have to work with respect to state indices when doing the 2D bitmap stuff for the subset construction. */ State **states; int nstates; int maxstates; /* epsilon closure for this block (treating it as a top-level block.) */ unsigned long **eclo; /* Hash table for getting rapid access to a state within the block, given its name */ Stateset state_hash[HASH_BUCKETS]; int subcount; /* Number for generating substates */ int subblockcount; /* Number for generating inline subblocks */ } Block; /*}}}*/ struct Entrylist {/*{{{*/ struct Entrylist *next; char *entry_name; State *state; }; /*}}}*/ extern struct Entrylist *entries; typedef struct DFANode {/*{{{*/ unsigned long *nfas; unsigned long signature; /* All the longwords in the nfas array xor'ed together */ int index; /* Entry's own index in the array */ int *map; /* index by token code */ int from_state; /* the state which provided the first transition to this one (leading to its creation) */ int via_token; /* the token through which we got to this state the first time. */ Stringlist *nfa_exit_sl; /* NFA exit values */ Stringlist *nfa_attr_sl; /* NFA exit values */ char **attrs; /* Attributes, computed by boolean expressions defined in input text */ int has_early_exit; /* If !=0, the scanner is expected to exit immediately this DFA state is entered. It means that no out-bound transitions have to be created. */ /* Fields calculated in compdfa.c */ /* The equivalence class the state is in. */ int eq_class; /* Temp. storage for the new eq. class within a single pass of the splitting alg. 
*/ int new_eq_class; /* Signature field from above is also re-used. */ int is_rep; /* Set if state is chosen as the representative of its equivalence class. */ int is_dead; /* Set if the state has no path to a non-default result */ int new_index; /* New index assigned to the state. */ /* Fields calculated in tabcompr.c */ unsigned long transition_sig; /* Default state, i.e. the one that supplies transitions for tokens not explicitly listed for this one. */ int defstate; /* Number of transitions that this state has different to those in the default state. */ int best_diff; } DFANode; /*}}}*/ struct DFAEntry {/*{{{*/ char *entry_name; /* Initially the NFA number, overwritten with DFA number by build_dfa */ int state_number; }; /*}}}*/ struct DFA {/*{{{*/ DFANode **s; /* states */ int n; int max; /* the original block that the DFA comes from. */ Block *b; }; /*}}}*/ void yyerror(const char *s); extern int yylex(void); /* Constants for 'create' args */ #define USE_OLD_MUST_EXIST 0 #define CREATE_MUST_NOT_EXIST 1 #define CREATE_OR_USE_OLD 2 State *get_curstate(void); struct Abbrev; extern struct Abbrev * create_abbrev(const char *name, struct StimulusList *stimuli); int lookup_token(char *name, int create); Block *lookup_block(char *name, int create); State *lookup_state(Block *in_block, char *name, int create); void add_entry_to_state(State *curstate, const char *entry); void define_entrystruct(const char *s, const char *v); Stringlist * add_string_to_list(Stringlist *existing, const char *token); void add_transitions(Block *curblock, State *curstate, StimulusList *stimuli, char *destination); State * add_transitions_to_internal(Block *curblock, State *addtostate, StimulusList *stimuli); void add_tags(State *curstate, Stringlist *sl); InlineBlock *create_inline_block(char *type, char *in, char *out); void instantiate_block(Block *curblock, char *block_name, char *instance_name); void fixup_state_refs(Block *b); void expand_charclass_transitions(Block *b); void 
compress_nfa(Block *b); extern void generate_epsilon_closure(Block *b); extern void print_nfa(Block *b); extern void build_transmap(Block *b); extern struct DFA *build_dfa(Block *b); extern void print_dfa(struct DFA *dfa); /* In expr.c */ typedef struct Expr Expr; Expr * new_not_expr(Expr *c); Expr * new_and_expr(Expr *c1, Expr *c2); Expr * new_or_expr(Expr *c1, Expr *c2); Expr * new_xor_expr(Expr *c1, Expr *c2); Expr * new_cond_expr(Expr *c1, Expr *c2, Expr *c3); Expr * new_tag_expr(char *tag_name); extern int eval(Expr *e); void define_tag(char *name, Expr *e); void clear_tag_values(void); void report_unused_tags(void); /* In evaluator.c */ typedef struct evaluator Evaluator; extern int n_evaluators; extern Evaluator *default_evaluator; extern Evaluator *start_evaluator(const char *name); void define_attr(Evaluator *x, char *string, Expr *e, int early); void define_defattr(Evaluator *x, char *string); void set_tag_value(char *tag_name); int evaluate_attrs(char ***, int *); int evaluator_is_used(Evaluator *x); void define_defattr(Evaluator *x, char *text); void define_type(Evaluator *x, char *text); char* get_defattr(int i); char* get_attr_type(int i); char* get_attr_name(int i); void make_evaluator_array(void); void emit_dfa_attr_report(char **results, FILE *out); void eval_initialise(void); void compress_transition_table(struct DFA *dfa, int ntokens); unsigned long increment(unsigned long x, int field); unsigned long count_bits_set(unsigned long x); /* in abbrevs.c */ struct Abbrev * lookup_abbrev(char *name); /* in stimulus.c */ extern Stimulus *stimulus_from_epsilon(void); extern Stimulus *stimulus_from_string(char *str); extern Stimulus *stimulus_from_inline_block(InlineBlock *block); extern Stimulus *stimulus_from_char_class(CharClass *char_class); extern StimulusList *append_stimulus_to_list(StimulusList *existing, Stimulus *stim); /* in charclass.c */ extern int cc_test_bit(const unsigned long *bitmap, int entry); extern CharClass *new_charclass(void); 
extern void free_charclass(CharClass *what); extern void add_charclass_to_list(CharClass *cc); extern void add_singleton_to_charclass(CharClass *towhat, char thechar); extern void add_range_to_charclass(CharClass *towhat, char star, char end); extern void invert_charclass(CharClass *what); extern void diff_charclasses(CharClass *left, CharClass *right); extern void split_charclasses(const Block *b); extern void print_charclass_mapping(FILE *out, FILE *header_out, const char *prefix_under); extern void print_charclass(FILE *out, int idx); /* Return new number of DFA states */ extern void compress_dfa(struct DFA *dfa, int ntokens, int n_dfa_entries, struct DFAEntry *dfa_entries); #endif /* N2D_H */ mairix-master/dfasyn/dfasyn.texi000066400000000000000000000046671224450623700173250ustar00rootroot00000000000000@setfilename dfasyn.info @settitle User guide for the dfasyn DFA construction utility @titlepage @title dfasyn user guide @subtitle This manual describes how to use dfasyn. @author Richard P. Curnow @page @end titlepage @c{{{ Top node @node Top @top @menu * Introduction:: The introduction * Input file format:: A reference for the input file * Concept Index:: Index of concepts @end menu @c}}} @c{{{ ch:Introduction @node Introduction @chapter Introduction @menu * Uses for dfasyn:: The types of problem to which dfasyn is well-suited @end menu @node Uses for dfasyn @section Uses for dfasyn dfasyn is particularly suited to the following types of scanning problem, which exceed flex's capabilities: @itemize @bullet @item When the pattern describing a token cannot be written as a regular expression. For example, there may be iteration but with constraints between the end of one iteration and the start of the next. 
@item When more than one rule matches in a flex input file, flex chooses between them based on @itemize - @item Longest match first @item Earliest rule in the file if more than one match of the same length exists @end itemize dfasyn allows for a more general method of resolving multiple matches. Conceptually, it works out which rules match, giving a true/false status for each rule. The input file defines an arbitrarily complex set of boolean expressions to reduce the multiple matches down to a single result. (If more than one of the boolean expressions evaluates to true, this is an error.) @item When a customised method is required to construct the input tokens that pass to the scanner. For example, if the tokens are the characters in a string (rather than coming from a file), or if some special logic has to be used to generate the tokens from the input character stream. @item If you want to add actions to the scanning loop, e.g. to remember special locations within the word being scanned. @end itemize @node Non-uses for dfasyn @section Cases where flex might be better In general, flex is easier and more convenient to use. Where it is applicable to your problem, there are no obvious benefits to using dfasyn. @node Why written @section Why was dfasyn written? @c}}} @c{{{ ch:Input file format @node Input file format @chapter Input file format This chapter describes the format of the input file. @c}}} @node Concept Index @unnumbered Concept Index @printindex cp @bye @c vim:syntax=OFF:fdm=marker:fdc=4:cms=@c%s mairix-master/dfasyn/evaluator.c000066400000000000000000000136111224450623700173010ustar00rootroot00000000000000/*************************************** Routines for merging and prioritising exit tags and attribute tags ***************************************/ /* ********************************************************************** * Copyright (C) Richard P. 
Curnow 2001-2003,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ /* Handle boolean expressions used to determine the final scanner result from the set of NFA accepting states that are simultaneously active at the end of the scan. */ #include "dfasyn.h" struct Attr { char *attr; /* The string to write to the output file */ /* The boolean expression that defines whether the attribute is active */ Expr *e; /* If != 0, assume the state machine that the program's output is embedded in will exit immediately if this result occurs. This may allow lots of states to be culled from the DFA. 
*/ int early; }; typedef struct Attr Attr; struct evaluator { Attr *attrs; int is_used; /* Set if any input rules reference this evaluator */ int n_attrs; int max_attrs; char *name; char *defattr; char *attr_type; }; Evaluator *default_evaluator; struct evaluator_list { struct evaluator_list *next; Evaluator *evaluator; }; static struct evaluator_list *evaluator_list = NULL; /* Array pointer */ static struct evaluator **evaluators = NULL; int n_evaluators = 0; Evaluator* start_evaluator(const char *name)/*{{{*/ { Evaluator *x = NULL; struct evaluator_list *el; for (el=evaluator_list; el; el=el->next) { /* name is null for the default (anonymous) attribute group */ const char *een = el->evaluator->name; if ((!een && !name) || (een && name && !strcmp(een, name))) { x = el->evaluator; break; } } if (!x) { struct evaluator_list *nel; x = new(struct evaluator); x->attrs = NULL; x->is_used = 0; x->n_attrs = x->max_attrs = 0; x->name = name ? new_string(name) : NULL; x->defattr = NULL; x->attr_type = NULL; nel = new(struct evaluator_list); nel->next = evaluator_list; nel->evaluator = x; evaluator_list = nel; } return x; } /*}}}*/ void destroy_evaluator(Evaluator *x)/*{{{*/ { /* Just leak memory for now, no need to clean up. */ return; } /*}}}*/ void define_defattr(Evaluator *x, char *text)/*{{{*/ { x = x ? x : default_evaluator; x->defattr = new_string(text); x->is_used = 1; } /*}}}*/ void define_type(Evaluator *x, char *text)/*{{{*/ { x = x ? x : default_evaluator; x->attr_type = new_string(text); x->is_used = 1; } /*}}}*/ char* get_defattr(int i)/*{{{*/ { Evaluator *x = evaluators[i]; return x->defattr; } /*}}}*/ char* get_attr_type(int i)/*{{{*/ { Evaluator *x = evaluators[i]; return x->attr_type ? x->attr_type : "short"; } /*}}}*/ char* get_attr_name(int i)/*{{{*/ { Evaluator *x = evaluators[i]; return x->name ? 
x->name : NULL; } /*}}}*/ static void grow_attrs(Evaluator *x)/*{{{*/ { if (x->n_attrs == x->max_attrs) { x->max_attrs += 32; x->attrs = resize_array(Attr, x->attrs, x->max_attrs); } } /*}}}*/ void define_attr(Evaluator *x, char *string, Expr *e, int early)/*{{{*/ /*++++++++++++++++++++ Add a attr defn. If the expr is null, it means build a single expr corr. to the value of the tag with the same name as the attr string. ++++++++++++++++++++*/ { Attr *r; x = x ? x : default_evaluator; x->is_used = 1; grow_attrs(x); r = &(x->attrs[x->n_attrs++]); r->attr = new_string(string); r->early = early; if (e) { r->e = e; } else { Expr *ne; ne = new_tag_expr(string); r->e = ne; } return; } /*}}}*/ void make_evaluator_array(void)/*{{{*/ { int n; struct evaluator_list *el; for (el=evaluator_list, n=0; el; el=el->next, n++) ; evaluators = new_array(struct evaluator *, n); n_evaluators = n; for (el=evaluator_list, n=0; el; el=el->next, n++) { evaluators[n] = el->evaluator; } } /*}}}*/ int evaluate_attrs(char ***attrs, int *attr_early)/*{{{*/ /*++++++++++++++++++++ Evaluate the attr which holds given the tags that are set ++++++++++++++++++++*/ { int i, j; int status; if (attr_early) *attr_early = 0; status = 1; *attrs = new_array(char *, n_evaluators); for (j=0; jn_attrs; i++) { if (eval(x->attrs[i].e)) { if (matched >= 0) { *attr = NULL; status = 0; break; } else { any_attrs_so_far = 1; matched = i; } } } if (matched < 0) { *attr = NULL; } else { *attr = x->attrs[matched].attr; if (attr_early) *attr_early |= x->attrs[matched].early; } } return status; } /*}}}*/ int evaluator_is_used(Evaluator *x)/*{{{*/ { return x->is_used; } /*}}}*/ void emit_dfa_attr_report(char **attrs, FILE *out)/*{{{*/ { int i; for (i=0; iname; fprintf(out, " Attributes for <%s> : %s\n", name ? 
name : "(DEFAULT)", attrs[i]); } } } /*}}}*/ /* Initialisation */ void eval_initialise(void)/*{{{*/ { default_evaluator = start_evaluator(NULL); } /*}}}*/ mairix-master/dfasyn/expr.c000066400000000000000000000127041224450623700162570ustar00rootroot00000000000000/*************************************** Routines for merging and prioritising exit tags and attribute tags ***************************************/ /* ********************************************************************** * Copyright (C) Richard P. Curnow 2001-2003,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ /* Handle boolean expressions used to determine the final scanner result from the set of NFA accepting states that are simultaneously active at the end of the scan. 
*/ #include "dfasyn.h" enum ExprType { E_AND, E_OR, E_XOR, E_COND, E_NOT, E_TAG }; struct Tag; struct Expr { enum ExprType type; union { struct { struct Expr *c1, *c2; } and; struct { struct Expr *c1, *c2; } or; struct { struct Expr *c1, *c2; } xor; struct { struct Expr *c1, *c2, *c3; } cond; struct { struct Expr *c1; } not; struct { char *name; struct Tag *s; } tag; } data; }; struct Tag { char *name; int is_expr; union { Expr *e; int val; } data; int is_used; }; struct TagList { struct TagList *next; struct Tag *tag; }; typedef struct Tag Tag; typedef struct TagList TagList; static TagList *tags = NULL; Expr * new_not_expr(Expr *c)/*{{{*/ { Expr *r = new(Expr); r->type = E_NOT; r->data.not.c1 = c; return r; } /*}}}*/ Expr * new_and_expr(Expr *c1, Expr *c2)/*{{{*/ { Expr *r = new(Expr); r->type = E_AND; r->data.and.c1 = c1; r->data.and.c2 = c2; return r; } /*}}}*/ Expr * new_or_expr(Expr *c1, Expr *c2)/*{{{*/ { Expr *r = new(Expr); r->type = E_OR; r->data.or.c1 = c1; r->data.or.c2 = c2; return r; } /*}}}*/ Expr * new_xor_expr(Expr *c1, Expr *c2)/*{{{*/ { Expr *r = new(Expr); r->type = E_XOR; r->data.xor.c1 = c1; r->data.xor.c2 = c2; return r; } /*}}}*/ Expr * new_cond_expr(Expr *c1, Expr *c2, Expr *c3)/*{{{*/ { Expr *r = new(Expr); r->type = E_COND; r->data.cond.c1 = c1; r->data.cond.c2 = c2; r->data.cond.c3 = c3; return r; } /*}}}*/ Expr * new_tag_expr(char *tag_name)/*{{{*/ /* Create a new expr node that refers to a tag by name. Don't bind to the actual tag instance yet: the binding is deferred until the expr is first evaluated. At the stage of parsing where this function is used, we don't know yet which tag table the tag has to exist in. 
*/ { Expr *r; r = new(Expr); r->type = E_TAG; r->data.tag.name = new_string(tag_name); r->data.tag.s = NULL; /* Force binding at first use */ return r; } /*}}}*/ static void add_new_tag(Tag *s)/*{{{*/ { TagList *nsl = new(TagList); nsl->tag = s; nsl->next = tags; tags = nsl; } /*}}}*/ static Tag * find_tag_or_create(char *tag_name)/*{{{*/ { Tag *s; TagList *sl; for (sl=tags; sl; sl=sl->next) { s = sl->tag; if (!strcmp(s->name, tag_name)) { return s; } } s = new(Tag); add_new_tag(s); s->is_expr = 0; /* Until proven otherwise */ s->data.val = 0; /* Force initial value to be well-defined */ s->name = new_string(tag_name); s->is_used = 0; return s; } /*}}}*/ void define_tag(char *name, Expr *e)/*{{{*/ /*++++++++++++++++++++ Define an entry in the tag table. ++++++++++++++++++++*/ { Tag *s; s = find_tag_or_create(name); s->data.e = e; s->is_expr = 1; return; } /*}}}*/ void clear_tag_values(void)/*{{{*/ { TagList *sl; for (sl=tags; sl; sl=sl->next) { Tag *s = sl->tag; if (0 == s->is_expr) { s->data.val = 0; } } } /*}}}*/ void set_tag_value(char *tag_name)/*{{{*/ { Tag *s; s = find_tag_or_create(tag_name); if (s->is_expr) { fprintf(stderr, "Cannot set value for tag '%s', it is defined by an expression\n", s->name); exit(2); } else { s->data.val = 1; } } /*}}}*/ int eval(Expr *e)/*{{{*/ /*++++++++++++++++++++ Evaluate the value of an expr ++++++++++++++++++++*/ { switch (e->type) { case E_AND: return eval(e->data.and.c1) && eval(e->data.and.c2); case E_OR: return eval(e->data.or.c1) || eval(e->data.or.c2); case E_XOR: return eval(e->data.xor.c1) ^ eval(e->data.xor.c2); case E_COND: return eval(e->data.cond.c1) ? 
eval(e->data.cond.c2) : eval(e->data.cond.c3); case E_NOT: return !eval(e->data.not.c1); case E_TAG: { Tag *s = e->data.tag.s; int result; if (!s) { /* Not bound yet */ e->data.tag.s = s = find_tag_or_create(e->data.tag.name); } if (s->is_expr) { result = eval(s->data.e); } else { result = s->data.val; } s->is_used = 1; return result; } default: fprintf(stderr, "Internal error : Can't get here!\n"); exit(2); } } /*}}}*/ void report_unused_tags(void)/*{{{*/ { Tag *s; TagList *sl; for (sl=tags; sl; sl=sl->next) { s = sl->tag; if (!s->is_used) { fprintf(stderr, "Warning: tag <%s> not referenced by any attribute expression\n", s->name); } } } /*}}}*/ mairix-master/dfasyn/n2d.c000066400000000000000000000450421224450623700157650ustar00rootroot00000000000000/*************************************** Convert NFA to DFA ***************************************/ /* ********************************************************************** * Copyright (C) Richard P. Curnow 2000-2003,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ /* {{{ General comments Convert a nondeterministic finite automaton (NFA) into a deterministic finite automaton (DFA). The NFA is defined in terms of a set of states, with transitions between the states. 
The transitions may occur on any one of a set of symbols (specified with | characters between the options), or may be 'epsilon' transitions, i.e. occurring without consumption of any input. A state may have multiple transitions for the same input symbol (hence 'nondeterministic'). The final state encountered within the final block defined in the input file is taken to be the start state of the whole NFA. A state may be entered more than once in the file; the transitions in the multiple definitions are combined to give the complete transition set. A state may have 1 or more tags assigned (with =); this is the return value of the automaton if the end of string is encountered when in that state. }}} */ #include #include "dfasyn.h" #include /* Globally visible options to control reporting */ int verbose; struct Entrylist *entries = NULL; /* ================================================================= */ static inline int round_up(const int x) {/*{{{*/ return (x+31)>>5; } /*}}}*/ static inline void set_bit(unsigned long *x, int n)/*{{{*/ { int r = n>>5; unsigned long m = 1UL<<(n&31); x[r] |= m; } /*}}}*/ static inline int is_set(unsigned long *x, int n)/*{{{*/ { int r = n>>5; unsigned long m = 1UL<<(n&31); return !!(x[r] & m); } /*}}}*/ /* ================================================================= */ static void transitively_close_eclo(unsigned long **eclo, int N)/*{{{*/ { int from; unsigned long *from_row; unsigned long *todo, this_todo; int Nru; int i, i32, j, k, merge_idx; int j_limit; int any_changes; Nru = round_up(N); todo = new_array(unsigned long, Nru); for (from=0; from 32) j_limit = 32; for (j=0; j>= 1; if (!this_todo) break; /* Workload reduction at end */ j++; } } } } } /*}}}*/ void generate_epsilon_closure(Block *b)/*{{{*/ { int i, j, N; N = b->nstates; b->eclo = new_array(unsigned long*, N); for (i=0; ieclo[i] = new_array(unsigned long, round_up(N)); for (j=0; jeclo[i][j] = 0; } } /* Determine initial immediate transitions */ for (i=0; 
istates[i]; TransList *tl; int from_state = s->index; set_bit(b->eclo[from_state], from_state); /* Always reflexive */ for (tl=s->transitions; tl; tl=tl->next) { switch (tl->type) { case TT_EPSILON: { int to_state = tl->ds_ref->index; set_bit(b->eclo[from_state], to_state); } break; case TT_TOKEN: /* smoke out old method of indicating an epsilon trans */ assert(tl->x.token >= 0); break; default: assert(0); break; } } } transitively_close_eclo(b->eclo, N); } /*}}}*/ void print_nfa(Block *b)/*{{{*/ { int i, j, N; N = b->nstates; if (!report) return; for (i=0; istates[i]; TransList *tl; Stringlist *sl; fprintf(report, "NFA state %d = %s", i, s->name); if (s->entries) { int first = 1; Stringlist *e = s->entries; fputs(" [Entries: ", report); while (e) { if (!first) { fputc(',', report); } first = 0; fputs(e->string, report); e = e->next; } fputc(']', report); } fputc('\n', report); for (tl=s->transitions; tl; tl=tl->next) { switch (tl->type) { case TT_EPSILON: fprintf(report, " [(epsilon)] -> "); break; case TT_TOKEN: assert(tl->x.token >= 0); if (tl->x.token >= ntokens) { fprintf(report, " "); print_charclass(report, tl->x.token - ntokens); fprintf(report, " -> "); } else { fprintf(report, " %s -> ", toktable[tl->x.token]); } break; default: assert(0); break; } fprintf(report, "%s\n", tl->ds_name); } if (s->tags) { int first = 1; fprintf(report, " Tags : "); for (sl=s->tags; sl; sl=sl->next) { fprintf(report, "%s%s", first ? 
"" : "|", sl->string); } fprintf(report, "\n"); } fprintf(report, " Epsilon closure :\n (self)\n"); for (j=0; jeclo[i], j)) { fprintf(report, " %s\n", b->states[j]->name); } } fprintf(report, "\n"); } } /*}}}*/ /* ================================================================= */ /* Indexed [from_state][token][to_state], flag set if there is a transition from from_state to to_state, via token then zero or more epsilon transitions */ static unsigned long ***transmap; /* Index [from_nfa_state][token], flag set if there is a transition to any destination nfa state for that token. */ static unsigned long **anytrans; /* ================================================================= */ void build_transmap(Block *b)/*{{{*/ { int N = b->nstates; int Nt = ntokens + n_charclasses; int i, j, k, m, dest; transmap = new_array(unsigned long **, N); anytrans = new_array(unsigned long *, N); for (i=0; istates[i]; TransList *tl; for (tl=s->transitions; tl; tl=tl->next) { switch (tl->type) { case TT_EPSILON: break; case TT_TOKEN: { assert(tl->x.token >= 0); dest = tl->ds_ref->index; for (m=0; meclo[dest][m]; transmap[i][tl->x.token][m] |= x; if (!!x) set_bit(anytrans[i], tl->x.token); } } break; default: assert(0); break; } } } } /*}}}*/ /* ================================================================= */ int had_ambiguous_result = 0; /* ================================================================= */ /* Implement an array of linked lists to access DFA states directly. The * hashes are given by folding the signatures down to single bytes. 
*/ struct DFAList { struct DFAList *next; DFANode *dfa; }; #define DFA_HASHSIZE 256 static struct DFAList *dfa_hashtable[DFA_HASHSIZE]; /* ================================================================= */ int n_dfa_entries; struct DFAEntry *dfa_entries = NULL; /* ================================================================= */ static void grow_dfa(struct DFA *dfa)/*{{{*/ { dfa->max += 32; dfa->s = resize_array(DFANode*, dfa->s, dfa->max); } /*}}}*/ static unsigned long fold_signature(unsigned long sig)/*{{{*/ { unsigned long folded; folded = sig ^ (sig >> 16); folded ^= (folded >> 8); folded &= 0xff; return folded; } /*}}}*/ /* ================================================================= */ static int find_dfa(unsigned long *nfas, int N)/*{{{*/ /* Simple linear search. Use 'signatures' to get rapid rejection of any DFA state that can't possibly match */ { int j; unsigned long signature = 0UL; unsigned long folded_signature; struct DFAList *dfal; for (j=0; jnext) { DFANode *dfa = dfal->dfa; int matched; if (signature != dfa->signature) continue; matched=1; for (j=0; jnfas[j]) { matched = 0; break; } } if (matched) { return dfa->index; } } return -1; } /*}}}*/ /*{{{ add_dfa() */ static int add_dfa(Block *b, struct DFA *dfa, unsigned long *nfas, int N, int Nt, int from_state, int via_token) { int j; int result = dfa->n; int this_result_unambiguous; Stringlist *ex; unsigned long signature = 0UL, folded_signature; struct DFAList *dfal; if (verbose) { fprintf(stderr, "Adding DFA state %d\r", dfa->n); fflush(stderr); } if (dfa->max == dfa->n) { grow_dfa(dfa); } dfa->s[dfa->n] = new(DFANode); dfa->s[dfa->n]->nfas = new_array(unsigned long, round_up(N)); dfa->s[dfa->n]->map = new_array(int, Nt); for (j=0; js[dfa->n]->map[j] = -1; dfa->s[dfa->n]->index = dfa->n; dfa->s[dfa->n]->defstate = -1; dfa->s[dfa->n]->from_state = from_state; dfa->s[dfa->n]->via_token = via_token; for (j=0; js[dfa->n]->nfas[j] = x; } dfa->s[dfa->n]->signature = signature; folded_signature 
= fold_signature(signature); dfal = new(struct DFAList); dfal->dfa = dfa->s[dfa->n]; dfal->next = dfa_hashtable[folded_signature]; dfa_hashtable[folded_signature] = dfal; /* {{{ Boolean reductions to get attributes */ ex = NULL; clear_tag_values(); for (j=0; js[dfa->n]->nfas, j)) { Stringlist *sl; State *s = b->states[j]; for (sl = s->tags; sl; sl = sl->next) { Stringlist *new_sl; new_sl = new(Stringlist); new_sl->string = sl->string; new_sl->next = ex; ex = new_sl; set_tag_value(sl->string); } } } dfa->s[dfa->n]->nfa_exit_sl = ex; this_result_unambiguous = evaluate_attrs(&dfa->s[dfa->n]->attrs, &dfa->s[dfa->n]->has_early_exit); if (!this_result_unambiguous) { Stringlist *sl; fprintf(stderr, "WARNING : Ambiguous exit state abandoned for DFA state %d\n", dfa->n); fprintf(stderr, "NFA exit tags applying in this stage :\n"); for (sl = ex; sl; sl = sl->next) { fprintf(stderr, " %s\n", sl->string); } had_ambiguous_result = 1; } /*}}}*/ ++dfa->n; return result; } /*}}}*/ static void clear_nfas(unsigned long *nfas, int N)/*{{{*/ { int i; for (i=0; in = 0; dfa->max = 0; dfa->s = NULL; dfa->b = b; for (i=0; instates; rup_N = round_up(N); Nt = ntokens + n_charclasses; nfas = new_array(unsigned long *, Nt); for (i=0; ieclo[dfa_entries[j].state_number][i]; } /* Must handle the case where >=2 of the start states are actually identical; * nothing in the input language prevents this. */ idx = find_dfa(nfas[0], N); if (idx < 0) { idx = dfa->n; add_dfa(b, dfa, nfas[0], N, Nt, -1, -1); } dfa_entries[j].state_number = idx; } next_to_do = 0; found_any = new_array(int, Nt); /* Now the heart of the program : the subset construction to turn the NFA into a DFA. This is a major performance hog in the program, so there are lots of tricks to speed this up (particularly, hoisting intermediate pointer computations out of the loop to assert the fact that there is no aliasing between the arrays.) 
*/ while (next_to_do < dfa->n) { int t; /* token index */ int j0, j0_5, j1, j, mask, k; int idx; unsigned long *current_nfas; unsigned long block_bitmap; /* If the next DFA state has the result_early flag set, it means that the scanner will * always exit straight away when that state is reached, so there's no need to compute * any transitions out of it. */ if (dfa->s[next_to_do]->has_early_exit) { next_to_do++; continue; } for (j=0; js[next_to_do]->nfas; for (j0=0; j0s[next_to_do]->map[t] = idx; } next_to_do++; } free(found_any); for (i=0; is[idx]->from_state; if (from_state >= 0) { display_route(dfa, from_state, out); fputs("->", out); } via_token = dfa->s[idx]->via_token; if (via_token >= ntokens) { print_charclass(out, via_token - ntokens); } else if (via_token >= 0) { fprintf(out, "%s", toktable[via_token]); } } /*}}}*/ void print_dfa(struct DFA *dfa)/*{{{*/ { int N = dfa->b->nstates; int Nt = ntokens + n_charclasses; int i, j0, j0_5, j1, t; unsigned long mask; unsigned long current_nfas; int rup_N = round_up(N); int from_state, this_state; if (!report) return; for (i=0; in; i++) { fprintf(report, "DFA state %d\n", i); if (dfa->s[i]->nfas) { fprintf(report, " NFA states :\n"); for (j0=0; j0s[i]->nfas[j0]; if (!current_nfas) continue; j0_5 = j0<<5; for (j1=0, mask=1UL; j1<32; mask<<=1, j1++) { if (current_nfas & mask) { fprintf(report, " %s\n", dfa->b->states[j0_5 + j1]->name); } } } fprintf(report, "\n"); } fprintf(report, " Forward route :"); this_state = i; from_state = dfa->s[i]->from_state; if (from_state >= 0) { fprintf(report, " (from state %d)", from_state); } fputs("\n (START)", report); display_route(dfa, i, report); fputs("->(HERE)", report); fprintf(report, "\n"); fprintf(report, " Transitions :\n"); for (t=0; ts[i]->map[t]; if (dest >= 0) { if (t >= ntokens) { fprintf(report, " "); print_charclass(report, t - ntokens); fprintf(report, " -> %d\n", dest); } else { fprintf(report, " %s -> %d\n", toktable[t], dest); } } } if (dfa->s[i]->defstate >= 0) { 
fprintf(report, " Use state %d as basis (%d fixups)\n", dfa->s[i]->defstate, dfa->s[i]->best_diff); } if (dfa->s[i]->nfa_exit_sl) { Stringlist *sl; fprintf(report, " NFA exit tags applying :\n"); for (sl=dfa->s[i]->nfa_exit_sl; sl; sl = sl->next) { fprintf(report, " %s\n", sl->string); } } emit_dfa_attr_report(dfa->s[i]->attrs, report); fprintf(report, "\n"); } fprintf(report, "\nEntry states in DFA:\n"); for (i=0; i : %d\n", dfa_entries[i].entry_name, dfa_entries[i].state_number); } } /*}}}*/ /* ================================================================= */ void yyerror (const char *s)/*{{{*/ { extern int lineno; fprintf(stderr, "%s at line %d\n", s, lineno); } /*}}}*/ int yywrap(void) /*{{{*/ { return -1; } /*}}}*/ /* ================================================================= */ mairix-master/dfasyn/n2d.h000066400000000000000000000165631224450623700160000ustar00rootroot00000000000000/*************************************** $Header: /cvs/src/dfasyn/n2d.h,v 1.2 2003/03/02 23:42:11 richard Exp $ Header file for NFA->DFA conversion utility. ***************************************/ /* ********************************************************************** * Copyright (C) Richard P. Curnow 2001-2003,2005 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
* ********************************************************************** */ #ifndef N2D_H #define N2D_H #include <stdio.h> #include <string.h> #include <stdlib.h> #define new(T) ((T *) malloc(sizeof(T))) #define new_array(T,N) ((T *) malloc((N) * sizeof(T))) #define resize_array(T,arr,newN) ((T *) ((arr) ? realloc(arr,(newN)*sizeof(T)) : malloc((newN)*sizeof(T)))) #define new_string(s) strcpy((char *)malloc((strlen(s)+1)*sizeof(char)),s) /* For typecasting, especially useful for declarations of local ptrs to args of a qsort comparison fn */ #define Castdecl(x, T, nx) T nx = (T) x #define Castderef(x, T, nx) T nx = *(T*) x /* Globally visible options to control reporting */ extern FILE *report; extern int verbose; struct State; struct Block; typedef struct Translist { struct Translist *next; int token; char *ds_name; struct State *ds_ref; } Translist; typedef struct Stringlist { struct Stringlist *next; char *string; } Stringlist; typedef struct InlineBlock { char *type; /* Block type */ char *in; /* Name of input node */ char *out; /* Name of output node */ } InlineBlock; typedef struct InlineBlockList { struct InlineBlockList *next; InlineBlock *ib; } InlineBlockList; typedef struct State { char *name; int index; /* Array index in containing block */ struct Block *parent; Translist *transitions; Stringlist *exitvals; Stringlist *attributes; /* Pointers to the nodes in the 'transitions' list, sorted into canonical order */ Translist **ordered_trans; int n_transitions; unsigned char removed; /* Flag indicating state has been pruned by compression stage */ } State; typedef struct S_Stateset { State **states; int nstates; int maxstates; } Stateset; #define HASH_BUCKETS 64 #define HASH_MASK (HASH_BUCKETS-1) typedef struct Block { char *name; /* The master table of states within this block. This has to be in a flat array because we have to work with respect to state indices when doing the 2D bitmap stuff for the subset construction.
*/ State **states; int nstates; int maxstates; /* Hash table for getting rapid access to a state within the block, given its name */ Stateset state_hash[HASH_BUCKETS]; int subcount; /* Number for generating substates */ int subblockcount; /* Number for generating inline subblocks */ } Block; typedef struct { unsigned long *nfas; unsigned long signature; /* All the longwords in the nfas array xor'ed together */ int index; /* Entry's own index in the array */ int *map; /* index by token code */ int from_state; /* the state which provided the first transition to this one (leading to its creation) */ int via_token; /* the token through which we got to this state the first time. */ Stringlist *nfa_exit_sl; /* NFA exit values */ Stringlist *nfa_attr_sl; /* NFA exit values */ char *result; /* Result token, computed by boolean expressions defined in input text */ int result_early; /* If !=0, the scanner is expected to exit immediately this DFA state is entered. It means that no out-bound transitions have to be created. */ char *attribute; /* Attribute token, computed by boolean expressions defined in input text */ /* Fields calculated in compdfa.c */ /* The equivalence class the state is in. */ int eq_class; /* Temp. storage for the new eq. class within a single pass of the splitting alg. */ int new_eq_class; /* Signature field from above is also re-used. */ int is_rep; /* Set if state is chosen as the representative of its equivalence class. */ int new_index; /* New index assigned to the state. */ /* Fields calculated in tabcompr.c */ unsigned long transition_sig; /* Default state, i.e. the one that supplies transitions for tokens not explicitly listed for this one. */ int defstate; /* Number of transitions that this state has different to those in the default state. 
*/ int best_diff; } DFANode; void yyerror(const char *s); extern int yylex(void); /* Constants for 'create' args */ #define USE_OLD_MUST_EXIST 0 #define CREATE_MUST_NOT_EXIST 1 #define CREATE_OR_USE_OLD 2 State *get_curstate(void); struct Abbrev; extern struct Abbrev * create_abbrev(char *name); extern void add_tok_to_abbrev(struct Abbrev *abbrev, char *tok); int lookup_token(char *name, int create); Block *lookup_block(char *name, int create); State *lookup_state(Block *in_block, char *name, int create); Stringlist * add_token(Stringlist *existing, char *token); void add_transitions(State *curstate, Stringlist *tokens, char *destination); State * add_transitions_to_internal(Block *curblock, State *addtostate, Stringlist *tokens); void add_exit_value(State *curstate, char *value); void set_state_attribute(State *curstate, char *name); InlineBlock *create_inline_block(char *type, char *in, char *out); InlineBlockList *add_inline_block(InlineBlockList *existing, InlineBlock *nib); State * add_inline_block_transitions(Block *curblock, State *addtostate, InlineBlockList *ibl); void instantiate_block(Block *curblock, char *block_name, char *instance_name); void fixup_state_refs(Block *b); void compress_nfa(Block *b); /* In expr.c */ typedef struct Expr Expr; typedef struct evaluator Evaluator; extern Evaluator *exit_evaluator; extern Evaluator *attr_evaluator; Expr * new_wild_expr(void); Expr * new_not_expr(Expr *c); Expr * new_and_expr(Expr *c1, Expr *c2); Expr * new_or_expr(Expr *c1, Expr *c2); Expr * new_xor_expr(Expr *c1, Expr *c2); Expr * new_cond_expr(Expr *c1, Expr *c2, Expr *c3); Expr * new_sym_expr(char *sym_name); void define_symbol(Evaluator *x, char *name, Expr *e); void define_result(Evaluator *x, char *string, Expr *e, int early); void define_symresult(Evaluator *x, char *string, Expr *e, int early); void define_defresult(Evaluator *x, char *string); void clear_symbol_values(Evaluator *x); void set_symbol_value(Evaluator *x, char *sym_name); int 
evaluate_result(Evaluator *x, char **, int *); int evaluator_is_used(Evaluator *x); void define_defresult(Evaluator *x, char *text); void define_type(Evaluator *x, char *text); char* get_defresult(Evaluator *x); char* get_result_type(Evaluator *x); void eval_initialise(void); void compress_transition_table(DFANode **dfas, int ndfas, int ntokens); unsigned long increment(unsigned long x, int field); unsigned long count_bits_set(unsigned long x); /* Return new number of DFA states */ int compress_dfa(DFANode **dfas, int ndfas, int ntokens); #endif /* N2D_H */ mairix-master/dfasyn/parse.y000066400000000000000000000211161224450623700164360ustar00rootroot00000000000000/********************************************************************** Grammar definition for input files defining an NFA *********************************************************************/ /* ********************************************************************** * Copyright (C) Richard P. Curnow 2001-2003,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
* ********************************************************************** */ %{ #include "dfasyn.h" static Block *curblock = NULL; /* Current block being built */ static State *curstate = NULL; /* Current state being worked on */ static State *addtostate = NULL; /* Current state (incl ext) to which transitions are added */ static StimulusList *curtranslist = NULL; /* Final option set of stimuli prior to ARROW */ static CharClass *curcharclass = NULL; static Evaluator *current_evaluator = NULL; State *get_curstate(void) { return curstate; } %} %union { char c; char *s; int i; Stringlist *sl; Stimulus *st; StimulusList *stl; InlineBlock *ib; CharClass *cc; Expr *e; } %token STRING STATE TOKENS PREFIX ARROW BLOCK ENDBLOCK COLON EQUAL SEMICOLON COMMA %token ABBREV DEFINE %type<s> STRING %type<st> stimulus %type<sl> tag_seq %type<stl> stimulus_seq %type<stl> transition_seq %type<e> expr %type<ib> inline_block %type<c> CHAR %type<cc> char_class simple_char_class negated_char_class char_class_diff %token ATTR TAG %token DEFATTR %token EARLY %token TYPE %token ENTRY %token ENTRYSTRUCT %token GROUP %token LBRACE RBRACE %token LSQUARE RSQUARE %token LSQUARE_CARET %token CHAR HYPHEN %right QUERY COLON %left PIPE %left XOR %left AND %left NOT %left LPAREN RPAREN %left LANGLE RANGLE %% all : decl_seq ; decl_seq : /* empty */ | decl_seq decl ; decl : block_decl | tokens_decl | abbrev_decl | attr_decl | group_decl | tag_decl | prefix_decl | entrystruct_decl ; /* Don't invalidate curstate at the end, this is the means of working out the starting state of the NFA */ block_decl : block1 block2 { fixup_state_refs(curblock); curblock = NULL; } ; block1 : BLOCK STRING LBRACE { curblock = lookup_block($2, CREATE_MUST_NOT_EXIST); addtostate = curstate = NULL; } ; block2 : instance_decl_seq state_decl_seq RBRACE ; prefix_decl : PREFIX STRING { if (!prefix) { prefix = $2; } else { fprintf(stderr, "\n\nWarning: prefix declaration ignored; already set on the command line\n"); } }; tokens_decl : TOKENS token_seq ; abbrev_decl :
ABBREV STRING EQUAL stimulus_seq { create_abbrev($2, $4); } ; token_seq : token_seq token | token ; token : STRING { (void) lookup_token($1, CREATE_MUST_NOT_EXIST); } ; instance_decl_seq : /* empty */ | instance_decl_seq instance_decl ; state_decl_seq : /* empty */ | state_decl_seq state_decl ; state_decl : STATE STRING { addtostate = curstate = lookup_state(curblock, $2, CREATE_OR_USE_OLD); } sdecl_seq | STATE STRING ENTRY STRING { addtostate = curstate = lookup_state(curblock, $2, CREATE_OR_USE_OLD); add_entry_to_state(curstate, $4); } sdecl_seq ; sdecl_seq : /* empty */ | sdecl_seq sdecl ; sdecl : transition_decl ; instance_decl : STRING COLON STRING { instantiate_block(curblock, $3 /* master_block_name */, $1 /* instance_name */ ); } ; transition_decl : transition_seq ARROW { curtranslist = $1; } destination_seq { addtostate = curstate; } | transition_seq EQUAL tag_seq { addtostate = add_transitions_to_internal(curblock, addtostate, $1); add_tags(addtostate, $3); addtostate = curstate; } ; destination_seq : STRING { add_transitions(curblock, addtostate, curtranslist, $1); } | destination_seq COMMA STRING { add_transitions(curblock, addtostate, curtranslist, $3); } ; transition_seq : stimulus_seq { $$ = $1; } | transition_seq SEMICOLON stimulus_seq { addtostate = add_transitions_to_internal(curblock, addtostate, $1); $$ = $3; } ; tag_seq : STRING { $$ = add_string_to_list(NULL, $1); } | tag_seq COMMA STRING { $$ = add_string_to_list($1, $3); } ; stimulus_seq : stimulus { $$ = append_stimulus_to_list(NULL, $1); } | stimulus_seq PIPE stimulus { $$ = append_stimulus_to_list($1, $3); } ; /* A 'thing' that will make the DFA move from one state to another */ stimulus : STRING { $$ = stimulus_from_string($1); } | inline_block { $$ = stimulus_from_inline_block($1); } | char_class { add_charclass_to_list($1); /* freeze it into the list. 
*/ $$ = stimulus_from_char_class($1); } | /* empty */ { $$ = stimulus_from_epsilon(); } ; inline_block : LANGLE STRING COLON STRING ARROW STRING RANGLE { $$ = create_inline_block($2, $4, $6); } ; char_class : simple_char_class | negated_char_class | char_class_diff ; negated_char_class : NOT simple_char_class { invert_charclass($2); $$ = $2; } ; char_class_diff : simple_char_class NOT simple_char_class { diff_charclasses($1, $3); free_charclass($3); $$ = $1; } ; simple_char_class : LSQUARE { curcharclass = new_charclass(); } cc_body RSQUARE { $$ = curcharclass; curcharclass = NULL; } | LSQUARE_CARET { curcharclass = new_charclass(); } cc_body RSQUARE { $$ = curcharclass; invert_charclass($$); curcharclass = NULL; } ; cc_body : CHAR { add_singleton_to_charclass(curcharclass, $1); } | CHAR HYPHEN CHAR { add_range_to_charclass(curcharclass, $1, $3); } | cc_body CHAR { add_singleton_to_charclass(curcharclass, $2); } | cc_body CHAR HYPHEN CHAR { add_range_to_charclass(curcharclass, $2, $4); } ; attr_decl : ATTR simple_attr_seq | ATTR STRING COLON expr { define_attr(current_evaluator, $2, $4, 0); } | EARLY ATTR early_attr_seq | EARLY ATTR STRING COLON expr { define_attr(current_evaluator, $3, $5, 1); } | DEFATTR STRING { define_defattr(current_evaluator, $2); } | TYPE STRING { define_type(current_evaluator, $2); } ; simple_attr_seq : STRING { define_attr(current_evaluator, $1, NULL, 0); } | simple_attr_seq COMMA STRING { define_attr(current_evaluator, $3, NULL, 0); } ; early_attr_seq : STRING { define_attr(current_evaluator, $1, NULL, 1); } | early_attr_seq COMMA STRING { define_attr(current_evaluator, $3, NULL, 1); } ; group_decl : GROUP STRING LBRACE { current_evaluator = start_evaluator($2); } attr_decl_seq RBRACE { current_evaluator = NULL; } ; attr_decl_seq : /* empty */ | attr_decl_seq attr_decl ; tag_decl : TAG STRING EQUAL expr { define_tag($2, $4); } ; entrystruct_decl : ENTRYSTRUCT STRING STRING { define_entrystruct($2, $3); } ; expr : NOT expr { $$ = 
new_not_expr($2); } | expr AND expr { $$ = new_and_expr($1, $3); } | expr PIPE /* OR */ expr { $$ = new_or_expr($1, $3); } | expr XOR expr { $$ = new_xor_expr($1, $3); } | expr QUERY expr COLON expr { $$ = new_cond_expr($1, $3, $5); } | LPAREN expr RPAREN { $$ = $2; } | STRING { $$ = new_tag_expr($1); } ; /* vim:et */ mairix-master/dfasyn/scan.l000066400000000000000000000112211224450623700162270ustar00rootroot00000000000000/********************************************************************** Lexical analyser definition for input files defining an NFA *********************************************************************/ /* ********************************************************************** * Copyright (C) Richard P. Curnow 2001-2003,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
* ********************************************************************** */ %{ #include "dfasyn.h" #include "parse.h" /* yyunput() not used - define this to avoid compiler warnings */ #define YY_NO_UNPUT int lineno = 1; %} %x PASSTHRU %x STR %x CHARCLASS %% STATE|State|state { return STATE; } ABBREV|Abbrev|abbrev { return ABBREV; } DEFINE|Define|define { return DEFINE; } TOKENS|Tokens|tokens { return TOKENS; } PREFIX|Prefix|prefix { return PREFIX; } BLOCK|Block|block { return BLOCK; } TYPE|Type|type { return TYPE; } ENTRY|Entry|entry { return ENTRY; } ENTRYSTRUCT { return ENTRYSTRUCT; } EntryStruct { return ENTRYSTRUCT; } Entrystruct { return ENTRYSTRUCT; } entrystruct { return ENTRYSTRUCT; } ATTR|Attr|attr { return ATTR; } EARLY|Early|early { return EARLY; } DEFATTR|DefAttr { return DEFATTR; } Defattr|defattr { return DEFATTR; } TAG|Tag|tag { return TAG; } GROUP|Group|group { return GROUP; } [A-Za-z0-9_.]+ { yylval.s = new_string(yytext); return STRING; } \#.*$ { /* strip comments */ } \-\> { return ARROW; } = { return EQUAL; } \| { return PIPE; /* OR */ } \& { return AND; } \~ { return NOT; } \! { return NOT; } \^ { return XOR; } \? { return QUERY; } \: { return COLON; } \; { return SEMICOLON; } \( { return LPAREN; } \) { return RPAREN; } \{ { return LBRACE; } \} { return RBRACE; } \< { return LANGLE; } \> { return RANGLE; } \[ { BEGIN CHARCLASS; return LSQUARE; } \[\^ { BEGIN CHARCLASS; return LSQUARE_CARET; } \, { return COMMA; } \n { lineno++; } [ \t]+ { /* ignore */ } ^\%\{[ \t]*\n { BEGIN PASSTHRU; } \" { BEGIN STR; } . 
{ printf("Unmatched input <%s> at line %d\n", yytext, lineno); exit (1); } <PASSTHRU>^\%\}[ \t]*\n { BEGIN INITIAL; } <PASSTHRU>\n { fputs(yytext, yyout); lineno++; } <PASSTHRU>.+ { fputs(yytext, yyout); } <STR>\" { BEGIN INITIAL; } <STR>[^"]* { yylval.s = new_string(yytext); return STRING; } <CHARCLASS>\] { BEGIN INITIAL; return RSQUARE; } <CHARCLASS>\- { return HYPHEN; } <CHARCLASS>\\- { yylval.c = '-'; return CHAR; } <CHARCLASS>\\] { yylval.c = ']'; return CHAR; } <CHARCLASS>\\^ { yylval.c = '^'; return CHAR; } <CHARCLASS>\\n { yylval.c = '\n'; return CHAR; } <CHARCLASS>\\r { yylval.c = '\r'; return CHAR; } <CHARCLASS>\\f { yylval.c = '\f'; return CHAR; } <CHARCLASS>\\t { yylval.c = '\t'; return CHAR; } <CHARCLASS>\\\\ { yylval.c = '\\'; return CHAR; } <CHARCLASS>\^[@A-Z] { yylval.c = yytext[1] - '@'; return CHAR; } <CHARCLASS>\\x[0-9a-fA-F][0-9a-fA-F] { unsigned int foo; sscanf(yytext+2,"%x",&foo); yylval.c = (char) foo; return CHAR; } <CHARCLASS>\\[0-7][0-7][0-7] { unsigned int foo; sscanf(yytext+1,"%o",&foo); yylval.c = (char) foo; return CHAR; } <CHARCLASS>. { yylval.c = yytext[0]; return CHAR; } %{ /* vim:et */ %} mairix-master/dfasyn/states.c000066400000000000000000000214231224450623700166020ustar00rootroot00000000000000/*************************************** Handle state-related stuff ***************************************/ /* ********************************************************************** * Copyright (C) Richard P. Curnow 2000-2003,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
* ********************************************************************** */ #include "dfasyn.h" static void maybe_grow_states(Block *b, int hash)/*{{{*/ { Stateset *ss = b->state_hash + hash; if (ss->nstates == ss->maxstates) { ss->maxstates += 8; ss->states = resize_array(State*, ss->states, ss->maxstates); } if (b->nstates == b->maxstates) { b->maxstates += 32; b->states = resize_array(State*, b->states, b->maxstates); } } /*}}}*/ static unsigned long hashfn(const char *s)/*{{{*/ { unsigned long y = 0UL, v, w, x, k; const char *t = s; while (1) { k = (unsigned long) *(unsigned char *)(t++); if (!k) break; v = ~y; w = y<<13; x = v>>6; y = w ^ x; y += k; } y ^= (y>>13); y &= HASH_MASK; return y; } /*}}}*/ static State * create_state(Block *b, char *name)/*{{{*/ { State *result; int hash; Stateset *ss; hash = hashfn(name); maybe_grow_states(b, hash); ss = b->state_hash + hash; result = b->states[b->nstates++] = ss->states[ss->nstates++] = new(State); result->name = new_string(name); result->parent = b; result->index = b->nstates - 1; result->transitions = NULL; result->tags = NULL; result->entries = NULL; result->ordered_trans = NULL; result->n_transitions = 0; result->removed = 0; return result; } /*}}}*/ State * lookup_state(Block *b, char *name, int create)/*{{{*/ { State *found = NULL; int i; int hash; Stateset *ss; hash = hashfn(name); ss = b->state_hash + hash; for (i=0; i<ss->nstates; i++) { if (!strcmp(ss->states[i]->name, name)) { found = ss->states[i]; break; } } switch (create) { case USE_OLD_MUST_EXIST: if (!found) { fprintf(stderr, "Could not find a state '%s' in block '%s' to transition to\n", name, b->name); exit(1); } break; case CREATE_MUST_NOT_EXIST: if (found) { fprintf(stderr, "Warning : already have a state '%s' in block '%s'\n", name, b->name); } else { found = create_state(b, name); } break; case CREATE_OR_USE_OLD: if (!found) { found = create_state(b, name); } break; } return found; } /*}}}*/ void add_entry_to_state(State *curstate, const char
*entry_tag)/*{{{*/ { struct Entrylist *new_entries = new(struct Entrylist); new_entries->entry_name = new_string(entry_tag); new_entries->state = curstate; new_entries->next = entries; entries = new_entries; curstate->entries = add_string_to_list(curstate->entries, entry_tag); } /*}}}*/ /* ================================================================= */ static void add_transition(Block *curblock, State *curstate, Stimulus *stimulus, char *destination); /* ================================================================= */ Stringlist * add_string_to_list(Stringlist *existing, const char *token)/*{{{*/ { Stringlist *result = new(Stringlist); if (token) { result->string = new_string(token); } else { result->string = NULL; } result->next = existing; return result; } /*}}}*/ static TransList *new_translist(struct TransList *existing, char *destination)/*{{{*/ { TransList *result; result = new(TransList); result->next = existing; result->ds_name = new_string(destination); return result; } /*}}}*/ static void add_epsilon_transition(State *curstate, char *destination)/*{{{*/ { TransList *tl = new_translist(curstate->transitions, destination); tl->type = TT_EPSILON; curstate->transitions = tl; } /*}}}*/ static void add_token_transition(State *curstate, int token, char *destination)/*{{{*/ { TransList *tl = new_translist(curstate->transitions, destination); tl->type = TT_TOKEN; tl->x.token = token; curstate->transitions = tl; } /*}}}*/ static void add_abbrev_transition(Block *curblock, State *curstate, struct Abbrev *abbrev, char *destination)/*{{{*/ { StimulusList *stimuli; for (stimuli = abbrev->stimuli; stimuli; stimuli = stimuli->next) { add_transition(curblock, curstate, stimuli->stimulus, destination); } } /*}}}*/ static void add_inline_block_transition(Block *curblock, State *curstate, InlineBlock *ib, char *destination)/*{{{*/ { char block_name[1024]; char input_name[1024]; char output_name[1024]; State *output_state; sprintf(block_name, "%s#%d", ib->type, 
curblock->subblockcount++); instantiate_block(curblock, ib->type, block_name); sprintf(input_name, "%s.%s", block_name, ib->in); sprintf(output_name, "%s.%s", block_name, ib->out); output_state = lookup_state(curblock, output_name, CREATE_OR_USE_OLD); add_epsilon_transition(curstate, input_name); add_epsilon_transition(output_state, destination); } /*}}}*/ static void add_char_class_transition(State *curstate, CharClass *cc, char *destination)/*{{{*/ { TransList *tl = new_translist(curstate->transitions, destination); tl->type = TT_CHARCLASS; tl->x.char_class = cc; curstate->transitions = tl; } /*}}}*/ static void add_transition(Block *curblock, State *curstate, Stimulus *stimulus, char *destination)/*{{{*/ /* Add a single transition to the state. Allow definitions to be recursive */ { switch (stimulus->type) { case T_EPSILON: add_epsilon_transition(curstate, destination); break; case T_TOKEN: add_token_transition(curstate, stimulus->x.token, destination); break; case T_ABBREV: add_abbrev_transition(curblock, curstate, stimulus->x.abbrev, destination); break; case T_INLINEBLOCK: add_inline_block_transition(curblock, curstate, stimulus->x.inline_block, destination); break; case T_CHARCLASS: add_char_class_transition(curstate, stimulus->x.char_class, destination); break; } } /*}}}*/ void add_transitions(Block *curblock, State *curstate, StimulusList *stimuli, char *destination)/*{{{*/ { StimulusList *sl; for (sl=stimuli; sl; sl=sl->next) { add_transition(curblock, curstate, sl->stimulus, destination); } } /*}}}*/ State * add_transitions_to_internal(Block *curblock, State *addtostate, StimulusList *stimuli)/*{{{*/ { char buffer[1024]; State *result; sprintf(buffer, "#%d", curblock->subcount++); result = lookup_state(curblock, buffer, CREATE_MUST_NOT_EXIST); add_transitions(curblock, addtostate, stimuli, result->name); return result; } /*}}}*/ void add_tags(State *curstate, Stringlist *sl)/*{{{*/ { if (curstate->tags) { /* If we already have some, stick them on the end 
of the new list */ Stringlist *xsl = sl; while (xsl->next) xsl = xsl->next; xsl->next = curstate->tags; } curstate->tags = sl; } /*}}}*/ /* ================================================================= */ void fixup_state_refs(Block *b)/*{{{*/ { int i; for (i=0; i<b->nstates; i++) { State *s = b->states[i]; TransList *tl; for (tl=s->transitions; tl; tl=tl->next) { tl->ds_ref = lookup_state(b, tl->ds_name, CREATE_OR_USE_OLD); } } } /*}}}*/ /* ================================================================= */ void expand_charclass_transitions(Block *b)/*{{{*/ { int i; for (i=0; i<b->nstates; i++) { State *s = b->states[i]; TransList *tl; for (tl=s->transitions; tl; tl=tl->next) { if (tl->type == TT_CHARCLASS) { int i, first; CharClass *cc = tl->x.char_class; first = 1; for (i=0; i<256; i++) { /* Insert separate transitions for each subclass of the charclass */ if (cc_test_bit(cc->group_bitmap, i)) { if (first) { tl->type = TT_TOKEN; tl->x.token = ntokens + i; } else { TransList *ntl = new(TransList); ntl->next = tl->next; ntl->ds_name = new_string(tl->ds_name); ntl->ds_ref = tl->ds_ref; ntl->type = TT_TOKEN; ntl->x.token = ntokens + i; tl->next = ntl; } first = 0; } } } } } } /*}}}*/ /* ================================================================= */ mairix-master/dfasyn/stimulus.c000066400000000000000000000043741224450623700171720ustar00rootroot00000000000000/*************************************** Handle stimulus-related stuff ***************************************/ /* ********************************************************************** * Copyright (C) Richard P. Curnow 2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #include "dfasyn.h" Stimulus *stimulus_from_epsilon(void)/*{{{*/ { Stimulus *result; result = new(Stimulus); result->type = T_EPSILON; return result; } /*}}}*/ Stimulus *stimulus_from_string(char *str)/*{{{*/ { struct Abbrev *abbrev; Stimulus *result; result = new(Stimulus); /* See if an abbrev exists with the name */ abbrev = lookup_abbrev(str); if (abbrev) { result->type = T_ABBREV; result->x.abbrev = abbrev; } else { /* Token */ int token; token = lookup_token(str, USE_OLD_MUST_EXIST); /* lookup_token will have bombed if it wasn't found. */ result->type = T_TOKEN; result->x.token = token; } return result; } /*}}}*/ Stimulus *stimulus_from_inline_block(InlineBlock *block)/*{{{*/ { Stimulus *result; result = new(Stimulus); result->type = T_INLINEBLOCK; result->x.inline_block = block; return result; } /*}}}*/ Stimulus *stimulus_from_char_class(CharClass *char_class)/*{{{*/ { Stimulus *result; result = new(Stimulus); result->type = T_CHARCLASS; result->x.char_class = char_class; return result; } /*}}}*/ StimulusList *append_stimulus_to_list(StimulusList *existing, Stimulus *stim)/*{{{*/ { StimulusList *result; result = new(StimulusList); result->next = existing; result->stimulus = stim; return result; } /*}}}*/ mairix-master/dfasyn/tabcompr.c000066400000000000000000000127601224450623700171120ustar00rootroot00000000000000/*************************************** Routines to compress the DFA transition tables, by identifying where two DFA states have a lot of transitions the same. ***************************************/ /* ********************************************************************** * Copyright (C) Richard P. 
Curnow 2001-2003,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #include "dfasyn.h" /* ================================================================= */ /* Treat 'x' as a set of 16 bit pairs, with field (0..15) specifying which. Increment the field'th bit pair as a gray code, in the pattern 00->01->11->10->00 */ unsigned long increment(unsigned long x, int field) { int f2 = field + field; static unsigned char transxor[4] = {1, 2, 2, 1}; unsigned long g = x >> f2; unsigned long h = transxor[g&3]; return x ^ (h<<f2); } /* ================================================================= */ /* Count the number of bits set in 'x' */ unsigned long count_bits_set(unsigned long x) { unsigned long y = x; unsigned long c; c = 0x55555555UL; y = ((y>>1) & c) + (y & c); c = 0x33333333UL; y = ((y>>2) & c) + (y & c); y = (y>>4) + y; c = 0x0f0f0f0fUL; y &= c; y = (y>>8) + y; y = (y>>16) + y; return y & 0x1f; } /* ================================================================= */ /* Compute 'signatures' of the transitions out of a particular state. The signature is given by considering the destination state numbers mod 16, and counting how many transitions there are in each resulting equivalence class. The number is encoded using the gray code implied by the increment fn. 
*/ static void compute_transition_sigs(struct DFA *dfa, int ntokens) { int i, j; for (i=0; i<dfa->n; i++) { unsigned long ts = 0UL; /* transition signature */ for (j=0; j<ntokens; j++) { unsigned long dest = dfa->s[i]->map[j]; dest &= 0xf; /* 16 bit pairs in 'ts' */ ts = increment(ts, dest); } dfa->s[i]->transition_sig = ts; } } /* ================================================================= */ #define REQUIRED_BENEFIT 2 static void find_default_states(struct DFA *dfa, int ntokens) { int i, j, t; int best_index; int best_diff; int trans_count; /* Number of transitions in working state */ unsigned long tsi; for (i=0; i<dfa->n; i++) { trans_count = 0; for (t=0; t<ntokens; t++) { if (dfa->s[i]->map[t] >= 0) trans_count++; } dfa->s[i]->defstate = -1; /* not defaulted */ best_index = -1; best_diff = ntokens + 1; /* Worse than any computed value */ tsi = dfa->s[i]->transition_sig; for (j=0; j<i; j++) { unsigned long tsj; unsigned long sigdiff; int diffsize; if (dfa->s[j]->defstate >= 0) continue; /* Avoid chains of defstates */ tsj = dfa->s[j]->transition_sig; /* This is the heart of the technique : if we xor two vectors of bit pairs encoded with the gray code above, and count the number of bits set in the result, we get the sum of absolute differences of the bit pairs. The number of outgoing transitions that differ between the states must be _at_least_ this value. It may in fact be much greater (i.e. we may get 'false matches'). However, this algorithm is a quick way of filtering most of the useless potential default states out. */ sigdiff = tsi ^ tsj; diffsize = count_bits_set(sigdiff); if (diffsize >= best_diff) continue; if (diffsize >= trans_count) continue; /* Else pointless! */ /* Otherwise, do an exact check (i.e. see how much false matching we suffered). 
*/ diffsize = 0; for (t=0; t<ntokens; t++) { if (dfa->s[i]->map[t] != dfa->s[j]->map[t]) { diffsize++; } } if (((best_index < 0) || (diffsize < best_diff)) && (diffsize < (trans_count - REQUIRED_BENEFIT))) { best_index = j; best_diff = diffsize; } } dfa->s[i]->defstate = best_index; dfa->s[i]->best_diff = best_diff; } } /* ================================================================= */ void compress_transition_table(struct DFA *dfa, int ntokens) { compute_transition_sigs(dfa, ntokens); find_default_states(dfa, ntokens); } /* ================================================================= */ #ifdef TEST int main () { unsigned long x = 0; unsigned long x1, x2, x3, x4; x1 = increment(x, 2); x2 = increment(x1, 2); x3 = increment(x2, 2); x4 = increment(x3, 2); printf("%lu %lu %lu %lu %lu\n", x, x1, x2, x3, x4); printf("1=%d\n", count_bits_set(0x00000001)); printf("2=%d\n", count_bits_set(0x00000003)); printf("3=%d\n", count_bits_set(0x00000007)); printf("4=%d\n", count_bits_set(0x0000000f)); printf("4=%d\n", count_bits_set(0xf0000000)); return 0; } #endif mairix-master/dfasyn/tokens.c000066400000000000000000000042011224450623700165750ustar00rootroot00000000000000/*************************************** Handle token-related stuff ***************************************/ /* ********************************************************************** * Copyright (C) Richard P. Curnow 2000-2003,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. 
* * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #include "dfasyn.h" char **toktable=NULL; int ntokens = 0; static int maxtokens = 0; /* ================================================================= */ static void grow_tokens(void)/*{{{*/ { maxtokens += 32; toktable = resize_array(char *, toktable, maxtokens); } /*}}}*/ static int create_token(char *name)/*{{{*/ { int result; if (ntokens == maxtokens) { grow_tokens(); } result = ntokens++; toktable[result] = new_string(name); return result; } /*}}}*/ int lookup_token(char *name, int create)/*{{{*/ { int found = -1; int i; for (i=0; i= 0) { fprintf(stderr, "Token '%s' already declared\n", name); exit(1); } else { found = create_token(name); } break; case CREATE_OR_USE_OLD: if (found < 0) { found = create_token(name); } break; } return found; } /*}}}*/ mairix-master/dirscan.c000066400000000000000000000247561224450623700154520ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. Curnow 2002,2003,2004,2005,2006,2007 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. 
* * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ /* Traverse a directory tree and find maildirs, then list files in them. */ #include #include #include #include #include #include #include "mairix.h" struct msgpath_array *new_msgpath_array(void)/*{{{*/ { struct msgpath_array *result; result = new(struct msgpath_array); result->paths = NULL; result->type = NULL; result->n = 0; result->max = 0; return result; } /*}}}*/ void free_msgpath_array(struct msgpath_array *x)/*{{{*/ { int i; if (x->paths) { for (i=0; i<x->n; i++) { switch (x->type[i]) { case MTY_FILE: free(x->paths[i].src.mpf.path); break; case MTY_MBOX: break; case MTY_DEAD: break; } } free(x->type); free(x->paths); } free(x); } /*}}}*/ static void add_file_to_list(char *x, struct msgpath_array *arr) {/*{{{*/ char *y = new_string(x); if (arr->n == arr->max) { arr->max += 1024; arr->paths = grow_array(struct msgpath, arr->max, arr->paths); arr->type = grow_array(enum message_type, arr->max, arr->type); } arr->type[arr->n] = MTY_FILE; arr->paths[arr->n].src.mpf.path = y; ++arr->n; return; } /*}}}*/ static void get_maildir_message_paths(char *folder, struct msgpath_array *arr)/*{{{*/ { char *subdir, *fname; int i; static char *subdirs[] = {"new", "cur"}; DIR *d; struct dirent *de; int folder_len = strlen(folder); /* FIXME : just store mdir-rooted paths in array and have common prefix elsewhere. */ subdir = new_array(char, folder_len + 6); fname = new_array(char, folder_len + 8 + NAME_MAX); for (i=0; i<2; i++) { strcpy(subdir, folder); strcat(subdir, "/"); strcat(subdir, subdirs[i]); d = opendir(subdir); if (d) { while ((de = readdir(d))) { /* TODO : Perhaps we ought to do some validation on the path here? i.e. check that the filename looks valid for a maildir message. 
*/ if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, "..")) { continue; } strcpy(fname, subdir); strcat(fname, "/"); strcat(fname, de->d_name); add_file_to_list(fname, arr); } closedir(d); } } free(subdir); free(fname); return; } /*}}}*/ int valid_mh_filename_p(const char *x)/*{{{*/ { const char *p; if (!*x) return 0; /* Must not be empty */ p = x; while (*p) { if (!isdigit(*p)) { /* Handle MH folders generated by Evolution, which have '.' on the ends * of the numerical filenames for the messages. */ if ((p[0] == '.') && (p[1] == 0)) return 1; else return 0; } p++; } return 1; } /*}}}*/ static void get_mh_message_paths(char *folder, struct msgpath_array *arr)/*{{{*/ { char *fname; DIR *d; struct dirent *de; int folder_len = strlen(folder); fname = new_array(char, folder_len + 8 + NAME_MAX); d = opendir(folder); if (d) { while ((de = readdir(d))) { if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, "..")) { continue; } strcpy(fname, folder); strcat(fname, "/"); strcat(fname, de->d_name); if (valid_mh_filename_p(de->d_name)) { add_file_to_list(fname, arr); } } closedir(d); } free(fname); return; } /*}}}*/ static int child_stat(const char *base, const char *child, struct stat *sb)/*{{{*/ { int result = 0; char *scratch; int len; len = strlen(base) + strlen(child) + 2; scratch = new_array(char, len); strcpy(scratch, base); strcat(scratch, "/"); strcat(scratch, child); result = stat(scratch, sb); free(scratch); return result; } /*}}}*/ static int has_child_file(const char *base, const char *child)/*{{{*/ { int result = 0; int status; struct stat sb; status = child_stat(base, child, &sb); if ((status >= 0) && S_ISREG(sb.st_mode)) { result = 1; } return result; } /*}}}*/ static int has_child_dir(const char *base, const char *child)/*{{{*/ { int result = 0; int status; struct stat sb; status = child_stat(base, child, &sb); if ((status >= 0) && S_ISDIR(sb.st_mode)) { result = 1; } return result; } /*}}}*/ static enum traverse_check scrutinize_maildir_entry(int 
parent_is_maildir, const char *de_name)/*{{{*/ { if (parent_is_maildir) { /* Process any subdirectory that's not part of this maildir itself. */ if (!strcmp(de_name, "new") || !strcmp(de_name, "cur") || !strcmp(de_name, "tmp")) { return TRAV_IGNORE; } else { return TRAV_PROCESS; } } else { return TRAV_PROCESS; } } /*}}}*/ static int filter_is_maildir(const char *path, const struct stat *sb)/*{{{*/ { if (S_ISDIR(sb->st_mode)) { if (has_child_dir(path, "new") && has_child_dir(path, "tmp") && has_child_dir(path, "cur")) { return 1; } } return 0; } /*}}}*/ struct traverse_methods maildir_traverse_methods = {/*{{{*/ .filter = filter_is_maildir, .scrutinize = scrutinize_maildir_entry }; /*}}}*/ static enum traverse_check scrutinize_mh_entry(int parent_is_mh, const char *de_name)/*{{{*/ { /* Have to allow sub-folders within a folder until we think of a better * solution. */ if (valid_mh_filename_p(de_name)) { return TRAV_IGNORE; } else { return TRAV_PROCESS; } } /*}}}*/ static int filter_is_mh(const char *path, const struct stat *sb)/*{{{*/ { int result = 0; if (S_ISDIR(sb->st_mode)) { /* TODO : find a way of making this more scalable? e.g. if a folder of a * particular subtype is found once, try that subtype first later, since * the user presumably uses a consistent MH-subtype (i.e. a single MUA). 
*/ if (has_child_file(path, ".xmhcache") || has_child_file(path, ".mh_sequences") || /* Sylpheed */ has_child_file(path, ".sylpheed_cache") || has_child_file(path, ".sylpheed_mark") || /* claws-mail */ has_child_file(path, ".claws_cache") || has_child_file(path, ".claws_mark") || /* NNML (Gnus) */ has_child_file(path, ".marks") || has_child_file(path, ".overview") || /* Evolution */ has_child_file(path, "cmeta") || has_child_file(path, "summary") || /* Mew */ has_child_file(path, ".mew-summary") || /* ezmlm/archive */ has_child_file(path, "index") ) { result = 1; } } return result; } /*}}}*/ struct traverse_methods mh_traverse_methods = {/*{{{*/ .filter = filter_is_mh, .scrutinize = scrutinize_mh_entry }; /*}}}*/ #if 0 static void scan_directory(char *folder_base, char *this_folder, enum folder_type ft, struct msgpath_array *arr)/*{{{*/ { DIR *d; struct dirent *de; struct stat sb; char *fname, *sname; char *name; int folder_base_len = strlen(folder_base); int this_folder_len = strlen(this_folder); name = new_array(char, folder_base_len + this_folder_len + 2); strcpy(name, folder_base); strcat(name, "/"); strcat(name, this_folder); switch (ft) { case FT_MAILDIR: if (looks_like_maildir(folder_base, this_folder)) { get_maildir_message_paths(folder_base, this_folder, arr); } break; case FT_MH: get_mh_message_paths(folder_base, this_folder, arr); break; default: break; } fname = new_array(char, strlen(name) + 2 + NAME_MAX); sname = new_array(char, this_folder_len + 2 + NAME_MAX); d = opendir(name); if (d) { while ((de = readdir(d))) { if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, "..")) { continue; } strcpy(fname, name); strcat(fname, "/"); strcat(fname, de->d_name); strcpy(sname, this_folder); strcat(sname, "/"); strcat(sname, de->d_name); if (stat(fname, &sb) >= 0) { if (S_ISDIR(sb.st_mode)) { scan_directory(folder_base, sname, ft, arr); } } } closedir(d); } free(fname); free(sname); free(name); return; } /*}}}*/ #endif static int message_compare(const void *a, 
const void *b)/*{{{*/ { /* FIXME : Is this a sensible way to do this with mbox messages in the picture? */ struct msgpath *aa = (struct msgpath *) a; struct msgpath *bb = (struct msgpath *) b; /* This should only get called on 'file' type messages - TBC! */ return strcmp(aa->src.mpf.path, bb->src.mpf.path); } /*}}}*/ static void sort_message_list(struct msgpath_array *arr)/*{{{*/ { qsort(arr->paths, arr->n, sizeof(struct msgpath), message_compare); } /*}}}*/ /*{{{ void build_message_list */ void build_message_list(char *folder_base, char *folders, enum folder_type ft, struct msgpath_array *msgs, struct globber_array *omit_globs) { char **raw_paths, **paths; int n_raw_paths, n_paths, i; split_on_colons(folders, &n_raw_paths, &raw_paths); switch (ft) { case FT_MAILDIR: glob_and_expand_paths(folder_base, raw_paths, n_raw_paths, &paths, &n_paths, &maildir_traverse_methods, omit_globs); for (i=0; in; i++) { printf("%08lx %s\n", arr->paths[i].mtime, arr->paths[i].path); } free_msgpath_array(arr); return 0; } #endif mairix-master/dotlock.c000066400000000000000000000057331224450623700154600ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. Curnow 2005 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
* ********************************************************************** */ #include "mairix.h" #include #include #include #include static char *lock_file_name = NULL; /* This locking code was originally written for tdl */ void lock_database(char *path, int forced_unlock)/*{{{*/ { struct utsname uu; struct passwd *pw; int pid; int len; char *tname; struct stat sb; FILE *out; if (uname(&uu) < 0) { perror("uname"); exit(1); } pw = getpwuid(getuid()); if (!pw) { perror("getpwuid"); exit(1); } pid = getpid(); len = 1 + strlen(path) + 5; lock_file_name = new_array(char, len); sprintf(lock_file_name, "%s.lock", path); if (forced_unlock) { unlock_database(); forced_unlock = 0; } len += strlen(uu.nodename); /* add on max width of pid field (allow up to 32 bit pid_t) + 2 '.' chars */ len += (10 + 2); tname = new_array(char, len); sprintf(tname, "%s.%d.%s", lock_file_name, pid, uu.nodename); out = fopen(tname, "w"); if (!out) { fprintf(stderr, "Cannot open lock file %s for writing\n", tname); exit(1); } fprintf(out, "%d,%s,%s\n", pid, uu.nodename, pw->pw_name); fclose(out); if (link(tname, lock_file_name) < 0) { /* check if link count==2 */ if (stat(tname, &sb) < 0) { fprintf(stderr, "Could not stat the lock file\n"); unlink(tname); exit(1); } else { if (sb.st_nlink != 2) { FILE *in; in = fopen(lock_file_name, "r"); if (in) { char line[2048]; fgets(line, sizeof(line), in); line[strlen(line)-1] = 0; /* strip trailing newline */ fprintf(stderr, "Database %s appears to be locked by (pid,node,user)=(%s)\n", path, line); unlink(tname); exit(1); } } else { /* lock succeeded apparently */ } } } else { /* lock succeeded apparently */ } unlink(tname); free(tname); return; } /*}}}*/ void unlock_database(void)/*{{{*/ { if (lock_file_name) unlink(lock_file_name); return; } /*}}}*/ void unlock_and_exit(int code)/*{{{*/ { unlock_database(); exit(code); } /*}}}*/ 
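The locking scheme in lock_database() above can be sketched in isolation. This is a minimal illustration, not the mairix code: the names try_dotlock and release_dotlock are hypothetical, the temporary file name carries only the pid (the original also appends the node name), and the fixed-size buffers are a simplifying assumption. The idea shown is the one the original relies on: write a uniquely named file, link() it to the shared lock name, and if link() reports failure, treat a link count of 2 on the unique file as proof that the link was in fact made.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of mairix): try to take "<path>.lock".
 * Returns 1 if the lock was acquired, 0 if it is already held,
 * and -1 on an unexpected error. */
static int try_dotlock(const char *path)
{
  char lock_name[1024];
  char tmp_name[1100];
  struct stat sb;
  FILE *out;
  int result;
  int n;

  n = snprintf(lock_name, sizeof(lock_name), "%s.lock", path);
  if (n < 0 || n >= (int) sizeof(lock_name)) return -1;

  /* Unique name per process; the original also appends the node name. */
  snprintf(tmp_name, sizeof(tmp_name), "%s.%ld", lock_name, (long) getpid());

  out = fopen(tmp_name, "w");
  if (!out) return -1;
  fprintf(out, "%ld\n", (long) getpid());
  fclose(out);

  if (link(tmp_name, lock_name) == 0) {
    result = 1;                      /* link made: we hold the lock */
  } else if (stat(tmp_name, &sb) < 0) {
    result = -1;                     /* cannot inspect our own file */
  } else {
    /* link() can report failure even though the link was created
     * (e.g. over NFS), so the link count is the deciding evidence. */
    result = (sb.st_nlink == 2) ? 1 : 0;
  }

  unlink(tmp_name);                  /* the unique name is no longer needed */
  return result;
}

/* Hypothetical helper: drop the lock taken by try_dotlock(). */
static void release_dotlock(const char *path)
{
  char lock_name[1024];
  int n = snprintf(lock_name, sizeof(lock_name), "%s.lock", path);
  if (n > 0 && n < (int) sizeof(lock_name)) unlink(lock_name);
}
```

A caller would invoke try_dotlock() before opening the database for writing and release_dotlock() on every exit path, which is the role unlock_and_exit() plays in the original.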
mairix-master/dotmairixrc.eg000066400000000000000000000026051224450623700165120ustar00rootroot00000000000000####################################################################### # # Example ~/.mairixrc file # # Any line starting with # is a comment. # ####################################################################### # Set this to the directory where your maildir folders live base=/home/richard/mail ####################################################################### # You need to define at least one of maildir, mh and mbox. You probably don't # need to define all three! You can use >1 line for any of these. # Set this to a list of maildir folders within 'base'. 3 dots at the end means # there are sub-folders within this folder. maildir=inbox:archive... maildir=lists... # Set this to a list of MH folders within 'base'. 3 dots at the end means # there are sub-folders within this folder. mh=mh_archive... # Set this to a list of mbox folders within 'base'. mbox=mboxen/folder1:mboxen/folder2:mboxen/foobar ####################################################################### # Set this to the folder within 'base' where you want the search mode # to write its output. mfolder=mfolder # Set this if you want the format of mfolder to be mh or mbox (the default is # maildir). # # mformat=mh # mformat=mbox ####################################################################### # Set this to the path where the index database file will be kept database=/home/richard/mail/mairix_database mairix-master/dumper.c000066400000000000000000000107551224450623700153150ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. Curnow 2004, 2005 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. 
* * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ /* Database dumper */ #include #include #include #include #include #include #include #include #include #include #include "mairix.h" #include "reader.h" #include "memmac.h" static void dump_token_chain(struct read_db *db, unsigned int n, unsigned int *tok_offsets, unsigned int *enc_offsets) { int i, j, incr; int on_line; unsigned char *foo; printf("%d entries\n", n); for (i=0; i\n", i, db->data + tok_offsets[i]); foo = (unsigned char *) db->data + enc_offsets[i]; j = 0; on_line = 0; printf(" "); while (*foo != 0xff) { if (on_line > 15) { printf("\n"); on_line = 0; } incr = read_increment(&foo); j += incr; printf("%d ", j); on_line++; } printf("\n"); } } static void dump_toktable(struct read_db *db, struct toktable_db *tbl, const char *title) { printf("Contents of <%s> table\n", title); dump_token_chain( db, tbl->n, tbl->tok_offsets, tbl->enc_offsets); } static void dump_toktable2(struct read_db *db, struct toktable2_db *tbl, const char *title) { unsigned int n; n = tbl->n; printf("Contents of <%s> table\n", title); printf("Chain 0\n"); dump_token_chain( db, n, tbl->tok_offsets, tbl->enc0_offsets); printf("Chain 1\n"); dump_token_chain( db, n, tbl->tok_offsets, tbl->enc1_offsets); } void dump_database(char *filename) { struct read_db *db; int i; db = open_db(filename); printf("Dump of %s\n", filename); printf("%d messages\n", db->n_msgs); for (i=0; in_msgs; i++) { printf("%6d: ", i); switch (rd_msg_type(db, i)) { case DB_MSG_DEAD: printf("DEAD"); 
break; case DB_MSG_FILE: printf("FILE %s, size=%d, tid=%d", db->data + db->path_offsets[i], db->size_table[i], db->tid_table[i]); break; case DB_MSG_MBOX: { unsigned int mbix, msgix; decode_mbox_indices(db->path_offsets[i], &mbix, &msgix); printf("MBOX %d, msg %d, offset=%d, size=%d, tid=%d", mbix, msgix, db->mtime_table[i], db->size_table[i], db->tid_table[i]); } break; } if (db->msg_type_and_flags[i] & FLAG_SEEN) printf(" seen"); if (db->msg_type_and_flags[i] & FLAG_REPLIED) printf(" replied"); if (db->msg_type_and_flags[i] & FLAG_FLAGGED) printf(" flagged"); printf("\n"); } printf("\n"); if (db->n_mboxen > 0) { printf("\nMBOX INFORMATION\n"); printf("%d mboxen\n", db->n_mboxen); for (i=0; i<db->n_mboxen; i++) { if (db->mbox_paths_table[i]) { printf("%4d: %d msgs in %s\n", i, db->mbox_entries_table[i], db->data + db->mbox_paths_table[i]); } else { printf("%4d: dead\n", i); } } printf("\n"); } printf("Hash key %08x\n\n", db->hash_key); printf("--------------------------------\n"); dump_toktable(db, &db->to, "To"); printf("--------------------------------\n"); dump_toktable(db, &db->cc, "Cc"); printf("--------------------------------\n"); dump_toktable(db, &db->from, "From"); printf("--------------------------------\n"); dump_toktable(db, &db->subject, "Subject"); printf("--------------------------------\n"); dump_toktable(db, &db->body, "Body"); printf("--------------------------------\n"); dump_toktable(db, &db->attachment_name, "Attachment names"); printf("--------------------------------\n"); dump_toktable2(db, &db->msg_ids, "Message Ids"); printf("--------------------------------\n"); close_db(db); return; } mairix-master/expandstr.c000066400000000000000000000103261224450623700160230ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. 
Curnow 2004 * Copyright (C) Andreas Amann 2010 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #include "mairix.h" #include #include #include #include #include static int isenv(unsigned char x)/*{{{*/ { /* Return true if x is valid as part of an environment variable name. */ if (isalnum(x)) return 1; else if (x == '_') return 1; else return 0; } /*}}}*/ static int home_dir_len(void)/*{{{*/ { struct passwd *foo; char *lookup; lookup = getenv("HOME"); if (lookup) { return strlen(lookup); } foo = getpwuid(getuid()); return strlen(foo->pw_dir); } /*}}}*/ static char *env_lookup(const char *p, const char *q)/*{{{*/ { char *var; char *lookup, *result; char *s; var = new_array(char, (q-p)+1); for (s=var; ppw_dir); strcpy(to, foo->pw_dir); } return to + len; } /*}}}*/ static char *append_env(char *to, const char *p, const char *q)/*{{{*/ { char *foo; int len; foo = env_lookup(p, q); if (foo) { len = strlen(foo); strcpy(to, foo); free(foo); } else { len = 0; } return (to + len); } /*}}}*/ static void do_expand(const char *p, char *result)/*{{{*/ { const char *q; int first; first = 1; while (*p) { if (first && (*p == '~') && (p[1] == '/')) { result = append_home_dir(result); p++; } else if ((*p == '$') && (p[1] == '{')) { p += 2; q = p; while (*q && (*q != '}')) q++; result = append_env(result, p, q); p = *q ? 
(q + 1) : q; } else if (*p == '$') { p++; q = p; while (*q && isenv(*(unsigned char*)q)) q++; result = append_env(result, p, q); p = q; } else { *result++ = *p++; } first = 0; } *result = 0; } /*}}}*/ char *expand_string(const char *p)/*{{{*/ { /* Return a copy of p, but with ~ expanded to the user's home directory $env expanded to the value of that environment variable */ int len; char *result; len = compute_length(p); result = new_array(char, len+1); do_expand(p, result); return result; } /*}}}*/ mairix-master/from.h000066400000000000000000000020121224450623700147540ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. Curnow 2002-2004,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #ifndef _FROM_H #define _FROM_H enum fromcheck_result { FROMCHECK_PASS, FROMCHECK_FAIL }; #endif mairix-master/fromcheck.nfa000066400000000000000000000127761224450623700163110ustar00rootroot00000000000000######################################################################### # # mairix - message index builder and finder for maildir folders. # # Copyright (C) Richard P. 
Curnow 2002-2004,2006 # Copyright (C) Jonathan Kamens 2010 # # This program is free software; you can redistribute it and/or modify # it under the terms of version 2 of the GNU General Public License as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. # # ======================================================================= %{ #include "from.h" %} # Define tokens # CR : \n # DIGIT : [0-9] # AT : @ # COLON : : # WHITE : ' ', \t # LOWER : [a-z] # UPPER : [A-Z] # PLUSMINUS : [+-] # OTHER_EMAIL : other stuff valid in the LHS of an address # DOMAIN : stuff valid in the RHS of an address Abbrev LF = [\n] Abbrev CR = [\r] Abbrev DIGIT = [0-9] Abbrev PERIOD = [.] Abbrev AT = [@] Abbrev LOWER = [a-z] Abbrev UPPER = [A-Z] Abbrev COLON = [:] Abbrev WHITE = [ \t] Abbrev PLUSMINUS = [+\-] # Explained clearly at # http://en.wikipedia.org/wiki/E-mail_address#RFC_specification Abbrev OTHER_EMAIL = [.!#$%&'*/=?^_`{|}~] Abbrev LT = [<] Abbrev GT = [>] Abbrev EMAIL = LOWER | UPPER | DIGIT | PLUSMINUS | OTHER_EMAIL Abbrev OTHER_DOMAIN = [\-_.] 
Abbrev DOMAIN = LOWER | UPPER | DIGIT | OTHER_DOMAIN Abbrev DQUOTE = ["] Abbrev OTHER_QUOTED = [@:<>] Abbrev LEFTSQUARE = [[] Abbrev RIGHTSQUARE = [\]] BLOCK email { STATE in EMAIL -> in, before_at DQUOTE -> quoted_before_at AT -> domain_route STATE domain_route DOMAIN -> domain_route COLON -> in STATE quoted_before_at EMAIL | WHITE | OTHER_QUOTED -> quoted_before_at DQUOTE -> before_at STATE before_at EMAIL -> before_at DQUOTE -> quoted_before_at # Local part only : >=1 characters will suffice, which we've already # matched. -> out AT -> start_of_domain STATE start_of_domain LEFTSQUARE -> dotted_quad DOMAIN -> after_at STATE dotted_quad DIGIT | PERIOD -> dotted_quad RIGHTSQUARE -> out STATE after_at DOMAIN -> after_at, out } BLOCK angled_email { STATE in LT -> in_angles STATE in_angles out> -> before_gt STATE before_gt GT -> out } BLOCK zone { # Make this pretty lenient STATE in UPPER -> zone2 UPPER -> out PLUSMINUS -> zone2 STATE zone2 UPPER | LOWER -> zone2, out DIGIT -> zone2, out } BLOCK date { STATE in WHITE -> in, before_weekday STATE before_weekday UPPER ; LOWER ; LOWER ; WHITE -> after_weekday STATE after_weekday WHITE -> after_weekday UPPER ; LOWER ; LOWER ; WHITE -> after_month STATE after_month WHITE -> after_month DIGIT ; WHITE -> after_day DIGIT ; DIGIT ; WHITE -> after_day STATE after_day WHITE -> after_day # Accept HH:MM:SS DIGIT ; DIGIT ; COLON ; DIGIT ; DIGIT ; COLON ; DIGIT ; DIGIT ; WHITE -> after_time # Accept HH:MM DIGIT ; DIGIT ; COLON ; DIGIT ; DIGIT ; WHITE -> after_time # Allow either 1 or 2 words of timezone STATE after_time WHITE -> after_time -> after_timezone out> ; WHITE -> after_timezone out> ; WHITE -> after_timezone_1 # It appears that Pine puts the timezone after the year DIGIT ; DIGIT ; DIGIT ; DIGIT -> after_year_before_zone STATE after_year_before_zone WHITE -> after_year_before_zone out> -> after_timezone_after_year out> ; WHITE -> after_timezone_after_year_1 STATE after_timezone_after_year_1 WHITE -> 
after_timezone_after_year_1 out> -> after_timezone_after_year STATE after_timezone_after_year WHITE -> after_timezone_after_year -> out STATE after_timezone_1 WHITE -> after_timezone_1 out> ; WHITE -> after_timezone STATE after_timezone WHITE -> after_timezone DIGIT ; DIGIT ; DIGIT ; DIGIT -> after_year STATE after_year WHITE -> after_year -> out } # Assume the earlier code has identified the '\nFrom ' sequence, # and the validator starts scanning from the character beyond the space BLOCK main { STATE in # Real return address. WHITE -> in out> -> before_date out> -> before_date # Cope with Mozilla mbox folder format which just uses a '-' as # the return address field. PLUSMINUS -> before_date # Empty return address -> before_date STATE before_date out> ; LF = FROMCHECK_PASS # Cope with mozilla mbox format out> ; CR ; LF = FROMCHECK_PASS # Mention this state last : the last mentioned state in the last defined # block becomes the entry state of the scanner. STATE in } ATTR FROMCHECK_PASS ATTR FROMCHECK_FAIL DEFATTR FROMCHECK_FAIL PREFIX fromcheck TYPE "enum fromcheck_result" # vim:ft=txt:et:sw=4:sts=4:ht=4 mairix-master/glob.c000066400000000000000000000220731224450623700147400ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. Curnow 2003,2004,2005 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. 
* * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #include #include #include #include #include "mairix.h" struct globber { unsigned int pat[256]; unsigned int starpat; unsigned int twostarpat; unsigned int hit; }; struct globber_array { int n; struct globber **globs; }; static const char *parse_charclass(const char *in, struct globber *result, unsigned int mask)/*{{{*/ { int first = 1; int prev = -1; in++; /* Advance over '[' */ while (*in) { if (*in == ']') { if (first) { result->pat[(int)']'] |= mask; } else { return in; } } else if (*in == '-') { /* Maybe range */ if ((prev < 0) || !in[1] || (in[1]==']')) { /* - at either end of string (or right after an earlier range) means * normal - */ result->pat['-'] |= mask; } else { int next = in[1]; int hi, lo; int i; /* Cope with range being inverted */ if (prev < next) { lo = prev, hi = next; } else { lo = next, hi = prev; } for (i=lo; i<=hi; i++) { int index = 0xff & i; result->pat[index] |= mask; } /* require 1 extra increment */ in++; prev = -1; /* Avoid junk like [a-e-z] */ } } else { int index = 0xff & (int)*in; result->pat[index] |= mask; } prev = *in; first = 0; in++; } return in; } /*}}}*/ struct globber *make_globber(const char *wildstring)/*{{{*/ { struct globber *result; int n, i; const char *p; char c; int index; unsigned int mask; result = new(struct globber); memset(&result->pat, 0x00, 256*sizeof(unsigned int)); memset(&result->starpat, 0x00, sizeof(unsigned int)); memset(&result->twostarpat, 0x00, sizeof(unsigned int)); mask = 0x1; n = 0; for (p=wildstring; *p; p++) { mask = 1<twostarpat |= mask; p++; } else { /* Match zero or more of anything */ result->starpat |= mask; } break; /*}}}*/ case '[':/*{{{*/ p = parse_charclass(p, result, mask); n++; break; /*}}}*/ case 
'?':/*{{{*/ for (i=0; i<256; i++) { result->pat[i] |= mask; } n++; break; /*}}}*/ default:/*{{{*/ index = 0xff & (int)c; result->pat[index] |= mask; n++; break; /*}}}*/ } } result->hit = (1<pat[index]); #endif stars = (reg & g->starpat); twostars = (reg & g->twostarpat); if (index != '/') { stars2 = stars | twostars; } else { stars2 = twostars; } reg &= g->pat[index]; reg <<= 1; reg |= stars2; #if DODEBUG printf(" new_reg=%08lx ", reg); printf("starpat=%08lx stars=%08lx stars2=%08lx\n", g->starpat, stars, stars2); #endif s++; } #if DODEBUG printf("reg=%08lx hit=%08lx\n", reg, g->hit); #endif reg &= g->hit; if (reg) { return 1; } else { return 0; } } /*}}}*/ struct globber_array *colon_sep_string_to_globber_array(const char *in)/*{{{*/ { char **strings; int n_strings; int i; struct globber_array *result; split_on_colons(in, &n_strings, &strings); result = new(struct globber_array); result->n = n_strings; result->globs = new_array(struct globber *, n_strings); for (i=0; iglobs[i] = make_globber(strings[i]); free(strings[i]); } free(strings); return result; } /*}}}*/ int is_globber_array_match(struct globber_array *ga, const char *s)/*{{{*/ { int i; if (!ga) return 0; for (i=0; in; i++) { if (is_glob_match(ga->globs[i], s)) return 1; } return 0; } /*}}}*/ void free_globber_array(struct globber_array *in)/*{{{*/ { int i; for (i=0; in; i++) { free_globber(in->globs[i]); } free(in); } /*}}}*/ static char *copy_folder_name(const char *start, const char *end)/*{{{*/ { /* 'start' points to start of string to copy. Any '\:' sequence is replaced by ':' . Otherwise \ is treated normally. 'end' can be 1 beyond the end of the string to copy. Otherwise it can be null, meaning treat 'start' as the start of a normal null-terminated string. */ char *p; const char *q; int len; char *result; if (end) { len = end - start; } else { len = strlen(start); } result = new_array(char, len + 1); for (p=result, q=start; end ? 
(q < end) : *q; q++) { if ((q[0] == '\\') && (q[1] == ':')) { /* Escaped colon : drop the backslash */ } else { *p++ = *q; } } *p = '\0'; return result; } /*}}}*/ void string_list_to_array(struct string_list *list, int *n, char ***arr)/*{{{*/ { int N, i; struct string_list *a, *next_a; char **result; for (N=0, a=list->next; a!=list; a=a->next, N++) ; result = new_array(char *, N); for (i=0, a=list->next; idata; next_a = a->next; free(a); } *n = N; *arr = result; } /*}}}*/ void split_on_colons(const char *str, int *n, char ***arr)/*{{{*/ { struct string_list list, *new_cell; const char *left_to_do; list.next = list.prev = &list; left_to_do = str; do { char *colon; char *xx; colon = strchr(left_to_do, ':'); /* Allow backslash-escaped colons in filenames */ if (colon && (colon > left_to_do) && (colon[-1]=='\\')) { int is_escaped; do { colon = strchr(colon + 1, ':'); is_escaped = (colon && (colon[-1] == '\\')); } while (colon && is_escaped); } /* 'colon' now points to the first non-escaped colon or is null if there were no more such colons in the rest of the line. */ xx = copy_folder_name(left_to_do, colon); if (colon) { left_to_do = colon + 1; } else { while (*left_to_do) ++left_to_do; } new_cell = new(struct string_list); new_cell->data = xx; new_cell->next = &list; new_cell->prev = list.prev; list.prev->next = new_cell; list.prev = new_cell; } while (*left_to_do); string_list_to_array(&list, n, arr); } /*}}}*/ #if defined (TEST) void run1(char *ref, char *s, int expected)/*{{{*/ { struct globber *g; int result; g = make_globber(ref); result = is_glob_match(g, s); printf("ref=%s, str=%s, %s %s\n", ref, s, result ? "MATCHED" : "not matched", (expected==result) ? 
"" : "??????"); free_globber(g); } /*}}}*/ int main (int argc, char **argv)/*{{{*/ { run1("ab?de", "abdde", 1); run1("ab?de", "abcde", 1); run1("ab?de", "Abcde", 0); run1("ab?de", "abcd", 0); run1("ab?de", "abc", 0); run1("ab[cd]de", "abdde", 1); run1("ab[cd]de", "abbde", 0); run1("ab[cd]de", "abcde", 1); run1("ab*de", "ade", 0); run1("ab*de", "abde", 1); run1("ab*de", "abcde", 1); run1("ab*de", "abccde", 1); run1("ab*de", "abccdfde", 1); run1("ab*de", "abccdedf", 0); run1("ab[b-d]de", "abade",0); run1("ab[b-d]de", "abcDe",0); run1("ab[b-d]de", "abcde",1); run1("ab[b-d]de", "abdde",1); run1("ab[b-d]de", "abEde", 0); run1("[a-z][0-9A-F][]a-f-]", "yE]", 1); run1("[a-z][0-9A-F][]a-f-]", "uE[", 0); run1("[a-z][0-9A-F][]a-f-]", "vG-", 0); run1("[a-z][0-9A-F][]a-f-]", "w8-", 1); run1("*", "a", 1); run1("*", "", 1); run1("a*", "a", 1); run1("a*", "aa", 1); run1("a*", "aaA", 1); run1("*a", "aaa", 1); run1("*a", "a", 1); run1("x*abc", "xabdxabc", 1); run1("*", "", 1); run1("a*", "", 0); run1("*a", "", 0); run1("a", "", 0); run1("*abc*", "x/abc/y", 0); run1("**abc**", "x/abc/y", 1); run1("x/*/abc**", "x/z/abc/y", 1); run1("x/*/abc**", "x/z/w/abc/y", 0); run1("x/*/abc**", "x/zz/w/abc/y", 0); run1("x/*/abc**", "x/z/ww/abc/y", 0); run1("x/**/abc**", "x/z/w/abc/y", 1); run1("x/**/abc**", "x/zz/w/abc/y", 1); return 0; } /*}}}*/ #endif mairix-master/hash.c000066400000000000000000000124211224450623700147340ustar00rootroot00000000000000/* Hash function */ #include "mairix.h" /* -------------------------------------------------------------------- lookup2.c, by Bob Jenkins, December 1996, Public Domain. hash(), hash2(), hash3, and mix() are externally useful functions. Routines to test the hash are included if SELF_TEST is defined. You can use this free for any purpose. It has no warranty. 
-------------------------------------------------------------------- */ #include #include #include #define hashsize(n) ((unsigned int)1<<(n)) #define hashmask(n) (hashsize(n)-1) /* -------------------------------------------------------------------- mix -- mix 3 32-bit values reversibly. For every delta with one or two bit set, and the deltas of all three high bits or all three low bits, whether the original value of a,b,c is almost all zero or is uniformly distributed, * If mix() is run forward or backward, at least 32 bits in a,b,c have at least 1/4 probability of changing. * If mix() is run forward, every bit of c will change between 1/3 and 2/3 of the time. (Well, 22/100 and 78/100 for some 2-bit deltas.) mix() was built out of 36 single-cycle latency instructions in a structure that could supported 2x parallelism, like so: a -= b; a -= c; x = (c>>13); b -= c; a ^= x; b -= a; x = (a<<8); c -= a; b ^= x; c -= b; x = (b>>13); ... Unfortunately, superscalar Pentiums and Sparcs can't take advantage of that parallelism. They've also turned some of those single-cycle latency instructions into multi-cycle latency instructions. Still, this is the fastest good hash I could find. There were about 2^^68 to choose from. I only looked at a billion or so. 
-------------------------------------------------------------------- */ #define mix(a,b,c) \ { \ a -= b; a -= c; a ^= (c>>13); \ b -= c; b -= a; b ^= (a<<8); \ c -= a; c -= b; c ^= (b>>13); \ a -= b; a -= c; a ^= (c>>12); \ b -= c; b -= a; b ^= (a<<16); \ c -= a; c -= b; c ^= (b>>5); \ a -= b; a -= c; a ^= (c>>3); \ b -= c; b -= a; b ^= (a<<10); \ c -= a; c -= b; c ^= (b>>15); \ } /* same, but slower, works on systems that might have 8 byte ub4's */ #define mix2(a,b,c) \ { \ a -= b; a -= c; a ^= (c>>13); \ b -= c; b -= a; b ^= (a<< 8); \ c -= a; c -= b; c ^= ((b&0xffffffff)>>13); \ a -= b; a -= c; a ^= ((c&0xffffffff)>>12); \ b -= c; b -= a; b = (b ^ (a<<16)) & 0xffffffff; \ c -= a; c -= b; c = (c ^ (b>> 5)) & 0xffffffff; \ a -= b; a -= c; a = (a ^ (c>> 3)) & 0xffffffff; \ b -= c; b -= a; b = (b ^ (a<<10)) & 0xffffffff; \ c -= a; c -= b; c = (c ^ (b>>15)) & 0xffffffff; \ } /* -------------------------------------------------------------------- hash() -- hash a variable-length key into a 32-bit value k : the key (the unaligned variable-length array of bytes) len : the length of the key, counting by bytes level : can be any 4-byte value Returns a 32-bit value. Every bit of the key affects every bit of the return value. Every 1-bit and 2-bit delta achieves avalanche. About 36+6len instructions. The best hash table sizes are powers of 2. There is no need to do mod a prime (mod is sooo slow!). If you need less than 32 bits, use a bitmask. For example, if you need only 10 bits, do h = (h & hashmask(10)); In which case, the hash table should have hashsize(10) elements. 
If you are hashing n strings (ub1 **)k, do it like this: for (i=0, h=0; i= 12) { a += (k[0] +((unsigned int)k[1]<<8) +((unsigned int)k[2]<<16) +((unsigned int)k[3]<<24)); b += (k[4] +((unsigned int)k[5]<<8) +((unsigned int)k[6]<<16) +((unsigned int)k[7]<<24)); c += (k[8] +((unsigned int)k[9]<<8) +((unsigned int)k[10]<<16)+((unsigned int)k[11]<<24)); mix(a,b,c); k += 12; len -= 12; } /*------------------------------------- handle the last 11 bytes */ c += length; switch(len) /* all the case statements fall through */ { case 11: c+=((unsigned int)k[10]<<24); case 10: c+=((unsigned int)k[9]<<16); case 9 : c+=((unsigned int)k[8]<<8); /* the first byte of c is reserved for the length */ case 8 : b+=((unsigned int)k[7]<<24); case 7 : b+=((unsigned int)k[6]<<16); case 6 : b+=((unsigned int)k[5]<<8); case 5 : b+=k[4]; case 4 : a+=((unsigned int)k[3]<<24); case 3 : a+=((unsigned int)k[2]<<16); case 2 : a+=((unsigned int)k[1]<<8); case 1 : a+=k[0]; /* case 0: nothing left to add */ } mix(a,b,c); /*-------------------------------------------- report the result */ return c; } mairix-master/mairix.1000066400000000000000000000410471224450623700152260ustar00rootroot00000000000000.TH MAIRIX 1 "January 2006" .de Sx .PP .ne \\$1 .nf .na .RS 7 .. .de Ex .RE .fi .ad .PP .. .de Sy .PP .ne \\$1 .nf .na .RS 12 .. .de Ey .RE .fi .ad .IP "" 7 .. 
.SH NAME mairix \- index and search mail folders .SH SYNOPSIS .SS Indexing .B mairix [ .BR \-v | \-\-verbose ] [ .BR \-p | \-\-purge ] [ .BR \-f | \-\-rcfile .I mairixrc ] [ .BR \-F | \-\-fast-index ] [ .BR \-\-force-hash-key-new-database .I hash ] .SS Searching .B mairix [ .BR \-v | \-\-verbose ] [ .BR \-f | \-\-rcfile .I mairixrc ] [ .BR \-r | \-\-raw-output ] [ .BR \-x | \-\-excerpt-output ] [ .BR \-H | \-\-force-hardlinks ] [ .BR \-o | \-\-mfolder .I mfolder ] [ .BR \-a | \-\-augment ] [ .BR \-t | \-\-threads ] .I search-patterns .SS Other .B mairix [ .BR \-h | \-\-help ] .B mairix [ .BR \-V | \-\-version ] .B mairix [ .BR \-d | \-\-dump ] .SH DESCRIPTION .I mairix indexes and searches a collection of email messages. The folders containing the messages for indexing are defined in the configuration file. The indexing stage produces a database file. The database file provides rapid access to details of the indexed messages during searching operations. A search normally produces a folder (so-called .BR mfolder ) containing the matched messages. However, a raw mode .RB ( \-r ) exists which just lists the matched messages instead. .PP It can operate with the following folder types .IP * maildir .IP * MH (compatible with the MH folder formats used by xmh, sylpheed, claws-mail, nnml (Gnus) and evolution) .IP * mbox (including mboxes that have been compressed with gzip or bzip2) .PP If maildir or MH source folders are used, and a search outputs its matches to an mfolder in maildir or MH format, symbolic links are used to reference the original messages inside the mfolder. However, if mbox folders are involved, copies of messages are made instead. .SH OPTIONS .B mairix decides whether indexing or searching is required by looking for the presence of any .I search-patterns on the command line. 
.SS Special modes .TP .B -h, --help .br Show usage summary and exit .TP .B -V, --version Show program version and exit .TP .B -d .br Dump the database's contents in human-readable form to stdout. .SS General options .TP .BI "-f " mairixrc .br .ns .TP .BI "--rcfile " mairixrc .br Specify an alternative configuration file to use. The default configuration file is .IR ~/.mairixrc . .TP .B -v, --verbose .br Make the output more verbose .TP .B -Q, --no-integrity-checks .br Normally .I mairix will do some internal integrity tests on the database. The .B -Q option removes these checks, making .I mairix run faster, but it will be less likely to detect internal problems if any bugs creep in. The .I nochecks directive in the rc file has the same effect. .TP .B \-\-unlock .br .I mairix locks its database file during any indexing or searching operation to prevent multiple indexing runs interfering with each other, or an indexing run interfering with search runs. The .B --unlock option removes the lockfile before doing the requested indexing or searching operation. This is a convenient way of cleaning up a stale lockfile if an earlier run crashed for some reason or was aborted. .SS Indexing options .TP .B -p, --purge .br Cause stale (dead) messages to be purged from the database during an indexing run. (Normally, stale messages are left in the database because of the additional cost of compacting away the storage that they take up.) .TP .B -F, --fast-index .br When processing maildir and MH folders, .I mairix normally compares the mtime and size of each message against the values stored in the database. If they have changed, the message will be rescanned. This check requires each message file to be stat'ed. For large numbers of messages in these folder types, this can be a sizeable overhead. This option tells .I mairix to assume that when a message currently on-disc has a name matching one already in the database, it should assume the message is unchanged. 
A later indexing run without using this option will fix up any rescans that were missed due to its use. .TP .BI "--force-hash-key-new-database " hash .br This option should only be used for debugging. .br If a new database is created, .I hash is used as hash key, instead of a random hash. .SS Search options .TP .B -a, --augment .br Append newly matched messages to the current mfolder instead of creating the mfolder from scratch. .TP .B -t, --threads .br As well as returning the matched messages, also return every message in the same thread as one of the real matches. .TP .B -r, --raw-output .br Instead of creating an mfolder containing the matched messages, just show their paths on stdout. .TP .B -x, --excerpt-output .br Instead of creating an mfolder containing the matched messages, display an excerpt from their headers on stdout. The excerpt shows To, Cc, From, Subject and Date. .TP .B -H, --force-hardlinks .br Instead of creating symbolic links, force the use of hardlinks. This helps mailers such as alpine to realize that there are new mails in the search folder. .TP .BI "-o " mfolder .br .ns .TP .BI "--mfolder " mfolder .br Specify a temporary alternative path for the mfolder to use, overriding the .I mfolder directive in the rc file. .B mairix will refuse to output search results into any folder that appears to be amongst those that are indexed. This is to prevent accidental deletion of emails. .SS Search patterns .TP .BI t: word .br Match .I word in the To: header. .TP .BI c: word .br Match .I word in the Cc: header. .TP .BI f: word .br Match .I word in the From: header. .TP .BI s: word .br Match .I word in the Subject: header. .TP .BI m: word .br Match .I word in the Message-ID: header. .TP .BI b: word .br Match .I word in the message body. .B Message body is taken to mean any body part of type text/plain or text/html. For text/html, text within meta tags is ignored. In particular, the URLs inside tags are not currently indexed.
Non-text attachments are ignored. If there's an attachment of type message/rfc822, this is parsed and the match is performed on this sub-message too. If a hit occurs, the enclosing message is treated as having a hit. .TP .BI d: "[start-datespec]" - "[end-datespec]" .br Match messages with Date: headers lying in the specific range. .TP .BI z: "[low-size]" - "[high-size]" .br Match messages whose size lies in the specified range. If the .I low-size argument is omitted it defaults to zero. If the .I high-size argument is omitted it defaults to infinite size. For example, to match messages between 10kilobytes and 20kilobytes in size, the following search term can be used: .Sy 1 mairix z:10k-20k .Ey The suffix 'k' on a number means multiply by 1024, and the suffix 'M' on a number means multiply by 1024*1024. .TP .BI n: word .br Match .I word occurring as the name of an attachment in the message. Since attachment names are usually long, this option would usually be used in the substring form. So .Sy 1 mairix n:mairix= .Ey would match all messages which have attachments whose names contain the substring .IR mairix . The attachment name is determined from the name=xxx or filename=xxx qualifiers on the Content-Type: and Content-Disposition: headers respectively. .TP .BI F: flags .br Match messages with particular flag settings. The available flags are 's' meaning seen, 'r' meaning replied, and 'f' meaning flagged. The flags are case-insensitive. A flag letter may be prefixed by a '-' to negate its sense. Thus .Sy 1 mairix F:-s d:1w- .Ey would match any unread message less than a week old, and .Sy 1 mairix F:f-r d:-1m .Ey would match any flagged message older than a month which you haven't replied to yet. Note that the flag characters and their meanings agree with those used as the suffix letters on message filenames in maildir folders. 
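The flag syntax described above can be illustrated with a minimal C sketch. It is not taken from the mairix sources: the bit values, the struct flag_query type and the functions parse_flag_query() and flags_match() are hypothetical names invented for this illustration. It assumes only what the text states, namely that the letters are s, r and f, that they are case-insensitive, and that a '-' negates the letter that follows it.

```c
#include <ctype.h>

/* Hypothetical bit assignments, chosen for this sketch only. */
enum { FLAG_SEEN = 1, FLAG_REPLIED = 2, FLAG_FLAGGED = 4 };

/* Flags that must be set and flags that must be clear for a match. */
struct flag_query {
  unsigned int must_set;
  unsigned int must_clear;
};

/* Parse the text after "F:", e.g. "f-r".
 * Returns 0 on success, -1 on an unknown letter or a dangling '-'. */
static int parse_flag_query(const char *spec, struct flag_query *q)
{
  int negate = 0;
  q->must_set = 0;
  q->must_clear = 0;
  for (; *spec; spec++) {
    unsigned int bit;
    if (*spec == '-') {
      if (negate) return -1;   /* "--" is not meaningful here */
      negate = 1;
      continue;
    }
    switch (tolower((unsigned char)*spec)) {
      case 's': bit = FLAG_SEEN;    break;
      case 'r': bit = FLAG_REPLIED; break;
      case 'f': bit = FLAG_FLAGGED; break;
      default:  return -1;
    }
    if (negate) q->must_clear |= bit;
    else        q->must_set   |= bit;
    negate = 0;                /* '-' applies to one letter only */
  }
  return negate ? -1 : 0;
}

/* True if a message with the given flag bits satisfies the query. */
static int flags_match(const struct flag_query *q, unsigned int msg_flags)
{
  return ((msg_flags & q->must_set) == q->must_set)
      && ((msg_flags & q->must_clear) == 0);
}
```

Under these assumptions, "f-r" yields must_set = FLAG_FLAGGED and must_clear = FLAG_REPLIED, so a flagged message matches only while it has not been replied to. Whether the seen flag is set makes no difference to that query.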
.SS Searching for a match amongst more than one part of a message .PP Multiple message parts may be grouped together, if a match in any of them is sought. Common examples follow. .TP .BI tc: word .br Match .I word in either the To: or Cc: headers (or both). .TP .BI bs: word .br Match .I word in either the Subject: header or the message body (or both). .PP The .B a: search pattern is an abbreviation for .BR tcf: ; i.e. match the word in the To:, Cc: or From: headers. ("a" stands for "address" in this case.) .SS Match words The .I word argument to the search strings can take various forms. .TP .I ~word .br Match messages .B not containing the word. .TP .I word1,word2 .br This matches if both the words are matched in the specified message part. .TP .I word1/word2 .br This matches if either of the words is matched in the specified message part. .TP .I substring= .br Match any word containing .I substring as a substring .TP .I substring=N .br Match any word containing .IR substring , allowing up to .I N errors in the match. For example, if .I N is 1, a single error is allowed, where an error can be .IP * a missing letter .IP * an extra letter .IP * a different letter. .TP .I ^substring= .br Match any word containing .I substring as a substring, with the requirement that .I substring occurs at the beginning of the matched word. .SS Precedence matters The binding order of the constructions is: .IP "1." Individual command line arguments define separate conditions which are AND-ed together .IP "2." Within a single argument, the letters before the colon define which message parts the expression applies to. If there is no colon, the expression applies to all the headers listed earlier and the body. .IP "3." After the colon, slashes delineate separate disjuncts, which are OR-ed together. .IP "4." Each disjunct may contain separate conjuncts, which are separated by commas. These conditions are AND-ed together. .IP "5."
Each conjunct may start with a tilde to negate it, and may be followed by an equals sign to indicate a substring match, optionally followed by an integer to define the maximum number of errors allowed. .SS Date specification .PP This section describes the syntax used for specifying dates when searching using the `d:' option. Dates are specified as a range. The start and end of the range can both be specified. Alternatively, if the start is omitted, it is treated as being the beginning of time. If the end is omitted, it is treated as the current time. There are 4 basic formats: .TP .BI d: start-end .br Specify both start and end explicitly .TP .BI d: start- Specify start, end is the current time .TP .BI d: -end Specify end, start is 'a long time ago' (i.e. early enough to include any message). .TP .BI d: period Specify start and end implicitly, as the start and end of the period given. .PP The start and end can each be given as either an absolute or a relative value. A relative endpoint is given as a number followed by a single letter defining the scaling: .TS box tab(&); lb | lb | lb | lb. letter & short for & example & meaning = .T& l | l | l | l. d & days & 3d & 3 days w & weeks & 2w & 2 weeks (14 days) m & months & 5m & 5 months (150 days) y & years & 4y & 4 years (4*365 days) .TE .PP Months are always treated as 30 days, and years as 365 days, for this purpose. Absolute times can be specified in many forms. Some forms mean one thing when they define a start date and another when they define an end date. Where a single expression specifies both the start and end (i.e. where the argument to d: doesn't contain a `-'), it will usually have different interpretations in the two cases. In the examples below, suppose the current date is Sunday May 18th, 2003 (when I started to write this material.) .TS box tab(&); l | l | l | l.
Example & Start date & End date & Notes = d:20030301\-20030425 & March 1st, 2003 & 25th April, 2003 d:030301\-030425 & March 1st, 2003 & April 25th, 2003 & century assumed d:mar1\-apr25 & March 1st, 2003 & April 25th, 2003 d:Mar1\-Apr25 & March 1st, 2003 & April 25th, 2003 & case insensitive d:MAR1\-APR25 & March 1st, 2003 & April 25th, 2003 & case insensitive d:1mar\-25apr & March 1st, 2003 & April 25th, 2003 & date and month in either order d:2002 & January 1st, 2002 & December 31st, 2002 & whole year d:mar & March 1st, 2003 & March 31st, 2003 & most recent March d:oct & October 1st, 2002 & October 31st, 2002 & most recent October d:21oct\-mar & October 21st, 2002 & March 31st, 2003 & start before end d:21apr\-mar & April 21st, 2002 & March 31st, 2003 & start before end d:21apr\- & April 21st, 2003 & May 18th, 2003 & end omitted d:\-21apr & January 1st, 1900 & April 21st, 2003 & start omitted d:6w\-2w & April 6th, 2003 & May 4th, 2003 & both dates relative d:21apr\-1w & April 21st, 2003 & May 11th, 2003 & one date relative d:21apr\-2y & April 21st, 2001 & May 11th, 2001 & start before end d:99\-11 & January 1st, 1999 & May 11th, 2003 &T{ 2 digits are a day of the month if possible, otherwise a year T} d:99oct\-1oct & October 1st, 1999 & October 1st, 2002 &T{ end before now, single digit is a day of the month T} d:99oct\-01oct & October 1st, 1999 & October 31st, 2001 &T{ 2 digits starting with zero treated as a year T} d:oct99\-oct1 & October 1st, 1999 & October 1st, 2002 &T{ day and month in either order T} d:oct99\-oct01 & October 1st, 1999 & October 31st, 2001 &T{ year and month in either order T} .TE .PP The principles in the table work as follows. .IP \(bu When the expression defines a period of more than a day (i.e. if a month or year is specified), the earliest day in the period is taken when the start date is defined, and the last day in the period if the end of the range is being defined. 
.IP \(bu The end date is always taken to be on or before the current date. .IP \(bu The start date is always taken to be on or before the end date. .SH "SETTING UP THE MATCH FOLDER" If the match folder does not exist when running in search mode, it is automatically created. For 'mformat=maildir' (the default), this should be all you need to do. If you use 'mformat=mh', you may have to run some commands before your mailer will recognize the folder. e.g. for mutt, you could do .Sx 2 mkdir -p /home/richard/Mail/mfolder touch /home/richard/Mail/mfolder/.mh_sequences .Ex which seems to work. Alternatively, within mutt, you could set MBOX_TYPE to 'mh' and save a message to '+mfolder' to have mutt set up the structure for you in advance. If you use Sylpheed, the best way seems to be to create the new folder from within Sylpheed before letting mairix write into it. .SH EXAMPLES .PP Suppose my email address is . Either of the following will match all messages newer than 3 months from me with the word 'chrony' in the subject line: .Sx 2 mairix d:3m- f:richard+doesnt+exist s:chrony mairix d:3m- f:richard@doesnt.exist s:chrony .Ex Suppose I don't mind a few spurious matches on the address, I want a wider date range, and I suspect that some messages I replied to might have had the subject keyword spelt wrongly (let's allow up to 2 errors): .Sx 1 mairix d:6m- f:richard s:chrony=2 .Ex .SH NOTES .PP .B mairix works exclusively in terms of .IR words . The index that's built in indexing mode contains a table of which words occur in which messages. Hence, the search capability is based on finding messages that contain particular words. .B mairix defines a word as any string of alphanumeric characters + underscore. Any whitespace, punctuation, hyphens etc are treated as word boundaries. .B mairix has special handling for the To:, Cc: and From: headers. Besides the normal word scan, these headers are scanned a second time, where the characters '@', '-' and '.' 
are also treated as word characters. This allows most (if not all) email addresses to appear in the database as single words. So if you have a mail from wibble@foobar.zzz, it will match on both these searches .Sx 2 mairix f:foobar mairix f:wibble@foobar.zzz .Ex It should be clear by now that the searching cannot be used to find messages matching general regular expressions. This has never been much of a limitation. Most searches are for particular keywords that were in the messages, or details of the recipients, or the approximate date. It's also worth pointing out that there is no 'locality' information stored, so you can't search for messages that have one word 'close' to some other word. For every message and every word, there is a simple yes/no condition stored - whether the message contains the word in a particular header or in the body. So far this has proved to be adequate. .B mairix has a similar feel to using an Internet search engine. .SH FILES .I ~/.mairixrc .SH AUTHOR Copyright (C) 2002-2006 Richard P. Curnow .SH "SEE ALSO" mairixrc(5) .SH BUGS .PP We need a plugin scheme to allow more types of attachment to be scanned and indexed. mairix-master/mairix.c000066400000000000000000000553631224450623700153160ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. Curnow 2002,2003,2004,2005,2006,2007,2008 * Copyright (C) Sanjoy Mahajan 2005 * - mfolder validation code * Copyright (C) James Cameron 2005 * Copyright (C) Paul Fox 2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #include "mairix.h" #include "version.h" #include #include #include #include #include #include #include #include #ifdef TEST_OOM int total_bytes=0; #endif int verbose = 0; int do_hardlinks = 0; static char *folder_base = NULL; static char *maildir_folders = NULL; static char *mh_folders = NULL; static char *mboxen = NULL; static char *mfolder = NULL; static char *omit = NULL; static char *database_path = NULL; static enum folder_type output_folder_type = FT_MAILDIR; static int skip_integrity_checks = 0; enum filetype { M_NONE, M_FILE, M_DIR, M_OTHER }; static enum filetype classify_file(char *name)/*{{{*/ { struct stat sb; if (stat(name, &sb) < 0) { return M_NONE; } if (S_ISREG(sb.st_mode)) { return M_FILE; } else if (S_ISDIR(sb.st_mode)) { return M_DIR; } else { return M_OTHER; } } /*}}}*/ /*{{{ member of*/ /* returns 1 iff COMPLETE_MFOLDER (i.e. the match folder with folder_base prepended if needed) matches one of the FOLDERS after expanding the wildcards and recursion. Used to make sure that the match folder will not overwrite a valuable mail file or directory. 
*/ int member_of (const char *complete_mfolder, const char *folder_base, const char *folders, enum folder_type ft, struct globber_array *omit_globs) { char **raw_paths, **paths; int n_raw_paths, n_paths, i; if (!folders) return 0; split_on_colons(folders, &n_raw_paths, &raw_paths); switch (ft) { case FT_MAILDIR: glob_and_expand_paths(folder_base, raw_paths, n_raw_paths, &paths, &n_paths, &maildir_traverse_methods, omit_globs); break; case FT_MH: glob_and_expand_paths(folder_base, raw_paths, n_raw_paths, &paths, &n_paths, &mh_traverse_methods, omit_globs); break; case FT_MBOX: glob_and_expand_paths(folder_base, raw_paths, n_raw_paths, &paths, &n_paths, &mbox_traverse_methods, omit_globs); break; case FT_RAW: /* cannot happen but to keep compiler happy */ case FT_EXCERPT: break; } for (i=0; i\n", temp); } free(temp); } /*}}}*/ static void parse_rc_file(char *name)/*{{{*/ { FILE *in; char line[4096], *p; int len, lineno; int all_blank; int used_default_name = 0; if (!name) { /* open default file */ struct passwd *pw; char *home; home = getenv("HOME"); if (!home) { pw = getpwuid(getuid()); if (!pw) { fprintf(stderr, "Cannot determine home directory\n"); exit(2); } home = pw->pw_dir; } name = new_array(char, strlen(home) + 12); strcpy(name, home); strcat(name, "/.mairixrc"); used_default_name = 1; } in = fopen(name, "r"); if (!in) { fprintf(stderr, "Cannot open %s, exiting\n", name); exit(2); } lineno = 0; while(fgets(line, sizeof(line), in)) { lineno++; len = strlen(line); if (len > sizeof(line) - 4) { fprintf(stderr, "Line %d in %s too long, exiting\n", lineno, name); exit(2); } if (line[len-1] == '\n') { line[len-1] = '\0'; } /* Strip trailing comments. 
*/ for (p=line; *p && !strchr("#!;%", *p); p++) ; if (*p) *p = '\0'; /* Discard blank lines */ all_blank = 1; for (p=line; *p; p++) { if (!isspace(*(unsigned char *)p)) { all_blank = 0; break; } } if (all_blank) continue; /* Now a real line to parse */ if (!strncasecmp(p, "base", 4)) folder_base = copy_value(p); else if (!strncasecmp(p, "folders", 7)) { fprintf(stderr, "'folders=' option in rc file is deprecated, use 'maildir='\n"); add_folders(&maildir_folders, copy_value(p)); } else if (!strncasecmp(p, "maildir=", 8)) add_folders(&maildir_folders, copy_value(p)); else if (!strncasecmp(p, "mh_folders=", 11)) { fprintf(stderr, "'mh_folders=' option in rc file is deprecated, use 'mh='\n"); add_folders(&mh_folders, copy_value(p)); } else if (!strncasecmp(p, "mh=", 3)) add_folders(&mh_folders, copy_value(p)); else if (!strncasecmp(p, "mbox=", 5)) add_folders(&mboxen, copy_value(p)); else if (!strncasecmp(p, "omit=", 5)) add_folders(&omit, copy_value(p)); else if (!strncasecmp(p, "mformat=", 8)) { parse_output_folder(p); } else if (!strncasecmp(p, "mfolder=", 8)) mfolder = copy_value(p); else if (!strncasecmp(p, "database=", 9)) database_path = copy_value(p); else if (!strncasecmp(p, "nochecks", 8)) skip_integrity_checks = 1; else { if (verbose) { fprintf(stderr, "Unrecognized option at line %d in %s\n", lineno, name); } } } fclose(in); if (used_default_name) free(name); } /*}}}*/ static int compare_strings(const void *a, const void *b)/*{{{*/ { const char **aa = (const char **) a; const char **bb = (const char **) b; return strcmp(*aa, *bb); } /*}}}*/ static int check_message_list_for_duplicates(struct msgpath_array *msgs)/*{{{*/ { /* Caveat : only examines the file-per-message case */ char **sorted_paths; int i, n, nn; int result; n = msgs->n; sorted_paths = new_array(char *, n); for (i=0, nn=0; itype[i]) { case MTY_MBOX: break; case MTY_DEAD: assert(0); break; case MTY_FILE: sorted_paths[nn++] = msgs->paths[i].src.mpf.path; break; } } qsort(sorted_paths, nn, 
sizeof(char *), compare_strings); result = 0; for (i=1; i= buf1) { *q++ = *p--; } write(2, buf2, q-buf2); return; } /*}}}*/ void out_of_mem(char *file, int line, size_t size)/*{{{*/ { /* Hairy coding ahead - can't use any [s]printf, itoa etc because * those might try to use the heap! */ int filelen; char *p; static char msg1[] = "Out of memory (at "; static char msg2[] = " bytes)\n"; /* Perhaps even strlen is unsafe in this situation? */ p = file; while (*p) p++; filelen = p - file; write(2, msg1, sizeof(msg1)-1); write(2, file, filelen); write(2, ":", 1); emit_int(line); write(2, ", ", 2); emit_int(size); write(2, msg2, sizeof(msg2)-1); exit(2); } /*}}}*/ void report_error(const char *str, const char *filename)/*{{{*/ { if (filename) { int len = strlen(str) + strlen(filename) + 4; char *t; t = new_array(char, len); sprintf(t, "%s '%s'", str, filename); perror(t); free(t); } else { perror(str); } } /*}}}*/ static void print_copyright(void)/*{{{*/ { fprintf(stderr, "mairix %s, Copyright (C) 2002-2010 Richard P. Curnow\n" "mairix comes with ABSOLUTELY NO WARRANTY.\n" "This is free software, and you are welcome to redistribute it\n" "under certain conditions; see the GNU General Public License for details.\n\n", PROGRAM_VERSION); } /*}}}*/ static void print_version(void)/*{{{*/ { fprintf(stdout, "mairix %s\n", PROGRAM_VERSION); } /*}}}*/ static void handlesig(int signo)/*{{{*/ { unlock_and_exit(7); } /*}}}*/ static void usage(void)/*{{{*/ { print_copyright(); printf("mairix [-h] : Show help\n" "mairix [-f ] [-v] [-p] [-F] : Build index\n" "mairix [-f ] [-a] [-t] expr1 ... 
exprN : Run search\n" "mairix [-f ] -d : Dump database to stdout\n" "-h : show this help\n" "-f : use alternative rc file (default ~/.mairixrc)\n" "-V : show version\n" "-v : be verbose\n" "-p : purge messages that no longer exist\n" "-F : fast scan for maildir and MH folders (no mtime or size checks)\n" "-a : add new matches to match folder (default : clear it first)\n" "-x : display excerpt of message headers (default : use match folder)\n" "-t : include all messages in same threads as matching messages\n" "-o : override setting of mfolder from mairixrc file\n" "-r : force raw output regardless of mformat setting in mairixrc file\n" "-H : force hard links rather than symbolic ones\n" "expr_i : search expression (all expr's AND'ed together):\n" " word : match word in message body and major headers\n" " t:word : match word in To: header\n" " c:word : match word in Cc: header\n" " f:word : match word in From: header\n" " a:word : match word in To:, Cc: or From: headers (address)\n" " s:word : match word in Subject: header\n" " b:word : match word in message body\n" " m:word : match word in Message-ID: header\n" " n:word : match name of attachment within message\n" " F:flags : match on message flags (s=seen,r=replied,f=flagged,-=negate)\n" " p:substring : match substring of path\n" " d:start-end : match date range\n" " z:low-high : match messages in size range\n" " bs:word : match word in Subject: header or body (or any other group of prefixes)\n" " s:word1,word2 : match both words in Subject:\n" " s:word1/word2 : match either word or both words in Subject:\n" " s:~word : match messages not containing word in Subject:\n" " s:substring= : match substring in any word in Subject:\n" " s:^substring= : match left-anchored substring in any word in Subject:\n" " s:substring=2 : match substring with <=2 errors in any word in Subject:\n" "\n" " (See documentation for more examples)\n" ); } /*}}}*/ /* Notes on folder management: {{{ Assumption is that the user wants to keep 
the 'mfolder' directories under a common root with the real maildir folders. This allows a common value for mutt's 'folder' variable => the '+' and '=' prefixes work better. This means the indexer here can't just scan down all subdirectories of a single ancestor, because it'll pick up its own mfolders. So, use environment variables to tailor the folders. MAIRIX_FOLDER_BASE is the common parent directory of the folders (aka mutt's 'folder' variable) MAIRIX_MAILDIR_FOLDERS, MAIRIX_MH_FOLDERS, MAIRIX_MBOXEN are colon-separated lists of folders to index, with '...' after a component meaning any maildir underneath it. MAIRIX_MFOLDER is the folder to put the match data. For example, if MAIRIX_FOLDER_BASE = "/home/foobar/mail" MAIRIX_MAILDIR_FOLDERS = "inbox:lists...:action:archive..." MAIRIX_MFOLDER = "mf" then /home/foobar/mail/mf/{new,cur,tmp} contain the output of the search. }}} */ int main (int argc, char **argv)/*{{{*/ { struct msgpath_array *msgs; struct database *db = NULL; char *arg_rc_file_path = NULL; char *arg_mfolder = NULL; char *e; int do_augment = 0; int do_threads = 0; int do_search = 0; int do_purge = 0; int any_updates = 0; int any_purges = 0; int do_help = 0; int do_raw_output = 0; int do_excerpt_output = 0; int do_dump = 0; int do_integrity_checks = 1; int do_forced_unlock = 0; int do_fast_index = 0; unsigned int forced_hash_key = CREATE_RANDOM_DATABASE_HASH; struct globber_array *omit_globs; int result; setlocale(LC_CTYPE, ""); while (++argv, --argc) { if (!*argv) { break; } else if (!strcmp(*argv, "-f") || !strcmp(*argv, "--rcfile")) { ++argv, --argc; if (!argc) { fprintf(stderr, "No filename given after -f argument\n"); exit(1); } arg_rc_file_path = *argv; } else if (!strcmp(*argv, "-t") || !strcmp(*argv, "--threads")) { do_search = 1; do_threads = 1; } else if (!strcmp(*argv, "-a") || !strcmp(*argv, "--augment")) { do_search = 1; do_augment = 1; } else if (!strcmp(*argv, "-o") || !strcmp(*argv, "--mfolder")) { ++argv, --argc; if (!argc) { fprintf(stderr, 
"No folder name given after -o argument\n"); exit(1); } arg_mfolder = *argv; } else if (!strcmp(*argv, "-p") || !strcmp(*argv, "--purge")) { do_purge = 1; } else if (!strcmp(*argv, "-d") || !strcmp(*argv, "--dump")) { do_dump = 1; } else if (!strcmp(*argv, "-r") || !strcmp(*argv, "--raw-output")) { do_raw_output = 1; } else if (!strcmp(*argv, "-x") || !strcmp(*argv, "--excerpt-output")) { do_excerpt_output = 1; } else if (!strcmp(*argv, "-H") || !strcmp(*argv, "--force-hardlinks")) { do_hardlinks = 1; } else if (!strcmp(*argv, "-Q") || !strcmp(*argv, "--no-integrity-checks")) { do_integrity_checks = 0; } else if (!strcmp(*argv, "--unlock")) { do_forced_unlock = 1; } else if (!strcmp(*argv, "-F") || !strcmp(*argv, "--fast-index")) { do_fast_index = 1; } else if (!strcmp(*argv, "--force-hash-key-new-database")) { ++argv, --argc; if (!argc) { fprintf(stderr, "No hash key given after --force-hash-key-new-database\n"); exit(1); } if ( 1 != sscanf(*argv, "%u", &forced_hash_key) ) { fprintf(stderr, "Hash key given after --force-hash-key-new-database could not be parsed\n"); exit(1); } } else if (!strcmp(*argv, "-v") || !strcmp(*argv, "--verbose")) { verbose = 1; } else if (!strcmp(*argv, "-V") || !strcmp(*argv, "--version")) { print_version(); exit(0); } else if (!strcmp(*argv, "-h") || !strcmp(*argv, "--help")) { do_help = 1; } else if ((*argv)[0] == '-') { fprintf(stderr, "Unrecognized option %s\n", *argv); } else if (!strcmp(*argv, "--")) { /* End of args */ break; } else { /* standard args start */ break; } } if (do_help) { usage(); exit(0); } if (verbose) { print_copyright(); } if (*argv) { /* There are still args to process */ do_search = 1; } parse_rc_file(arg_rc_file_path); if (getenv("MAIRIX_FOLDER_BASE")) { folder_base = getenv("MAIRIX_FOLDER_BASE"); } if (getenv("MAIRIX_MAILDIR_FOLDERS")) { maildir_folders = getenv("MAIRIX_MAILDIR_FOLDERS"); } if (getenv("MAIRIX_MH_FOLDERS")) { mh_folders = getenv("MAIRIX_MH_FOLDERS"); } if ((e = getenv("MAIRIX_MBOXEN"))) { 
mboxen = e; } if (getenv("MAIRIX_MFOLDER")) { mfolder = getenv("MAIRIX_MFOLDER"); } if (getenv("MAIRIX_DATABASE")) { database_path = getenv("MAIRIX_DATABASE"); } if (arg_mfolder) { mfolder = arg_mfolder; } if (skip_integrity_checks) { do_integrity_checks = 0; } if (!folder_base) { fprintf(stderr, "No folder_base/MAIRIX_FOLDER_BASE set\n"); exit(2); } if (!database_path) { fprintf(stderr, "No database/MAIRIX_DATABASE set\n"); exit(2); } if (do_raw_output) { output_folder_type = FT_RAW; } else if (do_excerpt_output) { output_folder_type = FT_EXCERPT; } if (omit) { omit_globs = colon_sep_string_to_globber_array(omit); } else { omit_globs = NULL; } /* Lock database. * Prevent concurrent updates due to parallel indexing (e.g. due to stuck * cron jobs). * Prevent concurrent searching and indexing. */ signal(SIGHUP, handlesig); signal(SIGINT, handlesig); signal(SIGQUIT, handlesig); lock_database(database_path, do_forced_unlock); if (do_dump) { dump_database(database_path); result = 0; } else if (do_search) { int len; char *complete_mfolder; enum filetype ftype; if (!mfolder) { switch (output_folder_type) { case FT_RAW: case FT_EXCERPT: break; default: fprintf(stderr, "No mfolder/MAIRIX_MFOLDER set\n"); unlock_and_exit(2); } mfolder = new_string(""); } /* complete_mfolder is needed by search_top() and member_of() so compute it once here rather than in search_top() as well */ if ((mfolder[0] == '/') || ((mfolder[0] == '.') && (mfolder[1] == '/'))) { complete_mfolder = new_string(mfolder); } else { len = strlen(folder_base) + strlen(mfolder) + 2; complete_mfolder = new_array(char, len); strcpy(complete_mfolder, folder_base); strcat(complete_mfolder, "/"); strcat(complete_mfolder, mfolder); } /* check whether mfolder output would destroy a mail folder or mbox */ switch (output_folder_type) { case FT_RAW: case FT_EXCERPT: break; default: if ((member_of(complete_mfolder,folder_base, maildir_folders, FT_MAILDIR, omit_globs)|| member_of (complete_mfolder, folder_base, mh_folders, 
FT_MH, omit_globs) || member_of (complete_mfolder, folder_base, mboxen, FT_MBOX, omit_globs))) { fprintf (stderr, "You asked search results to go to the folder '%s'.\n" "That folder appears to be one of the indexed mail folders!\n" "For your own good, I refuse to output search results to an indexed mail folder.\n", mfolder); unlock_and_exit(3); } } ftype = classify_file(database_path); if (ftype != M_FILE) { fprintf(stderr, "No database file '%s' is present.\nYou need to do an indexing run first.\n", database_path); unlock_and_exit(3); } result = search_top(do_threads, do_augment, database_path, complete_mfolder, argv, output_folder_type, verbose); } else { enum filetype ftype; if (!maildir_folders && !mh_folders && !mboxen) { fprintf(stderr, "No [mh_]folders/mboxen/MAIRIX_[MH_]FOLDERS set\n"); unlock_and_exit(2); } if (verbose) printf("Finding all currently existing messages...\n"); msgs = new_msgpath_array(); if (maildir_folders) { build_message_list(folder_base, maildir_folders, FT_MAILDIR, msgs, omit_globs); } if (mh_folders) { build_message_list(folder_base, mh_folders, FT_MH, msgs, omit_globs); } /* The next call sorts the msgs array as part of looking for duplicates. 
*/ if (check_message_list_for_duplicates(msgs)) { fprintf(stderr, "Message list contains duplicates - check your 'folders' setting\n"); unlock_and_exit(2); } /* Try to open existing database */ ftype = classify_file(database_path); if (ftype == M_FILE) { if (verbose) printf("Reading existing database...\n"); db = new_database_from_file(database_path, do_integrity_checks); if (verbose) printf("Loaded %d existing messages\n", db->n_msgs); } else if (ftype == M_NONE) { if (verbose) printf("Starting new database\n"); db = new_database( forced_hash_key ); } else { fprintf(stderr, "database path %s is not a file; you can't put the database there\n", database_path); unlock_and_exit(2); } build_mbox_lists(db, folder_base, mboxen, omit_globs); any_updates = update_database(db, msgs->paths, msgs->n, do_fast_index); if (do_purge) { any_purges = cull_dead_messages(db, do_integrity_checks); } if (any_updates || any_purges) { /* For now write it every time. This is obviously the most reliable method. */ write_database(db, database_path, do_integrity_checks); } #if 0 get_db_stats(db); #endif free_database(db); free_msgpath_array(msgs); result = 0; } unlock_database(); return result; } /*}}}*/ mairix-master/mairix.h000066400000000000000000000301631224450623700153120ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. Curnow 2002,2003,2004,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. 
* * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #ifndef MAIRIX_H #define MAIRIX_H #include #include #include #include #include #include #include "memmac.h" struct msgpath {/*{{{*/ /* The 'selector' for this union is the corresponding entry of type 'enum * message_type' */ union { struct { char *path; size_t size; /* size of the message in bytes */ time_t mtime; /* mtime of message file on disc */ } mpf; /* message per file */ struct { int file_index; /* index into table of mbox files */ int msg_index; /* index of message within the file */ } mbox; /* for messages in mbox format folders */ } src; /* Now fields that are common to both types of message. */ time_t date; /* representation of Date: header in message */ int tid; /* thread-id */ /* Message flags. */ unsigned int seen:1; unsigned int replied:1; unsigned int flagged:1; /* + other stuff eventually */ }; /*}}}*/ enum message_type {/*{{{*/ MTY_DEAD, /* msg no longer exists, i.e. don't report in searches, prune it on a '-p' run. */ MTY_FILE, /* msg <-> file in 1-1 correspondence e.g. 
maildir, MH */ MTY_MBOX /* multiple msgs per file : MBOX format file */ }; /*}}}*/ struct msgpath_array {/*{{{*/ enum message_type *type; struct msgpath *paths; int n; int max; }; /*}}}*/ struct matches {/*{{{*/ unsigned char *msginfo; int n; /* bytes in use */ int max; /* bytes allocated */ unsigned long highest; }; /*}}}*/ struct token {/*{{{*/ char *text; unsigned long hashval; /* to store delta-compressed info of which msgpaths match the token */ struct matches match0; }; /*}}}*/ struct token2 {/*{{{*/ char *text; unsigned long hashval; /* to store delta-compressed info of which msgpaths match the token */ struct matches match0; struct matches match1; }; /*}}}*/ struct toktable {/*{{{*/ struct token **tokens; int n; /* # in use */ int size; /* # allocated */ unsigned int mask; /* for masking down hash values */ int hwm; /* number to have before expanding */ }; /*}}}*/ struct toktable2 {/*{{{*/ struct token2 **tokens; int n; /* # in use */ int size; /* # allocated */ unsigned int mask; /* for masking down hash values */ int hwm; /* number to have before expanding */ }; /*}}}*/ enum content_type {/*{{{*/ CT_TEXT_PLAIN, CT_TEXT_HTML, CT_TEXT_OTHER, CT_MESSAGE_RFC822, CT_OTHER }; /*}}}*/ struct rfc822; struct attachment {/*{{{*/ struct attachment *next; struct attachment *prev; enum content_type ct; char *filename; union attachment_body { struct normal_attachment_body { int len; char *bytes; } normal; struct rfc822 *rfc822; } data; }; /*}}}*/ struct headers {/*{{{*/ char *to; char *cc; char *from; char *subject; /* The following are needed to support threading */ char *message_id; char *in_reply_to; char *references; struct { unsigned int seen:1; unsigned int replied:1; unsigned int flagged:1; } flags; time_t date; }; /*}}}*/ struct rfc822 {/*{{{*/ struct headers hdrs; struct attachment atts; }; /*}}}*/ typedef char checksum_t[16]; struct message_list {/*{{{*/ struct message_list *next; off_t start; size_t len; }; /*}}}*/ struct mbox {/*{{{*/ /* If path==NULL, this 
indicates that the mbox is dead, i.e. no longer * exists. */ char *path; /* As read in from database (i.e. current as of the last time a mairix scan was run.) */ time_t file_mtime; size_t file_size; /* As found in the filesystem now. */ time_t current_mtime; size_t current_size; /* After reconciling a loaded database with what's on the disc, this entry stores how many of the msgs that used to be there last time are still present at the head of the file. Thus, all messages beyond that are treated as dead, and scanning starts at that point to find 'new' messages (which may actually be old ones that have moved, but they're treated as new.) */ int n_old_msgs_valid; /* Hold list of new messages and their number. Number is temporary - * eventually just list walking in case >=2 have to be reattached. */ struct message_list *new_msgs; int n_new_msgs; int n_so_far; /* Used during database load. */ int n_msgs; /* Number of entries in 'start' and 'len' */ int max_msgs; /* Allocated size of 'start' and 'len' */ /* File offset to the start of each message (first line of real header, not to mbox 'From ' line) */ off_t *start; /* Length of each message */ size_t *len; /* Checksums on whole messages. */ checksum_t *check_all; }; /*}}}*/ struct database {/*{{{*/ /* Used to hold an entire mapping between an array of filenames, each containing a single message, and the sets of tokens that occur in various parts of those messages */ enum message_type *type; struct msgpath *msgs; /* Paths to messages */ int n_msgs; /* Number in use */ int max_msgs; /* Space allocated */ struct mbox *mboxen; int n_mboxen; /* number in use. */ int max_mboxen; /* space allocated */ /* Seed for hashing in the token tables. Randomly created for * each new database - avoid DoS attacks through carefully * crafted messages. 
*/ unsigned int hash_key; /* Token tables */ struct toktable *to; struct toktable *cc; struct toktable *from; struct toktable *subject; struct toktable *body; struct toktable *attachment_name; /* Encoding chain 0 stores all msgids appearing in the following message headers: * Message-Id, In-Reply-To, References. Used for thread reconciliation. * Encoding chain 1 stores just the Message-Id. Used for search by message ID. */ struct toktable2 *msg_ids; }; /*}}}*/ enum folder_type {/*{{{*/ FT_MAILDIR, FT_MH, FT_MBOX, FT_RAW, FT_EXCERPT }; /*}}}*/ struct string_list {/*{{{*/ struct string_list *next; struct string_list *prev; char *data; }; /*}}}*/ struct msg_src { enum {MS_FILE, MS_MBOX} type; char *filename; off_t start; size_t len; }; /* Outcomes of checking a filename/dirname to see whether to keep on looking * at filenames within this dir. */ enum traverse_check { TRAV_PROCESS, /* Continue looking at this entry */ TRAV_IGNORE, /* Ignore just this dir entry */ TRAV_FINISH /* Ignore this dir entry and don't bother looking at the rest of the directory */ }; struct traverse_methods { int (*filter)(const char *, const struct stat *); enum traverse_check (*scrutinize)(int, const char *); }; extern struct traverse_methods maildir_traverse_methods; extern struct traverse_methods mh_traverse_methods; extern struct traverse_methods mbox_traverse_methods; extern int verbose; /* cmd line -v switch */ extern int do_hardlinks; /* cmd line -H switch */ /* Lame fix for systems where NAME_MAX isn't defined after including the above * set of .h files (Solaris, FreeBSD so far). Probably grossly oversized but * it'll do. 
*/ #if !defined(NAME_MAX) #define NAME_MAX 4096 #endif /* In glob.c */ struct globber; struct globber_array; struct globber *make_globber(const char *wildstring); void free_globber(struct globber *old); int is_glob_match(struct globber *g, const char *s); struct globber_array *colon_sep_string_to_globber_array(const char *in); int is_globber_array_match(struct globber_array *ga, const char *s); void free_globber_array(struct globber_array *in); /* In hash.c */ unsigned int hashfn( unsigned char *k, unsigned int length, unsigned int initval); /* In dirscan.c */ struct msgpath_array *new_msgpath_array(void); int valid_mh_filename_p(const char *x); void free_msgpath_array(struct msgpath_array *x); void string_list_to_array(struct string_list *list, int *n, char ***arr); void split_on_colons(const char *str, int *n, char ***arr); void build_message_list(char *folder_base, char *folders, enum folder_type ft, struct msgpath_array *msgs, struct globber_array *omit_globs); /* In rfc822.c */ struct rfc822 *make_rfc822(char *filename); void free_rfc822(struct rfc822 *msg); enum data_to_rfc822_error { DTR8_OK, DTR8_MISSING_END, /* missing endpoint marker. */ DTR8_MULTIPART_SANS_BOUNDARY, /* multipart with no boundary string defined */ DTR8_BAD_HEADERS, /* corrupt headers */ DTR8_BAD_ATTACHMENT /* corrupt attachment (e.g. 
no body part) */ }; struct rfc822 *data_to_rfc822(struct msg_src *src, char *data, int length, enum data_to_rfc822_error *error); void create_ro_mapping(const char *filename, unsigned char **data, int *len); void free_ro_mapping(unsigned char *data, int len); char *format_msg_src(struct msg_src *src); /* In tok.c */ struct toktable *new_toktable(void); struct toktable2 *new_toktable2(void); void free_token(struct token *x); void free_token2(struct token2 *x); void free_toktable(struct toktable *x); void free_toktable2(struct toktable2 *x); void add_token_in_file(int file_index, unsigned int hash_key, char *tok_text, struct toktable *table); void check_and_enlarge_encoding(struct matches *m); void insert_index_on_encoding(struct matches *m, int idx); void add_token2_in_file(int file_index, unsigned int hash_key, char *tok_text, struct toktable2 *table, int add_to_chain1); /* In db.c */ #define CREATE_RANDOM_DATABASE_HASH 0 struct database *new_database(unsigned int hash_key); struct database *new_database_from_file(char *db_filename, int do_integrity_checks); void free_database(struct database *db); void maybe_grow_message_arrays(struct database *db); void tokenise_message(int file_index, struct database *db, struct rfc822 *msg); int update_database(struct database *db, struct msgpath *sorted_paths, int n_paths, int do_fast_index); void check_database_integrity(struct database *db); int cull_dead_messages(struct database *db, int do_integrity_checks); /* In mbox.c */ void build_mbox_lists(struct database *db, const char *folder_base, const char *mboxen_paths, struct globber_array *omit_globs); int add_mbox_messages(struct database *db); void compute_checksum(const char *data, size_t len, checksum_t *csum); void cull_dead_mboxen(struct database *db); unsigned int encode_mbox_indices(unsigned int mb, unsigned int msg); void decode_mbox_indices(unsigned int index, unsigned int *mb, unsigned int *msg); int verify_mbox_size_constraints(struct database *db); void 
glob_and_expand_paths(const char *folder_base, char **paths_in, int n_in, char ***paths_out, int *n_out, const struct traverse_methods *methods, struct globber_array *omit_globs); /* In glob.c */ struct globber; struct globber *make_globber(const char *wildstring); void free_globber(struct globber *old); int is_glob_match(struct globber *g, const char *s); /* In writer.c */ void write_database(struct database *db, char *filename, int do_integrity_checks); /* In search.c */ int search_top(int do_threads, int do_augment, char *database_path, char *complete_mfolder, char **argv, enum folder_type ft, int verbose); /* In stats.c */ void get_db_stats(struct database *db); /* In dates.c */ int scan_date_string(char *in, time_t *start, int *has_start, time_t *end, int *has_end); /* In dumper.c */ void dump_database(char *filename); /* In strexpand.c */ char *expand_string(const char *p); /* In dotlock.c */ void lock_database(char *path, int forced_unlock); void unlock_database(void); void unlock_and_exit(int code); /* In mairix.c */ void report_error(const char *str, const char *filename); #endif /* MAIRIX_H */ mairix-master/mairix.spec.sample000066400000000000000000000023321224450623700172720ustar00rootroot00000000000000Name: mairix Summary: A maildir indexer and searcher Version: @@VERSION@@ Release: 1 Source: %{name}-%{version}.tar.gz License: GPL Group: Application/Internet Packager: Richard P. Curnow BuildRoot: %{_tmppath}/%{name}-%{version}-root-%(id -u -n) Requires: info URL: http://www.rc0.org.uk/mairix %description mairix is a tool for indexing email messages stored in maildir format folders and performing fast searches on the resulting index. The output is a new maildir folder containing symbolic links to the matched messages. 
%prep %setup -q %build CFLAGS="$RPM_OPT_FLAGS" ./configure --prefix=%{_prefix} make %install rm -rf $RPM_BUILD_ROOT cd $RPM_BUILD_DIR/mairix-%{version} make install DESTDIR=$RPM_BUILD_ROOT mandir=$RPM_BUILD_ROOT/%{_mandir} cp README dotmairixrc.eg .. %files %{_bindir}/mairix %doc README %doc dotmairixrc.eg %doc %{_mandir}/man1/mairix.1.gz %doc %{_mandir}/man5/mairixrc.5.gz %changelog * Fri Mar 24 2006 Andre Costa - 0.18 - Updated to version 0.18 - Included URL on header - removed references to 'mairix.txt', 'mairix.html' and 'mairix.info' - .info files have been deprecated - removed useless 'post' section - makefile's "mandir" is pointing to /usr/man instead of /usr/share/man mairix-master/mairixrc.5000066400000000000000000000242561224450623700155620ustar00rootroot00000000000000.TH MAIRIXRC 5 "January 2006" .de Sx .PP .ne \\$1 .nf .na .RS 12 .. .de Ex .RE .fi .ad .IP "" 7 .. .SH NAME mairixrc \- configuration file for mairix(1) .SH SYNOPSIS $HOME/.mairixrc .SH DESCRIPTION .PP The .I mairixrc file tells .B mairix where your mail folders are located. It also tells .B mairix where the results of searches are to be written. .B mairix searches for this file at .I $HOME/.mairixrc unless the .B -f option is used. The directives .BR base , .BR mfolder , and .B database must always appear in the file. There must also be some folder definitions (using the .BR maildir , .BR mh , or .BR mbox ) directives. .SS Comments Any line starting with a '#' character is treated as a comment. .SS Directives .TP .BI base= base-directory .br This defines the path to the common parent directory of all your maildir folders. If the path is relative, it is treated as relative to the location of the .I mairixrc file. .TP .BI maildir= list-of-folder-specifications This is a colon-separated list of the Maildir folders (relative to `base') that you want indexed. Any entry that ends `...' is recursively scanned to find any Maildir folders underneath it. 
More than one line starting with `maildir' can be included. In this case, mairix joins the lines together with colons as though a single list of folders had been given on a single very long line. Each colon-separated entry may be a wildcard. See the discussion under mbox (below) for the wildcard syntax. For example .Sx 1 maildir=zzz/foo*... .Ex will match maildir folders like these (relative to the .IR base-directory ) .Sx 4 zzz/foobar/xyz zzz/fooquux zzz/foo zzz/fooabc/u/v/w .Ex and .Sx 1 maildir=zzz/foo[abc]* .Ex will match maildir folders like these (relative to the folder_base) .Sx 4 zzz/fooa zzz/fooaaaxyz zzz/foobcd zzz/fooccccccc .Ex If a folder name contains a colon, you can write this by using the sequence '\\:' to escape the colon. Otherwise, the backslash character is treated normally. (If the folder name actually contains the sequence '\\:', you're out of luck.) .TP .BI mh= list-of-folder-specifications .br This is a colon-separated list of the MH folders (relative to `base') that you want indexed. Any entry that ends '...' is recursively scanned to find any MH folders underneath it. More than one line starting with 'mh' can be included. In this case, mairix joins the lines together with colons as though a single list of folders had been given on a single very long line. Each colon-separated entry may be a wildcard; see the discussion under maildir (above) and mbox (below) for the syntax and semantics of specifying wildcards. .B mairix recognizes the types of MH folders created by the following email applications: .RS 7 .IP "*" xmh .IP "*" sylpheed .IP "*" claws-mail .IP "*" evolution .IP "*" NNML .IP "*" Mew .RE .TP .BI mbox= list-of-folder-specifications .br This is a colon-separated list of the mbox folders (relative to `base') that you want indexed. Each colon-separated item in the list can be suffixed by '...'. If the item matches a regular file, that file is treated as a mbox folder and the '...' suffix is ignored. 
If the item matches a directory, a recursive scan of everything inside that directory is made, and all regular files are initially considered as mbox folders. (Any directories found in this scan are themselves scanned, since the scan is recursive.) Each colon-separated item may contain wildcard operators, but only in its final path component. The wildcard operators currently supported are .TP * .br Match zero or more characters (each character matched is arbitrary) .TP ? .br Match exactly one arbitrary character .TP [abcs-z] .br Character class : match a single character from the set a, b, c, s, t, u, v, w, x, y and z. To include a literal ']' in the class, place it immediately after the opening '['. To include a literal '-' in the class, place it immediately before the closing ']'. If these metacharacters are included in non-final path components, they have no special meaning. Here are some examples .TP mbox=foo/bar* .br matches 'foo/bar', 'foo/bar1', 'foo/barrrr' etc .TP mbox=foo*/bar* .br matches 'foo*/bar', 'foo*/bar1', 'foo*/barrrr' etc .TP mbox=foo/* .br matches 'foo/bar', 'foo/bar1', 'foo/barrrr', 'foo/foo', \'foo/x' etc .TP mbox=foo... .br matches any regular file in the tree rooted at 'foo' .TP mbox=foo/*... .br same as before .TP mbox=foo/[a-z]*... .br matches 'foo/a', 'foo/aardvark/xxx', 'foo/zzz/foobar', \'foo/w/x/y/zzz', but not 'foo/A/foobar' Regular files that are mbox folder candidates are examined internally. Only files containing standard mbox 'From ' separator lines will be scanned for messages. If a regular file has a name ending in '.gz', and gzip support is compiled into the .B mairix binary, the file will be treated as a gzipped mbox. If a regular file has a name ending in '.bz2', and bzip support is compiled into the .B mairix binary, the file will be treated as a bzip2'd mbox. More than one line starting with 'mbox' can be included. 
In this case, .B mairix joins the lines together with colons as though a single list of folders had been given on a single very long line. .B mairix performs no locking of mbox folders when it is accessing them. If a mail delivery program is modifying the mbox at the same time, it is likely that one or more messages in the mbox will never get indexed by .B mairix (until the database is removed and recreated from scratch, anyway.) The assumption is that .B mairix will be used to index archive folders rather than incoming ones, so this is unlikely to be much of a problem in reality. .B mairix can support a maximum of 65536 separate mboxes, and a maximum of 65536 messages within any one mbox. .TP .BI omit= list-of-glob-patterns This is a colon-separated list of glob patterns for folders to be omitted from the indexing. This allows wide wildcards and recursive elements to be used in the .BR maildir , mh ", and" mbox directives, with the .B omit option used to selectively remove unwanted folders from the folder lists. Within the glob patterns, a single '*' matches any sequence of characters other than '/'. However '**' matches any sequence of characters including '/'. This allows glob patterns to be constructed which have a wildcard for just one directory component, or for any number of directory components. The .B omit option can be specified as many times as required so that the list of patterns doesn't all have to fit on one line. As an example, .Sx 2 mbox=bulk... omit=bulk/spam* .Ex will index all mbox folders at any level under the 'bulk' subdirectory of the base folder, except for those folders whose names start 'bulk/spam', e.g. 'bulk/spam', 'bulk/spam2005' etc. In contrast, .Sx 2 mbox=bulk... omit=bulk/spam** .Ex will index all mbox folders at any level under the 'bulk' subdirectory of the base folder, except for those folders whose names start 'bulk/spam', e.g. 'bulk/spam', 'bulk/spam2005', \'bulk/spam/2005', 'bulk/spam/2005/jan' etc.
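.IP "" 7
As a further sketch (the folder names here are hypothetical, and the behaviour described is only what the '*' and '**' rules above imply), a single '*' can be used to stand for exactly one directory level:
.Sx 2
mbox=lists...
omit=lists/*/drafts
.Ex
Under those rules this should skip 'lists/foo/drafts' and 'lists/bar/drafts', but still index 'lists/foo/old/drafts', because a single '*' does not match across a '/'.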
.TP .B nochecks This takes no arguments. If a line starting with .B nochecks is present, it is the equivalent of specifying the .B -Q flag to every indexing run. .TP .BI mfolder= match-folder-name This defines the name of the folder (within the directory specified by .BR base ) into which the search mode writes its output. (If the .B mformat used is 'raw' or 'excerpt', then this setting is not used and may be omitted.) The .B mfolder setting may be overridden for a particular search by using the .B -o option to .BR mairix . .B mairix will refuse to output search results to a folder that appears to be amongst those that are indexed. This is to prevent accidental deletion of emails. If the first character of the mfolder value is '/' or '.', it is taken as a pathname in its own right. This allows you to specify absolute paths and paths relative to the current directory where the mfolder should be written. Otherwise, the value of mfolder is appended to the value of base, in the same way as for the source folders. .TP .BI mformat= format This defines the type of folder used for the match folder where the search results go. There are five valid settings for .IR format , namely 'maildir', 'mh', 'mbox', 'raw' or 'excerpt'. If the 'raw' setting is used then .B mairix will just print out the path names of the files that match and no match folder will be created. If the 'excerpt' setting is used, .B mairix will also print out the To:, Cc:, From:, Subject: and Date: headers of the matching messages. 'maildir' is the default if this option is not defined. The setting is case-insensitive. .TP .BI database= path-to-database .br This defines the path where .BR mairix 's index database is kept. You can keep this file anywhere you like. Currently, .B mairix will place a single database file at the location indicated by .IR path-to-database . However, a future version of .B mairix may instead place a directory containing several files at this location.
.I path-to-database should be an absolute pathname (starting with '/'). If a relative pathname is used, it will be interpreted relative to the current directory at the time .B mairix is run, .RB ( not relative to the location of the .I mairixrc file or anything like that.) .SS Expansions The part of each line in '.mairixrc' following the equals sign can contain the following types of expansion: .TP .B Home directory expansion If the sequence '~/' appears at the start of the text after the equals sign, it is expanded to the user's home directory. Example: .Sx 1 database=~/Mail/mairix_database .Ex .TP .B Environment expansion If a '$' is followed by a sequence of alpha-numeric characters (or \'_'), the whole string is replaced by looking up the corresponding environment variable. Similarly, if '$' is followed by an open brace ('{'), everything up to the next close brace is looked up as an environment variable and the result replaces the entire sequence. Suppose in the shell we do .Sx 1 export FOO=bar .Ex and the '.mairixrc' file contains .Sx 2 maildir=xxx/$FOO mbox=yyy/a${FOO}b .Ex this is equivalent to .Sx 2 maildir=xxx/bar mbox=yyy/abarb .Ex If the specified environment variable is not set, the replacement is the empty string. .SH NOTES .PP An alternative path to the configuration file may be given with the .B \-f option to mairix(1). mairix-master/make_release000077500000000000000000000037161224450623700162170ustar00rootroot00000000000000#!/usr/bin/env perl ######################################################################### # # mairix - message index builder and finder for maildir folders. # # Copyright (C) Richard P. Curnow 2005,2006 # # This program is free software; you can redistribute it and/or modify # it under the terms of version 2 of the GNU General Public License as # published by the Free Software Foundation. 
# # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. # # ======================================================================= ######################################################################### $version = shift || die "Usage : $0 <version>\n"; $subdir = "mairix-${version}"; unless (-d "RELEASES") { mkdir "RELEASES", 0755; } system ("git tag -s $version"); die "git tag failed" if ($? != 0); if (-d "RELEASES/$subdir") { system ("rm -rf RELEASES/$subdir"); } system ("git archive --format=tar --prefix=RELEASES/${subdir}/ ${version} | tar xf -"); die "git archive failed" if ($? != 0); chdir "RELEASES" or die "Cannot get into RELEASES"; $here = qx/pwd/; chomp $here; chdir $subdir or die "Cannot get into $subdir"; open (OUT, ">version.txt"); print OUT $version."\n"; close OUT; open (IN, "<mairix.spec.sample"); open (OUT, ">mairix.spec"); while (<IN>) { s/\@\@VERSION\@\@/$version/; print OUT; } close (IN); close (OUT); unlink "make_release"; unlink "mairix.spec.sample"; unlink ".gitignore"; unlink "dfasyn/.gitignore"; chdir $here; system ("tar cvf - $subdir | gzip -9 > ${subdir}.tar.gz"); system ("gpg -b -a -o ${subdir}-tar-gz-asc.txt ${subdir}.tar.gz"); mairix-master/mbox.c000066400000000000000000000732111224450623700147620ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P.
Curnow 2003,2004,2005,2006,2007 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #include #include #include #include #include #include #include #include #include #include "mairix.h" #include "from.h" #include "fromcheck.h" #include "md5.h" struct extant_mbox {/*{{{*/ char *full_path; time_t mtime; size_t size; int db_index; /* + stuff to store positions etc of individual messages. 
*/ }; /*}}}*/ static int compare_extant_mboxen(const void *a, const void *b)/*{{{*/ { const struct extant_mbox *aa = (const struct extant_mbox *) a; const struct extant_mbox *bb = (const struct extant_mbox *) b; return strcmp(aa->full_path, bb->full_path); } /*}}}*/ static int lookup_extant_mbox(struct extant_mbox *sorted_mboxen, int n_extant, char *key)/*{{{*/ { /* Implement bisection search */ int l, h, m, r; l = 0, h = n_extant; m = -1; while (h > l) { m = (h + l) >> 1; /* Should only get called on 'file' type messages - TBC */ r = strcmp(sorted_mboxen[m].full_path, key); if (r == 0) break; if (l == m) return -1; if (r > 0) h = m; else l = m; } return m; } /*}}}*/ static void append_new_mboxen_to_db(struct database *db, struct extant_mbox *extant_mboxen, int n_extant)/*{{{*/ { int N, n_reqd; int i, j; /* Count the extant mboxen that have no entry in the database yet. */ for (i=N=0; i<n_extant; i++) { if (extant_mboxen[i].db_index < 0) N++; } n_reqd = db->n_mboxen + N; if (n_reqd > db->max_mboxen) { db->max_mboxen = n_reqd; db->mboxen = grow_array(struct mbox, n_reqd, db->mboxen); } /* Init new entries. */ for (j=0, i=db->n_mboxen; j<n_extant; j++) { if (extant_mboxen[j].db_index < 0) { db->mboxen[i].path = new_string(extant_mboxen[j].full_path); db->mboxen[i].current_mtime = extant_mboxen[j].mtime; db->mboxen[i].current_size = extant_mboxen[j].size; db->mboxen[i].file_mtime = 0; db->mboxen[i].file_size = 0; db->mboxen[i].n_msgs = 0; db->mboxen[i].n_old_msgs_valid = 0; db->mboxen[i].max_msgs = 0; db->mboxen[i].start = NULL; db->mboxen[i].len = NULL; db->mboxen[i].check_all = NULL; i++; } } db->n_mboxen = n_reqd; } /*}}}*/ void compute_checksum(const char *data, size_t len, checksum_t *csum)/*{{{*/ { MD5_CTX md5; MD5Init(&md5); MD5Update(&md5, (unsigned char *) data, len); MD5Final(&md5); memcpy(csum, md5.digest, sizeof(md5.digest)); return; } /*}}}*/ static int message_is_intact(struct mbox *mb, int idx, char *va, size_t len)/*{{{*/ { /* TODO : later, look at whether to optimise this in some way, e.g. by doing an initial check on just the first 1k of a message, this will detect failures much faster at the cost of extra storage.
*/ if (mb->start[idx] + mb->len[idx] > len) { /* Message overruns the end of the file - can't possibly be intact. */ return 0; } else { checksum_t csum; compute_checksum(va + mb->start[idx], mb->len[idx], &csum); if (!memcmp(mb->check_all[idx], &csum, sizeof(checksum_t))) { return 1; } else { return 0; } } return 0; } /*}}}*/ static int find_number_intact(struct mbox *mb, char *va, size_t len)/*{{{*/ { /* Pick up the common obvious case first - where new messages have been appended to the end of the mbox */ if (mb->n_msgs == 0) { return 0; } else if (message_is_intact(mb, mb->n_msgs - 1, va, len)) { return mb->n_msgs; /* The lot */ } else if (!message_is_intact(mb, 0, va, len)) { return 0; /* None of them. */ } else { /* Looks like a deletion has occurred earlier in the file => binary chop search to find the last message that's still valid. Assume that everything below a valid message is still valid itself (possibly dangerous assumption, time will tell.) */ int l, m, h; l = 0; h = mb->n_msgs; /* Loop invariant : always, message[l] is intact, message[h] isn't. */ while (l < h) { m = (h + l) >> 1; if (m==l) break; if (message_is_intact(mb, m, va, len)) { l = m; } else { h = m; } } /* By loop invariant, message[l] is the highest valid one. */ return (l + 1); } } /*}}}*/ static int fromtab_inited = 0; static signed char fromtab[256]; static void init_fromtab(void)/*{{{*/ { memset(fromtab, 0xff, 256); fromtab[(int)(unsigned char)'\n'] = ~(1<<0); fromtab[(int)(unsigned char)'F'] = ~(1<<1); fromtab[(int)(unsigned char)'r'] = ~(1<<2); fromtab[(int)(unsigned char)'o'] = ~(1<<3); fromtab[(int)(unsigned char)'m'] = ~(1<<4); fromtab[(int)(unsigned char)' '] = ~(1<<5); } /*}}}*/ /* REAL CHECKING : need to see if the line looks like this: * From [ ] tags are not currently indexed. Non-text attachments are ignored. If there's an attachment of type message/rfc822, this is parsed and the match is performed on this sub-message too.
If a hit occurs, the enclosing message is treated as having a hit.} @item A word in a particular part of the message, e.g. @samp{s:pointer}. This matches any message with the word @samp{pointer} in the subject. The qualifiers for this are : @table @asis @item @t{t:pointer} to match @samp{pointer} in the @t{To:} header, @item @t{c:pointer} to match @samp{pointer} in the @t{Cc:} header, @item @t{a:pointer} to match @samp{pointer} in the @t{To:}, @t{Cc:} or @t{From:} headers (@samp{a} meaning @samp{address}), @item @t{f:pointer} to match @samp{pointer} in the @t{From:} header, @item @t{s:pointer} to match @samp{pointer} in the @t{Subject:} header, @item @t{b:pointer} to match @samp{pointer} in the message body. @item @t{m:pointer} to match messages having a Message-ID header of @samp{pointer}. @end table Multiple fields may be specified, e.g. @t{sb:pointer} to match in the @t{Subject:} header or the body. @item A negated word, e.g. @samp{s:~pointer}. This matches all messages that don't have the word @samp{pointer} in the subject line. @item A substring match, e.g. @samp{s:point=}. This matches all messages containing a word in their subject line where the word has @samp{point} as a substring, e.g. @samp{pointer}, @samp{disappoint}. @item An approximate match, e.g. @samp{s:point=1}. This matches all messages containing a word in their subject line where the word has @samp{point} as a substring with at most one error, e.g. @samp{jointed} contains @samp{joint} which can be got from @samp{point} with one letter changed. An error can be a single letter changed, inserted or deleted. @item A left-anchored substring match, e.g. @samp{s:^point=}. This matches all messages containing a word in their subject line where the word begins with the string @samp{point}. (This feature is intended to be useful for inflected languages where the substring search is used to avoid the grammatical ending on the word.) 
This left-anchored facility can be combined with the approximate match facility, e.g. @samp{s:^point=1}. Note, if the @samp{^} prefix is used without the @samp{=} suffix, it is ignored. For example, @samp{s:^point} means the same thing as @samp{s:point}. @item A disjunction, e.g. @samp{s:pointer/dereference}. This matches all messages with one or both of the words @samp{pointer} and @samp{dereference} in their subject lines. @item Each disjunct may be a conjunction, e.g. @samp{s:null,pointer/dereference=2} matches all messages whose subject lines either contain both the words @samp{null} and @samp{pointer}, or contain the word @samp{dereference} with up to 2 errors (or both). @item A path expression. This matches all messages with a particular substring in their path. The syntax is very similar to that for words within the message (above), and all the rules for @samp{/}, @samp{,}, approximate matching etc are the same. The word prefix used for a path expression is @samp{p:}. Examples: @example mairix p:/archive/ @end example matches all messages with @samp{/archive/} in their path, and @example mairix p:wibble=1 s:wibble=1 @end example matches all messages with @samp{wibble} in their path and in their subject line, allowing up to 1 error in each case (the errors may be different for a particular message.) Path expressions always use substring matches and never exact matches (it's very unlikely you want to type in the whole of a message path as a search expression!) The matches are always @b{case-sensitive}. (All matches on words within messages are case-insensitive.) There is a limit of 32 characters on the match expression. @end itemize The binding order of the constructions is: @enumerate @item Individual command line arguments define separate conditions which are AND-ed together @item Within a single argument, the letters before the colon define which message parts the expression applies to.
If there is no colon, the expression applies to all the headers listed earlier and the body. @item After the colon, slashes delineate separate disjuncts, which are OR-ed together. @item Each disjunct may contain separate conjuncts, which are separated by commas. These conditions are AND-ed together. @item Each conjunct may start with a tilde to negate it, and may be followed by an equals sign to indicate a substring match, optionally followed by an integer to define the maximum number of errors allowed. @end enumerate Now some examples. Suppose my email address is @email{richard@@doesnt.exist}. The following will match all messages newer than 3 months from me with the word @samp{chrony} in the subject line: @example mairix d:3m- f:richard,doesnt,exist s:chrony @end example Suppose I don't mind a few spurious matches on the address, I want a wider date range, and I suspect that some messages I replied to might have had the subject keyword spelt wrongly (let's allow up to 2 errors): @example mairix d:6m- f:richard s:chrony=2 @end example @node date_syntax @section Syntax used for specifying dates This section describes the syntax used for specifying dates when searching using the @samp{d:} option. Dates are specified as a range. The start and end of the range can both be specified. Alternatively, if the start is omitted, it is treated as being the beginning of time. If the end is omitted, it is treated as the current time. There are 4 basic formats: @table @samp @item d:start-end Specify both start and end explicitly @item d:start- Specify start, end is the current time @item d:-end Specify end, start is 'a long time ago' (i.e. early enough to include any message). @item d:period Specify start and end implicitly, as the start and end of the period given. @end table The start and end can each be specified as either absolute or relative.
A relative endpoint is given as a number followed by a single letter defining the scaling: @multitable @columnfractions 0.15 0.2 0.2 0.45 @item @b{letter} @tab @b{meaning} @tab @b{example} @tab @b{meaning} @item d @tab days @tab 3d @tab 3 days @item w @tab weeks @tab 2w @tab 2 weeks (14 days) @item m @tab months @tab 5m @tab 5 months (150 days) @item y @tab years @tab 4y @tab 4 years (4*365 days) @end multitable Months are always treated as 30 days, and years as 365 days, for this purpose. Absolute times can be specified in a lot of forms. Some forms have different meanings when they define a start date from that when they define an end date. Where a single expression specifies both the start and end (i.e. where the argument to d: doesn't contain a @samp{-}), it will usually have different interpretations in the two cases. In the examples below, suppose the current date is Sunday May 18th, 2003 (when I started to write this material.) @multitable @columnfractions 0.24 0.24 0.24 0.28 @item @b{Example} @tab @b{Start date} @tab @b{End date} @tab @b{Notes} @item d:20030301@minus{}20030425 @tab March 1st, 2003 @tab 25th April, 2003 @item d:030301@minus{}030425 @tab March 1st, 2003 @tab April 25th, 2003 @tab century assumed @item d:mar1@minus{}apr25 @tab March 1st, 2003 @tab April 25th, 2003 @item d:Mar1@minus{}Apr25 @tab March 1st, 2003 @tab April 25th, 2003 @tab case insensitive @item d:MAR1@minus{}APR25 @tab March 1st, 2003 @tab April 25th, 2003 @tab case insensitive @item d:1mar@minus{}25apr @tab March 1st, 2003 @tab April 25th, 2003 @tab date and month in either order @item d:2002 @tab January 1st, 2002 @tab December 31st, 2002 @tab whole year @item d:mar @tab March 1st, 2003 @tab March 31st, 2003 @tab most recent March @item d:oct @tab October 1st, 2002 @tab October 31st, 2002 @tab most recent October @item d:21oct@minus{}mar @tab October 21st, 2002 @tab March 31st, 2003 @tab start before end @item d:21apr@minus{}mar @tab April 21st, 2002 @tab March 31st, 2003 @tab 
start before end @item d:21apr@minus{} @tab April 21st, 2003 @tab May 18th, 2003 @tab end omitted @item d:@minus{}21apr @tab January 1st, 1900 @tab April 21st, 2003 @tab start omitted @item d:6w@minus{}2w @tab April 6th, 2003 @tab May 4th, 2003 @tab both dates relative @item d:21apr@minus{}1w @tab April 21st, 2003 @tab May 11th, 2003 @tab one date relative @item d:21apr@minus{}2y @tab April 21st, 2001 @tab May 18th, 2001 @tab start before end @item d:99@minus{}11 @tab January 1st, 1999 @tab May 11th, 2003 @tab 2 digits are a day of the month if possible, otherwise a year @item d:99oct@minus{}1oct @tab October 1st, 1999 @tab October 1st, 2002 @tab end before now, single digit is a day of the month @item d:99oct@minus{}01oct @tab October 1st, 1999 @tab October 31st, 2001 @tab 2 digits starting with zero treated as a year @item d:oct99@minus{}oct1 @tab October 1st, 1999 @tab October 1st, 2002 @tab day and month in either order @item d:oct99@minus{}oct01 @tab October 1st, 1999 @tab October 31st, 2001 @tab year and month in either order @end multitable The principles in the table work as follows. @itemize @bullet @item When the expression defines a period of more than a day (i.e. if a month or year is specified), the earliest day in the period is taken when the start date is defined, and the last day in the period if the end of the range is being defined. @item The end date is always taken to be on or before the current date. @item The start date is always taken to be on or before the end date. @end itemize @bye @c vim:cms=@c\ %s:fdm=marker:fdc=5:syntax=off mairix-master/reader.c000066400000000000000000000137161224450623700152630ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P.
Curnow 2002,2003,2004,2005 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ /* Database reader */ #include #include #include #include #include #include #include #include #include #include #include "reader.h" #include "memmac.h" #include "mairix.h" int read_increment(unsigned char **encpos) {/*{{{*/ unsigned char *j = *encpos; int result; unsigned char x0, x1, x2, x3; x0 = *j++; if ((x0 & 0xc0) == 0xc0) { /* 4 byte encoding */ x1 = *j++; x2 = *j++; x3 = *j++; result = ((x0 & 0x3f) << 24) + (x1 << 16) + (x2 << 8) + x3; } else if (x0 & 0x80) { /* 2 byte encoding */ x1 = *j++; result = ((x0 & 0x7f) << 8) + x1; } else { /* Single byte encoding */ result = x0; } *encpos = j; return result; } /*}}}*/ static void read_toktable_db(char *data, struct toktable_db *toktable, int start, unsigned int *uidata)/*{{{*/ { toktable->n = uidata[start]; toktable->tok_offsets = uidata + uidata[start+1]; toktable->enc_offsets = uidata + uidata[start+2]; return; } /*}}}*/ static void read_toktable2_db(char *data, struct toktable2_db *toktable, int start, unsigned int *uidata)/*{{{*/ { toktable->n = uidata[start]; toktable->tok_offsets = uidata + uidata[start+1]; toktable->enc0_offsets = uidata + uidata[start+2]; toktable->enc1_offsets = uidata + uidata[start+3]; return; } /*}}}*/ struct read_db *open_db(char *filename)/*{{{*/ { 
int fd, len; char *data; struct stat sb; struct read_db *result; unsigned int *uidata; unsigned char *ucdata; fd = open(filename, O_RDONLY); if (fd < 0) { report_error("open", filename); unlock_and_exit (2); } if (fstat(fd, &sb) < 0) { report_error("stat", filename); unlock_and_exit(2); } len = sb.st_size; data = (char *) mmap(0, len, PROT_READ, MAP_SHARED, fd, 0); if (data == MAP_FAILED) { report_error("reader:mmap", filename); unlock_and_exit(2); } if (!data) { /* Empty file opened => database corrupt for sure */ if (close(fd) < 0) { report_error("close", filename); unlock_and_exit(2); } return NULL; } if (close(fd) < 0) { report_error("close", filename); unlock_and_exit(2); } result = new(struct read_db); uidata = (unsigned int *) data; /* alignment is assured */ ucdata = (unsigned char *) data; result->len = len; result->data = data; /*{{{ Magic number check */ /* All three leading magic bytes must match before the version byte is examined. */ if (ucdata[0] == HEADER_MAGIC0 && ucdata[1] == HEADER_MAGIC1 && ucdata[2] == HEADER_MAGIC2) { if (ucdata[3] != HEADER_MAGIC3) { fprintf(stderr, "Another version of this program produced the existing database! Please rebuild.\n"); unlock_and_exit(2); } } else { fprintf(stderr, "The existing database wasn't produced by this program!
Please rebuild.\n"); unlock_and_exit(2); } /*}}}*/ /* {{{ Endianness check */ if (uidata[UI_ENDIAN] == 0x11223344) { fprintf(stderr, "The endianness of the database is reversed for this machine\n"); unlock_and_exit(2); } else if (uidata[UI_ENDIAN] != 0x44332211) { fprintf(stderr, "The endianness of this machine is strange (or database is corrupt)\n"); unlock_and_exit(2); } /* }}} */ /* Now build tables of where things are in the file */ result->n_msgs = uidata[UI_N_MSGS]; result->msg_type_and_flags = ucdata + uidata[UI_MSG_TYPE_AND_FLAGS]; result->path_offsets = uidata + uidata[UI_MSG_CDATA]; result->mtime_table = uidata + uidata[UI_MSG_MTIME]; result->size_table = uidata + uidata[UI_MSG_SIZE]; result->date_table = uidata + uidata[UI_MSG_DATE]; result->tid_table = uidata + uidata[UI_MSG_TID]; result->n_mboxen = uidata[UI_MBOX_N]; result->mbox_paths_table = uidata + uidata[UI_MBOX_PATHS]; result->mbox_entries_table = uidata + uidata[UI_MBOX_ENTRIES]; result->mbox_mtime_table = uidata + uidata[UI_MBOX_MTIME]; result->mbox_size_table = uidata + uidata[UI_MBOX_SIZE]; result->mbox_checksum_table = uidata + uidata[UI_MBOX_CKSUM]; result->hash_key = uidata[UI_HASH_KEY]; read_toktable_db(data, &result->to, UI_TO_BASE, uidata); read_toktable_db(data, &result->cc, UI_CC_BASE, uidata); read_toktable_db(data, &result->from, UI_FROM_BASE, uidata); read_toktable_db(data, &result->subject, UI_SUBJECT_BASE, uidata); read_toktable_db(data, &result->body, UI_BODY_BASE, uidata); read_toktable_db(data, &result->attachment_name, UI_ATTACHMENT_NAME_BASE, uidata); read_toktable2_db(data, &result->msg_ids, UI_MSGID_BASE, uidata); return result; } /*}}}*/ static void free_toktable_db(struct toktable_db *x)/*{{{*/ { /* Nothing to do */ } /*}}}*/ static void free_toktable2_db(struct toktable2_db *x)/*{{{*/ { /* Nothing to do */ } /*}}}*/ void close_db(struct read_db *x)/*{{{*/ { free_toktable_db(&x->to); free_toktable_db(&x->cc); free_toktable_db(&x->from); free_toktable_db(&x->subject); 
free_toktable_db(&x->body); free_toktable_db(&x->attachment_name); free_toktable2_db(&x->msg_ids); if (munmap(x->data, x->len) < 0) { perror("munmap"); unlock_and_exit(2); } free(x); return; } /*}}}*/ mairix-master/reader.h000066400000000000000000000137251224450623700152700ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. Curnow 2002-2004,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #ifndef READER_H #define READER_H /* MX, then a high byte, then the version no. */ #define HEADER_MAGIC0 'M' #define HEADER_MAGIC1 'X' #define HEADER_MAGIC2 0xA5 #define HEADER_MAGIC3 0x03 /*{{{ Constants for file data positions */ #define UI_ENDIAN 1 #define UI_N_MSGS 2 /* Offset to byte-per-message table encoding message types */ #define UI_MSG_TYPE_AND_FLAGS 3 /* Header positions containing offsets to the per-message tables. */ /* Character data: * for maildir/MH : the path of the box. 
* for mbox : index of mbox containing the message */ #define UI_MSG_CDATA 4 /* For maildir/MH : mtime of file containing message */ #define UI_MSG_MTIME 5 /* For mbox msgs : the offset into the file */ #define UI_MSG_OFFSET 5 /* For all formats : message size */ #define UI_MSG_SIZE 6 /* For mbox msgs : offset into file */ #define UI_MSG_START 6 /* These are common to Maildir,MH,mbox messages */ #define UI_MSG_DATE 7 #define UI_MSG_TID 8 /* Header positions for mbox (file-level) information */ /* Number of mboxes */ #define UI_MBOX_N 9 #define UI_MBOX_PATHS 10 #define UI_MBOX_ENTRIES 11 /* mtime of mboxes */ #define UI_MBOX_MTIME 12 /* Size in bytes */ #define UI_MBOX_SIZE 13 /* Base of checksums for messages in each mbox */ #define UI_MBOX_CKSUM 14 #define UI_HASH_KEY 15 /* Header positions for token tables */ #define UI_TO_BASE 16 #define UI_CC_BASE 19 #define UI_FROM_BASE 22 #define UI_SUBJECT_BASE 25 #define UI_BODY_BASE 28 #define UI_ATTACHMENT_NAME_BASE 31 #define UI_MSGID_BASE 34 /* Larger than the last table offset. 
*/ #define UI_HEADER_LEN 40 #define UC_HEADER_LEN ((UI_HEADER_LEN) << 2) #define UI_N_OFFSET 0 #define UI_TOK_OFFSET 1 #define UI_ENC_OFFSET 2 #define UI_TO_N (UI_TO_BASE + UI_N_OFFSET) #define UI_TO_TOK (UI_TO_BASE + UI_TOK_OFFSET) #define UI_TO_ENC (UI_TO_BASE + UI_ENC_OFFSET) #define UI_CC_N (UI_CC_BASE + UI_N_OFFSET) #define UI_CC_TOK (UI_CC_BASE + UI_TOK_OFFSET) #define UI_CC_ENC (UI_CC_BASE + UI_ENC_OFFSET) #define UI_FROM_N (UI_FROM_BASE + UI_N_OFFSET) #define UI_FROM_TOK (UI_FROM_BASE + UI_TOK_OFFSET) #define UI_FROM_ENC (UI_FROM_BASE + UI_ENC_OFFSET) #define UI_SUBJECT_N (UI_SUBJECT_BASE + UI_N_OFFSET) #define UI_SUBJECT_TOK (UI_SUBJECT_BASE + UI_TOK_OFFSET) #define UI_SUBJECT_ENC (UI_SUBJECT_BASE + UI_ENC_OFFSET) #define UI_BODY_N (UI_BODY_BASE + UI_N_OFFSET) #define UI_BODY_TOK (UI_BODY_BASE + UI_TOK_OFFSET) #define UI_BODY_ENC (UI_BODY_BASE + UI_ENC_OFFSET) #define UI_ATTACHMENT_NAME_N (UI_ATTACHMENT_NAME_BASE + UI_N_OFFSET) #define UI_ATTACHMENT_NAME_TOK (UI_ATTACHMENT_NAME_BASE + UI_TOK_OFFSET) #define UI_ATTACHMENT_NAME_ENC (UI_ATTACHMENT_NAME_BASE + UI_ENC_OFFSET) #define UI_MSGID_N (UI_MSGID_BASE + UI_N_OFFSET) #define UI_MSGID_TOK (UI_MSGID_BASE + UI_TOK_OFFSET) #define UI_MSGID_ENC0 (UI_MSGID_BASE + UI_ENC_OFFSET) #define UI_MSGID_ENC1 (UI_MSGID_ENC0 + 1) /*}}}*/ /*{{{ Literals used for encoding messages types in database file */ #define DB_MSG_DEAD 0 /* maildir/MH : one file per message */ #define DB_MSG_FILE 1 /* mbox : multiple messages per file */ #define DB_MSG_MBOX 2 /*}}}*/ #define FLAG_SEEN (1<<3) #define FLAG_REPLIED (1<<4) #define FLAG_FLAGGED (1<<5) struct toktable_db {/*{{{*/ unsigned int n; /* number of entries in this table */ unsigned int *tok_offsets; /* offset to table of token offsets */ unsigned int *enc_offsets; /* offset to table of encoding offsets */ }; /*}}}*/ struct toktable2_db {/*{{{*/ unsigned int n; /* number of entries in this table */ unsigned int *tok_offsets; /* offset to table of token offsets */ unsigned int
*enc0_offsets; /* offset to table of encoding offsets */ unsigned int *enc1_offsets; /* offset to table of encoding offsets */ }; /*}}}*/ struct read_db {/*{{{*/ /* Raw file parameters, needed later for munmap */ char *data; int len; /* Pathname information */ int n_msgs; unsigned char *msg_type_and_flags; unsigned int *path_offsets; /* or (mbox index, msg index) */ unsigned int *mtime_table; /* or offset into mbox */ unsigned int *size_table; /* either file size or span inside mbox */ unsigned int *date_table; unsigned int *tid_table; int n_mboxen; unsigned int *mbox_paths_table; unsigned int *mbox_entries_table; /* table of number of messages per mbox */ unsigned int *mbox_mtime_table; unsigned int *mbox_size_table; unsigned int *mbox_checksum_table; unsigned int hash_key; struct toktable_db to; struct toktable_db cc; struct toktable_db from; struct toktable_db subject; struct toktable_db body; struct toktable_db attachment_name; struct toktable2_db msg_ids; }; /*}}}*/ struct read_db *open_db(char *filename); void close_db(struct read_db *x); static inline int rd_msg_type(struct read_db *db, int i) { return db->msg_type_and_flags[i] & 0x7; } /* Common to search and db reader. */ int read_increment(unsigned char **encpos); #endif /* READER_H */ mairix-master/rfc822.c000066400000000000000000001212641224450623700150250ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. 
Curnow 2002,2003,2004,2005,2006,2007,2010
 * rfc2047 decode:
 * Copyright (C) Mikael Ylikoski 2002
 * gzip mbox support:
 * Copyright (C) Ico Doornekamp 2005
 * Copyright (C) Felipe Gustavo de Almeida 2005
 * bzip2 mbox support:
 * Copyright (C) Paramjit Oberoi 2005
 * caching uncompressed mbox data:
 * Copyright (C) Chris Mason 2006
 * memory leak fixes:
 * Copyright (C) Samuel Tardieu 2008
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of version 2 of the GNU General Public License as
 * published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 **********************************************************************
 */

#include "mairix.h"
#include "nvp.h"
#include <assert.h>
#include <ctype.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#ifdef USE_GZIP_MBOX
# include <zlib.h>
#endif
#ifdef USE_BZIP_MBOX
# include <bzlib.h>
#endif

struct DLL {/*{{{*/
  struct DLL *next;
  struct DLL *prev;
};
/*}}}*/

static void enqueue(void *head, void *x)/*{{{*/
{
  /* Declare this way so it can be used with any kind of double linked list
   * having next & prev pointers in its first two words. */
  struct DLL *h = (struct DLL *) head;
  struct DLL *xx = (struct DLL *) x;
  xx->next = h;
  xx->prev = h->prev;
  h->prev->next = xx;
  h->prev = xx;
  return;
}
/*}}}*/

enum encoding_type {/*{{{*/
  ENC_UNKNOWN,
  ENC_NONE,
  ENC_BINARY,
  ENC_7BIT,
  ENC_8BIT,
  ENC_QUOTED_PRINTABLE,
  ENC_BASE64,
  ENC_UUENCODE
};
/*}}}*/
struct content_type_header {/*{{{*/
  const char *major; /* e.g. text */
  const char *minor; /* e.g. plain */
  const char *boundary; /* for multipart */
  /* charset?
*/ }; /*}}}*/ struct line {/*{{{*/ struct line *next; struct line *prev; char *text; }; /*}}}*/ static void init_headers(struct headers *hdrs)/*{{{*/ { hdrs->to = NULL; hdrs->cc = NULL; hdrs->from = NULL; hdrs->subject = NULL; hdrs->message_id = NULL; hdrs->in_reply_to = NULL; hdrs->references = NULL; hdrs->date = 0; hdrs->flags.seen = 0; hdrs->flags.replied = 0; hdrs->flags.flagged = 0; }; /*}}}*/ static void splice_header_lines(struct line *header)/*{{{*/ { /* Deal with newline then tab in header */ struct line *x, *next; for (x=header->next; x!=header; x=next) { #if 0 printf("next header, x->text=%08lx\n", x->text); printf("header=<%s>\n", x->text); #endif next = x->next; if (isspace(x->text[0] & 0xff)) { /* Glue to previous line */ char *p, *newbuf, *oldbuf; struct line *y; for (p=x->text; *p; p++) { if (!isspace(*(unsigned char *)p)) break; } p--; /* point to final space */ y = x->prev; #if 0 printf("y=%08lx p=%08lx\n", y->text, p); #endif newbuf = new_array(char, strlen(y->text) + strlen(p) + 1); strcpy(newbuf, y->text); strcat(newbuf, p); oldbuf = y->text; y->text = newbuf; free(oldbuf); y->next = x->next; x->next->prev = y; free(x->text); free(x); } } return; } /*}}}*/ static int audit_header(struct line *header)/*{{{*/ { /* Check for obvious broken-ness * 1st line has no leading spaces, single word then colon * following lines have leading spaces or single word followed by colon * */ struct line *x; int first=1; int count=1; for (x=header->next; x!=header; x=x->next) { int has_leading_space=0; int is_blank; int has_word_colon=0; if (1 || first) { /* Ignore any UUCP or mbox style From line at the start */ if (!strncmp("From ", x->text, 5)) { continue; } /* Ignore escaped From line at the start */ if (!strncmp(">From ", x->text, 6)) { continue; } } is_blank = !(x->text[0]); if (!is_blank) { char *p; int saw_char = 0; has_leading_space = isspace(x->text[0] & 0xff); has_word_colon = 0; /* default */ p = x->text; while(*p) { if(*p == ':') { has_word_colon = 
saw_char; break; } else if (isspace(*(unsigned char *) p)) { has_word_colon = 0; break; } else { saw_char = 1; } p++; } } if (( first && (is_blank || has_leading_space || !has_word_colon)) || (!first && (is_blank || !(has_leading_space || has_word_colon)))) { #if 0 fprintf(stderr, "Header line %d <%s> fails because:", count, x->text); if (first && is_blank) { fprintf(stderr, " [first && is_blank]"); } if (first && has_leading_space) { fprintf(stderr, " [first && has_leading_space]"); } if (first && !has_word_colon) { fprintf(stderr, " [first && !has_word_colon]"); } if (!first && is_blank) { fprintf(stderr, " [!first && is_blank]"); } if (!first && !(has_leading_space||has_word_colon)) { fprintf(stderr, " [!first && !has_leading_space||has_word_colon]"); } fprintf(stderr, "\n"); #endif /* Header fails the audit */ return 0; } first = 0; count++; } /* If we get here the header must have been OK */ return 1; }/*}}}*/ static int match_string(const char *ref, const char *candidate)/*{{{*/ { int len = strlen(ref); return !strncasecmp(ref, candidate, len); } /*}}}*/ static char equal_table[] = {/*{{{*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 00-0f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 10-1f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 20-2f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, /* 30-3f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 40-4f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 50-5f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 60-6f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 70-7f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 80-8f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 90-9f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* a0-af */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* b0-bf */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* c0-cf */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* d0-df */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, /* e0-ef */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 /* f0-ff */ }; /*}}}*/ static int base64_table[] = {/*{{{*/ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /* 00-0f */ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /* 10-1f */ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1, -1, 63, /* 20-2f */ 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, 0, -1, -1, /* 30-3f */ -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, /* 40-4f */ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, -1, /* 50-5f */ -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, /* 60-6f */ 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1, /* 70-7f */ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /* 80-8f */ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /* 90-9f */ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /* a0-af */ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /* b0-bf */ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /* c0-cf */ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /* d0-df */ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /* e0-ef */ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 /* f0-ff */ }; /*}}}*/ static int hex_to_val(char x) {/*{{{*/ switch (x) { case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': return (x - '0'); break; case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': return 10 + (x - 'a'); break; case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': return 10 + (x - 'A'); break; default: return 0; } } /*}}}*/ static void decode_header_value(char *text){/*{{{*/ /* rfc2047 decode, written by Mikael Ylikoski */ char *s, *a, *b, *e, *p, *q; for (p = q = s = text; (s = strstr(s, "=?")); s = e + 2) { if (p == q) p = q = s; else while (q != s) *p++ = *q++; s += 2; a = strchr(s, '?'); 
if (!a) break; a++; b = strchr(a, '?'); if (!b) break; b++; e = strstr(b, "?="); if (!e) break; /* have found an encoded-word */ if (b - a != 2) continue; /* unknown encoding */ if (*a == 'q' || *a == 'Q') { int val; q = b; while (q < e) { if (*q == '_') { *p++ = 0x20; q++; } else if (*q == '=') { q++; val = hex_to_val(*q++) << 4; val += hex_to_val(*q++); *p++ = val; } else *p++ = *q++; } } else if (*a == 'b' || *a == 'B') { int reg, nc, eq; /* register, #characters in reg, #equals */ int dc; /* decoded character */ eq = reg = nc = 0; for (q = b; q < e; q++) { unsigned char cq = *(unsigned char *)q; dc = base64_table[cq]; eq += equal_table[cq]; if (dc >= 0) { reg <<= 6; reg += dc; nc++; if (nc == 4) { *p++ = ((reg >> 16) & 0xff); if (eq < 2) *p++ = ((reg >> 8) & 0xff); if (eq < 1) *p++ = reg & 0xff; nc = reg = 0; if (eq) break; } } } } else { continue; /* unknown encoding */ } q = e + 2; } if (p == q) return; while (*q != '\0') *p++ = *q++; *p = '\0'; } /*}}}*/ static char *copy_header_value(char *text){/*{{{*/ char *p; for (p = text; *p && (*p != ':'); p++) ; if (!*p) return NULL; p++; p = new_string(p); decode_header_value(p); return p; } /*}}}*/ static void copy_or_concat_header_value(char **previous, char *text){/*{{{*/ char *p = copy_header_value(text); if (*previous) { *previous = extend_string(*previous, ", "); *previous = extend_string(*previous, p); free(p); } else *previous = p; } /*}}}*/ static enum encoding_type decode_encoding_type(const char *e)/*{{{*/ { enum encoding_type result; const char *p; if (!e) { result = ENC_NONE; } else { for (p=e; *p && isspace(*(unsigned char *)p); p++) ; if ( match_string("7bit", p) || match_string("7-bit", p) || match_string("7 bit", p)) { result = ENC_7BIT; } else if (match_string("8bit", p) || match_string("8-bit", p) || match_string("8 bit", p)) { result = ENC_8BIT; } else if (match_string("quoted-printable", p)) { result = ENC_QUOTED_PRINTABLE; } else if (match_string("base64", p)) { result = ENC_BASE64; } else if 
(match_string("binary", p)) { result = ENC_BINARY; } else if (match_string("x-uuencode", p)) { result = ENC_UUENCODE; } else { fprintf(stderr, "Warning: unknown encoding type: '%s'\n", e); result = ENC_UNKNOWN; } } return result; } /*}}}*/ static void parse_content_type(struct nvp *ct_nvp, struct content_type_header *result)/*{{{*/ { result->major = NULL; result->minor = NULL; result->boundary = NULL; result->major = nvp_major(ct_nvp); if (result->major) { result->minor = nvp_minor(ct_nvp); } else { result->minor = NULL; result->major = nvp_first(ct_nvp); } result->boundary = nvp_lookupcase(ct_nvp, "boundary"); } /*}}}*/ static char *looking_at_ws_then_newline(char *start)/*{{{*/ { char *result; result = start; do { if (*result == '\n') return result; else if (!isspace(*(unsigned char *) result)) return NULL; else result++; } while (1); /* Can't get here */ assert(0); } /*}}}*/ static char *unencode_data(struct msg_src *src, char *input, int input_len, const char *enc, int *output_len)/*{{{*/ { enum encoding_type encoding; char *result, *end_result; char *end_input; encoding = decode_encoding_type(enc); end_input = input + input_len; /* All mime encodings result in expanded data, so this is guaranteed to * safely oversize the output array */ result = new_array(char, input_len + 1); /* Now decode */ switch (encoding) { case ENC_7BIT:/*{{{*/ case ENC_8BIT: case ENC_BINARY: case ENC_NONE: { memcpy(result, input, input_len); end_result = result + input_len; } break; /*}}}*/ case ENC_QUOTED_PRINTABLE:/*{{{*/ { char *p, *q; p = result; for (p=result, q=input; q 0; q += 4, len -= 3) { if (len >= 3) { *p++ = DEC(q[0]) << 2 | DEC(q[1]) >> 4; *p++ = DEC(q[1]) << 4 | DEC(q[2]) >> 2; *p++ = DEC(q[2]) << 6 | DEC(q[3]); } else { if (len >= 1) *p++ = DEC(q[0]) << 2 | DEC(q[1]) >> 4; if (len >= 2) *p++ = DEC(q[1]) << 4 | DEC(q[2]) >> 2; } } while (q < end_input && *q != '\n') q++; } end_result = p; } break; /*}}}*/ case ENC_UNKNOWN:/*{{{*/ fprintf(stderr, "Unknown encoding type in 
%s\n", format_msg_src(src)); /* fall through - ignore this data */ /*}}}*/ default:/*{{{*/ end_result = result; break; /*}}}*/ } *output_len = end_result - result; result[*output_len] = '\0'; /* for convenience with text/plain etc to make it printable */ return result; } /*}}}*/ char *format_msg_src(struct msg_src *src)/*{{{*/ { static char *buffer = NULL; static int buffer_len = 0; char *result; int len; switch (src->type) { case MS_FILE: result = src->filename; break; case MS_MBOX: len = strlen(src->filename); len += 32; if (!buffer || (len > buffer_len)) { free(buffer); buffer = new_array(char, len); buffer_len = len; } sprintf(buffer, "%s[%d,%d)", src->filename, (int) src->start, (int) (src->start + src->len)); result = buffer; break; default: result = NULL; break; } return result; } /*}}}*/ static int split_and_splice_header(struct msg_src *src, char *data, struct line *header, char **body_start)/*{{{*/ { char *sol, *eol; int blank_line; header->next = header->prev = header; sol = data; do { if (!*sol) break; blank_line = 1; /* until proven otherwise */ eol = sol; while (*eol && (*eol != '\n')) { if (!isspace(*(unsigned char *) eol)) blank_line = 0; eol++; } if (*eol == '\n') { if (!blank_line) { int line_length = eol - sol; char *line_text = new_array(char, 1 + line_length); struct line *new_header; strncpy(line_text, sol, line_length); line_text[line_length] = '\0'; new_header = new(struct line); new_header->text = line_text; enqueue(header, new_header); } sol = eol + 1; /* Start of next line */ } else { /* must be null char */ fprintf(stderr, "Got null character whilst processing header of %s\n", format_msg_src(src)); return -1; /* & leak memory */ } } while (!blank_line); *body_start = sol; if (audit_header(header)) { splice_header_lines(header); return 0; } else { #if 0 /* Caller generates message */ fprintf(stderr, "Message had bad rfc822 headers, ignoring\n"); #endif return -1; } } /*}}}*/ /* Forward prototypes */ static void do_multipart(struct msg_src 
*src, char *input, int input_len, const char *boundary, struct attachment *atts, enum data_to_rfc822_error *error); /*{{{ do_body() */ static void do_body(struct msg_src *src, char *body_start, int body_len, struct nvp *ct_nvp, struct nvp *cte_nvp, struct nvp *cd_nvp, struct attachment *atts, enum data_to_rfc822_error *error) { char *decoded_body; int decoded_body_len; const char *content_transfer_encoding; content_transfer_encoding = NULL; if (cte_nvp) { content_transfer_encoding = nvp_first(cte_nvp); if (!content_transfer_encoding) { fprintf(stderr, "Giving up on %s, content_transfer_encoding header not parseable\n", format_msg_src(src)); return; } } decoded_body = unencode_data(src, body_start, body_len, content_transfer_encoding, &decoded_body_len); if (ct_nvp) { struct content_type_header ct; parse_content_type(ct_nvp, &ct); if (ct.major && !strcasecmp(ct.major, "multipart")) { do_multipart(src, decoded_body, decoded_body_len, ct.boundary, atts, error); /* Don't need decoded body any longer - copies have been taken if * required when handling multipart attachments. */ free(decoded_body); if (error && (*error == DTR8_MISSING_END)) return; } else { /* unipart */ struct attachment *new_att; const char *disposition; new_att = new(struct attachment); disposition = cd_nvp ? nvp_first(cd_nvp) : NULL; if (disposition && !strcasecmp(disposition, "attachment")) { const char *lookup; lookup = nvp_lookupcase(cd_nvp, "filename"); if (lookup) { new_att->filename = new_string(lookup); } else { /* Some messages have name=... in content-type: instead of * filename=... in content-disposition. 
*/ lookup = nvp_lookup(ct_nvp, "name"); if (lookup) { new_att->filename = new_string(lookup); } else { new_att->filename = NULL; } } } else { new_att->filename = NULL; } if (ct.major && !strcasecmp(ct.major, "text")) { if (ct.minor && !strcasecmp(ct.minor, "plain")) { new_att->ct = CT_TEXT_PLAIN; } else if (ct.minor && !strcasecmp(ct.minor, "html")) { new_att->ct = CT_TEXT_HTML; } else { new_att->ct = CT_TEXT_OTHER; } } else if (ct.major && !strcasecmp(ct.major, "message") && ct.minor && !strcasecmp(ct.minor, "rfc822")) { new_att->ct = CT_MESSAGE_RFC822; } else { new_att->ct = CT_OTHER; } if (new_att->ct == CT_MESSAGE_RFC822) { new_att->data.rfc822 = data_to_rfc822(src, decoded_body, decoded_body_len, error); free(decoded_body); /* data no longer needed */ } else { new_att->data.normal.len = decoded_body_len; new_att->data.normal.bytes = decoded_body; } enqueue(atts, new_att); } } else { /* Treat as text/plain {{{*/ struct attachment *new_att; new_att = new(struct attachment); new_att->filename = NULL; new_att->ct = CT_TEXT_PLAIN; new_att->data.normal.len = decoded_body_len; /* Add null termination on the end */ new_att->data.normal.bytes = new_array(char, decoded_body_len + 1); memcpy(new_att->data.normal.bytes, decoded_body, decoded_body_len + 1); free(decoded_body); enqueue(atts, new_att);/*}}}*/ } } /*}}}*/ /*{{{ do_attachment() */ static void do_attachment(struct msg_src *src, char *start, char *after_end, struct attachment *atts) { /* decode attachment and add to attachment list */ struct line header, *x, *nx; char *body_start; int body_len; struct nvp *ct_nvp, *cte_nvp, *cd_nvp, *nvp; if (split_and_splice_header(src, start, &header, &body_start) < 0) { fprintf(stderr, "Giving up on attachment with bad header in %s\n", format_msg_src(src)); return; } /* Extract key headers */ ct_nvp = cte_nvp = cd_nvp = NULL; for (x=header.next; x!=&header; x=x->next) { if ((nvp = make_nvp(src, x->text, "content-type:"))) { ct_nvp = nvp; } else if ((nvp = make_nvp(src, 
x->text, "content-transfer-encoding:"))) { cte_nvp = nvp; } else if ((nvp = make_nvp(src, x->text, "content-disposition:"))) { cd_nvp = nvp; } } #if 0 if (ct_nvp) { fprintf(stderr, "======\n"); fprintf(stderr, "Dump of content-type hdr\n"); nvp_dump(ct_nvp, stderr); free(ct_nvp); } if (cte_nvp) { fprintf(stderr, "======\n"); fprintf(stderr, "Dump of content-transfer-encoding hdr\n"); nvp_dump(cte_nvp, stderr); free(cte_nvp); } #endif if (body_start > after_end) { /* This is a (maliciously?) b0rken attachment, e.g. maybe empty */ if (verbose) { fprintf(stderr, "Message %s contains an invalid attachment, length=%d bytes\n", format_msg_src(src), (int)(after_end - start)); } } else { body_len = after_end - body_start; /* Ignore errors in nested body parts. */ do_body(src, body_start, body_len, ct_nvp, cte_nvp, cd_nvp, atts, NULL); } /* Free header memory */ for (x=header.next; x!=&header; x=nx) { nx = x->next; free(x->text); free(x); } if (ct_nvp) free_nvp(ct_nvp); if (cte_nvp) free_nvp(cte_nvp); if (cd_nvp) free_nvp(cd_nvp); } /*}}}*/ /*{{{ do_multipart() */ static void do_multipart(struct msg_src *src, char *input, int input_len, const char *boundary, struct attachment *atts, enum data_to_rfc822_error *error) { char *b0, *b1, *be, *bx; char *line_after_b0, *start_b1_search_from; int boundary_len; int looking_at_end_boundary; if (!boundary) { fprintf(stderr, "Can't process multipart message %s with no boundary string\n", format_msg_src(src)); if (error) *error = DTR8_MULTIPART_SANS_BOUNDARY; return; } boundary_len = strlen(boundary); b0 = NULL; line_after_b0 = input; be = input + input_len; do { int boundary_ok; start_b1_search_from = line_after_b0; do { /* reject boundaries that aren't a whole line */ b1 = NULL; for (bx = start_b1_search_from; bx < be - (boundary_len + 4); bx++) { if (bx[0] == '-' && bx[1] == '-' && !strncmp(bx+2, boundary, boundary_len)) { b1 = bx; break; } } if (!b1) { if (error) *error = DTR8_MISSING_END; return; } looking_at_end_boundary = 
(b1[boundary_len+2] == '-' && b1[boundary_len+3] == '-'); boundary_ok = 1; if ((b1 > input) && (*(b1-1) != '\n')) boundary_ok = 0; if (!looking_at_end_boundary && (b1 + boundary_len + 2 < input + input_len) && (*(b1 + boundary_len + 2) != '\n')) boundary_ok = 0; if (!boundary_ok) { char *eol = strchr(b1, '\n'); if (!eol) { fprintf(stderr, "Oops, didn't find another normal boundary in %s\n", format_msg_src(src)); return; } start_b1_search_from = 1 + eol; } } while (!boundary_ok); /* b1 is now looking at a good boundary, which might be the final one */ if (b0) { /* don't treat preamble as an attachment */ do_attachment(src, line_after_b0, b1, atts); } b0 = b1; line_after_b0 = strchr(b0, '\n'); if (line_after_b0 == 0) line_after_b0 = b0 + strlen(b0); else ++line_after_b0; } while (b1 < be && !looking_at_end_boundary); } /*}}}*/ static time_t parse_rfc822_date(char *date_string)/*{{{*/ { struct tm tm; char *s, *z; /* Format [weekday ,] day-of-month month year hour:minute:second timezone. Some of the ideas, sanity checks etc taken from parse.c in the mutt sources, credit to Michael R. 
Elkins et al */

  s = date_string;
  z = strchr(s, ',');
  if (z) s = z + 1;
  while (*s && isspace((unsigned char) *s)) s++;
  /* Should now be looking at day number */
  if (!isdigit((unsigned char) *s)) goto tough_cheese;
  tm.tm_mday = atoi(s);
  if (tm.tm_mday > 31) goto tough_cheese;

  while (isdigit((unsigned char) *s)) s++;
  while (*s && isspace((unsigned char) *s)) s++;
  if (!*s) goto tough_cheese;
  if (!strncasecmp(s, "jan", 3)) tm.tm_mon = 0;
  else if (!strncasecmp(s, "feb", 3)) tm.tm_mon = 1;
  else if (!strncasecmp(s, "mar", 3)) tm.tm_mon = 2;
  else if (!strncasecmp(s, "apr", 3)) tm.tm_mon = 3;
  else if (!strncasecmp(s, "may", 3)) tm.tm_mon = 4;
  else if (!strncasecmp(s, "jun", 3)) tm.tm_mon = 5;
  else if (!strncasecmp(s, "jul", 3)) tm.tm_mon = 6;
  else if (!strncasecmp(s, "aug", 3)) tm.tm_mon = 7;
  else if (!strncasecmp(s, "sep", 3)) tm.tm_mon = 8;
  else if (!strncasecmp(s, "oct", 3)) tm.tm_mon = 9;
  else if (!strncasecmp(s, "nov", 3)) tm.tm_mon = 10;
  else if (!strncasecmp(s, "dec", 3)) tm.tm_mon = 11;
  else goto tough_cheese;

  /* Skip the rest of the month name, stopping at the end of the string so a
   * truncated date cannot run the pointer off the buffer. */
  while (*s && !isspace((unsigned char) *s)) s++;
  while (*s && isspace((unsigned char) *s)) s++;
  if (!isdigit((unsigned char) *s)) goto tough_cheese;
  tm.tm_year = atoi(s);
  if (tm.tm_year < 70) {
    tm.tm_year += 100;
  } else if (tm.tm_year >= 1900) {
    tm.tm_year -= 1900;
  }

  while (isdigit((unsigned char) *s)) s++;
  while (*s && isspace((unsigned char) *s)) s++;
  if (!*s) goto tough_cheese;

  /* Now looking at hms */
  /* For now, forget this. The searching will be vague enough that nearest day
     is good enough.
 */

  tm.tm_hour = 0;
  tm.tm_min = 0;
  tm.tm_sec = 0;
  tm.tm_isdst = 0;
  return mktime(&tm);

tough_cheese:
  return (time_t) -1; /* default value */
}
/*}}}*/

static void scan_status_flags(const char *s, struct headers *hdrs)/*{{{*/
{
  const char *p;
  for (p=s; *p; p++) {
    switch (*p) {
      case 'R': hdrs->flags.seen = 1; break;
      case 'A': hdrs->flags.replied = 1; break;
      case 'F': hdrs->flags.flagged = 1; break;
      default: break;
    }
  }
}
/*}}}*/

/*{{{ data_to_rfc822() */
struct rfc822 *data_to_rfc822(struct msg_src *src, char *data, int length, enum data_to_rfc822_error *error)
{
  struct rfc822 *result;
  char *body_start;
  struct line header;
  struct line *x, *nx;
  struct nvp *ct_nvp, *cte_nvp, *cd_nvp, *nvp;
  int body_len;

  if (error) *error = DTR8_OK; /* default */
  result = new(struct rfc822);
  init_headers(&result->hdrs);
  result->atts.next = result->atts.prev = &result->atts;

  if (split_and_splice_header(src, data, &header, &body_start) < 0) {
    if (verbose) {
      fprintf(stderr, "Giving up on message %s with bad header\n", format_msg_src(src));
    }
    if (error) *error = DTR8_BAD_HEADERS;
    free(result); /* nothing has been attached to it yet, so avoid leaking it */
    return NULL;
  }

  /* Extract key headers {{{*/
  ct_nvp = cte_nvp = cd_nvp = NULL;
  for (x=header.next; x!=&header; x=x->next) {
    if (match_string("to", x->text))
      copy_or_concat_header_value(&result->hdrs.to, x->text);
    else if (match_string("cc", x->text))
      copy_or_concat_header_value(&result->hdrs.cc, x->text);
    else if (!result->hdrs.from && match_string("from", x->text))
      result->hdrs.from = copy_header_value(x->text);
    else if (!result->hdrs.subject && match_string("subject", x->text))
      result->hdrs.subject = copy_header_value(x->text);
    else if (!ct_nvp && (nvp = make_nvp(src, x->text, "content-type:")))
      ct_nvp = nvp;
    else if (!cte_nvp && (nvp = make_nvp(src, x->text, "content-transfer-encoding:")))
      cte_nvp = nvp;
    else if (!cd_nvp && (nvp = make_nvp(src, x->text, "content-disposition:")))
      cd_nvp = nvp;
    else if (!result->hdrs.date && match_string("date", x->text)) {
      char *date_string = copy_header_value(x->text);
      result->hdrs.date = parse_rfc822_date(date_string);
      free(date_string);
    } else if (!result->hdrs.message_id && match_string("message-id", x->text))
      result->hdrs.message_id = copy_header_value(x->text);
    else if (!result->hdrs.in_reply_to && match_string("in-reply-to", x->text))
      result->hdrs.in_reply_to = copy_header_value(x->text);
    else if (!result->hdrs.references && match_string("references", x->text))
      result->hdrs.references = copy_header_value(x->text);
    /* sizeof() counts the terminating NUL, so subtract 1 to start scanning just
     * after the colon; otherwise a bare "Status:" line is read past its end. */
    else if (match_string("status", x->text))
      scan_status_flags(x->text + sizeof("status:") - 1, &result->hdrs);
    else if (match_string("x-status", x->text))
      scan_status_flags(x->text + sizeof("x-status:") - 1, &result->hdrs);
  }
  /*}}}*/

  /* Process body */
  body_len = length - (body_start - data);
  do_body(src, body_start, body_len, ct_nvp, cte_nvp, cd_nvp, &result->atts, error);

  /* Free header memory */
  for (x=header.next; x!=&header; x=nx) {
    nx = x->next;
    free(x->text);
    free(x);
  }

  if (ct_nvp) free_nvp(ct_nvp);
  if (cte_nvp) free_nvp(cte_nvp);
  if (cd_nvp) free_nvp(cd_nvp);

  return result;
}
/*}}}*/

#define ALLOC_NONE 1
#define ALLOC_MMAP 2
#define ALLOC_MALLOC 3

int data_alloc_type;

#if USE_GZIP_MBOX || USE_BZIP_MBOX

#define SIZE_STEP (8 * 1024 * 1024)

#define COMPRESSION_NONE 0
#define COMPRESSION_GZIP 1
#define COMPRESSION_BZIP 2

static int get_compression_type(const char *filename) {/*{{{*/
  size_t len = strlen(filename);
  int ptr;

#ifdef USE_GZIP_MBOX
  ptr = len - 3;
  if (len > 3 && strncasecmp(filename + ptr, ".gz", 3) == 0) {
    return COMPRESSION_GZIP;
  }
#endif

#ifdef USE_BZIP_MBOX
  ptr = len - 4;
  if (len > 3 && strncasecmp(filename + ptr, ".bz2", 4) == 0) {
    return COMPRESSION_BZIP;
  }
#endif

  return COMPRESSION_NONE;
}
/*}}}*/

static int is_compressed(const char *filename) {/*{{{*/
  return (get_compression_type(filename) != COMPRESSION_NONE);
}
/*}}}*/

struct zFile {/*{{{*/
  union {
    /* Both gzFile and BZFILE* are defined as void pointers
     * in their respective header files.
*/ #ifdef USE_GZIP_MBOX gzFile gzf; #endif #ifdef USE_BZIP_MBOX BZFILE *bzf; #endif void *zptr; } foo; int type; }; /*}}}*/ static struct zFile * xx_zopen(const char *filename, const char *mode) {/*{{{*/ struct zFile *zf = new(struct zFile); zf->type = get_compression_type(filename); switch (zf->type) { #ifdef USE_GZIP_MBOX case COMPRESSION_GZIP: zf->foo.gzf = gzopen(filename, "rb"); break; #endif #ifdef USE_BZIP_MBOX case COMPRESSION_BZIP: zf->foo.bzf = BZ2_bzopen(filename, "rb"); break; #endif default: zf->foo.zptr = NULL; break; } if (!zf->foo.zptr) { free(zf); return 0; } return zf; } /*}}}*/ static void xx_zclose(struct zFile *zf) {/*{{{*/ switch (zf->type) { #ifdef USE_GZIP_MBOX case COMPRESSION_GZIP: gzclose(zf->foo.gzf); break; #endif #ifdef USE_BZIP_MBOX case COMPRESSION_BZIP: BZ2_bzclose(zf->foo.bzf); break; #endif default: zf->foo.zptr = NULL; break; } free(zf); } /*}}}*/ static int xx_zread(struct zFile *zf, void *buf, int len) {/*{{{*/ switch (zf->type) { #ifdef USE_GZIP_MBOX case COMPRESSION_GZIP: return gzread(zf->foo.gzf, buf, len); break; #endif #ifdef USE_BZIP_MBOX case COMPRESSION_BZIP: return BZ2_bzread(zf->foo.bzf, buf, len); break; #endif default: return 0; break; } } /*}}}*/ #endif #if USE_GZIP_MBOX || USE_BZIP_MBOX /* do we need ROCACHE_SIZE > 1? the code supports any number here */ #define ROCACHE_SIZE 1 struct ro_mapping { char *filename; unsigned char *map; size_t len; }; static int ro_cache_init = 0; static struct ro_mapping ro_mapping_cache[ROCACHE_SIZE]; /* find a temp file in the mapping cache. If nothing is found lasti is * set to the next slot to use for insertion. 
You have to check that slot * to see if it is currently in use */ static struct ro_mapping *find_ro_cache(const char *filename, int *lasti) { int i = 0; struct ro_mapping *ro = NULL; if (lasti) *lasti = 0; if (!ro_cache_init) return NULL; for (i = 0 ; i < ROCACHE_SIZE ; i++) { ro = ro_mapping_cache + i; if (!ro->map) { if (lasti) *lasti = i; return NULL; } if (strcmp(filename, ro->filename) == 0) return ro; } /* if we're here, the map is full. They will reuse slot 0 */ return NULL; } /* * put a new tempfile into the cache. It is mmaped as part of this function * so you can safely close the file handle after calling this. */ static struct ro_mapping *add_ro_cache(const char *filename, int fd, size_t len) { int i = 0; struct ro_mapping *ro = NULL; if (!ro_cache_init) { memset(&ro_mapping_cache, 0, sizeof(ro_mapping_cache)); ro_cache_init = 1; } ro = find_ro_cache(filename, &i); if (ro) { fprintf(stderr, "%s already in ro cache\n", filename); return NULL; } ro = ro_mapping_cache + i; if (ro->map) { munmap(ro->map, ro->len); ro->map = NULL; free(ro->filename); } ro->map = (unsigned char *)mmap(0, len, PROT_READ, MAP_SHARED, fd, 0); if (ro->map == MAP_FAILED) { ro->map = NULL; perror("rfc822:mmap"); return NULL; } ro->len = len; ro->filename = new_string(filename); return ro; } #endif /* USE_GZIP_MBOX || USE_BZIP_MBOX */ void create_ro_mapping(const char *filename, unsigned char **data, int *len)/*{{{*/ { struct stat sb; int fd; #if USE_GZIP_MBOX || USE_BZIP_MBOX struct zFile *zf; #endif if (stat(filename, &sb) < 0) { report_error("stat", filename); *data = NULL; return; } #if USE_GZIP_MBOX || USE_BZIP_MBOX if(is_compressed(filename)) { unsigned char *p; size_t cur_read; struct ro_mapping *ro; FILE *tmpf; /* this branch never returns things that are freeable */ data_alloc_type = ALLOC_NONE; ro = find_ro_cache(filename, NULL); if (ro) { *data = ro->map; *len = ro->len; return; } if(verbose) { fprintf(stderr, "Decompressing %s...\n", filename); } tmpf = tmpfile(); if 
(!tmpf) {
      perror("tmpfile");
      goto comp_error;
    }
    zf = xx_zopen(filename, "rb");
    if (!zf) {
      fprintf(stderr, "Could not open %s\n", filename);
      goto comp_error;
    }
    p = new_array(unsigned char, SIZE_STEP);
    cur_read = xx_zread(zf, p, SIZE_STEP);
    if (fwrite(p, cur_read, 1, tmpf) != 1) {
      fprintf(stderr, "failed writing to temp file for %s\n", filename);
      goto comp_error;
    }
    *len = cur_read;
    if (cur_read >= SIZE_STEP) {
      while(1) {
        int ret;
        cur_read = xx_zread(zf, p, SIZE_STEP);
        if (cur_read <= 0)
          break;
        *len += cur_read;
        ret = fwrite(p, cur_read, 1, tmpf);
        if (ret != 1) {
          fprintf(stderr, "failed writing to temp file for %s\n", filename);
          goto comp_error;
        }
      }
    }
    free(p);
    xx_zclose(zf);

    if(*len > 0) {
      ro = add_ro_cache(filename, fileno(tmpf), *len);
      if (!ro)
        goto comp_error;
      *data = ro->map;
      *len = ro->len;
    } else {
      *data = NULL;
    }
    fclose(tmpf);
    return;

comp_error:
    *data = NULL;
    *len = 0;
    if (tmpf)
      fclose(tmpf);
    return;
  }
#endif /* USE_GZIP_MBOX || USE_BZIP_MBOX */

  *len = sb.st_size;
  if (*len == 0) {
    *data = NULL;
    return;
  }
  if (!S_ISREG(sb.st_mode)) {
    *data = NULL;
    return;
  }

  fd = open(filename, O_RDONLY);
  if (fd < 0) {
    report_error("open", filename);
    *data = NULL;
    return;
  }

  *data = (unsigned char *) mmap(0, *len, PROT_READ, MAP_SHARED, fd, 0);
  if (close(fd) < 0)
    report_error("close", filename);
  if (*data == MAP_FAILED) {
    report_error("rfc822:mmap", filename);
    *data = NULL;
    return;
  }
  data_alloc_type = ALLOC_MMAP;
}
/*}}}*/

void free_ro_mapping(unsigned char *data, int len)/*{{{*/
{
  int r;

  if(data_alloc_type == ALLOC_MALLOC) {
    free(data);
  }

  if(data_alloc_type == ALLOC_MMAP) {
    r = munmap(data, len);
    if(r < 0) {
      fprintf(stderr, "munmap() failed\n");
      exit(1);
    }
  }
}
/*}}}*/

static struct msg_src *setup_msg_src(char *filename)/*{{{*/
{
  static struct msg_src result;
  result.type = MS_FILE;
  result.filename = filename;
  return &result;
}
/*}}}*/

struct rfc822 *make_rfc822(char *filename)/*{{{*/
{
  int len;
  unsigned char *data;
  struct rfc822 *result;

  create_ro_mapping(filename, &data, &len);
/* Don't process empty files */ result = NULL; if (data) { struct msg_src *src; /* Now process the data */ src = setup_msg_src(filename); /* For one message per file, ignore missing end boundary condition. */ result = data_to_rfc822(src, (char *) data, len, NULL); free_ro_mapping(data, len); } return result; } /*}}}*/ void free_rfc822(struct rfc822 *msg)/*{{{*/ { struct attachment *a, *na; if (!msg) return; if (msg->hdrs.to) free(msg->hdrs.to); if (msg->hdrs.cc) free(msg->hdrs.cc); if (msg->hdrs.from) free(msg->hdrs.from); if (msg->hdrs.subject) free(msg->hdrs.subject); if (msg->hdrs.message_id) free(msg->hdrs.message_id); if (msg->hdrs.in_reply_to) free(msg->hdrs.in_reply_to); if (msg->hdrs.references) free(msg->hdrs.references); for (a = msg->atts.next; a != &msg->atts; a = na) { na = a->next; if (a->filename) free(a->filename); if (a->ct == CT_MESSAGE_RFC822) { free_rfc822(a->data.rfc822); } else { free(a->data.normal.bytes); } free(a); } free(msg); } /*}}}*/ #ifdef TEST static void do_indent(int indent)/*{{{*/ { int i; for (i=indent; i>0; i--) { putchar(' '); } } /*}}}*/ static void show_header(char *tag, char *x, int indent)/*{{{*/ { if (x) { do_indent(indent); printf("%s: %s\n", tag, x); } } /*}}}*/ static void show_rfc822(struct rfc822 *msg, int indent)/*{{{*/ { struct attachment *a; show_header("From", msg->hdrs.from, indent); show_header("To", msg->hdrs.to, indent); show_header("Cc", msg->hdrs.cc, indent); show_header("Date", msg->hdrs.date, indent); show_header("Subject", msg->hdrs.subject, indent); for (a = msg->atts.next; a != &msg->atts; a=a->next) { printf("========================\n"); switch (a->ct) { case CT_TEXT_PLAIN: printf("Attachment type text/plain\n"); break; case CT_TEXT_HTML: printf("Attachment type text/html\n"); break; case CT_TEXT_OTHER: printf("Attachment type text/non-plain\n"); break; case CT_MESSAGE_RFC822: printf("Attachment type message/rfc822\n"); break; case CT_OTHER: printf("Attachment type other\n"); break; } if (a->ct != 
CT_MESSAGE_RFC822) { printf("%d bytes\n", a->data.normal.len); } if ((a->ct == CT_TEXT_PLAIN) || (a->ct == CT_TEXT_HTML) || (a->ct == CT_TEXT_OTHER)) { printf("----------\n"); printf("%s\n", a->data.normal.bytes); } if (a->ct == CT_MESSAGE_RFC822) { show_rfc822(a->data.rfc822, indent + 4); } } } /*}}}*/ int main (int argc, char **argv)/*{{{*/ { struct rfc822 *msg; if (argc < 2) { fprintf(stderr, "Need a path\n"); unlock_and_exit(2); } msg = make_rfc822(argv[1]); show_rfc822(msg, 0); free_rfc822(msg); /* Print out some stuff */ return 0; } /*}}}*/ #endif /* TEST */ mairix-master/search.c000066400000000000000000001166221224450623700152660ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. Curnow 2002,2003,2004,2005,2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * ********************************************************************** */ #include #include #include #include #include #include #include #include #include #include #include #include /* Lame fix for systems where NAME_MAX isn't defined after including the above * set of .h files (Solaris, FreeBSD so far). Probably grossly oversized but * it'll do. 
*/ #if !defined(NAME_MAX) #define NAME_MAX 4096 #endif #include "mairix.h" #include "reader.h" #include "memmac.h" static void mark_hits_in_table(struct read_db *db, struct toktable_db *tt, int hit_tok, char *hits)/*{{{*/ { /* mark files containing matched token */ int idx; unsigned char *j, *first_char; idx = 0; first_char = (unsigned char *) db->data + tt->enc_offsets[hit_tok]; for (j = first_char; *j != 0xff; ) { idx += read_increment(&j); assert(idx < db->n_msgs); hits[idx] = 1; } } /*}}}*/ static void mark_hits_in_table2(struct read_db *db, struct toktable2_db *tt, int hit_tok, char *hits)/*{{{*/ { /* mark files containing matched token */ int idx; unsigned char *j, *first_char; idx = 0; first_char = (unsigned char *) db->data + tt->enc1_offsets[hit_tok]; for (j = first_char; *j != 0xff; ) { idx += read_increment(&j); assert(idx < db->n_msgs); hits[idx] = 1; } } /*}}}*/ /* See "Fast text searching with errors, Sun Wu and Udi Manber, TR 91-11, University of Arizona. I have been informed that this algorithm is NOT patented. This implementation of it is entirely the work of Richard P. Curnow - I haven't looked at any related source (webglimpse, agrep etc) in writing this. */ static void build_match_vector(char *substring, unsigned long *a, unsigned long *hit)/*{{{*/ { int len; char *p; int i; len = strlen(substring); if (len > 31 || len == 0) { fprintf(stderr, "Can't match patterns longer than 31 characters or empty\n"); unlock_and_exit(2); } memset(a, 0xff, 256 * sizeof(unsigned long)); for (p=substring, i=0; *p; p++, i++) { a[(unsigned int) *(unsigned char *)p] &= ~(1UL << i); } *hit = ~(1UL << (len-1)); return; } /*}}}*/ static int substring_match_0(unsigned long *a, unsigned long hit, int left_anchor, char *token)/*{{{*/ { int got_hit=0; char *p; unsigned long r0; unsigned long anchor, anchor1; r0 = ~0; got_hit = 0; anchor = 0; anchor1 = left_anchor ? 
0x1 : 0x0; for(p=token; *p; p++) { int idx = (unsigned int) *(unsigned char *)p; r0 = (r0<<1) | anchor | a[idx]; if (~(r0 | hit)) { got_hit = 1; break; } anchor = anchor1; } return got_hit; } /*}}}*/ static int substring_match_1(unsigned long *a, unsigned long hit, int left_anchor, char *token)/*{{{*/ { int got_hit=0; char *p; unsigned long r0, r1, nr0; unsigned long anchor, anchor1; r0 = ~0; r1 = r0<<1; got_hit = 0; anchor = 0; anchor1 = left_anchor ? 0x1 : 0x0; for(p=token; *p; p++) { int idx = (unsigned int) *(unsigned char *)p; nr0 = (r0<<1) | anchor | a[idx]; r1 = ((r1<<1) | anchor | a[idx]) & ((r0 & nr0) << 1) & r0; r0 = nr0; if (~((r0 & r1) | hit)) { got_hit = 1; break; } anchor = anchor1; } return got_hit; } /*}}}*/ static int substring_match_2(unsigned long *a, unsigned long hit, int left_anchor, char *token)/*{{{*/ { int got_hit=0; char *p; unsigned long r0, r1, r2, nr0, nr1; unsigned long anchor, anchor1; r0 = ~0; r1 = r0<<1; r2 = r1<<1; got_hit = 0; anchor = 0; anchor1 = left_anchor ? 0x1 : 0x0; for(p=token; *p; p++) { int idx = (unsigned int) *(unsigned char *)p; nr0 = (r0<<1) | anchor | a[idx]; nr1 = ((r1<<1) | anchor | a[idx]) & ((r0 & nr0) << 1) & r0; r2 = ((r2<<1) | anchor | a[idx]) & ((r1 & nr1) << 1) & r1; r0 = nr0; r1 = nr1; if (~((r0 & r1 & r2) | hit)) { got_hit = 1; break; } anchor = anchor1; } return got_hit; } /*}}}*/ static int substring_match_3(unsigned long *a, unsigned long hit, int left_anchor, char *token)/*{{{*/ { int got_hit=0; char *p; unsigned long r0, r1, r2, r3, nr0, nr1, nr2; unsigned long anchor, anchor1; r0 = ~0; r1 = r0<<1; r2 = r1<<1; r3 = r2<<1; got_hit = 0; anchor = 0; anchor1 = left_anchor ? 
0x1 : 0x0; for(p=token; *p; p++) { int idx = (unsigned int) *(unsigned char *)p; nr0 = (r0<<1) | anchor | a[idx]; nr1 = ((r1<<1) | anchor | a[idx]) & ((r0 & nr0) << 1) & r0; nr2 = ((r2<<1) | anchor | a[idx]) & ((r1 & nr1) << 1) & r1; r3 = ((r3<<1) | anchor | a[idx]) & ((r2 & nr2) << 1) & r2; r0 = nr0; r1 = nr1; r2 = nr2; if (~((r0 & r1 & r2 & r3) | hit)) { got_hit = 1; break; } anchor = anchor1; } return got_hit; } /*}}}*/ static int substring_match_general(unsigned long *a, unsigned long hit, int left_anchor, char *token, int max_errors, unsigned long *r, unsigned long *nr)/*{{{*/ { int got_hit=0; char *p; int j; unsigned long anchor, anchor1; r[0] = ~0; anchor = 0; anchor1 = left_anchor ? 0x1 : 0x0; for (j=1; j<=max_errors; j++) { r[j] = r[j-1] << 1; } got_hit = 0; for(p=token; *p; p++) { int idx = (unsigned int) *(unsigned char *)p; int d; unsigned int compo; compo = nr[0] = ((r[0]<<1) | anchor | a[idx]); for (d=1; d<=max_errors; d++) { nr[d] = ((r[d]<<1) | anchor | a[idx]) & ((r[d-1] & nr[d-1])<<1) & r[d-1]; compo &= nr[d]; } memcpy(r, nr, (1 + max_errors) * sizeof(unsigned long)); if (~(compo | hit)) { got_hit = 1; break; } anchor = anchor1; } return got_hit; } /*}}}*/ static void match_substring_in_table(struct read_db *db, struct toktable_db *tt, char *substring, int max_errors, int left_anchor, char *hits)/*{{{*/ { int i, got_hit; unsigned long a[256]; unsigned long *r=NULL, *nr=NULL; unsigned long hit; char *token; build_match_vector(substring, a, &hit); got_hit = 0; if (max_errors > 3) { r = new_array(unsigned long, 1 + max_errors); nr = new_array(unsigned long, 1 + max_errors); } for (i=0; i < tt->n; i++) { token = db->data + tt->tok_offsets[i]; switch (max_errors) { /* Optimise common cases for few errors to allow optimizer to keep bitmaps * in registers */ case 0: got_hit = substring_match_0(a, hit, left_anchor, token); break; case 1: got_hit = substring_match_1(a, hit, left_anchor, token); break; case 2: got_hit = substring_match_2(a, hit, left_anchor,
token); break; case 3: got_hit = substring_match_3(a, hit, left_anchor, token); break; default: got_hit = substring_match_general(a, hit, left_anchor, token, max_errors, r, nr); break; } if (got_hit) { mark_hits_in_table(db, tt, i, hits); } } if (r) free(r); if (nr) free(nr); } /*}}}*/ static void match_substring_in_table2(struct read_db *db, struct toktable2_db *tt, char *substring, int max_errors, int left_anchor, char *hits)/*{{{*/ { int i, got_hit; unsigned long a[256]; unsigned long *r=NULL, *nr=NULL; unsigned long hit; char *token; build_match_vector(substring, a, &hit); got_hit = 0; if (max_errors > 3) { r = new_array(unsigned long, 1 + max_errors); nr = new_array(unsigned long, 1 + max_errors); } for (i=0; i < tt->n; i++) { token = db->data + tt->tok_offsets[i]; switch (max_errors) { /* Optimise common cases for few errors to allow optimizer to keep bitmaps * in registers */ case 0: got_hit = substring_match_0(a, hit, left_anchor, token); break; case 1: got_hit = substring_match_1(a, hit, left_anchor, token); break; case 2: got_hit = substring_match_2(a, hit, left_anchor, token); break; case 3: got_hit = substring_match_3(a, hit, left_anchor, token); break; default: got_hit = substring_match_general(a, hit, left_anchor, token, max_errors, r, nr); break; } if (got_hit) { mark_hits_in_table2(db, tt, i, hits); } } if (r) free(r); if (nr) free(nr); } /*}}}*/ static void match_substring_in_paths(struct read_db *db, char *substring, int max_errors, int left_anchor, char *hits)/*{{{*/ { int i; unsigned long a[256]; unsigned long *r=NULL, *nr=NULL; unsigned long hit; build_match_vector(substring, a, &hit); if (max_errors > 3) { r = new_array(unsigned long, 1 + max_errors); nr = new_array(unsigned long, 1 + max_errors); } for (i=0; i < db->n_msgs; i++) { char *token = NULL; unsigned int mbix, msgix; switch (rd_msg_type(db, i)) { case DB_MSG_FILE: token = db->data + db->path_offsets[i]; break; case DB_MSG_MBOX: decode_mbox_indices(db->path_offsets[i], &mbix, &msgix); token =
db->data + db->mbox_paths_table[mbix]; break; case DB_MSG_DEAD: hits[i] = 0; /* never match on dead paths */ goto next_message; } assert(token); switch (max_errors) { /* Optimise common cases for few errors to allow optimizer to keep bitmaps * in registers */ case 0: hits[i] = substring_match_0(a, hit, left_anchor, token); break; case 1: hits[i] = substring_match_1(a, hit, left_anchor, token); break; case 2: hits[i] = substring_match_2(a, hit, left_anchor, token); break; case 3: hits[i] = substring_match_3(a, hit, left_anchor, token); break; default: hits[i] = substring_match_general(a, hit, left_anchor, token, max_errors, r, nr); break; } next_message: (void) 0; } if (r) free(r); if (nr) free(nr); } /*}}}*/ static void match_string_in_table(struct read_db *db, struct toktable_db *tt, char *key, char *hits)/*{{{*/ { /* TODO : replace with binary search? */ int i; for (i=0; i < tt->n; i++) { if (!strcmp(key, db->data + tt->tok_offsets[i])) { /* get all matching files */ mark_hits_in_table(db, tt, i, hits); } } } /*}}}*/ static void match_string_in_table2(struct read_db *db, struct toktable2_db *tt, char *key, char *hits)/*{{{*/ { /* TODO : replace with binary search?
*/ int i; for (i=0; in; i++) { if (!strcmp(key, db->data + tt->tok_offsets[i])) { /* get all matching files */ mark_hits_in_table2(db, tt, i, hits); } } } /*}}}*/ static int parse_size_expr(char *x)/*{{{*/ { int result; int n; if (1 == sscanf(x, "%d%n", &result, &n)) { x += n; switch (*x) { case 'k': case 'K': result <<= 10; break; case 'm': case 'M': result <<= 20; break; default: break; } return result; } else { fprintf(stderr, "Could not parse message size expression <%s>\n", x); return -1; } } /*}}}*/ static void parse_size_range(char *size_expr, int *has_start, int *start, int *has_end, int *end)/*{{{*/ { char *x = size_expr; char *dash; int len; if (*x == ':') x++; len = strlen(x); dash = strchr(x, '-'); *has_start = *has_end = 0; if (dash) { char *p, *q; if (dash > x) { char *s; s = new_array(char, dash - x + 1); for (p=s, q=x; q end) { int temp = start; start = end; end = temp; } } for (i=0; in_msgs; i++) { start_cond = has_start ? (db->size_table[i] > start) : 1; end_cond = has_end ? (db->size_table[i] < end ) : 1; if (start_cond && end_cond) { hits[i] = 1; } } } /*}}}*/ static void find_date_matches_in_table(struct read_db *db, char *date_expr, char *hits)/*{{{*/ { time_t start, end; int has_start, has_end, start_cond, end_cond; int i; int status; status = scan_date_string(date_expr, &start, &has_start, &end, &has_end); if (status) { unlock_and_exit (2); } if (has_start && has_end) { /* Allow user to put the endpoints in backwards */ if (start > end) { time_t temp = start; start = end; end = temp; } } for (i=0; in_msgs; i++) { start_cond = has_start ? (db->date_table[i] > start) : 1; end_cond = has_end ? 
(db->date_table[i] < end ) : 1; if (start_cond && end_cond) { hits[i] = 1; } } } /*}}}*/ static void find_flag_matches_in_table(struct read_db *db, char *flag_expr, char *hits)/*{{{*/ { int pos_seen, neg_seen; int pos_replied, neg_replied; int pos_flagged, neg_flagged; int negate; char *p; int i; negate = 0; pos_seen = neg_seen = 0; pos_replied = neg_replied = 0; pos_flagged = neg_flagged = 0; for (p=flag_expr; *p; p++) { switch (*p) { case '-': negate = 1; break; case 's': case 'S': if (negate) neg_seen = 1; else pos_seen = 1; negate = 0; break; case 'r': case 'R': if (negate) neg_replied = 1; else pos_replied = 1; negate = 0; break; case 'f': case 'F': if (negate) neg_flagged = 1; else pos_flagged = 1; negate = 0; break; default: fprintf(stderr, "Did not understand the character '%c' (0x%02x) in the flags argument F:%s\n", isprint(*p) ? *p : '.', (int) *(unsigned char *) p, flag_expr); break; } } for (i=0; i < db->n_msgs; i++) { if ((!pos_seen || (db->msg_type_and_flags[i] & FLAG_SEEN)) && (!neg_seen || !(db->msg_type_and_flags[i] & FLAG_SEEN)) && (!pos_replied || (db->msg_type_and_flags[i] & FLAG_REPLIED)) && (!neg_replied || !(db->msg_type_and_flags[i] & FLAG_REPLIED)) && (!pos_flagged || (db->msg_type_and_flags[i] & FLAG_FLAGGED)) && (!neg_flagged || !(db->msg_type_and_flags[i] & FLAG_FLAGGED))) { hits[i] = 1; } } } /*}}}*/ static char *mk_maildir_path(int token, char *output_dir, int is_in_new, int is_seen, int is_replied, int is_flagged)/*{{{*/ { char *result; char uniq_buf[48]; int len; len = strlen(output_dir) + 64; /* oversize */ result = new_array(char, len + 1 + sizeof(":2,FRS")); strcpy(result, output_dir); strcat(result, is_in_new ?
"/new/" : "/cur/"); sprintf(uniq_buf, "123456789.%d.mairix", token); strcat(result, uniq_buf); if (is_seen || is_replied || is_flagged) { strcat(result, ":2,"); } if (is_flagged) strcat(result, "F"); if (is_replied) strcat(result, "R"); if (is_seen) strcat(result, "S"); return result; } /*}}}*/ static char *mk_mh_path(int token, char *output_dir)/*{{{*/ { char *result; char uniq_buf[8]; int len; len = strlen(output_dir) + 10; /* oversize */ result = new_array(char, len); strcpy(result, output_dir); strcat(result, "/"); sprintf(uniq_buf, "%d", token+1); strcat(result, uniq_buf); return result; } /*}}}*/ static int looks_like_maildir_new_p(const char *p)/*{{{*/ { const char *s1, *s2; s2 = p; while (*s2) s2++; while ((s2 > p) && (*s2 != '/')) s2--; if (s2 <= p) return 0; s1 = s2 - 1; while ((s1 > p) && (*s1 != '/')) s1--; if (s1 <= p) return 0; if (!strncmp(s1, "/new/", 5)) { return 1; } else { return 0; } } /*}}}*/ static void create_symlink(char *link_target, char *new_link)/*{{{*/ { if ((!do_hardlinks && symlink(link_target, new_link) < 0) || link(link_target, new_link)) { if (verbose) { perror("symlink"); fprintf(stderr, "Failed path <%s> -> <%s>\n", link_target, new_link); } } } /*}}}*/ static void mbox_terminate(const unsigned char *data, int len, FILE *out)/*{{{*/ { if (len == 0) fputs("\n", out); else if (len == 1) { if (data[0] != '\n') fputs("\n", out); } else if (data[len-1] != '\n') fputs("\n\n", out); else if (data[len-2] != '\n') fputs("\n", out); } /*}}}*/ static void append_file_to_mbox(const char *path, FILE *out)/*{{{*/ { unsigned char *data; int len; create_ro_mapping(path, &data, &len); if (data) { fprintf(out, "From mairix@mairix Mon Jan 1 12:34:56 1970\n"); fprintf(out, "X-source-folder: %s\n", path); fwrite (data, sizeof(unsigned char), len, out); mbox_terminate(data, len, out); free_ro_mapping(data, len); } return; } /*}}}*/ static int had_failed_checksum; static void get_validated_mbox_msg(struct read_db *db, int msg_index,/*{{{*/ int 
*mbox_index, unsigned char **mbox_data, int *mbox_len, unsigned char **msg_data, int *msg_len) { /* msg_data==NULL if checksum mismatches */ unsigned char *start; checksum_t csum; unsigned int mbi, msgi; *msg_data = NULL; *msg_len = 0; decode_mbox_indices(db->path_offsets[msg_index], &mbi, &msgi); *mbox_index = mbi; create_ro_mapping(db->data + db->mbox_paths_table[mbi], mbox_data, mbox_len); if (!*mbox_data) return; start = *mbox_data + db->mtime_table[msg_index]; /* Ensure that we don't run off the end of the mmap'd file */ if (db->mtime_table[msg_index] >= *mbox_len) *msg_len = 0; else if (db->mtime_table[msg_index] + db->size_table[msg_index] >= *mbox_len) *msg_len = *mbox_len - db->mtime_table[msg_index]; else *msg_len = db->size_table[msg_index]; compute_checksum((char *)start, *msg_len, &csum); if (!memcmp((db->data + db->mbox_checksum_table[mbi] + (msgi * sizeof(checksum_t))), &csum, sizeof(checksum_t))) { *msg_data = start; } else { had_failed_checksum = 1; } return; } /*}}}*/ static void append_mboxmsg_to_mbox(struct read_db *db, int msg_index, FILE *out)/*{{{*/ { /* Need to common up code with try_copy_to_path */ unsigned char *mbox_start, *msg_start; int mbox_len, msg_len; int mbox_index; get_validated_mbox_msg(db, msg_index, &mbox_index, &mbox_start, &mbox_len, &msg_start, &msg_len); if (msg_start) { /* Artificial from line, we don't have the envelope sender so this is going to be artificial anyway. 
*/ fprintf(out, "From mairix@mairix Mon Jan 1 12:34:56 1970\n"); fprintf(out, "X-source-folder: %s\n", db->data + db->mbox_paths_table[mbox_index]); fwrite(msg_start, sizeof(unsigned char), msg_len, out); mbox_terminate(msg_start, msg_len, out); } if (mbox_start) { free_ro_mapping(mbox_start, mbox_len); } } /*}}}*/ static void try_copy_to_path(struct read_db *db, int msg_index, char *target_path)/*{{{*/ { unsigned char *data; int mbox_len, msg_len; int mbi; FILE *out; unsigned char *start; get_validated_mbox_msg(db, msg_index, &mbi, &data, &mbox_len, &start, &msg_len); if (start) { out = fopen(target_path, "wb"); if (out) { fprintf(out, "X-source-folder: %s\n", db->data + db->mbox_paths_table[mbi]); fwrite(start, sizeof(char), msg_len?msg_len-1:0, out); fclose(out); } } if (data) { free_ro_mapping(data, mbox_len); } return; } /*}}}*/ static struct msg_src *setup_mbox_msg_src(char *filename, off_t start, size_t len)/*{{{*/ { static struct msg_src result; result.type = MS_MBOX; result.filename = filename; result.start = start; result.len = len; return &result; } /*}}}*/ static void get_flags_from_file(struct read_db *db, int idx, int *is_seen, int *is_replied, int *is_flagged) { *is_seen = (db->msg_type_and_flags[idx] & FLAG_SEEN) ? 1 : 0; *is_replied = (db->msg_type_and_flags[idx] & FLAG_REPLIED) ? 1 : 0; *is_flagged = (db->msg_type_and_flags[idx] & FLAG_FLAGGED) ? 
1 : 0; } static void string_tolower(char *str) { char *p; for (p=str; *p; p++) { *p = tolower(*(unsigned char *)p); } } static int do_search(struct read_db *db, char **args, char *output_path, int show_threads, enum folder_type ft, int verbose)/*{{{*/ { char *colon, *start_words; int do_body, do_subject, do_from, do_to, do_cc, do_date, do_size; int do_att_name; int do_flags; int do_path, do_msgid; char *key; char *hit0, *hit1, *hit2, *hit3; int i; int n_hits; int left_anchor; had_failed_checksum = 0; hit0 = new_array(char, db->n_msgs); hit1 = new_array(char, db->n_msgs); hit2 = new_array(char, db->n_msgs); hit3 = new_array(char, db->n_msgs); /* Argument structure is * x:tokena+tokenb,~tokenc,tokend+tokene * * + (and) binds more tightly than , * , (or) binds more tightly than separate args * * * hit1 gathers the tokens and'ed with + * hit2 gathers the tokens or'ed with , * hit3 gathers the separate args and'ed with * */ /* Everything matches until proven otherwise */ memset(hit3, 1, db->n_msgs); while (*args) { /* key is a single argument, separate args are and-ed together */ key = *args++; memset(hit2, 0, db->n_msgs); memset(hit1, 1, db->n_msgs); do_to = 0; do_cc = 0; do_from = 0; do_subject = 0; do_body = 0; do_date = 0; do_size = 0; do_path = 0; do_msgid = 0; do_att_name = 0; do_flags = 0; colon = strchr(key, ':'); if (colon) { char *p; for (p=key; p\n", *p); break; } } if (do_msgid && (p-key) > 1) { fprintf(stderr, "Message-ID key can't be used with other keys\n"); unlock_and_exit(2); } start_words = 1 + colon; } else { do_body = do_subject = do_to = do_cc = do_from = 1; start_words = key; } if (do_date || do_size || do_flags) { memset(hit0, 0, db->n_msgs); if (do_date) { find_date_matches_in_table(db, start_words, hit0); } else if (do_size) { find_size_matches_in_table(db, start_words, hit0); } else if (do_flags) { find_flag_matches_in_table(db, start_words, hit0); } /* AND-combine match vectors */ for (i=0; in_msgs; i++) { hit1[i] &= hit0[i]; } } else if 
(do_msgid) { char *lower_word = new_string(start_words); string_tolower(lower_word); memset(hit0, 0, db->n_msgs); match_string_in_table2(db, &db->msg_ids, lower_word, hit0); free(lower_word); /* AND-combine match vectors */ for (i=0; i < db->n_msgs; i++) { hit1[i] &= hit0[i]; } } else { /*{{{ Scan over separate words within this argument */ do { /* / = 'or' separator * , = 'and' separator */ char *orsep; char *andsep; char *word, *orig_word, *lower_word; char *equal; int negate; int had_orsep; int max_errors; orsep = strchr(start_words, '/'); andsep = strchr(start_words, ','); had_orsep = 0; if (andsep && (!orsep || (andsep < orsep))) { char *p, *q; word = new_array(char, 1 + (andsep - start_words)); /* maybe oversize */ for (p=word, q=start_words; q < andsep; q++) { if (!isspace(*(unsigned char *)q)) { *p++ = *q; } } *p = 0; start_words = andsep + 1; } else if (orsep) { /* the '/' comes before any ',' */ char *p, *q; word = new_array(char, 1 + (orsep - start_words)); /* maybe oversize */ for (p=word, q=start_words; q < orsep; q++) { if (!isspace(*(unsigned char *)q)) { *p++ = *q; } } *p = 0; start_words = orsep + 1; had_orsep = 1; } else { word = new_string(start_words); while (*start_words) ++start_words; } orig_word = word; if (word[0] == '~') { negate = 1; word++; } else { negate = 0; } if (word[0] == '^') { left_anchor = 1; word++; } else { left_anchor = 0; } equal = strchr(word, '='); if (equal) { *equal = 0; max_errors = atoi(equal + 1); /* Extend this to do anchoring etc */ } else { max_errors = 0; /* keep GCC quiet */ } /* Canonicalise search string to lowercase, since the database has all * tokens handled that way. But not for path search!
*/ lower_word = new_string(word); string_tolower(lower_word); memset(hit0, 0, db->n_msgs); if (equal) { if (do_to) match_substring_in_table(db, &db->to, lower_word, max_errors, left_anchor, hit0); if (do_cc) match_substring_in_table(db, &db->cc, lower_word, max_errors, left_anchor, hit0); if (do_from) match_substring_in_table(db, &db->from, lower_word, max_errors, left_anchor, hit0); if (do_subject) match_substring_in_table(db, &db->subject, lower_word, max_errors, left_anchor, hit0); if (do_body) match_substring_in_table(db, &db->body, lower_word, max_errors, left_anchor, hit0); if (do_att_name) match_substring_in_table(db, &db->attachment_name, lower_word, max_errors, left_anchor, hit0); if (do_path) match_substring_in_paths(db, word, max_errors, left_anchor, hit0); } else { if (do_to) match_string_in_table(db, &db->to, lower_word, hit0); if (do_cc) match_string_in_table(db, &db->cc, lower_word, hit0); if (do_from) match_string_in_table(db, &db->from, lower_word, hit0); if (do_subject) match_string_in_table(db, &db->subject, lower_word, hit0); if (do_body) match_string_in_table(db, &db->body, lower_word, hit0); if (do_att_name) match_string_in_table(db, &db->attachment_name, lower_word, hit0); /* FIXME */ if (do_path) match_substring_in_paths(db, word, 0, left_anchor, hit0); } free(lower_word); /* AND-combine match vectors */ for (i=0; in_msgs; i++) { if (negate) { hit1[i] &= !hit0[i]; } else { hit1[i] &= hit0[i]; } } if (had_orsep) { /* OR-combine match vectors */ for (i=0; in_msgs; i++) { hit2[i] |= hit1[i]; } memset(hit1, 1, db->n_msgs); } free(orig_word); } while (*start_words); /*}}}*/ } /* OR-combine match vectors */ for (i=0; in_msgs; i++) { hit2[i] |= hit1[i]; } /* AND-combine match vectors */ for (i=0; in_msgs; i++) { hit3[i] &= hit2[i]; } } n_hits = 0; if (show_threads) {/*{{{*/ char *tids; tids = new_array(char, db->n_msgs); memset(tids, 0, db->n_msgs); for (i=0; in_msgs; i++) { if (hit3[i]) { tids[db->tid_table[i]] = 1; } } for (i=0; in_msgs; i++) { 
if (tids[db->tid_table[i]]) { hit3[i] = 1; } } free(tids); } /*}}}*/ switch (ft) { case FT_MAILDIR:/*{{{*/ for (i=0; in_msgs; i++) { if (hit3[i]) { int is_seen, is_replied, is_flagged; get_flags_from_file(db, i, &is_seen, &is_replied, &is_flagged); switch (rd_msg_type(db, i)) { case DB_MSG_FILE: { char *target_path; char *message_path; int is_in_new; message_path = db->data + db->path_offsets[i]; is_in_new = looks_like_maildir_new_p(message_path); target_path = mk_maildir_path(i, output_path, is_in_new, is_seen, is_replied, is_flagged); create_symlink(message_path, target_path); free(target_path); ++n_hits; } break; case DB_MSG_MBOX: { char *target_path = mk_maildir_path(i, output_path, !is_seen, is_seen, is_replied, is_flagged); try_copy_to_path(db, i, target_path); free(target_path); ++n_hits; } break; case DB_MSG_DEAD: break; } } } break; /*}}}*/ case FT_MH:/*{{{*/ for (i=0; in_msgs; i++) { if (hit3[i]) { switch (rd_msg_type(db, i)) { case DB_MSG_FILE: { char *target_path = mk_mh_path(i, output_path); create_symlink(db->data + db->path_offsets[i], target_path); free(target_path); ++n_hits; } break; case DB_MSG_MBOX: { char *target_path = mk_mh_path(i, output_path); try_copy_to_path(db, i, target_path); free(target_path); ++n_hits; } break; case DB_MSG_DEAD: break; } } } break; /*}}}*/ case FT_MBOX:/*{{{*/ { FILE *out; out = fopen(output_path, "ab"); if (!out) { fprintf(stderr, "Cannot open output folder %s\n", output_path); unlock_and_exit(1); } for (i=0; in_msgs; i++) { if (hit3[i]) { switch (rd_msg_type(db, i)) { case DB_MSG_FILE: { append_file_to_mbox(db->data + db->path_offsets[i], out); ++n_hits; } break; case DB_MSG_MBOX: { append_mboxmsg_to_mbox(db, i, out); ++n_hits; } break; case DB_MSG_DEAD: break; } } } fclose(out); } break; /*}}}*/ case FT_RAW:/*{{{*/ for (i=0; in_msgs; i++) { if (hit3[i]) { switch (rd_msg_type(db, i)) { case DB_MSG_FILE: { ++n_hits; printf("%s\n", db->data + db->path_offsets[i]); } break; case DB_MSG_MBOX: { unsigned int mbix, 
msgix; int start, len, after_end; start = db->mtime_table[i]; len = db->size_table[i]; after_end = start + len; ++n_hits; decode_mbox_indices(db->path_offsets[i], &mbix, &msgix); printf("mbox:%s [%d,%d)\n", db->data + db->mbox_paths_table[mbix], start, after_end); } break; case DB_MSG_DEAD: break; } } } break; /*}}}*/ case FT_EXCERPT:/*{{{*/ for (i=0; in_msgs; i++) { if (hit3[i]) { struct rfc822 *parsed = NULL; switch (rd_msg_type(db, i)) { case DB_MSG_FILE: { char *filename; ++n_hits; printf("---------------------------------\n"); filename = db->data + db->path_offsets[i]; printf("%s\n", filename); parsed = make_rfc822(filename); } break; case DB_MSG_MBOX: { unsigned int mbix, msgix; int start, len, after_end; unsigned char *mbox_start, *msg_start; int mbox_len, msg_len; int mbox_index; start = db->mtime_table[i]; len = db->size_table[i]; after_end = start + len; ++n_hits; printf("---------------------------------\n"); decode_mbox_indices(db->path_offsets[i], &mbix, &msgix); printf("mbox:%s [%d,%d)\n", db->data + db->mbox_paths_table[mbix], start, after_end); get_validated_mbox_msg(db, i, &mbox_index, &mbox_start, &mbox_len, &msg_start, &msg_len); if (msg_start) { enum data_to_rfc822_error error; struct msg_src *msg_src; msg_src = setup_mbox_msg_src(db->data + db->mbox_paths_table[mbix], start, msg_len); parsed = data_to_rfc822(msg_src, (char *) msg_start, msg_len, &error); } if (mbox_start) { free_ro_mapping(mbox_start, mbox_len); } } break; case DB_MSG_DEAD: break; } if (parsed) { char datebuf[64]; struct tm *thetm; if (parsed->hdrs.to) printf(" To: %s\n", parsed->hdrs.to); if (parsed->hdrs.cc) printf(" Cc: %s\n", parsed->hdrs.cc); if (parsed->hdrs.from) printf(" From: %s\n", parsed->hdrs.from); if (parsed->hdrs.subject) printf(" Subject: %s\n", parsed->hdrs.subject); if (parsed->hdrs.message_id) printf(" Message-ID: %s\n", parsed->hdrs.message_id); thetm = gmtime(&parsed->hdrs.date); strftime(datebuf, sizeof(datebuf), "%a, %d %b %Y", thetm); printf(" Date: 
%s\n", datebuf); free_rfc822(parsed); } } } break; /*}}}*/ default: assert(0); break; } free(hit0); free(hit1); free(hit2); free(hit3); if ((ft != FT_RAW) && (ft != FT_EXCERPT)) { printf("Matched %d messages\n", n_hits); } fflush(stdout); if (had_failed_checksum) { fprintf(stderr, "WARNING : \n" "Matches were found in mbox folders but the message checksums failed.\n" "You may need to run mairix in indexing mode then repeat your search.\n"); } /* Return error code 1 to the shell if no messages were matched. */ return (n_hits == 0) ? 1 : 0; } /*}}}*/ static int directory_exists_remove_other(char *name)/*{{{*/ { struct stat sb; if (stat(name, &sb) < 0) { return 0; } if (S_ISDIR(sb.st_mode)) { return 1; } else { /* Try to remove. */ unlink(name); return 0; } } /*}}}*/ static void create_dir(char *path)/*{{{*/ { if (mkdir(path, 0700) < 0) { fprintf(stderr, "Could not create directory %s\n", path); unlock_and_exit(2); } fprintf(stderr, "Created directory %s\n", path); return; } /*}}}*/ static void maybe_create_maildir(char *path)/*{{{*/ { char *subdir, *tailpos; int len; if (!directory_exists_remove_other(path)) { create_dir(path); } len = strlen(path); subdir = new_array(char, len + 5); strcpy(subdir, path); strcpy(subdir+len, "/"); tailpos = subdir + len + 1; strcpy(tailpos,"cur"); if (!directory_exists_remove_other(subdir)) { create_dir(subdir); } strcpy(tailpos,"new"); if (!directory_exists_remove_other(subdir)) { create_dir(subdir); } strcpy(tailpos,"tmp"); if (!directory_exists_remove_other(subdir)) { create_dir(subdir); } free(subdir); return; } /*}}}*/ static void clear_maildir_subfolder(char *path, char *subdir)/*{{{*/ { char *sdir; char *fpath; int len; DIR *d; struct dirent *de; struct stat sb; len = strlen(path) + strlen(subdir); sdir = new_array(char, len + 2); fpath = new_array(char, len + 3 + NAME_MAX); strcpy(sdir, path); strcat(sdir, "/"); strcat(sdir, subdir); d = opendir(sdir); if (d) { while ((de = readdir(d))) { strcpy(fpath, sdir); strcat(fpath, 
"/"); strcat(fpath, de->d_name); if (lstat(fpath, &sb) >= 0) { /* Deal with both symlinks to maildir/MH messages as well as real files * where mbox messages have been written. */ if (S_ISLNK(sb.st_mode) || S_ISREG(sb.st_mode)) { /* FIXME : Can you unlink from a directory while doing a readdir loop over it? */ if (unlink(fpath) < 0) { fprintf(stderr, "Unlinking %s failed\n", fpath); } } } } closedir(d); } free(fpath); free(sdir); } /*}}}*/ static void clear_mh_folder(char *path)/*{{{*/ { char *fpath; int len; DIR *d; struct dirent *de; struct stat sb; len = strlen(path); fpath = new_array(char, len + 3 + NAME_MAX); d = opendir(path); if (d) { while ((de = readdir(d))) { if (valid_mh_filename_p(de->d_name)) { strcpy(fpath, path); strcat(fpath, "/"); strcat(fpath, de->d_name); if (lstat(fpath, &sb) >= 0) { /* See under maildir above for explanation */ if (S_ISLNK(sb.st_mode) || S_ISREG(sb.st_mode)) { /* FIXME : Can you unlink from a directory while doing a readdir loop over it? */ if (unlink(fpath) < 0) { fprintf(stderr, "Unlinking %s failed\n", fpath); } } } } } closedir(d); } free(fpath); } /*}}}*/ static void clear_mbox_folder(char *path)/*{{{*/ { unlink(path); } /*}}}*/ int search_top(int do_threads, int do_augment, char *database_path, char *complete_mfolder, char **argv, enum folder_type ft, int verbose)/*{{{*/ { struct read_db *db; int result; db = open_db(database_path); switch (ft) { case FT_MAILDIR: maybe_create_maildir(complete_mfolder); break; case FT_MH: if (!directory_exists_remove_other(complete_mfolder)) { create_dir(complete_mfolder); } break; case FT_MBOX: /* Nothing to do */ break; case FT_RAW: case FT_EXCERPT: break; default: assert(0); } if (!do_augment) { switch (ft) { case FT_MAILDIR: clear_maildir_subfolder(complete_mfolder, "new"); clear_maildir_subfolder(complete_mfolder, "cur"); break; case FT_MH: clear_mh_folder(complete_mfolder); break; case FT_MBOX: clear_mbox_folder(complete_mfolder); break; case FT_RAW: case FT_EXCERPT: break; default: 
assert(0); } } result = do_search(db, argv, complete_mfolder, do_threads, ft, verbose); free(complete_mfolder); close_db(db); return result; } /*}}}*/ mairix-master/stats.c000066400000000000000000000076531224450623700151620ustar00rootroot00000000000000/* mairix - message index builder and finder for maildir folders. ********************************************************************** * Copyright (C) Richard P. Curnow 2002-2004 * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
* ********************************************************************** */ #include "mairix.h" #include "memmac.h" #include "reader.h" static void do_toktable(struct toktable *x, int *lc, int *elc, int *ec, int size, int *ml, int *mel, int *me) { int i; for (i=0; i<x->size; i++) { struct token *tok = x->tokens[i]; unsigned char *j, *last_char; int incr; if (tok) { int len = strlen(tok->text); if (len > size) { fprintf(stderr, "Token length %d exceeds size\n", len); } else { lc[len]++; if (len > *ml) *ml = len; } /* Deal with encoding length */ if (tok->match0.n > size) { fprintf(stderr, "Token encoding length %d exceeds size\n", tok->match0.n); } else { elc[tok->match0.n]++; if (tok->match0.n > *mel) *mel = tok->match0.n; } /* Deal with encoding */ j = tok->match0.msginfo; last_char = j + tok->match0.n; while (j < last_char) { incr = read_increment(&j); if (incr > size) { fprintf(stderr, "Encoding increment %d exceeds size\n", incr); } else { ec[incr]++; if (incr > *me) *me = incr; } } } } } void print_table(int *x, int max) { int total, sum; int i; int kk, kk1; total = 0; for (i = 0; i<=max; i++) { total += x[i]; } sum = 0; kk1 = 0; for (i = 0; i<=max; i++) { sum += x[i]; kk = (int)((double)sum*256.0/(double)total); printf("%5d : %5d %3d %3d\n", i, x[i], kk-kk1, kk); kk1 = kk; } } void get_db_stats(struct database *db) { /* Deal with paths later - problem is, they will be biased by length of folder_base at the moment. 
*/ int size = 4096; int *len_counts, *enc_len_counts, *enc_counts; int max_len, max_enc_len, max_enc; max_len = 0; max_enc_len = 0; max_enc = 0; len_counts = new_array(int, size); memset(len_counts, 0, size * sizeof(int)); enc_len_counts = new_array(int, size); memset(enc_len_counts, 0, size * sizeof(int)); enc_counts = new_array(int, size); memset(enc_counts, 0, size * sizeof(int)); do_toktable(db->to, len_counts, enc_len_counts, enc_counts, size, &max_len, &max_enc_len, &max_enc); do_toktable(db->cc, len_counts, enc_len_counts, enc_counts, size, &max_len, &max_enc_len, &max_enc); do_toktable(db->from, len_counts, enc_len_counts, enc_counts, size, &max_len, &max_enc_len, &max_enc); do_toktable(db->subject, len_counts, enc_len_counts, enc_counts, size, &max_len, &max_enc_len, &max_enc); do_toktable(db->body, len_counts, enc_len_counts, enc_counts, size, &max_len, &max_enc_len, &max_enc); #if 0 /* no longer works now that the msg_ids table has 2 encoding chains. fix * this when required. 
*/ do_toktable(db->msg_ids, len_counts, enc_len_counts, enc_counts, size, &max_len, &max_enc_len, &max_enc); #endif printf("Max token length : %d\n", max_len); print_table(len_counts, max_len); printf("Max encoding vector length : %d\n", max_enc_len); print_table(enc_len_counts, max_enc_len); printf("Max encoding increment : %d\n", max_enc); print_table(enc_counts, max_enc); return; } mairix-master/test/000077500000000000000000000000001224450623700146245ustar00rootroot00000000000000mairix-master/test/.gitignore000066400000000000000000000002541224450623700166150ustar00rootroot00000000000000*.status *.data .split_mboxen_marker messages/mbox_split messages/mh/BigMessages/1 messages/mh/BigMessages/2 messages/mh/BigMessages/3 messages/mh/BigMessages/4 !Makefile mairix-master/test/10-add-messages-maildir.test-spec000066400000000000000000000000711224450623700226450ustar00rootroot00000000000000add_messages maildir animals assert_dump animals-maildirmairix-master/test/10-add-messages-mbox.test-spec000066400000000000000000000000641224450623700221730ustar00rootroot00000000000000add_messages mbox animals assert_dump animals-mbox mairix-master/test/10-add-messages-mh.test-spec000066400000000000000000000000571224450623700216340ustar00rootroot00000000000000add_messages mh animals assert_dump animals-mhmairix-master/test/15-add-messages-all-formats.test-spec000066400000000000000000000001441224450623700234530ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals assert_dump animals mairix-master/test/20-remove-purge-messages-maildir.test-spec000066400000000000000000000003311224450623700245320ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals assert_dump animals remove_messages maildir animals assert_dump animals-removed-maildir purge_database animals-removed-maildir-purged 
mairix-master/test/20-remove-purge-messages-mbox.test-spec000066400000000000000000000003201224450623700240540ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals assert_dump animals remove_messages mbox animals assert_dump animals-removed-mbox purge_database animals-removed-mbox-purged mairix-master/test/20-remove-purge-messages-mh.test-spec000066400000000000000000000003121224450623700235140ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals assert_dump animals remove_messages mh animals assert_dump animals-removed-mh purge_database animals-removed-mh-purged mairix-master/test/30-mformat-maildir.test-spec000066400000000000000000000025721224450623700217670ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals assert_dump animals conf_set_mformat maildir ## from all different kinds of sources search_messages animals a:server assert_match maildir animals/cur/1294156254.3884_1.spencer:2,RS assert_match maildir animals/cur/1294156254.3884_3.spencer:2,S assert_match maildir animals/new/1294156254.3884_5.spencer assert_match mh animals/1 assert_match mh animals/2 assert_match mbox animals/part.0 assert_match mbox animals/part.1 assert_no_more_matches ## from all different kinds of sources only a part of a mbox search_messages animals Elephant/Mouse assert_match maildir animals/cur/1294156254.3884_1.spencer:2,RS assert_match mh animals/1 assert_match mh animals/2 assert_match mbox animals/part.0 assert_no_more_matches ## match in only maildir search_messages animals Caterpillar/Elevator assert_match maildir animals/cur/1294156254.3884_3.spencer:2,S assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches ## match in only mh search_messages animals Tiger assert_match mh animals/1 assert_match mh animals/2 assert_no_more_matches ## match in only mbox search_messages animals Frog/Cat 
assert_match mbox animals/part.0 assert_match mbox animals/part.1 assert_no_more_matches ## match in only part of mbox search_messages animals Frog assert_match mbox animals/part.1 assert_no_more_matches mairix-master/test/30-mformat-mbox.test-spec000066400000000000000000000025671224450623700213170ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals assert_dump animals conf_set_mformat mbox ## from all different kinds of sources search_messages animals a:server assert_match maildir animals/cur/1294156254.3884_1.spencer:2,RS assert_match maildir animals/cur/1294156254.3884_3.spencer:2,S assert_match maildir animals/new/1294156254.3884_5.spencer assert_match mh animals/1 assert_match mh animals/2 assert_match mbox animals/part.0 assert_match mbox animals/part.1 assert_no_more_matches ## from all different kinds of sources only a part of a mbox search_messages animals Elephant/Mouse assert_match maildir animals/cur/1294156254.3884_1.spencer:2,RS assert_match mh animals/1 assert_match mh animals/2 assert_match mbox animals/part.0 assert_no_more_matches ## match in only maildir search_messages animals Caterpillar/Elevator assert_match maildir animals/cur/1294156254.3884_3.spencer:2,S assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches ## match in only mh search_messages animals Tiger assert_match mh animals/1 assert_match mh animals/2 assert_no_more_matches ## match in only mbox search_messages animals Frog/Cat assert_match mbox animals/part.0 assert_match mbox animals/part.1 assert_no_more_matches ## match in only part of mbox search_messages animals Frog assert_match mbox animals/part.1 assert_no_more_matches mairix-master/test/30-mformat-mh.test-spec000066400000000000000000000025651224450623700207540ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals assert_dump animals conf_set_mformat mh ## from all different kinds of sources 
search_messages animals a:server assert_match maildir animals/cur/1294156254.3884_1.spencer:2,RS assert_match maildir animals/cur/1294156254.3884_3.spencer:2,S assert_match maildir animals/new/1294156254.3884_5.spencer assert_match mh animals/1 assert_match mh animals/2 assert_match mbox animals/part.0 assert_match mbox animals/part.1 assert_no_more_matches ## from all different kinds of sources only a part of a mbox search_messages animals Elephant/Mouse assert_match maildir animals/cur/1294156254.3884_1.spencer:2,RS assert_match mh animals/1 assert_match mh animals/2 assert_match mbox animals/part.0 assert_no_more_matches ## match in only maildir search_messages animals Caterpillar/Elevator assert_match maildir animals/cur/1294156254.3884_3.spencer:2,S assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches ## match in only mh search_messages animals Tiger assert_match mh animals/1 assert_match mh animals/2 assert_no_more_matches ## match in only mbox search_messages animals Frog/Cat assert_match mbox animals/part.0 assert_match mbox animals/part.1 assert_no_more_matches ## match in only part of mbox search_messages animals Frog assert_match mbox animals/part.1 assert_no_more_matches mairix-master/test/40-search-attachments.test-spec000066400000000000000000000005161224450623700224560ustar00rootroot00000000000000add_messages mh attachments assert_dump attachments search_messages attachments n:first assert_no_more_matches search_messages attachments n:first= assert_match mh attachments/1 assert_match mh attachments/2 assert_no_more_matches search_messages attachments n:second_file.txt assert_match mh attachments/1 assert_no_more_matches mairix-master/test/40-search-body-exact.test-spec000066400000000000000000000006661224450623700222100ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals add_messages mh AliceBobEve assert_dump animals-and-AliceBobEve search_messages 
animals-and-AliceBobEve Robert assert_match mh AliceBobEve/1 assert_match mh AliceBobEve/4 assert_match mh AliceBobEve/6 assert_no_more_matches search_messages animals-and-AliceBobEve b:Robert assert_match mh AliceBobEve/4 assert_match mh AliceBobEve/6 assert_no_more_matches mairix-master/test/40-search-cc-header.test-spec000066400000000000000000000003251224450623700217540ustar00rootroot00000000000000add_messages mh AliceBobEve assert_dump AliceBobEve search_messages AliceBobEve c:nil assert_no_more_matches search_messages AliceBobEve c:naive@good.heart assert_match mh AliceBobEve/3 assert_no_more_matches mairix-master/test/40-search-date-header-left-closed-right-closed.test-spec000066400000000000000000000026021224450623700270650ustar00rootroot00000000000000add_messages mh AliceBobEve assert_dump AliceBobEve search_messages AliceBobEve d:20050101-20061102 assert_no_more_matches search_messages AliceBobEve d:20100101-20120101 assert_match mh AliceBobEve/1 # i.e.:2010-12-30 17:57:41 +0100 assert_match mh AliceBobEve/2 # i.e.:2010-12-31 17:58:41 +0100 assert_match mh AliceBobEve/3 # i.e.:2011-01-01 17:59:41 +0100 assert_match mh AliceBobEve/4 # i.e.:2011-05-19 18:00:41 +0100 assert_match mh AliceBobEve/5 # i.e.:2011-10-02 18:01:41 +0100 assert_match mh AliceBobEve/6 # i.e.:2011-12-31 18:02:41 +0100 assert_no_more_matches search_messages AliceBobEve d:20100101-20110101 assert_match mh AliceBobEve/1 # i.e.:2010-12-30 17:57:41 +0100 assert_match mh AliceBobEve/2 # i.e.:2010-12-31 17:58:41 +0100 assert_match mh AliceBobEve/3 # i.e.:2011-01-01 17:59:41 +0100 assert_no_more_matches search_messages AliceBobEve d:20110101-20120101 assert_match mh AliceBobEve/4 # i.e.:2011-05-19 18:00:41 +0100 assert_match mh AliceBobEve/5 # i.e.:2011-10-02 18:01:41 +0100 assert_match mh AliceBobEve/6 # i.e.:2011-12-31 18:02:41 +0100 assert_no_more_matches search_messages AliceBobEve d:20110320-20111111 assert_match mh AliceBobEve/4 # i.e.:2011-05-19 18:00:41 +0100 assert_match mh 
AliceBobEve/5 # i.e.:2011-10-02 18:01:41 +0100 assert_no_more_matches search_messages AliceBobEve d:20120320-20121010 assert_no_more_matches mairix-master/test/40-search-date-header-left-closed-right-open.test-spec000066400000000000000000000026251224450623700265620ustar00rootroot00000000000000add_messages mh AliceBobEve assert_dump AliceBobEve search_messages AliceBobEve d:20050101- assert_match mh AliceBobEve/1 # i.e.:2010-12-30 17:57:41 +0100 assert_match mh AliceBobEve/2 # i.e.:2010-12-31 17:58:41 +0100 assert_match mh AliceBobEve/3 # i.e.:2011-01-01 17:59:41 +0100 assert_match mh AliceBobEve/4 # i.e.:2011-05-19 18:00:41 +0100 assert_match mh AliceBobEve/5 # i.e.:2011-10-02 18:01:41 +0100 assert_match mh AliceBobEve/6 # i.e.:2011-12-31 18:02:41 +0100 assert_no_more_matches search_messages AliceBobEve d:20101230- assert_match mh AliceBobEve/2 # i.e.:2010-12-31 17:58:41 +0100 assert_match mh AliceBobEve/3 # i.e.:2011-01-01 17:59:41 +0100 assert_match mh AliceBobEve/4 # i.e.:2011-05-19 18:00:41 +0100 assert_match mh AliceBobEve/5 # i.e.:2011-10-02 18:01:41 +0100 assert_match mh AliceBobEve/6 # i.e.:2011-12-31 18:02:41 +0100 assert_no_more_matches search_messages AliceBobEve d:20110101- assert_match mh AliceBobEve/4 # i.e.:2011-05-19 18:00:41 +0100 assert_match mh AliceBobEve/5 # i.e.:2011-10-02 18:01:41 +0100 assert_match mh AliceBobEve/6 # i.e.:2011-12-31 18:02:41 +0100 assert_no_more_matches search_messages AliceBobEve d:20110706- assert_match mh AliceBobEve/5 # i.e.:2011-10-02 18:01:41 +0100 assert_match mh AliceBobEve/6 # i.e.:2011-12-31 18:02:41 +0100 assert_no_more_matches search_messages AliceBobEve d:20120201- assert_no_more_matches mairix-master/test/40-search-date-header-left-open-right-closed.test-spec000066400000000000000000000025061224450623700265600ustar00rootroot00000000000000add_messages mh AliceBobEve assert_dump AliceBobEve ## Sweeping through by end date #################### search_messages AliceBobEve d:-20100101 assert_no_more_matches 
search_messages AliceBobEve d:-20101231 assert_match mh AliceBobEve/1 # i.e.:2010-12-30 17:57:41 +0100 assert_match mh AliceBobEve/2 # i.e.:2010-12-31 17:58:41 +0100 assert_no_more_matches search_messages AliceBobEve d:-20110101 assert_match mh AliceBobEve/1 # i.e.:2010-12-30 17:57:41 +0100 assert_match mh AliceBobEve/2 # i.e.:2010-12-31 17:58:41 +0100 assert_match mh AliceBobEve/3 # i.e.:2011-01-01 17:59:41 +0100 assert_no_more_matches search_messages AliceBobEve d:-20110401 assert_match mh AliceBobEve/1 # i.e.:2010-12-30 17:57:41 +0100 assert_match mh AliceBobEve/2 # i.e.:2010-12-31 17:58:41 +0100 assert_match mh AliceBobEve/3 # i.e.:2011-01-01 17:59:41 +0100 assert_no_more_matches search_messages AliceBobEve d:-20121031 assert_match mh AliceBobEve/1 # i.e.:2010-12-30 17:57:41 +0100 assert_match mh AliceBobEve/2 # i.e.:2010-12-31 17:58:41 +0100 assert_match mh AliceBobEve/3 # i.e.:2011-01-01 17:59:41 +0100 assert_match mh AliceBobEve/4 # i.e.:2011-05-19 18:00:41 +0100 assert_match mh AliceBobEve/5 # i.e.:2011-10-02 18:01:41 +0100 assert_match mh AliceBobEve/6 # i.e.:2011-12-31 18:02:41 +0100 assert_no_more_matches mairix-master/test/40-search-exact.test-spec000066400000000000000000000020411224450623700212420ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals assert_dump animals search_messages animals Elephant assert_match maildir animals/cur/1294156254.3884_1.spencer:2,RS assert_match mh animals/1 assert_match mh animals/2 assert_no_more_matches search_messages animals Tiger assert_match mh animals/1 assert_match mh animals/2 assert_no_more_matches search_messages animals Mouse assert_match mh animals/2 assert_match mbox animals/part.0 assert_no_more_matches search_messages animals Cat assert_match mbox animals/part.0 assert_no_more_matches search_messages animals Caterpillar assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches ###################################### ## Adding 
Alice, Bob, and Eve add_messages mh AliceBobEve assert_dump animals-and-AliceBobEve ## search for email address search_messages animals-and-AliceBobEve eve@ils.lair assert_match mh AliceBobEve/2 assert_match mh AliceBobEve/3 assert_match mh AliceBobEve/4 assert_match mh AliceBobEve/5 assert_no_more_matches mairix-master/test/40-search-flags.test-spec000066400000000000000000000030261224450623700212360ustar00rootroot00000000000000add_messages maildir flags assert_dump flags search_messages flags F:s assert_match maildir flags/cur/1:2,FRS assert_match maildir flags/cur/2:2,RS assert_match maildir flags/cur/3:2,FS assert_match maildir flags/cur/4:2,S assert_no_more_matches search_messages flags F:-s assert_match maildir flags/cur/5:2,FR assert_match maildir flags/new/6:2,R assert_match maildir flags/new/7:2,F assert_match maildir flags/cur/8:2, assert_no_more_matches search_messages flags F:f assert_match maildir flags/cur/1:2,FRS assert_match maildir flags/cur/3:2,FS assert_match maildir flags/cur/5:2,FR assert_match maildir flags/new/7:2,F assert_no_more_matches search_messages flags F:-f assert_match maildir flags/cur/2:2,RS assert_match maildir flags/cur/4:2,S assert_match maildir flags/new/6:2,R assert_match maildir flags/cur/8:2, assert_no_more_matches search_messages flags F:r assert_match maildir flags/cur/1:2,FRS assert_match maildir flags/cur/2:2,RS assert_match maildir flags/cur/5:2,FR assert_match maildir flags/new/6:2,R assert_no_more_matches search_messages flags F:-r assert_match maildir flags/cur/3:2,FS assert_match maildir flags/cur/4:2,S assert_match maildir flags/new/7:2,F assert_match maildir flags/cur/8:2, assert_no_more_matches search_messages flags F:r-f assert_match maildir flags/cur/2:2,RS assert_match maildir flags/new/6:2,R assert_no_more_matches search_messages flags F:sfr assert_match maildir flags/cur/1:2,FRS assert_no_more_matches search_messages flags F:rsf assert_match maildir flags/cur/1:2,FRS assert_no_more_matches 
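The flag searches in 40-search-flags.test-spec can be read as a predicate over the maildir info suffix (the letters after `:2,`). The following is a minimal sketch of that reading, not mairix's own implementation: the names `has_flag` and `flags_match` are hypothetical, and it assumes that `s`, `f` and `r` in the query stand for the `S`, `F` and `R` letters of the filename and that `-` negates only the flag letter that follows it.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: return 1 if the maildir filename carries the
 * (upper-case) flag letter c after its ":2," info marker, else 0. */
static int has_flag(const char *filename, char c)
{
  const char *info = strstr(filename, ":2,");
  if (!info) return 0; /* e.g. a message still in new/ without a suffix */
  return strchr(info + 3, c) != NULL;
}

/* Hypothetical helper: check a query such as "s", "-s", "r-f" or "sfr"
 * against a filename.  Each letter must be present, unless it is
 * preceded by '-', in which case it must be absent. */
static int flags_match(const char *filename, const char *query)
{
  const char *p;
  int negate = 0;

  assert(filename && query);
  for (p = query; *p; p++) {
    int present;
    if (*p == '-') {
      negate = 1; /* applies to the next flag letter only */
      continue;
    }
    present = has_flag(filename, (char) toupper((unsigned char) *p));
    if (present == negate) return 0; /* wanted but absent, or unwanted but present */
    negate = 0;
  }
  return 1;
}
```

Under these assumptions, `flags_match("2:2,RS", "r-f")` accepts the message while `flags_match("1:2,FRS", "r-f")` rejects it, which is the split the `F:r-f` stanza above expects.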
mairix-master/test/40-search-from-header.test-spec000066400000000000000000000007101224450623700223300ustar00rootroot00000000000000add_messages mh AliceBobEve assert_dump AliceBobEve ## Empty search search_messages AliceBobEve f:nil assert_no_more_matches ## substring search in From: search_messages AliceBobEve f:ob= assert_match mh AliceBobEve/1 assert_match mh AliceBobEve/3 assert_match mh AliceBobEve/5 assert_no_more_matches ## exact search in From: search_messages AliceBobEve f:eve@ils.lair assert_match mh AliceBobEve/2 assert_match mh AliceBobEve/4 assert_no_more_matches mairix-master/test/40-search-msg-id.test-spec000066400000000000000000000007241224450623700213240ustar00rootroot00000000000000add_messages mh AliceBobEve assert_dump AliceBobEve search_messages AliceBobEve m:third@message.center assert_match mh AliceBobEve/3 assert_no_more_matches search_messages AliceBobEve m:fifth@message.center assert_match mh AliceBobEve/5 assert_no_more_matches search_messages AliceBobEve m:third@message.center/fifth@message.center # nor "or" part, when searching for messages #assert_match mh AliceBobEve/3 #assert_match mh AliceBobEve/5 assert_no_more_matches mairix-master/test/40-search-size-left-closed-right-closed.test-spec000066400000000000000000000032331224450623700256750ustar00rootroot00000000000000add_messages mh AliceBobEve assert_dump AliceBobEve search_messages AliceBobEve z:5-200 assert_no_more_matches search_messages AliceBobEve z:0k-4M assert_match mh AliceBobEve/1 # i.e.:279 assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_match mh AliceBobEve/4 # i.e.:379 assert_match mh AliceBobEve/5 # i.e.:250 assert_match mh AliceBobEve/6 # i.e.:383 assert_no_more_matches search_messages AliceBobEve z:279-379 assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_no_more_matches search_messages AliceBobEve z:2m-10000k assert_no_more_matches ######################################## # Repeating tests 
with some bigger files add_messages mh BigMessages assert_dump AliceBobEve-and-BigMessages search_messages AliceBobEve-and-BigMessages z:5-200 assert_no_more_matches search_messages AliceBobEve-and-BigMessages z:0k-4M assert_match mh AliceBobEve/1 # i.e.:279 assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_match mh AliceBobEve/4 # i.e.:379 assert_match mh AliceBobEve/5 # i.e.:250 assert_match mh AliceBobEve/6 # i.e.:383 assert_match mh BigMessages/1 # i.e.:300k assert_match mh BigMessages/2 # i.e.:530k assert_match mh BigMessages/3 # i.e.:2.4M assert_no_more_matches search_messages AliceBobEve-and-BigMessages z:279-379 assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_no_more_matches search_messages AliceBobEve-and-BigMessages z:2m-10000k assert_match mh BigMessages/3 # i.e.:2.4M assert_match mh BigMessages/4 # i.e.:5.0M assert_no_more_matches mairix-master/test/40-search-size-left-closed-right-open.test-spec000066400000000000000000000055501224450623700253710ustar00rootroot00000000000000add_messages mh AliceBobEve assert_dump AliceBobEve search_messages AliceBobEve z:200- assert_match mh AliceBobEve/1 # i.e.:279 assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_match mh AliceBobEve/4 # i.e.:379 assert_match mh AliceBobEve/5 # i.e.:250 assert_match mh AliceBobEve/6 # i.e.:383 assert_no_more_matches search_messages AliceBobEve z:279- assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_match mh AliceBobEve/4 # i.e.:379 assert_match mh AliceBobEve/6 # i.e.:383 assert_no_more_matches search_messages AliceBobEve z:400- assert_no_more_matches search_messages AliceBobEve z:1k- assert_no_more_matches search_messages AliceBobEve z:400k- assert_no_more_matches search_messages AliceBobEve z:4M- assert_no_more_matches search_messages AliceBobEve z:6M- assert_no_more_matches ######################################## # Repeating 
tests with some bigger files add_messages mh BigMessages assert_dump AliceBobEve-and-BigMessages search_messages AliceBobEve-and-BigMessages z:200- assert_match mh AliceBobEve/1 # i.e.:279 assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_match mh AliceBobEve/4 # i.e.:379 assert_match mh AliceBobEve/5 # i.e.:250 assert_match mh AliceBobEve/6 # i.e.:383 assert_match mh BigMessages/1 # i.e.:300k assert_match mh BigMessages/2 # i.e.:530k assert_match mh BigMessages/3 # i.e.:2.4M assert_match mh BigMessages/4 # i.e.:5.0M assert_no_more_matches search_messages AliceBobEve-and-BigMessages z:279- assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_match mh AliceBobEve/4 # i.e.:379 assert_match mh AliceBobEve/6 # i.e.:383 assert_match mh BigMessages/1 # i.e.:300k assert_match mh BigMessages/2 # i.e.:530k assert_match mh BigMessages/3 # i.e.:2.4M assert_match mh BigMessages/4 # i.e.:5.0M assert_no_more_matches search_messages AliceBobEve-and-BigMessages z:400- assert_match mh BigMessages/1 # i.e.:300k assert_match mh BigMessages/2 # i.e.:530k assert_match mh BigMessages/3 # i.e.:2.4M assert_match mh BigMessages/4 # i.e.:5.0M assert_no_more_matches search_messages AliceBobEve-and-BigMessages z:1k- assert_match mh BigMessages/1 # i.e.:300k assert_match mh BigMessages/2 # i.e.:530k assert_match mh BigMessages/3 # i.e.:2.4M assert_match mh BigMessages/4 # i.e.:5.0M assert_no_more_matches search_messages AliceBobEve-and-BigMessages z:400k- assert_match mh BigMessages/2 # i.e.:530k assert_match mh BigMessages/3 # i.e.:2.4M assert_match mh BigMessages/4 # i.e.:5.0M assert_no_more_matches search_messages AliceBobEve-and-BigMessages z:4M- assert_match mh BigMessages/4 # i.e.:5.0M assert_no_more_matches search_messages AliceBobEve-and-BigMessages z:6M- assert_no_more_matches 
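The size tests use range expressions such as `200-`, `-379`, `0k-4M` and `2m-10000k` after the `z:` prefix. The following is a minimal parsing sketch, not mairix's parser: `struct size_range`, `parse_size` and `parse_size_range` are hypothetical names, and the 1024-based multipliers for `k` and `m` are an assumption. It deliberately says nothing about whether the bounds are inclusive, since the specs above do not settle that.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: either bound may be absent (open range). */
struct size_range {
  int has_lo, has_hi;
  long lo, hi;
};

/* Parse digits with an optional k/K or m/M suffix.  The multipliers
 * (1024 and 1024*1024) are an assumption of this sketch.  Returns 1 on
 * success and leaves *endp just past the parsed text. */
static int parse_size(const char *s, const char **endp, long *out)
{
  char *e;
  long v;

  if (!isdigit((unsigned char) *s)) return 0;
  v = strtol(s, &e, 10);
  if (*e == 'k' || *e == 'K') { v *= 1024L; e++; }
  else if (*e == 'm' || *e == 'M') { v *= 1024L * 1024L; e++; }
  *out = v;
  *endp = e;
  return 1;
}

/* Parse "LO-HI", "LO-" or "-HI" (the text after "z:").  Returns 1 on
 * success, 0 on malformed input or when both bounds are missing. */
static int parse_size_range(const char *spec, struct size_range *r)
{
  const char *p = spec;

  assert(spec && r);
  memset(r, 0, sizeof(*r));
  if (*p != '-') {
    if (!parse_size(p, &p, &r->lo)) return 0;
    r->has_lo = 1;
  }
  if (*p != '-') return 0; /* the separator is mandatory */
  p++;
  if (*p) {
    if (!parse_size(p, &p, &r->hi)) return 0;
    r->has_hi = 1;
  }
  return *p == '\0' && (r->has_lo || r->has_hi);
}
```

With these assumptions, `2m-10000k` yields the bounds 2097152 and 10240000 bytes, which brackets the 2.4M and 5.0M messages that the corresponding stanza expects to match.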
mairix-master/test/40-search-size-left-open-right-closed.test-spec000066400000000000000000000060201224450623700253620ustar00rootroot00000000000000add_messages mh AliceBobEve assert_dump AliceBobEve search_messages AliceBobEve z:-200 assert_no_more_matches search_messages AliceBobEve z:-379 assert_match mh AliceBobEve/1 # i.e.:279 assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_match mh AliceBobEve/5 # i.e.:250 assert_no_more_matches search_messages AliceBobEve z:-400000 assert_match mh AliceBobEve/1 # i.e.:279 assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_match mh AliceBobEve/4 # i.e.:379 assert_match mh AliceBobEve/5 # i.e.:250 assert_match mh AliceBobEve/6 # i.e.:383 assert_no_more_matches search_messages AliceBobEve z:-800k assert_match mh AliceBobEve/1 # i.e.:279 assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_match mh AliceBobEve/4 # i.e.:379 assert_match mh AliceBobEve/5 # i.e.:250 assert_match mh AliceBobEve/6 # i.e.:383 assert_no_more_matches search_messages AliceBobEve z:-4M assert_match mh AliceBobEve/1 # i.e.:279 assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_match mh AliceBobEve/4 # i.e.:379 assert_match mh AliceBobEve/5 # i.e.:250 assert_match mh AliceBobEve/6 # i.e.:383 assert_no_more_matches ######################################## # Repeating tests with some bigger files add_messages mh BigMessages assert_dump AliceBobEve-and-BigMessages search_messages AliceBobEve-and-BigMessages z:-200 assert_no_more_matches search_messages AliceBobEve-and-BigMessages z:-379 assert_match mh AliceBobEve/1 # i.e.:279 assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_match mh AliceBobEve/5 # i.e.:250 assert_no_more_matches search_messages AliceBobEve-and-BigMessages z:-400000 assert_match mh AliceBobEve/1 # i.e.:279 assert_match mh AliceBobEve/2 # i.e.:355 
assert_match mh AliceBobEve/3 # i.e.:341 assert_match mh AliceBobEve/4 # i.e.:379 assert_match mh AliceBobEve/5 # i.e.:250 assert_match mh AliceBobEve/6 # i.e.:383 assert_match mh BigMessages/1 # i.e.:300k assert_no_more_matches search_messages AliceBobEve-and-BigMessages z:-800k assert_match mh AliceBobEve/1 # i.e.:279 assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_match mh AliceBobEve/4 # i.e.:379 assert_match mh AliceBobEve/5 # i.e.:250 assert_match mh AliceBobEve/6 # i.e.:383 assert_match mh BigMessages/1 # i.e.:300k assert_match mh BigMessages/2 # i.e.:530k assert_no_more_matches search_messages AliceBobEve-and-BigMessages z:-4M assert_match mh AliceBobEve/1 # i.e.:279 assert_match mh AliceBobEve/2 # i.e.:355 assert_match mh AliceBobEve/3 # i.e.:341 assert_match mh AliceBobEve/4 # i.e.:379 assert_match mh AliceBobEve/5 # i.e.:250 assert_match mh AliceBobEve/6 # i.e.:383 assert_match mh BigMessages/1 # i.e.:300k assert_match mh BigMessages/2 # i.e.:530k assert_match mh BigMessages/3 # i.e.:2.4M assert_no_more_matches mairix-master/test/40-search-subject-header.test-spec000066400000000000000000000012621224450623700230270ustar00rootroot00000000000000add_messages mh AliceBobEve assert_dump AliceBobEve ## Empty search result search_messages AliceBobEve s:nil assert_no_more_matches ## exact search search_messages AliceBobEve s:longer assert_match mh AliceBobEve/1 assert_no_more_matches ## substring search without restriction search_messages AliceBobEve lo= assert_match mh AliceBobEve/1 assert_match mh AliceBobEve/2 assert_match mh AliceBobEve/3 assert_match mh AliceBobEve/4 assert_match mh AliceBobEve/5 assert_match mh AliceBobEve/6 assert_no_more_matches ## substring search with restriction to subject header search_messages AliceBobEve s:lo= assert_match mh AliceBobEve/1 assert_match mh AliceBobEve/5 assert_no_more_matches 
mairix-master/test/40-search-substring-exact-word-start.test-spec000066400000000000000000000005201224450623700253640ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals assert_dump animals search_messages animals ^terpilar= assert_no_more_matches search_messages animals ^tarpilar= assert_no_more_matches search_messages animals ^Caterpil= assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches mairix-master/test/40-search-substring-exact.test-spec000066400000000000000000000013411224450623700232620ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals assert_dump animals search_messages animals Ele= assert_match maildir animals/cur/1294156254.3884_1.spencer:2,RS assert_match mh animals/1 assert_match mh animals/2 assert_match maildir animals/cur/1294156254.3884_3.spencer:2,S assert_no_more_matches search_messages animals Cat= assert_match mbox animals/part.0 assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches search_messages animals tarpillar= assert_no_more_matches search_messages animals oose= assert_match mbox animals/part.1 assert_no_more_matches search_messages animals ouse= assert_match mbox animals/part.0 assert_match mh animals/2 assert_no_more_matches mairix-master/test/40-search-substring-fuzz-word-start.test-spec000066400000000000000000000021311224450623700252560ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals assert_dump animals search_messages animals ^terpilar=0 assert_no_more_matches search_messages animals ^terpilar=1 assert_no_more_matches search_messages animals ^terpilar=2 assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches search_messages animals ^terpilar=3 assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches search_messages animals ^tarpilar=0 assert_no_more_matches 
search_messages animals ^tarpilar=1 assert_no_more_matches search_messages animals ^tarpilar=2 assert_no_more_matches search_messages animals ^tarpilar=3 assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches search_messages animals ^tarpilar=4 assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches search_messages animals ^Caterpil=0 assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches search_messages animals ^Caterpil=1 assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matchesmairix-master/test/40-search-substring-fuzz.test-spec000066400000000000000000000023401224450623700231540ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals assert_dump animals search_messages animals tarpillar=0 assert_no_more_matches search_messages animals tarpillar=1 assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches search_messages animals tarpillar=2 assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches search_messages animals tarpilar=0 assert_no_more_matches search_messages animals tarpilar=1 assert_no_more_matches search_messages animals tarpilar=2 assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches search_messages animals tarpilar=3 assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches search_messages animals oose=0 assert_match mbox animals/part.1 assert_no_more_matches search_messages animals oose=1 assert_match mbox animals/part.1 assert_match mbox animals/part.0 assert_match mh animals/2 assert_no_more_matches search_messages animals ouse=0 assert_match mbox animals/part.0 assert_match mh animals/2 assert_no_more_matches search_messages animals ouse=1 assert_match mbox animals/part.1 assert_match mbox animals/part.0 assert_match mh animals/2 assert_no_more_matches 
mairix-master/test/40-search-thread.test-spec000066400000000000000000000034601224450623700214130ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals add_messages mh AliceBobEve assert_dump animals-and-AliceBobEve ## some parts of the animal thread search_messages animals-and-AliceBobEve Elephant assert_match maildir animals/cur/1294156254.3884_1.spencer:2,RS assert_match mh animals/1 assert_match mh animals/2 assert_no_more_matches ## the whole animal thread when search-term yields thread start even without -t search_messages animals-and-AliceBobEve -t Elephant assert_match maildir animals/cur/1294156254.3884_1.spencer:2,RS assert_match maildir animals/cur/1294156254.3884_3.spencer:2,S assert_match maildir animals/new/1294156254.3884_5.spencer assert_match mh animals/1 assert_match mh animals/2 assert_match mbox animals/part.0 assert_match mbox animals/part.1 assert_no_more_matches ## the whole animal thread when search-term does /not/ yield thread start without thread search search_messages animals-and-AliceBobEve --threads Tiger assert_match maildir animals/cur/1294156254.3884_1.spencer:2,RS assert_match maildir animals/cur/1294156254.3884_3.spencer:2,S assert_match maildir animals/new/1294156254.3884_5.spencer assert_match mh animals/1 assert_match mh animals/2 assert_match mbox animals/part.0 assert_match mbox animals/part.1 assert_no_more_matches ## the whole animal thread and some AliceBobEve emails search_messages animals-and-AliceBobEve -t Elephant/Robert assert_match maildir animals/cur/1294156254.3884_1.spencer:2,RS assert_match maildir animals/cur/1294156254.3884_3.spencer:2,S assert_match maildir animals/new/1294156254.3884_5.spencer assert_match mh animals/1 assert_match mh animals/2 assert_match mbox animals/part.0 assert_match mbox animals/part.1 assert_match mh AliceBobEve/1 assert_match mh AliceBobEve/4 assert_match mh AliceBobEve/6 assert_no_more_matches 
mairix-master/test/40-search-to-header.test-spec000066400000000000000000000015611224450623700220140ustar00rootroot00000000000000add_messages mh AliceBobEve assert_dump AliceBobEve ## Empty search result search_messages AliceBobEve t:nil assert_no_more_matches ## Search by To: address search_messages AliceBobEve t:amorous.bob@heart.breaker assert_match mh AliceBobEve/4 assert_match mh AliceBobEve/6 assert_no_more_matches ## Search by name within To: address search_messages AliceBobEve t:Alice assert_match mh AliceBobEve/1 assert_match mh AliceBobEve/2 assert_no_more_matches ## Search by substring without restriction search_messages AliceBobEve eve= assert_match mh AliceBobEve/2 assert_match mh AliceBobEve/3 assert_match mh AliceBobEve/4 assert_match mh AliceBobEve/5 assert_match mh AliceBobEve/6 assert_no_more_matches ## Search by substring with restriction to To headers search_messages AliceBobEve t:eve= assert_match mh AliceBobEve/3 assert_match mh AliceBobEve/5 assert_no_more_matches mairix-master/test/45-search-combinations.test-spec000066400000000000000000000011361224450623700226340ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals assert_dump animals search_messages animals Hippopotamus F:s assert_match maildir animals/cur/1294156254.3884_3.spencer:2,S assert_no_more_matches search_messages animals f:someidC@some.server Elephant assert_match mh animals/2 assert_no_more_matches search_messages animals f:someidC@some.server Ele= assert_match mh animals/2 assert_match maildir animals/cur/1294156254.3884_3.spencer:2,S assert_no_more_matches search_messages animals f:someidC@some.server Ele= ~Hippo= assert_match mh animals/2 assert_no_more_matches mairix-master/test/45-search-in-more-than-one-part.test-spec000066400000000000000000000011661224450623700241730ustar00rootroot00000000000000add_messages mh AliceBobEve assert_dump AliceBobEve search_messages AliceBobEve ft:eve@ils.lair assert_match mh AliceBobEve/2 
assert_match mh AliceBobEve/3 assert_match mh AliceBobEve/4 assert_match mh AliceBobEve/5 assert_no_more_matches search_messages AliceBobEve tc:naive@good= assert_match mh AliceBobEve/1 assert_match mh AliceBobEve/2 assert_match mh AliceBobEve/3 assert_no_more_matches search_messages AliceBobEve a:amorous.bob@heart.breaker assert_match mh AliceBobEve/1 assert_match mh AliceBobEve/3 assert_match mh AliceBobEve/4 assert_match mh AliceBobEve/5 assert_match mh AliceBobEve/6 assert_no_more_matches mairix-master/test/45-search-negations.test-spec000066400000000000000000000010161224450623700221330ustar00rootroot00000000000000add_messages maildir animals add_messages mh animals add_messages mbox animals assert_dump animals search_messages animals ~Elephant assert_match mbox animals/part.0 assert_match mbox animals/part.1 assert_match maildir animals/cur/1294156254.3884_3.spencer:2,S assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches search_messages animals ~Ele= assert_match mbox animals/part.0 assert_match mbox animals/part.1 assert_match maildir animals/new/1294156254.3884_5.spencer assert_no_more_matches mairix-master/test/Makefile000066400000000000000000000066321224450623700162730ustar00rootroot00000000000000############################################ # # Structure of the Makefile # * Some defaults # * One-time preparation of mails used for testing. # * Targets for the whole test suite # * Targets for a single test # * Targets for overall maintenance # # Important Targets: # test -- runs all tests. Stops at the first failing test # check -- runs all tests. 
Continues after failed tests # clean -- removes all logs, etc from running tests # distclean -- removes all generated files # ############################################ ############################################ # Defaults # # the mairix executable under test MAIRIX_EXE=../mairix # the place for helper scripts SCRIPT_DIR=scripts # Each file ending in .test-spec signifies a test TEST_SPEC_FILES=$(shell ls -1 *.test-spec ) TESTS=$(TEST_SPEC_FILES:%.test-spec=%) all: check ############################################ # One-time preparation of mails used for testing. # # - big messages are generated by a script when testing. # (Shipping them directly would unnecessarily increase source-code size.) # - mboxen are split into individual messages for easier reference. # # big messages: BIG_MESSAGES_INDICES=1 2 3 4 BIG_MESSAGES=$(BIG_MESSAGES_INDICES:%=messages/mh/BigMessages/%) big-messages: $(BIG_MESSAGES) messages/mh/BigMessages/%: $(SCRIPT_DIR)/generate_big_message.sh $@ # splitting the mboxen .split_mboxen_marker: find messages/mbox -type f -exec $(SCRIPT_DIR)/split_mbox.sh {} \; touch $@ # message-preparations is used to assert all required preprocessing is done message-preparations: .split_mboxen_marker big-messages # maintenance for the generated files distclean-message-preparations: -rm -f .split_mboxen_marker -rm -rf messages/mbox_split -rm -rf $(BIG_MESSAGES) ############################################ # Targets for the whole test suite # # test runs all tests and aborts upon error test: clean $(TESTS:%=test-%) # check runs all tests but does not abort upon error. # It collects results and presents them in a summary. # Finally this target fails if any test did not succeed check: clean $(TESTS:%=%.status) $(SCRIPT_DIR)/print_test_statistics.sh if grep '^[^#]*\(dump_database\|log_remaining_matched_unasserted_messages\)' $(TEST_SPEC_FILES) ; then echo ; echo "Above files use dump_database or log_remaining_matched_unasserted_messages. 
But they should only be used when developing tests. Aborting" >&2 ; exit 1 ; fi # clean-tests removes the logs and outcomes of all tests clean-tests: $(TESTS:%=clean-%) ############################################ # Targets for a single test # # almostclean-TEST removes the logs for the test TEST, but not the outcome of the test almostclean-%: -rm -rf $(@:almostclean-test-%=%.data) # clean-TEST removes the logs and outcome for the test TEST clean-%: almostclean-test-% -rm -f $(@:clean-%=%.status) # test-TEST runs the test TEST, aborting on failure test-%: %.test-spec message-preparations almostclean-test-% $(MAIRIX_EXE) $(SCRIPT_DIR)/test.sh $< # TEST.status runs the test TEST, not aborting on failure, but collecting the outcome in TEST.status. Either "failed" if the test failed, or "passed", if the test passed %.status: %.test-spec $(MAIRIX_EXE) echo "failed" >$@ $(MAKE) test-$(<:%.test-spec=%) && \ echo "passed" >$@ || \ echo "Test $< did not succeed." ############################################ # Maintenance targets # clean: clean-tests distclean: clean distclean-message-preparations mairix-master/test/README000066400000000000000000000110111224450623700154760ustar00rootroot00000000000000This file describes the automated testing framework for mairix =============================================================================== Most important commands =============================================================================== The most important commands are: make test -- runs all tests but stops at the first failing test make check -- runs all tests but does not stop at failed tests =============================================================================== How to write new tests: =============================================================================== Simply create a new file ending in .test-spec in the directory of this README file. The required syntax of the test specification is documented in README.format. 
Upon "make test" or "make check" the new test is automatically included and executed in the test suite. To run the test specification FILE.test-spec individually, issue make test-FILE . =============================================================================== More detailed look at the test suite: =============================================================================== Each of "make test" or "make check" looks for all available tests (all .test-spec files in this directory) and runs them. Each test is stored in a .test-spec file within this directory. To run the test FILE.test-spec in isolation, use make test-FILE . In doing so, the script scripts/test.sh interprets the test specification given in FILE.test-spec and outputs all logging, databases, etc in the directory FILE.data. Hence, if a test fails, FILE.data is the place to find more information. All relevant information about a test specification FILE.test-spec can be found in the following files: FILE.data/log contains the detailed log of what happened and in which line of FILE.test-spec the test failed. FILE.data/database is the current state of the database FILE.data/database.dump is the most up-to-date dump of the database. Note that this dump is automatically updated whenever searching for a message. FILE.data/mairixrc is the current configuration file used for mairix. FILE.data/messages contains the messages added in the current test. FILE.data/search_result may be a file or a directory and contains the most recent search result in the requested format (maildir, mh, mbox). Already asserted matches are not removed from this directory. Hence it still gives a picture of the previous search, even if some, but not all matches have been asserted. FILE.data/search_result_split is only used for searches conducted in mbox format. For such searches, this directory contains the individual messages of the FILE.data/search_result mbox. 
The 1st message of FILE.data/search_result is stored in FILE.data/search_result_split/part.0 The 2nd in FILE.data/search_result_split/part.1 and so on. FILE.test-spec is the test specification itself FILE.status contains the status of the most-recent invocation of this test. Typically either "failed" or "passed". Of the above files, all (except for FILE.test-spec) are automatically generated upon need, when running make test-FILE . Besides above test specific files, there are the following files and directories: messages contains the messages used in the tests dumps contains the databases' dumps used for checking the validity of the databases. scripts contains the relevant scripts used in the test suite. scripts/generate_big_message.sh is used to generate big messages, which would unnecessarily increase the size of the released tarball. Those messages are however useful when searching messages by the file size. scripts/print_test_statistics.sh summarizes all the .status files scripts/split_mbox.sh splits an mbox into individual messages, to allow fine grained message assertion even for mboxen. scripts/test.sh is the interpreter of a test specification README is the file you are currently reading. README.format gives the syntax of a test specification. mairix-master/test/README.format000066400000000000000000000222241224450623700167750ustar00rootroot00000000000000This file describes the format of a test specification. This document is divided into the following parts: * Overview * Commands for managing the configuration file * Commands for managing the database * Commands for message matching * Examples * Overview ================================================================ A test-specification file is built of commands, which are run sequentially in the order they appear in the test-specification file. Each command is on a separate line. Lines may be empty. # serves as comment character: For each line, the first # and anything after the first # is omitted. 
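As a rough illustration of the comment rule above, the following shell sketch drops the first # of a line together with everything after it. It is not taken from scripts/test.sh; the helper name strip_comment is hypothetical and only serves this example.

```shell
# Hypothetical helper, not part of the mairix test suite: prints its
# argument with the first '#' and everything after it removed.
strip_comment() {
    # ${1%%#*} removes the longest suffix that starts with '#',
    # i.e. everything from the first '#' to the end of the line.
    printf '%s\n' "${1%%#*}"
}

# The command part in front of the comment is kept as-is,
# including the blank that preceded the '#'.
strip_comment 'assert_match mh AliceBobEve/1 # i.e.:279'
```

Under this rule a line that starts with # becomes empty, and a # inside a search expression would cut the expression short as well.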
Valid commands are (listed alphabetically): add_messages assert_dump assert_match assert_no_more_matches conf_set_mformat dump_database log_remaining_matched_unasserted_messages purge_database remove_messages search_messages Sorted by purpose, we may break above list down into: - Commands for managing the configuration file conf_set_mformat - Commands for managing the database add_messages assert_dump dump_database purge_database remove_messages - Commands for message matching search_messages assert_match assert_no_more_matches log_remaining_matched_unasserted_messages * Commands for managing the configuration file ============================ +-------------------------------------------------------------------------- | conf_set_mformat FORMAT +-------------------------------------------------------------------------- | Sets the mformat parameter (i.e.: The format in which the matched | messages shall be presented) of the mairix configuration file to FORMAT. | FORMAT can be either maildir, mh, or mbox. This parameter does not affect | the previously conducted search, but only the upcoming searches. +-------------------------------------------------------------------------- * Commands for managing the database ====================================== +-------------------------------------------------------------------------- | add_messages FORMAT MESSAGE +-------------------------------------------------------------------------- | adds messages to the database. FORMAT has to be either maildir, mh, or | mbox. Depending on FORMAT, MESSAGE refers to a maildir, mh folder, or | mbox in the corresponding subdirectory of the base directory. | | E.g.: | add_messages maildir animals | adds all messages in the maildir messages/maildir/animals to the | database. Accordingly, | add_messages mbox animals | adds all messages in the mbox messages/mbox/animals to the database. 
+-------------------------------------------------------------------------- +-------------------------------------------------------------------------- | assert_dump DUMPFILE +-------------------------------------------------------------------------- | asserts that the dump stored in DUMPFILE reflects the current state of | the database. DUMPFILE has to be relative to the dumps subdirectory in | the base directory. | | A dump of the current database gets stored in database.dump in the | test's data directory. | | If the dump stored in DUMPFILE does not reflect the current state of the | database, the test is aborted. | | Note: To see why the dump stored in DUMPFILE does not match the state of | the current database, you may diff the DUMPFILE and the database.dump | file in the test's data directory. +-------------------------------------------------------------------------- +-------------------------------------------------------------------------- | dump_database DUMPFILE +-------------------------------------------------------------------------- | dumps the current database to DUMPFILE, which has to be either absolute | or relative to the base directory. | | This function is mainly useful when developing new tests. +-------------------------------------------------------------------------- +-------------------------------------------------------------------------- | purge_database DUMPFILE +-------------------------------------------------------------------------- | purges the database and asserts that DUMPFILE represents the state of | the database after the purging. DUMPFILE has to be relative to the dumps | subdirectory in the base directory. | | If the dump stored in DUMPFILE does not reflect the state of the | database after purging, the test is aborted. 
+-------------------------------------------------------------------------- +-------------------------------------------------------------------------- | remove_messages FORMAT MESSAGE1 MESSAGE2 MESSAGE3 ... +-------------------------------------------------------------------------- | removes the given messages from the database. FORMAT has to be either | maildir, mh, or mbox. Depending on FORMAT, MESSAGE refers to a maildir, | mh folder, or mbox in the corresponding subdirectory of the base | directory. | | E.g.: | remove_messages maildir animals | removes all messages in the maildir messages/maildir/animals from the | database. Accordingly, | remove_messages mbox animals | removes all messages in the mbox messages/mbox/animals from the database. +-------------------------------------------------------------------------- * Commands for message matching =========================================== +-------------------------------------------------------------------------- | search_messages DUMPFILE EXPR1 EXPR2 EXPR3 ... +-------------------------------------------------------------------------- | Runs a mairix search using EXPR1 EXPR2 EXPR3 on the database and asserts | that DUMPFILE represents the database /after/ searching. DUMPFILE has to | be relative to the dumps subdirectory in the base directory. | | After a search_messages, specify all the found messages using the | assert_match function and close the search using the | assert_no_more_matches function. +-------------------------------------------------------------------------- +-------------------------------------------------------------------------- | assert_match FORMAT MESSAGE +-------------------------------------------------------------------------- | asserts that the previously conducted search matched the given message. | FORMAT can be either maildir, mh, or mbox and signifies the folder format | of the message to assert. 
MESSAGE is a /single/ message in | messages/maildir/ if FORMAT is maildir, | messages/mh/ if FORMAT is mh, or | messages/mbox_split/ if FORMAT is mbox. | | If the given message has not been matched in the previous search, the | test is aborted. | | E.g.: | assert_match maildir animals/new/1294156254.3884_5.spencer | asserts that messages/maildir/animals/new/1294156254.3884_5.spencer | has been matched by the previous search. | assert_match mbox animals/part.0 | asserts that messages/mbox_split/animals/part.0 (i.e.: the first message | within the mbox messages/mbox/animals) has been matched by the previous | search. +-------------------------------------------------------------------------- +-------------------------------------------------------------------------- | assert_no_more_matches +-------------------------------------------------------------------------- | asserts that all messages matched in the previous search have been | asserted using assert_match. | | If there are still matched, but unasserted messages left from the | previous search, the test is aborted. +-------------------------------------------------------------------------- +-------------------------------------------------------------------------- | log_remaining_matched_unasserted_messages +-------------------------------------------------------------------------- | This function lists pointers to the messages that have been matched in | the previous search, but have not yet been asserted using assert_match. | | This function is mainly useful when developing new tests. +-------------------------------------------------------------------------- * Examples ================================================================ All files ending in .test-spec in this directory serve as simple examples. However, below we provide an annotated example of a test, that should succeed. 
--- example.test-spec --- BEGIN ------------------------------------------- add_messages mh AliceBobEve # adds the mh messages underneath # messages/mh/AliceBobEve search_messages AliceBobEve t:amorous.bob@heart.breaker # Asserts that the current database # corresponds to dumps/AliceBobEve # and conducts a search for messages # where the To header contains # amorous.bob@heart.breaker assert_match mh AliceBobEve/4 # asserts that messages/mh/AliceBobEve/4 # has been matched assert_match mh AliceBobEve/6 # asserts that messages/mh/AliceBobEve/6 # has been matched assert_no_more_matches # asserts that the previous search did # not yield any further matches --- example.test-spec --- END --------------------------------------------- mairix-master/test/dumps/000077500000000000000000000000001224450623700157545ustar00rootroot00000000000000mairix-master/test/dumps/AliceBobEve000066400000000000000000000075651224450623700200140ustar00rootroot00000000000000Dump of database 6 messages 0: FILE messages/mh/AliceBobEve/1, size=279, tid=0 1: FILE messages/mh/AliceBobEve/2, size=355, tid=1 2: FILE messages/mh/AliceBobEve/3, size=341, tid=2 3: FILE messages/mh/AliceBobEve/4, size=379, tid=3 4: FILE messages/mh/AliceBobEve/5, size=250, tid=4 5: FILE messages/mh/AliceBobEve/6, size=383, tid=5 Hash key 00000001 -------------------------------- Contents of table 13 entries Word 0 : 2 4 Word 1 : 3 5 Word 2 : 3 5 Word 3 : 0 1 Word 4 : 3 5 Word 5 : 2 4 Word 6 : 3 5 Word 7 : 0 1 Word 8 : 2 4 Word 9 : 0 1 Word 10 : 0 1 3 5 Word 11 : 2 4 Word 12 : 0 1 -------------------------------- Contents of table 4 entries Word 0 : 2 Word 1 : 2 Word 2 : 2 Word 3 : 2 -------------------------------- Contents of table 21 entries Word 0 : 5 Word 1 : 5 Word 2 : 1 3 Word 3 : 0 2 4 Word 4 : 5 Word 5 : 0 2 4 Word 6 : 5 Word 7 : 0 2 4 Word 8 : 1 3 Word 9 : 5 Word 10 : 0 Word 11 : 5 Word 12 : 0 2 4 Word 13 : 5 Word 14 : 1 3 Word 15 : 5 Word 16 : 5 Word 17 : 0 2 4 5 Word 18 : 1 3 Word 19 : 5 Word 20 : 5 
-------------------------------- Contents of table 17 entries Word 0 : 0 Word 1 : 1 Word 2 : 4 Word 3 : 0 2 3 Word 4 : 2 3 Word 5 : 2 3 Word 6 : 1 Word 7 : 2 3 Word 8 : 0 Word 9 : 0 Word 10 : 2 3 Word 11 : 5 Word 12 : 2 3 Word 13 : 4 Word 14 : 5 Word 15 : 0 Word 16 : 3 -------------------------------- Contents of table 53 entries Word 0 : 1 Word 1 : 5 Word 2 : 5 Word 3 : 0 1 2 3 5 Word 4 : 1 Word 5 : 5 Word 6 : 3 Word 7 : 1 Word 8 : 5 Word 9 : 3 Word 10 : 2 Word 11 : 5 Word 12 : 5 Word 13 : 5 Word 14 : 5 Word 15 : 1 Word 16 : 2 Word 17 : 1 2 Word 18 : 1 3 Word 19 : 1 2 Word 20 : 4 Word 21 : 3 Word 22 : 0 2 3 5 Word 23 : 2 Word 24 : 3 Word 25 : 2 Word 26 : 3 5 Word 27 : 1 Word 28 : 1 Word 29 : 3 Word 30 : 3 Word 31 : 5 Word 32 : 1 Word 33 :
5 Word 34 : 5 Word 35 : 1 2 Word 36 : 1 Word 37 : 1 2 Word 38 : 5 Word 39 : 5 Word 40 : 5 Word 41 : 4 Word 42 : 5 Word 43 : 1 5 Word 44 : 3 Word 45 : 3 Word 46 : 3 Word 47 : 5 Word 48 : 2 Word 49 : 0 1 2 3 Word 50 : 1 2 3 5 Word 51 : 1 2 3 5 Word 52 : 3 -------------------------------- Contents of table 0 entries -------------------------------- Contents of table Chain 0 6 entries Word 0 : 0 Word 1 : 4 Word 2 : 3 Word 3 : 2 Word 4 : 5 Word 5 : 1 Chain 1 6 entries Word 0 : 0 Word 1 : 4 Word 2 : 3 Word 3 : 2 Word 4 : 5 Word 5 : 1 -------------------------------- mairix-master/test/dumps/AliceBobEve-and-BigMessages000066400000000000000000000122321224450623700227260ustar00rootroot00000000000000Dump of database 10 messages 0: FILE messages/mh/AliceBobEve/1, size=279, tid=0 1: FILE messages/mh/AliceBobEve/2, size=355, tid=1 2: FILE messages/mh/AliceBobEve/3, size=341, tid=2 3: FILE messages/mh/AliceBobEve/4, size=379, tid=3 4: FILE messages/mh/AliceBobEve/5, size=250, tid=4 5: FILE messages/mh/AliceBobEve/6, size=383, tid=5 6: FILE messages/mh/BigMessages/1, size=300131, tid=6 7: FILE messages/mh/BigMessages/2, size=530131, tid=7 8: FILE messages/mh/BigMessages/3, size=2438130, tid=8 9: FILE messages/mh/BigMessages/4, size=5035130, tid=9 Hash key 00000001 -------------------------------- Contents of table 17 entries Word 0 : 3 5 Word 1 : 0 1 3 5 Word 2 : 0 1 Word 3 : 3 5 Word 4 : 3 5 Word 5 : 6 7 8 9 Word 6 : 2 4 Word 7 : 6 7 8 9 Word 8 : 2 4 Word 9 : 6 7 8 9 Word 10 : 2 4 Word 11 : 3 5 Word 12 : 6 7 8 9 Word 13 : 0 1 Word 14 : 2 4 Word 15 : 0 1 Word 16 : 0 1 -------------------------------- Contents of table 4 entries Word 0 : 2 Word 1 : 2 Word 2 : 2 Word 3 : 2 -------------------------------- Contents of table 25 entries Word 0 : 0 2 4 Word 1 : 0 2 4 5 Word 2 : 0 2 4 Word 3 : 5 Word 4 : 6 7 8 9 Word 5 : 6 7 8 9 Word 6 : 5 Word 7 : 5 Word 8 : 0 2 4 Word 9 : 5 Word 10 : 1 3 Word 11 : 5 Word 12 :