pinot-1.22/AUTHORS

Pinot is written by : Fabrice Colin

Patches, bug reports and helpful contributions from : Reini Urban Marcus Rueckert Manuel Breitfeld Gauvain Pocentek Oskar Thierry Thomas Neal Becker Gabriel C Roger Mason Reuben Thomas Marco Bazzani Christian Dywan Lee Marks Adel Gadllah Andreas Wagner Claudio Bustos Navarrete 林永忠 Yung-chung Lin David Paleino Michael Biebl Constantin Teodorescu Adrian Bunk C. Scott Ananian Martin Michlmayr John Werden Funda Wang Antoine Jacoutot Jonas Smedegaard Jens Wilhelm Wulf Nikolay Kachanov Martijn Verstrate Kamil Rytarowski Takafumi Arakaki Jelle van der Waa Helmut Grohne Olly Betts Pino Toscano

Some code is borrowed from Xapian Omega, which is "largely the work of Olly Betts, Mark Shinwell (omindex-config) and Richard Boulton", including its HTML parser.

The file Tokenize/filters/HtmlParser.cc is :
Copyright 1999,2000,2001 BrightStation PLC
Copyright 2001 Ananova Ltd
Copyright 2002,2006,2007,2008 Olly Betts
while the file Tokenize/filters/HtmlParser.h is :
Copyright 1999,2000,2001 BrightStation PLC
Copyright 2002,2006,2008 Olly Betts

Code for mktime_from_utc() in Utils/TimeConverter.cpp was borrowed from wget (http://wget.sunsite.dk/).
Translations provided by :
Launchpad - https://translations.launchpad.net/pinot/trunk/+pots/pinot
Chinese (Simplified) - Ashlee Ma & rainofchaos & Aron Xu & Eleanor Chen & mike2718 & happymeng
Chinese (Traditional) - 林永忠 Yung-Chung Lin
Czech - Zbyněk Schwarz
Dutch - Tikkel & JW & Balaam's Miracle & Dirk Roos & Martijn Verstrate & Tico
French - Nicolas Velin & Frédéric Grosshans & Thierry Thomas & verdy_p & Eliovir
German - Christian Dywan & Gena Haltmair & Fabian Affolter & Marco Jahn & pkramerruiz
Hebrew - Yaron & Ddorda
Italian - Michele Angrisano & Vincenzo Consales & Marco Bazzani & Davide Vidal & Simone Sandri
Japanese - Takeo Mizuki & Takafumi Arakaki
Portuguese - _PN_boy & Tiago Silva & Flávio Martins & Bernardo Lopes & Almufadado
Portuguese (Brazil) - Leonardo Melo & Rafael Porto Rodrigues & André Gondim & Henrique P. Machado & andbelo & feen & Adriano Steffler
Russian - Sergey Vostrikov & Alexander 'FONTER' Zinin & Nikolay Kachanov
Spanish - Jesús Tramullas & Garbage & DiegoJ & Juan Miguel Boyero Corral & Matias Fonzo & Fitoschido
Swedish - Daniel Nylander & Zirro

pinot-1.22/COPYING

GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc. 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.)
You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. 
GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. 
You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. 
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. 
However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. 
If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. 
If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. 
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License. 
pinot-1.22/ChangeLog-dijon000066400000000000000000000740301470740426600154600ustar00rootroot00000000000000------------------------------------------------------------------------ r174 | fabricecolin | 2011-10-02 13:00:28 +0800 (Sun, 02 Oct 2011) | 2 lines Changed paths: A /trunk/filters/Exiv2ImageFilter.cc A /trunk/filters/Exiv2ImageFilter.h EXIF, IPTC and XMP filter based on exiv2 (http://www.exiv2.org/). ------------------------------------------------------------------------ r173 | fabricecolin | 2011-02-21 21:54:24 +0800 (Mon, 21 Feb 2011) | 2 lines Changed paths: A /trunk/filters/ChmFilter.cc A /trunk/filters/ChmFilter.h Filter for CHM files based on chmlib from http://www.jedrea.com/chmlib/ ------------------------------------------------------------------------ r172 | fabricecolin | 2011-02-20 18:04:50 +0800 (Sun, 20 Feb 2011) | 2 lines Changed paths: M /trunk/filters/ExternalFilter.cc M /trunk/filters/ExternalFilter.h A /trunk/filters/FileOutputFilter.cc A /trunk/filters/FileOutputFilter.h Moved read_file() to class FileOutputFilter. ------------------------------------------------------------------------ r171 | fabricecolin | 2010-12-15 21:22:55 +0800 (Wed, 15 Dec 2010) | 4 lines Changed paths: M /trunk/filters/ArchiveFilter.cc M /trunk/filters/ArchiveFilter.h M /trunk/filters/ExternalFilter.cc M /trunk/filters/ExternalFilter.h M /trunk/filters/Filter.h M /trunk/filters/GMimeMboxFilter.cc M /trunk/filters/GMimeMboxFilter.h M /trunk/filters/TagLibMusicFilter.cc M /trunk/filters/TarFilter.cc M /trunk/filters/TarFilter.h M /trunk/filters/TextFilter.cc M /trunk/filters/XmlFilter.cc New property MAXIMUM_NESTED_SIZE to limit how much of nested documents filters will return. Extra DEBUG elsewhere. 
------------------------------------------------------------------------ r169 | fabricecolin | 2010-07-12 21:59:08 +0800 (Mon, 12 Jul 2010) | 3 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc Fixed bug that would stop parsing when not being able to get the last part. External parts of unknown access types are assigned application/octet-stream. ------------------------------------------------------------------------ r168 | fabricecolin | 2010-07-05 21:04:27 +0800 (Mon, 05 Jul 2010) | 2 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc M /trunk/filters/GMimeMboxFilter.h Support for messages of type message/external-body. ------------------------------------------------------------------------ r166 | fabricecolin | 2009-10-25 13:05:41 +0800 (Sun, 25 Oct 2009) | 3 lines Changed paths: M /trunk/cjkv/CJKVTokenizer.cc M /trunk/cjkv/CJKVTokenizer.h Don't skip dots, they are useful for acronyms. Both overloads of tokenize() take break_ascii_only_on_space. ------------------------------------------------------------------------ r164 | fabricecolin | 2009-06-27 09:53:49 +0800 (Sat, 27 Jun 2009) | 3 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc GMime 2.4 doesn't require objects returned by g_mime_message_get_mime_part() to be unref'ed. ------------------------------------------------------------------------ r163 | fabricecolin | 2009-06-23 20:06:33 +0800 (Tue, 23 Jun 2009) | 2 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc M /trunk/filters/GMimeMboxFilter.h Only the last message's first part was retrieved, the rest was skipped. ------------------------------------------------------------------------ r162 | fabricecolin | 2009-06-22 20:45:28 +0800 (Mon, 22 Jun 2009) | 2 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc M /trunk/filters/GMimeMboxFilter.h Fix type mismatch with gmime 2.4. Rely on stream length rather than file size. 
------------------------------------------------------------------------ r161 | fabricecolin | 2009-06-21 20:14:34 +0800 (Sun, 21 Jun 2009) | 2 lines Changed paths: M /trunk/filters/HtmlFilter.cc Replace tags with spaces in titles and links too. ------------------------------------------------------------------------ r160 | fabricecolin | 2009-05-31 14:10:25 +0800 (Sun, 31 May 2009) | 2 lines Changed paths: M /trunk/filters/ArchiveFilter.cc M /trunk/filters/TarFilter.cc Since title is set to file names, set mimetype to the new SCANTITLE. ------------------------------------------------------------------------ r159 | fabricecolin | 2009-05-31 14:09:23 +0800 (Sun, 31 May 2009) | 3 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc M /trunk/filters/GMimeMboxFilter.h Get the file name of MIME parts and use that as title, rather than the message's subject. ------------------------------------------------------------------------ r158 | fabricecolin | 2009-05-31 11:37:32 +0800 (Sun, 31 May 2009) | 2 lines Changed paths: M /trunk/filters/external-filters.xml Removed application/x-deb, it's now handled by ArchiveFilter. ------------------------------------------------------------------------ r157 | fabricecolin | 2009-05-31 11:36:18 +0800 (Sun, 31 May 2009) | 3 lines Changed paths: M /trunk/filters/ArchiveFilter.cc M /trunk/filters/GMimeMboxFilter.cc Set the close-on-exec flag, either on open() or after open() with fcntl(). In ArchiveFilter, add support for ar archives and deb's. ------------------------------------------------------------------------ r155 | fabricecolin | 2009-04-13 17:47:49 +0800 (Mon, 13 Apr 2009) | 2 lines Changed paths: M /trunk/cjkv/CJKVTokenizer.cc M /trunk/cjkv/CJKVTokenizer.h Set a limit to the amount of text that's split and tokenized. 
------------------------------------------------------------------------ r153 | fabricecolin | 2009-04-07 22:27:22 +0800 (Tue, 07 Apr 2009) | 2 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc Ensure the data buffer isn't accidentally lost on calls to skip_to_document(). ------------------------------------------------------------------------ r152 | fabricecolin | 2009-04-06 21:19:07 +0800 (Mon, 06 Apr 2009) | 2 lines Changed paths: M /trunk/filters/ArchiveFilter.cc Removed cpio and zip. ------------------------------------------------------------------------ r151 | fabricecolin | 2009-04-05 09:54:34 +0800 (Sun, 05 Apr 2009) | 2 lines Changed paths: M /trunk/filters/ArchiveFilter.cc M /trunk/filters/ArchiveFilter.h M /trunk/filters/TarFilter.cc Completed ArchiveFilter. Minor changes to TarFilter. ------------------------------------------------------------------------ r150 | fabricecolin | 2009-04-02 23:30:22 +0800 (Thu, 02 Apr 2009) | 2 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc Minor changes. ------------------------------------------------------------------------ r149 | fabricecolin | 2009-04-02 23:29:17 +0800 (Thu, 02 Apr 2009) | 2 lines Changed paths: M /trunk/filters/external-filters.xml Type application/x-compress was missing. ------------------------------------------------------------------------ r148 | fabricecolin | 2009-04-02 23:23:28 +0800 (Thu, 02 Apr 2009) | 2 lines Changed paths: M /trunk/filters/ExternalFilter.cc Don't initialize and cleanup libxml2, let the application handle that. ------------------------------------------------------------------------ r147 | fabricecolin | 2009-03-29 17:03:04 +0800 (Sun, 29 Mar 2009) | 4 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc M /trunk/filters/GMimeMboxFilter.h Use the memory stream functions when passed a buffer, instead of dealing only with files. Free the stream after freeing the parser. Don't set the URI to "mailbox://...". 
------------------------------------------------------------------------ r146 | fabricecolin | 2009-03-29 17:00:13 +0800 (Sun, 29 Mar 2009) | 2 lines Changed paths: M /trunk/filters/ExifImageFilter.cc Minor change. ------------------------------------------------------------------------ r145 | fabricecolin | 2009-03-25 21:38:25 +0800 (Wed, 25 Mar 2009) | 2 lines Changed paths: M /trunk/filters/ArchiveFilter.cc Set ipath to "f=file_name". ------------------------------------------------------------------------ r144 | fabricecolin | 2009-03-23 23:22:16 +0800 (Mon, 23 Mar 2009) | 3 lines Changed paths: A /trunk/filters/ArchiveFilter.cc A /trunk/filters/ArchiveFilter.h M /trunk/filters/external-filters.xml Filter based on libarchive for tar and cpio files, compressed or not, zip files and ISO images. ------------------------------------------------------------------------ r143 | fabricecolin | 2009-03-23 23:13:30 +0800 (Mon, 23 Mar 2009) | 2 lines Changed paths: M /trunk/filters/TarFilter.cc Reset the handle on rewind(). ------------------------------------------------------------------------ r142 | fabricecolin | 2009-03-22 22:00:47 +0800 (Sun, 22 Mar 2009) | 3 lines Changed paths: M /trunk/filters/ExifImageFilter.cc M /trunk/filters/ExternalFilter.cc M /trunk/filters/Filter.cc M /trunk/filters/Filter.h M /trunk/filters/GMimeMboxFilter.cc M /trunk/filters/GMimeMboxFilter.h M /trunk/filters/HtmlFilter.cc M /trunk/filters/HtmlFilter.h M /trunk/filters/TagLibMusicFilter.cc M /trunk/filters/TarFilter.cc M /trunk/filters/TextFilter.cc M /trunk/filters/XmlFilter.cc Content is handled separately from metadata. Filters expect the application to provide a Memory.h that defines a basic_string subclass named dstring.
------------------------------------------------------------------------ r141 | fabricecolin | 2009-03-22 16:24:10 +0800 (Sun, 22 Mar 2009) | 2 lines Changed paths: M /trunk/filters/external-filters.xml For OpenXML Presentation and Sheet we didn't get the right file(s) out. ------------------------------------------------------------------------ r140 | fabricecolin | 2009-03-10 19:56:28 +0800 (Tue, 10 Mar 2009) | 3 lines Changed paths: A /trunk/filters/TarFilter.cc A /trunk/filters/TarFilter.h M /trunk/filters/external-filters.xml TarFilter is a libtar-based filter to extract files from tar archives. Modified the rules for compressed tar files in external-filters.xml accordingly. ------------------------------------------------------------------------ r138 | fabricecolin | 2009-03-05 21:51:00 +0800 (Thu, 05 Mar 2009) | 2 lines Changed paths: M /trunk/filters/HtmlFilter.cc Fixed findCharset(). ------------------------------------------------------------------------ r137 | fabricecolin | 2009-03-05 21:19:24 +0800 (Thu, 05 Mar 2009) | 3 lines Changed paths: M /trunk/filters/ExternalFilter.cc M /trunk/filters/ExternalFilter.h M /trunk/filters/HtmlFilter.cc If ExternalFilter is getting plain text, it will only grab the first 5Mb. Less chatty DEBUG in HtmlFilter. ------------------------------------------------------------------------ r136 | fabricecolin | 2009-02-28 14:39:52 +0800 (Sat, 28 Feb 2009) | 3 lines Changed paths: M /trunk/filters/Filter.h M /trunk/filters/GMimeMboxFilter.cc Allow initializing of dynamic filters when they are loaded with GCC's __attribute__((constructor)). ------------------------------------------------------------------------ r135 | fabricecolin | 2009-02-28 10:32:22 +0800 (Sat, 28 Feb 2009) | 3 lines Changed paths: M /trunk/filters/HtmlFilter.cc M /trunk/filters/HtmlFilter.h A /trunk/filters/HtmlParser.cc A /trunk/filters/HtmlParser.h Imported Xapian Omega's htmlparse.cc/h files, made some small changes. 
Rewrote HtmlFilter to use that instead of libxml2. ------------------------------------------------------------------------ r134 | fabricecolin | 2009-02-15 22:39:04 +0800 (Sun, 15 Feb 2009) | 3 lines Changed paths: M /trunk/filters/external-filters.xml A hack to prevent rpm from choking on text files mistakenly identified as RPMs by a broken shared-mime. ------------------------------------------------------------------------ r132 | fabricecolin | 2009-01-11 12:53:43 +0800 (Sun, 11 Jan 2009) | 2 lines Changed paths: M /trunk/filters/external-filters.xml Get pdftotext to output UTF-8 text. ------------------------------------------------------------------------ r131 | fabricecolin | 2009-01-08 19:25:00 +0800 (Thu, 08 Jan 2009) | 2 lines Changed paths: M /trunk/cjkv/CJKVTokenizer.cc Treat all punctuation as space. ------------------------------------------------------------------------ r130 | fabricecolin | 2008-12-31 21:42:29 +0800 (Wed, 31 Dec 2008) | 3 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc Patch for gmime 2.4 support by Adel Gadllah. If GMIME_ENABLE_RFC2047_WORKAROUNDS is defined, assume gmime 2.4. ------------------------------------------------------------------------ r129 | fabricecolin | 2008-12-31 21:11:39 +0800 (Wed, 31 Dec 2008) | 2 lines Changed paths: M /trunk/cjkv/CJKVTokenizer.cc Less DEBUG. ------------------------------------------------------------------------ r128 | fabricecolin | 2008-12-16 21:37:52 +0800 (Tue, 16 Dec 2008) | 3 lines Changed paths: M /trunk/cjkv/CJKVTokenizer.cc In _unicode_to_char(), convert all Unicode spaces including the "non-breaking space" character at code point 160 to a single character. ------------------------------------------------------------------------ r127 | fabricecolin | 2008-12-13 13:38:05 +0800 (Sat, 13 Dec 2008) | 2 lines Changed paths: M /trunk/filters/ExifImageFilter.cc Fix previous commit.
------------------------------------------------------------------------ r126 | fabricecolin | 2008-12-13 13:36:00 +0800 (Sat, 13 Dec 2008) | 2 lines Changed paths: M /trunk/filters/ExifImageFilter.cc If strptime() isn't available, attempt to parse EXIF_TAG_DATE_TIME bit by bit. ------------------------------------------------------------------------ r125 | fabricecolin | 2008-12-07 12:58:29 +0800 (Sun, 07 Dec 2008) | 2 lines Changed paths: M /trunk/filters/FilterFactory.cc M /trunk/filters/HtmlFilter.cc M /trunk/xesam/XapianQueryBuilder.cc Include config.h before checking any of the "HAVE_" ifdef's. ------------------------------------------------------------------------ r124 | fabricecolin | 2008-12-06 09:22:40 +0800 (Sat, 06 Dec 2008) | 2 lines Changed paths: M /trunk/filters/TagLibMusicFilter.cc GCC 4.4 patch by Martin Michlmayr (Debian bug #504908). ------------------------------------------------------------------------ r123 | fabricecolin | 2008-12-02 23:59:34 +0800 (Tue, 02 Dec 2008) | 2 lines Changed paths: M /trunk/xesam/XapianQueryBuilder.cc Support for the date field is conditioned by strptime(). ------------------------------------------------------------------------ r122 | fabricecolin | 2008-12-01 21:03:18 +0800 (Mon, 01 Dec 2008) | 2 lines Changed paths: M /trunk/filters/ExternalFilter.cc M /trunk/filters/Filter.h M /trunk/filters/FilterFactory.cc M /trunk/filters/GMimeMboxFilter.cc Portability fixes, brought up when compiling with MingW. ------------------------------------------------------------------------ r121 | fabricecolin | 2008-11-08 13:39:58 +0800 (Sat, 08 Nov 2008) | 2 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc Look out for the X-Evolution header. ------------------------------------------------------------------------ r120 | fabricecolin | 2008-10-18 18:31:54 +0800 (Sat, 18 Oct 2008) | 2 lines Changed paths: M /trunk/filters/ExternalFilter.cc Get CDATA online. In run_command(), expect several %s. 
------------------------------------------------------------------------ r119 | fabricecolin | 2008-09-30 21:54:59 +0800 (Tue, 30 Sep 2008) | 2 lines Changed paths: M /trunk/filters/ExifImageFilter.cc The conversion specifier %z is a GNU extension to strftime(). ------------------------------------------------------------------------ r117 | fabricecolin | 2008-09-16 22:05:12 +0800 (Tue, 16 Sep 2008) | 2 lines Changed paths: M /trunk/filters/FilterFactory.cc isSupportedType() would claim plain text and XML were not supported ! ------------------------------------------------------------------------ r115 | fabricecolin | 2008-09-09 22:33:55 +0800 (Tue, 09 Sep 2008) | 2 lines Changed paths: M /trunk/filters/HtmlFilter.cc Check for vsnprintf(). ------------------------------------------------------------------------ r114 | fabricecolin | 2008-08-22 21:50:04 +0800 (Fri, 22 Aug 2008) | 4 lines Changed paths: M /trunk/filters/external-filters.xml If a filter's output element is set to SCAN, the application is supposed to scan it for its MIME type. Added two such filters for application/x-gzip and x-bzip. ------------------------------------------------------------------------ r113 | fabricecolin | 2008-08-15 21:02:48 +0800 (Fri, 15 Aug 2008) | 3 lines Changed paths: M /trunk/filters/FilterFactory.cc Don't assume that all MIME types of class text can be handled by the plain text filter. ------------------------------------------------------------------------ r112 | fabricecolin | 2008-08-04 22:27:19 +0800 (Mon, 04 Aug 2008) | 2 lines Changed paths: M /trunk/cjkv/CJKVTokenizer.h M /trunk/filters/ExifImageFilter.cc M /trunk/filters/ExternalFilter.cc M /trunk/filters/Filter.h M /trunk/filters/GMimeMboxFilter.cc M /trunk/filters/HtmlFilter.cc M /trunk/filters/HtmlFilter.h M /trunk/filters/TagLibMusicFilter.cc M /trunk/filters/TextFilter.h M /trunk/filters/XmlFilter.cc If gcc 4.*, export filter entry points and base classes explicitly.
------------------------------------------------------------------------ r111 | fabricecolin | 2008-07-26 13:59:03 +0800 (Sat, 26 Jul 2008) | 3 lines Changed paths: M /trunk/filters/external-filters.xml Support for docx formats, as suggested by Frank Bruzzaniti on the xapian-discuss mailing list. ------------------------------------------------------------------------ r109 | fabricecolin | 2008-07-19 15:37:59 +0800 (Sat, 19 Jul 2008) | 2 lines Changed paths: M /trunk/filters/HtmlFilter.cc Only check for META keywords, don't add them to the META tags map. ------------------------------------------------------------------------ r108 | fabricecolin | 2008-07-05 14:01:07 +0800 (Sat, 05 Jul 2008) | 2 lines Changed paths: M /trunk/filters/external-filters.xml Run pdftotext in raw mode, it helps with columns. ------------------------------------------------------------------------ r105 | fabricecolin | 2008-06-21 11:55:16 +0800 (Sat, 21 Jun 2008) | 2 lines Changed paths: M /trunk/filters/ExifImageFilter.cc M /trunk/filters/ExifImageFilter.h M /trunk/filters/ExternalFilter.cc M /trunk/filters/ExternalFilter.h M /trunk/filters/Filter.cc M /trunk/filters/Filter.h M /trunk/filters/FilterFactory.cc M /trunk/filters/FilterFactory.h M /trunk/filters/GMimeMboxFilter.cc M /trunk/filters/GMimeMboxFilter.h M /trunk/filters/HtmlFilter.cc M /trunk/filters/HtmlFilter.h M /trunk/filters/TagLibMusicFilter.cc M /trunk/filters/TagLibMusicFilter.h M /trunk/filters/TextFilter.cc M /trunk/filters/TextFilter.h M /trunk/filters/XmlFilter.cc M /trunk/filters/XmlFilter.h Filters are under the GPL. ------------------------------------------------------------------------ r104 | fabricecolin | 2008-06-18 20:30:54 +0800 (Wed, 18 Jun 2008) | 2 lines Changed paths: M /trunk/filters/ExifImageFilter.cc Don't call TimeConverter here. 
------------------------------------------------------------------------ r103 | fabricecolin | 2008-06-09 23:31:54 +0800 (Mon, 09 Jun 2008) | 2 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc No need to reinvent the wheel when we can just use g_mime_message_get_subject(). ------------------------------------------------------------------------ r102 | fabricecolin | 2008-05-28 22:29:18 +0800 (Wed, 28 May 2008) | 2 lines Changed paths: M /trunk/cjkv/CJKVTokenizer.cc M /trunk/cjkv/CJKVTokenizer.h M /trunk/filters/HtmlFilter.cc Cosmetic changes. ------------------------------------------------------------------------ r101 | fabricecolin | 2008-05-23 21:42:32 +0800 (Fri, 23 May 2008) | 2 lines Changed paths: M /trunk/filters/HtmlFilter.cc Skip htdig_noindex blocks. ------------------------------------------------------------------------ r100 | fabricecolin | 2008-05-19 18:01:29 +0800 (Mon, 19 May 2008) | 2 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc Decode base64-encoded ISO-2022-JP subject lines. ------------------------------------------------------------------------ r99 | fabricecolin | 2008-05-11 17:16:45 +0800 (Sun, 11 May 2008) | 3 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc M /trunk/filters/GMimeMboxFilter.h Don't use charset filters but remember what the part's charset is and let the calling application handle conversion. ------------------------------------------------------------------------ r98 | fabricecolin | 2008-05-06 22:53:36 +0800 (Tue, 06 May 2008) | 5 lines Changed paths: M /trunk/filters/Filter.cc M /trunk/filters/Filter.h M /trunk/filters/HtmlFilter.cc M /trunk/filters/HtmlFilter.h Filters may need to convert text to UTF-8 before/while processing documents. For instance, libxml2 used by the HTML filter doesn't handle gb2312 (at least when built without iconv support ?) and yet produces UTF-8 output. Fixed confusion about what charset should be set to in the metadata map. 
------------------------------------------------------------------------ r97 | fabricecolin | 2008-05-03 14:00:01 +0800 (Sat, 03 May 2008) | 2 lines Changed paths: M /trunk/cjkv/CJKVTokenizer.cc When a switch from non-CJKV to CJKV is detected, don't consume the first byte. ------------------------------------------------------------------------ r96 | fabricecolin | 2008-04-29 20:27:41 +0800 (Tue, 29 Apr 2008) | 2 lines Changed paths: M /trunk/filters/ExifImageFilter.cc Grab all tags. ------------------------------------------------------------------------ r95 | fabricecolin | 2008-04-26 14:51:35 +0800 (Sat, 26 Apr 2008) | 2 lines Changed paths: M /trunk/cjkv/CJKVTokenizer.cc M /trunk/cjkv/CJKVTokenizer.h Allow to break ASCII tokens only on space. ------------------------------------------------------------------------ r94 | fabricecolin | 2008-04-26 13:05:55 +0800 (Sat, 26 Apr 2008) | 2 lines Changed paths: A /trunk/filters/ExifImageFilter.cc A /trunk/filters/ExifImageFilter.h Experimental EXIF filter. ------------------------------------------------------------------------ r93 | fabricecolin | 2008-04-20 18:24:11 +0800 (Sun, 20 Apr 2008) | 3 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc If a part has a charset specified, and it's not UTF-8, install a filter to convert to UTF-8. ------------------------------------------------------------------------ r92 | fabricecolin | 2008-04-11 23:47:33 +0800 (Fri, 11 Apr 2008) | 3 lines Changed paths: M /trunk/filters/TagLibMusicFilter.cc M /trunk/filters/TagLibMusicFilter.h TagLibMusicFilter only supports the input type DOCUMENT_FILE_NAME, contrary to what it advertised. ------------------------------------------------------------------------ r90 | fabricecolin | 2008-03-24 13:11:29 +0800 (Mon, 24 Mar 2008) | 2 lines Changed paths: M /trunk/filters/ExternalFilter.cc M /trunk/filters/HtmlFilter.cc Minor modifications. 
------------------------------------------------------------------------ r89 | fabricecolin | 2008-03-10 19:06:19 +0800 (Mon, 10 Mar 2008) | 2 lines Changed paths: M /trunk/cjkv/CJKVTokenizer.cc Includes fix for gcc 4.3 by Adel Gadllah. ------------------------------------------------------------------------ r88 | fabricecolin | 2008-03-09 01:38:56 +0800 (Sun, 09 Mar 2008) | 3 lines Changed paths: M /trunk/cjkv/CJKVTokenizer.cc M /trunk/cjkv/CJKVTokenizer.h Merging in upstream changes, i.e. a new segment() method and an overload of tokenize() that works with a TokensHandler object. ------------------------------------------------------------------------ r86 | fabricecolin | 2008-02-27 23:17:16 +0800 (Wed, 27 Feb 2008) | 2 lines Changed paths: M /trunk/filters/ExternalFilter.cc M /trunk/filters/HtmlFilter.cc Added missing includes for gcc 4.3. Patch by Adel Gadllah. ------------------------------------------------------------------------ r85 | fabricecolin | 2008-02-26 21:18:39 +0800 (Tue, 26 Feb 2008) | 2 lines Changed paths: M /trunk/cjkv/CJKVTokenizer.cc Break non-CJKV tokens on all spaces and punctuations. ------------------------------------------------------------------------ r84 | fabricecolin | 2008-02-24 19:09:01 +0800 (Sun, 24 Feb 2008) | 2 lines Changed paths: M /trunk/filters/HtmlFilter.cc M /trunk/filters/HtmlFilter.h Do without hashing. ------------------------------------------------------------------------ r83 | fabricecolin | 2008-02-24 19:07:55 +0800 (Sun, 24 Feb 2008) | 2 lines Changed paths: M /trunk/cjkv/CJKVTokenizer.cc M /trunk/cjkv/test.cc Don't fail on spaces in has_cjkv_only(). ------------------------------------------------------------------------ r82 | fabricecolin | 2008-02-22 22:12:23 +0800 (Fri, 22 Feb 2008) | 2 lines Changed paths: M /trunk/cjkv/CJKVTokenizer.cc Cosmetic fix.
------------------------------------------------------------------------ r81 | fabricecolin | 2008-02-22 22:11:12 +0800 (Fri, 22 Feb 2008) | 4 lines Changed paths: M /trunk/xesam/XapianQueryBuilder.cc M /trunk/xesam/XapianQueryBuilder.h M /trunk/xesam/XesamParser.h M /trunk/xesam/XesamQLParser.cc M /trunk/xesam/XesamQLParser.h M /trunk/xesam/XesamQueryBuilder.cc M /trunk/xesam/XesamQueryBuilder.h M /trunk/xesam/XesamULParser.cc M /trunk/xesam/XesamULParser.h Made XesamQueryBuilder::on_user_query() pure virtual, moved XesamULParser-based implementation to XapianQueryBuilder. Log stuff with XESAM_LOG macros, which we expect apps to define in XesamLog.h. ------------------------------------------------------------------------ r80 | fabricecolin | 2008-02-18 19:28:29 +0800 (Mon, 18 Feb 2008) | 3 lines Changed paths: M /trunk/filters/ExternalFilter.cc M /trunk/filters/ExternalFilter.h M /trunk/filters/external-filters.xml Each filter block can specify a charset element if the output of the program is in a known charset. ------------------------------------------------------------------------ r79 | fabricecolin | 2008-02-18 19:25:07 +0800 (Mon, 18 Feb 2008) | 6 lines Changed paths: A /trunk/cjkv/CJKVTokenizer.cc (from /trunk/cjkv/cjkv-tokenizer.cc:77) A /trunk/cjkv/CJKVTokenizer.h (from /trunk/cjkv/cjkv-tokenizer.hh:77) D /trunk/cjkv/cjkv-tokenizer.cc D /trunk/cjkv/cjkv-tokenizer.hh M /trunk/cjkv/makefile M /trunk/cjkv/test.cc Added wrapper functions around gunicode API, enabled when HAVE_UNICODE_H isn't defined. This variable should be defined only if libunicode 0.4 is available. Renamed the main class, made several mostly cosmetic changes to make it look like code from xesam/ and filters/ and added copyright and licensing headers. Thanks to Yung-Chung Lin for agreeing on licensing this under the LGPL. 
------------------------------------------------------------------------ r77 | fabricecolin | 2008-01-14 22:35:44 +0800 (Mon, 14 Jan 2008) | 6 lines Changed paths: M /trunk/xesam/XapianQueryBuilder.cc M /trunk/xesam/XapianQueryBuilder.h M /trunk/xesam/XesamQLParser.cc Implemented support for size (on field "file:size") and time ranges, Proximity with any number of values, negation, as well as for content (but not source) filtering, either with the Category selector or the Query element. The latter would require the proper classification of documents, and is therefore extremely basic. ------------------------------------------------------------------------ r76 | fabricecolin | 2008-01-14 22:22:42 +0800 (Mon, 14 Jan 2008) | 3 lines Changed paths: M /trunk/xesam/XesamQueryBuilder.cc M /trunk/xesam/XesamQueryBuilder.h M /trunk/xesam/XesamULParser.cc Look only for HAVE_BOOST_SPIRIT to determine whether Spirit is available. Modified prototypes of on_query methods. ------------------------------------------------------------------------ r75 | fabricecolin | 2008-01-12 21:35:45 +0800 (Sat, 12 Jan 2008) | 2 lines Changed paths: M /trunk/xesam/XesamQLParser.cc M /trunk/xesam/XesamQueryBuilder.h The Type selector was apparently replaced with Category. ------------------------------------------------------------------------ r74 | fabricecolin | 2008-01-05 12:53:44 +0800 (Sat, 05 Jan 2008) | 2 lines Changed paths: M /trunk/filters/GMimeMboxFilter.cc This can also process text/x-mail and text/x-news. ------------------------------------------------------------------------ r73 | xern | 2007-12-24 11:36:38 +0800 (Mon, 24 Dec 2007) | 1 line Changed paths: A /trunk/cjkv A /trunk/cjkv/cjkv-tokenizer.cc A /trunk/cjkv/cjkv-tokenizer.hh A /trunk/cjkv/makefile A /trunk/cjkv/test.cc The first commit of cjkv tokenizer.
------------------------------------------------------------------------ r72 | fabricecolin | 2007-11-29 21:21:44 +0800 (Thu, 29 Nov 2007) | 2 lines Changed paths: A /trunk/filters/external-filters.xml This file is needed too. ------------------------------------------------------------------------ r71 | fabricecolin | 2007-11-29 20:52:11 +0800 (Thu, 29 Nov 2007) | 2 lines Changed paths: M /trunk/xesam/XapianQueryBuilder.cc Assume the caller set the appropriate ValueRangeProcessor's on the QueryParser. ------------------------------------------------------------------------ r70 | fabricecolin | 2007-11-29 20:44:37 +0800 (Thu, 29 Nov 2007) | 3 lines Changed paths: A /trunk/filters A /trunk/filters/ExternalFilter.cc A /trunk/filters/ExternalFilter.h A /trunk/filters/Filter.cc A /trunk/filters/Filter.h A /trunk/filters/FilterFactory.cc A /trunk/filters/FilterFactory.h A /trunk/filters/GMimeMboxFilter.cc A /trunk/filters/GMimeMboxFilter.h A /trunk/filters/HtmlFilter.cc A /trunk/filters/HtmlFilter.h A /trunk/filters/TagLibMusicFilter.cc A /trunk/filters/TagLibMusicFilter.h A /trunk/filters/TextFilter.cc A /trunk/filters/TextFilter.h A /trunk/filters/XmlFilter.cc A /trunk/filters/XmlFilter.h A /trunk/xesam A /trunk/xesam/XapianQueryBuilder.cc A /trunk/xesam/XapianQueryBuilder.h A /trunk/xesam/XesamParser.h A /trunk/xesam/XesamQLParser.cc A /trunk/xesam/XesamQLParser.h A /trunk/xesam/XesamQueryBuilder.cc A /trunk/xesam/XesamQueryBuilder.h A /trunk/xesam/XesamULParser.cc A /trunk/xesam/XesamULParser.h Current code imported from branches/pinot. All this no longer depends on Pinot's code so there's no reason to keep it in a branch. ------------------------------------------------------------------------ r1 | fabricecolin | 2007-01-27 11:53:54 +0800 (Sat, 27 Jan 2007) | 2 lines Changed paths: A /trunk Trunk. 
------------------------------------------------------------------------ pinot-1.22/ChangeLog-svn000066400000000000000000026221211470740426600151650ustar00rootroot00000000000000------------------------------------------------------------------------ r1938 | fabrice.colin@gmail.com | 2015-06-13 09:31:43 +0200 (Sat, 13 Jun 2015) | 2 lines Changed paths: M /trunk/aclocal.m4 Regenerated. ------------------------------------------------------------------------ r1937 | fabrice.colin@gmail.com | 2015-06-11 17:01:35 +0200 (Thu, 11 Jun 2015) | 2 lines Changed paths: M /trunk/NEWS M /trunk/README M /trunk/configure.in M /trunk/po/cs.po M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Updated for v1.09. ------------------------------------------------------------------------ r1936 | fabrice.colin@gmail.com | 2015-06-10 21:57:59 +0200 (Wed, 10 Jun 2015) | 2 lines Changed paths: A /trunk/Tokenize/filters/JsonFilter.cc A /trunk/Tokenize/filters/JsonFilter.h Simple JSON filter. ------------------------------------------------------------------------ r1935 | fabrice.colin@gmail.com | 2015-06-10 21:54:27 +0200 (Wed, 10 Jun 2015) | 2 lines Changed paths: M /trunk/Tokenize/filters/GMimeMboxFilter.cc M /trunk/Tokenize/filters/GMimeMboxFilter.h Better date extraction. ------------------------------------------------------------------------ r1934 | fabrice.colin@gmail.com | 2015-06-10 21:44:00 +0200 (Wed, 10 Jun 2015) | 3 lines Changed paths: M /trunk/Core/PinotSettings.cpp M /trunk/Core/pinot-search.cpp M /trunk/IndexSearch/QueryProperties.h M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/queryDialog.cc Sort by date in descending or ascending order. Updated Copyright notice. 
------------------------------------------------------------------------ r1933 | fabrice.colin@gmail.com | 2014-12-19 23:13:55 +0100 (Fri, 19 Dec 2014) | 2 lines Changed paths: M /trunk/NEWS Fixed language mixup... ------------------------------------------------------------------------ r1932 | fabrice.colin@gmail.com | 2014-12-19 23:12:37 +0100 (Fri, 19 Dec 2014) | 2 lines Changed paths: M /trunk/Core/pinot-index.cpp Look for filters in the same places as the daemon and the GUI. ------------------------------------------------------------------------ r1931 | fabrice.colin@gmail.com | 2014-12-19 23:10:24 +0100 (Fri, 19 Dec 2014) | 2 lines Changed paths: M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h Helper method reopen(). ------------------------------------------------------------------------ r1929 | fabrice.colin@gmail.com | 2014-07-19 10:20:15 +0200 (Sat, 19 Jul 2014) | 2 lines Changed paths: M /trunk/ChangeLog Current log. ------------------------------------------------------------------------ r1928 | fabrice.colin@gmail.com | 2014-07-18 21:15:03 +0200 (Fri, 18 Jul 2014) | 2 lines Changed paths: M /trunk/NEWS M /trunk/TODO 1.08 news. ------------------------------------------------------------------------ r1927 | fabrice.colin@gmail.com | 2014-07-18 21:14:21 +0200 (Fri, 18 Jul 2014) | 2 lines Changed paths: M /trunk/po/cs.po M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Current POs. ------------------------------------------------------------------------ r1926 | fabrice.colin@gmail.com | 2014-07-18 21:04:27 +0200 (Fri, 18 Jul 2014) | 3 lines Changed paths: M /trunk/configure.in M /trunk/po/POTFILES.in Preparing for v1.08. Added --enable-libnotify switch for BSD. 
------------------------------------------------------------------------ r1925 | fabrice.colin@gmail.com | 2014-07-18 21:03:35 +0200 (Fri, 18 Jul 2014) | 2 lines Changed paths: M /trunk/Tokenize/filters/FilterFactory.cc Look for LLVM-mangled names if built with LLVM. ------------------------------------------------------------------------ r1924 | fabrice.colin@gmail.com | 2014-07-17 21:57:37 +0200 (Thu, 17 Jul 2014) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp Fallback if the header wasn't quoted. ------------------------------------------------------------------------ r1923 | fabrice.colin@gmail.com | 2014-07-12 18:43:47 +0200 (Sat, 12 Jul 2014) | 3 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp Set the HTTP response code as document field ResponseCode. Likewise for all HTTP headers seen in the response. ------------------------------------------------------------------------ r1922 | fabrice.colin@gmail.com | 2014-07-12 18:42:01 +0200 (Sat, 12 Jul 2014) | 2 lines Changed paths: M /trunk/Tokenize/filters/FilterFactory.cc Close the handle and move on if FILTERTYPESFUNC can't be looked up! ------------------------------------------------------------------------ r1921 | fabrice.colin@gmail.com | 2014-07-12 18:40:49 +0200 (Sat, 12 Jul 2014) | 2 lines Changed paths: M /trunk/README M /trunk/UI/GTK2/src/mainWindow.cc Updated year. ------------------------------------------------------------------------ r1920 | fabrice.colin@gmail.com | 2014-07-12 18:17:07 +0200 (Sat, 12 Jul 2014) | 2 lines Changed paths: M /trunk/Utils/Makefile.am Never install the xdgmime files. ------------------------------------------------------------------------ r1919 | fabrice.colin@gmail.com | 2014-07-04 22:27:56 +0200 (Fri, 04 Jul 2014) | 2 lines Changed paths: M /trunk/SQL/SQLDB.h M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h Simplified. 
------------------------------------------------------------------------ r1918 | fabrice.colin@gmail.com | 2014-06-28 20:08:34 +0200 (Sat, 28 Jun 2014) | 2 lines Changed paths: M /trunk/SQL/SQLDB.h M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h Provide values in the right order... ------------------------------------------------------------------------ r1917 | fabrice.colin@gmail.com | 2014-06-28 15:33:41 +0200 (Sat, 28 Jun 2014) | 2 lines Changed paths: M /trunk/SQL/SQLDB.h M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h One more overload of executePreparedStatement() for when no result is expected. ------------------------------------------------------------------------ r1916 | fabrice.colin@gmail.com | 2014-06-28 12:51:03 +0200 (Sat, 28 Jun 2014) | 2 lines Changed paths: M /trunk/Tokenize/filters/GMimeMboxFilter.cc Fix type warning. ------------------------------------------------------------------------ r1915 | fabrice.colin@gmail.com | 2014-06-28 12:50:36 +0200 (Sat, 28 Jun 2014) | 3 lines Changed paths: M /trunk/SQL/Makefile.am M /trunk/UI/GTK2/src/Makefile.am M /trunk/Utils/TimeConverter.h Split libSQL into libSQL, libSQLite and libSQLDB. Some cosmetic changes. ------------------------------------------------------------------------ r1914 | fabrice.colin@gmail.com | 2014-06-28 12:48:41 +0200 (Sat, 28 Jun 2014) | 3 lines Changed paths: M /trunk/Core/Makefile.am A /trunk/Core/WorkerThread.cpp A /trunk/Core/WorkerThread.h M /trunk/Core/WorkerThreads.cpp M /trunk/Core/WorkerThreads.h Moved base classes to WorkerThread, with no dependency on PinotSettings nor DB-based classes. ------------------------------------------------------------------------ r1913 | fabrice.colin@gmail.com | 2014-06-27 20:36:31 +0200 (Fri, 27 Jun 2014) | 2 lines Changed paths: M /trunk/SQL/SQLDB.h M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h Added rollbackTransaction() and a executePreparedStatement() overload to SQLDB. 
------------------------------------------------------------------------ r1912 | fabrice.colin@gmail.com | 2014-06-27 20:28:05 +0200 (Fri, 27 Jun 2014) | 2 lines Changed paths: M /trunk/Utils/TimeConverter.cpp M /trunk/Utils/TimeConverter.h Export timegm(). ------------------------------------------------------------------------ r1910 | fabrice.colin@gmail.com | 2014-05-25 11:11:26 +0200 (Sun, 25 May 2014) | 2 lines Changed paths: M /trunk/NEWS Correct date. ------------------------------------------------------------------------ r1909 | fabrice.colin@gmail.com | 2014-05-22 20:30:42 +0200 (Thu, 22 May 2014) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/NEWS Current NEWS and log. ------------------------------------------------------------------------ r1908 | fabrice.colin@gmail.com | 2014-05-22 20:25:13 +0200 (Thu, 22 May 2014) | 3 lines Changed paths: M /trunk/aclocal.m4 M /trunk/po/cs.po M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Updated POs. With aclocal 1.13.4. ------------------------------------------------------------------------ r1907 | fabrice.colin@gmail.com | 2014-05-22 20:22:34 +0200 (Thu, 22 May 2014) | 2 lines Changed paths: M /trunk/Tokenize/TextConverter.cpp M /trunk/Tokenize/TextConverter.h Convert to and from UTF-8. ------------------------------------------------------------------------ r1906 | fabrice.colin@gmail.com | 2014-05-20 20:31:22 +0200 (Tue, 20 May 2014) | 2 lines Changed paths: M /trunk/README M /trunk/configure.in M /trunk/pinot.spec.in Brought back mempool build option, with boost 1.54. 
------------------------------------------------------------------------ r1905 | fabrice.colin@gmail.com | 2014-05-19 21:47:16 +0200 (Mon, 19 May 2014) | 2 lines Changed paths: M /trunk/IndexSearch/OpenSearchParser.cpp M /trunk/IndexSearch/PluginWebEngine.cpp M /trunk/IndexSearch/SherlockParser.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/Xapian/XapianIndex.h M /trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/filters/ArchiveFilter.cc M /trunk/Tokenize/filters/ArchiveFilter.h M /trunk/Tokenize/filters/ChmFilter.cc M /trunk/Tokenize/filters/ChmFilter.h M /trunk/Tokenize/filters/ExifImageFilter.cc M /trunk/Tokenize/filters/ExifImageFilter.h M /trunk/Tokenize/filters/Exiv2ImageFilter.cc M /trunk/Tokenize/filters/Exiv2ImageFilter.h M /trunk/Tokenize/filters/ExternalFilter.cc M /trunk/Tokenize/filters/ExternalFilter.h M /trunk/Tokenize/filters/Filter.h M /trunk/Tokenize/filters/GMimeMboxFilter.cc M /trunk/Tokenize/filters/GMimeMboxFilter.h M /trunk/Tokenize/filters/HtmlFilter.cc M /trunk/Tokenize/filters/HtmlFilter.h M /trunk/Tokenize/filters/TagLibMusicFilter.cc M /trunk/Tokenize/filters/TagLibMusicFilter.h M /trunk/Tokenize/filters/TarFilter.cc M /trunk/Tokenize/filters/TarFilter.h M /trunk/Tokenize/filters/TextFilter.cc M /trunk/Tokenize/filters/TextFilter.h M /trunk/Tokenize/filters/XmlFilter.cc M /trunk/Tokenize/filters/XmlFilter.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/Utils/Document.cpp M /trunk/Utils/Document.h Prefer off_t for file sizes, offsets, buffer lengths. ------------------------------------------------------------------------ r1904 | fabrice.colin@gmail.com | 2014-05-19 21:42:33 +0200 (Mon, 19 May 2014) | 2 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp Try and reapply a watch on locations that have been deleted. 
------------------------------------------------------------------------ r1903 | fabrice.colin@gmail.com | 2013-12-02 19:33:54 +0100 (Mon, 02 Dec 2013) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/UI/GTK2/src/mainWindow.cc Fix for http://code.google.com/p/pinot-search/issues/detail?id=20, patch from Jelle van der Waa. ------------------------------------------------------------------------ r1902 | fabrice.colin@gmail.com | 2013-12-01 13:12:28 +0100 (Sun, 01 Dec 2013) | 2 lines Changed paths: M /trunk/configure.in M /trunk/pinot.spec.in Use PKG_CHECK_MODULES() for libexttextcat too, require help2man at build time. ------------------------------------------------------------------------ r1901 | fabrice.colin@gmail.com | 2013-10-17 11:53:21 +0200 (Thu, 17 Oct 2013) | 2 lines Changed paths: M /trunk/Utils/TimeConverter.h Added missing header. ------------------------------------------------------------------------ r1899 | fabrice.colin@gmail.com | 2013-05-26 06:34:42 +0200 (Sun, 26 May 2013) | 2 lines Changed paths: M /trunk/ChangeLog Current log. ------------------------------------------------------------------------ r1898 | fabrice.colin@gmail.com | 2013-05-26 06:32:13 +0200 (Sun, 26 May 2013) | 2 lines Changed paths: M /trunk/NEWS M /trunk/po/cs.po M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Releasing 1.06. ------------------------------------------------------------------------ r1897 | fabrice.colin@gmail.com | 2013-05-24 08:33:39 +0200 (Fri, 24 May 2013) | 2 lines Changed paths: M /trunk/IndexSearch/Makefile.am M /trunk/Tokenize/Makefile.am Install cjkv/CJKVTokenizer.h. 
------------------------------------------------------------------------ r1896 | fabrice.colin@gmail.com | 2013-05-24 07:37:21 +0200 (Fri, 24 May 2013) | 2 lines Changed paths: M /trunk/SQL/SQLiteBase.cpp Look out for SQLITE_IOERR_BLOCKED and handle pre-3.6.24 specifics. ------------------------------------------------------------------------ r1895 | fabrice.colin@gmail.com | 2013-05-24 07:35:20 +0200 (Fri, 24 May 2013) | 2 lines Changed paths: M /trunk/Collect/NeonDownloader.cpp M /trunk/Collect/NeonDownloader.h Brought this up to date, even though curl is still the preferred HTTP backend. ------------------------------------------------------------------------ r1894 | fabrice.colin@gmail.com | 2013-05-18 05:56:27 +0200 (Sat, 18 May 2013) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp Query parameters were ignored. ------------------------------------------------------------------------ r1893 | fabrice.colin@gmail.com | 2013-05-18 05:55:38 +0200 (Sat, 18 May 2013) | 2 lines Changed paths: M /trunk/IndexSearch/PluginWebEngine.cpp M /trunk/IndexSearch/SherlockParser.cpp M /trunk/configure.in Support for boost 1.50, based on patch from Thierry Thomas. ------------------------------------------------------------------------ r1892 | fabrice.colin@gmail.com | 2013-04-30 15:39:11 +0200 (Tue, 30 Apr 2013) | 2 lines Changed paths: M /trunk/Collect/DownloaderInterface.h M /trunk/Collect/FileCollector.cpp M /trunk/Collect/FileCollector.h Added override to base class. ------------------------------------------------------------------------ r1891 | fabrice.colin@gmail.com | 2013-04-28 09:20:03 +0200 (Sun, 28 Apr 2013) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp M /trunk/Collect/CurlDownloader.h M /trunk/Utils/TimeConverter.cpp M /trunk/Utils/TimeConverter.h Added TimeConverter::toNormalDate(), CurlDownloader::retrieveUrl() override. 
------------------------------------------------------------------------ r1889 | fabrice.colin@gmail.com | 2013-03-03 03:33:25 +0100 (Sun, 03 Mar 2013) | 2 lines Changed paths: M /trunk/ChangeLog Current log. ------------------------------------------------------------------------ r1888 | fabrice.colin@gmail.com | 2013-03-03 03:32:53 +0100 (Sun, 03 Mar 2013) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.1 M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-search.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/NEWS M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Releasing 1.05. ------------------------------------------------------------------------ r1887 | fabrice.colin@gmail.com | 2013-02-26 13:30:24 +0100 (Tue, 26 Feb 2013) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp Less DEBUG. ------------------------------------------------------------------------ r1886 | fabrice.colin@gmail.com | 2013-02-26 13:29:52 +0100 (Tue, 26 Feb 2013) | 2 lines Changed paths: M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h Tweaked the backup interface. ------------------------------------------------------------------------ r1885 | fabrice.colin@gmail.com | 2013-02-26 13:29:15 +0100 (Tue, 26 Feb 2013) | 2 lines Changed paths: M /trunk/IndexSearch/cjkv/CJKVTokenizer.cc Don't generate ngrams that include space. ------------------------------------------------------------------------ r1884 | fabrice.colin@gmail.com | 2013-02-26 13:28:29 +0100 (Tue, 26 Feb 2013) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/AbstractGenerator.cpp Skip tokens that don't validate. 
------------------------------------------------------------------------ r1882 | fabrice.colin@gmail.com | 2013-02-11 00:39:49 +0100 (Mon, 11 Feb 2013) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/Core/pinot-dbus-daemon.1 M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-search.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/NEWS M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in M /trunk/po/cs.po M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Preparing for v1.04. ------------------------------------------------------------------------ r1881 | fabrice.colin@gmail.com | 2013-02-11 00:14:39 +0100 (Mon, 11 Feb 2013) | 2 lines Changed paths: M /trunk/SQL/ActionQueue.cpp M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/SQLDB.cpp M /trunk/SQL/SQLDB.h M /trunk/SQL/ViewHistory.cpp Added helper methods for counting. ------------------------------------------------------------------------ r1880 | fabrice.colin@gmail.com | 2013-02-05 14:45:20 +0100 (Tue, 05 Feb 2013) | 2 lines Changed paths: M /trunk/IndexSearch/FieldMapperInterface.h M /trunk/IndexSearch/Xapian/XapianIndex.cpp Allow the FieldMapper to override the host name, directory and file name. ------------------------------------------------------------------------ r1879 | fabrice.colin@gmail.com | 2013-02-03 05:39:32 +0100 (Sun, 03 Feb 2013) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/cjkv/CJKVTokenizer.cc M /trunk/IndexSearch/cjkv/CJKVTokenizer.h Fix stripping of diacritics. Stem subject terms. ------------------------------------------------------------------------ r1877 | fabrice.colin@gmail.com | 2013-01-14 13:51:11 +0100 (Mon, 14 Jan 2013) | 2 lines Changed paths: M /trunk/ChangeLog Current log. 
------------------------------------------------------------------------ r1876 | fabrice.colin@gmail.com | 2013-01-14 13:50:00 +0100 (Mon, 14 Jan 2013) | 2 lines Changed paths: M /trunk/NEWS News for 1.03. ------------------------------------------------------------------------ r1875 | fabrice.colin@gmail.com | 2013-01-14 13:49:12 +0100 (Mon, 14 Jan 2013) | 2 lines Changed paths: M /trunk/IndexSearch/pinot-label.1 Another man page. ------------------------------------------------------------------------ r1874 | fabrice.colin@gmail.com | 2013-01-14 13:48:24 +0100 (Mon, 14 Jan 2013) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.1 M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-search.1 M /trunk/README M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot.1 M /trunk/aclocal.m4 M /trunk/configure.in M /trunk/po/cs.po M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Preparing for 1.03. ------------------------------------------------------------------------ r1873 | fabrice.colin@gmail.com | 2013-01-14 13:31:08 +0100 (Mon, 14 Jan 2013) | 2 lines Changed paths: M /trunk/IndexSearch/cjkv/CJKVTokenizer.cc Fix normalization and http://code.google.com/p/pinot-search/issues/detail?id=18. ------------------------------------------------------------------------ r1872 | fabrice.colin@gmail.com | 2013-01-13 06:03:24 +0100 (Sun, 13 Jan 2013) | 2 lines Changed paths: M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h Support for sqlite's backup API. ------------------------------------------------------------------------ r1871 | fabrice.colin@gmail.com | 2013-01-13 04:50:53 +0100 (Sun, 13 Jan 2013) | 3 lines Changed paths: M /trunk/configure.in Patch for http://code.google.com/p/pinot-search/issues/detail?id=19 from Kamil Rytarowski. 
------------------------------------------------------------------------ r1870 | fabrice.colin@gmail.com | 2012-12-15 20:05:47 +0100 (Sat, 15 Dec 2012) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/ru.po M /trunk/po/zh_CN.po Update to the French translation by Eliovir. ------------------------------------------------------------------------ r1868 | fabrice.colin@gmail.com | 2012-11-04 09:22:32 +0100 (Sun, 04 Nov 2012) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/Core/pinot-dbus-daemon.1 M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-search.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/NEWS M /trunk/UI/GTK2/src/pinot.1 Updated NEWS, manuals. ------------------------------------------------------------------------ r1867 | fabrice.colin@gmail.com | 2012-11-04 08:01:00 +0100 (Sun, 04 Nov 2012) | 2 lines Changed paths: M /trunk/po/cs.po M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Current POs. ------------------------------------------------------------------------ r1866 | fabrice.colin@gmail.com | 2012-11-04 07:59:05 +0100 (Sun, 04 Nov 2012) | 2 lines Changed paths: M /trunk/aclocal.m4 Updated for aclocal 1.11.6. ------------------------------------------------------------------------ r1865 | fabrice.colin@gmail.com | 2012-11-04 07:58:37 +0100 (Sun, 04 Nov 2012) | 3 lines Changed paths: M /trunk/configure.in M /trunk/pinot.spec.in Bumped release to 1.02. Mempools are turned off by default. Added a dependency on libuuid-devel, not used directly but required by Xapian. 
------------------------------------------------------------------------ r1864 | fabrice.colin@gmail.com | 2012-11-04 07:57:35 +0100 (Sun, 04 Nov 2012) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp Less DEBUG. ------------------------------------------------------------------------ r1863 | fabrice.colin@gmail.com | 2012-11-04 07:57:17 +0100 (Sun, 04 Nov 2012) | 2 lines Changed paths: M /trunk/Utils/Url.cpp Prefer %X for encoding. ------------------------------------------------------------------------ r1862 | fabrice.colin@gmail.com | 2012-11-03 06:26:36 +0100 (Sat, 03 Nov 2012) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp M /trunk/Collect/CurlDownloader.h New method putUrl(). ------------------------------------------------------------------------ r1861 | fabrice.colin@gmail.com | 2012-11-03 06:25:59 +0100 (Sat, 03 Nov 2012) | 2 lines Changed paths: M /trunk/Core/WorkerThreads.cpp M /trunk/Core/WorkerThreads.h Minor include fixes. ------------------------------------------------------------------------ r1860 | fabrice.colin@gmail.com | 2012-10-25 18:21:27 +0200 (Thu, 25 Oct 2012) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/configure.in A /trunk/po/cs.po Czech translation by Zbyněk Schwarz. ------------------------------------------------------------------------ r1859 | fabrice.colin@gmail.com | 2012-09-23 06:46:58 +0200 (Sun, 23 Sep 2012) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/pt_BR.po Update by Adriano Steffler. ------------------------------------------------------------------------ r1858 | fabrice.colin@gmail.com | 2012-09-12 14:33:56 +0200 (Wed, 12 Sep 2012) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Japanese translation update by Takafumi Arakaki. 
------------------------------------------------------------------------ r1857 | fabrice.colin@gmail.com | 2012-09-12 14:29:00 +0200 (Wed, 12 Sep 2012) | 2 lines Changed paths: M /trunk/FAQ Entry to help with http://code.google.com/p/pinot-search/issues/detail?id=15 ------------------------------------------------------------------------ r1855 | fabrice.colin@gmail.com | 2012-08-27 15:21:46 +0200 (Mon, 27 Aug 2012) | 2 lines Changed paths: M /trunk/ChangeLog Current log. ------------------------------------------------------------------------ r1854 | fabrice.colin@gmail.com | 2012-08-27 15:20:31 +0200 (Mon, 27 Aug 2012) | 2 lines Changed paths: M /trunk/Core/pinot-daemon.1 M /trunk/Core/pinot-dbus-daemon.1 M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-search.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/NEWS M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Releasing v1.01. ------------------------------------------------------------------------ r1853 | fabrice.colin@gmail.com | 2012-08-27 15:05:55 +0200 (Mon, 27 Aug 2012) | 2 lines Changed paths: M /trunk/Core/pinot-index.cpp M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h Allow to override MIME type detection with "-o/--override MIMETYPE:EXTENSION". ------------------------------------------------------------------------ r1852 | fabrice.colin@gmail.com | 2012-08-26 07:19:53 +0200 (Sun, 26 Aug 2012) | 2 lines Changed paths: M /trunk/Tokenize/filters/GMimeMboxFilter.cc M /trunk/Tokenize/filters/GMimeMboxFilter.h Fixed parts parsing. ------------------------------------------------------------------------ r1851 | fabrice.colin@gmail.com | 2012-08-26 07:19:08 +0200 (Sun, 26 Aug 2012) | 2 lines Changed paths: M /trunk/Tokenize/TextConverter.cpp M /trunk/Tokenize/filters/ExternalFilter.cc M /trunk/Tokenize/filters/FileOutputFilter.cc Various fixes. 
------------------------------------------------------------------------ r1850 | fabrice.colin@gmail.com | 2012-08-25 11:23:09 +0200 (Sat, 25 Aug 2012) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/README M /trunk/Tokenize/filters/external-filters.xml Add an entry for RST files, as suggested by Takafumi Arakaki in http://code.google.com/p/pinot-search/issues/detail?id=12 ------------------------------------------------------------------------ r1849 | fabrice.colin@gmail.com | 2012-08-25 11:19:36 +0200 (Sat, 25 Aug 2012) | 2 lines Changed paths: M /trunk/Tokenize/filters/FileOutputFilter.cc Catch zero outputs early. ------------------------------------------------------------------------ r1848 | fabrice.colin@gmail.com | 2012-08-05 10:58:44 +0200 (Sun, 05 Aug 2012) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/cjkv/CJKVTokenizer.cc M /trunk/IndexSearch/cjkv/CJKVTokenizer.h M /trunk/Utils/Makefile.am M /trunk/Utils/StringManip.cpp M /trunk/Utils/StringManip.h D /trunk/Utils/unac Dropped unac in favour of our own code. UTF-8 strings are normalized. Some other minor changes. ------------------------------------------------------------------------ r1847 | fabrice.colin@gmail.com | 2012-08-05 03:44:43 +0200 (Sun, 05 Aug 2012) | 2 lines Changed paths: M /trunk/Utils/unac/unac.c Updated to unac 1.8.0. ------------------------------------------------------------------------ r1846 | fabrice.colin@gmail.com | 2012-07-10 12:36:43 +0200 (Tue, 10 Jul 2012) | 2 lines Changed paths: M /trunk/textcat32_conf.txt This was in DOS format for some reason... ------------------------------------------------------------------------ r1844 | fabrice.colin@gmail.com | 2012-06-16 09:21:09 +0200 (Sat, 16 Jun 2012) | 2 lines Changed paths: M /trunk/ChangeLog Current logs. 
------------------------------------------------------------------------ r1843 | fabrice.colin@gmail.com | 2012-06-15 17:09:30 +0200 (Fri, 15 Jun 2012) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.1 M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-search.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/Makefile.am M /trunk/UI/GTK2/src/pinot.1 Regenerated manuals with help2man 1.40.8. ------------------------------------------------------------------------ r1842 | fabrice.colin@gmail.com | 2012-06-15 17:06:04 +0200 (Fri, 15 Jun 2012) | 2 lines Changed paths: M /trunk/NEWS M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Preparing for the v1.0 release. ------------------------------------------------------------------------ r1841 | fabrice.colin@gmail.com | 2012-06-15 17:01:55 +0200 (Fri, 15 Jun 2012) | 2 lines Changed paths: M /trunk/AUTHORS Forgot to mention K Rytarowski's input. ------------------------------------------------------------------------ r1840 | fabrice.colin@gmail.com | 2012-06-15 16:59:51 +0200 (Fri, 15 Jun 2012) | 2 lines Changed paths: M /trunk/Tokenize/filters/Exiv2ImageFilter.cc Don't fail if no data was extracted, most images don't have comments etc... ------------------------------------------------------------------------ r1839 | fabrice.colin@gmail.com | 2012-06-15 16:57:49 +0200 (Fri, 15 Jun 2012) | 2 lines Changed paths: M /trunk/configure.in Moved check for ext/malloc_allocator.h. ------------------------------------------------------------------------ r1838 | fabrice.colin@gmail.com | 2012-06-15 16:57:13 +0200 (Fri, 15 Jun 2012) | 2 lines Changed paths: M /trunk/aclocal.m4 M /trunk/ltmain.sh Updated for aclocal 1.11.3 and libtool 2.4.2. 
------------------------------------------------------------------------ r1837 | fabrice.colin@gmail.com | 2012-06-12 17:04:14 +0200 (Tue, 12 Jun 2012) | 2 lines Changed paths: M /trunk/IndexSearch/Makefile.am Use DBUS_CFLAGS if HAVE_DBUS is defined. ------------------------------------------------------------------------ r1836 | fabrice.colin@gmail.com | 2012-06-11 12:43:00 +0200 (Mon, 11 Jun 2012) | 3 lines Changed paths: M /trunk/README M /trunk/pinot.spec.in Memory pooling in boost v1.48 seems problematic. It's turned off by the RPM build by default. ------------------------------------------------------------------------ r1835 | fabrice.colin@gmail.com | 2012-06-11 12:41:54 +0200 (Mon, 11 Jun 2012) | 3 lines Changed paths: M /trunk/Tokenize/filters/GMimeMboxFilter.cc M /trunk/Tokenize/filters/GMimeMboxFilter.h Better at parsing combinations of alternative and mixed parts. The ipath scheme changed. ------------------------------------------------------------------------ r1834 | fabrice.colin@gmail.com | 2012-06-10 16:53:48 +0200 (Sun, 10 Jun 2012) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/LanguageDetector.cpp A /trunk/textcat32_conf.txt Support for libexttextcat v3.2 based on patches from Thierry Thomas. ------------------------------------------------------------------------ r1833 | fabrice.colin@gmail.com | 2012-06-10 16:52:04 +0200 (Sun, 10 Jun 2012) | 2 lines Changed paths: M /trunk/Collect/FileCollector.cpp DEBUG message change. ------------------------------------------------------------------------ r1832 | fabrice.colin@gmail.com | 2012-06-10 15:33:25 +0200 (Sun, 10 Jun 2012) | 2 lines Changed paths: M /trunk/Tokenize/Makefile.am Fixed typo. ------------------------------------------------------------------------ r1831 | fabrice.colin@gmail.com | 2012-06-10 11:55:18 +0200 (Sun, 10 Jun 2012) | 2 lines Changed paths: M /trunk/Tokenize/filters/FilterFactory.cc Fixed error handling. 
------------------------------------------------------------------------ r1830 | fabrice.colin@gmail.com | 2012-06-10 11:50:55 +0200 (Sun, 10 Jun 2012) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotUtils.cc M /trunk/UI/GTK2/src/ResultsTree.cc Missing header and unused variable. ------------------------------------------------------------------------ r1829 | fabrice.colin@gmail.com | 2012-06-01 15:19:48 +0200 (Fri, 01 Jun 2012) | 3 lines Changed paths: M /trunk/configure.in Fixed libexttextcat's headers checks. Bumped version number to 1.0. ------------------------------------------------------------------------ r1828 | fabrice.colin@gmail.com | 2012-06-01 14:59:45 +0200 (Fri, 01 Jun 2012) | 2 lines Changed paths: M /trunk/README Removed references to Deskbar applet. ------------------------------------------------------------------------ r1827 | fabrice.colin@gmail.com | 2012-06-01 14:58:28 +0200 (Fri, 01 Jun 2012) | 2 lines Changed paths: M /trunk/pinot.spec.in Fixed --with nodbus builds. ------------------------------------------------------------------------ r1826 | fabrice.colin@gmail.com | 2012-06-01 14:56:46 +0200 (Fri, 01 Jun 2012) | 2 lines Changed paths: M /trunk/configure.in Link with -lexttextcat if necessary. ------------------------------------------------------------------------ r1825 | fabrice.colin@gmail.com | 2012-05-26 06:17:24 +0200 (Sat, 26 May 2012) | 2 lines Changed paths: M /trunk/Tokenize/filters/ExternalFilter.cc M /trunk/Tokenize/filters/FileOutputFilter.cc Removed dead code. 
------------------------------------------------------------------------ r1824 | fabrice.colin@gmail.com | 2012-05-26 06:01:39 +0200 (Sat, 26 May 2012) | 2 lines Changed paths: M /trunk/Core/DaemonState.cpp M /trunk/Core/DaemonState.h M /trunk/Core/WorkerThreads.cpp M /trunk/Core/WorkerThreads.h M /trunk/Core/pinot-index.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/prefsWindow.cc M /trunk/UI/GTK2/src/prefsWindow.hh M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.hh Moved queueing out of ThreadsManager, into QueueManager. ------------------------------------------------------------------------ r1823 | fabrice.colin@gmail.com | 2012-05-23 16:38:10 +0200 (Wed, 23 May 2012) | 2 lines Changed paths: M /trunk/Utils/CommandLine.cpp M /trunk/Utils/CommandLine.h Added runSync() overload, based on ExternalFilter::run_command(). ------------------------------------------------------------------------ r1822 | fabrice.colin@gmail.com | 2012-05-20 04:57:52 +0200 (Sun, 20 May 2012) | 2 lines Changed paths: M /trunk/Core/Makefile.am M /trunk/IndexSearch/Makefile.am M /trunk/UI/GTK2/src/Makefile.am Follow up to SVN r1819. ------------------------------------------------------------------------ r1821 | fabrice.colin@gmail.com | 2012-05-20 04:11:25 +0200 (Sun, 20 May 2012) | 6 lines Changed paths: M /trunk/Collect/Makefile.am M /trunk/Core/Makefile.am M /trunk/IndexSearch/Makefile.am M /trunk/IndexSearch/Xapian/Makefile.am M /trunk/Monitor/Makefile.am M /trunk/SQL/Makefile.am M /trunk/Tokenize/Makefile.am M /trunk/Tokenize/filters/FilterFactory.cc M /trunk/Tokenize/filters/GMimeMboxFilter.cc M /trunk/Utils/Makefile.am M /trunk/configure.in M /trunk/pinot.spec.in Build private libraries statically only. Provide a version number to the Xapian backend library. Dropped exiv2 requirement down to v0.18. FilterFactory ignores filters that duplicate existing types. Minor edit to the mbox filter. 
------------------------------------------------------------------------ r1820 | fabrice.colin@gmail.com | 2012-05-20 03:06:52 +0200 (Sun, 20 May 2012) | 2 lines Changed paths: D /trunk/SQL/historytest.cpp Obsolete file. ------------------------------------------------------------------------ r1819 | fabrice.colin@gmail.com | 2012-05-20 03:05:49 +0200 (Sun, 20 May 2012) | 2 lines Changed paths: M /trunk/Collect/Makefile.am M /trunk/Core/Makefile.am M /trunk/IndexSearch/Makefile.am M /trunk/Monitor/Makefile.am M /trunk/SQL/Makefile.am M /trunk/Tokenize/Makefile.am M /trunk/UI/GTK2/src/Makefile.am M /trunk/Utils/Makefile.am M /trunk/pinot.spec.in Install headers and libraries needed to build apps based on Pinot. ------------------------------------------------------------------------ r1818 | fabrice.colin@gmail.com | 2012-05-17 16:30:17 +0200 (Thu, 17 May 2012) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/zh_CN.po Simplified Chinese translation update by happymeng. ------------------------------------------------------------------------ r1817 | fabrice.colin@gmail.com | 2012-05-16 01:47:42 +0200 (Wed, 16 May 2012) | 2 lines Changed paths: M /trunk/Utils/CommandLine.cpp Minor edit. ------------------------------------------------------------------------ r1816 | fabrice.colin@gmail.com | 2012-05-16 01:10:56 +0200 (Wed, 16 May 2012) | 2 lines Changed paths: M /trunk/Core/DaemonState.cpp M /trunk/Core/WorkerThreads.cpp M /trunk/Core/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc Moved crawl history checks out of MonitorThread, into HistoryMonitorThread. 
------------------------------------------------------------------------ r1815 | fabrice.colin@gmail.com | 2012-05-13 06:28:38 +0200 (Sun, 13 May 2012) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.cpp M /trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/TextConverter.cpp M /trunk/Tokenize/filters/ArchiveFilter.cc M /trunk/Tokenize/filters/ChmFilter.cc M /trunk/Tokenize/filters/ExifImageFilter.cc M /trunk/Tokenize/filters/Exiv2ImageFilter.cc M /trunk/Tokenize/filters/ExternalFilter.cc M /trunk/Tokenize/filters/FileOutputFilter.cc M /trunk/Tokenize/filters/Filter.cc M /trunk/Tokenize/filters/FilterFactory.cc M /trunk/Tokenize/filters/GMimeMboxFilter.cc M /trunk/Tokenize/filters/HtmlFilter.cc M /trunk/Tokenize/filters/HtmlParser.cc M /trunk/Tokenize/filters/TagLibMusicFilter.cc M /trunk/Tokenize/filters/TarFilter.cc M /trunk/Tokenize/filters/TextFilter.cc M /trunk/Tokenize/filters/XmlFilter.cc M /trunk/UI/GTK2/src/EnginesTree.cc M /trunk/UI/GTK2/src/IndexPage.cc M /trunk/UI/GTK2/src/Notebook.cc M /trunk/UI/GTK2/src/PinotUtils.cc M /trunk/UI/GTK2/src/ResultsTree.cc M /trunk/UI/GTK2/src/UIThreads.cc M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/indexDialog.cc M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/UI/GTK2/src/prefsWindow.cc M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.cc Log to clog. Minor changes. 
------------------------------------------------------------------------ r1814 | fabrice.colin@gmail.com | 2012-05-13 06:14:58 +0200 (Sun, 13 May 2012) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp M /trunk/Collect/DownloaderInterface.cpp M /trunk/Collect/FileCollector.cpp M /trunk/Collect/NeonDownloader.cpp M /trunk/Core/DaemonState.cpp M /trunk/Core/OnDiskHandler.cpp M /trunk/Core/PinotSettings.cpp M /trunk/Core/ServerThreads.cpp M /trunk/Core/UniqueApplication.cpp M /trunk/Core/WorkerThreads.cpp M /trunk/Core/pinot-dbus-daemon.cpp M /trunk/Core/pinot-index.cpp M /trunk/Core/pinot-search.cpp M /trunk/IndexSearch/DBusIndex.cpp M /trunk/IndexSearch/FilterWrapper.cpp M /trunk/IndexSearch/Google/GoogleAPIEngine.cpp M /trunk/IndexSearch/ModuleFactory.cpp M /trunk/IndexSearch/OpenSearchParser.cpp M /trunk/IndexSearch/PluginWebEngine.cpp M /trunk/IndexSearch/QueryProperties.cpp M /trunk/IndexSearch/SearchEngineInterface.cpp M /trunk/IndexSearch/SherlockParser.cpp M /trunk/IndexSearch/WebEngine.cpp M /trunk/IndexSearch/Xapian/AbstractGenerator.cpp M /trunk/IndexSearch/Xapian/LanguageDetector.cpp M /trunk/IndexSearch/Xapian/XapianDatabase.cpp M /trunk/IndexSearch/Xapian/XapianDatabaseFactory.cpp M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/XesamGLib/XesamEngine.cpp M /trunk/IndexSearch/pinot-label.cpp M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/INotifyMonitor.h M /trunk/Monitor/MonitorInterface.h Log to clog. Minor changes. ------------------------------------------------------------------------ r1813 | fabrice.colin@gmail.com | 2012-04-08 10:19:24 +0200 (Sun, 08 Apr 2012) | 3 lines Changed paths: M /trunk/Collect/FileCollector.cpp M /trunk/Tokenize/FilterUtils.cpp FileCollector puts filters in view mode. FilterUtils copies all of the document's metadata when filtering it iteratively. 
------------------------------------------------------------------------ r1812 | fabrice.colin@gmail.com | 2012-04-04 15:43:26 +0200 (Wed, 04 Apr 2012) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianDatabase.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp Support for file-based stub databases. Let FieldMapper add custom values. ------------------------------------------------------------------------ r1811 | fabrice.colin@gmail.com | 2012-04-01 16:21:12 +0200 (Sun, 01 Apr 2012) | 2 lines Changed paths: M /trunk/Tokenize/filters/Exiv2ImageFilter.cc Fixed typo. ------------------------------------------------------------------------ r1810 | fabrice.colin@gmail.com | 2012-04-01 13:46:09 +0200 (Sun, 01 Apr 2012) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/SQL/SQLiteBase.cpp Less DEBUG. ------------------------------------------------------------------------ r1809 | fabrice.colin@gmail.com | 2012-04-01 05:41:41 +0200 (Sun, 01 Apr 2012) | 2 lines Changed paths: M /trunk/IndexSearch/FilterWrapper.cpp M /trunk/IndexSearch/FilterWrapper.h Follow-up to SVN r1807. ------------------------------------------------------------------------ r1808 | fabrice.colin@gmail.com | 2012-03-31 08:00:59 +0200 (Sat, 31 Mar 2012) | 2 lines Changed paths: M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h M /trunk/po/POTFILES.in Changes to stepping and error logging. ------------------------------------------------------------------------ r1807 | fabrice.colin@gmail.com | 2012-03-31 07:59:06 +0200 (Sat, 31 Mar 2012) | 3 lines Changed paths: M /trunk/IndexSearch/FilterWrapper.cpp M /trunk/IndexSearch/FilterWrapper.h M /trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/FilterUtils.h Preserve the filter interface while letting callers provide their own IndexAction sub-class. 
------------------------------------------------------------------------ r1806 | fabrice.colin@gmail.com | 2012-03-27 17:27:57 +0200 (Tue, 27 Mar 2012) | 2 lines Changed paths: M /trunk/Utils/StringManip.cpp M /trunk/Utils/StringManip.h Added toUpperCase(). ------------------------------------------------------------------------ r1805 | fabrice.colin@gmail.com | 2012-03-25 10:58:00 +0200 (Sun, 25 Mar 2012) | 2 lines Changed paths: M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h Get a human-readable description with getDescription() if GIO is in use. ------------------------------------------------------------------------ r1804 | fabrice.colin@gmail.com | 2012-03-25 08:44:19 +0200 (Sun, 25 Mar 2012) | 2 lines Changed paths: M /trunk/IndexSearch/FieldMapperInterface.h M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp More comprehensive FieldMapper interface. ------------------------------------------------------------------------ r1803 | fabrice.colin@gmail.com | 2012-03-18 10:45:13 +0100 (Sun, 18 Mar 2012) | 2 lines Changed paths: M /trunk/IndexSearch/FieldMapperInterface.h M /trunk/IndexSearch/Xapian/XapianEngine.cpp FieldMapper can extend the query language with new boolean filters. ------------------------------------------------------------------------ r1802 | fabrice.colin@gmail.com | 2012-03-18 08:19:01 +0100 (Sun, 18 Mar 2012) | 2 lines Changed paths: A /trunk/IndexSearch/FieldMapperInterface.h M /trunk/IndexSearch/Google/ModuleExports.cpp M /trunk/IndexSearch/Makefile.am M /trunk/IndexSearch/ModuleFactory.cpp M /trunk/IndexSearch/ModuleFactory.h M /trunk/IndexSearch/Xapian/ModuleExports.cpp M /trunk/IndexSearch/Xapian/XapianDatabase.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/XesamGLib/ModuleExports.cpp M /trunk/Utils/DocumentInfo.cpp Backends can be passed a field mapper that handles extra, app-specific fields.
------------------------------------------------------------------------ r1801 | fabrice.colin@gmail.com | 2012-03-11 14:18:32 +0100 (Sun, 11 Mar 2012) | 5 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/Tokenize/FilterUtils.cpp M /trunk/Utils/DocumentInfo.cpp M /trunk/Utils/DocumentInfo.h Filters may return any field, those not recognized will be added to documents as foreign fields. Fields with name "id" are assumed to be foreign keys to be indexed with prefix Q, following the convention used by Xapian. Fixed XapianIndex::removeCommonTerms(). ------------------------------------------------------------------------ r1800 | fabrice.colin@gmail.com | 2012-03-03 15:21:09 +0100 (Sat, 03 Mar 2012) | 2 lines Changed paths: M /trunk/Tokenize/filters/GMimeMboxFilter.cc Fixes to date. ------------------------------------------------------------------------ r1799 | fabrice.colin@gmail.com | 2012-03-01 13:01:50 +0100 (Thu, 01 Mar 2012) | 2 lines Changed paths: M /trunk/scripts/bash/pinot-check-file.sh Minor message fix. ------------------------------------------------------------------------ r1798 | fabrice.colin@gmail.com | 2012-03-01 13:01:13 +0100 (Thu, 01 Mar 2012) | 2 lines Changed paths: A /trunk/Core/pinot-daemon.1 M /trunk/Core/pinot-dbus-daemon.cpp M /trunk/Makefile.am Some changes to help with no D-Bus builds. ------------------------------------------------------------------------ r1797 | fabrice.colin@gmail.com | 2012-02-24 15:49:08 +0100 (Fri, 24 Feb 2012) | 2 lines Changed paths: M /trunk/po/de.po German translation update by Gena Haltmair. ------------------------------------------------------------------------ r1796 | fabrice.colin@gmail.com | 2012-01-18 12:44:37 +0100 (Wed, 18 Jan 2012) | 2 lines Changed paths: M /trunk/Tokenize/filters/FileOutputFilter.cc DEBUG output changes. 
------------------------------------------------------------------------ r1795 | fabrice.colin@gmail.com | 2012-01-04 13:46:49 +0100 (Wed, 04 Jan 2012) | 3 lines Changed paths: M /trunk/Core/pinot-index.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h Prefer URI enabled actions for remote files only. This should fix http://code.google.com/p/pinot-search/issues/detail?id=7 ------------------------------------------------------------------------ r1794 | fabrice.colin@gmail.com | 2011-12-27 11:11:06 +0100 (Tue, 27 Dec 2011) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp Fixed SVN r1782. ------------------------------------------------------------------------ r1793 | fabrice.colin@gmail.com | 2011-12-24 10:49:27 +0100 (Sat, 24 Dec 2011) | 2 lines Changed paths: M /trunk/IndexSearch/Plugins/Google.src Fixed Google plugin. ------------------------------------------------------------------------ r1792 | fabrice.colin@gmail.com | 2011-12-24 10:48:41 +0100 (Sat, 24 Dec 2011) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/LanguageDetector.cpp Support for libexttextcat. ------------------------------------------------------------------------ r1791 | fabrice.colin@gmail.com | 2011-12-24 10:13:46 +0100 (Sat, 24 Dec 2011) | 2 lines Changed paths: M /trunk/configure.in M /trunk/pinot.spec.in The RPM can be built with option "--with gtkmm3". ------------------------------------------------------------------------ r1790 | fabrice.colin@gmail.com | 2011-12-24 10:13:05 +0100 (Sat, 24 Dec 2011) | 2 lines Changed paths: M /trunk/Core/ServerThreads.cpp Unused variable fix. ------------------------------------------------------------------------ r1789 | fabrice.colin@gmail.com | 2011-12-19 10:13:38 +0100 (Mon, 19 Dec 2011) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Fixed dynamically generated submenus.
------------------------------------------------------------------------ r1788 | fabrice.colin@gmail.com | 2011-12-18 10:29:18 +0100 (Sun, 18 Dec 2011) | 2 lines Changed paths: M /trunk/Makefile.am Follow up to SVN r1783. ------------------------------------------------------------------------ r1787 | fabrice.colin@gmail.com | 2011-12-18 10:27:08 +0100 (Sun, 18 Dec 2011) | 3 lines Changed paths: M /trunk/textcat3_conf.txt Removed reference to LM directory. Fixes http://code.google.com/p/pinot-search/issues/detail?id=3 ------------------------------------------------------------------------ r1786 | fabrice.colin@gmail.com | 2011-12-18 10:23:45 +0100 (Sun, 18 Dec 2011) | 2 lines Changed paths: A /trunk/IndexSearch/cjkv A /trunk/IndexSearch/cjkv/CJKVTokenizer.cc A /trunk/IndexSearch/cjkv/CJKVTokenizer.h A /trunk/Tokenize/filters A /trunk/Tokenize/filters/ArchiveFilter.cc A /trunk/Tokenize/filters/ArchiveFilter.h A /trunk/Tokenize/filters/ChmFilter.cc A /trunk/Tokenize/filters/ChmFilter.h A /trunk/Tokenize/filters/ExifImageFilter.cc A /trunk/Tokenize/filters/ExifImageFilter.h A /trunk/Tokenize/filters/Exiv2ImageFilter.cc A /trunk/Tokenize/filters/Exiv2ImageFilter.h A /trunk/Tokenize/filters/ExternalFilter.cc A /trunk/Tokenize/filters/ExternalFilter.h A /trunk/Tokenize/filters/FileOutputFilter.cc A /trunk/Tokenize/filters/FileOutputFilter.h A /trunk/Tokenize/filters/Filter.cc A /trunk/Tokenize/filters/Filter.h A /trunk/Tokenize/filters/FilterFactory.cc A /trunk/Tokenize/filters/FilterFactory.h A /trunk/Tokenize/filters/GMimeMboxFilter.cc A /trunk/Tokenize/filters/GMimeMboxFilter.h A /trunk/Tokenize/filters/HtmlFilter.cc A /trunk/Tokenize/filters/HtmlFilter.h A /trunk/Tokenize/filters/HtmlParser.cc A /trunk/Tokenize/filters/HtmlParser.h A /trunk/Tokenize/filters/TagLibMusicFilter.cc A /trunk/Tokenize/filters/TagLibMusicFilter.h A /trunk/Tokenize/filters/TarFilter.cc A /trunk/Tokenize/filters/TarFilter.h A /trunk/Tokenize/filters/TextFilter.cc A 
/trunk/Tokenize/filters/TextFilter.h A /trunk/Tokenize/filters/XmlFilter.cc A /trunk/Tokenize/filters/XmlFilter.h A /trunk/Tokenize/filters/external-filters.xml Merging in formerly external source. ------------------------------------------------------------------------ r1785 | fabrice.colin@gmail.com | 2011-12-18 10:22:25 +0100 (Sun, 18 Dec 2011) | 2 lines Changed paths: M /trunk M /trunk/ChangeLog M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Removed externals. ------------------------------------------------------------------------ r1784 | fabrice.colin@gmail.com | 2011-12-18 10:15:00 +0100 (Sun, 18 Dec 2011) | 3 lines Changed paths: A /trunk/IndexSearch/Plugins/Freecode.src (from /trunk/IndexSearch/Plugins/Freshmeat.src:1782) D /trunk/IndexSearch/Plugins/Freshmeat.src Renamed the Freshmeat plugin to Freecode. Fixes http://code.google.com/p/pinot-search/issues/detail?id=5 ------------------------------------------------------------------------ r1783 | fabrice.colin@gmail.com | 2011-12-18 10:10:19 +0100 (Sun, 18 Dec 2011) | 3 lines Changed paths: M /trunk/Makefile.am M /trunk/pinot.spec.in D /trunk/scripts/python/pinot-live.py D /trunk/scripts/python/pinot-module.py Dropped support for Deskbar. Fixes http://code.google.com/p/pinot-search/issues/detail?id=4 ------------------------------------------------------------------------ r1782 | fabrice.colin@gmail.com | 2011-12-18 10:06:47 +0100 (Sun, 18 Dec 2011) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp Suppress deprecated function call warnings. ------------------------------------------------------------------------ r1781 | fabrice.colin@gmail.com | 2011-12-18 09:50:05 +0100 (Sun, 18 Dec 2011) | 2 lines Changed paths: M /trunk/configure.in New --enable-gtkmm3 switch. 
------------------------------------------------------------------------ r1780 | fabrice.colin@gmail.com | 2011-12-18 09:49:18 +0100 (Sun, 18 Dec 2011) | 2 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cc M /trunk/UI/GTK2/src/EnginesTree.hh M /trunk/UI/GTK2/src/IndexPage.cc M /trunk/UI/GTK2/src/ModelColumns.hh M /trunk/UI/GTK2/src/Notebook.cc M /trunk/UI/GTK2/src/ResultsTree.cc M /trunk/UI/GTK2/src/ResultsTree.hh M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog_glade.cc M /trunk/UI/GTK2/src/indexDialog.cc M /trunk/UI/GTK2/src/indexDialog_glade.cc M /trunk/UI/GTK2/src/launcherDialog_glade.cc M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/prefsWindow.cc M /trunk/UI/GTK2/src/prefsWindow_glade.cc M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/propertiesDialog_glade.cc M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog_glade.cc M /trunk/UI/GTK2/src/statisticsDialog_glade.cc GTK 3.0 port, first stab. Submenus generated dynamically don't work. ------------------------------------------------------------------------ r1779 | fabrice.colin@gmail.com | 2011-11-14 14:52:31 +0100 (Mon, 14 Nov 2011) | 3 lines Changed paths: M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-index.cpp M /trunk/README M /trunk/UI/GTK2/src/mainWindow.cc A /trunk/aclocal.m4 M /trunk/pinot.spec.in Replace URLs here and there. Add missing file required for building. ------------------------------------------------------------------------ r1778 | fabrice.colin@gmail.com | 2011-11-08 14:33:49 +0100 (Tue, 08 Nov 2011) | 2 lines Changed paths: M /trunk/README Dependencies on exiv2 and chmlib were not documented here. ------------------------------------------------------------------------ r1776 | fabricecolin | 2011-11-07 05:35:20 +0100 (Mon, 07 Nov 2011) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/ChangeLog-dijon Current logs. 
------------------------------------------------------------------------ r1775 | fabricecolin | 2011-11-07 05:31:08 +0100 (Mon, 07 Nov 2011) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.1 M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-search.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/NEWS M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Bumped version to 0.98. ------------------------------------------------------------------------ r1774 | fabricecolin | 2011-10-30 06:18:19 +0100 (Sun, 30 Oct 2011) | 2 lines Changed paths: D /trunk/IndexSearch/Plugins/RollYOTopNews.src Not sure whether this is still working. ------------------------------------------------------------------------ r1773 | fabricecolin | 2011-10-30 06:06:51 +0100 (Sun, 30 Oct 2011) | 2 lines Changed paths: D /trunk/IndexSearch/Plugins/GoogleCodeSearch.src Google Code Search is now deprecated. ------------------------------------------------------------------------ r1772 | fabricecolin | 2011-10-30 05:56:19 +0100 (Sun, 30 Oct 2011) | 2 lines Changed paths: M /trunk/Makefile.am YahooBOSS.src was removed. ------------------------------------------------------------------------ r1771 | fabricecolin | 2011-10-30 05:55:37 +0100 (Sun, 30 Oct 2011) | 2 lines Changed paths: D /trunk/IndexSearch/Plugins/YahooBOSS.src And Yahoo! BOSS requires OAuth, which we don't support. ------------------------------------------------------------------------ r1770 | fabricecolin | 2011-10-30 05:43:26 +0100 (Sun, 30 Oct 2011) | 2 lines Changed paths: D /trunk/IndexSearch/Plugins/Yahoo.src The Yahoo! REST API was shut down earlier this year. 
------------------------------------------------------------------------ r1769 | fabricecolin | 2011-10-23 07:49:17 +0200 (Sun, 23 Oct 2011) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/LanguageDetector.cpp M /trunk/configure.in Support for libexttextcat 3.1.1. ------------------------------------------------------------------------ r1768 | fabricecolin | 2011-10-23 07:48:33 +0200 (Sun, 23 Oct 2011) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc Expire query results and view history after 6 months, not 1. ------------------------------------------------------------------------ r1767 | fabricecolin | 2011-10-02 07:42:33 +0200 (Sun, 02 Oct 2011) | 2 lines Changed paths: M /trunk/Makefile.am M /trunk/Monitor/Makefile.am M /trunk/ltmain.sh Regenerated ltmain.sh with libtool 2.4. ------------------------------------------------------------------------ r1766 | fabricecolin | 2011-10-02 07:10:45 +0200 (Sun, 02 Oct 2011) | 2 lines Changed paths: M /trunk/Utils/Memory.cpp M /trunk/Utils/Memory.h Fixed length type. ------------------------------------------------------------------------ r1765 | fabricecolin | 2011-10-02 07:09:31 +0200 (Sun, 02 Oct 2011) | 5 lines Changed paths: M /trunk/Core/PinotSettings.cpp M /trunk/IndexSearch/Xapian/LanguageDetector.cpp M /trunk/Makefile.am M /trunk/Tokenize/Makefile.am M /trunk/configure.in M /trunk/pinot.spec.in A /trunk/textcat31_conf.txt Build the new exiv2-based filter. Add support for LibreOffice's libexttextcat 3.1 as found at http://cgit.freedesktop.org/libreoffice/libexttextcat/. Drop SOAP/Google API support. ------------------------------------------------------------------------ r1764 | fabricecolin | 2011-03-28 12:20:44 +0200 (Mon, 28 Mar 2011) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/nl.po M /trunk/po/ru.po Translation updates by Fitoschido, pkramerruiz, Tico and Nikolay Kachanov. 
------------------------------------------------------------------------ r1763 | fabricecolin | 2011-02-23 15:07:23 +0100 (Wed, 23 Feb 2011) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/po/nl.po Translation update by Martijn Verstrate. General thanks to Martijn for his suggestions and help with testing. ------------------------------------------------------------------------ r1762 | fabricecolin | 2011-02-23 15:04:38 +0100 (Wed, 23 Feb 2011) | 3 lines Changed paths: M /trunk/Tokenize/Makefile.am M /trunk/configure.in M /trunk/pinot.spec.in Build the new CHM filter if --enable-chmlib=yes is passed to configure. RPM-wise, that's done with the option "--with chmlib". ------------------------------------------------------------------------ r1761 | fabricecolin | 2011-02-20 11:17:00 +0100 (Sun, 20 Feb 2011) | 2 lines Changed paths: M /trunk/Core/DaemonState.cpp M /trunk/Core/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc Added missing header for for_each(). ------------------------------------------------------------------------ r1759 | fabricecolin | 2011-01-09 11:44:25 +0100 (Sun, 09 Jan 2011) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/NEWS M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Releasing 0.97. ------------------------------------------------------------------------ r1758 | fabricecolin | 2011-01-09 11:41:31 +0100 (Sun, 09 Jan 2011) | 3 lines Changed paths: M /trunk/ltmain.sh Regenerated with libtool 2.2.10. Changes for --as-needed from SVN r1713 are still in. ------------------------------------------------------------------------ r1757 | fabricecolin | 2011-01-06 12:41:18 +0100 (Thu, 06 Jan 2011) | 2 lines Changed paths: M /trunk/pinot.desktop desktop-file-validate v0.16 says that category Filesystem also requires System. 
------------------------------------------------------------------------ r1756 | fabricecolin | 2011-01-06 12:37:25 +0100 (Thu, 06 Jan 2011) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.1 M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-search.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/README M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Imminent 0.97 release. ------------------------------------------------------------------------ r1755 | fabricecolin | 2011-01-06 12:22:56 +0100 (Thu, 06 Jan 2011) | 2 lines Changed paths: M /trunk/IndexSearch/DBusIndex.cpp M /trunk/IndexSearch/WebEngine.cpp M /trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/TextConverter.cpp M /trunk/Utils/DocumentInfo.cpp Compiler warning fixes. ------------------------------------------------------------------------ r1754 | fabricecolin | 2011-01-06 11:20:12 +0100 (Thu, 06 Jan 2011) | 2 lines Changed paths: M /trunk/po/ru.po Full Russian translation by Nikolay Kachanov. ------------------------------------------------------------------------ r1753 | fabricecolin | 2011-01-04 17:43:09 +0100 (Tue, 04 Jan 2011) | 2 lines Changed paths: M /trunk/po/es.po Update by Fitoschido. ------------------------------------------------------------------------ r1752 | fabricecolin | 2010-12-31 09:45:04 +0100 (Fri, 31 Dec 2010) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/ru.po M /trunk/po/zh_CN.po Updates from Fitoschido, Nikolay Kachanov and mike2718. ------------------------------------------------------------------------ r1751 | fabricecolin | 2010-12-20 09:53:13 +0100 (Mon, 20 Dec 2010) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/ru.po Russian translation update and valuable feedback from Nikolay Kachanov. ------------------------------------------------------------------------ r1750 | fabricecolin | 2010-12-20 09:42:55 +0100 (Mon, 20 Dec 2010) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp Less DEBUG. 
------------------------------------------------------------------------ r1749 | fabricecolin | 2010-12-20 09:42:38 +0100 (Mon, 20 Dec 2010) | 6 lines Changed paths: M /trunk/README M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/prefsWindow.cc M /trunk/UI/GTK2/src/queryDialog_glade.cc The UI checks for the environment variable PINOT_MAXIMUM_QUERY_RESULTS. This overrides the number of results returned by queries run through the UI's Query field as well as the number of results initially set for new stored queries, which is now capped to 1000. The UI calls mustQuit() on exit. ------------------------------------------------------------------------ r1748 | fabricecolin | 2010-12-15 14:29:36 +0100 (Wed, 15 Dec 2010) | 2 lines Changed paths: M /trunk/FAQ M /trunk/README Updates about env vars, DeskbarApplet, libtextcat. ------------------------------------------------------------------------ r1747 | fabricecolin | 2010-12-15 14:26:46 +0100 (Wed, 15 Dec 2010) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/zh_CN.po Simplified Chinese translation update by mike2718. ------------------------------------------------------------------------ r1746 | fabricecolin | 2010-12-15 14:24:41 +0100 (Wed, 15 Dec 2010) | 3 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/FilterUtils.h If the environment variable PINOT_MAXIMUM_NESTED_SIZE is set, feed its value to filters, as property MAXIMUM_NESTED_SIZE. ------------------------------------------------------------------------ r1745 | fabricecolin | 2010-12-15 14:19:48 +0100 (Wed, 15 Dec 2010) | 5 lines Changed paths: M /trunk/IndexSearch/Xapian/AbstractGenerator.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp Index components of terms that include dots separately, in line with how the QueryParser processes such terms in queries. 
When generating abstracts, we may find several terms at the same position, in which case the shortest is preferred. ------------------------------------------------------------------------ r1744 | fabricecolin | 2010-12-14 15:00:43 +0100 (Tue, 14 Dec 2010) | 5 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.cpp M /trunk/Core/pinot-index.cpp Both call mustQuit() when signaled; simply quitting the main loop isn't enough when running in single thread mode. pinot-index's IndexingState provides a simple, queue-less queue_index(), attempts to reclaim memory when threads end. ------------------------------------------------------------------------ r1743 | fabricecolin | 2010-12-14 14:27:33 +0100 (Tue, 14 Dec 2010) | 3 lines Changed paths: M /trunk/Core/WorkerThreads.cpp M /trunk/Core/WorkerThreads.h Added ThreadsManager::mustQuit() for applications to signal that quitting is necessary and try and stop threads. ------------------------------------------------------------------------ r1742 | fabricecolin | 2010-12-14 14:21:57 +0100 (Tue, 14 Dec 2010) | 2 lines Changed paths: M /trunk/Utils/Document.cpp Don't try and map more than 2Gb. Use shared mappings. ------------------------------------------------------------------------ r1741 | fabricecolin | 2010-12-10 03:15:33 +0100 (Fri, 10 Dec 2010) | 2 lines Changed paths: M /trunk/Utils/Memory.cpp M /trunk/Utils/Memory.h Obsoleted our custom memory pool class, use boost::singleton_pool if available. 
------------------------------------------------------------------------ r1740 | fabricecolin | 2010-12-10 03:13:32 +0100 (Fri, 10 Dec 2010) | 4 lines Changed paths: M /trunk/Core/DaemonState.cpp M /trunk/Core/ServerThreads.cpp M /trunk/Core/ServerThreads.h M /trunk/Core/WorkerThreads.cpp M /trunk/Core/WorkerThreads.h M /trunk/Core/pinot-index.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/prefsWindow.cc M /trunk/UI/GTK2/src/prefsWindow.hh M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.hh The maximum number of indexing threads defaults to 1, can still be controlled globally by PINOT_MAXIMUM_INDEX_THREADS. Crawler and DirectoryScanner threads index documents inline by default, don't delegate to an Indexing thread. ------------------------------------------------------------------------ r1739 | fabricecolin | 2010-12-10 03:07:46 +0100 (Fri, 10 Dec 2010) | 2 lines Changed paths: M /trunk/Makefile.am M /trunk/pinot.spec.in Leave the old DeskBar handlers in libdir. ------------------------------------------------------------------------ r1738 | fabricecolin | 2010-12-08 16:03:04 +0100 (Wed, 08 Dec 2010) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp Remove dots at the end of terms that don't look like acronyms. ------------------------------------------------------------------------ r1737 | fabricecolin | 2010-12-06 16:47:19 +0100 (Mon, 06 Dec 2010) | 3 lines Changed paths: M /trunk/Makefile.am M /trunk/pinot.spec.in Moved the deskbar plugins to /usr/libexec, where they should have been since version 2_27_91 ! ------------------------------------------------------------------------ r1736 | fabricecolin | 2010-12-06 14:24:41 +0100 (Mon, 06 Dec 2010) | 4 lines Changed paths: M /trunk/README M /trunk/textcat_conf.txt README's reset of historical data bit ignored ActionQueue. Brought textcat_conf.txt in line with how libtextcat 2.x is packaged on most platforms. 
------------------------------------------------------------------------ r1735 | fabricecolin | 2010-12-06 14:21:39 +0100 (Mon, 06 Dec 2010) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/es.po M /trunk/po/it.po Spanish and Italian translation updates by Fitoschido and Simone Sandri. ------------------------------------------------------------------------ r1734 | fabricecolin | 2010-10-25 13:41:55 +0200 (Mon, 25 Oct 2010) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/pt.po Update by Almufadado. ------------------------------------------------------------------------ r1733 | fabricecolin | 2010-10-13 12:58:55 +0200 (Wed, 13 Oct 2010) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/it.po M /trunk/po/ru.po Italian and Russian updates by Simone Sandri and Alexander Zinin. ------------------------------------------------------------------------ r1732 | fabricecolin | 2010-10-01 16:29:52 +0200 (Fri, 01 Oct 2010) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/it.po Italian translation update by Davide Vidal. ------------------------------------------------------------------------ r1731 | fabricecolin | 2010-09-10 18:15:04 +0200 (Fri, 10 Sep 2010) | 4 lines Changed paths: M /trunk/UI/GTK2/src/Notebook.cc M /trunk/UI/GTK2/src/Notebook.hh M /trunk/UI/GTK2/src/mainWindow.cc When a spelling suggestion is available, don't connect to the Yes button's signal multiple times, this caused the same revised query to be shown several times. ------------------------------------------------------------------------ r1730 | fabricecolin | 2010-09-10 13:51:42 +0200 (Fri, 10 Sep 2010) | 2 lines Changed paths: M /trunk/po/es.po Updated by Matias Fonzo. ------------------------------------------------------------------------ r1729 | fabricecolin | 2010-09-02 14:43:31 +0200 (Thu, 02 Sep 2010) | 2 lines Changed paths: M /trunk/po/pt_BR.po Update by feen. 
------------------------------------------------------------------------ r1728 | fabricecolin | 2010-08-16 17:34:09 +0200 (Mon, 16 Aug 2010) | 2 lines Changed paths: M /trunk/po/ja.po Update to the Japanese translation by Mizuki-san. ------------------------------------------------------------------------ r1727 | fabricecolin | 2010-07-28 16:09:50 +0200 (Wed, 28 Jul 2010) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/po/es.po M /trunk/po/nl.po Spanish translation by Juan Miguel Boyero Corral. Dutch translation by Dirk Roos. ------------------------------------------------------------------------ r1726 | fabricecolin | 2010-07-15 15:04:57 +0200 (Thu, 15 Jul 2010) | 2 lines Changed paths: M /trunk/Core/WorkerThreads.cpp Skip symlinks if they are blacklisted, not if they aren't !!! ------------------------------------------------------------------------ r1724 | fabricecolin | 2010-07-12 16:23:52 +0200 (Mon, 12 Jul 2010) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Change log and yet another update for the po's... ------------------------------------------------------------------------ r1723 | fabricecolin | 2010-07-12 16:08:59 +0200 (Mon, 12 Jul 2010) | 2 lines Changed paths: M /trunk/ChangeLog-dijon M /trunk/NEWS M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po News for 0.96. ------------------------------------------------------------------------ r1722 | fabricecolin | 2010-07-12 15:55:31 +0200 (Mon, 12 Jul 2010) | 2 lines Changed paths: M /trunk/Core/PinotSettings.cpp Removed unused variable. 
------------------------------------------------------------------------ r1721 | fabricecolin | 2010-07-11 09:57:41 +0200 (Sun, 11 Jul 2010) | 2 lines Changed paths: M /trunk/ChangeLog-dijon M /trunk/README M /trunk/TODO M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Current po's and misc files. ------------------------------------------------------------------------ r1720 | fabricecolin | 2010-07-01 12:34:37 +0200 (Thu, 01 Jul 2010) | 3 lines Changed paths: M /trunk/po/fr.po M /trunk/po/he.po Completed fr.po, added missing accents etc... Updated he.po by Yaron. ------------------------------------------------------------------------ r1719 | fabricecolin | 2010-07-01 12:32:32 +0200 (Thu, 01 Jul 2010) | 2 lines Changed paths: M /trunk/Core/pinot-search.1 M /trunk/Core/pinot-search.cpp M /trunk/IndexSearch/Makefile.am M /trunk/IndexSearch/QueryProperties.cpp M /trunk/IndexSearch/QueryProperties.h M /trunk/IndexSearch/Xapian/Makefile.am M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/Makefile.am M /trunk/pinot.spec.in Dropped Xesam support. ------------------------------------------------------------------------ r1718 | fabricecolin | 2010-07-01 12:28:57 +0200 (Thu, 01 Jul 2010) | 3 lines Changed paths: M /trunk/configure.in Dropped option --enable-xesam-glib, don't build IndexSearch/XesamGLib. Look for dlopen() in libc if not found in libdl (helps with build on BSD). ------------------------------------------------------------------------ r1717 | fabricecolin | 2010-07-01 12:26:40 +0200 (Thu, 01 Jul 2010) | 2 lines Changed paths: M /trunk/Core/DaemonState.cpp FreeBSD build patch by Thierry Thomas. 
------------------------------------------------------------------------ r1716 | fabricecolin | 2010-06-29 16:06:52 +0200 (Tue, 29 Jun 2010) | 2 lines Changed paths: M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot.cc Minor fixes. ------------------------------------------------------------------------ r1715 | fabricecolin | 2010-06-27 22:47:21 +0200 (Sun, 27 Jun 2010) | 3 lines Changed paths: M /trunk/README M /trunk/configure.in Be smart, check for gmime 2.4 first then 2.5. This partially undoes SVN r1705. ------------------------------------------------------------------------ r1714 | fabricecolin | 2010-06-27 14:01:01 +0200 (Sun, 27 Jun 2010) | 3 lines Changed paths: M /trunk/SQL/ActionQueue.cpp M /trunk/SQL/ActionQueue.h Prepare more statements. New method ActionQueue::deleteItems(). ------------------------------------------------------------------------ r1713 | fabricecolin | 2010-06-27 13:11:56 +0200 (Sun, 27 Jun 2010) | 4 lines Changed paths: M /trunk/AUTHORS A /trunk/ltmain.sh Added ltmain.sh to SVN so that everybody can enjoy the changes for --as-needed as used in the Debian package, contributed by Jonas Smedegaard. Also mention Jens Wilhelm Wulf's feedback. ------------------------------------------------------------------------ r1712 | fabricecolin | 2010-06-24 11:08:05 +0200 (Thu, 24 Jun 2010) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Current translations. 
------------------------------------------------------------------------ r1711 | fabricecolin | 2010-06-24 09:46:06 +0200 (Thu, 24 Jun 2010) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.1 M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-search.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Bumped version number to 0.96. ------------------------------------------------------------------------ r1710 | fabricecolin | 2010-06-20 10:13:58 +0200 (Sun, 20 Jun 2010) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.cpp Support for upower. ------------------------------------------------------------------------ r1709 | fabricecolin | 2010-06-20 10:11:12 +0200 (Sun, 20 Jun 2010) | 2 lines Changed paths: M /trunk/SQL/SQLiteBase.cpp Fixed executeSimpleStatement(). ------------------------------------------------------------------------ r1708 | fabricecolin | 2010-06-19 11:31:00 +0200 (Sat, 19 Jun 2010) | 2 lines Changed paths: M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/pt_BR.po M /trunk/po/zh_CN.po Updates from verdy_p, Yaron, andbelo and Eleanor Chen respectively. ------------------------------------------------------------------------ r1707 | fabricecolin | 2010-06-19 11:30:07 +0200 (Sat, 19 Jun 2010) | 2 lines Changed paths: M /trunk/Core/WorkerThreads.cpp Check symlinks against the blacklist. ------------------------------------------------------------------------ r1706 | fabricecolin | 2010-06-19 11:18:46 +0200 (Sat, 19 Jun 2010) | 2 lines Changed paths: M /trunk/po/POTFILES.in Added SQLiteBase.cpp to the list. ------------------------------------------------------------------------ r1705 | fabricecolin | 2010-06-19 10:59:58 +0200 (Sat, 19 Jun 2010) | 4 lines Changed paths: M /trunk/README M /trunk/configure.in Added call to g_type_init() in the test program for GIO sniffing. Set gmime dependency to "gmime-2.6 >= 2.5". Link with -ldl. 
------------------------------------------------------------------------ r1704 | fabricecolin | 2010-06-19 10:53:48 +0200 (Sat, 19 Jun 2010) | 2 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h Prepare a lot more statements. ------------------------------------------------------------------------ r1703 | fabricecolin | 2010-06-19 10:53:13 +0200 (Sat, 19 Jun 2010) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Extended about box copyright to 2010. Launch preferences with pinot -p in case the pinot-prefs symlink doesn't exist. ------------------------------------------------------------------------ r1702 | fabricecolin | 2010-06-15 18:12:17 +0200 (Tue, 15 Jun 2010) | 4 lines Changed paths: M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h Slightly better transaction support, error reporting. Clear compiled statements on close(). Removed _USE_VSNPRINTF conditioned code. ------------------------------------------------------------------------ r1701 | fabricecolin | 2010-06-12 06:37:51 +0200 (Sat, 12 Jun 2010) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.cpp M /trunk/Core/pinot-index.cpp M /trunk/UI/GTK2/src/pinot.cc Prefer Chert to Flint. ------------------------------------------------------------------------ r1700 | fabricecolin | 2010-06-12 06:06:41 +0200 (Sat, 12 Jun 2010) | 3 lines Changed paths: M /trunk/README Make it clear operators should be upper-case. "hu", "ro" and "tr" were missing from the list of language codes. ------------------------------------------------------------------------ r1699 | fabricecolin | 2010-06-07 17:40:23 +0200 (Mon, 07 Jun 2010) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/LanguageDetector.cpp Check the handle before attempting to close it. 
------------------------------------------------------------------------ r1698 | fabricecolin | 2010-03-30 15:05:52 +0200 (Tue, 30 Mar 2010) | 3 lines Changed paths: M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h Fix for INSERT, UPDATE and DELETE as prepared statements. Don't be greedy, if the database is busy, sleep for a short while! ------------------------------------------------------------------------ r1697 | fabricecolin | 2010-03-21 14:22:53 +0100 (Sun, 21 Mar 2010) | 2 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/SQL/SQLiteBase.cpp M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h Tweaks and missing headers. ------------------------------------------------------------------------ r1696 | fabricecolin | 2010-03-21 14:20:43 +0100 (Sun, 21 Mar 2010) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/fr.po M /trunk/po/pt_BR.po fr and pt_BR updates by verdy_p and andbelo. ------------------------------------------------------------------------ r1695 | fabricecolin | 2009-12-06 09:50:38 +0100 (Sun, 06 Dec 2009) | 2 lines Changed paths: M /trunk/SQL/SQLiteBase.cpp Less DEBUG. ------------------------------------------------------------------------ r1694 | fabricecolin | 2009-12-06 09:50:05 +0100 (Sun, 06 Dec 2009) | 2 lines Changed paths: M /trunk/Core/ServerThreads.cpp M /trunk/Core/ServerThreads.h M /trunk/Core/pinot-dbus-daemon.cpp Battery status can now be obtained from DeviceKit-power. ------------------------------------------------------------------------ r1692 | fabricecolin | 2009-11-14 05:10:40 +0100 (Sat, 14 Nov 2009) | 2 lines Changed paths: M /trunk/ChangeLog-dijon M /trunk/NEWS M /trunk/po/es.po M /trunk/po/pt.po Releasing 0.95 today. 
------------------------------------------------------------------------ r1691 | fabricecolin | 2009-11-10 14:44:54 +0100 (Tue, 10 Nov 2009) | 4 lines Changed paths: M /trunk/AUTHORS M /trunk/configure.in M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Bumped version number to 0.95. Included language updates from Fabian Affolter and Marco Jahn (German), Ddorda (Hebrew), Bernardo Lopes (Portuguese) and DiegoJ (Spanish). ------------------------------------------------------------------------ r1690 | fabricecolin | 2009-11-09 14:24:41 +0100 (Mon, 09 Nov 2009) | 2 lines Changed paths: M /trunk/Core/Makefile.am Build the daemon as pinot-daemon if D-Bus support is turned off. ------------------------------------------------------------------------ r1689 | fabricecolin | 2009-11-05 14:36:39 +0100 (Thu, 05 Nov 2009) | 2 lines Changed paths: M /trunk/configure.in Last chunk of the OpenBSD patches, confirmed not to affect other ports. ------------------------------------------------------------------------ r1688 | fabricecolin | 2009-11-04 13:11:54 +0100 (Wed, 04 Nov 2009) | 2 lines Changed paths: D /trunk/IndexSearch/Plugins/Exalead.src D /trunk/IndexSearch/Plugins/IOIDescription.xml These engines have fallen off the face of the Earth :-) ------------------------------------------------------------------------ r1687 | fabricecolin | 2009-11-04 13:11:05 +0100 (Wed, 04 Nov 2009) | 2 lines Changed paths: M /trunk/IndexSearch/Plugins/Bing.src Fixed results parsing for Bing. ------------------------------------------------------------------------ r1686 | fabricecolin | 2009-11-02 15:25:09 +0100 (Mon, 02 Nov 2009) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.cpp Cosmetic edit. 
------------------------------------------------------------------------ r1685 | fabricecolin | 2009-11-02 15:24:12 +0100 (Mon, 02 Nov 2009) | 2 lines Changed paths: M /trunk/po/POTFILES.in PinotUtils was moved. ------------------------------------------------------------------------ r1684 | fabricecolin | 2009-11-02 15:22:26 +0100 (Mon, 02 Nov 2009) | 4 lines Changed paths: M /trunk/Core/DaemonState.cpp M /trunk/Core/DaemonState.h M /trunk/Core/ServerThreads.cpp M /trunk/Core/ServerThreads.h M /trunk/Core/WorkerThreads.cpp M /trunk/Core/WorkerThreads.h DaemonState and MonitorThread use one CrawlHistory object from start to finish, ThreadsManager one ActionQueue object. CrawlerThread's cache applies to all updates. ------------------------------------------------------------------------ r1683 | fabricecolin | 2009-11-02 15:17:41 +0100 (Mon, 02 Nov 2009) | 2 lines Changed paths: M /trunk/SQL/ActionQueue.cpp M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h Use prepared statements for common queries, transactions for mass updates. ------------------------------------------------------------------------ r1682 | fabricecolin | 2009-11-02 02:35:55 +0100 (Mon, 02 Nov 2009) | 2 lines Changed paths: M /trunk/Core/Makefile.am PinotUtils was moved out. ------------------------------------------------------------------------ r1681 | fabricecolin | 2009-11-02 02:35:26 +0100 (Mon, 02 Nov 2009) | 2 lines Changed paths: M /trunk/Makefile.am Install D-Bus related stuff if HAVE_DBUS is set. ------------------------------------------------------------------------ r1680 | fabricecolin | 2009-11-01 15:48:44 +0100 (Sun, 01 Nov 2009) | 2 lines Changed paths: M /trunk/SQL/SQLDB.h M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h Support for prepared statements, transactions. 
------------------------------------------------------------------------ r1679 | fabricecolin | 2009-10-25 09:33:44 +0100 (Sun, 25 Oct 2009) | 2 lines Changed paths: M /trunk/FAQ Summarize all environment variables that can be tuned to lower memory usage. ------------------------------------------------------------------------ r1678 | fabricecolin | 2009-10-25 09:31:57 +0100 (Sun, 25 Oct 2009) | 2 lines Changed paths: M /trunk/po/de.po Update from Fabian Affolter. ------------------------------------------------------------------------ r1677 | fabricecolin | 2009-10-25 09:14:47 +0100 (Sun, 25 Oct 2009) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianDatabase.cpp M /trunk/IndexSearch/Xapian/XapianDatabase.h Removed error checking on mutex. ------------------------------------------------------------------------ r1676 | fabricecolin | 2009-10-25 09:13:51 +0100 (Sun, 25 Oct 2009) | 6 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/Xapian/XapianIndex.h XPATH/path is a probabilistic term prefix, and path terms should be generated with positional information. Get the tokenizer to break on spaces only. With the recent changes to the CJKVTokenizer, this should preserve dots, eg dots in acronyms and version numbers, similarly to what the QueryParser does. ------------------------------------------------------------------------ r1675 | fabricecolin | 2009-10-25 05:15:01 +0100 (Sun, 25 Oct 2009) | 2 lines Changed paths: M /trunk/po/nl.po Update from JW. ------------------------------------------------------------------------ r1674 | fabricecolin | 2009-10-21 16:06:19 +0200 (Wed, 21 Oct 2009) | 2 lines Changed paths: M /trunk/pinot-dbus-daemon.desktop Added Encoding and Version. 
------------------------------------------------------------------------ r1673 | fabricecolin | 2009-10-21 15:22:04 +0200 (Wed, 21 Oct 2009) | 4 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/Xapian/XapianEngine.h Use the Query API to limit the query to a set of documents (Search This For) instead of mangling the query string. Escape and hash terms with prefix "path". ------------------------------------------------------------------------ r1672 | fabricecolin | 2009-10-10 13:52:09 +0200 (Sat, 10 Oct 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc If gtkmm >= 2.16 is available, replace the find button with a QueryEntry icon. ------------------------------------------------------------------------ r1671 | fabricecolin | 2009-09-19 06:45:58 +0200 (Sat, 19 Sep 2009) | 2 lines Changed paths: D /trunk/Core/PinotUtils.cpp D /trunk/Core/PinotUtils.h M /trunk/UI/GTK2/src/EnginesTree.cc M /trunk/UI/GTK2/src/IndexPage.cc M /trunk/UI/GTK2/src/Makefile.am M /trunk/UI/GTK2/src/Notebook.cc A /trunk/UI/GTK2/src/PinotUtils.cc (from /trunk/Core/PinotUtils.cpp:1665) A /trunk/UI/GTK2/src/PinotUtils.hh (from /trunk/Core/PinotUtils.h:1665) M /trunk/UI/GTK2/src/ResultsTree.cc M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/indexDialog.cc M /trunk/UI/GTK2/src/launcherDialog.cc M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/prefsWindow.cc M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.cc Moved PinotUtils out of Core. ------------------------------------------------------------------------ r1670 | fabricecolin | 2009-08-30 03:07:46 +0200 (Sun, 30 Aug 2009) | 2 lines Changed paths: M /trunk/SQL/SQLiteBase.cpp Show the name of the database when a statement fails. 
------------------------------------------------------------------------ r1669 | fabricecolin | 2009-08-30 03:07:01 +0200 (Sun, 30 Aug 2009) | 2 lines Changed paths: M /trunk/Core/DaemonState.cpp M /trunk/Core/pinot-dbus-daemon.cpp M /trunk/UI/GTK2/src/pinot.cc Build those when HAVE_DBUS isn't set. ------------------------------------------------------------------------ r1668 | fabricecolin | 2009-08-30 03:04:33 +0200 (Sun, 30 Aug 2009) | 2 lines Changed paths: M /trunk/Core/UniqueApplication.cpp Cleanup. ------------------------------------------------------------------------ r1667 | fabricecolin | 2009-08-29 07:33:42 +0200 (Sat, 29 Aug 2009) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/Core/DaemonState.cpp M /trunk/Core/Makefile.am M /trunk/Core/WorkerThreads.cpp M /trunk/IndexSearch/Makefile.am M /trunk/IndexSearch/Xapian/Makefile.am M /trunk/README M /trunk/Tokenize/Makefile.am M /trunk/UI/GTK2/src/Makefile.am OpenBSD port patches, from Antoine Jacoutot. See http://www.openbsd.org/cgi-bin/cvsweb/ports/x11/pinot/patches/ ------------------------------------------------------------------------ r1666 | fabricecolin | 2009-08-29 07:23:09 +0200 (Sat, 29 Aug 2009) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.cpp M /trunk/UI/GTK2/src/pinot.cc Redirect clog too. ------------------------------------------------------------------------ r1665 | fabricecolin | 2009-08-15 12:11:22 +0200 (Sat, 15 Aug 2009) | 2 lines Changed paths: M /trunk/SQL/ActionQueue.cpp M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/MetaDataBackup.cpp M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/SQLDB.cpp M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/ViewHistory.cpp M /trunk/Utils/CommandLine.cpp M /trunk/Utils/Document.cpp M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/Memory.cpp M /trunk/Utils/TimeConverter.cpp M /trunk/Utils/Url.cpp Prefer clog to cout and cerr. 
------------------------------------------------------------------------ r1664 | fabricecolin | 2009-07-02 15:37:20 +0200 (Thu, 02 Jul 2009) | 2 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/SQLDB.cpp M /trunk/SQL/SQLDB.h M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h Allow opening databases in read-only mode. ------------------------------------------------------------------------ r1663 | fabricecolin | 2009-07-01 17:22:38 +0200 (Wed, 01 Jul 2009) | 2 lines Changed paths: M /trunk/Utils/TimeConverter.cpp When requesting GM time, the timezone should be GMT ! ------------------------------------------------------------------------ r1662 | fabricecolin | 2009-06-29 17:08:05 +0200 (Mon, 29 Jun 2009) | 2 lines Changed paths: M /trunk/po/fr.po Translation update by Thierry Thomas. ------------------------------------------------------------------------ r1660 | fabricecolin | 2009-06-27 05:36:22 +0200 (Sat, 27 Jun 2009) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/ChangeLog-dijon Current logs. ------------------------------------------------------------------------ r1659 | fabricecolin | 2009-06-27 05:31:13 +0200 (Sat, 27 Jun 2009) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Updated po's. ------------------------------------------------------------------------ r1658 | fabricecolin | 2009-06-27 05:30:29 +0200 (Sat, 27 Jun 2009) | 2 lines Changed paths: M /trunk/NEWS Changes in this release. ------------------------------------------------------------------------ r1657 | fabricecolin | 2009-06-27 05:27:16 +0200 (Sat, 27 Jun 2009) | 2 lines Changed paths: M /trunk/TODO Current TODO list. 
------------------------------------------------------------------------ r1656 | fabricecolin | 2009-06-27 05:26:08 +0200 (Sat, 27 Jun 2009) | 2 lines Changed paths: M /trunk/IndexSearch/Plugins/Freshmeat.src Fixed results extraction. ------------------------------------------------------------------------ r1655 | fabricecolin | 2009-06-27 04:23:33 +0200 (Sat, 27 Jun 2009) | 2 lines Changed paths: M /trunk/configure.in Bumped release number to 0.94, made GMime 2.4 default. ------------------------------------------------------------------------ r1654 | fabricecolin | 2009-06-27 04:22:22 +0200 (Sat, 27 Jun 2009) | 2 lines Changed paths: M /trunk/pinot-dbus-daemon.desktop Cosmetic changes. ------------------------------------------------------------------------ r1653 | fabricecolin | 2009-06-27 03:50:47 +0200 (Sat, 27 Jun 2009) | 5 lines Changed paths: M /trunk/Core/DaemonState.cpp M /trunk/Core/DaemonState.h M /trunk/Core/ServerThreads.cpp M /trunk/Core/ServerThreads.h Metadata needs to be restored en-bloc at the end of indexing (no queued actions, no threads) and crawling (empty crawl queue). Previously it was restored after crawling a directory but potentially before all documents to restore have been indexed. ------------------------------------------------------------------------ r1652 | fabricecolin | 2009-06-23 14:04:00 +0200 (Tue, 23 Jun 2009) | 2 lines Changed paths: M /trunk/SQL/MetaDataBackup.cpp getItem() would skip labels and fail if there were no fields to deserialize. ------------------------------------------------------------------------ r1651 | fabricecolin | 2009-06-23 14:01:28 +0200 (Tue, 23 Jun 2009) | 2 lines Changed paths: M /trunk/Core/ServerThreads.cpp Retrieve and restore labels separately from other metadata. 
------------------------------------------------------------------------ r1650 | fabricecolin | 2009-06-22 14:46:41 +0200 (Mon, 22 Jun 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Keep track of temporary files created for viewing documents. ------------------------------------------------------------------------ r1649 | fabricecolin | 2009-06-21 15:40:10 +0200 (Sun, 21 Jun 2009) | 2 lines Changed paths: M /trunk/README Document "inurl", clarify document types and build options. ------------------------------------------------------------------------ r1648 | fabricecolin | 2009-06-21 14:19:34 +0200 (Sun, 21 Jun 2009) | 2 lines Changed paths: M /trunk/Core/PinotSettings.cpp Stored query Me should do a phrase search. ------------------------------------------------------------------------ r1647 | fabricecolin | 2009-06-21 12:08:08 +0200 (Sun, 21 Jun 2009) | 2 lines Changed paths: M /trunk/pinot.spec.in Package libarchivefilter only if _with_libarchive is set. ------------------------------------------------------------------------ r1646 | fabricecolin | 2009-06-21 12:06:37 +0200 (Sun, 21 Jun 2009) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/UI/GTK2/src/queryDialog.cc Map the search filter "inurl" to the XFILE prefix to allow finding files embedded in a mailbox/archive at a given URL. ------------------------------------------------------------------------ r1645 | fabricecolin | 2009-06-15 06:29:51 +0200 (Mon, 15 Jun 2009) | 2 lines Changed paths: M /trunk/Core/pinot-search.cpp Show the total results estimate. ------------------------------------------------------------------------ r1644 | fabricecolin | 2009-06-15 06:29:00 +0200 (Mon, 15 Jun 2009) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.cpp When the index is reset, flush and signal. 
------------------------------------------------------------------------ r1643 | fabricecolin | 2009-06-15 06:25:13 +0200 (Mon, 15 Jun 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc More On Like This on a Web result didn't work when the queue was empty. ------------------------------------------------------------------------ r1642 | fabricecolin | 2009-06-15 06:13:46 +0200 (Mon, 15 Jun 2009) | 2 lines Changed paths: A /trunk/IndexSearch/Plugins/Bing.src D /trunk/IndexSearch/Plugins/MSN.src Bing replaces MSN search. ------------------------------------------------------------------------ r1641 | fabricecolin | 2009-06-07 09:41:28 +0200 (Sun, 07 Jun 2009) | 2 lines Changed paths: M /trunk/Core/pinot-index.cpp In check mode, if no document has the given URL, look for embedded documents. ------------------------------------------------------------------------ r1640 | fabricecolin | 2009-05-31 08:07:05 +0200 (Sun, 31 May 2009) | 7 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp Filters that return file names as titles (eg those that deal with archives) may set mimetype to SCANTITLE. The content type is then checked with MIMEScanner::scanFile(), with scanData() as fall-back. Oddly enough, this gives more accurate results for types like tar.gz : instead of being identified as application/x-gzip, they are identified as application/x-compressed-tar just like they would if they were regular files. ------------------------------------------------------------------------ r1639 | fabricecolin | 2009-05-30 06:39:25 +0200 (Sat, 30 May 2009) | 3 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc Expect an IndexFlushed signal from the daemon upon which the index should be reopened. ------------------------------------------------------------------------ r1638 | fabricecolin | 2009-05-30 06:15:47 +0200 (Sat, 30 May 2009) | 2 lines Changed paths: M /trunk/Core/pinot-index.cpp M /trunk/Core/pinot-search.cpp Display the full URLs with ipath. 
------------------------------------------------------------------------ r1637 | fabricecolin | 2009-05-30 06:15:05 +0200 (Sat, 30 May 2009) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianDatabase.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp XapianIndex::reopen() didn't actually reopen anything. Removed obsolete comment in XapianDatabase. ------------------------------------------------------------------------ r1636 | fabricecolin | 2009-05-30 06:13:19 +0200 (Sat, 30 May 2009) | 3 lines Changed paths: M /trunk/Core/DaemonState.cpp M /trunk/Core/DaemonState.h M /trunk/Core/ServerThreads.cpp M /trunk/Core/ServerThreads.h M /trunk/Core/pinot-dbus-daemon.cpp M /trunk/Core/pinot-dbus-daemon.xml Send the signal IndexFlushed when the index is flushed. Synced with changes to DBusIndex. Some cleanup. ------------------------------------------------------------------------ r1635 | fabricecolin | 2009-05-24 11:25:15 +0200 (Sun, 24 May 2009) | 2 lines Changed paths: M /trunk/configure.in Set _FILE_OFFSET_BITS=64. ------------------------------------------------------------------------ r1634 | fabricecolin | 2009-05-24 06:26:59 +0200 (Sun, 24 May 2009) | 2 lines Changed paths: M /trunk/IndexSearch/DBusIndex.cpp M /trunk/IndexSearch/DBusIndex.h Added defines for the D-Bus service name and object path. ------------------------------------------------------------------------ r1633 | fabricecolin | 2009-05-24 06:25:39 +0200 (Sun, 24 May 2009) | 2 lines Changed paths: M /trunk/Utils/Document.cpp Set the close-on-exec flag, either on open() or after open() with fcntl(). ------------------------------------------------------------------------ r1632 | fabricecolin | 2009-05-24 06:24:04 +0200 (Sun, 24 May 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Show properties of external indices' documents read-only. 
------------------------------------------------------------------------ r1631 | fabricecolin | 2009-05-24 05:56:53 +0200 (Sun, 24 May 2009) | 2 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/TextConverter.cpp M /trunk/Tokenize/TextConverter.h TextConverter now dstring-enabled. ------------------------------------------------------------------------ r1630 | fabricecolin | 2009-04-21 15:28:44 +0200 (Tue, 21 Apr 2009) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/IndexSearch/Xapian/Makefile.am M /trunk/Tokenize/Makefile.am Funda Wang's linkage patch from http://svn.mandriva.com/svn/packages/cooker/pinot/current/SOURCES/pinot-0.93-linkage.patch ------------------------------------------------------------------------ r1628 | fabricecolin | 2009-04-13 11:56:43 +0200 (Mon, 13 Apr 2009) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/ChangeLog-dijon Current logs. ------------------------------------------------------------------------ r1627 | fabricecolin | 2009-04-13 11:52:11 +0200 (Mon, 13 Apr 2009) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Updated po's. ------------------------------------------------------------------------ r1626 | fabricecolin | 2009-04-13 11:51:34 +0200 (Mon, 13 Apr 2009) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.1 M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-search.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/NEWS M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Preparing the 0.93 release. ------------------------------------------------------------------------ r1625 | fabricecolin | 2009-04-13 11:41:28 +0200 (Mon, 13 Apr 2009) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp Don't try and guesstimate where we are at, keep generating terms. 
------------------------------------------------------------------------ r1624 | fabricecolin | 2009-04-13 05:34:19 +0200 (Mon, 13 Apr 2009) | 2 lines Changed paths: M /trunk/README Clarify how to get dot-directories indexed with symlinks. ------------------------------------------------------------------------ r1623 | fabricecolin | 2009-04-13 05:22:14 +0200 (Mon, 13 Apr 2009) | 5 lines Changed paths: M /trunk/Core/DaemonState.cpp M /trunk/Core/DaemonState.h M /trunk/Core/ServerThreads.cpp M /trunk/Core/ServerThreads.h M /trunk/Core/pinot-dbus-daemon.cpp Unless run in full scan mode, the daemon would reindex all files on every run ! That probably stems from changes made in 0.91. There's no justification for not doing a full scan on every run. All scans are from now on full by default. ------------------------------------------------------------------------ r1621 | fabricecolin | 2009-04-09 17:38:04 +0200 (Thu, 09 Apr 2009) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/ChangeLog-dijon Current logs. ------------------------------------------------------------------------ r1620 | fabricecolin | 2009-04-09 17:34:40 +0200 (Thu, 09 Apr 2009) | 2 lines Changed paths: M /trunk/NEWS M /trunk/TODO Current state. ------------------------------------------------------------------------ r1619 | fabricecolin | 2009-04-09 14:59:15 +0200 (Thu, 09 Apr 2009) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Updated po's. 
------------------------------------------------------------------------ r1618 | fabricecolin | 2009-04-08 14:34:15 +0200 (Wed, 08 Apr 2009) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.1 M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-search.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Bumped version number to 0.92, set as minimum index version. ------------------------------------------------------------------------ r1617 | fabricecolin | 2009-04-07 16:32:55 +0200 (Tue, 07 Apr 2009) | 5 lines Changed paths: M /trunk/README M /trunk/Tokenize/Makefile.am M /trunk/configure.in M /trunk/pinot.spec.in The archive filter causes issues with libarchive 2.5.5 so to be on the safe side it's now enabled only if --enable-libarchive is passed to configure. Version 2.6.2 is recommended. When building RPMs, pass "--with libarchive". ------------------------------------------------------------------------ r1616 | fabricecolin | 2009-04-07 15:50:43 +0200 (Tue, 07 Apr 2009) | 2 lines Changed paths: D /trunk/IndexSearch/Plugins/MozDexDescription.xml Removed. ------------------------------------------------------------------------ r1615 | fabricecolin | 2009-04-06 17:01:09 +0200 (Mon, 06 Apr 2009) | 5 lines Changed paths: M /trunk/FAQ M /trunk/README Updated following changes to the number of indexing threads, memory usage. List libarchive as a dependency. There was no clue as to where the stopwords should be installed and how they are useful. ------------------------------------------------------------------------ r1614 | fabricecolin | 2009-04-06 16:52:07 +0200 (Mon, 06 Apr 2009) | 2 lines Changed paths: M /trunk/Tokenize/Makefile.am M /trunk/pinot.spec.in Build and package libarchivefilter. 
------------------------------------------------------------------------ r1613 | fabricecolin | 2009-04-06 16:50:39 +0200 (Mon, 06 Apr 2009) | 2 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp Don't forget to provide a default title to documents not gone through a filter. ------------------------------------------------------------------------ r1612 | fabricecolin | 2009-04-05 04:22:43 +0200 (Sun, 05 Apr 2009) | 2 lines Changed paths: M /trunk/configure.in The memory pool can be optionally disabled. ------------------------------------------------------------------------ r1611 | fabricecolin | 2009-04-05 04:21:33 +0200 (Sun, 05 Apr 2009) | 4 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cc M /trunk/UI/GTK2/src/UIThreads.cc M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Updated following changes for documents ipath. When a new document is indexed, don't simply append it to the My Web Pages list, refresh the list. ------------------------------------------------------------------------ r1610 | fabricecolin | 2009-04-05 04:19:27 +0200 (Sun, 05 Apr 2009) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp Updated to cope with documents ipath. Don't parse documents data here, let XapianDatabase::recordToProps() do it. ------------------------------------------------------------------------ r1609 | fabricecolin | 2009-04-05 04:16:33 +0200 (Sun, 05 Apr 2009) | 3 lines Changed paths: M /trunk/Core/DaemonState.cpp M /trunk/Core/WorkerThreads.cpp Try and reclaim memory more often, after flushing the index. Some changes to cope with documents ipath. ------------------------------------------------------------------------ r1608 | fabricecolin | 2009-04-05 04:14:08 +0200 (Sun, 05 Apr 2009) | 2 lines Changed paths: M /trunk/pinot.spec.in Removed dependency on libtar. 
------------------------------------------------------------------------ r1607 | fabricecolin | 2009-04-05 04:12:55 +0200 (Sun, 05 Apr 2009) | 2 lines Changed paths: M /trunk/Tokenize/Makefile.am Don't build TarFilter, libtar is unmaintained. ------------------------------------------------------------------------ r1606 | fabricecolin | 2009-04-02 17:31:30 +0200 (Thu, 02 Apr 2009) | 2 lines Changed paths: M /trunk/IndexSearch/FilterWrapper.cpp Safety check. ------------------------------------------------------------------------ r1605 | fabricecolin | 2009-04-02 17:27:47 +0200 (Thu, 02 Apr 2009) | 2 lines Changed paths: M /trunk/IndexSearch/OpenSearchParser.cpp Removed a bunch of superfluous headers. ------------------------------------------------------------------------ r1604 | fabricecolin | 2009-04-02 17:25:40 +0200 (Thu, 02 Apr 2009) | 3 lines Changed paths: M /trunk/Core/PinotSettings.cpp Initialize and cleanup libxml2 when PinotSettings is constructed/destroyed. Added mp4 to the default blacklist, removed Z. ------------------------------------------------------------------------ r1603 | fabricecolin | 2009-03-31 15:04:29 +0200 (Tue, 31 Mar 2009) | 2 lines Changed paths: M /trunk/Collect/DownloaderFactory.cpp M /trunk/Collect/Makefile.am D /trunk/Collect/MboxCollector.cpp D /trunk/Collect/MboxCollector.h Dropped MboxCollector. ------------------------------------------------------------------------ r1602 | fabricecolin | 2009-03-31 15:00:45 +0200 (Tue, 31 Mar 2009) | 2 lines Changed paths: M /trunk/SQL/MetaDataBackup.cpp M /trunk/SQL/MetaDataBackup.h Changes to accommodate ipath. ------------------------------------------------------------------------ r1601 | fabricecolin | 2009-03-28 04:50:41 +0100 (Sat, 28 Mar 2009) | 3 lines Changed paths: M /trunk/IndexSearch/FilterWrapper.cpp M /trunk/Tokenize/TextConverter.cpp M /trunk/Tokenize/TextConverter.h Work-around documents that declare an invalid charset, eg iso_8859_1 instead of iso-8859-1. 
------------------------------------------------------------------------ r1600 | fabricecolin | 2009-03-28 03:46:32 +0100 (Sat, 28 Mar 2009) | 3 lines Changed paths: M /trunk/Utils/Memory.cpp M /trunk/Utils/Memory.h Added getUsage(). In reclaim(), call umem_reap() if umem support is on. ------------------------------------------------------------------------ r1599 | fabricecolin | 2009-03-28 03:36:40 +0100 (Sat, 28 Mar 2009) | 2 lines Changed paths: M /trunk/Collect/FileCollector.cpp If ipath isn't empty, drill down with FilterUtils::reduceDocument(). ------------------------------------------------------------------------ r1598 | fabricecolin | 2009-03-28 03:33:16 +0100 (Sat, 28 Mar 2009) | 2 lines Changed paths: M /trunk/IndexSearch/FilterWrapper.cpp M /trunk/IndexSearch/FilterWrapper.h M /trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/FilterUtils.h Moved filterDocument() and related to FilterUtils. ------------------------------------------------------------------------ r1597 | fabricecolin | 2009-03-25 14:41:00 +0100 (Wed, 25 Mar 2009) | 2 lines Changed paths: M /trunk/IndexSearch/DBusIndex.cpp M /trunk/IndexSearch/Xapian/XapianDatabase.cpp M /trunk/Utils/DocumentInfo.cpp M /trunk/Utils/DocumentInfo.h ipath is a property of DocumentInfo. ------------------------------------------------------------------------ r1596 | fabricecolin | 2009-03-22 15:07:25 +0100 (Sun, 22 Mar 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/Makefile.am Link against libBasicUtils first. ------------------------------------------------------------------------ r1595 | fabricecolin | 2009-03-22 15:06:39 +0100 (Sun, 22 Mar 2009) | 3 lines Changed paths: M /trunk/Core/Makefile.am M /trunk/Core/WorkerThreads.cpp Attempt to reclaim, or rather let the OS reclaim, memory every 1000 threads. Link against libBasicUtils first. 
------------------------------------------------------------------------ r1594 | fabricecolin | 2009-03-22 15:04:23 +0100 (Sun, 22 Mar 2009) | 4 lines Changed paths: M /trunk/IndexSearch/FilterWrapper.cpp M /trunk/Tokenize/FilterUtils.cpp Retrieve content out with Filter::get_content(). Don't set dummy documents when the file is empty or of an unsupported type, just go ahead and index its metadata. ------------------------------------------------------------------------ r1593 | fabricecolin | 2009-03-22 15:01:54 +0100 (Sun, 22 Mar 2009) | 2 lines Changed paths: M /trunk/Tokenize/TextConverter.cpp Clear the output on exception. ------------------------------------------------------------------------ r1592 | fabricecolin | 2009-03-22 15:01:22 +0100 (Sun, 22 Mar 2009) | 2 lines Changed paths: M /trunk/Utils/Memory.h Better include config.h since the whole thing is driven by ifdef's. ------------------------------------------------------------------------ r1591 | fabricecolin | 2009-03-22 02:31:09 +0100 (Sun, 22 Mar 2009) | 3 lines Changed paths: M /trunk/Utils/Document.cpp M /trunk/Utils/Document.h M /trunk/Utils/Makefile.am Document relies on Memory for its internal buffer, uses madvise() if available whenever mmap'ing a file. ------------------------------------------------------------------------ r1590 | fabricecolin | 2009-03-21 12:00:39 +0100 (Sat, 21 Mar 2009) | 5 lines Changed paths: M /trunk/Core/DaemonState.cpp M /trunk/Core/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Set the number of daemon's indexing threads to 2, unless overridden with the environment variable PINOT_MAXIMUM_INDEX_THREADS. Changed flushing again, this time to happen when the queue is empty. This means that the index will be flushed when everything has been indexed. 
------------------------------------------------------------------------ r1589 | fabricecolin | 2009-03-21 06:03:11 +0100 (Sat, 21 Mar 2009) | 8 lines Changed paths: A /trunk/Utils/Memory.cpp A /trunk/Utils/Memory.h M /trunk/configure.in Check for boost/pool/poolfwd.hpp, ext/malloc_allocator.h, mallinfo(), malloc_trim() and madvise(). Memory wraps all this together. It provides dstring, a basic_string class backed by a boost::pool memory pool, and either malloc() or umem as allocators, functions to allocate and free buffers out of the same pool, and a function to let the OS reclaim memory. Support for umem is untested at the moment. ------------------------------------------------------------------------ r1588 | fabricecolin | 2009-03-17 15:37:13 +0100 (Tue, 17 Mar 2009) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/LanguageDetector.cpp M /trunk/IndexSearch/Xapian/LanguageDetector.h M /trunk/IndexSearch/Xapian/XapianDatabaseFactory.h M /trunk/IndexSearch/Xapian/XapianIndex.cpp LanguageDetector is a singleton, and textcat is initialized once. ------------------------------------------------------------------------ r1587 | fabricecolin | 2009-03-11 15:30:17 +0100 (Wed, 11 Mar 2009) | 2 lines Changed paths: M /trunk/Core/pinot-index.cpp Missing include signal.h. Reported by Robert Wotzlaw. ------------------------------------------------------------------------ r1586 | fabricecolin | 2009-03-10 13:08:56 +0100 (Tue, 10 Mar 2009) | 2 lines Changed paths: M /trunk/Tokenize/Makefile.am M /trunk/pinot.spec.in Build the new tar filter by default. This requires libtar >= 1.2.11. ------------------------------------------------------------------------ r1585 | fabricecolin | 2009-03-10 13:07:31 +0100 (Tue, 10 Mar 2009) | 5 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp Feeding the data through a temporary file is the second most preferred way to feed filters as having the data very likely means that the original file already went through one filter. 
There's no point trying to convert to UTF8 stuff that's not plain text. ------------------------------------------------------------------------ r1584 | fabricecolin | 2009-03-07 23:47:58 +0100 (Sat, 07 Mar 2009) | 3 lines Changed paths: M /trunk/Core/pinot-index.cpp Don't start the main loop if the document wasn't queued for indexing, eg because it's black-listed. ------------------------------------------------------------------------ r1582 | fabricecolin | 2009-03-07 04:15:02 +0100 (Sat, 07 Mar 2009) | 2 lines Changed paths: M /trunk/ChangeLog Current ChangeLog. ------------------------------------------------------------------------ r1581 | fabricecolin | 2009-03-07 04:11:11 +0100 (Sat, 07 Mar 2009) | 2 lines Changed paths: M /trunk/NEWS Updated NEWS, set 0.91 release date to today. ------------------------------------------------------------------------ r1580 | fabricecolin | 2009-03-05 15:57:55 +0100 (Thu, 05 Mar 2009) | 2 lines Changed paths: M /trunk/ChangeLog-dijon Current Dijon ChangeLog. ------------------------------------------------------------------------ r1579 | fabricecolin | 2009-03-05 15:30:19 +0100 (Thu, 05 Mar 2009) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Updated translations. ------------------------------------------------------------------------ r1578 | fabricecolin | 2009-03-05 15:27:23 +0100 (Thu, 05 Mar 2009) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.1 M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-search.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/UI/GTK2/src/pinot.1 Updated manual pages. 
------------------------------------------------------------------------ r1577 | fabricecolin | 2009-03-05 15:17:21 +0100 (Thu, 05 Mar 2009) | 2 lines Changed paths: M /trunk/TODO M /trunk/configure.in Bumped version number to 0.91, added tasks to the TODO list. ------------------------------------------------------------------------ r1576 | fabricecolin | 2009-03-05 14:33:26 +0100 (Thu, 05 Mar 2009) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp Use a regex to get to the refresh META tag, rather than HtmlFilter. ------------------------------------------------------------------------ r1575 | fabricecolin | 2009-03-05 14:32:25 +0100 (Thu, 05 Mar 2009) | 2 lines Changed paths: M /trunk/FAQ Give tips on how to reduce memory usage a bit. ------------------------------------------------------------------------ r1574 | fabricecolin | 2009-03-05 14:21:13 +0100 (Thu, 05 Mar 2009) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp TokensIndexer guesstimates where it's at in the original document and doesn't generate terms past 5Mb. ------------------------------------------------------------------------ r1573 | fabricecolin | 2009-03-04 14:59:49 +0100 (Wed, 04 Mar 2009) | 2 lines Changed paths: M /trunk/Core/WorkerThreads.cpp Don't reset m_backgroundThreadsCount. ------------------------------------------------------------------------ r1572 | fabricecolin | 2009-03-01 15:34:20 +0100 (Sun, 01 Mar 2009) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/de.po M /trunk/po/pt.po Updated translations by Gena Haltmair (German) and Flávio Martins (Portuguese). ------------------------------------------------------------------------ r1571 | fabricecolin | 2009-03-01 14:37:34 +0100 (Sun, 01 Mar 2009) | 2 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp Fixed update query in updateItemsStatus(). 
------------------------------------------------------------------------ r1570 | fabricecolin | 2009-03-01 14:36:26 +0100 (Sun, 01 Mar 2009) | 2 lines Changed paths: M /trunk/Utils/DocumentInfo.cpp Missing include, reported by Adel Gadllah, with gcc 4.4. ------------------------------------------------------------------------ r1569 | fabricecolin | 2009-03-01 08:54:00 +0100 (Sun, 01 Mar 2009) | 2 lines Changed paths: M /trunk/Core/OnDiskHandler.cpp M /trunk/Core/OnDiskHandler.h M /trunk/Core/WorkerThreads.cpp Don't flush the index on file events. ------------------------------------------------------------------------ r1568 | fabricecolin | 2009-03-01 03:39:08 +0100 (Sun, 01 Mar 2009) | 3 lines Changed paths: M /trunk/Core/DaemonState.cpp M /trunk/Core/DaemonState.h Flush the index when all foreground threads have returned so that it doesn't interfere with indexing. ------------------------------------------------------------------------ r1567 | fabricecolin | 2009-03-01 03:35:42 +0100 (Sun, 01 Mar 2009) | 2 lines Changed paths: M /trunk/Core/UniqueApplication.cpp M /trunk/Core/WorkerThreads.cpp M /trunk/configure.in Functions kill() and lstat() may not be available. ------------------------------------------------------------------------ r1566 | fabricecolin | 2009-02-28 13:44:09 +0100 (Sat, 28 Feb 2009) | 2 lines Changed paths: M /trunk/Utils/Url.cpp Missing include. ------------------------------------------------------------------------ r1565 | fabricecolin | 2009-02-28 12:37:26 +0100 (Sat, 28 Feb 2009) | 2 lines Changed paths: M /trunk/Core/DaemonState.cpp M /trunk/Core/DaemonState.h Flush the index explicitly when a crawler returns and no other is started.
------------------------------------------------------------------------ r1564 | fabricecolin | 2009-02-28 07:37:41 +0100 (Sat, 28 Feb 2009) | 4 lines Changed paths: M /trunk/Core/PinotSettings.cpp M /trunk/Core/pinot-dbus-daemon.cpp M /trunk/Core/pinot-index.cpp M /trunk/UI/GTK2/src/pinot.cc PinotSettings::getInstance() will initialize libxml2. HtmlFilter doesn't need initializing any more. Don't unload filters at exit time to avoid a so far unexplained crash. ------------------------------------------------------------------------ r1563 | fabricecolin | 2009-02-28 03:46:18 +0100 (Sat, 28 Feb 2009) | 2 lines Changed paths: M /trunk/Tokenize/Makefile.am HtmlParser files. ------------------------------------------------------------------------ r1562 | fabricecolin | 2009-02-18 12:29:22 +0100 (Wed, 18 Feb 2009) | 7 lines Changed paths: M /trunk/configure.in Disable check for linux/sched.h, and therefore support for SCHED_IDLE, for the time being. See : http://lkml.org/lkml/2009/1/11/70 http://lkml.org/lkml/2009/1/22/416 http://lkml.org/lkml/2009/1/30/297 Thanks to John Werden for reporting this. ------------------------------------------------------------------------ r1561 | fabricecolin | 2009-02-16 15:14:13 +0100 (Mon, 16 Feb 2009) | 2 lines Changed paths: M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-index.cpp M /trunk/README Running pinot-index --index on My Web Pages or My Documents isn't a good idea. ------------------------------------------------------------------------ r1560 | fabricecolin | 2009-02-15 15:31:13 +0100 (Sun, 15 Feb 2009) | 4 lines Changed paths: M /trunk/Core/DaemonState.cpp M /trunk/Core/ServerThreads.cpp M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h Be selective when changing the status of crawled items on full scan. The status of links shouldn't be changed or they would be skipped during the crawl and what they point to would get unindexed. 
------------------------------------------------------------------------ r1559 | fabricecolin | 2009-02-15 13:19:43 +0100 (Sun, 15 Feb 2009) | 5 lines Changed paths: M /trunk/Core/WorkerThreads.cpp M /trunk/Core/WorkerThreads.h Cache what type of downloader DownloadingThread last got to avoid getting a new one every time. IndexingThread doesn't get a new Index on every call to its doWork() method. Look for the ROBOTS meta tag in remote documents only. ------------------------------------------------------------------------ r1558 | fabricecolin | 2009-02-15 03:25:00 +0100 (Sun, 15 Feb 2009) | 3 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.cpp M /trunk/configure.in If linux/sched.h is available, use sched_setscheduler(SCHED_IDLE) in place of setpriority(). ------------------------------------------------------------------------ r1557 | fabricecolin | 2009-02-15 03:23:39 +0100 (Sun, 15 Feb 2009) | 3 lines Changed paths: M /trunk/Core/pinot-index.cpp Don't allow threads to flush the index. Delete the state object before exit. ------------------------------------------------------------------------ r1556 | fabricecolin | 2009-02-15 03:22:11 +0100 (Sun, 15 Feb 2009) | 5 lines Changed paths: M /trunk/Core/ServerThreads.cpp M /trunk/Core/WorkerThreads.cpp M /trunk/Utils/TimeConverter.cpp Fixed some memory leaks : - when converting to a timestamp (struct tm) - when reapplying user-set metadata (Index) - when skipping the download of local files (Document) ------------------------------------------------------------------------ r1555 | fabricecolin | 2009-02-14 03:39:19 +0100 (Sat, 14 Feb 2009) | 3 lines Changed paths: M /trunk/README Specify that attachments and documents embedded in mbox email are indexed, and can be opened from the UI. 
------------------------------------------------------------------------ r1554 | fabricecolin | 2009-02-12 15:06:02 +0100 (Thu, 12 Feb 2009) | 2 lines Changed paths: M /trunk/pinot-dbus-daemon.desktop M /trunk/pinot-prefs.desktop M /trunk/pinot.desktop Removed deprecated Encoding field. ------------------------------------------------------------------------ r1553 | fabricecolin | 2009-02-12 15:04:00 +0100 (Thu, 12 Feb 2009) | 3 lines Changed paths: M /trunk/Core/ServerThreads.cpp Don't have monitorEntry() return false if no monitor is available, this would log an error. ------------------------------------------------------------------------ r1552 | fabricecolin | 2009-02-12 15:02:28 +0100 (Thu, 12 Feb 2009) | 5 lines Changed paths: M /trunk/Core/WorkerThreads.cpp M /trunk/Core/WorkerThreads.h M /trunk/Core/pinot-index.cpp M /trunk/UI/GTK2/src/Makefile.am A /trunk/UI/GTK2/src/UIThreads.cc A /trunk/UI/GTK2/src/UIThreads.hh M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/prefsWindow.hh M /trunk/UI/GTK2/src/statisticsDialog.hh M /trunk/po/POTFILES.in ThreadsManager needs to know whether local files should be scanned with DirectoryScanner or indexed right away with IndexingThread. This only makes sense in pinot-index and the UI. Moved UI-specific thread classes to UIThreads. ------------------------------------------------------------------------ r1551 | fabricecolin | 2009-02-07 10:37:40 +0100 (Sat, 07 Feb 2009) | 2 lines Changed paths: M /trunk/Core/pinot-index.1 M /trunk/Core/pinot-index.cpp Make it clear that -d/--db is mandatory. ------------------------------------------------------------------------ r1550 | fabricecolin | 2009-02-07 10:36:04 +0100 (Sat, 07 Feb 2009) | 2 lines Changed paths: M /trunk/Utils/Url.cpp Fix for previous check-in. 
------------------------------------------------------------------------ r1549 | fabricecolin | 2009-02-07 05:20:06 +0100 (Sat, 07 Feb 2009) | 2 lines Changed paths: M /trunk/Core/Makefile.am M /trunk/Core/pinot-index.cpp M /trunk/IndexSearch/pinot-label.cpp Let Url deal with relative paths. ------------------------------------------------------------------------ r1548 | fabricecolin | 2009-02-07 05:17:39 +0100 (Sat, 07 Feb 2009) | 2 lines Changed paths: M /trunk/Utils/Url.cpp M /trunk/Utils/Url.h Additional Url constructor for absolute and relative paths. ------------------------------------------------------------------------ r1547 | fabricecolin | 2009-02-06 15:49:54 +0100 (Fri, 06 Feb 2009) | 2 lines Changed paths: M /trunk/Core/DaemonState.cpp Swapped DirectoryScanner for Crawler threads. ------------------------------------------------------------------------ r1546 | fabricecolin | 2009-02-06 15:47:31 +0100 (Fri, 06 Feb 2009) | 4 lines Changed paths: M /trunk/Core/pinot-index.cpp IndexingState is a ThreadsManager subclass that enables this program to rely on IndexingThread and DirectoryScannerThread. In practice, this means it can index directories recursively at last. ------------------------------------------------------------------------ r1545 | fabricecolin | 2009-02-06 15:43:55 +0100 (Fri, 06 Feb 2009) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.cpp Minor includes fix. ------------------------------------------------------------------------ r1544 | fabricecolin | 2009-02-06 15:43:22 +0100 (Fri, 06 Feb 2009) | 4 lines Changed paths: M /trunk/Core/ServerThreads.cpp M /trunk/Core/ServerThreads.h M /trunk/Core/WorkerThreads.cpp M /trunk/Core/WorkerThreads.h Moved crawl history and monitoring out of DirectoryScanner so that it's usable outside of the daemon, and moved the class to WorkerThreads. Crawler inherits from DirectoryScanner and handles crawl history and monitoring. 
------------------------------------------------------------------------ r1543 | fabricecolin | 2009-02-05 14:16:55 +0100 (Thu, 05 Feb 2009) | 2 lines Changed paths: M /trunk/Core/pinot-index.cpp Resolve local URLs only. ------------------------------------------------------------------------ r1542 | fabricecolin | 2009-02-04 12:36:33 +0100 (Wed, 04 Feb 2009) | 2 lines Changed paths: M /trunk/Core/ServerThreads.cpp Use Url::resolvePath(). ------------------------------------------------------------------------ r1541 | fabricecolin | 2009-02-03 14:25:58 +0100 (Tue, 03 Feb 2009) | 2 lines Changed paths: M /trunk/Core/pinot-search.cpp Support for gettext(). ------------------------------------------------------------------------ r1540 | fabricecolin | 2009-02-03 14:25:11 +0100 (Tue, 03 Feb 2009) | 3 lines Changed paths: M /trunk/Core/pinot-index.cpp Urls passed as parameters may exclude "file://", be relative paths. Support for gettext(). ------------------------------------------------------------------------ r1539 | fabricecolin | 2009-02-03 14:14:49 +0100 (Tue, 03 Feb 2009) | 2 lines Changed paths: M /trunk/Utils/Url.cpp M /trunk/Utils/Url.h Url::resolvePath() is based on the path resolver code written for ServerThreads. ------------------------------------------------------------------------ r1538 | fabricecolin | 2009-02-02 14:59:23 +0100 (Mon, 02 Feb 2009) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.1 M /trunk/Core/pinot-search.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/UI/GTK2/src/pinot.1 Updated manual pages. 
------------------------------------------------------------------------ r1537 | fabricecolin | 2009-02-02 14:44:00 +0100 (Mon, 02 Feb 2009) | 3 lines Changed paths: M /trunk/Core/Makefile.am A /trunk/Core/pinot-index.1 (from /trunk/IndexSearch/pinot-index.1:1536) A /trunk/Core/pinot-index.cpp (from /trunk/IndexSearch/pinot-index.cpp:1536) M /trunk/IndexSearch/Makefile.am D /trunk/IndexSearch/pinot-index.1 D /trunk/IndexSearch/pinot-index.cpp M /trunk/Makefile.am Moved pinot-index to Core. Use PinotSettings for proxy parameters. The -d/--db option may be the name of an index. ------------------------------------------------------------------------ r1536 | fabricecolin | 2009-02-01 15:28:52 +0100 (Sun, 01 Feb 2009) | 2 lines Changed paths: M /trunk/Core/Makefile.am M /trunk/Makefile.am Minor fixes. ------------------------------------------------------------------------ r1535 | fabricecolin | 2009-02-01 14:59:46 +0100 (Sun, 01 Feb 2009) | 4 lines Changed paths: M /trunk/Core/Makefile.am A /trunk/Core/pinot-search.1 (from /trunk/IndexSearch/pinot-search.1:1533) A /trunk/Core/pinot-search.cpp (from /trunk/IndexSearch/pinot-search.cpp:1533) M /trunk/IndexSearch/Makefile.am D /trunk/IndexSearch/pinot-search.1 D /trunk/IndexSearch/pinot-search.cpp M /trunk/Makefile.am Moved pinot-search to Core. Use PinotSettings for proxy parameters and plugins' editable parameters. Added -r/--storedquery option to find a stored query by name and run it. ------------------------------------------------------------------------ r1534 | fabricecolin | 2009-02-01 14:53:59 +0100 (Sun, 01 Feb 2009) | 2 lines Changed paths: M /trunk/Core/pinot-dbus-daemon.cpp M /trunk/UI/GTK2/src/pinot.cc Removed reference to tokenizer libraries. ------------------------------------------------------------------------ r1533 | fabricecolin | 2009-02-01 12:55:37 +0100 (Sun, 01 Feb 2009) | 2 lines Changed paths: M /trunk/Core/Makefile.am Daemon specifics don't need be in libCore. 
------------------------------------------------------------------------ r1532 | fabricecolin | 2009-02-01 12:49:12 +0100 (Sun, 01 Feb 2009) | 2 lines Changed paths: M /trunk/po/POTFILES.in Files were renamed. ------------------------------------------------------------------------ r1531 | fabricecolin | 2009-02-01 12:47:02 +0100 (Sun, 01 Feb 2009) | 2 lines Changed paths: M /trunk/Core/Makefile.am D /trunk/Core/pinot-dbus-daemon.cc A /trunk/Core/pinot-dbus-daemon.cpp (from /trunk/Core/pinot-dbus-daemon.cc:1530) A /trunk/UI/GTK2/src/EnginesTree.cc (from /trunk/UI/GTK2/src/EnginesTree.cpp:1529) D /trunk/UI/GTK2/src/EnginesTree.cpp D /trunk/UI/GTK2/src/EnginesTree.h A /trunk/UI/GTK2/src/EnginesTree.hh (from /trunk/UI/GTK2/src/EnginesTree.h:1529) A /trunk/UI/GTK2/src/IndexPage.cc (from /trunk/UI/GTK2/src/IndexPage.cpp:1529) D /trunk/UI/GTK2/src/IndexPage.cpp D /trunk/UI/GTK2/src/IndexPage.h A /trunk/UI/GTK2/src/IndexPage.hh (from /trunk/UI/GTK2/src/IndexPage.h:1529) M /trunk/UI/GTK2/src/Makefile.am A /trunk/UI/GTK2/src/ModelColumns.cc (from /trunk/UI/GTK2/src/ModelColumns.cpp:1529) D /trunk/UI/GTK2/src/ModelColumns.cpp D /trunk/UI/GTK2/src/ModelColumns.h A /trunk/UI/GTK2/src/ModelColumns.hh (from /trunk/UI/GTK2/src/ModelColumns.h:1529) A /trunk/UI/GTK2/src/Notebook.cc (from /trunk/UI/GTK2/src/Notebook.cpp:1529) D /trunk/UI/GTK2/src/Notebook.cpp D /trunk/UI/GTK2/src/Notebook.h A /trunk/UI/GTK2/src/Notebook.hh (from /trunk/UI/GTK2/src/Notebook.h:1529) A /trunk/UI/GTK2/src/ResultsTree.cc (from /trunk/UI/GTK2/src/ResultsTree.cpp:1529) D /trunk/UI/GTK2/src/ResultsTree.cpp D /trunk/UI/GTK2/src/ResultsTree.h A /trunk/UI/GTK2/src/ResultsTree.hh (from /trunk/UI/GTK2/src/ResultsTree.h:1529) M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/indexDialog.hh M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/prefsWindow.hh M /trunk/UI/GTK2/src/propertiesDialog.hh M /trunk/UI/GTK2/src/queryDialog.hh M /trunk/UI/GTK2/src/statisticsDialog.hh Use the same naming 
conventions as glademm where appropriate. ------------------------------------------------------------------------ r1530 | fabricecolin | 2009-02-01 12:04:31 +0100 (Sun, 01 Feb 2009) | 2 lines Changed paths: A /trunk/Core A /trunk/Core/DaemonState.cpp (from /trunk/UI/GTK2/src/DaemonState.cpp:1529) A /trunk/Core/DaemonState.h (from /trunk/UI/GTK2/src/DaemonState.h:1529) A /trunk/Core/Makefile.am A /trunk/Core/OnDiskHandler.cpp (from /trunk/UI/GTK2/src/OnDiskHandler.cpp:1529) A /trunk/Core/OnDiskHandler.h (from /trunk/UI/GTK2/src/OnDiskHandler.h:1529) A /trunk/Core/PinotSettings.cpp (from /trunk/UI/GTK2/src/PinotSettings.cpp:1529) A /trunk/Core/PinotSettings.h (from /trunk/UI/GTK2/src/PinotSettings.h:1529) A /trunk/Core/PinotUtils.cpp (from /trunk/UI/GTK2/src/PinotUtils.cpp:1529) A /trunk/Core/PinotUtils.h (from /trunk/UI/GTK2/src/PinotUtils.h:1529) A /trunk/Core/ServerThreads.cpp (from /trunk/UI/GTK2/src/ServerThreads.cpp:1529) A /trunk/Core/ServerThreads.h (from /trunk/UI/GTK2/src/ServerThreads.h:1529) A /trunk/Core/UniqueApplication.cpp (from /trunk/UI/GTK2/src/UniqueApplication.cpp:1529) A /trunk/Core/UniqueApplication.h (from /trunk/UI/GTK2/src/UniqueApplication.h:1529) A /trunk/Core/WorkerThreads.cpp (from /trunk/UI/GTK2/src/WorkerThreads.cpp:1529) A /trunk/Core/WorkerThreads.h (from /trunk/UI/GTK2/src/WorkerThreads.h:1529) A /trunk/Core/de.berlios.Pinot.service.in (from /trunk/UI/GTK2/src/de.berlios.Pinot.service.in:1529) A /trunk/Core/pinot-dbus-daemon.1 (from /trunk/UI/GTK2/src/pinot-dbus-daemon.1:1529) A /trunk/Core/pinot-dbus-daemon.cc (from /trunk/UI/GTK2/src/pinot-dbus-daemon.cc:1529) A /trunk/Core/pinot-dbus-daemon.xml (from /trunk/UI/GTK2/src/pinot-dbus-daemon.xml:1529) M /trunk/Makefile.am D /trunk/UI/GTK2/src/DaemonState.cpp D /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/Makefile.am D /trunk/UI/GTK2/src/OnDiskHandler.cpp D /trunk/UI/GTK2/src/OnDiskHandler.h D /trunk/UI/GTK2/src/PinotSettings.cpp D /trunk/UI/GTK2/src/PinotSettings.h D 
/trunk/UI/GTK2/src/PinotUtils.cpp D /trunk/UI/GTK2/src/PinotUtils.h D /trunk/UI/GTK2/src/ServerThreads.cpp D /trunk/UI/GTK2/src/ServerThreads.h D /trunk/UI/GTK2/src/UniqueApplication.cpp D /trunk/UI/GTK2/src/UniqueApplication.h D /trunk/UI/GTK2/src/WorkerThreads.cpp D /trunk/UI/GTK2/src/WorkerThreads.h D /trunk/UI/GTK2/src/de.berlios.Pinot.service.in D /trunk/UI/GTK2/src/pinot-dbus-daemon.1 D /trunk/UI/GTK2/src/pinot-dbus-daemon.cc D /trunk/UI/GTK2/src/pinot-dbus-daemon.xml M /trunk/configure.in M /trunk/po/POTFILES.in Moved daemon and non-UI specific code out of UI/GTK2/src to Core. ------------------------------------------------------------------------ r1529 | fabricecolin | 2009-02-01 10:53:30 +0100 (Sun, 01 Feb 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp Don't pop actions out of the queue when threads are being stopped. ------------------------------------------------------------------------ r1528 | fabricecolin | 2009-02-01 10:51:33 +0100 (Sun, 01 Feb 2009) | 2 lines Changed paths: M /trunk/IndexSearch/pinot-search.cpp Use getResults()'s return by reference as it should be. ------------------------------------------------------------------------ r1527 | fabricecolin | 2009-01-30 15:53:45 +0100 (Fri, 30 Jan 2009) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc When a spelling suggestion is selected, don't fail if the query can't be found. This fixes the bug where nothing happens when the suggestion is on Live query. ------------------------------------------------------------------------ r1525 | fabricecolin | 2009-01-28 15:43:35 +0100 (Wed, 28 Jan 2009) | 2 lines Changed paths: M /trunk/ChangeLog Even more current log :-) ------------------------------------------------------------------------ r1524 | fabricecolin | 2009-01-28 15:41:03 +0100 (Wed, 28 Jan 2009) | 2 lines Changed paths: M /trunk/ChangeLog Current log. 
------------------------------------------------------------------------ r1523 | fabricecolin | 2009-01-28 15:39:57 +0100 (Wed, 28 Jan 2009) | 2 lines Changed paths: M /trunk/ChangeLog-dijon Changes in Dijon. ------------------------------------------------------------------------ r1522 | fabricecolin | 2009-01-28 15:39:04 +0100 (Wed, 28 Jan 2009) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Current translations. ------------------------------------------------------------------------ r1521 | fabricecolin | 2009-01-28 15:38:21 +0100 (Wed, 28 Jan 2009) | 3 lines Changed paths: M /trunk/NEWS M /trunk/configure.in Changes in this release. Upped the version number. Force an upgrade of old indexes. ------------------------------------------------------------------------ r1520 | fabricecolin | 2009-01-28 15:36:27 +0100 (Wed, 28 Jan 2009) | 2 lines Changed paths: M /trunk/README M /trunk/TODO Minor change to the README. Removed some items from, added a lot more to the TODO. ------------------------------------------------------------------------ r1519 | fabricecolin | 2009-01-28 14:10:10 +0100 (Wed, 28 Jan 2009) | 3 lines Changed paths: M /trunk/IndexSearch/OpenSearchParser.cpp In PER_INDEX mode, there's no increment to apply since the desired number of results is requested in one call. ------------------------------------------------------------------------ r1518 | fabricecolin | 2009-01-27 14:58:53 +0100 (Tue, 27 Jan 2009) | 2 lines Changed paths: M /trunk/scripts/bash/pinot-cd.sh If the arguments have a slash anywhere, pass them all to cd. 
------------------------------------------------------------------------ r1517 | fabricecolin | 2009-01-27 11:41:28 +0100 (Tue, 27 Jan 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Only skip downloading for "reliably" typed local files, not remote documents. ------------------------------------------------------------------------ r1516 | fabricecolin | 2009-01-27 11:08:40 +0100 (Tue, 27 Jan 2009) | 2 lines Changed paths: M /trunk/IndexSearch/pinot-index.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/IndexSearch/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 Updated manual pages. ------------------------------------------------------------------------ r1515 | fabricecolin | 2009-01-27 11:07:51 +0100 (Tue, 27 Jan 2009) | 2 lines Changed paths: M /trunk/IndexSearch/pinot-index.cpp Fix stupid "loop on unknown protocol" loop. ------------------------------------------------------------------------ r1514 | fabricecolin | 2009-01-27 11:06:54 +0100 (Tue, 27 Jan 2009) | 2 lines Changed paths: M /trunk/Collect/DownloaderFactory.cpp HTTP downloaders support HTTPS too. ------------------------------------------------------------------------ r1513 | fabricecolin | 2009-01-26 16:23:37 +0100 (Mon, 26 Jan 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Use default labels, queries, blacklist... if ui.xml wasn't loaded. ------------------------------------------------------------------------ r1512 | fabricecolin | 2009-01-26 15:21:07 +0100 (Mon, 26 Jan 2009) | 4 lines Changed paths: M /trunk/UI/GTK2/src/ServerThreads.cpp When interrupted, don't set the current entry's status to CRAWLED. Delete any CRAWLING entries the previous instance didn't have time to process and has left in.
------------------------------------------------------------------------ r1511 | fabricecolin | 2009-01-26 04:38:35 +0100 (Mon, 26 Jan 2009) | 5 lines Changed paths: M /trunk/SQL/ActionQueue.cpp M /trunk/SQL/ActionQueue.h M /trunk/Tokenize/FilterUtils.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc FilterUtils::isSupportedType() could return false for supported types. This could lead to some documents' content not being indexed. In pinot-dbus-daemon, actions queued by a previous instance of the daemon were cleared and the corresponding files would never get indexed. ------------------------------------------------------------------------ r1510 | fabricecolin | 2009-01-24 09:56:29 +0100 (Sat, 24 Jan 2009) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/he.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Current translations. ------------------------------------------------------------------------ r1509 | fabricecolin | 2009-01-24 09:04:24 +0100 (Sat, 24 Jan 2009) | 2 lines Changed paths: M /trunk/scripts/bash/pinot-cd.sh M /trunk/scripts/bash/pinot-check-file.sh M /trunk/scripts/bash/pinot-enum-index.sh Cosmetic changes. ------------------------------------------------------------------------ r1508 | fabricecolin | 2009-01-24 08:51:23 +0100 (Sat, 24 Jan 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/prefsWindow.cc M /trunk/UI/GTK2/src/prefsWindow.hh If the StartDaemon thread didn't run, save preferences when closed. ------------------------------------------------------------------------ r1507 | fabricecolin | 2009-01-24 08:14:47 +0100 (Sat, 24 Jan 2009) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp Changed error messages.
------------------------------------------------------------------------ r1506 | fabricecolin | 2009-01-24 04:37:09 +0100 (Sat, 24 Jan 2009) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/ResultsTree.cpp In PinotUtils, catch Glib::Error and unknown exceptions whenever converting. Minor edit to DEBUG messages in ResultsTree. ------------------------------------------------------------------------ r1505 | fabricecolin | 2009-01-24 04:32:13 +0100 (Sat, 24 Jan 2009) | 2 lines Changed paths: M /trunk/IndexSearch/OpenSearchParser.cpp M /trunk/IndexSearch/OpenSearchParser.h M /trunk/IndexSearch/PluginParsers.h M /trunk/IndexSearch/PluginWebEngine.cpp M /trunk/IndexSearch/SherlockParser.cpp M /trunk/IndexSearch/SherlockParser.h M /trunk/IndexSearch/WebEngine.cpp The response parsers have a go at determining the charset. ------------------------------------------------------------------------ r1504 | fabricecolin | 2009-01-22 15:24:53 +0100 (Thu, 22 Jan 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Fix previous check-in. ------------------------------------------------------------------------ r1503 | fabricecolin | 2009-01-22 14:52:38 +0100 (Thu, 22 Jan 2009) | 7 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Try and load prefs.xml first: if that fails, load the old config.xml, if it works then load ui.xml. This fixes the issue where the daemon would fail to load settings on its first Reload. Save backend and googleapikey in prefs.xml. Copy the UI's history over to the daemon's if the first exists and the second doesn't. 
------------------------------------------------------------------------ r1502 | fabricecolin | 2009-01-22 14:22:13 +0100 (Thu, 22 Jan 2009) | 4 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/indexDialog.cc M /trunk/UI/GTK2/src/mainWindow.cc PinotSettings::IndexProperties encapsulates an index's properties and side-steps issues experienced with manipulating and looking up index names in non-Latin locales. ------------------------------------------------------------------------ r1501 | fabricecolin | 2009-01-22 14:19:16 +0100 (Thu, 22 Jan 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/prefsWindow.cc No need to switch tabs on first run as Indexing is now the first tab. ------------------------------------------------------------------------ r1500 | fabricecolin | 2009-01-19 15:03:28 +0100 (Mon, 19 Jan 2009) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianDatabase.cpp M /trunk/IndexSearch/Xapian/XapianDatabase.h M /trunk/IndexSearch/Xapian/XapianDatabaseFactory.cpp M /trunk/IndexSearch/Xapian/XapianDatabaseFactory.h The factory doesn't serve objects once closeAll() is called. Lock databases before deleting them, processing merged databases first. ------------------------------------------------------------------------ r1499 | fabricecolin | 2009-01-19 15:01:12 +0100 (Mon, 19 Jan 2009) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Repopulate the index menu after editing an index's properties in case the name changed.
------------------------------------------------------------------------ r1498 | fabricecolin | 2009-01-19 14:24:09 +0100 (Mon, 19 Jan 2009) | 5 lines Changed paths: M /trunk/IndexSearch/OpenSearchParser.cpp M /trunk/IndexSearch/PluginWebEngine.cpp PluginWebEngine::runQuery() checks whether scrolling is possible with the plugin, stops at the first page if it isn't. OpenSearchParser sets m_nextIncrement to 0 to disable scrolling if none of count, startIndex, startPage is found. ------------------------------------------------------------------------ r1497 | fabricecolin | 2009-01-19 14:20:06 +0100 (Mon, 19 Jan 2009) | 2 lines Changed paths: M /trunk/IndexSearch/Plugins/IOIDescription.xml Mapped hitsPerPage to {count}. ------------------------------------------------------------------------ r1496 | fabricecolin | 2009-01-17 05:16:22 +0100 (Sat, 17 Jan 2009) | 4 lines Changed paths: M /trunk/UI/GTK2/src/prefsWindow.cc M /trunk/pinot.desktop Synced pinot.desktop strings with current po's. Added to prefsWindow translatable strings from pinot-prefs.desktop even though this window won't show them. ------------------------------------------------------------------------ r1495 | fabricecolin | 2009-01-16 15:47:52 +0100 (Fri, 16 Jan 2009) | 2 lines Changed paths: M /trunk/README M /trunk/UI/GTK2/src/mainWindow.cc Copyright 2009. ------------------------------------------------------------------------ r1494 | fabricecolin | 2009-01-14 15:26:23 +0100 (Wed, 14 Jan 2009) | 2 lines Changed paths: M /trunk/IndexSearch/pinot-search.cpp If no results, say so ! ------------------------------------------------------------------------ r1493 | fabricecolin | 2009-01-14 15:25:10 +0100 (Wed, 14 Jan 2009) | 4 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp In ResultsTree::findResultsExtract(), the extract doesn't need to be converted, it was converted earlier in addResults(). In appendResult(), same for the URL. 
------------------------------------------------------------------------ r1492 | fabricecolin | 2009-01-14 15:23:09 +0100 (Wed, 14 Jan 2009) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc When listing an index, or appending a freshly indexed document, the charset is known to be UTF-8 so let ResultsTree::addResults() know. ------------------------------------------------------------------------ r1491 | fabricecolin | 2009-01-14 15:08:16 +0100 (Wed, 14 Jan 2009) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp This expects documents to be converted to UTF-8 at indexing time, so that should be the results' charset. ------------------------------------------------------------------------ r1490 | fabricecolin | 2009-01-13 17:02:55 +0100 (Tue, 13 Jan 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/propertiesDialog.cc Conversion fixes on index names, documents serial string, language names. ------------------------------------------------------------------------ r1489 | fabricecolin | 2009-01-13 15:19:50 +0100 (Tue, 13 Jan 2009) | 2 lines Changed paths: M /trunk/po/fr.po M /trunk/po/ja.po Updated translations by Frédéric Grosshans (fr) and Mizuki-san (ja). ------------------------------------------------------------------------ r1488 | fabricecolin | 2009-01-13 15:09:33 +0100 (Tue, 13 Jan 2009) | 3 lines Changed paths: A /trunk/scripts/bash/pinot-check-file.sh A script that checks whether a file is in My Documents. Useful when run with "find ... -exec ...". 
------------------------------------------------------------------------ r1487 | fabricecolin | 2009-01-13 15:06:57 +0100 (Tue, 13 Jan 2009) | 3 lines Changed paths: M /trunk/UI/GTK2/src/ServerThreads.cpp After resolving and scanning a symlink, don't hang around, just return or the symlink itself won't be indexed. ------------------------------------------------------------------------ r1486 | fabricecolin | 2009-01-13 00:43:14 +0100 (Tue, 13 Jan 2009) | 4 lines Changed paths: D /trunk/IndexSearch/Plugins/A9.src D /trunk/IndexSearch/Plugins/BitTorrent.src A /trunk/IndexSearch/Plugins/IOIDescription.xml Removing A9, now just a product search engine and less useful than the Amazon API, and BitTorrent, which redirects to Ask.com. Adding a plugin for the Internet Open Index at http://index.isc.org/ ------------------------------------------------------------------------ r1485 | fabricecolin | 2009-01-12 15:58:17 +0100 (Mon, 12 Jan 2009) | 4 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot.cc More conversion fixes. In ResultsTree::deleteResults(), make sure the "No results" row is deleted, if present. ------------------------------------------------------------------------ r1484 | fabricecolin | 2009-01-12 15:56:38 +0100 (Mon, 12 Jan 2009) | 3 lines Changed paths: M /trunk/README Talk about how symlinks are handled, the new tagged cd script and aliases users may find useful. ------------------------------------------------------------------------ r1483 | fabricecolin | 2009-01-12 15:54:32 +0100 (Mon, 12 Jan 2009) | 3 lines Changed paths: M /trunk/IndexSearch/SherlockParser.cpp M /trunk/IndexSearch/WebEngine.cpp Moved relative URL rebase code from SherlockResponseParser::parse() to WebEngine::processResult(). 
------------------------------------------------------------------------ r1482 | fabricecolin | 2009-01-12 15:52:18 +0100 (Mon, 12 Jan 2009) | 2 lines Changed paths: M /trunk/scripts/bash/pinot-enum-index.sh Don't output the document ID, it makes comparing with find's output easier. ------------------------------------------------------------------------ r1481 | fabricecolin | 2009-01-11 12:01:29 +0100 (Sun, 11 Jan 2009) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/po/it.po Updated Italian translation by Marco Bazzani. ------------------------------------------------------------------------ r1480 | fabricecolin | 2009-01-11 11:50:09 +0100 (Sun, 11 Jan 2009) | 2 lines Changed paths: M /trunk/Makefile.am Force the link in case the destination file already exists. ------------------------------------------------------------------------ r1479 | fabricecolin | 2009-01-11 11:49:19 +0100 (Sun, 11 Jan 2009) | 3 lines Changed paths: M /trunk/scripts/bash/pinot-enum-index.sh Make sure we only consider the document IDs after the first colon in delve's output. Output all document IDs to urls.txt. ------------------------------------------------------------------------ r1478 | fabricecolin | 2009-01-11 06:20:13 +0100 (Sun, 11 Jan 2009) | 4 lines Changed paths: M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.hh Fixed get/getting stats flags. The numbers of viewed and crawled items can be had right away without waiting for the timer to kick in. ------------------------------------------------------------------------ r1477 | fabricecolin | 2009-01-11 06:04:27 +0100 (Sun, 11 Jan 2009) | 5 lines Changed paths: M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h Partially reverted revision 1391. Indexing is delegated to other threads by default so that the crawler is not at the mercy of a bad external filter. 
The possibility of doing it inline is left in and may be useful to pinot-index once it's made to use the same worker threads as the daemon and the UI. ------------------------------------------------------------------------ r1476 | fabricecolin | 2009-01-11 05:51:52 +0100 (Sun, 11 Jan 2009) | 2 lines Changed paths: M /trunk/scripts/bash/pinot-cd.sh Minor changes. ------------------------------------------------------------------------ r1475 | fabricecolin | 2009-01-11 05:46:19 +0100 (Sun, 11 Jan 2009) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/po/he.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/sv.po Updated translations by Yaron (he), JW (nl), Henrique P. Machado (pt_BR), _PN_boy (pt) and Daniel Nylander (se). ------------------------------------------------------------------------ r1474 | fabricecolin | 2009-01-10 17:16:14 +0100 (Sat, 10 Jan 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ServerThreads.cpp Delete CRAWL_LINK items before the crawl, not after. ------------------------------------------------------------------------ r1473 | fabricecolin | 2009-01-10 16:25:42 +0100 (Sat, 10 Jan 2009) | 2 lines Changed paths: M /trunk/pinot-prefs.desktop Added X-GNOME-PersonalSettings to Categories. ------------------------------------------------------------------------ r1472 | fabricecolin | 2009-01-10 16:24:26 +0100 (Sat, 10 Jan 2009) | 12 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h M /trunk/UI/GTK2/src/WorkerThreads.cpp Don't skip outright locations that already have a CrawlHistory entry, this would prevent resuming gracefully if the previous indexing run stopped before completion. Keep track of symlinks we follow, and skip those that refer to locations that have been crawled (they have a CrawlHistory entry of any type) or that we know will be crawled because they are under an indexable location. 
Once a symlink is followed, if we end up back in one of the indexable locations, symlinks found there will be skipped. Skipped symlinks are indexed on their own with MIME type "inode/symlink". In IndexingThread, avoid downloading unnecessarily documents for which an unsupported MIME type is supplied by the caller. ------------------------------------------------------------------------ r1471 | fabricecolin | 2009-01-10 15:50:55 +0100 (Sat, 10 Jan 2009) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp Don't attempt removing unprefixed title terms in removeCommonTerms(). While older versions create those terms, 0.90 will trigger or request a reindex. ------------------------------------------------------------------------ r1470 | fabricecolin | 2009-01-10 08:23:41 +0100 (Sat, 10 Jan 2009) | 3 lines Changed paths: M /trunk/Tokenize/TextConverter.cpp Protect against exceptions thrown by Glib::IConv. Catch Glib::Error and unknown exceptions whenever converting. ------------------------------------------------------------------------ r1469 | fabricecolin | 2009-01-09 13:03:47 +0100 (Fri, 09 Jan 2009) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/Utils/DocumentInfo.cpp Don't remove the charset declaration from the type. XapianIndex makes sure only the type is indexed with prefix T. ------------------------------------------------------------------------ r1468 | fabricecolin | 2009-01-09 13:01:20 +0100 (Fri, 09 Jan 2009) | 2 lines Changed paths: M /trunk/IndexSearch/Makefile.am Fixed pinot_label_LDADD. ------------------------------------------------------------------------ r1467 | fabricecolin | 2009-01-08 12:41:28 +0100 (Thu, 08 Jan 2009) | 3 lines Changed paths: M /trunk/AUTHORS Suggestions from John Werden, translations by Frédéric Grosshans (fr) and Yaron (he). 
------------------------------------------------------------------------ r1466 | fabricecolin | 2009-01-06 17:09:40 +0100 (Tue, 06 Jan 2009) | 9 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/ServerThreads.cpp Keep track of what's being crawled. Read symlinks, resolve where they point to and skip those that point to places that have already been crawled, or places we know will be crawled. This should prevent from following symlinks that point to the directory they are in, as well as crawling the same files twice even though some perfectly fine symlinks won't be followed. Removed unused PinotSettings::TimestampedItem, made sure IndexableLocation's "is source" flag is set. ------------------------------------------------------------------------ r1465 | fabricecolin | 2009-01-02 19:37:35 +0100 (Fri, 02 Jan 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Disable the properties menuitem when nothing is selected. ------------------------------------------------------------------------ r1464 | fabricecolin | 2009-01-02 13:56:27 +0100 (Fri, 02 Jan 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/queryDialog.cc Conversion fixes. ------------------------------------------------------------------------ r1463 | fabricecolin | 2009-01-02 13:28:34 +0100 (Fri, 02 Jan 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh Use the stock gtk-about menuitem. 
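Note on r1466 and r1472: the symlink rule described in those entries can be sketched roughly as follows. This is only an illustration, not the code in ServerThreads.cpp. The names isUnder() and shouldFollowSymlink(), and the use of plain string containers in place of CrawlHistory, are assumptions made for the example.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: true if path is root itself or lies below it.
bool isUnder(const std::string &path, const std::string &root)
{
    if (path == root)
    {
        return true;
    }
    // Compare against "root/" so that /home/bob2 is not taken to be under /home/bob.
    const std::string prefix =
        (!root.empty() && root[root.size() - 1] == '/') ? root : root + "/";
    return path.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical decision function: follow a symlink only if its resolved
// target has not been crawled yet and will not be crawled anyway because
// it sits under an indexable location.
bool shouldFollowSymlink(const std::string &target,
    const std::set<std::string> &crawled,
    const std::vector<std::string> &indexable)
{
    if (crawled.count(target) > 0)
    {
        return false; // Already crawled
    }
    for (std::size_t i = 0; i < indexable.size(); ++i)
    {
        if (isUnder(target, indexable[i]))
        {
            return false; // Will be crawled as part of that location
        }
    }
    return true;
}
```

Per r1472, a symlink skipped this way would still be indexed on its own with MIME type "inode/symlink"; the sketch only covers the follow-or-skip decision.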
------------------------------------------------------------------------ r1462 | fabricecolin | 2009-01-02 10:55:55 +0100 (Fri, 02 Jan 2009) | 2 lines Changed paths: M /trunk/configure.in A /trunk/po/he.po Hebrew translation by Yaron. ------------------------------------------------------------------------ r1461 | fabricecolin | 2009-01-02 10:22:34 +0100 (Fri, 02 Jan 2009) | 4 lines Changed paths: M /trunk/AUTHORS M /trunk/Makefile.am M /trunk/pinot.spec.in Xapian >= 1.0.4 is now required. When installing, use "rm -f" to remove static libraries. Mention Martin Michlmayr's contribution. ------------------------------------------------------------------------ r1460 | fabricecolin | 2009-01-02 10:19:49 +0100 (Fri, 02 Jan 2009) | 2 lines Changed paths: M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.h Prior to signaling a file, apply whatever user-set metadata we may have. ------------------------------------------------------------------------ r1459 | fabricecolin | 2009-01-02 10:17:51 +0100 (Fri, 02 Jan 2009) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp Better lowercase the language string when testing for a stemmer. ------------------------------------------------------------------------ r1458 | fabricecolin | 2008-12-31 14:27:49 +0100 (Wed, 31 Dec 2008) | 4 lines Changed paths: M /trunk/SQL/MetaDataBackup.cpp M /trunk/SQL/MetaDataBackup.h M /trunk/UI/GTK2/src/ServerThreads.cpp Fixed use of LIKE in MetaDataBackup. The getItems() method didn't unescape URLs. In DirectoryScannerThread::doWork(), restore metadata only for documents in the directory being crawled. ------------------------------------------------------------------------ r1457 | fabricecolin | 2008-12-30 15:08:09 +0100 (Tue, 30 Dec 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade Reorganized menus in mainWindow, moved tabs in prefsWindow. 
------------------------------------------------------------------------ r1456 | fabricecolin | 2008-12-30 15:03:54 +0100 (Tue, 30 Dec 2008) | 8 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh Reorganized menus, merged menuitems from Results and Index that overlapped in functionality. Unified expansion from a document selected with More Like This and from one dropped on the queries list. Export works on both results lists and index pages. Index Results will index new documents or update documents in My Web Pages, not My Documents. ------------------------------------------------------------------------ r1455 | fabricecolin | 2008-12-29 19:32:55 +0100 (Mon, 29 Dec 2008) | 2 lines Changed paths: M /trunk/Utils/DocumentInfo.cpp Remove the charset specification from the MIME type. ------------------------------------------------------------------------ r1454 | fabricecolin | 2008-12-29 19:32:11 +0100 (Mon, 29 Dec 2008) | 2 lines Changed paths: M /trunk/Utils/Url.cpp Removed reference to deprecated protocols. ------------------------------------------------------------------------ r1453 | fabricecolin | 2008-12-29 19:01:01 +0100 (Mon, 29 Dec 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h Export also works in flat mode, ie when browsing an index. ------------------------------------------------------------------------ r1452 | fabricecolin | 2008-12-29 18:59:08 +0100 (Mon, 29 Dec 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/prefsWindow.cc M /trunk/UI/GTK2/src/prefsWindow.hh M /trunk/UI/GTK2/src/prefsWindow_glade.cc M /trunk/UI/GTK2/src/prefsWindow_glade.hh Reordered tabs, renamed General to Miscellaneous. 
------------------------------------------------------------------------ r1451 | fabricecolin | 2008-12-29 18:56:26 +0100 (Mon, 29 Dec 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/IndexPage.h M /trunk/UI/GTK2/src/Notebook.cpp M /trunk/UI/GTK2/src/Notebook.h Moved the tree in NotebookPageBox. ------------------------------------------------------------------------ r1450 | fabricecolin | 2008-12-29 18:53:17 +0100 (Mon, 29 Dec 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp Always add the term XDIR:/. ------------------------------------------------------------------------ r1449 | fabricecolin | 2008-12-29 18:52:02 +0100 (Mon, 29 Dec 2008) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp Renamed PrefixDecider to TermDecider. Reject terms that stem to the same as a previously validated term. ------------------------------------------------------------------------ r1448 | fabricecolin | 2008-12-19 15:13:14 +0100 (Fri, 19 Dec 2008) | 4 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/Xapian/XapianEngine.h Don't wait for query step 2 to get a stemmer if the query defines a stemming language. Pass it to PrefixDecider so that it can reject query terms and terms that stem to the same as them. ------------------------------------------------------------------------ r1447 | fabricecolin | 2008-12-19 14:43:08 +0100 (Fri, 19 Dec 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/indexDialog_glade.cc M /trunk/UI/GTK2/src/prefsWindow_glade.cc M /trunk/UI/GTK2/src/queryDialog_glade.cc Setting an adjustment with non-zero page size on a SpinButton is deprecated. 
------------------------------------------------------------------------ r1446 | fabricecolin | 2008-12-16 14:43:28 +0100 (Tue, 16 Dec 2008) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp PrefixDecider rejects terms with spaces, which shouldn't be there in the first place. ------------------------------------------------------------------------ r1445 | fabricecolin | 2008-12-15 14:53:32 +0100 (Mon, 15 Dec 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Following r1409, external back-ends with no channel specified didn't show in the engines list. ------------------------------------------------------------------------ r1444 | fabricecolin | 2008-12-14 05:16:33 +0100 (Sun, 14 Dec 2008) | 3 lines Changed paths: M /trunk/configure.in Require at least Xapian 1.0.4 for the recent changes to XapianIndex and XapianEngine to function as expected. ------------------------------------------------------------------------ r1443 | fabricecolin | 2008-12-14 05:12:21 +0100 (Sun, 14 Dec 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h Removed unused method renderExtractColumn(). ------------------------------------------------------------------------ r1442 | fabricecolin | 2008-12-14 05:11:27 +0100 (Sun, 14 Dec 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Changed the "Showing..." message to something a bit easier to translate. ------------------------------------------------------------------------ r1441 | fabricecolin | 2008-12-14 02:45:32 +0100 (Sun, 14 Dec 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/statisticsDialog.cc Don't check status right away, say we are checking and wait for the first timer to kick in to get those values. 
------------------------------------------------------------------------ r1440 | fabricecolin | 2008-12-14 02:42:04 +0100 (Sun, 14 Dec 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Fixed getHomeDirectory() to always return when HAVE_PWD_H isn't defined. ------------------------------------------------------------------------ r1439 | fabricecolin | 2008-12-11 14:42:28 +0100 (Thu, 11 Dec 2008) | 5 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/Xapian/XapianEngine.h Get PrefixDecider to reject short terms instead of doing it after it ran. Also reject terms that occur only once and stop words. If Xapian >= 1.0.4 is available, set the empty prefix so that both text and title are searched for on non-prefixed query terms. ------------------------------------------------------------------------ r1438 | fabricecolin | 2008-12-07 06:44:07 +0100 (Sun, 07 Dec 2008) | 6 lines Changed paths: A /trunk/scripts/bash/pinot-cd.sh This script implements a "tagged cd". The idea, from C. Scott Ananian, is to allow changing to a directory based on tags it matches. Tags here are the components of the path that leads to that directory, and the directory name itself. This will only work on an index built with the latest Xapian back-end. ------------------------------------------------------------------------ r1437 | fabricecolin | 2008-12-07 06:39:40 +0100 (Sun, 07 Dec 2008) | 4 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp If there's a space in the file name, decompose it into XPATH terms. All those are lower-cased. Don't index the title without prefix as if it were in the text body. ------------------------------------------------------------------------ r1436 | fabricecolin | 2008-12-07 04:07:56 +0100 (Sun, 07 Dec 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Another fix for engine channels. New default query "pinot search". 
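Note on r1437: the decomposition of a file path into lower-cased XPATH: terms might look something like the sketch below. It is not the code in XapianIndex.cpp. The function name pathTerms(), the exact term layout ("XPATH:" followed by the text) and the choice to keep the whole component alongside its space-separated parts are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: returns the XPATH: terms for a path.
std::set<std::string> pathTerms(const std::string &path)
{
    std::set<std::string> terms;
    std::istringstream pathStream(path);
    std::string component;

    // Split on '/' and skip empty components (leading or doubled slashes)
    while (std::getline(pathStream, component, '/'))
    {
        if (component.empty())
        {
            continue;
        }
        // All terms are lower-cased
        std::transform(component.begin(), component.end(), component.begin(),
            [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        terms.insert("XPATH:" + component);

        // If there's a space in the name, also add each part on its own
        if (component.find(' ') != std::string::npos)
        {
            std::istringstream wordStream(component);
            std::string word;
            while (wordStream >> word)
            {
                terms.insert("XPATH:" + word);
            }
        }
    }
    return terms;
}
```

With terms of this kind in the index, a "tagged cd" such as the one introduced in r1438 can match a directory on any component of the path that leads to it.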
------------------------------------------------------------------------ r1435 | fabricecolin | 2008-12-07 04:07:10 +0100 (Sun, 07 Dec 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp Get an iterator to the extract TextBuffer after clearing it, as the iterator would otherwise be invalidated and GTK would complain about it. ------------------------------------------------------------------------ r1434 | fabricecolin | 2008-12-06 02:23:28 +0100 (Sat, 06 Dec 2008) | 2 lines Changed paths: M /trunk/IndexSearch/DBusIndex.cpp M /trunk/Monitor/INotifyMonitor.h M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/Utils/DocumentInfo.cpp M /trunk/Utils/TimeConverter.cpp M /trunk/Utils/Url.cpp GCC 4.4 patch by Martin Michlmayr (Debian bug #504908). ------------------------------------------------------------------------ r1433 | fabricecolin | 2008-12-04 15:42:57 +0100 (Thu, 04 Dec 2008) | 2 lines Changed paths: M /trunk/IndexSearch/ModuleFactory.cpp M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h M /trunk/UI/GTK2/src/pinot.cc M /trunk/UI/GTK2/src/prefsWindow.cc M /trunk/UI/GTK2/src/statisticsDialog.cc Wrap all DBus specific code with HAVE_DBUS ifdef's. ------------------------------------------------------------------------ r1432 | fabricecolin | 2008-12-04 15:40:17 +0100 (Thu, 04 Dec 2008) | 3 lines Changed paths: M /trunk/IndexSearch/Makefile.am M /trunk/UI/GTK2/src/Makefile.am M /trunk/configure.in Allow disabling DBus support. This will turn off building of libIndex.a, pinot-label and pinot-dbus-daemon. ------------------------------------------------------------------------ r1431 | fabricecolin | 2008-12-03 16:14:02 +0100 (Wed, 03 Dec 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc Use signal() if sigaction() isn't available. 
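Note on r1431: a fallback of this kind is usually written along the lines below. This is a minimal sketch, not the code in pinot.cc or pinot-dbus-daemon.cc. It assumes configure defines HAVE_SIGACTION when sigaction() is found (r1429 mentions the check, but the macro name is a guess), and the names g_quit, onQuit() and installQuitHandler() are hypothetical.

```cpp
#include <signal.h>
#include <cstring>

// Set by the handler, polled by the main loop
volatile sig_atomic_t g_quit = 0;

extern "C" void onQuit(int)
{
    g_quit = 1;
}

// Hypothetical helper: installs onQuit() for the given signal and
// returns true on success.
bool installQuitHandler(int signum)
{
#ifdef HAVE_SIGACTION
    // Preferred: sigaction() has well-defined semantics
    struct sigaction act;
    std::memset(&act, 0, sizeof(act));
    sigemptyset(&act.sa_mask);
    act.sa_handler = onQuit;
    return sigaction(signum, &act, NULL) == 0;
#else
    // Fallback for platforms without sigaction(), e.g. MingW
    return signal(signum, onQuit) != SIG_ERR;
#endif
}
```

The handler only sets a flag, which keeps it safe to run in signal context whichever of the two calls installed it.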
------------------------------------------------------------------------ r1430 | fabricecolin | 2008-12-03 16:11:16 +0100 (Wed, 03 Dec 2008) | 2 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h Renamed CrawlHistory::ERROR to CRAWL_ERROR. ------------------------------------------------------------------------ r1429 | fabricecolin | 2008-12-03 16:10:26 +0100 (Wed, 03 Dec 2008) | 4 lines Changed paths: M /trunk/Collect/DownloaderInterface.cpp M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/configure.in In DownloaderInterface, GetCurrentThreadId() is specific to Windows, not to MingW. Check for fnmatch.h, pwd.h, sigaction(), sysconf(), getloadavg() and pipe(). ------------------------------------------------------------------------ r1428 | fabricecolin | 2008-12-02 16:54:50 +0100 (Tue, 02 Dec 2008) | 4 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianDatabase.cpp M /trunk/SQL/QueryHistory.cpp M /trunk/configure.in Header fixes. Do without regex.h in XapianDatabase if not available. In configure.in, libcrypt should no longer be required. ------------------------------------------------------------------------ r1427 | fabricecolin | 2008-12-01 14:51:32 +0100 (Mon, 01 Dec 2008) | 2 lines Changed paths: M /trunk/Collect/DownloaderInterface.cpp M /trunk/Utils/DocumentInfo.cpp M /trunk/Utils/Timer.cpp M /trunk/Utils/Timer.h More portability fixes. ------------------------------------------------------------------------ r1426 | fabricecolin | 2008-12-01 14:16:35 +0100 (Mon, 01 Dec 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc Identify threads spawned by DBusServlet by their ID. 
In mainWindow, use mktemp() if mkstemp() isn't available. ------------------------------------------------------------------------ r1425 | fabricecolin | 2008-12-01 14:01:09 +0100 (Mon, 01 Dec 2008) | 2 lines Changed paths: M /trunk/IndexSearch/ModuleFactory.cpp M /trunk/IndexSearch/Xapian/LanguageDetector.cpp M /trunk/Tokenize/FilterUtils.cpp M /trunk/Utils/Document.cpp M /trunk/Utils/TimeConverter.cpp M /trunk/Utils/unac/unac.c Portability fixes, brought up when compiling with MingW. ------------------------------------------------------------------------ r1424 | fabricecolin | 2008-12-01 13:57:36 +0100 (Mon, 01 Dec 2008) | 2 lines Changed paths: M /trunk/Collect/Makefile.am M /trunk/IndexSearch/Makefile.am M /trunk/IndexSearch/Xapian/Makefile.am M /trunk/IndexSearch/XesamGLib/Makefile.am M /trunk/Monitor/Makefile.am M /trunk/SQL/Makefile.am M /trunk/Tokenize/Makefile.am M /trunk/UI/GTK2/src/Makefile.am M /trunk/Utils/Makefile.am Use MISC_CFLAGS not VISIB_CFLAGS. ------------------------------------------------------------------------ r1423 | fabricecolin | 2008-12-01 13:56:07 +0100 (Mon, 01 Dec 2008) | 7 lines Changed paths: M /trunk/configure.in If textcat.h can't be found, we'll have to do without. Look for the library pthreadGCE2 if others can't be found. The test for GIO can be disabled with --enable-gio=yes/no. Substitute MISC_CFLAGS in Makefile's. Check for the dlfcn.h header, and the functions gmtime_r(), localtime_r(), strptime() and mkstemp(). ------------------------------------------------------------------------ r1422 | fabricecolin | 2008-11-30 11:09:54 +0100 (Sun, 30 Nov 2008) | 3 lines Changed paths: M /trunk/Makefile.am M /trunk/pinot-dbus-daemon.desktop A /trunk/pinot-prefs.desktop M /trunk/pinot.desktop Install pinot-prefs.desktop, which creates a menu entry in system preferences. All .desktop files have "Encoding=UTF-8". 
------------------------------------------------------------------------ r1421 | fabricecolin | 2008-11-30 11:07:50 +0100 (Sun, 30 Nov 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp If a group is empty, add a "No results" child. ------------------------------------------------------------------------ r1420 | fabricecolin | 2008-11-30 11:05:33 +0100 (Sun, 30 Nov 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Larger default blacklist. In loadEngineChannels(), create entries in m_engineChannels if necessary. ------------------------------------------------------------------------ r1419 | fabricecolin | 2008-11-22 06:58:42 +0100 (Sat, 22 Nov 2008) | 5 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/UI/GTK2/src/prefsWindow.cc Use the new PinotSettings::load() to load all settings. In pinot, each window is responsible for saving its own settings when closing. The main window saves its part before opening the preferences window as the daemon may subsequently reload all the settings. ------------------------------------------------------------------------ r1418 | fabricecolin | 2008-11-22 06:53:32 +0100 (Sat, 22 Nov 2008) | 5 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h Split the configuration file in two, one part being updated by preferences, and the other by the UI. The load() method now handles everything, including engines and their editable values. ------------------------------------------------------------------------ r1417 | fabricecolin | 2008-11-19 15:40:31 +0100 (Wed, 19 Nov 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc M /trunk/UI/GTK2/src/prefsWindow.cc Log messages to pinot-prefs.log. Fixed exit after starting the daemon and/or setting labels. 
------------------------------------------------------------------------ r1416 | fabricecolin | 2008-11-19 14:58:01 +0100 (Wed, 19 Nov 2008) | 3 lines Changed paths: M /trunk/Makefile.am M /trunk/pinot.spec.in M /trunk/po/POTFILES.in Set pinot-prefs as a symlink to pinot, install scripts/bash/*.sh. POTFILES looks for the prefsWindow source. ------------------------------------------------------------------------ r1415 | fabricecolin | 2008-11-19 14:55:52 +0100 (Wed, 19 Nov 2008) | 5 lines Changed paths: M /trunk/UI/GTK2/src/Makefile.am M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/pinot.cc Replaced prefsDialog with prefsWindow. Using the new option -p/--preferences, or starting the program as pinot-prefs, makes the UI open prefsWindow and exit. When the Session, Preferences menu is selected, pinot runs pinot-prefs. ------------------------------------------------------------------------ r1414 | fabricecolin | 2008-11-19 14:24:34 +0100 (Wed, 19 Nov 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade D /trunk/UI/GTK2/src/prefsDialog.cc D /trunk/UI/GTK2/src/prefsDialog.hh D /trunk/UI/GTK2/src/prefsDialog_glade.cc D /trunk/UI/GTK2/src/prefsDialog_glade.hh A /trunk/UI/GTK2/src/prefsWindow.cc A /trunk/UI/GTK2/src/prefsWindow.hh A /trunk/UI/GTK2/src/prefsWindow_glade.cc A /trunk/UI/GTK2/src/prefsWindow_glade.hh Preferences is now a separate, independent window. ------------------------------------------------------------------------ r1413 | fabricecolin | 2008-11-16 15:12:42 +0100 (Sun, 16 Nov 2008) | 4 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml The daemon'd better call loadSearchEngines() for Query to offer access to the same engines as the UI. Fixed typo in the XML description file. 
------------------------------------------------------------------------ r1412 | fabricecolin | 2008-11-16 15:09:49 +0100 (Sun, 16 Nov 2008) | 4 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/pinot-search.cpp Added a -l/--locationonly parameter to pinot-search that makes it display only the results' location. In XapianEngine, XPATH: shouldn't be a boolean prefix. ------------------------------------------------------------------------ r1411 | fabricecolin | 2008-11-15 08:24:39 +0100 (Sat, 15 Nov 2008) | 3 lines Changed paths: M /trunk/scripts/python/pinot-module.py Modified to use the new D-Bus method Query, instead of SimpleQuery. Be ready to catch AttributeError on set_snippet() (deskbar < v2.24). ------------------------------------------------------------------------ r1410 | fabricecolin | 2008-11-15 07:23:35 +0100 (Sat, 15 Nov 2008) | 4 lines Changed paths: M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml Fixed default engine parameters supported by Query. If EngineQueryThread can't find the requested engine, it looks for a plugin that matches the given name and uses that instead. ------------------------------------------------------------------------ r1409 | fabricecolin | 2008-11-15 04:59:05 +0100 (Sat, 15 Nov 2008) | 2 lines Changed paths: M /trunk/IndexSearch/OpenSearchParser.cpp M /trunk/IndexSearch/PluginWebEngine.cpp M /trunk/IndexSearch/PluginWebEngine.h M /trunk/IndexSearch/SherlockParser.cpp M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp Sorted out discrepancies between SearchPluginProperties and ModuleProperties. 
------------------------------------------------------------------------ r1408 | fabricecolin | 2008-11-13 16:01:48 +0100 (Thu, 13 Nov 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/Notebook.cpp M /trunk/UI/GTK2/src/Notebook.h Show spelling suggestions above the results list. A close button hides them until other suggestions are available. ------------------------------------------------------------------------ r1407 | fabricecolin | 2008-11-11 15:33:04 +0100 (Tue, 11 Nov 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp LabelUpdateThread need not worry about updating My Web Pages' labels since they are exclusively pulled from My Documents. ------------------------------------------------------------------------ r1406 | fabricecolin | 2008-11-11 15:19:58 +0100 (Tue, 11 Nov 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh Only make new labels editable, since existing labels can't be renamed. ------------------------------------------------------------------------ r1405 | fabricecolin | 2008-11-09 16:47:01 +0100 (Sun, 09 Nov 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/ServerThreads.cpp Provide reasonable defaults for Query's engine type and option. Some cosmetic changes. ------------------------------------------------------------------------ r1404 | fabricecolin | 2008-11-09 13:52:37 +0100 (Sun, 09 Nov 2008) | 6 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml DBusServletInfo encapsulates information required to reply to D-Bus requests, and allows running another thread, for instance an EngineQueryThread to reply to queries. The D-Bus interface now includes a Query method similar to C. 
Scott Ananian's JournalQuery. This will eventually replace SimpleQuery. ------------------------------------------------------------------------ r1403 | fabricecolin | 2008-11-09 13:36:47 +0100 (Sun, 09 Nov 2008) | 2 lines Changed paths: M /trunk/scripts/python/pinot-module.py Support for deskbar v2.24 snippets. ------------------------------------------------------------------------ r1402 | fabricecolin | 2008-11-08 05:00:11 +0100 (Sat, 08 Nov 2008) | 4 lines Changed paths: M /trunk/AUTHORS M /trunk/IndexSearch/DBusIndex.cpp M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/pinot-search.cpp Patches by C. Scott Ananian. Fix XapianEngine to get the results count estimate even when there are no matches left. Serialize results score over D-Bus. Add a "sort by date first" mode to pinot-search. ------------------------------------------------------------------------ r1401 | fabricecolin | 2008-11-02 10:11:04 +0100 (Sun, 02 Nov 2008) | 2 lines Changed paths: D /trunk/Tokenize/Tokenizer.h This was obsoleted a long time ago. ------------------------------------------------------------------------ r1400 | fabricecolin | 2008-11-02 10:07:59 +0100 (Sun, 02 Nov 2008) | 5 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/statisticsDialog.cc Whenever metadata is updated through the DBus interface, the daemon updates the MetaDataBackup table. When reindexing occurs, the crawler re-applies all the metadata found there. Synced with recent changes to CrawlHistory. ------------------------------------------------------------------------ r1399 | fabricecolin | 2008-11-02 09:42:24 +0100 (Sun, 02 Nov 2008) | 2 lines Changed paths: M /trunk/SQL/MetaDataBackup.cpp M /trunk/SQL/MetaDataBackup.h M /trunk/Utils/Document.cpp Fixed xattr.h check. Added MetaDataBackup::getItems(). 
------------------------------------------------------------------------ r1398 | fabricecolin | 2008-11-02 09:35:29 +0100 (Sun, 02 Nov 2008) | 2 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h Method getSourceItems() now takes a min/max range. ------------------------------------------------------------------------ r1397 | fabricecolin | 2008-11-02 07:16:41 +0100 (Sun, 02 Nov 2008) | 2 lines Changed paths: M /trunk/Collect/FileCollector.cpp M /trunk/Collect/FileCollector.h Test the document's MIME type after loading. ------------------------------------------------------------------------ r1396 | fabricecolin | 2008-11-02 07:11:49 +0100 (Sun, 02 Nov 2008) | 3 lines Changed paths: M /trunk/Utils/Document.cpp In setDataFromFile(), look for extended attributes. At the moment, only the attribute user.mime_type is supported. ------------------------------------------------------------------------ r1395 | fabricecolin | 2008-11-01 15:52:37 +0100 (Sat, 01 Nov 2008) | 5 lines Changed paths: M /trunk/SQL/Makefile.am A /trunk/SQL/MetaDataBackup.cpp A /trunk/SQL/MetaDataBackup.h M /trunk/configure.in Check for the header attr/xattr.h. Build MetaDataBackup, a class that enables saving to and loading metadata from the database and/or filesystem extended attributes. The purpose is to allow the daemon to restore user-set metadata after reindexing. ------------------------------------------------------------------------ r1394 | fabricecolin | 2008-10-27 08:13:45 +0100 (Mon, 27 Oct 2008) | 2 lines Changed paths: M /trunk/Utils/DocumentInfo.cpp M /trunk/Utils/DocumentInfo.h Serialization methods can deal with all or only some properties. 
------------------------------------------------------------------------ r1393 | fabricecolin | 2008-10-27 07:08:14 +0100 (Mon, 27 Oct 2008) | 3 lines Changed paths: M /trunk/IndexSearch/DBusIndex.cpp M /trunk/IndexSearch/DBusIndex.h M /trunk/IndexSearch/IndexInterface.h M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/Xapian/XapianIndex.h M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh Removing the seldom-used and more-trouble-than-it's-worth ability to rename labels. ------------------------------------------------------------------------ r1392 | fabricecolin | 2008-10-27 05:04:03 +0100 (Mon, 27 Oct 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianDatabase.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/Utils/Url.h Cosmetic changes. ------------------------------------------------------------------------ r1391 | fabricecolin | 2008-10-27 05:02:01 +0100 (Mon, 27 Oct 2008) | 6 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h The FileFound signal doesn't require a source label, the caller should set it on the document prior to emitting the signal. The DirectoryScanner thread inherits from Indexing and will index files as they are being found, unless the env var PINOT_DELEGATE_INDEXING is set to Y. I am enabling this by default, but this may change in the near future. 
------------------------------------------------------------------------ r1390 | fabricecolin | 2008-10-27 04:43:12 +0100 (Mon, 27 Oct 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/queryDialog.cc Support for the path: filter. ------------------------------------------------------------------------ r1389 | fabricecolin | 2008-10-25 09:35:03 +0200 (Sat, 25 Oct 2008) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp Index all components of the path to a file with the XPATH: prefix. At search time, this maps to the "path:" filter. ------------------------------------------------------------------------ r1388 | fabricecolin | 2008-10-21 16:35:21 +0200 (Tue, 21 Oct 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.hh The daemon's status is retrieved in the background. The status window should no longer occasionally freeze. ------------------------------------------------------------------------ r1387 | fabricecolin | 2008-10-18 12:33:19 +0200 (Sat, 18 Oct 2008) | 2 lines Changed paths: M /trunk/IndexSearch/FilterWrapper.cpp M /trunk/IndexSearch/FilterWrapper.h Added method reduceToText() and the ReducedAction class. ------------------------------------------------------------------------ r1386 | fabricecolin | 2008-10-15 15:46:31 +0200 (Wed, 15 Oct 2008) | 5 lines Changed paths: M /trunk/UI/GTK2/src/Notebook.cpp M /trunk/UI/GTK2/src/Notebook.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh When a spelling suggestion is available, don't create a Corrected query right away. Instead, show suggestions below the results list, in a combobox labeled "Did you mean ?" and let the user choose which one he thinks is relevant and click the Yes button. Then create a new query, based on the original. 
------------------------------------------------------------------------ r1385 | fabricecolin | 2008-10-11 12:24:34 +0200 (Sat, 11 Oct 2008) | 3 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp M /trunk/Collect/CurlDownloader.h M /trunk/Collect/DownloaderInterface.cpp M /trunk/Collect/DownloaderInterface.h M /trunk/Collect/NeonDownloader.cpp M /trunk/Collect/NeonDownloader.h Reduced code duplication between the Neon and Curl downloaders. Added ability to do a POST. ------------------------------------------------------------------------ r1384 | fabricecolin | 2008-10-11 11:06:15 +0200 (Sat, 11 Oct 2008) | 3 lines Changed paths: M /trunk/IndexSearch/OpenSearchParser.cpp M /trunk/IndexSearch/pinot-index.cpp Cosmetic changes mostly. In pinot-index, let closeAll() close stuff at exit time. ------------------------------------------------------------------------ r1383 | fabricecolin | 2008-10-11 10:35:26 +0200 (Sat, 11 Oct 2008) | 4 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/INotifyMonitor.h M /trunk/Monitor/MonitorInterface.h New getLimit() method to get the maximum number of watches. INotifyMonitor will not attempt using more watches than available. If there are more than 8k, 1k is set aside for other applications. ------------------------------------------------------------------------ r1382 | fabricecolin | 2008-10-09 15:09:15 +0200 (Thu, 09 Oct 2008) | 5 lines Changed paths: M /trunk/FAQ M /trunk/Utils/MIMEScanner.cpp In MIMEScanner::getDefaultActionsForType(), if GIO is used, initialize the list with the default actions obtained with g_app_info_get_default_for_type() to make sure they are picked up first. Add an entry to the FAQ about this, using browsers and HTML docs as example. 
------------------------------------------------------------------------ r1381 | fabricecolin | 2008-10-05 10:58:20 +0200 (Sun, 05 Oct 2008) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/Makefile.am D /trunk/Monitor/linux-inotify-syscalls.h M /trunk/configure.in A patch by Adrian Bunk to fix inotify support. More information on this topic at http://lkml.org/lkml/2008/9/16/79 ------------------------------------------------------------------------ r1380 | fabricecolin | 2008-10-01 11:35:54 +0200 (Wed, 01 Oct 2008) | 4 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h Don't rely on the size of m_indexNames for IDs assigned to indexes; that list may grow or shrink while Pinot is running and that messed things up, eg the retrieval from history and the display of abstracts. ------------------------------------------------------------------------ r1379 | fabricecolin | 2008-10-01 11:33:13 +0200 (Wed, 01 Oct 2008) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/AbstractGenerator.cpp Record the position of all terms not just the first m_maxSeedTerm to ensure all terms are highlighted in the abstract. ------------------------------------------------------------------------ r1378 | fabricecolin | 2008-09-30 15:54:07 +0200 (Tue, 30 Sep 2008) | 2 lines Changed paths: M /trunk/Utils/TimeConverter.cpp The conversion specifier %z is a GNU extension to strftime(). ------------------------------------------------------------------------ r1376 | fabricecolin | 2008-09-20 10:48:42 +0200 (Sat, 20 Sep 2008) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/ChangeLog-dijon Current logs. ------------------------------------------------------------------------ r1375 | fabricecolin | 2008-09-20 10:45:49 +0200 (Sat, 20 Sep 2008) | 2 lines Changed paths: M /trunk/NEWS M /trunk/TODO Current state of things. 
------------------------------------------------------------------------ r1374 | fabricecolin | 2008-09-20 10:40:29 +0200 (Sat, 20 Sep 2008) | 4 lines Changed paths: M /trunk/IndexSearch/XesamGLib/XesamEngine.cpp Assume QueryProperties sanitized the query. Attempt to stop the search when enough hits have been received. Include as found in xesam-glib v0.5.0. ------------------------------------------------------------------------ r1373 | fabricecolin | 2008-09-20 09:18:09 +0200 (Sat, 20 Sep 2008) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Current translations, with updates from Aron Xu (Simplified Chinese) and André Gondim (Brazilian Portuguese). ------------------------------------------------------------------------ r1372 | fabricecolin | 2008-09-18 16:48:09 +0200 (Thu, 18 Sep 2008) | 4 lines Changed paths: M /trunk/configure.in Bumped version number to 0.89. Force automatic reindexing of older versions so that users benefit from the various bug fixes to CJKV, handling of basic types as well as diacritics insensitivity. ------------------------------------------------------------------------ r1371 | fabricecolin | 2008-09-18 16:44:14 +0200 (Thu, 18 Sep 2008) | 3 lines Changed paths: M /trunk/AUTHORS Give credit to Loic Dachary for Unac, and to Constantin Teodorescu for his help with testing diacritics (in)sensitivity. ------------------------------------------------------------------------ r1370 | fabricecolin | 2008-09-18 16:42:42 +0200 (Thu, 18 Sep 2008) | 2 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp DEBUG only output. 
------------------------------------------------------------------------ r1369 | fabricecolin | 2008-09-18 16:41:44 +0200 (Thu, 18 Sep 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Added CVS, .svn and .torrent to the default blacklist. ------------------------------------------------------------------------ r1368 | fabricecolin | 2008-09-18 16:38:27 +0200 (Thu, 18 Sep 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Boolean operators are case sensitive since 0.88 ! ------------------------------------------------------------------------ r1367 | fabricecolin | 2008-09-16 16:37:39 +0200 (Tue, 16 Sep 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp Stems of terms without diacritics are indexed too. ------------------------------------------------------------------------ r1366 | fabricecolin | 2008-09-13 07:52:52 +0200 (Sat, 13 Sep 2008) | 3 lines Changed paths: M /trunk/README Clarify parts about stemming and blacklists. Mention stopwords removal and query correction. ------------------------------------------------------------------------ r1365 | fabricecolin | 2008-09-13 07:51:06 +0200 (Sat, 13 Sep 2008) | 2 lines Changed paths: M /trunk/Makefile.am Create the stopwords directory even though there are no lists to copy there. ------------------------------------------------------------------------ r1364 | fabricecolin | 2008-09-13 07:07:02 +0200 (Sat, 13 Sep 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Assume query strings are sanitized by QueryProperty. ------------------------------------------------------------------------ r1363 | fabricecolin | 2008-09-13 07:01:56 +0200 (Sat, 13 Sep 2008) | 2 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp Include more headers to hopefully fix build issue on current Fedora 10. 
------------------------------------------------------------------------ r1362 | fabricecolin | 2008-09-13 06:59:53 +0200 (Sat, 13 Sep 2008) | 9 lines Changed paths: M /trunk/IndexSearch/Xapian/ModuleExports.cpp M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/Xapian/XapianEngine.h FileStopper implements a file-based Stopper. Language specific stopwords lists are assumed to be at "$PREFIX/share/pinot/stopwords/stopwords.language_code". Stopwords are removed from queries when stemming is on, and if the query consists of more than one token. For non-CJKV queries, we rely on the number of spaces in the query (which is okay for the languages stopwords removal is useful for). Cache the stopper object, and delete it when the back-end is unloaded. Let QueryProperties sanitize the query string. ------------------------------------------------------------------------ r1361 | fabricecolin | 2008-09-13 06:31:24 +0200 (Sat, 13 Sep 2008) | 3 lines Changed paths: M /trunk/IndexSearch/QueryProperties.cpp Clean up the query string : remove CRs, dehyphen on line breaks, replace line breaks with spaces. ------------------------------------------------------------------------ r1360 | fabricecolin | 2008-09-09 15:21:07 +0200 (Tue, 09 Sep 2008) | 4 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp If the query is not sensitive to diacritics, get QueryModifier to remove them from query terms. Fixed a (harmless ?) bug where a stray bracket would be appended to queries. ------------------------------------------------------------------------ r1359 | fabricecolin | 2008-09-09 15:17:59 +0200 (Tue, 09 Sep 2008) | 4 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp Unless _DIACRITICS_SENSITIVE is defined, always use our own TokensIndexer; terms with diacritics are indexed with and without. Only add a "XTOK:CJKV" term if CJKV was found. 
------------------------------------------------------------------------ r1358 | fabricecolin | 2008-09-09 15:10:43 +0200 (Tue, 09 Sep 2008) | 2 lines Changed paths: M /trunk/configure.in Check for vsnprintf(). ------------------------------------------------------------------------ r1357 | fabricecolin | 2008-09-08 15:41:21 +0200 (Mon, 08 Sep 2008) | 2 lines Changed paths: M /trunk/IndexSearch/QueryProperties.cpp M /trunk/IndexSearch/QueryProperties.h Added a diacriticSensitive property, false by default. ------------------------------------------------------------------------ r1356 | fabricecolin | 2008-09-08 15:40:13 +0200 (Mon, 08 Sep 2008) | 4 lines Changed paths: M /trunk/Utils/Makefile.am M /trunk/Utils/StringManip.cpp M /trunk/Utils/StringManip.h A /trunk/Utils/unac A /trunk/Utils/unac/unac.c A /trunk/Utils/unac/unac.h Embed a copy of unac 1.7.0 main files. Unac is Copyright (C) 2000, 2001, 2002 Loic Dachary. StringManip::stripDiacritics() wraps unac_string(). ------------------------------------------------------------------------ r1355 | fabricecolin | 2008-09-08 15:36:44 +0200 (Mon, 08 Sep 2008) | 4 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc In add_query(), only merge queries if the new bit isn't in already. This prevents automatically generated queries such as spelling corrections from getting longer and longer when running the same query against the same index. ------------------------------------------------------------------------ r1354 | fabricecolin | 2008-09-08 15:34:46 +0200 (Mon, 08 Sep 2008) | 6 lines Changed paths: M /trunk/IndexSearch/Xapian/AbstractGenerator.cpp When there's only one term, no abstract window would be chosen because the default weights were zero. Prefer the window leading to terms, rather than following. Rely on terms' positions to determine whether terms should be highlighted as sometimes one position may have more than one term. 
------------------------------------------------------------------------ r1352 | fabricecolin | 2008-08-29 18:02:43 +0200 (Fri, 29 Aug 2008) | 2 lines Changed paths: M /trunk/ChangeLog Current log. ------------------------------------------------------------------------ r1351 | fabricecolin | 2008-08-29 17:58:52 +0200 (Fri, 29 Aug 2008) | 2 lines Changed paths: M /trunk/ChangeLog-dijon M /trunk/NEWS Updated NEWS and Dijon's ChangeLog. ------------------------------------------------------------------------ r1350 | fabricecolin | 2008-08-28 17:52:50 +0200 (Thu, 28 Aug 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Plugins/Google.src Another plugin update. ------------------------------------------------------------------------ r1349 | fabricecolin | 2008-08-28 15:30:42 +0200 (Thu, 28 Aug 2008) | 3 lines Changed paths: M /trunk/IndexSearch/pinot-index.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/IndexSearch/pinot-search.1 M /trunk/README M /trunk/TODO M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Bumped the version number, regenerated man pages. Minor updates to TODO and README. ------------------------------------------------------------------------ r1348 | fabricecolin | 2008-08-28 15:14:48 +0200 (Thu, 28 Aug 2008) | 2 lines Changed paths: M /trunk/Makefile.am Move YahooBOSS.src out of the engines directory for the time being. ------------------------------------------------------------------------ r1347 | fabricecolin | 2008-08-28 15:08:46 +0200 (Thu, 28 Aug 2008) | 2 lines Changed paths: D /trunk/IndexSearch/Plugins/CreativeCommons.src This is now a blog search. The actual content search is somewhat obfuscated. 
------------------------------------------------------------------------ r1346 | fabricecolin | 2008-08-28 15:07:38 +0200 (Thu, 28 Aug 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Plugins/BitTorrent.src M /trunk/IndexSearch/Plugins/CreativeCommons.src M /trunk/IndexSearch/Plugins/Freshmeat.src M /trunk/IndexSearch/Plugins/GoogleCodeSearch.src M /trunk/IndexSearch/Plugins/RollYOTopNews.src M /trunk/IndexSearch/Plugins/Wikipedia.src Search engine updates. ------------------------------------------------------------------------ r1345 | fabricecolin | 2008-08-27 16:28:47 +0200 (Wed, 27 Aug 2008) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/README Added rainofchaos to list of translators. Rephrased part about engines and query syntaxes in section 6 of the README. ------------------------------------------------------------------------ r1344 | fabricecolin | 2008-08-25 16:46:08 +0200 (Mon, 25 Aug 2008) | 5 lines Changed paths: M /trunk/IndexSearch/pinot-search.cpp M /trunk/README Explain to what extent Xesam specs are supported; pinot-search's help text has some examples now. Clarify that building the Google API is not recommended, and how to use filters in CJKV queries. ------------------------------------------------------------------------ r1343 | fabricecolin | 2008-08-24 06:43:33 +0200 (Sun, 24 Aug 2008) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Current translations. ------------------------------------------------------------------------ r1342 | fabricecolin | 2008-08-23 16:39:04 +0200 (Sat, 23 Aug 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Create a bunch of useful default queries. 
------------------------------------------------------------------------ r1341 | fabricecolin | 2008-08-23 16:37:10 +0200 (Sat, 23 Aug 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Google/ModuleExports.cpp Fix previous commit. ------------------------------------------------------------------------ r1340 | fabricecolin | 2008-08-23 14:57:23 +0200 (Sat, 23 Aug 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.hh Show extra stats only when on. ------------------------------------------------------------------------ r1339 | fabricecolin | 2008-08-19 15:46:17 +0200 (Tue, 19 Aug 2008) | 2 lines Changed paths: A /trunk/IndexSearch/Google/ModuleExports.cpp Functions exported by this backend. ------------------------------------------------------------------------ r1338 | fabricecolin | 2008-08-19 15:43:52 +0200 (Tue, 19 Aug 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.cc Synced with the new ModuleFactory::getSupportedEngines(). Replaced PinotSettings::Engine with ModuleProperties. ------------------------------------------------------------------------ r1337 | fabricecolin | 2008-08-19 15:40:26 +0200 (Tue, 19 Aug 2008) | 4 lines Changed paths: M /trunk/IndexSearch/Makefile.am M /trunk/IndexSearch/ModuleFactory.cpp M /trunk/IndexSearch/ModuleFactory.h A /trunk/IndexSearch/ModuleProperties.h M /trunk/IndexSearch/SearchPluginProperties.cpp M /trunk/IndexSearch/SearchPluginProperties.h M /trunk/IndexSearch/Xapian/ModuleExports.cpp M /trunk/IndexSearch/XesamGLib/ModuleExports.cpp M /trunk/IndexSearch/pinot-index.cpp M /trunk/IndexSearch/pinot-search.cpp Spun some properties out of SearchPluginProperties into ModuleProperties. 
Backends must implement ModuleProperties *getModuleProperties() instead of string getModuleType(). ------------------------------------------------------------------------ r1336 | fabricecolin | 2008-08-19 15:38:49 +0200 (Tue, 19 Aug 2008) | 2 lines Changed paths: M /trunk/IndexSearch/OpenSearchParser.cpp M /trunk/IndexSearch/SherlockParser.cpp Forget about description of engines. ------------------------------------------------------------------------ r1335 | fabricecolin | 2008-08-16 07:57:59 +0200 (Sat, 16 Aug 2008) | 3 lines Changed paths: M /trunk/configure.in If GIO can sniff PNG, set USE_GIO and make sure programs link against it. The test on GIO's sniffing abilities comes from gtk+'s configure.in. ------------------------------------------------------------------------ r1334 | fabricecolin | 2008-08-16 07:54:54 +0200 (Sat, 16 Aug 2008) | 3 lines Changed paths: M /trunk/Utils/MIMEScanner.cpp Compile getKeyValue() out if GIO is in use. The xdgmime function to call in scanData() is xdg_mime_get_mime_type_for_data(). ------------------------------------------------------------------------ r1333 | fabricecolin | 2008-08-15 16:47:45 +0200 (Fri, 15 Aug 2008) | 2 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h New MIMEScanner::scanData() method, used when a filter returns "scan" as type. ------------------------------------------------------------------------ r1332 | fabricecolin | 2008-08-15 15:25:14 +0200 (Fri, 15 Aug 2008) | 4 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/FilterUtils.h M /trunk/Utils/CommandLine.cpp M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h MIMEScanner::getParentTypes() needs the list of all supported types so that in GIO mode it can determine which types are related with g_content_type_is_a(). This flattens the types hierarchy though, but this will do. 
------------------------------------------------------------------------ r1331 | fabricecolin | 2008-08-14 15:34:54 +0200 (Thu, 14 Aug 2008) | 3 lines Changed paths: M /trunk/IndexSearch/pinot-index.cpp If --showinfo is passed, show which actions are associated with the document's MIME type. ------------------------------------------------------------------------ r1330 | fabricecolin | 2008-08-14 15:29:02 +0200 (Thu, 14 Aug 2008) | 4 lines Changed paths: M /trunk/Utils/CommandLine.cpp M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h M /trunk/Utils/Makefile.am Use GAppInfo at the same time as GContentType, instead of the MIMECache code. This extends to CommandLine::runAsync(). Switched to USE_GIO for consistency with the HTTP stuff. ------------------------------------------------------------------------ r1329 | fabricecolin | 2008-08-11 09:43:11 +0200 (Mon, 11 Aug 2008) | 2 lines Changed paths: M /trunk/Collect/FileCollector.cpp M /trunk/IndexSearch/pinot-index.cpp M /trunk/IndexSearch/pinot-search.cpp Cosmetic changes. ------------------------------------------------------------------------ r1328 | fabricecolin | 2008-08-11 09:41:48 +0200 (Mon, 11 Aug 2008) | 3 lines Changed paths: M /trunk/IndexSearch/Google/Makefile.am M /trunk/IndexSearch/Xapian/Makefile.am M /trunk/IndexSearch/XesamGLib/Makefile.am Link all backends against BasicUtils. For the Google API, bundle all source files as headers if support is turned off. ------------------------------------------------------------------------ r1327 | fabricecolin | 2008-08-11 09:39:22 +0200 (Mon, 11 Aug 2008) | 2 lines Changed paths: M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h M /trunk/Utils/Makefile.am Use GContentType instead of xdgmime if HAVE_GIO_MIME is set/defined. 
------------------------------------------------------------------------ r1326 | fabricecolin | 2008-08-06 15:35:44 +0200 (Wed, 06 Aug 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/Xapian/XapianIndex.h Removed dead functions. Both set/getLabels() rely on set/getMetadata(). ------------------------------------------------------------------------ r1325 | fabricecolin | 2008-08-04 16:45:18 +0200 (Mon, 04 Aug 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Google API key fix. ------------------------------------------------------------------------ r1324 | fabricecolin | 2008-08-04 16:37:13 +0200 (Mon, 04 Aug 2008) | 2 lines Changed paths: D /trunk/Tokenize/Tokenizer.cpp D /trunk/Tokenize/tokenizertest.cpp These two were obsoleted quite some time ago... ------------------------------------------------------------------------ r1323 | fabricecolin | 2008-08-04 16:33:24 +0200 (Mon, 04 Aug 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/prefsDialog.cc The Google API engine is now loaded dynamically, if available, and identified as "Google API". ------------------------------------------------------------------------ r1322 | fabricecolin | 2008-08-04 16:31:35 +0200 (Mon, 04 Aug 2008) | 2 lines Changed paths: D /trunk/IndexSearch/SOAPEnv.h D /trunk/IndexSearch/SOAPEnvH.h D /trunk/IndexSearch/SOAPEnvNS.cpp D /trunk/IndexSearch/SOAPEnvStub.h These are no longer needed. 
------------------------------------------------------------------------ r1321 | fabricecolin | 2008-08-04 16:22:08 +0200 (Mon, 04 Aug 2008) | 5 lines Changed paths: M /trunk/IndexSearch/Google/GAPIC.cpp M /trunk/IndexSearch/Google/GAPIClient.cpp M /trunk/IndexSearch/Google/GAPIClientLib.cpp M /trunk/IndexSearch/Google/GAPIGoogleSearchBindingProxy.h M /trunk/IndexSearch/Google/GAPIH.h M /trunk/IndexSearch/Google/GAPIStub.h M /trunk/IndexSearch/Google/GoogleAPIEngine.cpp M /trunk/IndexSearch/Google/GoogleSearch.h M /trunk/IndexSearch/Google/Makefile.am Regenerated code with gsoap 2.7.10. Compile GAPIC and GAPIClient, and ignore GAPIClientLib as it defines a macro that makes all the soap_ functions static. All this is built into a dynamic backend. ------------------------------------------------------------------------ r1320 | fabricecolin | 2008-08-04 16:15:56 +0200 (Mon, 04 Aug 2008) | 6 lines Changed paths: M /trunk/Collect/Makefile.am M /trunk/IndexSearch/FilterWrapper.h M /trunk/IndexSearch/IndexInterface.h M /trunk/IndexSearch/Makefile.am M /trunk/IndexSearch/ModuleFactory.cpp M /trunk/IndexSearch/QueryProperties.h M /trunk/IndexSearch/SearchEngineInterface.h M /trunk/IndexSearch/WebEngine.h M /trunk/IndexSearch/Xapian/Makefile.am M /trunk/IndexSearch/Xapian/ModuleExports.cpp M /trunk/IndexSearch/XesamGLib/Makefile.am M /trunk/IndexSearch/XesamGLib/ModuleExports.cpp M /trunk/Monitor/Makefile.am M /trunk/SQL/Makefile.am M /trunk/Tokenize/FilterUtils.h M /trunk/Tokenize/Makefile.am M /trunk/Tokenize/TextConverter.h M /trunk/Tokenize/Tokenizer.h M /trunk/UI/GTK2/src/Makefile.am M /trunk/Utils/CommandLine.h M /trunk/Utils/Document.h M /trunk/Utils/DocumentInfo.h M /trunk/Utils/Languages.h M /trunk/Utils/MIMEScanner.h M /trunk/Utils/Makefile.am M /trunk/Utils/StringManip.h M /trunk/Utils/TimeConverter.h M /trunk/Utils/Timer.h M /trunk/Utils/Url.h M /trunk/configure.in If gcc 4.x is available, set symbol visibility to hidden by default. 
Only export stuff we know is required by the backends, and backend's entry points. Don't build libSearchSOAP, and remove references to the Google API engine, this will go into a separate backend. -fPIC is set in CXXFLAGS by configure.in instead of in each Makefile. ------------------------------------------------------------------------ r1319 | fabricecolin | 2008-08-03 17:23:35 +0200 (Sun, 03 Aug 2008) | 2 lines Changed paths: A /trunk/Utils/Visibility.h Some macros to set symbol visibility. ------------------------------------------------------------------------ r1318 | fabricecolin | 2008-08-03 17:22:35 +0200 (Sun, 03 Aug 2008) | 3 lines Changed paths: A /trunk/UI/GTK2/src/UniqueApplication.cpp A /trunk/UI/GTK2/src/UniqueApplication.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Moved code that enforced the daemon's uniqueness to UniqueApplication and prepare for an eventual switch to Unique (http://www.gnome.org/~ebassi/source/). ------------------------------------------------------------------------ r1317 | fabricecolin | 2008-08-03 15:02:29 +0200 (Sun, 03 Aug 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Time retrieveUrl(). ------------------------------------------------------------------------ r1316 | fabricecolin | 2008-08-03 11:09:13 +0200 (Sun, 03 Aug 2008) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp The flag QueryParser::FLAG_BOOLEAN_ANY_CASE causes more problems than it's worth, especially when queries are pieces of text pasted from elsewhere. ------------------------------------------------------------------------ r1315 | fabricecolin | 2008-08-03 11:04:32 +0200 (Sun, 03 Aug 2008) | 4 lines Changed paths: M /trunk/IndexSearch/XesamGLib/XesamEngine.cpp Use xesam:language and xesam:relevancyRating. It's not clear in what language the languages list is supposed to be, so we assume it's in the locale language. 
------------------------------------------------------------------------ r1314 | fabricecolin | 2008-07-26 08:10:28 +0200 (Sat, 26 Jul 2008) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/po/de.po M /trunk/po/fr.po M /trunk/po/pt_BR.po M /trunk/po/sv.po Updated de, pt_BR, sv and fr by Gena Haltmair, Rafael Porto Rodrigues, Daniel Nylander and myself. ------------------------------------------------------------------------ r1313 | fabricecolin | 2008-07-26 06:20:04 +0200 (Sat, 26 Jul 2008) | 5 lines Changed paths: M /trunk/IndexSearch/DBusIndex.cpp M /trunk/IndexSearch/DBusIndex.h M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.hh Extended GetStatistics to return the flags "low disk space", "on battery" and "crawling". The latter is new to the daemon and set while a DirectoryScanner is running. That information is displayed in the Status window. ------------------------------------------------------------------------ r1311 | fabricecolin | 2008-07-19 13:11:52 +0200 (Sat, 19 Jul 2008) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/ChangeLog-dijon Current logs. ------------------------------------------------------------------------ r1310 | fabricecolin | 2008-07-19 13:00:19 +0200 (Sat, 19 Jul 2008) | 2 lines Changed paths: M /trunk/NEWS M /trunk/TODO This release's details. ------------------------------------------------------------------------ r1309 | fabricecolin | 2008-07-19 12:58:19 +0200 (Sat, 19 Jul 2008) | 2 lines Changed paths: M /trunk/pinot.desktop Provide a comment for ru... in English, because it's not been translated yet. 
------------------------------------------------------------------------ r1308 | fabricecolin | 2008-07-19 12:56:13 +0200 (Sat, 19 Jul 2008) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Current translations. ------------------------------------------------------------------------ r1307 | fabricecolin | 2008-07-19 12:49:20 +0200 (Sat, 19 Jul 2008) | 4 lines Changed paths: M /trunk/IndexSearch/Xapian/AbstractGenerator.cpp M /trunk/IndexSearch/pinot-index.cpp AbstractGenerator ignores the current term's other positions when weighting abstract windows. pinot-index only lists index-capable backends. ------------------------------------------------------------------------ r1306 | fabricecolin | 2008-07-19 12:47:28 +0200 (Sat, 19 Jul 2008) | 2 lines Changed paths: M /trunk/IndexSearch/pinot-index.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/IndexSearch/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Bumped version number to 0.87, updated manual pages. ------------------------------------------------------------------------ r1305 | fabricecolin | 2008-07-19 11:59:53 +0200 (Sat, 19 Jul 2008) | 2 lines Changed paths: M /trunk/Makefile.am Don't move AmazonAPI out of the engines directory. ------------------------------------------------------------------------ r1304 | fabricecolin | 2008-07-19 11:58:39 +0200 (Sat, 19 Jul 2008) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp Follow META REFRESH if set. ------------------------------------------------------------------------ r1303 | fabricecolin | 2008-07-19 11:29:42 +0200 (Sat, 19 Jul 2008) | 2 lines Changed paths: M /trunk/IndexSearch/XesamGLib/XesamEngine.cpp Fixed UL/QL mixup. 
------------------------------------------------------------------------ r1302 | fabricecolin | 2008-07-19 08:44:20 +0200 (Sat, 19 Jul 2008) | 3 lines Changed paths: M /trunk/IndexSearch/XesamGLib/XesamEngine.cpp Extract hits data, turn live search off, set the maximum number of results and abort the search after a few seconds. ------------------------------------------------------------------------ r1301 | fabricecolin | 2008-07-19 08:24:35 +0200 (Sat, 19 Jul 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc The UI will let foreground threads run for up to a minute. ------------------------------------------------------------------------ r1300 | fabricecolin | 2008-07-18 17:21:09 +0200 (Fri, 18 Jul 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp Add search-only backends to the Current User channel and make sure it stays at the bottom of the engines tree. ------------------------------------------------------------------------ r1299 | fabricecolin | 2008-07-18 15:28:33 +0200 (Fri, 18 Jul 2008) | 6 lines Changed paths: M /trunk/IndexSearch/PluginWebEngine.cpp M /trunk/IndexSearch/Plugins/AmazonAPI.src M /trunk/IndexSearch/SherlockParser.cpp M /trunk/IndexSearch/pinot-search.cpp pinot-search can set a plugin editable parameter with -e/--seteditable. PluginWebEngine will output an error message if an editable parameter has no value. The Sherlock parser lower-cased input parameters for no specific reason. Moved the Amazon API plugin to the Shopping channel. 
------------------------------------------------------------------------ r1298 | fabricecolin | 2008-07-13 15:51:08 +0200 (Sun, 13 Jul 2008) | 4 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/prefsDialog_glade.hh Preferences let the user edit all editable parameters defined in the plugins. Values are saved to and loaded from the configuration file, and are passed to WebEngine-derived search engines. ------------------------------------------------------------------------ r1297 | fabricecolin | 2008-07-13 15:46:53 +0200 (Sun, 13 Jul 2008) | 8 lines Changed paths: M /trunk/IndexSearch/ModuleFactory.cpp M /trunk/IndexSearch/OpenSearchParser.cpp M /trunk/IndexSearch/OpenSearchParser.h M /trunk/IndexSearch/PluginParsers.h M /trunk/IndexSearch/PluginWebEngine.cpp M /trunk/IndexSearch/PluginWebEngine.h M /trunk/IndexSearch/Plugins/A9.src M /trunk/IndexSearch/Plugins/AmazonAPI.src A /trunk/IndexSearch/Plugins/YahooBOSS.src M /trunk/IndexSearch/SearchPluginProperties.cpp M /trunk/IndexSearch/SearchPluginProperties.h M /trunk/IndexSearch/SherlockParser.cpp M /trunk/IndexSearch/SherlockParser.h M /trunk/IndexSearch/WebEngine.cpp M /trunk/IndexSearch/WebEngine.h Plugins may define user-editable parameters by setting values to EDIT:description. This is designed to let the user specify his/her own ID or key for the Amazon API or the new Yahoo! BOSS API without having to edit plugins. For Sherlock plugins that don't specify a USER INPUT, append the query terms to the action URL. Always set input and output encodings, if defined, to UTF-8. 
------------------------------------------------------------------------ r1296 | fabricecolin | 2008-07-11 08:21:03 +0200 (Fri, 11 Jul 2008) | 2 lines Changed paths: M /trunk/README Clarify some inaccuracies. ------------------------------------------------------------------------ r1295 | fabricecolin | 2008-07-11 08:17:47 +0200 (Fri, 11 Jul 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h Use a TextView for extracts. This will allow dropping portions of the extract in the queries list. A bold tag is applied to words to highlight. ------------------------------------------------------------------------ r1294 | fabricecolin | 2008-07-11 06:47:39 +0200 (Fri, 11 Jul 2008) | 6 lines Changed paths: M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Support for drag-n-drop. Dropping text on the stored queries list will create a new query with this text, while dropping files will generate a query to find similar documents in the indexes. If necessary, these files are indexed in My Web Pages. Files can be dropped on the import dialog's location field too. ------------------------------------------------------------------------ r1293 | fabricecolin | 2008-07-11 06:41:53 +0200 (Fri, 11 Jul 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp Skip very short non-CJKV terms when expanding queries. ------------------------------------------------------------------------ r1292 | fabricecolin | 2008-07-06 15:07:46 +0200 (Sun, 06 Jul 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp The QueryModifier would sometimes append n-1 bytes of the last CJKV character. 
------------------------------------------------------------------------ r1291 | fabricecolin | 2008-07-06 15:03:23 +0200 (Sun, 06 Jul 2008) | 2 lines Changed paths: M /trunk/IndexSearch/ModuleFactory.cpp M /trunk/IndexSearch/ModuleFactory.h A /trunk/IndexSearch/XesamGLib/ModuleExports.cpp Backends that are search-only don't need to export index-related functions. ------------------------------------------------------------------------ r1290 | fabricecolin | 2008-06-28 14:34:01 +0200 (Sat, 28 Jun 2008) | 3 lines Changed paths: A /trunk/IndexSearch/XesamGLib A /trunk/IndexSearch/XesamGLib/Makefile.am A /trunk/IndexSearch/XesamGLib/XesamEngine.cpp A /trunk/IndexSearch/XesamGLib/XesamEngine.h M /trunk/Makefile.am M /trunk/configure.in M /trunk/pinot.spec.in Experimental backend based on xesam-glib to query Xesam servers. Pass "--enable-xesam-glib=yes" to configure and "--with xesam-glib" to rpmbuild. ------------------------------------------------------------------------ r1288 | fabricecolin | 2008-06-21 10:16:00 +0200 (Sat, 21 Jun 2008) | 2 lines Changed paths: M /trunk/ChangeLog Update logs. ------------------------------------------------------------------------ r1287 | fabricecolin | 2008-06-21 10:11:47 +0200 (Sat, 21 Jun 2008) | 2 lines Changed paths: D /trunk/IndexSearch/Plugins/Accoona.src M /trunk/NEWS Removing Accoona, it's now a B2B search engine. ------------------------------------------------------------------------ r1286 | fabricecolin | 2008-06-21 10:11:00 +0200 (Sat, 21 Jun 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Plugins/A9.src Fixed results extraction. ------------------------------------------------------------------------ r1283 | fabricecolin | 2008-06-21 06:14:51 +0200 (Sat, 21 Jun 2008) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/ChangeLog-dijon Current logs. 
------------------------------------------------------------------------ r1282 | fabricecolin | 2008-06-21 06:06:37 +0200 (Sat, 21 Jun 2008) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Current translations. ------------------------------------------------------------------------ r1281 | fabricecolin | 2008-06-21 06:04:55 +0200 (Sat, 21 Jun 2008) | 2 lines Changed paths: M /trunk/IndexSearch/pinot-index.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/IndexSearch/pinot-search.1 M /trunk/NEWS M /trunk/TODO M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Preparing for v0.86 release. ------------------------------------------------------------------------ r1280 | fabricecolin | 2008-06-19 17:05:50 +0200 (Thu, 19 Jun 2008) | 3 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp M /trunk/Collect/CurlDownloader.h M /trunk/Collect/DownloaderFactory.cpp M /trunk/Collect/DownloaderFactory.h M /trunk/Collect/DownloaderInterface.cpp M /trunk/Collect/DownloaderInterface.h M /trunk/Collect/FileCollector.cpp M /trunk/Collect/FileCollector.h M /trunk/Collect/MboxCollector.cpp M /trunk/Collect/MboxCollector.h M /trunk/Collect/NeonDownloader.cpp M /trunk/Collect/NeonDownloader.h M /trunk/IndexSearch/DBusIndex.cpp M /trunk/IndexSearch/DBusIndex.h M /trunk/IndexSearch/FilterWrapper.cpp M /trunk/IndexSearch/FilterWrapper.h M /trunk/IndexSearch/Google/GoogleAPIEngine.cpp M /trunk/IndexSearch/Google/GoogleAPIEngine.h M /trunk/IndexSearch/IndexInterface.h M /trunk/IndexSearch/ModuleFactory.cpp M /trunk/IndexSearch/ModuleFactory.h M /trunk/IndexSearch/OpenSearchParser.cpp M /trunk/IndexSearch/OpenSearchParser.h M /trunk/IndexSearch/PluginParsers.h M /trunk/IndexSearch/PluginWebEngine.cpp M /trunk/IndexSearch/PluginWebEngine.h M 
/trunk/IndexSearch/QueryProperties.cpp M /trunk/IndexSearch/QueryProperties.h M /trunk/IndexSearch/ResultsExporter.cpp M /trunk/IndexSearch/ResultsExporter.h M /trunk/IndexSearch/SOAPEnvH.h M /trunk/IndexSearch/SOAPEnvStub.h M /trunk/IndexSearch/SearchEngineInterface.cpp M /trunk/IndexSearch/SearchEngineInterface.h M /trunk/IndexSearch/SearchPluginProperties.cpp M /trunk/IndexSearch/SearchPluginProperties.h M /trunk/IndexSearch/SherlockParser.cpp M /trunk/IndexSearch/SherlockParser.h M /trunk/IndexSearch/WebEngine.cpp M /trunk/IndexSearch/WebEngine.h M /trunk/IndexSearch/Xapian/AbstractGenerator.cpp M /trunk/IndexSearch/Xapian/AbstractGenerator.h M /trunk/IndexSearch/Xapian/LanguageDetector.cpp M /trunk/IndexSearch/Xapian/LanguageDetector.h M /trunk/IndexSearch/Xapian/ModuleExports.cpp M /trunk/IndexSearch/Xapian/XapianDatabase.cpp M /trunk/IndexSearch/Xapian/XapianDatabase.h M /trunk/IndexSearch/Xapian/XapianDatabaseFactory.cpp M /trunk/IndexSearch/Xapian/XapianDatabaseFactory.h M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/Xapian/XapianEngine.h M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/Xapian/XapianIndex.h M /trunk/IndexSearch/XesamLog.h M /trunk/IndexSearch/pinot-index.cpp M /trunk/IndexSearch/pinot-label.cpp M /trunk/IndexSearch/pinot-search.cpp M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/INotifyMonitor.h M /trunk/Monitor/MonitorEvent.cpp M /trunk/Monitor/MonitorEvent.h M /trunk/Monitor/MonitorFactory.cpp M /trunk/Monitor/MonitorFactory.h M /trunk/Monitor/MonitorHandler.cpp M /trunk/Monitor/MonitorHandler.h M /trunk/Monitor/MonitorInterface.h M /trunk/SQL/ActionQueue.cpp M /trunk/SQL/ActionQueue.h M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/QueryHistory.h M /trunk/SQL/SQLDB.cpp M /trunk/SQL/SQLDB.h M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h M /trunk/SQL/ViewHistory.cpp M /trunk/SQL/ViewHistory.h M /trunk/SQL/historytest.cpp M 
/trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/FilterUtils.h M /trunk/Tokenize/TextConverter.cpp M /trunk/Tokenize/TextConverter.h M /trunk/Tokenize/Tokenizer.cpp M /trunk/Tokenize/Tokenizer.h M /trunk/Tokenize/tokenizertest.cpp M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/EnginesTree.h M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/IndexPage.h M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/Notebook.cpp M /trunk/UI/GTK2/src/Notebook.h M /trunk/UI/GTK2/src/OnDiskHandler.h M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/PinotUtils.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/indexDialog.cc M /trunk/UI/GTK2/src/indexDialog.hh M /trunk/UI/GTK2/src/launcherDialog.cc M /trunk/UI/GTK2/src/launcherDialog.hh M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/propertiesDialog.hh M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog.hh M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.hh M /trunk/Utils/CommandLine.cpp M /trunk/Utils/CommandLine.h M /trunk/Utils/Document.cpp M /trunk/Utils/Document.h M /trunk/Utils/DocumentInfo.cpp M /trunk/Utils/DocumentInfo.h M /trunk/Utils/Languages.cpp M /trunk/Utils/Languages.h M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h M /trunk/Utils/NLS.h M 
/trunk/Utils/StringManip.cpp M /trunk/Utils/StringManip.h M /trunk/Utils/TimeConverter.cpp M /trunk/Utils/TimeConverter.h M /trunk/Utils/Timer.cpp M /trunk/Utils/Timer.h M /trunk/Utils/Url.cpp M /trunk/Utils/Url.h Generated SOAP headers with gsoap 2.7.10, removed erroneous reference to the "Library General Public License" from all relevant files. ------------------------------------------------------------------------ r1279 | fabricecolin | 2008-06-18 14:29:58 +0200 (Wed, 18 Jun 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/Makefile.am Added HTTP_CFLAGS, following recent changes to TimeConverter. ------------------------------------------------------------------------ r1278 | fabricecolin | 2008-06-18 14:28:59 +0200 (Wed, 18 Jun 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc Hmm I would have sworn SIGTERM was caught here. ------------------------------------------------------------------------ r1277 | fabricecolin | 2008-06-15 16:28:43 +0200 (Sun, 15 Jun 2008) | 2 lines Changed paths: M /trunk/Makefile.am New target "manuals" to generate man pages with help2man. ------------------------------------------------------------------------ r1276 | fabricecolin | 2008-06-15 16:25:01 +0200 (Sun, 15 Jun 2008) | 2 lines Changed paths: M /trunk/pinot.desktop Japanese and Simplified Chinese were missing in there. ------------------------------------------------------------------------ r1275 | fabricecolin | 2008-06-15 16:24:18 +0200 (Sun, 15 Jun 2008) | 6 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp On a full scan, change the status of all entries in CrawlHistory so that we can unindex orphaned documents at the end of the scan, ie documents that belong to deleted sources. Documents that have been deleted since the last full scan are still unindexed at the end of source scans. Do full scans on Reload. 
------------------------------------------------------------------------ r1274 | fabricecolin | 2008-06-15 16:19:55 +0200 (Sun, 15 Jun 2008) | 4 lines Changed paths: M /trunk/Utils/TimeConverter.cpp M /trunk/Utils/TimeConverter.h The string generated by toTimestamp() has a time zone spec, not a name. Replaced fromTimestamp() with Neon's or Curl's date parsing function, depending on which is enabled, Curl being the preferred option. ------------------------------------------------------------------------ r1273 | fabricecolin | 2008-06-15 16:15:24 +0200 (Sun, 15 Jun 2008) | 2 lines Changed paths: M /trunk Edited externals to use http, since BerliOS decided to drop svn and svn+ssh. ------------------------------------------------------------------------ r1272 | fabricecolin | 2008-05-28 16:28:35 +0200 (Wed, 28 May 2008) | 3 lines Changed paths: M /trunk/IndexSearch/PluginWebEngine.cpp In DEBUG mode, prefix the name of the file output is saved to with the engine's hostname. ------------------------------------------------------------------------ r1271 | fabricecolin | 2008-05-28 16:26:25 +0200 (Wed, 28 May 2008) | 4 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp M /trunk/Collect/DownloaderInterface.cpp M /trunk/Collect/DownloaderInterface.h M /trunk/Collect/NeonDownloader.cpp Both Neon and CurlDownloader look for a Last-Modified header and use that as the document's timestamp. In DownloaderInterface, the timeout is in seconds. ------------------------------------------------------------------------ r1270 | fabricecolin | 2008-05-23 17:31:01 +0200 (Fri, 23 May 2008) | 2 lines Changed paths: M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/QueryHistory.h M /trunk/UI/GTK2/src/ResultsTree.cpp We are actually interested in the date results are found and stored. 
------------------------------------------------------------------------ r1269 | fabricecolin | 2008-05-23 17:27:37 +0200 (Fri, 23 May 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/Notebook.cpp M /trunk/UI/GTK2/src/Notebook.h Use a custom style for tab buttons, as used by gnome-terminal. ------------------------------------------------------------------------ r1268 | fabricecolin | 2008-05-22 14:47:17 +0200 (Thu, 22 May 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/IndexPage.h Disconnect the queries combobox's changed signal before populating it and reconnect when done. ------------------------------------------------------------------------ r1267 | fabricecolin | 2008-05-21 16:37:14 +0200 (Wed, 21 May 2008) | 2 lines Changed paths: M /trunk/IndexSearch/QueryProperties.cpp We might as well trim spaces off the user-supplied query too. ------------------------------------------------------------------------ r1266 | fabricecolin | 2008-05-21 16:36:25 +0200 (Wed, 21 May 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp Skip CJKV terms in getCloseTerms(). ------------------------------------------------------------------------ r1265 | fabricecolin | 2008-05-20 16:47:50 +0200 (Tue, 20 May 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Synced with recent changes to QueryHistory. Don't offer suggestions if the current live query term is a filter or a range. ------------------------------------------------------------------------ r1264 | fabricecolin | 2008-05-20 16:45:08 +0200 (Tue, 20 May 2008) | 3 lines Changed paths: M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/QueryHistory.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp We might as well drop the Language column since it was used as charset and all the fields stored in this table are converted to UTF-8 prior to insertion. 
------------------------------------------------------------------------ r1263 | fabricecolin | 2008-05-20 14:49:34 +0200 (Tue, 20 May 2008) | 3 lines Changed paths: M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/QueryHistory.h Date is now part of the primary key so that we can have several snapshots of query results. ------------------------------------------------------------------------ r1262 | fabricecolin | 2008-05-19 12:05:32 +0200 (Mon, 19 May 2008) | 3 lines Changed paths: M /trunk/IndexSearch/QueryProperties.cpp Trim spaces off query strings after removing filters to make sure we don't pass strings consisting exclusively of spaces to Web engines. ------------------------------------------------------------------------ r1261 | fabricecolin | 2008-05-19 09:37:41 +0200 (Mon, 19 May 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.cc Name *History objects in a uniform manner to facilitate grepping the source. ------------------------------------------------------------------------ r1260 | fabricecolin | 2008-05-19 05:45:29 +0200 (Mon, 19 May 2008) | 4 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc In ignore-version mode, we'd better reset the index labels list too, as xapian-compact <= 1.0.6 will have dropped it at the same time as the version number. ------------------------------------------------------------------------ r1259 | fabricecolin | 2008-05-19 05:40:27 +0200 (Mon, 19 May 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml The DocumentInfo methods also handle the "extract" field. 
------------------------------------------------------------------------ r1258 | fabricecolin | 2008-05-19 05:33:01 +0200 (Mon, 19 May 2008) | 3 lines Changed paths: M /trunk/scripts/python/pinot-module.py Support for snippets. I am not convinced this is useful considering the current Deskbar interface, so this is disabled for now. ------------------------------------------------------------------------ r1257 | fabricecolin | 2008-05-19 05:30:29 +0200 (Mon, 19 May 2008) | 3 lines Changed paths: M /trunk/SQL/SQLDB.h M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h Utility method alterTable() to work around SQLite's limited implementation of ALTER TABLE. ------------------------------------------------------------------------ r1256 | fabricecolin | 2008-05-16 17:22:46 +0200 (Fri, 16 May 2008) | 4 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc Refresh index lists with browse_index() when LabelUpdateThread returns, if necessary. When the query combobox is updated, the change of selection will lead to a call to browse_index(). ------------------------------------------------------------------------ r1255 | fabricecolin | 2008-05-16 17:08:07 +0200 (Fri, 16 May 2008) | 4 lines Changed paths: M /trunk/IndexSearch/DBusIndex.cpp Don't serialize empty fields to DBus. Added extract to the mix. Both getLabels() and getDocumentLabels() now always query the daemon and not the index. ------------------------------------------------------------------------ r1254 | fabricecolin | 2008-05-16 15:54:18 +0200 (Fri, 16 May 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/Notebook.cpp M /trunk/UI/GTK2/src/Notebook.h Prettier, but still too big, buttons in notebook tabs when _USE_BUTTON_TAB is defined. 
------------------------------------------------------------------------ r1252 | fabricecolin | 2008-05-11 11:27:37 +0200 (Sun, 11 May 2008) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/ChangeLog-dijon Current ChangeLogs. ------------------------------------------------------------------------ r1251 | fabricecolin | 2008-05-11 04:17:10 +0200 (Sun, 11 May 2008) | 2 lines Changed paths: M /trunk/NEWS List latest changes. ------------------------------------------------------------------------ r1250 | fabricecolin | 2008-05-11 04:16:14 +0200 (Sun, 11 May 2008) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Synced with latest changes. ------------------------------------------------------------------------ r1249 | fabricecolin | 2008-05-09 15:35:29 +0200 (Fri, 09 May 2008) | 2 lines Changed paths: M /trunk/Utils/xdgmime/ChangeLog M /trunk/Utils/xdgmime/xdgmime.c M /trunk/Utils/xdgmime/xdgmime.h M /trunk/Utils/xdgmime/xdgmimealias.h M /trunk/Utils/xdgmime/xdgmimecache.c M /trunk/Utils/xdgmime/xdgmimecache.h M /trunk/Utils/xdgmime/xdgmimemagic.c M /trunk/Utils/xdgmime/xdgmimemagic.h M /trunk/Utils/xdgmime/xdgmimeparent.h Synced with current gtk+'s xdgmime. ------------------------------------------------------------------------ r1248 | fabricecolin | 2008-05-09 15:33:05 +0200 (Fri, 09 May 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/AbstractGenerator.cpp Fix for previous checkin. ------------------------------------------------------------------------ r1247 | fabricecolin | 2008-05-09 15:32:20 +0200 (Fri, 09 May 2008) | 2 lines Changed paths: M /trunk/IndexSearch/FilterWrapper.cpp Only set a default title if the document is not to be passed to another filter. 
------------------------------------------------------------------------ r1246 | fabricecolin | 2008-05-08 18:36:04 +0200 (Thu, 08 May 2008) | 5 lines Changed paths: M /trunk/IndexSearch/WebEngine.cpp M /trunk/IndexSearch/Xapian/AbstractGenerator.cpp M /trunk/IndexSearch/Xapian/XapianEngine.cpp A fix for QueryModifier when the query is one CJKV character only. AbstractGenerator needs to skip long n-grams early on when populating the chosen window. Cosmetic changes here and there. ------------------------------------------------------------------------ r1245 | fabricecolin | 2008-05-08 14:20:55 +0200 (Thu, 08 May 2008) | 6 lines Changed paths: M /trunk/IndexSearch/QueryProperties.cpp M /trunk/IndexSearch/WebEngine.cpp M /trunk/IndexSearch/Xapian/AbstractGenerator.cpp AbstractGenerator and WebEngine's TermHighlighter skip multi-character CJKV terms to avoid repetition in extracts. QueryProperties' FilterRemover does what the name implies and simply removes filters and ranges from a query instead of trying to rebuild it one token at a time. ------------------------------------------------------------------------ r1244 | fabricecolin | 2008-05-06 16:57:51 +0200 (Tue, 06 May 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp If the global configuration file couldn't be opened, don't assume a first run. ------------------------------------------------------------------------ r1243 | fabricecolin | 2008-05-06 16:56:48 +0200 (Tue, 06 May 2008) | 3 lines Changed paths: M /trunk/IndexSearch/FilterWrapper.cpp M /trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/TextConverter.cpp M /trunk/Tokenize/TextConverter.h Overloaded TextConverter::toUTF8(). FilterWrapper uses the new method Filter::set_utf8_converter(). Some minor changes. 
------------------------------------------------------------------------ r1242 | fabricecolin | 2008-05-05 14:36:21 +0200 (Mon, 05 May 2008) | 2 lines Changed paths: M /trunk/IndexSearch/pinot-index.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/IndexSearch/pinot-search.1 M /trunk/NEWS M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/ja.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Bumped version number to 0.85. ------------------------------------------------------------------------ r1241 | fabricecolin | 2008-05-05 14:07:50 +0200 (Mon, 05 May 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml Removed JPEG from default blacklist. Cosmetic modifications to pinot-dbus-daemon.xml. ------------------------------------------------------------------------ r1240 | fabricecolin | 2008-05-04 16:09:28 +0200 (Sun, 04 May 2008) | 4 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h In MonitorThread::stop(), write to the control pipe after the stop flag is set by the call to the parent's method. Removed unused method. ------------------------------------------------------------------------ r1239 | fabricecolin | 2008-05-03 08:25:58 +0200 (Sat, 03 May 2008) | 3 lines Changed paths: M /trunk/README Mention the limitation of mixed queries doesn't apply to 0.85 and newer and give an example. ------------------------------------------------------------------------ r1238 | fabricecolin | 2008-05-03 08:17:10 +0200 (Sat, 03 May 2008) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/README M /trunk/pinot.spec.in New dependency on libexif. Added Mizuki-san to the AUTHORS file. 
------------------------------------------------------------------------ r1237 | fabricecolin | 2008-05-03 08:13:44 +0200 (Sat, 03 May 2008) | 4 lines Changed paths: M /trunk/IndexSearch/QueryProperties.cpp M /trunk/IndexSearch/WebEngine.cpp M /trunk/IndexSearch/Xapian/XapianEngine.cpp Rely on the CJKV TokensHandler interface to remove filters from queries and get sets of terms (QueryProperties), highlight terms (WebEngine) and tweak mixed CJKV queries (XapianEngine). ------------------------------------------------------------------------ r1236 | fabricecolin | 2008-05-03 08:10:08 +0200 (Sat, 03 May 2008) | 2 lines Changed paths: M /trunk/Makefile.am David Paleino suggests removing the dependency on the m4 directory. ------------------------------------------------------------------------ r1235 | fabricecolin | 2008-05-03 08:06:48 +0200 (Sat, 03 May 2008) | 4 lines Changed paths: M /trunk/IndexSearch/Makefile.am M /trunk/IndexSearch/Xapian/Makefile.am M /trunk/Tokenize/Makefile.am M /trunk/configure.in A /trunk/po/ja.po CJKVTokenizer is useful to code outside of the Xapian back-end and should be in libTokenize. The new Exif filter is also compiled. Adding Mizuki-san's Japanese translation. ------------------------------------------------------------------------ r1234 | fabricecolin | 2008-04-29 14:57:49 +0200 (Tue, 29 Apr 2008) | 7 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h We don't need to keep the Glib::Thread pointer around since all threads detach. Therefore entries can be inserted in the threads list before the new thread is started, and we get rid of a possible race condition between list insertion and thread exit. In ThreadsManager::get_thread(), the list should be locked in write mode since entries are removed. 
------------------------------------------------------------------------ r1233 | fabricecolin | 2008-04-23 16:42:02 +0200 (Wed, 23 Apr 2008) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/Xapian/XapianIndex.h If a language is already specified, don't use it blindly but see if it has a stemmer. If it doesn't, then scan the document. ------------------------------------------------------------------------ r1232 | fabricecolin | 2008-04-23 16:40:21 +0200 (Wed, 23 Apr 2008) | 5 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/OnDiskHandler.cpp In DaemonState, if new directories are queued for crawling, if there's no DirectoryScanner thread running, they won't be picked up. Call start_crawling() after an addition to the queue, this will start a new crawler if necessary. OnDiskHandler wasn't able to get the source label of files. ------------------------------------------------------------------------ r1231 | fabricecolin | 2008-04-23 16:04:22 +0200 (Wed, 23 Apr 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc If the PID file is still there, open /proc/daemon_pid/cmdline and see whether the given process is pinot-dbus-daemon. ------------------------------------------------------------------------ r1230 | fabricecolin | 2008-04-12 10:36:39 +0200 (Sat, 12 Apr 2008) | 3 lines Changed paths: M /trunk/SQL/ActionQueue.cpp M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/Makefile.am M /trunk/SQL/QueryHistory.cpp A /trunk/SQL/SQLDB.cpp A /trunk/SQL/SQLDB.h M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h M /trunk/SQL/ViewHistory.cpp Moved generic code to SQLDB. This should allow implementing support for other databases, if there's ever a need to. 
------------------------------------------------------------------------ r1229 | fabricecolin | 2008-04-11 17:40:10 +0200 (Fri, 11 Apr 2008) | 2 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/QueryHistory.h M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h M /trunk/SQL/ViewHistory.cpp M /trunk/SQL/ViewHistory.h Mostly namespace related cleanups. ------------------------------------------------------------------------ r1228 | fabricecolin | 2008-04-03 15:07:15 +0200 (Thu, 03 Apr 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Open the PID file, exit if that process is still running. ------------------------------------------------------------------------ r1226 | fabricecolin | 2008-03-27 09:32:05 +0100 (Thu, 27 Mar 2008) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/ChangeLog-dijon Current ChangeLogs. ------------------------------------------------------------------------ r1225 | fabricecolin | 2008-03-27 09:28:56 +0100 (Thu, 27 Mar 2008) | 2 lines Changed paths: M /trunk/NEWS Forgot to mention the new plugin for UNdata. ------------------------------------------------------------------------ r1224 | fabricecolin | 2008-03-27 07:25:09 +0100 (Thu, 27 Mar 2008) | 2 lines Changed paths: M /trunk/NEWS List of changes since 0.83. ------------------------------------------------------------------------ r1223 | fabricecolin | 2008-03-27 03:09:44 +0100 (Thu, 27 Mar 2008) | 2 lines Changed paths: M /trunk/IndexSearch/pinot-index.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/IndexSearch/pinot-search.1 M /trunk/TODO M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Preparing for 0.84 release. ------------------------------------------------------------------------ r1222 | fabricecolin | 2008-03-27 03:08:00 +0100 (Thu, 27 Mar 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/Makefile.am Don't use -nostartfiles. 
------------------------------------------------------------------------ r1221 | fabricecolin | 2008-03-27 03:00:36 +0100 (Thu, 27 Mar 2008) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_CN.po M /trunk/po/zh_TW.po Translations updates. ------------------------------------------------------------------------ r1220 | fabricecolin | 2008-03-26 07:24:03 +0100 (Wed, 26 Mar 2008) | 2 lines Changed paths: M /trunk/FAQ New FAQ entry about how to compact the index. ------------------------------------------------------------------------ r1219 | fabricecolin | 2008-03-26 07:11:14 +0100 (Wed, 26 Mar 2008) | 6 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Ignore the index version number, i.e. don't force an upgrade if it's lower than what was expected, if -i/--ignore-version is passed as parameter. This is useful when ~/.pinot/daemon was compacted with xapian-compact 1.0.6, which unlike previous releases, doesn't bail out on index metadata, but fails to carry metadata over to the compacted database. ------------------------------------------------------------------------ r1218 | fabricecolin | 2008-03-26 04:05:37 +0100 (Wed, 26 Mar 2008) | 2 lines Changed paths: M /trunk/IndexSearch/QueryProperties.cpp M /trunk/IndexSearch/QueryProperties.h M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog.hh M /trunk/UI/GTK2/src/queryDialog_glade.cc M /trunk/UI/GTK2/src/queryDialog_glade.hh Queries can now index only new results. ------------------------------------------------------------------------ r1217 | fabricecolin | 2008-03-24 09:08:36 +0100 (Mon, 24 Mar 2008) | 2 lines Changed paths: A /trunk/IndexSearch/Plugins/UNData.src A plugin for UNdata. 
------------------------------------------------------------------------ r1216 | fabricecolin | 2008-03-24 09:07:50 +0100 (Mon, 24 Mar 2008) | 2 lines Changed paths: M /trunk/IndexSearch/PluginWebEngine.cpp Ensure scroll-by parameters are not empty before appending them to the URL. ------------------------------------------------------------------------ r1215 | fabricecolin | 2008-03-24 08:19:54 +0100 (Mon, 24 Mar 2008) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/configure.in A /trunk/po/zh_CN.po Simplified Chinese po by Ashlee Ma. ------------------------------------------------------------------------ r1214 | fabricecolin | 2008-03-24 07:42:58 +0100 (Mon, 24 Mar 2008) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/Xapian/XapianIndex.h Don't scan the document to index for its language if it's already known. As things stand, it never is when indexDocument() is invoked. ------------------------------------------------------------------------ r1213 | fabricecolin | 2008-03-24 07:26:15 +0100 (Mon, 24 Mar 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/prefsDialog_glade.hh In Preferences, patterns can be reset to default values. ------------------------------------------------------------------------ r1212 | fabricecolin | 2008-03-24 06:13:17 +0100 (Mon, 24 Mar 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Initialize the dbus-status metadata. 
------------------------------------------------------------------------ r1211 | fabricecolin | 2008-03-21 17:18:50 +0100 (Fri, 21 Mar 2008) | 7 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/UI/GTK2/src/statisticsDialog.cc Pass parameters to the file_found signal by value, not reference. DaemonState and mainWindow don't delete the monitor and handler objects when destroyed because the Monitor thread might still be running. Added flags for when the daemon receives Stop or Disconnected. This is saved as database metadata, and shown by the Status window. Some other minor mods. ------------------------------------------------------------------------ r1210 | fabricecolin | 2008-03-20 12:06:31 +0100 (Thu, 20 Mar 2008) | 3 lines Changed paths: M /trunk/IndexSearch/DBusIndex.cpp M /trunk/IndexSearch/DBusIndex.h M /trunk/IndexSearch/IndexInterface.h M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/Xapian/XapianIndex.h M /trunk/IndexSearch/pinot-index.cpp Replaced get|setVersion() with get|setMetadata(). In showinfo mode, pinot-index shows the index version string. ------------------------------------------------------------------------ r1209 | fabricecolin | 2008-03-17 02:58:43 +0100 (Mon, 17 Mar 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianDatabase.cpp M /trunk/IndexSearch/Xapian/XapianEngine.cpp Missing header, minor mod. 
------------------------------------------------------------------------ r1208 | fabricecolin | 2008-03-15 14:16:04 +0100 (Sat, 15 Mar 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Const'ified parameter to view_documents(). ------------------------------------------------------------------------ r1207 | fabricecolin | 2008-03-15 14:15:02 +0100 (Sat, 15 Mar 2008) | 2 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp Cosmetic changes. ------------------------------------------------------------------------ r1206 | fabricecolin | 2008-03-15 14:14:22 +0100 (Sat, 15 Mar 2008) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/Monitor/linux-inotify-syscalls.h A patch by Michael Biebl for m68k, mips, mipsel and hppa. ------------------------------------------------------------------------ r1205 | fabricecolin | 2008-03-15 14:12:46 +0100 (Sat, 15 Mar 2008) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/Xapian/XapianIndex.h Use the new CJKVTokenizer::TokensHandler class to tokenize CJKV. The spelling database is populated. ------------------------------------------------------------------------ r1204 | fabricecolin | 2008-03-15 14:09:09 +0100 (Sat, 15 Mar 2008) | 2 lines Changed paths: M /trunk/IndexSearch/ResultsExporter.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp Prefer snprintf() to sprintf(). ------------------------------------------------------------------------ r1203 | fabricecolin | 2008-03-15 14:06:27 +0100 (Sat, 15 Mar 2008) | 2 lines Changed paths: M /trunk/IndexSearch/pinot-index.cpp Close stuff at exit. 
------------------------------------------------------------------------ r1202 | fabricecolin | 2008-03-08 10:18:12 +0100 (Sat, 08 Mar 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h Be careful when applying source labels. IndexingThread redefined m_docInfo needlessly. ------------------------------------------------------------------------ r1201 | fabricecolin | 2008-03-04 12:55:08 +0100 (Tue, 04 Mar 2008) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp Prior to asking for a write lock, always request for a read/write database. In practice, no harm is done since the database is open read/write at startup. ------------------------------------------------------------------------ r1200 | fabricecolin | 2008-03-03 12:45:01 +0100 (Mon, 03 Mar 2008) | 2 lines Changed paths: M /trunk/README Examples of what blacklisting does. ------------------------------------------------------------------------ r1198 | fabricecolin | 2008-02-28 12:02:07 +0100 (Thu, 28 Feb 2008) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/ChangeLog-dijon Updated logs. ------------------------------------------------------------------------ r1197 | fabricecolin | 2008-02-28 11:58:50 +0100 (Thu, 28 Feb 2008) | 2 lines Changed paths: M /trunk/NEWS M /trunk/TODO M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_TW.po NEWS describes changes made since 0.82. Updated TODO and synced po's. 
------------------------------------------------------------------------ r1196 | fabricecolin | 2008-02-27 16:12:26 +0100 (Wed, 27 Feb 2008) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp M /trunk/IndexSearch/DBusIndex.cpp M /trunk/IndexSearch/PluginWebEngine.cpp M /trunk/IndexSearch/SherlockParser.cpp M /trunk/IndexSearch/Xapian/LanguageDetector.cpp M /trunk/IndexSearch/Xapian/XapianDatabase.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/pinot-label.cpp M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/ViewHistory.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/Utils/MIMEScanner.cpp A patch by Adel Gadllah to compile with gcc 4.3. ------------------------------------------------------------------------ r1195 | fabricecolin | 2008-02-27 16:09:26 +0100 (Wed, 27 Feb 2008) | 4 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h The scanner thread should run in the background because we don't want to stop it after 5 minutes ! Threads that override stop() better call the base class method. ------------------------------------------------------------------------ r1194 | fabricecolin | 2008-02-26 15:17:42 +0100 (Tue, 26 Feb 2008) | 2 lines Changed paths: M /trunk/IndexSearch/pinot-index.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/IndexSearch/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Increased version number. ------------------------------------------------------------------------ r1193 | fabricecolin | 2008-02-26 13:17:17 +0100 (Tue, 26 Feb 2008) | 4 lines Changed paths: M /trunk/AUTHORS M /trunk/README Mention contributions of Yung-chung Lin (Dijon's CJKV tokenizer) and David Paleino (.desktop files). 
README describes to what extent CJKV is supported. ------------------------------------------------------------------------ r1192 | fabricecolin | 2008-02-25 16:54:15 +0100 (Mon, 25 Feb 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/Xapian/XapianIndex.h Mostly cosmetic changes. ------------------------------------------------------------------------ r1191 | fabricecolin | 2008-02-25 14:50:45 +0100 (Mon, 25 Feb 2008) | 3 lines Changed paths: M /trunk M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_TW.po Added IndexSearch/cjkv to externals list. Updated translations, including latest es.po pulled from Rosetta. ------------------------------------------------------------------------ r1190 | fabricecolin | 2008-02-25 14:48:25 +0100 (Mon, 25 Feb 2008) | 4 lines Changed paths: M /trunk/IndexSearch/Google/Makefile.am M /trunk/IndexSearch/Makefile.am M /trunk/IndexSearch/Xapian/Makefile.am Build IndexSearch/cjkv/CJKVTokenizer, link libxapianbackend against MISC_LIBS. Get soapcpp2 to look for includes in /usr/share/gsoap/import. Link pinot-label with -rdynamic. ------------------------------------------------------------------------ r1189 | fabricecolin | 2008-02-25 12:30:55 +0100 (Mon, 25 Feb 2008) | 3 lines Changed paths: M /trunk/pinot-dbus-daemon.desktop M /trunk/pinot.desktop David Paleino points out that the Encoding key is deprecated. See http://standards.freedesktop.org/desktop-entry-spec/latest/apc.html ------------------------------------------------------------------------ r1188 | fabricecolin | 2008-02-24 12:15:56 +0100 (Sun, 24 Feb 2008) | 7 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/Xapian/XapianIndex.h First shot at CJKV support. 
Documents that have CJKV are processed with Dijon's CJKVTokenizer and indexed in a manner compatible with TermGenerator's. Queries that are exclusively CJKV are pre-processed by CJKVTokenizer, then fed to the QueryParser. Changed XapianIndex::removePostingsFromDocument() to reuse the term generation code of addPostingsToDocument(). ------------------------------------------------------------------------ r1187 | fabricecolin | 2008-02-22 15:22:26 +0100 (Fri, 22 Feb 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Makefile.am A /trunk/IndexSearch/XesamLog.h Provide XesamLog.h, required by the latest code in xesam/. ------------------------------------------------------------------------ r1186 | fabricecolin | 2008-02-22 14:43:26 +0100 (Fri, 22 Feb 2008) | 7 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/mainWindow.cc Avoid rebuilding DocumentInfo objects, at least use the copy constructor to avoid losing properties, for instance after importing a document into My Web Pages. The import dialog's OK button should always be disabled if no URL is provided. In ResultsTree, don't bother getting the extract in flat mode, ie when the tree is embedded in an IndexPage. ------------------------------------------------------------------------ r1185 | fabricecolin | 2008-02-21 15:37:36 +0100 (Thu, 21 Feb 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Option --fullscan doesn't need a parameter ! Added flv to default blacklist, removed far too frequent DEBUG message. ------------------------------------------------------------------------ r1184 | fabricecolin | 2008-02-21 15:36:02 +0100 (Thu, 21 Feb 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Plugins/Google.src Fixed parsing.
------------------------------------------------------------------------ r1183 | fabricecolin | 2008-02-21 15:35:29 +0100 (Thu, 21 Feb 2008) | 4 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.hh Modified CrawlHistory to provide errors' dates. statisticsDialog could previously miss errors starting with the second source because it didn't keep track of the latest error date for each source. ------------------------------------------------------------------------ r1182 | fabricecolin | 2008-02-21 15:32:49 +0100 (Thu, 21 Feb 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h findResultsExtract() tries all engines the result belongs to until a proper extract is found. ------------------------------------------------------------------------ r1181 | fabricecolin | 2008-02-20 15:40:56 +0100 (Wed, 20 Feb 2008) | 9 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/pinot-search.1 M /trunk/IndexSearch/pinot-search.cpp M /trunk/README In XapianEngine, if the query matched a Z-prefixed term (ie a stem, because stemming is enabled), find terms in each document that are potential unstems and use those as seed terms to generate the abstract. This may throw false positives as the matched stem isn't compared against those terms' stemmed form. The pinot-search program now has a "--stemming/-s" parameter to specify a stemming language, in English not in the locale ! The README emphasizes that stemming is applied only if there is no exact match, and that the directory filter is recursive. ------------------------------------------------------------------------ r1180 | fabricecolin | 2008-02-18 12:38:52 +0100 (Mon, 18 Feb 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Set the locale according to the environment variables.
Don't be silly and don't attempt to catch SIGKILL :-) ------------------------------------------------------------------------ r1179 | fabricecolin | 2008-02-18 12:35:17 +0100 (Mon, 18 Feb 2008) | 4 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/Makefile.am A /trunk/Tokenize/TextConverter.cpp A /trunk/Tokenize/TextConverter.h In FilterUtils::populateDocument(), look out for a charset in the filter's output, and convert both title and content to UTF-8 using the new TextConverter class. ------------------------------------------------------------------------ r1178 | fabricecolin | 2008-02-18 12:17:58 +0100 (Mon, 18 Feb 2008) | 5 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/mainWindow.cc Make tabs reorderable. There's also no good reason why the notebook shouldn't be scrollable... ResultsTree expected a "Xapian" engine -not available since the move to separate back-ends- and this broke the display of results extracts. ------------------------------------------------------------------------ r1177 | fabricecolin | 2008-02-17 06:53:55 +0100 (Sun, 17 Feb 2008) | 3 lines Changed paths: M /trunk/Collect/Makefile.am M /trunk/IndexSearch/Google/Makefile.am M /trunk/IndexSearch/Makefile.am M /trunk/IndexSearch/Xapian/Makefile.am M /trunk/Makefile.am M /trunk/Monitor/Makefile.am M /trunk/SQL/Makefile.am M /trunk/UI/GTK2/src/Makefile.am Build everything with fPIC ! Not doing that may cause nasty surprises. In the top-level Makefile.am, only move libxapianbackend* not lib* ! ------------------------------------------------------------------------ r1175 | fabricecolin | 2008-01-26 09:58:03 +0100 (Sat, 26 Jan 2008) | 3 lines Changed paths: M /trunk M /trunk/ChangeLog M /trunk/ChangeLog-dijon ChangeLogs are nice to have, they are up to date. Set externals to point to Dijon's trunk. 
------------------------------------------------------------------------ r1174 | fabricecolin | 2008-01-26 07:54:13 +0100 (Sat, 26 Jan 2008) | 2 lines Changed paths: M /trunk/IndexSearch/pinot-index.1 M /trunk/IndexSearch/pinot-label.1 M /trunk/IndexSearch/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_TW.po Updated manual pages and translations. ------------------------------------------------------------------------ r1173 | fabricecolin | 2008-01-26 07:51:51 +0100 (Sat, 26 Jan 2008) | 2 lines Changed paths: M /trunk/IndexSearch/Makefile.am M /trunk/configure.in Fixed building with support for the Google SOAP API. ------------------------------------------------------------------------ r1172 | fabricecolin | 2008-01-26 04:30:03 +0100 (Sat, 26 Jan 2008) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/NEWS M /trunk/README M /trunk/TODO M /trunk/configure.in Preparing for 0.82 release. ------------------------------------------------------------------------ r1171 | fabricecolin | 2008-01-26 04:06:10 +0100 (Sat, 26 Jan 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Blacklist ".cap". ------------------------------------------------------------------------ r1170 | fabricecolin | 2008-01-26 03:54:32 +0100 (Sat, 26 Jan 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Disable spelling correction on in-results queries. 
------------------------------------------------------------------------ r1169 | fabricecolin | 2008-01-26 03:49:13 +0100 (Sat, 26 Jan 2008) | 2 lines Changed paths: M /trunk/IndexSearch/ModuleFactory.cpp M /trunk/IndexSearch/Xapian/Makefile.am M /trunk/IndexSearch/pinot-index.cpp M /trunk/IndexSearch/pinot-search.cpp M /trunk/Makefile.am M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/pinot.spec.in Prefer "backend(s)" to "module(s)". ------------------------------------------------------------------------ r1168 | fabricecolin | 2008-01-20 13:01:45 +0100 (Sun, 20 Jan 2008) | 5 lines Changed paths: M /trunk/IndexSearch/DBusIndex.cpp M /trunk/IndexSearch/pinot-index.cpp M /trunk/IndexSearch/pinot-search.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml HasDocument is now exported over D-Bus. As a result, pinot-label doesn't have to open the index. Added a -b/--backend option to pinot-index (defaults to "xapian"). Minor changes to pinot-search. ------------------------------------------------------------------------ r1167 | fabricecolin | 2008-01-19 15:48:48 +0100 (Sat, 19 Jan 2008) | 6 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/UI/GTK2/src/statisticsDialog.cc Tried to remove dependency on engine/index type xapian. Now use whatever backend is specified in the configuration file (defaults to xapian). This should make it possible to use something else. Make sure the stemming language is saved in English. When merging queries, use the boolean operator "or" and not "|". 
------------------------------------------------------------------------ r1166 | fabricecolin | 2008-01-19 10:27:47 +0100 (Sat, 19 Jan 2008) | 2 lines Changed paths: D /trunk/IndexSearch/IndexFactory.cpp D /trunk/IndexSearch/IndexFactory.h M /trunk/IndexSearch/Makefile.am M /trunk/IndexSearch/ModuleFactory.cpp M /trunk/IndexSearch/ModuleFactory.h D /trunk/IndexSearch/SearchEngineFactory.cpp D /trunk/IndexSearch/SearchEngineFactory.h M /trunk/IndexSearch/Xapian/ModuleExports.cpp Removed obsolete factory classes. Minor mods. ------------------------------------------------------------------------ r1165 | fabricecolin | 2008-01-19 10:24:39 +0100 (Sat, 19 Jan 2008) | 2 lines Changed paths: M /trunk/IndexSearch/QueryProperties.cpp M /trunk/IndexSearch/QueryProperties.h Removed getFilter() method. ------------------------------------------------------------------------ r1164 | fabricecolin | 2008-01-12 14:39:06 +0100 (Sat, 12 Jan 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/queryDialog.cc New MIME class filter. ------------------------------------------------------------------------ r1163 | fabricecolin | 2008-01-12 14:33:34 +0100 (Sat, 12 Jan 2008) | 3 lines Changed paths: M /trunk/IndexSearch/Makefile.am M /trunk/IndexSearch/Xapian/Makefile.am Minimize the amount of stuff pinot-label links to. Compile Xapian/ModuleExports.cpp not ModuleExports.cc. ------------------------------------------------------------------------ r1162 | fabricecolin | 2008-01-11 16:11:44 +0100 (Fri, 11 Jan 2008) | 2 lines Changed paths: M /trunk/po/sv.po Update from Zirro, pulled out of Rosetta. ------------------------------------------------------------------------ r1161 | fabricecolin | 2008-01-11 15:54:47 +0100 (Fri, 11 Jan 2008) | 4 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Better keep using 'language' as the language filter (ie "lang:" and not "language:" as previous versions incorrectly assumed) and save the new stemming language property to 'stemlanguage'. 
------------------------------------------------------------------------ r1160 | fabricecolin | 2008-01-11 15:52:20 +0100 (Fri, 11 Jan 2008) | 3 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h When looking for a finished thread after being signaled, stop those that have been running for more than 5 minutes. ------------------------------------------------------------------------ r1159 | fabricecolin | 2008-01-11 15:48:24 +0100 (Fri, 11 Jan 2008) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/Xapian/XapianIndex.h New XCLASS:-prefixed term for MIME type classes, eg audio, application... When removing a term, remove the spelling too if spelling is enabled. ------------------------------------------------------------------------ r1158 | fabricecolin | 2008-01-06 07:42:36 +0100 (Sun, 06 Jan 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Updated about box message. ------------------------------------------------------------------------ r1157 | fabricecolin | 2008-01-05 06:30:11 +0100 (Sat, 05 Jan 2008) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/queryDialog.hh Changes to queryDialog. ------------------------------------------------------------------------ r1156 | fabricecolin | 2008-01-05 05:52:37 +0100 (Sat, 05 Jan 2008) | 2 lines Changed paths: A /trunk/IndexSearch/Xapian/ModuleExports.cpp This should have been checked in some time ago. ------------------------------------------------------------------------ r1155 | fabricecolin | 2008-01-01 05:28:20 +0100 (Tue, 01 Jan 2008) | 3 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_TW.po Synced po files with the latest from Rosetta. Updates from JW (nl) and Jesus Tramullas (es). 
------------------------------------------------------------------------ r1154 | fabricecolin | 2008-01-01 02:21:51 +0100 (Tue, 01 Jan 2008) | 4 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog_glade.cc M /trunk/UI/GTK2/src/queryDialog_glade.hh Load and save stored queries' stemming language attribute, make it configurable in queryDialog. EngineQueryThread uses it as the results' language if no lang filter is set. ------------------------------------------------------------------------ r1153 | fabricecolin | 2008-01-01 02:16:53 +0100 (Tue, 01 Jan 2008) | 4 lines Changed paths: M /trunk/IndexSearch/QueryProperties.cpp M /trunk/IndexSearch/QueryProperties.h The language to stem queries with is separate from the lang filter, because - language detection is not 100% accurate - a document in language X may have words in language Y ------------------------------------------------------------------------ r1152 | fabricecolin | 2007-12-30 23:16:37 +0100 (Sun, 30 Dec 2007) | 7 lines Changed paths: M /trunk/IndexSearch/Xapian/AbstractGenerator.cpp M /trunk/IndexSearch/Xapian/XapianEngine.cpp Fixed query stemming. We should have used STEM_SOME at least since moving to Xapian 1.0. Ignore spelling suggestions if the query returned results. Ignore prefixed terms when seeding the abstracts generator. The latter doesn't mind if there are no seed terms. Ignore stems in query expansion. ------------------------------------------------------------------------ r1151 | fabricecolin | 2007-12-22 18:23:20 +0100 (Sat, 22 Dec 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.hh Save and load queries' modified flag. The status window now shows which engines are available. Some other minor mod. 
------------------------------------------------------------------------ r1150 | fabricecolin | 2007-12-22 18:21:14 +0100 (Sat, 22 Dec 2007) | 2 lines Changed paths: M /trunk/IndexSearch/ModuleFactory.cpp Disable sherlock if HAVE_BOOST_SPIRIT is not defined. ------------------------------------------------------------------------ r1149 | fabricecolin | 2007-12-19 13:49:27 +0100 (Wed, 19 Dec 2007) | 4 lines Changed paths: M /trunk/AUTHORS M /trunk/IndexSearch/SherlockParser.cpp The user input item, if not all lower case, wasn't removed from the input items list and would thus appear twice in the URL. This issue was reported by Claudio Bustos Navarrete. ------------------------------------------------------------------------ r1148 | fabricecolin | 2007-12-19 13:46:37 +0100 (Wed, 19 Dec 2007) | 2 lines Changed paths: M /trunk/README M /trunk/TODO M /trunk/pinot.spec.in Gtkmm 2.10 is needed. Updated TODO list. Spec file installs new Xapian module. ------------------------------------------------------------------------ r1147 | fabricecolin | 2007-12-19 13:41:48 +0100 (Wed, 19 Dec 2007) | 2 lines Changed paths: D /trunk/IndexSearch/Plugins/WiseNut.src Removing WiseNut plugin. ------------------------------------------------------------------------ r1146 | fabricecolin | 2007-12-19 13:39:39 +0100 (Wed, 19 Dec 2007) | 6 lines Changed paths: M /trunk/IndexSearch/pinot-index.cpp M /trunk/IndexSearch/pinot-label.cpp M /trunk/IndexSearch/pinot-search.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/UI/GTK2/src/prefsDialog.cc Rely on ModuleFactory instead of other factories, load modules. pinot-label uses a pure DBusIndex object, pinot-search sets proxy options only if the engine is a WebEngine. Added .a .la .o .so and backup files to blacklist in PinotSettings.
Don't apply spelling correction to More Like and previously corrected queries. ------------------------------------------------------------------------ r1145 | fabricecolin | 2007-12-19 13:31:34 +0100 (Wed, 19 Dec 2007) | 6 lines Changed paths: M /trunk/Collect/Makefile.am M /trunk/IndexSearch/Google/Makefile.am M /trunk/IndexSearch/Makefile.am M /trunk/IndexSearch/Xapian/Makefile.am M /trunk/Makefile.am M /trunk/Tokenize/Makefile.am M /trunk/UI/GTK2/src/Makefile.am M /trunk/Utils/Makefile.am M /trunk/configure.in Build IndexSearch/Xapian as a dynamic library, link programs with -rdynamic. The libUtils library has only got classes with static data, the rest goes into libBasicUtils. Only link to necessary libraries, eg only the mbox filter needs to link against GMime. ------------------------------------------------------------------------ r1144 | fabricecolin | 2007-12-19 13:26:57 +0100 (Wed, 19 Dec 2007) | 3 lines Changed paths: A /trunk/IndexSearch/ModuleFactory.cpp A /trunk/IndexSearch/ModuleFactory.h A class to eventually replace Index and SearchEngine factories that can be extended through dynamically-loaded modules. ------------------------------------------------------------------------ r1143 | fabricecolin | 2007-12-19 13:21:09 +0100 (Wed, 19 Dec 2007) | 3 lines Changed paths: M /trunk/IndexSearch/QueryProperties.cpp M /trunk/IndexSearch/QueryProperties.h M /trunk/IndexSearch/SearchEngineInterface.cpp M /trunk/IndexSearch/SearchEngineInterface.h M /trunk/IndexSearch/WebEngine.h Method getDownloader() is now specific to WebEngine. QueryProperties has a modified flag to record automatic alterations. ------------------------------------------------------------------------ r1142 | fabricecolin | 2007-12-19 13:19:00 +0100 (Wed, 19 Dec 2007) | 2 lines Changed paths: M /trunk/IndexSearch/DBusIndex.cpp M /trunk/IndexSearch/DBusIndex.h DBus-enabled overloads are used automatically if there's no nested Index.
------------------------------------------------------------------------ r1141 | fabricecolin | 2007-12-19 13:15:54 +0100 (Wed, 19 Dec 2007) | 2 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/INotifyMonitor.h Remember whether inotify_add_watch() failed with ENOSPC. ------------------------------------------------------------------------ r1140 | fabricecolin | 2007-12-11 16:34:47 +0100 (Tue, 11 Dec 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/queryDialog.cc Removed unused include. ------------------------------------------------------------------------ r1139 | fabricecolin | 2007-12-03 16:48:26 +0100 (Mon, 03 Dec 2007) | 3 lines Changed paths: M /trunk/IndexSearch/Xapian/XapianEngine.h M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/propertiesDialog.cc Removed superfluous includes, replaced instances of XapianIndex with an object obtained from PinotSettings::getIndex(). ------------------------------------------------------------------------ r1138 | fabricecolin | 2007-12-01 15:52:21 +0100 (Sat, 01 Dec 2007) | 2 lines Changed paths: M /trunk/pinot.spec.in Now requires gtkmm24 >= 2.10. ------------------------------------------------------------------------ r1137 | fabricecolin | 2007-12-01 11:05:14 +0100 (Sat, 01 Dec 2007) | 2 lines Changed paths: D /trunk/Index D /trunk/Search Obsolete. ------------------------------------------------------------------------ r1136 | fabricecolin | 2007-12-01 10:50:05 +0100 (Sat, 01 Dec 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/Makefile.am M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc Get headers and libraries from IndexSearch. Obtain indexes from PinotSettings. Use new DBusIndex class. 
------------------------------------------------------------------------ r1135 | fabricecolin | 2007-12-01 10:39:35 +0100 (Sat, 01 Dec 2007) | 3 lines Changed paths: M /trunk/IndexSearch/Google/Makefile.am A /trunk/IndexSearch/Makefile.am A /trunk/IndexSearch/Xapian/Makefile.am M /trunk/Makefile.am M /trunk/configure.in Build and distribute contents of IndexSearch. Configure now checks for gtkmm >= 2.10. ------------------------------------------------------------------------ r1134 | fabricecolin | 2007-12-01 10:36:25 +0100 (Sat, 01 Dec 2007) | 3 lines Changed paths: A /trunk/IndexSearch/DBusIndex.cpp A /trunk/IndexSearch/DBusIndex.h D /trunk/IndexSearch/DBusXapianIndex.cpp D /trunk/IndexSearch/DBusXapianIndex.h M /trunk/IndexSearch/IndexFactory.cpp M /trunk/IndexSearch/IndexInterface.h M /trunk/IndexSearch/Xapian/XapianEngine.cpp M /trunk/IndexSearch/Xapian/XapianIndex.cpp M /trunk/IndexSearch/Xapian/XapianIndex.h M /trunk/IndexSearch/pinot-label.cpp Added IndexInterface::reopen(). Replaced DBusXapianIndex with DBusIndex. Used IndexFactory whenever possible. 
------------------------------------------------------------------------ r1133 | fabricecolin | 2007-12-01 10:25:54 +0100 (Sat, 01 Dec 2007) | 3 lines Changed paths: D /trunk/Index/DBusXapianIndex.cpp D /trunk/Index/DBusXapianIndex.h D /trunk/Index/FilterWrapper.cpp D /trunk/Index/FilterWrapper.h D /trunk/Index/IndexFactory.cpp D /trunk/Index/IndexFactory.h D /trunk/Index/IndexInterface.h D /trunk/Index/LanguageDetector.cpp D /trunk/Index/LanguageDetector.h D /trunk/Index/XapianDatabase.cpp D /trunk/Index/XapianDatabase.h D /trunk/Index/XapianDatabaseFactory.cpp D /trunk/Index/XapianDatabaseFactory.h D /trunk/Index/XapianIndex.cpp D /trunk/Index/XapianIndex.h D /trunk/Index/pinot-index.1 D /trunk/Index/pinot-index.cpp D /trunk/Index/pinot-label.1 D /trunk/Index/pinot-label.cpp A /trunk/IndexSearch A /trunk/IndexSearch/DBusXapianIndex.cpp (from /trunk/Index/DBusXapianIndex.cpp:1132) A /trunk/IndexSearch/DBusXapianIndex.h (from /trunk/Index/DBusXapianIndex.h:1132) A /trunk/IndexSearch/FilterWrapper.cpp (from /trunk/Index/FilterWrapper.cpp:1132) A /trunk/IndexSearch/FilterWrapper.h (from /trunk/Index/FilterWrapper.h:1132) A /trunk/IndexSearch/Google (from /trunk/Search/Google:1132) A /trunk/IndexSearch/IndexFactory.cpp (from /trunk/Index/IndexFactory.cpp:1132) A /trunk/IndexSearch/IndexFactory.h (from /trunk/Index/IndexFactory.h:1132) A /trunk/IndexSearch/IndexInterface.h (from /trunk/Index/IndexInterface.h:1132) A /trunk/IndexSearch/OpenSearchParser.cpp (from /trunk/Search/OpenSearchParser.cpp:1132) A /trunk/IndexSearch/OpenSearchParser.h (from /trunk/Search/OpenSearchParser.h:1132) A /trunk/IndexSearch/PluginParsers.h (from /trunk/Search/PluginParsers.h:1132) A /trunk/IndexSearch/PluginWebEngine.cpp (from /trunk/Search/PluginWebEngine.cpp:1132) A /trunk/IndexSearch/PluginWebEngine.h (from /trunk/Search/PluginWebEngine.h:1132) A /trunk/IndexSearch/Plugins (from /trunk/Search/Plugins:1132) A /trunk/IndexSearch/QueryProperties.cpp (from 
/trunk/Search/QueryProperties.cpp:1132) A /trunk/IndexSearch/QueryProperties.h (from /trunk/Search/QueryProperties.h:1132) A /trunk/IndexSearch/ResultsExporter.cpp (from /trunk/Search/ResultsExporter.cpp:1132) A /trunk/IndexSearch/ResultsExporter.h (from /trunk/Search/ResultsExporter.h:1132) A /trunk/IndexSearch/SOAPEnv.h (from /trunk/Search/SOAPEnv.h:1132) A /trunk/IndexSearch/SOAPEnvH.h (from /trunk/Search/SOAPEnvH.h:1132) A /trunk/IndexSearch/SOAPEnvNS.cpp (from /trunk/Search/SOAPEnvNS.cpp:1132) A /trunk/IndexSearch/SOAPEnvStub.h (from /trunk/Search/SOAPEnvStub.h:1132) A /trunk/IndexSearch/SearchEngineFactory.cpp (from /trunk/Search/SearchEngineFactory.cpp:1132) A /trunk/IndexSearch/SearchEngineFactory.h (from /trunk/Search/SearchEngineFactory.h:1132) A /trunk/IndexSearch/SearchEngineInterface.cpp (from /trunk/Search/SearchEngineInterface.cpp:1132) A /trunk/IndexSearch/SearchEngineInterface.h (from /trunk/Search/SearchEngineInterface.h:1132) A /trunk/IndexSearch/SearchPluginProperties.cpp (from /trunk/Search/SearchPluginProperties.cpp:1132) A /trunk/IndexSearch/SearchPluginProperties.h (from /trunk/Search/SearchPluginProperties.h:1132) A /trunk/IndexSearch/SherlockParser.cpp (from /trunk/Search/SherlockParser.cpp:1132) A /trunk/IndexSearch/SherlockParser.h (from /trunk/Search/SherlockParser.h:1132) A /trunk/IndexSearch/WebEngine.cpp (from /trunk/Search/WebEngine.cpp:1132) A /trunk/IndexSearch/WebEngine.h (from /trunk/Search/WebEngine.h:1132) A /trunk/IndexSearch/Xapian A /trunk/IndexSearch/Xapian/AbstractGenerator.cpp (from /trunk/Search/AbstractGenerator.cpp:1132) A /trunk/IndexSearch/Xapian/AbstractGenerator.h (from /trunk/Search/AbstractGenerator.h:1132) A /trunk/IndexSearch/Xapian/LanguageDetector.cpp (from /trunk/Index/LanguageDetector.cpp:1132) A /trunk/IndexSearch/Xapian/LanguageDetector.h (from /trunk/Index/LanguageDetector.h:1132) A /trunk/IndexSearch/Xapian/XapianDatabase.cpp (from /trunk/Index/XapianDatabase.cpp:1132) A 
/trunk/IndexSearch/Xapian/XapianDatabase.h (from /trunk/Index/XapianDatabase.h:1132) A /trunk/IndexSearch/Xapian/XapianDatabaseFactory.cpp (from /trunk/Index/XapianDatabaseFactory.cpp:1132) A /trunk/IndexSearch/Xapian/XapianDatabaseFactory.h (from /trunk/Index/XapianDatabaseFactory.h:1132) A /trunk/IndexSearch/Xapian/XapianEngine.cpp (from /trunk/Search/XapianEngine.cpp:1132) A /trunk/IndexSearch/Xapian/XapianEngine.h (from /trunk/Search/XapianEngine.h:1132) A /trunk/IndexSearch/Xapian/XapianIndex.cpp (from /trunk/Index/XapianIndex.cpp:1132) A /trunk/IndexSearch/Xapian/XapianIndex.h (from /trunk/Index/XapianIndex.h:1132) A /trunk/IndexSearch/pinot-index.1 (from /trunk/Index/pinot-index.1:1132) A /trunk/IndexSearch/pinot-index.cpp (from /trunk/Index/pinot-index.cpp:1132) A /trunk/IndexSearch/pinot-label.1 (from /trunk/Index/pinot-label.1:1132) A /trunk/IndexSearch/pinot-label.cpp (from /trunk/Index/pinot-label.cpp:1132) A /trunk/IndexSearch/pinot-search.1 (from /trunk/Search/pinot-search.1:1132) A /trunk/IndexSearch/pinot-search.cpp (from /trunk/Search/pinot-search.cpp:1132) D /trunk/Search/AbstractGenerator.cpp D /trunk/Search/AbstractGenerator.h D /trunk/Search/Google D /trunk/Search/OpenSearchParser.cpp D /trunk/Search/OpenSearchParser.h D /trunk/Search/PluginParsers.h D /trunk/Search/PluginWebEngine.cpp D /trunk/Search/PluginWebEngine.h D /trunk/Search/Plugins D /trunk/Search/QueryProperties.cpp D /trunk/Search/QueryProperties.h D /trunk/Search/ResultsExporter.cpp D /trunk/Search/ResultsExporter.h D /trunk/Search/SOAPEnv.h D /trunk/Search/SOAPEnvH.h D /trunk/Search/SOAPEnvNS.cpp D /trunk/Search/SOAPEnvStub.h D /trunk/Search/SearchEngineFactory.cpp D /trunk/Search/SearchEngineFactory.h D /trunk/Search/SearchEngineInterface.cpp D /trunk/Search/SearchEngineInterface.h D /trunk/Search/SearchPluginProperties.cpp D /trunk/Search/SearchPluginProperties.h D /trunk/Search/SherlockParser.cpp D /trunk/Search/SherlockParser.h D /trunk/Search/WebEngine.cpp D 
/trunk/Search/WebEngine.h D /trunk/Search/XapianEngine.cpp D /trunk/Search/XapianEngine.h D /trunk/Search/pinot-search.1 D /trunk/Search/pinot-search.cpp Moved index and search code under IndexSearch, with anything depending on Xapian in IndexSearch/Xapian. ------------------------------------------------------------------------ r1132 | fabricecolin | 2007-11-30 16:20:18 +0100 (Fri, 30 Nov 2007) | 2 lines Changed paths: M /trunk/Utils/Languages.cpp M /trunk/Utils/Languages.h Const'ified some variables. ------------------------------------------------------------------------ r1131 | fabricecolin | 2007-11-30 16:19:28 +0100 (Fri, 30 Nov 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot.cc Tell the user that the index needs updating with a MessageDialog and keep pestering him until he clicks the "Don't warn me again" checkbox. Be nice and give a title to all MessageDialogs. ------------------------------------------------------------------------ r1130 | fabricecolin | 2007-11-30 16:09:34 +0100 (Fri, 30 Nov 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp If the filter takes DOCUMENT_URI input, that means all non local schemes, not just http. Removed unused header. ------------------------------------------------------------------------ r1129 | fabricecolin | 2007-11-29 14:16:41 +0100 (Thu, 29 Nov 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/importDialog_glade.cc M /trunk/UI/GTK2/src/importDialog_glade.hh M /trunk/UI/GTK2/src/mainWindow.cc Defer importing to the main window. When viewing any document, add it to the list of recently used files. This requires gtkmm >= 2.10. 
------------------------------------------------------------------------ r1128 | fabricecolin | 2007-11-29 13:53:58 +0100 (Thu, 29 Nov 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.cpp Add ValueRangeProcessor's early so that it benefits XapianQueryBuilder. ------------------------------------------------------------------------ r1126 | fabricecolin | 2007-11-24 16:24:23 +0100 (Sat, 24 Nov 2007) | 2 lines Changed paths: A /trunk/mkinstalldirs Replaced symlink with actual file. ------------------------------------------------------------------------ r1125 | fabricecolin | 2007-11-24 16:23:42 +0100 (Sat, 24 Nov 2007) | 2 lines Changed paths: D /trunk/mkinstalldirs Removed symbolic link. ------------------------------------------------------------------------ r1124 | fabricecolin | 2007-11-23 16:39:37 +0100 (Fri, 23 Nov 2007) | 2 lines Changed paths: M /trunk/Search/Plugins/Freshmeat.src Corrected extraction. It still doesn't cope well when there's just one match. ------------------------------------------------------------------------ r1123 | fabricecolin | 2007-11-23 16:36:49 +0100 (Fri, 23 Nov 2007) | 2 lines Changed paths: M /trunk/Index/pinot-index.1 M /trunk/Index/pinot-label.1 M /trunk/NEWS M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_TW.po Preparing for 0.81 release. ------------------------------------------------------------------------ r1122 | fabricecolin | 2007-11-23 14:43:29 +0100 (Fri, 23 Nov 2007) | 2 lines Changed paths: M /trunk/TODO Current TODO list. 
------------------------------------------------------------------------ r1121 | fabricecolin | 2007-11-23 14:42:51 +0100 (Fri, 23 Nov 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp If sysctlbyname() is used, restart crawling when on AC. ------------------------------------------------------------------------ r1120 | fabricecolin | 2007-11-22 12:36:47 +0100 (Thu, 22 Nov 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml Advertise that org.freedesktop.DBus.Introspectable.Introspect is supported. ------------------------------------------------------------------------ r1119 | fabricecolin | 2007-11-22 12:35:50 +0100 (Thu, 22 Nov 2007) | 2 lines Changed paths: M /trunk/Makefile.am Look for data files to install in srcdir. ------------------------------------------------------------------------ r1118 | fabricecolin | 2007-11-22 12:29:14 +0100 (Thu, 22 Nov 2007) | 6 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Locations to crawl are queued and only popped from the queue if the crawler wasn't stopped. This allows restarting crawling more or less where it stopped when the system went on battery. The battery state is queried at startup (through D-Bus) and crawling is restarted when the system goes on AC. ------------------------------------------------------------------------ r1117 | fabricecolin | 2007-11-22 12:20:31 +0100 (Thu, 22 Nov 2007) | 3 lines Changed paths: M /trunk/Search/Makefile.am M /trunk/Search/XapianEngine.cpp XesamULParser depends on boost Spirit and should be built conditionally, just like SherlockParser. This was reported by Reuben Thomas. 
------------------------------------------------------------------------ r1116 | fabricecolin | 2007-11-22 12:17:36 +0100 (Thu, 22 Nov 2007) | 3 lines Changed paths: M /trunk/configure.in The function statfs(2) may be in sys/vfs.h, sys/statfs.h or sys/mount.h so check for those, as well as sys/statvfs.h and sysctlbyname(3). ------------------------------------------------------------------------ r1115 | fabricecolin | 2007-11-21 15:44:20 +0100 (Wed, 21 Nov 2007) | 7 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Proc files can't be monitored with inotify so I am changing tactics. Now the daemon listens for the battery state change signal sent on the message bus by org.freedesktop.PowerManagement (as defined in the spec v0.1 and v0.2) as well as the one from org.gnome.PowerManager (for older Gnome installations). On FreeBSD, we query the sysctl hw.acpi.acline at the same time as the disk usage check. I am not sure yet how useful it is, so this may be removed. ------------------------------------------------------------------------ r1114 | fabricecolin | 2007-11-17 08:50:51 +0100 (Sat, 17 Nov 2007) | 2 lines Changed paths: M /trunk/scripts/python/pinot-module.py Use get_category() to specify the category, it's cleaner. ------------------------------------------------------------------------ r1113 | fabricecolin | 2007-11-17 06:19:45 +0100 (Sat, 17 Nov 2007) | 2 lines Changed paths: M /trunk/pinot-dbus-daemon.desktop Fixed comment : the daemon doesn't allow searching anything but My Documents. ------------------------------------------------------------------------ r1112 | fabricecolin | 2007-11-15 13:51:08 +0100 (Thu, 15 Nov 2007) | 2 lines Changed paths: M /trunk/README Document Search This For, the new environment variables and Deskbar plugins. 
------------------------------------------------------------------------ r1111 | fabricecolin | 2007-11-15 13:49:30 +0100 (Thu, 15 Nov 2007) | 3 lines Changed paths: M /trunk/Utils/MIMEScanner.cpp Work around broken shared-mime-info rules that identify all HTML files as Mozilla bookmarks. ------------------------------------------------------------------------ r1110 | fabricecolin | 2007-11-15 13:47:14 +0100 (Thu, 15 Nov 2007) | 3 lines Changed paths: M /trunk/pinot.spec.in Put pinot-deskbar back in, as in the Fedora spec. This will simplify things for Fedora users. ------------------------------------------------------------------------ r1109 | fabricecolin | 2007-11-15 12:08:50 +0100 (Thu, 15 Nov 2007) | 4 lines Changed paths: M /trunk/scripts/python/pinot-module.py The initialize() method connects to D-Bus. GetStatistics() is called on the interface object. Matches category is "documents". ------------------------------------------------------------------------ r1108 | fabricecolin | 2007-11-14 15:25:39 +0100 (Wed, 14 Nov 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/OnDiskHandler.cpp Fix for previous check-in. ------------------------------------------------------------------------ r1107 | fabricecolin | 2007-11-14 14:52:25 +0100 (Wed, 14 Nov 2007) | 2 lines Changed paths: A /trunk/ChangeLog-dijon This file is required by make dist. Up to date as of Pinot v0.80. ------------------------------------------------------------------------ r1106 | fabricecolin | 2007-11-14 14:48:52 +0100 (Wed, 14 Nov 2007) | 2 lines Changed paths: M /trunk/scripts/python/pinot-module.py Undefined variable fix. ------------------------------------------------------------------------ r1105 | fabricecolin | 2007-11-14 14:47:42 +0100 (Wed, 14 Nov 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/OnDiskHandler.cpp When inserting items in the history database, check whether they exist first. 
------------------------------------------------------------------------ r1104 | fabricecolin | 2007-11-13 16:36:22 +0100 (Tue, 13 Nov 2007) | 3 lines Changed paths: M /trunk/Index/XapianDatabase.cpp M /trunk/Index/XapianDatabase.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h When opened, XapianDatabase checks whether the env var PINOT_SPELLING_DB is set to NO. If it is, spelling support is disabled. ------------------------------------------------------------------------ r1103 | fabricecolin | 2007-11-13 16:18:37 +0100 (Tue, 13 Nov 2007) | 2 lines Changed paths: M /trunk/Makefile.am M /trunk/pinot.spec.in Distribute and install pinot-module.py. ------------------------------------------------------------------------ r1102 | fabricecolin | 2007-11-13 16:17:13 +0100 (Tue, 13 Nov 2007) | 2 lines Changed paths: A /trunk/scripts/python/pinot-module.py First shot at a plugin for Deskbar 2.20. ------------------------------------------------------------------------ r1101 | fabricecolin | 2007-11-13 12:37:46 +0100 (Tue, 13 Nov 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.cpp This should have been included in revision 1098. ------------------------------------------------------------------------ r1100 | fabricecolin | 2007-11-13 12:32:58 +0100 (Tue, 13 Nov 2007) | 3 lines Changed paths: M /trunk/Index/XapianDatabase.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp PINOT_MINIMUM_DISK_SPACE defaults to 50Mb. Cosmetic changes. ------------------------------------------------------------------------ r1099 | fabricecolin | 2007-11-12 13:32:47 +0100 (Mon, 12 Nov 2007) | 7 lines Changed paths: M /trunk/Monitor/MonitorHandler.cpp M /trunk/Monitor/MonitorHandler.h M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/mainWindow.cc MonitorHandler's methods have a default, empty implementation. DaemonState monitors the battery state file found under /proc/acpi/ac_adapter and stops crawling when the system goes on battery. 
Indexing is not affected, ie files reported by the crawler before it stops or the file monitor will be indexed as per normal. The ReloadHandler in mainWindow needs less hand-holding. ------------------------------------------------------------------------ r1098 | fabricecolin | 2007-11-11 08:03:00 +0100 (Sun, 11 Nov 2007) | 2 lines Changed paths: M /trunk/Collect/Makefile.am M /trunk/Index/Makefile.am M /trunk/Makefile.am M /trunk/Monitor/Makefile.am M /trunk/SQL/Makefile.am M /trunk/Search/Google/Makefile.am M /trunk/Search/Makefile.am M /trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/FilterUtils.h M /trunk/Tokenize/Makefile.am M /trunk/UI/GTK2/src/Makefile.am M /trunk/Utils/Makefile.am Patches from Gabriel C to build from any directory. ------------------------------------------------------------------------ r1097 | fabricecolin | 2007-11-08 07:25:25 +0100 (Thu, 08 Nov 2007) | 2 lines Changed paths: M /trunk/configure.in Check for statvfs(2) and statfs(2). ------------------------------------------------------------------------ r1096 | fabricecolin | 2007-11-08 07:21:35 +0100 (Thu, 08 Nov 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp FileFoundSignal is on OnDiskHandler now. ------------------------------------------------------------------------ r1095 | fabricecolin | 2007-11-08 07:15:34 +0100 (Thu, 08 Nov 2007) | 6 lines Changed paths: M /trunk/Monitor/MonitorHandler.h M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc MonitorHandler has a directoryCreated() method. In practice, this means that MonitorThread doesn't shortcut the handler when a directory is created, and that OnDiskHandler can fire the "file found" signal on its own. This also unifies indexing between monitoring and crawling, and should ensure no indexing takes place when the new stop indexing flag is up. 
------------------------------------------------------------------------ r1094 | fabricecolin | 2007-11-08 03:46:33 +0100 (Thu, 08 Nov 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h Stop indexing if the disk on which the daemon index resides is running out of space. This is not directly configurable but the default value (10 Mb) can be overridden by setting the environment variable PINOT_MINIMUM_DISK_SPACE. ------------------------------------------------------------------------ r1093 | fabricecolin | 2007-11-06 15:13:25 +0100 (Tue, 06 Nov 2007) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/po/pt.po M /trunk/po/sv.po Mention Andreas Wagner's contribution. Updated Portuguese and Swedish translations by Tiago Silva and Daniel Nylander. ------------------------------------------------------------------------ r1092 | fabricecolin | 2007-11-06 15:08:57 +0100 (Tue, 06 Nov 2007) | 2 lines Changed paths: M /trunk/FAQ There already was an entry about stale locks... Updated it. ------------------------------------------------------------------------ r1091 | fabricecolin | 2007-11-05 12:32:28 +0100 (Mon, 05 Nov 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/propertiesDialog.cc Don't show "unknown" if the number of terms is 0. Removed unused variable in mainWindow.cc. ------------------------------------------------------------------------ r1090 | fabricecolin | 2007-11-05 12:28:06 +0100 (Mon, 05 Nov 2007) | 2 lines Changed paths: M /trunk/Index/pinot-label.cpp Don't loop forever if the file was not indexed ! 
------------------------------------------------------------------------ r1089 | fabricecolin | 2007-11-05 12:20:01 +0100 (Mon, 05 Nov 2007) | 3 lines Changed paths: M /trunk/Utils/Url.cpp Don't look for parameters in file URLs, that truncates file names that have a question mark. ------------------------------------------------------------------------ r1088 | fabricecolin | 2007-11-04 11:14:46 +0100 (Sun, 04 Nov 2007) | 2 lines Changed paths: M /trunk/FAQ Added entries about stale lock files and flushing. ------------------------------------------------------------------------ r1087 | fabricecolin | 2007-11-04 11:09:37 +0100 (Sun, 04 Nov 2007) | 2 lines Changed paths: M /trunk/pinot-dbus-daemon.desktop M /trunk/pinot.desktop Icon is not an absolute path and therefore shouldn't have an extension. ------------------------------------------------------------------------ r1086 | fabricecolin | 2007-11-04 05:45:02 +0100 (Sun, 04 Nov 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/EnginesTree.h M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/IndexPage.h M /trunk/UI/GTK2/src/Notebook.cpp M /trunk/UI/GTK2/src/Notebook.h M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/PinotUtils.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/importDialog_glade.cc M /trunk/UI/GTK2/src/indexDialog_glade.cc M /trunk/UI/GTK2/src/launcherDialog.cc M /trunk/UI/GTK2/src/launcherDialog_glade.cc M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/prefsDialog_glade.cc M 
/trunk/UI/GTK2/src/propertiesDialog_glade.cc M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog_glade.cc M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.hh M /trunk/UI/GTK2/src/statisticsDialog_glade.cc Replaced deprecated SigC calls. New versions of libsigc++ don't provide a compatibility.h header. ------------------------------------------------------------------------ r1085 | fabricecolin | 2007-11-02 16:43:08 +0100 (Fri, 02 Nov 2007) | 2 lines Changed paths: M /trunk/FAQ Added item about KDE 3 and Autostart. ------------------------------------------------------------------------ r1083 | fabricecolin | 2007-11-01 12:02:46 +0100 (Thu, 01 Nov 2007) | 3 lines Changed paths: M /trunk/Index/pinot-index.1 M /trunk/Index/pinot-label.1 M /trunk/NEWS M /trunk/README M /trunk/Search/pinot-search.1 M /trunk/TODO M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_TW.po Updated for 0.80 release. Clarified parts about document formats and external programs in the README. ------------------------------------------------------------------------ r1082 | fabricecolin | 2007-11-01 12:00:53 +0100 (Thu, 01 Nov 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/propertiesDialog.cc Hide the language when editing several documents' properties, at least until that can be updated over D-Bus en masse. ------------------------------------------------------------------------ r1081 | fabricecolin | 2007-10-30 12:25:20 +0100 (Tue, 30 Oct 2007) | 2 lines Changed paths: M /trunk/pinot.spec.in Merged audio-docs, deskbar and omega sub-packages into main one. 
------------------------------------------------------------------------ r1080 | fabricecolin | 2007-10-30 12:07:25 +0100 (Tue, 30 Oct 2007) | 3 lines Changed paths: M /trunk/configure.in New option --with-ssl=PATH for those systems on which Curl or Neon require OpenSSL but pkg-config can't be relied on. ------------------------------------------------------------------------ r1079 | fabricecolin | 2007-10-28 07:33:05 +0100 (Sun, 28 Oct 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Added missing includes for stat(). ------------------------------------------------------------------------ r1078 | fabricecolin | 2007-10-27 15:10:35 +0200 (Sat, 27 Oct 2007) | 2 lines Changed paths: M /trunk/NEWS M /trunk/README Updated README with used external filter programs, and NEWS with current status. ------------------------------------------------------------------------ r1077 | fabricecolin | 2007-10-27 14:57:33 +0200 (Sat, 27 Oct 2007) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_TW.po Current translations, including updates from JW (nl) and Leonardo Melo (pt_BR). ------------------------------------------------------------------------ r1076 | fabricecolin | 2007-10-27 05:47:11 +0200 (Sat, 27 Oct 2007) | 3 lines Changed paths: M /trunk/Utils/Url.cpp Escape '+'. That doesn't do any harm and prevents Xapian's spelling correction from mistaking '+' in url filters for a boolean operator. ------------------------------------------------------------------------ r1075 | fabricecolin | 2007-10-27 05:45:26 +0200 (Sat, 27 Oct 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.cpp In DEBUG mode, get a description of the Query object, that's pretty useful. 
------------------------------------------------------------------------ r1074 | fabricecolin | 2007-10-27 04:37:24 +0200 (Sat, 27 Oct 2007) | 2 lines Changed paths: M /trunk/Index/pinot-label.cpp Option --list doesn't need any parameter. ------------------------------------------------------------------------ r1073 | fabricecolin | 2007-10-26 17:18:13 +0200 (Fri, 26 Oct 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/propertiesDialog.cc Better at telling when properties or daemon preferences have changed. ------------------------------------------------------------------------ r1072 | fabricecolin | 2007-10-24 16:05:41 +0200 (Wed, 24 Oct 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc In get_results_page_details(), set the query's name so that we can tell the user what we failed to find. One case where this applies is More Like This on an in-results search. ------------------------------------------------------------------------ r1071 | fabricecolin | 2007-10-24 15:53:17 +0200 (Wed, 24 Oct 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/propertiesDialog.cc Don't refresh the index list if labels were not changed in any way. Don't bother looking for the document's language if it's unknown ! ------------------------------------------------------------------------ r1070 | fabricecolin | 2007-10-23 16:58:24 +0200 (Tue, 23 Oct 2007) | 3 lines Changed paths: M /trunk/Index/pinot-index.cpp M /trunk/Search/pinot-search.cpp Following the fix on language in XapianDatabase's records, we need to provide names for all languages. 
------------------------------------------------------------------------ r1069 | fabricecolin | 2007-10-23 16:52:05 +0200 (Tue, 23 Oct 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/propertiesDialog.hh propertiesDialog should be able to tell when the user made modifications. Modified LabelUpdateThread to set several labels on N documents at once. That's used after labels are edited on several documents. ------------------------------------------------------------------------ r1068 | fabricecolin | 2007-10-23 16:43:39 +0200 (Tue, 23 Oct 2007) | 2 lines Changed paths: M /trunk/Search/pinot-search.cpp Slightly better error message. ------------------------------------------------------------------------ r1067 | fabricecolin | 2007-10-23 16:42:32 +0200 (Tue, 23 Oct 2007) | 3 lines Changed paths: M /trunk/Index/pinot-index.cpp M /trunk/Index/pinot-label.cpp Process all parameters instead of stopping after the first one... Don't display internal labels. ------------------------------------------------------------------------ r1066 | fabricecolin | 2007-10-23 16:39:08 +0200 (Tue, 23 Oct 2007) | 5 lines Changed paths: M /trunk/Index/XapianDatabase.cpp M /trunk/Index/XapianIndex.cpp M /trunk/Search/XapianEngine.cpp The language stored by XapianDatabase::propsToRecord() is in English, so make sure both XapianIndex and XapianEngine convert to locale after calling recordToProps(). Dropped code about pre-0.60 timestamps; those indexes will be upgraded. ------------------------------------------------------------------------ r1065 | fabricecolin | 2007-10-23 16:35:08 +0200 (Tue, 23 Oct 2007) | 2 lines Changed paths: M /trunk/Index/LanguageDetector.cpp Cosmetic. 
------------------------------------------------------------------------ r1064 | fabricecolin | 2007-10-22 13:51:31 +0200 (Mon, 22 Oct 2007) | 2 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp Ignore size 0 from filters. ------------------------------------------------------------------------ r1063 | fabricecolin | 2007-10-22 13:50:20 +0200 (Mon, 22 Oct 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h Set the error number on exceptions. ------------------------------------------------------------------------ r1062 | fabricecolin | 2007-10-22 13:44:57 +0200 (Mon, 22 Oct 2007) | 2 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp Minor fix. ------------------------------------------------------------------------ r1061 | fabricecolin | 2007-10-22 13:43:43 +0200 (Mon, 22 Oct 2007) | 3 lines Changed paths: M /trunk/Search/XapianEngine.cpp Check at least twice the number of requested hits so that we have a reasonable idea about whether a second hits page is needed. ------------------------------------------------------------------------ r1060 | fabricecolin | 2007-10-21 17:15:38 +0200 (Sun, 21 Oct 2007) | 2 lines Changed paths: M /trunk/configure.in Check for the OpenSSL package if required by Curl. Reported by Adel Gadllah. ------------------------------------------------------------------------ r1059 | fabricecolin | 2007-10-19 18:49:46 +0200 (Fri, 19 Oct 2007) | 2 lines Changed paths: M /trunk/README Mention that "pinot-dbus-daemon --reindex" will reset the My Documents index. ------------------------------------------------------------------------ r1058 | fabricecolin | 2007-10-19 18:20:05 +0200 (Fri, 19 Oct 2007) | 4 lines Changed paths: M /trunk/Search/XapianEngine.cpp M /trunk/Search/pinot-search.cpp Pinot-search didn't show date and size. When showing query times, XapianEngine displays the query text instead of the output of Query::get_description() which may be misleading. 
------------------------------------------------------------------------ r1057 | fabricecolin | 2007-10-19 17:48:42 +0200 (Fri, 19 Oct 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/propertiesDialog.hh Fixed warning about order of initialization. ------------------------------------------------------------------------ r1056 | fabricecolin | 2007-10-19 16:32:49 +0200 (Fri, 19 Oct 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/mainWindow.cc In mainWindow, get UpdateDocumentThread to update labels on each document. In DaemonState, log errors reported by IndexingThread. ------------------------------------------------------------------------ r1055 | fabricecolin | 2007-10-19 16:29:56 +0200 (Fri, 19 Oct 2007) | 2 lines Changed paths: M /trunk/NEWS M /trunk/TODO Current development status. ------------------------------------------------------------------------ r1054 | fabricecolin | 2007-10-18 16:56:02 +0200 (Thu, 18 Oct 2007) | 6 lines Changed paths: M /trunk/README M /trunk/Search/QueryProperties.cpp M /trunk/Search/XapianEngine.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp XapianEngine::queryDatabase() logs how long queries took (setting up Enquire, getting the MSet, generating abstracts). PinotSettings creates a Me query on the first run. Fixed comment in QueryProperties. The README talks about D-Bus timeouts that may occur when the system is busy. ------------------------------------------------------------------------ r1053 | fabricecolin | 2007-10-18 13:23:15 +0200 (Thu, 18 Oct 2007) | 2 lines Changed paths: M /trunk/configure.in Check for socketpair(), fork() and setrlimit(). ------------------------------------------------------------------------ r1052 | fabricecolin | 2007-10-18 13:14:10 +0200 (Thu, 18 Oct 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.cpp "title:" should really be a probabilistic term prefix. Some extra DEBUG. 
------------------------------------------------------------------------ r1051 | fabricecolin | 2007-10-17 16:26:54 +0200 (Wed, 17 Oct 2007) | 2 lines Changed paths: M /trunk/README Revised section 6. ------------------------------------------------------------------------ r1050 | fabricecolin | 2007-10-16 16:22:32 +0200 (Tue, 16 Oct 2007) | 2 lines Changed paths: M /trunk/Tokenize/Makefile.am Filter libraries need not link against libBasicUtils anymore. ------------------------------------------------------------------------ r1049 | fabricecolin | 2007-10-16 16:03:27 +0200 (Tue, 16 Oct 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/propertiesDialog.cc Don't show Unknown for size 0, it's fine. Several labels may be applied, so use the plural Labels in the list. Pass an empty string for language Unknown. ------------------------------------------------------------------------ r1048 | fabricecolin | 2007-10-16 15:55:04 +0200 (Tue, 16 Oct 2007) | 4 lines Changed paths: M /trunk/Search/XapianEngine.cpp Sort by relevance first and then by date, rather than by relevance only. The latter seems to cause MSet::get_matches_estimated() to return figures way below what we would get when sorting by date then relevance... ------------------------------------------------------------------------ r1047 | fabricecolin | 2007-10-16 13:52:44 +0200 (Tue, 16 Oct 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/queryDialog.cc Give default values to filters as examples, instead of just "xxx". ------------------------------------------------------------------------ r1046 | fabricecolin | 2007-10-16 13:30:29 +0200 (Tue, 16 Oct 2007) | 2 lines Changed paths: M /trunk/README Examples of use for filters dir and title were not correct. ------------------------------------------------------------------------ r1045 | fabricecolin | 2007-10-15 16:18:26 +0200 (Mon, 15 Oct 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Be good and catch Glib exceptions too. 
------------------------------------------------------------------------ r1044 | fabricecolin | 2007-10-15 15:47:38 +0200 (Mon, 15 Oct 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc UpdateDocumentThread can update labels too. ------------------------------------------------------------------------ r1043 | fabricecolin | 2007-10-15 07:40:56 +0200 (Mon, 15 Oct 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc Get index IDs by name or location. ------------------------------------------------------------------------ r1042 | fabricecolin | 2007-10-15 06:06:30 +0200 (Mon, 15 Oct 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/mainWindow.cc Synced mainWindow with changes made to propertiesDialog. It also assumed that UpdateDocumentThread only operates on My Web Pages, which is incorrect. ResultsTree::updateResult() returns false if it failed. ------------------------------------------------------------------------ r1041 | fabricecolin | 2007-10-15 06:02:41 +0200 (Mon, 15 Oct 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/propertiesDialog.hh Manipulate the documents list the dialog was passed directly. There's no need to call getDocumentInfo(), since the tree now caches all that information. ------------------------------------------------------------------------ r1040 | fabricecolin | 2007-10-15 05:57:50 +0200 (Mon, 15 Oct 2007) | 7 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h UpdateDocumentThread exports the index name and doesn't call getDocumentInfo() after performing the update. 
It's not necessary, and may retrieve old data if the update was done over D-Bus and the daemon has not flushed to the index. IndexingThread restores the document and index IDs following the call to getDocumentInfo(). We could probably do without this call, I'll have to look into it later on... ------------------------------------------------------------------------ r1039 | fabricecolin | 2007-10-15 04:54:27 +0200 (Mon, 15 Oct 2007) | 2 lines Changed paths: M /trunk/SQL/ActionQueue.cpp M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h Use DocumentInfo serialization in ActionQueue and ResultsTree. ------------------------------------------------------------------------ r1038 | fabricecolin | 2007-10-15 04:52:22 +0200 (Mon, 15 Oct 2007) | 2 lines Changed paths: M /trunk/Utils/DocumentInfo.cpp M /trunk/Utils/DocumentInfo.h Documents can be serialized to and from strings. ------------------------------------------------------------------------ r1037 | fabricecolin | 2007-10-14 11:06:54 +0200 (Sun, 14 Oct 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/propertiesDialog_glade.cc M /trunk/UI/GTK2/src/propertiesDialog_glade.hh Reorganized the properties dialog box a bit. ------------------------------------------------------------------------ r1036 | fabricecolin | 2007-10-14 11:05:37 +0200 (Sun, 14 Oct 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/queryDialog.cc Removed unused variables. 
------------------------------------------------------------------------ r1035 | fabricecolin | 2007-10-13 15:56:29 +0200 (Sat, 13 Oct 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/propertiesDialog.hh M /trunk/UI/GTK2/src/propertiesDialog_glade.cc M /trunk/UI/GTK2/src/propertiesDialog_glade.hh propertiesDialog can save a document's terms to file, and does some of the processing previously done in mainWindow. Also give a sensible default name to the file results are exported to. ------------------------------------------------------------------------ r1034 | fabricecolin | 2007-10-13 10:31:22 +0200 (Sat, 13 Oct 2007) | 2 lines Changed paths: M /trunk/po/Makefile.in.in Pass -c to msgfmt. ------------------------------------------------------------------------ r1033 | fabricecolin | 2007-10-13 09:03:43 +0200 (Sat, 13 Oct 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.hh Errors actually give some indication as to what failed. Errors of the same kind are grouped together. ------------------------------------------------------------------------ r1032 | fabricecolin | 2007-10-13 08:54:13 +0200 (Sat, 13 Oct 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h When something fails, threads record an error number, which can be converted into an error string by the caller if necessary. DirectoryScannerThread records scanning errors directly into CrawlHistory. ------------------------------------------------------------------------ r1031 | fabricecolin | 2007-10-13 07:20:22 +0200 (Sat, 13 Oct 2007) | 2 lines Changed paths: M /trunk/Index/pinot-index.cpp M /trunk/Index/pinot-label.cpp Display labels escaped since that's how they should be fed to pinot-label. 
------------------------------------------------------------------------ r1030 | fabricecolin | 2007-10-13 07:16:53 +0200 (Sat, 13 Oct 2007) | 4 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h Extra column ErrorNum in CrawlHistory. The table will be dropped and recreated but it should be okay since it will take place when users upgrade to 0.80, which will trigger reindexing. ------------------------------------------------------------------------ r1029 | fabricecolin | 2007-10-13 07:11:30 +0200 (Sat, 13 Oct 2007) | 2 lines Changed paths: M /trunk/SQL/ActionQueue.cpp Serialize labels too. ------------------------------------------------------------------------ r1028 | fabricecolin | 2007-10-13 05:02:26 +0200 (Sat, 13 Oct 2007) | 2 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp If a type has no valid parent, add an entry in the alias cache that says so. ------------------------------------------------------------------------ r1027 | fabricecolin | 2007-10-11 16:27:19 +0200 (Thu, 11 Oct 2007) | 2 lines Changed paths: M /trunk/Search/Plugins/Topix.src Updated results parsing. ------------------------------------------------------------------------ r1026 | fabricecolin | 2007-10-11 16:25:20 +0200 (Thu, 11 Oct 2007) | 3 lines Changed paths: M /trunk/Search/XapianEngine.cpp Consider Z-prefixed terms (stems) when expanding the query. Fixed how the query is built when limited to several documents. ------------------------------------------------------------------------ r1025 | fabricecolin | 2007-10-10 16:32:32 +0200 (Wed, 10 Oct 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp Set results' timestamps when exporting. ------------------------------------------------------------------------ r1024 | fabricecolin | 2007-10-10 16:02:28 +0200 (Wed, 10 Oct 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotUtils.cpp File chooser asks for confirmation when overwriting files. 
File name and folder are initialized correctly. ------------------------------------------------------------------------ r1023 | fabricecolin | 2007-10-10 15:52:43 +0200 (Wed, 10 Oct 2007) | 2 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp Look for date. ------------------------------------------------------------------------ r1022 | fabricecolin | 2007-10-10 15:50:32 +0200 (Wed, 10 Oct 2007) | 2 lines Changed paths: M /trunk/Utils/TimeConverter.cpp Worry about timezones only if GM time is needed. ------------------------------------------------------------------------ r1021 | fabricecolin | 2007-10-09 17:01:12 +0200 (Tue, 09 Oct 2007) | 2 lines Changed paths: M /trunk/Index/FilterWrapper.cpp Doh ! Why do I always confuse true and false ? ;-P ------------------------------------------------------------------------ r1020 | fabricecolin | 2007-10-09 15:26:10 +0200 (Tue, 09 Oct 2007) | 6 lines Changed paths: M /trunk/Index/XapianIndex.cpp Xapian::Database::get_metadata() doesn't throw an exception when the backend doesn't support metadata, so check the version file if get_metadata("version") returns an empty string. Extended getDocumentTerms() to return (some but not all) prefixed terms, and terms that don't have positional information. ------------------------------------------------------------------------ r1019 | fabricecolin | 2007-10-09 15:12:13 +0200 (Tue, 09 Oct 2007) | 2 lines Changed paths: M /trunk/Index/LanguageDetector.cpp M /trunk/Index/pinot-label.cpp Minor mods. ------------------------------------------------------------------------ r1018 | fabricecolin | 2007-10-09 15:09:35 +0200 (Tue, 09 Oct 2007) | 2 lines Changed paths: M /trunk/Index/FilterWrapper.cpp M /trunk/Index/pinot-index.cpp M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp Let FilterWrapper assign a default title to documents. 
------------------------------------------------------------------------ r1017 | fabricecolin | 2007-10-07 12:26:07 +0200 (Sun, 07 Oct 2007) | 5 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h New method getDocumentTerms(). In XapianIndex, revert to using the version file if metadata isn't supported by the index backend (eg Quartz). Make sure setVersion() always creates the CACHEDIR.TAG file. ------------------------------------------------------------------------ r1016 | fabricecolin | 2007-10-07 12:22:37 +0200 (Sun, 07 Oct 2007) | 2 lines Changed paths: M /trunk/Index/XapianDatabase.cpp Activate keep alive for remote databases. ------------------------------------------------------------------------ r1015 | fabricecolin | 2007-10-06 17:21:40 +0200 (Sat, 06 Oct 2007) | 2 lines Changed paths: M /trunk/Index/pinot-index.cpp Deleting the index object when exiting cannot hurt. ------------------------------------------------------------------------ r1014 | fabricecolin | 2007-10-06 17:19:33 +0200 (Sat, 06 Oct 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.h Corrected comment. ------------------------------------------------------------------------ r1013 | fabricecolin | 2007-10-04 15:59:39 +0200 (Thu, 04 Oct 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/statisticsDialog.cc Got rid of leaks where the index returned by getIndex() wasn't deleted. ------------------------------------------------------------------------ r1012 | fabricecolin | 2007-10-04 15:17:49 +0200 (Thu, 04 Oct 2007) | 2 lines Changed paths: M /trunk/pinot.desktop More translations for Name and Comment. 
------------------------------------------------------------------------ r1011 | fabricecolin | 2007-10-04 15:15:24 +0200 (Thu, 04 Oct 2007) | 3 lines Changed paths: M /trunk/Index/Makefile.am M /trunk/Index/pinot-index.cpp A /trunk/Index/pinot-label.1 A /trunk/Index/pinot-label.cpp M /trunk/Makefile.am M /trunk/README M /trunk/configure.in M /trunk/pinot.spec.in New pinot-label tool to manipulate labels on indexed files. Also increased requirement on Xapian to 1.0.3. This is enforced by configure. ------------------------------------------------------------------------ r1010 | fabricecolin | 2007-10-04 15:10:00 +0200 (Thu, 04 Oct 2007) | 3 lines Changed paths: M /trunk/Index/XapianIndex.cpp Labels prefixed with "X-" are private, internal labels that can only be set when the document is initially indexed and should be preserved until it's unindexed. ------------------------------------------------------------------------ r1009 | fabricecolin | 2007-10-04 14:56:11 +0200 (Thu, 04 Oct 2007) | 2 lines Changed paths: R /trunk/Search/Plugins/Yahoo.src (from /trunk/Search/Plugins/YahooAPI.src:1003) D /trunk/Search/Plugins/YahooAPI.src Renamed YahooAPI plugin to Yahoo. ------------------------------------------------------------------------ r1008 | fabricecolin | 2007-10-04 14:52:42 +0200 (Thu, 04 Oct 2007) | 2 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexInterface.h Overloaded getDocumentLabels(). Fixed typo in interface header file. ------------------------------------------------------------------------ r1007 | fabricecolin | 2007-10-04 14:50:50 +0200 (Thu, 04 Oct 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp Prefix the source label with "X-". Send reply over D-Bus before flushing the index. 
------------------------------------------------------------------------ r1006 | fabricecolin | 2007-10-03 15:56:38 +0200 (Wed, 03 Oct 2007) | 2 lines Changed paths: M /trunk/Utils/MIMEScanner.cpp Don't attempt loading anything if initialize() wasn't provided paths. ------------------------------------------------------------------------ r1005 | fabricecolin | 2007-10-03 15:53:49 +0200 (Wed, 03 Oct 2007) | 5 lines Changed paths: M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/prefsDialog.cc prefsDialog might not find labels in the index if the daemon hasn't had a chance to upgrade it, in which case it should get them from the configuration file. More DEBUG in ServerThreads. ------------------------------------------------------------------------ r1004 | fabricecolin | 2007-10-02 17:35:01 +0200 (Tue, 02 Oct 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Reset everything when --reindex is passed, or when upgrading from an older version. ------------------------------------------------------------------------ r1003 | fabricecolin | 2007-10-02 16:40:00 +0200 (Tue, 02 Oct 2007) | 10 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml M /trunk/UI/GTK2/src/pinot.cc M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog.hh Daemon supports new D-Bus methods AddLabel and GetLabels. The labels list is updated if necessary whenever labels are manipulated. For instance, calling SetDocumentLabels with a new label should create that label. If new labels are defined in the UI, LabelUpdateThread calls addLabel(). 
PinotSettings' labels list can be accessed directly without going through setters and getters. Modified the order in which the index is checked at startup and reset to take into account changes made to versioning and labels lists. Added a --reindex option to the daemon. ------------------------------------------------------------------------ r1002 | fabricecolin | 2007-10-02 16:30:42 +0200 (Tue, 02 Oct 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh Keep track of new labels separately. ------------------------------------------------------------------------ r1001 | fabricecolin | 2007-10-02 16:25:04 +0200 (Tue, 02 Oct 2007) | 5 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h Actually, a new addLabel() method, together with the existing rename and deleteLabel() methods, is more suitable to label management over D-Bus. Removed the D-Bus'ified setLabels() and overloaded getLabels() for clients that want to get the list directly from the daemon. ------------------------------------------------------------------------ r1000 | fabricecolin | 2007-10-01 14:43:31 +0200 (Mon, 01 Oct 2007) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h Forgot to include config.h. Other minor mods. ------------------------------------------------------------------------ r999 | fabricecolin | 2007-09-30 11:54:25 +0200 (Sun, 30 Sep 2007) | 5 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexInterface.h M /trunk/Index/XapianDatabaseFactory.cpp M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h If database metadata is available (Xapian >= 1.0.3), use it to store version and labels. 
New method reset() gets XapianDatabaseFactory::getDatabase() to overwrite and re-open the database. DBusXapianIndex makes use of the D-Bus method SetLabels. ------------------------------------------------------------------------ r998 | fabricecolin | 2007-09-29 08:00:41 +0200 (Sat, 29 Sep 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/configure.in NUM_VERSION defines for Xapian and DBus. ------------------------------------------------------------------------ r997 | fabricecolin | 2007-09-27 14:36:15 +0200 (Thu, 27 Sep 2007) | 5 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh Query history button to show a stored query's latest results, for all engines it's been run against, as found in the history database. QueryingThread specializes into EngineQuery and EngineHistory sub-classes. ResultsTree removes entries from previous queries if necessary. ------------------------------------------------------------------------ r996 | fabricecolin | 2007-09-27 14:32:02 +0200 (Thu, 27 Sep 2007) | 2 lines Changed paths: M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/QueryHistory.h Overloaded deleteItems(), added getEngines(). ------------------------------------------------------------------------ r995 | fabricecolin | 2007-09-26 17:01:39 +0200 (Wed, 26 Sep 2007) | 2 lines Changed paths: M /trunk/README Updated blurb about filters. Talk about ranges. 
------------------------------------------------------------------------ r994 | fabricecolin | 2007-09-26 16:59:46 +0200 (Wed, 26 Sep 2007) | 3 lines Changed paths: M /trunk/Search/SearchEngineInterface.cpp M /trunk/Search/SearchEngineInterface.h M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc QueryingThread and ExpandQueryThread made assumptions about how index stuff works that are better encapsulated in SearchEngine classes. ------------------------------------------------------------------------ r993 | fabricecolin | 2007-09-26 16:09:53 +0200 (Wed, 26 Sep 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog_glade.cc M /trunk/UI/GTK2/src/queryDialog_glade.hh Show the results sort order in queryDialog, save it in PinotSettings. ------------------------------------------------------------------------ r992 | fabricecolin | 2007-09-26 15:58:15 +0200 (Wed, 26 Sep 2007) | 4 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Search/Google/GoogleAPIEngine.cpp M /trunk/Search/Google/GoogleAPIEngine.h M /trunk/Search/QueryProperties.cpp M /trunk/Search/QueryProperties.h M /trunk/Search/SearchEngineFactory.cpp M /trunk/Search/SearchEngineInterface.cpp M /trunk/Search/SearchEngineInterface.h M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h QueryProperties has a sort order (relevance or date). XapianIndex stores date and time in value 4 for that purpose. Removed setKey() from engines interface, it's only relevant to the Google API. 
------------------------------------------------------------------------ r991 | fabricecolin | 2007-09-25 15:40:00 +0200 (Tue, 25 Sep 2007) | 6 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh Backtracked previous UI changes. I realized that combining queries is not that useful. Users can do that manually with copy-and-paste already. Now there's a Search Again For menuitem that runs a specific query against currently selected (indexed) results. Removed the Edit query button, edits are done on a double-click. Added tooltips. ------------------------------------------------------------------------ r990 | fabricecolin | 2007-09-25 15:32:14 +0200 (Tue, 25 Sep 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h Synced with changes to query expansion. ------------------------------------------------------------------------ r989 | fabricecolin | 2007-09-25 15:29:33 +0200 (Tue, 25 Sep 2007) | 3 lines Changed paths: M /trunk/Search/SearchEngineInterface.cpp M /trunk/Search/SearchEngineInterface.h M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h Reworked query expansion a tad. We now use an ExpandDecider, and capitalized terms as well as subject terms are not skipped. ------------------------------------------------------------------------ r988 | fabricecolin | 2007-09-24 14:50:16 +0200 (Mon, 24 Sep 2007) | 5 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh Implemented search in results. Rather than having the user select results, the selected query is combined with the one whose results are being shown. The Buttons next to the queries list are replaced with ToolButtons and a MenuToolButton. 
------------------------------------------------------------------------ r987 | fabricecolin | 2007-09-23 16:26:50 +0200 (Sun, 23 Sep 2007) | 3 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_TW.po Synced with Rosetta. This includes updates to es.po by Jesus Tramullas, and to de.po by Andreas Meyer. ------------------------------------------------------------------------ r986 | fabricecolin | 2007-09-23 13:33:08 +0200 (Sun, 23 Sep 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.h Previous commit should have included this file too. ------------------------------------------------------------------------ r985 | fabricecolin | 2007-09-23 13:19:37 +0200 (Sun, 23 Sep 2007) | 5 lines Changed paths: M /trunk/Search/Google/GoogleAPIEngine.cpp M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/QueryProperties.cpp M /trunk/Search/QueryProperties.h M /trunk/Search/WebEngine.cpp M /trunk/Search/WebEngine.h M /trunk/Search/XapianEngine.cpp M /trunk/Tokenize/Tokenizer.cpp M /trunk/Tokenize/Tokenizer.h M /trunk/UI/GTK2/src/WorkerThreads.cpp Don't bother extracting host and file filters from queries and applying them to Web results. A better way to do it would be to pass them directly to the Web engine, as most of them support those filters natively. Hopefully, this will be implemented soon-ish. ------------------------------------------------------------------------ r984 | fabricecolin | 2007-09-23 12:30:25 +0200 (Sun, 23 Sep 2007) | 5 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Search/XapianEngine.cpp M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog_glade.cc M /trunk/Utils/TimeConverter.cpp M /trunk/Utils/TimeConverter.h Support for time ranges (value 3). Tweaked the query properties dialog. 
In XapianEngine, get Enquire::get_mset() to check at least maxResultsCount + 1 so that the total results estimate is determined correctly if between 0 to maxResultsCount. ------------------------------------------------------------------------ r983 | fabricecolin | 2007-09-23 12:22:50 +0200 (Sun, 23 Sep 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Better check whether we actually obtained a Monitor. ------------------------------------------------------------------------ r982 | fabricecolin | 2007-09-21 18:51:46 +0200 (Fri, 21 Sep 2007) | 2 lines Changed paths: M /trunk/TODO Removed a bunch of stuff, added some more. ------------------------------------------------------------------------ r981 | fabricecolin | 2007-09-21 18:45:10 +0200 (Fri, 21 Sep 2007) | 2 lines Changed paths: M /trunk/Utils/StringManip.cpp M /trunk/Utils/StringManip.h Methods replaceEntities() and removeCharacters() were moved elsewhere. ------------------------------------------------------------------------ r980 | fabricecolin | 2007-09-21 18:41:09 +0200 (Fri, 21 Sep 2007) | 6 lines Changed paths: M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h MIMEScanner::initialize() takes prefixes (eg /usr /home/bozo/.local) instead of directories. Method listConfigurationFiles() returns files to monitor for modifications. Those are monitored by the UI and modifications trigger a re-initialization of MIMEScanner. Moved MonitorThread from ServerThreads to WorkerThreads. 
------------------------------------------------------------------------ r979 | fabricecolin | 2007-09-19 17:36:45 +0200 (Wed, 19 Sep 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Don't refer to XapianEngine directly. ------------------------------------------------------------------------ r978 | fabricecolin | 2007-09-19 17:34:50 +0200 (Wed, 19 Sep 2007) | 3 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/FilterUtils.h M /trunk/UI/GTK2/src/WorkerThreads.cpp Just like with getFilter(), we need a version of isSupportedType() that knows about parent types. ------------------------------------------------------------------------ r977 | fabricecolin | 2007-09-19 17:32:19 +0200 (Wed, 19 Sep 2007) | 5 lines Changed paths: M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h Support for reinitialization. Protect the caches list with a read/write lock as it can be emptied or appended to while being read. Prevent getDefaultActions() from returning the same action more than once. ------------------------------------------------------------------------ r976 | fabricecolin | 2007-09-12 17:03:04 +0200 (Wed, 12 Sep 2007) | 2 lines Changed paths: M /trunk/configure.in Help output wasn't pretty. Reported by Reuben Thomas. ------------------------------------------------------------------------ r975 | fabricecolin | 2007-09-12 16:49:24 +0200 (Wed, 12 Sep 2007) | 5 lines Changed paths: M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/QueryHistory.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh In importDialog, the location field suggests URLs pulled from QueryHistory, based on what the user entered, similarly to terms suggestion on the live query field. Removed unnecessary match method in mainWindow. 
------------------------------------------------------------------------ r974 | fabricecolin | 2007-09-11 15:54:26 +0200 (Tue, 11 Sep 2007) | 2 lines Changed paths: M /trunk/Makefile.am Use mkdir, not mkinstalldirs. ------------------------------------------------------------------------ r973 | fabricecolin | 2007-09-11 15:52:46 +0200 (Tue, 11 Sep 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ServerThreads.h Always check DBUS_VERSION before defining DBUS_API_SUBJECT_TO_CHANGE ! ------------------------------------------------------------------------ r972 | fabricecolin | 2007-09-11 15:42:26 +0200 (Tue, 11 Sep 2007) | 3 lines Changed paths: M /trunk/Index/FilterWrapper.cpp M /trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/FilterUtils.h M /trunk/UI/GTK2/src/WorkerThreads.cpp If a type has no filter defined, FilterUtils::getFilter() tries its parent types, obtained from MIMEScanner. ------------------------------------------------------------------------ r971 | fabricecolin | 2007-09-11 15:28:42 +0200 (Tue, 11 Sep 2007) | 9 lines Changed paths: M /trunk/Index/pinot-index.cpp M /trunk/Search/pinot-search.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h Applied patch by Lee Marks. The contents of defaults.list has priority over that of mimeinfo.cache. User-specific settings found in ~/.local -which may point to desktop files in the same directory- are loaded before system-wide settings. Desktop files are cached for the duration of load() to avoid having to read the same file several times. While pinot and pinot-dbus-daemon fully initialize MIMEScanner, pinot-index and pinot-search don't, as they only need to query files' types, not launch viewers. Also added a getParentTypes() method that will be useful for filtering. 
------------------------------------------------------------------------ r970 | fabricecolin | 2007-09-11 14:53:12 +0200 (Tue, 11 Sep 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Method getHomeDirectory() returns ~ as a last resort if unable to determine the user's home. ------------------------------------------------------------------------ r969 | fabricecolin | 2007-09-11 14:48:02 +0200 (Tue, 11 Sep 2007) | 3 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Search/XapianEngine.cpp M /trunk/UI/GTK2/src/queryDialog.cc Store documents' size, in bytes, in value 2. Ranges with suffix 'b' will be applied on that value. ------------------------------------------------------------------------ r968 | fabricecolin | 2007-09-08 13:04:46 +0200 (Sat, 08 Sep 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/Makefile.am D /trunk/UI/GTK2/src/dateDialog.cc D /trunk/UI/GTK2/src/dateDialog.hh D /trunk/UI/GTK2/src/dateDialog_glade.cc D /trunk/UI/GTK2/src/dateDialog_glade.hh M /trunk/po/POTFILES.in Trimmed down queryDialog, obsoleted dateDialog. ------------------------------------------------------------------------ r967 | fabricecolin | 2007-09-08 12:58:49 +0200 (Sat, 08 Sep 2007) | 8 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Search/QueryProperties.cpp M /trunk/Search/QueryProperties.h M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog.hh M /trunk/UI/GTK2/src/queryDialog_glade.cc M /trunk/UI/GTK2/src/queryDialog_glade.hh M /trunk/Utils/StringManip.cpp M /trunk/Utils/StringManip.h Date ranges are now part of the query string in the form "yyyymmdd..yyyymmdd" and processed with a DateValueRangeProcessor. Users can use the query dialog's new "Date range" filter. 
D, M and Y terms need not be generated for each document. If Xapian > 1.0.2, spelling correction is enabled. Whenever a query on an index doesn't return anything, the UI creates a new query (prefixed with "Corrected ") that suggests replacement terms. ------------------------------------------------------------------------ r966 | fabricecolin | 2007-09-04 14:14:42 +0200 (Tue, 04 Sep 2007) | 3 lines Changed paths: M /trunk M /trunk/TODO Replaced Search/xesam and Tokenize/filters with externals. More stuff to do... ------------------------------------------------------------------------ r965 | fabricecolin | 2007-09-04 14:07:18 +0200 (Tue, 04 Sep 2007) | 2 lines Changed paths: D /trunk/Search/xesam D /trunk/Tokenize/filters Will replace these directories with externals. ------------------------------------------------------------------------ r964 | fabricecolin | 2007-09-04 14:00:45 +0200 (Tue, 04 Sep 2007) | 2 lines Changed paths: M /trunk/pinot.spec.in Install the FAQ file. ------------------------------------------------------------------------ r963 | fabricecolin | 2007-09-04 13:33:47 +0200 (Tue, 04 Sep 2007) | 2 lines Changed paths: A /trunk/ChangeLog A /trunk/INSTALL These two files should be in the source tree. ------------------------------------------------------------------------ r962 | fabricecolin | 2007-09-04 13:30:21 +0200 (Tue, 04 Sep 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Show external indexes in the List Contents Of menu, repopulate when one is added/edited/removed. Properties of documents from external indexes are -for the time being- not shown. ------------------------------------------------------------------------ r961 | fabricecolin | 2007-09-03 14:14:49 +0200 (Mon, 03 Sep 2007) | 2 lines Changed paths: A /trunk/FAQ M /trunk/Makefile.am M /trunk/README Moved FAQ to separate file. 
------------------------------------------------------------------------ r960 | fabricecolin | 2007-09-03 13:12:38 +0200 (Mon, 03 Sep 2007) | 4 lines Changed paths: M /trunk/Index/XapianIndex.cpp With Xapian 1.0, we can list all documents by getting a postlist of the empty term. This will let us browse indexes that don't have the "magic term", i.e. indexes not built with Pinot. ------------------------------------------------------------------------ r959 | fabricecolin | 2007-09-01 07:12:21 +0200 (Sat, 01 Sep 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/EnginesTree.h M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotUtils.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/queryDialog.cc Fixed warnings about shadowed variables. ------------------------------------------------------------------------ r958 | fabricecolin | 2007-09-01 07:07:38 +0200 (Sat, 01 Sep 2007) | 2 lines Changed paths: M /trunk/Search/OpenSearchParser.cpp M /trunk/Utils/StringManip.cpp M /trunk/Utils/Url.cpp Fixed variable shadowing. ------------------------------------------------------------------------ r957 | fabricecolin | 2007-09-01 07:05:22 +0200 (Sat, 01 Sep 2007) | 2 lines Changed paths: M /trunk/pinot.spec.in No need to list files under %{_datadir}/pinot. ------------------------------------------------------------------------ r956 | fabricecolin | 2007-09-01 05:59:45 +0200 (Sat, 01 Sep 2007) | 2 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/UI/GTK2/src/pinot.cc Include config.h before checking DBUS_VERSION as it's defined there. 
------------------------------------------------------------------------ r955 | fabricecolin | 2007-09-01 05:48:49 +0200 (Sat, 01 Sep 2007) | 2 lines Changed paths: M /trunk/Index/Makefile.am M /trunk/Search/Makefile.am Applied patch by Gabriel C to fix dependencies, and hopefully, SMP builds. ------------------------------------------------------------------------ r954 | fabricecolin | 2007-08-26 16:13:41 +0200 (Sun, 26 Aug 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Catch exceptions in save(). Most likely, someone deletes ~/.pinot while the application is running. ------------------------------------------------------------------------ r953 | fabricecolin | 2007-08-26 15:31:40 +0200 (Sun, 26 Aug 2007) | 2 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp Parameter to Reload wasn't a glib type. This could cause a segfault. ------------------------------------------------------------------------ r951 | fabricecolin | 2007-08-22 17:52:57 +0200 (Wed, 22 Aug 2007) | 2 lines Changed paths: M /trunk/NEWS M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_TW.po Checking in latest po's and the NEWS file in preparation for 0.76 release. ------------------------------------------------------------------------ r950 | fabricecolin | 2007-08-22 16:18:52 +0200 (Wed, 22 Aug 2007) | 2 lines Changed paths: M /trunk/Search/pinot-search.1 M /trunk/Search/pinot-search.cpp Corrected example.
------------------------------------------------------------------------ r949 | fabricecolin | 2007-08-22 16:17:43 +0200 (Wed, 22 Aug 2007) | 3 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/configure.in Use strings for version numbers so that we don't have to worry about the decimal point. ------------------------------------------------------------------------ r948 | fabricecolin | 2007-08-21 16:24:36 +0200 (Tue, 21 Aug 2007) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po M /trunk/po/zh_TW.po Updated translations. ------------------------------------------------------------------------ r947 | fabricecolin | 2007-08-21 16:19:25 +0200 (Tue, 21 Aug 2007) | 2 lines Changed paths: M /trunk/scripts/bash/pinot-enum-index.sh List documents whose scheme is not file://. ------------------------------------------------------------------------ r946 | fabricecolin | 2007-08-20 16:10:50 +0200 (Mon, 20 Aug 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc Very minor mod. ------------------------------------------------------------------------ r945 | fabricecolin | 2007-08-20 16:04:32 +0200 (Mon, 20 Aug 2007) | 2 lines Changed paths: M /trunk/Index/pinot-index.1 M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Upped version number, updated man pages. 
------------------------------------------------------------------------ r944 | fabricecolin | 2007-08-20 16:01:07 +0200 (Mon, 20 Aug 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog.hh Leave queries' details in UTF-8. ------------------------------------------------------------------------ r943 | fabricecolin | 2007-08-20 15:53:18 +0200 (Mon, 20 Aug 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.cpp Disable spelling suggestions for the time being, it's not yet used. ------------------------------------------------------------------------ r942 | fabricecolin | 2007-08-19 16:12:19 +0200 (Sun, 19 Aug 2007) | 2 lines Changed paths: M /trunk/pinot.spec.in Some mods imported from Fedora 7's spec file. ------------------------------------------------------------------------ r941 | fabricecolin | 2007-08-19 16:10:03 +0200 (Sun, 19 Aug 2007) | 3 lines Changed paths: M /trunk/AUTHORS Updated contributors list. Lee suggested giving priority to local MIME settings and Adel brought up some issues with packaging. ------------------------------------------------------------------------ r940 | fabricecolin | 2007-08-19 16:07:51 +0200 (Sun, 19 Aug 2007) | 3 lines Changed paths: M /trunk/README M /trunk/TODO Added blurb about the location of language models to the FAQ. Removed some items from the TODO, added some more... ------------------------------------------------------------------------ r939 | fabricecolin | 2007-08-19 12:37:21 +0200 (Sun, 19 Aug 2007) | 3 lines Changed paths: M /trunk/Search/WebEngine.cpp Get the charset from the document's type, as set by the collector, or failing that from the HTML filter. 
------------------------------------------------------------------------ r938 | fabricecolin | 2007-08-17 18:17:08 +0200 (Fri, 17 Aug 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h Don't open ViewHistory every time appendResult() is called. ------------------------------------------------------------------------ r937 | fabricecolin | 2007-08-17 17:54:00 +0200 (Fri, 17 Aug 2007) | 4 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h Spelling suggestions are not supported by all backends, eg Quartz, so try indexing again without populating the spelling database if we get an UnimplementedError. ------------------------------------------------------------------------ r936 | fabricecolin | 2007-08-17 16:54:40 +0200 (Fri, 17 Aug 2007) | 2 lines Changed paths: M /trunk/Index/LanguageDetector.cpp M /trunk/Index/Makefile.am M /trunk/configure.in Look for textcat.h in libtextcat. ------------------------------------------------------------------------ r935 | fabricecolin | 2007-08-16 18:47:24 +0200 (Thu, 16 Aug 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Forgot case where launcherDialog is used. ------------------------------------------------------------------------ r934 | fabricecolin | 2007-08-16 18:31:13 +0200 (Thu, 16 Aug 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Minor mod. ------------------------------------------------------------------------ r933 | fabricecolin | 2007-08-16 18:30:37 +0200 (Thu, 16 Aug 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/Utils/MIMEScanner.cpp Make sure getDefaultActions() really provided a MIMEAction. 
------------------------------------------------------------------------ r932 | fabricecolin | 2007-08-16 18:28:32 +0200 (Thu, 16 Aug 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.h Unified handling of moving and deleting a file or directory. ------------------------------------------------------------------------ r931 | fabricecolin | 2007-08-16 18:26:35 +0200 (Thu, 16 Aug 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh The label was ignored. ------------------------------------------------------------------------ r930 | fabricecolin | 2007-08-16 18:25:04 +0200 (Thu, 16 Aug 2007) | 4 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h Merged listDocuments...() methods into one. XapianIndex adds an extra XFILE:-prefixed term to local documents so that the file monitor can list or delete stuff without having to worry about the protocol. ------------------------------------------------------------------------ r929 | fabricecolin | 2007-08-15 15:58:38 +0200 (Wed, 15 Aug 2007) | 3 lines Changed paths: M /trunk/pinot.spec.in Simplified files section and added find_lang macro, as per Adel Gadllah's spec for Fedora 7. ------------------------------------------------------------------------ r928 | fabricecolin | 2007-08-15 15:50:14 +0200 (Wed, 15 Aug 2007) | 2 lines Changed paths: M /trunk/Utils/CommandLine.cpp Sanity check. ------------------------------------------------------------------------ r927 | fabricecolin | 2007-08-15 15:49:17 +0200 (Wed, 15 Aug 2007) | 8 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Preliminary support for queries' spelling correction. No user feedback yet. When a stored query is selected, scroll to it in the list. 
A query with no text is not necessarily empty (eg, if a start date is provided) so check isEmpty() to decide if a query can be run. Get all programs associated with a file type, but only use the first one, which should have the highest priority. Eventually, we'll pop up a menu to let the user select. ------------------------------------------------------------------------ r926 | fabricecolin | 2007-08-15 15:44:40 +0200 (Wed, 15 Aug 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h Support for spelling correction. ------------------------------------------------------------------------ r925 | fabricecolin | 2007-08-15 15:43:18 +0200 (Wed, 15 Aug 2007) | 5 lines Changed paths: M /trunk/Index/pinot-index.cpp M /trunk/Search/pinot-search.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h Initialize MIMEScanner with system-wide and if available user-specific (~/.local) mimeinfo.cache, the latter's settings having priority. All the programs associated with a MIME type are loaded by initialize() and returned by getDefaultActions(). ------------------------------------------------------------------------ r924 | fabricecolin | 2007-08-15 15:38:32 +0200 (Wed, 15 Aug 2007) | 3 lines Changed paths: M /trunk/Index/XapianIndex.cpp Create a CACHEDIR.TAG file when the index is versioned. This tells archivers (for instance "tar --exclude-caches") to skip the index directory. 
------------------------------------------------------------------------ r923 | fabricecolin | 2007-08-14 17:33:43 +0200 (Tue, 14 Aug 2007) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/configure.in M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po A /trunk/po/zh_TW.po Synced with Rosetta's current po's, including new zh_TW translation by Yung-Chung Lin. ------------------------------------------------------------------------ r922 | fabricecolin | 2007-08-14 16:42:58 +0200 (Tue, 14 Aug 2007) | 2 lines Changed paths: M /trunk/TODO Current status. ------------------------------------------------------------------------ r921 | fabricecolin | 2007-08-14 16:40:52 +0200 (Tue, 14 Aug 2007) | 2 lines Changed paths: M /trunk/Utils/Document.cpp Try to open files with O_NOATIME. ------------------------------------------------------------------------ r920 | fabricecolin | 2007-08-14 16:39:33 +0200 (Tue, 14 Aug 2007) | 2 lines Changed paths: M /trunk/Search/pinot-search.cpp Call setDefaultOperator(). ------------------------------------------------------------------------ r919 | fabricecolin | 2007-08-14 16:38:52 +0200 (Tue, 14 Aug 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog.hh Add a separator in the filters list between those that apply to all engines and those that don't. ------------------------------------------------------------------------ r918 | fabricecolin | 2007-08-14 14:02:19 +0200 (Tue, 14 Aug 2007) | 2 lines Changed paths: M /trunk/scripts/python/pinot-live.py Removed shebang. ------------------------------------------------------------------------ r917 | fabricecolin | 2007-08-14 14:00:19 +0200 (Tue, 14 Aug 2007) | 3 lines Changed paths: M /trunk/Search/SherlockParser.cpp Tentative fix for boost 1.34. Plugins may not be parsed in full. 
This doesn't prevent the parser from extracting the required information. ------------------------------------------------------------------------ r916 | fabricecolin | 2007-08-13 16:25:09 +0200 (Mon, 13 Aug 2007) | 3 lines Changed paths: M /trunk/README Talk about the new Reload method, the search strategy. An explanation of the "ext" filter was missing. ------------------------------------------------------------------------ r915 | fabricecolin | 2007-08-08 17:24:54 +0200 (Wed, 08 Aug 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/OnDiskHandler.cpp Directories were not unindexed when deleted, only their contents were. Similarly, they were not updated when moved. They are now, and the title is set to the new location too, unless it was edited by the user. ------------------------------------------------------------------------ r914 | fabricecolin | 2007-08-08 16:00:31 +0200 (Wed, 08 Aug 2007) | 3 lines Changed paths: M /trunk/Index/FilterWrapper.cpp M /trunk/Index/XapianIndex.cpp FilterWrapper should preserve/use the provided title for the top-level document. Less DEBUG output in XapianIndex. ------------------------------------------------------------------------ r913 | fabricecolin | 2007-08-08 15:58:12 +0200 (Wed, 08 Aug 2007) | 3 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/INotifyMonitor.h A DELETE event would cause the monitor to deadlock and therefore stop picking up new events. ------------------------------------------------------------------------ r912 | fabricecolin | 2007-08-06 16:32:57 +0200 (Mon, 06 Aug 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml Reload returns a boolean to indicate if the configuration is being reloaded. 
------------------------------------------------------------------------ r911 | fabricecolin | 2007-08-06 16:30:46 +0200 (Mon, 06 Aug 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h Don't keep track of what's being crawled, get that information from the thread. ------------------------------------------------------------------------ r910 | fabricecolin | 2007-08-06 16:29:14 +0200 (Mon, 06 Aug 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h There's no need to activate all parsing options when validating a query. ------------------------------------------------------------------------ r909 | fabricecolin | 2007-08-06 16:26:37 +0200 (Mon, 06 Aug 2007) | 2 lines Changed paths: M /trunk/Index/XapianDatabase.cpp M /trunk/Index/XapianDatabase.h M /trunk/Index/XapianIndex.cpp Fixed deadlock introduced by previous check-in, added some extra DEBUG code. ------------------------------------------------------------------------ r908 | fabricecolin | 2007-08-06 16:23:47 +0200 (Mon, 06 Aug 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Fixed D-Bus error handling. ------------------------------------------------------------------------ r907 | fabricecolin | 2007-08-05 13:06:51 +0200 (Sun, 05 Aug 2007) | 6 lines Changed paths: M /trunk/Search/SearchEngineInterface.cpp M /trunk/Search/SearchEngineInterface.h M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h M /trunk/UI/GTK2/src/ServerThreads.cpp Altered XapianEngine multi-step search code. Steps 3 and 4 are gone, the original behaviour can be reproduced by changing the default operator and calling runQuery() again. That's what the DBusServlet thread does for queries received through SimpleQuery. Spelling corrections are available. 
------------------------------------------------------------------------ r906 | fabricecolin | 2007-08-04 17:41:23 +0200 (Sat, 04 Aug 2007) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/Search/XapianEngine.cpp Support for spelling suggestions. No feedback is given yet. ------------------------------------------------------------------------ r905 | fabricecolin | 2007-08-04 07:41:25 +0200 (Sat, 04 Aug 2007) | 2 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/Search/XapianEngine.cpp M /trunk/Tokenize/FilterUtils.cpp M /trunk/Tokenize/Tokenizer.h Minor fixes and mods. ------------------------------------------------------------------------ r904 | fabricecolin | 2007-08-04 07:29:54 +0200 (Sat, 04 Aug 2007) | 3 lines Changed paths: M /trunk/README M /trunk/pinot.spec.in gSOAP 2.7.9e is required. Updated the spec's Summary for the main and audio-docs packages. ------------------------------------------------------------------------ r903 | fabricecolin | 2007-08-04 07:24:05 +0200 (Sat, 04 Aug 2007) | 5 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/FilterWrapper.cpp M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/Index/pinot-index.cpp M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp Replaced our naive Tokenizer class with Xapian 1.0 TermGenerator. When removing common terms, make sure we remove as much as necessary and not just the very first posting. Dropped method setStemmingMode(). ------------------------------------------------------------------------ r902 | fabricecolin | 2007-08-04 06:43:43 +0200 (Sat, 04 Aug 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/importDialog.cc Clear the progress bar's text when importing. 
------------------------------------------------------------------------ r901 | fabricecolin | 2007-08-03 19:53:16 +0200 (Fri, 03 Aug 2007) | 2 lines Changed paths: M /trunk/Collect/NeonDownloader.cpp Neon downloader had not been updated after the move to Dijon filters. ------------------------------------------------------------------------ r900 | fabricecolin | 2007-08-03 19:34:12 +0200 (Fri, 03 Aug 2007) | 3 lines Changed paths: M /trunk/Search/Google/GAPIC.cpp M /trunk/Search/Google/GAPIClient.cpp M /trunk/Search/Google/GAPIClientLib.cpp M /trunk/Search/Google/GAPIGoogleSearchBindingProxy.h M /trunk/Search/Google/GAPIH.h M /trunk/Search/Google/GAPIStub.h M /trunk/Search/Google/GoogleSearch.h M /trunk/Search/SOAPEnvH.h M /trunk/Search/SOAPEnvStub.h M /trunk/Search/pinot-search.cpp Regenerated these files with gSOAP 2.7.9e and fixed GAPIC.cpp manually as in revision 189. ------------------------------------------------------------------------ r899 | fabricecolin | 2007-08-03 16:52:01 +0200 (Fri, 03 Aug 2007) | 3 lines Changed paths: M /trunk/Search/XapianEngine.cpp Fixed bug that prevented running queries that only define a min and/or max date, and no query string. ------------------------------------------------------------------------ r898 | fabricecolin | 2007-08-03 16:51:09 +0200 (Fri, 03 Aug 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/queryDialog_glade.cc Combos shouldn't expand when the window they are in is enlarged. ------------------------------------------------------------------------ r897 | fabricecolin | 2007-08-01 15:55:07 +0200 (Wed, 01 Aug 2007) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp Prefix wasn't applied correctly in listDocumentsInDirectory().
------------------------------------------------------------------------ r896 | fabricecolin | 2007-08-01 15:51:01 +0200 (Wed, 01 Aug 2007) | 4 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml New D-Bus method Reload asks the daemon to reload its configuration and restart crawling and monitoring. The UI now uses this method instead of GetStatistics when the Preferences box' OK button is pressed. ------------------------------------------------------------------------ r894 | fabricecolin | 2007-07-27 18:43:37 +0200 (Fri, 27 Jul 2007) | 2 lines Changed paths: M /trunk/NEWS M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po Updated with latest changes. ------------------------------------------------------------------------ r893 | fabricecolin | 2007-07-27 18:29:15 +0200 (Fri, 27 Jul 2007) | 3 lines Changed paths: M /trunk/Index/FilterWrapper.cpp Don't stop going through nested documents just before the current one needs to be passed to another filter, and assign it the right MIME type. ------------------------------------------------------------------------ r892 | fabricecolin | 2007-07-27 17:07:52 +0200 (Fri, 27 Jul 2007) | 2 lines Changed paths: M /trunk/README When resetting the history, it's ~/.pinot/history-daemon that matters. 
------------------------------------------------------------------------ r891 | fabricecolin | 2007-07-27 16:36:32 +0200 (Fri, 27 Jul 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/propertiesDialog.cc Modified get_column_height() hack to do without the font's height. ------------------------------------------------------------------------ r890 | fabricecolin | 2007-07-26 16:28:55 +0200 (Thu, 26 Jul 2007) | 2 lines Changed paths: M /trunk/NEWS M /trunk/TODO M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po Final touch to 0.75. ------------------------------------------------------------------------ r889 | fabricecolin | 2007-07-26 16:24:35 +0200 (Thu, 26 Jul 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp Another very minor mod. ------------------------------------------------------------------------ r888 | fabricecolin | 2007-07-26 15:31:23 +0200 (Thu, 26 Jul 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.h Minor fix to get rid of compilation warning. ------------------------------------------------------------------------ r887 | fabricecolin | 2007-07-26 13:52:56 +0200 (Thu, 26 Jul 2007) | 2 lines Changed paths: M /trunk/README Removed reference to mbox configuration. ------------------------------------------------------------------------ r886 | fabricecolin | 2007-07-25 16:43:23 +0200 (Wed, 25 Jul 2007) | 4 lines Changed paths: M /trunk/Index/pinot-index.1 M /trunk/Makefile.am M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in M /trunk/pinot.spec.in Set version number to 0.75. Indexes built with older versions will be upgraded. New file ChangeLog-dijon is installed, Ask.src isn't any more. Updated manual pages. 
------------------------------------------------------------------------ r885 | fabricecolin | 2007-07-25 15:50:56 +0200 (Wed, 25 Jul 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/IndexPage.cpp Modified label for default query filter. ------------------------------------------------------------------------ r884 | fabricecolin | 2007-07-25 15:42:03 +0200 (Wed, 25 Jul 2007) | 2 lines Changed paths: M /trunk/Search/QueryProperties.cpp M /trunk/Search/QueryProperties.h M /trunk/Search/XapianEngine.cpp XapianEngine::runQuery() checks whether the query is empty. ------------------------------------------------------------------------ r883 | fabricecolin | 2007-07-25 15:40:02 +0200 (Wed, 25 Jul 2007) | 2 lines Changed paths: D /trunk/Search/Plugins/Ask.src M /trunk/Search/Plugins/Exalead.src Updated Exalead, removed Ask. ------------------------------------------------------------------------ r882 | fabricecolin | 2007-07-25 15:37:17 +0200 (Wed, 25 Jul 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc If a ListerThread fails, reset the m_browsingIndex flag. Cosmetic mods to WorkerThreads. ------------------------------------------------------------------------ r881 | fabricecolin | 2007-07-24 15:34:36 +0200 (Tue, 24 Jul 2007) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po Current translations, including updates from Balaam's Miracle (Dutch) and _PN_boy (Portuguese). ------------------------------------------------------------------------ r880 | fabricecolin | 2007-07-24 15:00:35 +0200 (Tue, 24 Jul 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc No point translating the copyright string. 
------------------------------------------------------------------------ r879 | fabricecolin | 2007-07-24 14:21:42 +0200 (Tue, 24 Jul 2007) | 2 lines Changed paths: M /trunk/COPYING M /trunk/Index/pinot-index.cpp M /trunk/Search/pinot-search.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc Clarify that the license is GPL v2 and update the FSF's address in COPYING ! ;-) ------------------------------------------------------------------------ r878 | fabricecolin | 2007-07-24 14:01:29 +0200 (Tue, 24 Jul 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.cpp Provide a field mapping to XapianQueryBuilder. ------------------------------------------------------------------------ r877 | fabricecolin | 2007-07-23 16:29:59 +0200 (Mon, 23 Jul 2007) | 2 lines Changed paths: M /trunk/README M /trunk/pinot.spec.in Upped requirements on Xapian and SQLite. ------------------------------------------------------------------------ r876 | fabricecolin | 2007-07-21 06:28:21 +0200 (Sat, 21 Jul 2007) | 2 lines Changed paths: M /trunk/po/POTFILES.in MboxHandler was removed. ------------------------------------------------------------------------ r875 | fabricecolin | 2007-07-21 06:26:57 +0200 (Sat, 21 Jul 2007) | 3 lines Changed paths: M /trunk/README Talk about how mbox files are handled now, file types and external-filters.xml, as well as the new patterns list. 
------------------------------------------------------------------------ r874 | fabricecolin | 2007-07-21 06:17:51 +0200 (Sat, 21 Jul 2007) | 7 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/Makefile.am D /trunk/UI/GTK2/src/MboxHandler.cpp D /trunk/UI/GTK2/src/MboxHandler.h M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/prefsDialog_glade.hh Mbox files are no longer a special case, they don't need to be configured separately. Any mbox file found during a crawl should be handled correctly; same for any other file type that has sub-documents. The file patterns list may be used as a blacklist or a whitelist, depending on preferences. Some other minor changes. ------------------------------------------------------------------------ r873 | fabricecolin | 2007-07-21 06:09:30 +0200 (Sat, 21 Jul 2007) | 3 lines Changed paths: M /trunk/Search/AbstractGenerator.cpp M /trunk/Search/Google/Makefile.am M /trunk/Search/pinot-search.cpp Pinot-search outputs type and language too. Some other minor changes. ------------------------------------------------------------------------ r872 | fabricecolin | 2007-07-21 06:06:17 +0200 (Sat, 21 Jul 2007) | 5 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/FilterWrapper.cpp M /trunk/Index/FilterWrapper.h M /trunk/Index/IndexInterface.h M /trunk/Index/LanguageDetector.cpp M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/Index/pinot-index.cpp An XFILE-prefixed term is added to local documents that are sub-documents of another. 
FilterWrapper unindexes them whenever the top document is updated or unindexed. IndexInterface's unindexDocuments() was extended accordingly. Some other minor changes. ------------------------------------------------------------------------ r871 | fabricecolin | 2007-07-20 18:55:50 +0200 (Fri, 20 Jul 2007) | 2 lines Changed paths: M /trunk/Utils/Timer.cpp M /trunk/Utils/Timer.h Return milliseconds. ------------------------------------------------------------------------ r869 | fabricecolin | 2007-06-24 08:50:56 +0200 (Sun, 24 Jun 2007) | 2 lines Changed paths: M /trunk/NEWS Releasing 0.74 today. ------------------------------------------------------------------------ r868 | fabricecolin | 2007-06-23 07:49:29 +0200 (Sat, 23 Jun 2007) | 2 lines Changed paths: M /trunk/README Mention that downgrading to Xapian 0.9 requires resetting the indexes. ------------------------------------------------------------------------ r867 | fabricecolin | 2007-06-23 07:46:50 +0200 (Sat, 23 Jun 2007) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po Updated translations. Christian Dywan pointed out that make uninstall doesn't remove everything. ------------------------------------------------------------------------ r866 | fabricecolin | 2007-06-23 07:33:58 +0200 (Sat, 23 Jun 2007) | 2 lines Changed paths: M /trunk/Index/pinot-index.1 M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 Updated man pages. ------------------------------------------------------------------------ r865 | fabricecolin | 2007-06-23 07:29:56 +0200 (Sat, 23 Jun 2007) | 2 lines Changed paths: M /trunk/Makefile.am M /trunk/configure.in Set version to 0.74, replaced SEARCH_LTLIBRARIES hack with a conditional. 
------------------------------------------------------------------------ r864 | fabricecolin | 2007-06-23 07:24:47 +0200 (Sat, 23 Jun 2007) | 2 lines Changed paths: M /trunk/Utils/Timer.cpp Minor mod. ------------------------------------------------------------------------ r863 | fabricecolin | 2007-06-21 13:41:53 +0200 (Thu, 21 Jun 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/statisticsDialog.cc Synced with changes to PinotSettings. ------------------------------------------------------------------------ r862 | fabricecolin | 2007-06-21 13:40:28 +0200 (Thu, 21 Jun 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h Full scans can be forced. ------------------------------------------------------------------------ r861 | fabricecolin | 2007-06-21 13:39:20 +0200 (Thu, 21 Jun 2007) | 5 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc Location of history database is obtained with a getter to allow the daemon to have its own. If it doesn't exist at startup, a copy of the client's is made. The daemon has a --fullscan parameter that forces full scans. 
------------------------------------------------------------------------ r860 | fabricecolin | 2007-06-19 16:09:58 +0200 (Tue, 19 Jun 2007) | 3 lines Changed paths: M /trunk/Search/Google/GoogleAPIEngine.cpp M /trunk/Search/Makefile.am M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/QueryProperties.cpp M /trunk/Search/QueryProperties.h M /trunk/Search/SearchEngineInterface.h M /trunk/Search/SherlockParser.cpp M /trunk/Search/XapianEngine.cpp M /trunk/Search/pinot-search.cpp Initial support for the Xesam Query and User Languages. Both only apply to index queries. ------------------------------------------------------------------------ r859 | fabricecolin | 2007-06-14 15:28:11 +0200 (Thu, 14 Jun 2007) | 2 lines Changed paths: A /trunk/Search/xesam A /trunk/Search/xesam/README Placeholder for the Xesam parsers and related classes. ------------------------------------------------------------------------ r858 | fabricecolin | 2007-06-13 16:19:47 +0200 (Wed, 13 Jun 2007) | 2 lines Changed paths: M /trunk/Search/SherlockParser.cpp M /trunk/TODO Cleanup. ------------------------------------------------------------------------ r857 | fabricecolin | 2007-06-04 14:39:20 +0200 (Mon, 04 Jun 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/prefsDialog.cc Removed unused includes. ------------------------------------------------------------------------ r856 | fabricecolin | 2007-06-02 06:40:05 +0200 (Sat, 02 Jun 2007) | 5 lines Changed paths: M /trunk/Index/XapianIndex.cpp Adopted Xapian 1.0's indexing strategy to make the most of the QueryParser. This means that Pinot 0.73 and older will give better results for those still using Xapian 0.9. See http://www.xapian.org/docs/termgenerator.html for details. 
------------------------------------------------------------------------ r855 | fabricecolin | 2007-05-31 09:49:38 +0200 (Thu, 31 May 2007) | 5 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Search/XapianEngine.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/Utils/Languages.cpp M /trunk/textcat_conf.txt Hungarian, Romanian and Turkish are supported by Xapian 1.0's stemmers. Libtextcat 2.2 has Language Models for them but not 3.0 yet. Xapian::Stem is supposed to throw an exception for languages it doesn't know about, so this change should be backward-compatible with Xapian < 1.0. ------------------------------------------------------------------------ r854 | fabricecolin | 2007-05-31 09:45:02 +0200 (Thu, 31 May 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/propertiesDialog.cc The properties dialog shows the document ID in the title bar. ------------------------------------------------------------------------ r853 | fabricecolin | 2007-05-31 09:10:54 +0200 (Thu, 31 May 2007) | 3 lines Changed paths: M /trunk/Makefile.am Added uninstall-local target to remove files installed by install-data-local. This omission was reported by Christian Dywan on May 14th. ------------------------------------------------------------------------ r852 | fabricecolin | 2007-05-29 13:35:47 +0200 (Tue, 29 May 2007) | 3 lines Changed paths: M /trunk/README Specify that gsoap is optional, just like the Google SOAP API. As for openssh-askpass, it's only needed if _SSH_TUNNEL is defined. ------------------------------------------------------------------------ r851 | fabricecolin | 2007-05-29 13:10:54 +0200 (Tue, 29 May 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Fixed previous check-in. 
------------------------------------------------------------------------ r850 | fabricecolin | 2007-05-27 09:34:57 +0200 (Sun, 27 May 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc Set the index ID on results, when in listing mode, to ensure their properties can be shown and edited properly. ------------------------------------------------------------------------ r849 | fabricecolin | 2007-05-26 14:26:07 +0200 (Sat, 26 May 2007) | 2 lines Changed paths: M /trunk/Search/SherlockParser.cpp M /trunk/Search/SherlockParser.h Minor cleanup. ------------------------------------------------------------------------ r848 | fabricecolin | 2007-05-26 06:53:07 +0200 (Sat, 26 May 2007) | 7 lines Changed paths: M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/IndexPage.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Both IndexBrowser and Querying threads can be used to populate index lists; the latter relies on the search engine's total number of results estimate. In IndexPage, the combo now shows all stored queries. They can be applied to the current index just like labels could previously. Stored queries' maximum number of results is then ignored. Synced with changes made to Search. 
------------------------------------------------------------------------ r847 | fabricecolin | 2007-05-26 06:21:17 +0200 (Sat, 26 May 2007) | 4 lines Changed paths: M /trunk/Search/Google/GoogleAPIEngine.cpp M /trunk/Search/Google/GoogleAPIEngine.h M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/PluginWebEngine.h M /trunk/Search/SearchEngineInterface.cpp M /trunk/Search/SearchEngineInterface.h M /trunk/Search/WebEngine.cpp M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h M /trunk/Search/pinot-search.cpp A start document can be passed to runQuery(). Support for total number of results estimates. Removed some cruft. ------------------------------------------------------------------------ r845 | fabricecolin | 2007-05-23 15:31:04 +0200 (Wed, 23 May 2007) | 2 lines Changed paths: M /trunk/NEWS Forgot to mention a few things. ------------------------------------------------------------------------ r844 | fabricecolin | 2007-05-23 15:25:28 +0200 (Wed, 23 May 2007) | 2 lines Changed paths: M /trunk/Index/pinot-index.1 M /trunk/NEWS M /trunk/Search/pinot-search.1 M /trunk/TODO M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 Releasing 0.73. ------------------------------------------------------------------------ r843 | fabricecolin | 2007-05-22 13:53:33 +0200 (Tue, 22 May 2007) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/NEWS M /trunk/README M /trunk/configure.in M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po 0.73 release is close... ------------------------------------------------------------------------ r842 | fabricecolin | 2007-05-22 13:18:08 +0200 (Tue, 22 May 2007) | 2 lines Changed paths: M /trunk/TODO -3 +3 items. 
------------------------------------------------------------------------ r841 | fabricecolin | 2007-05-22 13:16:03 +0200 (Tue, 22 May 2007) | 3 lines Changed paths: M /trunk/Search/XapianEngine.cpp Stemming strategy was messed up. Set FLAG_PURE_NOT if Xapian >= 1.0. ------------------------------------------------------------------------ r840 | fabricecolin | 2007-05-22 13:13:46 +0200 (Tue, 22 May 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Look out for ultra-rare case when SQLite fails to delete an item and ActionQueue::popItem() returns the same thing repeatedly. The root cause is unknown. ------------------------------------------------------------------------ r839 | fabricecolin | 2007-05-19 12:54:06 +0200 (Sat, 19 May 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h Only full scans check for files that have been deleted while the daemon wasn't running, and they should now happen roughly one time out of three. Refactored DBusServletThread a bit, added method runQuery(). ------------------------------------------------------------------------ r838 | fabricecolin | 2007-05-18 15:52:23 +0200 (Fri, 18 May 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc Recommend updating My Web Pages if the index's format is obsolete or it's older than PINOT_INDEX_MIN_VERSION. ------------------------------------------------------------------------ r837 | fabricecolin | 2007-05-18 15:50:26 +0200 (Fri, 18 May 2007) | 5 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc When the index has to be replaced, close and overwrite it instead of deleting all documents. Do an upgrade if the index is older than PINOT_INDEX_MIN_VERSION or its format has been obsoleted. 
------------------------------------------------------------------------ r836 | fabricecolin | 2007-05-18 15:45:32 +0200 (Fri, 18 May 2007) | 3 lines Changed paths: M /trunk/SQL/ActionQueue.cpp M /trunk/SQL/ActionQueue.h M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/QueryHistory.h M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h M /trunk/SQL/ViewHistory.cpp M /trunk/SQL/ViewHistory.h Let the sub-class decide when to open the database. New method CrawlHistory::updateItems(). ------------------------------------------------------------------------ r835 | fabricecolin | 2007-05-18 15:42:48 +0200 (Fri, 18 May 2007) | 2 lines Changed paths: M /trunk/Index/XapianDatabase.cpp Minor change. ------------------------------------------------------------------------ r834 | fabricecolin | 2007-05-12 05:49:33 +0200 (Sat, 12 May 2007) | 4 lines Changed paths: M /trunk/Index/XapianDatabase.cpp M /trunk/Index/XapianDatabase.h M /trunk/Index/XapianDatabaseFactory.cpp M /trunk/Index/XapianDatabaseFactory.h Allow overwriting the index. Preparing for Xapian 1.0 : upon DatabaseVersionError, overwrite the database and let the app know so that it can do a full reindexing. ------------------------------------------------------------------------ r833 | fabricecolin | 2007-05-12 05:46:50 +0200 (Sat, 12 May 2007) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp Preparing for Xapian 1.0 : get_errno() is deprecated and stem_word() is gone. ------------------------------------------------------------------------ r832 | fabricecolin | 2007-05-07 15:54:45 +0200 (Mon, 07 May 2007) | 3 lines Changed paths: M /trunk/configure.in Define PINOT_INDEX_MIN_VERSION, the version number below which an index upgrade should be performed. 
------------------------------------------------------------------------ r831 | fabricecolin | 2007-05-06 04:59:59 +0200 (Sun, 06 May 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc Recommend updating documents in My Web Pages only if that index is not empty. ------------------------------------------------------------------------ r830 | fabricecolin | 2007-05-06 04:58:49 +0200 (Sun, 06 May 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp In flat mode, don't attempt updating the parent group, there isn't any. This fixes a crash when unindexing documents. This bug was reported by Marco . ------------------------------------------------------------------------ r829 | fabricecolin | 2007-05-06 04:15:53 +0200 (Sun, 06 May 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/prefsDialog.cc The Indexing tab has moved to fourth position. ------------------------------------------------------------------------ r827 | fabricecolin | 2007-04-28 04:32:44 +0200 (Sat, 28 Apr 2007) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/NEWS Releasing 0.72 today. ------------------------------------------------------------------------ r826 | fabricecolin | 2007-04-28 03:58:35 +0200 (Sat, 28 Apr 2007) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po Synced with current source. ------------------------------------------------------------------------ r825 | fabricecolin | 2007-04-27 17:52:41 +0200 (Fri, 27 Apr 2007) | 5 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Sometimes when the daemon is started via D-Bus activation and spends too long upgrading the index, it receives a SIGKILL... I have moved the upgrade to DaemonState::start() so that it happens after D-Bus is initialized and before the main loop is run. 
This seems to help. ------------------------------------------------------------------------ r824 | fabricecolin | 2007-04-27 17:48:25 +0200 (Fri, 27 Apr 2007) | 2 lines Changed paths: M /trunk/Tokenize/Makefile.am Better have Filter.cc in the libraries source lists. ------------------------------------------------------------------------ r823 | fabricecolin | 2007-04-27 16:00:50 +0200 (Fri, 27 Apr 2007) | 2 lines Changed paths: M /trunk/Index/pinot-index.1 M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 Updated man pages. ------------------------------------------------------------------------ r822 | fabricecolin | 2007-04-27 15:35:45 +0200 (Fri, 27 Apr 2007) | 2 lines Changed paths: M /trunk/TODO Update. ------------------------------------------------------------------------ r821 | fabricecolin | 2007-04-26 16:27:30 +0200 (Thu, 26 Apr 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/configure.in Upped version number to 0.72. Older versions of My Documents are upgraded by the daemon. ------------------------------------------------------------------------ r820 | fabricecolin | 2007-04-26 16:25:41 +0200 (Thu, 26 Apr 2007) | 2 lines Changed paths: M /trunk/README Explain saving of results. ------------------------------------------------------------------------ r819 | fabricecolin | 2007-04-26 15:39:47 +0200 (Thu, 26 Apr 2007) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp Add term XDIR:/ to documents so that searches can be restricted with "dir:/". 
------------------------------------------------------------------------ r818 | fabricecolin | 2007-04-26 13:51:58 +0200 (Thu, 26 Apr 2007) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/po/POTFILES.in M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po Updated de (Christian Dywan) and it (Vincenzo Consales) translations. IndexTree no longer exists. ------------------------------------------------------------------------ r817 | fabricecolin | 2007-04-26 13:37:20 +0200 (Thu, 26 Apr 2007) | 2 lines Changed paths: M /trunk/Utils/xdgmime/ChangeLog M /trunk/Utils/xdgmime/xdgmimeglob.c Synced with gtk+'s xdgmime. ------------------------------------------------------------------------ r816 | fabricecolin | 2007-04-26 13:34:21 +0200 (Thu, 26 Apr 2007) | 2 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp Cosmetic change. ------------------------------------------------------------------------ r815 | fabricecolin | 2007-04-26 13:33:08 +0200 (Thu, 26 Apr 2007) | 6 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Synced with changes made to ResultsTree/IndexTree. Implemented results export in on_exportresults_activate(). In view_documents(), open HTTP/HTTPS documents with the default browser. This fixes a problem where documents identified as "application/x-php" were opened with a text editor. ------------------------------------------------------------------------ r814 | fabricecolin | 2007-04-26 13:28:12 +0200 (Thu, 26 Apr 2007) | 2 lines Changed paths: D /trunk/UI/GTK2/src/IndexTree.cpp D /trunk/UI/GTK2/src/IndexTree.h M /trunk/UI/GTK2/src/Makefile.am M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h Removed IndexTree and IndexModelColumns classes. 
------------------------------------------------------------------------ r813 | fabricecolin | 2007-04-26 13:26:36 +0200 (Thu, 26 Apr 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh Added Save As menuitem under Results. ------------------------------------------------------------------------ r812 | fabricecolin | 2007-04-26 13:22:13 +0200 (Thu, 26 Apr 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/IndexPage.h Use ResultsTree, not IndexTree. ------------------------------------------------------------------------ r811 | fabricecolin | 2007-04-26 13:20:09 +0200 (Thu, 26 Apr 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h Several small changes to allow using this class as a replacement for IndexTree. ------------------------------------------------------------------------ r810 | fabricecolin | 2007-04-26 13:16:33 +0200 (Thu, 26 Apr 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/PinotUtils.h M /trunk/UI/GTK2/src/indexDialog.cc M /trunk/UI/GTK2/src/prefsDialog.cc Modified select_file_name() so that prepare_file_chooser() is more useful. ------------------------------------------------------------------------ r809 | fabricecolin | 2007-04-26 13:13:15 +0200 (Thu, 26 Apr 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/EnginesTree.h Renamed the signal getter method. ------------------------------------------------------------------------ r808 | fabricecolin | 2007-04-26 13:09:32 +0200 (Thu, 26 Apr 2007) | 2 lines Changed paths: M /trunk/Utils/StringManip.cpp Minor changes. 
------------------------------------------------------------------------ r807 | fabricecolin | 2007-04-24 16:30:38 +0200 (Tue, 24 Apr 2007) | 5 lines Changed paths: M /trunk/Search/Makefile.am A /trunk/Search/ResultsExporter.cpp A /trunk/Search/ResultsExporter.h M /trunk/Search/SearchEngineFactory.cpp M /trunk/Search/SearchEngineFactory.h M /trunk/Search/pinot-search.cpp Implemented results export, to either CSV (semi-colon) or OpenSearch response XML/RSS. pinot-search can either save the results to file or output them. SearchEngineFactory has a new utility method that returns a plugin's engine name. ------------------------------------------------------------------------ r806 | fabricecolin | 2007-04-16 15:28:52 +0200 (Mon, 16 Apr 2007) | 2 lines Changed paths: M /trunk/Tokenize/Makefile.am Added Filter.cc to sources list, as found in revision 26 of filters. ------------------------------------------------------------------------ r805 | fabricecolin | 2007-04-10 16:16:33 +0200 (Tue, 10 Apr 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/IndexPage.h M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/IndexTree.h M /trunk/UI/GTK2/src/MboxHandler.h M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/OnDiskHandler.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Phased out Result and IndexedDocument classes. ------------------------------------------------------------------------ r804 | fabricecolin | 2007-04-09 16:34:57 +0200 (Mon, 09 Apr 2007) | 2 lines Changed paths: M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/QueryHistory.h Result class was dropped. 
------------------------------------------------------------------------ r803 | fabricecolin | 2007-04-09 14:49:10 +0200 (Mon, 09 Apr 2007) | 3 lines Changed paths: M /trunk/Monitor/MonitorHandler.h M /trunk/Search/Google/GoogleAPIEngine.cpp M /trunk/Search/OpenSearchParser.cpp M /trunk/Search/OpenSearchParser.h M /trunk/Search/PluginParsers.h M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/QueryProperties.h M /trunk/Search/SearchEngineInterface.cpp M /trunk/Search/SearchEngineInterface.h M /trunk/Search/SherlockParser.cpp M /trunk/Search/SherlockParser.h M /trunk/Search/WebEngine.cpp M /trunk/Search/WebEngine.h M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h M /trunk/Search/pinot-search.cpp M /trunk/Utils/DocumentInfo.cpp M /trunk/Utils/DocumentInfo.h D /trunk/Utils/IndexedDocument.cpp D /trunk/Utils/IndexedDocument.h M /trunk/Utils/Makefile.am D /trunk/Utils/Result.cpp D /trunk/Utils/Result.h Merged Result and IndexedDocument back into DocumentInfo. In XapianEngine, generate an abstract with the terms matched by the document. ------------------------------------------------------------------------ r802 | fabricecolin | 2007-04-04 15:32:31 +0200 (Wed, 04 Apr 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/mainWindow.cc Removed some fat off ResultsTree. ------------------------------------------------------------------------ r801 | fabricecolin | 2007-04-03 17:05:25 +0200 (Tue, 03 Apr 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Can remove deb from the blacklist since it's in the external filter's configuration file. ------------------------------------------------------------------------ r800 | fabricecolin | 2007-03-31 11:24:27 +0200 (Sat, 31 Mar 2007) | 2 lines Changed paths: M /trunk/NEWS I forgot to list two new features... Never mind. 
------------------------------------------------------------------------ r798 | fabricecolin | 2007-03-31 05:30:44 +0200 (Sat, 31 Mar 2007) | 2 lines Changed paths: M /trunk/NEWS Changes since previous release. ------------------------------------------------------------------------ r797 | fabricecolin | 2007-03-29 15:25:54 +0200 (Thu, 29 Mar 2007) | 2 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp More verbose DEBUG output. ------------------------------------------------------------------------ r796 | fabricecolin | 2007-03-29 15:25:12 +0200 (Thu, 29 Mar 2007) | 2 lines Changed paths: M /trunk/TODO Removed items that were implemented recently and added a whole lot more... ------------------------------------------------------------------------ r795 | fabricecolin | 2007-03-29 15:19:08 +0200 (Thu, 29 Mar 2007) | 2 lines Changed paths: M /trunk/README Details upgrading to 0.71. Some minor changes. ------------------------------------------------------------------------ r794 | fabricecolin | 2007-03-29 15:13:24 +0200 (Thu, 29 Mar 2007) | 2 lines Changed paths: M /trunk/Utils/xdgmime/ChangeLog M /trunk/Utils/xdgmime/xdgmimecache.c Sync with current gtk+ source. ------------------------------------------------------------------------ r793 | fabricecolin | 2007-03-29 13:07:48 +0200 (Thu, 29 Mar 2007) | 2 lines Changed paths: M /trunk/pinot.desktop Extra translations. ------------------------------------------------------------------------ r792 | fabricecolin | 2007-03-25 17:05:57 +0200 (Sun, 25 Mar 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ServerThreads.cpp The D-Bus interface description file was looked for in the wrong directory. 
------------------------------------------------------------------------ r791 | fabricecolin | 2007-03-25 09:33:34 +0200 (Sun, 25 Mar 2007) | 2 lines Changed paths: M /trunk/po/de.po M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/it.po M /trunk/po/nl.po M /trunk/po/pt.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po Synced po files. ------------------------------------------------------------------------ r790 | fabricecolin | 2007-03-24 08:28:21 +0100 (Sat, 24 Mar 2007) | 4 lines Changed paths: M /trunk/AUTHORS M /trunk/configure.in M /trunk/pinot.spec.in A /trunk/po/de.po A /trunk/po/it.po A /trunk/po/pt.po New German, Italian and Portuguese translations by Christian Dywan, Michele Angrisano and _PN_boy respectively. Bumped version number. ------------------------------------------------------------------------ r789 | fabricecolin | 2007-03-24 08:04:10 +0100 (Sat, 24 Mar 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/queryDialog.cc Minor changes : check RefPtr's and complain if XML file cannot be found when Introspect is received. ------------------------------------------------------------------------ r788 | fabricecolin | 2007-03-24 05:18:00 +0100 (Sat, 24 Mar 2007) | 6 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc The daemon checks the version of My Documents and performs an upgrade if necessary. The D-Bus connection is not explicitly closed at exit time in case a servlet thread is still running. The UI also checks the version of My Web Pages but only advises to update all documents. ------------------------------------------------------------------------ r787 | fabricecolin | 2007-03-24 05:14:35 +0100 (Sat, 24 Mar 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotUtils.cpp Get the file chooser to show hidden files. 
------------------------------------------------------------------------ r786 | fabricecolin | 2007-03-24 05:13:14 +0100 (Sat, 24 Mar 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h All worker threads are non-joinable. DaemonState doesn't delete the handlers when destroyed in case they are still being used. Since it's destroyed only when the program exits, it should be okay. ------------------------------------------------------------------------ r785 | fabricecolin | 2007-03-23 16:15:03 +0100 (Fri, 23 Mar 2007) | 4 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h Added versioning. New method unindexAllDocuments() can reset the whole index. Some cleanup. ------------------------------------------------------------------------ r784 | fabricecolin | 2007-03-23 14:59:14 +0100 (Fri, 23 Mar 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade Reworked Network tab layout. ------------------------------------------------------------------------ r783 | fabricecolin | 2007-03-23 14:57:47 +0100 (Fri, 23 Mar 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/prefsDialog_glade.hh Made proxy type a string so that it can be passed as is to the downloaders. Network tab has same layout as indexDialog, and fields are disabled when direct connection is activated. 
------------------------------------------------------------------------ r782 | fabricecolin | 2007-03-23 14:33:06 +0100 (Fri, 23 Mar 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/IndexPage.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/importDialog_glade.cc M /trunk/UI/GTK2/src/importDialog_glade.hh M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/propertiesDialog.hh M /trunk/UI/GTK2/src/propertiesDialog_glade.cc M /trunk/UI/GTK2/src/propertiesDialog_glade.hh M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog.hh M /trunk/UI/GTK2/src/queryDialog_glade.cc M /trunk/UI/GTK2/src/queryDialog_glade.hh Replaced ComboBox + ListStore with ComboBoxText. ------------------------------------------------------------------------ r781 | fabricecolin | 2007-03-22 13:45:56 +0100 (Thu, 22 Mar 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/indexDialog.cc M /trunk/UI/GTK2/src/indexDialog.hh M /trunk/UI/GTK2/src/indexDialog_glade.cc M /trunk/UI/GTK2/src/indexDialog_glade.hh M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/prefsDialog_glade.hh Added Network tab to Preferences box for configuring the proxy. Both indexDialog and prefsDialog use ComboBoxText. ------------------------------------------------------------------------ r780 | fabricecolin | 2007-03-22 12:02:19 +0100 (Thu, 22 Mar 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/Utils/MIMEScanner.cpp Tweaked DEBUG output. 
------------------------------------------------------------------------ r779 | fabricecolin | 2007-03-22 12:00:20 +0100 (Thu, 22 Mar 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h Documents MIME type was lost by the results tree. ------------------------------------------------------------------------ r778 | fabricecolin | 2007-03-22 11:56:57 +0100 (Thu, 22 Mar 2007) | 2 lines Changed paths: M /trunk/Collect/FileCollector.cpp M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/Utils/Document.cpp Directories are indexed as separate documents. ------------------------------------------------------------------------ r777 | fabricecolin | 2007-03-21 18:15:31 +0100 (Wed, 21 Mar 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/scripts/python/pinot-live.py Updated deskbar handler to cope with new GetDocumentInfo. Use deskbar.Utils.url_show() if available. Also added license and copyright notices. D-Bus daemon implements org.freedesktop.DBus.Introspectable.Introspect. ------------------------------------------------------------------------ r776 | fabricecolin | 2007-03-20 12:46:40 +0100 (Tue, 20 Mar 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ServerThreads.cpp MonitorThread::processEvents() didn't skip dotfiles and blacklisted files. ------------------------------------------------------------------------ r775 | fabricecolin | 2007-03-20 12:42:22 +0100 (Tue, 20 Mar 2007) | 2 lines Changed paths: M /trunk/Search/SearchEngineInterface.cpp M /trunk/Search/SearchEngineInterface.h M /trunk/Search/WebEngine.cpp M /trunk/Search/WebEngine.h M /trunk/Search/pinot-search.1 M /trunk/Search/pinot-search.cpp Changes to allow configuring the proxy to use for Web searches. 
------------------------------------------------------------------------ r774 | fabricecolin | 2007-03-20 12:39:35 +0100 (Tue, 20 Mar 2007) | 3 lines Changed paths: M /trunk/Index/pinot-index.1 M /trunk/Index/pinot-index.cpp Proxy can be configured with "-a address -p port -t type". Check and index no longer require a URL each. ------------------------------------------------------------------------ r773 | fabricecolin | 2007-03-20 11:50:45 +0100 (Tue, 20 Mar 2007) | 5 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/MonitorEvent.cpp M /trunk/Monitor/MonitorEvent.h Don't skip CREATE events on files ! MOVED_FROM events for which we didn't receive a MOVED_TO (because, e.g., the file was moved to an unmonitored location on the same filesystem) are expired after a minute and become DELETE events. ------------------------------------------------------------------------ r772 | fabricecolin | 2007-03-19 14:27:07 +0100 (Mon, 19 Mar 2007) | 3 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp M /trunk/Collect/CurlDownloader.h M /trunk/Collect/DownloaderInterface.h M /trunk/Collect/NeonDownloader.cpp M /trunk/Collect/NeonDownloader.h Added settings for proxy address, port and type (not applicable to Neon). Note that curl should automatically use the *_proxy environment variables. ------------------------------------------------------------------------ r771 | fabricecolin | 2007-03-18 13:47:08 +0100 (Sun, 18 Mar 2007) | 2 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h Method getSourceItems() can return items newer than a given date. ------------------------------------------------------------------------ r770 | fabricecolin | 2007-03-18 09:50:01 +0100 (Sun, 18 Mar 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/UI/GTK2/src/statisticsDialog.hh Status window is refreshed every 5 seconds.
------------------------------------------------------------------------ r769 | fabricecolin | 2007-03-17 14:08:04 +0100 (Sat, 17 Mar 2007) | 2 lines Changed paths: M /trunk/Makefile.am D /trunk/UI/GTK2/pinot.png D /trunk/UI/GTK2/pinot.xcf A /trunk/UI/icons A /trunk/UI/icons/16x16 A /trunk/UI/icons/16x16/pinot.png A /trunk/UI/icons/22x22 A /trunk/UI/icons/22x22/pinot.png A /trunk/UI/icons/24x24 A /trunk/UI/icons/24x24/pinot.png A /trunk/UI/icons/32x32 A /trunk/UI/icons/32x32/pinot.png A /trunk/UI/icons/48x48 A /trunk/UI/icons/48x48/pinot.png (from /trunk/UI/GTK2/pinot.png:765) A /trunk/UI/icons/pinot.xcf (from /trunk/UI/GTK2/pinot.xcf:765) A /trunk/UI/icons/scalable M /trunk/pinot.spec.in Icons in several sizes. ------------------------------------------------------------------------ r768 | fabricecolin | 2007-03-17 04:12:40 +0100 (Sat, 17 Mar 2007) | 5 lines Changed paths: M /trunk/Index/XapianDatabase.cpp M /trunk/Index/XapianIndex.cpp M /trunk/Search/XapianEngine.cpp Limit length of terms before prefix is applied so that search-time pre-processing doesn't have to bother about prefixes when dealing with long terms. Labels were not always length limited. When pre-processing filters, transform them as done at indexing time. ------------------------------------------------------------------------ r767 | fabricecolin | 2007-03-16 16:45:56 +0100 (Fri, 16 Mar 2007) | 8 lines Changed paths: M /trunk/Index/XapianDatabase.cpp M /trunk/Index/XapianIndex.cpp M /trunk/Search/XapianEngine.cpp In XapianIndex, if a prefix is to be used for terms use the same for capitalized terms, so that they are not confused with unprefixed terms. Escape label, file name, directory name and URL terms since they may contain spaces. On the search side, pre-process the query string and escape all filter values set between double-quotes. Reindexing is required for these changes to work properly. Some fixes to XapianDatabase. 
------------------------------------------------------------------------ r766 | fabricecolin | 2007-03-16 16:39:33 +0100 (Fri, 16 Mar 2007) | 3 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml On second thought, better use the same DocumentInfo fields as the document data fields, at least for the time being. ------------------------------------------------------------------------ r765 | fabricecolin | 2007-03-15 14:30:07 +0100 (Thu, 15 Mar 2007) | 2 lines Changed paths: M /trunk/README Somewhat better documentation. ------------------------------------------------------------------------ r764 | fabricecolin | 2007-03-15 14:04:08 +0100 (Thu, 15 Mar 2007) | 4 lines Changed paths: M /trunk/Utils/CommandLine.cpp M /trunk/Utils/MIMEScanner.cpp If a desktop file has an equal sign in Exec, run the command with sh -c as system() would do because we don't have an easy way to find out where the equal sign is. ------------------------------------------------------------------------ r763 | fabricecolin | 2007-03-15 13:42:24 +0100 (Thu, 15 Mar 2007) | 4 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml M /trunk/UI/GTK2/src/pinot.cc Changed signatures of DBus methods Get and SetDocumentInfo to allow passing any metadata field. Define DBUS_API_SUBJECT_TO_CHANGE only if DBus version is < 1.0. ------------------------------------------------------------------------ r762 | fabricecolin | 2007-03-12 15:41:30 +0100 (Mon, 12 Mar 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc Don't bother with our little locale hack if it's already UTF8. ------------------------------------------------------------------------ r761 | fabricecolin | 2007-03-12 15:33:22 +0100 (Mon, 12 Mar 2007) | 2 lines Changed paths: M /trunk/Collect/MboxCollector.cpp Set property OPERATING_MODE to "view". 
------------------------------------------------------------------------ r760 | fabricecolin | 2007-03-12 13:49:05 +0100 (Mon, 12 Mar 2007) | 2 lines Changed paths: M /trunk/Collect/DownloaderFactory.cpp M /trunk/Collect/Makefile.am D /trunk/Collect/XapianCollector.cpp D /trunk/Collect/XapianCollector.h D /trunk/Collect/pinot-collect.1 D /trunk/Collect/pinot-collect.cpp M /trunk/Makefile.am M /trunk/pinot.spec.in XapianCollector is no longer needed. Removed pinot-collect. ------------------------------------------------------------------------ r759 | fabricecolin | 2007-03-10 08:09:30 +0100 (Sat, 10 Mar 2007) | 2 lines Changed paths: M /trunk/Index/pinot-index.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc Initialize HtmlFilter before loading the filters. ------------------------------------------------------------------------ r758 | fabricecolin | 2007-03-08 13:58:59 +0100 (Thu, 08 Mar 2007) | 2 lines Changed paths: M /trunk/TODO Keep track of stuff I told Reuben I would do :-) ------------------------------------------------------------------------ r757 | fabricecolin | 2007-03-07 16:08:04 +0100 (Wed, 07 Mar 2007) | 4 lines Changed paths: M /trunk/Collect/DownloaderInterface.cpp M /trunk/configure.in Curl might not have been built against OpenSSL so check the output of 'curl-config --features' and act accordingly. This was reported by Reuben Thomas. ------------------------------------------------------------------------ r755 | fabricecolin | 2007-03-06 11:24:42 +0100 (Tue, 06 Mar 2007) | 3 lines Changed paths: M /trunk/NEWS M /trunk/README Describe work-around for threading problem on FreeBSD. 
Release date is now today ;-) ------------------------------------------------------------------------ r754 | fabricecolin | 2007-03-05 12:45:02 +0100 (Mon, 05 Mar 2007) | 2 lines Changed paths: M /trunk/NEWS M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/nl.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po Setting release date to today. ------------------------------------------------------------------------ r753 | fabricecolin | 2007-03-05 12:39:14 +0100 (Mon, 05 Mar 2007) | 2 lines Changed paths: M /trunk/TODO Cleanup. ------------------------------------------------------------------------ r752 | fabricecolin | 2007-03-05 12:28:01 +0100 (Mon, 05 Mar 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/src/Makefile.am M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/configure.in Make sure Glib::thread_init() is called before any other glib function. Run the dbus version number through awk to make it more useful. Call dbus_threads_init_default() just in case if dbus >= 1.0 is available. ------------------------------------------------------------------------ r751 | fabricecolin | 2007-03-01 14:52:12 +0100 (Thu, 01 Mar 2007) | 2 lines Changed paths: M /trunk/Collect/pinot-collect.1 M /trunk/Index/pinot-index.1 M /trunk/Search/pinot-search.1 M /trunk/Search/pinot-search.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 Updated manual pages. ------------------------------------------------------------------------ r750 | fabricecolin | 2007-02-28 14:39:02 +0100 (Wed, 28 Feb 2007) | 4 lines Changed paths: M /trunk/Utils/MIMEScanner.cpp Workaround broken desktop files that set envvars in the Exec field, for instance "PATH=... cmd_line", even though the spec says the equal sign is not allowed. Thanks to Thierry Thomas for reporting this. 
------------------------------------------------------------------------ r749 | fabricecolin | 2007-02-27 11:57:02 +0100 (Tue, 27 Feb 2007) | 3 lines Changed paths: M /trunk/Search/XapianEngine.cpp M /trunk/Utils/TimeConverter.cpp Fixed stupid time conversion error that would sometimes prevent date ranges from being applied. ------------------------------------------------------------------------ r748 | fabricecolin | 2007-02-26 14:21:37 +0100 (Mon, 26 Feb 2007) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/NEWS M /trunk/README M /trunk/configure.in M /trunk/pinot.desktop M /trunk/pinot.spec.in Preparing for 0.70 release. ------------------------------------------------------------------------ r747 | fabricecolin | 2007-02-26 14:14:09 +0100 (Mon, 26 Feb 2007) | 2 lines Changed paths: M /trunk/po/es.po Updates by Gar Bage. ------------------------------------------------------------------------ r746 | fabricecolin | 2007-02-26 14:09:11 +0100 (Mon, 26 Feb 2007) | 2 lines Changed paths: M /trunk/TODO +8 items. ------------------------------------------------------------------------ r745 | fabricecolin | 2007-02-24 09:55:12 +0100 (Sat, 24 Feb 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Query results were not labeled because getIndexId() was called on the location instead of the name of internal indices. ------------------------------------------------------------------------ r744 | fabricecolin | 2007-02-24 09:08:39 +0100 (Sat, 24 Feb 2007) | 5 lines Changed paths: M /trunk/Collect/MboxCollector.cpp M /trunk/Index/FilterWrapper.cpp M /trunk/Index/FilterWrapper.h M /trunk/Tokenize/FilterUtils.cpp FilterUtils::feedFilter() can save data to a temporary file as a last resort. Mailbox documents shouldn't be treated like files. FilterWrapper preserves the MIME type of documents before transformation. MboxCollector first feeds the mbox file, then skips to the necessary offset. 
------------------------------------------------------------------------ r743 | fabricecolin | 2007-02-24 08:07:01 +0100 (Sat, 24 Feb 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Skip download if the filter can be directly pointed at the document. ------------------------------------------------------------------------ r742 | fabricecolin | 2007-02-23 16:19:48 +0100 (Fri, 23 Feb 2007) | 4 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp Overloaded IndexInterface::unindexDocuments() and added deleteItems() to CrawlHistory to be able to remove all documents under a given directory in one fell swoop. ------------------------------------------------------------------------ r741 | fabricecolin | 2007-02-23 14:30:13 +0100 (Fri, 23 Feb 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.cpp Not finding any result is okay, queryDatabase() shouldn't return false. ------------------------------------------------------------------------ r740 | fabricecolin | 2007-02-22 16:01:22 +0100 (Thu, 22 Feb 2007) | 2 lines Changed paths: M /trunk/Utils/xdgmime/xdgmimemagic.c Synced with gtk+/gtk/xdgmime. ------------------------------------------------------------------------ r739 | fabricecolin | 2007-02-22 15:57:10 +0100 (Thu, 22 Feb 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml Warning notice about the changing interface. ------------------------------------------------------------------------ r738 | fabricecolin | 2007-02-22 15:56:33 +0100 (Thu, 22 Feb 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Removed extensions handled by ExternalFilter from default blacklist. 
------------------------------------------------------------------------ r737 | fabricecolin | 2007-02-22 15:53:39 +0100 (Thu, 22 Feb 2007) | 2 lines Changed paths: M /trunk/Utils/Document.cpp M /trunk/Utils/Document.h Method resetData(). ------------------------------------------------------------------------ r736 | fabricecolin | 2007-02-22 15:53:03 +0100 (Thu, 22 Feb 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.cpp Comment to remind myself I could use enquire.get_matching_terms_begin() here. ------------------------------------------------------------------------ r735 | fabricecolin | 2007-02-22 15:05:02 +0100 (Thu, 22 Feb 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/statisticsDialog.cc Index stats labels were swapped. ------------------------------------------------------------------------ r734 | fabricecolin | 2007-02-22 13:58:35 +0100 (Thu, 22 Feb 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp Fixed construction of Results. ------------------------------------------------------------------------ r733 | fabricecolin | 2007-02-21 15:39:05 +0100 (Wed, 21 Feb 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Look for the robots directive. ------------------------------------------------------------------------ r732 | fabricecolin | 2007-02-21 15:37:48 +0100 (Wed, 21 Feb 2007) | 2 lines Changed paths: M /trunk/Tokenize/FilterUtils.cpp Don't fail if the file is empty. ------------------------------------------------------------------------ r731 | fabricecolin | 2007-02-21 15:34:01 +0100 (Wed, 21 Feb 2007) | 2 lines Changed paths: M /trunk/Index/FilterWrapper.cpp M /trunk/Index/FilterWrapper.h Handles unsupported types. ------------------------------------------------------------------------ r730 | fabricecolin | 2007-02-20 12:38:06 +0100 (Tue, 20 Feb 2007) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/NEWS M /trunk/TODO State update. 
------------------------------------------------------------------------ r729 | fabricecolin | 2007-02-20 07:09:21 +0100 (Tue, 20 Feb 2007) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/nl.po M /trunk/po/pt_BR.po M /trunk/po/ru.po Commit current po files before I import them into Rosetta. ------------------------------------------------------------------------ r728 | fabricecolin | 2007-02-20 07:03:43 +0100 (Tue, 20 Feb 2007) | 2 lines Changed paths: M /trunk/po/sv.po Checking in Daniel's latest translations, exported from Rosetta today. ------------------------------------------------------------------------ r727 | fabricecolin | 2007-02-20 06:07:22 +0100 (Tue, 20 Feb 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/mainWindow.cc Show results' timestamp. Don't attempt setting the indexed icon on results that are being indexed as their document ID is not yet known. ------------------------------------------------------------------------ r726 | fabricecolin | 2007-02-20 06:04:52 +0100 (Tue, 20 Feb 2007) | 2 lines Changed paths: M /trunk/Index/XapianDatabase.cpp M /trunk/Index/XapianDatabase.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/Search/XapianEngine.cpp Moved DocumentInfo <-> Xapian document data conversion to XapianDatabase. ------------------------------------------------------------------------ r725 | fabricecolin | 2007-02-19 16:21:36 +0100 (Mon, 19 Feb 2007) | 3 lines Changed paths: M /trunk/Search/XapianEngine.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp XapianEngine sets the results' timestamp as well as document IDs, so that QueryingThread can skip calling hasDocument() on internal indexes. 
------------------------------------------------------------------------ r724 | fabricecolin | 2007-02-19 15:15:18 +0100 (Mon, 19 Feb 2007) | 2 lines Changed paths: M /trunk/Index/XapianDatabase.cpp M /trunk/Index/XapianDatabase.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h Moved m_maxTermLength to XapianDatabase. ------------------------------------------------------------------------ r723 | fabricecolin | 2007-02-18 12:40:47 +0100 (Sun, 18 Feb 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc Let a thread start the daemon. Give (slight) priority to internal engines when running a query. ------------------------------------------------------------------------ r722 | fabricecolin | 2007-02-17 14:00:43 +0100 (Sat, 17 Feb 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh Moved Preferences menuitem back to Session, with Status. ------------------------------------------------------------------------ r721 | fabricecolin | 2007-02-17 13:58:58 +0100 (Sat, 17 Feb 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.cpp Method queryDatabase() handles exceptions more gracefully. ------------------------------------------------------------------------ r720 | fabricecolin | 2007-02-17 11:27:53 +0100 (Sat, 17 Feb 2007) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/nl.po M /trunk/po/pt_BR.po M /trunk/po/ru.po M /trunk/po/sv.po Synced po files. ------------------------------------------------------------------------ r719 | fabricecolin | 2007-02-17 11:13:25 +0100 (Sat, 17 Feb 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/statisticsDialog.cc Daemon logs its PID, shown by the Status window. In the Status window, don't add an Errors row if no error was logged. 
------------------------------------------------------------------------ r718 | fabricecolin | 2007-02-17 11:11:43 +0100 (Sat, 17 Feb 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/Utils/Result.cpp Fixes for previous commit. ------------------------------------------------------------------------ r717 | fabricecolin | 2007-02-17 10:29:23 +0100 (Sat, 17 Feb 2007) | 6 lines Changed paths: M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/Utils/Result.cpp M /trunk/Utils/Result.h Avoid peeking at the indexes from the main UI thread. - IndexingThread determines whether the document should be updated. - QueryingThread finds which results are in one of the indexes. - ExpandQueryThread gets the document IDs of the relevant documents. - LabelUpdateThread can set labels on documents. ------------------------------------------------------------------------ r716 | fabricecolin | 2007-02-17 10:23:09 +0100 (Sat, 17 Feb 2007) | 2 lines Changed paths: M /trunk/Tokenize/Tokenizer.cpp M /trunk/Tokenize/Tokenizer.h Removed now unused method. ------------------------------------------------------------------------ r715 | fabricecolin | 2007-02-17 05:17:08 +0100 (Sat, 17 Feb 2007) | 2 lines Changed paths: M /trunk/Search/SherlockParser.cpp M /trunk/Tokenize/FilterUtils.cpp Fixed link and abstract extraction, both broken following the move to Filter. ------------------------------------------------------------------------ r714 | fabricecolin | 2007-02-16 16:03:36 +0100 (Fri, 16 Feb 2007) | 2 lines Changed paths: M /trunk/Search/SherlockParser.cpp M /trunk/Search/WebEngine.cpp M /trunk/Search/pinot-search.cpp Use FilterUtils. 
------------------------------------------------------------------------ r713 | fabricecolin | 2007-02-16 16:02:12 +0100 (Fri, 16 Feb 2007) | 3 lines Changed paths: M /trunk/Collect/MboxCollector.cpp M /trunk/Index/FilterWrapper.cpp M /trunk/Index/FilterWrapper.h A /trunk/Tokenize/FilterUtils.cpp A /trunk/Tokenize/FilterUtils.h M /trunk/Tokenize/Makefile.am M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/MboxHandler.h M /trunk/Utils/Makefile.am D /trunk/Utils/MboxParser.cpp D /trunk/Utils/MboxParser.h Moved FilterWrapper's non-indexing related methods to FilterUtils. The mbox filter obsoletes MboxParser. Updated MboxCollector and MboxHandler. ------------------------------------------------------------------------ r712 | fabricecolin | 2007-02-10 07:11:30 +0100 (Sat, 10 Feb 2007) | 2 lines Changed paths: M /trunk/Makefile.am M /trunk/pinot.spec.in Distribute and install external-filters.xml. ------------------------------------------------------------------------ r711 | fabricecolin | 2007-02-09 16:09:44 +0100 (Fri, 09 Feb 2007) | 4 lines Changed paths: M /trunk/Index/Makefile.am M /trunk/Makefile.am M /trunk/README M /trunk/Search/Google/Makefile.am M /trunk/pinot.spec.in Install the new catalogs and the AUTHORS file. New dependency on openssh-askpass. Makefile typo fixes. ------------------------------------------------------------------------ r710 | fabricecolin | 2007-02-09 15:25:43 +0100 (Fri, 09 Feb 2007) | 2 lines Changed paths: M /trunk/Utils/DocumentInfo.h Type declaration. ------------------------------------------------------------------------ r709 | fabricecolin | 2007-02-09 15:25:09 +0100 (Fri, 09 Feb 2007) | 6 lines Changed paths: M /trunk/Index/pinot-index.cpp M /trunk/Search/pinot-search.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc Replaced specialized tokenizers with Dijon filters. Pinot-index takes --db to specify the path to the index; the index type is hardcoded to xapian. 
Pinot-search takes --max to specify the maximum number of results. Pinot sets SSH_ASKPASS. ------------------------------------------------------------------------ r708 | fabricecolin | 2007-02-09 15:14:01 +0100 (Fri, 09 Feb 2007) | 4 lines Changed paths: A /trunk/Index/FilterWrapper.cpp A /trunk/Index/FilterWrapper.h M /trunk/Search/SherlockParser.cpp M /trunk/Search/WebEngine.cpp M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp FilterWrapper provides a couple of useful methods to filter documents or index their content. Replaced specialized tokenizers with Dijon filters. ------------------------------------------------------------------------ r707 | fabricecolin | 2007-02-09 15:02:37 +0100 (Fri, 09 Feb 2007) | 2 lines Changed paths: D /trunk/Tokenize/HtmlTokenizer.cpp D /trunk/Tokenize/HtmlTokenizer.h D /trunk/Tokenize/OpenDocumentTokenizer.cpp D /trunk/Tokenize/OpenDocumentTokenizer.h D /trunk/Tokenize/PdfTokenizer.cpp D /trunk/Tokenize/PdfTokenizer.h D /trunk/Tokenize/RtfTokenizer.cpp D /trunk/Tokenize/RtfTokenizer.h D /trunk/Tokenize/TagLibTokenizer.cpp D /trunk/Tokenize/TagLibTokenizer.h D /trunk/Tokenize/TokenizerFactory.cpp D /trunk/Tokenize/TokenizerFactory.h D /trunk/Tokenize/UnknownTypeTokenizer.cpp D /trunk/Tokenize/UnknownTypeTokenizer.h D /trunk/Tokenize/WordTokenizer.cpp D /trunk/Tokenize/WordTokenizer.h D /trunk/Tokenize/XmlTokenizer.cpp D /trunk/Tokenize/XmlTokenizer.h A /trunk/Tokenize/filters A /trunk/Tokenize/filters/README Specialized tokenizers are replaced by Dijon's filters. 
------------------------------------------------------------------------ r706 | fabricecolin | 2007-02-08 16:28:51 +0100 (Thu, 08 Feb 2007) | 2 lines Changed paths: M /trunk/Collect/Makefile.am M /trunk/Index/Makefile.am M /trunk/Makefile.am M /trunk/Monitor/Makefile.am M /trunk/Search/Google/Makefile.am M /trunk/Search/Makefile.am M /trunk/Tokenize/Makefile.am M /trunk/UI/GTK2/src/Makefile.am M /trunk/Utils/Makefile.am M /trunk/configure.in Added Tokenize/filters to build. Renamed gmime variables. ------------------------------------------------------------------------ r705 | fabricecolin | 2007-01-31 14:53:44 +0100 (Wed, 31 Jan 2007) | 2 lines Changed paths: M /trunk/TODO A lot more to do... ------------------------------------------------------------------------ r704 | fabricecolin | 2007-01-28 06:23:31 +0100 (Sun, 28 Jan 2007) | 2 lines Changed paths: M /trunk/po/fr.po Synced to current source. ------------------------------------------------------------------------ r703 | fabricecolin | 2007-01-28 06:13:07 +0100 (Sun, 28 Jan 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/UI/GTK2/src/statisticsDialog_glade.cc Changed title to Status. Errors rows are collapsed by default and the Indexes row is selected. ------------------------------------------------------------------------ r702 | fabricecolin | 2007-01-28 06:10:32 +0100 (Sun, 28 Jan 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/propertiesDialog.hh M /trunk/UI/GTK2/src/propertiesDialog_glade.cc M /trunk/UI/GTK2/src/propertiesDialog_glade.hh Show a document's size and terms count in propertiesDialog. Some minor mods to the queries list. 
------------------------------------------------------------------------ r701 | fabricecolin | 2007-01-28 06:05:29 +0100 (Sun, 28 Jan 2007) | 3 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexInterface.h M /trunk/Index/LanguageDetector.cpp M /trunk/Index/XapianDatabase.cpp M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h Added getDocumentTermsCount() to IndexInterface. Minor fix to XapianDatabase::openDatabase(). ------------------------------------------------------------------------ r700 | fabricecolin | 2007-01-27 12:57:05 +0100 (Sat, 27 Jan 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh Don't call DBusXapianIndex::getStatistics() here, this might freeze the UI temporarily. ------------------------------------------------------------------------ r699 | fabricecolin | 2007-01-27 12:55:39 +0100 (Sat, 27 Jan 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/statisticsDialog.cc No need to show getItemsCount(CRAWLING). ------------------------------------------------------------------------ r698 | fabricecolin | 2007-01-22 14:50:31 +0100 (Mon, 22 Jan 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/mainWindow.cc When Results, Index is selected, skip results already indexed. Clear the status bar when a notebook page is closed. ------------------------------------------------------------------------ r697 | fabricecolin | 2007-01-22 14:30:38 +0100 (Mon, 22 Jan 2007) | 2 lines Changed paths: M /trunk/Tokenize/TokenizerFactory.cpp Minor fix : application/xml is supported, not application/html. 
------------------------------------------------------------------------ r696 | fabricecolin | 2007-01-20 05:41:02 +0100 (Sat, 20 Jan 2007) | 2 lines Changed paths: M /trunk/SQL/ViewHistory.cpp M /trunk/SQL/ViewHistory.h M /trunk/UI/GTK2/src/statisticsDialog.cc Show how many results were viewed. ------------------------------------------------------------------------ r695 | fabricecolin | 2007-01-13 16:36:08 +0100 (Sat, 13 Jan 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/dateDialog.cc M /trunk/UI/GTK2/src/queryDialog.cc Date::set_time_current() is a recent addition, prefer set_time(). ------------------------------------------------------------------------ r694 | fabricecolin | 2007-01-12 13:57:13 +0100 (Fri, 12 Jan 2007) | 2 lines Changed paths: M /trunk/Search/XapianEngine.cpp Don't include terms with weight 0 in abstract generation. ------------------------------------------------------------------------ r693 | fabricecolin | 2007-01-12 13:46:10 +0100 (Fri, 12 Jan 2007) | 5 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h M /trunk/Utils/TimeConverter.cpp M /trunk/Utils/TimeConverter.h Date range filtering, back-end part. Creation of the date range query is copied from Omega's date_range_filter() function found in xapian-applications/omega/date.cc. When the dates don't make sense, they are ignored. ------------------------------------------------------------------------ r692 | fabricecolin | 2007-01-12 13:36:29 +0100 (Fri, 12 Jan 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/Utils/DocumentInfo.cpp M /trunk/Utils/StringManip.cpp A bunch of cosmetic changes and typo corrections. 
------------------------------------------------------------------------ r691 | fabricecolin | 2007-01-11 16:35:47 +0100 (Thu, 11 Jan 2007) | 3 lines Changed paths: M /trunk/Search/QueryProperties.cpp M /trunk/Search/QueryProperties.h M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/Makefile.am M /trunk/UI/GTK2/src/PinotSettings.cpp A /trunk/UI/GTK2/src/dateDialog.cc A /trunk/UI/GTK2/src/dateDialog.hh A /trunk/UI/GTK2/src/dateDialog_glade.cc A /trunk/UI/GTK2/src/dateDialog_glade.hh M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog.hh M /trunk/UI/GTK2/src/queryDialog_glade.cc M /trunk/UI/GTK2/src/queryDialog_glade.hh M /trunk/po/POTFILES.in Date range filtering, UI-only part. Added dateDialog, a date picker box, and extended queryDialog. ------------------------------------------------------------------------ r690 | fabricecolin | 2007-01-10 00:57:14 +0100 (Wed, 10 Jan 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h Minor mods. ------------------------------------------------------------------------ r689 | fabricecolin | 2007-01-09 23:30:12 +0100 (Tue, 09 Jan 2007) | 3 lines Changed paths: A /trunk/UI/GTK2/src/launcherDialog.cc A /trunk/UI/GTK2/src/launcherDialog.hh A /trunk/UI/GTK2/src/launcherDialog_glade.cc A /trunk/UI/GTK2/src/launcherDialog_glade.hh M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h M /trunk/Utils/Url.cpp Follow-up to previous commit : actual GTKmm code, mods to MIMEScanner's list of actions and Url prettification. 
------------------------------------------------------------------------ r688 | fabricecolin | 2007-01-09 23:25:17 +0100 (Tue, 09 Jan 2007) | 4 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/Makefile.am M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/statisticsDialog.cc M /trunk/po/POTFILES.in Still jet-lagged, I can't sleep, so I am checking in code to prompt the user for a command to open documents of unsupported types ;-) And a minor fix in statisticsDialog. ------------------------------------------------------------------------ r687 | fabricecolin | 2007-01-09 14:25:52 +0100 (Tue, 09 Jan 2007) | 3 lines Changed paths: A /trunk/po/pt_BR.po A /trunk/po/ru.po Brazilian Portuguese and Russian translations by Leonardo Melo and Sergey Vostrikov, respectively. ------------------------------------------------------------------------ r686 | fabricecolin | 2007-01-09 13:36:50 +0100 (Tue, 09 Jan 2007) | 2 lines Changed paths: M /trunk/Utils/Url.cpp Hacky fix for parsing of user and password. ------------------------------------------------------------------------ r685 | fabricecolin | 2007-01-09 13:35:03 +0100 (Tue, 09 Jan 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/Makefile.am M /trunk/po/POTFILES.in Compile and translate statisticsDialog. ------------------------------------------------------------------------ r684 | fabricecolin | 2007-01-09 13:33:34 +0100 (Tue, 09 Jan 2007) | 2 lines Changed paths: A /trunk/UI/GTK2/src/statisticsDialog.cc A /trunk/UI/GTK2/src/statisticsDialog.hh A /trunk/UI/GTK2/src/statisticsDialog_glade.cc A /trunk/UI/GTK2/src/statisticsDialog_glade.hh Statistics window. ------------------------------------------------------------------------ r683 | fabricecolin | 2007-01-09 13:31:48 +0100 (Tue, 09 Jan 2007) | 2 lines Changed paths: M /trunk/SQL/SQLiteBase.cpp Removed DEBUG code. 
------------------------------------------------------------------------ r682 | fabricecolin | 2007-01-09 13:30:03 +0100 (Tue, 09 Jan 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Removed long obsolete header. ------------------------------------------------------------------------ r681 | fabricecolin | 2007-01-09 13:28:33 +0100 (Tue, 09 Jan 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/ServerThreads.h DirectoryScannerThread::scanEntry() logs errors to the database so that they can be shown by statisticsDialog. ------------------------------------------------------------------------ r680 | fabricecolin | 2007-01-09 13:22:54 +0100 (Tue, 09 Jan 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/queryDialog_glade.cc Renamed variable. ------------------------------------------------------------------------ r679 | fabricecolin | 2007-01-09 13:22:18 +0100 (Tue, 09 Jan 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh Open statisticsDialog when Session, Statistics is activated. Added .desktop file's Comment to about box to help with translation. ------------------------------------------------------------------------ r678 | fabricecolin | 2007-01-09 13:19:23 +0100 (Tue, 09 Jan 2007) | 4 lines Changed paths: M /trunk/Index/XapianDatabase.cpp M /trunk/UI/GTK2/src/indexDialog.cc M /trunk/UI/GTK2/src/indexDialog_glade.cc M /trunk/UI/GTK2/src/indexDialog_glade.hh Changes to remote indexes backend and UI to allow using xapian-progsrv with SSH. Note this is disabled in the UI until I figure out the best way to prompt for a password. 
------------------------------------------------------------------------ r677 | fabricecolin | 2007-01-09 13:17:46 +0100 (Tue, 09 Jan 2007) | 3 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade Moved Preferences under Edit in mainWindow, added Statistics under Session. Revamped indexDialog. ------------------------------------------------------------------------ r676 | fabricecolin | 2007-01-07 15:17:12 +0100 (Sun, 07 Jan 2007) | 2 lines Changed paths: M /trunk/UI/GTK2/src/importDialog.cc Removed long gone header. ------------------------------------------------------------------------ r675 | fabricecolin | 2007-01-07 15:03:19 +0100 (Sun, 07 Jan 2007) | 6 lines Changed paths: M /trunk/SQL/ActionQueue.cpp M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h M /trunk/SQL/SQLiteBase.cpp In SQLiteBase::open(), we may have to close the handle returned by sqlite3_open() when it doesn't return SQLITE_OK ! Changed prototype of some of CrawlHistory's methods, fixed unescaping in getSources(). Less DEBUG in ActionQueue. ------------------------------------------------------------------------ r673 | fabricecolin | 2006-12-21 15:59:10 +0100 (Thu, 21 Dec 2006) | 2 lines Changed paths: M /trunk/NEWS Changes since previous release. ------------------------------------------------------------------------ r672 | fabricecolin | 2006-12-21 15:58:31 +0100 (Thu, 21 Dec 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/nl.po M /trunk/po/sv.po Force locale to UTF-8. This is very hackish but should help with a so far unexplained crash when it's something else, eg fr_FR.ISO-8859-1. 
------------------------------------------------------------------------ r671 | fabricecolin | 2006-12-20 14:20:53 +0100 (Wed, 20 Dec 2006) | 2 lines Changed paths: M /trunk/Collect/pinot-collect.1 M /trunk/Index/pinot-index.1 M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 Updated with new version number. ------------------------------------------------------------------------ r670 | fabricecolin | 2006-12-20 14:19:37 +0100 (Wed, 20 Dec 2006) | 2 lines Changed paths: M /trunk/configure.in Bumped version number. ------------------------------------------------------------------------ r669 | fabricecolin | 2006-12-20 14:16:58 +0100 (Wed, 20 Dec 2006) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/nl.po M /trunk/po/sv.po Synced translations with source. ------------------------------------------------------------------------ r668 | fabricecolin | 2006-12-20 14:01:19 +0100 (Wed, 20 Dec 2006) | 2 lines Changed paths: M /trunk/pinot.desktop Removed Application from Categories to keep desktop-file-install 0.11 happy. ------------------------------------------------------------------------ r667 | fabricecolin | 2006-12-20 14:00:20 +0100 (Wed, 20 Dec 2006) | 2 lines Changed paths: M /trunk/Tokenize/HtmlTokenizer.cpp Error and warning handlers log only in DEBUG mode. ------------------------------------------------------------------------ r666 | fabricecolin | 2006-12-19 12:10:30 +0100 (Tue, 19 Dec 2006) | 5 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h In removeCommonTerms(), be prepared for some terms to not be present in the document, eg if the document was indexed with a previous version that didn't create the same terms. Changed addCommonTerms() prototype. 
------------------------------------------------------------------------ r665 | fabricecolin | 2006-12-19 12:08:19 +0100 (Tue, 19 Dec 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc The daemon doesn't have to load engine plugins. ------------------------------------------------------------------------ r664 | fabricecolin | 2006-12-18 11:53:10 +0100 (Mon, 18 Dec 2006) | 2 lines Changed paths: M /trunk/Tokenize/TagLibTokenizer.h Corrected define typo. ------------------------------------------------------------------------ r663 | fabricecolin | 2006-12-18 11:52:30 +0100 (Mon, 18 Dec 2006) | 2 lines Changed paths: M /trunk/README M /trunk/TODO Minus 9 items, plus 3. ------------------------------------------------------------------------ r662 | fabricecolin | 2006-12-16 05:03:18 +0100 (Sat, 16 Dec 2006) | 3 lines Changed paths: M /trunk/Tokenize/TagLibTokenizer.cpp Save the document's content to a temporary file if necessary. Don't request all documents. ------------------------------------------------------------------------ r661 | fabricecolin | 2006-12-16 03:59:34 +0100 (Sat, 16 Dec 2006) | 2 lines Changed paths: M /trunk/Index/pinot-index.cpp Fixed argument count checking. ------------------------------------------------------------------------ r660 | fabricecolin | 2006-12-15 12:38:56 +0100 (Fri, 15 Dec 2006) | 2 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp Labelling methods allocate arrays on the heap. ------------------------------------------------------------------------ r659 | fabricecolin | 2006-12-15 12:35:28 +0100 (Fri, 15 Dec 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc When documents are relabeled, call browse_index() so that the list is cleared if necessary.
------------------------------------------------------------------------ r658 | fabricecolin | 2006-12-14 13:21:23 +0100 (Thu, 14 Dec 2006) | 2 lines Changed paths: M /trunk/AUTHORS Removed specifics of who suggested what, I can't keep track :-) ------------------------------------------------------------------------ r657 | fabricecolin | 2006-12-14 13:20:18 +0100 (Thu, 14 Dec 2006) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Utils/TimeConverter.cpp M /trunk/Utils/TimeConverter.h Date terms Dyyyymmdd, Myyyymm and Yyyyy. ------------------------------------------------------------------------ r656 | fabricecolin | 2006-12-14 12:26:16 +0100 (Thu, 14 Dec 2006) | 2 lines Changed paths: M /trunk/README Document which filters apply to all engines. ------------------------------------------------------------------------ r655 | fabricecolin | 2006-12-14 12:18:57 +0100 (Thu, 14 Dec 2006) | 2 lines Changed paths: M /trunk/configure.in M /trunk/pinot.spec.in New option --enable-debug. Renamed --with-soap to --enable-soap. ------------------------------------------------------------------------ r654 | fabricecolin | 2006-12-13 14:24:19 +0100 (Wed, 13 Dec 2006) | 2 lines Changed paths: M /trunk/Index/pinot-index.cpp Unload tokenizers on exit. ------------------------------------------------------------------------ r653 | fabricecolin | 2006-12-13 13:49:55 +0100 (Wed, 13 Dec 2006) | 6 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/UI/GTK2/src/ServerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml Added method setDocumentsLabels() to IndexInterface. DBusXapianIndex's version maps to a DBus method. This is definitely faster than calling SetDocumentLabels on each document. 
Note that DBusXapianIndex works around a problem with dbus_g_proxy_call(), which modifies the first argument's pointer. ------------------------------------------------------------------------ r652 | fabricecolin | 2006-12-12 14:47:49 +0100 (Tue, 12 Dec 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Activate the firstRun flag if the configuration file couldn't be found. ------------------------------------------------------------------------ r651 | fabricecolin | 2006-12-12 14:46:55 +0100 (Tue, 12 Dec 2006) | 2 lines Changed paths: M /trunk/Utils/Document.cpp In setDataFromFile(), set the document's timestamp and size. ------------------------------------------------------------------------ r650 | fabricecolin | 2006-12-12 14:45:38 +0100 (Tue, 12 Dec 2006) | 2 lines Changed paths: M /trunk/Index/pinot-index.cpp Load and unload tokenizers !!! ------------------------------------------------------------------------ r649 | fabricecolin | 2006-12-12 14:44:34 +0100 (Tue, 12 Dec 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h Store the charset in QueryHistory, convert the extract to UTF-8 before display and catch (conversion) exceptions in updateRow(). ------------------------------------------------------------------------ r648 | fabricecolin | 2006-12-12 14:42:19 +0100 (Tue, 12 Dec 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc Replaced setenv() with Glib::setenv(). ------------------------------------------------------------------------ r647 | fabricecolin | 2006-12-12 14:40:00 +0100 (Tue, 12 Dec 2006) | 3 lines Changed paths: M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/QueryHistory.h Store the result's charset in the so far unused Language column. I ought to rename the column now... 
------------------------------------------------------------------------ r646 | fabricecolin | 2006-12-12 14:38:15 +0100 (Tue, 12 Dec 2006) | 3 lines Changed paths: M /trunk/Search/AbstractGenerator.cpp M /trunk/Search/WebEngine.cpp Convert strings to UTF-8 before escaping them for markup. This prevents a crash when the locale is not UTF-8. ------------------------------------------------------------------------ r645 | fabricecolin | 2006-12-11 15:52:25 +0100 (Mon, 11 Dec 2006) | 2 lines Changed paths: M /trunk/configure.in Make sure all binaries are linked against pthreads, not just the UI. ------------------------------------------------------------------------ r644 | fabricecolin | 2006-12-11 15:49:46 +0100 (Mon, 11 Dec 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h Removed unused member. ------------------------------------------------------------------------ r643 | fabricecolin | 2006-12-11 14:29:09 +0100 (Mon, 11 Dec 2006) | 2 lines Changed paths: M /trunk/scripts/bash/pinot-enum-index.sh Skip non-files silently. ------------------------------------------------------------------------ r642 | fabricecolin | 2006-12-11 14:24:50 +0100 (Mon, 11 Dec 2006) | 3 lines Changed paths: M /trunk/Index/XapianIndex.cpp Additional check on document's data before calling scanDocument(). In updateDocument(), if the document has no data, don't return with an error. 
------------------------------------------------------------------------ r641 | fabricecolin | 2006-12-10 15:34:22 +0100 (Sun, 10 Dec 2006) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.h M /trunk/Collect/DownloaderFactory.h M /trunk/Collect/FileCollector.h M /trunk/Collect/MboxCollector.h M /trunk/Collect/NeonDownloader.h M /trunk/Collect/XapianCollector.h M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexFactory.h M /trunk/Index/LanguageDetector.h M /trunk/Index/XapianDatabase.h M /trunk/Index/XapianDatabaseFactory.h M /trunk/Index/XapianIndex.h M /trunk/Monitor/INotifyMonitor.h M /trunk/Monitor/MonitorEvent.h M /trunk/Monitor/MonitorFactory.h M /trunk/Monitor/MonitorHandler.h M /trunk/Monitor/MonitorInterface.h M /trunk/SQL/ActionQueue.h M /trunk/SQL/CrawlHistory.h M /trunk/SQL/QueryHistory.h M /trunk/SQL/SQLiteBase.h M /trunk/SQL/ViewHistory.h M /trunk/Search/AbstractGenerator.h M /trunk/Search/Google/GoogleAPIEngine.h M /trunk/Search/OpenSearchParser.h M /trunk/Search/PluginParsers.h M /trunk/Search/QueryProperties.h M /trunk/Search/SearchEngineFactory.h M /trunk/Search/SearchEngineInterface.h M /trunk/Search/SearchPluginProperties.h M /trunk/Search/SherlockParser.h M /trunk/Search/WebEngine.h M /trunk/Search/XapianEngine.h M /trunk/Tokenize/HtmlTokenizer.h M /trunk/Tokenize/OpenDocumentTokenizer.h M /trunk/Tokenize/PdfTokenizer.h M /trunk/Tokenize/RtfTokenizer.h M /trunk/Tokenize/TagLibTokenizer.h M /trunk/Tokenize/Tokenizer.h M /trunk/Tokenize/TokenizerFactory.h M /trunk/Tokenize/UnknownTypeTokenizer.h M /trunk/Tokenize/WordTokenizer.h M /trunk/Tokenize/XmlTokenizer.h M /trunk/Utils/CommandLine.h M /trunk/Utils/Document.h M /trunk/Utils/DocumentInfo.h M /trunk/Utils/IndexedDocument.h M /trunk/Utils/Languages.h M /trunk/Utils/MIMEScanner.h M /trunk/Utils/MboxParser.h M /trunk/Utils/Result.h M /trunk/Utils/StringManip.h M /trunk/Utils/TimeConverter.h M /trunk/Utils/Timer.h M /trunk/Utils/Url.h Minimal class documentation. 
------------------------------------------------------------------------ r640 | fabricecolin | 2006-12-10 15:02:31 +0100 (Sun, 10 Dec 2006) | 3 lines Changed paths: M /trunk/scripts/bash/pinot-enum-index.sh Delve might be called xapian-delve, so check for that. Run bash not sh. ------------------------------------------------------------------------ r639 | fabricecolin | 2006-12-10 08:18:22 +0100 (Sun, 10 Dec 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/Makefile.am A /trunk/UI/GTK2/src/ServerThreads.cpp A /trunk/UI/GTK2/src/ServerThreads.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Let D-Bus message be processed and replied to by DBusServletThread. Moved Monitor and DirectoryScannerThread to ServerThreads. ------------------------------------------------------------------------ r638 | fabricecolin | 2006-12-10 08:09:09 +0100 (Sun, 10 Dec 2006) | 2 lines Changed paths: M /trunk/Tokenize/Tokenizer.h M /trunk/Tokenize/TokenizerFactory.cpp Don't request any data for unknown types. ------------------------------------------------------------------------ r637 | fabricecolin | 2006-12-09 06:46:02 +0100 (Sat, 09 Dec 2006) | 2 lines Changed paths: M /trunk/pinot.spec.in Mark files in sysconfdir with %config(noreplace). ------------------------------------------------------------------------ r636 | fabricecolin | 2006-12-09 04:36:11 +0100 (Sat, 09 Dec 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml When a query returns, only relabel documents that are already indexed. Fixed description of GetDocumentLabels method. ------------------------------------------------------------------------ r635 | fabricecolin | 2006-12-09 04:19:34 +0100 (Sat, 09 Dec 2006) | 2 lines Changed paths: M /trunk/TODO More stuff to do... 
------------------------------------------------------------------------ r634 | fabricecolin | 2006-12-09 03:20:37 +0100 (Sat, 09 Dec 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/prefsDialog.cc First time Pinot is launched, open the preferences and show the Indexing tab. ------------------------------------------------------------------------ r633 | fabricecolin | 2006-12-08 15:59:01 +0100 (Fri, 08 Dec 2006) | 5 lines Changed paths: M /trunk/Index/LanguageDetector.cpp M /trunk/Makefile.am M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/configure.in M /trunk/pinot.spec.in Install configuration files under $sysconfdir/pinot, tokenizers under $libdir/pinot/tokenizers. Search plugins are not configuration files so can stay under $datadir. Some other tweaks to the spec file suggested by Neal Becker. ------------------------------------------------------------------------ r632 | fabricecolin | 2006-12-08 12:30:24 +0100 (Fri, 08 Dec 2006) | 2 lines Changed paths: M /trunk/configure.in Complain bitterly if textcat.h is not found. ------------------------------------------------------------------------ r631 | fabricecolin | 2006-12-08 12:17:59 +0100 (Fri, 08 Dec 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Don't call setDocumentLabels() if there were and still are no labels. ------------------------------------------------------------------------ r630 | fabricecolin | 2006-12-08 12:12:48 +0100 (Fri, 08 Dec 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc Better initialize D-Bus in the UI too. Not sure why this wasn't a problem earlier. ------------------------------------------------------------------------ r629 | fabricecolin | 2006-12-08 12:11:44 +0100 (Fri, 08 Dec 2006) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp Fixed extraction of language and size from the document data. 
------------------------------------------------------------------------ r628 | fabricecolin | 2006-12-08 12:11:00 +0100 (Fri, 08 Dec 2006) | 3 lines Changed paths: M /trunk/Utils/CommandLine.cpp Expand %f, %d and %n correctly. When there is no suitable parameter, or none at all, the command line wasn't correct either. ------------------------------------------------------------------------ r627 | fabricecolin | 2006-12-07 14:12:25 +0100 (Thu, 07 Dec 2006) | 3 lines Changed paths: M /trunk/configure.in From Thierry Thomas : determine which of '-pthread' '-lc_r' '-lthr' and '-lpthread' is needed to link against pthreads. ------------------------------------------------------------------------ r626 | fabricecolin | 2006-12-07 12:26:49 +0100 (Thu, 07 Dec 2006) | 2 lines Changed paths: M /trunk/pinot.spec.in Require gsoap > 2.7.8c. ------------------------------------------------------------------------ r625 | fabricecolin | 2006-12-06 15:35:01 +0100 (Wed, 06 Dec 2006) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.h M /trunk/Collect/DownloaderFactory.cpp M /trunk/Collect/DownloaderFactory.h M /trunk/Collect/DownloaderInterface.cpp M /trunk/Collect/DownloaderInterface.h M /trunk/Collect/NeonDownloader.cpp M /trunk/Collect/NeonDownloader.h M /trunk/Collect/XapianCollector.cpp M /trunk/Collect/XapianCollector.h M /trunk/Collect/pinot-collect.cpp M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexFactory.cpp M /trunk/Index/IndexFactory.h M /trunk/Index/IndexInterface.h M /trunk/Index/XapianDatabase.cpp M /trunk/Index/XapianDatabase.h M /trunk/Index/XapianDatabaseFactory.cpp M /trunk/Index/XapianDatabaseFactory.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/Index/pinot-index.cpp M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/INotifyMonitor.h M /trunk/Monitor/MonitorFactory.cpp M /trunk/Monitor/MonitorFactory.h M /trunk/Monitor/MonitorInterface.h M /trunk/Search/Google/GoogleAPIEngine.cpp M 
/trunk/Search/Google/GoogleAPIEngine.h M /trunk/Search/Google/GoogleSearch.h M /trunk/Search/SearchEngineFactory.cpp M /trunk/Search/SearchEngineFactory.h M /trunk/Search/SearchEngineInterface.cpp M /trunk/Search/SearchEngineInterface.h M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h M /trunk/Search/pinot-search.cpp M /trunk/Search/plugintest.cpp M /trunk/Tokenize/HtmlTokenizer.cpp M /trunk/Tokenize/HtmlTokenizer.h M /trunk/Tokenize/Tokenizer.cpp M /trunk/Tokenize/Tokenizer.h M /trunk/Tokenize/TokenizerFactory.cpp M /trunk/Tokenize/TokenizerFactory.h M /trunk/Tokenize/UnknownTypeTokenizer.cpp M /trunk/Tokenize/UnknownTypeTokenizer.h M /trunk/Tokenize/XmlTokenizer.cpp M /trunk/Tokenize/XmlTokenizer.h M /trunk/Tokenize/tokenizertest.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/prefsDialog_glade.hh M /trunk/Utils/Languages.cpp M /trunk/Utils/Languages.h M /trunk/Utils/MboxParser.cpp M /trunk/Utils/MboxParser.h M /trunk/Utils/Result.cpp M /trunk/Utils/Result.h M /trunk/Utils/Timer.cpp M /trunk/Utils/Timer.h Reset executable bit on source files. ------------------------------------------------------------------------ r623 | fabricecolin | 2006-12-05 11:44:07 +0100 (Tue, 05 Dec 2006) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/NEWS Updated with the last few days' changes. Set release date to today. ------------------------------------------------------------------------ r622 | fabricecolin | 2006-12-05 11:42:17 +0100 (Tue, 05 Dec 2006) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po M /trunk/po/nl.po M /trunk/po/sv.po Synced po's with source.
------------------------------------------------------------------------ r621 | fabricecolin | 2006-12-05 11:33:24 +0100 (Tue, 05 Dec 2006) | 2 lines Changed paths: M /trunk/Index/XapianDatabase.cpp M /trunk/Index/XapianIndex.cpp Fewer DEBUG messages. ------------------------------------------------------------------------ r620 | fabricecolin | 2006-12-04 15:55:56 +0100 (Mon, 04 Dec 2006) | 3 lines Changed paths: M /trunk/Makefile.am M /trunk/configure.in M /trunk/pinot.spec.in Added nl and sv to list of languages. Changed how tokenizers are installed and packaged. ------------------------------------------------------------------------ r619 | fabricecolin | 2006-12-04 14:15:28 +0100 (Mon, 04 Dec 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Fixed deadlock, reported by Thierry Thomas. ------------------------------------------------------------------------ r618 | fabricecolin | 2006-12-04 11:42:17 +0100 (Mon, 04 Dec 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp Only return actual documents in getSelection(). ------------------------------------------------------------------------ r617 | fabricecolin | 2006-12-04 11:38:12 +0100 (Mon, 04 Dec 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Don't update the index list incrementally; do it when IndexBrowser returns. This fixes a crash on SMP systems due to calls to GTK by the UI and a worker thread. ------------------------------------------------------------------------ r616 | fabricecolin | 2006-12-04 11:31:41 +0100 (Mon, 04 Dec 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc Check whether Glib threads subsystem is initialized before calling init().
------------------------------------------------------------------------ r615 | fabricecolin | 2006-12-04 11:30:40 +0100 (Mon, 04 Dec 2006) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp The file name shouldn't be lower-cased. ------------------------------------------------------------------------ r614 | fabricecolin | 2006-12-04 01:00:56 +0100 (Mon, 04 Dec 2006) | 2 lines Changed paths: A /trunk/po/sv.po Swedish translation by Daniel Nylander. ------------------------------------------------------------------------ r613 | fabricecolin | 2006-12-03 16:37:44 +0100 (Sun, 03 Dec 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/po/es.po M /trunk/po/fr.po Minor corrections and new string in French and Spanish po files. ------------------------------------------------------------------------ r612 | fabricecolin | 2006-12-03 16:31:43 +0100 (Sun, 03 Dec 2006) | 2 lines Changed paths: A /trunk/po/nl.po Dutch translation by Tikkel, through Rosetta. ------------------------------------------------------------------------ r611 | fabricecolin | 2006-12-03 16:02:10 +0100 (Sun, 03 Dec 2006) | 3 lines Changed paths: M /trunk/Collect/pinot-collect.1 M /trunk/Index/pinot-index.1 M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Updated man pages. Increased version number in preparation for release in a few days time. ------------------------------------------------------------------------ r610 | fabricecolin | 2006-12-02 04:45:41 +0100 (Sat, 02 Dec 2006) | 2 lines Changed paths: M /trunk/SQL/SQLiteBase.cpp M /trunk/Tokenize/TagLibTokenizer.cpp M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/Notebook.cpp Fixed minor warnings. 
------------------------------------------------------------------------ r609 | fabricecolin | 2006-12-02 04:18:18 +0100 (Sat, 02 Dec 2006) | 2 lines Changed paths: M /trunk/Collect/NeonDownloader.cpp Deserved dusting off. ------------------------------------------------------------------------ r608 | fabricecolin | 2006-12-01 16:53:51 +0100 (Fri, 01 Dec 2006) | 2 lines Changed paths: M /trunk/Makefile.am Distribute the enum-index script. ------------------------------------------------------------------------ r607 | fabricecolin | 2006-12-01 16:53:01 +0100 (Fri, 01 Dec 2006) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/NEWS M /trunk/README M /trunk/TODO Status update. ------------------------------------------------------------------------ r606 | fabricecolin | 2006-12-01 16:48:46 +0100 (Fri, 01 Dec 2006) | 4 lines Changed paths: M /trunk/Index/XapianIndex.cpp The length of R-prefixed terms was not limited. This could raise "key too long" exceptions and prevent some documents from being indexed. Print the exception's errno when caught. Made DEBUG a bit more verbose. ------------------------------------------------------------------------ r605 | fabricecolin | 2006-12-01 14:14:06 +0100 (Fri, 01 Dec 2006) | 3 lines Changed paths: M /trunk/Tokenize/PdfTokenizer.cpp M /trunk/Tokenize/PdfTokenizer.h Pass -htmlmeta to pdftotext so that document text is wrapped in html with some useful metadata. ------------------------------------------------------------------------ r604 | fabricecolin | 2006-12-01 14:12:02 +0100 (Fri, 01 Dec 2006) | 2 lines Changed paths: M /trunk/pinot.spec.in Build with -O0 in debug mode. 
------------------------------------------------------------------------ r603 | fabricecolin | 2006-12-01 12:11:25 +0100 (Fri, 01 Dec 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h Glib::Dispatcher is supposed to allow multiple senders without locking, but some people on the gtkmm mailing list think otherwise :-) Better safe than sorry. I am adding a mutex in emitSignal(). ------------------------------------------------------------------------ r602 | fabricecolin | 2006-12-01 12:08:16 +0100 (Fri, 01 Dec 2006) | 4 lines Changed paths: M /trunk/Index/XapianDatabase.cpp M /trunk/Index/XapianDatabase.h Contrary to what I thought, simultaneous read accesses on the same Database object are not safe ! Of course, it's only on SMP systems that weird problems cropped up. ------------------------------------------------------------------------ r601 | fabricecolin | 2006-11-27 15:25:46 +0100 (Mon, 27 Nov 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/Utils/MboxParser.h Cosmetic mods. ------------------------------------------------------------------------ r600 | fabricecolin | 2006-11-27 13:54:22 +0100 (Mon, 27 Nov 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h Found out that SigC::Object/sigc::trackable doesn't have a virtual destructor for some obscure reason and that it should be inherited from virtually. Fixing this can only be a good thing... 
------------------------------------------------------------------------ r599 | fabricecolin | 2006-11-25 07:04:36 +0100 (Sat, 25 Nov 2006) | 5 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp M /trunk/Tokenize/HtmlTokenizer.cpp M /trunk/Tokenize/OpenDocumentTokenizer.cpp M /trunk/Tokenize/OpenDocumentTokenizer.h M /trunk/Tokenize/PdfTokenizer.cpp M /trunk/Tokenize/PdfTokenizer.h M /trunk/Tokenize/RtfTokenizer.cpp M /trunk/Tokenize/RtfTokenizer.h M /trunk/Tokenize/TagLibTokenizer.cpp M /trunk/Tokenize/TagLibTokenizer.h M /trunk/Tokenize/Tokenizer.cpp M /trunk/Tokenize/Tokenizer.h M /trunk/Tokenize/TokenizerFactory.cpp M /trunk/Tokenize/TokenizerFactory.h M /trunk/Tokenize/WordTokenizer.cpp M /trunk/Tokenize/WordTokenizer.h M /trunk/Tokenize/XmlTokenizer.cpp M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/Utils/MboxParser.cpp M /trunk/Utils/MboxParser.h Preserve size and timestamp attributes determined when documents are retrieved, by a crawl, a download or an adapter. Don't load files that are going to be handled by helper applications that can access them directly. ------------------------------------------------------------------------ r598 | fabricecolin | 2006-11-25 06:57:11 +0100 (Sat, 25 Nov 2006) | 2 lines Changed paths: M /trunk/Utils/CommandLine.cpp M /trunk/Utils/CommandLine.h Static method to shell-quote strings. ------------------------------------------------------------------------ r597 | fabricecolin | 2006-11-25 06:47:31 +0100 (Sat, 25 Nov 2006) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Search/XapianEngine.cpp M /trunk/UI/GTK2/src/queryDialog.cc File extension stored in the index with prefix XEXT. 
------------------------------------------------------------------------ r596 | fabricecolin | 2006-11-23 14:04:12 +0100 (Thu, 23 Nov 2006) | 2 lines Changed paths: M /trunk/Utils/DocumentInfo.h Missed this in previous commit. ------------------------------------------------------------------------ r595 | fabricecolin | 2006-11-23 14:02:56 +0100 (Thu, 23 Nov 2006) | 2 lines Changed paths: M /trunk/Collect/XapianCollector.cpp M /trunk/Index/XapianIndex.cpp M /trunk/Index/pinot-index.cpp M /trunk/SQL/ActionQueue.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/Utils/DocumentInfo.cpp Added file size to DocumentInfo and the index (as data). ------------------------------------------------------------------------ r594 | fabricecolin | 2006-11-22 14:13:41 +0100 (Wed, 22 Nov 2006) | 3 lines Changed paths: M /trunk/Search/Google/GAPIC.cpp M /trunk/Search/Google/GAPIClient.cpp M /trunk/Search/Google/GAPIClientLib.cpp M /trunk/Search/Google/GAPIGoogleSearchBindingProxy.h M /trunk/Search/Google/GAPIH.h M /trunk/Search/Google/GAPIStub.h M /trunk/Search/Google/GoogleSearch.h M /trunk/Search/Google/Makefile.am M /trunk/Search/SOAPEnvH.h M /trunk/Search/SOAPEnvStub.h Regenerated SOAP stubs with gSOAP v2.7.8c. Applied same fixes to GAPIC.cpp as in revision 189. ------------------------------------------------------------------------ r593 | fabricecolin | 2006-11-21 12:26:46 +0100 (Tue, 21 Nov 2006) | 6 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Disconnect and get threads to stop before closing stuff down and exiting. Worker threads that have a signal disconnect it in their stop() method. DirectoryScanner didn't check whether it was supposed to exit. While crawling, the daemon should now be able to stop gracefully, whether because it was signalled or the Stop method was invoked. 
------------------------------------------------------------------------ r592 | fabricecolin | 2006-11-19 07:35:00 +0100 (Sun, 19 Nov 2006) | 3 lines Changed paths: A /trunk/SQL/ActionQueue.cpp A /trunk/SQL/ActionQueue.h M /trunk/SQL/Makefile.am M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc Replaced std::queue-based indexing queue with a database table appropriately named ActionQueue. This seems to lower first-crawl memory usage a good deal. ------------------------------------------------------------------------ r591 | fabricecolin | 2006-11-19 07:03:24 +0100 (Sun, 19 Nov 2006) | 2 lines Changed paths: D /trunk/UI/pinot-live.py This was moved elsewhere. ------------------------------------------------------------------------ r590 | fabricecolin | 2006-11-19 07:02:31 +0100 (Sun, 19 Nov 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Set the daemon's scheduling priority to 15 (default) or any value passed with "--priority". This already makes a big difference. Changing the I/O scheduling class and priority may not be necessary ? ------------------------------------------------------------------------ r589 | fabricecolin | 2006-11-19 06:57:54 +0100 (Sun, 19 Nov 2006) | 4 lines Changed paths: M /trunk/Makefile.am A /trunk/scripts A /trunk/scripts/bash A /trunk/scripts/bash/pinot-enum-index.sh A /trunk/scripts/python A /trunk/scripts/python/pinot-live.py (from /trunk/UI/pinot-live.py:587) Moved pinot-live.py into scripts/python. Bash script scripts/bash/pinot-enum-index.sh enumerates files in an index and gives an estimate of the corresponding disk space. It needs delve, du and dc. 
------------------------------------------------------------------------ r587 | fabricecolin | 2006-11-18 08:36:10 +0100 (Sat, 18 Nov 2006) | 2 lines Changed paths: M /trunk/Collect/pinot-collect.1 M /trunk/Index/pinot-index.1 M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 Updated version in man pages. ------------------------------------------------------------------------ r586 | fabricecolin | 2006-11-18 08:24:34 +0100 (Sat, 18 Nov 2006) | 2 lines Changed paths: M /trunk/NEWS M /trunk/configure.in M /trunk/po/es.po M /trunk/po/fr.po Updating po and NEWS, releasing 0.63 today. ------------------------------------------------------------------------ r585 | fabricecolin | 2006-11-18 07:23:24 +0100 (Sat, 18 Nov 2006) | 2 lines Changed paths: M /trunk/Utils/TimeConverter.cpp Initialize struct tm to keep valgrind happy. ------------------------------------------------------------------------ r584 | fabricecolin | 2006-11-18 07:17:18 +0100 (Sat, 18 Nov 2006) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/TODO Mention Nicolas Velin's contribution to the French po. Removed from TODO a couple of items. ------------------------------------------------------------------------ r583 | fabricecolin | 2006-11-17 12:09:16 +0100 (Fri, 17 Nov 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/mainWindow.cc Added msf and sh to default filter list. Treat documents on https as text/html if no handler application is found. 
------------------------------------------------------------------------ r582 | fabricecolin | 2006-11-17 12:07:53 +0100 (Fri, 17 Nov 2006) | 3 lines Changed paths: M /trunk/Tokenize/HtmlTokenizer.cpp M /trunk/Tokenize/HtmlTokenizer.h M /trunk/Tokenize/OpenDocumentTokenizer.cpp M /trunk/Tokenize/PdfTokenizer.cpp M /trunk/Tokenize/PdfTokenizer.h M /trunk/Tokenize/RtfTokenizer.cpp M /trunk/Tokenize/TagLibTokenizer.cpp M /trunk/Tokenize/TagLibTokenizer.h M /trunk/Tokenize/Tokenizer.cpp M /trunk/Tokenize/UnknownTypeTokenizer.cpp M /trunk/Tokenize/UnknownTypeTokenizer.h M /trunk/Tokenize/XmlTokenizer.cpp M /trunk/Tokenize/XmlTokenizer.h Fixed pretty bad memory leak. Temporary Document objects were not freed most of the time. ------------------------------------------------------------------------ r581 | fabricecolin | 2006-11-17 12:05:41 +0100 (Fri, 17 Nov 2006) | 2 lines Changed paths: M /trunk/Search/Plugins/Yahoo.src Removed superfluous parameters. ------------------------------------------------------------------------ r580 | fabricecolin | 2006-11-17 12:04:31 +0100 (Fri, 17 Nov 2006) | 3 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp Watch out for NULL characters in the data received. This happens sometimes with Yahoo! queries that return non-Latin results. ------------------------------------------------------------------------ r579 | fabricecolin | 2006-11-17 12:01:45 +0100 (Fri, 17 Nov 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/Makefile.am M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/configure.in The DBus connection can be closed if dbus < 0.70 (eg Fedora Core 5). Unreferencing is not sufficient, it will raise a SIGABRT signal. Daemon catches exceptions, UI catches const char*.
------------------------------------------------------------------------ r578 | fabricecolin | 2006-11-14 16:16:47 +0100 (Tue, 14 Nov 2006) | 2 lines Changed paths: M /trunk/Search/QueryProperties.cpp M /trunk/Search/WebEngine.cpp M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h XapianEngine::validateQuery() is now useful :-) WebEngine is less verbose. ------------------------------------------------------------------------ r577 | fabricecolin | 2006-11-14 15:00:30 +0100 (Tue, 14 Nov 2006) | 3 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po Added new strings since 0.62, merged in fixes to French po made by Nicolas Velin prior to updating templates in Rosetta. ------------------------------------------------------------------------ r576 | fabricecolin | 2006-11-14 14:23:10 +0100 (Tue, 14 Nov 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/mainWindow.cc Minor mods. ------------------------------------------------------------------------ r575 | fabricecolin | 2006-11-13 12:42:51 +0100 (Mon, 13 Nov 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/mainWindow.cc Extract was not copied into the clipboard and result rows not correctly. ------------------------------------------------------------------------ r574 | fabricecolin | 2006-11-13 12:40:33 +0100 (Mon, 13 Nov 2006) | 2 lines Changed paths: M /trunk/Search/Google/GoogleAPIEngine.cpp M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/WebEngine.cpp M /trunk/Search/WebEngine.h Use QueryProperties::getTerms() as it skips filters. 
------------------------------------------------------------------------ r573 | fabricecolin | 2006-11-12 10:05:04 +0100 (Sun, 12 Nov 2006) | 4 lines Changed paths: M /trunk/Search/AbstractGenerator.cpp M /trunk/Search/Google/GoogleAPIEngine.cpp M /trunk/Search/Google/GoogleAPIEngine.h M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/SearchEngineInterface.cpp M /trunk/Search/SearchEngineInterface.h M /trunk/Search/WebEngine.cpp M /trunk/Search/WebEngine.h M /trunk/Search/XapianEngine.cpp M /trunk/Utils/Result.cpp M /trunk/Utils/Result.h WebEngine now handles filtering results and highlighting abstracts, similarly to AbstractGenerator. Escape strings in markup. Minor cleanup. ------------------------------------------------------------------------ r572 | fabricecolin | 2006-11-12 06:11:06 +0100 (Sun, 12 Nov 2006) | 2 lines Changed paths: M /trunk/README Mention freedesktop.org's Autostart. ------------------------------------------------------------------------ r571 | fabricecolin | 2006-11-12 06:08:48 +0100 (Sun, 12 Nov 2006) | 5 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc In ResultsTree, don't re-parse the abstract, just rely on included markup. Replaced the extract text view with a list to make this easier. Groups are highlighted (search engine/host name). Let columns autoresize in the results, index, engines and query trees. ------------------------------------------------------------------------ r570 | fabricecolin | 2006-11-12 06:03:36 +0100 (Sun, 12 Nov 2006) | 4 lines Changed paths: M /trunk/Search/AbstractGenerator.cpp M /trunk/Search/AbstractGenerator.h M /trunk/Search/XapianEngine.cpp Use a simple markup on terms that should be highlighted, so that the abstract doesn't have to be parsed again when displayed. 
That markup is conveniently the same as Pango's :-) ------------------------------------------------------------------------ r569 | fabricecolin | 2006-11-12 05:52:30 +0100 (Sun, 12 Nov 2006) | 2 lines Changed paths: M /trunk/Makefile.am M /trunk/configure.in A /trunk/pinot-dbus-daemon.desktop M /trunk/pinot.spec.in Let Autostart handle the daemon. ------------------------------------------------------------------------ r567 | fabricecolin | 2006-11-04 14:17:36 +0100 (Sat, 04 Nov 2006) | 2 lines Changed paths: M /trunk/NEWS List bug fixes and new features since 0.61. ------------------------------------------------------------------------ r566 | fabricecolin | 2006-11-04 14:13:38 +0100 (Sat, 04 Nov 2006) | 3 lines Changed paths: D /trunk/po/en.po M /trunk/po/es.po M /trunk/po/fr.po Updated translations. I am taking over the Spanish po... only temporarily hopefully :-) ------------------------------------------------------------------------ r565 | fabricecolin | 2006-11-04 11:25:00 +0100 (Sat, 04 Nov 2006) | 2 lines Changed paths: M /trunk/pinot.spec.in Merged text-docs into main package since most people only download that. ------------------------------------------------------------------------ r564 | fabricecolin | 2006-11-04 11:23:46 +0100 (Sat, 04 Nov 2006) | 2 lines Changed paths: M /trunk/AUTHORS M /trunk/TODO Updates. ------------------------------------------------------------------------ r563 | fabricecolin | 2006-11-04 06:59:03 +0100 (Sat, 04 Nov 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc Some strings were not localized. Don't alarm the user if the daemon index cannot be opened; it just means it has not been created yet. ------------------------------------------------------------------------ r562 | fabricecolin | 2006-11-04 06:46:01 +0100 (Sat, 04 Nov 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/importDialog.cc Removed unused variable.
------------------------------------------------------------------------ r561 | fabricecolin | 2006-11-04 05:13:46 +0100 (Sat, 04 Nov 2006) | 4 lines Changed paths: M /trunk/README Mention patterns, that the Flint backend is immune to the lock problem, when monitor events are acted on, and that directory filters work as expected with Xapian 0.9.8. ------------------------------------------------------------------------ r560 | fabricecolin | 2006-11-04 05:11:16 +0100 (Sat, 04 Nov 2006) | 5 lines Changed paths: M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp DirectoryScanner checks whether files and directories are blacklisted so that it can skip crawling those that are and remove those that didn't used to be. Use unindexDocument() overload where it makes sense. DEBUG output a bit less verbose. ------------------------------------------------------------------------ r559 | fabricecolin | 2006-11-04 05:08:03 +0100 (Sat, 04 Nov 2006) | 2 lines Changed paths: M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h Overloaded unindexDocument() to allow unindexing by location. ------------------------------------------------------------------------ r558 | fabricecolin | 2006-11-03 14:40:25 +0100 (Fri, 03 Nov 2006) | 2 lines Changed paths: M /trunk/Search/Plugins/Accoona.src M /trunk/Search/Plugins/Exalead.src Caught up with modified output. ------------------------------------------------------------------------ r557 | fabricecolin | 2006-11-03 13:40:34 +0100 (Fri, 03 Nov 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp Added archive formats to file patterns blacklist. ------------------------------------------------------------------------ r556 | fabricecolin | 2006-11-03 13:21:41 +0100 (Fri, 03 Nov 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Changed message. 
------------------------------------------------------------------------ r555 | fabricecolin | 2006-11-03 12:36:20 +0100 (Fri, 03 Nov 2006) | 2 lines Changed paths: M /trunk/Collect/pinot-collect.1 M /trunk/Index/pinot-index.1 M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Preparing for imminent 0.62 release. Updated man pages. ------------------------------------------------------------------------ r554 | fabricecolin | 2006-11-03 12:15:01 +0100 (Fri, 03 Nov 2006) | 6 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Don't close the D-Bus connection, unreference it. This removes the error message "Applications can not close shared connections. Please fix this in your app. Ignoring close request and continuing." seen when the daemon exits. Any such message seen when calling one of our methods comes from the client; this holds true for dbus-send ;-) ------------------------------------------------------------------------ r553 | fabricecolin | 2006-11-03 12:02:52 +0100 (Fri, 03 Nov 2006) | 2 lines Changed paths: M /trunk/Tokenize/TagLibTokenizer.cpp Minor modification. ------------------------------------------------------------------------ r552 | fabricecolin | 2006-11-03 12:02:17 +0100 (Fri, 03 Nov 2006) | 5 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp When receiving a WRITE_CLOSED event, ensure the file was actually modified before being closed. This should fix a major bug where the daemon loops endlessly reindexing mp3/ogg files. It seems the event is generated by the TagLib tokenizer. ------------------------------------------------------------------------ r551 | fabricecolin | 2006-11-02 15:40:26 +0100 (Thu, 02 Nov 2006) | 2 lines Changed paths: M /trunk/Utils/MIMEScanner.cpp Less verbose. 
------------------------------------------------------------------------ r550 | fabricecolin | 2006-11-02 15:39:45 +0100 (Thu, 02 Nov 2006) | 3 lines Changed paths: M /trunk/README New section on how to reset indexes. Also added blurb about dbus-send and file patterns. ------------------------------------------------------------------------ r549 | fabricecolin | 2006-11-02 12:53:36 +0100 (Thu, 02 Nov 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh If queue_index() fails, display the reason string. In importDialog, start pulsing the progress bar only when we know for sure that the document was queued for indexing. ------------------------------------------------------------------------ r548 | fabricecolin | 2006-11-02 12:48:45 +0100 (Thu, 02 Nov 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/prefsDialog_glade.hh File patterns are editable through the Preferences dialog box. ------------------------------------------------------------------------ r547 | fabricecolin | 2006-11-02 12:46:18 +0100 (Thu, 02 Nov 2006) | 6 lines Changed paths: M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h Load and save glob patterns for files that shouldn't be indexed. Both ThreadsManager::index_document() and OnDiskHandler::indexFile() now check files with PinotSettings::isBlackListed(). Index_document() returns why it failed. PinotSettings saves the current version string to allow automatic upgrades.
------------------------------------------------------------------------ r546 | fabricecolin | 2006-11-01 17:07:12 +0100 (Wed, 01 Nov 2006) | 2 lines Changed paths: M /trunk/Utils/Document.cpp No point trying to mmap() an empty file. ------------------------------------------------------------------------ r545 | fabricecolin | 2006-11-01 14:58:32 +0100 (Wed, 01 Nov 2006) | 3 lines Changed paths: M /trunk/Search/XapianEngine.cpp XDIR and XLABEL always include a colon at indexing time, and they should here as well. ------------------------------------------------------------------------ r544 | fabricecolin | 2006-10-31 15:19:54 +0100 (Tue, 31 Oct 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/PinotUtils.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/mainWindow.cc Queries' Last Run column and index lists' Timestamp column were sorted alphabetically ! ------------------------------------------------------------------------ r543 | fabricecolin | 2006-10-30 15:55:10 +0100 (Mon, 30 Oct 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Refresh index lists when labels have changed and the combo is reset. ------------------------------------------------------------------------ r542 | fabricecolin | 2006-10-29 10:42:04 +0100 (Sun, 29 Oct 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc Fixed warnings. ------------------------------------------------------------------------ r541 | fabricecolin | 2006-10-29 10:05:51 +0100 (Sun, 29 Oct 2006) | 4 lines Changed paths: M /trunk/Index/LanguageDetector.cpp M /trunk/Index/Makefile.am M /trunk/Makefile.am M /trunk/configure.in If libtextcat has textcat_Cat(), set HAVE_TEXTCAT_CAT to use this function instead of textcat_Classify(). 
Install textcat3_conf.txt, so that if textcat_Version() returns version 3 we can initialize the library with it. ------------------------------------------------------------------------ r540 | fabricecolin | 2006-10-29 09:26:23 +0100 (Sun, 29 Oct 2006) | 3 lines Changed paths: M /trunk/Utils/MIMEScanner.cpp M /trunk/configure.in Get the shared-mime-info prefix and use that instead of PREFIX to determine where MIME files are to be found. ------------------------------------------------------------------------ r539 | fabricecolin | 2006-10-27 17:39:17 +0200 (Fri, 27 Oct 2006) | 2 lines Changed paths: A /trunk/textcat3_conf.txt A configuration file for language models included in the upcoming libtextcat 3.0. ------------------------------------------------------------------------ r538 | fabricecolin | 2006-10-24 14:41:18 +0200 (Tue, 24 Oct 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/EnginesTree.h M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/IndexPage.h M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/IndexTree.h M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/MboxHandler.h M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/Notebook.cpp M /trunk/UI/GTK2/src/Notebook.h M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.h M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/PinotUtils.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/indexDialog.cc M /trunk/UI/GTK2/src/indexDialog.hh M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc 
M /trunk/UI/GTK2/src/pinot.cc M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/propertiesDialog.hh M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog.hh Added missing copyright notice. ------------------------------------------------------------------------ r537 | fabricecolin | 2006-10-24 12:46:15 +0200 (Tue, 24 Oct 2006) | 2 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexFactory.cpp M /trunk/Index/IndexFactory.h M /trunk/Index/IndexInterface.h M /trunk/Index/LanguageDetector.cpp M /trunk/Index/LanguageDetector.h M /trunk/Index/XapianDatabase.cpp M /trunk/Index/XapianDatabase.h M /trunk/Index/XapianDatabaseFactory.cpp M /trunk/Index/XapianDatabaseFactory.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/Index/pinot-index.cpp M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/INotifyMonitor.h M /trunk/Monitor/MonitorEvent.cpp M /trunk/Monitor/MonitorEvent.h M /trunk/Monitor/MonitorFactory.cpp M /trunk/Monitor/MonitorFactory.h M /trunk/Monitor/MonitorHandler.cpp M /trunk/Monitor/MonitorHandler.h M /trunk/Monitor/MonitorInterface.h M /trunk/Search/Google/GoogleAPIEngine.cpp M /trunk/Search/Google/GoogleAPIEngine.h Added missing copyright notice. 
------------------------------------------------------------------------ r536 | fabricecolin | 2006-10-24 12:26:18 +0200 (Tue, 24 Oct 2006) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp M /trunk/Collect/CurlDownloader.h M /trunk/Collect/DownloaderFactory.cpp M /trunk/Collect/DownloaderFactory.h M /trunk/Collect/DownloaderInterface.cpp M /trunk/Collect/DownloaderInterface.h M /trunk/Collect/FileCollector.cpp M /trunk/Collect/FileCollector.h M /trunk/Collect/MboxCollector.cpp M /trunk/Collect/MboxCollector.h M /trunk/Collect/NeonDownloader.cpp M /trunk/Collect/NeonDownloader.h M /trunk/Collect/XapianCollector.cpp M /trunk/Collect/XapianCollector.h M /trunk/Collect/pinot-collect.cpp M /trunk/Search/AbstractGenerator.cpp M /trunk/Search/AbstractGenerator.h M /trunk/Search/OpenSearchParser.cpp M /trunk/Search/OpenSearchParser.h M /trunk/Search/PluginParsers.h M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/PluginWebEngine.h M /trunk/Search/QueryProperties.cpp M /trunk/Search/QueryProperties.h M /trunk/Search/SearchEngineFactory.cpp M /trunk/Search/SearchEngineFactory.h M /trunk/Search/SearchEngineInterface.cpp M /trunk/Search/SearchEngineInterface.h M /trunk/Search/SearchPluginProperties.cpp M /trunk/Search/SearchPluginProperties.h M /trunk/Search/SherlockParser.cpp M /trunk/Search/SherlockParser.h M /trunk/Search/WebEngine.cpp M /trunk/Search/WebEngine.h M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h M /trunk/Search/pinot-search.cpp M /trunk/Search/plugintest.cpp Added missing copyright notice.
------------------------------------------------------------------------ r535 | fabricecolin | 2006-10-24 12:12:18 +0200 (Tue, 24 Oct 2006) | 2 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/QueryHistory.h M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h M /trunk/SQL/ViewHistory.cpp M /trunk/SQL/ViewHistory.h M /trunk/SQL/historytest.cpp M /trunk/Tokenize/HtmlTokenizer.cpp M /trunk/Tokenize/HtmlTokenizer.h M /trunk/Tokenize/OpenDocumentTokenizer.cpp M /trunk/Tokenize/OpenDocumentTokenizer.h M /trunk/Tokenize/PdfTokenizer.cpp M /trunk/Tokenize/PdfTokenizer.h M /trunk/Tokenize/RtfTokenizer.cpp M /trunk/Tokenize/RtfTokenizer.h M /trunk/Tokenize/TagLibTokenizer.cpp M /trunk/Tokenize/TagLibTokenizer.h M /trunk/Tokenize/Tokenizer.cpp M /trunk/Tokenize/Tokenizer.h M /trunk/Tokenize/TokenizerFactory.cpp M /trunk/Tokenize/TokenizerFactory.h M /trunk/Tokenize/UnknownTypeTokenizer.cpp M /trunk/Tokenize/UnknownTypeTokenizer.h M /trunk/Tokenize/WordTokenizer.cpp M /trunk/Tokenize/WordTokenizer.h M /trunk/Tokenize/XmlTokenizer.cpp M /trunk/Tokenize/XmlTokenizer.h M /trunk/Tokenize/tokenizertest.cpp M /trunk/Utils/Url.h Do it properly !!! ------------------------------------------------------------------------ r534 | fabricecolin | 2006-10-24 07:48:19 +0200 (Tue, 24 Oct 2006) | 2 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/QueryHistory.h M /trunk/SQL/SQLiteBase.cpp M /trunk/SQL/SQLiteBase.h M /trunk/SQL/ViewHistory.cpp M /trunk/SQL/ViewHistory.h M /trunk/SQL/historytest.cpp Added missing copyright notice. 
------------------------------------------------------------------------ r533 | fabricecolin | 2006-10-24 07:46:56 +0200 (Tue, 24 Oct 2006) | 2 lines Changed paths: M /trunk/Tokenize/HtmlTokenizer.cpp M /trunk/Tokenize/HtmlTokenizer.h M /trunk/Tokenize/OpenDocumentTokenizer.cpp M /trunk/Tokenize/OpenDocumentTokenizer.h M /trunk/Tokenize/PdfTokenizer.cpp M /trunk/Tokenize/PdfTokenizer.h M /trunk/Tokenize/RtfTokenizer.cpp M /trunk/Tokenize/RtfTokenizer.h M /trunk/Tokenize/TagLibTokenizer.cpp M /trunk/Tokenize/TagLibTokenizer.h M /trunk/Tokenize/Tokenizer.cpp M /trunk/Tokenize/Tokenizer.h M /trunk/Tokenize/TokenizerFactory.cpp M /trunk/Tokenize/TokenizerFactory.h M /trunk/Tokenize/UnknownTypeTokenizer.cpp M /trunk/Tokenize/UnknownTypeTokenizer.h M /trunk/Tokenize/WordTokenizer.cpp M /trunk/Tokenize/WordTokenizer.h M /trunk/Tokenize/XmlTokenizer.cpp M /trunk/Tokenize/XmlTokenizer.h M /trunk/Tokenize/tokenizertest.cpp Added missing copyright notice. ------------------------------------------------------------------------ r532 | fabricecolin | 2006-10-24 07:43:55 +0200 (Tue, 24 Oct 2006) | 2 lines Changed paths: M /trunk/Search/Plugins/A9.src A9's output changed. 
------------------------------------------------------------------------ r531 | fabricecolin | 2006-10-24 07:43:03 +0200 (Tue, 24 Oct 2006) | 2 lines Changed paths: M /trunk/Utils/CommandLine.cpp M /trunk/Utils/CommandLine.h M /trunk/Utils/Document.cpp M /trunk/Utils/Document.h M /trunk/Utils/DocumentInfo.cpp M /trunk/Utils/DocumentInfo.h M /trunk/Utils/IndexedDocument.cpp M /trunk/Utils/IndexedDocument.h M /trunk/Utils/Languages.cpp M /trunk/Utils/Languages.h M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h M /trunk/Utils/MboxParser.cpp M /trunk/Utils/MboxParser.h M /trunk/Utils/NLS.h M /trunk/Utils/Result.cpp M /trunk/Utils/Result.h M /trunk/Utils/StringManip.cpp M /trunk/Utils/StringManip.h M /trunk/Utils/TimeConverter.cpp M /trunk/Utils/TimeConverter.h M /trunk/Utils/Timer.cpp M /trunk/Utils/Timer.h M /trunk/Utils/Url.cpp M /trunk/Utils/Url.h Added missing copyright notice. ------------------------------------------------------------------------ r530 | fabricecolin | 2006-10-21 07:19:22 +0200 (Sat, 21 Oct 2006) | 2 lines Changed paths: M /trunk/Index/Makefile.am A /trunk/Index/XapianDatabase.cpp (from /trunk/Utils/XapianDatabase.cpp:528) A /trunk/Index/XapianDatabase.h (from /trunk/Utils/XapianDatabase.h:528) A /trunk/Index/XapianDatabaseFactory.cpp (from /trunk/Utils/XapianDatabaseFactory.cpp:528) A /trunk/Index/XapianDatabaseFactory.h (from /trunk/Utils/XapianDatabaseFactory.h:528) M /trunk/Makefile.am M /trunk/Monitor/Makefile.am M /trunk/Search/Google/Makefile.am M /trunk/Search/Makefile.am M /trunk/UI/GTK2/src/Makefile.am M /trunk/Utils/Makefile.am D /trunk/Utils/XapianDatabase.cpp D /trunk/Utils/XapianDatabase.h D /trunk/Utils/XapianDatabaseFactory.cpp D /trunk/Utils/XapianDatabaseFactory.h Moved XapianDatabase classes to Index, on which Search now depends. 
------------------------------------------------------------------------ r528 | fabricecolin | 2006-10-18 15:08:33 +0200 (Wed, 18 Oct 2006) | 2 lines Changed paths: M /trunk/Collect/pinot-collect.1 M /trunk/Index/pinot-index.1 M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 Same man pages with new version number. ------------------------------------------------------------------------ r527 | fabricecolin | 2006-10-18 14:59:50 +0200 (Wed, 18 Oct 2006) | 3 lines Changed paths: M /trunk/NEWS M /trunk/README M /trunk/TODO M /trunk/configure.in Updated README and NEWS with latest changes, added to TODO list. Releasing 0.61. ------------------------------------------------------------------------ r526 | fabricecolin | 2006-10-18 14:36:32 +0200 (Wed, 18 Oct 2006) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po Updated translations. ------------------------------------------------------------------------ r525 | fabricecolin | 2006-10-17 15:37:06 +0200 (Tue, 17 Oct 2006) | 2 lines Changed paths: M /trunk/Utils/xdgmime/xdgmime.c M /trunk/Utils/xdgmime/xdgmime.h M /trunk/Utils/xdgmime/xdgmimealias.c M /trunk/Utils/xdgmime/xdgmimecache.c M /trunk/Utils/xdgmime/xdgmimeglob.c M /trunk/Utils/xdgmime/xdgmimemagic.c Imported GTK+'s version of xdgmime. ------------------------------------------------------------------------ r524 | fabricecolin | 2006-10-17 15:34:03 +0200 (Tue, 17 Oct 2006) | 10 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Search/XapianEngine.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp Finally woke up to the fact that : - abstracts were most of the time bogus because term position wasn't incremented correctly at indexing time. - updates didn't happen correctly, most of the time ? It seems to be due to the call to setDocumentLabels right after updateDocument. It was extraneous anyway since updateDocument preserves labels. Doh ! 
Abstracts are generated for all documents, even if they have a sample field, as produced by omindex. Some tweaks to the QueryParser setup. ------------------------------------------------------------------------ r523 | fabricecolin | 2006-10-16 15:59:29 +0200 (Mon, 16 Oct 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc If the document to open is a sub-type of text/plain and doesn't have a defined application, use that of text/plain. Useful for types like text/x-authors. ------------------------------------------------------------------------ r522 | fabricecolin | 2006-10-15 13:29:35 +0200 (Sun, 15 Oct 2006) | 3 lines Changed paths: M /trunk/pinot.spec.in Require 0.9.7. There's no new API we make use of, but the changes to the QueryParser are worth forcing an upgrade. ------------------------------------------------------------------------ r521 | fabricecolin | 2006-10-15 05:10:24 +0200 (Sun, 15 Oct 2006) | 3 lines Changed paths: M /trunk/Search/Google/GoogleAPIEngine.cpp M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/QueryProperties.cpp M /trunk/Search/QueryProperties.h M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h Run queries through the QueryParser, remove filters and only keep non-prefixed terms when querying a Web engine, or generating an abstract. ------------------------------------------------------------------------ r520 | fabricecolin | 2006-10-15 02:40:43 +0200 (Sun, 15 Oct 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow_glade.cc Use more appropriate icon for the "Show all search engines" button. ------------------------------------------------------------------------ r519 | fabricecolin | 2006-10-15 02:17:57 +0200 (Sun, 15 Oct 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade Should have committed this in revision 512. 
------------------------------------------------------------------------ r518 | fabricecolin | 2006-10-15 02:16:13 +0200 (Sun, 15 Oct 2006) | 2 lines Changed paths: A /trunk/Search/Plugins/GoogleCodeSearch.src Plugin for the new Google code search service. ------------------------------------------------------------------------ r517 | fabricecolin | 2006-10-15 02:15:26 +0200 (Sun, 15 Oct 2006) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp Less verbose DEBUG output. ------------------------------------------------------------------------ r516 | fabricecolin | 2006-10-15 02:14:47 +0200 (Sun, 15 Oct 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp Documents deleted since last crawl were not removed from the index if the corresponding location wasn't monitored. ------------------------------------------------------------------------ r515 | fabricecolin | 2006-10-14 13:11:08 +0200 (Sat, 14 Oct 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/prefsDialog.cc Mail accounts were saved only if there were no directories configured ! ------------------------------------------------------------------------ r514 | fabricecolin | 2006-10-14 07:50:56 +0200 (Sat, 14 Oct 2006) | 2 lines Changed paths: M /trunk/README Updated section about Preferences. ------------------------------------------------------------------------ r513 | fabricecolin | 2006-10-14 07:50:02 +0200 (Sat, 14 Oct 2006) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/NEWS M /trunk/TODO Updated NEWS and AUTHORS, removed completed items from TODO (and added one new item). ------------------------------------------------------------------------ r512 | fabricecolin | 2006-10-14 07:41:06 +0200 (Sat, 14 Oct 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow_glade.cc Widget hostnamegroup1 was active. 
------------------------------------------------------------------------ r511 | fabricecolin | 2006-10-14 07:22:45 +0200 (Sat, 14 Oct 2006) | 7 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh Simplified the UI to make it less scary to people used to Beagle :-) By default, My Web Pages is not shown and the queries and engines lists are hidden. The latter is shown/hidden by using the toggle button on the left of the query field. This and the previous changes to the Preferences box were suggested by Manuel Breitfeld . ------------------------------------------------------------------------ r510 | fabricecolin | 2006-10-14 06:41:32 +0200 (Sat, 14 Oct 2006) | 3 lines Changed paths: M /trunk/Collect/pinot-collect.1 M /trunk/Index/pinot-index.1 M /trunk/Search/pinot-search.1 M /trunk/Search/pinot-search.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 Previous check-in should have included pinot-search.cpp. Updated man pages. ------------------------------------------------------------------------ r509 | fabricecolin | 2006-10-14 06:40:11 +0200 (Sat, 14 Oct 2006) | 10 lines Changed paths: M /trunk/Search/Google/GoogleAPIEngine.cpp M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/QueryProperties.cpp M /trunk/Search/QueryProperties.h M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog.hh M /trunk/UI/GTK2/src/queryDialog_glade.cc M /trunk/UI/GTK2/src/queryDialog_glade.hh Extensive modifications to enable using Xapian::QueryParser and get rid of the straight-jacket of AND/OR/NOT/Phrase. 
The live and stored queries in the UI as well as SimpleSearch's searchText can be free queries mixing search terms and any number of filters, eg : "type:text/html and lang:en and (tcp near ip)" The downsides are that the host and file filters no longer apply to plugins or the Google SOAP API engine, and that generated abstracts are pretty much useless. Revamped the query dialog for easy adding of filters. ------------------------------------------------------------------------ r508 | fabricecolin | 2006-10-14 06:23:26 +0200 (Sat, 14 Oct 2006) | 2 lines Changed paths: M /trunk/Utils/StringManip.cpp M /trunk/Utils/StringManip.h Modified extractField() method a bit. ------------------------------------------------------------------------ r507 | fabricecolin | 2006-10-13 15:12:12 +0200 (Fri, 13 Oct 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Fixed warnings. ------------------------------------------------------------------------ r506 | fabricecolin | 2006-10-13 15:09:39 +0200 (Fri, 13 Oct 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/prefsDialog_glade.hh Removed Edit buttons from prefsDialog Indexing tab. ------------------------------------------------------------------------ r505 | fabricecolin | 2006-10-12 16:57:08 +0200 (Thu, 12 Oct 2006) | 2 lines Changed paths: M /trunk/pinot.spec.in Better SuSE support, suggested by Marcus Rueckert . ------------------------------------------------------------------------ r504 | fabricecolin | 2006-10-12 16:53:26 +0200 (Thu, 12 Oct 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/prefsDialog_glade.hh Move My Documents and My Email under one tab labelled "Indexing". 
------------------------------------------------------------------------ r503 | fabricecolin | 2006-10-06 18:38:23 +0200 (Fri, 06 Oct 2006) | 3 lines Changed paths: M /trunk/Collect/MboxCollector.cpp M /trunk/Utils/MboxParser.cpp M /trunk/Utils/MboxParser.h MboxCollector can access any part of a message. Removed hash ("h=...") from URL scheme. ------------------------------------------------------------------------ r502 | fabricecolin | 2006-10-06 16:45:04 +0200 (Fri, 06 Oct 2006) | 2 lines Changed paths: M /trunk/Utils/MboxParser.cpp M /trunk/Utils/MboxParser.h Go through all of a message's parts. Don't skip parts based on their MIME type. ------------------------------------------------------------------------ r501 | fabricecolin | 2006-10-05 16:36:22 +0200 (Thu, 05 Oct 2006) | 2 lines Changed paths: M /trunk/TODO Rearranged/removed tasks somewhat. ------------------------------------------------------------------------ r500 | fabricecolin | 2006-09-30 06:47:17 +0200 (Sat, 30 Sep 2006) | 3 lines Changed paths: M /trunk/Search/OpenSearchParser.cpp M /trunk/Search/OpenSearchParser.h Allow mozSearch plugins too as they are similar to OpenSearch Description, even though most, if not all, will return results in text/html ! ------------------------------------------------------------------------ r499 | fabricecolin | 2006-09-30 05:56:39 +0200 (Sat, 30 Sep 2006) | 2 lines Changed paths: M /trunk/pinot.spec.in dbus-glib, not dbus, is required on Fedora, while on SuSE it's dbus-1-glib. ------------------------------------------------------------------------ r497 | fabricecolin | 2006-09-25 14:42:13 +0200 (Mon, 25 Sep 2006) | 3 lines Changed paths: M /trunk/NEWS M /trunk/README Corrected name of Deskbar plugin in README. Updated NEWS with missing items, set release date to today. 
------------------------------------------------------------------------ r496 | fabricecolin | 2006-09-25 14:26:55 +0200 (Mon, 25 Sep 2006) | 2 lines Changed paths: M /trunk/pinot.desktop Modified Categories. ------------------------------------------------------------------------ r495 | fabricecolin | 2006-09-25 14:25:45 +0200 (Mon, 25 Sep 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/Notebook.cpp M /trunk/UI/GTK2/src/Notebook.h Further attempt at putting buttons in notebook tabs. This still doesn't work correctly and is disabled. I'll have to revisit this later... ------------------------------------------------------------------------ r494 | fabricecolin | 2006-09-25 14:23:40 +0200 (Mon, 25 Sep 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/prefsDialog_glade.cc Label text change. ------------------------------------------------------------------------ r493 | fabricecolin | 2006-09-25 14:22:17 +0200 (Mon, 25 Sep 2006) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po Updated translations. ------------------------------------------------------------------------ r492 | fabricecolin | 2006-09-23 15:21:33 +0200 (Sat, 23 Sep 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Cosmetic change. ------------------------------------------------------------------------ r491 | fabricecolin | 2006-09-23 09:44:31 +0200 (Sat, 23 Sep 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Don't set the status on stop(). 
------------------------------------------------------------------------ r490 | fabricecolin | 2006-09-21 14:04:54 +0200 (Thu, 21 Sep 2006) | 2 lines Changed paths: M /trunk/Makefile.am D /trunk/UI/pinot-deskbar.py A /trunk/UI/pinot-live.py (from /trunk/UI/pinot-deskbar.py:463) M /trunk/pinot.spec.in Renamed deskbar plugin to pinot-live.py ------------------------------------------------------------------------ r489 | fabricecolin | 2006-09-20 16:24:47 +0200 (Wed, 20 Sep 2006) | 3 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/XapianIndex.cpp M /trunk/Monitor/MonitorEvent.cpp M /trunk/Search/QueryProperties.cpp M /trunk/Search/SearchPluginProperties.cpp M /trunk/Tokenize/HtmlTokenizer.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/Utils/Document.cpp M /trunk/Utils/DocumentInfo.cpp M /trunk/Utils/IndexedDocument.cpp M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/Result.cpp M /trunk/Utils/Timer.cpp M /trunk/Utils/Url.cpp M /trunk/Utils/XapianDatabase.cpp Copy operators check we are not trying to copy the current object. Some didn't return anything ! ------------------------------------------------------------------------ r488 | fabricecolin | 2006-09-14 13:19:18 +0200 (Thu, 14 Sep 2006) | 2 lines Changed paths: M /trunk/TODO Slightly less to do... ------------------------------------------------------------------------ r487 | fabricecolin | 2006-09-14 13:15:04 +0200 (Thu, 14 Sep 2006) | 2 lines Changed paths: M /trunk/README Clarified a few things, added blurb about the Deskbar Applet plugin. ------------------------------------------------------------------------ r486 | fabricecolin | 2006-09-13 16:45:14 +0200 (Wed, 13 Sep 2006) | 2 lines Changed paths: M /trunk/Index/pinot-index.cpp Align help text nicely. ------------------------------------------------------------------------ r485 | fabricecolin | 2006-09-13 16:41:25 +0200 (Wed, 13 Sep 2006) | 2 lines Changed paths: M /trunk/README Talk a bit more about how the D-Bus service functions. 
------------------------------------------------------------------------ r484 | fabricecolin | 2006-09-13 15:17:48 +0200 (Wed, 13 Sep 2006) | 2 lines Changed paths: M /trunk/README Hugely more useful README ! ------------------------------------------------------------------------ r483 | fabricecolin | 2006-09-13 14:57:38 +0200 (Wed, 13 Sep 2006) | 2 lines Changed paths: M /trunk/NEWS Updated with recent changes. ------------------------------------------------------------------------ r482 | fabricecolin | 2006-09-13 14:56:25 +0200 (Wed, 13 Sep 2006) | 2 lines Changed paths: M /trunk/AUTHORS Mention code was borrowed from wget. ------------------------------------------------------------------------ r481 | fabricecolin | 2006-09-13 14:55:38 +0200 (Wed, 13 Sep 2006) | 3 lines Changed paths: M /trunk/Collect/Makefile.am M /trunk/Index/Makefile.am M /trunk/Search/Makefile.am M /trunk/Tokenize/Makefile.am M /trunk/UI/GTK2/src/Makefile.am M /trunk/Utils/Makefile.am Bundle Utils classes needed by the tokenizers in libBasicUtils, WITHOUT any static data that would mess things up in the UI after it loads the tokenizers. ------------------------------------------------------------------------ r480 | fabricecolin | 2006-09-13 14:53:48 +0200 (Wed, 13 Sep 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh Start the service if there are directories OR mbox files to index ! ------------------------------------------------------------------------ r479 | fabricecolin | 2006-09-13 14:51:31 +0200 (Wed, 13 Sep 2006) | 2 lines Changed paths: M /trunk/Index/pinot-index.1 M /trunk/Index/pinot-index.cpp Added --showinfo option. This outputs the document's DocumentInfo, with labels. 
------------------------------------------------------------------------ r478 | fabricecolin | 2006-09-13 14:48:00 +0200 (Wed, 13 Sep 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/Utils/XapianDatabase.cpp Minor changes and less verbose debug output. ------------------------------------------------------------------------ r477 | fabricecolin | 2006-09-13 14:47:10 +0200 (Wed, 13 Sep 2006) | 2 lines Changed paths: M /trunk/Utils/Timer.cpp Doh ! What kind of broken timer is that !? ------------------------------------------------------------------------ r476 | fabricecolin | 2006-09-12 15:13:47 +0200 (Tue, 12 Sep 2006) | 4 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/UI/GTK2/src/prefsDialog.cc DBusXapianIndex exports the GetStatistics method. It's invoked when validating new preferences if there are locations to index. D-Bus activation will then start the service if not already running. ------------------------------------------------------------------------ r475 | fabricecolin | 2006-09-09 08:15:05 +0200 (Sat, 09 Sep 2006) | 3 lines Changed paths: M /trunk/Utils/StringManip.cpp M /trunk/Utils/TimeConverter.cpp Portability fixes. On platforms that don't have timegm(), use wget's mktime_from_utc() function. ------------------------------------------------------------------------ r474 | fabricecolin | 2006-09-09 04:25:28 +0200 (Sat, 09 Sep 2006) | 2 lines Changed paths: M /trunk/Collect/pinot-collect.1 M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 Updated man pages date. ------------------------------------------------------------------------ r473 | fabricecolin | 2006-09-08 14:35:34 +0200 (Fri, 08 Sep 2006) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/README Mention code was borrowed from Xapian, and that D-Bus and its Glib bindings are required.
------------------------------------------------------------------------ r472 | fabricecolin | 2006-09-08 14:33:52 +0200 (Fri, 08 Sep 2006) | 6 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Utils/StringManip.cpp M /trunk/Utils/StringManip.h M /trunk/configure.in Borrowed code from Xapian so that the timestamp of documents is stored in the format used by Omega, as modtime in the document data, and as a value that allows sorting by date. In configure.in, added necessary checks and backtracked on recent change to AC_OUTPUT() : Search/Google/Makefile should be generated conditionally. ------------------------------------------------------------------------ r471 | fabricecolin | 2006-09-08 13:43:31 +0200 (Fri, 08 Sep 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h OnDiskHandler and DirectoryScannerThread populate CrawlHistory correctly. MonitorThread doesn't own the handler object. ------------------------------------------------------------------------ r470 | fabricecolin | 2006-09-08 13:39:57 +0200 (Fri, 08 Sep 2006) | 2 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/INotifyMonitor.h Deal with IN_MOVE_SELF. ------------------------------------------------------------------------ r469 | fabricecolin | 2006-09-07 17:03:01 +0200 (Thu, 07 Sep 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp Fix in previous commit on WorkerThreads. Cosmetic change to OnDiskHandler. ------------------------------------------------------------------------ r468 | fabricecolin | 2006-09-07 15:29:08 +0200 (Thu, 07 Sep 2006) | 2 lines Changed paths: M /trunk/Makefile.am M /trunk/pinot.spec.in Install the deskbar plugin, package in separate RPM. 
------------------------------------------------------------------------ r467 | fabricecolin | 2006-09-07 13:50:27 +0200 (Thu, 07 Sep 2006) | 2 lines Changed paths: M /trunk/Monitor/MonitorHandler.cpp M /trunk/Monitor/MonitorHandler.h M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp Update signalling in MonitorHandler is no longer useful. ------------------------------------------------------------------------ r466 | fabricecolin | 2006-09-07 13:41:21 +0200 (Thu, 07 Sep 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.h OnDisk adds a label for the source the document being indexed belongs to. Protect against concurrent access. Removed unnecessary call to setDocumentLabels() in Mbox. ------------------------------------------------------------------------ r465 | fabricecolin | 2006-09-07 13:38:20 +0200 (Thu, 07 Sep 2006) | 4 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/INotifyMonitor.h Generate a CREATED event for directories moved from somewhere that was not monitored. Don't generate an internal event when a new directory is created. Protect against concurrent access. ------------------------------------------------------------------------ r464 | fabricecolin | 2006-09-07 13:35:33 +0200 (Thu, 07 Sep 2006) | 3 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h DirectoryScanner deletes items and informs the MonitorHandler for all entries associated to the source that were not found by the current crawl. 
------------------------------------------------------------------------ r463 | fabricecolin | 2006-09-07 02:24:02 +0200 (Thu, 07 Sep 2006) | 3 lines Changed paths: M /trunk/UI/pinot-deskbar.py Somewhat better plugin. Handler inherits from SignallingHandler and calls to GetDocumentInfo are asynchronous. ------------------------------------------------------------------------ r462 | fabricecolin | 2006-09-05 14:14:31 +0200 (Tue, 05 Sep 2006) | 2 lines Changed paths: A /trunk/UI/pinot-deskbar.py First working version of a plugin for deskbar-applet. ------------------------------------------------------------------------ r461 | fabricecolin | 2006-09-05 14:14:02 +0200 (Tue, 05 Sep 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc Cosmetic changes. ------------------------------------------------------------------------ r460 | fabricecolin | 2006-09-05 14:13:29 +0200 (Tue, 05 Sep 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc A lot of changes that boil down to making correct use of the D-Bus API, eg process messages in the right place. ------------------------------------------------------------------------ r459 | fabricecolin | 2006-09-04 17:03:21 +0200 (Mon, 04 Sep 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc Fixed previous check-in ! ------------------------------------------------------------------------ r458 | fabricecolin | 2006-09-04 13:45:43 +0200 (Mon, 04 Sep 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml Fixed usage of array in GetDocumentLabels and SimpleQuery methods. ------------------------------------------------------------------------ r457 | fabricecolin | 2006-09-03 10:42:17 +0200 (Sun, 03 Sep 2006) | 2 lines Changed paths: M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h Protect calls to xdgmime with mutex. 
------------------------------------------------------------------------ r456 | fabricecolin | 2006-09-02 10:19:50 +0200 (Sat, 02 Sep 2006) | 2 lines Changed paths: M /trunk/Search/SearchEngineFactory.cpp Check HAVE_GOOGLEAPI. ------------------------------------------------------------------------ r455 | fabricecolin | 2006-09-02 10:18:32 +0200 (Sat, 02 Sep 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh Monitoring indexable locations is optional. ------------------------------------------------------------------------ r454 | fabricecolin | 2006-09-02 09:58:22 +0200 (Sat, 02 Sep 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc Removed obsolete cache stuff from DownloadingThread. ------------------------------------------------------------------------ r453 | fabricecolin | 2006-09-02 09:55:12 +0200 (Sat, 02 Sep 2006) | 2 lines Changed paths: M /trunk/Index/pinot-index.cpp M /trunk/Search/pinot-search.cpp M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc Initialize and shutdown HtmlTokenizer. ------------------------------------------------------------------------ r452 | fabricecolin | 2006-09-02 09:53:52 +0200 (Sat, 02 Sep 2006) | 2 lines Changed paths: M /trunk/configure.in It should be ok to generate Search/Google/Makefile when SOAP support is off. ------------------------------------------------------------------------ r451 | fabricecolin | 2006-09-02 09:51:42 +0200 (Sat, 02 Sep 2006) | 3 lines Changed paths: M /trunk/Tokenize/HtmlTokenizer.cpp M /trunk/Tokenize/HtmlTokenizer.h This wasn't thread-safe. 
xmlInitParser() shouldn't be called by each thread. Moreover the parsed document wasn't freed. ------------------------------------------------------------------------ r450 | fabricecolin | 2006-09-02 05:42:09 +0200 (Sat, 02 Sep 2006) | 3 lines Changed paths: M /trunk/Index/pinot-index.1 M /trunk/Index/pinot-index.cpp Made sure index was open in the appropriate mode. Removed reference to pinot-search in man page. ------------------------------------------------------------------------ r449 | fabricecolin | 2006-08-31 13:46:35 +0200 (Thu, 31 Aug 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/MboxHandler.h M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.h M /trunk/UI/GTK2/src/WorkerThreads.cpp Support for move and delete events on directories. OnDisk will move or delete all documents under a given directory. ------------------------------------------------------------------------ r448 | fabricecolin | 2006-08-31 13:44:26 +0200 (Thu, 31 Aug 2006) | 3 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/MonitorHandler.h If a watch has moved, INotifyMonitor updates its details. New methods to MonitorHandler for events on directories. ------------------------------------------------------------------------ r447 | fabricecolin | 2006-08-31 13:42:41 +0200 (Thu, 31 Aug 2006) | 3 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h List of documents under a given directory can be obtained with listDocumentsInDirectory(). 
------------------------------------------------------------------------ r446 | fabricecolin | 2006-08-30 16:15:46 +0200 (Wed, 30 Aug 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h Slightly better at monitoring directories. New directories are crawled and monitored but moving and deletion don't do the right thing just yet. ------------------------------------------------------------------------ r445 | fabricecolin | 2006-08-30 14:21:55 +0200 (Wed, 30 Aug 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.cpp Minor fixes. OnDiskHandler sets the document's title to the file name, not the full URL. ------------------------------------------------------------------------ r444 | fabricecolin | 2006-08-29 15:40:43 +0200 (Tue, 29 Aug 2006) | 2 lines Changed paths: M /trunk/SQL/SQLiteBase.cpp M /trunk/Tokenize/HtmlTokenizer.cpp Less verbose in DEBUG mode. ------------------------------------------------------------------------ r443 | fabricecolin | 2006-08-29 15:39:00 +0200 (Tue, 29 Aug 2006) | 3 lines Changed paths: M /trunk/Search/XapianEngine.cpp Backtracking on revision 353's changes. This will have to be revisited at some point though. 
------------------------------------------------------------------------ r442 | fabricecolin | 2006-08-29 15:31:28 +0200 (Tue, 29 Aug 2006) | 4 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexFactory.cpp M /trunk/Index/IndexFactory.h M /trunk/Index/IndexInterface.h M /trunk/Index/Makefile.am D /trunk/Index/WritableXapianIndex.cpp D /trunk/Index/WritableXapianIndex.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/Index/pinot-index.cpp M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/MboxHandler.h M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.h M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc On second thought, merged WritableXapianIndex back into XapianIndex. This complicated matters unnecessarily, especially since R/W access is handled so nicely by XapianDatabase. ------------------------------------------------------------------------ r441 | fabricecolin | 2006-08-28 13:22:21 +0200 (Mon, 28 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/prefsDialog.cc Not all preferences were saved ! ------------------------------------------------------------------------ r440 | fabricecolin | 2006-08-27 04:38:26 +0200 (Sun, 27 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc Expand queries with ExpandQueryThread rather than QueryingThread. 
------------------------------------------------------------------------ r439 | fabricecolin | 2006-08-27 04:35:51 +0200 (Sun, 27 Aug 2006) | 2 lines Changed paths: M /trunk/Monitor/Makefile.am M /trunk/Monitor/MonitorFactory.cpp M /trunk/Search/Makefile.am M /trunk/Search/PluginWebEngine.cpp Fixed conditional building. ------------------------------------------------------------------------ r438 | fabricecolin | 2006-08-26 13:48:36 +0200 (Sat, 26 Aug 2006) | 2 lines Changed paths: M /trunk/AUTHORS Mention Reini's contribution. ------------------------------------------------------------------------ r437 | fabricecolin | 2006-08-26 13:43:57 +0200 (Sat, 26 Aug 2006) | 5 lines Changed paths: M /trunk/Collect/Makefile.am M /trunk/Index/Makefile.am M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/Makefile.am M /trunk/Monitor/MonitorFactory.cpp M /trunk/SQL/Makefile.am M /trunk/Search/Makefile.am M /trunk/Search/PluginWebEngine.cpp M /trunk/Tokenize/Makefile.am M /trunk/Tokenize/TokenizerFactory.cpp M /trunk/UI/GTK2/src/Makefile.am M /trunk/configure.in A whole bunch of changes for building with CygWin, courtesy of Reini Urban (rurban AT x-ray DOT at). Thanks ! INotifyMonitor and SherlockParser are built based on the availability of inotify and boost Spirit, respectively. ------------------------------------------------------------------------ r436 | fabricecolin | 2006-08-26 12:21:08 +0200 (Sat, 26 Aug 2006) | 2 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/UI/GTK2/src/OnDiskHandler.cpp Cosmetic changes. 
------------------------------------------------------------------------ r435 | fabricecolin | 2006-08-26 11:01:14 +0200 (Sat, 26 Aug 2006) | 4 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/WritableXapianIndex.cpp M /trunk/Search/XapianEngine.cpp M /trunk/Utils/XapianDatabase.cpp M /trunk/Utils/XapianDatabase.h Reopen the database to the latest version before adding it to another one in XapianDatabase, before any read operations in DBusXapianIndex and before querying it in XapianEngine. ------------------------------------------------------------------------ r434 | fabricecolin | 2006-08-26 10:33:31 +0200 (Sat, 26 Aug 2006) | 2 lines Changed paths: M /trunk/Collect/pinot-collect.1 M /trunk/Index/pinot-index.1 M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot-dbus-daemon.1 M /trunk/UI/GTK2/src/pinot.1 Updated version number. ------------------------------------------------------------------------ r433 | fabricecolin | 2006-08-26 05:47:45 +0200 (Sat, 26 Aug 2006) | 3 lines Changed paths: M /trunk/configure.in Cut down on the number of calls to AC_OUTPUT, until I figure out how to do it properly ;-) ------------------------------------------------------------------------ r432 | fabricecolin | 2006-08-26 05:46:16 +0200 (Sat, 26 Aug 2006) | 2 lines Changed paths: M /trunk/pinot.spec.in On SuSE, pdftotext is provided by xpdf-tools, not poppler-utils. ------------------------------------------------------------------------ r431 | fabricecolin | 2006-08-26 05:44:38 +0200 (Sat, 26 Aug 2006) | 5 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow_glade.cc My Documents becomes My Web Pages, My Computer becomes My Documents. 
Use the "merged" index for terms suggestion, More Like This and determining which documents are indexed locally. Other miscellaneous changes. ------------------------------------------------------------------------ r430 | fabricecolin | 2006-08-26 05:41:09 +0200 (Sat, 26 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/prefsDialog_glade.hh Manage the list of indexable locations. ------------------------------------------------------------------------ r429 | fabricecolin | 2006-08-26 05:39:36 +0200 (Sat, 26 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h Renamed MailAccountModelColumns to TimestampedModelColumns. ------------------------------------------------------------------------ r428 | fabricecolin | 2006-08-26 05:38:08 +0200 (Sat, 26 Aug 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/importDialog_glade.cc M /trunk/UI/GTK2/src/importDialog_glade.hh Simplified importing. Since the daemon takes care of local files, only URLs can be imported into the index. ------------------------------------------------------------------------ r427 | fabricecolin | 2006-08-26 05:36:20 +0200 (Sat, 26 Aug 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade Indexable locations are configured in the preferences box. Only URLs can be imported now. ------------------------------------------------------------------------ r426 | fabricecolin | 2006-08-26 05:32:03 +0200 (Sat, 26 Aug 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc Merge both index and daemon into one, this will be useful for terms suggestions and More Like This. 
------------------------------------------------------------------------ r425 | fabricecolin | 2006-08-26 05:30:28 +0200 (Sat, 26 Aug 2006) | 3 lines Changed paths: M /trunk/Utils/XapianDatabase.cpp If opening a read-only database that doesn't exist, create it first instead of failing miserably :-) ------------------------------------------------------------------------ r424 | fabricecolin | 2006-08-26 04:38:26 +0200 (Sat, 26 Aug 2006) | 3 lines Changed paths: M /trunk/Search/AbstractGenerator.cpp Don't fail if a term's position list cannot be found, it's to be expected if the query's terms were OR'ed. And check the start position makes sense too ! ------------------------------------------------------------------------ r423 | fabricecolin | 2006-08-26 04:33:52 +0200 (Sat, 26 Aug 2006) | 2 lines Changed paths: M /trunk/SQL/SQLiteBase.cpp M /trunk/Utils/XapianDatabase.cpp M /trunk/Utils/XapianDatabaseFactory.cpp Fewer DEBUG messages. ------------------------------------------------------------------------ r422 | fabricecolin | 2006-08-26 03:36:43 +0200 (Sat, 26 Aug 2006) | 3 lines Changed paths: M /trunk/Utils/XapianDatabase.cpp M /trunk/Utils/XapianDatabase.h M /trunk/Utils/XapianDatabaseFactory.cpp M /trunk/Utils/XapianDatabaseFactory.h Allow to add a database to another with Xapian::Database::add_database() while making sure both are read-locked. ------------------------------------------------------------------------ r421 | fabricecolin | 2006-08-26 03:34:15 +0200 (Sat, 26 Aug 2006) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp getCloseTerms() can suggest terms starting with an upper-case letter. ------------------------------------------------------------------------ r420 | fabricecolin | 2006-08-24 15:20:28 +0200 (Thu, 24 Aug 2006) | 2 lines Changed paths: M /trunk/Index/DBusXapianIndex.h M /trunk/Index/WritableXapianIndex.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h Cosmetic changes. 
------------------------------------------------------------------------ r419 | fabricecolin | 2006-08-24 13:33:17 +0200 (Thu, 24 Aug 2006) | 2 lines Changed paths: M /trunk/Monitor/MonitorEvent.cpp M /trunk/Search/SearchPluginProperties.cpp M /trunk/Utils/Url.cpp M /trunk/Utils/Url.h M /trunk/Utils/XapianDatabase.cpp Fixed issues with some copy constructors. ------------------------------------------------------------------------ r418 | fabricecolin | 2006-08-22 13:56:58 +0200 (Tue, 22 Aug 2006) | 2 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp Forgot to unescape URLs in getSourceItems(). ------------------------------------------------------------------------ r417 | fabricecolin | 2006-08-21 15:49:35 +0200 (Mon, 21 Aug 2006) | 6 lines Changed paths: M /trunk/Monitor/MonitorHandler.cpp M /trunk/Monitor/MonitorHandler.h M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h Renamed MonitorHandler::getLocations() to getFileNames() since this only deals with files. Monitoring those is done by MonitoringThread, which checks for events right after initializing the handler. OnDiskHandler::indexFile() creates an empty document if the file couldn't be downloaded. ------------------------------------------------------------------------ r416 | fabricecolin | 2006-08-21 14:01:29 +0200 (Mon, 21 Aug 2006) | 2 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp Bug fix in retrievePendingEvents(). ------------------------------------------------------------------------ r415 | fabricecolin | 2006-08-19 17:28:06 +0200 (Sat, 19 Aug 2006) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp Fix for previous check-in. ------------------------------------------------------------------------ r414 | fabricecolin | 2006-08-19 16:30:31 +0200 (Sat, 19 Aug 2006) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp Don't request write access in hasLabel(). 
------------------------------------------------------------------------ r413 | fabricecolin | 2006-08-19 14:04:57 +0200 (Sat, 19 Aug 2006) | 2 lines Changed paths: M /trunk/Collect/pinot-collect.1 M /trunk/Index/pinot-index.1 M /trunk/NEWS M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Preparing for 0.60 release. ------------------------------------------------------------------------ r412 | fabricecolin | 2006-08-19 13:43:26 +0200 (Sat, 19 Aug 2006) | 2 lines Changed paths: M /trunk/po/POTFILES.in Updated list of source files with translatable strings. ------------------------------------------------------------------------ r411 | fabricecolin | 2006-08-19 13:41:09 +0200 (Sat, 19 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/DaemonState.h M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot.cc Synced with changes to PinotSettings and DirectoryScannerThread. ------------------------------------------------------------------------ r410 | fabricecolin | 2006-08-19 13:35:25 +0200 (Sat, 19 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh Synced with new PinotSettings and DirectoryScannerThread. ------------------------------------------------------------------------ r409 | fabricecolin | 2006-08-19 13:33:56 +0200 (Sat, 19 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/ResultsTree.cpp Caught up with changes made elsewhere. ------------------------------------------------------------------------ r408 | fabricecolin | 2006-08-19 13:32:36 +0200 (Sat, 19 Aug 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h Use PinotSettings::getR?Index() methods to get an index. Definition of scanner's FileFound signal changed. 
------------------------------------------------------------------------ r407 | fabricecolin | 2006-08-19 13:29:25 +0200 (Sat, 19 Aug 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h Index factory methods return an Index object of the right type depending on where the index is located and whether DBus usage is enabled. Dropped mail index. ------------------------------------------------------------------------ r406 | fabricecolin | 2006-08-19 13:26:26 +0200 (Sat, 19 Aug 2006) | 3 lines Changed paths: M /trunk/Index/DBusXapianIndex.cpp M /trunk/Index/DBusXapianIndex.h M /trunk/Index/IndexInterface.h M /trunk/Index/Makefile.am M /trunk/Index/WritableXapianIndex.cpp M /trunk/Index/WritableXapianIndex.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h Index objects are copyable. Build new classes. ------------------------------------------------------------------------ r405 | fabricecolin | 2006-08-19 13:22:04 +0200 (Sat, 19 Aug 2006) | 2 lines Changed paths: M /trunk/Index/pinot-index.cpp Modified to use IndexFactory. ------------------------------------------------------------------------ r404 | fabricecolin | 2006-08-19 13:13:09 +0200 (Sat, 19 Aug 2006) | 3 lines Changed paths: A /trunk/Index/IndexFactory.cpp A /trunk/Index/IndexFactory.h Factory class to build read-only and read-write index objects. Supported types are "xapian" and "dbus". ------------------------------------------------------------------------ r403 | fabricecolin | 2006-08-19 07:24:48 +0200 (Sat, 19 Aug 2006) | 4 lines Changed paths: A /trunk/Index/DBusXapianIndex.cpp A /trunk/Index/DBusXapianIndex.h An implementation of WritableIndexInterface that talks to the daemon via DBus. Methods (un)indexDocument(s) don't do anything since they are not exported by the daemon. 
------------------------------------------------------------------------ r402 | fabricecolin | 2006-08-19 07:21:23 +0200 (Sat, 19 Aug 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml Fixed a whole bunch of stuff DBus-wise. On second thought, it doesn't make sense to allow users to (un)index documents so the corresponding methods were removed. ------------------------------------------------------------------------ r401 | fabricecolin | 2006-08-18 13:03:26 +0200 (Fri, 18 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/MboxHandler.h M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.h Fixed initialize(), implemented flushIndex(), synced with MonitorHandler. ------------------------------------------------------------------------ r400 | fabricecolin | 2006-08-18 12:57:42 +0200 (Fri, 18 Aug 2006) | 2 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp Fixed SQL in getSourceItems(). ------------------------------------------------------------------------ r399 | fabricecolin | 2006-08-18 12:56:52 +0200 (Fri, 18 Aug 2006) | 4 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/INotifyMonitor.h M /trunk/Monitor/MonitorEvent.h M /trunk/Monitor/MonitorHandler.cpp M /trunk/Monitor/MonitorHandler.h M /trunk/Monitor/MonitorInterface.h Upon success, addLocation() generates an internal event of type EXISTS that's returned by the next call to retrievePendingEvents(). Some minor tweaks. ------------------------------------------------------------------------ r398 | fabricecolin | 2006-08-16 16:55:41 +0200 (Wed, 16 Aug 2006) | 2 lines Changed paths: M /trunk/Utils/MIMEScanner.cpp Minor fix. 
------------------------------------------------------------------------ r397 | fabricecolin | 2006-08-16 16:36:12 +0200 (Wed, 16 Aug 2006) | 5 lines Changed paths: M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/MboxHandler.h M /trunk/UI/GTK2/src/OnDiskHandler.cpp M /trunk/UI/GTK2/src/OnDiskHandler.h Synced with changes to MonitorHandler. Both handlers remove documents that belong to sources that were previously monitored but no longer are. Use the daemon index for mail messages rather than a separate index. ------------------------------------------------------------------------ r396 | fabricecolin | 2006-08-16 16:10:35 +0200 (Wed, 16 Aug 2006) | 2 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h New methods getSources() and getSourceItems(). ------------------------------------------------------------------------ r395 | fabricecolin | 2006-08-16 16:09:24 +0200 (Wed, 16 Aug 2006) | 4 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/MonitorEvent.cpp M /trunk/Monitor/MonitorEvent.h M /trunk/Monitor/MonitorHandler.h Modified MonitorHandler::getLocations(), added initialize() and m_isWatchi to MonitorEvent. Fixed events checking in INotifyMonitor. ------------------------------------------------------------------------ r394 | fabricecolin | 2006-08-12 09:47:23 +0200 (Sat, 12 Aug 2006) | 5 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml M /trunk/UI/GTK2/src/pinot.cc All methods of WritableIndexInterface have DBus equivalents now, even though most don't actually do anything useful just yet... Both Pinot and the daemon set XAPIAN_PREFER_FLINT in the environment so that the new indices will be created with the Flint backend, instead of Quartz. 
------------------------------------------------------------------------ r393 | fabricecolin | 2006-08-11 17:20:29 +0200 (Fri, 11 Aug 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc The index queue is now a queue, not a set. Uniqueness is guaranteed by ThreadsManager::m_beingIndexed, which now covers indexing and updating. URLs are removed from it by pop_queue(). ------------------------------------------------------------------------ r392 | fabricecolin | 2006-08-11 17:13:19 +0200 (Fri, 11 Aug 2006) | 5 lines Changed paths: M /trunk/Index/IndexInterface.h M /trunk/Index/WritableXapianIndex.cpp M /trunk/Index/WritableXapianIndex.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h IndexInterface has a new method, getLastDocumentID(). In WritableXapianIndex, store the MIME type and the directory hierarchy of the document with prefixes T and XDIR. Sorted out issues with language when updating the document info. ------------------------------------------------------------------------ r391 | fabricecolin | 2006-08-10 14:03:24 +0200 (Thu, 10 Aug 2006) | 2 lines Changed paths: M /trunk/Search/Plugins/Topix.src Fixed results extraction. ------------------------------------------------------------------------ r390 | fabricecolin | 2006-08-10 14:00:27 +0200 (Thu, 10 Aug 2006) | 5 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h DaemonState uses file name as title, not the whole URL. In ThreadsManager::queue_index(), queue documents if the load is too high. DirectoryScannerThread checks the file's last modification time if it has already been crawled. 
------------------------------------------------------------------------ r389 | fabricecolin | 2006-08-10 13:55:17 +0200 (Thu, 10 Aug 2006) | 2 lines Changed paths: M /trunk/Index/WritableXapianIndex.cpp M /trunk/Search/SherlockParser.cpp Cosmetic changes. ------------------------------------------------------------------------ r388 | fabricecolin | 2006-08-09 12:15:42 +0200 (Wed, 09 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/Makefile.am Don't link the daemon program to gtkmm, only glibmm is necessary. ------------------------------------------------------------------------ r387 | fabricecolin | 2006-08-09 12:14:52 +0200 (Wed, 09 Aug 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/OnDiskHandler.h M /trunk/UI/GTK2/src/WorkerThreads.cpp MonitorThread is now more useful and checks which files have already been crawled. Minor fix to OnDiskHandler.h. ------------------------------------------------------------------------ r386 | fabricecolin | 2006-08-09 12:11:58 +0200 (Wed, 09 Aug 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/MboxHandler.h M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/prefsDialog.cc MboxHandler uses CrawlHistory to record which files have been crawled and when. In PinotSettings, MailAccount and use of Gdk::Color were dropped. ------------------------------------------------------------------------ r385 | fabricecolin | 2006-08-09 12:09:02 +0200 (Wed, 09 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot.cc Mail monitoring and indexing is now handled by the daemon program too. 
------------------------------------------------------------------------ r384 | fabricecolin | 2006-08-09 11:46:09 +0200 (Wed, 09 Aug 2006) | 2 lines Changed paths: M /trunk/Makefile.am M /trunk/UI/GTK2/src/de.berlios.Pinot.service.in M /trunk/configure.in Fixes. ------------------------------------------------------------------------ r383 | fabricecolin | 2006-08-09 11:45:34 +0200 (Wed, 09 Aug 2006) | 4 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h M /trunk/SQL/SQLiteBase.cpp Some modifications to CrawlHistory to accommodate MboxHandler. SQLiteBase sets up a busy handler so that we always retry operations if the database file is locked. ------------------------------------------------------------------------ r382 | fabricecolin | 2006-08-09 09:36:46 +0200 (Wed, 09 Aug 2006) | 2 lines Changed paths: M /trunk/SQL/CrawlHistory.cpp M /trunk/SQL/CrawlHistory.h Added getItemsCount(). ------------------------------------------------------------------------ r381 | fabricecolin | 2006-08-09 09:36:07 +0200 (Wed, 09 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml Fixed handling of Disconnected, added methods GetStatistics and Stop. ------------------------------------------------------------------------ r380 | fabricecolin | 2006-08-09 07:58:58 +0200 (Wed, 09 Aug 2006) | 2 lines Changed paths: A /trunk/Search/Plugins/MozDexDescription.xml Brought MozDex plugin back in. ------------------------------------------------------------------------ r379 | fabricecolin | 2006-08-09 07:58:07 +0200 (Wed, 09 Aug 2006) | 4 lines Changed paths: M /trunk/Makefile.am A /trunk/UI/GTK2/src/de.berlios.Pinot.service.in A /trunk/UI/GTK2/src/pinot-dbus-daemon.1 A /trunk/acinclude.m4 M /trunk/configure.in M /trunk/pinot.spec.in Man page and DBus service file for the daemon program. 
acinclude.m4 defines a useful macro for variables expansion, copied from Raphaël Slinckx's tutorial at "http://raphael.slinckx.net/dbustutorial.php". ------------------------------------------------------------------------ r378 | fabricecolin | 2006-08-09 06:39:02 +0200 (Wed, 09 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml New methods GetDocumentInfo and SimpleQuery. ------------------------------------------------------------------------ r377 | fabricecolin | 2006-08-09 05:14:51 +0200 (Wed, 09 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp Always run a DirectoryScanner thread for each indexable location. ------------------------------------------------------------------------ r376 | fabricecolin | 2006-08-09 04:53:12 +0200 (Wed, 09 Aug 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc DirectoryScannerThread records directories in CrawlHistory. The table is created at startup if necessary by the daemon program. ------------------------------------------------------------------------ r375 | fabricecolin | 2006-08-08 17:05:09 +0200 (Tue, 08 Aug 2006) | 2 lines Changed paths: A /trunk/SQL/CrawlHistory.cpp A /trunk/SQL/CrawlHistory.h M /trunk/SQL/Makefile.am First shot at CrawlHistory. ------------------------------------------------------------------------ r374 | fabricecolin | 2006-08-05 06:39:20 +0200 (Sat, 05 Aug 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/DaemonState.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h WorkerThread::immediateFlush() controls whether threads that modify indexes should flush before returning. This is turned off for the daemon. 
------------------------------------------------------------------------ r373 | fabricecolin | 2006-08-04 15:23:59 +0200 (Fri, 04 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot-dbus-daemon.cc M /trunk/UI/GTK2/src/pinot-dbus-daemon.xml First functional version of the daemon. Updated Index method. ------------------------------------------------------------------------ r372 | fabricecolin | 2006-08-04 14:21:01 +0200 (Fri, 04 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/Makefile.am Added new source. ------------------------------------------------------------------------ r371 | fabricecolin | 2006-08-04 14:14:57 +0200 (Fri, 04 Aug 2006) | 3 lines Changed paths: A /trunk/UI/GTK2/src/DaemonState.cpp A /trunk/UI/GTK2/src/DaemonState.h A /trunk/UI/GTK2/src/OnDiskHandler.cpp A /trunk/UI/GTK2/src/OnDiskHandler.h New classes to help the daemon. DaemonState manages crawling and monitoring, while OnDiskHandler handles events generated by MonitorThread. ------------------------------------------------------------------------ r370 | fabricecolin | 2006-08-04 14:11:48 +0200 (Fri, 04 Aug 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h Nested class TimestampedItem used for indexable locations. ------------------------------------------------------------------------ r369 | fabricecolin | 2006-08-04 14:04:58 +0200 (Fri, 04 Aug 2006) | 2 lines Changed paths: M /trunk/Monitor/MonitorHandler.h M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/MboxHandler.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h Prototype of MonitorHandler::fileMoved() has changed. 
------------------------------------------------------------------------ r368 | fabricecolin | 2006-08-04 13:59:37 +0200 (Fri, 04 Aug 2006) | 3 lines Changed paths: M /trunk/Index/WritableXapianIndex.cpp M /trunk/Index/WritableXapianIndex.h In updateDocumentInfo(), refresh the document's common terms (prefixed with U etc...) and terms generated from the title. ------------------------------------------------------------------------ r367 | fabricecolin | 2006-07-21 16:43:46 +0200 (Fri, 21 Jul 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/ResultsTree.cpp Caught up with recent changes. ------------------------------------------------------------------------ r366 | fabricecolin | 2006-07-20 16:12:13 +0200 (Thu, 20 Jul 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc A few cosmetic changes. Open the daemon's index in read-only mode. ------------------------------------------------------------------------ r365 | fabricecolin | 2006-07-20 16:11:15 +0200 (Thu, 20 Jul 2006) | 5 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Synced with XapianIndex changes. Show daemon's index in engines list. When viewing, if the type doesn't have a MIMEAction and the document's protocol is http, do as if it was an html document as the browser is very likely to be able to handle it. ------------------------------------------------------------------------ r364 | fabricecolin | 2006-07-20 16:03:04 +0200 (Thu, 20 Jul 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh Make use of the ThreadsManager queue. ------------------------------------------------------------------------ r363 | fabricecolin | 2006-07-20 16:01:55 +0200 (Thu, 20 Jul 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/MboxHandler.h Synced with XapianIndex changes. 
------------------------------------------------------------------------ r362 | fabricecolin | 2006-07-20 16:00:55 +0200 (Thu, 20 Jul 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/propertiesDialog.cc Minor fix : default to the first language in the list only if the name was actually a language name. ------------------------------------------------------------------------ r361 | fabricecolin | 2006-07-20 15:56:38 +0200 (Thu, 20 Jul 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h Load locations indexable by the daemon process. Define an index for the daemon use (temporarily named "My Computer"). ------------------------------------------------------------------------ r360 | fabricecolin | 2006-07-20 15:53:17 +0200 (Thu, 20 Jul 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h ThreadsManager and IndexingThread expect the index to be specified. If the type of the document to index is not supported, skip its content and index only the metadata. ------------------------------------------------------------------------ r359 | fabricecolin | 2006-07-20 15:47:46 +0200 (Thu, 20 Jul 2006) | 3 lines Changed paths: M /trunk/Utils/MIMEScanner.cpp If the type doesn't have a default action, get a list of its parent types and see if they do. ------------------------------------------------------------------------ r358 | fabricecolin | 2006-07-19 13:27:03 +0200 (Wed, 19 Jul 2006) | 2 lines Changed paths: M /trunk/Index/WritableXapianIndex.cpp If the document to index has no data, that's fine. We can still index metadata. 
------------------------------------------------------------------------ r357 | fabricecolin | 2006-07-18 15:08:51 +0200 (Tue, 18 Jul 2006) | 3 lines Changed paths: M /trunk/Index/IndexInterface.h M /trunk/Index/Makefile.am A /trunk/Index/WritableXapianIndex.cpp A /trunk/Index/WritableXapianIndex.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/Index/pinot-index.cpp Moved everything that requires a writable index to WritableIndexInterface and WritableXapianIndex. ------------------------------------------------------------------------ r356 | fabricecolin | 2006-07-16 12:11:27 +0200 (Sun, 16 Jul 2006) | 2 lines Changed paths: M /trunk/Utils/xdgmime/ChangeLog M /trunk/Utils/xdgmime/xdgmimemagic.c Caught up with xdgmime. ------------------------------------------------------------------------ r355 | fabricecolin | 2006-07-16 12:09:20 +0200 (Sun, 16 Jul 2006) | 2 lines Changed paths: M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/Utils/XapianDatabase.cpp M /trunk/Utils/XapianDatabase.h Moved buildUrl() to XapianDatabase. ------------------------------------------------------------------------ r354 | fabricecolin | 2006-07-15 15:44:22 +0200 (Sat, 15 Jul 2006) | 2 lines Changed paths: D /trunk/Index/indextest.cpp A /trunk/Index/pinot-index.1 A /trunk/Index/pinot-index.cpp (from /trunk/Index/indextest.cpp:351) Renamed indextest.cpp to pinot-index.cpp. Added man page. ------------------------------------------------------------------------ r353 | fabricecolin | 2006-07-15 15:34:12 +0200 (Sat, 15 Jul 2006) | 7 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Search/XapianEngine.cpp M /trunk/UI/GTK2/src/indexDialog.cc M /trunk/Utils/XapianDatabase.cpp In XapianIndex::listDocuments*(), don't limit the number of document IDs if maxDocsCount is 0. 
In XapianEngine, use OP_FILTER for filters when possible so that weights are not skewed. In XapianDatabase and indexDialog, remote indexes have a location with no slash, rather than one that doesn't start with a slash ! ------------------------------------------------------------------------ r352 | fabricecolin | 2006-07-14 18:18:21 +0200 (Fri, 14 Jul 2006) | 3 lines Changed paths: M /trunk/Utils/XapianDatabase.cpp M /trunk/Utils/XapianDatabase.h Try to open the database if it's not already when one of the getter methods is called. ------------------------------------------------------------------------ r351 | fabricecolin | 2006-07-14 17:57:09 +0200 (Fri, 14 Jul 2006) | 2 lines Changed paths: M /trunk/Index/indextest.cpp Updated this, it should prove useful later. ------------------------------------------------------------------------ r350 | fabricecolin | 2006-07-14 17:45:56 +0200 (Fri, 14 Jul 2006) | 3 lines Changed paths: M /trunk/Collect/pinot-collect.cpp M /trunk/Search/pinot-search.cpp In pinot-search, better call XapianDatabaseFactory::closeAll() before exiting. Minor cleanup. ------------------------------------------------------------------------ r349 | fabricecolin | 2006-07-14 15:56:39 +0200 (Fri, 14 Jul 2006) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Search/XapianEngine.cpp Show the type of Xapian::Error when caught. ------------------------------------------------------------------------ r348 | fabricecolin | 2006-07-14 14:12:57 +0200 (Fri, 14 Jul 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/Utils/DocumentInfo.cpp M /trunk/Utils/DocumentInfo.h Simplified/refactored indexing code in the UI. Moved stuff useful to both the UI and the future D-Bus daemon into ThreadsManager when it makes sense. 
DocumentInfo can now hold a labels set. ------------------------------------------------------------------------ r347 | fabricecolin | 2006-07-14 14:08:55 +0200 (Fri, 14 Jul 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h Score column gives some visual cue as to the relevance of results. ------------------------------------------------------------------------ r346 | fabricecolin | 2006-07-12 14:15:12 +0200 (Wed, 12 Jul 2006) | 4 lines Changed paths: M /trunk/Makefile.am A /trunk/UI/GTK2/src/pinot-dbus-daemon.cc A /trunk/UI/GTK2/src/pinot-dbus-daemon.xml M /trunk/configure.in M /trunk/pinot.spec.in M /trunk/po/POTFILES.in First shot at writing a D-Bus daemon that supports the method de.berlios.Pinot.Index. Bindings can be generated from pinot-dbus-daemon.xml with dbus-binding-tool. ------------------------------------------------------------------------ r345 | fabricecolin | 2006-07-07 14:40:59 +0200 (Fri, 07 Jul 2006) | 5 lines Changed paths: M /trunk/Search/AbstractGenerator.cpp Catch all possible exceptions thrown by Xapian in generateAbstract() to avoid aborting the whole query. For instance, since position lists are not supported by the remote backend, generateAbstract() would throw an exception and thus prevent from querying a remote index. ------------------------------------------------------------------------ r343 | fabricecolin | 2006-07-05 13:41:48 +0200 (Wed, 05 Jul 2006) | 2 lines Changed paths: M /trunk/Collect/pinot-collect.1 M /trunk/README M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot.1 Clarified README, updated date of man pages. ------------------------------------------------------------------------ r342 | fabricecolin | 2006-07-05 13:07:09 +0200 (Wed, 05 Jul 2006) | 2 lines Changed paths: M /trunk/NEWS Releasing v0.50. 
------------------------------------------------------------------------ r341 | fabricecolin | 2006-07-04 13:29:03 +0200 (Tue, 04 Jul 2006) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po Updated translations. ------------------------------------------------------------------------ r340 | fabricecolin | 2006-07-03 15:39:30 +0200 (Mon, 03 Jul 2006) | 2 lines Changed paths: M /trunk/TODO Revised the TODO list. ------------------------------------------------------------------------ r339 | fabricecolin | 2006-06-29 16:20:58 +0200 (Thu, 29 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Display a message if no application could be found for a given type. ------------------------------------------------------------------------ r338 | fabricecolin | 2006-06-28 13:08:33 +0200 (Wed, 28 Jun 2006) | 2 lines Changed paths: M /trunk/Collect/pinot-collect.1 M /trunk/NEWS M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Preparing for 0.50 release... ------------------------------------------------------------------------ r337 | fabricecolin | 2006-06-28 12:45:08 +0200 (Wed, 28 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/PinotUtils.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/mainWindow.cc Enable sorting on most tree columns. ------------------------------------------------------------------------ r336 | fabricecolin | 2006-06-28 12:43:04 +0200 (Wed, 28 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc Expire query and view items older than one month. ------------------------------------------------------------------------ r335 | fabricecolin | 2006-06-28 12:41:36 +0200 (Wed, 28 Jun 2006) | 2 lines Changed paths: M /trunk/Utils/CommandLine.cpp Don't look for substitution strings in arguments. 
------------------------------------------------------------------------ r334 | fabricecolin | 2006-06-28 02:28:18 +0200 (Wed, 28 Jun 2006) | 2 lines Changed paths: M /trunk/NEWS Summarized additions since last version. ------------------------------------------------------------------------ r333 | fabricecolin | 2006-06-28 02:19:44 +0200 (Wed, 28 Jun 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog_glade.cc M /trunk/UI/GTK2/src/queryDialog_glade.hh Workaround for bizarre bug that would cause a segfault when creating a query that indexes and labels results based on a language filter. ------------------------------------------------------------------------ r332 | fabricecolin | 2006-06-26 13:20:29 +0200 (Mon, 26 Jun 2006) | 2 lines Changed paths: M /trunk/Utils/TimeConverter.cpp M /trunk/configure.in Don't use timelocal(), check for timegm(). ------------------------------------------------------------------------ r331 | fabricecolin | 2006-06-23 13:02:38 +0200 (Fri, 23 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Fix results indexing and labeling, accidentally broken by previous checkins. ------------------------------------------------------------------------ r330 | fabricecolin | 2006-06-22 15:58:35 +0200 (Thu, 22 Jun 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Rather than opening the new More Like query for editing, just add it/update its details in the queries list. ------------------------------------------------------------------------ r329 | fabricecolin | 2006-06-22 13:01:23 +0200 (Thu, 22 Jun 2006) | 6 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh More Like This is activated if at least one indexed result is selected. 
It runs the query against My Documents, gets expand terms and adds those to a copy of the query object ("More Like query_name") that can be edited, stored and run again. Several other small changes and fixes. ------------------------------------------------------------------------ r328 | fabricecolin | 2006-06-22 12:56:55 +0200 (Thu, 22 Jun 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h Trimmed down ResultsModelColumns. Method setSelectionState() replaces setSelectionViewedState() and should be used when a result is indexed. ------------------------------------------------------------------------ r327 | fabricecolin | 2006-06-22 12:54:31 +0200 (Thu, 22 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh Added More Like This item to the Results menu. ------------------------------------------------------------------------ r326 | fabricecolin | 2006-06-22 12:52:26 +0200 (Thu, 22 Jun 2006) | 2 lines Changed paths: M /trunk/Index/XapianIndex.h Last commit should have included this file too... ------------------------------------------------------------------------ r325 | fabricecolin | 2006-06-22 12:51:34 +0200 (Thu, 22 Jun 2006) | 2 lines Changed paths: M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Search/SearchEngineInterface.cpp M /trunk/Search/SearchEngineInterface.h M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h Moved getExpandTerms() functionality to Search/XapianEngine. ------------------------------------------------------------------------ r324 | fabricecolin | 2006-06-22 12:47:46 +0200 (Thu, 22 Jun 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/Utils/Languages.cpp Unknown is now in the languages list. 
This makes it possible to search for documents for which language detection failed. ------------------------------------------------------------------------ r323 | fabricecolin | 2006-06-21 15:31:38 +0200 (Wed, 21 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h View results on double-click. Removed checkSelection() method. ------------------------------------------------------------------------ r322 | fabricecolin | 2006-06-16 18:06:53 +0200 (Fri, 16 Jun 2006) | 3 lines Changed paths: M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h Index can obtain expand terms based on one or more documents. This should be useful for a "more like this"-type feature. ------------------------------------------------------------------------ r321 | fabricecolin | 2006-06-16 15:36:02 +0200 (Fri, 16 Jun 2006) | 2 lines Changed paths: M /trunk/Tokenize/PdfTokenizer.cpp M /trunk/Tokenize/PdfTokenizer.h Switched to pdftotext, which seems to support more files than pdftohtml. ------------------------------------------------------------------------ r320 | fabricecolin | 2006-06-16 15:21:48 +0200 (Fri, 16 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/MboxHandler.h Stopped using temporary labels to make lists of messages. ------------------------------------------------------------------------ r319 | fabricecolin | 2006-06-16 13:14:55 +0200 (Fri, 16 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/propertiesDialog.hh Allow changing the language of several documents at once. ------------------------------------------------------------------------ r318 | fabricecolin | 2006-06-16 02:19:19 +0200 (Fri, 16 Jun 2006) | 2 lines Changed paths: M /trunk/Makefile.am A /trunk/globalconfig.xml M /trunk/pinot.spec.in Global config file.
------------------------------------------------------------------------ r317 | fabricecolin | 2006-06-15 17:27:13 +0200 (Thu, 15 Jun 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh View Cache is now a menu that lists all cache providers. It is disabled when none of the selected results belong to a supported protocol. After editing a query, reselect the query in the list. Some minor fixes. ------------------------------------------------------------------------ r316 | fabricecolin | 2006-06-15 17:17:50 +0200 (Thu, 15 Jun 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/pinot.cc Load global settings at startup. The only configuration item currently supported is cache providers. ------------------------------------------------------------------------ r314 | fabricecolin | 2006-06-11 10:01:24 +0200 (Sun, 11 Jun 2006) | 2 lines Changed paths: M /trunk/NEWS M /trunk/TODO M /trunk/po/es.po M /trunk/po/fr.po Releasing 0.49. ------------------------------------------------------------------------ r313 | fabricecolin | 2006-06-10 05:04:26 +0200 (Sat, 10 Jun 2006) | 2 lines Changed paths: M /trunk/Tokenize/Makefile.am M /trunk/Tokenize/Tokenizer.cpp M /trunk/Utils/CommandLine.cpp M /trunk/Utils/CommandLine.h New CommandLine::runSync(), called by Tokenizer::runHelperProgram(). ------------------------------------------------------------------------ r312 | fabricecolin | 2006-06-10 05:03:03 +0200 (Sat, 10 Jun 2006) | 2 lines Changed paths: M /trunk/Makefile.am Distribute po/pinot.pot. ------------------------------------------------------------------------ r311 | fabricecolin | 2006-06-09 17:07:03 +0200 (Fri, 09 Jun 2006) | 2 lines Changed paths: M /trunk/Search/AbstractGenerator.cpp Less chatty in DEBUG mode. 
------------------------------------------------------------------------ r310 | fabricecolin | 2006-06-09 17:05:54 +0200 (Fri, 09 Jun 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc View Cache now uses the Google cache at http://www.google.com/search?q=cache: and is enabled if at least one result on HTTP is selected. Non-HTTP results will be opened as with View. ------------------------------------------------------------------------ r309 | fabricecolin | 2006-06-09 15:47:44 +0200 (Fri, 09 Jun 2006) | 2 lines Changed paths: M /trunk/Collect/pinot-collect.1 M /trunk/Search/pinot-search.1 M /trunk/TODO M /trunk/UI/GTK2/src/pinot.1 M /trunk/po/es.po M /trunk/po/fr.po Synchronised with current source. ------------------------------------------------------------------------ r308 | fabricecolin | 2006-06-09 15:32:15 +0200 (Fri, 09 Jun 2006) | 2 lines Changed paths: M /trunk/Utils/Document.cpp M /trunk/Utils/IndexedDocument.cpp Minor fix and cosmetic changes. ------------------------------------------------------------------------ r307 | fabricecolin | 2006-06-09 14:05:10 +0200 (Fri, 09 Jun 2006) | 2 lines Changed paths: M /trunk/README M /trunk/pinot.spec.in M /trunk/po/POTFILES.in Dropped references to internal viewer stuff. ------------------------------------------------------------------------ r306 | fabricecolin | 2006-06-09 14:02:53 +0200 (Fri, 09 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/mainWindow.cc Remember whether the queries list is expanded. ------------------------------------------------------------------------ r305 | fabricecolin | 2006-06-09 13:35:14 +0200 (Fri, 09 Jun 2006) | 2 lines Changed paths: M /trunk/Makefile.am D /trunk/UI/GTK2/index.html D /trunk/UI/GTK2/src/HtmlView.cpp D /trunk/UI/GTK2/src/HtmlView.h M /trunk/UI/GTK2/src/Makefile.am D /trunk/UI/RenderHTML M /trunk/configure.in Removing internal viewer stuff.
------------------------------------------------------------------------ r304 | fabricecolin | 2006-06-09 13:31:54 +0200 (Fri, 09 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/Notebook.cpp M /trunk/UI/GTK2/src/Notebook.h M /trunk/UI/GTK2/src/pinot.cc Removed HtmlView/MozillaRenderer code. ------------------------------------------------------------------------ r303 | fabricecolin | 2006-06-09 13:28:00 +0200 (Fri, 09 Jun 2006) | 6 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Dropped internal viewer completely. Always use default applications. For those documents that have to be downloaded first, save them into a temporary file. Several changes related to trees selection. When the IndexBrowser thread finishes, don't blindly return if the page isn't there anymore, make sure the progress bar is stopped. ------------------------------------------------------------------------ r302 | fabricecolin | 2006-06-09 13:20:55 +0200 (Fri, 09 Jun 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/IndexTree.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h Both of these trees return the selected items as a DocumentInfo list. ResultsTree::setSelectionViewedState() replaces setFirstSelectionViewedState(). ------------------------------------------------------------------------ r301 | fabricecolin | 2006-06-09 13:19:00 +0200 (Fri, 09 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow_glade.cc Expand queryExpander. ------------------------------------------------------------------------ r300 | fabricecolin | 2006-06-09 13:18:00 +0200 (Fri, 09 Jun 2006) | 6 lines Changed paths: M /trunk/Utils/CommandLine.cpp M /trunk/Utils/MIMEScanner.cpp MIMEScanner registers the first valid application (ie the first one with a .desktop file) as the default for a given type. 
CommandLine spawns the same process as many times as necessary if it doesn't support multiple arguments. All of MIMEAction wasn't copied. Some other fixes. ------------------------------------------------------------------------ r299 | fabricecolin | 2006-06-08 16:55:53 +0200 (Thu, 08 Jun 2006) | 2 lines Changed paths: M /trunk/Tokenize/HtmlTokenizer.cpp Bug fix on attributes parsing. ------------------------------------------------------------------------ r298 | fabricecolin | 2006-06-07 16:06:21 +0200 (Wed, 07 Jun 2006) | 2 lines Changed paths: M /trunk/NEWS M /trunk/TODO M /trunk/configure.in Being optimistic and preparing for next release :-) ------------------------------------------------------------------------ r297 | fabricecolin | 2006-06-07 16:00:12 +0200 (Wed, 07 Jun 2006) | 5 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Removed code related to the external browser setting. HTML documents are viewed with the internal viewer; for others, use MIMEScanner::getDefaultAction() and CommandLine::runAsync() to launch the type's default application. ------------------------------------------------------------------------ r296 | fabricecolin | 2006-06-07 15:57:03 +0200 (Wed, 07 Jun 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/prefsDialog_glade.hh Removed documents view mode and browser selection from the preferences as pinot can now launch other applications.
------------------------------------------------------------------------ r295 | fabricecolin | 2006-06-07 15:33:52 +0200 (Wed, 07 Jun 2006) | 4 lines Changed paths: M /trunk/Collect/Makefile.am M /trunk/Search/Makefile.am A /trunk/Utils/CommandLine.cpp A /trunk/Utils/CommandLine.h M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h M /trunk/Utils/Makefile.am M /trunk/configure.in Extended MIMEAction, added CommandLine: it runs commands asynchronously after having expanded parameters. Since this makes use of glibmm, check for it explicitly in configure. ------------------------------------------------------------------------ r294 | fabricecolin | 2006-06-07 13:24:05 +0200 (Wed, 07 Jun 2006) | 6 lines Changed paths: M /trunk/Collect/pinot-collect.cpp M /trunk/Search/pinot-search.cpp M /trunk/UI/GTK2/src/pinot.cc M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/MIMEScanner.h MIMEScanner has to be initialized and shut down. When initialized, it parses shared-mime-info's default applications file and gets the Exec line out of the corresponding .desktop files. When shut down, it shuts down xdgmime. Hopefully, this will soon allow viewing most documents properly instead of blindly relying on the internal viewer. ------------------------------------------------------------------------ r293 | fabricecolin | 2006-06-07 13:08:40 +0200 (Wed, 07 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow_glade.cc Assigned decent defaults to mainWindow's dimensions. ------------------------------------------------------------------------ r292 | fabricecolin | 2006-06-06 14:33:51 +0200 (Tue, 06 Jun 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh The main window on_delete_event handler disconnects the page switch signal to avoid segfaults when the page being shown at exit time is the View page.
------------------------------------------------------------------------ r291 | fabricecolin | 2006-06-06 14:30:11 +0200 (Tue, 06 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/IndexTree.cpp Don't truncate titles. ------------------------------------------------------------------------ r290 | fabricecolin | 2006-06-05 16:36:58 +0200 (Mon, 05 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/importDialog.cc If import failed, set the progress bar's text to the error message. ------------------------------------------------------------------------ r289 | fabricecolin | 2006-06-05 16:32:58 +0200 (Mon, 05 Jun 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp Some fixes for charset conversions. ------------------------------------------------------------------------ r288 | fabricecolin | 2006-06-02 17:46:51 +0200 (Fri, 02 Jun 2006) | 3 lines Changed paths: M /trunk/Tokenize/HtmlTokenizer.cpp M /trunk/Utils/StringManip.cpp Some fixes for results title and abstract, including a fairly stupid bug in trimSpaces(). ------------------------------------------------------------------------ r287 | fabricecolin | 2006-06-02 16:04:47 +0200 (Fri, 02 Jun 2006) | 4 lines Changed paths: M /trunk/Search/SherlockParser.cpp M /trunk/Search/WebEngine.cpp M /trunk/Search/WebEngine.h If necessary, WebEngine looks for a http-equiv META tag for Content-Type to determine a page's charset. SherlockParser wraps chunks to make them look like full-blown documents. 
------------------------------------------------------------------------ r286 | fabricecolin | 2006-06-02 15:59:47 +0200 (Fri, 02 Jun 2006) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp M /trunk/Collect/DownloaderInterface.cpp M /trunk/Collect/FileCollector.cpp M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/ResultsTree.cpp Removed unnecessary include. ------------------------------------------------------------------------ r285 | fabricecolin | 2006-06-02 15:57:08 +0200 (Fri, 02 Jun 2006) | 2 lines Changed paths: M /trunk/Tokenize/HtmlTokenizer.cpp M /trunk/Tokenize/HtmlTokenizer.h M /trunk/Tokenize/PdfTokenizer.cpp M /trunk/Tokenize/RtfTokenizer.cpp M /trunk/Tokenize/TokenizerFactory.cpp Various fixes. HtmlTokenizer can do validation without content extraction. ------------------------------------------------------------------------ r284 | fabricecolin | 2006-06-01 13:28:51 +0200 (Thu, 01 Jun 2006) | 2 lines Changed paths: M /trunk/Search/pinot-search.cpp M /trunk/Tokenize/XmlTokenizer.cpp Fixed stupid bug in stripTags(). Use it on the extract in pinot-search. ------------------------------------------------------------------------ r283 | fabricecolin | 2006-05-31 18:11:24 +0200 (Wed, 31 May 2006) | 3 lines Changed paths: M /trunk/Search/SherlockParser.cpp M /trunk/Tokenize/HtmlTokenizer.cpp M /trunk/Tokenize/HtmlTokenizer.h M /trunk/Tokenize/XmlTokenizer.cpp HtmlTokenizer can attempt to find an abstract, basically the text between links. Some other changes. ------------------------------------------------------------------------ r282 | fabricecolin | 2006-05-31 16:56:30 +0200 (Wed, 31 May 2006) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp M /trunk/Search/SearchEngineInterface.cpp M /trunk/Utils/StringManip.cpp M /trunk/Utils/StringManip.h Less verbose CurlDownloader. Added StringManip::trimSpaces(). 
------------------------------------------------------------------------ r281 | fabricecolin | 2006-05-30 14:46:42 +0200 (Tue, 30 May 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp XmlTokenizer::stripTags() replaces HtmlTokenizer::stripTags(). ------------------------------------------------------------------------ r280 | fabricecolin | 2006-05-30 14:44:50 +0200 (Tue, 30 May 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/importDialog_glade.cc M /trunk/UI/GTK2/src/importDialog_glade.hh M /trunk/UI/GTK2/src/mainWindow.cc Removed not so useful MIME type filtering in importDialog. ------------------------------------------------------------------------ r279 | fabricecolin | 2006-05-30 14:42:56 +0200 (Tue, 30 May 2006) | 3 lines Changed paths: M /trunk/Search/SherlockParser.cpp M /trunk/Search/pinot-search.cpp Synced with changes to HtmlTokenizer and Link classes. The stripTags method is now in XmlTokenizer. ------------------------------------------------------------------------ r278 | fabricecolin | 2006-05-30 14:41:33 +0200 (Tue, 30 May 2006) | 2 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp M /trunk/Collect/FileCollector.cpp M /trunk/Collect/NeonDownloader.cpp M /trunk/Collect/pinot-collect.cpp Dropped HtmlDocument use. ------------------------------------------------------------------------ r277 | fabricecolin | 2006-05-30 14:39:56 +0200 (Tue, 30 May 2006) | 2 lines Changed paths: D /trunk/Utils/HtmlDocument.cpp D /trunk/Utils/HtmlDocument.h M /trunk/Utils/Makefile.am M /trunk/Utils/StringManip.cpp HtmlDocument is no longer necessary. Fixed StringManip::removeCharacters() !
------------------------------------------------------------------------ r276 | fabricecolin | 2006-05-30 14:38:39 +0200 (Tue, 30 May 2006) | 4 lines Changed paths: M /trunk/Tokenize/HtmlTokenizer.cpp M /trunk/Tokenize/HtmlTokenizer.h M /trunk/Tokenize/Makefile.am M /trunk/Tokenize/OpenDocumentTokenizer.cpp M /trunk/Tokenize/OpenDocumentTokenizer.h M /trunk/Tokenize/PdfTokenizer.cpp M /trunk/Tokenize/PdfTokenizer.h M /trunk/Tokenize/RtfTokenizer.cpp M /trunk/Tokenize/RtfTokenizer.h M /trunk/Tokenize/XmlTokenizer.cpp M /trunk/Tokenize/XmlTokenizer.h M /trunk/Tokenize/tokenizertest.cpp Revisited HTML parser. It now uses libxml2's HTMLparser. This is not yet wrapped by libxml++ by the way. Cleaned up other tokenizers. ------------------------------------------------------------------------ r275 | fabricecolin | 2006-05-30 13:02:43 +0200 (Tue, 30 May 2006) | 2 lines Changed paths: M /trunk/TODO Corrected inaccuracies :-) ------------------------------------------------------------------------ r273 | fabricecolin | 2006-05-25 16:40:08 +0200 (Thu, 25 May 2006) | 2 lines Changed paths: M /trunk/NEWS Releasing 0.48 today. ------------------------------------------------------------------------ r272 | fabricecolin | 2006-05-25 15:59:15 +0200 (Thu, 25 May 2006) | 2 lines Changed paths: M /trunk/Monitor/Makefile.am Distribute linux-inotify-syscalls.h. ------------------------------------------------------------------------ r271 | fabricecolin | 2006-05-24 15:59:01 +0200 (Wed, 24 May 2006) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po Synced with latest source. ------------------------------------------------------------------------ r270 | fabricecolin | 2006-05-24 15:54:48 +0200 (Wed, 24 May 2006) | 3 lines Changed paths: M /trunk/Index/XapianIndex.cpp Forgot to convert language from locale to English in updateDocument(). Lower-case the language name before requesting a stemmer.
------------------------------------------------------------------------ r269 | fabricecolin | 2006-05-24 15:54:51 +0200 (Wed, 24 May 2006) | 3 lines Changed paths: M /trunk/Collect/CurlDownloader.cpp M /trunk/Collect/FileCollector.cpp M /trunk/Collect/MboxCollector.cpp M /trunk/Collect/NeonDownloader.cpp M /trunk/Collect/XapianCollector.cpp M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/IndexTree.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/Utils/Document.cpp M /trunk/Utils/Document.h M /trunk/Utils/HtmlDocument.cpp M /trunk/Utils/HtmlDocument.h Document objects can be built from a DocumentInfo. Use DocumentInfo when possible so that no information (eg title, language...) is lost. ------------------------------------------------------------------------ r268 | fabricecolin | 2006-05-24 14:59:59 +0200 (Wed, 24 May 2006) | 2 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp Check whether there's actually stuff to read ! ------------------------------------------------------------------------ r267 | fabricecolin | 2006-05-24 01:42:12 +0200 (Wed, 24 May 2006) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/TODO Mention one header was borrowed from libinotify. More items to do... ------------------------------------------------------------------------ r266 | fabricecolin | 2006-05-23 17:22:59 +0200 (Tue, 23 May 2006) | 5 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp A /trunk/Monitor/linux-inotify-syscalls.h M /trunk/configure.in Check whether inotify.h is located in linux/ or sys/. If the former, include linux-inotify-syscalls.h. This header is from Ryan Lortie's (desrt at desrt dot ca) libinotify project, and is slightly modified. These changes are required to build on Ubuntu Dapper Drake. 
------------------------------------------------------------------------ r265 | fabricecolin | 2006-05-23 14:38:03 +0200 (Tue, 23 May 2006) | 2 lines Changed paths: M /trunk/NEWS M /trunk/README Listed main changes, removed mention of FAM/Gamin. ------------------------------------------------------------------------ r264 | fabricecolin | 2006-05-23 13:19:56 +0200 (Tue, 23 May 2006) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po Updated translations. ------------------------------------------------------------------------ r263 | fabricecolin | 2006-05-23 13:14:27 +0200 (Tue, 23 May 2006) | 2 lines Changed paths: M /trunk/Collect/pinot-collect.1 M /trunk/Search/pinot-search.1 M /trunk/UI/GTK2/src/pinot.1 M /trunk/configure.in Preparing for 0.48. ------------------------------------------------------------------------ r262 | fabricecolin | 2006-05-23 13:11:49 +0200 (Tue, 23 May 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/pinot.cc Changed threads termination signal handling slightly so that the main window can process threads that finished while the import dialog was up. ------------------------------------------------------------------------ r261 | fabricecolin | 2006-05-22 15:25:09 +0200 (Mon, 22 May 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp Before looking for terms to highlight, convert the extract to UTF-8 to avoid issues with byte/character discrepancies. ------------------------------------------------------------------------ r260 | fabricecolin | 2006-05-20 14:08:49 +0200 (Sat, 20 May 2006) | 3 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp No need to listen for event IN_ATTRIB. 
Make sure the monitor's file descriptor is in the set MonitorThread select()'s on ! ------------------------------------------------------------------------ r259 | fabricecolin | 2006-05-20 12:13:58 +0200 (Sat, 20 May 2006) | 2 lines Changed paths: M /trunk/Search/Plugins/BitTorrent.src Fixed host name. ------------------------------------------------------------------------ r258 | fabricecolin | 2006-05-20 11:37:52 +0200 (Sat, 20 May 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/queryDialog.cc Minor changes to how the name of the selected language is obtained. ------------------------------------------------------------------------ r257 | fabricecolin | 2006-05-20 09:50:15 +0200 (Sat, 20 May 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/propertiesDialog.cc Highlight all full terms in the extract. Other miscellaneous fixes. ------------------------------------------------------------------------ r256 | fabricecolin | 2006-05-20 09:17:38 +0200 (Sat, 20 May 2006) | 4 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp In MonitorThread, get the file descriptor to listen on even when no changes occurred to avoid returning an error message when everything is actually okay. In INotifyMonitor, be more verbose when initialization fails. ------------------------------------------------------------------------ r255 | fabricecolin | 2006-05-20 08:24:46 +0200 (Sat, 20 May 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/Makefile.am M /trunk/pinot.spec.in Dropped dependency on Gamin/FAM, fixed UI headers list. ------------------------------------------------------------------------ r254 | fabricecolin | 2006-05-20 08:21:45 +0200 (Sat, 20 May 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/MboxHandler.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/queryDialog.cc Fixed a bunch of warnings.
------------------------------------------------------------------------ r253 | fabricecolin | 2006-05-20 07:20:38 +0200 (Sat, 20 May 2006) | 2 lines Changed paths: M /trunk/po/POTFILES.in Scan MboxHandler, not MonitorHandler. ------------------------------------------------------------------------ r252 | fabricecolin | 2006-05-20 07:13:24 +0200 (Sat, 20 May 2006) | 3 lines Changed paths: M /trunk/Makefile.am M /trunk/configure.in Build Monitor right before the UI. Since it requires sigc++-2.0, configure checks for it. ------------------------------------------------------------------------ r251 | fabricecolin | 2006-05-20 07:10:25 +0200 (Sat, 20 May 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh The Show Extract and Group By menuitems are always visible and have effect on all results tabs. Up to now, this wasn't very clear. ------------------------------------------------------------------------ r250 | fabricecolin | 2006-05-20 06:30:58 +0200 (Sat, 20 May 2006) | 3 lines Changed paths: M /trunk/Monitor/Makefile.am A /trunk/Monitor/MonitorHandler.cpp (from /trunk/UI/GTK2/src/MonitorHandler.cpp:249) A /trunk/Monitor/MonitorHandler.h (from /trunk/UI/GTK2/src/MonitorHandler.h:249) M /trunk/UI/GTK2/src/Makefile.am A /trunk/UI/GTK2/src/MboxHandler.cpp A /trunk/UI/GTK2/src/MboxHandler.h D /trunk/UI/GTK2/src/MonitorHandler.cpp D /trunk/UI/GTK2/src/MonitorHandler.h M /trunk/UI/GTK2/src/WorkerThreads.cpp Split MonitorHandler and MboxHandler. In MonitorThread, replaced calls to FAM with MonitorInterface. 
------------------------------------------------------------------------ r249 | fabricecolin | 2006-05-20 06:00:42 +0200 (Sat, 20 May 2006) | 2 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/INotifyMonitor.h M /trunk/Monitor/MonitorInterface.h M /trunk/UI/GTK2/src/MonitorHandler.cpp M /trunk/UI/GTK2/src/MonitorHandler.h Redesigned MonitorHandler to better fit with MonitorInterface. ------------------------------------------------------------------------ r248 | fabricecolin | 2006-05-19 01:37:13 +0200 (Fri, 19 May 2006) | 2 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/INotifyMonitor.h M /trunk/Monitor/MonitorInterface.h Monitoring now functional. ------------------------------------------------------------------------ r247 | fabricecolin | 2006-05-17 16:33:35 +0200 (Wed, 17 May 2006) | 2 lines Changed paths: M /trunk/Monitor/INotifyMonitor.cpp M /trunk/Monitor/INotifyMonitor.h M /trunk/Monitor/Makefile.am A /trunk/Monitor/MonitorFactory.cpp A /trunk/Monitor/MonitorFactory.h M /trunk/Monitor/MonitorInterface.h Some fixes, new class MonitorFactory. ------------------------------------------------------------------------ r246 | fabricecolin | 2006-05-17 14:59:37 +0200 (Wed, 17 May 2006) | 2 lines Changed paths: M /trunk/Tokenize/TokenizerFactory.h M /trunk/Utils/Document.h M /trunk/Utils/DocumentInfo.h Cosmetic changes. ------------------------------------------------------------------------ r245 | fabricecolin | 2006-05-17 02:07:20 +0200 (Wed, 17 May 2006) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h On update, don't attempt to detect the document's language if one is provided. 
------------------------------------------------------------------------ r244 | fabricecolin | 2006-05-17 02:05:04 +0200 (Wed, 17 May 2006) | 2 lines Changed paths: M /trunk/Makefile.am A /trunk/Monitor A /trunk/Monitor/INotifyMonitor.cpp A /trunk/Monitor/INotifyMonitor.h A /trunk/Monitor/Makefile.am A /trunk/Monitor/MonitorEvent.cpp A /trunk/Monitor/MonitorEvent.h A /trunk/Monitor/MonitorInterface.h M /trunk/configure.in First shot at an inotify-based file monitor. ------------------------------------------------------------------------ r243 | fabricecolin | 2006-05-15 16:15:01 +0200 (Mon, 15 May 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/IndexTree.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Fixed menuitem state inconsistencies when switching and closing tabs. Removed unused methods in Index and ResultsTree classes. ------------------------------------------------------------------------ r241 | fabricecolin | 2006-05-12 05:01:19 +0200 (Fri, 12 May 2006) | 2 lines Changed paths: M /trunk/NEWS Releasing 0.47 today. ------------------------------------------------------------------------ r240 | fabricecolin | 2006-05-10 16:37:34 +0200 (Wed, 10 May 2006) | 2 lines Changed paths: M /trunk/NEWS M /trunk/TODO M /trunk/configure.in M /trunk/po/es.po M /trunk/po/fr.po Preparing for next release, hopefully this week. 
------------------------------------------------------------------------ r239 | fabricecolin | 2006-05-10 15:07:15 +0200 (Wed, 10 May 2006) | 2 lines Changed paths: M /trunk/Collect/NeonDownloader.cpp M /trunk/Collect/XapianCollector.cpp M /trunk/Index/XapianIndex.cpp M /trunk/SQL/SQLiteBase.cpp M /trunk/Search/Google/GoogleAPIEngine.cpp M /trunk/Search/OpenSearchParser.cpp M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/XapianEngine.cpp M /trunk/Tokenize/Tokenizer.cpp M /trunk/Tokenize/TokenizerFactory.cpp M /trunk/Tokenize/XmlTokenizer.cpp M /trunk/UI/GTK2/src/MonitorHandler.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/pinot.cc M /trunk/Utils/Document.cpp M /trunk/Utils/MboxParser.cpp M /trunk/Utils/XapianDatabase.cpp Brought some sanity to debugging messages ;-) ------------------------------------------------------------------------ r238 | fabricecolin | 2006-05-10 15:05:48 +0200 (Wed, 10 May 2006) | 2 lines Changed paths: M /trunk/Search/Plugins/CreativeCommons.src Updated, based on plugin shipped with Firefox. ------------------------------------------------------------------------ r237 | fabricecolin | 2006-05-07 07:54:30 +0200 (Sun, 07 May 2006) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po Updated translations. ------------------------------------------------------------------------ r236 | fabricecolin | 2006-05-07 07:49:52 +0200 (Sun, 07 May 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/mainWindow.cc Update the query terms list on addResults() as it may change from one run to the next. When unindexing documents, "remove" is better suited than "delete". 
------------------------------------------------------------------------ r235 | fabricecolin | 2006-05-06 13:28:03 +0200 (Sat, 06 May 2006) | 2 lines Changed paths: M /trunk/NEWS M /trunk/TODO Updated NEWS with changes since last release, added more items to TODO list. ------------------------------------------------------------------------ r234 | fabricecolin | 2006-05-06 08:51:32 +0200 (Sat, 06 May 2006) | 2 lines Changed paths: M /trunk/pinot.spec.in Do a DEBUG build if "--with debug" is passed to rpmbuild. ------------------------------------------------------------------------ r233 | fabricecolin | 2006-05-06 08:50:11 +0200 (Sat, 06 May 2006) | 5 lines Changed paths: M /trunk/Search/QueryProperties.cpp M /trunk/Search/QueryProperties.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc QueryProperties::getTerms() returns the terms that make up the query. This is used by ResultsPage to determine what parts of the extract field should be shown in bold text. Some other minor changes. ------------------------------------------------------------------------ r232 | fabricecolin | 2006-05-06 05:25:23 +0200 (Sat, 06 May 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/propertiesDialog_glade.cc Removed evil set_text("") ! ------------------------------------------------------------------------ r231 | fabricecolin | 2006-05-06 05:24:24 +0200 (Sat, 06 May 2006) | 2 lines Changed paths: M /trunk/Search/Google/GoogleAPIEngine.cpp Don't be picky about query parameters, invoke GoogleSearch correctly. ------------------------------------------------------------------------ r230 | fabricecolin | 2006-05-05 12:52:09 +0200 (Fri, 05 May 2006) | 2 lines Changed paths: M /trunk/pinot.spec.in Bundle man pages. 
------------------------------------------------------------------------ r229 | fabricecolin | 2006-05-05 12:49:49 +0200 (Fri, 05 May 2006) | 3 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/propertiesDialog.hh M /trunk/UI/GTK2/src/propertiesDialog_glade.cc M /trunk/UI/GTK2/src/propertiesDialog_glade.hh Allow changing the language of documents in the properties dialog box. A subsequent update would use the given language to stem terms. ------------------------------------------------------------------------ r228 | fabricecolin | 2006-05-05 12:42:25 +0200 (Fri, 05 May 2006) | 2 lines Changed paths: M /trunk/Collect/Makefile.am D /trunk/Collect/dloadtest.cpp A /trunk/Collect/pinot-collect.cpp (from /trunk/Collect/dloadtest.cpp:227) M /trunk/Search/Makefile.am M /trunk/Search/pinot-search.1 A /trunk/Search/pinot-search.cpp (from /trunk/Search/senginetest.cpp:227) D /trunk/Search/senginetest.cpp Renamed source files, tidied up man page for pinot-search. ------------------------------------------------------------------------ r227 | fabricecolin | 2006-05-04 13:26:25 +0200 (Thu, 04 May 2006) | 3 lines Changed paths: M /trunk/Collect/dloadtest.cpp A /trunk/Collect/pinot-collect.1 M /trunk/Makefile.am A /trunk/Search/pinot-search.1 M /trunk/Search/senginetest.cpp A /trunk/UI/GTK2/src/pinot.1 M /trunk/UI/GTK2/src/pinot.cc All programs support --help and --version. This helped to generate man pages with help2man. 
------------------------------------------------------------------------ r226 | fabricecolin | 2006-05-02 17:16:11 +0200 (Tue, 02 May 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp M /trunk/UI/GTK2/src/EnginesTree.h M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot.cc Load and save channels whose group is collapsed in the engines tree. ------------------------------------------------------------------------ r225 | fabricecolin | 2006-05-02 15:00:24 +0200 (Tue, 02 May 2006) | 2 lines Changed paths: M /trunk/Search/Plugins/AmazonAPI.src M /trunk/Search/Plugins/YahooAPI.src M /trunk/UI/GTK2/src/PinotSettings.cpp Merged channel Web Services with The Web. ------------------------------------------------------------------------ r223 | fabricecolin | 2006-04-21 13:24:03 +0200 (Fri, 21 Apr 2006) | 2 lines Changed paths: M /trunk/NEWS M /trunk/pinot.desktop Internationalized desktop file. Updated news file. ------------------------------------------------------------------------ r222 | fabricecolin | 2006-04-21 13:21:53 +0200 (Fri, 21 Apr 2006) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po Updated translations. ------------------------------------------------------------------------ r221 | fabricecolin | 2006-04-18 16:08:07 +0200 (Tue, 18 Apr 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/Utils/MIMEScanner.cpp Be a bit less stringent with mbox'es as shared-mime-info/xdgmime doesn't recognize them as text/x-mail. MIMEScanner now returns xdg_mime_type_unknown for unknown types. ------------------------------------------------------------------------ r220 | fabricecolin | 2006-04-18 16:03:43 +0200 (Tue, 18 Apr 2006) | 3 lines Changed paths: D /trunk/Search/Plugins/MozDexDescription.xml M /trunk/pinot.spec.in Removed MozDex as its OpenSearch output has been unavailable for weeks. Hopefully, this is only temporary.
------------------------------------------------------------------------ r219 | fabricecolin | 2006-04-17 16:04:59 +0200 (Mon, 17 Apr 2006) | 2 lines Changed paths: M /trunk/README M /trunk/configure.in Preparing for v0.46 release. ------------------------------------------------------------------------ r218 | fabricecolin | 2006-04-15 14:12:10 +0200 (Sat, 15 Apr 2006) | 2 lines Changed paths: M /trunk/Utils/Url.cpp Slightly better parsing of relative URLs. ------------------------------------------------------------------------ r217 | fabricecolin | 2006-04-15 14:09:04 +0200 (Sat, 15 Apr 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Added missing check on index tree when IndexingThread returns. This could cause a crash. Yeah it was that bad :-) ------------------------------------------------------------------------ r216 | fabricecolin | 2006-04-15 07:29:06 +0200 (Sat, 15 Apr 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/queryDialog_glade.cc Renamed/relabeled tabs in queryDialog. ------------------------------------------------------------------------ r215 | fabricecolin | 2006-04-15 06:26:52 +0200 (Sat, 15 Apr 2006) | 2 lines Changed paths: M /trunk/NEWS M /trunk/TODO Updated with last few days' changes. ------------------------------------------------------------------------ r214 | fabricecolin | 2006-04-15 06:11:59 +0200 (Sat, 15 Apr 2006) | 4 lines Changed paths: M /trunk/Makefile.am M /trunk/configure.in A /trunk/pinot.desktop M /trunk/pinot.spec.in Check for desktop-file-install program and shared-mime-info package at configure time. Distribute and install pinot.desktop file, previously generated by RPM spec file. ------------------------------------------------------------------------ r213 | fabricecolin | 2006-04-14 09:24:46 +0200 (Fri, 14 Apr 2006) | 4 lines Changed paths: M /trunk/NEWS M /trunk/README M /trunk/pinot.spec.in Text-docs RPM includes libopendocumenttokenizer.so. 
Added dependencies on unzip and shared-mime-info (needed by xdgmime). Updated NEWS and README. ------------------------------------------------------------------------ r212 | fabricecolin | 2006-04-14 09:20:48 +0200 (Fri, 14 Apr 2006) | 2 lines Changed paths: M /trunk/Tokenize/OpenDocumentTokenizer.cpp Open content.xml, not meta.xml ! Doh ! ------------------------------------------------------------------------ r211 | fabricecolin | 2006-04-14 09:18:40 +0200 (Fri, 14 Apr 2006) | 4 lines Changed paths: M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/Makefile.am A /trunk/Utils/xdgmime A /trunk/Utils/xdgmime/ChangeLog A /trunk/Utils/xdgmime/xdgmime.c A /trunk/Utils/xdgmime/xdgmime.h A /trunk/Utils/xdgmime/xdgmimealias.c A /trunk/Utils/xdgmime/xdgmimealias.h A /trunk/Utils/xdgmime/xdgmimecache.c A /trunk/Utils/xdgmime/xdgmimecache.h A /trunk/Utils/xdgmime/xdgmimeglob.c A /trunk/Utils/xdgmime/xdgmimeglob.h A /trunk/Utils/xdgmime/xdgmimeint.c A /trunk/Utils/xdgmime/xdgmimeint.h A /trunk/Utils/xdgmime/xdgmimemagic.c A /trunk/Utils/xdgmime/xdgmimemagic.h A /trunk/Utils/xdgmime/xdgmimeparent.c A /trunk/Utils/xdgmime/xdgmimeparent.h Import freedesktop.org's LGPL-licensed xdgmime library, pulled from CVS today. MIMEScanner now uses that to more effectively determine a file's MIME type. As a result, we now depend on shared-mime-info. 
------------------------------------------------------------------------ r210 | fabricecolin | 2006-04-14 08:07:55 +0200 (Fri, 14 Apr 2006) | 2 lines Changed paths: A /trunk/Tokenize/OpenDocumentTokenizer.cpp A /trunk/Tokenize/OpenDocumentTokenizer.h Unsurprisingly, I forgot to check in the new OpenDocumentTokenizer class ;-) ------------------------------------------------------------------------ r209 | fabricecolin | 2006-04-14 08:03:52 +0200 (Fri, 14 Apr 2006) | 5 lines Changed paths: M /trunk/Tokenize/HtmlTokenizer.cpp M /trunk/Tokenize/HtmlTokenizer.h M /trunk/Tokenize/Makefile.am M /trunk/Tokenize/PdfTokenizer.cpp M /trunk/Tokenize/PdfTokenizer.h M /trunk/Tokenize/RtfTokenizer.cpp M /trunk/Tokenize/RtfTokenizer.h M /trunk/Tokenize/TagLibTokenizer.cpp M /trunk/Tokenize/TagLibTokenizer.h M /trunk/Tokenize/Tokenizer.cpp M /trunk/Tokenize/Tokenizer.h M /trunk/Tokenize/TokenizerFactory.cpp M /trunk/Tokenize/TokenizerFactory.h M /trunk/Tokenize/WordTokenizer.cpp M /trunk/Tokenize/WordTokenizer.h A /trunk/Tokenize/XmlTokenizer.cpp A /trunk/Tokenize/XmlTokenizer.h New tokenizers for xml and StarOffice/OpenOffice documents. The latter relies on the former and needs unzip to extract specific files. Modified tokenizer libraries interface slightly to make it easier for those that support more than one type. ------------------------------------------------------------------------ r208 | fabricecolin | 2006-04-14 07:59:03 +0200 (Fri, 14 Apr 2006) | 2 lines Changed paths: M /trunk/Utils/StringManip.cpp M /trunk/Utils/StringManip.h New method replaceEntities(), taken from HtmlTokenizer. ------------------------------------------------------------------------ r207 | fabricecolin | 2006-04-13 15:18:51 +0200 (Thu, 13 Apr 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Hide the View Cache menuitem if the Google API key is not set. 
------------------------------------------------------------------------ r206 | fabricecolin | 2006-04-12 17:23:48 +0200 (Wed, 12 Apr 2006) | 2 lines Changed paths: M /trunk/Search/Google/GAPIC.cpp Removed calls to DBGLOG(). ------------------------------------------------------------------------ r205 | fabricecolin | 2006-04-12 16:24:01 +0200 (Wed, 12 Apr 2006) | 3 lines Changed paths: M /trunk/Makefile.am M /trunk/pinot.spec.in AmazonAPI source is not installed in the engines directory. The RPM can be built with "--with soap", which adds requirement on gsoap. ------------------------------------------------------------------------ r204 | fabricecolin | 2006-04-12 16:01:47 +0200 (Wed, 12 Apr 2006) | 3 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Utils/StringManip.cpp M /trunk/Utils/StringManip.h StringManip::hashString() now uses same hash algorithm as omindex/scriptindex. Maximum term length increased to 240. ------------------------------------------------------------------------ r203 | fabricecolin | 2006-04-12 15:55:28 +0200 (Wed, 12 Apr 2006) | 2 lines Changed paths: M /trunk/UI/RenderHTML/MozillaRenderer.cpp Set scrollbars flag on. ------------------------------------------------------------------------ r202 | fabricecolin | 2006-04-12 15:52:49 +0200 (Wed, 12 Apr 2006) | 2 lines Changed paths: M /trunk/Search/Google/GoogleAPIEngine.cpp Extra checks. ------------------------------------------------------------------------ r201 | fabricecolin | 2006-04-11 16:22:50 +0200 (Tue, 11 Apr 2006) | 2 lines Changed paths: M /trunk/Search/Google/GAPIStub.h Added missing header. ------------------------------------------------------------------------ r200 | fabricecolin | 2006-04-11 16:21:52 +0200 (Tue, 11 Apr 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/importDialog.cc Disable the symlinks button when type URL is selected.
------------------------------------------------------------------------ r199 | fabricecolin | 2006-04-10 15:50:54 +0200 (Mon, 10 Apr 2006) | 2 lines Changed paths: M /trunk/Makefile.am A /trunk/UI/GTK2/index.html (from /trunk/index.html:177) D /trunk/index.html Moved index.html to UI/GTK2. ------------------------------------------------------------------------ r198 | fabricecolin | 2006-04-09 17:15:01 +0200 (Sun, 09 Apr 2006) | 2 lines Changed paths: M /trunk/configure.in Fixed MOZILLA_LIB_DIR. ------------------------------------------------------------------------ r197 | fabricecolin | 2006-04-09 16:53:06 +0200 (Sun, 09 Apr 2006) | 2 lines Changed paths: M /trunk/NEWS M /trunk/README M /trunk/TODO Updates. ------------------------------------------------------------------------ r196 | fabricecolin | 2006-04-09 16:47:53 +0200 (Sun, 09 Apr 2006) | 2 lines Changed paths: M /trunk/configure.in New option --with-gecko=mozilla|firefox should make it possible to build against Firefox. ------------------------------------------------------------------------ r195 | fabricecolin | 2006-04-07 17:25:25 +0200 (Fri, 07 Apr 2006) | 2 lines Changed paths: M /trunk/Collect/Makefile.am M /trunk/Collect/dloadtest.cpp M /trunk/Search/Makefile.am M /trunk/pinot.spec.in Renamed pinot_search to pinot-search. Don't package pinot-collect just yet. ------------------------------------------------------------------------ r194 | fabricecolin | 2006-04-05 16:09:10 +0200 (Wed, 05 Apr 2006) | 2 lines Changed paths: M /trunk/Search/Makefile.am A /trunk/Search/SOAPEnvH.h A /trunk/Search/SOAPEnvStub.h More gSOAP-generated files. ------------------------------------------------------------------------ r193 | fabricecolin | 2006-04-05 15:58:23 +0200 (Wed, 05 Apr 2006) | 2 lines Changed paths: M /trunk/Search/Google/Makefile.am ... and GAPI.nsmap !
------------------------------------------------------------------------ r192 | fabricecolin | 2006-04-05 15:50:27 +0200 (Wed, 05 Apr 2006) | 2 lines Changed paths: M /trunk/Search/Google/Makefile.am Distribute files GAPIC.cpp and GAPIClient.cpp ! ------------------------------------------------------------------------ r191 | fabricecolin | 2006-04-05 15:39:55 +0200 (Wed, 05 Apr 2006) | 2 lines Changed paths: M /trunk/Search/Google/Makefile.am Removed dep on wsdl file and unused header from noinst_HEADERS. ------------------------------------------------------------------------ r190 | fabricecolin | 2006-04-05 15:13:38 +0200 (Wed, 05 Apr 2006) | 2 lines Changed paths: M /trunk/Search/Plugins/BitTorrent.src Update. ------------------------------------------------------------------------ r189 | fabricecolin | 2006-04-05 15:08:03 +0200 (Wed, 05 Apr 2006) | 2 lines Changed paths: M /trunk/Search/Google/GAPIC.cpp Added missing types to calls to soap_out_std__string(). ------------------------------------------------------------------------ r188 | fabricecolin | 2006-04-05 15:06:48 +0200 (Wed, 05 Apr 2006) | 3 lines Changed paths: A /trunk/Search/Google/GAPI.nsmap A /trunk/Search/Google/GAPIC.cpp A /trunk/Search/Google/GAPIClient.cpp A /trunk/Search/Google/GAPIClientLib.cpp A /trunk/Search/Google/GAPIGoogleSearchBindingProxy.h A /trunk/Search/Google/GAPIH.h A /trunk/Search/Google/GAPIStub.h Checking in gSOAP-generated (v2.7.6e) client stubs. They shouldn't have to be regenerated too often and one of them needs to be fixed (see next commit :-). ------------------------------------------------------------------------ r187 | fabricecolin | 2006-04-04 17:16:44 +0200 (Tue, 04 Apr 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/Makefile.am Flags and libraries for SOAP support. 
------------------------------------------------------------------------ r186 | fabricecolin | 2006-04-04 16:04:09 +0200 (Tue, 04 Apr 2006) | 2 lines Changed paths: M /trunk/configure.in Generation of Search/Google/Makefile is conditional. ------------------------------------------------------------------------ r185 | fabricecolin | 2006-04-04 15:38:06 +0200 (Tue, 04 Apr 2006) | 3 lines Changed paths: M /trunk/Tokenize/Makefile.am A /trunk/Tokenize/TagLibTokenizer.cpp A /trunk/Tokenize/TagLibTokenizer.h M /trunk/Tokenize/TokenizerFactory.cpp M /trunk/configure.in M /trunk/pinot.spec.in New tokenizer for MP3, Vorbis and FLAC audio that extracts track information with taglib. ------------------------------------------------------------------------ r184 | fabricecolin | 2006-04-04 15:18:35 +0200 (Tue, 04 Apr 2006) | 2 lines Changed paths: M /trunk/Search/senginetest.cpp M /trunk/UI/GTK2/src/prefsDialog.cc More minor fixes... ------------------------------------------------------------------------ r183 | fabricecolin | 2006-04-04 15:17:28 +0200 (Tue, 04 Apr 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/Utils/MboxParser.cpp Minor fixes to get rid of compilation warnings. ------------------------------------------------------------------------ r182 | fabricecolin | 2006-04-04 15:14:50 +0200 (Tue, 04 Apr 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h IndexingThread can skip MIME type check. Minor fixes to get rid of compilation warnings. ------------------------------------------------------------------------ r181 | fabricecolin | 2006-04-04 14:29:15 +0200 (Tue, 04 Apr 2006) | 2 lines Changed paths: M /trunk/Search/SherlockParser.cpp If the result's extract is all spaces, strip tags and replace. 
------------------------------------------------------------------------ r180 | fabricecolin | 2006-04-04 14:27:36 +0200 (Tue, 04 Apr 2006) | 2 lines Changed paths: M /trunk/Search/SearchEngineInterface.cpp Complete relative URLs with the search engine's host name. ------------------------------------------------------------------------ r179 | fabricecolin | 2006-04-04 14:25:39 +0200 (Tue, 04 Apr 2006) | 4 lines Changed paths: M /trunk/Makefile.am M /trunk/Search/Google/GoogleAPIEngine.cpp M /trunk/Search/Google/GoogleSearch.h A /trunk/Search/Google/Makefile.am M /trunk/Search/Makefile.am A /trunk/Search/SOAPEnv.h M /trunk/Search/SearchEngineFactory.cpp M /trunk/configure.in Attempt at resurrecting support for the Google SOAP API. Option --with-soap=yes can be passed at configure time. This will use gsoap's utilities to generate stubs based on Search/Google/googleapi/GoogleSearch.wsdl. ------------------------------------------------------------------------ r178 | fabricecolin | 2006-04-04 13:23:31 +0200 (Tue, 04 Apr 2006) | 2 lines Changed paths: D /trunk/Search/ObjectsSearch ObjectsSearch API is long obsolete. ------------------------------------------------------------------------ r177 | fabricecolin | 2006-03-28 17:05:52 +0200 (Tue, 28 Mar 2006) | 2 lines Changed paths: M /trunk/Collect/DownloaderFactory.cpp M /trunk/Collect/DownloaderFactory.h Cosmetic changes. ------------------------------------------------------------------------ r175 | fabricecolin | 2006-03-25 05:41:46 +0100 (Sat, 25 Mar 2006) | 3 lines Changed paths: M /trunk/NEWS M /trunk/TODO M /trunk/configure.in M /trunk/pinot.spec.in M /trunk/po/es.po M /trunk/po/fr.po Preparing for v0.45 release. Synced po files with source, updated new features and todo lists. 
------------------------------------------------------------------------ r174 | fabricecolin | 2006-03-25 05:32:42 +0100 (Sat, 25 Mar 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc When running a live query, we can use terms as andWords as index searches are now multi-step. For search plugins, it won't make any difference. ------------------------------------------------------------------------ r173 | fabricecolin | 2006-03-24 13:10:00 +0100 (Fri, 24 Mar 2006) | 3 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Search/AbstractGenerator.cpp R-prefix terms that start with a capital letter and skip all prefixed terms when generating the abstract. ------------------------------------------------------------------------ r172 | fabricecolin | 2006-03-24 01:39:39 +0100 (Fri, 24 Mar 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp Put status icons in a separate column so that the "new result" colour applies to the whole Title column. ------------------------------------------------------------------------ r171 | fabricecolin | 2006-03-23 16:33:02 +0100 (Thu, 23 Mar 2006) | 4 lines Changed paths: M /trunk/NEWS M /trunk/TODO M /trunk/pinot.spec.in Misc updates. In the spec file, set a requirement on /usr/bin/pdftohtml as it is provided by poppler-utils on FC5 and pdftohtml on FC4. ------------------------------------------------------------------------ r170 | fabricecolin | 2006-03-22 17:05:10 +0100 (Wed, 22 Mar 2006) | 2 lines Changed paths: M /trunk/Collect/Makefile.am M /trunk/Tokenize/Makefile.am M /trunk/UI/RenderHTML/MozillaRenderer.cpp M /trunk/Utils/Makefile.am M /trunk/configure.in M /trunk/pinot.spec.in Various fixes for building on Fedora Core 5. ------------------------------------------------------------------------ r169 | fabricecolin | 2006-03-21 15:36:42 +0100 (Tue, 21 Mar 2006) | 2 lines Changed paths: M /trunk/Search/SherlockParser.cpp Parsing may throw exceptions. 
------------------------------------------------------------------------ r168 | fabricecolin | 2006-03-21 13:41:40 +0100 (Tue, 21 Mar 2006) | 2 lines Changed paths: M /trunk/pinot.spec.in Acoona.src was renamed. ------------------------------------------------------------------------ r167 | fabricecolin | 2006-03-21 12:33:09 +0100 (Tue, 21 Mar 2006) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po Corrections and sync with yesterday's changes. ------------------------------------------------------------------------ r166 | fabricecolin | 2006-03-20 15:58:23 +0100 (Mon, 20 Mar 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Call FAMClose() only if FAMOpen() returned successfully. This prevents an exit-time crash on Ubuntu 5.10. ------------------------------------------------------------------------ r165 | fabricecolin | 2006-03-20 14:36:14 +0100 (Mon, 20 Mar 2006) | 2 lines Changed paths: M /trunk/NEWS M /trunk/TODO Work done since the previous release. ------------------------------------------------------------------------ r164 | fabricecolin | 2006-03-20 14:34:49 +0100 (Mon, 20 Mar 2006) | 2 lines Changed paths: A /trunk/po/en.po (from /trunk/po/en_GB.po:135) D /trunk/po/en_GB.po M /trunk/po/es.po M /trunk/po/fr.po Updated translations (Spanish catalog not complete yet). ------------------------------------------------------------------------ r163 | fabricecolin | 2006-03-20 13:21:30 +0100 (Mon, 20 Mar 2006) | 2 lines Changed paths: M /trunk/Tokenize/HtmlTokenizer.cpp Add a space to text when stripping tags to avoid concatenating words. ------------------------------------------------------------------------ r162 | fabricecolin | 2006-03-20 13:19:31 +0100 (Mon, 20 Mar 2006) | 3 lines Changed paths: A /trunk/Search/Plugins/Accoona.src (from /trunk/Search/Plugins/Acoona.src:135) D /trunk/Search/Plugins/Acoona.src M /trunk/Search/Plugins/BitTorrent.src M /trunk/Search/Plugins/CreativeCommons.src Renamed Acoona to Accoona.
Moved BitTorrent and Creative Commons to "Content" channel :-) ------------------------------------------------------------------------ r161 | fabricecolin | 2006-03-19 14:56:37 +0100 (Sun, 19 Mar 2006) | 2 lines Changed paths: M /trunk/pinot.spec.in Synced engines list. ------------------------------------------------------------------------ r160 | fabricecolin | 2006-03-19 14:52:50 +0100 (Sun, 19 Mar 2006) | 2 lines Changed paths: M /trunk/Search/SearchEngineInterface.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/Utils/Url.cpp Some fixes for the last couple of check-ins. ------------------------------------------------------------------------ r159 | fabricecolin | 2006-03-19 06:34:05 +0100 (Sun, 19 Mar 2006) | 3 lines Changed paths: M /trunk/Search/AbstractGenerator.cpp M /trunk/Search/Google/GoogleAPIEngine.cpp M /trunk/Search/ObjectsSearch/ObjectsSearchAPIEngine.cpp M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/SearchEngineInterface.cpp M /trunk/Search/SearchEngineInterface.h M /trunk/Utils/Url.cpp M /trunk/Utils/Url.h When a result's URL points to the same host name as the engine's, try to extract the URL embedded in it if any. ------------------------------------------------------------------------ r158 | fabricecolin | 2006-03-19 06:31:45 +0100 (Sun, 19 Mar 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h Icon for new results no longer required. ------------------------------------------------------------------------ r157 | fabricecolin | 2006-03-19 06:29:49 +0100 (Sun, 19 Mar 2006) | 3 lines Changed paths: D /trunk/Search/Plugins/Altavista.src A /trunk/Search/Plugins/CreativeCommons.src A /trunk/Search/Plugins/Exalead.src D /trunk/Search/Plugins/Lycos.src Removed Altavista and Lycos since they are front-ends for Yahoo! and Ask. Added Exalead and Yahoo! Creative Commons sources. 
------------------------------------------------------------------------ r156 | fabricecolin | 2006-03-18 14:10:58 +0100 (Sat, 18 Mar 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/IndexTree.h M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/prefsDialog_glade.hh The background colour for new results and whether the live query field should try to suggest terms from the index are now configurable through preferences. Moved colour rendering code out of IndexTree and to ResultsTree. ------------------------------------------------------------------------ r155 | fabricecolin | 2006-03-18 12:33:54 +0100 (Sat, 18 Mar 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp When browsing, get the right documents count. Don't attempt to set an empty label on documents. ------------------------------------------------------------------------ r154 | fabricecolin | 2006-03-18 12:32:16 +0100 (Sat, 18 Mar 2006) | 3 lines Changed paths: M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h getDocumentsCount() can return the number of documents that have a label. setDocumentLabels() skips empty labels. ------------------------------------------------------------------------ r153 | fabricecolin | 2006-03-17 13:16:41 +0100 (Fri, 17 Mar 2006) | 3 lines Changed paths: M /trunk/SQL/QueryHistory.cpp M /trunk/SQL/QueryHistory.h M /trunk/SQL/ViewHistory.cpp M /trunk/SQL/ViewHistory.h ViewHistory now includes a Date column. The table is updated by create(). Items in both tables can be expired. 
------------------------------------------------------------------------ r152 | fabricecolin | 2006-03-17 13:03:03 +0100 (Fri, 17 Mar 2006) | 2 lines Changed paths: M /trunk/Utils/Url.cpp Less DEBUG output. ------------------------------------------------------------------------ r151 | fabricecolin | 2006-03-17 13:01:24 +0100 (Fri, 17 Mar 2006) | 2 lines Changed paths: A /trunk/Search/Plugins/Ask.src (from /trunk/Search/Plugins/AskJeeves.src:135) D /trunk/Search/Plugins/AskJeeves.src A /trunk/Search/Plugins/RollYOTopNews.src D /trunk/Search/Plugins/Teoma.src Ask.com replaces Teoma and Ask Jeeves. Added RollYO's Top News source. ------------------------------------------------------------------------ r150 | fabricecolin | 2006-03-16 15:37:54 +0100 (Thu, 16 Mar 2006) | 2 lines Changed paths: M /trunk/Collect/NeonDownloader.cpp M /trunk/Search/SherlockParser.cpp M /trunk/Search/SherlockParser.h Lock parsing of Sherlock plugins rather than downloading... ------------------------------------------------------------------------ r149 | fabricecolin | 2006-03-16 15:20:37 +0100 (Thu, 16 Mar 2006) | 2 lines Changed paths: M /trunk/Search/Plugins/Topix.src Fixed results extraction. ------------------------------------------------------------------------ r147 | fabricecolin | 2006-03-12 07:52:29 +0100 (Sun, 12 Mar 2006) | 2 lines Changed paths: M /trunk/NEWS M /trunk/po/es.po M /trunk/po/fr.po Updating translations and news. ------------------------------------------------------------------------ r146 | fabricecolin | 2006-03-09 14:48:14 +0100 (Thu, 09 Mar 2006) | 3 lines Changed paths: M /trunk/AUTHORS M /trunk/Makefile.am M /trunk/TODO M /trunk/configure.in M /trunk/pinot.spec.in Distribute NEWS. Changed email address, bumped version number to v0.44. +2 -1 items in TODO list.
------------------------------------------------------------------------ r145 | fabricecolin | 2006-03-09 14:42:43 +0100 (Thu, 09 Mar 2006) | 2 lines Changed paths: M /trunk/Search/Plugins/Lycos.src Fixed page browsing. ------------------------------------------------------------------------ r144 | fabricecolin | 2006-03-09 14:35:05 +0100 (Thu, 09 Mar 2006) | 2 lines Changed paths: D /trunk/ChangeLog M /trunk/NEWS Put news in NEWS, not ChangeLog :-) ------------------------------------------------------------------------ r143 | fabricecolin | 2006-03-08 15:09:45 +0100 (Wed, 08 Mar 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotSettings.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/prefsDialog_glade.hh M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog.hh Labels no longer have a colour. In mainWindow, don't use the ThreadsManager lock to protect lists, this may lead to deadlocks. ------------------------------------------------------------------------ r142 | fabricecolin | 2006-03-08 14:14:24 +0100 (Wed, 08 Mar 2006) | 2 lines Changed paths: M /trunk/Tokenize/Makefile.am Compile Utils objects in. ------------------------------------------------------------------------ r141 | fabricecolin | 2006-03-06 16:55:42 +0100 (Mon, 06 Mar 2006) | 3 lines Changed paths: M /trunk/README M /trunk/configure.in M /trunk/pinot.spec.in Curl is used by default in place of Neon. Updated RPM spec as well as requirements and mini FAQ in README file. 
------------------------------------------------------------------------ r140 | fabricecolin | 2006-03-06 16:48:41 +0100 (Mon, 06 Mar 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/IndexTree.h M /trunk/UI/GTK2/src/MonitorHandler.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh First shot at unifying index browsing and label querying. LabelQueryThread has been dropped amongst other things. ------------------------------------------------------------------------ r139 | fabricecolin | 2006-03-06 16:46:49 +0100 (Mon, 06 Mar 2006) | 3 lines Changed paths: M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h Method getDocumentsWithLabel() becomes listDocumentsWithLabel(), unified with listDocuments(). ------------------------------------------------------------------------ r138 | fabricecolin | 2006-03-06 16:45:28 +0100 (Mon, 06 Mar 2006) | 2 lines Changed paths: M /trunk/Tokenize/Makefile.am Try harder to build working tokenizers ! They were not linked against libUtils. ------------------------------------------------------------------------ r137 | fabricecolin | 2006-03-04 09:29:31 +0100 (Sat, 04 Mar 2006) | 3 lines Changed paths: M /trunk/Collect/DownloaderFactory.cpp M /trunk/Collect/DownloaderFactory.h M /trunk/Collect/dloadtest.cpp M /trunk/Search/WebEngine.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/Utils/Url.cpp Changed prototype of DownloaderFactory::getDownloader(). URL escaping functions no longer require curl or neon. 
------------------------------------------------------------------------ r136 | fabricecolin | 2006-03-04 06:44:16 +0100 (Sat, 04 Mar 2006) | 4 lines Changed paths: M /trunk/Collect/DownloaderFactory.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/Utils/Url.cpp M /trunk/configure.in We can now switch between curl and neon for HTTP collection and URL escaping by running configure with --with-http=neon|curl. Neon is used by default. Removed long obsolete ListenerThread. Cosmetic mods to worker threads. ------------------------------------------------------------------------ r135 | fabricecolin | 2006-03-04 04:33:34 +0100 (Sat, 04 Mar 2006) | 3 lines Changed paths: A /trunk/Collect/CurlDownloader.cpp A /trunk/Collect/CurlDownloader.h M /trunk/Collect/DownloaderInterface.cpp M /trunk/Collect/DownloaderInterface.h M /trunk/Collect/Makefile.am M /trunk/Collect/NeonDownloader.cpp M /trunk/Collect/NeonDownloader.h M /trunk/Collect/dloadtest.cpp M /trunk/UI/GTK2/src/pinot.cc Moved OpenSSL initialization to DownloaderInterface. New CurlDownloader class is an alternative to NeonDownloader. Program dloadtest compiled as pinot_collect. ------------------------------------------------------------------------ r134 | fabricecolin | 2006-03-03 15:30:49 +0100 (Fri, 03 Mar 2006) | 3 lines Changed paths: M /trunk/Index/Makefile.am D /trunk/Index/Summarizer.cpp D /trunk/Index/Summarizer.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/Makefile.am M /trunk/README M /trunk/Search/AbstractGenerator.cpp M /trunk/Search/AbstractGenerator.h M /trunk/Search/XapianEngine.cpp M /trunk/configure.in M /trunk/pinot.spec.in Removed dependency on OTS. Summaries are now built at query time by AbstractGenerator. 
------------------------------------------------------------------------ r133 | fabricecolin | 2006-02-27 16:30:23 +0100 (Mon, 27 Feb 2006) | 6 lines Changed paths: A /trunk/Search/AbstractGenerator.cpp A /trunk/Search/AbstractGenerator.h M /trunk/Search/Makefile.am AbstractGenerator attempts to produce a summary of a document based on the positions of a query's terms. Abstract "windows" are anchored on both side of the query's top term positions and weighted by the presence of terms. The window that has the most terms is chosen. ------------------------------------------------------------------------ r131 | fabricecolin | 2006-02-25 06:55:38 +0100 (Sat, 25 Feb 2006) | 2 lines Changed paths: M /trunk/ChangeLog Releasing v0.43 today. ------------------------------------------------------------------------ r130 | fabricecolin | 2006-02-25 03:12:55 +0100 (Sat, 25 Feb 2006) | 2 lines Changed paths: M /trunk/Search/Plugins/Acoona.src Fixed results parsing. ------------------------------------------------------------------------ r129 | fabricecolin | 2006-02-24 16:06:46 +0100 (Fri, 24 Feb 2006) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po Caught up with recent changes. ------------------------------------------------------------------------ r128 | fabricecolin | 2006-02-24 15:51:06 +0100 (Fri, 24 Feb 2006) | 2 lines Changed paths: M /trunk/Collect/NeonDownloader.cpp M /trunk/Search/PluginWebEngine.h Cosmetic changes. ------------------------------------------------------------------------ r127 | fabricecolin | 2006-02-24 15:44:19 +0100 (Fri, 24 Feb 2006) | 2 lines Changed paths: M /trunk/Tokenize/WordTokenizer.cpp Fixed buffer overrun. ------------------------------------------------------------------------ r126 | fabricecolin | 2006-02-23 14:20:22 +0100 (Thu, 23 Feb 2006) | 4 lines Changed paths: M /trunk/ChangeLog M /trunk/TODO M /trunk/pinot.spec.in Updated ChangeLog and TODO based on changes made since last release. 
In Xapian v0.9.4, libxapian.so is version 10. Since I will use this version to build RPMs, I have set the dependencies accordingly. ------------------------------------------------------------------------ r125 | fabricecolin | 2006-02-22 15:32:06 +0100 (Wed, 22 Feb 2006) | 2 lines Changed paths: M /trunk/Collect/NeonDownloader.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h Lock Neon session and request creation instead. ------------------------------------------------------------------------ r124 | fabricecolin | 2006-02-21 15:12:50 +0100 (Tue, 21 Feb 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h Protected downloading with a mutex until I figure out why multiple searches cause a crash. ------------------------------------------------------------------------ r123 | fabricecolin | 2006-02-21 12:55:12 +0100 (Tue, 21 Feb 2006) | 2 lines Changed paths: M /trunk/Search/Plugins/Topix.src Topix changed. ------------------------------------------------------------------------ r122 | fabricecolin | 2006-02-20 15:59:03 +0100 (Mon, 20 Feb 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/Makefile.am Fixed order of libraries. ------------------------------------------------------------------------ r121 | fabricecolin | 2006-02-20 15:38:28 +0100 (Mon, 20 Feb 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh Pressing return in the live query field runs the search. ------------------------------------------------------------------------ r120 | fabricecolin | 2006-02-18 10:47:34 +0100 (Sat, 18 Feb 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/po/es.po M /trunk/po/fr.po Minor changes to catalogs, some corrections to fr.po. 
------------------------------------------------------------------------ r119 | fabricecolin | 2006-02-18 08:19:26 +0100 (Sat, 18 Feb 2006) | 4 lines Changed paths: M /trunk/Tokenize/Makefile.am M /trunk/configure.in M /trunk/pinot.spec.in Build fix for tokenizers. Neon requires OpenSSL, so check for it explicitly (useful on Slackware, reported by Bernhard "I wanna play headball !" Fruhmesser). Reverted to Xapian v0.9.2 in spec file, as it works well enough. ------------------------------------------------------------------------ r118 | fabricecolin | 2006-02-17 15:56:25 +0100 (Fri, 17 Feb 2006) | 2 lines Changed paths: M /trunk/po/es.po M /trunk/po/fr.po Synced catalogs with current source. ------------------------------------------------------------------------ r117 | fabricecolin | 2006-02-17 13:29:29 +0100 (Fri, 17 Feb 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/pinot.cc If an index couldn't be opened, or the history database not created, show an error message in the main window's status bar. ------------------------------------------------------------------------ r116 | fabricecolin | 2006-02-16 16:08:03 +0100 (Thu, 16 Feb 2006) | 2 lines Changed paths: M /trunk/Makefile.am M /trunk/Tokenize/Makefile.am Sorted out tokenizer libraries. ------------------------------------------------------------------------ r115 | fabricecolin | 2006-02-16 14:54:50 +0100 (Thu, 16 Feb 2006) | 3 lines Changed paths: M /trunk/Index/LanguageDetector.cpp M /trunk/Index/XapianIndex.cpp M /trunk/Makefile.am M /trunk/Search/senginetest.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot.cc M /trunk/configure.in M /trunk/pinot.spec.in A /trunk/po/es.po (from /trunk/po/es_ES.po:110) D /trunk/po/es_ES.po A /trunk/po/fr.po (from /trunk/po/fr_FR.po:110) D /trunk/po/fr_FR.po Renamed and updated language catalogs. Prefix is now set in config.h. 
Worker threads are a bit more robust. ------------------------------------------------------------------------ r114 | fabricecolin | 2006-02-15 17:49:34 +0100 (Wed, 15 Feb 2006) | 2 lines Changed paths: M /trunk/Index/Makefile.am M /trunk/Makefile.am M /trunk/Search/Makefile.am D /trunk/UI/GTK2/config.h M /trunk/Utils/Makefile.am M /trunk/configure.in M /trunk/pinot.spec.in Fixes to get 'make dist' and 'make install' working. ------------------------------------------------------------------------ r113 | fabricecolin | 2006-02-15 14:56:42 +0100 (Wed, 15 Feb 2006) | 2 lines Changed paths: A /trunk/mkinstalldirs This is not generated automagically it seems. ------------------------------------------------------------------------ r112 | fabricecolin | 2006-02-15 14:55:23 +0100 (Wed, 15 Feb 2006) | 2 lines Changed paths: M /trunk/Makefile.am M /trunk/po/Makefile.in.in Tweaked DIST files. ------------------------------------------------------------------------ r111 | fabricecolin | 2006-02-15 14:07:28 +0100 (Wed, 15 Feb 2006) | 2 lines Changed paths: A /trunk/ABOUT-NLS This file is required apparently. ------------------------------------------------------------------------ r110 | fabricecolin | 2006-02-15 13:55:04 +0100 (Wed, 15 Feb 2006) | 3 lines Changed paths: M /trunk/configure.in D /trunk/po/Makefile.in A /trunk/po/Makefile.in.in M /trunk/po/es_ES.po M /trunk/po/fr_FR.po Dropped po/Makefile.in, replaced by more useful po/Makefile.in.in, imported from alleyoop 0.9.0. Thanks ! 
:-) ------------------------------------------------------------------------ r109 | fabricecolin | 2006-02-14 16:02:24 +0100 (Tue, 14 Feb 2006) | 3 lines Changed paths: M /trunk/Makefile.am M /trunk/Tokenize/Makefile.am M /trunk/configure.in Build tokenizer libraries and (attempt to) install everything in the right places :-) ------------------------------------------------------------------------ r108 | fabricecolin | 2006-02-12 15:55:22 +0100 (Sun, 12 Feb 2006) | 2 lines Changed paths: M /trunk/Makefile.am M /trunk/Search/Makefile.am M /trunk/UI/GTK2/src/Makefile.am M /trunk/Utils/Makefile.am M /trunk/configure.in M /trunk/pinot.spec.in M /trunk/po/Makefile.in Build and distribution fixes. ------------------------------------------------------------------------ r107 | fabricecolin | 2006-02-12 14:15:56 +0100 (Sun, 12 Feb 2006) | 2 lines Changed paths: D /trunk/Collect/Makefile D /trunk/Index/Makefile D /trunk/Makefile D /trunk/SQL/Makefile D /trunk/Search/Google/Makefile D /trunk/Search/Makefile D /trunk/Search/ObjectsSearch/Makefile D /trunk/Tokenize/Makefile D /trunk/UI/GTK2/src/Makefile D /trunk/UI/RenderHTML/Makefile D /trunk/Utils/Makefile D /trunk/pinot.spec D /trunk/po/POTFILES D /trunk/variables.mk Deleted files that are now automatically generated. ------------------------------------------------------------------------ r106 | fabricecolin | 2006-02-12 13:44:57 +0100 (Sun, 12 Feb 2006) | 2 lines Changed paths: M /trunk/Utils/XapianDatabase.h M /trunk/configure.in Build fixes. ------------------------------------------------------------------------ r105 | fabricecolin | 2006-02-12 13:34:41 +0100 (Sun, 12 Feb 2006) | 2 lines Changed paths: A /trunk/AUTHORS A /trunk/NEWS M /trunk/configure.in More autotools stuff. 
------------------------------------------------------------------------ r104 | fabricecolin | 2006-02-12 13:22:56 +0100 (Sun, 12 Feb 2006) | 2 lines Changed paths: A /trunk/autogen.sh M /trunk/configure.in A /trunk/pinot.spec.in A /trunk/po/Makefile.in A /trunk/po/Makevars M /trunk/po/POTFILES A /trunk/po/POTFILES.in More autotools stuff. ------------------------------------------------------------------------ r103 | fabricecolin | 2006-02-12 12:34:36 +0100 (Sun, 12 Feb 2006) | 2 lines Changed paths: A /trunk/Collect/Makefile.am A /trunk/Index/Makefile.am A /trunk/Makefile.am A /trunk/SQL/Makefile.am A /trunk/Search/Makefile.am A /trunk/Tokenize/Makefile.am A /trunk/UI/GTK2/src/Makefile.am A /trunk/UI/RenderHTML/Makefile.am A /trunk/Utils/Makefile.am A /trunk/configure.in Initial support for autotools. ------------------------------------------------------------------------ r102 | fabricecolin | 2006-02-12 08:13:11 +0100 (Sun, 12 Feb 2006) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/Makefile M /trunk/README M /trunk/TODO M /trunk/index.html Lots of minor updates. ------------------------------------------------------------------------ r101 | fabricecolin | 2006-02-12 08:05:53 +0100 (Sun, 12 Feb 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/Makefile M /trunk/UI/RenderHTML/Makefile M /trunk/variables.mk Don't link all executables to the Mozilla libs ! ------------------------------------------------------------------------ r100 | fabricecolin | 2006-02-07 13:21:43 +0100 (Tue, 07 Feb 2006) | 3 lines Changed paths: M /trunk/Makefile M /trunk/Search/SearchEngineFactory.cpp M /trunk/Search/SearchEngineFactory.h M /trunk/Search/senginetest.cpp M /trunk/pinot.spec Distribute senginetest as pinot_search as a shell based metasearch would be useful. SearchEngineFactory can list supported engine types. 
------------------------------------------------------------------------ r99 | fabricecolin | 2006-02-07 13:18:54 +0100 (Tue, 07 Feb 2006) | 2 lines Changed paths: M /trunk/Tokenize/HtmlTokenizer.cpp Don't lower case tags, this messes up links ! ------------------------------------------------------------------------ r98 | fabricecolin | 2006-02-07 13:18:03 +0100 (Tue, 07 Feb 2006) | 5 lines Changed paths: M /trunk/Index/Summarizer.cpp M /trunk/Index/XapianIndex.cpp M /trunk/Search/QueryProperties.cpp M /trunk/Search/XapianEngine.cpp M /trunk/Utils/Languages.cpp M /trunk/Utils/Languages.h Changed term prefixes to conform to Omega's termprefixes.txt document. This will allow indexes to be queried by other Xapian-based tools that follow those conventions and Pinot to use the QueryParser class at some point. Unfortunately, users will have to update their documents and reapply labels ! ------------------------------------------------------------------------ r97 | fabricecolin | 2006-02-06 15:23:27 +0100 (Mon, 06 Feb 2006) | 4 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/pinot.spec Dropped support for libxml++ v0.26 as v2.12 is required now (or so it seems). Add H-prefixed terms for all of the host name's subdomains so that searching for "Hberlios.de" returns pages from "Hpinot.berlios.de"... ------------------------------------------------------------------------ r96 | fabricecolin | 2006-02-06 15:16:55 +0100 (Mon, 06 Feb 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot.cc In index_document(), don't complain about already indexed files if we are going to set a label on them. Let the main program set the default icon. Use the live query's field as OR terms for new stored queries. 
------------------------------------------------------------------------ r95 | fabricecolin | 2006-02-02 14:18:24 +0100 (Thu, 02 Feb 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/mainWindow.cc UTF-8 fixes on stored queries dates and results extracts. ------------------------------------------------------------------------ r94 | fabricecolin | 2006-02-01 15:00:49 +0100 (Wed, 01 Feb 2006) | 2 lines Changed paths: D /trunk/libxmlpp026.patch Obsolete. ------------------------------------------------------------------------ r93 | fabricecolin | 2006-02-01 14:46:26 +0100 (Wed, 01 Feb 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/EnginesTree.cpp Select the current user's engines group by default. ------------------------------------------------------------------------ r92 | fabricecolin | 2006-02-01 14:35:20 +0100 (Wed, 01 Feb 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc Added missing call to bind_textdomain_codeset(). ------------------------------------------------------------------------ r91 | fabricecolin | 2006-02-01 14:33:38 +0100 (Wed, 01 Feb 2006) | 2 lines Changed paths: M /trunk/Search/SherlockParser.cpp M /trunk/Search/plugintest.cpp Sherlock parser grammar copes with foreign tags. ------------------------------------------------------------------------ r89 | fabricecolin | 2006-01-31 04:31:36 +0100 (Tue, 31 Jan 2006) | 2 lines Changed paths: M /trunk/Makefile Missing directory. ------------------------------------------------------------------------ r88 | fabricecolin | 2006-01-31 04:20:50 +0100 (Tue, 31 Jan 2006) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/Makefile M /trunk/README M /trunk/TODO M /trunk/UI/GTK2/config.h M /trunk/pinot.spec M /trunk/po/en_GB.po M /trunk/po/es_ES.po M /trunk/po/fr_FR.po Preparing for v0.42 release. 
------------------------------------------------------------------------ r87 | fabricecolin | 2006-01-31 04:19:10 +0100 (Tue, 31 Jan 2006) | 2 lines Changed paths: M /trunk/Tokenize/TokenizerFactory.cpp Use dlerror(). ------------------------------------------------------------------------ r86 | fabricecolin | 2006-01-31 03:59:27 +0100 (Tue, 31 Jan 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/pinot.cc M /trunk/po/es_ES.po Complete Spanish translation. Main program catches exceptions, eg thrown by set_icon_from_file(). ------------------------------------------------------------------------ r85 | fabricecolin | 2006-01-30 16:15:06 +0100 (Mon, 30 Jan 2006) | 3 lines Changed paths: M /trunk/po/en_GB.po A /trunk/po/es_ES.po M /trunk/po/fr_FR.po Synced PO files with current source. Added (almost final) Spanish translation by Jesús Tramullas (jesus at tramullas dot com). ------------------------------------------------------------------------ r84 | fabricecolin | 2006-01-30 07:23:44 +0100 (Mon, 30 Jan 2006) | 2 lines Changed paths: M /trunk/Search/OpenSearchParser.cpp M /trunk/Search/Plugins/KodersDescription.xml M /trunk/Search/Plugins/MozDexDescription.xml M /trunk/Search/Plugins/OmegaDescription.xml Use Tags as the channel name. ------------------------------------------------------------------------ r83 | fabricecolin | 2006-01-30 05:04:21 +0100 (Mon, 30 Jan 2006) | 3 lines Changed paths: D /trunk/Search/Plugins/Omega.src A /trunk/Search/Plugins/OmegaDescription.xml Replaced Omega.src with an OpenSearch Description file as Omega supports OpenSearch Response. ------------------------------------------------------------------------ r82 | fabricecolin | 2006-01-30 04:49:01 +0100 (Mon, 30 Jan 2006) | 2 lines Changed paths: M /trunk/Search/Plugins/Topix.src Fixed Topix search plugin. 
------------------------------------------------------------------------ r81 | fabricecolin | 2006-01-28 10:07:33 +0100 (Sat, 28 Jan 2006) | 4 lines Changed paths: M /trunk/UI/GTK2/src/Notebook.cpp M /trunk/UI/GTK2/src/Notebook.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc Results pages have a VPaned so that the extract field can be resized. Fixed issue with Listener and MonitorThread's control pipe that prevented the latter from quitting right away when there's nothing to monitor. ------------------------------------------------------------------------ r80 | fabricecolin | 2006-01-26 16:19:59 +0100 (Thu, 26 Jan 2006) | 2 lines Changed paths: D /trunk/Search/Plugins/Koders.src A /trunk/Search/Plugins/KodersDescription.xml A /trunk/Search/Plugins/MozDexDescription.xml Replaced Koders Sherlock source with its OpenSearch equivalent, added MozDex. ------------------------------------------------------------------------ r79 | fabricecolin | 2006-01-26 16:12:05 +0100 (Thu, 26 Jan 2006) | 4 lines Changed paths: M /trunk/Search/OpenSearchParser.cpp M /trunk/Search/OpenSearchParser.h M /trunk/Search/PluginParsers.h M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/SearchPluginProperties.cpp M /trunk/Search/SearchPluginProperties.h M /trunk/Search/SherlockParser.cpp M /trunk/Search/SherlockParser.h M /trunk/Search/XapianEngine.cpp M /trunk/variables.mk More than one results page can be requested with OpenSearch. XapianEngine no longer loops ad vitam eternam if the index couldn't be locked ! Minor changes to variables.mk. ------------------------------------------------------------------------ r78 | fabricecolin | 2006-01-25 15:39:03 +0100 (Wed, 25 Jan 2006) | 2 lines Changed paths: M /trunk/Search/OpenSearchParser.cpp Copes better with CDATA nodes. 
------------------------------------------------------------------------ r77 | fabricecolin | 2006-01-25 15:28:12 +0100 (Wed, 25 Jan 2006) | 2 lines Changed paths: M /trunk/Search/OpenSearchParser.cpp M /trunk/Search/SherlockParser.cpp Slightly better Response parsing. ------------------------------------------------------------------------ r76 | fabricecolin | 2006-01-25 14:51:16 +0100 (Wed, 25 Jan 2006) | 2 lines Changed paths: M /trunk/Search/OpenSearchParser.cpp M /trunk/Search/OpenSearchParser.h M /trunk/Search/SearchEngineFactory.cpp M /trunk/Search/SearchPluginProperties.h M /trunk/Search/SherlockParser.cpp M /trunk/Search/plugintest.cpp Initial OpenSearch Response support. ------------------------------------------------------------------------ r75 | fabricecolin | 2006-01-25 12:46:04 +0100 (Wed, 25 Jan 2006) | 4 lines Changed paths: A /trunk/Search/OpenSearchParser.cpp A /trunk/Search/OpenSearchParser.h A /trunk/Search/PluginParsers.h M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/PluginWebEngine.h M /trunk/Search/SearchEngineInterface.h M /trunk/Search/SearchPluginProperties.cpp M /trunk/Search/SearchPluginProperties.h M /trunk/Search/SherlockParser.cpp M /trunk/Search/SherlockParser.h M /trunk/Search/plugintest.cpp PluginWebEngine can now handle Sherlock and OpenSearch plugins (.src and .xml) and their respective response. The OpenSearch Response parser doesn't do anything useful just yet. ------------------------------------------------------------------------ r74 | fabricecolin | 2006-01-22 14:37:53 +0100 (Sun, 22 Jan 2006) | 2 lines Changed paths: M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/PluginWebEngine.h M /trunk/Search/SearchPluginProperties.cpp M /trunk/Search/SearchPluginProperties.h M /trunk/Search/SherlockParser.cpp M /trunk/Search/SherlockParser.h M /trunk/Search/plugintest.cpp First shot at unifying Sherlock and OpenSearch plugins. 
------------------------------------------------------------------------ r73 | fabricecolin | 2006-01-22 10:16:00 +0100 (Sun, 22 Jan 2006) | 3 lines Changed paths: M /trunk/Search/Makefile M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/PluginWebEngine.h A /trunk/Search/SearchPluginProperties.cpp A /trunk/Search/SearchPluginProperties.h A /trunk/Search/SherlockParser.cpp (from /trunk/Utils/PluginParser.cpp:69) A /trunk/Search/SherlockParser.h (from /trunk/Utils/PluginParser.h:69) A /trunk/Search/plugintest.cpp (from /trunk/Utils/plugintest.cpp:69) M /trunk/Utils/Makefile D /trunk/Utils/PluginParser.cpp D /trunk/Utils/PluginParser.h D /trunk/Utils/plugintest.cpp Renamed PluginParser to SherlockParser, moved to Search with plugintest program. Added rudimentary OpenSearch Description and Query Syntax parser. ------------------------------------------------------------------------ r72 | fabricecolin | 2006-01-22 09:25:19 +0100 (Sun, 22 Jan 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/Makefile M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/ResultsTree.cpp Scrolled windows in Index and ResultsTree had their policy set to ALWAYS. PinotSettings checks what the root node is and catches exceptions ! Doh ! :-) ------------------------------------------------------------------------ r71 | fabricecolin | 2006-01-21 09:27:21 +0100 (Sat, 21 Jan 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/importDialog_glade.cc M /trunk/UI/GTK2/src/importDialog_glade.hh M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh A label can be applied when importing documents. Modified IndexingThread so that the label is also set when updating a document. 
------------------------------------------------------------------------ r69 | fabricecolin | 2006-01-20 13:07:01 +0100 (Fri, 20 Jan 2006) | 2 lines Changed paths: M /trunk/ChangeLog Updating log of changes since v0.35. ------------------------------------------------------------------------ r68 | fabricecolin | 2006-01-20 12:47:57 +0100 (Fri, 20 Jan 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/indexDialog.cc M /trunk/po/en_GB.po M /trunk/po/fr_FR.po More fixes to po strings and one to indexDialog::checkFields(). ------------------------------------------------------------------------ r67 | fabricecolin | 2006-01-19 16:44:56 +0100 (Thu, 19 Jan 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/indexDialog.cc Fix for previous check-in. ------------------------------------------------------------------------ r66 | fabricecolin | 2006-01-19 14:32:09 +0100 (Thu, 19 Jan 2006) | 4 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/indexDialog.cc M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/propertiesDialog_glade.cc M /trunk/Utils/StringManip.cpp M /trunk/Utils/StringManip.h M /trunk/po/en_GB.po M /trunk/po/fr_FR.po At indexing time, we may have to hash the URL to guarantee it's unique as it's limited in length just like other terms. Changed a couple of labels in the UI, synced the po files with the current source. ------------------------------------------------------------------------ r65 | fabricecolin | 2006-01-19 13:51:04 +0100 (Thu, 19 Jan 2006) | 2 lines Changed paths: M /trunk/README M /trunk/TODO M /trunk/UI/GTK2/config.h M /trunk/index.html M /trunk/pinot.spec Refreshed docs, bumped version number to 0.40 in preparation for release. 
------------------------------------------------------------------------ r64 | fabricecolin | 2006-01-19 01:31:09 +0100 (Thu, 19 Jan 2006) | 3 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/Utils/MIMEScanner.cpp M /trunk/Utils/Url.cpp Indexing a document may change its location property. Make sure the view page is shown when viewing a freshly downloaded document. Some tweaks for URLs. ------------------------------------------------------------------------ r63 | fabricecolin | 2006-01-16 15:46:48 +0100 (Mon, 16 Jan 2006) | 2 lines Changed paths: M /trunk/Tokenize/PdfTokenizer.h M /trunk/Tokenize/RtfTokenizer.h M /trunk/Tokenize/UnknownTypeTokenizer.h M /trunk/Tokenize/WordTokenizer.h Cosmetic changes. ------------------------------------------------------------------------ r62 | fabricecolin | 2006-01-16 15:46:06 +0100 (Mon, 16 Jan 2006) | 2 lines Changed paths: M /trunk/Search/Plugins/AskJeeves.src Caught up with AskJeeves' output. ------------------------------------------------------------------------ r61 | fabricecolin | 2006-01-15 05:16:46 +0100 (Sun, 15 Jan 2006) | 3 lines Changed paths: M /trunk/Tokenize/Makefile M /trunk/Tokenize/PdfTokenizer.cpp M /trunk/Tokenize/PdfTokenizer.h A /trunk/Tokenize/RtfTokenizer.cpp A /trunk/Tokenize/RtfTokenizer.h M /trunk/Tokenize/Tokenizer.cpp M /trunk/Tokenize/Tokenizer.h M /trunk/Tokenize/TokenizerFactory.cpp M /trunk/Tokenize/TokenizerFactory.h M /trunk/Tokenize/UnknownTypeTokenizer.cpp M /trunk/Tokenize/WordTokenizer.cpp M /trunk/Tokenize/WordTokenizer.h M /trunk/pinot.spec New RTF tokenizer based on unrtf. Streamlined running helper programs. New package pinot-text-docs includes all tokenizers and replaces -pdf and -word. 
------------------------------------------------------------------------ r60 | fabricecolin | 2006-01-14 09:59:50 +0100 (Sat, 14 Jan 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/PinotUtils.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/indexDialog.cc M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/queryDialog.cc Added from_utf8(), that catches conversion errors. Sorted out some niggling issues with IndexPage and Tree. ------------------------------------------------------------------------ r59 | fabricecolin | 2006-01-14 07:45:15 +0100 (Sat, 14 Jan 2006) | 4 lines Changed paths: M /trunk/Collect/NeonDownloader.cpp M /trunk/Collect/NeonDownloader.h M /trunk/Collect/dloadtest.cpp M /trunk/Index/Summarizer.cpp M /trunk/Index/Summarizer.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/IndexTree.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc Minor fix and change to NeonDownloader initialization. Capped length of summary and added sanity checks on extract and title, which means that they may be modified before being saved when updating a document's properties. 
------------------------------------------------------------------------ r58 | fabricecolin | 2006-01-13 17:36:15 +0100 (Fri, 13 Jan 2006) | 5 lines Changed paths: M /trunk/Collect/DownloaderInterface.cpp M /trunk/Collect/NeonDownloader.cpp M /trunk/Index/LanguageDetector.cpp M /trunk/Index/LanguageDetector.h M /trunk/Index/Summarizer.cpp M /trunk/Index/XapianIndex.cpp M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/MonitorHandler.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog_glade.cc M /trunk/UI/GTK2/src/importDialog_glade.hh M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/Utils/HtmlDocument.cpp M /trunk/Utils/Url.cpp M /trunk/Utils/Url.h Mutexes for safe multi-threading with OpenSSL are type ERRORCHECK. Limit the amount of text parsed by language guessing and summarization. Use canonical URLs to "key" documents. The importer can follow symlinks. Several other tweaks and fixes. ------------------------------------------------------------------------ r57 | fabricecolin | 2006-01-11 14:26:19 +0100 (Wed, 11 Jan 2006) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Index/indextest.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/Utils/MIMEScanner.cpp Fixed XapianIndex::hasDocument() and the importing of URLs. ------------------------------------------------------------------------ r56 | fabricecolin | 2006-01-10 17:01:00 +0100 (Tue, 10 Jan 2006) | 2 lines Changed paths: M /trunk/po/en_GB.po M /trunk/po/fr_FR.po Synced po files with latest source. 
------------------------------------------------------------------------ r55 | fabricecolin | 2006-01-10 16:58:56 +0100 (Tue, 10 Jan 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/mainWindow.cc Clean threads termination at last ! Some other minor changes. ------------------------------------------------------------------------ r54 | fabricecolin | 2006-01-10 14:04:16 +0100 (Tue, 10 Jan 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/importDialog_glade.cc M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/queryDialog_glade.cc Fixes for the previous check-in. ------------------------------------------------------------------------ r53 | fabricecolin | 2006-01-10 10:46:28 +0100 (Tue, 10 Jan 2006) | 3 lines Changed paths: M /trunk/Index/LanguageDetector.cpp M /trunk/Index/Summarizer.cpp M /trunk/Tokenize/TokenizerFactory.cpp M /trunk/Tokenize/TokenizerFactory.h M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/ModelColumns.cpp M /trunk/UI/GTK2/src/ModelColumns.h M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/PinotUtils.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/importDialog_glade.cc M /trunk/UI/GTK2/src/importDialog_glade.hh M /trunk/UI/GTK2/src/indexDialog.cc M /trunk/UI/GTK2/src/indexDialog.hh M /trunk/UI/GTK2/src/indexDialog_glade.cc M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/propertiesDialog.cc M /trunk/UI/GTK2/src/propertiesDialog.hh M 
/trunk/UI/GTK2/src/propertiesDialog_glade.cc M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog.hh M /trunk/UI/GTK2/src/queryDialog_glade.cc Tweaked user interface. Redone importDialog; documents are now imported directly and not handled by the main window. ------------------------------------------------------------------------ r52 | fabricecolin | 2006-01-03 20:19:59 +0100 (Tue, 03 Jan 2006) | 2 lines Changed paths: M /trunk/po/en_GB.po M /trunk/po/fr_FR.po Synced po files with source. ------------------------------------------------------------------------ r51 | fabricecolin | 2006-01-03 20:18:38 +0100 (Tue, 03 Jan 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/importDialog_glade.cc M /trunk/UI/GTK2/src/mainWindow.cc Minor tweaks. ------------------------------------------------------------------------ r50 | fabricecolin | 2006-01-03 17:27:03 +0100 (Tue, 03 Jan 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/mainWindow.cc Follow-up to previous check-in. ------------------------------------------------------------------------ r49 | fabricecolin | 2006-01-03 14:12:11 +0100 (Tue, 03 Jan 2006) | 4 lines Changed paths: D /trunk/SQL/ActionHistory.cpp D /trunk/SQL/ActionHistory.h M /trunk/SQL/Makefile M /trunk/SQL/historytest.cpp M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/IndexPage.h M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/PinotUtils.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/pinot.cc M /trunk/Utils/DocumentInfo.cpp M /trunk/Utils/IndexedDocument.cpp Dropped ActionHistory. Fixed issue with threads' end signaling. Don't append to the index tree unless the last documents page is being shown and is not empty. 
------------------------------------------------------------------------ r48 | fabricecolin | 2006-01-02 20:35:21 +0100 (Mon, 02 Jan 2006) | 2 lines Changed paths: M /trunk/TODO -1 +1 item. ------------------------------------------------------------------------ r47 | fabricecolin | 2006-01-02 20:34:26 +0100 (Mon, 02 Jan 2006) | 3 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/IndexTree.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/importDialog.hh M /trunk/UI/GTK2/src/importDialog_glade.cc M /trunk/UI/GTK2/src/importDialog_glade.hh M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc Update all properties after a document update. Don't attempt completion when characters are deleted off the live query field. Started reworking importDialog. ------------------------------------------------------------------------ r46 | fabricecolin | 2006-01-02 20:30:57 +0100 (Mon, 02 Jan 2006) | 3 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h Limit the length of terms. Make sure the database is unlocked even when an exception is caught. ------------------------------------------------------------------------ r45 | fabricecolin | 2006-01-01 22:45:56 +0100 (Sun, 01 Jan 2006) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/Makefile D /trunk/UI/GTK2/src/aboutDialog.cc D /trunk/UI/GTK2/src/aboutDialog.hh D /trunk/UI/GTK2/src/aboutDialog_glade.cc D /trunk/UI/GTK2/src/aboutDialog_glade.hh M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/pinot.spec M /trunk/po/POTFILES Dropped aboutDialog. 
------------------------------------------------------------------------ r44 | fabricecolin | 2006-01-01 19:13:07 +0100 (Sun, 01 Jan 2006) | 3 lines Changed paths: M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/Search/XapianEngine.cpp M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh Enabled completion on the query field, based on terms present in the documents index. ------------------------------------------------------------------------ r43 | fabricecolin | 2005-12-31 17:08:45 +0100 (Sat, 31 Dec 2005) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc Simplified IndexThread. Update the details in the tree after a document update. ------------------------------------------------------------------------ r42 | fabricecolin | 2005-12-31 14:11:29 +0100 (Sat, 31 Dec 2005) | 4 lines Changed paths: M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/PinotUtils.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/importDialog.cc M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/queryDialog.cc Somewhat better charset conversion, especially for results, for which the charset obtained by the engine or the document is taken into account. Stuff that's already in UTF-8 is kept that way. ------------------------------------------------------------------------ r41 | fabricecolin | 2005-12-31 12:26:01 +0100 (Sat, 31 Dec 2005) | 2 lines Changed paths: M /trunk/Search/SearchEngineInterface.cpp M /trunk/Search/SearchEngineInterface.h M /trunk/Search/WebEngine.cpp Added getResultsCharset() to help with charset conversions. 
------------------------------------------------------------------------ r40 | fabricecolin | 2005-12-31 12:25:03 +0100 (Sat, 31 Dec 2005) | 2 lines Changed paths: M /trunk/Collect/FileCollector.cpp M /trunk/Collect/NeonDownloader.cpp Return an HtmlDocument object if the type is HTML. ------------------------------------------------------------------------ r39 | fabricecolin | 2005-12-31 12:23:59 +0100 (Sat, 31 Dec 2005) | 2 lines Changed paths: M /trunk/Utils/HtmlDocument.cpp M /trunk/Utils/HtmlDocument.h Attempt to extract title and content type from HTML head. ------------------------------------------------------------------------ r38 | fabricecolin | 2005-12-31 12:23:01 +0100 (Sat, 31 Dec 2005) | 3 lines Changed paths: M /trunk/Tokenize/HtmlTokenizer.cpp M /trunk/Utils/StringManip.cpp M /trunk/Utils/StringManip.h Moved function removeLinkQuotes() to StringManip. Minor fix to META tags extraction. ------------------------------------------------------------------------ r37 | fabricecolin | 2005-12-30 16:16:45 +0100 (Fri, 30 Dec 2005) | 2 lines Changed paths: M /trunk/UI/GTK2/src/importDialog_glade.cc M /trunk/UI/GTK2/src/indexDialog_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/prefsDialog_glade.cc M /trunk/UI/GTK2/src/propertiesDialog_glade.cc M /trunk/UI/GTK2/src/queryDialog_glade.cc Removed unhelpful _("") from glademm-generated source. ------------------------------------------------------------------------ r36 | fabricecolin | 2005-12-30 15:25:21 +0100 (Fri, 30 Dec 2005) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Search/XapianEngine.cpp M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/pinot.cc M /trunk/Utils/Languages.cpp M /trunk/Utils/Languages.h Save language names in English, load in current locale. ------------------------------------------------------------------------ r35 | fabricecolin | 2005-12-30 14:47:31 +0100 (Fri, 30 Dec 2005) | 2 lines Changed paths: M /trunk/TODO -4 +2 items. 
------------------------------------------------------------------------ r34 | fabricecolin | 2005-12-30 14:46:55 +0100 (Fri, 30 Dec 2005) | 3 lines Changed paths: M /trunk/Index/Summarizer.cpp M /trunk/Index/XapianIndex.cpp M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h M /trunk/po/en_GB.po M /trunk/po/fr_FR.po M /trunk/textcat_conf.txt Fixed issues with documents and queries language (whether it should be in the current locale or in English). ------------------------------------------------------------------------ r33 | fabricecolin | 2005-12-30 13:12:53 +0100 (Fri, 30 Dec 2005) | 2 lines Changed paths: M /trunk/pinot.spec Added StartupNotify to .desktop file. ------------------------------------------------------------------------ r32 | fabricecolin | 2005-12-30 13:12:21 +0100 (Fri, 30 Dec 2005) | 2 lines Changed paths: M /trunk/po/POTFILES M /trunk/po/en_GB.po M /trunk/po/fr_FR.po Updated translations. ------------------------------------------------------------------------ r31 | fabricecolin | 2005-12-30 13:11:43 +0100 (Fri, 30 Dec 2005) | 4 lines Changed paths: M /trunk/UI/GTK2/src/Makefile M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/mainWindow.cc A /trunk/UI/GTK2/src/pinot.cc (from /trunk/UI/GTK2/src/pinot.cpp:27) D /trunk/UI/GTK2/src/pinot.cpp Sorted out issues with menuitems and view tab. Inform user if document to index is already indexed. In the results tree, don't repeat the name of the query for every group. Catch signals and quit cleanly. ------------------------------------------------------------------------ r30 | fabricecolin | 2005-12-29 22:30:54 +0100 (Thu, 29 Dec 2005) | 2 lines Changed paths: M /trunk/UI/GTK2/src/HtmlView.cpp M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh Hide, don't close the view tab. Fixed document update. 
------------------------------------------------------------------------ r29 | fabricecolin | 2005-12-29 15:09:35 +0100 (Thu, 29 Dec 2005) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/HtmlView.cpp M /trunk/UI/GTK2/src/HtmlView.h M /trunk/UI/GTK2/src/IndexPage.cpp M /trunk/UI/GTK2/src/IndexPage.h M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/Makefile A /trunk/UI/GTK2/src/Notebook.cpp (from /trunk/UI/GTK2/src/NotebookTabBox.cpp:28) A /trunk/UI/GTK2/src/Notebook.h (from /trunk/UI/GTK2/src/NotebookTabBox.h:28) D /trunk/UI/GTK2/src/NotebookTabBox.cpp D /trunk/UI/GTK2/src/NotebookTabBox.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/ResultsTree.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh M /trunk/UI/RenderHTML/MozillaRenderer.cpp M /trunk/UI/RenderHTML/MozillaRenderer.h All notebook tabs are opened on an as-needed basis and can be closed. ------------------------------------------------------------------------ r28 | fabricecolin | 2005-12-29 10:00:55 +0100 (Thu, 29 Dec 2005) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade A /trunk/UI/GTK2/src/IndexPage.cpp A /trunk/UI/GTK2/src/IndexPage.h M /trunk/UI/GTK2/src/IndexTree.cpp M /trunk/UI/GTK2/src/IndexTree.h M /trunk/UI/GTK2/src/Makefile A /trunk/UI/GTK2/src/NotebookTabBox.cpp A /trunk/UI/GTK2/src/NotebookTabBox.h M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/PinotUtils.cpp M /trunk/UI/GTK2/src/PinotUtils.h M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh First stab at getting rid of 3 tabs limitation. 
------------------------------------------------------------------------ r26 | fabricecolin | 2005-12-18 11:26:22 +0100 (Sun, 18 Dec 2005) | 2 lines Changed paths: M /trunk/ChangeLog M /trunk/UI/GTK2/config.h M /trunk/pinot.spec Bumping version number to 0.35. ------------------------------------------------------------------------ r25 | fabricecolin | 2005-12-18 06:45:01 +0100 (Sun, 18 Dec 2005) | 2 lines Changed paths: M /trunk/TODO +4 items. ------------------------------------------------------------------------ r24 | fabricecolin | 2005-12-17 17:41:27 +0100 (Sat, 17 Dec 2005) | 2 lines Changed paths: M /trunk/po/en_GB.po M /trunk/po/fr_FR.po Updated po files. ------------------------------------------------------------------------ r23 | fabricecolin | 2005-12-17 17:40:35 +0100 (Sat, 17 Dec 2005) | 2 lines Changed paths: M /trunk/Collect/NeonDownloader.h M /trunk/UI/GTK2/src/pinot.cpp M /trunk/UI/RenderHTML/MozillaRenderer.cpp M /trunk/UI/RenderHTML/MozillaRenderer.h Mostly cosmetic changes to startup initialization. ------------------------------------------------------------------------ r22 | fabricecolin | 2005-12-17 17:38:56 +0100 (Sat, 17 Dec 2005) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Minor fix for LabelUpdateThread. ------------------------------------------------------------------------ r21 | fabricecolin | 2005-12-17 12:55:43 +0100 (Sat, 17 Dec 2005) | 3 lines Changed paths: M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/mainWindow.cc Changed IndexInterface a bit, reintroduced hasLabel() so that whether documents match the current label can be shown in the index list. ------------------------------------------------------------------------ r20 | fabricecolin | 2005-12-17 12:02:30 +0100 (Sat, 17 Dec 2005) | 2 lines Changed paths: M /trunk/variables.mk Link against mozilla-nss libraries. 
------------------------------------------------------------------------ r19 | fabricecolin | 2005-12-17 12:01:43 +0100 (Sat, 17 Dec 2005) | 3 lines Changed paths: M /trunk/Collect/NeonDownloader.cpp M /trunk/Collect/NeonDownloader.h M /trunk/UI/GTK2/src/pinot.cpp NeonDownloader sets up the callbacks necessary for safe multi-threading with OpenSSL. ------------------------------------------------------------------------ r18 | fabricecolin | 2005-12-17 11:49:04 +0100 (Sat, 17 Dec 2005) | 2 lines Changed paths: M /trunk/UI/RenderHTML/MozillaRenderer.cpp Initialize NSPR and NSS to avoid segmentation fault on https sites. ------------------------------------------------------------------------ r17 | fabricecolin | 2005-12-17 05:12:31 +0100 (Sat, 17 Dec 2005) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Hopefully fixed copy and paste into the live query field. ------------------------------------------------------------------------ r16 | fabricecolin | 2005-12-17 04:33:44 +0100 (Sat, 17 Dec 2005) | 7 lines Changed paths: M /trunk/Search/XapianEngine.cpp M /trunk/Search/XapianEngine.h Experimenting with multi-step search : 1. follow operators and don't stem terms 2. if no results, follow operators and stem terms 3. if no results, don't follow operators and don't stem terms 4. if no results, don't follow operators and stem terms Steps 2 and 4 depend on a language being defined for the query. ------------------------------------------------------------------------ r15 | fabricecolin | 2005-12-17 04:20:03 +0100 (Sat, 17 Dec 2005) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Utils/StringManip.cpp Fixed extraction of language from document data. ------------------------------------------------------------------------ r14 | fabricecolin | 2005-12-16 01:34:33 +0100 (Fri, 16 Dec 2005) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/UI/GTK2/src/mainWindow.cc Fixed index listing with an offset. 
Allow pasting into the live query field. ------------------------------------------------------------------------ r13 | fabricecolin | 2005-12-15 17:56:48 +0100 (Thu, 15 Dec 2005) | 2 lines Changed paths: M /trunk/Utils/XapianDatabase.h IndexHistory is history... ------------------------------------------------------------------------ r12 | fabricecolin | 2005-12-15 17:50:35 +0100 (Thu, 15 Dec 2005) | 2 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh M /trunk/UI/GTK2/src/mainWindow_glade.cc M /trunk/UI/GTK2/src/mainWindow_glade.hh In the index tab, dropped the First and Last buttons. ------------------------------------------------------------------------ r11 | fabricecolin | 2005-12-15 17:49:31 +0100 (Thu, 15 Dec 2005) | 2 lines Changed paths: M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h D /trunk/SQL/IndexHistory.cpp D /trunk/SQL/IndexHistory.h M /trunk/SQL/Makefile M /trunk/SQL/historytest.cpp M /trunk/Search/QueryProperties.cpp Second step : got rid of IndexHistory and rely exclusively on the index. ------------------------------------------------------------------------ r10 | fabricecolin | 2005-12-15 15:56:45 +0100 (Thu, 15 Dec 2005) | 2 lines Changed paths: D /trunk/SQL/LabelManager.cpp D /trunk/SQL/LabelManager.h Obsolete. 
------------------------------------------------------------------------ r9 | fabricecolin | 2005-12-15 15:46:50 +0100 (Thu, 15 Dec 2005) | 4 lines Changed paths: M /trunk/UI/GTK2/metase-gtk2.glade M /trunk/UI/GTK2/src/MonitorHandler.cpp M /trunk/UI/GTK2/src/MonitorHandler.h M /trunk/UI/GTK2/src/PinotSettings.cpp M /trunk/UI/GTK2/src/ResultsTree.cpp M /trunk/UI/GTK2/src/WorkerThreads.cpp M /trunk/UI/GTK2/src/WorkerThreads.h M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/pinot.cpp M /trunk/UI/GTK2/src/prefsDialog.cc M /trunk/UI/GTK2/src/prefsDialog.hh M /trunk/UI/GTK2/src/queryDialog.cc M /trunk/UI/GTK2/src/queryDialog.hh M /trunk/UI/GTK2/src/queryDialog_glade.cc M /trunk/UI/GTK2/src/queryDialog_glade.hh "None of the words" is in the queries first properties tab, as it should work with all engines. Added a label filter in Advanced. All labels operations are handled by the index now, LabelManager is obsolete. ------------------------------------------------------------------------ r8 | fabricecolin | 2005-12-15 15:44:29 +0100 (Thu, 15 Dec 2005) | 3 lines Changed paths: M /trunk/Index/IndexInterface.h M /trunk/Index/XapianIndex.cpp M /trunk/Index/XapianIndex.h M /trunk/Index/indextest.cpp M /trunk/SQL/Makefile M /trunk/Search/Google/GoogleAPIEngine.cpp M /trunk/Search/ObjectsSearch/ObjectsSearchAPIEngine.cpp M /trunk/Search/PluginWebEngine.cpp M /trunk/Search/QueryProperties.cpp M /trunk/Search/QueryProperties.h M /trunk/Search/XapianEngine.cpp First step towards rationalizing the index back-end : pushed labels into the index so that they can be used as filters by queries. ------------------------------------------------------------------------ r7 | fabricecolin | 2005-12-13 13:34:52 +0100 (Tue, 13 Dec 2005) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc M /trunk/UI/GTK2/src/mainWindow.hh When pasting into the stored queries tree, pop up a queryDialog. 
------------------------------------------------------------------------ r6 | fabricecolin | 2005-12-12 17:21:44 +0100 (Mon, 12 Dec 2005) | 2 lines Changed paths: M /trunk/UI/GTK2/src/WorkerThreads.cpp Check select() return value and break on errors ! ------------------------------------------------------------------------ r5 | fabricecolin | 2005-12-12 17:20:24 +0100 (Mon, 12 Dec 2005) | 2 lines Changed paths: M /trunk/UI/GTK2/src/mainWindow.cc Always check that IndexTree::getSelection() actually returns something. ------------------------------------------------------------------------ r4 | fabricecolin | 2005-12-12 17:16:37 +0100 (Mon, 12 Dec 2005) | 2 lines Changed paths: M /trunk/Collect/NeonDownloader.cpp Don't return a document if an error occurred. ------------------------------------------------------------------------ r1 | fabricecolin | 2005-12-08 14:46:51 +0100 (Thu, 08 Dec 2005) | 1 line Changed paths: A /trunk A /trunk/COPYING A /trunk/ChangeLog A /trunk/Collect A /trunk/Collect/DownloaderFactory.cpp A /trunk/Collect/DownloaderFactory.h A /trunk/Collect/DownloaderInterface.cpp A /trunk/Collect/DownloaderInterface.h A /trunk/Collect/FileCollector.cpp A /trunk/Collect/FileCollector.h A /trunk/Collect/Makefile A /trunk/Collect/MboxCollector.cpp A /trunk/Collect/MboxCollector.h A /trunk/Collect/NeonDownloader.cpp A /trunk/Collect/NeonDownloader.h A /trunk/Collect/XapianCollector.cpp A /trunk/Collect/XapianCollector.h A /trunk/Collect/dloadtest.cpp A /trunk/Index A /trunk/Index/IndexInterface.h A /trunk/Index/LanguageDetector.cpp A /trunk/Index/LanguageDetector.h A /trunk/Index/Makefile A /trunk/Index/Summarizer.cpp A /trunk/Index/Summarizer.h A /trunk/Index/XapianIndex.cpp A /trunk/Index/XapianIndex.h A /trunk/Index/indextest.cpp A /trunk/Makefile A /trunk/README A /trunk/SQL A /trunk/SQL/ActionHistory.cpp A /trunk/SQL/ActionHistory.h A /trunk/SQL/IndexHistory.cpp A /trunk/SQL/IndexHistory.h A /trunk/SQL/LabelManager.cpp A /trunk/SQL/LabelManager.h A 
/trunk/SQL/Makefile A /trunk/SQL/QueryHistory.cpp A /trunk/SQL/QueryHistory.h A /trunk/SQL/SQLiteBase.cpp A /trunk/SQL/SQLiteBase.h A /trunk/SQL/ViewHistory.cpp A /trunk/SQL/ViewHistory.h A /trunk/SQL/historytest.cpp A /trunk/Search A /trunk/Search/Google A /trunk/Search/Google/GoogleAPIEngine.cpp A /trunk/Search/Google/GoogleAPIEngine.h A /trunk/Search/Google/GoogleSearch.h A /trunk/Search/Google/Makefile A /trunk/Search/Makefile A /trunk/Search/ObjectsSearch A /trunk/Search/ObjectsSearch/Makefile A /trunk/Search/ObjectsSearch/ObjectsSearch.h A /trunk/Search/ObjectsSearch/ObjectsSearchAPIEngine.cpp A /trunk/Search/ObjectsSearch/ObjectsSearchAPIEngine.h A /trunk/Search/PluginWebEngine.cpp A /trunk/Search/PluginWebEngine.h A /trunk/Search/Plugins A /trunk/Search/Plugins/A9.src A /trunk/Search/Plugins/Acoona.src A /trunk/Search/Plugins/Altavista.src A /trunk/Search/Plugins/AmazonAPI.src A /trunk/Search/Plugins/AskJeeves.src A /trunk/Search/Plugins/BitTorrent.src A /trunk/Search/Plugins/Clusty.src A /trunk/Search/Plugins/Freshmeat.src A /trunk/Search/Plugins/Google.src A /trunk/Search/Plugins/Koders.src A /trunk/Search/Plugins/Lycos.src A /trunk/Search/Plugins/MSN.src A /trunk/Search/Plugins/Omega.src A /trunk/Search/Plugins/Teoma.src A /trunk/Search/Plugins/Topix.src A /trunk/Search/Plugins/Wikipedia.src A /trunk/Search/Plugins/WiseNut.src A /trunk/Search/Plugins/Yahoo.src A /trunk/Search/Plugins/YahooAPI.src A /trunk/Search/QueryProperties.cpp A /trunk/Search/QueryProperties.h A /trunk/Search/SOAPEnvNS.cpp A /trunk/Search/SearchEngineFactory.cpp A /trunk/Search/SearchEngineFactory.h A /trunk/Search/SearchEngineInterface.cpp A /trunk/Search/SearchEngineInterface.h A /trunk/Search/WebEngine.cpp A /trunk/Search/WebEngine.h A /trunk/Search/XapianEngine.cpp A /trunk/Search/XapianEngine.h A /trunk/Search/senginetest.cpp A /trunk/TODO A /trunk/Tokenize A /trunk/Tokenize/HtmlTokenizer.cpp A /trunk/Tokenize/HtmlTokenizer.h A /trunk/Tokenize/Makefile A 
/trunk/Tokenize/PdfTokenizer.cpp A /trunk/Tokenize/PdfTokenizer.h A /trunk/Tokenize/Tokenizer.cpp A /trunk/Tokenize/Tokenizer.h A /trunk/Tokenize/TokenizerFactory.cpp A /trunk/Tokenize/TokenizerFactory.h A /trunk/Tokenize/UnknownTypeTokenizer.cpp A /trunk/Tokenize/UnknownTypeTokenizer.h A /trunk/Tokenize/WordTokenizer.cpp A /trunk/Tokenize/WordTokenizer.h A /trunk/Tokenize/tokenizertest.cpp A /trunk/UI A /trunk/UI/GTK2 A /trunk/UI/GTK2/config.h A /trunk/UI/GTK2/metase-gtk2.glade A /trunk/UI/GTK2/metase-gtk2.gladep A /trunk/UI/GTK2/pinot.png A /trunk/UI/GTK2/pinot.xcf A /trunk/UI/GTK2/src A /trunk/UI/GTK2/src/EnginesTree.cpp A /trunk/UI/GTK2/src/EnginesTree.h A /trunk/UI/GTK2/src/HtmlView.cpp A /trunk/UI/GTK2/src/HtmlView.h A /trunk/UI/GTK2/src/IndexTree.cpp A /trunk/UI/GTK2/src/IndexTree.h A /trunk/UI/GTK2/src/Makefile A /trunk/UI/GTK2/src/ModelColumns.cpp A /trunk/UI/GTK2/src/ModelColumns.h A /trunk/UI/GTK2/src/MonitorHandler.cpp A /trunk/UI/GTK2/src/MonitorHandler.h A /trunk/UI/GTK2/src/PinotSettings.cpp A /trunk/UI/GTK2/src/PinotSettings.h A /trunk/UI/GTK2/src/PinotUtils.cpp A /trunk/UI/GTK2/src/PinotUtils.h A /trunk/UI/GTK2/src/ResultsTree.cpp A /trunk/UI/GTK2/src/ResultsTree.h A /trunk/UI/GTK2/src/WorkerThreads.cpp A /trunk/UI/GTK2/src/WorkerThreads.h A /trunk/UI/GTK2/src/aboutDialog.cc A /trunk/UI/GTK2/src/aboutDialog.hh A /trunk/UI/GTK2/src/aboutDialog_glade.cc A /trunk/UI/GTK2/src/aboutDialog_glade.hh A /trunk/UI/GTK2/src/importDialog.cc A /trunk/UI/GTK2/src/importDialog.hh A /trunk/UI/GTK2/src/importDialog_glade.cc A /trunk/UI/GTK2/src/importDialog_glade.hh A /trunk/UI/GTK2/src/indexDialog.cc A /trunk/UI/GTK2/src/indexDialog.hh A /trunk/UI/GTK2/src/indexDialog_glade.cc A /trunk/UI/GTK2/src/indexDialog_glade.hh A /trunk/UI/GTK2/src/mainWindow.cc A /trunk/UI/GTK2/src/mainWindow.hh A /trunk/UI/GTK2/src/mainWindow_glade.cc A /trunk/UI/GTK2/src/mainWindow_glade.hh A /trunk/UI/GTK2/src/pinot.cpp A /trunk/UI/GTK2/src/prefsDialog.cc A 
/trunk/UI/GTK2/src/prefsDialog.hh A /trunk/UI/GTK2/src/prefsDialog_glade.cc A /trunk/UI/GTK2/src/prefsDialog_glade.hh A /trunk/UI/GTK2/src/propertiesDialog.cc A /trunk/UI/GTK2/src/propertiesDialog.hh A /trunk/UI/GTK2/src/propertiesDialog_glade.cc A /trunk/UI/GTK2/src/propertiesDialog_glade.hh A /trunk/UI/GTK2/src/queryDialog.cc A /trunk/UI/GTK2/src/queryDialog.hh A /trunk/UI/GTK2/src/queryDialog_glade.cc A /trunk/UI/GTK2/src/queryDialog_glade.hh A /trunk/UI/GTK2/xapian-powered.png A /trunk/UI/RenderHTML A /trunk/UI/RenderHTML/Makefile A /trunk/UI/RenderHTML/MozillaRenderer.cpp A /trunk/UI/RenderHTML/MozillaRenderer.h A /trunk/Utils A /trunk/Utils/Document.cpp A /trunk/Utils/Document.h A /trunk/Utils/DocumentInfo.cpp A /trunk/Utils/DocumentInfo.h A /trunk/Utils/HtmlDocument.cpp A /trunk/Utils/HtmlDocument.h A /trunk/Utils/IndexedDocument.cpp A /trunk/Utils/IndexedDocument.h A /trunk/Utils/Languages.cpp A /trunk/Utils/Languages.h A /trunk/Utils/MIMEScanner.cpp A /trunk/Utils/MIMEScanner.h A /trunk/Utils/Makefile A /trunk/Utils/MboxParser.cpp A /trunk/Utils/MboxParser.h A /trunk/Utils/NLS.h A /trunk/Utils/PluginParser.cpp A /trunk/Utils/PluginParser.h A /trunk/Utils/Result.cpp A /trunk/Utils/Result.h A /trunk/Utils/StringManip.cpp A /trunk/Utils/StringManip.h A /trunk/Utils/TimeConverter.cpp A /trunk/Utils/TimeConverter.h A /trunk/Utils/Timer.cpp A /trunk/Utils/Timer.h A /trunk/Utils/Url.cpp A /trunk/Utils/Url.h A /trunk/Utils/XapianDatabase.cpp A /trunk/Utils/XapianDatabase.h A /trunk/Utils/XapianDatabaseFactory.cpp A /trunk/Utils/XapianDatabaseFactory.h A /trunk/Utils/plugintest.cpp A /trunk/index.html A /trunk/libxmlpp026.patch A /trunk/pinot.spec A /trunk/po A /trunk/po/POTFILES A /trunk/po/en_GB.po A /trunk/po/fr_FR.po A /trunk/textcat_conf.txt A /trunk/variables.mk v0.30 source ------------------------------------------------------------------------ 
pinot-1.22/Collect/000077500000000000000000000000001470740426600141665ustar00rootroot00000000000000pinot-1.22/Collect/CurlDownloader.cpp000066400000000000000000000303201470740426600176140ustar00rootroot00000000000000
/*
 *  Copyright 2005-2014 Fabrice Colin
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#ifdef HAVE_REGEX_H
#include <regex.h>
#endif
#include <curl/curl.h>
#include <iostream>
#include <map>
#include <sstream>

#include "StringManip.h"
#include "Url.h"
#include "CurlDownloader.h"

using namespace std;

struct ContentInfo
{
	char *m_pContent;
	size_t m_contentLen;
	string m_lastModified;
	map<string, string> m_headers;
};

static void freeContentInfo(struct ContentInfo *pInfo)
{
	if (pInfo == NULL)
	{
		return;
	}

	if (pInfo->m_pContent != NULL)
	{
		free(pInfo->m_pContent);
		pInfo->m_pContent = NULL;
		pInfo->m_contentLen = 0;
	}
}

static size_t writeCallback(void *pData, size_t dataSize, size_t elementsCount, void *pStream)
{
	ContentInfo *pInfo = NULL;
	size_t totalSize = elementsCount * dataSize;

	if (pStream == NULL)
	{
		return 0;
	}
	pInfo = (ContentInfo *)pStream;

	char *pNewContent = (char*)realloc(pInfo->m_pContent, pInfo->m_contentLen + totalSize + 1);
	if (pNewContent == NULL)
	{
#ifdef DEBUG
		clog << "writeCallback: failed to enlarge buffer" << endl;
#endif
		freeContentInfo(pInfo);
		return 0;
	}

	pInfo->m_pContent = pNewContent;
	memcpy(pInfo->m_pContent + pInfo->m_contentLen, pData, totalSize);
	pInfo->m_contentLen += totalSize;
	pInfo->m_pContent[pInfo->m_contentLen] = '\0';

	if (totalSize < strlen((const char*)pData))
	{
		void *pBadChar = NULL;

		// There's a NULL character in the buffer ? Replace it
		while ((pBadChar = memchr((void*)pInfo->m_pContent, '\0', pInfo->m_contentLen)) != NULL)
		{
			((char*)pBadChar)[0] = ' ';
#ifdef DEBUG
			clog << "writeCallback: bad character" << endl;
#endif
		}
	}

	return totalSize;
}

static size_t headerCallback(void *pData, size_t dataSize, size_t elementsCount, void *pStream)
{
	ContentInfo *pInfo = NULL;
	size_t totalSize = elementsCount * dataSize;

	if ((pStream == NULL) ||
		(pData == NULL) ||
		(totalSize == 0))
	{
		return 0;
	}
	pInfo = (ContentInfo *)pStream;

	string header((const char*)pData, totalSize);
	string::size_type pos = header.find("Last-Modified: ");

#ifdef DEBUG
	clog << "headerCallback: header " << header << endl;
#endif
	if (pos == 0)
	{
		pInfo->m_lastModified = header.substr(15);

		return totalSize;
	}

	pos = header.find(": ");
	if ((pos != string::npos) &&
		(header.length() > pos + 2))
	{
		string headerValue(StringManip::extractField(header.substr(pos + 2), "\"", "\""));

		if (headerValue.empty() == true)
		{
			headerValue = header.substr(pos + 2);
		}
		StringManip::trimSpaces(headerValue);

		pInfo->m_headers.insert(pair<string, string>(header.substr(0, pos), headerValue));
	}

	return totalSize;
}

unsigned int CurlDownloader::m_initialized = 0;

CurlDownloader::CurlDownloader() :
	DownloaderInterface()
{
	if (m_initialized == 0)
	{
		// Initialize
		curl_global_init(CURL_GLOBAL_ALL);

		++m_initialized;
	}
}

CurlDownloader::~CurlDownloader()
{
	--m_initialized;
	if (m_initialized == 0)
	{
		// Shutdown
		curl_global_cleanup();
	}
}

Document *CurlDownloader::populateDocument(const DocumentInfo &docInfo,
	const string &url, void *pHandler, void *pInfo)
{
	if ((pHandler == NULL) ||
		(pInfo == NULL))
	{
		return NULL;
	}

	Document *pDocument = new Document(docInfo);
	ContentInfo *pContentInfo = (ContentInfo *)pInfo;
	char *pContentType = NULL;
	long responseCode = 200;

	// Copy the document content
	pDocument->setData(pContentInfo->m_pContent, pContentInfo->m_contentLen);
	pDocument->setLocation(url);
	pDocument->setSize((off_t )pContentInfo->m_contentLen);

	// What's the Content-Type ?
	CURLcode res = curl_easy_getinfo((CURL *)pHandler, CURLINFO_CONTENT_TYPE, &pContentType);
	if ((res == CURLE_OK) &&
		(pContentType != NULL))
	{
		pDocument->setType(pContentType);
	}

	// What's the response code ?
	res = curl_easy_getinfo((CURL *)pHandler, CURLINFO_RESPONSE_CODE, &responseCode);
	if (res == CURLE_OK)
	{
		stringstream numStr;

		numStr << responseCode;
		pDocument->setOther("ResponseCode", numStr.str());
	}

	// The Last-Modified date ?
	if (pContentInfo->m_lastModified.empty() == false)
	{
		pDocument->setTimestamp(pContentInfo->m_lastModified);
	}

	for (map<string, string>::const_iterator headerIter = pContentInfo->m_headers.begin();
		headerIter != pContentInfo->m_headers.end(); ++headerIter)
	{
		pDocument->setOther(headerIter->first, headerIter->second);
	}

	return pDocument;
}

//
// Implementation of DownloaderInterface
//

/// Retrieves the specified document; NULL if error.
Document *CurlDownloader::retrieveUrl(const DocumentInfo &docInfo)
{
	map<string, string> headers;

	return retrieveUrl(docInfo, headers);
}

/// Retrieves the specified document; NULL if error.
Document *CurlDownloader::retrieveUrl(const DocumentInfo &docInfo,
	const map<string, string> &headers)
{
	Document *pDocument = NULL;
	string ipath(docInfo.getInternalPath());
	string url(docInfo.getLocation());
	char pBuffer[1024];
	unsigned int redirectionsCount = 0;

	if (url.empty() == true)
	{
#ifdef DEBUG
		clog << "CurlDownloader::retrieveUrl: no URL specified !" << endl;
#endif
		return NULL;
	}

	if (ipath.empty() == false)
	{
		url += "?";
		url += ipath;
	}

	// Create a session
	CURL *pCurlHandler = curl_easy_init();
	if (pCurlHandler == NULL)
	{
		return NULL;
	}

	struct curl_slist *pHeadersList = NULL;
	ContentInfo *pContentInfo = new ContentInfo;

	pContentInfo->m_pContent = NULL;
	pContentInfo->m_contentLen = 0;

	// Add headers
	for (map<string, string>::const_iterator headerIter = headers.begin();
		headerIter != headers.end(); ++headerIter)
	{
		snprintf(pBuffer, sizeof(pBuffer), "%s: %s", headerIter->first.c_str(), headerIter->second.c_str());
		pHeadersList = curl_slist_append(pHeadersList, pBuffer);
	}

	curl_easy_setopt(pCurlHandler, CURLOPT_AUTOREFERER, 1);
	curl_easy_setopt(pCurlHandler, CURLOPT_FOLLOWLOCATION, 1);
	curl_easy_setopt(pCurlHandler, CURLOPT_MAXREDIRS, 10);
	curl_easy_setopt(pCurlHandler, CURLOPT_USERAGENT, m_userAgent.c_str());
	curl_easy_setopt(pCurlHandler, CURLOPT_NOSIGNAL, (long)1);
	curl_easy_setopt(pCurlHandler, CURLOPT_TIMEOUT, (long)m_timeout);
#ifndef DEBUG
	curl_easy_setopt(pCurlHandler, CURLOPT_NOPROGRESS, 1);
#endif
	curl_easy_setopt(pCurlHandler, CURLOPT_HTTPHEADER, pHeadersList);
	curl_easy_setopt(pCurlHandler, CURLOPT_WRITEFUNCTION, writeCallback);
	curl_easy_setopt(pCurlHandler, CURLOPT_WRITEDATA, pContentInfo);
	curl_easy_setopt(pCurlHandler, CURLOPT_HEADERFUNCTION, headerCallback);
	curl_easy_setopt(pCurlHandler, CURLOPT_HEADERDATA, pContentInfo);

	// Is a proxy defined ?
// Curl automatically checks and makes use of the *_proxy environment variables if ((m_proxyAddress.empty() == false) && (m_proxyPort > 0)) { curl_proxytype proxyType = CURLPROXY_HTTP; curl_easy_setopt(pCurlHandler, CURLOPT_PROXY, m_proxyAddress.c_str()); curl_easy_setopt(pCurlHandler, CURLOPT_PROXYPORT, m_proxyPort); // Type defaults to HTTP if (m_proxyType.empty() == false) { if (m_proxyType == "SOCKS4") { proxyType = CURLPROXY_SOCKS4; } else if (m_proxyType == "SOCKS5") { proxyType = CURLPROXY_SOCKS5; } } curl_easy_setopt(pCurlHandler, CURLOPT_PROXYTYPE, proxyType); } #ifdef DEBUG clog << "CurlDownloader::retrieveUrl: URL is " << url << endl; #endif while (redirectionsCount < 10) { curl_easy_setopt(pCurlHandler, CURLOPT_URL, Url::escapeUrl(url).c_str()); if (m_method == "POST") { curl_easy_setopt(pCurlHandler, CURLOPT_POST, 1); if (m_postFields.empty() == false) { curl_easy_setopt(pCurlHandler, CURLOPT_POSTFIELDS, m_postFields.c_str()); } } CURLcode res = curl_easy_perform(pCurlHandler); if ((res == CURLE_OK) && (pContentInfo->m_pContent != NULL) && (pContentInfo->m_contentLen > 0)) { pDocument = populateDocument(docInfo, url, pCurlHandler, pContentInfo); #ifdef HAVE_REGEX_H regex_t refreshRegex; regmatch_t pMatches[2]; // Any REFRESH META tag ? 
// Look for if (regcomp(&refreshRegex, "" REG_EXTENDED|REG_ICASE) == 0) { if (regexec(&refreshRegex, pContentInfo->m_pContent, 2, pMatches, REG_NOTBOL|REG_NOTEOL) == 0) { url = pMatches[1]; #ifdef DEBUG clog << "CurlDownloader::retrieveUrl: redirected to URL " << url << endl; #endif delete pDocument; pDocument = NULL; freeContentInfo(pContentInfo); ++redirectionsCount; regfree(&refreshRegex); continue; } #ifdef DEBUG else clog << "CurlDownloader::retrieveUrl: no REFRESH META tag" << endl; #endif regfree(&refreshRegex); } #ifdef DEBUG else clog << "CurlDownloader::retrieveUrl: couldn't look for a REFRESH META tag" << endl; #endif #endif } else { clog << "Couldn't download " << url << ": " << curl_easy_strerror(res) << endl; } break; } freeContentInfo(pContentInfo); delete pContentInfo; // Cleanup curl_slist_free_all(pHeadersList); curl_easy_cleanup(pCurlHandler); return pDocument; } Document *CurlDownloader::putUrl(const DocumentInfo &docInfo, const map &headers, const string &url) { Document *pDocument = NULL; struct curl_slist *pHeadersList = NULL; string mimeType(docInfo.getType()); string fileLocation(docInfo.getLocation()); char pBuffer[1024]; if (url.empty() == true) { #ifdef DEBUG clog << "CurlDownloader::putUrl: no URL specified !" 
<< endl; #endif return NULL; } FILE *pFile = fopen(fileLocation.c_str(), "r"); if (pFile == NULL) { #ifdef DEBUG clog << "CurlDownloader::putUrl: couldn't open file " << fileLocation << endl; #endif return NULL; } // Create a session CURL *pCurlHandler = curl_easy_init(); if (pCurlHandler == NULL) { fclose(pFile); return NULL; } ContentInfo *pContentInfo = new ContentInfo; pContentInfo->m_pContent = NULL; pContentInfo->m_contentLen = 0; // Add headers for (map::const_iterator headerIter = headers.begin(); headerIter != headers.end(); ++headerIter) { snprintf(pBuffer, sizeof(pBuffer), "%s: %s", headerIter->first.c_str(), headerIter->second.c_str()); pHeadersList = curl_slist_append(pHeadersList, pBuffer); } curl_easy_setopt(pCurlHandler, CURLOPT_FOLLOWLOCATION, 1); curl_easy_setopt(pCurlHandler, CURLOPT_USERAGENT, m_userAgent.c_str()); curl_easy_setopt(pCurlHandler, CURLOPT_NOSIGNAL, (long)1); curl_easy_setopt(pCurlHandler, CURLOPT_TIMEOUT, (long)m_timeout); #ifndef DEBUG curl_easy_setopt(pCurlHandler, CURLOPT_NOPROGRESS, 1); #endif curl_easy_setopt(pCurlHandler, CURLOPT_HTTPHEADER, pHeadersList); curl_easy_setopt(pCurlHandler, CURLOPT_URL, url.c_str()); // Use the default read function curl_easy_setopt(pCurlHandler, CURLOPT_READFUNCTION, NULL); curl_easy_setopt(pCurlHandler, CURLOPT_READDATA, pFile); curl_easy_setopt(pCurlHandler, CURLOPT_WRITEFUNCTION, writeCallback); curl_easy_setopt(pCurlHandler, CURLOPT_WRITEDATA, pContentInfo); curl_easy_setopt(pCurlHandler, CURLOPT_HEADERFUNCTION, headerCallback); curl_easy_setopt(pCurlHandler, CURLOPT_HEADERDATA, pContentInfo); curl_easy_setopt(pCurlHandler, CURLOPT_UPLOAD, 1); curl_easy_setopt(pCurlHandler, CURLOPT_INFILESIZE_LARGE, (curl_off_t)docInfo.getSize()); CURLcode res = curl_easy_perform(pCurlHandler); if (res == CURLE_OK) { #ifdef DEBUG clog << "CurlDownloader::putUrl: uploaded " << docInfo.getSize() << " bytes to " << url << endl; #endif pDocument = populateDocument(docInfo, url, pCurlHandler, pContentInfo); } 
else { clog << "Couldn't upload to " << url << ": " << curl_easy_strerror(res) << endl; } curl_slist_free_all(pHeadersList); curl_easy_cleanup(pCurlHandler); fclose(pFile); freeContentInfo(pContentInfo); delete pContentInfo; return pDocument; } pinot-1.22/Collect/CurlDownloader.h000066400000000000000000000036571470740426600172760ustar00rootroot00000000000000/* * Copyright 2005-2013 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _CURL_DOWNLOADER_H #define _CURL_DOWNLOADER_H #include #include #include "DownloaderInterface.h" /// Wrapper around the curl API. class CurlDownloader : public DownloaderInterface { public: CurlDownloader(); virtual ~CurlDownloader(); /** * Retrieves the specified document. * NULL if error. Caller deletes. */ virtual Document *retrieveUrl(const DocumentInfo &docInfo); /** * Retrieves the specified document. * NULL if error. Caller deletes. */ virtual Document *retrieveUrl(const DocumentInfo &docInfo, const std::map &headers); /** * Puts the specified document at the given URL. * NULL if error. Caller deletes. 
*/ virtual Document *putUrl(const DocumentInfo &docInfo, const std::map &headers, const std::string &url); protected: static unsigned int m_initialized; static Document *populateDocument(const DocumentInfo &docInfo, const std::string &url, void *pHandler, void *pInfo); private: CurlDownloader(const CurlDownloader &other); CurlDownloader &operator=(const CurlDownloader &other); }; #endif // _CURL_DOWNLOADER_H pinot-1.22/Collect/DownloaderFactory.cpp000066400000000000000000000027701470740426600203260ustar00rootroot00000000000000/* * Copyright 2005-2009 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifdef USE_NEON #include "NeonDownloader.h" #else #ifdef USE_CURL #include "CurlDownloader.h" #endif #endif #include "FileCollector.h" #include "DownloaderFactory.h" using std::string; DownloaderFactory::DownloaderFactory() { } DownloaderFactory::~DownloaderFactory() { } /// Returns a Downloader; NULL if unavailable. 
DownloaderInterface *DownloaderFactory::getDownloader(const string &protocol) { DownloaderInterface *pDownloader = NULL; // Choice by protocol if ((protocol == "http") || (protocol == "https")) { #ifdef USE_NEON pDownloader = new NeonDownloader(); #else #ifdef USE_CURL pDownloader = new CurlDownloader(); #endif #endif } else if (protocol == "file") { pDownloader = new FileCollector(); } return pDownloader; } pinot-1.22/Collect/DownloaderFactory.h000066400000000000000000000024361470740426600177720ustar00rootroot00000000000000/* * Copyright 2005,2006 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _DOWNLOADER_FACTORY_H #define _DOWNLOADER_FACTORY_H #include #include "DownloaderInterface.h" /// Downloader factory class. class DownloaderFactory { public: virtual ~DownloaderFactory(); /// Returns a Downloader; NULL if unavailable. 
static DownloaderInterface *getDownloader(const std::string &protocol); protected: DownloaderFactory(); private: DownloaderFactory(const DownloaderFactory &other); DownloaderFactory &operator=(const DownloaderFactory &other); }; #endif // _DOWNLOADER_FACTORY_H pinot-1.22/Collect/DownloaderInterface.cpp000066400000000000000000000076211470740426600206170ustar00rootroot00000000000000/* * Copyright 2005-2017 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #ifdef WIN32 #include #endif #include #include #include #ifdef USE_SSL #include #endif #include "DownloaderInterface.h" using namespace std; #ifdef USE_SSL // OpenSSL multi-thread support, required by Curl static unsigned int g_lockArrayCount = 0; static pthread_mutex_t *g_pLockArray = NULL; // OpenSSL locking function static void lockingCallback(int mode, int n, const char *file, int line) { int status = 0; if (mode & CRYPTO_LOCK) { status = pthread_mutex_lock(&(g_pLockArray[n])); #ifdef DEBUG if (status != 0) { clog << "lockingCallback: failed to lock mutex " << n << endl; } #endif } else { status = pthread_mutex_unlock(&(g_pLockArray[n])); #ifdef DEBUG if (status != 0) { clog << "lockingCallback: failed to unlock mutex " << n << endl; } #endif } } static unsigned long idCallback(void) { #ifdef WIN32 return (unsigned long)GetCurrentThreadId(); #else return (unsigned long)pthread_self(); #endif } #endif /// Initialize downloaders. void DownloaderInterface::initialize(void) { #ifdef USE_SSL pthread_mutexattr_t mutexAttr; pthread_mutexattr_init(&mutexAttr); pthread_mutexattr_settype(&mutexAttr, PTHREAD_MUTEX_ERRORCHECK); // Initialize the OpenSSL mutexes #ifdef CRYPTO_num_locks g_lockArrayCount = CRYPTO_num_locks(); #else g_lockArrayCount = CRYPTO_NUM_LOCKS; #endif g_pLockArray = (pthread_mutex_t *)OPENSSL_malloc(g_lockArrayCount * sizeof(pthread_mutex_t)); for (unsigned int lockNum = 0; lockNum < g_lockArrayCount; ++lockNum) { pthread_mutex_init(&(g_pLockArray[lockNum]), &mutexAttr); } // Set the callbacks CRYPTO_set_locking_callback(lockingCallback); CRYPTO_set_id_callback(idCallback); pthread_mutexattr_destroy(&mutexAttr); #endif } /// Shut down downloaders. 
void DownloaderInterface::shutdown(void) { #ifdef USE_SSL // Reset the OpenSSL callbacks CRYPTO_set_id_callback(NULL); CRYPTO_set_locking_callback(NULL); // Free the mutexes for (unsigned int lockNum = 0; lockNum < g_lockArrayCount; ++lockNum) { pthread_mutex_destroy(&(g_pLockArray[lockNum])); } OPENSSL_free(g_pLockArray); g_pLockArray = NULL; g_lockArrayCount = 0; #endif } DownloaderInterface::DownloaderInterface() : m_userAgent("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041020"), m_proxyPort(0), m_timeout(60), m_method("GET") { } DownloaderInterface::~DownloaderInterface() { } /// Sets a (name, value) setting; true if success. bool DownloaderInterface::setSetting(const string &name, const string &value) { bool goodSetting = true; if (name == "useragent") { m_userAgent = value; } else if (name == "proxyaddress") { m_proxyAddress = value; } else if ((name == "proxyport") && (value.empty() == false)) { m_proxyPort = (unsigned int )atoi(value.c_str()); } else if (name == "proxytype") { m_proxyType = value; } else if (name == "timeout") { m_timeout = (unsigned int)atoi(value.c_str()); } else if (name == "method") { if ((value == "GET") || (value == "POST")) { m_method = value; } } else if (name == "postfields") { m_postFields = value; } else { goodSetting = false; } return goodSetting; } pinot-1.22/Collect/DownloaderInterface.h000066400000000000000000000041601470740426600202570ustar00rootroot00000000000000/* * Copyright 2005-2013 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _DOWNLOADER_INTERFACE_H #define _DOWNLOADER_INTERFACE_H #include #include "Document.h" /// Interface implemented by downloaders. class DownloaderInterface { public: virtual ~DownloaderInterface(); /// Initializes downloaders. static void initialize(void); /// Shuts down downloaders. static void shutdown(void); /** * Sets a (name, value) setting. Setting names include : * useragent - the User-Agent string to send * proxyaddress - the address of the proxy to use * proxyport - the port of the proxy to use (positive integer) * proxytype - the type of the proxy to use * timeout - timeout in seconds * method - GET or POST * postfields - data to post * Returns true if success. */ virtual bool setSetting(const std::string &name, const std::string &value); /// Retrieves the specified document; NULL if error. Caller deletes. virtual Document *retrieveUrl(const DocumentInfo &docInfo) = 0; /// Retrieves the specified document; NULL if error. Caller deletes. virtual Document *retrieveUrl(const DocumentInfo &docInfo, const std::map &headers) = 0; protected: std::string m_userAgent; std::string m_proxyAddress; unsigned int m_proxyPort; std::string m_proxyType; unsigned int m_timeout; std::string m_method; std::string m_postFields; DownloaderInterface(); }; #endif // _DOWNLOADER_INTERFACE_H pinot-1.22/Collect/FileCollector.cpp000066400000000000000000000101661470740426600174240ustar00rootroot00000000000000/* * Copyright 2005-2013 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include "MIMEScanner.h" #include "Url.h" #include "FilterUtils.h" #include "FileCollector.h" using namespace std; class LastIPathAction : public ReducedAction { public: LastIPathAction(const string &ipath) : ReducedAction(), m_ipath(ipath), m_pDocument(NULL) { } virtual ~LastIPathAction() { } virtual bool positionFilter(const Document &doc, Dijon::Filter *pFilter) { string::size_type nextPos = m_ipath.find("&next&"); string thisIPath(m_ipath); if (nextPos != string::npos) { thisIPath = m_ipath.substr(0, nextPos); } if (pFilter != NULL) { #ifdef DEBUG clog << "LastIPathAction::positionFilter: moving filter for " << pFilter->get_mime_type() << " to ipath " << thisIPath << endl; #endif pFilter->set_property(Dijon::Filter::OPERATING_MODE, "view"); // Skip forward pFilter->skip_to_document(thisIPath); } if (nextPos != string::npos) { m_ipath.erase(0, nextPos + 6); } else { m_ipath.clear(); } return true; } virtual bool isReduced(const Document &doc) { if (m_ipath.empty() == false) { // Go one level deeper #ifdef DEBUG clog << "LastIPathAction::isReduced: not final" << endl; #endif return false; } return true; } virtual bool takeAction(Document &doc, bool isNested) { if (m_ipath.empty() == true) { m_pDocument = new Document(doc); #ifdef DEBUG clog << "LastIPathAction::takeAction: ipath " << doc.getInternalPath() << " is final" << endl; #endif } return true; } Document *getDocument(void) { return m_pDocument; } protected: string m_ipath; Document *m_pDocument; private: LastIPathAction(const LastIPathAction 
&other); LastIPathAction &operator=(const LastIPathAction &other); }; FileCollector::FileCollector() : DownloaderInterface() { } FileCollector::~FileCollector() { } // // Implementation of DownloaderInterface // /// Retrieves the specified document; NULL if error. Document *FileCollector::retrieveUrl(const DocumentInfo &docInfo) { map headers; return retrieveUrl(docInfo, headers); } /// Retrieves the specified document; NULL if error. Document *FileCollector::retrieveUrl(const DocumentInfo &docInfo, const map &headers) { Url thisUrl(docInfo.getLocation()); string protocol(thisUrl.getProtocol()); string ipath(docInfo.getInternalPath()); if (protocol != "file") { // We can't handle that type of protocol... return NULL; } string fileLocation(thisUrl.getLocation()); fileLocation += "/"; fileLocation += thisUrl.getFile(); Document *pDocument = new Document(docInfo); if (pDocument->setDataFromFile(fileLocation) == false) { delete pDocument; return NULL; } // Determine the file type string type(MIMEScanner::scanFile(fileLocation)); // Stop here ? if (ipath.empty() == true) { if (pDocument->getType().empty() == true) { pDocument->setType(type); } return pDocument; } // Reset these to avoid confusing the filters pDocument->setInternalPath(""); pDocument->setType(type); LastIPathAction action(ipath); bool reduced = FilterUtils::reduceDocument(*pDocument, action); if (reduced == true) { Document *pBottomMostDocument = action.getDocument(); if (pBottomMostDocument != NULL) { delete pDocument; return pBottomMostDocument; } } return pDocument; } pinot-1.22/Collect/FileCollector.h000066400000000000000000000027171470740426600170740ustar00rootroot00000000000000/* * Copyright 2005-2013 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _FILE_COLLECTOR_H #define _FILE_COLLECTOR_H #include #include "DownloaderInterface.h" /// Simple downloader for files. class FileCollector : public DownloaderInterface { public: FileCollector(); virtual ~FileCollector(); /// Retrieves the specified document; NULL if error. Caller deletes. virtual Document *retrieveUrl(const DocumentInfo &docInfo); /// Retrieves the specified document; NULL if error. Caller deletes. virtual Document *retrieveUrl(const DocumentInfo &docInfo, const std::map &headers); private: FileCollector(const FileCollector &other); FileCollector &operator=(const FileCollector &other); }; #endif // _FILE_COLLECTOR_H pinot-1.22/Collect/Makefile.am000066400000000000000000000011021470740426600162140ustar00rootroot00000000000000# Process this file with automake to produce Makefile.in pkginclude_HEADERS = \ CurlDownloader.h \ DownloaderFactory.h \ DownloaderInterface.h \ FileCollector.h \ NeonDownloader.h pkglib_LTLIBRARIES = libCollect.la libCollect_la_LDFLAGS = \ -static libCollect_la_SOURCES = \ @HTTP_DOWNLOADER@.cpp \ DownloaderFactory.cpp \ DownloaderInterface.cpp \ FileCollector.cpp libCollect_la_CXXFLAGS = \ @MISC_CFLAGS@ \ -I$(top_srcdir)/Utils \ -I$(top_srcdir)/Tokenize \ -I$(top_srcdir)/Tokenize/filters \ @INDEX_CFLAGS@ @XML_CFLAGS@ @HTTP_CFLAGS@ \ @GLIBMM_CFLAGS@ pinot-1.22/Collect/NeonDownloader.cpp000066400000000000000000000222321470740426600176110ustar00rootroot00000000000000/* * Copyright 2005-2013 Fabrice Colin * * This program is free software; you can redistribute it and/or modify 
* it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include #include #include #include #include "Url.h" #include "HtmlFilter.h" #include "FilterUtils.h" #include "NeonDownloader.h" using namespace std; unsigned int NeonDownloader::m_initialized = 0; NeonDownloader::NeonDownloader() : DownloaderInterface() { if (m_initialized == 0) { // Initialize ne_sock_init(); ++m_initialized; } } NeonDownloader::~NeonDownloader() { --m_initialized; if (m_initialized == 0) { // Shutdown ne_sock_exit(); } } string NeonDownloader::handleRedirection(const char *pBody, unsigned int length) { if ((pBody == NULL) || (length == 0)) { return ""; } Document doc; Dijon::HtmlFilter htmlFilter("text/html"); set linksSet; doc.setData(pBody, length); // Extract the link from the 3xx message if ((FilterUtils::feedFilter(doc, &htmlFilter) == true) && (htmlFilter.get_links(linksSet) == true)) { // There should be one and only one if (linksSet.size() > 1) { #ifdef DEBUG clog << "NeonDownloader::handleRedirection: " << linksSet.size() << " links found in " << length << " bytes" << endl; clog << "NeonDownloader::handleRedirection: redirection message was " << pBody << endl; #endif return ""; } set::const_iterator iter = linksSet.begin(); if (iter != linksSet.end()) { // Update the URL return iter->m_url; } } return ""; } // // Implementation of DownloaderInterface // /// Retrieves the 
specified document; NULL if error. Document *NeonDownloader::retrieveUrl(const DocumentInfo &docInfo) { map headers; return retrieveUrl(docInfo, headers); } /// Retrieves the specified document; NULL if error. Document *NeonDownloader::retrieveUrl(const DocumentInfo &docInfo, const map &headers) { Document *pDocument = NULL; string url = Url::escapeUrl(docInfo.getLocation()); char *pContent = NULL; size_t contentLen = 0; int statusCode = 200; unsigned int redirectionsCount = 0; if (url.empty() == true) { #ifdef DEBUG clog << "NeonDownloader::retrieveUrl: no URL specified !" << endl; #endif return NULL; } Url urlObj(url); string protocol(urlObj.getProtocol()); string hostName(urlObj.getHost()); string location(urlObj.getLocation()); string file(urlObj.getFile()); string parameters(urlObj.getParameters()); string lastModifiedHeaderValue, locationHeaderValue, contentTypeHeaderValue; // Create a session ne_session *pSession = ne_session_create(protocol.c_str(), hostName.c_str(), 80); // urlObj.getPort()); if (pSession == NULL) { #ifdef DEBUG clog << "NeonDownloader::retrieveUrl: couldn't create session !" << endl; #endif return NULL; } // Set the user agent ne_set_useragent(pSession, m_userAgent.c_str()); // ...and the timeout ne_set_read_timeout(pSession, (int)m_timeout); // Is a proxy defined ? 
if ((m_proxyAddress.empty() == false) && (m_proxyPort > 0)) { // Type is HTTP ne_session_proxy(pSession, m_proxyAddress.c_str(), m_proxyPort); } string fullLocation = "/"; if (location.empty() == false) { fullLocation += location; } if (file.empty() == false) { if (location.empty() == false) { fullLocation += "/"; } fullLocation += file; } if (parameters.empty() == false) { fullLocation += "?"; fullLocation += parameters; } // Create a request for this URL ne_request *pRequest = NULL; if (m_method == "POST") { pRequest = ne_request_create(pSession, "POST", fullLocation.c_str()); if ((pRequest != NULL) && (m_postFields.empty() == false)) { ne_set_request_body_buffer(pRequest, m_postFields.c_str(), m_postFields.length()); } } else { pRequest = ne_request_create(pSession, "GET", fullLocation.c_str()); } if (pRequest == NULL) { #ifdef DEBUG clog << "NeonDownloader::retrieveUrl: couldn't create request !" << endl; #endif ne_session_destroy(pSession); return NULL; } #ifdef DEBUG clog << "NeonDownloader::retrieveUrl: request for " << fullLocation << " on " << hostName << endl; #endif int requestStatus = NE_RETRY; while (requestStatus == NE_RETRY) { lastModifiedHeaderValue.clear(); locationHeaderValue.clear(); contentTypeHeaderValue.clear(); // Begin the request requestStatus = ne_begin_request(pRequest); #ifdef DEBUG clog << "NeonDownloader::retrieveUrl: request begun with status " << requestStatus << endl; #endif if (requestStatus == NE_OK) { ssize_t bytesRead = 0; char buffer[1024]; // Get the status const ne_status *pStatus = ne_get_status(pRequest); if (pStatus != NULL) { statusCode = pStatus->code; #ifdef DEBUG clog << "NeonDownloader::retrieveUrl: status is " << statusCode << endl; #endif } else { // Assume all is well statusCode = 200; } // Read the content while ((bytesRead = ne_read_response_block(pRequest, buffer, 1024)) > 0) { char *pNewContent = (char*)realloc(pContent, contentLen + bytesRead); if (pNewContent == NULL) { free(pContent); pContent = NULL; contentLen = 0; break; } pContent = pNewContent; memcpy((void*)(pContent + contentLen), (const void*)buffer, bytesRead); 
contentLen += bytesRead; } // Get headers const char *pValue = ne_get_response_header(pRequest, "Last-Modified"); if (pValue != NULL) { lastModifiedHeaderValue = pValue; } pValue = ne_get_response_header(pRequest, "Location"); if (pValue != NULL) { locationHeaderValue = pValue; } pValue = ne_get_response_header(pRequest, "Content-Type"); if (pValue != NULL) { contentTypeHeaderValue = pValue; } // Redirection ? if ((statusCode >= 300) && (statusCode < 400) && (redirectionsCount < 10)) { ne_end_request(pRequest); ne_request_destroy(pRequest); pRequest = NULL; string documentUrl = handleRedirection(pContent, contentLen); if (documentUrl.empty() == true) { // Did we find a Location header ? if (locationHeaderValue.empty() == true) { // Fail free(pContent); pContent = NULL; contentLen = 0; break; } documentUrl = locationHeaderValue; } #ifdef DEBUG clog << "NeonDownloader::retrieveUrl: redirected to " << documentUrl << endl; #endif urlObj = Url(documentUrl); location = urlObj.getLocation(); file = urlObj.getFile(); // Is this on the same host ? if (hostName != urlObj.getHost()) { // No, it isn't hostName = urlObj.getHost(); // Create a new session ne_session_destroy(pSession); pSession = ne_session_create(protocol.c_str(), hostName.c_str(), 80); // urlObj.getPort()); if (pSession == NULL) { #ifdef DEBUG clog << "NeonDownloader::retrieveUrl: couldn't create session !" 
<< endl; #endif return NULL; } ne_set_useragent(pSession, m_userAgent.c_str()); ne_set_read_timeout(pSession, (int)m_timeout); } // Try again fullLocation = "/"; if (location.empty() == false) { fullLocation += location; fullLocation += "/"; } if (file.empty() == false) { fullLocation += file; } #ifdef DEBUG clog << "NeonDownloader::retrieveUrl: redirected to " << fullLocation << " on " << hostName << endl; #endif // Create a new request for this URL pRequest = ne_request_create(pSession, "GET", fullLocation.c_str()); if (pRequest == NULL) { #ifdef DEBUG clog << "NeonDownloader::retrieveUrl: couldn't create request !" << endl; #endif ne_session_destroy(pSession); return NULL; } redirectionsCount++; requestStatus = NE_RETRY; // Discard whatever content we have already got free(pContent); pContent = NULL; contentLen = 0; continue; } } // End the request requestStatus = ne_end_request(pRequest); } if ((pContent != NULL) && (contentLen > 0)) { if (statusCode < 400) { // Copy the document content pDocument = new Document(docInfo); pDocument->setData(pContent, contentLen); pDocument->setLocation(url); pDocument->setType(contentTypeHeaderValue); if (lastModifiedHeaderValue.empty() == false) { pDocument->setTimestamp(lastModifiedHeaderValue); } #ifdef DEBUG clog << "NeonDownloader::retrieveUrl: document size is " << contentLen << endl; #endif } free(pContent); } // Cleanup ne_request_destroy(pRequest); ne_session_destroy(pSession); return pDocument; } Document *NeonDownloader::putUrl(const DocumentInfo &docInfo, const map &headers, const string &url) { // FIXME: implement this return NULL; } pinot-1.22/Collect/NeonDownloader.h000066400000000000000000000035551470740426600172650ustar00rootroot00000000000000/* * Copyright 2005-2013 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any 
later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _NEON_DOWNLOADER_H #define _NEON_DOWNLOADER_H #include #include "DownloaderInterface.h" /// Wrapper around the neon API. class NeonDownloader : public DownloaderInterface { public: NeonDownloader(); virtual ~NeonDownloader(); /** * Retrieves the specified document. * NULL if error. Caller deletes. */ virtual Document *retrieveUrl(const DocumentInfo &docInfo); /** * Retrieves the specified document. * NULL if error. Caller deletes. */ virtual Document *retrieveUrl(const DocumentInfo &docInfo, const std::map &headers); /** * Puts the specified document at the given URL. * NULL if error. Caller deletes. */ virtual Document *putUrl(const DocumentInfo &docInfo, const std::map &headers, const std::string &url); protected: static unsigned int m_initialized; std::string handleRedirection(const char *pBody, unsigned int length); private: NeonDownloader(const NeonDownloader &other); NeonDownloader &operator=(const NeonDownloader &other); }; #endif // _NEON_DOWNLOADER_H pinot-1.22/Core/000077500000000000000000000000001470740426600134715ustar00rootroot00000000000000pinot-1.22/Core/DBusServerThreads.cpp000066400000000000000000000043751470740426600175450ustar00rootroot00000000000000/* * Copyright 2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include "config.h" #include "NLS.h" #include "DBusIndex.h" #include "DBusServerThreads.h" using namespace Glib; using namespace std; DBusEngineQueryThread::DBusEngineQueryThread(const RefPtr &refInvocation, const string &engineName, const string &engineDisplayableName, const string &engineOption, const QueryProperties &queryProps, unsigned int startDoc, bool simpleQuery, bool pinotCall) : EngineQueryThread(engineName, engineDisplayableName, engineOption, queryProps, startDoc), m_refInvocation(refInvocation), m_simpleQuery(simpleQuery), m_pinotCall(pinotCall) { stringstream queryNameStr; // Give the query a unique name queryNameStr << "DBUS " << m_id; m_queryProps.setName(queryNameStr.str()); } DBusEngineQueryThread::~DBusEngineQueryThread() { } string DBusEngineQueryThread::getType(void) const { return "DBusEngineQueryThread"; } RefPtr DBusEngineQueryThread::getInvocation(void) const { return m_refInvocation; } bool DBusEngineQueryThread::isSimpleQuery(void) const { return m_simpleQuery; } bool DBusEngineQueryThread::isPinotCall(void) const { return m_pinotCall; } DBusReloadThread::DBusReloadThread() { } DBusReloadThread::~DBusReloadThread() { } string DBusReloadThread::getType(void) const { return "DBusReloadThread"; } void DBusReloadThread::doWork(void) { // Nothing to do here, we just want to inform the daemon } pinot-1.22/Core/DBusServerThreads.h000066400000000000000000000041521470740426600172030ustar00rootroot00000000000000/* * Copyright 2021 Fabrice Colin * * This 
program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _DBUSSERVERTHREADS_HH #define _DBUSSERVERTHREADS_HH #include #include #include #include "WorkerThreads.h" class DBusEngineQueryThread : public EngineQueryThread { public: DBusEngineQueryThread(const Glib::RefPtr &refInvocation, const std::string &engineName, const std::string &engineDisplayableName, const std::string &engineOption, const QueryProperties &queryProps, unsigned int startDoc, bool simpleQuery, bool pinotCall = true); virtual ~DBusEngineQueryThread(); virtual std::string getType(void) const; Glib::RefPtr getInvocation(void) const; bool isSimpleQuery(void) const; bool isPinotCall(void) const; protected: Glib::RefPtr m_refInvocation; bool m_simpleQuery; bool m_pinotCall; private: DBusEngineQueryThread(const DBusEngineQueryThread &other); DBusEngineQueryThread &operator=(const DBusEngineQueryThread &other); }; class DBusReloadThread : public WorkerThread { public: DBusReloadThread(); virtual ~DBusReloadThread(); virtual std::string getType(void) const; protected: virtual void doWork(void); private: DBusReloadThread(const DBusReloadThread &other); DBusReloadThread &operator=(const DBusReloadThread &other); }; #endif // _DBUSSERVERTHREADS_HH pinot-1.22/Core/DaemonState.cpp000066400000000000000000001401011470740426600163760ustar00rootroot00000000000000/* * Copyright 
2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include "config.h" #include #include #include #include #include #include #include #include #include #include #ifdef HAVE_STATFS #ifdef HAVE_SYS_VFS_H #include #define CHECK_DISK_SPACE 1 #else #ifdef HAVE_SYS_STATFS_H #include #define CHECK_DISK_SPACE 1 #else #ifdef HAVE_SYS_MOUNT_H #if defined(__OpenBSD__) || (defined(__FreeBSD__) && (_FreeBSD_version < 700000)) #include #endif #include #define CHECK_DISK_SPACE 1 #endif #endif #endif #else #ifdef HAVE_STATVFS #include #define CHECK_DISK_SPACE 1 #endif #endif #ifdef __FreeBSD__ #ifdef HAVE_SYSCTLBYNAME #include #define CHECK_BATTERY_SYSCTL 1 #endif #endif #include #include #include #include #include #include #include #include #include #include #include "CommandLine.h" #include "Memory.h" #include "Url.h" #include "MonitorFactory.h" #include "CrawlHistory.h" #include "MetaDataBackup.h" #ifdef HAVE_DBUS #include "DBusIndex.h" #endif #include "DaemonState.h" #include "OnDiskHandler.h" #include "PinotSettings.h" #ifdef HAVE_DBUS #include "DBusServerThreads.h" #endif #include "ServerThreads.h" #define POWER_DBUS_SERVICE_NAME "org.freedesktop.UPower" #define POWER_DBUS_OBJECT_PATH "/org/freedesktop/UPower" using namespace std; using namespace Glib; static double getFSFreeSpace(const string &path) { double 
availableBlocks = 0.0; double blockSize = 0.0; int statSuccess = -1; #ifdef HAVE_STATFS struct statfs fsStats; statSuccess = statfs(path.c_str(), &fsStats); availableBlocks = (uintmax_t)fsStats.f_bavail; blockSize = fsStats.f_bsize; #else #ifdef HAVE_STATVFS struct statvfs vfsStats; statSuccess = statvfs(path.c_str(), &vfsStats); availableBlocks = (uintmax_t)vfsStats.f_bavail; // f_frsize isn't supported by all implementations blockSize = (vfsStats.f_frsize ? vfsStats.f_frsize : vfsStats.f_bsize); #endif #endif // Did it fail ? if ((statSuccess == -1) || (blockSize == 0.0)) { return -1.0; } double mbRatio = blockSize / (1024 * 1024); double availableMbSize = availableBlocks * mbRatio; #ifdef DEBUG clog << "DaemonState::getFSFreeSpace: " << availableBlocks << " blocks of " << blockSize << " bytes (" << mbRatio << ")" << endl; #endif return availableMbSize; } static string loadXMLDescription(void) { ifstream xmlFile; string xmlFileName(PREFIX); ustring xmlDescription; bool readFile = false; xmlFileName += "/share/pinot/pinot-dbus-daemon.xml"; xmlFile.open(xmlFileName.c_str()); if (xmlFile.good() == true) { xmlFile.seekg(0, ios::end); int length = xmlFile.tellg(); xmlFile.seekg(0, ios::beg); char *pXmlBuffer = new char[length + 1]; xmlFile.read(pXmlBuffer, length); if (xmlFile.fail() == false) { pXmlBuffer[length] = '\0'; xmlDescription = pXmlBuffer; readFile = true; } delete[] pXmlBuffer; } xmlFile.close(); if (readFile == false) { clog << "File " << xmlFileName << " couldn't be read" << endl; } return xmlDescription; } static void updateLabels(unsigned int docId, MetaDataBackup &metaData, IndexInterface *pIndex, set &labels, bool resetLabels) { DocumentInfo docInfo; if (pIndex == NULL) { return; } // If it's a reset, remove labels from the metadata backup if ((resetLabels == true) && (pIndex->getDocumentInfo(docId, docInfo) == true)) { metaData.deleteItem(docInfo, DocumentInfo::SERIAL_LABELS); } // Get the current labels
if (resetLabels == true) { labels.clear(); pIndex->getDocumentLabels(docId, labels); } docInfo.setLabels(labels); metaData.addItem(docInfo, DocumentInfo::SERIAL_LABELS); } // A function object to stop Crawler threads with for_each() struct StopCrawlerThreadFunc { public: void operator()(map::value_type &p) { string type(p.second->getType()); if (type == "CrawlerThread") { p.second->stop(); #ifdef DEBUG clog << "StopCrawlerThreadFunc: stopped thread " << p.second->getId() << endl; #endif } } }; #ifdef HAVE_DBUS DaemonState::DBusIntrospectHandler::DBusIntrospectHandler() : IntrospectableStub() { } DaemonState::DBusIntrospectHandler::~DBusIntrospectHandler() { } void DaemonState::DBusIntrospectHandler::Introspect(IntrospectableStub::MethodInvocation &invocation) { ustring xmlDescription(loadXMLDescription()); #ifdef DEBUG clog << "DaemonState::DBusIntrospectHandler::Introspect: called" << endl; #endif invocation.ret(xmlDescription); } DaemonState::DBusMessageHandler::DBusMessageHandler(DaemonState *pServer) : PinotStub(), m_pServer(pServer), m_flushTime(time(NULL)), m_mustQuit(false) { } DaemonState::DBusMessageHandler::~DBusMessageHandler() { } bool DaemonState::DBusMessageHandler::mustQuit(void) const { return m_mustQuit; } void DaemonState::DBusMessageHandler::GetStatistics(PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); CrawlHistory crawlHistory(settings.getHistoryDatabaseName()); unsigned int crawledFilesCount = crawlHistory.getItemsCount(CrawlHistory::CRAWLED); unsigned int docsCount = 0; bool lowDiskSpace = false, onBattery = false, crawling = false; #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::GetStatistics: called" << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } docsCount = pIndex->getDocumentsCount(); if
(m_pServer->is_flag_set(DaemonState::LOW_DISK_SPACE) == true) { lowDiskSpace = true; } if (m_pServer->is_flag_set(DaemonState::ON_BATTERY) == true) { onBattery = true; } if (m_pServer->is_flag_set(DaemonState::CRAWLING) == true) { crawling = true; } #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::GetStatistics: replying with " << crawledFilesCount << " " << docsCount << " " << lowDiskSpace << onBattery << crawling << endl; #endif invocation.ret(crawledFilesCount, docsCount, lowDiskSpace, onBattery, crawling); delete pIndex; } void DaemonState::DBusMessageHandler::Reload(PinotStub::MethodInvocation &invocation) { #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::Reload: called" << endl; #endif // Since reload takes place on threads end, fire this up m_pServer->start_thread(new DBusReloadThread()); invocation.ret(true); } void DaemonState::DBusMessageHandler::Stop(PinotStub::MethodInvocation &invocation) { #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::Stop: called" << endl; #endif m_pServer->set_flag(DaemonState::STOPPED); invocation.ret(EXIT_SUCCESS); m_mustQuit = true; } void DaemonState::DBusMessageHandler::GetDocumentInfo(guint32 docId, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); DocumentInfo docInfo; #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::GetDocumentInfo: called on " << docId << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } if (pIndex->getDocumentInfo(docId, docInfo) == true) { vector> tuples; DBusIndex::documentInfoToTuples(docInfo, tuples); invocation.ret(tuples); } else { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Unknown document"); invocation.ret(error); } delete pIndex; } void DaemonState::DBusMessageHandler::GetDocumentTermsCount(guint32 docId, MethodInvocation &invocation) { PinotSettings 
&settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::GetDocumentTermsCount: called on " << docId << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } unsigned int termsCount = pIndex->getDocumentTermsCount(docId); if (termsCount > 0) { invocation.ret(termsCount); } else { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Unknown document"); invocation.ret(error); } delete pIndex; } void DaemonState::DBusMessageHandler::GetDocumentTerms(guint32 docId, MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); map wordsBuffer; #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::GetDocumentTerms: called on " << docId << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } if (pIndex->getDocumentTerms(docId, wordsBuffer) == true) { vector termsList; termsList.reserve(wordsBuffer.size()); for (map::const_iterator termIter = wordsBuffer.begin(); termIter != wordsBuffer.end(); ++termIter) { termsList.push_back(termIter->second.c_str()); } invocation.ret(termsList); } else { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Unknown document"); invocation.ret(error); } delete pIndex; } void DaemonState::DBusMessageHandler::GetLabels(PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); set &labelsCache = settings.m_labels; vector labelsList; #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::GetLabels: called" << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } 
if (labelsCache.empty() == true) { pIndex->getLabels(labelsCache); } labelsList.reserve(labelsCache.size()); for (set::const_iterator labelIter = labelsCache.begin(); labelIter != labelsCache.end(); ++labelIter) { labelsList.push_back(labelIter->c_str()); } invocation.ret(labelsList); delete pIndex; } void DaemonState::DBusMessageHandler::AddLabel(const ustring &label, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); set &labelsCache = settings.m_labels; string labelName(label.c_str()); #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::AddLabel: called on " << label << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } if (labelsCache.empty() == true) { pIndex->getLabels(labelsCache); } // Is this a known label ? if (labelsCache.find(labelName) == labelsCache.end()) { pIndex->addLabel(labelName); m_pServer->set_flag(DaemonState::SHOULD_FLUSH); } invocation.ret(label); delete pIndex; } void DaemonState::DBusMessageHandler::DeleteLabel(const ustring &label, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); MetaDataBackup metaData(settings.getHistoryDatabaseName()); set &labelsCache = settings.m_labels; string labelName(label.c_str()); #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::DeleteLabel: called on " << label << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } if (labelsCache.empty() == true) { pIndex->getLabels(labelsCache); } // Is this a known label ? 
set::iterator labelIter = labelsCache.find(labelName); if ((labelIter != labelsCache.end()) && (pIndex->deleteLabel(labelName) == true)) { labelsCache.erase(labelIter); pIndex->setLabels(labelsCache, true); // Update the metadata backup metaData.deleteLabel(label.c_str()); m_pServer->set_flag(DaemonState::SHOULD_FLUSH); } invocation.ret(label); delete pIndex; } void DaemonState::DBusMessageHandler::HasLabel(guint32 docId, const ustring &label, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); set labels; #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::HasLabel: called on " << docId << " " << label << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } if (pIndex->hasLabel(docId, label.c_str()) == true) { invocation.ret(docId); } else { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Unknown document"); invocation.ret(error); } delete pIndex; } void DaemonState::DBusMessageHandler::GetDocumentLabels(guint32 docId, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); set labels; #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::GetDocumentLabels: called on " << docId << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } if (pIndex->getDocumentLabels(docId, labels) == true) { vector labelsList; labelsList.reserve(labels.size()); for (set::const_iterator labelIter = labels.begin(); labelIter != labels.end(); ++labelIter) { labelsList.push_back(labelIter->c_str()); } invocation.ret(labelsList); } else { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Unknown document"); invocation.ret(error); } delete pIndex; } void 
DaemonState::DBusMessageHandler::SetDocumentLabels(guint32 docId, const vector &labels, bool resetLabels, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); MetaDataBackup metaData(settings.getHistoryDatabaseName()); set &labelsCache = settings.m_labels; bool updateLabelsCache = false; #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::SetDocumentLabels: called on " << docId << ", " << labels.size() << " labels" << ", " << resetLabels << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } if (labelsCache.empty() == true) { pIndex->getLabels(labelsCache); } set labelsList; for (vector::const_iterator labelIter = labels.begin(); labelIter != labels.end(); ++labelIter) { string labelName(labelIter->c_str()); labelsList.insert(labelName); // Is this a known label ? if (labelsCache.find(labelName) == labelsCache.end()) { // No, it isn't but that's okay labelsCache.insert(labelName); updateLabelsCache = true; } } // Set labels if (pIndex->setDocumentLabels(docId, labelsList, resetLabels) == true) { if (updateLabelsCache == true) { pIndex->setLabels(labelsCache, true); } // Update the metadata backup updateLabels(docId, metaData, pIndex, labelsList, resetLabels); m_pServer->set_flag(DaemonState::SHOULD_FLUSH); invocation.ret(docId); } else { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Unknown document"); invocation.ret(error); } delete pIndex; } void DaemonState::DBusMessageHandler::SetDocumentsLabels(const vector &docIds, const vector &labels, bool resetLabels, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); MetaDataBackup metaData(settings.getHistoryDatabaseName()); set &labelsCache = settings.m_labels; bool updateLabelsCache 
= false; #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::SetDocumentsLabels: called on " << docIds.size() << " IDs, " << labels.size() << " labels" << ", " << resetLabels << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } if (labelsCache.empty() == true) { pIndex->getLabels(labelsCache); } set idsList; set labelsList; for (vector::const_iterator idIter = docIds.begin(); idIter != docIds.end(); ++idIter) { idsList.insert((unsigned int)atoi(idIter->c_str())); } for (vector::const_iterator labelIter = labels.begin(); labelIter != labels.end(); ++labelIter) { string labelName(labelIter->c_str()); labelsList.insert(labelName); // Is this a known label ? if (labelsCache.find(labelName) == labelsCache.end()) { // No, it isn't but that's okay labelsCache.insert(labelName); updateLabelsCache = true; } } // Set labels if (pIndex->setDocumentsLabels(idsList, labelsList, resetLabels) == true) { if (updateLabelsCache == true) { pIndex->setLabels(labelsCache, true); } for (set::const_iterator docIter = idsList.begin(); docIter != idsList.end(); ++docIter) { // Update the metadata backup updateLabels(*docIter, metaData, pIndex, labelsList, resetLabels); } m_pServer->set_flag(DaemonState::SHOULD_FLUSH); invocation.ret(resetLabels); } else { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Unknown documents"); invocation.ret(error); } delete pIndex; } void DaemonState::DBusMessageHandler::HasDocument(const ustring &url, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::HasDocument: called on " << url << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } // Check the index unsigned int docId = pIndex->hasDocument(url); 
if (docId > 0) { invocation.ret(docId); } else { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Unknown document"); invocation.ret(error); } delete pIndex; } void DaemonState::DBusMessageHandler::GetCloseTerms(const ustring &term, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); set terms; #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::GetCloseTerms: called on " << term << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } unsigned int termsCount = pIndex->getCloseTerms(term, terms); if (terms.empty() == false) { vector termsList; termsList.reserve(terms.size()); for (set::const_iterator termIter = terms.begin(); termIter != terms.end(); ++termIter) { termsList.push_back(*termIter); } invocation.ret(termsList); } else { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Unknown document"); invocation.ret(error); } delete pIndex; } void DaemonState::DBusMessageHandler::GetDocumentsCount(const ustring &label, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::GetDocumentsCount: called on " << label << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } unsigned int docsCount = pIndex->getDocumentsCount(label.c_str()); invocation.ret(docsCount); delete pIndex; } void DaemonState::DBusMessageHandler::ListDocuments(const ustring &term, guint32 termType, guint32 maxCount, guint32 startOffset, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); set docIds; #ifdef 
DEBUG clog << "DaemonState::DBusMessageHandler::ListDocuments: called on " << term << " " << termType << " " << maxCount << " " << startOffset << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } if (term.empty() == true) { pIndex->listDocuments(docIds, maxCount, startOffset); } else if (termType <= 3) { IndexInterface::NameType type = (IndexInterface::NameType)termType; pIndex->listDocuments(term.c_str(), docIds, type, maxCount, startOffset); } else { Gio::DBus::Error error(Gio::DBus::Error::INVALID_ARGS, "Type is not supported"); invocation.ret(error); delete pIndex; return; } vector docIdsList; docIdsList.reserve(docIds.size()); for (set::const_iterator docIter = docIds.begin(); docIter != docIds.end(); ++docIter) { stringstream docIdStr; docIdStr << *docIter; docIdsList.push_back(docIdStr.str().c_str()); } invocation.ret(docIdsList); delete pIndex; } void DaemonState::DBusMessageHandler::UpdateDocument(guint32 docId, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); DocumentInfo docInfo; #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::UpdateDocument: called on " << docId << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } if (pIndex->getDocumentInfo(docId, docInfo) == true) { // Update document m_pServer->queue_index(docInfo); invocation.ret(docId); } else { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Unknown document"); invocation.ret(error); } delete pIndex; } void DaemonState::DBusMessageHandler::SetDocumentInfo(guint32 docId, const vector> &fields, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); MetaDataBackup
metaData(settings.getHistoryDatabaseName()); DocumentInfo docInfo; #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::SetDocumentInfo: called on " << docId << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } DBusIndex::documentInfoFromTuples(fields, docInfo); // Update the document info if (pIndex->updateDocumentInfo(docId, docInfo) == true) { // Update the metadata backup metaData.addItem(docInfo, DocumentInfo::SERIAL_FIELDS); m_pServer->set_flag(DaemonState::SHOULD_FLUSH); invocation.ret(docId); } else { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Unknown document"); invocation.ret(error); } delete pIndex; } void DaemonState::DBusMessageHandler::Query(const ustring &engineType, const ustring &engineName, const ustring &searchText, guint32 startDoc, guint32 maxHits, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::Query: called on " << searchText << ", " << startDoc << "/" << maxHits << endl; #endif if (searchText.empty() == true) { Gio::DBus::Error error(Gio::DBus::Error::INVALID_ARGS, "Query is not set"); invocation.ret(error); return; } DBusEngineQueryThread *pEngineQueryThread = NULL; QueryProperties queryProps("", searchText.c_str()); queryProps.setMaximumResultsCount(maxHits); // Provide reasonable defaults if ((engineType.empty() == true) && (engineName.empty() == true)) { pEngineQueryThread = new DBusEngineQueryThread(invocation.getMessage(), settings.m_defaultBackend, settings.m_defaultBackend, settings.m_daemonIndexLocation, queryProps, startDoc, false); } else { pEngineQueryThread = new DBusEngineQueryThread(invocation.getMessage(), engineType.c_str(), engineType.c_str(), engineName, queryProps, startDoc, false); } m_pServer->start_thread(pEngineQueryThread); } void DaemonState::DBusMessageHandler::SimpleQuery(const ustring &searchText, guint32 
maxHits, PinotStub::MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::SimpleQuery: called on " << searchText << ", " << maxHits << endl; #endif if (searchText.empty() == true) { Gio::DBus::Error error(Gio::DBus::Error::INVALID_ARGS, "Query is not set"); invocation.ret(error); return; } QueryProperties queryProps("", searchText.c_str()); queryProps.setMaximumResultsCount(maxHits); m_pServer->start_thread(new DBusEngineQueryThread(invocation.getMessage(), settings.m_defaultBackend, settings.m_defaultBackend, settings.m_daemonIndexLocation, queryProps, 0, true)); } bool DaemonState::DBusMessageHandler::DaemonVersion_setHandler(const ustring &value) { return true; } ustring DaemonState::DBusMessageHandler::DaemonVersion_get() { #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::DaemonVersion_get: called" << endl; #endif return PACKAGE_VERSION; } bool DaemonState::DBusMessageHandler::IndexFlushEpoch_setHandler(guint32 value) { m_flushTime = (time_t)value; return true; } guint32 DaemonState::DBusMessageHandler::IndexFlushEpoch_get() { #ifdef DEBUG clog << "DaemonState::DBusMessageHandler::IndexFlushEpoch_get: called on " << m_flushTime << endl; #endif return (unsigned int)m_flushTime; } DaemonState::DBusSearchProvider::DBusSearchProvider(DaemonState *pServer) : SearchProvider2Stub(), m_pServer(pServer) { } DaemonState::DBusSearchProvider::~DBusSearchProvider() { } void DaemonState::DBusSearchProvider::GetInitialResultSet(const vector &terms, MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); string searchText; for (vector::const_iterator termIter = terms.begin(); termIter != terms.end(); ++termIter) { if (searchText.empty() == false) { searchText += " "; } searchText += termIter->c_str(); } #ifdef DEBUG clog << "DaemonState::DBusSearchProvider::GetInitialResultSet: called on " << searchText << endl; #endif QueryProperties queryProps("", 
searchText.c_str()); queryProps.setMaximumResultsCount(10); // The caller expects the same output as that of SimpleQuery m_pServer->start_thread(new DBusEngineQueryThread(invocation.getMessage(), settings.m_defaultBackend, settings.m_defaultBackend, settings.m_daemonIndexLocation, queryProps, 0, true, false)); } void DaemonState::DBusSearchProvider::GetSubsearchResultSet(const vector &previous_results, const vector &terms, MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); string searchText; for (vector::const_iterator termIter = terms.begin(); termIter != terms.end(); ++termIter) { if (searchText.empty() == false) { searchText += " "; } searchText += termIter->c_str(); } #ifdef DEBUG clog << "DaemonState::DBusSearchProvider::GetSubsearchResultSet: called on " << searchText << endl; #endif QueryProperties queryProps("", searchText.c_str()); queryProps.setMaximumResultsCount(10); // The caller expects the same output as that of SimpleQuery // FIXME: is this meant to return only a subset of previous_results? 
m_pServer->start_thread(new DBusEngineQueryThread(invocation.getMessage(), settings.m_defaultBackend, settings.m_defaultBackend, settings.m_daemonIndexLocation, queryProps, 0, true, false)); } void DaemonState::DBusSearchProvider::GetResultMetas(const vector &identifiers, MethodInvocation &invocation) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); vector> idsToDictionary; #ifdef DEBUG clog << "DaemonState::DBusSearchProvider::GetResultMetas: called on " << identifiers.size() << " IDs" << endl; #endif if (pIndex == NULL) { Gio::DBus::Error error(Gio::DBus::Error::FAILED, "Couldn't open index"); invocation.ret(error); return; } for (vector::const_iterator idIter = identifiers.begin(); idIter != identifiers.end(); ++idIter) { DocumentInfo docInfo; unsigned int docId = atoi(idIter->c_str()); if (pIndex->getDocumentInfo(docId, docInfo) == true) { Url urlObj(docInfo.getLocation()); RefPtr typeIcon = Gio::content_type_get_icon(docInfo.getType().c_str()); map docDictionary; string location(docInfo.getLocation()); ustring name(docInfo.getTitle().c_str()); if ((urlObj.getProtocol() == "file") && (location.length() > 7)) { location.erase(0, 7); } name += " - "; name += location; #ifdef DEBUG clog << "DaemonState::DBusSearchProvider::GetResultMetas: " << docId << " " << docInfo.getType() << endl; #endif docDictionary.insert(pair("id", Variant::create(*idIter))); docDictionary.insert(pair("name", Variant::create(name))); docDictionary.insert(pair("gicon", Variant::create(typeIcon->to_string()))); docDictionary.insert(pair("description", Variant::create(docInfo.getExtract()))); idsToDictionary.push_back(docDictionary); } #ifdef DEBUG else clog << "DaemonState::DBusSearchProvider::GetResultMetas: no document for " << docId << endl; #endif } invocation.ret(idsToDictionary); delete pIndex; } void DaemonState::DBusSearchProvider::ActivateResult(const ustring &identifier, const vector &terms, 
    guint32 timestamp, MethodInvocation &invocation)
{
    PinotSettings &settings = PinotSettings::getInstance();
    IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation);
    DocumentInfo docInfo;
    unsigned int docId = atoi(identifier.c_str());

#ifdef DEBUG
    clog << "DaemonState::DBusSearchProvider::ActivateResult: called on " << identifier << endl;
#endif
    if (pIndex == NULL)
    {
        invocation.ret();
        return;
    }

    if (pIndex->getDocumentInfo(docId, docInfo) == true)
    {
        RefPtr<Gio::AppInfo> defaultApp = Gio::AppInfo::get_default_for_type(docInfo.getType().c_str());
        RefPtr<Gio::AppLaunchContext> launchContext;
        vector<string> uris;

        uris.push_back(docInfo.getLocation());

        // There may be no default application for this MIME type
        if (defaultApp)
        {
            defaultApp->launch_uris_async(uris, launchContext);
        }
    }

    invocation.ret();

    delete pIndex;
}

void DaemonState::DBusSearchProvider::LaunchSearch(const vector<ustring> &terms,
    guint32 timestamp, MethodInvocation &invocation)
{
#ifdef DEBUG
    clog << "DaemonState::DBusSearchProvider::LaunchSearch: called on " << terms.size() << " terms" << endl;
#endif
    string queryTerms;

    for (vector<ustring>::const_iterator termIter = terms.begin();
        termIter != terms.end(); ++termIter)
    {
        if (queryTerms.empty() == false)
        {
            queryTerms += " ";
        }
        queryTerms += termIter->c_str();
    }

    // Open the UI with those query terms
    if (queryTerms.empty() == false)
    {
        MIMEAction queryAction("pinot", string("pinot -q ") + CommandLine::quote(queryTerms));
        vector<string> arguments;

#ifdef DEBUG
        clog << "DaemonState::DBusSearchProvider::LaunchSearch: running " << queryAction.m_exec << endl;
#endif
        CommandLine::runAsync(queryAction, arguments);
    }

    invocation.ret();
}
#endif

DaemonState::DaemonState() :
    QueueManager(PinotSettings::getInstance().m_daemonIndexLocation),
#ifdef HAVE_DBUS
    m_refSessionBus(Gio::DBus::Connection::get_sync(Gio::DBus::BUS_TYPE_SESSION)),
    m_introspectionHandler(),
    m_messageHandler(this),
    m_searchProvider(this),
    m_powerProxy(Gio::DBus::Proxy::create_for_bus_sync(Gio::DBus::BUS_TYPE_SYSTEM,
        POWER_DBUS_SERVICE_NAME, POWER_DBUS_OBJECT_PATH, POWER_DBUS_SERVICE_NAME, {}, {},
Gio::DBus::PROXY_FLAGS_DO_NOT_AUTO_START_AT_CONSTRUCTION)), m_connectionId(0), #endif m_isReindex(false), m_tryReload(false), m_readyToReload(false), m_crawlHistory(PinotSettings::getInstance().getHistoryDatabaseName()), m_pDiskMonitor(MonitorFactory::getMonitor()), m_pDiskHandler(NULL), m_crawlers(0) { FD_ZERO(&m_flagsSet); // Check disk usage every minute m_timeoutConnection = signal_timeout().connect(sigc::mem_fun(*this, &DaemonState::on_activity_timeout), 60000); #ifndef CHECK_BATTERY_SYSCTL #ifdef HAVE_DBUS // Listen for battery property changes m_powerProxy->signal_properties_changed(). connect(sigc::mem_fun(this, &DaemonState::handle_power_properties_changed)); #endif #endif // Check right now before doing anything else DaemonState::on_activity_timeout(); m_onThreadEndSignal.connect(sigc::mem_fun(*this, &DaemonState::on_thread_end)); } DaemonState::~DaemonState() { // Don't delete m_pDiskMonitor and m_pDiskHandler, threads may need them // Since DaemonState is destroyed when the program exits, it's a leak we can live with } void DaemonState::disconnect(void) { QueueManager::disconnect(); #ifdef HAVE_DBUS if (m_connectionId > 0) { Gio::DBus::unown_name(m_connectionId); } #endif } #ifdef HAVE_DBUS void DaemonState::handle_power_properties_changed(const Gio::DBus::Proxy::MapChangedProperties &changed_properties, const vector &invalidated_properties) { if (changed_properties.find("OnBattery") != changed_properties.cend()) { check_battery_state(); } } #endif bool DaemonState::on_activity_timeout(void) { if (m_timeoutConnection.blocked() == false) { #ifdef CHECK_DISK_SPACE double availableMbSize = getFSFreeSpace(PinotSettings::getInstance().m_daemonIndexLocation); if (availableMbSize >= 0) { #ifdef DEBUG clog << "DaemonState::on_activity_timeout: " << availableMbSize << " Mb free for " << PinotSettings::getInstance().m_daemonIndexLocation << endl; #endif if (availableMbSize < PinotSettings::getInstance().m_minimumDiskSpace) { // Stop indexing m_stopIndexing = 
true; // Stop crawling set_flag(LOW_DISK_SPACE); stop_crawling(); clog << "Stopped indexing because of low disk space" << endl; } else if (m_stopIndexing == true) { // Go ahead m_stopIndexing = false; reset_flag(LOW_DISK_SPACE); clog << "Resumed indexing following low disk space condition" << endl; } } #endif #ifdef CHECK_BATTERY_SYSCTL // Check the battery state too check_battery_state(); #endif if ((get_threads_count() == 0) && (is_flag_set(SHOULD_FLUSH) == true)) { // Do the actual flush here reset_flag(SHOULD_FLUSH); flush_and_reclaim(); } #ifdef HAVE_DBUS if (m_messageHandler.mustQuit() == true) { // Disconnect the timeout signal if (m_timeoutConnection.connected() == true) { m_timeoutConnection.block(); m_timeoutConnection.disconnect(); } m_signalQuit(0); } #endif } return true; } void DaemonState::check_battery_state(void) { bool wasOnBattery = is_flag_set(ON_BATTERY); bool onBattery = false; #ifdef CHECK_BATTERY_SYSCTL int acline = 1; size_t len = sizeof(acline); // Are we on battery power ? if (sysctlbyname("hw.acpi.acline", &acline, &len, NULL, 0) != 0) { return; } #ifdef DEBUG clog << "DaemonState::check_battery_state: acline " << acline << endl; #endif if (acline == 0) { onBattery = true; } #else #ifdef HAVE_DBUS Variant boolValue; m_powerProxy->get_cached_property(boolValue, "OnBattery"); onBattery = boolValue.get(); #endif #endif if (onBattery != wasOnBattery) { if (onBattery == true) { // We are now on battery set_flag(ON_BATTERY); stop_crawling(); clog << "System is now on battery" << endl; } else { // Back on-line reset_flag(ON_BATTERY); start_crawling(); clog << "System is now on AC" << endl; } } } bool DaemonState::crawl_location(const PinotSettings::IndexableLocation &location) { CrawlerThread *pCrawlerThread = NULL; bool inlineIndexing = false; // Can we go ahead and crawl ? 
if ((is_flag_set(LOW_DISK_SPACE) == true) || (is_flag_set(ON_BATTERY) == true)) { #ifdef DEBUG clog << "DaemonState::crawl_location: crawling was stopped" << endl; #endif return false; } if (location.m_name.empty() == true) { return false; } if (m_maxIndexThreads < 2) { inlineIndexing = true; } if (location.m_monitor == false) { // Monitoring is not necessary, but we still have to pass the handler // so that we can act on documents that have been deleted pCrawlerThread = new CrawlerThread(location.m_name, location.m_isSource, NULL, m_pDiskHandler, inlineIndexing); } else { pCrawlerThread = new CrawlerThread(location.m_name, location.m_isSource, m_pDiskMonitor, m_pDiskHandler, inlineIndexing); } pCrawlerThread->getFileFoundSignal().connect(sigc::mem_fun(*this, &DaemonState::on_message_filefound)); if (start_thread(pCrawlerThread, true) == true) { ++m_crawlers; set_flag(CRAWLING); return true; } return false; } void DaemonState::register_session(void) { #ifdef HAVE_DBUS m_connectionId = Gio::DBus::own_name( Gio::DBus::BUS_TYPE_SESSION, PINOT_DBUS_SERVICE_NAME, [&](const RefPtr &connection, const ustring & /* name */) { guint introId = m_introspectionHandler.register_object(m_refSessionBus, PINOT_DBUS_OBJECT_PATH); guint messageId = m_messageHandler.register_object(m_refSessionBus, PINOT_DBUS_OBJECT_PATH); guint searchId = m_searchProvider.register_object(m_refSessionBus, PINOT_DBUS_OBJECT_PATH); #ifdef DEBUG clog << "DaemonState::register_object: registered on " << PINOT_DBUS_OBJECT_PATH << " with IDs " << introId << " " << messageId << " " << searchId << endl; #endif }, [&](const RefPtr &connection, const ustring &name) { #ifdef DEBUG clog << "DaemonState::register_object: acquired " << name << endl; #endif }, [&](const RefPtr &connection, const ustring &name) { #ifdef DEBUG clog << "DaemonState::register_object: lost " << name << endl; #endif mustQuit(true); } ); #endif } void DaemonState::start(bool isReindex) { // Disable implicit flushing after a change 
WorkerThread::immediateFlush(false); m_isReindex = isReindex; // Fire up the disk monitor thread if (m_pDiskHandler == NULL) { OnDiskHandler *pDiskHandler = new OnDiskHandler(); pDiskHandler->getFileFoundSignal().connect(sigc::mem_fun(*this, &DaemonState::on_message_filefound)); m_pDiskHandler = pDiskHandler; } HistoryMonitorThread *pDiskMonitorThread = new HistoryMonitorThread(m_pDiskMonitor, m_pDiskHandler); start_thread(pDiskMonitorThread, true); for (set::const_iterator locationIter = PinotSettings::getInstance().m_indexableLocations.begin(); locationIter != PinotSettings::getInstance().m_indexableLocations.end(); ++locationIter) { m_crawlQueue.push(*locationIter); } #ifdef DEBUG clog << "DaemonState::start: " << m_crawlQueue.size() << " locations to crawl" << endl; #endif // Update all items status so that we can get rid of files from deleted sources m_crawlHistory.updateItemsStatus(CrawlHistory::CRAWLING, CrawlHistory::TO_CRAWL, 0, true); m_crawlHistory.updateItemsStatus(CrawlHistory::CRAWLED, CrawlHistory::TO_CRAWL, 0, true); m_crawlHistory.updateItemsStatus(CrawlHistory::CRAWL_ERROR, CrawlHistory::TO_CRAWL, 0, true); // Initiate crawling start_crawling(); } bool DaemonState::start_crawling(void) { bool startedCrawler = false; if (write_lock_lists() == true) { #ifdef DEBUG clog << "DaemonState::start_crawling: " << m_crawlQueue.size() << " locations to crawl, " << m_crawlers << " crawlers" << endl; #endif // Get the next location, unless something is still being crawled if (m_crawlers == 0) { reset_flag(CRAWLING); if (m_crawlQueue.empty() == false) { PinotSettings::IndexableLocation nextLocation(m_crawlQueue.front()); startedCrawler = crawl_location(nextLocation); } else { set deletedFiles; // All files left with status TO_CRAWL belong to deleted sources if ((m_pDiskHandler != NULL) && (m_crawlHistory.getItems(CrawlHistory::TO_CRAWL, deletedFiles) > 0)) { #ifdef DEBUG clog << "DaemonState::start_crawling: " << deletedFiles.size() << " orphaned files" << 
                    endl;
#endif
                    for (set<string>::const_iterator fileIter = deletedFiles.begin();
                        fileIter != deletedFiles.end(); ++fileIter)
                    {
#ifdef DEBUG
                        clog << "DaemonState::start_crawling: " << *fileIter << " was not found" << endl;
#endif
                        // Inform the MonitorHandler
                        m_pDiskHandler->fileDeleted(fileIter->substr(7));

                        // Delete this item
                        m_crawlHistory.deleteItem(*fileIter);
                    }
                }
            }
        }

        unlock_lists();
    }

    return startedCrawler;
}

void DaemonState::stop_crawling(void)
{
    if (write_lock_threads() == true)
    {
        if (m_threads.empty() == false)
        {
            // Stop all Crawler threads
            for_each(m_threads.begin(), m_threads.end(), StopCrawlerThreadFunc());
        }

        unlock_threads();
    }
}

void DaemonState::on_thread_end(WorkerThread *pThread)
{
    string indexedUrl;
    bool restoreMetadata = false;

    if (pThread == NULL)
    {
        return;
    }

    string type(pThread->getType());
    bool isStopped = pThread->isStopped();
#ifdef DEBUG
    clog << "DaemonState::on_thread_end: end of thread " << type << " " << pThread->getId() << endl;
#endif

    // What type of thread was it ?
    if (type == "CrawlerThread")
    {
        CrawlerThread *pCrawlerThread = dynamic_cast<CrawlerThread *>(pThread);
        if (pCrawlerThread == NULL)
        {
            delete pThread;
            return;
        }
        --m_crawlers;
#ifdef DEBUG
        clog << "DaemonState::on_thread_end: done crawling " << pCrawlerThread->getDirectory() << endl;
#endif

        if (isStopped == false)
        {
            // Pop the queue
            m_crawlQueue.pop();

            restoreMetadata = true;

            set_flag(DaemonState::SHOULD_FLUSH);
        }
        // Else, the directory wasn't fully crawled so better leave it in the queue

        start_crawling();
    }
    else if (type == "IndexingThread")
    {
        IndexingThread *pIndexThread = dynamic_cast<IndexingThread *>(pThread);
        if (pIndexThread == NULL)
        {
            delete pThread;
            return;
        }

        // Get the URL we have just indexed
        indexedUrl = pIndexThread->getURL();

        // Did it fail ?
int errorNum = pThread->getErrorNum(); if ((errorNum > 0) && (indexedUrl.empty() == false)) { // An entry should already exist for this m_crawlHistory.updateItem(indexedUrl, CrawlHistory::CRAWL_ERROR, time(NULL), errorNum); } } else if (type == "UnindexingThread") { // FIXME: anything to do ? } else if (type == "MonitorThread") { if (m_readyToReload == true) { PinotSettings &settings = PinotSettings::getInstance(); m_readyToReload = false; // Stop monitoring all locations if (m_pDiskMonitor != NULL) { for (set::const_iterator locationIter = settings.m_indexableLocations.begin(); locationIter != settings.m_indexableLocations.end(); ++locationIter) { if (locationIter->m_monitor == true) { #ifdef DEBUG clog << "DaemonState::on_thread_end: unmonitoring all under " << locationIter->m_name << endl; #endif m_pDiskMonitor->removeLocations(locationIter->m_name); } } m_pDiskMonitor->dropPendingEvents(); } // Reload settings settings.clear(); settings.load(PinotSettings::LOAD_ALL); // ...and restart everything start(false); } } #ifdef HAVE_DBUS else if (type == "DBusEngineQueryThread") { DBusEngineQueryThread *pEngineQueryThread = dynamic_cast(pThread); if (pEngineQueryThread == NULL) { delete pThread; return; } RefPtr refInvocation = pEngineQueryThread->getInvocation(); const vector &documentsList = pEngineQueryThread->getDocuments(); unsigned int documentsCount = pEngineQueryThread->getDocumentsCount(); bool simpleQuery = pEngineQueryThread->isSimpleQuery(); bool pinotCall = pEngineQueryThread->isPinotCall(); vector idsList; vector>> docTuples; for (vector::const_iterator docIter = documentsList.begin(); docIter != documentsList.end(); ++docIter) { unsigned int indexId = 0; unsigned int docId = docIter->getIsIndexed(indexId); if (simpleQuery == false) { vector> tuples; // The document ID isn't needed here DBusIndex::documentInfoToTuples(*docIter, tuples); docTuples.push_back(tuples); } else if (docId > 0) { stringstream docIdStr; // We only need the document ID docIdStr << 
                docId;
                idsList.push_back(docIdStr.str().c_str());
            }
        }

        if (pinotCall == true)
        {
            com::github::fabricecolin::PinotStub::MethodInvocation pinotInvocation(refInvocation);

            if (simpleQuery == false)
            {
                pinotInvocation.ret(documentsCount, docTuples);
            }
            else
            {
                pinotInvocation.ret(idsList);
            }
        }
        else
        {
            org::gnome::Shell::SearchProvider2Stub::MethodInvocation shellInvocation(refInvocation);

            shellInvocation.ret(idsList);
        }
    }
#endif
    else if (type == "DBusReloadThread")
    {
        m_tryReload = true;
#ifdef DEBUG
        clog << "DaemonState::on_thread_end: will try to reload" << endl;
#endif
    }
    else if (type == "RestoreMetaDataThread")
    {
        set_flag(DaemonState::SHOULD_FLUSH);
    }

    // Delete the thread
    delete pThread;

    // Wait until there are no threads running (except background ones)
    // to reload the configuration
    if ((m_tryReload == true) &&
        (get_threads_count() == 0))
    {
#ifdef DEBUG
        clog << "DaemonState::on_thread_end: stopping all threads" << endl;
#endif
        // Reload when MonitorThread stops
        m_tryReload = false;
        m_readyToReload = true;

        // Stop background threads
        stop_threads();
        // ...clear the queues
        clear_queues();
    }
    else if (isStopped == false)
    {
        // Try to run a queued action unless threads were stopped
        bool emptyQueue = pop_queue(indexedUrl);

        // Wait until there are no threads running (except background ones)
        // and the queue is empty to flush the index
        if ((restoreMetadata == true) &&
            (emptyQueue == true) &&
            (get_threads_count() == 0))
        {
            if ((m_isReindex == true) &&
                (m_crawlQueue.empty() == true))
            {
                // Restore metadata on documents and flush when the thread returns
                start_thread(new RestoreMetaDataThread());
            }
        }
    }
}

void DaemonState::on_message_filefound(DocumentInfo docInfo, bool isDirectory)
{
    if (isDirectory == false)
    {
        queue_index(docInfo);
    }
    else
    {
        PinotSettings::IndexableLocation newLocation;

        newLocation.m_monitor = true;
        newLocation.m_name = docInfo.getLocation().substr(7);
        newLocation.m_isSource = false;
#ifdef DEBUG
        clog << "DaemonState::on_message_filefound: new directory " <<
            newLocation.m_name << endl;
#endif

        // Queue this directory for crawling
        m_crawlQueue.push(newLocation);
        start_crawling();
    }
}

sigc::signal1<void, int>& DaemonState::getQuitSignal(void)
{
    return m_signalQuit;
}

void DaemonState::set_flag(StatusFlag flag)
{
    FD_SET((int)flag, &m_flagsSet);
}

bool DaemonState::is_flag_set(StatusFlag flag)
{
    if (FD_ISSET((int)flag, &m_flagsSet))
    {
        return true;
    }

    return false;
}

void DaemonState::reset_flag(StatusFlag flag)
{
    FD_CLR((int)flag, &m_flagsSet);
}

void DaemonState::flush_and_reclaim(void)
{
    IndexInterface *pIndex = PinotSettings::getInstance().getIndex(PinotSettings::getInstance().m_daemonIndexLocation);
    if (pIndex != NULL)
    {
        // Flush
        pIndex->flush();

#ifdef HAVE_DBUS
        // Update the DBus property
        m_messageHandler.IndexFlushEpoch_set((unsigned int)time(NULL));
#endif

        delete pIndex;
    }

    int inUse = Memory::getUsage();
    Memory::reclaim();
}

pinot-1.22/Core/DaemonState.h

/*
 *  Copyright 2005-2021 Fabrice Colin
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*/ #ifndef _DAEMONSTATE_HH #define _DAEMONSTATE_HH #include #include #include #include #include #include #include #ifdef HAVE_DBUS #include #include #endif #include "CrawlHistory.h" #ifdef HAVE_DBUS #include "DBusIndex.h" #endif #include "MonitorInterface.h" #include "MonitorHandler.h" #include "PinotSettings.h" #include "PinotDBus_stub.h" #include "SearchProvider_stub.h" #include "WorkerThreads.h" class DaemonState : public QueueManager { public: DaemonState(); virtual ~DaemonState(); typedef enum { LOW_DISK_SPACE = 0, ON_BATTERY, CRAWLING, STOPPED, DISCONNECTED, SHOULD_FLUSH } StatusFlag; virtual void disconnect(void); void register_session(void); void start(bool isReindex); bool start_crawling(void); void stop_crawling(void); void on_thread_end(WorkerThread *pThread); void on_message_filefound(DocumentInfo docInfo, bool isDirectory); sigc::signal1& getQuitSignal(void); void set_flag(StatusFlag flag); bool is_flag_set(StatusFlag flag); void reset_flag(StatusFlag flag); void flush_and_reclaim(void); protected: #ifdef HAVE_DBUS class DBusIntrospectHandler : public org::freedesktop::DBus::IntrospectableStub { public: DBusIntrospectHandler(); virtual ~DBusIntrospectHandler(); protected: virtual void Introspect(IntrospectableStub::MethodInvocation &invocation); }; class DBusMessageHandler : public com::github::fabricecolin::PinotStub { public: DBusMessageHandler(DaemonState *pServer); virtual ~DBusMessageHandler(); bool mustQuit(void) const; protected: DaemonState *m_pServer; time_t m_flushTime; bool m_mustQuit; virtual void GetStatistics(PinotStub::MethodInvocation &invocation); virtual void Reload(PinotStub::MethodInvocation &invocation); virtual void Stop(PinotStub::MethodInvocation &invocation); virtual void GetDocumentInfo(guint32 docId, PinotStub::MethodInvocation &invocation); virtual void GetDocumentTermsCount(guint32 docId, PinotStub::MethodInvocation &invocation); virtual void GetDocumentTerms(guint32 docId, PinotStub::MethodInvocation &invocation); virtual 
void GetLabels(PinotStub::MethodInvocation &invocation); virtual void AddLabel(const Glib::ustring &label, PinotStub::MethodInvocation &invocation); virtual void DeleteLabel(const Glib::ustring &label, PinotStub::MethodInvocation &invocation); virtual void HasLabel(guint32 docId, const Glib::ustring &label, PinotStub::MethodInvocation &invocation); virtual void GetDocumentLabels(guint32 docId, PinotStub::MethodInvocation &invocation); virtual void SetDocumentLabels(guint32 docId, const std::vector &labels, bool resetLabels, PinotStub::MethodInvocation &invocation); virtual void SetDocumentsLabels(const std::vector &docIds, const std::vector &labels, bool resetLabels, PinotStub::MethodInvocation &invocation); virtual void HasDocument(const Glib::ustring &url, PinotStub::MethodInvocation &invocation); virtual void GetCloseTerms(const Glib::ustring &term, PinotStub::MethodInvocation &invocation); virtual void GetDocumentsCount(const Glib::ustring &label, PinotStub::MethodInvocation &invocation); virtual void ListDocuments(const Glib::ustring &term, guint32 termType, guint32 maxCount, guint32 startOffset, PinotStub::MethodInvocation &invocation); virtual void UpdateDocument(guint32 docId, PinotStub::MethodInvocation &invocation); virtual void SetDocumentInfo(guint32 docId, const std::vector> &fields, PinotStub::MethodInvocation &invocation); virtual void Query(const Glib::ustring &engineType, const Glib::ustring &engineName, const Glib::ustring &searchText, guint32 startDoc, guint32 maxHits, PinotStub::MethodInvocation &invocation); virtual void SimpleQuery(const Glib::ustring &searchText, guint32 maxHits, PinotStub::MethodInvocation &invocation); virtual bool DaemonVersion_setHandler(const Glib::ustring &value); virtual Glib::ustring DaemonVersion_get(); virtual bool IndexFlushEpoch_setHandler(guint32 value); virtual guint32 IndexFlushEpoch_get(); }; class DBusSearchProvider : public org::gnome::Shell::SearchProvider2Stub { public: DBusSearchProvider(DaemonState 
*pServer); virtual ~DBusSearchProvider(); protected: DaemonState *m_pServer; virtual void GetInitialResultSet(const std::vector &terms, MethodInvocation &invocation); virtual void GetSubsearchResultSet(const std::vector &previous_results, const std::vector &terms, MethodInvocation &invocation); virtual void GetResultMetas(const std::vector &identifiers, MethodInvocation &invocation); virtual void ActivateResult(const Glib::ustring &identifier, const std::vector &terms, guint32 timestamp, MethodInvocation &invocation); virtual void LaunchSearch(const std::vector &terms, guint32 timestamp, MethodInvocation &invocation); }; Glib::RefPtr m_refSessionBus; DBusIntrospectHandler m_introspectionHandler; DBusMessageHandler m_messageHandler; DBusSearchProvider m_searchProvider; Glib::RefPtr m_powerProxy; guint m_connectionId; #endif bool m_isReindex; bool m_tryReload; bool m_readyToReload; fd_set m_flagsSet; CrawlHistory m_crawlHistory; MonitorInterface *m_pDiskMonitor; MonitorHandler *m_pDiskHandler; sigc::connection m_timeoutConnection; sigc::signal1 m_signalQuit; unsigned int m_crawlers; std::queue m_crawlQueue; #ifdef HAVE_DBUS void handle_power_properties_changed(const Gio::DBus::Proxy::MapChangedProperties &changed_properties, const std::vector &invalidated_properties); #endif bool on_activity_timeout(void); void check_battery_state(void); bool crawl_location(const PinotSettings::IndexableLocation &location); }; #endif // _DAEMONSTATE_HH pinot-1.22/Core/Makefile.am000066400000000000000000000065401470740426600155320ustar00rootroot00000000000000# Process this file with automake to produce Makefile.in pkginclude_HEADERS = \ DaemonState.h \ DBusServerThreads.h \ OnDiskHandler.h \ PinotDBus_stub.h \ PinotSettings.h \ SearchProvider_common.h \ SearchProvider_stub.h \ ServerThreads.h \ UniqueApplication.h \ WorkerThread.h \ WorkerThreads.h pkglib_LTLIBRARIES = libThread.la libCore.la libThread_la_LDFLAGS = \ -static libThread_la_SOURCES = \ WorkerThread.cpp libCore_la_LDFLAGS 
= \ -static libCore_la_SOURCES = \ PinotSettings.cpp \ UniqueApplication.cpp \ WorkerThreads.cpp if HAVE_DBUS bin_PROGRAMS = pinot-index pinot-search pinot-dbus-daemon else bin_PROGRAMS = pinot-index pinot-search pinot-daemon endif AM_CXXFLAGS = \ @MISC_CFLAGS@ \ -I$(top_srcdir)/Utils \ -I$(top_srcdir)/Tokenize \ -I$(top_srcdir)/Tokenize/filters \ -I$(top_srcdir)/SQL \ -I$(top_srcdir)/Monitor \ -I$(top_srcdir)/Collect \ -I$(top_srcdir)/IndexSearch \ @SQL_CFLAGS@ @HTTP_CFLAGS@ @XML_CFLAGS@ \ @INDEX_CFLAGS@ @GTHREAD_CFLAGS@ @GIOMM_CFLAGS@ \ @GLIBMM_CFLAGS@ @GTKMM_CFLAGS@ if HAVE_DBUS AM_CXXFLAGS += -DHAVE_DBUS endif pinot_index_LDFLAGS = \ -export-dynamic pinot_index_LDADD = \ -L$(top_builddir)/Utils \ -L$(top_builddir)/Tokenize \ -L$(top_builddir)/SQL \ -L$(top_builddir)/Monitor \ -L$(top_builddir)/Collect \ -L$(top_builddir)/IndexSearch \ -lCore -lThread -lIndexSearch -lMonitor -lCollect -lSQLDB -lSQLite -lSQL \ -lTokenize -lFilter -lBasicUtils -lUtils @LIBS@ \ @GLIBMM_LIBS@ @GIOMM_LIBS@ @GTHREAD_LIBS@ @XML_LIBS@ \ @HTTP_LIBS@ @SQL_LIBS@ @MISC_LIBS@ pinot_index_SOURCES = pinot-index.cpp pinot_index_DEPENDENCIES = libCore.la libThread.la pinot_search_LDFLAGS = \ -export-dynamic pinot_search_LDADD = \ -L$(top_builddir)/Utils \ -L$(top_builddir)/Tokenize \ -L$(top_builddir)/Collect \ -L$(top_builddir)/IndexSearch \ -lCore -lThread -lIndexSearch -lCollect -lTokenize -lFilter \ -lBasicUtils -lUtils @LIBS@ \ @GLIBMM_LIBS@ @GIOMM_LIBS@ \ @XML_LIBS@ @HTTP_LIBS@ @MISC_LIBS@ pinot_search_SOURCES = \ pinot-search.cpp pinot_search_DEPENDENCIES = libCore.la libThread.la pinot_dbus_daemon_LDFLAGS = \ -export-dynamic pinot_dbus_daemon_LDADD = \ -L$(top_builddir)/Utils \ -L$(top_builddir)/Tokenize \ -L$(top_builddir)/SQL \ -L$(top_builddir)/Monitor \ -L$(top_builddir)/Collect \ -L$(top_builddir)/IndexSearch \ -lCore -lThread -lIndexSearch -lMonitor -lCollect -lSQLDB -lSQLite -lSQL \ -lTokenize -lFilter -lBasicUtils -lUtils @LIBS@ \ @GLIBMM_LIBS@ @GIOMM_LIBS@ @GTHREAD_LIBS@ 
@XML_LIBS@ \ @HTTP_LIBS@ @SQL_LIBS@ @MISC_LIBS@ pinot_dbus_daemon_SOURCES = \ DaemonState.cpp \ DBusServerThreads.cpp \ OnDiskHandler.cpp \ PinotDBus_stub.cpp \ SearchProvider_stub.cpp \ ServerThreads.cpp \ pinot-dbus-daemon.cpp pinot_dbus_daemon_DEPENDENCIES = libCore.la libThread.la pinot_daemon_LDFLAGS = \ -export-dynamic pinot_daemon_LDADD = \ -L$(top_builddir)/Utils \ -L$(top_builddir)/Tokenize \ -L$(top_builddir)/SQL \ -L$(top_builddir)/Monitor \ -L$(top_builddir)/Collect \ -L$(top_builddir)/IndexSearch \ -lCore -lThread -lIndexSearch -lMonitor -lCollect -lSQLDB -lSQLite -lSQL \ -lTokenize -lFilter -lBasicUtils -lUtils @LIBS@ \ @GLIBMM_LIBS@ @GIOMM_LIBS@ @GTHREAD_LIBS@ @XML_LIBS@ \ @HTTP_LIBS@ @SQL_LIBS@ @MISC_LIBS@ pinot_daemon_SOURCES = \ DaemonState.cpp \ OnDiskHandler.cpp \ ServerThreads.cpp \ pinot-dbus-daemon.cpp pinot_daemon_DEPENDENCIES = libCore.la libThread.la pinot-1.22/Core/OnDiskHandler.cpp000066400000000000000000000245331470740426600166710ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include #include #include #include #include #include #include #include "config.h" #include "NLS.h" #include "MIMEScanner.h" #include "StringManip.h" #include "Url.h" #include "FilterWrapper.h" #include "PinotSettings.h" #include "OnDiskHandler.h" using namespace std; OnDiskHandler::OnDiskHandler() : MonitorHandler(), m_history(PinotSettings::getInstance().getHistoryDatabaseName()), m_metaData(PinotSettings::getInstance().getHistoryDatabaseName()), m_pIndex(PinotSettings::getInstance().getIndex(PinotSettings::getInstance().m_daemonIndexLocation)) { pthread_mutex_init(&m_mutex, NULL); } OnDiskHandler::~OnDiskHandler() { pthread_mutex_destroy(&m_mutex); // Disconnect the signal sigc::signal2::slot_list_type slotsList = m_signalFileFound.slots(); sigc::signal2::slot_list_type::iterator slotIter = slotsList.begin(); if (slotIter != slotsList.end()) { if (slotIter->empty() == false) { slotIter->block(); slotIter->disconnect(); } } if (m_pIndex != NULL) { delete m_pIndex; } } bool OnDiskHandler::fileMoved(const string &fileName, const string &previousFileName, IndexInterface::NameType type) { set docIdList; bool handledEvent = false; #ifdef DEBUG clog << "OnDiskHandler::fileMoved: " << fileName << endl; #endif if (m_pIndex == NULL) { return false; } pthread_mutex_lock(&m_mutex); // Get a list of documents in that directory/file if (type == IndexInterface::BY_FILE) { m_pIndex->listDocuments(string("file://") + previousFileName, docIdList, type); } else { m_pIndex->listDocuments(previousFileName, docIdList, type); } // ...and the directory/file itself unsigned int baseDocId = m_pIndex->hasDocument(string("file://") + previousFileName); if (baseDocId > 0) { docIdList.insert(baseDocId); } if (docIdList.empty() == false) { for (set::const_iterator iter = docIdList.begin(); iter != docIdList.end(); ++iter) { DocumentInfo docInfo; #ifdef DEBUG clog << "OnDiskHandler::fileMoved: moving " << *iter << endl; #endif if (m_pIndex->getDocumentInfo(*iter, docInfo) == true) { string 
newLocation(docInfo.getLocation()); if (baseDocId == *iter) { Url previousUrlObj(previousFileName), urlObj(fileName); // Update the title if it was the directory/file name if (docInfo.getTitle() == previousUrlObj.getFile()) { docInfo.setTitle(urlObj.getFile()); } } string::size_type pos = newLocation.find(previousFileName); if (pos != string::npos) { newLocation.replace(pos, previousFileName.length(), fileName); // Change the location docInfo.setLocation(newLocation); handledEvent = replaceFile(*iter, docInfo); #ifdef DEBUG clog << "OnDiskHandler::fileMoved: moved " << *iter << ", " << docInfo.getLocation() << endl; #endif } #ifdef DEBUG else clog << "OnDiskHandler::fileMoved: skipping " << newLocation << endl; #endif } } } #ifdef DEBUG else clog << "OnDiskHandler::fileMoved: no documents in " << previousFileName << endl; #endif pthread_mutex_unlock(&m_mutex); return handledEvent; } bool OnDiskHandler::fileDeleted(const string &fileName, IndexInterface::NameType type) { set docIdList; string location(string("file://") + fileName); bool unindexedDocs = false, handledEvent = false; #ifdef DEBUG clog << "OnDiskHandler::fileDeleted: " << fileName << endl; #endif if (m_pIndex == NULL) { return false; } pthread_mutex_lock(&m_mutex); // Unindex all of the directory/file's documents if (type == IndexInterface::BY_FILE) { unindexedDocs = m_pIndex->unindexDocuments(location, type); } else { unindexedDocs = m_pIndex->unindexDocuments(fileName, type); } if (unindexedDocs == true) { // ...as well as the actual directory/file m_pIndex->unindexDocument(location); m_history.deleteItems(location); handledEvent = true; } pthread_mutex_unlock(&m_mutex); return handledEvent; } bool OnDiskHandler::indexFile(const string &fileName, bool isDirectory, unsigned int &sourceId) { string location(string("file://") + fileName); Url urlObj(location); if (fileName.empty() == true) { return false; } // Is it black-listed ? 
if (PinotSettings::getInstance().isBlackListed(fileName) == true) { return false; } DocumentInfo docInfo("", location, MIMEScanner::scanUrl(urlObj), ""); // What source does it belong to ? for (map::const_iterator sourceIter = m_fileSources.begin(); sourceIter != m_fileSources.end(); ++sourceIter) { sourceId = sourceIter->first; if (sourceIter->second.length() > location.length()) { // Skip continue; } if (location.substr(0, sourceIter->second.length()) == sourceIter->second) { set labels; stringstream labelStream; // That's the one labelStream << "X-SOURCE" << sourceIter->first; #ifdef DEBUG clog << "OnDiskHandler::indexFile: source label for " << location << " is " << labelStream.str() << endl; #endif labels.insert(labelStream.str()); docInfo.setLabels(labels); break; } #ifdef DEBUG else clog << "OnDiskHandler::indexFile: not " << sourceIter->second << endl; #endif } m_metaData.getItem(docInfo, DocumentInfo::SERIAL_ALL); m_signalFileFound(docInfo, isDirectory); return true; } bool OnDiskHandler::replaceFile(unsigned int docId, DocumentInfo &docInfo) { if (m_pIndex == NULL) { return false; } // Unindex the destination file FilterWrapper wrapFilter(m_pIndex); wrapFilter.unindexDocument(docInfo.getLocation()); // Update the document info return m_pIndex->updateDocumentInfo(docId, docInfo); } void OnDiskHandler::initialize(void) { set directories; // Get the map of indexable locations set &indexableLocations = PinotSettings::getInstance().m_indexableLocations; for (set::iterator dirIter = indexableLocations.begin(); dirIter != indexableLocations.end(); ++dirIter) { directories.insert(dirIter->m_name); } // Unindex documents that belong to sources that no longer exist if (m_history.getSources(m_fileSources) == 0) { return; } for(map::const_iterator sourceIter = m_fileSources.begin(); sourceIter != m_fileSources.end(); ++sourceIter) { unsigned int sourceId = sourceIter->first; if (sourceIter->second.substr(0, 7) != "file://") { // Skip continue; } // Is this an 
indexable location ? if (directories.find(sourceIter->second.substr(7)) == directories.end()) { stringstream labelStream; labelStream << "X-SOURCE" << sourceId; #ifdef DEBUG clog << "OnDiskHandler::initialize: " << sourceIter->second << ", source " << sourceId << " was removed" << endl; #endif // All documents with this label will be unindexed if ((m_pIndex != NULL) && (m_pIndex->unindexDocuments(labelStream.str(), IndexInterface::BY_LABEL) == true)) { // Delete the source itself and all its items m_history.deleteSource(sourceId); m_history.deleteItems(sourceId); } } #ifdef DEBUG else clog << "OnDiskHandler::initialize: " << sourceIter->second << " is still configured" << endl; #endif } } bool OnDiskHandler::fileExists(const string &fileName) { // Nothing to do here return true; } bool OnDiskHandler::fileCreated(const string &fileName) { unsigned int sourceId; bool handledEvent = false; #ifdef DEBUG clog << "OnDiskHandler::fileCreated: " << fileName << endl; #endif pthread_mutex_lock(&m_mutex); // The file may exist in the index handledEvent = indexFile(fileName, false, sourceId); if (handledEvent == true) { string location("file://" + fileName); CrawlHistory::CrawlStatus status = CrawlHistory::UNKNOWN; time_t itemDate; // ...and therefore may exist in the history database if (m_history.hasItem(location, status, itemDate) == true) { m_history.updateItem(location, CrawlHistory::CRAWLED, time(NULL)); } else { m_history.insertItem(location, CrawlHistory::CRAWLED, sourceId, time(NULL)); } } pthread_mutex_unlock(&m_mutex); return handledEvent; } bool OnDiskHandler::directoryCreated(const string &dirName) { unsigned int sourceId; bool handledEvent = false; #ifdef DEBUG clog << "OnDiskHandler::directoryCreated: " << dirName << endl; #endif pthread_mutex_lock(&m_mutex); handledEvent = indexFile(dirName, true, sourceId); // History will be set by crawling pthread_mutex_unlock(&m_mutex); return handledEvent; } bool OnDiskHandler::fileModified(const string &fileName) { 
unsigned int sourceId; bool handledEvent = false; #ifdef DEBUG clog << "OnDiskHandler::fileModified: " << fileName << endl; #endif pthread_mutex_lock(&m_mutex); // Update the file, or index if necessary handledEvent = indexFile(fileName, false, sourceId); if (handledEvent == true) { m_history.updateItem("file://" + fileName, CrawlHistory::CRAWLED, time(NULL)); } pthread_mutex_unlock(&m_mutex); return handledEvent; } bool OnDiskHandler::fileMoved(const string &fileName, const string &previousFileName) { return fileMoved(fileName, previousFileName, IndexInterface::BY_FILE); } bool OnDiskHandler::directoryMoved(const string &dirName, const string &previousDirName) { return fileMoved(dirName, previousDirName, IndexInterface::BY_DIRECTORY); } bool OnDiskHandler::fileDeleted(const string &fileName) { return fileDeleted(fileName, IndexInterface::BY_FILE); } bool OnDiskHandler::directoryDeleted(const string &dirName) { return fileDeleted(dirName, IndexInterface::BY_DIRECTORY); } sigc::signal2& OnDiskHandler::getFileFoundSignal(void) { return m_signalFileFound; } pinot-1.22/Core/OnDiskHandler.h000066400000000000000000000055371470740426600163410ustar00rootroot00000000000000/* * Copyright 2005-2009 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #ifndef _ONDISKHANDLER_HH #define _ONDISKHANDLER_HH #include #include #include #include #include #include "CrawlHistory.h" #include "MetaDataBackup.h" #include "IndexInterface.h" #include "MonitorHandler.h" #include "PinotSettings.h" class OnDiskHandler : public MonitorHandler { public: OnDiskHandler(); virtual ~OnDiskHandler(); /// Initializes things before starting monitoring. virtual void initialize(void); /// Handles file existence events. virtual bool fileExists(const std::string &fileName); /// Handles file creation events. virtual bool fileCreated(const std::string &fileName); /// Handles directory creation events. virtual bool directoryCreated(const std::string &dirName); /// Handles file modified events. virtual bool fileModified(const std::string &fileName); /// Handles file moved events. virtual bool fileMoved(const std::string &fileName, const std::string &previousFileName); /// Handles directory moved events. virtual bool directoryMoved(const std::string &dirName, const std::string &previousDirName); /// Handles file deleted events. virtual bool fileDeleted(const std::string &fileName); /// Handles directory deleted events. 
virtual bool directoryDeleted(const std::string &dirName); sigc::signal2& getFileFoundSignal(void); protected: pthread_mutex_t m_mutex; sigc::signal2 m_signalFileFound; std::map m_fileSources; CrawlHistory m_history; MetaDataBackup m_metaData; IndexInterface *m_pIndex; bool fileMoved(const std::string &fileName, const std::string &previousFileName, IndexInterface::NameType type); bool fileDeleted(const std::string &fileName, IndexInterface::NameType type); bool indexFile(const std::string &fileName, bool isDirectory, unsigned int &sourceId); bool replaceFile(unsigned int docId, DocumentInfo &docInfo); private: OnDiskHandler(const OnDiskHandler &other); OnDiskHandler &operator=(const OnDiskHandler &other); }; #endif // _ONDISKHANDLER_HH pinot-1.22/Core/PinotDBus_stub.cpp000066400000000000000000000715011470740426600171050ustar00rootroot00000000000000static const char interfaceXml0[] = R"XML_DELIMITER( )XML_DELIMITER"; #include "PinotDBus_stub.h" template inline T specialGetter(Glib::Variant variant) { return variant.get(); } template<> inline std::string specialGetter(Glib::Variant variant) { // String is not guaranteed to be null-terminated, so don't use ::get() gsize n_elem; gsize elem_size = sizeof(char); char* data = (char*)g_variant_get_fixed_array(variant.gobj(), &n_elem, elem_size); return std::string(data, n_elem); } org::freedesktop::DBus::IntrospectableStub::IntrospectableStub(): m_interfaceName("org.freedesktop.DBus.Introspectable") { } org::freedesktop::DBus::IntrospectableStub::~IntrospectableStub() { unregister_object(); } guint org::freedesktop::DBus::IntrospectableStub::register_object( const Glib::RefPtr &connection, const Glib::ustring &object_path) { if (!introspection_data) { try { introspection_data = Gio::DBus::NodeInfo::create_for_xml(interfaceXml0); } catch(const Glib::Error& ex) { g_warning("Unable to create introspection data for %s: %s", object_path.c_str(), ex.what().c_str()); return 0; } } Gio::DBus::InterfaceVTable *interface_vtable = 
        new Gio::DBus::InterfaceVTable(
            sigc::mem_fun(this, &IntrospectableStub::on_method_call),
            sigc::mem_fun(this, &IntrospectableStub::on_interface_get_property),
            sigc::mem_fun(this, &IntrospectableStub::on_interface_set_property));

    guint registration_id;
    try {
        registration_id = connection->register_object(object_path,
            introspection_data->lookup_interface("org.freedesktop.DBus.Introspectable"),
            *interface_vtable);
    } catch(const Glib::Error &ex) {
        g_warning("Registration of object %s failed: %s", object_path.c_str(), ex.what().c_str());
        return 0;
    }

    m_registered_objects.emplace_back(RegisteredObject {
        registration_id,
        connection,
        object_path
    });

    return registration_id;
}

void org::freedesktop::DBus::IntrospectableStub::unregister_object()
{
    for (const RegisteredObject &obj: m_registered_objects) {
        obj.connection->unregister_object(obj.id);
    }
    m_registered_objects.clear();
}

void org::freedesktop::DBus::IntrospectableStub::on_method_call(
    const Glib::RefPtr<Gio::DBus::Connection> &/* connection */,
    const Glib::ustring &/* sender */,
    const Glib::ustring &/* object_path */,
    const Glib::ustring &/* interface_name */,
    const Glib::ustring &method_name,
    const Glib::VariantContainerBase &parameters,
    const Glib::RefPtr<Gio::DBus::MethodInvocation> &invocation)
{
    static_cast<void>(method_name); // maybe unused
    static_cast<void>(parameters); // maybe unused
    static_cast<void>(invocation); // maybe unused

    if (method_name.compare("Introspect") == 0) {
        MethodInvocation methodInvocation(invocation);
        Introspect(
            methodInvocation);
    }
}

void org::freedesktop::DBus::IntrospectableStub::on_interface_get_property(
    Glib::VariantBase &property,
    const Glib::RefPtr<Gio::DBus::Connection> &/* connection */,
    const Glib::ustring &/* sender */,
    const Glib::ustring &/* object_path */,
    const Glib::ustring &/* interface_name */,
    const Glib::ustring &property_name)
{
    static_cast<void>(property); // maybe unused
    static_cast<void>(property_name); // maybe unused
}

bool org::freedesktop::DBus::IntrospectableStub::on_interface_set_property(
    const Glib::RefPtr<Gio::DBus::Connection> &/* connection */,
    const Glib::ustring &/* sender */,
const Glib::ustring &/* object_path */, const Glib::ustring &/* interface_name */, const Glib::ustring &property_name, const Glib::VariantBase &value) { static_cast(property_name); // maybe unused static_cast(value); // maybe unused return true; } bool org::freedesktop::DBus::IntrospectableStub::emitSignal( const std::string &propName, Glib::VariantBase &value) { std::map changedProps; std::vector changedPropsNoValue; changedProps[propName] = value; Glib::Variant> changedPropsVar = Glib::Variant>::create(changedProps); Glib::Variant> changedPropsNoValueVar = Glib::Variant>::create(changedPropsNoValue); std::vector ps; ps.push_back(Glib::Variant::create(m_interfaceName)); ps.push_back(changedPropsVar); ps.push_back(changedPropsNoValueVar); Glib::VariantContainerBase propertiesChangedVariant = Glib::Variant>::create_tuple(ps); for (const RegisteredObject &obj: m_registered_objects) { obj.connection->emit_signal( obj.object_path, "org.freedesktop.DBus.Properties", "PropertiesChanged", Glib::ustring(), propertiesChangedVariant); } return true; }com::github::fabricecolin::PinotStub::PinotStub(): m_interfaceName("com.github.fabricecolin.Pinot") { } com::github::fabricecolin::PinotStub::~PinotStub() { unregister_object(); } guint com::github::fabricecolin::PinotStub::register_object( const Glib::RefPtr &connection, const Glib::ustring &object_path) { if (!introspection_data) { try { introspection_data = Gio::DBus::NodeInfo::create_for_xml(interfaceXml0); } catch(const Glib::Error& ex) { g_warning("Unable to create introspection data for %s: %s", object_path.c_str(), ex.what().c_str()); return 0; } } Gio::DBus::InterfaceVTable *interface_vtable = new Gio::DBus::InterfaceVTable( sigc::mem_fun(this, &PinotStub::on_method_call), sigc::mem_fun(this, &PinotStub::on_interface_get_property), sigc::mem_fun(this, &PinotStub::on_interface_set_property)); guint registration_id; try { registration_id = connection->register_object(object_path, 
            introspection_data->lookup_interface("com.github.fabricecolin.Pinot"),
            *interface_vtable);
    } catch(const Glib::Error &ex) {
        g_warning("Registration of object %s failed: %s", object_path.c_str(), ex.what().c_str());
        return 0;
    }

    m_registered_objects.emplace_back(RegisteredObject {
        registration_id,
        connection,
        object_path
    });

    return registration_id;
}

void com::github::fabricecolin::PinotStub::unregister_object()
{
    for (const RegisteredObject &obj: m_registered_objects) {
        obj.connection->unregister_object(obj.id);
    }
    m_registered_objects.clear();
}

void com::github::fabricecolin::PinotStub::on_method_call(
    const Glib::RefPtr<Gio::DBus::Connection> &/* connection */,
    const Glib::ustring &/* sender */,
    const Glib::ustring &/* object_path */,
    const Glib::ustring &/* interface_name */,
    const Glib::ustring &method_name,
    const Glib::VariantContainerBase &parameters,
    const Glib::RefPtr<Gio::DBus::MethodInvocation> &invocation)
{
    static_cast<void>(method_name); // maybe unused
    static_cast<void>(parameters); // maybe unused
    static_cast<void>(invocation); // maybe unused

    if (method_name.compare("GetStatistics") == 0) {
        MethodInvocation methodInvocation(invocation);
        GetStatistics(
            methodInvocation);
    }

    if (method_name.compare("Reload") == 0) {
        MethodInvocation methodInvocation(invocation);
        Reload(
            methodInvocation);
    }

    if (method_name.compare("Stop") == 0) {
        MethodInvocation methodInvocation(invocation);
        Stop(
            methodInvocation);
    }

    if (method_name.compare("GetDocumentInfo") == 0) {
        Glib::Variant<guint32> base_docId;
        parameters.get_child(base_docId, 0);
        guint32 p_docId = specialGetter(base_docId);

        MethodInvocation methodInvocation(invocation);
        GetDocumentInfo(
            (p_docId),
            methodInvocation);
    }

    if (method_name.compare("GetDocumentTermsCount") == 0) {
        Glib::Variant<guint32> base_docId;
        parameters.get_child(base_docId, 0);
        guint32 p_docId = specialGetter(base_docId);

        MethodInvocation methodInvocation(invocation);
        GetDocumentTermsCount(
            (p_docId),
            methodInvocation);
    }

    if (method_name.compare("GetDocumentTerms") == 0) {
        Glib::Variant<guint32> base_docId;
        parameters.get_child(base_docId,
0); guint32 p_docId = specialGetter(base_docId); MethodInvocation methodInvocation(invocation); GetDocumentTerms( (p_docId), methodInvocation); } if (method_name.compare("GetLabels") == 0) { MethodInvocation methodInvocation(invocation); GetLabels( methodInvocation); } if (method_name.compare("AddLabel") == 0) { Glib::Variant base_label; parameters.get_child(base_label, 0); Glib::ustring p_label = specialGetter(base_label); MethodInvocation methodInvocation(invocation); AddLabel( (p_label), methodInvocation); } if (method_name.compare("DeleteLabel") == 0) { Glib::Variant base_label; parameters.get_child(base_label, 0); Glib::ustring p_label = specialGetter(base_label); MethodInvocation methodInvocation(invocation); DeleteLabel( (p_label), methodInvocation); } if (method_name.compare("HasLabel") == 0) { Glib::Variant base_docId; parameters.get_child(base_docId, 0); guint32 p_docId = specialGetter(base_docId); Glib::Variant base_label; parameters.get_child(base_label, 1); Glib::ustring p_label = specialGetter(base_label); MethodInvocation methodInvocation(invocation); HasLabel( (p_docId), (p_label), methodInvocation); } if (method_name.compare("GetDocumentLabels") == 0) { Glib::Variant base_docId; parameters.get_child(base_docId, 0); guint32 p_docId = specialGetter(base_docId); MethodInvocation methodInvocation(invocation); GetDocumentLabels( (p_docId), methodInvocation); } if (method_name.compare("SetDocumentLabels") == 0) { Glib::Variant base_docId; parameters.get_child(base_docId, 0); guint32 p_docId = specialGetter(base_docId); Glib::Variant> base_labels; parameters.get_child(base_labels, 1); std::vector p_labels = specialGetter(base_labels); Glib::Variant base_resetLabels; parameters.get_child(base_resetLabels, 2); bool p_resetLabels = specialGetter(base_resetLabels); MethodInvocation methodInvocation(invocation); SetDocumentLabels( (p_docId), (p_labels), (p_resetLabels), methodInvocation); } if (method_name.compare("SetDocumentsLabels") == 0) { Glib::Variant> 
base_docIds; parameters.get_child(base_docIds, 0); std::vector p_docIds = specialGetter(base_docIds); Glib::Variant> base_labels; parameters.get_child(base_labels, 1); std::vector p_labels = specialGetter(base_labels); Glib::Variant base_resetLabels; parameters.get_child(base_resetLabels, 2); bool p_resetLabels = specialGetter(base_resetLabels); MethodInvocation methodInvocation(invocation); SetDocumentsLabels( (p_docIds), (p_labels), (p_resetLabels), methodInvocation); } if (method_name.compare("HasDocument") == 0) { Glib::Variant base_url; parameters.get_child(base_url, 0); Glib::ustring p_url = specialGetter(base_url); MethodInvocation methodInvocation(invocation); HasDocument( (p_url), methodInvocation); } if (method_name.compare("GetCloseTerms") == 0) { Glib::Variant base_term; parameters.get_child(base_term, 0); Glib::ustring p_term = specialGetter(base_term); MethodInvocation methodInvocation(invocation); GetCloseTerms( (p_term), methodInvocation); } if (method_name.compare("GetDocumentsCount") == 0) { Glib::Variant base_label; parameters.get_child(base_label, 0); Glib::ustring p_label = specialGetter(base_label); MethodInvocation methodInvocation(invocation); GetDocumentsCount( (p_label), methodInvocation); } if (method_name.compare("ListDocuments") == 0) { Glib::Variant base_term; parameters.get_child(base_term, 0); Glib::ustring p_term = specialGetter(base_term); Glib::Variant base_termType; parameters.get_child(base_termType, 1); guint32 p_termType = specialGetter(base_termType); Glib::Variant base_maxCount; parameters.get_child(base_maxCount, 2); guint32 p_maxCount = specialGetter(base_maxCount); Glib::Variant base_startOffset; parameters.get_child(base_startOffset, 3); guint32 p_startOffset = specialGetter(base_startOffset); MethodInvocation methodInvocation(invocation); ListDocuments( (p_term), (p_termType), (p_maxCount), (p_startOffset), methodInvocation); } if (method_name.compare("UpdateDocument") == 0) { Glib::Variant base_docId; 
parameters.get_child(base_docId, 0); guint32 p_docId = specialGetter(base_docId); MethodInvocation methodInvocation(invocation); UpdateDocument( (p_docId), methodInvocation); } if (method_name.compare("SetDocumentInfo") == 0) { Glib::Variant base_docId; parameters.get_child(base_docId, 0); guint32 p_docId = specialGetter(base_docId); Glib::Variant>> base_fields; parameters.get_child(base_fields, 1); std::vector> p_fields = specialGetter(base_fields); MethodInvocation methodInvocation(invocation); SetDocumentInfo( (p_docId), (p_fields), methodInvocation); } if (method_name.compare("Query") == 0) { Glib::Variant base_engineType; parameters.get_child(base_engineType, 0); Glib::ustring p_engineType = specialGetter(base_engineType); Glib::Variant base_engineName; parameters.get_child(base_engineName, 1); Glib::ustring p_engineName = specialGetter(base_engineName); Glib::Variant base_searchText; parameters.get_child(base_searchText, 2); Glib::ustring p_searchText = specialGetter(base_searchText); Glib::Variant base_startDoc; parameters.get_child(base_startDoc, 3); guint32 p_startDoc = specialGetter(base_startDoc); Glib::Variant base_maxHits; parameters.get_child(base_maxHits, 4); guint32 p_maxHits = specialGetter(base_maxHits); MethodInvocation methodInvocation(invocation); Query( (p_engineType), (p_engineName), (p_searchText), (p_startDoc), (p_maxHits), methodInvocation); } if (method_name.compare("SimpleQuery") == 0) { Glib::Variant base_searchText; parameters.get_child(base_searchText, 0); Glib::ustring p_searchText = specialGetter(base_searchText); Glib::Variant base_maxHits; parameters.get_child(base_maxHits, 1); guint32 p_maxHits = specialGetter(base_maxHits); MethodInvocation methodInvocation(invocation); SimpleQuery( (p_searchText), (p_maxHits), methodInvocation); } } void com::github::fabricecolin::PinotStub::on_interface_get_property( Glib::VariantBase &property, const Glib::RefPtr &/* connection */, const Glib::ustring &/* sender */, const Glib::ustring &/* 
object_path */, const Glib::ustring &/* interface_name */, const Glib::ustring &property_name) { static_cast(property); // maybe unused static_cast(property_name); // maybe unused if (property_name.compare("DaemonVersion") == 0) { property = Glib::Variant::create((DaemonVersion_get())); } if (property_name.compare("IndexFlushEpoch") == 0) { property = Glib::Variant::create((IndexFlushEpoch_get())); } } bool com::github::fabricecolin::PinotStub::on_interface_set_property( const Glib::RefPtr &/* connection */, const Glib::ustring &/* sender */, const Glib::ustring &/* object_path */, const Glib::ustring &/* interface_name */, const Glib::ustring &property_name, const Glib::VariantBase &value) { static_cast(property_name); // maybe unused static_cast(value); // maybe unused return true; } bool com::github::fabricecolin::PinotStub::DaemonVersion_set(const Glib::ustring & value) { if (DaemonVersion_setHandler(value)) { Glib::Variant value_get = Glib::Variant::create((DaemonVersion_get())); emitSignal("DaemonVersion", value_get); return true; } return false; } bool com::github::fabricecolin::PinotStub::IndexFlushEpoch_set(guint32 value) { if (IndexFlushEpoch_setHandler(value)) { Glib::Variant value_get = Glib::Variant::create((IndexFlushEpoch_get())); emitSignal("IndexFlushEpoch", value_get); return true; } return false; } bool com::github::fabricecolin::PinotStub::emitSignal( const std::string &propName, Glib::VariantBase &value) { std::map changedProps; std::vector changedPropsNoValue; changedProps[propName] = value; Glib::Variant> changedPropsVar = Glib::Variant>::create(changedProps); Glib::Variant> changedPropsNoValueVar = Glib::Variant>::create(changedPropsNoValue); std::vector ps; ps.push_back(Glib::Variant::create(m_interfaceName)); ps.push_back(changedPropsVar); ps.push_back(changedPropsNoValueVar); Glib::VariantContainerBase propertiesChangedVariant = Glib::Variant>::create_tuple(ps); for (const RegisteredObject &obj: m_registered_objects) { 
        obj.connection->emit_signal(
            obj.object_path,
            "org.freedesktop.DBus.Properties",
            "PropertiesChanged",
            Glib::ustring(),
            propertiesChangedVariant);
    }

    return true;
}
pinot-1.22/Core/PinotDBus_stub.h000066400000000000000000000314721470740426600165550ustar00rootroot00000000000000
#pragma once
#include <string>
#include <vector>
#include <glibmm.h>
#include <giomm.h>
#include "PinotDBus_common.h"

namespace org {
namespace freedesktop {
namespace DBus {

class IntrospectableStub : public sigc::trackable {
public:
    IntrospectableStub();
    virtual ~IntrospectableStub();

    IntrospectableStub(const IntrospectableStub &other) = delete;
    IntrospectableStub(IntrospectableStub &&other) = delete;
    IntrospectableStub &operator=(const IntrospectableStub &other) = delete;
    IntrospectableStub &operator=(IntrospectableStub &&other) = delete;

    guint register_object(const Glib::RefPtr<Gio::DBus::Connection> &connection,
                          const Glib::ustring &object_path);
    void unregister_object();

    unsigned int usage_count() const {
        return static_cast<unsigned int>(m_registered_objects.size());
    }

    class MethodInvocation;

protected:
    virtual void Introspect(
        MethodInvocation &invocation) = 0;

    void on_method_call(const Glib::RefPtr<Gio::DBus::Connection> &connection,
                        const Glib::ustring &sender,
                        const Glib::ustring &object_path,
                        const Glib::ustring &interface_name,
                        const Glib::ustring &method_name,
                        const Glib::VariantContainerBase &parameters,
                        const Glib::RefPtr<Gio::DBus::MethodInvocation> &invocation);

    void on_interface_get_property(Glib::VariantBase& property,
                                   const Glib::RefPtr<Gio::DBus::Connection> &connection,
                                   const Glib::ustring &sender,
                                   const Glib::ustring &object_path,
                                   const Glib::ustring &interface_name,
                                   const Glib::ustring &property_name);

    bool on_interface_set_property(
        const Glib::RefPtr<Gio::DBus::Connection> &connection,
        const Glib::ustring &sender,
        const Glib::ustring &object_path,
        const Glib::ustring &interface_name,
        const Glib::ustring &property_name,
        const Glib::VariantBase &value);

private:
    bool emitSignal(const std::string &propName, Glib::VariantBase &value);

    struct RegisteredObject {
        guint id;
        Glib::RefPtr<Gio::DBus::Connection> connection;
        std::string object_path;
    };

    Glib::RefPtr<Gio::DBus::NodeInfo>
introspection_data; std::vector m_registered_objects; std::string m_interfaceName; }; class IntrospectableStub::MethodInvocation { public: MethodInvocation(const Glib::RefPtr &msg): m_message(msg) {} const Glib::RefPtr getMessage() { return m_message; } void ret(Glib::Error error) { m_message->return_error(error); } void returnError(const Glib::ustring &domain, int code, const Glib::ustring &message) { m_message->return_error(domain, code, message); } void ret(const Glib::ustring & p0) { std::vector vlist; Glib::Variant var0 = Glib::Variant::create(p0); vlist.push_back(var0); m_message->return_value(Glib::Variant::create_tuple(vlist)); } private: Glib::RefPtr m_message; }; } // DBus } // freedesktop } // org namespace com { namespace github { namespace fabricecolin { class PinotStub : public sigc::trackable { public: PinotStub(); virtual ~PinotStub(); PinotStub(const PinotStub &other) = delete; PinotStub(PinotStub &&other) = delete; PinotStub &operator=(const PinotStub &other) = delete; PinotStub &operator=(PinotStub &&other) = delete; guint register_object(const Glib::RefPtr &connection, const Glib::ustring &object_path); void unregister_object(); unsigned int usage_count() const { return static_cast(m_registered_objects.size()); } class MethodInvocation; bool DaemonVersion_set(const Glib::ustring & value); bool IndexFlushEpoch_set(guint32 value); protected: virtual void GetStatistics( MethodInvocation &invocation) = 0; virtual void Reload( MethodInvocation &invocation) = 0; virtual void Stop( MethodInvocation &invocation) = 0; virtual void GetDocumentInfo( guint32 docId, MethodInvocation &invocation) = 0; virtual void GetDocumentTermsCount( guint32 docId, MethodInvocation &invocation) = 0; virtual void GetDocumentTerms( guint32 docId, MethodInvocation &invocation) = 0; virtual void GetLabels( MethodInvocation &invocation) = 0; virtual void AddLabel( const Glib::ustring & label, MethodInvocation &invocation) = 0; virtual void DeleteLabel( const Glib::ustring & 
label, MethodInvocation &invocation) = 0; virtual void HasLabel( guint32 docId, const Glib::ustring & label, MethodInvocation &invocation) = 0; virtual void GetDocumentLabels( guint32 docId, MethodInvocation &invocation) = 0; virtual void SetDocumentLabels( guint32 docId, const std::vector & labels, bool resetLabels, MethodInvocation &invocation) = 0; virtual void SetDocumentsLabels( const std::vector & docIds, const std::vector & labels, bool resetLabels, MethodInvocation &invocation) = 0; virtual void HasDocument( const Glib::ustring & url, MethodInvocation &invocation) = 0; virtual void GetCloseTerms( const Glib::ustring & term, MethodInvocation &invocation) = 0; virtual void GetDocumentsCount( const Glib::ustring & label, MethodInvocation &invocation) = 0; virtual void ListDocuments( const Glib::ustring & term, guint32 termType, guint32 maxCount, guint32 startOffset, MethodInvocation &invocation) = 0; virtual void UpdateDocument( guint32 docId, MethodInvocation &invocation) = 0; virtual void SetDocumentInfo( guint32 docId, const std::vector> & fields, MethodInvocation &invocation) = 0; virtual void Query( const Glib::ustring & engineType, const Glib::ustring & engineName, const Glib::ustring & searchText, guint32 startDoc, guint32 maxHits, MethodInvocation &invocation) = 0; virtual void SimpleQuery( const Glib::ustring & searchText, guint32 maxHits, MethodInvocation &invocation) = 0; /* Handle the setting of a property * This method will be called as a result of a call to _set * and should implement the actual setting of the property value. * Should return true on success and false otherwise. */ virtual bool DaemonVersion_setHandler(const Glib::ustring & value) = 0; virtual Glib::ustring DaemonVersion_get() = 0; /* Handle the setting of a property * This method will be called as a result of a call to _set * and should implement the actual setting of the property value. * Should return true on success and false otherwise. 
*/
    virtual bool IndexFlushEpoch_setHandler(guint32 value) = 0;
    virtual guint32 IndexFlushEpoch_get() = 0;

    void on_method_call(const Glib::RefPtr<Gio::DBus::Connection> &connection,
                        const Glib::ustring &sender,
                        const Glib::ustring &object_path,
                        const Glib::ustring &interface_name,
                        const Glib::ustring &method_name,
                        const Glib::VariantContainerBase &parameters,
                        const Glib::RefPtr<Gio::DBus::MethodInvocation> &invocation);

    void on_interface_get_property(Glib::VariantBase& property,
                                   const Glib::RefPtr<Gio::DBus::Connection> &connection,
                                   const Glib::ustring &sender,
                                   const Glib::ustring &object_path,
                                   const Glib::ustring &interface_name,
                                   const Glib::ustring &property_name);

    bool on_interface_set_property(
        const Glib::RefPtr<Gio::DBus::Connection> &connection,
        const Glib::ustring &sender,
        const Glib::ustring &object_path,
        const Glib::ustring &interface_name,
        const Glib::ustring &property_name,
        const Glib::VariantBase &value);

private:
    bool emitSignal(const std::string &propName, Glib::VariantBase &value);

    struct RegisteredObject {
        guint id;
        Glib::RefPtr<Gio::DBus::Connection> connection;
        std::string object_path;
    };

    Glib::RefPtr<Gio::DBus::NodeInfo> introspection_data;
    std::vector<RegisteredObject> m_registered_objects;
    std::string m_interfaceName;
};

class PinotStub::MethodInvocation {
public:
    MethodInvocation(const Glib::RefPtr<Gio::DBus::MethodInvocation> &msg):
        m_message(msg) {}

    const Glib::RefPtr<Gio::DBus::MethodInvocation> getMessage() {
        return m_message;
    }

    void ret(Glib::Error error) {
        m_message->return_error(error);
    }

    void returnError(const Glib::ustring &domain, int code, const Glib::ustring &message) {
        m_message->return_error(domain, code, message);
    }

    void ret(guint32 p0, guint32 p1, bool p2, bool p3, bool p4) {
        std::vector<Glib::VariantBase> vlist;
        Glib::Variant<guint32> var0 =
            Glib::Variant<guint32>::create(p0);
        vlist.push_back(var0);
        Glib::Variant<guint32> var1 =
            Glib::Variant<guint32>::create(p1);
        vlist.push_back(var1);
        Glib::Variant<bool> var2 =
            Glib::Variant<bool>::create(p2);
        vlist.push_back(var2);
        Glib::Variant<bool> var3 =
            Glib::Variant<bool>::create(p3);
        vlist.push_back(var3);
        Glib::Variant<bool> var4 =
            Glib::Variant<bool>::create(p4);
        vlist.push_back(var4);

        m_message->return_value(Glib::Variant<Glib::VariantBase>::create_tuple(vlist));
    }

    void ret(bool p0) {
        std::vector<Glib::VariantBase> vlist;
Glib::Variant var0 = Glib::Variant::create(p0); vlist.push_back(var0); m_message->return_value(Glib::Variant::create_tuple(vlist)); } void ret(gint32 p0) { std::vector vlist; Glib::Variant var0 = Glib::Variant::create(p0); vlist.push_back(var0); m_message->return_value(Glib::Variant::create_tuple(vlist)); } void ret(const std::vector> & p0) { std::vector vlist; Glib::Variant>> var0 = Glib::Variant>>::create(p0); vlist.push_back(var0); m_message->return_value(Glib::Variant::create_tuple(vlist)); } void ret(guint32 p0) { std::vector vlist; Glib::Variant var0 = Glib::Variant::create(p0); vlist.push_back(var0); m_message->return_value(Glib::Variant::create_tuple(vlist)); } void ret(const std::vector & p0) { std::vector vlist; Glib::Variant> var0 = Glib::Variant>::create(p0); vlist.push_back(var0); m_message->return_value(Glib::Variant::create_tuple(vlist)); } void ret(const Glib::ustring & p0) { std::vector vlist; Glib::Variant var0 = Glib::Variant::create(p0); vlist.push_back(var0); m_message->return_value(Glib::Variant::create_tuple(vlist)); } void ret(guint32 p0, const std::vector>> & p1) { std::vector vlist; Glib::Variant var0 = Glib::Variant::create(p0); vlist.push_back(var0); Glib::Variant>>> var1 = Glib::Variant>>>::create(p1); vlist.push_back(var1); m_message->return_value(Glib::Variant::create_tuple(vlist)); } private: Glib::RefPtr m_message; }; } // fabricecolin } // github } // com pinot-1.22/Core/PinotSettings.cpp000066400000000000000000001400421470740426600170100ustar00rootroot00000000000000/* * Copyright 2005-2024 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include "config.h" #include #include #include #include #include #include #include #include #ifdef HAVE_PWD_H #include #endif #ifdef HAVE_FNMATCH_H #include #endif #include #include #include #include #include #include #include #include #include #include "NLS.h" #include "CommandLine.h" #include "Languages.h" #include "StringManip.h" #include "ModuleFactory.h" #include "PluginWebEngine.h" #include "PinotSettings.h" using namespace std; using namespace Glib; using namespace xmlpp; static string getElementContent(const Element *pElem) { if (pElem == NULL) { return ""; } const TextNode *pText = pElem->get_first_child_text(); if (pText == NULL) { return ""; } return pText->get_content(); } static Element *addChildElement(Element *pElem, const string &nodeName, const string &nodeContent) { if (pElem == NULL) { return NULL; } Element *pSubElem = pElem->add_child_element(nodeName); if (pSubElem != NULL) { pSubElem->set_first_child_text(nodeContent); } return pSubElem; } PinotSettings PinotSettings::m_instance; bool PinotSettings::m_clientMode = false; PinotSettings::PinotSettings() : m_warnAboutVersion(false), m_defaultBackend("xapian"), m_minimumDiskSpace(50), m_xPos(0), m_yPos(0), m_width(0), m_height(0), m_panePos(-1), m_showEngines(false), m_expandQueries(false), m_ignoreRobotsDirectives(false), m_suggestQueryTerms(true), m_newResultsColourRed(65535), m_newResultsColourGreen(0), m_newResultsColourBlue(0), m_proxyPort(8080), m_proxyEnabled(false), m_isBlackList(true), m_firstRun(false), m_indexCount(0) { string 
directoryName(getConfigurationDirectory()); struct stat fileStat; // Initialize libxml2 and check for potential ABI mismatches LIBXML_TEST_VERSION #if LIBXML_VERSION < 21200 xmlParserDebugEntities = 0; #endif // Find out if there is a .pinot directory if (stat(directoryName.c_str(), &fileStat) != 0) { // No, create it then #ifdef WIN32 if (mkdir(directoryName.c_str()) == 0) #else if (mkdir(directoryName.c_str(), (mode_t)S_IRUSR|S_IWUSR|S_IXUSR|S_IRGRP|S_IWGRP|S_IXGRP|S_IROTH|S_IXOTH) == 0) #endif { clog << "Created directory " << directoryName << endl; m_firstRun = true; } else { clog << "Couldn't create pinot directory at " << directoryName << endl; } } // This is where the internal indices live m_docsIndexLocation = directoryName; m_docsIndexLocation += "/index"; m_daemonIndexLocation = directoryName; m_daemonIndexLocation += "/daemon"; // This is not set in the settings files char *pEnvVar = getenv("PINOT_MINIMUM_DISK_SPACE"); if ((pEnvVar != NULL) && (strlen(pEnvVar) > 0)) { m_minimumDiskSpace = atof(pEnvVar); } } PinotSettings::~PinotSettings() { xmlCleanupParser(); } PinotSettings &PinotSettings::getInstance(void) { return m_instance; } bool PinotSettings::setClientMode(bool enable) { bool isEnabled = m_clientMode; m_clientMode = enable; return isEnabled; } bool PinotSettings::getClientMode(void) { return m_clientMode; } string PinotSettings::getHomeDirectory(void) { #ifdef HAVE_PWD_H struct passwd *pPasswd = getpwuid(geteuid()); if ((pPasswd != NULL) && (pPasswd->pw_dir != NULL)) { return pPasswd->pw_dir; } else { #endif char *homeDir = getenv("HOME"); if (homeDir != NULL) { return homeDir; } #ifdef HAVE_PWD_H } return "~"; #else return "."; #endif } string PinotSettings::getConfigurationDirectory(void) { string directoryName(getHomeDirectory()); if (directoryName.empty() == true) { return "~/.pinot"; } directoryName += "/.pinot"; return directoryName; } string PinotSettings::getFileName(bool prefsOrUI) { string configFileName(getConfigurationDirectory()); 
if (prefsOrUI == true) { configFileName += "/prefs.xml"; } else { configFileName += "/ui.xml"; } return configFileName; } string PinotSettings::getCurrentUserName(void) { #ifdef HAVE_PWD_H struct passwd *pPasswd = getpwuid(geteuid()); if ((pPasswd != NULL) && (pPasswd->pw_name != NULL)) { return pPasswd->pw_name; } #endif return ""; } void PinotSettings::checkHistoryDatabase(void) { string uiHistoryDatabase(getConfigurationDirectory()); string daemonHistoryDatabase(getConfigurationDirectory()); struct stat fileStat; uiHistoryDatabase += "/history"; daemonHistoryDatabase += "/history-daemon"; // Copy the UI's over to the daemon's history if it doesn't exist if ((stat(uiHistoryDatabase.c_str(), &fileStat) == 0) && ((stat(daemonHistoryDatabase.c_str(), &fileStat) != 0) || (!S_ISREG(fileStat.st_mode)))) { string output; CommandLine::runSync(string("\\cp -f ") + uiHistoryDatabase + " " + daemonHistoryDatabase, output); #ifdef DEBUG clog << "PinotSettings::checkHistoryDatabase: " << output << endl; #endif } } string PinotSettings::getHistoryDatabaseName(bool needToQueryDaemonHistory) { string historyDatabase(getConfigurationDirectory()); if ((m_clientMode == false) || (needToQueryDaemonHistory == true)) { historyDatabase += "/history-daemon"; } else { historyDatabase += "/history"; } return historyDatabase; } bool PinotSettings::isFirstRun(void) const { return m_firstRun; } void PinotSettings::clear(void) { m_version.clear(); m_warnAboutVersion = false; m_defaultBackend = "xapian"; m_xPos = 0; m_yPos = 0; m_width = 0; m_height = 0; m_panePos = -1; m_showEngines = false; m_expandQueries = false; m_ignoreRobotsDirectives = false; m_suggestQueryTerms = true; m_newResultsColourRed = 65535; m_newResultsColourGreen = 0; m_newResultsColourBlue = 0; m_proxyAddress.clear(); m_proxyPort = 8080; m_proxyType.clear(); m_proxyEnabled = false; m_indexableLocations.clear(); m_filePatternsList.clear(); m_isBlackList = true; m_editablePluginValues.clear(); m_cacheProviders.clear(); 
m_cacheProtocols.clear(); m_firstRun = false; m_indexes.clear(); m_indexCount = 0; m_engines.clear(); m_engineIds.clear(); m_engineChannels.clear(); m_queries.clear(); m_labels.clear(); } bool PinotSettings::load(LoadWhat what) { string fileName; bool loadedUIConfiguration = false; if ((what == LOAD_ALL) || (what == LOAD_GLOBAL)) { fileName = string(SYSCONFDIR) + "/pinot/globalconfig.xml"; if (loadConfiguration(fileName, true) == false) { return false; } if (what == LOAD_GLOBAL) { // Stop here return true; } } // Load settings ? if (m_firstRun == false) { // Load 0.90 preferences first if (loadConfiguration(getFileName(true), false) == false) { fileName = getConfigurationDirectory() + "/config.xml"; // We may have to migrate away from a pre-0.90 configuration clear(); if (loadConfiguration(fileName, false) == true) { // Save settings now to the new format save(SAVE_PREFS); save(SAVE_CONFIG); clog << "Migrated settings to 0.90 format" << endl; } else { m_firstRun = true; } } else { loadedUIConfiguration = loadConfiguration(getFileName(false), false); } } if (what == LOAD_ALL) { // Load search engines loadSearchEngines(string(PREFIX) + "/share/pinot/engines"); loadSearchEngines(getConfigurationDirectory() + "/engines"); } map engines; string currentUserChannel("X-Current-User-Channel"); // Some engines are available as back-ends ModuleFactory::getSupportedEngines(engines); for (map::const_iterator engineIter = engines.begin(); engineIter != engines.end(); ++engineIter) { if (engineIter->second == true) { string channelName(engineIter->first.m_channel); m_engineIds[1 << m_engines.size()] = engineIter->first.m_name; // Is a channel specified ? 
if (engineIter->first.m_channel.empty() == true) { ModuleProperties modProps(engineIter->first); channelName = modProps.m_channel = currentUserChannel; #ifdef DEBUG clog << "PinotSettings::load: no channel for back-end " << engineIter->first.m_name << endl; #endif m_engines.insert(modProps); } else { m_engines.insert(engineIter->first); } if (m_engineChannels.find(channelName) == m_engineChannels.end()) { m_engineChannels.insert(pair(channelName, true)); } } } // Internal indices addIndex(_("My Web Pages"), m_docsIndexLocation, true); addIndex(_("My Documents"), m_daemonIndexLocation, true); if (loadedUIConfiguration == false) { // Add default labels m_labels.insert(_("Important")); m_labels.insert(_("New")); m_labels.insert(_("Personal")); m_isBlackList = getDefaultPatterns(m_filePatternsList); // Create default queries #ifdef HAVE_PWD_H struct passwd *pPasswd = getpwuid(geteuid()); if (pPasswd != NULL) { string userName; if ((pPasswd->pw_gecos != NULL) && (strlen(pPasswd->pw_gecos) > 0)) { userName = pPasswd->pw_gecos; } else if (pPasswd->pw_name != NULL) { userName = pPasswd->pw_name; } if (userName.empty() == false) { QueryProperties queryProps(_("Me"), string("\"") + userName + string("\"")); queryProps.setSortOrder(QueryProperties::DATE_DESC); addQuery(queryProps); } } #endif QueryProperties queryProps(_("Latest First"), "dir:/"); queryProps.setSortOrder(QueryProperties::DATE_DESC); addQuery(queryProps); queryProps = QueryProperties(_("10kb And Smaller"), "0..10240b"); queryProps.setSortOrder(QueryProperties::SIZE_DESC); addQuery(queryProps); addQuery(QueryProperties(_("Home Stuff"), string("dir:") + getHomeDirectory())); addQuery(QueryProperties(_("With Label New"), "label:New")); addQuery(QueryProperties(_("Have CJKV"), "tokens:CJKV")); addQuery(QueryProperties(_("In English"), "lang:en")); addQuery(QueryProperties("Pinot search", "pinot search")); } return true; } bool PinotSettings::loadSearchEngines(const string &directoryName) { if 
(directoryName.empty() == true) { return true; } DIR *pDir = opendir(directoryName.c_str()); if (pDir == NULL) { return false; } // Iterate through this directory's entries struct dirent *pDirEntry = readdir(pDir); while (pDirEntry != NULL) { char *pEntryName = pDirEntry->d_name; if (pEntryName != NULL) { struct stat fileStat; string location = directoryName; location += "/"; location += pEntryName; // Is that a file ? if ((stat(location.c_str(), &fileStat) == 0) && (S_ISREG(fileStat.st_mode))) { SearchPluginProperties properties; if ((PluginWebEngine::getDetails(location, properties) == true) && (properties.m_name.empty() == false) && (properties.m_longName.empty() == false)) { m_engineIds[1 << m_engines.size()] = properties.m_longName; if (properties.m_channel.empty() == true) { properties.m_channel = _("Unclassified"); } // SearchPluginProperties derives from ModuleProperties m_engines.insert(properties); if (m_engineChannels.find(properties.m_channel) == m_engineChannels.end()) { m_engineChannels.insert(pair(properties.m_channel, true)); } // Any editable parameters in this plugin ?
for (map::const_iterator editableIter = properties.m_editableParameters.begin(); editableIter != properties.m_editableParameters.end(); ++editableIter) { // This may have been created when loading settings if (m_editablePluginValues.find(editableIter->second) == m_editablePluginValues.end()) { m_editablePluginValues[editableIter->second] = ""; } } #ifdef DEBUG clog << "PinotSettings::loadSearchEngines: " << properties.m_name << ", " << properties.m_longName << ", " << properties.m_option << " has " << properties.m_editableParameters.size() << " editable values" << endl; #endif } } } // Next entry pDirEntry = readdir(pDir); } closedir(pDir); return true; } bool PinotSettings::loadConfiguration(const string &fileName, bool isGlobal) { struct stat fileStat; bool success = true; if ((stat(fileName.c_str(), &fileStat) != 0) || (!S_ISREG(fileStat.st_mode))) { clog << "Couldn't open settings file " << fileName << endl; return false; } try { // Parse the settings file DomParser parser; parser.set_substitute_entities(true); parser.parse_file(fileName); xmlpp::Document *pDocument = parser.get_document(); if (pDocument == NULL) { return false; } Element *pRootElem = pDocument->get_root_node(); if (pRootElem == NULL) { return false; } // Check the top-level element is what we expect ustring rootNodeName = pRootElem->get_name(); if (rootNodeName != "pinot") { return false; } // Go through the subnodes Node::NodeList childNodes = pRootElem->get_children(); if (childNodes.empty() == false) { for (Node::NodeList::const_iterator iter = childNodes.begin(); iter != childNodes.end(); ++iter) { Node *pNode = (*iter); // All nodes should be elements Element *pElem = dynamic_cast(pNode); if (pElem == NULL) { continue; } string nodeName(pElem->get_name()); string nodeContent(getElementContent(pElem)); if (isGlobal == true) { if (nodeName == "cache") { loadCacheProviders(pElem); } else { // Unsupported element continue; } } else if (nodeName == "version") { m_version = nodeContent; } else 
if (nodeName == "warnaboutversion") { if (nodeContent == "YES") { m_warnAboutVersion = true; } else { m_warnAboutVersion = false; } } else if (nodeName == "backend") { m_defaultBackend = nodeContent; } else if (nodeName == "ui") { loadUi(pElem); } else if (nodeName == "extraindex") { loadIndexes(pElem); } else if (nodeName == "channel") { loadEngineChannels(pElem); } else if (nodeName == "storedquery") { loadQueries(pElem); } else if (nodeName == "label") { loadLabels(pElem); } else if (nodeName == "robots") { if (nodeContent == "IGNORE") { m_ignoreRobotsDirectives = true; } else { m_ignoreRobotsDirectives = false; } } else if (nodeName == "suggestterms") { if (nodeContent == "YES") { m_suggestQueryTerms = true; } else { m_suggestQueryTerms = false; } } else if (nodeName == "newresults") { loadColour(pElem); } else if (nodeName == "proxy") { loadProxy(pElem); } else if (nodeName == "indexable") { loadIndexableLocations(pElem); } else if ((nodeName == "blacklist") || (nodeName == "patterns")) { loadFilePatterns(pElem); } else if (nodeName == "pluginparameters") { loadPluginParameters(pElem); } } } } catch (const std::exception& ex) { clog << "Couldn't parse settings file: " << ex.what() << endl; success = false; } return success; } bool PinotSettings::loadUi(const Element *pElem) { if (pElem == NULL) { return false; } Node::const_NodeList childNodes = pElem->get_children(); if (childNodes.empty() == true) { return false; } for (Node::const_NodeList::iterator iter = childNodes.begin(); iter != childNodes.end(); ++iter) { const Node *pNode = (*iter); const Element *pChildElem = dynamic_cast(pNode); if (pChildElem == NULL) { continue; } string nodeName(pChildElem->get_name()); string nodeContent(getElementContent(pChildElem)); if (nodeName == "xpos") { m_xPos = atoi(nodeContent.c_str()); } else if (nodeName == "ypos") { m_yPos = atoi(nodeContent.c_str()); } else if (nodeName == "width") { m_width = atoi(nodeContent.c_str()); } else if (nodeName == "height") { m_height 
= atoi(nodeContent.c_str()); } else if (nodeName == "panepos") { m_panePos = atoi(nodeContent.c_str()); } else if (nodeName == "expandqueries") { if (nodeContent == "YES") { m_expandQueries = true; } else { m_expandQueries = false; } } else if (nodeName == "showengines") { if (nodeContent == "YES") { m_showEngines = true; } else { m_showEngines = false; } } } return true; } bool PinotSettings::loadIndexes(const Element *pElem) { if (pElem == NULL) { return false; } Node::const_NodeList childNodes = pElem->get_children(); if (childNodes.empty() == true) { return false; } string indexName, indexLocation; for (Node::const_NodeList::iterator iter = childNodes.begin(); iter != childNodes.end(); ++iter) { const Node *pNode = (*iter); const Element *pChildElem = dynamic_cast(pNode); if (pChildElem == NULL) { continue; } string nodeName(pChildElem->get_name()); string nodeContent(getElementContent(pChildElem)); if (nodeName == "name") { indexName = nodeContent; } else if (nodeName == "location") { indexLocation = nodeContent; } } if ((indexName.empty() == false) && (indexLocation.empty() == false)) { addIndex(indexName, indexLocation, false); } return true; } bool PinotSettings::loadEngineChannels(const Element *pElem) { if (pElem == NULL) { return false; } Node::const_NodeList childNodes = pElem->get_children(); if (childNodes.empty() == true) { return false; } for (Node::const_NodeList::iterator iter = childNodes.begin(); iter != childNodes.end(); ++iter) { const Node *pNode = (*iter); const Element *pChildElem = dynamic_cast(pNode); if (pChildElem == NULL) { continue; } string nodeName(pChildElem->get_name()); string nodeContent(getElementContent(pChildElem)); if (nodeName == "name") { std::map::iterator channelIter = m_engineChannels.find(nodeContent); if (channelIter != m_engineChannels.end()) { channelIter->second = false; } else { m_engineChannels.insert(pair(nodeContent, false)); } } } return true; } bool PinotSettings::loadQueries(const Element *pElem) { if (pElem 
== NULL) { return false; } Node::const_NodeList childNodes = pElem->get_children(); if (childNodes.empty() == true) { return false; } QueryProperties queryProps; Date minDate, maxDate; string freeQuery; bool enableMinDate = false, enableMaxDate = false; // Load the query's properties for (Node::const_NodeList::iterator iter = childNodes.begin(); iter != childNodes.end(); ++iter) { const Node *pNode = (*iter); const Element *pChildElem = dynamic_cast(pNode); if (pChildElem == NULL) { continue; } string nodeName(pChildElem->get_name()); string nodeContent(getElementContent(pChildElem)); if (nodeName == "name") { queryProps.setName(nodeContent); } else if (nodeName == "sortorder") { if ((nodeContent == "DATE") || (nodeContent == "DATE_DESC")) { queryProps.setSortOrder(QueryProperties::DATE_DESC); } else if (nodeContent == "DATE_ASC") { queryProps.setSortOrder(QueryProperties::DATE_ASC); } else if (nodeContent == "SIZE_DESC") { queryProps.setSortOrder(QueryProperties::SIZE_DESC); } else { queryProps.setSortOrder(QueryProperties::RELEVANCE); } } else if (nodeName == "text") { freeQuery = nodeContent; } else if ((nodeName == "and") && (nodeContent.empty() == false)) { if (freeQuery.empty() == false) { freeQuery += " "; } freeQuery += nodeContent; } else if ((nodeName == "phrase") && (nodeContent.empty() == false)) { if (freeQuery.empty() == false) { freeQuery += " "; } freeQuery += "\""; freeQuery += nodeContent; freeQuery += "\""; } else if ((nodeName == "any") && (nodeContent.empty() == false)) { // FIXME: don't be lazy and add those correctly if (freeQuery.empty() == false) { freeQuery += " "; } freeQuery += nodeContent; } else if ((nodeName == "not") && (nodeContent.empty() == false)) { if (freeQuery.empty() == false) { freeQuery += " "; } freeQuery += "-("; freeQuery += nodeContent; freeQuery += ")"; } else if ((nodeName == "language") && (nodeContent.empty() == false)) { if (freeQuery.empty() == false) { freeQuery += " "; } freeQuery += "lang:"; freeQuery += 
nodeContent; } else if ((nodeName == "stemlanguage") && (nodeContent.empty() == false)) { queryProps.setStemmingLanguage(Languages::toLocale(nodeContent)); } else if ((nodeName == "hostfilter") && (nodeContent.empty() == false)) { if (freeQuery.empty() == false) { freeQuery += " "; } freeQuery += "site:"; freeQuery += nodeContent; } else if ((nodeName == "filefilter") && (nodeContent.empty() == false)) { if (freeQuery.empty() == false) { freeQuery += " "; } freeQuery += "file:"; freeQuery += nodeContent; } else if ((nodeName == "labelfilter") && (nodeContent.empty() == false)) { if (freeQuery.empty() == false) { freeQuery += " "; } freeQuery += "label:"; freeQuery += nodeContent; } else if (nodeName == "maxresults") { int count = atoi(nodeContent.c_str()); queryProps.setMaximumResultsCount((unsigned int)max(count, 10)); } else if (nodeName == "enablemindate") { if (nodeContent == "YES") { enableMinDate = true; } } else if (nodeName == "mindate") { minDate.set_parse(nodeContent); } else if (nodeName == "enablemaxdate") { if (nodeContent == "YES") { enableMaxDate = true; } } else if (nodeName == "maxdate") { maxDate.set_parse(nodeContent); } else if (nodeName == "index") { if (nodeContent == "NEW") { queryProps.setIndexResults(QueryProperties::NEW_RESULTS); } else if (nodeContent == "ALL") { queryProps.setIndexResults(QueryProperties::ALL_RESULTS); } else { queryProps.setIndexResults(QueryProperties::NOTHING); } } else if (nodeName == "label") { queryProps.setLabelName(nodeContent); } else if (nodeName == "modified") { if (nodeContent == "YES") { queryProps.setModified(true); } } } // Are pre-0.80 dates specified ? 
if ((enableMinDate == true) || (enableMaxDate == true)) { // Provide reasonable defaults if (enableMinDate == false) { minDate.set_day(1); minDate.set_month(Date::JANUARY); minDate.set_year(1970); } if (enableMaxDate == false) { maxDate.set_day(31); maxDate.set_month(Date::DECEMBER); maxDate.set_year(2099); } ustring minDateStr(minDate.format_string("%Y%m%d")); ustring maxDateStr(maxDate.format_string("%Y%m%d")); #ifdef DEBUG clog << "PinotSettings::loadQueries: date range " << minDateStr << " to " << maxDateStr << endl; #endif freeQuery += " "; freeQuery += minDateStr; freeQuery += ".."; freeQuery += maxDateStr; } // We need at least a name if (queryProps.getName().empty() == false) { if (freeQuery.empty() == false) { queryProps.setFreeQuery(freeQuery); } m_queries[queryProps.getName()] = queryProps; } return true; } bool PinotSettings::loadLabels(const Element *pElem) { if (pElem == NULL) { return false; } Node::const_NodeList childNodes = pElem->get_children(); if (childNodes.empty() == true) { return false; } // Load the label's properties for (Node::const_NodeList::iterator iter = childNodes.begin(); iter != childNodes.end(); ++iter) { const Node *pNode = (*iter); const Element *pChildElem = dynamic_cast(pNode); if (pChildElem == NULL) { continue; } string nodeName(pChildElem->get_name()); string nodeContent(getElementContent(pChildElem)); if (nodeName == "name") { m_labels.insert(nodeContent); } // Labels used to have colours... 
} return true; } bool PinotSettings::loadColour(const Element *pElem) { if (pElem == NULL) { return false; } Node::const_NodeList childNodes = pElem->get_children(); if (childNodes.empty() == true) { return false; } // Load the colour RGB components for (Node::const_NodeList::iterator iter = childNodes.begin(); iter != childNodes.end(); ++iter) { const Node *pNode = (*iter); const Element *pChildElem = dynamic_cast(pNode); if (pChildElem == NULL) { continue; } string nodeName(pChildElem->get_name()); string nodeContent(getElementContent(pChildElem)); gushort value = (gushort)atoi(nodeContent.c_str()); if (nodeName == "red") { m_newResultsColourRed = value; } else if (nodeName == "green") { m_newResultsColourGreen = value; } else if (nodeName == "blue") { m_newResultsColourBlue = value; } } return true; } bool PinotSettings::loadProxy(const Element *pElem) { if (pElem == NULL) { return false; } Node::const_NodeList childNodes = pElem->get_children(); if (childNodes.empty() == true) { return false; } for (Node::const_NodeList::iterator iter = childNodes.begin(); iter != childNodes.end(); ++iter) { const Node *pNode = (*iter); const Element *pChildElem = dynamic_cast(pNode); if (pChildElem == NULL) { continue; } string nodeName(pChildElem->get_name()); string nodeContent(getElementContent(pChildElem)); if (nodeName == "address") { m_proxyAddress = nodeContent; } else if (nodeName == "port") { m_proxyPort = (unsigned int)atoi(nodeContent.c_str()); } else if (nodeName == "type") { m_proxyType = nodeContent; } else if (nodeName == "enable") { if (nodeContent == "YES") { m_proxyEnabled = true; } else { m_proxyEnabled = false; } } } return true; } bool PinotSettings::loadIndexableLocations(const Element *pElem) { if (pElem == NULL) { return false; } Node::const_NodeList childNodes = pElem->get_children(); if (childNodes.empty() == true) { return false; } IndexableLocation location; // Load the indexable location's properties for (Node::const_NodeList::iterator iter = 
childNodes.begin(); iter != childNodes.end(); ++iter) { const Node *pNode = (*iter); const Element *pChildElem = dynamic_cast(pNode); if (pChildElem == NULL) { continue; } string nodeName(pChildElem->get_name()); string nodeContent(getElementContent(pChildElem)); if (nodeName == "name") { location.m_name = nodeContent; } else if (nodeName == "monitor") { if (nodeContent == "YES") { location.m_monitor = true; } else { location.m_monitor = false; } } } if (location.m_name.empty() == false) { location.m_isSource = true; m_indexableLocations.insert(location); } return true; } bool PinotSettings::loadFilePatterns(const Element *pElem) { if (pElem == NULL) { return false; } Node::const_NodeList childNodes = pElem->get_children(); if (childNodes.empty() == true) { return false; } // Load the file patterns list for (Node::const_NodeList::iterator iter = childNodes.begin(); iter != childNodes.end(); ++iter) { const Node *pNode = (*iter); const Element *pChildElem = dynamic_cast(pNode); if (pChildElem == NULL) { continue; } string nodeName(pChildElem->get_name()); string nodeContent(getElementContent(pChildElem)); if (nodeName == "pattern") { m_filePatternsList.insert(nodeContent); } else if (nodeName == "forbid") { if (nodeContent == "YES") { m_isBlackList = true; } else { m_isBlackList = false; } } } return true; } bool PinotSettings::loadPluginParameters(const Element *pElem) { if (pElem == NULL) { return false; } Node::const_NodeList childNodes = pElem->get_children(); if (childNodes.empty() == true) { return false; } string name, value; // Load the plugin parameters' values for (Node::const_NodeList::iterator iter = childNodes.begin(); iter != childNodes.end(); ++iter) { const Node *pNode = (*iter); const Element *pChildElem = dynamic_cast(pNode); if (pChildElem == NULL) { continue; } string nodeName(pChildElem->get_name()); string nodeContent(getElementContent(pChildElem)); if (nodeName == "name") { name = nodeContent; } else if (nodeName == "value") { value = 
nodeContent; } } m_editablePluginValues[name] = value; return true; } bool PinotSettings::loadCacheProviders(const Element *pElem) { if (pElem == NULL) { return false; } Node::const_NodeList childNodes = pElem->get_children(); if (childNodes.empty() == true) { return false; } CacheProvider cacheProvider; // Load the cache provider's properties for (Node::const_NodeList::iterator iter = childNodes.begin(); iter != childNodes.end(); ++iter) { const Node *pNode = (*iter); const Element *pChildElem = dynamic_cast(pNode); if (pChildElem == NULL) { continue; } string nodeName(pChildElem->get_name()); string nodeContent(getElementContent(pChildElem)); if (nodeName == "name") { cacheProvider.m_name = nodeContent; } else if (nodeName == "location") { cacheProvider.m_location = nodeContent; } else if (nodeName == "protocols") { nodeContent += ","; ustring::size_type previousPos = 0, commaPos = nodeContent.find(","); while (commaPos != ustring::npos) { string protocol(nodeContent.substr(previousPos, commaPos - previousPos)); StringManip::trimSpaces(protocol); cacheProvider.m_protocols.insert(protocol); // Next previousPos = commaPos + 1; commaPos = nodeContent.find(",", previousPos); } } } if ((cacheProvider.m_name.empty() == false) && (cacheProvider.m_location.empty() == false)) { m_cacheProviders.push_back(cacheProvider); // Copy the list of protocols supported by this cache provider copy(cacheProvider.m_protocols.begin(), cacheProvider.m_protocols.end(), inserter(m_cacheProtocols, m_cacheProtocols.begin())); } return true; } bool PinotSettings::save(SaveWhat what) { Element *pRootElem = NULL; Element *pElem = NULL; char numStr[64]; bool prefsOrUI = true; if (what == SAVE_CONFIG) { prefsOrUI = false; } try { xmlpp::Document doc("1.0"); // Create a new node pRootElem = doc.create_root_node("pinot"); if (pRootElem == NULL) { return false; } // ...with text children nodes addChildElement(pRootElem, "version", VERSION); if (what == SAVE_CONFIG) { addChildElement(pRootElem, 
"warnaboutversion", (m_warnAboutVersion ? "YES" : "NO")); // User interface position and size pElem = pRootElem->add_child_element("ui"); if (pElem == NULL) { return false; } snprintf(numStr, 64, "%d", m_xPos); addChildElement(pElem, "xpos", numStr); snprintf(numStr, 64, "%d", m_yPos); addChildElement(pElem, "ypos", numStr); snprintf(numStr, 64, "%d", m_width); addChildElement(pElem, "width", numStr); snprintf(numStr, 64, "%d", m_height); addChildElement(pElem, "height", numStr); snprintf(numStr, 64, "%d", m_panePos); addChildElement(pElem, "panepos", numStr); addChildElement(pElem, "expandqueries", (m_expandQueries ? "YES" : "NO")); addChildElement(pElem, "showengines", (m_showEngines ? "YES" : "NO")); // User-defined indexes for (set::iterator indexIter = m_indexes.begin(); indexIter != m_indexes.end(); ++indexIter) { if (indexIter->m_internal == true) { continue; } pElem = pRootElem->add_child_element("extraindex"); if (pElem == NULL) { return false; } addChildElement(pElem, "name", indexIter->m_name); addChildElement(pElem, "location", indexIter->m_location); } // Engine channels for (map::iterator channelIter = m_engineChannels.begin(); channelIter != m_engineChannels.end(); ++channelIter) { // Only save those whose group was collapsed if (channelIter->second == false) { pElem = pRootElem->add_child_element("channel"); if (pElem == NULL) { return false; } addChildElement(pElem, "name", channelIter->first); } } // User-defined queries for (map::iterator queryIter = m_queries.begin(); queryIter != m_queries.end(); ++queryIter) { pElem = pRootElem->add_child_element("storedquery"); if (pElem == NULL) { return false; } string sortOrder("RELEVANCE"); if (queryIter->second.getSortOrder() == QueryProperties::DATE_DESC) { sortOrder = "DATE"; } else if (queryIter->second.getSortOrder() == QueryProperties::DATE_ASC) { sortOrder = "DATE_ASC"; } else if (queryIter->second.getSortOrder() == QueryProperties::SIZE_DESC) { sortOrder = "SIZE_DESC"; } addChildElement(pElem, 
"name", queryIter->first); addChildElement(pElem, "sortorder", sortOrder); addChildElement(pElem, "text", queryIter->second.getFreeQuery()); addChildElement(pElem, "stemlanguage", Languages::toEnglish(queryIter->second.getStemmingLanguage())); snprintf(numStr, 64, "%u", queryIter->second.getMaximumResultsCount()); addChildElement(pElem, "maxresults", numStr); QueryProperties::IndexWhat indexResults = queryIter->second.getIndexResults(); if (indexResults == QueryProperties::NEW_RESULTS) { addChildElement(pElem, "index", "NEW"); } else if (indexResults == QueryProperties::ALL_RESULTS) { addChildElement(pElem, "index", "ALL"); } else { addChildElement(pElem, "index", "NONE"); } addChildElement(pElem, "label", queryIter->second.getLabelName()); addChildElement(pElem, "modified", (queryIter->second.getModified() ? "YES" : "NO")); } } if (what == SAVE_PREFS) { addChildElement(pRootElem, "backend", m_defaultBackend); // Labels for (set::iterator labelIter = m_labels.begin(); labelIter != m_labels.end(); ++labelIter) { pElem = pRootElem->add_child_element("label"); if (pElem == NULL) { return false; } addChildElement(pElem, "name", *labelIter); } // Ignore robots directives addChildElement(pRootElem, "robots", (m_ignoreRobotsDirectives ? "IGNORE" : "OBEY")); // Enable terms suggestion addChildElement(pRootElem, "suggestterms", (m_suggestQueryTerms ? 
"YES" : "NO")); // New results colour pElem = pRootElem->add_child_element("newresults"); if (pElem == NULL) { return false; } snprintf(numStr, 64, "%u", m_newResultsColourRed); addChildElement(pElem, "red", numStr); snprintf(numStr, 64, "%u", m_newResultsColourGreen); addChildElement(pElem, "green", numStr); snprintf(numStr, 64, "%u", m_newResultsColourBlue); addChildElement(pElem, "blue", numStr); // Proxy pElem = pRootElem->add_child_element("proxy"); if (pElem == NULL) { return false; } addChildElement(pElem, "address", m_proxyAddress); snprintf(numStr, 64, "%u", m_proxyPort); addChildElement(pElem, "port", numStr); addChildElement(pElem, "type", m_proxyType); addChildElement(pElem, "enable", (m_proxyEnabled ? "YES" : "NO")); // Locations to index for (set::iterator locationIter = m_indexableLocations.begin(); locationIter != m_indexableLocations.end(); ++locationIter) { pElem = pRootElem->add_child_element("indexable"); if (pElem == NULL) { return false; } addChildElement(pElem, "name", locationIter->m_name); addChildElement(pElem, "monitor", (locationIter->m_monitor ? "YES" : "NO")); } // File patterns pElem = pRootElem->add_child_element("patterns"); if (pElem == NULL) { return false; } for (set::iterator patternIter = m_filePatternsList.begin(); patternIter != m_filePatternsList.end() ; ++patternIter) { addChildElement(pElem, "pattern", *patternIter); } addChildElement(pElem, "forbid", (m_isBlackList ? 
"YES" : "NO")); // Values of editable plugin parameters for (map::iterator editableIter = m_editablePluginValues.begin(); editableIter != m_editablePluginValues.end() ; ++editableIter) { if (editableIter->second.empty() == true) { continue; } pElem = pRootElem->add_child_element("pluginparameters"); if (pElem == NULL) { return false; } addChildElement(pElem, "name", editableIter->first); addChildElement(pElem, "value", editableIter->second); } } #ifdef DEBUG clog << "PinotSettings::save: saving to " << getFileName(prefsOrUI) << endl; #endif // Save to file doc.write_to_file_formatted(getFileName(prefsOrUI)); } catch (const std::exception& ex) { clog << "Couldn't save settings file: " << ex.what() << endl; return false; } return true; } /// Returns the indexes set. const set &PinotSettings::getIndexes(void) const { return m_indexes; } /// Adds a new index. bool PinotSettings::addIndex(const ustring &name, const string &location, bool isInternal) { unsigned int indexId(1 << m_indexCount); m_indexes.insert(IndexProperties(name, location, indexId, isInternal)); #ifdef DEBUG clog << "PinotSettings::addIndex: index " << m_indexCount << " is " << name << " with ID " << indexId << endl; #endif ++m_indexCount; return true; } /// Removes an index. bool PinotSettings::removeIndex(const IndexProperties &indexProps) { // Remove from the names map set::iterator namesMapIter = m_indexes.find(indexProps); if (namesMapIter != m_indexes.end()) { m_indexes.erase(namesMapIter); return true; } return false; } /// Clears the indexes map. void PinotSettings::clearIndexes(void) { // Clear and reinsert the internal indexes m_indexes.clear(); m_indexCount = 0; addIndex(_("My Web Pages"), m_docsIndexLocation, true); addIndex(_("My Documents"), m_daemonIndexLocation, true); } /// Returns properties of the given index. 
PinotSettings::IndexProperties PinotSettings::getIndexPropertiesByName(const string &name) const { for (set::const_iterator propsIter = m_indexes.begin(); propsIter != m_indexes.end(); ++propsIter) { if (propsIter->m_name == name) { return *propsIter; } } return IndexProperties(); } /// Returns properties of the given index. PinotSettings::IndexProperties PinotSettings::getIndexPropertiesByLocation(const string &location) const { for (set::const_iterator propsIter = m_indexes.begin(); propsIter != m_indexes.end(); ++propsIter) { if (propsIter->m_location == location) { return *propsIter; } } return IndexProperties(); } /// Returns the name(s) for the given ID. void PinotSettings::getIndexNames(unsigned int id, set &names) { names.clear(); // Make sure indexes are or were defined if (m_indexCount == 0) { return; } unsigned indexId = 1 << (m_indexCount - 1); do { if (id & indexId) { for (set::const_iterator propsIter = m_indexes.begin(); propsIter != m_indexes.end(); ++propsIter) { if (propsIter->m_id == indexId) { #ifdef DEBUG clog << "PinotSettings::getIndexNames: index " << indexId << " is " << propsIter->m_name << endl; #endif // Get the associated name names.insert(propsIter->m_name); } } } // Shift to the right indexId = indexId >> 1; } while (indexId > 0); } /// Returns an IndexInterface for the given index location. IndexInterface *PinotSettings::getIndex(const string &location) { if (location == m_docsIndexLocation) { return ModuleFactory::getIndex(m_defaultBackend, m_docsIndexLocation); } else if ((m_clientMode == true) && (location == m_daemonIndexLocation)) { return ModuleFactory::getIndex("dbus-" + m_defaultBackend, m_daemonIndexLocation); } return ModuleFactory::getIndex(m_defaultBackend, location); } /// Returns the search engines set. 
bool PinotSettings::getSearchEngines(set &engines, const string &channelName) const { if (channelName.empty() == true) { // Copy the whole list of search engines copy(m_engines.begin(), m_engines.end(), inserter(engines, engines.begin())); } else { if (m_engineChannels.find(channelName) == m_engineChannels.end()) { // Unknown channel name return false; } // Copy engines that belong to the given channel for (set::iterator engineIter = m_engines.begin(); engineIter != m_engines.end(); ++engineIter) { if (engineIter->m_channel == channelName) { #ifdef DEBUG clog << "PinotSettings::getSearchEngines: engine " << engineIter->m_longName << " in channel " << channelName << endl; #endif engines.insert(*engineIter); } } } return true; } /// Returns an ID that identifies the given engine name. unsigned int PinotSettings::getEngineId(const string &name) { unsigned int engineId = 0; for (map::iterator mapIter = m_engineIds.begin(); mapIter != m_engineIds.end(); ++mapIter) { if (mapIter->second == name) { engineId = mapIter->first; break; } } #ifdef DEBUG clog << "PinotSettings::getEngineId: " << name << ", ID " << engineId << endl; #endif return engineId; } /// Returns the name for the given ID. void PinotSettings::getEngineNames(unsigned int id, set &names) { names.clear(); // Make sure there are search engines defined if (m_engines.empty() == true) { return; } unsigned engineId = 1 << (m_engines.size() - 1); do { if (id & engineId) { map::iterator mapIter = m_engineIds.find(engineId); if (mapIter != m_engineIds.end()) { // Get the associated name names.insert(mapIter->second); } } // Shift to the right engineId = engineId >> 1; } while (engineId > 0); } /// Returns the search engines channels. map &PinotSettings::getSearchEnginesChannels(void) { return m_engineChannels; } /// Returns the queries map, keyed by name. const map &PinotSettings::getQueries(void) const { return m_queries; } /// Adds a new query. 
bool PinotSettings::addQuery(const QueryProperties &properties) { string name(properties.getName()); map::iterator queryIter = m_queries.find(name); if (queryIter == m_queries.end()) { // Okay, no such query exists m_queries[name] = properties; return true; } return false; } /// Removes a query. bool PinotSettings::removeQuery(const string &name) { // Remove from the map map::iterator queryIter = m_queries.find(name); if (queryIter != m_queries.end()) { m_queries.erase(queryIter); return true; } return false; } /// Clears the queries map. void PinotSettings::clearQueries(void) { m_queries.clear(); } /// Gets default patterns. bool PinotSettings::getDefaultPatterns(set &defaultPatterns) { defaultPatterns.clear(); // Skip common image, video and archive types defaultPatterns.insert("*~"); defaultPatterns.insert("*.a"); defaultPatterns.insert("*.asf"); defaultPatterns.insert("*.avi"); defaultPatterns.insert("*.aux"); defaultPatterns.insert("*CVS"); defaultPatterns.insert("*.cap"); defaultPatterns.insert("*.divx"); defaultPatterns.insert("*.flv"); defaultPatterns.insert("*.git"); defaultPatterns.insert("*.gmo"); defaultPatterns.insert("*.iso"); defaultPatterns.insert("*.la"); defaultPatterns.insert("*.lha"); defaultPatterns.insert("*.lo"); defaultPatterns.insert("*.loT"); defaultPatterns.insert("*.m4"); defaultPatterns.insert("*.mov"); defaultPatterns.insert("*.msf"); defaultPatterns.insert("*.mpeg"); defaultPatterns.insert("*.mp4"); defaultPatterns.insert("*.mpg"); defaultPatterns.insert("*.mo"); defaultPatterns.insert("*.o"); defaultPatterns.insert("*.omf"); defaultPatterns.insert("*.orig"); defaultPatterns.insert("*.part"); defaultPatterns.insert("*.pc"); defaultPatterns.insert("*.po"); defaultPatterns.insert("*.rar"); defaultPatterns.insert("*.rej"); defaultPatterns.insert("*.sh"); defaultPatterns.insert("*.so"); defaultPatterns.insert("*.svn"); defaultPatterns.insert("*.tmp"); defaultPatterns.insert("*.torrent"); defaultPatterns.insert("*.vm*"); 
defaultPatterns.insert("*.wmv"); defaultPatterns.insert("*.xbm"); defaultPatterns.insert("*.xpm"); return true; } /// Determines if a file matches the blacklist. bool PinotSettings::isBlackListed(const string &fileName) { if (m_filePatternsList.empty() == true) { if (m_isBlackList == true) { // There is no black-list return false; } // The file is not in the (empty) whitelist return true; } #ifdef HAVE_FNMATCH_H // Any pattern matches this file name ? for (set::iterator patternIter = m_filePatternsList.begin(); patternIter != m_filePatternsList.end() ; ++patternIter) { if (fnmatch(patternIter->c_str(), fileName.c_str(), FNM_NOESCAPE) == 0) { // Fail if it's in the blacklist, let the file through otherwise return m_isBlackList; } } #endif return !m_isBlackList; } PinotSettings::IndexableLocation::IndexableLocation() : m_monitor(false), m_isSource(true) { } PinotSettings::IndexableLocation::IndexableLocation(const IndexableLocation &other) : m_monitor(other.m_monitor), m_name(other.m_name), m_isSource(other.m_isSource) { } PinotSettings::IndexableLocation::~IndexableLocation() { } PinotSettings::IndexableLocation &PinotSettings::IndexableLocation::operator=(const IndexableLocation &other) { if (this != &other) { m_monitor = other.m_monitor; m_name = other.m_name; m_isSource = other.m_isSource; } return *this; } bool PinotSettings::IndexableLocation::operator<(const IndexableLocation &other) const { if (m_name < other.m_name) { return true; } return false; } bool PinotSettings::IndexableLocation::operator==(const IndexableLocation &other) const { if (m_name == other.m_name) { return true; } return false; } PinotSettings::CacheProvider::CacheProvider() { } PinotSettings::CacheProvider::CacheProvider(const CacheProvider &other) : m_name(other.m_name), m_location(other.m_location) { m_protocols.clear(); copy(other.m_protocols.begin(), other.m_protocols.end(), inserter(m_protocols, m_protocols.begin())); } PinotSettings::CacheProvider::~CacheProvider() { } 
PinotSettings::CacheProvider &PinotSettings::CacheProvider::operator=(const CacheProvider &other) { if (this != &other) { m_name = other.m_name; m_location = other.m_location; m_protocols.clear(); copy(other.m_protocols.begin(), other.m_protocols.end(), inserter(m_protocols, m_protocols.begin())); } return *this; } bool PinotSettings::CacheProvider::operator<(const CacheProvider &other) const { if (m_name < other.m_name) { return true; } return false; } bool PinotSettings::CacheProvider::operator==(const CacheProvider &other) const { if (m_name == other.m_name) { return true; } return false; } PinotSettings::IndexProperties::IndexProperties() : m_id(0), m_internal(false) { } PinotSettings::IndexProperties::IndexProperties(const ustring &name, const string &location, unsigned int id, bool isInternal) : m_name(name), m_location(location), m_id(id), m_internal(isInternal) { } PinotSettings::IndexProperties::IndexProperties(const IndexProperties &other) : m_name(other.m_name), m_location(other.m_location), m_id(other.m_id), m_internal(other.m_internal) { } PinotSettings::IndexProperties::~IndexProperties() { } PinotSettings::IndexProperties& PinotSettings::IndexProperties::operator=(const IndexProperties &other) { if (this != &other) { m_name = other.m_name; m_location = other.m_location; m_id = other.m_id; m_internal = other.m_internal; } return *this; } bool PinotSettings::IndexProperties::operator<(const IndexProperties &other) const { if (m_id < other.m_id) { return true; } else if (m_id == other.m_id) { if (m_name < other.m_name) { return true; } } return false; } bool PinotSettings::IndexProperties::operator==(const IndexProperties &other) const { if (m_id == other.m_id) { return true; } return false; } pinot-1.22/Core/PinotSettings.h000066400000000000000000000160141470740426600164560ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General 
Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _PINOTSETTINGS_HH #define _PINOTSETTINGS_HH #include #include #include #include #include #include #include #include #include "IndexInterface.h" #include "ModuleProperties.h" #include "QueryProperties.h" class PinotSettings { public: ~PinotSettings(); typedef enum { LOAD_ALL = 0, LOAD_GLOBAL, LOAD_LOCAL } LoadWhat; typedef enum { SAVE_PREFS = 0, SAVE_CONFIG } SaveWhat; class IndexProperties { public: IndexProperties(); IndexProperties(const Glib::ustring &name, const std::string &location, unsigned int id, bool isInternal); IndexProperties(const IndexProperties &other); virtual ~IndexProperties(); IndexProperties& operator=(const IndexProperties &other); bool operator<(const IndexProperties &other) const; bool operator==(const IndexProperties &other) const; Glib::ustring m_name; std::string m_location; unsigned int m_id; bool m_internal; }; static PinotSettings &getInstance(void); static bool setClientMode(bool enable); static bool getClientMode(void); static std::string getHomeDirectory(void); static std::string getConfigurationDirectory(void); static std::string getFileName(bool prefsOrUI); static std::string getCurrentUserName(void); static void checkHistoryDatabase(void); static std::string getHistoryDatabaseName(bool needToQueryDaemonHistory = false); bool isFirstRun(void) const; void clear(void); bool load(LoadWhat what); bool save(SaveWhat what); /// Returns the indexes 
set. const std::set &getIndexes(void) const; /// Adds a new index. bool addIndex(const Glib::ustring &name, const std::string &location, bool isInternal = false); /// Removes an index. bool removeIndex(const IndexProperties &indexProps); /// Clears the indexes map. void clearIndexes(void); /// Returns properties of the given index. IndexProperties getIndexPropertiesByName(const std::string &name) const; /// Returns properties of the given index. IndexProperties getIndexPropertiesByLocation(const std::string &location) const; /// Returns the name(s) for the given ID. void getIndexNames(unsigned int id, std::set &names); /// Returns an IndexInterface for the given index location. IndexInterface *getIndex(const std::string &location); /// Returns the search engines set. bool getSearchEngines(std::set &engines, const std::string &channelName = "") const; /// Returns an ID that identifies the given engine name. unsigned int getEngineId(const std::string &name); /// Returns the name(s) for the given ID. void getEngineNames(unsigned int id, std::set &names); /// Returns the search engines channels. std::map &getSearchEnginesChannels(void); /// Returns the queries map, keyed by name. const std::map &getQueries(void) const; /// Adds a new query. bool addQuery(const QueryProperties &properties); /// Removes a query. bool removeQuery(const std::string &name); /// Clears the queries map. void clearQueries(void); /// Gets default patterns. bool getDefaultPatterns(std::set &defaultPatterns); /// Determines if a file matches the blacklist. 
bool isBlackListed(const std::string &fileName); class IndexableLocation { public: IndexableLocation(); IndexableLocation(const IndexableLocation &other); ~IndexableLocation(); IndexableLocation &operator=(const IndexableLocation &other); bool operator<(const IndexableLocation &other) const; bool operator==(const IndexableLocation &other) const; bool m_monitor; Glib::ustring m_name; bool m_isSource; }; class CacheProvider { public: CacheProvider(); CacheProvider(const CacheProvider &other); ~CacheProvider(); CacheProvider &operator=(const CacheProvider &other); bool operator<(const CacheProvider &other) const; bool operator==(const CacheProvider &other) const; Glib::ustring m_name; Glib::ustring m_location; std::set m_protocols; }; Glib::ustring m_version; bool m_warnAboutVersion; Glib::ustring m_defaultBackend; Glib::ustring m_docsIndexLocation; Glib::ustring m_daemonIndexLocation; double m_minimumDiskSpace; int m_xPos; int m_yPos; int m_width; int m_height; int m_panePos; bool m_showEngines; bool m_expandQueries; bool m_ignoreRobotsDirectives; bool m_suggestQueryTerms; unsigned short m_newResultsColourRed; unsigned short m_newResultsColourGreen; unsigned short m_newResultsColourBlue; Glib::ustring m_proxyAddress; unsigned int m_proxyPort; Glib::ustring m_proxyType; bool m_proxyEnabled; std::set m_labels; std::set m_indexableLocations; std::set m_filePatternsList; bool m_isBlackList; std::map m_editablePluginValues; std::vector m_cacheProviders; std::set m_cacheProtocols; protected: static PinotSettings m_instance; static bool m_clientMode; bool m_firstRun; std::set m_indexes; unsigned int m_indexCount; std::set m_engines; std::map m_engineIds; std::map m_engineChannels; std::map m_queries; PinotSettings(); bool loadSearchEngines(const std::string &directoryName); bool loadConfiguration(const std::string &fileName, bool isGlobal); bool loadUi(const xmlpp::Element *pElem); bool loadIndexes(const xmlpp::Element *pElem); bool loadEngineChannels(const xmlpp::Element 
*pElem); bool loadQueries(const xmlpp::Element *pElem); bool loadLabels(const xmlpp::Element *pElem); bool loadColour(const xmlpp::Element *pElem); bool loadProxy(const xmlpp::Element *pElem); bool loadIndexableLocations(const xmlpp::Element *pElem); bool loadFilePatterns(const xmlpp::Element *pElem); bool loadPluginParameters(const xmlpp::Element *pElem); bool loadCacheProviders(const xmlpp::Element *pElem); private: PinotSettings(const PinotSettings &other); PinotSettings &operator=(const PinotSettings &other); }; #endif // _PINOTSETTINGS_HH pinot-1.22/Core/SearchProvider_common.cpp000066400000000000000000000000701470740426600204620ustar00rootroot00000000000000#include "SearchProvider_common.h" #include pinot-1.22/Core/SearchProvider_common.h000066400000000000000000000077231470740426600201430ustar00rootroot00000000000000#pragma once #include #include #include "glibmm.h" #include "giomm.h" namespace org { namespace gnome { namespace Shell { class SearchProvider2TypeWrap { public: template static void unwrapList(std::vector &list, const Glib::VariantContainerBase &wrapped) { for (uint i = 0; i < wrapped.get_n_children(); i++) { Glib::Variant item; wrapped.get_child(item, i); list.push_back(item.get()); } } static std::vector stdStringVecToGlibStringVec(const std::vector &strv) { std::vector newStrv; for (uint i = 0; i < strv.size(); i++) { newStrv.push_back(strv[i]); } return newStrv; } static std::vector glibStringVecToStdStringVec(const std::vector &strv) { std::vector newStrv; for (uint i = 0; i < strv.size(); i++) { newStrv.push_back(strv[i]); } return newStrv; } static Glib::VariantContainerBase GetInitialResultSet_pack( const std::vector & arg_terms) { Glib::VariantContainerBase base; Glib::Variant> params = Glib::Variant>::create(arg_terms); return Glib::VariantContainerBase::create_tuple(params); } static Glib::VariantContainerBase GetSubsearchResultSet_pack( const std::vector & arg_previous_results, const std::vector & arg_terms) { 
Glib::VariantContainerBase base; std::vector params; Glib::Variant> previous_results_param = Glib::Variant>::create(arg_previous_results); params.push_back(previous_results_param); Glib::Variant> terms_param = Glib::Variant>::create(arg_terms); params.push_back(terms_param); return Glib::VariantContainerBase::create_tuple(params); } static Glib::VariantContainerBase GetResultMetas_pack( const std::vector & arg_identifiers) { Glib::VariantContainerBase base; Glib::Variant> params = Glib::Variant>::create(arg_identifiers); return Glib::VariantContainerBase::create_tuple(params); } static Glib::VariantContainerBase ActivateResult_pack( const Glib::ustring & arg_identifier, const std::vector & arg_terms, guint32 arg_timestamp) { Glib::VariantContainerBase base; std::vector params; Glib::Variant identifier_param = Glib::Variant::create(arg_identifier); params.push_back(identifier_param); Glib::Variant> terms_param = Glib::Variant>::create(arg_terms); params.push_back(terms_param); Glib::Variant timestamp_param = Glib::Variant::create(arg_timestamp); params.push_back(timestamp_param); return Glib::VariantContainerBase::create_tuple(params); } static Glib::VariantContainerBase LaunchSearch_pack( const std::vector & arg_terms, guint32 arg_timestamp) { Glib::VariantContainerBase base; std::vector params; Glib::Variant> terms_param = Glib::Variant>::create(arg_terms); params.push_back(terms_param); Glib::Variant timestamp_param = Glib::Variant::create(arg_timestamp); params.push_back(timestamp_param); return Glib::VariantContainerBase::create_tuple(params); } }; } // Shell } // gnome } // org pinot-1.22/Core/SearchProvider_stub.cpp000066400000000000000000000276011470740426600201600ustar00rootroot00000000000000static const char interfaceXml0[] = R"XML_DELIMITER( )XML_DELIMITER"; #include "SearchProvider_stub.h" template inline T specialGetter(Glib::Variant variant) { return variant.get(); } template<> inline std::string specialGetter(Glib::Variant variant) { // String is not 
guaranteed to be null-terminated, so don't use ::get() gsize n_elem; gsize elem_size = sizeof(char); char* data = (char*)g_variant_get_fixed_array(variant.gobj(), &n_elem, elem_size); return std::string(data, n_elem); } org::gnome::Shell::SearchProvider2Stub::SearchProvider2Stub(): m_interfaceName("org.gnome.Shell.SearchProvider2") { } org::gnome::Shell::SearchProvider2Stub::~SearchProvider2Stub() { unregister_object(); } guint org::gnome::Shell::SearchProvider2Stub::register_object( const Glib::RefPtr &connection, const Glib::ustring &object_path) { if (!introspection_data) { try { introspection_data = Gio::DBus::NodeInfo::create_for_xml(interfaceXml0); } catch(const Glib::Error& ex) { g_warning("Unable to create introspection data for %s: %s", object_path.c_str(), ex.what().c_str()); return 0; } } Gio::DBus::InterfaceVTable *interface_vtable = new Gio::DBus::InterfaceVTable( sigc::mem_fun(this, &SearchProvider2Stub::on_method_call), sigc::mem_fun(this, &SearchProvider2Stub::on_interface_get_property), sigc::mem_fun(this, &SearchProvider2Stub::on_interface_set_property)); guint registration_id; try { registration_id = connection->register_object(object_path, introspection_data->lookup_interface("org.gnome.Shell.SearchProvider2"), *interface_vtable); } catch(const Glib::Error &ex) { g_warning("Registration of object %s failed: %s", object_path.c_str(), ex.what().c_str()); return 0; } m_registered_objects.emplace_back(RegisteredObject { registration_id, connection, object_path }); return registration_id; } void org::gnome::Shell::SearchProvider2Stub::unregister_object() { for (const RegisteredObject &obj: m_registered_objects) { obj.connection->unregister_object(obj.id); } m_registered_objects.clear(); } void org::gnome::Shell::SearchProvider2Stub::on_method_call( const Glib::RefPtr &/* connection */, const Glib::ustring &/* sender */, const Glib::ustring &/* object_path */, const Glib::ustring &/* interface_name */, const Glib::ustring &method_name, const 
Glib::VariantContainerBase &parameters,
    const Glib::RefPtr<Gio::DBus::MethodInvocation> &invocation)
{
    static_cast<void>(method_name); // maybe unused
    static_cast<void>(parameters); // maybe unused
    static_cast<void>(invocation); // maybe unused

    if (method_name.compare("GetInitialResultSet") == 0) {
        Glib::Variant<std::vector<Glib::ustring>> base_terms;
        parameters.get_child(base_terms, 0);
        std::vector<Glib::ustring> p_terms = specialGetter(base_terms);

        MethodInvocation methodInvocation(invocation);
        GetInitialResultSet(
            (p_terms),
            methodInvocation);
    }

    if (method_name.compare("GetSubsearchResultSet") == 0) {
        Glib::Variant<std::vector<Glib::ustring>> base_previous_results;
        parameters.get_child(base_previous_results, 0);
        std::vector<Glib::ustring> p_previous_results = specialGetter(base_previous_results);

        Glib::Variant<std::vector<Glib::ustring>> base_terms;
        parameters.get_child(base_terms, 1);
        std::vector<Glib::ustring> p_terms = specialGetter(base_terms);

        MethodInvocation methodInvocation(invocation);
        GetSubsearchResultSet(
            (p_previous_results),
            (p_terms),
            methodInvocation);
    }

    if (method_name.compare("GetResultMetas") == 0) {
        Glib::Variant<std::vector<Glib::ustring>> base_identifiers;
        parameters.get_child(base_identifiers, 0);
        std::vector<Glib::ustring> p_identifiers = specialGetter(base_identifiers);

        MethodInvocation methodInvocation(invocation);
        GetResultMetas(
            (p_identifiers),
            methodInvocation);
    }

    if (method_name.compare("ActivateResult") == 0) {
        Glib::Variant<Glib::ustring> base_identifier;
        parameters.get_child(base_identifier, 0);
        Glib::ustring p_identifier = specialGetter(base_identifier);

        Glib::Variant<std::vector<Glib::ustring>> base_terms;
        parameters.get_child(base_terms, 1);
        std::vector<Glib::ustring> p_terms = specialGetter(base_terms);

        Glib::Variant<guint32> base_timestamp;
        parameters.get_child(base_timestamp, 2);
        guint32 p_timestamp = specialGetter(base_timestamp);

        MethodInvocation methodInvocation(invocation);
        ActivateResult(
            (p_identifier),
            (p_terms),
            (p_timestamp),
            methodInvocation);
    }

    if (method_name.compare("LaunchSearch") == 0) {
        Glib::Variant<std::vector<Glib::ustring>> base_terms;
        parameters.get_child(base_terms, 0);
        std::vector<Glib::ustring> p_terms = specialGetter(base_terms);

        Glib::Variant<guint32> base_timestamp;
        parameters.get_child(base_timestamp, 1);
        guint32 p_timestamp
= specialGetter(base_timestamp); MethodInvocation methodInvocation(invocation); LaunchSearch( (p_terms), (p_timestamp), methodInvocation); } } void org::gnome::Shell::SearchProvider2Stub::on_interface_get_property( Glib::VariantBase &property, const Glib::RefPtr &/* connection */, const Glib::ustring &/* sender */, const Glib::ustring &/* object_path */, const Glib::ustring &/* interface_name */, const Glib::ustring &property_name) { static_cast(property); // maybe unused static_cast(property_name); // maybe unused } bool org::gnome::Shell::SearchProvider2Stub::on_interface_set_property( const Glib::RefPtr &/* connection */, const Glib::ustring &/* sender */, const Glib::ustring &/* object_path */, const Glib::ustring &/* interface_name */, const Glib::ustring &property_name, const Glib::VariantBase &value) { static_cast(property_name); // maybe unused static_cast(value); // maybe unused return true; } bool org::gnome::Shell::SearchProvider2Stub::emitSignal( const std::string &propName, Glib::VariantBase &value) { std::map changedProps; std::vector changedPropsNoValue; changedProps[propName] = value; Glib::Variant> changedPropsVar = Glib::Variant>::create(changedProps); Glib::Variant> changedPropsNoValueVar = Glib::Variant>::create(changedPropsNoValue); std::vector ps; ps.push_back(Glib::Variant::create(m_interfaceName)); ps.push_back(changedPropsVar); ps.push_back(changedPropsNoValueVar); Glib::VariantContainerBase propertiesChangedVariant = Glib::Variant>::create_tuple(ps); for (const RegisteredObject &obj: m_registered_objects) { obj.connection->emit_signal( obj.object_path, "org.freedesktop.DBus.Properties", "PropertiesChanged", Glib::ustring(), propertiesChangedVariant); } return true; } pinot-1.22/Core/SearchProvider_stub.h000066400000000000000000000113621470740426600176220ustar00rootroot00000000000000#pragma once #include #include #include #include #include "SearchProvider_common.h" namespace org { namespace gnome { namespace Shell { class 
SearchProvider2Stub : public sigc::trackable {
public:
    SearchProvider2Stub();
    virtual ~SearchProvider2Stub();

    SearchProvider2Stub(const SearchProvider2Stub &other) = delete;
    SearchProvider2Stub(SearchProvider2Stub &&other) = delete;
    SearchProvider2Stub &operator=(const SearchProvider2Stub &other) = delete;
    SearchProvider2Stub &operator=(SearchProvider2Stub &&other) = delete;

    guint register_object(const Glib::RefPtr<Gio::DBus::Connection> &connection,
                          const Glib::ustring &object_path);
    void unregister_object();

    unsigned int usage_count() const {
        return static_cast<unsigned int>(m_registered_objects.size());
    }

    class MethodInvocation;

protected:
    virtual void GetInitialResultSet(
        const std::vector<Glib::ustring> & terms,
        MethodInvocation &invocation) = 0;
    virtual void GetSubsearchResultSet(
        const std::vector<Glib::ustring> & previous_results,
        const std::vector<Glib::ustring> & terms,
        MethodInvocation &invocation) = 0;
    virtual void GetResultMetas(
        const std::vector<Glib::ustring> & identifiers,
        MethodInvocation &invocation) = 0;
    virtual void ActivateResult(
        const Glib::ustring & identifier,
        const std::vector<Glib::ustring> & terms,
        guint32 timestamp,
        MethodInvocation &invocation) = 0;
    virtual void LaunchSearch(
        const std::vector<Glib::ustring> & terms,
        guint32 timestamp,
        MethodInvocation &invocation) = 0;

    void on_method_call(const Glib::RefPtr<Gio::DBus::Connection> &connection,
                        const Glib::ustring &sender,
                        const Glib::ustring &object_path,
                        const Glib::ustring &interface_name,
                        const Glib::ustring &method_name,
                        const Glib::VariantContainerBase &parameters,
                        const Glib::RefPtr<Gio::DBus::MethodInvocation> &invocation);

    void on_interface_get_property(Glib::VariantBase& property,
                                   const Glib::RefPtr<Gio::DBus::Connection> &connection,
                                   const Glib::ustring &sender,
                                   const Glib::ustring &object_path,
                                   const Glib::ustring &interface_name,
                                   const Glib::ustring &property_name);

    bool on_interface_set_property(
        const Glib::RefPtr<Gio::DBus::Connection> &connection,
        const Glib::ustring &sender,
        const Glib::ustring &object_path,
        const Glib::ustring &interface_name,
        const Glib::ustring &property_name,
        const Glib::VariantBase &value);

private:
    bool emitSignal(const std::string &propName, Glib::VariantBase &value);
struct RegisteredObject { guint id; Glib::RefPtr connection; std::string object_path; }; Glib::RefPtr introspection_data; std::vector m_registered_objects; std::string m_interfaceName; }; class SearchProvider2Stub::MethodInvocation { public: MethodInvocation(const Glib::RefPtr &msg): m_message(msg) {} const Glib::RefPtr getMessage() { return m_message; } void ret(Glib::Error error) { m_message->return_error(error); } void returnError(const Glib::ustring &domain, int code, const Glib::ustring &message) { m_message->return_error(domain, code, message); } void ret(const std::vector & p0) { std::vector vlist; Glib::Variant> var0 = Glib::Variant>::create(p0); vlist.push_back(var0); m_message->return_value(Glib::Variant::create_tuple(vlist)); } void ret(const std::vector> & p0) { std::vector vlist; Glib::Variant>> var0 = Glib::Variant>>::create(p0); vlist.push_back(var0); m_message->return_value(Glib::Variant::create_tuple(vlist)); } void ret() { std::vector vlist; m_message->return_value(Glib::Variant::create_tuple(vlist)); } private: Glib::RefPtr m_message; }; } // Shell } // gnome } // org pinot-1.22/Core/ServerThreads.cpp000066400000000000000000000265661470740426600167750ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "config.h" #include "NLS.h" #include "MIMEScanner.h" #include "TimeConverter.h" #include "Timer.h" #include "Url.h" #include "MetaDataBackup.h" #include "ModuleFactory.h" #include "DaemonState.h" #include "PinotSettings.h" #include "ServerThreads.h" using namespace Glib; using namespace std; CrawlerThread::CrawlerThread(const string &dirName, bool isSource, MonitorInterface *pMonitor, MonitorHandler *pHandler, bool inlineIndexing) : DirectoryScannerThread(DocumentInfo("", string("file://") + dirName, "", ""), PinotSettings::getInstance().m_daemonIndexLocation, 0, inlineIndexing, true), m_sourceId(0), m_pMonitor(pMonitor), m_pHandler(pHandler), m_crawlHistory(PinotSettings::getInstance().getHistoryDatabaseName()) { if (m_dirName.empty() == false) { if (isSource == true) { // Does this source exist ? if (m_crawlHistory.hasSource("file://" + m_dirName, m_sourceId) == false) { // Create it m_sourceId = m_crawlHistory.insertSource("file://" + m_dirName); } #ifdef DEBUG clog << "CrawlerThread: source " << m_sourceId << endl; #endif } else { map fileSources; // What source does this belong to ? 
for(map::const_iterator sourceIter = fileSources.begin(); sourceIter != fileSources.end(); ++sourceIter) { if (sourceIter->second.length() < m_dirName.length()) { // Skip continue; } if (sourceIter->second.substr(0, m_dirName.length()) == m_dirName) { // That's the one m_sourceId = sourceIter->first; break; } } #ifdef DEBUG clog << "CrawlerThread: under source " << m_sourceId << endl; #endif } } } CrawlerThread::~CrawlerThread() { } string CrawlerThread::getType(void) const { return "CrawlerThread"; } void CrawlerThread::recordCrawled(const string &location, time_t itemDate) { // It may still be in the cache map::iterator updateIter = m_crawlCache.find(location); if (updateIter != m_crawlCache.end()) { updateIter->second.m_itemStatus = CrawlHistory::CRAWLED; updateIter->second.m_itemDate = itemDate; #ifdef DEBUG clog << "CrawlerThread::recordCrawled: updated " << location << endl; #endif } else { #ifdef DEBUG clog << "CrawlerThread::recordCrawled: cached " << location << endl; #endif m_crawlCache[location] = CrawlItem(CrawlHistory::CRAWLED, itemDate, 0); if (m_crawlCache.size() > 500) { flushUpdates(); } } } bool CrawlerThread::isIndexable(const string &entryName) const { string entryDir(path_get_dirname(entryName) + "/"); // Is this under one of the locations configured for indexing ? for (set::const_iterator locationIter = PinotSettings::getInstance().m_indexableLocations.begin(); locationIter != PinotSettings::getInstance().m_indexableLocations.end(); ++locationIter) { string locationDir(locationIter->m_name + "/"); if ((entryDir.length() >= locationDir.length()) && (entryDir.substr(0, locationDir.length()) == locationDir)) { // Yes, it is #ifdef DEBUG clog << "CrawlerThread::isIndexable: under " << locationDir << endl; #endif return true; } } return false; } bool CrawlerThread::wasCrawled(const string &location, time_t &itemDate) { CrawlHistory::CrawlStatus itemStatus = CrawlHistory::UNKNOWN; // Is it in the cache ? 
map::const_iterator updateIter = m_crawlCache.find(location); if (updateIter != m_crawlCache.end()) { itemStatus = updateIter->second.m_itemStatus; itemDate = updateIter->second.m_itemDate; return true; } if (m_crawlHistory.hasItem(location, itemStatus, itemDate) == true) { return true; } return false; } void CrawlerThread::recordCrawling(const string &location, bool itemExists, time_t &itemDate) { if (itemExists == false) { // Record it m_crawlHistory.insertItem(location, CrawlHistory::CRAWLING, m_sourceId, itemDate); #ifdef DEBUG clog << "CrawlerThread::recordCrawling: inserted " << location << " " << (itemExists ? "exists" : "is new") << endl; #endif } else { #ifdef DEBUG clog << "CrawlerThread::recordCrawling: cached " << location << " " << (itemExists ? "exists" : "is new") << endl; #endif // Change the status m_crawlCache[location] = CrawlItem(CrawlHistory::CRAWLING, itemDate, 0); if (m_crawlCache.size() > 500) { flushUpdates(); } } } void CrawlerThread::recordError(const string &location, int errorCode) { DirectoryScannerThread::recordError(location, errorCode); // It may still be in the cache map::iterator updateIter = m_crawlCache.find(location); if (updateIter != m_crawlCache.end()) { updateIter->second.m_itemStatus = CrawlHistory::CRAWL_ERROR; updateIter->second.m_itemDate = time(NULL); updateIter->second.m_errNum = errorCode; } else { m_crawlCache[location] = CrawlItem(CrawlHistory::CRAWL_ERROR, time(NULL), errorCode); if (m_crawlCache.size() > 500) { flushUpdates(); } } } void CrawlerThread::recordSymlink(const string &location, time_t itemDate) { m_crawlHistory.insertItem(location, CrawlHistory::CRAWL_LINK, m_sourceId, itemDate); } bool CrawlerThread::monitorEntry(const string &entryName) { if (m_pMonitor != NULL) { return m_pMonitor->addLocation(entryName, true); } return true; } void CrawlerThread::unmonitorEntry(const string &entryName) { if (m_pMonitor != NULL) { m_pMonitor->removeLocation(entryName); } } void CrawlerThread::foundFile(const 
DocumentInfo &docInfo) { DocumentInfo docInfoWithLabels(docInfo); set labels; stringstream labelStream; // Insert a label that identifies the source labelStream << "X-SOURCE" << m_sourceId; labels.insert(labelStream.str()); docInfoWithLabels.setLabels(labels); DirectoryScannerThread::foundFile(docInfoWithLabels); } void CrawlerThread::flushUpdates(void) { #ifdef DEBUG clog << "CrawlerThread::flushUpdates: flushing updates" << endl; #endif // Update these records m_crawlHistory.updateItems(m_crawlCache); m_crawlCache.clear(); #ifdef DEBUG clog << "CrawlerThread::flushUpdates: flushed updates" << endl; #endif } void CrawlerThread::doWork(void) { MetaDataBackup metaData(PinotSettings::getInstance().getHistoryDatabaseName()); ::Timer scanTimer; set urls; unsigned int currentOffset = 0; int entryStatus = 0; if (m_dirName.empty() == true) { return; } scanTimer.start(); clog << "Scanning " << m_dirName << endl; // Remove errors and links m_crawlHistory.deleteItems(m_sourceId, CrawlHistory::CRAWL_ERROR); m_crawlHistory.deleteItems(m_sourceId, CrawlHistory::CRAWL_LINK); // ...and entries the previous instance didn't have time to crawl m_crawlHistory.deleteItems(m_sourceId, CrawlHistory::CRAWLING); // Update this source's items status so that we can detect files that have been deleted m_crawlHistory.updateItemsStatus(CrawlHistory::CRAWLED, CrawlHistory::TO_CRAWL, m_sourceId); if (scanEntry(m_dirName, entryStatus) == false) { if (entryStatus == 0) { m_errorNum = OPENDIR_FAILED; } else { m_errorNum = entryStatus; } m_errorParam = m_dirName; } flushUpdates(); clog << "Scanned " << m_dirName << " in " << scanTimer.stop() << " ms" << endl; if (m_done == true) { #ifdef DEBUG clog << "CrawlerThread::doWork: leaving cleanup until next crawl" << endl; #endif return; } scanTimer.start(); // All files left with status TO_CRAWL were not found in this crawl // Chances are they were removed after the last full scan while ((m_pHandler != NULL) && (m_crawlHistory.getSourceItems(m_sourceId, 
CrawlHistory::TO_CRAWL, urls, currentOffset, currentOffset + 100) > 0)) { for (set::const_iterator urlIter = urls.begin(); urlIter != urls.end(); ++urlIter) { #ifdef DEBUG clog << "CrawlerThread::doWork: didn't find " << *urlIter << endl; #endif // Inform the MonitorHandler m_pHandler->fileDeleted(urlIter->substr(7)); // Delete this item m_crawlHistory.deleteItem(*urlIter); metaData.deleteItem(DocumentInfo("", *urlIter, "", ""), DocumentInfo::SERIAL_ALL); } // Next if (urls.size() < 100) { break; } currentOffset += 100; } clog << "Cleaned up " << currentOffset + urls.size() << " history entries in " << scanTimer.stop() << " ms" << endl; } RestoreMetaDataThread::RestoreMetaDataThread() : WorkerThread() { } RestoreMetaDataThread::~RestoreMetaDataThread() { } string RestoreMetaDataThread::getType(void) const { return "RestoreMetaDataThread"; } void RestoreMetaDataThread::doWork(void) { PinotSettings &settings = PinotSettings::getInstance(); MetaDataBackup metaData(settings.getHistoryDatabaseName()); ::Timer restoreTimer; set urls; unsigned int currentOffset = 0, totalCount = 0; IndexInterface *pIndex = settings.getIndex(settings.m_daemonIndexLocation); if (pIndex == NULL) { return; } if (pIndex->isGood() == false) { delete pIndex; return; } // Restore user-set metadata on all documents for (set::const_iterator locationIter = settings.m_indexableLocations.begin(); locationIter != settings.m_indexableLocations.end(); ++locationIter) { string dirName(locationIter->m_name); restoreTimer.start(); while (metaData.getItems(string("file://") + dirName, urls, currentOffset, currentOffset + 100) == true) { for (set::const_iterator urlIter = urls.begin(); urlIter != urls.end(); ++urlIter) { unsigned int docId = pIndex->hasDocument(*urlIter); if (docId == 0) { #ifdef DEBUG clog << "RestoreMetaDataThread::doWork: " << *urlIter << " is not indexed, can't be restored" << endl; #endif continue; } DocumentInfo docInfo("", *urlIter, "", ""); if (metaData.getItem(docInfo, 
DocumentInfo::SERIAL_FIELDS) == true) { #ifdef DEBUG clog << "RestoreMetaDataThread::doWork: restored fields on " << *urlIter << endl; #endif pIndex->updateDocumentInfo(docId, docInfo); } if (metaData.getItem(docInfo, DocumentInfo::SERIAL_LABELS) == true) { #ifdef DEBUG clog << "RestoreMetaDataThread::doWork: restored " << docInfo.getLabels().size() << " labels on " << *urlIter << endl; #endif pIndex->setDocumentLabels(docId, docInfo.getLabels(), true); } } // Next totalCount += urls.size(); if (urls.size() < 100) { break; } currentOffset += 100; urls.clear(); } clog << "Restored user-set metadata for " << totalCount << " documents in " << dirName << ", in " << restoreTimer.stop() << " ms" << endl; } delete pIndex; } pinot-1.22/Core/ServerThreads.h000066400000000000000000000053721470740426600164320ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #ifndef _SERVERTHREADS_HH #define _SERVERTHREADS_HH #include #include #include "DocumentInfo.h" #include "CrawlHistory.h" #include "IndexInterface.h" #include "MonitorInterface.h" #include "MonitorHandler.h" #include "QueryProperties.h" #include "WorkerThreads.h" class CrawlerThread : public DirectoryScannerThread { public: CrawlerThread(const std::string &dirName, bool isSource, MonitorInterface *pMonitor, MonitorHandler *pHandler, bool inlineIndexing = false); virtual ~CrawlerThread(); virtual std::string getType(void) const; protected: unsigned int m_sourceId; MonitorInterface *m_pMonitor; MonitorHandler *m_pHandler; CrawlHistory m_crawlHistory; std::map m_crawlCache; std::stack m_currentLinks; std::stack m_currentLinkReferrees; virtual void recordCrawled(const std::string &location, time_t itemDate); virtual bool isIndexable(const std::string &entryName) const; virtual bool wasCrawled(const std::string &location, time_t &itemDate); virtual void recordCrawling(const std::string &location, bool itemExists, time_t &itemDate); virtual void recordError(const std::string &location, int errorCode); virtual void recordSymlink(const std::string &location, time_t itemDate); virtual bool monitorEntry(const std::string &entryName); virtual void unmonitorEntry(const std::string &entryName); virtual void foundFile(const DocumentInfo &docInfo); void flushUpdates(void); virtual void doWork(void); private: CrawlerThread(const CrawlerThread &other); CrawlerThread &operator=(const CrawlerThread &other); }; class RestoreMetaDataThread : public WorkerThread { public: RestoreMetaDataThread(); virtual ~RestoreMetaDataThread(); virtual std::string getType(void) const; protected: virtual void doWork(void); private: RestoreMetaDataThread(const RestoreMetaDataThread &other); RestoreMetaDataThread &operator=(const RestoreMetaDataThread &other); }; #endif // _SERVERTHREADS_HH 
pinot-1.22/Core/UniqueApplication.cpp000066400000000000000000000071171470740426600176350ustar00rootroot00000000000000/* * Copyright 2008-2009 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include #include #include "config.h" #include "UniqueApplication.h" using std::clog; using std::endl; using std::string; using std::fstream; using std::stringstream; UniqueApplication::UniqueApplication(const string &name) { #ifdef HAVE_UNIQUE m_pApp = unique_app_new(name.c_str(), NULL); #ifdef DEBUG if (m_pApp != NULL) { clog << "UniqueApplication: registered" << endl; } else { clog << "UniqueApplication: failed to register" << endl; } #endif #endif } UniqueApplication::~UniqueApplication() { #ifdef HAVE_UNIQUE if (m_pApp != NULL) { g_object_unref(m_pApp); } #endif } bool UniqueApplication::isRunning(void) { #ifdef HAVE_UNIQUE if ((m_pApp != NULL) && (unique_app_is_running(m_pApp) == TRUE)) { return true; } #endif return false; } bool UniqueApplication::isRunning(const string &pidFileName, const string &processName) { fstream pidFile; // Open the PID file pidFile.open(pidFileName.c_str(), std::ios::in); if (pidFile.is_open() == false) { // The application may still be running even though the PID file doesn't exist if (isRunning() == true) { return true; } // 
Keep going } else { pid_t processID = 0; bool stillRunning = false, processDied = false; pidFile >> processID; pidFile.close(); // Is another process running ? if (processID > 0) { #ifdef HAVE_UNIQUE if (isRunning() == true) { // It's still running stillRunning = true; } else { // It most likely died processDied = true; } #else fstream cmdLineFile; stringstream cmdLineFileName; bool checkProcess = true; // FIXME: check for existence of /proc cmdLineFileName << "/proc/" << processID << "/cmdline"; cmdLineFile.open(cmdLineFileName.str().c_str(), std::ios::in); if (cmdLineFile.is_open() == true) { string cmdLine; cmdLineFile >> cmdLine; cmdLineFile.close(); if (cmdLine.find(processName) == string::npos) { // It's another process checkProcess = false; processDied = true; } } #ifdef HAVE_KILL if (checkProcess == true) { if (kill(processID, 0) == 0) { // It's still running stillRunning = true; } else if (errno == ESRCH) { // This PID doesn't exist processDied = true; } } #endif #endif if (stillRunning == true) { clog << "Process " << processName << " (" << processID << ") is still running" << endl; return true; } if (processDied == true) { clog << "Previous instance " << processID << " died prematurely" << endl; } } } // Now save our PID pidFile.open(pidFileName.c_str(), std::ios::out); if (pidFile.is_open() == true) { pidFile << getpid() << endl; pidFile.close(); } return false; } pinot-1.22/Core/UniqueApplication.h000066400000000000000000000024641470740426600173020ustar00rootroot00000000000000/* * Copyright 2008 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _UNIQUEAPPLICATION_HH #define _UNIQUEAPPLICATION_HH #include #ifdef HAVE_UNIQUE #include #endif class UniqueApplication { public: UniqueApplication(const std::string &name); ~UniqueApplication(); bool isRunning(void); bool isRunning(const std::string &pidFileName, const std::string &processName); private: #ifdef HAVE_UNIQUE UniqueApp *m_pApp; #endif UniqueApplication(const UniqueApplication &other); UniqueApplication &operator=(const UniqueApplication &other); }; #endif // _UNIQUEAPPLICATION_HH pinot-1.22/Core/WorkerThread.cpp000066400000000000000000000412071470740426600166020ustar00rootroot00000000000000/* * Copyright 2005-2024 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include #include #include #include #include #include #include #include #include #include #include #ifdef __OpenBSD__ #include #include #endif #include #include #include #include #include #include #include #include #include "config.h" #include "NLS.h" #include "Memory.h" #include "Url.h" #include "WorkerThread.h" using namespace std; using namespace Glib; // A function object to stop threads with for_each() struct StopThreadFunc { public: void operator()(map::value_type &p) { p.second->stop(); #ifdef DEBUG clog << "StopThreadFunc: stopped thread " << p.second->getId() << endl; #endif Thread::yield(); } }; Dispatcher WorkerThread::m_dispatcher; pthread_mutex_t WorkerThread::m_dispatcherMutex = PTHREAD_MUTEX_INITIALIZER; bool WorkerThread::m_immediateFlush = true; string WorkerThread::errorToString(int errorNum) { if (errorNum == 0) { return ""; } if (errorNum < INDEX_ERROR) { ustring errorText(Glib::strerror(errorNum)); return errorText.c_str(); } // Internal error codes switch (errorNum) { case INDEX_ERROR: return _("Index error"); case INDEXING_FAILED: return _("Couldn't index document"); case UPDATE_FAILED: return _("Couldn't update document"); case UNINDEXING_FAILED: return _("Couldn't unindex document(s)"); case QUERY_FAILED: return _("Couldn't run query on search engine"); case HISTORY_FAILED: return _("Couldn't get history for search engine"); case DOWNLOAD_FAILED: return _("Couldn't retrieve document"); case MONITORING_FAILED: return _("File monitor error"); case OPENDIR_FAILED: return _("Couldn't open directory"); case UNKNOWN_INDEX: return _("Index doesn't exist"); case UNKNOWN_ENGINE: return _("Couldn't create search engine"); case UNSUPPORTED_TYPE: return _("Cannot index document type"); case UNSUPPORTED_PROTOCOL: return _("No downloader for this protocol"); case ROBOTS_FORBIDDEN: return _("Robots META tag forbids indexing"); case NO_MONITORING: return _("No monitoring handler"); default: break; } return _("Unknown error"); } Dispatcher 
&WorkerThread::getDispatcher(void) { return m_dispatcher; } void WorkerThread::immediateFlush(bool doFlush) { m_immediateFlush = doFlush; } WorkerThread::WorkerThread() : m_startTime(time(NULL)), m_id(ThreadsManager::get_next_id()), m_background(false), m_stopped(false), m_done(false), m_errorNum(0) { } WorkerThread::~WorkerThread() { } time_t WorkerThread::getStartTime(void) const { return m_startTime; } void WorkerThread::setId(unsigned int id) { m_id = id; } unsigned int WorkerThread::getId(void) const { return m_id; } void WorkerThread::inBackground(void) { m_background = true; } bool WorkerThread::isBackground(void) const { return m_background; } bool WorkerThread::operator<(const WorkerThread &other) const { return m_id < other.m_id; } Glib::Thread *WorkerThread::start(void) { #ifdef DEBUG clog << "WorkerThread::start: " << getType() << " " << m_id << endl; #endif // Create non-joinable threads return Thread::create(sigc::mem_fun(*this, &WorkerThread::threadHandler), false); } void WorkerThread::stop(void) { m_stopped = m_done = true; } bool WorkerThread::isStopped(void) const { return m_stopped; } bool WorkerThread::isDone(void) const { return m_done; } int WorkerThread::getErrorNum(void) const { return m_errorNum; } string WorkerThread::getStatus(void) const { string status(errorToString(m_errorNum)); if ((status.empty() == false) && (m_errorParam.empty() == false)) { status += " ("; status += m_errorParam; status += ")"; } return status; } void WorkerThread::threadHandler(void) { #ifdef DEBUG clog << "WorkerThread::threadHandler: thread " << m_id << endl; #endif try { doWork(); } catch (Glib::Exception &ex) { clog << "Glib exception in thread " << m_id << ", type " << getType() << ":" << ex.what() << endl; m_errorNum = UNKNOWN_ERROR; } catch (std::exception &ex) { clog << "STL exception in thread " << m_id << ", type " << getType() << ":" << ex.what() << endl; m_errorNum = UNKNOWN_ERROR; } catch (...) 
{ clog << "Unknown exception in thread " << m_id << ", type " << getType() << endl; m_errorNum = UNKNOWN_ERROR; } emitSignal(); } void WorkerThread::emitSignal(void) { m_done = true; if (pthread_mutex_lock(&m_dispatcherMutex) == 0) { #ifdef DEBUG clog << "WorkerThread::emitSignal: signaling end of thread " << m_id << endl; #endif m_dispatcher(); pthread_mutex_unlock(&m_dispatcherMutex); } } unsigned int ThreadsManager::m_nextThreadId = 1; ThreadsManager::ThreadsManager(const string &defaultIndexLocation, unsigned int maxThreadsTime) : m_mustQuit(false), m_defaultIndexLocation(defaultIndexLocation), m_maxIndexThreads(1), m_backgroundThreadsCount(0), m_foregroundThreadsMaxTime(maxThreadsTime), m_numCPUs(1) { pthread_rwlock_init(&m_threadsLock, NULL); pthread_rwlock_init(&m_listsLock, NULL); // Override the number of indexing threads ? char *pEnvVar = getenv("PINOT_MAXIMUM_INDEX_THREADS"); if ((pEnvVar != NULL) && (strlen(pEnvVar) > 0)) { int threadsNum = atoi(pEnvVar); if (threadsNum > 0) { m_maxIndexThreads = (unsigned int)threadsNum; } } #ifdef __OpenBSD__ int mib[2], ncpus; mib[0] = CTL_HW; mib[1] = HW_NCPU; size_t len = sizeof(ncpus); // sysctl() returns 0 on success, -1 on error if (sysctl(mib, 2, &ncpus, &len, NULL, 0) == 0) { m_numCPUs = ncpus; } #else #ifdef HAVE_SYSCONF m_numCPUs = sysconf(_SC_NPROCESSORS_ONLN); #endif #endif } ThreadsManager::~ThreadsManager() { stop_threads(); // Destroy the read/write locks pthread_rwlock_destroy(&m_listsLock); pthread_rwlock_destroy(&m_threadsLock); } bool ThreadsManager::read_lock_threads(void) { if (pthread_rwlock_rdlock(&m_threadsLock) == 0) { return true; } return false; } bool ThreadsManager::write_lock_threads(void) { if (pthread_rwlock_wrlock(&m_threadsLock) == 0) { return true; } return false; } void ThreadsManager::unlock_threads(void) { pthread_rwlock_unlock(&m_threadsLock); } bool ThreadsManager::read_lock_lists(void) { if (pthread_rwlock_rdlock(&m_listsLock) == 0) { return true; } return false; } bool ThreadsManager::write_lock_lists(void) { if 
(pthread_rwlock_wrlock(&m_listsLock) == 0) { return true; } return false; } void ThreadsManager::unlock_lists(void) { pthread_rwlock_unlock(&m_listsLock); } WorkerThread *ThreadsManager::get_thread(void) { time_t timeNow = time(NULL); WorkerThread *pWorkerThread = NULL; // Get the first thread that's finished if (write_lock_threads() == true) { for (map::iterator threadIter = m_threads.begin(); threadIter != m_threads.end(); ++threadIter) { unsigned int threadId = threadIter->first; if (threadIter->second->isDone() == false) { #ifdef DEBUG clog << "ThreadsManager::get_thread: thread " << threadId << " is not done" << endl; #endif // Foreground threads ought not to run very long if ((threadIter->second->isBackground() == false) && (threadIter->second->getStartTime() + m_foregroundThreadsMaxTime < timeNow)) { // This thread has been running for too long ! threadIter->second->stop(); clog << "Stopped long-running thread " << threadId << endl; } } else { // This one will do... pWorkerThread = threadIter->second; // Remove it m_threads.erase(threadIter); #ifdef DEBUG clog << "ThreadsManager::get_thread: thread " << threadId << " is done, " << m_threads.size() << " left" << endl; #endif break; } } unlock_threads(); } if (pWorkerThread == NULL) { return NULL; } if (pWorkerThread->isBackground() == true) { #ifdef DEBUG clog << "ThreadsManager::get_thread: thread " << pWorkerThread->getId() << " was running in the background" << endl; #endif --m_backgroundThreadsCount; } return pWorkerThread; } unsigned int ThreadsManager::get_next_id(void) { unsigned int nextThreadId = ++m_nextThreadId; // Reclaim memory on a regular basis if (nextThreadId % 100 == 0) { int inUse = Memory::getUsage(); Memory::reclaim(); } return nextThreadId; } bool ThreadsManager::start_thread(WorkerThread *pWorkerThread, bool inBackground) { bool createdThread = false; if (pWorkerThread == NULL) { return false; } if (inBackground == true) { #ifdef DEBUG clog << "ThreadsManager::start_thread: thread " << 
pWorkerThread->getId() << " will run in the background" << endl; #endif pWorkerThread->inBackground(); ++m_backgroundThreadsCount; } #ifdef DEBUG else clog << "ThreadsManager::start_thread: thread " << pWorkerThread->getId() << " will run in the foreground" << endl; #endif // Insert pair::iterator, bool> threadPair; if (write_lock_threads() == true) { threadPair = m_threads.insert(pair(pWorkerThread->getId(), pWorkerThread)); if (threadPair.second == false) { delete pWorkerThread; pWorkerThread = NULL; } unlock_threads(); } // Start the thread if (pWorkerThread != NULL) { Thread *pThread = pWorkerThread->start(); if (pThread != NULL) { createdThread = true; } else { // Erase if (write_lock_threads() == true) { m_threads.erase(threadPair.first); unlock_threads(); } delete pWorkerThread; } } return createdThread; } unsigned int ThreadsManager::get_threads_count(void) { int count = 0; if (read_lock_threads() == true) { count = m_threads.size() - m_backgroundThreadsCount; unlock_threads(); } #ifdef DEBUG clog << "ThreadsManager::get_threads_count: " << count << "/" << m_backgroundThreadsCount << " threads left" << endl; #endif // A negative count would mean that a background thread // exited without signaling return (unsigned int)max(count , 0); } void ThreadsManager::stop_threads(void) { if (m_threads.empty() == false) { if (write_lock_threads() == true) { // Stop threads for_each(m_threads.begin(), m_threads.end(), StopThreadFunc()); unlock_threads(); } } } void ThreadsManager::connect(void) { // The previous manager may have been signalled by our threads WorkerThread *pThread = get_thread(); while (pThread != NULL) { m_onThreadEndSignal(pThread); // Next pThread = get_thread(); } #ifdef DEBUG clog << "ThreadsManager::connect: connecting" << endl; #endif // Connect the dispatcher m_threadsEndConnection = WorkerThread::getDispatcher().connect( sigc::mem_fun(*this, &ThreadsManager::on_thread_signal)); #ifdef DEBUG clog << "ThreadsManager::connect: connected" << endl; 
#endif } void ThreadsManager::disconnect(void) { m_threadsEndConnection.block(); m_threadsEndConnection.disconnect(); #ifdef DEBUG clog << "ThreadsManager::disconnect: disconnected" << endl; #endif } void ThreadsManager::on_thread_signal() { WorkerThread *pThread = get_thread(); if (pThread == NULL) { #ifdef DEBUG clog << "ThreadsManager::on_thread_signal: foreign thread" << endl; #endif return; } m_onThreadEndSignal(pThread); } bool ThreadsManager::mustQuit(bool quit) { if (quit == true) { m_mustQuit = true; stop_threads(); } return m_mustQuit; } MonitorThread::MonitorThread(MonitorInterface *pMonitor, MonitorHandler *pHandler) : WorkerThread(), m_ctrlReadPipe(-1), m_ctrlWritePipe(-1), m_pMonitor(pMonitor), m_pHandler(pHandler) { int pipeFds[2]; #ifdef HAVE_PIPE if (pipe(pipeFds) == 0) { // This pipe will allow to stop select() m_ctrlReadPipe = pipeFds[0]; m_ctrlWritePipe = pipeFds[1]; } #endif } MonitorThread::~MonitorThread() { if (m_ctrlReadPipe >= 0) { close(m_ctrlReadPipe); } if (m_ctrlWritePipe >= 0) { close(m_ctrlWritePipe); } } string MonitorThread::getType(void) const { return "MonitorThread"; } void MonitorThread::stop(void) { WorkerThread::stop(); if (m_ctrlWritePipe >= 0) { if (write(m_ctrlWritePipe, "X", 1) == -1) { clog << "Couldn't signal thread " << m_id << " to stop" << endl; } } } bool MonitorThread::isFileBlacklisted(const string &location) { return false; } void MonitorThread::fileModified(const string &location) { // Pass this event directly to the handler m_pHandler->fileModified(location); } void MonitorThread::processEvents(void) { queue events; #ifdef DEBUG clog << "MonitorThread::processEvents: checking for events" << endl; #endif if ((m_pMonitor == NULL) || (m_pMonitor->retrievePendingEvents(events) == false)) { return; } #ifdef DEBUG clog << "MonitorThread::processEvents: retrieved " << events.size() << " events" << endl; #endif while ((events.empty() == false) && (m_done == false)) { MonitorEvent &event = events.front(); if 
((event.m_location.empty() == true) || (event.m_type == MonitorEvent::UNKNOWN)) { // Next events.pop(); continue; } #ifdef DEBUG clog << "MonitorThread::processEvents: event " << event.m_type << " on " << event.m_location << " " << event.m_isDirectory << endl; #endif // Skip dotfiles and blacklisted files Url urlObj("file://" + event.m_location); if ((urlObj.getFile()[0] == '.') || (isFileBlacklisted(event.m_location) == true)) { // Next events.pop(); continue; } // What's the event code ? if (event.m_type == MonitorEvent::EXISTS) { if (event.m_isDirectory == false) { m_pHandler->fileExists(event.m_location); } } else if (event.m_type == MonitorEvent::CREATED) { if (event.m_isDirectory == false) { m_pHandler->fileCreated(event.m_location); } else { m_pHandler->directoryCreated(event.m_location); } } else if (event.m_type == MonitorEvent::WRITE_CLOSED) { if (event.m_isDirectory == false) { fileModified(event.m_location); } } else if (event.m_type == MonitorEvent::MOVED) { if (event.m_isDirectory == false) { m_pHandler->fileMoved(event.m_location, event.m_previousLocation); } else { // We should receive this only if the destination directory is monitored too m_pHandler->directoryMoved(event.m_location, event.m_previousLocation); } } else if (event.m_type == MonitorEvent::DELETED) { if (event.m_isDirectory == false) { m_pHandler->fileDeleted(event.m_location); } else { // The monitor should have stopped monitoring this // In practice, events for the files in this directory will already have been received m_pHandler->directoryDeleted(event.m_location); } } // Next events.pop(); } } void MonitorThread::doWork(void) { if ((m_pHandler == NULL) || (m_pMonitor == NULL)) { m_errorNum = NO_MONITORING; return; } // Initialize the handler m_pHandler->initialize(); // Get the list of files to monitor const set &fileNames = m_pHandler->getFileNames(); for (set::const_iterator fileIter = fileNames.begin(); fileIter != fileNames.end(); ++fileIter) { 
m_pMonitor->addLocation(*fileIter, false); } // Directories, if any, are set elsewhere // In the case of OnDiskHandler, they are set by DirectoryScannerThread // There might already be events that need processing processEvents(); // Wait for something to happen while (m_done == false) { struct timeval selectTimeout; fd_set listenSet; selectTimeout.tv_sec = 60; selectTimeout.tv_usec = 0; FD_ZERO(&listenSet); if (m_ctrlReadPipe >= 0) { FD_SET(m_ctrlReadPipe, &listenSet); } // The file descriptor may change over time int monitorFd = m_pMonitor->getFileDescriptor(); // Check the descriptor before adding it to the set if (monitorFd < 0) { m_errorNum = MONITORING_FAILED; return; } FD_SET(monitorFd, &listenSet); int fdCount = select(max(monitorFd, m_ctrlReadPipe) + 1, &listenSet, NULL, NULL, &selectTimeout); if ((fdCount < 0) && (errno != EINTR)) { #ifdef DEBUG clog << "MonitorThread::doWork: select() failed" << endl; #endif break; } else if ((fdCount > 0) && (FD_ISSET(monitorFd, &listenSet))) { processEvents(); } } } pinot-1.22/Core/WorkerThread.h000066400000000000000000000110371470740426600162450ustar00rootroot00000000000000/* * Copyright 2005-2014 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #ifndef _WORKERTHREAD_HH #define _WORKERTHREAD_HH #include #include #include #include #include #include #include #include #include #include "Document.h" #include "MonitorInterface.h" #include "MonitorHandler.h" class WorkerThread { public: WorkerThread(); virtual ~WorkerThread(); typedef enum { UNKNOWN_ERROR = 10000, INDEX_ERROR, INDEXING_FAILED, UPDATE_FAILED, UNINDEXING_FAILED, \ QUERY_FAILED, HISTORY_FAILED, DOWNLOAD_FAILED, MONITORING_FAILED, OPENDIR_FAILED, \ UNKNOWN_INDEX, UNKNOWN_ENGINE, UNSUPPORTED_TYPE, UNSUPPORTED_PROTOCOL, \ ROBOTS_FORBIDDEN, NO_MONITORING } ThreadError; static std::string errorToString(int errorNum); static Glib::Dispatcher &getDispatcher(void); static void immediateFlush(bool doFlush); time_t getStartTime(void) const; void setId(unsigned int id); unsigned int getId(void) const; void inBackground(void); bool isBackground(void) const; bool operator<(const WorkerThread &other) const; Glib::Thread *start(void); virtual std::string getType(void) const = 0; virtual void stop(void); bool isStopped(void) const; bool isDone(void) const; int getErrorNum(void) const; std::string getStatus(void) const; protected: /// Use a Dispatcher for thread safety static Glib::Dispatcher m_dispatcher; static pthread_mutex_t m_dispatcherMutex; static bool m_immediateFlush; time_t m_startTime; unsigned int m_id; bool m_background; bool m_stopped; bool m_done; int m_errorNum; std::string m_errorParam; void threadHandler(void); virtual void doWork(void) = 0; void emitSignal(void); private: WorkerThread(const WorkerThread &other); WorkerThread &operator=(const WorkerThread &other); }; class ThreadsManager : virtual public sigc::trackable { public: ThreadsManager(const std::string &defaultIndexLocation, unsigned int maxThreadsTime = 300); virtual ~ThreadsManager(); static unsigned int get_next_id(void); bool start_thread(WorkerThread *pWorkerThread, bool inBackground = false); unsigned int get_threads_count(void); void stop_threads(void); virtual void 
connect(void); virtual void disconnect(void); void on_thread_signal(); bool read_lock_lists(void); bool write_lock_lists(void); void unlock_lists(void); bool mustQuit(bool quit = false); protected: static unsigned int m_nextThreadId; sigc::connection m_threadsEndConnection; pthread_rwlock_t m_threadsLock; pthread_rwlock_t m_listsLock; std::map m_threads; bool m_mustQuit; std::string m_defaultIndexLocation; unsigned int m_maxIndexThreads; unsigned int m_backgroundThreadsCount; unsigned int m_foregroundThreadsMaxTime; long m_numCPUs; sigc::signal1 m_onThreadEndSignal; std::set m_beingIndexed; bool read_lock_threads(void); bool write_lock_threads(void); void unlock_threads(void); WorkerThread *get_thread(void); private: ThreadsManager(const ThreadsManager &other); ThreadsManager &operator=(const ThreadsManager &other); }; class MonitorThread : public WorkerThread { public: MonitorThread(MonitorInterface *pMonitor, MonitorHandler *pHandler); virtual ~MonitorThread(); virtual std::string getType(void) const; virtual void stop(void); protected: int m_ctrlReadPipe; int m_ctrlWritePipe; MonitorInterface *m_pMonitor; MonitorHandler *m_pHandler; virtual bool isFileBlacklisted(const std::string &location); virtual void fileModified(const std::string &location); void processEvents(void); virtual void doWork(void); private: MonitorThread(const MonitorThread &other); MonitorThread &operator=(const MonitorThread &other); }; #endif // _WORKERTHREAD_HH pinot-1.22/Core/WorkerThreads.cpp000066400000000000000000001201601470740426600167610ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include #include #include #include #include #include #ifdef __OpenBSD__ #include #include #endif #include #include #include #include #include #include #include #include "config.h" #include "NLS.h" #include "Languages.h" #include "MIMEScanner.h" #include "TimeConverter.h" #include "Timer.h" #include "Url.h" #include "HtmlFilter.h" #include "FilterUtils.h" #include "DownloaderFactory.h" #include "FilterWrapper.h" #include "ModuleFactory.h" #include "WebEngine.h" #include "WorkerThreads.h" using namespace std; using namespace Glib; QueueManager::QueueManager(const string &defaultIndexLocation, unsigned int maxThreadsTime, bool scanLocalFiles) : ThreadsManager(defaultIndexLocation, maxThreadsTime), m_scanLocalFiles(scanLocalFiles), m_stopIndexing(false), m_actionQueue(PinotSettings::getInstance().getHistoryDatabaseName(), get_application_name()) { } QueueManager::~QueueManager() { } ustring QueueManager::index_document(const DocumentInfo &docInfo) { string location(docInfo.getLocation()); #ifdef DEBUG clog << "ThreadsManager::index_document: called with " << location << endl; #endif if (m_stopIndexing == true) { #ifdef DEBUG clog << "ThreadsManager::index_document: stopped indexing" << endl; #endif return _("Indexing was stopped"); } if (location.empty() == true) { // Nothing to do return ""; } // If the document is a mail message, we can't index it again Url urlObj(location); if (urlObj.getProtocol() == "mailbox") { return _("Can't index mail here"); } // Is the document being 
indexed/updated ? if (write_lock_lists() == true) { bool beingProcessed = true; if (m_beingIndexed.find(location) == m_beingIndexed.end()) { m_beingIndexed.insert(location); beingProcessed = false; } unlock_lists(); if (beingProcessed == true) { // FIXME: we may have to set labels on this document // FIXME: fix this for RTL languages ustring status(location); status += " "; status += _("is already being indexed"); return status; } } // Is the document blacklisted ? if (PinotSettings::getInstance().isBlackListed(location) == true) { // FIXME: fix this for RTL languages ustring status(location); status += " "; status += _("is blacklisted"); return status; } if ((m_scanLocalFiles == true) && (urlObj.isLocal() == true)) { #ifdef DEBUG clog << "ThreadsManager::index_document: scanning " << urlObj.getLocation() + "/" + urlObj.getFile() << endl; #endif // This handles both directories and files start_thread(new DirectoryScannerThread(docInfo, m_defaultIndexLocation, 0, true, true)); } else { start_thread(new IndexingThread(docInfo, m_defaultIndexLocation)); } return ""; } void QueueManager::clear_queues(void) { if (write_lock_lists() == true) { m_beingIndexed.clear(); unlock_lists(); m_actionQueue.expireItems(time(NULL)); } } ustring QueueManager::queue_index(const DocumentInfo &docInfo) { bool addToQueue = false; if (get_threads_count() >= m_maxIndexThreads) { #ifdef DEBUG clog << "QueueManager::queue_index: too many threads" << endl; #endif addToQueue = true; } #ifdef HAVE_GETLOADAVG // Get the load averaged over the last minute else { double averageLoad[3]; if (getloadavg(averageLoad, 3) != -1) { // FIXME: is LOADAVG_1MIN Solaris specific ? 
if (averageLoad[0] >= (double)m_numCPUs * 4) { // Don't add to the load, queue this addToQueue = true; } } } #endif if (addToQueue == true) { m_actionQueue.pushItem(ActionQueue::INDEX, docInfo); return ""; } return index_document(docInfo); } bool QueueManager::pop_queue(const string &urlWasIndexed) { bool getItem = true; bool emptyQueue = false; #ifdef DEBUG clog << "QueueManager::pop_queue: called with " << urlWasIndexed << endl; #endif if (get_threads_count() >= m_maxIndexThreads) { #ifdef DEBUG clog << "QueueManager::pop_queue: too many threads" << endl; #endif getItem = false; } if (write_lock_lists() == true) { // Update the in-progress list if (urlWasIndexed.empty() == false) { set::iterator urlIter = m_beingIndexed.find(urlWasIndexed); if (urlIter != m_beingIndexed.end()) { m_beingIndexed.erase(urlIter); } } unlock_lists(); // Get an item ? if (getItem == true) { ActionQueue::ActionType type; DocumentInfo docInfo; string previousLocation; // Assume the queue is empty emptyQueue = true; while (m_actionQueue.popItem(type, docInfo) == true) { ustring status; if (type != ActionQueue::INDEX) { continue; } // The queue isn't actually empty emptyQueue = false; if (docInfo.getLocation() == previousLocation) { // Something dodgy is going on, we got the same item twice ! 
// FIXME: fix this for RTL languages status = previousLocation; status += " "; status += _("is already being indexed"); } else { status = index_document(docInfo); } if (status.empty() == true) { break; } previousLocation = docInfo.getLocation(); } } } return emptyQueue; } ListerThread::ListerThread(const PinotSettings::IndexProperties &indexProps, unsigned int startDoc) : WorkerThread(), m_indexProps(indexProps), m_startDoc(startDoc), m_documentsCount(0) { } ListerThread::~ListerThread() { } string ListerThread::getType(void) const { return "ListerThread"; } PinotSettings::IndexProperties ListerThread::getIndexProperties(void) const { return m_indexProps; } unsigned int ListerThread::getStartDoc(void) const { return m_startDoc; } const vector &ListerThread::getDocuments(void) const { return m_documentsList; } unsigned int ListerThread::getDocumentsCount(void) const { return m_documentsCount; } QueryingThread::QueryingThread(const PinotSettings::IndexProperties &indexProps, const QueryProperties &queryProps, unsigned int startDoc, bool listingIndex) : ListerThread(indexProps, startDoc), m_engineName(PinotSettings::getInstance().m_defaultBackend), m_engineDisplayableName(indexProps.m_name), m_engineOption(indexProps.m_location), m_queryProps(queryProps), m_listingIndex(listingIndex), m_correctedSpelling(false), m_isLive(true) { #ifdef DEBUG clog << "QueryingThread: engine " << m_engineName << ", " << m_engineOption << ", mode " << m_listingIndex << endl; #endif } QueryingThread::QueryingThread(const string &engineName, const string &engineDisplayableName, const string &engineOption, const QueryProperties &queryProps, unsigned int startDoc) : ListerThread(PinotSettings::IndexProperties(engineDisplayableName, engineOption, 0, false), startDoc), m_engineName(engineName), m_engineDisplayableName(engineDisplayableName), m_engineOption(engineOption), m_queryProps(queryProps), m_listingIndex(false), m_correctedSpelling(false), m_isLive(true) { #ifdef DEBUG clog << 
"QueryingThread: engine " << m_engineName << ", " << m_engineOption << ", mode 0" << endl; #endif } QueryingThread::~QueryingThread() { } string QueryingThread::getType(void) const { if (m_listingIndex == true) { return ListerThread::getType(); } return "QueryingThread"; } bool QueryingThread::isLive(void) const { return m_isLive; } string QueryingThread::getEngineName(void) const { return m_engineDisplayableName; } QueryProperties QueryingThread::getQuery(bool &wasCorrected) const { wasCorrected = m_correctedSpelling; return m_queryProps; } string QueryingThread::getCharset(void) const { return m_resultsCharset; } bool QueryingThread::findPlugin(void) { string pluginName; if ((m_engineName.empty() == true) && (m_engineOption.empty() == false)) { pluginName = m_engineOption; } else if ((m_engineName.empty() == false) && (m_engineOption.empty() == true)) { pluginName = m_engineName; } if (pluginName.empty() == false) { set engines; PinotSettings::getInstance().getSearchEngines(engines, ""); #ifdef DEBUG clog << "QueryingThread::findPlugin: looking for a plugin named " << pluginName << endl; #endif // Is there a plugin with such a name ? ModuleProperties modProps("sherlock", pluginName, "", ""); set::const_iterator engineIter = engines.find(modProps); if (engineIter == engines.end()) { // Try again modProps.m_name = "opensearch"; engineIter = engines.find(modProps); } if (engineIter != engines.end()) { // Yes, there is ! 
m_engineName = engineIter->m_name; m_engineDisplayableName = engineIter->m_longName; m_engineOption = engineIter->m_option; #ifdef DEBUG clog << "QueryingThread::findPlugin: found " << m_engineName << ", " << m_engineDisplayableName << ", " << m_engineOption << endl; #endif return true; } } return false; } EngineQueryThread::EngineQueryThread(const PinotSettings::IndexProperties &indexProps, const QueryProperties &queryProps, unsigned int startDoc, bool listingIndex) : QueryingThread(indexProps, queryProps, startDoc, listingIndex) { } EngineQueryThread::EngineQueryThread(const PinotSettings::IndexProperties &indexProps, const QueryProperties &queryProps, const set &limitToDocsSet, unsigned int startDoc) : QueryingThread(indexProps, queryProps, startDoc, false) { copy(limitToDocsSet.begin(), limitToDocsSet.end(), inserter(m_limitToDocsSet, m_limitToDocsSet.begin())); } EngineQueryThread::EngineQueryThread(const string &engineName, const string &engineDisplayableName, const string &engineOption, const QueryProperties &queryProps, unsigned int startDoc) : QueryingThread(engineName, engineDisplayableName, engineOption, queryProps, startDoc) { } EngineQueryThread::~EngineQueryThread() { } void EngineQueryThread::processResults(const vector &resultsList) { PinotSettings &settings = PinotSettings::getInstance(); IndexInterface *pDocsIndex = NULL; IndexInterface *pDaemonIndex = NULL; unsigned int indexId = 0; bool isIndexQuery = false; // Are we querying an index ? if (ModuleFactory::isSupported(m_engineName, true) == true) { // Internal index ? if ((m_engineOption == settings.m_docsIndexLocation) || (m_engineOption == settings.m_daemonIndexLocation)) { indexId = settings.getIndexPropertiesByLocation(m_engineOption).m_id; isIndexQuery = true; } } // Will we have to query internal indices ? 
if (isIndexQuery == false) { pDocsIndex = settings.getIndex(settings.m_docsIndexLocation); pDaemonIndex = settings.getIndex(settings.m_daemonIndexLocation); } // Copy the results list for (vector::const_iterator resultIter = resultsList.begin(); resultIter != resultsList.end(); ++resultIter) { DocumentInfo currentDoc(*resultIter); string title(_("No title")); string location(currentDoc.getLocation(true)); string language(currentDoc.getLanguage()); unsigned int docId = 0; // The title may contain formatting if (currentDoc.getTitle().empty() == false) { title = FilterUtils::stripMarkup(currentDoc.getTitle()); } currentDoc.setTitle(title); #ifdef DEBUG clog << "EngineQueryThread::processResults: title is " << title << endl; #endif // Use the query's language if the result's is unknown if (language.empty() == true) { language = m_queryProps.getStemmingLanguage(); } currentDoc.setLanguage(language); if (isIndexQuery == true) { unsigned int tmpId = 0; // The index engine should have set this docId = currentDoc.getIsIndexed(tmpId); } // Is this in one of the indexes ? 
if ((pDocsIndex != NULL) && (pDocsIndex->isGood() == true)) { docId = pDocsIndex->hasDocument(location); if (docId > 0) { indexId = settings.getIndexPropertiesByName(_("My Web Pages")).m_id; } } if ((pDaemonIndex != NULL) && (pDaemonIndex->isGood() == true) && (docId == 0)) { docId = pDaemonIndex->hasDocument(location); if (docId > 0) { indexId = settings.getIndexPropertiesByName(_("My Documents")).m_id; } } if (docId > 0) { currentDoc.setIsIndexed(indexId, docId); #ifdef DEBUG clog << "EngineQueryThread::processResults: found in index " << indexId << endl; #endif } #ifdef DEBUG else clog << "EngineQueryThread::processResults: not found in any index" << endl; #endif m_documentsList.push_back(currentDoc); } if (pDocsIndex != NULL) { delete pDocsIndex; } if (pDaemonIndex != NULL) { delete pDaemonIndex; } } void EngineQueryThread::processResults(const vector &resultsList, unsigned int indexId) { unsigned int zeroId = 0; // Copy the results list for (vector::const_iterator resultIter = resultsList.begin(); resultIter != resultsList.end(); ++resultIter) { DocumentInfo currentDoc(*resultIter); // The engine has no notion of index IDs unsigned int docId = currentDoc.getIsIndexed(zeroId); currentDoc.setIsIndexed(indexId, docId); m_documentsList.push_back(currentDoc); } } void EngineQueryThread::doWork(void) { PinotSettings &settings = PinotSettings::getInstance(); // Get the SearchEngine SearchEngineInterface *pEngine = ModuleFactory::getSearchEngine(m_engineName, m_engineOption); if (pEngine == NULL) { // Try again if (findPlugin() == true) { pEngine = ModuleFactory::getSearchEngine(m_engineName, m_engineOption); } if (pEngine == NULL) { m_errorNum = UNKNOWN_ENGINE; m_errorParam = m_engineDisplayableName; return; } } // Set up the proxy WebEngine *pWebEngine = dynamic_cast(pEngine); if (pWebEngine != NULL) { DownloaderInterface *pDownloader = pWebEngine->getDownloader(); if ((pDownloader != NULL) && (settings.m_proxyEnabled == true) && (settings.m_proxyAddress.empty() == 
false)) { char portStr[64]; pDownloader->setSetting("proxyaddress", settings.m_proxyAddress); snprintf(portStr, 64, "%u", settings.m_proxyPort); pDownloader->setSetting("proxyport", portStr); pDownloader->setSetting("proxytype", settings.m_proxyType); } pWebEngine->setEditableValues(settings.m_editablePluginValues); } if (m_listingIndex == false) { pEngine->setLimitSet(m_limitToDocsSet); } // Run the query pEngine->setDefaultOperator(SearchEngineInterface::DEFAULT_OP_AND); if (pEngine->runQuery(m_queryProps, m_startDoc) == false) { m_errorNum = QUERY_FAILED; m_errorParam = m_engineDisplayableName; } else { const vector &resultsList = pEngine->getResults(); m_documentsList.clear(); m_documentsList.reserve(resultsList.size()); m_documentsCount = pEngine->getResultsCountEstimate(); #ifdef DEBUG clog << "EngineQueryThread::doWork: " << resultsList.size() << " of " << m_documentsCount << " results to process, starting at position " << m_startDoc << endl; #endif m_resultsCharset = pEngine->getResultsCharset(); if (m_listingIndex == false) { processResults(resultsList); } else { processResults(resultsList, PinotSettings::getInstance().getIndexPropertiesByName(m_engineDisplayableName).m_id); } // Don't spellcheck if the query was modified in any way if (m_queryProps.getModified() == false) { string correctedFreeQuery(pEngine->getSpellingCorrection()); // Any spelling correction ? 
if (correctedFreeQuery.empty() == false) { m_correctedSpelling = true; m_queryProps.setFreeQuery(correctedFreeQuery); } } } delete pEngine; } DownloadingThread::DownloadingThread(const DocumentInfo &docInfo) : WorkerThread(), m_docInfo(docInfo), m_pDoc(NULL), m_pDownloader(NULL) { } DownloadingThread::DownloadingThread() : WorkerThread(), m_docInfo("", "", "", ""), m_pDoc(NULL), m_pDownloader(NULL) { } DownloadingThread::~DownloadingThread() { if (m_pDoc != NULL) { delete m_pDoc; } if (m_pDownloader != NULL) { delete m_pDownloader; } } string DownloadingThread::getType(void) const { return "DownloadingThread"; } string DownloadingThread::getURL(void) const { return m_docInfo.getLocation(); } const Document *DownloadingThread::getDocument(void) const { return m_pDoc; } void DownloadingThread::doWork(void) { Url thisUrl(m_docInfo.getLocation()); bool getDownloader = true; if (m_pDoc != NULL) { delete m_pDoc; m_pDoc = NULL; } // Get a Downloader if (m_pDownloader != NULL) { // Same protocol as what we now need ? 
if (m_protocol == thisUrl.getProtocol()) { getDownloader = false; } else { delete m_pDownloader; m_pDownloader = NULL; m_protocol.clear(); } } if (getDownloader == true) { m_protocol = thisUrl.getProtocol(); m_pDownloader = DownloaderFactory::getDownloader(m_protocol); } if (m_pDownloader == NULL) { m_errorNum = UNSUPPORTED_PROTOCOL; m_errorParam = thisUrl.getProtocol(); } else if (m_done == false) { Timer collectTimer; PinotSettings &settings = PinotSettings::getInstance(); // Set up the proxy if ((getDownloader == true) && (settings.m_proxyEnabled == true) && (settings.m_proxyAddress.empty() == false)) { char portStr[64]; m_pDownloader->setSetting("proxyaddress", settings.m_proxyAddress); snprintf(portStr, 64, "%u", settings.m_proxyPort); m_pDownloader->setSetting("proxyport", portStr); m_pDownloader->setSetting("proxytype", settings.m_proxyType); } collectTimer.start(); m_pDoc = m_pDownloader->retrieveUrl(m_docInfo); clog << "Retrieved " << m_docInfo.getLocation() << " in " << collectTimer.stop() << " ms" << endl; } if (m_pDoc == NULL) { m_errorNum = DOWNLOAD_FAILED; m_errorParam = m_docInfo.getLocation(); } } IndexingThread::IndexingThread(const DocumentInfo &docInfo, const string &indexLocation, bool allowAllMIMETypes) : DownloadingThread(docInfo), m_pIndex(NULL), m_indexLocation(indexLocation), m_allowAllMIMETypes(allowAllMIMETypes), m_update(false), m_docId(0) { } IndexingThread::~IndexingThread() { if (m_pIndex != NULL) { delete m_pIndex; } } string IndexingThread::getType(void) const { return "IndexingThread"; } const DocumentInfo &IndexingThread::getDocumentInfo(void) const { return m_docInfo; } unsigned int IndexingThread::getDocumentID(void) const { return m_docId; } bool IndexingThread::isNewDocument(void) const { // If the thread is set to perform an update, the document isn't new if (m_update == true) { return false; } return true; } void IndexingThread::doWork(void) { Url thisUrl(m_docInfo.getLocation()); bool reliableType = false, doDownload = 
true; // First things first, get the index if (m_pIndex == NULL) { m_pIndex = PinotSettings::getInstance().getIndex(m_indexLocation); } if ((m_pIndex == NULL) || (m_pIndex->isGood() == false)) { m_errorNum = INDEX_ERROR; m_errorParam = m_indexLocation; return; } // Is it an update ? m_docId = m_pIndex->hasDocument(m_docInfo.getLocation(true)); if (m_docId > 0) { // Ignore robots directives on updates m_update = true; } if (m_docInfo.getType().empty() == true) { m_docInfo.setType(MIMEScanner::scanUrl(thisUrl)); } else if (thisUrl.isLocal() == true) { // There's a good chance the supplied type is accurate // if the document is a local file reliableType = true; } if (m_docInfo.getIsDirectory() == true) { doDownload = false; #ifdef DEBUG clog << "IndexingThread::doWork: skipping download of directory " << m_docInfo.getLocation() << endl; #endif } else if (FilterUtils::isSupportedType(m_docInfo.getType()) == false) { // Skip unsupported types ? if (m_allowAllMIMETypes == false) { m_errorNum = UNSUPPORTED_TYPE; m_errorParam = m_docInfo.getType(); return; } if (reliableType == true) { doDownload = false; #ifdef DEBUG clog << "IndexingThread::doWork: skipping download of unsupported type " << m_docInfo.getLocation() << endl; #endif } } else { Dijon::Filter *pFilter = FilterUtils::getFilter(m_docInfo.getType()); if (pFilter != NULL) { // We may be able to feed the document directly to the filter if (((pFilter->is_data_input_ok(Dijon::Filter::DOCUMENT_FILE_NAME) == true) && (thisUrl.getProtocol() == "file")) || ((pFilter->is_data_input_ok(Dijon::Filter::DOCUMENT_URI) == true) && (thisUrl.isLocal() == false))) { doDownload = false; #ifdef DEBUG clog << "IndexingThread::doWork: let filter download " << m_docInfo.getLocation() << endl; #endif } delete pFilter; } } // We may not have to download the document if (doDownload == true) { DownloadingThread::doWork(); } else { if (m_pDoc != NULL) { delete m_pDoc; m_pDoc = NULL; } m_pDoc = new Document(m_docInfo); 
m_pDoc->setTimestamp(m_docInfo.getTimestamp()); m_pDoc->setSize(m_docInfo.getSize()); } if (m_pDoc != NULL) { Timer indexTimer; string docType(m_pDoc->getType()); bool success = false; indexTimer.start(); // The type may have been obtained when downloading if (docType.empty() == false) { // Use the document's type m_docInfo.setType(docType); } else { // Use the type we were supplied with m_pDoc->setType(m_docInfo.getType()); } if (m_docInfo.getTitle().empty() == false) { // Use the title we were supplied with m_pDoc->setTitle(m_docInfo.getTitle()); } else { // Use the document's m_docInfo.setTitle(m_pDoc->getTitle()); } #ifdef DEBUG clog << "IndexingThread::doWork: title is " << m_pDoc->getTitle() << endl; #endif // Check again as the downloader may have altered the MIME type if (FilterUtils::isSupportedType(m_docInfo.getType()) == false) { // Skip unsupported types ? if (m_allowAllMIMETypes == false) { m_errorNum = UNSUPPORTED_TYPE; m_errorParam = m_docInfo.getType(); return; } // Let FilterWrapper handle unsupported documents } else if ((PinotSettings::getInstance().m_ignoreRobotsDirectives == false) && (thisUrl.isLocal() == false) && (m_docInfo.getType().length() >= 9) && (m_docInfo.getType().substr(0, 9) == "text/html")) { Dijon::HtmlFilter htmlFilter; htmlFilter.set_mime_type(m_docInfo.getType()); if ((FilterUtils::feedFilter(*m_pDoc, &htmlFilter) == true) && (htmlFilter.next_document() == true)) { const map &metaData = htmlFilter.get_meta_data(); // See if the document has a ROBOTS META tag map::const_iterator robotsIter = metaData.find("robots"); if (robotsIter != metaData.end()) { string robotsDirectives(robotsIter->second); // Is indexing allowed ? 
string::size_type pos1 = robotsDirectives.find("none"); string::size_type pos2 = robotsDirectives.find("noindex"); if ((pos1 != string::npos) || (pos2 != string::npos)) { // No, it isn't m_errorNum = ROBOTS_FORBIDDEN; m_errorParam = m_docInfo.getLocation(); return; } } } #ifdef DEBUG else clog << "IndexingThread::doWork: couldn't check document for ROBOTS directive" << endl; #endif } if (m_done == false) { FilterWrapper wrapFilter(m_pIndex); // Update an existing document or add to the index ? if (m_update == true) { set labels; // Make sure labels are preserved m_pIndex->getDocumentLabels(m_docId, labels); m_pDoc->setLabels(labels); // Update the document if (wrapFilter.updateDocument(*m_pDoc, m_docId) == true) { #ifdef DEBUG clog << "IndexingThread::doWork: updated " << m_pDoc->getLocation() << " at " << m_docId << endl; #endif success = true; } #ifdef DEBUG else clog << "IndexingThread::doWork: couldn't update " << m_pDoc->getLocation() << endl; #endif } else { unsigned int docId = 0; #ifdef DEBUG clog << "IndexingThread::doWork: " << m_docInfo.getLabels().size() << " labels for URL " << m_pDoc->getLocation() << endl; #endif // Index the document success = wrapFilter.indexDocument(*m_pDoc, m_docInfo.getLabels(), docId); if (success == true) { m_docId = docId; #ifdef DEBUG clog << "IndexingThread::doWork: indexed " << m_pDoc->getLocation() << " to " << m_docId << endl; #endif } #ifdef DEBUG else clog << "IndexingThread::doWork: couldn't index " << m_pDoc->getLocation() << endl; #endif } if (success == false) { m_errorNum = INDEXING_FAILED; m_errorParam = m_docInfo.getLocation(); } else { // Flush the index ? 
if (m_immediateFlush == true) { m_pIndex->flush(); } // The document properties may have changed m_pIndex->getDocumentInfo(m_docId, m_docInfo); m_docInfo.setIsIndexed( PinotSettings::getInstance().getIndexPropertiesByLocation(m_indexLocation).m_id, m_docId); clog << "Indexed " << m_docInfo.getLocation() << " in " << indexTimer.stop() << " ms" << endl; } } } #ifdef DEBUG else clog << "IndexingThread::doWork: couldn't download " << m_docInfo.getLocation() << endl; #endif } UnindexingThread::UnindexingThread(const set &docIdList) : WorkerThread(), m_indexLocation(PinotSettings::getInstance().m_docsIndexLocation), m_docsCount(0) { copy(docIdList.begin(), docIdList.end(), inserter(m_docIdList, m_docIdList.begin())); } UnindexingThread::UnindexingThread(const set &labelNames, const string &indexLocation) : WorkerThread(), m_indexLocation(indexLocation), m_docsCount(0) { copy(labelNames.begin(), labelNames.end(), inserter(m_labelNames, m_labelNames.begin())); if (indexLocation.empty() == true) { m_indexLocation = PinotSettings::getInstance().m_docsIndexLocation; } } UnindexingThread::~UnindexingThread() { } string UnindexingThread::getType(void) const { return "UnindexingThread"; } unsigned int UnindexingThread::getDocumentsCount(void) const { return m_docsCount; } void UnindexingThread::doWork(void) { IndexInterface *pIndex = PinotSettings::getInstance().getIndex(m_indexLocation); if ((pIndex == NULL) || (pIndex->isGood() == false)) { m_errorNum = INDEX_ERROR; m_errorParam = m_indexLocation; if (pIndex != NULL) { delete pIndex; } return; } // Be pessimistic and assume something will go wrong ;-) m_errorNum = UNINDEXING_FAILED; // Are we supposed to remove documents based on labels ? 
if (m_docIdList.empty() == true) { // Yep, delete documents one label at a time for (set::iterator iter = m_labelNames.begin(); iter != m_labelNames.end(); ++iter) { string labelName = (*iter); // By unindexing all documents that match the label, // we effectively delete the label from the index if (pIndex->unindexDocuments(labelName, IndexInterface::BY_LABEL) == true) { #ifdef DEBUG clog << "UnindexingThread::doWork: removed label " << labelName << endl; #endif // OK ++m_docsCount; } #ifdef DEBUG else clog << "UnindexingThread::doWork: couldn't remove label " << labelName << endl; #endif } // Nothing to report m_errorNum = 0; } else { for (set::iterator iter = m_docIdList.begin(); iter != m_docIdList.end(); ++iter) { unsigned int docId = (*iter); if (pIndex->unindexDocument(docId) == true) { #ifdef DEBUG clog << "UnindexingThread::doWork: removed " << docId << endl; #endif // OK ++m_docsCount; } #ifdef DEBUG else clog << "UnindexingThread::doWork: couldn't remove " << docId << endl; #endif } #ifdef DEBUG clog << "UnindexingThread::doWork: removed " << m_docsCount << " documents" << endl; #endif } if (m_docsCount > 0) { // Flush the index ? if (m_immediateFlush == true) { pIndex->flush(); } // Nothing to report m_errorNum = 0; } delete pIndex; } HistoryMonitorThread::HistoryMonitorThread(MonitorInterface *pMonitor, MonitorHandler *pHandler) : MonitorThread(pMonitor, pHandler), m_crawlHistory(PinotSettings::getInstance().getHistoryDatabaseName()) { } HistoryMonitorThread::~HistoryMonitorThread() { } bool HistoryMonitorThread::isFileBlacklisted(const string &location) { return PinotSettings::getInstance().isBlackListed(location); } void HistoryMonitorThread::fileModified(const string &location) { CrawlHistory::CrawlStatus status = CrawlHistory::UNKNOWN; struct stat fileStat; time_t itemDate = 0; if (m_crawlHistory.hasItem("file://" + location, status, itemDate) == true) { // Was the file actually modified ? 
if ((stat(location.c_str(), &fileStat) == 0) && (itemDate < fileStat.st_mtime)) { m_pHandler->fileModified(location); } #ifdef DEBUG else clog << "HistoryMonitorThread::fileModified: file wasn't modified" << endl; #endif } #ifdef DEBUG else clog << "HistoryMonitorThread::fileModified: file wasn't crawled" << endl; #endif } DirectoryScannerThread::DirectoryScannerThread(const DocumentInfo &docInfo, const string &indexLocation, unsigned int maxLevel, bool inlineIndexing, bool followSymLinks) : IndexingThread(docInfo, indexLocation), m_currentLevel(0), m_maxLevel(maxLevel), m_inlineIndexing(inlineIndexing), m_followSymLinks(followSymLinks) { Url urlObj(docInfo.getLocation()); m_dirName = urlObj.getLocation() + "/" + urlObj.getFile(); } DirectoryScannerThread::~DirectoryScannerThread() { } string DirectoryScannerThread::getType(void) const { if (m_inlineIndexing == true) { return IndexingThread::getType(); } return "DirectoryScannerThread"; } string DirectoryScannerThread::getDirectory(void) const { return m_dirName; } void DirectoryScannerThread::stop(void) { // Disconnect the signal sigc::signal2::slot_list_type slotsList = m_signalFileFound.slots(); sigc::signal2::slot_list_type::iterator slotIter = slotsList.begin(); if (slotIter != slotsList.end()) { if (slotIter->empty() == false) { slotIter->block(); slotIter->disconnect(); } } WorkerThread::stop(); } sigc::signal2& DirectoryScannerThread::getFileFoundSignal(void) { return m_signalFileFound; } void DirectoryScannerThread::recordCrawled(const string &location, time_t itemDate) { // Nothing to do by default } bool DirectoryScannerThread::isIndexable(const string &entryName) const { string entryDir(path_get_dirname(entryName) + "/"); // Is this under the directory being scanned ? 
if ((entryDir.length() >= m_dirName.length()) && (entryDir.substr(0, m_dirName.length()) == m_dirName)) { // Yes, it is #ifdef DEBUG clog << "DirectoryScannerThread::isIndexable: under " << m_dirName << endl; #endif return true; } return false; } bool DirectoryScannerThread::wasCrawled(const string &location, time_t &itemDate) { // This information is unknown return false; } void DirectoryScannerThread::recordCrawling(const string &location, bool itemExists, time_t &itemDate) { // Nothing to do by default } void DirectoryScannerThread::recordError(const string &location, int errorCode) { // Nothing to do by default } void DirectoryScannerThread::recordSymlink(const string &location, time_t itemDate) { // Nothing to do by default } bool DirectoryScannerThread::monitorEntry(const string &entryName) { // Nothing to do by default return true; } void DirectoryScannerThread::unmonitorEntry(const string &entryName) { // Nothing to do by default } void DirectoryScannerThread::foundFile(const DocumentInfo &docInfo) { if ((docInfo.getLocation().empty() == true) || (m_done == true)) { return; } if (m_inlineIndexing == true) { // Reset base class members m_docInfo = docInfo; m_docId = 0; m_update = false; IndexingThread::doWork(); #ifdef DEBUG clog << "DirectoryScannerThread::foundFile: indexed " << docInfo.getLocation() << " to " << m_docId << endl; #endif } else { // Delegate indexing // Report everything as file to avoid triggering another crawl m_signalFileFound(docInfo, false); } } bool DirectoryScannerThread::scanEntry(const string &entryName, int &entryStatus, bool statLinks) { string location("file://" + entryName); DocumentInfo docInfo("", location, "", ""); time_t itemDate = time(NULL); struct stat fileStat; bool scanSuccess = true, reportFile = false, itemExists = false; if (entryName.empty() == true) { return false; } // Skip . .. 
and dotfiles Url urlObj(location); if (urlObj.getFile()[0] == '.') { #ifdef DEBUG clog << "DirectoryScannerThread::scanEntry: skipped dotfile " << urlObj.getFile() << endl; #endif return false; } #ifdef DEBUG clog << "DirectoryScannerThread::scanEntry: checking " << entryName << endl; #endif #ifdef HAVE_LSTAT // Stat links, or the stuff it refers to ? if (statLinks == true) { entryStatus = lstat(entryName.c_str(), &fileStat); } else { #endif entryStatus = stat(entryName.c_str(), &fileStat); #ifdef HAVE_LSTAT } #endif if (entryStatus == -1) { entryStatus = errno; scanSuccess = false; #ifdef DEBUG clog << "DirectoryScannerThread::scanEntry: stat failed with error " << entryStatus << endl; #endif } #ifdef HAVE_LSTAT // Special processing applies if it's a symlink else if (S_ISLNK(fileStat.st_mode)) { string realEntryName(entryName); string entryNameReferree; bool isInIndexableLocation = false; // If symlinks are followed, check if this symlink is blacklisted if ((m_followSymLinks == false) || (PinotSettings::getInstance().isBlackListed(entryName) == true)) { #ifdef DEBUG clog << "DirectoryScannerThread::scanEntry: skipped symlink " << entryName << endl; #endif return false; } // Are we already following a symlink to a directory ? 
if (m_currentLinks.empty() == false) { string linkToDir(m_currentLinks.top() + "/"); // Yes, we are if ((entryName.length() > linkToDir.length()) && (entryName.substr(0, linkToDir.length()) == linkToDir)) { // ...and this entry is below it realEntryName.replace(0, linkToDir.length() - 1, m_currentLinkReferrees.top()); #ifdef DEBUG clog << "DirectoryScannerThread::scanEntry: really at " << realEntryName << endl; #endif isInIndexableLocation = isIndexable(realEntryName); } } char *pBuf = g_file_read_link(realEntryName.c_str(), NULL); if (pBuf != NULL) { string linkLocation(filename_to_utf8(pBuf)); if (path_is_absolute(linkLocation) == true) { entryNameReferree = linkLocation; } else { string entryDir(path_get_dirname(realEntryName)); entryNameReferree = Url::resolvePath(entryDir, linkLocation); } if (entryNameReferree[entryNameReferree.length() - 1] == '/') { // Drop the terminating slash entryNameReferree.resize(entryNameReferree.length() - 1); } #ifdef DEBUG clog << "DirectoryScannerThread::scanEntry: symlink resolved to " << entryNameReferree << endl; #endif g_free(pBuf); } string referreeLocation("file://" + entryNameReferree); time_t referreeItemDate; // Check whether this will be, or has already been crawled // Referrees in indexable locations will be indexed later on if ((isInIndexableLocation == false) && (isIndexable(entryNameReferree) == false) && (wasCrawled(referreeLocation, referreeItemDate) == false)) { m_currentLinks.push(entryName); m_currentLinkReferrees.push(entryNameReferree); // Add a dummy entry for this referree // It will ensure it's not indexed more than once and it shouldn't do any harm recordSymlink(referreeLocation, itemDate); // Do it again, this time by stat'ing what the link refers to bool scannedReferree = scanEntry(entryName, entryStatus, false); m_currentLinks.pop(); m_currentLinkReferrees.pop(); return scannedReferree; } else { clog << "Skipping " << entryName << ": it links to " << entryNameReferree << " which will be crawled, or 
has already been crawled" << endl; // This should ensure that only metadata is indexed docInfo.setType("inode/symlink"); reportFile = true; } } #endif // Is this item in the database already ? itemExists = wasCrawled(location, itemDate); // Put it in if necessary recordCrawling(location, itemExists, itemDate); // If stat'ing didn't fail, see if it's a file or a directory if ((entryStatus == 0) && (S_ISREG(fileStat.st_mode))) { // Is this file blacklisted ? // We have to check early so that if necessary the file's status stays at TO_CRAWL // and it is removed from the index at the end of this crawl if (PinotSettings::getInstance().isBlackListed(entryName) == false) { reportFile = true; } } else if ((entryStatus == 0) && (S_ISDIR(fileStat.st_mode))) { docInfo.setType("x-directory/normal"); // Can we scan this directory ? if (((m_maxLevel == 0) || (m_currentLevel < m_maxLevel)) && (PinotSettings::getInstance().isBlackListed(entryName) == false)) { ++m_currentLevel; // Open the directory DIR *pDir = opendir(entryName.c_str()); if (pDir != NULL) { // Monitor first so that we don't miss events // If monitoring is not possible, record the first case if ((monitorEntry(entryName) == false) && (entryStatus != MONITORING_FAILED)) { entryStatus = MONITORING_FAILED; } #ifdef DEBUG clog << "DirectoryScannerThread::scanEntry: entering " << entryName << endl; #endif // Iterate through this directory's entries struct dirent *pDirEntry = readdir(pDir); while ((m_done == false) && (pDirEntry != NULL)) { char *pEntryName = pDirEntry->d_name; // Skip . .. 
and dotfiles if ((pEntryName != NULL) && (pEntryName[0] != '.')) { string subEntryName(entryName); int subEntryStatus = 0; if (entryName[entryName.length() - 1] != '/') { subEntryName += "/"; } subEntryName += pEntryName; // Scan this entry scanEntry(subEntryName, subEntryStatus); } // Next entry pDirEntry = readdir(pDir); } #ifdef DEBUG clog << "DirectoryScannerThread::scanEntry: leaving " << entryName << endl; #endif // Close the directory closedir(pDir); --m_currentLevel; reportFile = true; } else { entryStatus = errno; scanSuccess = false; #ifdef DEBUG clog << "DirectoryScannerThread::scanEntry: opendir failed with error " << entryStatus << endl; #endif } } } // Is it some unknown type ? else if ((entryStatus == 0) #ifdef HAVE_LSTAT && (!S_ISLNK(fileStat.st_mode)) #endif ) { #ifdef DEBUG clog << "DirectoryScannerThread::scanEntry: unknown entry type" << endl; #endif entryStatus = ENOENT; scanSuccess = false; } // Was it modified after the last crawl ? if ((itemExists == true) && (itemDate >= fileStat.st_mtime)) { // No, it wasn't #ifdef DEBUG clog << "DirectoryScannerThread::scanEntry: no change to " << location << endl; #endif reportFile = false; } if (m_done == true) { // Don't record or report the file reportFile = false; } // Did an error occur ? 
else if (entryStatus != 0) { // Record this error recordError(location, entryStatus); if (scanSuccess == false) { return scanSuccess; } } // History of new or modified files, especially their timestamp, is always updated // Others' are updated only if we are doing a full scan because // the status has to be reset to CRAWLED, so that they are not unindexed else if ((itemExists == false) || (reportFile == true)) { recordCrawled(location, fileStat.st_mtime); } // If a major error occurred, this won't be true if (reportFile == true) { if (docInfo.getType().empty() == true) { // Scan the file docInfo.setType(MIMEScanner::scanFile(entryName)); } docInfo.setTimestamp(TimeConverter::toTimestamp(fileStat.st_mtime)); docInfo.setSize(fileStat.st_size); foundFile(docInfo); } return scanSuccess; } void DirectoryScannerThread::doWork(void) { Timer scanTimer; int entryStatus = 0; if (m_dirName.empty() == true) { return; } scanTimer.start(); if (scanEntry(m_dirName, entryStatus) == false) { if (entryStatus == 0) { m_errorNum = OPENDIR_FAILED; } else { m_errorNum = entryStatus; } m_errorParam = m_dirName; } clog << "Scanned " << m_dirName << " in " << scanTimer.stop() << " ms" << endl; } pinot-1.22/Core/WorkerThreads.h000066400000000000000000000213431470740426600164310ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _WORKERTHREADS_HH #define _WORKERTHREADS_HH #include #include #include #include #include #include #include #include #include #include #include #include #include "Document.h" #include "ActionQueue.h" #include "CrawlHistory.h" #include "DownloaderInterface.h" #include "QueryProperties.h" #include "PinotSettings.h" #include "WorkerThread.h" class QueueManager : public ThreadsManager { public: QueueManager(const std::string &defaultIndexLocation, unsigned int maxThreadsTime = 300, bool scanLocalFiles = false); virtual ~QueueManager(); virtual Glib::ustring queue_index(const DocumentInfo &docInfo); virtual bool pop_queue(const std::string &urlWasIndexed = ""); protected: bool m_scanLocalFiles; bool m_stopIndexing; ActionQueue m_actionQueue; Glib::ustring index_document(const DocumentInfo &docInfo); virtual void clear_queues(void); private: QueueManager(const QueueManager &other); QueueManager &operator=(const QueueManager &other); }; class ListerThread : public WorkerThread { public: ListerThread(const PinotSettings::IndexProperties &indexProps, unsigned int startDoc); ~ListerThread(); std::string getType(void) const; PinotSettings::IndexProperties getIndexProperties(void) const; unsigned int getStartDoc(void) const; const std::vector &getDocuments(void) const; unsigned int getDocumentsCount(void) const; protected: PinotSettings::IndexProperties m_indexProps; unsigned int m_startDoc; std::vector m_documentsList; unsigned int m_documentsCount; private: ListerThread(const ListerThread &other); ListerThread &operator=(const ListerThread &other); }; class QueryingThread : public ListerThread { public: QueryingThread(const PinotSettings::IndexProperties &indexProps, const QueryProperties &queryProps, unsigned int startDoc = 0, bool listingIndex = false); 
QueryingThread(const std::string &engineName, const std::string &engineDisplayableName, const std::string &engineOption, const QueryProperties &queryProps, unsigned int startDoc = 0); virtual ~QueryingThread(); virtual std::string getType(void) const; bool isLive(void) const; std::string getEngineName(void) const; QueryProperties getQuery(bool &wasCorrected) const; std::string getCharset(void) const; protected: std::string m_engineName; std::string m_engineDisplayableName; std::string m_engineOption; QueryProperties m_queryProps; std::string m_resultsCharset; bool m_listingIndex; bool m_correctedSpelling; bool m_isLive; bool findPlugin(void); private: QueryingThread(const QueryingThread &other); QueryingThread &operator=(const QueryingThread &other); }; class EngineQueryThread : public QueryingThread { public: EngineQueryThread(const PinotSettings::IndexProperties &indexProps, const QueryProperties &queryProps, unsigned int startDoc = 0, bool listingIndex = false); EngineQueryThread(const PinotSettings::IndexProperties &indexProps, const QueryProperties &queryProps, const std::set &limitToDocsSet, unsigned int startDoc = 0); EngineQueryThread(const std::string &engineName, const std::string &engineDisplayableName, const std::string &engineOption, const QueryProperties &queryProps, unsigned int startDoc = 0); virtual ~EngineQueryThread(); protected: std::set m_limitToDocsSet; virtual void processResults(const std::vector &resultsList); virtual void processResults(const std::vector &resultsList, unsigned int indexId); virtual void doWork(void); private: EngineQueryThread(const EngineQueryThread &other); EngineQueryThread &operator=(const EngineQueryThread &other); }; class DownloadingThread : public WorkerThread { public: DownloadingThread(const DocumentInfo &docInfo); virtual ~DownloadingThread(); virtual std::string getType(void) const; std::string getURL(void) const; const Document *getDocument(void) const; protected: DocumentInfo m_docInfo; Document *m_pDoc; 
		DownloaderInterface *m_pDownloader;
		std::string m_protocol;

		DownloadingThread();

		virtual void doWork(void);

	private:
		DownloadingThread(const DownloadingThread &other);
		DownloadingThread &operator=(const DownloadingThread &other);

};

class IndexingThread : public DownloadingThread
{
	public:
		IndexingThread(const DocumentInfo &docInfo, const std::string &indexLocation,
			bool allowAllMIMETypes = true);
		virtual ~IndexingThread();

		virtual std::string getType(void) const;

		const DocumentInfo &getDocumentInfo(void) const;

		std::string getLabelName(void) const;

		unsigned int getDocumentID(void) const;

		bool isNewDocument(void) const;

	protected:
		IndexInterface *m_pIndex;
		std::string m_indexLocation;
		bool m_allowAllMIMETypes;
		bool m_update;
		unsigned int m_docId;

		IndexingThread();

		virtual void doWork(void);

	private:
		IndexingThread(const IndexingThread &other);
		IndexingThread &operator=(const IndexingThread &other);

};

class UnindexingThread : public WorkerThread
{
	public:
		// Unindex documents from the internal index
		UnindexingThread(const std::set<unsigned int> &docIdList);
		// Unindex from the given index documents that have one of the labels
		UnindexingThread(const std::set<std::string> &labelNames, const std::string &indexLocation);
		virtual ~UnindexingThread();

		virtual std::string getType(void) const;

		unsigned int getDocumentsCount(void) const;

	protected:
		std::set<unsigned int> m_docIdList;
		std::set<std::string> m_labelNames;
		std::string m_indexLocation;
		unsigned int m_docsCount;

		virtual void doWork(void);

	private:
		UnindexingThread(const UnindexingThread &other);
		UnindexingThread &operator=(const UnindexingThread &other);

};

class HistoryMonitorThread : public MonitorThread
{
	public:
		HistoryMonitorThread(MonitorInterface *pMonitor, MonitorHandler *pHandler);
		virtual ~HistoryMonitorThread();

	protected:
		CrawlHistory m_crawlHistory;

		virtual bool isFileBlacklisted(const std::string &location);

		virtual void fileModified(const std::string &location);

	private:
		HistoryMonitorThread(const HistoryMonitorThread &other);
		HistoryMonitorThread
			&operator=(const HistoryMonitorThread &other);

};

class DirectoryScannerThread : public IndexingThread
{
	public:
		DirectoryScannerThread(const DocumentInfo &docInfo,
			const std::string &indexLocation, unsigned int maxLevel = 0,
			bool inlineIndexing = false, bool followSymLinks = true);
		virtual ~DirectoryScannerThread();

		virtual std::string getType(void) const;

		virtual std::string getDirectory(void) const;

		virtual void stop(void);

		sigc::signal2<void, DocumentInfo, bool>& getFileFoundSignal(void);

	protected:
		std::string m_dirName;
		unsigned int m_currentLevel;
		unsigned int m_maxLevel;
		bool m_inlineIndexing;
		bool m_followSymLinks;
		sigc::signal2<void, DocumentInfo, bool> m_signalFileFound;
		std::stack<std::string> m_currentLinks;
		std::stack<std::string> m_currentLinkReferrees;

		virtual void recordCrawled(const std::string &location, time_t itemDate);
		virtual bool isIndexable(const std::string &entryName) const;
		virtual bool wasCrawled(const std::string &location, time_t &itemDate);
		virtual void recordCrawling(const std::string &location, bool itemExists, time_t &itemDate);
		virtual void recordError(const std::string &location, int errorCode);
		virtual void recordSymlink(const std::string &location, time_t itemDate);
		virtual bool monitorEntry(const std::string &entryName);
		virtual void unmonitorEntry(const std::string &entryName);
		virtual void foundFile(const DocumentInfo &docInfo);

		bool scanEntry(const std::string &entryName, int &entryStatus,
			bool statLinks = true);
		virtual void doWork(void);

	private:
		DirectoryScannerThread(const DirectoryScannerThread &other);
		DirectoryScannerThread &operator=(const DirectoryScannerThread &other);

};

#endif // _WORKERTHREADS_HH
pinot-1.22/Core/com.github.fabricecolin.Pinot.search-provider.ini000066400000000000000000000002121470740426600250270ustar00rootroot00000000000000
[Shell Search Provider]
DesktopId=pinot.desktop
BusName=com.github.fabricecolin.Pinot
ObjectPath=/com/github/fabricecolin/Pinot
Version=2
pinot-1.22/Core/com.github.fabricecolin.Pinot.service.in000066400000000000000000000001231470740426600232220ustar00rootroot00000000000000
[D-BUS Service]
Name=com.github.fabricecolin.Pinot
Exec=@BINDIR@/pinot-dbus-daemon
pinot-1.22/Core/pinot-daemon.1000066400000000000000000000016211470740426600161450ustar00rootroot00000000000000
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.48.5.
.TH PINOT-DAEMON "1" "October 2021" "pinot 1.20" "User Commands"
.SH NAME
pinot-daemon \- Search and index daemon
.SH SYNOPSIS
.B pinot-daemon
[\fI\,OPTIONS\/\fR]
.SH DESCRIPTION
pinot\-daemon \- Search and index daemon
.SH OPTIONS
.TP
\fB\-h\fR, \fB\-\-help\fR
display this help and exit
.TP
\fB\-i\fR, \fB\-\-ignore\-version\fR
ignore the index version number
.TP
\fB\-p\fR, \fB\-\-priority\fR
set the daemon's priority (default 15)
.TP
\fB\-r\fR, \fB\-\-reindex\fR
force a reindex
.TP
\fB\-v\fR, \fB\-\-version\fR
output version information and exit
.SH "REPORTING BUGS"
Report bugs to fabrice.colin@gmail.com
.PP
.br
This is free software. You may redistribute copies of it under the terms of
the GNU General Public License .
.br
There is NO WARRANTY, to the extent permitted by law.
pinot-1.22/Core/pinot-dbus-daemon.1000066400000000000000000000016631470740426600171060ustar00rootroot00000000000000
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.3.
.TH PINOT-DBUS-DAEMON "1" "October 2024" "pinot 1.22" "User Commands"
.SH NAME
pinot-dbus-daemon \- D-Bus search and index daemon
.SH SYNOPSIS
.B pinot-dbus-daemon
[\fI\,OPTIONS\/\fR]
.SH DESCRIPTION
pinot\-dbus\-daemon \- D\-Bus search and index daemon
.SH OPTIONS
.TP
\fB\-h\fR, \fB\-\-help\fR
display this help and exit
.TP
\fB\-i\fR, \fB\-\-ignore\-version\fR
ignore the index version number
.TP
\fB\-p\fR, \fB\-\-priority\fR
set the daemon's priority (default 15)
.TP
\fB\-r\fR, \fB\-\-reindex\fR
force a reindex
.TP
\fB\-v\fR, \fB\-\-version\fR
output version information and exit
.SH "REPORTING BUGS"
Report bugs to fabrice.colin@gmail.com
.PP
.br
This is free software. You may redistribute copies of it under the terms of
the GNU General Public License .
.br
There is NO WARRANTY, to the extent permitted by law.
pinot-1.22/Core/pinot-dbus-daemon.cpp000066400000000000000000000327401470740426600175300ustar00rootroot00000000000000
/*
 * Copyright 2005-2021 Fabrice Colin
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 */

#include "config.h"
#include
#include
#include
#ifdef HAVE_LINUX_SCHED_H
#include
#include
#else
#include
#include
#endif
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#include "NLS.h"
#include "FilterFactory.h"
#include "Languages.h"
#include "MIMEScanner.h"
#include "ModuleFactory.h"
#include "ActionQueue.h"
#include "CrawlHistory.h"
#include "MetaDataBackup.h"
#include "QueryHistory.h"
#include "ViewHistory.h"
#include "DownloaderInterface.h"
#include "DaemonState.h"
#include "PinotSettings.h"
#include "ServerThreads.h"
#include "UniqueApplication.h"

using namespace std;

static ofstream g_outputFile;
static string g_pidFileName;
static streambuf *g_coutBuf = NULL;
static streambuf *g_cerrBuf = NULL;
static streambuf *g_clogBuf = NULL;
static struct option g_longOptions[] = {
	{"help", 0, 0, 'h'},
	{"ignore-version", 0, 0, 'i'},
	{"priority", 1, 0, 'p'},
	{"reindex", 0, 0, 'r'},
	{"version", 0, 0, 'v'},
	{0, 0, 0, 0}
};
static Glib::RefPtr<Glib::MainLoop> g_refMainLoop;
static DaemonState *g_pState = NULL;

static void closeAll(void)
{
	clog << "Exiting..." << endl;

	// Close everything
	ModuleFactory::unloadModules();

	// Restore the stream buffers, each one to the stream it was saved from
	if (g_coutBuf != NULL)
	{
		cout.rdbuf(g_coutBuf);
	}
	if (g_cerrBuf != NULL)
	{
		cerr.rdbuf(g_cerrBuf);
	}
	if (g_clogBuf != NULL)
	{
		clog.rdbuf(g_clogBuf);
	}
	g_outputFile.close();

	if (g_pidFileName.empty() == false)
	{
		unlink(g_pidFileName.c_str());
	}

	DownloaderInterface::shutdown();
	MIMEScanner::shutdown();
}

static void quitAll(int sigNum)
{
	if (g_refMainLoop->is_running() == true)
	{
		clog << "Quitting..."
<< endl; if (g_pState != NULL) { g_pState->mustQuit(true); } g_refMainLoop->quit(); } } int main(int argc, char **argv) { #ifdef HAVE_DBUS string programName("pinot-dbus-daemon"); #else string programName("pinot-daemon"); #endif int longOptionIndex = 0, priority = 15; bool resetHistory = false; bool resetLabels = false; bool reindex = false; bool ignoreVersion = false; // Look at the options int optionChar = getopt_long(argc, argv, "hip:rv", g_longOptions, &longOptionIndex); while (optionChar != -1) { switch (optionChar) { case 'h': // Help #ifdef HAVE_DBUS clog << programName << " - D-Bus search and index daemon\n\n" #else clog << programName << " - Search and index daemon\n\n" #endif << "Usage: " << programName << " [OPTIONS]\n\n" << "Options:\n" << " -h, --help display this help and exit\n" << " -i, --ignore-version ignore the index version number\n" << " -p, --priority set the daemon's priority (default 15)\n" << " -r, --reindex force a reindex\n" << " -v, --version output version information and exit\n" << "\nReport bugs to " << PACKAGE_BUGREPORT << endl; return EXIT_SUCCESS; case 'i': ignoreVersion = true; break; case 'p': if (optarg != NULL) { int newPriority = atoi(optarg); if ((newPriority >= -20) && (newPriority < 20)) { priority = newPriority; } } break; case 'r': reindex = true; break; case 'v': clog << programName << " - " << PACKAGE_STRING << "\n\n" << "This is free software. You may redistribute copies of it under the terms of\n" << "the GNU General Public License .\n" << "There is NO WARRANTY, to the extent permitted by law." 
<< endl; return EXIT_SUCCESS; default: return EXIT_FAILURE; } // Next option optionChar = getopt_long(argc, argv, "hip:rv", g_longOptions, &longOptionIndex); } #if defined(ENABLE_NLS) bindtextdomain(GETTEXT_PACKAGE, PACKAGE_LOCALE_DIR); bind_textdomain_codeset(GETTEXT_PACKAGE, "UTF-8"); textdomain(GETTEXT_PACKAGE); #endif //ENABLE_NLS Glib::init(); Gio::init(); // Initialize threads support before doing anything else if (Glib::thread_supported() == false) { Glib::thread_init(); } // Initialize GType #if !GLIB_CHECK_VERSION(2,35,0) g_type_init(); #endif g_refMainLoop = Glib::MainLoop::create(); #ifdef HAVE_DBUS Glib::set_application_name("Pinot DBus Daemon"); #else Glib::set_application_name("Pinot Daemon"); #endif // Make the locale follow the environment variables setlocale(LC_ALL, ""); char *pLocale = setlocale(LC_ALL, NULL); if (pLocale != NULL) { string locale(pLocale); if (locale != "C") { bool appendUTF8 = false; string::size_type pos = locale.find_last_of("."); if ((pos != string::npos) && ((strcasecmp(locale.substr(pos).c_str(), ".utf8") != 0) && (strcasecmp(locale.substr(pos).c_str(), ".utf-8") != 0))) { locale.resize(pos); appendUTF8 = true; } if (appendUTF8 == true) { locale += ".UTF-8"; pLocale = setlocale(LC_ALL, locale.c_str()); if (pLocale != NULL) { #ifdef DEBUG clog << "Changed locale to " << pLocale << endl; #endif } } } } // Make sure only one instance runs #ifdef HAVE_DBUS UniqueApplication uniqueApp("com.github.fabricecolin.PinotDBusDaemon"); #else UniqueApplication uniqueApp("com.github.fabricecolin.PinotDaemon"); #endif string confDirectory(PinotSettings::getConfigurationDirectory()); g_pidFileName = confDirectory + "/" + programName + ".pid"; if (chdir(confDirectory.c_str()) == 0) { if (uniqueApp.isRunning(g_pidFileName, programName) == true) { return EXIT_SUCCESS; } // Redirect cout, cerr and clog to a file string fileName(confDirectory); fileName += "/"; fileName += programName; fileName += ".log"; g_outputFile.open(fileName.c_str()); 
g_coutBuf = cout.rdbuf(); g_cerrBuf = cerr.rdbuf(); g_clogBuf = clog.rdbuf(); clog.rdbuf(g_outputFile.rdbuf()); clog.rdbuf(g_outputFile.rdbuf()); clog.rdbuf(g_outputFile.rdbuf()); } else { // We can't rely on the PID file if (uniqueApp.isRunning() == true) { return EXIT_SUCCESS; } } // This will create the necessary directories on the first run PinotSettings &settings = PinotSettings::getInstance(); // This is the daemon so disable client-side code settings.setClientMode(false); // Initialize utility classes if (MIMEScanner::initialize(PinotSettings::getHomeDirectory() + "/.local", string(SHARED_MIME_INFO_PREFIX)) == false) { clog << "Couldn't load MIME settings" << endl; } DownloaderInterface::initialize(); // Load filter libraries, if any Dijon::FilterFactory::loadFilters(string(LIBDIR) + "/pinot/filters"); Dijon::FilterFactory::loadFilters(confDirectory + "/filters"); // Load backends, if any ModuleFactory::loadModules(string(LIBDIR) + "/pinot/backends"); ModuleFactory::loadModules(confDirectory + "/backends"); // Localize language names Languages::setIntlName(0, _("Unknown")); Languages::setIntlName(1, _("Danish")); Languages::setIntlName(2, _("Dutch")); Languages::setIntlName(3, _("English")); Languages::setIntlName(4, _("Finnish")); Languages::setIntlName(5, _("French")); Languages::setIntlName(6, _("German")); Languages::setIntlName(7, _("Hungarian")); Languages::setIntlName(8, _("Italian")); Languages::setIntlName(9, _("Norwegian")); Languages::setIntlName(10, _("Portuguese")); Languages::setIntlName(11, _("Romanian")); Languages::setIntlName(12, _("Russian")); Languages::setIntlName(13, _("Spanish")); Languages::setIntlName(14, _("Swedish")); Languages::setIntlName(15, _("Turkish")); // Load the settings settings.load(PinotSettings::LOAD_ALL); // Catch interrupts #ifdef HAVE_SIGACTION struct sigaction newAction; sigemptyset(&newAction.sa_mask); newAction.sa_flags = 0; newAction.sa_handler = quitAll; sigaction(SIGINT, &newAction, NULL); sigaction(SIGQUIT, 
&newAction, NULL); sigaction(SIGTERM, &newAction, NULL); #else signal(SIGINT, quitAll); #ifdef SIGQUIT signal(SIGQUIT, quitAll); #endif signal(SIGTERM, quitAll); #endif // Open the daemon index in read-write mode bool wasObsoleteFormat = false; if (ModuleFactory::openOrCreateIndex(settings.m_defaultBackend, settings.m_daemonIndexLocation, wasObsoleteFormat, false) == false) { clog << "Couldn't open index " << settings.m_daemonIndexLocation << endl; return EXIT_FAILURE; } if (wasObsoleteFormat == true) { resetHistory = resetLabels = true; } // Do the same for the history database PinotSettings::checkHistoryDatabase(); string historyDatabase(settings.getHistoryDatabaseName()); if ((historyDatabase.empty() == true) || (ActionQueue::create(historyDatabase) == false) || (CrawlHistory::create(historyDatabase) == false) || (MetaDataBackup::create(historyDatabase) == false) || (QueryHistory::create(historyDatabase) == false) || (ViewHistory::create(historyDatabase) == false)) { clog << "Couldn't create history database " << historyDatabase << endl; return EXIT_FAILURE; } else { ActionQueue actionQueue(historyDatabase, Glib::get_application_name()); QueryHistory queryHistory(historyDatabase); ViewHistory viewHistory(historyDatabase); time_t timeNow = time(NULL); unsigned int actionsCount = actionQueue.getItemsCount(ActionQueue::INDEX); // Don't expire actions left from last time actionsCount += actionQueue.getItemsCount(ActionQueue::UNINDEX); clog << actionsCount << " actions left" << endl; // Expire the rest queryHistory.expireItems(timeNow); viewHistory.expireItems(timeNow); } atexit(closeAll); #ifdef HAVE_LINUX_SCHED_H // Set the scheduling policy struct sched_param schedParam; if (sched_getparam(0, &schedParam) == -1) { clog << "Couldn't get current scheduling policy" << endl; } else if (sched_setscheduler(0, SCHED_IDLE, &schedParam) == -1) { clog << "Couldn't set scheduling policy" << endl; } #else // Change the daemon's priority if (setpriority(PRIO_PROCESS, 0, 
priority) == -1) { clog << "Couldn't set scheduling priority to " << priority << endl; } #endif DaemonState server; IndexInterface *pIndex = NULL; g_pState = &server; try { set labels; bool gotLabels = false; server.register_session(); pIndex = settings.getIndex(settings.m_daemonIndexLocation); if (pIndex != NULL) { string indexVersion(pIndex->getMetadata("version")); gotLabels = pIndex->getLabels(labels); // What version is the index at ? if (indexVersion.empty() == true) { indexVersion = "0.0"; } if (ignoreVersion == true) { // Better reset labels, they may have been lost too resetLabels = true; } else if (indexVersion < PINOT_INDEX_MIN_VERSION) { clog << "Upgrading index from version " << indexVersion << " to " << VERSION << endl; reindex = true; } if (reindex == true) { // Reset the index so that all documents are reindexed pIndex->reset(); server.flush_and_reclaim(); clog << "Reset index" << endl; resetHistory = resetLabels = true; } pIndex->setMetadata("version", VERSION); pIndex->setMetadata("dbus-status", "Running"); } if (resetHistory == true) { CrawlHistory crawlHistory(historyDatabase); map sources; // Reset the history crawlHistory.getSources(sources); for (std::map::iterator sourceIter = sources.begin(); sourceIter != sources.end(); ++sourceIter) { crawlHistory.deleteItems(sourceIter->first); crawlHistory.deleteSource(sourceIter->first); } clog << "Reset crawler history" << endl; } if ((resetLabels == true) && (pIndex != NULL)) { // Re-apply the labels list if (gotLabels == false) { // If this is an upgrade from a version < 0.80, the labels list // needs to be pulled from the configuration file pIndex->setLabels(settings.m_labels, true); clog << "Set labels as per the configuration file" << endl; } else { pIndex->setLabels(labels, true); } } // Connect to the quit signal server.getQuitSignal().connect(sigc::ptr_fun(&quitAll)); // Connect to threads' finished signal server.connect(); server.start(reindex); // Run the main loop g_refMainLoop->run(); } 
	catch (const Glib::Exception &e)
	{
		clog << e.what() << endl;
		return EXIT_FAILURE;
	}
	catch (const char *pMsg)
	{
		clog << pMsg << endl;
		return EXIT_FAILURE;
	}
	catch (...)
	{
		clog << "Unknown exception" << endl;
		return EXIT_FAILURE;
	}

	if (pIndex != NULL)
	{
		if (server.is_flag_set(DaemonState::DISCONNECTED) == true)
		{
			pIndex->setMetadata("dbus-status", "Disconnected");
		}
		else if (server.is_flag_set(DaemonState::STOPPED) == true)
		{
			pIndex->setMetadata("dbus-status", "Stopped");
		}
		else
		{
			pIndex->setMetadata("dbus-status", "");
		}

		delete pIndex;
	}

	// Stop everything
	server.disconnect();
	server.mustQuit(true);
	g_pState = NULL;

	return EXIT_SUCCESS;
}
pinot-1.22/Core/pinot-dbus-daemon.xml000066400000000000000000000325341470740426600175470ustar00rootroot00000000000000
pinot-1.22/Core/pinot-index.1000066400000000000000000000032401470740426600160100ustar00rootroot00000000000000
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.3.
.TH PINOT-INDEX "1" "October 2024" "pinot 1.22" "User Commands"
.SH NAME
pinot-index \- Index documents from the command-line
.SH SYNOPSIS
.B pinot-index
[\fI\,OPTIONS\/\fR] \fI\,--db DATABASE URLS\/\fR
.SH DESCRIPTION
pinot\-index \- Index documents from the command\-line
.SH OPTIONS
.TP
\fB\-b\fR, \fB\-\-backend\fR
name of back\-end to use (default xapian)
.TP
\fB\-c\fR, \fB\-\-check\fR
check whether the given URL is in the index
.TP
\fB\-d\fR, \fB\-\-db\fR
path to, or name of, index to use (mandatory)
.TP
\fB\-h\fR, \fB\-\-help\fR
display this help and exit
.TP
\fB\-i\fR, \fB\-\-index\fR
index the given URL
.TP
\fB\-o\fR, \fB\-\-override\fR
MIME type detection override, as TYPE:EXT
.TP
\fB\-s\fR, \fB\-\-showinfo\fR
show information about the document
.TP
\fB\-v\fR, \fB\-\-version\fR
output version information and exit
.PP
Supported back\-ends are: 'xapian'
.SH EXAMPLES
pinot\-index \-\-check \-\-showinfo \-\-backend xapian \-\-db ~/.pinot/daemon ../Bozo.txt
.PP
pinot\-index \-\-index \-\-db Docs https://github.com/FabriceColin/pinot .PP pinot\-index \-\-index \-\-db Docs \-\-override text/x\-rst:rst /usr/share/doc/python\-kitchen\-1.1.1/docs/index.rst .PP Indexing documents to My Web Pages or My Documents with pinot\-index is not recommended .SH "REPORTING BUGS" Report bugs to fabrice.colin@gmail.com .PP .br This is free software. You may redistribute copies of it under the terms of the GNU General Public License . .br There is NO WARRANTY, to the extent permitted by law. pinot-1.22/Core/pinot-index.cpp000066400000000000000000000345531470740426600164450ustar00rootroot00000000000000/* * Copyright 2005-2022 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include #include #include #include #include #include #include #include #include #include #include #include #include #include "config.h" #include "NLS.h" #include "Languages.h" #include "Memory.h" #include "MIMEScanner.h" #include "Url.h" #include "FilterFactory.h" #include "DownloaderFactory.h" #include "FilterWrapper.h" #include "ModuleFactory.h" #include "ActionQueue.h" #include "PinotSettings.h" #include "WorkerThreads.h" using namespace std; static struct option g_longOptions[] = { {"backend", 1, 0, 'b'}, {"check", 0, 0, 'c'}, {"db", 1, 0, 'd'}, {"help", 0, 0, 'h'}, {"index", 0, 0, 'i'}, {"override", 1, 0, 'o'}, {"showinfo", 0, 0, 's'}, {"version", 0, 0, 'v'}, {0, 0, 0, 0} }; static Glib::RefPtr g_refMainLoop; class IndexingState : public QueueManager { public: IndexingState(const string &indexLocation) : QueueManager(indexLocation, 60, true), m_docId(0) { // Disable implicit flushing WorkerThread::immediateFlush(false); m_onThreadEndSignal.connect(sigc::mem_fun(*this, &IndexingState::on_thread_end)); } virtual ~IndexingState() { } virtual Glib::ustring queue_index(const DocumentInfo &docInfo) { // Index directly return index_document(docInfo); } void on_thread_end(WorkerThread *pThread) { string indexedUrl; if (pThread == NULL) { return; } string type(pThread->getType()); bool isStopped = pThread->isStopped(); #ifdef DEBUG clog << "IndexingState::on_thread_end: end of thread " << type << " " << pThread->getId() << endl; #endif // What type of thread was it ? 
if ((isStopped == false) && ((type == "IndexingThread") || (type == "DirectoryScannerThread"))) { IndexingThread *pIndexThread = dynamic_cast(pThread); if (pIndexThread == NULL) { delete pThread; return; } // Get the document ID of the URL we have just indexed m_docId = pIndexThread->getDocumentID(); // Explicitely flush the index once a directory has been crawled IndexInterface *pIndex = PinotSettings::getInstance().getIndex(m_defaultIndexLocation); if (pIndex != NULL) { pIndex->flush(); delete pIndex; } } // Delete the thread delete pThread; Memory::reclaim(); // Stop there g_refMainLoop->quit(); } unsigned int getDocumentID(void) const { return m_docId; } protected: unsigned int m_docId; }; static IndexingState *g_pState = NULL; static void printHelp(void) { map engines; // Help ModuleFactory::loadModules(string(LIBDIR) + string("/pinot/backends")); ModuleFactory::getSupportedEngines(engines); clog << "pinot-index - Index documents from the command-line\n\n" << "Usage: pinot-index [OPTIONS] --db DATABASE URLS\n\n" << "Options:\n" << " -b, --backend name of back-end to use (default " << PinotSettings::getInstance().m_defaultBackend << ")\n" << " -c, --check check whether the given URL is in the index\n" << " -d, --db path to, or name of, index to use (mandatory)\n" << " -h, --help display this help and exit\n" << " -i, --index index the given URL\n" << " -o, --override MIME type detection override, as TYPE:EXT\n" << " -s, --showinfo show information about the document\n" << " -v, --version output version information and exit\n\n" << "Supported back-ends are:"; for (map::const_iterator engineIter = engines.begin(); engineIter != engines.end(); ++engineIter) { if ((engineIter->second == true) && (ModuleFactory::isSupported(engineIter->first.m_name, true) == true)) { clog << " '" << engineIter->first.m_name << "'"; } } ModuleFactory::unloadModules(); clog << "\n\nExamples:\n" << "pinot-index --check --showinfo --backend xapian --db ~/.pinot/daemon ../Bozo.txt\n\n" 
<< "pinot-index --index --db Docs https://github.com/FabriceColin/pinot\n\n" << "pinot-index --index --db Docs --override text/x-rst:rst /usr/share/doc/python-kitchen-1.1.1/docs/index.rst\n\n" << "Indexing documents to My Web Pages or My Documents with pinot-index is not recommended\n\n" << "Report bugs to " << PACKAGE_BUGREPORT << endl; } static void closeAll(void) { if (g_pState != NULL) { delete g_pState; g_pState = NULL; } // Close everything ModuleFactory::unloadModules(); DownloaderInterface::shutdown(); MIMEScanner::shutdown(); } static void quitAll(int sigNum) { if (g_refMainLoop->is_running() == true) { clog << "Quitting..." << endl; if (g_pState != NULL) { g_pState->mustQuit(true); } g_refMainLoop->quit(); } } int main(int argc, char **argv) { string type, option; string backendType, databaseName; int longOptionIndex = 0; bool checkDocument = false, indexDocument = false, showInfo = false, success = false; // Look at the options int optionChar = getopt_long(argc, argv, "b:cd:hio:sv", g_longOptions, &longOptionIndex); while (optionChar != -1) { set engines; switch (optionChar) { case 'b': if (optarg != NULL) { backendType = optarg; } break; case 'c': checkDocument = true; break; case 'd': if (optarg != NULL) { databaseName = optarg; } break; case 'h': printHelp(); return EXIT_SUCCESS; case 'i': indexDocument = true; break; case 'o': if (optarg != NULL) { string override(optarg); string::size_type pos = override.find(':'); if ((pos != string::npos) && (pos + 1 < override.length())) { MIMEScanner::addOverride(override.substr(0, pos), override.substr(pos + 1)); } } break; case 's': showInfo = true; break; case 'v': clog << "pinot-index - " << PACKAGE_STRING << "\n\n" << "This is free software. You may redistribute copies of it under the terms of\n" << "the GNU General Public License .\n" << "There is NO WARRANTY, to the extent permitted by law." 
<< endl; return EXIT_SUCCESS; default: return EXIT_FAILURE; } // Next option optionChar = getopt_long(argc, argv, "b:cd:hio:sv", g_longOptions, &longOptionIndex); } #if defined(ENABLE_NLS) bindtextdomain(GETTEXT_PACKAGE, PACKAGE_LOCALE_DIR); bind_textdomain_codeset(GETTEXT_PACKAGE, "UTF-8"); textdomain(GETTEXT_PACKAGE); #endif //ENABLE_NLS Glib::init(); Gio::init(); // Initialize threads support before doing anything else if (Glib::thread_supported() == false) { Glib::thread_init(); } // Initialize GType #if !GLIB_CHECK_VERSION(2,35,0) g_type_init(); #endif if (argc == 1) { printHelp(); return EXIT_SUCCESS; } if ((argc < 2) || (argc - optind == 0)) { clog << "Not enough parameters" << endl; return EXIT_FAILURE; } if (((indexDocument == false) && (checkDocument == false)) || (databaseName.empty() == true)) { clog << "Incorrect parameters" << endl; return EXIT_FAILURE; } g_refMainLoop = Glib::MainLoop::create(); Glib::set_application_name("Pinot Indexer"); // This will create the necessary directories on the first run PinotSettings &settings = PinotSettings::getInstance(); string confDirectory(PinotSettings::getConfigurationDirectory()); if (MIMEScanner::initialize(PinotSettings::getHomeDirectory() + "/.local", string(SHARED_MIME_INFO_PREFIX)) == false) { clog << "Couldn't load MIME settings" << endl; } DownloaderInterface::initialize(); // Load filter libraries, if any Dijon::FilterFactory::loadFilters(string(LIBDIR) + string("/pinot/filters")); Dijon::FilterFactory::loadFilters(confDirectory + "/filters"); // Load backends, if any ModuleFactory::loadModules(string(LIBDIR) + string("/pinot/backends")); ModuleFactory::loadModules(confDirectory + "/backends"); // Localize language names Languages::setIntlName(0, _("Unknown")); Languages::setIntlName(1, _("Danish")); Languages::setIntlName(2, _("Dutch")); Languages::setIntlName(3, _("English")); Languages::setIntlName(4, _("Finnish")); Languages::setIntlName(5, _("French")); Languages::setIntlName(6, _("German")); 
Languages::setIntlName(7, _("Hungarian")); Languages::setIntlName(8, _("Italian")); Languages::setIntlName(9, _("Norwegian")); Languages::setIntlName(10, _("Portuguese")); Languages::setIntlName(11, _("Romanian")); Languages::setIntlName(12, _("Russian")); Languages::setIntlName(13, _("Spanish")); Languages::setIntlName(14, _("Swedish")); Languages::setIntlName(15, _("Turkish")); // Load the settings settings.load(PinotSettings::LOAD_ALL); // Catch interrupts #ifdef HAVE_SIGACTION struct sigaction newAction; sigemptyset(&newAction.sa_mask); newAction.sa_flags = 0; newAction.sa_handler = quitAll; sigaction(SIGINT, &newAction, NULL); sigaction(SIGQUIT, &newAction, NULL); sigaction(SIGTERM, &newAction, NULL); #else signal(SIGINT, quitAll); #ifdef SIGQUIT signal(SIGQUIT, quitAll); #endif signal(SIGTERM, quitAll); #endif // Is this a known index name ? PinotSettings::IndexProperties indexProps = settings.getIndexPropertiesByName(databaseName); if (indexProps.m_name.empty() == true) { // No, it's not indexProps.m_location = databaseName; } if (backendType.empty() == true) { backendType = settings.m_defaultBackend; } // Make sure the index is open in the correct mode bool wasObsoleteFormat = false; if (ModuleFactory::openOrCreateIndex(backendType, indexProps.m_location, wasObsoleteFormat, (indexDocument ? 
false : true)) == false) { clog << "Couldn't open index " << indexProps.m_location << endl; return EXIT_FAILURE; } // Do the same for the history database string historyDatabase(settings.getHistoryDatabaseName()); if ((historyDatabase.empty() == true) || (ActionQueue::create(historyDatabase) == false)) { clog << "Couldn't create history database " << historyDatabase << endl; return EXIT_FAILURE; } else { ActionQueue actionQueue(historyDatabase, Glib::get_application_name()); time_t timeNow = time(NULL); // Expire actionQueue.expireItems(timeNow); } atexit(closeAll); // Get a read-write index of the given type IndexInterface *pIndex = ModuleFactory::getIndex(backendType, indexProps.m_location); if (pIndex == NULL) { clog << "Couldn't obtain index for " << indexProps.m_location << endl; return EXIT_FAILURE; } while (optind < argc) { string urlParam(argv[optind]); Url thisUrl(urlParam, ""); // Rewrite the URL, dropping user name and password which we don't support urlParam = thisUrl.getProtocol(); urlParam += "://"; if (thisUrl.isLocal() == false) { urlParam += thisUrl.getHost(); urlParam += "/"; } urlParam += thisUrl.getLocation(); if (thisUrl.getFile().empty() == false) { urlParam += "/"; urlParam += thisUrl.getFile(); } #ifdef DEBUG clog << "URL rewritten to " << urlParam << endl; #endif DocumentInfo docInfo("", urlParam, MIMEScanner::scanUrl(thisUrl), ""); unsigned int docId = 0; if ((checkDocument == true) && (pIndex->isGood() == true)) { set docIds; docId = pIndex->hasDocument(urlParam); if (docId > 0) { clog << urlParam << ": document ID " << docId << endl; success = true; } else if ((pIndex->listDocuments(urlParam, docIds, IndexInterface::BY_FILE, 100, 0) == true) && (docIds.empty() == false)) { docId = *(docIds.begin()); clog << urlParam << ": document ID " << docId << " and at least " << docIds.size() - 1 << " others" << endl; success = true; } else { clog << urlParam << ": not found" << endl; } } if (indexDocument == true) { if (g_pState == NULL) { g_pState 
= new IndexingState(indexProps.m_location); } // Connect to threads' finished signal g_pState->connect(); Glib::ustring status(g_pState->queue_index(docInfo)); if (status.empty() == true) { // Run the main loop g_refMainLoop->run(); } else { clog << status << endl; } // Stop everything g_pState->disconnect(); docId = g_pState->getDocumentID(); if (g_pState->mustQuit() == true) { break; } } if ((showInfo == true) && (docId > 0)) { set labels; clog << "Index version: " << pIndex->getMetadata("version") << endl; if (pIndex->getDocumentInfo(docId, docInfo) == true) { clog << "Document ID : " << docId << endl; clog << "Location : '" << docInfo.getLocation(true) << "'" << endl; clog << "Title : " << docInfo.getTitle() << endl; clog << "Type : " << docInfo.getType() << endl; clog << "Language : " << docInfo.getLanguage() << endl; clog << "Date : " << docInfo.getTimestamp() << endl; clog << "Size : " << docInfo.getSize() << endl; } if (pIndex->getDocumentLabels(docId, labels) == true) { clog << "Labels : "; for (set::const_iterator labelIter = labels.begin(); labelIter != labels.end(); ++labelIter) { if (labelIter->substr(0, 2) == "X-") { continue; } clog << "[" << Url::escapeUrl(*labelIter) << "]"; } clog << endl; } vector typeActions; MIMEScanner::getDefaultActions(docInfo.getType(), thisUrl.isLocal(), typeActions); for (vector::const_iterator actionIter = typeActions.begin(); actionIter != typeActions.end(); ++actionIter) { clog << "Action : '" << actionIter->m_name << "' " << actionIter->m_exec << endl; } } // Next ++optind; } delete pIndex; // Did whatever operation we carried out succeed ? if (success == true) { return EXIT_SUCCESS; } return EXIT_FAILURE; } pinot-1.22/Core/pinot-search.1000066400000000000000000000033601470740426600161510ustar00rootroot00000000000000.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.3. 
.TH PINOT-SEARCH "1" "October 2024" "pinot 1.22" "User Commands" .SH NAME pinot-search \- Query search engines from the command-line .SH SYNOPSIS .B pinot-search [\fI\,OPTIONS\/\fR] \fI\,SEARCHENGINETYPE SEARCHENGINENAME|SEARCHENGINEOPTION QUERYINPUT\/\fR .SH DESCRIPTION ModuleFactory::loadModules: xapian is supported by libxapianbackend.so pinot\-search \- Query search engines from the command\-line .SH OPTIONS .TP \fB\-d\fR, \fB\-\-datefirst\fR sort by date then by relevance .TP \fB\-h\fR, \fB\-\-help\fR display this help and exit .TP \fB\-l\fR, \fB\-\-locationonly\fR only show the location of each result .TP \fB\-m\fR, \fB\-\-max\fR maximum number of results (default 10) .TP \fB\-r\fR, \fB\-\-storedquery\fR query input is the name of a stored query .TP \fB\-s\fR, \fB\-\-stemming\fR stemming language (in English) .TP \fB\-c\fR, \fB\-\-tocsv\fR file to export results in CSV format to .TP \fB\-x\fR, \fB\-\-toxml\fR file to export results in XML format to .TP \fB\-v\fR, \fB\-\-version\fR output version information and exit .PP Supported search engine types are: 'opensearch' 'sherlock' 'xapian' .SH EXAMPLES pinot\-search opensearch /usr/share/pinot/engines/KrustyDescription.xml "clowns" .PP pinot\-search \-\-max 20 sherlock /usr/share/pinot/engines/Bozo.src "clowns" .PP pinot\-search xapian ~/.pinot/index "label:Clowns" .PP pinot\-search \-\-stemming english xapian somehostname:12345 "clowning" .SH "REPORTING BUGS" Report bugs to fabrice.colin@gmail.com .PP .br This is free software. You may redistribute copies of it under the terms of the GNU General Public License . .br There is NO WARRANTY, to the extent permitted by law. 
pinot-1.22/Core/pinot-search.cpp000066400000000000000000000243021470740426600165720ustar00rootroot00000000000000/* * Copyright 2005-2022 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include #include #include #include "config.h" #include "NLS.h" #include "Languages.h" #include "MIMEScanner.h" #include "Url.h" #include "DownloaderFactory.h" #include "ModuleFactory.h" #include "ResultsExporter.h" #include "WebEngine.h" #include "PinotSettings.h" using namespace std; static struct option g_longOptions[] = { {"datefirst", 0, 0, 'd'}, {"help", 0, 0, 'h'}, {"locationonly", 0, 0, 'l'}, {"max", 1, 0, 'm'}, {"storedquery", 0, 0, 'r'}, {"stemming", 1, 0, 's'}, {"tocsv", 1, 0, 'c'}, {"toxml", 1, 0, 'x'}, {"version", 0, 0, 'v'}, {0, 0, 0, 0} }; static void printHelp(void) { map engines; // Help ModuleFactory::loadModules(string(LIBDIR) + string("/pinot/backends")); ModuleFactory::getSupportedEngines(engines); ModuleFactory::unloadModules(); clog << "pinot-search - Query search engines from the command-line\n\n" << "Usage: pinot-search [OPTIONS] SEARCHENGINETYPE SEARCHENGINENAME|SEARCHENGINEOPTION QUERYINPUT\n\n" << "Options:\n" << " -d, --datefirst sort by date then by relevance\n" << " -h, --help display this help and exit\n" << " -l, --locationonly only show the 
location of each result\n" << " -m, --max maximum number of results (default 10)\n" << " -r, --storedquery query input is the name of a stored query\n" << " -s, --stemming stemming language (in English)\n" << " -c, --tocsv file to export results in CSV format to\n" << " -x, --toxml file to export results in XML format to\n" << " -v, --version output version information and exit\n" << "Supported search engine types are:"; for (map::const_iterator engineIter = engines.begin(); engineIter != engines.end(); ++engineIter) { clog << " '" << engineIter->first.m_name << "'"; } clog << "\n\nExamples:\n" << "pinot-search opensearch " << PREFIX << "/share/pinot/engines/KrustyDescription.xml \"clowns\"\n\n" << "pinot-search --max 20 sherlock " << PREFIX << "/share/pinot/engines/Bozo.src \"clowns\"\n\n" << "pinot-search xapian ~/.pinot/index \"label:Clowns\"\n\n" << "pinot-search --stemming english xapian somehostname:12345 \"clowning\"\n\n" << "Report bugs to " << PACKAGE_BUGREPORT << endl; } int main(int argc, char **argv) { string engineType, option, csvExport, xmlExport, stemLanguage; unsigned int maxResultsCount = 10; int longOptionIndex = 0; bool printResults = true; bool sortByDate = false; bool locationOnly = false; bool isStoredQuery = false; // Look at the options int optionChar = getopt_long(argc, argv, "c:dhlm:rs:vx:", g_longOptions, &longOptionIndex); while (optionChar != -1) { switch (optionChar) { case 'c': if (optarg != NULL) { csvExport = optarg; printResults = false; } break; case 'd': sortByDate = true; break; case 'h': printHelp(); return EXIT_SUCCESS; case 'l': locationOnly = true; break; case 'm': if (optarg != NULL) { maxResultsCount = (unsigned int )atoi(optarg); } break; case 'r': isStoredQuery = true; break; case 's': if (optarg != NULL) { stemLanguage = optarg; } break; case 'v': clog << "pinot-search - " << PACKAGE_STRING << "\n\n" << "This is free software. 
You may redistribute copies of it under the terms of\n" << "the GNU General Public License .\n" << "There is NO WARRANTY, to the extent permitted by law." << endl; return EXIT_SUCCESS; case 'x': if (optarg != NULL) { xmlExport = optarg; printResults = false; } break; default: return EXIT_FAILURE; } // Next option optionChar = getopt_long(argc, argv, "c:dhlm:rs:vx:", g_longOptions, &longOptionIndex); } #if defined(ENABLE_NLS) bindtextdomain(GETTEXT_PACKAGE, PACKAGE_LOCALE_DIR); bind_textdomain_codeset(GETTEXT_PACKAGE, "UTF-8"); textdomain(GETTEXT_PACKAGE); #endif //ENABLE_NLS Glib::init(); Gio::init(); if (argc == 1) { printHelp(); return EXIT_SUCCESS; } if ((argc < 4) || (argc - optind != 3)) { clog << "Wrong number of parameters" << endl; return EXIT_FAILURE; } // This will create the necessary directories on the first run PinotSettings &settings = PinotSettings::getInstance(); string confDirectory(PinotSettings::getConfigurationDirectory()); if (MIMEScanner::initialize(PinotSettings::getHomeDirectory() + "/.local", string(SHARED_MIME_INFO_PREFIX)) == false) { clog << "Couldn't load MIME settings" << endl; } DownloaderInterface::initialize(); ModuleFactory::loadModules(string(LIBDIR) + string("/pinot/backends")); ModuleFactory::loadModules(confDirectory + "/backends"); // Localize language names Languages::setIntlName(0, _("Unknown")); Languages::setIntlName(1, _("Danish")); Languages::setIntlName(2, _("Dutch")); Languages::setIntlName(3, _("English")); Languages::setIntlName(4, _("Finnish")); Languages::setIntlName(5, _("French")); Languages::setIntlName(6, _("German")); Languages::setIntlName(7, _("Hungarian")); Languages::setIntlName(8, _("Italian")); Languages::setIntlName(9, _("Norwegian")); Languages::setIntlName(10, _("Portuguese")); Languages::setIntlName(11, _("Romanian")); Languages::setIntlName(12, _("Russian")); Languages::setIntlName(13, _("Spanish")); Languages::setIntlName(14, _("Swedish")); Languages::setIntlName(15, _("Turkish")); // Load the 
settings settings.load(PinotSettings::LOAD_ALL); engineType = argv[optind]; option = argv[optind + 1]; char *pQueryInput = argv[optind + 2]; // Set the query QueryProperties queryProps("pinot-search", ""); if (isStoredQuery == true) { const map &queries = settings.getQueries(); map::const_iterator queryIter = queries.find(pQueryInput); if (queryIter != queries.end()) { queryProps = queryIter->second; } else { clog << "Couldn't find stored query " << pQueryInput << endl; DownloaderInterface::shutdown(); MIMEScanner::shutdown(); return EXIT_FAILURE; } } else { queryProps.setFreeQuery(pQueryInput); } queryProps.setStemmingLanguage(stemLanguage); queryProps.setMaximumResultsCount(maxResultsCount); if (sortByDate == true) { queryProps.setSortOrder(QueryProperties::DATE_DESC); } // Which SearchEngine ? SearchEngineInterface *pEngine = ModuleFactory::getSearchEngine(engineType, option); if (pEngine == NULL) { clog << "Couldn't obtain search engine instance" << endl; DownloaderInterface::shutdown(); MIMEScanner::shutdown(); return EXIT_FAILURE; } // Set up the proxy WebEngine *pWebEngine = dynamic_cast(pEngine); if (pWebEngine != NULL) { DownloaderInterface *pDownloader = pWebEngine->getDownloader(); if ((pDownloader != NULL) && (settings.m_proxyEnabled == true) && (settings.m_proxyAddress.empty() == false)) { char portStr[64]; pDownloader->setSetting("proxyaddress", settings.m_proxyAddress); snprintf(portStr, 64, "%u", settings.m_proxyPort); pDownloader->setSetting("proxyport", portStr); pDownloader->setSetting("proxytype", settings.m_proxyType); } pWebEngine->setEditableValues(settings.m_editablePluginValues); } pEngine->setDefaultOperator(SearchEngineInterface::DEFAULT_OP_AND); if (pEngine->runQuery(queryProps) == true) { string resultsPage; unsigned int estimatedResultsCount = pEngine->getResultsCountEstimate(); const vector &resultsList = pEngine->getResults(); if (resultsList.empty() == false) { if (printResults == true) { unsigned int count = 0; if (locationOnly == 
false) { clog << "Showing " << resultsList.size() << " results of about " << estimatedResultsCount << endl; } vector::const_iterator resultIter = resultsList.begin(); while (resultIter != resultsList.end()) { string rawUrl(resultIter->getLocation(true)); if (locationOnly == false) { clog << count << " Location : '" << rawUrl << "'"<< endl; clog << count << " Title : " << resultIter->getTitle() << endl; clog << count << " Type : " << resultIter->getType() << endl; clog << count << " Language : " << resultIter->getLanguage() << endl; clog << count << " Date : " << resultIter->getTimestamp() << endl; clog << count << " Size : " << resultIter->getSize() << endl; clog << count << " Extract : " << resultIter->getExtract() << endl; clog << count << " Score : " << resultIter->getScore() << endl; } else { clog << rawUrl << endl; } ++count; // Next ++resultIter; } } else { string engineName(ModuleFactory::getSearchEngineName(engineType, option)); if (csvExport.empty() == false) { CSVExporter exporter(csvExport, queryProps); exporter.exportResults(engineName, maxResultsCount, resultsList); } if (xmlExport.empty() == false) { OpenSearchExporter exporter(xmlExport, queryProps); exporter.exportResults(engineName, maxResultsCount, resultsList); } } } else { clog << "No results" << endl; } } else { clog << "Couldn't run query on search engine " << engineType << endl; } delete pEngine; ModuleFactory::unloadModules(); DownloaderInterface::shutdown(); MIMEScanner::shutdown(); return EXIT_SUCCESS; } pinot-1.22/FAQ000066400000000000000000000107131470740426600131350ustar00rootroot00000000000000 * The index is huuuuge ! How can I make it smaller ? The format of Xapian indexes is optimized for writing. Once all documents have been indexed, compacting the index may reduce its size quite significantly. If you have Xapian >= 1.0.6, xapian-compact will let you do that. 
Stop the daemon and run the following commands :
$ mv ~/.pinot/daemon ~/.pinot/backup-daemon
$ xapian-compact ~/.pinot/backup-daemon ~/.pinot/daemon
This may take a little while. Once xapian-compact has completed, restart the
daemon with :
$ pinot-dbus-daemon -i
The -i parameter instructs the daemon to ignore the index version number,
which may not have been set in the compacted index.
You may also want to disable support for spelling and enable stopwords
removal. See the README for more details.

* Memory usage is too high ! How can I reduce it ?

Since 0.92, Pinot hints to the OS that it can reclaim unused memory on a
regular basis. This works on Linux but maybe not on other OSes.
If your system is memory constrained, you can :
- increase the frequency at which memory is returned to the OS by setting
  MALLOC_TRIM_THRESHOLD_ to a value lower than 128 KB (the default) :
  $ export MALLOC_TRIM_THRESHOLD_=65536
- reduce the number of documents Xapian buffers in memory before changes are
  flushed to the disk (default is 10000) :
  $ export XAPIAN_FLUSH_THRESHOLD=1000
Note that the daemon explicitly flushes the index once it has crawled one of
the directories configured in Preferences.
If the above doesn't help and there is a large number of big documents to
index, you may want to configure Pinot to index your corpus in stages.

* Pinot doesn't use my favourite browser XYZ to open HTML documents

Even if you have set XYZ as your favourite Web browser in your desktop
environment, it may not be set up as the default application for HTML files.
In Gnome for instance, one has to select XYZ as default for all HTML files
using Nautilus (Properties dialog, Open With tab).
This applies to applications for any other file type.

* If you experience segfaults at startup and you are on Fedora, chances are
it's because of libxml++/libsigc++. See this Bugzilla entry :
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=178592
The latest version seems to fix this issue.

* If the daemon crashes with a backtrace pointing at boost::pool_allocator<>,
rebuild with the configure option "--enable-mempool=no".

* If the daemon crashes seemingly randomly while indexing, it may be because
SQLite wasn't built thread-safe. I have witnessed this mostly on dual CPU
machines, but others are not immune. Try rebuilding SQLite by passing
"--enable-threadsafe --enable-threads-override-locks" to configure.

* If you have built Pinot from source, make sure you have done a
"make install". Pinot will fail to start if it can't find stuff it needs,
its icon for instance.

* If "make install" fails with an error about Categories in pinot.desktop and
you have desktop-file-utils 0.11, either downgrade to 0.10 or upgrade to 0.12
if possible, or edit the top-level Makefile and replace the line
@desktop-file-install --vendor="" --dir=$(DESTDIR)$(datadir)/applications pinot.desktop
with
$(INSTALL_DATA) pinot.desktop $(DESTDIR)$(datadir)/applications/pinot.desktop
and run "make install" again.

* On FreeBSD, threading issues may cause the daemon to crash unexpectedly.
A fix is to add the following lines to /etc/libmap.conf (which may not
exist) :
[/usr/local/bin/pinot-dbus-daemon]
libpthread.so.2 libc_r.so.6
libpthread.so libc_r.so

[/usr/local/bin/pinot]
libpthread.so.2 libc_r.so.6
libpthread.so libc_r.so

* If you are using KDE 3.* and pinot-dbus-daemon does not autostart, symlink
the file /etc/xdg/autostart/pinot-dbus-daemon.desktop to either
$(kde-config --prefix)/share/autostart (for all users) or ~/.kde/Autostart
(current user only).

* If you suspect search failed to find a particular document, you may take a
closer look with
pinot-index --check --showinfo --backend xapian --db ~/.pinot/daemon/ /path/to/file
This will output metadata about the document for the given file, including a
document ID.
Xapian's delve utility will let you take a peek at the list of terms this
document holds with
xapian-delve -r DOCUMENT_ID ~/.pinot/daemon/

pinot-1.22/INSTALL

Installation Instructions
*************************

Copyright (C) 1994, 1995, 1996, 1999, 2000, 2001, 2002, 2004, 2005 Free
Software Foundation, Inc.

This file is free documentation; the Free Software Foundation gives
unlimited permission to copy, distribute and modify it.

Basic Installation
==================

These are generic installation instructions.

   The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation. It uses
those values to create a `Makefile' in each directory of the package.
It may also create one or more `.h' files containing system-dependent
definitions. Finally, it creates a shell script `config.status' that
you can run in the future to recreate the current configuration, and a
file `config.log' containing compiler output (useful mainly for
debugging `configure').

   It can also use an optional file (typically called `config.cache'
and enabled with `--cache-file=config.cache' or simply `-C') that saves
the results of its tests to speed up reconfiguring. (Caching is
disabled by default to prevent problems with accidental use of stale
cache files.)

   If you need to do unusual things to compile the package, please try
to figure out how `configure' could check whether to do them, and mail
diffs or instructions to the address given in the `README' so they can
be considered for the next release. If you are using the cache, and at
some point `config.cache' contains results you don't want to keep, you
may remove or edit it.

   The file `configure.ac' (or `configure.in') is used to create
`configure' by a program called `autoconf'. You only need
`configure.ac' if you want to change it or regenerate `configure' using
a newer version of `autoconf'.

The simplest way to compile this package is:

  1. `cd' to the directory containing the package's source code and type
     `./configure' to configure the package for your system. If you're
     using `csh' on an old version of System V, you might need to type
     `sh ./configure' instead to prevent `csh' from trying to execute
     `configure' itself.

     Running `configure' takes a while. While running, it prints some
     messages telling which features it is checking for.

  2. Type `make' to compile the package.

  3. Optionally, type `make check' to run any self-tests that come with
     the package.

  4. Type `make install' to install the programs and any data files and
     documentation.

  5. You can remove the program binaries and object files from the
     source code directory by typing `make clean'. To also remove the
     files that `configure' created (so you can compile the package for
     a different kind of computer), type `make distclean'. There is
     also a `make maintainer-clean' target, but that is intended mainly
     for the package's developers. If you use it, you may have to get
     all sorts of other programs in order to regenerate files that came
     with the distribution.

Compilers and Options
=====================

Some systems require unusual options for compilation or linking that the
`configure' script does not know about. Run `./configure --help' for
details on some of the pertinent environment variables.

   You can give `configure' initial values for configuration parameters
by setting variables in the command line or in the environment. Here
is an example:

     ./configure CC=c89 CFLAGS=-O2 LIBS=-lposix

   *Note Defining Variables::, for more details.

Compiling For Multiple Architectures
====================================

You can compile the package for more than one kind of computer at the
same time, by placing the object files for each architecture in their
own directory. To do this, you must use a version of `make' that
supports the `VPATH' variable, such as GNU `make'.
`cd' to the directory where you want the object files and executables
to go and run the `configure' script. `configure' automatically checks
for the source code in the directory that `configure' is in and in `..'.

If you have to use a `make' that does not support the `VPATH'
variable, you have to compile the package for one architecture at a
time in the source code directory. After you have installed the
package for one architecture, use `make distclean' before reconfiguring
for another architecture.

Installation Names
==================

By default, `make install' installs the package's commands under
`/usr/local/bin', include files under `/usr/local/include', etc. You
can specify an installation prefix other than `/usr/local' by giving
`configure' the option `--prefix=PREFIX'.

You can specify separate installation prefixes for
architecture-specific files and architecture-independent files. If you
pass the option `--exec-prefix=PREFIX' to `configure', the package uses
PREFIX as the prefix for installing programs and libraries.
Documentation and other data files still use the regular prefix.

In addition, if you use an unusual directory layout you can give
options like `--bindir=DIR' to specify different values for particular
kinds of files. Run `configure --help' for a list of the directories
you can set and what kinds of files go in them.

If the package supports it, you can cause programs to be installed
with an extra prefix or suffix on their names by giving `configure' the
option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.

Optional Features
=================

Some packages pay attention to `--enable-FEATURE' options to
`configure', where FEATURE indicates an optional part of the package.
They may also pay attention to `--with-PACKAGE' options, where PACKAGE
is something like `gnu-as' or `x' (for the X Window System). The
`README' should mention any `--enable-' and `--with-' options that the
package recognizes.

For packages that use the X Window System, `configure' can usually
find the X include and library files automatically, but if it doesn't,
you can use the `configure' options `--x-includes=DIR' and
`--x-libraries=DIR' to specify their locations.

Specifying the System Type
==========================

There may be some features `configure' cannot figure out automatically,
but needs to determine by the type of machine the package will run on.
Usually, assuming the package is built to be run on the _same_
architectures, `configure' can figure that out, but if it prints a
message saying it cannot guess the machine type, give it the
`--build=TYPE' option. TYPE can either be a short name for the system
type, such as `sun4', or a canonical name which has the form:

     CPU-COMPANY-SYSTEM

where SYSTEM can have one of these forms:

     OS
     KERNEL-OS

See the file `config.sub' for the possible values of each field. If
`config.sub' isn't included in this package, then this package doesn't
need to know the machine type.

If you are _building_ compiler tools for cross-compiling, you should
use the option `--target=TYPE' to select the type of system they will
produce code for.

If you want to _use_ a cross compiler, that generates code for a
platform different from the build platform, you should specify the
"host" platform (i.e., that on which the generated programs will
eventually be run) with `--host=TYPE'.

Sharing Defaults
================

If you want to set default values for `configure' scripts to share, you
can create a site shell script called `config.site' that gives default
values for variables like `CC', `cache_file', and `prefix'.
`configure' looks for `PREFIX/share/config.site' if it exists, then
`PREFIX/etc/config.site' if it exists. Or, you can set the
`CONFIG_SITE' environment variable to the location of the site script.
A warning: not all `configure' scripts look for a site script.
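
As an illustration only, a site script might look like the sketch
below. The compiler name and the `/opt/local' prefix are assumptions
chosen for the example, and the comparison against `NONE' assumes that
`configure' leaves `prefix' at that placeholder value when no
`--prefix' option was given.

```shell
# config.site - hypothetical site defaults; every value here is illustrative.

# Use gcc unless the caller already chose a compiler.
: "${CC:=gcc}"

# Assumption: prefix is unset or still the placeholder NONE when no
# --prefix option was given, so an explicit --prefix is left untouched.
if test "${prefix:-NONE}" = NONE; then
    prefix=/opt/local
fi
```

Such a file would be picked up from one of the locations listed above,
or by setting `CONFIG_SITE' to its path before running `configure'.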

Defining Variables
==================

Variables not defined in a site shell script can be set in the
environment passed to `configure'. However, some packages may run
configure again during the build, and the customized values of these
variables may be lost. In order to avoid this problem, you should set
them in the `configure' command line, using `VAR=value'. For example:

     ./configure CC=/usr/local2/bin/gcc

causes the specified `gcc' to be used as the C compiler (unless it is
overridden in the site shell script). Here is another example:

     /bin/bash ./configure CONFIG_SHELL=/bin/bash

Here the `CONFIG_SHELL=/bin/bash' operand causes subsequent
configuration-related scripts to be executed by `/bin/bash'.

`configure' Invocation
======================

`configure' recognizes the following options to control how it operates.

`--help'
`-h'
     Print a summary of the options to `configure', and exit.

`--version'
`-V'
     Print the version of Autoconf used to generate the `configure'
     script, and exit.

`--cache-file=FILE'
     Enable the cache: use and save the results of the tests in FILE,
     traditionally `config.cache'. FILE defaults to `/dev/null' to
     disable caching.

`--config-cache'
`-C'
     Alias for `--cache-file=config.cache'.

`--quiet'
`--silent'
`-q'
     Do not print messages saying which checks are being made. To
     suppress all normal output, redirect it to `/dev/null' (any error
     messages will still be shown).

`--srcdir=DIR'
     Look for the package's source code in directory DIR. Usually
     `configure' can determine that directory automatically.

`configure' also accepts some other, not widely useful, options. Run
`configure --help' for more details.
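
The options above can be combined with `VAR=value' assignments on the
same command line. The following is a sketch, not a prescribed
procedure: it assumes it is run from the top of an unpacked source
tree, and the fallback compiler `cc' and the `configure_status'
variable are illustrative.

```shell
#!/bin/sh
# Sketch: configure quietly with the test cache enabled, passing the
# compiler as VAR=value on the command line rather than relying on the
# environment alone.
if [ -x ./configure ]; then
    ./configure --config-cache --quiet CC="${CC:-cc}"
    configure_status=$?
else
    # Nothing to configure in this directory; report it and carry on.
    echo "no ./configure script in $(pwd)" >&2
    configure_status=127
fi
```

The exit status is kept in a variable so that a calling script can
decide whether to go on to `make'.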
pinot-1.22/IndexSearch/000077500000000000000000000000001470740426600147765ustar00rootroot00000000000000pinot-1.22/IndexSearch/DBusIndex.cpp000066400000000000000000000350451470740426600173360ustar00rootroot00000000000000/* * Copyright 2007-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include #include "Languages.h" #include "DBusIndex.h" using std::clog; using std::clog; using std::endl; using std::string; using std::stringstream; using std::vector; using std::set; using std::map; using std::min; using std::pair; using std::tuple; using namespace Gio; using namespace Glib; static const char *g_fieldNames[] = { "caption", "url", "type", "language", "modtime", "size", "extract", "score", NULL }; DBusIndex::DBusIndex(IndexInterface *pROIndex) : IndexInterface(), m_refProxy(com::github::fabricecolin::PinotProxy::createForBus_sync( DBus::BUS_TYPE_SESSION, Gio::DBus::PROXY_FLAGS_NONE, PINOT_DBUS_SERVICE_NAME, PINOT_DBUS_OBJECT_PATH)), m_pROIndex(pROIndex) { } DBusIndex::DBusIndex(const DBusIndex &other) : IndexInterface(other), m_pROIndex(other.m_pROIndex) { } DBusIndex::~DBusIndex() { if (m_pROIndex != NULL) { // Noone else is going to delete this delete m_pROIndex; } } DBusIndex &DBusIndex::operator=(const DBusIndex &other) { if (this != &other) { 
IndexInterface::operator=(other); m_pROIndex = other.m_pROIndex; } return *this; } /// Extracts docInfo from tuples. void DBusIndex::documentInfoFromTuples(const vector> &tuples, DocumentInfo &docInfo) { for (vector>::const_iterator tupleIter = tuples.begin(); tupleIter != tuples.end(); ++tupleIter) { ustring fieldName, fieldValue; std::tie(fieldName, fieldValue) = *tupleIter; // Populate docInfo if (fieldName == g_fieldNames[0]) { docInfo.setTitle(fieldValue.c_str()); } else if (fieldName == g_fieldNames[1]) { docInfo.setLocation(fieldValue.c_str()); } else if (fieldName == g_fieldNames[2]) { docInfo.setType(fieldValue.c_str()); } else if (fieldName == g_fieldNames[3]) { docInfo.setLanguage(Languages::toLocale(fieldValue.c_str())); } else if (fieldName == g_fieldNames[4]) { docInfo.setTimestamp(fieldValue.c_str()); } else if (fieldName == g_fieldNames[5]) { docInfo.setSize((off_t )atoi(fieldValue.c_str())); } else if (fieldName == g_fieldNames[6]) { docInfo.setExtract(fieldValue.c_str()); } else if (fieldName == g_fieldNames[7]) { docInfo.setScore((float)atof(fieldValue.c_str())); } } } /// Converts docInfo to tuples. void DBusIndex::documentInfoToTuples(const DocumentInfo &docInfo, vector> &tuples) { for (unsigned int fieldNum = 0; g_fieldNames[fieldNum] != NULL; ++fieldNum) { string value; stringstream numStr; switch (fieldNum) { case 0: value = docInfo.getTitle(); break; case 1: value = docInfo.getLocation(true); break; case 2: value = docInfo.getType(); break; case 3: value = Languages::toEnglish(docInfo.getLanguage()); break; case 4: value = docInfo.getTimestamp(); break; case 5: numStr << docInfo.getSize(); value = numStr.str(); break; case 6: value = docInfo.getExtract(); break; case 7: numStr << docInfo.getScore(); value = numStr.str(); break; default: break; } if (value.empty() == true) { continue; } tuples.push_back(make_tuple(g_fieldNames[fieldNum], value)); } } /// Asks the D-Bus service to reload its configuration. 
bool DBusIndex::reload(void)
{
	try
	{
		return m_refProxy->Reload_sync();
	}
	catch (const Glib::Error &ex)
	{
		clog << "DBusIndex::reload: " << ex.what() << endl;
	}

	return false;
}

/// Gets some statistics from the D-Bus service.
bool DBusIndex::getStatistics(unsigned int &crawledCount, unsigned int &docsCount,
	bool &lowDiskSpace, bool &onBattery, bool &crawling)
{
	try
	{
		std::tie(crawledCount, docsCount, lowDiskSpace, onBattery, crawling) = m_refProxy->GetStatistics_sync();
	}
	catch (const Glib::Error &ex)
	{
		clog << "DBusIndex::getStatistics: " << ex.what() << endl;

		// The output parameters were not set, so don't report success
		return false;
	}

	return true;
}

//
// Implementation of IndexInterface
//

/// Returns false if the index couldn't be opened.
bool DBusIndex::isGood(void) const
{
	if (m_pROIndex == NULL)
	{
		return false;
	}

	return m_pROIndex->isGood();
}

/// Gets metadata.
string DBusIndex::getMetadata(const string &name) const
{
	if (m_pROIndex == NULL)
	{
		return "";
	}

	return m_pROIndex->getMetadata(name);
}

/// Sets metadata.
bool DBusIndex::setMetadata(const string &name, const string &value) const
{
	clog << "DBusIndex::setMetadata: not allowed" << endl;
	return false;
}

/// Gets the index location.
string DBusIndex::getLocation(void) const
{
	if (m_pROIndex == NULL)
	{
		return "";
	}

	return m_pROIndex->getLocation();
}

/// Returns a document's properties.
bool DBusIndex::getDocumentInfo(unsigned int docId, DocumentInfo &docInfo) const
{
	vector<tuple<ustring, ustring>> tuples;

	try
	{
		tuples = m_refProxy->GetDocumentInfo_sync(docId);
	}
	catch (const Glib::Error &ex)
	{
		clog << "DBusIndex::getDocumentInfo: " << ex.what() << endl;
	}

	if (tuples.empty() == true)
	{
		return false;
	}

	documentInfoFromTuples(tuples, docInfo);

	return true;
}

/// Returns a document's terms count.
unsigned int DBusIndex::getDocumentTermsCount(unsigned int docId) const { unsigned int termsCount = 0; try { termsCount = m_refProxy->GetDocumentTermsCount_sync(docId); } catch (const Glib::Error &ex) { clog << "DBusIndex::getDocumentTermsCount: " << ex.what() << endl; } return termsCount; } /// Returns a document's terms. bool DBusIndex::getDocumentTerms(unsigned int docId, map &wordsBuffer) const { vector termsList; try { termsList = m_refProxy->GetDocumentTerms_sync(docId); } catch (const Glib::Error &ex) { clog << "DBusIndex::getDocumentTerms: " << ex.what() << endl; } if (termsList.empty() == true) { return false; } unsigned int termPos = 0; for (vector::const_iterator termIter = termsList.begin(); termIter != termsList.end(); ++termIter) { wordsBuffer.insert(pair(termPos, termIter->c_str())); ++termPos; } return true; } /// Sets the list of known labels. bool DBusIndex::setLabels(const set &labels, bool resetLabels) { // Not allowed here return false; } /// Gets the list of known labels. bool DBusIndex::getLabels(set &labels) const { vector labelsList; try { labelsList = m_refProxy->GetLabels_sync(); } catch (const Glib::Error &ex) { clog << "DBusIndex::getLabels: " << ex.what() << endl; } for (vector::const_iterator labelIter = labelsList.begin(); labelIter != labelsList.end(); ++labelIter) { labels.insert(labelIter->c_str()); } return true; } /// Adds a label. bool DBusIndex::addLabel(const string &name) { ustring labelName(name.c_str()); ustring newLabelName; try { newLabelName = m_refProxy->AddLabel_sync(labelName); } catch (const Glib::Error &ex) { clog << "DBusIndex::addLabel: " << ex.what() << endl; } return (newLabelName == labelName); } /// Deletes all references to a label. 
bool DBusIndex::deleteLabel(const string &name)
{
	ustring labelName(name.c_str());
	ustring deletedLabelName;

	try
	{
		deletedLabelName = m_refProxy->DeleteLabel_sync(labelName);
	}
	catch (const Glib::Error &ex)
	{
		clog << "DBusIndex::deleteLabel: " << ex.what() << endl;
	}

	return (deletedLabelName == labelName);
}

/// Determines whether a document has a label.
bool DBusIndex::hasLabel(unsigned int docId, const string &name) const
{
	unsigned int foundDocId = 0;

	try
	{
		foundDocId = m_refProxy->HasLabel_sync(docId, name.c_str());
	}
	catch (const Glib::Error &ex)
	{
		clog << "DBusIndex::hasLabel: " << ex.what() << endl;
	}

	return (foundDocId == docId);
}

/// Returns a document's labels.
bool DBusIndex::getDocumentLabels(unsigned int docId, set<string> &labels) const
{
	vector<ustring> labelsList;

	try
	{
		labelsList = m_refProxy->GetDocumentLabels_sync(docId);
	}
	catch (const Glib::Error &ex)
	{
		clog << "DBusIndex::getDocumentLabels: " << ex.what() << endl;
	}

	for (vector<ustring>::const_iterator labelIter = labelsList.begin();
		labelIter != labelsList.end(); ++labelIter)
	{
		labels.insert(labelIter->c_str());
	}

	return true;
}

/// Sets a document's labels.
bool DBusIndex::setDocumentLabels(unsigned int docId, const set<string> &labels,
	bool resetLabels)
{
	vector<ustring> labelsList;

	labelsList.reserve(labels.size());
	for (set<string>::const_iterator labelIter = labels.begin();
		labelIter != labels.end(); ++labelIter)
	{
		labelsList.push_back(labelIter->c_str());
	}

	try
	{
		m_refProxy->SetDocumentLabels_sync(docId, labelsList, resetLabels);
	}
	catch (const Glib::Error &ex)
	{
		clog << "DBusIndex::setDocumentLabels: " << ex.what() << endl;
	}

	return true;
}

/// Sets documents' labels.
bool DBusIndex::setDocumentsLabels(const set &docIds, const set &labels, bool resetLabels) { vector idsList; vector labelsList; idsList.reserve(docIds.size()); for (set::const_iterator idIter = docIds.begin(); idIter != docIds.end(); ++idIter) { stringstream numStr; numStr << *idIter; idsList.push_back(numStr.str().c_str()); } labelsList.reserve(labels.size()); for (set::const_iterator labelIter = labels.begin(); labelIter != labels.end(); ++labelIter) { labelsList.push_back(labelIter->c_str()); } try { return m_refProxy->SetDocumentsLabels_sync(idsList, labelsList, resetLabels); } catch (const Glib::Error &ex) { clog << "DBusIndex::setDocumentsLabels: " << ex.what() << endl; } return false; } /// Checks whether the given URL is in the index. unsigned int DBusIndex::hasDocument(const string &url) const { try { return m_refProxy->HasDocument_sync(url.c_str()); } catch (const Glib::Error &ex) { clog << "DBusIndex::hasDocument: " << ex.what() << endl; } return 0; } /// Gets terms with the same root. unsigned int DBusIndex::getCloseTerms(const string &term, set &suggestions) { vector termsList; try { termsList = m_refProxy->GetCloseTerms_sync(term.c_str()); } catch (const Glib::Error &ex) { clog << "DBusIndex::getCloseTerms: " << ex.what() << endl; } if (termsList.empty() == true) { return 0; } for (vector::const_iterator termIter = termsList.begin(); termIter != termsList.end(); ++termIter) { suggestions.insert(termIter->c_str()); } return termsList.size(); } /// Returns the ID of the last document. unsigned int DBusIndex::getLastDocumentID(void) const { return 0; } /// Returns the number of documents. unsigned int DBusIndex::getDocumentsCount(const string &labelName) const { unsigned int docsCount = 0; try { docsCount = m_refProxy->GetDocumentsCount_sync(labelName); } catch (const Glib::Error &ex) { clog << "DBusIndex::getDocumentsCount: " << ex.what() << endl; } return docsCount; } /// Lists documents. 
unsigned int DBusIndex::listDocuments(set &docIds, unsigned int maxDocsCount, unsigned int startDoc) const { vector docIdsList; try { docIdsList = m_refProxy->ListDocuments_sync("", 0, maxDocsCount, startDoc); } catch (const Glib::Error &ex) { clog << "DBusIndex::listDocuments: " << ex.what() << endl; } for (vector::const_iterator docIter = docIdsList.begin(); docIter != docIdsList.end(); ++docIter) { docIds.insert((unsigned int)atoi(docIter->c_str())); } return docIdsList.size(); } /// Lists documents. bool DBusIndex::listDocuments(const string &name, set &docIds, NameType type, unsigned int maxDocsCount, unsigned int startDoc) const { vector docIdsList; try { docIdsList = m_refProxy->ListDocuments_sync(name.c_str(), (unsigned int)type, maxDocsCount, startDoc); } catch (const Glib::Error &ex) { clog << "DBusIndex::listDocuments: " << ex.what() << endl; } for (vector::const_iterator docIter = docIdsList.begin(); docIter != docIdsList.end(); ++docIter) { docIds.insert((unsigned int)atoi(docIter->c_str())); } return !docIdsList.empty(); } /// Indexes the given data. bool DBusIndex::indexDocument(const Document &doc, const set &labels, unsigned int &docId) { clog << "DBusIndex::indexDocument: not allowed" << endl; return false; } /// Updates the given document; true if success. bool DBusIndex::updateDocument(unsigned int docId, const Document &doc) { unsigned updatedDocId = 0; try { updatedDocId = m_refProxy->UpdateDocument_sync(docId); } catch (const Glib::Error &ex) { clog << "DBusIndex::updateDocument: " << ex.what() << endl; } return (updatedDocId == docId); } /// Updates a document's properties. 
bool DBusIndex::updateDocumentInfo(unsigned int docId, const DocumentInfo &docInfo)
{
	vector<tuple<ustring, ustring>> tuples;
	unsigned updatedDocId = 0;

	documentInfoToTuples(docInfo, tuples);

	try
	{
		updatedDocId = m_refProxy->SetDocumentInfo_sync(docId, tuples);
	}
	catch (const Glib::Error &ex)
	{
		clog << "DBusIndex::updateDocumentInfo: " << ex.what() << endl;
	}

	return (updatedDocId == docId);
}

/// Unindexes the given document; true if success.
bool DBusIndex::unindexDocument(unsigned int docId)
{
	clog << "DBusIndex::unindexDocument: not allowed" << endl;
	return false;
}

/// Unindexes the given document.
bool DBusIndex::unindexDocument(const string &location)
{
	clog << "DBusIndex::unindexDocument: not allowed" << endl;
	return false;
}

/// Unindexes documents.
bool DBusIndex::unindexDocuments(const string &name, NameType type)
{
	clog << "DBusIndex::unindexDocuments: not allowed" << endl;
	return false;
}

/// Unindexes all documents.
bool DBusIndex::unindexAllDocuments(void)
{
	clog << "DBusIndex::unindexAllDocuments: not allowed" << endl;
	return false;
}

/// Flushes recent changes to the disk.
bool DBusIndex::flush(void)
{
	// The daemon knows best when to flush
	return true;
}

/// Reopens the index.
bool DBusIndex::reopen(void) const
{
	return true;
}

/// Resets the index.
bool DBusIndex::reset(void)
{
	// This can't be done here
	return false;
}

pinot-1.22/IndexSearch/DBusIndex.h000066400000000000000000000126421470740426600170010ustar00rootroot00000000000000
/*
 *  Copyright 2007-2021 Fabrice Colin
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _DBUS_INDEX_H #define _DBUS_INDEX_H #include #include #include #include #include #include "IndexInterface.h" #include "PinotDBus_proxy.h" #define PINOT_DBUS_SERVICE_NAME "com.github.fabricecolin.Pinot" #define PINOT_DBUS_OBJECT_PATH "/com/github/fabricecolin/Pinot" /// Allows to write to the daemon index via D-Bus. class DBusIndex : public IndexInterface { public: DBusIndex(IndexInterface *pROIndex); DBusIndex(const DBusIndex &other); virtual ~DBusIndex(); DBusIndex &operator=(const DBusIndex &other); /// Extracts docInfo from tuples. static void documentInfoFromTuples(const std::vector> &tuples, DocumentInfo &docInfo); /// Converts docInfo to tuples. static void documentInfoToTuples(const DocumentInfo &docInfo, std::vector> &tuples); /// Asks the D-Bus service to reload its configuration. bool reload(void); /// Gets some statistics from the D-Bus service. bool getStatistics(unsigned int &crawledCount, unsigned int &docsCount, bool &lowDiskSpace, bool &onBattery, bool &crawling); /// Returns false if the index couldn't be opened. virtual bool isGood(void) const; /// Gets metadata. virtual std::string getMetadata(const std::string &name) const; /// Sets metadata. virtual bool setMetadata(const std::string &name, const std::string &value) const; /// Gets the index location. virtual std::string getLocation(void) const; /// Returns a document's properties. virtual bool getDocumentInfo(unsigned int docId, DocumentInfo &docInfo) const; /// Returns a document's terms count. virtual unsigned int getDocumentTermsCount(unsigned int docId) const; /// Returns a document's terms. virtual bool getDocumentTerms(unsigned int docId, std::map &wordsBuffer) const; /// Sets the list of known labels. 
virtual bool setLabels(const std::set &labels, bool resetLabels); /// Gets the list of known labels. virtual bool getLabels(std::set &labels) const; /// Adds a label. virtual bool addLabel(const std::string &name); /// Deletes all references to a label. virtual bool deleteLabel(const std::string &name); /// Determines whether a document has a label. virtual bool hasLabel(unsigned int docId, const std::string &name) const; /// Returns a document's labels. virtual bool getDocumentLabels(unsigned int docId, std::set &labels) const; /// Sets a document's labels. virtual bool setDocumentLabels(unsigned int docId, const std::set &labels, bool resetLabels = true); /// Sets documents' labels. virtual bool setDocumentsLabels(const std::set &docIds, const std::set &labels, bool resetLabels = true); /// Checks whether the given URL is in the index. virtual unsigned int hasDocument(const std::string &url) const; /// Gets terms with the same root. virtual unsigned int getCloseTerms(const std::string &term, std::set &suggestions); /// Returns the ID of the last document. virtual unsigned int getLastDocumentID(void) const; /// Returns the number of documents. virtual unsigned int getDocumentsCount(const std::string &labelName = "") const; /// Lists documents. virtual unsigned int listDocuments(std::set &docIDList, unsigned int maxDocsCount = 0, unsigned int startDoc = 0) const; /// Lists documents. virtual bool listDocuments(const std::string &name, std::set &docIds, NameType type, unsigned int maxDocsCount = 0, unsigned int startDoc = 0) const; /// Indexes the given data. virtual bool indexDocument(const Document &doc, const std::set &labels, unsigned int &docId); /// Updates the given document. virtual bool updateDocument(unsigned int docId, const Document &doc); /// Updates a document's properties. virtual bool updateDocumentInfo(unsigned int docId, const DocumentInfo &docInfo); /// Unindexes the given document. 
virtual bool unindexDocument(unsigned int docId); /// Unindexes the given document. virtual bool unindexDocument(const std::string &location); /// Unindexes documents. virtual bool unindexDocuments(const std::string &name, NameType type); /// Unindexes all documents. virtual bool unindexAllDocuments(void); /// Flushes recent changes to the disk. virtual bool flush(void); /// Reopens the index. virtual bool reopen(void) const; /// Resets the index. virtual bool reset(void); protected: Glib::RefPtr m_refProxy; IndexInterface *m_pROIndex; }; #endif // _DBUS_INDEX_H pinot-1.22/IndexSearch/FieldMapperInterface.h000066400000000000000000000047371470740426600211730ustar00rootroot00000000000000/* * Copyright 2012-2013 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _FIELD_MAPPER_INTERFACE_H #define _FIELD_MAPPER_INTERFACE_H #include #include #include #include #include "DocumentInfo.h" #include "Visibility.h" /// Interface implemented by field mappers. class PINOT_EXPORT FieldMapperInterface { public: FieldMapperInterface(const FieldMapperInterface &other) {}; virtual ~FieldMapperInterface() {}; /// Gets the host for this document. virtual std::string getHost(const DocumentInfo &docInfo) = 0; /// Gets the directory for this document. 
virtual std::string getDirectory(const DocumentInfo &docInfo) = 0; /// Gets the file for this document. virtual std::string getFile(const DocumentInfo &docInfo) = 0; /// Gets terms from the document and their prefixes. virtual void getTerms(const DocumentInfo &docInfo, std::vector > &prefixedTerms) = 0; /// Gets values. virtual void getValues(const DocumentInfo &docInfo, std::map &values) = 0; /// Saves terms as record data. virtual void toRecord(const DocumentInfo *pDocInfo, std::string &record) = 0; /// Retrieves terms from record data. virtual void fromRecord(DocumentInfo *pDocInfo, const std::string &record) = 0; /// Returns whether terms with the prefix this filter corresponds to were escaped. virtual bool isEscaped(const std::string &filterName) = 0; /// Returns boolean query filters and their prefixes. virtual void getBooleanFilters(std::map &filters) = 0; /// Returns the valuenumber to collapse on, if any. virtual bool collapseOnValue(unsigned int &valueNumber) = 0; protected: FieldMapperInterface() { }; }; #endif // _FIELD_MAPPER_INTERFACE_H pinot-1.22/IndexSearch/FilterWrapper.cpp000066400000000000000000000101701470740426600202670ustar00rootroot00000000000000/* * Copyright 2007-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include #include #include "Url.h" #include "FilterFactory.h" #include "TextFilter.h" #include "FilterWrapper.h" using std::clog; using std::endl; using std::string; using std::set; using namespace Dijon; IndexAction::IndexAction(IndexInterface *pIndex) : ReducedAction(), m_pIndex(pIndex), m_docId(0), m_doUpdate(false) { } IndexAction::IndexAction(const IndexAction &other) : ReducedAction(other), m_pIndex(other.m_pIndex), m_labels(other.m_labels), m_docId(other.m_docId), m_doUpdate(other.m_doUpdate) { } IndexAction::~IndexAction() { } IndexAction &IndexAction::operator=(const IndexAction &other) { ReducedAction::operator=(other); if (this != &other) { m_pIndex = other.m_pIndex; m_labels = other.m_labels; m_docId = other.m_docId; m_doUpdate = other.m_doUpdate; } return *this; } void IndexAction::setIndexingMode(const set &labels) { m_labels = labels; m_docId = 0; m_doUpdate = false; } void IndexAction::setUpdatingMode(unsigned int docId) { m_labels.clear(); m_docId = docId; m_doUpdate = true; } bool IndexAction::takeAction(Document &doc, bool isNested) { bool docSuccess = false; if (m_pIndex == NULL) { return false; } // Nested documents can't be updated because they are unindexed // and the ID is that of the base document anyway if ((m_doUpdate == true) && (isNested == false)) { docSuccess = m_pIndex->updateDocument(m_docId, doc); } else { unsigned int newDocId = m_docId; docSuccess = m_pIndex->indexDocument(doc, m_labels, newDocId); // Make sure we return the base document's ID, not the last nested document's ID if (isNested == false) { m_docId = newDocId; } } return docSuccess; } unsigned int IndexAction::getId(void) const { return m_docId; } bool IndexAction::unindexNestedDocuments(const string &url) { if (m_pIndex == NULL) { return false; } // Unindex all documents that stem from this file return m_pIndex->unindexDocuments(url, IndexInterface::BY_CONTAINER_FILE); } bool IndexAction::unindexDocument(const string &location) { if (m_pIndex == NULL) { return 
false; } unindexNestedDocuments(location); return m_pIndex->unindexDocument(location); } FilterWrapper::FilterWrapper(IndexInterface *pIndex) : m_pAction(new IndexAction(pIndex)), m_ownAction(true) { } FilterWrapper::FilterWrapper(IndexAction *pAction) : m_pAction(pAction), m_ownAction(false) { } FilterWrapper::~FilterWrapper() { if ((m_ownAction == true) && (m_pAction != NULL)) { delete m_pAction; } } bool FilterWrapper::indexDocument(const Document &doc, const set &labels, unsigned int &docId) { string originalType(doc.getType()); if (m_pAction == NULL) { return false; } m_pAction->unindexNestedDocuments(doc.getLocation()); m_pAction->setIndexingMode(labels); bool filteredDoc = FilterUtils::filterDocument(doc, originalType, *m_pAction); docId = m_pAction->getId(); return filteredDoc; } bool FilterWrapper::updateDocument(const Document &doc, unsigned int docId) { string originalType(doc.getType()); if (m_pAction == NULL) { return false; } m_pAction->unindexNestedDocuments(doc.getLocation()); m_pAction->setUpdatingMode(docId); return FilterUtils::filterDocument(doc, originalType, *m_pAction); } bool FilterWrapper::unindexDocument(const string &location) { if (m_pAction == NULL) { return false; } return m_pAction->unindexDocument(location); } pinot-1.22/IndexSearch/FilterWrapper.h000066400000000000000000000045431470740426600177430ustar00rootroot00000000000000/* * Copyright 2007-2012 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _FILTER_WRAPPER_H #define _FILTER_WRAPPER_H #include #include #include "Document.h" #include "Visibility.h" #include "Filter.h" #include "FilterUtils.h" #include "IndexInterface.h" /// Indexing action. class PINOT_EXPORT IndexAction : public ReducedAction { public: IndexAction(IndexInterface *pIndex); IndexAction(const IndexAction &other); virtual ~IndexAction(); IndexAction &operator=(const IndexAction &other); void setIndexingMode(const std::set &labels); void setUpdatingMode(unsigned int docId); virtual bool takeAction(Document &doc, bool isNested); unsigned int getId(void) const; virtual bool unindexNestedDocuments(const std::string &url); virtual bool unindexDocument(const std::string &location); public: IndexInterface *m_pIndex; std::set m_labels; unsigned int m_docId; bool m_doUpdate; }; /// A wrapper around Dijon filters. class PINOT_EXPORT FilterWrapper { public: FilterWrapper(IndexInterface *pIndex); FilterWrapper(IndexAction *pAction); virtual ~FilterWrapper(); /// Indexes the given data. bool indexDocument(const Document &doc, const std::set &labels, unsigned int &docId); /// Updates the given document. bool updateDocument(const Document &doc, unsigned int docId); /// Unindexes document(s) at the given location. 
bool unindexDocument(const std::string &location); protected: IndexAction *m_pAction; bool m_ownAction; private: FilterWrapper(const FilterWrapper &other); FilterWrapper &operator=(const FilterWrapper &other); }; #endif // _FILTER_WRAPPER_H pinot-1.22/IndexSearch/IndexInterface.h000066400000000000000000000113361470740426600200430ustar00rootroot00000000000000/* * Copyright 2005-2008 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _INDEX_INTERFACE_H #define _INDEX_INTERFACE_H #include #include #include #include "Document.h" #include "Visibility.h" /// Interface implemented by indexes. class PINOT_EXPORT IndexInterface { public: IndexInterface(const IndexInterface &other) {}; virtual ~IndexInterface() {}; typedef enum { BY_LABEL = 0, BY_DIRECTORY, BY_FILE, BY_CONTAINER_FILE } NameType; /// Returns false if the index couldn't be opened. virtual bool isGood(void) const = 0; /// Gets metadata. virtual std::string getMetadata(const std::string &name) const = 0; /// Sets metadata. virtual bool setMetadata(const std::string &name, const std::string &value) const = 0; /// Gets the index location. virtual std::string getLocation(void) const = 0; /// Returns a document's properties. virtual bool getDocumentInfo(unsigned int docId, DocumentInfo &docInfo) const = 0; /// Returns a document's terms count. 
virtual unsigned int getDocumentTermsCount(unsigned int docId) const = 0; /// Returns a document's terms. virtual bool getDocumentTerms(unsigned int docId, std::map &wordsBuffer) const = 0; /// Sets the list of known labels. virtual bool setLabels(const std::set &labels, bool resetLabels) = 0; /// Gets the list of known labels. virtual bool getLabels(std::set &labels) const = 0; /// Adds a label. virtual bool addLabel(const std::string &name) = 0; /// Deletes all references to a label. virtual bool deleteLabel(const std::string &name) = 0; /// Determines whether a document has a label. virtual bool hasLabel(unsigned int docId, const std::string &name) const = 0; /// Returns a document's labels. virtual bool getDocumentLabels(unsigned int docId, std::set &labels) const = 0; /// Sets a document's labels. virtual bool setDocumentLabels(unsigned int docId, const std::set &labels, bool resetLabels = true) = 0; /// Sets documents' labels. virtual bool setDocumentsLabels(const std::set &docIds, const std::set &labels, bool resetLabels = true) = 0; /// Checks whether the given URL is in the index. virtual unsigned int hasDocument(const std::string &url) const = 0; /// Gets terms with the same root. virtual unsigned int getCloseTerms(const std::string &term, std::set &suggestions) = 0; /// Returns the ID of the last document. virtual unsigned int getLastDocumentID(void) const = 0; /// Returns the number of documents. virtual unsigned int getDocumentsCount(const std::string &labelName = "") const = 0; /// Lists documents. virtual unsigned int listDocuments(std::set &docIDList, unsigned int maxDocsCount = 0, unsigned int startDoc = 0) const = 0; /// Lists documents. virtual bool listDocuments(const std::string &name, std::set &docIds, NameType type, unsigned int maxDocsCount = 0, unsigned int startDoc = 0) const = 0; /// Indexes the given data. virtual bool indexDocument(const Document &doc, const std::set &labels, unsigned int &docId) = 0; /// Updates the given document. 
virtual bool updateDocument(unsigned int docId, const Document &doc) = 0; /// Updates a document's properties. virtual bool updateDocumentInfo(unsigned int docId, const DocumentInfo &docInfo) = 0; /// Unindexes the given document. virtual bool unindexDocument(unsigned int docId) = 0; /// Unindexes the given document. virtual bool unindexDocument(const std::string &location) = 0; /// Unindexes documents. virtual bool unindexDocuments(const std::string &name, NameType type) = 0; /// Unindexes all documents. virtual bool unindexAllDocuments(void) = 0; /// Flushes recent changes to the disk. virtual bool flush(void) = 0; /// Reopens the index. virtual bool reopen(void) const = 0; /// Resets the index. virtual bool reset(void) = 0; protected: IndexInterface() { }; }; #endif // _INDEX_INTERFACE_H pinot-1.22/IndexSearch/Makefile.am000066400000000000000000000034251470740426600170360ustar00rootroot00000000000000# Process this file with automake to produce Makefile.in pkginclude_HEADERS = \ DBusIndex.h \ FieldMapperInterface.h \ FilterWrapper.h \ IndexInterface.h \ ModuleFactory.h \ ModuleProperties.h \ OpenSearchParser.h \ PinotDBus_proxy.h \ PluginParsers.h \ PluginWebEngine.h \ QueryProperties.h \ ResultsExporter.h \ SearchEngineInterface.h \ SearchPluginProperties.h \ SherlockParser.h \ WebEngine.h nobase_pkginclude_HEADERS = \ cjkv/CJKVTokenizer.h if HAVE_DBUS pkglib_LTLIBRARIES = libIndex.la libIndexSearch.la else pkglib_LTLIBRARIES = libIndexSearch.la endif libIndex_la_LDFLAGS = \ -static libIndex_la_SOURCES = \ PinotDBus_proxy.cpp \ DBusIndex.cpp libIndexSearch_la_LDFLAGS = \ -static libIndexSearch_la_SOURCES = \ FilterWrapper.cpp \ ModuleFactory.cpp \ OpenSearchParser.cpp \ PluginWebEngine.cpp \ QueryProperties.cpp \ ResultsExporter.cpp \ SearchEngineInterface.cpp \ SearchPluginProperties.cpp \ WebEngine.cpp if HAVE_DBUS libIndexSearch_la_SOURCES += PinotDBus_proxy.cpp DBusIndex.cpp endif if HAVE_BOOST_SPIRIT libIndexSearch_la_SOURCES += SherlockParser.cpp endif if 
HAVE_DBUS bin_PROGRAMS = pinot-label endif pinot_label_LDFLAGS = \ -export-dynamic pinot_label_LDADD = \ -L$(top_builddir)/Utils \ -lIndex -lUtils -lBasicUtils \ @GLIBMM_LIBS@ @GIOMM_LIBS@ \ @GTHREAD_LIBS@ @HTTP_LIBS@ @MISC_LIBS@ pinot_label_SOURCES = pinot-label.cpp pinot_label_DEPENDENCIES = libIndex.la AM_CXXFLAGS = \ @MISC_CFLAGS@ \ -I$(top_srcdir)/Utils \ -I$(top_srcdir)/Tokenize \ -I$(top_srcdir)/Tokenize/filters \ -I$(top_srcdir)/Collect \ -I$(top_srcdir)/IndexSearch/cjkv \ @HTTP_CFLAGS@ @XML_CFLAGS@ \ @INDEX_CFLAGS@ @GIOMM_CFLAGS@ @GLIBMM_CFLAGS@ if HAVE_DBUS AM_CXXFLAGS += -DHAVE_DBUS endif if HAVE_BOOST_SPIRIT AM_CXXFLAGS += -DHAVE_BOOST_SPIRIT endif pinot-1.22/IndexSearch/ModuleFactory.cpp000066400000000000000000000305211470740426600202600ustar00rootroot00000000000000/* * Copyright 2007-2022 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include "config.h" #include #include #include #include #ifdef HAVE_DLFCN_H #include <dlfcn.h> #endif #include #include #ifdef HAVE_DBUS #include "DBusIndex.h" #endif #include "PluginWebEngine.h" #include "ModuleFactory.h" #ifdef HAVE_DLFCN_H #ifdef __CYGWIN__ #define DLOPEN_FLAGS RTLD_NOW #else #define DLOPEN_FLAGS (RTLD_NOW|RTLD_LOCAL) #endif #endif #define GETMODULEPROPERTIESFUNC "getModuleProperties" #define OPENORCREATEINDEXFUNC "openOrCreateIndex" #define MERGEINDEXESFUNC "mergeIndexes" #define GETINDEXFUNC "getIndex" #define GETSEARCHENGINEFUNC "getSearchEngine" #define SETFIELDMAPPERFUNC "setFieldMapper" #define CLOSEALLFUNC "closeAll" typedef ModuleProperties *(getModulePropertiesFunc)(void); typedef bool (openOrCreateIndexFunc)(const string &, bool &, bool, bool); typedef bool (mergeIndexesFunc)(const string &, const string &, const string &); typedef IndexInterface *(getIndexFunc)(const string &); typedef SearchEngineInterface *(getSearchEngineFunc)(const string &); typedef void (setFieldMapperFunc)(FieldMapperInterface *pMapper); typedef void (closeAllFunc)(void); using std::clog; using std::endl; using std::string; using std::map; using std::set; using std::pair; LoadableModule::LoadableModule(ModuleProperties *pProperties, const string &location, void *pHandle) : m_pProperties(pProperties), m_location(location), m_canSearch(false), m_canIndex(false), m_pHandle(pHandle) { } LoadableModule::LoadableModule(const LoadableModule &other) : m_pProperties(NULL), m_location(other.m_location), m_canSearch(other.m_canSearch), m_canIndex(other.m_canIndex), m_pHandle(other.m_pHandle) { if (other.m_pProperties != NULL) { m_pProperties = new ModuleProperties(*other.m_pProperties); } } LoadableModule::~LoadableModule() { if (m_pProperties != NULL) { delete m_pProperties; } } LoadableModule &LoadableModule::operator=(const LoadableModule &other) { if (this != &other) { if (m_pProperties != NULL) { delete m_pProperties; m_pProperties = NULL; } m_pProperties = 
NULL; if (other.m_pProperties != NULL) { /* Deep copy, as in the copy constructor, so that two objects never delete the same pointer */ m_pProperties = new ModuleProperties(*other.m_pProperties); } m_location = other.m_location; m_canSearch = other.m_canSearch; m_canIndex = other.m_canIndex; m_pHandle = other.m_pHandle; } return *this; } map<string, LoadableModule> ModuleFactory::m_types; ModuleFactory::ModuleFactory() { } ModuleFactory::~ModuleFactory() { } IndexInterface *ModuleFactory::getLibraryIndex(const string &type, const string &option) { map<string, LoadableModule>::iterator typeIter = m_types.find(type); if ((typeIter == m_types.end()) || (typeIter->second.m_canIndex == false)) { // We don't know about this type, or it doesn't support indexes return NULL; } void *pHandle = typeIter->second.m_pHandle; if (pHandle == NULL) { return NULL; } #ifdef HAVE_DLFCN_H getIndexFunc *pFunc = (getIndexFunc *)dlsym(pHandle, GETINDEXFUNC); if (pFunc != NULL) { return (*pFunc)(option); } #endif #ifdef DEBUG clog << "ModuleFactory::getLibraryIndex: couldn't find export getIndex" << endl; #endif return NULL; } SearchEngineInterface *ModuleFactory::getLibrarySearchEngine(const string &type, const string &option) { map<string, LoadableModule>::iterator typeIter = m_types.find(type); if (typeIter == m_types.end()) { // We don't know about this type return NULL; } void *pHandle = typeIter->second.m_pHandle; if (pHandle == NULL) { return NULL; } #ifdef HAVE_DLFCN_H getSearchEngineFunc *pFunc = (getSearchEngineFunc *)dlsym(pHandle, GETSEARCHENGINEFUNC); if (pFunc != NULL) { return (*pFunc)(option); } #endif #ifdef DEBUG clog << "ModuleFactory::getLibrarySearchEngine: couldn't find export getSearchEngine" << endl; #endif return NULL; } unsigned int ModuleFactory::loadModules(const string &directory) { unsigned int count = 0; #ifdef HAVE_DLFCN_H struct stat fileStat; if (directory.empty() == true) { return 0; } // Is it a directory ? 
if ((stat(directory.c_str(), &fileStat) == -1) || (!S_ISDIR(fileStat.st_mode))) { clog << "ModuleFactory::loadModules: " << directory << " is not a directory" << endl; return 0; } // Scan it DIR *pDir = opendir(directory.c_str()); if (pDir == NULL) { return 0; } // Iterate through this directory's entries struct dirent *pDirEntry = readdir(pDir); while (pDirEntry != NULL) { char *pEntryName = pDirEntry->d_name; if (pEntryName != NULL) { string fileName = pEntryName; string::size_type extPos = fileName.find_last_of("."); // FIXME: prefer newer versioned modules if ((extPos == string::npos) || (fileName.substr(extPos) != ".so")) { // Next entry pDirEntry = readdir(pDir); continue; } fileName = directory; fileName += "/"; fileName += pEntryName; // Check this entry if ((stat(fileName.c_str(), &fileStat) == 0) && (S_ISREG(fileStat.st_mode))) { void *pHandle = dlopen(fileName.c_str(), DLOPEN_FLAGS); if (pHandle != NULL) { // What type does this export ? getModulePropertiesFunc *pPropsFunc = (getModulePropertiesFunc *)dlsym(pHandle, GETMODULEPROPERTIESFUNC); if (pPropsFunc != NULL) { LoadableModule module((*pPropsFunc)(), fileName, pHandle); if (module.m_pProperties != NULL) { string moduleType(module.m_pProperties->m_name); // Can it search ? getSearchEngineFunc *pSearchFunc = (getSearchEngineFunc *)dlsym(pHandle, GETSEARCHENGINEFUNC); if (pSearchFunc != NULL) { module.m_canSearch = true; } // Can it index ? 
getIndexFunc *pIndexFunc = (getIndexFunc *)dlsym(pHandle, GETINDEXFUNC); if (pIndexFunc != NULL) { module.m_canIndex = true; } // Add a record for this module and count it if it was actually registered if (m_types.insert(pair<string, LoadableModule>(moduleType, module)).second == true) { ++count; } #ifdef DEBUG clog << "ModuleFactory::loadModules: " << moduleType << " is supported by " << pEntryName << endl; #endif } } else clog << "ModuleFactory::loadModules: " << dlerror() << endl; } else clog << "ModuleFactory::loadModules: " << dlerror() << endl; } #ifdef DEBUG else clog << "ModuleFactory::loadModules: " << pEntryName << " is not a file" << endl; #endif } // Next entry pDirEntry = readdir(pDir); } closedir(pDir); #endif return count; } bool ModuleFactory::openOrCreateIndex(const string &type, const string &option, bool &obsoleteFormat, bool readOnly, bool overwrite) { map<string, LoadableModule>::iterator typeIter = m_types.find(type); if ((typeIter == m_types.end()) || (typeIter->second.m_canIndex == false)) { // We don't know about this type, or it doesn't support indexes return false; } void *pHandle = typeIter->second.m_pHandle; if (pHandle == NULL) { return false; } #ifdef HAVE_DLFCN_H openOrCreateIndexFunc *pFunc = (openOrCreateIndexFunc *)dlsym(pHandle, OPENORCREATEINDEXFUNC); if (pFunc != NULL) { return (*pFunc)(option, obsoleteFormat, readOnly, overwrite); } #endif #ifdef DEBUG clog << "ModuleFactory::openOrCreateIndex: couldn't find export openOrCreateIndex" << endl; #endif return false; } bool ModuleFactory::mergeIndexes(const string &type, const string &option0, const string &option1, const string &option2) { map<string, LoadableModule>::iterator typeIter = m_types.find(type); if ((typeIter == m_types.end()) || (typeIter->second.m_canIndex == false)) { // We don't know about this type, or it doesn't support indexes return false; } void *pHandle = typeIter->second.m_pHandle; if (pHandle == NULL) { return false; } #ifdef HAVE_DLFCN_H mergeIndexesFunc *pFunc = (mergeIndexesFunc *)dlsym(pHandle, MERGEINDEXESFUNC); if (pFunc != NULL) { return (*pFunc)(option0, option1, option2); } #endif #ifdef DEBUG 
clog << "ModuleFactory::mergeIndexes: couldn't find export mergeIndexes" << endl; #endif return false; } IndexInterface *ModuleFactory::getIndex(const string &type, const string &option) { IndexInterface *pIndex = NULL; // Choice by type // Do we need to nest it in a DBusIndex ? if (type.substr(0, 5) == "dbus-") { #ifdef DEBUG clog << "ModuleFactory::getIndex: sub-type " << type.substr(5) << endl; #endif pIndex = getLibraryIndex(type.substr(5), option); if (pIndex != NULL) { #ifdef HAVE_DBUS return new DBusIndex(pIndex); #else return pIndex; #endif } return NULL; } return getLibraryIndex(type, option); } SearchEngineInterface *ModuleFactory::getSearchEngine(const string &type, const string &option) { SearchEngineInterface *pEngine = NULL; // Choice by type if ( #ifdef HAVE_BOOST_SPIRIT (type == "sherlock") || #endif (type == "opensearch")) { pEngine = new PluginWebEngine(option); } if (pEngine != NULL) { return pEngine; } return getLibrarySearchEngine(type, option); } string ModuleFactory::getSearchEngineName(const string &type, const string &option) { if ( #ifdef HAVE_BOOST_SPIRIT (type == "sherlock") || #endif (type == "opensearch")) { SearchPluginProperties properties; if (PluginWebEngine::getDetails(option, properties) == true) { return properties.m_name; } return ""; } else { return option; } return type; } void ModuleFactory::getSupportedEngines(map<ModuleProperties, bool> &engines) { engines.clear(); // Built-in engines #ifdef HAVE_BOOST_SPIRIT engines.insert(pair<ModuleProperties, bool>(ModuleProperties("sherlock", "Sherlock", "", ""), false)); #endif engines.insert(pair<ModuleProperties, bool>(ModuleProperties("opensearch", "OpenSearch", "", ""), false)); // Library-handled engines for (map<string, LoadableModule>::iterator typeIter = m_types.begin(); typeIter != m_types.end(); ++typeIter) { ModuleProperties *pProps = typeIter->second.m_pProperties; if (pProps != NULL) { engines.insert(pair<ModuleProperties, bool>(*pProps, true)); } } } bool ModuleFactory::isSupported(const string &type, bool asIndex) { if (asIndex == true) { // Only backends implement index 
functionality map<string, LoadableModule>::const_iterator typeIter = m_types.find(type); if (typeIter != m_types.end()) { return typeIter->second.m_canIndex; } return false; } if ( #ifdef HAVE_BOOST_SPIRIT (type == "sherlock") || #endif (type == "opensearch")) { return true; } else { // Does this backend implement search functionality ? map<string, LoadableModule>::const_iterator typeIter = m_types.find(type); if (typeIter != m_types.end()) { return typeIter->second.m_canSearch; } } return false; } void ModuleFactory::setFieldMapper(FieldMapperInterface *pMapper) { for (map<string, LoadableModule>::iterator typeIter = m_types.begin(); typeIter != m_types.end(); ++typeIter) { void *pHandle = typeIter->second.m_pHandle; if (pHandle == NULL) { continue; } #ifdef HAVE_DLFCN_H setFieldMapperFunc *pFunc = (setFieldMapperFunc *)dlsym(pHandle, SETFIELDMAPPERFUNC); if (pFunc != NULL) { (*pFunc)(pMapper); } #ifdef DEBUG else clog << "ModuleFactory::setFieldMapper: couldn't find export setFieldMapper" << endl; #endif #endif } } void ModuleFactory::unloadModules(void) { for (map<string, LoadableModule>::iterator typeIter = m_types.begin(); typeIter != m_types.end(); ++typeIter) { void *pHandle = typeIter->second.m_pHandle; if (pHandle == NULL) { continue; } #ifdef HAVE_DLFCN_H closeAllFunc *pFunc = (closeAllFunc *)dlsym(pHandle, CLOSEALLFUNC); if (pFunc != NULL) { (*pFunc)(); } #ifdef DEBUG else clog << "ModuleFactory::unloadModules: couldn't find export closeAll" << endl; #endif if (dlclose(pHandle) != 0) { #ifdef DEBUG clog << "ModuleFactory::unloadModules: failed on " << typeIter->first << endl; #endif } #endif } m_types.clear(); } pinot-1.22/IndexSearch/ModuleFactory.h000066400000000000000000000064131470740426600177300ustar00rootroot00000000000000/* * Copyright 2007-2012 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _MODULE_FACTORY_H #define _MODULE_FACTORY_H #include #include #include "FieldMapperInterface.h" #include "IndexInterface.h" #include "SearchPluginProperties.h" #include "SearchEngineInterface.h" /// Loadable module. class LoadableModule { public: LoadableModule(ModuleProperties *pProperties, const std::string &location, void *pHandle); LoadableModule(const LoadableModule &other); virtual ~LoadableModule(); LoadableModule &operator=(const LoadableModule &other); ModuleProperties *m_pProperties; std::string m_location; bool m_canSearch; bool m_canIndex; void *m_pHandle; }; /// Factory for search engines. class ModuleFactory { public: virtual ~ModuleFactory(); /// Loads the libraries found in the given directory. static unsigned int loadModules(const std::string &directory); /// Makes sure the index exists in the desired mode. static bool openOrCreateIndex(const std::string &type, const std::string &option, bool &obsoleteFormat, bool readOnly = true, bool overwrite = false); /// Merges two physical indexes in a logical one. static bool mergeIndexes(const std::string &type, const std::string &option0, const std::string &option1, const std::string &option2); /// Returns an index of the specified type; NULL if unavailable. static IndexInterface *getIndex(const std::string &type, const std::string &option); /// Returns a SearchEngine of the specified type; NULL if unavailable. static SearchEngineInterface *getSearchEngine(const std::string &type, const std::string &option); /// Returns the name of the given engine. 
static std::string getSearchEngineName(const std::string &type, const std::string &option); /// Returns all supported engines. static void getSupportedEngines(std::map<ModuleProperties, bool> &engines); /// Indicates whether a search engine or index is supported or not. static bool isSupported(const std::string &type, bool asIndex = false); /// Sets a field mapper. static void setFieldMapper(FieldMapperInterface *pMapper); /// Unloads all libraries. static void unloadModules(void); protected: static std::map<std::string, LoadableModule> m_types; ModuleFactory(); static IndexInterface *getLibraryIndex(const std::string &type, const std::string &option); static SearchEngineInterface *getLibrarySearchEngine(const std::string &type, const std::string &option); private: ModuleFactory(const ModuleFactory &other); ModuleFactory &operator=(const ModuleFactory &other); }; #endif // _MODULE_FACTORY_H pinot-1.22/IndexSearch/ModuleProperties.h000066400000000000000000000042701470740426600204540ustar00rootroot00000000000000/* * Copyright 2008 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _MODULE_PROPERTIES_H #define _MODULE_PROPERTIES_H #include #include /// Properties of a module. 
class ModuleProperties { public: ModuleProperties() { } ModuleProperties(const std::string &name, const std::string &longName, const std::string &option, const std::string &channel) : m_name(name), m_longName(longName), m_option(option), m_channel(channel) { } ModuleProperties(const ModuleProperties &other) : m_name(other.m_name), m_longName(other.m_longName), m_option(other.m_option), m_channel(other.m_channel) { } virtual ~ModuleProperties() { } ModuleProperties& operator=(const ModuleProperties& other) { if (this != &other) { m_name = other.m_name; m_longName = other.m_longName; m_option = other.m_option; m_channel = other.m_channel; } return *this; } bool operator==(const ModuleProperties &other) const { if ((m_name == other.m_name) && (m_longName == other.m_longName)) { return true; } return false; } bool operator<(const ModuleProperties &other) const { if (m_name < other.m_name) { return true; } else if (m_name == other.m_name) { if (m_longName < other.m_longName) { return true; } } return false; } // Description std::string m_name; std::string m_longName; std::string m_option; std::string m_channel; }; #endif // _MODULE_PROPERTIES_H pinot-1.22/IndexSearch/OpenSearchParser.cpp000066400000000000000000000350701470740426600207130ustar00rootroot00000000000000/* * Copyright 2005-2024 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include #include #include #include "StringManip.h" #include "FilterUtils.h" #include "OpenSearchParser.h" using namespace std; using namespace Glib; using namespace xmlpp; static ustring getNodeContent(const Node *pNode) { if (pNode == NULL) { return ""; } // Is it an element ? const Element *pElem = dynamic_cast(pNode); if (pElem != NULL) { const TextNode *pText = pElem->get_first_child_text(); if (pText == NULL) { // Maybe the text is given as CDATA Node::const_NodeList childNodes = pNode->get_children(); if (childNodes.size() == 1) { // Is it CDATA ? const CdataNode *pContent = dynamic_cast(*childNodes.begin()); if (pContent != NULL) { return pContent->get_content(); } } return ""; } return pText->get_content(); } return ""; } OpenSearchResponseParser::OpenSearchResponseParser(bool rssResponse) : ResponseParserInterface(), m_rssResponse(rssResponse) { } OpenSearchResponseParser::~OpenSearchResponseParser() { } bool OpenSearchResponseParser::parse(const ::Document *pResponseDoc, vector &resultsList, unsigned int &totalResults, unsigned int &firstResultIndex, string &charset) const { float pseudoScore = 100; off_t contentLen = 0; bool foundResult = false; if ((pResponseDoc == NULL) || (pResponseDoc->getData(contentLen) == NULL) || (contentLen == 0)) { return false; } // Make sure the response MIME type is sensible string mimeType = pResponseDoc->getType(); if ((mimeType.empty() == false) && (mimeType.find("xml") == string::npos)) { clog << "OpenSearchResponseParser::parse: response is not XML" << endl; return false; } const char *pContent = pResponseDoc->getData(contentLen); try { bool loadFeed = false; DomParser parser; parser.set_substitute_entities(true); parser.parse_memory_raw((const unsigned char 
*)pContent, (Parser::size_type)contentLen); xmlpp::Document *pDocument = parser.get_document(); if (pDocument == NULL) { return false; } ustring encoding(pDocument->get_encoding()); if (encoding.empty() == false) { charset = encoding; #ifdef DEBUG clog << "OpenSearchResponseParser::parse: response charset is " << charset << endl; #endif } Node *pNode = pDocument->get_root_node(); Element *pRootElem = dynamic_cast(pNode); if (pRootElem == NULL) { return false; } // Check the top-level element is what we expect ustring rootNodeName = pRootElem->get_name(); if (m_rssResponse == true) { if (rootNodeName == "rss") { Node::NodeList rssChildNodes = pRootElem->get_children(); for (Node::NodeList::const_iterator rssIter = rssChildNodes.begin(); rssIter != rssChildNodes.end(); ++rssIter) { Node *pRssNode = (*rssIter); Element *pRssElem = dynamic_cast(pRssNode); if (pRssElem != NULL) { if (pRssElem->get_name() == "channel") { pRootElem = pRssElem; loadFeed = true; break; } } } } } else { if (rootNodeName != "feed") { return false; } loadFeed = true; } if (loadFeed == false) { #ifdef DEBUG clog << "OpenSearchResponseParser::parse: error on root node " << rootNodeName << endl; #endif return false; } // RSS ustring itemNode("item"); ustring descriptionNode("description"); if (m_rssResponse == false) { // Atom itemNode = "entry"; descriptionNode = "content"; } // Go through the subnodes Node::NodeList childNodes = pRootElem->get_children(); for (Node::NodeList::const_iterator iter = childNodes.begin(); iter != childNodes.end(); ++iter) { Node *pChildNode = (*iter); ustring nodeName(pChildNode->get_name()); ustring nodeContent(getNodeContent(pChildNode)); // Is this an OpenSearch extension ? 
// FIXME: make sure namespace is opensearch if (nodeName == "totalResults") { if (nodeContent.empty() == false) { totalResults = min((unsigned int)atoi(nodeContent.c_str()), totalResults); #ifdef DEBUG clog << "OpenSearchResponseParser::parse: total results " << totalResults << endl; #endif } } else if (nodeName == "startIndex") { if (nodeContent.empty() == false) { firstResultIndex = (unsigned int)atoi(nodeContent.c_str()); #ifdef DEBUG clog << "OpenSearchResponseParser::parse: first result index " << firstResultIndex << endl; #endif } } if (nodeName != itemNode) { continue; } // Go through the item's subnodes ustring title, url, extract; Node::NodeList itemChildNodes = pChildNode->get_children(); for (Node::NodeList::const_iterator itemIter = itemChildNodes.begin(); itemIter != itemChildNodes.end(); ++itemIter) { Node *pItemNode = (*itemIter); Element *pItemElem = dynamic_cast(pItemNode); if (pItemElem == NULL) { continue; } ustring itemNodeName = pItemNode->get_name(); if (itemNodeName == "title") { title = getNodeContent(pItemNode); } else if (itemNodeName == "link") { if (m_rssResponse == true) { url = getNodeContent(pItemNode); } else { Attribute *pAttr = pItemElem->get_attribute("href"); if (pAttr != NULL) { url = pAttr->get_value(); } } } else if (itemNodeName == descriptionNode) { extract = getNodeContent(pItemNode); } } // The extract may contain HTML if ((extract.find("<") != string::npos) && (extract.find(">") != string::npos)) { // Wrap the extract ustring dummyHtml(""); dummyHtml += extract; dummyHtml += ""; extract = FilterUtils::stripMarkup(dummyHtml); } DocumentInfo result(title, url, "", ""); result.setExtract(extract); result.setScore(pseudoScore); resultsList.push_back(result); --pseudoScore; foundResult = true; if (resultsList.size() >= totalResults) { // Enough results break; } } } catch (const std::exception& ex) { #ifdef DEBUG clog << "OpenSearchResponseParser::parse: caught exception: " << ex.what() << endl; #endif foundResult = false; } 
return foundResult; } OpenSearchParser::OpenSearchParser(const string &fileName) : PluginParserInterface(fileName) { } OpenSearchParser::~OpenSearchParser() { } ResponseParserInterface *OpenSearchParser::parse(SearchPluginProperties &properties, bool minimal) { struct stat fileStat; bool rssResponse = true, success = true; if ((m_fileName.empty() == true) || (stat(m_fileName.c_str(), &fileStat) != 0) || (!S_ISREG(fileStat.st_mode))) { return NULL; } try { DomParser parser; parser.set_substitute_entities(true); parser.parse_file(m_fileName); xmlpp::Document *pDocument = parser.get_document(); if (pDocument == NULL) { return NULL; } Node *pNode = pDocument->get_root_node(); Element *pRootElem = dynamic_cast(pNode); if (pRootElem == NULL) { return NULL; } // Check the top-level element is what we expect // MozSearch is very much like OpenSearch Description ustring rootNodeName = pRootElem->get_name(); if ((rootNodeName != "OpenSearchDescription") && (rootNodeName != "SearchPlugin")) { #ifdef DEBUG clog << "OpenSearchParser::parse: wrong root node " << rootNodeName << endl; #endif return NULL; } // Go through the subnodes Node::NodeList childNodes = pRootElem->get_children(); if (childNodes.empty() == false) { for (Node::NodeList::const_iterator iter = childNodes.begin(); iter != childNodes.end(); ++iter) { Node *pChildNode = (*iter); Element *pElem = dynamic_cast(pChildNode); if (pElem == NULL) { continue; } ustring nodeName(pChildNode->get_name()); ustring nodeContent(getNodeContent(pChildNode)); if (nodeName == "ShortName") { // Ignore LongName, use this as long name properties.m_longName = nodeContent; } else if (nodeName == "Url") { ustring url, type; SearchPluginProperties::Response response = SearchPluginProperties::RSS_RESPONSE; bool getMethod = true; // Parse Query Syntax Element::AttributeList attributes = pElem->get_attributes(); for (Element::AttributeList::const_iterator iter = attributes.begin(); iter != attributes.end(); ++iter) { Attribute *pAttr = 
							(*iter);

						if (pAttr != NULL)
						{
							ustring attrName = pAttr->get_name();
							ustring attrContent = pAttr->get_value();

							if (attrName == "template")
							{
								url = attrContent;
							}
							else if (attrName == "type")
							{
								type = attrContent;
							}
							else if (attrName == "method")
							{
								// GET is the default method
								if (StringManip::toLowerCase(attrContent) != "get")
								{
									getMethod = false;
								}
							}
						}
					}

					// Did we get the URL ?
					if (url.empty() == true)
					{
						// It's probably provided as content, v1.0 style
						url = nodeContent;
					}

					if (getMethod == true)
					{
						string::size_type startPos = 0, pos = url.find("?");

						// Do we support that type ?
						if (type == "application/atom+xml")
						{
							response = SearchPluginProperties::ATOM_RESPONSE;
							rssResponse = false;
						}
						else if ((type.empty() == false) &&
							(type != "application/rss+xml"))
						{
							response = SearchPluginProperties::UNKNOWN_RESPONSE;
#ifdef DEBUG
							clog << "OpenSearchParser::parse: unsupported response type " << type << endl;
#endif
							continue;
						}

						// Break the URL down into base and parameters
						if (pos != string::npos)
						{
							string params(url.substr(pos + 1));

							// URL
							properties.m_baseUrl = url.substr(0, pos);
#ifdef DEBUG
							clog << "OpenSearchParser::parse: URL is " << url << endl;
#endif

							// Split this into the actual parameters
							params += "&";
							pos = params.find("&");
							while (pos != string::npos)
							{
								string parameter(params.substr(startPos, pos - startPos));

								string::size_type equalPos = parameter.find("=");
								if (equalPos != string::npos)
								{
									string paramName(parameter.substr(0, equalPos));
									string paramValue(parameter.substr(equalPos + 1));
									SearchPluginProperties::ParameterVariable param = SearchPluginProperties::UNKNOWN_PARAM;

									if (paramValue == "{searchTerms}")
									{
										param = SearchPluginProperties::SEARCH_TERMS_PARAM;
									}
									else if (paramValue == "{count}")
									{
										param = SearchPluginProperties::COUNT_PARAM;
									}
									else if (paramValue == "{startIndex}")
									{
										param = SearchPluginProperties::START_INDEX_PARAM;
									}
									else if (paramValue == "{startPage}")
									{
										param = SearchPluginProperties::START_PAGE_PARAM;
									}
									else if (paramValue == "{language}")
									{
										param = SearchPluginProperties::LANGUAGE_PARAM;
									}
									else if (paramValue == "{outputEncoding}")
									{
										param = SearchPluginProperties::OUTPUT_ENCODING_PARAM;
									}
									else if (paramValue == "{inputEncoding}")
									{
										param = SearchPluginProperties::INPUT_ENCODING_PARAM;
									}

									if (param != SearchPluginProperties::UNKNOWN_PARAM)
									{
										properties.m_variableParameters[param] = paramName;
									}
									else
									{
#ifdef DEBUG
										clog << "OpenSearchParser::parse: " << paramName << "=" << paramValue << endl;
#endif
										if (paramValue.substr(0, 5) == "EDIT:")
										{
											// This is user editable
											properties.m_editableParameters[paramName] = paramValue.substr(5);
										}
										else
										{
											// Append to the remainder
											if (properties.m_remainder.empty() == false)
											{
												properties.m_remainder += "&";
											}
											properties.m_remainder += paramName;
											properties.m_remainder += "=";
											properties.m_remainder += paramValue;
										}
									}
								}

								// Next
								startPos = pos + 1;
								pos = params.find_first_of("&", startPos);
							}
						}

						// Method
						properties.m_method = SearchPluginProperties::GET_METHOD;
						// Output type
						properties.m_outputType = type;
						// Response
						properties.m_response = response;
					}

					// We ignore Param as we only support GET
				}
				else if (nodeName == "Tags")
				{
					// This is supposed to be a space-delimited list, but use the whole thing as channel
					properties.m_channel = nodeContent;
				}
				else if (nodeName == "Language")
				{
					properties.m_languages.insert(nodeContent);
				}
			}
		}
	}
	catch (const std::exception& ex)
	{
#ifdef DEBUG
		clog << "OpenSearchParser::parse: caught exception: " << ex.what() << endl;
#endif
		success = false;
	}

	if (success == false)
	{
		return NULL;
	}

	// Scrolling
	properties.m_nextBase = 1;
	if (properties.m_variableParameters.find(SearchPluginProperties::START_PAGE_PARAM) != properties.m_variableParameters.end())
	{
		properties.m_scrolling = SearchPluginProperties::PER_PAGE;
		properties.m_nextIncrement = 1;
	}
	else if ((properties.m_variableParameters.find(SearchPluginProperties::COUNT_PARAM) != properties.m_variableParameters.end()) ||
		(properties.m_variableParameters.find(SearchPluginProperties::START_INDEX_PARAM) != properties.m_variableParameters.end()))
	{
		properties.m_scrolling = SearchPluginProperties::PER_INDEX;
		properties.m_nextIncrement = 0;
	}
	else
	{
		// No scrolling
		properties.m_nextIncrement = 0;
		properties.m_nextBase = 0;
	}

	return new OpenSearchResponseParser(rssResponse);
}
pinot-1.22/IndexSearch/OpenSearchParser.h000066400000000000000000000043751470740426600203640ustar00rootroot00000000000000
/*
 *  Copyright 2005-2009 Fabrice Colin
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
 */

#ifndef _OPENSEARCH_PARSER_H
#define _OPENSEARCH_PARSER_H

#include
#include "Document.h"
#include "PluginParsers.h"

/// Parses OpenSearch Response.
class OpenSearchResponseParser : public ResponseParserInterface
{
	public:
		OpenSearchResponseParser(bool rssResponse);
		virtual ~OpenSearchResponseParser();

		/// Parses the response; false if not all could be parsed.
		virtual bool parse(const Document *pResponseDoc, std::vector &resultsList,
			unsigned int &totalResults, unsigned int &firstResultIndex,
			std::string &charset) const;

	protected:
		bool m_rssResponse;

	private:
		OpenSearchResponseParser(const OpenSearchResponseParser &other);
		OpenSearchResponseParser& operator=(const OpenSearchResponseParser& other);

};

/** A parser for OpenSearch Description and Query Syntax, version 1.1.
  * See http://opensearch.a9.com/spec/1.1/description/
  * and http://opensearch.a9.com/spec/1.1/querysyntax/
  * It can also parse MozSearch plugins.
  * See http://developer.mozilla.org/en/docs/Creating_MozSearch_plugins
  */
class OpenSearchParser : public PluginParserInterface
{
	public:
		OpenSearchParser(const std::string &fileName);
		virtual ~OpenSearchParser();

		/// Parses the plugin and returns a response parser.
		virtual ResponseParserInterface *parse(SearchPluginProperties &properties,
			bool minimal = false);

	private:
		OpenSearchParser(const OpenSearchParser &other);
		OpenSearchParser& operator=(const OpenSearchParser& other);

};

#endif // _OPENSEARCH_PARSER_H
pinot-1.22/IndexSearch/PinotDBus_proxy.cpp000066400000000000000000001252141470740426600206170ustar00rootroot00000000000000
/*
 * Generated by gdbus-codegen-glibmm 2.42.0. DO NOT EDIT.
 *
 * The license of this code is the same as for the source it was derived from.
*/ #include "PinotDBus_proxy.h" #include template inline T specialGetter(Glib::Variant variant) { return variant.get(); } template<> inline std::string specialGetter(Glib::Variant variant) { // String is not guaranteed to be null-terminated, so don't use ::get() gsize n_elem; gsize elem_size = sizeof(char); char* data = (char*)g_variant_get_fixed_array(variant.gobj(), &n_elem, elem_size); return std::string(data, n_elem); } void org::freedesktop::DBus::IntrospectableProxy::Introspect( const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; m_proxy->call("Introspect", callback, cancellable, base, timeout_msec); } void org::freedesktop::DBus::IntrospectableProxy::Introspect_finish( Glib::ustring &out_data, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_data_v; wrapped.get_child(out_data_v, 0); out_data = out_data_v.get(); } Glib::ustring org::freedesktop::DBus::IntrospectableProxy::Introspect_sync( const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("Introspect", cancellable, base, timeout_msec); Glib::ustring out_data; Glib::Variant out_data_v; wrapped.get_child(out_data_v, 0); out_data = out_data_v.get(); return out_data; } void org::freedesktop::DBus::IntrospectableProxy::handle_signal(const Glib::ustring&/* sender_name */, const Glib::ustring& signal_name, const Glib::VariantContainerBase& parameters) { static_cast(signal_name); // maybe unused static_cast(parameters); // maybe unused } void org::freedesktop::DBus::IntrospectableProxy::handle_properties_changed( const Gio::DBus::Proxy::MapChangedProperties &changed_properties, const std::vector &/* invalidated_properties */) { static_cast(changed_properties); // maybe unused // Only check changed_properties since value will already be cached. 
Glib can be setup to get // values of invalidated properties in which case property will be in changed_properties when // value is actually received. See Gio::DBus::ProxyFlags::PROXY_FLAGS_GET_INVALIDATED_PROPERTIES . } org::freedesktop::DBus::IntrospectableProxy::IntrospectableProxy(const Glib::RefPtr &proxy) : m_proxy(proxy) { m_proxy->signal_signal().connect(sigc::mem_fun(this, &IntrospectableProxy::handle_signal)); m_proxy->signal_properties_changed(). connect(sigc::mem_fun(this, &IntrospectableProxy::handle_properties_changed)); } void org::freedesktop::DBus::IntrospectableProxy::createForBus( Gio::DBus::BusType busType, Gio::DBus::ProxyFlags proxyFlags, const std::string &name, const std::string &objectPath, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable) { Gio::DBus::Proxy::create_for_bus(busType, name, objectPath, "org.freedesktop.DBus.Introspectable", slot, cancellable, Glib::RefPtr(), proxyFlags); } Glib::RefPtr org::freedesktop::DBus::IntrospectableProxy::createForBusFinish(const Glib::RefPtr &result) { Glib::RefPtr proxy = Gio::DBus::Proxy::create_for_bus_finish(result); org::freedesktop::DBus::IntrospectableProxy *p = new org::freedesktop::DBus::IntrospectableProxy(proxy); return Glib::RefPtr(p); } Glib::RefPtr org::freedesktop::DBus::IntrospectableProxy::createForBus_sync( Gio::DBus::BusType busType, Gio::DBus::ProxyFlags proxyFlags, const std::string &name, const std::string &objectPath, const Glib::RefPtr &cancellable) { Glib::RefPtr proxy = Gio::DBus::Proxy::create_for_bus_sync(busType, name, objectPath, "org.freedesktop.DBus.Introspectable", cancellable, Glib::RefPtr(), proxyFlags); org::freedesktop::DBus::IntrospectableProxy *p = new org::freedesktop::DBus::IntrospectableProxy(proxy); return Glib::RefPtr(p); }/** * Retrieves statistics. 
* crawledCount: the number of documents crawled * docsCount: the number of documents in the index */ void com::github::fabricecolin::PinotProxy::GetStatistics( const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; m_proxy->call("GetStatistics", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::GetStatistics_finish( guint32 &out_crawledCount, guint32 &out_docsCount, bool &out_lowDiskSpace, bool &out_onBattery, bool &out_crawling, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_crawledCount_v; wrapped.get_child(out_crawledCount_v, 0); out_crawledCount = out_crawledCount_v.get(); Glib::Variant out_docsCount_v; wrapped.get_child(out_docsCount_v, 1); out_docsCount = out_docsCount_v.get(); Glib::Variant out_lowDiskSpace_v; wrapped.get_child(out_lowDiskSpace_v, 2); out_lowDiskSpace = out_lowDiskSpace_v.get(); Glib::Variant out_onBattery_v; wrapped.get_child(out_onBattery_v, 3); out_onBattery = out_onBattery_v.get(); Glib::Variant out_crawling_v; wrapped.get_child(out_crawling_v, 4); out_crawling = out_crawling_v.get(); } std::tuple com::github::fabricecolin::PinotProxy::GetStatistics_sync( const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("GetStatistics", cancellable, base, timeout_msec); guint32 out_crawledCount; Glib::Variant out_crawledCount_v; wrapped.get_child(out_crawledCount_v, 0); out_crawledCount = out_crawledCount_v.get(); guint32 out_docsCount; Glib::Variant out_docsCount_v; wrapped.get_child(out_docsCount_v, 1); out_docsCount = out_docsCount_v.get(); bool out_lowDiskSpace; Glib::Variant out_lowDiskSpace_v; wrapped.get_child(out_lowDiskSpace_v, 2); out_lowDiskSpace = out_lowDiskSpace_v.get(); bool out_onBattery; Glib::Variant out_onBattery_v; wrapped.get_child(out_onBattery_v, 3); 
out_onBattery = out_onBattery_v.get(); bool out_crawling; Glib::Variant out_crawling_v; wrapped.get_child(out_crawling_v, 4); out_crawling = out_crawling_v.get(); return std::make_tuple( std::move(out_crawledCount), std::move(out_docsCount), std::move(out_lowDiskSpace), std::move(out_onBattery), std::move(out_crawling) ); } /** * Instructs the daemon program to reload the configuration file. * reloading: TRUE if the configuration is being reloaded */ void com::github::fabricecolin::PinotProxy::Reload( const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; m_proxy->call("Reload", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::Reload_finish( bool &out_reloading, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_reloading_v; wrapped.get_child(out_reloading_v, 0); out_reloading = out_reloading_v.get(); } bool com::github::fabricecolin::PinotProxy::Reload_sync( const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("Reload", cancellable, base, timeout_msec); bool out_reloading; Glib::Variant out_reloading_v; wrapped.get_child(out_reloading_v, 0); out_reloading = out_reloading_v.get(); return out_reloading; } /** * Stops the daemon program. 
* exitStatus: the daemon's exit status */ void com::github::fabricecolin::PinotProxy::Stop( const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; m_proxy->call("Stop", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::Stop_finish( gint32 &out_exitStatus, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_exitStatus_v; wrapped.get_child(out_exitStatus_v, 0); out_exitStatus = out_exitStatus_v.get(); } gint32 com::github::fabricecolin::PinotProxy::Stop_sync( const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("Stop", cancellable, base, timeout_msec); gint32 out_exitStatus; Glib::Variant out_exitStatus_v; wrapped.get_child(out_exitStatus_v, 0); out_exitStatus = out_exitStatus_v.get(); return out_exitStatus; } /** * Returns a document's properties. 
* docId: the document's ID * fields: array of (s name, s value) structures with name one of * "caption", "url", "type", "language", "modtime", "size", "extract" */ void com::github::fabricecolin::PinotProxy::GetDocumentInfo( guint32 arg_docId, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::GetDocumentInfo_pack( arg_docId); m_proxy->call("GetDocumentInfo", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::GetDocumentInfo_finish( std::vector> &out_fields, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant>> out_fields_v; wrapped.get_child(out_fields_v, 0); out_fields = out_fields_v.get(); } std::vector> com::github::fabricecolin::PinotProxy::GetDocumentInfo_sync( guint32 arg_docId, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::GetDocumentInfo_pack( arg_docId); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("GetDocumentInfo", cancellable, base, timeout_msec); std::vector> out_fields; Glib::Variant>> out_fields_v; wrapped.get_child(out_fields_v, 0); out_fields = out_fields_v.get(); return out_fields; } /** * Returns a document's terms count. 
* docId: the document's ID * count: the terms count */ void com::github::fabricecolin::PinotProxy::GetDocumentTermsCount( guint32 arg_docId, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::GetDocumentTermsCount_pack( arg_docId); m_proxy->call("GetDocumentTermsCount", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::GetDocumentTermsCount_finish( guint32 &out_count, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_count_v; wrapped.get_child(out_count_v, 0); out_count = out_count_v.get(); } guint32 com::github::fabricecolin::PinotProxy::GetDocumentTermsCount_sync( guint32 arg_docId, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::GetDocumentTermsCount_pack( arg_docId); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("GetDocumentTermsCount", cancellable, base, timeout_msec); guint32 out_count; Glib::Variant out_count_v; wrapped.get_child(out_count_v, 0); out_count = out_count_v.get(); return out_count; } /** * Returns a document's terms. 
* docId: the document's ID * terms: array of terms */ void com::github::fabricecolin::PinotProxy::GetDocumentTerms( guint32 arg_docId, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::GetDocumentTerms_pack( arg_docId); m_proxy->call("GetDocumentTerms", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::GetDocumentTerms_finish( std::vector &out_terms, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant> out_terms_v; wrapped.get_child(out_terms_v, 0); out_terms = out_terms_v.get(); } std::vector com::github::fabricecolin::PinotProxy::GetDocumentTerms_sync( guint32 arg_docId, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::GetDocumentTerms_pack( arg_docId); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("GetDocumentTerms", cancellable, base, timeout_msec); std::vector out_terms; Glib::Variant> out_terms_v; wrapped.get_child(out_terms_v, 0); out_terms = out_terms_v.get(); return out_terms; } /** * Gets the list of known labels. 
* labels: array of labels */ void com::github::fabricecolin::PinotProxy::GetLabels( const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; m_proxy->call("GetLabels", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::GetLabels_finish( std::vector &out_labels, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant> out_labels_v; wrapped.get_child(out_labels_v, 0); out_labels = out_labels_v.get(); } std::vector com::github::fabricecolin::PinotProxy::GetLabels_sync( const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("GetLabels", cancellable, base, timeout_msec); std::vector out_labels; Glib::Variant> out_labels_v; wrapped.get_child(out_labels_v, 0); out_labels = out_labels_v.get(); return out_labels; } /** * Adds a label. * label: the name of the new label */ void com::github::fabricecolin::PinotProxy::AddLabel( const Glib::ustring & arg_label, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::AddLabel_pack( arg_label); m_proxy->call("AddLabel", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::AddLabel_finish( Glib::ustring &out_label, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_label_v; wrapped.get_child(out_label_v, 0); out_label = out_label_v.get(); } Glib::ustring com::github::fabricecolin::PinotProxy::AddLabel_sync( const Glib::ustring & arg_label, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::AddLabel_pack( arg_label); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("AddLabel", cancellable, base, 
timeout_msec); Glib::ustring out_label; Glib::Variant out_label_v; wrapped.get_child(out_label_v, 0); out_label = out_label_v.get(); return out_label; } /** * Deletes all references to a label. * label: the name of the label to delete */ void com::github::fabricecolin::PinotProxy::DeleteLabel( const Glib::ustring & arg_label, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::DeleteLabel_pack( arg_label); m_proxy->call("DeleteLabel", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::DeleteLabel_finish( Glib::ustring &out_label, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_label_v; wrapped.get_child(out_label_v, 0); out_label = out_label_v.get(); } Glib::ustring com::github::fabricecolin::PinotProxy::DeleteLabel_sync( const Glib::ustring & arg_label, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::DeleteLabel_pack( arg_label); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("DeleteLabel", cancellable, base, timeout_msec); Glib::ustring out_label; Glib::Variant out_label_v; wrapped.get_child(out_label_v, 0); out_label = out_label_v.get(); return out_label; } /** * Determines whether a document has a label. 
* docId: the document's ID * label: the label to check */ void com::github::fabricecolin::PinotProxy::HasLabel( guint32 arg_docId, const Glib::ustring & arg_label, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::HasLabel_pack( arg_docId, arg_label); m_proxy->call("HasLabel", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::HasLabel_finish( guint32 &out_docId, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_docId_v; wrapped.get_child(out_docId_v, 0); out_docId = out_docId_v.get(); } guint32 com::github::fabricecolin::PinotProxy::HasLabel_sync( guint32 arg_docId, const Glib::ustring & arg_label, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::HasLabel_pack( arg_docId, arg_label); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("HasLabel", cancellable, base, timeout_msec); guint32 out_docId; Glib::Variant out_docId_v; wrapped.get_child(out_docId_v, 0); out_docId = out_docId_v.get(); return out_docId; } /** * Returns a document's labels. 
* docId: the document's ID * labels: array of labels applied to the document */ void com::github::fabricecolin::PinotProxy::GetDocumentLabels( guint32 arg_docId, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::GetDocumentLabels_pack( arg_docId); m_proxy->call("GetDocumentLabels", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::GetDocumentLabels_finish( std::vector &out_labels, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant> out_labels_v; wrapped.get_child(out_labels_v, 0); out_labels = out_labels_v.get(); } std::vector com::github::fabricecolin::PinotProxy::GetDocumentLabels_sync( guint32 arg_docId, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::GetDocumentLabels_pack( arg_docId); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("GetDocumentLabels", cancellable, base, timeout_msec); std::vector out_labels; Glib::Variant> out_labels_v; wrapped.get_child(out_labels_v, 0); out_labels = out_labels_v.get(); return out_labels; } /** * Sets a document's labels. 
* docId: the document's ID * labels: array of labels to apply to the document * resetLabels: TRUE if existing labels should be unset */ void com::github::fabricecolin::PinotProxy::SetDocumentLabels( guint32 arg_docId, const std::vector & arg_labels, bool arg_resetLabels, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::SetDocumentLabels_pack( arg_docId, arg_labels, arg_resetLabels); m_proxy->call("SetDocumentLabels", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::SetDocumentLabels_finish( guint32 &out_docId, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_docId_v; wrapped.get_child(out_docId_v, 0); out_docId = out_docId_v.get(); } guint32 com::github::fabricecolin::PinotProxy::SetDocumentLabels_sync( guint32 arg_docId, const std::vector & arg_labels, bool arg_resetLabels, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::SetDocumentLabels_pack( arg_docId, arg_labels, arg_resetLabels); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("SetDocumentLabels", cancellable, base, timeout_msec); guint32 out_docId; Glib::Variant out_docId_v; wrapped.get_child(out_docId_v, 0); out_docId = out_docId_v.get(); return out_docId; } /** * Sets documents' labels. 
* docIds: array of document IDs * labels: array of labels to apply to the documents * resetLabels: TRUE if existing labels should be unset */ void com::github::fabricecolin::PinotProxy::SetDocumentsLabels( const std::vector & arg_docIds, const std::vector & arg_labels, bool arg_resetLabels, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::SetDocumentsLabels_pack( arg_docIds, arg_labels, arg_resetLabels); m_proxy->call("SetDocumentsLabels", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::SetDocumentsLabels_finish( bool &out_status, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_status_v; wrapped.get_child(out_status_v, 0); out_status = out_status_v.get(); } bool com::github::fabricecolin::PinotProxy::SetDocumentsLabels_sync( const std::vector & arg_docIds, const std::vector & arg_labels, bool arg_resetLabels, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::SetDocumentsLabels_pack( arg_docIds, arg_labels, arg_resetLabels); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("SetDocumentsLabels", cancellable, base, timeout_msec); bool out_status; Glib::Variant out_status_v; wrapped.get_child(out_status_v, 0); out_status = out_status_v.get(); return out_status; } /** * Checks whether the given URL is in the index. 
* docId: the document's ID */ void com::github::fabricecolin::PinotProxy::HasDocument( const Glib::ustring & arg_url, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::HasDocument_pack( arg_url); m_proxy->call("HasDocument", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::HasDocument_finish( guint32 &out_docId, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_docId_v; wrapped.get_child(out_docId_v, 0); out_docId = out_docId_v.get(); } guint32 com::github::fabricecolin::PinotProxy::HasDocument_sync( const Glib::ustring & arg_url, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::HasDocument_pack( arg_url); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("HasDocument", cancellable, base, timeout_msec); guint32 out_docId; Glib::Variant out_docId_v; wrapped.get_child(out_docId_v, 0); out_docId = out_docId_v.get(); return out_docId; } /** * Gets terms with the same root. 
* term: the base term * terms: array of suggested terms */ void com::github::fabricecolin::PinotProxy::GetCloseTerms( const Glib::ustring & arg_term, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::GetCloseTerms_pack( arg_term); m_proxy->call("GetCloseTerms", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::GetCloseTerms_finish( std::vector &out_terms, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant> out_terms_v; wrapped.get_child(out_terms_v, 0); out_terms = out_terms_v.get(); } std::vector com::github::fabricecolin::PinotProxy::GetCloseTerms_sync( const Glib::ustring & arg_term, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::GetCloseTerms_pack( arg_term); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("GetCloseTerms", cancellable, base, timeout_msec); std::vector out_terms; Glib::Variant> out_terms_v; wrapped.get_child(out_terms_v, 0); out_terms = out_terms_v.get(); return out_terms; } /** * Returns the number of documents. 
 * label: a label name * count: the documents count */ void com::github::fabricecolin::PinotProxy::GetDocumentsCount( const Glib::ustring & arg_label, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::GetDocumentsCount_pack( arg_label); m_proxy->call("GetDocumentsCount", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::GetDocumentsCount_finish( guint32 &out_count, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_count_v; wrapped.get_child(out_count_v, 0); out_count = out_count_v.get(); } guint32 com::github::fabricecolin::PinotProxy::GetDocumentsCount_sync( const Glib::ustring & arg_label, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::GetDocumentsCount_pack( arg_label); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("GetDocumentsCount", cancellable, base, timeout_msec); guint32 out_count; Glib::Variant out_count_v; wrapped.get_child(out_count_v, 0); out_count = out_count_v.get(); return out_count; } /** * Lists documents.
 * term: the term to optionally filter documents with * termType: the term type * maxCount: the maximum count * startOffset: the start offset * docIds: array of document IDs */ void com::github::fabricecolin::PinotProxy::ListDocuments( const Glib::ustring & arg_term, guint32 arg_termType, guint32 arg_maxCount, guint32 arg_startOffset, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::ListDocuments_pack( arg_term, arg_termType, arg_maxCount, arg_startOffset); m_proxy->call("ListDocuments", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::ListDocuments_finish( std::vector &out_docIds, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant> out_docIds_v; wrapped.get_child(out_docIds_v, 0); out_docIds = out_docIds_v.get(); } std::vector com::github::fabricecolin::PinotProxy::ListDocuments_sync( const Glib::ustring & arg_term, guint32 arg_termType, guint32 arg_maxCount, guint32 arg_startOffset, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::ListDocuments_pack( arg_term, arg_termType, arg_maxCount, arg_startOffset); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("ListDocuments", cancellable, base, timeout_msec); std::vector out_docIds; Glib::Variant> out_docIds_v; wrapped.get_child(out_docIds_v, 0); out_docIds = out_docIds_v.get(); return out_docIds; } /** * Updates the given document.
* docId: the document's ID */ void com::github::fabricecolin::PinotProxy::UpdateDocument( guint32 arg_docId, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::UpdateDocument_pack( arg_docId); m_proxy->call("UpdateDocument", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::UpdateDocument_finish( guint32 &out_docId, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_docId_v; wrapped.get_child(out_docId_v, 0); out_docId = out_docId_v.get(); } guint32 com::github::fabricecolin::PinotProxy::UpdateDocument_sync( guint32 arg_docId, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::UpdateDocument_pack( arg_docId); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("UpdateDocument", cancellable, base, timeout_msec); guint32 out_docId; Glib::Variant out_docId_v; wrapped.get_child(out_docId_v, 0); out_docId = out_docId_v.get(); return out_docId; } /** * Sets a document's properties. 
* docId: the document's ID * fields: array of (s name, s value) structures with name one of * "caption", "url", "type", "language", "modtime", "size", "extract" */ void com::github::fabricecolin::PinotProxy::SetDocumentInfo( guint32 arg_docId, const std::vector> & arg_fields, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::SetDocumentInfo_pack( arg_docId, arg_fields); m_proxy->call("SetDocumentInfo", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::SetDocumentInfo_finish( guint32 &out_docId, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_docId_v; wrapped.get_child(out_docId_v, 0); out_docId = out_docId_v.get(); } guint32 com::github::fabricecolin::PinotProxy::SetDocumentInfo_sync( guint32 arg_docId, const std::vector> & arg_fields, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::SetDocumentInfo_pack( arg_docId, arg_fields); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("SetDocumentInfo", cancellable, base, timeout_msec); guint32 out_docId; Glib::Variant out_docId_v; wrapped.get_child(out_docId_v, 0); out_docId = out_docId_v.get(); return out_docId; } /** * Queries the index. * engineType : engine type (defaults to "xapian"). See pinot-search(1) for a list of supported types * engineName : engine name (defaults to "~/.pinot/daemon"). 
See pinot-search(1) for examples * searchText : search text, as would be entered in Pinot's live query field * startDoc: the first result to return, starting from 0 * maxHits: the maximum number of hits desired * estimatedHits: an estimate of the total number of hits * hitsList: hit properties */ void com::github::fabricecolin::PinotProxy::Query( const Glib::ustring & arg_engineType, const Glib::ustring & arg_engineName, const Glib::ustring & arg_searchText, guint32 arg_startDoc, guint32 arg_maxHits, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::Query_pack( arg_engineType, arg_engineName, arg_searchText, arg_startDoc, arg_maxHits); m_proxy->call("Query", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::Query_finish( guint32 &out_estimatedHits, std::vector>> &out_hitsList, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant out_estimatedHits_v; wrapped.get_child(out_estimatedHits_v, 0); out_estimatedHits = out_estimatedHits_v.get(); Glib::Variant>>> out_hitsList_v; wrapped.get_child(out_hitsList_v, 1); out_hitsList = out_hitsList_v.get(); } std::tuple>>> com::github::fabricecolin::PinotProxy::Query_sync( const Glib::ustring & arg_engineType, const Glib::ustring & arg_engineName, const Glib::ustring & arg_searchText, guint32 arg_startDoc, guint32 arg_maxHits, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::Query_pack( arg_engineType, arg_engineName, arg_searchText, arg_startDoc, arg_maxHits); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("Query", cancellable, base, timeout_msec); guint32 out_estimatedHits; Glib::Variant out_estimatedHits_v; wrapped.get_child(out_estimatedHits_v, 0); out_estimatedHits = out_estimatedHits_v.get(); std::vector>> out_hitsList; Glib::Variant>>> 
out_hitsList_v; wrapped.get_child(out_hitsList_v, 1); out_hitsList = out_hitsList_v.get(); return std::make_tuple( std::move(out_estimatedHits), std::move(out_hitsList) ); } /** * Queries the index. * searchText : search text, as would be entered in Pinot's live query field * maxHits: the maximum number of hits desired * docIds: array of document IDs * docIdsCount: the number of document IDs in the array * WARNING: this method is obsolete */ void com::github::fabricecolin::PinotProxy::SimpleQuery( const Glib::ustring & arg_searchText, guint32 arg_maxHits, const Gio::SlotAsyncReady &callback, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::SimpleQuery_pack( arg_searchText, arg_maxHits); m_proxy->call("SimpleQuery", callback, cancellable, base, timeout_msec); } void com::github::fabricecolin::PinotProxy::SimpleQuery_finish( std::vector &out_docIds, const Glib::RefPtr &result) { Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_finish(result); Glib::Variant> out_docIds_v; wrapped.get_child(out_docIds_v, 0); out_docIds = out_docIds_v.get(); } std::vector com::github::fabricecolin::PinotProxy::SimpleQuery_sync( const Glib::ustring & arg_searchText, guint32 arg_maxHits, const Glib::RefPtr &cancellable, int timeout_msec) { Glib::VariantContainerBase base; base = PinotTypeWrap::SimpleQuery_pack( arg_searchText, arg_maxHits); Glib::VariantContainerBase wrapped; wrapped = m_proxy->call_sync("SimpleQuery", cancellable, base, timeout_msec); std::vector out_docIds; Glib::Variant> out_docIds_v; wrapped.get_child(out_docIds_v, 0); out_docIds = out_docIds_v.get(); return out_docIds; } Glib::ustring com::github::fabricecolin::PinotProxy::DaemonVersion_get(bool *ok) { Glib::Variant b; m_proxy->get_cached_property(b, "DaemonVersion"); if (b) { if (ok) { *ok = true; } return (specialGetter(b)); } else { if (ok) { *ok = false; } else { g_warning("Unhandled error while getting property DaemonVersion"); } return 
Glib::ustring(); } } guint32 com::github::fabricecolin::PinotProxy::IndexFlushEpoch_get(bool *ok) { Glib::Variant b; m_proxy->get_cached_property(b, "IndexFlushEpoch"); if (b) { if (ok) { *ok = true; } return (specialGetter(b)); } else { if (ok) { *ok = false; } else { g_warning("Unhandled error while getting property IndexFlushEpoch"); } return guint32(); } } void com::github::fabricecolin::PinotProxy::handle_signal(const Glib::ustring&/* sender_name */, const Glib::ustring& signal_name, const Glib::VariantContainerBase& parameters) { static_cast(signal_name); // maybe unused static_cast(parameters); // maybe unused } void com::github::fabricecolin::PinotProxy::handle_properties_changed( const Gio::DBus::Proxy::MapChangedProperties &changed_properties, const std::vector &/* invalidated_properties */) { static_cast(changed_properties); // maybe unused // Only check changed_properties since value will already be cached. Glib can be setup to get // values of invalidated properties in which case property will be in changed_properties when // value is actually received. See Gio::DBus::ProxyFlags::PROXY_FLAGS_GET_INVALIDATED_PROPERTIES . if (changed_properties.find("DaemonVersion") != changed_properties.cend()) m_DaemonVersion_changed.emit(); if (changed_properties.find("IndexFlushEpoch") != changed_properties.cend()) m_IndexFlushEpoch_changed.emit(); } com::github::fabricecolin::PinotProxy::PinotProxy(const Glib::RefPtr &proxy) : m_proxy(proxy) { m_proxy->signal_signal().connect(sigc::mem_fun(this, &PinotProxy::handle_signal)); m_proxy->signal_properties_changed(). 
connect(sigc::mem_fun(this, &PinotProxy::handle_properties_changed)); } void com::github::fabricecolin::PinotProxy::createForBus( Gio::DBus::BusType busType, Gio::DBus::ProxyFlags proxyFlags, const std::string &name, const std::string &objectPath, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable) { Gio::DBus::Proxy::create_for_bus(busType, name, objectPath, "com.github.fabricecolin.Pinot", slot, cancellable, Glib::RefPtr(), proxyFlags); } Glib::RefPtr com::github::fabricecolin::PinotProxy::createForBusFinish(const Glib::RefPtr &result) { Glib::RefPtr proxy = Gio::DBus::Proxy::create_for_bus_finish(result); com::github::fabricecolin::PinotProxy *p = new com::github::fabricecolin::PinotProxy(proxy); return Glib::RefPtr(p); } Glib::RefPtr com::github::fabricecolin::PinotProxy::createForBus_sync( Gio::DBus::BusType busType, Gio::DBus::ProxyFlags proxyFlags, const std::string &name, const std::string &objectPath, const Glib::RefPtr &cancellable) { Glib::RefPtr proxy = Gio::DBus::Proxy::create_for_bus_sync(busType, name, objectPath, "com.github.fabricecolin.Pinot", cancellable, Glib::RefPtr(), proxyFlags); com::github::fabricecolin::PinotProxy *p = new com::github::fabricecolin::PinotProxy(proxy); return Glib::RefPtr(p); } pinot-1.22/IndexSearch/PinotDBus_proxy.h000066400000000000000000000354351470740426600202710ustar00rootroot00000000000000#pragma once #include #include #include #include #include #include "PinotDBus_common.h" namespace org { namespace freedesktop { namespace DBus { class IntrospectableProxy : public Glib::ObjectBase { public: static void createForBus(Gio::DBus::BusType busType, Gio::DBus::ProxyFlags proxyFlags, const std::string &name, const std::string &objectPath, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}); static Glib::RefPtr createForBusFinish (const Glib::RefPtr &result); static Glib::RefPtr createForBus_sync( Gio::DBus::BusType busType, Gio::DBus::ProxyFlags proxyFlags, const std::string &name, const 
std::string &objectPath, const Glib::RefPtr &cancellable = {}); Glib::RefPtr dbusProxy() const { return m_proxy; } void Introspect( const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void Introspect_finish ( Glib::ustring &data, const Glib::RefPtr &res); Glib::ustring Introspect_sync( const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void reference() const override {} void unreference() const override {} protected: Glib::RefPtr m_proxy; private: IntrospectableProxy(const Glib::RefPtr &proxy); void handle_signal(const Glib::ustring &sender_name, const Glib::ustring &signal_name, const Glib::VariantContainerBase &parameters); void handle_properties_changed(const Gio::DBus::Proxy::MapChangedProperties &changed_properties, const std::vector &invalidated_properties); }; }// DBus }// freedesktop }// org namespace com { namespace github { namespace fabricecolin { class PinotProxy : public Glib::ObjectBase { public: static void createForBus(Gio::DBus::BusType busType, Gio::DBus::ProxyFlags proxyFlags, const std::string &name, const std::string &objectPath, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}); static Glib::RefPtr createForBusFinish (const Glib::RefPtr &result); static Glib::RefPtr createForBus_sync( Gio::DBus::BusType busType, Gio::DBus::ProxyFlags proxyFlags, const std::string &name, const std::string &objectPath, const Glib::RefPtr &cancellable = {}); Glib::RefPtr dbusProxy() const { return m_proxy; } void GetStatistics( const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetStatistics_finish ( guint32 &crawledCount, guint32 &docsCount, bool &lowDiskSpace, bool &onBattery, bool &crawling, const Glib::RefPtr &res); std::tuple GetStatistics_sync( const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void Reload( const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void Reload_finish ( bool 
&reloading, const Glib::RefPtr &res); bool Reload_sync( const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void Stop( const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void Stop_finish ( gint32 &exitStatus, const Glib::RefPtr &res); gint32 Stop_sync( const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetDocumentInfo( guint32 docId, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetDocumentInfo_finish ( std::vector> &fields, const Glib::RefPtr &res); std::vector> GetDocumentInfo_sync( guint32 docId,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetDocumentTermsCount( guint32 docId, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetDocumentTermsCount_finish ( guint32 &count, const Glib::RefPtr &res); guint32 GetDocumentTermsCount_sync( guint32 docId,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetDocumentTerms( guint32 docId, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetDocumentTerms_finish ( std::vector &terms, const Glib::RefPtr &res); std::vector GetDocumentTerms_sync( guint32 docId,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetLabels( const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetLabels_finish ( std::vector &labels, const Glib::RefPtr &res); std::vector GetLabels_sync( const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void AddLabel( const Glib::ustring & label, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void AddLabel_finish ( Glib::ustring &label, const Glib::RefPtr &res); Glib::ustring AddLabel_sync( const Glib::ustring & label,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void DeleteLabel( const Glib::ustring & label, const 
Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void DeleteLabel_finish ( Glib::ustring &label, const Glib::RefPtr &res); Glib::ustring DeleteLabel_sync( const Glib::ustring & label,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void HasLabel( guint32 docId, const Glib::ustring & label, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void HasLabel_finish ( guint32 &docId, const Glib::RefPtr &res); guint32 HasLabel_sync( guint32 docId, const Glib::ustring & label,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetDocumentLabels( guint32 docId, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetDocumentLabels_finish ( std::vector &labels, const Glib::RefPtr &res); std::vector GetDocumentLabels_sync( guint32 docId,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void SetDocumentLabels( guint32 docId, const std::vector & labels, bool resetLabels, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void SetDocumentLabels_finish ( guint32 &docId, const Glib::RefPtr &res); guint32 SetDocumentLabels_sync( guint32 docId, const std::vector & labels, bool resetLabels,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void SetDocumentsLabels( const std::vector & docIds, const std::vector & labels, bool resetLabels, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void SetDocumentsLabels_finish ( bool &status, const Glib::RefPtr &res); bool SetDocumentsLabels_sync( const std::vector & docIds, const std::vector & labels, bool resetLabels,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void HasDocument( const Glib::ustring & url, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void HasDocument_finish ( guint32 &docId, const Glib::RefPtr 
&res); guint32 HasDocument_sync( const Glib::ustring & url,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetCloseTerms( const Glib::ustring & term, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetCloseTerms_finish ( std::vector &terms, const Glib::RefPtr &res); std::vector GetCloseTerms_sync( const Glib::ustring & term,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetDocumentsCount( const Glib::ustring & label, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void GetDocumentsCount_finish ( guint32 &count, const Glib::RefPtr &res); guint32 GetDocumentsCount_sync( const Glib::ustring & label,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void ListDocuments( const Glib::ustring & term, guint32 termType, guint32 maxCount, guint32 startOffset, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void ListDocuments_finish ( std::vector &docIds, const Glib::RefPtr &res); std::vector ListDocuments_sync( const Glib::ustring & term, guint32 termType, guint32 maxCount, guint32 startOffset,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void UpdateDocument( guint32 docId, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void UpdateDocument_finish ( guint32 &docId, const Glib::RefPtr &res); guint32 UpdateDocument_sync( guint32 docId,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void SetDocumentInfo( guint32 docId, const std::vector> & fields, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void SetDocumentInfo_finish ( guint32 &docId, const Glib::RefPtr &res); guint32 SetDocumentInfo_sync( guint32 docId, const std::vector> & fields,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void Query( const Glib::ustring & engineType, const Glib::ustring & 
engineName, const Glib::ustring & searchText, guint32 startDoc, guint32 maxHits, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void Query_finish ( guint32 &estimatedHits, std::vector>> &hitsList, const Glib::RefPtr &res); std::tuple>>> Query_sync( const Glib::ustring & engineType, const Glib::ustring & engineName, const Glib::ustring & searchText, guint32 startDoc, guint32 maxHits,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void SimpleQuery( const Glib::ustring & searchText, guint32 maxHits, const Gio::SlotAsyncReady &slot, const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); void SimpleQuery_finish ( std::vector &docIds, const Glib::RefPtr &res); std::vector SimpleQuery_sync( const Glib::ustring & searchText, guint32 maxHits,const Glib::RefPtr &cancellable = {}, int timeout_msec = -1); Glib::ustring DaemonVersion_get(bool *ok = nullptr); sigc::signal &DaemonVersion_changed() { return m_DaemonVersion_changed; } guint32 IndexFlushEpoch_get(bool *ok = nullptr); sigc::signal &IndexFlushEpoch_changed() { return m_IndexFlushEpoch_changed; } void reference() const override {} void unreference() const override {} protected: Glib::RefPtr m_proxy; private: PinotProxy(const Glib::RefPtr &proxy); void handle_signal(const Glib::ustring &sender_name, const Glib::ustring &signal_name, const Glib::VariantContainerBase &parameters); void handle_properties_changed(const Gio::DBus::Proxy::MapChangedProperties &changed_properties, const std::vector &invalidated_properties); sigc::signal m_DaemonVersion_changed; sigc::signal m_IndexFlushEpoch_changed; }; }// fabricecolin }// github }// com pinot-1.22/IndexSearch/PluginParsers.h000066400000000000000000000034271470740426600177530ustar00rootroot00000000000000/* * Copyright 2005-2009 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software 
Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _PLUGIN_PARSERS_H #define _PLUGIN_PARSERS_H #include #include #include "Document.h" #include "SearchPluginProperties.h" /// Interface implemented by response parsers. class ResponseParserInterface { public: virtual ~ResponseParserInterface() { } /// Parses the response; false if not all could be parsed. virtual bool parse(const Document *pResponseDoc, std::vector &resultsList, unsigned int &totalResults, unsigned int &firstResultIndex, std::string &charset) const = 0; protected: ResponseParserInterface() { } }; /// Interface implemented by plugin parsers. class PluginParserInterface { public: virtual ~PluginParserInterface() { } /// Parses the plugin and returns a response parser. virtual ResponseParserInterface *parse(SearchPluginProperties &properties, bool minimal = false) = 0; protected: std::string m_fileName; PluginParserInterface(const std::string &fileName) : m_fileName(fileName) { } }; #endif // _PLUGIN_PARSERS_H pinot-1.22/IndexSearch/PluginWebEngine.cpp000066400000000000000000000235631470740426600205350ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include "config.h" #include "Document.h" #include "StringManip.h" #include "Url.h" #include "OpenSearchParser.h" #ifdef HAVE_BOOST_SPIRIT #include "SherlockParser.h" #endif #include "PluginWebEngine.h" using std::clog; using std::endl; PluginWebEngine::PluginWebEngine(const string &fileName) : WebEngine(), m_pResponseParser(NULL) { load(fileName); } PluginWebEngine::~PluginWebEngine() { if (m_pResponseParser != NULL) { delete m_pResponseParser; } } void PluginWebEngine::load(const string &fileName) { if (fileName.empty() == true) { return; } string pluginType; PluginParserInterface *pParser = getPluginParser(fileName, pluginType); if (pParser == NULL) { return; } m_pResponseParser = pParser->parse(m_properties); delete pParser; } bool PluginWebEngine::getPage(const string &formattedQuery, unsigned int maxResultsCount) { if ((m_pResponseParser == NULL) || (formattedQuery.empty() == true)) { return false; } DocumentInfo docInfo("Results Page", formattedQuery, "text/html", ""); Document *pResponseDoc = downloadPage(docInfo); if (pResponseDoc == NULL) { clog << "PluginWebEngine::getPage: couldn't download " << formattedQuery << endl; return false; } off_t contentLen; const char *pContent = pResponseDoc->getData(contentLen); if ((pContent == NULL) || (contentLen == 0)) { #ifdef DEBUG clog << "PluginWebEngine::getPage: downloaded empty page" << endl; #endif delete pResponseDoc; return false; } #ifdef DEBUG Url urlObj(formattedQuery); string fileName(urlObj.getHost() + 
"_PluginWebEngine.html"); ofstream pageBackup(fileName.c_str()); pageBackup.write(pContent, contentLen); pageBackup.close(); #endif string responseCharset; bool success = m_pResponseParser->parse(pResponseDoc, m_resultsList, maxResultsCount, m_properties.m_nextBase, responseCharset); if (m_charset.empty() == true) { m_charset = responseCharset; #ifdef DEBUG clog << "PluginWebEngine::getPage: page charset is " << m_charset << endl; #endif } vector::iterator resultIter = m_resultsList.begin(); while (resultIter != m_resultsList.end()) { if (processResult(formattedQuery, *resultIter) == false) { // Remove this result and carry on with the one that follows it resultIter = m_resultsList.erase(resultIter); } else { // Next ++resultIter; } } delete pResponseDoc; return success; } PluginParserInterface *PluginWebEngine::getPluginParser(const string &fileName, string &pluginType) { if (fileName.empty() == true) { return NULL; } // What type of plugin is it ? 
// Look at the file extension string::size_type pos = fileName.find_last_of("."); if (pos == string::npos) { // No way to tell return NULL; } string extension(fileName.substr(pos + 1)); #ifdef HAVE_BOOST_SPIRIT if (strncasecmp(extension.c_str(), "src", 3) == 0) { pluginType = "sherlock"; return new SherlockParser(fileName); } else #endif if (strncasecmp(extension.c_str(), "xml", 3) == 0) { pluginType = "opensearch"; return new OpenSearchParser(fileName); } return NULL; } bool PluginWebEngine::getDetails(const string &fileName, SearchPluginProperties &properties) { if (fileName.empty() == true) { return false; } properties.m_option = fileName; PluginParserInterface *pParser = getPluginParser(fileName, properties.m_name); if (pParser == NULL) { return false; } ResponseParserInterface *pResponseParser = pParser->parse(properties, true); if (pResponseParser == NULL) { clog << "PluginWebEngine::getDetails: couldn't parse " << fileName << endl; delete pParser; return false; } delete pResponseParser; delete pParser; if (properties.m_response == SearchPluginProperties::UNKNOWN_RESPONSE) { #ifdef DEBUG clog << "PluginWebEngine::getDetails: bad response type for " << fileName << endl; #endif return false; } return true; } // // Implementation of SearchEngineInterface // /// Runs a query; true if success. 
bool PluginWebEngine::runQuery(QueryProperties& queryProps, unsigned int startDoc) { string queryString(queryProps.getFreeQuery(true)); char countStr[64]; unsigned int maxResultsCount(queryProps.getMaximumResultsCount()); unsigned int currentIncrement = 0, count = 0; bool firstPage = true; m_resultsList.clear(); m_resultsCountEstimate = 0; if (queryString.empty() == true) { #ifdef DEBUG clog << "PluginWebEngine::runQuery: query is empty" << endl; #endif return false; } string formattedQuery = m_properties.m_baseUrl; map::iterator paramIter = m_properties.m_variableParameters.find(SearchPluginProperties::SEARCH_TERMS_PARAM); if (paramIter != m_properties.m_variableParameters.end()) { formattedQuery += "?"; formattedQuery += paramIter->second; formattedQuery += "="; } #ifdef DEBUG else clog << "PluginWebEngine::runQuery: no user input tag" << endl; #endif formattedQuery += queryString; if (m_properties.m_remainder.empty() == false) { formattedQuery += "&"; formattedQuery += m_properties.m_remainder; } // Encodings ? paramIter = m_properties.m_variableParameters.find(SearchPluginProperties::OUTPUT_ENCODING_PARAM); if ((paramIter != m_properties.m_variableParameters.end()) && (paramIter->second.empty() == false)) { // Output encoding formattedQuery += "&"; formattedQuery += paramIter->second; formattedQuery += "=UTF-8"; } paramIter = m_properties.m_variableParameters.find(SearchPluginProperties::INPUT_ENCODING_PARAM); if ((paramIter != m_properties.m_variableParameters.end()) && (paramIter->second.empty() == false)) { // Input encoding formattedQuery += "&"; formattedQuery += paramIter->second; formattedQuery += "=UTF-8"; } // Editable parameters ? 
for (map::const_iterator editableIter = m_properties.m_editableParameters.begin(); editableIter != m_properties.m_editableParameters.end(); ++editableIter) { map::const_iterator valueIter = m_editableValues.find(editableIter->second); if (valueIter == m_editableValues.end()) { clog << "PluginWebEngine::runQuery: no value provided for plugin's editable parameter " << editableIter->second << endl; continue; } formattedQuery += "&"; formattedQuery += editableIter->first; formattedQuery += "="; formattedQuery += valueIter->second; } setQuery(queryProps); #ifdef DEBUG clog << "PluginWebEngine::runQuery: querying " << m_properties.m_longName << endl; #endif while (count < maxResultsCount) { string pageQuery(formattedQuery); bool canScroll = false; // How do we scroll ? if (m_properties.m_scrolling == SearchPluginProperties::PER_INDEX) { paramIter = m_properties.m_variableParameters.find(SearchPluginProperties::COUNT_PARAM); if ((paramIter != m_properties.m_variableParameters.end()) && (paramIter->second.empty() == false)) { // Number of results requested pageQuery += "&"; pageQuery += paramIter->second; pageQuery += "="; snprintf(countStr, 64, "%u", maxResultsCount); pageQuery += countStr; canScroll = true; } paramIter = m_properties.m_variableParameters.find(SearchPluginProperties::START_INDEX_PARAM); if ((paramIter != m_properties.m_variableParameters.end()) && (paramIter->second.empty() == false)) { // The offset of the first result (typically 1 or 0) pageQuery += "&"; pageQuery += paramIter->second; pageQuery += "="; snprintf(countStr, 64, "%u", count + m_properties.m_nextBase); pageQuery += countStr; canScroll = true; } } else { paramIter = m_properties.m_variableParameters.find(SearchPluginProperties::START_PAGE_PARAM); if ((paramIter != m_properties.m_variableParameters.end()) && (paramIter->second.empty() == false)) { // The offset of the page pageQuery += "&"; pageQuery += paramIter->second; pageQuery += "="; snprintf(countStr, 64, "%u", currentIncrement + 
m_properties.m_nextBase); pageQuery += countStr; canScroll = true; } } if ((firstPage == false) && (canScroll == false)) { #ifdef DEBUG clog << "PluginWebEngine::runQuery: can't scroll to the next page of results" << endl; #endif break; } firstPage = false; if (getPage(pageQuery, queryProps.getMaximumResultsCount()) == false) { break; } if (m_properties.m_nextIncrement == 0) { // That one page should have all the results... #ifdef DEBUG clog << "PluginWebEngine::runQuery: performed one off call" << endl; #endif break; } else { if (m_resultsList.size() < count + m_properties.m_nextIncrement) { // We got less than the maximum number of results per page // so there's no point in requesting the next page #ifdef DEBUG clog << "PluginWebEngine::runQuery: last page wasn't full" << endl; #endif break; } // Increase factor currentIncrement += m_properties.m_nextIncrement; } count = m_resultsList.size(); } m_resultsCountEstimate = m_resultsList.size(); return true; } pinot-1.22/IndexSearch/PluginWebEngine.h000066400000000000000000000035211470740426600201720ustar00rootroot00000000000000/* * Copyright 2005-2008 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #ifndef _PLUGIN_WEB_ENGINE_H #define _PLUGIN_WEB_ENGINE_H #include #include "PluginParsers.h" #include "SearchPluginProperties.h" #include "WebEngine.h" /// A plugin-based search engine. class PluginWebEngine : public WebEngine { public: PluginWebEngine(const std::string &fileName); virtual ~PluginWebEngine(); /// Utility method that returns a search plugin's name and channel. static bool getDetails(const std::string &fileName, SearchPluginProperties &properties); /// Runs a query; true if success. virtual bool runQuery(QueryProperties& queryProps, unsigned int startDoc = 0); protected: SearchPluginProperties m_properties; ResponseParserInterface *m_pResponseParser; void load(const std::string &fileName); bool getPage(const std::string &formattedQuery, unsigned int maxResultsCount); static PluginParserInterface *getPluginParser(const std::string &fileName, std::string &pluginType); private: PluginWebEngine(const PluginWebEngine &other); PluginWebEngine &operator=(const PluginWebEngine &other); }; #endif // _PLUGIN_WEB_ENGINE_H pinot-1.22/IndexSearch/Plugins/000077500000000000000000000000001470740426600164175ustar00rootroot00000000000000pinot-1.22/IndexSearch/Plugins/AmazonAPI.src000066400000000000000000000015161470740426600207120ustar00rootroot00000000000000# Amazon REST API Search Plugin # Edit this field's value to set to your subscription ID pinot-1.22/IndexSearch/Plugins/Arxiv.src000066400000000000000000000010111470740426600202120ustar00rootroot00000000000000# Arxiv Search Plugin pinot-1.22/IndexSearch/Plugins/OmegaDescription.xml000066400000000000000000000010761470740426600224010ustar00rootroot00000000000000 Xapian Omega The Xapian Omega CGI search frontend Omega is a CGI application which uses the Xapian Information Retrieval library to search collections of documents. 
Local Host pinot-1.22/IndexSearch/Plugins/UNData.src000066400000000000000000000004531470740426600202460ustar00rootroot00000000000000 pinot-1.22/IndexSearch/Plugins/Wikipedia.src000066400000000000000000000007601470740426600210410ustar00rootroot00000000000000# Wikipedia Search Plugin pinot-1.22/IndexSearch/QueryProperties.cpp000066400000000000000000000160121470740426600206640ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include "Document.h" #include "StringManip.h" #include "CJKVTokenizer.h" #include "QueryProperties.h" class FilterRemover : public Dijon::CJKVTokenizer::TokensHandler { public: FilterRemover(const string &freeQuery) : Dijon::CJKVTokenizer::TokensHandler(), m_freeQuery(freeQuery) { } virtual ~FilterRemover() { } virtual bool handle_token(const string &tok, bool is_cjkv) { if (tok.empty() == true) { return false; } // Is this CJKV ? 
if (is_cjkv == false) { if ((tok.find(':') != string::npos) || (tok.find("..") != string::npos)) { string::size_type tokPosStart = m_freeQuery.find(tok); // It's a filter or a range, remove it if (tokPosStart != string::npos) { string::size_type tokPosEnd = m_freeQuery.find(" ", tokPosStart); if (tokPosEnd != string::npos) { m_freeQuery.erase(tokPosStart, tokPosEnd - tokPosStart); } else { m_freeQuery.erase(tokPosStart); } } return true; } } return true; } string m_freeQuery; }; class SetTokensHandler : public Dijon::CJKVTokenizer::TokensHandler { public: SetTokensHandler(set &tokens) : Dijon::CJKVTokenizer::TokensHandler(), m_tokens(tokens) { } virtual ~SetTokensHandler() { } virtual bool handle_token(const string &tok, bool is_cjkv) { m_tokens.insert(tok); return true; } protected: set &m_tokens; }; QueryProperties::QueryProperties() : m_order(RELEVANCE), m_resultsCount(10), m_indexResults(NOTHING), m_modified(false) { } QueryProperties::QueryProperties(const string &name, const string &freeQuery) : m_name(name), m_order(RELEVANCE), m_freeQuery(freeQuery), m_resultsCount(10), m_indexResults(NOTHING), m_modified(false) { removeFilters(); } QueryProperties::QueryProperties(const QueryProperties &other) : m_name(other.m_name), m_order(other.m_order), m_language(other.m_language), m_freeQuery(other.m_freeQuery), m_freeQueryWithoutFilters(other.m_freeQueryWithoutFilters), m_resultsCount(other.m_resultsCount), m_indexResults(other.m_indexResults), m_labelName(other.m_labelName), m_modified(other.m_modified) { } QueryProperties::~QueryProperties() { } QueryProperties &QueryProperties::operator=(const QueryProperties &other) { if (this != &other) { m_name = other.m_name; m_order = other.m_order; m_language = other.m_language; m_freeQuery = other.m_freeQuery; m_freeQueryWithoutFilters = other.m_freeQueryWithoutFilters; m_resultsCount = other.m_resultsCount; m_indexResults = other.m_indexResults; m_labelName = other.m_labelName; m_modified = other.m_modified; } return 
*this; } bool QueryProperties::operator==(const QueryProperties &other) const { if (m_name == other.m_name) { return true; } return false; } bool QueryProperties::operator<(const QueryProperties &other) const { if (m_name < other.m_name) { return true; } return false; } void QueryProperties::removeFilters(void) { m_freeQueryWithoutFilters.clear(); StringManip::trimSpaces(m_freeQuery); if (m_freeQuery.empty() == true) { return; } // Remove carriage returns string noCR(StringManip::replaceSubString(m_freeQuery, "\r\n", "\n")); // If there's a line break right after a hyphen, remove both string dehyphenedOnNL(StringManip::replaceSubString(noCR, "-\n", "")); // ... and replace line breaks with spaces m_freeQuery = StringManip::replaceSubString(dehyphenedOnNL, "\n", " "); Dijon::CJKVTokenizer tokenizer; FilterRemover handler(m_freeQuery); tokenizer.tokenize(m_freeQuery, handler, true); m_freeQueryWithoutFilters = handler.m_freeQuery; StringManip::trimSpaces(m_freeQueryWithoutFilters); } /// Sets the name. void QueryProperties::setName(const string &name) { m_name = name; } /// Gets the name. string QueryProperties::getName(void) const { return m_name; } /// Sets the sort order. void QueryProperties::setSortOrder(SortOrder order) { m_order = order; } /// Gets the sort order. QueryProperties::SortOrder QueryProperties::getSortOrder(void) const { return m_order; } /// Sets the language to use for stemming. void QueryProperties::setStemmingLanguage(const string &language) { m_language = language; } /// Gets the language to use for stemming. string QueryProperties::getStemmingLanguage(void) const { return m_language; } /// Sets the query string. void QueryProperties::setFreeQuery(const string &freeQuery) { m_freeQuery = freeQuery; removeFilters(); } /// Gets the query string.
string QueryProperties::getFreeQuery(bool withoutFilters) const { if (withoutFilters == false) { return m_freeQuery; } #ifdef DEBUG clog << "QueryProperties::getFreeQuery: " << m_freeQueryWithoutFilters << endl; #endif return m_freeQueryWithoutFilters; } /// Sets the maximum number of results. void QueryProperties::setMaximumResultsCount(unsigned int count) { m_resultsCount = count; } /// Gets the maximum number of results. unsigned int QueryProperties::getMaximumResultsCount(void) const { return m_resultsCount; } /// Sets whether results should be indexed. void QueryProperties::setIndexResults(IndexWhat indexResults) { m_indexResults = indexResults; } /// Gets whether results should be indexed QueryProperties::IndexWhat QueryProperties::getIndexResults(void) const { return m_indexResults; } /// Sets the name of the label to use for indexed documents. void QueryProperties::setLabelName(const string &labelName) { m_labelName = labelName; } /// Gets the name of the label to use for indexed documents. string QueryProperties::getLabelName(void) const { return m_labelName; } /// Sets whether the query was modified in some way. void QueryProperties::setModified(bool isModified) { m_modified = isModified; } /// Gets whether the query was modified in some way. bool QueryProperties::getModified(void) const { return m_modified; } /// Returns the query's terms. void QueryProperties::getTerms(set &terms) const { Dijon::CJKVTokenizer tokenizer; SetTokensHandler handler(terms); terms.clear(); tokenizer.tokenize(m_freeQueryWithoutFilters, handler); } /// Returns whether the query is empty. 
bool QueryProperties::isEmpty() const { if (m_freeQuery.empty() == true) { return true; } return false; } pinot-1.22/IndexSearch/QueryProperties.h000066400000000000000000000062501470740426600203340ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _QUERY_PROPERTIES_H #define _QUERY_PROPERTIES_H #include #include #include "Visibility.h" using namespace std; /// This represents a query. class PINOT_EXPORT QueryProperties { public: typedef enum { RELEVANCE = 0, DATE_DESC, DATE_ASC, SIZE_DESC } SortOrder; typedef enum { NOTHING = 0, ALL_RESULTS, NEW_RESULTS } IndexWhat; QueryProperties(); QueryProperties(const string &name, const string &freeQuery); QueryProperties(const QueryProperties &other); ~QueryProperties(); QueryProperties &operator=(const QueryProperties &other); bool operator==(const QueryProperties &other) const; bool operator<(const QueryProperties &other) const; /// Sets the name. void setName(const string &name); /// Gets the name. string getName(void) const; /// Sets the sort order. void setSortOrder(SortOrder order); /// Gets the sort order. SortOrder getSortOrder(void) const; /// Sets the language to use for stemming. void setStemmingLanguage(const string &language); /// Gets the language to use for stemming. 
string getStemmingLanguage(void) const; /// Sets the query string. void setFreeQuery(const string &freeQuery); /// Gets the query string. string getFreeQuery(bool withoutFilters = false) const; /// Sets the maximum number of results. void setMaximumResultsCount(unsigned int count); /// Gets the maximum number of results. unsigned int getMaximumResultsCount(void) const; /// Sets whether results should be indexed. void setIndexResults(IndexWhat indexResults); /// Gets whether results should be indexed IndexWhat getIndexResults(void) const; /// Sets the name of the label to use for indexed documents. void setLabelName(const string &labelName); /// Gets the name of the label to use for indexed documents. string getLabelName(void) const; /// Sets whether the query was modified in some way. void setModified(bool isModified); /// Gets whether the query was modified in some way. bool getModified(void) const; /// Returns the query's terms. void getTerms(set &terms) const; /// Returns whether the query is empty. bool isEmpty() const; protected: string m_name; SortOrder m_order; string m_language; string m_freeQuery; string m_freeQueryWithoutFilters; unsigned int m_resultsCount; IndexWhat m_indexResults; string m_labelName; bool m_modified; void removeFilters(void); }; #endif // _QUERY_PROPERTIES_H pinot-1.22/IndexSearch/ResultsExporter.cpp000066400000000000000000000150631470740426600207010ustar00rootroot00000000000000/* * Copyright 2007-2024 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include "StringManip.h" #include "FilterUtils.h" #include "ResultsExporter.h" using std::string; using std::vector; using std::ofstream; using std::endl; using xmlpp::Element; static Element *addChildElement(Element *pElem, const string &nodeName, const string &nodeContent) { if (pElem == NULL) { return NULL; } Element *pSubElem = pElem->add_child_element(nodeName); if (pSubElem != NULL) { pSubElem->set_first_child_text(nodeContent); } return pSubElem; } ResultsExporter::ResultsExporter(const string &fileName, const QueryProperties &queryProps) : m_fileName(fileName), m_queryName(queryProps.getName()), m_queryDetails(queryProps.getFreeQuery()) { } ResultsExporter::~ResultsExporter() { } CSVExporter::CSVExporter(const string &fileName, const QueryProperties &queryProps) : ResultsExporter(fileName, queryProps) { } CSVExporter::~CSVExporter() { } bool CSVExporter::exportResults(const string &engineName, unsigned int maxResultsCount, const vector &resultsList) { if ((resultsList.empty() == true) || (exportStart(engineName, maxResultsCount) == false)) { return false; } for (vector::const_iterator iter = resultsList.begin(); iter != resultsList.end(); ++iter) { exportResult(engineName, *iter); } exportEnd(); return true; } bool CSVExporter::exportStart(const string &engineName, unsigned int maxResultsCount) { if (m_fileName.empty() == true) { return false; } m_outputFile.open(m_fileName.c_str()); if (m_outputFile.good() == false) { m_outputFile.close(); return false; } m_outputFile << "\"query\";\"engine\";\"caption\";\"url\";\"type\";\"language\";\"modtime\";\"size\";\"abstract\"" << endl; return true; } bool CSVExporter::exportResult(const string &engineName, const DocumentInfo &docInfo) { string 
title(FilterUtils::stripMarkup(docInfo.getTitle())); string extract(FilterUtils::stripMarkup(docInfo.getExtract())); if (m_outputFile.good() == false) { return false; } // Double double-quotes m_outputFile << "\"" << StringManip::replaceSubString(m_queryName, "\"", "\"\"") << "\";\"" << StringManip::replaceSubString(engineName, "\"", "\"\"") << "\";\"" << StringManip::replaceSubString(title, "\"", "\"\"") << "\";\"" << StringManip::replaceSubString(docInfo.getLocation(), "\"", "\"\"") << "\";\"" << StringManip::replaceSubString(docInfo.getType(), "\"", "\"\"") << "\";\"" << StringManip::replaceSubString(docInfo.getLanguage(), "\"", "\"\"") << "\";\"" << StringManip::replaceSubString(docInfo.getTimestamp(), "\"", "\"\"") << "\";\"" << docInfo.getSize() << "\";\"" << StringManip::replaceSubString(extract, "\"", "\"\"") << "\"" << endl; return true; } void CSVExporter::exportEnd(void) { m_outputFile.close(); } OpenSearchExporter::OpenSearchExporter(const string &fileName, const QueryProperties &queryProps) : ResultsExporter(fileName, queryProps), m_pDoc(NULL), m_pChannelElem(NULL) { } OpenSearchExporter::~OpenSearchExporter() { } bool OpenSearchExporter::exportResults(const string &engineName, unsigned int maxResultsCount, const vector &resultsList) { if ((resultsList.empty() == true) || (exportStart(engineName, maxResultsCount) == false)) { return false; } for (vector::const_iterator iter = resultsList.begin(); iter != resultsList.end(); ++iter) { exportResult(engineName, *iter); } exportEnd(); return true; } bool OpenSearchExporter::exportStart(const string &engineName, unsigned int maxResultsCount) { if (m_fileName.empty() == true) { return false; } if (m_pDoc != NULL) { delete m_pDoc; m_pDoc = NULL; m_pChannelElem = NULL; } Element *pRootElem = NULL; string description("Search"); char numStr[64]; m_pDoc = new xmlpp::Document("1.0"); // Create a new node pRootElem = m_pDoc->create_root_node("rss"); if (pRootElem == NULL) { return false; } 
pRootElem->set_attribute("version", "2.0"); pRootElem->set_attribute("xmlns:opensearch", "http://a9.com/-/spec/opensearch/1.1/"); pRootElem->set_attribute("xmlns:atom", "http://www.w3.org/2005/Atom"); // Channel element m_pChannelElem = pRootElem->add_child_element("channel"); if (m_pChannelElem == NULL) { return false; } if (m_queryName.empty() == false) { addChildElement(m_pChannelElem, "title", m_queryName); } if (m_queryName.empty() == false) { description += " for \""; description += m_queryName; description += "\""; } if (engineName.empty() == false) { description += " on "; description += engineName; } addChildElement(m_pChannelElem, "description", description); snprintf(numStr, 64, "%u", maxResultsCount); addChildElement(m_pChannelElem, "opensearch:totalResults", numStr); addChildElement(m_pChannelElem, "opensearch:itemsPerPage", numStr); if (m_queryDetails.empty() == false) { Element *pQueryElem = addChildElement(m_pChannelElem, "opensearch:Query", ""); if (pQueryElem != NULL) { pQueryElem->set_attribute("role", "request"); pQueryElem->set_attribute("searchTerms", m_queryDetails); pQueryElem->set_attribute("startPage", "1"); } } return true; } bool OpenSearchExporter::exportResult(const string &engineName, const DocumentInfo &docInfo) { if (m_pChannelElem == NULL) { return false; } Element *pElem = m_pChannelElem->add_child_element("item"); addChildElement(pElem, "title", docInfo.getTitle()); addChildElement(pElem, "link", docInfo.getLocation()); addChildElement(pElem, "description", FilterUtils::stripMarkup(docInfo.getExtract())); return true; } void OpenSearchExporter::exportEnd(void) { if (m_pDoc == NULL) { return; } // Save to file m_pDoc->write_to_file_formatted(m_fileName); m_pChannelElem = NULL; delete m_pDoc; m_pDoc = NULL; } pinot-1.22/IndexSearch/ResultsExporter.h000066400000000000000000000071511470740426600203450ustar00rootroot00000000000000/* * Copyright 2007 Fabrice Colin * * This program is free software; you can redistribute
it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _RESULTS_EXPORTER_H #define _RESULTS_EXPORTER_H #include #include #include #include #include #include "DocumentInfo.h" #include "QueryProperties.h" /// Exports results to a given format. class ResultsExporter { public: ResultsExporter(const std::string &fileName, const QueryProperties &queryProps); virtual ~ResultsExporter(); /// Exports a list of results. virtual bool exportResults(const std::string &engineName, unsigned int maxResultsCount, const std::vector &resultsList) = 0; /// Starts export. virtual bool exportStart(const std::string &engineName, unsigned int maxResultsCount) = 0; /// Exports the given result. virtual bool exportResult(const std::string &engineName, const DocumentInfo &docInfo) = 0; /// Ends export. virtual void exportEnd(void) = 0; protected: std::string m_fileName; std::string m_queryName; std::string m_queryDetails; private: ResultsExporter(const ResultsExporter &other); ResultsExporter& operator=(const ResultsExporter& other); }; /// Exports results to CSV. class CSVExporter : public ResultsExporter { public: CSVExporter(const std::string &fileName, const QueryProperties &queryProps); virtual ~CSVExporter(); /// Exports the results; false if an error occurred. 
virtual bool exportResults(const std::string &engineName, unsigned int maxResultsCount, const std::vector &resultsList); /// Starts export. virtual bool exportStart(const std::string &engineName, unsigned int maxResultsCount); /// Exports the given result. virtual bool exportResult(const std::string &engineName, const DocumentInfo &docInfo); /// Ends export. virtual void exportEnd(void); protected: std::ofstream m_outputFile; private: CSVExporter(const CSVExporter &other); CSVExporter& operator=(const CSVExporter& other); }; /// Exports results to OpenSearch response. class OpenSearchExporter : public ResultsExporter { public: OpenSearchExporter(const std::string &fileName, const QueryProperties &queryProps); virtual ~OpenSearchExporter(); /// Exports the results; false if an error occurred. virtual bool exportResults(const std::string &engineName, unsigned int maxResultsCount, const std::vector &resultsList); /// Starts export. virtual bool exportStart(const std::string &engineName, unsigned int maxResultsCount); /// Exports the given result. virtual bool exportResult(const std::string &engineName, const DocumentInfo &docInfo); /// Ends export. virtual void exportEnd(void); protected: xmlpp::Document *m_pDoc; xmlpp::Element *m_pChannelElem; private: OpenSearchExporter(const OpenSearchExporter &other); OpenSearchExporter& operator=(const OpenSearchExporter& other); }; #endif // _RESULTS_EXPORTER_H pinot-1.22/IndexSearch/SearchEngineInterface.cpp000066400000000000000000000044401470740426600216600ustar00rootroot00000000000000/* * Copyright 2005,2006 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include "Document.h" #include "StringManip.h" #include "Url.h" #include "SearchEngineInterface.h" using std::clog; using std::endl; SearchEngineInterface::SearchEngineInterface() : m_defaultOperator(DEFAULT_OP_AND), m_resultsCountEstimate(0) { } SearchEngineInterface::~SearchEngineInterface() { } /// Sets whether AND is the default operator. void SearchEngineInterface::setDefaultOperator(DefaultOperator op) { m_defaultOperator = op; } /// Sets the set of documents to limit to. bool SearchEngineInterface::setLimitSet(const set &docsSet) { // Not all engines support this return false; } /// Sets the set of documents to expand from. bool SearchEngineInterface::setExpandSet(const set &docsSet) { // Not all engines support this return false; } /// Returns the results for the previous query. const vector &SearchEngineInterface::getResults(void) const { return m_resultsList; } /// Returns an estimate of the total number of results for the previous query. unsigned int SearchEngineInterface::getResultsCountEstimate(void) const { return m_resultsCountEstimate; } /// Returns the charset for the previous query's results. string SearchEngineInterface::getResultsCharset(void) const { return m_charset; } /// Suggests a spelling correction. string SearchEngineInterface::getSpellingCorrection(void) const { return m_correctedFreeQuery; } /// Returns expand terms from the previous query. 
const set &SearchEngineInterface::getExpandTerms(void) const { return m_expandTerms; } pinot-1.22/IndexSearch/SearchEngineInterface.h000066400000000000000000000051251470740426600213260ustar00rootroot00000000000000/* * Copyright 2005-2008 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _SEARCH_ENGINE_INTERFACE_H #define _SEARCH_ENGINE_INTERFACE_H #include #include #include #include #include "DocumentInfo.h" #include "Visibility.h" #include "QueryProperties.h" using namespace std; /// Interface implemented by search engines. class PINOT_EXPORT SearchEngineInterface { public: typedef enum { DEFAULT_OP_AND = 0, DEFAULT_OP_OR } DefaultOperator; virtual ~SearchEngineInterface(); /// Sets whether AND is the default operator. virtual void setDefaultOperator(DefaultOperator op); /// Sets the set of documents to limit to. virtual bool setLimitSet(const set &docsSet); /// Sets the set of documents to expand from. virtual bool setExpandSet(const set &docsSet); /// Runs a query; true if success. virtual bool runQuery(QueryProperties& queryProps, unsigned int startDoc = 0) = 0; /// Returns the results for the previous query. virtual const vector &getResults(void) const; /// Returns an estimate of the total number of results for the previous query. 
virtual unsigned int getResultsCountEstimate(void) const; /// Returns the charset for the previous query's results. virtual string getResultsCharset(void) const; /// Suggests a spelling correction. virtual string getSpellingCorrection(void) const; /// Returns expand terms from the previous query. virtual const set &getExpandTerms(void) const; protected: DefaultOperator m_defaultOperator; vector m_resultsList; unsigned int m_resultsCountEstimate; string m_charset; string m_correctedFreeQuery; set m_expandTerms; SearchEngineInterface(); private: SearchEngineInterface(const SearchEngineInterface &other); SearchEngineInterface &operator=(const SearchEngineInterface &other); }; #endif // _SEARCH_ENGINE_INTERFACE_H pinot-1.22/IndexSearch/SearchPluginProperties.cpp000066400000000000000000000050061470740426600221440ustar00rootroot00000000000000/* * Copyright 2005-2008 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include "SearchPluginProperties.h" using std::string; using std::map; using std::set; SearchPluginProperties::SearchPluginProperties() : ModuleProperties(), m_method(GET_METHOD), m_scrolling(PER_PAGE), m_nextIncrement(0), m_nextBase(0), m_response(UNKNOWN_RESPONSE) { } SearchPluginProperties::SearchPluginProperties(const SearchPluginProperties &other) : ModuleProperties(other), m_languages(other.m_languages), m_baseUrl(other.m_baseUrl), m_method(other.m_method), m_variableParameters(other.m_variableParameters), m_editableParameters(other.m_editableParameters), m_remainder(other.m_remainder), m_outputType(other.m_outputType), m_scrolling(other.m_scrolling), m_nextIncrement(other.m_nextIncrement), m_nextBase(other.m_nextBase), m_response(other.m_response) { } SearchPluginProperties::~SearchPluginProperties() { } SearchPluginProperties& SearchPluginProperties::operator=(const SearchPluginProperties& other) { ModuleProperties::operator=(other); if (this != &other) { m_languages = other.m_languages; m_baseUrl = other.m_baseUrl; m_method = other.m_method; m_variableParameters = other.m_variableParameters; m_editableParameters = other.m_editableParameters; m_remainder = other.m_remainder; m_outputType = other.m_outputType; m_scrolling = other.m_scrolling; m_nextIncrement = other.m_nextIncrement; m_nextBase = other.m_nextBase; m_response = other.m_response; } return *this; } bool SearchPluginProperties::operator==(const SearchPluginProperties &other) const { if (ModuleProperties::operator==(other) == true) { return true; } return false; } bool SearchPluginProperties::operator<(const SearchPluginProperties &other) const { if (ModuleProperties::operator<(other) == true) { return true; } return false; } pinot-1.22/IndexSearch/SearchPluginProperties.h000066400000000000000000000042701470740426600216130ustar00rootroot00000000000000/* * Copyright 2005-2008 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU 
General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _SEARCH_PLUGIN_PROPERTIES_H #define _SEARCH_PLUGIN_PROPERTIES_H #include #include #include #include "ModuleProperties.h" /// Properties of a search engine plugin. class SearchPluginProperties : public ModuleProperties { public: SearchPluginProperties(); SearchPluginProperties(const SearchPluginProperties &other); virtual ~SearchPluginProperties(); SearchPluginProperties& operator=(const SearchPluginProperties& other); bool operator==(const SearchPluginProperties &other) const; bool operator<(const SearchPluginProperties &other) const; typedef enum { GET_METHOD = 0, POST_METHOD } Method; typedef enum { UNKNOWN_PARAM = 0, SEARCH_TERMS_PARAM, COUNT_PARAM,START_INDEX_PARAM, START_PAGE_PARAM, LANGUAGE_PARAM, OUTPUT_ENCODING_PARAM, INPUT_ENCODING_PARAM } ParameterVariable; typedef enum { PER_PAGE = 0, PER_INDEX } Scrolling; typedef enum { UNKNOWN_RESPONSE = 0, HTML_RESPONSE, RSS_RESPONSE, ATOM_RESPONSE } Response; // Description std::set m_languages; // Query std::string m_baseUrl; Method m_method; std::map m_variableParameters; std::map m_editableParameters; std::string m_remainder; std::string m_outputType; // Scrolling Scrolling m_scrolling; unsigned int m_nextIncrement; unsigned int m_nextBase; // Response Response m_response; }; #endif // _SEARCH_PLUGIN_PROPERTIES_H 
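/*
 * Illustrative sketch, not part of the Pinot sources: how the scrolling
 * fields of SearchPluginProperties (m_nextIncrement, m_nextBase) might drive
 * the start value sent with each page request. It assumes the start value is
 * m_nextBase plus the accumulated increment, and that an increment of 0 means
 * a single one-off request, as the end of PluginWebEngine::runQuery suggests.
 * ScrollSettings and startValues are hypothetical names.
 */

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the two scrolling fields of SearchPluginProperties.
struct ScrollSettings
{
	unsigned int nextIncrement; // step between pages; 0 means a single request
	unsigned int nextBase;      // start value used for the first page
};

// Returns the start value for each page request, up to maxPages requests.
// Assumption: value = nextBase + page * nextIncrement.
std::vector<unsigned int> startValues(const ScrollSettings &settings, unsigned int maxPages)
{
	std::vector<unsigned int> values;

	if (maxPages == 0)
	{
		return values;
	}

	// The first page always starts at the base
	values.push_back(settings.nextBase);
	if (settings.nextIncrement == 0)
	{
		// One-off call: that one page should have all the results
		return values;
	}

	for (unsigned int page = 1; page < maxPages; ++page)
	{
		values.push_back(settings.nextBase + (page * settings.nextIncrement));
	}
	assert(values.size() == maxPages);

	return values;
}
```

/*
 * With an increment of 10 and a base of 1, three pages would be requested
 * with start values 1, 11 and 21. A real engine would also stop early when a
 * page comes back with fewer results than the increment.
 */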
pinot-1.22/IndexSearch/SherlockParser.cpp000066400000000000000000000517701470740426600204430ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include "config.h" #include #include #include #include #include #ifdef HAVE_BOOST_SPIRIT_CORE_HPP #include #include #include #include #else #ifdef HAVE_BOOST_SPIRIT_INCLUDE_CLASSIC_HPP #define BOOST_SPIRIT_USE_OLD_NAMESPACE #include #include #include #include #else #ifdef HAVE_BOOST_SPIRIT_HPP #include #include #include #include #endif #endif #endif #include "StringManip.h" #include "Url.h" #include "HtmlFilter.h" #include "FilterFactory.h" #include "FileCollector.h" #include "FilterUtils.h" #include "SherlockParser.h" using std::clog; using std::endl; using std::string; using std::vector; using std::map; using std::set; using std::exception; using namespace boost::spirit; // A function object to lower case map keys with for_each() struct LowerAndCopy { public: LowerAndCopy(map &other) : m_other(other) { } void operator()(map::value_type &p) { m_other[StringManip::toLowerCase(p.first)] = p.second; } map &m_other; }; struct plugin_skip_grammar : public grammar { template struct definition { definition(plugin_skip_grammar const &self) { // Skip all spaces and comments, starting with a # // FIXME: make sure comments 
start at the beginning of the line ! skip = space_p | (ch_p('#') >> *(anychar_p - ch_p('\n')) >> ch_p('\n')); } rule skip; rule const& start() const { return skip; } }; }; /** * A complete but lax grammar for Sherlock plugins. * For instance, it doesn't mind if INPUT has a NAME but no VALUE. * More importantly, it doesn't enforce types, eg FACTOR should be an integer. */ struct plugin_grammar : public grammar { plugin_grammar(map &searchParams, map &interpretParams, map &inputItems, string &userInput, string &nextInput, string &nextFactor, string &nextValue) : m_searchParams(searchParams), m_interpretParams(interpretParams), m_inputItems(inputItems), m_userInput(userInput), m_nextInput(nextInput), m_nextFactor(nextFactor), m_nextValue(nextValue) { } template struct definition { definition(plugin_grammar const &self) { // Start search_plugin = search_header >> input_elements >> search_footer >> rest; // All items have a name and an optionally-quoted value, separated by = end_of_name = ch_p('='); any_name = *(~ch_p('>') - end_of_name); any_value_without_quotes = lexeme_d[*(~ch_p('>') - ch_p('\n'))]; any_value = ch_p('\'') >> (*(~ch_p('\'')))[assign_a(unquotedValue)] >> ch_p('\'') | ch_p('"') >> (*(~ch_p('"')))[assign_a(unquotedValue)] >> ch_p('"') | any_value_without_quotes[assign_a(unquotedValue)]; // SEARCH attributes are items // There should be only one SEARCH tag search_item = (any_name[assign_a(itemName)] >> ch_p('=') >> any_value[assign_a(itemValue, unquotedValue)]) [insert_at_a(self.m_searchParams, itemName, itemValue)]; // SEARCH may have any number of attributes search_header = ch_p('<') >> as_lower_d[str_p("search")] >> *search_item >> ch_p('>'); // INPUT input_item_name = as_lower_d[str_p("name")] >> ch_p('=') >> any_value[assign_a(itemName, unquotedValue)]; input_item_value = as_lower_d[str_p("value")] >> ch_p('=') >> any_value[assign_a(itemValue, unquotedValue)]; input_item_user = as_lower_d[str_p("user")]; input_item_factor = 
as_lower_d[str_p("factor")] >> ch_p('=') >> any_value[assign_a(itemValue, unquotedValue)]; // INPUT tags have name and value items; one is marked with USER input_item = input_item_name | input_item_value | input_item_user[assign_a(self.m_userInput, itemName)]; input_element = (ch_p('<') >> as_lower_d[str_p("input")] >> *input_item >> ch_p('>')) [insert_at_a(self.m_inputItems, itemName, itemValue)]; // INPUTPREV tags have name and either factor or value items // There should be only one INPUTPREV tag // FIXME: save those inputprev_item = input_item_name | input_item_factor | input_item_value; inputprev_element = ch_p('<') >> as_lower_d[str_p("inputprev")] >> *inputprev_item >> ch_p('>'); // INPUTNEXT tags have name and either factor or value items // There should be only one INPUTNEXT tag inputnext_item = input_item_name[assign_a(self.m_nextInput, itemName)] | input_item_factor[assign_a(self.m_nextFactor, itemValue)] | input_item_value[assign_a(self.m_nextValue, itemValue)]; inputnext_element = ch_p('<') >> as_lower_d[str_p("inputnext")] >> *inputnext_item >> ch_p('>'); // INTERPRET tags have varied types of items // There should be only one INTERPRET tag interpret_item = (any_name[assign_a(itemName)] >> ch_p('=') >> any_value[assign_a(itemValue, unquotedValue)]) [insert_at_a(self.m_interpretParams, itemName, itemValue)]; interpret_element = ch_p('<') >> as_lower_d[str_p("interpret")] >> *interpret_item >> ch_p('>'); // INPUT, INPUTNEXT and INTERPRET may appear in any order input_elements = *(input_element | inputprev_element | inputnext_element | interpret_element); // SEARCH has a closing tag search_footer = ch_p('<') >> ch_p('/') >> as_lower_d[str_p("search")] >> ch_p('>'); // Rest rest = *anychar_p; } string unquotedValue, itemName, itemValue; rule search_plugin, search_header, search_footer, rest; rule end_of_name, any_name, any_value_without_quotes, any_value, search_item; rule input_elements, input_element, inputprev_element, inputnext_element, 
interpret_element; rule input_item_name, input_item_value, input_item_user, input_item_factor; rule input_item, inputprev_item, inputnext_item, interpret_item; rule const& start() const { return search_plugin; } }; map &m_searchParams; map &m_interpretParams; map &m_inputItems; string &m_userInput; string &m_nextInput; string &m_nextFactor; string &m_nextValue; }; SherlockResponseParser::SherlockResponseParser() : ResponseParserInterface(), m_skipLocal(true) { } SherlockResponseParser::~SherlockResponseParser() { } bool SherlockResponseParser::parse(const Document *pResponseDoc, vector &resultsList, unsigned int &totalResults, unsigned int &firstResultIndex, string &charset) const { float pseudoScore = 100; off_t contentLen = 0; bool foundResult = false; if ((pResponseDoc == NULL) || (pResponseDoc->getData(contentLen) == NULL) || (contentLen == 0)) { return false; } // Can we get the charset ? Dijon::HtmlFilter htmlFilter; htmlFilter.set_mime_type("text/html"); if (FilterUtils::feedFilter(*pResponseDoc, &htmlFilter) == true) { const map &metaData = htmlFilter.get_meta_data(); map::const_iterator charsetIter = metaData.find("charset"); if (charsetIter != metaData.end()) { charset = charsetIter->second; #ifdef DEBUG clog << "SherlockResponseParser::parse: response charset is " << charset << endl; #endif } } // These two are the minimum we need if ((m_resultItemStart.empty() == true) || (m_resultItemEnd.empty() == true)) { #ifdef DEBUG clog << "SherlockResponseParser::parse: incomplete properties" << endl; #endif return false; } string listStart(m_resultListStart); string listEnd(m_resultListEnd); string resStart(m_resultItemStart); string resEnd(m_resultItemEnd); // Extract the results list #ifdef DEBUG clog << "SherlockResponseParser::parse: getting results list (" << m_resultListStart << ", " << m_resultListEnd << ")" << endl; #endif const char *pContent = pResponseDoc->getData(contentLen); string resultList = StringManip::extractField(pContent, listStart, 
listEnd); if (resultList.empty() == true) { // The other quotes may be used if (m_resultListStart.find("\"") != string::npos) { listStart = StringManip::replaceSubString(m_resultListStart, "\"", "'"); } else if (m_resultListStart.find("'") != string::npos) { listStart = StringManip::replaceSubString(m_resultListStart, "'", "\""); } // Try again resultList = StringManip::extractField(pContent, listStart, listEnd); } if (resultList.empty() == true) { resultList = string(pContent, contentLen); } // Extract results string::size_type endPos = 0; #ifdef DEBUG clog << "SherlockResponseParser::parse: getting first result (" << m_resultItemStart << ", " << m_resultItemEnd << ")" << endl; #endif string resultItem = StringManip::extractField(resultList, resStart, resEnd, endPos); if (resultItem.empty() == true) { // The other quotes may be used if (m_resultItemStart.find("\"") != string::npos) { resStart = StringManip::replaceSubString(m_resultItemStart, "\"", "'"); } else if (m_resultItemStart.find("'") != string::npos) { resStart = StringManip::replaceSubString(m_resultItemStart, "'", "\""); } // Try again resultItem = StringManip::extractField(resultList, resStart, resEnd, endPos); } while (resultItem.empty() == false) { string contentType, url, name, extract; #ifdef DEBUG clog << "SherlockResponseParser::parse: candidate chunk \"" << resultItem << "\"" << endl; #endif contentType = pResponseDoc->getType(); if (strncasecmp(contentType.c_str(), "text/html", 9) == 0) { Document chunkDoc("", "", contentType, ""); string htmlChunk(resultItem); // The chunk may contain truncated tags, get rid of them ! 
string::size_type firstOpen = htmlChunk.find('<'); string::size_type firstClose = htmlChunk.find('>'); if (firstClose != string::npos) { if ((firstOpen == string::npos) || (firstClose < firstOpen)) { htmlChunk.erase(0, firstClose + 1); } } string::size_type lastClose = htmlChunk.find_last_of(">"); string::size_type lastOpen = htmlChunk.find_last_of("<"); if (lastOpen != string::npos) { if ((lastClose == string::npos) || (lastOpen > lastClose)) { htmlChunk.erase(lastOpen); } } // Wrap input string dummyHtml("<html><head><meta http-equiv=\"content-type\" content=\""); dummyHtml += pResponseDoc->getType(); dummyHtml += "\"></head><body>"; dummyHtml += htmlChunk; dummyHtml += "</body></html>"; #ifdef DEBUG clog << "SherlockResponseParser::parse: wrapped chunk \"" << dummyHtml << "\"" << endl; #endif chunkDoc.setData(dummyHtml.c_str(), dummyHtml.length()); // Feed this chunk to the filter Dijon::HtmlFilter chunkFilter; set chunkLinks; chunkFilter.set_mime_type("text/html"); if ((FilterUtils::feedFilter(chunkDoc, &chunkFilter) == true) && (chunkFilter.get_links(chunkLinks) == true) && (chunkFilter.next_document() == true)) { unsigned int endOfFirstLink = 0, startOfSecondLink = 0, endOfSecondLink = 0, startOfThirdLink = 0; // The result's URL and title should be given by the first link for (set::iterator linkIter = chunkLinks.begin(); linkIter != chunkLinks.end(); ++linkIter) { if (linkIter->m_index == 0) { url = linkIter->m_url; name = linkIter->m_name; #ifdef DEBUG clog << "SherlockResponseParser::parse: first link in chunk is " << url << endl; #endif endOfFirstLink = linkIter->m_endPos; } else if (linkIter->m_index == 1) { startOfSecondLink = linkIter->m_startPos; endOfSecondLink = linkIter->m_endPos; } else if (linkIter->m_index == 2) { startOfThirdLink = linkIter->m_startPos; } } // Any extract ?
const map<string, string> &metaData = chunkFilter.get_meta_data(); map<string, string>::const_iterator abstractIter = metaData.find("abstract"); if (abstractIter == metaData.end()) { extract = FilterUtils::stripMarkup(resultItem); StringManip::trimSpaces(extract); } else { extract = abstractIter->second; } } } else { // This is not HTML // Use extended attributes if ((m_resultTitleStart.empty() == false) && (m_resultTitleEnd.empty() == false)) { name = StringManip::extractField(resultItem, m_resultTitleStart, m_resultTitleEnd); } if ((m_resultLinkStart.empty() == false) && (m_resultLinkEnd.empty() == false)) { url = StringManip::extractField(resultItem, m_resultLinkStart, m_resultLinkEnd); } if ((m_resultExtractStart.empty() == false) && (m_resultExtractEnd.empty() == false)) { extract = StringManip::extractField(resultItem, m_resultExtractStart, m_resultExtractEnd); } } if (url.empty() == false) { // FIXME: look for an interpret/baseurl tag, see https://bugzilla.mozilla.org/show_bug.cgi?id=65453 // FIXME: obey m_skipLocal DocumentInfo result(name, url, "", ""); result.setExtract(extract); result.setScore(pseudoScore); resultsList.push_back(result); --pseudoScore; foundResult = true; if (resultsList.size() == totalResults) { // Enough results break; } } // Next endPos += m_resultItemEnd.length(); resultItem = StringManip::extractField(resultList, resStart, resEnd, endPos); } return foundResult; } pthread_mutex_t SherlockParser::m_mutex = PTHREAD_MUTEX_INITIALIZER; SherlockParser::SherlockParser(const string &fileName) : PluginParserInterface(fileName) { } SherlockParser::~SherlockParser() { } ResponseParserInterface *SherlockParser::parse(SearchPluginProperties &properties, bool minimal) { FileCollector fileCollect; DocumentInfo docInfo("Sherlock Source", string("file://") + m_fileName, "text/plain", ""); // Get the definition file Document *pPluginDoc = fileCollect.retrieveUrl(docInfo); if (pPluginDoc == NULL) { #ifdef DEBUG clog << "SherlockParser::parse: couldn't load " << m_fileName << endl;
#endif return NULL; } off_t dataLength; const char *pData = pPluginDoc->getData(dataLength); if ((pData == NULL) || (dataLength == 0)) { delete pPluginDoc; return NULL; } map searchParams, interpretParams, inputItems; string userInput, nextInput, nextFactor, nextValue; bool parsedPlugin = false; if (pthread_mutex_lock(&m_mutex) == 0) { try { plugin_skip_grammar skip; plugin_grammar plugin(searchParams, interpretParams, inputItems, userInput, nextInput, nextFactor, nextValue); parse_info<> parseInfo = boost::spirit::parse(pData, plugin, skip); parsedPlugin = parseInfo.hit; } catch (const exception &e) { #ifdef DEBUG clog << "SherlockParser::parse: caught exception ! " << e.what() << endl; #endif parsedPlugin = false; } catch (...) { #ifdef DEBUG clog << "SherlockParser::parse: caught unknown exception !" << endl; #endif parsedPlugin = false; } pthread_mutex_unlock(&m_mutex); } // We are done with the document delete pPluginDoc; SherlockResponseParser *pResponseParser = NULL; if (parsedPlugin == true) { map lowSearchParams, lowInterpretParams; pResponseParser = new SherlockResponseParser(); LowerAndCopy lowCopy1(lowSearchParams); for_each(searchParams.begin(), searchParams.end(), lowCopy1); LowerAndCopy lowCopy2(lowInterpretParams); for_each(interpretParams.begin(), interpretParams.end(), lowCopy2); // Response properties.m_response = SearchPluginProperties::HTML_RESPONSE; // Method properties.m_method = SearchPluginProperties::GET_METHOD; // Name map::iterator mapIter = lowSearchParams.find("name"); if (mapIter != lowSearchParams.end()) { properties.m_longName = mapIter->second; } // Channel mapIter = lowSearchParams.find("routetype"); if (mapIter != lowSearchParams.end()) { properties.m_channel = mapIter->second; } if (userInput.empty() == false) { // Remove the user input tag from the input tags map mapIter = inputItems.find(userInput); if (mapIter != inputItems.end()) { inputItems.erase(mapIter); } #ifdef DEBUG else clog << "SherlockParser::parse: couldn't remove 
user input item" << endl; #endif properties.m_variableParameters[SearchPluginProperties::SEARCH_TERMS_PARAM] = userInput; } for (map::iterator iter = inputItems.begin(); iter != inputItems.end(); ++iter) { #ifdef DEBUG clog << "SherlockParser::parse: " << iter->first << "=" << iter->second << endl; #endif if (iter->second.substr(0, 5) == "EDIT:") { // This is user editable properties.m_editableParameters[iter->first] = iter->second.substr(5); } else { // Append to the remainder if (properties.m_remainder.empty() == false) { properties.m_remainder += "&"; } properties.m_remainder += iter->first; properties.m_remainder += "="; properties.m_remainder += iter->second; } } if (minimal == false) { // URL mapIter = lowSearchParams.find("action"); if (mapIter != lowSearchParams.end()) { properties.m_baseUrl = mapIter->second; } // Response mapIter = lowInterpretParams.find("resultliststart"); if (mapIter != lowInterpretParams.end()) { pResponseParser->m_resultListStart = StringManip::replaceSubString(mapIter->second, "\\n", "\n"); } mapIter = lowInterpretParams.find("resultlistend"); if (mapIter != lowInterpretParams.end()) { pResponseParser->m_resultListEnd = StringManip::replaceSubString(mapIter->second, "\\n", "\n"); } mapIter = lowInterpretParams.find("resultitemstart"); if (mapIter != lowInterpretParams.end()) { pResponseParser->m_resultItemStart = StringManip::replaceSubString(mapIter->second, "\\n", "\n"); } mapIter = lowInterpretParams.find("resultitemend"); if (mapIter != lowInterpretParams.end()) { pResponseParser->m_resultItemEnd = StringManip::replaceSubString(mapIter->second, "\\n", "\n"); } mapIter = lowInterpretParams.find("resulttitlestart"); if (mapIter != lowInterpretParams.end()) { pResponseParser->m_resultTitleStart = mapIter->second; } mapIter = lowInterpretParams.find("resulttitleend"); if (mapIter != lowInterpretParams.end()) { pResponseParser->m_resultTitleEnd = mapIter->second; } mapIter = lowInterpretParams.find("resultlinkstart"); if (mapIter != 
lowInterpretParams.end()) { pResponseParser->m_resultLinkStart = mapIter->second; } mapIter = lowInterpretParams.find("resultlinkend"); if (mapIter != lowInterpretParams.end()) { pResponseParser->m_resultLinkEnd = mapIter->second; } mapIter = lowInterpretParams.find("resultextractstart"); if (mapIter != lowInterpretParams.end()) { pResponseParser->m_resultExtractStart = mapIter->second; } mapIter = lowInterpretParams.find("resultextractend"); if (mapIter != lowInterpretParams.end()) { pResponseParser->m_resultExtractEnd = mapIter->second; } mapIter = lowInterpretParams.find("skiplocal"); if (mapIter != lowInterpretParams.end()) { if (mapIter->second == "false") { pResponseParser->m_skipLocal = false; } } // Here we differ from how Mozilla uses these parameters // Normally, either factor or value is used, but we use value // as the parameter's initial value if (nextFactor.empty() == false) { properties.m_variableParameters[SearchPluginProperties::START_PAGE_PARAM] = nextInput; properties.m_scrolling = SearchPluginProperties::PER_PAGE; // What Sherlock calls a factor is actually an increment properties.m_nextIncrement = (unsigned int)atoi(nextFactor.c_str()); } else { // Assume INPUTNEXT allows to specify a number of results // Not sure if this is how Sherlock/Mozilla interpret this properties.m_variableParameters[SearchPluginProperties::COUNT_PARAM] = nextInput; properties.m_scrolling = SearchPluginProperties::PER_INDEX; properties.m_nextIncrement = 0; } if (nextValue.empty() == false) { properties.m_nextBase = (unsigned int)atoi(nextValue.c_str()); } else { properties.m_nextBase = 0; } } } return pResponseParser; } pinot-1.22/IndexSearch/SherlockParser.h000066400000000000000000000047461470740426600201110ustar00rootroot00000000000000/* * Copyright 2005-2009 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of 
the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _SHERLOCK_PARSER_H #define _SHERLOCK_PARSER_H #include #include #include #include "Document.h" #include "PluginParsers.h" /// Parses output of Sherlock-based search engines. class SherlockResponseParser : public ResponseParserInterface { public: SherlockResponseParser(); virtual ~SherlockResponseParser(); /// Parses the response; false if not all could be parsed. virtual bool parse(const Document *pResponseDoc, std::vector &resultsList, unsigned int &totalResults, unsigned int &firstResultIndex, std::string &charset) const; std::string m_resultListStart; std::string m_resultListEnd; std::string m_resultItemStart; std::string m_resultItemEnd; std::string m_resultTitleStart; std::string m_resultTitleEnd; std::string m_resultLinkStart; std::string m_resultLinkEnd; std::string m_resultExtractStart; std::string m_resultExtractEnd; bool m_skipLocal; private: SherlockResponseParser(const SherlockResponseParser &other); SherlockResponseParser& operator=(const SherlockResponseParser& other); }; /** A parser for Sherlock plugin files. * See http://developer.apple.com/technotes/tn/tn1141.html * and http://mycroft.mozdev.org/deepdocs/deepdocs.html */ class SherlockParser : public PluginParserInterface { public: SherlockParser(const std::string &fileName); virtual ~SherlockParser(); /// Parses the plugin and returns a response parser. 
virtual ResponseParserInterface *parse(SearchPluginProperties &properties, bool minimal = false); protected: static pthread_mutex_t m_mutex; private: SherlockParser(const SherlockParser &other); SherlockParser& operator=(const SherlockParser& other); }; #endif // _SHERLOCK_PARSER_H pinot-1.22/IndexSearch/WebEngine.cpp000066400000000000000000000145161470740426600173540ustar00rootroot00000000000000/* * Copyright 2005-2011 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include #include #include #include #include #include #include "StringManip.h" #include "Url.h" #include "DownloaderFactory.h" #include "FilterUtils.h" #include "CJKVTokenizer.h" #include "WebEngine.h" using std::clog; using std::endl; using std::string; using std::set; using std::map; using std::vector; class TermHighlighter : public Dijon::CJKVTokenizer::TokensHandler { public: TermHighlighter(string &extract, set<string> &queryTerms, unsigned int nGramSize) : Dijon::CJKVTokenizer::TokensHandler(), m_extract(extract), m_nGramSize(nGramSize), m_nGramCount(0), m_queryTerms(queryTerms) { } virtual ~TermHighlighter() { } virtual bool handle_token(const string &tok, bool is_cjkv) { gchar *pEscToken = NULL; gchar *pUTF8Token = NULL; gsize bytesWritten = 0; if (tok.empty() == true) { return false; } if (is_cjkv == false) { m_nGramCount = 0; } else { ++m_nGramCount; if (tok.length() > 4) { // Skip multi-character tokens #ifdef DEBUG clog << "WebEngine::processResult: skipping " << tok << endl; #endif return true; } } pUTF8Token = g_locale_to_utf8(tok.c_str(), tok.length(), NULL, &bytesWritten, NULL); if (pUTF8Token != NULL) { pEscToken = g_markup_escape_text(pUTF8Token, -1); g_free(pUTF8Token); } if (pEscToken == NULL) { return true; } if ((m_extract.empty() == false) && (m_nGramCount <= 1)) { m_extract += " "; } // Is this a query term ?
if (m_queryTerms.find(StringManip::toLowerCase(tok)) == m_queryTerms.end()) { m_extract += pEscToken; } else { m_extract += "<b>"; m_extract += pEscToken; m_extract += "</b>"; } g_free(pEscToken); return true; } protected: string &m_extract; unsigned int m_nGramSize; unsigned int m_nGramCount; set<string> &m_queryTerms; }; WebEngine::WebEngine() : SearchEngineInterface(), m_pDownloader(DownloaderFactory::getDownloader("http")) { } WebEngine::~WebEngine() { if (m_pDownloader != NULL) { delete m_pDownloader; } } Document *WebEngine::downloadPage(const DocumentInfo &docInfo) { m_charset.clear(); if (m_pDownloader == NULL) { return NULL; } Document *pDoc = m_pDownloader->retrieveUrl(docInfo); if (pDoc != NULL) { string contentType(pDoc->getType()); // Is a charset specified ? string::size_type pos = contentType.find("charset="); if (pos != string::npos) { m_charset = StringManip::removeQuotes(contentType.substr(pos + 8)); #ifdef DEBUG clog << "WebEngine::downloadPage: page charset is " << m_charset << endl; #endif } } return pDoc; } void WebEngine::setQuery(const QueryProperties &queryProps) { queryProps.getTerms(m_queryTerms); } bool WebEngine::processResult(const string &queryUrl, DocumentInfo &result) { Url queryUrlObj(queryUrl); string resultUrl(result.getLocation()); string queryHost(Url::reduceHost(queryUrlObj.getHost(), 2)); if (resultUrl.empty() == true) { return false; } // Is this URL relative to the search engine's domain ?
if ((resultUrl[0] == '/') || ((resultUrl.length() > 1) && (resultUrl[0] == '.') && (resultUrl[1] == '/'))) { string fullResultUrl(queryUrlObj.getProtocol()); fullResultUrl += "://"; fullResultUrl += queryUrlObj.getHost(); if (resultUrl[0] == '.') { fullResultUrl += resultUrl.substr(1); } else { fullResultUrl += resultUrl; } resultUrl = fullResultUrl; } else { Url resultUrlObj(resultUrl); if ((resultUrlObj.getHost().empty() == true) || (resultUrlObj.getHost() == "localhost")) { string fullResultUrl(queryUrlObj.getProtocol()); fullResultUrl += "://"; fullResultUrl += queryUrlObj.getHost(); fullResultUrl += "/"; fullResultUrl += resultUrl; resultUrl = fullResultUrl; } } Url resultUrlObj(resultUrl); // Is the result's host name the same as the search engine's ? // FIXME: not all TLDs have leafs at level 2 if (queryHost == Url::reduceHost(resultUrlObj.getHost(), 2)) { string protocol(resultUrlObj.getProtocol()); if (protocol.empty() == false) { string embeddedUrl; string::size_type startPos = resultUrl.find(protocol, protocol.length()); if (startPos != string::npos) { string::size_type endPos = resultUrl.find("&", startPos); if (endPos != string::npos) { embeddedUrl = resultUrl.substr(startPos, endPos - startPos); } else { embeddedUrl = resultUrl.substr(startPos); } resultUrl = Url::unescapeUrl(embeddedUrl); } #ifdef DEBUG else clog << "WebEngine::processResult: no embedded URL" << endl; #endif } #ifdef DEBUG else clog << "WebEngine::processResult: no protocol" << endl; #endif } // Trim spaces string trimmedUrl(resultUrl); StringManip::trimSpaces(trimmedUrl); // Make the URL canonical result.setLocation(Url::canonicalizeUrl(trimmedUrl)); // Scan the extract for query terms string extract(result.getExtract()); if (extract.empty() == true) { return true; } Dijon::CJKVTokenizer tokenizer; TermHighlighter handler(extract, m_queryTerms, tokenizer.get_ngram_size()); // Highlight query terms in the extract extract.clear(); tokenizer.tokenize(result.getExtract(), handler); 
result.setExtract(extract); return true; } /// Returns the downloader used if any. DownloaderInterface *WebEngine::getDownloader(void) { return m_pDownloader; } /// Specifies values for editable parameters. void WebEngine::setEditableValues(const map &editableValues) { m_editableValues = editableValues; } pinot-1.22/IndexSearch/WebEngine.h000066400000000000000000000036221470740426600170150ustar00rootroot00000000000000/* * Copyright 2005-2008 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _WEB_ENGINE_H #define _WEB_ENGINE_H #include #include #include #include "Document.h" #include "Visibility.h" #include "DownloaderInterface.h" #include "QueryProperties.h" #include "SearchEngineInterface.h" /// Base class for all Web search engines. class PINOT_EXPORT WebEngine : public SearchEngineInterface { public: WebEngine(); virtual ~WebEngine(); /// Returns the downloader used if any. DownloaderInterface *getDownloader(void); /// Specifies values for editable parameters. 
void setEditableValues(const std::map &editableValues); protected: DownloaderInterface *m_pDownloader; std::map m_editableValues; std::set m_queryTerms; Document *downloadPage(const DocumentInfo &docInfo); void setHostNameFilter(const string &filter); void setFileNameFilter(const string &filter); void setQuery(const QueryProperties &queryProps); virtual bool processResult(const string &queryUrl, DocumentInfo &result); private: WebEngine(const WebEngine &other); WebEngine &operator=(const WebEngine &other); }; #endif // _WEB_ENGINE_H pinot-1.22/IndexSearch/Xapian/000077500000000000000000000000001470740426600162165ustar00rootroot00000000000000pinot-1.22/IndexSearch/Xapian/LanguageDetector.cpp000066400000000000000000000117461470740426600221500ustar00rootroot00000000000000/* * Copyright 2005-2011 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include "config.h" #include #include extern "C" { #define USE_TEXTCAT 1 #ifdef HAVE_LIBEXTTEXTCAT_TEXTCAT_H #include #else #ifdef HAVE_LIBTEXTCAT_TEXTCAT_H #include #else #ifdef HAVE_TEXTCAT_H #include #else #undef USE_TEXTCAT #endif #endif #endif } #include #include #include #include "StringManip.h" #include "Timer.h" #include "LanguageDetector.h" using std::clog; using std::endl; using std::string; using std::vector; using std::min; #define MAX_TEXT_SIZE 1000 LanguageDetector LanguageDetector::m_instance; LanguageDetector::LanguageDetector() : m_pHandle(NULL) { #ifdef USE_TEXTCAT string confFile(SYSCONFDIR); const char *textCatVersion = textcat_Version(); // What configuration file should we use ? confFile += "/pinot/"; #ifdef DEBUG clog << "LanguageDetector::guessLanguage: detected " << textCatVersion << endl; #endif if (strncasecmp(textCatVersion, "TextCat 3", 9) == 0) { // Version 3 confFile += "textcat3_conf.txt"; } else if (strncasecmp(textCatVersion, "3.1", 3) == 0) { // Version 3.1 confFile += "textcat31_conf.txt"; } else if (strncasecmp(textCatVersion, "3.", 2) == 0) { // Version 3.2 and above confFile += "textcat32_conf.txt"; } else { confFile += "textcat_conf.txt"; } // Initialize pthread_mutex_init(&m_mutex, NULL); m_pHandle = textcat_Init(confFile.c_str()); #endif } LanguageDetector::~LanguageDetector() { #ifdef USE_TEXTCAT if (m_pHandle != NULL) { // Close the descriptor textcat_Done(m_pHandle); } pthread_mutex_destroy(&m_mutex); #endif } LanguageDetector &LanguageDetector::getInstance(void) { return m_instance; } /** * Attempts to guess the language. * Returns a list of candidates, or "unknown" if detection failed.
*/ void LanguageDetector::guessLanguage(const char *pData, unsigned int dataLength, vector &candidates) { #ifdef HAVE_TEXTCAT_CAT const char *catResults[10]; #endif candidates.clear(); #ifdef USE_TEXTCAT if (m_pHandle == NULL) { candidates.push_back("unknown"); return; } #ifdef DEBUG Timer timer; timer.start(); #endif // Lock the handle if (pthread_mutex_lock(&m_mutex) != 0) { return; } // Classify #ifdef HAVE_TEXTCAT_CAT unsigned int resultNum = textcat_Cat(m_pHandle, pData, min(dataLength, (unsigned int)MAX_TEXT_SIZE), catResults, 10); if (resultNum == 0 ) { candidates.push_back("unknown"); } else { for (unsigned int i=0; i #include #include /// Detects a document's language with libextcat. class LanguageDetector { public: virtual ~LanguageDetector(); static LanguageDetector &getInstance(void); /** * Attempts to guess the language. * Returns a list of candidates, or "unknown" if detection failed. */ void guessLanguage(const char *pData, unsigned int dataLength, std::vector &candidates); protected: static LanguageDetector m_instance; pthread_mutex_t m_mutex; void *m_pHandle; LanguageDetector(); private: LanguageDetector(const LanguageDetector &other); LanguageDetector &operator=(const LanguageDetector &other); }; #endif // _LANGUAGE_DETECTOR_H pinot-1.22/IndexSearch/Xapian/Makefile.am000066400000000000000000000016241470740426600202550ustar00rootroot00000000000000# Process this file with automake to produce Makefile.in noinst_HEADERS = \ LanguageDetector.h \ XapianDatabase.h \ XapianDatabaseFactory.h \ XapianIndex.h \ XapianEngine.h lib_LTLIBRARIES = libxapianbackend.la libxapianbackend_la_SOURCES = \ LanguageDetector.cpp \ ModuleExports.cpp \ XapianDatabase.cpp \ XapianDatabaseFactory.cpp \ XapianIndex.cpp \ XapianEngine.cpp libxapianbackend_la_LDFLAGS = \ -module -version-info 1:0:0 -shared -nostartfiles libxapianbackend_la_LIBADD = \ -L$(top_builddir)/Utils \ -lBasicUtils \ @XML_LIBS@ @MIN_HTTP_LIBS@ \ @INDEX_LIBS@ @MISC_LIBS@ AM_CXXFLAGS = \ @MISC_CFLAGS@ \ 
-I$(top_srcdir)/Utils \ -I$(top_srcdir)/Tokenize \ -I$(top_srcdir)/Tokenize/filters \ -I$(top_srcdir)/IndexSearch \ -I$(top_srcdir)/IndexSearch/cjkv \ @MIN_HTTP_CFLAGS@ @XML_CFLAGS@ @INDEX_CFLAGS@ @GLIBMM_CFLAGS@ if HAVE_BOOST_SPIRIT AM_CXXFLAGS += -DHAVE_BOOST_SPIRIT endif pinot-1.22/IndexSearch/Xapian/ModuleExports.cpp000066400000000000000000000057631470740426600215470ustar00rootroot00000000000000/* * Copyright 2007-2012 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include #include #include "config.h" #include "Visibility.h" #include "FieldMapperInterface.h" #include "ModuleProperties.h" #include "XapianDatabaseFactory.h" #include "XapianEngine.h" #include "XapianIndex.h" using std::string; extern "C" { PINOT_EXPORT ModuleProperties *getModuleProperties(void); PINOT_EXPORT bool openOrCreateIndex(const string &databaseName, bool &obsoleteFormat, bool readOnly, bool overwrite); PINOT_EXPORT bool mergeIndexes(const string &mergedDatabaseName, const string &firstDatabaseName, const string &secondDatabaseName); PINOT_EXPORT IndexInterface *getIndex(const string &databaseName); PINOT_EXPORT SearchEngineInterface *getSearchEngine(const string &databaseName); PINOT_EXPORT void setFieldMapper(FieldMapperInterface *pMapper); PINOT_EXPORT void closeAll(void); } FieldMapperInterface *g_pMapper = NULL; ModuleProperties *getModuleProperties(void) { return new ModuleProperties("xapian", "Xapian", "", ""); } bool openOrCreateIndex(const string &databaseName, bool &obsoleteFormat, bool readOnly, bool overwrite) { XapianDatabase *pDb = XapianDatabaseFactory::getDatabase(databaseName, readOnly, overwrite); if (pDb == NULL) { obsoleteFormat = false; return false; } obsoleteFormat = pDb->wasObsoleteFormat(); return true; } bool mergeIndexes(const string &mergedDatabaseName, const string &firstDatabaseName, const string &secondDatabaseName) { // Assume both have already been opened XapianDatabase *pFirstDb = XapianDatabaseFactory::getDatabase(firstDatabaseName); if ((pFirstDb == NULL) || (pFirstDb->isOpen() == false)) { return false; } XapianDatabase *pSecondDb = XapianDatabaseFactory::getDatabase(secondDatabaseName); if ((pSecondDb == NULL) || (pSecondDb->isOpen() == false)) { return false; } // Merge them return XapianDatabaseFactory::mergeDatabases(mergedDatabaseName, pFirstDb, pSecondDb); } IndexInterface *getIndex(const string &databaseName) { return new XapianIndex(databaseName); } SearchEngineInterface *getSearchEngine(const string
&databaseName) { return new XapianEngine(databaseName); } void setFieldMapper(FieldMapperInterface *pMapper) { g_pMapper = pMapper; } void closeAll(void) { XapianEngine::freeAll(); XapianDatabaseFactory::closeAll(); } pinot-1.22/IndexSearch/Xapian/XapianDatabase.cpp000066400000000000000000000401241470740426600215700ustar00rootroot00000000000000/* * Copyright 2005-2012 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include "config.h" #include #include #include #include #include #include #include #ifdef HAVE_REGEX_H #include #endif #include #include #include #include "StringManip.h" #include "TimeConverter.h" #include "Url.h" #include "FieldMapperInterface.h" #include "XapianDatabase.h" using std::clog; using std::clog; using std::endl; using std::string; using std::stringstream; extern FieldMapperInterface *g_pMapper; // This puts a limit to terms length. 
const unsigned int XapianDatabase::m_maxTermLength = 230; XapianDatabase::XapianDatabase(const string &databaseName, bool readOnly, bool overwrite) : m_databaseName(databaseName), m_withSpelling(true), m_readOnly(readOnly), m_overwrite(overwrite), m_obsoleteFormat(false), m_pDatabase(NULL), m_isOpen(false), m_merge(false), m_pFirst(NULL), m_pSecond(NULL) { initializeLock(); openDatabase(); } XapianDatabase::XapianDatabase(const string &databaseName, XapianDatabase *pFirst, XapianDatabase *pSecond) : m_databaseName(databaseName), m_withSpelling(true), m_readOnly(true), m_overwrite(false), m_obsoleteFormat(false), m_pDatabase(NULL), m_isOpen(pFirst->m_isOpen), m_merge(true), m_pFirst(pFirst), m_pSecond(pSecond) { initializeLock(); } XapianDatabase::XapianDatabase(const XapianDatabase &other) : m_databaseName(other.m_databaseName), m_withSpelling(other.m_withSpelling), m_readOnly(other.m_readOnly), m_overwrite(other.m_overwrite), m_obsoleteFormat(other.m_obsoleteFormat), m_pDatabase(NULL), m_isOpen(other.m_isOpen), m_merge(other.m_merge), m_pFirst(other.m_pFirst), m_pSecond(other.m_pSecond) { initializeLock(); if (other.m_pDatabase != NULL) { m_pDatabase = new Xapian::Database(*other.m_pDatabase); } } XapianDatabase::~XapianDatabase() { if (m_pDatabase != NULL) { delete m_pDatabase; } pthread_mutex_destroy(&m_rwLock); } XapianDatabase &XapianDatabase::operator=(const XapianDatabase &other) { if (this != &other) { m_databaseName = other.m_databaseName; m_withSpelling = other.m_withSpelling; m_readOnly = other.m_readOnly; m_overwrite = other.m_overwrite; m_obsoleteFormat = other.m_obsoleteFormat; if (m_pDatabase != NULL) { delete m_pDatabase; m_pDatabase = NULL; } if (other.m_pDatabase != NULL) { m_pDatabase = new Xapian::Database(*other.m_pDatabase); } m_isOpen = other.m_isOpen; m_merge = other.m_merge; m_pFirst = other.m_pFirst; m_pSecond = other.m_pSecond; } return *this; } void XapianDatabase::initializeLock(void) { pthread_mutex_init(&m_rwLock, NULL); } void 
XapianDatabase::openDatabase(void) { struct stat dbStat; bool createDatabase = false; bool tryAgain = false; if (m_databaseName.empty() == true) { return; } // Should we build the spelling database ? char *pEnvVar = getenv("PINOT_SPELLING_DB"); if ((pEnvVar != NULL) && (strlen(pEnvVar) > 0) && (strncasecmp(pEnvVar, "N", 1) == 0)) { // No m_withSpelling = false; } else { // Yes m_withSpelling = true; } // Assume things will fail m_isOpen = false; if (m_pDatabase != NULL) { delete m_pDatabase; m_pDatabase = NULL; } // Is it a remote database ? string::size_type slashPos = m_databaseName.find("/"); string::size_type colonPos = m_databaseName.find(":"); if (((slashPos == string::npos) || (slashPos > 0)) && (colonPos != string::npos)) { Url urlObj(m_databaseName); // FIXME: in newer versions, the remote backend supports writing if (m_readOnly == false) { clog << "XapianDatabase::openDatabase: remote databases " << m_databaseName << " are read-only" << endl; return; } if (m_databaseName.find("://") == string::npos) { // It's an old style remote specification without the protocol urlObj = Url("tcpsrv://" + m_databaseName); } string hostName(urlObj.getHost()); // A port number should be included colonPos = hostName.find(":"); if (colonPos != string::npos) { string protocol(urlObj.getProtocol()); string portStr(hostName.substr(colonPos + 1)); unsigned int port = (unsigned int)atoi(portStr.c_str()); hostName.resize(colonPos); try { if (protocol == "progsrv+ssh") { string args("-p"); args += " "; args += portStr; args += " -f "; args += hostName; args += " xapian-progsrv /"; args += urlObj.getLocation(); args += "/"; args += urlObj.getFile(); #ifdef DEBUG clog << "XapianDatabase::openDatabase: remote ssh access with ssh " << args << endl; #endif Xapian::Database remoteDatabase = Xapian::Remote::open("ssh", args); m_pDatabase = new Xapian::Database(remoteDatabase); } else { #ifdef DEBUG clog << "XapianDatabase::openDatabase: remote database at " << hostName << " " << port << 
endl; #endif Xapian::Database remoteDatabase = Xapian::Remote::open(hostName, port); m_pDatabase = new Xapian::Database(remoteDatabase); } if (m_pDatabase != NULL) { // Stop remote databases timing out m_pDatabase->keep_alive(); m_isOpen = true; } return; } catch (const Xapian::Error &error) { clog << "Error opening " << m_databaseName << ": " << error.get_type() << ": " << error.get_msg() << endl; } } #ifdef DEBUG else clog << "XapianDatabase::openDatabase: invalid remote database at " << hostName << "/" << urlObj.getLocation() << "/" << urlObj.getFile() << endl; #endif return; } // It's a local database : the specified path must be a directory if (stat(m_databaseName.c_str(), &dbStat) == -1) { #ifdef DEBUG clog << "XapianDatabase::openDatabase: database " << m_databaseName << " doesn't exist" << endl; #endif // Database directory doesn't exist, create it #ifdef WIN32 if (mkdir(m_databaseName.c_str()) != 0) #else if (mkdir(m_databaseName.c_str(), (mode_t)(S_IRUSR|S_IWUSR|S_IXUSR|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH)) != 0) #endif { clog << "XapianDatabase::openDatabase: couldn't create database directory " << m_databaseName << endl; return; } createDatabase = true; } // Else, databases may be directories or file-based stub databases else if ((!S_ISDIR(dbStat.st_mode)) && (!S_ISREG(dbStat.st_mode))) { clog << "XapianDatabase::openDatabase: " << m_databaseName << " is neither a directory nor a file" << endl; return; } // Try opening it now, creating if if necessary try { if (m_readOnly == true) { if (createDatabase == true) { // We have to create the whole thing in read-write mode first Xapian::WritableDatabase *pTmpDatabase = new Xapian::WritableDatabase(m_databaseName, Xapian::DB_CREATE_OR_OPEN); // ...then close and open again in read-only mode delete pTmpDatabase; } m_pDatabase = new Xapian::Database(m_databaseName); } else { int openAction = Xapian::DB_CREATE_OR_OPEN; if (m_overwrite == true) { // An existing database will be overwritten openAction = 
Xapian::DB_CREATE_OR_OVERWRITE; } m_pDatabase = new Xapian::WritableDatabase(m_databaseName, openAction); } if (m_pDatabase != NULL) { #ifdef DEBUG clog << "XapianDatabase::openDatabase: opened " << m_databaseName << " " << m_pDatabase->get_description() << endl; #endif m_isOpen = true; } return; } #if XAPIAN_MAJOR_VERSION>0 catch (const Xapian::DatabaseVersionError &error) { clog << "Error opening " << m_databaseName << ": " << error.get_type() << ": " << error.get_msg() << endl; // This format is no longer supported if (m_obsoleteFormat == false) { tryAgain = true; } } #endif catch (const Xapian::Error &error) { clog << "Error opening " << m_databaseName << ": " << error.get_type() << ": " << error.get_msg() << endl; } // Give it another try ? if (tryAgain == true) { clog << "XapianDatabase::openDatabase: trying again" << endl; m_overwrite = true; m_obsoleteFormat = true; openDatabase(); } } /// Returns true if the database supports spelling. bool XapianDatabase::withSpelling(void) { return m_withSpelling; } /// Returns false if the database couldn't be opened. bool XapianDatabase::isOpen(void) const { return m_isOpen; } /// Returns true if the database is a merge of other databases. bool XapianDatabase::isMerge(void) const { return m_merge; } /// Returns false if the database isn't opened in write mode. bool XapianDatabase::isWritable(void) const { if ((m_isOpen == false) || (m_readOnly == true) || (m_merge == true)) { return false; } return true; } /// Returns false if the database was of an obsolete format. bool XapianDatabase::wasObsoleteFormat(void) const { return m_obsoleteFormat; } /// Reopens the database. void XapianDatabase::reopen(void) { // This is provided by Xapian::Database if (pthread_mutex_lock(&m_rwLock) == 0) { if (m_pDatabase != NULL) { m_pDatabase->reopen(); } pthread_mutex_unlock(&m_rwLock); } } /// Attempts to lock and retrieve the database. 
Xapian::Database *XapianDatabase::readLock(void)
{
	if (m_merge == false)
	{
		if (pthread_mutex_lock(&m_rwLock) == 0)
		{
			if (m_pDatabase == NULL)
			{
				// Try again
				openDatabase();
			}
			return m_pDatabase;
		}
#ifdef DEBUG
		else clog << "XapianDatabase::readLock: failed" << endl;
#endif
	}
	else
	{
		if ((m_pFirst == NULL) ||
			(m_pFirst->isOpen() == false) ||
			(m_pSecond == NULL) ||
			(m_pSecond->isOpen() == false))
		{
			return NULL;
		}

		if (pthread_mutex_lock(&m_rwLock) == 0)
		{
			// Reopen the second index
			m_pSecond->reopen();

			// Lock both indexes
			Xapian::Database *pFirstDatabase = m_pFirst->readLock();
			Xapian::Database *pSecondDatabase = m_pSecond->readLock();
			if ((pFirstDatabase != NULL) &&
				(pSecondDatabase != NULL))
			{
				// Copy the first one
				m_pDatabase = new Xapian::Database(*pFirstDatabase);
				// Add the second index to it
				m_pDatabase->add_database(*pSecondDatabase);
				// Until unlock() is called, both indexes are read locked
			}
			return m_pDatabase;
		}
#ifdef DEBUG
		else clog << "XapianDatabase::readLock: failed" << endl;
#endif
	}

	return NULL;
}

/// Attempts to lock and retrieve the database.
Xapian::WritableDatabase *XapianDatabase::writeLock(void)
{
	if ((m_readOnly == true) ||
		(m_merge == true))
	{
		clog << "Couldn't open read-only database " << m_databaseName << " for writing" << endl;
		return NULL;
	}

	if (pthread_mutex_lock(&m_rwLock) == 0)
	{
		if (m_pDatabase == NULL)
		{
			// Try again
			openDatabase();
		}
		return dynamic_cast<Xapian::WritableDatabase *>(m_pDatabase);
	}
#ifdef DEBUG
	else clog << "XapianDatabase::writeLock: failed" << endl;
#endif

	return NULL;
}

/// Unlocks the database.
void XapianDatabase::unlock(void)
{
	if (pthread_mutex_unlock(&m_rwLock) != 0)
	{
#ifdef DEBUG
		clog << "XapianDatabase::unlock: failed" << endl;
#endif
	}

	if (m_merge == true)
	{
		// Unlock the original indexes
		if (m_pFirst != NULL)
		{
			m_pFirst->unlock();
		}
		if (m_pSecond != NULL)
		{
			m_pSecond->unlock();
		}

		// Delete merge
		if (m_pDatabase != NULL)
		{
			delete m_pDatabase;
			m_pDatabase = NULL;
		}
	}
}

bool XapianDatabase::badRecordField(const string &field)
{
	bool isBadField = false;
#ifdef HAVE_REGEX_H
	regex_t fieldRegex;
	regmatch_t pFieldMatches[1];

	// A bad field is one that includes one of our field delimiters
	if (regcomp(&fieldRegex,
		"(url|ipath|sample|caption|type|modtime|language|size)=",
		REG_EXTENDED|REG_ICASE) == 0)
	{
		if (regexec(&fieldRegex, field.c_str(), 1,
			pFieldMatches, REG_NOTBOL|REG_NOTEOL) == 0)
		{
			isBadField = true;
		}

		// Only free the pattern if it was compiled successfully
		regfree(&fieldRegex);
	}
#else
	// A bad field is one that includes one of our field delimiters
	if ((field.find("url=") != string::npos) ||
		(field.find("ipath=") != string::npos) ||
		(field.find("sample=") != string::npos) ||
		(field.find("caption=") != string::npos) ||
		(field.find("type=") != string::npos) ||
		(field.find("modtime=") != string::npos) ||
		(field.find("language=") != string::npos) ||
		(field.find("size=") != string::npos))
	{
		isBadField = true;
	}
#endif

	return isBadField;
}

/// Returns a record for the document's properties.
string XapianDatabase::propsToRecord(DocumentInfo *pDoc)
{
	string record;

	if (pDoc == NULL)
	{
		return "";
	}

	if (g_pMapper != NULL)
	{
		g_pMapper->toRecord(pDoc, record);
	}

	string title(pDoc->getTitle());
	string timestamp(pDoc->getTimestamp());
	time_t timeT = TimeConverter::fromTimestamp(timestamp);

	// Set the document data omindex-style
	record += "url=";
	record += pDoc->getLocation();
	record += "\nipath=";
	record += Url::escapeUrl(pDoc->getInternalPath());
	// The sample will be generated at query time
	record += "\nsample=";
	record += "\ncaption=";
	if (badRecordField(title) == true)
	{
		// Modify the title if necessary
		string::size_type pos = title.find("=");
		while (pos != string::npos)
		{
			title[pos] = ' ';
			pos = title.find("=", pos + 1);
		}
#ifdef DEBUG
		clog << "XapianDatabase::propsToRecord: modified title" << endl;
#endif
	}
	record += title;
	record += "\ntype=";
	record += pDoc->getType();
	// Append a timestamp, in a format compatible with Omega
	record += "\nmodtime=";
	stringstream timeStream;
	timeStream << timeT;
	record += timeStream.str();
	// ...and the language
	record += "\nlanguage=";
	record += pDoc->getLanguage();
	// ...and the file size
	record += "\nsize=";
	stringstream sizeStream;
	sizeStream << pDoc->getSize();
	record += sizeStream.str();
#ifdef DEBUG
	clog << "XapianDatabase::propsToRecord: document data is " << record << endl;
#endif

	return record;
}

/// Sets the document's properties according to the record.
void XapianDatabase::recordToProps(const string &record, DocumentInfo *pDoc) { if (pDoc == NULL) { return; } if (g_pMapper != NULL) { g_pMapper->fromRecord(pDoc, record); } // Get the title pDoc->setTitle(StringManip::extractField(record, "caption=", "\n")); // Get the URL string url(StringManip::extractField(record, "url=", "\n")); if (url.empty() == false) { url = Url::canonicalizeUrl(url); } pDoc->setLocation(url); // Get the internal path string ipath(StringManip::extractField(record, "ipath=", "\n")); if (ipath.empty() == false) { ipath = Url::unescapeUrl(ipath); } pDoc->setInternalPath(ipath); // Get the type pDoc->setType(StringManip::extractField(record, "type=", "\n")); // ... the language, if available pDoc->setLanguage(StringManip::extractField(record, "language=", "\n")); // ... and the timestamp string modTime(StringManip::extractField(record, "modtime=", "\n")); if (modTime.empty() == false) { time_t timeT = (time_t )atol(modTime.c_str()); pDoc->setTimestamp(TimeConverter::toTimestamp(timeT)); } string bytesSize(StringManip::extractField(record, "size=", "")); if (bytesSize.empty() == false) { pDoc->setSize((off_t)atol(bytesSize.c_str())); } } /// Returns the URL for the given document in the given index. string XapianDatabase::buildUrl(const string &database, unsigned int docId) { stringstream docIdStream; // Make up a pseudo URL docIdStream << docId; string url = "xapian://localhost/"; url += database; url += "/"; url += docIdStream.str(); return url; } /// Truncates or partially hashes a term. 
string XapianDatabase::limitTermLength(const string &term, bool makeUnique) { if (term.length() > XapianDatabase::m_maxTermLength) { if (makeUnique == false) { // Truncate return term.substr(0, XapianDatabase::m_maxTermLength); } else { return StringManip::hashString(term, XapianDatabase::m_maxTermLength); } } return term; } pinot-1.22/IndexSearch/Xapian/XapianDatabase.h000066400000000000000000000060041470740426600212340ustar00rootroot00000000000000/* * Copyright 2005-2009 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _XAPIAN_DATABASE_H #define _XAPIAN_DATABASE_H #include #include #include #include #include "DocumentInfo.h" /// Lockable Xapian database. class XapianDatabase { public: XapianDatabase(const std::string &databaseName, bool readOnly = true, bool overwrite = false); XapianDatabase(const std::string &databaseName, XapianDatabase *pFirst, XapianDatabase *pSecond); XapianDatabase(const XapianDatabase &other); virtual ~XapianDatabase(); XapianDatabase &operator=(const XapianDatabase &other); /// Returns false if the database couldn't be opened. bool isOpen(void) const; /// Returns true if the database is a merge of other databases. bool isMerge(void) const; /// Returns false if the database isn't opened in write mode. 
		bool isWritable(void) const;

		/// Returns true if the database supports spelling.
		bool withSpelling(void);

		/// Returns false if the database was of an obsolete format.
		bool wasObsoleteFormat(void) const;

		/// Reopens the database.
		void reopen(void);

		/// Attempts to lock and retrieve the database.
		Xapian::Database *readLock(void);

		/// Attempts to lock and retrieve the database.
		Xapian::WritableDatabase *writeLock(void);

		/// Unlocks the database.
		void unlock(void);

		/// Returns a record for the document's properties.
		static std::string propsToRecord(DocumentInfo *pDoc);

		/// Sets the document's properties according to the record.
		static void recordToProps(const std::string &record, DocumentInfo *pDoc);

		/// Returns the URL for the given document in the given index.
		static std::string buildUrl(const std::string &database, unsigned int docId);

		/// Truncates or partially hashes a term.
		static std::string limitTermLength(const std::string &term, bool makeUnique = false);

	protected:
		static const unsigned int m_maxTermLength;
		std::string m_databaseName;
		bool m_withSpelling;
		bool m_readOnly;
		bool m_overwrite;
		bool m_obsoleteFormat;
		pthread_mutex_t m_rwLock;
		Xapian::Database *m_pDatabase;
		bool m_isOpen;
		bool m_merge;
		XapianDatabase *m_pFirst;
		XapianDatabase *m_pSecond;

		void initializeLock(void);

		void openDatabase(void);

		static bool badRecordField(const std::string &field);

};

#endif // _XAPIAN_DATABASE_H
pinot-1.22/IndexSearch/Xapian/XapianDatabaseFactory.cpp000066400000000000000000000114171470740426600231230ustar00rootroot00000000000000
/*
 * Copyright 2005-2009 Fabrice Colin
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 */

// Needed for std::clog and std::endl
#include <iostream>

#include "XapianDatabaseFactory.h"

using std::clog;
using std::endl;
using std::string;
using std::map;
using std::pair;

pthread_mutex_t XapianDatabaseFactory::m_mutex = PTHREAD_MUTEX_INITIALIZER;
map<string, XapianDatabase *> XapianDatabaseFactory::m_databases;
bool XapianDatabaseFactory::m_closed = false;

XapianDatabaseFactory::XapianDatabaseFactory()
{
}

XapianDatabaseFactory::~XapianDatabaseFactory()
{
}

/// Merges two databases together and add the result to the list.
bool XapianDatabaseFactory::mergeDatabases(const string &name,
	XapianDatabase *pFirst, XapianDatabase *pSecond)
{
	if (m_closed == true)
	{
		return false;
	}

	map<string, XapianDatabase *>::iterator dbIter = m_databases.find(name);
	if (dbIter != m_databases.end())
	{
		return false;
	}

	// Create the new database
	XapianDatabase *pDb = new XapianDatabase(name, pFirst, pSecond);

	// Insert it into the map
	pair<map<string, XapianDatabase *>::iterator, bool> insertPair =
		m_databases.insert(pair<string, XapianDatabase *>(name, pDb));
	// Was it inserted ?
	if (insertPair.second == false)
	{
		// No, it wasn't : delete the object
		delete pDb;

		return false;
	}

	return true;
}

/// Returns a XapianDatabase pointer; NULL if unavailable.
XapianDatabase *XapianDatabaseFactory::getDatabase(const string &location,
	bool readOnly, bool overwrite)
{
	XapianDatabase *pDb = NULL;

	if ((m_closed == true) ||
		(location.empty() == true))
	{
		return NULL;
	}

	// Lock the map
	if (pthread_mutex_lock(&m_mutex) != 0)
	{
		return NULL;
	}

	// Is the database already open ?
	map<string, XapianDatabase *>::iterator dbIter = m_databases.find(location);
	if (dbIter != m_databases.end())
	{
		pDb = dbIter->second;

		// Overwrite the database ?
		if (overwrite == true)
		{
			dbIter->second = NULL;
#ifdef DEBUG
			clog << "XapianDatabaseFactory::getDatabase: closing " << dbIter->first << endl;
#endif
			m_databases.erase(dbIter);
			delete pDb;

			dbIter = m_databases.end();
		}
	}

	// Open the database ?
	if (dbIter == m_databases.end())
	{
		// Create a new instance
		pDb = new XapianDatabase(location, readOnly, overwrite);

		// Insert it into the map
		pair<map<string, XapianDatabase *>::iterator, bool> insertPair =
			m_databases.insert(pair<string, XapianDatabase *>(location, pDb));
		// Was it inserted ?
		if (insertPair.second == false)
		{
			// No, it wasn't : delete the object
			delete pDb;
			pDb = NULL;
		}
	}

	// Unlock the map
	pthread_mutex_unlock(&m_mutex);

	return pDb;
}

/// Closes all databases.
void XapianDatabaseFactory::closeAll(void)
{
	if (m_databases.empty() == true)
	{
		return;
	}

	// Lock the map
	// FIXME: another thread may have a database and try and lock it after the loop below deletes it
	if (pthread_mutex_lock(&m_mutex) != 0)
	{
		return;
	}

	m_closed = true;

	// Close merged databases first
	std::map<std::string, XapianDatabase *>::iterator dbIter = m_databases.begin();
	while (dbIter != m_databases.end())
	{
		XapianDatabase *pDb = dbIter->second;

		if (pDb->isMerge() == false)
		{
			++dbIter;
			continue;
		}

		std::map<std::string, XapianDatabase *>::iterator nextIter = dbIter;
		++nextIter;
#ifdef DEBUG
		clog << "XapianDatabaseFactory::closeAll: closing " << dbIter->first << endl;
#endif

		// Remove from the map
		dbIter->second = NULL;
		m_databases.erase(dbIter);

		Xapian::Database *pIndex = pDb->readLock();
		pDb->unlock();
		// Close the database
		delete pDb;

		dbIter = nextIter;
	}

	// Now close all other databases
	dbIter = m_databases.begin();
	while (dbIter != m_databases.end())
	{
		XapianDatabase *pDb = dbIter->second;
		Xapian::Database *pIndex = NULL;
#ifdef DEBUG
		clog << "XapianDatabaseFactory::closeAll: closing " << dbIter->first << endl;
#endif

		// Remove from the map
		dbIter->second = NULL;
		m_databases.erase(dbIter);

		if (pDb->isWritable() == true)
		{
pIndex = pDb->writeLock(); } else { pIndex = pDb->readLock(); } pDb->unlock(); // Close the database delete pDb; dbIter = m_databases.begin(); } // Unlock the map pthread_mutex_unlock(&m_mutex); } pinot-1.22/IndexSearch/Xapian/XapianDatabaseFactory.h000066400000000000000000000034151470740426600225670ustar00rootroot00000000000000/* * Copyright 2005-2009 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _XAPIAN_DATABASE_FACTORY_H #define _XAPIAN_DATABASE_FACTORY_H #include #include #include #include "XapianDatabase.h" /// Factory for Xapian database objects. class XapianDatabaseFactory { public: virtual ~XapianDatabaseFactory(); /// Merges two databases together and add the result to the list. static bool mergeDatabases(const std::string &name, XapianDatabase *pFirst, XapianDatabase *pSecond); /// Returns a XapianDatabase pointer; NULL if unavailable. static XapianDatabase *getDatabase(const std::string &location, bool readOnly = true, bool overwrite = false); /// Closes all databases. 
		static void closeAll(void);

	protected:
		static pthread_mutex_t m_mutex;
		static std::map<std::string, XapianDatabase *> m_databases;
		static bool m_closed;

		XapianDatabaseFactory();

	private:
		XapianDatabaseFactory(const XapianDatabaseFactory &other);
		XapianDatabaseFactory &operator=(const XapianDatabaseFactory &other);

};

#endif // _XAPIAN_DATABASE_FACTORY_H
pinot-1.22/IndexSearch/Xapian/XapianEngine.cpp000066400000000000000000000723471470740426600213050ustar00rootroot00000000000000
/*
 * Copyright 2005-2021 Fabrice Colin
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*/ #include #include #include #include #include #include #include #include #include #include #include #include "config.h" #include "Languages.h" #include "StringManip.h" #include "TimeConverter.h" #include "Timer.h" #include "Url.h" #include "CJKVTokenizer.h" #include "FieldMapperInterface.h" #include "XapianDatabaseFactory.h" #include "XapianEngine.h" using std::string; using std::map; using std::multimap; using std::vector; using std::clog; using std::clog; using std::endl; using std::inserter; using std::getline; using std::ifstream; using namespace Dijon; extern FieldMapperInterface *g_pMapper; static void checkFilter(const string &freeQuery, string::size_type filterValueStart, bool &escapeValue, bool &hashValue) { string filterName; string::size_type filterNameStart = freeQuery.rfind(' ', filterValueStart); escapeValue = hashValue = false; if (filterNameStart == string::npos) { filterName = freeQuery.substr(0, filterValueStart); } else { filterName = freeQuery.substr(filterNameStart + 1, filterValueStart - filterNameStart - 1); } #ifdef DEBUG clog << "checkFilter: filter " << filterName << endl; #endif // In XapianIndex, these are escaped and hashed if ((filterName == "file") || (filterName =="dir") || (filterName == "url") || (filterName == "path")) { escapeValue = hashValue = true; } // except label which is only escaped else if (filterName == "label") { escapeValue = true; } else if (g_pMapper != NULL) { escapeValue = g_pMapper->isEscaped(filterName); } } class TimeValueRangeProcessor : public Xapian::RangeProcessor { public: TimeValueRangeProcessor(Xapian::valueno valueNumber) : Xapian::RangeProcessor(), m_valueNumber(valueNumber) { } virtual ~TimeValueRangeProcessor() { } virtual Xapian::Query operator()(const std::string &begin, const std::string &end) { if ((begin.size() == 6) && (end.size() == 6)) { // HHMMSS #ifdef DEBUG clog << "TimeValueRangeProcessor::operator: accepting " << begin << ".." 
				<< end << endl;
#endif
			return Xapian::Query(Xapian::Query::OP_VALUE_RANGE, m_valueNumber, begin, end);
		}
		// In HH:MM:SS, the separators are at offsets 2 and 5
		if ((begin.size() == 8) &&
			(end.size() == 8) &&
			(begin[2] == begin[5]) &&
			(end[2] == end[5]) &&
			(begin[2] == end[2]) &&
			(end[2] == ':'))
		{
			std::string lower(begin), upper(end);

			// HH:MM:SS
			// Once the first separator is erased, the second one is at offset 4
			lower.erase(2, 1);
			lower.erase(4, 1);
			upper.erase(2, 1);
			upper.erase(4, 1);
#ifdef DEBUG
			clog << "TimeValueRangeProcessor::operator: accepting " << lower << ".." << upper << endl;
#endif
			return Xapian::Query(Xapian::Query::OP_VALUE_RANGE, m_valueNumber, lower, upper);
		}
#ifdef DEBUG
		clog << "TimeValueRangeProcessor::operator: rejecting " << begin << ".." << end << endl;
#endif

		return Xapian::Query(Xapian::Query::OP_INVALID);
	}

	protected:
		Xapian::valueno m_valueNumber;

};

class TermDecider : public Xapian::ExpandDecider
{
	public:
		TermDecider(Xapian::Database *pIndex,
			Xapian::Stem *pStemmer,
			Xapian::Stopper *pStopper,
			const string &allowedPrefixes,
			Xapian::Query &query) :
			Xapian::ExpandDecider(),
			m_pIndex(pIndex),
			m_pStemmer(pStemmer),
			m_pStopper(pStopper),
			m_allowedPrefixes(allowedPrefixes),
			m_pTermsToAvoid(NULL)
		{
			m_pTermsToAvoid = new set<string>();

			for (Xapian::TermIterator termIter = query.get_terms_begin();
				termIter != query.get_terms_end(); ++termIter)
			{
				string term(*termIter);

				if (isupper((int)(term[0])) == 0)
				{
					m_pTermsToAvoid->insert(term);
					if (m_pStemmer != NULL)
					{
						string stem((*m_pStemmer)(term));
						m_pTermsToAvoid->insert(stem);
					}
				}
				else if (term[0] == 'Z')
				{
					m_pTermsToAvoid->insert(term.substr(1));
				}
			}
#ifdef DEBUG
			clog << "TermDecider: avoiding " << m_pTermsToAvoid->size() << " terms" << endl;
#endif
		}
		virtual ~TermDecider()
		{
			if (m_pTermsToAvoid != NULL)
			{
				delete m_pTermsToAvoid;
			}
		}

		virtual bool operator()(const std::string &term) const
		{
			CJKVTokenizer tokenizer;
			bool isPrefixed = false;

			// Reject short terms
			if ((tokenizer.has_cjkv(term) == false) &&
				(term.length() < 3))
			{
				return false;
			}

			// Reject terms with prefixes we don't want
			if (isupper((int)(term[0])) != 0)
			{
isPrefixed = true; if (m_allowedPrefixes.find(term[0]) == string::npos) { return false; } } // Reject terms with spaces if (term.find_first_of(" \t\r\n") != string::npos) { return false; } // Reject terms that occur only once if ((m_pIndex != NULL) && (m_pIndex->get_termfreq(term) <= 1)) { return false; } // Reject stop words if ((m_pStopper != NULL) && ((*m_pStopper)(term) == true)) { return false; } // Stop here if there's no specific terms to avoid if (m_pTermsToAvoid->empty() == true) { return true; } // Reject query terms if (m_pTermsToAvoid->find(term) != m_pTermsToAvoid->end()) { return false; } // Stop here is there's no stemmer if (m_pStemmer == NULL) { return true; } // Reject terms that stem to the same as query terms // or previously validated terms string stem; if (isPrefixed == true) { stem = (*m_pStemmer)(term.substr(1)); } else { stem = (*m_pStemmer)(term); } if (m_pTermsToAvoid->find(stem) != m_pTermsToAvoid->end()) { return false; } m_pTermsToAvoid->insert(stem); return true; } protected: Xapian::Database *m_pIndex; Xapian::Stem *m_pStemmer; Xapian::Stopper *m_pStopper; string m_allowedPrefixes; set *m_pTermsToAvoid; }; class FileStopper : public Xapian::SimpleStopper { public: FileStopper(const string &languageCode) : Xapian::SimpleStopper(), m_languageCode(languageCode), m_stopwordsCount(0) { if (languageCode.empty() == false) { ifstream inputFile; string fileName(PREFIX); fileName += "/share/pinot/stopwords/stopwords."; fileName += languageCode; inputFile.open(fileName.c_str()); if (inputFile.good() == true) { string line; // Each line is a stopword while (getline(inputFile, line).eof() == false) { add(line); ++m_stopwordsCount; } } inputFile.close(); #ifdef DEBUG clog << "FileStopper: " << m_stopwordsCount << " stopwords for language code " << languageCode << endl; #endif } } virtual ~FileStopper() { } unsigned int get_stopwords_count(void) const { return m_stopwordsCount; } static FileStopper *get_stopper(const string &languageCode) { if 
(m_pStopper == NULL) { m_pStopper = new FileStopper(languageCode); } else if (m_pStopper->m_languageCode != languageCode) { delete m_pStopper; m_pStopper = new FileStopper(languageCode); } return m_pStopper; } static void free_stopper(void) { if (m_pStopper != NULL) { delete m_pStopper; m_pStopper = NULL; } } protected: string m_languageCode; unsigned int m_stopwordsCount; static FileStopper *m_pStopper; }; FileStopper *FileStopper::m_pStopper = NULL; class QueryModifier : public Dijon::CJKVTokenizer::TokensHandler { public: typedef enum { NONE = 0, BRACKETS } CJKVWrap; QueryModifier(const string &query, unsigned int nGramSize) : m_query(query), m_pos(0), m_wrap(BRACKETS), m_wrapped(false), m_nGramCount(0), m_nGramSize(nGramSize), m_tokensCount(0), m_hasCJKV(false), m_hasNonCJKV(false) { } virtual ~QueryModifier() { } virtual bool handle_token(const string &tok, bool is_cjkv) { if (tok.empty() == true) { return false; } #ifdef DEBUG clog << "QueryModifier::handle_token: " << tok << endl; #endif // Where is this token in the original query ? string::size_type tokPos = m_query.find(tok, m_pos); ++m_tokensCount; // Is this CJKV ? if (is_cjkv == false) { char lastChar = tok[tok.length() - 1]; if (tokPos == string::npos) { // This should have been found return false; } if (m_nGramCount > 0) { wrapClose(); m_nGramCount = 0; m_pos = tokPos; } m_currentFilter.clear(); if (lastChar == '"') { // It's a quoted string m_wrap = NONE; } else if (lastChar == ':') { // It's a filter m_wrap = NONE; m_currentFilter = tok; } else { m_wrap = BRACKETS; } if (m_currentFilter.empty() == true) { m_hasNonCJKV = true; } // Strip accents and other diacritics from terms string unaccentedTok(Dijon::CJKVTokenizer::strip_marks(tok)); if (tok != unaccentedTok) { #ifdef DEBUG clog << "QueryModifier::handle_token: " << tok << " stripped to " << unaccentedTok << endl; #endif m_query.replace(tokPos, tok.length(), unaccentedTok); } // Return right away return true; } // First n-gram ? 
if (m_nGramCount == 0) { if (tokPos == string::npos) { // That's definitely not right return false; } // Append non-CJKV text that precedes and start wrapping CJKV tokens if (tokPos > m_pos) { m_modifiedQuery += " " + m_query.substr(m_pos, tokPos - m_pos); } m_pos += tok.length(); wrapOpen(); } else { m_modifiedQuery += " "; if (m_currentFilter.empty() == false) { m_modifiedQuery += m_currentFilter; } } m_modifiedQuery += tok; #ifdef DEBUG clog << "QueryModifier::handle_token: " << m_modifiedQuery << endl; #endif if (tokPos != string::npos) { m_pos = tokPos + tok.length(); } ++m_nGramCount; m_hasCJKV = true; return true; } unsigned int get_tokens_count(void) const { return m_tokensCount; } string get_modified_query(bool &pureCJKV) { #ifdef DEBUG clog << "QueryModifier::get_modified_query: " << m_pos << "/" << m_query.length() << endl; #endif // Anything left ? if (m_pos < m_query.length() - 1) { m_modifiedQuery += " " + m_query.substr(m_pos); } wrapClose(); #ifdef DEBUG clog << "QueryModifier::get_modified_query: " << m_modifiedQuery << endl; #endif if ((m_hasCJKV == true) && (m_hasNonCJKV == false)) { pureCJKV = true; } else { pureCJKV = false; } return m_modifiedQuery; } protected: string m_query; string m_modifiedQuery; string::size_type m_pos; CJKVWrap m_wrap; bool m_wrapped; string m_currentFilter; unsigned int m_nGramCount; unsigned int m_nGramSize; unsigned int m_tokensCount; bool m_hasCJKV; bool m_hasNonCJKV; void wrapOpen(void) { switch (m_wrap) { case BRACKETS: m_modifiedQuery += " ("; break; case NONE: default: break; } m_wrapped = true; } void wrapClose(void) { if (m_wrapped == false) { return; } // Finish wrapping CJKV tokens switch (m_wrap) { case BRACKETS: m_modifiedQuery += ')'; break; case NONE: default: break; } m_wrapped = false; } }; XapianEngine::XapianEngine(const string &database) : SearchEngineInterface() { // We expect documents to have been converted to UTF-8 at indexing time m_charset = "UTF-8"; // If the database name ends with a slash, 
remove it if ((database.empty() == false) && (database[database.length() - 1] == '/')) { m_databaseName = database.substr(0, database.length() - 1); } else { m_databaseName = database; } } XapianEngine::~XapianEngine() { } Xapian::Query XapianEngine::parseQuery(Xapian::Database *pIndex, const QueryProperties &queryProps, const string &stemLanguage, DefaultOperator defaultOperator, string &correctedFreeQuery, bool minimal) { Xapian::QueryParser parser; CJKVTokenizer tokenizer; string freeQuery(queryProps.getFreeQuery()); unsigned int tokensCount = 1; // Modifying the query is necessary as diacritics sensitivity is off QueryModifier handler(freeQuery, tokenizer.get_ngram_size()); tokenizer.tokenize(freeQuery, handler, true); tokensCount = handler.get_tokens_count(); // We can disable stemming and spelling correction for pure CJKV queries string cjkvQuery(handler.get_modified_query(minimal)); #ifdef DEBUG clog << "XapianEngine::parseQuery: CJKV query is " << cjkvQuery << endl; #endif // Do as if the user had given this as input freeQuery = cjkvQuery; #ifdef DEBUG clog << "XapianEngine::parseQuery: " << tokensCount << " tokens" << endl; #endif if (pIndex != NULL) { // The database is required for wildcards and spelling parser.set_database(*pIndex); } // Set things up if ((minimal == false) && (stemLanguage.empty() == false)) { parser.set_stemmer(m_stemmer); parser.set_stemming_strategy(Xapian::QueryParser::STEM_SOME); // Don't bother loading the stopwords list if there's only one token if (tokensCount > 1) { FileStopper *pStopper = FileStopper::get_stopper(Languages::toCode(stemLanguage)); if ((pStopper != NULL) && (pStopper->get_stopwords_count() > 0)) { parser.set_stopper(pStopper); } } } else { #ifdef DEBUG clog << "XapianEngine::parseQuery: no stemming" << endl; #endif parser.set_stemming_strategy(Xapian::QueryParser::STEM_NONE); } // What's the default operator ?
if (defaultOperator == DEFAULT_OP_AND) { parser.set_default_op(Xapian::Query::OP_AND); } else { parser.set_default_op(Xapian::Query::OP_OR); } // Search across text body and title parser.add_prefix("", ""); parser.add_prefix("", "S"); // X prefixes should always include a colon parser.add_boolean_prefix("site", "H"); parser.add_boolean_prefix("file", "P"); parser.add_boolean_prefix("ext", "E"); parser.add_prefix("title", "S"); parser.add_boolean_prefix("url", "U"); parser.add_boolean_prefix("dir", "XDIR:"); parser.add_boolean_prefix("inurl", "XFILE:"); parser.add_prefix("path", "XPATH:"); parser.add_boolean_prefix("lang", "L"); parser.add_boolean_prefix("type", "T"); parser.add_boolean_prefix("class", "XCLASS:"); parser.add_boolean_prefix("label", "XLABEL:"); parser.add_boolean_prefix("tokens", "XTOK:"); if (g_pMapper != NULL) { map filters; g_pMapper->getBooleanFilters(filters); for (map::const_iterator filterIter = filters.begin(); filterIter != filters.end(); ++filterIter) { parser.add_boolean_prefix(filterIter->first, filterIter->second); } } // Date range Xapian::DateRangeProcessor dateProcessor(0); parser.add_rangeprocessor(&dateProcessor); // Size with a "b" suffix, ie 1024..10240b Xapian::NumberRangeProcessor sizeProcessor(2, "b", Xapian::RP_SUFFIX); parser.add_rangeprocessor(&sizeProcessor); // Time range TimeValueRangeProcessor timeProcessor(3); parser.add_rangeprocessor(&timeProcessor); // Do some pre-processing : look for filters with quoted values string::size_type escapedFilterEnd = 0; string::size_type escapedFilterStart = freeQuery.find(":\""); while ((escapedFilterStart != string::npos) && (escapedFilterStart < freeQuery.length() - 2)) { escapedFilterEnd = freeQuery.find("\"", escapedFilterStart + 2); if (escapedFilterEnd == string::npos) { break; } string filterValue = freeQuery.substr(escapedFilterStart + 2, escapedFilterEnd - escapedFilterStart - 2); if (filterValue.empty() == false) { string escapedValue(Url::escapeUrl(filterValue)); bool 
escapeValue = false, hashValue = false; // The value should be escaped and length-limited as done at indexing time checkFilter(freeQuery, escapedFilterStart, escapeValue, hashValue); if (escapeValue == false) { // No escaping escapedValue = filterValue; } if (hashValue == true) { // Partially hash if necessary escapedValue = XapianDatabase::limitTermLength(escapedValue, true); } else { escapedValue = XapianDatabase::limitTermLength(escapedValue); } #ifdef DEBUG clog << "XapianEngine::parseQuery: escaping to " << escapedValue << endl; #endif freeQuery.replace(escapedFilterStart + 1, escapedFilterEnd - escapedFilterStart, escapedValue); escapedFilterEnd = escapedFilterEnd + escapedValue.length() - filterValue.length(); } else { // No value ! freeQuery.replace(escapedFilterStart, escapedFilterEnd - escapedFilterStart + 1, ":"); escapedFilterEnd -= 2; } #ifdef DEBUG clog << "XapianEngine::parseQuery: replaced filter: " << freeQuery << endl; #endif // Next escapedFilterStart = freeQuery.find(":\"", escapedFilterEnd); } // Parse the query string with all necessary options unsigned int flags = Xapian::QueryParser::FLAG_BOOLEAN|Xapian::QueryParser::FLAG_PHRASE| Xapian::QueryParser::FLAG_LOVEHATE|Xapian::QueryParser::FLAG_PURE_NOT; if (minimal == false) { flags |= Xapian::QueryParser::FLAG_WILDCARD; flags |= Xapian::QueryParser::FLAG_SPELLING_CORRECTION; } Xapian::Query parsedQuery = parser.parse_query(freeQuery, flags); #ifdef DEBUG clog << "XapianEngine::parseQuery: query is " << parsedQuery.get_description() << endl; #endif // Any limit on what documents should be searched ? if (m_limitDocuments.empty() == false) { Xapian::Query filterQuery(Xapian::Query::OP_OR, m_limitDocuments.begin(), m_limitDocuments.end()); parsedQuery = Xapian::Query(Xapian::Query::OP_FILTER, parsedQuery, filterQuery); #ifdef DEBUG clog << "XapianEngine::parseQuery: limited query is " << parsedQuery.get_description() << endl; #endif } if (minimal == false) { // Any correction ? 
correctedFreeQuery = parser.get_corrected_query_string(); #ifdef DEBUG if (correctedFreeQuery.empty() == false) { clog << "XapianEngine::parseQuery: corrected spelling to: " << correctedFreeQuery << endl; } #endif } return parsedQuery; } bool XapianEngine::queryDatabase(Xapian::Database *pIndex, Xapian::Query &query, const string &stemLanguage, unsigned int startDoc, const QueryProperties &queryProps) { Timer timer; unsigned int maxResultsCount = queryProps.getMaximumResultsCount(); bool completedQuery = false; if (pIndex == NULL) { return false; } // Start an enquire session on the database Xapian::Enquire enquire(*pIndex); timer.start(); try { // Give the query object to the enquire session enquire.set_query(query); // How should results be sorted ? if (queryProps.getSortOrder() == QueryProperties::RELEVANCE) { // By relevance, then date enquire.set_sort_by_relevance_then_value(4, true); #ifdef DEBUG clog << "XapianEngine::queryDatabase: sorting by relevance first" << endl; #endif } else if (queryProps.getSortOrder() == QueryProperties::DATE_DESC) { // By date, and then by relevance enquire.set_docid_order(Xapian::Enquire::DONT_CARE); enquire.set_sort_by_value_then_relevance(4, true); #ifdef DEBUG clog << "XapianEngine::queryDatabase: sorting by date and time desc" << endl; #endif } else if (queryProps.getSortOrder() == QueryProperties::DATE_ASC) { // By date, and then by relevance enquire.set_docid_order(Xapian::Enquire::DONT_CARE); enquire.set_sort_by_value_then_relevance(5, true); #ifdef DEBUG clog << "XapianEngine::queryDatabase: sorting by date and time asc" << endl; #endif } else if (queryProps.getSortOrder() == QueryProperties::SIZE_DESC) { // By size, and then by relevance enquire.set_docid_order(Xapian::Enquire::DONT_CARE); enquire.set_sort_by_value_then_relevance(2, true); #ifdef DEBUG clog << "XapianEngine::queryDatabase: sorting by size desc" << endl; #endif } // Collapse results ?
if (g_pMapper != NULL) { unsigned int valueNumber; if (g_pMapper->collapseOnValue(valueNumber) == true) { enquire.set_collapse_key(valueNumber, 1); } } // Get the top results of the query Xapian::MSet matches = enquire.get_mset(startDoc, maxResultsCount, (2 * maxResultsCount) + 1); m_resultsCountEstimate = matches.get_matches_estimated(); if (matches.empty() == false) { #ifdef DEBUG clog << "XapianEngine::queryDatabase: found " << matches.size() << "/" << maxResultsCount << " results from position " << startDoc << endl; clog << "XapianEngine::queryDatabase: estimated " << matches.get_matches_lower_bound() << "/" << m_resultsCountEstimate << "/" << matches.get_matches_upper_bound() << ", " << matches.get_description() << endl; #endif // Get the results for (Xapian::MSetIterator mIter = matches.begin(); mIter != matches.end(); ++mIter) { Xapian::docid docId = *mIter; Xapian::Document doc(mIter.get_document()); bool hasCJKV = false; if (docId <= 0) { #ifdef DEBUG clog << "XapianEngine::queryDatabase: bogus document ID " << docId << endl; #endif continue; } DocumentInfo thisResult; string docText(getDocumentText(pIndex, docId, hasCJKV)); unsigned int flags = Xapian::MSet::SNIPPET_BACKGROUND_MODEL|Xapian::MSet::SNIPPET_EXHAUSTIVE; if (hasCJKV == true) { flags |= Xapian::MSet::SNIPPET_CJK_NGRAM; } if (stemLanguage.empty() == true) { thisResult.setExtract(matches.snippet(docText, 300, Xapian::Stem(), flags)); } else { thisResult.setExtract(matches.snippet(docText, 300, m_stemmer, flags)); } thisResult.setScore((float)mIter.get_percent()); #ifdef DEBUG clog << "XapianEngine::queryDatabase: found document ID " << docId << endl; #endif XapianDatabase::recordToProps(doc.get_data(), &thisResult); // XapianDatabase stored the language in English thisResult.setLanguage(Languages::toLocale(thisResult.getLanguage())); string url(thisResult.getLocation()); if (url.empty() == true) { // Hmmm this shouldn't be empty...
// Use this instead, even though the document isn't cached in the index thisResult.setLocation(XapianDatabase::buildUrl(m_databaseName, docId)); } // We don't know the index ID, just the document ID thisResult.setIsIndexed(0, docId); // Add this result m_resultsList.push_back(thisResult); } } completedQuery = true; } catch (const Xapian::Error &error) { clog << "Couldn't run query: " << error.get_type() << ": " << error.get_msg() << endl; } clog << "Ran query \"" << queryProps.getFreeQuery() << "\" in " << timer.stop() << " ms" << endl; try { m_expandTerms.clear(); // Expand the query ? if (m_expandDocuments.empty() == false) { Xapian::RSet expandDocs; for (set::const_iterator docIter = m_expandDocuments.begin(); docIter != m_expandDocuments.end(); ++docIter) { string uniqueTerm(string("U") + XapianDatabase::limitTermLength(Url::escapeUrl(Url::canonicalizeUrl(*docIter)), true)); // Only one document may have this term Xapian::PostingIterator postingIter = pIndex->postlist_begin(uniqueTerm); if (postingIter != pIndex->postlist_end(uniqueTerm)) { expandDocs.add_document(*postingIter); } } #ifdef DEBUG clog << "XapianEngine::queryDatabase: expand from " << expandDocs.size() << " documents" << endl; #endif // Get 10 non-prefixed terms string allowedPrefixes("RS"); TermDecider expandDecider(pIndex, ((stemLanguage.empty() == true) ? NULL : &m_stemmer), FileStopper::get_stopper(Languages::toCode(stemLanguage)), allowedPrefixes, query); Xapian::ESet expandTerms = enquire.get_eset(10, expandDocs, &expandDecider); #ifdef DEBUG clog << "XapianEngine::queryDatabase: " << expandTerms.size() << " expand terms" << endl; #endif for (Xapian::ESetIterator termIter = expandTerms.begin(); termIter != expandTerms.end(); ++termIter) { string expandTerm(*termIter); char firstChar = expandTerm[0]; // Is this prefixed ? 
if (allowedPrefixes.find(firstChar) != string::npos) { expandTerm.erase(0, 1); } m_expandTerms.insert(expandTerm); } } } catch (const Xapian::Error &error) { clog << "Couldn't run query: " << error.get_type() << ": " << error.get_msg() << endl; } // Be tolerant of errors as long as we got some results if ((completedQuery == true) || (m_resultsList.empty() == false)) { return true; } return false; } /// Frees all objects. void XapianEngine::freeAll(void) { FileStopper::free_stopper(); } string XapianEngine::getDocumentText(Xapian::Database *pIndex, Xapian::docid docId, bool &hasCJKV) { map wordsBuffer; CJKVTokenizer tokenizer; try { // Go through the position list of each term for (Xapian::TermIterator termIter = pIndex->termlist_begin(docId); termIter != pIndex->termlist_end(docId); ++termIter) { string termName(*termIter); // Skip prefixed terms if (isupper((int)termName[0]) != 0) { if (termName == "XTOK:CJKV") { hasCJKV = true; } continue; } // Skip multi-character CJKV terms if ((tokenizer.has_cjkv(termName) == true) && (termName.length() > 4)) { continue; } for (Xapian::PositionIterator positionIter = pIndex->positionlist_begin(docId, termName); positionIter != pIndex->positionlist_end(docId, termName); ++positionIter) { Xapian::termpos termPos = *positionIter; // If several terms exist at this position, prefer the shortest one map::const_iterator wordIter = wordsBuffer.find(termPos); if ((wordIter == wordsBuffer.end()) || (wordIter->second.length() > termName.length())) { wordsBuffer[termPos] = termName; } } } } catch (const Xapian::Error &error) { #ifdef DEBUG clog << "XapianEngine::getDocumentText: " << error.get_msg() << endl; #endif } string docText; for (map::const_iterator wordIter = wordsBuffer.begin(); wordIter != wordsBuffer.end(); ++wordIter) { docText += " "; docText += wordIter->second; } return docText; } // // Implementation of SearchEngineInterface // /// Sets the set of documents to limit to. 
bool XapianEngine::setLimitSet(const set &docsSet) { for (set::const_iterator docIter = docsSet.begin(); docIter != docsSet.end(); ++docIter) { string urlFilter("U"); // Escape and hash urlFilter += XapianDatabase::limitTermLength(Url::escapeUrl(*docIter), true); m_limitDocuments.insert(urlFilter); } #ifdef DEBUG clog << "XapianEngine::setLimitSet: " << m_limitDocuments.size() << " documents" << endl; #endif return true; } /// Sets the set of documents to expand from. bool XapianEngine::setExpandSet(const set &docsSet) { copy(docsSet.begin(), docsSet.end(), inserter(m_expandDocuments, m_expandDocuments.begin())); #ifdef DEBUG clog << "XapianEngine::setExpandSet: " << m_expandDocuments.size() << " documents" << endl; #endif return true; } /// Runs a query; true if success. bool XapianEngine::runQuery(QueryProperties& queryProps, unsigned int startDoc) { string stemLanguage(Languages::toEnglish(queryProps.getStemmingLanguage())); // Clear the results list m_resultsList.clear(); m_resultsCountEstimate = 0; m_correctedFreeQuery.clear(); if (queryProps.isEmpty() == true) { #ifdef DEBUG clog << "XapianEngine::runQuery: query is empty" << endl; #endif return false; } XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName, true); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return false; } if ((stemLanguage.empty() == false) && (stemLanguage != "unknown")) { #ifdef DEBUG clog << "XapianEngine::runQuery: " << stemLanguage << " stemming" << endl; #endif try { m_stemmer = Xapian::Stem(StringManip::toLowerCase(stemLanguage)); } catch (const Xapian::Error &error) { clog << "Couldn't create stemmer: " << error.get_type() << ": " << error.get_msg() << endl; } } // Get the latest revision... pDatabase->reopen(); Xapian::Database *pIndex = pDatabase->readLock(); try { unsigned int searchStep = 1; // Searches are run in this order : // 1. no stemming, exact matches only // 2. 
stem terms if a language is defined for the query Xapian::Query fullQuery = parseQuery(pIndex, queryProps, "", m_defaultOperator, m_correctedFreeQuery); while (fullQuery.empty() == false) { // Query the database if (queryDatabase(pIndex, fullQuery, stemLanguage, startDoc, queryProps) == false) { break; } if (m_resultsList.empty() == true) { // The search did succeed but didn't return anything if ((searchStep == 1) && (stemLanguage.empty() == false)) { #ifdef DEBUG clog << "XapianEngine::runQuery: trying again with stemming" << endl; #endif fullQuery = parseQuery(pIndex, queryProps, stemLanguage, m_defaultOperator, m_correctedFreeQuery); ++searchStep; continue; } } else { // We have results, don't bother about correcting the query m_correctedFreeQuery.clear(); } pDatabase->unlock(); return true; } } catch (const Xapian::Error &error) { clog << "Couldn't run query: " << error.get_type() << ": " << error.get_msg() << endl; } pDatabase->unlock(); return false; } pinot-1.22/IndexSearch/Xapian/XapianEngine.h000066400000000000000000000044131470740426600207370ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #ifndef _XAPIAN_ENGINE_H #define _XAPIAN_ENGINE_H #include #include #include #include #include "config.h" #include "SearchEngineInterface.h" /// Wraps Xapian's search functionality. class XapianEngine : public SearchEngineInterface { public: XapianEngine(const std::string &database); virtual ~XapianEngine(); /// Frees all objects. static void freeAll(void); /// Sets the set of documents to limit to. virtual bool setLimitSet(const std::set<std::string> &docsSet); /// Sets the set of documents to expand from. virtual bool setExpandSet(const std::set<std::string> &docsSet); /// Runs a query; true if success. virtual bool runQuery(QueryProperties& queryProps, unsigned int startDoc = 0); protected: std::string m_databaseName; std::set<std::string> m_limitDocuments; std::set<std::string> m_expandDocuments; Xapian::Stem m_stemmer; static std::string getDocumentText(Xapian::Database *pIndex, Xapian::docid docId, bool &hasCJKV); bool queryDatabase(Xapian::Database *pIndex, Xapian::Query &query, const std::string &stemLanguage, unsigned int startDoc, const QueryProperties &queryProps); Xapian::Query parseQuery(Xapian::Database *pIndex, const QueryProperties &queryProps, const std::string &stemLanguage, DefaultOperator defaultOperator, std::string &correctedFreeQuery, bool minimal = false); private: XapianEngine(const XapianEngine &other); XapianEngine &operator=(const XapianEngine &other); }; #endif // _XAPIAN_ENGINE_H pinot-1.22/IndexSearch/Xapian/XapianIndex.cpp000066400000000000000000001532171470740426600211430ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "Languages.h" #include "StringManip.h" #include "TimeConverter.h" #include "Url.h" #include "FieldMapperInterface.h" #include "LanguageDetector.h" #include "XapianDatabaseFactory.h" #include "XapianIndex.h" #define MAGIC_TERM "X-MetaSE-Doc" using std::clog; using std::endl; using std::ios; using std::ifstream; using std::ofstream; using std::string; using std::vector; using std::set; using std::map; using std::min; using std::max; using std::pair; extern FieldMapperInterface *g_pMapper; class TokensIndexer : public Dijon::CJKVTokenizer::TokensHandler { public: TokensIndexer(Xapian::Stem *pStemmer, Xapian::Document &doc, const Xapian::WritableDatabase &db, const string &prefix, unsigned int nGramSize, bool &doSpelling, Xapian::termcount &termPos) : Dijon::CJKVTokenizer::TokensHandler(), m_pStemmer(pStemmer), m_doc(doc), m_db(db), m_prefix(prefix), m_nGramSize(nGramSize), m_nGramCount(0), m_doSpelling(doSpelling), m_termPos(termPos), m_hasCJKV(false) { } virtual ~TokensIndexer() { if (m_hasCJKV == true) { // This will help identify CJKV documents m_doc.add_term("XTOK:CJKV"); } } virtual bool handle_token(const string &tok, bool is_cjkv) { bool addSpelling = false; if (tok.empty() == true) { return false; } // Lower case the term and trim spaces string term(StringManip::toLowerCase(tok)); StringManip::trimSpaces(term); if (term.empty() == true) { return true; } // Does it end with a dot ?
if (term[term.length() - 1] == '.') { bool foundNonDot = false; string::size_type pos = term.length() - 1; while (pos >= 0) { if (term[pos] != '.') { foundNonDot = true; // Any dot before that ? if ((pos == 0) || (term.find_last_of(".", pos - 1) == string::npos)) { // No, all dots are at the end, trim them term.erase(pos + 1); } // Else, it's probably an acronym break; } if (pos == 0) { break; } --pos; } if (foundNonDot == false) { // It's all dots ! return true; } } m_doc.add_posting(m_prefix + XapianDatabase::limitTermLength(term), m_termPos); // Is this CJKV ? if (is_cjkv == false) { bool hasDiacritics = false; // Remove accents and other diacritics string unaccentedTerm(Dijon::CJKVTokenizer::strip_marks(term)); if (unaccentedTerm != term) { m_doc.add_posting(m_prefix + XapianDatabase::limitTermLength(unaccentedTerm), m_termPos); hasDiacritics = true; } // Don't stem if the term starts with a digit if ((m_pStemmer != NULL) && (isdigit((int)term[0]) == 0)) { string stemmedTerm((*m_pStemmer)(term)); m_doc.add_term("Z" + XapianDatabase::limitTermLength(stemmedTerm)); if (hasDiacritics == true) { stemmedTerm = (*m_pStemmer)(unaccentedTerm); m_doc.add_term("Z" + XapianDatabase::limitTermLength(stemmedTerm)); } } // Does it include dots ? 
string::size_type dotPos = term.find('.'); if (dotPos != string::npos) { string::size_type startPos = 0; bool addRemainder = true; while (dotPos != string::npos) { string component(term.substr(startPos, dotPos - startPos)); if (component.empty() == false) { m_doc.add_posting(m_prefix + XapianDatabase::limitTermLength(component), m_termPos); ++m_termPos; } // Next if (dotPos == term.length() - 1) { addRemainder = false; break; } startPos = dotPos + 1; dotPos = term.find('.', startPos); } if (addRemainder == true) { string lastComponent(term.substr(startPos)); m_doc.add_posting(m_prefix + XapianDatabase::limitTermLength(lastComponent), m_termPos); } } addSpelling = m_doSpelling; ++m_termPos; m_nGramCount = 0; } else { if (m_nGramCount % m_nGramSize == 0) { ++m_termPos; } else if ((m_nGramCount + 1) % m_nGramSize == 0) { addSpelling = m_doSpelling; } ++m_nGramCount; m_hasCJKV = true; } if (addSpelling == true) { try { m_db.add_spelling(XapianDatabase::limitTermLength(term)); } catch (const Xapian::UnimplementedError &error) { clog << "Couldn't index with spelling correction: " << error.get_type() << ": " << error.get_msg() << endl; m_doSpelling = false; } } return true; } protected: Xapian::Stem *m_pStemmer; Xapian::Document &m_doc; const Xapian::WritableDatabase &m_db; string m_prefix; unsigned int m_nGramSize; unsigned int m_nGramCount; bool &m_doSpelling; Xapian::termcount &m_termPos; bool m_hasCJKV; }; XapianIndex::XapianIndex(const string &indexName) : IndexInterface(), m_databaseName(indexName), m_goodIndex(false), m_doSpelling(true) { // Open in read-only mode XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName); if ((pDatabase != NULL) && (pDatabase->isOpen() == true)) { m_goodIndex = true; m_doSpelling = pDatabase->withSpelling(); } } XapianIndex::XapianIndex(const XapianIndex &other) : IndexInterface(other), m_databaseName(other.m_databaseName), m_goodIndex(other .m_goodIndex), m_doSpelling(other.m_doSpelling), 
m_stemLanguage(other.m_stemLanguage) { } XapianIndex::~XapianIndex() { } XapianIndex &XapianIndex::operator=(const XapianIndex &other) { if (this != &other) { IndexInterface::operator=(other); m_databaseName = other.m_databaseName; m_goodIndex = other.m_goodIndex; m_doSpelling = other.m_doSpelling; m_stemLanguage = other.m_stemLanguage; } return *this; } bool XapianIndex::listDocumentsWithTerm(const string &term, set &docIds, unsigned int maxDocsCount, unsigned int startDoc) const { unsigned int docCount = 0; XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return false; } docIds.clear(); try { Xapian::Database *pIndex = pDatabase->readLock(); if (pIndex != NULL) { #ifdef DEBUG clog << "XapianIndex::listDocumentsWithTerm: term " << term << endl; #endif // Get a list of documents that have the term for (Xapian::PostingIterator postingIter = pIndex->postlist_begin(term); (postingIter != pIndex->postlist_end(term)) && ((maxDocsCount == 0) || (docIds.size() < maxDocsCount)); ++postingIter) { Xapian::docid docId = *postingIter; // We cannot use postingIter->skip_to() because startDoc isn't an ID if (docCount >= startDoc) { docIds.insert(docId); } ++docCount; } } } catch (const Xapian::Error &error) { clog << "Couldn't get document list: " << error.get_type() << ": " << error.get_msg() << endl; } catch (...) { clog << "Couldn't get document list, unknown exception occurred" << endl; } pDatabase->unlock(); return (docIds.empty() == false); } void XapianIndex::addPostingsToDocument(const Xapian::Utf8Iterator &itor, Xapian::Document &doc, const Xapian::WritableDatabase &db, const string &prefix, bool noStemming, bool &doSpelling, Xapian::termcount &termPos) const { Xapian::Stem *pStemmer = NULL; // Do we know what language to use for stemming ?
if ((noStemming == false) && (m_stemLanguage.empty() == false) && (m_stemLanguage != "unknown")) { try { pStemmer = new Xapian::Stem(StringManip::toLowerCase(m_stemLanguage)); } catch (const Xapian::Error &error) { clog << "Couldn't create stemmer: " << error.get_type() << ": " << error.get_msg() << endl; } } const char *pRawData = itor.raw(); if (pRawData != NULL) { #ifndef _TERM_GEN Dijon::CJKVTokenizer tokenizer; string text(pRawData); // Use overload addPostingsToDocument(tokenizer, pStemmer, text, doc, db, prefix, doSpelling, termPos); #else Xapian::TermGenerator generator; // Set the stemmer if (pStemmer != NULL) { generator.set_stemmer(*pStemmer); } generator.set_termpos(termPos); try { // Older Xapian backends don't support spelling correction if (doSpelling == true) { // The database is required for the spelling dictionary generator.set_flags(Xapian::TermGenerator::FLAG_SPELLING); generator.set_database(db); } generator.set_document(doc); generator.index_text(itor, 1, prefix); } catch (const Xapian::UnimplementedError &error) { clog << "Couldn't index with spelling correction: " << error.get_type() << ": " << error.get_msg() << endl; if (doSpelling == true) { doSpelling = false; // Try again without spelling correction // Let the caller catch the exception generator.set_flags(Xapian::TermGenerator::FLAG_SPELLING, Xapian::TermGenerator::FLAG_SPELLING); generator.set_document(doc); generator.index_text(itor, 1, prefix); } } termPos = generator.get_termpos(); #endif } if (pStemmer != NULL) { delete pStemmer; } } void XapianIndex::addPostingsToDocument(Dijon::CJKVTokenizer &tokenizer, Xapian::Stem *pStemmer, const string &text, Xapian::Document &doc, const Xapian::WritableDatabase &db, const string &prefix, bool &doSpelling, Xapian::termcount &termPos) const { TokensIndexer handler(pStemmer, doc, db, prefix, tokenizer.get_ngram_size(), doSpelling, termPos); // Get the terms tokenizer.tokenize(text, handler, true); #ifdef DEBUG clog << 
"XapianIndex::addPostingsToDocument: terms to position " << termPos << endl; #endif } void XapianIndex::addLabelsToDocument(Xapian::Document &doc, const set &labels, bool skipInternals) { if (labels.empty() == true) { return; } for (set::const_iterator labelIter = labels.begin(); labelIter != labels.end(); ++labelIter) { string labelName(*labelIter); // Prevent from setting internal labels ? if ((labelName.empty() == true) || ((skipInternals == true) && (labelName.substr(0, 2) == "X-"))) { continue; } #ifdef DEBUG clog << "XapianIndex::addLabelsToDocument: label \"" << labelName << "\"" << endl; #endif doc.add_term(string("XLABEL:") + XapianDatabase::limitTermLength(Url::escapeUrl(labelName))); } } void XapianIndex::removePostingsFromDocument(const Xapian::Utf8Iterator &itor, Xapian::Document &doc, const Xapian::WritableDatabase &db, const string &prefix, bool noStemming, bool &doSpelling) const { Xapian::Document termsDoc; Xapian::termcount termPos = 0; bool addDoSpelling = false; // Get the terms, without populating the spelling database addPostingsToDocument(itor, termsDoc, db, prefix, noStemming, addDoSpelling, termPos); // Get the terms and remove the first posting for each for (Xapian::TermIterator termListIter = termsDoc.termlist_begin(); termListIter != termsDoc.termlist_end(); ++termListIter) { Xapian::termcount postingsCount = termListIter.positionlist_count(); Xapian::termcount postingNum = 0; bool removeTerm = false; #ifdef DEBUG clog << "XapianIndex::removePostingsFromDocument: term " << *termListIter << " has " << postingsCount << " postings" << endl; #endif // If a prefix is defined, or there are no postings, we can afford removing the term if ((prefix.empty() == false) || (postingsCount == 0)) { removeTerm = true; } else { // Check whether this term is in the original document and how many postings it has Xapian::TermIterator termIter = doc.termlist_begin(); if (termIter != doc.termlist_end()) { termIter.skip_to(*termListIter); if (termIter != 
doc.termlist_end()) { if (*termIter != *termListIter) { // This term doesn't exist in the document ! #ifdef DEBUG clog << "XapianIndex::removePostingsFromDocument: no such term" << endl; #endif continue; } if (termIter.positionlist_count() <= postingsCount) { // All postings are to be removed, so we can remove the term #ifdef DEBUG clog << "XapianIndex::removePostingsFromDocument: no extra posting" << endl; #endif removeTerm = true; } } } } if (removeTerm == true) { try { doc.remove_term(*termListIter); } catch (const Xapian::Error &error) { #ifdef DEBUG clog << "XapianIndex::removePostingsFromDocument: " << error.get_msg() << endl; #endif } try { // Decrease this term's frequency in the spelling dictionary if (doSpelling == true) { db.remove_spelling(*termListIter); } } catch (const Xapian::UnimplementedError &error) { clog << "Couldn't remove spelling correction: " << error.get_type() << ": " << error.get_msg() << endl; doSpelling = false; } catch (const Xapian::Error &error) { #ifdef DEBUG clog << "XapianIndex::removePostingsFromDocument: " << error.get_msg() << endl; #endif } continue; } // Otherwise, remove the first N postings // FIXME: if all the postings are in the range associated with the metadata // as opposed to the actual data, the term can be removed altogether for (Xapian::PositionIterator firstPosIter = termListIter.positionlist_begin(); firstPosIter != termListIter.positionlist_end(); ++firstPosIter) { if (postingNum >= postingsCount) { break; } ++postingNum; try { doc.remove_posting(*termListIter, *firstPosIter); } catch (const Xapian::Error &error) { // This posting may have been removed already #ifdef DEBUG clog << "XapianIndex::removePostingsFromDocument: " << error.get_msg() << endl; #endif } } } } void XapianIndex::addCommonTerms(const DocumentInfo &docInfo, Xapian::Document &doc, const Xapian::WritableDatabase &db, Xapian::termcount &termPos) { string title(docInfo.getTitle()); string location(docInfo.getLocation()); string 
type(docInfo.getType(false)); Url urlObj(location); // Add a magic term :-) doc.add_term(MAGIC_TERM); // Index the title with prefix S if (title.empty() == false) { addPostingsToDocument(Xapian::Utf8Iterator(title), doc, db, "S", false, m_doSpelling, termPos); } string hostName, tree, fileName; if (g_pMapper != NULL) { hostName = g_pMapper->getHost(docInfo); tree = g_pMapper->getDirectory(docInfo); fileName = g_pMapper->getFile(docInfo); } else { hostName = StringManip::toLowerCase(urlObj.getHost()); tree = urlObj.getLocation(); fileName = urlObj.getFile(); } #ifdef DEBUG clog << "XapianIndex::addCommonTerms: called for " << docInfo.getLocation() << " (" << docInfo.getInternalPath() << ")" << endl; #endif // Index the full URL with prefix U doc.add_term(string("U") + XapianDatabase::limitTermLength(Url::escapeUrl(docInfo.getLocation(true)), true)); // And for containers, the base file with XFILE: if ((urlObj.isLocal() == true) && (docInfo.getInternalPath().empty() == false)) { string protocol(urlObj.getProtocol()); doc.add_term(string("XFILE:") + XapianDatabase::limitTermLength(Url::escapeUrl(location), true)); if ((urlObj.isLocal() == true) && (protocol != "file")) { string fileUrl(location); // Add another term with file as protocol fileUrl.replace(0, protocol.length(), "file"); doc.add_term(string("XFILE:") + XapianDatabase::limitTermLength(Url::escapeUrl(fileUrl), true)); } } // ...the host name and included domains with prefix H if (hostName.empty() == false) { doc.add_term(string("H") + XapianDatabase::limitTermLength(hostName, true)); string::size_type dotPos = hostName.find('.'); while (dotPos != string::npos) { doc.add_term(string("H") + XapianDatabase::limitTermLength(hostName.substr(dotPos + 1), true)); // Next dotPos = hostName.find('.', dotPos + 1); } } // ...the location (as is) and all directories with prefix XDIR: if (tree.empty() == false) { if ((urlObj.isLocal() == true) && (docInfo.getIsDirectory() == true)) { doc.add_term(string("XDIR:") + 
XapianDatabase::limitTermLength(Url::escapeUrl(docInfo.getLocation().substr(7)), true)); #ifdef DEBUG clog << "XapianIndex::addCommonTerms: full XDIR" << docInfo.getLocation().substr(7) << endl; #endif } doc.add_term(string("XDIR:") + XapianDatabase::limitTermLength(Url::escapeUrl(tree), true)); #ifdef DEBUG clog << "XapianIndex::addCommonTerms: first XDIR" << tree << endl; #endif if (tree[0] == '/') { doc.add_term("XDIR:/"); #ifdef DEBUG clog << "XapianIndex::addCommonTerms: top-level XDIR" << endl; #endif } string::size_type slashPos = tree.find('/', 1); while (slashPos != string::npos) { doc.add_term(string("XDIR:") + XapianDatabase::limitTermLength(Url::escapeUrl(tree.substr(0, slashPos)), true)); #ifdef DEBUG clog << "XapianIndex::addCommonTerms: component XDIR" << tree.substr(0, slashPos) << endl; #endif // Next slashPos = tree.find('/', slashPos + 1); } // ...and all components as XPATH: bool doSpellingOnPaths = false; addPostingsToDocument(Xapian::Utf8Iterator(tree), doc, db, "XPATH:", true, doSpellingOnPaths, termPos); } else { doc.add_term("XDIR:/"); #ifdef DEBUG clog << "XapianIndex::addCommonTerms: single top-level XDIR" << endl; #endif } // ...and the file name with prefix P if (fileName.empty() == false) { string extension; doc.add_term(string("P") + XapianDatabase::limitTermLength(Url::escapeUrl(fileName), true)); if (fileName.find(' ') != string::npos) { bool doSpellingOnPaths = false; // Add more XPATH: terms if there's a space in the file name addPostingsToDocument(Xapian::Utf8Iterator(fileName), doc, db, "XPATH:", true, doSpellingOnPaths, termPos); } // Does it have an extension ? 
		string::size_type extPos = fileName.rfind('.');
		if ((extPos != string::npos) &&
			(extPos + 1 < fileName.length()))
		{
			extension = StringManip::toLowerCase(fileName.substr(extPos + 1));
		}
		doc.add_term(string("E") + XapianDatabase::limitTermLength(extension));
	}

	// Add the language code with prefix L
	doc.add_term(string("L") + Languages::toCode(m_stemLanguage));
	// ...and the MIME type with prefix T
	doc.add_term(string("T") + type);
	string::size_type slashPos = type.find('/');
	if (slashPos != string::npos)
	{
		doc.add_term(string("XCLASS:") + type.substr(0, slashPos));
	}

	// Others
	if (g_pMapper != NULL)
	{
		vector<pair<string, string> > prefixedTerms;

		g_pMapper->getTerms(docInfo, prefixedTerms);
		for (vector<pair<string, string> >::const_iterator termIter = prefixedTerms.begin();
			termIter != prefixedTerms.end(); ++termIter)
		{
			doc.add_term(termIter->second + XapianDatabase::limitTermLength(termIter->first));
		}
	}
}

void XapianIndex::removeCommonTerms(Xapian::Document &doc, const Xapian::WritableDatabase &db)
{
	DocumentInfo docInfo;
	set<string> commonTerms;
	string record(doc.get_data());

	// First, remove the magic term
	commonTerms.insert(MAGIC_TERM);

	if (record.empty() == true)
	{
		// Nothing else we can do
		return;
	}

	XapianDatabase::recordToProps(record, &docInfo);
	// XapianDatabase expects the language in English, which is okay here
	string language(docInfo.getLanguage());
	Url urlObj(docInfo.getLocation());

	// FIXME: remove terms extracted from the title if they don't have more than one posting
	string title(docInfo.getTitle());
	if (title.empty() == false)
	{
		removePostingsFromDocument(Xapian::Utf8Iterator(title), doc, db, "S", false, m_doSpelling);
	}

	// Location
	string location(docInfo.getLocation());
	commonTerms.insert(string("U") + XapianDatabase::limitTermLength(Url::escapeUrl(docInfo.getLocation(true)), true));
	// Containers' base file
	if ((urlObj.isLocal() == true) &&
		(docInfo.getInternalPath().empty() == false))
	{
		string protocol(urlObj.getProtocol());

		commonTerms.insert(string("XFILE:") +
XapianDatabase::limitTermLength(Url::escapeUrl(location), true)); if ((urlObj.isLocal() == true) && (protocol != "file")) { string fileUrl(location); // Add another term with file as protocol fileUrl.replace(0, protocol.length(), "file"); commonTerms.insert(string("XFILE:") + XapianDatabase::limitTermLength(Url::escapeUrl(fileUrl), true)); } } // Host name string hostName(StringManip::toLowerCase(urlObj.getHost())); if (hostName.empty() == false) { commonTerms.insert(string("H") + XapianDatabase::limitTermLength(hostName, true)); string::size_type dotPos = hostName.find('.'); while (dotPos != string::npos) { commonTerms.insert(string("H") + XapianDatabase::limitTermLength(hostName.substr(dotPos + 1), true)); // Next dotPos = hostName.find('.', dotPos + 1); } } // ...location string tree(urlObj.getLocation()); if (tree.empty() == false) { if ((urlObj.isLocal() == true) && (docInfo.getIsDirectory() == true)) { commonTerms.insert(string("XDIR:") + XapianDatabase::limitTermLength(Url::escapeUrl(docInfo.getLocation().substr(7)), true)); } commonTerms.insert(string("XDIR:") + XapianDatabase::limitTermLength(Url::escapeUrl(tree), true)); if (tree[0] == '/') { commonTerms.insert("XDIR:/"); } string::size_type slashPos = tree.find('/', 1); while (slashPos != string::npos) { commonTerms.insert(string("XDIR:") + XapianDatabase::limitTermLength(Url::escapeUrl(tree.substr(0, slashPos)), true)); // Next slashPos = tree.find('/', slashPos + 1); } // ...paths bool doSpellingOnPaths = false; removePostingsFromDocument(Xapian::Utf8Iterator(tree), doc, db, "XPATH:", true, doSpellingOnPaths); } else { commonTerms.insert("XDIR:/"); } // ...and file name string fileName(urlObj.getFile()); if (fileName.empty() == false) { string extension; commonTerms.insert(string("P") + XapianDatabase::limitTermLength(Url::escapeUrl(fileName), true)); if (fileName.find(' ') != string::npos) { bool doSpellingOnPaths = false; removePostingsFromDocument(Xapian::Utf8Iterator(fileName), doc, db, "XPATH:", 
				true, doSpellingOnPaths);
		}

		// Does it have an extension ?
		string::size_type extPos = fileName.rfind('.');
		if ((extPos != string::npos) &&
			(extPos + 1 < fileName.length()))
		{
			extension = StringManip::toLowerCase(fileName.substr(extPos + 1));
		}
		commonTerms.insert(string("E") + XapianDatabase::limitTermLength(extension));
	}

	// Language code
	commonTerms.insert(string("L") + Languages::toCode(language));
	// MIME type
	string type(docInfo.getType(false));
	commonTerms.insert(string("T") + type);
	string::size_type slashPos = type.find('/');
	if (slashPos != string::npos)
	{
		commonTerms.insert(string("XCLASS:") + type.substr(0, slashPos));
	}

	// Others
	if (g_pMapper != NULL)
	{
		vector<pair<string, string> > prefixedTerms;

		g_pMapper->getTerms(docInfo, prefixedTerms);
		for (vector<pair<string, string> >::const_iterator termIter = prefixedTerms.begin();
			termIter != prefixedTerms.end(); ++termIter)
		{
			commonTerms.insert(termIter->second + XapianDatabase::limitTermLength(termIter->first));
		}
	}

	for (set<string>::const_iterator termIter = commonTerms.begin(); termIter != commonTerms.end();
		++termIter)
	{
		try
		{
			doc.remove_term(*termIter);
		}
		catch (const Xapian::Error &error)
		{
#ifdef DEBUG
			clog << "XapianIndex::removeCommonTerms: " << error.get_msg() << endl;
#endif
		}
	}
}

string XapianIndex::scanDocument(const string &suggestedLanguage, const char *pData, off_t dataLength)
{
	vector<string> candidates;
	string language;
	bool scannedDocument = false;

	if (suggestedLanguage.empty() == false)
	{
		// See first if this is suitable
		candidates.push_back(suggestedLanguage);
	}
	else
	{
		// Try to determine the document's language right away
		// Scan at most the first 2048 bytes, and never more than is available
		LanguageDetector::getInstance().guessLanguage(pData, min(dataLength, (off_t)2048), candidates);
		scannedDocument = true;
	}

	// See which of these languages is suitable for stemming
	vector<string>::iterator langIter = candidates.begin();
	while (langIter != candidates.end())
	{
		if (*langIter == "unknown")
		{
			++langIter;
			continue;
		}

		try
		{
			Xapian::Stem stemmer(StringManip::toLowerCase(*langIter));
		}
		catch (const Xapian::Error &error)
		{
			clog
				<< "Invalid language: " << error.get_type() << ": " << error.get_msg() << endl;

			if (scannedDocument == false)
			{
				// The suggested language is not suitable
				candidates.clear();
				// Scan at most the first 2048 bytes, and never more than is available
				LanguageDetector::getInstance().guessLanguage(pData, min(dataLength, (off_t)2048), candidates);
				langIter = candidates.begin();
				scannedDocument = true;
			}
			else
			{
				++langIter;
			}
			continue;
		}

		language = *langIter;
		break;
	}
#ifdef DEBUG
	clog << "XapianIndex::scanDocument: language " << language << endl;
#endif

	return language;
}

void XapianIndex::setDocumentData(const DocumentInfo &docInfo, Xapian::Document &doc,
	const string &language) const
{
	time_t timeT = TimeConverter::fromTimestamp(docInfo.getTimestamp());
	struct tm *tm = localtime(&timeT);
	string yyyymmdd(TimeConverter::toYYYYMMDDString(tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday));
	string hhmmss(TimeConverter::toHHMMSSString(tm->tm_hour, tm->tm_min, tm->tm_sec));

	// Date
	doc.add_value(0, yyyymmdd);
	// FIXME: checksum in value 1
	// Size
	doc.add_value(2, Xapian::sortable_serialise((double )docInfo.getSize()));
	// Time
	doc.add_value(3, hhmmss);
	// Date and time, for results sorting
	doc.add_value(4, yyyymmdd + hhmmss);
	// Number of seconds to January 1st, 10000
	doc.add_value(5, Xapian::sortable_serialise((double )253402300800 - timeT));
	// Any custom value ?
	if (g_pMapper != NULL)
	{
		map<unsigned int, string> values;

		g_pMapper->getValues(docInfo, values);
		for (map<unsigned int, string>::const_iterator valIter = values.begin();
			valIter != values.end(); ++valIter)
		{
			doc.add_value(valIter->first, valIter->second);
		}
	}

	DocumentInfo docCopy(docInfo);
	// XapianDatabase expects the language in English, which is okay here
	docCopy.setLanguage(language);
	doc.set_data(XapianDatabase::propsToRecord(&docCopy));
}

bool XapianIndex::deleteDocuments(const string &term)
{
	bool unindexed = false;

	if (term.empty() == true)
	{
		return false;
	}

	XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName, false);
	if (pDatabase == NULL)
	{
		clog << "Couldn't get index " << m_databaseName << endl;
		return false;
	}

	try
	{
		Xapian::WritableDatabase *pIndex = pDatabase->writeLock();
		if (pIndex != NULL)
		{
#ifdef DEBUG
			clog << "XapianIndex::deleteDocuments: term is " << term << endl;
#endif

			// Delete documents from the index
			pIndex->delete_document(term);

			unindexed = true;
		}
	}
	catch (const Xapian::Error &error)
	{
		clog << "Couldn't unindex documents: " << error.get_type() << ": " << error.get_msg() << endl;
	}
	catch (...)
	{
		clog << "Couldn't unindex documents, unknown exception occurred" << endl;
	}
	pDatabase->unlock();

	return unindexed;
}

//
// Implementation of IndexInterface
//

/// Returns false if the index couldn't be opened.
bool XapianIndex::isGood(void) const
{
	return m_goodIndex;
}

/// Gets metadata.
string XapianIndex::getMetadata(const string &name) const { string metadataValue; XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return ""; } try { Xapian::Database *pIndex = pDatabase->readLock(); if (pIndex != NULL) { // If this index type doesn't support metadata, no exception will be thrown // We will just get an empty string metadataValue = pIndex->get_metadata(name); } } catch (const Xapian::Error &error) { clog << "Couldn't get metadata: " << error.get_type() << ": " << error.get_msg() << endl; } catch (...) { clog << "Couldn't get metadata, unknown exception occurred" << endl; } pDatabase->unlock(); return metadataValue; } /// Sets metadata. bool XapianIndex::setMetadata(const string &name, const string &value) const { bool setMetadata = false; XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName, false); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return false; } try { Xapian::WritableDatabase *pIndex = pDatabase->writeLock(); if (pIndex != NULL) { pIndex->set_metadata(name, value); setMetadata = true; } } catch (const Xapian::UnimplementedError &error) { clog << "Couldn't set metadata: " << error.get_type() << ": " << error.get_msg() << endl; } catch (const Xapian::Error &error) { clog << "Couldn't set metadata: " << error.get_type() << ": " << error.get_msg() << endl; } catch (...) { clog << "Couldn't set metadata, unknown exception occurred" << endl; } pDatabase->unlock(); return setMetadata; } /// Gets the index location. string XapianIndex::getLocation(void) const { return m_databaseName; } /// Returns a document's properties. 
bool XapianIndex::getDocumentInfo(unsigned int docId, DocumentInfo &docInfo) const { bool foundDocument = false; if (docId == 0) { return false; } XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return false; } try { Xapian::Database *pIndex = pDatabase->readLock(); if (pIndex != NULL) { Xapian::Document doc = pIndex->get_document(docId); string record(doc.get_data()); // Get the current document data if (record.empty() == false) { XapianDatabase::recordToProps(record, &docInfo); // XapianDatabase stored the language in English docInfo.setLanguage(Languages::toLocale(docInfo.getLanguage())); foundDocument = true; } } } catch (const Xapian::Error &error) { clog << "Couldn't get document properties: " << error.get_type() << ": " << error.get_msg() << endl; } catch (...) { clog << "Couldn't get document properties, unknown exception occurred" << endl; } pDatabase->unlock(); return foundDocument; } /// Returns a document's terms count. unsigned int XapianIndex::getDocumentTermsCount(unsigned int docId) const { unsigned int termsCount = 0; XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return 0; } try { Xapian::Database *pIndex = pDatabase->readLock(); if (pIndex != NULL) { Xapian::Document doc = pIndex->get_document(docId); termsCount = doc.termlist_count(); #ifdef DEBUG clog << "XapianIndex::getDocumentTermsCount: " << termsCount << " terms in document " << docId << endl; #endif } } catch (const Xapian::Error &error) { clog << "Couldn't get document terms count: " << error.get_type() << ": " << error.get_msg() << endl; } catch (...) { clog << "Couldn't get document terms count, unknown exception occurred" << endl; } pDatabase->unlock(); return termsCount; } /// Returns a document's terms. 
bool XapianIndex::getDocumentTerms(unsigned int docId, map<unsigned int, string> &wordsBuffer) const
{
	vector<string> noPosTerms;
	bool gotTerms = false;

	XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName);
	if (pDatabase == NULL)
	{
		clog << "Couldn't get index " << m_databaseName << endl;
		return false;
	}

	try
	{
		Xapian::Database *pIndex = pDatabase->readLock();
		if (pIndex != NULL)
		{
			unsigned int lastPos = 0;

			// Go through the position list of each term
			for (Xapian::TermIterator termIter = pIndex->termlist_begin(docId);
				termIter != pIndex->termlist_end(docId); ++termIter)
			{
				string termName(*termIter);
				char firstChar = termName[0];
				bool hasPositions = false;

				// Is it prefixed ?
				// Cast to unsigned char: terms may hold UTF-8 bytes above 0x7F
				if (isupper((unsigned char)firstChar) != 0)
				{
					// Skip X-prefixed terms
					if (firstChar == 'X')
					{
#ifdef DEBUG
						clog << "XapianIndex::getDocumentTerms: skipping " << termName << endl;
#endif
						continue;
					}

					// Keep other prefixed terms (S, U, H, P, L, T...)
					termName.erase(0, 1);
				}

				for (Xapian::PositionIterator positionIter = pIndex->positionlist_begin(docId, *termIter);
					positionIter != pIndex->positionlist_end(docId, *termIter); ++positionIter)
				{
					wordsBuffer[*positionIter] = termName;
					if (*positionIter > lastPos)
					{
						lastPos = *positionIter;
					}
					hasPositions = true;
				}

				if (hasPositions == false)
				{
					noPosTerms.push_back(termName);
				}

				gotTerms = true;
			}

			// Append terms without positional information as if they were at the end of the document
			// Increment first so that the term at the last known position isn't overwritten
			for (vector<string>::const_iterator noPosIter = noPosTerms.begin();
				noPosIter != noPosTerms.end(); ++noPosIter)
			{
				++lastPos;
				wordsBuffer[lastPos] = *noPosIter;
			}
		}
	}
	catch (const Xapian::Error &error)
	{
		clog << "Couldn't get document terms: " << error.get_type() << ": " << error.get_msg() << endl;
	}
	catch (...)
	{
		clog << "Couldn't get document terms, unknown exception occurred" << endl;
	}
	pDatabase->unlock();

	return gotTerms;
}

/// Sets the list of known labels.
bool XapianIndex::setLabels(const set<string> &labels, bool resetLabels)
{
	string labelsString;

	// Whether labels are reset or not doesn't make any difference
	for (set<string>::const_iterator labelIter = labels.begin(); labelIter != labels.end();
		++labelIter)
	{
		// Prevent from setting internal labels
		if (labelIter->substr(0, 2) == "X-")
		{
			continue;
		}

		labelsString += "[";
		labelsString += Url::escapeUrl(*labelIter);
		labelsString += "]";
	}

	return setMetadata("labels", labelsString);
}

/// Gets the list of known labels.
bool XapianIndex::getLabels(set<string> &labels) const
{
	string labelsString(getMetadata("labels"));

	if (labelsString.empty() == true)
	{
		return false;
	}

	string::size_type endPos = 0;
	string label(StringManip::extractField(labelsString, "[", "]", endPos));

	while (label.empty() == false)
	{
		labels.insert(Url::unescapeUrl(label));

		if (endPos == string::npos)
		{
			break;
		}
		label = StringManip::extractField(labelsString, "[", "]", endPos);
	}

	return true;
}

/// Adds a label.
bool XapianIndex::addLabel(const string &name)
{
	set<string> labels;

	if (getLabels(labels) == true)
	{
		labels.insert(name);
		if (setLabels(labels, true) == true)
		{
			return true;
		}
	}

	return false;
}

/// Deletes all references to a label.
bool XapianIndex::deleteLabel(const string &name) { bool deletedLabel = false; // Prevent from deleting internal labels if (name.substr(0, 2) == "X-") { return false; } XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName, false); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return false; } try { Xapian::WritableDatabase *pIndex = pDatabase->writeLock(); if (pIndex != NULL) { string term("XLABEL:"); // Get documents that have this label term += XapianDatabase::limitTermLength(Url::escapeUrl(name)); for (Xapian::PostingIterator postingIter = pIndex->postlist_begin(term); postingIter != pIndex->postlist_end(term); ++postingIter) { Xapian::docid docId = *postingIter; // Get the document Xapian::Document doc = pIndex->get_document(docId); // Remove the term doc.remove_term(term); // ...and update the document pIndex->replace_document(docId, doc); } deletedLabel = true; } } catch (const Xapian::Error &error) { clog << "Couldn't delete label: " << error.get_type() << ": " << error.get_msg() << endl; } catch (...) { clog << "Couldn't delete label, unknown exception occurred" << endl; } pDatabase->unlock(); return deletedLabel; } /// Determines whether a document has a label. bool XapianIndex::hasLabel(unsigned int docId, const string &name) const { bool foundLabel = false; XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return false; } try { Xapian::Database *pIndex = pDatabase->readLock(); if (pIndex != NULL) { string term("XLABEL:"); // Get documents that have this label // FIXME: would it be faster to get the document's terms ? term += XapianDatabase::limitTermLength(Url::escapeUrl(name)); Xapian::PostingIterator postingIter = pIndex->postlist_begin(term); if (postingIter != pIndex->postlist_end(term)) { // Is this document in the list ? 
				postingIter.skip_to(docId);
				if ((postingIter != pIndex->postlist_end(term)) &&
					(docId == (*postingIter)))
				{
					foundLabel = true;
				}
			}
		}
	}
	catch (const Xapian::Error &error)
	{
		clog << "Couldn't check document labels: " << error.get_type() << ": " << error.get_msg() << endl;
	}
	catch (...)
	{
		clog << "Couldn't check document labels, unknown exception occurred" << endl;
	}
	pDatabase->unlock();

	return foundLabel;
}

/// Returns a document's labels.
bool XapianIndex::getDocumentLabels(unsigned int docId, set<string> &labels) const
{
	bool gotLabels = false;

	XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName);
	if (pDatabase == NULL)
	{
		clog << "Couldn't get index " << m_databaseName << endl;
		return false;
	}

	labels.clear();
	try
	{
		Xapian::Database *pIndex = pDatabase->readLock();
		if (pIndex != NULL)
		{
			Xapian::TermIterator termIter = pIndex->termlist_begin(docId);
			if (termIter != pIndex->termlist_end(docId))
			{
				for (termIter.skip_to("XLABEL:"); termIter != pIndex->termlist_end(docId);
					++termIter)
				{
					if ((*termIter).length() < 7)
					{
						break;
					}

					// Is this a label ?
					if (strncasecmp((*termIter).c_str(), "XLABEL:", min(7, (int)(*termIter).length())) == 0)
					{
						labels.insert(Url::unescapeUrl((*termIter).substr(7)));
					}
				}
				gotLabels = true;
			}
		}
	}
	catch (const Xapian::Error &error)
	{
		clog << "Couldn't get document's labels: " << error.get_type() << ": " << error.get_msg() << endl;
	}
	catch (...)
	{
		clog << "Couldn't get document's labels, unknown exception occurred" << endl;
	}
	pDatabase->unlock();

	return gotLabels;
}

/// Sets a document's labels.
bool XapianIndex::setDocumentLabels(unsigned int docId, const set<string> &labels,
	bool resetLabels)
{
	set<unsigned int> docIds;

	docIds.insert(docId);
	return setDocumentsLabels(docIds, labels, resetLabels);
}

/// Sets documents' labels.
bool XapianIndex::setDocumentsLabels(const set<unsigned int> &docIds,
	const set<string> &labels, bool resetLabels)
{
	bool updatedLabels = false;

	XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName, false);
	if (pDatabase == NULL)
	{
		clog << "Couldn't get index " << m_databaseName << endl;
		return false;
	}

	for (set<unsigned int>::const_iterator docIter = docIds.begin();
		docIter != docIds.end(); ++docIter)
	{
		try
		{
			Xapian::WritableDatabase *pIndex = pDatabase->writeLock();
			if (pIndex == NULL)
			{
				break;
			}

			unsigned int docId = (*docIter);
			Xapian::Document doc = pIndex->get_document(docId);

			// Reset existing labels ?
			if (resetLabels == true)
			{
				Xapian::TermIterator termIter = pIndex->termlist_begin(docId);
				if (termIter != pIndex->termlist_end(docId))
				{
					for (termIter.skip_to("XLABEL:"); termIter != pIndex->termlist_end(docId);
						++termIter)
					{
						string term(*termIter);

						// Is this a non-internal label ?
						if ((strncasecmp(term.c_str(), "XLABEL:", min(7, (int)term.length())) == 0) &&
							(strncasecmp(term.c_str(), "XLABEL:X-", min(9, (int)term.length())) != 0))
						{
							doc.remove_term(term);
						}
					}
				}
			}

			// Set new labels
			addLabelsToDocument(doc, labels, true);

			pIndex->replace_document(docId, doc);
			updatedLabels = true;
		}
		catch (const Xapian::Error &error)
		{
			clog << "Couldn't update document's labels: " << error.get_type() << ": " << error.get_msg() << endl;
		}
		catch (...)
		{
			clog << "Couldn't update document's labels, unknown exception occurred" << endl;
		}

		pDatabase->unlock();
	}

	return updatedLabels;
}

/// Checks whether the given URL is in the index.
unsigned int XapianIndex::hasDocument(const string &url) const { unsigned int docId = 0; XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return 0; } try { Xapian::Database *pIndex = pDatabase->readLock(); if (pIndex != NULL) { string term = string("U") + XapianDatabase::limitTermLength(Url::escapeUrl(Url::canonicalizeUrl(url)), true); // Get documents that have this term Xapian::PostingIterator postingIter = pIndex->postlist_begin(term); if (postingIter != pIndex->postlist_end(term)) { // This URL was indexed docId = *postingIter; #ifdef DEBUG clog << "XapianIndex::hasDocument: " << term << " in document " << docId << " " << postingIter.get_wdf() << " time(s)" << endl; #endif } // FIXME: what if the term exists in more than one document ? } } catch (const Xapian::Error &error) { clog << "Couldn't look for document: " << error.get_type() << ": " << error.get_msg() << endl; } catch (...) { clog << "Couldn't look for document, unknown exception occurred" << endl; } pDatabase->unlock(); return docId; } /// Gets terms with the same root. 
unsigned int XapianIndex::getCloseTerms(const string &term, set<string> &suggestions)
{
	Dijon::CJKVTokenizer tokenizer;

	// Only offer suggestions for non CJKV terms
	if (tokenizer.has_cjkv(term) == true)
	{
		return 0;
	}

	XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName);
	if (pDatabase == NULL)
	{
		clog << "Couldn't get index " << m_databaseName << endl;
		return 0;
	}

	suggestions.clear();
	try
	{
		Xapian::Database *pIndex = pDatabase->readLock();
		if (pIndex != NULL)
		{
			Xapian::TermIterator termIter = pIndex->allterms_begin();

			if (termIter != pIndex->allterms_end())
			{
				string baseTerm(StringManip::toLowerCase(term));
				unsigned int count = 0;

				// Get the next 10 terms
				for (termIter.skip_to(baseTerm);
					(termIter != pIndex->allterms_end()) && (count < 10); ++termIter)
				{
					string suggestedTerm(*termIter);

					// Does this term have the same root ?
					if (suggestedTerm.find(baseTerm) != 0)
					{
						break;
					}

					suggestions.insert(suggestedTerm);
					++count;
				}
			}
		}
	}
	catch (const Xapian::Error &error)
	{
		clog << "Couldn't get terms: " << error.get_type() << ": " << error.get_msg() << endl;
	}
	catch (...)
	{
		clog << "Couldn't get terms, unknown exception occurred" << endl;
	}
	pDatabase->unlock();

	return suggestions.size();
}

/// Returns the ID of the last document.
unsigned int XapianIndex::getLastDocumentID(void) const
{
	unsigned int docId = 0;

	XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName);
	if (pDatabase == NULL)
	{
		clog << "Couldn't get index " << m_databaseName << endl;
		return 0;
	}

	try
	{
		Xapian::Database *pIndex = pDatabase->readLock();
		if (pIndex != NULL)
		{
			docId = pIndex->get_lastdocid();
		}
	}
	catch (const Xapian::Error &error)
	{
		clog << "Couldn't get last document ID: " << error.get_type() << ": " << error.get_msg() << endl;
	}
	catch (...)
	{
		clog << "Couldn't get last document ID, unknown exception occurred" << endl;
	}
	pDatabase->unlock();

	return docId;
}

/// Returns the number of documents.
unsigned int XapianIndex::getDocumentsCount(const string &labelName) const
{
	unsigned int docCount = 0;

	XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName);
	if (pDatabase == NULL)
	{
		clog << "Couldn't get index " << m_databaseName << endl;
		return 0;
	}

	try
	{
		Xapian::Database *pIndex = pDatabase->readLock();
		if (pIndex != NULL)
		{
			if (labelName.empty() == true)
			{
				docCount = pIndex->get_doccount();
			}
			else
			{
				string term("XLABEL:");

				// Each label appears only once per document, so the collection frequency
				// is the number of documents that have this label
				term += XapianDatabase::limitTermLength(Url::escapeUrl(labelName));
				docCount = pIndex->get_collection_freq(term);
			}
		}
	}
	catch (const Xapian::Error &error)
	{
		clog << "Couldn't count documents: " << error.get_type() << ": " << error.get_msg() << endl;
	}
	catch (...)
	{
		clog << "Couldn't count documents, unknown exception occurred" << endl;
	}
	pDatabase->unlock();

	return docCount;
}

/// Lists document IDs.
unsigned int XapianIndex::listDocuments(set<unsigned int> &docIds,
	unsigned int maxDocsCount, unsigned int startDoc) const
{
	// All documents have the magic term
	if (listDocumentsWithTerm("", docIds, maxDocsCount, startDoc) == true)
	{
		return docIds.size();
	}

	return 0;
}

/// Lists documents.
bool XapianIndex::listDocuments(const string &name, set<unsigned int> &docIds,
	NameType type, unsigned int maxDocsCount, unsigned int startDoc) const
{
	string term;

	if (type == BY_LABEL)
	{
		term = string("XLABEL:") + XapianDatabase::limitTermLength(Url::escapeUrl(name));
	}
	else if (type == BY_DIRECTORY)
	{
		term = string("XDIR:") + XapianDatabase::limitTermLength(Url::escapeUrl(name), true);
	}
	else if (type == BY_FILE)
	{
		term = string("U") + XapianDatabase::limitTermLength(Url::escapeUrl(name), true);
	}
	else if (type == BY_CONTAINER_FILE)
	{
		term = string("XFILE:") + XapianDatabase::limitTermLength(Url::escapeUrl(name), true);
	}

	return listDocumentsWithTerm(term, docIds, maxDocsCount, startDoc);
}

/// Indexes the given data.
bool XapianIndex::indexDocument(const Document &document, const std::set<std::string> &labels,
	unsigned int &docId)
{
	bool indexed = false;

	XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName, false);
	if (pDatabase == NULL)
	{
		clog << "Couldn't get index " << m_databaseName << endl;
		return false;
	}

	// Cache the document's properties
	DocumentInfo docInfo(document);
	docInfo.setLocation(Url::canonicalizeUrl(document.getLocation()));

	off_t dataLength = 0;
	const char *pData = document.getData(dataLength);

	// Don't scan the document if a language is specified
	m_stemLanguage = Languages::toEnglish(docInfo.getLanguage());
	if ((pData != NULL) &&
		(dataLength > 0))
	{
		m_stemLanguage = scanDocument(m_stemLanguage, pData, dataLength);
		docInfo.setLanguage(Languages::toLocale(m_stemLanguage));
	}

	try
	{
		Xapian::WritableDatabase *pIndex = pDatabase->writeLock();
		if (pIndex != NULL)
		{
			Xapian::Document doc;
			Xapian::termcount termPos = 0;

			// Populate the Xapian document
			addCommonTerms(docInfo, doc, *pIndex, termPos);
			if ((pData != NULL) &&
				(dataLength > 0))
			{
				Xapian::Utf8Iterator itor(pData, dataLength);
				addPostingsToDocument(itor, doc, *pIndex, "",
					false, m_doSpelling, termPos);
			}
#ifdef DEBUG
			clog << "XapianIndex::indexDocument: " << labels.size() << " labels for URL "
				<< docInfo.getLocation(true) << endl;
#endif

			// Add labels
			addLabelsToDocument(doc, labels, false);

			// Set data
			setDocumentData(docInfo, doc, m_stemLanguage);

			// Add this document to the Xapian index
			docId = pIndex->add_document(doc);
			indexed = true;
		}
	}
	catch (const Xapian::Error &error)
	{
		clog << "Couldn't index document: " << error.get_type() << ": " << error.get_msg() << endl;
	}
	catch (...)
	{
		clog << "Couldn't index document, unknown exception occurred" << endl;
	}
	pDatabase->unlock();

	return indexed;
}

/// Updates the given document; true if success.
bool XapianIndex::updateDocument(unsigned int docId, const Document &document) { bool updated = false; XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName, false); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return false; } // Cache the document's properties DocumentInfo docInfo(document); set labels(document.getLabels()); docInfo.setLocation(Url::canonicalizeUrl(document.getLocation())); off_t dataLength = 0; const char *pData = document.getData(dataLength); // Don't scan the document if a language is specified m_stemLanguage = Languages::toEnglish(docInfo.getLanguage()); if ((pData != NULL) && (dataLength > 0)) { m_stemLanguage = scanDocument(m_stemLanguage, pData, dataLength); docInfo.setLanguage(Languages::toLocale(m_stemLanguage)); } Xapian::WritableDatabase *pIndex = NULL; try { pIndex = pDatabase->writeLock(); if (pIndex != NULL) { Xapian::Document doc; Xapian::termcount termPos = 0; // Populate the Xapian document addCommonTerms(docInfo, doc, *pIndex, termPos); if ((pData != NULL) && (dataLength > 0)) { Xapian::Utf8Iterator itor(pData, dataLength); addPostingsToDocument(itor, doc, *pIndex, "", false, m_doSpelling, termPos); } // Add labels addLabelsToDocument(doc, labels, false); // Set data setDocumentData(docInfo, doc, m_stemLanguage); // Update the document in the database pIndex->replace_document(docId, doc); updated = true; } } catch (const Xapian::Error &error) { clog << "Couldn't update document: " << error.get_type() << ": " << error.get_msg() << endl; } catch (...) { clog << "Couldn't update document, unknown exception occurred" << endl; } if (pIndex != NULL) { pDatabase->unlock(); } return updated; } /// Updates a document's properties. 
bool XapianIndex::updateDocumentInfo(unsigned int docId, const DocumentInfo &docInfo) { bool updated = false; if (docId == 0) { return false; } XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName, false); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return false; } try { Xapian::WritableDatabase *pIndex = pDatabase->writeLock(); if (pIndex != NULL) { Xapian::Document doc = pIndex->get_document(docId); Xapian::termcount termPos = 0; // Update the document data with the current language m_stemLanguage = Languages::toEnglish(docInfo.getLanguage()); removeCommonTerms(doc, *pIndex); addCommonTerms(docInfo, doc, *pIndex, termPos); setDocumentData(docInfo, doc, m_stemLanguage); pIndex->replace_document(docId, doc); updated = true; } } catch (const Xapian::Error &error) { clog << "Couldn't update document properties: " << error.get_type() << ": " << error.get_msg() << endl; } catch (...) { clog << "Couldn't update document properties, unknown exception occurred" << endl; } pDatabase->unlock(); return updated; } /// Unindexes the given document; true if success. bool XapianIndex::unindexDocument(unsigned int docId) { bool unindexed = false; if (docId == 0) { return false; } XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName, false); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return false; } try { Xapian::WritableDatabase *pIndex = pDatabase->writeLock(); if (pIndex != NULL) { // Delete the document from the index pIndex->delete_document(docId); unindexed = true; } } catch (const Xapian::Error &error) { clog << "Couldn't unindex document: " << error.get_type() << ": " << error.get_msg() << endl; } catch (...) { clog << "Couldn't unindex document, unknown exception occurred" << endl; } pDatabase->unlock(); return unindexed; } /// Unindexes the given document. 
bool XapianIndex::unindexDocument(const string &location) { string term(string("U") + XapianDatabase::limitTermLength(Url::escapeUrl(Url::canonicalizeUrl(location)), true)); return deleteDocuments(term); } /// Unindexes documents. bool XapianIndex::unindexDocuments(const string &name, NameType type) { string term; if (type == BY_LABEL) { term = string("XLABEL:") + XapianDatabase::limitTermLength(Url::escapeUrl(name)); } else if (type == BY_DIRECTORY) { term = string("XDIR:") + XapianDatabase::limitTermLength(Url::escapeUrl(name), true); } else if (type == BY_FILE) { term = string("U") + XapianDatabase::limitTermLength(Url::escapeUrl(name), true); } else if (type == BY_CONTAINER_FILE) { term = string("XFILE:") + XapianDatabase::limitTermLength(Url::escapeUrl(name), true); } return deleteDocuments(term); } /// Unindexes all documents. bool XapianIndex::unindexAllDocuments(void) { // All documents have the magic term return deleteDocuments(MAGIC_TERM); } /// Flushes recent changes to the disk. bool XapianIndex::flush(void) { bool flushed = false; XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName, false); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return false; } try { Xapian::WritableDatabase *pIndex = pDatabase->writeLock(); if (pIndex != NULL) { pIndex->commit(); flushed = true; } } catch (const Xapian::Error &error) { clog << "Couldn't flush database: " << error.get_type() << ": " << error.get_msg() << endl; } catch (...) { clog << "Couldn't flush database, unknown exception occurred" << endl; } pDatabase->unlock(); return flushed; } /// Reopens the index. bool XapianIndex::reopen(void) const { // Reopen XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return false; } pDatabase->reopen(); return true; } /// Resets the index. 
bool XapianIndex::reset(void) { // Overwrite and reopen XapianDatabase *pDatabase = XapianDatabaseFactory::getDatabase(m_databaseName, false, true); if (pDatabase == NULL) { clog << "Couldn't get index " << m_databaseName << endl; return false; } return true; } pinot-1.22/IndexSearch/Xapian/XapianIndex.h000066400000000000000000000142131470740426600206000ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _XAPIAN_INDEX_H #define _XAPIAN_INDEX_H #include #include #include #include "config.h" #include "CJKVTokenizer.h" #include "XapianDatabase.h" #include "IndexInterface.h" /// A Xapian-based index. class XapianIndex : public IndexInterface { public: XapianIndex(const std::string &indexName); XapianIndex(const XapianIndex &other); virtual ~XapianIndex(); XapianIndex &operator=(const XapianIndex &other); /// Returns false if the index couldn't be opened. virtual bool isGood(void) const; /// Gets metadata. virtual std::string getMetadata(const std::string &name) const; /// Sets metadata. virtual bool setMetadata(const std::string &name, const std::string &value) const; /// Gets the index location. virtual std::string getLocation(void) const; /// Returns a document's properties. 
virtual bool getDocumentInfo(unsigned int docId, DocumentInfo &docInfo) const; /// Returns a document's terms count. virtual unsigned int getDocumentTermsCount(unsigned int docId) const; /// Returns a document's terms. virtual bool getDocumentTerms(unsigned int docId, std::map &wordsBuffer) const; /// Sets the list of known labels. virtual bool setLabels(const std::set &labels, bool resetLabels); /// Gets the list of known labels. virtual bool getLabels(std::set &labels) const; /// Adds a label. virtual bool addLabel(const std::string &name); /// Deletes all references to a label. virtual bool deleteLabel(const std::string &name); /// Determines whether a document has a label. virtual bool hasLabel(unsigned int docId, const std::string &name) const; /// Returns a document's labels. virtual bool getDocumentLabels(unsigned int docId, std::set &labels) const; /// Sets a document's labels. virtual bool setDocumentLabels(unsigned int docId, const std::set &labels, bool resetLabels = true); /// Sets documents' labels. virtual bool setDocumentsLabels(const std::set &docIds, const std::set &labels, bool resetLabels = true); /// Checks whether the given URL is in the index. virtual unsigned int hasDocument(const std::string &url) const; /// Gets terms with the same root. virtual unsigned int getCloseTerms(const std::string &term, std::set &suggestions); /// Returns the ID of the last document. virtual unsigned int getLastDocumentID(void) const; /// Returns the number of documents. virtual unsigned int getDocumentsCount(const std::string &labelName = "") const; /// Lists documents. virtual unsigned int listDocuments(std::set &docIDList, unsigned int maxDocsCount = 0, unsigned int startDoc = 0) const; /// Lists documents. virtual bool listDocuments(const std::string &name, std::set &docIds, NameType type, unsigned int maxDocsCount = 0, unsigned int startDoc = 0) const; /// Indexes the given data. 
virtual bool indexDocument(const Document &doc, const std::set &labels, unsigned int &docId); /// Updates the given document. virtual bool updateDocument(unsigned int docId, const Document &doc); /// Updates a document's properties. virtual bool updateDocumentInfo(unsigned int docId, const DocumentInfo &docInfo); /// Unindexes the given document. virtual bool unindexDocument(unsigned int docId); /// Unindexes the given document. virtual bool unindexDocument(const std::string &location); /// Unindexes documents. virtual bool unindexDocuments(const std::string &name, NameType type); /// Unindexes all documents. virtual bool unindexAllDocuments(void); /// Flushes recent changes to the disk. virtual bool flush(void); /// Reopens the index. virtual bool reopen(void) const; /// Resets the index. virtual bool reset(void); protected: std::string m_databaseName; bool m_goodIndex; bool m_doSpelling; std::string m_stemLanguage; bool listDocumentsWithTerm(const std::string &term, std::set &docIds, unsigned int maxDocsCount = 0, unsigned int startDoc = 0) const; void addPostingsToDocument(const Xapian::Utf8Iterator &itor, Xapian::Document &doc, const Xapian::WritableDatabase &db, const std::string &prefix, bool noStemming, bool &doSpelling, Xapian::termcount &termPos) const; void addPostingsToDocument(Dijon::CJKVTokenizer &tokenizer, Xapian::Stem *pStemmer, const std::string &text, Xapian::Document &doc, const Xapian::WritableDatabase &db, const std::string &prefix, bool &doSpelling, Xapian::termcount &termPos) const; static void addLabelsToDocument(Xapian::Document &doc, const std::set &labels, bool skipInternals); void removePostingsFromDocument(const Xapian::Utf8Iterator &itor, Xapian::Document &doc, const Xapian::WritableDatabase &db, const std::string &prefix, bool noStemming, bool &doSpelling) const; void addCommonTerms(const DocumentInfo &info, Xapian::Document &doc, const Xapian::WritableDatabase &db, Xapian::termcount &termPos); void removeCommonTerms(Xapian::Document 
&doc, const Xapian::WritableDatabase &db); std::string scanDocument(const std::string &suggestedLanguage, const char *pData, off_t dataLength); void setDocumentData(const DocumentInfo &info, Xapian::Document &doc, const std::string &language) const; bool deleteDocuments(const std::string &term); }; #endif // _XAPIAN_INDEX_H pinot-1.22/IndexSearch/cjkv/000077500000000000000000000000001470740426600157335ustar00rootroot00000000000000pinot-1.22/IndexSearch/cjkv/CJKVTokenizer.cc000066400000000000000000000263451470740426600207040ustar00rootroot00000000000000/* * Copyright 2007-2008 林永忠 Yung-Chung Lin * Copyright 2008-2013 Fabrice Colin * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA */ #include #include #include #include #include "CJKVTokenizer.h" static const char *unicode_get_utf8(const char *p, gunichar *result) { *result = g_utf8_get_char(p); return (*result == (gunichar)-1) ? 
        NULL : g_utf8_next_char(p);
}

// 2E80..2EFF; CJK Radicals Supplement
// 3000..303F; CJK Symbols and Punctuation
// 3040..309F; Hiragana
// 30A0..30FF; Katakana
// 3100..312F; Bopomofo
// 3130..318F; Hangul Compatibility Jamo
// 3190..319F; Kanbun
// 31A0..31BF; Bopomofo Extended
// 31C0..31EF; CJK Strokes
// 31F0..31FF; Katakana Phonetic Extensions
// 3200..32FF; Enclosed CJK Letters and Months
// 3300..33FF; CJK Compatibility
// 3400..4DBF; CJK Unified Ideographs Extension A
// 4DC0..4DFF; Yijing Hexagram Symbols
// 4E00..9FFF; CJK Unified Ideographs
// A700..A71F; Modifier Tone Letters
// AC00..D7AF; Hangul Syllables
// F900..FAFF; CJK Compatibility Ideographs
// FE30..FE4F; CJK Compatibility Forms
// FF00..FFEF; Halfwidth and Fullwidth Forms
// 20000..2A6DF; CJK Unified Ideographs Extension B
// 2F800..2FA1F; CJK Compatibility Ideographs Supplement
#define UTF8_IS_CJKV(p) \
    (((p) >= 0x2E80 && (p) <= 0x2EFF) \
    || ((p) >= 0x3000 && (p) <= 0x303F) \
    || ((p) >= 0x3040 && (p) <= 0x309F) \
    || ((p) >= 0x30A0 && (p) <= 0x30FF) \
    || ((p) >= 0x3100 && (p) <= 0x312F) \
    || ((p) >= 0x3130 && (p) <= 0x318F) \
    || ((p) >= 0x3190 && (p) <= 0x319F) \
    || ((p) >= 0x31A0 && (p) <= 0x31BF) \
    || ((p) >= 0x31C0 && (p) <= 0x31EF) \
    || ((p) >= 0x31F0 && (p) <= 0x31FF) \
    || ((p) >= 0x3200 && (p) <= 0x32FF) \
    || ((p) >= 0x3300 && (p) <= 0x33FF) \
    || ((p) >= 0x3400 && (p) <= 0x4DBF) \
    || ((p) >= 0x4DC0 && (p) <= 0x4DFF) \
    || ((p) >= 0x4E00 && (p) <= 0x9FFF) \
    || ((p) >= 0xA700 && (p) <= 0xA71F) \
    || ((p) >= 0xAC00 && (p) <= 0xD7AF) \
    || ((p) >= 0xF900 && (p) <= 0xFAFF) \
    || ((p) >= 0xFE30 && (p) <= 0xFE4F) \
    || ((p) >= 0xFF00 && (p) <= 0xFFEF) \
    || ((p) >= 0x20000 && (p) <= 0x2A6DF) \
    || ((p) >= 0x2F800 && (p) <= 0x2FA1F))

// Combining Marks
// 0300..036F; Basic range
// 1DC0..1DFF; Supplements
// 20D0..20FF; Symbols
// FE20..FE2F; Half marks
#define UTF8_IS_CM(p) \
    (((p) >= 0x0300 && (p) <= 0x036F) \
    || ((p) >= 0x1DC0 && (p) <= 0x1DFF) \
    || ((p) >= 
0x20D0 && (p) <= 0x20FF) \ || ((p) >= 0xFE20 && (p) <= 0xFE2F)) using namespace std; using namespace Dijon; static void _split_string(string str, const string &delim, vector &list) { list.clear(); string::size_type cut_at = 0; while ((cut_at = str.find_first_of(delim)) != str.npos) { if (cut_at > 0) { list.push_back(str.substr(0,cut_at)); } str = str.substr(cut_at+1); } if (str.empty() == false) { list.push_back(str); } } static inline unsigned char *_unicode_to_char(gunichar &uchar, unsigned char *p) { if (p == NULL) { return NULL; } memset(p, 0, sizeof(gunichar) + 1); if (g_unichar_isspace(uchar) || (g_unichar_ispunct(uchar) && (uchar != '.'))) { p[0] = ' '; } else if (uchar < 0x80) { p[0] = uchar; } else if (uchar < 0x800) { p[0] = (0xC0 | uchar >> 6); p[1] = (0x80 | uchar & 0x3F); } else if (uchar < 0x10000) { p[0] = (0xE0 | uchar >> 12); p[1] = (0x80 | uchar >> 6 & 0x3F); p[2] = (0x80 | uchar & 0x3F); } else if (uchar < 0x200000) { p[0] = (0xF0 | uchar >> 18); p[1] = (0x80 | uchar >> 12 & 0x3F); p[2] = (0x80 | uchar >> 6 & 0x3F); p[3] = (0x80 | uchar & 0x3F); } return p; } class VectorTokensHandler : public CJKVTokenizer::TokensHandler { public: VectorTokensHandler(vector &token_list) : CJKVTokenizer::TokensHandler(), m_token_list(token_list) { } virtual ~VectorTokensHandler() { } virtual bool handle_token(const string &tok, bool is_cjkv) { m_token_list.push_back(tok); return true; } protected: vector &m_token_list; }; CJKVTokenizer::CJKVTokenizer() : m_nGramSize(2), m_maxTokenCount(0), m_maxTextSize(5242880) { } CJKVTokenizer::~CJKVTokenizer() { } string CJKVTokenizer::normalize(const string &str, bool normalizeAll) { // Normalize the string gchar *normalized = g_utf8_normalize(str.c_str(), str.length(), (normalizeAll == true ? 
G_NORMALIZE_ALL : G_NORMALIZE_DEFAULT_COMPOSE)); if (normalized == NULL) { return ""; } string normalized_str(normalized, strlen(normalized)); g_free(normalized); return normalized_str; } string CJKVTokenizer::strip_marks(const string &str) { if (str.empty() == true) { return ""; } gchar *stripped = g_strdup(normalize(str, true).c_str()); gsize input_pos = 0, output_pos = 0; if (stripped == NULL) { return ""; } while (input_pos < strlen(stripped)) { gunichar unichar = g_utf8_get_char_validated(&stripped[input_pos], -1); if ((unichar == (gunichar)-1) || (unichar == (gunichar)-2)) { break; } gchar *next_utf8 = g_utf8_next_char(&stripped[input_pos]); gint utf8_len = next_utf8 - &stripped[input_pos]; // Is this a Combining Mark ? if (!UTF8_IS_CM((guint32)unichar)) { // No, it's not if (input_pos != output_pos) { memmove(&stripped[output_pos], &stripped[input_pos], utf8_len); } output_pos += utf8_len; } input_pos += utf8_len; } stripped[output_pos] = '\0'; string stripped_str(stripped, output_pos); g_free(stripped); return stripped_str; } void CJKVTokenizer::set_ngram_size(unsigned int ngram_size) { m_nGramSize = ngram_size; } unsigned int CJKVTokenizer::get_ngram_size(void) const { return m_nGramSize; } void CJKVTokenizer::set_max_token_count(unsigned int max_token_count) { m_maxTokenCount = max_token_count; } unsigned int CJKVTokenizer::get_max_token_count(void) const { return m_maxTokenCount; } void CJKVTokenizer::set_max_text_size(unsigned int max_text_size) { m_maxTextSize = max_text_size; } unsigned int CJKVTokenizer::get_max_text_size(void) const { return m_maxTextSize; } void CJKVTokenizer::tokenize(const string &str, vector &token_list, bool break_ascii_only_on_space) { VectorTokensHandler handler(token_list); tokenize(str, handler, break_ascii_only_on_space); } void CJKVTokenizer::tokenize(const string &str, TokensHandler &handler, bool break_ascii_only_on_space) { string token_str; vector temp_token_list; vector temp_uchar_list; unsigned int tokens_count = 0; 
split(str, temp_token_list, temp_uchar_list); for (unsigned int i = 0; i < temp_token_list.size();) { if ((m_maxTokenCount > 0) && (tokens_count >= m_maxTokenCount)) { break; } token_str.resize(0); if (UTF8_IS_CJKV(temp_uchar_list[i])) { for (unsigned int j = i; j < i + m_nGramSize; j++) { if ((m_maxTokenCount > 0) && (tokens_count >= m_maxTokenCount)) { break; } if (j == temp_token_list.size()) { break; } if (UTF8_IS_CJKV(temp_uchar_list[j])) { string token(temp_token_list[j]); if ((token.length() == 1) && (isspace(token[0]) != 0)) { break; } token_str += token; if (handler.handle_token(normalize(token_str), true) == true) { ++tokens_count; } } } i++; } else { unsigned int j = i; while (j < temp_token_list.size()) { unsigned char *p = (unsigned char*) temp_token_list[j].c_str(); bool break_ascii = false; if (isascii((int)p[0]) != 0) { if (break_ascii_only_on_space == true) { if (isspace((int)p[0]) != 0) { break_ascii = true; } } else if (isalnum((int)p[0]) == 0) { break_ascii = true; } } if (break_ascii == true) { j++; break; } else if (UTF8_IS_CJKV(temp_uchar_list[j])) { break; } token_str += temp_token_list[j]; j++; } i = j; if ((m_maxTokenCount > 0) && (tokens_count >= m_maxTokenCount)) { break; } if (token_str.empty() == false) { if (handler.handle_token(normalize(token_str), false) == true) { ++tokens_count; } } } } } void CJKVTokenizer::split(const string &str, vector &string_list, vector &unicode_list) { gunichar uchar; const char *str_ptr = str.c_str(); glong str_utf8_len = g_utf8_strlen(str_ptr, str.length()); unsigned char p[sizeof(gunichar) + 1]; for (glong i = 0; i < str_utf8_len; i++) { str_ptr = unicode_get_utf8(str_ptr, &uchar); if (str_ptr == NULL) { break; } if (i >= m_maxTextSize) { break; } string_list.push_back((const char*)_unicode_to_char(uchar, p)); unicode_list.push_back(uchar); } } void CJKVTokenizer::segment(const string &str, vector &token_segment) { vector token_list; string onlySpacesStr(str); for (string::iterator it = 
onlySpacesStr.begin(); it != onlySpacesStr.end(); ++it) { if (isspace((int)*it) != 0) { *it = ' '; } } _split_string(onlySpacesStr, " ", token_segment); } bool CJKVTokenizer::has_cjkv(const string &str) { vector temp_token_list; vector temp_uchar_list; split(str, temp_token_list, temp_uchar_list); for (unsigned int i = 0; i < temp_uchar_list.size(); i++) { if (UTF8_IS_CJKV(temp_uchar_list[i])) { return true; } } return false; } bool CJKVTokenizer::has_cjkv_only(const string &str) { vector temp_token_list; vector temp_uchar_list; split(str, temp_token_list, temp_uchar_list); for (unsigned int i = 0; i < temp_uchar_list.size(); i++) { if (!(UTF8_IS_CJKV(temp_uchar_list[i]))) { unsigned char p[sizeof(gunichar) + 1]; _unicode_to_char(temp_uchar_list[i], p); if (isspace((int)p[0]) == 0) { return false; } } } return true; } pinot-1.22/IndexSearch/cjkv/CJKVTokenizer.h000066400000000000000000000050401470740426600205330ustar00rootroot00000000000000/* * Copyright 2007-2008 林永忠 Yung-Chung Lin * Copyright 2008-2013 Fabrice Colin * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. 
* * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA */ #ifndef _DIJON_CJKVTOKENIZER_H #define _DIJON_CJKVTOKENIZER_H #include #include #include #ifndef DIJON_CJKV_EXPORT #if defined __GNUC__ && (__GNUC__ >= 4) #define DIJON_CJKV_EXPORT __attribute__ ((visibility("default"))) #else #define DIJON_CJKV_EXPORT #endif #endif namespace Dijon { class DIJON_CJKV_EXPORT CJKVTokenizer { public: CJKVTokenizer(); ~CJKVTokenizer(); static std::string normalize(const std::string &str, bool normalizeAll = false); static std::string strip_marks(const std::string &str); class TokensHandler { public: TokensHandler() {} virtual ~TokensHandler() {} virtual bool handle_token(const std::string &tok, bool is_cjkv) = 0; }; void set_ngram_size(unsigned int ngram_size); unsigned int get_ngram_size(void) const; void set_max_token_count(unsigned int max_token_count); unsigned int get_max_token_count(void) const; void set_max_text_size(unsigned int max_text_size); unsigned int get_max_text_size(void) const; void tokenize(const std::string &str, std::vector &token_list, bool break_ascii_only_on_space = false); void tokenize(const std::string &str, TokensHandler &handler, bool break_ascii_only_on_space = false); void split(const std::string &str, std::vector &string_list, std::vector &unicode_list); void segment(const std::string &str, std::vector &token_segment); bool has_cjkv(const std::string &str); bool has_cjkv_only(const std::string &str); protected: unsigned int m_nGramSize; unsigned int m_maxTokenCount; unsigned int m_maxTextSize; }; }; #endif // _DIJON_CJKVTOKENIZER_H pinot-1.22/IndexSearch/pinot-label.1000066400000000000000000000022321470740426600172650ustar00rootroot00000000000000.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.3. 
.TH PINOT-LABEL "1" "October 2024" "pinot 1.22" "User Commands" .SH NAME pinot-label \- Label files from the command-line .SH SYNOPSIS .B pinot-label [\fI\,OPTIONS\/\fR] [\fI\,FILES\/\fR] .SH DESCRIPTION pinot\-label \- Label files from the command\-line .SH OPTIONS .TP \fB\-g\fR, \fB\-\-get\fR get the labels list for the given file .TP \fB\-h\fR, \fB\-\-help\fR display this help and exit .TP \fB\-l\fR, \fB\-\-list\fR list known labels .TP \fB\-r\fR, \fB\-\-reload\fR get the daemon to reload the configuration .TP \fB\-s\fR, \fB\-\-set\fR set labels on the given file .TP \fB\-v\fR, \fB\-\-version\fR output version information and exit .SH EXAMPLES pinot\-label \-\-get /home/fabrice/Documents/Bozo.txt .PP pinot\-label \-\-list .PP pinot\-label \-\-set "[Clowns][Fun][My Hero]" /home/fabrice/Documents/Bozo.txt .SH "REPORTING BUGS" Report bugs to fabrice.colin@gmail.com .PP .br This is free software. You may redistribute copies of it under the terms of the GNU General Public License . .br There is NO WARRANTY, to the extent permitted by law. pinot-1.22/IndexSearch/pinot-label.cpp000066400000000000000000000136531470740426600177200ustar00rootroot00000000000000/* * Copyright 2007-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include #include #include #include #include #include #include #include "config.h" #include "StringManip.h" #include "MIMEScanner.h" #include "Url.h" #include "DBusIndex.h" using namespace std; static struct option g_longOptions[] = { {"get", 0, 0, 'g'}, {"help", 0, 0, 'h'}, {"list", 0, 0, 'l'}, {"reload", 0, 0, 'r'}, {"set", 1, 0, 's'}, {"version", 0, 0, 'v'}, {0, 0, 0, 0} }; static void printLabels(const set &labels, const string &fileName) { if (fileName.empty() == false) { clog << fileName << endl; } clog << "Labels: "; for (set::const_iterator labelIter = labels.begin(); labelIter != labels.end(); ++labelIter) { if (labelIter->substr(0, 2) == "X-") { continue; } clog << "[" << Url::escapeUrl(*labelIter) << "]"; } clog << endl; } static void printHelp(void) { // Help clog << "pinot-label - Label files from the command-line\n\n" << "Usage: pinot-label [OPTIONS] [FILES]\n\n" << "Options:\n" << " -g, --get get the labels list for the given file\n" << " -h, --help display this help and exit\n" << " -l, --list list known labels\n" << " -r, --reload get the daemon to reload the configuration\n" << " -s, --set set labels on the given file\n" << " -v, --version output version information and exit\n\n"; clog << "Examples:\n" << "pinot-label --get /home/fabrice/Documents/Bozo.txt\n\n" << "pinot-label --list\n\n" << "pinot-label --set \"[Clowns][Fun][My Hero]\" /home/fabrice/Documents/Bozo.txt\n\n" << "Report bugs to " << PACKAGE_BUGREPORT << endl; } int main(int argc, char **argv) { set labels; string labelsString; int longOptionIndex = 0; unsigned int docId = 0; int minArgNum = 1; bool getLabels = false, getDocumentLabels = false, reloadIndex = false, setDocumentLabels = false, success = false; // Look at the options int optionChar = getopt_long(argc, argv, "ghlrs:v", g_longOptions, &longOptionIndex); while (optionChar != -1) { set engines; switch (optionChar) { case 'g': getDocumentLabels = true; break; case 'h': printHelp(); return EXIT_SUCCESS; case 'l': minArgNum 
                = 0;
                getLabels = true;
                break;
            case 'r':
                minArgNum = 0;
                reloadIndex = true;
                break;
            case 's':
                setDocumentLabels = true;
                if (optarg != NULL)
                {
                    labelsString = optarg;
                }
                break;
            case 'v':
                clog << "pinot-label - " << PACKAGE_STRING << "\n\n"
                    << "This is free software. You may redistribute copies of it under the terms of\n"
                    << "the GNU General Public License .\n"
                    << "There is NO WARRANTY, to the extent permitted by law." << endl;
                return EXIT_SUCCESS;
            default:
                return EXIT_FAILURE;
        }

        // Next option (same option string as the first call, so that -r is still recognized)
        optionChar = getopt_long(argc, argv, "ghlrs:v", g_longOptions, &longOptionIndex);
    }

    if (argc == 1)
    {
        printHelp();
        return EXIT_SUCCESS;
    }

    if ((argc < 2) ||
        (argc - optind < minArgNum))
    {
        clog << "Not enough parameters" << endl;
        return EXIT_FAILURE;
    }

    if ((setDocumentLabels == true) &&
        (labelsString.empty() == true))
    {
        clog << "Incorrect parameters" << endl;
        return EXIT_FAILURE;
    }

    Glib::init();
    Gio::init();
    // Initialize GType
#if !GLIB_CHECK_VERSION(2,35,0)
    g_type_init();
#endif

    MIMEScanner::initialize("", "");

    // We need a pure DBusIndex object
    DBusIndex index(NULL);

    if (getLabels == true)
    {
        if (index.getLabels(labels) == true)
        {
            printLabels(labels, "");

            success = true;
        }
    }

    while (optind < argc)
    {
        string fileParam(argv[optind]);
        Url thisUrl(fileParam, "");

        // Rewrite it as a local URL
        string urlParam(thisUrl.getProtocol());
        urlParam += "://";
        urlParam += thisUrl.getLocation();
        if (thisUrl.getFile().empty() == false)
        {
            urlParam += "/";
            urlParam += thisUrl.getFile();
        }
#ifdef DEBUG
        clog << "URL rewritten to " << urlParam << endl;
#endif

        if ((getDocumentLabels == true) ||
            (setDocumentLabels == true))
        {
            docId = index.hasDocument(urlParam);
            if (docId == 0)
            {
                clog << fileParam << " is not indexed" << endl;
                success = false;

                // Next
                ++optind;
                continue;
            }
        }

        if (getDocumentLabels == true)
        {
            labels.clear();

            if (index.getDocumentLabels(docId, labels) == true)
            {
                printLabels(labels, fileParam);

                success = true;
            }
        }

        if (setDocumentLabels == true)
        {
            string::size_type endPos = 0;
            string 
label(StringManip::extractField(labelsString, "[", "]", endPos)); labels.clear(); // Parse labels while (label.empty() == false) { labels.insert(Url::unescapeUrl(label)); if (endPos == string::npos) { break; } label = StringManip::extractField(labelsString, "[", "]", endPos); } #ifdef DEBUG printLabels(labels, fileParam); #endif success = index.setDocumentLabels(docId, labels); } // Next ++optind; } if (reloadIndex == true) { index.reload(); } MIMEScanner::shutdown(); // Did whatever operation we carried out succeed ? if (success == true) { return EXIT_SUCCESS; } return EXIT_FAILURE; } pinot-1.22/LICENSE000066400000000000000000000431761470740426600136210ustar00rootroot00000000000000 GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. 
To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. 
The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. 
b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. 
You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. 
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. 
If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. 
The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. {description} Copyright (C) {year} {fullname} This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. 
If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. {signature of Ty Coon}, 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. 
pinot-1.22/Makefile.am000066400000000000000000000146041470740426600146420ustar00rootroot00000000000000 SUBDIRS = po Utils Tokenize SQL Collect IndexSearch IndexSearch/Xapian Monitor Core UI/GTK3/src dist-hook: @if test -d "$(srcdir)/.git"; \ then \ echo Creating ChangeLog && \ ( cd "$(top_srcdir)" && \ git log --decorate ) > ChangeLog; \ else \ echo A git clone is required to generate a ChangeLog >&2; \ fi EXTRA_DIST = AUTHORS ChangeLog ChangeLog-dijon ChangeLog-svn FAQ NEWS README TODO \ Tokenize/filters/external-filters.xml globalconfig.xml \ textcat*conf.txt pinot*.desktop pinot.spec \ IndexSearch/Plugins/*src IndexSearch/Plugins/*.xml \ Core/pinot-*.1 IndexSearch/pinot-*.1 \ UI/GTK3/src/pinot.1 Core/pinot-dbus-daemon.xml \ Core/com.github.fabricecolin.Pinot.search-provider.ini \ Core/com.github.fabricecolin.Pinot.service \ UI/icons/48x48/pinot.png UI/icons/32x32/pinot.png \ UI/icons/24x24/pinot.png UI/icons/22x22/pinot.png \ UI/icons/16x16/pinot.png \ UI/GTK3/metase-gtk3.gtkbuilder \ scripts/bash/*.sh if HAVE_DBUS man_MANS = Core/pinot-index.1 \ IndexSearch/pinot-label.1 \ Core/pinot-search.1 \ Core/pinot-dbus-daemon.1 \ UI/GTK3/src/pinot.1 else man_MANS = Core/pinot-index.1 \ Core/pinot-search.1 \ Core/pinot-daemon.1 \ UI/GTK3/src/pinot.1 endif dbus-code: @echo "Did you remove the SearchProvider2 interface?" 
@gdbus-codegen-glibmm3 --generate-cpp-code=PinotDBus ./Core/pinot-dbus-daemon.xml @mv PinotDBus_common.* Utils/ @mv PinotDBus_stub.* Core/ @mv PinotDBus_proxy.* IndexSearch/ @gdbus-codegen-glibmm3 --generate-cpp-code=SearchProvider /usr/share/dbus-1/interfaces/org.gnome.ShellSearchProvider2.xml @mv SearchProvider_common.* SearchProvider_stub.* Core/ @rm -f SearchProvider_proxy.* builder-translations: @scripts/bash/extract-gtk-builder-translations.sh >UI/GTK3/src/BuilderTranslations.h manuals: @help2man --no-info --no-discard-stderr --name "Index documents from the command-line" Core/pinot-index >Core/pinot-index.1 if HAVE_DBUS @help2man --no-info --no-discard-stderr --name "Label files from the command-line" IndexSearch/pinot-label >IndexSearch/pinot-label.1 endif @help2man --no-info --no-discard-stderr --name "Query search engines from the command-line" Core/pinot-search >Core/pinot-search.1 if HAVE_DBUS @help2man --no-info --no-discard-stderr --name "D-Bus search and index daemon" Core/pinot-dbus-daemon >Core/pinot-dbus-daemon.1 else @help2man --no-info --no-discard-stderr --name "Search and index daemon" Core/pinot-daemon >Core/pinot-daemon.1 endif @help2man --no-info --no-discard-stderr --name "A metasearch tool for the Free Desktop" UI/GTK3/src/pinot >UI/GTK3/src/pinot.1 install-data-local: @ln -fs $(bindir)/pinot $(DESTDIR)$(bindir)/pinot-prefs @mkdir -p $(DESTDIR)$(sysconfdir)/pinot $(INSTALL_DATA) $(srcdir)/Tokenize/filters/external-filters.xml $(DESTDIR)$(sysconfdir)/pinot/external-filters.xml $(INSTALL_DATA) $(srcdir)/globalconfig.xml $(DESTDIR)$(sysconfdir)/pinot/globalconfig.xml $(INSTALL_DATA) $(srcdir)/textcat*conf.txt $(DESTDIR)$(sysconfdir)/pinot/ @mkdir -p $(DESTDIR)$(datadir)/pinot if HAVE_DBUS $(INSTALL_DATA) $(srcdir)/Core/pinot-dbus-daemon.xml $(DESTDIR)$(datadir)/pinot/pinot-dbus-daemon.xml @mkdir -p $(DESTDIR)$(datadir)/dbus-1/services $(INSTALL_DATA) $(builddir)/Core/com.github.fabricecolin.Pinot.service 
$(DESTDIR)$(datadir)/dbus-1/services/com.github.fabricecolin.Pinot.service endif $(INSTALL_DATA) $(srcdir)/UI/GTK3/metase-gtk3.gtkbuilder $(DESTDIR)$(datadir)/pinot/metase-gtk3.gtkbuilder @mkdir -p $(DESTDIR)$(datadir)/pinot/engines if HAVE_BOOST_SPIRIT $(INSTALL_DATA) $(srcdir)/IndexSearch/Plugins/*.src $(DESTDIR)$(datadir)/pinot/engines/ endif $(INSTALL_DATA) $(srcdir)/IndexSearch/Plugins/*.xml $(DESTDIR)$(datadir)/pinot/engines/ @mkdir -p $(DESTDIR)$(libdir)/pinot/filters @rm -f $(DESTDIR)$(libdir)/lib*filter.a $(DESTDIR)$(libdir)/lib*filter.la @mv $(DESTDIR)$(libdir)/lib*filter* $(DESTDIR)$(libdir)/pinot/filters/ @mkdir -p $(DESTDIR)$(libdir)/pinot/backends @rm -f $(DESTDIR)$(libdir)/lib*backend.a $(DESTDIR)$(libdir)/lib*backend.la @mv $(DESTDIR)$(libdir)/lib*backend* $(DESTDIR)$(libdir)/pinot/backends/ @mkdir -p $(DESTDIR)$(datadir)/pinot/stopwords @mkdir -p $(DESTDIR)$(datadir)/icons/hicolor/48x48/apps/ $(INSTALL_DATA) $(srcdir)/UI/icons/48x48/pinot.png $(DESTDIR)$(datadir)/icons/hicolor/48x48/apps/pinot.png @mkdir -p $(DESTDIR)$(datadir)/icons/hicolor/32x32/apps/ $(INSTALL_DATA) $(srcdir)/UI/icons/32x32/pinot.png $(DESTDIR)$(datadir)/icons/hicolor/32x32/apps/pinot.png @mkdir -p $(DESTDIR)$(datadir)/icons/hicolor/24x24/apps/ $(INSTALL_DATA) $(srcdir)/UI/icons/24x24/pinot.png $(DESTDIR)$(datadir)/icons/hicolor/24x24/apps/pinot.png @mkdir -p $(DESTDIR)$(datadir)/icons/hicolor/22x22/apps/ $(INSTALL_DATA) $(srcdir)/UI/icons/22x22/pinot.png $(DESTDIR)$(datadir)/icons/hicolor/22x22/apps/pinot.png @mkdir -p $(DESTDIR)$(datadir)/icons/hicolor/16x16/apps/ $(INSTALL_DATA) $(srcdir)/UI/icons/16x16/pinot.png $(DESTDIR)$(datadir)/icons/hicolor/16x16/apps/pinot.png @mkdir -p $(DESTDIR)$(datadir)/applications @desktop-file-install --vendor="" --dir=$(DESTDIR)$(datadir)/applications $(srcdir)/pinot.desktop @desktop-file-install --vendor="" --dir=$(DESTDIR)$(datadir)/applications $(srcdir)/pinot-prefs.desktop @mkdir -p $(DESTDIR)${sysconfdir}/xdg/autostart if HAVE_DBUS 
@desktop-file-install --vendor="" --dir=$(DESTDIR)${sysconfdir}/xdg/autostart $(srcdir)/pinot-dbus-daemon.desktop @mkdir -p $(DESTDIR)$(datadir)/gnome-shell/search-providers $(INSTALL_DATA) $(builddir)/Core/com.github.fabricecolin.Pinot.search-provider.ini $(DESTDIR)${datadir}/gnome-shell/search-providers/com.github.fabricecolin.Pinot.search-provider.ini endif $(INSTALL_DATA) $(srcdir)/scripts/bash/*.sh $(DESTDIR)$(datadir)/pinot/ uninstall-local: @rm -rf $(DESTDIR)$(sysconfdir)/pinot @rm -rf $(DESTDIR)$(datadir)/pinot @rm -rf $(DESTDIR)$(datadir)/dbus-1/services/com.github.fabricecolin.Pinot.service @rm -rf $(DESTDIR)${datadir}/gnome-shell/search-providers/com.github.fabricecolin.Pinot.search-provider.ini @rm -rf $(DESTDIR)$(libdir)/pinot @rm -rf $(DESTDIR)$(datadir)/icons/hicolor/48x48/apps/pinot.png @rm -rf $(DESTDIR)$(datadir)/icons/hicolor/32x32/apps/pinot.png @rm -rf $(DESTDIR)$(datadir)/icons/hicolor/24x24/apps/pinot.png @rm -rf $(DESTDIR)$(datadir)/icons/hicolor/22x22/apps/pinot.png @rm -rf $(DESTDIR)$(datadir)/icons/hicolor/16x16/apps/pinot.png @rm -rf $(DESTDIR)$(datadir)/applications/pinot.desktop @rm -rf $(DESTDIR)${sysconfdir}/xdg/autostart/pinot-dbus-daemon.desktop pinot-1.22/Monitor/000077500000000000000000000000001470740426600142305ustar00rootroot00000000000000pinot-1.22/Monitor/INotifyMonitor.cpp000066400000000000000000000316541470740426600176760ustar00rootroot00000000000000/* * Copyright 2005-2024 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include "config.h" #include #include #include #ifdef HAVE_SYS_INOTIFY_H #include <sys/inotify.h> #endif #include #include #include #include #include #include "INotifyMonitor.h" using std::clog; using std::endl; using std::string; using std::map; using std::set; using std::queue; using std::pair; using std::ifstream; INotifyMonitor::INotifyMonitor() : MonitorInterface(), m_maxUserWatches(0), m_watchesCount(0) { pthread_mutex_init(&m_mutex, NULL); m_monitorFd = inotify_init(); if (m_monitorFd < 0) { clog << "Couldn't initialize inotify: error " << errno << endl; } // FIXME: check for existence of /proc ifstream inputFile; inputFile.open("/proc/sys/fs/inotify/max_user_watches"); if (inputFile.good() == true) { inputFile >> m_maxUserWatches; inputFile.close(); if (m_maxUserWatches > 8192) { // Don't be greedy, leave some for other processes m_maxUserWatches -= 1024; } } } INotifyMonitor::~INotifyMonitor() { if (m_monitorFd >= 0) { close(m_monitorFd); } pthread_mutex_destroy(&m_mutex); } bool INotifyMonitor::removeWatch(const string &location) { map<string, int>::iterator locationIter = m_locations.find(location); if (locationIter != m_locations.end()) { inotify_rm_watch(m_monitorFd, locationIter->second); --m_watchesCount; map<int, string>::iterator watchIter = m_watches.find(locationIter->second); if (watchIter != m_watches.end()) { m_watches.erase(watchIter); } m_locations.erase(locationIter); return true; } else { clog << location << " is not being monitored" << endl; } return false; } bool INotifyMonitor::retrievePendingEvents(queue<MonitorEvent> &events, bool dropAll) { set<MonitorEvent> removedLocations; char buffer[1024]; unsigned int queueLen = 0; size_t offset = 0; if (m_monitorFd < 0) { return false; } if (pthread_mutex_lock(&m_mutex) != 0) { return false; } // Copy internal events while
(m_internalEvents.empty() == false) { MonitorEvent &internalEvent = m_internalEvents.front(); events.push(internalEvent); // Next m_internalEvents.pop(); } if (ioctl(m_monitorFd, FIONREAD, &queueLen) == 0) { #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: " << queueLen << " bytes to read" << endl; #endif } if (queueLen == 0) { // Nothing to read pthread_mutex_unlock(&m_mutex); return false; } int bytesRead = read(m_monitorFd, buffer, 1024); // The offset is unsigned, so compare it rather than subtract it while ((bytesRead > 0) && (offset < (size_t)bytesRead)) { struct inotify_event *pEvent = (struct inotify_event *)&buffer[offset]; size_t eventSize = sizeof(struct inotify_event) + pEvent->len; if (dropAll == true) { offset += eventSize; continue; } #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: read " << eventSize << " bytes event at offset " << offset << endl; #endif // What location is this event for ? map<int, string>::iterator watchIter = m_watches.find(pEvent->wd); if (watchIter == m_watches.end()) { #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: unknown watch " << pEvent->wd << endl; #endif offset += eventSize; continue; } MonitorEvent monEvent; monEvent.m_isWatch = true; if (pEvent->mask & IN_ISDIR) { monEvent.m_isDirectory = true; } monEvent.m_location = watchIter->second; // A name is provided if the target is below a location we match if (pEvent->len >= 1) { monEvent.m_location += "/"; monEvent.m_location += pEvent->name; monEvent.m_isWatch = false; } // What type of event ?
if (pEvent->mask & IN_CREATE) { #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: created " << monEvent.m_location << endl; #endif monEvent.m_type = MonitorEvent::CREATED; } else if (pEvent->mask & IN_CLOSE_WRITE) { #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: written and closed " << monEvent.m_location << endl; #endif monEvent.m_type = MonitorEvent::WRITE_CLOSED; } else if (pEvent->mask & IN_MOVED_FROM) { #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: moved from on " << monEvent.m_location << " " << pEvent->cookie << endl; #endif // Store this until we receive an IN_MOVED_TO event m_movedFrom.insert(pair<uint32_t, MonitorEvent>(pEvent->cookie, monEvent)); } else if (pEvent->mask & IN_MOVED_TO) { #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: moved to on " << monEvent.m_location << " " << pEvent->cookie << endl; #endif // What was the previous location ? map<uint32_t, MonitorEvent>::iterator movedIter = m_movedFrom.find(pEvent->cookie); if (movedIter != m_movedFrom.end()) { monEvent.m_previousLocation = movedIter->second.m_location; monEvent.m_type = MonitorEvent::MOVED; #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: moved from " << monEvent.m_previousLocation << endl; #endif m_movedFrom.erase(movedIter); // Has a watch moved ?
if ((monEvent.m_isWatch == true) && (monEvent.m_previousLocation == watchIter->second)) { // Update the location for this watch map<string, int>::iterator locationIter = m_locations.find(watchIter->second); if (locationIter != m_locations.end()) { m_locations.erase(locationIter); m_locations[monEvent.m_location] = pEvent->wd; } watchIter->second = monEvent.m_location; } } else { // The previous location is unknown because it's from somewhere not being monitored monEvent.m_type = MonitorEvent::CREATED; #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: don't know where file was moved from" << endl; #endif } } else if (pEvent->mask & IN_MOVE_SELF) { map<uint32_t, MonitorEvent>::iterator movedIter = m_movedFrom.end(); #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: moved self on " << monEvent.m_location << " " << pEvent->cookie << endl; #endif // It was moved somewhere not being monitored if (pEvent->cookie == 0) { for (movedIter = m_movedFrom.begin(); movedIter != m_movedFrom.end(); ++movedIter) { if (movedIter->second.m_location == monEvent.m_location) { // For some reason, IN_ISDIR is not set when the cookie is 0 if (movedIter->second.m_isDirectory == true) { monEvent.m_isDirectory = true; } break; } } } else { movedIter = m_movedFrom.find(pEvent->cookie); } if (movedIter != m_movedFrom.end()) { monEvent.m_type = MonitorEvent::DELETED; m_movedFrom.erase(movedIter); } } else if (pEvent->mask & IN_DELETE) { #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: deleted " << monEvent.m_location << endl; #endif monEvent.m_type = MonitorEvent::DELETED; } else if (pEvent->mask & IN_DELETE_SELF) { #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: deleted self on " << monEvent.m_location << endl; #endif if (monEvent.m_isWatch == true) { removedLocations.insert(monEvent); } } else if (pEvent->mask & IN_UNMOUNT) { #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: unmounted on " << monEvent.m_location << endl; #endif if (monEvent.m_isWatch == true) { // Watches
are removed silently if the backing filesystem is unmounted removedLocations.insert(monEvent); } } else { #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: ignoring event " << pEvent->mask << " on " << monEvent.m_location << endl; #endif } // Return event ? if (monEvent.m_type != MonitorEvent::UNKNOWN) { events.push(monEvent); } // Any IN_MOVED_FROM event for which we didn't get an IN_MOVED_TO ? time_t now = time(NULL); map<uint32_t, MonitorEvent>::iterator movedIter = m_movedFrom.begin(); while (movedIter != m_movedFrom.end()) { // The file was probably moved to an unmonitored location on the same filesystem if (movedIter->second.m_time + 60 < now) { // It's as good as if it was deleted movedIter->second.m_type = MonitorEvent::DELETED; events.push(movedIter->second); #ifdef DEBUG clog << "INotifyMonitor::retrievePendingEvents: don't know where " << movedIter->second.m_location << " was moved to" << endl; #endif map<uint32_t, MonitorEvent>::iterator nextMovedIter = movedIter; ++nextMovedIter; m_movedFrom.erase(movedIter); movedIter = nextMovedIter; } else { ++movedIter; } } offset += eventSize; } // Any location to remove ? for (set<MonitorEvent>::const_iterator removalIter = removedLocations.begin(); removalIter != removedLocations.end(); ++removalIter) { removeWatch(removalIter->m_location); addLocation(removalIter->m_location, removalIter->m_isDirectory); } pthread_mutex_unlock(&m_mutex); return true; } /// Returns the maximum number of files that can be monitored. unsigned int INotifyMonitor::getLimit(void) const { return m_maxUserWatches; } /// Adds a watch for the specified location.
bool INotifyMonitor::addLocation(const string &location, bool isDirectory) { uint32_t eventsMask = IN_CLOSE_WRITE|IN_MOVE|IN_CREATE|IN_DELETE|IN_UNMOUNT|IN_MOVE_SELF|IN_DELETE_SELF; bool addedLocation = false; if ((location.empty() == true) || (location == "/") || (m_monitorFd < 0) || (m_watchesCount > m_maxUserWatches)) { return false; } if (access(location.c_str(), F_OK) != 0) { return false; } if (pthread_mutex_lock(&m_mutex) != 0) { return false; } map<string, int>::iterator locationIter = m_locations.find(location); if (locationIter != m_locations.end()) { // This is already being monitored addedLocation = true; } else { int watchNum = inotify_add_watch(m_monitorFd, location.c_str(), eventsMask); if (watchNum >= 0) { ++m_watchesCount; // Generate an event to signal the file exists and is being monitored if (isDirectory == false) { MonitorEvent monEvent; monEvent.m_location = location; monEvent.m_isWatch = true; monEvent.m_type = MonitorEvent::EXISTS; monEvent.m_isDirectory = false; m_internalEvents.push(monEvent); } m_watches.insert(pair<int, string>(watchNum, location)); m_locations.insert(pair<string, int>(location, watchNum)); #ifdef DEBUG clog << "INotifyMonitor::addLocation: added watch " << watchNum << " for " << location << endl; #endif addedLocation = true; } else { if (errno == ENOSPC) { // There are no watches left m_watchesCount = m_maxUserWatches + 1; } clog << "Couldn't monitor " << location << endl; } } pthread_mutex_unlock(&m_mutex); return addedLocation; } /// Removes the watch for the specified location. bool INotifyMonitor::removeLocation(const string &location) { bool removedLocation = false; if ((location.empty() == true) || (m_monitorFd < 0)) { return false; } if (pthread_mutex_lock(&m_mutex) != 0) { return false; } removedLocation = removeWatch(location); pthread_mutex_unlock(&m_mutex); return removedLocation; } /// Removes watches for the specified location and all underneath.
bool INotifyMonitor::removeLocations(const string &location)
{
	if ((location.empty() == true) ||
		(m_monitorFd < 0))
	{
		return false;
	}

	if (pthread_mutex_lock(&m_mutex) != 0)
	{
		return false;
	}

	map<string, int>::iterator locationIter = m_locations.begin();
	while (locationIter != m_locations.end())
	{
		if ((locationIter->first.length() >= location.length()) &&
			(locationIter->first.find(location) == 0))
		{
			inotify_rm_watch(m_monitorFd, locationIter->second);
			--m_watchesCount;

			map<int, string>::iterator watchIter = m_watches.find(locationIter->second);
			if (watchIter != m_watches.end())
			{
				m_watches.erase(watchIter);
			}

			locationIter = m_locations.erase(locationIter);
		}
		else
		{
			++locationIter;
		}
	}

	pthread_mutex_unlock(&m_mutex);

	return true;
}

/// Retrieves pending events.
bool INotifyMonitor::retrievePendingEvents(queue<MonitorEvent> &events)
{
	return retrievePendingEvents(events, false);
}

/// Drops pending events.
void INotifyMonitor::dropPendingEvents(void)
{
	bool readEvents = false;

	do
	{
		readEvents = retrievePendingEvents(m_internalEvents, true);
	} while (readEvents == true);

	while (m_internalEvents.empty() == false)
	{
		// Next
		m_internalEvents.pop();
	}
}
pinot-1.22/Monitor/INotifyMonitor.h000066400000000000000000000043231470740426600173340ustar00rootroot00000000000000
/*
 * Copyright 2005-2021 Fabrice Colin
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*/ #ifndef _INOTIFY_MONITOR_H #define _INOTIFY_MONITOR_H #include #include #include #include #include #include "MonitorInterface.h" /// Linux inotify monitor. class INotifyMonitor : public MonitorInterface { public: INotifyMonitor(); virtual ~INotifyMonitor(); /// Returns the maximum number of files that can be monitored. virtual unsigned int getLimit(void) const; /// Adds a watch for the specified location. virtual bool addLocation(const std::string &location, bool isDirectory); /// Removes the watch for the specified location. virtual bool removeLocation(const std::string &location); /// Removes watches for the specified location and all underneath. virtual bool removeLocations(const std::string &location); /// Retrieves pending events. virtual bool retrievePendingEvents(std::queue &events); /// Drops pending events. virtual void dropPendingEvents(void); protected: pthread_mutex_t m_mutex; std::queue m_internalEvents; std::map m_locations; std::map m_movedFrom; unsigned int m_maxUserWatches; unsigned int m_watchesCount; bool removeWatch(const std::string &location); bool retrievePendingEvents(std::queue &events, bool dropAll); private: INotifyMonitor(const INotifyMonitor &other); INotifyMonitor &operator=(const INotifyMonitor &other); }; #endif // _INOTIFY_MONITOR_H pinot-1.22/Monitor/Makefile.am000066400000000000000000000014421470740426600162650ustar00rootroot00000000000000# Process this file with automake to produce Makefile.in pkginclude_HEADERS = \ INotifyMonitor.h \ MonitorEvent.h \ MonitorFactory.h \ MonitorHandler.h \ MonitorInterface.h pkglib_LTLIBRARIES = libMonitor.la libMonitor_la_LDFLAGS = \ -static libMonitor_la_SOURCES = \ MonitorEvent.cpp \ MonitorFactory.cpp \ MonitorHandler.cpp if HAVE_LINUX_INOTIFY libMonitor_la_SOURCES += INotifyMonitor.cpp endif libMonitor_la_CXXFLAGS = \ @MISC_CFLAGS@ \ -I$(top_srcdir)/Utils \ -I$(top_srcdir)/Tokenize \ -I$(top_srcdir)/Tokenize/filters \ -I$(top_srcdir)/SQL \ -I$(top_srcdir)/Collect \ -I$(top_srcdir)/Index \ 
-I$(top_srcdir)/Search \ @HTTP_CFLAGS@ @XML_CFLAGS@ @INDEX_CFLAGS@ \ @GMIME_CFLAGS@ @SIGCPP_CFLAGS@ if HAVE_LINUX_INOTIFY libMonitor_la_CXXFLAGS += -DHAVE_LINUX_INOTIFY endif pinot-1.22/Monitor/MonitorEvent.cpp000066400000000000000000000036721470740426600173750ustar00rootroot00000000000000/* * Copyright 2005,2006 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include "TimeConverter.h" #include "MonitorEvent.h" using std::string; MonitorEvent::MonitorEvent() : m_isWatch(false), m_type(UNKNOWN), m_isDirectory(false), m_time(time(NULL)) { } MonitorEvent::MonitorEvent(const MonitorEvent &other) : m_location(other.m_location), m_previousLocation(other.m_previousLocation), m_isWatch(other.m_isWatch), m_type(other.m_type), m_isDirectory(other.m_isDirectory), m_time(other.m_time) { } MonitorEvent::~MonitorEvent() { } MonitorEvent& MonitorEvent::operator=(const MonitorEvent& other) { if (this != &other) { m_location = other.m_location; m_previousLocation = other.m_previousLocation; m_isWatch = other.m_isWatch; m_type = other.m_type; m_isDirectory = other.m_isDirectory; m_time = other.m_time; } return *this; } bool MonitorEvent::operator<(const MonitorEvent& other) const { if (m_location < other.m_location) { return true; } else if (m_location == other.m_location) { if (m_type < other.m_type) { return true; } } return false; } bool MonitorEvent::operator==(const MonitorEvent& other) const { if ((m_location == other.m_location) && (m_type == other.m_type)) { return true; } return false; } pinot-1.22/Monitor/MonitorEvent.h000066400000000000000000000026351470740426600170400ustar00rootroot00000000000000/* * Copyright 2005,2006 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #ifndef _MONITOR_EVENT_H #define _MONITOR_EVENT_H #include #include /// An event generated by a monitor. class MonitorEvent { public: MonitorEvent(); MonitorEvent(const MonitorEvent &other); virtual ~MonitorEvent(); MonitorEvent& operator=(const MonitorEvent& other); bool operator<(const MonitorEvent& other) const; bool operator==(const MonitorEvent& other) const; typedef enum { UNKNOWN = 0, EXISTS, CREATED, WRITE_CLOSED, MOVED, DELETED } EventType; std::string m_location; std::string m_previousLocation; bool m_isWatch; EventType m_type; bool m_isDirectory; time_t m_time; }; #endif // _MONITOR_EVENT_H pinot-1.22/Monitor/MonitorFactory.cpp000066400000000000000000000021421470740426600177120ustar00rootroot00000000000000/* * Copyright 2005,2006 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include "config.h" #ifdef HAVE_LINUX_INOTIFY #include "INotifyMonitor.h" #endif #include "MonitorFactory.h" MonitorFactory::MonitorFactory() { } MonitorFactory::~MonitorFactory() { } /// Returns a Monitor. 
MonitorInterface *MonitorFactory::getMonitor(void) { #ifdef HAVE_LINUX_INOTIFY return new INotifyMonitor(); #else return NULL; #endif } pinot-1.22/Monitor/MonitorFactory.h000066400000000000000000000022611470740426600173610ustar00rootroot00000000000000/* * Copyright 2005,2006 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _MONITOR_FACTORY_H #define _MONITOR_FACTORY_H #include "MonitorInterface.h" /// Factory for monitors. class MonitorFactory { public: virtual ~MonitorFactory(); /// Returns a Monitor. static MonitorInterface *getMonitor(void); protected: MonitorFactory(); private: MonitorFactory(const MonitorFactory &other); MonitorFactory& operator=(const MonitorFactory& other); }; #endif // _MONITOR_FACTORY_H pinot-1.22/Monitor/MonitorHandler.cpp000066400000000000000000000033431470740426600176640ustar00rootroot00000000000000/* * Copyright 2005,2006 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include "MonitorHandler.h" using namespace std; MonitorHandler::MonitorHandler() { } MonitorHandler::~MonitorHandler() { } void MonitorHandler::initialize(void) { } void MonitorHandler::flushIndex(void) { } bool MonitorHandler::fileExists(const string &fileName) { return false; } bool MonitorHandler::fileCreated(const string &fileName) { return false; } bool MonitorHandler::directoryCreated(const string &dirName) { return false; } bool MonitorHandler::fileModified(const string &fileName) { return false; } bool MonitorHandler::fileMoved(const string &fileName, const string &previousFileName) { return false; } bool MonitorHandler::directoryMoved(const string &dirName, const string &previousDirName) { return false; } bool MonitorHandler::fileDeleted(const string &fileName) { return false; } bool MonitorHandler::directoryDeleted(const string &dirName) { return false; } const set &MonitorHandler::getFileNames(void) const { return m_fileNames; } pinot-1.22/Monitor/MonitorHandler.h000066400000000000000000000044431470740426600173330ustar00rootroot00000000000000/* * Copyright 2005,2006 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _MONITORHANDLER_HH #define _MONITORHANDLER_HH #include #include #include #include "MonitorInterface.h" /// Handles events generated by a monitor. class MonitorHandler { public: MonitorHandler(); virtual ~MonitorHandler(); /// Initializes things before starting monitoring. virtual void initialize(void); /// Handles flushing the index. virtual void flushIndex(void); /// Handles file existence events. virtual bool fileExists(const std::string &fileName); /// Handles file creation events. virtual bool fileCreated(const std::string &fileName); /// Handles directory creation events. virtual bool directoryCreated(const std::string &dirName); /// Handles file modified events. virtual bool fileModified(const std::string &fileName); /// Handles file moved events. virtual bool fileMoved(const std::string &fileName, const std::string &previousFileName); /// Handles directory moved events. virtual bool directoryMoved(const std::string &dirName, const std::string &previousDirName); /// Handles file deleted events. virtual bool fileDeleted(const std::string &fileName); /// Handles directory deleted events. virtual bool directoryDeleted(const std::string &dirName); /// Returns the names of files to monitor. 
const std::set &getFileNames(void) const; protected: std::set m_fileNames; private: MonitorHandler(const MonitorHandler &other); MonitorHandler &operator=(const MonitorHandler &other); }; #endif // _MONITORHANDLER_HH pinot-1.22/Monitor/MonitorInterface.h000066400000000000000000000041031470740426600176470ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _MONITOR_INTERFACE_H #define _MONITOR_INTERFACE_H #include #include #include #include "MonitorEvent.h" /// Interface implemented by all monitors. class MonitorInterface { public: virtual ~MonitorInterface() { } /// Returns the file descriptor to poll for events. virtual int getFileDescriptor(void) const { return m_monitorFd; } /// Returns the maximum number of files that can be monitored. virtual unsigned int getLimit(void) const = 0; /// Adds a watch for the specified location. virtual bool addLocation(const std::string &location, bool isDirectory) = 0; /// Removes the watch for the specified location. virtual bool removeLocation(const std::string &location) = 0; /// Removes watches for the specified location and all underneath. virtual bool removeLocations(const std::string &location) = 0; /// Retrieves pending events. 
	virtual bool retrievePendingEvents(std::queue<MonitorEvent> &events) = 0;

	/// Drops pending events.
	virtual void dropPendingEvents(void) = 0;

protected:
	std::map<int, std::string> m_watches;
	int m_monitorFd;

	MonitorInterface() : m_monitorFd(-1) { }

private:
	MonitorInterface(const MonitorInterface &other);
	MonitorInterface &operator=(const MonitorInterface &other);

};

#endif // _MONITOR_INTERFACE_H
pinot-1.22/NEWS000066400000000000000000001344131470740426600133060ustar00rootroot00000000000000
2024/10/27 version_1_2_2
General :
- const correctness and Debian unstable build fixes
- support for exiv2 0.28
- require libarchive >= 3.0.0
- require libxml++ >= 3.2.0
- several warning fixes

2022/02/22 version_1_2_1
General :
- removed code paths around diacritics sensitivity
- removed code paths pertaining to other query languages
- auto-generate the ChangeLog file
IndexSearch :
- removed historical checks on old Xapian releases
- replaced homegrown abstract solution with Xapian's
UI :
- fixed query history

2021/10/16 version_1_2_0
General :
- dropped xdgmime in favour of gio
- require giomm >= 2.6
- run rst2txt on RST files, if available
IndexSearch :
- removed obsolete search plugins, dead code
- added a plugin for Arxiv
- better results with the dir query filter
- fixed size range queries
- synced with how Xapian Omega indexes file extensions
- added sort by file size (descending)
- pinot-index shows document IDs to facilitate troubleshooting with xapian-delve
- require Xapian >= 1.4.10
- reindexing is recommended.
Daemon :
- fixed a bug that would cause the loss of documents for one directory when crawling ends for the next one
- overhauled the D-Bus implementation, extended the interface
- dropped dependency on dbus-glib
- reduced flushing
- an IndexFlushEpoch property indicates the last time index changes were flushed to disk
- rely on org.freedesktop.UPower's OnBattery on systems with a battery
- implemented the org.gnome.Shell.SearchProvider2 interface.
The content of documents in directories indexed by the daemon can be searched through the Gnome 3 Shell. UI : - removed the gtk2 UI, refreshed the gtk3 UI - the gtk3 UI is loaded from a Glade XML file - reduced direct reading of the daemon's index - reopen the daemon's index as per the IndexFlushEpoch property - fixed file imports and error reporting - stored queries' results may be sorted by file size - rationalized activation and behaviour of the Edit menuitems - pinot may be started on any new query with -q/--query TERMS - require gtkmm >= 3.24 2020/01/09 version_1_1_0 General: - updated gmime, textcat, exiv, glib, OpenSSL and Xapian dependencies 2015/06/11 version_1_0_9 Tokenize : - new JSON filter - streamlined the mbox filter a bit - pinot-index looks for filters in the same locations as the UI IndexSearch : - sort documents by date, in ascending or descending order 2014/07/18 version_1_0_8 General : - don't install the xdgmime files - use libnotify if --enable-libnotify=yes is specified - fixed check on library symbols that could cause a crash, added support for LLVM-built filters - general clean up SQL : - refactored the DB interface 2014/05/22 version_1_0_7 General: - compilation fixes - prefer off_t for file sizes, offsets, buffer lengths Monitor : - try and reapply watches on directories that have just been removed Tokenize : - refactored encoding conversion IndexSearch : - fixed build with libexttextcat 2013/05/26 version_1_0_6 IndexSearch: - support boost 1.50's Spirit SQL : - better handling of potential errors while stepping through results Collect : - minor fixes to curl backend 2013/03/03 version_1_0_5 IndexSearch: - fixes to abstract generation, CJKV tokenization 2013/02/11 version_1_0_4 IndexSearch: - fix stripping of diacritics - stem subject terms 2013/01/14 version_1_0_3 IndexSearch : - fix a Unicode handling issue introduced in 1.01 UI : - updated French translation by Eliovir 2012/11/04 version_1_0_2 General : - turn memory pooling off by 
default to avoid issues with newer boost UI : - new Czech translation by Zbyněk Schwarz - updated Japanese translation by Takafumi Arakaki - updated Brazilian Portuguese translation by Adriano Steffler 2012/08/27 version_1_0_1 General : - run rst2html on RST files, if available, and if RST files are detected as such. See http://code.google.com/p/pinot-search/issues/detail?id=12 Tokenize : - better mbox parts extraction IndexSearch : - dropped unac in favour of own code, resulting in faster indexing. - pinot-index --override MIMETYPE:EXTENSION overrides MIME type detection based on files extensions 2012/06/16 version_1_0_0 General : - install headers and libraries - prefer default programs that don't support URIs to view local files Deskbar : - dropped support for Deskbar since it's now dead Tokenize : - better mbox parsing. The internal part numbering scheme has changed; reindexing email is recommended. IndexSearch : - support for LibreOffice's libexttextcat v3.2 - fixed the Google plugin - renamed the Freshmeat plugin to Freecode UI : - updated Simplified Chinese translation by happymeng - updated German translation by Gena Haltmair - initial GTK+ 3 port. Enable with "./configure ... --enable-gtkmm3=yes" 2011/11/07 version_0_9_8 Tokenize : - new exiv2-based filter - new chmlib-based filter IndexSearch : - support for LibreOffice's libexttextcat v3.1, and possibly v3.1.1 - dropped plugins for Yahoo! REST API, Yahoo! 
BOSS, Google Code Search and RollYO UI : - query results and view history are expired after 6 months - updated Dutch translation by Martijn Verstrate and Tico - updated German translation by Fitoschido - updated Russian translation by Nikolay Kachanov - updated Spanish translation by pkramerruiz 2011/01/09 version_0_9_7 General : - replaced custom memory pool class with Boost's - don't try and map more than 2Gb and use shared mappings Deskbar : - install the module where DeskBar > 2.28 expects it to be IndexSearch : - remove dots at the end of terms that don't look like acronyms - index components of acronyms and dot-separated terms on their own Daemon : - index files as they are crawled, don't delegate indexing to other threads, unless PINOT_MAXIMUM_INDEX_THREADS > 1 - fixed checking of symlinks against black-list UI : - the maximum number of results returned by the Query field and used to initialize new stored queries follows PINOT_MAXIMUM_QUERY_RESULTS - when a spelling suggestion is available, don't show the same revised query multiple times - updated Simplified Chinese translation by mike2718 - updated Dutch translation by Dirk Roos - updated Italian translation by Davide Vidal and Simone Sandri - updated Japanese translation by Mizuki-san - updated Brazilian Portuguese translation by feen - updated Portuguese translation by Almufadado - updated Russian translation by Alexander Zinin and Nikolay Kachanov - updated Spanish translation by Juan Miguel Boyero Corral, Matias Fonzo and Fitoschido 2010/07/12 version_0_9_6 General : - fixed "GIO can sniff PNG" program used at configure time - README clarifies that operators should be upper-case - builds with gmime-2.4 or 2.6 - link with the library that has dlopen() and fix Debian bug #556062 - merged Debian's patch for --as-needed support - merged FreeBSD build patch - dropped support for Xesam SQL : - fixed prepared statements interface to work with insertion and deletion - sleep then retry operations if the 
database is busy - better transaction support Tokenize : - the mbox filter now supports messages of type "message/external-body" IndexSearch : - fixed possible crash at exit time when the textcat configuration file points to non-existing model files Xapian : - prefer the Chert back-end if available. Applicable to Xapian >= 1.2.0. Daemon : - check symlinks against black-list - the battery status can now be obtained from DeviceKit-power or upower UI : - updated Simplified Chinese translation by Eleanor Chen - updated French translation by verdy_p and Fabrice Colin - updated Hebrew translation by Yaron - updated Brazilian Portuguese translation by andbelo 2009/11/14 version_0_9_5 General : - OpenBSD support, thanks to the work of Antoine Jacoutot - fixed build when HAVE_DBUS isn't set SQL : - use prepared statements on most common queries, transactions on mass updates Tokenize : - better handling of acronyms IndexSearch : - updated Bing plugin - removed plugins for Exalead and IOI Xapian : - fixed the "path:" operator. 
  Reindexing may be necessary
- rewrote Search This For feature
UI :
- if gtkmm >= 2.16 is available, the Find button is replaced with an icon
- updated Dutch translation by JW
- updated French translation by Thierry Thomas
- updated German translation by Fabian Affolter and Marco Jahn
- updated Hebrew translation by Ddorda
- updated Portuguese translation by Bernardo Lopes
- updated Spanish translation by Jesus Tramullas and DiegoJ

2009/06/27 version_0_9_4
General :
- set _FILE_OFFSET_BITS=64 and fix Debian bug #530572
- merged Funda Wang's linkage patch for Mandriva
- gmime 2.4 is required
Tokenize :
- mbox filter now fully works with gmime 2.4
- archives filter supports Debian packages
- set the close-on-exec flag on document files
- better MIME type detection removes superfluous calls to external uncompressor programs when dealing with archives
- use file names as title for files attached to mbox messages
- fixed "quashing" of results titles
- fixed indexing of the last document's attachments in an mbox
IndexSearch :
- new search filter "inurl" allows finding files from an mbox or archive at a given URL
- pinot-index --check on an mbox or archive will return the ID of the first nested document
- pinot-search shows an estimate of the total number of results
- the Bing plugin replaces the MSN plugin
- fixed Freshmeat plugin
Daemon :
- send an IndexFlushed signal over D-Bus when the index changes on disk
- fixed restoring of user-set metadata
UI :
- better MIME type detection fixes cases where documents nested in archives couldn't be opened and viewed
- reopen the index upon receiving the IndexFlushed signal
- show properties of external indices' documents read-only
- on exit, delete temporary files created for viewing some documents
- fixed More Like This on Web results

2009/04/13 version_0_9_3
Tokenize :
- moved the first 5Mb limit from the terms generator to the tokenizer
Daemon :
- fixed major bug that caused the daemon to reindex all files on each run, unless
started in full scan mode 2009/04/10 version_0_9_2 General : - fixed successive initialization and cleanup of libxml2 that could lead to a crash with libxml2 2.7.3 IndexSearch : - work around invalid charset declarations in documents - fixed pinot-index handling of black-listed documents - redesigned how documents nested in other documents (eg mbox...) are indexed - removed MozDex plugin Tokenize : - new filter for tar files and ISO images based on libarchive >= 2.6.2 to index the content of those archive formats. Enable with "./configure ... --enable-libarchive=yes" Daemon : - major changes to try and minimize memory usage - PINOT_MAXIMUM_INDEX_THREADS sets the daemon's number of indexing threads and defaults to 4 - indexes created with version < 0.92 will be automatically upgraded UI : - can open/view files within indexed archives 2009/03/07 version_0_9_1 General : - patch by Adel Gadllah for gcc 4.4 - removed obsolete Encoding field from .desktop files Tokenize : - new HTML filter based on Xapian Omega's HTML parser - prevent rpm from choking on files with the ".rpm" extension that are not RPMs - look for the ROBOTS metatag in remote documents only - only consider the first 5Mb of documents IndexSearch : - pinot-search can run stored queries created by the UI - pinot-index can deal with relative paths, index directories and their contents, open My Web Pages, My Documents or other UI-configured index by name Daemon : - fixed issue where symlinks would get unindexed every second run - fixed memory leak in time to timestamp conversions - fixed memory leak when reapplying user-set metadata - fixed memory leak when skipping the download of a local file - don't flush the index while files are being indexed UI : - fixed spelling suggestions on the live query - the Import URL option can import local directories and their contents in My Web Pages - updated German translation by Gena Haltmair - updated Portuguese translation by Flávio Martins 2009/01/29 version_0_9_0 
General : - builds with MingW - patch by Martin Michlmayr to fix gcc 4.4 build errors Tokenize : - skip mbox messages flagged deleted by Evolution - patch by Adel Gadllah to build with gmime 2.4 - catch conversions errors to/from unsupported charsets - fixed handling of Unicode space and punctuation code points Collect : - look for the extended attribute user.mime_type on local files - HTTP downloaders can do POST Monitor : - check /proc/sys/fs/inotify/max_user_watches and set aside 1k watches for other applications if possible - patch by Adrian Bunk to fix inotify support with recent kernel headers Search : - better rebasing of results' URLs - dropped A9 and BitTorrent plugins - added plugin for the Internet Open Index - fixed various issues with getting more than one results page from Web engines - support for HTML extracts in OpenSearch Response Xapian : - some terms were not always highlighted in the abstract - support for the "path:" operator - fix to always return the total results estimate - don't index the title without prefix as if it were in the text body, but let queries search across both text body and title. The "title" filter still allows searching titles exclusively. Requires Xapian >= 1.0.4. 
- always add a term for "dir:/" - MIME type terms don't include any charset specification, as intended - remove the original query's terms, stopwords, infrequent terms or similar terms if the stemming language is set from More Like queries - close all databases in an orderly manner Search : - pinot-search has a "sort by date first" mode Scripts : - pinot-cd.sh implements a "tagged cd" - pinot-check-file.sh simplifies determining if a file is in My Documents Deskbar : - pinot-module.py uses the new Query method, supports deskbar v2.24 snippets Daemon : - user-set metadata, including labels, is saved and restored when reindexing - D-Bus method RenameLabel obsoleted - D-Bus method Query replaces SimpleQuery and allows querying the same engines as the UI - skip symlinks that refer to places that have been crawled or will be crawled - if the daemon was interrupted while crawling, some files would never be indexed - indexes created with version < 0.90 will be automatically upgraded UI : - spelling suggestions are shown inline, above results. Upon selecting one and clicking the Yes button, a new query based on the selected suggestion will be created. - restructured menus - queries' Index Results option updates documents already in My Web Pages. 
  This doesn't apply to those in My Documents
- on More Like This, selected documents are indexed to My Web Pages if necessary
- both results and index lists can be exported to CSV or XML
- Open Parent opens the directory a file is in
- say "No results" instead of showing a blank results list
- when viewing a document and GIO is used, make sure we consider the default application(s) first
- support for the "path:" operator
- preferences can be opened independently with "pinot -p" or "pinot-prefs"
- smoother status window
- the List Contents Of menu wasn't refreshed after editing an index's name
- automatic migration of pre-0.90 configuration
- larger default blacklist
- updated Simplified Chinese translation by rainofchaos
- updated Dutch translation by JW
- updated French translation by Frédéric Grosshans
- updated German translation by Gena Haltmair
- updated Japanese translation by Takeo Mizuki
- updated Brazilian Portuguese translation by Henrique P. Machado
- updated Portuguese translation by _PN_boy
- updated Swedish translation by Daniel Nylander
- new Hebrew translation by Yaron

2008/09/20 version_0_8_9
Xapian :
- indexing and searching are now diacritics insensitive by default, thanks to Unac 1.7.0 by Loic Dachary
- support for removal of stopwords at query time.
Language specific lists should be installed in $PREFIX/share/pinot/stopwords and be named stopwords.language_code - better abstracts for short queries Daemon : - fixed indexing of plain text and XML files, following changes made in 0.88 - indexes created with version < 0.89 will be automatically upgraded UI : - fixed boolean operators in spelling suggestions, broken in 0.88 - spelling suggestion doesn't suggest the same thing over and over again - dehyphen queries on line breaks, useful with text pasted from an external document - updated Simplified Chinese translation by Aron Xu - updated Brazilian Portuguese translation by André Gondim 2008/08/30 version_0_8_8 General : - replace xdgmime with GIO if it can sniff PNG at configure time - with gcc 4.x, set symbol visibility to hidden by default Tokenize : - for unknown text formats, don't be too quick to fall back on the plain text filter - the output of external filters can be scanned if need be - some support for OpenXML formats Search : - obsolete Google API engine now built as a dynamic backend - backends provide slightly more information - updated several plugins - removed the CreativeCommons plugin Index : - pinot-index --showinfo shows which actions are associated with a MIME type Xapian : - only support boolean operators in upper-case. This helps fixing issues with queries made of text pasted from elsewhere. Daemon : - extended GetStatistics to return the flags "low disk space", "on battery" and "crawling" UI : - on first runs, create useful stored queries - display extended status in the Status window - updated Simplified Chinese translation by rainofchaos - updated German translation by Gena Haltmair - updated Brazilian Portuguese translation by Rafael Porto Rodrigues - updated Swedish translation by Daniel Nylander 2008/07/20 version_0_8_7 General : - install the Amazon API plugin with other plugins Collect : - obey META REFRESH if set Search : - basic Xesam back-end based on xesam-glib. 
Enable with configure's option --enable-xesam-glib - in plugins, setting a value to "EDIT:description" makes it editable and allows to assign it a value at search time - pinot-search can set editable parameters with -e/--seteditable - plugin for the new Yahoo! BOSS API Xapian : - skip very short non-CJKV terms when expanding queries. - fixed mangling of some CJKV queries - abstract generation is less skewed towards common terms UI : - support for drag-n-drop to the stored queries list. Dropping a file will create a query to look for similar documents. Dropping text will create a new query set to that text. - preferences let the user edit all editable parameters defined in the plugins. They are saved to the configuration file. - extracts can be selected, copied and dropped onto the queries list to create new queries - let foreground threads run for a minute max - search-only backends (such as Xesam) will appear in the Current User channel 2008/06/21 version_0_8_6 General : - make manuals generates the manuals with help2man - dropped date parser for curl's/neon's Tokenize : - decode emails subject lines properly - the HTML filter skips HTDig's no_index block SQL : - query history can keep more than one results set Collect : - use Last-Modified header as document's date Search : - don't run queries consisting exclusively of spaces - fixed A9 plugin, removed Accoona Xapian : - don't attempt offering suggestions for CJKV terms Daemon : - in ignore-version mode, reapply labels too - documents from directories removed from indexing/monitoring should now be unindexed on full scans - SIGTERM wasn't caught ! UI : - use buttons on notebook tabs - the live query text field doesn't offer suggestions for filters and ranges - after a query edit, lists of documents are refreshed correctly - fixed date displayed when viewing query history - query history keeps the last two sets - .desktop file was missing Japanese and Simplified Chinese comments - SIGTERM wasn't caught ! 
2008/05/11 version_0_8_5 Build : - removed reference to m4 directory General : - synced with gtk+'s xdgmime Tokenize : - new libexif-based filter to extract image metadata - better conversion of mbox messages and HTML documents to UTF-8 - tweaks to the CJKV tokenizer SQL : - more abstract database interface Search : - fixed CJKV queries on Web engines - fixed repetition of CJKV characters in abstracts Xapian : - mixed CJKV queries should be processed correctly. See README. - fixed repetition of CJKV characters in abstracts - if the document specifies a language, double check it's valid Daemon : - check whether there's already a daemon process running and exit if there is - fixed concurrency issues - fixed crawling and monitoring of new directories UI : - if the global configuration file can't be open, don't reset the configuration - new Japanese translation by Takeo Mizuki 2008/03/27 version_0_8_4 Build : - patch by Adel Gadllah to fix gcc 4.3 build errors General : - updated FAQ with how to compact the index Monitor : - patch by Michael Biebl for inotify on m68k, mips, mipsel and hppa Search : - new plugin for UNdata Index : - fixed possible crash when pinot-index exits Xapian : - faster CJKV indexing - the spelling database is populated with CJKV terms too Daemon : - fixed possible crash when pinot-dbus-daemon exits - added option --ignore-version to deal with compacted indexes UI : - fixed possible crash when pinot exits - fixed issue with signaling between crawler and indexer - stored queries can index all, or only new, results - in Preferences, patterns can be reset to default values - the Status window shows whether the daemon was stopped by, or disconnected from D-Bus - new simplified Chinese translation by Ashlee Ma 2008/02/28 version_0_8_3 Build : - fixed build errors with gcc 4.3, thanks to Adel Gadllah - fixed backend and non-backend flags mismatch General : - dropped deprecated Encoding keys in .desktop files, as pointed out by David Paleino Tokenize : 
- filters definition in external-filters.xml can specify what charset the text output is in - convert documents into UTF-8 prior to indexing Search : - pinot-search supports option "--stemming LANGUAGE_NAME" - updated results parsing in Google.src Xapian : - initial support for CJKV. See README for details - consider stemmed terms when building extracts Daemon : - fixed options parsing - don't stop the directory crawler thread after 5 minutes UI : - fixed extract display, broken in the previous release - tabs can be reordered, notebook is scrollable - make sure the Status window doesn't miss crawler errors - updated Spanish translation by Jesus Tramullas 2008/01/26 version_0_8_2 Build : - don't link to unnecessary libraries Search : - removed the WiseNut plugin - fixed the Sherlock plugin parser's handling of input items, thanks to Claudio Bustos Navarrete - support for Xesam RC1 - don't build the Xesam UL parser if Spirit is not available, thanks to Reuben Thomas Xapian : - back-end moved into a dynamic library - fixed several issues with query stemming - generate terms for the MIME class Daemon : - log an error when there's no inotify watch left - export HasDocument over D-Bus UI : - when the index needs updating, tell the user on every run until he clicks the "Don't warn me again" checkbox - stemming is now configured separately and not driven by the "lang" filter - don't correct spelling of auto-generated and previously corrected queries - defer importing to the main window - Status window shows which engines are available - larger default blacklist - viewed documents are added to the list of recently used files. 
Requires gtkmm >= 2.10 - updated Dutch translation by JW - updated Spanish translation by Jesus Tramullas - updated Swedish translation by Zirro 2007/11/24 version_0_8_1 Build : - misc fixes General : - updated FAQ and README - fixed Icon field in desktop files Index : - workaround for broken shared-mime-info rules that identify HTML files as Mozilla bookmarks - files whose name includes a question mark were not indexed correctly - pinot-label would loop forever if the supplied file name wasn't in the index - don't build the spelling table if the env var PINOT_SPELLING_DB is set to NO Deskbar : - new plugin compatible with Deskbar 2.20 Daemon : - stop crawling and indexing if the partition on which the index resides is getting full. By default, that means less than 50 MB. This can be overridden with the env var PINOT_MINIMUM_DISK_SPACE, eg PINOT_MINIMUM_DISK_SPACE=100 for 100 MB - stop crawling when the system goes on battery and restart when on AC. This requires support for the freedesktop.org's Power Management spec, or pre-spec gnome-power-manager. UI : - fixed build against libsigc++ 2.1 - updated Portuguese translation by Tiago Silva - updated Swedish translation by Daniel Nylander 2007/11/01 version_0_8_0 Build : - SMP builds, thanks to Gabriel C Index : - unknown document types can be indexed if one of their parent types is known - new pinot-label tool to get, set and list labels on indexed files from the command-line - limit external programs to 5 minutes of CPU time Search : - support for date (year, month, day), time (hours, minutes, seconds) and size (in bytes) ranges - attempt to correct the spelling of index queries that don't match anything - log how long queries take - keep connection to remote databases alive - better query expansion - the Yahoo! plugin was replaced with the Yahoo!
API plugin - pass queries to Web engines unmodified, without attempting to filter results based on a filter or a range used in the query Daemon : - fixed Reload method - new D-Bus methods to manage labels - new --reindex option UI : - Search This For menu to search in results - suggest spelling corrections for index queries that don't match anything - all indices can be browsed - inline URL completion in the import dialog box, based on previous results - better support for user-specific MIME settings, thanks to Lee Marks - reload MIME settings when they are edited - results of stored queries can be sorted by relevance (default) or by date - new History button to show previous results for a stored query - documents' properties are updated in the background - documents' terms can be saved to a file - Status window shows description of errors - send a Reload to the daemon only when the relevant preferences are modified - updated Dutch translation by JW - updated Brazilian Portuguese translation by Leonardo Melo 2007/08/23 version_0_7_6 Build : - also look for textcat.h in libtextcat Monitor : - a deletion would deadlock the monitor and prevent from processing any further event - unindex directories' contents when deleted and update when moved Collect : - try to open files with NO_ATIME if possible - Neon-based downloader had not been brought up to date Index : - replaced na(t)ive tokenizer with Xapian's TermGenerator. 
No effort is made to convert text to UTF-8 yet, so this depends on document formats and encodings - preliminary support for spelling corrections, without user feedback - index directories are tagged as cache directories so that they are skipped by "tar --exclude-caches" - preserve documents' title if possible, use user-specified title on import Search : - fixed Sherlock parser for boost 1.34 - support for gSOAP 2.7.9e - when searching an index, don't resort to OR'ing all terms if the original query doesn't match anything, this only confused users Deskbar : - removed unnecessary shebang in script Daemon : - new D-Bus method Reload, which reloads the configuration and acts upon it whenever it is modified by the UI - SimpleQuery still resorts to OR'ing all terms if the original query doesn't match anything UI : - fixed some minor cosmetic bugs, tweaked a few things - user-specific MIME settings in ~/.local have priority over system settings - queries with at least a start date can be run, and filter a documents list - the label specified on import was ignored - new traditional Chinese translation by Yung-Chung Lin - updated Portuguese translation by _PN_boy - updated Swedish translation by Daniel Nylander 2007/07/28 version_0_7_5 General : - install Dijon's ChangeLog Index : - files with nested documents (eg mbox) could sometimes not be fully indexed and/or the wrong MIME type was reported Search : - updated Xesam Query Language parser to reflect current spec - updated Sherlock plugin for Exalead, removed Ask Daemon : - mbox files are no longer configured separately : those found during a crawl are indexed and monitored automatically - patterns list can be used as a blacklist (default) or whitelist UI : - fixed a bug where filtering a documents list with an empty query would prevent viewing the list with or without a filter query - avoid a crash when viewing the properties of several documents one after the other - updated Dutch translation by Balaam's
Miracle - updated Portuguese translation by _PN_boy 2007/06/24 version_0_7_4 General : - make uninstall actually uninstalls all files Index : - can detect Hungarian, Romanian and Turkish with libtextcat 2.2 and stem with Xapian 1.0 - adopted Xapian 1.0's new indexing strategy Search : - basic support for the Xesam Query and User Language in pinot-search Daemon : - history database is separate from the UI's UI : - in index list tabs, replaced labels filtering with stored queries filtering, so that one can find out which and how many documents in the index being shown match a query (the query's maximum number of results is ignored) - Hungarian, Romanian and Turkish are valid document languages 2007/05/23 version_0_7_3 Tokenize : - mbox filter was broken in previous release - TagLib filter returns the file's name as title if no tag is found SQL : - don't force opening and closing the database on every request Index : - overwrite index on upgrades, it's faster than deleting all its documents - support for Xapian 1.0 API Search : - stemming wasn't activated when a language is set - support for Xapian 1.0 API Daemon : - don't scan for deleted files on every run UI : - fixed crash when unindexing documents - updated Dutch translation by JW - updated Italian translation by Marco Bazzani 2007/04/28 version_0_7_2 General : - synced with gtk+'s xdgmime Index : - add extra term to documents so that filter "dir:/" can be applied Search : - results lists can be saved as CSV or OpenSearch response XML/RSS Daemon : - indexes created with version < 0.72 will be automatically upgraded - fixed problem where too much time spent upgrading would cause the daemon to be killed UI : - open documents on HTTP/HTTPS with the default Web browser - updated German translation by Christian Dywan - updated Italian translation by Vincenzo Consales 2007/03/31 version_0_7_1 Tokenize : - HTML filter wasn't always properly initialized, which could cause a crash Monitor : - file creation wasn't acted 
upon - moves to unmonitored locations are treated as deletions Collect : - fixed building with a version of curl that doesn't depend on OpenSSL - dropped pinot-collect tool Index : - added versioning - label names, file names, directory names and URLs are escaped - directories are indexed as separate documents Search : - filters with spaces or control characters should be double-quoted, eg : file:"Cats & Dogs.txt" Daemon : - extended D-Bus methods Set and GetDocumentInfo - fixed off-by-one error in numbering of mail messages parts - blacklist wasn't applied to files for which the monitor reports events - an index created with Pinot < 0.71 is automatically upgraded Deskbar : - use deskbar.Utils.url_show() if available UI : - Status window is now live - View on a plain text message shows the mail headers - added proxy support for Web engines queries and collection of documents on HTTP - better work-around for desktop files that attempt setting env variables in Exec - use Gtk::ComboboxText where appropriate - German translation by Christian Dywan - Italian translation by Michele Angrisano - Portuguese translation by _PN_boy 2007/03/06 version_0_7_0 Tokenize : - now use Dijon's filters (http://dijon.berlios.de/) Index : - support for remote indexes served by xapian-progsrv+ssh - index to use by pinot-index specified with --db Search : - date range filtering applies to index searches - limit the number of results returned by pinot-search with --max Daemon : - log the daemon's PID - sped up unindexing of documents after a directory is deleted - caught up with changes in D-Bus 1.0 UI : - stored queries can do date range filtering. 
If dates don't make sense (eg From >= To), they are ignored - under the Session menu, Status shows various bits of information about the indexes and the daemon's crawler - prompt for command to use to open documents of a type for which no application is defined - revamped configuration dialog for external indexes - Index > Properties now shows a document's size and number of unique terms - most operations that involve peeking at the index are done in the background - work-around for desktop files that attempt setting env variables in Exec - caught up with changes in D-Bus 1.0 - Brazilian Portuguese translation by Leonardo Melo - Russian translation by Sergey Vostrikov 2006/12/21 version_0_6_5 Build : - added option --enable-debug to configure, --enable-soap replaces --with-soap - complain bitterly if libtextcat header is not found - Pthreads may be provided by a library other than libpthread - install configuration files in sysconfdir, libraries in libdir Tokenize : - optimized Ogg/MP3 filter Index : - fixed extraction of language and size from document data - store date terms for future date range filtering - fixed argument checking in pinot-index Daemon : - new DBus method SetDocumentsLabels to relabel several documents at once - process DBus messages in a separate thread, not in the main thread UI : - fixed expansion of .desktop's Exec - initialize D-Bus ! Not sure why this didn't cause problems before. 
- fixes for when the locale is not UTF-8 - on first run, open the Preferences box and show the Indexing tab - for stored queries set to index and label results, only apply the new label to results that already are in one of the indexes, don't do a full update - refresh labelled documents list correctly after properties are changed 2006/12/05 version_0_6_4 Collect : - don't needlessly load files that are going to be handled by an helper application Index : - store documents size and file extension, if any - fixed concurrency bug that could thrash the index on SMP systems - very long capitalized terms could lead to document loss - file names were always lower-cased Search : - regenerated the Google SOAP API stubs with gsoap 2.7.8c UI : - added filter on file extension - fixed crash on SMP systems when listing an index - fixed deadlock when indexing a query's results. Oddly enough, it seems it happened only on FreeBSD ! - Dutch translation by Tikkel - Swedish translation by Daniel Nylander Daemon : - set a lower scheduling priority - queue events in the database, not in memory - clean exit when signalled/stopped while crawling 2006/11/18 version_0_6_3 Collect : - watch out for NULL characters in data Tokenize : - fixed memory leak. Temporary documents were not deleted most of the time UI : - prettified results list - better abstract highlighting - fixed clipboard copy of results list and abstract Daemon : - autostart the daemon process - with dbus < 0.70, close the connection 2006/11/04 version_0_6_2 General : - query shared-mime-info prefix, so that the applications database can be loaded even when Pinot is installed under a different prefix - copyright notice was missing in source Index : - detect and support libtextcat 3.0 peculiarities - can skip files based on glob pattern Search : - fixed issue where label and directory filters were not applied correctly when the filter doesn't start with an upper-case letter. 
Directory filters starting with a non-alphanumeric character only work with Xapian >= 0.9.8. - fixed A9, Accoona and Exalead plugins UI : - file patterns to skip can be set in Preferences, Indexing - columns showing a timestamp were sorted alphabetically - refresh index lists correctly when exiting Preferences Daemon : - fixed major bug where the daemon would loop endlessly reindexing mp3/ogg files. When notified that a writable file was closed, check the file was actually modified before reindexing it. - fixed D-Bus warning about closing the connection when exiting 2006/10/18 version_0_6_1 General : - switched to gtk+'s version of xdgmime Index : - can now run queries like "type:text/html and lang:en and (tcp near ip)". See README for more information about the syntax and a list of filters. Search : - don't reject MozSearch plugins - added plugin for Google code search service - filters 'site' and 'file' (host name, file name) apply to Web engines - fixed abstract generation UI : - revamped the stored queries editor to allow any number of terms and filters - preferences relative to My Documents and My Web Pages gathered under the same tab - hide the stored queries and engines lists by default to avoid scaring people used to Beagle too much ;-) - mail accounts configuration wasn't always saved correctly - updating a document from the My Web Pages index messed up the abstract Daemon : - index attachments in mbox files - documents deleted since last crawl were not removed from the index if the corresponding location wasn't monitored 2006/09/25 version_0_6_0 Build : - modifications for building on Cygwin, contributed by Reini Urban Monitor : - inotify monitor more flexible SQL : - retry if the database is busy - save the daemon's crawler history Index : - able to open remote indexes that were not initially available - indexes are opened in write mode only when necessary - prefer Xapian's Flint back-end to Quartz - terms starting with an upper-case letter are not ignored
by terms suggestion - refresh terms generated from the previous title, location etc... when updating a document - MIME type and directory hierarchy are stored as terms - store documents last modification date in a way compatible with Xapian Omega Search : - fixed abstract generation with remote indexes, and queries with OR'ed terms ! - brought MozDex plugin back UI : - show results scores - fixed language shown in the properties box when language is unknown for one or more of the documents selected - better at queuing document indexing - My Email index is replaced with the index managed by the daemon - search terms suggestion, More Like This and the index status icon on results rely on both internal indexes - only URLs can be imported now as the daemon takes care of local files Daemon : - crawls and indexes user-defined locations on the filesystem, mbox files included - recursively monitors these locations for changes and updates the index - provides a DBus interface suitable for querying and document inspection - no dependency on gtkmm Deskbar Applet : - plugin for Deskbar Applet that searches documents indexed by the daemon 2006/07/05 version_0_5_0 Tokenize : - switched to pdftotext, which seems to support more files than pdftohtml SQL : - remove history items older than one month at startup Search : - can search for documents with language "Unknown" (ie those for which language detection failed) UI : - multiple user-configurable cache services. They are defined in $PREFIX/share/pinot/globalconfig.xml and listed under the View Cache menu. - allow to change the language of several documents at once - after editing a query, don't change selection in the queries list - More Like This extracts representative terms from the selected documents (if they are indexed) and creates a "More Like ..." 
query - set the indexed status icon when results are indexed - view results on double clicks - show a message if there is no application defined for a file's type - most tree columns can be sorted at last ! :-) - work-around for bizarre bug that causes a segfault when creating a query that indexes and labels results based on a language filter 2006/06/11 version_0_4_9 General : - able to find the default application for a given MIME type Tokenize : - better HTML tokenizer UI : - titles in documents lists are no longer truncated - if import fails, the progress bar displays the error message - dropped the internal viewer, default applications are launched on View - View Cache no longer dependent on Google SOAP API. It just points to the Google Cache for http[s] results - remember whether the queries list is expanded 2006/05/25 version_0_4_8 Monitor : - replaced FAM/Gamin with inotify Search : - fixed BitTorrent source UI : - fixed menu item inconsistencies - always use user-provided language on update ! - all query terms are highlighted in the extract ! - threads that finish while the import dialog is up are processed when the dialog box is closed 2006/05/12 version_0_4_7 General : - all programs have man pages and support --help and --version Search : - the Google API engine is no longer unnecessarily picky about queries parameters - updated Creative Commons plugin, based on the one shipped with Firefox UI : - merged channel Web Services with The Web - the state of engine channels is saved and restored - query terms are highlighted in the extract field - allow editing the language of documents. A subsequent update would use the given language to stem terms. 2006/04/22 version_0_4_6 Search : - resurrected support for the Google API, enabled with "./configure --with-soap=yes". This requires the gsoap development toolkit.
- tweaked extraction of results' extract and URL - renamed pinot_search to pinot-search - (temporarily hopefully) removed MozDex plugin Index : - changed URL hash algorithm for compatibility with omindex. Reindexing documents with URLs > 128 characters is necessary Tokenize : - new tokenizer for MP3, Vorbis and FLAC files that depend on the TagLib library - new XML tokenizer - new tokenizer for OpenDocument/StarOffice files (unzip required) - use xdgmime to query shared-mime-info for better MIME type detection HTML browser : - can build against Mozilla or Firefox, set with "./configure --with-gecko=mozilla|firefox" UI : - discovered and fixed pretty stupid bug that would cause a crash when indexing search results while the My Documents tab isn't opened - internationalized desktop file 2006/03/25 version_0_4_5 Build : - various fixes for building on Fedora Core 5 Index : - listing a label's documents displays the correct documents count in the status bar and navigation buttons are enabled or disabled accordingly Search : - identified and fixed cause of crash when searching several engines simultaneously - Teoma and Ask Jeeves plugins replaced by Ask.com plugin - dropped Altavista and Lycos - added RollYO's Top News, Exalead and Creative Commons sources - fixed results extraction with Topix - better URL extraction for those search engines that use redirectors, for instance http://rds.yahoo.com/_ylt=.../**http%3a//some.website.com/here/index.html - tweaked abstract generation UI : - when no email boxes were being monitored, Pinot could crash at exit time depending on which version of FAM was used - suggestion of query terms can be disabled - new results are now shown with a background colour of the user's choice - since index searches are multi-stepped, live queries' terms are now ANDed together 2006/03/12 version_0_4_4 Build : - fixed building of PDF, RTF and MS Word tokenizer libraries - HTTP library can be selected with "./configure --with-http=curl|neon" Collect : 
- added curl-based downloader, used by default instead of Neon Index : - dropped dependency on OTS, abstracts are now generated at search time Search : - fixed page browsing with Lycos plugin UI : - fixed potential deadlock, eg between saving preferences and listing the index - labels no longer have colours; selecting a label on an index only lists those documents that have the label 2006/02/25 version_0_4_3 Build : - dropped patch for libxml++ v0.26 support - use autotools Search : - include pinot_search to query search engines from the command-line - better parsing of Sherlock plugins with foreign tags - URLs of results returned by Sherlock engines were incorrectly lower-cased - fix for Topix and Accoona Index : - changed term prefixes to conform to conventions used by Omega and other Xapian-based tools. Unfortunately, users will have to update documents and reapply labels ! UI : - UTF-8 fixes on stored queries dates, results extracts and locale catalogs; the Spanish and French catalogs are usable now !
- Better error handling in worker threads and at startup 2006/01/31 version_0_4_2 Search : - support for OpenSearch Description, Query and Response - replaced Koders and Omega Sherlock plugins with their OpenSearch equivalent - added MozDex Index : - queries on an index that cannot be locked no longer loop UI : - when no email boxes were being monitored, Pinot wouldn't exit right away - results extract field can be resized - Spanish translation by Jesús Tramullas (jesus at tramullas dot com) 2006/01/20 version_0_4_0 UI and SQLite : - dropped ActionHistory Search : - fixed issues with documents and queries language - detect encoding of results pages - fixed AskJeeves source Index : - fixed issues with documents language - limit amount of text parsed by summarization and language guessing - limit terms length - canonicalize URLs - make sure index is always unlocked properly Tokenize : - modified tokenizer plugins interface - new RTF tokenizer (requires unrtf) UI : - standard About box - better documents importing - better charset conversion - completion on query field, based on terms in the documents index - all notebook tabs are open on a need-to basis and can be closed - fixed several UI inconsistencies - catch signals and signal threads to ensure clean exits - save language names in English, load in current locale - requires gtkmm v2.6 2005/12/18 version_0_3_5 Xapian + SQLite back-end : - dropped unnecessary tables, moved labels and properties into the index Neon downloader : - fail on HTTP errors - setup OpenSSL callbacks for multi-threading support Search : - can filter documents by label - multi-step index search : exact matches first, then with stemming, then with ignoring the operators HTML browser : - initialize NSS and NSPR to get rid of crash when visiting https sites UI : - on double clicks in the index tab, check a document is selected before opening the properties box - copy & paste should now work as expected - modified index browser Threads : - handle 
select() errors

2005/12/08 version_0_3_0
- first public release

pinot-1.22/README

Pinot
Copyright 2005-2024 Fabrice Colin
Homepage - https://github.com/FabriceColin/pinot
previously hosted at http://code.google.com/p/pinot-search/ and http://pinot.berlios.de/
Translations - https://translations.launchpad.net/pinot/trunk/+pots/pinot

1. What is Pinot
2. Available engines
3. Indexes
4. Indexing and monitoring
5. Searching
6. Viewing cached results
7. File formats
8. File patterns
9. Digging deeper
10. Saving results
11. D-Bus service & daemon
12. CJKV support
13. Environment variables and aliases
14. How to reset indexes
15. Compiling

1. What is Pinot

Pinot combines desktop search and metasearch. It consists of :
* a D-Bus service daemon that crawls, indexes and monitors your documents, and that plugs into the GNOME Shell search system ("pinot-dbus-daemon")
* a GTK3-based user interface that queries the index built by the service as well as Web engines, and which can display and analyze the results ("pinot")
* other command-line tools

It was developed and tested on GNU/Linux and should work on other Unix-like systems.

2. Available engines

One of the main features of Pinot is metasearch. This lets you query a variety of sources, including Web-based search engines. By default, the list of available engines is hidden and the selection defaults to the internal indexes (see section "3. Indexes"). To show the list of engines, click on the Show All Search Engines button, next to the Query field immediately below the menu bar. Click on the same button again to hide the list. Any number of engines or engine groups may be selected at any one time. Multi-selection works as in any other application. Queries are always run against the currently selected engines. Pinot supports both Sherlock and OpenSearch Description plugins.
They are installed in $PREFIX/share/pinot/engines/, where PREFIX is usually /usr. Additional engines can be installed in that directory or in ~/.pinot/engines. Note this directory is not created automatically. Sherlock is what Firefox and the Mozilla Suite use. Chances are that somebody wrote a plugin for the engine you are interested in. Beware that a lot are out of date and will require some changes. Use pinot-search on the command-line to run a quick check on a plugin, eg $ pinot-search sherlock $PREFIX/share/pinot/engines/Bozo.src "clowns" Plugins are categorized by channels. For Sherlock plugins, the routeType element under SEARCH specifies the name of the channel the plugin belongs to. As for OpenSearch, Pinot should work with OpenSearch Description 1.0 and 1.1 (draft 2) plugins. Keep in mind that the spec doesn't describe how to parse the results pages returned by search engines, therefore Pinot assumes that engines return results formatted according to the OpenSearch Response standard. In practice, this means that plugins that don't stick to the following rules will be ignored or won't show any result : * For Description 1.1 plugins, the type attribute on the Url field must be set to "application/atom+xml" or "application/rss+xml" (default). "text/html" will be rejected. * The search engine's results page content type must be some form of XML, otherwise Pinot won't attempt parsing it. Pinot differs from the Description spec in that it interprets the Tags field as a channel name. The standard defines Tags as a "space-delimited set of words that are used as keywords to identify and categorize this search content". The "Xapian Omega" plugin allows to query a locally installed instance of Xapian Omega at http://localhost/. If Omega is installed elsewhere, edit $PREFIX/share/pinot/engines/OmegaDescription.xml. 3. Indexes Pinot has two internal indexes. My Documents is populated by the D-Bus service and contains documents found on your computer. 
My Web Pages is populated by the UI whenever you : * import an external document, using the Index, Import URL menu * index results returned by Web engines, using the Results, Index menu or through a Stored Query Both indexes may contain documents of any of the file types listed in section "7. File formats". Indexes built by other Xapian-based tools can be added to Pinot. To add an external index, click the + button at the bottom of the engines list. It can either be local, in which case you will have to select the directory where it is found, or served from a remote machine by xapian-tcpsrv. See the manual page for xapian-tcpsrv(1). All indexes are grouped together under the channel Current User in the engines list. 4. Indexing and monitoring Pinot can index any directory configured under the Indexing tab of the Preferences box. Monitoring is optional and should be disabled for the directories whose contents seldom change, eg $PREFIX/share/doc. Indexing and monitoring of directories are handled by the D-Bus service. The number of files and directories that can be monitored is capped by the value of /proc/sys/fs/inotify/max_user_watches minus 1024. Symlinks are not followed but are still indexed, with the MIME type "inode/symlink". While Pinot is not currently able to get to and index application-specific data held in dot-directories, it can index common file formats as listed in section "7. File formats". All files and directories with a name that starts with a dot, eg ".thunderbird", are skipped and their content is not indexed. If you wish to include the contents of some dot-directory, create a symlink to a directory that is configured in Preferences. For instance, if "~/Documents" is configured for indexing, create a symlink from "~/.thunderbird" to "~/Documents/TMail". For this to work, the dot-directory must not be in a directory configured for indexing. If you want to exclude any specific files or directories from indexing, use patterns as described in section "8.
File patterns". Pinot supports stopwords removal. While no such list is provided by default, they can be easily found on the Internet. Each language has its own stopword list, for instance a stopwords list for English should be copied to $PREFIX/share/pinot/stopwords/stopwords.en Language detection is done with libexttextcat. Ensure that the paths listed in /etc/pinot/textcat_conf.txt are correct. The pinot-index program allows indexing and peeking at documents' properties from the command-line. Using the -i/--index option with the My Documents or My Web Pages index is not recommended. For more details, see the manual page for pinot-index(1). 5. Searching Searches are run differently based on the type of engine being queried. When querying a Web engine, Pinot assumes this engine understands the query, which is sent as is. No pre-processing is performed on the text of the query, and the results list is more or less presented as retrieved from the Web engine. When querying an index, things are somewhat different. Queries can be expressed in a very natural way, using a combination of operators, filters and ranges. This query syntax is the syntax supported natively by Xapian's QueryParser and is documented at http://www.xapian.org/docs/queryparser.html For instance, the query "type:text/html AND lang:en AND (tcp NEAR ip)" will look for HTML files in English that mention TCP/IP. Note that all operators should be specified in capitals, eg "AND" not "and". The latter will be treated as a regular term. 
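The capitalization rule above is easy to trip over. As an illustration only, here is a minimal sketch of a check that could be run on a query before submitting it to an index. The function findLowercaseOperators is hypothetical and is not part of Pinot or Xapian; the operator names are those documented for Xapian's QueryParser. The sketch does not understand double-quoted phrases, so a word like "and" inside a phrase would also be reported.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of Pinot or Xapian: lists the words of a
// query that spell an operator without being fully capitalized. Such words
// would be searched for as regular terms.
std::vector<std::string> findLowercaseOperators(const std::string &query)
{
	// Operator names documented for Xapian's QueryParser
	static const std::set<std::string> operators =
		{"AND", "OR", "NOT", "XOR", "NEAR", "ADJ"};
	std::vector<std::string> suspects;
	std::istringstream words(query);
	std::string word;

	while (words >> word)
	{
		// Ignore grouping parentheses stuck to the word
		std::string::size_type first = word.find_first_not_of('(');
		std::string::size_type last = word.find_last_not_of(')');
		if ((first == std::string::npos) ||
			(last == std::string::npos) ||
			(last < first))
		{
			continue;
		}
		word = word.substr(first, last - first + 1);

		// Compare the word with its upper-case form
		std::string upper(word);
		for (char &c : upper)
		{
			c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
		}
		if ((word != upper) && (operators.count(upper) > 0))
		{
			suspects.push_back(word);
		}
	}

	return suspects;
}
```

Called on "type:text/html and lang:en AND (tcp near ip)", this sketch reports "and" and "near", while the correctly capitalized query shown earlier yields an empty list.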
Pinot supports these query filters : "site" for host name, eg "site:github.com" "file" for file name, eg "file:index.html" "ext" for file extension, eg "ext:html" "title" for title, eg "title:pinot" "url" for URL, eg "url:https://github.com/" "dir" for directory, eg "dir:/home/fabrice" "inurl" for documents embedded in a URL, eg "inurl:file:///home/fabrice/Documents/backup.tar.gz" "lang" for ISO language code, eg "lang:en" "type" for MIME type, eg "type:text/html" "class" for MIME type classification, eg "class:text" "label" for label, eg "label:Important" The directory filter is recursive, ie it applies to sub-directories. Allowed language codes are "da", "nl", "en", "fi", "fr", "de", "hu", "it", "nn", "pt", "ro", "ru", "es", "sv" and "tr". Stemming is available to stored queries for which a stemming language is defined. If such a query doesn't return any exact match, the query terms are stemmed and the query is run again. Stopwords are also then removed if a stopwords list was found for the stemming language. The values of "file", "url", "dir" and "label" may be double-quoted. It's also worth pointing out that the query "dir:/X/Y" will return files and directories located in /X/Y, but not Y itself, which is what "dir:/X file:Y" would do. In addition, these ranges are supported : "YYYYMMDD..YYYYMMDD" for date ranges, eg "20070801..20070831" "HHMMSS..HHMMSS" for time ranges, eg "090000..180000" "size0..size1b" for size in bytes, eg "0..10240b" See the manual page for pinot-search(1) for examples. 6. Viewing cached results Results returned by search engines can be viewed "live" by selecting the View menuitem under Results. This opens whatever application defined for the result's MIME type and/or protocol scheme. In addition, Pinot allows to view the page as cached by Google and the Wayback Machine. Cache providers are actually configured in globalconfig.xml, located in /etc/pinot/. 
For instance : Google http://www.google.com/search?q=cache:%url0 http, https This is self-explanatory :-) Here it configures a cache provider called "Google" that handles both http and https. The location field supports two parameters that are substituted to obtain the URL to open : * %url is the result's URL as displayed by the UI, eg https://github.com/FabriceColin/pinot * %url0 is the result's URL without the protocol, eg github.com/FabriceColin/pinot 7. File formats The following document types are supported internally : * plain text * HTML * XML * mbox, including attachments and embedded documents * MP3, Ogg Vorbis, FLAC * JPEG * common archive formats (tar, Z, gz, bzip2, deb) * ISO 9660 images The following document types are supported through external programs : * PDF (pdftotext required) * RTF (unrtf required) * ReStructured Text (rst2txt required) * OpenDocument/StarOffice files (unzip required) * MS Word (antiword required) * PowerPoint (catppt required) * Excel (xls2csv required) * DVI (catdvi required) * DjVu (djvutxt required) * RPM (rpm required) For other document types, Pinot will only index metadata such as name, location, etc. If you wish to add support for another document type, and know of a command-line program that can handle that type, add it to external-filters.xml, located in /etc/pinot/. 8. File patterns It is possible to skip indexing of files that match glob(3) patterns. These patterns are configured in the Indexing tab of the Preferences box, and can be used as a blacklist or a whitelist. Patterns apply to files and directories. For instance, blacklisting "*/Desktop*" will skip "~/Desktop" and neither crawl nor monitor this directory's contents. Similarly, a blacklist entry for "*.avi" means that Pinot will not attempt indexing the content of AVI files, and will ignore all monitor events related to these files.
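To get a feel for how such patterns behave, here is a minimal sketch that tests paths against a pattern list with POSIX fnmatch(3). It is not Pinot's code: the helpers matchesAnyPattern and shouldSkip are hypothetical, and so are the assumptions that patterns are matched against the full path with no special flags, and that a whitelist skips whatever matches none of its patterns.

```cpp
#include <fnmatch.h>
#include <string>
#include <vector>

// Hypothetical helper: returns true if the path matches at least one
// of the glob patterns.
bool matchesAnyPattern(const std::string &path,
	const std::vector<std::string> &patterns)
{
	for (const std::string &pattern : patterns)
	{
		// With flags set to 0, '*' also matches '/', so that "*.avi"
		// applies to files at any depth
		if (fnmatch(pattern.c_str(), path.c_str(), 0) == 0)
		{
			return true;
		}
	}

	return false;
}

// Hypothetical helper: decides whether a path should be skipped.
// Assumption: a blacklist skips what matches, a whitelist skips
// what doesn't.
bool shouldSkip(const std::string &path,
	const std::vector<std::string> &patterns, bool isBlacklist)
{
	bool matched = matchesAnyPattern(path, patterns);

	return isBlacklist ? matched : !matched;
}
```

With the blacklist { "*/Desktop*", "*.avi" }, this sketch skips /home/me/Desktop and /home/me/Videos/clip.avi but keeps /home/me/Documents/notes.txt. Used as a whitelist, the same list keeps clip.avi and skips notes.txt.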
If you have never run Pinot before, the list will be pre-configured to skip some picture, video and archive file types such as GIF, MPG and RAR. 9. Digging deeper Pinot offers two ways you can dig deeper in your documents : More Like This suggests terms specific to documents that may help in finding related documents, and Search This For lets you search within results. Both features are enabled if one or more of the results currently selected is indexed, and only operate on those. When activated, More Like This will create a new Stored Query prefixed with "More Like". For instance, if you run a Stored Query with name "Me", the expanded query's name will be "More Like Me". Search This For will search those results for the Stored Query selected in the sub-menu and will present results in a new tab. For instance, running the Stored Query "Me" on a set of results will open a "Me In Results" tab. In addition to these, Pinot may suggest alternative spellings for queries that don't return any result. If it does, a new Stored Query prefixed with "Corrected" will be created. 10. Saving results Lists of results can be saved to disk by selecting the Save As menuitem under Results. Two output formats are available to choose from in the file selector opened by Save As : * CSV, a text format The semi-colon character (';') is used to delimit fields. * OpenSearch response, an XML/RSS format See https://en.wikipedia.org/wiki/OpenSearch for details. 11. D-Bus service & daemon Unless Pinot was built without support for D-Bus, the daemon program "pinot-dbus-daemon" implements the D-Bus service and should be auto-started through the desktop file installed at /etc/xdg/autostart/pinot-dbus-daemon.desktop. D-Bus activation makes sure the service is running whenever one of its methods is invoked by any consumer application. For instance, clicking OK on the Preferences box will call the service's Reload method, which should start the service.
This method also causes the service to reload the configuration file. A few things to keep in mind : * when starting, the service will first crawl all configured locations and (re)index new and modified files. The daemon's scheduling priority is set very low (15, can be adjusted with --priority) so that it hopefully doesn't prevent other activities. Crawling is suspended while the system is on battery. * when finished crawling, the service will monitor some locations for changes (as per preferences) and should consume little resources, unless a huge quantity of files needs its attention. * any change detected by the monitor is queued and acted upon as soon as possible, eg reindex a file that was modified. * operations that involve communicating with the service, such as editing documents metadata, may timeout if the system is under heavy load and/or the daemon is busy. In most cases, the message will have been received by the daemon, but the reply may take longer than expected. The Pinot UI may report that the operation failed, even though it was queued for processing and will be acted upon by the daemon. See section "13. Environment variables and aliases" for some tips on how to query the D-Bus interface. A list of available D-Bus methods can be found in the file pinot-dbus-daemon.xml. Pinot v1.20 implements the GNOME Shell search provider interface to allow searching the contents of files the daemon found at locations it crawled, basically the My Documents index. Go to the GNOME Settings' Search screen to enable Pinot as a provider. For this to work, the file com.github.fabricecolin.Pinot.search-provider.ini should be in the folder $PREFIX/share/gnome-shell/search-providers/ 12. CJKV support Pinot supports indexing and searching CJKV text. At search time, queries that include CJKV characters are processed in a manner compatible with the CJKV indexing scheme. There is no need to format the query in a specific format, ie no need to separate characters with spaces. 
For example, the query : Fabrice 你好 title:身体好吗 will be modified internally to : Fabrice (你 你好 好) title:身 title:身体 title:体 title:体好 title:好 title:好吗 title:吗 It is recommended that filters (eg "title") be used at the end of the query for it to be processed as expected. You can get a list of documents in which CJKV characters were detected by the indexer with the special filter "tokens:CJKV". 13. Environment variables and aliases Pinot tries to provide reasonable defaults for most systems, but there may be situations where you want to tweak these values through environment variables : * PINOT_SPELLING_DB By default, Pinot builds indexes with a spelling database. This spelling database may make up as much as a third of the size of the index. If your system is low on disk space, you can disable this with $ export PINOT_SPELLING_DB=NO Make sure this is set for your login session, ie whenever the daemon is auto-started. You will also have to reset indexes, as described in section "14. How to reset indexes". * PINOT_MINIMUM_DISK_SPACE The daemon will stop crawling and indexing files when the partition on which the index resides runs out of free space. By default, this means less than 50 MB. To change this value to 100 MB for instance, use $ export PINOT_MINIMUM_DISK_SPACE=100 * PINOT_MAXIMUM_INDEX_THREADS This sets the maximum number of concurrent indexing threads used by the daemon. The default value is 1. * PINOT_MAXIMUM_NESTED_SIZE This limits the extraction of documents nested inside others, such as archives or mail messages, based on their size. By default, this is deactivated and set to 0. * PINOT_MAXIMUM_QUERY_RESULTS This overrides the number of results returned by queries run through the UI's Query field as well as the number of results initially set for new stored queries. Another environment variable that you may want to tweak comes from Xapian. XAPIAN_FLUSH_THRESHOLD can be set to the number of documents after which Xapian is to flush changes to the index.
The default value is set to 10000 at the time of writing this. Lowering this value should decrease the amount of memory used to cache changes to the index. Pinot provides a "tagged cd" script that lets you change a shell's current directory to the directory that matches the path elements passed as parameters. For instance, after setting : $ alias pcd='. $PREFIX/share/pinot/pinot-cd.sh' if ~/Documents is configured for indexing in Preferences, the following command would change the current directory to ~/Documents/Web/Stats : $ pcd Documents Stats If other directories match the given paths, pinot-cd.sh will display a list of matches. Future work will focus on disambiguation. If you have dbus-send installed, you may also want to set the following aliases : $ alias pinot-stats='dbus-send --session --print-reply --type=method_call \ --dest=com.github.fabricecolin.Pinot /com/github/fabricecolin/Pinot com.github.fabricecolin.Pinot.GetStatistics' $ alias pinot-stop='dbus-send --session --print-reply --type=method_call \ --dest=com.github.fabricecolin.Pinot /com/github/fabricecolin/Pinot com.github.fabricecolin.Pinot.Stop' The first will start the service daemon by calling its GetStatistics method, while the second alias will send it a request to stop and exit. 14. How to reset indexes You may wish to reset one of the indexes and start from scratch. There are several ways to do this, depending on which index it is. If you want to reset My Web Pages, you can either : * use Pinot to unindex every single document by selecting them all and choosing Unindex in the Index menu * or stop Pinot and delete ~/.pinot/index recursively If you want to reset My Documents, special considerations apply because of the historical data maintained by the daemon. There are two ways to proceed, and both require that the daemon be stopped.
The manual way is to delete the index with $ rm -rf ~/.pinot/daemon and remove historical data with $ sqlite3 ~/.pinot/history-daemon "delete from CrawlHistory; delete from CrawlSources; delete from ActionQueue;" If you want to start from scratch and drop metadata (eg labels) that may exist on some documents, remove the history file altogether with $ rm -f ~/.pinot/history-daemon The automated way is to tell the daemon to reindex everything by launching it with the "--reindex" option, ie $ pinot-dbus-daemon --reindex It may be useful to take a look at the log file located at ~/.pinot/pinot-dbus-daemon.log. 15. Compiling Pinot's configure understands the following optional switches. --enable-debug enable debug [default=no] --enable-dbus enable DBus support [default=yes] --enable-libnotify enable libnotify support [default=no] --enable-mempool enable memory pool [default=no] --enable-libarchive enable the libarchive filter [default=no] --enable-chmlib enable the chmlib filter [default=no] Enable support for libarchive and chmlib if the necessary libraries are available. Enable libnotify support when building on BSD systems. Other switches should most likely stay unchanged. In order to build from the git repository, follow these steps: $ git clone https://github.com/FabriceColin/pinot.git $ cd pinot $ touch ChangeLog $ ./autogen.sh --prefix=/usr --libdir=/usr/lib64 --sysconfdir=/etc \ --enable-debug=yes --enable-libarchive=yes --enable-chmlib=yes See the list below for dependencies. The version numbers indicate the minimum version Pinot has been tested with; older versions may or may not work.
--------------------------------------------------------------- Libraries and tools Version --------------------------------------------------------------- SQLite 3.3.1 http://www.sqlite.org/ xapian-core 1.4.10 http://www.xapian.org/ zlib 1.2.0 http://www.gzip.org/zlib/ curl (1) 7.13.1 http://curl.haxx.se/ - OR - neon (1) 0.24.7 http://www.webdav.org/neon/ gdbus-codegen-glibmm (2) https://github.com/Pelagicore/gdbus-codegen-glibmm gtkmm 3.24 http://www.gtkmm.org/ libxml++ 3.2.0 https://github.com/libxmlplusplus/libxmlplusplus/ libexttextcat 3.2 http://cgit.freedesktop.org/libreoffice/libexttextcat/ gmime (3) 2.6.0 http://spruce.sourceforge.net/gmime boost (4) 1.75 http://www.boost.org/ D-Bus with GLib bindings 0.61 http://www.freedesktop.org/wiki/Software/dbus shared-mime-info 0.17 http://freedesktop.org/Software/shared-mime-info desktop-file-utils 0.10 http://www.freedesktop.org/software/desktop-file-utils TagLib 1.4 http://ktown.kde.org/~wheeler/taglib/ libarchive (5) 3.0.0 https://libarchive.org/ exiv2 0.21 http://www.exiv2.org/ chmlib (6) 0.40 http://www.jedrea.com/chmlib/ openssh-askpass (7) 4.3 http://www.openssh.com/portable.html --------------------------------------------------------------- External filter programs --------------------------------------------------------------- unzip http://www.info-zip.org/pub/infozip/UnZip.html pdftotext http://www.foolabs.com/xpdf/ http://poppler.freedesktop.org/ antiword http://www.winfield.demon.nl/ unrtf http://www.gnu.org/software/unrtf/unrtf.html rst2txt https://github.com/stephenfin/rst2txt djvutxt http://djvu.sourceforge.net/ catdvi http://catdvi.sourceforge.net/ catppt xls2csv http://www.wagner.pp.ru/~vitus/software/catdoc/ --------------------------------------------------------------------- Notes : (1) enabled with "./configure --with-http=neon|curl" (2) only to regenerate DBus code, with "make dbus-code" (3) for gmime 2.4.0 support, edit configure.in (4) for building only with boost > 1.48 and < 1.54, turning 
off memory pooling with "./configure --enable-mempool=no" may be preferable (5) optional - enabled with "./configure --enable-libarchive=yes" (6) optional - enabled with "./configure --enable-chmlib=yes" (7) experimental - required only if _SSH_TUNNEL is set --------------------------------------------------------------------- pinot-1.22/SQL/000077500000000000000000000000001470740426600132405ustar00rootroot00000000000000pinot-1.22/SQL/ActionQueue.cpp000066400000000000000000000155341470740426600161760ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include #include #include #include "Url.h" #include "StringManip.h" #include "TimeConverter.h" #include "ActionQueue.h" using std::string; using std::set; using std::vector; using std::stringstream; using std::clog; using std::endl; ActionQueue::ActionQueue(const string &database, const string queueId) : SQLiteBase(database, false, false), m_queueId(queueId) { prepareStatement("select-url", "SELECT Url FROM ActionQueue WHERE QueueId=? AND Url=?;"); prepareStatement("push-item-insert", "INSERT INTO ActionQueue VALUES(?, ?, ?, ?, ?);"); prepareStatement("push-item-update", "UPDATE ActionQueue SET Type=?, Date=?, Info=? WHERE QueueId=? 
AND Url=?;"); prepareStatement("pop-item", "DELETE FROM ActionQueue WHERE QueueId=? AND Url=?;"); prepareStatement("select-oldest-url", "SELECT Type, Info FROM ActionQueue " "WHERE QueueId=? ORDER BY Date DESC LIMIT 1;"); prepareStatement("expire-items", "DELETE FROM ActionQueue WHERE QueueId=? AND Date values; string url(docInfo.getLocation()); string info(docInfo.serialize()); stringstream numStr; bool update = false, success = false; // Is there already an item for this URL ? values.push_back(m_queueId); values.push_back(Url::escapeUrl(url)); SQLResults *results = executePreparedStatement("select-url", values); if (results != NULL) { SQLRow *row = results->nextRow(); if (row != NULL) { #ifdef DEBUG clog << "ActionQueue::pushItem: item " << Url::unescapeUrl(row->getColumn(0)) << " exists" << endl; #endif update = true; delete row; } delete results; } numStr << time(NULL); if (update == false) { values.push_back(typeToText(type)); values.push_back(numStr.str()); values.push_back(info); results = executePreparedStatement("push-item-insert", values); } else { values.clear(); values.push_back(typeToText(type)); values.push_back(numStr.str()); values.push_back(info); values.push_back(m_queueId); values.push_back(Url::escapeUrl(url)); results = executePreparedStatement("push-item-update", values); } if (results != NULL) { #ifdef DEBUG clog << "ActionQueue::pushItem: queue " << m_queueId << ": " << type << " on " << url << ", " << update << endl; #endif success = true; delete results; } return success; } /// Pops and deletes the oldest item. 
bool ActionQueue::popItem(ActionType &type, DocumentInfo &docInfo) { vector values; string url; bool success = false; if (getOldestItem(type, docInfo) == false) { return false; } url = docInfo.getLocation(); #ifdef DEBUG clog << "ActionQueue::popItem: queue " << m_queueId << ": " << type << " on " << url << endl; #endif values.push_back(m_queueId); values.push_back(Url::escapeUrl(url)); // Delete from ActionQueue SQLResults *results = executePreparedStatement("pop-item", values); if (results != NULL) { success = true; delete results; } return success; } bool ActionQueue::getOldestItem(ActionType &type, DocumentInfo &docInfo) { vector values; bool success = false; values.push_back(m_queueId); SQLResults *results = executePreparedStatement("select-oldest-url", values); if (results != NULL) { SQLRow *row = results->nextRow(); if (row != NULL) { type = textToType(row->getColumn(0)); success = true; // Deserialize DocumentInfo docInfo.deserialize(row->getColumn(1)); delete row; } delete results; } return success; } /// Returns the number of items of a particular type. unsigned int ActionQueue::getItemsCount(ActionType type) { unsigned int count = 0; SQLResults *results = executeStatement("SELECT COUNT(*) FROM ActionQueue " "WHERE Type='%q';", typeToText(type).c_str()); if (results != NULL) { SQLRow *row = results->nextRow(); if (row != NULL) { count = atoi(row->getColumn(0).c_str()); delete row; } delete results; } return count; } /// Deletes all items under a given URL. bool ActionQueue::deleteItems(const string &url) { bool success = false; if (beginTransaction() == false) { return false; } SQLResults *results = executeStatement("DELETE FROM ActionQueue " "WHERE Url LIKE '%q%%';", Url::escapeUrl(url).c_str()); if (results != NULL) { success = true; delete results; } if (endTransaction() == false) { return false; } return success; } /// Expires items older than the given date. 
bool ActionQueue::expireItems(time_t expiryDate) { vector values; stringstream numStr; bool success = false; if (beginTransaction() == false) { return false; } values.push_back(m_queueId); numStr << expiryDate; values.push_back(numStr.str()); SQLResults *results = executePreparedStatement("expire-items", values); if (results != NULL) { success = true; delete results; } if (endTransaction() == false) { return false; } return success; } pinot-1.22/SQL/ActionQueue.h000066400000000000000000000040271470740426600156360ustar00rootroot00000000000000/* * Copyright 2005-2010 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _ACTION_QUEUE_H #define _ACTION_QUEUE_H #include #include #include "DocumentInfo.h" #include "SQLiteBase.h" /// Handles the ActionQueue table. class ActionQueue : public SQLiteBase { public: ActionQueue(const std::string &database, const std::string queueId); virtual ~ActionQueue(); /// Creates the ActionQueue table in the database. static bool create(const std::string &database); typedef enum { INDEX = 0, UNINDEX } ActionType; /// Pushes an item. bool pushItem(ActionType type, const DocumentInfo &docInfo); /// Pops and deletes the oldest item. bool popItem(ActionType &type, DocumentInfo &docInfo); /// Returns the number of items of a particular type. 
unsigned int getItemsCount(ActionType type); /// Deletes all items under a given URL. bool deleteItems(const std::string &url); /// Expires items older than the given date. bool expireItems(time_t expiryDate); protected: std::string m_queueId; bool getOldestItem(ActionType &type, DocumentInfo &docInfo); static std::string typeToText(ActionType type); static ActionType textToType(const std::string &text); private: ActionQueue(const ActionQueue &other); ActionQueue &operator=(const ActionQueue &other); }; #endif // _ACTION_QUEUE_H pinot-1.22/SQL/CrawlHistory.cpp000066400000000000000000000334461470740426600164100ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include #include #include #include #include #include #include "Url.h" #include "CrawlHistory.h" using std::clog; using std::endl; using std::string; using std::set; using std::map; using std::vector; using std::stringstream; CrawlHistory::CrawlHistory(const string &database) : SQLiteBase(database, false, false) { prepareStatement("has-source", "SELECT SourceID FROM CrawlSources WHERE Url=?;"); prepareStatement("get-sources", "SELECT SourceID, Url FROM CrawlSources;"); prepareStatement("insert-item", "INSERT INTO CrawlHistory VALUES(?, ?, ?, ?, ?);"); prepareStatement("has-item", "SELECT Status, Date FROM CrawlHistory WHERE Url=?;"); prepareStatement("update-item", "UPDATE CrawlHistory SET Status=?, Date=?, ErrorNum=? WHERE Url=?;"); prepareStatement("update-items-status1", "UPDATE CrawlHistory SET Status=? WHERE Status=?;"); prepareStatement("update-items-status2", "UPDATE CrawlHistory SET Status=? WHERE SourceId=? AND Status=?;"); prepareStatement("get-items", "SELECT Url FROM CrawlHistory WHERE Status=?;"); prepareStatement("get-source-items1", "SELECT Url FROM CrawlHistory WHERE SourceId=? AND Status=? AND Date>? LIMIT ? OFFSET ?;"); prepareStatement("get-source-items2", "SELECT Url FROM CrawlHistory WHERE SourceId=? AND Status=? LIMIT ? OFFSET ?;"); prepareStatement("get-items-count", "SELECT COUNT(*) FROM CrawlHistory WHERE Status=?;"); prepareStatement("delete-item", "DELETE FROM CrawlHistory WHERE Url=?;"); prepareStatement("delete-items1", "DELETE FROM CrawlHistory WHERE SourceID=?;"); prepareStatement("delete-items2", "DELETE FROM CrawlHistory WHERE SourceID=? 
AND Status=?;"); prepareStatement("expire-items", "DELETE FROM CrawlHistory WHERE DatenextRow(); if (row != NULL) { sourceId = atoi(row->getColumn(0).c_str()); delete row; } ++sourceId; delete results; } results = executeStatement("INSERT INTO CrawlSources " "VALUES('%u', '%q');", sourceId, Url::escapeUrl(url).c_str()); if (results != NULL) { delete results; } return sourceId; } /// Checks if a source exists. bool CrawlHistory::hasSource(const string &url, unsigned int &sourceId) { vector values; bool success = false; values.push_back(Url::escapeUrl(url)); SQLResults *results = executePreparedStatement("has-source", values); if (results != NULL) { SQLRow *row = results->nextRow(); if (row != NULL) { sourceId = atoi(row->getColumn(0).c_str()); success = true; delete row; } delete results; } return success; } /// Returns sources. unsigned int CrawlHistory::getSources(map &sources) { vector values; unsigned int count = 0; SQLResults *results = executePreparedStatement("get-sources", values); if (results != NULL) { while (results->hasMoreRows() == true) { SQLRow *row = results->nextRow(); if (row == NULL) { break; } sources[(unsigned int)atoi(row->getColumn(0).c_str())] = Url::unescapeUrl(row->getColumn(1)); ++count; delete row; } delete results; } return count; } /// Deletes a source. bool CrawlHistory::deleteSource(unsigned int sourceId) { bool success = false; SQLResults *results = executeStatement("DELETE FROM CrawlSources " "WHERE SourceID='%u';", sourceId); if (results != NULL) { success = true; delete results; } return success; } /// Inserts an URL. 
bool CrawlHistory::insertItem(const string &url, CrawlStatus status, unsigned int sourceId,
	time_t date, int errNum)
{
	vector<string> values;
	stringstream numStr;
	bool success = false;

	if (date == 0)
	{
		date = time(NULL);
	}

	// Values are bound in column order: Url, Status, SourceID, Date, ErrorNum
	values.push_back(Url::escapeUrl(url));
	values.push_back(statusToText(status));
	numStr << sourceId;
	values.push_back(numStr.str());
	numStr = stringstream();
	numStr << date;
	values.push_back(numStr.str());
	numStr = stringstream();
	numStr << errNum;
	values.push_back(numStr.str());

	SQLResults *results = executePreparedStatement("insert-item", values);
	if (results != NULL)
	{
		success = true;
		delete results;
	}

	return success;
}

/// Checks if a URL is in the history.
bool CrawlHistory::hasItem(const string &url, CrawlStatus &status, time_t &date)
{
	vector<string> values;
	bool success = false;

	values.push_back(Url::escapeUrl(url));

	SQLResults *results = executePreparedStatement("has-item", values);
	if (results != NULL)
	{
		SQLRow *row = results->nextRow();
		if (row != NULL)
		{
			status = textToStatus(row->getColumn(0));
			date = (time_t)atoi(row->getColumn(1).c_str());
			success = true;

			delete row;
		}

		delete results;
	}

	return success;
}

/// Updates a URL.
bool CrawlHistory::updateItem(const string &url, CrawlStatus status, time_t date, int errNum)
{
	vector<string> values;
	stringstream numStr;
	bool success = false;

	if (date == 0)
	{
		date = time(NULL);
	}

	// Values are bound in statement order: Status, Date, ErrorNum, Url
	values.push_back(statusToText(status));
	numStr << date;
	values.push_back(numStr.str());
	numStr = stringstream();
	numStr << errNum;
	values.push_back(numStr.str());
	values.push_back(Url::escapeUrl(url));

	SQLResults *results = executePreparedStatement("update-item", values);
	if (results != NULL)
	{
		success = true;
		delete results;
	}

	return success;
}

/// Updates URLs.
bool CrawlHistory::updateItems(const map<string, CrawlItem> &items)
{
	bool success = false;

	if (beginTransaction() == false)
	{
		return false;
	}

	for (map<string, CrawlItem>::const_iterator updateIter = items.begin();
		updateIter != items.end(); ++updateIter)
	{
		if (updateItem(updateIter->first, updateIter->second.m_itemStatus,
			updateIter->second.m_itemDate, updateIter->second.m_errNum) == true)
		{
			success = true;
		}
	}

	if (endTransaction() == false)
	{
		return false;
	}

	return success;
}

/// Updates the status of items en masse.
bool CrawlHistory::updateItemsStatus(CrawlStatus oldStatus, CrawlStatus newStatus,
	unsigned int sourceId, bool allSources)
{
	vector<string> values;
	SQLResults *results = NULL;
	bool success = false;

	values.push_back(statusToText(newStatus));

	if (beginTransaction() == false)
	{
		return false;
	}

	if (allSources == false)
	{
		stringstream numStr;

		numStr << sourceId;
		values.push_back(numStr.str());
		values.push_back(statusToText(oldStatus));

		results = executePreparedStatement("update-items-status2", values);
	}
	else
	{
		values.push_back(statusToText(oldStatus));

		// Ignore the source
		results = executePreparedStatement("update-items-status1", values);
	}

	if (results != NULL)
	{
		success = true;
		delete results;
	}

	if (endTransaction() == false)
	{
		return false;
	}

	return success;
}

/// Gets the error number and date for a URL.
int CrawlHistory::getErrorDetails(const string &url, time_t &date)
{
	int errNum = 0;

	SQLResults *results = executeStatement("SELECT ErrorNum, Date "
		"FROM CrawlHistory WHERE Url='%q';",
		Url::escapeUrl(url).c_str());
	if (results != NULL)
	{
		SQLRow *row = results->nextRow();
		if (row != NULL)
		{
			errNum = atoi(row->getColumn(0).c_str());
			date = (time_t)atoi(row->getColumn(1).c_str());

			delete row;
		}

		delete results;
	}

	return errNum;
}

/// Returns items.
unsigned int CrawlHistory::getItems(CrawlStatus status, set<string> &urls)
{
	vector<string> values;
	unsigned int count = 0;

	values.push_back(statusToText(status));

	SQLResults *results = executePreparedStatement("get-items", values);
	if (results != NULL)
	{
		while (results->hasMoreRows() == true)
		{
			SQLRow *row = results->nextRow();
			if (row == NULL)
			{
				break;
			}

			urls.insert(Url::unescapeUrl(row->getColumn(0)));
			++count;

			delete row;
		}

		delete results;
	}

	return count;
}

/// Returns items that belong to a source.
unsigned int CrawlHistory::getSourceItems(unsigned int sourceId, CrawlStatus status,
	set<string> &urls, unsigned int min, unsigned int max,
	time_t minDate)
{
	vector<string> values;
	stringstream numStr;
	SQLResults *results = NULL;
	unsigned int count = 0;

	numStr << sourceId;
	values.push_back(numStr.str());
	values.push_back(statusToText(status));

	if (minDate > 0)
	{
		numStr = stringstream();
		numStr << minDate;
		values.push_back(numStr.str());
		// LIMIT is the window size, OFFSET is the first row
		numStr = stringstream();
		numStr << max - min;
		values.push_back(numStr.str());
		numStr = stringstream();
		numStr << min;
		values.push_back(numStr.str());

		results = executePreparedStatement("get-source-items1", values);
	}
	else
	{
		numStr = stringstream();
		numStr << max - min;
		values.push_back(numStr.str());
		numStr = stringstream();
		numStr << min;
		values.push_back(numStr.str());

		// Ignore the date
		results = executePreparedStatement("get-source-items2", values);
	}

	if (results != NULL)
	{
		while (results->hasMoreRows() == true)
		{
			SQLRow *row = results->nextRow();
			if (row == NULL)
			{
				break;
			}

			urls.insert(Url::unescapeUrl(row->getColumn(0)));
			++count;

			delete row;
		}

		delete results;
	}

	return count;
}

/// Returns the number of URLs.
unsigned int CrawlHistory::getItemsCount(CrawlStatus status)
{
	vector<string> values;
	unsigned int count = 0;

	values.push_back(statusToText(status));

	SQLResults *results = executePreparedStatement("get-items-count", values);
	if (results != NULL)
	{
		SQLRow *row = results->nextRow();
		if (row != NULL)
		{
			count = atoi(row->getColumn(0).c_str());

			delete row;
		}

		delete results;
	}

	return count;
}

/// Deletes a URL.
bool CrawlHistory::deleteItem(const string &url)
{
	vector<string> values;
	bool success = false;

	values.push_back(Url::escapeUrl(url));

	SQLResults *results = executePreparedStatement("delete-item", values);
	if (results != NULL)
	{
		success = true;
		delete results;
	}

	return success;
}

/// Deletes all items under a given URL.
bool CrawlHistory::deleteItems(const string &url)
{
	bool success = false;

	if (beginTransaction() == false)
	{
		return false;
	}

	SQLResults *results = executeStatement("DELETE FROM CrawlHistory "
		"WHERE Url LIKE '%q%%';",
		Url::escapeUrl(url).c_str());
	if (results != NULL)
	{
		success = true;
		delete results;
	}

	if (endTransaction() == false)
	{
		return false;
	}

	return success;
}

/// Deletes URLs belonging to a source.
bool CrawlHistory::deleteItems(unsigned int sourceId, CrawlStatus status)
{
	vector<string> values;
	stringstream numStr;
	SQLResults *results = NULL;
	bool success = false;

	numStr << sourceId;
	values.push_back(numStr.str());

	if (beginTransaction() == false)
	{
		return false;
	}

	if (status == UNKNOWN)
	{
		// No status filter: delete everything that belongs to the source
		results = executePreparedStatement("delete-items1", values);
	}
	else
	{
		values.push_back(statusToText(status));

		results = executePreparedStatement("delete-items2", values);
	}

	if (results != NULL)
	{
		success = true;
		delete results;
	}

	if (endTransaction() == false)
	{
		return false;
	}

	return success;
}

/// Expires items older than the given date.
bool CrawlHistory::expireItems(time_t expiryDate) { vector values; stringstream numStr; bool success = false; numStr << expiryDate; values.push_back(numStr.str()); if (beginTransaction() == false) { return false; } SQLResults *results = executePreparedStatement("expire-items", values); if (results != NULL) { success = true; delete results; } if (endTransaction() == false) { return false; } return success; } pinot-1.22/SQL/CrawlHistory.h000066400000000000000000000076011470740426600160470ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _CRAWL_HISTORY_H #define _CRAWL_HISTORY_H #include #include #include #include #include "SQLiteBase.h" class ClassItem; /// Manages crawl history. class CrawlHistory : public SQLiteBase { public: CrawlHistory(const std::string &database); virtual ~CrawlHistory(); typedef enum { UNKNOWN, TO_CRAWL, CRAWLING, CRAWLED, CRAWL_ERROR, CRAWL_LINK } CrawlStatus; /// Creates the CrawlHistory table in the database. static bool create(const std::string &database); /// Inserts a source. unsigned int insertSource(const std::string &url); /// Checks if a source exists. bool hasSource(const std::string &url, unsigned int &sourceId); /// Returns sources. unsigned int getSources(std::map &sources); /// Deletes a source. 
		bool deleteSource(unsigned int sourceId);

		/// Inserts a URL.
		bool insertItem(const std::string &url, CrawlStatus status, unsigned int sourceId,
			time_t date, int errNum = 0);

		/// Checks if a URL is in the history.
		bool hasItem(const std::string &url, CrawlStatus &status, time_t &date);

		/// Updates a URL.
		bool updateItem(const std::string &url, CrawlStatus status, time_t date, int errNum = 0);

		/// Updates URLs.
		bool updateItems(const std::map<std::string, CrawlItem> &items);

		/// Updates the status of items en masse.
		bool updateItemsStatus(CrawlStatus oldStatus, CrawlStatus newStatus,
			unsigned int sourceId, bool allSources = false);

		/// Gets the error number and date for a URL.
		int getErrorDetails(const std::string &url, time_t &date);

		/// Returns items.
		unsigned int getItems(CrawlStatus status, std::set<std::string> &urls);

		/// Returns items that belong to a source.
		unsigned int getSourceItems(unsigned int sourceId, CrawlStatus status,
			std::set<std::string> &urls, unsigned int min, unsigned int max,
			time_t minDate = 0);

		/// Returns the number of URLs.
		unsigned int getItemsCount(CrawlStatus status);

		/// Deletes a URL.
		bool deleteItem(const std::string &url);

		/// Deletes all items under a given URL.
		bool deleteItems(const std::string &url);

		/// Deletes URLs belonging to a source.
		bool deleteItems(unsigned int sourceId, CrawlStatus status = UNKNOWN);

		/// Expires items older than the given date.
		bool expireItems(time_t expiryDate);

	protected:
		static std::string statusToText(CrawlStatus status);
		static CrawlStatus textToStatus(const std::string &text);

	private:
		CrawlHistory(const CrawlHistory &other);
		CrawlHistory &operator=(const CrawlHistory &other);

};

/// An item in CrawlHistory.
class CrawlItem { public: CrawlItem() : m_itemStatus(CrawlHistory::UNKNOWN), m_itemDate(0), m_errNum(0) { } CrawlItem(CrawlHistory::CrawlStatus itemStatus, time_t itemDate, int errNum) : m_itemStatus(itemStatus), m_itemDate(itemDate), m_errNum(errNum) { } CrawlItem(const CrawlItem &other) : m_itemStatus(other.m_itemStatus), m_itemDate(other.m_itemDate), m_errNum(other.m_errNum) { } ~CrawlItem() { } CrawlHistory::CrawlStatus m_itemStatus; time_t m_itemDate; int m_errNum; }; #endif // _CRAWL_HISTORY_H pinot-1.22/SQL/Makefile.am000066400000000000000000000011431470740426600152730ustar00rootroot00000000000000# Process this file with automake to produce Makefile.in pkginclude_HEADERS = \ ActionQueue.h \ CrawlHistory.h \ MetaDataBackup.h \ QueryHistory.h \ SQLDB.h \ SQLiteBase.h \ ViewHistory.h pkglib_LTLIBRARIES = libSQL.la libSQLite.la libSQLDB.la libSQL_la_LDFLAGS = \ -static libSQL_la_SOURCES = \ SQLDB.cpp libSQLite_la_LDFLAGS = \ -static libSQLite_la_SOURCES = \ SQLiteBase.cpp libSQLDB_la_LDFLAGS = \ -static libSQLDB_la_SOURCES = \ ActionQueue.cpp \ CrawlHistory.cpp \ MetaDataBackup.cpp \ QueryHistory.cpp \ ViewHistory.cpp AM_CXXFLAGS = \ @MISC_CFLAGS@ \ -I$(top_srcdir)/Utils pinot-1.22/SQL/MetaDataBackup.cpp000066400000000000000000000266601470740426600165640ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include "config.h" #include #include #include #include #include #include #include #ifdef HAVE_SYS_XATTR_H #include #endif #include #include #include #include "Url.h" #include "StringManip.h" #include "TimeConverter.h" #include "MetaDataBackup.h" using std::clog; using std::endl; using std::string; using std::set; MetaDataBackup::MetaDataBackup(const string &database) : SQLiteBase(database) { } MetaDataBackup::~MetaDataBackup() { } bool MetaDataBackup::setAttribute(const DocumentInfo &docInfo, const string &name, const string &value, bool noXAttr) { string url(docInfo.getLocation()); string urlWithIPath(docInfo.getLocation(true)); #ifdef HAVE_SYS_XATTR_H Url urlObj(url); // If the file is local and isn't a nested document, use an extended attribute if ((noXAttr == false) && (urlObj.isLocal() == true) && (docInfo.getInternalPath().empty() == true)) { string fileName(url.substr(urlObj.getProtocol().length() + 3)); string attrName("pinot." + name); // Set an attribute, and add an entry in the table if (setxattr(fileName.c_str(), attrName.c_str(), value.c_str(), (size_t)value.length(), 0) != 0) { #ifdef DEBUG clog << "MetaDataBackup::setAttribute: setxattr failed with error " << errno << endl; #endif } } #endif bool update = false, success = false; // Is there already such an item for this URL ? 
SQLResults *results = executeStatement("SELECT Url FROM MetaDataBackup " "WHERE Url='%q' AND Name='%q';", Url::escapeUrl(urlWithIPath).c_str(), name.c_str()); if (results != NULL) { SQLRow *row = results->nextRow(); if (row != NULL) { // Yes, there is update = true; delete row; } delete results; } if (update == false) { results = executeStatement("INSERT INTO MetaDataBackup " "VALUES('%q', '%q', '%q');", Url::escapeUrl(urlWithIPath).c_str(), name.c_str(), value.c_str()); } else { results = executeStatement("UPDATE MetaDataBackup " "SET Value='%q' WHERE Url='%q' AND Name='%q';", value.c_str(), Url::escapeUrl(urlWithIPath).c_str(), name.c_str()); } if (results != NULL) { success = true; delete results; } return success; } bool MetaDataBackup::getAttribute(const DocumentInfo &docInfo, const string &name, string &value, bool noXAttr) { string url(docInfo.getLocation()); string urlWithIPath(docInfo.getLocation(true)); bool success = false; #ifdef HAVE_SYS_XATTR_H Url urlObj(url); // If the file is local and isn't a nested document, use an extended attribute if ((noXAttr == false) && (urlObj.isLocal() == true) && (docInfo.getInternalPath().empty() == true)) { string fileName(url.substr(urlObj.getProtocol().length() + 3)); string attrName("pinot." 
+ name); ssize_t attrSize = getxattr(fileName.c_str(), attrName.c_str(), NULL, 0); if (attrSize > 0) { char *pAttr = new char[attrSize]; if (getxattr(fileName.c_str(), attrName.c_str(), pAttr, attrSize) > 0) { value = string(pAttr, attrSize); success = true; } else if (errno != ENOTSUP) { // Extended attributes are supported, but this one doesn't exist delete[] pAttr; return false; } delete[] pAttr; } } #endif SQLResults *results = executeStatement("SELECT Value FROM MetaDataBackup " "WHERE Url='%q' AND Name='%q';", Url::escapeUrl(urlWithIPath).c_str(), name.c_str()); if (results != NULL) { SQLRow *row = results->nextRow(); if (row != NULL) { value = row->getColumn(0); success = true; delete row; } delete results; } return success; } bool MetaDataBackup::getAttributes(const DocumentInfo &docInfo, const string &name, set &values) { string url(docInfo.getLocation()); string urlWithIPath(docInfo.getLocation(true)); bool success = false; #if 0 Url urlObj(url); // If the file is local and isn't a nested document, use an extended attribute if ((urlObj.isLocal() == true) && (docInfo.getInternalPath().empty() == true)) { string likeName("pinot." 
+ name); ssize_t listSize = flistxattr(fd, NULL, 0); if (listSize > 0) { char *pList = new char[listSize]; if ((pList != NULL) && (flistxattr(fd, pList, listSize) > 0)) { string attrList(pList, listSize); string::size_type startPos = 0, endPos = attrList.find('\0'); while (endPos != string::npos) { string attrName(attrList.substr(startPos, endPos - startPos)); if ((attrName.length() > likeName.length()) && (attrName.substr(0, likeName.length()) == likeName)) { string value; if (getAttribute(url, attrName.substr(6), value, true) == true) { values.insert(value); } } // Next startPos = endPos + 1; if (startPos < listSize) { endPos = attrList.find('\0', startPos); } else { endPos = string::npos; } } } delete[] pList; } } #endif SQLResults *results = executeStatement("SELECT Value FROM MetaDataBackup " "WHERE Url='%q' AND Name LIKE '%q%%';", Url::escapeUrl(urlWithIPath).c_str(), name.c_str()); if (results != NULL) { while (results->hasMoreRows() == true) { SQLRow *row = results->nextRow(); if (row == NULL) { continue; } values.insert(row->getColumn(0)); success = true; delete row; } delete results; } return success; } bool MetaDataBackup::removeAttribute(const DocumentInfo &docInfo, const string &name, bool noXAttr, bool likeName) { string url(docInfo.getLocation()); string urlWithIPath(docInfo.getLocation(true)); bool success = false; #ifdef HAVE_SYS_XATTR_H Url urlObj(url); // If the file is local and isn't a nested document, use an extended attribute if ((noXAttr == false) && (url.empty() == false) && (urlObj.isLocal() == true) && (docInfo.getInternalPath().empty() == true)) { string fileName(url.substr(urlObj.getProtocol().length() + 3)); string attrName("pinot." 
+ name);

		// removexattr() returns 0 on success and -1 on error
		if (removexattr(fileName.c_str(), attrName.c_str()) == 0)
		{
			return true;
		}
		else if (errno != ENOTSUP)
		{
			// Extended attributes are supported, but this one doesn't exist
			return false;
		}
	}
#endif

	// Delete from MetaDataBackup
	SQLResults *results = NULL;
	if (urlWithIPath.empty() == false)
	{
		if (likeName == false)
		{
			results = executeStatement("DELETE FROM MetaDataBackup "
				"WHERE Url='%q' AND NAME='%q';",
				Url::escapeUrl(urlWithIPath).c_str(), name.c_str());
		}
		else
		{
			results = executeStatement("DELETE FROM MetaDataBackup "
				"WHERE Url='%q' AND NAME LIKE '%q%%';",
				Url::escapeUrl(urlWithIPath).c_str(), name.c_str());
		}
	}
	else
	{
		// No URL: delete this attribute for all documents
		results = executeStatement("DELETE FROM MetaDataBackup "
			"WHERE NAME='%q';",
			name.c_str());
	}
	if (results != NULL)
	{
		success = true;
		delete results;
	}

	return success;
}

/// Creates the MetaDataBackup table in the database.
bool MetaDataBackup::create(const string &database)
{
	bool success = true;

	// The specified path must be a file
	if (SQLiteBase::check(database) == false)
	{
		return false;
	}

	SQLiteBase db(database);

	// Does MetaDataBackup exist ?
	if (db.executeSimpleStatement("SELECT * FROM MetaDataBackup LIMIT 1;") == false)
	{
#ifdef DEBUG
		clog << "MetaDataBackup::create: MetaDataBackup doesn't exist" << endl;
#endif
		// Create the table
		if (db.executeSimpleStatement("CREATE TABLE MetaDataBackup (Url VARCHAR(255), "
			"Name VARCHAR(255), Value TEXT, PRIMARY KEY(Url, Value));") == false)
		{
			success = false;
		}
	}

	return success;
}

/// Adds an item.
bool MetaDataBackup::addItem(const DocumentInfo &docInfo, DocumentInfo::SerialExtent extent)
{
	bool success = false;

	if ((extent == DocumentInfo::SERIAL_FIELDS) ||
		(extent == DocumentInfo::SERIAL_ALL))
	{
		if (setAttribute(docInfo, "fields",
			docInfo.serialize(DocumentInfo::SERIAL_FIELDS)) == false)
		{
			return false;
		}

		success = true;
	}
	if ((extent == DocumentInfo::SERIAL_LABELS) ||
		(extent == DocumentInfo::SERIAL_ALL))
	{
		success = true;

		const set<string> &labels = docInfo.getLabels();
		for (set<string>::const_iterator labelIter = labels.begin();
			labelIter != labels.end(); ++labelIter)
		{
			// Skip internal labels
			if (labelIter->substr(0, 2) == "X-")
			{
				continue;
			}

			// Labels are only stored in the table, never as extended attributes
			if (setAttribute(docInfo, string("label.") + *labelIter, *labelIter, true) == false)
			{
				success = false;
			}
		}
	}

	return success;
}

/// Gets an item.
bool MetaDataBackup::getItem(DocumentInfo &docInfo, DocumentInfo::SerialExtent extent)
{
	string value;
	bool success = false;

	if ((extent == DocumentInfo::SERIAL_FIELDS) ||
		(extent == DocumentInfo::SERIAL_ALL))
	{
		if (getAttribute(docInfo, "fields", value) == true)
		{
			docInfo.deserialize(value, DocumentInfo::SERIAL_FIELDS);

			success = true;
		}
	}
	if ((extent == DocumentInfo::SERIAL_LABELS) ||
		(extent == DocumentInfo::SERIAL_ALL))
	{
		set<string> labels;

		if (getAttributes(docInfo, "label.", labels) == false)
		{
			success = false;
		}
		else
		{
			docInfo.setLabels(labels);

			success = true;
		}
	}

	return success;
}

/// Gets items.
bool MetaDataBackup::getItems(const string &likeUrl, set<string> &urls,
	unsigned long min, unsigned long max)
{
	SQLResults *results = NULL;
	bool success = false;

	// Even when attributes are used, an entry is always added to the table
	// min and max are unsigned long, hence the %lu conversions
	if (likeUrl.empty() == true)
	{
		results = executeStatement("SELECT Url FROM MetaDataBackup "
			"LIMIT %lu OFFSET %lu;",
			max - min, min);
	}
	else
	{
		results = executeStatement("SELECT Url FROM MetaDataBackup "
			"WHERE Url LIKE '%q%%' LIMIT %lu OFFSET %lu;",
			likeUrl.c_str(), max - min, min);
	}

	if (results != NULL)
	{
		while (results->hasMoreRows() == true)
		{
			SQLRow *row = results->nextRow();
			if (row == NULL)
			{
				continue;
			}

			urls.insert(Url::unescapeUrl(row->getColumn(0)));
			success = true;

			delete row;
		}

		delete results;
	}

	return success;
}

/// Deletes an item.
bool MetaDataBackup::deleteItem(const DocumentInfo &docInfo, DocumentInfo::SerialExtent extent,
	const string &value)
{
	bool success = false;

	if ((extent == DocumentInfo::SERIAL_FIELDS) ||
		(extent == DocumentInfo::SERIAL_ALL))
	{
		if (removeAttribute(docInfo, "fields") == false)
		{
			return false;
		}

		success = true;
	}
	if ((extent == DocumentInfo::SERIAL_LABELS) ||
		(extent == DocumentInfo::SERIAL_ALL))
	{
		if (value.empty() == false)
		{
			// Delete this one label
			success = removeAttribute(docInfo, string("label.") + value, true);
		}
		else
		{
			// Delete all of the document's labels
			success = removeAttribute(docInfo, "label.", true, true);
		}
	}

	return success;
}

/// Deletes a label.
bool MetaDataBackup::deleteLabel(const string &value)
{
	if ((value.empty() == true) ||
		(removeAttribute(DocumentInfo("", "", "", ""),
			string("label.") + value, true) == false))
	{
		return false;
	}

	return true;
}

pinot-1.22/SQL/MetaDataBackup.h
/*
 * Copyright 2005-2009 Fabrice Colin
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _META_DATA_BACKUP_H #define _META_DATA_BACKUP_H #include #include #include #include "DocumentInfo.h" #include "SQLiteBase.h" /// Handles the MetaDataBackup table. class MetaDataBackup : public SQLiteBase { public: MetaDataBackup(const std::string &database); virtual ~MetaDataBackup(); /// Creates the MetaDataBackup table in the database. static bool create(const std::string &database); /// Adds an item. bool addItem(const DocumentInfo &docInfo, DocumentInfo::SerialExtent extent); /// Gets an item. bool getItem(DocumentInfo &docInfo, DocumentInfo::SerialExtent extent); /// Gets items. bool getItems(const std::string &likeUrl, std::set &urls, unsigned long min, unsigned long max); /// Deletes an item. bool deleteItem(const DocumentInfo &docInfo, DocumentInfo::SerialExtent extent, const std::string &value = ""); /// Deletes a label. 
bool deleteLabel(const std::string &value); protected: bool setAttribute(const DocumentInfo &docInfo, const std::string &name, const std::string &value, bool noXAttr = false); bool getAttribute(const DocumentInfo &docInfo, const std::string &name, std::string &value, bool noXAttr = false); bool getAttributes(const DocumentInfo &docInfo, const std::string &name, std::set &values); bool removeAttribute(const DocumentInfo &docInfo, const std::string &name, bool noXAttr = false, bool likeName = false); private: MetaDataBackup(const MetaDataBackup &other); MetaDataBackup &operator=(const MetaDataBackup &other); }; #endif // _META_DATA_BACKUP_H pinot-1.22/SQL/QueryHistory.cpp000066400000000000000000000205121470740426600164330ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include #include "TimeConverter.h" #include "Url.h" #include "QueryHistory.h" using std::clog; using std::endl; using std::string; using std::set; using std::vector; QueryHistory::QueryHistory(const string &database) : SQLiteBase(database) { } QueryHistory::~QueryHistory() { } /// Creates the QueryHistory table in the database. 
bool QueryHistory::create(const string &database) { // The specified path must be a file if (SQLiteBase::check(database) == false) { return false; } SQLiteBase db(database); // Does QueryHistory exist ? if (db.executeSimpleStatement("SELECT * FROM QueryHistory LIMIT 1;") == false) { // Create the table if (db.executeSimpleStatement("CREATE TABLE QueryHistory (QueryName VARCHAR(255), " "EngineName VARCHAR(255), HostName VARCHAR(255), Url VARCHAR(255), " "Title VARCHAR(255), Extract VARCHAR(255), Score FLOAT, Date INTEGER, " "PRIMARY KEY(QueryName, EngineName, Url, Date));") == false) { return false; } } return true; } /// Inserts an URL. bool QueryHistory::insertItem(const string &queryName, const string &engineName, const string &url, const string &title, const string &extract, float score) { Url urlObj(url); string hostName(urlObj.getHost()); bool success = false; SQLResults *results = executeStatement("INSERT INTO QueryHistory " "VALUES('%q', '%q', '%q', '%q', '%q', '%q', '%f', '%d');", queryName.c_str(), engineName.c_str(), hostName.c_str(), Url::escapeUrl(url).c_str(), title.c_str(), extract.c_str(), score, time(NULL)); if (results != NULL) { success = true; delete results; } return success; } /// Checks if an URL is in the history; returns its current score or 0 if not found. 
float QueryHistory::hasItem(const string &queryName, const string &engineName,
	const string &url, float &previousScore)
{
	float score = 0;

	previousScore = 0;

	SQLResults *results = executeStatement("SELECT Score FROM QueryHistory "
		"WHERE QueryName='%q' AND EngineName='%q' AND Url='%q' ORDER BY Date DESC;",
		queryName.c_str(), engineName.c_str(), Url::escapeUrl(url).c_str());
	if (results != NULL)
	{
		// The first row is the latest run
		SQLRow *row = results->nextRow();
		if (row != NULL)
		{
			score = (float)atof(row->getColumn(0).c_str());

			delete row;

			// Get the score of the second last run
			SQLRow *previousRow = results->nextRow();
			if (previousRow != NULL)
			{
				previousScore = (float)atof(previousRow->getColumn(0).c_str());

				delete previousRow;
			}
		}

		delete results;
	}

	return score;
}

/// Gets the list of engines the query was run on.
bool QueryHistory::getEngines(const string &queryName, set<string> &enginesList)
{
	bool success = false;

	SQLResults *results = executeStatement("SELECT EngineName FROM QueryHistory "
		"WHERE QueryName='%q' GROUP BY EngineName;",
		queryName.c_str());
	if (results != NULL)
	{
		while (results->hasMoreRows() == true)
		{
			SQLRow *row = results->nextRow();
			if (row == NULL)
			{
				break;
			}

			enginesList.insert(row->getColumn(0));
			success = true;

			delete row;
		}

		delete results;
	}

	return success;
}

/// Gets the first max items for the given query, engine pair.
bool QueryHistory::getItems(const string &queryName, const string &engineName,
	unsigned int max, vector<DocumentInfo> &resultsList)
{
	bool success = false;

	SQLResults *results = executeStatement("SELECT Title, Url, Extract, Score, Date "
		"FROM QueryHistory WHERE QueryName='%q' AND EngineName='%q' "
		"ORDER BY Date DESC, Score DESC LIMIT %u;",
		queryName.c_str(), engineName.c_str(), max);
	if (results != NULL)
	{
		while (results->hasMoreRows() == true)
		{
			SQLRow *row = results->nextRow();
			if (row == NULL)
			{
				break;
			}

			DocumentInfo result(row->getColumn(0),
				Url::unescapeUrl(row->getColumn(1)).c_str(),
				"", "");
			result.setExtract(row->getColumn(2));
			result.setScore((float)atof(row->getColumn(3).c_str()));
			int runDate = atoi(row->getColumn(4).c_str());
			result.setTimestamp(TimeConverter::toTimestamp((time_t)runDate));

			resultsList.push_back(result);
			success = true;

			delete row;
		}

		delete results;
	}

	return success;
}

/// Gets an item's extract.
string QueryHistory::getItemExtract(const string &queryName, const string &engineName,
	const string &url)
{
	string extract;

	// Rows are sorted by date so that the latest extract comes first
	SQLResults *results = executeStatement("SELECT Extract FROM QueryHistory "
		"WHERE QueryName='%q' AND EngineName='%q' AND Url='%q' ORDER BY Date DESC;",
		queryName.c_str(), engineName.c_str(), Url::escapeUrl(url).c_str());
	if (results != NULL)
	{
		SQLRow *row = results->nextRow();
		if (row != NULL)
		{
			extract = row->getColumn(0);

			delete row;
		}

		delete results;
	}

	return extract;
}

/// Finds URLs.
bool QueryHistory::findUrlsLike(const string &url, unsigned int count,
	set<string> &urls)
{
	bool success = false;

	if (url.empty() == true)
	{
		return false;
	}

	SQLResults *results = executeStatement("SELECT Url FROM QueryHistory "
		"WHERE Url LIKE '%q%%' ORDER BY Url LIMIT %u;",
		Url::escapeUrl(url).c_str(), count);
	if (results != NULL)
	{
		while (results->hasMoreRows() == true)
		{
			SQLRow *row = results->nextRow();
			if (row == NULL)
			{
				break;
			}

			urls.insert(Url::unescapeUrl(row->getColumn(0)));
			success = true;

			delete row;
		}

		delete results;
	}

	return success;
}

/// Gets a query's latest run times.
bool QueryHistory::getLatestRuns(const string &queryName, const string &engineName,
	unsigned int runCount, set<time_t> &runTimes)
{
	SQLResults *results = NULL;
	bool success = false;

	if (queryName.empty() == true)
	{
		return false;
	}

	if (engineName.empty() == true)
	{
		results = executeStatement("SELECT Date FROM QueryHistory "
			"WHERE QueryName='%q' GROUP BY EngineName ORDER BY Date DESC LIMIT %u;",
			queryName.c_str(), runCount);
	}
	else
	{
		results = executeStatement("SELECT Date FROM QueryHistory "
			"WHERE QueryName='%q' AND EngineName='%q' GROUP BY Date ORDER BY Date DESC LIMIT %u;",
			queryName.c_str(), engineName.c_str(), runCount);
	}

	if (results != NULL)
	{
		while (results->hasMoreRows() == true)
		{
			SQLRow *row = results->nextRow();
			if (row == NULL)
			{
				break;
			}

			int runDate = atoi(row->getColumn(0).c_str());
			if (runDate > 0)
			{
				runTimes.insert((time_t)runDate);
			}
			success = true;

			delete row;
		}

		delete results;
	}

	return success;
}

/// Deletes items at least as old as the given date.
bool QueryHistory::deleteItems(const string &queryName, const string &engineName,
	time_t cutOffDate)
{
	if (cutOffDate == 0)
	{
		// Nothing to delete
		return true;
	}

	// time_t may be wider than int, so pass it as a long long
	SQLResults *results = executeStatement("DELETE FROM QueryHistory "
		"WHERE QueryName='%q' AND EngineName='%q' AND Date<'%lld';",
		queryName.c_str(), engineName.c_str(), (long long)cutOffDate);
	if (results != NULL)
	{
		delete results;

		return true;
	}

	return false;
}

/// Deletes items.
bool QueryHistory::deleteItems(const string &name, bool isQueryName) { SQLResults *results = NULL; if (isQueryName == true) { results = executeStatement("DELETE FROM QueryHistory " "WHERE QueryName='%q';", name.c_str()); } else { results = executeStatement("DELETE FROM QueryHistory " "WHERE EngineName='%q';", name.c_str()); } if (results != NULL) { delete results; return true; } return false; } /// Expires items older than the given date. bool QueryHistory::expireItems(time_t expiryDate) { if (expiryDate == 0) { // Nothing to delete return true; } SQLResults *results = executeStatement("DELETE FROM QueryHistory " "WHERE Date<'%d';", expiryDate); if (results != NULL) { delete results; return true; } return false; } pinot-1.22/SQL/QueryHistory.h000066400000000000000000000055401470740426600161040ustar00rootroot00000000000000/* * Copyright 2005-2008 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _QUERY_HISTORY_H #define _QUERY_HISTORY_H #include #include #include #include "DocumentInfo.h" #include "SQLiteBase.h" /// Manages query history. class QueryHistory : public SQLiteBase { public: QueryHistory(const std::string &database); virtual ~QueryHistory(); /// Creates the QueryHistory table in the database. static bool create(const std::string &database); /// Inserts an URL. 
bool insertItem(const std::string &queryName, const std::string &engineName, const std::string &url, const std::string &title, const std::string &extract, float score); /** * Checks if an URL is in the query's history. * If it is, it returns the current and previous scores; returns 0 if not found. */ float hasItem(const std::string &queryName, const std::string &engineName, const std::string &url, float &previousScore); /// Gets the list of engines the query was run on. bool getEngines(const std::string &queryName, std::set<std::string> &enginesList); /// Gets the first max items for the given query, engine pair. bool getItems(const std::string &queryName, const std::string &engineName, unsigned int max, std::vector<DocumentInfo> &resultsList); /// Gets an item's extract. std::string getItemExtract(const std::string &queryName, const std::string &engineName, const std::string &url); /// Finds URLs. bool findUrlsLike(const std::string &url, unsigned int count, std::set<std::string> &urls); /// Gets a query's latest run times. bool getLatestRuns(const std::string &queryName, const std::string &engineName, unsigned int runCount, std::set<time_t> &runTimes); /// Deletes items older than the given date. bool deleteItems(const std::string &queryName, const std::string &engineName, time_t cutOffDate); /// Deletes items. bool deleteItems(const std::string &name, bool isQueryName); /// Expires items older than the given date. bool expireItems(time_t expiryDate); private: QueryHistory(const QueryHistory &other); QueryHistory &operator=(const QueryHistory &other); }; #endif // _QUERY_HISTORY_H pinot-1.22/SQL/SQLDB.cpp000066400000000000000000000041521470740426600146130ustar00rootroot00000000000000/* * Copyright 2008-2016 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include #include #include "SQLDB.h" using std::clog; using std::endl; using std::string; SQLRow::SQLRow(unsigned int nColumns) : m_nColumns(nColumns) { } SQLRow::~SQLRow() { } unsigned int SQLRow::getColumnsCount(void) const { return m_nColumns; } SQLResults::SQLResults(unsigned long nRows, unsigned int nColumns) : m_nRows(nRows), m_nColumns(nColumns), m_nCurrentRow(0) { // Check we actually have results if (m_nRows == 0) { m_nRows = m_nColumns = m_nCurrentRow = 0; } } SQLResults::~SQLResults() { } bool SQLResults::hasMoreRows(void) const { if ((m_nRows > 0) && (m_nCurrentRow < m_nRows)) { return true; } return false; } bool SQLResults::rewind(void) { m_nCurrentRow = 0; return true; } SQLDB::SQLDB(const string &databaseName, bool readOnly) : m_databaseName(databaseName), m_readOnly(readOnly) { } SQLDB::~SQLDB() { } bool SQLDB::upgrade(unsigned int versionNum, const string &sql, const string &sqlPostUpgrade) { if (beginTransaction() == false) { return false; } bool upgraded = executeSimpleStatement(sql); endTransaction(); if (upgraded == false) { return false; } executeSimpleStatement(sqlPostUpgrade); return true; } bool SQLDB::isReadOnly(void) const { return m_readOnly; } pinot-1.22/SQL/SQLDB.h000066400000000000000000000062071470740426600142630ustar00rootroot00000000000000/* * Copyright 2008-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software 
Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _SQL_DB_H #define _SQL_DB_H #include #include #include /// A row of results. class SQLRow { public: virtual ~SQLRow(); typedef enum { SQL_TYPE_INT = 0, SQL_TYPE_DOUBLE, SQL_TYPE_TIME, SQL_TYPE_DATE, SQL_TYPE_DATETIME, SQL_TYPE_TIMESTAMP, SQL_TYPE_STRING, SQL_TYPE_BLOB, SQL_TYPE_NULL } SQLType; unsigned int getColumnsCount(void) const; virtual std::string getColumn(unsigned int nColumn) const = 0; protected: unsigned int m_nColumns; SQLRow(unsigned int nColumns); private: SQLRow(const SQLRow &other); SQLRow &operator=(const SQLRow &other); }; /// Results extracted from the database. class SQLResults { public: virtual ~SQLResults(); virtual bool hasMoreRows(void) const; virtual std::string getColumnName(unsigned int nColumn) const = 0; virtual SQLRow *nextRow(void) = 0; virtual bool rewind(void); protected: unsigned long m_nRows; unsigned int m_nColumns; unsigned long int m_nCurrentRow; SQLResults(unsigned long nRows, unsigned int nColumns); private: SQLResults(const SQLResults &other); SQLResults &operator=(const SQLResults &other); }; /// A SQL database. 
class SQLDB { public: virtual ~SQLDB(); virtual bool upgrade(unsigned int versionNum, const std::string &sql, const std::string &sqlPostUpgrade); virtual bool isReadOnly(void) const; virtual bool isOpen(void) const = 0; virtual bool alterTable(const std::string &tableName, const std::string &columns, const std::string &newDefinition) = 0; virtual bool beginTransaction(void) = 0; virtual bool rollbackTransaction(void) = 0; virtual bool endTransaction(void) = 0; virtual bool executeSimpleStatement(const std::string &sql) = 0; virtual SQLResults *executeStatement(const char *sqlFormat, ...) = 0; virtual bool prepareStatement(const std::string &statementId, const std::string &sqlFormat) = 0; virtual SQLResults *executePreparedStatement(const std::string &statementId, const std::vector &values) = 0; virtual SQLResults *executePreparedStatement(const std::string &statementId, const std::vector > &values) = 0; protected: std::string m_databaseName; bool m_readOnly; SQLDB(const std::string &databaseName, bool readOnly = false); private: SQLDB(const SQLDB &other); SQLDB &operator=(const SQLDB &other); }; #endif // _SQL_DB_H pinot-1.22/SQL/SQLiteBase.cpp000066400000000000000000000404101470740426600156770ustar00rootroot00000000000000/* * Copyright 2005-2014 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include #include #include #include #include #include #include "config.h" #include "NLS.h" #include "SQLiteBase.h" using std::clog; using std::endl; using std::string; using std::vector; using std::map; using std::pair; using std::for_each; static int busyHandler(void *pData, int lockNum) { // Try again after 100 ms usleep(100000); return 1; } // A function object to finalize statements with for_each() struct FinalizeStatementsFunc { public: void operator()(map<string, sqlite3_stmt*>::value_type &p) { if (p.second != NULL) { sqlite3_finalize(p.second); } } }; SQLiteRow::SQLiteRow(const vector<string> &rowColumns, unsigned int nColumns) : SQLRow(nColumns), m_pStatement(NULL) { if (rowColumns.empty() == false) { m_columns.reserve(rowColumns.size()); #if 0 // This segfaults in string::assign() because reserve() allocates capacity without constructing elements; it would need std::back_inserter(m_columns) copy(rowColumns.begin(), rowColumns.end(), m_columns.begin()); #else for (vector<string>::const_iterator colIter = rowColumns.begin(); colIter != rowColumns.end(); ++colIter) { m_columns.push_back(*colIter); } #endif } } SQLiteRow::SQLiteRow(sqlite3_stmt *pStatement, unsigned int nColumns) : SQLRow(nColumns), m_pStatement(pStatement) { } SQLiteRow::~SQLiteRow() { } string SQLiteRow::getColumn(unsigned int nColumn) const { if (m_pStatement != NULL) { const unsigned char *pTextColumn = sqlite3_column_text(m_pStatement, nColumn); if (pTextColumn != NULL) { return (const char*)pTextColumn; } // We may sometimes not be able to get a column, eg a sum of 0 records will have type SQLITE_NULL int columnType = sqlite3_column_type(m_pStatement, nColumn); if (columnType != SQLITE_NULL) { #ifdef DEBUG clog << "SQLiteRow::getColumn: couldn't get column " << nColumn << ", type " << columnType << endl; #endif } return ""; } if (nColumn < m_nColumns) { vector<string>::const_iterator colIter = m_columns.begin(); for (unsigned int i = 0; (i < m_nColumns) && (colIter != m_columns.end()); ++i) { if (i == nColumn) { string column(*colIter); return column; } ++colIter; } } return ""; } SQLiteResults::SQLiteResults(char 
**results, unsigned long nRows, unsigned int nColumns) : SQLResults(nRows, nColumns), m_results(results), m_pStatement(NULL), m_done(false), m_firstStep(true), m_stepCode(SQLITE_BUSY) { // Check we actually have results if ((m_results == NULL) || (m_nRows <= 0)) { m_nRows = m_nCurrentRow = 0; m_nColumns = 0; } } SQLiteResults::SQLiteResults(sqlite3_stmt *pStatement) : SQLResults(0, sqlite3_column_count(pStatement)), m_results(NULL), m_pStatement(pStatement), m_done(false), m_firstStep(true), m_stepCode(SQLITE_BUSY) { // If the statement returns rows, this will get the first row // If not, this will evaluate and "complete" the statement step(); } SQLiteResults::~SQLiteResults() { if (m_results != NULL) { sqlite3_free_table(m_results); } if (m_pStatement != NULL) { rewind(); } } void SQLiteResults::step(void) { if (m_pStatement == NULL) { return; } m_stepCode = SQLITE_BUSY; while ((m_stepCode == SQLITE_BUSY) || (m_stepCode == SQLITE_IOERR_BLOCKED)) { m_stepCode = sqlite3_step(m_pStatement); if ((m_stepCode == SQLITE_BUSY) || (m_stepCode == SQLITE_IOERR_BLOCKED)) { #if SQLITE_VERSION_NUMBER < 3006024 // 3.6.23 and older require a call to sqlite3_reset() for any code other // than SQLITE_ROW before stepping again // http://www.sqlite.org/c3ref/step.html rewind(); #endif // Sleep roughly a sixth of a second, ie around 10 write operations usleep(150000); } else if ((m_stepCode != SQLITE_ROW) && (m_stepCode != SQLITE_DONE)) { #ifdef DEBUG clog << "Step returned error code " << m_stepCode << endl; #endif } } } int SQLiteResults::getStepCode(void) const { return m_stepCode; } bool SQLiteResults::hasMoreRows(void) const { if (m_pStatement != NULL) { return !m_done; } return SQLResults::hasMoreRows(); } string SQLiteResults::getColumnName(unsigned int nColumn) const { if (m_pStatement != NULL) { return sqlite3_column_name(m_pStatement, (int)nColumn); } if (nColumn < m_nColumns) { return m_results[nColumn]; } return ""; } SQLRow *SQLiteResults::nextRow(void) { if 
(m_pStatement != NULL) { if (m_done == true) { return NULL; } if (m_firstStep == false) { step(); } else { m_firstStep = false; } if (m_stepCode == SQLITE_ROW) { ++m_nCurrentRow; return new SQLiteRow(m_pStatement, m_nColumns); } else if (m_stepCode == SQLITE_DONE) { m_done = true; } else { clog << "Failed to get next result row, error code " << m_stepCode << endl; } return NULL; } if (m_nCurrentRow >= m_nRows) { return NULL; } // The very first row holds the column names unsigned long firstIndex = (m_nCurrentRow + 1) * m_nColumns; unsigned long lastIndex = firstIndex + m_nColumns - 1; vector<string> rowColumns; for (unsigned long i = firstIndex; i <= lastIndex; ++i) { if (m_results[i] == NULL) { rowColumns.push_back(""); } else { rowColumns.push_back(m_results[i]); } } ++m_nCurrentRow; return new SQLiteRow(rowColumns, m_nColumns); } bool SQLiteResults::rewind(void) { SQLResults::rewind(); if (m_pStatement != NULL) { // The constructor made sure that step() ran at least once sqlite3_reset(m_pStatement); m_done = false; } return true; } SQLiteBase::SQLiteBase(const string &databaseName, bool readOnly, bool onDemand) : SQLDB(databaseName, readOnly), m_onDemand(onDemand), m_inTransaction(false), m_pDatabase(NULL) { if (m_onDemand == false) { open(); } } SQLiteBase::~SQLiteBase() { if (m_onDemand == false) { close(); } } void SQLiteBase::executeSimpleStatement(const string &sql, int &execError) { char *errMsg = NULL; execError = sqlite3_exec(m_pDatabase, sql.c_str(), NULL, NULL, // No callback &errMsg); if (execError != SQLITE_OK) { if (errMsg != NULL) { clog << m_databaseName << ": SQL <" << sql << "> failed with error " << execError << ": " << errMsg << endl; sqlite3_free(errMsg); } } } void SQLiteBase::open(void) { int openFlags = SQLITE_OPEN_READWRITE|SQLITE_OPEN_CREATE; if (m_readOnly == true) { openFlags = SQLITE_OPEN_READONLY; } // Open the new database // FIXME: ensure it's in mode SQLITE_CONFIG_SERIALIZED if 
(sqlite3_open_v2(m_databaseName.c_str(), &m_pDatabase, openFlags, NULL) != SQLITE_OK) { // A handle is returned even when an error occurs! if (m_pDatabase != NULL) { clog << m_databaseName << ": " << sqlite3_errmsg(m_pDatabase) << endl; close(); } } if (m_pDatabase != NULL) { // Set up a busy handler sqlite3_busy_handler(m_pDatabase, busyHandler, NULL); } else { clog << "Couldn't open " << m_databaseName << endl; } } void SQLiteBase::close(void) { if (m_pDatabase != NULL) { for_each(m_statements.begin(), m_statements.end(), FinalizeStatementsFunc()); m_statements.clear(); sqlite3_close(m_pDatabase); m_pDatabase = NULL; } } bool SQLiteBase::check(const string &databaseName) { struct stat dbStat; // The specified path must be a file if ((stat(databaseName.c_str(), &dbStat) != -1) && (!S_ISREG(dbStat.st_mode))) { // It exists, but it's not a file as expected clog << databaseName << " is not a file" << endl; return false; } return true; } bool SQLiteBase::backup(const string &destDatabaseName, int pagesCount, bool retryOnLock) { sqlite3 *pBackupDatabase = NULL; int errorCode = sqlite3_open(destDatabaseName.c_str(), &pBackupDatabase); if (errorCode != SQLITE_OK) { // A handle is usually returned even on failure and has to be released sqlite3_close(pBackupDatabase); return false; } // Open the backup object sqlite3_backup *pBackup = sqlite3_backup_init(pBackupDatabase, "main", m_pDatabase, "main"); if (pBackup != NULL) { // Copy database pages errorCode = sqlite3_backup_step(pBackup, pagesCount); while ((errorCode == SQLITE_OK) || (errorCode == SQLITE_BUSY) || ((errorCode == SQLITE_LOCKED) && (retryOnLock == true))) { // Sleep roughly a sixth of a second, ie around 10 write operations sqlite3_sleep(150); int remainingPages = sqlite3_backup_remaining(pBackup); int totalPages = sqlite3_backup_pagecount(pBackup); int donePages = totalPages - remainingPages; clog << m_databaseName << ": backed up " << donePages << " pages out of " << totalPages << endl; errorCode = sqlite3_backup_step(pBackup, pagesCount); } sqlite3_backup_finish(pBackup); } errorCode = 
sqlite3_errcode(pBackupDatabase); sqlite3_close(pBackupDatabase); if (errorCode == SQLITE_OK) { return true; } return false; } bool SQLiteBase::isOpen(void) const { if (m_pDatabase == NULL) { return false; } return true; } bool SQLiteBase::reopen(const string &databaseName) { if (m_databaseName == databaseName) { return false; } close(); m_databaseName = databaseName; open(); if (isOpen() == true) { return true; } return false; } bool SQLiteBase::alterTable(const string &tableName, const string &columns, const string &newDefinition) { if ((tableName.empty() == true) || (columns.empty() == true) || (newDefinition.empty() == true)) { return false; } string sql("BEGIN TRANSACTION; CREATE TEMPORARY TABLE "); sql += tableName; sql += "_backup ("; sql += columns; sql += "); INSERT INTO "; sql += tableName; sql += "_backup SELECT "; sql += columns; sql += " FROM "; sql += tableName; sql += "; DROP TABLE "; sql += tableName; sql += "; CREATE TABLE "; sql += tableName; sql += " ("; sql += newDefinition; sql += "); INSERT INTO "; sql += tableName; sql += " SELECT "; sql += columns; sql += " FROM "; sql += tableName; sql += "_backup; DROP TABLE "; sql += tableName; sql += "_backup; COMMIT;"; #ifdef DEBUG clog << "SQLiteBase::alterTable: " << sql << endl; #endif return executeSimpleStatement(sql); } bool SQLiteBase::beginTransaction(void) { if ((m_pDatabase == NULL) || (m_onDemand == true) || (m_inTransaction == true)) { // Not applicable return true; } if (executeSimpleStatement("BEGIN TRANSACTION;") == true) { m_inTransaction = true; return true; } clog << m_databaseName << ": failed to begin transaction" << endl; return false; } bool SQLiteBase::rollbackTransaction(void) { if ((m_pDatabase == NULL) || (m_onDemand == true) || (m_inTransaction == false)) { // Not applicable return true; } int execError = SQLITE_OK; do { executeSimpleStatement("ROLLBACK TRANSACTION;", execError); if (execError == SQLITE_OK) { m_inTransaction = false; return true; } if (execError != 
SQLITE_BUSY) { clog << m_databaseName << ": failed to rollback transaction with error " << execError << endl; return false; } // This failed because operations are pending // Sleep roughly a sixth of a second, ie around 10 write operations usleep(150000); } while (execError != SQLITE_OK); clog << m_databaseName << ": failed to rollback transaction" << endl; return false; } bool SQLiteBase::endTransaction(void) { if ((m_pDatabase == NULL) || (m_onDemand == true) || (m_inTransaction == false)) { // Not applicable return true; } int execError = SQLITE_OK; do { executeSimpleStatement("END TRANSACTION;", execError); if (execError == SQLITE_OK) { m_inTransaction = false; return true; } if (execError != SQLITE_BUSY) { clog << m_databaseName << ": failed to end transaction with error " << execError << endl; return false; } // This failed because write operations are pending // Sleep roughly a sixth of a second, ie around 10 write operations usleep(150000); } while (execError != SQLITE_OK); clog << m_databaseName << ": failed to end transaction" << endl; return false; } bool SQLiteBase::executeSimpleStatement(const string &sql) { bool success = false; if (sql.empty() == true) { return false; } if (m_onDemand == true) { open(); } if (m_pDatabase == NULL) { return false; } int execError = SQLITE_OK; executeSimpleStatement(sql, execError); if (execError == SQLITE_OK) { success = true; } if (m_onDemand == true) { close(); } return success; } SQLResults *SQLiteBase::executeStatement(const char *sqlFormat, ...) 
{ SQLiteResults *pResults = NULL; va_list ap; if (sqlFormat == NULL) { return NULL; } if (m_onDemand == true) { open(); } if (m_pDatabase == NULL) { return NULL; } va_start(ap, sqlFormat); char *stringBuff = sqlite3_vmprintf(sqlFormat, ap); va_end(ap); if (stringBuff == NULL) { clog << m_databaseName << ": couldn't format SQL statement" << endl; if (m_onDemand == true) { close(); } return NULL; } char **results = NULL; char *errMsg = NULL; int nRows = 0, nColumns = 0; int errorCode = sqlite3_get_table(m_pDatabase, stringBuff, &results, &nRows, &nColumns, &errMsg); if (errorCode != SQLITE_OK) { if (errMsg != NULL) { clog << m_databaseName << ": SQL <" << stringBuff << "> failed with error " << errorCode << ": " << errMsg << endl; sqlite3_free(errMsg); } } else { pResults = new SQLiteResults(results, (unsigned long)nRows, (unsigned int)nColumns); } // Free the statement allocated by sqlite3_vmprintf() sqlite3_free(stringBuff); if (m_onDemand == true) { close(); } return pResults; } bool SQLiteBase::prepareStatement(const string &statementId, const string &sqlFormat) { if ((sqlFormat.empty() == true) || (m_onDemand == true) || (m_pDatabase == NULL)) { return false; } map<string, sqlite3_stmt*>::iterator statIter = m_statements.find(statementId); if (statIter != m_statements.end()) { return true; } sqlite3_stmt *pStatement = NULL; if (sqlite3_prepare_v2(m_pDatabase, sqlFormat.c_str(), (int)sqlFormat.length(), &pStatement, NULL) == SQLITE_OK) { if (pStatement != NULL) { m_statements.insert(pair<string, sqlite3_stmt*>(statementId, pStatement)); return true; } } clog << m_databaseName << ": failed to compile SQL statement " << statementId << endl; return false; } SQLResults *SQLiteBase::executePreparedStatement(const string &statementId, const vector<string> &values) { unsigned int paramIndex = 1; map<string, sqlite3_stmt*>::iterator statIter = m_statements.find(statementId); if (statIter == m_statements.end()) { #ifdef DEBUG clog << "SQLiteBase::executePreparedStatement: invalid SQL statement ID " << statementId << endl; #endif return NULL; } // Bind values // The left-most parameter's index is 1 for (vector<string>::const_iterator valueIter = 
values.begin(); valueIter != values.end(); ++valueIter, ++paramIndex) { int errorCode = sqlite3_bind_text(statIter->second, paramIndex, valueIter->c_str(), -1, SQLITE_TRANSIENT); if (errorCode != SQLITE_OK) { clog << m_databaseName << ": failed to bind parameter to statement " << statementId << " (" << paramIndex << "/" << *valueIter << ") with error " << errorCode << endl; return NULL; } } SQLiteResults *pResults = new SQLiteResults(statIter->second); int stepCode = pResults->getStepCode(); if ((stepCode != SQLITE_ROW) && (stepCode != SQLITE_DONE)) { #ifdef DEBUG clog << m_databaseName << ": step for statement " << statementId << " failed with error " << stepCode << " " << sqlite3_errmsg(m_pDatabase) << endl; #endif delete pResults; return NULL; } return pResults; } SQLResults *SQLiteBase::executePreparedStatement(const string &statementId, const vector > &values) { vector untypedValues; // SQLite doesn't care about the type of parameters and results for (vector >::const_iterator valueIter = values.begin(); valueIter != values.end(); ++valueIter) { untypedValues.push_back(valueIter->first); } return executePreparedStatement(statementId, untypedValues); } pinot-1.22/SQL/SQLiteBase.h000066400000000000000000000071641470740426600153550ustar00rootroot00000000000000/* * Copyright 2005-2014 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _SQLITE_BASE_H #define _SQLITE_BASE_H #include #include #include #include #include #include "SQLDB.h" /// A row of results. class SQLiteRow : public SQLRow { public: SQLiteRow(const std::vector &rowColumns, unsigned int nColumns); SQLiteRow(sqlite3_stmt *pStatement, unsigned int nColumns); virtual ~SQLiteRow(); virtual std::string getColumn(unsigned int nColumn) const; protected: std::vector m_columns; sqlite3_stmt *m_pStatement; private: SQLiteRow(const SQLiteRow &other); SQLiteRow &operator=(const SQLiteRow &other); }; /// Results extracted from a SQLite database. class SQLiteResults : public SQLResults { public: SQLiteResults(char **results, unsigned long nRows, unsigned int nColumns); SQLiteResults(sqlite3_stmt *pStatement); virtual ~SQLiteResults(); int getStepCode(void) const; virtual bool hasMoreRows(void) const; virtual std::string getColumnName(unsigned int nColumn) const; virtual SQLRow *nextRow(void); virtual bool rewind(void); protected: char **m_results; sqlite3_stmt *m_pStatement; bool m_done; bool m_firstStep; int m_stepCode; void step(void); private: SQLiteResults(const SQLiteResults &other); SQLiteResults &operator=(const SQLiteResults &other); }; /// Simple C++ wrapper around the SQLite API. 
class SQLiteBase : public SQLDB { public: SQLiteBase(const std::string &databaseName, bool readOnly = false, bool onDemand = true); virtual ~SQLiteBase(); static bool check(const std::string &databaseName); bool backup(const std::string &destDatabaseName, int pagesCount = 5, bool retryOnLock = true); virtual bool isOpen(void) const; virtual bool reopen(const std::string &databaseName); virtual bool alterTable(const std::string &tableName, const std::string &columns, const std::string &newDefinition); virtual bool beginTransaction(void); virtual bool rollbackTransaction(void); virtual bool endTransaction(void); virtual bool executeSimpleStatement(const std::string &sql); virtual SQLResults *executeStatement(const char *sqlFormat, ...); virtual bool prepareStatement(const std::string &statementId, const std::string &sqlFormat); virtual SQLResults *executePreparedStatement(const std::string &statementId, const std::vector &values); virtual SQLResults *executePreparedStatement(const std::string &statementId, const std::vector > &values); protected: bool m_onDemand; bool m_inTransaction; sqlite3 *m_pDatabase; std::map m_statements; void executeSimpleStatement(const std::string &sql, int &execError); void open(void); void close(void); private: SQLiteBase(const SQLiteBase &other); SQLiteBase &operator=(const SQLiteBase &other); }; #endif // _SQLITE_BASE_H pinot-1.22/SQL/ViewHistory.cpp000066400000000000000000000067751470740426600162570ustar00rootroot00000000000000/* * Copyright 2005-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include "Url.h" #include "ViewHistory.h" using std::clog; using std::endl; using std::string; ViewHistory::ViewHistory(const string &database) : SQLiteBase(database) { } ViewHistory::~ViewHistory() { } /// Creates the ViewHistory table in the database. bool ViewHistory::create(const string &database) { // The specified path must be a file if (SQLiteBase::check(database) == false) { return false; } SQLiteBase db(database); // Does ViewHistory exist ? if (db.executeSimpleStatement("SELECT * FROM ViewHistory LIMIT 1;") == false) { #ifdef DEBUG clog << "ViewHistory::create: ViewHistory doesn't exist" << endl; #endif // Create the table if (db.executeSimpleStatement("CREATE TABLE ViewHistory (Url VARCHAR(255) " "PRIMARY KEY, Status INTEGER, DATE INTEGER);") == false) { return false; } } return true; } /// Inserts an URL. bool ViewHistory::insertItem(const string &url) { bool success = false; SQLResults *results = executeStatement("INSERT INTO ViewHistory " "VALUES('%q', '1', '%d');", Url::escapeUrl(url).c_str(), time(NULL)); if (results != NULL) { success = true; delete results; } return success; } /// Checks if an URL is in the history. bool ViewHistory::hasItem(const string &url) { bool success = false; SQLResults *results = executeStatement("SELECT Url FROM ViewHistory " "WHERE Url='%q';", Url::escapeUrl(url).c_str()); if (results != NULL) { SQLRow *row = results->nextRow(); if (row != NULL) { // If this returns anything, it's the URL we are looking for #ifdef DEBUG clog << "ViewHistory::hasItem: URL " << row->getColumn(0) << endl; #endif success = true; delete row; } delete results; } return success; } /// Returns the number of items. 
unsigned int ViewHistory::getItemsCount(void) { unsigned int count = 0; SQLResults *results = executeStatement("SELECT COUNT(*) FROM ViewHistory;"); if (results != NULL) { SQLRow *row = results->nextRow(); if (row != NULL) { count = atoi(row->getColumn(0).c_str()); delete row; } delete results; } return count; } /// Deletes an URL. bool ViewHistory::deleteItem(const string &url) { bool success = false; SQLResults *results = executeStatement("DELETE FROM ViewHistory " "WHERE Url='%q';", Url::escapeUrl(url).c_str()); if (results != NULL) { success = true; delete results; } return success; } /// Expires items older than the given date. bool ViewHistory::expireItems(time_t expiryDate) { bool success = false; SQLResults *results = executeStatement("DELETE FROM ViewHistory " "WHERE Date<'%d';", expiryDate); if (results != NULL) { success = true; delete results; } return success; } pinot-1.22/SQL/ViewHistory.h000066400000000000000000000031541470740426600157100ustar00rootroot00000000000000/* * Copyright 2005-2008 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _VIEW_HISTORY_H #define _VIEW_HISTORY_H #include #include #include "SQLiteBase.h" /// Manages view history. 
class ViewHistory : public SQLiteBase { public: ViewHistory(const std::string &database); virtual ~ViewHistory(); /// Creates the ViewHistory table in the database. static bool create(const std::string &database); /// Inserts an URL. bool insertItem(const std::string &url); /// Checks if an URL is in the history. bool hasItem(const std::string &url); /// Returns the number of items. unsigned int getItemsCount(void); /// Deletes an URL. bool deleteItem(const std::string &url); /// Expires items older than the given date. bool expireItems(time_t expiryDate); private: ViewHistory(const ViewHistory &other); ViewHistory &operator=(const ViewHistory &other); }; #endif // _VIEW_HISTORY_H pinot-1.22/TODO000066400000000000000000000122241470740426600132720ustar00rootroot00000000000000Documentation - List what files from libtextcat 2.2 go where ? - Say where libtextcat 3.0 can (and cannot ;-) be found - Try listing names of dependency packages for most distros - Explain when indexing and updating are done General - Fix the FIXMEs - Get rid of dead code/classes/methods... - Extend metadata beyond title,location,language,type,timestamp,size - Don't package gmo files, they are platform dependent - CLI programs to use tty highlighting if available - Make sure all of Core is localized Tokenize - Allow to cache documents that had to be converted ? eg PDF, MS Word - Write a PDF filter that handles columns correctly, with poppler ? - WordPerfect filter with libwpd - Office filter with libgst - TeX filter - HtmlFilter to look for META tags Author, Creator, Publisher and CreationDate - XmlFilter is slow-ish, rewrite file parsing with the TextReader interface - Filters should at least return errno when they fail - Use libpng to extract PNGs' metadata - HtmlParser should use CJKVTokenizer's Unicode conversion function - The first non-empty line of plain text output to be used as title SQL - Move history files into the index directories - Set any PRAGMA ? 
Monitor - Implement support for Solaris FEM Collect - Comply with robot stuff defined at http://www.robotstxt.org/ - Harvest mode grabs all pages on a specific site down to a certain depth - Make User-Agent string configurable - Make download timeout configurable - Support for HTML frames - Curl and NeonDownloader don't share much code Search - Make sure Description files' SyndicationRight is not private or closed - getCloseTerms() should be a search engine method so that WebEngine can use plugins' suggestions Url field (http://developer.mozilla.org/en/docs/Supporting_search_suggestions_in_search_plugins) - Filters with CJKV should work better; supporting quoting would help, eg title:"你好" - Add a plugin for https://arxiv.org/search/ Index - Play around with the XAPIAN_FLUSH_THRESHOLD env var - MD5 hash to determine on updates whether documents have changed, as done by omindex - Allow to access remote Xapian indexes tunneled through ssh with xapian-progsrv, and make sure ssh will ask passwords with /usr/libexec/openssh/ssh-askpass - Reverse terms so that left wildcards can be applied ? - XapianIndex could do with some common code refactoring - After indexing or updating a document, a call to getDocumentInfo() shouldn't be necessary - Labels and the rest of DocumentInfo are handled separately, they shouldn't be - Indexes have no knowledge of indexId's - Be ready to catch DatabaseModifiedError exceptions and reopen the index - Think about security issues, especially when indexes are shared, based on http://plg.uwaterloo.ca/~claclark/fast2005.pdf - Index "compound_word" separately and as a whole - Filters should have a version number so that new versions only reindex documents of the given type Mail - Find out what kind of locking scheme Mozilla uses (POSIX lock ?) 
and use that - Index Evolution email (Camel, might be useful for other types actually) - Index mail headers - Decipher and use Mozilla's mailbox scheme, eg mailbox://mbox_file_name?number=2164959&part=1.2&type=text/plain&filename=portability.txt - Keep track of attachments and avoid indexing the same file twice - Mailboxes where all messages are flagged by Mozilla/that are empty are not indexed at all Daemon - Allow building without the daemon - Clean up method names - Prefer ustring to string whenever possible - Queue unindexing too - The daemon should ask for permission before reindexing, especially if the corpus is large - Daemon should use worker threads' doWork() instead of duplicating code - Some D-Bus methods need not return anything - Only crawl newly added locations when the configuration changes UI - Show which threads are running, what they are doing, and allow stopping them selectively - Display search engines icons (Gtk::IconSource::set_filename() and Gtk::Style::render_icon()) - When viewing or indexing a result, all rows for that same URL should be updated with the Viewed or Indexed icons (the latter after IndexingThread returns) - Make use of GTKmm 2.10 StatusIcon - Unknown exceptions in IndexingThread or elsewhere should be logged as errors - Query expansion should be interactive - Default cache provider should be configurable - Unique preferences - Changing set group by mode a few times will show index results under engine "xapian", why ?
- getIndexNames() to return ustring's - Always call getIndexPropertiesByName() with a ustring, store engine names as ustring's - Live and stored queries shouldn't cap on the number of results but the number of results per page - For each query group in the results list, show Next and Previous buttons to page through results - Either Live Query behaves like a live query (eg results list updated when new documents match) or it is renamed to something else to avoid confusion - Queries should be cancellable - Queries should return the top N results first, then the rest - Status dialog to show time of latest update - Send a signal when preferences are modified so that the UI and the daemon can reload them - When removing a location to index, delete queued actions for this location pinot-1.22/Tokenize/000077500000000000000000000000001470740426600143715ustar00rootroot00000000000000pinot-1.22/Tokenize/FilterUtils.cpp000066400000000000000000000354261470740426600173550ustar00rootroot00000000000000/* * Copyright 2007-2012 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include #include #include #include #include #include #include #include #include "config.h" #include "Memory.h" #include "MIMEScanner.h" #include "StringManip.h" #include "TimeConverter.h" #include "Url.h" #include "TextConverter.h" #include "filters/FilterFactory.h" #include "FilterUtils.h" #define UNSUPPORTED_TYPE "X-Unsupported" #define SIZE_THRESHOLD 5242880 using std::clog; using std::endl; using std::string; using std::set; using std::map; set<string> FilterUtils::m_types; map<string, string> FilterUtils::m_typeAliases; string FilterUtils::m_maxNestedSize; ReducedAction::ReducedAction() { } ReducedAction::ReducedAction(const ReducedAction &other) { } ReducedAction::~ReducedAction() { } ReducedAction &ReducedAction::operator=(const ReducedAction &other) { return *this; } bool ReducedAction::positionFilter(const Document &doc, Dijon::Filter *pFilter) { return false; } bool ReducedAction::isReduced(const Document &doc) { // Is it reduced to plain text ? if ((doc.getType().length() >= 10) && (doc.getType().substr(0, 10) == "text/plain")) { return true; } return false; } FilterUtils::FilterUtils() { char *pEnvVar = getenv("PINOT_MAXIMUM_NESTED_SIZE"); if ((pEnvVar != NULL) && (strlen(pEnvVar) > 0)) { off_t maxSize = (off_t)atoll(pEnvVar); if (maxSize > 0) { m_maxNestedSize = pEnvVar; } } } FilterUtils::~FilterUtils() { } Dijon::Filter *FilterUtils::getFilter(const string &mimeType) { Dijon::Filter *pFilter = NULL; // Is this type aliased ? map<string, string>::const_iterator aliasIter = m_typeAliases.find(mimeType); if (aliasIter != m_typeAliases.end()) { if (aliasIter->second == UNSUPPORTED_TYPE) { // We already know that none of this type's parents are supported return NULL; } pFilter = Dijon::FilterFactory::getFilter(aliasIter->second); } else { // Is there a filter for this type ?
pFilter = Dijon::FilterFactory::getFilter(mimeType); } if (pFilter != NULL) { return pFilter; } if (mimeType.empty() == false) { set<string> parentTypes; if (m_types.empty() == true) { Dijon::FilterFactory::getSupportedTypes(m_types); } // Try that type's parents MIMEScanner::getParentTypes(mimeType, m_types, parentTypes); for (set<string>::const_iterator parentIter = parentTypes.begin(); parentIter != parentTypes.end(); ++parentIter) { pFilter = Dijon::FilterFactory::getFilter(*parentIter); if (pFilter != NULL) { // Add an alias m_typeAliases[mimeType] = *parentIter; return pFilter; } } #ifdef DEBUG clog << "FilterUtils::getFilter: no valid parent for " << mimeType << endl; #endif // This type has no valid parent m_typeAliases[mimeType] = UNSUPPORTED_TYPE; } return NULL; } bool FilterUtils::isSupportedType(const string &mimeType) { // Is this type aliased ? map<string, string>::const_iterator aliasIter = m_typeAliases.find(mimeType); if (aliasIter != m_typeAliases.end()) { if (aliasIter->second == UNSUPPORTED_TYPE) { return false; } // We were able to get a filter for this parent type // or a previous call to isSupportedType() succeeded return true; } if (Dijon::FilterFactory::isSupportedType(mimeType) == true) { return true; } if (m_types.empty() == true) { Dijon::FilterFactory::getSupportedTypes(m_types); } // Try that type's parents set<string> parentTypes; MIMEScanner::getParentTypes(mimeType, m_types, parentTypes); for (set<string>::const_iterator parentIter = parentTypes.begin(); parentIter != parentTypes.end(); ++parentIter) { if (Dijon::FilterFactory::isSupportedType(*parentIter) == true) { // Add an alias m_typeAliases[mimeType] = *parentIter; return true; } } #ifdef DEBUG clog << "FilterUtils::isSupportedType: no valid parent for " << mimeType << endl; #endif // This type has no valid parent m_typeAliases[mimeType] = UNSUPPORTED_TYPE; return false; } bool FilterUtils::feedFilter(const Document &doc, Dijon::Filter *pFilter) { string location(doc.getLocation()); Url urlObj(location); string fileName;
off_t dataLength = 0; const char *pData = doc.getData(dataLength); bool fedInput = false; if (pFilter == NULL) { return false; } if ((urlObj.getProtocol() == "file") && (location.length() > 7)) { fileName = location.substr(7); } // Prefer feeding the data if (((dataLength > 0) && (pData != NULL)) && (pFilter->is_data_input_ok(Dijon::Filter::DOCUMENT_DATA) == true)) { fedInput = pFilter->set_document_data(pData, dataLength); } // ... to feeding the data through a temporary file if ((fedInput == false) && ((dataLength > 0) && (pData != NULL)) && (pFilter->is_data_input_ok(Dijon::Filter::DOCUMENT_FILE_NAME) == true)) { char inTemplate[18] = "/tmp/filterXXXXXX"; #ifdef HAVE_MKSTEMP int inFd = mkstemp(inTemplate); #else int inFd = -1; char *pInFile = mktemp(inTemplate); if (pInFile != NULL) { // mktemp() only generates a name, so create the file and open it for writing inFd = open(pInFile, O_WRONLY|O_CREAT|O_EXCL, 0600); } #endif if (inFd != -1) { #ifdef DEBUG clog << "FilterUtils::feedFilter: feeding temporary file " << inTemplate << endl; #endif // Save the data if (write(inFd, (const void*)pData, dataLength) != -1) { fedInput = pFilter->set_document_file(inTemplate, true); if (fedInput == false) { // We might as well delete the file now unlink(inTemplate); } } close(inFd); } } // ...
to feeding the file if ((fedInput == false) && (fileName.empty() == false) && (doc.getInternalPath().empty() == true)) { if (pFilter->is_data_input_ok(Dijon::Filter::DOCUMENT_FILE_NAME) == true) { #ifdef DEBUG clog << "FilterUtils::feedFilter: feeding file " << fileName << endl; #endif fedInput = pFilter->set_document_file(fileName); } // ...and to feeding the file's contents if ((fedInput == false) && (pFilter->is_data_input_ok(Dijon::Filter::DOCUMENT_DATA) == true)) { Document docCopy(doc); if (docCopy.setDataFromFile(fileName) == false) { clog << "Couldn't load " << fileName << endl; return false; } #ifdef DEBUG clog << "FilterUtils::feedFilter: feeding contents of file " << fileName << endl; #endif pData = docCopy.getData(dataLength); if ((dataLength > 0) && (pData != NULL)) { fedInput = pFilter->set_document_data(pData, dataLength); } // Else, the file may be empty } } if (fedInput == false) { clog << "Couldn't feed filter for " << doc.getLocation(true) << endl; return false; } return true; } bool FilterUtils::populateDocument(Document &doc, Dijon::Filter *pFilter) { string charset, uri, ipath; off_t size = 0; bool checkDataType = false, checkFileType = false; if (pFilter == NULL) { return false; } // Go through the whole thing const map<string, string> &metaData = pFilter->get_meta_data(); for (map<string, string>::const_iterator metaIter = metaData.begin(); metaIter != metaData.end(); ++metaIter) { if (metaIter->first == "charset") { charset = metaIter->second; } else if (metaIter->first == "date") { doc.setTimestamp(metaIter->second); } else if (metaIter->first == "ipath") { ipath = metaIter->second; } else if (metaIter->first == "language") { doc.setLanguage(metaIter->second); } else if (metaIter->first == "mimetype") { string mimeType(StringManip::toLowerCase(metaIter->second)); if (mimeType == "scan") { checkDataType = true; } else if (mimeType == "scantitle") { checkFileType = true; } else { doc.setType(mimeType); } } else if (metaIter->first == "size") { size =
(off_t)atoll(metaIter->second.c_str()); if (size > 0) { doc.setSize(size); } #ifdef DEBUG else clog << "FilterUtils::populateDocument: ignoring size zero" << endl; #endif } else if (metaIter->first == "uri") { uri = metaIter->second; if ((uri.length() >= 18) && (uri.find(":///tmp/filter") != string::npos)) { // We fed the filter a temporary file uri.clear(); } } else { doc.setOther(metaIter->first, metaIter->second); } } if (uri.empty() == false) { doc.setLocation(uri); } if (ipath.empty() == false) { string currentIPath(doc.getInternalPath()); if (currentIPath.empty() == false) { currentIPath += "&next&"; } currentIPath += ipath; doc.setInternalPath(currentIPath); #ifdef DEBUG clog << "FilterUtils::populateDocument: ipath " << currentIPath << endl; #endif } // Content and title may have to be converted TextConverter converter(20); map<string, string>::const_iterator contentIter = metaData.find("title"); if ((contentIter != metaData.end()) && (contentIter->second.empty() == false)) { dstring nonUTF8Title(contentIter->second.c_str(), contentIter->second.length()); dstring utf8Data(converter.toUTF8(nonUTF8Title, charset)); doc.setTitle(string(utf8Data.c_str(), utf8Data.length())); } const dstring &content = pFilter->get_content(); if (content.empty() == false) { // Scan for the MIME type ?
if (checkFileType == true) { // Assume the title is actually a file name string mimeType(MIMEScanner::scanFile(doc.getTitle())); if ((mimeType.empty() == true) || (mimeType == "application/octet-stream")) { // Revert to scanning the content checkDataType = true; } else { doc.setType(mimeType); } } if (checkDataType == true) { doc.setType(MIMEScanner::scanData(content.c_str(), content.length())); } if (doc.getType().substr(0, 10) == "text/plain") { dstring utf8Data(converter.toUTF8(content, charset)); if (converter.getErrorsCount() > 0) { clog << doc.getLocation(true) << " may not have been fully converted to UTF-8" << endl; } doc.setData(utf8Data.c_str(), utf8Data.length()); } else { doc.setData(content.c_str(), content.length()); } } // If the document is big'ish, try and reclaim memory int inUse = Memory::getUsage(); if ((size > SIZE_THRESHOLD) || (content.length() > SIZE_THRESHOLD)) { Memory::reclaim(); } return true; } bool FilterUtils::filterDocument(const Document &doc, const string &originalType, ReducedAction &action) { Dijon::Filter *pFilter = FilterUtils::getFilter(doc.getType()); bool fedFilter = false, positionedFilter = false, docSuccess = false, finalSuccess = false; if (pFilter != NULL) { // Limit the size of nested documents ? 
if (m_maxNestedSize.empty() == false) { pFilter->set_property(Dijon::Filter::MAXIMUM_NESTED_SIZE, m_maxNestedSize); } fedFilter = FilterUtils::feedFilter(doc, pFilter); } positionedFilter = action.positionFilter(doc, pFilter); if (fedFilter == false) { Document docCopy(doc); if (docCopy.getTitle().empty() == true) { Url urlObj(doc.getLocation()); // Default to the file name as title docCopy.setTitle(urlObj.getFile()); } // Take the appropriate action now finalSuccess = action.takeAction(docCopy, false); if (pFilter != NULL) { delete pFilter; } return finalSuccess; } // At this point, pFilter cannot be NULL bool hasDocs = pFilter->has_documents(); #ifdef DEBUG clog << "FilterUtils::filterDocument: has documents " << hasDocs << endl; #endif while (hasDocs == true) { string actualType(originalType); bool isNested = false; bool emptyTitle = false; if ((positionedFilter == false) && (pFilter->next_document() == false)) { #ifdef DEBUG clog << "FilterUtils::filterDocument: no more documents in " << doc.getLocation(true) << endl; #endif break; } const DocumentInfo *pInfo = dynamic_cast<const DocumentInfo *>(&doc); string originalTitle(doc.getTitle()); if (pInfo == NULL) { #ifdef DEBUG clog << "FilterUtils::filterDocument: couldn't duplicate document information" << endl; #endif break; } Document filteredDoc(*pInfo); filteredDoc.setType("text/plain"); docSuccess = false; if (populateDocument(filteredDoc, pFilter) == false) { hasDocs = pFilter->has_documents(); continue; } // Is this a nested document ? if (filteredDoc.getInternalPath().length() > doc.getInternalPath().length()) { actualType = filteredDoc.getType(); #ifdef DEBUG clog << "FilterUtils::filterDocument: nested document of type " << actualType << endl; #endif isNested = true; } else if (originalTitle.empty() == false) { // Preserve the top-level document's title filteredDoc.setTitle(originalTitle); } else if (filteredDoc.getTitle().empty() == true) { emptyTitle = true; } // Pass it down to another filter ?
if (action.isReduced(filteredDoc) == true) { // Do we need to set a default title ? if (emptyTitle == true) { Url urlObj(doc.getLocation()); // Default to the file name as title filteredDoc.setTitle(urlObj.getFile()); #ifdef DEBUG clog << "FilterUtils::filterDocument: set default title " << urlObj.getFile() << endl; #endif } filteredDoc.setType(actualType); // Take the appropriate action docSuccess = action.takeAction(filteredDoc, isNested); } else { docSuccess = filterDocument(filteredDoc, actualType, action); } // Consider indexing anything a success if (docSuccess == true) { finalSuccess = true; } if (positionedFilter == true) { break; } // Next hasDocs = pFilter->has_documents(); } delete pFilter; #ifdef DEBUG clog << "FilterUtils::filterDocument: done with " << doc.getLocation(true) << " status " << finalSuccess << endl; #endif return finalSuccess; } bool FilterUtils::reduceDocument(const Document &doc, ReducedAction &action) { string originalType(doc.getType()); return filterDocument(doc, originalType, action); } string FilterUtils::stripMarkup(const string &text) { if (text.empty() == true) { return ""; } Dijon::Filter *pFilter = Dijon::FilterFactory::getFilter("text/xml"); if (pFilter == NULL) { return ""; } Document doc; string strippedText; doc.setData(text.c_str(), text.length()); if ((feedFilter(doc, pFilter) == true) && (pFilter->next_document() == true)) { const dstring &content = pFilter->get_content(); if (content.empty() == false) { strippedText = string(content.c_str(), content.length()); } } delete pFilter; return strippedText; } pinot-1.22/Tokenize/FilterUtils.h000066400000000000000000000052701470740426600170140ustar00rootroot00000000000000/* * Copyright 2007-2012 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _FILTER_UTILS_H #define _FILTER_UTILS_H #include #include #include "Document.h" #include "Visibility.h" #include "filters/Filter.h" /// Drives document reduction and takes action on the final document. class PINOT_EXPORT ReducedAction { public: ReducedAction(); ReducedAction(const ReducedAction &other); virtual ~ReducedAction(); ReducedAction &operator=(const ReducedAction &other); virtual bool positionFilter(const Document &doc, Dijon::Filter *pFilter); virtual bool isReduced(const Document &doc); virtual bool takeAction(Document &doc, bool isNested) = 0; }; /// Utility functions for dealing with Dijon filters. class PINOT_EXPORT FilterUtils { public: virtual ~FilterUtils(); /// Returns a Filter that handles the given MIME type, or one of its parents. static Dijon::Filter *getFilter(const std::string &mimeType); /// Indicates whether a MIME type is supported or not. static bool isSupportedType(const std::string &mimeType); /// Feeds a document's data to a filter. static bool feedFilter(const Document &doc, Dijon::Filter *pFilter); /// Populates a document based on metadata extracted by the filter. static bool populateDocument(Document &doc, Dijon::Filter *pFilter); /// Filters a document until reduced to the minimum. static bool filterDocument(const Document &doc, const std::string &originalType, ReducedAction &action); /// Convenient front-end for filterDocument() to reduce documents. static bool reduceDocument(const Document &doc, ReducedAction &action); /// Strips markup from a piece of text. 
static std::string stripMarkup(const std::string &text); protected: static std::set<std::string> m_types; static std::map<std::string, std::string> m_typeAliases; static std::string m_maxNestedSize; FilterUtils(); private: FilterUtils(const FilterUtils &other); FilterUtils &operator=(const FilterUtils &other); }; #endif // _FILTER_UTILS_H pinot-1.22/Tokenize/Makefile.am000066400000000000000000000061411470740426600164270ustar00rootroot00000000000000# Process this file with automake to produce Makefile.in noinst_HEADERS = \ $(top_srcdir)/Tokenize/filters/ArchiveFilter.h \ $(top_srcdir)/Tokenize/filters/ChmFilter.h \ $(top_srcdir)/Tokenize/filters/ExifImageFilter.h \ $(top_srcdir)/Tokenize/filters/Exiv2ImageFilter.h \ $(top_srcdir)/Tokenize/filters/ExternalFilter.h \ $(top_srcdir)/Tokenize/filters/FileOutputFilter.h \ $(top_srcdir)/Tokenize/filters/GMimeMboxFilter.h \ $(top_srcdir)/Tokenize/filters/TagLibMusicFilter.h pkginclude_HEADERS = \ FilterUtils.h \ TextConverter.h nobase_pkginclude_HEADERS = \ filters/Filter.h \ filters/FilterFactory.h \ filters/HtmlFilter.h \ filters/HtmlParser.h \ filters/TextFilter.h \ filters/XmlFilter.h lib_LTLIBRARIES = libexiv2imagefilter.la libexternalfilter.la libmboxfilter.la libtaglibfilter.la if HAVE_LIBARCHIVE lib_LTLIBRARIES += libarchivefilter.la endif if HAVE_CHMLIB lib_LTLIBRARIES += libchmfilter.la endif pkglib_LTLIBRARIES = libFilter.la libTokenize.la libFilter_la_LDFLAGS = \ -static libFilter_la_SOURCES = \ $(top_srcdir)/Tokenize/filters/Filter.cc \ $(top_srcdir)/Tokenize/filters/FilterFactory.cc \ $(top_srcdir)/Tokenize/filters/HtmlFilter.cc \ $(top_srcdir)/Tokenize/filters/HtmlParser.cc \ $(top_srcdir)/Tokenize/filters/TextFilter.cc \ $(top_srcdir)/Tokenize/filters/XmlFilter.cc if HAVE_LIBARCHIVE libarchivefilter_la_DEPENDENCIES = libFilter.la libarchivefilter_la_SOURCES = \ $(top_srcdir)/Tokenize/filters/ArchiveFilter.cc libarchivefilter_la_LDFLAGS = -module -avoid-version libarchivefilter_la_LIBADD = -larchive endif if HAVE_CHMLIB
libchmfilter_la_DEPENDENCIES = libFilter.la libchmfilter_la_SOURCES = \ $(top_srcdir)/Tokenize/filters/ChmFilter.cc libchmfilter_la_LDFLAGS = -module -avoid-version libchmfilter_la_LIBADD = -lchm endif libexiv2imagefilter_la_DEPENDENCIES = libFilter.la libexiv2imagefilter_la_SOURCES = \ $(top_srcdir)/Tokenize/filters/Exiv2ImageFilter.cc libexiv2imagefilter_la_LDFLAGS = -module -avoid-version libexiv2imagefilter_la_LIBADD = @EXIV2_LIBS@ libexternalfilter_la_DEPENDENCIES = libFilter.la libexternalfilter_la_SOURCES = \ $(top_srcdir)/Tokenize/filters/ExternalFilter.cc \ $(top_srcdir)/Tokenize/filters/FileOutputFilter.cc libexternalfilter_la_LDFLAGS = -module -avoid-version libexternalfilter_la_LIBADD = @XML_LIBS@ libmboxfilter_la_DEPENDENCIES = libFilter.la libmboxfilter_la_SOURCES = \ $(top_srcdir)/Tokenize/filters/GMimeMboxFilter.cc libmboxfilter_la_LDFLAGS = -module -avoid-version libmboxfilter_la_LIBADD = @GMIME_LIBS@ libtaglibfilter_la_DEPENDENCIES = libFilter.la libtaglibfilter_la_SOURCES = \ $(top_srcdir)/Tokenize/filters/TagLibMusicFilter.cc libtaglibfilter_la_LDFLAGS = -module -avoid-version libtaglibfilter_la_LIBADD = @TAGLIB_LIBS@ libTokenize_la_LDFLAGS = \ -static libTokenize_la_SOURCES = \ FilterUtils.cpp \ TextConverter.cpp \ $(top_srcdir)/IndexSearch/cjkv/CJKVTokenizer.cc AM_CXXFLAGS = \ @MISC_CFLAGS@ \ -I$(top_srcdir)/Utils -Ifilters \ @GMIME_CFLAGS@ @XML_CFLAGS@ @EXIV2_CFLAGS@ @TAGLIB_CFLAGS@ \ -D_DYNAMIC_DIJON_FILTERS \ -D_DIJON_EXTERNALFILTER_CONFFILE=\"$(sysconfdir)/pinot/external-filters.xml\" pinot-1.22/Tokenize/TextConverter.cpp000066400000000000000000000106161470740426600177150ustar00rootroot00000000000000/* * Copyright 2008-2014 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include "StringManip.h" #include "TextConverter.h" using std::clog; using std::endl; using std::string; using namespace Glib; TextConverter::TextConverter(unsigned int maxErrors) : m_utf8Locale(false), m_maxErrors(maxErrors), m_conversionErrors(0) { // Get the locale charset m_utf8Locale = get_charset(m_localeCharset); } TextConverter::~TextConverter() { } dstring TextConverter::convert(const dstring &text, string &fromCharset, const string &toCharset) { dstring outputText; char outputBuffer[8192]; char *pInput = const_cast<char *>(text.c_str()); gsize inputSize = (gsize)text.length(); bool invalidSequence = false; outputText.clear(); try { IConv converter(toCharset, fromCharset); while (inputSize > 0) { char *pOutput = outputBuffer; gsize outputSize = 8192; size_t conversions = converter.iconv(&pInput, &inputSize, &pOutput, &outputSize); int errorCode = errno; if (conversions == static_cast<size_t>(-1)) { if (errorCode == EILSEQ) { // Conversion was only partially successful ++m_conversionErrors; #ifdef DEBUG clog << "TextConverter::convert: invalid sequence" << endl; #endif if (m_conversionErrors >= m_maxErrors) { // Give up return text; } converter.reset(); outputText.append(outputBuffer, 8192 - outputSize); if (invalidSequence == false) { outputText += "?"; invalidSequence = true; } // Skip that ++pInput; --inputSize; continue; } else if (errorCode != E2BIG) { #ifdef DEBUG clog << "TextConverter::convert: unknown error " << errorCode << endl; #endif return text; } } else { invalidSequence =
false; } // Append what was successfully converted outputText.append(outputBuffer, 8192 - outputSize); } #ifdef DEBUG clog << "TextConverter::convert: " << m_conversionErrors << " conversion errors" << endl; #endif } catch (Error &ce) { #ifdef DEBUG clog << "TextConverter::convert: " << ce.what() << endl; #endif outputText.clear(); string::size_type pos = fromCharset.find('_'); if (pos != string::npos) { string fixedCharset(StringManip::replaceSubString(fromCharset, "_", "-")); #ifdef DEBUG clog << "TextConverter::convert: trying with charset " << fixedCharset << endl; #endif fromCharset = fixedCharset; outputText = convert(text, fromCharset, toCharset); } } catch (...) { #ifdef DEBUG clog << "TextConverter::convert: unknown exception" << endl; #endif outputText.clear(); } return outputText; } dstring TextConverter::toUTF8(const dstring &text, string &charset) { string textCharset(StringManip::toLowerCase(charset)); m_conversionErrors = 0; if ((text.empty() == true) || (textCharset == "utf-8")) { // No conversion necessary return text; } if (textCharset.empty() == true) { if (m_utf8Locale == true) { // The current locale uses UTF-8 return text; } textCharset = m_localeCharset; } return convert(text, textCharset, "UTF-8"); } dstring TextConverter::fromUTF8(const dstring &text, const string &charset) { string fromCharset("UTF-8"); return convert(text, fromCharset, charset); } string TextConverter::fromUTF8(const string &text) { try { return locale_from_utf8(text); } catch (Error &ce) { #ifdef DEBUG clog << "TextConverter::fromUTF8: " << ce.what() << endl; #endif } catch (...) 
{ #ifdef DEBUG clog << "TextConverter::fromUTF8: unknown exception" << endl; #endif } return ""; } unsigned int TextConverter::getErrorsCount(void) const { return m_conversionErrors; } pinot-1.22/Tokenize/TextConverter.h000066400000000000000000000033371470740426600173640ustar00rootroot00000000000000/* * Copyright 2008-2014 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _TEXT_CONVERTER_H #define _TEXT_CONVERTER_H #include #include "Memory.h" #include "Visibility.h" class PINOT_EXPORT TextConverter { public: TextConverter(unsigned int maxErrors = 10); virtual ~TextConverter(); /// Converts to UTF-8. dstring toUTF8(const dstring &text, std::string &charset); /// Converts from UTF-8 to the locale charset. std::string fromUTF8(const std::string &text); /// Converts from UTF-8. dstring fromUTF8(const dstring &text, const std::string &charset); /// Gets the number of conversion errors.
unsigned int getErrorsCount(void) const; protected: std::string m_localeCharset; bool m_utf8Locale; unsigned int m_maxErrors; unsigned int m_conversionErrors; dstring convert(const dstring &text, std::string &fromCharset, const std::string &toCharset); private: TextConverter(const TextConverter &other); TextConverter& operator=(const TextConverter& other); }; #endif // _TEXT_CONVERTER_H pinot-1.22/Tokenize/filters/000077500000000000000000000000001470740426600160415ustar00rootroot00000000000000pinot-1.22/Tokenize/filters/ArchiveFilter.cc000066400000000000000000000224151470740426600211030ustar00rootroot00000000000000/* * Copyright 2009-2024 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include #include #include #include #include #include #include #include #include #include #include #include #include "ArchiveFilter.h" using std::string; using std::clog; using std::endl; using std::stringstream; using namespace Dijon; #ifdef _DYNAMIC_DIJON_FILTERS DIJON_FILTER_EXPORT bool get_filter_types(MIMETypes &mime_types) { mime_types.m_mimeTypes.clear(); mime_types.m_mimeTypes.insert("application/x-archive"); mime_types.m_mimeTypes.insert("application/x-bzip-compressed-tar"); mime_types.m_mimeTypes.insert("application/x-compressed-tar"); mime_types.m_mimeTypes.insert("application/x-cd-image"); mime_types.m_mimeTypes.insert("application/x-deb"); mime_types.m_mimeTypes.insert("application/x-iso9660-image"); mime_types.m_mimeTypes.insert("application/x-tar"); mime_types.m_mimeTypes.insert("application/x-tarz"); return true; } DIJON_FILTER_EXPORT bool check_filter_data_input(int data_input) { Filter::DataInput input = (Filter::DataInput)data_input; if ((input == Filter::DOCUMENT_DATA) || (input == Filter::DOCUMENT_STRING) || (input == Filter::DOCUMENT_FILE_NAME)) { return true; } return false; } DIJON_FILTER_EXPORT Filter *get_filter(void) { return new ArchiveFilter(); } #endif ArchiveFilter::ArchiveFilter() : Filter(), m_maxSize(0), m_parseDocument(false), m_isBig(false), m_pMem(NULL), m_fd(-1), m_pHandle(NULL) { } ArchiveFilter::~ArchiveFilter() { rewind(); } void ArchiveFilter::set_mime_type(const string &mime_type) { Filter::set_mime_type(mime_type); if ((mime_type == "application/x-cd-image") || (mime_type == "application/x-iso9660-image")) { m_isBig = true; } } bool ArchiveFilter::is_data_input_ok(DataInput input) const { if ((input == DOCUMENT_DATA) || (input == DOCUMENT_STRING)) { return !m_isBig; } else if (input == DOCUMENT_FILE_NAME) { return true; } return false; } bool ArchiveFilter::set_property(Properties prop_name, const string &prop_value) { if ((prop_name == MAXIMUM_NESTED_SIZE) && (prop_value.empty() == false)) { m_maxSize = 
(off_t)atoll(prop_value.c_str()); } return false; } bool ArchiveFilter::set_document_data(const char *data_ptr, off_t data_length) { initialize(); if ((m_pHandle == NULL) || (m_isBig == true)) { return false; } // archive_read_open_memory() expects a non-const pointer // so we'd better make a copy m_pMem = (char *)malloc(sizeof(char) * (data_length + 1)); if (m_pMem == NULL) { return false; } void *pVoidMem = static_cast<void *>(m_pMem); memcpy(pVoidMem, static_cast<const void *>(data_ptr), data_length); m_pMem[data_length] = '\0'; if (archive_read_open_memory(m_pHandle, pVoidMem, (size_t)data_length) == ARCHIVE_OK) { m_parseDocument = true; #ifdef DEBUG clog << "ArchiveFilter::set_document_data: " << m_mimeType << ", format " << archive_format(m_pHandle) << endl; #endif return true; } free(m_pMem); m_pMem = NULL; return false; } bool ArchiveFilter::set_document_string(const string &data_str) { return set_document_data(data_str.c_str(), data_str.length()); } bool ArchiveFilter::set_document_file(const string &file_path, bool unlink_when_done) { if (Filter::set_document_file(file_path, unlink_when_done) == true) { int openFlags = O_RDONLY; #ifdef O_CLOEXEC openFlags |= O_CLOEXEC; #endif initialize(); if (m_pHandle == NULL) { return false; } // Open the archive #ifdef O_NOATIME m_fd = open(file_path.c_str(), openFlags|O_NOATIME); #else m_fd = open(file_path.c_str(), openFlags); #endif #ifdef O_NOATIME if ((m_fd < 0) && (errno == EPERM)) { // Try again m_fd = open(file_path.c_str(), openFlags); } #endif if (m_fd < 0) { #ifdef DEBUG clog << "ArchiveFilter::set_document_file: couldn't open " << file_path << endl; #endif return false; } #ifndef O_CLOEXEC int fdFlags = fcntl(m_fd, F_GETFD); fcntl(m_fd, F_SETFD, fdFlags|FD_CLOEXEC); #endif if (archive_read_open_fd(m_pHandle, m_fd, 10240) == ARCHIVE_OK) { m_parseDocument = true; #ifdef DEBUG clog << "ArchiveFilter::set_document_file: " << file_path << ", " << m_mimeType << ", format " << archive_format(m_pHandle) << endl; #endif return true; }
close(m_fd); m_fd = -1; } return false; } bool ArchiveFilter::set_document_uri(const string &uri) { return false; } bool ArchiveFilter::has_documents(void) const { return m_parseDocument; } bool ArchiveFilter::next_document(void) { return next_document(""); } void ArchiveFilter::initialize(void) { m_pHandle = archive_read_new(); if (m_pHandle != NULL) { // Enable what we need for the given type if ((m_mimeType == "application/x-archive") || (m_mimeType == "application/x-deb")) { archive_read_support_format_ar(m_pHandle); } else if (m_mimeType == "application/x-bzip-compressed-tar") { archive_read_support_filter_bzip2(m_pHandle); archive_read_support_format_tar(m_pHandle); archive_read_support_format_gnutar(m_pHandle); } else if (m_mimeType == "application/x-compressed-tar") { archive_read_support_filter_gzip(m_pHandle); archive_read_support_format_tar(m_pHandle); archive_read_support_format_gnutar(m_pHandle); } else if ((m_mimeType == "application/x-cd-image") || (m_mimeType == "application/x-iso9660-image")) { archive_read_support_format_iso9660(m_pHandle); } else if (m_mimeType == "application/x-tar") { archive_read_support_format_tar(m_pHandle); archive_read_support_format_gnutar(m_pHandle); } else if (m_mimeType == "application/x-tarz") { archive_read_support_filter_compress(m_pHandle); archive_read_support_format_tar(m_pHandle); archive_read_support_format_gnutar(m_pHandle); } } } bool ArchiveFilter::next_document(const string &ipath) { struct archive_entry *pEntry = NULL; const char *pFileName = NULL; bool foundFile = false; if ((m_parseDocument == false) || (m_pHandle == NULL)) { return false; } do { if (archive_read_next_header(m_pHandle, &pEntry) != ARCHIVE_OK) { #ifdef DEBUG clog << "ArchiveFilter::next_document: no more entries" << endl; #endif m_parseDocument = false; return false; } pFileName = archive_entry_pathname(pEntry); if (pFileName == NULL) { return false; } if (ipath.empty() == true) { foundFile = true; } else if (ipath != pFileName) { if 
(archive_read_data_skip(m_pHandle) != ARCHIVE_OK) { m_parseDocument = false; return false; } } else { foundFile = true; } } while (foundFile == false); stringstream sizeStream; const struct stat *pEntryStats = archive_entry_stat(pEntry); if (pEntryStats == NULL) { return false; } off_t size = pEntryStats->st_size; m_content.clear(); m_metaData.clear(); m_metaData["title"] = pFileName; m_metaData["ipath"] = string("f=") + pFileName; sizeStream << size; m_metaData["size"] = sizeStream.str(); #ifdef DEBUG clog << "ArchiveFilter::next_document: found " << pFileName << ", size " << size << " bytes" << endl; #endif if (S_ISDIR(pEntryStats->st_mode)) { m_metaData["mimetype"] = "x-directory/normal"; } else if (S_ISLNK(pEntryStats->st_mode)) { m_metaData["mimetype"] = "inode/symlink"; } else if (S_ISREG(pEntryStats->st_mode)) { const void *pBuffer = NULL; size_t readSize = 0, totalSize = 0; off_t offset = 0; bool readFile = true; m_metaData["mimetype"] = "SCANTITLE"; while (archive_read_data_block(m_pHandle, &pBuffer, &readSize, &offset) == ARCHIVE_OK) { totalSize += readSize; if ((readFile == true) && (m_maxSize > 0) && (totalSize > m_maxSize)) { #ifdef DEBUG clog << "ArchiveFilter::next_document: stopping at " << totalSize << endl; #endif readFile = false; } if (readFile == true) { m_content.append(static_cast<const char *>(pBuffer), readSize); } } #ifdef DEBUG clog << "ArchiveFilter::next_document: read " << totalSize << "/" << m_content.size() << " bytes" << endl; #endif return true; } return true; } bool ArchiveFilter::skip_to_document(const string &ipath) { string::size_type fPos = ipath.find("f="); if (fPos != 0) { return false; } return next_document(ipath.substr(2)); } string ArchiveFilter::get_error(void) const { return ""; } void ArchiveFilter::rewind(void) { Filter::rewind(); m_parseDocument = m_isBig = false; if (m_pHandle != NULL) { archive_read_close(m_pHandle); archive_read_free(m_pHandle); m_pHandle = NULL; } if (m_pMem != NULL) { free(m_pMem); m_pMem = NULL; } if (m_fd
>= 0) { close(m_fd); m_fd = -1; } } pinot-1.22/Tokenize/filters/ArchiveFilter.h000066400000000000000000000075661470740426600207570ustar00rootroot00000000000000/* * Copyright 2009-2016 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _DIJON_ARCHIVEFILTER_H #define _DIJON_ARCHIVEFILTER_H #include #include #include "Filter.h" namespace Dijon { class ArchiveFilter : public Filter { public: /// Builds an empty filter. ArchiveFilter(); /// Destroys the filter. virtual ~ArchiveFilter(); // Information. /// Sets the MIME type the filter will handle. virtual void set_mime_type(const std::string &mime_type); /// Returns what data the filter requires as input. virtual bool is_data_input_ok(DataInput input) const; // Initialization. /** Sets a property, prior to calling set_document_XXX(). * Returns false if the property is not supported. */ virtual bool set_property(Properties prop_name, const std::string &prop_value); /** (Re)initializes the filter with the given data. * Caller should ensure the given pointer is valid until the * Filter object is destroyed, as some filters may not need to * do a deep copy of the data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. 
*/ virtual bool set_document_data(const char *data_ptr, off_t data_length); /** (Re)initializes the filter with the given data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_string(const std::string &data_str); /** (Re)initializes the filter with the given file. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_file(const std::string &file_path, bool unlink_when_done = false); /** (Re)initializes the filter with the given URI. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_uri(const std::string &uri); // Going from one nested document to the next. /** Returns true if there are nested documents left to extract. * Returns false if the end of the parent document was reached * or an error occurred. */ virtual bool has_documents(void) const; /** Moves to the next nested document. * Returns false if there are none left. */ virtual bool next_document(void); /** Skips to the nested document with the given ipath. * Returns false if no such document exists. */ virtual bool skip_to_document(const std::string &ipath); // Accessing documents' contents. /// Returns the message for the most recent error that has occurred. virtual std::string get_error(void) const; protected: off_t m_maxSize; bool m_parseDocument; bool m_isBig; char *m_pMem; int m_fd; struct archive *m_pHandle; virtual void rewind(void); void initialize(void); bool next_document(const std::string &ipath); private: /// ArchiveFilter objects cannot be copied. ArchiveFilter(const ArchiveFilter &other); /// ArchiveFilter objects cannot be copied. 
ArchiveFilter& operator=(const ArchiveFilter& other); }; } #endif // _DIJON_ARCHIVEFILTER_H pinot-1.22/Tokenize/filters/ChmFilter.cc000066400000000000000000000133041470740426600202260ustar00rootroot00000000000000/* * Copyright 2011-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include #include #include #include #include "ChmFilter.h" using std::string; using std::vector; using std::clog; using std::endl; using std::stringstream; using namespace Dijon; #ifdef _DYNAMIC_DIJON_FILTERS DIJON_FILTER_EXPORT bool get_filter_types(MIMETypes &mime_types) { mime_types.m_mimeTypes.clear(); mime_types.m_mimeTypes.insert("application/x-chm"); return true; } DIJON_FILTER_EXPORT bool check_filter_data_input(int data_input) { Filter::DataInput input = (Filter::DataInput)data_input; if (input == Filter::DOCUMENT_FILE_NAME) { return true; } return false; } DIJON_FILTER_EXPORT Filter *get_filter(void) { return new ChmFilter(); } #endif static int enumerator(struct chmFile *pHandle, struct chmUnitInfo *pUnitInfo, void *pContext) { if (pContext == NULL) { return CHM_ENUMERATOR_CONTINUE; } ChmFilter *pFilter = static_cast<ChmFilter *>(pContext); if ((pUnitInfo != NULL) && (pUnitInfo->flags & CHM_ENUMERATE_FILES) && (pUnitInfo->length > 0) && (pUnitInfo->path != NULL)) { #ifdef DEBUG clog <<
"ChmFilter: found " << pUnitInfo->path << ", size " << pUnitInfo->length << endl; #endif pFilter->add_unit(pUnitInfo); } return CHM_ENUMERATOR_CONTINUE; } ChmFilter::ChmFilter() : Filter(), m_maxSize(0), m_pHandle(NULL), m_doneAll(false) { } ChmFilter::~ChmFilter() { rewind(); } bool ChmFilter::is_data_input_ok(DataInput input) const { if (input == DOCUMENT_FILE_NAME) { return true; } return false; } bool ChmFilter::set_property(Properties prop_name, const string &prop_value) { if ((prop_name == MAXIMUM_NESTED_SIZE) && (prop_value.empty() == false)) { m_maxSize = (size_t)atol(prop_value.c_str()); } return false; } bool ChmFilter::set_document_data(const char *data_ptr, off_t data_length) { return false; } bool ChmFilter::set_document_string(const string &data_str) { return false; } bool ChmFilter::set_document_file(const string &file_path, bool unlink_when_done) { if ((Filter::set_document_file(file_path, unlink_when_done) == true) && ((m_pHandle = chm_open(file_path.c_str())) != NULL)) { return true; } return false; } bool ChmFilter::set_document_uri(const string &uri) { return false; } bool ChmFilter::has_documents(void) const { if ((m_pHandle != NULL) && (m_doneAll == false)) { return true; } return false; } bool ChmFilter::next_document(void) { return next_document(""); } bool ChmFilter::next_document(const string &ipath) { struct chmUnitInfo unitInfo; struct chmUnitInfo *pUnitInfo = NULL; bool deleteUnitInfo = false; if (m_pHandle == NULL) { return false; } m_content.clear(); m_metaData.clear(); if (ipath.empty() == false) { // Resolve this if (chm_resolve_object(m_pHandle, ipath.c_str(), &unitInfo) != CHM_RESOLVE_SUCCESS) { return false; } pUnitInfo = &unitInfo; m_doneAll = true; } else { if (m_units.empty() == true) { // Enumerate content if ((chm_enumerate(m_pHandle, CHM_ENUMERATE_ALL, enumerator, this) == 0)) { return false; } } vector::iterator unitIter = m_units.begin(); if (unitIter == m_units.end()) { return false; } pUnitInfo = *unitIter; 
deleteUnitInfo = true; m_units.erase(unitIter); m_doneAll = m_units.empty(); } if (pUnitInfo != NULL) { char *pBuffer = NULL; if (pUnitInfo->length > 0) { pBuffer = Memory::allocateBuffer(pUnitInfo->length + 1); } if (pBuffer != NULL) { if ((pUnitInfo->path != NULL) && (chm_retrieve_object(m_pHandle, pUnitInfo, (unsigned char*)pBuffer, 0, pUnitInfo->length) != 0)) { stringstream sizeStream; m_content = pBuffer; m_metaData["title"] = pUnitInfo->path; m_metaData["ipath"] = pUnitInfo->path; sizeStream << pUnitInfo->length; m_metaData["size"] = sizeStream.str(); m_metaData["mimetype"] = "SCAN"; #ifdef DEBUG clog << "ChmFilter::next_document: returning " << pUnitInfo->path << ", size " << m_content.size() << endl; #endif } Memory::freeBuffer(pBuffer, pUnitInfo->length + 1); } if (deleteUnitInfo == true) { delete pUnitInfo; } } return !m_content.empty(); } bool ChmFilter::skip_to_document(const string &ipath) { return next_document(ipath); } string ChmFilter::get_error(void) const { return ""; } void ChmFilter::rewind(void) { Filter::rewind(); for (vector<struct chmUnitInfo *>::iterator unitIter = m_units.begin(); unitIter != m_units.end(); ++unitIter) { struct chmUnitInfo *pUnitInfo = *unitIter; delete pUnitInfo; } m_units.clear(); if (m_pHandle != NULL) { chm_close(m_pHandle); m_pHandle = NULL; } m_doneAll = false; } void ChmFilter::add_unit(struct chmUnitInfo *pUnitInfo) { if (pUnitInfo == NULL) { return; } struct chmUnitInfo *pCopy = new struct chmUnitInfo; memcpy((void*)pCopy, pUnitInfo, sizeof(struct chmUnitInfo)); m_units.push_back(pCopy); } pinot-1.22/Tokenize/filters/ChmFilter.h000066400000000000000000000074451470740426600201010ustar00rootroot00000000000000/* * Copyright 2011-2016 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version.
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _DIJON_CHMFILTER_H #define _DIJON_CHMFILTER_H #include #include #include #include "Filter.h" namespace Dijon { class ChmFilter : public Filter { public: /// Builds an empty filter. ChmFilter(); /// Destroys the filter. virtual ~ChmFilter(); // Information. /// Returns what data the filter requires as input. virtual bool is_data_input_ok(DataInput input) const; // Initialization. /** Sets a property, prior to calling set_document_XXX(). * Returns false if the property is not supported. */ virtual bool set_property(Properties prop_name, const std::string &prop_value); /** (Re)initializes the filter with the given data. * Caller should ensure the given pointer is valid until the * Filter object is destroyed, as some filters may not need to * do a deep copy of the data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_data(const char *data_ptr, off_t data_length); /** (Re)initializes the filter with the given data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_string(const std::string &data_str); /** (Re)initializes the filter with the given file. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. 
*/ virtual bool set_document_file(const std::string &file_path, bool unlink_when_done = false); /** (Re)initializes the filter with the given URI. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_uri(const std::string &uri); // Going from one nested document to the next. /** Returns true if there are nested documents left to extract. * Returns false if the end of the parent document was reached * or an error occurred. */ virtual bool has_documents(void) const; /** Moves to the next nested document. * Returns false if there are none left. */ virtual bool next_document(void); /** Skips to the nested document with the given ipath. * Returns false if no such document exists. */ virtual bool skip_to_document(const std::string &ipath); // Accessing documents' contents. /// Returns the message for the most recent error that has occurred. virtual std::string get_error(void) const; // Enumeration. /// Adds a unit. void add_unit(struct chmUnitInfo *pUnitInfo); protected: size_t m_maxSize; struct chmFile *m_pHandle; std::vector<struct chmUnitInfo *> m_units; bool m_doneAll; virtual void rewind(void); bool next_document(const std::string &ipath); private: /// ChmFilter objects cannot be copied. ChmFilter(const ChmFilter &other); /// ChmFilter objects cannot be copied. ChmFilter& operator=(const ChmFilter& other); }; } #endif // _DIJON_CHMFILTER_H pinot-1.22/Tokenize/filters/ExifImageFilter.cc000066400000000000000000000127221470740426600213600ustar00rootroot00000000000000/* * Copyright 2008-2016 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version.
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include #include #include #include #include #include "config.h" #include "ExifImageFilter.h" using std::string; using std::clog; using std::clog; using std::endl; using namespace Dijon; #ifdef _DYNAMIC_DIJON_FILTERS DIJON_FILTER_EXPORT bool get_filter_types(MIMETypes &mime_types) { mime_types.m_mimeTypes.clear(); mime_types.m_mimeTypes.insert("image/jpeg"); return true; } DIJON_FILTER_EXPORT bool check_filter_data_input(int data_input) { Filter::DataInput input = (Filter::DataInput)data_input; if (input == Filter::DOCUMENT_FILE_NAME) { return true; } return false; } DIJON_FILTER_EXPORT Filter *get_filter(void) { return new ExifImageFilter(); } #endif class ExifMetaData { public: ExifMetaData(dstring &content) : m_content(content) { } string m_title; string m_date; dstring &m_content; }; static void entryCallback(ExifEntry *pEntry, void *pData) { if ((pEntry == NULL) || (pData == NULL)) { return; } ExifMetaData *pMetaData = (ExifMetaData *)pData; struct tm timeTm; char value[1024]; // Initialize the structure timeTm.tm_sec = timeTm.tm_min = timeTm.tm_hour = timeTm.tm_mday = 0; timeTm.tm_mon = timeTm.tm_year = timeTm.tm_wday = timeTm.tm_yday = timeTm.tm_isdst = 0; exif_entry_get_value(pEntry, value, 1024); switch (pEntry->tag) { case EXIF_TAG_DOCUMENT_NAME: pMetaData->m_title = value; break; case EXIF_TAG_DATE_TIME: #ifdef HAVE_STRPTIME if (strptime(value, "%Y:%m:%d %H:%M:%S", &timeTm) != NULL) #else { string valueStr(value); timeTm.tm_year = atoi(valueStr.substr(0, 
4).c_str()) - 1900; /* struct tm stores years since 1900 and zero-based months */ timeTm.tm_mon = atoi(valueStr.substr(5, 2).c_str()) - 1; timeTm.tm_mday = atoi(valueStr.substr(8, 2).c_str()); timeTm.tm_hour = atoi(valueStr.substr(11, 2).c_str()); timeTm.tm_min = atoi(valueStr.substr(14, 2).c_str()); timeTm.tm_sec = atoi(valueStr.substr(17, 2).c_str()); } if (timeTm.tm_mday > 0) #endif { char timeStr[64]; #if defined(__GNU_LIBRARY__) // %z is a GNU extension if (strftime(timeStr, 64, "%a, %d %b %Y %H:%M:%S %z", &timeTm) > 0) #else if (strftime(timeStr, 64, "%a, %d %b %Y %H:%M:%S %Z", &timeTm) > 0) #endif { pMetaData->m_date = timeStr; } } break; default: pMetaData->m_content += " "; pMetaData->m_content.append(value, strlen(value)); break; } #ifdef DEBUG clog << "ExifImageFilter: tag " << exif_tag_get_name(pEntry->tag) << ": " << value << endl; #endif } static void contentCallback(ExifContent *pContent, void *pData) { exif_content_foreach_entry(pContent, entryCallback, pData); } ExifImageFilter::ExifImageFilter() : Filter(), m_parseDocument(false) { } ExifImageFilter::~ExifImageFilter() { rewind(); } bool ExifImageFilter::is_data_input_ok(DataInput input) const { if (input == DOCUMENT_FILE_NAME) { return true; } return false; } bool ExifImageFilter::set_property(Properties prop_name, const string &prop_value) { return false; } bool ExifImageFilter::set_document_data(const char *data_ptr, off_t data_length) { return false; } bool ExifImageFilter::set_document_string(const string &data_str) { return false; } bool ExifImageFilter::set_document_file(const string &file_path, bool unlink_when_done) { if (Filter::set_document_file(file_path, unlink_when_done) == true) { m_parseDocument = true; return true; } return false; } bool ExifImageFilter::set_document_uri(const string &uri) { return false; } bool ExifImageFilter::has_documents(void) const { return m_parseDocument; } bool ExifImageFilter::next_document(void) { if (m_parseDocument == true) { #ifdef DEBUG clog << "ExifImageFilter::next_document: " << m_filePath << endl; #endif
m_parseDocument = false; m_metaData["mimetype"] = "text/plain"; m_metaData["charset"] = "utf-8"; ExifData *pData = exif_data_new_from_file(m_filePath.c_str()); if (pData == NULL) { clog << "No EXIF data in " << m_filePath.c_str() << endl; } else { ExifMetaData *pMetaData = new ExifMetaData(m_content); // Get it all exif_data_foreach_content(pData, contentCallback, pMetaData); m_metaData["title"] = pMetaData->m_title; if (pMetaData->m_date.empty() == false) { m_metaData["date"] = pMetaData->m_date; } delete pMetaData; exif_data_unref(pData); } return true; } return false; } bool ExifImageFilter::skip_to_document(const string &ipath) { if (ipath.empty() == true) { return next_document(); } return false; } string ExifImageFilter::get_error(void) const { return ""; } void ExifImageFilter::rewind(void) { Filter::rewind(); m_parseDocument = false; } pinot-1.22/Tokenize/filters/ExifImageFilter.h000066400000000000000000000071551470740426600212260ustar00rootroot00000000000000/* * Copyright 2008-2016 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _DIJON_EXIFIMAGEFILTER_H #define _DIJON_EXIFIMAGEFILTER_H #include #include "Filter.h" namespace Dijon { class ExifImageFilter : public Filter { public: /// Builds an empty filter. ExifImageFilter(); /// Destroys the filter. 
virtual ~ExifImageFilter(); // Information. /// Returns what data the filter requires as input. virtual bool is_data_input_ok(DataInput input) const; // Initialization. /** Sets a property, prior to calling set_document_XXX(). * Returns false if the property is not supported. */ virtual bool set_property(Properties prop_name, const std::string &prop_value); /** (Re)initializes the filter with the given data. * Caller should ensure the given pointer is valid until the * Filter object is destroyed, as some filters may not need to * do a deep copy of the data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_data(const char *data_ptr, off_t data_length); /** (Re)initializes the filter with the given data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_string(const std::string &data_str); /** (Re)initializes the filter with the given file. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_file(const std::string &file_path, bool unlink_when_done = false); /** (Re)initializes the filter with the given URI. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_uri(const std::string &uri); // Going from one nested document to the next. /** Returns true if there are nested documents left to extract. * Returns false if the end of the parent document was reached * or an error occurred. */ virtual bool has_documents(void) const; /** Moves to the next nested document. * Returns false if there are none left. */ virtual bool next_document(void); /** Skips to the nested document with the given ipath. 
* Returns false if no such document exists. */ virtual bool skip_to_document(const std::string &ipath); // Accessing documents' contents. /// Returns the message for the most recent error that has occurred. virtual std::string get_error(void) const; protected: bool m_parseDocument; virtual void rewind(void); private: /// ExifImageFilter objects cannot be copied. ExifImageFilter(const ExifImageFilter &other); /// ExifImageFilter objects cannot be copied. ExifImageFilter& operator=(const ExifImageFilter& other); }; } #endif // _DIJON_EXIFIMAGEFILTER_H pinot-1.22/Tokenize/filters/Exiv2ImageFilter.cc000066400000000000000000000245751470740426600214730ustar00rootroot00000000000000/* * Copyright 2011-2019 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include "config.h" #include #include #include #include #include #include #include #include #include #include #ifdef HAVE_EXIV2_XMP_EXIV2_HPP #include #include #else #include #endif #include "config.h" #include "Exiv2ImageFilter.h" using std::string; using std::clog; using std::clog; using std::endl; using namespace Dijon; #ifdef _DYNAMIC_DIJON_FILTERS DIJON_FILTER_EXPORT bool get_filter_types(MIMETypes &mime_types) { mime_types.m_mimeTypes.clear(); // List from http://dev.exiv2.org/wiki/exiv2/Supported_image_formats // without application/rdf+xml mime_types.m_mimeTypes.insert("image/jpeg"); mime_types.m_mimeTypes.insert("image/x-exv"); mime_types.m_mimeTypes.insert("image/x-canon-cr2"); mime_types.m_mimeTypes.insert("image/x-canon-crw"); mime_types.m_mimeTypes.insert("image/x-minolta-mrw"); mime_types.m_mimeTypes.insert("image/tiff"); mime_types.m_mimeTypes.insert("image/x-nikon-nef"); mime_types.m_mimeTypes.insert("image/x-pentax-pef"); mime_types.m_mimeTypes.insert("image/x-panasonic-rw2"); mime_types.m_mimeTypes.insert("image/x-samsung-srw"); mime_types.m_mimeTypes.insert("image/x-olympus-orf"); mime_types.m_mimeTypes.insert("image/png"); mime_types.m_mimeTypes.insert("image/pgf"); mime_types.m_mimeTypes.insert("image/x-fuji-raf"); mime_types.m_mimeTypes.insert("image/x-photoshop"); mime_types.m_mimeTypes.insert("image/targa"); mime_types.m_mimeTypes.insert("image/x-ms-bmp"); mime_types.m_mimeTypes.insert("image/jp2"); return true; } DIJON_FILTER_EXPORT bool check_filter_data_input(int data_input) { Filter::DataInput input = (Filter::DataInput)data_input; if (input == Filter::DOCUMENT_FILE_NAME) { return true; } return false; } DIJON_FILTER_EXPORT Filter *get_filter(void) { return new Exiv2ImageFilter(); } #endif static string iptcDateTime(const string &ccyymmdd, const string &hhmmss) { struct tm timeTm; // Initialize the structure timeTm.tm_sec = timeTm.tm_min = timeTm.tm_hour = timeTm.tm_mday = 0; timeTm.tm_mon = timeTm.tm_year = timeTm.tm_wday = 
timeTm.tm_yday = timeTm.tm_isdst = 0; #ifdef HAVE_STRPTIME if ((strptime(ccyymmdd.c_str(), "%C%Y%m%d", &timeTm) != NULL) && (strptime(hhmmss.c_str(), "%H%M%S", &timeTm) != NULL)) #else timeTm.tm_year = atoi(ccyymmdd.substr(2, 4).c_str()); timeTm.tm_mon = atoi(ccyymmdd.substr(6, 2).c_str()) - 1; timeTm.tm_mday = atoi(ccyymmdd.substr(8, 2).c_str()); timeTm.tm_hour = atoi(hhmmss.substr(0, 2).c_str()); timeTm.tm_min = atoi(hhmmss.substr(2, 2).c_str()); timeTm.tm_sec = atoi(hhmmss.substr(4, 2).c_str()); if (timeTm.tm_mday > 0) #endif { char timeStr[64]; if (strftime(timeStr, 64, "%a, %d %b %Y %H:%M:%S", &timeTm) > 0) { #ifdef DEBUG clog << "IPTC " << ccyymmdd << " " << hhmmss << " is " << timeStr << endl; #endif return timeStr; } } return ""; } static string exifDateTime(const string &value) { struct tm timeTm; // Initialize the structure timeTm.tm_sec = timeTm.tm_min = timeTm.tm_hour = timeTm.tm_mday = 0; timeTm.tm_mon = timeTm.tm_year = timeTm.tm_wday = timeTm.tm_yday = timeTm.tm_isdst = 0; #ifdef HAVE_STRPTIME if (strptime(value.c_str(), "%Y:%m:%d %H:%M:%S", &timeTm) != NULL) #else timeTm.tm_year = atoi(value.substr(0, 4).c_str()) - 1900; timeTm.tm_mon = atoi(value.substr(5, 2).c_str()) - 1; timeTm.tm_mday = atoi(value.substr(8, 2).c_str()); timeTm.tm_hour = atoi(value.substr(11, 2).c_str()); timeTm.tm_min = atoi(value.substr(14, 2).c_str()); timeTm.tm_sec = atoi(value.substr(17, 2).c_str()); if (timeTm.tm_mday > 0) #endif { char timeStr[64]; if (strftime(timeStr, 64, "%a, %d %b %Y %H:%M:%S", &timeTm) > 0) { #ifdef DEBUG clog << "EXIF " << value << " is " << timeStr << endl; #endif return timeStr; } } return ""; } Exiv2ImageFilter::Exiv2ImageFilter() : Filter(), m_parseDocument(false) { } Exiv2ImageFilter::~Exiv2ImageFilter() { rewind(); } bool Exiv2ImageFilter::is_data_input_ok(DataInput input) const { if (input == DOCUMENT_FILE_NAME) { return true; } return false; } bool Exiv2ImageFilter::set_property(Properties prop_name, const string &prop_value) { return false; } bool
Exiv2ImageFilter::set_document_data(const char *data_ptr, off_t data_length) { return false; } bool Exiv2ImageFilter::set_document_string(const string &data_str) { return false; } bool Exiv2ImageFilter::set_document_file(const string &file_path, bool unlink_when_done) { if (Filter::set_document_file(file_path, unlink_when_done) == true) { m_parseDocument = true; return true; } return false; } bool Exiv2ImageFilter::set_document_uri(const string &uri) { return false; } bool Exiv2ImageFilter::has_documents(void) const { return m_parseDocument; } bool Exiv2ImageFilter::next_document(void) { bool foundData = true; if (m_parseDocument == false) { return false; } #ifdef DEBUG clog << "Exiv2ImageFilter::next_document: " << m_filePath << endl; #endif m_parseDocument = false; m_metaData["mimetype"] = "text/plain"; m_metaData["charset"] = "utf-8"; m_metaData["title"] = m_filePath; try { #if EXIV2_TEST_VERSION(0,28,0) Exiv2::Image::UniquePtr image = Exiv2::ImageFactory::open(m_filePath); #else Exiv2::Image::AutoPtr image = Exiv2::ImageFactory::open(m_filePath); #endif if (image.get() == NULL) { clog << m_filePath.c_str() << " is not an image" << endl; return false; } image->readMetadata(); // Tag reference at http://www.exiv2.org/metadata.html Exiv2::XmpData &xmpData = image->xmpData(); if (xmpData.empty() == false) { #ifdef DEBUG clog << "Exiv2ImageFilter::next_document: XMP data in " << m_filePath << endl; #endif for (Exiv2::XmpData::const_iterator tagIter = xmpData.begin(); tagIter != xmpData.end(); ++tagIter) { const char *pTypeName = tagIter->typeName(); if ((pTypeName == NULL) || (strncasecmp(pTypeName, "Text", 4) != 0)) { continue; } const Exiv2::Value &value = tagIter->value(); string key(tagIter->key()); string valueStr(value.toString()); if (valueStr.empty() == false) { m_content += " "; m_content.append(key.c_str(), key.length()); m_content += " "; m_content.append(valueStr.c_str(), valueStr.length()); } #ifdef DEBUG clog << "Exiv2ImageFilter::next_document: " << 
key << "=" << value << endl; #endif } } Exiv2::IptcData &iptcData = image->iptcData(); if (iptcData.empty() == false) { string iptcDate, iptcTime; #ifdef DEBUG clog << "Exiv2ImageFilter::next_document: IPTC data in " << m_filePath << endl; #endif for (Exiv2::IptcData::const_iterator tagIter = iptcData.begin(); tagIter != iptcData.end(); ++tagIter) { const char *pTypeName = tagIter->typeName(); if (pTypeName == NULL) { continue; } const Exiv2::Value &value = tagIter->value(); string key(tagIter->key()); string valueStr(value.toString()); #ifdef DEBUG clog << "Exiv2ImageFilter::next_document: " << key << "=" << value << endl; #endif if ((strncasecmp(pTypeName, "Date", 4) == 0) && (key == "Iptc.Application2.DateCreated")) { iptcDate = valueStr; } else if ((strncasecmp(pTypeName, "Time", 4) == 0) && (key == "Iptc.Application2.TimeCreated")) { iptcTime = valueStr; } else if (strncasecmp(pTypeName, "String", 6) != 0) { continue; } if (key.find(".ObjectName") != string::npos) { m_metaData["title"] = valueStr; } else if (valueStr.empty() == false) { m_content += " "; m_content.append(key.c_str(), key.length()); m_content += " "; m_content.append(valueStr.c_str(), valueStr.length()); } } if ((iptcDate.empty() == false) || (iptcTime.empty() == false)) { m_metaData["date"] = iptcDateTime(iptcDate, iptcTime); } } Exiv2::ExifData &exifData = image->exifData(); if (exifData.empty() == false) { bool foundDate = false; #ifdef DEBUG clog << "Exiv2ImageFilter::next_document: EXIF data in " << m_filePath << endl; #endif for (Exiv2::ExifData::const_iterator tagIter = exifData.begin(); tagIter != exifData.end(); ++tagIter) { const char *pTypeName = tagIter->typeName(); if ((pTypeName == NULL) || (strncasecmp(pTypeName, "Ascii", 5) != 0)) { continue; } const Exiv2::Value &value = tagIter->value(); string key(tagIter->key()); string valueStr(value.toString()); #ifdef DEBUG clog << "Exiv2ImageFilter::next_document: " << key << "=" << value << endl; #endif if (key == 
"Exif.Image.DocumentName") { m_metaData["title"] = valueStr; } else if (key.find("Date") != string::npos) { if (((key == "Exif.Photo.DateTimeOriginal") || (key == "Exif.Image.DateTimeOriginal")) && (foundDate == false)) { m_metaData["date"] = exifDateTime(valueStr); foundDate = true; } } else if (valueStr.empty() == false) { m_content += " "; m_content.append(key.c_str(), key.length()); m_content += " "; m_content.append(valueStr.c_str(), valueStr.length()); } } } } #if EXIV2_TEST_VERSION(0,28,0) catch (Exiv2::Error &e) #else catch (Exiv2::AnyError &e) #endif { clog << "Caught exiv2 exception: " << e << endl; foundData = false; } catch (...) { clog << "Caught unknown exception" << endl; foundData = false; } return foundData; } bool Exiv2ImageFilter::skip_to_document(const string &ipath) { if (ipath.empty() == true) { return next_document(); } return false; } string Exiv2ImageFilter::get_error(void) const { return ""; } void Exiv2ImageFilter::rewind(void) { Filter::rewind(); m_parseDocument = false; } pinot-1.22/Tokenize/filters/Exiv2ImageFilter.h000066400000000000000000000071711470740426600213260ustar00rootroot00000000000000/* * Copyright 2011-2016 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #ifndef _DIJON_EXIV2IMAGEFILTER_H #define _DIJON_EXIV2IMAGEFILTER_H #include #include "Filter.h" namespace Dijon { class Exiv2ImageFilter : public Filter { public: /// Builds an empty filter. Exiv2ImageFilter(); /// Destroys the filter. virtual ~Exiv2ImageFilter(); // Information. /// Returns what data the filter requires as input. virtual bool is_data_input_ok(DataInput input) const; // Initialization. /** Sets a property, prior to calling set_document_XXX(). * Returns false if the property is not supported. */ virtual bool set_property(Properties prop_name, const std::string &prop_value); /** (Re)initializes the filter with the given data. * Caller should ensure the given pointer is valid until the * Filter object is destroyed, as some filters may not need to * do a deep copy of the data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_data(const char *data_ptr, off_t data_length); /** (Re)initializes the filter with the given data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_string(const std::string &data_str); /** (Re)initializes the filter with the given file. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_file(const std::string &file_path, bool unlink_when_done = false); /** (Re)initializes the filter with the given URI. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_uri(const std::string &uri); // Going from one nested document to the next. /** Returns true if there are nested documents left to extract. 
* Returns false if the end of the parent document was reached * or an error occurred. */ virtual bool has_documents(void) const; /** Moves to the next nested document. * Returns false if there are none left. */ virtual bool next_document(void); /** Skips to the nested document with the given ipath. * Returns false if no such document exists. */ virtual bool skip_to_document(const std::string &ipath); // Accessing documents' contents. /// Returns the message for the most recent error that has occurred. virtual std::string get_error(void) const; protected: bool m_parseDocument; virtual void rewind(void); private: /// Exiv2ImageFilter objects cannot be copied. Exiv2ImageFilter(const Exiv2ImageFilter &other); /// Exiv2ImageFilter objects cannot be copied. Exiv2ImageFilter& operator=(const Exiv2ImageFilter& other); }; } #endif // _DIJON_EXIV2IMAGEFILTER_H pinot-1.22/Tokenize/filters/ExternalFilter.cc000066400000000000000000000237771470740426600213200ustar00rootroot00000000000000/* * Copyright 2007-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include "config.h" #include #include #include #include #include #include #ifdef HAVE_SOCKETPAIR #ifdef HAVE_FORK #ifdef HAVE_SETRLIMIT #include #include #include #include #include #endif #endif #endif #include #include #include #include #include #include #include "ExternalFilter.h" using std::clog; using std::endl; using std::min; using std::string; using std::set; using std::map; using namespace Dijon; #ifdef _DYNAMIC_DIJON_FILTERS DIJON_FILTER_EXPORT bool get_filter_types(MIMETypes &mime_types) { #ifdef _DIJON_EXTERNALFILTER_CONFFILE ExternalFilter::initialize(_DIJON_EXTERNALFILTER_CONFFILE, mime_types); #else ExternalFilter::initialize("/etc/dijon/external-filters.xml", mime_types); #endif return true; } DIJON_FILTER_EXPORT bool check_filter_data_input(int data_input) { Filter::DataInput input = (Filter::DataInput)data_input; if (input == Filter::DOCUMENT_FILE_NAME) { return true; } return false; } DIJON_FILTER_EXPORT Filter *get_filter(void) { return new ExternalFilter(); } #endif // This function is heavily inspired by Xapian Omega's shell_protect() static string shell_protect(const string &file_name) { string safefile(file_name); string::size_type p = 0; if ((safefile.empty() == false) && (safefile[0] == '-')) { // If the filename starts with a '-', protect it from being treated as // an option by prepending "./". safefile.insert(0, "./"); p = 2; } while (p < safefile.size()) { // Don't escape some safe characters which are common in filenames. 
unsigned char ch = safefile[p]; if ((isalnum(ch) == 0) && (strchr("/._-", ch) == NULL)) { safefile.insert(p, "\\"); ++p; } ++p; } return safefile; } map ExternalFilter::m_commandsByType; map ExternalFilter::m_outputsByType; map ExternalFilter::m_charsetsByType; ExternalFilter::ExternalFilter() : FileOutputFilter(), m_maxSize(0), m_doneWithDocument(false) { } ExternalFilter::~ExternalFilter() { rewind(); } bool ExternalFilter::is_data_input_ok(DataInput input) const { if (input == DOCUMENT_FILE_NAME) { return true; } return false; } bool ExternalFilter::set_property(Properties prop_name, const string &prop_value) { if ((prop_name == MAXIMUM_NESTED_SIZE) && (prop_value.empty() == false)) { m_maxSize = (off_t)atoll(prop_value.c_str()); } return true; } bool ExternalFilter::set_document_data(const char *data_ptr, off_t data_length) { return false; } bool ExternalFilter::set_document_string(const string &data_str) { return false; } bool ExternalFilter::set_document_uri(const string &uri) { return false; } bool ExternalFilter::has_documents(void) const { if ((m_doneWithDocument == false) && (m_filePath.empty() == false)) { return true; } return false; } bool ExternalFilter::next_document(void) { if ((m_doneWithDocument == false) && (m_mimeType.empty() == false) && (m_filePath.empty() == false) && (m_commandsByType.empty() == false)) { string outputType("text/plain"); ssize_t maxSize = 0; m_doneWithDocument = true; // Is this type supported ? Assume text/plain if not specified map::const_iterator commandIter = m_commandsByType.find(m_mimeType); if ((commandIter == m_commandsByType.end()) || (commandIter->second.empty() == true)) { return false; } // What's the output type ? 
map::const_iterator outputIter = m_outputsByType.find(m_mimeType); if (outputIter != m_outputsByType.end()) { outputType = outputIter->second; } if (outputType != "text/plain") { maxSize = m_maxSize; } if (run_command(commandIter->second, maxSize) == true) { // Fill in general details m_metaData["uri"] = "file://" + m_filePath; m_metaData["mimetype"] = outputType; // Is it in a known charset ? map::const_iterator charsetIter = m_charsetsByType.find(m_mimeType); if (charsetIter != m_charsetsByType.end()) { m_metaData["charset"] = charsetIter->second; } return true; } return false; } rewind(); return false; } bool ExternalFilter::skip_to_document(const string &ipath) { if (ipath.empty() == true) { return next_document(); } return false; } string ExternalFilter::get_error(void) const { return ""; } void ExternalFilter::initialize(const string &config_file, MIMETypes &types) { xmlDoc *pDoc = NULL; xmlNode *pRootElement = NULL; types.m_mimeTypes.clear(); // Parse the file and get the document #if LIBXML_VERSION < 20600 pDoc = xmlParseFile(config_file.c_str()); #else pDoc = xmlReadFile(config_file.c_str(), NULL, XML_PARSE_NOCDATA); #endif if (pDoc == NULL) { return; } // Iterate through the root element's nodes pRootElement = xmlDocGetRootElement(pDoc); for (xmlNode *pCurrentNode = pRootElement->children; pCurrentNode != NULL; pCurrentNode = pCurrentNode->next) { // What type of tag is it ? 
if (pCurrentNode->type != XML_ELEMENT_NODE) { continue; } // Get all filter elements if (xmlStrncmp(pCurrentNode->name, BAD_CAST"filter", 6) == 0) { string mimeType, charset, command, arguments, output; for (xmlNode *pCurrentCodecNode = pCurrentNode->children; pCurrentCodecNode != NULL; pCurrentCodecNode = pCurrentCodecNode->next) { if (pCurrentCodecNode->type != XML_ELEMENT_NODE) { continue; } char *pChildContent = (char*)xmlNodeGetContent(pCurrentCodecNode); if (pChildContent == NULL) { continue; } // Filters are keyed by their MIME type, "extension" is ignored if (xmlStrncmp(pCurrentCodecNode->name, BAD_CAST"mimetype", 8) == 0) { mimeType = pChildContent; } else if (xmlStrncmp(pCurrentCodecNode->name, BAD_CAST"charset", 7) == 0) { charset = pChildContent; } else if (xmlStrncmp(pCurrentCodecNode->name, BAD_CAST"command", 7) == 0) { command = pChildContent; } if (xmlStrncmp(pCurrentCodecNode->name, BAD_CAST"arguments", 9) == 0) { arguments = pChildContent; } else if (xmlStrncmp(pCurrentCodecNode->name, BAD_CAST"output", 6) == 0) { output = pChildContent; } // Free xmlFree(pChildContent); } if ((mimeType.empty() == false) && (command.empty() == false) && (arguments.empty() == false)) { #ifdef DEBUG clog << "ExternalFilter::initialize: " << mimeType << "=" << command << " " << arguments << endl; #endif // Command to run m_commandsByType[mimeType] = command + " " + arguments; // Output if (output.empty() == false) { m_outputsByType[mimeType] = output; } // Charset if (charset.empty() == false) { m_charsetsByType[mimeType] = charset; } types.m_mimeTypes.insert(mimeType); } } } // Free the document xmlFreeDoc(pDoc); } void ExternalFilter::rewind(void) { Filter::rewind(); m_doneWithDocument = false; } // This function is heavily inspired by Xapian Omega's stdout_to_string() bool ExternalFilter::run_command(const string &command, ssize_t maxSize) { string commandLine(command); int fds[2]; int status = 0; bool replacedParam = false, gotOutput = false; string::size_type 
argPos = commandLine.find("%s"); while (argPos != string::npos) { string quotedFilePath(shell_protect(m_filePath)); commandLine.replace(argPos, 2, quotedFilePath); replacedParam = true; // Next, skipping the substituted text so that a "%s" in the file name is not expanded again argPos = commandLine.find("%s", argPos + quotedFilePath.length()); } if (replacedParam == false) { // Append commandLine += " "; commandLine += shell_protect(m_filePath); } // We want to be able to get the exit status of the child process signal(SIGCHLD, SIG_DFL); if (socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC, fds) < 0) { return false; } #ifdef DEBUG clog << "ExternalFilter::run_command: running " << commandLine << endl; #endif // Fork and execute the command pid_t childPid = fork(); if (childPid == 0) { // Child process // Close the parent's side of the socket pair close(fds[0]); // Connect stdout, stderr and stdlog to our side of the socket pair dup2(fds[1], 1); dup2(fds[1], 2); dup2(fds[1], 3); // Limit CPU time for external programs to 300 seconds struct rlimit cpu_limit = { 300, RLIM_INFINITY } ; setrlimit(RLIMIT_CPU, &cpu_limit); execl("/bin/sh", "/bin/sh", "-c", commandLine.c_str(), (void*)NULL); exit(-1); } // Parent process // Close the child's side of the socket pair close(fds[1]); if (childPid == -1) { // The fork failed close(fds[0]); return false; } ssize_t totalSize = 0; gotOutput = read_file(fds[0], maxSize, totalSize); // Close our side of the socket pair close(fds[0]); // Wait until the child terminates pid_t actualChildPid = waitpid(childPid, &status, 0); if ((gotOutput == false) || (actualChildPid == -1)) { return false; } if (status != 0) { if (WIFEXITED(status) && WEXITSTATUS(status) == 127) { #ifdef DEBUG clog << "ExternalFilter::run_command: couldn't run " << command << endl; #endif return false; } } #ifdef SIGXCPU if (WIFSIGNALED(status) && WTERMSIG(status) == SIGXCPU) { #ifdef DEBUG clog << "ExternalFilter::run_command: " << command << " consumed too much CPU" << endl; #endif return false; } #endif return true; }
pinot-1.22/Tokenize/filters/ExternalFilter.h000066400000000000000000000074001470740426600211430ustar00rootroot00000000000000/* * Copyright 2007-2016 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _DIJON_EXTERNALFILTER_H #define _DIJON_EXTERNALFILTER_H #include #include #include #include "FileOutputFilter.h" namespace Dijon { class ExternalFilter : public FileOutputFilter { public: /// Builds an empty filter. ExternalFilter(); /// Destroys the filter. virtual ~ExternalFilter(); /// Parses the configuration file and initializes the class. static void initialize(const std::string &config_file, MIMETypes &types); // Information. /// Returns what data the filter requires as input. virtual bool is_data_input_ok(DataInput input) const; // Initialization. /** Sets a property, prior to calling set_document_XXX(). * Returns false if the property is not supported. */ virtual bool set_property(Properties prop_name, const std::string &prop_value); /** (Re)initializes the filter with the given data. * Caller should ensure the given pointer is valid until the * Filter object is destroyed, as some filters may not need to * do a deep copy of the data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. 
*/ virtual bool set_document_data(const char *data_ptr, off_t data_length); /** (Re)initializes the filter with the given data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_string(const std::string &data_str); /** (Re)initializes the filter with the given URI. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_uri(const std::string &uri); // Going from one nested document to the next. /** Returns true if there are nested documents left to extract. * Returns false if the end of the parent document was reached * or an error occurred. */ virtual bool has_documents(void) const; /** Moves to the next nested document. * Returns false if there are none left. */ virtual bool next_document(void); /** Skips to the nested document with the given ipath. * Returns false if no such document exists. */ virtual bool skip_to_document(const std::string &ipath); // Accessing documents' contents. /// Returns the message for the most recent error that has occurred. virtual std::string get_error(void) const; protected: static std::map m_commandsByType; static std::map m_outputsByType; static std::map m_charsetsByType; off_t m_maxSize; bool m_doneWithDocument; virtual void rewind(void); bool run_command(const std::string &command, ssize_t maxSize); private: /// ExternalFilter objects cannot be copied. ExternalFilter(const ExternalFilter &other); /// ExternalFilter objects cannot be copied. 
ExternalFilter& operator=(const ExternalFilter& other); }; } #endif // _DIJON_EXTERNALFILTER_H pinot-1.22/Tokenize/filters/FileOutputFilter.cc000066400000000000000000000037261470740426600216260ustar00rootroot00000000000000/* * Copyright 2011-2016 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include #include #include "FileOutputFilter.h" using std::string; using std::stringstream; using std::set; using std::map; using std::clog; using std::endl; using namespace Dijon; FileOutputFilter::FileOutputFilter() : Filter() { } FileOutputFilter::~FileOutputFilter() { } bool FileOutputFilter::read_file(int fd, ssize_t maxSize, ssize_t &totalSize) { ssize_t bytesRead = 0; bool gotOutput = true; do { if ((maxSize > 0) && (totalSize >= maxSize)) { #ifdef DEBUG clog << "FileOutputFilter::read_file: stopping at " << totalSize << endl; #endif break; } char readBuffer[4096]; bytesRead = read(fd, readBuffer, 4096); if (bytesRead > 0) { m_content.append(readBuffer, bytesRead); totalSize += bytesRead; } else if (bytesRead == -1) { // An error occurred if (errno != EINTR) { gotOutput = false; break; } // Try again bytesRead = 1; } } while (bytesRead > 0); if (gotOutput == true) { stringstream numStream; numStream << totalSize; m_metaData["size"] = numStream.str(); } return 
gotOutput; } pinot-1.22/Tokenize/filters/FileOutputFilter.h000066400000000000000000000022521470740426600214610ustar00rootroot00000000000000/* * Copyright 2011-2016 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _DIJON_FILEOUTPUTFILTER_H #define _DIJON_FILEOUTPUTFILTER_H #include "Filter.h" namespace Dijon { class DIJON_FILTER_EXPORT FileOutputFilter : public Filter { public: /// Builds an empty filter. FileOutputFilter(); /// Destroys the filter. virtual ~FileOutputFilter(); protected: bool read_file(int fd, ssize_t maxSize, ssize_t &totalSize); }; } #endif // _DIJON_FILEOUTPUTFILTER_H pinot-1.22/Tokenize/filters/Filter.cc000066400000000000000000000036741470740426600176070ustar00rootroot00000000000000/* * Copyright 2007-2021 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include #include #include #include #include #include "Filter.h" using std::string; using std::set; using std::map; using std::clog; using std::endl; using namespace Dijon; MIMETypes::MIMETypes() { } MIMETypes::~MIMETypes() { } Filter::Filter() : m_deleteInputFile(false) { } Filter::~Filter() { deleteInputFile(); } bool Filter::set_document_file(const string &file_path, bool unlink_when_done) { if (file_path.empty() == true) { return false; } rewind(); m_filePath = file_path; m_deleteInputFile = unlink_when_done; return true; } void Filter::set_mime_type(const string &mime_type) { m_mimeType = mime_type; } string Filter::get_mime_type(void) const { return m_mimeType; } const map &Filter::get_meta_data(void) const { return m_metaData; } const dstring &Filter::get_content(void) const { return m_content; } void Filter::rewind(void) { m_metaData.clear(); m_content.clear(); deleteInputFile(); m_filePath.clear(); m_deleteInputFile = false; } void Filter::deleteInputFile(void) { if ((m_deleteInputFile == true) && (m_filePath.empty() == false)) { unlink(m_filePath.c_str()); } } pinot-1.22/Tokenize/filters/Filter.h000066400000000000000000000172231470740426600174440ustar00rootroot00000000000000/* * Copyright 2007-2016 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _DIJON_FILTER_H #define _DIJON_FILTER_H #include #include #include #ifndef DIJON_FILTER_EXPORT #if defined __GNUC__ && (__GNUC__ >= 4) #define DIJON_FILTER_EXPORT __attribute__ ((visibility("default"))) #define DIJON_FILTER_INITIALIZE __attribute__((constructor)) #define DIJON_FILTER_SHUTDOWN __attribute__((destructor)) #else #define DIJON_FILTER_EXPORT #define DIJON_FILTER_INITIALIZE #define DIJON_FILTER_SHUTDOWN #endif #endif #include "Memory.h" namespace Dijon { class Filter; /// MIME types the filter supports. class DIJON_FILTER_EXPORT MIMETypes { public: MIMETypes(); virtual ~MIMETypes(); std::set m_mimeTypes; private: /// MIMETypes objects cannot be copied. MIMETypes(const MIMETypes &other); /// MIMETypes objects cannot be copied. MIMETypes& operator=(const MIMETypes& other); }; /** Provides the list of MIME types supported by the filter(s). * The character string is allocated with new[]. * This function is exported by dynamically loaded filter libraries. */ typedef bool (get_filter_types_func)(MIMETypes &); /** Returns what data should be passed to the filter(s). * Output is cast from Filter::DataInput to int for convenience. * This function is exported by dynamically loaded filter libraries. * The aim is to let the client application know before-hand whether * it should load documents or not. */ typedef bool (check_filter_data_input_func)(int); /** Returns a Filter that handles the given MIME type. * The Filter object is allocated with new. * This function is exported by dynamically loaded filter libraries * and serves as a factory for Filter objects, so that the client * application doesn't have to know which Filter sub-types handle * which MIME types. */ typedef Filter *(get_filter_func)(void); /** Converts text to UTF-8. 
*/ typedef std::string (convert_to_utf8_func)(const char *, off_t, const std::string &); /// Filter interface. class DIJON_FILTER_EXPORT Filter { public: /// Builds an empty filter. Filter(); /// Destroys the filter. virtual ~Filter(); // Enumerations. /** What data a filter supports as input. * It can be either the whole document data, its file name, or its URI. */ typedef enum { DOCUMENT_DATA = 0, DOCUMENT_STRING, DOCUMENT_FILE_NAME, DOCUMENT_URI } DataInput; /** Input properties supported by the filter. * - PREFERRED_CHARSET is the charset preferred by the client application. * The filter will convert document's content to this charset if possible. * - OPERATING_MODE can be set to either view or index. * - MAXIMUM_NESTED_SIZE is the maximum size in bytes of nested documents. */ typedef enum { PREFERRED_CHARSET = 0, OPERATING_MODE, MAXIMUM_NESTED_SIZE } Properties; // Information. /// Sets the MIME type the filter will handle. virtual void set_mime_type(const std::string &mime_type); /// Returns the MIME type handled by the filter. std::string get_mime_type(void) const; /// Returns what data the filter requires as input. virtual bool is_data_input_ok(DataInput input) const = 0; // Initialization. /** Sets a property, prior to calling set_document_XXX(). * Returns false if the property is not supported. */ virtual bool set_property(Properties prop_name, const std::string &prop_value) = 0; /** (Re)initializes the filter with the given data. * Caller should ensure the given pointer is valid until the * Filter object is destroyed, as some filters may not need to * do a deep copy of the data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_data(const char *data_ptr, off_t data_length) = 0; /** (Re)initializes the filter with the given data. * Call next_document() to position the filter onto the first document. 
* Returns false if this input is not supported or an error occurred. */ virtual bool set_document_string(const std::string &data_str) = 0; /** (Re)initializes the filter with the given file. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_file(const std::string &file_path, bool unlink_when_done = false); /** (Re)initializes the filter with the given URI. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_uri(const std::string &uri) = 0; // Going from one nested document to the next. /** Returns true if there are nested documents left to extract. * Returns false if the end of the parent document was reached * or an error occurred. */ virtual bool has_documents(void) const = 0; /** Moves to the next nested document. * Returns false if there are none left. */ virtual bool next_document(void) = 0; /** Skips to the nested document with the given ipath. * Returns false if no such document exists. */ virtual bool skip_to_document(const std::string &ipath) = 0; // Accessing documents' contents. /// Returns the message for the most recent error that has occurred. virtual std::string get_error(void) const = 0; /** Returns a dictionary of metadata extracted from the current document. * Metadata fields may include one or more of the following : * title, ipath, mimetype, language, charset, author, creator, * publisher, modificationdate, creationdate, size * Special considerations apply : * - ipath is an internal path to the nested document that can be * later passed to skip_to_document(). It may be empty if the parent * document's type doesn't allow embedding, in which case the filter * should only return one document. * - mimetype should be text/plain if the document could be handled * internally, empty if unknown. 
If any other value, it is expected * that the client application can pass the nested document's content * to another filter that supports this particular type. */ const std::map &get_meta_data(void) const; /// Returns content. const dstring &get_content(void) const; protected: /// The MIME type handled by the filter. std::string m_mimeType; /// Metadata dictionary. std::map m_metaData; /// Content. dstring m_content; /// The name of the input file, if any. std::string m_filePath; /// Rewinds the filter. virtual void rewind(void); private: /// Whether the input file should be deleted when done. bool m_deleteInputFile; /// Filter objects cannot be copied. Filter(const Filter &other); /// Filter objects cannot be copied. Filter& operator=(const Filter& other); /// Deletes the input file. void deleteInputFile(void); }; } #endif // _DIJON_FILTER_H pinot-1.22/Tokenize/filters/FilterFactory.cc000066400000000000000000000172051470740426600211320ustar00rootroot00000000000000/* * Copyright 2007-2016 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include "config.h" #include #include #include #include #include #include #include #ifdef HAVE_DLFCN_H #include #endif #include #include #include "Filter.h" #include "TextFilter.h" #include "FilterFactory.h" #ifdef HAVE_DLFCN_H #ifdef __CYGWIN__ #define DLOPEN_FLAGS RTLD_LAZY #else #define DLOPEN_FLAGS (RTLD_LAZY|RTLD_LOCAL) #endif #endif //#if defined _GLIBCXX_USE_CXX11_ABI && _GLIBCXX_USE_CXX11_ABI #define GETFILTERTYPESFUNC "_Z16get_filter_typesRN5Dijon9MIMETypesE" #define GETFILTERFUNC "_Z10get_filterv" //#endif using std::clog; using std::clog; using std::endl; using std::string; using std::set; using std::map; using std::copy; using namespace Dijon; map FilterFactory::m_types; map FilterFactory::m_handles; FilterFactory::FilterFactory() { } FilterFactory::~FilterFactory() { } unsigned int FilterFactory::loadFilters(const string &dir_name) { unsigned int count = 0; #ifdef HAVE_DLFCN_H struct stat fileStat; if (dir_name.empty() == true) { return 0; } // Is it a directory ? if ((stat(dir_name.c_str(), &fileStat) == -1) || (!S_ISDIR(fileStat.st_mode))) { clog << "FilterFactory::loadFilters: " << dir_name << " is not a directory" << endl; return 0; } // Scan it DIR *pDir = opendir(dir_name.c_str()); if (pDir == NULL) { return 0; } // Iterate through this directory's entries struct dirent *pDirEntry = readdir(pDir); while (pDirEntry != NULL) { char *pEntryName = pDirEntry->d_name; if (pEntryName != NULL) { string fileName = pEntryName; string::size_type extPos = fileName.find_last_of("."); if ((extPos == string::npos) || (fileName.substr(extPos) != ".so")) { // Next entry pDirEntry = readdir(pDir); continue; } fileName = dir_name; fileName += "/"; fileName += pEntryName; // Check this entry if ((stat(fileName.c_str(), &fileStat) != 0) || (!S_ISREG(fileStat.st_mode))) { clog << "FilterFactory::loadFilters: couldn't stat " << pEntryName << endl; // Next entry pDirEntry = readdir(pDir); continue; } void *pHandle = dlopen(fileName.c_str(), DLOPEN_FLAGS); if (pHandle 
== NULL)
			{
				clog << "FilterFactory::loadFilters: " << dlerror() << endl;

				// Next entry
				pDirEntry = readdir(pDir);
				continue;
			}

			// What type(s) does this support ?
			get_filter_types_func *pTypesFunc = (get_filter_types_func *)dlsym(pHandle,
				GETFILTERTYPESFUNC);
			if (pTypesFunc == NULL)
			{
				clog << "FilterFactory::loadFilters: couldn't find " << GETFILTERTYPESFUNC << ": " << dlerror() << endl;

				dlclose(pHandle);

				// Next entry
				pDirEntry = readdir(pDir);
				continue;
			}

			MIMETypes types;
			unsigned int typeCount = 0;
			bool filterOkay = (*pTypesFunc)(types);

			if (filterOkay == false)
			{
				clog << "FilterFactory::loadFilters: couldn't get types from " << pEntryName << endl;
			}
			else for (set<string>::iterator typeIter = types.m_mimeTypes.begin();
				typeIter != types.m_mimeTypes.end(); ++typeIter)
			{
				string newType(*typeIter);

				if (m_types.find(newType) == m_types.end())
				{
					// Add a record for this filter
					m_types[newType] = fileName;
					++typeCount;
#ifdef DEBUG
					clog << "FilterFactory::loadFilters: type " << newType << " is supported by " << pEntryName << endl;
#endif
				}
			}

			if (typeCount > 0)
			{
				// Keep this library loaded and count it,
				// otherwise the function would always return 0
				m_handles[fileName] = pHandle;
				++count;
			}
			else
			{
#ifdef DEBUG
				clog << "FilterFactory::loadFilters: no useful types from " << fileName << endl;
#endif
				dlclose(pHandle);
			}
		}

		// Next entry
		pDirEntry = readdir(pDir);
	}
	closedir(pDir);
#endif

	return count;
}

Filter *FilterFactory::getLibraryFilter(const string &mime_type)
{
	void *pHandle = NULL;

	if (m_handles.empty() == true)
	{
#ifdef DEBUG
		clog << "FilterFactory::getLibraryFilter: no libraries" << endl;
#endif
		return NULL;
	}

	// Which library handles this type ?
	map<string, string>::iterator typeIter = m_types.find(mime_type);
	if (typeIter == m_types.end())
	{
		// We don't know about this type
		return NULL;
	}
	map<string, void *>::iterator handleIter = m_handles.find(typeIter->second);
	if (handleIter == m_handles.end())
	{
		// We don't know about this library
		return NULL;
	}
	pHandle = handleIter->second;
	if (pHandle == NULL)
	{
		return NULL;
	}

#ifdef HAVE_DLFCN_H
	// Get a filter object then
	get_filter_func *pFunc = (get_filter_func *)dlsym(pHandle,
GETFILTERFUNC); if (pFunc != NULL) { return (*pFunc)(); } #ifdef DEBUG clog << "FilterFactory::getLibraryFilter: couldn't find " << GETFILTERFUNC << ": " << dlerror() << endl; #endif #endif return NULL; } Filter *FilterFactory::getFilter(const string &mime_type) { Filter *pFilter = NULL; string typeOnly(mime_type); string::size_type semiColonPos = mime_type.find(";"); // Remove the charset, if any if (semiColonPos != string::npos) { typeOnly = mime_type.substr(0, semiColonPos); } #ifdef DEBUG clog << "FilterFactory::getFilter: file type is " << typeOnly << endl; #endif if (typeOnly == "text/plain") { pFilter = new TextFilter(); } #ifndef _DYNAMIC_DIJON_HTMLFILTER else if (typeOnly == "text/html") { pFilter = new HtmlFilter(); } #endif #ifndef _DYNAMIC_DIJON_XMLFILTER else if ((typeOnly == "text/xml") || (typeOnly == "application/xml")) { pFilter = new XmlFilter(); } #endif else { pFilter = getLibraryFilter(typeOnly); } if (pFilter != NULL) { pFilter->set_mime_type(typeOnly); } return pFilter; } void FilterFactory::getSupportedTypes(set &mime_types) { mime_types.clear(); // Built-in types mime_types.insert("text/plain"); #ifndef _DYNAMIC_DIJON_HTMLFILTER mime_types.insert("text/html"); #endif #ifndef _DYNAMIC_DIJON_XMLFILTER mime_types.insert("text/xml"); mime_types.insert("application/xml"); #endif // Library-handled types for (map::iterator typeIter = m_types.begin(); typeIter != m_types.end(); ++typeIter) { mime_types.insert(typeIter->first); } } bool FilterFactory::isSupportedType(const string &mime_type) { string typeOnly(mime_type); string::size_type semiColonPos = mime_type.find(";"); // Remove the charset, if any if (semiColonPos != string::npos) { typeOnly = mime_type.substr(0, semiColonPos); } // Is it a built-in type ? 
if ((typeOnly == "text/plain") || #ifndef _DYNAMIC_DIJON_HTMLFILTER (typeOnly == "text/html") || #endif #ifndef _DYNAMIC_DIJON_XMLFILTER (typeOnly == "text/xml") || (typeOnly == "application/xml") || #endif (m_types.find(typeOnly) != m_types.end())) { return true; } return false; } void FilterFactory::unloadFilters(void) { #ifdef HAVE_DLFCN_H for (map::iterator iter = m_handles.begin(); iter != m_handles.end(); ++iter) { if (dlclose(iter->second) != 0) { #ifdef DEBUG clog << "FilterFactory::unloadFilters: failed on " << iter->first << endl; #endif } } #endif m_types.clear(); m_handles.clear(); } pinot-1.22/Tokenize/filters/FilterFactory.h000066400000000000000000000040231470740426600207660ustar00rootroot00000000000000/* * Copyright 2007 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _DIJON_FILTERFACTORY_H #define _DIJON_FILTERFACTORY_H #include #include #include #include "Filter.h" #ifndef _DYNAMIC_DIJON_HTMLFILTER #include "HtmlFilter.h" #endif #ifndef _DYNAMIC_DIJON_XMLFILTER #include "XmlFilter.h" #endif namespace Dijon { /// Factory for filters with related utility methods. class FilterFactory { public: virtual ~FilterFactory(); /// Loads the filter libraries found in the given directory. 
static unsigned int loadFilters(const std::string &dir_name); /// Returns a Filter that handles the given MIME type. static Filter *getFilter(const std::string &mime_type); /// Returns all supported MIME types. static void getSupportedTypes(std::set &mime_types); /// Indicates whether a MIME type is supported or not. static bool isSupportedType(const std::string &mime_type); /// Unloads all filter libraries. static void unloadFilters(void); protected: static std::map m_types; static std::map m_handles; FilterFactory(); static Filter *getLibraryFilter(const std::string &mime_type); private: FilterFactory(const FilterFactory &other); FilterFactory& operator=(const FilterFactory& other); }; } #endif // _DIJON_FILTERFACTORY_H pinot-1.22/Tokenize/filters/GMimeMboxFilter.cc000066400000000000000000000616351470740426600213550ustar00rootroot00000000000000/* * Copyright 2007-2016 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
*/ #include "config.h" #include #include #include #include #include #ifdef HAVE_MMAP #include #endif #include #include #include #include #include #include #include #include "GMimeMboxFilter.h" using std::clog; using std::endl; using std::string; using std::max; using std::map; using std::set; using std::pair; using namespace Dijon; #ifdef _DYNAMIC_DIJON_FILTERS DIJON_FILTER_EXPORT bool get_filter_types(MIMETypes &mime_types) { mime_types.m_mimeTypes.clear(); mime_types.m_mimeTypes.insert("application/mbox"); mime_types.m_mimeTypes.insert("text/x-mail"); mime_types.m_mimeTypes.insert("text/x-news"); return true; } DIJON_FILTER_EXPORT bool check_filter_data_input(int data_input) { Filter::DataInput input = (Filter::DataInput)data_input; if ((input == Filter::DOCUMENT_DATA) || (input == Filter::DOCUMENT_FILE_NAME)) { return true; } return false; } DIJON_FILTER_EXPORT Filter *get_filter(void) { return new GMimeMboxFilter(); } DIJON_FILTER_INITIALIZE void initialize_gmime(void) { // Initialize gmime #if GMIME_MAJOR_VERSION >= 3 g_mime_init(); #else g_mime_init(GMIME_ENABLE_RFC2047_WORKAROUNDS); #endif } DIJON_FILTER_SHUTDOWN void shutdown_gmime(void) { // Shutdown gmime g_mime_shutdown(); } #endif static string extractField(const string &str, const string &start, const string &end, string::size_type &endPos, bool anyCharacterOfEnd = false) { string fieldValue; string::size_type startPos = string::npos; if (start.empty() == true) { startPos = 0; } else { startPos = str.find(start, endPos); } if (startPos != string::npos) { startPos += start.length(); if (end.empty() == true) { fieldValue = str.substr(startPos); } else { if (anyCharacterOfEnd == false) { endPos = str.find(end, startPos); } else { endPos = str.find_first_of(end, startPos); } if (endPos != string::npos) { fieldValue = str.substr(startPos, endPos - startPos); } } } return fieldValue; } GMimeMboxFilter::GMimeMboxPart::GMimeMboxPart(const string &subject, dstring &buffer) : m_subject(subject), m_buffer(buffer) 
{
}

GMimeMboxFilter::GMimeMboxPart::~GMimeMboxPart()
{
}

GMimeMboxFilter::GMimeMboxFilter() :
	Filter(),
	m_returnHeaders(false),
	m_maxSize(0),
	m_pData(NULL),
	m_dataLength(0),
	m_fd(-1),
	m_pGMimeMboxStream(NULL),
	m_pParser(NULL),
	m_pMimeMessage(NULL),
	m_partsCount(-1),
	m_partNum(-1),
	m_partLevel(-1),
	m_currentLevel(0),
	m_messageStart(0),
	m_foundDocument(false)
{
}

GMimeMboxFilter::~GMimeMboxFilter()
{
	finalize(true);
}

bool GMimeMboxFilter::is_data_input_ok(DataInput input) const
{
	if ((input == DOCUMENT_DATA) ||
		(input == DOCUMENT_FILE_NAME))
	{
		return true;
	}

	return false;
}

bool GMimeMboxFilter::set_property(Properties prop_name, const string &prop_value)
{
	if (prop_name == PREFERRED_CHARSET)
	{
		m_defaultCharset = prop_value;

		return true;
	}
	else if (prop_name == OPERATING_MODE)
	{
		if (prop_value == "view")
		{
			m_returnHeaders = true;
		}
		else
		{
			m_returnHeaders = false;
		}

		return true;
	}
	else if ((prop_name == MAXIMUM_NESTED_SIZE) &&
		(prop_value.empty() == false))
	{
		m_maxSize = (off_t)atoll(prop_value.c_str());

		// This property is supported and was applied
		return true;
	}

	return false;
}

bool GMimeMboxFilter::set_document_data(const char *data_ptr, off_t data_length)
{
	// Close/free whatever was opened/allocated on a previous call to set_document()
	finalize(true);
	m_partsCount = m_partNum = m_partLevel = -1;
	m_levels.clear();
	m_messageStart = 0;
	m_messageDate.clear();
	m_partCharset.clear();
	m_foundDocument = false;

	m_pData = data_ptr;
	m_dataLength = data_length;

	// Assume there are documents if initialization is successful
	// but don't actually retrieve anything, until next or skip is called
	if (initializeData() == true)
	{
		m_foundDocument = initialize();
	}

	return m_foundDocument;
}

bool GMimeMboxFilter::set_document_string(const string &data_str)
{
	// This input is not supported
	return false;
}

bool GMimeMboxFilter::set_document_file(const string &file_path, bool unlink_when_done)
{
	// Close/free whatever was opened/allocated on a previous call to set_document()
	finalize(true);
	m_partsCount = m_partNum = m_partLevel = -1;
	m_levels.clear();
	m_messageStart = 0;
m_messageDate.clear(); m_partCharset.clear(); m_foundDocument = false; Filter::set_document_file(file_path, unlink_when_done); // Assume there are documents if initialization is successful // but don't actually retrieve anything, until next or skip is called if (initializeFile() == true) { m_foundDocument = initialize(); } return m_foundDocument; } bool GMimeMboxFilter::set_document_uri(const string &uri) { return false; } bool GMimeMboxFilter::has_documents(void) const { // As long as a document was found, chances are another one is available return m_foundDocument; } bool GMimeMboxFilter::next_document(void) { string subject; map::const_iterator titleIter = m_metaData.find("title"); if (titleIter != m_metaData.end()) { subject = titleIter->second; } return extractMessage(subject); } bool GMimeMboxFilter::skip_to_document(const string &ipath) { if (ipath.empty() == true) { if (m_messageStart > 0) { // Reset return set_document_file(m_filePath); } return true; } // ipath's format is "o=offset&l=part_levels" if (sscanf(ipath.c_str(), "o=" GMIME_OFFSET_MODIFIER "&l=[", &m_messageStart) != 1) { return false; } finalize(false); m_partsCount = -1; m_levels.clear(); string::size_type levelsPos = ipath.find("l=["); if (levelsPos != string::npos) { string::size_type endPos = 0; string levels(ipath.substr(levelsPos + 2)); string levelInfo(extractField(levels, "[", "]", endPos)); // Parse levels while (levelInfo.empty() == false) { int partLevel = 0, partsCount = 0, partNum = 0; #ifdef DEBUG clog << "GMimeMboxFilter::skip_to_document: level " << levelInfo << endl; #endif if (sscanf(levelInfo.c_str(), "%d,%d,%d", &partLevel, &partsCount, &partNum) == 3) { m_levels[partLevel] = pair(partsCount, partNum); } if (endPos == string::npos) { break; } levelInfo = extractField(levels, "[", "]", endPos); } } m_messageDate.clear(); m_partCharset.clear(); m_foundDocument = false; if (((m_filePath.empty() == false) && (initializeFile() == true)) || (initializeData() == true)) { if 
(initialize() == true)
		{
			// Extract the first message at the given offset
			m_foundDocument = extractMessage("");
		}
	}

	return m_foundDocument;
}

string GMimeMboxFilter::get_error(void) const
{
	return "";
}

int GMimeMboxFilter::openFile(const string &filePath)
{
	int openFlags = O_RDONLY;

#ifdef O_CLOEXEC
	openFlags |= O_CLOEXEC;
#endif
	// Open the mbox file
#ifdef O_NOATIME
	int fd = open(filePath.c_str(), openFlags|O_NOATIME);
#else
	int fd = open(filePath.c_str(), openFlags);
#endif
#ifdef O_NOATIME
	if ((fd < 0) &&
		(errno == EPERM))
	{
		// Try again, without O_NOATIME
		fd = open(filePath.c_str(), openFlags);
	}
#endif
	if (fd < 0)
	{
#ifdef DEBUG
		clog << "GMimeMboxFilter::openFile: couldn't open " << filePath << endl;
#endif
		// Callers test for a negative descriptor; false would read as 0, a valid one
		return -1;
	}
#ifndef O_CLOEXEC
	int fdFlags = fcntl(fd, F_GETFD);
	fcntl(fd, F_SETFD, fdFlags|FD_CLOEXEC);
#endif

	return fd;
}

bool GMimeMboxFilter::initializeData(void)
{
	// Create a stream
	m_pGMimeMboxStream = g_mime_stream_mem_new_with_buffer(m_pData, m_dataLength);
	if (m_pGMimeMboxStream == NULL)
	{
		return false;
	}

	ssize_t streamLength = g_mime_stream_length(m_pGMimeMboxStream);

	if (m_messageStart > 0)
	{
		if (m_messageStart > (GMIME_OFFSET_TYPE)streamLength)
		{
			// This offset doesn't make sense !
			m_messageStart = 0;
		}

#ifdef DEBUG
		clog << "GMimeMboxFilter::initializeData: from offset " << m_messageStart
			<< " to " << streamLength << endl;
#endif
		g_mime_stream_set_bounds(m_pGMimeMboxStream, m_messageStart, (GMIME_OFFSET_TYPE)streamLength);
	}

	return true;
}

bool GMimeMboxFilter::initializeFile(void)
{
	m_fd = openFile(m_filePath);
	if (m_fd < 0)
	{
		return false;
	}

	// Create a stream
	if (m_messageStart > 0)
	{
		// NOTE: both callers run finalize() first, which resets m_pGMimeMboxStream
		// to NULL, so this queries a stream that hasn't been created yet.
		// The size of the file behind m_fd is probably what was intended here.
		ssize_t streamLength = g_mime_stream_length(m_pGMimeMboxStream);

		if (m_messageStart > (GMIME_OFFSET_TYPE)streamLength)
		{
			// This offset doesn't make sense !
m_messageStart = 0; } #ifdef DEBUG clog << "GMimeMboxFilter::initializeFile: from offset " << m_messageStart << " to " << streamLength << endl; #endif #ifdef HAVE_MMAP m_pGMimeMboxStream = g_mime_stream_mmap_new_with_bounds(m_fd, PROT_READ, MAP_PRIVATE, m_messageStart, (GMIME_OFFSET_TYPE)streamLength); #else m_pGMimeMboxStream = g_mime_stream_fs_new_with_bounds(m_fd, m_messageStart, (GMIME_OFFSET_TYPE)streamLength); #endif } else { #ifdef HAVE_MMAP m_pGMimeMboxStream = g_mime_stream_mmap_new(m_fd, PROT_READ, MAP_PRIVATE); #else m_pGMimeMboxStream = g_mime_stream_fs_new(m_fd); #endif } return true; } bool GMimeMboxFilter::initialize(void) { if (m_pGMimeMboxStream == NULL) { return false; } // And a parser m_pParser = g_mime_parser_new(); if (m_pParser != NULL) { g_mime_parser_init_with_stream(m_pParser, m_pGMimeMboxStream); g_mime_parser_set_respect_content_length(m_pParser, TRUE); // Scan for mbox From-lines #if GMIME_MAJOR_VERSION >= 3 g_mime_parser_set_format(m_pParser, GMIME_FORMAT_MBOX); #else g_mime_parser_set_scan_from(m_pParser, TRUE); #endif return true; } #ifdef DEBUG clog << "GMimeMboxFilter::initialize: couldn't create new parser" << endl; #endif return false; } void GMimeMboxFilter::finalize(bool fullReset) { if (m_pMimeMessage != NULL) { if (G_IS_OBJECT(m_pMimeMessage)) { g_object_unref(m_pMimeMessage); } m_pMimeMessage = NULL; } if (m_pParser != NULL) { // FIXME: does the parser close the stream ? 
if (G_IS_OBJECT(m_pParser)) { g_object_unref(m_pParser); } m_pParser = NULL; } if (m_pGMimeMboxStream != NULL) { if (G_IS_OBJECT(m_pGMimeMboxStream)) { g_object_unref(m_pGMimeMboxStream); } m_pGMimeMboxStream = NULL; } // initializeFile() will always reopen the file if (m_fd >= 0) { close(m_fd); m_fd = -1; } if (fullReset == true) { // ...but those data fields will only be reinit'ed on a full reset m_pData = NULL; m_dataLength = 0; rewind(); } } bool GMimeMboxFilter::readStream(GMimeStream *pStream, dstring &fileBuffer) { char readBuffer[4096]; ssize_t streamLen = g_mime_stream_length(pStream); ssize_t totalSize = 0, bytesRead = 0; bool gotOutput = true; #ifdef DEBUG clog << "GMimeMboxFilter::readStream: stream is " << streamLen << " bytes long" << endl; #endif do { if ((m_maxSize > 0) && (totalSize >= m_maxSize)) { #ifdef DEBUG clog << "GMimeMboxFilter::readStream: stopping at " << totalSize << endl; #endif break; } bytesRead = g_mime_stream_read(pStream, readBuffer, 4096); if (bytesRead > 0) { fileBuffer.append(readBuffer, bytesRead); totalSize += bytesRead; } else if (bytesRead == -1) { // An error occurred if (errno != EINTR) { gotOutput = false; break; } // Try again bytesRead = 1; } } while (bytesRead > 0); #ifdef DEBUG clog << "GMimeMboxFilter::readStream: read " << totalSize << "/" << fileBuffer.size() << " bytes" << endl; #endif return gotOutput; } bool GMimeMboxFilter::nextPart(const string &subject) { if (m_pMimeMessage != NULL) { // Get the top-level MIME part in the message GMimeObject *pMimePart = g_mime_message_get_mime_part(m_pMimeMessage); if (pMimePart != NULL) { GMimeMboxPart mboxPart(subject, m_content); // Extract the part's text m_content.clear(); if (extractPart(pMimePart, mboxPart) == true) { extractMetaData(mboxPart); return true; } } if (G_IS_OBJECT(m_pMimeMessage)) { g_object_unref(m_pMimeMessage); } m_pMimeMessage = NULL; } // If we get there, no suitable parts were found m_partsCount = m_partNum = m_partLevel = -1; return false; } bool 
GMimeMboxFilter::extractPart(GMimeObject *part, GMimeMboxPart &mboxPart) { if (part == NULL) { return false; } // Message parts may be nested while (GMIME_IS_MESSAGE_PART(part)) { #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: nested message part" << endl; #endif GMimeMessage *partMessage = g_mime_message_part_get_message(GMIME_MESSAGE_PART(part)); part = g_mime_message_get_mime_part(partMessage); } // Is this a multipart ? if (GMIME_IS_MULTIPART(part)) { int partsCount = 0, partNum = 0; bool gotPart = false; m_partsCount = partsCount = g_mime_multipart_get_count(GMIME_MULTIPART(part)); ++m_currentLevel; #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: message has " << m_partsCount << " parts at level " << m_currentLevel << endl; #endif map >::iterator levelIter = m_levels.find(m_currentLevel); if (levelIter != m_levels.end()) { pair partPair = levelIter->second; #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: level " << m_currentLevel << " had " << partPair.first << " parts" << endl; #endif if (partPair.first == m_partsCount) { partNum = partPair.second; #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: restarting level " << m_currentLevel << " at part " << partNum << endl; #endif } } else { partNum = 0; } for (; partNum < m_partsCount; ++partNum) { #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: extracting part " << partNum << endl; #endif m_partNum = partNum; GMimeObject *multiMimePart = g_mime_multipart_get_part(GMIME_MULTIPART(part), partNum); if (multiMimePart == NULL) { continue; } gotPart = extractPart(multiMimePart, mboxPart); if (gotPart == true) { break; } } // Were all parts in the next level parsed ? 
levelIter = m_levels.find(m_currentLevel + 1); if ((levelIter == m_levels.end()) || (levelIter->second.second + 1 > levelIter->second.first)) { // Move to the next part at this level ++partNum; } levelIter = m_levels.find(m_currentLevel); if (levelIter != m_levels.end()) { if (partNum > levelIter->second.second) { levelIter->second.second = partNum; #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: remembering to restart level " << m_currentLevel << " at part " << partNum << endl; #endif } } else { m_levels[m_currentLevel] = pair(partsCount, partNum); #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: remembering to restart level " << m_currentLevel << " at part " << partNum << endl; #endif } --m_currentLevel; if (gotPart == true) { return true; } // None of the parts were suitable m_partsCount = m_partNum = m_partLevel = -1; } if (!GMIME_IS_PART(part)) { #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: not a part" << endl; #endif return false; } GMimePart *mimePart = GMIME_PART(part); // Check the content type GMimeContentType *mimeType = g_mime_object_get_content_type(GMIME_OBJECT(mimePart)); // Set this for caller #if GMIME_MAJOR_VERSION >= 3 char *partType = g_mime_content_type_get_mime_type(mimeType); #else char *partType = g_mime_content_type_to_string(mimeType); #endif if (partType != NULL) { #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: type is " << partType << endl; #endif mboxPart.m_contentType = partType; // Is the body in a local file ? 
if (mboxPart.m_contentType == "message/external-body") { const char *partAccessType = g_mime_content_type_get_parameter(mimeType, "access-type"); if (partAccessType != NULL) { string contentAccessType(partAccessType); #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: part access type is " << contentAccessType << endl; #endif if (contentAccessType == "local-file") { const char *partLocalFile = g_mime_content_type_get_parameter(mimeType, "name"); if (partLocalFile != NULL) { mboxPart.m_contentType = "SCAN"; mboxPart.m_subject = partLocalFile; mboxPart.m_buffer.clear(); #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: local file at " << partLocalFile << endl; #endif // Load the part from file int fd = openFile(partLocalFile); if (fd >= 0) { GMimeStream *fileStream = g_mime_stream_mmap_new(fd, PROT_READ, MAP_PRIVATE); if (fileStream != NULL) { readStream(fileStream, mboxPart.m_buffer); if (G_IS_OBJECT(fileStream)) { g_object_unref(fileStream); } } } } } else { mboxPart.m_contentType = "application/octet-stream"; #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: unknown part access type" << endl; #endif } } } g_free(partType); } // Was the part already loaded ? 
if (mboxPart.m_buffer.empty() == false) { return true; } GMimeContentEncoding encodingType = g_mime_part_get_content_encoding(mimePart); #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: encoding is " << encodingType << endl; #endif g_mime_part_set_content_encoding(mimePart, GMIME_CONTENT_ENCODING_QUOTEDPRINTABLE); const char *fileName = g_mime_part_get_filename(mimePart); if (fileName != NULL) { #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: file name is " << fileName << endl; #endif mboxPart.m_subject = fileName; } // Create a in-memory output stream GMimeStream *memStream = g_mime_stream_mem_new(); if (memStream == NULL) { return false; } const char *charset = g_mime_content_type_get_parameter(mimeType, "charset"); if (charset != NULL) { m_partCharset = charset; #if 0 // Install a charset filter if (strncasecmp(charset, "UTF-8", 5) != 0) { GMimeFilter *charsetFilter = g_mime_filter_charset_new(charset, "UTF-8"); if (charsetFilter != NULL) { #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: converting from charset " << charset << endl; #endif g_mime_stream_filter_add(GMIME_STREAM_FILTER(memStream), charsetFilter); g_object_unref(charsetFilter); } } #endif } // Write the part to the stream #if GMIME_MAJOR_VERSION >= 3 GMimeDataWrapper *dataWrapper = g_mime_part_get_content(mimePart); #else GMimeDataWrapper *dataWrapper = g_mime_part_get_content_object(mimePart); #endif if (dataWrapper != NULL) { ssize_t writeLen = g_mime_data_wrapper_write_to_stream(dataWrapper, memStream); #ifdef DEBUG clog << "GMimeMboxFilter::extractPart: wrote " << writeLen << " bytes" << endl; #endif if (G_IS_OBJECT(dataWrapper)) { g_object_unref(dataWrapper); } } g_mime_stream_flush(memStream); if ((m_returnHeaders == true) && (mboxPart.m_contentType.length() >= 10) && (strncasecmp(mboxPart.m_contentType.c_str(), "text/plain", 10) == 0)) { #if GMIME_MAJOR_VERSION >= 3 char *pHeaders = g_mime_object_get_headers(GMIME_OBJECT(m_pMimeMessage), NULL); #else char *pHeaders = 
g_mime_object_get_headers(GMIME_OBJECT(m_pMimeMessage)); #endif if (pHeaders != NULL) { mboxPart.m_buffer = pHeaders; mboxPart.m_buffer += "\n"; free(pHeaders); } } g_mime_stream_reset(memStream); readStream(memStream, mboxPart.m_buffer); if (G_IS_OBJECT(memStream)) { g_object_unref(memStream); } m_partLevel = m_currentLevel; return true; } bool GMimeMboxFilter::extractDate(const string &header) { const char *pDate = g_mime_object_get_header(GMIME_OBJECT(m_pMimeMessage), header.c_str()); if (pDate == NULL) { return false; } string date(pDate); struct tm timeTm; timeTm.tm_sec = timeTm.tm_min = timeTm.tm_hour = timeTm.tm_mday = 0; timeTm.tm_mon = timeTm.tm_year = timeTm.tm_wday = timeTm.tm_yday = timeTm.tm_isdst = 0; if (date.find(',') != string::npos) { strptime(pDate, "%a, %d %b %Y %H:%M:%S %z", &timeTm); if (timeTm.tm_year <= 0) { strptime(pDate, "%a, %d %b %y %H:%M:%S %z", &timeTm); } } else { strptime(pDate, "%d %b %Y %H:%M:%S %z", &timeTm); if (timeTm.tm_year <= 0) { strptime(pDate, "%d %b %y %H:%M:%S %z", &timeTm); } } // Sanity check if (timeTm.tm_year <= 0) { #ifdef DEBUG clog << "GMimeMboxFilter::extractDate: ignoring bogus year " << timeTm.tm_year << endl; #endif return false; } m_messageDate = mktime(&timeTm); #ifdef DEBUG clog << "GMimeMboxFilter::extractDate: message date is " << pDate << ": " << m_messageDate << endl; #endif return true; } bool GMimeMboxFilter::extractMessage(const string &subject) { string msgSubject(subject); m_currentLevel = 0; while (g_mime_stream_eos(m_pGMimeMboxStream) == FALSE) { // Does the previous message have parts left to parse ? 
if (m_partsCount == -1) { // No, it doesn't if (m_pMimeMessage != NULL) { if (G_IS_OBJECT(m_pMimeMessage)) { g_object_unref(m_pMimeMessage); } m_pMimeMessage = NULL; } // Get the next message #if GMIME_MAJOR_VERSION >= 3 m_pMimeMessage = g_mime_parser_construct_message(m_pParser, NULL); #else m_pMimeMessage = g_mime_parser_construct_message(m_pParser); #endif if (m_pMimeMessage == NULL) { clog << "Couldn't construct new MIME message" << endl; break; } #if GMIME_MAJOR_VERSION >= 3 m_messageStart = g_mime_parser_get_mbox_marker_offset(m_pParser); #else m_messageStart = g_mime_parser_get_from_offset(m_pParser); #endif gint64 messageEnd = g_mime_parser_tell(m_pParser); #ifdef DEBUG clog << "GMimeMboxFilter::extractMessage: message between offsets " << m_messageStart << " and " << messageEnd << endl; #endif if (messageEnd > m_messageStart) { // This only applies to Mozilla const char *pMozStatus = g_mime_object_get_header(GMIME_OBJECT(m_pMimeMessage), "X-Mozilla-Status"); if (pMozStatus != NULL) { long int mozFlags = strtol(pMozStatus, NULL, 16); // Watch out for Mozilla specific flags : // MSG_FLAG_EXPUNGED, MSG_FLAG_EXPIRED // They are defined in mailnews/MailNewsTypes.h and msgbase/nsMsgMessageFlags.h if ((mozFlags & 0x0008) || (mozFlags & 0x0040)) { #ifdef DEBUG clog << "GMimeMboxFilter::extractMessage: flagged by Mozilla" << endl; #endif continue; } } // This only applies to Evolution const char *pEvoStatus = g_mime_object_get_header(GMIME_OBJECT(m_pMimeMessage), "X-Evolution"); if (pEvoStatus != NULL) { string evoStatus(pEvoStatus); string::size_type flagsPos = evoStatus.find('-'); if (flagsPos != string::npos) { long int evoFlags = strtol(evoStatus.substr(flagsPos + 1).c_str(), NULL, 16); // Watch out for Evolution specific flags : // CAMEL_MESSAGE_DELETED // It's defined in camel/camel-folder-summary.h if (evoFlags & 0x0002) { #ifdef DEBUG clog << "GMimeMboxFilter::extractMessage: flagged by Evolution" << endl; #endif continue; } } } // How old is this message ? 
if ((extractDate("Date") == false) && (extractDate("Delivery-Date") == false) && (extractDate("Resent-Date") == false)) { m_messageDate = time(NULL); #ifdef DEBUG clog << "GMimeMboxFilter::extractMessage: message date is today's " << m_messageDate << endl; #endif } // Extract the subject const char *pSubject = g_mime_message_get_subject(m_pMimeMessage); if (pSubject != NULL) { msgSubject = pSubject; } } } #ifdef DEBUG clog << "GMimeMboxFilter::extractMessage: message subject is " << msgSubject << endl; #endif if (nextPart(msgSubject) == true) { return true; } // Try the next message } // The last message may have parts left if (m_partsCount != -1) { return nextPart(msgSubject); } return false; } void GMimeMboxFilter::extractMetaData(GMimeMboxPart &mboxPart) { string ipath; char posStr[128]; // New document m_metaData.clear(); m_metaData["title"] = mboxPart.m_subject; m_metaData["mimetype"] = mboxPart.m_contentType; if (m_messageDate.empty() == false) { m_metaData["date"] = m_messageDate; } m_metaData["charset"] = m_partCharset; snprintf(posStr, 128, "%lu", m_content.length()); m_metaData["size"] = posStr; snprintf(posStr, 128, "o=%ld&l=", m_messageStart); ipath = posStr; for (map<int, pair<int, int> >::const_iterator levelIter = m_levels.begin(); levelIter != m_levels.end(); ++levelIter) { int partNum = max(levelIter->second.second - 1, 0); if (levelIter->first == m_partLevel) { partNum = m_partNum; } snprintf(posStr, 128, "[%d,%d,%d]", levelIter->first, levelIter->second.first, partNum); ipath += posStr; } m_metaData["ipath"] = ipath; #ifdef DEBUG clog << "GMimeMboxFilter::extractMetaData: message location is " << ipath << endl; #endif } pinot-1.22/Tokenize/filters/GMimeMboxFilter.h000066400000000000000000000120711470740426600212050ustar00rootroot00000000000000/* * Copyright 2007-2024 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either 
version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _DIJON_MBOXFILTER_H #define _DIJON_MBOXFILTER_H #include #include #include #include #include #include #include #include #include "Filter.h" // The "%ld" modifier used with sscanf and snprintf expects long int offsets #define GMIME_OFFSET_TYPE long int #define GMIME_OFFSET_MODIFIER "%ld" namespace Dijon { class GMimeMboxFilter : public Filter { public: /// Builds an empty filter. GMimeMboxFilter(); /// Destroys the filter. virtual ~GMimeMboxFilter(); // Information. /// Returns what data the filter requires as input. virtual bool is_data_input_ok(DataInput input) const; // Initialization. /** Sets a property, prior to calling set_document_XXX(). * Returns false if the property is not supported. */ virtual bool set_property(Properties prop_name, const std::string &prop_value); /** (Re)initializes the filter with the given data. * Caller should ensure the given pointer is valid until the * Filter object is destroyed, as some filters may not need to * do a deep copy of the data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_data(const char *data_ptr, off_t data_length); /** (Re)initializes the filter with the given data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. 
*/ virtual bool set_document_string(const std::string &data_str); /** (Re)initializes the filter with the given file. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_file(const std::string &file_path, bool unlink_when_done = false); /** (Re)initializes the filter with the given URI. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_uri(const std::string &uri); // Going from one nested document to the next. /** Returns true if there are nested documents left to extract. * Returns false if the end of the parent document was reached * or an error occurred. */ virtual bool has_documents(void) const; /** Moves to the next nested document. * Returns false if there are none left. */ virtual bool next_document(void); /** Skips to the nested document with the given ipath. * Returns false if no such document exists. */ virtual bool skip_to_document(const std::string &ipath); // Accessing documents' contents. /// Returns the message for the most recent error that has occurred. 
virtual std::string get_error(void) const; protected: std::string m_defaultCharset; bool m_returnHeaders; off_t m_maxSize; const char *m_pData; off_t m_dataLength; int m_fd; GMimeStream *m_pGMimeMboxStream; GMimeParser *m_pParser; GMimeMessage *m_pMimeMessage; int m_partsCount; int m_partNum; int m_partLevel; int m_currentLevel; std::map<int, std::pair<int, int> > m_levels; GMIME_OFFSET_TYPE m_messageStart; std::string m_messageDate; std::string m_partCharset; bool m_foundDocument; class GMimeMboxPart { public: GMimeMboxPart(const std::string &subject, dstring &buffer); ~GMimeMboxPart(); std::string m_subject; std::string m_contentType; dstring &m_buffer; private: GMimeMboxPart(const GMimeMboxPart &other); GMimeMboxPart& operator=(const GMimeMboxPart& other); }; static int openFile(const std::string &filePath); bool initializeData(void); bool initializeFile(void); bool initialize(void); void finalize(bool fullReset); bool readStream(GMimeStream *pStream, dstring &fileBuffer); bool nextPart(const std::string &subject); bool extractPart(GMimeObject *mimeObject, GMimeMboxPart &mboxPart); bool extractDate(const std::string &header); bool extractMessage(const std::string &subject); void extractMetaData(GMimeMboxPart &mboxPart); private: /// GMimeMboxFilter objects cannot be copied. GMimeMboxFilter(const GMimeMboxFilter &other); /// GMimeMboxFilter objects cannot be copied. GMimeMboxFilter& operator=(const GMimeMboxFilter& other); }; } #endif // _DIJON_MBOXFILTER_H pinot-1.22/Tokenize/filters/HtmlFilter.cc000066400000000000000000000342601470740426600204270ustar00rootroot00000000000000/* * Copyright 2007-2016 Fabrice Colin * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include "config.h" #ifdef HAVE_VSNPRINTF #include #include #endif #include #include #include #include #include "HtmlFilter.h" using std::clog; using std::clog; using std::endl; using std::string; using std::for_each; using std::map; using std::set; using std::copy; using std::inserter; using namespace std; using namespace Dijon; static const unsigned int HASH_LEN = ((4 * 8 + 5) / 6); #ifdef _DYNAMIC_DIJON_HTMLFILTER DIJON_FILTER_EXPORT bool get_filter_types(MIMETypes &mime_types) { mime_types.m_mimeTypes.clear(); mime_types.m_mimeTypes.insert("text/html"); return true; } DIJON_FILTER_EXPORT bool check_filter_data_input(int data_input) { Filter::DataInput input = (Filter::DataInput)data_input; if ((input == Filter::DOCUMENT_DATA) || (input == Filter::DOCUMENT_STRING)) { return true; } return false; } DIJON_FILTER_EXPORT Filter *get_filter(void) { return new HtmlFilter(); } #endif // A function object to lower case strings with for_each() struct ToLower { public: void operator()(char &c) { c = (char)tolower((int)c); } }; static unsigned int removeCharacters(string &str, const string &characters) { unsigned int count = 0; string::size_type charPos = str.find_first_of(characters.c_str()); while (charPos != string::npos) { str.erase(charPos, 1); ++count; charPos = str.find_first_of(characters.c_str(), charPos); } return count; } static unsigned int trimSpaces(string &str) { string::size_type pos = 0; unsigned int count = 0; while ((str.empty() == false) && (pos < str.length())) { if (isspace(str[pos]) == 0) { ++pos; break; 
} str.erase(pos, 1); ++count; } for (pos = str.length() - 1; (str.empty() == false) && (pos >= 0); --pos) { if (isspace(str[pos]) == 0) { break; } str.erase(pos, 1); ++count; } return count; } static string toLowerCase(const string &str) { string tmp(str); for_each(tmp.begin(), tmp.end(), ToLower()); return tmp; } static string findCharset(const string &content) { // Is a charset specified ? string::size_type startPos = content.find("charset=\""); if ((startPos != string::npos) && (content.length() > 9)) { string::size_type endPos = content.find('"', startPos + 9); if (endPos != string::npos) { return content.substr(startPos + 9, endPos - startPos - 9); } } else { startPos = content.find("charset="); if (startPos != string::npos) { return content.substr(startPos + 8); } } return ""; } Link::Link() : m_index(0), m_startPos(0), m_endPos(0) { } Link::Link(const Link &other) : m_url(other.m_url), m_name(other.m_name), m_index(other.m_index), m_startPos(other.m_startPos), m_endPos(other.m_endPos) { } Link::~Link() { } Link& Link::operator=(const Link& other) { if (this != &other) { m_url = other.m_url; m_name = other.m_name; m_index = other.m_index; m_startPos = other.m_startPos; m_endPos = other.m_endPos; } return *this; } bool Link::operator==(const Link &other) const { return m_url == other.m_url; } bool Link::operator<(const Link &other) const { return m_index < other.m_index; } HtmlFilter::ParserState::ParserState(dstring &text) : m_isValid(true), m_findAbstract(true), m_textPos(0), m_inHead(false), m_foundHead(false), m_appendToTitle(false), m_appendToText(false), m_appendToLink(false), m_skip(0), m_text(text) { } HtmlFilter::ParserState::~ParserState() { } bool HtmlFilter::ParserState::get_links_text(unsigned int currentLinkIndex) { if ((m_links.empty() == true) || (m_currentLink.m_index == 0)) { string abstract(m_text.c_str()); trimSpaces(abstract); m_abstract = abstract; return true; } // Get the text between the current link and the previous one for 
(set<Link>::const_iterator linkIter = m_links.begin(); linkIter != m_links.end(); ++linkIter) { // Is this the previous link ? if (linkIter->m_index == currentLinkIndex - 1) { // Is there text in between ? if (linkIter->m_endPos + 1 < m_textPos) { unsigned int abstractLen = m_textPos - linkIter->m_endPos - 1; string abstract(m_text.substr(linkIter->m_endPos, abstractLen).c_str()); trimSpaces(abstract); // The longer, the better if (abstract.length() > m_abstract.length()) { m_abstract = abstract; #ifdef DEBUG clog << "HtmlFilter::get_links_text: abstract after link " << linkIter->m_index << " to " << linkIter->m_url << endl; #endif return true; } } break; } } return false; } void HtmlFilter::ParserState::append_whitespace(void) { // Append a single space if (m_appendToTitle == true) { m_title += " "; } else { if (m_appendToText == true) { m_text += " "; m_textPos += 1; } // Appending to text and to link are not mutually exclusive operations if (m_appendToLink == true) { m_currentLink.m_name += " "; } } } void HtmlFilter::ParserState::append_text(const string &text) { // Append current text if (m_appendToTitle == true) { m_title += text; } else { if (m_appendToText == true) { m_text.append(text.c_str(), text.length()); m_textPos += text.length(); } // Appending to text and to link are not mutually exclusive operations if (m_appendToLink == true) { m_currentLink.m_name += text; } } } void HtmlFilter::ParserState::process_text(const string &text) { if (text.empty() == true) { return; } if (m_skip > 0) { // Skip this return; } string::size_type nonSpace = text.find_first_not_of(" \t\n\r"); bool appendSpace = false; if (nonSpace > 0) { appendSpace = true; } while (nonSpace != string::npos) { if (appendSpace == true) { append_whitespace(); } string::size_type nonSpaceEnd = text.find_first_of(" \t\n\r", nonSpace); if (nonSpaceEnd != string::npos) { appendSpace = true; append_text(text.substr(nonSpace, nonSpaceEnd - nonSpace)); nonSpace = text.find_first_not_of(" \t\n\r", 
nonSpaceEnd + 1); } else { append_text(text.substr(nonSpace, text.size() - nonSpace)); nonSpace = string::npos; } } } void HtmlFilter::ParserState::opening_tag(const string &tag) { if (tag.empty() == true) { return; } // What tag is this ? string tagName(toLowerCase(tag)); if ((m_foundHead == false) && (tagName == "head")) { // Expect to find META tags and a title m_inHead = true; // One head is enough :-) m_foundHead = true; } else if ((m_inHead == true) && (tagName == "meta")) { string metaName, metaContent, httpEquiv; // Get the META tag's name and content get_parameter("name", metaName); get_parameter("content", metaContent); if ((metaName.empty() == false) && (metaContent.empty() == false)) { // Store this META tag metaName = toLowerCase(metaName); m_metaTags[metaName] = metaContent; } // Is a charset specified ? get_parameter("http-equiv", httpEquiv); if ((metaContent.empty() == false) && (m_charset.empty() == true)) { metaContent = toLowerCase(metaContent); m_charset = findCharset(metaContent); } // Look for a HTML5 charset definition if (m_charset.empty() == true) { get_parameter("charset", m_charset); } } else if ((m_inHead == true) && (tagName == "title")) { // Extract title m_appendToTitle = true; } else if (tagName == "body") { // Index text m_appendToText = true; } else if (tagName == "a") { m_currentLink.m_url.clear(); m_currentLink.m_name.clear(); // Get the href get_parameter("href", m_currentLink.m_url); if (m_currentLink.m_url.empty() == false) { // FIXME: get the NodeInfo to find out the position of this link m_currentLink.m_startPos = m_textPos; // Find abstract ? 
if (m_findAbstract == true) { get_links_text(m_currentLink.m_index); } // Extract link m_appendToLink = true; } } else if (tagName == "frame") { Link frame; // Get the name and source get_parameter("name", frame.m_name); get_parameter("src", frame.m_url); if (frame.m_url.empty() == false) { // Store this frame m_frames.insert(frame); } } else if ((tagName == "frameset") || (tagName == "script") || (tagName == "style")) { // Skip ++m_skip; } // Replace tags with spaces if (m_appendToTitle == true) { m_title += " "; } if (m_appendToText == true) { m_text += " "; m_textPos += 1; } if (m_appendToLink == true) { m_currentLink.m_name += " "; } } void HtmlFilter::ParserState::closing_tag(const string &tag) { if (tag.empty() == true) { return; } // Reset state string tagName(toLowerCase(tag)); if (tagName == "head") { m_inHead = false; } else if (tagName == "title") { trimSpaces(m_title); removeCharacters(m_title, "\r\n"); #ifdef DEBUG clog << "HtmlFilter::endHandler: title is " << m_title << endl; #endif m_appendToTitle = false; } else if (tagName == "body") { m_appendToText = false; } else if (tagName == "a") { if (m_currentLink.m_url.empty() == false) { trimSpaces(m_currentLink.m_name); removeCharacters(m_currentLink.m_name, "\r\n"); m_currentLink.m_endPos = m_textPos; // Store this link m_links.insert(m_currentLink); ++m_currentLink.m_index; } m_appendToLink = false; } else if ((tagName == "frameset") || (tagName == "script") || (tagName == "style")) { --m_skip; } } HtmlFilter::HtmlFilter() : Filter(), m_pParserState(NULL), m_skipText(false), m_findAbstract(true) { } HtmlFilter::~HtmlFilter() { rewind(); } bool HtmlFilter::is_data_input_ok(DataInput input) const { if ((input == DOCUMENT_DATA) || (input == DOCUMENT_STRING)) { return true; } return false; } bool HtmlFilter::set_property(Properties prop_name, const string &prop_value) { if (prop_name == OPERATING_MODE) { if (prop_value == "view") { // This will ensure text is skipped m_skipText = true; // ..and that we 
don't attempt finding an abstract m_findAbstract = false; } else { m_skipText = false; m_findAbstract = true; } return true; } return false; } bool HtmlFilter::set_document_data(const char *data_ptr, off_t data_length) { if ((data_ptr == NULL) || (data_length == 0)) { return false; } string html_doc(data_ptr, data_length); return set_document_string(html_doc); } bool HtmlFilter::set_document_string(const string &data_str) { if (data_str.empty() == true) { return false; } rewind(); // Try to cope with pages that have scripts or other rubbish prepended string::size_type htmlPos = data_str.find("<!DOCTYPE"); if ((htmlPos != string::npos) && (htmlPos > 0)) { #ifdef DEBUG clog << "HtmlFilter::set_document_string: removed " << htmlPos << " characters" << endl; #endif return parse_html(data_str.substr(htmlPos)); } return parse_html(data_str); } bool HtmlFilter::set_document_file(const string &file_path, bool unlink_when_done) { return false; } bool HtmlFilter::set_document_uri(const string &uri) { return false; } bool HtmlFilter::has_documents(void) const { if (m_pParserState != NULL) { return true; } return false; } bool HtmlFilter::next_document(void) { if (m_pParserState != NULL) { m_metaData["charset"] = m_pParserState->m_charset; m_metaData["title"] = m_pParserState->m_title; m_metaData["abstract"] = m_pParserState->m_abstract; m_metaData["ipath"] = ""; m_metaData["mimetype"] = "text/plain"; for (map<string, string>::const_iterator iter = m_pParserState->m_metaTags.begin(); iter != m_pParserState->m_metaTags.end(); ++iter) { if (iter->first == "charset") { continue; } m_metaData[iter->first] = iter->second; } // FIXME: shove the links in there somehow ! 
delete m_pParserState; m_pParserState = NULL; return true; } return false; } bool HtmlFilter::skip_to_document(const string &ipath) { if (ipath.empty() == true) { return next_document(); } return false; } string HtmlFilter::get_error(void) const { return m_error; } void HtmlFilter::rewind(void) { Filter::rewind(); if (m_pParserState != NULL) { delete m_pParserState; m_pParserState = NULL; } } bool HtmlFilter::parse_html(const string &html) { if (html.empty() == true) { return false; } m_content.clear(); m_pParserState = new ParserState(m_content); if (m_skipText == true) { ++m_pParserState->m_skip; } // FIXME: parse here m_pParserState->parse_html(html); // The text after the last link might make a good abstract if (m_pParserState->m_findAbstract == true) { m_pParserState->get_links_text(m_pParserState->m_currentLink.m_index); } // Append META keywords, if any were found map<string, string>::iterator keywordsIter = m_pParserState->m_metaTags.find("keywords"); if (keywordsIter != m_pParserState->m_metaTags.end()) { m_pParserState->m_text.append(keywordsIter->second.c_str(), keywordsIter->second.length()); } #ifdef DEBUG clog << "HtmlFilter::parse_html: " << m_pParserState->m_text.size() << " bytes of text" << endl; #endif // Assume charset is UTF-8 by default if (m_pParserState->m_charset.empty() == true) { m_pParserState->m_charset = "utf-8"; } else { m_pParserState->m_charset = toLowerCase(m_pParserState->m_charset); #ifdef DEBUG clog << "HtmlFilter::parse_html: found charset " << m_pParserState->m_charset << endl; #endif } return true; } bool HtmlFilter::get_links(set<Link> &links) const { links.clear(); if (m_pParserState != NULL) { copy(m_pParserState->m_links.begin(), m_pParserState->m_links.end(), inserter(links, links.begin())); return true; } return false; } pinot-1.22/Tokenize/filters/HtmlFilter.h000066400000000000000000000117521470740426600202720ustar00rootroot00000000000000/* * Copyright 2007-2016 Fabrice Colin * * This program is free software; you can redistribute it 
and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #ifndef _DIJON_HTMLFILTER_H #define _DIJON_HTMLFILTER_H #include #include #include #include "HtmlParser.h" #include "Filter.h" namespace Dijon { /// A link in an HTML page. class DIJON_FILTER_EXPORT Link { public: Link(); Link(const Link &other); ~Link(); Link& operator=(const Link& other); bool operator==(const Link &other) const; bool operator<(const Link &other) const; std::string m_url; std::string m_name; unsigned int m_index; unsigned int m_startPos; unsigned int m_endPos; }; class DIJON_FILTER_EXPORT HtmlFilter : public Filter { public: /// Builds an empty filter. HtmlFilter(); /// Destroys the filter. virtual ~HtmlFilter(); /// Returns what data the filter requires as input. virtual bool is_data_input_ok(DataInput input) const; // Initialization. /** Sets a property, prior to calling set_document_XXX(). * Returns false if the property is not supported. */ virtual bool set_property(Properties prop_name, const std::string &prop_value); /** (Re)initializes the filter with the given data. * Caller should ensure the given pointer is valid until the * Filter object is destroyed, as some filters may not need to * do a deep copy of the data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. 
*/ virtual bool set_document_data(const char *data_ptr, off_t data_length); /** (Re)initializes the filter with the given data. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_string(const std::string &data_str); /** (Re)initializes the filter with the given file. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_file(const std::string &file_path, bool unlink_when_done = false); /** (Re)initializes the filter with the given URI. * Call next_document() to position the filter onto the first document. * Returns false if this input is not supported or an error occurred. */ virtual bool set_document_uri(const std::string &uri); // Going from one nested document to the next. /** Returns true if there are nested documents left to extract. * Returns false if the end of the parent document was reached * or an error occurred. */ virtual bool has_documents(void) const; /** Moves to the next nested document. * Returns false if there are none left. */ virtual bool next_document(void); /** Skips to the nested document with the given ipath. * Returns false if no such document exists. */ virtual bool skip_to_document(const std::string &ipath); // Accessing documents' contents. /// Returns the message for the most recent error that has occurred. virtual std::string get_error(void) const; /// Returns the links set. 
bool get_links(std::set<Link> &links) const; class ParserState : public HtmlParser { public: ParserState(dstring &text); virtual ~ParserState(); virtual void process_text(const string &text); virtual void opening_tag(const string &tag); virtual void closing_tag(const string &tag); bool get_links_text(unsigned int currentLinkIndex); bool m_isValid; bool m_findAbstract; unsigned int m_textPos; bool m_inHead; bool m_foundHead; bool m_appendToTitle; bool m_appendToText; bool m_appendToLink; unsigned int m_skip; std::string m_charset; std::string m_title; dstring &m_text; std::string m_abstract; Link m_currentLink; std::set<Link> m_links; std::set<Link> m_frames; std::map<std::string, std::string> m_metaTags; protected: void append_whitespace(void); void append_text(const string &text); }; protected: ParserState *m_pParserState; std::string m_error; bool m_skipText; bool m_findAbstract; virtual void rewind(void); bool parse_html(const string &html); private: /// HtmlFilter objects cannot be copied. HtmlFilter(const HtmlFilter &other); /// HtmlFilter objects cannot be copied. HtmlFilter& operator=(const HtmlFilter& other); }; } #endif // _DIJON_HTMLFILTER_H pinot-1.22/Tokenize/filters/HtmlParser.cc000066400000000000000000000350321470740426600204340ustar00rootroot00000000000000/* htmlparse.cc: simple HTML parser for omega indexer * * Copyright 1999,2000,2001 BrightStation PLC * Copyright 2001 Ananova Ltd * Copyright 2002,2006,2007,2008 Olly Betts * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License as * published by the Free Software Foundation; either version 2 of the * License, or (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 * USA */ #include #include #include #include #include #include #include "config.h" #include "HtmlParser.h" using namespace std; inline void lowercase_string(string &str) { for (string::iterator i = str.begin(); i != str.end(); ++i) { *i = tolower(static_cast<unsigned char>(*i)); } } map<string, unsigned int> HtmlParser::named_ents; inline static bool p_notdigit(char c) { return !isdigit(static_cast<unsigned char>(c)); } inline static bool p_notxdigit(char c) { return !isxdigit(static_cast<unsigned char>(c)); } inline static bool p_notalnum(char c) { return !isalnum(static_cast<unsigned char>(c)); } inline static bool p_notwhitespace(char c) { return !isspace(static_cast<unsigned char>(c)); } inline static bool p_nottag(char c) { return !isalnum(static_cast<unsigned char>(c)) && c != '.' && c != '-' && c != ':'; // ':' for XML namespaces. } inline static bool p_whitespacegt(char c) { return isspace(static_cast<unsigned char>(c)) || c == '>'; } inline static bool p_whitespaceeqgt(char c) { return isspace(static_cast<unsigned char>(c)) || c == '=' || c == '>'; } static unsigned nonascii_to_utf8(unsigned ch, char * buf) { // FIXME: use CJKVTokenizer's _unicode_to_char() if (ch < 0x800) { buf[0] = 0xc0 | (ch >> 6); buf[1] = 0x80 | (ch & 0x3f); return 2; } if (ch < 0x10000) { buf[0] = 0xe0 | (ch >> 12); buf[1] = 0x80 | ((ch >> 6) & 0x3f); buf[2] = 0x80 | (ch & 0x3f); return 3; } if (ch < 0x200000) { buf[0] = 0xf0 | (ch >> 18); buf[1] = 0x80 | ((ch >> 12) & 0x3f); buf[2] = 0x80 | ((ch >> 6) & 0x3f); buf[3] = 0x80 | (ch & 0x3f); return 4; } return 0; } bool HtmlParser::get_parameter(const string & param, string & value) { map<string, string>::const_iterator i = parameters.find(param); if (i == parameters.end()) return false; value = i->second; return true; } HtmlParser::HtmlParser() { static const struct ent { const char *n; unsigned int v; } ents[] = { // Names and values from: "Character entity references in HTML 4" // 
        // http://www.w3.org/TR/html4/sgml/entities.html
        { "quot", 34 }, { "amp", 38 },
        { "apos", 39 }, // Not in HTML 4 list but used in OpenOffice XML.
        { "lt", 60 }, { "gt", 62 },
        { "nbsp", 160 }, { "iexcl", 161 }, { "cent", 162 }, { "pound", 163 },
        { "curren", 164 }, { "yen", 165 }, { "brvbar", 166 }, { "sect", 167 },
        { "uml", 168 }, { "copy", 169 }, { "ordf", 170 }, { "laquo", 171 },
        { "not", 172 }, { "shy", 173 }, { "reg", 174 }, { "macr", 175 },
        { "deg", 176 }, { "plusmn", 177 }, { "sup2", 178 }, { "sup3", 179 },
        { "acute", 180 }, { "micro", 181 }, { "para", 182 }, { "middot", 183 },
        { "cedil", 184 }, { "sup1", 185 }, { "ordm", 186 }, { "raquo", 187 },
        { "frac14", 188 }, { "frac12", 189 }, { "frac34", 190 }, { "iquest", 191 },
        { "Agrave", 192 }, { "Aacute", 193 }, { "Acirc", 194 }, { "Atilde", 195 },
        { "Auml", 196 }, { "Aring", 197 }, { "AElig", 198 }, { "Ccedil", 199 },
        { "Egrave", 200 }, { "Eacute", 201 }, { "Ecirc", 202 }, { "Euml", 203 },
        { "Igrave", 204 }, { "Iacute", 205 }, { "Icirc", 206 }, { "Iuml", 207 },
        { "ETH", 208 }, { "Ntilde", 209 }, { "Ograve", 210 }, { "Oacute", 211 },
        { "Ocirc", 212 }, { "Otilde", 213 }, { "Ouml", 214 }, { "times", 215 },
        { "Oslash", 216 }, { "Ugrave", 217 }, { "Uacute", 218 }, { "Ucirc", 219 },
        { "Uuml", 220 }, { "Yacute", 221 }, { "THORN", 222 }, { "szlig", 223 },
        { "agrave", 224 }, { "aacute", 225 }, { "acirc", 226 }, { "atilde", 227 },
        { "auml", 228 }, { "aring", 229 }, { "aelig", 230 }, { "ccedil", 231 },
        { "egrave", 232 }, { "eacute", 233 }, { "ecirc", 234 }, { "euml", 235 },
        { "igrave", 236 }, { "iacute", 237 }, { "icirc", 238 }, { "iuml", 239 },
        { "eth", 240 }, { "ntilde", 241 }, { "ograve", 242 }, { "oacute", 243 },
        { "ocirc", 244 }, { "otilde", 245 }, { "ouml", 246 }, { "divide", 247 },
        { "oslash", 248 }, { "ugrave", 249 }, { "uacute", 250 }, { "ucirc", 251 },
        { "uuml", 252 }, { "yacute", 253 }, { "thorn", 254 }, { "yuml", 255 },
        { "OElig", 338 }, { "oelig", 339 }, { "Scaron", 352 }, { "scaron", 353 },
        { "Yuml", 376 }, { "fnof", 402 }, { "circ", 710 }, { "tilde", 732 },
        { "Alpha", 913 }, { "Beta", 914 }, { "Gamma", 915 }, { "Delta", 916 },
        { "Epsilon", 917 }, { "Zeta", 918 }, { "Eta", 919 }, { "Theta", 920 },
        { "Iota", 921 }, { "Kappa", 922 }, { "Lambda", 923 }, { "Mu", 924 },
        { "Nu", 925 }, { "Xi", 926 }, { "Omicron", 927 }, { "Pi", 928 },
        { "Rho", 929 }, { "Sigma", 931 }, { "Tau", 932 }, { "Upsilon", 933 },
        { "Phi", 934 }, { "Chi", 935 }, { "Psi", 936 }, { "Omega", 937 },
        { "alpha", 945 }, { "beta", 946 }, { "gamma", 947 }, { "delta", 948 },
        { "epsilon", 949 }, { "zeta", 950 }, { "eta", 951 }, { "theta", 952 },
        { "iota", 953 }, { "kappa", 954 }, { "lambda", 955 }, { "mu", 956 },
        { "nu", 957 }, { "xi", 958 }, { "omicron", 959 }, { "pi", 960 },
        { "rho", 961 }, { "sigmaf", 962 }, { "sigma", 963 }, { "tau", 964 },
        { "upsilon", 965 }, { "phi", 966 }, { "chi", 967 }, { "psi", 968 },
        { "omega", 969 }, { "thetasym", 977 }, { "upsih", 978 }, { "piv", 982 },
        { "ensp", 8194 }, { "emsp", 8195 }, { "thinsp", 8201 }, { "zwnj", 8204 },
        { "zwj", 8205 }, { "lrm", 8206 }, { "rlm", 8207 }, { "ndash", 8211 },
        { "mdash", 8212 }, { "lsquo", 8216 }, { "rsquo", 8217 }, { "sbquo", 8218 },
        { "ldquo", 8220 }, { "rdquo", 8221 }, { "bdquo", 8222 }, { "dagger", 8224 },
        { "Dagger", 8225 }, { "bull", 8226 }, { "hellip", 8230 }, { "permil", 8240 },
        { "prime", 8242 }, { "Prime", 8243 }, { "lsaquo", 8249 }, { "rsaquo", 8250 },
        { "oline", 8254 }, { "frasl", 8260 }, { "euro", 8364 }, { "image", 8465 },
        { "weierp", 8472 }, { "real", 8476 }, { "trade", 8482 }, { "alefsym", 8501 },
        { "larr", 8592 }, { "uarr", 8593 }, { "rarr", 8594 }, { "darr", 8595 },
        { "harr", 8596 }, { "crarr", 8629 }, { "lArr", 8656 }, { "uArr", 8657 },
        { "rArr", 8658 }, { "dArr", 8659 }, { "hArr", 8660 }, { "forall", 8704 },
        { "part", 8706 }, { "exist", 8707 }, { "empty", 8709 }, { "nabla", 8711 },
        { "isin", 8712 }, { "notin", 8713 }, { "ni", 8715 }, { "prod", 8719 },
        { "sum", 8721 }, { "minus", 8722 }, { "lowast", 8727 }, { "radic", 8730 },
        { "prop", 8733 }, { "infin", 8734 }, { "ang", 8736 }, { "and", 8743 },
        { "or", 8744 }, { "cap", 8745 }, { "cup", 8746 }, { "int", 8747 },
        { "there4", 8756 }, { "sim", 8764 }, { "cong", 8773 }, { "asymp", 8776 },
        { "ne", 8800 }, { "equiv", 8801 }, { "le", 8804 }, { "ge", 8805 },
        { "sub", 8834 }, { "sup", 8835 }, { "nsub", 8836 }, { "sube", 8838 },
        { "supe", 8839 }, { "oplus", 8853 }, { "otimes", 8855 }, { "perp", 8869 },
        { "sdot", 8901 }, { "lceil", 8968 }, { "rceil", 8969 }, { "lfloor", 8970 },
        { "rfloor", 8971 }, { "lang", 9001 }, { "rang", 9002 }, { "loz", 9674 },
        { "spades", 9824 }, { "clubs", 9827 }, { "hearts", 9829 }, { "diams", 9830 },
        { NULL, 0 }
    };
    if (named_ents.empty()) {
        const struct ent *i = ents;
        while (i->n) {
            named_ents[string(i->n)] = i->v;
            ++i;
        }
    }
}

void
HtmlParser::decode_entities(string &s)
{
    // We need a const_iterator version of s.end() - otherwise the
    // find() and find_if() templates don't work...
    string::const_iterator amp = s.begin(), s_end = s.end();
    while ((amp = find(amp, s_end, '&')) != s_end) {
        unsigned int val = 0;
        string::const_iterator end, p = amp + 1;
        if (p != s_end && *p == '#') {
            p++;
            if (p != s_end && (*p == 'x' || *p == 'X')) {
                // hex
                p++;
                end = find_if(p, s_end, p_notxdigit);
                sscanf(s.substr(p - s.begin(), end - p).c_str(), "%x", &val);
            } else {
                // number
                end = find_if(p, s_end, p_notdigit);
                val = atoi(s.substr(p - s.begin(), end - p).c_str());
            }
        } else {
            end = find_if(p, s_end, p_notalnum);
            string code = s.substr(p - s.begin(), end - p);
            map<string, unsigned int>::const_iterator i;
            i = named_ents.find(code);
            if (i != named_ents.end()) val = i->second;
        }
        if (end < s_end && *end == ';') end++;
        if (val) {
            string::size_type amp_pos = amp - s.begin();
            if (val < 0x80) {
                s.replace(amp_pos, end - amp, 1u, char(val));
            } else {
                // Convert unicode value val to UTF-8.
                char seq[4];
                unsigned len = nonascii_to_utf8(val, seq);
                s.replace(amp_pos, end - amp, seq, len);
            }
            s_end = s.end();
            // We've modified the string, so the iterators are no longer
            // valid...
            amp = s.begin() + amp_pos + 1;
        } else {
            amp = end;
        }
    }
}

void
HtmlParser::parse_html(const string &body)
{
    in_script = false;

    parameters.clear();
    string::const_iterator start = body.begin();

    while (true) {
        // Skip through until we find an HTML tag, a comment, or the end of
        // document.  Ignore isolated occurrences of `<' which don't start
        // a tag or comment.
        string::const_iterator p = start;
        while (true) {
            p = find(p, body.end(), '<');
            if (p == body.end()) break;
            unsigned char ch = *(p + 1);

            // Tag, closing tag, or comment (or SGML declaration).
            if ((!in_script && isalpha(ch)) || ch == '/' || ch == '!') break;

            if (ch == '?') {
                // PHP code or XML declaration.
                // XML declaration is only valid at the start of the first line.
                // FIXME: need to deal with BOMs...
                if (p != body.begin() || body.size() < 20) break;

                // XML declaration looks something like this:
                // <?xml version="1.0" encoding="UTF-8"?>
                if (p[2] != 'x' || p[3] != 'm' || p[4] != 'l') break;
                if (strchr(" \t\r\n", p[5]) == NULL) break;

                string::const_iterator decl_end = find(p + 6, body.end(), '?');
                if (decl_end == body.end()) break;

                // Default charset for XML is UTF-8.
                charset = "UTF-8";

                string decl(p + 6, decl_end);
                size_t enc = decl.find("encoding");
                if (enc == string::npos) break;

                enc = decl.find_first_not_of(" \t\r\n", enc + 8);
                if (enc == string::npos || enc == decl.size()) break;

                if (decl[enc] != '=') break;

                enc = decl.find_first_not_of(" \t\r\n", enc + 1);
                if (enc == string::npos || enc == decl.size()) break;

                if (decl[enc] != '"' && decl[enc] != '\'') break;

                char quote = decl[enc++];
                size_t enc_end = decl.find(quote, enc);

                // Only accept the value if the closing quote was found.
                if (enc_end != string::npos)
                    charset = decl.substr(enc, enc_end - enc);

                break;
            }
            p++;
        }

        // Process text up to start of tag.
        if (p > start) {
            string text = body.substr(start - body.begin(), p - start);
#if 0
            convert_to_utf8(text, charset);
#endif
            decode_entities(text);
            process_text(text);
        }

        if (p == body.end()) break;

        start = p + 1;

        if (start == body.end()) break;

        if (*start == '!') {
            if (++start == body.end()) break;
            if (++start == body.end()) break;
            // comment or SGML declaration
            if (*(start - 1) == '-' && *start == '-') {
                ++start;
                string::const_iterator close = find(start, body.end(), '>');
                // An unterminated comment swallows rest of document
                // (like Netscape, but unlike MSIE IIRC)
                if (close == body.end()) break;

                p = close;
                // look for -->
                while (p != body.end() && (*(p - 1) != '-' || *(p - 2) != '-'))
                    p = find(p + 1, body.end(), '>');

                if (p != body.end()) {
                    // Check for htdig's "ignore this bit" comments.
                    if (p - start == 15 && string(start, p - 2) == "htdig_noindex") {
                        string::size_type i;
                        // The closing marker is 21 characters long, hence
                        // the offset below.
                        i = body.find("<!--/htdig_noindex-->", p + 1 - body.begin());
                        if (i == string::npos) break;
                        start = body.begin() + i + 21;
                        continue;
                    }
                    // If we found --> skip to there.
                    start = p;
                } else {
                    // Otherwise skip to the first > we found (as Netscape does).
                    start = close;
                }
            } else {
                // just an SGML declaration, perhaps giving the DTD - ignore it
                start = find(start - 1, body.end(), '>');
                if (start == body.end()) break;
            }
            ++start;
        } else if (*start == '?') {
            if (++start == body.end()) break;
            // PHP - swallow until ?> or EOF
            start = find(start + 1, body.end(), '>');

            // look for ?>
            while (start != body.end() && *(start - 1) != '?')
                start = find(start + 1, body.end(), '>');

            // unterminated PHP swallows rest of document (rather arbitrarily
            // but it avoids polluting the database when things go wrong)
            if (start != body.end()) ++start;
        } else {
            // opening or closing tag
            int closing = 0;

            if (*start == '/') {
                closing = 1;
                start = find_if(start + 1, body.end(), p_notwhitespace);
            }

            p = start;
            start = find_if(start, body.end(), p_nottag);
            string tag = body.substr(p - body.begin(), start - p);
            // convert tagname to lowercase
            lowercase_string(tag);

            if (closing) {
                closing_tag(tag);
                if (in_script && tag == "script") in_script = false;

                /* ignore any bogus parameters on closing tags */
                p = find(start, body.end(), '>');
                if (p == body.end()) break;
                start = p + 1;
            } else {
                // FIXME: parse parameters lazily.
                while (start < body.end() && *start != '>') {
                    string name, value;

                    p = find_if(start, body.end(), p_whitespaceeqgt);

                    name.assign(body, start - body.begin(), p - start);

                    p = find_if(p, body.end(), p_notwhitespace);

                    start = p;
                    if (start != body.end() && *start == '=') {
                        start = find_if(start + 1, body.end(), p_notwhitespace);

                        p = body.end();

                        int quote = *start;
                        if (quote == '"' || quote == '\'') {
                            start++;
                            p = find(start, body.end(), quote);
                        }

                        if (p == body.end()) {
                            // unquoted or no closing quote
                            p = find_if(start, body.end(), p_whitespacegt);
                        }
                        value.assign(body, start - body.begin(), p - start);
                        start = find_if(p, body.end(), p_notwhitespace);

                        if (!name.empty()) {
                            // convert parameter name to lowercase
                            lowercase_string(name);
                            // in case of multiple entries, use the first
                            // (as Netscape does)
                            parameters.insert(make_pair(name, value));
                        }
                    }
                }
                opening_tag(tag);
                parameters.clear();

                // In